What we’re about
ChatGPT 4o has impressed all of us with its abilities.
Yet, given the vast amounts of data and parameters needed to get this far, it has become apparent that Yann LeCun is right: AGI is not going to come from these Transformer-based approaches.
This working group intends to pursue alternative approaches, ones that are inspired by neuroscience. The human cortex contains roughly 150,000 copies of the cortical column. Each one of these is able to model objects and execute motor commands. How audacious are we to think that essentially one brain (ChatGPTx) should be able to match the combined efforts of 150,000?
Why Transformers Are Not The Answer
Current Transformer-based approaches extract too little from the vast information stream.
So much training data is needed precisely because these models make such weak use of it, extracting only a small fraction of what each example in the vast datasets has to offer.
The human brain, somehow, is able to achieve these functions with a vastly smaller dataset.
How is it able to do that?
Unlike LLMs, the human brain is invested in accumulating and building a model of the world.
The human brain performs its predictions by noticing that a newly arriving sequence matches something that happened in the past; the prediction is simply whatever came next back then.
A happy side effect of this process is learning things.
The edge-detection behavior of retinal neurons leads to a key trait: sparse representations. Neurons in this context fire only when they detect something novel, a change, not more of the same.
When the retina detects an edge, it saves only the outline, not the filled-in interior. This allows for a sparse representation.
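As a rough illustration (not taken from any particular model of the retina), here is what that saving looks like in code: only the coordinates of the few active “dots” along the edge are kept, instead of the full grid. The patch size and dot positions below are made-up values.

```python
# Minimal sketch of a sparse representation: instead of storing every value in a
# 32x32 patch (1,024 entries), keep only the coordinates of the few "dots" that
# fired along the detected edge. The patch size and dot positions are made up.

PATCH_SIZE = 32
edge_dots = {(4, 7), (5, 7), (6, 8), (7, 8), (8, 9)}   # hypothetical active dots

# Dense form: a full grid that is almost entirely zeros.
dense_patch = [[0] * PATCH_SIZE for _ in range(PATCH_SIZE)]
for row, col in edge_dots:
    dense_patch[row][col] = 1

# Sparse form: just the coordinates that are "on".
sparse_patch = sorted(edge_dots)

print(f"dense storage : {PATCH_SIZE * PATCH_SIZE} values")
print(f"sparse storage: {len(sparse_patch)} coordinate pairs -> {sparse_patch}")
```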
Let’s consider the process needed to recognize the word “Peach” when its letters “P”, “e”, “a”, “c”, “h” are presented one at a time.
The first letter seen is the “P”. A limited number of ‘dots’ in the retina would be activated by seeing a “P”.
The square that contains the “P” may have 20 active dots, and together they represent one “sequence”.
An instant later, the second letter “e” appears, and it activates another set of dots that together form the next sequence.
The sequence for the “P” is connected (via a basal dendrite) to the next item in the sequence, the “e”.
So the dots in a sequence are like slots in a one-dimensional array.
The first time these letters are encountered, they are really:
sequence of darkened bits (“P”) → basal dendrite → next sequence (“e”) → basal dendrite → next sequence (“a”) → basal dendrite → next sequence (“c”) → basal dendrite → next sequence (“h”).
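Here is a minimal sketch of that chain in Python, assuming each letter activates a made-up set of about 20 dots and that a basal-dendrite-like link simply records which sparse pattern follows which. The encoder and the dot indices are placeholders, not anything drawn from neuroscience.

```python
import random

def encode(letter: str, n_dots: int = 20, patch_size: int = 1024) -> frozenset:
    """Deterministically pick ~20 active 'dots' for a letter (placeholder encoder)."""
    rng = random.Random(letter)            # same letter -> same dots every time
    return frozenset(rng.sample(range(patch_size), n_dots))

word = "Peach"
patterns = [encode(ch) for ch in word]     # one sparse pattern per letter
letter_of = {p: ch for p, ch in zip(patterns, word)}

# "Basal dendrite" links: each sparse pattern points to the pattern that followed it.
links = {patterns[i]: patterns[i + 1] for i in range(len(patterns) - 1)}

# Replay the chain starting from the pattern for "P".
current = encode("P")
recalled = letter_of[current]
while current in links:
    current = links[current]
    recalled += letter_of[current]

print(recalled)   # -> "Peach", reconstructed by following the links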
So, in humans, as new sequences flow in, we check whether they are familiar. If so, the brain “cheats” and looks ahead in the matching set of sequences from the past.
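Here is a rough, self-contained sketch of that “cheat”: letters stand in for the sparse dot patterns, and the remembered sequences are invented examples.

```python
# Sketch of the "look ahead" step: when the incoming letters match the start of
# a sequence stored from past experience, the predicted continuation is simply
# whatever came next in that stored copy. The remembered words are invented.

remembered = ["peach", "peace", "pear", "apple"]   # sequences stored in the past

def predict(prefix: str) -> list[str]:
    """Return the continuations of every remembered sequence that starts with `prefix`."""
    return [word[len(prefix):] for word in remembered if word.startswith(prefix)]

print(predict("pea"))   # -> ['ch', 'ce', 'r'], the look-ahead "cheat"
print(predict("app"))   # -> ['le']
```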
Obviously, this is a very different approach from the one used in most of today’s naive “AI” implementations.
In summary, the true AGI we seek will not come from Transformer-based approaches.