copyright 1993, 2004 by Harold E. Wefald
John Searle uses his famous 'Chinese Room' argument to
refute the idea that a computer program that correctly
answers questions about a story can actually
understand the story. He imagines himself being
locked in a room while stories and questions about them
written in Chinese are passed into the room. Without
knowing a word of Chinese, he follows a set of
instructions comparable to the algorithms in the
computer program to produce the correct answers which
are then disgorged from the room like the output of the
"understanding" computer. Since he does what the
computer does without understanding the story, the
computer doesn't demonstrate an understanding either.
I have long been concerned with the other side of the
equation. How do people know they understand
something? Many years ago as a student of philosophy I
was very much occupied with the question of just what
was in my mind when I seemed to be having a thought.
Like Hume searching for the self and continually
stumbling onto perceptions, I went searching for
thoughts and kept stumbling onto words. Aside from
the words there is the functionality. There is also
the sense of understanding, the confidence that if one
pursues the words further it will lead to the proper
associated thoughts. (Words and functionality, of
course, are precisely what the computer program is
designed to handle.) Since a thought seemed so noble,
and words so mundane, this apparent reduction of
thoughts to words was not very satisfying.
It occurred to me that one way to determine the
dependence of thought on language would be to change
the language. The biggest change I could think of
would be to go from English to written Chinese since it
was not necessarily tied to any sound, as is written
English.
I had no serious intent to act
on this idea, but one day I was
browsing the shelves in the
university library, when I
happened to pull out a book and
open it up to a page of 12 big
beautiful Chinese characters
introducing the first lesson in
an introductory Chinese reader.
My casual thought experiment
came back to me, and I was soon
in its grip. Reaching out more
or less at random and grabbing
that book proved to be the most
important accident of my life.
Now it is a thought experiment which has lasted over 50
years, and has at several junctures accidentally
rechanneled the direction of my life. Given that kind
of impact, the philosophical fruits had long seemed
rather modest. I realize now that what I was
originally seeking was the phenomenology of thought. I
never did satisfy myself on that score. I am still
puzzled by just what an abstract thought consists of,
beyond the words (or numbers and figures) and the
functionality. How do I know I understand an abstract
statement, or the concepts in a story, other than by
examining the functionality of thought?
If I didn't find just what I was looking for, I did
find a great deal of what I was not looking for. Over
a period of 50 years, this thought experiment and its
impact on my life have directly and indirectly provided
many varied experiences which afforded numerous
unexpected clues on the functionality of thought.
In the past three or four years the pieces seem to have
fallen into place, and I am now prepared to sum up the
results of my 50 year thought experiment with one
simple statement: We see with our memory! As we shall
see, this turns out to be essentially equivalent to
another simple statement: All efficacious knowledge
depends on "overlearning" through the continuous
acquisition of "knowledge atoms".
What does this mean? First of all, I use the word 'see'
to mean both to perceive and to understand, as in "I
see". All of our cognitive capacities rest upon a
massive memory store, derived from our entire perceptual
history, that we are largely unaware of. I refer
to this as "unattended knowledge." The following
experiment can clarify the basic idea. Fill a small
coffee can with pennies, and leave it sitting on your
desk. Later casually ask a visitor to hand it to you.
Very likely your visitor will have to make two attempts
to pick up the coffee can. Why? Every time we pick up
anything, open a door, close a window, bite off a piece
of bread, brush our teeth, etc. we make an estimate of
the amount of force we should apply, usually unawares.
Based on what? Common sense? It is actually based on
a portion of the millions of pieces of unattended
knowledge, continually being added to our memory store
(or subtracted from it). The reason we don't gouge our
scalp or drop our comb every time we comb our hair is
not just that we have learned to comb our hair, but we
have stored in our memory the feel of the proper forces
for picking up a comb and pulling it through our hair.
Of course, we can be tentative, and react to the
resistance to our cautious effort, but we would never
get through the day without the more usual quick and
confident effort afforded by that massive memory store.
DO WE REALLY SEE WITH OUR
MEMORY? Everyone who would
like a privileged peek at the
workings of his own brain
should see this movie. The
hero is a blind man. The
heroine is his girlfriend who
led him to a doctor who
restored his sight. But after
his physical tools of sight
were restored he still remained
very confused about the world
before his eyes. He had never
learned to see. It was not that he couldn't understand
and interpret what he saw; rather, his
brain couldn't take all those millions of 1's and 0's
streaming into his head every second and form them into
meaningful shapes and perspectives. As we shall soon see,
this is exactly what my model predicts.
Now what have I said, beyond simply "Gee Whiz! we sure
have a big memory!"?
I will try to clarify this idea and establish its
significance in three ways. First I will take a look
at the final piece of the puzzle, which seemed to pull
it all together for me. This is a small neural net
simulation I have developed, which seems to provide an
excellent model of the concept of learning and thinking
that I have arrived at. This should allow me to make a
very precise definition of my position.
Then I will go back and retrace the steps of my thought
experiment which led me to that position. Along the
way I will point out other conventional or commonly
accepted positions which must be in error if I am
right, or perhaps I should say where I must be in error
if they are right. I will also point out certain
potentially productive cognitive experiments which are
implied by my position.
One evening a few years ago my son Eric was home for
Christmas, while he was studying for his doctorate in
Computer Science at Berkeley. We sat down at the
kitchen table and he walked me through the Hopfield
algorithm for implementing a simple associative memory
with a neural net. This is based upon that exceptional
leap of insight called the Hebb learning rule. Hebb
speculated that when two neurons "fired" during the
same small time frame the synapses between them would
be strengthened. This would increase the likelihood
that subsequent discharges of one neuron would excite
the other. According to Steven Rose, the "idea of
modifiable, or Hebb, synapses" remains the basis for
most current memory research.
In this version of the Hopfield neural model, a pattern
is input as a vector in which each element is a 1 or a
-1. Each element stimulates a neuron, and each neuron
synapses with every other neuron. The outer product of
the input vector with itself gives a synaptic weight
matrix. For multiple patterns the weights are summed
into a single matrix, so in effect the patterns are
stored in the same space, not as a unique code, but as
perturbations to the existing mix.
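The construction just described can be sketched in a few lines of Python with NumPy. This is a minimal illustration only; the tiny pattern size, the function name, and the zeroed diagonal (no neuron synapsing with itself, a common convention for Hopfield nets) are my own choices, not details from the text.

```python
import numpy as np

def hopfield_weights(patterns):
    """Sum the outer product of each +/-1 pattern vector with itself.
    Each pattern is thus stored as a perturbation to the shared matrix."""
    n = patterns[0].size
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)      # Hebb rule: co-active units strengthen
    np.fill_diagonal(W, 0)       # assumed: no neuron synapses with itself
    return W

# Two tiny 4-element patterns stand in for the ten by ten grids in the text.
patterns = [np.array([1, -1, 1, -1]), np.array([1, 1, -1, -1])]
W = hopfield_weights(patterns)
```

For the ten by ten patterns of the text, each flattened vector has 100 elements and the resulting weight matrix has the 10,000 elements mentioned below.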
The mathematical details of this process are not
important, but the results are very interesting.
In one experiment we have three patterns:
[figures: three ten by ten character patterns, not reproduced here]
We convert each ten by ten pattern into a binary vector
where the * becomes 1 and the space -1.
After we input the first pattern the upper left corner of
our 10,000 element weight matrix looks like this:
[figure: corner of the weight matrix after the first pattern, not reproduced here]
The second and third patterns produce the following:
[figure: the weight matrix after all three patterns, not reproduced here]
All three patterns are thus blended into the same mix.
Now we want to present the network with a stimulus
representing part of a pattern and have the system
retrieve the full pattern from the matrix, as for
example:
[figure: a partial, degraded pattern, not reproduced here]
We convert this pattern to a 100 element vector. To
obtain each bit in the output vector we take the
product of the stimulus vector and the corresponding
column in the weight matrix. A positive product
produces a 1. A negative product a -1. The output
vector is fed back into the system and processed as a
new stimulus. This continues until there is no change
in successive outputs, or until we have completed 8
iterations. Converting the output vector back to a ten
by ten pattern we do indeed retrieve the original
patterns. In other words all three patterns are
preserved together in the same space. We can put more
than 3 patterns in the weight matrix, but then we have
to increase the strength of the stimulus to succeed in
retrieving them.
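The retrieval loop just described can be sketched as follows. The sign rule and the cap of 8 iterations come from the text; treating a zero product as a 1, and the small 8-element pattern used for demonstration, are my own assumptions.

```python
import numpy as np

def hopfield_weights(patterns):
    """Store +/-1 patterns as summed outer products (Hebb rule)."""
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0)
    return W

def recall(W, stimulus, max_iters=8):
    """Each output bit is the sign of the product of the current state
    with the corresponding weight-matrix column; the output is fed back
    in as a new stimulus until it stops changing or 8 iterations pass."""
    state = stimulus.copy()
    for _ in range(max_iters):
        out = np.where(state @ W >= 0, 1, -1)   # zero product -> 1 (assumed)
        if np.array_equal(out, state):
            break
        state = out
    return state

pattern = np.array([1, 1, -1, -1, 1, -1, 1, -1])
W = hopfield_weights([pattern])
degraded = pattern.copy()
degraded[0] = -degraded[0]      # flip one bit of the stored pattern
restored = recall(W, degraded)  # recovers the original pattern
```

With a single stored pattern and one flipped bit, one pass through the loop already restores the original, illustrating the retrieval the text describes.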
The model becomes interesting when we make various
extensions to probe its versatility. Just as the brain
can often continue to function after massive trauma, we
can often zero any quadrant of the weight matrix
without significant loss of performance. Losing half
of the weights usually will not cripple it.
A more interesting extension is to add an element of
randomness to the inputs in creating the weight matrix.
That is, we take each element product, a 1 or a -1,
and multiply it by a weight
factor. If the weight factor is 1 we have our original
model. However for each element product we can
generate a random number between zero and one and use
that for the weight factor. Our model still works
quite properly.
The model becomes quite realistic when we divide the
weight factor into a constant part and a random part.
For example, if the constant part is set equal to -1 and
the random part set equal to 2, our model degenerates
into total impotence, since the expected random value is
1 and the expected net value is 0. However, if the
random part is set equal to slightly more than 2, say
2.2, the expected net value becomes 0.1. The model at
first appears to be impotent. However, by repeatedly
teaching each pattern, the patterns are slowly learned.
Thirty learning experiences produce significant recall
capability. Forty produce more. With sixty the
learning process seems to be complete. However, even
after perfect recall has been achieved, performance can
improve with continued learning.
For example, in one experiment we taught the system
five 8 by 8 patterns representing the letters A, K, O,
X and V. The stimuli consisted of degraded versions of
the same patterns. After 60 learning experiences all
patterns could be recalled perfectly. At that point it
took five iterations to retrieve the original X from
the degraded X. After eighty learning experiences it
took only three iterations. Thus, the more experienced
the net, the faster it can "see". We not only "see
with our memory", we see faster with the stronger
memory trace.
By reducing the expected net value of the randomized
weight factor, we would greatly increase the number of
learning experiences required, but we would not change
the final capability achieved.
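This randomized teaching procedure can be sketched as follows. The constant part of -1, the random part of 2.2, and the sixty learning experiences follow the text; the pattern sizes, the random seed, and the recall routine's details are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 64                                    # one 8 by 8 pattern, flattened
patterns = [np.where(rng.random(n) < 0.5, 1, -1) for _ in range(2)]

W = np.zeros((n, n))
const, span = -1.0, 2.2                   # expected net weight factor = 0.1
for _ in range(60):                       # sixty learning experiences
    for p in patterns:
        # Scale each element product by (constant part + random part).
        factor = const + span * rng.random((n, n))
        W += factor * np.outer(p, p)
np.fill_diagonal(W, 0)

def recall(W, stimulus, max_iters=8):
    state = stimulus.copy()
    for _ in range(max_iters):
        out = np.where(state @ W >= 0, 1, -1)
        if np.array_equal(out, state):
            break
        state = out
    return state

degraded = patterns[0].copy()
degraded[:8] *= -1                        # corrupt part of the pattern
restored = recall(W, degraded)
```

Any single teaching pass leaves the weights dominated by noise; it is only the accumulation over dozens of passes that makes the expected net value of 0.1 per element effective, which is the slow "invisible learning" the text goes on to discuss.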
This modified randomized Hopfield net with very low
expected net weight seems to provide a very good model
of the process by which we learned the elements of our
native language. That is, in the first few years of
our lives we heard the various common words of English
hundreds, thousands or tens of thousands of times, and
we seemed to be learning nothing. Then it seemed like
overnight we were bubbling forth with dozens, hundreds,
and then thousands of words. In the same way our model
is stimulated dozens or hundreds of times, remaining
quite incompetent, until all of a sudden, the
connective weights grow strong enough to produce an
effective associative recall of more and more of the
learned patterns.
My observations of language formation over the years
long ago led me to believe that this quasi-invisible
learning process is going on throughout our
waking lives. When we don't seem to be learning we are
slowly strengthening our hold on things in a way that
renders our grasp of them eventually efficacious. As I
would say now, we are slowly strengthening the neural
connections by which our memories are implemented.
Conversely, when we seem to be learning with great
rapidity we are ringing the changes on this slow
unattended learning. When we seem unable to learn
something we should, we are likely confronting a hidden
deficit of unattended learning. The issue is not
capability to learn but time to learn.
If we are continuously learning throughout our lives,
of course we are also losing knowledge throughout our
lives. Another extension of the Hopfield model allows
us to randomly disconnect the synapses. A fifty
percent destruction of connections will usually degrade
performance only slightly. Seventy-five percent
destruction will strongly degrade retrieval of about
half the patterns. Full retrieval of the others will
take more iterations, which seems to correspond with the
delayed reaction time of those whose loss of neurons
has continued over the years. Also this model
would tend to indicate that the physical growth of
neural connections does not represent a method of
recording memories, but rather a way of increasing
memory capacity.
A final extension of my Hopfield model seems to provide
the strongest reinforcement of the results of my
overextended thought experiment. Over the years I have
come to the growing realization that the human brain
must have a very powerful facility for distilling
efficacious knowledge from the chaos of experience.
We almost never hear all we think we hear when
listening to normal give and take in our native
language. We are always filling in the missing pieces.
How then did we learn it? Did we only learn what was
perfectly articulated? Or is it possible to learn to
speak perfectly in our native language while never
perfectly hearing it spoken? The answer seems to be
that not only is it possible, but for a large part of
our language that is just what happens. We are not
talking about hearing imperfect speech. We are talking
about hearing correct speech imperfectly. This
distinction would not be important in a single
experience, but vastly important over time.
We extended our Hopfield net to model this situation by
adding an option which permits the random blurring of
the inputs during the teaching process. In a typical
experiment we set the expected bit loss at fifty
percent. Thus at no time in the teaching process does
the net "see" more than a fraction of the bit pattern
representing each letter. However, over a large number
of teaching instances the net easily learns each
pattern perfectly. Of course this could be thought of
as a special case of the randomized learning described
above.
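The blurring option can be sketched as follows. The fifty percent expected bit loss and the idea of learning each pattern perfectly from many degraded teaching instances come from the text; representing a lost bit as a zero (no signal, rather than a wrong signal) and the number of teaching instances are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 64
pattern = np.where(rng.random(n) < 0.5, 1, -1)

W = np.zeros((n, n))
for _ in range(200):                           # many teaching instances
    # Each bit is lost (zeroed) with probability one half, so the net
    # never "sees" more than a fraction of the pattern at one time.
    seen = pattern * (rng.random(n) >= 0.5)
    W += np.outer(seen, seen)
np.fill_diagonal(W, 0)

def recall(W, stimulus, max_iters=8):
    state = stimulus.copy()
    for _ in range(max_iters):
        out = np.where(state @ W >= 0, 1, -1)
        if np.array_equal(out, state):
            break
        state = out
    return state

restored = recall(W, pattern)
```

Each blurred instance contributes only to the weights between the bits that happened to survive together, but over many instances every pair of bits co-occurs often enough that the summed weights converge on the full pattern, just as the text argues for hearing correct speech imperfectly.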
However, it corresponds so closely with my observations
of language formation over the years that I am inclined
to attach a fair amount of significance to it. Neural
logic based on the Hebb rule is a process of summing
inputs, both in learning and recall. It appears to be
the case that the capacity to create efficacious
knowledge from the extended summation of degraded
inputs is inherent in such logic, and such appears to
be the very essence of language formation as I have
observed it.
The Hopfield neural model as here extended should allow
me to summarize the results of my thought experiment
thus:
There is no specific code in the brain by which
specific thoughts or memories are encoded. The actual
encoding that gives rise to any memory will change as
other memories are added or subtracted. Rather than
storing specific memories the brain stores knowledge
atoms, from which specific memories can be constructed.
Knowledge atoms are not efficacious by themselves
but only in conjunction with everything else that
happens to be in the neural mix. In reality, each
memory trace is basically a perturbation of a larger
mix of memory traces.
A knowledge atom is simply the smallest unit that can
efficaciously contribute to a memory recall. Leaning
toward the connectionist theory we will say that it is
the smallest unit capable of altering the strength of
the synapse in such a way as to promote or inhibit the
passage of neural signals.
Eric Kandel receives the
Nobel Prize for Physiology or
Medicine from the King of Sweden
in December, 2000.
The isolation of the memory
trace on the level of the
synapse is one of the most
exciting and challenging scientific
efforts of the age. The
discoveries of Eric Kandel and
other researchers such as Steven
Rose and Daniel L. Alkon all seem
to point to memory traces which
would in some way or another
tend to produce an amplified
release of neurotransmitters
on the level of the synapse, given the proper stimulation.
Which is to say, this research gives support to the Hebb rule.
This synaptic alteration will likely turn out to be quasi analog,
more or less continuous and certainly random in strength to a
large degree. I think that whatever is ultimately discovered at
the synaptic level, it will justify the use of the term knowledge
atom to refer to knowledge acquisition which occurs
widely distributed in tiny amounts which are accrued over time
and become gradually or suddenly efficacious depending on what
else is in the neural mix.
In any event the individual knowledge atoms produced by
one specific experience will lose their original
identity in the neural mix as new experiences impact
the mix. This means that different "pieces of
knowledge" will share many of the same knowledge atoms.
Also many knowledge atoms will be needed to produce a piece of
knowledge. Thus there will be blurring of the boundaries
between such pieces. We will use the term piece of knowledge
to refer to any symbol, experience, or fact, etc.
that can be recalled from memory with potential
efficacy.
The point is that our hold on a piece of knowledge can be weakened or
strengthened as the number of knowledge atoms which produce it are
increased or decreased. This variable strength of the
elements of our knowledge base is effectively an
additional dimension of our intelligence, and
contributes massively to our cognitive versatility and
our ability to respond quickly to more important
matters.
We do not experience knowledge atoms and we do not normally
experience the differing strengths of our various pieces of
knowledge.
A newly learned, loosely held word, once recalled, comes
into our consciousness much like all the others.
However, the strength of our pieces of knowledge, or
sets of pieces of knowledge,
determines the efficaciousness of that knowledge.
These considerations lead us to what we will call
The Fundamental Theorem of the Theory of Knowledge
Atoms: With enough reinforcement you can come to
understand almost anything, and therefore learn almost
anything. You can always go on to the next level by
reinforcing what you already know.
................................................
Some might assume that this need
for continuous reinforcement of
previously acquired knowledge in
order to advance only applies to
people who are not very bright.
Hardly. Newton's recollections
of his struggles with the new
analytical geometry of Descartes
say otherwise. He kept
starting over on page 1, plowing
ahead until totally stumped.
Each time he would get a little
farther. Finally 'La geometrie'
was his. He was not only
figuring out a little bit more
each time through the book,
he was also strengthening his hold on what he
had already come to understand. Without a teacher his
understanding was slower but stronger with this forced
overlearning. So one of the most powerful intellects in all
history provides a great illustration of our memory model and
our theory of learning, as well as a princely lesson in how almost
anyone can raise his or her functional IQ with repeated
reinforcement of acquired knowledge.
I doubt if the genius at observation
displayed by Darwin during the voyage of the Beagle welled up
from his DNA. It had more to do with the thousands of hours
devoted to his famous passion for collecting insects.
................................................
We see with our memory. We hear with our memory. We understand
with our memory. But does the strength of our memory hold really
determine the amount of sensory data we need in order to hear
something or to see something?
Twenty odd years ago, before I needed glasses to read English, I
found I needed glasses to read Chinese. With English I was
filling in the missing pieces. With Chinese I had to see more in
order to see. Then I thought back to an introductory Chinese
course I taught in the 1950's. I once played a record in which a
sentence would be spoken once in English and then twice in
Chinese. The students were sure their problem in getting it was
the poor quality of the record. How about the English? That was
just fine! The quality, of course, was the same. Often I have
been able to "hear" an English conversation nearby, when I
couldn't "hear" a Chinese conversation at my own table. By the
same token I can "hear" a Chinese conversation better than a
French conversation.
................................................
So to get your complete copy of
THINKING ABOUT THINKING
send
$5.00 to
Always Learning
P O Box 2267
Gaithersburg, MD 20886-2267