[under construction] Human intelligence appears to require
many concepts (that work well enough together), for instance
to achieve everyday 'commonsensical' behavior. There are two
distinct but inter-related problems here, in the way I
have understood and approached the issues:
- [Time snapshot] How does one, efficiently and adequately,
figure out (mostly unconsciously) which of one's many concepts
are useful in a given situation (for instance, when looking at a picture)?
- [Historical/Developmental] Where do these many concepts
come from, in the first place? (and where are they going?!)
A few other ensuing or related questions include: what
is a concept? (Answer: a recurring/reusable structure, at
least for the perceptual stages; see below!) And, for me:
how can machine learning (ML), and more broadly computational
thinking, help (eg with the wealth of learning and inference
techniques and theories that continue to be developed)?
Cognitive scientists tell us that most concepts (such as: water, chair,
a house, my house, mind, mad, ...) develop
over time, in a sequential and cumulative
manner. This development appears to be largely unsupervised,
ie there is no explicit teacher. By the time one shows signs of
learning a language, if concepts are to be useful, many
must already have been developed to some extent, as the child
already knows and can do a lot! In the mid 2000s, I proposed
Prediction Games (PGs) to develop and study such
learning systems. In PGs, a system composed of
multiple learning and inference parts, plays the game of
prediction on its input stream (broken into episodes).
The system tries to get better at the prediction task
over time: predicting more extensively, into the future or into
space, with possibly less input (becoming a faster and more
powerful predictor). This has a plausible survival advantage! Thus,
prediction of one's world could be a unifying task: could it be
sufficient for providing the feedback needed to achieve the
conceptual complexity of humans (assuming learning is the main
vehicle of development)? But of course, one needs many more details.
In order to get better at prediction, the PGs system keeps
expanding its hierarchical networked vocabulary of
concepts. In PGs, concepts are both the predictors and
the predictands (ie the targets of prediction). This
symmetry is a major draw of this approach for me.
Furthermore, concepts not only predict one another but are also
built out of one another,
akin to Lego pieces. This is the cumulative
(constructivist!) part of the approach. To
start the whole process of learning, the system is given
an initial set of primitive
(innate/hardwired) concepts (an alphabet, a finite
discrete set), with the capability to break its raw sensations
in an episode into those primitives. So each episode begins with
a sequence (or string) of primitives in the input buffer. In
order to predict better, the system separates and puts together
its buffer contents (segments) and maps the chunks into its
current most useful (highest-level) concepts. I have termed this
process (of structuring one's input with one's concepts) interpretation;
prediction and interpretation are intertwined. By practicing many
interpretations, over many episodes, the system figures out which
concepts (new and old) go together: predict each other, and
could perhaps be joined to make larger more useful (higher
level) concepts. Thus concepts correspond to hierarchical
structured patterns (such as finite state machines), and PGs
involve a continual, self-supervised, and cumulative online
process for learning more and more concepts and their evolving (prediction) relations.
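To make the episode loop concrete, here is a toy sketch in Python. All names and the greedy strategy are my illustration, not the actual PG systems (which weigh competing interpretations probabilistically): segment the buffer into the longest known concepts, then promote frequently adjacent pairs to new composite (Lego-like) concepts.

```python
from collections import Counter

def interpret(buffer, concepts):
    """Greedily segment a string of primitives into the longest known concepts."""
    out, i = [], 0
    while i < len(buffer):
        for j in range(len(buffer), i, -1):           # longest match first
            chunk = buffer[i:j]
            if chunk in concepts or len(chunk) == 1:  # primitives always match
                out.append(chunk)
                i = j
                break
    return out

def grow_concepts(episodes, concepts, threshold=3):
    """Count adjacent concepts across episodes; promote frequent pairs."""
    pairs = Counter()
    for ep in episodes:
        seg = interpret(ep, concepts)
        pairs.update(zip(seg, seg[1:]))
    for (a, b), n in pairs.items():
        if n >= threshold:
            concepts.add(a + b)   # a new concept built out of old ones
    return concepts

episodes = ["thecat", "thedog", "thecar"]   # one primitive string per episode
concepts = grow_concepts(episodes, set("thecadogr"))
# "th" and "he" each occur 3 times as adjacent pairs, so both are promoted;
# re-interpreting "thecat" now begins with the composite "th".
```

Of course, a hard promotion threshold like this is exactly the kind of local decision, made on noisy indirect information, that the challenges below are about.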
I currently believe that
my
research on PGs is most relevant to learning in perception.
Over the years, I have built a few versions of such systems
that play the game in one dimensional text (see the pointers
below). There are many challenges: How do we avoid
combinatorial explosion (because such systems work with
explicit structures)? These systems make local decisions, eg
in determining which concepts to activate in a given episode,
or which concepts to join to make composite concepts, based on
noisy indirect information. There is much uncertainty:
uncertainty on top of uncertainty (reminiscent of a house of
cards!): can the system build a robust networked hierarchy of
concepts (without the errors in earlier decisions
compounding)? How do we design algorithms and objectives so
that learning can go on and not get stuck in poor local
optima? When or why should such systems succeed over time?
There are also issues of (code/engineering) complexity and of
how to control and understand the dynamics (of subsystems
interacting): for instance, how biased will such a system,
which learns sequentially, be? (What it learns now affects its
future choices and its future learning.) There are many problems and
subproblems to be discovered and defined here. Along the way,
problems of philosophical nature arise too, as the system is
building its own (biased) reality in a sense (so then, is
there 'one unique objective truth out there' or are we closer
to idealism?) (relevant philosophies: phenomenology of
perception, constructivism in epistemology, semiotics, ...). I
think this has made for a good long-term research
project!
Our (human) thoughts are built on concepts (and embodied!). A few
references:
- The Big Book of Concepts. Gregory L. Murphy, MIT press, 2002.
- Philosophy in the Flesh: The Embodied Mind and
its Challenge to Western Thought. George Lakoff and Mark Johnson, Basic Books, 1999.
- Surfaces and Essences: Analogy as the Fuel and Fire of Thinking. Douglas R. Hofstadter and Emmanuel Sander, Basic Books, 2013.
[More material (pointers and papers) to go here.]
My work on PGs began with the problems of learning under large
and growing output spaces (classes, concepts, or just items!),
and that line of work continues to advance (those ideas led to
PGs, and now PGs drive them!). So I am breaking the PG-related
papers into two lists below, in reverse chronological order,
with a brief explainer for each selected work:
On the PGs approach/systems:
- An Information Theoretic Score for Learning Hierarchical Concepts, 2023 (focuses on and further develops CORE, but describes the system as well). In Frontiers in Computational Neuroscience, special topic on Advances in Shannon-based Communications and Computations Approaches to Understanding Information Processing in the Brain.
- Expedition: A System for the Unsupervised Learning of a Hierarchy of Concepts, arXiv 2021 (revival of PGs after a ~10-year hiatus! Introduces a version of CORE = COherence + REality, a measure of gain in information, useful for concept use, ie interpretation/inference). We have a good candidate objective now! And PGs become much more probabilistic and information-theoretic.
- Systems Learning for Complex Pattern Problems, BICA 2008: We may need systems, composed of multiple parts (eg ML algorithms) interacting over long periods, for improving at perception (and, especially once we add control, because of feedback, understanding the development of such systems is an interesting challenge).
- Prediction Games in Infinitely Rich Worlds, 2007 (AAAI position paper; a longer technical report covers the basic idea/philosophy and various motivations/considerations/challenges).
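As a rough intuition for what a "gain in information" can mean here (this is only an MDL-flavored toy of my own, not the actual CORE score from the papers): an interpretation that maps an episode onto fewer, higher-level concepts can shorten the episode's description, even after accounting for a larger vocabulary.

```python
import math

def code_length_bits(symbols, vocab_size):
    # naive fixed-length code: log2(vocabulary size) bits per symbol
    return len(symbols) * math.log2(vocab_size)

raw = list("thecat")                  # 6 primitives over a 9-letter alphabet
interp = ["th", "e", "c", "a", "t"]   # same episode using one composite concept
# vocabulary grew to 11 (9 primitives plus the composites "th" and "he")
gain = code_length_bits(raw, 9) - code_length_bits(interp, 11)
# gain > 0: the interpretation saves bits despite the larger vocabulary
```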
Selected papers on the prediction sub-problem (online and open-ended, non-stationary, ...):
- Tracking Changing Probabilities via Dynamic Learners, arXiv 2024 (formalizes open-ended probability prediction and advances sparse EMA and counting techniques for the task; please see my page on Sparse Moving Averages).
- Efficient Online Learning and Prediction of Users' Desktop Actions, IJCAI 2009 (on non-stationarity, continual learning, or pure online learning, and personalization; uses sparse EMA).
- On Updates that Constrain the Features' Connections During Learning, ACM KDD 2008 (further focus on types of weight updates that keep the number of connections small; introduces sparse EMA).
- Learning When Concepts Abound, JMLR 2009 (online and open-ended; a pure weighted index now, no prototypes any more, for the prediction part).
- Recall Systems: Efficient Learning and Use of Category Indices, AISTATS 2007 (index learning: an index into concept prototypes; many-class learning).
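The sparse EMA referenced in several of these papers can be illustrated with a minimal toy version (the class name, defaults, and pruning rule here are my sketch; the papers develop more refined and efficient variants): keep an exponentially decayed probability estimate per observed item, and prune entries whose weight falls below a threshold, so the map stays small even over an open-ended item space.

```python
class SparseEMA:
    """Toy moving-average tracker of item probabilities over a growing item set."""

    def __init__(self, rate=0.1, prune_below=0.001):
        self.rate = rate          # decay/learning rate
        self.prune = prune_below  # weights falling below this are dropped
        self.w = {}               # item -> probability estimate (kept sparse)

    def update(self, observed):
        r = self.rate
        # decay every tracked weight, dropping those that become negligible
        self.w = {k: v * (1 - r) for k, v in self.w.items()
                  if v * (1 - r) >= self.prune or k == observed}
        # boost the observed item toward 1
        self.w[observed] = self.w.get(observed, 0.0) + r

    def prob(self, item):
        return self.w.get(item, 0.0)
```

This sketch only shows the decay-boost-prune cycle; a real implementation would avoid touching every tracked weight on each update (eg by keeping a shared scale factor) and would set or adapt the rate with more care, which matters under non-stationarity.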