In a
nutshell, the what, why, and how behind Prediction Games (PGs):
- What? PGs research investigates systems that learn a
growing network of hierarchical concepts (patterns, such
as probabilistic finite-state machines) in an
unsupervised cumulative fashion.
- Why? (why learn higher-level concepts?) Beginning with a
low-level 'starter kit' (e.g., pixels or basic shapes in the
visual modality, or more generally some basic finite set of
patterns) and building and adapting one's own useful
higher-level combinations, depending on what one encounters in
one's life, is often more flexible and powerful than
preprogramming the higher levels: it is impossible to
anticipate what will be experienced.
- How? A few salient ingredients of the current solution approach
are: (1) improving at prediction is the major unifying goal,
and (1.1) concepts predict one another, serving as both
predictors and predictands in the system (an attractive
symmetry); (2) concepts are made up of one another (like Lego
pieces); (3) the system learns by repeatedly interpreting
(interpretation is imposing, or projecting, one's concepts
onto the lowest-level input); and (4) a systems
approach, meaning several learning/inference processes
working together (not a single algorithm!).
The techniques have become increasingly probabilistic and
information-theoretic in recent years. My goal is to show the
feasibility and utility of the approach. A source of power,
but also a challenge, is that such systems consume what
they produce (i.e., concepts). Below, I motivate
the PGs approach from a cognitive and developmental point of
view (under the constraints of computational and learning
efficiency).
Background Motivation. Human intelligence appears to require
many concepts (that work well enough together), for instance
to achieve everyday 'common-sense' behavior. There are two
distinct but interrelated problems here, in the ways I
have understood and approached the issues:
- [Time snapshot] How does one, efficiently and adequately,
figure out (mostly unconsciously) which of one's many concepts
are useful in a given situation (for instance, when looking at a picture)?
- [Historical/Developmental] Where do these many concepts
come from, in the first place? (and where are they going?!)
A few other ensuing or related questions include: What
is a concept? (answer: a recurring/reusable structure... at
least for the perceptual stages... see below!) And, for me, how
can machine learning (ML), and more broadly computational
thinking, help? (e.g., with the wealth of learning and inference
techniques and theories that continue to be developed)
Cognitive scientists tell us that most concepts (such as: water, chair,
a house, my house, mind, mad, ...) develop
over time, in a sequential and cumulative
manner. This development appears to be largely unsupervised,
i.e., with no explicit teacher. By the time one shows signs of
learning a language, many concepts must already have been
developed to some extent (if concepts are to be useful), as the child
already knows and can do a lot! In the mid-2000s, I proposed
Prediction Games to develop and study such
learning systems. In PGs, a system composed of
multiple learning and inference parts, given its input stream
(broken into episodes), plays the game of prediction
on it. The system tries to get better at the prediction task
over time: predicting more extensively, into the future or into
space, with possibly less (becoming a faster and more
powerful predictor). This has a plausible survival advantage! Thus,
prediction of one's world could be a unifying task: could it be
sufficient for providing the feedback needed to achieve the
conceptual complexity of humans (assuming learning is the main
vehicle of development)? But of course, one needs many more details.
In order to get better at prediction, the PGs system keeps
expanding its hierarchical, networked vocabulary of
concepts. In PGs, concepts are both the predictors and
the predictands (i.e., the targets of prediction). This
symmetry is a major draw of the approach for me.
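As a toy rendering of this symmetry (my own illustrative simplification, not the actual PGs machinery), the same concept set can index both the predictors and the predictands of a conditional prediction table, here estimated with simple bigram counts:

```python
# A toy rendering of the predictor/predictand symmetry: the same concept set
# indexes both sides of a conditional prediction table, estimated here with
# simple bigram counts (my simplification, not the PGs system itself).
from collections import Counter, defaultdict

counts = defaultdict(Counter)  # predictor concept -> Counter over predicted concepts

def observe(sequence):
    """Record each adjacent pair: `cur` acts as predictor, `nxt` as predictand."""
    for cur, nxt in zip(sequence, sequence[1:]):
        counts[cur][nxt] += 1

def predict(cur):
    """Return the estimated distribution over what follows `cur`."""
    c = counts[cur]
    total = sum(c.values())
    return {k: v / total for k, v in c.items()} if total else {}

observe(["the", "cat", "sat", "the", "cat", "ran"])
print(predict("the"))  # {'cat': 1.0}: 'cat' is a predictand here, and a predictor in turn
```

In PGs proper, of course, the things being predicted are themselves composite, learned structures rather than a fixed token set.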
Furthermore, concepts not only predict one another but are also
built out of one another,
akin to Lego pieces. This is the cumulative
(constructivist!) part of the approach. To
start the whole process of learning, the system is given
an initial set of primitive
(innate/hardwired) concepts (an alphabet, a finite
discrete set), with the capability to break its raw sensations
in an episode into those primitives. So each episode begins with
a sequence (or string) of primitives in the input buffer. In
order to predict better, the system separates and puts together
its buffer contents (segments) and maps the chunks onto its
currently most useful (highest-level) concepts. I have termed this
process (of structuring one's input with one's concepts) interpretation (prediction and
interpretation are intertwined). By practicing many
interpretations, over many episodes, the system figures out which
concepts (new and old) go together: which predict each other, and
which could perhaps be joined to make larger, more useful (higher-level)
concepts. Thus concepts correspond to hierarchical
structured patterns (such as finite-state machines), and PGs
involve a continual, self-supervised, cumulative online learning
process for acquiring more and more concepts and their evolving (prediction) relations.
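To make interpretation concrete, here is a deliberately minimal sketch (my own greedy simplification; the actual PGs systems score candidate interpretations, e.g., with information-theoretic measures, rather than matching greedily): cover the episode's buffer with the longest known concepts, falling back to primitives.

```python
# A minimal, greedy sketch of interpretation: segment a string of primitives
# into chunks, preferring the longest chunk present in the concept vocabulary.
# This is an illustrative simplification, not the actual PGs algorithm.
def interpret(buffer, vocabulary):
    """Segment `buffer` (a string of primitive symbols) into chunks, preferring
    the longest chunk found in `vocabulary`; single primitives always match."""
    out, i = [], 0
    while i < len(buffer):
        for j in range(len(buffer), i, -1):  # longest candidate first
            chunk = buffer[i:j]
            if chunk in vocabulary or len(chunk) == 1:
                out.append(chunk)
                i = j
                break
    return out

print(interpret("thecat", {"th", "the", "cat"}))  # ['the', 'cat']
```

Even this toy version shows why interpretation and prediction intertwine: which segmentation is "best" depends on which chunks the system's concepts can predict well.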
I currently believe that
my
research on PGs is most relevant to learning in perception.
Over the years, I have built a few versions of such systems
that play the game in one dimensional text (see the pointers
below). There are many challenges: How do we avoid
combinatorial explosion (because such systems work with
explicit structures)? These systems make local decisions, e.g.,
in determining which concepts to activate in a given episode,
or which concepts to join to make composite concepts, based on
noisy indirect information. There is much uncertainty, in
particular uncertainty on top of uncertainty (reminiscent of a
house of cards!): can the system build a robust networked
hierarchy of concepts (without the errors in earlier decisions
compounding)? How do we design algorithms and objectives so
that learning can go on and not get stuck in poor local
optima? When or why should such systems succeed over time?
There are also issues of (code/engineering) complexity and of how to
control and understand the dynamics (of subsystems interacting): for
instance, how biased will such a system, which is learning
sequentially, be? (what it learns now affects its future
choices and its future learning) There are many problems and
subproblems to be discovered and defined here. Along the way,
problems of a philosophical nature arise too, as the system is
building its own (biased) reality in a sense (so then, is
there 'one unique objective truth out there' or are we closer
to idealism?) (relevant philosophies: phenomenology of
perception, constructivism in epistemology, semiotics,
relevance realization, ...). I think this has made for a good
long-term research project!
Our (human) thoughts are built on concepts (and are embodied!). A few
references:
- The Big Book of Concepts. Gregory L. Murphy, MIT press, 2002.
- Philosophy in the Flesh: The Embodied Mind and
its Challenge to Western Thought. George Lakoff and Mark Johnson, Basic Books, 1999.
- Surfaces and Essences: Analogy as the Fuel and Fire of Thinking. Douglas R. Hofstadter and Emmanuel Sander, Basic Books, 2013.
[More material (pointer and papers) to go here. ]
PGs Publications. My work on PGs began with the problems of
learning under large and growing output spaces (classes,
concepts, or just items!), and that line of work continues to
advance (those ideas led to PGs, and now PGs drive them!). So I
am breaking the PG-related papers into two groups below, in
reverse chronological order, with a brief explainer for each
selected work:
On the PGs approach/systems:
- Towards Understanding and Developing Open-Ended
Intelligences for Infinite Worlds, BICA 2025.
Includes a short review of the latest ideas
(what concepts and interpretations are), a few related
philosophies (such as semiotics and constructivism),
a conceptual comparison with deep language
models, and possible applications of PGs.
- An Information Theoretic Score for Learning
Hierarchical Concepts, 2023 (focuses on and
further develops CORE, but describes the system as
well). In Frontiers in Computational Neuroscience,
special topic on Advances in Shannon-based
Communications and Computations Approaches to
Understanding Information Processing in the
Brain.
- Expedition: A System for the Unsupervised Learning
of a Hierarchy of Concepts, arXiv 2021
(a revival of PGs after a ~10-year hiatus! Introduces a version of
CORE = COherence + REality, a measure of gain in
information, useful for concept use, i.e.,
interpretation/inference). We have a good
candidate objective now! And PGs become much more
probabilistic and information-theoretic.
- Systems Learning for Complex Pattern Problems, BICA
2008: We may need systems, composed of
multiple parts (e.g., ML algorithms) interacting over
long periods, for improving at perception
(and, especially once we add control, because of
feedback, understanding the development of such
systems is an interesting
challenge).
- Prediction Games in Infinitely Rich Worlds, 2007
(AAAI position paper; a longer technical report covers
the basic idea/philosophy and various motivations,
considerations, and challenges).
Selected papers on the prediction sub-problem (online and open-ended, non-stationary, ...):
- Tracking Changing Probabilities via Dynamic Learners, arXiv
2024 (formalizes open-ended probability prediction and advances
sparse EMA and counting techniques for the task; please see my
page on Sparse Moving Averages).
- Efficient Online Learning and Prediction of Users' Desktop
Actions, IJCAI 2009 (on non-stationarity, continual
learning, or pure online learning, and personalization; uses sparse EMA).
- On Updates that Constrain the Features' Connections
During Learning, ACM KDD 2008 (further focus on
types of weight updates that keep the number of
connections small; introduces sparse EMA).
- Learning When Concepts Abound, JMLR 2009 (online and open-ended;
just a pure weighted index for the prediction part, no prototypes any more!).
- Recall Systems: Efficient Learning and Use of Category Indices,
AISTATS 2007 (index learning: an index into concept prototypes; many-class learning).
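Since sparse EMA recurs in several of the papers above, here is a rough, hypothetical sketch of the idea (the parameter names and the exact pruning rule are my own assumptions, not the papers' formulation): an exponential moving average over observed items that drops negligible weights, so the table stays small even as the item space grows.

```python
# A hypothetical sketch of a sparse EMA for tracking changing probabilities
# over an open-ended item space. Parameter names and the pruning rule are
# illustrative assumptions, not the papers' exact formulation.
class SparseEMA:
    def __init__(self, rate=0.05, min_weight=0.01):
        self.rate = rate              # how fast estimates track recent observations
        self.min_weight = min_weight  # entries below this are pruned (sparsity)
        self.w = {}                   # item -> weight (unnormalized probability)

    def update(self, item):
        # decay every kept weight toward 0, then boost the observed item
        for k in list(self.w):
            self.w[k] *= 1.0 - self.rate
            if self.w[k] < self.min_weight and k != item:
                del self.w[k]
        self.w[item] = self.w.get(item, 0.0) + self.rate

    def prob(self, item):
        total = sum(self.w.values()) or 1.0
        return self.w.get(item, 0.0) / total

ema = SparseEMA()
for sym in "aababa":   # 'a' is more frequent, so its estimate should dominate
    ema.update(sym)
```

The appeal for open-ended prediction is that recently useful items keep sizable weights while stale ones are forgotten and dropped, giving both non-stationarity tracking and bounded memory.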