
Prediction Games (PGs)

(in Infinitely Rich Worlds!)

[under construction] Human intelligence appears to require many concepts (that work well enough together), for instance to achieve daily 'common-sensical' behavior. There are two distinct but interrelated problems here, in the ways I have understood and approached the issues:

  1. [Time snapshot] How does one, efficiently and adequately, figure out (mostly unconsciously) which of one's many concepts are useful in a given situation (for instance, when looking at a picture)?[1]
  2. [Historical/Developmental] Where do these many concepts come from, in the first place? (and where are they going?!)
A few other ensuing or related questions include: what is a concept? (answer: a recurring/reusable structure... at least for the perceptual stages... see below!) and, for me, how can machine learning (ML), and more broadly computational thinking, help? (e.g., with the wealth of learning and inference techniques and theories that continue to be developed)

Cognitive scientists tell us that most concepts (such as: water, chair, a house, my house, mind, mad, ...) develop over time, in a sequential and cumulative manner. This development appears to be largely unsupervised, i.e., with no explicit teacher. By the time one shows signs of learning a language, if concepts are to be useful, many must already have been developed to some extent, as the child already knows and can do a lot![2] In the mid 2000s, I proposed Prediction Games (PGs) to develop and study such learning systems.[3]

In PGs, a system composed of multiple learning and inferencing parts plays the game of prediction on its input stream (broken into episodes). The system tries to get better at the prediction task over time: predicting more extensively, into the future or into space, possibly from less input (becoming a faster and more powerful predictor). This has plausible survival advantage! Thus prediction of one's world could be a unifying task: could it be sufficient for providing the feedback needed to achieve the conceptual complexity of humans (assuming learning is the main vehicle of development)? But of course, one needs (much) more detail.

In order to get better at prediction, the PGs system keeps expanding its hierarchical networked vocabulary of concepts. In PGs, concepts are both the predictors and the predictands (i.e., the targets of prediction). This symmetry is a major draw of the approach for me.[4] Furthermore, concepts not only predict one another but are also built out of one another, akin to Lego pieces. This is the cumulative (constructivist!) part of the approach.

To start the whole process of learning, the system is given an initial set of primitive (innate/hardwired) concepts (an alphabet, a finite discrete set), along with the capability to break its raw sensations in an episode into those primitives. So each episode begins with a sequence (or string) of primitives in the input buffer. In order to predict better, the system separates and puts together its buffer contents (segments) and maps the chunks onto its currently most useful (highest-level) concepts. I have termed this process (of structuring one's input with one's concepts) interpretation (prediction and interpretation are intertwined). By practicing many interpretations, over many episodes, the system figures out which concepts (new and old) go together: which predict each other, and which could perhaps be joined to make larger, more useful (higher-level) concepts. Thus concepts correspond to hierarchical structured patterns (such as finite state machines), and PGs involve a continual self-supervised, cumulative, online learning process for acquiring more and more concepts and their evolving (prediction) relations.
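
To make the loop above concrete, here is a minimal toy sketch in Python, under assumptions that are not in the text: concepts are flat strings over the primitive alphabet, interpretation is a greedy longest-match segmentation, and two concepts are joined once they have co-occurred often enough. The names (PGSystem, play_episode, join_threshold) are hypothetical, and real versions of these systems (see the papers below) are far more careful about each of these choices.

    from collections import Counter

    class PGSystem:
        def __init__(self, primitives):
            # The innate alphabet: every primitive is itself a (level-0) concept.
            self.vocab = set(primitives)
            self.pair_counts = Counter()  # how often one concept is followed by another

        def interpret(self, buffer):
            # Map the episode's primitive string onto a sequence of concepts by
            # greedily matching the longest known concept at each position.
            concepts, i = [], 0
            while i < len(buffer):
                match = max((c for c in self.vocab if buffer.startswith(c, i)), key=len)
                concepts.append(match)
                i += len(match)
            return concepts

        def play_episode(self, buffer, join_threshold=5):
            # One round of the game: interpret the buffer, record which concepts
            # follow (predict) which, and join frequent pairs into new composite
            # (higher-level) concepts.
            concepts = self.interpret(buffer)
            for a, b in zip(concepts, concepts[1:]):
                self.pair_counts[(a, b)] += 1
                if self.pair_counts[(a, b)] == join_threshold:
                    self.vocab.add(a + b)  # a new concept built out of old ones
            return concepts

    # Usage: primitives are characters, episodes are snippets of text.
    pg = PGSystem("abcdefghijklmnopqrstuvwxyz ")
    for _ in range(20):
        pg.play_episode("the cat sat on the mat")
    print(pg.interpret("the cat sat on the mat"))

Even this toy version illustrates the cumulative dynamic: composites formed in early episodes become the units that later episodes chunk and join further.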

I currently believe that my research on PGs is most relevant to learning in perception. Over the years, I have built a few versions of such systems that play the game on one-dimensional text (see the pointers below). There are many challenges:

  • Combinatorial explosion: how do we avoid it, given that such systems work with explicit structures?
  • Uncertainty on top of uncertainty (reminiscent of a house of cards!): these systems make local decisions, e.g., in determining which concepts to activate in a given episode, or which concepts to join to make composite concepts, based on noisy indirect information. Can the system build a robust networked hierarchy of concepts, without the errors in earlier decisions compounding? (One simple guard for the join decision is sketched after this list.)
  • Objectives and learning dynamics: how do we design algorithms and objectives so that learning can go on and not get stuck in poor local optima? When, or why, should such systems succeed over time?
  • Issues of (code/engineering) complexity, and of controlling and understanding the dynamics of interacting subsystems: for instance, how biased will such a system, which is learning sequentially, be? (What it learns now affects its future choices and its future learning.)

There are many problems and subproblems to be discovered and defined here. Along the way, problems of a philosophical nature arise too, as the system is in a sense building its own (biased) reality (so then, is there 'one unique objective truth out there', or are we closer to idealism?) (relevant philosophies: phenomenology of perception, constructivism in epistemology, semiotics, ...). I think this has made for a good long-term research project!
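
As one concrete illustration of hedging these local choices (a generic guard, not the actual rule used in my systems), a join decision can require both a minimum amount of evidence and a sufficiently surprising co-occurrence rate, so that rare noisy pairings do not become permanent concepts:

    import math

    def should_join(a, b, pair_counts, unigram_counts,
                    total_pairs, total_unigrams,
                    min_count=10, min_pmi=1.0):
        # pair_counts and unigram_counts are Counter-like maps over observed
        # adjacent concept pairs and individual concept occurrences.
        n_ab = pair_counts[(a, b)]
        if n_ab < min_count:   # too little evidence so far: defer the decision
            return False
        p_ab = n_ab / total_pairs
        p_a = unigram_counts[a] / total_unigrams
        p_b = unigram_counts[b] / total_unigrams
        # Join only pairs that co-occur clearly more often than chance
        # (pointwise mutual information above a threshold).
        return math.log2(p_ab / (p_a * p_b)) >= min_pmi

Deferring decisions until evidence accumulates trades learning speed for robustness; where to set that trade-off is itself one of the open questions above.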

Our (human) thoughts are built on concepts (and embodied!). A few references:

[More material (pointers and papers) to go here.]

My work on PGs began with the problems of learning under large and growing output spaces (classes, concepts, or just items!), and that line of work continues to advance (those ideas led to PGs, and now PGs drive them!). So I am breaking the PG-related papers into two groups below, in reverse chronological order, plus a brief explainer for each selected work:

On the PGs approach/systems:

  • An Information Theoretic Score for Learning Hierarchical Concepts, 2023 (focuses on and further develops CORE, but describes the system as well). In Frontiers in Computational Neuroscience, special topic on Advances in Shannon-based Communications and Computations Approaches to Understanding Information Processing in the Brain.
  • Expedition: A System for the Unsupervised Learning of a Hierarchy of Concepts, arXiv 2021 (revival of PGs after a ~10 year hiatus! Introduces a version of CORE = COherence + REality, a measure of gain in information, useful for concept use, i.e., interpretation/inference; see the illustrative sketch after this list). We have a good candidate objective now! And PGs become much more probabilistic and information theoretic.
  • Systems Learning for Complex Pattern Problems, BICA 2008: We may need systems, composed of multiple parts (e.g., ML algorithms) interacting over long periods, for improving at perception (and, especially once we add control, because of feedback, understanding the development of such systems is an interesting challenge).
  • Prediction Games in Infinitely Rich Worlds, 2007 (AAAI position paper, and a longer technical report on the basic idea/philosophy and various motivations/considerations/challenges).
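
To give a flavor of what a score based on "gain in information" can look like (an MDL-style illustration of the general idea as described above; see the 2023 and 2021 papers for the actual CORE definition), one can ask how many bits an interpretation saves over spelling the buffer out in primitives:

    import math

    def code_length_bits(seq, counts, total):
        # Shannon code length (in bits) of a sequence under empirical
        # unigram frequencies: sum of -log2 p(x) over the sequence.
        return sum(-math.log2(counts[x] / total) for x in seq)

    def interpretation_gain(primitive_seq, prim_counts, prim_total,
                            concept_seq, concept_counts, concept_total):
        # Bits saved by describing the episode's buffer as a (shorter)
        # sequence of higher-level concepts rather than as raw primitives.
        return (code_length_bits(primitive_seq, prim_counts, prim_total)
                - code_length_bits(concept_seq, concept_counts, concept_total))

An interpretation that chunks the buffer into a few frequent high-level concepts can achieve a positive gain; one that merely renames primitives saves nothing.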

Selected papers on the prediction sub-problem (online and open-ended, non-stationary, ...):


    [1] Much of this may apply to other organisms too. An episode can be seeing an image or hearing an utterance. The word 'useful' is probably better than other choices such as 'present' (I think I also prefer it over 'relevant').
    [2] This includes more or less figuring out, or having developed the appropriate biases for, the very challenging task of determining what the noisy uttered words of a parent, heard in speech, could refer to! (Reference also implies a shared conceptual space, or at least some good intersection of conceptual spaces.)
    [3] This came after a few years of gaining experience in ML, and of increasingly thinking about how we can make learning systems less 'needy' (for instance, avoiding expensive manual labeling). While at Yahoo!, through the work of colleagues, I also saw how large amounts of (unlabeled) data could nevertheless help provide the needed feedback.
    [4] In most ML work, features/predictors and the targets of prediction are distinct.
