
Flatland Agents

towards agent-friendly learning in open worlds

The Flatland project is joint work with Tom Dean, Brian Burns, and Reza Eghbali.1 We explore different algorithms (learning and inference) and architectures for autonomous agents exhibiting flexible behavior in simplified yet challenging virtual environments. The tasks, involving navigation and foraging, can be varied and can change over time (due to changes in the agent and/or the environment). In particular, we strive to develop and understand agent-friendly learning: learning that is adequately fast (it occurs within the lifetime of the agent!), preferably cumulative (new skills are built on top of, or complementary to, capabilities already acquired, so agent experience is sequential and non-IID), mostly unsupervised, and robust to various uncertainties.

Performance of different strategies under change. In a first foray, we asked how well several navigation strategies (and their combinations), a few with some learning/remembering but mostly hardwired, perform under various changes and uncertainties. Learning and memory require some stability in the environment to be useful in making future decisions, and any strategy that relies on them (remembering and learning in order to predict) eventually fails under too much non-stationarity.2
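One way to quantify the tension (a back-of-the-envelope illustration of ours, not an analysis from the study): if a remembered fact is independently invalidated with probability p on each day, it is still valid k days later with probability (1 - p)^k, so its expected useful lifetime is roughly 1/p days.

    # Illustrative only: shelf life of a memory when the environment
    # invalidates a remembered fact with probability p each day.
    def still_valid(p: float, k: int) -> float:
        return (1.0 - p) ** k

    for p in (0.01, 0.1, 0.5):
        print(f"p={p}: still valid after 5 days with prob {still_valid(p, 5):.2f}")
    # p=0.01 -> 0.95, p=0.1 -> 0.59, p=0.5 -> 0.03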

In a simple grid-world environment (developed using pygame), our agent needs to find its way to food every day in the presence of barriers. It does not initially know where the food or barriers are, but it can remember their locations once encountered. However, the barrier and food locations can change from day to day. We introduced knobs on the rate of location change and on the agent's location-estimation noise/uncertainty, as well as on food/goal distance (grid size) and barrier (wall) density.

We looked at a range of strategies, from simple to complex (assuming different agent capabilities). In one simple greedy strategy (using no memory), we assume the direction that lowers the distance to food is always available to the agent (imagine the agent has a very powerful, perfect sense of smell!). We wondered whether strategies based on remembering barrier locations (more generally, acquiring knowledge that, while incomplete and imperfect, remains potentially useful) could beat simple but powerful (smell-based) greedy variants, and, if so, in what range of conditions.

We find that as long as the location-estimation error is reasonably low, and with appropriate learning (e.g., robust to non-stationarity, including two-tiered updates) and architecture (appropriately combining different strategies), the benefits of remembering and planning can indeed be substantial in reducing steps to goal (especially as we increase task difficulty along the dimensions of barrier density and distance to food). Adding even a bit of memory could substantially enhance performance over no memory! We also tried a model-free RL technique based on training neural networks (highly successful at fully observable Atari games), but it required too much training and suffered under change. And our agent had limited perception: increasing location error quickly erased the benefits of remembering and planning.
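To make the setup concrete, here is a minimal, self-contained sketch (in Python) of the kind of experiment described above. It is our reconstruction for illustration, not the project's actual code: the knob values, the smell-greedy rule, the BFS planner, and the particular fast/slow ("two-tiered") belief update are all assumptions.

    import random
    from collections import deque

    rng = random.Random(0)

    # Knobs from the text (values here are illustrative guesses):
    GRID = 15           # grid size ~ distance from start to food
    WALL_DENSITY = 0.2  # barrier density
    P_CHANGE = 0.05     # per-day chance the layout is redrawn
    LOC_NOISE = 0.0     # std dev (in cells) of the agent's location estimate
    LEARN = 0.5         # fast tier: belief jump toward 1 on hitting a wall
    DECAY = 0.9         # slow tier: per-day decay so stale memories fade
    EPS = 0.1           # random step so the greedy agent can escape dead ends

    START, FOOD = (0, 0), (GRID - 1, GRID - 1)

    def new_walls():
        cells = {(x, y) for x in range(GRID) for y in range(GRID)
                 if rng.random() < WALL_DENSITY}
        cells.discard(START)
        cells.discard(FOOD)
        return cells

    def neighbors(cell):
        x, y = cell
        return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx < GRID and 0 <= y + dy < GRID]

    def noisy(cell):
        # Location-estimation noise: memories may be filed at the wrong cell.
        return (cell[0] + round(rng.gauss(0, LOC_NOISE)),
                cell[1] + round(rng.gauss(0, LOC_NOISE)))

    def bfs_path(start, goal, blocked):
        # Plan a shortest path avoiding cells believed to be barriers.
        frontier, parent = deque([start]), {start: None}
        while frontier:
            cur = frontier.popleft()
            if cur == goal:
                path = []
                while cur != start:
                    path.append(cur)
                    cur = parent[cur]
                return path[::-1]
            for nxt in neighbors(cur):
                if nxt not in parent and nxt not in blocked:
                    parent[nxt] = cur
                    frontier.append(nxt)
        return None  # believed map says the food is unreachable

    def run_day(walls, belief, use_memory):
        # One foraging day; returns steps to reach food (capped, e.g. if a
        # random layout happens to wall off the food entirely).
        pos, steps, cap = START, 0, 4 * GRID * GRID
        while pos != FOOD and steps < cap:
            if use_memory:
                blocked = {c for c, b in belief.items() if b >= 0.5}
                path = bfs_path(pos, FOOD, blocked)
                nxt = path[0] if path else rng.choice(neighbors(pos))
            elif rng.random() < EPS:
                nxt = rng.choice(neighbors(pos))
            else:
                # Smell-greedy baseline: step toward food by Manhattan distance.
                nxt = min(neighbors(pos),
                          key=lambda c: abs(c[0] - FOOD[0]) + abs(c[1] - FOOD[1]))
            steps += 1
            if nxt in walls:  # bump: stay put, optionally remember the barrier
                if use_memory:
                    k = noisy(nxt)
                    belief[k] = belief.get(k, 0.0) + LEARN * (1.0 - belief.get(k, 0.0))
            else:
                pos = nxt
        return steps

    def simulate(days=200, use_memory=True):
        rng.seed(0)  # same sequence of worlds for both conditions
        walls, belief, total = new_walls(), {}, 0
        for _ in range(days):
            if rng.random() < P_CHANGE:  # non-stationarity knob
                walls = new_walls()
            for c in belief:             # slow tier: forget gradually
                belief[c] *= DECAY
            total += run_day(walls, belief, use_memory)
        return total / days

    print("avg steps/day, greedy smell only:", simulate(use_memory=False))
    print("avg steps/day, memory + planning:", simulate(use_memory=True))

Raising P_CHANGE or LOC_NOISE in this sketch should reproduce the qualitative story above: once memories go stale or get misfiled faster than they can pay off, the planner's edge over the greedy baseline shrinks and eventually disappears.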

We hope to extend the work by relaxing some of the (hardwired) assumptions, such as: how does an agent know what to remember? We want to add more powerful learning, in particular under the theme of efficiently acquiring reusable (structured) patterns that enable flexibility under change (e.g., see Prediction Games).


Footnotes:

1. We arrived at this specific research topic after 2+ years of readings and discussions, covering topics such as the hippocampus and the prefrontal cortex, the various aspects and uses of memory, learning, decision making, and so on!
2. However, note that "learning" implies some change too: if there were no change, the agent's designer could hardwire everything that would be needed. We seek to develop and understand learning systems that can keep up with significant levels of change (in the task, the environment, or the agent itself).