This is not an exhaustive list, and the activities below are not
mutually exclusive: for example, in applying SMAs, new problem
variations and algorithmic ideas could present themselves.
Algorithmic understanding and algorithmic variations:
- How fast does the variance improve as the number of queue cells in
the Qs SMA is increased (in the stationary setting)?
- If one could have different queue capacities for different items,
how should the capacities be allocated? Should queues for more
frequent items have higher capacities? Does differential queue
capacity make a significant difference?
- Are there reduction schedules better than harmonic decay of the
learning rate for EMA?
- Are there sufficiently simple pure learning-rate-based approaches
that rival DYAL? Pure counting-based approaches? Other approaches?
(Is a hybrid a necessity?)
- How can DYAL be simplified (for instance, by improving or
simplifying the change-detection test, i.e. the test for switching to
the queue)?
- Could there be gains if items (entries) were not updated largely
independently of one another?
- Improve the analyses: e.g., the convergence of EMA (tighten the gap
in the theorem).
- Feed-forward neural networks, trained with backpropagation, can
output probabilities, but often multiple passes over the data are
required, and an extra calibration step is needed for good
probabilities. Typical NNs are also not open-ended: the set of
classes must be fixed. For a fixed set of classes, e.g. 10, how do
such techniques fare when the distribution changes along the stream?
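The variance question for the Qs SMA above can be probed empirically. The sketch below is a Monte-Carlo simulation under an assumed simplification of the queue-based estimator (not necessarily the paper's exact algorithm): each item keeps the timestamps of its last `capacity` occurrences and estimates its rate as `capacity` divided by the time elapsed since the oldest queued occurrence. The function name and parameters are illustrative.

```python
import random

def qs_estimate_variance(p, capacity, trials=500, stream_len=5000):
    # Monte-Carlo variance of a queue-based rate estimator on a
    # stationary Bernoulli(p) stream. Assumed estimator (a
    # simplification of the Qs SMA): keep the timestamps of the
    # last `capacity` occurrences of the item, and estimate p as
    #   capacity / (now - oldest queued timestamp).
    estimates = []
    for _ in range(trials):
        # Occurrence times of the item along one simulated stream.
        times = [t for t in range(stream_len) if random.random() < p]
        if len(times) < capacity:
            continue  # queue never filled; skip this run
        oldest = times[-capacity]  # oldest timestamp still in the queue
        estimates.append(capacity / (stream_len - oldest))
    mean = sum(estimates) / len(estimates)
    return sum((e - mean) ** 2 for e in estimates) / len(estimates)
```

Plotting the returned variance against `capacity` (e.g. 2, 4, 8, 16) for a fixed p gives an empirical view of how quickly variance shrinks with queue size in the stationary setting.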
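On the learning-rate-schedule question: harmonic decay, a_t = 1/t, is the schedule under which the EMA update reduces exactly to the plain running mean, which converges in the stationary setting but adapts slowly after a change; a constant rate instead discounts old observations geometrically. A minimal sketch (function names are illustrative):

```python
def ema(stream, rate_schedule):
    # Generic EMA: est <- est + a_t * (x_t - est),
    # equivalently est <- (1 - a_t) * est + a_t * x_t.
    est = 0.0
    for t, x in enumerate(stream, start=1):
        est += rate_schedule(t) * (x - est)
    return est

stream = [1.0, 0.0, 0.0, 1.0]
# Harmonic decay a_t = 1/t makes the EMA equal the running mean
# (0.5 here); a constant rate weights recent observations more.
harmonic = ema(stream, lambda t: 1.0 / t)
constant = ema(stream, lambda t: 0.3)
```

Candidate alternative schedules (e.g. polynomial decay 1/t^0.6, or schedules that restart after a detected change) can be dropped into `rate_schedule` unchanged, which makes this a convenient harness for comparing them.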
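For the pure-counting end of that question, one natural baseline (a hypothetical sketch, not an algorithm from the paper) estimates each item's probability by its frequency in a sliding window over the stream:

```python
from collections import Counter, deque

class WindowCounter:
    # Pure counting baseline: estimate an item's probability as its
    # frequency among the last `w` stream elements.
    def __init__(self, w):
        self.w = w
        self.window = deque()
        self.counts = Counter()

    def update(self, item):
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.w:  # evict the oldest element
            self.counts[self.window.popleft()] -= 1

    def estimate(self, item):
        return self.counts[item] / max(1, len(self.window))
```

Note the memory cost is O(w) for the whole stream rather than a small constant per item, which is one axis along which such counting baselines can be compared against SMA-style estimators.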
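For the last question, the single-pass end of the neural-network spectrum can be illustrated in miniature with a bias-only softmax trained online by SGD on the cross-entropy loss; this is a sketch for intuition about fixed-class, learning-rate-based probability tracking (all names are illustrative), not a claim about full networks:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def online_prior_estimates(stream, n_classes, lr=0.05):
    # Single-pass, bias-only softmax trained with SGD on
    # cross-entropy: after observing class label y, update
    #   b_c += lr * ([c == y] - p_c),  p = softmax(b).
    # The softmax of b then tracks the class priors, and the
    # constant learning rate lets it adapt under drift.
    b = [0.0] * n_classes
    for y in stream:
        p = softmax(b)
        for c in range(n_classes):
            b[c] += lr * ((1.0 if c == y else 0.0) - p[c])
    return softmax(b)
```

Changing the generating proportions mid-stream and re-reading the estimates shows the constant-rate tracking behavior, which is the same learning-rate versus counting trade-off the earlier questions raise, now for a fixed class set.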