Review of Multi-Armed Bandit Problems

Context

A one-armed bandit or slot machine.

A multi-armed bandit is a collection of slot machines or one-armed bandits. Each slot machine returns some random reward each time its lever or arm is pulled. Since slot machines provide different average payoffs, a gambler could increase his reward by choosing intelligently which arm to pull.

Many real-world problems can be modeled in this way. Whenever you are faced with the same options repeatedly, you would like to find the best one while minimizing the number of trials. For example, clinical trials, where a treatment must be selected for each patient, have a clear tradeoff between finding the best treatment and avoiding too many negative outcomes. This dilemma is also called: balancing exploration and exploitation.

In literature many different multi-armed bandit problems have been defined: some have real-valued payoffs, others have binary payoffs (+1 or -1). In some cases the gambler also observes the payoffs he would have received for the other options, this is known as the 'transparent' bandit problem. A good starting point is the article by Vermorel and Mohri (2005): "Multi-armed bandit algorithms and empirical evaluation".

Goal

  • Classify different multi-armed bandit problems.
  • Discuss existing optimal solutions or the currently best known algorithms for each class.
  • If possible, propose a new or improved algorithm.
  • Implement known techniques and evaluate by doing computer experiments.
  • Analyse and compare results.

Contact

Interested students should contact David Catteeuw (dcatteeu@vub.ac.be) for more information.

Promotor: Bernard Manderick

teaching/thesis_multiarmedbandit.txt · Last modified: 2011/04/26 10:57 by dcatteeu
Recent changes RSS feed Creative Commons License Donate Driven by DokuWiki