Heuristic Policies for Reinforcement Learning
Context
Reinforcement Learning (RL) allows an agent to learn optimal behaviour through trial-and-error interactions with its environment. By repeatedly evaluating possible actions in different situations, the agent discovers the consequences of its actions and learns to select the best action in each situation. Normally, RL assumes no prior information, and the agent's environment is considered completely unknown. However, for specific learning problems, expert human knowledge may be available. This knowledge can then be used to speed up learning.
Goals
The goal of this thesis is to investigate and evaluate methods of incorporating expert knowledge into the RL process. The idea is that some simple rules (a heuristic policy) are available, which the agent can use to select actions in a specific learning problem. Several possibilities exist for combining this policy with the RL system. The heuristic policy can be used to initialize the RL algorithm and bias it towards the actions indicated by the policy. Alternatively, the policy can be used to alter the rewards received by the agent or to guide its exploration. A more complex possibility is to treat the information given by the heuristic policy as additional environment features and use these features during the RL process. The thesis will develop some of these methods and evaluate them on both the speed of learning and the quality of the obtained solutions.
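To make the first three options concrete, the following is a minimal sketch in Python, not a prescribed method for the thesis. It assumes tabular Q-learning on a toy corridor environment, and every name in it (heuristic_policy, bias, p_heuristic, shaping) is hypothetical and chosen only for illustration. The heuristic here is the trivial rule "always move right"; the reward alteration is a naive bonus, which in general can change which policy is optimal.

```python
import random

N_STATES = 6      # corridor cells 0..5; cell 5 is the goal (toy assumption)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def heuristic_policy(state):
    """Hypothetical expert rule for this toy problem: always move right."""
    return 1


def step(state, action):
    """Deterministic corridor dynamics; reward 1.0 on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def init_q(bias):
    """Option 1: initialise Q so that heuristic actions start with a bonus."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for s in range(N_STATES):
        q[s][heuristic_policy(s)] += bias
    return q


def select_action(q, state, epsilon, p_heuristic, rng):
    """Option 3: epsilon-greedy whose exploratory moves follow the
    heuristic with probability p_heuristic, else are uniformly random."""
    if rng.random() < epsilon:
        if rng.random() < p_heuristic:
            return heuristic_policy(state)
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1,
               bias=0.0, p_heuristic=0.0, shaping=0.0, seed=0):
    """Tabular Q-learning; returns the Q-table and total steps taken."""
    rng = random.Random(seed)
    q = init_q(bias)
    total_steps = 0
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # safety cap on episode length
            action = select_action(q, state, epsilon, p_heuristic, rng)
            nxt, reward, done = step(state, action)
            if action == heuristic_policy(state):
                reward += shaping  # Option 2: naive reward alteration
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            total_steps += 1
            if done:
                break
    return q, total_steps


def greedy_policy(q):
    """Greedy action in every non-goal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    _, plain_steps = q_learning()
    _, guided_steps = q_learning(bias=0.5, p_heuristic=1.0)
    print("steps without heuristic:", plain_steps)
    print("steps with heuristic:   ", guided_steps)
```

Total steps over a fixed number of episodes serves here as a crude proxy for speed of learning, and the greedy policy extracted from the final Q-table as a proxy for solution quality. The fourth option, using the heuristic's output as an additional state feature, needs a function approximator or an extended state space and is not covered by this sketch.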
Contacts
- Ann Nowe (promotor): ann.nowe@vub.ac.be

