The Reinforcement Learning problem : evaluative feedback, non-associative
learning, rewards and returns, Markov Decision Processes, value functions, optimality and approximation.
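A minimal sketch of evaluative feedback in a non-associative setting follows. An epsilon-greedy agent on a k-armed bandit observes only the reward of the arm it pulled and keeps incremental sample-average estimates. The arm means, the Gaussian reward noise, and the function name run_bandit are illustrative assumptions, not part of the course material.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy k-armed bandit with incremental sample-average estimates."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # estimated value of each arm
    n = [0] * k    # how many times each arm has been pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: current best estimate
        # Evaluative feedback: only the chosen arm's (noisy) reward is observed.
        reward = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]             # Q_{n+1} = Q_n + (R - Q_n) / n
    return q, n


# Hypothetical 3-armed testbed; arm 2 has the highest mean reward.
estimates, counts = run_bandit([0.2, 0.5, 0.9])
```

The incremental update avoids storing past rewards; with a fixed epsilon, every arm keeps being sampled, so estimates for all arms continue to improve.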
Dynamic programming : value iteration, policy iteration, asynchronous
DP, generalized policy iteration.
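Value iteration can be sketched in a few lines. This assumes a small tabular MDP given as a dictionary P[s][a] of (probability, next_state, reward) triples. The helper names and the four-state corridor used as an example are hypothetical. This variant updates V in place during each sweep (Gauss-Seidel style) and extracts a greedy policy at the end.

```python
def value_iteration(P, gamma=1.0, theta=1e-10):
    """Value iteration for a tabular MDP.

    P[s][a] is a list of (prob, next_state, reward) triples; a state whose
    action dict is empty is treated as terminal with value 0.
    """
    def backup(V, outcomes):
        # Expected one-step return for a single action.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)

    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            if not actions:
                continue  # terminal state keeps value 0
            best = max(backup(V, outcomes) for outcomes in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update within the sweep
        if delta < theta:
            break
    # Extract a greedy policy from the converged values.
    policy = {s: max(actions, key=lambda a: backup(V, actions[a]))
              for s, actions in P.items() if actions}
    return V, policy


def corridor_mdp(n=4):
    """Hypothetical corridor: states 0..n-1, state n-1 terminal, reward -1 per step."""
    P = {}
    for s in range(n):
        if s == n - 1:
            P[s] = {}
        else:
            P[s] = {"left": [(1.0, max(s - 1, 0), -1.0)],
                    "right": [(1.0, s + 1, -1.0)]}
    return P


V, policy = value_iteration(corridor_mdp(4))
```

On this corridor with gamma = 1, the values settle at -3, -2, -1 and 0, and the greedy policy moves right in every non-terminal state.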
Monte Carlo methods : policy evaluation, rollouts, on-policy and off-policy
learning, importance sampling.
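Off-policy Monte Carlo prediction with importance sampling can be illustrated as follows. Returns generated by a behaviour policy b are reweighted by the product of ratios pi(a|s)/b(a|s) to estimate the start-state value under a target policy pi. Both the ordinary and the weighted estimators are computed. The one-step task, the two policies, and the function name is_estimates are assumptions made for illustration.

```python
import random


def is_estimates(episodes, pi, b, gamma=1.0):
    """Ordinary and weighted importance-sampling estimates of v_pi(start state).

    episodes: lists of (state, action, reward) generated by following b.
    pi(a, s), b(a, s): action probabilities under the target and behaviour policies.
    """
    ratios, scaled_returns = [], []
    for episode in episodes:
        rho, G, discount = 1.0, 0.0, 1.0
        for s, a, r in episode:
            rho *= pi(a, s) / b(a, s)  # importance-sampling ratio over the episode
            G += discount * r          # discounted return from the start state
            discount *= gamma
        ratios.append(rho)
        scaled_returns.append(rho * G)
    ordinary = sum(scaled_returns) / len(episodes)
    total = sum(ratios)
    weighted = sum(scaled_returns) / total if total > 0 else 0.0
    return ordinary, weighted


# Hypothetical one-step task: action 0 pays 1, action 1 pays 0.
def target(a, s):
    return 0.9 if a == 0 else 0.1  # target policy favours action 0


def behaviour(a, s):
    return 0.5  # uniform behaviour policy


rng = random.Random(0)
episodes = []
for _ in range(20000):
    a = 0 if rng.random() < 0.5 else 1
    episodes.append([(0, a, 1.0 if a == 0 else 0.0)])
ordinary, weighted = is_estimates(episodes, target, behaviour)
```

The true value under the target policy is 0.9 here. The ordinary estimator is unbiased but can have high variance. The weighted estimator is biased for finite samples but usually has lower variance.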
Temporal Difference learning : TD prediction, optimality of TD(0), SARSA, Q-learning, R-learning, games and afterstates.
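A minimal sketch of tabular Q-learning follows. It runs on a hypothetical deterministic corridor (start at state 0, terminal state at the right end, reward -1 per step) with an epsilon-greedy behaviour policy. The environment, step size, and function name q_learning are illustrative assumptions. The off-policy character shows up in the target, which bootstraps from the maximum action value rather than from the action actually taken next (as SARSA would).

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical deterministic corridor.

    Start in state 0; state n_states - 1 is terminal. Action 0 moves left
    (bounded at 0), action 1 moves right, and every step costs -1.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy behaviour policy (ties go to action 0).
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1
            s2 = max(s - 1, 0) if a == 0 else s + 1
            r = -1.0
            # Off-policy target: bootstrap from the greedy (max) action value.
            target = r if s2 == goal else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q


Q = q_learning()
greedy = [0 if Q[s][0] >= Q[s][1] else 1 for s in range(4)]
```

After training, the greedy policy moves right everywhere, and Q[0][1] approaches the optimal value of -4 (four steps to the goal).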
Eligibility traces : n-step TD prediction, TD(λ), forward and
backward views, Q(λ), SARSA(λ), replacing traces and
accumulating traces.
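The backward view of TD(λ) for prediction can be sketched with one trace per state. On each step, the TD error is broadcast to all states in proportion to their traces, and the traces then decay by gamma * lambda. The episode encoding, the chain used as an example, and the function name td_lambda are assumptions. The replacing flag switches between replacing and accumulating traces.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.8, replacing=False):
    """Backward-view TD(lambda) prediction for a tabular value function.

    episodes: iterable of trajectories, each a list of (state, reward, next_state)
    transitions, with next_state None on the terminating transition.
    """
    V = [0.0] * n_states
    for episode in episodes:
        e = [0.0] * n_states  # eligibility traces, reset at the start of each episode
        for s, r, s2 in episode:
            v_next = 0.0 if s2 is None else V[s2]
            delta = r + gamma * v_next - V[s]  # one-step TD error
            if replacing:
                e[s] = 1.0   # replacing trace
            else:
                e[s] += 1.0  # accumulating trace
            for i in range(n_states):
                V[i] += alpha * delta * e[i]  # update each state in proportion to its trace
                e[i] *= gamma * lam           # decay all traces by gamma * lambda
    return V


# Hypothetical deterministic chain 0 -> 1 -> 2 -> terminal, reward 1 on the last step.
chain = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
V_acc = td_lambda([chain] * 300, n_states=3)
V_rep = td_lambda([chain] * 300, n_states=3, replacing=True)
```

Every state's true value on this chain is 1, and both variants converge to it. Because no state is revisited within an episode here, the two trace types happen to behave identically. They differ only when a state recurs before its trace has decayed.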
Function Approximation : value prediction, gradient-descent methods,
linear function approximation, ANN-based function approximation, lazy
learning, instability issues.
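Gradient-descent value prediction with linear function approximation can be sketched as semi-gradient TD(0). The value is v(s, w) = w . phi(s), and the update uses phi(s) as the gradient while treating the bootstrapped target as a constant. The feature map phi(s) = [1, s], the chain, and the function name linear_td0 are assumptions for illustration. This is the on-policy linear case, which is well behaved, whereas combining bootstrapping with off-policy updates and function approximation is where instability can arise.

```python
def linear_td0(episodes, features, n_features, alpha=0.05, gamma=1.0):
    """Semi-gradient TD(0) with a linear value function v(s, w) = w . phi(s)."""
    w = [0.0] * n_features

    def v(s):
        # Terminal (None) states have value 0 by convention.
        return 0.0 if s is None else sum(wi * xi for wi, xi in zip(w, features(s)))

    for episode in episodes:
        for s, r, s2 in episode:
            delta = r + gamma * v(s2) - v(s)  # TD error; the target is not differentiated
            phi = features(s)                 # gradient of v(s, w) with respect to w
            for i in range(n_features):
                w[i] += alpha * delta * phi[i]
    return w


# Hypothetical features phi(s) = [1, s] on the chain 0 -> 1 -> 2 -> terminal (reward 1 at the end).
chain = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
w = linear_td0([chain] * 2000, features=lambda s: [1.0, float(s)], n_features=2)
```

The true value of every non-terminal state is 1, which these features can represent exactly with w = [1, 0]. The weights converge there.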
Policy Gradient methods : non-associative learning – REINFORCE
algorithm, exact gradient methods, estimating gradients, approximate
policy gradient algorithms, actor-critic methods.
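In the non-associative case, REINFORCE reduces to stochastic gradient ascent on the expected reward of a softmax policy over per-arm preferences. A minimal sketch follows, assuming Gaussian rewards and a running-average reward baseline. The arm means, step size, and function names are hypothetical. For a softmax policy, the gradient of log pi(a) with respect to preference i is 1[i == a] - pi(i), and that quantity drives the update.

```python
import math
import random


def softmax(prefs):
    m = max(prefs)                            # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [x / total for x in exps]


def reinforce_bandit(true_means, steps=2000, alpha=0.1, seed=0):
    """REINFORCE-style policy-gradient updates on a non-associative (bandit) task.

    The policy is a softmax over one preference per arm; a running average of
    past rewards serves as the baseline.
    """
    rng = random.Random(seed)
    k = len(true_means)
    theta = [0.0] * k
    baseline = 0.0
    for t in range(1, steps + 1):
        pi = softmax(theta)
        a = rng.choices(range(k), weights=pi)[0]   # sample an action from the policy
        r = rng.gauss(true_means[a], 0.5)
        advantage = r - baseline                   # baseline excludes the current reward
        baseline += (r - baseline) / t
        for i in range(k):
            # d/d theta_i of log pi(a) for a softmax policy: 1[i == a] - pi(i)
            grad_log_pi = (1.0 if i == a else 0.0) - pi[i]
            theta[i] += alpha * advantage * grad_log_pi
    return softmax(theta)


# Hypothetical two-armed task; arm 1 pays more on average.
probs = reinforce_bandit([0.0, 1.0])
```

The baseline does not change the expected update, but it reduces its variance. Replacing the reward-average baseline with a learned state-value estimate is the step toward actor-critic methods.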
Teacher: Ravindran B