The Reinforcement Learning problem : evaluative feedback, non-associative
learning, rewards and returns, Markov Decision Processes, value functions, optimality and approximation.
Dynamic programming : value iteration, policy iteration, asynchronous
DP, generalized policy iteration.
Monte Carlo methods : policy evaluation, rollouts, on-policy and off-policy
learning, importance sampling.
Temporal Difference learning : TD prediction, optimality of TD(0), SARSA, Q-learning, R-learning, games and afterstates.
Eligibility traces : n-step TD prediction, TD(λ), forward and
backward views, Q(λ), SARSA(λ), replacing traces and
accumulating traces.
Function Approximation : value prediction, gradient descent methods,
linear function approximation, ANN-based function approximation, lazy
learning, instability issues.
Policy Gradient methods : non-associative learning – REINFORCE
algorithm, exact gradient methods, estimating gradients, approximate
policy gradient algorithms, actor-critic methods.
Planning and Learning : model-based learning and planning, prioritized
sweeping, Dyna, heuristic search, trajectory sampling, E³ algorithm.
Hierarchical RL : MAXQ framework, Options framework, HAM framework,
airport algorithm, hierarchical policy gradient
Case studies : elevator dispatching, Samuel’s checkers player, TD-Gammon,
Acrobot, helicopter piloting.
- Teacher: CS17S011 AJAY KUMAR PANDEY
- Teacher: CS17S016 BHAVSAR NIRAV NARHARIBHAI
- Teacher: CS17D003 NITHIA V
- Teacher: prashla Prashanth