Robotics Study Notes
Problem Framework
Markov Decision Process (MDP)
- Discrete time steps; the state and action spaces may be continuous
- The exact outcome of an action is not known before it is performed
- Once the action has been performed, we know exactly what happened
- The agent’s state is known (fully observed) – the observation and the state are the same here
Formally defined as a 4-tuple (S, A, T, R); see the minimal sketch after this list:
- State Space S
- Action Space A
- Transition Function T(s, a, s') = P(s' | s, a)
- Reward Function R(s, a, s')
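To make the 4-tuple concrete, here is a minimal sketch for a hypothetical 1-D grid world; the specific states, noise level, and reward values are made up for illustration:

```python
import random

# Hypothetical 1-D grid world, used only to illustrate the (S, A, T, R) tuple.
states = [0, 1, 2, 3]            # S: state space (grid cells)
actions = ["left", "right"]      # A: action space

def transition(s, a):
    """T: non-deterministic transition. The intended move succeeds 80% of
    the time; otherwise the agent stays put (assumed noise model)."""
    intended = max(0, s - 1) if a == "left" else min(3, s + 1)
    return intended if random.random() < 0.8 else s

def reward(s, a, s_next):
    """R: +1 for reaching the rightmost cell, 0 otherwise."""
    return 1.0 if s_next == 3 else 0.0
```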
Partially Observable Markov Decision Process (POMDP)
Almost the same as an MDP: the effects of an action are not known exactly before the action is performed (non-deterministic action effects).
In addition, the agent’s state is not known exactly (partially observed).
- State Space (the current state is not known exactly; instead the agent maintains a “Belief” – a probability distribution over the state space)
- Action Space
- Observation Space
- Transition Function
- Observation Function
- Reward Function
A POMDP can be viewed as an MDP in belief space. (Conversely, an MDP is the special case where the belief places probability 1 on a single state.)
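To make the belief concrete, here is a sketch of a Bayes filter that updates a discrete belief after an action and an observation; the function signatures for T and Z below are assumptions for illustration, not a prescribed interface:

```python
def update_belief(belief, action, observation, states, T, Z):
    """Bayes filter over a discrete state space.

    belief[s]        : current probability of being in state s
    T(s, a, s_next)  : transition probability P(s' | s, a)   (assumed signature)
    Z(s_next, a, o)  : observation probability P(o | s', a)  (assumed signature)
    """
    new_belief = {}
    for s_next in states:
        # Predict: sum over all states the agent could have been in.
        predicted = sum(T(s, action, s_next) * belief[s] for s in states)
        # Correct: weight by how likely the received observation is.
        new_belief[s_next] = Z(s_next, action, observation) * predicted
    total = sum(new_belief.values())
    return {s: p / total for s, p in new_belief.items()}  # normalize
```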
Reinforcement Learning (RL)
- The agent learns by trying actions and evaluating the states and rewards that result
- An RL agent is an MDP agent whose transition and/or reward functions are not initially known (see the Q-learning sketch after this list)
- Problem-wise, it is essentially a POMDP, where the partial observability is caused by incomplete information about the underlying MDP problem
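Because T and R are unknown, an RL agent can estimate action values directly from sampled experience. A minimal tabular Q-learning sketch; the environment interface (`env.reset()`, `env.step(a)`, `env.actions`) and the hyperparameter values are assumptions for illustration:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning: learns Q(s, a) without knowing T or R in advance.
    Assumes env.reset() -> state and env.step(a) -> (next_state, reward, done)."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy exploration.
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)
            # Temporal-difference update toward the bootstrapped target.
            best_next = max(Q[(s_next, act)] for act in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```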
Solving
MDP
Offline:
- Value Iteration (sketched in code after this list)
- Policy Iteration
Online:
- Real-Time Dynamic Programming (RTDP)
- Monte Carlo Tree Search (MCTS)
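As a concrete reference for the offline methods above, here is a minimal value iteration sketch for a finite MDP; the dictionary-based representation of T and R is an assumption for illustration:

```python
def value_iteration(states, actions, T, R, gamma=0.95, theta=1e-6):
    """Offline planning: apply the Bellman optimality backup until convergence.

    T[s][a] : list of (next_state, probability) pairs   (assumed representation)
    R[s][a] : immediate reward for taking a in s        (assumed representation)
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            q_values = [
                R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
                for a in actions
            ]
            best = max(q_values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Extract the greedy policy from the converged value function.
    policy = {
        s: max(actions,
               key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a]))
        for s in states
    }
    return V, policy
```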
POMDP
- Offline planners
- Online solvers
- Learning-based Particle Filter (a plain particle filter is sketched after this list for comparison)
- Learning-based Solvers
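Exact belief updates become intractable when the state space is large, which is why particle-based approximations appear in online and learning-based POMDP solvers. A minimal, non-learned particle filter sketch; the generative-model interface (`sample_transition`, `obs_likelihood`) is an assumption for illustration:

```python
import random

def particle_filter_update(particles, action, observation,
                           sample_transition, obs_likelihood):
    """Approximate belief update using a set of sampled states (particles).

    particles              : list of states sampled from the current belief
    sample_transition(s,a) : draws s' ~ P(. | s, a)   (assumed generative model)
    obs_likelihood(o, s')  : returns P(o | s')        (assumed observation model)
    """
    # Propagate each particle through the transition model and weight it
    # by how well it explains the received observation.
    propagated = [sample_transition(s, action) for s in particles]
    weights = [obs_likelihood(observation, s) for s in propagated]
    total = sum(weights)
    if total == 0:
        return propagated  # degenerate case: keep unweighted particles
    # Resample with probability proportional to the weights.
    return random.choices(propagated, weights=weights, k=len(particles))
```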