Robotics Study Notes
Problem Framework
Markov Decision Process (MDP)
- Discrete time steps; the action and state spaces can be continuous
- We don't know the exact outcome of an action before it is performed (stochastic transitions)
- Once the action is performed, we know exactly what happened
- The agent's state is known (fully observed) – the observation and the state are the same here
Formally defined as a 4-tuple (S, A, T, R); see the sketch after this list:
- State Space S
- Action Space A
- Transition Function T(s, a, s'): the probability P(s' | s, a) of reaching state s' after taking action a in state s
- Reward Function R(s, a, s'): the immediate reward for that transition
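A minimal sketch of such a 4-tuple in Python, assuming a toy 1-D corridor domain (the corridor, its dynamics, and the rewards are illustrative assumptions, not from the notes):

```python
import random

# State space S: positions 0..4 in a corridor; 4 is the goal (illustrative).
STATES = list(range(5))
# Action space A: move left or right.
ACTIONS = ["left", "right"]

def transition(state, action):
    """Transition function T: stochastic action effects.
    The intended move succeeds with probability 0.8; otherwise the
    agent stays put - we don't know the outcome before acting."""
    step = -1 if action == "left" else 1
    if random.random() < 0.8:
        return max(0, min(4, state + step))
    return state  # action failed, agent stays in place

def reward(state, action, next_state):
    """Reward function R: +1 for reaching the goal cell, 0 otherwise."""
    return 1.0 if next_state == 4 else 0.0

# Fully observed: after each step the agent sees the exact resulting
# state, even though the outcome was uncertain beforehand.
state = 0
for t in range(10):
    action = random.choice(ACTIONS)
    next_state = transition(state, action)
    r = reward(state, action, next_state)
    print(f"t={t}: s={state}, a={action}, s'={next_state}, r={r}")
    state = next_state
```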
Partially Observable Markov Decision Process (POMDP)
Almost the same as an MDP, except: the agent cannot observe its state directly – it only receives observations that carry partial information about the state. Formally defined as a 6-tuple (S, A, T, R, Ω, O), adding (see the sketch below):
- Observation Space Ω
- Observation Function O(o | s', a): the probability of observing o after action a leads to state s'
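A minimal sketch extending the corridor MDP above into a POMDP, assuming an illustrative noisy position sensor as the observation function:

```python
import random

def observe(state):
    """Observation function O (illustrative): with probability 0.7 the
    sensor reports the true position, otherwise a neighbouring cell -
    so the state is only partially observed."""
    if random.random() < 0.7:
        return state
    return max(0, min(4, state + random.choice([-1, 1])))

# The agent never sees true_state; it must reason over a belief
# (a distribution over S). Here we just show that observation != state.
true_state = 2
for t in range(5):
    obs = observe(true_state)
    print(f"t={t}: true state={true_state}, observation={obs}")
```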