Problem Framework: Markov Decision Process (MDP)

- Discrete time steps; the state and action spaces may be continuous.
- We don't know the exact outcome of an action in advance (transitions are stochastic).
- Once the action is performed, we know exactly what happened.
- The agent's state is known (fully observed): the observation and the state are the same here.

Formally, an MDP is defined as a 4-tuple (S, A, T, R):

- S: state space
- A: action space
- T: transition function
- R: reward function
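To make the 4-tuple concrete, here is a minimal Python sketch of a toy two-state MDP. Every state name, action name, probability, and reward below is an illustrative assumption, not something defined in the notes.

```python
import random

# Toy (S, A, T, R) tuple -- all values here are illustrative assumptions.
S = ["s0", "s1"]        # state space
A = ["stay", "move"]    # action space

# Transition function T(s, a): a distribution over next states.
# Transitions are stochastic, so the outcome is unknown before acting.
T = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

# Reward function R(s, a): the immediate reward for taking action a in state s.
R = {
    ("s0", "stay"): 0.0,
    ("s0", "move"): 1.0,
    ("s1", "stay"): 0.5,
    ("s1", "move"): 0.0,
}

def step(state, action):
    """Sample a next state; once the action is taken, the new state is fully observed."""
    dist = T[(state, action)]
    next_state = random.choices(list(dist), weights=list(dist.values()))[0]
    return next_state, R[(state, action)]

state = "s0"
for t in range(3):
    action = random.choice(A)
    state, reward = step(state, action)
    print(f"t={t} action={action} -> state={state}, reward={reward}")
```

Because the state is fully observed, the agent can condition its next action directly on `state`; in a partially observable setting this no longer holds.

Partially Observable Markov Decision Process (POMDP)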
Symptom

During neural network training, the accuracy is high at first and then gradually drops, as shown below:

| Epoch | Time     | Train Loss | Train ACC | Val Loss | Val ACC | Test Loss | Test ACC | LR     |
|-------|----------|------------|-----------|----------|---------|-----------|----------|--------|
| 1     | 197.8234 | 0.0053     | 0.8645    | 0.0412   | 0.1443  | 0.0412    | 0.1443   | 0.0100 |
| 2     | 108.6638 | 0.0084     | 0.7311    | 0.0272   | 0.1443  | 0.0272    | 0.1443   | 0.0100 |
| 3     | 108.4892 | 0.0095     | 0.6777    | 0.0267   | 0.1443  | 0.0267    |          |        |
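The logging code itself is not shown, so as a point of reference, here is a hedged sketch of how a per-epoch row like the ones above is commonly computed in PyTorch: loss averaged per sample, and ACC as the fraction of correct predictions. All names here (`evaluate`, `loader`, `criterion`) are assumptions for illustration, not the original code.

```python
import torch
from torch import nn

def evaluate(model: nn.Module, loader, criterion) -> tuple[float, float]:
    """Return (mean per-sample loss, accuracy) over one pass through `loader`."""
    model.eval()                        # disable dropout, use running BN stats
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():               # no gradients needed for metrics
        for x, y in loader:
            logits = model(x)
            # criterion is assumed to return the batch mean; scale back to a sum
            total_loss += criterion(logits, y).item() * y.size(0)
            correct += (logits.argmax(dim=1) == y).sum().item()
            total += y.size(0)
    return total_loss / total, correct / total

# Usage (hypothetical names): one such call per split fills one table row.
# train_loss, train_acc = evaluate(model, train_loader, criterion)
# val_loss, val_acc = evaluate(model, val_loader, criterion)
```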