Markov Property
Once the state is known, the history may be thrown away
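In symbols, the Markov property says the next state depends only on the current state, not on the full history:

$$\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \ldots, S_t]$$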
Returns
the discount factor gamma (between 0 and 1) determines the present value of future rewards
gamma close to 0 leads to myopic (very short-sighted) evaluation
gamma close to 1 leads to far-sighted evaluation
why discount?
problem specification side = immediate rewards may actually be more valuable (e.g. consider earning interest)
solution side = mathematically convenient to discount rewards; avoids infinite returns in cyclic Markov processes
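The return being discounted is the cumulative reward from time t onwards:

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$

With bounded rewards and gamma < 1 this geometric series stays finite, which is exactly the "avoid infinite returns" point above.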
Policies
Goal of an RL agent = to find a behavior policy that maximises the expected return Gt (total reward)
Value Function
expected return when following the policy π that we are evaluating
optimal value function = best possible performance in the Markov decision process
the MDP is solved when we know the optimal value function
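In symbols, the state-value function and the optimal value function are:

$$v_\pi(s) = \mathbb{E}_\pi[\,G_t \mid S_t = s\,], \qquad v_*(s) = \max_\pi v_\pi(s)$$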
Bellman Equation
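The Bellman expectation equation decomposes the value into the immediate reward plus the discounted value of the successor state, and the Bellman optimality equation does the same with a max over actions:

$$v_\pi(s) = \mathbb{E}_\pi[\,R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s\,], \qquad v_*(s) = \max_a \mathbb{E}[\,R_{t+1} + \gamma\, v_*(S_{t+1}) \mid S_t = s, A_t = a\,]$$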
Policy Evaluation
when gamma is less than 1 it will always converge (the Bellman expectation backup is a gamma-contraction)
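A minimal sketch of iterative policy evaluation on a small tabular MDP. The array names `P`, `R`, `policy` are my own assumptions (not from the lecture): transition probabilities, expected rewards, and a stochastic policy. The idea is just repeated Bellman expectation backups until the value function stops changing.

```python
import numpy as np

def policy_evaluation(P, R, policy, gamma=0.9, theta=1e-8):
    """Iterative policy evaluation on a tabular MDP (illustrative sketch).

    P[s, a, s'] : transition probabilities
    R[s, a]     : expected immediate reward for action a in state s
    policy[s, a]: probability of picking action a in state s
    """
    v = np.zeros(P.shape[0])
    while True:
        # One full sweep of the Bellman expectation backup
        q = R + gamma * (P @ v)                    # action values, shape (S, A)
        v_new = np.einsum('sa,sa->s', policy, q)   # average over the policy
        if np.max(np.abs(v_new - v)) < theta:      # stop when a sweep barely changes v
            return v_new
        v = v_new
```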
Policy Iteration
alternate evaluation and greedy improvement, iterating until both the policy and its value function converge
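A sketch of policy iteration built on the `policy_evaluation` function above (same hypothetical `P` and `R` arrays): evaluate the current policy, act greedily with respect to its value function, and stop once the greedy policy no longer changes.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9, theta=1e-8):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    n_states, n_actions = R.shape
    policy = np.ones((n_states, n_actions)) / n_actions   # start from the uniform policy
    while True:
        v = policy_evaluation(P, R, policy, gamma, theta) # evaluate current policy
        q = R + gamma * (P @ v)                           # action values under v
        greedy = np.eye(n_actions)[q.argmax(axis=1)]      # deterministic greedy policy
        if np.array_equal(greedy, policy):                # stable policy => optimal
            return greedy, v
        policy = greedy
```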
Asynchronous Dynamic Programming
In-place dynamic programming (see the sketch after this list)
prioritised sweeping
real-time dynamic programming
only update states that are relevant to the agent (debatable, not sure) haha
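A sketch of the in-place variant mentioned above (same hypothetical `P` and `R` arrays): instead of keeping separate "old" and "new" value arrays, each backup immediately overwrites `v[s]`, so later backups in the same sweep already use the fresher values.

```python
import numpy as np

def in_place_value_iteration(P, R, gamma=0.9, theta=1e-8):
    """Asynchronous (in-place) value iteration on a tabular MDP (illustrative sketch)."""
    v = np.zeros(P.shape[0])
    while True:
        delta = 0.0
        for s in range(len(v)):                          # sweep states one at a time
            backup = np.max(R[s] + gamma * (P[s] @ v))   # greedy Bellman backup for state s
            delta = max(delta, abs(backup - v[s]))
            v[s] = backup                                # overwrite in place
        if delta < theta:
            return v
```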