Contraction mapping
if we take a sequence that converges in that space and apply the transformation T to it, we get another convergent sequence, and that sequence converges to Tx
where x is the limit of the original sequence
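For reference (not spelled out in these notes), the standard definition: T is a contraction mapping with modulus γ < 1 if

$$\|Tx - Ty\| \le \gamma \, \|x - y\| \quad \text{for all } x, y$$

which is exactly the property that makes the continuity statement above hold.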
Fixed point
if we apply the transformation T to x, we get back the original x (Tx = x)
Banach Fixed Point Theorem
for a contraction T on a complete space, we know the sequence x, Tx, T^2 x, ... converges to the unique fixed point x*
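A tiny numerical illustration of the theorem (my own, not from the lecture): repeatedly applying the 0.5-contraction T(x) = 0.5x + 1 converges to its unique fixed point x* = 2 from any starting point.

```python
def T(x):
    return 0.5 * x + 1  # a 0.5-contraction: |T(x) - T(y)| = 0.5 * |x - y|

x = 10.0  # arbitrary starting point
for _ in range(20):
    x = T(x)  # x, Tx, T^2 x, ... approaches the unique fixed point

print(x)  # ~2.0, and indeed T(2) = 2
```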
The Bellman Optimality Operator
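For state values, the Bellman optimality operator is usually written as

$$(T_* v)(s) = \max_a \Big[ r(s, a) + \gamma \sum_{s'} p(s' \mid s, a)\, v(s') \Big]$$

and its unique fixed point is the optimal value function v*.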
Value Iteration through the lens of the Bellman operator
value iteration always converges as long as gamma is less than 1, because the Bellman optimality operator is a gamma-contraction and the Banach fixed point theorem applies
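A minimal value iteration sketch as repeated application of this operator; the 2-state, 2-action MDP below (the arrays P and R) is made up purely for illustration.

```python
import numpy as np

# Made-up 2-state, 2-action MDP, for illustration only.
# P[a, s, s'] = transition probability, R[a, s] = expected immediate reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.0, 1.0]]])  # action 1
R = np.array([[1.0, 0.0],                 # action 0
              [2.0, -1.0]])               # action 1
gamma = 0.9

def bellman_optimality_operator(v):
    # (T* v)(s) = max_a [ R[a, s] + gamma * sum_s' P[a, s, s'] * v[s'] ]
    return np.max(R + gamma * (P @ v), axis=0)

v = np.zeros(2)
for _ in range(1000):
    v_new = bellman_optimality_operator(v)
    if np.max(np.abs(v_new - v)) < 1e-8:  # sup-norm stopping criterion
        v = v_new
        break
    v = v_new

print(v)  # approximates v*, the unique fixed point of T*
```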
Approximate Dynamic Programming
so far we have assumed perfect knowledge of the MDP and a perfect / exact representation of the value functions
realistically, more often than not:
we don't know the underlying MDP
=> sampling / estimation error, as we don't have access to the true operator T
we won't be able to represent the value function exactly after each update
=> approximation error, as we approximate the true value function within a restricted class
Objective = under the above conditions, come up with a policy π that is close to optimal (see the sketch below)
A = Approximation (sampling or function approximation)
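A sketch of how both error sources enter in practice (entirely illustrative: the one-hot feature map, the random stand-in for the unknown MDP, and the least-squares regression step are my assumptions, not an algorithm from the lecture). Sampled transitions replace the exact operator, and a parametric q replaces the exact table; the greedy policy with respect to the fitted q is the policy π we hope is close to optimal.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 10, 2, 0.9

def features(s, a):
    # Hypothetical one-hot features; a real approximator would generalise.
    phi = np.zeros(n_states * n_actions)
    phi[s * n_actions + a] = 1.0
    return phi

def sample_transition(s, a):
    # Stand-in for an unknown MDP: random next state and noisy reward.
    return rng.integers(n_states), rng.normal(loc=float(a == s % 2))

w = np.zeros(n_states * n_actions)  # q(s, a) ~= w . features(s, a)

for _ in range(200):  # fitted Q-iteration style sweeps
    X, y = [], []
    for s in range(n_states):
        for a in range(n_actions):
            s_next, r = sample_transition(s, a)  # sampling / estimation error
            target = r + gamma * max(w @ features(s_next, b)
                                     for b in range(n_actions))
            X.append(features(s, a))
            y.append(target)
    # Regress q towards the sampled targets -> approximation error
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)

pi = [int(np.argmax([w @ features(s, a) for a in range(n_actions)]))
      for s in range(n_states)]
print(pi)  # greedy policy w.r.t. the approximate q
```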
the case where a function approximator might diverge
if gamma is high, even though we chose a pretty good q, the greedy policy can still be far from optimal, because the upper bound on its suboptimality is large (see below)
if gamma is low, vice versa: the bound is tight, so a good q gives a near-optimal policy
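The bound behind this observation, in one common form (this is the standard result as I recall it, not quoted from the slides): if π is greedy with respect to an approximate q, then

$$\|q - q_*\|_\infty \le \epsilon \;\Longrightarrow\; \|v_\pi - v_*\|_\infty \le \frac{2\gamma\epsilon}{1-\gamma}$$

The 1/(1 - γ) factor blows up as γ → 1, which is why a high discount makes the guarantee weak even when q is close to q*.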