Theorem: What is Possible?
Unlike greedy, we can still explore other actions: an action can be selected even when its Q_t(a) is not high, because its uncertainty bonus U_t(a) is large for actions that have not been picked often.
U_t(a) measures how wrong our estimate Q_t(a) might be.
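Concretely, this is the standard UCB selection rule from the lecture: pick the action maximizing the value estimate plus its uncertainty bonus.

$$ a_t = \arg\max_a \Big( Q_t(a) + U_t(a) \Big) $$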
Intuition
1. The more samples n we have in our average, the less likely it is that the actual mean is still larger than the sample mean plus some added amount u.
2. Similarly, if we pick u to be larger, i.e. far enough away, it becomes exceedingly unlikely that our sample mean is off by that much.
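This intuition is Hoeffding's inequality: for i.i.d. samples X_1, ..., X_n bounded in [0, 1] with sample mean $\bar{X}_n$,

$$ p\left( \mathbb{E}[X] > \bar{X}_n + u \right) \le e^{-2nu^2} $$

Setting this probability equal to some p and solving for u gives the bonus $U_t(a) = \sqrt{\log(1/p) \,/\, (2 N_t(a))}$; shrinking p over time (e.g. p = 1/t) yields a UCB-style bonus that grows with log t and shrinks with the count N_t(a).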
With a larger c we explore more, with a smaller c we explore less; c = 0 reduces to plain greedy.
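A minimal Python sketch of UCB action selection with the exploration constant c; the names `q`, `counts`, and the try-each-arm-once rule are my assumptions for illustration, not from the lecture:

```python
import numpy as np

def ucb_action(q, counts, t, c):
    """Pick argmax_a Q_t(a) + c * sqrt(log t / N_t(a)).

    q      -- array of sample-average value estimates Q_t(a)
    counts -- array of pull counts N_t(a)
    t      -- current time step (1-indexed so log t >= 0)
    c      -- exploration constant; c == 0 reduces to greedy
    """
    # Try every action once before trusting the estimates.
    untried = np.where(counts == 0)[0]
    if len(untried) > 0:
        return int(untried[0])
    bonus = c * np.sqrt(np.log(t) / counts)
    return int(np.argmax(q + bonus))
```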
Q_t(a) is the sample average of the rewards observed for action a; the gap Δ_a = v* − q(a) is the optimal value minus the true value of the chosen action a.
Theorem: the total regret of UCB can be bounded logarithmically in t; the proof is at (1:10:40) in the lecture video.
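With N_t(a) the number of times action a has been selected, total regret decomposes over the gaps, and one standard statement of the logarithmic bound (the UCB1 bound of Auer et al., 2002, assuming rewards in [0, 1]) is:

$$ L_t = \sum_a \mathbb{E}[N_t(a)]\,\Delta_a \;\le\; 8 \sum_{a:\,\Delta_a > 0} \frac{\log t}{\Delta_a} + \left(1 + \frac{\pi^2}{3}\right) \sum_a \Delta_a $$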
Bayesian Approach
First assume a certain prior distribution over the action values, then update this belief (the posterior) as we observe rewards.
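As a concrete example (my illustration, not spelled out in these notes): for Bernoulli rewards, a Beta(α_a, β_a) prior on each action's success probability stays Beta after conditioning on a reward r:

$$ \mathrm{Beta}(\alpha_a,\, \beta_a) \;\to\; \mathrm{Beta}(\alpha_a + r,\; \beta_a + 1 - r) \quad \text{after observing } r \in \{0, 1\} $$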
Thompson Sampling
Probability matching: pick each action according to the probability (under our current beliefs) that it is the optimal action.
Actions then get picked more often when either their estimated value is high or their uncertainty is high.
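In symbols, with H_t the history of actions and rewards up to time t:

$$ \pi_t(a) = p\left( q(a) = \max_{a'} q(a') \;\middle|\; H_t \right) $$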
Thompson sampling implements this without computing the probabilities: sample action values from the belief (posterior) distribution,
then pick the greedy action according to the sampled action values.
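A minimal Thompson sampling sketch for a Bernoulli bandit with the Beta posteriors above; the environment, seed, and variable names are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_step(alpha, beta, true_probs):
    """One Thompson sampling step on a Bernoulli bandit.

    alpha, beta -- Beta posterior parameters per action
    true_probs  -- hidden success probabilities (the environment)
    """
    # 1. Sample one action value per action from the posterior belief.
    samples = rng.beta(alpha, beta)
    # 2. Act greedily with respect to the sampled values.
    a = int(np.argmax(samples))
    # 3. Observe a Bernoulli reward and update the posterior.
    r = rng.binomial(1, true_probs[a])
    alpha[a] += r
    beta[a] += 1 - r
    return a, r

# Example: 3-armed bandit, uniform Beta(1, 1) priors.
alpha, beta = np.ones(3), np.ones(3)
for t in range(1000):
    thompson_step(alpha, beta, true_probs=np.array([0.2, 0.5, 0.7]))
print(alpha / (alpha + beta))  # posterior means concentrate on the best arm
```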
Planning to Explore