
Lec4) NLP with Deep Learning

Tony Lim 2021. 4. 13. 13:02

 

Instead of computing an error for each word separately, we compute the error for the whole window, and later split that gradient into (in this case 5) pieces, one per word, and update each word vector.
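
A minimal numpy sketch of that split, with toy dimensions and a made-up window gradient:

```python
import numpy as np

d, window = 4, 5                            # word-vector dim (tiny for the demo), window size
vocab = np.random.rand(10, d)               # toy embedding matrix of 10 word vectors
word_ids = [3, 1, 7, 1, 5]                  # the 5 words appearing in this window

x_window = np.concatenate([vocab[i] for i in word_ids])  # forward: concatenate the window
grad_window = np.random.rand(window * d)    # pretend gradient of the loss w.r.t. x_window

# backward: split the window gradient into 5 chunks, one per word, and update each vector
lr = 0.01
for word_id, g in zip(word_ids, np.split(grad_window, window)):
    vocab[word_id] -= lr * g                # a repeated word (id 1 here) gets two updates
```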

These problems can show up when we fine-tune our pretrained word vectors: words that appear in the training data get moved around, while words that only appear at test time stay where they were pretrained, so the two no longer line up. So what should we do?

we only "fine tune" my own word vector when we have only large training dataset. most of the time just using pretrained word vectors are best.

 

Our goal is to calculate the downstream gradient, and we can do that via the chain rule: the downstream gradient is the upstream gradient times the local gradient.

We skipped the very final one because the derivative of s with respect to s is just 1. This case has a single input and a single output; we want the general case.
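
In symbols, for a node that computes h = f(x) inside a network whose final output is the scalar s (a minimal sketch of the general rule):

```latex
\underbrace{\frac{\partial s}{\partial x}}_{\text{downstream}}
  \;=\;
\underbrace{\frac{\partial s}{\partial h}}_{\text{upstream}}
  \cdot
\underbrace{\frac{\partial h}{\partial x}}_{\text{local}},
\qquad
\text{and at the very top, } \frac{\partial s}{\partial s} = 1 .
```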

 

The same chain rule computation, just carried out for each input.

When more than one gradient flows back into the same node during backprop (in this case into y), we just add them up.
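
A tiny hand-worked example of this summing (the expressions for a, b, and s are made up for illustration):

```python
# y feeds two branches: a = y + 2 and b = 3*y, which meet again in s = a * b.
# During backprop, y receives one gradient from each branch; they are summed.
y = 2.0
a, b = y + 2.0, 3.0 * y
s = a * b

ds_da, ds_db = b, a                   # local gradients at the multiply node
grad_y = ds_da * 1.0 + ds_db * 3.0    # add the two paths flowing back into y

# sanity check: s = 3y^2 + 6y, so ds/dy = 6y + 6
assert abs(grad_y - (6 * y + 6)) < 1e-9
```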

 

Things to remember in deep learning

1. Regularization

Regularization prevents overfitting; otherwise the model will only be good on the data it was trained on.
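
Concretely, L2 regularization adds a penalty on the size of all the parameters to the loss (a standard formulation; lambda controls the strength of the penalty):

```latex
J(\theta) \;=\; \frac{1}{N}\sum_{i=1}^{N}
  -\log\!\left(\frac{e^{f_{y_i}}}{\sum_{c} e^{f_c}}\right)
  \;+\; \lambda \sum_{k} \theta_k^2
```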

2. Vectorization

In order to make use of the GPU (and fast CPU math libraries), we need to vectorize: work with matrices instead of looping over individual vectors.
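
A minimal numpy sketch of the idea (the shapes are made up for the demo); the single matrix-matrix multiply does the same work as the loop, but far faster:

```python
import numpy as np

W = np.random.rand(5, 300)                # weight matrix
vectors = np.random.rand(300, 1000)       # 1000 word vectors, one per column

# slow: transform one word vector at a time
slow = np.stack([W.dot(vectors[:, i]) for i in range(1000)], axis=1)

# fast: one big matrix-matrix multiply handles all of them at once
fast = W.dot(vectors)

assert np.allclose(slow, fast)
```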

3. Current Best Non-Linearity = ReLU
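
For reference, ReLU is just:

```latex
\operatorname{ReLU}(z) = \max(0, z)
```

It is cheap to compute, and its gradient doesn't vanish for positive inputs, which helps deep networks train.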

4. Parameter Initialization

Never start with all zeros; if you do, every hidden unit computes the same thing and receives the same gradient. Always initialize with small random values.
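
A minimal numpy sketch of one common choice, Xavier (Glorot) uniform initialization; the layer sizes here are made up:

```python
import numpy as np

fan_in, fan_out = 300, 100                            # made-up layer sizes
r = np.sqrt(6.0 / (fan_in + fan_out))                 # Xavier/Glorot uniform range
W = np.random.uniform(-r, r, size=(fan_out, fan_in))  # small random weights
b = np.zeros(fan_out)                                 # biases can safely start at zero
```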

5. Optimizers

SGD is fine, but there are many others; Adam is a popular one.
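
A minimal numpy sketch of one Adam update step, following the standard formulation (Kingma & Ba, 2014); the helper name and default hyperparameters are just illustrative:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to the parameters theta; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-correct both estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```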

6. Learning Rate

Better results can generally be obtained by allowing the learning rate to decrease as you train.
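
One simple schedule (the halving interval k is arbitrary here) is to cut the learning rate every few epochs:

```python
def lr_at(epoch, lr0=0.05, k=10):
    """Halve the initial learning rate lr0 every k epochs."""
    return lr0 * (0.5 ** (epoch // k))
```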
