AI/NLP (cs224n)

Lec2) Word Vectors and Word Senses

Tony Lim 2021. 4. 6. 01:26

most word vectors are represented as row vectors (rows of the embedding matrix)
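A tiny sketch of what this means in practice (sizes and the word index are hypothetical, just for illustration):

```python
import numpy as np

V, d = 10000, 100            # hypothetical vocabulary size and embedding dimension
E = np.random.randn(V, d)    # embedding matrix: one row per word in the vocabulary
word_index = 42              # hypothetical index of some word
v = E[word_index]            # its word vector is simply that row, shape (100,)
```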

 

Gradient Descent

computing the gradient naively over the whole corpus takes too much time.

Stochastic gradient descent (SGD) = at each step, randomly choose a small sample (minibatch) and perform the same regular gradient descent update on it, making each step far cheaper to compute.
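A minimal sketch of the SGD loop, assuming some grad_fn(params, batch) that returns the minibatch gradient (all names and hyperparameters here are illustrative, not from the lecture):

```python
import numpy as np

def sgd(params, grad_fn, data, lr=0.05, batch_size=32, steps=1000):
    for _ in range(steps):
        # pick a small random sample instead of the whole dataset
        idx = np.random.choice(len(data), size=batch_size, replace=False)
        # same update rule as full-batch gradient descent, cheaper gradient
        params = params - lr * grad_fn(params, data[idx])
    return params
```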

 

Skip-gram = you have 1 center word and predict all the 'outside' words in its context window

Continuous Bag of Words (CBOW) = predict the center word from the context words
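A sketch of how the training pairs differ between the two (window size 2 is an arbitrary choice):

```python
def skipgram_pairs(tokens, window=2):
    # skip-gram: one (center -> outside) pair per context word
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield center, tokens[j]

def cbow_pairs(tokens, window=2):
    # CBOW: one (context list -> center) pair per position
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        yield context, center

# e.g. list(skipgram_pairs("the quick brown fox".split()))
```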

 

Negative Sampling

instead of the expensive full softmax, we minimize a cheaper objective function:

1. we want our observed (center, outside) word pairs to have high probability

2. we sample K random 'negative' words and push their probability down

negatives are drawn from a modified unigram distribution, U(w)^(3/4); this little change also tones down the dominance of high-frequency words
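A sketch of the per-pair loss under these assumptions, where v_c is the center word vector, u_o the observed outside word vector, and U_neg a (K, d) matrix of K sampled negative word vectors:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_c, u_o, U_neg):
    # J = -log s(u_o . v_c) - sum_k log s(-u_k . v_c)
    observed = -np.log(sigmoid(u_o @ v_c))               # (1) observed pair gets high probability
    negatives = -np.sum(np.log(sigmoid(-U_neg @ v_c)))   # (2) K random words get low probability
    return observed + negatives
```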

 

GloVe

count-based methods + direct prediction methods

using probe words (solid, gas, water, ...) we can measure the relationship between two words (ice, steam): the co-occurrence ratio P(k | ice) / P(k | steam) comes out large for k = solid, small for k = gas, and close to 1 for neutral probes like water

f is a weighting function that caps the influence of very high-frequency co-occurrences

GloVe tries to capture the global statistics of how often these words appear together across the whole corpus
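For reference, the objective from the GloVe paper, where X_ij is the co-occurrence count and f is the weighting function above (the paper uses alpha = 3/4 and x_max = 100):

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
```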
