AI/NLP (cs224n)

Lec8) Translation, Seq2Seq, Attention

Tony Lim 2021. 5. 5. 14:33

Statistical Machine Translation (SMT)

we divide the single conditional probability distribution into a Translation Model and a Language Model, so that each component can be learned more efficiently.

Searching over all possible y and computing each probability would take too long, so we use a heuristic search algorithm to find the best translation. This is called decoding.
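Concretely, this is the noisy-channel decomposition from the lecture: the best translation is y* = argmax_y P(y|x) = argmax_y P(x|y) P(y), where P(x|y) is the translation model learned from parallel corpora and P(y) is the language model learned from monolingual text.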

 

Neural Machine Translation (NMT)

Unlike SMT, there is no division into smaller components; the whole translation is learned directly end to end. Simpler and easier.

During training we feed the source sentence to the encoder and pass the encoder's final hidden state to the decoder as its initial state.

The decoder then generates the translation word by word (at test time using exhaustive search or, more practically, beam search to pick the next word), and we compare its predictions against the reference translation to compute the training loss.
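As a rough picture of this pipeline, here is a minimal encoder-decoder sketch in PyTorch; the dimensions, vocabulary sizes, and teacher-forcing setup are my own illustrative assumptions, not the lecture's exact model:

```python
# Minimal seq2seq sketch: encode the source, hand the final state to the decoder,
# score the gold translation, and backprop the cross-entropy loss.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt_in):
        # encode the source; only the final (h, c) state is passed on,
        # which is exactly the "information bottleneck" discussed later
        _, state = self.encoder(self.src_emb(src))
        # teacher forcing: feed the gold previous words during training
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.out(dec_out)                  # logits over target vocab

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (8, 12))             # batch of source sentences
tgt_in = torch.randint(0, 1200, (8, 10))          # gold translation, shifted right
tgt_out = torch.randint(0, 1200, (8, 10))         # gold translation targets
logits = model(src, tgt_in)
loss = nn.functional.cross_entropy(logits.reshape(-1, 1200), tgt_out.reshape(-1))
loss.backward()
```

At test time you would replace the teacher-forced inputs with the decoder's own previous predictions, chosen greedily or with beam search.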

 

Disadvantages of NMT compared to SMT

NMT is less interpretable = hard to debug; we often don't know why it is not working well.

NMT is difficult to control = can't easily specify rules or guidelines for translation.

 

Evaluation

BLEU compares the machine-written translation to one or several human-written translations and computes a similarity score based on n-gram precision, plus a penalty for system translations that are too short (because the model might output only the few words it is most sure about, making the translation too short).

A good translation might still get a bad BLEU score if its n-grams happen not to match the available human reference translations.
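To make the scoring idea concrete, here is a toy BLEU computation; real BLEU uses up to 4-grams, clipped counts over multiple references, and smoothing, so this stripped-down version with made-up sentences is only for intuition:

```python
# Toy BLEU: geometric mean of n-gram precisions times a brevity penalty.
from collections import Counter
import math

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())        # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)
    # brevity penalty punishes translations shorter than the reference
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat is on the mat".split()
print(bleu("the cat sat on the mat".split(), ref))  # decent n-gram overlap
print(bleu("the cat".split(), ref))                 # perfect precision, but too short
```

The second candidate matches the reference perfectly on the n-grams it does contain, but the brevity penalty pulls its score down sharply.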

 

Attention

Without attention, too much pressure is put on the encoder's final hidden vector: it is hard to encode all the information into a single vector when the source sentence is long (the information bottleneck problem).

At every decoder step we take the dot product between the decoder hidden state and each encoder hidden state to compute attention scores, then apply a softmax to turn them into the attention distribution.

Compute the attention output by taking a weighted sum of the encoder hidden states according to the attention distribution.

Finally, concatenate the attention output with the decoder hidden state, and use the combined vector to predict the next word and compute the loss.
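Putting those steps together, a single decoder step of dot-product attention looks roughly like this (shapes and variable names are illustrative assumptions):

```python
# One decoder step of dot-product attention: scores -> softmax -> weighted sum -> concat.
import torch
import torch.nn.functional as F

enc_hiddens = torch.randn(12, 256)        # one encoder hidden state per source word
dec_hidden = torch.randn(256)             # decoder hidden state at the current step

scores = enc_hiddens @ dec_hidden         # dot-product attention scores, shape (12,)
attn_dist = F.softmax(scores, dim=0)      # softmax -> attention distribution
attn_output = attn_dist @ enc_hiddens     # weighted sum of encoder states, shape (256,)

# concatenate with the decoder state; the output layer and the loss are
# computed from this combined vector
combined = torch.cat([attn_output, dec_hidden])   # shape (512,)
```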

 

Why Attention

  • solves the bottleneck problem
  • helps with the vanishing gradient problem
  • provides some interpretability

"entarter" means in French "to hit someone with a pie", and the attention model's alignment table colors the corresponding words quite similarly to how a human would align them.

 

General Attention

Given a set of vector values (the encoder hidden states) and a vector query (the decoder hidden state), attention is a technique to compute a weighted sum of the values, dependent on the query.

We can get a single attention output vector even when there are many values (encoder hidden states).
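In this general form, attention is just a function from one query and many values to one summary vector; a minimal sketch, with names and shapes of my own choosing:

```python
# General attention: a query selectively summarizes a set of values into one vector.
import torch
import torch.nn.functional as F

def attend(query, values):
    """query: (d,), values: (n, d) -> single output vector of shape (d,)."""
    weights = F.softmax(values @ query, dim=0)   # how much the query attends to each value
    return weights @ values                      # weighted sum of the values

values = torch.randn(30, 64)     # e.g. 30 encoder hidden states
query = torch.randn(64)          # e.g. one decoder hidden state
summary = attend(query, values)  # always a single (64,) vector, regardless of n
```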

 

 

 
