AI/NLP (cs224n)

Lec8) Translation, Seq2Seq, Attention

Tony Lim 2021. 5. 5. 14:33

Statistical Machine Translation (SMT)

we divide the single conditional probability distribution into a Translation Model and a Language Model, so that each component can be learned more efficiently.

Searching over all possible y and computing each probability would take too long, so we use a heuristic search algorithm to find the best translation. This is called decoding.
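Concretely, this is the noisy-channel decomposition from the lecture: the best translation is y* = argmax_y P(y|x) = argmax_y P(x|y) P(y), where P(x|y) is the translation model learned from parallel corpora and P(y) is the language model learned from monolingual text.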

 

Neural Machine Translation (NMT)

Unlike SMT, there is no division into smaller components; the whole translation is learned directly end to end. Simpler and easier.

During training we feed the source sentence to the encoder and pass the encoder's final hidden state to the decoder as its initial state.

The decoder then generates the translation word by word (at test time using exhaustive search or, more practically, beam search to pick the next word), and we compare its predictions against the reference translation to compute the training loss.
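As a rough picture of this pipeline, here is a minimal encoder-decoder sketch in PyTorch; the dimensions, vocabulary sizes, and teacher-forcing setup are my own illustrative assumptions, not the lecture's exact model:

```python
# Minimal seq2seq sketch: encode the source, hand the final state to the decoder,
# score the gold translation, and backprop the cross-entropy loss.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt_in):
        # encode the source; only the final (h, c) state is passed on,
        # which is exactly the "information bottleneck" discussed later
        _, state = self.encoder(self.src_emb(src))
        # teacher forcing: feed the gold previous words during training
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.out(dec_out)                  # logits over target vocab

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (8, 12))             # batch of source sentences
tgt_in = torch.randint(0, 1200, (8, 10))          # gold translation, shifted right
tgt_out = torch.randint(0, 1200, (8, 10))         # gold translation targets
logits = model(src, tgt_in)
loss = nn.functional.cross_entropy(logits.reshape(-1, 1200), tgt_out.reshape(-1))
loss.backward()
```

At test time you would replace the teacher-forced inputs with the decoder's own previous predictions, chosen greedily or with beam search.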

 

Disadvantages of NMT compared to SMT

NMT is less interpretable = hard to debug; we often don't know why it is not working well.

NMT is difficult to control = can't easily specify rules or guidelines for translation.

 

Evaluation

BLEU compares the machine-written translation to one or several human-written translations and computes a similarity score based on n-gram precision, plus a penalty for system translations that are too short (because the model might output only the few words it is most sure about, making the translation too short).

A good translation might still get a bad BLEU score if its n-grams happen not to match the available human reference translations.
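To make the scoring idea concrete, here is a toy BLEU computation; real BLEU uses up to 4-grams, clipped counts over multiple references, and smoothing, so this stripped-down version with made-up sentences is only for intuition:

```python
# Toy BLEU: geometric mean of n-gram precisions times a brevity penalty.
from collections import Counter
import math

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())        # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)
    # brevity penalty punishes translations shorter than the reference
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat is on the mat".split()
print(bleu("the cat sat on the mat".split(), ref))  # decent n-gram overlap
print(bleu("the cat".split(), ref))                 # perfect precision, but too short
```

The second candidate matches the reference perfectly on the n-grams it does contain, but the brevity penalty pulls its score down sharply.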

 

Attention

Without attention, too much pressure is put on the encoder's final hidden vector: it is hard to encode all the information into a single vector when the source sentence is long (the information bottleneck problem).

At every decoder step we take the dot product between the decoder hidden state and each encoder hidden state to compute attention scores, then apply a softmax to turn them into the attention distribution.

Compute the attention output by taking a weighted sum of the encoder hidden states according to the attention distribution.

Finally, concatenate the attention output with the decoder hidden state, and use the combined vector to predict the next word and compute the loss.
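Putting those steps together, a single decoder step of dot-product attention looks roughly like this (shapes and variable names are illustrative assumptions):

```python
# One decoder step of dot-product attention: scores -> softmax -> weighted sum -> concat.
import torch
import torch.nn.functional as F

enc_hiddens = torch.randn(12, 256)        # one encoder hidden state per source word
dec_hidden = torch.randn(256)             # decoder hidden state at the current step

scores = enc_hiddens @ dec_hidden         # dot-product attention scores, shape (12,)
attn_dist = F.softmax(scores, dim=0)      # softmax -> attention distribution
attn_output = attn_dist @ enc_hiddens     # weighted sum of encoder states, shape (256,)

# concatenate with the decoder state; the output layer and the loss are
# computed from this combined vector
combined = torch.cat([attn_output, dec_hidden])   # shape (512,)
```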

 

Why Attention

  • solves the bottleneck problem
  • helps with the vanishing gradient problem
  • provides some interpretability

"entarter" means in French "to hit someone with a pie", and the attention model's alignment table colors the corresponding words quite similarly to how a human would align them.

 

General Attention

Given a set of vector values (the encoder hidden states) and a vector query (the decoder hidden state), attention is a technique to compute a weighted sum of the values, dependent on the query.

We can get a single attention output vector even when there are many values (encoder hidden states).
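In this general form, attention is just a function from one query and many values to one summary vector; a minimal sketch, with names and shapes of my own choosing:

```python
# General attention: a query selectively summarizes a set of values into one vector.
import torch
import torch.nn.functional as F

def attend(query, values):
    """query: (d,), values: (n, d) -> single output vector of shape (d,)."""
    weights = F.softmax(values @ query, dim=0)   # how much the query attends to each value
    return weights @ values                      # weighted sum of the values

values = torch.randn(30, 64)     # e.g. 30 encoder hidden states
query = torch.randn(64)          # e.g. one decoder hidden state
summary = attend(query, values)  # always a single (64,) vector, regardless of n
```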

 

 

 
