Transformer = a model that uses attention
ELMo
ELMo just concatenates the hidden vectors from a left-to-right LSTM and a right-to-left LSTM, so we end up with 2 half-blind models, which is suboptimal.
We want a single model that looks left and right simultaneously.
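A minimal PyTorch sketch of the ELMo-style concatenation described above; the sizes (vocab_size, emb_dim, hidden_dim) and the single-layer LSTMs are made-up stand-ins for illustration, not ELMo's actual architecture.

```python
import torch
import torch.nn as nn

# Toy sizes, chosen only for this sketch.
vocab_size, emb_dim, hidden_dim = 1000, 64, 128
emb = nn.Embedding(vocab_size, emb_dim)
fwd_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # reads left to right
bwd_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # reads right to left

tokens = torch.randint(0, vocab_size, (1, 10))              # (batch, seq_len)
x = emb(tokens)

h_fwd, _ = fwd_lstm(x)                                       # left context only
h_bwd, _ = bwd_lstm(torch.flip(x, dims=[1]))                 # right context only
h_bwd = torch.flip(h_bwd, dims=[1])                          # re-align to token order

# Each half only ever sees one direction; the two halves are just concatenated.
h = torch.cat([h_fwd, h_bwd], dim=-1)                        # (1, 10, 2 * hidden_dim)
print(h.shape)
```

No single layer here ever conditions on both sides at once, which is exactly the limitation BERT removes.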
GPT
GPT is built as a language model, so it only looks left to right (it generates language).
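A minimal sketch of that left-to-right restriction: a causal mask stops each position from attending to anything to its right. The numbers are illustrative, not GPT's actual code.

```python
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)                      # raw attention scores
causal_mask = torch.tril(torch.ones(seq_len, seq_len))      # lower-triangular: past + self only
scores = scores.masked_fill(causal_mask == 0, float("-inf"))
attn = torch.softmax(scores, dim=-1)
print(attn)  # each row only puts weight on positions <= its own index
```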
BERT
Masked LM = replace some of the words with a [MASK] token.
BERT is pretrained with 2 tasks:
1. Feed 2 input sentences and guess whether the second is a reasonable next sentence or just a random sentence from the corpus.
2. The model needs to guess what each masked word should be (see the masking sketch below).
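A minimal sketch of the masking step in task 2, assuming a toy vocabulary where MASK_ID stands for [MASK]; real BERT masks about 15% of tokens and sometimes keeps or randomizes them instead of masking.

```python
import torch

MASK_ID = 0                                                 # hypothetical id for [MASK]
tokens = torch.tensor([[12, 47, 5, 98, 33, 7]])             # toy token ids
mask = torch.rand(tokens.shape) < 0.15                      # pick ~15% of positions
inputs = tokens.masked_fill(mask, MASK_ID)                  # corrupt them with [MASK]
labels = tokens.masked_fill(~mask, -100)                    # only masked positions get scored
print(inputs, labels)
```

The model sees `inputs` and is trained to recover the original ids at the masked positions, so it must use context from both the left and the right.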
Word and character embedding (WordPiece)
"playing" is divided into "play" and "##ing". If "sub" exists in the vocabulary but "cribe" doesn't, we fall back to treating "cribe" as individual characters.
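A minimal sketch of WordPiece-style greedy longest-match segmentation with a hypothetical toy vocabulary (real BERT learns a ~30k-piece vocabulary); the `wordpiece` function here is just for illustration.

```python
# Toy vocabulary: whole pieces plus single characters as a fallback.
vocab = {"play", "##ing", "sub", "##s", "##c", "##r", "##i", "##b", "##e"}

def wordpiece(word, vocab):
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        prefix = "##" if start > 0 else ""
        # Take the longest piece in the vocab that matches starting at `start`.
        while end > start and prefix + word[start:end] not in vocab:
            end -= 1
        if end == start:            # nothing matches: give up on the whole word
            return ["[UNK]"]
        pieces.append(prefix + word[start:end])
        start = end
    return pieces

print(wordpiece("playing", vocab))    # ['play', '##ing']
print(wordpiece("subscribe", vocab))  # 'sub' matches, the rest falls back to characters
```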
a) MNLI = guess whether the 2nd sentence is an entailment, contradiction, or neutral with respect to the first one.
Assume the vector at the class-label ([CLS]) position has 512 dimensions; then all we need to train is a 512 × 3 matrix (3 for the labels we need to guess).
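A minimal sketch of that classification head, keeping the 512-dimensional class-label vector from the note (BERT-base actually uses 768); the only new parameters are the 512 × 3 projection.

```python
import torch
import torch.nn as nn

hidden_dim, num_classes = 512, 3
classifier = nn.Linear(hidden_dim, num_classes)       # the 512 x 3 matrix (plus bias)

cls_vector = torch.randn(1, hidden_dim)               # output vector at the [CLS] position
logits = classifier(cls_vector)                       # (1, 3)
probs = torch.softmax(logits, dim=-1)                 # entailment / contradiction / neutral
print(probs)
```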
c) A "start" vector and an "end" vector work like queries in an attention model, and every paragraph token has its own output vector.
We take the inner product of each token vector with the start and end vectors, then apply a softmax over positions, separately for start and end.
That is how we guess where the answer's start and end are.
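A minimal sketch of that span-prediction head, again assuming hypothetical 512-dimensional token vectors and random stand-in weights; the learned start and end vectors are dotted with every token vector, and a softmax over positions gives the start and end distributions.

```python
import torch

hidden_dim, seq_len = 512, 20
token_vectors = torch.randn(seq_len, hidden_dim)       # one output vector per paragraph token
start_query = torch.randn(hidden_dim)                  # learned during fine-tuning
end_query = torch.randn(hidden_dim)                    # learned during fine-tuning

start_probs = torch.softmax(token_vectors @ start_query, dim=0)  # distribution over start positions
end_probs = torch.softmax(token_vectors @ end_query, dim=0)      # distribution over end positions

# The predicted answer span runs from the most likely start to the most likely end.
print(start_probs.argmax().item(), end_probs.argmax().item())
```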