
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Tony Lim 2021. 11. 21. 12:25

Transformer = uses attention

ELMo

just concatenates the hidden vectors from a left-to-right and a right-to-left LSTM. we end up with 2 half-blind models, which is suboptimal

we want a single model that looks left and right simultaneously

 

GPT

built as a Language Model, so it only looks left to right = generating language

 

BERT

masked LM = replace some words with a [MASK] token

pretrained with 2 tasks (a minimal masking sketch follows the list)

1. next sentence prediction = feed 2 input sentences and guess whether the second is a plausible next sentence or just a random sentence from the corpus

2. masked LM = the model needs to guess what the masked tokens originally were
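A minimal Python sketch of how the masked-LM inputs could be prepared; the token ids, MASK_ID and VOCAB_SIZE below are placeholder values, while the 80/10/10 replacement split is the one described in the BERT paper.

```python
import random

# placeholder values; real BERT uses a ~30k WordPiece vocabulary
MASK_ID, VOCAB_SIZE = 103, 30522
MASK_PROB = 0.15  # fraction of tokens chosen for prediction

def mask_tokens(token_ids):
    """Return (inputs, labels) for the masked-LM task.

    Chosen positions are replaced with [MASK] 80% of the time,
    a random token 10% of the time, and left unchanged 10% of the time.
    Labels are -100 (ignore) everywhere except the chosen positions.
    """
    inputs, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if random.random() < MASK_PROB:
            labels[i] = tok                                # model must recover this token
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID                        # replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(VOCAB_SIZE)   # replace with a random token
            # else: keep the original token unchanged
    return inputs, labels
```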

 

word and character embedding (WordPiece)

"playing" is divided into "play" and "##ing". if a prefix like "sub" exists in the vocabulary but the rest ("scribe") doesn't, the remainder falls back to character-level pieces (see the sketch below)
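A rough sketch of the greedy longest-match-first WordPiece splitting described above, using a made-up toy vocabulary; a real tokenizer handles many more details.

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first split of a word into WordPiece subwords.

    Non-initial pieces are prefixed with '##'. If no piece matches at some
    position, the whole word maps to [UNK].
    """
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:                    # try the longest remaining substring first
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                cur = sub
                break
            end -= 1
        if cur is None:
            return ["[UNK]"]
        pieces.append(cur)
        start = end
    return pieces

# toy vocabulary for illustration only
vocab = {"play", "##ing", "sub", "##s", "##c", "##r", "##i", "##b", "##e"}
print(wordpiece("playing", vocab))    # ['play', '##ing']
print(wordpiece("subscribe", vocab))  # 'sub' matches, the rest falls back to character pieces
```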

 

a) MNLI = need to guess whether the 2nd sentence is an entailment, contradiction, or neutral with respect to the first one.

assume we get a 512-dimensional vector at the class label ([CLS]) position; then all we need to train on top is a 512 × 3 matrix (one column per label we need to guess).
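A minimal PyTorch sketch of that classification head, keeping the note's assumed 512-dimensional [CLS] vector (BERT-base actually outputs 768 dims); the pretrained encoder is stubbed out with random tensors here.

```python
import torch
import torch.nn as nn

HIDDEN, NUM_LABELS = 512, 3                  # 512 follows the note's assumption

# the only new weights to train: a HIDDEN x 3 matrix (plus bias)
classifier = nn.Linear(HIDDEN, NUM_LABELS)

# stand-in for [CLS] outputs from the pretrained encoder, batch of 8 sentence pairs
cls_vectors = torch.randn(8, HIDDEN)
logits = classifier(cls_vectors)             # (8, 3): entailment / contradiction / neutral scores

targets = torch.tensor([0, 1, 2, 0, 1, 2, 0, 1])
loss = nn.functional.cross_entropy(logits, targets)
```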

c) SQuAD = a trainable start vector and end vector act like queries in an attention model, and every paragraph token has its own output vector

we take the inner product of each token vector with the start and end vectors, then a softmax over the token positions.

that is how the model guesses where the answer span starts and ends.
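A sketch of that span-prediction head: one learned start vector and one learned end vector are dotted with every token's output vector, then a softmax over positions picks the most likely start and end (all sizes below are illustrative).

```python
import torch
import torch.nn as nn

HIDDEN, SEQ_LEN, BATCH = 512, 384, 8          # illustrative sizes only

# the only extra parameters: column 0 = start vector, column 1 = end vector
span_head = nn.Linear(HIDDEN, 2)

token_vectors = torch.randn(BATCH, SEQ_LEN, HIDDEN)                 # stand-in for encoder outputs
start_logits, end_logits = span_head(token_vectors).unbind(dim=-1)  # inner product per token

# softmax over token positions gives P(token i is the start / end of the answer)
start_probs = start_logits.softmax(dim=-1)
end_probs = end_logits.softmax(dim=-1)
pred_start, pred_end = start_probs.argmax(-1), end_probs.argmax(-1)
```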