
Attention is All You Need

Tony Lim 2021. 11. 21. 11:14

unlike RNNs and LSTMs, attention lets us somehow tell the decoder where in the source it should focus

 

positional encoding = gives the network a significant performance boost (attention by itself has no notion of token order)
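
A minimal NumPy sketch of the paper's sinusoidal positional encoding (assuming an even d_model; the function and variable names are mine):

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding from the paper:
    PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
    Assumes d_model is even.
    """
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]    # (1, d_model/2)
    angles = pos / np.power(10000, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)             # odd dimensions get cosine
    return pe

# added to the token embeddings before the first layer:
# x = token_embeddings + positional_encoding(seq_len, d_model)
```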

 

top-right attention block (encoder-decoder attention) = 3 connections going into it: key and value come from the encoder (the encoding of the source sentence), query comes from the decoder (the encoding of the target sentence)

dot product of the query with each key = gives the angle between the two vectors; if both point in the same direction, the result is large

by doing a softmax on the dot products we can index V2 (the value for K2), the most relevant source token
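
This is the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V; a minimal NumPy sketch (function and variable names are mine):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # dot products, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
    return weights @ V                                  # weighted sum of values
```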

V is a bunch of information we might find interesting about the source, and K is a representation (index, address) of each value

Query = "I would like to know a certain thing," like a name or a height; we find it by taking the dot product with the keys, and after the softmax we get the matching V (value: [name, height, ...])
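
A toy version of this soft lookup (the keys, values, and query below are made-up illustrative numbers, not from the paper):

```python
import numpy as np

K = np.array([[1.0, 0.0],    # K1: address of V1
              [0.0, 1.0]])   # K2: address of V2
V = np.array([[10.0, 0.0],   # V1: one kind of information, e.g. "name"
              [0.0, 20.0]])  # V2: another kind, e.g. "height"
q = np.array([0.1, 0.9])     # query pointing mostly in K2's direction

scores = K @ q / np.sqrt(2)                       # dot product with each key
weights = np.exp(scores) / np.exp(scores).sum()   # softmax
print(weights)      # ~[0.36, 0.64] -> K2 matches the query best
print(weights @ V)  # ~[3.6, 12.8]  -> output dominated by V2
```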