We can apply filters and let each one represent what we want; for example, the first filter might learn to detect a "polite thing", and so on.
We have 2 channels for each kernel size (2, 3, 4), so for an input of length 7 the output feature maps have lengths (6, 5, 4). We take 1 max-pooled value from each channel, concatenate everything, and put the result into a softmax to tell positive from negative.
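A minimal PyTorch sketch of this setup (the vocabulary size, embedding dimension, and two output classes are illustrative assumptions, not lecture values):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNN(nn.Module):
    """Sentence classifier: one conv per kernel size (2, 3, 4) with 2
    channels each, max-over-time pooling, concatenation, then softmax."""
    def __init__(self, vocab_size=10000, embed_dim=128,
                 kernel_sizes=(2, 3, 4), channels=2, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, channels, k) for k in kernel_sizes])
        self.fc = nn.Linear(channels * len(kernel_sizes), num_classes)

    def forward(self, x):                       # x: (batch, seq_len)
        e = self.embed(x).transpose(1, 2)       # (batch, embed_dim, seq_len)
        # each conv yields (batch, channels, seq_len - k + 1);
        # max-over-time pooling keeps 1 value per channel
        pooled = [F.relu(c(e)).max(dim=2).values for c in self.convs]
        z = torch.cat(pooled, dim=1)            # (batch, 6) here
        return self.fc(z)                       # logits for positive/negative

model = SentenceCNN()
logits = model(torch.randint(0, 10000, (1, 7)))  # length-7 sentence
print(logits.shape)                              # torch.Size([1, 2])
```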
Regularization
Dropout: create a masking vector r of Bernoulli random variables, each equal to 1 with probability p, and multiply it element-wise with the feature vector z during training.
This prevents overfitting. At test time we don't compute r * z; instead we scale the final weights W by p, so the expected activations match training.
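A minimal sketch of that train/test asymmetry, where p is the probability of keeping a unit (the tensor shapes are illustrative):

```python
import torch

p = 0.5                                    # probability a unit is kept
z = torch.randn(6)                         # pooled feature vector
W = torch.randn(2, 6)                      # final softmax weights

# training: sample a Bernoulli mask r and drop features element-wise
r = torch.bernoulli(torch.full_like(z, p))
train_logits = (r * z) @ W.T

# test: no mask; scale W by p so expected activations match training
test_logits = z @ (p * W).T
```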
Shortcut connection
The first one's semantic meaning is to learn the deviation from doing nothing: the output is F(x) + x, so the block only has to model the difference from simply passing x through.
The second one's semantic meaning is a bit more complex.
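A sketch of both connections, under the assumption that the second one is the gated highway-style block the lecture shows next to the residual block (the channel count and kernel size are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """x + F(x): the conv only has to learn the deviation from identity."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, 3, padding=1)

    def forward(self, x):
        return x + F.relu(self.conv(x))

class HighwayBlock(nn.Module):
    """t * F(x) + (1 - t) * x: a learned gate t mixes the transformed
    signal with the untouched input, hence the more complex semantics."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, 3, padding=1)
        self.gate = nn.Conv1d(channels, channels, 3, padding=1)

    def forward(self, x):
        t = torch.sigmoid(self.gate(x))
        return t * F.relu(self.conv(x)) + (1 - t) * x
```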
Batch Normalization
It takes the conv block's output and applies a Z-transform to it (per-channel standardization over the batch), which tends to keep activations on the same scale.
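A quick sketch checking that batch norm at training time (with its default scale and shift) really is a per-channel Z-transform over the batch:

```python
import torch
import torch.nn as nn

x = torch.randn(8, 2, 6) * 5 + 3       # conv output: (batch, channels, length)
bn = nn.BatchNorm1d(2)                 # fresh module, training mode

y = bn(x)
# manual Z-transform of channel 0 across batch and length positions
z0 = (x[:, 0] - x[:, 0].mean()) / x[:, 0].std(unbiased=False)
print(torch.allclose(y[:, 0], z0, atol=1e-4))  # True, up to BN's eps
```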
1*1 Convolutions
We can use it to reduce dimensionality: the 1*1 kernel mixes channels at each position, shrinking the feature-map depth without changing the sequence length.

See also: pytorchZeroToAll) CNN, Advanced CNN (inception) (tonylim.tistory.com)
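A minimal sketch of that depth reduction (the channel counts are illustrative):

```python
import torch
import torch.nn as nn

# a 1*1 convolution mixes channels at each position independently:
# here it shrinks 64 feature maps down to 16 at every time step
reduce = nn.Conv1d(in_channels=64, out_channels=16, kernel_size=1)

x = torch.randn(1, 64, 7)      # (batch, channels, length)
print(reduce(x).shape)         # torch.Size([1, 16, 7]): depth cut, length kept
```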