Posts in the 'AI/Andrej Karpathy' category

AI/Andrej Karpathy (7 posts)

Let's build GPT: from scratch, in code, spelled out.

```python
# here are all the unique characters that occur in this text
chars = sorted(list(set(text)))
vocab_size = len(chars)
print(''.join(chars))
print(vocab_size)
```

Output (the first character in the set is a newline, so the printed string begins on a new line):

```

 !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
65
```

This extracts only the characters that appear in the small Shakespeare text (a few pages of passages).

```python
# create a mapping from characters to integers
stoi = { ch:i for i,ch in enumerate(chars) }
itos = { i:ch fo..
```
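A minimal sketch of where the `stoi`/`itos` mappings lead: encoding a string to integer ids and decoding it back. The short `text` string is a stand-in for the Shakespeare data, and the `encode`/`decode` helper names are this sketch's own choice.

```python
# Character-level tokenizer sketch (text is a stand-in, not the real dataset)
text = "hello world"

chars = sorted(set(text))                      # unique characters, sorted
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> int
itos = {i: ch for i, ch in enumerate(chars)}   # int -> char

def encode(s):
    # map each character to its integer id
    return [stoi[c] for c in s]

def decode(ids):
    # map ids back to characters and join them
    return ''.join(itos[i] for i in ids)

ids = encode("hello")
print(ids)          # [3, 2, 4, 4, 5]
print(decode(ids))  # hello
```

Because `itos` is the exact inverse of `stoi`, `decode(encode(s)) == s` for any string built from characters in the vocabulary; a character outside it would raise a `KeyError`.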

AI/Andrej Karpathy 2023.11.26

Building makemore Part 5: Building a WaveNet

The plot of lossi is very hard to read:

```python
plt.plot(lossi)
```

Instead:

```python
plt.plot(torch.tensor(lossi).view(-1, 1000).mean(1))
```

Reshaping lossi to (200, 1000), i.e. 200 rows of 1000 steps each, and averaging along each row smooths out the noise, so you can see much more clearly where the drop in loss from the learning-rate decay occurs.

```python
class Embedding:

  def __init__(self, num_embeddings, embedding_dim):
    self.weight = torch.randn((num_embeddings, embedding_dim))

  def __call__(self, IX):
    self.out = self.weight[IX]
    return self.out

  def parameters(self):
    return [se..
```
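The row-averaging trick can be sketched in plain Python so it runs without PyTorch: split the per-step losses into consecutive, non-overlapping chunks and average each one, which is what `view(-1, 1000).mean(1)` does on a tensor whose length is a multiple of 1000. The `chunk_means` helper and the sample numbers are hypothetical.

```python
def chunk_means(values, chunk):
    """Average consecutive, non-overlapping chunks, like tensor.view(-1, chunk).mean(1)."""
    if len(values) % chunk != 0:
        # view(-1, chunk) would also fail if the length is not divisible
        raise ValueError("length must be divisible by chunk size")
    return [sum(values[i:i + chunk]) / chunk for i in range(0, len(values), chunk)]

lossi = [2.0, 4.0, 1.0, 3.0, 0.5, 0.5]  # made-up per-step losses
print(chunk_means(lossi, 2))             # [3.0, 2.0, 0.5]
```

With 200,000 logged steps and `chunk=1000`, this yields the 200 points described above.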

AI/Andrej Karpathy 2023.08.15

Building makemore Part 4: Becoming a Backprop Ninja

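```python
logprobs = probs.log()
loss = -logprobs[range(n), Yb].mean()
```

logprobs is a tensor of shape [32, 27]. -logprobs[range(n), Yb] iterates over the 32 rows and, in each row, indexes only the column given by Yb, so its shape is [32] (the batch size is 32).

With just three picked elements, the loss and its derivative are

$$\text{loss} = -\frac{a+b+c}{3} = -\frac{1}{3}a - \frac{1}{3}b - \frac{1}{3}c, \qquad \frac{\partial\,\text{loss}}{\partial a} = -\frac{1}{3}$$

and in general, for each picked element,

$$\frac{\partial\,\text{loss}}{\partial\,\text{logprobs}[i, Y_b[i]]} = -\frac{1}{n}$$

```python
dlogprobs = torch.zeros_like(logprobs)
dlogprobs[range(n), Yb] = -1.0/n
```

Every other entry of logprobs does not affect the loss, so its gradient stays 0. -logprobs[range(n), Yb], and the mean over this..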
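A quick numerical check of the -1/n result, sketched in plain Python on a made-up 3×3 `logprobs` table (so n = 3 instead of 32): the analytic gradient puts -1/n at each `[i, Yb[i]]` position and 0 elsewhere, and a central finite difference should agree. The helper names are hypothetical.

```python
def loss_fn(logprobs, Yb):
    # loss = -mean of logprobs[i][Yb[i]] over the batch
    n = len(Yb)
    return -sum(logprobs[i][Yb[i]] for i in range(n)) / n

def analytic_grad(logprobs, Yb):
    # -1/n at each picked position, 0 everywhere else
    n = len(Yb)
    grad = [[0.0] * len(row) for row in logprobs]
    for i in range(n):
        grad[i][Yb[i]] = -1.0 / n
    return grad

def numeric_grad(logprobs, Yb, h=1e-6):
    # central finite difference on every entry
    grad = [[0.0] * len(row) for row in logprobs]
    for i, row in enumerate(logprobs):
        for j in range(len(row)):
            old = row[j]
            row[j] = old + h
            up = loss_fn(logprobs, Yb)
            row[j] = old - h
            down = loss_fn(logprobs, Yb)
            row[j] = old  # restore the original value
            grad[i][j] = (up - down) / (2 * h)
    return grad

logprobs = [[-1.2, -0.3, -2.5],
            [-0.7, -1.9, -0.9],
            [-3.0, -0.1, -1.4]]  # made-up log-probabilities
Yb = [1, 0, 2]                    # made-up targets
ana = analytic_grad(logprobs, Yb)
num = numeric_grad(logprobs, Yb)
max_err = max(abs(ana[i][j] - num[i][j]) for i in range(3) for j in range(3))
print(ana[0])          # [0.0, -0.3333333333333333, 0.0]
print(max_err < 1e-6)  # True
```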

AI/Andrej Karpathy 2023.02.26

Building makemore Part 3: Activations & Gradients, BatchNorm

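Fixing the initial loss

The weight initialization is wrong to begin with. The initial loss currently comes out at about 27, but at the very first training step any of the 27 characters is equally plausible, so at minimum we should expect the model to predict a uniform distribution:

```python
-torch.tensor(1/27.0).log()
# tensor(3.2958)
```

An initial loss of about 3.3 means the initialization is reasonable; a loss well above that means the network is doing worse than simply guessing uniformly at random. The wider the range of values the logits can take, the more easily the loss blows up, so we want the logits to be close to 0 at initialization.

```python
# MLP revisited
n_embd = 10 # the dimensionality o..
```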
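A plain-Python sketch of why logits near 0 give the expected initial loss: all-zero logits make the softmax exactly uniform, so the cross-entropy equals -log(1/27) = log 27 ≈ 3.2958, while the same logit pattern scaled up gives a much larger loss. The `cross_entropy` helper and the logit values are illustrative assumptions, not the model from the post.

```python
import math

def cross_entropy(logits, target):
    # numerically stable: log(sum(exp(logits))) - logits[target]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

vocab = 27
print(-math.log(1 / vocab))               # about 3.2958
print(cross_entropy([0.0] * vocab, 0))    # same value: uniform softmax

big = [10.0 * (i % 3 - 1) for i in range(vocab)]  # logits in {-10, 0, 10}
small = [0.01 * x for x in big]                   # same pattern, near 0
print(cross_entropy(big, 0))    # large loss (about 22.2)
print(cross_entropy(small, 0))  # close to log 27
```

Scaling the logits toward 0 (for example by shrinking the last layer's weights) is what keeps the first-step loss near the uniform baseline.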

AI/Andrej Karpathy 2023.02.19

Building makemore Part 2: MLP

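Right now this is a bigram model, so the table has only 27 rows, but if we predict the next character from the previous 2 or 3 characters, it would need 27² = 729 or 27³ = 19,683 rows, and W.shape grows absurdly large.

https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf : this paper (Bengio et al., 2003) does word-level modeling while we do character-level modeling, but the same idea applies. Each word is embedded as a 30-dimensional vector, and words with similar meanings end up close together in the embedding space. The model predicts the next word from the previous three words. C.shape is (17000, 30), where 17000 is the total number of words (the vocabulary size) and 30 is the embeddin..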
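A sketch of the embedding lookup idea in plain Python, using the character setting rather than words: a table `C` with one small vector per character is indexed once per context position and the vectors are concatenated. The sizes (`n_embd = 2`, a context of 3) and the '.'=0, a=1 through z=26 index mapping are assumptions for illustration.

```python
import random

vocab_size = 27   # 26 letters + '.' boundary token (assumption)
block_size = 3    # number of previous characters used as context
n_embd = 2        # embedding dimension (hypothetical choice)

random.seed(42)
# C: one row of n_embd numbers per character, i.e. shape (vocab_size, n_embd)
C = [[random.gauss(0, 1) for _ in range(n_embd)] for _ in range(vocab_size)]

def embed_context(context):
    """Look up each index in C and concatenate: block_size * n_embd features."""
    out = []
    for ix in context:
        out.extend(C[ix])
    return out

x = embed_context([0, 5, 13])  # '.', 'e', 'm' under the assumed mapping
print(len(x))                  # 6

# rows of a count table vs. size of C, as the context length k grows
for k in (1, 2, 3):
    print(k, vocab_size ** k, vocab_size * n_embd)
```

The count-table rows grow as 27^k with the context length k, while the size of C alone stays at 27 × n_embd (the rest of the MLP adds its own weights on top).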

AI/Andrej Karpathy 2023.02.04

The spelled-out intro to language modeling: building makemore

```
['emma', 'olivia', 'ava', 'isabella', 'sophia', 'charlotte', 'mia', 'amelia', 'harper', 'evelyn']
```

```python
b = {}
for w in words:
  chs = ['<S>'] + list(w) + ['<E>']
  for ch1, ch2 in zip(chs, chs[1:]):
    bigram = (ch1, ch2)
    b[bigram] = b.get(bigram, 0) + 1
sorted(b.items(), key = lambda kv: -kv[1])
```

```
[(('n', '<E>'), 6763), (('a', '<E>'), 6640), (('a', 'n'), 5438), (('<S>', 'a'), 4410), (('e', '<E>'), 3983), ...
```

For each name in names.txt, start..
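A sketch of the same bigram count on just the ten names listed in this excerpt, using `collections.Counter` in place of the plain dict, plus normalizing the `'<S>'` row into a probability distribution over first characters. Using only ten names is an assumption to keep the example self-contained; the counts in the post come from the full names.txt.

```python
from collections import Counter

words = ['emma', 'olivia', 'ava', 'isabella', 'sophia',
         'charlotte', 'mia', 'amelia', 'harper', 'evelyn']

b = Counter()
for w in words:
    # wrap each name in start/end tokens so first and last characters are counted
    chs = ['<S>'] + list(w) + ['<E>']
    for ch1, ch2 in zip(chs, chs[1:]):
        b[(ch1, ch2)] += 1

print(b.most_common(3))

# normalize the '<S>' row into P(first character)
start = {ch2: cnt for (ch1, ch2), cnt in b.items() if ch1 == '<S>'}
total = sum(start.values())
probs = {ch: cnt / total for ch, cnt in start.items()}
print(probs['a'])  # 0.2
```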

AI/Andrej Karpathy 2023.01.24

The spelled-out intro to neural networks and backpropagation: building micrograd

```python
a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10.0, label='c')
e = a*b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'
L
```

We currently have dL/dd = -2 (since L = d * f, dL/dd = f = -2). To get dL/de via the chain rule:

dL/de = (dL/dd) * (dd/de)

In words: the change in L when e changes = the change in L when d changes × the change in d when e changes (the local derivative). Here + acts as a router: it simply passes the incoming dL/dd, unchanged, to each of its inputs. Currently the e, c local deri..
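To see the chain rule and the "+ as router" behaviour end to end, here is a minimal scalar autograd sketch that supports only `+` and `*`. It is a simplified stand-in, not micrograd's actual source; the `_prev`/`_backward` attribute names are this sketch's own.

```python
class Value:
    """Tiny scalar autograd node (simplified sketch, supports + and * only)."""

    def __init__(self, data, children=(), label=''):
        self.data = data
        self.grad = 0.0
        self.label = label
        self._prev = children
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # '+' routes the incoming gradient unchanged to both inputs
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # local derivative of x*y w.r.t. x is y, and vice versa
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological order so each node's grad is complete before it propagates
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10.0, label='c')
e = a*b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'
L.backward()
print(L.data, d.grad, e.grad, c.grad, a.grad, b.grad, f.grad)
# -8.0 -2.0 -2.0 -2.0 6.0 -4.0 4.0
```

dL/dd = -2 flows through + unchanged to both e and c, and the * node for e = a*b then scales it by the other input: dL/da = -2 × b = 6 and dL/db = -2 × a = -4.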

AI/Andrej Karpathy 2023.01.23
