Deploying nanoGPT

Installing Docker

sudo docker --version
sudo docker run hello-world
docker ps -a   # list all containers, including stopped ones
docker ps      # list only running containers
  • A container is a virtual, runnable instance created from an image (a sketch of starting a nanoGPT container this way follows below).
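The training log in the next section starts from a shell inside a container at /workspace. The commands below are a minimal sketch of one way to get there, assuming the official pytorch/pytorch image, a host with the NVIDIA container toolkit installed so --gpus works, and the dependency list from the nanoGPT README; the image tag, container name, and clone location are assumptions, so adjust them to your setup.

# Pull a CUDA-enabled PyTorch image and start an interactive container with GPU access
sudo docker pull pytorch/pytorch
sudo docker run -it --gpus all --name nanogpt pytorch/pytorch bash

# Inside the container: install git, fetch nanoGPT, and prepare the
# character-level Shakespeare dataset (writes train.bin, val.bin, meta.pkl)
apt-get update && apt-get install -y git
cd /workspace && git clone https://github.com/karpathy/nanoGPT.git .
pip install numpy transformers datasets tiktoken wandb tqdm
python data/shakespeare_char/prepare.py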

Running the Model

root@6ec195fa4e80:/workspace# python train.py config/train_shakespeare_char.py
Overriding config with config/train_shakespeare_char.py:
# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such

out_dir = 'out-shakespeare-char'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often

# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False

wandb_log = False # override via command line if you like
wandb_project = 'shakespeare-char'
wandb_run_name = 'mini-gpt'

dataset = 'shakespeare_char'
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256 # context of up to 256 previous characters

# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small

warmup_iters = 100 # not super necessary potentially

# on macbook also add
# device = 'cpu' # run on cpu only
# compile = False # do not torch compile the model

tokens per iteration will be: 16,384
found vocab_size = 65 (inside data/shakespeare_char/meta.pkl)
Initializing a new model from scratch
number of parameters: 10.65M
/workspace/train.py:196: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
scaler = torch.cuda.amp.GradScaler(enabled=(dtype == 'float16'))
num decayed parameter tensors: 26, with 10,740,096 parameters
num non-decayed parameter tensors: 13, with 4,992 parameters
using fused AdamW: True
compiling the model... (takes a ~minute)
step 0: train loss 4.2874, val loss 4.2823
iter 0: loss 4.2654, time 31455.46ms, mfu -100.00%
iter 10: loss 3.1462, time 30.46ms, mfu 12.23%
iter 20: loss 2.7322, time 30.10ms, mfu 12.25%
iter 30: loss 2.6184, time 30.09ms, mfu 12.26%
iter 40: loss 2.5757, time 30.33ms, mfu 12.26%
iter 50: loss 2.5249, time 30.35ms, mfu 12.26%
iter 60: loss 2.5143, time 30.17ms, mfu 12.27%
iter 70: loss 2.4947, time 30.19ms, mfu 12.28%
iter 80: loss 2.4936, time 30.05ms, mfu 12.29%
iter 90: loss 2.4679, time 30.50ms, mfu 12.28%
iter 100: loss 2.4594, time 30.74ms, mfu 12.27%
iter 110: loss 2.4667, time 30.73ms, mfu 12.25%
iter 120: loss 2.4262, time 30.29ms, mfu 12.26%
iter 130: loss 2.4127, time 30.42ms, mfu 12.26%
iter 140: loss 2.4148, time 30.33ms, mfu 12.26%
iter 150: loss 2.4139, time 30.26ms, mfu 12.27%
  • Training runs for 5,000 iterations; at each 250-iteration eval interval (and only when validation loss improves, since always_save_checkpoint = False) a ckpt.pt checkpoint is written to the out-shakespeare-char directory for later inference with sample.py (see the sampling sketch below).
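Once a checkpoint exists, inference is a separate step. A minimal sketch following the nanoGPT README; the --start and --num_samples overrides are optional and shown here only as examples:

# Sample from the trained character-level model
python sample.py --out_dir=out-shakespeare-char

# Optionally seed generation with a prompt and request several samples
python sample.py --out_dir=out-shakespeare-char --start="ROMEO:" --num_samples=3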
