Launching fine-tuning under pdb+gdb

Start debugging on the 245 machine:

conda activate torch_new_env
cd /home/dell/sdb/LLaMA-Factory
export FORCE_TORCHRUN=1

CUDA_VISIBLE_DEVICES=0 python3 $(which llamafactory-cli) train \
  --stage sft \
  --do_train \
  --model_name_or_path /home/dell/sdb/.cache/Qwen2-0___5B-Instruct \
  --dataset identity \
  --dataset_dir ./data \
  --template qwen \
  --finetuning_type freeze \
  --output_dir /home/dell/sdb/saves/Qwen2-0___5B-Instruct/freeze/sft \
  --overwrite_cache \
  --overwrite_output_dir \
  --cutoff_len 1024 \
  --preprocessing_num_workers 16 \
  --per_device_train_batch_size 2 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --lr_scheduler_type cosine \
  --logging_steps 50 \
  --warmup_steps 20 \
  --save_steps 100 \
  --eval_steps 50 \
  --evaluation_strategy steps \
  --load_best_model_at_end \
  --learning_rate 5e-5 \
  --num_train_epochs 5.0 \
  --max_samples 1000 \
  --val_size 0.1 \
  --plot_loss \
  --fp16 \
  --deepspeed examples/deepspeed/ds_z3_offload_config.json
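The command above sets FORCE_TORCHRUN=1. This sketch assumes that recent LLaMA-Factory releases then have llamafactory-cli relaunch training through torchrun in a child process. That is an assumption about the installed version, so check its CLI source to confirm.

If it holds, the PID of the command you started belongs to a launcher, not to the process running the training loop. The hypothetical helper below is not part of LLaMA-Factory. It is a minimal, Linux-only sketch that lists the direct children of a PID by reading /proc. Calling it repeatedly walks down the tree from the launcher to torchrun to the worker.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LLaMA-Factory): print the PIDs of the direct
# children of a process, one per line, by scanning /proc/<pid>/stat (Linux only).
child_pids() {
  local parent="${1:-}" stat_file line rest
  for stat_file in /proc/[0-9]*/stat; do
    # A process can exit between the glob and the read; skip it if so.
    line=$(cat "$stat_file" 2>/dev/null) || continue
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # so keep only the text after the last ") " before splitting.
    rest=${line##*") "}
    # Word splitting is intended here.
    set -- $rest
    # After comm, the fields are: $1 = state, $2 = parent PID.
    if [ "${2:-}" = "$parent" ]; then
      # The first field of the stat line is the PID itself.
      echo "${line%% *}"
    fi
  done
}

# Example: child_pids 12345
```

For example, `child_pids 12345` prints one PID per line. Feed each result back in until you reach the python process that runs the training loop, then attach gdb to that PID.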

Open a second terminal window, find the running process, then attach gdb to it (replace pid with the PID shown by ps):

ps aux | grep python
sudo gdb python pid
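gdb can also attach, dump the stacks and detach in one step, without an interactive session. The hypothetical dump_backtraces helper below is not part of gdb or LLaMA-Factory; it is a sketch of that approach:

- `-p` attaches to a running PID.
- `-batch` runs the `-ex` commands and then exits, detaching from the process.
- `thread apply all bt` prints native backtraces for every thread.

sudo is kept from the command above, because attaching to a process that is not your own child is usually restricted by ptrace permissions. If the CPython gdb extensions are available, adding `-ex "py-bt"` also prints the Python-level stack. That is an assumption: these extensions are not always installed alongside a conda Python.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gdb or LLaMA-Factory): attach to a running
# process, print the backtrace of every thread, then detach so it keeps running.
dump_backtraces() {
  local pid="${1:-}"
  # Refuse empty or non-numeric input before invoking gdb.
  case "$pid" in
    '' | *[!0-9]*)
      echo "usage: dump_backtraces <pid>" >&2
      return 2
      ;;
  esac
  # -p: attach to PID; -batch: run the -ex commands, then exit (detaching).
  sudo gdb -p "$pid" -batch -ex "thread apply all bt"
}

# Example: dump_backtraces 12345
```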

http://sjx.com/2024/11/27/pdb启动微调-1/

Author: sjx

Published: November 27, 2024