pmetal pretrain

Run full-parameter pretraining from tokenized binary shards. This is separate from LoRA/QLoRA fine-tuning and is designed for packed text corpora.
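
To make "packed" concrete, here is a minimal sketch of sequence packing in Python. It is not pmetal's implementation: the function name `pack_documents`, the choice to drop the trailing partial sequence, and the use of plain Python lists instead of `.bin` shards are all assumptions made for illustration. The only details taken from this page are that an EOS token separates documents and that sequences have a fixed length.

```python
from typing import Iterable, List


def pack_documents(
    docs: Iterable[List[int]],
    seq_len: int,
    eos_token_id: int = 0,
) -> List[List[int]]:
    """Concatenate tokenized documents, separated by EOS, into fixed-length rows.

    Hypothetical helper for illustration; pmetal's real packing may differ.
    """
    stream: List[int] = []
    for i, doc in enumerate(docs):
        if i > 0:
            stream.append(eos_token_id)  # EOS goes between documents
        stream.extend(doc)

    # Assumption: a trailing chunk shorter than seq_len is dropped.
    n_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


docs = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
packed = pack_documents(docs, seq_len=4, eos_token_id=0)
print(packed)  # [[5, 6, 7, 0], [8, 9, 0, 10]]
```

Packing avoids padding: every position in a training sequence holds a real token or a separator, so a row can span document boundaries.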

pmetal pretrain \
--arch <ARCH> \
--shards <SHARD_GLOB_OR_LIST> \
[OPTIONS]

For example, tokenize a corpus into shards, then pretrain on them:
pmetal tokenize \
--input corpus.jsonl \
--output ./shards \
--tokenizer Qwen/Qwen3-0.6B
pmetal pretrain \
--arch qwen \
--shards "./shards/*.bin" \
--seq-len 2048 \
--batch-size 4 \
--steps 10000 \
--output ./pretrain-output
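
As a rough sizing aid, the token budget implied by these flags can be estimated as below. This is a back-of-the-envelope sketch, assuming every sequence is fully packed, training runs on a single device, and `--steps` counts optimizer steps; none of these are confirmed by this page. The helper name `tokens_per_step` is hypothetical.

```python
def tokens_per_step(seq_len: int, batch_size: int, grad_accum_steps: int = 1) -> int:
    """Tokens consumed per optimizer step, assuming fully packed sequences."""
    return seq_len * batch_size * grad_accum_steps


# Values from the example command; gradient accumulation left at 1.
per_step = tokens_per_step(seq_len=2048, batch_size=4)
total = per_step * 10_000  # --steps 10000

print(per_step)  # 8192
print(total)     # 81920000
```

Under these assumptions the example run sees about 82 million tokens, so the tokenized corpus should be at least that large to avoid repeating data.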
| Parameter | Default | Description |
| --- | --- | --- |
| --arch | required | Model architecture, such as llama, qwen, gemma, mistral, phi, or gpt-oss |
| --shards | required | Glob pattern or comma-separated list of .bin shard files |
| --seq-len | 2048 | Packed sequence length |
| --batch-size | 4 | Batch size |
| --steps | 10000 | Training steps |
| --learning-rate | 3e-4 | Peak learning rate |
| --min-lr | 1e-5 | Cosine floor learning rate |
| --warmup-steps | 1000 | Linear warmup steps |
| --lr-schedule | cosine | constant, linear, or cosine |
| --weight-decay | 0.1 | AdamW weight decay |
| --max-grad-norm | 1.0 | Gradient clipping; 0 disables |
| --eos-token-id | 0 | EOS token inserted between documents |
| --output | ./pretrain-output | Output and checkpoint directory |
| --checkpoint-every | 1000 | Save checkpoint every N steps; 0 disables |
| --resume | | Resume from checkpoint directory |
| --model-config | | Model config JSON overriding architecture defaults |
| --z-loss | 0.0 | MoE router z-loss coefficient |
| --gradient-accumulation-steps | 1 | Effective batch multiplier |
| --log-every | 10 | Log every N steps |
| --eval-every | 0 | Evaluate every N steps; 0 disables |
| --eval-batches | 10 | Batches per eval round |
| --seed | 42 | Random seed |
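
The learning-rate flags describe a linear warmup to a peak followed by cosine decay to a floor. The sketch below shows one common way such a schedule is computed, using the defaults from the table. It is an assumption that pmetal uses this exact formula (for instance, whether warmup starts at zero, or whether decay spans all remaining steps); the function `lr_at` is hypothetical and not part of pmetal.

```python
import math


def lr_at(
    step: int,
    peak_lr: float = 3e-4,      # --learning-rate
    min_lr: float = 1e-5,       # --min-lr
    warmup_steps: int = 1000,   # --warmup-steps
    total_steps: int = 10000,   # --steps
) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay down to min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * step / warmup_steps

    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)

    # Cosine factor falls smoothly from 1 (at peak) to 0 (at the floor).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


for s in (0, 500, 1000, 5500, 10000):
    print(s, f"{lr_at(s):.3e}")
```

With the defaults, this sketch reaches the peak of 3e-4 at step 1000, passes through the midpoint between peak and floor at step 5500, and ends at 1e-5 on the final step.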