pmetal train

Fine-tune a model with LoRA or QLoRA. The command runs supervised fine-tuning (SFT) on supported architectures, with automatic hardware detection and kernel tuning.

    pmetal train \
      --model <MODEL> \
      --dataset <DATASET> \
      --output <OUTPUT_DIR> \
      [OPTIONS]

    # Basic LoRA fine-tuning
    pmetal train \
      --model Qwen/Qwen3-0.6B \
      --dataset train.jsonl \
      --output ./output \
      --lora-r 16 --batch-size 4 --learning-rate 2e-4

    # QLoRA with 4-bit quantization
    pmetal train \
      --model meta-llama/Llama-3.2-1B \
      --dataset train.jsonl \
      --output ./output \
      --quantization nf4 --lora-r 16

    # Custom schedule
    pmetal train \
      --model Qwen/Qwen3-0.6B \
      --dataset train.jsonl \
      --lr-schedule cosine_with_restarts

    # From a config file
    pmetal train --config training.yaml

| Parameter | Default | Description |
| --- | --- | --- |
| --model | required | HuggingFace model ID or local path |
| --dataset | required | Path to training dataset (JSONL, Parquet, CSV) |
| --output | ./output | Output directory for weights and logs |
| --lora-r | 16 | LoRA rank |
| --lora-alpha | 32.0 | LoRA scaling factor (2× rank) |
| --batch-size | 1 | Micro-batch size |
| --learning-rate | 2e-4 | Learning rate |
| --max-seq-len | 0 | Max sequence length (0 = auto-detect) |
| --epochs | 1 | Number of training epochs |
| --max-grad-norm | 1.0 | Gradient clipping |
| --quantization | none | QLoRA method: nf4, fp4, int8 |
| --gradient-accumulation-steps | 4 | Gradient accumulation steps |
| --ane | false | Enable experimental ANE training when compiled with ane |
| --embedding-lr | None | Separate LR for embeddings |
| --no-metal-fused-optimizer | false | Disable Metal fused optimizer |
| --lr-schedule | cosine | constant, linear, cosine, cosine_with_restarts, polynomial, wsd |
| --no-gradient-checkpointing | false | Disable gradient checkpointing |
| --gradient-checkpointing-layers | 4 | Layers per checkpoint block |
| --warmup-steps | 0 | Learning rate warmup steps |
| --weight-decay | 0.01 | AdamW weight decay |
| --no-sequence-packing | false | Disable sequence packing |
| --pack-max-seq-len | | Override adaptive sequence-packing length |
| --cut-cross-entropy | false | Memory-efficient loss (avoids full logit materialization) |
| --eval-dataset | | Optional evaluation dataset |
| --log-metrics | | Write training metrics JSONL |
| --no-adaptive-lr | false | Disable automatic adaptive LR |
| --text-column | | Custom JSONL column name for training text |
| --text-columns | | Multi-column concat (comma-separated, e.g. thinking,solution) |
| --prompt-column | | Column for prompt (enables SFT loss masking) |
| --response-column | | Column for response (with prompt masking) |
| --column-separator | \n\n | Separator for --text-columns |
| --distributed-auto | false | Discover peers and run distributed training when compiled with distributed |
| --distributed-peers | | Explicit distributed peer addresses |
| --compression-strategy | none | Distributed gradient compression strategy |
| --config | | Path to YAML configuration file |
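
Two pairs of these options are usually read together: --batch-size with --gradient-accumulation-steps, and --lora-alpha with --lora-r. The sketch below uses the conventional definitions (effective batch size is the product of the first pair, and the LoRA update is scaled by alpha / r). Whether pmetal applies exactly these formulas is an assumption, and the function names are illustrative only, not part of pmetal.

```python
def effective_batch_size(batch_size: int, gradient_accumulation_steps: int) -> int:
    """Micro-batch rows that contribute to one optimizer step (conventional definition)."""
    return batch_size * gradient_accumulation_steps


def lora_scale(lora_alpha: float, lora_r: int) -> float:
    """Conventional LoRA scaling factor alpha / r (assumed, not confirmed for pmetal)."""
    return lora_alpha / lora_r


# Table defaults: --batch-size 1, --gradient-accumulation-steps 4
print(effective_batch_size(1, 4))  # 4

# Table defaults: --lora-alpha 32.0, --lora-r 16
print(lora_scale(32.0, 16))  # 2.0
```

Under these assumptions, changing --lora-r without adjusting --lora-alpha also changes the scale of the adapter update. With sequence packing enabled, a single micro-batch row can hold several short examples, so the number of examples per optimizer step may be higher than the product above.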

The training data format is auto-detected:

  • ShareGPT: {"conversations": [{"from": "human", "value": "..."}, ...]}
  • Alpaca: {"instruction": "...", "input": "...", "output": "..."}
  • OpenAI/Messages: {"messages": [{"role": "user", "content": "..."}, ...]}
  • Reasoning: {"problem": "...", "thinking": "...", "solution": "..."}
  • Simple: {"text": "..."}
  • Parquet: Standard text columns or reasoning formats
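
Before training, it can help to check which of the formats above each JSONL record resembles. The helper below is a hypothetical pre-flight check based only on the key names in the list; it is not pmetal's detection logic, and both the order of the checks and the treatment of Alpaca's "input" field as optional are assumptions.

```python
import json

# Key sets taken from the format list above; check order is an assumption.
FORMAT_KEYS = [
    ("sharegpt", {"conversations"}),
    ("messages", {"messages"}),
    ("alpaca", {"instruction", "output"}),  # "input" treated as optional here
    ("reasoning", {"problem", "thinking", "solution"}),
    ("simple", {"text"}),
]


def guess_format(record: dict) -> str:
    """Return the first format whose required keys are all present, else 'unknown'."""
    for name, keys in FORMAT_KEYS:
        if keys.issubset(record):
            return name
    return "unknown"


def check_jsonl(lines) -> set:
    """Parse JSONL lines and return the set of formats seen; raises on invalid JSON."""
    return {guess_format(json.loads(line)) for line in lines if line.strip()}


sample = [
    '{"instruction": "Add 2 and 2", "input": "", "output": "4"}',
    '{"text": "hello"}',
]
print(sorted(check_jsonl(sample)))  # ['alpaca', 'simple']
```

A result containing more than one format, or "unknown", suggests the file mixes schemas or needs an explicit column mapping.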

Use --text-column for arbitrary field names, or --text-columns to concatenate multiple columns:

    # Single custom column
    pmetal train --model ... --dataset data.jsonl --text-column response

    # Concatenate thinking + solution columns
    pmetal train --model ... --dataset data.jsonl \
      --text-columns thinking,solution --column-separator "\n\n"

    # SFT loss masking (only train on response, mask prompt)
    pmetal train --model ... --dataset data.jsonl \
      --prompt-column instruction --response-column output
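
To make the column options concrete, here is a conceptual sketch of what multi-column concatenation and prompt masking amount to. It is not pmetal's implementation: join_columns and loss_mask are hypothetical helpers, and whitespace splitting stands in for a real tokenizer.

```python
def join_columns(row: dict, columns: list, separator: str = "\n\n") -> str:
    """Concatenate the named columns in order, joined by the separator."""
    return separator.join(str(row[c]) for c in columns)


def loss_mask(prompt_tokens: list, response_tokens: list) -> list:
    """1 where a token contributes to the loss, 0 where it is masked (the prompt)."""
    return [0] * len(prompt_tokens) + [1] * len(response_tokens)


# Analogue of --text-columns thinking,solution with the default separator
row = {"thinking": "2 + 2 is 4.", "solution": "4"}
print(repr(join_columns(row, ["thinking", "solution"])))  # '2 + 2 is 4.\n\n4'

# Analogue of --prompt-column instruction --response-column output
example = {"instruction": "Add 2 and 2", "output": "4"}
mask = loss_mask(example["instruction"].split(), example["output"].split())
print(mask)  # [0, 0, 0, 0, 1]
```

The mask shows the intent of SFT loss masking: prompt tokens are still fed to the model as context, but only response tokens are counted in the loss.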

Training produces:

  • lora_weights.safetensors — LoRA adapter weights
  • training_metrics.jsonl — Per-step metrics log
  • checkpoint/ — Resumable checkpoints (if training is interrupted)
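
The metrics log can be inspected with a few lines of Python. The sketch below assumes only that training_metrics.jsonl holds one JSON object per line; the field name "loss" is a hypothetical placeholder, since the actual schema is not documented here.

```python
import json
from pathlib import Path


def load_metrics(path) -> list:
    """Read a JSONL metrics file into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize(records: list, key: str = "loss"):
    """Return (first, last, min) of a numeric field, or None if the field is absent.

    `key` defaults to "loss", a hypothetical field name; check the real keys first.
    """
    values = [r[key] for r in records if key in r]
    if not values:
        return None
    return values[0], values[-1], min(values)


# Usage (run after a training job has written the file):
#   records = load_metrics("./output/training_metrics.jsonl")
#   print(sorted(records[0]))   # inspect the actual field names
#   print(summarize(records))   # None if there is no "loss" field
```

Printing the keys of the first record is the safest first step, because it reveals the real field names before any of them are relied on.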