
# pmetal quantize

Quantizes a model to GGUF format for efficient inference. Supports an importance matrix for quality-preserving quantization.

```shell
pmetal quantize \
  --model <MODEL> \
  --output <OUTPUT_FILE> \
  --type <QUANT_TYPE> \
  [OPTIONS]
```
```shell
# 4-bit quantization
pmetal quantize \
  --model ./output \
  --output model.gguf --type q4km

# With importance matrix
pmetal quantize \
  --model ./output \
  --output model.gguf --type q4km \
  --imatrix calibration.jsonl

# Dynamic per-layer quantization
pmetal quantize \
  --model ./output \
  --output model.gguf --type dynamic
```
| Format  | Description             |
| ------- | ----------------------- |
| dynamic | Auto-select per layer   |
| q8_0    | 8-bit quantization      |
| q6k     | 6-bit k-quant           |
| q5km    | 5-bit k-quant (medium)  |
| q5ks    | 5-bit k-quant (small)   |
| q4km    | 4-bit k-quant (medium)  |
| q4ks    | 4-bit k-quant (small)   |
| q3km    | 3-bit k-quant (medium)  |
| q3ks    | 3-bit k-quant (small)   |
| q3kl    | 3-bit k-quant (large)   |
| q2k     | 2-bit k-quant           |
| f16     | Float16                 |
| f32     | Float32                 |
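To build intuition for what these formats do, here is a toy sketch of block-wise 8-bit quantization in the spirit of `q8_0` (not pmetal's actual implementation): weights are split into fixed-size blocks, and each block stores one floating-point scale plus int8 codes. The k-quant formats refine this idea with nested scales and smaller bit widths.

```python
import numpy as np

def quantize_q8_block(weights, block_size=32):
    """Block-wise 8-bit quantization sketch: per block of 32 values,
    store one float32 scale and int8 codes in [-127, 127]."""
    w = np.asarray(weights, dtype=np.float32).reshape(-1, block_size)
    # Per-block scale maps the largest magnitude onto the int8 range.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero on all-zero blocks
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_q8_block(q, scale):
    """Recover approximate float weights from codes and per-block scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
q, s = quantize_q8_block(w)
w_hat = dequantize_q8_block(q, s)
print("max per-element error:", float(np.abs(w - w_hat).max()))
```

The reconstruction error is bounded by half the block scale, which is why outlier weights within a block hurt quality; importance-matrix quantization mitigates this by weighting the rounding toward values that matter most for model output.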