Supported Models

Model-family support status for inference, embeddings, LoRA/QLoRA training, and direct architecture modules in PMetal.

PMetal supports a wide range of model architectures. Models are loaded from the Hugging Face Hub or from local safetensors files, and the architecture is detected automatically.

All causal language models below work with the CLI (pmetal infer), TUI, GUI, and SDK.

| Family | Architecture | Variants | model_type values |
|---|---|---|---|
| Llama | Llama | 2, 3, 3.1, 3.2, 3.3 | llama, llama3 |
| Llama 4 | Llama4 | Scout, Maverick | llama4 |
| Qwen 2 | Qwen2 | 2, 2.5 | qwen2, qwen2_5 |
| Qwen 3 | Qwen3 | 3 | qwen3 |
| Qwen 3 MoE | Qwen3MoE | 3-MoE | qwen3_moe |
| Qwen 3.5 | Qwen3Next | 3.5 (Next) | qwen3_next, qwen3_5 |
| DeepSeek | DeepSeek | V3, V3.2, V3.2-Speciale | deepseek, deepseek_v3 |
| Mistral | Mistral | 7B, Mixtral 8×7B | mistral, mixtral |
| Gemma | Gemma | 2, 3 | gemma, gemma2, gemma3 |
| Phi 3 | Phi | 3, 3.5 | phi, phi3 |
| Phi 4 | Phi4 | 4 | phi4 |
| Cohere | Cohere | Command R | cohere, command_r |
| Granite | Granite | 3.0, 3.1, Hybrid MoE | granite, granitehybrid |
| NemotronH | NemotronH | Hybrid (Mamba+Attention) | nemotron_h |
| GPT-OSS | GptOss | 20B, 120B | gpt_oss, gpt-oss |
| Gemma 4 | Gemma4 | 4 | gemma4, gemma4_text |

| Family | Architecture | Variants | model_type values |
|---|---|---|---|
| BERT | Bert | BERT, RoBERTa, DistilBERT, XLM-RoBERTa | bert, roberta, distilbert, xlm-roberta, xlm_roberta |
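
To make the model_type columns above concrete, here is a minimal sketch of a lookup from a model_type string to the architecture name listed in the tables. The function name `architecture_for` is hypothetical and the exact-match, case-sensitive behavior is an assumption; this illustrates the mapping the tables describe and is not PMetal's actual dispatcher code.

```rust
/// Hypothetical helper: maps a `model_type` string (as found in a model's
/// config) to the architecture name listed in the tables above.
/// Matching is exact and case-sensitive, which is an assumption of this
/// sketch rather than documented PMetal behavior.
pub fn architecture_for(model_type: &str) -> Option<&'static str> {
    let arch = match model_type {
        // Causal language models
        "llama" | "llama3" => "Llama",
        "llama4" => "Llama4",
        "qwen2" | "qwen2_5" => "Qwen2",
        "qwen3" => "Qwen3",
        "qwen3_moe" => "Qwen3MoE",
        "qwen3_next" | "qwen3_5" => "Qwen3Next",
        "deepseek" | "deepseek_v3" => "DeepSeek",
        "mistral" | "mixtral" => "Mistral",
        "gemma" | "gemma2" | "gemma3" => "Gemma",
        "phi" | "phi3" => "Phi",
        "phi4" => "Phi4",
        "cohere" | "command_r" => "Cohere",
        "granite" | "granitehybrid" => "Granite",
        "nemotron_h" => "NemotronH",
        "gpt_oss" | "gpt-oss" => "GptOss",
        "gemma4" | "gemma4_text" => "Gemma4",
        // Embedding models
        "bert" | "roberta" | "distilbert" | "xlm-roberta" | "xlm_roberta" => "Bert",
        // Anything else is not covered by the tables
        _ => return None,
    };
    Some(arch)
}
```

Under these assumptions, `architecture_for("qwen3_5")` returns `Some("Qwen3Next")`, and a value that appears in neither table returns `None`.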

| Architecture | LoRA | QLoRA | Notes |
|---|---|---|---|
| Llama | Yes | Yes | Covers Llama 2–3.3. Gradient checkpointing supported. |
| Llama 4 | Yes | Yes | Scout/Maverick support via DynamicLoraModel. |
| Qwen 2 | Yes | Yes | Uses Qwen3 LoRA implementation internally. |
| Qwen 3 | Yes | Yes | Gradient checkpointing supported. |
| Qwen 3 MoE | Yes | Yes | Sparse MoE support. |
| Qwen 3.5 (Next) | Yes | Yes | Hybrid architecture with nested text_config. |
| Gemma | Yes | Yes | GeGLU activation, special RMSNorm. |
| Gemma 4 | Yes | Yes | Multimodal-era Gemma text path. |
| Mistral | Yes | Yes | Sliding window attention support. |
| Phi 3/4 | Yes | Yes | Partial RoPE, fused gate_up projection. |
| DeepSeek | Yes | Yes | V3-family support. |
| Cohere | Yes | Yes | Command R support. |
| Granite | Yes | Yes | Dense and hybrid variants. |
| NemotronH | Yes | Yes | Hybrid architecture support. |
| GPT-OSS | Yes | Yes | MoE variants. |

Architecture Modules (Not Yet in Dispatcher)

These architectures are implemented in pmetal-models but are not yet wired into the DynamicModel dispatcher:

| Family | Module | Notes |
|---|---|---|
| Pixtral | pixtral | 12B vision-language |
| Qwen2-VL | qwen2_vl | 2B, 7B vision-language |
| MLlama | mllama | Llama 3.2-Vision |
| CLIP | clip | ViT-L/14 vision encoder |
| Whisper | whisper | Base–Large speech models |
| T5 | t5 | Encoder-decoder architecture |

These can be used directly via their Rust types (e.g., pmetal_models::architectures::pixtral::Pixtral).

| Family | Variants | Status |
|---|---|---|
| Flux | 1-dev, 1-schnell | Dispatcher + pipeline implemented |