Powdered Metal

ML on Apple Silicon,
written in Rust

A complete machine learning platform — from custom Metal GPU kernels and Apple Neural Engine integration to high-level training APIs, a terminal TUI, and a full desktop GUI.

Terminal
$ pmetal train \
  --model Qwen/Qwen3-0.6B \
  --dataset train.jsonl \
  --lora-r 16 --batch-size 4 --learning-rate 2e-4
[████████████████████████████] 100% · 3 epochs · loss: 0.847
Saved LoRA weights to ./output/lora_weights.safetensors
$ pmetal infer --model Qwen/Qwen3-0.6B --lora ./output --chat
> Explain quantum entanglement
Quantum entanglement is a phenomenon where two particles
become correlated...

Custom Metal Kernels

FlashAttention, fused LoRA, fused SwiGLU, fused cross-entropy, and more — all tier-tuned for your specific Apple Silicon chip.


12 Training Methods

SFT, LoRA, QLoRA, DoRA, DPO, SimPO, ORPO, KTO, GRPO, DAPO, knowledge distillation, and ANE training — all from one toolkit.


GUI, TUI, CLI, SDK

A Tauri desktop app, 9-tab terminal interface, 21 CLI commands, Rust SDK with builder API, and Python bindings via PyO3.

Every Interface You Need

A full desktop GUI with 10 pages, a 9-tab terminal TUI, and 21 CLI commands. All backed by the same Rust SDK.

PMetal desktop GUI showing the training dashboard with live loss curves, model configuration, and dataset management

Desktop GUI — Tauri + Svelte with 10 pages for training, inference, merging, and quantization.

PMetal terminal TUI showing live training metrics with braille sparklines, device info, and job history

Terminal TUI — 9 tabs with live loss curves, device info, model search, and interactive chat.

Ship from Rust or Python

High-level builder APIs get you training and inferring in a few lines. Drop to lower-level crates when you need full control.

Rust

use pmetal::easy;

// Fine-tune with LoRA
let result = easy::finetune("Qwen/Qwen3-0.6B", "train.jsonl")
    .lora(16, 32.0)
    .learning_rate(2e-4)
    .epochs(3)
    .output("./output")
    .run()
    .await?;

// Inference with streaming
easy::infer("Qwen/Qwen3-0.6B")
    .temperature(0.7)
    .lora("./output/lora_weights.safetensors")
    .generate_streaming("Tell me a story", |delta| {
        print!("{delta}");
        true
    })
    .await?;
Python

import pmetal

# Fine-tune with sensible defaults
result = pmetal.finetune(
    "Qwen/Qwen3-0.6B",
    "train.jsonl",
    lora_r=16,
    learning_rate=2e-4,
    epochs=3,
)
print(f"Loss: {result['final_loss']}")

# Inference with LoRA adapter
text = pmetal.infer(
    "Qwen/Qwen3-0.6B",
    "Explain quantum entanglement",
    lora="./output/lora_weights.safetensors",
)
print(text)

Built for Performance

Custom Metal shaders, tier-aware kernel tuning, and an 18-crate architecture designed for Apple Silicon from the ground up.

Metal

FlashAttention

O(n) memory attention with fused softmax. Block sizes auto-tuned per chip tier and head dimension.
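The O(n) footprint rests on the online-softmax identity: keep a running max, normalizer, and weighted accumulator, and rescale the old state whenever a later score raises the max, so the full score row is never materialized. A scalar sketch of that recurrence (hypothetical standalone code, not PMetal's tiled Metal kernel):

```rust
/// Streaming (online) softmax-weighted sum, the core identity behind
/// FlashAttention's O(n) memory use. Scores arrive one at a time; we keep
/// a running max `m`, normalizer `l`, and weighted accumulator `acc`,
/// rescaling the old state whenever a new score raises the max.
fn streaming_softmax_dot(scores: &[f32], values: &[f32]) -> f32 {
    let mut m = f32::NEG_INFINITY;
    let mut l = 0.0f32;
    let mut acc = 0.0f32;
    for (&s, &v) in scores.iter().zip(values) {
        let m_new = m.max(s);
        let rescale = (m - m_new).exp(); // 0.0 on the first iteration
        let p = (s - m_new).exp();
        l = l * rescale + p;
        acc = acc * rescale + p * v;
        m = m_new;
    }
    acc / l
}
```

The kernel proper applies the same rescaling per tile of keys and values, which is what makes block-size tuning per chip tier matter.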

Hardware

Neural Engine

Native ANE pipeline with dynamic weight compilation, hybrid CPU decode, IOSurface zero-copy, M1–M5 support.

Training

Sequence Packing

2–5× throughput by packing multiple short sequences into each batch row. Enabled by default with attention masking.
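The packing step itself can be as simple as greedy first-fit over sequence lengths; the attention mask then keeps tokens from attending across sequence boundaries within a row. A minimal sketch under those assumptions (not PMetal's actual packer):

```rust
/// Greedy first-fit sequence packing: place each sequence into the first
/// batch row with room, so short examples share a row instead of wasting
/// padding. Returns, per row, the indices of the sequences packed into it.
fn pack_sequences(lengths: &[usize], max_len: usize) -> Vec<Vec<usize>> {
    let mut rows: Vec<(usize, Vec<usize>)> = Vec::new(); // (tokens used, seq indices)
    for (i, &len) in lengths.iter().enumerate() {
        assert!(len <= max_len, "sequence longer than max_len");
        match rows.iter_mut().find(|(used, _)| used + len <= max_len) {
            Some((used, idxs)) => {
                *used += len;
                idxs.push(i);
            }
            None => rows.push((len, vec![i])),
        }
    }
    rows.into_iter().map(|(_, idxs)| idxs).collect()
}
```

A production packer would additionally use the per-row boundaries to reset position ids and build the block-diagonal attention mask.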

Models

Model Merging

16 merge strategies — SLERP, TIES, DARE, Task Arithmetic, DELLA, and more. GPU-accelerated with FP8 support.
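As one example of these strategies, SLERP interpolates two checkpoints along the great-circle arc between their weight vectors rather than the straight line, which preserves weight geometry that plain averaging flattens. A per-tensor f32 sketch (illustrative only; the real path is GPU-accelerated):

```rust
/// Spherical linear interpolation between two weight vectors at blend
/// factor `t` in [0, 1]: t = 0 returns `a`, t = 1 returns `b`.
fn slerp(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    let cos = (dot / (na * nb)).clamp(-1.0, 1.0);
    let theta = cos.acos(); // angle between the two vectors
    if theta.abs() < 1e-6 {
        // Nearly parallel: fall back to linear interpolation.
        return a.iter().zip(b).map(|(x, y)| x + t * (y - x)).collect();
    }
    let wa = (((1.0 - t) * theta).sin()) / theta.sin();
    let wb = ((t * theta).sin()) / theta.sin();
    a.iter().zip(b).map(|(x, y)| wa * x + wb * y).collect()
}
```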

Models

GGUF Quantization

13 format options from F32 down to Q2K. Importance matrix support for quality-preserving quantization.
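To make the trade-off concrete, here is a sketch of the simplest end of that range, a Q8_0-style scheme: one f32 scale per 32-weight block plus one i8 per weight. The lower-bit formats shrink the per-weight field and use the importance matrix to weight rounding error; none of that is shown here:

```rust
/// Block size used by the 8-bit scheme sketched below.
const BLOCK: usize = 32;

/// Quantize to (scale, i8 codes) per block: scale = max|w| / 127.
fn quantize_q8(weights: &[f32]) -> Vec<(f32, Vec<i8>)> {
    weights
        .chunks(BLOCK)
        .map(|block| {
            let amax = block.iter().fold(0.0f32, |m, x| m.max(x.abs()));
            let scale = if amax == 0.0 { 1.0 } else { amax / 127.0 };
            let codes = block.iter().map(|x| (x / scale).round() as i8).collect();
            (scale, codes)
        })
        .collect()
}

/// Reconstruct approximate weights: each code times its block's scale.
fn dequantize_q8(blocks: &[(f32, Vec<i8>)]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|(s, q)| q.iter().map(move |&v| v as f32 * s))
        .collect()
}
```

The round-trip error per weight is bounded by half a quantization step, i.e. scale / 2.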

Training

Knowledge Distillation

Online, offline, progressive, and TAID methods. Cross-vocabulary support with reasoning-aware rationale distillation.
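These variants share one core objective: soften teacher and student logits at a temperature T, then minimize the KL divergence between the two distributions, scaled by T². A minimal sketch of that term (the methods differ mainly in where the teacher targets come from, which this omits):

```rust
/// Softmax at temperature `t`, computed stably by subtracting the max logit.
fn softmax_t(logits: &[f32], t: f32) -> Vec<f32> {
    let m = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|x| ((x - m) / t).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Standard distillation loss: T^2 * KL(teacher || student) on
/// temperature-softened distributions over one token position.
fn distill_kl(teacher_logits: &[f32], student_logits: &[f32], t: f32) -> f32 {
    let p = softmax_t(teacher_logits, t);
    let q = softmax_t(student_logits, t);
    t * t * p.iter().zip(&q).map(|(pi, qi)| pi * (pi / qi).ln()).sum::<f32>()
}
```

Cross-vocabulary support means the two distributions first have to be aligned onto a shared token space; this sketch assumes the vocabularies already match.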

Model Support

Load models from the Hugging Face Hub or from local safetensors files. Architecture detection is automatic.

Inference

| Family | Variants | model_type |
|---|---|---|
| Llama | 2, 3, 3.1, 3.2, 3.3 | llama, llama3 |
| Llama 4 | Scout, Maverick | llama4 |
| Qwen 2 | 2, 2.5 | qwen2, qwen2_5 |
| Qwen 3 | 3 | qwen3 |
| Qwen 3 MoE | 3-MoE | qwen3_moe |
| Qwen 3.5 | 3.5 (Next) | qwen3_next, qwen3_5 |
| DeepSeek | V3, V3.2, V3.2-Speciale | deepseek, deepseek_v3 |
| Mistral | 7B, Mixtral 8×7B | mistral, mixtral |
| Gemma | 2, 3 | gemma, gemma2, gemma3 |
| Phi 3 | 3, 3.5 | phi, phi3 |
| Phi 4 | 4 | phi4 |
| Cohere | Command R | cohere, command_r |
| Granite | 3.0, 3.1, Hybrid MoE | granite, granitehybrid |
| NemotronH | Hybrid (Mamba+Attn) | nemotron_h |
| StarCoder2 | 3B, 7B, 15B | starcoder2 |
| RecurrentGemma | Griffin | recurrentgemma, griffin |
| Jamba | 1.5 | jamba |
| Flux | 1-dev, 1-schnell | flux |

LoRA / QLoRA Training

| Architecture | LoRA | QLoRA | Notes |
|---|---|---|---|
| Llama | Yes | Yes | Covers Llama 2–3.3. Gradient checkpointing. |
| Qwen 2 | Yes | | Uses Qwen3 LoRA implementation internally. |
| Qwen 3 | Yes | Yes | Gradient checkpointing supported. |
| Qwen 3.5 | Yes | | Hybrid architecture. |
| Gemma | Yes | Yes | GeGLU activation, special RMSNorm. |
| Mistral | Yes | Yes | Sliding window attention support. |
| Phi 3 | Yes | | Partial RoPE, fused gate_up. |
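Every row in this table shares the same update shape: the frozen weight W is augmented by a scaled low-rank correction, y = W x + (α/r)·B A x, with only A and B trained. A plain nested-loop sketch of that forward pass (illustrative, not PMetal's fused kernel):

```rust
/// LoRA forward pass with row-major matrices:
///   y = W x + (alpha / r) * B (A x)
/// `w` is the frozen d_out × d_in base weight; only `a` (r × d_in) and
/// `b` (d_out × r) would be trained.
fn lora_forward(
    w: &[Vec<f32>],
    a: &[Vec<f32>],
    b: &[Vec<f32>],
    x: &[f32],
    alpha: f32,
) -> Vec<f32> {
    let r = a.len();
    let scale = alpha / r as f32;
    // h = A x: the rank-r intermediate, the only extra activation LoRA adds.
    let h: Vec<f32> = a
        .iter()
        .map(|row| row.iter().zip(x).map(|(ai, xi)| ai * xi).sum())
        .collect();
    w.iter()
        .zip(b)
        .map(|(w_row, b_row)| {
            let base: f32 = w_row.iter().zip(x).map(|(wi, xi)| wi * xi).sum();
            let delta: f32 = b_row.iter().zip(&h).map(|(bi, hi)| bi * hi).sum();
            base + scale * delta
        })
        .collect()
}
```

QLoRA keeps the same update but stores W in a quantized format, dequantizing on the fly inside the matmul.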

Install in Seconds

Install a prebuilt signed binary, grab the crate from crates.io, or build from source. Requires macOS on Apple Silicon.

# Download the latest release
curl -fsSL https://github.com/Epistates/pmetal/releases/latest/download/pmetal-aarch64-apple-darwin.tar.gz | tar xz
sudo mv pmetal /usr/local/bin/

# Install from crates.io
cargo install pmetal

# Or with all features
cargo install pmetal --features full

# Build from source
git clone https://github.com/epistates/pmetal.git && cd pmetal
cargo build --release

# Build with GUI (requires bun)
cd crates/pmetal-gui && bun install && bun tauri build