Powdered Metal

ML on Apple Silicon,
written in Rust

A complete machine learning platform — from custom Metal GPU kernels and Apple Neural Engine integration to high-level training APIs, a terminal TUI, and a full desktop GUI.

Terminal
$ pmetal train \
  --model Qwen/Qwen3-0.6B \
  --dataset train.jsonl \
  --lora-r 16 --batch-size 4 --learning-rate 2e-4
[████████████████████████████] 100% · 3 epochs · loss: 0.847
Saved LoRA weights to ./output/lora_weights.safetensors
$ pmetal infer --model Qwen/Qwen3-0.6B --lora ./output --chat
> Explain quantum entanglement
Quantum entanglement is a phenomenon where two particles
become correlated...

Custom Metal Kernels

FlashAttention, fused LoRA, fused SwiGLU, fused cross-entropy, and more — all tier-tuned for your specific Apple Silicon chip.


12 Training Methods

SFT, LoRA, QLoRA, DoRA, DPO, SimPO, ORPO, KTO, GRPO, DAPO, knowledge distillation, and ANE training — all from one toolkit.


GUI, TUI, CLI, SDK

A Tauri desktop app, 9-tab terminal interface, 21 CLI commands, Rust SDK with builder API, and Python bindings via PyO3.

Every Interface You Need

A full desktop GUI with 10 pages, a 9-tab terminal TUI, and 21 CLI commands. All backed by the same Rust SDK.

PMetal desktop GUI showing the training dashboard with live loss curves, model configuration, and dataset management

Desktop GUI — Tauri + Svelte with 10 pages for training, inference, merging, and quantization.

PMetal terminal TUI showing live training metrics with braille sparklines, device info, and job history

Terminal TUI — 9 tabs with live loss curves, device info, model search, and interactive chat.

Ship from Rust or Python

High-level builder APIs get you training and inferring in a few lines. Drop to lower-level crates when you need full control.

Rust

use pmetal::easy;

// Fine-tune with LoRA
let result = easy::finetune("Qwen/Qwen3-0.6B", "train.jsonl")
    .lora(16, 32.0)
    .learning_rate(2e-4)
    .epochs(3)
    .output("./output")
    .run()
    .await?;

// Inference with streaming
easy::infer("Qwen/Qwen3-0.6B")
    .temperature(0.7)
    .lora("./output/lora_weights.safetensors")
    .generate_streaming("Tell me a story", |delta| {
        print!("{delta}");
        true
    })
    .await?;
Python

import pmetal

# Fine-tune with sensible defaults
result = pmetal.finetune(
    "Qwen/Qwen3-0.6B",
    "train.jsonl",
    lora_r=16,
    learning_rate=2e-4,
    epochs=3,
)
print(f"Loss: {result['final_loss']}")

# Inference with LoRA adapter
text = pmetal.infer(
    "Qwen/Qwen3-0.6B",
    "Explain quantum entanglement",
    lora="./output/lora_weights.safetensors",
)
print(text)

Built for Performance

Custom Metal shaders, tier-aware kernel tuning, and an 18-crate architecture designed for Apple Silicon from the ground up.

Metal

FlashAttention

O(n) memory attention with fused softmax. Block sizes auto-tuned per chip tier and head dimension.
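The O(n) footprint rests on the online-softmax identity: keep a running max, normalizer, and weighted accumulator, and rescale the old state whenever a later score raises the max, so the full score row is never materialized. A scalar sketch of that recurrence (hypothetical standalone code, not PMetal's tiled Metal kernel):

```rust
/// Streaming (online) softmax-weighted sum, the core identity behind
/// FlashAttention's O(n) memory use. Scores arrive one at a time; we keep
/// a running max `m`, normalizer `l`, and weighted accumulator `acc`,
/// rescaling the old state whenever a new score raises the max.
fn streaming_softmax_dot(scores: &[f32], values: &[f32]) -> f32 {
    let mut m = f32::NEG_INFINITY;
    let mut l = 0.0f32;
    let mut acc = 0.0f32;
    for (&s, &v) in scores.iter().zip(values) {
        let m_new = m.max(s);
        let rescale = (m - m_new).exp(); // 0.0 on the first iteration
        let p = (s - m_new).exp();
        l = l * rescale + p;
        acc = acc * rescale + p * v;
        m = m_new;
    }
    acc / l
}
```

The kernel proper applies the same rescaling per tile of keys and values, which is what makes block-size tuning per chip tier matter.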

Hardware

Neural Engine

Native ANE pipeline with dynamic weight compilation, hybrid CPU decode, IOSurface zero-copy, M1–M5 support.

Training

Sequence Packing

2–5× throughput by packing multiple short sequences into each batch row. Enabled by default with attention masking.
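The packing step itself can be as simple as greedy first-fit over sequence lengths; the attention mask then keeps tokens from attending across sequence boundaries within a row. A minimal sketch under those assumptions (not PMetal's actual packer):

```rust
/// Greedy first-fit sequence packing: place each sequence into the first
/// batch row with room, so short examples share a row instead of wasting
/// padding. Returns, per row, the indices of the sequences packed into it.
fn pack_sequences(lengths: &[usize], max_len: usize) -> Vec<Vec<usize>> {
    let mut rows: Vec<(usize, Vec<usize>)> = Vec::new(); // (tokens used, seq indices)
    for (i, &len) in lengths.iter().enumerate() {
        assert!(len <= max_len, "sequence longer than max_len");
        match rows.iter_mut().find(|(used, _)| used + len <= max_len) {
            Some((used, idxs)) => {
                *used += len;
                idxs.push(i);
            }
            None => rows.push((len, vec![i])),
        }
    }
    rows.into_iter().map(|(_, idxs)| idxs).collect()
}
```

A production packer would additionally use the per-row boundaries to reset position ids and build the block-diagonal attention mask.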

Models

Model Merging

16 merge strategies — SLERP, TIES, DARE, Task Arithmetic, DELLA, and more. GPU-accelerated with FP8 support.
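As one example of these strategies, SLERP interpolates two checkpoints along the great-circle arc between their weight vectors rather than the straight line, which preserves weight geometry that plain averaging flattens. A per-tensor f32 sketch (illustrative only; the real path is GPU-accelerated):

```rust
/// Spherical linear interpolation between two weight vectors at blend
/// factor `t` in [0, 1]: t = 0 returns `a`, t = 1 returns `b`.
fn slerp(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    let cos = (dot / (na * nb)).clamp(-1.0, 1.0);
    let theta = cos.acos(); // angle between the two vectors
    if theta.abs() < 1e-6 {
        // Nearly parallel: fall back to linear interpolation.
        return a.iter().zip(b).map(|(x, y)| x + t * (y - x)).collect();
    }
    let wa = (((1.0 - t) * theta).sin()) / theta.sin();
    let wb = ((t * theta).sin()) / theta.sin();
    a.iter().zip(b).map(|(x, y)| wa * x + wb * y).collect()
}
```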

Models

GGUF Quantization

13 format options from F32 down to Q2K. Importance matrix support for quality-preserving quantization.
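To make the trade-off concrete, here is a sketch of the simplest end of that range, a Q8_0-style scheme: one f32 scale per 32-weight block plus one i8 per weight. The lower-bit formats shrink the per-weight field and use the importance matrix to weight rounding error; none of that is shown here:

```rust
/// Block size used by the 8-bit scheme sketched below.
const BLOCK: usize = 32;

/// Quantize to (scale, i8 codes) per block: scale = max|w| / 127.
fn quantize_q8(weights: &[f32]) -> Vec<(f32, Vec<i8>)> {
    weights
        .chunks(BLOCK)
        .map(|block| {
            let amax = block.iter().fold(0.0f32, |m, x| m.max(x.abs()));
            let scale = if amax == 0.0 { 1.0 } else { amax / 127.0 };
            let codes = block.iter().map(|x| (x / scale).round() as i8).collect();
            (scale, codes)
        })
        .collect()
}

/// Reconstruct approximate weights: each code times its block's scale.
fn dequantize_q8(blocks: &[(f32, Vec<i8>)]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|(s, q)| q.iter().map(move |&v| v as f32 * s))
        .collect()
}
```

The round-trip error per weight is bounded by half a quantization step, i.e. scale / 2.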

Training

Knowledge Distillation

Online, offline, progressive, and TAID methods. Cross-vocabulary support with reasoning-aware rationale distillation.
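These variants share one core objective: soften teacher and student logits at a temperature T, then minimize the KL divergence between the two distributions, scaled by T². A minimal sketch of that term (the methods differ mainly in where the teacher targets come from, which this omits):

```rust
/// Softmax at temperature `t`, computed stably by subtracting the max logit.
fn softmax_t(logits: &[f32], t: f32) -> Vec<f32> {
    let m = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|x| ((x - m) / t).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Standard distillation loss: T^2 * KL(teacher || student) on
/// temperature-softened distributions over one token position.
fn distill_kl(teacher_logits: &[f32], student_logits: &[f32], t: f32) -> f32 {
    let p = softmax_t(teacher_logits, t);
    let q = softmax_t(student_logits, t);
    t * t * p.iter().zip(&q).map(|(pi, qi)| pi * (pi / qi).ln()).sum::<f32>()
}
```

Cross-vocabulary support means the two distributions first have to be aligned onto a shared token space; this sketch assumes the vocabularies already match.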

Model Support

Load models from the Hugging Face Hub or from local safetensors files. Architecture detection is automatic.

Inference

| Family | Variants | model_type |
|---|---|---|
| Llama | 2, 3, 3.1, 3.2, 3.3 | llama, llama3 |
| Llama 4 | Scout, Maverick | llama4 |
| Qwen 2 | 2, 2.5 | qwen2, qwen2_5 |
| Qwen 3 | 3 | qwen3 |
| Qwen 3 MoE | 3-MoE | qwen3_moe |
| Qwen 3.5 | 3.5 (Next) | qwen3_next, qwen3_5 |
| DeepSeek | V3, V3.2, V3.2-Speciale | deepseek, deepseek_v3 |
| Mistral | 7B, Mixtral 8×7B | mistral, mixtral |
| Gemma | 2, 3 | gemma, gemma2, gemma3 |
| Phi 3 | 3, 3.5 | phi, phi3 |
| Phi 4 | 4 | phi4 |
| Cohere | Command R | cohere, command_r |
| Granite | 3.0, 3.1, Hybrid MoE | granite, granitehybrid |
| NemotronH | Hybrid (Mamba+Attn) | nemotron_h |
| StarCoder2 | 3B, 7B, 15B | starcoder2 |
| RecurrentGemma | Griffin | recurrentgemma, griffin |
| Jamba | 1.5 | jamba |
| Flux | 1-dev, 1-schnell | flux |

LoRA / QLoRA Training

| Architecture | LoRA | QLoRA | Notes |
|---|---|---|---|
| Llama | Yes | Yes | Covers Llama 2–3.3. Gradient checkpointing. |
| Qwen 2 | Yes | | Uses Qwen3 LoRA implementation internally. |
| Qwen 3 | Yes | Yes | Gradient checkpointing supported. |
| Qwen 3.5 | Yes | | Hybrid architecture. |
| Gemma | Yes | Yes | GeGLU activation, special RMSNorm. |
| Mistral | Yes | Yes | Sliding window attention support. |
| Phi 3 | Yes | | Partial RoPE, fused gate_up. |
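Every row in this table shares the same update shape: the frozen weight W is augmented by a scaled low-rank correction, y = W x + (α/r)·B A x, with only A and B trained. A plain nested-loop sketch of that forward pass (illustrative, not PMetal's fused kernel):

```rust
/// LoRA forward pass with row-major matrices:
///   y = W x + (alpha / r) * B (A x)
/// `w` is the frozen d_out × d_in base weight; only `a` (r × d_in) and
/// `b` (d_out × r) would be trained.
fn lora_forward(
    w: &[Vec<f32>],
    a: &[Vec<f32>],
    b: &[Vec<f32>],
    x: &[f32],
    alpha: f32,
) -> Vec<f32> {
    let r = a.len();
    let scale = alpha / r as f32;
    // h = A x: the rank-r intermediate, the only extra activation LoRA adds.
    let h: Vec<f32> = a
        .iter()
        .map(|row| row.iter().zip(x).map(|(ai, xi)| ai * xi).sum())
        .collect();
    w.iter()
        .zip(b)
        .map(|(w_row, b_row)| {
            let base: f32 = w_row.iter().zip(x).map(|(wi, xi)| wi * xi).sum();
            let delta: f32 = b_row.iter().zip(&h).map(|(bi, hi)| bi * hi).sum();
            base + scale * delta
        })
        .collect()
}
```

QLoRA keeps the same update but stores W in a quantized format, dequantizing on the fly inside the matmul.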

Install in Seconds

Install a prebuilt signed binary, grab the crate from crates.io, or build from source. Requires macOS on Apple Silicon.

# Download the latest release
curl -fsSL https://github.com/Epistates/pmetal/releases/latest/download/pmetal-aarch64-apple-darwin.tar.gz | tar xz
sudo mv pmetal /usr/local/bin/

# Install from crates.io
cargo install pmetal

# Or with all features
cargo install pmetal --features full

# Build from source
git clone https://github.com/epistates/pmetal.git && cd pmetal
cargo build --release

# Build with GUI (requires bun)
cd crates/pmetal-gui && bun install && bun tauri build