Powdered Metal
A complete machine learning platform — from custom Metal GPU kernels and Apple Neural Engine integration to high-level training APIs, a terminal TUI, and a full desktop GUI.
FlashAttention, fused LoRA, fused SwiGLU, fused cross-entropy, and more — all tier-tuned for your specific Apple Silicon chip.
SFT, LoRA, QLoRA, DoRA, DPO, SimPO, ORPO, KTO, GRPO, DAPO, knowledge distillation, and ANE training — all from one toolkit.
A Tauri desktop app, 9-tab terminal interface, 21 CLI commands, Rust SDK with builder API, and Python bindings via PyO3.
Desktop GUI — Tauri + Svelte with 10 pages for training, inference, merging, and quantization.
Terminal TUI — 9 tabs with live loss curves, device info, model search, and interactive chat.
High-level builder APIs get you training and inferring in a few lines. Drop to lower-level crates when you need full control.
```rust
use pmetal::easy;

// Fine-tune with LoRA
let result = easy::finetune("Qwen/Qwen3-0.6B", "train.jsonl")
    .lora(16, 32.0)
    .learning_rate(2e-4)
    .epochs(3)
    .output("./output")
    .run()
    .await?;

// Inference with streaming
easy::infer("Qwen/Qwen3-0.6B")
    .temperature(0.7)
    .lora("./output/lora_weights.safetensors")
    .generate_streaming("Tell me a story", |delta| {
        print!("{delta}");
        true
    })
    .await?;
```

```python
import pmetal

# Fine-tune with sensible defaults
result = pmetal.finetune(
    "Qwen/Qwen3-0.6B",
    "train.jsonl",
    lora_r=16,
    learning_rate=2e-4,
    epochs=3,
)
print(f"Loss: {result['final_loss']}")

# Inference with LoRA adapter
text = pmetal.infer(
    "Qwen/Qwen3-0.6B",
    "Explain quantum entanglement",
    lora="./output/lora_weights.safetensors",
)
print(text)
```

Custom Metal shaders, tier-aware kernel tuning, and an 18-crate architecture designed for Apple Silicon from the ground up.
O(n) memory attention with fused softmax. Block sizes auto-tuned per chip tier and head dimension.
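The kernel itself is Metal, but the underlying online-softmax trick is easy to show in plain NumPy (an illustrative sketch, not pmetal code): keys and values are processed in blocks while a running max and normalizer stand in for the full n×n score matrix.

```python
import numpy as np

def attention_online_softmax(q, k, v, block=4):
    """O(n)-memory attention for a single query: process K/V in blocks,
    keeping a running max and softmax normalizer instead of all scores."""
    n, d = k.shape
    scale = 1.0 / np.sqrt(d)
    m = -np.inf          # running max of scores seen so far
    l = 0.0              # running softmax normalizer
    acc = np.zeros(d)    # running weighted sum of values
    for start in range(0, n, block):
        s = (q @ k[start:start + block].T) * scale   # scores for one block
        m_new = max(m, s.max())
        correction = np.exp(m - m_new)               # rescale old accumulators
        p = np.exp(s - m_new)
        l = l * correction + p.sum()
        acc = acc * correction + p @ v[start:start + block]
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 8, 16))
out = attention_online_softmax(q[0], k, v)

# Reference: naive attention that materializes the full score row
scores = (q[0] @ k.T) / np.sqrt(16)
weights = np.exp(scores - scores.max())
ref = weights / weights.sum() @ v
```

The blockwise result matches the naive computation to floating-point precision; the real kernel additionally fuses this loop into a single GPU pass with tier-tuned block sizes.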
Native ANE pipeline with dynamic weight compilation, hybrid CPU decode, IOSurface zero-copy, M1–M5 support.
2–5× throughput by packing multiple sequences per batch. Enabled by default with attention masking.
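The masking idea behind packing can be sketched independently of the toolkit (illustrative NumPy, not pmetal's kernel code): each token records which packed sequence owns it, and attention is only permitted within the same sequence.

```python
import numpy as np

def packed_attention_mask(seq_lens):
    """Block-diagonal causal mask for sequences packed into one batch row:
    True where attention is allowed, False across sequence boundaries."""
    total = sum(seq_lens)
    # Which packed sequence owns each token position
    seq_id = np.repeat(np.arange(len(seq_lens)), seq_lens)
    same_seq = seq_id[:, None] == seq_id[None, :]
    causal = np.tril(np.ones((total, total), dtype=bool))
    return same_seq & causal

# Tokens 0-2 belong to sequence A, tokens 3-4 to sequence B
m = packed_attention_mask([3, 2])
```

Token 3 (start of sequence B) cannot attend to any token of sequence A, so the two samples train as if they were in separate rows while sharing one kernel launch.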
16 merge strategies — SLERP, TIES, DARE, Task Arithmetic, DELLA, and more. GPU-accelerated with FP8 support.
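As a reference point for one of the listed strategies: SLERP interpolates along the arc between two weight tensors rather than the straight line, which better preserves their magnitude. A standard NumPy sketch of the formula (not the GPU-accelerated implementation):

```python
import numpy as np

def slerp(w0, w1, t, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.
    Falls back to linear interpolation when the tensors are nearly parallel."""
    a, b = w0.ravel(), w1.ravel()
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if omega < eps:  # nearly parallel: lerp is numerically safer
        return (1 - t) * w0 + t * w1
    so = np.sin(omega)
    return (np.sin((1 - t) * omega) / so) * w0 + (np.sin(t * omega) / so) * w1

rng = np.random.default_rng(0)
w0 = rng.normal(size=(4, 4))
w1 = rng.normal(size=(4, 4))
merged = slerp(w0, w1, 0.5)
```

At t=0 and t=1 the formula returns the respective endpoint exactly, which makes it easy to sanity-check a merge pipeline.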
13 format options from F32 down to Q2K. Importance matrix support for quality-preserving quantization.
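The importance-matrix idea, in miniature: rather than scaling a block by its absolute maximum alone, search for the scale that minimizes rounding error weighted by per-weight importance (e.g. activation statistics). A toy NumPy sketch (illustrative only; the actual K-quant formats are considerably more involved):

```python
import numpy as np

def quantize_block(w, importance, qmax=127, candidates=32):
    """Pick the scale that minimizes importance-weighted squared error,
    instead of just dividing by the block's absolute maximum."""
    base = np.abs(w).max() / qmax
    best_scale, best_err = base, np.inf
    for f in np.linspace(0.8, 1.2, candidates):   # search around the absmax scale
        s = base * f
        q = np.clip(np.round(w / s), -qmax, qmax)
        err = np.sum(importance * (w - q * s) ** 2)
        if err < best_err:
            best_scale, best_err = s, err
    q = np.clip(np.round(w / best_scale), -qmax, qmax).astype(np.int8)
    return q, best_scale

rng = np.random.default_rng(0)
w = rng.normal(size=(32,))
importance = rng.uniform(0.1, 1.0, size=32)
q, scale = quantize_block(w, importance)
```

Weights with high importance pull the chosen scale toward reproducing them accurately, which is the quality-preserving effect the importance matrix provides.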
Online, offline, progressive, and TAID methods. Cross-vocabulary support with reasoning-aware rationale distillation.
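For online distillation, the central ingredient is a temperature-softened KL divergence between teacher and student logits. A minimal NumPy sketch of the classic Hinton-style KD loss (the cross-vocabulary and rationale-aware variants mentioned above are not shown here):

```python
import numpy as np

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 to keep gradient magnitude comparable across temperatures."""
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    t_logp = log_softmax(teacher_logits / temperature)
    s_logp = log_softmax(student_logits / temperature)
    t_p = np.exp(t_logp)
    kl = (t_p * (t_logp - s_logp)).sum(axis=-1)
    return (temperature ** 2) * kl.mean()

rng = np.random.default_rng(0)
logits_t = rng.normal(size=(4, 10))                          # teacher
logits_s = logits_t + 0.5 * rng.normal(size=(4, 10))         # imperfect student
loss = kd_loss(logits_s, logits_t)
```

The loss is zero exactly when the student matches the teacher's softened distribution, and strictly positive otherwise.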
Load from HuggingFace Hub or local safetensors. Architecture detection is automatic.
| Family | Variants | model_type |
|---|---|---|
| Llama | 2, 3, 3.1, 3.2, 3.3 | llama, llama3 |
| Llama 4 | Scout, Maverick | llama4 |
| Qwen 2 | 2, 2.5 | qwen2, qwen2_5 |
| Qwen 3 | 3 | qwen3 |
| Qwen 3 MoE | 3-MoE | qwen3_moe |
| Qwen 3.5 | 3.5 (Next) | qwen3_next, qwen3_5 |
| DeepSeek | V3, V3.2, V3.2-Speciale | deepseek, deepseek_v3 |
| Mistral | 7B, Mixtral 8×7B | mistral, mixtral |
| Gemma | 2, 3 | gemma, gemma2, gemma3 |
| Phi 3 | 3, 3.5 | phi, phi3 |
| Phi 4 | 4 | phi4 |
| Cohere | Command R | cohere, command_r |
| Granite | 3.0, 3.1, Hybrid MoE | granite, granitehybrid |
| NemotronH | Hybrid (Mamba+Attn) | nemotron_h |
| StarCoder2 | 3B, 7B, 15B | starcoder2 |
| RecurrentGemma | Griffin | recurrentgemma, griffin |
| Jamba | 1.5 | jamba |
| Flux | 1-dev, 1-schnell | flux |
| Architecture | LoRA | QLoRA | Notes |
|---|---|---|---|
| Llama | Yes | Yes | Covers Llama 2–3.3. Gradient checkpointing. |
| Qwen 2 | Yes | — | Uses Qwen3 LoRA implementation internally. |
| Qwen 3 | Yes | Yes | Gradient checkpointing supported. |
| Qwen 3.5 | Yes | — | Hybrid architecture. |
| Gemma | Yes | Yes | GeGLU activation, special RMSNorm. |
| Mistral | Yes | Yes | Sliding window attention support. |
| Phi 3 | Yes | — | Partial RoPE, fused gate_up. |
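Across these architectures the LoRA update follows the same rule: the frozen weight W gains a low-rank delta, W' = W + (α/r)·B·A, where r and α correspond to the two arguments of `.lora(16, 32.0)` in the builder example above. A NumPy sketch of merging a trained adapter back into a base weight:

```python
import numpy as np

def merge_lora(w, lora_a, lora_b, r, alpha):
    """Fold a LoRA adapter into the base weight: W' = W + (alpha / r) * B @ A."""
    return w + (alpha / r) * (lora_b @ lora_a)

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 16
w = rng.normal(size=(d_out, d_in))
lora_a = 0.01 * rng.normal(size=(r, d_in))   # A initialized small
lora_b = np.zeros((d_out, r))                # B starts at zero, so the delta starts at zero
merged = merge_lora(w, lora_a, lora_b, r, alpha=32.0)
```

With B at its zero initialization the merged weight equals the base weight, which is why a freshly initialized adapter leaves model behavior unchanged.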
Install via prebuilt signed binaries, from crates.io, or by building from source. Requires macOS on Apple Silicon.
```sh
# Download the latest release
curl -fsSL https://github.com/Epistates/pmetal/releases/latest/download/pmetal-aarch64-apple-darwin.tar.gz | tar xz
sudo mv pmetal /usr/local/bin/
```

```sh
# Install from crates.io
cargo install pmetal

# Or with all features
cargo install pmetal --features full
```

```sh
# Build from source
git clone https://github.com/epistates/pmetal.git && cd pmetal
cargo build --release

# Build with GUI (requires bun)
cd crates/pmetal-gui && bun install && bun tauri build
```