
pmetal cluster

Run multi-Mac cluster operations with mDNS discovery, fabric classification, Thunderbolt-first ring formation, all-reduce benchmarks, and distributed train/serve forwarding.

The distributed feature is enabled by default for the stock pmetal binary.

Subcommand       Description
up               Advertise this node, discover peers, form a ring, and hold the connection open
status           Print local interfaces, discovered peers, and fabric classification
bench            Run all-reduce throughput benchmarks across the ring
pipeline-bench   Run the pipeline activation transport harness
train            Wrapper placeholder; use pmetal train --distributed-auto directly
serve            Wrapper placeholder pending per-architecture partial-layer execution

Run on every Mac:

pmetal cluster status
pmetal cluster up

Then benchmark or train:

pmetal cluster bench --mb 64 --iters 10
pmetal cluster pipeline-bench --tokens 16 --layers 32
pmetal train \
  --model Qwen/Qwen3-0.6B \
  --dataset train.jsonl \
  --distributed-auto \
  --compression-strategy fp16

Parameter           Default   Description
--discovery-port    52415     mDNS/libp2p discovery port
--gradient-port     52416     Gradient exchange port
--activation-port   52417     Pipeline activation port
--result-port       52418     Pipeline result-loopback port
--timeout           60        Discovery timeout in seconds
--min-peers         1         Minimum peers before proceeding
--json              false     Emit JSON where supported
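
As a rough illustration of how these parameters might be combined, the sketch below assembles a command line from the documented defaults and prints it for review rather than running it. It assumes the flags are accepted by `pmetal cluster up`, which this page does not state explicitly. The `PMETAL_*` environment variables and the `build_cluster_up_cmd` helper are hypothetical names used only for this example.

```shell
#!/usr/bin/env bash
# Sketch: assemble a `pmetal cluster up` command line from the documented
# defaults, with optional overrides taken from environment variables.
# Assumption: these flags apply to `pmetal cluster up`.
# The PMETAL_* variables and this helper are hypothetical, not part of pmetal.

build_cluster_up_cmd() {
  local discovery_port="${PMETAL_DISCOVERY_PORT:-52415}"  # documented default
  local gradient_port="${PMETAL_GRADIENT_PORT:-52416}"    # documented default
  local timeout="${PMETAL_TIMEOUT:-60}"                   # seconds
  local min_peers="${PMETAL_MIN_PEERS:-1}"

  printf 'pmetal cluster up --discovery-port %s --gradient-port %s --timeout %s --min-peers %s\n' \
    "$discovery_port" "$gradient_port" "$timeout" "$min_peers"
}

# Print the command instead of executing it, so it can be checked first.
build_cluster_up_cmd
```

If every node in the ring is expected to agree on ports, the same overrides would presumably need to be applied on each Mac.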

PMetal classifies local interfaces and prefers Thunderbolt over Ethernet over Wi-Fi when forming the distributed ring. If a faster fabric disappears during a job, distributed components can fall back to available paths.
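
The preference order above can be sketched as a simple priority lookup. This is only an illustration of the ordering and the fallback idea, not PMetal's actual classification logic; the fabric labels (`thunderbolt`, `ethernet`, `wifi`) and the `pick_fabric` function are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the stated preference order: Thunderbolt, then Ethernet, then Wi-Fi.
# pick_fabric and the fabric labels are hypothetical, not PMetal internals.

pick_fabric() {
  local preferred candidate
  # Walk the preference list from fastest to slowest and return the first
  # fabric that appears among the available ones passed as arguments.
  for preferred in thunderbolt ethernet wifi; do
    for candidate in "$@"; do
      if [ "$candidate" = "$preferred" ]; then
        printf '%s\n' "$preferred"
        return 0
      fi
    done
  done
  return 1  # no usable fabric among the arguments
}

pick_fabric wifi thunderbolt ethernet   # prints "thunderbolt"
pick_fabric wifi ethernet               # Thunderbolt gone: prints "ethernet"
```

The second call mirrors the fallback case: once the faster fabric is no longer in the available set, the next entry in the preference list is chosen.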