# pmetal serve
Start an HTTP inference server with an OpenAI-compatible API. Requires the `serve` feature flag.
```bash
pmetal serve --model <MODEL> [OPTIONS]
```

## Examples
```bash
# Start server
pmetal serve --model Qwen/Qwen3-0.6B --port 8080

# With LoRA adapter
pmetal serve --model Qwen/Qwen3-0.6B --lora ./output/lora_weights.safetensors --port 8080
```

## API Compatibility
The server exposes OpenAI-compatible endpoints:
- `POST /v1/chat/completions` — Chat completions
- `POST /v1/completions` — Text completions
- `GET /v1/models` — List loaded models
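Because the endpoints follow the OpenAI request schema, any OpenAI-compatible client should work. As a minimal sketch using only the Python standard library (the base URL matches the `--port 8080` examples on this page and assumes a locally running server), a chat-completions request can be built like this:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumes the server from the examples above

def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request matching the OpenAI chat-completions schema."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Qwen/Qwen3-0.6B", "Hello")
print(req.full_url)

# Sending the request requires a running server:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

The same request expressed with `curl`: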
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Hello"}]}'
```

## See Also
- pmetal infer — Interactive inference
- Feature Flags — Enable the `serve` feature