Getting Started

Salut turns your Macs into a distributed LLM inference cluster. Install it, pair your machines, and start querying local language models — all without touching the cloud.

Installation

Download Salut and drag it to your Applications folder. The app runs as a menu bar icon — no dock window, no clutter.

First Launch

When you launch Salut for the first time:

  1. A unique identity is generated — an Ed25519 keypair stored in your Salut data directory.

  2. The control plane starts on port 7258 (configurable in Settings).

  3. Salut begins advertising on your local network via mDNS and browsing for other peers.

  4. The menu bar icon appears — a waving hand.

If other Salut peers are running on your network, they’ll appear in the menu within a few seconds.
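You can watch this discovery happen from the command line with macOS's built-in dns-sd tool. A sketch, assuming Salut advertises under the service type _salut._tcp (this name is a guess, not confirmed; adjust it if Salut registers under a different type):

```shell
# Browse for Salut peers advertised over mDNS/Bonjour.
# NOTE: the service type "_salut._tcp" is an assumption.
SERVICE_TYPE="_salut._tcp"

if command -v dns-sd >/dev/null; then
  dns-sd -B "$SERVICE_TYPE" local.   # macOS built-in; press Ctrl-C to stop
else
  echo "dns-sd not available (macOS only)"
fi
```

Each running Salut instance on the network should show up as one browse result.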

Pairing

Salut uses mutual pairing — both sides must agree before inference can flow between peers.

When a new peer is discovered:

  • You’ll receive a notification that a peer wants to pair.

  • Open Settings → Peers to see the request.

  • Click Accept to complete the pairing.

  • The other peer must also accept your pairing request.

Once both sides have accepted, the peers are paired and can participate in distributed inference.


Using the API

Salut exposes an OpenAI-compatible API at http://localhost:7258 (no API key is required). You can use it with any tool that speaks the OpenAI protocol.

curl

curl http://localhost:7258/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen3-8B-MLX-4bit",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
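The response follows the standard OpenAI chat completion schema. A rough sketch of the shape (illustrative, not captured output; some fields abridged):

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "mlx-community/Qwen3-8B-MLX-4bit",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ]
}
```

The reply text lives at .choices[0].message.content, same as with the hosted OpenAI API.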

Python (openai library)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:7258/v1", api_key="unused")

response = client.chat.completions.create(
    model="mlx-community/Qwen3-8B-MLX-4bit",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
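Because the endpoint is plain HTTP plus JSON, the Python standard library is enough if you would rather not install the openai package. A minimal sketch (the helper names here are ours, not part of Salut):

```python
import json
import urllib.request

BASE_URL = "http://localhost:7258/v1"

def build_chat_request(model, messages, base_url=BASE_URL):
    """Assemble a POST request for the chat completions endpoint."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def chat(model, prompt):
    """Send one user message and return the assistant's reply text."""
    req = build_chat_request(model, [{"role": "user", "content": prompt}])
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# With Salut running:
#   chat("mlx-community/Qwen3-8B-MLX-4bit", "Hello!")
```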

List Available Models

curl http://localhost:7258/v1/models

This returns all models available across your cluster — both locally loaded and available on paired peers.
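The listing can also be consumed programmatically. A stdlib-only sketch, assuming the response follows the usual OpenAI list shape (model entries wrapped in a "data" array):

```python
import json
import urllib.request

def model_ids(payload):
    """Pull model ids out of an OpenAI-style list response."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url="http://localhost:7258/v1"):
    """Fetch the ids of every model visible to the cluster."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return model_ids(json.load(resp))

# With Salut running:
#   list_models()  # ids of both local and peer-hosted models
```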

Preloading a Model

By default, no model is loaded at startup. You can set a model to preload in Settings → Models.

The model downloads from Hugging Face on first use and is cached locally for subsequent runs.
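Downloads land in the Hugging Face cache, so you can inspect (or clear) them outside Salut. This assumes Salut uses the stock huggingface_hub cache layout, which lives under ~/.cache/huggingface by default:

```shell
# Default Hugging Face cache location (override the base with HF_HOME).
CACHE_DIR="${HF_HOME:-$HOME/.cache/huggingface}/hub"
echo "$CACHE_DIR"

# List cached models, if any have been downloaded yet.
ls "$CACHE_DIR" 2>/dev/null || echo "(cache not created yet)"
```

Because the cache is shared, a model already downloaded by another Hugging Face tool will not be fetched again.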

Next Steps