Getting Started

Salut turns your Macs into a distributed LLM inference cluster. Install it, pair your machines, and start querying local language models — all without touching the cloud.

Installation

Download Salut and drag it to your Applications folder. The app runs as a menu bar icon — no dock window, no clutter.

First Launch

When you launch Salut for the first time:

  1. A unique identity is generated — an Ed25519 keypair stored in your Salut data directory.

  2. The control plane starts on port 7258 (configurable in Settings).

  3. Salut begins advertising on your local network via mDNS and browsing for other peers.

  4. The menu bar icon appears — a waving hand.

If other Salut peers are running on your network, they’ll appear in the menu within a few seconds.
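You can watch this discovery happen from the command line with macOS's built-in dns-sd tool. A sketch, assuming Salut advertises under the service type _salut._tcp (this name is a guess, not confirmed; adjust it if Salut registers under a different type):

```shell
# Browse for Salut peers advertised over mDNS/Bonjour.
# NOTE: the service type "_salut._tcp" is an assumption.
SERVICE_TYPE="_salut._tcp"

if command -v dns-sd >/dev/null; then
  dns-sd -B "$SERVICE_TYPE" local.   # macOS built-in; press Ctrl-C to stop
else
  echo "dns-sd not available (macOS only)"
fi
```

Each running Salut instance on the network should show up as one browse result.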

Pairing

Salut uses mutual pairing — both sides must agree before inference can flow between peers.

When a new peer is discovered:

  • You’ll receive a notification that a peer wants to pair.

  • Open Settings → Peers to see the request.

  • Click Accept to complete the pairing.

  • The other peer must also accept your pairing request.

Once both sides have accepted, the peers are paired and can participate in distributed inference.


Using the API

Salut exposes an OpenAI-compatible API at http://localhost:7258 (no API key is required). You can use it with any tool that speaks the OpenAI protocol.

curl

curl http://localhost:7258/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen3-8B-MLX-4bit",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
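The response follows the standard OpenAI chat completion schema. A rough sketch of the shape (illustrative, not captured output; some fields abridged):

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "mlx-community/Qwen3-8B-MLX-4bit",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ]
}
```

The reply text lives at .choices[0].message.content, same as with the hosted OpenAI API.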

Python (openai library)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:7258/v1", api_key="unused")

response = client.chat.completions.create(
    model="mlx-community/Qwen3-8B-MLX-4bit",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
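Because the endpoint is plain HTTP plus JSON, the Python standard library is enough if you would rather not install the openai package. A minimal sketch (the helper names here are ours, not part of Salut):

```python
import json
import urllib.request

BASE_URL = "http://localhost:7258/v1"

def build_chat_request(model, messages, base_url=BASE_URL):
    """Assemble a POST request for the chat completions endpoint."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def chat(model, prompt):
    """Send one user message and return the assistant's reply text."""
    req = build_chat_request(model, [{"role": "user", "content": prompt}])
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# With Salut running:
#   chat("mlx-community/Qwen3-8B-MLX-4bit", "Hello!")
```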

List Available Models

curl http://localhost:7258/v1/models

This returns all models available across your cluster — both locally loaded and available on paired peers.
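The listing can also be consumed programmatically. A stdlib-only sketch, assuming the response follows the usual OpenAI list shape (model entries wrapped in a "data" array):

```python
import json
import urllib.request

def model_ids(payload):
    """Pull model ids out of an OpenAI-style list response."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url="http://localhost:7258/v1"):
    """Fetch the ids of every model visible to the cluster."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return model_ids(json.load(resp))

# With Salut running:
#   list_models()  # ids of both local and peer-hosted models
```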

Preloading a Model

By default, no model is loaded at startup. You can set a model to preload in Settings → Models.

The model downloads from Hugging Face on first use and is cached locally for subsequent runs.
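Downloads land in the Hugging Face cache, so you can inspect (or clear) them outside Salut. This assumes Salut uses the stock huggingface_hub cache layout, which lives under ~/.cache/huggingface by default:

```shell
# Default Hugging Face cache location (override the base with HF_HOME).
CACHE_DIR="${HF_HOME:-$HOME/.cache/huggingface}/hub"
echo "$CACHE_DIR"

# List cached models, if any have been downloaded yet.
ls "$CACHE_DIR" 2>/dev/null || echo "(cache not created yet)"
```

Because the cache is shared, a model already downloaded by another Hugging Face tool will not be fetched again.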

Next Steps