Network

Salut’s clustering controls determine how your machine participates in the peer network. You can be a full participant, a client that offloads work, or completely standalone.

Participation Modes

Cluster Mode (Default)

Your machine both advertises itself on the network and accepts inference work from paired peers.

  • Advertise: Yes, other peers can discover you

  • Browse: Yes, you discover other peers

  • Accept inbound: Yes, paired peers can send you work

  • Route outbound: Yes, you can distribute work to peers

This is the default and the right choice for most setups. Every machine contributes its GPU to the shared pool.

Client-Only Mode

Your machine discovers and routes to other peers but doesn’t advertise itself or accept inbound work.

  • Advertise: No

  • Browse: Yes

  • Accept inbound: No

  • Route outbound: Yes

Use this mode for machines that should act as API clients without contributing their GPU, such as a laptop that sends queries to more powerful desktops.

To enable it, toggle Client-Only mode in Settings → General, or set the SALUT_CLIENT_ONLY=1 environment variable.

Solo Mode

Your machine operates standalone — no advertising, no browsing, no peer communication.

  • Advertise: No

  • Browse: No

  • Accept inbound: No

  • Route outbound: No

To enable it, toggle clustering off in Settings → General or from the menu bar, or set the SALUT_CLUSTER_ENABLED=0 environment variable. All inference then runs locally on your machine.
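The three modes above reduce to four capability flags driven by two environment variables. The sketch below illustrates that mapping; it is not Salut's actual code. The `Participation` type and `resolve_participation` function are hypothetical names, and two behaviours are assumptions: an unset variable means the default, and Solo takes precedence if both variables are set.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Participation:
    advertise: bool
    browse: bool
    accept_inbound: bool
    route_outbound: bool


def resolve_participation(env=None):
    """Map the two documented environment variables to the four capabilities."""
    env = os.environ if env is None else env
    # Assumption: an unset SALUT_CLUSTER_ENABLED means clustering is on (the default).
    if env.get("SALUT_CLUSTER_ENABLED", "1") == "0":
        return Participation(False, False, False, False)  # Solo
    # Assumption: Client-Only is only considered when clustering is enabled.
    if env.get("SALUT_CLIENT_ONLY", "0") == "1":
        return Participation(False, True, False, True)    # Client-Only
    return Participation(True, True, True, True)          # Cluster (default)
```

Calling `resolve_participation({"SALUT_CLIENT_ONLY": "1"})` yields a configuration that browses and routes outbound but neither advertises nor accepts inbound work, matching the Client-Only description.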

Distributed Inference

When a model is too large for a single machine’s GPU memory, Salut distributes it across the cluster automatically.

How It Works

  1. A request comes in for a model (e.g., a 70B model).

  2. The coordinator node builds a rendezvous plan — mapping transformer layers to peers based on available VRAM and bandwidth.

  3. Each peer loads its assigned shard (a contiguous range of layers).

  4. During inference, activations flow between peers in sequence: peer 1 processes layers 0–19, sends the result to peer 2 for layers 20–39, and so on.

  5. The final peer returns the output to the coordinator, which sends the response to the client.
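The sequential hand-off in steps 3 to 5 can be pictured with a toy pipeline. This is a conceptual sketch only: `run_pipeline` and the `add_one` layer are hypothetical, peers are simulated in one process, and the hand-off that would be a network transfer in a real cluster is just a loop iteration here.

```python
def run_pipeline(shards, activations):
    """Pass activations through each peer's shard in order.

    `shards` is a list of (peer_name, layers); each layer is a callable.
    Returns the final activations and the order in which peers ran.
    """
    trace = []
    for peer, layers in shards:
        for layer in layers:
            activations = layer(activations)
        trace.append(peer)  # in a real cluster: send activations to the next peer
    return activations, trace


# Toy "layer": adds 1 to every element, so the output counts the layers applied.
def add_one(xs):
    return [x + 1 for x in xs]


shards = [
    ("peer-1", [add_one] * 20),  # layers 0-19
    ("peer-2", [add_one] * 20),  # layers 20-39
]
output, trace = run_pipeline(shards, [0.0, 0.0])
```

Because every toy layer adds 1, `output` ends up as `[40.0, 40.0]`, confirming that all 40 layers ran exactly once and in peer order.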

Layer Assignment

Layers are assigned based on each peer’s estimated capacity:

  • Available VRAM — more memory means more layers

  • GPU type: the performance characteristics of the peer's Metal (Apple Silicon) GPU

  • Network bandwidth — peers with faster connections get more layers to minimize activation transfer overhead

The assignment is recalculated whenever the cluster composition changes (peers join, leave, or become unhealthy).
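One simple way to turn capacity estimates into contiguous layer ranges is a proportional split. The sketch below assumes a single capacity score per peer (free VRAM in GB here) and ignores GPU type and bandwidth. `assign_layers` is a hypothetical helper, and the planner Salut actually uses may weigh these factors differently.

```python
def assign_layers(num_layers, capacity):
    """Split layers 0..num_layers-1 into contiguous ranges proportional to capacity.

    `capacity` maps peer name -> positive score (assumed: free VRAM in GB).
    Returns {peer: (first_layer, last_layer)}; peers with no layers are omitted.
    """
    total = sum(capacity.values())
    peers = list(capacity)
    # Ideal fractional share per peer, rounded down.
    ideal = {p: num_layers * capacity[p] / total for p in peers}
    counts = {p: int(ideal[p]) for p in peers}
    # Give the leftover layers to the peers with the largest fractional remainders.
    leftover = num_layers - sum(counts.values())
    by_remainder = sorted(peers, key=lambda p: ideal[p] - counts[p], reverse=True)
    for p in by_remainder[:leftover]:
        counts[p] += 1
    # Convert counts to contiguous, inclusive layer ranges.
    plan, start = {}, 0
    for p in peers:
        if counts[p]:
            plan[p] = (start, start + counts[p] - 1)
            start += counts[p]
    return plan


plan = assign_layers(80, {"studio": 48, "mini": 24, "laptop": 8})
```

With these example numbers, `studio` receives layers 0 to 47, `mini` 48 to 71, and `laptop` 72 to 79. Re-running the function with a different `capacity` mapping models the recalculation that happens when a peer joins or leaves.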

Activation Transport

Activations (the intermediate tensors passed between peers) can be transported in two modes:

Passthrough (default)

Unquantized fp16 activations, sent as-is. Best quality, but the highest bandwidth usage.

INT8 Blockwise

Activations are quantized to 8-bit integers using a blockwise scheme (similar to Petals). This roughly halves activation bandwidth relative to fp16, with minimal quality loss.

Enable INT8 Blockwise in Settings → General or by setting the SALUT_QUANT_ACTIVATIONS=1 environment variable.
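To make the bandwidth trade-off concrete, here is a minimal sketch of blockwise INT8 quantization in plain Python. The block size of 64, absmax scaling, and one fp16 scale per block are all assumptions chosen for illustration; the source only says the scheme is blockwise and similar to Petals.

```python
def quantize_blockwise(values, block_size=64):
    """Quantize floats to int8 codes per block using absmax scaling.

    Returns a list of (scale, codes) pairs, one per block.
    """
    blocks = []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        absmax = max(abs(v) for v in block)
        scale = absmax / 127 if absmax > 0 else 1.0
        codes = [max(-127, min(127, round(v / scale))) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_blockwise(blocks):
    """Reverse the quantization: multiply each code by its block's scale."""
    return [code * scale for scale, codes in blocks for code in codes]


def payload_bytes(num_values, block_size=64, quantized=True):
    """Approximate wire size: fp16 is 2 bytes per value; int8 is 1 byte per
    value plus one 2-byte scale per block (assumed layout)."""
    if not quantized:
        return 2 * num_values
    num_blocks = -(-num_values // block_size)  # ceiling division
    return num_values + 2 * num_blocks
```

Under these assumptions, a 4096-value activation vector costs 8192 bytes in passthrough mode and 4224 bytes quantized. The per-block scales are why the saving is slightly less than half.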

Cluster Tokens

For lab or office setups with many machines, manual pairing is tedious. Cluster tokens provide automatic pairing.

How It Works

  1. Choose a shared secret (any string).

  2. Set it on every machine, either in Settings → General → Cluster Token or via the SALUT_CLUSTER_TOKEN environment variable.

  3. When peers discover each other, they compare HMAC-SHA256 values computed from their tokens and fingerprints.

  4. If the HMACs match, the peers are automatically paired — no manual acceptance needed.
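The comparison in steps 3 and 4 can be sketched with Python's standard `hmac` module. The exact message layout Salut feeds into the HMAC is not documented here, so the construction below is an assumption: the token is the key, and the two fingerprints are sorted and joined. `pairing_tag` and `should_auto_pair` are hypothetical names.

```python
import hashlib
import hmac


def pairing_tag(token: str, fingerprint_a: str, fingerprint_b: str) -> bytes:
    """HMAC-SHA256 keyed by the cluster token over both peers' fingerprints.

    Sorting the fingerprints makes the tag identical on both sides
    (assumed message layout, for illustration only).
    """
    message = "|".join(sorted([fingerprint_a, fingerprint_b])).encode("utf-8")
    return hmac.new(token.encode("utf-8"), message, hashlib.sha256).digest()


def should_auto_pair(local_token: str, local_fp: str,
                     remote_fp: str, remote_tag: bytes) -> bool:
    """True if the remote tag matches what the local token produces."""
    expected = pairing_tag(local_token, local_fp, remote_fp)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, remote_tag)
```

Two peers holding the same token compute the same 32-byte tag and pair automatically. A peer with a different token produces a different tag and falls back to manual pairing. Only the tag crosses the network, never the token.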

Security Considerations

  • The token itself is never sent over the network — only the HMAC.

  • Tokens should be treated like passwords. Use a strong, unique value.

  • Peers with mismatched tokens go through the normal manual pairing flow.

  • You can combine both: a cluster token for trusted lab machines and manual pairing for ad-hoc guests.
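Since any string is accepted as a token, one way to get a strong, unique value is to generate it rather than invent it. The snippet below uses Python's standard `secrets` module. The 32-byte length is a suggestion, not a documented Salut requirement, and `generate_cluster_token` is a hypothetical helper.

```python
import secrets


def generate_cluster_token(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from `num_bytes` of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


token = generate_cluster_token()
# Print in a form that can be pasted into a dotfile or launch script.
print(f"SALUT_CLUSTER_TOKEN={token}")
```

Run it once and distribute the printed value to every machine that should pair automatically.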

Tip

A simple way to distribute the token: set it as an environment variable in a shared dotfile, configuration management tool, or launch script.