Troubleshooting

Common issues and their solutions.

Peers Not Discovering Each Other

Symptoms: The menu bar shows no peers, or peers that should be visible aren’t appearing.

Check mDNS/Bonjour:

  • Both machines must be on the same local network (same subnet).

  • If you’re on a corporate or university network, mDNS may be blocked between subnets. Try a direct connection or ask your IT team about mDNS forwarding.

  • On macOS, Bonjour is always enabled. No configuration needed.
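
You can confirm that mDNS browsing works at all with the dns-sd tool that ships with macOS. The service type "_salut._tcp" below is an assumption, not confirmed by this document; substitute the type Salut actually advertises if it differs.

```shell
# Browse for Salut peers over Bonjour (press Ctrl-C to stop).
# "_salut._tcp" is an assumed service type, shown for illustration.
dns-sd -B _salut._tcp local.

# As a general sanity check, browse for a common service type; if even
# this shows nothing, mDNS is being blocked on the network:
dns-sd -B _airplay._tcp local.
```

If the second command finds devices but the first finds no Salut instances, the problem is likely on Salut's side (clustering disabled, client-only mode) rather than the network.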

Check the firewall:

  • Salut uses port 7258 (TCP) and mDNS port 5353 (UDP).

  • In System Settings → Network → Firewall, make sure Salut is allowed to accept incoming connections.
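
You can also verify from a terminal that something is actually listening on these ports. The port numbers below are the defaults mentioned above:

```shell
# Show the process listening on Salut's TCP port (7258 by default):
lsof -nP -iTCP:7258 -sTCP:LISTEN

# mDNS traffic on UDP 5353 is handled by the system's mDNSResponder daemon
# (sudo is needed to see sockets owned by system processes):
sudo lsof -nP -iUDP:5353 | head
```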

Check cluster mode:

  • If clustering is disabled (solo mode), peers won’t be discovered. Toggle it on in Settings → General.

  • If the peer is in client-only mode, it won’t advertise — but it can still discover you.

Verify port consistency:

  • All peers must use the same port. Check Settings → General → Port on each machine.
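
A quick way to check that another machine is reachable on the expected port is nc; "peer-mac.local" below is a placeholder for the peer's Bonjour hostname:

```shell
# -z: probe without sending data; -w 3: give up after 3 seconds.
# Replace peer-mac.local with the other machine's .local hostname.
nc -z -w 3 peer-mac.local 7258 \
  && echo "port 7258 reachable" \
  || echo "port 7258 unreachable (firewall, wrong port, or wrong host)"
```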

Model Won’t Load

Symptoms: The model field shows a model ID but inference requests fail, or the model stays in a “loading” state.

Check available memory:

  • Open Activity Monitor and check Memory Pressure. If it’s red, you don’t have enough free memory for the model.

  • See the model sizing table for approximate VRAM requirements.

  • Close other GPU-intensive applications to free memory.
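
You can check total memory and current pressure from a terminal without opening Activity Monitor. On Apple silicon the GPU shares unified memory with the CPU, so total RAM is effectively the budget for model weights:

```shell
# Total physical (unified) memory, in GB:
echo "$(( $(sysctl -n hw.memsize) / 1073741824 )) GB"

# One-shot memory pressure summary (the free percentage is near the end):
memory_pressure | tail -n 3
```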

Check the model ID:

  • The model must be in MLX format. Standard Hugging Face models (PyTorch, GGUF) won’t work.

  • Look for models in the mlx-community organization on Hugging Face.

  • Verify the model ID is spelled correctly — Hugging Face IDs are case-sensitive.
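
One way to confirm an ID exists exactly as spelled is to query the Hugging Face model API. The MODEL value below is just an example ID; substitute your own:

```shell
# Returns success only if the model ID resolves (IDs are case-sensitive).
MODEL="mlx-community/Mistral-7B-Instruct-v0.2-4bit"
if curl -fsS "https://huggingface.co/api/models/${MODEL}" > /dev/null; then
  echo "model found"
else
  echo "model not found: check spelling and case"
fi
```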

Check network access:

  • The first time a model is loaded, it downloads from Hugging Face. This requires internet access.

  • If you’re behind a corporate proxy, set the HTTPS_PROXY environment variable.

  • Subsequent loads use the cached version in ~/.cache/huggingface/.
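
Note that GUI apps launched from the Dock do not inherit variables exported in your shell profile. One way to pass the proxy to Salut is launchctl (the proxy hostname below is a placeholder); you can also inspect the cache directly:

```shell
# Make the proxy visible to launchd-spawned GUI apps, then relaunch Salut.
# proxy.corp.example:8080 is a placeholder for your actual proxy.
launchctl setenv HTTPS_PROXY http://proxy.corp.example:8080

# List cached models and see how much disk they use:
ls ~/.cache/huggingface/hub/
du -sh ~/.cache/huggingface/
```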

Distributed Inference Not Working

Symptoms: A model that should be distributed across peers is only running locally, or distributed requests fail with errors.

Check pairing status:

  • Open Settings → Peers. Only paired peers participate in distributed inference.

  • If a peer shows as pending, complete the pairing flow on both sides.

Check peer health:

  • In the Peers tab, check the health status of paired peers. Unhealthy peers are excluded from distribution.

  • A peer may be unhealthy if it’s overloaded, unreachable, or has a network issue.

Check model availability:

  • All peers participating in distributed inference must be able to load the model (or their shard of it).

  • The coordinator builds a rendezvous plan based on available VRAM — if no peer has enough combined memory, distribution won’t be attempted.

Slow Performance

Symptoms: Inference is running but tokens are generated slowly.

For local inference:

  • Check that no other GPU-intensive tasks are running (video encoding, games, other ML workloads).

  • Smaller quantized models (4-bit) are faster than larger ones (8-bit).

  • Close browser tabs with WebGL content — they compete for GPU time.

For distributed inference:

  • Check network bandwidth between peers. Gigabit Ethernet is recommended; Wi-Fi works but is slower.

  • Consider enabling INT8 activation transport in Settings → General to halve the bandwidth needed between peers.

  • Fewer, more powerful peers outperform many weak ones (less activation transfer overhead).
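
To measure actual bandwidth between two peers, iperf3 works well. It is not preinstalled on macOS (install it on both machines, e.g. with Homebrew); the hostname below is a placeholder:

```shell
# On the peer, start a server:
#   iperf3 -s
# On this machine, run a 5-second test against it:
iperf3 -c peer-mac.local -t 5
# Gigabit Ethernet should report roughly 940 Mbit/s after protocol overhead;
# numbers far below your link speed point at cabling, Wi-Fi, or congestion.
```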

Beachball or Unresponsive App

If the Salut menu bar app becomes unresponsive:

  1. Wait a few seconds — some operations (model loading, discovery changes) can briefly block.

  2. If it persists, force quit from Activity Monitor and relaunch.

  3. Check Console.app for Salut log messages — filter by process name “Salut” to see what was happening.

If the hang is reproducible, increase the log level to Debug in Settings and reproduce the issue; the debug logs will help diagnose the cause.
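
The same messages Console.app shows are available from the command line via the unified logging system, which is handy for capturing a trace to share:

```shell
# Stream Salut's log messages live, including Debug level:
log stream --predicate 'process == "Salut"' --level debug

# Or collect the last 10 minutes into a file:
log show --predicate 'process == "Salut"' --last 10m --debug > salut-debug.log
```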

Resetting Salut

To start fresh, remove the Salut data directory:

rm -rf ~/Library/Application\ Support/run.salut.Salut/

This removes:

  • Your identity keypair (a new one will be generated on next launch)

  • The peer trust database (all pairings will need to be re-established)

  • Settings (will reset to defaults)

Model caches (in ~/.cache/huggingface/) are separate and not affected.
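
If you also want to reclaim the disk space used by downloaded weights, the hub cache stores each model in its own directory. This is a sketch assuming the standard Hugging Face cache layout (models--org--name); the model directory named below is an example:

```shell
# See what is cached and how large each model is:
du -sh ~/.cache/huggingface/hub/models--* 2>/dev/null

# Remove a single cached model (it will re-download on next load):
rm -rf ~/.cache/huggingface/hub/models--mlx-community--Mistral-7B-Instruct-v0.2-4bit
```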

Getting Help

If you can’t resolve an issue:

  1. Set log level to Debug in Settings.

  2. Reproduce the issue.

  3. Check Console.app for Salut log messages — the debug output usually reveals the cause.