Node overview
ai-max
Primary compute node. Handles the most demanding inference workloads and hosts the majority of GPU-accelerated model runs.
ai-mini-x1
Secondary compute node. Runs inference in parallel with ai-max to distribute load and increase throughput.
jarvis-brain
Central orchestration node. Routes requests through LiteLLM, manages agents, and coordinates across the mesh.
dell-micro
Lightweight edge node. Handles low-latency, lightweight tasks without drawing on primary compute resources.
synologynas
Storage and NAS node. Persists model weights, vector stores, logs, and shared data across the mesh.
How nodes work together
Requests you send to Jarvis flow through a coordinated pipeline:
- jarvis-brain receives the request and routes it via LiteLLM based on the model and load conditions.
- ai-max and ai-mini-x1 run inference using their GPUs — two GPUs are available across the mesh for parallel workloads.
- dell-micro handles edge tasks — fast, low-resource completions that don’t need full GPU compute.
- synologynas provides shared storage that all nodes can read from and write to, including model weights and memory stores.
You don’t need to target individual nodes for most tasks. LiteLLM routes requests automatically based on model availability and current load.
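As a sketch of what such a request looks like, LiteLLM exposes an OpenAI-compatible chat completions interface; the proxy URL and model name below are illustrative assumptions, not fixed values in this mesh:

```python
import json

# Assumed LiteLLM proxy address on jarvis-brain; adjust to your deployment.
LITELLM_URL = "http://jarvis-brain:4000/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat completion payload for the router."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("llama3", "Which nodes are busiest right now?")
# The payload would be POSTed to LITELLM_URL; LiteLLM then picks a node.
print(json.dumps(payload))
```

Because routing happens inside LiteLLM, the same payload works regardless of which GPU node ultimately serves it.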
Node roles at a glance
| Node | Role | GPU | Best for |
|---|---|---|---|
| ai-max | Primary compute | Yes | Large models, heavy inference |
| ai-mini-x1 | Secondary compute | Yes | Parallel inference, overflow load |
| jarvis-brain | Orchestration | No | Routing, agents, coordination |
| dell-micro | Edge compute | No | Fast, lightweight completions |
| synologynas | Storage/NAS | No | Persistence, model weights, memory |
Check node status
You can check the status of each node and the overall mesh through the monitoring dashboard or via the API. Send a GET request to the health endpoint to see which nodes are reachable. The response lists each node, its current availability, and active model assignments.
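A minimal sketch of handling the health response, assuming the proxy exposes a `/health` path and returns per-node entries in the shape shown (both the URL and the response format are assumptions for illustration):

```python
import json

# Assumed health endpoint on the LiteLLM proxy; substitute your mesh's
# actual address and path.
HEALTH_URL = "http://jarvis-brain:4000/health"

def parse_health(body: str) -> dict:
    """Map each node name to its reported availability flag."""
    data = json.loads(body)
    return {node["name"]: node["available"] for node in data.get("nodes", [])}

# A response body in the assumed shape, as the endpoint might return it:
sample = '{"nodes": [{"name": "ai-max", "available": true}, {"name": "dell-micro", "available": false}]}'
print(parse_health(sample))  # {'ai-max': True, 'dell-micro': False}
```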
Route requests to a specific node
By default, LiteLLM selects the best node for each request. If you need to target a specific node (for example, to run on a GPU node or to avoid one under heavy load), pass the `node` parameter in your request.
Next steps
Models
See the full model fleet running across the mesh.
Inference
Learn how to send inference requests through LiteLLM.