Jarvis runs a fleet of local models through Ollama, all accessible via a single LiteLLM endpoint. You can send requests to a specific model by name, or let LiteLLM select the best one for your task.
All models run on your own hardware. No requests or prompts are sent to external services unless you explicitly configure a cloud provider in LiteLLM.

Model categories

General-purpose models handle a wide range of tasks: writing, summarization, question answering, and reasoning.
Model      Best for
Llama 3    Balanced performance across most tasks
Mistral    Fast responses, strong instruction following
DeepSeek   Reasoning-heavy tasks and analysis
Use these when you don’t have a specific requirement — they handle most everyday workloads well.

Select a model for your task

Pass the model name in your request using the standard OpenAI-compatible format. LiteLLM maps the name to the correct Ollama model and routes to an available node.
{
  "model": "llama3",
  "messages": [{ "role": "user", "content": "Summarize this document." }]
}
You can use any model name that Ollama has pulled on the mesh. To see what’s available:
curl https://your-jarvis-host/api/models
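If you want to work with the model list programmatically, a small helper can pull the names out of the response. This is a sketch only: it assumes the endpoint returns OpenAI-style JSON of the form `{"data": [{"id": "..."}]}` — check the actual response shape of your deployment before relying on it.

```python
import json

def model_names(payload: str) -> list[str]:
    """Extract model names from a models-list response body.

    Assumes an OpenAI-style shape like {"data": [{"id": "llama3"}, ...]};
    adjust the keys if your endpoint returns something different.
    """
    return [m["id"] for m in json.loads(payload).get("data", [])]

# Hypothetical response body, for illustration:
sample = '{"data": [{"id": "llama3"}, {"id": "mistral"}]}'
print(model_names(sample))  # -> ['llama3', 'mistral']
```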

Model routing and load balancing

LiteLLM manages routing automatically:
  • Load balancing — if a model runs on multiple nodes, LiteLLM distributes requests across them.
  • Failover — if a node becomes unavailable, requests fall through to another node running the same model.
  • Priority routing — ai-max and ai-mini-x1 receive GPU-bound requests first; dell-micro picks up lightweight overflow.
You can override routing by specifying a node alongside your model. See Nodes for details.
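For a sense of how this behavior is typically wired up, here is a hedged sketch of a LiteLLM proxy config in which two deployments share one public model name, so requests are balanced between them and fail over if a node drops out. The hostnames and ports are illustrative assumptions, not your mesh's actual values.

```yaml
# Hypothetical LiteLLM proxy config fragment. Both entries expose the
# public name "llama3"; LiteLLM load-balances across them and retries
# on the other deployment if one api_base is unreachable.
model_list:
  - model_name: llama3
    litellm_params:
      model: ollama/llama3
      api_base: http://ai-max:11434   # illustrative node address
  - model_name: llama3
    litellm_params:
      model: ollama/llama3
      api_base: http://dell-micro:11434   # illustrative node address
```

The key idea is that duplicate `model_name` entries form a routing pool; clients keep sending `"model": "llama3"` and never see which node served the request.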

LiteLLM API call format

LiteLLM exposes an OpenAI-compatible API. Any client that works with the OpenAI SDK works with Jarvis.
curl https://your-jarvis-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "llama3",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'

Next steps

Inference

Learn how to send requests, stream responses, and get good results.

API reference

Full reference for the models API endpoint.