All models run on your own hardware. No requests or prompts are sent to external services unless you explicitly configure a cloud provider in LiteLLM.
Model categories
- General purpose
- Code
- Fast
General-purpose models handle a wide range of tasks: writing, summarization, question answering, and reasoning.
Use these when you don’t have a specific requirement — they handle most everyday workloads well.
| Model | Best for |
|---|---|
| Llama 3 | Balanced performance across most tasks |
| Mistral | Fast responses, strong instruction following |
| DeepSeek | Reasoning-heavy tasks and analysis |
Select a model for your task
Pass the model name in your request using the standard OpenAI-compatible format. LiteLLM maps the name to the correct Ollama model and routes to an available node.
Model routing and load balancing
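As a sketch of that request format, the snippet below builds an OpenAI-style chat completion request against the proxy. The host, port, and path are assumptions for illustration; substitute the address of your own LiteLLM instance.

```python
import json
import urllib.request

# Hypothetical LiteLLM proxy endpoint; replace with your deployment's address.
LITELLM_URL = "http://localhost:4000/v1/chat/completions"

# Standard OpenAI-compatible body; "model" selects which Ollama model to use.
payload = {
    "model": "llama3",  # or "mistral" / "deepseek", per the table above
    "messages": [
        {"role": "user", "content": "Summarize this paragraph in one sentence."}
    ],
}

request = urllib.request.Request(
    LITELLM_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted here so the
# example stays self-contained without a running proxy.
```

Only the `model` field changes when you switch models; the rest of the request stays the same.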
LiteLLM manages routing automatically:
- Load balancing — if a model runs on multiple nodes, LiteLLM distributes requests across them.
- Failover — if a node becomes unavailable, requests fail over to another node running the same model.
- Priority routing — ai-max and ai-mini-x1 receive GPU-bound requests first; dell-micro picks up lightweight overflow.
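This behavior is driven by the proxy's model list: registering the same `model_name` against several backends is what enables load balancing and failover across them. A minimal sketch of what that configuration might look like, assuming the nodes are reachable by their names on Ollama's default port (both assumptions):

```yaml
model_list:
  # Same model_name on multiple nodes -> LiteLLM load-balances across them.
  - model_name: llama3
    litellm_params:
      model: ollama/llama3
      api_base: http://ai-max:11434       # GPU node (assumed address)
  - model_name: llama3
    litellm_params:
      model: ollama/llama3
      api_base: http://ai-mini-x1:11434   # GPU node (assumed address)
  - model_name: llama3
    litellm_params:
      model: ollama/llama3
      api_base: http://dell-micro:11434   # lightweight overflow (assumed address)

router_settings:
  # If one deployment is unreachable, retry on another with the same model_name.
  num_retries: 2
```

Your actual config will differ; this only illustrates how one public model name can map to several backends.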
LiteLLM API call format
LiteLLM exposes an OpenAI-compatible API. Any client that works with the OpenAI SDK works with Jarvis.
Next steps
Inference
Learn how to send requests, stream responses, and get good results.
API reference
Full reference for the models API endpoint.