The Model API is OpenAI-compatible. If you already use the OpenAI Python or Node.js SDK, you can switch to Jarvis by changing the base_url and API key — no other code changes needed. All requests require an Authorization header. See the API overview for authentication details.
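A minimal sketch of that drop-in switch with the official OpenAI Python SDK (assumptions: the openai package is installed, and "https://your-jarvis-host" and "jrv_yourkey123" are placeholders for your instance URL and key):

```python
# Hedged sketch: the only change from a stock OpenAI setup is the
# base_url and api_key. Both values below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-jarvis-host",  # your Jarvis instance
    api_key="jrv_yourkey123",             # your Jarvis API key
)
```

Calls such as client.chat.completions.create(...) then work unchanged against the endpoints documented below.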

Model naming

Jarvis runs models via Ollama, and model names follow the provider/model convention:
Model name             Description
ollama/llama3          Meta Llama 3 (8B or 70B depending on your node)
ollama/mistral         Mistral 7B — fast, general-purpose
ollama/deepseek-coder  DeepSeek Coder — optimized for code tasks
ollama/phi3            Microsoft Phi-3 — efficient, low-memory
ollama/gemma2          Google Gemma 2
Use GET /models to see the exact list available on your instance.

GET /models

List all models currently available for inference.

Response

object (string): Always "list".
data (array): An array of model objects.

Example

curl https://your-jarvis-host/models \
  -H "Authorization: Bearer jrv_yourkey123"
Response
{
  "object": "list",
  "data": [
    {
      "id": "ollama/llama3",
      "object": "model",
      "created": 1712000000,
      "owned_by": "ollama"
    },
    {
      "id": "ollama/mistral",
      "object": "model",
      "created": 1712000000,
      "owned_by": "ollama"
    }
  ]
}
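A response in this shape is easy to reduce to just the model ids before picking one for inference. A minimal Python sketch, using the sample payload from this page:

```python
import json

# Sample GET /models response from this page.
raw = """
{
  "object": "list",
  "data": [
    {"id": "ollama/llama3", "object": "model", "created": 1712000000, "owned_by": "ollama"},
    {"id": "ollama/mistral", "object": "model", "created": 1712000000, "owned_by": "ollama"}
  ]
}
"""

payload = json.loads(raw)
model_ids = [m["id"] for m in payload["data"]]
print(model_ids)  # ['ollama/llama3', 'ollama/mistral']
```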

POST /chat/completions

Send a multi-turn conversation to a model and receive a completion. This endpoint is fully OpenAI-compatible.

Request

model (string, required): The model to use (e.g., ollama/llama3). Use GET /models to see available options.
messages (array, required): An array of message objects representing the conversation history.
stream (boolean, optional): If true, the response streams as server-sent events. Defaults to false.
temperature (number, optional): Sampling temperature between 0 and 2. Higher values produce more varied output. Defaults to 1.
max_tokens (integer, optional): Maximum number of tokens to generate. Defaults to model-dependent limits.

Response

id (string): Unique identifier for this completion.
object (string): Always "chat.completion".
model (string): The model that generated the response.
choices (array): An array of completion choices (usually one).
usage (object): Token usage for the request.

Example

curl -X POST https://your-jarvis-host/chat/completions \
  -H "Authorization: Bearer jrv_yourkey123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/llama3",
    "messages": [
      { "role": "system", "content": "You are a helpful infrastructure assistant." },
      { "role": "user", "content": "What are common causes of high disk I/O on Linux?" }
    ],
    "temperature": 0.7
  }'
Response
{
  "id": "chatcmpl-9v2kxp3z",
  "object": "chat.completion",
  "model": "ollama/llama3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Common causes of high disk I/O include..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 34,
    "completion_tokens": 112,
    "total_tokens": 146
  }
}
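The same request can also be issued without any SDK. A minimal standard-library sketch (host and key are placeholders, and the network call itself is left commented out so the snippet stays self-contained):

```python
import json
import urllib.request

def build_chat_request(host, api_key, model, messages, **options):
    """Assemble an OpenAI-compatible POST /chat/completions request."""
    payload = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        f"{host}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "https://your-jarvis-host", "jrv_yourkey123", "ollama/llama3",
    [{"role": "user", "content": "What are common causes of high disk I/O on Linux?"}],
    temperature=0.7,
)
# with urllib.request.urlopen(req) as resp:  # uncomment to actually send
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```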

POST /completions

Generate a text completion from a raw prompt string (non-chat format).

Request

model (string, required): The model to use (e.g., ollama/mistral).
prompt (string, required): The input text to complete.
max_tokens (integer, optional): Maximum tokens to generate.
temperature (number, optional): Sampling temperature. Defaults to 1.
stream (boolean, optional): Stream the response as server-sent events. Defaults to false.

Response

id (string): Unique identifier for this completion.
object (string): Always "text_completion".
choices (array): An array of completion choices.

Example

curl -X POST https://your-jarvis-host/completions \
  -H "Authorization: Bearer jrv_yourkey123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/mistral",
    "prompt": "Write a one-paragraph summary of the benefits of local AI inference:",
    "max_tokens": 150
  }'
For most tasks, prefer POST /chat/completions. The chat format gives the model explicit role context for each message and produces better results with instruction-tuned models.
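Both endpoints accept stream: true. A hedged sketch of consuming the resulting stream, assuming OpenAI-style server-sent-event framing (each chunk on a `data: {json}` line, terminated by `data: [DONE]`; verify against your instance's actual output):

```python
import json

def collect_stream(lines):
    """Concatenate content deltas from OpenAI-style SSE chunk lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        # Each chunk carries an incremental "delta"; the first chunk may
        # only set the role, with no content yet.
        parts.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(parts)

# Synthetic example of what a streamed chat completion might look like:
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # Hello world
```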