The Model API is OpenAI-compatible. If you already use the OpenAI Python or Node.js SDK, you can switch to Jarvis by changing the base_url and API key — no other code changes needed. All requests require an Authorization header. See the API overview for authentication details.
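A minimal sketch of that drop-in switch with the official OpenAI Python SDK (assumptions: the openai package is installed, and "https://your-jarvis-host" and "jrv_yourkey123" are placeholders for your instance URL and key):

```python
# Hedged sketch: the only change from a stock OpenAI setup is the
# base_url and api_key. Both values below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-jarvis-host",  # your Jarvis instance
    api_key="jrv_yourkey123",             # your Jarvis API key
)
```

Calls such as client.chat.completions.create(...) then work unchanged against the endpoints documented below.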

Model naming

Jarvis runs models via Ollama, and model names follow the provider/model convention:
Model name             Description
ollama/llama3          Meta Llama 3 (8B or 70B depending on your node)
ollama/mistral         Mistral 7B — fast, general-purpose
ollama/deepseek-coder  DeepSeek Coder — optimized for code tasks
ollama/phi3            Microsoft Phi-3 — efficient, low-memory
ollama/gemma2          Google Gemma 2
Use GET /models to see the exact list available on your instance.

GET /models

List all models currently available for inference.

Response

object (string): Always "list".
data (array): An array of model objects.

Example

curl https://your-jarvis-host/models \
  -H "Authorization: Bearer jrv_yourkey123"
Response
{
  "object": "list",
  "data": [
    {
      "id": "ollama/llama3",
      "object": "model",
      "created": 1712000000,
      "owned_by": "ollama"
    },
    {
      "id": "ollama/mistral",
      "object": "model",
      "created": 1712000000,
      "owned_by": "ollama"
    }
  ]
}
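A response in this shape is easy to reduce to just the model ids before picking one for inference. A minimal Python sketch, using the sample payload from this page:

```python
import json

# Sample GET /models response from this page.
raw = """
{
  "object": "list",
  "data": [
    {"id": "ollama/llama3", "object": "model", "created": 1712000000, "owned_by": "ollama"},
    {"id": "ollama/mistral", "object": "model", "created": 1712000000, "owned_by": "ollama"}
  ]
}
"""

payload = json.loads(raw)
model_ids = [m["id"] for m in payload["data"]]
print(model_ids)  # ['ollama/llama3', 'ollama/mistral']
```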

POST /chat/completions

Send a multi-turn conversation to a model and receive a completion. This endpoint is fully OpenAI-compatible.

Request

model (string, required): The model to use (e.g., ollama/llama3). Use GET /models to see available options.
messages (array, required): An array of message objects representing the conversation history.
stream (boolean, optional): If true, the response streams as server-sent events. Defaults to false.
temperature (number, optional): Sampling temperature between 0 and 2. Higher values produce more varied output. Defaults to 1.
max_tokens (integer, optional): Maximum number of tokens to generate. Defaults to model-dependent limits.

Response

id (string): Unique identifier for this completion.
object (string): Always "chat.completion".
model (string): The model that generated the response.
choices (array): An array of completion choices (usually one).
usage (object): Token usage for the request.

Example

curl -X POST https://your-jarvis-host/chat/completions \
  -H "Authorization: Bearer jrv_yourkey123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/llama3",
    "messages": [
      { "role": "system", "content": "You are a helpful infrastructure assistant." },
      { "role": "user", "content": "What are common causes of high disk I/O on Linux?" }
    ],
    "temperature": 0.7
  }'
Response
{
  "id": "chatcmpl-9v2kxp3z",
  "object": "chat.completion",
  "model": "ollama/llama3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Common causes of high disk I/O include..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 34,
    "completion_tokens": 112,
    "total_tokens": 146
  }
}
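The same request can also be issued without any SDK. A minimal standard-library sketch (host and key are placeholders, and the network call itself is left commented out so the snippet stays self-contained):

```python
import json
import urllib.request

def build_chat_request(host, api_key, model, messages, **options):
    """Assemble an OpenAI-compatible POST /chat/completions request."""
    payload = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        f"{host}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "https://your-jarvis-host", "jrv_yourkey123", "ollama/llama3",
    [{"role": "user", "content": "What are common causes of high disk I/O on Linux?"}],
    temperature=0.7,
)
# with urllib.request.urlopen(req) as resp:  # uncomment to actually send
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```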

POST /completions

Generate a text completion from a raw prompt string (non-chat format).

Request

model (string, required): The model to use (e.g., ollama/mistral).
prompt (string, required): The input text to complete.
max_tokens (integer, optional): Maximum tokens to generate.
temperature (number, optional): Sampling temperature. Defaults to 1.
stream (boolean, optional): Stream the response as server-sent events. Defaults to false.

Response

id (string): Unique identifier for this completion.
object (string): Always "text_completion".
choices (array): An array of completion choices.

Example

curl -X POST https://your-jarvis-host/completions \
  -H "Authorization: Bearer jrv_yourkey123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/mistral",
    "prompt": "Write a one-paragraph summary of the benefits of local AI inference:",
    "max_tokens": 150
  }'
For most tasks, prefer POST /chat/completions. The chat format gives the model explicit role context for each message and produces better results with instruction-tuned models.
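Both endpoints accept stream: true. A hedged sketch of consuming the resulting stream, assuming OpenAI-style server-sent-event framing (each chunk on a `data: {json}` line, terminated by `data: [DONE]`; verify against your instance's actual output):

```python
import json

def collect_stream(lines):
    """Concatenate content deltas from OpenAI-style SSE chunk lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        # Each chunk carries an incremental "delta"; the first chunk may
        # only set the role, with no content yet.
        parts.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(parts)

# Synthetic example of what a streamed chat completion might look like:
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # Hello world
```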