# Ollama Chat API

Endpoint: `POST /api/chat`

## Request

```json
{
  "model": "llama3.2",
  "messages": [
    { "role": "system",    "content": "string" },
    { "role": "user",      "content": "string" },
    { "role": "assistant", "content": "string" },
    { "role": "tool",      "content": "string" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "string",
        "description": "string",
        "parameters": {
          "type": "object",
          "properties": { "param": { "type": "string", "description": "..." } },
          "required": ["param"]
        }
      }
    }
  ],
  "think":     false,
  "format":    "json",
  "stream":    true,
  "keep_alive": "5m",
  "options": {
    "temperature": 0.7,
    "num_ctx":     131072,
    "num_predict": 4096
  }
}
```

### Fields

| Field        | Type             | Required | Notes                                                    |
|--------------|------------------|----------|----------------------------------------------------------|
| `model`      | string           | Yes      | Model name (e.g. `qwen3.6:35b-a3b-q4_K_M`)               |
| `messages`   | array            | Yes      | Conversation history                                     |
| `tools`      | array            | No       | Function definitions (OpenAI-compatible format)          |
| `think`      | boolean          | No       | Enable chain-of-thought output (thinking models only)    |
| `format`     | string or object | No       | `"json"` or a JSON schema object for structured output   |
| `stream`     | boolean          | No       | Default: `true`                                          |
| `keep_alive` | string or number | No       | How long to keep the model loaded. Default: `5m`         |
| `options`    | object           | No       | Model runtime parameters (e.g. `temperature`, `num_ctx`) |
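
The field table can be turned into a small request builder. The following is a minimal sketch, not part of the Ollama API: `build_chat_request` is a hypothetical helper name, and it only enforces the one rule that is unambiguous from the table (`model` must be set), leaving optional fields out of the body when they are not supplied.

```python
import json
from typing import Any, Optional


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    tools: Optional[list[dict[str, Any]]] = None,
    stream: bool = True,
    **extra: Any,
) -> dict[str, Any]:
    """Build a /api/chat request body, omitting optional fields that are unset."""
    if not model:
        raise ValueError("model is required")
    body: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    if tools:
        body["tools"] = tools
    # Pass through the remaining optional fields only when the caller provided them.
    for key in ("think", "format", "keep_alive", "options"):
        if extra.get(key) is not None:
            body[key] = extra[key]
    return body


request_body = build_chat_request(
    "llama3.2",
    [{"role": "user", "content": "Hello"}],
    stream=False,
    options={"temperature": 0.7},
)
payload = json.dumps(request_body).encode("utf-8")  # bytes to POST to /api/chat
```

Keeping unset fields out of the body lets the server apply its own defaults (for example `keep_alive` of `5m`) instead of hard-coding them in the client.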

## Streaming Response (NDJSON)

Each line is a standalone JSON object:

```json
{ "model": "...", "message": { "role": "assistant", "content": "partial text" }, "done": false }
```

Final line (`"done": true`). All `*_duration` values are in nanoseconds:

```json
{
  "model": "...",
  "message": { "role": "assistant", "content": "" },
  "done": true,
  "done_reason": "stop",
  "total_duration":      1234567890,
  "load_duration":       987654321,
  "prompt_eval_count":   50,
  "eval_count":          200,
  "eval_duration":       12345678
}
```
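
A streaming client reads the body line by line, parses each line as JSON, and concatenates `message.content` until it sees `"done": true`. The sketch below assumes the lines have already been read from the HTTP response; `iter_chunks`, `collect_text`, and the sample data are hypothetical and exist only for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Parse NDJSON: one JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def collect_text(lines: Iterable[str]) -> tuple[str, dict]:
    """Concatenate partial content and return (full_text, final_chunk)."""
    parts: list[str] = []
    final: dict = {}
    for chunk in iter_chunks(lines):
        parts.append(chunk.get("message", {}).get("content") or "")
        if chunk.get("done"):
            final = chunk
            break
    return "".join(parts), final


# Made-up sample stream, shaped like the chunks documented above.
sample_stream = [
    '{"model": "m", "message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"model": "m", "message": {"role": "assistant", "content": "lo"}, "done": false}',
    '{"model": "m", "message": {"role": "assistant", "content": ""}, "done": true, '
    '"done_reason": "stop", "eval_count": 200, "eval_duration": 2000000000}',
]
text, final = collect_text(sample_stream)
# eval_duration is in nanoseconds, so scale by 1e9 to get tokens per second.
tokens_per_second = final["eval_count"] * 1e9 / final["eval_duration"]
```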

### Tool Call Response

When the model decides to use a tool, `message.tool_calls` is set and `message.content` is typically an empty string:

```json
{
  "model": "...",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "get_weather",
          "arguments": { "location": "Berlin", "unit": "celsius" }
        }
      }
    ]
  },
  "done": false
}
```

Note: `tool_calls[].function.arguments` is an **object** (already parsed JSON), not a JSON-encoded string as in the OpenAI API.
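
Because the arguments arrive as an object, they can be passed straight to a local function as keyword arguments. The following is a rough sketch under that assumption; the `TOOLS` registry, `run_tool_calls`, and the placeholder `get_weather` implementation are all hypothetical.

```python
import json
from typing import Any, Callable


def get_weather(location: str, unit: str = "celsius") -> dict[str, Any]:
    """Hypothetical tool; returns placeholder data instead of a real forecast."""
    return {"location": location, "unit": unit, "forecast": "placeholder"}


# Maps the tool name from the request's `tools` list to a local callable.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(message: dict[str, Any]) -> list[Any]:
    """Execute every tool call in an assistant message and return the results in order."""
    results = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        args = fn["arguments"]
        # Defensive: also accept a JSON-encoded string, as OpenAI-style clients send.
        if isinstance(args, str):
            args = json.loads(args)
        # An unknown tool name raises KeyError here; real code may want to report it.
        results.append(TOOLS[fn["name"]](**args))
    return results
```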

### done_reason Values

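| Value        | Meaning                                                      |
|--------------|--------------------------------------------------------------|
| `stop`       | Natural end of generation                                    |
| `length`     | Token limit (`num_predict`) reached                          |
| `tool_calls` | Model triggered a tool call                                  |
| `load`       | Model was loaded (request with empty `messages`)             |
| `unload`     | Model was unloaded (empty `messages` and `keep_alive` of 0)  |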

## Tool Result Message

After receiving a tool call, run the tool and send its result back as a message with role `tool`:

```json
{
  "role": "tool",
  "content": "result text or JSON string"
}
```
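
A possible way to assemble the follow-up request is sketched below. It assumes the assistant message that contained `tool_calls` is echoed back into `messages` before the tool results, and that non-string results are serialized with `json.dumps`; `with_tool_results` is a hypothetical helper name.

```python
import json
from typing import Any


def with_tool_results(
    messages: list[dict[str, Any]],
    assistant_message: dict[str, Any],
    results: list[Any],
) -> list[dict[str, Any]]:
    """Return a new history: prior messages, the assistant turn, then one tool message per result."""
    tool_messages = [
        # `content` must be a string, so serialize anything that is not one already.
        {"role": "tool", "content": r if isinstance(r, str) else json.dumps(r)}
        for r in results
    ]
    return [*messages, assistant_message, *tool_messages]
```

The returned list becomes the `messages` field of the next `POST /api/chat` request, so the model can continue with the tool output in context.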

## Non-Streaming Response

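With `"stream": false`, the response is a single JSON object. It has the same structure as the final streaming line, except that `message` carries the complete content (and any `tool_calls`) instead of an empty string.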
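
The relationship between the two modes can be illustrated by folding streamed chunks into one object of the non-streaming shape. This is only a sketch of that equivalence as described above, not code taken from Ollama; `merge_chunks` is a hypothetical name.

```python
from typing import Any


def merge_chunks(chunks: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold already-parsed stream chunks into one object shaped like a non-streaming response."""
    if not chunks:
        raise ValueError("no chunks to merge")
    content = "".join(c.get("message", {}).get("content") or "" for c in chunks)
    tool_calls = [
        tc for c in chunks for tc in (c.get("message", {}).get("tool_calls") or [])
    ]
    # The final chunk carries done, done_reason and the timing/count fields.
    merged = dict(chunks[-1])
    message: dict[str, Any] = {"role": "assistant", "content": content}
    if tool_calls:
        message["tool_calls"] = tool_calls
    merged["message"] = message
    return merged
```

A proxy that always streams from the backend could use a fold like this to serve clients that asked for a non-streaming reply.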