initial version

2026-05-10 10:46:41 +02:00
commit deb0d5de9d
10 changed files with 2049 additions and 0 deletions

doc/anthropic-api.md Normal file

@@ -0,0 +1,164 @@
# Anthropic Messages API
Endpoint: `POST /v1/messages`
Docs: https://docs.anthropic.com/en/api/messages
## Request
```json
{
"model": "claude-opus-4-7",
"max_tokens": 4096,
"messages": [
{
"role": "user",
"content": "string OR array of content blocks"
}
],
"system": "string OR array of system blocks",
"temperature": 0.7,
"stop_sequences": ["---"],
"tools": [
{
"name": "tool_name",
"description": "What this tool does",
"input_schema": {
"type": "object",
"properties": {
"param": { "type": "string", "description": "..." }
},
"required": ["param"]
}
}
],
"tool_choice": { "type": "auto" },
"stream": true
}
```
### Fields
| Field | Type | Required | Notes |
|------------------|--------------|----------|---------------------------------------------------|
| `model` | string | Yes | e.g. `claude-opus-4-7`, `claude-sonnet-4-6` |
| `max_tokens` | number | Yes | Max output tokens |
| `messages`       | array        | Yes      | `role`: `user` or `assistant`                     |
| `system` | string/array | No | System prompt (separate from messages) |
| `temperature`    | number       | No       | 0.0–1.0                                           |
| `stop_sequences` | array | No | Strings that stop generation |
| `tools` | array | No | Tool definitions with `input_schema` (JSON Schema)|
| `tool_choice`    | object       | No       | `{ type: "auto\|any\|tool", name?: "..." }`       |
| `stream` | boolean | No | Enable SSE streaming |
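A minimal sketch of building a request body from the fields above. The `fetch` call is shown commented out; the endpoint URL and `anthropic-version` header value follow the public docs, but verify them against your account setup.

```javascript
// Sketch: build a minimal Messages API request body (only the required fields).
function buildMessagesRequest(model, userText, maxTokens = 1024) {
  return {
    model,
    max_tokens: maxTokens,
    messages: [{ role: 'user', content: userText }],
  };
}

// It would be sent roughly like this (not executed here):
// fetch('https://api.anthropic.com/v1/messages', {
//   method: 'POST',
//   headers: {
//     'content-type': 'application/json',
//     'x-api-key': process.env.ANTHROPIC_API_KEY,
//     'anthropic-version': '2023-06-01',
//   },
//   body: JSON.stringify(buildMessagesRequest('claude-opus-4-7', 'Hello')),
// });
```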
## Message Content Types
```json
{ "type": "text", "text": "string" }
{ "type": "image", "source": { "type": "base64", "media_type": "image/jpeg", "data": "..." } }
{ "type": "tool_use", "id": "toolu_abc", "name": "tool_name", "input": {} }
{ "type": "tool_result", "tool_use_id": "toolu_abc", "content": "result string or array" }
```
## Streaming SSE Events
SSE format: each event is two lines `event: <type>\ndata: <json>` followed by blank line.
### Event Order
1. `message_start`
2. For each content block: `content_block_start` → N× `content_block_delta` → `content_block_stop`
3. `message_delta`
4. `message_stop`
Interspersed `ping` events may appear at any time.
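A minimal sketch of a parser for the wire format described above, assuming the full text of complete events is available; a real client must also handle events split across network chunks.

```javascript
// Parse SSE text into { type, data } events. Events are separated by a blank
// line; each has an "event: <type>" line and a "data: <json>" line.
function parseSSE(text) {
  const events = [];
  for (const block of text.split('\n\n')) {
    if (!block.trim()) continue;
    let type = null, data = null;
    for (const line of block.split('\n')) {
      if (line.startsWith('event: ')) type = line.slice(7);
      else if (line.startsWith('data: ')) data = JSON.parse(line.slice(6));
    }
    if (type) events.push({ type, data });
  }
  return events;
}
```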
### message_start
```json
{
"type": "message_start",
"message": {
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"model": "claude-opus-4-7",
"content": [],
"stop_reason": null,
"stop_sequence": null,
"usage": { "input_tokens": 25, "output_tokens": 1 }
}
}
```
### content_block_start
```json
{ "type": "content_block_start", "index": 0,
"content_block": { "type": "text", "text": "" } }
{ "type": "content_block_start", "index": 1,
"content_block": { "type": "tool_use", "id": "toolu_abc", "name": "get_weather", "input": {} } }
```
### content_block_delta
```json
{ "type": "content_block_delta", "index": 0,
"delta": { "type": "text_delta", "text": "Hello" } }
{ "type": "content_block_delta", "index": 1,
"delta": { "type": "input_json_delta", "partial_json": "{\"loc" } }
```
### content_block_stop
```json
{ "type": "content_block_stop", "index": 0 }
```
### message_delta
```json
{
"type": "message_delta",
"delta": {
"stop_reason": "end_turn",
"stop_sequence": null
},
"usage": { "output_tokens": 15 }
}
```
`stop_reason` values: `end_turn` | `stop_sequence` | `max_tokens` | `tool_use`
### message_stop
```json
{ "type": "message_stop" }
```
## Non-Streaming Response
```json
{
"id": "msg_abc",
"type": "message",
"role": "assistant",
"model": "claude-opus-4-7",
"content": [
{ "type": "text", "text": "Hello!" },
{ "type": "tool_use", "id": "toolu_abc", "name": "get_weather", "input": { "location": "Berlin" } }
],
"stop_reason": "tool_use",
"stop_sequence": null,
"usage": { "input_tokens": 100, "output_tokens": 50 }
}
```
## Tool Flow
1. Send request with `tools` array
2. Model responds with `stop_reason: "tool_use"` and `content` block of `type: "tool_use"`
3. Execute the tool locally
4. Send next user message with `type: "tool_result"` content block referencing `tool_use_id`
5. Continue until `stop_reason: "end_turn"`
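Step 4 above can be sketched as a small helper that turns a `tool_use` block from the assistant's response into the follow-up user message (helper name is illustrative):

```javascript
// Build the next user message carrying the tool result, referencing the
// tool_use block's id via tool_use_id.
function buildToolResultMessage(toolUseBlock, resultText) {
  return {
    role: 'user',
    content: [{
      type: 'tool_result',
      tool_use_id: toolUseBlock.id,
      content: resultText,
    }],
  };
}
```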

doc/ollama-api.md Normal file

@@ -0,0 +1,125 @@
# Ollama Chat API
Endpoint: `POST /api/chat`
## Request
```json
{
"model": "llama3.2",
"messages": [
{ "role": "system", "content": "string" },
{ "role": "user", "content": "string" },
{ "role": "assistant", "content": "string" },
{ "role": "tool", "content": "string" }
],
"tools": [
{
"type": "function",
"function": {
"name": "string",
"description": "string",
"parameters": {
"type": "object",
"properties": { "param": { "type": "string", "description": "..." } },
"required": ["param"]
}
}
}
],
"think": false,
"format": "json",
"stream": true,
"keep_alive": "5m",
"options": {
"temperature": 0.7,
"num_ctx": 131072,
"num_predict": 4096
}
}
```
### Fields
| Field | Type | Required | Notes |
|--------------|---------|----------|------------------------------------------------|
| `model` | string | Yes | Model name (e.g. `qwen3.6:35b-a3b-q4_K_M`) |
| `messages` | array | Yes | Conversation history |
| `tools` | array | No | Function definitions (OpenAI-compatible format)|
| `think` | boolean | No | Enable chain-of-thought (thinking models only) |
| `format` | string | No | `"json"` or JSON schema for structured output |
| `stream` | boolean | No | Default: `true` |
| `keep_alive` | string | No | How long to keep model loaded. Default: `5m` |
| `options` | object | No | Model runtime parameters |
## Streaming Response (NDJSON)
Each line is a standalone JSON object:
```json
{ "model": "...", "message": { "role": "assistant", "content": "partial text" }, "done": false }
```
Final line (done):
```json
{
"model": "...",
"message": { "role": "assistant", "content": "" },
"done": true,
"done_reason": "stop",
"total_duration": 1234567890,
"load_duration": 987654321,
"prompt_eval_count": 50,
"eval_count": 200,
"eval_duration": 12345678
}
```
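Because the stream is newline-delimited JSON, a consumer has to buffer partial lines across network chunks. A minimal sketch of that splitting step:

```javascript
// Append a raw chunk to the carry-over buffer, parse all complete NDJSON
// lines, and return any trailing partial line for the next round.
function splitNDJSON(buffer, chunk) {
  const lines = (buffer + chunk).split('\n');
  const rest = lines.pop(); // possibly incomplete last line
  const objects = lines.filter(l => l.trim()).map(l => JSON.parse(l));
  return { objects, rest };
}
```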
### Tool Call Response
When the model decides to use a tool, `message.tool_calls` is set (content is empty/null):
```json
{
"model": "...",
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"function": {
"name": "get_weather",
"arguments": { "location": "Berlin", "unit": "celsius" }
}
}
]
},
"done": false
}
```
Note: `tool_calls[].function.arguments` is an **object** (already parsed JSON), not a string.
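Given the note above, the arguments can be used directly. A small defensive helper (hypothetical name) that also tolerates a stringified form, as emitted by some OpenAI-compatible servers:

```javascript
// Return tool-call arguments as an object, whether they arrive parsed
// (Ollama's documented behavior) or as a JSON string.
function toolArgs(call) {
  const a = call.function?.arguments;
  return typeof a === 'string' ? JSON.parse(a) : (a ?? {});
}
```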
### done_reason Values
| Value | Meaning |
|---------------|----------------------------------|
| `stop` | Natural end of generation |
| `tool_calls` | Model triggered a tool call |
| `load` | Model was loaded |
| `unload` | Model was unloaded |
## Tool Result Message
After receiving a tool call, send result as role `tool`:
```json
{
"role": "tool",
"content": "result text or JSON string"
}
```
## Non-Streaming Response
Single JSON object with all fields combined (same structure as final streaming line).

doc/proxy-analysis.md Normal file

@@ -0,0 +1,131 @@
# Proxy Implementation Analysis
## What the Proxy Does
Translates the **Anthropic API** (`POST /v1/messages`) → **Ollama API** (`POST /api/chat`)
- Receives requests in Anthropic format
- Returns responses in Anthropic SSE format
- Routing: `localhost:11435` → `https://ollama.aquantico.de/api/chat`
---
## Correct Implementations ✓
### 1. Model Substitution (correct)
```js
if (anthropicBody.model?.startsWith('claude-')) {
  anthropicBody.model = 'qwen3.6:35b-a3b-q4_K_M';
}
```
All `claude-*` models are replaced with the local model.
### 2. Think Mode Disabled (correct)
```js
think: false
```
Hardcoded in `convertAnthropicToOllama()`.
### 3. Tool Schema Conversion (correct)
Anthropic `input_schema` → Ollama `function.parameters`:
```js
{ type: 'function', function: { name, description, parameters: sanitizeToolSchema(tool.input_schema) } }
```
### 4. Tool Call Parsing in the Response (correct)
Ollama returns `tc.function.name` and `tc.function.arguments`, which is exactly what the proxy reads:
```js
const toolName = tc.function?.name;
const toolInput = parseToolArguments(tc.function?.arguments);
```
### 5. SSE Event Sequence (correct)
The output matches the Anthropic spec:
`message_start` → `content_block_start` → `content_block_delta` → `content_block_stop` → `message_delta` → `message_stop`
### 6. stop_reason (correct)
```js
stop_reason: emittedToolUse ? 'tool_use' : 'end_turn'
```
### 7. Tool Deduplication (correct)
Prevents duplicate tool calls via a `seenToolCalls` Set keyed by `name:args`.
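The deduplication described above can be sketched as follows (the Set name matches the proxy; the helper function name is illustrative):

```javascript
// Drop repeated identical tool calls within one response by keying a Set
// on "name:serialized-arguments".
const seenToolCalls = new Set();
function isDuplicateToolCall(name, args) {
  const key = `${name}:${JSON.stringify(args)}`;
  if (seenToolCalls.has(key)) return true;
  seenToolCalls.add(key);
  return false;
}
```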
---
## Known Bugs / Weaknesses ⚠️
### BUG 1: Empty final buffer handler (app.js:350-358)
```js
if (buffer.trim()) {
  try {
    const data = JSON.parse(buffer.trim());
    // run the same handling code as above ← EMPTY, never executed!
  } catch (e) { ... }
}
```
**Problem**: If the last NDJSON chunk from Ollama does not end in `\n` (which happens with some Ollama versions), the final `done: true` line stays in the buffer and is never processed.
**Impact**: `messageFinished` remains `false`, and the fallback code (lines 360-381) sends the closing events without `eval_count` (output_tokens=0).
**Fix**: Reuse the same parsing code from the while loop in the final buffer handler.
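One way to apply the fix is to extract the per-line handling into a function and call it from both the streaming loop and the final-buffer branch; `handleOllamaLine` below is a stand-in for the existing handling code (hypothetical name):

```javascript
// Process any leftover, newline-less final line with the same handler
// used inside the streaming loop.
function drainBuffer(buffer, handleOllamaLine) {
  if (!buffer.trim()) return;
  try {
    handleOllamaLine(JSON.parse(buffer.trim())); // same handling as in the loop
  } catch (e) {
    // ignore malformed trailing data
  }
}
```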
### BUG 2: message_start without usage.input_tokens (app.js:200-209)
```js
res.write(`event: message_start\ndata: ${JSON.stringify({
  type: 'message_start',
  message: {
    id: messageId, type: 'message', role: 'assistant', content: [],
    model: anthropicBody.model
    // missing: stop_reason: null, usage: { input_tokens: 0, output_tokens: 0 }
  }
})}\n\n`);
```
**Impact**: Anthropic-compatible clients expect `usage.input_tokens` in `message_start`. This can cause parse errors in strict clients.
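A corrected `message_start` payload could be built like this (helper name is illustrative; the field set follows the Anthropic `message_start` example earlier in this commit):

```javascript
// Build a message_start event that includes the fields strict Anthropic
// clients expect: stop_reason, stop_sequence, and usage.
function buildMessageStart(messageId, model, inputTokens = 0) {
  return {
    type: 'message_start',
    message: {
      id: messageId,
      type: 'message',
      role: 'assistant',
      content: [],
      model,
      stop_reason: null,
      stop_sequence: null,
      usage: { input_tokens: inputTokens, output_tokens: 0 },
    },
  };
}
```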
### BUG 3: tool_use/tool_result embedded as text in the message history
When Anthropic clients send `tool_use` (in assistant messages) and `tool_result` (in user messages) in the history, these are embedded as text strings in the Ollama messages:
```
"Previous assistant tool call already made.\nTool name: ...\n..."
```
**Correct would be**: send assistant messages with `tool_calls`, and tool results as `role: "tool"` messages.
**Impact**: The model does not correctly understand the tool-call history semantically. The existing deduplication logic partially compensates for this.
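The correct mapping could be sketched like this (function name is illustrative; block shapes follow the two API documents in this commit):

```javascript
// Map Anthropic history blocks to native Ollama messages instead of
// flattening them into text strings.
function convertHistoryBlock(role, block) {
  if (role === 'assistant' && block.type === 'tool_use') {
    return {
      role: 'assistant',
      content: '',
      tool_calls: [{ function: { name: block.name, arguments: block.input } }],
    };
  }
  if (role === 'user' && block.type === 'tool_result') {
    return {
      role: 'tool',
      content: typeof block.content === 'string'
        ? block.content
        : JSON.stringify(block.content),
    };
  }
  return null; // other block types handled elsewhere
}
```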
---
## Architecture Overview
```
Client (Claude SDK)
  │ POST /v1/messages (Anthropic format)
  ▼
noThinkProxy :11435
  │ convertAnthropicToOllama()
  │ - system → messages[0] role:system
  │ - tool_use → text string
  │ - tool_result → text string
  │ - model: claude-* → qwen3.6:35b-a3b-q4_K_M
  │ - think: false
  │ - options: { num_ctx:131072, num_predict, temperature:0.7 }
  │ POST /api/chat (Ollama NDJSON)
  ▼
Ollama https://ollama.aquantico.de
  │ NDJSON stream: {message:{content, tool_calls}, done}
  ▼
noThinkProxy
  │ handleResponse()
  │ - text → content_block_delta (text_delta)
  │ - tool_calls → content_block_start/delta/stop (tool_use)
  │ - done → message_delta + message_stop
  ▼
Client (Anthropic SSE)
```