initial version

2026-05-10 10:46:41 +02:00
commit deb0d5de9d
10 changed files with 2049 additions and 0 deletions

doc/anthropic-api.md Normal file

@@ -0,0 +1,164 @@
# Anthropic Messages API
Endpoint: `POST /v1/messages`
Docs: https://docs.anthropic.com/en/api/messages
## Request
```json
{
"model": "claude-opus-4-7",
"max_tokens": 4096,
"messages": [
{
"role": "user",
"content": "string OR array of content blocks"
}
],
"system": "string OR array of system blocks",
"temperature": 0.7,
"stop_sequences": ["---"],
"tools": [
{
"name": "tool_name",
"description": "What this tool does",
"input_schema": {
"type": "object",
"properties": {
"param": { "type": "string", "description": "..." }
},
"required": ["param"]
}
}
],
"tool_choice": { "type": "auto" },
"stream": true
}
```
### Fields
| Field | Type | Required | Notes |
|------------------|--------------|----------|---------------------------------------------------|
| `model` | string | Yes | e.g. `claude-opus-4-7`, `claude-sonnet-4-6` |
| `max_tokens` | number | Yes | Max output tokens |
| `messages`       | array        | Yes      | `role`: `user` or `assistant`                     |
| `system` | string/array | No | System prompt (separate from messages) |
| `temperature`    | number       | No       | 0.0–1.0                                           |
| `stop_sequences` | array | No | Strings that stop generation |
| `tools` | array | No | Tool definitions with `input_schema` (JSON Schema)|
| `tool_choice`    | object       | No       | `{ type: "auto\|any\|tool", name?: "..." }`       |
| `stream` | boolean | No | Enable SSE streaming |
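A minimal sketch of building a request body from the fields above. The `fetch` call is shown commented out; the endpoint URL and `anthropic-version` header value follow the public docs, but verify them against your account setup.

```javascript
// Sketch: build a minimal Messages API request body (only the required fields).
function buildMessagesRequest(model, userText, maxTokens = 1024) {
  return {
    model,
    max_tokens: maxTokens,
    messages: [{ role: 'user', content: userText }],
  };
}

// It would be sent roughly like this (not executed here):
// fetch('https://api.anthropic.com/v1/messages', {
//   method: 'POST',
//   headers: {
//     'content-type': 'application/json',
//     'x-api-key': process.env.ANTHROPIC_API_KEY,
//     'anthropic-version': '2023-06-01',
//   },
//   body: JSON.stringify(buildMessagesRequest('claude-opus-4-7', 'Hello')),
// });
```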
## Message Content Types
```json
{ "type": "text", "text": "string" }
{ "type": "image", "source": { "type": "base64", "media_type": "image/jpeg", "data": "..." } }
{ "type": "tool_use", "id": "toolu_abc", "name": "tool_name", "input": {} }
{ "type": "tool_result", "tool_use_id": "toolu_abc", "content": "result string or array" }
```
## Streaming SSE Events
SSE format: each event is two lines `event: <type>\ndata: <json>` followed by blank line.
### Event Order
1. `message_start`
2. For each content block: `content_block_start` → N× `content_block_delta` → `content_block_stop`
3. `message_delta`
4. `message_stop`
Interspersed `ping` events may appear at any time.
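A minimal sketch of a parser for the wire format described above, assuming the full text of complete events is available; a real client must also handle events split across network chunks.

```javascript
// Parse SSE text into { type, data } events. Events are separated by a blank
// line; each has an "event: <type>" line and a "data: <json>" line.
function parseSSE(text) {
  const events = [];
  for (const block of text.split('\n\n')) {
    if (!block.trim()) continue;
    let type = null, data = null;
    for (const line of block.split('\n')) {
      if (line.startsWith('event: ')) type = line.slice(7);
      else if (line.startsWith('data: ')) data = JSON.parse(line.slice(6));
    }
    if (type) events.push({ type, data });
  }
  return events;
}
```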
### message_start
```json
{
"type": "message_start",
"message": {
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"model": "claude-opus-4-7",
"content": [],
"stop_reason": null,
"stop_sequence": null,
"usage": { "input_tokens": 25, "output_tokens": 1 }
}
}
```
### content_block_start
```json
{ "type": "content_block_start", "index": 0,
"content_block": { "type": "text", "text": "" } }
{ "type": "content_block_start", "index": 1,
"content_block": { "type": "tool_use", "id": "toolu_abc", "name": "get_weather", "input": {} } }
```
### content_block_delta
```json
{ "type": "content_block_delta", "index": 0,
"delta": { "type": "text_delta", "text": "Hello" } }
{ "type": "content_block_delta", "index": 1,
"delta": { "type": "input_json_delta", "partial_json": "{\"loc" } }
```
### content_block_stop
```json
{ "type": "content_block_stop", "index": 0 }
```
### message_delta
```json
{
"type": "message_delta",
"delta": {
"stop_reason": "end_turn",
"stop_sequence": null
},
"usage": { "output_tokens": 15 }
}
```
`stop_reason` values: `end_turn` | `stop_sequence` | `max_tokens` | `tool_use`
### message_stop
```json
{ "type": "message_stop" }
```
## Non-Streaming Response
```json
{
"id": "msg_abc",
"type": "message",
"role": "assistant",
"model": "claude-opus-4-7",
"content": [
{ "type": "text", "text": "Hello!" },
{ "type": "tool_use", "id": "toolu_abc", "name": "get_weather", "input": { "location": "Berlin" } }
],
"stop_reason": "tool_use",
"stop_sequence": null,
"usage": { "input_tokens": 100, "output_tokens": 50 }
}
```
## Tool Flow
1. Send request with `tools` array
2. Model responds with `stop_reason: "tool_use"` and `content` block of `type: "tool_use"`
3. Execute the tool locally
4. Send next user message with `type: "tool_result"` content block referencing `tool_use_id`
5. Continue until `stop_reason: "end_turn"`
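Step 4 above can be sketched as a small helper that turns a `tool_use` block from the assistant's response into the follow-up user message (helper name is illustrative):

```javascript
// Build the next user message carrying the tool result, referencing the
// tool_use block's id via tool_use_id.
function buildToolResultMessage(toolUseBlock, resultText) {
  return {
    role: 'user',
    content: [{
      type: 'tool_result',
      tool_use_id: toolUseBlock.id,
      content: resultText,
    }],
  };
}
```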

doc/ollama-api.md Normal file

@@ -0,0 +1,125 @@
# Ollama Chat API
Endpoint: `POST /api/chat`
## Request
```json
{
"model": "llama3.2",
"messages": [
{ "role": "system", "content": "string" },
{ "role": "user", "content": "string" },
{ "role": "assistant", "content": "string" },
{ "role": "tool", "content": "string" }
],
"tools": [
{
"type": "function",
"function": {
"name": "string",
"description": "string",
"parameters": {
"type": "object",
"properties": { "param": { "type": "string", "description": "..." } },
"required": ["param"]
}
}
}
],
"think": false,
"format": "json",
"stream": true,
"keep_alive": "5m",
"options": {
"temperature": 0.7,
"num_ctx": 131072,
"num_predict": 4096
}
}
```
### Fields
| Field | Type | Required | Notes |
|--------------|---------|----------|------------------------------------------------|
| `model` | string | Yes | Model name (e.g. `qwen3.6:35b-a3b-q4_K_M`) |
| `messages` | array | Yes | Conversation history |
| `tools` | array | No | Function definitions (OpenAI-compatible format)|
| `think` | boolean | No | Enable chain-of-thought (thinking models only) |
| `format` | string | No | `"json"` or JSON schema for structured output |
| `stream` | boolean | No | Default: `true` |
| `keep_alive` | string | No | How long to keep model loaded. Default: `5m` |
| `options` | object | No | Model runtime parameters |
## Streaming Response (NDJSON)
Each line is a standalone JSON object:
```json
{ "model": "...", "message": { "role": "assistant", "content": "partial text" }, "done": false }
```
Final line (done):
```json
{
"model": "...",
"message": { "role": "assistant", "content": "" },
"done": true,
"done_reason": "stop",
"total_duration": 1234567890,
"load_duration": 987654321,
"prompt_eval_count": 50,
"eval_count": 200,
"eval_duration": 12345678
}
```
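Because the stream is newline-delimited JSON, a consumer has to buffer partial lines across network chunks. A minimal sketch of that splitting step:

```javascript
// Append a raw chunk to the carry-over buffer, parse all complete NDJSON
// lines, and return any trailing partial line for the next round.
function splitNDJSON(buffer, chunk) {
  const lines = (buffer + chunk).split('\n');
  const rest = lines.pop(); // possibly incomplete last line
  const objects = lines.filter(l => l.trim()).map(l => JSON.parse(l));
  return { objects, rest };
}
```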
### Tool Call Response
When the model decides to use a tool, `message.tool_calls` is set (content is empty/null):
```json
{
"model": "...",
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"function": {
"name": "get_weather",
"arguments": { "location": "Berlin", "unit": "celsius" }
}
}
]
},
"done": false
}
```
Note: `tool_calls[].function.arguments` is an **object** (already parsed JSON), not a string.
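Given the note above, the arguments can be used directly. A small defensive helper (hypothetical name) that also tolerates a stringified form, as emitted by some OpenAI-compatible servers:

```javascript
// Return tool-call arguments as an object, whether they arrive parsed
// (Ollama's documented behavior) or as a JSON string.
function toolArgs(call) {
  const a = call.function?.arguments;
  return typeof a === 'string' ? JSON.parse(a) : (a ?? {});
}
```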
### done_reason Values
| Value | Meaning |
|---------------|----------------------------------|
| `stop` | Natural end of generation |
| `tool_calls` | Model triggered a tool call |
| `load` | Model was loaded |
| `unload` | Model was unloaded |
## Tool Result Message
After receiving a tool call, send result as role `tool`:
```json
{
"role": "tool",
"content": "result text or JSON string"
}
```
## Non-Streaming Response
Single JSON object with all fields combined (same structure as final streaming line).

doc/proxy-analysis.md Normal file

@@ -0,0 +1,131 @@
# Proxy Implementation Analysis
## What the Proxy Does
Translates the **Anthropic API** (`POST /v1/messages`) → **Ollama API** (`POST /api/chat`)
- Receives requests in Anthropic format
- Returns responses in Anthropic SSE format
- Routing: `localhost:11435` → `https://ollama.aquantico.de/api/chat`
---
## Correct Implementations ✓
### 1. Model Substitution (correct)
```js
if (anthropicBody.model?.startsWith('claude-')) {
  anthropicBody.model = 'qwen3.6:35b-a3b-q4_K_M';
}
```
All `claude-*` models are replaced with the local model.
### 2. Think Mode Disabled (correct)
```js
think: false
```
Hardcoded in `convertAnthropicToOllama()`.
### 3. Tool Schema Conversion (correct)
Anthropic `input_schema` → Ollama `function.parameters`:
```js
{ type: 'function', function: { name, description, parameters: sanitizeToolSchema(tool.input_schema) } }
```
### 4. Tool Call Parsing in the Response (correct)
Ollama returns `tc.function.name` and `tc.function.arguments`, which is exactly what the proxy reads:
```js
const toolName = tc.function?.name;
const toolInput = parseToolArguments(tc.function?.arguments);
```
### 5. SSE Event Sequence (correct)
The output matches the Anthropic spec:
`message_start` → `content_block_start` → `content_block_delta` → `content_block_stop` → `message_delta` → `message_stop`
### 6. stop_reason (correct)
```js
stop_reason: emittedToolUse ? 'tool_use' : 'end_turn'
```
### 7. Tool Deduplication (correct)
Prevents duplicate tool calls via a `seenToolCalls` Set keyed by `name:args`.
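The deduplication described above can be sketched as follows (the Set name matches the proxy; the helper function name is illustrative):

```javascript
// Drop repeated identical tool calls within one response by keying a Set
// on "name:serialized-arguments".
const seenToolCalls = new Set();
function isDuplicateToolCall(name, args) {
  const key = `${name}:${JSON.stringify(args)}`;
  if (seenToolCalls.has(key)) return true;
  seenToolCalls.add(key);
  return false;
}
```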
---
## Known Bugs / Weaknesses ⚠️
### BUG 1: Empty final buffer handler (app.js:350-358)
```js
if (buffer.trim()) {
  try {
    const data = JSON.parse(buffer.trim());
    // run the same handling code as above ← EMPTY, never executed!
  } catch (e) { ... }
}
```
**Problem**: If the last NDJSON chunk from Ollama does not end in `\n` (which happens with some Ollama versions), the final `done: true` line stays in the buffer and is never processed.
**Impact**: `messageFinished` remains `false`, and the fallback code (lines 360-381) sends the closing events without `eval_count` (output_tokens=0).
**Fix**: Reuse the same parsing code from the while loop in the final buffer handler.
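One way to apply the fix is to extract the per-line handling into a function and call it from both the streaming loop and the final-buffer branch; `handleOllamaLine` below is a stand-in for the existing handling code (hypothetical name):

```javascript
// Process any leftover, newline-less final line with the same handler
// used inside the streaming loop.
function drainBuffer(buffer, handleOllamaLine) {
  if (!buffer.trim()) return;
  try {
    handleOllamaLine(JSON.parse(buffer.trim())); // same handling as in the loop
  } catch (e) {
    // ignore malformed trailing data
  }
}
```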
### BUG 2: message_start without usage.input_tokens (app.js:200-209)
```js
res.write(`event: message_start\ndata: ${JSON.stringify({
  type: 'message_start',
  message: {
    id: messageId, type: 'message', role: 'assistant', content: [],
    model: anthropicBody.model
    // missing: stop_reason: null, usage: { input_tokens: 0, output_tokens: 0 }
  }
})}\n\n`);
```
**Impact**: Anthropic-compatible clients expect `usage.input_tokens` in `message_start`. This can cause parse errors in strict clients.
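A corrected `message_start` payload could be built like this (helper name is illustrative; the field set follows the Anthropic `message_start` example earlier in this commit):

```javascript
// Build a message_start event that includes the fields strict Anthropic
// clients expect: stop_reason, stop_sequence, and usage.
function buildMessageStart(messageId, model, inputTokens = 0) {
  return {
    type: 'message_start',
    message: {
      id: messageId,
      type: 'message',
      role: 'assistant',
      content: [],
      model,
      stop_reason: null,
      stop_sequence: null,
      usage: { input_tokens: inputTokens, output_tokens: 0 },
    },
  };
}
```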
### BUG 3: tool_use/tool_result embedded as text in the message history
When Anthropic clients send `tool_use` (in assistant messages) and `tool_result` (in user messages) in the history, these are embedded as text strings in the Ollama messages:
```
"Previous assistant tool call already made.\nTool name: ...\n..."
```
**Correct would be**: send assistant messages with `tool_calls`, and tool results as `role: "tool"` messages.
**Impact**: The model does not correctly understand the tool-call history semantically. The existing deduplication logic partially compensates for this.
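The correct mapping could be sketched like this (function name is illustrative; block shapes follow the two API documents in this commit):

```javascript
// Map Anthropic history blocks to native Ollama messages instead of
// flattening them into text strings.
function convertHistoryBlock(role, block) {
  if (role === 'assistant' && block.type === 'tool_use') {
    return {
      role: 'assistant',
      content: '',
      tool_calls: [{ function: { name: block.name, arguments: block.input } }],
    };
  }
  if (role === 'user' && block.type === 'tool_result') {
    return {
      role: 'tool',
      content: typeof block.content === 'string'
        ? block.content
        : JSON.stringify(block.content),
    };
  }
  return null; // other block types handled elsewhere
}
```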
---
## Architecture Overview
```
Client (Claude SDK)
  │ POST /v1/messages (Anthropic format)
  ▼
noThinkProxy :11435
  │ convertAnthropicToOllama()
  │ - system → messages[0] role:system
  │ - tool_use → text string
  │ - tool_result → text string
  │ - model: claude-* → qwen3.6:35b-a3b-q4_K_M
  │ - think: false
  │ - options: { num_ctx:131072, num_predict, temperature:0.7 }
  │ POST /api/chat (Ollama NDJSON)
  ▼
Ollama https://ollama.aquantico.de
  │ NDJSON stream: {message:{content, tool_calls}, done}
  ▼
noThinkProxy
  │ handleResponse()
  │ - text → content_block_delta (text_delta)
  │ - tool_calls → content_block_start/delta/stop (tool_use)
  │ - done → message_delta + message_stop
  ▼
Client (Anthropic SSE)
```