Initial version

doc/anthropic-api.md (new file, 164 lines)

# Anthropic Messages API

Endpoint: `POST /v1/messages`

Docs: https://docs.anthropic.com/en/api/messages

## Request

```json
{
  "model": "claude-opus-4-7",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": "string OR array of content blocks"
    }
  ],
  "system": "string OR array of system blocks",
  "temperature": 0.7,
  "stop_sequences": ["---"],
  "tools": [
    {
      "name": "tool_name",
      "description": "What this tool does",
      "input_schema": {
        "type": "object",
        "properties": {
          "param": { "type": "string", "description": "..." }
        },
        "required": ["param"]
      }
    }
  ],
  "tool_choice": { "type": "auto" },
  "stream": true
}
```

### Fields

| Field            | Type         | Required | Notes                                              |
|------------------|--------------|----------|----------------------------------------------------|
| `model`          | string       | Yes      | e.g. `claude-opus-4-7`, `claude-sonnet-4-6`        |
| `max_tokens`     | number       | Yes      | Max output tokens                                  |
| `messages`       | array        | Yes      | `role: user\|assistant`                            |
| `system`         | string/array | No       | System prompt (separate from messages)             |
| `temperature`    | number       | No       | 0.0–1.0                                            |
| `stop_sequences` | array        | No       | Strings that stop generation                       |
| `tools`          | array        | No       | Tool definitions with `input_schema` (JSON Schema) |
| `tool_choice`    | object       | No       | `{ type: "auto\|any\|tool", name?: "..." }`        |
| `stream`         | boolean      | No       | Enable SSE streaming                               |
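
As a minimal illustration, the required fields above can be assembled like this (a sketch; the helper name and defaults are chosen here for illustration, not part of the API):

```javascript
// Build a minimal Messages API request body with the two required fields
// plus the user message. Model name and max_tokens defaults follow the
// example request above.
function buildMessagesRequest(userText, { model = "claude-opus-4-7", maxTokens = 4096 } = {}) {
  return {
    model,
    max_tokens: maxTokens,
    messages: [{ role: "user", content: userText }],
  };
}
```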

## Message Content Types

```json
{ "type": "text", "text": "string" }
{ "type": "image", "source": { "type": "base64", "media_type": "image/jpeg", "data": "..." } }
{ "type": "tool_use", "id": "toolu_abc", "name": "tool_name", "input": {} }
{ "type": "tool_result", "tool_use_id": "toolu_abc", "content": "result string or array" }
```

## Streaming SSE Events

SSE format: each event consists of two lines, `event: <type>` and `data: <json>`, followed by a blank line.

### Event Order

1. `message_start`
2. For each content block: `content_block_start` → N× `content_block_delta` → `content_block_stop`
3. `message_delta`
4. `message_stop`

Interspersed `ping` events may appear at any time.
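
The two-line format can be parsed with a small sketch like the following (assumes complete, well-formed events; a real client must additionally buffer partial chunks from the network):

```javascript
// Split an SSE payload into { event, data } records. Events are separated
// by blank lines; the data line carries JSON, per the format described above.
function parseSseEvents(text) {
  const events = [];
  for (const block of text.split("\n\n")) {
    if (!block.trim()) continue;
    let event = null;
    let data = null;
    for (const line of block.split("\n")) {
      if (line.startsWith("event: ")) event = line.slice("event: ".length);
      else if (line.startsWith("data: ")) data = JSON.parse(line.slice("data: ".length));
    }
    if (event) events.push({ event, data });
  }
  return events;
}
```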

### message_start

```json
{
  "type": "message_start",
  "message": {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "model": "claude-opus-4-7",
    "content": [],
    "stop_reason": null,
    "stop_sequence": null,
    "usage": { "input_tokens": 25, "output_tokens": 1 }
  }
}
```

### content_block_start

```json
{ "type": "content_block_start", "index": 0,
  "content_block": { "type": "text", "text": "" } }

{ "type": "content_block_start", "index": 1,
  "content_block": { "type": "tool_use", "id": "toolu_abc", "name": "get_weather", "input": {} } }
```

### content_block_delta

```json
{ "type": "content_block_delta", "index": 0,
  "delta": { "type": "text_delta", "text": "Hello" } }

{ "type": "content_block_delta", "index": 1,
  "delta": { "type": "input_json_delta", "partial_json": "{\"loc" } }
```
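
Deltas have to be accumulated per `index` before use; `input_json_delta` fragments only become valid JSON once the whole block has arrived. A minimal sketch (the helper name is illustrative):

```javascript
// Accumulate streaming deltas per content-block index. text_delta chunks are
// concatenated directly; input_json_delta chunks are concatenated and parsed
// as JSON when the block is finished.
function makeBlockAccumulator() {
  const buffers = new Map();
  return {
    onDelta(evt) {
      const prev = buffers.get(evt.index) ?? "";
      const d = evt.delta;
      buffers.set(evt.index, prev + (d.type === "text_delta" ? d.text : d.partial_json));
    },
    // Call after content_block_stop for the given index.
    finish(index, blockType) {
      const raw = buffers.get(index) ?? "";
      return blockType === "tool_use" ? JSON.parse(raw || "{}") : raw;
    },
  };
}
```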

### content_block_stop

```json
{ "type": "content_block_stop", "index": 0 }
```

### message_delta

```json
{
  "type": "message_delta",
  "delta": {
    "stop_reason": "end_turn",
    "stop_sequence": null
  },
  "usage": { "output_tokens": 15 }
}
```

`stop_reason` values: `end_turn` | `stop_sequence` | `max_tokens` | `tool_use`

### message_stop

```json
{ "type": "message_stop" }
```

## Non-Streaming Response

```json
{
  "id": "msg_abc",
  "type": "message",
  "role": "assistant",
  "model": "claude-opus-4-7",
  "content": [
    { "type": "text", "text": "Hello!" },
    { "type": "tool_use", "id": "toolu_abc", "name": "get_weather", "input": { "location": "Berlin" } }
  ],
  "stop_reason": "tool_use",
  "stop_sequence": null,
  "usage": { "input_tokens": 100, "output_tokens": 50 }
}
```

## Tool Flow

1. Send a request with a `tools` array
2. The model responds with `stop_reason: "tool_use"` and a `content` block of `type: "tool_use"`
3. Execute the tool locally
4. Send the next user message with a `type: "tool_result"` content block referencing the `tool_use_id`
5. Continue until `stop_reason: "end_turn"`
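
Steps 2–4 above can be sketched as a pure helper (hypothetical names; `runTool` stands in for local tool execution):

```javascript
// Pull tool_use blocks out of an assistant response's content array and build
// the follow-up user message carrying the tool results, each referencing its
// tool_use_id as the flow above requires.
function buildToolResultMessage(assistantContent, runTool) {
  const results = assistantContent
    .filter((block) => block.type === "tool_use")
    .map((block) => ({
      type: "tool_result",
      tool_use_id: block.id,
      content: String(runTool(block.name, block.input)),
    }));
  return { role: "user", content: results };
}
```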

doc/ollama-api.md (new file, 125 lines)

# Ollama Chat API

Endpoint: `POST /api/chat`

## Request

```json
{
  "model": "llama3.2",
  "messages": [
    { "role": "system", "content": "string" },
    { "role": "user", "content": "string" },
    { "role": "assistant", "content": "string" },
    { "role": "tool", "content": "string" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "string",
        "description": "string",
        "parameters": {
          "type": "object",
          "properties": { "param": { "type": "string", "description": "..." } },
          "required": ["param"]
        }
      }
    }
  ],
  "think": false,
  "format": "json",
  "stream": true,
  "keep_alive": "5m",
  "options": {
    "temperature": 0.7,
    "num_ctx": 131072,
    "num_predict": 4096
  }
}
```

### Fields

| Field        | Type    | Required | Notes                                            |
|--------------|---------|----------|--------------------------------------------------|
| `model`      | string  | Yes      | Model name (e.g. `qwen3.6:35b-a3b-q4_K_M`)       |
| `messages`   | array   | Yes      | Conversation history                             |
| `tools`      | array   | No       | Function definitions (OpenAI-compatible format)  |
| `think`      | boolean | No       | Enable chain-of-thought (thinking models only)   |
| `format`     | string  | No       | `"json"` or a JSON schema for structured output  |
| `stream`     | boolean | No       | Default: `true`                                  |
| `keep_alive` | string  | No       | How long to keep the model loaded. Default: `5m` |
| `options`    | object  | No       | Model runtime parameters                         |
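
A minimal request-builder sketch using the defaults from the table (the helper name and default values are chosen here for illustration):

```javascript
// Assemble a POST /api/chat body. Only model and messages are required;
// the remaining fields mirror the example request above.
function buildChatRequest(messages, { model = "llama3.2", stream = true } = {}) {
  return {
    model,
    messages,
    stream,
    keep_alive: "5m",
    options: { temperature: 0.7, num_ctx: 131072, num_predict: 4096 },
  };
}
```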

## Streaming Response (NDJSON)

Each line is a standalone JSON object:

```json
{ "model": "...", "message": { "role": "assistant", "content": "partial text" }, "done": false }
```

Final line (`done: true`):

```json
{
  "model": "...",
  "message": { "role": "assistant", "content": "" },
  "done": true,
  "done_reason": "stop",
  "total_duration": 1234567890,
  "load_duration": 987654321,
  "prompt_eval_count": 50,
  "eval_count": 200,
  "eval_duration": 12345678
}
```
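
For a fully buffered response body, collecting the reply text is a few lines (a sketch; a streaming client must additionally handle partial lines across network chunks):

```javascript
// Assemble the full reply from a complete NDJSON response body by
// concatenating per-line message.content, and keep the final done:true
// object, which carries done_reason and the token counts.
function collectChatResponse(ndjson) {
  let content = "";
  let final = null;
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;
    const obj = JSON.parse(line);
    content += obj.message?.content ?? "";
    if (obj.done) final = obj;
  }
  return { content, final };
}
```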

### Tool Call Response

When the model decides to use a tool, `message.tool_calls` is set (`content` is empty or null):

```json
{
  "model": "...",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "get_weather",
          "arguments": { "location": "Berlin", "unit": "celsius" }
        }
      }
    ]
  },
  "done": false
}
```

Note: `tool_calls[].function.arguments` is an **object** (already parsed JSON), not a string.
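
A defensive reader can accept both shapes, since the otherwise similar OpenAI format sends `arguments` as a JSON string (a sketch, not any particular client's actual implementation):

```javascript
// Normalize tool-call arguments to an object: pass objects through, parse
// JSON strings, and fall back to {} for null or unparseable input.
function parseToolArguments(args) {
  if (args == null) return {};
  if (typeof args === "string") {
    try { return JSON.parse(args); } catch { return {}; }
  }
  return args;
}
```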

### done_reason Values

| Value        | Meaning                     |
|--------------|-----------------------------|
| `stop`       | Natural end of generation   |
| `tool_calls` | Model triggered a tool call |
| `load`       | Model was loaded            |
| `unload`     | Model was unloaded          |

## Tool Result Message

After receiving a tool call, send the result back as role `tool`:

```json
{
  "role": "tool",
  "content": "result text or JSON string"
}
```

## Non-Streaming Response

A single JSON object with all fields combined (same structure as the final streaming line).

doc/proxy-analysis.md (new file, 131 lines)

# Proxy Implementation Analysis

## What the Proxy Does

Translates the **Anthropic API** (`POST /v1/messages`) → **Ollama API** (`POST /api/chat`)

- Receives the Anthropic SSE format
- Returns the Anthropic SSE format
- Routing: `localhost:11435` → `https://ollama.aquantico.de/api/chat`

---

## Correct Implementations ✓

### 1. Model substitution (correct)
```js
if (anthropicBody.model?.startsWith('claude-')) {
  anthropicBody.model = 'qwen3.6:35b-a3b-q4_K_M';
}
```
All `claude-*` models are replaced with the local model.

### 2. Think mode disabled (correct)
```js
think: false
```
Hardcoded in `convertAnthropicToOllama()`.

### 3. Tool schema conversion (correct)
Anthropic `input_schema` → Ollama `function.parameters`:
```js
{ type: 'function', function: { name, description, parameters: sanitizeToolSchema(tool.input_schema) } }
```

### 4. Tool-call parsing in the response (correct)
Ollama returns `tc.function.name` and `tc.function.arguments`, which is exactly what the proxy reads:
```js
const toolName = tc.function?.name;
const toolInput = parseToolArguments(tc.function?.arguments);
```

### 5. SSE event sequence (correct)
Output matches the Anthropic spec:
`message_start` → `content_block_start` → `content_block_delta` → `content_block_stop` → `message_delta` → `message_stop`

### 6. stop_reason (correct)
```js
stop_reason: emittedToolUse ? 'tool_use' : 'end_turn'
```

### 7. Tool deduplication (correct)
Prevents duplicate tool calls via a `seenToolCalls` Set keyed by `name:args`.

---

## Known Bugs / Weaknesses ⚠️

### BUG 1: Empty final buffer handler (app.js:350-358)

```js
if (buffer.trim()) {
  try {
    const data = JSON.parse(buffer.trim());
    // run the same handling code as above ← EMPTY, never executed!
  } catch (e) { ... }
}
```

**Problem**: If the last NDJSON chunk from Ollama does not end with `\n` (which happens with some Ollama versions), the final `done: true` line stays in the buffer and is never processed.

**Impact**: `messageFinished` stays `false`, and the fallback code (lines 360-381) sends the closing events without `eval_count` (`output_tokens` = 0).

**Fix**: Apply the same parsing code from the while loop in the final buffer handler.
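
One way to implement the fix is to hoist the per-line handling into a single function that both the streaming loop and the final-buffer branch call (a sketch; `handleLine` is a hypothetical stand-in for the proxy's existing per-object handling code):

```javascript
// Consume a chunk: complete lines are parsed and handled, the trailing
// partial line is returned so the caller keeps it as the new buffer.
function drainNdjson(buffer, chunk, handleLine) {
  buffer += chunk;
  const lines = buffer.split("\n");
  buffer = lines.pop(); // last element may be an incomplete line
  for (const line of lines) {
    if (line.trim()) handleLine(JSON.parse(line));
  }
  return buffer;
}

// Call at stream end: runs the SAME handler, so a final done:true line
// without a trailing "\n" is no longer dropped.
function finishNdjson(buffer, handleLine) {
  if (buffer.trim()) handleLine(JSON.parse(buffer.trim()));
}
```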

### BUG 2: message_start without usage.input_tokens (app.js:200-209)

```js
res.write(`event: message_start\ndata: ${JSON.stringify({
  type: 'message_start',
  message: {
    id: messageId, type: 'message', role: 'assistant', content: [],
    model: anthropicBody.model
    // missing: stop_reason: null, usage: { input_tokens: 0, output_tokens: 0 }
  }
})}\n\n`);
```

**Impact**: Anthropic-compatible clients expect `usage.input_tokens` in `message_start`. This can cause parse errors in strict clients.
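
A corrected payload builder might look like this (a sketch; fields follow the `message_start` shape in doc/anthropic-api.md, with `input_tokens` left at 0 when the true count is unknown):

```javascript
// Build a message_start event that includes the stop_reason/stop_sequence
// placeholders and the usage object strict clients expect.
function buildMessageStart(messageId, model) {
  return {
    type: "message_start",
    message: {
      id: messageId,
      type: "message",
      role: "assistant",
      content: [],
      model,
      stop_reason: null,
      stop_sequence: null,
      usage: { input_tokens: 0, output_tokens: 0 },
    },
  };
}
```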

### BUG 3: tool_use/tool_result embedded as text in the message history

When Anthropic clients send `tool_use` (in assistant messages) and `tool_result` (in user messages) in the history, these are embedded as plain text strings in the Ollama messages:

```
"Previous assistant tool call already made.\nTool name: ...\n..."
```

**Correct would be**: send assistant messages with `tool_calls`, and tool results as `role: "tool"` messages.

**Impact**: The model does not correctly understand the tool-call history semantically. The existing deduplication logic partially compensates for this.
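
The suggested mapping can be sketched as follows (simplified: one pass over content blocks, images ignored, and the `tool_use` id dropped since Ollama's format has no equivalent):

```javascript
// Convert one Anthropic history message into Ollama-native message(s):
// tool_use blocks become assistant messages with tool_calls, tool_result
// blocks become role:"tool" messages, text blocks pass through.
function convertHistoryMessage(msg) {
  const blocks = Array.isArray(msg.content)
    ? msg.content
    : [{ type: "text", text: msg.content }];
  return blocks.map((b) => {
    if (b.type === "tool_use") {
      return {
        role: "assistant",
        content: "",
        tool_calls: [{ function: { name: b.name, arguments: b.input } }],
      };
    }
    if (b.type === "tool_result") {
      return {
        role: "tool",
        content: typeof b.content === "string" ? b.content : JSON.stringify(b.content),
      };
    }
    return { role: msg.role, content: b.text ?? "" };
  });
}
```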

---

## Architecture Overview

```
Client (Claude SDK)
  │ POST /v1/messages (Anthropic format)
  ▼
noThinkProxy :11435
  │ convertAnthropicToOllama()
  │ - system → messages[0] role:system
  │ - tool_use → text string
  │ - tool_result → text string
  │ - model: claude-* → qwen3.6:35b-a3b-q4_K_M
  │ - think: false
  │ - options: { num_ctx:131072, num_predict, temperature:0.7 }
  │
  │ POST /api/chat (Ollama NDJSON)
  ▼
Ollama https://ollama.aquantico.de
  │ NDJSON stream: {message:{content, tool_calls}, done}
  ▼
noThinkProxy
  │ handleResponse()
  │ - text → content_block_delta (text_delta)
  │ - tool_calls → content_block_start/delta/stop (tool_use)
  │ - done → message_delta + message_stop
  ▼
Client (Anthropic SSE)
```