Raise OLLAMA_NUM_PREDICT from 4096 to 16384

Thinking tokens count against num_predict. At 4096 the model was running out mid-response after spending ~3000 tokens on thinking; 16384 gives enough headroom for thinking plus a full response.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
API_BASE=http://gx10.aquantico.lan:8093
OLLAMA_BASE_URL=http://gx10.aquantico.lan:11434
OLLAMA_MODEL=qwen3.5:9b
OLLAMA_NUM_PREDICT=16384
OLLAMA_THINK=true
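A minimal sketch of how these env values would map onto a request to Ollama's `/api/generate` endpoint. The prompt text here is illustrative, and this assumes the standard Ollama request schema, where `num_predict` caps total generated tokens (including thinking tokens on thinking-capable models) and `think` toggles thinking output:

```python
import json

# Hypothetical payload built from the .env values above. num_predict bounds
# the total number of generated tokens, so thinking output eats into it.
payload = {
    "model": "qwen3.5:9b",              # OLLAMA_MODEL
    "prompt": "Why is the sky blue?",   # illustrative prompt
    "think": True,                      # OLLAMA_THINK=true
    "options": {"num_predict": 16384},  # OLLAMA_NUM_PREDICT=16384
    "stream": False,
}

# Serialized body that would be POSTed to {OLLAMA_BASE_URL}/api/generate.
body = json.dumps(payload)
print(body)
```

With the old value of 4096, a model spending ~3000 tokens thinking would leave only ~1000 tokens for the visible answer, which matches the truncation described in the commit message.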