fix(diarization-ui): raise default num_predict to 16384
Thinking tokens count against num_predict. At 4096 the model was running out mid-response after spending ~3000 tokens on thinking; 16384 gives enough headroom for thinking plus the full response.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -1,5 +1,5 @@
 API_BASE=http://gx10.aquantico.lan:8093
 OLLAMA_BASE_URL=http://gx10.aquantico.lan:11434
 OLLAMA_MODEL=qwen3.5:9b
-OLLAMA_NUM_PREDICT=4096
+OLLAMA_NUM_PREDICT=16384
 OLLAMA_THINK=true
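A minimal sketch of how these env vars might feed an Ollama generate request. The variable names and values come from the diff above; the `ollama_options` helper and its defaults are hypothetical, not part of this repo:

```python
import json
import os


def ollama_options() -> dict:
    """Build a generate-request payload from the env file's settings.

    In Ollama, thinking tokens and answer tokens share the num_predict
    budget, so the cap must cover both (hence 16384 rather than 4096).
    """
    return {
        "model": os.environ.get("OLLAMA_MODEL", "qwen3.5:9b"),
        "think": os.environ.get("OLLAMA_THINK", "true") == "true",
        "options": {
            # Hypothetical default mirrors the new value in the diff.
            "num_predict": int(os.environ.get("OLLAMA_NUM_PREDICT", "16384")),
        },
    }


print(json.dumps(ollama_options(), indent=2))
```

With `OLLAMA_NUM_PREDICT` unset, the sketch falls back to 16384, matching the raised default this commit introduces.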