Thinking tokens count against num_predict. At 4096 the model was running out mid-response after spending ~3000 tokens on thinking; 16384 leaves enough headroom for thinking plus a full response.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
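A minimal sketch of the change described above, assuming the request goes to Ollama's `/api/generate` endpoint (model name and prompt are illustrative, not from the original):

```python
# Sketch: raising the generation budget for a reasoning model in Ollama.
# num_predict caps the total tokens generated, and that cap includes any
# hidden "thinking" tokens, so the visible answer gets whatever is left.
import json

payload = {
    "model": "some-reasoning-model",  # hypothetical model name
    "prompt": "Explain the halting problem.",
    "options": {
        # Was 4096; with ~3000 tokens spent on thinking, responses were
        # being truncated. 16384 leaves room for thinking + full answer.
        "num_predict": 16384,
    },
    "stream": False,
}

# To send, POST this payload to http://localhost:11434/api/generate
print(json.dumps(payload["options"]))
```

The key point is that the budget must cover both phases of generation; sizing it for the answer alone is what caused the mid-response cutoffs.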