Advisor tool max_tokens parameter and refusal billing changes
- The advisor tool now supports a max_tokens parameter to cap the advisor model's output per call, reducing latency and output token cost for workloads that don't need full-length responses.
- On the Claude API, requests returning stop_reason: "refusal" without generated output are no longer billed.
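
As a rough illustration of both changes, the sketch below works on plain dictionaries rather than a real client. The helper names advisor_tool_with_cap and is_empty_refusal are hypothetical. The sketch assumes, without confirmation from this note, that max_tokens sits at the top level of the advisor tool definition and that "without generated output" means a response with no non-empty text blocks. Check the official tool reference before relying on either assumption.

```python
from typing import Any


def advisor_tool_with_cap(base_tool: dict[str, Any], max_tokens: int) -> dict[str, Any]:
    """Return a copy of an advisor tool definition with an output cap added.

    Assumption: max_tokens is a top-level field of the tool definition.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    # Copy instead of mutating so the caller's definition is left untouched.
    return {**base_tool, "max_tokens": max_tokens}


def is_empty_refusal(response: dict[str, Any]) -> bool:
    """True when a response stopped with a refusal and has no generated text.

    Assumption: "no generated output" is read as "no non-empty text blocks".
    """
    if response.get("stop_reason") != "refusal":
        return False
    has_text = any(
        block.get("type") == "text" and block.get("text")
        for block in response.get("content", [])
    )
    return not has_text
```

A cost-tracking script could use is_empty_refusal to skip such responses when estimating spend, while still treating the provider's own usage reporting as authoritative.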
Fetched June 4, 2026


