Cookbook for reinforcement fine-tuning of conversational reasoning using HealthBench evaluations.
Fetched April 7, 2026