OpenAI/Product Releases/How confessions can keep language models honest

How confessions can keep language models honest

December 3, 2025Product ReleasesView original ↗

$npx -y @buildinternet/releases show rel_059VfdikYDGDLWPHmB0zB

OpenAI researchers are testing “confessions,” a method that trains models to admit when they make mistakes or act undesirably, helping improve AI honesty, transparency, and trust in model outputs.

Fetched April 7, 2026