Categorical LLM-as-a-Judge Scores

LLM-as-a-Judge evaluators in Langfuse can now return categorical scores in addition to numeric ones. You can define a fixed set of allowed categories in the evaluator template, have the judge choose from them, and store the result as a native categorical score in Langfuse.

This is especially useful when the right answer is a discrete label rather than a point on a numeric scale:

  • Classify answers as correct, partially_correct, or incorrect
  • Mark support replies as resolved, needs_follow_up, or escalate
  • Label safety outcomes as safe, needs_review, or blocked

What's New:

  • Choose Numeric or Categorical when creating a custom LLM-as-a-Judge evaluator
  • Define the allowed category values directly in the evaluator template
  • Optionally allow multiple matches when more than one label applies; Langfuse creates one score per selected category
  • View categorical results in evaluator logs and reuse them across Langfuse's existing score tooling
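
To make the behavior above concrete, here is a minimal sketch of how a judge's selection could be checked against the allowed categories and turned into one score per selected category. It does not use the Langfuse SDK or reflect its internals: `CategoricalScore`, `to_scores`, and `allow_multiple` are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoricalScore:
    """Hypothetical stand-in for one stored categorical score (not a Langfuse type)."""
    name: str
    value: str


def to_scores(name, selected, allowed, allow_multiple=False):
    """Validate the judge's selected labels and return one score per label."""
    # Drop repeated labels while keeping the order the judge returned them in.
    unique = list(dict.fromkeys(selected))
    if not unique:
        raise ValueError("the judge must select at least one category")
    unknown = [label for label in unique if label not in allowed]
    if unknown:
        raise ValueError(f"labels outside the allowed set: {unknown}")
    if len(unique) > 1 and not allow_multiple:
        raise ValueError("multiple labels selected but allow_multiple is False")
    return [CategoricalScore(name=name, value=label) for label in unique]


# Allowed categories, as they might be listed in an evaluator template.
ALLOWED = ("safe", "needs_review", "blocked")

# Single-match evaluator: exactly one label, exactly one score.
single = to_scores("safety", ["needs_review"], ALLOWED)

# Multi-match evaluator: two labels apply, so two scores are produced.
multi = to_scores("safety", ["needs_review", "blocked"], ALLOWED, allow_multiple=True)
```

Rejecting labels outside the allowed set, rather than storing them, is a design choice of this sketch; it keeps downstream filters and aggregations limited to the fixed set of categories defined up front.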

Fetched April 13, 2026