LLM-as-a-Judge evaluators in Langfuse can now return categorical scores in addition to numeric ones. You can define a fixed set of allowed categories in the evaluator template, have the judge choose from them, and store the result as a native categorical score in Langfuse.
This is especially useful when the right answer is a label instead of a gradient:
- correct, partially_correct, or incorrect
- resolved, needs_follow_up, or escalate
- safe, needs_review, or blocked

What's New:
- Choose Numeric or Categorical when creating a custom LLM-as-a-Judge evaluator
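The flow described above (define a fixed set of categories, have the judge pick one, store the label) can be sketched in plain Python. This is only an illustration under stated assumptions: `CategoricalScore` and `parse_judge_label` are hypothetical names, the snippet does not call the Langfuse SDK, and the actual evaluator template and storage are handled by Langfuse itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CategoricalScore:
    """Hypothetical container for a categorical score; not a Langfuse type."""

    name: str
    value: str
    data_type: str = "CATEGORICAL"  # assumed label for a non-numeric score


def parse_judge_label(raw_output: str, allowed: tuple[str, ...]) -> str:
    """Normalize the judge's raw text and check it against the allowed categories."""
    # Trim whitespace, then stray quotes or a trailing period, then lowercase.
    label = raw_output.strip().strip("\"'.").lower()
    if label not in allowed:
        raise ValueError(f"unexpected category {label!r}; expected one of {allowed}")
    return label


# Category set taken from the examples above.
CORRECTNESS = ("correct", "partially_correct", "incorrect")

# Simulated judge output with extra whitespace, casing, and punctuation.
score = CategoricalScore(
    name="correctness",
    value=parse_judge_label(" Partially_Correct.\n", CORRECTNESS),
)
print(score)
```

Rejecting anything outside the allowed set, rather than mapping it to a default, keeps a malformed judge response from silently becoming a valid-looking label.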
Fetched April 13, 2026