Langfuse Changelog

Score Analytics with Multi-Score Comparison

$ npx -y @buildinternet/releases show rel_360n9fpuFunpUIwdc_jIM

Validate evaluation reliability and uncover insights with score analysis. Score Analytics now provides tools for analyzing and comparing evaluation scores across your LLM application.

Key Features:

  • Multi-Score Comparison: Compare any two scores of the same data type to validate evaluation reliability with correlation metrics, confusion matrices, and alignment patterns
  • Statistical Validation: Measure agreement with Pearson correlation, Cohen's Kappa, F1 scores, and other metrics with badge indicators for quick interpretation
  • Multi-Data Type Support: Analyze numeric (continuous), categorical (discrete), or boolean (binary) scores with type-appropriate visualizations
  • Matched vs All Analysis: Toggle between matched data, which measures alignment between the two scores, and all data, which shows coverage and individual distributions
  • Temporal Insights: Track score evolution over time with configurable intervals to identify quality regressions or improvements
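
To make the agreement metrics named above concrete, here is a minimal sketch of how Pearson correlation, Cohen's Kappa, and F1 can be computed for two aligned lists of scores. It uses the textbook definitions in plain Python and is not Langfuse's implementation; the function names (`pearson`, `cohens_kappa`, `f1`) and the sample labels are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation for two equal-length lists of numeric scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one score never varies
    return cov / sqrt(var_x * var_y)


def cohens_kappa(a, b):
    """Cohen's Kappa for two equal-length lists of categorical labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if expected == 1:
        return float("nan")  # both raters always use the same single label
    return (observed - expected) / (1 - expected)


def f1(reference, predicted):
    """F1 for boolean scores, treating `reference` as ground truth."""
    tp = sum(r and p for r, p in zip(reference, predicted))
    fp = sum((not r) and p for r, p in zip(reference, predicted))
    fn = sum(r and (not p) for r, p in zip(reference, predicted))
    if tp + fp + fn == 0:
        return float("nan")  # no positives on either side
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical sample: an LLM judge and a human annotator labelling four items.
judge = ["good", "bad", "good", "bad"]
human = ["good", "bad", "good", "bad"]
kappa = cohens_kappa(judge, human)
```

Pearson applies to numeric scores, Kappa to categorical ones, and F1 to boolean ones, which mirrors the requirement that the two compared scores share a data type.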

Use Cases: Validate LLM judge reliability, measure human-AI annotation agreement, identify coverage gaps, spot quality regressions, and discover feature relationships through score comparison.
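
The coverage-gap use case can be illustrated with a small sketch of the matched versus all distinction. It assumes that "matched" means items carrying both scores and that each score is keyed by an item identifier; the helper `split_matched`, the ids, and the values are hypothetical and do not come from Langfuse.

```python
def split_matched(scores_a, scores_b):
    """Return the paired values for items present in both score sets,
    plus the share of all scored items that carry both scores.

    scores_a, scores_b: dicts mapping a hypothetical item id to a score value.
    """
    matched_ids = sorted(scores_a.keys() & scores_b.keys())
    all_ids = scores_a.keys() | scores_b.keys()
    pairs = [(scores_a[i], scores_b[i]) for i in matched_ids]
    coverage = len(matched_ids) / len(all_ids) if all_ids else 0.0
    return pairs, coverage


# Hypothetical data: the judge scored t1..t3, the human scored t2..t4.
judge_scores = {"t1": 0.9, "t2": 0.4, "t3": 0.7}
human_scores = {"t2": 0.5, "t3": 0.6, "t4": 0.1}

pairs, coverage = split_matched(judge_scores, human_scores)
```

Agreement metrics only make sense on `pairs`, while a low `coverage` value points to items that one of the two evaluators never scored.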

Figure: Score Analytics Dashboard

Fetched April 13, 2026