Langfuse Changelog
Nov 7, 2025

Validate evaluation reliability and uncover insights with score analysis. Score Analytics provides a comprehensive set of tools for analyzing and comparing evaluation scores across your LLM application.

Key Features:

  • Multi-Score Comparison: Compare any two scores of the same data type to validate evaluation reliability with correlation metrics, confusion matrices, and alignment patterns
  • Statistical Validation: Measure agreement with Pearson correlation, Cohen's Kappa, F1 scores, and other metrics, with badge indicators for quick interpretation (see the sketch below)
  • Multi-Data Type Support: Analyze numeric (continuous), categorical (discrete), or boolean (binary) scores with type-appropriate visualizations
  • Matched vs All Analysis: Toggle between matched data (to measure alignment) and all data (to see coverage and individual score distributions)
  • Temporal Insights: Track score evolution over time with configurable intervals to identify quality regressions or improvements

Use Cases: Validate LLM judge reliability, measure human-AI annotation agreement, identify coverage gaps, spot quality regressions, and discover feature relationships through score comparison.
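To make these agreement metrics concrete, here is a minimal offline sketch of the same statistics Score Analytics reports, computed with scipy and scikit-learn; the score values are hypothetical, not from Langfuse:

  from scipy.stats import pearsonr
  from sklearn.metrics import cohen_kappa_score, f1_score

  # Hypothetical paired binary scores for the same traces:
  # an LLM judge's verdicts versus human annotations.
  llm_judge = [1, 0, 1, 1, 0, 1, 0, 1]
  human     = [1, 0, 0, 1, 0, 1, 0, 0]

  kappa = cohen_kappa_score(llm_judge, human)  # chance-corrected agreement
  f1 = f1_score(human, llm_judge)              # overlap on positive labels
  r, p = pearsonr(llm_judge, human)            # linear correlation

  print(f"kappa={kappa:.2f}  f1={f1:.2f}  pearson r={r:.2f}")

High Kappa and F1 suggest the LLM judge can stand in for human review; low values flag the evaluator for recalibration.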

Nov 6, 2025

Add human annotations while reviewing experiment results side-by-side. You can now annotate traces directly from the experiment compare view, streamlining the workflow of running experiments and adding human feedback.

Key Features:

  • Select any cell in the compare view to open the annotation side panel
  • Assign scores and leave comments while maintaining full experiment context
  • Use annotation score data to compare experiment results across different prompt versions and model configurations
  • Optimistic UI updates provide immediate feedback while data is saved in the background
  • Summary metrics in the compare view reflect annotations as you work

Score Configurations: Support for numerical scores (with min/max ranges), categorical scores (custom classifications), and binary scores (pass/fail judgments).

Workflow:

  1. Run an experiment via UI or SDK
  2. Open the experiment comparison view
  3. Click any item to open the annotation panel
  4. Assign scores and add comments
  5. Move to the next item for review

Standardized score configs ensure consistency across experiments and team members.
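Score configs can also be created programmatically. As a minimal sketch using the public score-configs endpoint (the keys and config name below are placeholders):

  import requests

  LANGFUSE_HOST = "https://cloud.langfuse.com"
  auth = ("pk-lf-...", "sk-lf-...")  # project public key, secret key

  # Numeric config with a min/max range, e.g. a 0-1 relevance score
  resp = requests.post(
      f"{LANGFUSE_HOST}/api/public/score-configs",
      auth=auth,
      json={"name": "relevance", "dataType": "NUMERIC", "minValue": 0, "maxValue": 1},
  )
  resp.raise_for_status()

Categorical and boolean configs follow the same shape, with dataType set accordingly.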

Experiment compare view now supports baseline designation. Select two experiment runs, click Compare, and set one as baseline to enable side-by-side analysis of baseline versus candidate performance.

Key Features:

  • Matched rows: Each row displays baseline and candidate outputs for the same dataset item using stable identifiers for direct comparison
  • Visual indicators: Green/red deltas for scores, cost, and latency highlight item-level changes
  • Column headers: Summary stats show aggregate performance differences between baseline and candidate
  • Trace access: Click any row to open execution traces and debug behavioral changes
  • Regression hunting: Use column filters to build regression worklists (e.g., filter by score thresholds or performance deltas)
  • Aggregate metrics: Charts tab shows high-level metric summaries comparing quality scores, cost, and latency distributions
  • Annotation support: Classify failures with structured scores using annotation mode

Getting Started: Run two experiment versions using the same dataset, select both runs and click Compare, designate the production version as baseline, and review aggregate metrics in the Charts tab or drill into item-level differences in the Outputs tab.
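As a sketch of that first step with the Python SDK (the dataset name, model ids, and my_app are placeholders, and the dataset-run API shown assumes SDK v3):

  from langfuse import get_client

  langfuse = get_client()  # reads LANGFUSE_* env vars
  dataset = langfuse.get_dataset("qa-eval")  # hypothetical dataset

  for run_name, model in [("baseline", "gpt-4o"), ("candidate", "gpt-4o-mini")]:
      for item in dataset.items:
          # Links this trace to the dataset run so both runs appear in Compare
          with item.run(run_name=run_name) as root_span:
              output = my_app(item.input, model=model)  # your app logic
              root_span.update_trace(input=item.input, output=output)

  langfuse.flush()

Both runs then show up on the dataset, ready to select and compare.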

The experiment compare view now supports filtering. Use filters to narrow results by specific criteria such as evaluator scores, cost, latency, or other metrics. For instance, show only items where an evaluator score dropped below a threshold, making it easy to identify and address problematic cases and to drill into specific experiment items faster.

Nov 5, 2025

Launch Week 4 release introducing comprehensive agent tracing capabilities. Features include:

Agent Tools – Tool calls are now rendered at the top of each generation, showing all available tools to the LLM. Click any tool to see its full definition, description, and parameters. In the Chat UI, called tools are displayed alongside their arguments and call IDs with numbering that matches the available tools list.

Trace Log View – A new Log View for traces displays all observations as a single concatenated stream. Skim every agent step by scrolling, and use Cmd/Ctrl+F to search through an entire trace; this is particularly useful for complex looping agents.

Observation Types – Expanded observation types to bring more meaning to spans, allowing easy identification of action types such as tool calls, embeddings, and agents.
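As a hedged sketch, assuming an SDK version that exposes the expanded types on the observe decorator, spans can be typed at instrumentation time (the function names and logic are illustrative):

  from langfuse import observe

  @observe(as_type="agent")  # rendered with agent semantics in the trace tree
  def plan_and_act(query: str) -> str:
      return search(query)

  @observe(as_type="tool")   # marked as a tool call
  def search(q: str) -> str:
      return f"results for {q}"  # placeholder tool logic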

Agent Graphs GA – Agent graphs are now generally available and work with any agent framework or custom instrumentation. The system infers graph structure from observation timings and nesting to visualize true execution flow, especially useful for complex looping observations.

Nov 4, 2025

Day 2 of Launch Week 4 brings a new integration with Amazon Bedrock AgentCore, enabling comprehensive observability for AI agents deployed on AWS infrastructure.

Trace AI agents built with Amazon Bedrock AgentCore via OpenTelemetry and Langfuse. The integration supports distributed tracing to connect traces from local development to production AgentCore deployments.
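As a minimal sketch of pointing an OpenTelemetry exporter at Langfuse's OTLP ingest endpoint (the keys are placeholders; Langfuse accepts Basic auth built from a project's public and secret keys):

  import base64
  import os

  public_key = "pk-lf-..."  # placeholder project keys
  secret_key = "sk-lf-..."
  token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()

  # Any OTel-instrumented agent (including AgentCore runtimes) can export here
  os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel"
  os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {token}"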

Key capabilities:

  • Monitor complete agent execution flows in production
  • Track LLM calls with token counts and costs
  • Debug tool usage and MCP interactions
  • Analyze latency metrics at each step
  • Maintain trace continuity across distributed systems

Includes a comprehensive example repository demonstrating a complete continuous evaluation loop with AgentCore and Langfuse, covering experimentation, QA testing, and production monitoring.

Comments now support @mentions and emoji reactions, making it easier to collaborate with your team directly in Langfuse: tag teammates to notify them instantly, and add emoji reactions to comments for quick acknowledgments.

New features:

  • @mention autocomplete: Type @ in any comment to see a list of your project members and tag them instantly
  • Email notifications: Mentioned teammates receive email notifications with context about the comment and object
  • Emoji reactions: Add quick reactions to comments to acknowledge feedback, agree with findings, or show appreciation
  • Notification preferences: Control when you receive email notifications for mentions on a per-project basis

Langfuse now supports IdP-initiated SSO, allowing users to start authentication directly from their identity provider (e.g., Okta, Azure AD, Keycloak, JumpCloud). This enables a seamless authentication experience where users can click on the Langfuse application tile in their identity provider's dashboard and be automatically authenticated.

How It Works: When configuring IdP-initiated SSO, you set up your identity provider to redirect users to <YOUR_LANGFUSE_INSTANCE_URL>/auth/sso-initiate?provider=<PROVIDER>. Langfuse automatically detects the provider and initiates the SSO authentication flow.
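For example, a hypothetical self-hosted instance at langfuse.example.com with an identity provider configured as okta would redirect users to:

  https://langfuse.example.com/auth/sso-initiate?provider=okta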

Getting Started: On Langfuse Cloud, contact support to configure IdP-initiated SSO. When self-hosting, see the SSO documentation for configuration instructions. IdP-initiated SSO is available in Langfuse v3.126.0 and later.

Langfuse now integrates with Mixpanel to send LLM-related product metrics into your existing Mixpanel dashboards. Configure the integration in project settings by selecting your Mixpanel region and providing your Project Token. When activated, Langfuse sends metrics related to traces, generations, and scores to Mixpanel.

Key features:

  • Combines regular product analytics with LLM-specific metrics from Langfuse
  • Historical data synced to Mixpanel with automatic hourly updates (30-minute delay)
  • Enables analysis of user engagement with LLM features, retention impact, and conversion correlation
  • Example dashboard available using Mixpanel's AI Company KPIs template
  • Similar integration also available for PostHog users