Validate evaluation reliability and uncover insights with comprehensive score analysis. Score Analytics provides tools for analyzing and comparing evaluation scores across your LLM application.
Key Features:
Use Cases: Validate LLM judge reliability, measure human-AI annotation agreement, identify coverage gaps, spot quality regressions, and discover feature relationships through score comparison.
Add human annotations while reviewing experiment results side-by-side. You can now annotate traces directly from the experiment compare view, streamlining the workflow of running experiments and adding human feedback.
Key Features:
Score Configurations: Support for numerical scores (with min/max ranges), categorical scores (custom classifications), and binary scores (pass/fail judgments).
Workflow:
Standardized score configs ensure consistency across experiments and team members.
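The three score configuration types described above can be sketched as simple structures. This is an illustrative sketch only; the field names below are assumptions and not the exact Langfuse API schema.

```python
# Illustrative shapes for the three score configuration types; the field
# names here are assumptions, not the exact Langfuse schema.

numeric_config = {
    "name": "relevance",
    "dataType": "NUMERIC",
    "minValue": 0,   # annotations below this range are rejected
    "maxValue": 10,
}

categorical_config = {
    "name": "intent",
    "dataType": "CATEGORICAL",
    "categories": ["question", "command", "chitchat"],
}

binary_config = {
    "name": "hallucination_check",
    "dataType": "BOOLEAN",  # pass/fail judgment
}

def validate_numeric(config, score):
    """Accept an annotation only if it falls within the configured range."""
    return config["minValue"] <= score <= config["maxValue"]
```

Range validation like `validate_numeric` is what makes standardized configs enforceable: every annotator is held to the same bounds.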
Experiment compare view now supports baseline designation. Select two experiment runs, click Compare, and set one as baseline to enable side-by-side analysis of baseline versus candidate performance.
Key Features:
Getting Started: Run two experiment versions using the same dataset, select both runs and click Compare, designate the production version as baseline, and review metrics in Charts tab or drill into item-level differences in Outputs tab.
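The item-level drill-down above amounts to pairing items across the two runs and computing per-metric deltas. A minimal sketch, assuming a simplified item/score structure that is not the actual Langfuse data model:

```python
# Hedged sketch: pair baseline and candidate items by dataset item id and
# report the metric delta, the way the Outputs tab surfaces differences.
# The dict structure here is a simplification, not Langfuse's data model.

def item_level_diffs(baseline_run, candidate_run, metric="accuracy"):
    """Return the metric delta for every item covered by both runs."""
    baseline_by_item = {i["dataset_item_id"]: i for i in baseline_run}
    diffs = []
    for item in candidate_run:
        base = baseline_by_item.get(item["dataset_item_id"])
        if base is None:
            continue  # item not covered by the baseline run
        diffs.append({
            "dataset_item_id": item["dataset_item_id"],
            "delta": item["scores"][metric] - base["scores"][metric],
        })
    return diffs

baseline = [{"dataset_item_id": "a", "scores": {"accuracy": 0.8}}]
candidate = [{"dataset_item_id": "a", "scores": {"accuracy": 0.6}}]
diffs = item_level_diffs(baseline, candidate)  # item "a" regressed by 0.2
```

A negative delta flags a regression against the baseline; a positive one, an improvement in the candidate.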
The experiment compare view now supports filtering. Use filters to narrow down results based on specific criteria, such as evaluator scores, cost, latency, or other metrics. For instance, filter to show only items where your evaluator scores dropped below a threshold, making it easy to identify and address problematic cases. Drill down into your data faster by filtering to show only specific experiment items.
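The threshold filter described above boils down to a predicate over experiment items. A minimal sketch, assuming items carry evaluator scores and latency fields (the shape is an assumption, not the Langfuse API):

```python
# Minimal sketch of the score-threshold filter; the item shape is an
# assumption for illustration, not the Langfuse data model.

def filter_items(items, score_name, below):
    """Keep only items whose evaluator score dropped below a threshold."""
    return [i for i in items if i["scores"].get(score_name, 0.0) < below]

items = [
    {"id": 1, "scores": {"correctness": 0.9}, "latency_ms": 420},
    {"id": 2, "scores": {"correctness": 0.4}, "latency_ms": 1310},
]
flagged = filter_items(items, "correctness", below=0.5)
# flagged contains only item 2, the problematic case worth drilling into
```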
Launch Week 4 release introducing comprehensive agent tracing capabilities. Features include:
Agent Tools – Tool calls are now rendered at the top of each generation, showing all available tools to the LLM. Click any tool to see its full definition, description, and parameters. In the Chat UI, called tools are displayed alongside their arguments and call IDs with numbering that matches the available tools list.
Trace Log View – New Log View for traces that displays all observations as a single concatenated log. Skim every agent step by scrolling, and use Cmd/Ctrl+F to search through the entire trace; particularly useful for complex looping agents.
Observation Types – Expanded observation types to bring more meaning to spans, allowing easy identification of action types such as tool calls, embeddings, and agents.
Agent Graphs GA – Agent graphs are now generally available and work with any agent framework or custom instrumentation. The system infers graph structure from observation timings and nesting to visualize true execution flow, especially useful for complex looping observations.
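The idea of inferring graph structure from observation timings and nesting can be sketched as follows. This illustrates the concept only and is not Langfuse's actual algorithm:

```python
# Hedged sketch: draw an edge from each observation to the next sibling
# that starts after it ends (sequential steps), plus parent-to-child
# edges from nesting. Not Langfuse's actual inference algorithm.

def infer_edges(observations):
    """observations: list of dicts with id, parent_id, start, end."""
    edges = []
    children = {}
    for obs in observations:
        children.setdefault(obs["parent_id"], []).append(obs)
    for siblings in children.values():
        siblings.sort(key=lambda o: o["start"])
        for prev, nxt in zip(siblings, siblings[1:]):
            if nxt["start"] >= prev["end"]:  # ran sequentially
                edges.append((prev["id"], nxt["id"]))
    for obs in observations:
        if obs["parent_id"] is not None:  # nesting implies an edge
            edges.append((obs["parent_id"], obs["id"]))
    return edges

spans = [
    {"id": "agent", "parent_id": None, "start": 0, "end": 10},
    {"id": "tool_call", "parent_id": "agent", "start": 1, "end": 4},
    {"id": "llm", "parent_id": "agent", "start": 5, "end": 9},
]
# infer_edges(spans) links agent→tool_call, agent→llm, and tool_call→llm
```

Because the inference relies only on timings and nesting, it works with any framework or custom instrumentation, as noted above.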
Day 2 of Launch Week 4 brings a new integration with Amazon Bedrock AgentCore, enabling comprehensive observability for AI agents deployed on AWS infrastructure.
Trace AI agents built with Amazon Bedrock AgentCore via OpenTelemetry and Langfuse. The integration supports distributed tracing to connect traces from local development to production AgentCore deployments.
Key capabilities:
Includes a comprehensive example repository demonstrating a complete continuous evaluation loop with AgentCore and Langfuse, covering experimentation, QA testing, and production monitoring.
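Wiring an OpenTelemetry exporter from an AgentCore deployment to Langfuse typically comes down to an endpoint and a Basic-auth header. A sketch of that configuration; the endpoint path follows Langfuse's OTLP documentation, but verify it against your version, and the keys below are placeholders:

```python
# Hedged sketch of OTLP exporter settings pointing at Langfuse. Verify
# the endpoint path against your Langfuse version; keys are placeholders.
import base64

LANGFUSE_HOST = "https://cloud.langfuse.com"
PUBLIC_KEY = "pk-lf-..."   # placeholder, from project settings
SECRET_KEY = "sk-lf-..."   # placeholder, from project settings

# Langfuse authenticates OTLP requests with Basic auth built from the keys.
token = base64.b64encode(f"{PUBLIC_KEY}:{SECRET_KEY}".encode()).decode()
otel_env = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": f"{LANGFUSE_HOST}/api/public/otel",
    "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Basic {token}",
}
```

Setting these as environment variables lets a standard OTel SDK in the agent export traces without code changes, which is what makes the local-to-production distributed tracing story work.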
Comments now support @mentions and emoji reactions, making it easier to collaborate with your team directly in Langfuse. Tag teammates with @mentions to notify them instantly, and add emoji reactions to comments for quick acknowledgments.
New features:
Type @ in any comment to see a list of your project members and tag them instantly.

Langfuse now supports IdP-initiated SSO, allowing users to start authentication directly from their identity provider (e.g., Okta, Azure AD, Keycloak, JumpCloud). This enables a seamless authentication experience: users click the Langfuse application tile in their identity provider's dashboard and are automatically authenticated.
How It Works: When configuring IdP-initiated SSO, you set up your identity provider to redirect users to <YOUR_LANGFUSE_INSTANCE_URL>/auth/sso-initiate?provider=<PROVIDER>. Langfuse automatically detects the provider and initiates the SSO authentication flow.
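The redirect target above can be assembled like this; `sso_initiate_url` is a hypothetical helper for illustration, not part of any Langfuse SDK:

```python
# Sketch of the redirect URL an IdP sends users to; sso_initiate_url is
# a hypothetical helper, not part of the Langfuse SDK.
from urllib.parse import urlencode

def sso_initiate_url(instance_url, provider):
    """Build the sso-initiate URL for a given provider."""
    base = instance_url.rstrip("/")
    return f"{base}/auth/sso-initiate?" + urlencode({"provider": provider})

url = sso_initiate_url("https://langfuse.example.com", "okta")
# → https://langfuse.example.com/auth/sso-initiate?provider=okta
```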
Getting Started: On Langfuse Cloud, contact support to configure IdP-initiated SSO. When self-hosting, see the SSO documentation for configuration instructions. IdP-initiated SSO requires Langfuse v3.126.0 or later.
Langfuse now integrates with Mixpanel to send LLM-related product metrics into your existing Mixpanel dashboards. Configure the integration in project settings by selecting your Mixpanel region and providing your Project Token. When activated, Langfuse sends metrics related to traces, generations, and scores to Mixpanel.
Key features: