Fixed score and feedback emission to support live correlation context and unanchored annotations. (#14942)
Fixed a crash when using provider-defined tools (like openai.tools.webSearch()) with autoResumeSuspendedTools enabled. (#14940)
Fixed an AsyncLocalStorage runtime error when importing @mastra/core/observability in browser environments. (#14948)
Fixed assistant message prefill error crashing sessions. When a model does not support assistant message prefill, the harness now automatically retries with a user message instead of failing. (#14953)
Added error name and stack trace to SpanErrorInfo, allowing exporters to access the original error class name and stack trace for richer error reporting. (#14944)
Fixed workflow spans missing entityName, which caused the metrics dashboard to show 'unknown' for workflow trace volume (#14949)
Added a new trimMode option with a contiguous strategy that preserves a continuous suffix of messages by stopping at the first message that exceeds the token budget. Default behavior remains unchanged. (#14801)
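The stopping rule can be sketched as follows. This is an illustrative sketch, not Mastra's implementation: `trimContiguous` and the rough 4-characters-per-token estimate are stand-ins.

```typescript
type Msg = { role: string; content: string };

// Rough token estimate (~4 chars per token); illustrative heuristic only.
const estimateTokens = (m: Msg): number => Math.ceil(m.content.length / 4);

// Walk from the newest message backward, accumulating token cost.
// The first message that would exceed the budget ends the scan, so the
// result is always a contiguous suffix of the conversation.
function trimContiguous(messages: Msg[], budget: number): Msg[] {
  const kept: Msg[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (used + cost > budget) break; // stop at the first over-budget message
    used += cost;
    kept.unshift(messages[i]);
  }
  return kept;
}
```

Because the scan stops rather than skips, an oversized message in the middle of history also shields everything older than it, which keeps the retained context contiguous.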
Added component-scoped logging with custom filtering to ConsoleLogger (#14947)
new ConsoleLogger({
  level: 'debug',
  filter: ({ component }) => component === 'AGENT',
});
Added scorer tracing and exported scores through the observability bus. (#14920)
What changed
Scorer execution now emits SCORER_RUN and SCORER_STEP spans, and scores are exported via mastra.observability.addScore() when a target trace is available.
Why
This makes scorer execution easier to debug and starts moving scorer results onto the new observability-based score pipeline.
Example
await scorer.run({
  input,
  output,
  scoreSource: 'experiment',
  targetScope: 'span',
  targetTraceId: traceId,
  targetSpanId: spanId,
});
Added DualLogger that transparently forwards all infrastructure logger calls (debug, info, warn, error, trackException) to the observability system (loggerVNext). This means all internal Mastra logs now automatically appear in your observability storage (e.g. DuckDB) without any code changes. (#14899)
trackException now extracts structured error data (errorId, domain, category, details, cause) and forwards it as an error-level log to observability storage, so exceptions are queryable alongside regular logs.
Added logging config option to ObservabilityInstance for controlling which logs reach observability storage:
new Observability({
  instance: new MastraObservability({
    logging: {
      enabled: true, // set to false to disable log forwarding
      level: 'info', // minimum level: 'debug' | 'info' | 'warn' | 'error' | 'fatal'
    },
  }),
});
Added a registerExporter method to the observability stack and the Mastra class for runtime exporter registration. (#14730)
Fixed Anthropic API rejection of empty user text content blocks. (#14906)
User messages containing only empty text parts (e.g., { type: 'text', text: '' }) are now filtered out before being sent to the LLM. This prevents the "text content blocks must be non-empty" error that could occur when corrupted messages existed in the database.
Note: The root cause of how these empty user messages get persisted is still under investigation.
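The filtering idea can be sketched as follows; the types are simplified stand-ins, not Mastra's internal message shapes.

```typescript
type TextPart = { type: 'text'; text: string };
type Message = { role: 'user' | 'assistant'; parts: TextPart[] };

// Drop user messages whose text parts are all empty, since providers
// like Anthropic reject empty text content blocks.
function dropEmptyUserMessages(messages: Message[]): Message[] {
  return messages.filter(
    m =>
      m.role !== 'user' ||
      m.parts.some(p => p.type === 'text' && p.text.trim() !== ''),
  );
}
```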
Improved the pattern field description in the list_files workspace tool to prevent AI models from passing "*" when they intend to match all files. The description now clarifies that omitting pattern lists all files, that * only matches within a single directory level (standard glob), and that glob patterns only filter files while directories are always shown. (#14897)
Added a lastMessageOnly option to the LLM-backed moderation, language detection, prompt injection, PII, and system prompt scrubber processors so they can inspect only the newest message instead of re-checking the full conversation on every run. (#14903)
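Conceptually, lastMessageOnly just narrows the set of messages a processor inspects. A minimal sketch of that selection (names and types illustrative, not the processor API):

```typescript
type Message = { role: string; content: string };

// With lastMessageOnly, a processor checks just the newest message
// instead of re-checking the full conversation on every run.
function selectMessagesToCheck(
  messages: Message[],
  lastMessageOnly: boolean,
): Message[] {
  return lastMessageOnly ? messages.slice(-1) : messages;
}
```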
Fixed providerMetadata (e.g. Gemini's thoughtSignature) being stripped from tool-call events when using the non-streaming (generate) code path (#14900)
Standardized all logger calls across the codebase to use static string messages with structured data objects. Dynamic values are now passed as key-value pairs in the second argument instead of being interpolated into template literal strings. This improves log filterability and searchability in observability storage. (#14899)
Removed ~150 redundant or noisy log calls including duplicate error logging after trackException and verbose in-memory storage CRUD traces.
Fixed duplicate OpenAI item ID errors when using web search. When OpenAI streams responses with web search citations, it interleaves source chunks with text, causing multiple message parts to share the same item ID. This resulted in 'Duplicate item found' errors on subsequent requests. The fix prevents text flushing on source chunks and merges any existing duplicate parts. (#14908)
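The merge step amounts to collapsing parts that share an item ID into one part. A sketch of that idea (the Part shape is illustrative):

```typescript
type Part = { itemId: string; text: string };

// Merge parts that share the same item ID so each ID appears at most
// once, concatenating their text in arrival order.
function mergeDuplicateParts(parts: Part[]): Part[] {
  const byId = new Map<string, Part>();
  for (const p of parts) {
    const existing = byId.get(p.itemId);
    if (existing) existing.text += p.text; // fold duplicate into the first occurrence
    else byId.set(p.itemId, { ...p });
  }
  return [...byId.values()];
}
```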
feat(memory): add minMessages option to generateTitle config (#14778)
Delay automatic title generation until a minimum number of messages is reached, improving title quality and reducing unnecessary LLM calls.
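The gating logic is simple to sketch. The function name and default threshold below are illustrative, not the actual API:

```typescript
// Generate a title only once the thread has accumulated enough messages
// and no title exists yet; avoids a wasted LLM call on tiny threads.
function shouldGenerateTitle(
  messageCount: number,
  hasTitle: boolean,
  minMessages = 4, // illustrative default
): boolean {
  return !hasTitle && messageCount >= minMessages;
}
```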
Update provider registry and model documentation with latest models and providers (180aaaf)
Streaming traces now end correctly when a model call fails or a request is aborted, so they no longer remain stuck "in progress" in observability tools. (#14661)
Fixed getWorkflowRunById with withNestedWorkflows not returning nested steps for branch sub-workflows. (#14713)
Tools that return objects with circular references no longer crash the agent with "Converting circular structure to JSON". Circular parts are replaced with "[Circular]" and the conversation continues normally. (#14535)
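The guard described above is the standard cycle-detection pattern with a JSON.stringify replacer; a self-contained sketch (not Mastra's internal code):

```typescript
// Replace circular references with "[Circular]" so any tool result
// can be serialized without throwing.
function safeStringify(value: unknown): string {
  const seen = new WeakSet<object>();
  return JSON.stringify(value, (_key, v) => {
    if (typeof v === 'object' && v !== null) {
      if (seen.has(v)) return '[Circular]'; // already visited on this path
      seen.add(v);
    }
    return v;
  });
}
```

Note this marks any repeated object reference, not only true cycles, which is an acceptable trade-off for log-style serialization.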
Fixed crashes when using ModelRouterLanguageModel with AI SDK v6's generateObject() or generateText(). The model router now correctly preserves usage and metadata from underlying models. (#14283)
Agents using structured output no longer fail when workflow tools are present. Setting toolChoice to 'none' now correctly prevents tools from being sent to the provider, fixing errors from providers like Gemini that reject structured output requests when tools are included. (#14466)
Sub-agent tool calls no longer fail when LLMs use query, message, or input instead of prompt during repeated sub-agent calls via custom gateways. These common aliases are now automatically recognized and mapped to prompt when the schema expects it. (#14219)
Fixed an issue where supervisor agent messages were being saved to the sub-agent thread, causing duplicate tool call badges to appear in the chat history when sub-agents are invoked multiple times. (#13881)
Fixed workspace vector indexing silently swallowing embedder and search engine errors during auto-indexing. File-read errors (binary files, invalid UTF-8) are still skipped, but indexing failures are now logged as warnings instead of being silently ignored. (#14786)
Fixed incorrect type cast for sub-agent context messages. The context option for new API methods (generate, stream, resumeGenerate, resumeStream) now correctly casts to ModelMessage[] instead of CoreMessage[]. (#14895)
Added version-aware code-agent lookup and override version lifecycle support. (#14776)
Mastra.getAgent(name, version) and Mastra.getAgentById(id, version) can now resolve draft or specific stored override versions when the editor package is configured, and throw a clear error when versioned lookup is requested without the editor.
client.getAgent(id, version) now carries version selection through agent detail and voice metadata requests, and the Agent resource now supports override version management methods including listVersions, createVersion, getVersion, activateVersion, restoreVersion, deleteVersion, and compareVersions.
Agent.createVersion(...) is intentionally limited to code-agent overrideable fields plus version metadata, rather than the full stored-agent configuration surface.
Trajectory evaluation: Added trajectory types and trace-based extraction for evaluating agent and workflow execution paths. (#14697)
TrajectoryStep models each step in an execution as a typed object — tool calls, model generations, agent runs, workflow steps, and control flow nodes each have their own variant with relevant properties (e.g., toolArgs/toolResult for tool calls, modelId/promptTokens for model generations). Steps can be nested via children to represent hierarchical execution.
TrajectoryExpectation lets you define what a good trajectory looks like — expected steps, ordering, step/token/duration budgets, blacklisted tools, and retry thresholds. ExpectedStep provides a simple way to define expected steps by name and optional stepType, with support for nested expectations via children to set different evaluation rules at each level of the hierarchy.
Trace-based extraction: extractTrajectoryFromTrace() builds hierarchical trajectories from observability trace spans. The runEvals pipeline automatically uses this when storage is configured, capturing the full execution tree including nested agent runs and tool calls. Falls back to extractTrajectory (agents) or extractWorkflowTrajectory (workflows) when storage is unavailable.
Pipeline: expectedTrajectory flows from dataset items through runEvals to trajectory scorers. Added a trajectory key to both AgentScorerConfig and WorkflowScorerConfig.
Update provider registry and model documentation with latest models and providers (dc514a8)
Persist observational memory threshold settings across restarts and restore per-thread overrides. (#14788)
Fixed title generation blocking stream completion. The generateTitle LLM call now runs in the background instead of blocking the stream from closing, removing the 2-3 second post-response delay in the UI when memory is enabled. (#14757)
feat(memory): add recall-tool history retrieval for agents using observational memory (#14567)
Agents that use observational memory can now use the recall tool to retrieve history from past conversations, including raw messages, thread listings, and indexed observation-group memories.
Enable observational-memory retrieval when listing tools:
const tools = await memory.listTools({
  threadId: 'thread_123',
  resourceId: 'resource_abc',
  observationalMemory: {
    retrieval: { vector: true, scope: 'resource' }, // scope value illustrative
  },
});
With retrieval enabled, recall can browse the current thread, list threads for the current resource, and search indexed observation groups with source ranges.
Added public score and feedback analytics APIs to observability storage: (#14861)
getScoreAggregate / getFeedbackAggregate for counts, sums, averages, minimums, maximums, or latest values;
getScoreBreakdown / getFeedbackBreakdown for grouped results by dimension;
getScoreTimeSeries / getFeedbackTimeSeries for time-bucketed trends;
and getScorePercentiles / getFeedbackPercentiles for percentile series such as p50 and p95.
await observability.getScoreTimeSeries({
  scorerId: 'relevance',
  interval: '1h',
  aggregation: 'avg',
});
// returns time-bucketed average scores
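For reference, a p50/p95-style value like those returned by getScorePercentiles can be computed with a nearest-rank percentile; this is an illustrative sketch, not the storage implementation:

```typescript
// Nearest-rank percentile: sort ascending, take the value at
// rank = ceil(p/100 * n).
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('empty series');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```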
Added support for attaching scorers to datasets. Scorers attached to a dataset automatically run when an experiment is triggered, alongside any scorers specified at trigger time. New scorerIds field on DatasetRecord, CreateDatasetInput, and UpdateDatasetInput types. (#14783)
Added new observability entrypoint APIs for persisted traces. You can now call mastra.observability.getRecordedTrace({ traceId }) to load a recorded trace, and use optional top-level mastra.observability.addScore()/addFeedback() helpers to annotate a persisted trace by ID. (#14842)
Align observability signal contracts around first-class trace and span fields. (#14838)
Improved observability signal consistency
Logs, metrics, scores, and feedback now carry traceId and spanId directly on each signal. Shared correlation metadata stays in correlationContext.
Added clearer provenance fields
Score and feedback payloads now support scoreSource, feedbackSource, and executionSource for clearer source tracking.
Migration note
Deprecated fields (like source and feedback userId) are still accepted for compatibility.
Fixed agent run traces not appearing in Datadog and other observability backends when LLM calls fail. Previously, an API error during streaming would leave the root AGENT_RUN span open indefinitely, causing the entire trace tree to be silently dropped by exporters that wait for the root span to close. Failed agent runs now correctly end the span with error information, making failures visible in your observability dashboard. (#14850)
Fixed mcpOptions (including serverless: true) being silently ignored when using the Mastra deployer. The deployer now forwards mcpOptions from your server config to the underlying MastraServer, so MCP stateless mode works correctly in serverless environments like Cloudflare Workers, Vercel Edge, and AWS Lambda. (#14810) (#14812)
What changed:
Added mcpOptions to the ServerConfig type so it can be set in new Mastra({ server: { ... } }), and the deployer now passes server.mcpOptions through to MastraServer.
Example:
const mastra = new Mastra({
  server: {
    mcpOptions: {
      serverless: true,
    },
  },
});
Added an lsp_inspect tool for LSP-based code inspection with hover, definition, and implementation queries. (#14565)
Added disableBuiltinTools to HarnessConfig so you can disable specific built-in harness tools. (#14227)
Example:
new Harness({ disableBuiltinTools: ['submit_plan', 'subagent'] });
Added SkillSearchProcessor for on-demand skill discovery. Instead of injecting all skill metadata upfront, agents get search_skills and load_skill meta-tools to find and load skills on demand with thread-scoped state and TTL cleanup. (#14596)
Example
import { SkillSearchProcessor } from '@mastra/core/processors';

const skillSearch = new SkillSearchProcessor({
  workspace,
  search: { topK: 5 },
});

const agent = new Agent({
  workspace,
  inputProcessors: [skillSearch],
});
Added resolvedVersionId to agent run trace span attributes for tracking which agent version was used during execution. (#14847)
Limit dynamically injected AGENTS.md reminders to 1000 estimated tokens by default and tell mastracode observational memory to ignore those ephemeral reminder messages. (#14790)
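A cap like this can be approximated with a character-based token estimate; the ~4-characters-per-token heuristic and function name below are assumptions for illustration:

```typescript
// Truncate injected reminder text to an estimated token budget,
// using a rough 4-chars-per-token heuristic.
function capReminder(text: string, maxTokens = 1000): string {
  const maxChars = maxTokens * 4;
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}
```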
Fixed missing TRequestContext type parameter on DynamicArgument fields in AgentConfig. Previously, only instructions and tools correctly propagated the requestContextSchema type to their dynamic function callbacks. Now all dynamic fields — model, workflows, workspace, agents, memory, scorers, defaultGenerateOptionsLegacy, defaultStreamOptionsLegacy, defaultOptions, defaultNetworkOptions, inputProcessors, and outputProcessors — properly type requestContext based on the agent's requestContextSchema. (#14582)
Before:
const agent = new Agent({
  requestContextSchema: z.object({ userId: z.string() }),
  workspace: ({ requestContext }) => {
    requestContext.get('userId'); // typed as `unknown`
  },
});
After:
const agent = new Agent({
  requestContextSchema: z.object({ userId: z.string() }),
  workspace: ({ requestContext }) => {
    requestContext.get('userId'); // typed as `string`
  },
});
Fixed resuming suspended tool calls with resumeStream or approveToolCall failing with a TripWire when input processors (e.g. TokenLimiterProcessor) are enabled on the agent. (#14561)
Fixed Harness.listThreads() so callers can request threads across all resources. (#14690)
Fixed streaming delegation to propagate output processor modifications to the supervisor. Previously, when a sub-agent had an output processor that modified text via processOutputResult, the supervisor received the raw LLM output instead of the processed text. The processed text was only saved to the sub-agent's memory. Now the supervisor correctly receives the output-processor-modified text from delegated sub-agents in the streaming path. (#14731)
Fixed Harness stateSchema typing to accept Zod schemas with .default(), .optional(), and .transform() modifiers. Previously, these modifiers caused TypeScript errors because the type system forced schema Input and Output types to be identical. Now stateSchema correctly accepts any schema regardless of input type divergence. (#14606)
Added getReviewSummary() to experiments storage for aggregating review status counts. (#14649)
Query experiment results grouped by experiment ID, returning counts of needs-review, reviewed, and complete items in a single query instead of fetching all results client-side.
const summary = await storage.experiments.getReviewSummary();
// [{ experimentId: 'exp-1', needsReview: 3, reviewed: 5, complete: 2, total: 10 }, ...]
Added isValidationError type guard for the ValidationError interface (#14853)
Fixed models.dev provider URLs to interpolate environment variable placeholders like ${ACCOUNT_ID} before creating the underlying provider client. (#14722)
Fixed tool input validation failures not producing observability spans. When input schema validation failed, no TOOL_CALL span was created because span creation happened inside the execution function that ran after validation. Moved span creation before input validation so validation errors are now captured in spans and visible in observability backends like Datadog. (#14677)
Fixed MODEL_GENERATION and AGENT_RUN spans not reflecting model, provider, parameters, and availableTools overrides from input processors. Traces in Langfuse and other exporters now show the correct model info when a processor dynamically switches models. (#14705)
Fixed MODEL_GENERATION observability span to include all system messages (tagged and untagged). Previously, working memory and semantic recall instructions were missing from trace inputs because only untagged system messages were captured. (#14800)
Fixed models.dev auth env selection to prefer auth credentials over URL path identifiers, so Cloudflare Workers AI no longer uses the account ID for authentication. (#14687)
Fixed processInputStep always receiving an empty steps array. Processors can now inspect previous step results (tool calls, LLM responses) when running inside the agentic loop. (#14821)
Configurable weights: Add weights option to createTrajectoryScorerCode for controlling how dimension scores are combined. Defaults to { accuracy: 0.4, efficiency: 0.3, toolFailures: 0.2, blacklist: 0.1 }. (#14740)
const scorer = createTrajectoryScorerCode({
  defaults: { steps: [{ name: 'search' }], maxSteps: 5 },
  weights: { accuracy: 0.6, efficiency: 0.2, toolFailures: 0.1, blacklist: 0.1 },
});
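The combination itself is a normalized weighted sum of the per-dimension scores; a sketch under the assumption that each dimension score is already in [0, 1] (not the scorer's actual internals):

```typescript
type Dimensions = {
  accuracy: number;
  efficiency: number;
  toolFailures: number;
  blacklist: number;
};

// Combine per-dimension scores into a single score using the configured
// weights, normalizing by the weight total so weights need not sum to 1.
function combineScores(scores: Dimensions, weights: Dimensions): number {
  const total =
    weights.accuracy + weights.efficiency + weights.toolFailures + weights.blacklist;
  return (
    (scores.accuracy * weights.accuracy +
      scores.efficiency * weights.efficiency +
      scores.toolFailures * weights.toolFailures +
      scores.blacklist * weights.blacklist) /
    total
  );
}
```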
ExpectedStep redesign: ExpectedStep is now a discriminated union mirroring TrajectoryStep. When you specify a stepType, you get autocomplete for that variant's fields (e.g., toolArgs for tool_call, modelId for model_generation). The old data: Record<string, unknown> field is replaced by direct variant fields.
// Before: { name: 'search', stepType: 'tool_call', data: { input: { query: 'weather' } } }
// After:
{ name: 'search', stepType: 'tool_call', toolArgs: { query: 'weather' } }
Remove compareStepData: The compareStepData option is removed from compareTrajectories, TrajectoryExpectation, and all scorers. Data fields are now auto-compared when present on expected steps — if you specify toolArgs on an ExpectedStep, it will be compared against the actual step. If you omit it, only name and stepType are matched.
Also fixes documentation inaccuracies in trajectory-accuracy.mdx and scorer-utils.mdx.
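The auto-compare rule described above amounts to a recursive subset match: fields present on the expected step must match the actual step, and omitted fields are ignored. An illustrative sketch:

```typescript
// Check that every field present on `expected` matches the corresponding
// field on `actual`, recursing into nested objects; fields absent from
// `expected` are not compared at all.
function subsetMatches(expected: unknown, actual: unknown): boolean {
  if (typeof expected !== 'object' || expected === null) {
    return expected === actual; // leaf values compare directly
  }
  if (typeof actual !== 'object' || actual === null) return false;
  return Object.entries(expected as Record<string, unknown>).every(([k, v]) =>
    subsetMatches(v, (actual as Record<string, unknown>)[k]),
  );
}
```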