Estimated end-of-life date, accurate to within three months: 06-2027 See the support level definitions for more information.
Fixed an issue where abrupt process termination (os._exit, SIGKILL, segfault) caused buffered test events to be lost. To enable eager flushing, set DD_TRACE_PARTIAL_FLUSH_MIN_SPANS=1.
Estimated end-of-life date, accurate to within three months: 05-2027 See the support level definitions for more information.
ray.job.submit spans are removed. Ray job submission outcome is now reported on the existing ray.job span through ray.job.submit_status.
The pin parameter in ddtrace.contrib.dbapi.TracedConnection, ddtrace.contrib.dbapi.TracedCursor, and ddtrace.contrib.dbapi_async.TracedAsyncConnection is deprecated and will be removed in version 5.0.0. To manage configuration of DB tracing, please use integration configuration and environment variables.
DD_TRACE_INFERRED_PROXY_SERVICES_ENABLED is deprecated and will be removed in 5.0.0. Use DD_TRACE_INFERRED_SPANS_ENABLED instead. The old environment variable continues to work but emits a DDTraceDeprecationWarning when set.
profiling
ASM
The ddtrace.appsec.ai_guard.integrations.litellm.DatadogAIGuardGuardrail class can be registered as a custom guardrail in the LiteLLM proxy to evaluate requests and responses against AI Guard security policies. Requires the LiteLLM proxy guardrails API v2, available since litellm>=1.46.1.
azure_cosmos
CI Visibility
Set DD_AGENTLESS_LOG_SUBMISSION_ENABLED=true for agentless setups, or DD_LOGS_INJECTION=true when using the Datadog Agent.
llama_index
Adds support for llama-index-core>=0.11.0. Traces LLM calls, query engines, retrievers, embeddings, and agents. See the llama_index documentation for more information.
tracing
Set OTEL_TRACES_EXPORTER=otlp to send spans to an OTLP endpoint instead of the Datadog Agent.
mysql
Adds support for mysql.connector.aio.connect in the MySQL integration.
LLM Observability
Adds a decorator tag to LLM Observability spans that are traced by a function decorator.
Adds support for using a pydantic_evals ReportEvaluator as a summary evaluator when its evaluate return annotation is exactly ScalarResult. The scalar value is recorded as the summary evaluation. Report evaluators that declare a broader analysis return type (for example the full ReportAnalysis union) are not accepted as summary evaluators; use a class-based or function summary evaluator instead. Examples and further documentation can be found in our documentation [here](https://docs.datadoghq.com/llm_observability/guide/evaluation_developer_guide).
Example:
from pydantic_evals.evaluators import EqualsExpected
from pydantic_evals.evaluators import ReportEvaluator
from pydantic_evals.evaluators import ReportEvaluatorContext
from pydantic_evals.reporting.analyses import ScalarResult
from ddtrace.llmobs import LLMObs

dataset = LLMObs.create_dataset(
    dataset_name="<DATASET_NAME>",
    description="<DATASET_DESCRIPTION>",
    records=[RECORD_1, RECORD_2, RECORD_3, ...],
)

class TotalCasesEvaluator(ReportEvaluator):
    def evaluate(self, ctx: ReportEvaluatorContext) -> ScalarResult:
        return ScalarResult(
            title='Total Cases',
            value=len(ctx.report.cases),
            unit='cases',
        )

def my_task(input_data, config):
    return input_data["output"]

equals_expected = EqualsExpected()
summary_evaluator = TotalCasesEvaluator()

experiment = LLMObs.experiment(
    name="<EXPERIMENT_NAME>",
    task=my_task,
    dataset=dataset,
    evaluators=[equals_expected],
    summary_evaluators=[summary_evaluator],
    description="<EXPERIMENT_DESCRIPTION>.",
)
result = experiment.run()
Fixed an issue where a ModuleNotFoundError could be raised at startup in Python environments without the _ctypes extension module.
Fixed an issue where agent spans (e.g. invoke_agent) were incorrectly appearing as siblings of their SDK parent span (e.g. call_agent) rather than being nested under it.
Fixed an issue where model_name and model_provider were reported on AWS Bedrock LLM spans as the full model_id identifier value (e.g., "amazon.nova-lite-v1:0") and "amazon_bedrock" respectively. Bedrock spans' model_name and model_provider now correctly match backend pricing data, which enables features including cost tracking.
Fixed an issue where deferred tool loading (defer_loading=True) in Anthropic and OpenAI integrations caused LLMObs span payloads to include full tool descriptions and JSON schemas for every tool in a large catalog. Deferred tool definitions now have their description and schema stripped from span metadata, with only the tool name preserved.
Fixed an issue where abrupt process termination (os._exit, SIGKILL, segfault) caused buffered test events to be lost. To enable eager flushing, set DD_TRACE_PARTIAL_FLUSH_MIN_SPANS=1.
Fixed an issue where a failure of the /search_commits endpoint caused the git metadata upload to fall back to sending the full 30-day commit history instead of aborting. This fallback could trigger cascading write load on the backend. The upload now aborts when search_commits fails, matching the behavior when the /packfile upload itself fails.
Added an early return in the IAST taint tracking add_aspect native function, avoiding redundant work when only the right operand of a string concatenation was tainted.
Fixed an issue affecting Task.replace().
Fixed a RuntimeError: coroutine ignored GeneratorExit that occurred under ASGI with async views and async middleware hooks on Python 3.13+. Async view methods and middleware hooks are now correctly detected and awaited instead of being wrapped with sync bytecode wrappers.
Fixed the svc.auto process tag attribution logic. The tag now correctly reflects the auto-detected service name derived from the script or module entrypoint, matching the service name the tracer would assign to spans.
Fixed an issue where applications launched with python -m <module> could report entrypoint.name as -m in process tags.
Fixed an issue where the network.client.ip and http.client_ip span tags were missing when client IP collection was enabled and the request had no headers.
Fixed the service name reported on Lambda spans when DD_SERVICE is not explicitly configured. Service remapping rules configured in Datadog will now apply correctly to Lambda spans.
The minimum supported version of openfeature-sdk is now 0.8.0 (previously 0.6.0). This is required for the finally_after hook to receive evaluation details for metrics tracking.
Flag evaluations now return Reason.ERROR with ErrorCode.FLAG_NOT_FOUND instead of Reason.DEFAULT when configuration is available but the flag is not found. The previous behavior (Reason.DEFAULT) is preserved when no configuration is loaded. This aligns Python with other Datadog SDK implementations.
mlflow
When DD_API_KEY, DD_APP_KEY, and DD_MODEL_LAB_ENABLED are set, HTTP requests to the MLflow tracking server will include the DD-API-KEY and DD-APPLICATION-KEY headers. #16685
ai_guard
The block parameter no longer defaults to block=False; it now defaults to block=True.
azure_durable_functions
profiling
Process tag propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
runtime metrics
Process tag propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
remote configuration
Process tag propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
dynamic instrumentation
Process tag propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
crashtracking
Process tag propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
data streams monitoring
Process tag propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
database monitoring
Process tag propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
Stats computation
Process tag propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
LLM Observability
Adds support for capturing stop_reason and structured_output from the Claude Agent SDK integration.
Adds support for user-defined dataset record IDs. Users can now supply an optional id field when creating dataset records via Dataset.append(), Dataset.extend(), create_dataset(), or create_dataset_from_csv() (via the new id_column parameter). If no id is provided, the SDK generates one automatically.
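The id_column behavior can be sketched in plain Python (the loader below is an illustrative stand-in, not the ddtrace implementation): a record takes its id from the designated column when one is present, and otherwise gets a generated one.

```python
import csv
import io
import uuid

def load_records(csv_text, id_column="id"):
    # Illustrative stand-in for create_dataset_from_csv(id_column=...):
    # use the value from id_column when present, otherwise generate an id.
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rec_id = row.pop(id_column, None) or str(uuid.uuid4())
        records.append({"id": rec_id, "input_data": row})
    return records

records = load_records("id,question\ncase-1,What is 2+2?\n,Name a prime\n")
# records[0] keeps its supplied id "case-1"; records[1] gets a generated one
```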
Experiment tasks can now optionally receive dataset record metadata as a third metadata parameter. Tasks with the existing (input_data, config) signature continue to work unchanged.
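One way such backward compatibility is typically implemented is by inspecting the task's signature before calling it; the sketch below assumes nothing about ddtrace internals and uses a hypothetical call_task helper.

```python
import inspect

def call_task(task, input_data, config, metadata):
    # Pass metadata only when the task declares a third parameter, so
    # existing (input_data, config) tasks keep working unchanged.
    if len(inspect.signature(task).parameters) >= 3:
        return task(input_data, config, metadata)
    return task(input_data, config)

def old_task(input_data, config):
    return input_data["q"]

def new_task(input_data, config, metadata):
    return (input_data["q"], metadata["split"])

a = call_task(old_task, {"q": "hi"}, None, {"split": "test"})
b = call_task(new_task, {"q": "hi"}, None, {"split": "test"})
```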
This introduces RemoteEvaluator which allows users to reference LLM-as-Judge evaluations configured in the Datadog UI by name when running local experiments. For more information, see the documentation: https://docs.datadoghq.com/llm_observability/guide/evaluation_developer_guide/#using-managed-evaluators
This adds cache creation breakdown metrics for the Anthropic integration. When making Anthropic calls with prompt caching, ephemeral_5m_input_tokens and ephemeral_1h_input_tokens metrics are now reported, distinguishing between 5-minute and 1-hour prompt caches.
Adds support for reasoning and extended thinking content in Anthropic, LiteLLM, and OpenAI-compatible integrations. Anthropic thinking blocks (type: "thinking") are now captured as role: "reasoning" messages in both streaming and non-streaming responses, as well as in input messages for tool use continuations. LiteLLM now extracts reasoning_output_tokens from completion_tokens_details and captures reasoning_content in output messages for OpenAI-compatible providers.
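The mapping from Anthropic-style content blocks to role "reasoning" messages can be illustrated with a small standalone sketch (to_messages is a hypothetical helper, not the integration's actual code):

```python
def to_messages(content_blocks):
    # "thinking" blocks become role "reasoning" messages; "text" blocks
    # remain ordinary assistant messages (illustrative mapping only).
    messages = []
    for block in content_blocks:
        if block.get("type") == "thinking":
            messages.append({"role": "reasoning", "content": block.get("thinking", "")})
        elif block.get("type") == "text":
            messages.append({"role": "assistant", "content": block.get("text", "")})
    return messages

msgs = to_messages([
    {"type": "thinking", "thinking": "The user wants a sum."},
    {"type": "text", "text": "The answer is 4."},
])
```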
LLMJudge now forwards any extra client_options to the underlying provider client constructor. This allows passing provider-specific options such as base_url, timeout, organization, or max_retries directly through client_options.
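Forwarding extra client_options amounts to splatting the dict into the provider client constructor; the sketch below uses a fake client class, since the real provider SDKs are not assumed here.

```python
class FakeProviderClient:
    # Hypothetical stand-in for a provider SDK client constructor.
    def __init__(self, api_key, base_url=None, timeout=None, max_retries=None):
        self.api_key = api_key
        self.base_url = base_url
        self.timeout = timeout
        self.max_retries = max_retries

def build_client(client_cls, api_key, client_options=None):
    # Extra options are passed straight through to the constructor.
    return client_cls(api_key=api_key, **(client_options or {}))

client = build_client(FakeProviderClient, "sk-test",
                      {"base_url": "http://localhost:8080", "timeout": 30})
```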
Dataset records' tags can now be operated on with three new Dataset methods: dataset.add_tags, dataset.remove_tags, and dataset.replace_tags. All three methods accept an int indicating the zero-based index of the record to operate on, and a list of strings in key:value format representing the tags. For example, if the tag "env:prod" exists on the first record of the dataset ds, calling ds.remove_tags(0, ["env:prod"]) will update the local state of the dataset record to have the "env:prod" tag removed.
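The described tag semantics (zero-based record index, "key:value" strings, local-state update) can be mirrored with a plain-Python sketch; the functions below are illustrative stand-ins, not the Dataset implementation.

```python
def add_tags(records, index, tags):
    # Append "key:value" tags not already present on the record
    # at the given zero-based index.
    existing = records[index].setdefault("tags", [])
    existing.extend(t for t in tags if t not in existing)

def remove_tags(records, index, tags):
    # Drop the given "key:value" tags from the record's local state.
    records[index]["tags"] = [t for t in records[index].get("tags", [])
                              if t not in set(tags)]

ds = [{"input_data": {"q": "hi"}, "tags": ["env:prod", "team:ml"]}]
remove_tags(ds, 0, ["env:prod"])
add_tags(ds, 0, ["priority:high"])
```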
Change experiment execution to run evaluators immediately after each record's task completes instead of batching all tasks first. Experiment spans and evaluation metrics are now posted incrementally as records complete rather than waiting until the end. This improves progress visibility and preserves partial results if a run fails midway.
Adds support for Pydantic AI evaluations in LLM Observability Experiments by allowing users to pass a pydantic_evals evaluator (which inherits from Evaluator) in an LLM Observability Experiment.
Example:
from pydantic_evals.evaluators import EqualsExpected
from ddtrace.llmobs import LLMObs
dataset = LLMObs.create_dataset(
    dataset_name="<DATASET_NAME>",
    description="<DATASET_DESCRIPTION>",
    records=[RECORD_1, RECORD_2, RECORD_3, ...],
)

def my_task(input_data, config):
    return input_data["output"]

def my_summary_evaluator(inputs, outputs, expected_outputs, evaluators_results):
    return evaluators_results["Correctness"].count(True)

equals_expected = EqualsExpected()

experiment = LLMObs.experiment(
    name="<EXPERIMENT_NAME>",
    task=my_task,
    dataset=dataset,
    evaluators=[equals_expected],
    summary_evaluators=[my_summary_evaluator],  # optional, used to summarize the experiment results
    description="<EXPERIMENT_DESCRIPTION>.",
)
result = experiment.run()
tracer
Process tag propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
aiohttp
Fixed an issue where client request spans recorded only the relative path (e.g. /status/200) when aiohttp.ClientSession was initialized with a base_url. The span now records the fully-resolved URL (e.g. http://host:port/status/200), matching aiohttp's internal behaviour.
pymongo
Adds DD_TRACE_MONGODB_OBFUSCATION to control whether the mongodb.query tag is obfuscated. Resource names always remain normalized regardless of the value. To preserve raw mongodb.query values, pair with DD_APM_OBFUSCATION_MONGODB_ENABLED=false on the Datadog Agent. See the Datadog trace obfuscation docs: Trace obfuscation.
google_cloud_pubsub
Adds support for tracing the google-cloud-pubsub library. Instruments PublisherClient.publish() and SubscriberClient.subscribe() to generate spans for message publishing and consuming, with optional distributed trace context propagation via message attributes. Use DD_GOOGLE_CLOUD_PUBSUB_PROPAGATION_ENABLED to control context propagation (default: True) and DD_GOOGLE_CLOUD_PUBSUB_PROPAGATION_AS_SPAN_LINKS to attach propagated context as span links instead of re-parenting subscriber spans under the producer trace (default: False).
Fixed the handling of multipart/form-data bodies, which could allow an attacker to bypass WAF inspection by hiding a malicious value among safe ones.
Fixed an issue where scope["path"] was used instead of scope["raw_path"] for WAF URI evaluation. In rare cases where the URI contained path traversal sequences, these could be resolved before reaching the WAF, potentially affecting a small number of URI-based detection rules on ASGI-based frameworks like FastAPI and Starlette.
Fixed an issue where multiple UNVALIDATED_REDIRECT vulnerabilities could be reported for a single redirect() call.
Fixed an issue where GraphInterrupt exceptions were incorrectly marked as errors in APM traces. GraphInterrupt is a control-flow exception used in LangGraph's human-in-the-loop workflows and should not be treated as an error condition.
Fixed an anyio.ClosedResourceError raised during MCP server session teardown when the ddtrace MCP integration is enabled.
Fixed an issue where retries from external rerun plugins such as pytest-rerunfailures and flaky were silently overridden by the ddtrace plugin. With this change, external rerun plugins will now drive retries as expected when the Auto Test Retries and Early Flake Detection features are both disabled; otherwise our retry mechanism takes precedence and a warning is emitted.
Fixed a RuntimeError that occurred when the git binary was not available. Git metadata upload is now skipped gracefully with a warning instead of aborting pytest startup.
Retries now use the X-RateLimit-Reset header when present to determine the retry delay.
Fixed an issue where a RuntimeError could be raised when iterating over the context._meta dictionary while creating spans or generating distributed traces.
Telemetry debug mode is now controlled by its own dedicated environment variable DD_INTERNAL_TELEMETRY_DEBUG_ENABLED instead of DD_TRACE_DEBUG. Setting DD_TRACE_DEBUG=true no longer enables telemetry debug mode. To enable telemetry debug mode, set DD_INTERNAL_TELEMETRY_DEBUG_ENABLED=true.
Fixed the _dd.p.ksr span tag formatting for very small sampling rates. Previously, rates below 0.001 could be output in scientific notation (e.g. 1e-06). The tag now always uses decimal notation with up to 6 decimal digits.
Fixed a JSONDecodeError when parsing tool call arguments from streamed Anthropic response message chunks.
Fixed a FileNotFoundError in prompt optimization where the system prompt template was stored as a .md file that was excluded from release wheels. The template is now embedded in a Python module to ensure it is always available at runtime.
Updated the DROPPED_VALUE_TEXT warning message to reference the actual 5MB size limit. The size limit itself has not changed; only the message text was updated from an incorrect 1MB reference to the correct 5MB value.
Fixed an issue where cache_creation_input_tokens and cache_read_input_tokens were not captured when using the LiteLLM integration with providers that support prompt caching (e.g., Anthropic, OpenAI, Deepseek).
Fixed an issue where the @llm decorator raised a LLMObsAnnotateSpanError exception when a decorated function returned a value that could not be parsed as LLM messages. Note that manual annotation still overrides this automatic annotation.
Fixed an issue where the @llm decorator did not automatically annotate the return value as output in traces. The decorator now captures the return value and annotates it as output, consistent with the @workflow and @task decorators. Manual annotations via LLMObs.annotate() still take precedence.
Fixed an issue where Pydantic models were serialized as repr() strings instead of JSON. Pydantic v2 and v1 models are now properly serialized using model_dump() and .dict() respectively.
Fixed an issue where manual LLMObs spans (e.g. LLMObs.workflow()) and OTel-bridged spans (e.g. from Strands Agents with DD_TRACE_OTEL_ENABLED=1) produced separate LLMObs traces instead of a single unified trace.
Payload and event size limits are now configurable via the DD_LLMOBS_PAYLOAD_SIZE_BYTES and DD_LLMOBS_EVENT_SIZE_BYTES environment variables respectively. These default to 5242880 (5 MiB) and 5000000 (5 MB), matching the previous hardcoded values.
Fixed an issue involving tool_search_tool_regex.
Fixed an issue where input_tokens was captured from the initial message_start chunk instead of the final message_delta chunk, which contains the accurate cumulative input token count.
Fixed an issue where tool call items (e.g. ResponseFunctionToolCall) in the input list of the OpenAI Responses API were silently dropped from LLM Observability traces. Previously, the input parser used dict-only access patterns that failed for SDK objects; it now uses attribute-safe access that handles both plain dicts and SDK objects.
Fixed an issue where model_provider was set to "unknown" when a custom base URL was configured that did not match a recognized provider in the OpenAI, Anthropic, and LiteLLM integrations.
Fixed an issue where workers were stopped with SIGTERM instead of honoring --graceful-timeout. #16424
Fixed an AttributeError crash that occurred when the lock profiler or stack profiler encountered _DummyThread instances. _DummyThread lacks the _native_id attribute, so accessing native_id raises AttributeError. The profiler now falls back to using the thread identifier when native_id is unavailable.
The lock profiler now records a lock acquire event only when the acquire call was successful.
gevent.wait called with the objects keyword argument (e.g. gevent.wait(objects=[g1, g2])) now correctly links the greenlets to their parent task. Additionally, greenlets joined via gevent.joinall or gevent.wait from a user-level greenlet are now attributed to that greenlet instead of always being attributed to the Hub.
Fixed an issue where an unlimited stack size (ulimit -s unlimited) on Linux caused the stack profiler sampling thread to fail to start, resulting in empty CPU and wall-time profiles. #17132
A KeyError that could occur when using gevent.Timeout has been fixed.
Fixed reporting of the template parameter value for all Flask versions.
An issue that prevented periodic tasks in ddtrace (like the profile uploader) from triggering in fork-heavy applications has been fixed.
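The sampling-rate formatting fix above is easy to see by comparing Python's default float formatting with a fixed-point format string (format_ksr is a hypothetical helper name, not the tracer's internal function):

```python
def format_ksr(rate):
    # Fixed-point notation with up to 6 decimal digits, never scientific.
    return f"{rate:.6f}"

# Default float formatting switches to scientific notation for small rates:
default_repr = str(1e-06)       # "1e-06"
fixed_repr = format_ksr(1e-06)  # "0.000001"
```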
ASM
This introduces AI Guard support for the AWS Strands Agents SDK. Two entry points are provided for evaluating prompts, model responses, and tool calls against Datadog AI Guard security policies at four agent lifecycle points (BeforeModelCallEvent, AfterModelCallEvent, BeforeToolCallEvent, AfterToolCallEvent).
Plugin API (recommended, requires strands-agents >= 1.29.0):
from strands import Agent
from ddtrace.appsec.ai_guard import AIGuardStrandsPlugin
agent = Agent(model=model, plugins=[AIGuardStrandsPlugin()])
HookProvider API (legacy):
from strands import Agent
from ddtrace.appsec.ai_guard import AIGuardStrandsHookProvider
agent = Agent(model=model, hooks=[AIGuardStrandsHookProvider()])
The strands-agents package is optional. When it is not installed, both classes are replaced by no-op stubs that log a warning. The Plugin API requires strands-agents>=1.29.0; the HookProvider works with any version that exposes the hooks system.
azure_durable_functions
profiling
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.runtime metrics
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.remote configuration
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.dynamic instrumentation
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.crashtracking
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.data streams monitoring
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.database monitoring
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.Stats computation
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.LLM Observability
Adds support for capturing stop_reason and structured_output from the Claude Agent SDK integration.
Adds support for user-defined dataset record IDs. Users can now supply an optional id field when creating dataset records via Dataset.append(), Dataset.extend(), create_dataset(), or create_dataset_from_csv() (via the new id_column parameter). If no id is provided, the SDK generates one automatically.
Experiment tasks can now optionally receive dataset record metadata as a third metadata parameter. Tasks with the existing (input_data, config) signature continue to work unchanged.
This introduces RemoteEvaluator which allows users to reference LLM-as-Judge evaluations configured in the Datadog UI by name when running local experiments. For more information, see the documentation: https://docs.datadoghq.com/llm_observability/guide/evaluation_developer_guide/#using-managed-evaluators
This adds cache creation breakdown metrics for the Anthropic integration. When making Anthropic calls with prompt caching, ephemeral_5m_input_tokens and ephemeral_1h_input_tokens metrics are now reported, distinguishing between 5 minute and 1 hour prompt caches.
Adds support for reasoning and extended thinking content in Anthropic, LiteLLM, and OpenAI-compatible integrations. Anthropic thinking blocks (type: "thinking") are now captured as role: "reasoning" messages in both streaming and non-streaming responses, as well as in input messages for tool use continuations. LiteLLM now extracts reasoning_output_tokens from completion_tokens_details and captures reasoning_content in output messages for OpenAI-compatible providers.
LLMJudge now forwards any extra client_options to the underlying provider client constructor. This allows passing provider-specific options such as base_url, timeout, organization, or max_retries directly through client_options.
Dataset records' tags can now be operated on with 3 new Dataset methods: `dataset.add_tags<span class="title-ref">, </span>dataset.remove_tags<span class="title-ref">, and </span>dataset.replace_tags<span class="title-ref">. All 3 new methods accepts an int indicating the zero based index of the record to operate on, and a list of strings in the format of key:values representing the tags. For example, if the tag "env:prod" exists on the 1st record of the dataset </span><span class="title-ref">ds</span><span class="title-ref">, calling </span><span class="title-ref">ds.remove_tags(0, ["env:prod"]</span>` will update the local state of the dataset record to have the "env:prod" tag removed.
Change experiment execution to run evaluators immediately after each record's task completes instead of batching all tasks first. Experiment spans and evaluation metrics are now posted incrementally as records complete rather than waiting until the end. This improves progress visibility and preserves partial results if a run fails midway.
Adds support for Pydantic AI evaluations in LLM Observability Experiments by allowing users to pass a pydantic evaluation (which inherents from Evaluator) in an LLM Obs Experiment.
Example:
from pydantic_evals.evaluators import EqualsExpected
from ddtrace.llmobs import LLMObs
dataset = LLMObs.create_dataset( dataset_name="<DATASET_NAME>", description="<DATASET_DESCRIPTION>", records=[RECORD_1, RECORD_2, RECORD_3, ...]
)
def my_task(input_data, config): return input_data["output"]
def my_summary_evaluator(inputs, outputs, expected_outputs, evaluators_results): return evaluators_results["Correctness"].count(True)
equals_expected = EqualsExpected()
experiment = LLMObs.experiment( name="<EXPERIMENT_NAME>", task=my_task, dataset=dataset, evaluators=[equals_expected], summary_evaluators=[my_summary_evaluator], # optional, used to summarize the experiment results description="<EXPERIMENT_DESCRIPTION>."
)
result = experiment.run()
tracer
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.

aiohttp
/status/200) when aiohttp.ClientSession was initialized with a base_url. The span now records the fully-resolved URL (e.g. http://host:port/status/200), matching aiohttp's internal behaviour.

pymongo
DD_TRACE_MONGODB_OBFUSCATION to allow the mongodb.query to be obfuscated or not. Resource names always remain normalized regardless of the value. To preserve raw mongodb.query values, pair with DD_APM_OBFUSCATION_MONGODB_ENABLED=false on the Datadog Agent. See Datadog trace obfuscation docs: Trace obfuscation.

google_cloud_pubsub
google-cloud-pubsub library. Instruments PublisherClient.publish() and SubscriberClient.subscribe() to generate spans for message publishing and consuming, with optional distributed trace context propagation via message attributes. Use DD_GOOGLE_CLOUD_PUBSUB_PROPAGATION_ENABLED to control context propagation (default: True) and DD_GOOGLE_CLOUD_PUBSUB_PROPAGATION_AS_SPAN_LINKS to attach propagated context as span links instead of re-parenting subscriber spans under the producer trace (default: False).

multipart/form-data bodies, which could allow an attacker to bypass WAF inspection by hiding a malicious value among safe ones.

scope["path"] instead of scope["raw_path"] for WAF URI evaluation. In rare cases where the URI contained path traversal sequences, these could be resolved before reaching the WAF, potentially affecting a small number of URI-based detection rules on ASGI-based frameworks like FastAPI and Starlette.

UNVALIDATED_REDIRECT vulnerabilities could be reported for a single redirect() call.

GraphInterrupt exceptions were incorrectly marked as errors in APM traces. GraphInterrupt is a control-flow exception used in LangGraph's human-in-the-loop workflows and should not be treated as an error condition.

anyio.ClosedResourceError raised during MCP server session teardown when the ddtrace MCP integration is enabled.

pytest-rerunfailures and flaky were silently overridden by the ddtrace plugin. With this change, external rerun plugins will now drive retries as expected when Auto Test Retries and Early Flake Detection features are both disabled, otherwise our retry mechanism takes precedence and a warning is emitted.

RuntimeError that occurred when the git binary was not available. Git metadata upload is now skipped gracefully with a warning instead of aborting pytest startup.

X-RateLimit-Reset header when present to determine the retry delay.

RuntimeError could be raised when iterating over the context._meta dictionary while creating spans or generating distributed traces.

DD_TRACE_DEBUG instead of its own dedicated environment variable DD_INTERNAL_TELEMETRY_DEBUG_ENABLED. Setting DD_TRACE_DEBUG=true no longer enables telemetry debug mode. To enable telemetry debug mode, set DD_INTERNAL_TELEMETRY_DEBUG_ENABLED=true.

_dd.p.ksr span tag formatting for very small sampling rates. Previously, rates below 0.001 could be output in scientific notation (e.g. 1e-06). Now always uses decimal notation with up to 6 decimal digits.

JSONDecodeError when parsing tool call arguments from streamed Anthropic response message chunks.

FileNotFoundError in prompt optimization where the system prompt template was stored as a .md file that was excluded from release wheels. The template is now embedded in a Python module to ensure it is always available at runtime.

DROPPED_VALUE_TEXT warning message to reference the actual 5MB size limit. The size limit itself has not changed; only the message text was updated from an incorrect 1MB reference to the correct 5MB value.

cache_creation_input_tokens and cache_read_input_tokens were not captured when using the LiteLLM integration with providers that support prompt caching (e.g., Anthropic, OpenAI, Deepseek).

@llm decorator raised a LLMObsAnnotateSpanError exception when a decorated function returned a value that could not be parsed as LLM messages. Note that manual annotation still overrides this automatic annotation.

@llm decorator did not automatically annotate the return value as output in traces. The decorator now captures the return value and annotates it as output, consistent with @workflow and @task decorators. Manual annotations via LLMObs.annotate() still take precedence.

repr() strings instead of JSON. Pydantic v1 and v2 models are now properly serialized using model_dump() or .dict() respectively.

LLMObs.workflow()) and OTel-bridged spans (e.g. from Strands Agents with DD_TRACE_OTEL_ENABLED=1) produced separate LLMObs traces instead of a single unified trace.

DD_LLMOBS_PAYLOAD_SIZE_BYTES and DD_LLMOBS_EVENT_SIZE_BYTES environment variables respectively. These default to 5242880 (5 MiB) and 5000000 (5 MB), matching the previous hardcoded values.

tool_search_tool_regex.

input_tokens from the initial message_start chunk instead of the final message_delta chunk, which contains the accurate cumulative input token count.

ResponseFunctionToolCall) in the input list of the OpenAI Responses API were silently dropped from LLM Observability traces. Previously, the input parser used dict-only access patterns that failed for SDK objects; it now uses attribute-safe access that handles both plain dicts and SDK objects.

model_provider to "unknown" when a custom base URL is configured that does not match a recognized provider in the OpenAI, Anthropic, and LiteLLM integrations.

SIGTERM instead of honoring --graceful-timeout. #16424

AttributeError crash that occurred when the lock profiler or stack profiler encountered _DummyThread instances. _DummyThread lacks the _native_id attribute, so accessing native_id raises AttributeError. The profiler now falls back to using the thread identifier when native_id is unavailable.

acquire call was successful.

gevent.wait called with the objects keyword argument (e.g. gevent.wait(objects=[g1, g2])) now correctly links the greenlets to their parent task. Additionally, greenlets joined via gevent.joinall or gevent.wait from a user-level greenlet are now attributed to that greenlet instead of always being attributed to the Hub.

ulimit -s unlimited) on Linux caused the stack profiler sampling thread to fail to start, resulting in empty CPU and wall-time profiles. #17132

KeyError that could occur when using gevent.Timeout has been fixed.

template parameter value for all Flask versions.

ddtrace (like the profile uploader) from triggering in fork-heavy applications has been fixed.

Estimated end-of-life date, accurate to within three months: 05-2027 See the support level definitions for more information.
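The decimal-notation behavior described for the _dd.p.ksr sampling-rate tag can be sketched with a small helper (hypothetical, not ddtrace's actual implementation):

```python
def format_sampling_rate(rate: float) -> str:
    # Render with at most 6 decimal digits, then trim trailing zeros,
    # so e.g. 1e-06 becomes "0.000001" rather than scientific notation.
    text = f"{rate:.6f}".rstrip("0").rstrip(".")
    return text or "0"

print(format_sampling_rate(1e-06))  # 0.000001
print(format_sampling_rate(0.5))    # 0.5
```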
ASM
ddtrace.appsec.ai_guard.integrations.litellm.DatadogAIGuardGuardrail class can be registered as a custom guardrail in the LiteLLM proxy to evaluate requests and responses against AI Guard security policies. Requires the LiteLLM proxy guardrails API v2 available since litellm>=1.46.1.

azure_cosmos
CI Visibility
DD_AGENTLESS_LOG_SUBMISSION_ENABLED=true for agentless setups, or DD_LOGS_INJECTION=true when using the Datadog Agent.

tracing
OTEL_TRACES_EXPORTER=otlp to send spans to an OTLP endpoint instead of the Datadog Agent.

LLM Observability
decorator tag to LLM Observability spans that are traced by a function decorator.

pydantic_evals ReportEvaluator as a summary evaluator when its evaluate return annotation is exactly ScalarResult. The scalar value is recorded as the summary evaluation. Report evaluators that declare a broader analysis return type (for example the full ReportAnalysis union) are not accepted as summary evaluators; use a class-based or function summary evaluator instead. Examples and further documentation can be found in our documentation.

Example:
from pydantic_evals.evaluators import EqualsExpected
from pydantic_evals.evaluators import ReportEvaluator
from pydantic_evals.evaluators import ReportEvaluatorContext
from pydantic_evals.reporting.analyses import ScalarResult
from ddtrace.llmobs import LLMObs

dataset = LLMObs.create_dataset(
    dataset_name="<DATASET_NAME>",
    description="<DATASET_DESCRIPTION>",
    records=[RECORD_1, RECORD_2, RECORD_3, ...],
)

class TotalCasesEvaluator(ReportEvaluator):
    def evaluate(self, ctx: ReportEvaluatorContext) -> ScalarResult:
        return ScalarResult(
            title='Total Cases',
            value=len(ctx.report.cases),
            unit='cases',
        )

def my_task(input_data, config):
    return input_data["output"]

equals_expected = EqualsExpected()
summary_evaluator = TotalCasesEvaluator()

experiment = LLMObs.experiment(
    name="<EXPERIMENT_NAME>",
    task=my_task,
    dataset=dataset,
    evaluators=[equals_expected],
    summary_evaluators=[summary_evaluator],
    description="<EXPERIMENT_DESCRIPTION>.",
)
result = experiment.run()
ModuleNotFoundError could be raised at startup in Python environments without the _ctypes extension module.

invoke_agent) were incorrectly appearing as siblings of their SDK parent span (e.g. call_agent) rather than being nested under it.

model_name and model_provider reported on AWS Bedrock LLM spans as the model_id full model identifier value (e.g., "amazon.nova-lite-v1:0") and "amazon_bedrock" respectively. Bedrock spans' model_name and model_provider now correctly match backend pricing data, which enables features including cost tracking.

defer_loading=True) in Anthropic and OpenAI integrations caused LLMObs span payloads to include full tool descriptions and JSON schemas for every tool in a large catalog. Deferred tool definitions now have their description and schema stripped from span metadata, with only the tool name preserved.

Estimated end-of-life date, accurate to within three months: 06-2027 See the support level definitions for more information.
openfeature-sdk is now 0.8.0 (previously 0.6.0). This is required for the finally_after hook to receive evaluation details for metrics tracking.

Reason.ERROR with ErrorCode.FLAG_NOT_FOUND instead of Reason.DEFAULT when configuration is available but the flag is not found. The previous behavior (Reason.DEFAULT) is preserved when no configuration is loaded. This aligns Python with other Datadog SDK implementations.

mlflow
DD_API_KEY, DD_APP_KEY and DD_MODEL_LAB_ENABLED are set, HTTP requests to the MLflow tracking server will include the DD-API-KEY and DD-APPLICATION-KEY headers. #16685

ai_guard
block=False, which now defaults to block=True.

ASM
This introduces AI Guard support for the AWS Strands Agents SDK. Two entry points are provided for evaluating prompts, model responses, and tool calls against Datadog AI Guard security policies at four agent lifecycle points (BeforeModelCallEvent, AfterModelCallEvent, BeforeToolCallEvent, AfterToolCallEvent).
Plugin API (recommended, requires strands-agents >= 1.29.0):
from strands import Agent
from ddtrace.appsec.ai_guard import AIGuardStrandsPlugin
agent = Agent(model=model, plugins=[AIGuardStrandsPlugin()])
HookProvider API (legacy):
from strands import Agent
from ddtrace.appsec.ai_guard import AIGuardStrandsHookProvider
agent = Agent(model=model, hooks=[AIGuardStrandsHookProvider()])
The strands-agents package is optional. When it is not installed, both classes are replaced by no-op stubs that log a warning. The Plugin API requires strands-agents>=1.29.0; the HookProvider works with any version that exposes the hooks system.
azure_durable_functions
profiling
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.

runtime metrics
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.

remote configuration
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.

dynamic instrumentation
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.

crashtracking
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.

data streams monitoring
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.

database monitoring
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.

Stats computation
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.

LLM Observability
Adds support for capturing stop_reason and structured_output from the Claude Agent SDK integration.
Adds support for user-defined dataset record IDs. Users can now supply an optional id field when creating dataset records via Dataset.append(), Dataset.extend(), create_dataset(), or create_dataset_from_csv() (via the new id_column parameter). If no id is provided, the SDK generates one automatically.
Experiment tasks can now optionally receive dataset record metadata as a third metadata parameter. Tasks with the existing (input_data, config) signature continue to work unchanged.
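One way a runner can support both task signatures is to detect arity via inspect; this sketch is illustrative only and not ddtrace's actual dispatch code (run_task, old_task, and new_task are hypothetical names):

```python
import inspect

def run_task(task, input_data, config, metadata):
    # Pass record metadata only to tasks that declare a third parameter,
    # so existing (input_data, config) tasks keep working unchanged.
    params = inspect.signature(task).parameters
    if len(params) >= 3:
        return task(input_data, config, metadata)
    return task(input_data, config)

def old_task(input_data, config):
    return input_data["x"]

def new_task(input_data, config, metadata):
    return (input_data["x"], metadata["source"])
```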
This introduces RemoteEvaluator which allows users to reference LLM-as-Judge evaluations configured in the Datadog UI by name when running local experiments. For more information, see the documentation: https://docs.datadoghq.com/llm_observability/guide/evaluation_developer_guide/#using-managed-evaluators
This adds cache creation breakdown metrics for the Anthropic integration. When making Anthropic calls with prompt caching, ephemeral_5m_input_tokens and ephemeral_1h_input_tokens metrics are now reported, distinguishing between 5 minute and 1 hour prompt caches.
Adds support for reasoning and extended thinking content in Anthropic, LiteLLM, and OpenAI-compatible integrations. Anthropic thinking blocks (type: "thinking") are now captured as role: "reasoning" messages in both streaming and non-streaming responses, as well as in input messages for tool use continuations. LiteLLM now extracts reasoning_output_tokens from completion_tokens_details and captures reasoning_content in output messages for OpenAI-compatible providers.
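The reasoning-message capture can be sketched as a simple mapping from Anthropic-style content blocks to role-tagged messages (a hypothetical converter; the block field names follow Anthropic's documented shapes, but the function itself is illustrative, not the integration's real code):

```python
def blocks_to_messages(content_blocks):
    # Map Anthropic content blocks to role-tagged messages:
    # "thinking" blocks become role "reasoning", text blocks stay "assistant".
    messages = []
    for block in content_blocks:
        if block.get("type") == "thinking":
            messages.append({"role": "reasoning", "content": block.get("thinking", "")})
        elif block.get("type") == "text":
            messages.append({"role": "assistant", "content": block.get("text", "")})
    return messages

blocks = [
    {"type": "thinking", "thinking": "Compare both options first."},
    {"type": "text", "text": "Option A is better."},
]
```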
CI Visibility: Fixes an issue where HTTP 429 (Too Many Requests) responses from the Datadog backend were treated as non-retriable errors, causing CI visibility data to be dropped when the backend applied rate limiting. The backend connector now retries on 429 responses and respects the X-RateLimit-Reset header when present to determine the retry delay.
internal: A bug preventing certain periodic threads of ddtrace (like the profile uploader) from triggering in fork-heavy applications has been fixed.
profiling: Fixes an issue where setting an unlimited stack size (ulimit -s unlimited) on Linux caused the stack profiler sampling thread to fail to start, resulting in empty CPU and wall-time profiles. #17132
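The 429 retry-delay logic can be sketched as follows (a hypothetical helper, not the actual backend connector; it assumes X-RateLimit-Reset carries seconds until the limit resets, falling back to a default delay when the header is absent or malformed):

```python
def retry_delay(status_code, headers, default_delay=1.0):
    # Retry 429 responses, preferring the server-provided reset interval.
    if status_code != 429:
        return None  # not handled by this rate-limit path
    reset = headers.get("X-RateLimit-Reset")
    try:
        return max(float(reset), 0.0)
    except (TypeError, ValueError):
        # Header missing or unparsable: use the default backoff.
        return default_delay
```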
Estimated end-of-life date, accurate to within three months: 05-2027 See the support level definitions for more information.
CI Visibility: Fixes an issue where HTTP 429 (Too Many Requests) responses from the Datadog backend were treated as non-retriable errors, causing CI visibility data to be dropped when the backend applied rate limiting. The backend connector now retries on 429 responses and respects the X-RateLimit-Reset header when present to determine the retry delay.
internal: A bug preventing certain periodic threads of ddtrace (like the profile uploader) from triggering in fork-heavy applications has been fixed.
profiling: Fixes an issue where setting an unlimited stack size (ulimit -s unlimited) on Linux caused the stack profiler sampling thread to fail to start, resulting in empty CPU and wall-time profiles. #17132
Estimated end-of-life date, accurate to within three months: 05-2027 See the support level definitions for more information.
openfeature-sdk is now 0.8.0 (previously 0.6.0). This is required for the finally_after hook to receive evaluation details for metrics tracking.Reason.ERROR with ErrorCode.FLAG_NOT_FOUND instead of Reason.DEFAULT when configuration is available but the flag is not found. The previous behavior (Reason.DEFAULT) is preserved when no configuration is loaded. This aligns Python with other Datadog SDK implementations.mlflow
DD_API_KEY, DD_APP_KEY and DD_MODEL_LAB_ENABLED are set, HTTP requests to the MLFlow tracking server will include the DD-API-KEY and DD-APPLICATION-KEY headers. #16685ai_guard
block=False, which now defaults to block=True.ASM
This introduces AI Guard support for the AWS Strands Agents SDK. Two entry points are provided for evaluating prompts, model responses, and tool calls against Datadog AI Guard security policies at four agent lifecycle points (BeforeModelCallEvent, AfterModelCallEvent, BeforeToolCallEvent, AfterToolCallEvent).
Plugin API (recommended, requires strands-agents >= 1.29.0):
from strands import Agent
from ddtrace.appsec.ai_guard import AIGuardStrandsPlugin
agent = Agent(model=model, plugins=[AIGuardStrandsPlugin()])
HookProvider API (legacy):
from strands import Agent
from ddtrace.appsec.ai_guard import AIGuardStrandsHookProvider
agent = Agent(model=model, hooks=[AIGuardStrandsHookProvider()])
The strands-agents package is optional. When it is not installed, both classes are replaced by no-op stubs that log a warning. The Plugin API requires strands-agents>=1.29.0; the HookProvider works with any version that exposes the hooks system.
azure_durable_functions
profiling
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.runtime metrics
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.remote configuration
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.dynamic instrumentation
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.crashtracking
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.data streams monitoring
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.database monitoring
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.Stats computation
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.LLM Observability
Adds support for capturing stop_reason and structured_output from the Claude Agent SDK integration.
Adds support for user-defined dataset record IDs. Users can now supply an optional id field when creating dataset records via Dataset.append(), Dataset.extend(), create_dataset(), or create_dataset_from_csv() (via the new id_column parameter). If no id is provided, the SDK generates one automatically.
Experiment tasks can now optionally receive dataset record metadata as a third metadata parameter. Tasks with the existing (input_data, config) signature continue to work unchanged.
This introduces RemoteEvaluator which allows users to reference LLM-as-Judge evaluations configured in the Datadog UI by name when running local experiments. For more information, see the documentation: https://docs.datadoghq.com/llm_observability/guide/evaluation_developer_guide/#using-managed-evaluators
This adds cache creation breakdown metrics for the Anthropic integration. When making Anthropic calls with prompt caching, ephemeral_5m_input_tokens and ephemeral_1h_input_tokens metrics are now reported, distinguishing between 5 minute and 1 hour prompt caches.
Adds support for reasoning and extended thinking content in Anthropic, LiteLLM, and OpenAI-compatible integrations. Anthropic thinking blocks (type: "thinking") are now captured as role: "reasoning" messages in both streaming and non-streaming responses, as well as in input messages for tool use continuations. LiteLLM now extracts reasoning_output_tokens from completion_tokens_details and captures reasoning_content in output messages for OpenAI-compatible providers.
LLMJudge now forwards any extra client_options to the underlying provider client constructor. This allows passing provider-specific options such as base_url, timeout, organization, or max_retries directly through client_options.
Dataset records' tags can now be operated on with 3 new Dataset methods: `dataset.add_tags<span class="title-ref">, </span>dataset.remove_tags<span class="title-ref">, and </span>dataset.replace_tags<span class="title-ref">. All 3 new methods accepts an int indicating the zero based index of the record to operate on, and a list of strings in the format of key:values representing the tags. For example, if the tag "env:prod" exists on the 1st record of the dataset </span><span class="title-ref">ds</span><span class="title-ref">, calling </span><span class="title-ref">ds.remove_tags(0, ["env:prod"]</span>` will update the local state of the dataset record to have the "env:prod" tag removed.
Change experiment execution to run evaluators immediately after each record's task completes instead of batching all tasks first. Experiment spans and evaluation metrics are now posted incrementally as records complete rather than waiting until the end. This improves progress visibility and preserves partial results if a run fails midway.
Adds support for Pydantic AI evaluations in LLM Observability Experiments by allowing users to pass a pydantic evaluation (which inherents from Evaluator) in an LLM Obs Experiment.
Example:
from pydantic_evals.evaluators import EqualsExpected
from ddtrace.llmobs import LLMObs
dataset = LLMObs.create_dataset( dataset_name="<DATASET_NAME>", description="<DATASET_DESCRIPTION>", records=[RECORD_1, RECORD_2, RECORD_3, ...]
)
def my_task(input_data, config): return input_data["output"]
def my_summary_evaluator(inputs, outputs, expected_outputs, evaluators_results): return evaluators_results["Correctness"].count(True)
equals_expected = EqualsExpected()
experiment = LLMObs.experiment( name="<EXPERIMENT_NAME>", task=my_task, dataset=dataset, evaluators=[equals_expected], summary_evaluators=[my_summary_evaluator], # optional, used to summarize the experiment results description="<EXPERIMENT_DESCRIPTION>."
)
result = experiment.run()
tracer
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.aiohttp
/status/200) when aiohttp.ClientSession was initialized with a base_url. The span now records the fully-resolved URL (e.g. http://host:port/status/200), matching aiohttp's internal behaviour.pymongo
DD_TRACE_MONGODB_OBFUSCATION to allow the mongodb.query to be obfuscated or not. Resource names always remain normalized regardless of the value. To preserve raw mongodb.query values, pair with DD_APM_OBFUSCATION_MONGODB_ENABLED=false on the Datadog Agent. See Datadog trace obfuscation docs: Trace obfuscation.google_cloud_pubsub
google-cloud-pubsub library. Instruments PublisherClient.publish() and SubscriberClient.subscribe() to generate spans for message publishing and consuming, with optional distributed trace context propagation via message attributes. Use DD_GOOGLE_CLOUD_PUBSUB_PROPAGATION_ENABLED to control context propagation (default: True) and DD_GOOGLE_CLOUD_PUBSUB_PROPAGATION_AS_SPAN_LINKS to attach propagated context as span links instead of re-parenting subscriber spans under the producer trace (default: False).
multipart/form-data bodies, which could allow an attacker to bypass WAF inspection by hiding a malicious value among safe ones.
UNVALIDATED_REDIRECT vulnerabilities could be reported for a single redirect() call.
GraphInterrupt exceptions were incorrectly marked as errors in APM traces. GraphInterrupt is a control-flow exception used in LangGraph's human-in-the-loop workflows and should not be treated as an error condition.
anyio.ClosedResourceError raised during MCP server session teardown when the ddtrace MCP integration is enabled.
pytest-rerunfailures and flaky were silently overridden by the ddtrace plugin. With this change, external rerun plugins now drive retries as expected when the Auto Test Retries and Early Flake Detection features are both disabled; otherwise, our retry mechanism takes precedence and a warning is emitted.
RuntimeError that occurred when the git binary was not available. Git metadata upload is now skipped gracefully with a warning instead of aborting pytest startup.
X-RateLimit-Reset header when present to determine the retry delay.
RuntimeError could be raised when iterating over the context._meta dictionary while creating spans or generating distributed traces.
DD_TRACE_DEBUG instead of its own dedicated environment variable DD_INTERNAL_TELEMETRY_DEBUG_ENABLED. Setting DD_TRACE_DEBUG=true no longer enables telemetry debug mode. To enable telemetry debug mode, set DD_INTERNAL_TELEMETRY_DEBUG_ENABLED=true.
_dd.p.ksr span tag formatting for very small sampling rates. Previously, rates below 0.001 could be output in scientific notation (e.g. 1e-06). The tag now always uses decimal notation with up to six decimal digits.
cache_creation_input_tokens and cache_read_input_tokens were not captured when using the LiteLLM integration with providers that support prompt caching (e.g., Anthropic, OpenAI, Deepseek).
@llm decorator raised an LLMObsAnnotateSpanError exception when a decorated function returned a value that could not be parsed as LLM messages. Note that manual annotation still overrides this automatic annotation.
@llm decorator did not automatically annotate the return value as output in traces. The decorator now captures the return value and annotates it as output, consistent with the @workflow and @task decorators. Manual annotations via LLMObs.annotate() still take precedence.
repr() strings instead of JSON. Pydantic v1 and v2 models are now properly serialized using model_dump() or .dict(), respectively.
LLMObs.workflow()) and OTel-bridged spans (e.g. from Strands Agents with DD_TRACE_OTEL_ENABLED=1) produced separate LLMObs traces instead of a single unified trace.
DD_LLMOBS_PAYLOAD_SIZE_BYTES and DD_LLMOBS_EVENT_SIZE_BYTES environment variables, respectively. These default to 5242880 (5 MiB) and 5000000 (5 MB), matching the previous hardcoded values.
tool_search_tool_regex.
input_tokens from the initial message_start chunk instead of the final message_delta chunk, which contains the accurate cumulative input token count.
ResponseFunctionToolCall) in the input list of the OpenAI Responses API were silently dropped from LLM Observability traces. Previously, the input parser used dict-only access patterns that failed for SDK objects; it now uses attribute-safe access that handles both plain dicts and SDK objects.
model_provider to "unknown" when a custom base URL is configured that does not match a recognized provider in the OpenAI, Anthropic, and LiteLLM integrations.
SIGTERM instead of honoring --graceful-timeout. #16424
AttributeError crash that occurred when the lock profiler or stack profiler encountered _DummyThread instances. _DummyThread lacks the _native_id attribute, so accessing native_id raises AttributeError. The profiler now falls back to using the thread identifier when native_id is unavailable.
acquire call was successful.
gevent.wait called with the objects keyword argument (e.g. gevent.wait(objects=[g1, g2])) now correctly links the greenlets to their parent task. Additionally, greenlets joined via gevent.joinall or gevent.wait from a user-level greenlet are now attributed to that greenlet instead of always being attributed to the Hub.
ulimit -s unlimited) on Linux caused the stack profiler sampling thread to fail to start, resulting in empty CPU and wall-time profiles. #17132
KeyError that could occur when using gevent.Timeout has been fixed.
Estimated end-of-life date, accurate to within three months: 05-2027 See the support level definitions for more information.
mlflow
When DD_API_KEY, DD_APP_KEY, and DD_MODEL_LAB_ENABLED are set, HTTP requests to the MLFlow tracking server will include the DD-API-KEY and DD-APPLICATION-KEY headers. #16685
AI Guard
block=False, which now defaults to block=True.
strands-agents>=1.29.0; the HookProvider works with any version that exposes the hooks system.
azure_durable_functions
profiling
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
runtime metrics
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
remote configuration
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
dynamic instrumentation
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
crashtracking
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
data streams monitoring
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
database monitoring
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
Stats computation
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
LLM Observability
Experiment tasks can now optionally receive dataset record metadata as a third metadata parameter. Tasks with the existing (input_data, config) signature continue to work unchanged.
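As a sketch of the new signature (the record and metadata shapes below are illustrative assumptions, not the real API):

```python
# Sketch: an experiment task using the new optional third `metadata`
# parameter. The input_data and metadata shapes are assumed for illustration.
def summarize_task(input_data, config, metadata):
    # metadata carries the dataset record's metadata, e.g. its provenance
    source = metadata.get("source", "unknown")
    return {"summary": input_data["text"][:40], "source": source}

# Existing two-argument tasks keep working unchanged:
def legacy_task(input_data, config):
    return {"summary": input_data["text"][:40]}
```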
This introduces RemoteEvaluator, which allows users to reference LLM-as-Judge evaluations configured in the Datadog UI by name when running local experiments. For more information, see the documentation: https://docs.datadoghq.com/llm_observability/guide/evaluation_developer_guide/#using-managed-evaluators
This adds cache creation breakdown metrics for the Anthropic integration. When making Anthropic calls with prompt caching, ephemeral_5m_input_tokens and ephemeral_1h_input_tokens metrics are now reported, distinguishing between 5-minute and 1-hour prompt caches.
Adds support for reasoning and extended thinking content in Anthropic, LiteLLM, and OpenAI-compatible integrations. Anthropic thinking blocks (type: "thinking") are now captured as role: "reasoning" messages in both streaming and non-streaming responses, as well as in input messages for tool use continuations. LiteLLM now extracts reasoning_output_tokens from completion_tokens_details and captures reasoning_content in output messages for OpenAI-compatible providers.
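The mapping from thinking blocks to role-tagged messages can be sketched as follows (the content-block shapes are assumptions based on the Anthropic block types named above, not the integration's actual parser):

```python
def to_llmobs_messages(content_blocks):
    """Sketch: map Anthropic-style content blocks to role-tagged messages,
    surfacing type "thinking" blocks as role "reasoning" messages.
    Block dict shapes are illustrative assumptions."""
    messages = []
    for block in content_blocks:
        if block.get("type") == "thinking":
            messages.append({"role": "reasoning", "content": block.get("thinking", "")})
        elif block.get("type") == "text":
            messages.append({"role": "assistant", "content": block.get("text", "")})
    return messages
```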
LLMJudge now forwards any extra client_options to the underlying provider client constructor. This allows passing provider-specific options such as base_url, timeout, organization, or max_retries directly through client_options.
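The forwarding behavior amounts to passing the options dict through as constructor keywords; a minimal sketch with a stand-in client class (not the real provider client):

```python
class _StubProviderClient:
    """Stand-in for a provider client constructor; not a real ddtrace class."""
    def __init__(self, base_url=None, timeout=None, max_retries=2, **kwargs):
        self.base_url = base_url
        self.timeout = timeout
        self.max_retries = max_retries

def make_client(client_options):
    # Sketch of the forwarding: every entry in client_options is passed
    # straight through to the client constructor as a keyword argument.
    return _StubProviderClient(**client_options)
```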
Dataset records' tags can now be operated on with three new Dataset methods: dataset.add_tags, dataset.remove_tags, and dataset.replace_tags. Each method accepts an int giving the zero-based index of the record to operate on, and a list of strings in the key:value format representing the tags. For example, if the tag "env:prod" exists on the first record of the dataset ds, calling ds.remove_tags(0, ["env:prod"]) will update the local state of that record to have the "env:prod" tag removed.
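The described semantics can be sketched with a stand-in class (this is an illustration of the zero-based index and key:value tag format, not the real Dataset implementation):

```python
class _Dataset:
    """Stand-in illustrating the tag-method semantics; not the real class."""
    def __init__(self, records):
        self._records = records  # each record holds a list of "key:value" tags

    def add_tags(self, index, tags):
        # index is the zero-based position of the record
        current = self._records[index]["tags"]
        current.extend(t for t in tags if t not in current)

    def remove_tags(self, index, tags):
        self._records[index]["tags"] = [
            t for t in self._records[index]["tags"] if t not in tags
        ]

    def replace_tags(self, index, tags):
        self._records[index]["tags"] = list(tags)

ds = _Dataset([{"tags": ["env:prod", "team:ml"]}])
ds.remove_tags(0, ["env:prod"])  # local record keeps only "team:ml"
```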
Change experiment execution to run evaluators immediately after each record's task completes instead of batching all tasks first. Experiment spans and evaluation metrics are now posted incrementally as records complete rather than waiting until the end. This improves progress visibility and preserves partial results if a run fails midway.
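The new execution order can be sketched as a single loop that evaluates and posts per record (names and shapes here are illustrative, not the internal implementation):

```python
def run_incrementally(records, task, evaluators, post):
    # Sketch: evaluate and post each record as soon as its task finishes,
    # instead of running all tasks first and evaluating in a batch.
    # `post` stands in for the incremental span/metric submission step.
    results = []
    for record in records:
        output = task(record)
        scores = {name: fn(record, output) for name, fn in evaluators.items()}
        post({"output": output, "scores": scores})  # partial results survive a mid-run failure
        results.append(scores)
    return results
```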
Adds support for Pydantic AI evaluations in LLM Observability Experiments by allowing users to pass a pydantic_evals evaluator (which inherits from Evaluator) to an LLM Obs Experiment.
Example:

    from pydantic_evals.evaluators import EqualsExpected
    from ddtrace.llmobs import LLMObs

    dataset = LLMObs.create_dataset(
        dataset_name="<DATASET_NAME>",
        description="<DATASET_DESCRIPTION>",
        records=[RECORD_1, RECORD_2, RECORD_3, ...],
    )

    def my_task(input_data, config):
        return input_data["output"]

    def my_summary_evaluator(inputs, outputs, expected_outputs, evaluators_results):
        return evaluators_results["Correctness"].count(True)

    equals_expected = EqualsExpected()

    experiment = LLMObs.experiment(
        name="<EXPERIMENT_NAME>",
        task=my_task,
        dataset=dataset,
        evaluators=[equals_expected],
        # optional, used to summarize the experiment results
        summary_evaluators=[my_summary_evaluator],
        description="<EXPERIMENT_DESCRIPTION>",
    )

    result = experiment.run()
tracer
DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
pymongo
DD_TRACE_MONGODB_OBFUSCATION to allow the mongodb.query to be obfuscated or not. Resource names always remain normalized regardless of the value. To preserve raw mongodb.query values, pair with DD_APM_OBFUSCATION_MONGODB_ENABLED=false on the Datadog Agent. See the Datadog trace obfuscation docs: Trace obfuscation.
google_cloud_pubsub
google-cloud-pubsub library. Instruments PublisherClient.publish() and SubscriberClient.subscribe() to generate spans for message publishing and consuming, with optional distributed trace context propagation via message attributes. Use DD_GOOGLE_CLOUD_PUBSUB_PROPAGATION_ENABLED to control context propagation (default: True) and DD_GOOGLE_CLOUD_PUBSUB_PROPAGATION_AS_SPAN_LINKS to attach propagated context as span links instead of re-parenting subscriber spans under the producer trace (default: False).
multipart/form-data bodies, which could allow an attacker to bypass WAF inspection by hiding a malicious value among safe ones.
UNVALIDATED_REDIRECT vulnerabilities could be reported for a single redirect() call.
GraphInterrupt exceptions were incorrectly marked as errors in APM traces. GraphInterrupt is a control-flow exception used in LangGraph's human-in-the-loop workflows and should not be treated as an error condition.
anyio.ClosedResourceError raised during MCP server session teardown when the ddtrace MCP integration is enabled.
pytest-rerunfailures and flaky were silently overridden by the ddtrace plugin. With this change, external rerun plugins now drive retries as expected when the Auto Test Retries and Early Flake Detection features are both disabled; otherwise, our retry mechanism takes precedence and a warning is emitted.
RuntimeError that occurred when the git binary was not available. Git metadata upload is now skipped gracefully with a warning instead of aborting pytest startup.
RuntimeError could be raised when iterating over the context._meta dictionary while creating spans or generating distributed traces.
DD_TRACE_DEBUG instead of its own dedicated environment variable DD_INTERNAL_TELEMETRY_DEBUG_ENABLED. Setting DD_TRACE_DEBUG=true no longer enables telemetry debug mode. To enable telemetry debug mode, set DD_INTERNAL_TELEMETRY_DEBUG_ENABLED=true.
cache_creation_input_tokens and cache_read_input_tokens were not captured when using the LiteLLM integration with providers that support prompt caching (e.g., Anthropic, OpenAI, Deepseek).
@llm decorator raised an LLMObsAnnotateSpanError exception when a decorated function returned a value that could not be parsed as LLM messages. Note that manual annotation still overrides this automatic annotation.
@llm decorator did not automatically annotate the return value as output in traces. The decorator now captures the return value and annotates it as output, consistent with the @workflow and @task decorators. Manual annotations via LLMObs.annotate() still take precedence.
repr() strings instead of JSON. Pydantic v1 and v2 models are now properly serialized using model_dump() or .dict(), respectively.
LLMObs.workflow()) and OTel-bridged spans (e.g. from Strands Agents with DD_TRACE_OTEL_ENABLED=1) produced separate LLMObs traces instead of a single unified trace.
DD_LLMOBS_PAYLOAD_SIZE_BYTES and DD_LLMOBS_EVENT_SIZE_BYTES environment variables, respectively. These default to 5242880 (5 MiB) and 5000000 (5 MB), matching the previous hardcoded values.
tool_search_tool_regex.
input_tokens from the initial message_start chunk instead of the final message_delta chunk, which contains the accurate cumulative input token count.
SIGTERM instead of honoring --graceful-timeout. #16424
AttributeError crash that occurred when the lock profiler or stack profiler encountered _DummyThread instances. _DummyThread lacks the _native_id attribute, so accessing native_id raises AttributeError. The profiler now falls back to using the thread identifier when native_id is unavailable.
acquire call was successful.
gevent.wait called with the objects keyword argument (e.g. gevent.wait(objects=[g1, g2])) now correctly links the greenlets to their parent task. Additionally, greenlets joined via gevent.joinall or gevent.wait from a user-level greenlet are now attributed to that greenlet instead of always being attributed to the Hub.
Estimated end-of-life date, accurate to within three months: 05-2027 See the support level definitions for more information.
@llm decorator raised an LLMObsAnnotateSpanError exception when a decorated function returned a value that could not be parsed as LLM messages (e.g. a plain string, integer, or non-messages dict). The decorator now logs a debug message instead and continues. Manual annotations via LLMObs.annotate() still take precedence.
Estimated end-of-life date, accurate to within three months: 05-2027 See the support level definitions for more information.
DD_LLMOBS_PAYLOAD_SIZE_BYTES and DD_LLMOBS_EVENT_SIZE_BYTES environment variables, respectively. These default to 5242880 (5 MiB) and 5000000 (5 MB), matching the previous hardcoded values.
@llm decorator did not automatically annotate the return value as output in traces. The decorator now captures the return value and annotates it as output, consistent with the @workflow and @task decorators. Manual annotations via LLMObs.annotate() still take precedence.
repr() strings instead of JSON. Pydantic v1 and v2 models are now properly serialized using model_dump() or .dict(), respectively.