v4.7.0 — Datadog dd-trace-py

Estimated end-of-life date, accurate to within three months: 05-2027 See the support level definitions for more information.

Upgrade Notes

profiling
- This compiles the lock profiler's hot path to C via Cython, reducing per-operation overhead. At the default 1% capture rate, lock operations are ~49% faster for both contended and uncontended workloads. At 100% capture, gains are ~15-19%. No configuration changes are required.
openfeature
- The minimum required version of openfeature-sdk is now 0.8.0 (previously 0.6.0). This is required for the finally_after hook to receive evaluation details for metrics tracking.

API Changes

openfeature
- Flag evaluations for non-existent flags now return Reason.ERROR with ErrorCode.FLAG_NOT_FOUND instead of Reason.DEFAULT when configuration is available but the flag is not found. The previous behavior (Reason.DEFAULT) is preserved when no configuration is loaded. This aligns Python with other Datadog SDK implementations.

New Features

mlflow
- Adds a request header provider (auth plugin) for MLFlow. If the environment variables DD_API_KEY, DD_APP_KEY and DD_MODEL_LAB_ENABLED are set, HTTP requests to the MLFlow tracking server will include the DD-API-KEY and DD-APPLICATION-KEY headers. #16685
ai_guard
- Calls to evaluate now block if blocking was enabled for the service in the AI Guard UI. This behavior can be disabled by passing the parameter block=False, which now defaults to block=True.
- This updates the AI Guard API client to return Sensitive Data Scanner (SDS) results in the SDK response.
- This introduces AI Guard support for Strands Agents. The Plugin API requires strands-agents>=1.29.0; the HookProvider works with any version that exposes the hooks system.
azure_durable_functions
- Add tracing support for Azure Durable Functions. This integration traces durable activity and entity functions.
profiling
- This adds process tags to profiler payloads. To deactivate this feature, set DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
runtime metrics
- This adds process tags to runtime metrics tags. To deactivate this feature, set DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
remote configuration
- This adds process tags to remote configuration payloads. To deactivate this feature, set DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
dynamic instrumentation
- This adds process tags to debugger payloads. To deactivate this feature, set DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
crashtracking
- This adds process tags to crash tracking payloads. To deactivate this feature, set DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
data streams monitoring
- This adds process tags to Data Streams Monitoring payloads. To deactivate this feature, set DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
database monitoring
- This adds process tags to Database Monitoring SQL service hash propagation. To deactivate this feature, set DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
Stats computation
- This adds process tags to stats computation payloads. To deactivate this feature, set DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
LLM Observability
- Adds support for capturing stop_reason and structured_output from the Claude Agent SDK integration.
- Adds support for user-defined dataset record IDs. Users can now supply an optional id field when creating dataset records via Dataset.append(), Dataset.extend(), create_dataset(), or create_dataset_from_csv() (via the new id_column parameter). If no id is provided, the SDK generates one automatically.
- Experiment tasks can now optionally receive dataset record metadata as a third metadata parameter. Tasks with the existing (input_data, config) signature continue to work unchanged.
- This introduces RemoteEvaluator which allows users to reference LLM-as-Judge evaluations configured in the Datadog UI by name when running local experiments. For more information, see the documentation: https://docs.datadoghq.com/llm_observability/guide/evaluation_developer_guide/#using-managed-evaluators
- This adds cache creation breakdown metrics for the Anthropic integration. When making Anthropic calls with prompt caching, ephemeral_5m_input_tokens and ephemeral_1h_input_tokens metrics are now reported, distinguishing between 5 minute and 1 hour prompt caches.
- Adds support for reasoning and extended thinking content in Anthropic, LiteLLM, and OpenAI-compatible integrations. Anthropic thinking blocks (type: "thinking") are now captured as role: "reasoning" messages in both streaming and non-streaming responses, as well as in input messages for tool use continuations. LiteLLM now extracts reasoning_output_tokens from completion_tokens_details and captures reasoning_content in output messages for OpenAI-compatible providers.
- LLMJudge now forwards any extra client_options to the underlying provider client constructor. This allows passing provider-specific options such as base_url, timeout, organization, or max_retries directly through client_options.
- Dataset records' tags can now be operated on with 3 new Dataset methods: `dataset.add_tags, dataset.remove_tags, and dataset.replace_tags. All 3 new methods accepts an int indicating the zero based index of the record to operate on, and a list of strings in the format of key:values representing the tags. For example, if the tag "env:prod" exists on the 1st record of the dataset ds, calling ds.remove_tags(0, ["env:prod"]` will update the local state of the dataset record to have the "env:prod" tag removed.
- Change experiment execution to run evaluators immediately after each record's task completes instead of batching all tasks first. Experiment spans and evaluation metrics are now posted incrementally as records complete rather than waiting until the end. This improves progress visibility and preserves partial results if a run fails midway.
- Adds support for Pydantic AI evaluations in LLM Observability Experiments by allowing users to pass a pydantic evaluation (which inherents from Evaluator) in an LLM Obs Experiment.
 
 Example:
 
 from pydantic_evals.evaluators import EqualsExpected
 
 from ddtrace.llmobs import LLMObs
 
 dataset = LLMObs.create_dataset( dataset_name="<DATASET_NAME>", description="<DATASET_DESCRIPTION>", records=[RECORD_1, RECORD_2, RECORD_3, ...]
 
 )
 
 def my_task(input_data, config): return input_data["output"]
 
 def my_summary_evaluator(inputs, outputs, expected_outputs, evaluators_results): return evaluators_results["Correctness"].count(True)
 
 equals_expected = EqualsExpected()
 
 experiment = LLMObs.experiment( name="<EXPERIMENT_NAME>", task=my_task, dataset=dataset, evaluators=[equals_expected], summary_evaluators=[my_summary_evaluator], # optional, used to summarize the experiment results description="<EXPERIMENT_DESCRIPTION>."
 
 )
 
 result = experiment.run()
tracer
- This introduces API endpoint discovery support for Tornado applications. HTTP endpoints are now automatically collected at application startup and reported via telemetry, bringing Tornado in line with Flask, FastAPI, and Django.
- This adds process tags to trace payloads. To deactivate this feature, set DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
- Adds instrumentation support for `mlflow>=2.11.0. See the mlflow <https://ddtrace.readthedocs.io/en/stable/integrations.html#mlflow\> documentation for more information.
- Add process tags to client side stats payload
aiohttp
- Fixed an issue where spans captured an incomplete URL (e.g. /status/200) when aiohttp.ClientSession was initialized with a base_url. The span now records the fully-resolved URL (e.g. http://host:port/status/200), matching aiohttp's internal behaviour.
pymongo
- Add a new configuration option called DD_TRACE_MONGODB_OBFUSCATION to allow the mongodb.query to be obfuscated or not. Resource names always remain normalized regardless of the value. To preserve raw mongodb.query values, pair with DD_APM_OBFUSCATION_MONGODB_ENABLED=false on the Datadog Agent. See Datadog trace obfuscation docs: Trace obfuscation.
google_cloud_pubsub
- Add tracing support for the google-cloud-pubsub library. Instruments PublisherClient.publish() and SubscriberClient.subscribe() to generate spans for message publishing and consuming, with optional distributed trace context propagation via message attributes. Use DD_GOOGLE_CLOUD_PUBSUB_PROPAGATION_ENABLED to control context propagation (default: True) and DD_GOOGLE_CLOUD_PUBSUB_PROPAGATION_AS_SPAN_LINKS to attach propagated context as span links instead of re-parenting subscriber spans under the producer trace (default: False).

Bug Fixes

AAP
- Fix multipart request body parsing to preserve all values when the same field name appears multiple times. Previously, only the last value was kept for duplicate keys in multipart/form-data bodies, which could allow an attacker to bypass WAF inspection by hiding a malicious value among safe ones.
- Fixes a minor issue where the ASGI middleware used the framework-resolved scope["path"] instead of scope["raw_path"] for WAF URI evaluation. In rare cases where the URI contained path traversal sequences, these could be resolved before reaching the WAF, potentially affecting a small number of URI-based detection rules on ASGI-based frameworks like FastAPI and Starlette.
- This fix resolves an issue where RASP exploit prevention stack traces incorrectly included ddtrace internal frames at the top. Stack traces now correctly show only user and library frames.
IAST
- This fix resolves an issue where duplicate UNVALIDATED_REDIRECT vulnerabilities could be reported for a single redirect() call.
Fixed an issue with internal periodic threads that could have caused a crash during shutdown if a fork occurred.
telemetry
- fix extended heartbeat payload key from "configurations" to "configuration" to match the telemetry v2 API spec.
langgraph
- Fixed an issue where GraphInterrupt exceptions were incorrectly marked as errors in APM traces. GraphInterrupt is a control-flow exception used in LangGraph's human-in-the-loop workflows and should not be treated as an error condition.
mcp
- Fixes anyio.ClosedResourceError raised during MCP server session teardown when the ddtrace MCP integration is enabled.
CI Visibility
- Fixes an issue where pytest plugins pytest-rerunfailures and flaky were silently overridden by the ddtrace plugin. With this change, external rerun plugins will now drive retries as expected when Auto Test Retries and Early Flake Detection features are both disabled, otherwise our retry mechanism takes precedence and a warning is emitted.
- Fixed an issue where a test marked as both quarantined and attempt-to-fix was incorrectly treated as plain quarantined, preventing attempt-to-fix flow from running.
- Fix an unhandled RuntimeError that occurred when the git binary was not available. Git metadata upload is now skipped gracefully with a warning instead of aborting pytest startup.
- pytest:
  - Fixed missing ITR tags in the new pytest plugin that caused time saved by Test Impact Analysis to not appear in dashboards.
- Fixes an issue where HTTP 429 (Too Many Requests) responses from the Datadog backend were treated as non-retriable errors, causing CI visibility data to be dropped when the backend applied rate limiting. The backend connector now retries on 429 responses and respects the X-RateLimit-Reset header when present to determine the retry delay.
tracing
- Resolves an issue where a RuntimeError could be raised when iterating over the context._meta dictionary while creating spans or generating distributed traces.
- fixes an issue where telemetry debug mode was incorrectly enabled by DD_TRACE_DEBUG instead of its own dedicated environment variable DD_INTERNAL_TELEMETRY_DEBUG_ENABLED. Setting DD_TRACE_DEBUG=true no longer enables telemetry debug mode. To enable telemetry debug mode, set DD_INTERNAL_TELEMETRY_DEBUG_ENABLED=true.
- Fix _dd.p.ksr span tag formatting for very small sampling rates. Previously, rates below 0.001 could be output in scientific notation (e.g. 1e-06). Now always uses decimal notation with up to 6 decimal digits.
- sampling rules do not early exit anymore if a single rule is missing service and name.
LLM Observability
- This fix avoids potential JSONDecodeError when parsing tool call arguments from streamed Anthropic response message chunks.
- Fixes a FileNotFoundError in prompt optimization where the system prompt template was stored as a .md file that was excluded from release wheels. The template is now embedded in a Python module to ensure it is always available at runtime.
- Corrected the DROPPED_VALUE_TEXT warning message to reference the actual 5MB size limit. The size limit itself has not changed; only the message text was updated from an incorrect 1MB reference to the correct 5MB value.
- This fix resolves an issue where cache_creation_input_tokens and cache_read_input_tokens were not captured when using the LiteLLM integration with providers that support prompt caching (e.g., Anthropic, OpenAI, Deepseek).
- Fixes an issue where the @llm decorator raised a LLMObsAnnotateSpanError exception when a decorated function returned a value that could not be parsed as LLM messages. Note that manual annotation still overrides this automatic annotation.
- Fixes an issue where the @llm decorator did not automatically annotate the return value as output in traces. The decorator now captures the return value and annotates it as output, consistent with @workflow and @task decorators. Manual annotations via LLMObs.annotate() still take precedence.
- Fixes an issue where Pydantic model outputs nested inside lists, tuples, or dicts were serialized as unreadable repr() strings instead of JSON. Pydantic v1 and v2 models are now properly serialized using model_dump() or .dict() respectively.
- Fixes an issue where SDK-side LLMObs spans (e.g. LLMObs.workflow()) and OTel-bridged spans (e.g. from Strands Agents with DD_TRACE_OTEL_ENABLED=1) produced separate LLMObs traces instead of a single unified trace.
- Fixes an issue where the payload size limit and event size limit were hardcoded and could not be configured. These are now configurable via the DD_LLMOBS_PAYLOAD_SIZE_BYTES and DD_LLMOBS_EVENT_SIZE_BYTES environment variables respectively. These default to 5242880 (5 MiB) and 5000000 (5 MB), matching the previous hardcoded values.
- Fixes an issue where Anthropic LLM spans were dropped when streaming responses from Anthropic beta API features with tool use, such as tool_search_tool_regex.
- Fixes an issue where streamed Anthropic responses with generic content block types were not captured in output messages.
- Fixes an issue where streamed Anthropic responses reported input_tokens from the initial message_start chunk instead of the final message_delta chunk, which contains the accurate cumulative input token count.
- Fixes an issue where tool_result message content in Anthropic responses were not captured.
- Fixes an issue where tool calls and function call outputs passed as OpenAI SDK objects (e.g. ResponseFunctionToolCall) in the input list of the OpenAI Responses API were silently dropped from LLM Observability traces. Previously, the input parser used dict-only access patterns that failed for SDK objects; it now uses attribute-safe access that handles both plain dicts and SDK objects.
- Defaults model_provider to "unknown" when a custom base URL is configured that does not match a recognized provider in the OpenAI, Anthropic, and LiteLLM integrations.
profiling
- Fixes an issue where enabling the profiler with gevent workers caused gunicorn to skip graceful shutdown, killing in-flight requests immediately on SIGTERM instead of honoring --graceful-timeout. #16424
- Fix potential reentrant crashes in the memory profiler by avoiding object allocations and frees during stack unwinding inside the allocator hook. #16661
- the Profiler now correctly flushes profiles at most once per upload interval.
- Fixes an AttributeError crash that occurred when the lock profiler or stack profiler encountered _DummyThread instances. _DummyThread lacks the _native_id attribute, so accessing native_id raises AttributeError. The profiler now falls back to using the thread identifier when native_id is unavailable.
- Lock acquire samples are now only recorded if the acquire call was successful.
- A rare crash which could happen on Python 3.11+ was fixed.
- the memory profiler now uses the weighted allocation size in heap live size, fixing a bug where the reported heap live size was much lower than it really was.
- A crash that could happen on Python < 3.11 when profiling asynchronous code was fixed.
- A rare crash that could happen when profiling asyncio code has been fixed.
- Fixes two bugs in gevent task attribution. gevent.wait called with the objects keyword argument (e.g. gevent.wait(objects=[g1, g2])) now correctly links the greenlets to their parent task. Additionally, greenlets joined via gevent.joinall or gevent.wait from a user-level greenlet are now attributed to that greenlet instead of always being attributed to the Hub.
- Fixes an issue where setting an unlimited stack size (ulimit -s unlimited) on Linux caused the stack profiler sampling thread to fail to start, resulting in empty CPU and wall-time profiles. #17132
- A KeyError that could occur when using gevent.Timeout has been fixed.
Fix for potential crashes at process shutdown due to incorrect detection of the VM finalization state when stopping periodic worker threads.
dynamic instrumentation
- fixed an issue that prevented Dynamic Instrumentation from being re-enabled once disabled via the UI while being originally enabled via environment variable.
flask
- The Flask integration now properly captures the template parameter value for all Flask versions.
internal
- A bug preventing certain periodic threads of ddtrace (like the profile uploader) from triggering in fork-heavy applications has been fixed.