Estimated end-of-life date, accurate to within three months: 05-2027. See the support level definitions for more information.
Fixed a RuntimeError that occurred when the git binary was not available. Git metadata upload is now skipped gracefully with a warning instead of aborting pytest startup.
Fixed an issue where LLMObs SDK spans (e.g. LLMObs.workflow()) and OTel-bridged spans (e.g. from Strands Agents with DD_TRACE_OTEL_ENABLED=1) produced separate LLMObs traces instead of a single unified trace.
… tool_search_tool_regex.
Fixed a RuntimeError during forks.
Fixed a RuntimeError that could be raised when iterating over the context._meta dictionary while creating spans or generating distributed traces.
Previously, pytest-rerunfailures and flaky were silently overridden by the ddtrace plugin. With this change, external rerun plugins will now drive retries as expected when the Auto Test Retries and Early Flake Detection features are both disabled; otherwise our retry mechanism takes precedence and a warning is emitted.
mlflow
When DD_API_KEY, DD_APP_KEY, and DD_MODEL_LAB_ENABLED are set, HTTP requests to the MLFlow tracking server will include the DD-API-KEY and DD-APPLICATION-KEY headers. #16685
AI Guard
The block parameter previously defaulted to block=False and now defaults to block=True.
The integration no longer requires strands-agents>=1.29.0; the HookProvider works with any version that exposes the hooks system.
azure_durable_functions
profiling
Process tags propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
runtime metrics
Process tags propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
remote configuration
Process tags propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
dynamic instrumentation
Process tags propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
crashtracking
Process tags propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
data streams monitoring
Process tags propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
database monitoring
Process tags propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
Stats computation
Process tags propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
LLM Observability
Experiment task functions can now accept an optional metadata parameter. Tasks with the existing (input_data, config) signature continue to work unchanged.
Adds RemoteEvaluator, which allows users to reference LLM-as-Judge evaluations configured in the Datadog UI by name when running local experiments. For more information, see the documentation: https://docs.datadoghq.com/llm_observability/guide/evaluation_developer_guide/#using-managed-evaluators
The ephemeral_5m_input_tokens and ephemeral_1h_input_tokens metrics are now reported, distinguishing between 5 minute and 1 hour prompt caches.
Thinking content blocks (type: "thinking") are now captured as role: "reasoning" messages in both streaming and non-streaming responses, as well as in input messages for tool use continuations. LiteLLM now extracts reasoning_output_tokens from completion_tokens_details and captures reasoning_content in output messages for OpenAI-compatible providers.
tracer
Process tags propagation can be disabled by setting DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=false.
Telemetry debug mode now has its own dedicated environment variable, DD_INTERNAL_TELEMETRY_DEBUG_ENABLED, and is no longer tied to DD_TRACE_DEBUG. Setting DD_TRACE_DEBUG=true no longer enables telemetry debug mode. To enable telemetry debug mode, set DD_INTERNAL_TELEMETRY_DEBUG_ENABLED=true.
Fixed an issue where cache_creation_input_tokens and cache_read_input_tokens were not captured when using the LiteLLM integration with providers that support prompt caching (e.g., Anthropic, OpenAI, Deepseek).
Fixed an issue where workers were stopped with SIGTERM instead of honoring --graceful-timeout. #16424
Fixed an AttributeError crash that occurred when the lock profiler or stack profiler encountered _DummyThread instances. _DummyThread lacks the _native_id attribute, so accessing native_id raises AttributeError. The profiler now falls back to using the thread identifier when native_id is unavailable.
… acquire call was successful.
The type of Span.parent_id will change from Optional[int] to int in v5.0.0.
Experiments now report their execution status to the backend. Status transitions to running when execution starts, completed on success, failed when tasks or evaluators error with raise_errors=False, and interrupted when the experiment is stopped by an exception. #16713
Adds LLMObs.publish_evaluator() to sync a locally-defined LLMJudge evaluator to the Datadog UI as a custom LLM-as-Judge evaluation.
Adds support for DeepEval evaluations in LLM Observability Experiments by allowing users to pass a DeepEval evaluation (which inherits from either BaseMetric or BaseConversationalMetric) in an LLM Obs Experiment.
Example:
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams
from ddtrace.llmobs import LLMObs

correctness_metric = GEval(
    name="Correctness",
    criteria="Determine whether the actual output is factually correct based on the expected output.",
    evaluation_steps=[
        "Check whether the facts in 'actual output' contradicts any facts in 'expected output'",
        "You should also heavily penalize omission of detail",
        "Vague language, or contradicting OPINIONS, are OK",
    ],
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
    async_mode=True,
)

dataset = LLMObs.create_dataset(
    dataset_name="<DATASET_NAME>",
    description="<DATASET_DESCRIPTION>",
    records=[RECORD_1, RECORD_2, RECORD_3, ...],
)

def my_task(input_data, config):
    return input_data["output"]

def my_summary_evaluator(inputs, outputs, expected_outputs, evaluators_results):
    return evaluators_results["Correctness"].count(True)

experiment = LLMObs.experiment(
    name="<EXPERIMENT_NAME>",
    task=my_task,
    dataset=dataset,
    evaluators=[correctness_metric],
    summary_evaluators=[my_summary_evaluator],  # optional, used to summarize the experiment results
    description="<EXPERIMENT_DESCRIPTION>",
)
result = experiment.run()
Adds experiment summary logging after run() with row count, run count, per-evaluator stats, and error counts.
Adds max_retries and retry_delay parameters to experiment.run() for retrying failed tasks and evaluators. Example: experiment.run(max_retries=3, retry_delay=lambda attempt: 2 ** attempt).
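The retry_delay callable maps the retry attempt number to a delay in seconds. The contract can be sketched with a generic helper (illustrative only; run_with_retries and flaky are made-up names, not ddtrace's internal retry loop):

```python
import time

def run_with_retries(fn, max_retries=3, retry_delay=lambda attempt: 2 ** attempt):
    """Call fn(), retrying up to max_retries times with a caller-supplied backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            time.sleep(retry_delay(attempt))  # e.g. 1s, 2s, 4s for 2 ** attempt

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "ok"

# Succeeds on the third call; zero delay here just to keep the sketch fast.
result = run_with_retries(flaky, max_retries=3, retry_delay=lambda attempt: 0)
```

With retry_delay=lambda attempt: 2 ** attempt, the waits grow exponentially (1s, 2s, 4s, …), which is the usual choice for transient task or evaluator failures.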
This introduces LLMObs.get_prompt() to retrieve managed prompts from Datadog's Prompt Registry. The method returns a ManagedPrompt object with a format()
method for variable substitution. Prompt updates propagate to running applications within the cache TTL (default: 60 seconds).
Use with annotation_context or annotate to correlate prompts with LLM spans:
prompt = LLMObs.get_prompt("greeting")
variables = {"user": "Alice"}
with LLMObs.annotation_context(prompt=prompt.to_annotation_dict(**variables)):
    openai.chat.completions.create(messages=prompt.format(**variables))
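The variable-substitution step of format() can be sketched with plain str.format (illustrative only; the real ManagedPrompt comes from the Prompt Registry via LLMObs.get_prompt() and carries additional metadata — template and format_messages below are hypothetical stand-ins):

```python
# Hypothetical stand-in for a registry prompt template.
template = [{"role": "system", "content": "Greet {user} politely."}]

def format_messages(messages, **variables):
    # Substitute {placeholders} in every message body, leaving roles untouched.
    return [{**m, "content": m["content"].format(**variables)} for m in messages]

messages = format_messages(template, user="Alice")
# messages[0]["content"] is "Greet Alice politely."
```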
Experiments propagate canonical_ids from dataset records to the corresponding experiment span when present. The canonical_ids are only guaranteed to be available after calling pull_dataset.
LLMObs.create_dataset supports a bulk_upload parameter to control data uploading behavior. Both LLMObs.create_dataset and LLMObs.create_dataset_from_csv support specifying the deduplicate parameter.
A subset of dataset records can now be pulled by tags using the tags argument to LLMObs.pull_dataset, provided as a list of key:value strings: LLMObs.pull_dataset(dataset_name="my-dataset", tags=["env:prod", "version:1.0"])
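Each tag string follows the key:value convention shown above. For illustration, this is how such tag strings decompose into key/value pairs (a local sketch of the format; parse_tag is a made-up helper, and the actual filtering happens server-side):

```python
def parse_tag(tag: str) -> tuple:
    # Split only on the first colon so values may themselves contain colons.
    key, _, value = tag.partition(":")
    return key, value

tags = ["env:prod", "version:1.0"]
parsed = dict(parse_tag(t) for t in tags)
# parsed maps "env" to "prod" and "version" to "1.0"
```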
Fixed an AttributeError on openai-agents >= 0.8.0 caused by the removal of AgentRunner._run_single_turn.
Fixed an issue where the gevent module was imported unnecessarily even when the profiler was not enabled.
… <module> in flame graphs has been fixed.
Adds the kafka_cluster_id tag to Kafka offset/backlog tracking for confluent-kafka. Previously, cluster ID was only included in DSM checkpoint edge tags (produce/consume) but missing from offset commit and produce offset backlogs. This ensures correct attribution of backlog data to specific Kafka clusters when multiple clusters share topic names.
… ddtrace.internal.wrapping.context.BaseWrappingContext.
Fixed a RuntimeError: generator didn't yield in the Symbol DB remote config subscriber when the process has no writable temporary directory.
Exposes LLMJudge, BooleanStructuredOutput, ScoreStructuredOutput, and CategoricalStructuredOutput at the public ddtrace.llmobs module level.
profiling: This fix resolves an issue where the lock profiler's wrapper class did not support PEP 604 type union syntax (e.g., asyncio.Condition | None). This was causing a TypeError at import time for libraries such as kopf that use union type annotations at class definition time.
Fix for potential crashes at process shutdown due to incorrect detection of the VM finalization state when stopping periodic worker threads.
Estimated end-of-life date, accurate to within three months: 08-2026. See the support level definitions for more information.
We have identified a bug where workloads relying on fork could encounter crashes post-fork due to a race condition. We are currently working on a fix.
AAP: Fixes a memory corruption issue where concurrent calls to the WAF on the same request context from multiple threads (e.g. an asyncio event loop and a thread pool executor inheriting the same context via contextvars) could cause use-after-free or double-free crashes (SIGSEGV) inside libddwaf. A per-context lock now serializes WAF calls on the same context.
CI Visibility: Fixed an incompatibility with pytest-html and other third-party reporting plugins caused by the ddtrace pytest plugin using a non-standard dd_retry test outcome for retry attempts. The outcome is now set to rerun, which is the standard value used by pytest-rerunfailures and recognized by reporting plugins.
… (including the DD_CIVISIBILITY_USE_BETA_WRITER option), and also contains performance and memory usage improvements. A beta version of the plugin had been available since v4.2.0, and could be enabled via the DD_PYTEST_USE_NEW_PLUGIN environment variable. This new version is now the default, and the environment variable can be used to revert to the previous plugin if used with false or 0 values.
tracing
DD_TRACE_128_BIT_TRACEID_GENERATION_ENABLED is deprecated and will be removed in version 5.0.0. 128-bit trace ID generation will become mandatory.
The tracer parameter is deprecated in the following functions and class methods and will be removed in version 5.0.0: trace_app, TraceMiddleware.__init__, TracePlugin.__init__, TraceMiddleware.__init__, TraceMiddleware.__init__, get_traced_cache, trace_engine, WSGIMiddleware.__init__. The ddtrace.trace.tracer singleton is always used.
… uvloop with asyncio.
Adds support for the query, ClaudeSDKClient.query, and ClaudeSDKClient.receive_messages methods. See the docs for more information.
Adds support for the start_as_current_span decorator on asynchronous functions. Requires opentelemetry-api>=1.24.
Adds the LLMObs.async_experiment() method for running experiments with async task functions and mixed sync/async evaluators.
StringCheckEvaluator: performs string comparison operations (equals, not equals, contains, case-insensitive contains).
RegexMatchEvaluator: validates output against regex patterns with search, match, or fullmatch modes.
LengthEvaluator: validates output length constraints by characters, words, or lines.
JSONEvaluator: validates JSON syntax and optionally checks for required keys.
SemanticSimilarityEvaluator: measures semantic similarity between output and expected output using embedding vectors.
Adds support for the json metric type in evaluation metrics. Users can now submit dict values as evaluation metrics using LLMObs.submit_evaluation() with metric_type="json". Additionally, experiment evaluators that return dict values are automatically detected as the json metric type.
Adds the LLMJudge evaluator for automated evaluation of LLM outputs using another LLM as the judge. Supports OpenAI and Anthropic providers with boolean, score, categorical, and custom JSON schema output formats.
The site-packages directory is now added as the last entry in the PYTHONPATH environment variable (it previously was added before the last entry).
… asyncio Tasks has been fixed.
Fixed an issue causing "<N frame(s) omitted>" entries in profiling data and unbounded memory growth in the memory profiler.
Fixed an issue where the --skip-atexit flag was ignored and Python atexit handlers were registered regardless. This caused profiler cleanup code to run during process shutdown even when --skip-atexit was set, leading to crashes and hangs in uwsgi workers.
… SIGTERM or SIGINT signals.
Adds noopener and noreferrer link tags to the Datadog link in the footer of the App and API Protection HTML blocking response template. This could previously trigger a "reverse tabnabbing" vulnerability finding from other security analysis tools.
… when DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED was enabled.
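The three modes named by RegexMatchEvaluator correspond to Python's standard re functions, which anchor the pattern differently. A quick refresher using plain re (independent of the evaluator classes themselves):

```python
import re

output = "status: OK"

# search: the pattern may appear anywhere in the string
assert re.search(r"OK", output) is not None

# match: the pattern must match at the *start* of the string
assert re.match(r"OK", output) is None
assert re.match(r"status", output) is not None

# fullmatch: the pattern must consume the entire string
assert re.fullmatch(r"status: OK", output) is not None
assert re.fullmatch(r"status", output) is None
```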
We have identified a bug where profiles emitted by ddtrace v4.4.* do not contain tags injected by the Datadog Trace Agent (like pod_name, kube_namespace, etc.)
This issue is fixed in ddtrace v4.5.0.
We’re currently investigating an issue impacting services using ddtrace versions v4.1.* -> v4.4.* with profiling turned on. Our engineering teams are actively working on a permanent fix. In the meantime, disabling the memory profiler in the ddtrace configuration has been identified as a temporary workaround, DD_PROFILING_MEMORY_ENABLED=false.
Experiment evaluators can now be implemented as classes via the BaseEvaluator class, providing more flexibility and structure for implementing evaluation logic. The EvaluatorContext stores the context of the evaluation, including the dataset record and span information. Additionally, class-based summary evaluators are supported via BaseSummaryEvaluator, which receives a SummaryEvaluatorContext containing aggregated inputs, outputs, expected outputs, and per-row evaluation results.
Adds DD_TRACE_LOG_LEVEL to control the ddtrace logger level, following the log levels available in the logging module.
Adds support for pathlib.Path.open() in App and API Protection Exploit Prevention.
The tornado integration can be enabled with DD_TRACE_TORNADO_ENABLED=true or DD_PATCH_MODULES=tornado:true.
Fixed an issue with FallbackStreamWrapper (introduced for mid-stream fallback support) that caused an AttributeError when attempting to access the .handler attribute. The integration now gracefully handles both the original response format and wrapped responses by falling back to ddtrace's own stream wrapping when needed.
An issue where asyncio task stacks could contain duplicated frames when the task was on-CPU is now fixed. The stack now correctly shows each frame only once.
… when gevent.joinall is called.
Fixed missing support for the StreamedRunResult.stream_responses() method introduced in pydantic-ai==0.8.1, which was leading to agent spans not being finished.
The evaluators argument to LLMObs.experiment was overly constrained due to the use of an invariant List type. The argument now uses the covariant Sequence type, allowing users to pass in a list of evaluator functions with a narrower return type.
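The List-to-Sequence change matters because typing.List is invariant while Sequence is covariant: a List[Callable[..., bool]] is not assignable where List[Callable[..., object]] is expected, but it is assignable as a Sequence[Callable[..., object]]. A minimal illustration of the general typing rule (generic sketch, not the actual ddtrace signature — takes_list, takes_seq, and bool_evaluator are made-up names):

```python
from typing import Callable, List, Sequence

def takes_list(evaluators: List[Callable[[str], object]]) -> int:
    return len(evaluators)

def takes_seq(evaluators: Sequence[Callable[[str], object]]) -> int:
    return len(evaluators)

def bool_evaluator(output: str) -> bool:  # narrower return type than object
    return "ok" in output

evals: List[Callable[[str], bool]] = [bool_evaluator]

n = takes_seq(evals)   # accepted by type checkers: Sequence is covariant
# takes_list(evals)    # rejected by mypy/pyright: List is invariant
```

At runtime both calls would work; the constraint only exists for static type checkers, which is why the looser Sequence annotation removes the friction without changing behavior.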