
Agent

Host-level monitoring agent

Releases: 14 | Avg interval: 7d | Avg cadence: 4/mo

APM trace agent starts lazy; new AI and GPU features

This release: 10 features (new capabilities), 5 enhancements (improvements to existing features). AI-tallied from the release notes.
v7.80.0

Agent

Prelude

Released on: 2026-06-11

Upgrade Notes
  • Health Platform: the ReportIssue method now takes a single IssueReport argument instead of (checkID, checkName string, report *IssueReport). The IssueReport struct carries three new fields — IssueID (unique instance id), IssueType (template id), and Source (reporting integration name) — replacing the separate checkID and checkName arguments.

    The health platform persistence file format has been bumped to version 2. Existing persistence files (<run_path>/health-platform/issues.json) written by a previous agent version will be detected, logged as incompatible, and discarded on startup; the agent starts with a fresh issue state. No data migration is performed.

    For integrations calling ReportIssue: construct an IssueReport with IssueID set to a unique instance key (e.g. "check-execution-failure:<check-id>"), IssueType set to the template identifier that was previously passed as the IssueId field of the proto IssueReport, and Source set to the integration name. To resolve an issue, call ResolveIssue(issueID) instead of passing nil to ReportIssue.

  • Health Platform: the health_platform.issues_detected telemetry counter is now tagged with issue_type instead of health_check_id. Update any dashboards, monitors, or telemetry configuration that filtered or grouped by the health_check_id tag to use issue_type instead.

  • APM: On Linux, the trace agent process now only starts once data is sent to any of its configured listeners.

    Previously, the trace agent started immediately on agent startup. It now starts lazily when needed, which reduces resource usage. To disable lazy startup and restore the previous behavior, set apm_config.socket_activation.enabled: false in datadog.yaml, or set the environment variable DD_APM_SOCKET_ACTIVATION_ENABLED=false.

New Features
  • The Windows MSI installer now ships the AI usage Chrome native messaging host (ai-prompt-logger-native-host.exe) under bin\agent. The installer generates a Chrome Native Messaging Host manifest under bin\agent\dist and registers it machine-wide under HKLM\SOFTWARE\Google\Chrome\NativeMessagingHosts (including the WOW6432Node view for 32-bit Chrome). The host's runtime configuration is generated as C:\ProgramData\Datadog\ai_usage_native_host.yaml and uses the Agent's configured APM receiver port.

  • Adds a new discovery.service_map.enabled system-probe configuration option that boots the universal service monitoring (USM) eBPF monitor in a restricted mode, capturing only the data needed to render a service dependency map (HTTP and HTTPS via TLS uprobes). Hosts running in this mode are not billed as USM customers, are not surfaced in USM dashboards, and do not produce universal.http.* metrics. Intended for non-APM customers as a free preview of application observability.

  • Add k8sobjectsreceiver to the DDOT (Datadog Distribution of OpenTelemetry Collector) default manifest, enabling collection of Kubernetes object events and resource states via the OpenTelemetry Collector pipeline.

  • Adds a new action get-resource in kubeactions.

  • The fleet installer's agent-package OCI index now contains a FIPS-flavored sibling manifest for each platform, distinguished by the OCI Platform.Variant field. When DD_FIPS_MODE=true is set, the installer downloads the FIPS manifest; otherwise it downloads the base manifest. The package URL is unchanged in both cases.

  • Add a new nccl core check that collects per-rank NCCL collective communication metrics from GPU training and inference workloads.

    The check listens on a Unix domain socket (default /var/run/datadog/nccl.socket) for JSON events emitted by the NCCL profiler plugin (libnccl-profiler-dd.so) running inside GPU pods. Each event is tagged with rank, collective, n_ranks, kube_pod_name, kube_namespace, and kube_container_name.

    Metrics emitted:

    • nccl.collective.exec_time_us — time a rank spends inside a collective operation. A rank with a significantly lower value than its peers is the straggler; ranks with higher values are waiting at the barrier.
    • nccl.collective.algo_bandwidth_gbps — algorithm bandwidth of the collective.
    • nccl.collective.bus_bandwidth_gbps — bus bandwidth normalised for the collective type.
    • nccl.collective.msg_size_bytes — tensor size being communicated.
    • nccl.rank.seconds_since_last_event — seconds since this rank last reported an event; non-zero values indicate a potential hang.

    Enable the check cluster-wide by setting gpu.nccl.enabled: true in the Agent configuration (or DD_GPU_NCCL_ENABLED=true). The socket path can be overridden via gpu.nccl.socket_path; the host directory mounted into training pods can be overridden via gpu.nccl.host_socket_path.
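
The straggler heuristic described for nccl.collective.exec_time_us can be sketched offline as follows. This is only an illustration: it assumes the per-rank values for one collective have already been gathered into a dict, and both the function name and the 0.5 ratio threshold are hypothetical, not Agent settings.

```python
from statistics import median


def find_stragglers(exec_time_us_by_rank, ratio=0.5):
    """Return ranks whose exec time is well below the median of their peers.

    A rank that arrives late spends less time inside the collective, so a
    value far below the peer median marks it as the likely straggler.
    `ratio` is an illustrative threshold, not an Agent option.
    """
    if len(exec_time_us_by_rank) < 2:
        return []  # nothing to compare against
    mid = median(exec_time_us_by_rank.values())
    return sorted(r for r, t in exec_time_us_by_rank.items() if t < ratio * mid)
```

For example, with ranks 0, 1, and 3 near 900 µs and rank 2 at 120 µs, only rank 2 is returned.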

  • Add support for the datadog.metric.as_type datapoint attribute on OTLP delta sum metrics. When this attribute is set to "rate", the metric is sent to Datadog as a Rate (value divided by interval) instead of a Count. Accepted values are "rate", "count", and "gauge"; unknown values are logged and ignored. This allows users migrating from DogStatsD to OpenTelemetry to preserve rate-type metric behavior.
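
A minimal sketch of the mapping described above, assuming a single delta sum data point with a known flush interval. The function name is hypothetical, and passing the value through unchanged for "gauge" is an assumption; only the rate computation (value divided by interval) and the fallback for unknown values come from the note.

```python
import logging

log = logging.getLogger("otlp_as_type_sketch")

ACCEPTED = {"rate", "count", "gauge"}


def map_delta_sum(value, interval_s, as_type=None):
    """Return (datadog_type, value) for one OTLP delta sum data point."""
    if as_type is not None and as_type not in ACCEPTED:
        # Unknown values are logged and ignored, so the default applies.
        log.warning("ignoring unknown datadog.metric.as_type %r", as_type)
        as_type = None
    if as_type == "rate":
        return ("rate", value / interval_s)
    if as_type == "gauge":
        return ("gauge", value)  # assumption: value is passed through as-is
    return ("count", value)
```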

  • Add multi_secret_backends in datadog.yaml so you can declare extra named secret backends (each with type and config). When no secret_backend_type is set, select the backend per handle using ENC[backendID;secretKey] (backendID matches a name under multi_secret_backends). Precedence is secret_backend_command (if set) over secret_backend_type over multi_secret_backends: a custom command wins over native type; when native secret_backend_type is set (and no custom command), every ENC[...] inner string is resolved only through that type and multi_secret_backends is not used for routing.
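
The precedence rules above can be sketched as a small routing function. This is an illustration, not the Agent's resolver: it assumes the configuration is available as a dict and that multi_secret_backends is keyed by backend name; the function name and the returned labels are hypothetical.

```python
import re

ENC_RE = re.compile(r"^ENC\[(.*)\]$")


def route_handle(handle, config):
    """Return (backend, secret_key) describing how a handle would be routed."""
    m = ENC_RE.match(handle)
    if m is None:
        raise ValueError(f"not a secret handle: {handle!r}")
    inner = m.group(1)
    # 1. A custom command wins over everything else.
    if config.get("secret_backend_command"):
        return ("command", inner)
    # 2. A native type resolves the whole inner string; no per-handle routing.
    if config.get("secret_backend_type"):
        return ("type:" + config["secret_backend_type"], inner)
    # 3. Otherwise route by the backendID prefix of ENC[backendID;secretKey].
    backend_id, sep, key = inner.partition(";")
    if sep and backend_id in config.get("multi_secret_backends", {}):
        return ("named:" + backend_id, key)
    raise ValueError(f"no backend configured for {handle!r}")
```

With only multi_secret_backends set, ENC[vault1;db/password] is routed to the backend named vault1 with key db/password; as soon as secret_backend_type is set, the same handle is passed whole to that type.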

  • Add admission_controller.auto_instrumentation.container_registry_allow_list configuration option (env var DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_CONTAINER_REGISTRY_ALLOW_LIST) to restrict which container registries can be used as sources for APM library injection via Single Step Instrumentation. When set to a non-empty comma-separated list, the admission controller will skip injection for any pod whose injector image registry is not in the list, and will set the internal.apm.datadoghq.com/injection-error annotation with the reason. An empty list (the default) allows injection from any registry.
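
The allow-list decision can be sketched as below. The note does not say how the registry is derived from the injector image, so this sketch assumes the registry is everything before the final path segment of the image reference (for example gcr.io/datadoghq); the function names are hypothetical.

```python
def registry_of(image):
    """Assumed rule: the registry is the image reference minus its last segment."""
    return image.rpartition("/")[0]


def injection_allowed(image, allow_list):
    """Return True if injection from `image` passes the comma-separated allow list."""
    allowed = [r.strip() for r in allow_list.split(",") if r.strip()]
    if not allowed:
        return True  # an empty list (the default) allows any registry
    return registry_of(image) in allowed
```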

  • Windows: windows_certificate check adds certificate_store_regex, a list of Go regular expressions matched against HKLM certificate store names. Patterns are matched case-insensitively. certificate_store and certificate_store_regex can be used together; at least one must be set.

Enhancement Notes
  • Process Kubernetes actions asynchronously to avoid blocking the main thread.

  • Add an example OpenMetrics check configuration for Agent Data Plane deployments to restore datadog.agent.dogstatsd.* and datadog.agent.forwarder.transactions.* metrics.

  • Pre-register datadog-apm-library-iis, datadog-apm-library-iis-rum, and datadog-apm-library-httpd in the fleet installer. The packages are gated behind remote updates so they can be rolled out via remote configuration without a new installer release.

  • Chunk remote workloadmeta messages in the Agent to avoid exceeding the gRPC max message size.
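
Size-based chunking of this kind can be sketched as follows. This is a generic illustration rather than the Agent's implementation: it assumes events are already serialized to bytes and ignores any per-message framing overhead; the function name is hypothetical.

```python
def chunk_by_size(events, max_bytes):
    """Yield lists of serialized events whose total size stays within max_bytes.

    An event that is larger than max_bytes on its own is emitted alone.
    """
    batch, size = [], 0
    for ev in events:
        if batch and size + len(ev) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(ev)
        size += len(ev)
    if batch:
        yield batch
```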

  • APM: The Trace Agent section of the agent status output now shows the UDS (Unix Domain Socket) receiver path when UDS is enabled, in addition to the existing TCP receiver address. Each per-client entry in the receiver stats section also displays the connection type (tcp, uds, or pipe), making it easier to distinguish traffic arriving via different transports.

  • Autodiscovery template resolution failures are now logged at ERROR level instead of DEBUG, making them visible without enabling debug logging. Additionally, when the health platform is enabled, these failures are reported as AD misconfiguration health events with actionable remediation steps, providing proactive visibility when an autodiscovered check config is silently skipped due to unsupported template variables.

  • When infrastructure_mode is basic, the Agent's default allowlist now includes the Directory, WMI Check, Windows Certificate, Windows Performance Counters, and Windows Registry integrations so they can run without extra integration.additional configuration on Windows-oriented deployments.

  • Agents are now built with Go 1.25.10.

  • On Windows, network connections collected by Cloud Network Monitoring are now tagged with interface_name and interface_type.

  • The Agent now streams Kubernetes metadata from the Cluster Agent by default, instead of polling for it periodically. This propagates tags derived from Kubernetes metadata (like kube_service) with less delay. This behavior is controlled by the kubernetes_metadata_streaming setting.

  • agent diagnose now renders the check name as a prefix for all checks under the check-datadog suite. The JSON output gains a check_name field for the same purpose.

  • The --include and --exclude flags of agent diagnose now match against the suite name, the owning check name, and the diagnosis category. For example, agent diagnose --include postgres now filters individual diagnoses across all suites instead of only matching suite names.

  • DogStatsD timing metrics (t type) now include an explicit unit value (millisecond) in the metric payload sent to Datadog, allowing the Datadog UI to display the correct unit automatically.

  • Dynamic Instrumentation now supports compound conditions using &&, ||, and !.

  • When infrastructure_mode is set to none, ECS task metadata collection is now disabled by default (see ecs_task_collection_enabled). Set DD_ECS_TASK_COLLECTION_ENABLED to true to override.

  • When infrastructure_mode is set to end_user_device, the Agent now attaches additional host tags to identify the device: infrastructure_mode:end_user_device, os_name, os_version, cpu_model, total_memory_gb, and device_model. Hardware and OS tags are collected on macOS and Windows only.

  • Add new ad_tag_completeness_max_wait configuration option. When set, autodiscovery waits up to that many seconds for an entity's tags to be complete before scheduling checks for it. This avoids checks running briefly with incomplete tags. It's disabled by default.

  • Add logs_config.use_container_timestamp to optionally use the time field from container log files as the log timestamp instead of ingestion time, preserving container-provided per-line timestamps.

  • Logs Agent: logs_config.tag_multi_line_logs and logs_config.tag_truncated_logs now default to true so file logs are tagged by default when they were aggregated as multiline logs or truncated by the Agent.

  • Use the native requestWhenInUseAuthorization() API to manage the location permission prompt on macOS.

  • The OTel Agent standalone mode now automatically disables IPC with a core Datadog Agent. When DD_OTEL_STANDALONE is enabled, DD_CMD_PORT is forced to -1, so users no longer need to set it manually when running DDOT without a core Agent.

  • The service.instance.id OpenTelemetry resource attribute is now mapped to the service.instance.id Datadog metric tag when converting OTLP metrics. This attribute is required for OTel traffic metrics in Datadog Fleet Automation.

  • Private Action Runner: When private_action_runner.api_key_only_enrollment is enabled, the agent now enrolls via the new API-key-only OPMS endpoint (/api/unstable/on_prem_runners/api_key_only). This allows runners to self-enroll using only a scoped API key, without requiring an application key.

  • The Private Action Runner now honors the X-Retry-After-Ms response header returned by the Datadog backend on workflow task dequeue and health check requests.
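
One way a client might turn this header into a wait time is sketched below. It assumes the header value is an integer number of milliseconds and that headers are available as a plain dict with that exact key; the function name and the 1-second default are hypothetical.

```python
def retry_delay_seconds(headers, default_s=1.0):
    """Return how long to wait before retrying, in seconds."""
    raw = headers.get("X-Retry-After-Ms")
    if raw is None:
        return default_s
    try:
        ms = int(raw)
    except ValueError:
        return default_s  # malformed header: fall back to the default delay
    return max(ms, 0) / 1000.0
```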

  • The Private Action Runner now retries self-enrollment and auto-connection creation requests when the Datadog API returns a transient 5xx response.

  • Parse the ECS /tasks host metadata endpoint on Managed Instances and populate DaemonName for daemon-scheduled tasks.

  • The default for logs_config.file_scan_period is now 1 second instead of 10, so the Agent discovers new and rotated log files on disk more quickly. Set logs_config.file_scan_period explicitly if you need a slower scan to reduce filesystem load (for example on network file systems).

  • Bumped the Security Agent policies to v0.80.0.

  • Reduce the payload size of SNMP device metrics by letting the Datadog backend enrich device tags (such as snmp_device, device_ip, and device_id) from device metadata instead of attaching them to every metric. Existing queries and monitors continue to work, and no action is required. This only applies when collect_device_metadata is enabled (the default).

  • SNMP network device metadata: when a profile lists multiple scalar symbols for the same metadata field (for example serial_number), the check now skips values that resolve to an empty string (after trimming whitespace) and continues to the next symbol, matching the intended fallback order when an OID exists but carries no usable serial.
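
The fallback order described above amounts to picking the first value that is non-empty after trimming. A minimal sketch, assuming the symbol values have already been fetched in profile order (the function name is hypothetical):

```python
def first_usable(values):
    """Return the first value that is non-empty after trimming, else None."""
    for v in values:
        if v is not None and v.strip():
            return v
    return None
```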

  • Upgrade OpenTelemetry Collector dependencies from v0.150.0 to v0.151.0 (core v1.56.0 to v1.57.0).

    Notable upstream changes:

    • Removed stable feature gates that are no longer needed: connector.datadogconnector.NativeIngest, exporter.datadogexporter.UseLogsAgentExporter, and exporter.datadogexporter.metricexportnativeclient.
    • Several collector-contrib components have been renamed with deprecated aliases (spanmetrics to span_metrics, hostmetrics to host_metrics, fluentforward to fluent_forward). The old names continue to work but will be removed in a future release.

    See the full upstream changelogs: collector-contrib v0.151.0, collector core v0.151.0.

  • Upgrade OpenTelemetry Collector dependencies from v0.151.0 to v0.152.0 (core v1.57.0 to v1.58.0).

    See the full upstream changelogs: collector-contrib v0.152.0, collector core v0.152.0.

  • A small sample of series metric flushes (0.1% by default) is now additionally sent to a v3beta metrics intake endpoint to validate the upcoming v3 metrics protocol. Shadow traffic is only sent for agents configured against the datadoghq.com (US1) site. To opt out, set serializer_experimental_use_v3_api.series.shadow_sample_rate to 0.
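
A rough sketch of rate-based shadow sampling under the constraints stated above. How the Agent actually selects flushes is not described in the note, so the per-flush random draw here is an assumption, and the function name is hypothetical.

```python
import random


def should_shadow(site, sample_rate=0.001, rng=random.random):
    """Decide whether one series flush is also sent as shadow traffic."""
    if site != "datadoghq.com":
        return False  # shadow traffic is limited to the US1 site
    # rng() is in [0, 1), so a rate of 0 never samples and 1.0 always does.
    return rng() < sample_rate
```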

  • The OTel Agent now logs a warning and displays it in agent status when the hostmetrics receiver is configured while running in connected mode (DD_OTEL_STANDALONE=false). In connected mode the core Datadog Agent already collects host metrics, so enabling the hostmetrics receiver can lead to duplicate or conflicting metric names. To suppress the warning, either remove the hostmetrics receiver or switch to standalone mode (DD_OTEL_STANDALONE=true).

  • Add six opt-in tag flags to the Windows Certificate Store integration: certificate_template_tag, enhanced_key_usage_tag, friendly_name_tag, subject_alternative_names_tag, issuer_tag, and signature_algorithm_tag. When enabled, each certificate's metrics and service checks are tagged with the corresponding X.509 or Windows certificate property. All flags default to false.

Deprecation Notes
  • APM: Restored the deprecated DD_APM_SPAN_DERIVED_PRIMARY_TAGS configuration option, but only in serverless contexts: the Datadog Azure App Services extension (DD_AZURE_APP_SERVICES=1) and serverless-init (Cloud Run, Container Apps, Cloud Run Functions). In all other deployments the option is silently ignored. Tracers should populate additional_metric_tags instead; do not use DD_APM_SPAN_DERIVED_PRIMARY_TAGS in new deployments.
Security Notes
  • Bumped pip to 26.1.1 in the embedded Python distribution to address CVE-2026-6357.
  • Updated the Windows 1809 / LTSC 2019 Agent container base images from the deprecated mcr.microsoft.com/powershell:*-1809 images to mcr.microsoft.com/dotnet/sdk:9.0-nanoserver-1809 (nanoserver) and mcr.microsoft.com/dotnet/sdk:9.0-windowsservercore-ltsc2019 (servercore). The previous PowerShell base images were unmaintained and still shipped PowerShell 7.1.0, which is affected by CVE-2022-26788.
Bug Fixes
  • The debugger proxy no longer forwards Exception Replay and Live Debugger logs when logs_enabled is false. This can be overridden using the new apm_config.debugger_logs_enabled_override setting (environment variable DD_APM_DEBUGGER_LOGS_ENABLED_OVERRIDE), which enables Exception Replay and Live Debugger when logs_enabled is false.
  • APM: Fixed trace span obfuscation for OpenSearch request bodies when Elasticsearch JSON obfuscation is also enabled. Spans that only included the opensearch.body tag (and not elasticsearch.body) were previously left unobfuscated in that configuration.
  • APM: Fix SQL obfuscation error when a query uses PostgreSQL array slice syntax with bind parameters (e.g. arr[$1:] or arr[$1:$2]). The tokenizer was incorrectly treating the : range separator as the start of a named bind variable, causing obfuscation to fail with a LexError.
  • APM OTLP: Preserve gRPC status codes on trace metrics computed by DDOT and the OpenTelemetry Collector Datadog connector. This includes explicit gRPC status attributes such as rpc.grpc.status_code and the newer OpenTelemetry semantic convention rpc.response.status_code when rpc.system.name is grpc.
  • APM: Enforce body-size limits on trace-agent proxy endpoints (DogStatsD, pipeline stats, OpenLineage, Debugger, SymDB). All endpoints are capped at apm_config.max_request_bytes (default 25 MB). The profiling proxy uses a separate limit configurable via apm_config.profiling_max_request_bytes (default 50 MB, env DD_APM_PROFILING_MAX_REQUEST_BYTES). The Traces and ClientStatsPayload msgpack decoders now reject payloads declaring more than 500,000 elements in a single array.
  • APM: Fix an issue where converting traces to the v1 format did not prefer the root span's sampling priority when multiple _sampling_priority_v1 values were present on spans in the same trace.
  • Logs collected with automatic multiline detection now fall back to individual events when combining lines would exceed logs_config.max_message_size_bytes. Oversized single log lines continue to use the normal truncation path, and multiline logs that fit within the limit are still aggregated.
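
The fallback described in this fix can be sketched as follows. It is a simplified illustration: it assumes lines are joined with a newline and measured as UTF-8 bytes, and it does not model the separate truncation path for oversized single lines; the function name is hypothetical.

```python
def combine_or_split(lines, max_message_size_bytes):
    """Return one aggregated event if it fits, otherwise the individual lines."""
    combined = "\n".join(lines)
    if len(combined.encode("utf-8")) <= max_message_size_bytes:
        return [combined]
    return list(lines)  # fall back to individual events
```
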
  • [DBM] Bump go-sqllexer to v0.2.2 to fix the following bugs:
    • Obfuscate EXTRACT field keywords (e.g. epoch, year) so that queries from pg_stat_activity and pg_stat_statements converge on the same DBM signature.
    • Fix handling of PostgreSQL VACUUM commands so they are correctly extracted into statement metadata.
    • Fix lexer handling of multiline comments immediately following keywords.
  • Fixed HTTP flows being incorrectly dropped when a request body arrives before the response. Fixed pending HTTP transactions not being finalized when a connection is closed by a bare FIN or RST. Fixed a bug check (BSOD) caused by mismatched maxRequestFragment values during HTTP initialization.
  • Fixed the container ID to PID mapping for processes running in a sub-cgroup of their container's cgroup (for example, CrowdStrike Falcon's sensor.falcon scope nested under a container scope).
  • Fix the connection_reset_interval setting not being applied to additional log endpoints and HTTP MRF endpoints. Previously, only the main log endpoint would periodically reset its connection, which could cause additional endpoints to send logs to stale destinations after a DNS failover. Additional endpoints now inherit the global logs_config.connection_reset_interval value by default, and can also be overridden per-endpoint in the additional_endpoints configuration.
  • Fix a panic in the system-probe network tracer caused by concurrent access to the gateway lookup subnet cache. The cache now uses a thread-safe LRU implementation.
  • Fixed the health platform forwarder using the wrong intake endpoint. Agent health reports are now sent to agenthealth-intake.{site} instead of event-platform-intake.{site}, which is only configured for the logs and processesraw tracks. This caused org_id to be missing from all agent health recommendations.
  • Fix a class of IPv6 host:port formatting bugs found in multiple call sites across the Agent. IPv6 literals in host configuration values were not bracketed when used to build URLs and dial addresses, producing strings like http://fd38::1:5005 instead of http://[fd38::1]:5005 and causing too many colons in address errors at runtime.
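
The bracketing rule behind this fix can be illustrated with a small helper, similar in spirit to Go's net.JoinHostPort. This is a sketch and not the Agent's code; the function name is hypothetical.

```python
import ipaddress


def join_host_port(host, port):
    """Join host and port, bracketing the host when it is an IPv6 literal."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # hostnames and already-bracketed hosts are left as-is
    return f"[{host}]:{port}" if is_v6 else f"{host}:{port}"
```

With this helper, "http://" + join_host_port("fd38::1", 5005) yields http://[fd38::1]:5005, while IPv4 addresses and hostnames are unchanged.
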
  • Fixes the journald log tailer skipping the first journal entry when start_position is set to beginning or forceBeginning.
  • Fix chassis type detection for Mac mini and Mac Pro hosts, which were previously reported as Other.
  • Fix the macOS battery check incorrectly detecting a battery on Mac minis.
  • Logs: Fixed a bug where the MultiLineParser did not mark truncation when reassembled log lines exceeded the 900KB size cap. Oversized lines are now properly flagged with IsTruncated so that downstream handlers can apply truncation markers and increment telemetry.
  • Fix spurious "Unknown environment variable" warnings for DD_SYNC_DELAY, DD_SYNC_TO, and DD_CORE_CONFIG when running the OTel Agent.
  • Fix a panic in the OTLP metrics pipeline when a sender submits a histogram with more BucketCounts entries than ExplicitBounds allows (violating the counts == bounds + 1 OpenTelemetry specification invariant). Such data points are now rejected with an error instead of crashing the agent.
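
The invariant mentioned in this fix can be checked before indexing into the buckets. A minimal sketch, assuming the counts and bounds are plain lists and treating a histogram with no buckets at all as valid (the function name is hypothetical):

```python
def histogram_is_valid(bucket_counts, explicit_bounds):
    """Check the OpenTelemetry invariant: counts == bounds + 1."""
    if not bucket_counts and not explicit_bounds:
        return True  # a histogram that carries no bucket data
    return len(bucket_counts) == len(explicit_bounds) + 1
```
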
  • Fix a C-memory leak in the logs batch sender where resetBatch() replaced the zstd StreamCompressor without closing the previous one.
  • The datadog-installer.exe install script now adds datadog.yaml.example template comments to the config files on fresh installs.
  • Fixed the gohai resource check silently dropping processes whose UID does not exist in the host's /etc/passwd. This commonly affects containerized processes running as UIDs created inside container images. The "Processes memory usage" widget on the host infrastructure page now correctly includes these processes by falling back to the numeric UID string when username lookup fails.
  • gpu: fix an issue where some GPM metrics (gpu.gr_engine_active, gpu.sm_utilization, gpu.sm_occupancy, gpu.integer_active, gpu.fp16_active, gpu.fp32_active, gpu.fp64_active, gpu.tensor_active) were only emitted correctly one out of eight times on average.
  • NDM SNMP: fix IP metadata fields rendered as <nil> for OIDs declared with SYNTAX IpAddress (e.g. Cisco IPsec tunnel local/remote outside IPs, CDP remote addresses). gosnmp decodes these values as Go strings, but the metadata store previously only handled the raw-bytes path.
  • Fixes an issue with the NetFlow collector where certain packets would be dropped, producing error logs and lost data. The issue was caused when a packet had trailing padding which was not properly handled.
  • Fixed a bug where empty log sources would get orphaned because of an empty serviceID, caused by the Agent attempting to collect logs from short-lived containers that exit quickly.
  • Fixed a regression introduced around Agent 7.40 where non-template check configurations containing unresolvable ENC[...] secrets were still scheduled with raw secret handles in their config. Checks are now correctly dropped when all instances fail secret decryption. When only some instances fail, the surviving instances are scheduled and the failing instances are dropped, preserving the pre-regression per-instance behavior.
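
The per-instance behavior described above can be sketched as follows. The resolver interface is an assumption made for illustration: it takes one instance dict and raises a LookupError when a handle cannot be decrypted. Both function names are hypothetical.

```python
def schedulable_instances(instances, resolve):
    """Return the instances whose secrets resolve; an empty list drops the check."""
    surviving = []
    for inst in instances:
        try:
            surviving.append(resolve(inst))
        except LookupError:
            continue  # drop only the failing instance
    return surviving


def demo_resolve(instance):
    """Hypothetical resolver that only knows the handle ENC[db_pass]."""
    known = {"ENC[db_pass]": "s3cret"}
    out = {}
    for key, value in instance.items():
        if isinstance(value, str) and value.startswith("ENC["):
            out[key] = known[value]  # KeyError (a LookupError) if unknown
        else:
            out[key] = value
    return out
```
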
  • SNMP: Detect GetBulk response truncation (fewer varbinds than requested OIDs) and automatically reduce the batch size, preventing silent metric loss on devices that truncate large SNMP responses.
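
The detection rule (fewer varbinds than requested OIDs) can be sketched as below. The note does not say by how much the batch size is reduced, so halving is an assumption here, and the function name is hypothetical.

```python
def next_batch_size(requested_oids, returned_varbinds, batch_size, min_size=1):
    """Shrink the batch size when a response looks truncated."""
    if returned_varbinds < requested_oids:
        return max(min_size, batch_size // 2)  # assumed policy: halve
    return batch_size
```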
Other Notes
  • Add handling for dbm-column-statistics events in the event platform forwarder. These events are used by Database Monitoring integrations to report column statistics from database catalogs.
  • Add metrics origins for Cisco SD-WAN and Versa integrations.
  • Add metrics origins for HPE Aruba EdgeConnect and NiFi.
  • Removed support for using the OpenTelemetry components contained in this repo from an external collector, using OCB. The equivalent components in the opentelemetry-collector-contrib repository are designed exactly for this use case and should be used instead:
    • datadog exporter
    • datadog extension

Datadog Cluster Agent

Prelude

Released on: 2026-06-11. Pinned to datadog-agent v7.80.0: CHANGELOG.

Upgrade Notes
  • Updated the bundled kube-state-metrics library from v2.13 to v2.18. The kube-state-metrics metric allow/deny list now uses ECMAScript regular expression syntax instead of Go regexp syntax. Most patterns are compatible, but users relying on Go-specific regex features (e.g. (?s) flag) in metric_allowlist or metric_denylist should update their patterns.
New Features
  • The Cluster Agent admission controller now reports connectivity probe failures to the Datadog Health Platform. When the admission webhook becomes unreachable, an admission-controller-connectivity-failure health issue is raised with severity high and category availability, including remediation steps. The issue is automatically resolved when connectivity is restored.
  • Add a Prometheus HTTP Service Discovery (HTTP SD) provider for the Cluster Agent. The provider polls Prometheus-compatible HTTP SD endpoints and generates check configurations for each discovered target. Configure endpoints under prometheus_http_sd.configs, each providing its own url and check_template.
  • Autoscaling profiles (DatadogPodAutoscalerClusterProfile) now support Argo Rollouts as a target workload type. The Cluster Agent automatically detects whether the Argo Rollouts CRD is installed at startup and, if present, watches Rollout resources for profile labels alongside Deployments and StatefulSets.
  • The kubernetes_state core check now collects both endpoints and endpointslices resources by default, and emits new kubernetes_state.endpointslice.address_available and kubernetes_state.endpointslice.address_not_ready metrics for Kubernetes EndpointSlice objects, mirroring the existing kubernetes_state.endpoint.address_available and kubernetes_state.endpoint.address_not_ready metrics.
Enhancement Notes
  • The orchestrator check now collects force-deleted pods by default. The orchestrator_explorer.terminated_pods_improved.enabled option will be removed in a future release.

Docker daemon connectivity restored; AppSec webhook fixed

This release: 5 fixes (bug fixes). AI-tallied from the release notes.
v7.79.2

Agent

Prelude

Released on: 2026-06-03

Security Notes
  • Bumped containerd dependencies to mitigate CVE-2026-46680: github.com/containerd/containerd to v1.7.32 and pinned github.com/containerd/containerd/v2 to v2.0.9 (the EOL v2.1.x line has no fix).
Bug Fixes
  • Use the Docker daemon's /ping endpoint instead of /info to verify connectivity during DockerUtil initialization. Some daemons emit DefaultAddressPools[].Base values in /info that are not valid CIDRs, which fail the strict netip.Prefix decoding introduced by the moby v29 client and previously caused DockerUtil to fail to initialize. This cascaded into the Docker workloadmeta collector and the Docker core check being unavailable, leading to missing container/image tags on metrics and traces from Docker containers.

  • Fix the Agent's Docker integration against Docker daemons that return malformed values in their /info response. The failure was visible in Agent logs as:

    Docker init error: temporary failure in dockerutil, will retry later:
    Error reading remote info: netip.ParsePrefix("invalid Prefix"): no '/'

    When triggered, it prevented the Docker integration from initializing, which cascaded into:

    • missing container and image tags on metrics, traces and logs collected from Docker containers,
    • missing docker_version and docker_swarm entries in host metadata,
    • missing docker_swarm_node_role host tag on Docker Swarm nodes,
    • in containerized deployments without an explicit DD_HOSTNAME, the Agent could refuse to start because the Docker hostname provider could no longer determine a hostname.
  • Add the macOS hardened-runtime Location Services entitlement (com.apple.security.personal-information.location) to signed Agent binaries in order to trigger the system location permission prompt properly.

Datadog Cluster Agent

Prelude

Released on: 2026-06-03. Pinned to datadog-agent v7.79.2: CHANGELOG.

Bug Fixes
  • Cluster Agent: Evaluate AppSec sidecar admission webhook match conditions against the deleted object for pod deletion requests.
  • Cluster Agent: Prevent disabled AppSec proxy injection cleanup from enabling the AppSec sidecar admission webhook.
v7.79.1

Agent

Prelude

Released on: 2026-05-28

Security Notes
  • Bump github.com/prometheus/prometheus to v0.311.4 to address CVE-2026-42151 and CVE-2026-42154.
Bug Fixes
  • Windows: Fix CD-ROM drives being monitored by the disk check since Agent 7.73.0. The diskv2 check now uses the Windows GetDriveType() API to properly detect and exclude CD-ROM drives, matching the behavior of the previous Python disk check. This fixes false alerts on system.disk.in_use for CD-ROM drives with inserted media.
  • Fix a bug in the workload autoscaling controller where annotation-only edits (e.g. autoscaling.datadoghq.com/preview) on a locally-owned DatadogPodAutoscaler were not picked up until the next .spec change or cluster-agent restart, because the controller gated re-sync on .metadata.generation (which annotations do not bump). Toggling burstable mode via the preview annotation now takes effect on the next reconcile.
  • The macOS Agent GUI app now ignores SIGPIPE to avoid process termination.
  • On macOS, preserve user customizations to system-probe.yaml across Agent upgrades.
  • Fixed a bug on Windows where the NPM TCP failure rate could exceed 100% and climb indefinitely.

Datadog Cluster Agent

Prelude

Released on: 2026-05-28. Pinned to datadog-agent v7.79.1: CHANGELOG.

macOS Agent now system-wide LaunchDaemon; JMXFetch upgraded

This release: 14 features (new capabilities), 14 enhancements (improvements to existing features). AI-tallied from the release notes.
v7.79.0

Agent

Prelude

Released on: 2026-05-20

Upgrade Notes
  • Upgraded JMXFetch to 0.52.0, which adds JMX metrics mappings for Generational Shenandoah GC and introduces the use_canonical_bean_name option to guarantee consistent key property ordering in bean names. See 0.52.0 for more details.
  • On macOS, the Agent now installs as a system-wide LaunchDaemon running under a dedicated _dd-agent service user instead of a per-user LaunchAgent. Existing per-user installations will need to uninstall and reinstall to adopt the new mode. The previous install script is preserved as install_mac_os_v1.sh for versions prior to 7.79.0.
New Features
  • Flares now include a connectivity/resolved_endpoints.txt file that lists the IP addresses each configured Datadog intake endpoint hostname resolves to at flare-generation time. This makes it straightforward to determine whether the Agent is using PrivateLink (private IPs) or the public Datadog intake.
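
When reading the resolved addresses from such a file, private versus public can be told apart with Python's standard ipaddress module. A small sketch, assuming the IPs have already been extracted as strings (the file's exact layout is not described here, and the function name is hypothetical):

```python
import ipaddress


def classify(ips):
    """Map each IP string to 'private' or 'public'."""
    return {
        ip: "private" if ipaddress.ip_address(ip).is_private else "public"
        for ip in ips
    }
```
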
  • Added a capacity-type:spot host tag on AWS EC2 Spot instances. The tag is collected from IMDS and added alongside the other EC2 instance info host tags when collect_ec2_instance_info is enabled.
  • Adds Cluster Agent processing of select actions on Kubernetes resources.
  • APM: Add a context-aware shutdown API to the trace agent, allowing callers to specify a timeout when waiting for the agent to stop gracefully.
  • Add a native Go core check for the Datadog CSI driver (datadog_csi_driver), replacing the Python OpenMetrics integration. The check scrapes the CSI driver's Prometheus endpoint and submits datadog.csi_driver.node_publish_volume_attempts.count and datadog.csi_driver.node_unpublish_volume_attempts.count as monotonic count metrics. Metric names, tags, and autodiscovery identifiers are unchanged; no user action is required.
  • Add DNS monitoring support on macOS using libpcap packet capture.
  • Add the comp/dataobs/queryactions agent component for Data Observability query actions. When enabled via data_observability.query_actions.enabled: true, the component subscribes to the DO_QUERY_ACTIONS Remote Configuration product and schedules a do_query_actions Python check to execute SQL queries against monitored Postgres instances on configurable intervals. Results are forwarded to the data-obs-intake.<site>/api/v2/query-actions event platform endpoint.
  • Add agent experimental check-config and agent experimental onboard commands that run a 6-stage validation pipeline on datadog.yaml without requiring a running agent: file permissions, YAML syntax (with line-level error messages), API key format, site/region validity, live API key validation (skippable with --no-api), and a product enablement summary. These commands are experimental and subject to change.
  • On macOS, the Agent now collects CPU L1/L2/L3 cache sizes, CPU package count, and hardware platform in host metadata.
  • Add a Kata core check to gather Kata metrics. For details, see https://github.com/kata-containers/kata-containers/blob/main/docs/design/kata-2-0-metrics.md#metrics-architecture
  • The macOS install script now accepts DD_INFRASTRUCTURE_MODE to set the Agent's infrastructure_mode at install time.
  • Add support for Cloud Network Monitoring (CNM) on macOS via BPF filters.
  • The macOS install script now performs a system-wide installation by default. The Agent runs as a dedicated _dd-agent user via LaunchDaemon.
  • New gauge metric datadog.dogstatsd.offline_duration reports how long (in seconds) the DogStatsD server was offline between the previous shutdown and the current startup. Enable with telemetry.offlinereporter.enabled: true (disabled by default).
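
As a rough illustration of how the contents of connectivity/resolved_endpoints.txt (described in the flare note above) might be interpreted, the following Python sketch sorts resolved addresses into private and public groups. The function name and the sample addresses are hypothetical, and these notes do not specify the file layout, so the sketch simply takes a list of IP strings.

```python
import ipaddress


def classify_addresses(addresses):
    """Group IP strings into private and public lists.

    Hypothetical helper: private addresses suggest PrivateLink,
    public addresses suggest the public Datadog intake.
    """
    result = {"private": [], "public": []}
    for raw in addresses:
        ip = ipaddress.ip_address(raw)
        key = "private" if ip.is_private else "public"
        result[key].append(raw)
    return result


# Sample addresses are made up for illustration.
print(classify_addresses(["10.12.0.7", "3.233.158.1"]))
```

A hostname that resolves only to private addresses is consistent with PrivateLink; a mix of both groups is worth a closer look.
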
Enhancement Notes
  • Added support for all public registries to the K8s SSI gradual rollout feature.

    • The default list of Datadog registries is now:
      • gcr.io/datadoghq
      • docker.io/datadog
      • public.ecr.aws/datadog
      • datadoghq.azurecr.io
      • us-docker.pkg.dev/datadoghq/gcr.io
      • europe-docker.pkg.dev/datadoghq/eu.gcr.io
      • asia-docker.pkg.dev/datadoghq/asia.gcr.io
      • registry.datad0g.com
      • registry.datadoghq.com
  • Sends status updates for Kubernetes actions through the EVP pipeline.

  • Add datadog-apm-library-nginx to the fleet installer so it is installed alongside the other APM libraries when APM instrumentation is enabled.

  • The cluster agent readiness probe now includes the admission controller webhook server. Newly started cluster agents will not be marked as ready until the webhook can serve requests, preventing missed pod mutations during rollouts.

  • Added new additional_metric_tags field to APM metrics payload to allow tracers to send customer configured span derived primary tags.

  • APM: Fetch Org Propagation Marker on startup to Org Propagation Guard. The trace-agent now fetches /api/v2/validate at startup to derive an Org Propagation Marker (OPM) and exposes it in the /info endpoint.

  • Agents are now built with Go 1.25.10.

  • Agents are now built with Go 1.25.9.

  • Bump rshell to v0.0.10 for the Private Action Runner. Shell commands now follow symlinks that cross between allowed roots and resolve host-mounted paths correctly in containerized deployments.

  • Bump rshell to v0.0.14.

  • Added internal telemetry counters to measure the impact of enabling auto_multi_line_detection by default. The counters track how many log lines would be combined and how many would risk truncation, without changing any log processing behavior.

  • system-probe: The discovery module (discovery.enabled) and system-probe-lite (discovery.use_system_probe_lite) are now enabled by default on Linux. When discovery is the only enabled system-probe module, system-probe-lite is automatically used to minimize resource usage. To disable discovery, set discovery.enabled: false in system-probe.yaml.

  • Add ECS Fargate task ARN to X-Datadog-Additional-Tags header on data-streams-message HTTP requests.

  • Dynamic Instrumentation: Add support for conditional probes via the when clause. Probes can now include equality conditions that compare captured variables against literal values (integers, floats, booleans, strings, and null). When a condition evaluates to false, the probe event is suppressed, reducing overhead for high-traffic instrumentation points.

  • Dynamic Instrumentation: Add support for probing Go generic functions. Snapshots and log probes now display concrete types for generic parameters.

  • Enables network monitoring for devices with infrastructure_mode: end_user_device.

  • When using RDS Aurora Autodiscovery, tags present on the cluster are now inherited by the instances. For example, if a cluster has the tag datadoghq.com/dbm: true, all instances in that cluster will have extra_dbm_enabled: true. Tags on the instances will override tags on the cluster.

  • Add SandboxId field to the workloadmeta structure. Update collectors (crio and containerd) accordingly.

  • The kubelet core check now reports container kubernetes.containers.cpu.requests, kubernetes.containers.cpu.limits, kubernetes.containers.memory.requests, and kubernetes.containers.memory.limits metrics using the live values from pod.status.containerStatuses[].resources when available, so the metrics reflect the effective runtime values after an in-place vertical resize. Resources declared only in the pod spec (for example GPUs or custom resources) are preserved, and clusters where the kubelet does not yet populate status.resources continue to report the spec values as before.

  • The logs agent now retries log payloads on HTTP 403 (Forbidden) responses instead of dropping them, when the endpoint's API key was resolved from a secrets backend. On 403, the agent triggers an asynchronous secrets refresh and retries the payload. This applies to the core logs agent, CWS security reporter, compliance reporter, and the event platform forwarder. Endpoints whose API key is not managed by the secrets backend retain the original drop behavior.

  • Hide the DMG mount during the macOS Agent installation process.

  • Send device metadata for devices monitored by Network Configuration Management.

  • NPM connection payloads now include a process_name:<name> tag identifying the process executable that owns each connection. The tag is populated from the process agent's process list and requires process_config.process_collection.enabled to be set to true.

  • Switch config implementation to an improved version by default. Can be disabled with the env var DD_CONF_NODETREEMODEL=viper, or the config setting conf_nodetreemodel: viper in datadog.yaml.

  • The OTel Agent now supports a standalone mode (DD_OTEL_STANDALONE=true) that runs without a co-resident core Datadog Agent. In standalone mode a new dogtelextension OpenTelemetry Collector extension provides Datadog Agent functionality directly.

  • OTLP ingest configuration keys now register explicit default values matching the upstream OpenTelemetry Collector defaults. Previously these keys were bound without defaults, which caused agent config and similar introspection commands to omit them. Runtime behavior is unchanged: only user-configured values are forwarded to the OTel Collector pipeline, so unconfigured settings continue to use the Collector's own built-in defaults.

    Notable default changes in pkg/config/config_template.yaml:

  • Added Translate, TranslateK8sObjects, and NewManifestCache to otlp/logs so exporters can share log translation and manifest deduplication logic without duplicating code.

  • Add private_action_runner.api_key_only_enrollment configuration flag to explicitly control Private Action Runner enrollment mode. When set to true, enrollment uses the API key only (no app key required, no auto-connections created). When false (default), the app key is required and connections are auto-created during enrollment.

  • The private action runner binary now has the CAP_NET_RAW capability.

  • The Private Action Runner default enabled actions now include runNetworkPath and runCommand.

  • The Private Action Runner now includes default enabled actions that are automatically allowed. To opt out, set private_action_runner.default_actions_enabled to false in datadog.yaml. This still requires explicit opt-in into the Private Action Runner feature.

  • Make app key optional during installation to prepare for app-key-less PAR enrollment.

  • Add private_action_runner.skip_connection_creation configuration flag to control auto-connection creation during Private Action Runner enrollment. When set to true, the runner skips creating connections during app-key enrollment. Defaults to false, which preserves the existing behavior of auto-creating connections.

  • Retry transactions on API key errors (HTTP 403 responses) when API key refresh is enabled via secrets management in the Agent configuration.

  • Bumped the Security Agent policies to v0.79.0

  • NDM: SNMP default scan is now enabled by default. Discovered SNMP devices will be automatically scanned to collect OID data. To disable, set network_devices.default_scan.enabled to false.

  • Upgrade OpenTelemetry Collector dependencies from v0.147.0 to v0.150.0 (core v1.53.0 to v1.56.0).

    Notable upstream changes:

    • The exporter.datadogexporter.DisableAllMetricRemapping feature gate has been promoted to beta (enabled by default). Metric remappings are now handled by the Datadog backend. If you experience issues, disable the gate with --feature-gates=-exporter.datadogexporter.DisableAllMetricRemapping and contact Datadog support.
    • Semantic conventions updated from v1.38.0 to v1.40.0.
    • The datadogextension now supports gateway_service and gateway_destination config fields for Fleet Automation gateway topology view.
    • Fix for use-after-free bug in quantile sketches when exporting ExponentialHistogram metrics with multiple attribute sets.
    • OTTL context setters (used by transform, filter, and tailsampling processors) now validate value types and return errors on type mismatches instead of silently ignoring them. Users with error_mode: propagate (the default for the transform processor) may see new errors if their OTTL statements had pre-existing type mismatches. Switch to error_mode: ignore to preserve the previous behavior while fixing the statements.

    See the full upstream changelogs: collector-contrib v0.150.0, collector core v0.150.0.

  • Add environment variable overrides to selectively keep infrastructure checks enabled in Windows containers. By default, the disk, network, winproc, file_handle, and io checks are still removed at startup for backward compatibility. Set DD_WINDOWS_HOST_METRICS=true to keep all infra checks, or use per-check variables (e.g. DD_WINDOWS_ENABLE_DISK_CHECK=true, DD_WINDOWS_ENABLE_IO_CHECK=true) to enable individual checks.
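
The kubelet core check note above describes a preference order: live values from the container status win, resources declared only in the pod spec are kept, and the spec is used unchanged when the status carries no resources. A minimal Python sketch of that merge follows. The function name and the flat dictionary shape are assumptions made for illustration; the real check handles requests and limits separately and is not implemented this way as far as these notes say.

```python
def effective_resources(spec_resources, status_resources=None):
    """Return the resource values to report for one container.

    Live status values override spec values; resources present only
    in the spec (for example GPUs) are preserved; with no status
    resources, the spec values are returned unchanged.
    """
    merged = dict(spec_resources)
    if status_resources:
        merged.update(status_resources)
    return merged


# Hypothetical values: CPU was resized in place from 500m to 750m.
spec = {"cpu": "500m", "memory": "256Mi", "nvidia.com/gpu": "1"}
status = {"cpu": "750m", "memory": "256Mi"}

print(effective_resources(spec, status))
print(effective_resources(spec))  # kubelet does not populate status.resources
```
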

Known Issues
Deprecation Notes
  • The beta feature configuration option DD_APM_SPAN_DERIVED_PRIMARY_TAGS has been removed. The agent no longer supports customer configurable span derived primary tags. This feature is only available on tracers.
  • APM: Document that DD_APM_MAX_EPS is deprecated (legacy App Analytics APM events only) and does not affect trace or span volumes.
  • Per-user macOS Agent installations (LaunchAgent mode) are deprecated. Use the default system-wide installation going forward.
  • MapLogsAndRouteRUMEvents on the logs Translator is deprecated (abandoned RUM/OTel integration attempt).
Security Notes
  • Upgrade the Docker SDK dependency from github.com/docker/docker v28.5.2 to github.com/moby/moby v29 (moby/moby/api v1.54.1, moby/moby/client v0.4.0) to fix CVE-2026-34040 (High, CVSS 7.8) and CVE-2026-33997 (Medium, CVSS 8.1).
Bug Fixes
  • The api_server.request_duration_seconds internal metric now tags requests with the gorilla/mux route template (e.g. /{component}/status) instead of the raw request path. This prevents arbitrary user-provided path values from creating high-cardinality metric tags. Requests that do not match any registered route are tagged with unknown.

  • Adds a new tag 'is_physical_storage' to every 'system.disk.*' metric if the 'tag_by_physical_storage' configuration option (defaults to false) is enabled. Emits a new set of metrics ('system.disk.physical_total', 'system.disk.physical_used', 'system.disk.physical_free', 'system.disk.physical_utilized', and 'system.disk.physical_in_use') if the 'collect_physical_metrics' configuration option (defaults to false) is enabled. Requires the Go disk check v2 (disk_check.use_core_loader: true). Linux only.

  • Fix span stats and priority sampling for Cloud Run job tasks by properly waiting for the trace agent shutdown sequence to complete, ensuring in-flight traces are flushed before the serverless function exits.

  • APM: Fix missing tracer language in stats aggregation key when the V1 stats path is enabled. This issue only affects users with the V1 feature flag enabled or using the 'convert-traces' flag.

  • APM: Fixed unnecessary CPU load on the core Agent in non-containerized environments by skipping container ID resolution (header parsing and cgroup lookups) in the trace API when not running in a container.

  • Dynamic Instrumentation: Fix a bug where evaluationErrors were reported in the wrong location in snapshot payloads, causing them to not appear properly in the UI.

  • Fix AKS cluster name parsing from kubernetes.azure.com/cluster label.

  • Fixes a bug where autodiscovered services were not being deleted if GetAuroraClustersFromTags or GetRdsInstancesFromTags returned no matches.

  • SNMP: Fix bandwidth usage rate metrics (snmp.ifBandwidthInUsage.rate and snmp.ifBandwidthOutUsage.rate) not being emitted when there are intermittent check failures.

  • Fix a concurrent map write crash in the config package when multiple goroutines call config getters with unknown keys simultaneously. This could cause the agent to crash with fatal error: concurrent map writes when Docker log collection with container_collect_all is enabled.

  • Fix a deadlock that could make the Agent become unresponsive after a remote configuration value was cleared.

  • Fixes a caching bug in DBM RDS instance and Aurora cluster autodiscovery. When service metadata changed (DbName, for example), the service check would not be updated with the new metadata if the service was already in the cache. Now the cached service is deleted and the updated service is added as a new check.

  • Fix a regression introduced in Agent 7.76 where anchored log_processing_rules (using ^ and $) stopped matching log lines. This was caused by the new default auto-multiline detection tagging path not trimming trailing whitespace from log content before forwarding it to processing rules.

  • Fixed a panic in the system-probe container store caused by gopsutil parsing malformed /proc/[pid]/stat files during process termination race conditions.

  • Fix agent status failing when the HA Agent feature is enabled. The status templates attempted to iterate over a struct with range, which is not supported by Go templates. The HA Agent Metadata section now renders correctly.

  • Fix IPv6 address formatting when constructing the Cluster Agent endpoint URL from Kubernetes service environment variables. IPv6 addresses are now properly wrapped in brackets (e.g. https://[fd38:552b:2959::4f4a]:5005 instead of https://fd38:552b:2959::4f4a:5005), which previously caused the remote tagger and other gRPC clients to fail with "too many colons in address" errors on IPv6-only clusters.

  • Fixed Oracle Data Guard metrics query that caused ORA-01873 (interval precision overflow).

  • Fix spurious warn log on otel-agent startup about conflicting dd_url and logs_no_ssl settings.

  • DD_PROXY_HTTP, DD_PROXY_HTTPS, HTTP_PROXY, HTTPS_PROXY, DD_PROXY_NO_PROXY, and NO_PROXY environment variables are now respected by the standalone OTel agent without requiring --core-config.

  • NTP: renames ntp.offset with the tag source:intake to ntp.intake_offset and removes the source:ntp tag from ntp.offset, restoring it to its pre-7.77.0 single-series behavior. This fixes false alerts on existing monitors querying ntp.offset without a tag filter.

  • OTel logs exported via the Datadog Exporter (otel_source:datadog_exporter) now correctly populate otel.event_name from the OTLP event_name field, and fall back to observed_time_unix_nano for the timestamp when time_unix_nano is unset (per the OTLP spec). Previously, both fields were missing for this ingestion path, causing OTel RUM events to be dropped or timestamped at the Unix epoch.

  • Fixed a bug (only present when deduplication is enabled) where SNMP devices loaded from the cache on agent restart were not registered immediately, causing them to be temporarily unavailable until the next discovery cycle completed. Cached devices are now registered right away and tracked for deduplication so that subsequent scans for the same physical device are correctly deduplicated.

  • Fixed an issue in SNMP autodiscovery where the IP processing counter was not reset immediately after processing, potentially delaying or preventing device registration when deduplication was enabled.

  • Windows: Fixed a remote update failure in datadog-installer when validating Agent domain accounts.

    When querying some domain account names, NetQueryServiceAccount can return NTSTATUS 0xC0000106 (STATUS_NAME_TOO_LONG) during gMSA detection. This status is now treated like STATUS_INVALID_ACCOUNT_NAME so the account is handled as a regular domain account instead of incorrectly failing the update.
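
The IPv6 fix for the Cluster Agent endpoint URL, listed among the bug fixes above, comes down to bracketing IPv6 literals before appending the port. The Python sketch below shows the idea only; the Agent itself is written in Go, and the function name here is hypothetical.

```python
import ipaddress


def cluster_agent_url(host, port, scheme="https"):
    """Build an endpoint URL, wrapping IPv6 literals in brackets."""
    try:
        if ipaddress.ip_address(host).version == 6:
            host = f"[{host}]"
    except ValueError:
        pass  # not an IP literal (for example a DNS name): leave unchanged
    return f"{scheme}://{host}:{port}"


print(cluster_agent_url("fd38:552b:2959::4f4a", 5005))
print(cluster_agent_url("10.0.0.12", 5005))
```

Without the brackets, the port cannot be told apart from the last group of the address, which is what produced the "too many colons in address" errors.
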

Other Notes
  • The agent status output now displays uptime values greater than 24 hours in a days-based format (e.g., 23d2h54m59s) instead of the raw hour count (e.g., 554h54m59s).
  • Update agent-payload version to v5.0.189
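
The days-based uptime format mentioned above can be reproduced with plain integer division. This is a sketch in Python, not the Agent's implementation; the function name is hypothetical, and the behavior at exactly 24 hours and the handling of zero-valued components are assumptions.

```python
def format_uptime(total_seconds):
    """Format a duration, switching to a days-based form once a full day has elapsed."""
    days, rem = divmod(int(total_seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    if days > 0:
        return f"{days}d{hours}h{minutes}m{seconds}s"
    return f"{hours}h{minutes}m{seconds}s"


# 554h54m59s, the example from the note above, expressed in seconds.
print(format_uptime(554 * 3600 + 54 * 60 + 59))
```
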

Datadog Cluster Agent

Prelude

Released on: 2026-05-20 Pinned to datadog-agent v7.79.0: CHANGELOG.

New Features
  • Add AppSec injection support for ingress-nginx controllers. The Cluster Agent now automatically injects the Datadog nginx-datadog module into ingress-nginx controller pods, enabling AppSec protection without manual extraModules configuration. Configurable via admission_controller.appsec.nginx.init_image and admission_controller.appsec.nginx.module_mount_path.
  • Add spot scheduling. When enabled, the Cluster Agent assigns eligible workload pods to spot nodes and maintains the configured percentage of spot replicas and the minimum on-demand replica count. It automatically falls back to on-demand scheduling when spot pods remain pending longer than the configured timeout.
  • Add namespace-level batch onboarding for workload autoscaling profiles. The Cluster Agent now discovers all workloads in namespaces labeled with autoscaling.datadoghq.com/profile=<profile-name> and automatically manages DatadogPodAutoscaler entries for them. Individual workloads can opt out by setting autoscaling.datadoghq.com/profile=excluded.
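
To make the namespace-level onboarding rule above concrete, here is a small Python sketch of how a profile could be resolved for one workload. It assumes the opt-out value is set as a label on the workload, which the note does not state explicitly; the function name and the sample profile name are hypothetical, and this is not the Cluster Agent's actual logic.

```python
PROFILE_LABEL = "autoscaling.datadoghq.com/profile"


def resolve_profile(namespace_labels, workload_labels):
    """Return the profile name to apply to a workload, or None.

    Assumption: a workload opts out by carrying the profile label
    with the value "excluded"; otherwise the namespace label applies.
    """
    if workload_labels.get(PROFILE_LABEL) == "excluded":
        return None
    return namespace_labels.get(PROFILE_LABEL)


ns = {PROFILE_LABEL: "cost-optimized"}  # made-up profile name
print(resolve_profile(ns, {}))
print(resolve_profile(ns, {PROFILE_LABEL: "excluded"}))
```
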
Enhancement Notes
  • The datadog-cluster-agent clusterchecks CLI command now displays check execution status for checks running on Cluster Level Check (CLC) runners, matching the node agent agent status collector output format. This includes OK/ERROR status, total runs, metric samples, events, service checks, average execution time, last execution date, last successful execution date, and last error message.
  • The cluster agent metadata payload now includes a clustercheck_integration_status field reporting check execution status (OK/ERROR) for cluster checks running on CLC runners. This enables the backend to populate datadog_agent_integration_status for cluster checks. The clustercheck_metadata field now also reports all instances for multi-instance checks and uses precomputed instance IDs for consistency.
  • Add OOTB CRD collection for Gateway API, service mesh (Istio, Envoy Gateway, Traefik, Linkerd, Consul, Kuma), and ingress controller (NGINX, Traefik, Kong, HAProxy) custom resources. Three new per-family config flags allow operators to enable collection independently: orchestrator_explorer.custom_resources.ootb.gateway_api, orchestrator_explorer.custom_resources.ootb.service_mesh, and orchestrator_explorer.custom_resources.ootb.ingress_controllers (all default to false).
Security Notes
  • Upgrade the Docker SDK dependency from github.com/docker/docker v28.5.2 to github.com/moby/moby v29 (moby/moby/api v1.54.1, moby/moby/client v0.4.0) to fix CVE-2026-34040 (High, CVSS 7.8) and CVE-2026-33997 (Medium, CVSS 8.1).
Bug Fixes
  • Fix a deadlock in the orchestrator check that caused Cancel to hang indefinitely, leaking goroutines and preventing the check from being rescheduled. The issue occurred when TerminatedResourceBundle.Disable tried to flush manifests through a channel whose consumer goroutine had already stopped.
  • Honor label and annotation as tags configuration options for all Kubernetes resources.
v7.78.4

Agent

Prelude

Released on: 2026-05-14

Security Notes
  • Upgrade github.com/moby/spdystream to 0.5.1 to address CVE-2026-35469. In versions 0.5.0 and below, the SPDY/3 frame parser does not validate attacker-controlled counts and lengths before allocating memory. Three allocation paths are affected: the SETTINGS frame entry count, the header count in parseHeaderValueBlock, and individual header field sizes — all read as 32-bit integers and used directly as allocation sizes with no bounds checking. Because SPDY header blocks are zlib-compressed, a small on-the-wire payload can decompress into large attacker-controlled values. A remote peer that can send SPDY frames to a service using spdystream can exhaust process memory and cause an out-of-memory crash with a single crafted control frame. This issue has been fixed in version 0.5.1.

Datadog Cluster Agent

Prelude

Released on: 2026-05-14 Pinned to datadog-agent v7.78.4: CHANGELOG.
v7.78.3

Agent

Prelude

Released on: 2026-05-07

Security Notes
  • Upgrade go.opentelemetry.io/otel/sdk to v1.43.0 to address CVE-2026-39883, a PATH-hijacking vulnerability in the OpenTelemetry Go SDK's host detection on BSD and Solaris platforms (the SDK invoked the kenv command without an absolute path). The Datadog Agent's primary supported platforms (Linux, Windows, macOS) are not affected at runtime, but the dependency is upgraded to keep the shipped binary free of the vulnerable code.

Datadog Cluster Agent

Prelude

Released on: 2026-05-07 Pinned to datadog-agent v7.78.3: CHANGELOG.
v7.78.2

Agent

Prelude

Released on: 2026-04-29

Enhancement Notes
  • Adds datadog-agent otel command to install/remove DDOT from an OCI package.
Deprecation Notes
  • The Install-Datadog.ps1 PowerShell script is deprecated and will be removed in a future version. Please use datadog-installer.exe or the MSI installer instead. Visit the in-app installation guide for complete up-to-date installation instructions.
Bug Fixes
  • The signature check in Install-Datadog.ps1 is now more accommodating to formatting variations in the CN field. Refer to the Agent Data Security page for more information on validating signatures.
  • Fixes user-defined network_path.collector.filters being silently dropped when infrastructure_mode is set to end_user_device. Custom filters are now correctly appended to the built-in EUDM defaults.

Datadog Cluster Agent

Prelude

Released on: 2026-04-29 Pinned to datadog-agent v7.78.2: CHANGELOG.
v7.78.1

Agent

Prelude

Released on: 2026-04-23

Enhancement Notes
  • The Agent's embedded Python has been upgraded from 3.13.12 to 3.13.13
  • Agents are now built with Go 1.25.9.
Bug Fixes
  • Fix missing signature on macOS Agent packages
  • Fix the system-probe SELinux policy module failing to load on RHEL 7 with policydb module version 21 does not match my version range 4-19. The module is now compiled against modular policy version 19, which is the highest version supported by RHEL 7 and is backward-compatible with newer RHEL releases.
  • Add logic to include integrations that do not have a manifest.json file in the Agent.
  • Adds the tasks/agent.py file to the list of files used to compute the global omnibus cache.

Datadog Cluster Agent

Prelude

Released on: 2026-04-23 Pinned to datadog-agent v7.78.1: CHANGELOG.

Bug Fixes
  • Fixed a Cluster Agent issue where container-targeted APM library injection could mount a tracing library into all application containers in a pod instead of only the annotated container.

Agent

Prelude

Released on: 2026-04-15

Upgrade Notes
  • APM OTLP: Changed attribute precedence behavior when looking up OpenTelemetry semantic convention attributes that have multiple equivalent keys (e.g., http.status_code vs http.response.status_code, deployment.environment vs deployment.environment.name).

    Previous behavior: When both old and new semantic convention keys existed, the lookup would check ALL keys in span attributes before checking ANY key in resource attributes. So whichever key appeared in span attributes would win, regardless of which key was in resource attributes.

    New behavior: The lookup now uses a per-concept precedence order. For each semantic concept, the registry defines an ordered list of attribute keys; the first key that has a value is returned. The precedence order (which key takes priority) depends on the concept and may prefer either the newer or the older convention key. Span vs resource precedence (which map is checked first) is unchanged and still depends on the function.

    Who is affected: This change only affects users who have the same concept represented by different convention-version keys in span vs resource attributes. The returned value may now come from a different key than before, according to the concept's precedence order.

    This is an uncommon configuration since most instrumentation libraries use consistent semantic convention versions across span and resource attributes.
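
The difference between the previous and new lookup can be sketched in Python as a change in loop nesting. This is one reading of the description above, not the trace agent's code: the registry contents, the concept name "http.status", the key order, and the choice to check span attributes before resource attributes are all assumptions made for illustration.

```python
# Hypothetical registry: concept -> ordered list of equivalent keys.
PRECEDENCE = {
    "http.status": ["http.response.status_code", "http.status_code"],
}


def lookup_previous(concept, span_attrs, resource_attrs):
    """Previous behavior: try every key in one map before the next map."""
    for attrs in (span_attrs, resource_attrs):
        for key in PRECEDENCE[concept]:
            if key in attrs:
                return attrs[key]
    return None


def lookup_new(concept, span_attrs, resource_attrs):
    """New behavior: walk keys in precedence order; first key with a value wins."""
    for key in PRECEDENCE[concept]:
        for attrs in (span_attrs, resource_attrs):
            if key in attrs:
                return attrs[key]
    return None


# Different convention versions in span vs resource attributes.
span = {"http.status_code": 200}
resource = {"http.response.status_code": 500}
print(lookup_previous("http.status", span, resource))
print(lookup_new("http.status", span, resource))
```

When span and resource attributes use the same convention version, both functions return the same value, which matches the note that most users are unaffected.
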

New Features
  • Allows the Agent to get an API key in exchange for an AWS cloud authorization proof. This allows you to use your AWS credentials against Datadog and removes the need for you to manage an API key. More details can be found here: https://docs.datadoghq.com/account_management/cloud_provider_authentication/

  • The autoscaling vertical controller now supports in-place vertical pod resizing.

  • Add a new configuration provider, which schedules new instances of KSM checks to generate metrics from CustomResourceDefinitions.

    This new provider works with the kube_crd listener which listens for CustomResourceDefinitions created on the cluster and triggers a new autodiscovery-service for each one.

    This new configuration provider must use the standard kubernetes GroupVersionKind format in its AdvancedADIdentifier section to apply to a matching CustomResourceDefinition.

    The rest of the configuration is a standard KSM configuration instance.

  • CNM - Add 7 per-connection TCP congestion signals: rto_count (RTO loss events), recovery_count (fast recovery events), reord_seen (send-side reordering), rcv_ooopack (receive-side out-of-order packets), delivered_ce (ECN CE-marked segments), ecn_negotiated (ECN negotiation status), and probe0_count (zero-window probes). Collected via eBPF on CO-RE and runtime-compiled tracers, Linux only.

  • dd-procmgrd can now read process definitions and manage child process lifecycles with graceful shutdown.

  • dd-procmgrd now supervises managed processes with configurable restart policies, exponential backoff, and burst limiting.

  • dd-procmgrd can now manage the DDOT (Datadog Distribution of OpenTelemetry) collector process via a dual-mode mechanism. When a processes.d/datadog-agent-ddot.yaml config is present, dd-procmgrd takes over DDOT lifecycle management; otherwise the existing systemd unit manages it directly.

  • Automatic SBOM generation for running containers via system-probe

  • Runtime usage tracking - identifies which files and packages are actively accessed by running processes

  • Security enrichment - flags SUID binaries and processes running as root

  • gRPC streaming from system-probe to core agent for efficient SBOM forwarding

  • Automatic CWS policy generation based on running container SBOMs.

  • On Windows, the APM SSI installer now automatically enables system-probe to report injection telemetry from the ddinjector driver.

  • Kubernetes pod check annotations: Invalid JSON in pod check annotations (ad.datadoghq.com/<container>.checks) now produces a clear error message in the "Configuration Errors" section of agent status. A new CLI command agent validate-pod-annotation validates annotation JSON from a file or stdin and exits with an error on invalid syntax, so you can catch mistakes before applying annotations to pods.
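
For readers who want a quick pre-check of annotation values outside the Agent, the following Python sketch performs a JSON syntax check similar in spirit to what the note above describes. It is not the implementation of agent validate-pod-annotation; the function name and the sample annotation contents are hypothetical, and only JSON syntax is checked, not the check configuration itself.

```python
import json


def validate_annotation(text):
    """Return (ok, message) for a pod check annotation value."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    return True, "valid JSON"


# Made-up annotation values; the second has a trailing comma.
good = '{"http_check": {"instances": [{"name": "web", "url": "http://%%host%%"}]}}'
bad = '{"http_check": {"instances": [{"name": "web"},]}}'

print(validate_annotation(good))
print(validate_annotation(bad))
```
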

Enhancement Notes
  • The agent now supports explicitly set cluster names that start with a digit or contain underscores.
  • Add source and provider fields to rtloader API and add integration_security configuration properties.
  • secrets-generic-connector: Allow configuration of the X-Vault-AWS-IAM-Server-ID header for the HashiCorp Vault AWS authentication method. This helps prevent different types of replay attacks.
  • APM: When a 403 is received from the backend, trigger an API Key refresh, and retry the payload submission.
  • Secret Generic Connector: The Azure Key Vault backend now supports Service Principal authentication with client secret or client certificate, in addition to Managed Identity. Credentials are configured under the azure_session block (azure_tenant_id, azure_client_id, azure_client_secret or azure_client_certificate_path).
  • Agents are now built with Go 1.25.8.
  • dd-procmgr: Add CLI for the dd-procmgrd process manager. Processes are addressable by name or UUID.
  • dd-procmgrd: Add gRPC server over Unix socket with read-only RPCs (List, Describe, GetStatus) for querying managed process state.
  • dd-procmgrd: Add multi-process startup ordering via after/before config fields with topological sort and reverse shutdown order.
  • dd-procmgrd: Add write RPCs (Create, Start, Stop, ReloadConfig, GetConfig) for runtime control of managed processes.
  • The disk check now falls back to lsblk when blkid fails or returns no labels for disk label tagging. This ensures label and device_label tags are present on disk metrics even when the agent runs as a non-root user, since lsblk reads from sysfs and does not require elevated privileges.
  • Document kubernetes_use_endpoint_slices flag
  • Add X-Datadog-Additional-Tags header with hostname and agent version to data-streams-message HTTP requests.
  • DSM: The kafka_actions check now automatically inherits Schema Registry configuration (URL, credentials, TLS, OAuth) from the kafka_consumer integration, enabling schema registry support without additional configuration.
  • DDOT now sets deployment_type on the Datadog extension to daemonset by default, or gateway when Gateway mode is enabled.
  • The podman_db_path configuration option now accepts a comma-separated list of paths to support monitoring containers from multiple users simultaneously (e.g. root and rootless users). Example: podman_db_path: "/var/lib/containers/storage/db.sql,/home/myuser/.local/share/containers/storage/db.sql". When podman_db_path is not set, the Agent automatically discovers Podman databases for the root user and for all users under /home/. Log collection (logs_config.use_podman_logs) is also updated to work correctly with both explicit multi-path configuration and auto-discovery.
  • FIPS variants of the ddot-collector and agent -full images are now published.
  • Remote Agent Management is now enabled by default on FIPS environments when Remote Configuration is explicitly enabled.
  • The resource discovery agent (system-probe-lite) now wraps system-probe, acting as a loader for it. system-probe-lite will automatically fallback to system-probe when one of the following is true:
    • discovery.enabled is set to false.
    • discovery.useSystemProbeLite is set to false (the default).
    • Any other non-discovery feature of system-probe is enabled.
  • Bumped the Security Agent policies to v0.78.0
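
The dd-procmgrd startup ordering mentioned in this list (after/before fields, topological sort, reverse shutdown order) can be illustrated with a minimal sketch. This is not dd-procmgrd's implementation: the process names and the dictionary layout are hypothetical, and the sort relies on Python's standard graphlib module.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical process table; only the "after"/"before" field names
# come from the release note.
PROCESSES = {
    "core-agent": {},
    "trace-agent": {"after": ["core-agent"]},
    "system-probe": {"before": ["core-agent"]},
}


def startup_order(processes):
    """Return process names so that every dependency starts first."""
    sorter = TopologicalSorter()
    for name, spec in processes.items():
        # "after: [x]" means x must start before this process.
        sorter.add(name, *spec.get("after", []))
        # "before: [y]" means this process must start before y.
        for later in spec.get("before", []):
            sorter.add(later, name)
    # static_order() raises CycleError on circular dependencies.
    return list(sorter.static_order())


start = startup_order(PROCESSES)
stop = list(reversed(start))  # shut down in reverse startup order
print("start:", start)
print("stop: ", stop)
```

A circular definition (for example two processes that each list the other under after) surfaces as a CycleError rather than an arbitrary order.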
Security Notes
  • The CMD API gRPC server is now configured to require client certificates (mTLS).
Bug Fixes
  • APM: Fix an issue where SQL stats group resources longer than 5000 characters were truncated before obfuscation, causing the trace-agent to fail to parse mid-token fragments and log an error instead of correctly obfuscating the query.

  • Use atomic file replacement (write to temp file then rename) when writing APM workload selection policy files, preventing concurrent readers from seeing partially-written data.

  • Fixed a race condition in the logs auditor where Flush() could write a stale registry to disk during a transport restart. The auditor now drains all pending payloads from its input channel before flushing, ensuring file offsets are up to date and reducing duplicate log processing after a TCP-to-HTTP transport switch.

  • [DBM] Bump go-sqllexer to v0.2.1 to fix the following bugs:

    • Fixes table name metadata extraction to correctly collect all table names from comma-separated table lists (e.g., SELECT * FROM t1, t2).
  • The diagnose command now returns an error if an API key is not configured.

  • Fixes a panic that occurred when KSM Core was run as a cluster check with advanced dispatching disabled.

  • Fix support of Kafka actions for configurations where kafka_connect_str is a list.

  • Fixed a bug in the disk Go check (diskv2) where partition enumeration could hang indefinitely on Windows when an orphaned or offline volume is present on the system. The check now applies the configured timeout (default 5s) to partition discovery and guards against spawning duplicate goroutines on subsequent check runs, preventing permanent worker starvation, goroutine buildup, and high CPU utilization.

  • The process check now reports the correct container host type on ECS Managed Instances when the agent runs as a daemon.

  • Fixed kafka actions failing to match the local kafka_consumer integration when the bootstrap_servers tag exceeds the 200-character backend tag limit. Long broker lists (e.g. 3+ MSK brokers) are now truncated to match the backend's tag normalization.

  • APM: Fix base_service tag being missed on a subset of APM stats matching span.kind=server.

  • Fix kube_distribution tag value detection logic by analyzing node system info first.

  • Fixed a memory leak in the kubernetes_state_core check caused by orphaned reflector goroutines in the KSM store during rebuilds. This led to unbounded memory growth and potential OOM kills.

  • The Go network v2 check now correctly monitors the host network namespace when running in a container, similar to the Python version's behavior.

  • Fixes system.net.* metrics when the Agent runs in Docker with the host's procfs mounted (for example /host/proc with host PID namespace). The Go network check (network v2) now reads /proc/1/net/dev under that mount so interface stats match the host; previously /proc/net/dev could resolve in the container network namespace and report wrong or missing traffic (regression in Agent 7.73+).

  • Fixed a race condition in the workloadmeta process collector where a containerized process could be permanently stuck with an empty container ID if it was collected before the container runtime reported the PID-to-container mapping.

  • Fixed a bug in the kubeapiserver check where the eventText length was reported as 0 when it did not fit in the event bundle.

  • The API server now logs errors from srv.Serve that were previously silently discarded.

  • When a multiline log processing rule has a pattern that never matches, the logs agent now sends lines individually instead of joining all lines into a single oversized message. Normal multiline aggregation begins once the pattern matches for the first time.

  • Fixed the network check (v2) ignoring the combine_connection_states configuration option. When set to false, the check now emits granular per-state TCP metrics (e.g. system.net.tcp4.close_wait, system.net.tcp4.syn_sent) instead of only the combined ones (e.g. system.net.tcp4.closing, system.net.tcp4.opening), restoring parity with the previous Python-based network check.

  • Fixes a bug in the Network Configuration Management (NCM) module where the SSH Timeout settings were parsed as nanoseconds instead of seconds. This issue caused SSH sessions to time out prematurely, leading to errors like:

    Error running check: failed to connect to 192.168.0.1:22: dial tcp 192.168.0.1:22: i/o timeout
  • Fixed the Datadog Agent installer on Windows: when DD_PRIVATE_ACTION_RUNNER_ENABLED=true is set without an explicit DD_PRIVATE_ACTION_RUNNER_ACTIONS_ALLOWLIST, the Private Action Runner now defaults to com.datadoghq.script.runPredefinedPowershellScript on Windows and com.datadoghq.script.runPredefinedScript on Linux/macOS.

  • Preserve odbc.ini and odbcinst.ini across Fleet Automation upgrades on Linux.

  • Add missing node name to the manifests for Kubernetes resources in the OTEL logs agent exporter.

  • With systemd, the system-probe service now checks environment variables for configuration even if system-probe.yaml does not exist.

  • Fixed an issue on Windows where Cloud Network Monitoring reported TCP failure rates greater than 100%. The Windows kernel driver can report a TCP failure (reset, timeout, or refused connection) without also setting the flow-closed flag. The agent now correctly marks any connection with a TCP failure as closed.

  • Fixed discovery of Windows processes to identify reused PIDs between process snapshots and correctly track these processes.
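
The atomic file replacement used for the APM workload selection policy files (write to a temp file, then rename) follows a common pattern. The sketch below shows that pattern in general terms; the function name and file names are hypothetical and this is not the Agent's code.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write bytes to path so readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the destination directory so the rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # rename over the destination
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the destination is only ever swapped by a rename, a concurrent reader never opens a half-written file.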

Other Notes
  • The agent status output and process-agent endpoint list now display only the last 4 characters of the API key (previously 5), aligning with the Datadog UI.
  • Added functions to support delegated authentication with the agent in order to exchange AWS proofs for API keys for use by the agent. This does not actually enable this functionality yet.
  • Add metric origin for Dell Powerflex. Fix metric origins for Control-M and Prefect.

Datadog Cluster Agent

Prelude

Released on: 2026-04-15 Pinned to datadog-agent v7.78.0: CHANGELOG.

New Features
  • Added an admission controller connectivity probe that periodically verifies the admission webhook is reachable from the Kubernetes API server. When a connectivity issue is detected, the probe logs environment-specific guidance for EKS, GKE, and AKS. Probe results are visible in the agent status output under the Admission Controller section. The probe is disabled by default and can be enabled by setting admission_controller.probe.enabled to true. The probe uses dry-run ConfigMap creation requests in the cluster agent's namespace.
  • Add Remote Configuration status section to datadog-cluster-agent status output and flares. This displays whether RC is enabled for the organization, whether the API key is authorized for Remote Configuration, and any last errors, matching the node agent's existing behavior.
Enhancement Notes
  • Configurable support for TLS communication between the sidecar Agent and the Cluster Agent via the agent-sidecar mutation webhook. Requires elevated permissions for Cluster Agent to copy the certificate authority to the target namespace as a secret.
  • Single Step Instrumentation volumes are now mounted as read-only to prevent accidental writes to SSI artifacts.

Agent

Prelude

Released on: 2026-04-08

Bug Fixes
  • Fixes an issue where Cloud Network Monitoring would not resolve NAT'd cluster IPs when using Cilium to replace kube-proxy.

Datadog Cluster Agent

Prelude

Released on: 2026-04-08 Pinned to datadog-agent v7.77.3: CHANGELOG.

Agent

Prelude

Released on: 2026-04-01

Enhancement Notes
  • Hide the GUI app by default for macOS Agent per-user installs.
  • Windows: Add PAR self-enrollment to installer.
Bug Fixes
  • Fixes Workload Protection raw-packet eBPF programs when multiple packet filters are compiled together. The generated assembly reused register R8 both as the event pointer expected by the filter chain and to hold immediate values, which corrupted the pointer and caused the kernel BPF verifier to reject the program. The code now uses a separate register for those immediates so the pointer is preserved across filters.
  • Workload Protection: resolves an issue in in-kernel cgroup tracking, enabling packet filtering to be correctly applied to containers.

Datadog Cluster Agent

Prelude

Released on: 2026-04-01 Pinned to datadog-agent v7.77.2: CHANGELOG.

Agent

Prelude

Released on: 2026-03-24

Enhancement Notes
  • Agents are now built with Go 1.25.8.
Bug Fixes
  • Fixed a bug introduced in 7.77.0 that prevents system-probe from starting on Fargate environments when Workload Protection is enabled
  • Fixed a command injection vulnerability in the Private Action Runner's inline PowerShell script execution. Parameter values are now assigned as PowerShell single-quoted string literals in a preamble instead of being substituted directly into the script body, preventing arbitrary code execution via crafted parameter inputs.
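
The preamble approach described in the Private Action Runner fix can be sketched as follows. The helper names and the preamble layout are hypothetical, not the runner's actual code. The sketch relies on two PowerShell quoting rules: inside a single-quoted string the only escape is a doubled quote, and typographic single quotes are treated like the straight one.

```python
import re

# Straight and typographic single quotes; PowerShell accepts all of them as
# single-quote delimiters, so each one is doubled inside the literal.
_SINGLE_QUOTES = re.compile("['\u2018\u2019\u201a\u201b]")
_PARAM_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def ps_literal(value):
    """Render value as a PowerShell single-quoted string literal."""
    return "'" + _SINGLE_QUOTES.sub(lambda m: m.group(0) * 2, value) + "'"


def build_preamble(params):
    """Build variable assignments to prepend to a script (hypothetical layout)."""
    lines = []
    for name, value in params.items():
        if not _PARAM_NAME.fullmatch(name):
            raise ValueError(f"invalid parameter name: {name!r}")
        lines.append(f"${name} = {ps_literal(value)}")
    return "\n".join(lines)


# A crafted value stays inert: it ends up inside one string literal.
print(build_preamble({"target": "srv01'; Remove-Item C:\\ -Recurse; '"}))
```

The script body then refers to $target as a variable, so the parameter text is never parsed as code.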

Datadog Cluster Agent

Prelude

Released on: 2026-03-24 Pinned to datadog-agent v7.77.1: CHANGELOG.

Agent

Known Issues
  • A bug introduced in this release prevents system-probe from starting on Fargate environments when Workload Protection is enabled. There is currently no workaround and the recommendation at this time is to downgrade to Agent v7.76.3 or upgrade to v7.77.1 when it becomes available.
Prelude

Released on: 2026-03-18

Upgrade Notes
  • APM OTLP: The datadog.* namespaced span attributes are no longer used to construct Datadog span fields. Previously, attributes like datadog.service, datadog.env, and datadog.container_id were used to directly set corresponding Datadog span fields. This functionality has been removed and the Agent now relies solely on standard OpenTelemetry semantic conventions.

    Exceptions:

    The configuration option otlp_config.traces.ignore_missing_datadog_fields (and corresponding environment variable DD_OTLP_CONFIG_IGNORE_MISSING_DATADOG_FIELDS) is deprecated and no longer has any effect. The Agent now always uses standard OTel semantic conventions.

    Migration: If you were using datadog.* attributes, switch to the standard OpenTelemetry semantic conventions:

    • datadog.service → service.name
    • datadog.env → deployment.environment.name (OTel 1.27+) or deployment.environment
    • datadog.version → service.version
    • datadog.container_id → container.id

    Who is affected: Users who explicitly set datadog.* attributes (other than datadog.host.name and datadog.container.tag.*) in their OpenTelemetry instrumentation to override default field mappings. Users relying solely on standard OpenTelemetry semantic conventions are not affected.
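
For instrumentation that builds attribute dictionaries, the migration above can be expressed as a small rename helper. This is an illustrative sketch: the function name is hypothetical, and letting an already-set standard attribute win over the legacy one is a choice made for this example, not documented Agent behavior.

```python
# Mapping taken from the migration list above.
MIGRATION = {
    "datadog.service": "service.name",
    "datadog.env": "deployment.environment.name",  # OTel 1.27+
    "datadog.version": "service.version",
    "datadog.container_id": "container.id",
}


def migrate_attributes(attrs):
    """Return a copy of attrs with datadog.* keys renamed to OTel conventions."""
    migrated = {k: v for k, v in attrs.items() if k not in MIGRATION}
    for old_key, new_key in MIGRATION.items():
        # Assumption of this sketch: an explicit standard attribute wins.
        if old_key in attrs and new_key not in migrated:
            migrated[new_key] = attrs[old_key]
    return migrated


print(migrate_attributes({"datadog.service": "checkout", "datadog.env": "prod"}))
```

Keys outside the mapping, including datadog.host.name and datadog.container.tag.*, pass through unchanged.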

New Features
  • Add dd-procmgrd, a minimal Rust daemon for the Datadog process manager. The daemon starts, logs, and waits for a shutdown signal. It does not provide user-facing functionality.
  • Add a new listener based on all Custom Resource Definitions (CRDs) found on the cluster.
  • Logs pipeline failover: Added automatic failover capability to prevent log loss when compression blocks pipelines. When a pipeline becomes blocked during compression, log messages are automatically routed to healthy pipelines. N router channels (one per pipeline) distribute tailers via round-robin, each with its own forwarder goroutine that handles failover independently across all pipelines. Enable with logs_config.pipeline_failover.enabled: true (default: false). When all pipelines are blocked, backpressure is applied to prevent data loss.
  • The system memory check on Linux can now collect memory pressure metrics from /proc/vmstat to help detect memory pressure before OOM events occur. To enable, set collect_memory_pressure: true in the memory check configuration. New metrics: system.mem.allocstall (with zone tag), system.mem.pgscan_direct, system.mem.pgsteal_direct, system.mem.pgscan_kswapd, system.mem.pgsteal_kswapd.
  • APM: Add support for span-derived primary tags in APM stats aggregation. This allows configuring tag keys via apm_config.span_derived_primary_tags that will be extracted from span tags and used as additional aggregation dimensions for APM statistics.
  • APM: Add initial support for converting trace payload formats to the new "v1.0" format. This feature is disabled by default but can be enabled by adding the feature flag "convert-traces" to apm_config.features. It is not recommended to use this flag without direction from Datadog Support.
  • Integrate the Private Action Runner into the Datadog Cluster Agent.
  • The Private Action Runner (PAR) now runs in the Datadog Cluster Agent with improved identity management for Kubernetes environments. PAR identity (URN and private key) is now stored in a Kubernetes secret and shared across all DCA replicas using leader election. The leader replica handles enrollment and secret creation, while follower replicas wait for and read the shared identity. This enables multiple DCA replicas to execute PAR tasks using a single cluster identity, eliminating the need for per-replica enrollment.
  • Add a Windows PowerShell example config for private action runner scripts.
  • APM: Add image_volume-based library injection as an alternative to init containers and csi driver (experimental). Available only for Kubernetes 1.33+. This provides faster pod startup.
  • Autodiscovery template variables are now supported in ad.datadoghq.com/tags and ad.datadoghq.com/<container>.tags Kubernetes pod annotations. Template variables are resolved at runtime, enabling dynamic tagging based on pod and container metadata. This allows centralized tag configuration that applies to all checks, logs, and traces without hardcoding pod-specific values.
  • Start the Windows Private Action Runner service alongside the Agent when private_action_runner.enabled is set in datadog.yaml.
  • On Windows, the private action runner binary is now included in the MSI installer and registered as the datadog-agent-action Windows service. The service is installed as demand-start with a dependency on the main Agent service, and its credentials and ACLs are managed alongside the other Agent services during install, upgrade, and repair.
  • Add runPredefinedPowershellScript action to the Private Action Runner on Windows. This action allows running predefined PowerShell scripts (inline or file-based) with optional parameter templating, JSON schema parameter validation, environment variable allowlisting, configurable timeouts, and a 10 MB output limit.
  • On Windows, the Agent stops the private action runner service during MSI upgrades and fleet-driven stop-all operations so it is shut down alongside the Agent.
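
To make the memory pressure metrics in this list more concrete, the sketch below maps /proc/vmstat counters to the metric names given in the note. The sample text, the function name, and the zone:<name> tag format are assumptions for illustration; the check's real parsing may differ.

```python
# Sample /proc/vmstat content (values are made up).
SAMPLE_VMSTAT = """\
allocstall_dma 0
allocstall_normal 12
pgscan_kswapd 3400
pgscan_direct 150
pgsteal_kswapd 3300
pgsteal_direct 140
nr_free_pages 81234
"""

COUNTERS = {"pgscan_direct", "pgsteal_direct", "pgscan_kswapd", "pgsteal_kswapd"}


def memory_pressure_metrics(vmstat_text):
    """Yield (metric_name, value, tags) tuples from /proc/vmstat content."""
    for line in vmstat_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        name, value = fields[0], int(fields[1])
        if name.startswith("allocstall_"):
            # Per-zone counters become one metric with a zone tag.
            zone = name[len("allocstall_"):]
            yield ("system.mem.allocstall", value, [f"zone:{zone}"])
        elif name in COUNTERS:
            yield (f"system.mem.{name}", value, [])


metrics = list(memory_pressure_metrics(SAMPLE_VMSTAT))
for metric in metrics:
    print(metric)
```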
Enhancement Notes
  • The Agent's embedded Python has been upgraded from 3.13.11 to 3.13.12.

  • Add ntp.offset metric with source:intake tag to monitor clock drift using Datadog intake server timestamps. Original ntp.offset metric calculated from an NTP server is now tagged source:ntp.

  • As of Kubernetes version 1.33, the Endpoint API object has been deprecated in favor of EndpointSlice. Autodiscovery now supports the use of an EndpointSlice listener and provider to collect endpoint checks. To enable this feature, set kubernetes_use_endpoint_slices to true in your Datadog Agent configuration.

  • Add bucket label to image_resolution_attempts telemetry to track gradual rollout progress.

  • Added a private action runner bundle that exposes the Network Path traceroute functionality through the getNetworkPath action.

  • Sends telemetry for synthetics tests run on the agent, including checks received, checks processed, and error counts for test configuration, traceroute, and event platform result submission.

  • Added support for two new configurations for tag-based gradual rollout in Kubernetes SSI deployments. The gradual rollout can be configured using the following parameters:

    • DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_GRADUAL_ROLLOUT_ENABLED: Whether to enable gradual rollout (default: true)

    • DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_GRADUAL_ROLLOUT_CACHE_TTL: The cache TTL duration for the gradual rollout image cache (default: 1h)

      • This cache is used to store the mapping of mutable tags to image digest for the gradual rollout, and setting this TTL helps prevent the image resolution from becoming stale.
  • Agent metrics now include a connection_type tag with a value of tcp, uds, or pipe for lib-to-agent communications.

  • Automatically collect the team tag when a Kubernetes resource has a team label or annotation and explicit team tag extraction is not configured.

  • Enables the agent to support built-in credentials like IRSA for AWS cloud environments.

  • Bump go-sqllexer to v0.1.13, improving SQL obfuscation performance and fixing incorrect tokenization of multi-byte UTF-8 characters (e.g., CJK characters, full-width punctuation).

  • Agents are now built with Go 1.25.7.

  • NDM: Cisco SD-WAN interface metadata now includes the is_physical field to distinguish physical from virtual interfaces (loopback, tunnel). cEdge interfaces also include the type field with the IANA interface type number.

  • In the Cluster Autoscaling controller, use Kubernetes client update instead of patch.

  • On ECS Managed Instances, detect hostname from IMDS when the agent runs in daemon mode.

  • On ECS Managed Instances with daemon scheduling, the agent uses ECS_CONTAINER_METADATA_URI_V4 environment variable as a fallback signal for v4 availability.

  • Expose a new metric kube_apiserver.api_resource that holds the name, kind, group, and version of all known cluster-wide (non namespaced) resources on the cluster.

  • Add new DDOT feature gate 'exporter.datadogexporter.DisableAllMetricRemapping' to disable all client-side metric remapping.

  • Increases the reliability of namespaceLabelsAsTags and namespaceAnnotationsAsTags for new pods by caching the last seen namespace metadata.

  • Added a new, optional configuration setting for journald logs: default_application_name. If set to a non-empty string, the value replaces "docker" as the default application name for container-based journald logs. If set to an empty string, the application name is determined from the systemd journal fields, as it is for all non-container-based journald logs.

  • Simplified location permission detection on macOS by removing the initial polling-based detection at app startup. Permission detection now happens only at the time of WLAN data collection.

  • Use the config flag 'request_location_permission' in the WLAN config to gate the location permission request on macOS.

  • Added the enable_otlp_container_tags_v2 feature flag, which may reduce the Agent's outgoing traffic when ingesting OTLP traces from containerized applications.

    However, the flag introduces some breaking changes:

    • container tags on the new spans can no longer be queried as span attributes (with @);
    • using the k8s.pod.uid attribute as a fallback container ID is no longer supported;
    • disabling the infraattributes processor in DDOT trace pipelines will prevent automatic container tag detection.
  • The datadog.yaml configuration file now includes a commented-out private_action_runner section on all platforms.

  • The Private Action Runner now supports Datadog's secret management features. It can now resolve secrets using the ENC[...] notation in configuration files, supporting all secret backends via secret_backend_type and secret_backend_config settings.

  • Private Action Runner now supports running as a Windows service via Service Control Manager (SCM).

  • Bumped the Security Agent policies to v0.77.0

  • SNMP interface metadata now includes type (IF-MIB ifType) and is_physical fields. The is_physical field is set to true for physical ethernet interface types (ethernetCsmacd, fastEther, fastEtherFX, gigabitEthernet).

  • Add support for unconnected UDP sockets in the SNMP corecheck. The check automatically falls back to unconnected UDP sockets if the connected UDP socket times out.

  • APM: Added a new health metric, datadog.trace_agent.receiver.payload_timeout, to track incoming trace payload timeouts caused by client connection closures or middleware timeouts.

  • Upgraded the Datadog Agent Windows installer from WiX 3 to WiX 5.

  • Reports telemetry from the Windows Injector, enabled by default. Disable this feature by setting injector.enable_telemetry=false in system-probe.yaml when running system-probe.

  • Add Windows version information to the Private Action Runner executable. The version info is now visible in Windows Explorer file properties.

  • Added a telemetry metric to track pending events in workloadmeta: "workloadmeta.pending_event_bundles".

  • Avoid blocking workloadmeta collectors when streaming events to remote agents.
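
The is_physical rule described for SNMP interface metadata in this list can be sketched as a lookup on the IF-MIB ifType value. The numeric codes below are the IANAifType-MIB values for the four names in the note; the function name and the returned dictionary shape are hypothetical.

```python
# ifType numbers from the IANAifType-MIB for the names listed in the note.
PHYSICAL_IF_TYPES = {
    6: "ethernetCsmacd",
    62: "fastEther",
    69: "fastEtherFX",
    117: "gigabitEthernet",
}


def interface_metadata(if_type):
    """Return the type and is_physical fields for an interface."""
    return {"type": if_type, "is_physical": if_type in PHYSICAL_IF_TYPES}


print(interface_metadata(6))   # ethernetCsmacd
print(interface_metadata(24))  # softwareLoopback
```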

Deprecation Notes
  • GPUm: renamed metrics gpu.process.{encoder,decoder}_utilization to gpu.process.{encoder,decoder}_active for consistency with the 'active' suffix in the rest of the GPUm metrics
Security Notes
  • Oracle check: PDB names in ALTER SESSION SET CONTAINER statements are now properly quoted to prevent SQL injection.
  • The Jetson integration now validates the tegrastats_path configuration option to prevent command injection. The path must be absolute and cannot contain shell metacharacters or whitespace.
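
The tegrastats_path rule above (absolute path, no shell metacharacters or whitespace) could be checked along these lines. The exact set of rejected characters is an assumption of this sketch, as is the function name; the integration's own validation may be stricter or looser.

```python
import posixpath
import re

# Assumed set of rejected characters: whitespace plus common shell metacharacters.
_FORBIDDEN = re.compile(r"""[\s;&|`$<>(){}\[\]*?!'"\\~#]""")


def validate_tegrastats_path(path):
    """Return path unchanged if it passes both checks, else raise ValueError."""
    if not posixpath.isabs(path):
        raise ValueError("tegrastats_path must be an absolute path")
    if _FORBIDDEN.search(path):
        raise ValueError("tegrastats_path contains whitespace or shell metacharacters")
    return path


print(validate_tegrastats_path("/usr/bin/tegrastats"))
```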
Bug Fixes
  • APM: Fix panic that could occur when decoding malformed v1.0 trace payloads.
  • APM: Correctly mark traces as probability sampled when using the trace V1 format.
  • APM: Fix issue where v1 trace writer might not flush traces during an agent shutdown.
  • The container and process discovery checks are now disabled when the process check is enabled for service discovery.
  • Detect correct launch type for ECS Managed Instances when running in daemon mode.
  • Fixed a minor but persistent memory leak in the logs endpoint diagnostic behavior.
  • Fixes an issue where agent check --flare created the checks directory with 0000 permissions, preventing check output files from being written. The directory is now created with 0750 permissions.
  • Changed integration log file behavior to delete and recreate instead of truncating. This should help prevent duplicate and missing logs from integrations.
  • Fixes rollout durations being computed from the ReplicaSet creation time. Because rollbacks reuse existing ReplicaSets, durations were shown as hours or days instead of the actual rollback time. The fix tracks revision annotation changes and resets the start time to the current time when a rollback is detected.
  • Oracle check: Fix a bug where custom queries accumulated metrics across iterations, causing metrics from earlier queries to be re-sent with each subsequent query in the same check run.
  • Oracle check: Fix potential panic in sendMetric when the sender or metric function cannot be resolved.
  • Oracle check: Fix custom query error accumulation so that type errors from earlier queries are no longer silently discarded.
  • Oracle check: Report a clear error when a custom query returns a NULL value for a metric column instead of an "UNKNOWN" type error message.
  • Oracle check: Detect column count mismatches in both directions (too many or too few) between custom query results and configured column mappings.
  • Oracle check: Remove redundant GetSender call in custom query handling in favor of the existing commit helper.
  • Oracle check: Replace per-call map allocations with switch statements in custom query metric helpers for improved performance.
  • Fixed a bug where log lines exactly at the logs_config.max_message_size_bytes limit (default 900KB) were incorrectly marked as truncated. This caused the ...TRUNCATED... marker to appear in logs that fit within the size limit, and incorrectly marked the subsequent log line as a truncated remainder. Additionally, improved truncation detection by extending the FrameMatcher interface to explicitly signal when content is truncated, ensuring consistent truncation state across the framer and handler components.
  • Fixes a bug in the admission controller webhook that allowed admission to re-run for pods that already had APM injection in image-volume mode.
  • Refined location permission checks to avoid unnecessary system prompt. Added prevention for possible installation conflict between per-user and system-wide installations.
  • Fix data race in opentelemetry-mapping-go/inframetadata.Reporter which could cause a crash with error message "concurrent map iteration and map write".
  • OTLP logs now support array type attributes. Arrays containing primitive values or nested maps are now correctly preserved in the log output.
  • Align Private Action Runner configuration keys and log guidance to the private_action_runner.* snake-case names.
  • Fix the private action runner PowerShell example config not being installed on Windows. The file is now correctly placed at C:\ProgramData\Datadog\private-action-runner\powershell-script-config.yaml.
  • Fix process collection to detect command line changes for processes with the same PID and creation time by hashing the command line.
  • Fixed a bug where tailing UTF-16 encoded log files (UTF-16-LE or UTF-16-BE) could produce mojibake (garbled text) when log lines exceeded the configured logs_config.max_message_size_bytes limit (default 900KB). The truncation was performed at the byte level without respecting 2-byte UTF-16 character boundaries, which could split a character in half and produce Unicode replacement characters (U+FFFD) after decoding. The framer now aligns the truncation limit to a 2-byte boundary for UTF-16 encodings, ensuring that truncated frames always contain valid UTF-16 data.
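
The UTF-16 truncation fix above comes down to rounding the byte limit down to an even number. The sketch below reproduces the failure and the alignment on a tiny string; the function name and the 5-byte limit are illustrative only.

```python
def aligned_limit(limit, encoding):
    """Round limit down to a 2-byte boundary for UTF-16 encodings."""
    if encoding.lower() in ("utf-16-le", "utf-16-be"):
        return limit - (limit % 2)
    return limit


data = "héllo".encode("utf-16-le")  # 10 bytes, 2 per character
limit = 5                           # an odd limit falls mid-character

naive = data[:limit]
aligned = data[:aligned_limit(limit, "utf-16-le")]

# The naive cut leaves a dangling byte that cannot be decoded cleanly.
print(repr(naive.decode("utf-16-le", errors="replace")))
# The aligned cut decodes without errors.
print(repr(aligned.decode("utf-16-le")))
```

This sketch only aligns to 2-byte code units, as the note describes; it does not treat 4-byte surrogate pairs specially.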
Other Notes
  • Add metrics origins for Pinot integration.

Datadog Cluster Agent

Prelude

Released on: 2026-03-18 Pinned to datadog-agent v7.77.0: CHANGELOG.

New Features
  • Add APM tracing instrumentation to the Datadog Cluster Agent for improved observability and debugging in production environments. When enabled, the Cluster Agent emits APM traces for cluster check dispatching and rebalancing operations, surfacing patch failures and rebalancing decisions as span tags.
Enhancement Notes
  • Reduce admission controller downtime during certificate rotation.
  • Add the ability to collect NodeClasses EKS Auto Mode custom resources (eks.amazonaws.com API group) by default.
  • Experimental: Adds support for collecting force-deleted pods in the orchestrator check using orchestrator_explorer.terminated_pods_improved.enabled.

Agent

Prelude

Released on: 2026-03-09

Security Notes
  • Bump github.com/cloudflare/circl to v1.6.3 to fix CVE-2026-1229.
  • Fixed a limited out-of-bounds memory read and DoS vulnerability in the Windows kernel driver while handling TLS traffic. A host is affected only if the ddnpm kernel driver service is running, which requires system_probe_config and network_config to be enabled. This configuration is not enabled by default. To query the service with PowerShell, run Get-Service ddnpm; with the command prompt, run sc query ddnpm.
Bug Fixes
  • Fixed IPv6 address matching logic that caused network traffic to be tracked incorrectly. Fixed failed classification of HTTP DELETE requests. Added additional memory handling and overflow safety checks.

Datadog Cluster Agent

Prelude

Released on: 2026-03-09 Pinned to datadog-agent v7.76.3: CHANGELOG.

Agent

Prelude

Released on: 2026-03-05

Bug Fixes
  • The infra_mode tag is now correctly added to system.cpu.user on Windows when infrastructure_mode is not set to "full", matching the behavior of the Linux cpu check.

Datadog Cluster Agent

Prelude

Released on: 2026-03-05 Pinned to datadog-agent v7.76.2: CHANGELOG.

Agent

Prelude

Released on: 2026-02-26

Security Notes
  • APM: On span tags, add obfuscation for ACL command.
Bug Fixes
  • Fixes a rare crash in the system-probe process caused by concurrent access to an internal LRU cache.
  • Fix a Windows file-permission issue that prevented workload selection policy files from being updated after the initial write.
  • Fixed a bug in the disk Go check (diskv2) where custom tags from one check instance would leak into metrics from other instances. Tags are now correctly isolated per instance.
  • GPU: ensure gpu.nvlink.speed metric is emitted in Blackwell or newer devices.

Datadog Cluster Agent

Prelude

Released on: 2026-02-26 Pinned to datadog-agent v7.76.1: CHANGELOG.

Agent

Prelude

Released on: 2026-02-23

Upgrade Notes
  • DDOT now submits Fleet Automation metadata through the upstream datadogextension, which is enabled by default. As a result, your DDOT configuration will now appear under the OTel Collector tab. If you configured otelcollector.converter.features, you may need to add the datadog feature to enable Fleet Automation, as DDOT Fleet Automation metadata is no longer submitted through the ddflareextension.
New Features
  • Allow users to filter agent check instances using a new --instance-id parameter, which filters by the instance hash found in the agent status.

  • Add privateactionrunner binary in Agent artifacts to allow running actions using the Agent, and enable running it on Linux. The binary is disabled by default. To enable it, set privateactionrunner.enabled: true in your configuration file.

  • Integration check failures are now automatically reported to the Agent Health Platform component when enabled via health_platform.enabled: true. This provides structured health issue tracking with:

    • Detailed error context including check name, error message, and configuration source
    • Actionable remediation steps for debugging check failures
    • Automatic issue resolution when checks recover
    • Integration with the health platform telemetry and reporting system

    This feature helps users proactively identify and troubleshoot integration issues across their fleet.

  • The Agent Profiling check now supports automatic Agent termination after flare generation when memory or CPU thresholds are exceeded. This feature is useful in resource-constrained environments where the Agent needs to be restarted after generating diagnostic information.

    Enable this feature by setting terminate_agent_on_threshold: true in the Agent Profiling check configuration. When enabled, the Agent uses its established shutdown mechanism to trigger graceful shutdown after successfully generating a flare, ensuring proper cleanup before exit.

    Warning: This feature will cause the Agent to exit. This feature is disabled by default and should be used with caution.

  • Experimental support for the ConfigSync HTTP endpoints over Unix sockets with agent_ipc.use_socket: true (defaults to false).

  • Implements the flare command for the otel-agent binary. Now you can run otel-agent flare directly in the otel-agent container to get OTel flares.

  • Adds system info metadata collection for macOS end-user devices.

  • Adds system info metadata collection for Windows end-user devices.

  • Added GPU runtime discovery support for ECS EC2 environments. The Datadog Agent can now detect GPU device UUIDs assigned to containers by extracting the NVIDIA_VISIBLE_DEVICES environment variable from the Docker container configuration. This enables GPU-to-container mapping for GPU metrics without requiring the Kubernetes PodResources API, which is not available in ECS environments.

  • After falling back to TCP, the Logs Agent periodically retries to establish HTTP and upgrades the connection once HTTP connectivity is available.

  • Container logs now include a LogSource tag indicating whether each log message originated from stdout or stderr. This applies to logs parsed via Docker and Kubernetes CRI runtimes.

  • Added paging file metrics to the Windows memory check for pagefile.sys usage.
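
The ECS GPU discovery note above describes reading GPU device UUIDs from the NVIDIA_VISIBLE_DEVICES environment variable in the Docker container configuration. A minimal sketch of that extraction step follows. The helper name gpuUUIDsFromEnv and the choice to keep only entries prefixed with GPU- are assumptions made for illustration, not the Agent's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// gpuUUIDsFromEnv scans a container environment given as KEY=VALUE strings
// and returns the GPU UUIDs listed in NVIDIA_VISIBLE_DEVICES.
// Entries that do not look like GPU UUIDs (for example "all", "none", or
// numeric device indexes) are skipped. Hypothetical helper for illustration.
func gpuUUIDsFromEnv(env []string) []string {
	const key = "NVIDIA_VISIBLE_DEVICES="
	var uuids []string
	for _, kv := range env {
		if !strings.HasPrefix(kv, key) {
			continue
		}
		// The value is a comma-separated list of devices.
		for _, dev := range strings.Split(strings.TrimPrefix(kv, key), ",") {
			dev = strings.TrimSpace(dev)
			if strings.HasPrefix(dev, "GPU-") {
				uuids = append(uuids, dev)
			}
		}
	}
	return uuids
}

func main() {
	env := []string{
		"PATH=/usr/bin",
		"NVIDIA_VISIBLE_DEVICES=GPU-aaaaaaaa-0000-0000-0000-000000000001,GPU-aaaaaaaa-0000-0000-0000-000000000002",
	}
	fmt.Println(gpuUUIDsFromEnv(env))
}
```

A container whose variable is set to a non-UUID value such as all yields an empty list in this sketch, so a caller would need another mapping source in that case.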

Enhancement Notes
  • Add a new global_view_db variable to AWS Autodiscovery templates. By default this is the value of the datadoghq.com/global_view_db tag on the instance or cluster.

  • Add NotReady endpoint processing to be on par with EndpointSlices processing.

  • The agentprofiling check now retries flare generation 2 times with exponential backoff (1 minute after first failure, 5 minutes after second failure) when flare creation or sending fails. This improves reliability when encountering transient failures during flare generation.

  • Adds a kubernetes_kube_service_new_behavior flag (default false) to alter kube_service tag behavior. If the flag is set to true, the kube_service tag is attached unconditionally. Previously, the tag was only attached when the Kubernetes service had the status Ready.

  • APM: Add custom protobuf encoder for trace writer v1 with string compaction to reduce payload size.

  • Extended the autodiscovery secret resolver to support refreshing secrets.

  • Agents are now built with Go 1.25.7.

  • The datadog-installer setup command now prints human-readable errors instead of mixing JSON and text.

  • Added GPUDeviceIDs field to the workloadmeta Container entity to store GPU device UUIDs. This field is populated by the Docker collector in ECS environments from the NVIDIA_VISIBLE_DEVICES environment variable (e.g., GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).

  • The GPU collector now uses GPUDeviceIDs from workloadmeta as the primary source for GPU-to-container mapping in ECS, with fallback to procfs for regular Docker environments and PodResources API for Kubernetes.

  • GPU: add new tag gpu_type to the GPU metrics to identify the type of GPU (e.g., a100, h100).

  • Improve eBPF conntracker support by using alternate probes when the primary probe is unavailable, enabling compatibility with GKE Autopilot and other environments running Google COS.

  • The logs.dropped metric now tracks dropped logs for both TCP and HTTP log transports. Previously, this metric was only available when using TCP transport. Customers can now monitor dropped logs with a single unified metric regardless of which transport protocol is configured, making it easier to detect and troubleshoot log delivery issues.

  • The logs agent now supports using start_position: beginning and start_position: forceBeginning with wildcard file paths. Previously, configurations like path: /var/log/*.log with start_position: beginning would fail validation. The agent's fingerprinting system, when enabled, prevents duplicate log reads during file rotation, making this combination safe to use.

  • Site config URLs are now lowercased for consistent handling.

  • APM: Add tags databricks_job_id, databricks_job_run_id, databricks_task_run_id, config.spark_app_startTime, config.spark_databricks_job_parentRunId to the default list of tags that are known to not be credit card numbers so they are skipped by the credit card obfuscator.

  • Add option to switch on/off Infra-Attribute-Processor for traces in the OTLP ingest pipeline.
    otlp_config:
      traces:
        infra_attributes:
          enabled: false

    These settings can be configured in the Agent config file or by using the environment variables.

  • The Datadog Agent now collects AWS Spot preemption events (requires IMDS access) as Datadog events.

  • Added network_config.dns_monitoring_ports, a list of ports on which Cloud Network Monitoring monitors DNS traffic.

  • Automatically tag, but don't aggregate, multiline logs. Logs are tagged with the number of other logs they could potentially be aggregated with.

  • Update the histogram helpers API in the pkg/opentelemetry-mapping-go/otlp/metrics package. The API now accepts pointers to the OTLP data points, and returns blank DDSketches when the pointer is nil.

  • Update image resolution attempt telemetry to include the tag specified in the configuration, and remove the registry and digest_resolution tags.

  • Windows: Add a new flare artifact agent_loaded_modules.json listing loaded DLLs with metadata (full path, timestamp, size, perms) and version info (CompanyName, ProductName, OriginalFilename, FileVersion, ProductVersion, InternalName). Keeps <flavor>_open_files.txt for compatibility.
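
The agentprofiling note above gives a retry schedule for flare generation: one wait of 1 minute after the first failure and one of 5 minutes after the second. The sketch below shows one way such a schedule could be expressed, using a fixed list of delays rather than a computed exponential factor. The names runWithRetries, noSleep, and errTransient are hypothetical, and the injected sleep function exists only so the schedule can be exercised without waiting.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryDelays mirrors the schedule in the release note: wait 1 minute after
// the first failure and 5 minutes after the second.
var retryDelays = []time.Duration{1 * time.Minute, 5 * time.Minute}

// errTransient is a placeholder error used by the demo below.
var errTransient = errors.New("transient failure")

// noSleep is a no-op sleeper for dry runs.
func noSleep(time.Duration) {}

// runWithRetries calls op once, then retries after each delay while op keeps
// failing. It returns the number of attempts made and the last error.
func runWithRetries(op func() error, delays []time.Duration, sleep func(time.Duration)) (attempts int, err error) {
	for i := 0; ; i++ {
		attempts++
		if err = op(); err == nil || i >= len(delays) {
			return attempts, err
		}
		sleep(delays[i])
	}
}

func main() {
	calls := 0
	// Hypothetical stand-in for flare generation: fails twice, then succeeds.
	flaky := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	attempts, err := runWithRetries(flaky, retryDelays, func(d time.Duration) {
		fmt.Println("would wait", d)
	})
	fmt.Println("attempts:", attempts, "error:", err)
}
```

With two delays configured, an operation that never succeeds is attempted three times in total (the initial attempt plus two retries) before the last error is returned.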

Deprecation Notes
  • The command agent diagnose show-metadata inventory-otel has been removed. To display DDOT metadata, you can query the datadog extension endpoint: http://localhost:9875/metadata.
Bug Fixes
  • Properly scrub sensitive information from Kubernetes pod specifications in agent flares. Environment variables with sensitive names are now redacted.
  • Fixed a bug where long Kubernetes event bundles were being truncated by dogweb.
  • APM: Fix a bug where the Agent would log a warning when the DD_APM_MODE environment variable was unset.
  • Properly parse the image_tag tag when defining a container spec that uses both an image tag and a digest like nginx:1.23@sha256:xxx.
  • Updates tag enrichment logic to retry on failed tag resolution attempts. This regression was introduced in #41587 on Agent v7.73+. Impacts origin detection on cgroup v2 runtimes with DogStatsD, which led to tags not being enriched, even if origin detection was possible by using other methods like container ID from socket or ExternalData.
  • Fixed a regression in the Go-native disk check (diskv2) where a failure in IO counter collection (e.g. ERROR_INVALID_FUNCTION from DeviceIoControl on Windows Server 2016) caused all disk metrics to be discarded, including successfully collected partition/usage metrics such as system.disk.total, system.disk.used, and system.disk.free. IO counter collection is now best-effort: known errors such as ERROR_INVALID_FUNCTION are logged at debug level, while unexpected errors are logged as warnings. Neither prevents partition metrics from being reported.
  • Fleet installer: ensure the DD_LOGS_ENABLED environment variable is honored again when running setup scripts, so that Windows installs using the new installer flow properly set logs_enabled in datadog.yaml.
  • Fixes a bug introduced in 7.73.0 that can cause a remote Agent update through Fleet Automation to fail to restore the previous version if the MSI fails and the C:\Windows\SystemTemp\datadog-installer\rollback\InstallOciPackages.json file is present.
  • Fix Flux API groups, split fluxcd.io into source.toolkit.fluxcd.io and kustomize.toolkit.fluxcd.io.
  • Fixes repetitive 'Could not make file tailer' warning logs when short-lived pods are terminated and the Agent attempts to create a file tailer for the deleted containers in a pod. The Agent no longer creates container services for pods that have been deleted and no longer have containers to tail.
  • GPU: MIG devices and parents are now reporting correct core and memory limits.
  • GPUm: fix gpu.memory.limit being duplicated in Hopper devices
  • Fixed the logs.sent metric for the HTTP log transport to no longer increment when logs are dropped due to non-retryable errors. This ensures more accurate reporting of successfully delivered logs.
  • Fix WLAN check failure on macOS systems.
  • Fix datadog.agent.check_ready to always include the check_name tag value for Python checks.
  • Rename kubernetes_kube_service_new_behavior to kubernetes_kube_service_ignore_readiness to better reflect the behavior.
  • Prevent a deadlock from occurring in the otel-agent when its internal telemetry Prometheus endpoint is scraped.
  • [oracle] Updates the oracle.d/conf.yaml.example file to include all supported SQL obfuscator options.
  • [DBM] Bump go-sqllexer to v0.1.12:
    • Fixes a normalization bug for Oracle queries with positional bind parameters.
    • Fixes a memory leak in the go-sqllexer package.
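
The image_tag fix above concerns container image references that carry both a tag and a digest, such as nginx:1.23@sha256:xxx. As a rough illustration of how such a reference can be split, the sketch below separates the digest first and then the tag. The function splitImageRef is hypothetical and simplified; it is not the Agent's parser.

```go
package main

import (
	"fmt"
	"strings"
)

// splitImageRef splits a container image reference into name, tag, and digest.
// It handles references that carry both a tag and a digest, such as
// "nginx:1.23@sha256:xxx". Simplified sketch for illustration only.
func splitImageRef(ref string) (name, tag, digest string) {
	// The digest, when present, is everything after the "@".
	if i := strings.Index(ref, "@"); i >= 0 {
		ref, digest = ref[:i], ref[i+1:]
	}
	// A tag follows the last ":" only if that colon comes after the last "/".
	// Otherwise the colon belongs to a registry host:port.
	if i := strings.LastIndex(ref, ":"); i > strings.LastIndex(ref, "/") {
		ref, tag = ref[:i], ref[i+1:]
	}
	return ref, tag, digest
}

func main() {
	name, tag, digest := splitImageRef("nginx:1.23@sha256:xxx")
	fmt.Printf("name=%s tag=%s digest=%s\n", name, tag, digest)
}
```

Removing the digest before looking for the tag matters here: the digest itself contains a colon, so searching for the last colon in the full reference would pick up the digest's separator instead of the tag's.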
Other Notes
  • Add metrics origins for battery integration.
  • Remove procps-ng and associated tools from Agent packages.

Datadog Cluster Agent

Prelude

Released on: 2026-02-23 Pinned to datadog-agent v7.76.0: CHANGELOG.

New Features
  • APM: Add apm_config.instrumentation.injection_mode configuration option to control APM library injection method. Possible values are auto (default), init_container, and csi. The auto mode automatically selects the best injection mode (currently uses init containers). The init_container mode is the legacy method that copies APM libraries into pods using init containers. The csi mode mounts APM libraries directly into pods using the Datadog CSI driver. It is experimental and requires Cluster Agent 7.76+ and the Datadog CSI driver.
  • APM: Add CSI-based library injection as an alternative to init containers (experimental). This provides faster pod startup and reduced storage overhead.
  • Reduced memory usage of compliance checks on large clusters
Enhancement Notes
  • Reduced memory usage when pod collection is enabled in the Cluster Agent.
Bug Fixes
  • When injection fails for Single Step Instrumentation due to constrained resources, we add an annotation to the pod with a reason for the error. This annotation now matches all other annotations the webhook writes to a pod spec by prefixing the annotation with internal. The full annotation is now: internal.apm.datadoghq.com/injection-error
Other Notes
  • Refactor the auto-instrumentation webhook's injectTracers function to use a modular, explicit mutation pattern. This improves code readability and maintainability. Edge case behavior may differ slightly, but overall functionality remains unchanged.

Agent

Prelude

Released on: 2026-02-17

Enhancement Notes
  • Agents are now built with Go 1.25.7.
Security Notes
  • APM: On span tags, add obfuscation for HELLO and MIGRATE Redis commands.
    Similar to AUTH, all arguments passed to these commands will be obfuscated and replaced with ?.
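
The security note above says that arguments to HELLO and MIGRATE are now hidden in the same way as AUTH. The sketch below illustrates the idea on a single command string. The function obfuscateRedisCommand is hypothetical, and collapsing all arguments into one ? is an assumption about the output shape, not a description of the trace agent's exact format.

```go
package main

import (
	"fmt"
	"strings"
)

// sensitiveRedisCommands lists the commands whose arguments are fully hidden.
var sensitiveRedisCommands = map[string]bool{
	"AUTH":    true,
	"HELLO":   true,
	"MIGRATE": true,
}

// obfuscateRedisCommand replaces every argument of a sensitive command with a
// single "?". Other commands, and sensitive commands without arguments, are
// returned unchanged. Hypothetical helper for illustration.
func obfuscateRedisCommand(cmd string) string {
	fields := strings.Fields(cmd)
	if len(fields) < 2 || !sensitiveRedisCommands[strings.ToUpper(fields[0])] {
		return cmd
	}
	return fields[0] + " ?"
}

func main() {
	for _, cmd := range []string{
		"HELLO 3 AUTH user secret",
		"MIGRATE host 6379 key 0 5000 AUTH secret",
		"GET user:42",
	} {
		fmt.Println(obfuscateRedisCommand(cmd))
	}
}
```

Both HELLO and MIGRATE accept an AUTH clause among their arguments, which is why hiding the whole argument list is the conservative choice.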

Datadog Cluster Agent

Prelude

Released on: 2026-02-17 Pinned to datadog-agent v7.75.4: CHANGELOG.

Agent

Prelude

Released on: 2026-02-11

Security Notes
  • Bump the version of envoyproxy/gateway to 1.5.7

Datadog Cluster Agent

Prelude

Released on: 2026-02-11 Pinned to datadog-agent v7.75.3: CHANGELOG.

Agent

Prelude

Released on: 2026-02-04

Upgrade Notes
  • Update OpenJDK to 11.0.30. This release includes changes that may negatively affect JMX integrations that use TLS. Refer to OpenJDK release notes for more information.
Bug Fixes
  • Disable the SNMP device scan by default.
  • Fixes a regression introduced in version 7.75 that caused Workload Protection File Integrity Monitoring to be disabled by default when installing the Datadog Agent via the Helm chart.
  • Fixes a bug introduced in Agent v7.74 where unresolved SSH sessions could cause Workload Protection events to be delayed for several minutes, potentially blocking the delivery of other Workload Protection events.
  • GPU: fix the metric type for the gpu.nvlink.*, gpu.pci.replay_counter, and gpu.remapped_rows.* metrics, which were reported as counters instead of gauges.

Datadog Cluster Agent

Prelude

Released on: 2026-02-04 Pinned to datadog-agent v7.75.2: CHANGELOG.
