Releases: DataDog/datadog-agent
7.38.2
Prelude
Release on: 2022-08-10
- Please refer to the 7.38.2 tag on integrations-core for the list of changes on the Core Checks
Bug Fixes
- Fixes a bug that caused the Agent to create many zombie (defunct) processes. This bug happened only with the 7.38.x Docker images when the containerized Agent was launched without hostPID: true.
7.38.1
Prelude
Release on: 2022-08-02
Bug Fixes
- Fixes CWS rules with 'process.file.name !=""' expression.
Datadog Cluster Agent 1.22.0
Prelude
Released on: 2022-07-26
Pinned to datadog-agent v7.38.0: CHANGELOG
New Features
- Enable collection of Ingresses by default in the orchestrator check.
7.38.0
Prelude
Release on: 2022-07-25
- Please refer to the 7.38.0 tag on integrations-core for the list of changes on the Core Checks
New Features
- Add NetFlow feature to listen to NetFlow traffic and forward it to Datadog.
- The CWS agent now supports filtering events depending on whether they are performed by a thread. A process is considered a thread if it's a child process that hasn't executed another program.
- Adds a diagnose datadog-connectivity command that displays information about connectivity issues between the Agent and Datadog intake.
- Adds support for tailing modes in the journald logs tailer.
- The CWS agent now supports writing rules on process termination.
- Add support for new types of CI Visibility payloads to the Trace Agent, so features that until now were Agentless-only are available as well when using the Agent.
Enhancement Notes
- Tags configured with DD_TAGS or DD_EXTRA_TAGS in an EKS Fargate environment are now attached to OTLP metrics.
- Add NetFlow static enrichments (TCP flags, IP Protocol, EtherType, and more).
- Report lines matched by auto multiline detection as metrics and show on the status page.
- Add a containerd_exclude_namespaces configuration option for the Agent to ignore containers from specific containerd namespaces.
- The log_level of the agent is now appended to the flare archive name upon its creation.
- The metrics reported by KSM core now include the tags "kube_app_name", "kube_app_instance", and so on, if they're related to a Kubernetes entity that has a standard label like "app.kubernetes.io/name", "app.kubernetes.io/instance", etc.
- The Kubernetes State Metrics Core check now collects two ingress metrics: kubernetes_state.ingress.count and kubernetes_state.ingress.path.
- Move process chunking code to util package to avoid cycle import when using it in orchestrator check.
- APM: Add support for PostgreSQL JSON operators in the SQL obfuscate package.
- The OTLP ingest endpoint now supports the same settings and protocol as the OpenTelemetry Collector OTLP receiver v0.54.0 (OTLP v0.18.0).
- The Agent now embeds Python-3.8.13, an upgrade from Python-3.8.11.
- APM: Updated Rare Sampler default configuration values to sample traces more uniformly across environments and services.
- The OTLP ingest endpoint now supports Exponential Histograms with delta aggregation temporality.
- The Windows installer now supports grouped Managed Service Accounts.
- Enable https monitoring on arm64 with kernel >= 5.5.0.
- Add otlp_config.debug.loglevel to determine log level when the OTLP Agent receives metrics/traces for debugging use cases.
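The KSM core note above maps standard Kubernetes labels to kube_app_* tags. The sketch below shows one way such a mapping could be derived. Only the name and instance pairs are stated in the note; the general rule (strip the app.kubernetes.io/ prefix, replace hyphens with underscores) and the helper name standard_label_tags are assumptions for illustration, not the Agent's implementation.

```python
# Hypothetical helper: derive kube_app_* tags from standard Kubernetes labels.
# Only "name" -> kube_app_name and "instance" -> kube_app_instance are
# confirmed by the release note; the general rule below is an assumption.
STANDARD_LABEL_PREFIX = "app.kubernetes.io/"


def standard_label_tags(labels):
    """Return sorted "tag:value" strings for app.kubernetes.io/* labels."""
    tags = []
    for key, value in labels.items():
        if not key.startswith(STANDARD_LABEL_PREFIX):
            continue  # not a standard label, so no kube_app_* tag
        suffix = key[len(STANDARD_LABEL_PREFIX):].replace("-", "_")
        tags.append(f"kube_app_{suffix}:{value}")
    return sorted(tags)


labels = {
    "app.kubernetes.io/name": "checkout",
    "app.kubernetes.io/instance": "checkout-prod",
    "team": "payments",
}
print(standard_label_tags(labels))
```

Labels outside the app.kubernetes.io/ namespace, such as team in this example, are left out of the result.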
Deprecation Notes
- Deprecate otlp_config.metrics.instrumentation_library_metadata_as_tags in favor of otlp_config.metrics.instrumentation_scope_metadata_as_tags.
Bug Fixes
- When enable_payloads.series or enable_payloads.sketches are set to false, don't log the error Cannot append a metric in a closed buffered channel.
- Restrict permissions for the entrypoint executables of the Dockerfiles.
- Revert docker.mem.in_use calculation to use RSS Memory instead of total memory.
- Add missing telemetry metrics for HTTP log bytes sent.
- Fix panic in container, containerd, and docker when container stats are temporarily not available
- Fix prometheus check Metrics parsing by not enforcing a list of strings.
- Fix potential deadlock when shutting down an Agent with a log TCP listener.
- APM: Fixed trace rare sampler's oversampling behavior. With this fix, the rare sampler will sample rare traces more accurately.
- Fix journald byte count on the status page.
- APM: Fixes an issue where certain (#> and #>>) PostgreSQL JSON operators were being interpreted as comments and removed by the obfuscate package.
- Scrubs HTTP Bearer tokens out of log output
- Fixed the triggered "svType != tvType; key=containerd_namespace, st=[]interface {}, tt=[]string, sv=[], tv=[]" error when using a secret backend reader.
- Fixed an issue that made the container check show an error in the "agent status" output when it was working properly but there were no containers deployed.
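To illustrate the Bearer token scrubbing fix listed above, here is a minimal sketch of how a log line could be masked. The regular expression, the mask string, and the function name scrub_bearer are assumptions made for this example; the Agent's actual scrubber may differ.

```python
import re

# Assumed pattern: the word "Bearer" followed by a token built from the
# b64token character set of RFC 6750. The Agent's real scrubber may use
# a different expression and a different replacement.
BEARER_RE = re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]+=*")


def scrub_bearer(line):
    """Replace any Bearer token in a log line with a fixed mask."""
    return BEARER_RE.sub("Bearer ********", line)


print(scrub_bearer("GET /api Authorization: Bearer abc123.def-456=="))
```

Lines without a Bearer token pass through unchanged.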
Datadog Cluster Agent 1.21.0
Prelude
Released on: 2022-06-28
Pinned to datadog-agent v7.37.0: CHANGELOG
Enhancement Notes
- The Cluster Agent followers now forward queries to the Cluster Agent leaders themselves. This allows a reduction in the overall number of connections to the Cluster Agent and better spreads the load between leader and forwarders.
- Make the name of the ConfigMap used by the Cluster Agent for its leader election configurable.
- The Datadog Cluster Agent exposes a new metric endpoint_checks_configs_dispatched.
Bug Fixes
- Fix a panic occurring during the invocation of the check command on the Cluster Agent if the Orchestrator Explorer feature is enabled.
- Fix the node count reported for Kubernetes clusters.
Datadog Cluster Agent 1.20.0
Prelude
Released on: 2022-05-22
Pinned to datadog-agent v7.36.0: CHANGELOG
New Features
- The Datadog Admission Controller supports multiple configuration injection modes through the admission_controller.inject_config.mode parameter or the DD_ADMISSION_CONTROLLER_INJECT_CONFIG_MODE environment variable:
  - hostip: Inject the host IP. (default)
  - service: Inject Datadog's local-service DNS name.
  - socket: Inject the Datadog socket path.
- Collect ResourceRequirements for jobs and cronjobs for kubernetes live containers.
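As a rough illustration of the injection mode selection described above, the sketch below reads the mode from the environment variable and falls back to the documented default. The mode names and the default come from the note; the function name resolve_inject_mode and the choice to reject unknown values with ValueError are assumptions for this example.

```python
import os

# Mode names and default are taken from the release note; the error
# handling for unknown values is an assumption made for this sketch.
VALID_MODES = ("hostip", "service", "socket")
DEFAULT_MODE = "hostip"


def resolve_inject_mode(environ=None):
    """Return the injection mode selected through the environment."""
    environ = os.environ if environ is None else environ
    mode = environ.get("DD_ADMISSION_CONTROLLER_INJECT_CONFIG_MODE", DEFAULT_MODE)
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported injection mode: {mode!r}")
    return mode


print(resolve_inject_mode({"DD_ADMISSION_CONTROLLER_INJECT_CONFIG_MODE": "socket"}))
```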
Enhancement Notes
- Added a configuration option to admission controller to allow configuration of the failure policy. Defaults to Ignore which was the previous default. The default of Ignore means that pods will still be admitted even if the webhook is unavailable to inject them. Setting to Fail will require the admission controller to be present and pods to be injected before they are allowed to run.
- The admission controller's reinvocation policy is now set to IfNeeded by default. It can be changed using the admission_controller.reinvocation_policy parameter.
- The Datadog Cluster Agent now supports internal profiling.
- KSM core check: add a new kubernetes_state.cronjob.complete service check that returns the status of the most recent job for a cronjob.
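The difference between the Ignore and Fail failure policies described above can be summarized in a small decision function. This is a simplified model of the behavior stated in the note, not the Kubernetes API server's logic; the name admit_pod and its parameters are hypothetical.

```python
def admit_pod(failure_policy, webhook_available):
    """Model whether a pod is admitted under a webhook failure policy.

    Simplified sketch: a reachable webhook is assumed to succeed.
    """
    if failure_policy not in ("Ignore", "Fail"):
        raise ValueError(f"unknown failure policy: {failure_policy!r}")
    if webhook_available:
        return True  # the webhook can inject the pod, so it is admitted
    # Webhook unreachable: Ignore admits the pod without injection,
    # Fail rejects it until the admission controller is back.
    return failure_policy == "Ignore"


for policy in ("Ignore", "Fail"):
    print(policy, admit_pod(policy, webhook_available=False))
```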
Security Notes
- Cluster Agent API (only used by Node Agents) is now only served with TLS >= 1.3 by default. Setting "cluster_agent.allow_legacy_tls" to true allows falling back to TLS 1.0.
Bug Fixes
- Fix the node count reported for Kubernetes clusters.
- Fixed an issue that created lots of log messages when the DCA admission controller was enabled on AKS.
- Time-based metrics (for example, kubernetes_state.pod.age, kubernetes_state.pod.uptime) are now comparable in the Kubernetes state core check.
- Fix a risk of panic when multiple KSM Core check instances run concurrently.
- Remove noisy Kubernetes API deprecation warnings in the Cluster Agent logs.
Other Notes
- Change the default value of the external metrics provider port from 443 to 8443. This allows running the Cluster Agent as a non-root user for better security. This was already the default value in the Helm chart and in the Datadog Operator.
7.37.1
Prelude
Release on: 2022-06-28
Bug Fixes
- Fixes issue where proxy config was ignored by the trace-agent.
7.37.0
Prelude
Release on: 2022-06-27
- Please refer to the 7.37.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
- OTLP ingest: Support for the deprecated experimental.otlp section and the DD_OTLP_GRPC_PORT and DD_OTLP_HTTP_PORT environment variables has been removed. Use the otlp_config section or the DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT and DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT environment variables instead.
- OTLP: Deprecated settings otlp_config.metrics.report_quantiles and otlp_config.metrics.send_monotonic_counter have been removed in favor of otlp_config.metrics.summaries.mode and otlp_config.metrics.sums.cumulative_monotonic_mode respectively.
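Before upgrading, it can help to check whether a host still sets the removed OTLP environment variables. The sketch below uses the variable names from the upgrade note; the helper find_removed_otlp_vars is hypothetical and only reports names, since how a port value should translate to an endpoint value is not stated in the note.

```python
import os

# Removed variable -> replacement, as listed in the upgrade note.
REMOVED_ENV_VARS = {
    "DD_OTLP_GRPC_PORT": "DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT",
    "DD_OTLP_HTTP_PORT": "DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT",
}


def find_removed_otlp_vars(environ):
    """Return (removed, replacement) pairs for removed variables still set."""
    return [(old, new) for old, new in REMOVED_ENV_VARS.items() if old in environ]


for old, new in find_removed_otlp_vars(os.environ):
    print(f"{old} is no longer supported; use {new} instead")
```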
New Features
- Adds User-level service unit filtering support for Journald log collection via include_user_units and exclude_user_units.
- A wildcard (*) can be used in either exclude_units or exclude_user_units if only a particular type of Journald log is desired.
- A new troubleshooting section has been added to the Agent CLI. This section will hold helpers to understand the Agent behavior. For now, the section only has two commands to print the different metadata payloads sent by the Agent (v5 and inventory).
- APM: Incoming OTLP traces are now allowed to set their own sampling priority.
- Enable NPM NAT gateway lookup by default.
- Partial support of IPv6 on EKS clusters:
  - Fix the kubelet client when the IP of the host is IPv6.
  - Fix the substitution of %%host%% patterns inside the auto-discovery annotations: if the concerned pod has an IPv6 address and the %%host%% pattern appears inside a URL context, then the IPv6 address is surrounded by square brackets.
- OTLP ingest now supports the same settings and protocol version as the OpenTelemetry Collector OTLP receiver v0.50.0.
- The Cloud Workload Security agent can now monitor and evaluate rules on bind syscall.
- [corechecks/snmp] add scale factor option to metric configurations
- Evaluate memory.usage metrics based on collected metrics.
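The %%host%% substitution rule for IPv6 addresses described above can be sketched as follows. How the Agent detects a URL context is not stated in the note, so this example takes it as an explicit url_context flag; the helper names host_for_url and render are hypothetical.

```python
import ipaddress


def host_for_url(host):
    """Return the host formatted for a URL: IPv6 literals get brackets."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # not an IP literal, for example a hostname
    return f"[{host}]" if is_v6 else host


def render(template, host, url_context):
    """Substitute %%host%% in a template (hypothetical helper)."""
    value = host_for_url(host) if url_context else host
    return template.replace("%%host%%", value)


print(render("http://%%host%%:8080/metrics", "fd00::1", url_context=True))
```

Without the brackets, the colons of the IPv6 address would be ambiguous with the port separator in the URL.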
Enhancement Notes
- APM: DD_APM_FILTER_TAGS_REQUIRE and DD_APM_FILTER_TAGS_REJECT can now be a literal JSON array, e.g. ["someKey:someValue"]. This allows for matching tag values with the space character in them.
- SNMP Traps are now sent to a dedicated intake via the epforwarder.
- Update SNMP traps database to include integer enumerations.
- The Agent now supports a single com.datadoghq.ad.checks label in Docker, containerd, and Podman containers. It merges the contents of the existing check_names, init_configs (now optional), and instances annotations into a single JSON value.
- Add a new Agent telemetry metric autodiscovery_poll_duration (histogram) to monitor configuration poll duration in Autodiscovery.
- APM: Added /config/set endpoint in trace-agent to change configuration settings during runtime. Supports changing log level (log_level).
- APM: When the X-Datadog-Trace-Count contains an invalid value, an error will be issued.
- Upgrade to Docker client 20.10, reducing the duration of docker check on Windows (requires Docker >= 20.10 on the host).
- The Agent maintains scheduled cluster and endpoint checks when the Cluster Agent is unavailable.
- The Cluster Agent followers now forward queries to the Cluster Agent leaders themselves. This allows a reduction in the overall number of connections to the Cluster Agent and better spreads the load between leader and forwarders.
- The kube_namespace tag is now included in all metrics, events, and service checks generated by the Helm check.
- Include install_info to version-history.json
- Allow nightly builds install on non-prod repos
- Add a kubernetes_node_annotations_as_tags parameter to use Kubernetes node annotations as host tags.
- Add more detailed logging around leadership status failures.
- Move the experimental SNMP Traps Listener configuration under network_devices.
- Add support for the DNS Monitoring feature of NPM to Linux kernels older than 4.1.
- Adds segment_name and segment_id tags to PCF containers that belong to an isolation segment.
- Make logs agent additional_endpoints reliable by default. This can be disabled by setting is_reliable: false on the additional endpoint.
- On Windows, if a datadog.yaml file is found during an installation or upgrade, the dialogs collecting the API Key and Site are skipped.
- Resolve SNMP trap variables with integer enumerations to their string representation.
- [corechecks/snmp] Add profile static_tags config
- Report telemetry metrics about the retry queue capacity: datadog.agent.retry_queue_duration.capacity_secs, datadog.agent.retry_queue_duration.bytes_per_sec and datadog.agent.retry_queue_duration.capacity_bytes
- Updated cloud providers to add the Instance ID as a host alias for EC2 instances, matching what other cloud providers do. This should help with correctly identifying hosts where the customer has changed the hostname to be different from the Instance ID.
- NTP check: Include /etc/ntpd.conf and /etc/openntpd/ntpd.conf for use_local_defined_servers.
- Kubernetes pods with short-lived containers do not have log lines duplicated with both container tags (the stopped one and the running one) when logs are collected. This feature is enabled by default; set logs_config.validate_pod_container_id to false to disable it.
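The JSON array form of DD_APM_FILTER_TAGS_REQUIRE and DD_APM_FILTER_TAGS_REJECT mentioned above matters when a tag value contains a space. The sketch below parses both forms. It assumes the non-JSON form is a whitespace-separated list of key:value pairs; the function name parse_filter_tags is hypothetical and this is not the trace-agent's parser.

```python
import json


def parse_filter_tags(raw):
    """Parse a DD_APM_FILTER_TAGS_* value into a list of "key:value" strings."""
    raw = raw.strip()
    if raw.startswith("["):
        # Literal JSON array: values may contain spaces.
        tags = json.loads(raw)
        if not (isinstance(tags, list) and all(isinstance(t, str) for t in tags)):
            raise ValueError("expected a JSON array of strings")
        return tags
    # Assumed plain form: tags separated by whitespace, so a value
    # containing a space cannot be expressed.
    return raw.split()


print(parse_filter_tags('["someKey:some value", "env:prod"]'))
print(parse_filter_tags("env:prod team:core"))
```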
Security Notes
- The Agent is built with Go 1.17.11.
Bug Fixes
- Updates defaults for the port and binding host of the experimental traps listener.
- APM: The Agent is now performing rare span detection on all spans, as opposed to only dropped spans. This change will slightly reduce the number of rare spans kept unnecessarily.
- APM OTLP: This change ensures that the ingest now standardizes certain attribute keys to their correct Datadog tag counterparts, such as: container tags, "operation.name", "service.name", etc.
- APM: Fix a bug where the APM section of the GUI would not show up in older Internet Explorer versions on Windows.
- Support dynamic Auth Tokens in Kubernetes v1.22+ (Bound Service Account Token Volume).
- The %%host%% autodiscovery tag now works properly when using containerd, but only on Linux and when using IPv4 addresses.
- Enhanced the coverage of pause-containers filtering on Containerd.
- APM: Fix the loss of trace metric container information when large payloads need to be split.
- Fix cri check producing no metrics when running on OpenShift / cri-o.
- Fix missing health status from Docker containers in Live Container View.
- Fix Agent startup failure when running as a non-privileged user (for instance, when running on OpenShift with restricted SCC).
- Fix missing container metrics (container, containerd checks and live container view) on AWS Bottlerocket.
- APM: Fixed an issue where "CPU threshold exceeded" logs would show the wrong user CPU usage by a factor of 100.
- Ensures that when kubernetes_namespace_labels_as_tags is set, the namespace labels are always attached to metrics and logs, even when the pod is not ready yet.
- Add missing support for UDPv6 receive path to NPM.
- The agent workload-list --verbose command and the workload-list.log file in the flare no longer show containers' environment variables, except for DD_SERVICE, DD_ENV and DD_VERSION.
- Fixed a potential deadlock in the Python check runner during agent shutdown.
- Fixes issue where trace-agent would not report any version info.
- The DCA and the cluster runners no longer write warning logs to /tmp.
- Fixes an issue where the Agent would panic when trying to inspect Docker containers while the Docker daemon was unavailable or taking too long to respond.
Other Notes
- Exclude teradata on Mac agents.
7.36.1
Prelude
Release on: 2022-05-31
- Please refer to the 7.36.1 tag on integrations-core for the list of changes on the Core Checks
Bug Fixes
- Fixes issue where proxy config was ignored by the trace-agent.
- This fixes a regression introduced in 7.36.0 where some logs sources attached to a container/pod would not be unscheduled on container/pod stop if multiple logs configs were attached to the container/pod. This could lead to duplicate log entries being created on container/pod restart as there would be more than one tailer tailing the targeted source.
7.36.0
Prelude
Release on: 2022-05-24
- Please refer to the 7.36.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
- Debian packages are now built on Debian 8. Newly built DEBs are supported on Debian >= 8 and Ubuntu >= 14.
- The OTLP endpoint will no longer enable the legacy OTLP/HTTP endpoint 0.0.0.0:55681 by default. To keep using the legacy endpoint, explicitly declare it via the otlp_config.receiver.protocols.http.endpoint configuration setting or its associated environment variable, DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT.
- Package signing keys were rotated:
New Features
- Adding support for IBM cloud. The agent will now detect that we're running on IBM cloud and collect host aliases (vm name and ID).
- Added event collection in the Helm check. The feature is disabled by default. To enable it, set the collect_events option to true.
- Adds a service check for the Helm check. The check fails for a release when its latest revision is in "failed" state.
- Adds a kube_qos (quality of service) tag to metrics associated with kubernetes pods and their containers.
- CWS can now track network devices creation and load TC classifiers dynamically.
- CWS can now track network namespaces.
- The DNS event type was added to CWS.
- The OTLP ingest endpoint is now considered GA for metrics.
Enhancement Notes
- Traps OIDs are now resolved to names using user-provided 'traps db' files in snmp.d/traps_db/.
- The Agent now supports a single ad.datadoghq.com/$IDENTIFIER.checks annotation in Kubernetes Pods and Services to configure Autodiscovery checks. It merges the contents of the existing check_names, init_configs (now optional), and instances annotations into a single JSON value.
- The DD_URL environment variable can now be used to set the Datadog intake URL just like DD_DD_URL. If both DD_DD_URL and DD_URL are set, DD_DD_URL will be used to avoid a breaking change.
- Added a process-agent version command, and made the output mimic the core agent.
- Windows: Add Datadog registry to Flare.
- Add --service flag to stream-logs command to filter streamed logs in detail.
- Support a simple date pattern for automatic multiline detection
- APM: The OTLP ingest stringification of non-standard Datadog values such as Arrays and KeyValues is now consistent with OpenTelemetry attribute stringification.
- APM: Connections to upload profiles to the Datadog intake are now closed after 47 seconds of idleness. Common tracer setups send one profile every 60 seconds, which coincides with the intake's connection timeout and would occasionally lead to errors.
- The Cluster Agent now exposes a new metric cluster_checks_configs_info. It exposes the node and the check ID as tags.
- KSM core check: add a new kubernetes_state.cronjob.complete service check that returns the status of the most recent job for a cronjob.
- Retry more HTTP status codes for the logs agent HTTP destination.
- COPYRIGHT-3rdparty.csv now contains each copyright statement exactly as it is shown on the original component.
- Adds sidecar_present and sidecar_count tags on Cloud Foundry containers that run apps with sidecar processes.
- Agent flare now includes output from the process and container checks.
- Add the --cfgpath parameter in the Process Agent replacing --config.
- Add the check subcommand in the Process Agent replacing --check (-check). Only warn once if the -version flag is used.
- Adds human readable output of process and container data in the check command for the Process Agent.
- The Agent flare command now collects Process Agent performance profile data in the flare bundle when the --profile flag is used.
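The single ad.datadoghq.com/$IDENTIFIER.checks annotation described above merges three existing annotations into one JSON value. The sketch below shows one way the merge could look. The annotation names come from the note; the merged shape (check name mapped to init_config and instances) and the helper merge_ad_annotations are assumptions for illustration, so check the Autodiscovery documentation for the exact schema.

```python
import json


def merge_ad_annotations(check_names, instances, init_configs=None):
    """Merge separate annotation values into a single JSON string.

    Assumed output shape: {"<check>": {"init_config": {...}, "instances": [...]}}.
    """
    names = json.loads(check_names)
    insts = json.loads(instances)
    # init_configs is optional in the merged form; default to empty objects.
    inits = json.loads(init_configs) if init_configs else [{}] * len(names)
    if not (len(names) == len(insts) == len(inits)):
        raise ValueError("annotations must list the same number of entries")
    merged = {}
    for name, init, inst in zip(names, inits, insts):
        entry = merged.setdefault(name, {"init_config": init, "instances": []})
        entry["instances"].append(inst)
    return json.dumps(merged, sort_keys=True)


print(merge_ad_annotations(
    check_names='["redisdb"]',
    instances='[{"host": "%%host%%", "port": 6379}]',
))
```

Grouping by check name lets a check that appears several times in check_names collect all of its instances under one key.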
Deprecation Notes
- Deprecated process-agent --version in favor of process-agent version.
- The logs configuration use_http and use_tcp flags have been deprecated in favor of force_use_http and force_use_tcp.
- OTLP ingest: metrics.send_monotonic_counter has been deprecated in favor of metrics.sums.cumulative_monotonic_mode. metrics.send_monotonic_counter will be removed in v7.37.
- OTLP ingest: metrics.report_quantiles has been deprecated in favor of metrics.summaries.mode. metrics.report_quantiles will be removed in v7.37 / v6.37.
- Remove the unused --ddconfig (-ddconfig) parameter. Deprecate the --config (-config) parameter (show warning on usage).
- Deprecate the --check (-check) parameter (show warning on usage).
Bug Fixes
- Bump GoSNMP to fix incomplete support of SNMP v3 INFORMs.
- APM: OTLP: Fixes an issue where attributes from different spans were merged leading to spans containing incorrect attributes.
- APM: OTLP: Fixed an inconsistency where the error message was left empty in cases where the "exception" event was not found. Now, the span status message is used as a fallback.
- Fixes an issue where some data coming from the Agent when running in ECS Fargate did not have task_*, ecs_cluster_name, region, and availability_zone tags.
- Collect the "0" value for resourceRequirements if it has been set
- Fix a bug introduced in 7.33 that could prevent the auto-discovery variable %%port_<name>%% from being resolved properly.
- Fix a panic in the Docker check when a failure happens early (when listing containers)
- Fix missing docker.memory.limit (and docker.memory.in_use) on Windows
- Fixes a conflict preventing NPM/USM and the TCP Queue Length check from being enabled at the same time.
- Fix permission of "/readsecret.sh" script in the agent Dockerfile when executing with dd-agent user (for cluster check runners)
- For Windows, fixes problem in upgrade wherein NPM driver is not automatically started by system probe.
- Fix Gohai not being able to fetch network information when running on a non-English Windows (when the output of commands like ipconfig was not in English). gohai no longer relies on system commands but uses the Golang net package instead (same as Linux hosts). This bug had the side effect of preventing network monitoring data from being linked back to the host.
- Time-based metrics (for example, kubernetes_state.pod.age, kubernetes_state.pod.uptime) are now comparable in the Kubernetes state core check.
- Fix a risk of panic when multiple KSM Core check instances run concurrently.
- For Windows, includes NPM driver 1.3.2, which has a fix for a BSOD on system probe shutdown.
- Adds new --json flag to check. process-agent check --json now outputs valid json.
- On Windows, includes NPM driver update which fixes performance problem when host is under high connection load.
- Previously, the Agent could not log the start or end of a check properly after the first five check runs. The Agent now can log the start and end of a check correctly.
Other Notes
- Include pre-generated trap db file in the conf.d/snmp.d/traps_db/ folder.
- Gohai dependency has been upgraded. This brings a newer version of gopsutil and a fix when fetching network information in non-English Windows (see fixes section).
- If users are using strict firewall rules, they should also exclude the new port 6162 from their firewall.