Support Diagnostic 7.0.0
- The way the list of remote REST calls is built for a given ES version has changed. It is no longer tied to the major version and now builds a set of calls specific to the minor release level. If run against an unreleased version that is not explicitly configured (ES 8, for example), it no longer fails but instead uses the most up-to-date set of calls it can construct. It remains backward compatible with all 1.x-6.x versions.
- A number of new REST calls tailored to 6.x minor releases have been added, and indices_stats now includes shard statistics:
indices_stats: "_stats?level=shards&pretty&human"
nodes_usage: "/_nodes/usage?pretty"
remote_cluster_info: "/_remote/info"
rollup_jobs: "/_xpack/rollup/job/_all"
rollup_caps: "/_xpack/rollup/data/_all"
cluster_settings_defaults: "/_cluster/settings?include_defaults&pretty&flat_settings"
rollup_index_caps: "*/_xpack/rollup/data"
security_priv: "_xpack/security/privilege?pretty"
ccr_stats: "_ccr/stats?pretty"
ccr_autofollow_patterns: "/_ccr/auto_follow?pretty"
ccr_follower_info: "_all/_ccr/info?pretty"
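As a rough illustration of the minor-version call selection described above, the sketch below picks, for each call, the newest rule whose minimum version does not exceed the target version. The names `CALL_RULES`, `parse_version`, and `build_calls` are hypothetical, and the version thresholds and the pre-6.x `indices_stats` path are illustrative assumptions, not the diagnostic's actual configuration.

```python
# Hypothetical rule table: call name -> list of ((major, minor), path).
# Thresholds are assumed for illustration; they are not the tool's real config.
CALL_RULES = {
    "indices_stats": [
        ((1, 0), "_stats?pretty&human"),  # assumed older form
        ((6, 0), "_stats?level=shards&pretty&human"),
    ],
    "nodes_usage": [((6, 0), "/_nodes/usage?pretty")],
    "ccr_stats": [((6, 5), "_ccr/stats?pretty")],
}


def parse_version(text):
    """Return (major, minor) from a string such as '6.5.1' or '8.0.0-SNAPSHOT'."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def build_calls(version_text):
    """Select the newest applicable path for every call at this version."""
    version = parse_version(version_text)
    calls = {}
    for name, rules in CALL_RULES.items():
        applicable = [rule for rule in rules if rule[0] <= version]
        if applicable:
            # The rule with the highest minimum version wins.
            calls[name] = max(applicable, key=lambda rule: rule[0])[1]
    return calls
```

Because a rule only states a minimum version, a version newer than anything configured (for example `build_calls("8.0.0")`) simply receives the latest known path for every call instead of failing.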
- Docker aware. If any node has a process ID of 1, or if Docker containers are found on the host where the diagnostic is running, the diagnostic assumes Elasticsearch is running in a container. Normal system calls are not run; instead, Docker diagnostics and logs are collected, along with Docker-ized executions of compatible system calls. By default, all running containers on that host are queried and the results are written to subdirectories named after each container ID. You can limit the Docker calls to a single container by passing --dockerId in the command line arguments.
Docker-specific calls:
docker-info: "docker info"
docker-ps-all: "docker ps -a --no-trunc"
docker-logs: "docker logs <container_id>"
docker-top: "docker top <container_id>"
System calls run via docker exec for each container:
ulimit
top
uname
process-list
cpu-info
sysctl
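The per-container layout described above can be sketched as a planning step that pairs each command with an output file under a folder named after the container ID. This is only a sketch: `plan_docker_calls`, the `.txt` file names, and the exact arguments passed to `uname` and `top` are assumptions for illustration, and nothing here actually invokes Docker.

```python
from pathlib import Path

# Calls that take the container ID directly (from the list above).
PER_CONTAINER = {
    "docker-logs": "docker logs {cid}",
    "docker-top": "docker top {cid}",
}

# Assumed command lines for two of the system calls run through docker exec.
EXEC_CALLS = {
    "uname": "uname -a",
    "top": "top -b -n 1",
}


def plan_docker_calls(container_ids, output_root, docker_id=None):
    """Return (output_path, command) pairs, one folder per container ID.

    When docker_id is given, only that container is targeted.
    """
    targets = [docker_id] if docker_id else list(container_ids)
    plan = []
    for cid in targets:
        folder = Path(output_root) / cid
        for name, template in PER_CONTAINER.items():
            plan.append((folder / f"{name}.txt", template.format(cid=cid)))
        for name, command in EXEC_CALLS.items():
            plan.append((folder / f"{name}.txt", f"docker exec {cid} {command}"))
    return plan
```

Passing `docker_id` mirrors the effect of restricting collection to a single container: the plan then contains entries for that one container only.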
- In a number of cases, isolated REST calls such as nodes, node stats, or shards would fail, which in turn caused the analyzer to fail. Running the diagnostic again often produced normal output, so the failure was transient. If one of these high-value calls fails for any reason other than security, the diagnostic now waits 5 seconds and retries, up to 3 times.
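A minimal sketch of that retry behavior follows, assuming "up to 3 times" means up to three retries after the initial attempt. `SecurityError` and `call_with_retry` are hypothetical names, and the `sleep` parameter exists only so the wait can be replaced in tests.

```python
import time


class SecurityError(Exception):
    """Hypothetical marker for authentication or authorization failures."""


def call_with_retry(call, retries=3, wait_seconds=5, sleep=time.sleep):
    """Run call(); on a non-security failure, wait and retry a limited number of times."""
    attempt = 0
    while True:
        try:
            return call()
        except SecurityError:
            # Retrying will not fix bad credentials, so fail immediately.
            raise
        except Exception:
            if attempt >= retries:
                raise  # Retries exhausted: surface the last error.
            attempt += 1
            sleep(wait_seconds)
```

A call that fails twice and then succeeds returns normally after two 5-second waits, while a security failure is raised on the first attempt without any wait.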
- Proxy servers, with and without authentication, are now supported.
- In the diagnostic version check, the option to stop execution altogether is gone. The check now pauses the run until the user has seen the message and chosen to proceed. The run then continues, after which the user can obtain the updated version or keep using the outdated one.
- The local collection option has been moved to a separate script/application to minimize the chance of it being used by mistake for a standard diagnostic collection, which was happening fairly often. It has also been flagged for possible removal in the future, depending on how much usage the revised version gets.
- The timed multi-run, thread dump, and heap dump options have been removed.
- General library upgrades and a significant amount of refactoring were also done to streamline the codebase and facilitate upcoming feature additions.