AWS ElasticSearch Service Troubleshooting and Resolution Guide

Dennis Christilaw edited this page Nov 11, 2019 · 6 revisions

Troubleshooting Tips

A collection of ElasticSearch troubleshooting tips that can help fix some of the issues that may come up with the service.

Because AWS limits access to the underlying infrastructure of the ESS, there are only a limited number of things we can do, or have access to, in order to get things working again without involving AWS Support. I am hoping that some of the information provided here will help others.

NOTE

  • The items in this document are based on issues that I have had in my own implementations of the AWS ElasticSearch Service (ESS). This is not meant to be an all-inclusive troubleshooting guide. What you find here may or may not work with stand-alone ElasticSearch installs.

REST API via Console or CURL

You can use CURL or the built-in Dev Tools area in Kibana to work directly with the ElasticSearch REST API.

Using CURL, your PUT (-XPUT) commands must contain:

-H "Content-Type: application/json" -d

The command bodies are sent as JSON-formatted blocks.

You can also use the Console in Dev Tools to send commands. These also need to be JSON-formatted blocks.
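For anyone scripting these calls instead of typing CURL by hand, here is a minimal Python sketch of the same idea: a PUT request that carries a JSON body and the Content-Type header. The endpoint name and the helper name build_json_request are made up for illustration, and the sketch assumes the domain accepts unsigned requests in the same way the CURL examples in this guide do. It only builds the request and does not send it.

```python
import json
import urllib.request


def build_json_request(endpoint: str, path: str, body: dict,
                       method: str = "PUT") -> urllib.request.Request:
    """Build (but do not send) a JSON request, like curl -X<method> -H ... -d ..."""
    url = endpoint.rstrip("/") + path
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method=method,
        # Same header the CURL commands need for JSON bodies.
        headers={"Content-Type": "application/json"},
    )


request = build_json_request(
    "https://search-example.invalid",  # hypothetical endpoint, replace with your own
    "/_cluster/settings",
    {"persistent": {"cluster.blocks.read_only": False}},
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request)
```

Keeping the build step separate from the send step makes it easy to inspect the method, headers and body before anything touches the cluster.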

You can access the Dev Tools by clicking on the "Wrench" icon in Kibana.

Cluster in Read Only State

NOTE About Cluster Quorum

This note is only for the AWS ElasticSearch hosted service, not EC2-built stacks.

  • As of Elasticsearch version 7.x, whenever the cluster loses quorum, the cluster is put in a read-only state as a precautionary measure.
  • If quorum loss occurs and your cluster has more than one node, Amazon ES restores quorum and places the cluster into a read-only state. You have two options:
    • Remove the read-only state and use the cluster as-is (below).
    • Restore the cluster or individual indices from a snapshot.
  • If quorum loss occurs and your cluster has only one node, Amazon ES replaces the node and does not place the cluster into a read-only state.
  • Please note that the cluster is put in a read-only state only when it loses quorum. It is not put in a read-only state in any other scenario where the cluster goes down, and is brought back to a healthy status either by you, or by AWS.

You can tell that the cluster is in a Read Only state when no new logs are being ingested (or by checking the cluster state using the information below).

Investigation:

Get Cluster Settings

You can query the cluster to get the cluster settings:

CURL

curl https://<ElasticSearch_Endpoint>/_cluster/settings

Using the default command above will yield results in the following format:

{"persistent":{"cluster":{"routing":{"allocation":{"cluster_concurrent_rebalance":"2","node_concurrent_recoveries":"2","disk":{"watermark":{"low":"2.85gb","flood_stage":"0.95gb","high":"1.9gb"}},"node_initial_primaries_recoveries":"4"}},"blocks":{"read_only":"false"},"metadata":{"unsafe-bootstrap":"true"}},"indices":{"recovery":{"max_bytes_per_sec":"60mb"}}},"transient":{}}

If you append the following to the end of the CURL command, you will get an output similar to the Kibana Console output below:

?pretty

Example:

curl "https://<ElasticSearch_Endpoint>/_cluster/settings?pretty"

You can add the '?pretty' option to almost all of the commands you send to the stack.
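If you build the URLs in a script, a small helper can add the option for you. This is only a sketch: the function name with_pretty and the example host are hypothetical, and it only does string handling on the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def with_pretty(url: str) -> str:
    """Return the URL with the 'pretty' flag added to its query string (once)."""
    parts = urlsplit(url)
    flags = [flag for flag in parts.query.split("&") if flag]
    if "pretty" not in flags:
        flags.append("pretty")
    return urlunsplit(parts._replace(query="&".join(flags)))


# Hypothetical endpoint, used only to show the shape of the result.
pretty_url = with_pretty("https://search-example.invalid/_cluster/settings")
```

Calling it on a URL that already has the flag leaves the URL unchanged, so it is safe to apply everywhere.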

Kibana Console

GET /_cluster/settings

Output:

{
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "cluster_concurrent_rebalance" : "2",
          "node_concurrent_recoveries" : "2",
          "disk" : {
            "watermark" : {
              "low" : "2.85gb",
              "flood_stage" : "0.95gb",
              "high" : "1.9gb"
            }
          },
          "node_initial_primaries_recoveries" : "4"
        }
      },
      "blocks" : {
        "read_only" : "true"
      },
      "metadata" : {
        "unsafe-bootstrap" : "true"
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "60mb"
      }
    }
  },
  "transient" : { }
}

Key part of the output:

      "blocks" : {
        "read_only" : "true"
      },

This section shows that the cluster is in Read Only mode.
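The same check can be done in a script once you have the /_cluster/settings response as JSON. Here is a minimal Python sketch, assuming the nested output format shown above. The function names are my own, and the sample is a trimmed copy of the output. Transient settings are checked first because they take precedence over persistent ones.

```python
import json


def _as_bool(value) -> bool:
    """The settings API returns "true"/"false" as strings; accept real booleans too."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def cluster_is_read_only(settings: dict) -> bool:
    """Return True if cluster.blocks.read_only is set in a /_cluster/settings response."""
    for scope in ("transient", "persistent"):
        blocks = settings.get(scope, {}).get("cluster", {}).get("blocks", {})
        if "read_only" in blocks:
            return _as_bool(blocks["read_only"])
    return False  # setting absent: no read-only block configured


# Trimmed copy of the output shown above.
sample = json.loads(
    '{"persistent":{"cluster":{"blocks":{"read_only":"true"}}},"transient":{}}'
)
read_only = cluster_is_read_only(sample)
```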

Lambda CloudWatch Logs

You can also see the error in the CloudWatch logs for the Lambda function that ships logs to the cluster, as in this example:

2019-11-06T16:29:15.337Z 5e062bfb-b852-43e7-8729-48101a3605eb ERROR Invoke Error{
    "errorType": "Error",
    "errorMessage": "{\"statusCode\":403,\"responseBody\":{\"error\":{\"root_cause\":[{\"type\":\"cluster_block_exception\",\"reason\":\"blocked by: [FORBIDDEN/6/cluster read-only (api)];\",\"suppressed\":[{\"type\":\"cluster_block_exception\",\"reason\":\"blocked by: [FORBIDDEN/6/cluster read-only (api)];\"}]}],\"type\":\"cluster_block_exception\",\"reason\":\"blocked by: [FORBIDDEN/6/cluster read-only (api)];\",\"suppressed\":[{\"type\":\"cluster_block_exception\",\"reason\":\"blocked by: [FORBIDDEN/6/cluster read-only (api)];\"}]},\"status\":403}}",
    "stack": [
        "Error: {\"statusCode\":403,\"responseBody\":{\"error\":{\"root_cause\":[{\"type\":\"cluster_block_exception\",\"reason\":\"blocked by: [FORBIDDEN/6/cluster read-only (api)];\",\"suppressed\":[{\"type\":\"cluster_block_exception\",\"reason\":\"blocked by: [FORBIDDEN/6/cluster read-only (api)];\"}]}],\"type\":\"cluster_block_exception\",\"reason\":\"blocked by: [FORBIDDEN/6/cluster read-only (api)];\",\"suppressed\":[{\"type\":\"cluster_block_exception\",\"reason\":\"blocked by: [FORBIDDEN/6/cluster read-only (api)];\"}]},\"status\":403}}",
        "    at _homogeneousError (/var/runtime/CallbackContext.js:13:12)",
        "    at postError (/var/runtime/CallbackContext.js:30:51)",
        "    at done (/var/runtime/CallbackContext.js:57:7)",
        "    at fail (/var/runtime/CallbackContext.js:69:7)",
        "    at Object.fail (/var/runtime/CallbackContext.js:105:16)",
        "    at /var/task/index.js:42:25",
        "    at IncomingMessage.<anonymous> (/var/task/index.js:176:13)",
        "    at IncomingMessage.emit (events.js:203:15)",
        "    at endReadableNT (_stream_readable.js:1145:12)",
        "    at process._tickCallback (internal/process/next_tick.js:63:19)"
    ]
}

Key Element:

blocked by: [FORBIDDEN/6/cluster read-only (api)];

This is the error that tells us the cluster is in Read Only mode.
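If you want to spot this error automatically, the errorMessage field in the log entry is itself a JSON string, so it can be parsed. The sketch below assumes the message has the same shape as the example above (statusCode plus responseBody.error). The function name and the shortened sample message are made up for illustration.

```python
import json

READ_ONLY_MARKER = "cluster read-only"


def is_read_only_block(error_message: str) -> bool:
    """Return True if a Lambda errorMessage describes a cluster read-only block."""
    try:
        payload = json.loads(error_message)
    except json.JSONDecodeError:
        return False  # not the JSON-in-a-string format shown above
    if not isinstance(payload, dict):
        return False
    body = payload.get("responseBody")
    error = body.get("error") if isinstance(body, dict) else None
    if not isinstance(error, dict):
        return False
    return (
        payload.get("statusCode") == 403
        and error.get("type") == "cluster_block_exception"
        and READ_ONLY_MARKER in error.get("reason", "")
    )


# Shortened version of the errorMessage from the log example above.
sample_message = json.dumps({
    "statusCode": 403,
    "responseBody": {
        "error": {
            "type": "cluster_block_exception",
            "reason": "blocked by: [FORBIDDEN/6/cluster read-only (api)];",
        },
        "status": 403,
    },
})
blocked = is_read_only_block(sample_message)
```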

Resolution

You can use the following command to return the cluster to Read/Write mode:

CURL

curl -XPUT https://<ElasticSearch_Endpoint>/_cluster/settings -H 'Content-Type: application/json' -d '{"persistent": {"cluster.blocks.read_only": false }}'

Kibana Console

PUT /_cluster/settings
{"persistent": {"cluster.blocks.read_only": false }}

Result

{"acknowledged":true,"persistent":{"cluster":{"blocks":{"read_only":"false"}}},"transient":{}}
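When the change is made from a script, it is worth checking the response instead of assuming it worked. Here is a minimal Python sketch, assuming the response has the same shape as the result above. The function name read_only_cleared is hypothetical.

```python
import json


def read_only_cleared(response_text: str) -> bool:
    """Return True if the settings update was acknowledged and read_only is now false."""
    response = json.loads(response_text)
    blocks = response.get("persistent", {}).get("cluster", {}).get("blocks", {})
    acknowledged = response.get("acknowledged") is True
    # The API echoes the value back as the string "false".
    cleared = str(blocks.get("read_only")).strip().lower() == "false"
    return acknowledged and cleared


# The result shown above, copied as-is.
result_text = (
    '{"acknowledged":true,"persistent":{"cluster":{"blocks":'
    '{"read_only":"false"}}},"transient":{}}'
)
ok = read_only_cleared(result_text)
```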

You can verify this by checking the cluster settings again. The /_cluster/settings output should now show the following:

      "blocks" : {
        "read_only" : "false"