This document provides an overview and detailed technical information regarding the implementation of AWS ElastiCache using the Memcached engine.
GraphQL queries increase load on the Drupal-backed VA.gov CMS database. By adding a caching layer for the most frequently accessed data, those queries become much faster and shift load away from the CMS database. In-memory caching is an effective way to solve this problem.
The team tested build times[4] and resource consumption of a PoC of ElastiCache for Memcached and found that a full build request was faster and consumed fewer resources[5] on the database server, with only slightly more CPU on the app server. It is suspected that the cold-cache build is so much faster because cache writes are faster with Memcached than with RDS.

Cold cache (over 50% decrease in request time):
- GraphQL Request time
  - wo/Memcache: 1m06s
  - w/Memcache: 0m32s
- RDS Load
  - wo/Memcache: 33% max
  - w/Memcache: 21%
- RDS Write IOPS (read was negligible)
  - wo/Memcache: 657/second
  - w/Memcache: 13/second
- EC2 Load
  - wo/Memcache: 24.0% max
  - w/Memcache: 47.1% max

Warm cache:

- GraphQL Request time
  - wo/Memcache: 5.365s
  - w/Memcache: 4.788s
- RDS Load: N/A
- EC2 Load: N/A
The 50% decrease in response times for a vets-website GraphQL build with our current node count of 5K+ justifies replacing AWS Relational Database Service (RDS) with AWS ElastiCache for Memcached as the Drupal cache layer. The decrease in response time also justifies the increased complexity of adding Memcached to our entire stack, including CI and local environments.
- Redis
  - Pros
    - Advanced data structures
    - Clustering
    - “Fastest”
    - Sharding
  - Cons
    - Single-threaded
    - Blocking
- Memcached
  - Pros
    - Simple
    - Multi-threaded out of the box
    - Fast
  - Cons
    - Simple data structures
    - Old hotness?
AWS ElastiCache for Memcached is implemented in Terraform.[2]
Attribute | Value |
---|---|
Engine | Memcached |
Node Type | cache.r5.large (2 vCPU, 13 GiB memory) |
Number of Nodes | 2 |
Engine Version | 1.6.6 |
Maintenance Window | Every Monday 22:00-23:00 UTC |
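As a point of reference, the sketch below shows roughly how those attributes might map onto a Terraform `aws_elasticache_cluster` resource. The resource name, cluster ID, and port are illustrative assumptions; the actual definition (including subnet group and security group wiring) lives in the infrastructure repository.

```hcl
# Illustrative sketch only; the resource name, cluster ID, and port are
# assumptions, not the actual VA.gov Terraform code.
resource "aws_elasticache_cluster" "cms_cache" {
  cluster_id      = "cms-memcached"  # assumed identifier
  engine          = "memcached"
  node_type       = "cache.r5.large" # 2 vCPU, 13 GiB memory
  num_cache_nodes = 2
  engine_version  = "1.6.6"
  port            = 11211            # default Memcached port

  # Every Monday 22:00-23:00 UTC
  maintenance_window = "mon:22:00-mon:23:00"
}
```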
AWS ElastiCache for Memcached is configured for high availability by running multiple (2) nodes across multiple AWS Availability Zones (AZs). This mitigates the failure of hardware or network connectivity within a particular AZ.
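In Terraform terms, that cross-AZ placement can be expressed with the `az_mode` and `preferred_availability_zones` arguments on the resource sketched above; the zone names here are placeholders, not necessarily the zones used in production.

```hcl
  # Added inside the aws_elasticache_cluster resource sketched above.
  # Zone names are placeholders.
  az_mode                      = "cross-az"
  preferred_availability_zones = ["us-gov-west-1a", "us-gov-west-1b"]
```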
Terraform is configured to ignore changes to the Node Type, Number of Nodes, and Engine Version, as changes to these attributes would show a difference between the configured value and the new value until the maintenance window applies them. The other possible solution is the `apply_immediately` attribute, which may be fine for non-production environments but may introduce problems for live/production environments.
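A minimal sketch of that lifecycle rule, assuming it sits inside the resource shown earlier:

```hcl
  # Added inside the aws_elasticache_cluster resource sketched above.
  lifecycle {
    # Changes to these attributes are applied at the maintenance window,
    # so ignore them here to avoid plans showing persistent drift.
    ignore_changes = [
      node_type,
      num_cache_nodes,
      engine_version,
    ]
  }
```

By contrast, setting `apply_immediately = true` on the resource would push such changes right away, which is the non-production versus production trade-off described above.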
Any changes to the above attributes will be applied during the predefined Maintenance Window. Monday 2200-2300 UTC was chosen so that it coincides with COB Eastern time and 1500 Pacific time. This is meant to minimize disruptions to users while also occurring during business hours when DevOps support staff are available.
Debugging docs: ElastiCache/cluster debugging; CI/Lando will be same-node debugging.
Memcached health and performance metrics are graphed in Grafana. This Grafana implementation is only available within the VA network, accessible via the SOCKS proxy. Memcached metrics are at the bottom of this page.