Spoptimize is a tool that automates use of Amazon EC2 spot instances in your AutoScaling Groups.
Spoptimize was inspired by AutoSpotting and performs very similar actions, but its implementation is entirely its own.
But why reinvent the wheel and not use AutoSpotting?
I had been noodling on ways to use spot instances in AutoScaling groups for quite a while. Before writing Spoptimize, I had brainstormed a few different ideas, and then I came across AutoSpotting. I thought the idea was ingenious, but I also thought it might be fun to build a similar system that was event-driven rather than polling-based. I had never used AWS Step Functions before, so I took the opportunity to build my own tool using Step Functions whose executions are initiated by AutoScaling Launch Notifications.
Each launch notification is processed by a Lambda, which in turn begins an execution of Spoptimize's Step Functions.
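As a rough illustration of that hand-off, here is a minimal sketch of a notification-handling Lambda. It is not Spoptimize's actual handler: the function names, the input keys passed to the execution, and the `SPOPTIMIZE_STATE_MACHINE_ARN` environment variable are all assumptions made for this example. Only the SNS event envelope, the notification fields (`Event`, `AutoScalingGroupName`, `EC2InstanceId`) and the `start_execution` call come from AWS itself.

```python
import json
import os
import uuid


def parse_launch_notification(event):
    """Pull the group name and instance id out of an SNS-wrapped AutoScaling
    notification. Returns None for anything that is not a launch event."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    if message.get("Event") != "autoscaling:EC2_INSTANCE_LAUNCH":
        return None
    return {
        # Key names below are hypothetical; they are simply the execution input.
        "autoscaling_group": message["AutoScalingGroupName"],
        "ondemand_instance_id": message["EC2InstanceId"],
    }


def handler(event, context, sfn_client=None, state_machine_arn=None):
    """Start one Step Functions execution per launch notification."""
    details = parse_launch_notification(event)
    if details is None:
        return None  # e.g. the autoscaling:TEST_NOTIFICATION sent on setup
    if sfn_client is None:
        import boto3  # imported lazily so the parsing logic can be tested offline
        sfn_client = boto3.client("stepfunctions")
    if state_machine_arn is None:
        # Hypothetical variable name, assumed for this sketch.
        state_machine_arn = os.environ["SPOPTIMIZE_STATE_MACHINE_ARN"]
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=str(uuid.uuid4()),  # execution names must be unique per state machine
        input=json.dumps(details),
    )
    return response["executionArn"]
```

Passing the client in as a parameter keeps the sketch testable without AWS credentials; in a deployed Lambda the defaults would apply.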
The Step Function execution manages the execution of Lambda functions which perform these actions:
- Wait following new instance launch. (See spoptimize:init_sleep_interval below)
- Verify that the new on-demand instance is healthy according to autoscaling.
- Request Spot Instance using specifications defined in autoscaling group's launch configuration.
- Wait for Spot Request to be fulfilled and for spot instance to be online. (See spoptimize:spot_req_sleep_interval below)
- Acquire an exclusive lock on the autoscaling group. This step prevents multiple executions from attaching & terminating instances simultaneously.
- Attach spot instance to autoscaling group and terminate original on-demand instance.
- Wait for spot instance to be healthy according to autoscaling. (See spoptimize:spot_attach_sleep_interval below)
- Verify health of spot instance and release exclusive lock.
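The order of those steps can be sketched in plain Python. This is only an illustration of the sequence, not the real state machine: the `cloud` adapter and its method names are hypothetical, the in-process `threading.Lock` stands in for whatever shared lock the real workflow uses, and the early `None` returns gloss over the retry behaviour a real workflow would need.

```python
import threading
import time
from collections import defaultdict

# In-process stand-in for the exclusive per-group lock. Assumption: a real
# deployment needs a lock that is shared across Lambda invocations.
_group_locks = defaultdict(threading.Lock)


def replace_with_spot(cloud, group, ondemand_id, intervals, sleep=time.sleep):
    """Walk one on-demand instance through the replacement steps.

    `cloud` is a hypothetical adapter exposing is_healthy, request_spot,
    spot_instance_for, attach and terminate. Returns the spot instance id,
    or None if the replacement was abandoned.
    """
    sleep(intervals["init_sleep_interval"])
    if not cloud.is_healthy(group, ondemand_id):
        return None  # instance is gone or unhealthy: nothing to replace

    request_id = cloud.request_spot(group)
    sleep(intervals["spot_req_sleep_interval"])
    spot_id = cloud.spot_instance_for(request_id)
    if spot_id is None:
        return None  # request not fulfilled

    # Hold the lock from attachment until the final health check so that two
    # executions never attach and terminate in the same group at once.
    with _group_locks[group]:
        cloud.attach(group, spot_id)
        cloud.terminate(group, ondemand_id)
        sleep(intervals["spot_attach_sleep_interval"])
        return spot_id if cloud.is_healthy(group, spot_id) else None
```

The `sleep` parameter is injectable so the sequence can be exercised without actually waiting.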
Screenshot of a successful execution:
Here's a breakdown of the privileges required for deployment. Deployment requires the ability to:
- create/update/delete:
  - CloudFormation stacks
  - IAM Managed Policy
  - IAM Roles
  - CloudWatch Alarms
  - DynamoDB tables whose table names begin with spoptimize
  - Lambda functions whose function names begin with spoptimize
  - Step Functions whose names begin with spoptimize
- create an SNS topic named spoptimize-init
- create an S3 bucket named spoptimize-artifacts-YOUR_AWS_ACCOUNT_ID
- read/write to the aforementioned S3 bucket with a prefix of spoptimize
Note: many of the names and prefixes can be overridden via setting environment variables prior to running the deployment script.
You can deploy Spoptimize via the CloudFormation console using the following launch button. It will deploy the latest build:
If you wish to deploy Spoptimize via a shell or an automated process, you can utilize the included deploy script.
Prerequisites:
- Bash
- AWS CLI
- API access to an AWS account
First clone this repo, or download a tar.gz or zip from Releases.
Deploy both the IAM stack and the Step Functions & Lambdas:
$ ./deploy.sh
Deploy just the IAM stack:
$ ./deploy.sh iam
Deploy just the Step Functions and Lambdas:
$ ./deploy.sh cfn
After Spoptimize is deployed, configure your autoscaling groups to send launch notifications to the spoptimize-init SNS topic.
Set via CloudFormation (see NotificationConfigurations):
LaunchGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    LaunchConfigurationName: !Ref LaunchConfig
    DesiredCapacity: 0
    MinSize: 0
    MaxSize: 12
    VPCZoneIdentifier:
      - !Select [ 0, !Ref SubnetIds ]
      - !Select [ 1, !Ref SubnetIds ]
    MetricsCollection:
      - Granularity: 1Minute
    HealthCheckGracePeriod: 120
    Cooldown: 180
    HealthCheckType: ELB
    TargetGroupARNs:
      - !Ref ElbTargetGroup
    Tags:
      - Key: Name
        Value: !Ref AWS::StackName
        PropagateAtLaunch: true
    NotificationConfigurations:
      - TopicARN: !Sub "arn:aws:sns:${AWS::Region}:${AWS::AccountId}:spoptimize-init"
        NotificationTypes:
          - autoscaling:EC2_INSTANCE_LAUNCH
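For a group that is not managed by CloudFormation, the same notification could be attached through the AutoScaling API. The sketch below assumes boto3 and the default spoptimize-init topic name; the helper names are made up for this example, while `put_notification_configuration` and its three parameters are the real API call.

```python
def notification_config(group_name, region, account_id, topic_name="spoptimize-init"):
    """Build the keyword arguments for put_notification_configuration."""
    return {
        "AutoScalingGroupName": group_name,
        "TopicARN": f"arn:aws:sns:{region}:{account_id}:{topic_name}",
        "NotificationTypes": ["autoscaling:EC2_INSTANCE_LAUNCH"],
    }


def enable_spoptimize(autoscaling_client, group_name, region, account_id):
    """Attach the launch notification to an existing autoscaling group."""
    autoscaling_client.put_notification_configuration(
        **notification_config(group_name, region, account_id)
    )
```

A call might look like `enable_spoptimize(boto3.client("autoscaling"), "my-group", "us-east-1", "123456789012")`, where the group name, region and account id are placeholders. If the topic name was overridden at deployment, pass the overridden name via `topic_name` in `notification_config`.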
Newly launched instances will (eventually) be replaced by spot instances.
Spoptimize's wait intervals may be overridden per AutoScaling group via the use of tags.
- spoptimize:min_protected_instances: Set a minimum number of on-demand instances for the autoscaling group. Defaults to 0. This prevents Spoptimize from replacing all on-demand instances with spot instances. NOTE: Spoptimize leverages Instance Protection to achieve this.
- spoptimize:init_sleep_interval: Initial wait interval after launch notification is received. Spoptimize won't do anything during this wait period. Defaults to approximately the group's Health Check Grace Period times the Desired Capacity plus 30-90s. The default scales with the capacity so that rolling updates can complete before any instances are replaced.
- spoptimize:spot_req_sleep_interval: Wait interval following spot instance request. Default is 30s.
- spoptimize:spot_attach_sleep_interval: Wait interval following attachment of spot instance to autoscaling group. Defaults to the group's Health Check Grace Period plus 30s.
- spoptimize:spot_failure_sleep_interval: Wait interval between iterations following a spot instance failure. Defaults to 1 hour. A spot failure may be a failed spot instance request or a failure of the spot instance after it comes online.
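To make the defaults concrete, here is a small sketch of how the effective settings could be resolved from a group's tags. The function name is hypothetical, and treating the "30-90s" term as random jitter is an assumption; the description above only says "approximately".

```python
import random


def resolve_intervals(tags, grace_period, desired_capacity, rng=random):
    """Return effective settings (in seconds, except the instance count),
    letting spoptimize:* tags override the documented defaults.

    `tags` maps tag keys to values for one autoscaling group.
    """
    defaults = {
        "spoptimize:min_protected_instances": 0,
        # Assumption: the "plus 30-90s" term is modelled as random jitter.
        "spoptimize:init_sleep_interval":
            grace_period * desired_capacity + rng.randint(30, 90),
        "spoptimize:spot_req_sleep_interval": 30,
        "spoptimize:spot_attach_sleep_interval": grace_period + 30,
        "spoptimize:spot_failure_sleep_interval": 3600,
    }
    # Tag values arrive as strings, so normalise everything to int.
    return {key: int(tags.get(key, default)) for key, default in defaults.items()}
```

For a group with a 120s grace period and a desired capacity of 2, the default initial wait under these assumptions falls between 270s and 330s, and the default attach wait is 150s.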
Below are override tags I used during development. (Note: these are very aggressive so that I could watch Spoptimize in action.)
Set via CloudFormation:
Tags:
  - Key: Name
    Value: !Ref AWS::StackName
    PropagateAtLaunch: true
  - Key: spoptimize:min_protected_instances
    Value: 1
    PropagateAtLaunch: false
  - Key: spoptimize:init_sleep_interval
    Value: 45
    PropagateAtLaunch: false
  - Key: spoptimize:spot_req_sleep_interval
    Value: 10
    PropagateAtLaunch: false
  - Key: spoptimize:spot_attach_sleep_interval
    Value: 125
    PropagateAtLaunch: false
  - Key: spoptimize:spot_failure_sleep_interval
    Value: 900
    PropagateAtLaunch: false
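The same tags could also be applied to an existing group through the AutoScaling API. This sketch only builds the payload; the helper name is hypothetical, while the dictionary shape is the one boto3's `create_or_update_tags` expects.

```python
def spoptimize_tags(group_name, **overrides):
    """Build a Tags payload for create_or_update_tags from keyword overrides,
    e.g. init_sleep_interval=45 becomes the tag spoptimize:init_sleep_interval."""
    return [
        {
            "ResourceId": group_name,
            "ResourceType": "auto-scaling-group",
            "Key": f"spoptimize:{name}",
            "Value": str(value),  # tag values are always strings
            "PropagateAtLaunch": False,  # settings apply to the group only
        }
        for name, value in sorted(overrides.items())
    ]
```

With boto3 available, the payload could be sent with `boto3.client("autoscaling").create_or_update_tags(Tags=spoptimize_tags("my-group", init_sleep_interval=45))`, where "my-group" is a placeholder.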
- Auto-Scaling groups that deploy EC2 instances to VPCs are tested. Auto-Scaling groups in EC2-Classic should work, but are not tested.