This project aims to audit and analyze S3 bucket access using AWS CloudTrail, ElasticSearch, and Node.js Express. It includes processes for sending audit logs to another S3 bucket, ingesting logs into ElasticSearch, creating visualizations in Kibana, and building an API to query top source IPs for PutObject requests.
- Prerequisites
- Stream log data to an Amazon OpenSearch domain
- ElasticSearch and Kibana Setup
- Node.js Express API
- Setup the Node.js server
- Continuous Integration and Tests using GitHub Actions
- AWS Account with an S3 bucket
Our goal is to analyze S3 bucket audit logs, so we start by ensuring we have an S3 bucket.
We use CloudTrail to capture the audit logs of the S3 bucket and deliver them to a new bucket. When creating the trail, we configure a KMS key for encrypting the log files:
Enable CloudWatch Logs integration for the trail so that CloudWatch can receive the trail's logs and later stream them to the Elasticsearch cluster:
For the event type, our aim is to analyze only the audit logs of an S3 bucket, so select only "Data events":
To reduce costs and log volume, choose S3 as the data event source and select only the specific S3 bucket we wish to analyze:
Click next and create the trail.
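For reference, the same trail configuration can be sketched with the AWS CLI. The trail, bucket, and KMS key names below are placeholders; the CloudWatch Logs integration can then be added with update-trail or via the console.

```sh
# Create the trail, delivering encrypted log files to a dedicated bucket
aws cloudtrail create-trail \
  --name s3-audit-trail \
  --s3-bucket-name my-cloudtrail-logs-bucket \
  --kms-key-id alias/cloudtrail-logs-key

# Log only S3 data events, restricted to the bucket we want to analyze
aws cloudtrail put-event-selectors \
  --trail-name s3-audit-trail \
  --event-selectors '[{
    "ReadWriteType": "All",
    "IncludeManagementEvents": false,
    "DataResources": [{
      "Type": "AWS::S3::Object",
      "Values": ["arn:aws:s3:::my-audited-bucket/"]
    }]
  }]'

# Start recording events
aws cloudtrail start-logging --name s3-audit-trail
```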
Navigate to the OpenSearch Service in the AWS console and create a new domain. To stay within the AWS Free Tier limits, we will use a basic testing deployment running in a single AZ.
We will use a General Purpose t3.small.search instance type for our data node, as it is included in the AWS Free Tier. Avoid T2 instance types, as they do not support encryption at rest. We will deploy a single node with 20 GiB of storage to align with the Free Tier limits.
We will make our Elasticsearch cluster publicly accessible and secure it with fine-grained access control by creating role mappings on the domain.
With fine-grained access control enabled, HTTPS, node-to-node encryption, and encryption at rest are all enforced.
Proceed with all the default settings and create the OpenSearch cluster. Provisioning will take 15-20 minutes.
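If you prefer the CLI, a roughly equivalent domain definition is sketched below. The engine version and master user credentials are placeholders; adjust them before use. A publicly accessible domain also needs a domain access policy, which the console lets you configure during creation.

```sh
aws opensearch create-domain \
  --domain-name elastic-domain-s3 \
  --engine-version OpenSearch_2.11 \
  --cluster-config InstanceType=t3.small.search,InstanceCount=1 \
  --ebs-options EBSEnabled=true,VolumeType=gp3,VolumeSize=20 \
  --node-to-node-encryption-options Enabled=true \
  --encryption-at-rest-options Enabled=true \
  --domain-endpoint-options EnforceHTTPS=true \
  --advanced-security-options 'Enabled=true,InternalUserDatabaseEnabled=true,MasterUserOptions={MasterUserName=admin,MasterUserPassword=ChangeMe123!}'
```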
While the cluster is provisioning, we will set up a CloudWatch Logs subscription to stream data from the trail's log group to our domain in near real time.
Go to the CloudWatch log groups and select the CloudTrail log group that was created automatically with the trail. Create an OpenSearch subscription filter for this group:
The delivery of the logs to the Elasticsearch domain is done by a Lambda function, so we also need to create an IAM role for the Lambda execution. We want to limit the Lambda function's OpenSearch Service access to our specific domain only. Go to IAM -> Policies -> Create Policy -> Paste the following JSON (replace the resource with the ARN of your OpenSearch domain):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "es:*"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:es:us-east-1:371029671060:domain/elastic-domain-s3/*"
    }
  ]
}
Create the policy and go on to create the IAM role, selecting "Lambda" as the service. Click next and choose the policy we just created. You may also want to attach the AWSLambdaBasicExecutionRole policy if you wish to enable logging for the Lambda function.
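The same policy and role can also be created from the CLI. The names below are placeholders, and the policy document is assumed to be saved as policy.json from the previous step:

```sh
# Trust policy allowing Lambda to assume the role
cat > trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "lambda.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-policy --policy-name lambda-opensearch-access --policy-document file://policy.json
aws iam create-role --role-name lambda-cwl-to-opensearch --assume-role-policy-document file://trust.json
aws iam attach-role-policy --role-name lambda-cwl-to-opensearch \
  --policy-arn arn:aws:iam::<account-id>:policy/lambda-opensearch-access
aws iam attach-role-policy --role-name lambda-cwl-to-opensearch \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
```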
Go back to the CloudWatch subscription creation and choose the IAM role we just created for the Lambda execution. Choose "CloudTrail" as the log format and leave the subscription filter pattern empty:
Click "Start streaming".
Go to the Amazon OpenSearch Service and open the Kibana URL of our domain. Log in with the master user you configured earlier.
You might notice that no logs have appeared in the Elasticsearch cluster yet. This is because we first need to map the Lambda role to a backend role in Elasticsearch, which grants it permission to write the logs. Make sure you add a backend role specifying the ARN of the Lambda role we created earlier:
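The mapping can be added from the Kibana Security UI, or with a direct call to the OpenSearch security API as the master user. The sketch below maps the Lambda role to the built-in all_access role (a more restrictive custom role would also work); the endpoint, credentials, and role ARN are placeholders. Note that PUT replaces the whole mapping, which is why the master user is included as well:

```sh
curl -u '<master-user>:<master-password>' -X PUT \
  "https://<your-domain-endpoint>/_plugins/_security/api/rolesmapping/all_access" \
  -H 'Content-Type: application/json' \
  -d '{
    "backend_roles": ["arn:aws:iam::<account-id>:role/<lambda-execution-role>"],
    "users": ["<master-user>"]
  }'
```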
Define an index pattern to fit with the logs coming from CloudWatch:
Select the @timestamp attribute as the primary time field for use with the global time filter, and create the index pattern:
We can now start exploring the logs arriving in Elasticsearch in the 'Discover' tab. Click "Add filter" and configure it for the events you wish to observe further. Our goal is to find the top source IPs for PutObject events:
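Under the hood, the same question can be answered with a terms aggregation from the Dev Tools console. This sketch assumes the subscription Lambda writes into daily cwl-* indices and that string fields have .keyword subfields, which is the default dynamic mapping; adjust the index pattern and field names if yours differ:

```
GET cwl-*/_search
{
  "size": 0,
  "query": {
    "term": { "eventName.keyword": "PutObject" }
  },
  "aggs": {
    "top_source_ips": {
      "terms": { "field": "sourceIPAddress.keyword", "size": 10 }
    }
  }
}
```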
We will use this data to create a Kibana visualization. Save this search by clicking 'Save' in the top menu:
Go to the 'Visualize' tab and create a new visualization. Choose your desired visualization type and select the filtered search we saved:
Configure the visualization to show the source IP addresses ordered by count:
Save the visualization, and go on to the 'Dashboards' tab to create a Dashboard. Click on "Add an existing" and choose the visualization we just created:
There we go. Our Kibana dashboard is set. Don't forget to save it by clicking 'save' at the top menu.
This Node.js Express API provides endpoints for querying and retrieving information from an Elasticsearch cluster that stores CloudTrail logs. The API is designed to expose specific information related to S3 bucket access events, specifically focusing on the "PutObject" events.
- Endpoint: /top-source-ips
- Method: GET
- Description: Retrieves the top source IP addresses associated with "PutObject" events in the CloudTrail logs stored in Elasticsearch.
- Request Parameters: None
- Example Request: curl http://localhost:3000/top-source-ips
- Example Response:
[
  {"sourceIPAddress":"77.138.24.9","count":32},
  {"sourceIPAddress":"109.253.185.10","count":10},
  // ... additional entries
]
The API requires the following environment variables to be configured:
ELASTIC_USERNAME: Username for Elasticsearch authentication
ELASTIC_PASSWORD: Password for Elasticsearch authentication
PORT: Port on which the API server will listen (default is 3000)
For security purposes, ensure that Elasticsearch authentication credentials (ELASTIC_USERNAME and ELASTIC_PASSWORD) are kept secure and are not included in the code. Set up the required environment variables by creating a .env file or directly in your environment.
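To make the endpoint concrete, here is a minimal sketch of how /top-source-ips could be implemented. It assumes Node.js 18+ (for the built-in fetch), the express and dotenv packages, a hypothetical ELASTIC_ENDPOINT variable holding the domain URL, and the cwl-* index pattern used above; the real application may differ in structure and naming.

```javascript
// app.js - illustrative sketch of the /top-source-ips endpoint
require('dotenv').config();
const express = require('express');

const app = express();
const { ELASTIC_USERNAME, ELASTIC_PASSWORD, ELASTIC_ENDPOINT, PORT = 3000 } = process.env;

app.get('/top-source-ips', async (req, res) => {
  try {
    // Terms aggregation over sourceIPAddress, filtered to PutObject events
    const response = await fetch(`${ELASTIC_ENDPOINT}/cwl-*/_search`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: 'Basic ' + Buffer.from(`${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}`).toString('base64'),
      },
      body: JSON.stringify({
        size: 0,
        query: { term: { 'eventName.keyword': 'PutObject' } },
        aggs: { top_source_ips: { terms: { field: 'sourceIPAddress.keyword', size: 10 } } },
      }),
    });
    const data = await response.json();
    const buckets = data.aggregations.top_source_ips.buckets;
    res.json(buckets.map((b) => ({ sourceIPAddress: b.key, count: b.doc_count })));
  } catch (err) {
    res.status(500).json({ error: 'Failed to query Elasticsearch' });
  }
});

app.listen(PORT, () => console.log(`API listening on port ${PORT}`));

module.exports = app; // exported so the tests can import the app
```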
We will set up our API server on an EC2 instance running Ubuntu 22.04, secured with HTTPS behind an Application Load Balancer. Configure a Target Group of the 'Instances' type for our EC2 instance. Choose HTTP as the protocol and port 3000 (the port our API listens on), and place it in the same VPC as our EC2 instance.
For the health check, use port 3000 (the port our API server listens on), choose /top-source-ips as the path, and keep the HTTP success code at '200'. Click next to move on to the 'Register targets' step. Select our EC2 instance, register it, and create the Target Group.
On the EC2 dashboard, create an Application Load Balancer. Make it internet-facing and place it in the same VPC and AZs as our EC2 instance. Create a Security Group for the ALB that allows inbound HTTPS only:
Our API listens on port 3000, but the Load Balancer will accept only HTTPS connections and forward them to our backend API on port 3000. To match this setup, the security group of our EC2 instance needs only one inbound rule, allowing TCP port 3000 from the ALB security group we created. It does not need to accept HTTPS connections, since it only receives requests from the ALB on port 3000.
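As a sketch, the two inbound rules can be expressed with the CLI as follows (the security group IDs are placeholders):

```sh
# ALB security group: allow HTTPS from anywhere
aws ec2 authorize-security-group-ingress \
  --group-id <alb-sg-id> --protocol tcp --port 443 --cidr 0.0.0.0/0

# EC2 security group: allow TCP 3000 only from the ALB security group
aws ec2 authorize-security-group-ingress \
  --group-id <ec2-sg-id> --protocol tcp --port 3000 --source-group <alb-sg-id>
```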
Move on to creating an HTTPS:443 listener that forwards to the Target Group we created, and choose an SSL certificate to attach to the Load Balancer:
If you don't have a certificate, go to AWS Certificate Manager and request a public SSL certificate for a domain you own. You can register a domain on Route 53 or with another registrar.
After reviewing that all the settings are correct, click 'Create Load Balancer'.
After our Load Balancer is ready, go ahead and create a DNS Alias record on Route 53 for the Load Balancer:
Now we can reach our API on the DNS record we configured.
Make sure to install dependencies on the EC2 instance by running
npm install
And our application is ready to go! Another step to make our app more reliable would be configuring it as a service in /etc/systemd/system/nodeapp.service:
[Unit]
Description=Node.js App
[Service]
ExecStart=/usr/bin/node /home/ubuntu/node_app/app.js
Restart=always
User=ubuntu
Group=ubuntu
Environment=PATH=/usr/bin:/usr/local/bin
Environment=NODE_ENV=production
WorkingDirectory=/home/ubuntu/node_app
[Install]
WantedBy=multi-user.target
Run and enable the service to make it run on boot:
sudo systemctl daemon-reload
sudo systemctl enable nodeapp
sudo systemctl start nodeapp
Verify the service is running correctly:
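For example, assuming the service name and port above:

```sh
sudo systemctl status nodeapp
# and hit the endpoint locally
curl http://localhost:3000/top-source-ips
```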
We have implemented continuous integration (CI) for our Node.js Express API using GitHub Actions. The CI process includes linting and running tests automatically whenever changes are pushed to the main branch. The GitHub workflow is triggered on each push to the main branch. It sets up a Node.js environment, installs dependencies, lints the code using ESLint, and runs tests using Mocha. The environment variables ELASTIC_USERNAME and ELASTIC_PASSWORD are securely provided using GitHub Secrets. You can configure the secrets in the GitHub repository under Settings -> Secrets and variables -> Actions:
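A workflow along these lines would live in .github/workflows/ci.yml; the lint and test script names are assumptions about package.json:

```yaml
name: CI
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18
      - run: npm ci
      - run: npm run lint
      - run: npm test
        env:
          ELASTIC_USERNAME: ${{ secrets.ELASTIC_USERNAME }}
          ELASTIC_PASSWORD: ${{ secrets.ELASTIC_PASSWORD }}
```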
These tests use Mocha as the testing framework, Chai for assertions, and Supertest for making HTTP requests to the API. They check that the response is in JSON format and that each item of the array contains the expected sourceIPAddress and count properties, giving us a quick confirmation that the API responds with the expected structure.
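A representative test might look like the sketch below, assuming app.js exports the Express app:

```javascript
// test/api.test.js - illustrative test structure
const { expect } = require('chai');
const request = require('supertest');
const app = require('../app');

describe('GET /top-source-ips', function () {
  this.timeout(10000); // allow time for the Elasticsearch round trip

  it('returns a JSON array of { sourceIPAddress, count } objects', async () => {
    const res = await request(app).get('/top-source-ips');
    expect(res.status).to.equal(200);
    expect(res.headers['content-type']).to.match(/json/);
    expect(res.body).to.be.an('array');
    res.body.forEach((item) => {
      expect(item).to.have.property('sourceIPAddress');
      expect(item).to.have.property('count');
    });
  });
});
```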
This ESLint configuration is specifically tailored for Node.js and Mocha, allowing for linting of test files.
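A minimal .eslintrc.json along those lines might look like this (the extends and ECMAScript version are assumptions):

```json
{
  "env": {
    "node": true,
    "es2021": true,
    "mocha": true
  },
  "extends": "eslint:recommended",
  "parserOptions": {
    "ecmaVersion": 2021
  }
}
```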
By following this CI setup, we ensure that our code is consistently tested and linted with each change.