This sample deploys an Agones multi-cluster configuration on Amazon EKS: one routing cluster and two DGS (dedicated game server) clusters, with the multi-cluster allocation feature enabled.
This sample also works as a good Terraform example for the following features:
- Deploy Agones with a Network Load Balancer (NLB) instead of a Classic Load Balancer, using AWS Load Balancer Controller
- Aggregate logs and metrics into Amazon CloudWatch using Fluent Bit and OpenTelemetry
- View Kubernetes resources with Kubernetes Dashboard
- Adjust the number of nodes with Karpenter (you can also find a Cluster Autoscaler version in this tag)
The architecture overview of this sample is shown in the image below.
We adopt the Dedicated Cluster Responsible For Routing topology, which is discussed here. This way, your cluster configurations stay symmetric: all the DGS clusters can share the same configuration, which is simpler than the Single Cluster Responsible For Routing topology, where one special DGS cluster serves game servers in addition to performing the allocation routing function. The All Clusters Responsible For Routing topology seems overkill for a single-region deployment, because it is unlikely that only a single cluster fails while the other clusters in the same region keep working normally. It might improve availability in a multi-region deployment, though.
You must install the following tools before deploying this sample:
- Terraform CLI
- Kubectl
- Helm
- AWS CLI
- After installation, configure credentials with permissions equivalent to the Administrator IAM policy
Please open variable.tf and check the parameters.
You can deploy without any modification, but you may want to change some of the settings, such as the AWS region to deploy to. You can also improve security by restricting the CIDRs that are allowed to connect to the servers; by default, all the servers are protected by mTLS but accept connections from any IP address.
To deploy this sample, you need to run the following commands:
# Install required modules
terraform init
# deploy to your account
terraform apply -auto-approve
It usually takes 20-30 minutes to deploy.
After the deployment completes, check that all the pods are running properly (i.e. in Running state) with the commands below:
aws eks update-kubeconfig --name dgs01
kubectl get pods -A
aws eks update-kubeconfig --name dgs02
kubectl get pods -A
aws eks update-kubeconfig --name router
kubectl get pods -A
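The three checks above can be wrapped in a small helper. The `check_pods` function below is a convenience sketch (not part of the sample): it reads `kubectl get pods -A --no-headers` output and fails if any pod is in a state other than Running or Completed.

```shell
# check_pods: read `kubectl get pods -A --no-headers` output on stdin and exit
# non-zero if any pod is not Running or Completed (status is the 4th column)
check_pods() {
  awk '$4 != "Running" && $4 != "Completed" { bad = 1; print "Not ready: " $2 " (" $4 ")" } END { exit bad }'
}

# Usage (run against each cluster):
#   aws eks update-kubeconfig --name dgs01
#   kubectl get pods -A --no-headers | check_pods && echo "dgs01 looks healthy"
```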
You can follow the official guide to connect to a game server, or run the following commands:
aws eks update-kubeconfig --name dgs01 # or --name dgs02
kubectl get gs
# you will get output like the following
NAME STATE ADDRESS PORT NODE AGE
dgs-fleet-2l7fs-8pjfb Ready ec2-redacted.us-west-2.compute.amazonaws.com 7684 ip-10-0-177-35.us-west-2.compute.internal 66m
dgs-fleet-2l7fs-dtz7c Ready ec2-redacted.us-west-2.compute.amazonaws.com 7039 ip-10-0-177-35.us-west-2.compute.internal 66m
# get the ADDRESS and PORT values from the above output
nc -u {ADDRESS} {PORT}
# now you can send a message and see an ACK returned
As a sample game server, we are running simple-game-server. The available commands are described in its README.md.
You can allocate a game server pod either by using GameServerAllocation API aggregation or via an allocator service client.
To use GameServerAllocation API aggregation, run the following command:
aws eks update-kubeconfig --name router
kubectl create -f example/allocation.yaml
# Try to run above command several times
# You can see some DGS pods are allocated
aws eks update-kubeconfig --name dgs01
kubectl get gs
aws eks update-kubeconfig --name dgs02
kubectl get gs
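For reference, a GameServerAllocation manifest with multi-cluster allocation enabled likely resembles the sketch below; check example/allocation.yaml for the actual content used by this sample (the fleet label here is an assumption based on the game server names shown earlier):

```yaml
apiVersion: allocation.agones.dev/v1
kind: GameServerAllocation
spec:
  # Forward this request to the clusters registered for multi-cluster allocation
  multiClusterSetting:
    enabled: true
  required:
    matchLabels:
      agones.dev/fleet: dgs-fleet   # assumed fleet name, inferred from the gs names above
```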
To use an allocator service client, run the following commands. You can use either the gRPC or the REST interface. Since both are protected by mTLS, you need to set up TLS certificates and private keys first.
NAMESPACE=default # replace with any namespace
EXTERNAL_IP=$(terraform output -raw allocation_service_hostname)
KEY_FILE=client.key
CERT_FILE=client.crt
TLS_CA_FILE=ca.crt
# get certificates locally
terraform output -raw allocation_service_client_tls_key | base64 -d > $KEY_FILE
terraform output -raw allocation_service_client_tls_crt | base64 -d > $CERT_FILE
terraform output -raw allocation_service_server_tls_crt | base64 -d > $TLS_CA_FILE
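Optionally, you can sanity-check the downloaded files before using them. The `verify_keypair` helper below is our sketch (not part of the sample, and it assumes the openssl CLI is installed): it confirms that a certificate and a private key carry the same public key.

```shell
# verify_keypair CERT KEY: succeed only if the certificate and the private key
# carry the same public key (assumes the openssl CLI is available)
verify_keypair() {
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
  key_pub=$(openssl pkey -in "$2" -pubout) || return 1
  [ "$cert_pub" = "$key_pub" ] && echo "cert and key match"
}

# Usage:
#   verify_keypair "$CERT_FILE" "$KEY_FILE"
```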
mv $KEY_FILE $CERT_FILE $TLS_CA_FILE ./example
cd ./example
# Using go example code for gRPC interface
go run allocation-client.go --ip ${EXTERNAL_IP} --namespace ${NAMESPACE} --key ${KEY_FILE} --cert ${CERT_FILE} --cacert ${TLS_CA_FILE} --multicluster true
# Using curl for REST interface
curl --key ${KEY_FILE} \
--cert ${CERT_FILE} \
--cacert ${TLS_CA_FILE} \
-H "Content-Type: application/json" \
--data '{"namespace":"'${NAMESPACE}'", "multiClusterSetting":{"enabled":true}}' \
https://${EXTERNAL_IP}/gameserverallocation \
-v
Note that allocation requests are forwarded from the routing cluster to the DGS clusters by the Agones multi-cluster allocation feature.
You can open the Kubernetes dashboard to see and manage Kubernetes resources in detail. It is already installed in all the clusters. Follow the steps below to open it and log in.
aws eks update-kubeconfig --name <cluster name> # cluster name: dgs01, dgs02, router
kubectl proxy
# Now, open http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:https/proxy/#/login
# Retrieve access token by the below command:
kubectl -n kubernetes-dashboard get secret $(kubectl -n kubernetes-dashboard get sa/admin-user -o jsonpath="{.secrets[0].name}") -o go-template="{{.data.token | base64decode}}"
Agones logs and metrics are aggregated into CloudWatch in this sample. You can easily check them in CloudWatch management console.
To check logs, open the Log groups page and inspect the relevant log groups (e.g. /aws/containerinsights/dgs01/application).
Here you can see application logs ingested in near real time by Fluent Bit. You can configure which logs are included or excluded by modifying modules/fluent_bit/manifests. Please also check the official document for further details.
For metrics, you can open either the All metrics page or the Dashboards page from the CloudWatch management console.
In All metrics page, you can check each metric one by one, which can be useful to check metrics in an ad-hoc manner.
In the CloudWatch Dashboards page, you can create a dashboard to monitor all the required metrics at a glance. This sample includes a sample dashboard for monitoring Agones. You can import the dashboard with the following command:
aws cloudwatch put-dashboard --dashboard-name agones-demo-dashboard --dashboard-body file://example/dashboard.json
Note that the AWS region us-west-2 is hard-coded in dashboard.json. If you deployed this sample in another region, please replace it before creating the dashboard.
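Instead of editing the file by hand, a one-line helper can do the substitution. `set_dashboard_region` below is our sketch (not part of the sample), using sed to swap the region string in place:

```shell
# set_dashboard_region REGION FILE: replace the hard-coded us-west-2 with the
# given region in a dashboard definition (keeps a .bak copy of the original)
set_dashboard_region() {
  sed -i.bak "s/us-west-2/$1/g" "$2"
}

# Usage before importing the dashboard:
#   set_dashboard_region eu-west-1 example/dashboard.json
```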
After put-dashboard succeeds, you can open the imported dashboard from the CloudWatch management console.
You can freely and intuitively customize the dashboard via management console. Please also refer to this document if you need further information.
You can add or remove the metrics ingested into CloudWatch by modifying otel.yaml.
The main documentation for AWS Distro for OpenTelemetry is here.
You can also refer to the URLs commented in the file for further details of each config.
Currently there are only two DGS clusters, but you can easily add more.
To add a DGS cluster, open main.tf and declare another instance of the ./modules/dgs_cluster module. You also need to add the new module to the local.dgs_clusters list variable.
# Add this
module "dgs03" {
source = "./modules/dgs_cluster"
cluster_name = "dgs03"
vpc = module.vpc
cluster_endpoint_public_access_cidrs = var.cluster_endpoint_allowed_cidrs
gameserver_allowed_cidrs = var.gameserver_allowed_cidrs
}
# Don't forget to also edit this variable
locals {
dgs_clusters = [module.dgs01, module.dgs02, module.dgs03]
}
To avoid incurring future charges, clean up the resources you created.
You can remove all the AWS resources deployed by this sample by running the following command:
terraform destroy -auto-approve
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.