Alright, time to see what all the hype is about and start some science in the cloud.
For both pangeo cloud and coiled you have to sign in. It's free, so just sign up for both right now.
You will likely use pangeo cloud as your 'frontend', meaning the spot in the cloud where your jupyter notebook is running. You currently have the choice between two different deployments, on Google Cloud or AWS. Each of these has a 'staging' and a 'production' deployment, which differ in the versions that are available in the environment ('staging' has the more recent versions).
You can choose your deployment on the pangeo cloud federation repo.
Choosing a cloud provider is usually a question of where your data lives. You always want to 'bring your compute to the data', so match the deployment to the location of your dataset.
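As a rough illustration of 'bring your compute to the data', here is a minimal sketch that guesses the provider from the URL scheme of a dataset. The helper name `provider_for`, the scheme mapping, and the bucket names are all made up for this example; adjust them to wherever your data actually lives.

```python
from urllib.parse import urlparse

# Assumed mapping from URL scheme to cloud provider (illustrative, not exhaustive).
SCHEME_TO_PROVIDER = {
    "gs": "Google Cloud",
    "gcs": "Google Cloud",
    "s3": "AWS",
}


def provider_for(dataset_url):
    """Return the cloud provider suggested by the scheme of dataset_url."""
    scheme = urlparse(dataset_url).scheme.lower()
    if scheme not in SCHEME_TO_PROVIDER:
        raise ValueError(f"cannot infer a cloud provider from {dataset_url!r}")
    return SCHEME_TO_PROVIDER[scheme]


# Hypothetical bucket names, only to show the call.
print(provider_for("gs://my-bucket/ocean.zarr"))
print(provider_for("s3://my-bucket/ocean.zarr"))
```

If the data sits on Google Cloud, pick the Google Cloud deployment; if it sits on AWS, pick the AWS one.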
For coiled you currently should use AWS only!
Once you have decided, you will have to start a server.
Most of the time you want to choose the server with the pangeo-Notebook installed.
For the Google Cloud deployments, you will be able to choose different resources for your cloud server. It's usually faster to get a small or medium server, and since we will get the computing power via dask below, those are often enough here.
So the above will usually give you a node with 2-4 cores and a few GB of RAM. But obviously we want
You can currently choose between two ways of getting a dask cluster:
- Via pangeo cloud:
  - Super easy to set up:

        from dask_gateway import GatewayCluster
        cluster = GatewayCluster()
        cluster.adapt(minimum=2, maximum=10)  # or cluster.scale(n) to a fixed size
        client = cluster.get_client()
        client

  - Hard to change dependencies on the dask workers (see here for some workarounds)
  - Costs pangeo money
- Via coiled:
  - Just slightly more complicated setup:

        import coiled
        from dask.distributed import Client
        # make sure to match the
        coiled.create_software_environment(
            name='pangeo-cloud',
            container='pangeo/pangeo-notebook',  # matches Pangeo Cloud AWS staging cluster with latest image
            # container='pangeo/pangeo-notebook:2021.05.04',  # matches Pangeo Cloud AWS staging cluster
            # container='pangeo/pangeo-notebook:2021.05.04',  # matches Pangeo Cloud AWS production cluster
        )
        cluster = coiled.Cluster(software="pangeo-cloud")
        client = Client(cluster)
        client

  - Limited to AWS at the moment
  - It is super easy to set up a custom environment on the dask workers (see example)
  - We get free credits with our consulting contract
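If you switch between the two options a lot, you could wrap both setups in one small helper. This is only a sketch: `start_cluster` and its arguments are names invented for this example, it assumes the 'pangeo-cloud' software environment on coiled has already been created, and it uses the `n_workers` argument of `coiled.Cluster` to request a fixed cluster size.

```python
def start_cluster(backend, min_workers=2, max_workers=10):
    """Start a dask cluster on the chosen backend and return (cluster, client).

    backend is either 'gateway' (pangeo cloud) or 'coiled'.
    """
    # Validate first, so mistakes surface before any cloud resources are requested.
    if backend not in ("gateway", "coiled"):
        raise ValueError(f"unknown backend {backend!r}; use 'gateway' or 'coiled'")
    if not 0 <= min_workers <= max_workers:
        raise ValueError("need 0 <= min_workers <= max_workers")

    if backend == "gateway":
        # Imported lazily so only the chosen backend's package has to be installed.
        from dask_gateway import GatewayCluster

        cluster = GatewayCluster()
        # Let the cluster grow and shrink with the workload.
        cluster.adapt(minimum=min_workers, maximum=max_workers)
        client = cluster.get_client()
    else:
        import coiled
        from dask.distributed import Client

        # Assumes the 'pangeo-cloud' software environment already exists on coiled.
        cluster = coiled.Cluster(software="pangeo-cloud", n_workers=max_workers)
        client = Client(cluster)

    return cluster, client
```

In a notebook you would then call `cluster, client = start_cluster("gateway")` (or `"coiled"`), and run `client.close()` followed by `cluster.close()` when you are done, so no workers keep running and costing money.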