This code helps you get started with PySpark and Anaconda on AWS.
- Create the Conda environment
conda env create -f environment.yml
- Fill in all the required information (e.g. AWS access key, secret access key) in the config.yml.example file and rename it to config.yml
- Run it
python emr_loader.py
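
Before running the loader, it can help to confirm that the files named in the steps above are in place. The following is a minimal sketch of such a check, not part of this repository: the helper name `check_setup` is hypothetical, and it only tests that the files exist, not that the values in config.yml are valid.

```python
from pathlib import Path

# File names taken from the steps above; the helper itself is hypothetical.
REQUIRED_FILES = ("environment.yml", "config.yml", "emr_loader.py")


def check_setup(project_dir="."):
    """Return a list of problems; an empty list means the files are in place."""
    root = Path(project_dir)
    problems = []
    for name in REQUIRED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    # Common slip: the example file was filled in but never renamed.
    if not (root / "config.yml").is_file() and (root / "config.yml.example").is_file():
        problems.append("config.yml.example found: rename it to config.yml")
    return problems


if __name__ == "__main__":
    issues = check_setup()
    for issue in issues:
        print(issue)
    if not issues:
        print("Setup looks complete; run: python emr_loader.py")
```

Saved under any name in the project directory and run with python, it prints one line per missing file, or a confirmation when everything is present.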