Build hdf5 clibs; install xarray deps for netcdf #65

Status: Open. Wants to merge 3 commits into base: main
20 changes: 3 additions & 17 deletions terraform/.terraform.lock.hcl

Some generated files are not rendered by default, so no diff is shown for this file.

45 changes: 43 additions & 2 deletions terraform/docker/Dockerfile
@@ -3,8 +3,45 @@ FROM public.ecr.aws/emr-serverless/spark/emr-7.0.0:latest
 USER root
 WORKDIR /opt

-RUN yum update -y && yum install -y git

+# Update and install required packages
+RUN dnf update -y && \
+    dnf install -y \
+        git \
+        gcc \
+        gcc-c++ \
+        make \
+        wget \
+        zlib-devel \
+        openmpi-devel \
+        python3-devel && \
+    dnf clean all
+
+
+# Set up MPI environment
+ENV PATH=/usr/lib64/openmpi/bin:$PATH
+ENV LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
+ENV CPATH=/usr/lib64/openmpi/include:$CPATH
+ENV MPI_CC=mpicc
+
+RUN pip3 install mpi4py
+
+# Download and build HDF5 from source with thread safety and parallel enabled
+RUN wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.14/hdf5-1.14.3/src/hdf5-1.14.3.tar.gz && \
+    tar -xzf hdf5-1.14.3.tar.gz && \
+    cd hdf5-1.14.3 && \
+    ./configure --prefix=/usr --disable-fortran --enable-hl --enable-parallel --with-zlib=/usr/include,/usr/lib CPPFLAGS=-I/usr/include/openmpi-aarch64 && \
+    make -j -l6 && \
+    make install && \
+    cd .. && \
+    rm -rf hdf5-1.14.3 hdf5-1.14.3.tar.gz
+
+# Set the HDF5 library path
+ENV HDF5_DIR=/usr
+ENV LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH
+ENV CPATH=/usr/include:/usr/include/openmpi-aarch64:$CPATH
+
+# Install Python packages
 RUN pip3 install \
     s3fs \
     gcsfs \
@@ -16,7 +53,11 @@ RUN pip3 install \
     venvception>=0.0.5 \
     jupyter-repo2docker \
     pangeo-forge-recipes \
-    git+https://github.com/ranchodeluxe/beam-pyspark-runner@patch-2
+    netcdf4 \
+    h5netcdf \
+    wheel \
+    git+https://github.com/ranchodeluxe/beam-pyspark-runner@patch-2 && \
+    HDF5_MPI="ON" HDF5_DIR=/usr pip3 install --no-binary=h5py h5py

 WORKDIR /home/hadoop
 USER hadoop:hadoop
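
Review note: the Dockerfile change is meant to leave the image with an MPI-enabled h5py built against the source-compiled HDF5. Because h5netcdf depends on h5py, pip may already have pulled in a prebuilt h5py wheel during the first `pip3 install`, in which case the later `--no-binary=h5py` step could be treated as already satisfied. The sketch below is one way to check the result from inside the built image. The function name `check_hdf5_stack` is hypothetical and not part of this PR; `h5py.get_config().mpi` is the h5py attribute that reports whether the build has MPI support.

```python
import importlib


def check_hdf5_stack(module_names=("h5py", "netCDF4", "h5netcdf")):
    """Return a dict mapping each module name to a short status string.

    Hypothetical helper for smoke-testing the image; not part of the PR.
    """
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            # Covers both a missing package and a failed native-library load.
            report[name] = f"missing ({exc.__class__.__name__})"
            continue
        status = f"ok, version {getattr(module, '__version__', 'unknown')}"
        if name == "h5py":
            # True only when h5py was compiled against a parallel HDF5.
            status += f", mpi={module.get_config().mpi}"
        report[name] = status
    return report


if __name__ == "__main__":
    for name, status in check_hdf5_stack().items():
        print(f"{name}: {status}")
```

If the h5py line reports `mpi=False`, adding `--force-reinstall` to the final `pip3 install` is one option worth trying, on the assumption that a wheel was installed first.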
10 changes: 4 additions & 6 deletions terraform/emr/main.tf
@@ -74,13 +74,11 @@ resource "aws_ecr_repository_policy" "emr_serverless_ecr_policy" {
 }

 # Execution role (permissions for actual job runs)
-data "template_file" "execution_role_policy" {
-  template = file(var.execution_role_template)
-
-  vars = {
-    region = var.region
+locals {
+  execution_role_policy = templatefile(var.execution_role_template, {
+    region = var.region
     account_id = var.account_id
-  }
+  })
 }

 resource "aws_iam_policy" "emr_execution_policy" {
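
Review note: this hunk replaces the `template_file` data source (from the separate `hashicorp/template` provider) with Terraform's built-in `templatefile()` function, which renders the same `${region}` and `${account_id}` placeholders without an extra provider. As a rough analogy only, the Python sketch below performs the same kind of substitution with `string.Template`. The policy text is a hypothetical stand-in, since the file behind `var.execution_role_template` is not shown in this PR, and Terraform's template language supports more than plain substitution.

```python
import json
from string import Template

# Hypothetical stand-in for the file behind var.execution_role_template.
POLICY_TEMPLATE = """{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["logs:DescribeLogGroups"],
      "Resource": "arn:aws:logs:${region}:${account_id}:*"
    }
  ]
}"""


def render_policy(template_text: str, region: str, account_id: str) -> dict:
    """Fill the two placeholders and parse the result as JSON."""
    rendered = Template(template_text).substitute(
        region=region, account_id=account_id
    )
    # Parsing catches a template that renders to invalid JSON.
    return json.loads(rendered)


policy = render_policy(POLICY_TEMPLATE, "us-west-2", "123456789012")
print(policy["Statement"][0]["Resource"])
```

`substitute` raises `KeyError` when a placeholder has no value, which loosely mirrors Terraform failing when a template references a variable missing from the map.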
2 changes: 1 addition & 1 deletion terraform/main.tf
@@ -9,7 +9,7 @@ provider "aws" {
 }

 terraform {
-  required_version = "1.7.4"
+  required_version = "~> 1.7.4"
   required_providers {
     aws = {
       source = "hashicorp/aws"
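
Review note: changing `required_version` from `"1.7.4"` to `"~> 1.7.4"` relaxes an exact pin into a pessimistic constraint, which lets only the rightmost component increase (here, at least 1.7.4 and below 1.8.0). The Python sketch below illustrates that rule for plain numeric versions. The function name is hypothetical, and pre-release suffixes are deliberately not handled.

```python
def satisfies_pessimistic(version: str, constraint_base: str) -> bool:
    """Check `version` against `~> constraint_base` for numeric versions.

    Hypothetical illustration of the `~>` rule; not Terraform's own code.
    """
    have = tuple(int(part) for part in version.split("."))
    want = tuple(int(part) for part in constraint_base.split("."))
    if len(want) < 2:
        raise ValueError("constraint needs at least two components")
    # Upper bound: bump the second-to-last component, drop the last.
    upper = want[:-2] + (want[-2] + 1,)
    return want <= have and have[: len(upper)] < upper


for candidate in ("1.7.3", "1.7.4", "1.7.5", "1.8.0"):
    print(candidate, satisfies_pessimistic(candidate, "1.7.4"))
```

Under this rule, later 1.7.x patch releases are accepted, while 1.8.0 and anything older than 1.7.4 are rejected.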