
soci in codebuild: containerd mountpoint? #4

Open
jrobison-sb opened this issue Dec 5, 2023 · 18 comments

@jrobison-sb

My goal is to build soci indexes inside AWS codebuild, like this.

When I try setting up containerd using a tmpfs mountpoint, ctr image pull ... seems to work at first, but then it runs out of disk space: tmpfs defaults to a size of half the system RAM, and my image is larger than that.
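For reference, the half-of-RAM figure is only the tmpfs default; the size is a mount option, so an explicitly sized mount can be requested (the 16g value below is an arbitrary example, and the filesystem is still backed by RAM/swap, so this only helps if the host has the capacity):

```shell
# tmpfs defaults to size=50% of RAM; request a larger filesystem explicitly
sudo mount -t tmpfs -o size=16g tmpfs /containerd
```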

When I try setting up containerd without using a tmpfs mountpoint, like this:

                - mkdir /containerd
                # - mount -t tmpfs tempfs /containerd
                - echo Start Containerd
                - sudo containerd --root /containerd &
                - sleep 3

Then ctr image pull ... fails with an error saying failed to convert whiteout file.

Do you have any suggestions for building these indexes inside CodeBuild when my image is larger than half the available RAM?

Thanks.

@ollypom
Contributor

ollypom commented Jan 19, 2024

Hey @jrobison-sb sorry for the delayed response, I didn't have notifications turned on for this repo (I do now) 😞

I looked into creating a SOCI Index in CodeBuild again today and realized we don't actually need to run a second containerd daemon. There is already a containerd daemon (with an accessible socket) running in the environment underneath CodeBuild's Docker Engine, so we can just leverage that instead, removing all of the filesystem / mount restrictions of containers-in-containers.

The containerd endpoint is at /var/run/docker/containerd/containerd.sock, so just export that as the CONTAINERD_ADDRESS environment variable in your CodeBuild project and away you go.
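In buildspec terms this is just one extra line before any ctr / soci commands (a minimal sketch, assuming the socket path above):

```shell
# Point the ctr and soci CLIs at Docker's embedded containerd daemon
export CONTAINERD_ADDRESS="/var/run/docker/containerd/containerd.sock"
```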

I've updated the CodeBuild sample in this repo. But the gist is, I create the container image and export it as a tarball in a single docker buildx command. I then import the image into containerd, index it with soci, and then push the image and the index to ECR.

- echo Building the container image
- docker buildx create --driver=docker-container --use
- docker buildx build --quiet --tag $IMAGE_URI:$IMAGE_TAG --file Dockerfile.v2 --output type=oci,dest=./image.tar .
- echo Import the container image to containerd
- ctr image import ./image.tar
- echo Generating SOCI index
- soci create $IMAGE_URI:$IMAGE_TAG
- echo Pushing the container image
- ctr image push --user AWS:$PASSWORD $IMAGE_URI:$IMAGE_TAG
- echo Push the SOCI index to ECR
- soci push --user AWS:$PASSWORD $IMAGE_URI:$IMAGE_

@jrobison-sb
Author

@ollypom thanks for your reply. These new directions without an intermediate push-pull step are much cleaner, thanks for that.

I gave this another try using the updated directions. We currently use Codebuild AL2 standard:2.0 image on arm64, which ships with Docker v20. I'm guessing that Docker v20 isn't compatible with the new suggestions (specifically the docker buildx ... commands), because when I try docker buildx create --driver, I get an error saying "unknown flag: --driver".

This would probably work if we upgraded to AL2 standard:3.0 image on arm64 (which ships with Docker v23), but apparently docker buildx is also broken on AL2 standard:3.0 for arm64, even though buildx is supported on amd64.

I guess I'll just have to wait for aws/aws-codebuild-docker-images#640 to be resolved and then revisit this.

@ollypom
Contributor

ollypom commented Jan 20, 2024

Oh interesting. I didn't realize docker buildx is not in the arm64 CodeBuild image. Great call out @jrobison-sb thanks for letting me know.

I have now added an alternative method for arm64 in a multi architecture example. This example uses docker buildx for x86, and docker build for arm64. I have to admit the way I have created the manifest list using the manifest-tool could be simplified by using docker manifest instead, but I wanted to experiment with the lambda compute type. I have added some background in the README.md.

For the arm64 image here are the steps to create the image, export it, and then import it back into containerd. This should work with any Docker version (including the x86 image).

- echo Building the container image
- docker build --quiet --tag $IMAGE_URI:$IMAGE_TAG --file Dockerfile.v2 .
- echo Export the container image
- docker save --output ./image.tar $IMAGE_URI:$IMAGE_TAG
- echo Import the container image to containerd
- ctr image import ./image.tar

@kichik

kichik commented Jan 21, 2024

I couldn't get any of the examples to work for various reasons. I ended up forking the Lambda snapshotter and making it standalone so I can use it in CodeBuild with no dependencies. You might also find it useful for your use case.

https://github.com/CloudSnorkel/standalone-soci-indexer

@ollypom
Contributor

ollypom commented Jan 30, 2024

@jrobison-sb Looks like the arm64 CodeBuild image now includes Buildx :)

@kichik ooh interesting, thank you for sharing. I've transferred this over to a feature request in the soci-snapshotter repository to see if there is appetite to support soci create without a container runtime. Do you mind adding details of your use case to that issue? Ty!

@kichik

kichik commented Jan 30, 2024

I have the same use case as this ticket. I am trying to create an index in CodeBuild. Specifically here.

@jrobison-sb
Author

@ollypom thanks for the new simplified directions. Using those directions I'm able to:

docker build -t $REPOSITORY_URI:$IMAGE_TAG .
docker image save --output image.tar $REPOSITORY_URI:$IMAGE_TAG
ctr image import ./image.tar
soci create $REPOSITORY_URI:$IMAGE_TAG
docker push $REPOSITORY_URI:$IMAGE_TAG

And all of the above works fine.

But then when I try to soci push --user AWS:$(aws ecr get-login-password --region $AWS_DEFAULT_REGION) $REPOSITORY_URI:$IMAGE_TAG, it seemingly pushes a few layers (or whatever) and then gives me this error:

[Container] 2024/01/31 19:52:13.677265 Running command soci push --user AWS:$(aws ecr get-login-password --region $AWS_DEFAULT_REGION) $REPOSITORY_URI:$IMAGE_TAG
5214 | checking if a soci index already exists in remote repository...
5215 | pushing soci index with digest: sha256:c9e4f0bef52c80a425dc71f18f3fb782b13ede71a3c9917b10e6921b93e6dfce
5216 | pushing artifact with digest: sha256:42ca96a18056ef0fdd3fcd731dd039b2d9370c283a0cafff81d530083935567d
5217 | pushing artifact with digest: sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
5218 | soci: error pushing graph to remote: sha256:b5f6cd50db4808bca7c327f0c25c9dd35ba3dc1e32e64795acbf532c40d08134: application/vnd.docker.distribution.manifest.v2+json: not found

I googled the error but didn't find anything, unfortunately. I assume there's probably just not that many folks on earth who are building SOCI indexes just yet. Have you seen this one before?

Is it a conflict that I push the image to ECR by way of docker push (which pushes from the docker image store) and then I push the SOCI index by way of soci push (which, as I understand it, pushes from the containerd image store)?

Thanks for any thoughts you might have on this.

Full buildspec and other codebuild specifics here.

Codebuild image: aws/codebuild/amazonlinux2-aarch64-standard:3.0 (this is ARM, not Intel)

Base image in our Dockerfile: ecr-public/docker/library/ruby:3.0.2-bullseye

Buildspec (with some business-y logic removed, but this should illustrate what we're doing with docker, at least):

version: 0.2

env:
  variables:
    CONTAINERD_ADDRESS: "/var/run/docker/containerd/containerd.sock"
  parameter-store:
    ECR_URL: /production/some_service/ecr_repository_url

phases:
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - AWS_DEFAULT_REGION=us-east-1
      - ACCOUNT_ID=$(echo $CODEBUILD_BUILD_ARN | cut -d":" -f 5)
      - >
        aws ecr get-login-password --region $AWS_DEFAULT_REGION |
        docker login --username AWS --password-stdin "$ACCOUNT_ID".dkr.ecr."$AWS_DEFAULT_REGION".amazonaws.com
      - REPOSITORY_URI=$ECR_URL
      - IMAGE_TAG=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)

      # Install dependencies for building soci indexes
      # https://github.com/aws-samples/aws-fargate-seekable-oci-toolbox/blob/main/soci-codepipeline/cloudformation.yaml#L358-L384
      # https://docs.aws.amazon.com/AmazonECS/latest/userguide/container-considerations.html?trk=3d9bf291-787b-4280-bb00-c4a8e441a748&sc_channel=el#fargate-tasks-soci-images
      - echo Download the SOCI Binaries
      - wget --quiet -O /tmp/soci.tar.gz https://github.com/awslabs/soci-snapshotter/releases/download/v0.4.1/soci-snapshotter-0.4.1-linux-arm64.tar.gz
      - tar xvf /tmp/soci.tar.gz -C /usr/local/bin/ soci
  build:
    on-failure: ABORT
    commands:
      - set -x
      - echo Building the Docker image...
      - >
        DOCKER_BUILDKIT=1 docker build -t $REPOSITORY_URI:$IMAGE_TAG -f aws/Dockerfile
        --target final
        --cache-from $REPOSITORY_URI:latest
        --build-arg FOO=BAR
        .
      - echo Starting a DB migration
      - docker images
      - docker run --rm $REPOSITORY_URI:$IMAGE_TAG bundle exec rake db:migrate
      - docker tag $REPOSITORY_URI:$IMAGE_TAG $REPOSITORY_URI:latest
  post_build:
    commands:
      - echo Build completed on `date`
      # Export the image to a tar file so the tar file can be imported to containerd
      # so that it can be used in the SOCI index
      - docker image save --output image.tar $REPOSITORY_URI:$IMAGE_TAG
      - echo Import the container image to containerd
      - ctr image import ./image.tar
      - echo Generating SOCI index
      - soci create $REPOSITORY_URI:$IMAGE_TAG
      - echo Pushing the Docker images...
      - docker push $REPOSITORY_URI:latest
      - docker push $REPOSITORY_URI:$IMAGE_TAG
      - echo Push the SOCI index to ECR
      - soci push --user AWS:$(aws ecr get-login-password --region $AWS_DEFAULT_REGION) $REPOSITORY_URI:$IMAGE_TAG
      - echo Writing image definitions file...
      - printf '[{"name":"some-service","imageUri":"%s"},{"name":"some-service-nginx","imageUri":"%s"}]' $REPOSITORY_URI:$IMAGE_TAG $REPOSITORY_URI:$IMAGE_TAG > imagedefinitions.json


artifacts:
    files:
      - imagedefinitions.json

@ollypom
Contributor

ollypom commented Feb 2, 2024

@jrobison-sb you are correct. The container image digest changes through the docker save and ctr import process, discussed here in the containerd repo.

In this context, you have indexed a container image whose digest starts with b5f6cd50db; this is the digest of the image as stored in containerd. However, an image with this digest cannot be found in ECR by the soci CLI, hence the error. In your case, ECR instead holds the image with the digest from the Docker image store, i.e. the digest from before docker save.

To work around this, push the container image with ctr image push instead of docker push before you do a soci push. As shown in the example:

- ctr image push --user AWS:$PASSWORD $IMAGE_URI:$IMAGE_TAG
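A quick way to see the mismatch for yourself (a sketch; it assumes the AWS CLI is configured, and my-repo is a placeholder for your repository name) is to compare the digest containerd holds against what ECR has:

```shell
# Digest of the image as containerd sees it (after docker save / ctr import)
ctr image ls

# Digests ECR actually holds (pushed by docker push from the Docker image store)
aws ecr describe-images --repository-name my-repo \
  --query 'imageDetails[].imageDigest'
```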

@kichik

kichik commented Feb 2, 2024

@ollypom does this affect the export the Lambda solution does as well? With my fork of the Lambda solution, I am able to push indexes, but they don't seem to be used (checked with curl $ECS_CONTAINER_METADATA_URI_V4 | jq .Snapshotter). Could this be because the import process modifies the image hash there as well?

@jrobison-sb
Author

@ollypom thanks for your reply. ctr image push instead of docker push unblocked me and I'm now able to build and push my images and also build and push SOCI indexes. And this appears to have reduced our Fargate startup times by over 50%.

But even now that I've gotten this working, I do still have an appetite for awslabs/soci-snapshotter#1057 and would definitely use it if it ever became a thing. If I was to ever quit my job or get hit by a bus or something, I would expect the next engineer to already be familiar with the usual docker build && docker tag && docker push steps in our build pipeline. I probably wouldn't expect them to be familiar with ctr commands. So I'd like to return to the more familiar workflow if a standalone tool could be used for building and pushing these indexes. And a standalone index builder would also reduce our pipeline duration, since docker image save && ctr image import does take a few minutes, too.

Anyway, now that I've gotten this working, I don't need any additional help from @ollypom. I'll leave this ticket open to allow the discussion with @kichik to continue, though. Thanks again.

@ollypom
Contributor

ollypom commented Feb 7, 2024

Hey @kichik, it does not. If you have the three artifacts in your ECR repository, (1) the container image, (2) the SOCI index, and (3) the OCI image index (used to map the SOCI index to the container image), then you should be good to go with Fargate. Artifacts 2 and 3 are what the Lambda / your standalone tool creates, assuming the considerations in the ECS documentation have been met.
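One way to sanity-check that all three artifacts landed in the repository (a sketch; my-repo is a placeholder, and exact output fields may vary by AWS CLI version) is to list the manifest media types ECR holds for the repository:

```shell
# Expect a container image manifest, a SOCI index, and an OCI image index
aws ecr describe-images --repository-name my-repo \
  --query 'imageDetails[].[imageTags, imageManifestMediaType]'
```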

Happy to help troubleshoot over DM / call if it's easier.


Hey @jrobison-sb completely understandable. In the latest versions of Docker (v24 and v25) you can index images that have been built with docker build with the new experimental containerd image store support in the Docker Engine, allowing you to remove the ctr commands. I have written up an example here. Once the codebuild amazon linux images have been upgraded from Docker v23, I'll upgrade the examples in this repo again 😄
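For anyone trying the containerd image store route, it is enabled via a daemon feature flag (per the Docker documentation for the experimental containerd store; setting it requires a daemon restart):

```json
{
  "features": {
    "containerd-snapshotter": true
  }
}
```

With that set in /etc/docker/daemon.json, docker build writes straight into containerd's image store, so the docker save / ctr import round-trip, and the digest change it causes, goes away.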

@kichik

kichik commented Feb 7, 2024

Happy to help troubleshoot over DM / call if it's easier.

That would be really appreciated. I only see Soci Index in ECR. There is no Image Index. Where can I DM you? I'm on CDK Slack, Twitter or whatever works.

@uhaiderdev
Copy link


@ollypom is it possible to use zstd compression with these commands as well? If yes, what needs to be changed?

@ollypom
Contributor

ollypom commented Mar 12, 2024

Hey @uhaiderdev , unfortunately the SOCI snapshotter does not support zstd compression at the moment, it is tracked in the upstream project here.

@uhaiderdev

uhaiderdev commented Mar 12, 2024

@ollypom I am using the aws/codebuild/amazonlinux2-x86_64-standard:4.0 image in CodeBuild with Environment Type Linux EC2 and compute type EC2.

I tried the following

export CONTAINERD_ADDRESS="/var/run/docker/containerd/containerd.sock"

[Container] 2024/03/12 19:30:35.860027 Running command ctr image import ./image.tar
--
134 | ctr: failed to dial "/var/run/docker/containerd/containerd.sock": context deadline exceeded: connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout"
135 |  
136 | [Container] 2024/03/12 19:30:47.281212 Command did not exit successfully ctr image import ./image.tar exit status 1
137 | [Container] 2024/03/12 19:30:47.298706 Phase complete: BUILD State: FAILED
138 | [Container] 2024/03/12 19:30:47.298739 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: ctr image import ./image.tar. Reason: exit status 1

What could be the problem? Thanks

@ollypom
Contributor

ollypom commented Mar 13, 2024

Hey @uhaiderdev, I'm unable to recreate your issue. Are you able to share your buildspec file?

The environment that I used to try and recreate your issue is as follows:

Environment:
  ComputeType: BUILD_GENERAL1_SMALL
  Image: aws/codebuild/amazonlinux2-x86_64-standard:4.0
  PrivilegedMode: true
  Type: LINUX_CONTAINER

(FWIW there is no "Environment Type Linux EC2 and compute type EC2" as per the reference guide)

And then my commands are the same mentioned above in this thread, but I have also explicitly set the CONTAINERD_ADDRESS.

- docker build --quiet --tag $IMAGE_URI:$IMAGE_TAG --file Dockerfile.v2 .
- echo Export the container image
- docker save --output ./image.tar $IMAGE_URI:$IMAGE_TAG
- echo Import the container image to containerd
- export CONTAINERD_ADDRESS="/var/run/docker/containerd/containerd.sock"
- ctr image import ./image.tar

@uhaiderdev

@ollypom I had to set PrivilegedMode to true and it worked. I think it would be good to add this to the README.md. If you want, I can create a pull request for that.

Thanks for your help.

@mfittko
Copy link

mfittko commented May 4, 2024

We're using @kichik 's standalone indexer and it's working fine: container startup times on ECS Fargate have essentially halved 🎉. We also use a custom Docker image for our CodeBuild containers, as we want full control over the build environment. Does anyone know whether, if we build SOCI indexes for our CodeBuild image as well, they will be used by the CodeBuild runner? So far the indexes seem to have no impact; we're seeing CodeBuild provisioning times of about 45s to 1m, which I think could be closer to 15-20s.
