Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker Changes #574

Open
wants to merge 3 commits into
base: staging
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# A simple container for variant-service.
# Runs service on port 80.
# Healthchecks service up every 5m.

FROM python:3.10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably good to use the latest Python version

Suggested change
FROM python:3.10
FROM python:3.12

Copy link
Author

@rajatkapoordfci rajatkapoordfci Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can upgrade the python version to 3.12 , however this would depend on the compatibility with all the dependant modules of Variation Normalizer. If currently Variation Normalizer is working on 3.12 , we can upgrade to 3.12.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Variation Normalizer works with 3.12

RUN apt update ; apt install -y rsync
RUN pip install pipenv uvicorn[standard]
COPY . /app
WORKDIR /app
RUN if [ ! -f "Pipfile.lock" ] ; then pipenv lock ; else echo Pipfile.lock exists ; fi
RUN pipenv sync
Comment on lines +10 to +11
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of pipenv, can we just install deps using pip?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be done , but this would require change in Docker commands. I have followed the existing setup of Variation Normalizer mentioned in Readme which is using pip. Any particular reason for moving to pip approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajatkapoordfci We've been removing pipenv from our projects in favor of pip (and we'll probably switch to uv). pipenv is also pretty slow

EXPOSE 80
HEALTHCHECK --interval=5m --timeout=3s \
CMD curl -f http://localhost/variation || exit 1

CMD pipenv run uvicorn variation.main:app --port 80 --host 0.0.0.0
22 changes: 22 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -198,3 +198,25 @@ From the _root_ directory of the repository:
```shell
pytest tests/
```

### Docker Setup

From the root directory , where Docker and docker-compose file are , run the following commands:

*docker-compose up -d db
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is clearer to rename db to seqrepo

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, your team can finalise the naming convention for the services . I will rename them accordingly.

*docker-compose up -d uta
*docker-compose up -d dynamodb-local
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is clearer to rename dynamodb-local to vicc-normalizers-ddb or something similar. I assume in the future, we will be re-using this for gene/disease/therapy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, your team can finalise the naming convention for the services . I will rename them accordingly.


Wait for all containers to start , Db service which is seqrepo container takes time and exits once the sequences download is over.

Before starting the Variation-normalizer service, check the following:

* Db container(seqrepo) has finished downloading data and has exited.
* UTA service has dispalyed the message " database system is ready to accept connections" in logs.
* Dynamodb container is up and running.

Start the Variation Normalizer container with the following commnad:

docker-compose up -d app

The container will be up and running however , it downloads the gene normalizer data and that takes time. Please wait till the ETL for gene database is done. After that Variation normalizer is ready to use.
58 changes: 58 additions & 0 deletions docker-compose.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not get this to work on my machine and had to make some modifications. Have you pushed all your commits? And were you able to get this working on your machine?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes , It working fine on my machine without any change. Please share whether it's working in your machine or not. If not what are the errors. We can connect on call as well to discuss further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajatkapoordfci what machine are you using again? We use MacOS.

The main issues I encountered were not having the correct dependencies installed (gene-normalizer etl requires the etl optional dependency to be installed) and permissions errors.

If you want, I can make a PR into your branch with the changes I had to make in order to get it semi-working. Cool-Seq-Tool creates a genomic table (which I'm not sure this is the best approach and I have an issue to revisit this) and requires some additional setup before starting the service. This part I did not get to fixing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/cancervariants/variation-normalization/tree/docker-compose-kori Here is my branch where I made some changes. I didn't get to fixing the Cool-Seq-Tool additional genomic table in UTA db yet. @rajatkapoordfci thanks again for working on this. It's great work and exciting to see!

Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
version: "3.9"

services:
app:
build:
context: .
environment:
SEQREPO_ROOT_DIR: /usr/local/share/seqrepo/2024-02-20
AWS_ACCESS_KEY_ID: 'DUMMYIDEXAMPLE'
AWS_SECRET_ACCESS_KEY: 'DUMMYEXAMPLEKEY'
AWS_DEFAULT_REGION: 'us-west-2'
GENE_NORM_DB_URL: http://dynamodb-local:8000

links:
- "dynamodb-local"

ports:
- "8001:80"

volumes:
- seqrepo_vol:/usr/local/share/seqrepo
command: >
sh -c "pipenv run gene_norm_update --update_all --update_merged &&
cd src wait_for_db &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wait_for_db?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait_for_db is the custom function to handle the database race condition , as depends on only checks whether service is running or not. It will be implemented in next phase. Please review further and I will remove this command for now along with other review comments(if any) for this phase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajatkapoordfci Would you be able to update the PR description with what this phase intended to achieve and what the next phase(s) include?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Add Phase scope in the description.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

pipenv run uvicorn variation.main:app --log-level debug --port 80 --host 0.0.0.0"
depends_on:
- db
- dynamodb-local
- uta

db:
image: biocommons/seqrepo:latest
volumes:
- seqrepo_vol:/usr/local/share/seqrepo

uta:
image: biocommons/uta:uta_20210129b
environment:
- POSTGRES_PASSWORD=some-password-that-you-make-up
volumes:
- seqrepo_vol:/var/lib/postgresql/data
ports:
- 5432:5432

dynamodb-local:
command: "-jar DynamoDBLocal.jar -sharedDb -dbPath ./data"
image: "amazon/dynamodb-local:1.18.0"
container_name: dynamodb-local
ports:
- "8000:8000"
user: root
volumes:
- "seqrepo_vol:/home/dynamodblocal/data"
working_dir: /home/dynamodblocal

volumes:
seqrepo_vol:
uta_vol:
Loading