Contributing Guide

Lawrence Fernandes edited this page Aug 19, 2022 · 6 revisions

How to Contribute

We'd love to accept your patches and contributions to this project. There are just a few small guidelines you need to follow.

Code Reviews

All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose. Consult GitHub Help for more information on using pull requests.

Before opening a pull request to suggest a feature change, please open a GitHub Issue to discuss the use case and feature proposal with the project maintainers. After this alignment, you can fork the project, improve it, and open a pull request!

Development

Debussy is currently being developed on native Ubuntu/Debian Linux distributions or through Windows Subsystem for Linux (WSL). It may not work properly on other operating systems such as macOS, Windows/Cygwin, CentOS, Fedora, or FreeBSD.

IDE

Use Visual Studio Code (VS Code) or another IDE that supports Python (e.g. PyCharm).

If you're on Windows, use VS Code with Remote Development in WSL.

WARNING: The option "Git: Rebase When Sync" must be active in File > Preferences > Settings.

WSL (Windows)

If you are on Windows, install WSL 2 and Ubuntu according to Microsoft's tutorial. By default, WSL 2 has almost full access to your machine's resources:

  • The entire hard drive.
  • All available processors.
  • Up to 80% of available RAM (the exact default depends on the Windows build).
  • Up to 25% of available memory for swap.

Letting WSL 2 use almost every resource on your machine may be undesirable, but you can set limits.

Create a file called .wslconfig in the root of your user folder (e.g. C:\Users\<your_user>) and configure these settings:

[wsl2]
memory=8GB
processors=4
swap=2GB

These are example limits and the most basic settings; adjust them to the resources available on your machine. For more details see wsl-2-settings.

To apply these settings, the Linux distributions must be restarted. We suggest running wsl --shutdown in PowerShell: this shuts down all active WSL 2 instances, and opening the terminal again starts WSL with the new settings.
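After the restart, one way to confirm that the limits took effect is to compare what the Linux side reports with the values in .wslconfig. The following is a minimal sketch, not part of Debussy or WSL: the function name report_memory_limits is hypothetical, and the reported totals can be slightly lower than the configured values because the kernel reserves some memory for itself.

```shell
# Hypothetical helper: print MemTotal and SwapTotal from a meminfo-style file,
# converted from kB to GiB and rounded to the nearest whole number.
report_memory_limits() {
  meminfo_file="${1:-/proc/meminfo}"
  awk '
    /^MemTotal:/  { printf "memory=%dGB\n", ($2 / 1048576) + 0.5 }
    /^SwapTotal:/ { printf "swap=%dGB\n",   ($2 / 1048576) + 0.5 }
  ' "$meminfo_file"
}

# Usage inside the WSL distribution:
#   report_memory_limits   # memory and swap visible to Linux
#   nproc                  # number of processors visible to Linux
```

The output uses the same key=value shape as the .wslconfig entries, so the two can be compared side by side.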

Google Cloud SDK

  • Cloud SDK: on Windows, install it directly in WSL and run gcloud init --console-only for the initial setup. (Guide)

With this configuration, you do not need a service account locally; authentication uses your GCP user.

  • (Optional) For deployment, you'll probably want a service account to be used by Debussy: create a GCP service account and export its JSON access key, then create the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to your key.

# create the environment variable:
export GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json

# display the credential path:
echo "$GOOGLE_APPLICATION_CREDENTIALS"

# display the contents of the credential:
cat "$GOOGLE_APPLICATION_CREDENTIALS"

Add the export line to your .bashrc file and reload it with source, so that the environment variable is available in new sessions.
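Adding the export line by hand works, but repeating the step can leave duplicate entries in the startup file. The sketch below shows one possible way to add the line only when it is missing. The function name persist_credentials_path is hypothetical, and the key path in the usage comment is a placeholder.

```shell
# Hypothetical helper: add the export line to a startup file only if it is not
# already there, so running it repeatedly does not duplicate the entry.
persist_credentials_path() {
  key_path="$1"
  rc_file="${2:-$HOME/.bashrc}"
  export_line="export GOOGLE_APPLICATION_CREDENTIALS=\"$key_path\""
  touch "$rc_file"
  grep -qxF -- "$export_line" "$rc_file" || printf '%s\n' "$export_line" >> "$rc_file"
}

# Usage (the key path is a placeholder):
#   persist_credentials_path "$HOME/path/to/credentials.json"
#   source ~/.bashrc
```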

Python

  • Python: comes preinstalled on most Linux distributions, including Ubuntu distros on WSL. We currently use Python 3.8.x or 3.9.x. (Download)

    To check the version of Python installed, run the command python3 --version in the terminal.

  • Create a Python virtual environment. To do this, in the chosen directory, run the following commands:

# Create the virtual environment (on Ubuntu this may require: sudo apt install python3-venv):
python3 -m venv .debussy-env

# Activate the virtual environment:
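source .debussy-env/bin/activate

# The remaining commands set up an equivalent environment with virtualenvwrapper;
# skip them if the venv created above is enough for you.
# (Optional) Check the pip package manager version, and install if necessary: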
pip3 --version
sudo apt install python3-pip

# Install virtualenv and virtualenvwrapper:
sudo pip install virtualenv virtualenvwrapper

# virtualenvwrapper configuration:
export WORKON_HOME=~/workspace/.virtualenvs
mkdir -p $WORKON_HOME
source /usr/local/bin/virtualenvwrapper.sh

# Creating the virtual environment:
mkvirtualenv debussy-env

# (Optional) If the virtual environment is not selected automatically:
workon debussy-env

# Add the following commands to your shell startup file (vim ~/.bashrc):
export VIRTUALENVWRAPPER_PYTHON=$(which python3)
export WORKON_HOME=~/workspace/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh

# Reload the .bashrc file:
source ~/.bashrc

  • To install Debussy, open a terminal window and:
  1. Clone the Debussy Concert repo
  2. cd into the root directory, where setup.py is located
  3. Enter: python setup.py install
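Because this guide names only Python 3.8.x and 3.9.x as the versions in use, it may help to run a small guard before the installation steps above. This is a minimal sketch: the function name is_supported_python is hypothetical, and the accepted range simply mirrors the versions stated in this section.

```shell
# Hypothetical helper: succeed only for the versions this guide names (3.8.x or 3.9.x).
is_supported_python() {
  case "$1" in
    3.8|3.8.*|3.9|3.9.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare against the interpreter on PATH, if there is one.
if command -v python3 >/dev/null 2>&1; then
  detected="$(python3 --version 2>&1 | awk '{print $2}')"
  if is_supported_python "$detected"; then
    echo "Python $detected is in the supported range"
  else
    echo "Python $detected is outside 3.8.x/3.9.x"
  fi
fi
```

Run it with the virtual environment activated so that python3 resolves to the interpreter that will receive the installation.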

Docker

If you're on Windows, we recommend using Docker Engine directly through your Ubuntu distro (Native Docker).

Install the prerequisites:

sudo apt update && sudo apt upgrade
sudo apt remove docker docker-engine docker.io containerd runc
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

Add the Docker repository to the Ubuntu sources list:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Install Docker Engine:

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

Give your current user permission to run Docker (the group change takes effect after you log out and back in):

sudo usermod -aG docker $USER
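Group membership added with usermod applies only to sessions started afterwards, so a quick check can save some confusion. The sketch below tests whether the current session already belongs to a group; the function name current_user_in_group is hypothetical.

```shell
# Hypothetical helper: succeed when the current session already belongs to the given group.
current_user_in_group() {
  id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if current_user_in_group docker; then
  echo "This session is in the docker group"
else
  echo "Not in the docker group yet: log out and back in, or run 'newgrp docker'"
fi
```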

Install Docker Compose:

sudo curl -L "https://github.com/docker/compose/releases/download/1.29.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose

Start the Docker service:

sudo service docker start

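The command above must be run every time Ubuntu is restarted. If the Docker service is not running, Docker commands fail with this error message:

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

If the error persists while the service is running, it is usually related to permissions on the Docker socket. As a workaround on a local development machine, run the following command (it makes the socket accessible to every local user, so avoid it on shared machines):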

sudo chmod 666 /var/run/docker.sock
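Since the service has to be started after every restart, a small wrapper can make that step safe to repeat. This is only a sketch built on the same service docker commands used in this guide; the function name ensure_docker_running is hypothetical, and it assumes service docker status returns a non-zero exit code when the daemon is stopped.

```shell
# Hypothetical helper: start the Docker service only when it is not already running.
ensure_docker_running() {
  if service docker status >/dev/null 2>&1; then
    echo "Docker service is already running"
  else
    sudo service docker start
  fi
}

# Usage:
#   ensure_docker_running
```

Calling it from a startup file would prompt for the sudo password whenever the service is stopped, so it may be more comfortable to run it manually.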

(Optional) On Windows 11, you can specify a default command to run whenever WSL starts, which lets the Docker service start automatically. See WSL boot settings.
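As an illustration of that boot setting, the sketch below prints a [boot] section for /etc/wsl.conf that starts Docker when the distribution launches. The function name print_docker_boot_section is hypothetical, and the sketch assumes the file has no [boot] section yet; if it does, merge the command line by hand instead of appending.

```shell
# Hypothetical helper: print a [boot] section that starts Docker at WSL startup.
# The boot command runs as root, so it does not need sudo.
print_docker_boot_section() {
  printf '[boot]\ncommand = service docker start\n'
}

# Usage inside the Ubuntu distribution, followed by `wsl --shutdown` in PowerShell:
#   print_docker_boot_section | sudo tee -a /etc/wsl.conf
```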

Airflow

For development or trying out Debussy, we recommend using Astro. Follow their official guide to install astro-cli.

With the CLI installed, follow the getting started guide to create a project. Once you have an Astro project configured, you'll need to configure some files.

First, you'll need to mount the path to the Debussy Concert repository through the docker-compose.override.yml file, according to Astro's docs. The file will look like this:

version: "2"
services:
  scheduler:
    volumes:
      - /home/user/workspace/debussy_concert/debussy_concert:/usr/local/lib/python3.9/site-packages/debussy_concert
      - /home/user/workspace/debussy_concert/examples:/usr/local/airflow/dags/examples
      - /home/user/workspace/secrets/debussy-develop.json:/auth/debussy-develop.json
      - /home/user/workspace/environment/environment.yaml:/usr/local/airflow/dags/environment.yaml
    environment:
      - GOOGLE_APPLICATION_CREDENTIALS=/auth/debussy-develop.json
      - GCP_PROJECT=gcp-project-id
      - DEBUSSY_CONCERT__DAGS_FOLDER=/usr/local/airflow/dags
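The host paths in the volumes list are examples and will differ on each machine, so a mistyped path is an easy mistake to make. The sketch below lists host paths from the override file that do not exist locally. The function name missing_volume_sources is hypothetical, and the parsing is a simple pattern match on "- /host/path:/container/path" lines rather than a real YAML parser.

```shell
# Hypothetical helper: list host paths from "- /host/path:/container/path" entries
# in a compose file that do not exist on this machine.
missing_volume_sources() {
  compose_file="$1"
  sed -n 's|^[[:space:]]*-[[:space:]]*\(/[^:]*\):/.*|\1|p' "$compose_file" |
    while IFS= read -r host_path; do
      [ -e "$host_path" ] || printf '%s\n' "$host_path"
    done
}

# Usage from the Astro project directory:
#   missing_volume_sources docker-compose.override.yml
```

An empty result means every mounted source exists. Entries in the environment list are ignored because they do not start with a slash.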

Then, you need to update the packages.txt file:

gcc
g++
unixodbc-dev

Finally, you need to update the requirements.txt file with the dependencies:

mysql-connector-python==8.0.24
pymssql==2.1.5
#pyodbc==4.0.32
google-cloud-datacatalog==3.0.0
google-cloud-datastore==1.11.0
google-cloud-bigquery==2.13.1
google-cloud-pubsub==2.6.1
google-cloud-secret-manager==2.4.0
google-cloud-storage==1.38.0
Inject==4.3.1
yaml-env-var-parser
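Most entries in this list are pinned to an exact version, which keeps environments reproducible. As a rough sketch, the helper below lists any active requirement without an exact pin; the function name unpinned_requirements is hypothetical, and it treats only the == operator as a pin.

```shell
# Hypothetical helper: print requirement lines that are neither comments nor blank
# and that carry no exact "==" version pin.
unpinned_requirements() {
  grep -vE '^[[:space:]]*(#|$)' "$1" | grep -v '==' || true
}

# Usage from the Astro project directory:
#   unpinned_requirements requirements.txt
```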

Dependencies
