IDR Client

IDR (Integrated Data Repository) Client is a tool that extracts data from a source (most likely a database), performs any required transformations on the data, and then transmits it to a remote server for further processing and consumption. The tool is written in Python (3.9+), but working executable binaries for Linux can be found in the releases section.
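The extract, transform and transmit flow described above can be pictured with a minimal sketch. This is not the project's actual code: the function names, the `records` table used in the usage note, and the in-memory SQLite source are all assumptions made for illustration, and the "remote server" is stood in for by any callable that accepts bytes.

```python
import json
import sqlite3
from typing import Any, Callable, Iterable

Row = dict[str, Any]


def extract(conn: sqlite3.Connection, query: str) -> list[Row]:
    """Pull rows from the source (here, a SQLite connection) as plain dicts."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(query)]


def transform(rows: Iterable[Row]) -> list[Row]:
    """Placeholder transformation: upper-case the hypothetical `name` column."""
    return [{**row, "name": row["name"].upper()} for row in rows]


def transmit(rows: list[Row], send: Callable[[bytes], None]) -> int:
    """Serialise the rows to JSON, pass them to `send`, return the byte count."""
    payload = json.dumps(rows).encode("utf-8")
    send(payload)
    return len(payload)


def run_pipeline(
    conn: sqlite3.Connection, query: str, send: Callable[[bytes], None]
) -> int:
    """Chain the three stages in the order the README describes."""
    return transmit(transform(extract(conn, query)), send)
```

In this sketch, calling `run_pipeline(conn, "SELECT id, name FROM records", sent.append)` with a list named `sent` collects the serialised payload locally instead of sending it over the network, which makes each stage easy to inspect in isolation.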

Getting Started

To run the app locally, you can download the latest executable binary from the releases section if you have a Linux box. Note that the binary has only been tested on the following Linux distros: Ubuntu 16.04 LTS, Ubuntu 18.04 LTS, Ubuntu 20.04 LTS and Fedora 36. It may work on other Linux distros, but compatibility is not guaranteed. Users of other platforms can instead clone this repo and set up the project on their computers, but this is much more involved. Both setup methods are described below, as well as how to build an executable binary for other platforms.

1. Using the Executable Binary

This is by far the easiest way to set up and run the application and can be achieved using the following steps:

  1. Download the binary from the latest release:-

    curl https://github.com/savannahghi/idr-client/releases/download/v0.1.0/idr_client --output idr_client -L
  2. Make the downloaded binary executable:-

    chmod u+x idr_client
  3. Define a configuration file for the app to use. A template for the config file is provided with the tool; see the .config.template.yaml file and edit it to match your setup/needs.

  4. Once you are done with the config file, you can run the app as follows:-

    idr_client -c /path/to/your/config.yaml

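    Replace /path/to/your/config.yaml with the correct path to your config file. If idr_client is not on your PATH, invoke it as ./idr_client from the directory containing the binary.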

    You are now good to go 👍.
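If the binary is launched from another script (for example, a scheduled job), the run step above can be wrapped as in the following sketch. The only flag taken from this README is `-c`; the function names `build_command` and `run_client` are hypothetical, and the check that the config file exists is an added convenience, not something the tool is known to require.

```python
import subprocess
from pathlib import Path


def build_command(binary: str, config_path: str) -> list[str]:
    """Return the argument list for `<binary> -c <config>`.

    Raises FileNotFoundError early if the config file is missing, so the
    problem is reported before the client is started.
    """
    config = Path(config_path)
    if not config.is_file():
        raise FileNotFoundError(f"Config file not found: {config}")
    return [binary, "-c", str(config)]


def run_client(binary: str, config_path: str) -> int:
    """Run the client and return its exit code."""
    return subprocess.run(build_command(binary, config_path)).returncode
```

A call such as `run_client("./idr_client", "/path/to/your/config.yaml")` would then behave like step 4, with the exit code available to the calling script.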

2. Cloning the Repo

For this method, you will need to have Python 3.9 or above installed (3.10 is recommended). You can optionally create a virtualenv for the project, separate from the system Python. Next, perform the following steps:

  1. Clone this repository (if you haven't already) and cd into the root project directory, that is, the directory containing pyproject.toml. Unless otherwise specified, this is the directory from which all the remaining commands are run.

  2. Install the project's dependencies by running:-

     pip install -r requirements/base.txt
  3. Define a configuration file for the app to use. A template for the config file is provided with the tool; see the .config.template.yaml file and edit it to match your setup/needs.

  4. Once you are done with the config file, you can run the app as follows:-

    python -m app -c /path/to/your/config.yaml

    Replace /path/to/your/config.yaml with the correct path to your config file.

    That's it, you are now good to go 👍.
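Before working through the steps above, it can help to confirm that the active interpreter meets the stated requirement (3.9 minimum, 3.10 recommended). The following is a small sketch; the function name `check_python` and the three status strings are made up for illustration.

```python
import sys

MINIMUM = (3, 9)       # lowest version this README says is supported
RECOMMENDED = (3, 10)  # version this README recommends


def check_python(version: tuple[int, int] = sys.version_info[:2]) -> str:
    """Classify a (major, minor) version against the README's requirements."""
    if version < MINIMUM:
        return "unsupported"
    if version < RECOMMENDED:
        return "supported"
    return "recommended"
```

Calling `check_python()` with no arguments classifies the interpreter that is running the script, which is useful inside a virtualenv where the system Python and the project Python may differ.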

3. Create an Executable Binary (Optional)

Those wishing to create executable binaries for other platforms will need to follow most of method 2's steps, but with the following differences:

On step 2, install the project dependencies by running:-

pip install -r requirements/dev.txt

And then create the binary using the following command:-

pyinstaller app/__main__.py idr_client.spec

This will create an executable but the executable will still depend on the target system/computer having the correct system libraries. More details on this can be found here. To learn more about the pyinstaller command, check the docs here.

To create a fully statically linked executable, run the following command:-

staticx dist/idr_client_temp dist/idr_client

The executable binary can be found in the dist directory of the project. To learn more about the staticx command, check the docs here.

Concepts

This section is for the curious and those wishing to contribute. It provides a summary description of how the app works and the concepts and terms used in the project. These are:

  • Data Source Type - A data source type describes a kind of data source together with the operations that can be performed on data sources of that kind. Each data source type can have multiple data sources.
  • Data Source - A data source represents an entity that contains data of interest such as a database or a file. Each data source has multiple extract metadata.
  • Extract Metadata - This is a description of the data to be extracted from a data source. An extract metadata also defines how that data is extracted.
  • Upload Metadata - This describes data that has been extracted and how it's packaged for uploading to the remote server. Each upload metadata is always associated with a given extract metadata.
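The relationships between these four concepts can be sketched with plain dataclasses. This is only an illustration of the one-to-many and one-to-one links described above; the class and field names (including `content_type` and its default) are assumptions and are not taken from the project's source code.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractMetadata:
    """Describes what data to extract from a data source, and how."""
    name: str
    description: str = ""


@dataclass
class UploadMetadata:
    """Describes extracted data packaged for upload; tied to one extract."""
    extract_metadata: ExtractMetadata
    content_type: str = "application/json"  # hypothetical packaging detail


@dataclass
class DataSource:
    """An entity holding data of interest, such as a database or a file."""
    name: str
    extract_metadata: list[ExtractMetadata] = field(default_factory=list)


@dataclass
class DataSourceType:
    """A kind of data source, grouping the data sources of that kind."""
    name: str
    data_sources: list[DataSource] = field(default_factory=list)
```

Modelled this way, a data source type owns its data sources, each data source owns its extract metadata, and every upload metadata points back to exactly one extract metadata.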

License

MIT License

Copyright (c) 2022, Savannah Informatics Global Health Institute