Add docs for connector and download #13

Merged
merged 7 commits into from
Feb 6, 2024
10 changes: 5 additions & 5 deletions mkdocs.yaml
@@ -46,6 +46,7 @@ extra_css:
plugins:
- search
- git-revision-date-localized
+ - include_dir_to_nav

# Additional configuration
extra:
@@ -83,8 +84,8 @@ markdown_extensions:
- pymdownx.critic
- pymdownx.details
- pymdownx.emoji:
- emoji_index: !!python/name:materialx.emoji.twemoji
- emoji_generator: !!python/name:materialx.emoji.to_svg
+ emoji_index: !!python/name:material.extensions.emoji.twemoji
+ emoji_generator: !!python/name:material.extensions.emoji.to_svg
- pymdownx.inlinehilite
- pymdownx.keys
- pymdownx.magiclink:
@@ -119,7 +120,6 @@ nav:
- "Sample Module": metadata/data_dictionary/sample-module.md
- "Sequencing Module": metadata/data_dictionary/sequencing-module.md
- "File Submission": metadata/data_dictionary/file-submission.md
- - "Tools":
- - "GHGA Validator": validator/validator.md
- - "GHGA Transpiler": transpiler/transpiler.md
+ - "CLI Tools": cli_tools
+ - "Data Portal": data_portal
- "Glossary": glossary/glossary.md
1 change: 1 addition & 0 deletions requirements.txt
@@ -4,3 +4,4 @@ pygments
markdown
pymdown-extensions
mkdocs-git-revision-date-localized-plugin
+ mkdocs-include-dir-to-nav
Binary file added user_docs/assets/img/dataset-link.png
Binary file added user_docs/assets/img/dataset-select.png
Binary file added user_docs/assets/img/token-form.png
115 changes: 115 additions & 0 deletions user_docs/cli_tools/connector.md
@@ -0,0 +1,115 @@
# GHGA Connector

GHGA Connector is a command-line tool and Python library for interacting with the file storage infrastructure of GHGA. Currently, it provides functionality for downloading and decrypting files.


## Installation and Upgrade

We recommend installing or upgrading to the latest version of GHGA Connector using pip:
```bash
pip install --upgrade ghga-connector
```



### Crypt4GH Keys

GHGA Connector requires a [Crypt4GH](https://crypt4gh.readthedocs.io/en/latest/) key pair to download data. Please create a Crypt4GH key pair if you don't already have one. The public key is also needed when creating the download token in the Data Portal.

By default, GHGA Connector looks for the keys at **./key.pub** and **./key.sec**. You can either place your keys there or use CLI options to specify your key locations.
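
As a minimal sketch, the default key locations can be checked before running the Connector. The helper name `check_crypt4gh_keys` is hypothetical and not part of GHGA Connector. The suggested `crypt4gh-keygen` command belongs to the Crypt4GH Python package, which is assumed to be installed separately.

```bash
# Hypothetical helper (not part of GHGA Connector): verify that key.pub and
# key.sec exist in the current directory, or in the directory given as $1.
check_crypt4gh_keys() {
  key_dir="${1:-.}"
  for key in "$key_dir/key.pub" "$key_dir/key.sec"; do
    if [ ! -f "$key" ]; then
      echo "Missing key file: $key" >&2
      # Assumption: the Crypt4GH Python package provides crypt4gh-keygen.
      echo "Create a pair with: crypt4gh-keygen --sk key.sec --pk key.pub" >&2
      return 1
    fi
  done
  echo "Found key.pub and key.sec in $key_dir"
}
```

Calling `check_crypt4gh_keys` without arguments inspects the current directory; passing a path such as `check_crypt4gh_keys ~/keys` inspects that directory instead.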


## Usage

```
Usage: ghga-connector [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.
  --help                          Show this message and exit.

Commands:
  decrypt   Command to decrypt a downloaded file
  download  Command to download files
```

## Download

The _`download`_ command downloads files. You must provide a *download token*, which contains both the download instructions and the authentication details.

Download command usage:
```bash
Usage: ghga-connector download [OPTIONS]

  Command to download files

Options:
  --output-dir PATH           The directory to put the downloaded files into.
                              [required]
  --my-public-key-path PATH   The path to a public key from the Crypt4GH key
                              pair that was announced when the download token
                              was created. Defaults to key.pub in the current
                              folder. [default: ./key.pub]
  --my-private-key-path PATH  The path to a private key from the Crypt4GH key
                              pair that was announced when the download token
                              was created. Defaults to key.sec in the current
                              folder. [default: ./key.sec]
  --help                      Show this message and exit.
```

### Download Token

GHGA Connector requires a download token to authenticate and process your request against GHGA Central. Each download request, which may comprise multiple files, is represented by one download token, which is created via the GHGA Data Portal. For details on creating a download token, please refer to the [Data Download](../data_portal/data_download.md) documentation.

### Download Examples

1. To download a dataset:
```bash
ghga-connector download --output-dir <OUTPUT-DIR>
```
You will then be asked to provide the download token:
```
Please paste the complete download token that you copied from the GHGA data portal:
```
Paste the *download token* you created via the GHGA Data Portal; the download then starts.
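
When the keys are stored outside the current folder, the key path options from the usage text above can be passed explicitly. The following is a minimal sketch: the wrapper name `download_dataset` is hypothetical, and creating the output directory beforehand is a precaution assumed here rather than a documented requirement.

```bash
# Hypothetical wrapper (not part of GHGA Connector): download into a
# directory using keys stored at explicitly given paths.
download_dataset() {
  out_dir="$1"
  pub_key="$2"
  sec_key="$3"
  # Fail early if either key file is missing.
  for key in "$pub_key" "$sec_key"; do
    [ -f "$key" ] || { echo "Missing key file: $key" >&2; return 1; }
  done
  # Assumption: creating the output directory up front is harmless.
  mkdir -p "$out_dir" || return 1
  # The Connector prompts for the download token after this call.
  ghga-connector download \
    --output-dir "$out_dir" \
    --my-public-key-path "$pub_key" \
    --my-private-key-path "$sec_key"
}
```

For example, `download_dataset ./downloads ~/keys/key.pub ~/keys/key.sec` would start a download into `./downloads` with keys kept in `~/keys`.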



## Decrypt

The files you download are encrypted. To decrypt a file, please use the _`decrypt`_ command.

Decrypt command usage:
```bash
Usage: ghga-connector decrypt [OPTIONS]

  Command to decrypt a downloaded file

Options:
  --input-dir PATH            Path to the directory containing files that
                              should be decrypted using a common decryption
                              key. [required]
  --output-dir PATH           Optional path to a directory that the decrypted
                              file should be written to. Defaults to input
                              dir.
  --my-private-key-path PATH  The path to a private key from the Crypt4GH key
                              pair that was announced when the download token
                              was created. Defaults to key.sec in the current
                              folder. [default: ./key.sec]
  --help                      Show this message and exit.
```

### Decrypt Examples

1. To decrypt files:
```bash
ghga-connector decrypt --input-dir <INPUT-DIR>
```
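
To keep decrypted files apart from the encrypted downloads, the optional `--output-dir` option listed in the usage text can be combined with `--input-dir`. This is a sketch under stated assumptions: the wrapper name `decrypt_downloads` is hypothetical, and creating the output directory in advance is a precaution rather than a documented requirement.

```bash
# Hypothetical wrapper (not part of GHGA Connector): decrypt all files in
# one directory into a separate directory.
decrypt_downloads() {
  in_dir="$1"
  out_dir="$2"
  # Falls back to the documented default private key location.
  sec_key="${3:-./key.sec}"
  [ -d "$in_dir" ] || { echo "No such input directory: $in_dir" >&2; return 1; }
  # Assumption: creating the output directory up front is harmless.
  mkdir -p "$out_dir" || return 1
  ghga-connector decrypt \
    --input-dir "$in_dir" \
    --output-dir "$out_dir" \
    --my-private-key-path "$sec_key"
}
```

For example, `decrypt_downloads ./downloads ./decrypted` would read encrypted files from `./downloads` and write the decrypted files to `./decrypted`, using `./key.sec`.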


File renamed without changes.
File renamed without changes.
3 changes: 3 additions & 0 deletions user_docs/css/extra.css
@@ -27,6 +27,9 @@
--md-primary-fg-color--dark: #90030C;
}

+ .md-content {
+ --md-typeset-a-color: #E84614;
+ }
@font-face {
font-family: "Lexend";
src: "Lexend.ttf"
5 changes: 5 additions & 0 deletions user_docs/data_portal/data_access_request.md
@@ -0,0 +1,5 @@
# Data Access Request

The GHGA Data Portal lets users request access to datasets. Browse to the dataset of interest and click the "Request Access" button, which opens a data access request form. Complete the form with the necessary information and submit it. The data access committee will review your request and respond accordingly.

For further details on how to access data, please refer to the [Data Download](./data_download.md) documentation.
55 changes: 55 additions & 0 deletions user_docs/data_portal/data_download.md
@@ -0,0 +1,55 @@
# Data Download

Downloading data from datasets you have been granted access to is a two-stage process:

* The download is prepared through the *Data Portal*. The corresponding dataset
is selected and the download potentially restricted to individual files from
the dataset. At the end of this process, a download token is generated and
shown to the user.

* Subsequently, the CLI tool [GHGA Connector](../cli_tools/connector.md) is used
to perform the actual file download using the previously generated download
token and the user's Crypt4GH key pair.

## Prerequisites

To download files from GHGA, users must have generated a Crypt4GH key pair. The
public key is used to encrypt both the download token and the downloaded files.
For information on how to generate a Crypt4GH key pair, please refer to the
official [Crypt4GH
documentation](https://crypt4gh.readthedocs.io/en/latest/).

## Download Preparation

After being granted access to a dataset, the user initiates a data download by
creating a download token in the Data Portal. A single download token can cover
one or more files from a dataset. The download token is then passed to the CLI
tool GHGA Connector, which performs the actual download.

1. Navigate to the [GHGA Data Portal](https://data.ghga.de/).

2. Visit your profile page to see the datasets you have access to.

![Dataset access link](../assets/img/dataset-link.png){ width="500" }

3. Navigate to the dataset list and select your dataset of interest to be downloaded.

![Select dataset](../assets/img/dataset-select.png){ width="500" }

4. Fill in the form with the necessary information to create a download
token. Specifying one or more file IDs is optional; if none are provided,
the entire dataset will be downloaded. A Crypt4GH public key must be
provided before submitting the form.

![Token form](../assets/img/token-form.png){ width="500" }


## Download using GHGA Connector

The GHGA Connector is a command-line tool that facilitates interaction with the
file storage infrastructure of GHGA. Data downloading is carried out using the
GHGA Connector.

For further information on how to use the command-line tool, please refer to the
[GHGA Connector](../cli_tools/connector.md) documentation.