Data preparation guide

To use covSampler to analyze your own data, you’ll need to prepare two files:

A FASTA file with viral genomic sequences.
A corresponding TSV file with metadata describing each sequence.

Format your sequence data

Prepare your nucleotide sequences in a FASTA format file named sequences.fasta.

A formatted example sequence file is available at data/example_project/rawdata/sequences.fasta.

Format your metadata

Prepare your metadata in a TSV format file named metadata.tsv.

A metadata file must include the following fields:

Fields	Description	Format
strain	Sequence name	Each strain value in the metadata file must match a sequence name in the FASTA file
date	Collection date	YYYY-MM-DD (ambiguous values are not accepted)
region_exposure	Continent	Africa / Asia / Europe / North America / Oceania / South America
country_exposure	Country	Country
division_exposure	Administrative division	Division
pango_lineage*	Viral lineage under the Pango nomenclature	See the latest Pango lineage list

* The covSampler workflow does not currently include Pango lineage assignment. You can assign Pango lineages with pangolin or Nextclade before preparing the metadata file.

A formatted example metadata file is available at data/example_project/rawdata/metadata.tsv.
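The requirements in the table above can be checked before running the workflow. The following is a minimal sketch and not part of covSampler: the function names `read_fasta_names`, `is_valid_date`, and `check_metadata` are hypothetical, and it assumes that the sequence name is the full FASTA header line after `>`. It checks that all required columns are present, that every date is a complete YYYY-MM-DD value, that the region is one of the listed continents, and that strain values and FASTA names match in both directions.

```python
import csv
import io
import re
from datetime import datetime

REQUIRED_FIELDS = [
    "strain", "date", "region_exposure",
    "country_exposure", "division_exposure", "pango_lineage",
]
REGIONS = {"Africa", "Asia", "Europe", "North America", "Oceania", "South America"}
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def read_fasta_names(handle):
    """Return the set of sequence names (assumed: full header text after '>')."""
    return {line[1:].strip() for line in handle if line.startswith(">")}


def is_valid_date(text):
    """True only for a complete, real calendar date written as YYYY-MM-DD."""
    if not DATE_PATTERN.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_metadata(metadata_handle, fasta_names):
    """Return a list of human-readable problems found in the metadata."""
    reader = csv.DictReader(metadata_handle, delimiter="\t")
    missing = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    seen = set()
    for row in reader:
        strain = row["strain"] or ""
        date = row["date"] or ""
        region = row["region_exposure"] or ""
        seen.add(strain)
        if not is_valid_date(date):
            problems.append(f"{strain}: date '{date}' is not a valid YYYY-MM-DD date")
        if region not in REGIONS:
            problems.append(f"{strain}: unknown region '{region}'")
        if strain not in fasta_names:
            problems.append(f"{strain}: no matching sequence in FASTA")
    for name in sorted(fasta_names - seen):
        problems.append(f"{name}: sequence has no metadata row")
    return problems


if __name__ == "__main__":
    # Small in-memory example; real use would open sequences.fasta and metadata.tsv.
    fasta = io.StringIO(">sample1\nACGT\n>sample2\nACGT\n")
    meta = io.StringIO(
        "\t".join(REQUIRED_FIELDS) + "\n"
        "sample1\t2021-03-XX\tAsia\tChina\tBeijing\tB.1.1.7\n"
    )
    for problem in check_metadata(meta, read_fasta_names(fasta)):
        print(problem)
```

To check real files, pass `open("sequences.fasta")` and `open("metadata.tsv", newline="")` in place of the in-memory handles. An empty result list means none of these checks failed; it does not guarantee that covSampler will accept the files.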

Create your project data directory

All data are kept in the data/ directory. Each project's raw and intermediate data are stored in that project's own subdirectory.

For a new project (here named tutorial_project):

Create your project data folder, data/tutorial_project/, in data/.
Create a rawdata/ folder in data/tutorial_project/.
Move your sequence data and metadata into the data/tutorial_project/rawdata/ folder.

Now, the data/ directory structure should look like this:

data
├── README.md
├── example_project
│   └── rawdata
│       ├── metadata.tsv
│       └── sequences.fasta
└── tutorial_project
    └── rawdata
        ├── metadata.tsv
        └── sequences.fasta
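The layout above can also be created with a short script. This is a minimal sketch, not a covSampler utility: `set_up_project` is a hypothetical helper, and it copies the input files rather than moving them so that the originals stay in place. The demo at the bottom runs in a temporary directory with placeholder files.

```python
import shutil
import tempfile
from pathlib import Path


def set_up_project(data_dir, project, sequences, metadata):
    """Create <data_dir>/<project>/rawdata/ and copy both input files into it."""
    rawdata = Path(data_dir) / project / "rawdata"
    rawdata.mkdir(parents=True, exist_ok=True)
    # The workflow expects these exact file names inside rawdata/.
    shutil.copy(sequences, rawdata / "sequences.fasta")
    shutil.copy(metadata, rawdata / "metadata.tsv")
    return rawdata


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # Placeholder inputs; replace with the paths to your own files.
        (tmp / "my.fasta").write_text(">sample1\nACGT\n")
        (tmp / "my.tsv").write_text("strain\tdate\n")
        rawdata = set_up_project(
            tmp / "data", "tutorial_project", tmp / "my.fasta", tmp / "my.tsv"
        )
        print(sorted(p.name for p in rawdata.iterdir()))
```

In a real checkout, `data_dir` would be the repository's data/ directory and `project` the name of your project, for example `set_up_project("data", "tutorial_project", "my.fasta", "my.tsv")`.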

What's next?

Run covSampler with your data
