Insert a list from a dataset into a section of a config.yml file! #6

Open
ccbaumler opened this issue Jan 10, 2024 · 0 comments
Labels: documentation (Improvements or additions to documentation)

Please see the documentation at dib-lab/genome-grist#284 for a quick way to insert lists into a config file.

The approach takes the tsv sample below and inserts its list of Assembly Accession identifiers directly into the `query_genomes` section of the config file for the spacegraphcats workflow:

The config:
# This is the file path to the metadata file.
# In this case, the file is the full metadata
# output of the SRA Run Selector.

metadata_file_path: metadata/SraRunTable.txt

# Directories
workdir: ~/dissertation-project/seqs

outdir: dissertation-project/seqs

prevent_sra_download: False

# The kmer size within the database (`sourmash sig fileinfo`)
k_size:
  - 21
  - 31
#  - 51 is too large for khmer abundtrimming

# Query genomes for spacegraphcats
query_genomes:
 - GCA_000349525.1

query_radius:
  - 1
  - 5
  - 10

# The amount to scale representative kmer set
scale:
  - 1000

The tsv:
Assembly Accession	Assembly Name	Organism Name	Annotation Name	Assembly Stats Total Sequence Length	Assembly Level	Assembly Release Date	WGS project accession
GCA_000143535.4	ASM14353v4	Botrytis cinerea B05.10	Annotation submitted by Syngenta Biotechnology, Inc.	42630066	Complete Genome	2015-02-05	
GCF_000143535.2	ASM14353v4	Botrytis cinerea B05.10	Annotation submitted by Syngenta Biotechnology, Inc.	42630066	Complete Genome	2015-02-05	
GCA_019186565.1	ASM1918656v1	Botrytis cinerea		42721243	Contig	2021-07-09	JAHHFM01
GCA_019186575.1	ASM1918657v1	Botrytis cinerea		42739314	Contig	2021-07-09	JAHHFN01
GCA_031205075.1	Bcin_M3a_1.1	Botrytis cinerea		43592014	Contig	2023-09-07	JARWBL01
GCA_015148055.1	ASM1514805v1	Botrytis cinerea		41439596	Contig	2020-10-30	JACVFN01
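
Before editing any config, it can help to preview what would be pulled from a table like the one above. This is a minimal sketch, assuming a file laid out like the sample (tab-separated, one header row, accession in the first column). The rows written here are a cut-down stand-in for the real table, and `preview.tsv` is a hypothetical file name:

```shell
# Build a small stand-in table in a scratch directory (hypothetical file name).
tmp=$(mktemp -d)
printf 'Assembly Accession\tAssembly Name\n'  > "$tmp/preview.tsv"
printf 'GCA_019186565.1\tASM1918656v1\n'     >> "$tmp/preview.tsv"
printf '\n'                                  >> "$tmp/preview.tsv"   # stray blank line
printf 'GCA_019186575.1\tASM1918657v1\n'     >> "$tmp/preview.tsv"

# -F'\t' splits on tabs, NR>1 skips the header row, NF drops empty lines.
items=$(awk -F'\t' 'NR>1 && NF {print " - " $1}' "$tmp/preview.tsv")
printf '%s\n' "$items"
```

Nothing is written to a config here. The printed lines are only the YAML list items that a later insert step would use.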

The code:

awk -F'\t' 'NR>1 && NF {print " - " $1}' assembly-test.tsv | sed "/query_genomes:/r /dev/stdin" -i sgc-prep-config.yml
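
As written, the one-liner assumes GNU sed (`-i` with no backup suffix) and edits the config in place without keeping a copy. A more cautious variant, sketched here as an assumption rather than as the method from the linked documentation, splits the work into two steps and keeps a `.bak` copy. The file contents are cut-down stand-ins written to a scratch directory, and `new-items.txt` is a hypothetical intermediate file:

```shell
set -eu

# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp"

# Cut-down stand-ins for the issue's TSV and config.
printf 'Assembly Accession\tAssembly Name\n'  > assembly-test.tsv
printf 'GCA_000143535.4\tASM14353v4\n'       >> assembly-test.tsv
printf 'GCA_019186565.1\tASM1918656v1\n'     >> assembly-test.tsv
printf 'query_genomes:\n - GCA_000349525.1\n\nscale:\n  - 1000\n' > sgc-prep-config.yml

# Step 1: build the YAML list items (skip the header row and blank lines).
awk -F'\t' 'NR>1 && NF {print " - " $1}' assembly-test.tsv > new-items.txt

# Step 2: append them after the key line; -i.bak keeps the original as a backup.
# The ^ anchor limits the match to a line that starts with the key.
sed -i.bak '/^query_genomes:/r new-items.txt' sgc-prep-config.yml

cat sgc-prep-config.yml
```

The new entries land directly under `query_genomes:`, ahead of any entry that was already there, and the untouched original remains in `sgc-prep-config.yml.bak`.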

The updated config:
# This is the file path to the metadata file.
# In this case, the file is the full metadata
# output of the SRA Run Selector.

metadata_file_path: metadata/SraRunTable.txt

# Directories
workdir: ~/dissertation-project/seqs

outdir: dissertation-project/seqs

prevent_sra_download: False

# The kmer size within the database (`sourmash sig fileinfo`)
k_size:
  - 21
  - 31
#  - 51 is too large for khmer abundtrimming

# Query genomes for spacegraphcats
query_genomes:
 - GCA_000143535.4
 - GCF_000143535.2
 - GCA_019186565.1
 - GCA_019186575.1
 - GCA_031205075.1
 - GCA_015148055.1
 - GCA_000349525.1

query_radius:
  - 1
  - 5
  - 10

# The amount to scale representative kmer set
scale:
  - 1000
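
Running the same command a second time against the updated config would insert the same accessions again, since nothing checks what is already listed. A minimal sketch of one way to avoid that, assuming it is acceptable to treat any mention of an accession in the config as "already present". The helper name `add_missing`, the intermediate file `new-items.txt`, and the stand-in file contents are all hypothetical:

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in files; the second TSV column holds placeholder text, not real names.
printf 'Assembly Accession\tAssembly Name\n'  > assembly-test.tsv
printf 'GCA_000143535.4\tplaceholder\n'      >> assembly-test.tsv
printf 'GCA_000349525.1\tplaceholder\n'      >> assembly-test.tsv
printf 'query_genomes:\n - GCA_000349525.1\n' > sgc-prep-config.yml

# Hypothetical helper: insert only accessions the config does not mention yet.
# $1 = TSV file, $2 = config file
add_missing() {
  awk -F'\t' 'NR>1 && NF {print $1}' "$1" |
    while IFS= read -r acc; do
      # -F: literal string, -w: whole-word match, -q: no output
      grep -qwF -- "$acc" "$2" || printf ' - %s\n' "$acc"
    done > new-items.txt
  sed -i.bak '/^query_genomes:/r new-items.txt' "$2"
}

add_missing assembly-test.tsv sgc-prep-config.yml
add_missing assembly-test.tsv sgc-prep-config.yml   # second run adds nothing
cat sgc-prep-config.yml
```

After both calls the config lists each accession once: the entry that was already present is left alone, and the new one is added only on the first run.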

ccbaumler added the documentation label on Jan 10, 2024