How to read three genome file formats together #5

Open
makerer5 opened this issue Jul 9, 2024 · 4 comments

@makerer5

makerer5 commented Jul 9, 2024

Hi,
I want to use pgcgap to build a whole-genome phylogenetic tree of 120 strains. I used pgcgap to read in the genome files, but it seems that pgcgap can only read files that share one format. My 120 genomes come in three formats: paired-end reads (R1.fq.gz, R2.fq.gz), single-end reads (.fq.gz, downloaded from NCBI), and GenBank files (.gb).
How can I read in all 120 genome files in these three formats with a command like the following?
pgcgap --All --platform illumina --filter_length 200 --ReadsPath Reads/Illumina --reads1 _1.fastq.gz --reads2 _2.fastq.gz --suffix_len 11 --kmmer 81 --genus Escherichia --species coli --codon 11 --strain_num 6 --threads 4 --VAR --refgbk /mnt/h/PGCGAP_Examples/Reads/MG1655.gbff --qualtype sanger

@liaochenlanruo
Owner

Hi,
PGCGAP can only take one input format. However, you can assemble the paired-end reads and the single-end reads separately, and then carry out the other analyses on the resulting assemblies. I do not recommend using gbk files for the analysis. Instead, download the scaffolds file corresponding to each gbk file and use it, together with the scaffolds files obtained from your read assemblies, as the input to PGCGAP for downstream analysis.
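As a rough sketch of the first step above, the 120 files could be grouped by format before anything is assembled. The function name `classify_genome_file` and the suffix patterns are assumptions made for illustration; they are not part of PGCGAP and should be adjusted to the actual file names.

```shell
#!/usr/bin/env bash
# Classify a genome input file by its name so that each group can be
# handled separately (paired-end assembly, single-end assembly, or
# replacing a GenBank file with its scaffolds file).
# The suffix patterns below are assumptions; edit them to match your data.
classify_genome_file() {
    case "$1" in
        # Paired-end patterns are tested first, because they also end in .fq.gz/.fastq.gz
        *_R1.fq.gz|*_R2.fq.gz|*_1.fastq.gz|*_2.fastq.gz) echo "paired" ;;
        *.fq.gz|*.fastq.gz)                               echo "single" ;;
        *.gb|*.gbk|*.gbff)                                echo "genbank" ;;
        *.fa|*.fasta|*.fna|*.fa.gz)                       echo "assembly" ;;
        *)                                                echo "unknown" ;;
    esac
}

# Example use (hypothetical directory name "genomes"):
# for f in genomes/*; do
#     printf '%s\t%s\n' "$(classify_genome_file "$f")" "$f"
# done
```

Files reported as "genbank" are the ones for which the corresponding scaffolds file would be downloaded instead.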

@makerer5
Author

makerer5 commented Jul 9, 2024

Thank you very much. This is indeed a very good idea, thank you for your guidance!

@makerer5
Author

Hi,
Following your advice to assemble the single-end and paired-end files separately, I downloaded ".fa.gz" and ".fasta" files from NCBI. How can I use pgcgap to read single-end files or fasta files?

@liaochenlanruo
Owner

  • For single-end reads, you can use the ABySS software to assemble the reads into scaffolds (a fasta file), and then put all the .fasta files in one folder/directory, for example one named "Scaf".
# assemble the reads one by one
abyss-pe name=strainname k=81 se='.fa'
  • For the fasta files in the directory "Scaf", you can run the following command to annotate them:
pgcgap --Annotate --scafPath ./Scaf --Scaf_suffix .fasta --codon 11 --threads 4
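The two steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names `print_abyss_commands` and `collect_scaffolds` are hypothetical, the single-end reads are assumed to end in `.fq.gz`, and the first helper only prints one `abyss-pe` command per read file (a dry run) so the commands can be checked before they are executed. The second helper copies or decompresses downloaded fasta files into one directory with a uniform `.fasta` suffix, matching the `--Scaf_suffix .fasta` option used above.

```shell
#!/usr/bin/env bash
# Dry run: print one abyss-pe command per single-end read file in a directory.
# Assumes reads are named <strain>.fq.gz; k defaults to 81 as in the thread.
print_abyss_commands() {
    local dir="$1" k="${2:-81}" f name
    for f in "$dir"/*.fq.gz; do
        [ -f "$f" ] || continue              # skip if the glob matched nothing
        name="$(basename "$f" .fq.gz)"       # strain name taken from the file name
        printf "abyss-pe name=%s k=%s se='%s'\n" "$name" "$k" "$f"
    done
}

# Gather fasta files into one directory with a uniform .fasta suffix.
# Gzipped files are decompressed; files with other suffixes are ignored.
collect_scaffolds() {
    local src="$1" dest="$2" f base
    mkdir -p "$dest"
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        base="$(basename "$f")"
        case "$base" in
            *.fa.gz|*.fasta.gz|*.fna.gz)
                base="${base%.gz}"                          # drop .gz first
                gzip -dc "$f" > "$dest/${base%.*}.fasta" ;; # then swap the suffix
            *.fa|*.fasta|*.fna)
                cp "$f" "$dest/${base%.*}.fasta" ;;
        esac
    done
}

# Example use (hypothetical directory names):
# print_abyss_commands Reads/Single
# collect_scaffolds downloads Scaf
```

After `collect_scaffolds` has filled "Scaf", the annotation command shown above can be run on that directory unchanged.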
