Download issue for reference genome fasta and gtf from ensembl ftp #12

Open
nukaemon opened this issue Jul 25, 2024 · 1 comment

nukaemon commented Jul 25, 2024

The Ensembl URLs starting with 'ftp' are no longer accessible, which breaks get_genome and get_annotation in genome.smk.
These wrappers accept a 'url' param, which can be used to point them at the URL starting with 'http' instead.
However, the wrapper version must be bumped from v1.23.3 to a newer one for this param to work.

rule get_genome:
    output:
        expand("resources/reference_genome/{ref}/homo_sapiens.fasta",ref=config["ref"]["build"])
    params:
+       url="http://ftp.ensembl.org/pub",
        species="homo_sapiens",
        datatype="dna",
        build=config["ref"]["build"],
        release=config["ref"]["release"]
    log:
        outputdir + "logs/ensembl/get_genome.log"
    cache: "omit-software"
    wrapper:
-       "v1.23.3/bio/reference/ensembl-sequence"
+       "v3.13.8/bio/reference/ensembl-sequence"

rule get_annotation:
    output:
        expand("resources/reference_genome/{ref}/homo_sapiens.gtf",ref=config["ref"]["build"])
    params:
+       url="http://ftp.ensembl.org/pub",
        species="homo_sapiens",
        release=config["ref"]["release"] if config["ref"]["release"]=='GRCh38' else 87,
        build=config["ref"]["build"]
    cache: "omit-software"
    wrapper:
-       "v1.23.3/bio/reference/ensembl-annotation"
+       "v3.13.8/bio/reference/ensembl-annotation"

nukaemon commented Jul 26, 2024

get_known_variation also needs to be modified as below.
The line that specifies the fai file path causes a TypeError in Python due to a change in v3.7.0.
To avoid the error, simply append [0] so that a single path is passed instead of the list returned by expand().

rule get_known_variation:
    input:
        # use fai to annotate contig lengths for GATK BQSR
-        fai=expand("resources/reference_genome/{ref}/homo_sapiens.fasta.fai",ref=config["ref"]["build"])
+        fai=expand("resources/reference_genome/{ref}/homo_sapiens.fasta.fai",ref=config["ref"]["build"])[0]
    output:
        vcf=expand("resources/database/{ref}/variation.vcf.gz",ref=config["ref"]["build"])
    params:
+        url="http://ftp.ensembl.org/pub",
        species="homo_sapiens",
        build=config["ref"]["build"],
        release=config["ref"]["release"],
        type="all"
    cache: "omit-software"  # save space and time with between workflow caching (see docs)
    wrapper:
-        "v1.23.3/bio/reference/ensembl-variation"
+        "v3.13.8/bio/reference/ensembl-variation"
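
For anyone wondering why the [0] matters: expand() always returns a list, even when there is only one match. Below is a minimal sketch of the problem in plain Python. The helper expand_like is a hypothetical stand-in for Snakemake's expand(), and os.fspath is used only as an example of code that expects a single path; I have not checked which call in the wrapper actually raises the error, and "GRCh38" is just an example build.

```python
import os


def expand_like(pattern, ref):
    """Hypothetical stand-in for Snakemake's expand(): always returns a list."""
    return [pattern.format(ref=ref)]


# Same pattern as the fai input in get_known_variation; "GRCh38" is an example build.
paths = expand_like(
    "resources/reference_genome/{ref}/homo_sapiens.fasta.fai", ref="GRCh38"
)

# Passing the whole list to something that expects one path fails with TypeError.
list_error = None
try:
    os.fspath(paths)
except TypeError as err:
    list_error = err

# Indexing with [0] yields a plain string, which is accepted as a path.
fai = os.fspath(paths[0])
print(type(paths).__name__, "->", fai)
```

The same reasoning applies to any input that is produced by expand() but consumed as a single file.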
