rsync_from_ncbi.pl: FTP connection error: [Net::FTP] Timeout #895

Open
DukeVimes opened this issue Dec 5, 2024 · 9 comments

@DukeVimes

DukeVimes commented Dec 5, 2024

Using Kraken version 2.1.3
kraken2-build --standard --use-ftp --threads 24 --db test-2024-12-05

I get:

Downloading nucleotide gb accession to taxon map... done.
Downloading nucleotide wgs accession to taxon map... done.
Downloaded accession to taxon map(s)
Downloading taxonomy tree data... done.
Uncompressing taxonomy data... done.
Untarring taxonomy tree data... done.
Step 1/2: Performing ftp file transfer of requested files
rsync_from_ncbi.pl: FTP connection error: [Net::FTP] Timeout

Without the --use-ftp option, I can't even download the taxon map.

I reduced rsync_from_ncbi.pl to the part that creates the FTP connection, which does seem to work (at least I was able to download check.txt):

#!/usr/bin/env perl

use strict;
use warnings;
use File::Basename;
use Getopt::Std;
use Net::FTP;
use List::Util qw/max/;

my $PROG = "TEST";
my $SERVER = "ftp.ncbi.nlm.nih.gov";
my $SERVER_PATH = "/genomes";
my $FTP_USER = "anonymous";
my $FTP_PASS = "kraken2download";

sub ftp_connection {
    my $ftp = Net::FTP->new($SERVER, Passive => 1)
        or die "$PROG: FTP connection error: $@\n";
    $ftp->login($FTP_USER, $FTP_PASS)
        or die "$PROG: FTP login error: " . $ftp->message() . "\n";
    $ftp->binary()
        or die "$PROG: FTP binary mode error: " . $ftp->message() . "\n";
    $ftp->cwd($SERVER_PATH)
        or die "$PROG: FTP CD error: " . $ftp->message() . "\n";
    return $ftp;
}

my $ftp = ftp_connection();
warn "we got an ftp connection";
$ftp->get('check.txt');
warn "$PROG: ftp message: ".$ftp->message()."\n";
#        last if $ftp->get($_);
#        warn "$PROG: unable to download $_ on try $try of $ntries: ".$ftp->message()."\n";
#    die "$PROG: unable to download ftp://${SERVER}${SERVER_PATH}/$_\n" if $try == $ntries;
$ftp->quit;
@DukeVimes
Author

I tried to find out why rsync isn't working for me. With telnet ftp.ncbi.nlm.nih.gov 873 I get a connection to the NCBI rsync server, which indicates that port 873 is reachable; nevertheless, I temporarily disabled the firewall. Alas, rsync -v --list-only rsync://ftp.ncbi.nlm.nih.gov/genomes doesn't return either.
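A telnet connection only shows that the TCP port accepts connections. An rsync daemon normally sends a greeting line beginning with `@RSYNCD:` as soon as a client connects, so reading that line is a slightly stronger check. The sketch below is one way to do that with the Python standard library; the function name `rsync_banner` and the timeout value are illustrative assumptions, not part of Kraken 2.

```python
import socket


def rsync_banner(host, port=873, timeout=10.0):
    """Return the rsync daemon greeting line, or None if none arrives."""
    try:
        # The timeout applies to the connect and to every recv call.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = b""
            # The greeting is one short line; stop at the first newline.
            while b"\n" not in data and len(data) < 256:
                chunk = sock.recv(64)
                if not chunk:
                    break
                data += chunk
    except OSError:
        # Refused connections, timeouts and unreachable networks all land here.
        return None
    line = data.split(b"\n", 1)[0].decode("ascii", "replace").strip()
    return line if line.startswith("@RSYNCD:") else None
```

Calling `rsync_banner("ftp.ncbi.nlm.nih.gov")` would return the greeting if the daemon answers, and `None` if the port is open but nothing rsync-like responds within the timeout.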

@ch4rr0
Collaborator

ch4rr0 commented Dec 6, 2024

Hello,

Can you give k2 a try? It is a Python script that we are working on to replace the current suite of wrapper scripts.

You can download a library from NCBI like so:

k2 download-library --db <db> --library <library> --threads <n>

The --threads parameter is specific to k2 and will specify the number of connections used to fetch accession files and the number of processes used for post-processing said files.

To build a database you can run: k2 build --db <db> --threads <threads>

The standard database can be built using the command: k2 build --standard --db standard --threads <n>

N.B. the k2 script that shipped with the current release of Kraken 2 was very much a work in progress. If you are planning on using the script, please fetch the most recent version from the Kraken 2 master/main branch.

As always, feedback is very much appreciated.

@DukeVimes
Author

I tried, but this led to multiple follow-up problems (most probably caused by me).
First I naively cloned master/HEAD and ran k2 build --standard --db /mnt/db/db_acc/k2_test --threads 8.

This successfully downloaded a lot of files, but couldn't find the k2mask process.
Indeed, k2mask isn't in the path.

Next I tried a fresh, complete installation (without conda) using ./install_kraken2.sh, which reported "Kraken 2 installation complete". I symlinked kraken2, kraken2-build and kraken2-inspect into the path.

But now ./k2 build --standard --db /mnt/db/db_acc/k2_test --threads 8 fails immediately with http.client.BadStatusLine: c8 in client.py _read_status:

Traceback (most recent call last):
  File "/mnt/miniconda/lib/python3.12/concurrent/futures/process.py", line 263, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/db/db_acc/k2_bin/kraken2/scripts/./k2", line 1357, in http_download_file2
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/mnt/miniconda/lib/python3.12/http/client.py", line 1428, in getresponse
    response.begin()
  File "/mnt/miniconda/lib/python3.12/http/client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/mnt/miniconda/lib/python3.12/http/client.py", line 313, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: c8

@ch4rr0
Collaborator

ch4rr0 commented Dec 9, 2024

Ah, it is trying to use a masker that we added to kraken2 in the last release. You can either build it yourself and copy it to a location in your $PATH or use the --no-masking flag to skip the masking process.
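The choice between the two options can be automated with a small wrapper. The sketch below checks whether the masker is on `$PATH` and appends `--no-masking` only when it is missing. It uses only the flags mentioned in this thread and assumes `--no-masking` is accepted by `k2 build`, as the reply above implies; the helper name `k2_build_command` is hypothetical.

```python
import shutil


def k2_build_command(db, threads, standard=False, masker="k2mask"):
    """Assemble a k2 build command, skipping masking if the masker is absent."""
    cmd = ["k2", "build"]
    if standard:
        cmd.append("--standard")
    cmd += ["--db", db, "--threads", str(threads)]
    # shutil.which returns None when the executable cannot be found on PATH.
    if shutil.which(masker) is None:
        cmd.append("--no-masking")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building the command separately makes it easy to print and inspect before anything is downloaded.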

Did you experience any issues with downloading the libraries?

@mohitsharma-123

mohitsharma-123 commented Dec 10, 2024

I am unable to download the Kraken 2 standard database using either
kraken2-build --standard --threads 96 --db kraken2db
or
kraken2-build --standard --threads 96 --db kraken2db/ --use-ftp

@mohitsharma-123

(kraken2) mohitsharma@deep:/data/mohitsharma$ kraken2-build --standard --threads 96 --db kraken2db
Downloading nucleotide gb accession to taxon map...rsync: [Receiver] failed to connect to ftp.ncbi.nlm.nih.gov (130.14.250.13): Connection timed out (110)
rsync: [Receiver] failed to connect to ftp.ncbi.nlm.nih.gov (130.14.250.12): Connection timed out (110)
rsync: [Receiver] failed to connect to ftp.ncbi.nlm.nih.gov (130.14.250.31): Connection timed out (110)
rsync: [Receiver] failed to connect to ftp.ncbi.nlm.nih.gov (130.14.250.10): Connection timed out (110)
rsync: [Receiver] failed to connect to ftp.ncbi.nlm.nih.gov (130.14.250.11): Connection timed out (110)
rsync: [Receiver] failed to connect to ftp.ncbi.nlm.nih.gov (130.14.250.7): Connection timed out (110)
rsync: [Receiver] failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::31): Network is unreachable (101)
rsync: [Receiver] failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::11): Network is unreachable (101)
rsync: [Receiver] failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::12): Network is unreachable (101)
rsync: [Receiver] failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::13): Network is unreachable (101)
rsync: [Receiver] failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::10): Network is unreachable (101)
rsync: [Receiver] failed to connect to ftp.ncbi.nlm.nih.gov (2607:f220:41e:250::7): Network is unreachable (101)
rsync error: error in socket IO (code 10) at clientserver.c(139) [Receiver=3.3.0]
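The log above shows rsync trying each resolved address in turn, with the IPv4 addresses timing out and the IPv6 addresses unreachable. To reproduce that check outside rsync, for example to compare port 873 with port 21, a sketch along the following lines can help. The function name `probe_addresses` and the default timeout are assumptions for illustration.

```python
import socket


def probe_addresses(host, port, timeout=5.0):
    """Try a TCP connect to every address that host resolves to.

    Returns a list of (address, outcome) pairs, where outcome is "ok" or
    the text of the error raised for that address.
    """
    results = []
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        address = sockaddr[0]
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            outcome = "ok"
        except OSError as exc:
            # Includes timeouts, refused connections and unreachable networks.
            outcome = str(exc) or type(exc).__name__
        results.append((address, outcome))
    return results
```

Running `probe_addresses("ftp.ncbi.nlm.nih.gov", 873)` and then the same call with port 21 would show whether the failure is specific to the rsync port or affects every connection to the host.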

@mohitsharma-123

(kraken2) mohitsharma@deep:~$ kraken2-build --standard --threads 96 --db kraken2_db/ --use-ftp
Downloading nucleotide gb accession to taxon map... done.
Downloading nucleotide wgs accession to taxon map... done.
Downloaded accession to taxon map(s)
Downloading taxonomy tree data... done.
Uncompressing taxonomy data... done.
Untarring taxonomy tree data... done.
Step 1/2: Performing ftp file transfer of requested files
rsync_from_ncbi.pl: unable to download all/GCF/023/238/205/GCF_023238205.1_ASM2323820v1/GCF_023238205.1_ASM2323820v1_genomic.fna.gz on try 1 of 5: Scanning file for viruses.
There'll be a delay while we scan for viruses.
Opening BINARY mode data connection for all/GCF/023/238/205/GCF_023238205.1_ASM2323820v1/GCF_023238205.1_ASM2323820v1_genomic.fna.gz (1321507 bytes)
Scanning for viruses.
Scanning for viruses.
Idle timeout (60 seconds): closing control connection

rsync_from_ncbi.pl: unable to download all/GCF/023/238/205/GCF_023238205.1_ASM2323820v1/GCF_023238205.1_ASM2323820v1_genomic.fna.gz on try 2 of 5: [Net::FTP] Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/023/238/205/GCF_023238205.1_ASM2323820v1/GCF_023238205.1_ASM2323820v1_genomic.fna.gz on try 3 of 5: [Net::FTP] Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/023/238/205/GCF_023238205.1_ASM2323820v1/GCF_023238205.1_ASM2323820v1_genomic.fna.gz on try 4 of 5: [Net::FTP] Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/023/238/205/GCF_023238205.1_ASM2323820v1/GCF_023238205.1_ASM2323820v1_genomic.fna.gz on try 5 of 5: [Net::FTP] Connection closed
rsync_from_ncbi.pl: unable to download ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/023/238/205/GCF_023238205.1_ASM2323820v1/GCF_023238205.1_ASM2323820v1_genomic.fna.gz
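In the log above, try 1 ends with the server closing the control connection after the idle timeout, and tries 2 to 5 all report "Connection closed". That pattern suggests the retries went over the session the server had already closed. A retry loop that opens a fresh session for each attempt avoids this. The following is a minimal sketch of that idea using Python's `ftplib`; it is not how rsync_from_ncbi.pl or k2 is implemented, and the name `download_with_reconnect` and its parameters are hypothetical.

```python
import ftplib
import time


def download_with_reconnect(connect, remote_path, local_path, tries=5, delay=1.0):
    """Fetch one file, opening a new FTP session for every attempt.

    connect is a zero-argument callable returning a logged-in ftplib.FTP
    object (or any object with compatible retrbinary and close methods).
    Returns the number of the attempt that succeeded.
    """
    last_error = None
    for attempt in range(1, tries + 1):
        ftp = None
        try:
            ftp = connect()
            # The local file is truncated and rewritten on each attempt.
            with open(local_path, "wb") as out:
                ftp.retrbinary("RETR " + remote_path, out.write)
            return attempt
        except ftplib.all_errors as exc:
            last_error = exc
        finally:
            # Always drop the session so the next attempt starts clean.
            if ftp is not None:
                ftp.close()
        if attempt < tries:
            time.sleep(delay)
    raise RuntimeError(
        "unable to download %s after %d tries" % (remote_path, tries)
    ) from last_error
```

With the server and credentials from the script earlier in this thread, `connect` could be `lambda: ftplib.FTP("ftp.ncbi.nlm.nih.gov", user="anonymous", passwd="kraken2download", timeout=120)`, and `remote_path` would then be the full path starting at `/genomes`.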

@ch4rr0
Collaborator

ch4rr0 commented Dec 13, 2024

Did you try the k2 script (https://github.com/DerrickWood/kraken2/blob/master/scripts/k2) recommended earlier in this thread?

@mohitsharma-123

mohitsharma-123 commented Dec 17, 2024 via email
