Datasources in myvariant.io may contain large files to download (e.g. `dbsnp` release 155 is 380 GB). For various reasons (such as FTP connection issues), a download may be incomplete, leading to errors during the upload process.
Some of the datasources have MD5 checksum files available. It would be nice to download those `*.md5` files as well and validate the data files in the `post_dump` phase.
The validation in bash is quite straightforward. Each `.md5` file is essentially a tuple of `(checksum, filename)`. `md5sum -c` reads a `.md5` file, recalculates the checksum for that `filename`, and compares it with the original `checksum`.
It's also feasible in Python with the built-in `hashlib.md5()`. See Generating an MD5 checksum of a file. The performance of feeding file content to `hashlib` should be taken into account before developing an MD5 helper class/function.
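To make the performance point concrete, here is a minimal sketch of what such a helper could look like. It assumes the `.md5` files follow the usual `md5sum` layout (one `checksum  filename` pair per line) and that the data files sit next to the `.md5` file. The names `md5_of_file`, `parse_md5_file`, and `verify_md5_file` are hypothetical and not part of any existing code in this project; the 1 MiB chunk size is an arbitrary starting point that would need benchmarking.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    Reading in chunks keeps memory use flat, which matters for files that
    are hundreds of gigabytes in size.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_file(md5_path):
    """Yield (checksum, filename) pairs from an md5sum-style file (assumed format)."""
    with open(md5_path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            checksum, filename = line.split(None, 1)
            # md5sum marks binary mode with a leading "*" before the filename.
            if filename.startswith("*"):
                filename = filename[1:]
            yield checksum.lower(), filename


def verify_md5_file(md5_path):
    """Return {filename: True/False} for every entry listed in the .md5 file."""
    base_dir = Path(md5_path).parent
    return {
        filename: md5_of_file(base_dir / filename) == checksum
        for checksum, filename in parse_md5_file(md5_path)
    }
```

A `post_dump` step could call `verify_md5_file()` on each downloaded `.md5` file and fail the dump if any value in the result is `False`. Hashing a 380 GB file this way is still bound by disk read speed, so the cost of one full extra pass over the data should be weighed against the cost of a failed upload.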