dbcreator_encode: K562 bug #93
Comments
At least for the K562E bug, we need to hard-code these particular files. Simply replacing K562E with K562 will break the "K562Ezh2" files, where Ezh2 is a legitimate factor. Also, in the wrong files it is not enough to change K562E to K562: the factors should be capitalized too, e.g., Fos.
Regarding K562b: we can rename the files. There are no factors that start with a lowercase "b", but there are some factors starting with a capital "B", e.g., Btf3. So, if the split algorithms in the dbcreator are case-sensitive (to my knowledge, yes), we should be fine renaming K562b to K562.
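The case-sensitivity argument above can be illustrated with a minimal sketch. It is not the dbcreator code: the function name `rename_k562b` is hypothetical, and it assumes the factor name that follows the cell line always starts with an uppercase letter, as in the file names listed in this issue.

```python
import re

# Hypothetical helper, not part of dbcreator: rewrite the deprecated
# "K562b" label to "K562" only when the next character is an uppercase
# letter, i.e. the start of a factor name. The match is case-sensitive,
# so a factor such as Btf3 ("K562Btf3") is left untouched.
_K562B = re.compile(r"K562b(?=[A-Z])")

def rename_k562b(file_name):
    """Return file_name with the deprecated K562b label replaced by K562."""
    return _K562B.sub("K562", file_name)

fixed = rename_k562b("wgEncodeSydhHistoneK562bH3k27me3bUcdPk.narrowPeak.gz")
# Illustrative name only; it just shows that a capital "B" factor survives.
untouched = rename_k562b("wgEncodeExampleK562Btf3Pk.narrowPeak.gz")
```

The lookahead keeps the replacement from touching anything except the cell-line label itself, so the trailing "b" in a factor token such as H3k27me3b is preserved.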
Won't fix. Temporary solution: delete the 'K562b' folders, as they contain the same files as the regular 'K562' folders. The 'K562E' files will be processed into the correct 'K562' folder, but the factor will look like 'Efos'; this can be corrected manually in the 'gf_description' file.
Check for duplicates:
should be equal to
The "K562E" error is fixed. The "K562b" folders should be deleted manually. For hg19, 19,776 GFs become 19,771 after removing duplicates. The duplicates are:
Filter grsnp_db/hg19/gf_descriptions.txt file:
K562 is the cell line that is parsed out of the file names. But sometimes it is labeled as K562b or K562E (see the URLs below). These are deprecated cell line names and refer to the same cell line as K562.
Can we hard-code these exceptions, so that the files are downloaded with their original names but processed specially? E.g.:
wgEncodeSydhHistone/wgEncodeSydhHistoneK562bH3k27me3bUcdPk.narrowPeak.gz - should be K562-H3k27me3-Sydh
wgEncodeAwgTfbsUniform/wgEncodeAwgTfbsUchicagoK562EfosUniPk.narrowPeak.gz - should be K562-Fos-Uchicago
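One way to hard-code the K562E exceptions is an explicit lookup table, sketched below. This is only an illustration under stated assumptions, not the dbcreator implementation: the names `K562E_EXCEPTIONS` and `normalize_k562e` are hypothetical, the table holds just the five K562E files listed in this issue, and the capitalized factor spellings (Fos, Gata2, Hdac8, Junb, Jund) are assumed to be the desired ones. An explicit table is used instead of a pattern because a blanket K562E replacement would also hit K562Ezh2, where Ezh2 is a real factor.

```python
# Hypothetical exception table: each deprecated "K562E<factor>" token is
# mapped to "K562<Factor>" with the factor capitalized. Only the tokens
# from the URLs listed in this issue are included.
K562E_EXCEPTIONS = {
    "K562Efos": "K562Fos",
    "K562Egata2": "K562Gata2",
    "K562Ehdac8": "K562Hdac8",
    "K562Ejunb": "K562Junb",
    "K562Ejund": "K562Jund",
}

def normalize_k562e(file_name):
    """Return the name to use for processing; the download name is unchanged."""
    for bad, good in K562E_EXCEPTIONS.items():
        if bad in file_name:
            return file_name.replace(bad, good)
    # Anything not listed, including K562Ezh2 files, is passed through as-is.
    return file_name

fixed = normalize_k562e("wgEncodeAwgTfbsUchicagoK562EfosUniPk.narrowPeak.gz")
# Illustrative name only; it shows that a legitimate Ezh2 file is not altered.
kept = normalize_k562e("wgEncodeExampleK562Ezh2Pk.narrowPeak.gz")
```

Because the lookup only affects the name passed on to the parsing step, the files can still be fetched under their original names.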
All noted bugs:
hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeSydhHistone/wgEncodeSydhHistoneK562bH3k27me3bUcdPk.narrowPeak.gz
hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeSydhHistone/wgEncodeSydhHistoneK562bH3k4me1UcdPk.narrowPeak.gz
hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeSydhHistone/wgEncodeSydhHistoneK562bH3k4me3bUcdPk.narrowPeak.gz
hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeSydhHistone/wgEncodeSydhHistoneK562bH3k9acbUcdPk.narrowPeak.gz
hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/wgEncodeAwgTfbsUchicagoK562EfosUniPk.narrowPeak.gz
hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/wgEncodeAwgTfbsUchicagoK562Egata2UniPk.narrowPeak.gz
hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/wgEncodeAwgTfbsUchicagoK562Ehdac8UniPk.narrowPeak.gz
hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/wgEncodeAwgTfbsUchicagoK562EjunbUniPk.narrowPeak.gz
hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/wgEncodeAwgTfbsUchicagoK562EjundUniPk.narrowPeak.gz