KMC abruptly not working #222
Hi, I know there are a few issues, such as the irritating "unknown exception" error (which in most cases is not actually unknown but wrongly propagated, so the user sees this meaningless message). We have fixed this, but the fix is not published yet. How much RAM do you have on your machine?
Do you really need this switch? The dataset seems quite small, so KMC will probably not use more than the default 12 GB of RAM anyway.
Sorry, I didn't mean to close and reopen the issue, just to reply. I fixed the problem where it was being killed at 97%, but I have yet to find what is causing the unknown exception error. Interestingly, it only throws the unknown exception with one particular file. I am using fastp for filtering and paired-end merging: all reads that can't be merged are written to two files mirroring the original files, while the merged reads are written to a third file. Running kmc on either unmerged file works without issue, but running it on the file of merged reads throws the unknown exception error. I used head -50 to manually inspect the files for differences in structure, but they appear to be the same. What steps would you suggest I take to solve this? Thank you for your response, by the way!
Hi, could you share these files? I will try to reproduce the issue.
The unmerged one that works is a 10 GB file, and I'd have to retrieve it from the HPC. Would you be OK with me just sharing the merged file that doesn't work? It's only about 67 MB.
Sure, a smaller file causing issues is even better :)
I had to zip it because GitHub said it doesn't support the file type, but it should be a .fastq file when unzipped. Thank you!
Thanks! I think I know the reason.
Note that the header of the quality line is different from the header of the sequence, which I believe is not allowed in the FASTQ format. The quality header should be either empty (just +) or the same as the sequence header. Wikipedia seems to confirm this (and there are other sources saying the same): https://en.wikipedia.org/wiki/FASTQ_format: "Field 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again." I think it's best to keep only the … In summary, I think KMC's behaviour is OK (except for the error message).
Thank you so much!
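The separator rule quoted above can be checked mechanically before a file is handed to KMC. The sketch below is a hypothetical helper (the name `check_fastq_separators` is illustrative; it is not part of KMC or fastp). It assumes an uncompressed FASTQ file with exactly four lines per record, with no wrapped sequence or quality lines, and reports every record whose `+` line is neither bare nor an exact copy of the `@` header text.

```shell
#!/bin/sh
# Hypothetical helper (not part of KMC or fastp).
# Reports FASTQ records whose separator ("+") line is neither a bare "+"
# nor "+" followed by exactly the same text as the "@" header line.
# Assumes uncompressed input with exactly 4 lines per record.
# Reads the file named in $1, or standard input if no argument is given.
# Exit status: 0 if every separator line conforms, 1 otherwise.
check_fastq_separators() {
    awk '
        NR % 4 == 1 { header = substr($0, 2) }   # "@" line without the leading "@"
        NR % 4 == 3 {
            sep = substr($0, 2)                   # "+" line without the leading "+"
            if (substr($0, 1, 1) != "+" || (sep != "" && sep != header)) {
                printf "record %d: header \"%s\" vs separator \"%s\"\n", (NR + 1) / 4, header, $0
                bad++
            }
        }
        END { exit (bad > 0) }
    ' "${1:--}"
}
```

For a gzipped file, the same check can read from standard input, e.g. `gzip -dc reads.fastq.gz | check_fastq_separators`. Printing the offending records, rather than only failing, makes it easy to see which annotation differs between the two header lines.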
Hey, I wrote a sed command, sed -i 's/merged_[0-9]_[0-9]//g' "$mergeReadout", to delete the text that differentiated the two headers. For context, here is a snippet of the problematic .fastq file now:
@SRR5088929.119.1 119 length=51
As you can see, the two headers are now identical; however, kmc still throws the unknown exception error during processing.
Update: I modified the command to include the unmerged reads (which basically increases the amount of data in kmc's input file), and it seemed to improve things: instead of throwing the unknown exception error, it actually started Stage 1, then threw the following error: Stage 1: 84%
@marekkokot, hey I just wanted to update you on the status of the error:
sed -i 's/merged_[0-9]_[0-9]//g' "$mergeReadout"
gzip "$mergeReadout"
kmc -k27 -ci50 "$mergeReadout" histogram .
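One detail in the commands above is worth checking independently of the header issue: by default, `gzip FILE` replaces `FILE` with `FILE.gz`, so if the commands run in that order, the `kmc` call still refers to `"$mergeReadout"`, a path that no longer exists. The thread does not establish whether this contributed to the error. The sketch below is a hypothetical helper (the name `compress_fastq` is illustrative) that compresses the file and prints the name that should be passed on; KMC reads gzipped FASTQ directly, as the `.fastq.gz` input in the original report shows.

```shell
#!/bin/sh
# Hypothetical helper illustrating gzip's default behaviour: `gzip FILE`
# writes FILE.gz and removes FILE. The helper prints the new name so the
# caller can pass the file that actually exists to the next step.
compress_fastq() {
    # -f overwrites a stale FILE.gz left over from an earlier run
    gzip -f "$1" || return 1
    printf '%s\n' "$1.gz"
}

# Possible use in the workflow above (mergeReadout is the variable from
# the original commands; not executed here):
#   mergeReadout=$(compress_fastq "$mergeReadout")
#   kmc -k27 -ci50 "$mergeReadout" histogram .
```

An alternative is to keep the uncompressed file and skip the `gzip` step entirely, since the file only needs to exist under the name given to `kmc`.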
It seems you are not removing the space before "merged".
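Following the point above, one option that does not depend on the exact annotation fastp appends is to rewrite every separator line to a bare `+`, which the FASTQ description quoted earlier in the thread allows. The sketch below is a hypothetical helper (`normalize_fastq_separators` is an illustrative name) assuming uncompressed 4-line records; it writes to standard output rather than editing in place. A narrower variant of the original sed command that also drops the leading space is shown as a comment; its multi-digit pattern is an assumption about how the annotation may look.

```shell
#!/bin/sh
# Hypothetical helper: replace every FASTQ separator line (line 3 of each
# 4-line record) with a bare "+". Assumes uncompressed input with exactly
# 4 lines per record. Reads $1, or standard input if no argument is given,
# and writes the result to standard output.
normalize_fastq_separators() {
    awk 'NR % 4 == 3 { print "+"; next } { print }' "${1:--}"
}

# Narrower alternative in the style of the original sed command: also
# remove the space before "merged" and allow one or more digits per number
# (whether fastp emits multi-digit numbers here is an assumption):
#   sed -i 's/ merged_[0-9][0-9]*_[0-9][0-9]*//g' "$mergeReadout"

# Example (file names are hypothetical):
#   normalize_fastq_separators merged.fastq > merged.fixed.fastq
```

Writing a bare `+` also makes the file slightly smaller, and it sidesteps any question of whether the two header lines match character for character.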
For most of the time I've been using kmc, it has worked with few issues. However, I recently had to start working with .fastq data stored in a directory that requires absolute paths (which I believe is the root cause of the issue).
This has caused me to get two kinds of errors, either:
A)
[jrosen5@c005:~/applied_proj/sandbox]$ bin/kmc -k27 -ci50 "/scratch/jrosen5/applied_proj/sandbox/data/PRCRreads/SRR5088929_1.fastq.gz" histogram . -sm
Stage 1: 94%Killed
or
B)
Error: unknown exception