-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Import of BigWig files with rtracklayer fails under MultiCoreParam #111
Comments
Can you clarify what 'fails' mean? I see
|
Fails here means an error is raised, and a rather cryptic one relating to the parallel processing. When calling import with the which argument, it will produce a warning if you request one or more seqlevels not in the bigwig file. |
can you cut-and-paste the command and error message? It doesn't fail in this way for me... Also please add the output of |
Sure, continuing from the example in the first post:
Session info:
BiocManager::valid():
|
Your reproducible example doesn't reproduce for me ;) does
work? |
Ah, the plot thickens then! It seems bplapply fails specifically when run with MultiCoreParam AND the which argument is used with import. Here's another example (reusing files from the original example):
I have no idea how to decipher the error message. Could it be:
I originally encountered the problem using GenomicFiles::reduceByRange, which needs to load separate parts of a BigWigFile. |
And some more testing: Both of these work:
So MultiCoreParam only fails when more than 1 core is used. SnowParam works without errors. |
FYI, they recommend against using forked parallel processing in the RStudio Console. I would rule out that player. Try to replicate it without RStudio by running R in the terminal. |
Next I would try with: x <- parallel::mclapply(bw_plus, import, which=grl[[1]], mc.cores=3L) |
Aha, Mother Of All Error Messages:
|
I'm still not able to reproduce this. Can you confirm that a file 'test.R' containing only
generates the error when run from the command line with I guess that more careful management of seqlevels means that the warning messages and hence crash goes away...?
|
Still raises an error:
If this is some deep unexplained bug, you're right that it might be easier to modify the code to avoid warnings. |
The warning: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :
1 parallel job did not deliver a result strongly suggests that the forked worker process died, cf. Issue #112. The reason for why it died seems to be suggested by #111 (comment): > x <- parallel::mclapply(bw_plus, import, which=grl[[1]], mc.cores=3L)
*** caught segfault ***
*** caught segfault ***
address 0x10bdb8020, cause 'memory not mapped'address 0x10bdb8020, cause 'memory not mapped'
*** caught segfault ***
address 0x10bdb8020, cause 'memory not mapped' To me, the problem is most likely not in BiocParallel. I would:
|
I'll mention that |
Having very similar issues with @MalteThodberg Base on your first post, I'm assuming your package also relies on It seems |
I'm wondering if this has something to do with repeated calls to the underlying C library that are made every time you call import on a I ran into similar issues recently when trying to parallelize |
I'm not exactly sure why we went down the rabbit hole above, but I think it's likely (though a bit mysterious) to say that the C code in rtracklayer is not re-entrant, so access via forked processes has unpredictable results. I would guess that SnowParam (i.e., independent rather than forked processes) is more robust. This does require (a) remembering that these processes are independent, so individual packages need to be loaded explicitly (e.g., via I don't think there is an easy solution here -- if the C library is not re-entrant, then very considerable work will be needed to make it so. |
Is it possible to detect if the current R process is forked and warn when the non-reentrant method is called from a forked process? That would make it easier for users to discover the underlying issue. |
See also PR#18230 (WISH: Export parallel:::isChild() to make it possible to protect against forked processing). |
As @HenrikBengtsson notes it's possible to find out that one is in a forked process (useful for avoiding nested forked processes and over-subscription to the available CPUs) but unfortunately @DarwinAwardWinner there is no way to know that a particular function call is non-reentrant (n.b., it is speculation that this is the problem in the current thread or for @bschilder 's use; it would be useful, though not definitive, to show that the code works without problem under Despite the appeal of forked ( |
I'm seeing a weird bug in in my CAGEfightR package when trying to import BigWigFiles.
Here's a reproducible example:
To summarize:
What's causing this strange behavior?
The text was updated successfully, but these errors were encountered: