XGBoost different inference results on AMD64, ARM and PPC #44542
A new Issue was created by @smorovic. @makortel, @Dr15Jones, @smuzaffar, @antoniovilela, @sextonkennedy, @rappoccio can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign reconstruction, ml |
New categories assigned: reconstruction,ml @jfernan2,@mandrenguyen,@valsdav,@wpmccormack you have been requested to review this Pull request/Issue and eventually sign? Thanks |
I fetched xgboost v1.7.5 (and for the subpackage
If applying this cmsdist patch: Finally, compiling with gcc12 (14_0_2 cmsenv) also gives the same result. Making the above into a unit test and running it on ARM actually passes:
so I'm starting to suspect that there is something wrong with the unit test itself (regarding portability). |
Found it.
I noticed earlier that on x86_64 I will push this fix to the unit test (reusing the existing PR and backport). |
The C The only (or "easiest") way I could imagine the compiler's behavior on x86-64 would be that somehow it picked the C++ |
It sounds really strange! |
maybe some architecture-specific header redefines it, for example |
The only way I can make it happen is to have
or
somehow conditional on compiling on x86_64 |
if one wrongly does ditto for play with
grep malloc /data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib/gcc/x86_64-redhat-linux-gnu/12.3.1/include/xmmintrin.h |
with explicit include on xmmintrin.h |
Yeah, #include <mm_malloc.h> which has #include <stdlib.h> (ah, this information was already in #44542 (comment)) |
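The header chain identified above can be reproduced in isolation with the sketch below, which assumes a C++17 compiler. When `<stdlib.h>` is included from C++ (as `mm_malloc.h` does), the standard places the `<cstdlib>` names, including the `abs(double)` overload, in the global namespace, so an unqualified `abs` on a `double` keeps its fractional part. If only C's `int abs(int)` is visible, the argument is converted to `int` first; `cStyleAbs` makes that conversion explicit. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>   // std::abs overloads, including abs(double) since C++17
#include <stdlib.h>  // from C++, also places those overloads in the global namespace

// Unqualified call: with <stdlib.h> included from C++, overload resolution
// selects abs(double), so the fractional part is preserved.
double globalAbs(double x) {
    return abs(x);
}

// What C's int abs(int) computes for a double argument: the implicit
// double -> int conversion truncates toward zero before abs is applied.
int cStyleAbs(double x) {
    return std::abs(static_cast<int>(x));
}

// Quick self-check of both behaviours.
void checkAbsBehaviour() {
    assert(globalAbs(-0.25) == 0.25);  // floating-point overload
    assert(cStyleAbs(-0.25) == 0);     // truncated to 0 before abs
}
```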
let's see what they say |
I'm a bit lost in this thread discussion. |
On Mar 26, 2024, at 4:14 PM, Slava Krutelyov wrote:
I'm a bit lost in this thread discussion.
Isn't explicit use of std::abs in our coding rule?
YES.
Shouldn't the related unit test change from abs to std::abs?
YES.
Or is there an evidence that std::abs can get overwritten in some circumstances by a C abs?
NO. The opposite: C abs is overwritten by C++ std::abs in some circumstances.
|
Unit test is going to be fixed to use std::abs, this is already submitted to two PRs (mentioned). |
well, I would have also thought so, but there is still discussion: cms-sw/cms-sw.github.io#99 (comment) |
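Following the fix discussed above (switching the unit test to `std::abs`), a minimal sketch of a portable comparison is shown below. It assumes the test compares an MVA output with a stored reference value; `approxEqual` and its default tolerances are hypothetical and not taken from `test_PhotonMvaXgb.cc`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper for comparing an inference result with a stored
// reference: passes if the difference is within an absolute tolerance or
// within a tolerance relative to the larger magnitude of the two values.
// std::abs is qualified so the double overload is always used, whatever
// other headers happen to be included.
bool approxEqual(double value, double expected,
                 double absTol = 1e-6, double relTol = 1e-5) {
    assert(absTol >= 0.0 && relTol >= 0.0);
    const double diff = std::abs(value - expected);
    const double scale = std::max(std::abs(value), std::abs(expected));
    return diff <= absTol || diff <= relTol * scale;
}
```

With C's `int abs(int)`, the same expression would see `0.5 - 0.6` truncated to `0`, so any check built on it is meaningless for differences smaller than 1 in magnitude.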
Why didn't you just stick with the very nice GBRForest you have in CMSSW for the MVA? Then you don't need the XGBoost C API dependency, and the GBRForest has even better performance! And it's also more platform independent. Furthermore, these ML tools evolve quickly, so maintaining the dependency can be work. Translating from XGBoost to GBRForest is easy. I do this in my library, where I renamed the GBRForest from CMS as "FastForest", but the code is almost the same. Actually I'm about to bring the GBRForest into ROOT itself, rebranded as "RBDT" this time 😆 So if at some point CMS uses a newer ROOT version (6.32 I guess) and wants to avoid the XGBoost dependency, it will be very easy. Meaning if the issue is not urgent, it can also be waited out. |
Hi @guitargeek, We are aware that performance should be better with GBRForest. The C API is optimized to run on multiple rows and is relatively heavy on allocations when preparing to run inference. Besides, it is also much slower (30 times!) on the full menu than when running only selected paths; I suspect either caching or heap allocations are causing that. For the initial version, we are using the C API directly. I agree that we should try to migrate to GBRForest in subsequent releases. Thank you for suggesting the tool; I will have a look at it (in a week, when I'm back from vacation). |
Thanks a lot for your answer, even from your vacation! Indeed, the performance hit is big and you have to do memory allocations. But if you studied the performance impact and it turned out to be minimal, that's good. I forgot about |
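For context on the GBRForest / FastForest suggestion in this exchange: a boosted decision tree ensemble can be evaluated for a single candidate without per-call heap allocation by storing each tree as flat arrays and walking from the root to a leaf. The sketch below is a hypothetical, simplified layout in that spirit, not the actual GBRForest, FastForest or RBDT code. It assumes an XGBoost `binary:logistic` model, whose score is the logistic function applied to the sum of the leaf values plus the base margin.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flat-array layout for one tree (simplified, not the real
// GBRForest/FastForest format). Node i sends the event to left[i] if
// features[cutIndex[i]] < cutValue[i], otherwise to right[i]. A negative
// child index -k-1 refers to leaf k. Assumes every tree has at least one
// split and ignores missing-value handling.
struct FlatTree {
    std::vector<int> cutIndex;
    std::vector<float> cutValue;
    std::vector<int> left;
    std::vector<int> right;
    std::vector<float> leafValue;

    float evaluate(const float* features) const {
        int node = 0;
        while (node >= 0) {
            const auto i = static_cast<std::size_t>(node);
            node = features[cutIndex[i]] < cutValue[i] ? left[i] : right[i];
        }
        return leafValue[static_cast<std::size_t>(-node - 1)];
    }
};

// Sums the tree outputs on top of a base margin and maps the result to a
// probability with the logistic function, as for a binary:logistic model.
// No heap allocation happens per call.
float evaluateForest(const std::vector<FlatTree>& trees, const float* features,
                     float baseMargin = 0.f) {
    assert(features != nullptr);
    float margin = baseMargin;
    for (const FlatTree& tree : trees) {
        margin += tree.evaluate(features);
    }
    return 1.f / (1.f + std::exp(-margin));
}
```

Because each evaluation only reads precomputed arrays, this layout avoids the per-call allocations mentioned above for the C API.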
@smorovic any further progress on this? Or has the issue been resolved with your PR? Thanks |
Hello, Concerning the migration away from the XGBoost library: about two weeks ago I was looking at how to convert the current "bin" files to TMVA models, but so far I have failed to get anything useful. It doesn't help that I couldn't find the code from my older attempt at this with XGBoost2TMVA.
As discussed in PR #44473, we noticed a discrepancy in the XGBoost inference result with the new unit test
RecoEgamma/PhotonIdentification/test/test_PhotonMvaXgb.cc.
The unit test passes on x86_64, but fails in an identical fashion, with identical discrepancies, on both PPC64 LE and ARM64, in 4 out of 10 tests:
A PR was submitted to disable the check on non-x86_64 architectures for now:
#44531
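One way to limit a check to x86_64 at compile time is sketched below, using the architecture macros predefined by GCC and Clang. This is an assumption about how such a guard could look, not the change made in #44531; `strictComparisonEnabled`, `architectureName` and `checkScore` are hypothetical names. A compile-time guard keeps the test building everywhere while the numerical comparison is skipped on ARM64 and PPC64 LE.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical compile-time guard: true only when building for x86_64.
// __x86_64__, __aarch64__ and __powerpc64__ are predefined by GCC and Clang.
constexpr bool strictComparisonEnabled() {
#if defined(__x86_64__)
    return true;
#else
    return false;
#endif
}

// Human-readable target name, e.g. for a message when the check is skipped.
const char* architectureName() {
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__powerpc64__)
    return "ppc64";
#else
    return "other";
#endif
}

// Example use in a test body: the comparison is asserted only on x86_64.
void checkScore(double value, double expected, double tol) {
    if (strictComparisonEnabled()) {
        assert(std::abs(value - expected) <= tol);
    }
}
```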