XGBoost different inference results on AMD64, ARM and PPC #44542

Open
smorovic opened this issue Mar 25, 2024 · 23 comments

@smorovic
Contributor

As discussed in PR #44473,
we noticed a discrepancy in the XGBoost inference results with the new unit test RecoEgamma/PhotonIdentification/test/test_PhotonMvaXgb.cc.
The unit test passes on x86_64 but fails, in identical fashion and with identical discrepancies, on both PPC64LE and ARM64, in 4 out of 10 checks:

src/RecoEgamma/PhotonIdentification/test/test_PhotonMvaXgb.cc:44: FAILED:
  CHECK_THAT( xgbScore, Catch::Matchers::WithinAbs(mva_score_v1[i], 0.0001) )
with expansion:
  0.91074f is within 0.0001 of 0.9863399863

src/RecoEgamma/PhotonIdentification/test/test_PhotonMvaXgb.cc:44: FAILED:
  CHECK_THAT( xgbScore, Catch::Matchers::WithinAbs(mva_score_v1[i], 0.0001) )
with expansion:
  0.82656f is within 0.0001 of 0.9750099778

src/RecoEgamma/PhotonIdentification/test/test_PhotonMvaXgb.cc:44: FAILED:
  CHECK_THAT( xgbScore, Catch::Matchers::WithinAbs(mva_score_v1[i], 0.0001) )
with expansion:
  0.0f is within 0.0001 of 0.00179

src/RecoEgamma/PhotonIdentification/test/test_PhotonMvaXgb.cc:44: FAILED:
  CHECK_THAT( xgbScore, Catch::Matchers::WithinAbs(mva_score_v1[i], 0.0001) )
with expansion:
  0.93808f is within 0.0001 of 0.9837399721

A PR was submitted to disable the check on non-x86_64 architectures for now:
#44531

@cmsbuild
Contributor

cmsbuild commented Mar 25, 2024

cms-bot internal usage

@cmsbuild
Contributor

A new Issue was created by @smorovic.

@makortel, @Dr15Jones, @smuzaffar, @antoniovilela, @sextonkennedy, @rappoccio can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor

assign reconstruction, ml

@cmsbuild
Contributor

New categories assigned: reconstruction,ml

@jfernan2,@mandrenguyen,@valsdav,@wpmccormack you have been requested to review this Pull request/Issue and eventually sign? Thanks

@smorovic
Contributor Author

I fetched xgboost v1.7.5 (and, for the dmlc-core subpackage, the master branch) from GitHub and compiled it with default options (cmake . ; make).
On x86_64 RHEL8 (gcc8) as well as on lxplus9-arm (gcc11) it gives the same result for the test below, consistent with the first check in the unit test.

#include <stdio.h>
#include <string>
#include <sstream>
#include <iostream>
extern "C" {
#include "xgboost/c_api.h"
}

int main() {
  int best_ntree_limit_ = 158;

  // Load the photon MVA model used by the unit test.
  BoosterHandle booster;
  XGBoosterCreate(NULL, 0, &booster);
  XGBoosterLoadModel(booster, "/afs/cern.ch/user/s/smorovic/public/Photon_NTL_158_Endcap_v1.bin");

  // Prediction configuration: limit inference to the first best_ntree_limit_ iterations.
  std::stringstream config;
  config << "{\"training\": false, \"type\": 0, \"iteration_begin\": 0, \"iteration_end\": " << best_ntree_limit_
         << ", \"strict_shape\": false}";
  std::string config_ = config.str();

  // One row of nine input features (hard-coded test values).
  float var[9];
  var[0] = 134.303;
  var[1] = 0.945981;
  var[2] = 0.0264346;
  var[3] = 0.012448;
  var[4] = 0.0208734;
  var[5] = 113.405;
  var[6] = 1.7446;
  var[7] = 0.00437808;
  var[8] = 0.303464;

  // Build a 1x9 DMatrix (missing-value sentinel -999.9) and run inference.
  DMatrixHandle dmat;
  XGDMatrixCreateFromMat(var, 1, 9, -999.9f, &dmat);
  uint64_t const* out_shape;
  uint64_t out_dim;
  const float* out_result = NULL;
  XGBoosterPredictFromDMatrix(booster, dmat, config_.c_str(), &out_shape, &out_dim, &out_result);
  float ret = out_result[0];
  XGDMatrixFree(dmat);
  XGBoosterFree(booster);

  float exp = 0.98634;
  printf(" ===TEST===     VAL: %f\n", ret);
  printf(" ===EXPECTED=== VAL: %f\n", exp);
  return 0;
}
 ===TEST===     VAL: 0.986344
 ===EXPECTED=== VAL: 0.986340

Applying this cmsdist patch:
https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_14_1_X/master/xgboost-arm-and-ppc.patch
also makes no difference.

Finally, compiling with gcc12 (14_0_2 cmsenv) also gives the same result.

Turning the above into a unit test and running it on ARM, it actually passes (while the original unit test still fails):

Fail    2s ... RecoEgamma/PhotonIdentification/RecoEgammaPhotonIdentificationTest
Pass    1s ... RecoEgamma/PhotonIdentification/RecoEgammaPhotonIdentificationTest2

so I'm starting to suspect that something is wrong with the unit test itself (regarding portability).

@smorovic
Contributor Author

Found it.
Changing if (abs(etaSC) < 1.5) to if (std::abs(etaSC) < 1.5) fixes the unit test on ARM:

Pass    1s ... RecoEgamma/PhotonIdentification/RecoEgammaPhotonIdentificationTest
Pass    1s ... RecoEgamma/PhotonIdentification/RecoEgammaPhotonIdentificationTest2

I noticed earlier that on x86_64 abs is a floating point version, but apparently not on other architectures?

I will push this fix to the unit test (reusing the existing PR and backport).
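
For illustration, a minimal standalone sketch of the difference (a hypothetical demo.cc, not the CMSSW unit test): the C library only declares int abs(int), so a double argument is silently truncated, while std::abs(double) keeps the fractional part.

#include <cmath>    // std::abs(float/double/long double)
#include <cstdio>
#include <cstdlib>  // C-style ::abs(int)

int main() {
  double etaSC = 1.7446;           // one of the feature values used in the reproducer above
  int truncated = ::abs(etaSC);    // implicit double -> int conversion: yields 1, so "< 1.5" is true
  double exact = std::abs(etaSC);  // yields 1.7446, so "< 1.5" is false
  std::printf("::abs(etaSC)    = %d    -> (< 1.5) is %s\n", truncated, truncated < 1.5 ? "true" : "false");
  std::printf("std::abs(etaSC) = %.4f -> (< 1.5) is %s\n", exact, exact < 1.5 ? "true" : "false");
  return 0;
}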

@smorovic smorovic changed the title XGBoost different inference results on AMD64, ARM and PPC XGBoost different inference results on AMD64, ARM and PPC (FIX) Mar 25, 2024
@smorovic smorovic changed the title XGBoost different inference results on AMD64, ARM and PPC (FIX) XGBoost different inference results on AMD64, ARM and PPC Mar 25, 2024
@makortel
Contributor

I noticed earlier that on x86_64 abs is a floating point version, but apparently not on other architectures?

The C abs() function is actually for int, whereas fabs() is for double, i.e. the behavior on ARM was the correct one.

The only (or "easiest") explanation I could imagine for the compiler's behavior on x86-64 is that it somehow picked the C++ std::abs() instead of the C abs(). But even that would sound strange.

@VinInn
Contributor

VinInn commented Mar 26, 2024

It sounds really strange!

@smorovic
Contributor Author

Maybe some architecture-specific header redefines it, for example #define abs fabs?

@VinInn
Contributor

VinInn commented Mar 26, 2024

The only way I can make it happen is to have

using namespace std;

or

using std::abs;

somehow conditionally present when compiling on x86_64.

@VinInn
Contributor

VinInn commented Mar 26, 2024

grep abs /data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/stdlib.h
using std::abs;
using std::labs;

If one wrongly does #include <stdlib.h> it happens (also on ARM, by the way); with #include <cstdlib> it's OK (abs is the C integer function, std::abs is the overloaded C++ one).

Ditto for #include <math.h> vs #include <cmath>.

Play with https://godbolt.org/z/7fs559aWx

To track it down:
  1. find where math.h or stdlib.h gets included
  2. check whether it is included only on x86_64 (clue: the various intrinsics headers)

grep malloc /data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib/gcc/x86_64-redhat-linux-gnu/12.3.1/include/xmmintrin.h
/* Get _mm_malloc () and _mm_free (). */
#include <mm_malloc.h>

cat /tmp/test_PhotonMvaXgb.i | less

grep stdlib /data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib/gcc/x86_64-redhat-linux-gnu/12.3.1/include/mm_malloc.h
#include <stdlib.h>
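
A compact sketch of that mechanism (a hypothetical probe.cc, assuming libstdc++; not CMSSW code): once the <stdlib.h> wrapper has injected using std::abs; into the global namespace, an unqualified abs(double) resolves to std::abs(double), whereas without it only the C ::abs(int) is visible and the argument is truncated.

#include <cmath>     // std::abs(float/double/long double)
#include <stdlib.h>  // libstdc++ wrapper: re-exports std::abs at global scope
#include <cstdio>

int main() {
  double etaSC = 1.7446;
  // With the wrapper in the include chain this prints 1.744600; on the failing
  // platforms the unit test effectively computed abs((int)etaSC) = 1 instead.
  // The cast keeps the %f format valid even if abs resolves to the int version.
  std::printf("abs(etaSC) = %f\n", static_cast<double>(abs(etaSC)));
  return 0;
}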

@VinInn
Contributor

VinInn commented Mar 26, 2024

With an explicit include of xmmintrin.h:

https://godbolt.org/z/cxo65rnr9

@makortel
Contributor

makortel commented Mar 26, 2024

with explicit include on xmmintrin.h

Yeah, xmmintrin.h has

#include <mm_malloc.h>

which has

#include <stdlib.h>

(ah, this information was already in #44542 (comment))

@VinInn
Contributor

VinInn commented Mar 26, 2024

let's see what they say
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114484

@slava77
Contributor

slava77 commented Mar 26, 2024

I'm a bit lost in this thread discussion.
Isn't explicit use of std::abs in our coding rules? Shouldn't the related unit test change from abs to std::abs?
Or is there evidence that std::abs can get overridden in some circumstances by the C abs?

@VinInn
Contributor

VinInn commented Mar 26, 2024 via email

@smorovic
Contributor Author

I'm a bit lost in this thread discussion. Isn't explicit use of std::abs in our coding rules? Shouldn't the related unit test change from abs to std::abs? Or is there evidence that std::abs can get overridden in some circumstances by the C abs?

The unit test is going to be fixed to use std::abs; this has already been submitted in the two PRs mentioned above.
In fact, std::abs was already correctly used in the related non-unit-test code.

@mmusich
Contributor

mmusich commented Mar 26, 2024

Isn't explicit use of std::abs in our coding rule?

well, I would have also thought so, but there is still discussion: cms-sw/cms-sw.github.io#99 (comment)

@guitargeek
Contributor

guitargeek commented Apr 21, 2024

Why didn't you just stick with the very nice GBRForest you have in CMSSW for the MVA? Then you don't need the XGBoost C API dependency, and the GBRForest has even better performance! It's also more platform-independent. Furthermore, these ML tools evolve quickly, so maintaining the dependency can be a burden.

Translating from XGBoost to GBRForest is easy. I do this in my library, where I renamed the GBRForest from CMS as "FastForest", but the code is almost the same.

Actually I'm about to bring the GBRForest into ROOT itself, rebranded as "RBDT" this time 😆

So if at some point CMS uses a newer ROOT version (6.32 I guess) and wants to avoid the XGBoost dependency, it will be very easy. Meaning if the issue is not urgent, it can also be waited out.

@smorovic
Contributor Author

Hi @guitargeek,
we looked at it, but we didn't go with it initially due to difficulties in translating models.
We tried with XGBoost2TMVA and didn't get the results we expected (it changed the classifier output from the [0,1] range to the [-1,1] range used by TMVA). And then we noticed that XGBoost is integrated as a tool, probably because of Python users...

We are aware that performance should be better with GBRForest. The C API is optimized to run on multiple rows and is relatively heavy on allocations when preparing to run inference. Besides, it is also much slower on the full menu than when running only selected paths (30 times!); I suspect either caching or heap allocations cause that.
Still, for now, the impact on HLT menus is low (around the 0.3% range), which was accepted by TSG.

Therefore, initially we are using the C API directly. I agree that we should try to migrate to GBRForest in subsequent releases.

Thank you for suggesting the tool, I will have a look at it (in a week when I'm back from vacation).
A possible caveat is that the "NTree limit" parameter of the model needs to be passed explicitly to the XGBoost C API (or even to the Python API when loading from the "bin" files that we're using), or inference always returns a different result. Maybe it will be fine with txt files, but we need to try.

@guitargeek
Contributor

guitargeek commented Apr 21, 2024

Thanks a lot for your answer, even from your vacation! Indeed the performance hit is big and you have to do memory allocations. But if you studied the performance impact and it turned out to be minimal, that's good.

I forgot about XGBoost2TMVA actually, indeed that would have been the easiest solution because the GBRForest can directly read it, if I remember correctly. I very well remember this problem with different outputs! I forgot the details, but applying an (inverse) logistic transformation and/or a simple re-scaling did the trick for me. Just plot the GBRForest and XGBoost output against each other in a scatter plot, and the functional relation will become apparent.
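
For later reference, a hedged sketch of those two mappings (hypothetical helper names, assuming the XGBoost model was trained with the binary:logistic objective; which mapping applies depends on what the converted GBRForest/FastForest model actually returns):

#include <cmath>
#include <cstdio>

// If the converted model returns the raw boosting margin m, the binary:logistic
// probability that XGBoost reports is the sigmoid of that margin.
double marginToProbability(double margin) { return 1.0 / (1.0 + std::exp(-margin)); }

// If the converted model instead returns a TMVA-style score s in [-1, 1],
// a simple linear re-scaling maps it back to [0, 1].
double rescaleToUnitInterval(double score) { return 0.5 * (score + 1.0); }

int main() {
  // Illustration only: as suggested above, plotting converted vs. original outputs
  // shows which of the two relations (if either) actually holds for a given model.
  std::printf("sigmoid(2.0)   = %.4f\n", marginToProbability(2.0));
  std::printf("rescale(0.973) = %.4f\n", rescaleToUnitInterval(0.973));
  return 0;
}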

@jfernan2
Contributor

@smorovic any further progress on this? Or has the issue been resolved with your PR? Thanks

@smorovic
Contributor Author

Hello,
the problem with excessive creation of OpenMP threads was resolved by a PR, so there is no urgent need to replace XGBoost.

Concerning the migration away from the XGBoost library, about two weeks ago I was looking into how to convert the current "bin" files to TMVA models. So far I have failed to get something useful. It doesn't help that I couldn't find the code from my older attempt at this with XGBoost2TMVA.
Once I manage it, I will start numerically comparing GBRForest inference to XGBoost and, if needed, see how to limit the iterations to get accurate results with respect to XGBoost.
