XGBoost different inference results on AMD64, ARM and PPC #44542

Open
smorovic opened this issue Mar 25, 2024 · 23 comments

@smorovic
Contributor

As discussed in PR #44473,
we noticed a discrepancy in the XGBoost inference results with the new unit test RecoEgamma/PhotonIdentification/test/test_PhotonMvaXgb.cc.
The unit test passes on x86_64 but fails, in identical fashion and with identical discrepancies, on both PPC64LE and ARM64, in 4 out of 10 checks:

src/RecoEgamma/PhotonIdentification/test/test_PhotonMvaXgb.cc:44: FAILED:
  CHECK_THAT( xgbScore, Catch::Matchers::WithinAbs(mva_score_v1[i], 0.0001) )
with expansion:
  0.91074f is within 0.0001 of 0.9863399863

src/RecoEgamma/PhotonIdentification/test/test_PhotonMvaXgb.cc:44: FAILED:
  CHECK_THAT( xgbScore, Catch::Matchers::WithinAbs(mva_score_v1[i], 0.0001) )
with expansion:
  0.82656f is within 0.0001 of 0.9750099778

src/RecoEgamma/PhotonIdentification/test/test_PhotonMvaXgb.cc:44: FAILED:
  CHECK_THAT( xgbScore, Catch::Matchers::WithinAbs(mva_score_v1[i], 0.0001) )
with expansion:
  0.0f is within 0.0001 of 0.00179

src/RecoEgamma/PhotonIdentification/test/test_PhotonMvaXgb.cc:44: FAILED:
  CHECK_THAT( xgbScore, Catch::Matchers::WithinAbs(mva_score_v1[i], 0.0001) )
with expansion:
  0.93808f is within 0.0001 of 0.9837399721

A PR was submitted to disable the check on non-x86_64 architectures for now:
#44531

@cmsbuild
Contributor

cmsbuild commented Mar 25, 2024

cms-bot internal usage

@cmsbuild
Contributor

A new Issue was created by @smorovic.

@makortel, @Dr15Jones, @smuzaffar, @antoniovilela, @sextonkennedy, @rappoccio can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor

assign reconstruction, ml

@cmsbuild
Contributor

New categories assigned: reconstruction,ml

@jfernan2,@mandrenguyen,@valsdav,@wpmccormack you have been requested to review this Pull request/Issue and eventually sign? Thanks

@smorovic
Contributor Author

I fetched xgboost v1.7.5 (and, for the dmlc-core subpackage, the master branch) from GitHub and compiled it with default options (cmake . ; make).
On x86_64 RHEL8 (gcc8) as well as on lxplus9-arm (gcc11) it gives the same result for the test below, consistent with the first check in the unit test.

#include <stdio.h>
#include <string>
#include <sstream>
#include <iostream>
extern "C" {
#include "xgboost/c_api.h"
}

int main() {
  int best_ntree_limit_ = 158;

  // Load the photon MVA model used by the unit test.
  BoosterHandle booster;
  XGBoosterCreate(NULL, 0, &booster);
  XGBoosterLoadModel(booster, "/afs/cern.ch/user/s/smorovic/public/Photon_NTL_158_Endcap_v1.bin");

  // Prediction configuration: limit inference to the first best_ntree_limit_ iterations.
  std::stringstream config;
  config << "{\"training\": false, \"type\": 0, \"iteration_begin\": 0, \"iteration_end\": " << best_ntree_limit_
         << ", \"strict_shape\": false}";
  std::string config_ = config.str();

  // One row of nine input features (hard-coded test values).
  float var[9];
  var[0] = 134.303;
  var[1] = 0.945981;
  var[2] = 0.0264346;
  var[3] = 0.012448;
  var[4] = 0.0208734;
  var[5] = 113.405;
  var[6] = 1.7446;
  var[7] = 0.00437808;
  var[8] = 0.303464;

  // Build a 1x9 DMatrix (missing-value sentinel -999.9) and run inference.
  DMatrixHandle dmat;
  XGDMatrixCreateFromMat(var, 1, 9, -999.9f, &dmat);
  uint64_t const* out_shape;
  uint64_t out_dim;
  const float* out_result = NULL;
  XGBoosterPredictFromDMatrix(booster, dmat, config_.c_str(), &out_shape, &out_dim, &out_result);
  float ret = out_result[0];
  XGDMatrixFree(dmat);
  XGBoosterFree(booster);

  float exp = 0.98634;
  printf(" ===TEST===     VAL: %f\n", ret);
  printf(" ===EXPECTED=== VAL: %f\n", exp);
  return 0;
}
 ===TEST===     VAL: 0.986344
 ===EXPECTED=== VAL: 0.986340

Applying this cmsdist patch:
https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_14_1_X/master/xgboost-arm-and-ppc.patch
also makes no difference.

Finally, compiling with gcc12 (14_0_2 cmsenv) also gives the same result.

Turning the above into a unit test and running it on ARM, it actually passes (while the original unit test still fails):

Fail    2s ... RecoEgamma/PhotonIdentification/RecoEgammaPhotonIdentificationTest
Pass    1s ... RecoEgamma/PhotonIdentification/RecoEgammaPhotonIdentificationTest2

so I'm starting to suspect that something is wrong with the unit test itself (regarding portability).

@smorovic
Contributor Author

Found it.
Changing if (abs(etaSC) < 1.5) to if (std::abs(etaSC) < 1.5) fixes the unit test on ARM:

Pass    1s ... RecoEgamma/PhotonIdentification/RecoEgammaPhotonIdentificationTest
Pass    1s ... RecoEgamma/PhotonIdentification/RecoEgammaPhotonIdentificationTest2

I noticed earlier that on x86_64 abs is a floating point version, but apparently not on other architectures?

I will push this fix to the unit test (reusing the existing PR and backport).
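
For illustration, a minimal standalone sketch of the difference (a hypothetical demo.cc, not the CMSSW unit test): the C library only declares int abs(int), so a double argument is silently truncated, while std::abs(double) keeps the fractional part.

#include <cmath>    // std::abs(float/double/long double)
#include <cstdio>
#include <cstdlib>  // C-style ::abs(int)

int main() {
  double etaSC = 1.7446;           // one of the feature values used in the reproducer above
  int truncated = ::abs(etaSC);    // implicit double -> int conversion: yields 1, so "< 1.5" is true
  double exact = std::abs(etaSC);  // yields 1.7446, so "< 1.5" is false
  std::printf("::abs(etaSC)    = %d    -> (< 1.5) is %s\n", truncated, truncated < 1.5 ? "true" : "false");
  std::printf("std::abs(etaSC) = %.4f -> (< 1.5) is %s\n", exact, exact < 1.5 ? "true" : "false");
  return 0;
}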

@smorovic smorovic changed the title XGBoost different inference results on AMD64, ARM and PPC XGBoost different inference results on AMD64, ARM and PPC (FIX) Mar 25, 2024
@smorovic smorovic changed the title XGBoost different inference results on AMD64, ARM and PPC (FIX) XGBoost different inference results on AMD64, ARM and PPC Mar 25, 2024
@makortel
Contributor

I noticed earlier that on x86_64 abs is a floating point version, but apparently not on other architectures?

The C abs() function is actually for int, whereas fabs() is for double, i.e. the behavior on ARM was the correct one.

The only (or "easiest") explanation I could imagine for the compiler's behavior on x86-64 is that it somehow picked the C++ std::abs() instead of the C abs(). But even that would sound strange.

@VinInn
Contributor

VinInn commented Mar 26, 2024

It sounds really strange!

@smorovic
Contributor Author

Maybe some architecture-specific header redefines it, for example #define abs fabs?

@VinInn
Contributor

VinInn commented Mar 26, 2024

The only way I can make it happen is to have

using namespace std;

or

using std::abs;

somehow conditionally present when compiling on x86_64.

@VinInn
Contributor

VinInn commented Mar 26, 2024

grep abs /data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/stdlib.h
using std::abs;
using std::labs;

If one wrongly does #include <stdlib.h> it happens (also on ARM, by the way); with #include <cstdlib> it's OK (abs is the C integer function, std::abs is the overloaded C++ one).

Ditto for #include <math.h> vs #include <cmath>.

Play with https://godbolt.org/z/7fs559aWx

To track it down:
  1. find where math.h or stdlib.h gets included
  2. check whether it is included only on x86_64 (clue: the various intrinsics headers)

grep malloc /data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib/gcc/x86_64-redhat-linux-gnu/12.3.1/include/xmmintrin.h
/* Get _mm_malloc () and _mm_free (). */
#include <mm_malloc.h>

cat /tmp/test_PhotonMvaXgb.i | less

grep stdlib /data/cmssw/el9_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/lib/gcc/x86_64-redhat-linux-gnu/12.3.1/include/mm_malloc.h
#include <stdlib.h>
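
A compact sketch of that mechanism (a hypothetical probe.cc, assuming libstdc++; not CMSSW code): once the <stdlib.h> wrapper has injected using std::abs; into the global namespace, an unqualified abs(double) resolves to std::abs(double), whereas without it only the C ::abs(int) is visible and the argument is truncated.

#include <cmath>     // std::abs(float/double/long double)
#include <stdlib.h>  // libstdc++ wrapper: re-exports std::abs at global scope
#include <cstdio>

int main() {
  double etaSC = 1.7446;
  // With the wrapper in the include chain this prints 1.744600; on the failing
  // platforms the unit test effectively computed abs((int)etaSC) = 1 instead.
  // The cast keeps the %f format valid even if abs resolves to the int version.
  std::printf("abs(etaSC) = %f\n", static_cast<double>(abs(etaSC)));
  return 0;
}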

@VinInn
Contributor

VinInn commented Mar 26, 2024

With an explicit include of xmmintrin.h:

https://godbolt.org/z/cxo65rnr9

@makortel
Contributor

makortel commented Mar 26, 2024

with explicit include on xmmintrin.h

Yeah, xmmintrin.h has

#include <mm_malloc.h>

which has

#include <stdlib.h>

(ah, this information was already in #44542 (comment))

@VinInn
Contributor

VinInn commented Mar 26, 2024

let's see what they say
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114484

@slava77
Contributor

slava77 commented Mar 26, 2024

I'm a bit lost in this thread discussion.
Isn't explicit use of std::abs in our coding rules? Shouldn't the related unit test change from abs to std::abs?
Or is there evidence that std::abs can get overridden in some circumstances by the C abs?

@VinInn
Contributor

VinInn commented Mar 26, 2024 via email

@smorovic
Contributor Author

I'm a bit lost in this thread discussion. Isn't explicit use of std::abs in our coding rules? Shouldn't the related unit test change from abs to std::abs? Or is there evidence that std::abs can get overridden in some circumstances by the C abs?

The unit test is going to be fixed to use std::abs; this has already been submitted in the two PRs mentioned above.
In fact, std::abs was already correctly used in the related non-unit-test code.

@mmusich
Contributor

mmusich commented Mar 26, 2024

Isn't explicit use of std::abs in our coding rule?

well, I would have also thought so, but there is still discussion: cms-sw/cms-sw.github.io#99 (comment)

@guitargeek
Contributor

guitargeek commented Apr 21, 2024

Why didn't you just stick with the very nice GBRForest you have in CMSSW for the MVA? Then you don't need the XGBoost C API dependency, and the GBRForest has even better performance! It's also more platform-independent. Furthermore, these ML tools evolve quickly, so maintaining the dependency can be a burden.

Translating from XGBoost to GBRForest is easy. I do this in my library, where I renamed the GBRForest from CMS as "FastForest", but the code is almost the same.

Actually I'm about to bring the GBRForest into ROOT itself, rebranded as "RBDT" this time 😆

So if at some point CMS uses a newer ROOT version (6.32 I guess) and wants to avoid the XGBoost dependency, it will be very easy. Meaning if the issue is not urgent, it can also be waited out.

@smorovic
Contributor Author

Hi @guitargeek,
we looked at it, but we didn't go with it initially due to difficulties in translating models.
We tried with XGBoost2TMVA and didn't get the results we expected (it changed the classifier output from the [0,1] range to the [-1,1] range used by TMVA). And then we noticed that XGBoost is integrated as a tool, probably because of Python users...

We are aware that performance should be better with GBRForest. The C API is optimized to run on multiple rows and is relatively heavy on allocations when preparing to run inference. Besides, it is also much slower on the full menu than when running only selected paths (30 times!); I suspect either caching or heap allocations cause that.
Still, for now, the impact on HLT menus is low (around the 0.3% range), which was accepted by TSG.

Therefore, initially we are using the C API directly. I agree that we should try to migrate to GBRForest in subsequent releases.

Thank you for suggesting the tool, I will have a look at it (in a week when I'm back from vacation).
A possible caveat is that the "NTree limit" parameter of the model needs to be passed explicitly to the XGBoost C API (or even to the Python API when loading from the "bin" files that we're using), or inference always returns a different result. Maybe it will be fine with txt files, but we need to try.

@guitargeek
Contributor

guitargeek commented Apr 21, 2024

Thanks a lot for your answer, even from your vacation! Indeed the performance hit is big and you have to do memory allocations. But if you studied the performance impact and it turned out to be minimal, that's good.

I forgot about XGBoost2TMVA actually, indeed that would have been the easiest solution because the GBRForest can directly read it, if I remember correctly. I very well remember this problem with different outputs! I forgot the details, but applying an (inverse) logistic transformation and/or a simple re-scaling did the trick for me. Just plot the GBRForest and XGBoost output against each other in a scatter plot, and the functional relation will become apparent.
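
For later reference, a hedged sketch of those two mappings (hypothetical helper names, assuming the XGBoost model was trained with the binary:logistic objective; which mapping applies depends on what the converted GBRForest/FastForest model actually returns):

#include <cmath>
#include <cstdio>

// If the converted model returns the raw boosting margin m, the binary:logistic
// probability that XGBoost reports is the sigmoid of that margin.
double marginToProbability(double margin) { return 1.0 / (1.0 + std::exp(-margin)); }

// If the converted model instead returns a TMVA-style score s in [-1, 1],
// a simple linear re-scaling maps it back to [0, 1].
double rescaleToUnitInterval(double score) { return 0.5 * (score + 1.0); }

int main() {
  // Illustration only: as suggested above, plotting converted vs. original outputs
  // shows which of the two relations (if either) actually holds for a given model.
  std::printf("sigmoid(2.0)   = %.4f\n", marginToProbability(2.0));
  std::printf("rescale(0.973) = %.4f\n", rescaleToUnitInterval(0.973));
  return 0;
}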

@jfernan2
Contributor

@smorovic any further progress on this? Or has the issue been resolved with your PR? Thanks

@smorovic
Contributor Author

Hello,
the problem with excessive creation of OpenMP threads was resolved by a PR, so there is no urgent need to replace XGBoost.

Concerning the migration away from the XGBoost library, about two weeks ago I was looking into how to convert the current "bin" files to TMVA models. So far I have failed to get something useful. It doesn't help that I couldn't find the code from my older attempt at this with XGBoost2TMVA.
Once I manage it, I will start numerically comparing GBRForest inference to XGBoost and, if needed, see how to limit the iterations to get accurate results with respect to XGBoost.
