Unit test and workflow failures because of illegal characters in MonitorElement path name #38885

Closed
makortel opened this issue Jul 28, 2022 · 16 comments

Comments

@makortel
Contributor

The unit test testSiStripDQM_OfflineTkMap started to fail in CMSSW_12_5_X_2022-07-28-1100 with a segfault:

Thu Jul 28 12:47:37 CEST 2022
Thread 1 (Thread 0x2aab6c76d1c0 (LWP 3183) "cmsRun"):
#3  0x00002aab71b25aeb in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02743/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_X_2022-07-24-0000/lib/el8_amd64_gcc10/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002aab9ba2767c in TkHistoMap::getValue(DetId) () from /cvmfs/cms-ib.cern.ch/nweek-02743/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_X_2022-07-24-0000/lib/el8_amd64_gcc10/libDQMSiStripCommon.so
#6  0x00002aab9b8bc5ea in SiStripTrackerMapCreator::createInfoFile(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, TTree*, dqm::legacy::DQMStore&, GeometricDet const*) () from /cvmfs/cms-ib.cern.ch/nweek-02743/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_X_2022-07-24-0000/lib/el8_amd64_gcc10/libDQMSiStripMonitorClient.so
#7  0x00002aab9b8581c0 in SiStripOfflineDQM::endRun(edm::Run const&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02743/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_X_2022-07-24-0000/lib/el8_amd64_gcc10/pluginDQMSiStripMonitorClientPlugins.so
#8  0x00002aab69307479 in edm::one::EDProducerBase::doEndRun(edm::RunTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02743/el8_amd64_gcc10/cms/cmssw-patch/CMSSW_12_5_X_2022-07-28-1100/lib/el8_amd64_gcc10/libFWCoreFramework.so
#9  0x00002aab692eb5e0 in edm::WorkerT<edm::one::EDProducerBase>::implDoEnd(edm::RunTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02743/el8_amd64_gcc10/cms/cmssw-patch/CMSSW_12_5_X_2022-07-28-1100/lib/el8_amd64_gcc10/libFWCoreFramework.so

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc10/CMSSW_12_5_X_2022-07-28-1100/unitTestLogs/DQM/SiStripMonitorClient#/

The cause probably lies at the beginning of the log:

Running script: /data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_12_5_X_2022-07-28-1100/src/DQM/SiStripMonitorClient/test/test_SiStripDQM_OfflineTkMap.sh
xrdfs command status = 0
Using file /store/group/comm_dqm/DQMGUI_data/Run2018/ZeroBias/R0003191xx/DQM_V0001_R000319176__ZeroBias__Run2018B-PromptReco-v2__DQMIO.root and run 319176. Running in /data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_12_5_X_2022-07-28-1100/src/DQM/SiStripMonitorClient/test.
----- Begin Fatal Exception 28-Jul-2022 12:47:03 CEST-----------------------
An exception of category 'BadMonitorElementPathName' occurred while
   [0] Processing global begin Run run: 319176
   [1] Calling method for module SiStripOfflineDQM/'siStripOfflineAnalyser'
Exception Message:
 Monitor element path name: 'root://cms-xrd-global.cern.ch//store/group/comm_dqm/DQMGUI_data/Run2018/ZeroBias/R0003191xx/DQM_V0001_R000319176__ZeroBias__Run2018B-PromptReco-v2__DQMIO.root:/DQMData/Run 319176/CSC/By Lumi Section 393-393/EventInfo/reportSummaryMap' uses unacceptable characters.
 Acceptable characters are: /ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-+=_()# 
----- End Fatal Exception -------------------------------------------------
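
For readers who want to reproduce the check outside of CMSSW, here is a minimal sketch of a validator built from the character set printed in the exception message above. The helper name `firstBadChar` is hypothetical and this is not the actual DQMStore code; it only illustrates which character trips the check.

```cpp
#include <cstddef>
#include <string>

// Character set copied from the exception message (note the trailing space).
const std::string kAcceptableChars =
    "/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-+=_()# ";

// Hypothetical helper: returns the index of the first character that is not
// in the acceptable set, or std::string::npos if the whole path is acceptable.
std::size_t firstBadChar(const std::string& path) {
  return path.find_first_not_of(kAcceptableChars);
}
```

Applied to the full path in the message, the first offending character is the ':' right after "root", while the in-file part starting at "/DQMData/" passes on its own.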
@makortel
Contributor Author

assign dqm

@cmsbuild
Contributor

New categories assigned: dqm

@jfernan2,@ahmad3213,@micsucmed,@rvenditti,@emanueleusai,@pmandrik you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor Author

FYI @cms-sw/trk-dpg-l2

@makortel
Contributor Author

This seems to be caused by #38831

@makortel
Contributor Author

Workflow 1042.0 step 3 fails in the same way:

An exception of category 'BadMonitorElementPathName' occurred while
   [0] Processing  stream begin Run run: 305064 stream: 2
   [1] Calling method for module PPSAlignmentWorker/'ppsAlignmentWorker'
Exception Message:
 Monitor element path name: 'AlCaReco/PPSAlignment/worker/sector 45/near_far/x slices, N/5.0-5.5/h_y' uses unacceptable characters.
 Acceptable characters are: /ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-+=_()# 
----- End Fatal Exception -------------------------------------------------

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc10/CMSSW_12_5_X_2022-07-28-1100/pyRelValMatrixLogs/run/1042.0_RunExpressPhy2017F+RunExpressPhy2017F+TIER0EXPPPSCALALIG+ALCASPLITPPSALIG+ALCAHARVDPPSALIG/step3_RunExpressPhy2017F+RunExpressPhy2017F+TIER0EXPPPSCALALIG+ALCASPLITPPSALIG+ALCAHARVDPPSALIG.log#/

@makortel
Contributor Author

assign alca

FYI @cms-sw/ctpps-dpg-l2 (for #38885 (comment))

@cmsbuild
Contributor

New categories assigned: alca

@yuanchao,@francescobrivio,@malbouis,@tvami,@ChrisMisan you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel changed the title from "Segfault in testSiStripDQM_OfflineTkMap unit test" to "Unit test and workflow failures because of illegal characters in MonitorElement path name" on Jul 28, 2022

@tvami
Contributor

tvami commented Jul 28, 2022

AlCaReco/PPSAlignment/worker/sector 45/near_far/x slices, N/5.0-5.5/h_y

Is it the comma that's wrong?

/DQMData/Run 319176/CSC/By Lumi Section 393-393/EventInfo/reportSummaryMap

I don't see anything wrong here; is it the whitespace? Don't we have a tonne of MEs with whitespace in them (that work anyway)?

@mmusich
Contributor

mmusich commented Jul 28, 2022

Is it the comma that's wrong?

Yes.

I don't see anything wrong here; is it the whitespace? Don't we have a tonne of MEs with whitespace in them (that work anyway)?

I think the problem is that the full path is checked (not only the histogram name), and when the file is remote, the path also includes the LFN, which contains unacceptable characters.

@tvami
Contributor

tvami commented Jul 28, 2022

AlCaReco/PPSAlignment/worker/sector 45/near_far/x slices, N/5.0-5.5/h_y
Is it the comma that's wrong?
Yes.

@ChrisMisan can you please fix that? Thanks!

@mmusich
Contributor

mmusich commented Jul 28, 2022

@cms-sw/dqm-l2

regarding

I think the problem is that the full path is checked (not only the histogram name), and when the file is remote, the path also includes the LFN, which contains unacceptable characters.

I can either:

  1. Change the unit test such that the input file becomes local

    (cmsRun "${LOCAL_TEST_DIR}/SiStripDQM_OfflineTkMap_Template_cfg_DB.py" globalTag="$GT" runNumber="$RUN" dqmFile=" root://cms-xrd-global.cern.ch//$DQMFILE" detIdInfoFile="file.root") || die 'failed running SiStripDQM_OfflineTkMap_Template_cfg_DB.py' $?

  2. Lift the requirement of having a sanitized path and just require the histogram name to have acceptable characters.

  3. Change the logic in DQMStore.cc to ignore remote LFN in the path name in bookME.

  4. Any other change you propose

Please let me know.
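
To make option 3 concrete, here is a rough sketch of what ignoring the remote file part of the path could look like. This is not the actual change to DQMStore.cc; the helper name `stripFilePrefix` is hypothetical, and it assumes the in-file path starts right after the first ".root:" marker, as in the exception message above.

```cpp
#include <string>

// Hypothetical helper: if the path starts with a file location such as
// "root://host//store/.../file.root:", return only the part after ".root:".
// Paths without that marker are returned unchanged.
std::string stripFilePrefix(const std::string& fullPath) {
  const std::string marker = ".root:";
  const std::string::size_type pos = fullPath.find(marker);
  if (pos == std::string::npos) {
    return fullPath;  // no file prefix, nothing to strip
  }
  return fullPath.substr(pos + marker.size());
}
```

With the path from the failing unit test this would leave "/DQMData/Run 319176/CSC/By Lumi Section 393-393/EventInfo/reportSummaryMap", which only uses acceptable characters.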

@mmusich
Contributor

mmusich commented Jul 29, 2022

Is it the comma that's wrong?
Yes.

There are actually two problems with the path name.
The comma here:

iBooker.setCurrentFolder(rootDir + "/" + scfg_.name_ + "/near_far/x slices, N/" + buf);

and here:

iBooker.setCurrentFolder(rootDir + "/" + scfg_.name_ + "/near_far/x slices, F/" + buf);

but also the decimal point ('.') produced here:

sprintf(buf, "%.1f-%.1f", x_min, x_max);

Tagging also the original author of the problematic code @MatiXOfficial
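
As an illustration only, one way to keep the slice folder name within the acceptable character set is to replace the decimal point after formatting. The helper name `sliceFolderName` and the choice of '_' as the replacement are assumptions; the thread does not show what the PPS authors actually did.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format the slice range as before ("%.1f-%.1f"),
// then replace each '.' with '_' so the folder name passes the path check.
std::string sliceFolderName(double x_min, double x_max) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.1f-%.1f", x_min, x_max);
  std::string name(buf);
  for (char& c : name) {
    if (c == '.') {
      c = '_';  // '.' is not in the acceptable set, '_' is
    }
  }
  return name;
}
```

The comma would need a similar treatment, for example by writing the folder as "x slices N" instead of "x slices, N".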

@mmusich
Contributor

mmusich commented Jul 30, 2022

Concerning my earlier comment #38885 (comment): since I didn't receive any feedback, I decided to implement option 3.
The issue with the testSiStripDQM_OfflineTkMap unit test described at #38885 (comment) should be solved by #38905.

N.B. This won't solve the other issue with wf 1042.0 step 3 reported at #38885 (comment), so the PPS colleagues should still sanitize their code.

@emanueleusai
Member

Concerning my earlier comment #38885 (comment) since I didn't receive any feedback I decided to implement option 3. The issue with the testSiStripDQM_OfflineTkMap unit test described at #38885 (comment) should be solved by #38905.

N.B. This won't solve the other issue with wf 1042.0 step 3 reported at #38885 (comment), so the PPS colleagues should still sanitize their code.

We are OK with any of the three options, though option 3, as far as I understand, seems to be the most general solution, so we are definitely on board with it.

@tvami
Contributor

tvami commented Aug 2, 2022
