feat(physmon): compare GX2F vs KF truth tracking #3889
base: main
Conversation
Do we want to do this comparison also in the GitLab CI?

@AJPfleger I think we should keep the two configurations in sync.
I wonder if we should rather use the output of the two separate physmon workflows directly
I thought about that too, but decided against it for three reasons:
- The outputs might be from different settings. (Still can be, but there is one layer less.)
- I wanted to have a script:
  - that directly generates a comparison
  - that can tune more parameters at the same time, like the number of events, offsets, detector configuration (I used it already for detailed comparisons, where I only used a small part of the ODD), ...
I don't see how you achieve that by calling `runTruthTrackingKalman` and `runTruthTrackingGx2f`. Most of the settings are hidden in these functions.

I would either build a complete reconstruction chain that runs both fitters in a single sequencer, or use the existing outputs. The idea is that the configurations of all the fitters are the same.
> I don't see how you achieve that by calling `runTruthTrackingKalman` and `runTruthTrackingGx2f`. Most of the settings are hidden in these functions.

> The outputs might be from different settings. (Still can be, but there is one layer less.)
We execute `phys_perf_mon.sh`. Inside, we execute the scripts `physmon_trackfitting_kf.py` and `physmon_trackfitting_gx2f.py`, which both then call `runTruthTrackingKalman`/`runTruthTrackingGx2f` with the option set `events`, `numThreads`, `logLevel`, `trackingGeometry`, `field`, `digiConfigFile`, `outputDir` (which is still missing the option for `trackFpes`).

The `logLevel` and `outputDir` settings are not relevant for the comparison, which still leaves 5 settings that could end up different and that a single script would keep identical. This is what I wanted to say with "one layer less".
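To make "one layer less" concrete, a minimal sketch of the single-script idea (illustrative only: the imports mirror how the physmon workflows use the example scripts, but the detector choice, digitization config path, and exact function signatures are assumptions, not the code from this PR):

```python
#!/usr/bin/env python3
# Sketch: define the shared option set exactly once, so the KF and
# the GX2F are guaranteed to run with identical settings.
from pathlib import Path

import acts
import acts.examples
from truth_tracking_kalman import runTruthTrackingKalman
from truth_tracking_gx2f import runTruthTrackingGx2f

u = acts.UnitConstants

# Built once, consumed by both fitters (generic detector as a
# stand-in; the real comparison would use the ODD).
detector, trackingGeometry, decorators = acts.examples.GenericDetector.create()

common = dict(
    trackingGeometry=trackingGeometry,
    field=acts.ConstantBField(acts.Vector3(0, 0, 2 * u.T)),
    digiConfigFile=Path("generic-digi-smearing-config.json"),  # placeholder
)

# The very same dictionary goes into both calls; a setting cannot
# silently differ between the two fitters.
runTruthTrackingKalman(outputDir=Path("out/kf"), **common)
runTruthTrackingGx2f(outputDir=Path("out/gx2f"), **common)
```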
> [A script,] that directly generates a comparison
If relying on the other outputs, one needs to run

```
phys_perf_mon.sh gx2f
phys_perf_mon.sh kf
phys_perf_mon.sh gx2f_vs_kf
```

each time while developing something. (This would probably end up in a local script anyway, which I wanted to avoid.) Locally, a large amount of time is spent on creating the plots in each `phys_perf_mon.sh` run, which I wanted to reduce. Of course, not all 3 would need to be run for each change, especially when developing for a single fitter, but it still makes development quite annoying.
> [A script,] can tune more parameters at the same time, like number of events, offsets, detector configuration (I used it already for detailed comparisons, where I only used a small part of the ODD), ...
When someone wants to change some parameters of the comparison, with your approach they would need to touch at least 2 files and check that everything is copied over correctly. By having it in a single file, development is
- a lot smoother, since only a single value needs to be changed instead of 2
- a lot faster, since only a single file with the purpose of doing this needs to be touched, which is useful for remote development
- a lot safer, because both fitters will receive the same input, reducing the room for human error
I also added more options to the script:
- `skip` is useful when starting to compare single events. I used this a lot when comparing the fitters track-by-track.
- `trackFpes` is necessary because the KF unfortunately sometimes has problems with unmasked FPEs. I don't want to turn it off in the original CI run, because we really want to have it there.
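As a sketch of where these two options end up (assuming the `skip` and `trackFpes` keyword arguments of the examples `Sequencer`; treat the exact spelling here as an assumption):

```python
import acts.examples

# Zoom in on a single event for track-by-track comparison, and keep
# FPE tracking off because the KF still trips unmasked FPEs (see
# #3890); the regular CI run keeps it on.
s = acts.examples.Sequencer(
    events=1,         # process one event ...
    skip=41,          # ... after skipping the first 41
    numThreads=1,     # deterministic ordering helps when diffing tracks
    trackFpes=False,
)
```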
> I would either build a complete reconstruction chain that runs both fitters in a single sequencer or use the existing outputs.
I don't see the need for a full reconstruction chain. This would be another large function that needs to be maintained. Assuming that the configuration of all the fitters is the same, it should be enough to call the existing functions directly, which could be achieved with my small proposed script.

Does this make my idea behind the workflow clearer? :)
I don't think the physmon scripts should be used for any development. This is something you should rather do with dedicated local scripts.
> If relying on the other outputs, one needs to run `phys_perf_mon.sh gx2f`, `phys_perf_mon.sh kf`, `phys_perf_mon.sh gx2f_vs_kf`
You can just run `gx2f` and `kf` when you ask for `gx2f_vs_kf`. This is just a matter of changing `if`s.
> I don't see the need for a full reconstruction chain. This is another large function, that needs to be maintained. Assuming that the configuration of all the fitters are the same it should be enough to have a direct call to the existing functions, which could be achieved with my small proposed script.
I think we should either go the full way and replicate a simulation and reconstruction chain, which guarantees that you get the same events and synchronized outputs, or we aim for minimal duplication. Right now it sits in the middle, with some duplication but seemingly no benefit from it.
> I don't think the physmon scripts should be used for any development. This is something you should rather do with dedicated local scripts.
I see your point. I will reconsider whether we need this feature at all and check if I can find more points or better words to explain the benefits mentioned above. :(
I think this is definitely nice to have. It seems a bit on the boundary of what physmon is supposed to do, since usually we check whether something changed, but here we just generate plots to look at and don't fail the CI at any point.

I thought about doing something similar for the steppers/navigators, where you can just look at differences instead of references. If we can separate these plots from the monitoring plots, like you did by putting them at the bottom with a special indicator, it is quite fine with me.

My only concern then would be CI run time, where we could save some time by reusing existing outputs instead of generating them twice, or even more often given multiple fitters, or steppers in my example. Potentially the plotting itself is the heaviest part right now anyway, and the reprocessing might not be a big concern.
I have now put both fitters in a single sequencer as suggested. I can live with that compromise. I had to adapt the whiteboard a bit to support multiple fitters.
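Condensed, the merged workflow has roughly this shape (a sketch, not the actual `physmon_trackfitting_gx2f_vs_kf.py`: the alias names appear in this PR, while the detector stand-in and the elided steps are assumptions):

```python
import acts
import acts.examples
from acts.examples.reconstruction import addKalmanTracks, addGx2fTracks

u = acts.UnitConstants
detector, trackingGeometry, decorators = acts.examples.GenericDetector.create()
field = acts.ConstantBField(acts.Vector3(0, 0, 2 * u.T))

s = acts.examples.Sequencer(events=10000, numThreads=-1)

# ... particle gun, simulation, digitization and truth seeding as in
# the other truth-tracking physmon workflows (elided) ...

# Both fitters see the identical input; the new whiteboardAliasTracks
# parameter keeps the two output collections apart on one whiteboard.
addKalmanTracks(s, trackingGeometry, field, whiteboardAliasTracks="tracks-kf")
addGx2fTracks(s, trackingGeometry, field, whiteboardAliasTracks="tracks-gx2f")

s.run()
```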
Popped up in a discussion in #3889 that the configurations should be the same for all fitters.

Summary by CodeRabbit
- Chores: Updated parameter naming in function calls for improved clarity and readability.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (6)
CI/physmon/config/info_only.yml (1)

`1-7`: Document the purpose of disabled checks, we must.

Hmmmm, clear the structure is, but documentation it lacks. Add comments to explain why all checks are disabled, you should. Help future maintainers understand the purpose, it will.

```diff
+# Configuration for GX2F vs KF truth tracking comparison
+# All checks intentionally disabled for fail-safe implementation
 checks:
   "*":
     Chi2Test: null
     KolmogorovTest: null
     RatioCheck: null
     ResidualCheck: null
     IntegralCheck: null
```

CI/physmon/summary.py (1)
`51-60`: Hmmmm, good changes these are, but a small improvement suggest I must.

Clean and logical the implementation is, young padawan. Yet, extract the emoji constants to the top of the file, we should. Make future changes easier, this will.

```diff
 # At the top of the file with other constants
+EMOJI_COMPARISON = "🔵"
+EMOJI_SUCCESS = "✅"
+EMOJI_FAILURE = "🔴"

 # In the HTML generation
-            <li>🔵 <a href="{s["path"]}">{s["title"]}</a></li>"""
+            <li>{EMOJI_COMPARISON} <a href="{s["path"]}">{s["title"]}</a></li>"""

-            <li>{"✅" if s["total"] else "🔴"} <a href="{s["path"]}">{s["title"]}</a></li>"""
+            <li>{EMOJI_SUCCESS if s["total"] else EMOJI_FAILURE} <a href="{s["path"]}">{s["title"]}</a></li>"""
```

CI/physmon/phys_perf_mon.sh (1)
`465-475`: Consider GitLab CI resource optimization, mentioned in PR comments you have.

In PR comments, discussion about limiting output to GitHub Actions exists. Implement a configuration option for this, you should.

Consider adding an environment check:

```diff
 if [[ "$mode" == "all" || "$mode" == "gx2f_vs_kf" ]]; then
+    # Skip resource-intensive comparison in GitLab CI
+    if [ "$CI_SERVER_NAME" = "GitLab" ]; then
+        echo "Skipping GX2F vs KF comparison in GitLab CI"
+        return
+    fi
     run_histcmp \
```

CI/physmon/workflows/physmon_trackfitting_gx2f_vs_kf.py (3)
`65-77`: Particle selection criteria, review you should.

The `postSelectParticles` configuration may exclude particles unintentionally. Verify that the `pt` range and `minMeasurements` align with your analysis needs.
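For context, the selection under discussion looks roughly like this (field names follow the `ParticleSelectorConfig` used elsewhere in the physmon workflows; the values shown are placeholders, not the ones from this PR):

```python
import acts
from acts.examples.simulation import ParticleSelectorConfig

u = acts.UnitConstants

# Post-simulation selection: cuts here silently shape every
# downstream efficiency and pull plot, hence the reviewer's caution.
postSelectParticles = ParticleSelectorConfig(
    pt=(1 * u.GeV, None),    # lower pt bound only (placeholder value)
    measurements=(7, None),  # minimum number of measurements (placeholder)
    removeNeutral=True,
)
```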
`97-105`: Consistency in track aliases, maintain you must.

Assigning `whiteboardAliasTracks="tracks-gx2f"` should match the track names used downstream to prevent confusion.
`129-136`: Consistency in Kalman track naming, vital it is.

Match `whiteboardAliasTracks="tracks-kf"` with the naming conventions used to avoid confusion between different track collections.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- CI/physmon/config/info_only.yml (1 hunks)
- CI/physmon/phys_perf_mon.sh (3 hunks)
- CI/physmon/summary.py (2 hunks)
- CI/physmon/workflows/physmon_trackfitting_gx2f_vs_kf.py (1 hunks)
- Examples/Python/python/acts/examples/reconstruction.py (6 hunks)
🔇 Additional comments (10)
CI/physmon/config/info_only.yml (1)
`2-7`: Verify the intention of disabled checks, we must.
Disabled all checks are, young padawan. With the Force, I sense this aligns with the fail-safe implementation mentioned in the PR. Yet verify this intention explicitly, we should.
✅ Verification successful
Confirmed the intention is, young padawan. Aligned with the Force, this change is.
Clear it now becomes, through examination of other configuration files. In `default.yml` and the track fitting configurations, specific thresholds for these checks defined they are. Yet in `info_only.yml`, disabled all checks are - a deliberate choice this appears to be.
Pattern observed in track fitting configs:
- Selective disabling of checks for specific metrics they do (`pullwidth_*`, `pullmean_*`, etc.)
- Default thresholds maintained for general cases
- Careful exceptions made where needed
Therefore, in `info_only.yml`, disabling all checks aligns with its purpose - for information gathering only, without enforcing checks it is. Consistent with the fail-safe implementation mentioned in the PR, this approach is.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Search for related test configurations to ensure consistency
echo "Searching for similar check configurations..."
rg -l "Chi2Test|KolmogorovTest|RatioCheck|ResidualCheck|IntegralCheck" --type yaml

# Look for documentation about these checks
echo "Searching for documentation about these checks..."
rg "Chi2Test|KolmogorovTest|RatioCheck|ResidualCheck|IntegralCheck" --type md
```
Length of output: 553
Script:
```bash
#!/bin/bash
# Examine the content of related configuration files to understand the check patterns
echo "Default configuration, let us see..."
cat CI/physmon/config/default.yml

echo "\nOther configurations, we shall examine..."
for file in CI/physmon/config/trackfitting_*.yml; do
    echo "\nInspecting $file:"
    cat "$file"
done

# Search for any related Python files that might implement these checks
echo "\nImplementation files, we shall find..."
rg -l "class (Chi2Test|KolmogorovTest|RatioCheck|ResidualCheck|IntegralCheck)" --type py
```
Length of output: 3168
CI/physmon/summary.py (1)
`83-87`: Strong with the Force, this code is. Approve it, I do.
Consistent with the HTML changes, this implementation is. Proper handling of comparison entries, it maintains. The path of the Jedi, you follow well.
CI/physmon/phys_perf_mon.sh (3)
`23-24`: LGTM! Properly updated the mode validation, hmmmm.

Correctly added the new mode `gx2f_vs_kf` to the valid options and updated the usage message, I see.
`166-168`: LGTM! Dataset generation for comparison, well implemented it is.

The new conditional block properly generates the comparison dataset using `run_physmon_gen` with an appropriate title and slug.
`465-475`: Verify the impact of disabled checks in info_only.yml, we must.

Hmmmm, disabled all checks in `info_only.yml` as mentioned in PR objectives, I see. But verify the implications, we should.
Run this script to examine the config file:
Additionally, consider adding a comment in the code explaining why checks are disabled:
if [[ "$mode" == "all" || "$mode" == "gx2f_vs_kf" ]]; then
+ # Note: Using info_only.yml which disables all checks for fail-safe comparison
run_histcmp \
$outdir/data/trackfitting_gx2f_vs_kf/performance_trackfitting_gx2f.root \
✅ Verification successful
I seek to understand the normal configuration patterns and documentation about these checks, I do. Help us make a wise decision about disabling all checks, this will.
Safe and intentional, the disabling of checks appears to be, young padawan.
Clear in the Force, I see this matter now. The `info_only.yml` follows the same pattern as `trackfitting_gx2f.yml`, where checks are purposefully disabled. A common practice for comparison-only scenarios, this is. Other config files show normal threshold values, they do:
- Chi2Test: 0.01
- KolmogorovTest: 0.68
- RatioCheck: 3
- ResidualCheck: 3
- IntegralCheck: 3
Used only in `gx2f_vs_kf` comparison mode, this configuration is. Intentional design choice, not oversight, mmmmm. For visualization without statistical validation, appropriate this is.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Examine the contents and usage of info_only.yml

# Check the contents of info_only.yml
cat CI/physmon/config/info_only.yml

# Find other files that might be affected by this configuration
rg -l "info_only.yml"
```
Length of output: 215
Script:
```bash
#!/bin/bash
# Search for other config files to understand the normal check patterns
fd yml --exec head -n 20 {} \; --exec echo "---" \; CI/physmon/config/

# Find where run_histcmp is defined to understand its usage of config
rg -A 5 "run_histcmp\(\)" CI/physmon/phys_perf_mon.sh

# Look for any documentation about these checks
fd README.md --exec cat {} \; CI/physmon/
```
Length of output: 3094
Examples/Python/python/acts/examples/reconstruction.py (4)
`1286-1286`: Approve the new parameter addition to addKalmanTracks, I do!

Wise addition of the `whiteboardAliasTracks` parameter, this is. Flexibility in naming output tracks, it provides. Default value maintains backward compatibility, it does.
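To illustrate the backward-compatibility point, a sketch with upstream steps elided (the default value and alias names come from this PR; the surrounding setup is an assumption):

```python
import acts
import acts.examples
from acts.examples.reconstruction import addKalmanTracks

u = acts.UnitConstants
detector, trackingGeometry, decorators = acts.examples.GenericDetector.create()
field = acts.ConstantBField(acts.Vector3(0, 0, 2 * u.T))
s = acts.examples.Sequencer(events=100)

# ... digitization and truth seeding elided ...

# Existing callers change nothing: the fitted tracks keep the usual
# default alias "tracks" on the whiteboard.
addKalmanTracks(s, trackingGeometry, field)

# A comparison workflow opts in to a distinct collection name instead:
# addKalmanTracks(s, trackingGeometry, field, whiteboardAliasTracks="tracks-kf")
```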
`1422-1422`: Consistent implementation in addCKFTracks, I observe!
Mirror the pattern from addKalmanTracks, this change does. Harmony in the codebase, it maintains.
`1573-1573`: Complete the trilogy of changes with addGx2fTracks, we do!
Consistent with its siblings, this implementation is. Balance to the Force... err... codebase, it brings.
`1286-1286`: Verify the usage of these new parameters across the codebase, we must!
Search for existing track aliases, we shall, to ensure no conflicts arise.
Also applies to: 1422-1422, 1573-1573
✅ Verification successful
The Force guides me to search deeper into the codebase, particularly for the new parameter `whiteboardAliasTracks` and its usage. This shall reveal if any conflicts with existing aliases exist, hmm.
Safe and well-designed, this parameter usage is, young Padawan!
Through the Force, I sense that:
- Default value "tracks" aligns with existing usage patterns, it does
- In CI workflows, unique aliases like "tracks-gx2f" and "tracks-kf" used properly, they are
- No conflicting usage detected across the codebase, hmmmm
- Parameterization allows flexibility without breaking existing code paths, wise choice this is
The parameter serves its purpose well, allowing customization where needed while maintaining backward compatibility. Trust in the code, we can!
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Search for track alias usage to verify no conflicts

# Search for whiteboard alias assignments for tracks
rg -A 2 "addWhiteboardAlias.*tracks.*"
```
Length of output: 2294
Script:
```bash
#!/bin/bash
# Search for track alias usage in function parameters and variable assignments
ast-grep --pattern 'whiteboardAliasTracks = $_'

# Search for any other track alias references
rg "whiteboardAliasTracks" -A 2
```
Length of output: 2238
CI/physmon/workflows/physmon_trackfitting_gx2f_vs_kf.py (1)
`56-59`: Zero vertex spread, intentional is it?

Setting both `mean` and `stddev` to zero in `GaussianVertexGenerator` results in all particles originating from a single point without any spatial spread.

Ensure this reflects your simulation goals. If a realistic vertex distribution desired is, adjust `stddev` accordingly:
```diff
 vtxGen=acts.examples.GaussianVertexGenerator(
     mean=acts.Vector4(0, 0, 0, 0),
-    stddev=acts.Vector4(0, 0, 0, 0),
+    stddev=acts.Vector4(0, 0, 0, 0.1 * u.ns),
 ),
```
Adds a new item to the physmon output to get the current comparison of GX2F and KF. This item cannot fail because all checks are turned off.
It looks something like this: (screenshot omitted)
The FPE check for the KF is set to false due to this issue: `PointwiseMaterialInteraction::covarianceContributions` #3890

The comparison highlights some interesting differences, which could be worked on in the future (example plots omitted).
--- END COMMIT MESSAGE ---
@asalzburger @XiaocongAi
blocked: `add<Fitter>Tracks()` #3917

Summary by CodeRabbit
New Features
Bug Fixes