
Fixed target DIS DW datasets as variant #2176

Merged: 18 commits into master from fixed_target_dis_dw_variants, Oct 24, 2024

Conversation

giacomomagni
Contributor

Move the DW fixed target DIS to be a variant.

@giacomomagni giacomomagni linked an issue Oct 16, 2024 that may be closed by this pull request
@giacomomagni giacomomagni marked this pull request as draft October 16, 2024 15:53
@giacomomagni
Contributor Author

giacomomagni commented Oct 16, 2024

I noticed that:

  • BCDMS DW had different FKtables, split by Q2. Is that because we forgot to update
    the standard BCDMS sets?

These two fields also differ between the DW data and the standard one:

  • nnpdf_metadata.experiment
  • observable_name.process_type

Can we add them to the variant options?
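For illustration, the kind of override being asked about could be sketched as a shallow merge of variant fields over the base metadata. This is a hypothetical simplification: `apply_variant`, the field names, and the values below are illustrative, not the actual nnpdf_data schema.

```python
# Hypothetical sketch: apply a variant as a set of field overrides on top of
# the base metadata. Names and values are illustrative, not the real schema.
def apply_variant(metadata: dict, variants: dict, name: str) -> dict:
    patched = dict(metadata)  # shallow copy; the base metadata is untouched
    patched.update(variants.get(name, {}))
    return patched

base = {
    "experiment": "DEUTERON",
    "process_type": "DIS_NCE",
    "data_uncertainties": ["uncertainties.yaml"],
}
variants = {
    "legacy_dw": {
        "experiment": "NUCLEAR",
        "data_uncertainties": ["uncertainties_dw.yaml"],
    }
}

patched = apply_variant(base, variants, "legacy_dw")
print(patched["experiment"])  # NUCLEAR: the variant override wins
```

An unknown variant name simply leaves the base metadata unchanged, which is the behaviour one would want for datasets without a DW variant.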

@RoyStegeman
Member

I'm not sure if this is what you mean, but BCDMS has a different FKtable for each beam energy, because there is a degeneracy in (Q2, x) between them, so pineappl cannot distinguish measurements of a given kinematic point at different energies.
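As a toy illustration of that degeneracy (not code from the repo): in fixed-target DIS the inelasticity is approximately y = Q2 / (2 m_N E x), so it depends on the beam energy E, and runs at different energies can land on the same (x, Q2) point while being physically distinct measurements. The beam energies and kinematic point below are merely representative values.

```python
# Toy check: the same (x, Q2) point corresponds to a different inelasticity y
# at each beam energy, so measurements coinciding in the (x, Q2) plane used by
# the FKtable grids are still physically distinct.
M_N = 0.938  # nucleon mass in GeV (approximate)

def inelasticity(x: float, q2: float, e_beam: float) -> float:
    """Fixed-target approximation y = Q2 / (2 * m_N * E * x)."""
    return q2 / (2.0 * M_N * e_beam * x)

x, q2 = 0.35, 20.0  # one kinematic point shared by all beam energies
for e_beam in (100.0, 200.0, 280.0):  # representative muon beam energies in GeV
    print(f"E = {e_beam} GeV -> y = {inelasticity(x, q2, e_beam):.3f}")
```

Since y scales as 1/E, the three printouts differ, which is exactly the information lost once events are binned only in (x, Q2).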

@giacomomagni
Contributor Author

> I'm not sure if this is what you mean, but BCDMS has a different FKtable for each beam energy, because there is a degeneracy in (Q2, x) between them, so pineappl cannot distinguish measurements of a given kinematic point at different energies.

I've noticed that this has not been updated, because we usually use the DW version:

@RoyStegeman
Member

Ah, that's your point. Yes, you're right. I think you can just update the FKtables in the metadata, because the difference is just in the uncertainties yaml. I would assume that in new theories the old FKtables don't even exist.

@scarlehoff
Member

> nnpdf_metadata.experiment

Nope... this became a problem quite quickly @RoyStegeman

I'd say use the nuclear/deuteron type, which should be more inclusive in what correlations are allowed (right @enocera?)

> observable_name.process_type

Here, even if they used to be different, they can probably be made into the same one because the data is the same.

@giacomomagni
Contributor Author

giacomomagni commented Oct 17, 2024

The two remaining tests, test_pseudodata and test_overfit_metric, are failing because of the change in nnpdf_metadata.experiment, I suspect (i.e. reverting the metadata makes them pass), so we need a better solution there.

@scarlehoff
Member

> are failing because of the change in nnpdf_metadata.experiment

Because that's making the random numbers change? (The seed takes the experiment name.)

If so... I don't really see a way of fixing it without having two separate datasets. But if it is only the random numbers, we can accept that it changed (this tag will not be random-seed-equivalent with the previous one anyway), although maybe we will need to redo some of the fits in the tests.
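The mechanism under discussion can be sketched as follows. This is a hypothetical simplification, assuming the per-dataset pseudodata seed is derived from the experiment name; `seed_from_name` and `pseudodata_noise` are made-up helpers, not n3fit code.

```python
import hashlib
import random

def seed_from_name(name: str) -> int:
    # Stable name -> integer seed; hashlib is used instead of hash() so the
    # result is reproducible across interpreter runs.
    return int(hashlib.sha256(name.encode()).hexdigest(), 16) % (2**32)

def pseudodata_noise(experiment: str, n: int = 5) -> list:
    # Draw the Gaussian fluctuations added to the data, seeded by the name
    rng = random.Random(seed_from_name(experiment))
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# The same name reproduces the same noise stream...
print(pseudodata_noise("BCDMS") == pseudodata_noise("BCDMS"))  # True
# ...but renaming the experiment (almost certainly) shifts the whole stream,
# which is why tests comparing pseudodata start failing after a rename.
print(pseudodata_noise("BCDMS") == pseudodata_noise("BCDMS_DW"))
```

Under that assumption, keeping the seed fixed while still renaming the experiment would require decoupling the seed input from the displayed metadata, which is exactly the tension described above.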

@giacomomagni
Contributor Author

giacomomagni commented Oct 18, 2024

So I gave adding experiment to the variant options a try;
somehow PlottingOptions still needs a deepcopy somewhere, which I'm not sure how to fix
(I tried with object.__setattr__ but it didn't work).
If a7294bd is not going in the correct direction, then I suspect we have to give up on this PR.

@scarlehoff
Member

scarlehoff commented Oct 18, 2024

> somehow PlottingOptions still needs a deepcopy somewhere, which I'm not sure how to fix

Why do you care about PlottingOptions?

... I'm wondering whether it wouldn't be better just to have them as separate datasets. The fact that two "variants" of the same dataset can come from a different experiment is... silly. We are just trying to get around a mistake of times past.

However, send me the runcard and I'll give it a go.

Member

@scarlehoff scarlehoff left a comment


Seems fine? At least at the level of the tests. Let's see whether the fitbot works fine as well.

Review comment on nnpdf_data/nnpdf_data/__init__.py (outdated, resolved)
@scarlehoff scarlehoff added the run-fit-bot Starts fit bot from a PR. label Oct 22, 2024
@giacomomagni giacomomagni marked this pull request as ready for review October 22, 2024 09:03

Greetings from your nice fit 🤖 !
I have good news for you, I just finished my tasks:

Check the report carefully, and please buy me a ☕ , or better, a GPU 😉!

@scarlehoff
Member

I think the runcards with DW are not understood correctly by vp :/

@giacomomagni
Contributor Author

giacomomagni commented Oct 22, 2024

> I think the runcards with DW are not understood correctly by vp :/

The PDFs are identical, but yes, the chi2 tables are weird 😅
Something looks wrong when reading the old filters, no?

@scarlehoff
Member

No, the problem is here: https://github.com/NNPDF/nnpdf/blob/87daeb1b25cc913fe45a6d7d8be779f80c7ebffc/validphys2/src/validphys/config.py#L465, since we already had a variant. Not sure how to get around this without adding too much DW-specific code.

The easy fix is to add:

```python
# legacy_dw trumps everything
if variant is None or map_variant == "legacy_dw":
```
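The intent of that condition could be sketched as a small precedence rule in the variant lookup. This is a hypothetical simplification of the logic in validphys/config.py, not the actual implementation:

```python
# Hypothetical sketch: the user-requested variant normally wins, but a dataset
# whose name maps to "legacy_dw" forces that variant regardless, since the DW
# name itself already encodes the choice ("legacy_dw trumps everything").
def resolve_variant(requested, map_variant):
    if requested is None or map_variant == "legacy_dw":
        return map_variant
    return requested

print(resolve_variant(None, "legacy"))         # legacy
print(resolve_variant("legacy", "legacy_dw"))  # legacy_dw
print(resolve_variant("custom", "legacy"))     # custom
```

The one special case keeps the DW behaviour without scattering DW-specific branches through the rest of the config code.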

Btw, you also need to add to this PR the fixed-target DY datasets, which also have DW variants.

@giacomomagni
Contributor Author

> ```python
> # legacy_dw trumps everything
> if variant is None or map_variant == "legacy_dw":
> ```

I see. Shall I go for it?

> Btw, you also need to add to this PR the fixed-target DY datasets, which also have DW variants.

Yes, but these are trivial, as they only have a DW legacy (like EMC), so it's just a renaming with no duplication.

@scarlehoff
Member

Yes, let's try...

@scarlehoff scarlehoff added run-fit-bot Starts fit bot from a PR. and removed run-fit-bot Starts fit bot from a PR. labels Oct 22, 2024

Greetings from your nice fit 🤖 !
I have good news for you, I just finished my tasks:

Check the report carefully, and please buy me a ☕ , or better, a GPU 😉!

@scarlehoff
Member

scarlehoff commented Oct 23, 2024

Ok, this works now in terms of vp. However, looking through the datafiles, they are actually not always the same, so you need to copy them over as well for some of the datasets.

For instance, these two in master are not equal:

https://github.com/NNPDF/nnpdf/blob/master/nnpdf_data/nnpdf_data/commondata/NUTEV_CC_NOTFIXED_FE/data_legacy_NB-SIGMARED.yaml
https://github.com/NNPDF/nnpdf/blob/master/nnpdf_data/nnpdf_data/commondata/NUTEV_CC_NOTFIXED_FE_DW/data_legacy_NB-SIGMARED.yaml

The differences left in the fitbot are coming from there, but the fitbot does not include all datasets, so you should check. These can be added to the variant, so it should just be a question of copying them over.

A quick check would be running this runcard for one replica for one epoch and then looking at the resulting .csv files for the generated pseudodata; if they are identical, it's all good: https://github.com/NNPDF/nnpdf/blob/master/n3fit/runcards/examples/nnpdf40-like.yml
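The comparison itself can be scripted. A minimal sketch, assuming the two pseudodata tables were dumped to CSV files; the file names and contents below are placeholders:

```python
import csv
import math
import tempfile
from pathlib import Path

def csv_rows(path):
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))

def tables_match(path_a, path_b, rtol=1e-9):
    """Compare two CSV dumps cell by cell: numbers with a tolerance, text exactly."""
    rows_a, rows_b = csv_rows(path_a), csv_rows(path_b)
    if len(rows_a) != len(rows_b):
        return False
    for row_a, row_b in zip(rows_a, rows_b):
        if len(row_a) != len(row_b):
            return False
        for cell_a, cell_b in zip(row_a, row_b):
            try:
                if not math.isclose(float(cell_a), float(cell_b), rel_tol=rtol):
                    return False
            except ValueError:
                if cell_a != cell_b:  # non-numeric cells must match exactly
                    return False
    return True

# Tiny self-contained demo with placeholder tables
tmp = Path(tempfile.mkdtemp())
(tmp / "before.csv").write_text("replica,data\n1,0.50000000\n")
(tmp / "after.csv").write_text("replica,data\n1,0.5\n")
print(tables_match(tmp / "before.csv", tmp / "after.csv"))  # True
```

For the real check, one would point tables_match at the pseudodata tables produced by the master run and the branch run.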

@giacomomagni
Contributor Author

> Ok, this works now in terms of vp. However, looking through the datafiles, they are actually not always the same, so you need to copy them over as well for some of the datasets.

okay let me double check this.

@giacomomagni
Contributor Author

giacomomagni commented Oct 23, 2024

Okay, so somehow only NuTeV had different central values.
Here is a small script to check:

```python
import pathlib

import numpy as np
import rich
import yaml

COMMONDATA_PATH = pathlib.Path(__file__).parent / "nnpdf_data/nnpdf_data/commondata"


def check_central(dataset, data_file):
    """Compare the central values of a dataset against its _DW counterpart."""
    with open(COMMONDATA_PATH / f"{dataset}_DW" / data_file, encoding="utf-8") as f:
        central_dw = yaml.safe_load(f)["data_central"]
    with open(COMMONDATA_PATH / dataset / data_file, encoding="utf-8") as f:
        central = yaml.safe_load(f)["data_central"]

    try:
        np.testing.assert_allclose(central_dw, central)
    except AssertionError:
        rich.print(f"[red] {dataset}_DW has a different central value")


if __name__ == "__main__":
    DATA_LIST = [
        "BCDMS_NC_NOTFIXED_D",
        "BCDMS_NC_NOTFIXED_P",
        "CHORUS_CC_NOTFIXED_PB",
        "NMC_NC_NOTFIXED",
        "NUTEV_CC_NOTFIXED_FE",
        "SLAC_NC_NOTFIXED_D",
        "SLAC_NC_NOTFIXED_P",
    ]
    for dataset in DATA_LIST:
        try:
            data_file = "data_legacy_EM-F2.yaml"
            check_central(dataset, data_file)
        except FileNotFoundError:
            data_file = "data_legacy_NB-SIGMARED.yaml"
            check_central(dataset, data_file)
            data_file = "data_legacy_NU-SIGMARED.yaml"
            check_central(dataset, data_file)
```

And here is the outcome of the test you suggested, with the NuTeV data fixed.
They look identical now.

Fit on master:
datacuts_theory_fitting_pseudodata_table.csv

Fit on this branch:
datacuts_theory_fitting_pseudodata_table.csv

@scarlehoff scarlehoff removed the run-fit-bot Starts fit bot from a PR. label Oct 23, 2024
@scarlehoff scarlehoff added the run-fit-bot Starts fit bot from a PR. label Oct 23, 2024

Greetings from your nice fit 🤖 !
I have good news for you, I just finished my tasks:

Check the report carefully, and please buy me a ☕ , or better, a GPU 😉!

@scarlehoff scarlehoff force-pushed the fixed_target_dis_dw_variants branch from 15174ca to d6ce694 Compare October 24, 2024 04:29
@giacomomagni giacomomagni merged commit 0fadd2d into master Oct 24, 2024
6 checks passed
@giacomomagni giacomomagni deleted the fixed_target_dis_dw_variants branch October 24, 2024 07:23
Labels: data toolchain, run-fit-bot (Starts fit bot from a PR)

Successfully merging this pull request may close these issues: Revisit implementation of all DIS.

3 participants