
Use the NNPDF data instead of yamldb #159

Merged
14 commits merged on Mar 26, 2024
4 changes: 2 additions & 2 deletions .github/workflows/bench.yml
@@ -24,11 +24,11 @@ jobs:
virtualenvs-create: false
installer-parallel: true
- name: Install dependencies
run: poetry install --no-interaction --no-root --with test ${{ inputs.poetry-extras }}
run: poetry install --no-interaction --no-root --with test -E nnpdf
- name: Install project
# it is required to repeat extras, otherwise they will be removed from
# the environment
run: poetry install --no-interaction ${{ inputs.poetry-extras }}
run: poetry install --no-interaction -E nnpdf
- name: Install task runner
run: pip install poethepoet
- name: Lint with pylint
3 changes: 2 additions & 1 deletion .github/workflows/unittests.yml
@@ -6,8 +6,9 @@ jobs:
test:
strategy:
matrix:
python-version: ["3.8", "3.9", "3.10"]
python-version: ["3.9", "3.10", "3.11", "3.12"]

uses: NNPDF/workflows/.github/workflows/python-poetry-tests.yml@v2
with:
python-version: ${{ matrix.python-version }}
poetry-extras: "-E nnpdf"
30 changes: 22 additions & 8 deletions docs/source/overview/prerequisites.rst
@@ -12,9 +12,11 @@ This is a standard example:

::

[general]
nnpdf = true

[paths]
# inputs
ymldb = "data/yamldb"
grids = "data/grids"
theory_cards = "data/theory_cards"
operator_card_template_name = "_template.yaml"
@@ -30,10 +32,24 @@ This is a standard example:

All the relevant inputs are described below. The command ``pineko scaffold new`` will generate all necessary folders.

*ymldb*
-------
nnpdf
-----
The ``nnpdf`` key tells ``pineko`` to use the NNPDF data files to map datasets to FK tables.
If this key is set, a valid installation of ``nnpdf`` needs to be available as well,
i.e. ``pineko`` should be installed with the ``nnpdf`` extra (``pip install pineko[nnpdf]``).

Alternatively, it is possible to leave this key unset (or set it to ``false``) and instead
provide a path to a folder of ``yaml`` files containing the dataset-to-FK-table mapping.
If such a custom database of mappings is to be used, the path to the folder containing
these files needs to be given explicitly:

::

You need all files of the *ymldb* [2]_ which define the mapping from datasets to FK tables.
[paths]
ymldb = "data/yamldb"

These ``yaml`` files (which should be named ``<dataset>.yaml``)
define the mapping from a dataset to its FK tables.
An actual (rather simple) example is the following:

::
@@ -51,7 +67,7 @@ can be used (for instance ``ratio``).
Theory Runcards
---------------

You need to provide the necessary theory runcards named with their respective theory ID inside the *paths.theory_cards* folder [3]_.
You need to provide the necessary theory runcards named with their respective theory ID inside the *paths.theory_cards* folder [1]_.
For more details about theory runcards you can look at https://eko.readthedocs.io/en/latest/code/IO.html under **Theory Runcards**.

Default Operator Card
@@ -122,6 +138,4 @@ There are typically two ways to obtain grids:
Notes
-----

.. [2] this is to be replaced by the new CommonData format implemented by NNPDF

.. [3] this is to be replaced by a binding to the NNPDF theory objects
.. [1] this is to be replaced by a binding to the NNPDF theory objects
3,199 changes: 2,295 additions & 904 deletions poetry.lock

Large diffs are not rendered by default.

16 changes: 10 additions & 6 deletions pyproject.toml
@@ -11,6 +11,7 @@ authors = [
"Alessandro Candido <[email protected]>",
"Andrea Barontini <[email protected]>",
"Felix Hekhorn <[email protected]>",
"Juan Cruz-Martinez <[email protected]>",
]
classifiers = [
"Programming Language :: Python",
@@ -23,15 +24,17 @@ repository = "https://github.com/N3PDF/pineko"
packages = [{ include = "pineko", from = "src" }]

[tool.poetry.dependencies]
python = ">=3.8,<3.12"
eko = "^0.14.0"
pineappl = "^0.6.2"
python = ">=3.9,<3.13"
eko = "^0.14.2"
pineappl = ">0.7.0"
PyYAML = "^6.0"
numpy = "^1.21.0"
pandas = "^1.4.1"
pandas = "^2.1"
rich = "^12.5.1"
click = "^8.0.4"
tomli = "^2.0.1"
nnpdf = { git = "https://github.com/NNPDF/nnpdf", optional = true}
lhapdf-management = { version = "^0.5", optional = true }

[tool.poetry.group.docs]
optional = true
@@ -48,15 +51,16 @@ optional = true
pytest = "^7.1.3"
pytest-cov = "^4.0.0"
pytest-env = "^0.6.2"
pylint = "^2.11.1"
banana-hep = "^0.6.11"
pylint = "^3.1.0"
banana-hep = "^0.6.13"

[tool.poetry.group.dev.dependencies]
pdbpp = "^0.10.3"
ipython = "^8.0"

[tool.poetry.extras]
docs = ["sphinx", "sphinx-rtd-theme", "sphinxcontrib-bibtex"]
nnpdf = ["nnpdf", "lhapdf-management"]

[tool.poetry.scripts]
pineko = "pineko:command"
18 changes: 16 additions & 2 deletions src/pineko/configs.py
@@ -13,7 +13,6 @@
"Holds loaded configurations"

NEEDED_KEYS = [
"ymldb",
"operator_cards",
"grids",
"operator_card_template_name",
@@ -23,6 +22,7 @@
]

NEEDED_FILES = ["operator_card_template_name"]
GENERIC_OPTIONS = "general"


def defaults(base_configs):
@@ -59,8 +59,22 @@ def enhance_paths(configs_):
configs_ : dict
configuration
"""
required_keys = list(NEEDED_KEYS)
# Check that one of nnpdf / ymldb is given
generic_options = configs_.get(GENERIC_OPTIONS, {})
if generic_options.get("nnpdf", False):
# Fail as soon as possible
try:
import validphys
except ModuleNotFoundError:
raise ModuleNotFoundError(
"Cannot use `nnpdf=True` without a valid installation of NNPDF"
)
else:
required_keys.append("ymldb")

# required keys without default
for key in NEEDED_KEYS:
for key in required_keys:
if key not in configs_["paths"]:
raise ValueError(f"Configuration is missing a 'paths.{key}' key")
if key in NEEDED_FILES:
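For reference, the check added to ``enhance_paths`` in ``configs.py`` above accepts either of two configuration shapes. A minimal sketch (the path values are placeholders and the other required ``paths`` keys are omitted)::

    # With the NNPDF data files enabled, "ymldb" is no longer a required path.
    configs_nnpdf = {
        "general": {"nnpdf": True},
        "paths": {"grids": "data/grids"},  # plus the remaining NEEDED_KEYS
    }

    # Without it, the "ymldb" folder must still be given explicitly.
    configs_ymldb = {
        "paths": {"ymldb": "data/yamldb", "grids": "data/grids"},
    }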
32 changes: 28 additions & 4 deletions src/pineko/fonll.py
@@ -12,6 +12,7 @@
import yaml

from . import configs, parser, theory_card
from .utils import read_grids_from_nnpdf

logger = logging.getLogger(__name__)

@@ -44,6 +45,20 @@ class InconsistentInputsError(Exception):
"""Raised if the inputs are not consistent with FONLL."""


def _json_theory_read(theory_info):
"""Read the theory info from a pineappl grid.

Theory information within pineappl grids might not be stored in a consistent,
JSON-compatible way. This function corrects some of the problems that have been found.
"""
try:
ret = json.loads(theory_info)
except json.decoder.JSONDecodeError:
# Some grids have the theories saved with incompatible formats
ret = json.loads(theory_info.replace("'", '"').replace("True", "true"))
return ret
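As a rough illustration of the problem this helper works around (the theory string below is made up for the example), an entry stored with Python-style quoting is rejected by ``json.loads`` until quotes and booleans are normalized::

    import json

    raw = "{'PTO': 2, 'IC': True, 'FNS': 'FONLL-C'}"  # repr-style, not valid JSON
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        fixed = raw.replace("'", '"').replace("True", "true")
        print(json.loads(fixed))  # {'PTO': 2, 'IC': True, 'FNS': 'FONLL-C'}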


class FONLLInfo:
"""Class containing all the information for FONLL predictions."""

@@ -97,7 +112,11 @@ def dataset_name(self):
@property
def theorycard_no_fns_pto(self):
"""Return the common theory info between the different FONLL FK tables."""
theorycards = [json.loads(self.fks[p].key_values()["theory"]) for p in self.fks]
theorycards = []
for pinefk in self.fks.values():
thinfo = pinefk.key_values()["theory"]
theorycards.append(_json_theory_read(thinfo))

# Only these should differ
for card in theorycards:
del card["FNS"]
@@ -122,11 +141,12 @@ def Q2grid(self):
def update_fk_theorycard(combined_fk, input_theorycard_path):
"""Update theorycard entries for the combined FK table.

Update by reading the yamldb of the original theory.
Update by reading the yaml file of the original theory.
"""
with open(input_theorycard_path, encoding="utf-8") as f:
final_theorycard = yaml.safe_load(f)
theorycard = json.loads(combined_fk.key_values()["theory"])

theorycard = _json_theory_read(combined_fk.key_values()["theory"])
theorycard["FNS"] = final_theorycard["FNS"]
theorycard["PTO"] = final_theorycard["PTO"]
theorycard["NfFF"] = final_theorycard["NfFF"]
@@ -228,8 +248,12 @@ def assembly_combined_fk(
else:
tcard["DAMPPOWERb"] = 0
tcard["DAMPPOWERc"] = 0

# Getting the paths to the grids
grids_name = grids_names(configs.configs["paths"]["ymldb"] / f"{dataset}.yaml")
grids_name = read_grids_from_nnpdf(dataset, configs.configs)
if grids_name is None:
grids_name = grids_names(configs.configs["paths"]["ymldb"] / f"{dataset}.yaml")

for grid in grids_name:
# Checking if it already exists
new_fk_path = configs.configs["paths"]["fktables"] / str(theoryid) / grid
19 changes: 13 additions & 6 deletions src/pineko/theory.py
@@ -18,6 +18,7 @@
from eko.runner.managed import solve

from . import check, configs, evolve, parser, scale_variations, theory_card
from .utils import read_grids_from_nnpdf

logger = logging.getLogger(__name__)

@@ -116,12 +117,18 @@ def load_grids(self, ds):
grids : dict
mapping basename to path
"""
paths = configs.configs["paths"]
_info, grids = parser.get_yaml_information(
paths["ymldb"] / f"{ds}.yaml", self.grids_path()
)
# the list is still nested, so flatten
grids = [grid for opgrids in grids for grid in opgrids]
# Take fktable information from NNPDF
raw_grids = read_grids_from_nnpdf(ds, configs.configs)
if raw_grids is not None:
grids = [self.grids_path() / i for i in raw_grids]
else:
paths = configs.configs["paths"]
_info, raw_grids = parser.get_yaml_information(
paths["ymldb"] / f"{ds}.yaml", self.grids_path()
)
# the list is still nested, so flatten
grids = [grid for opgrids in raw_grids for grid in opgrids]

# then turn into a map name -> path
grids = {grid.stem.rsplit(".", 1)[0]: grid for grid in grids}
return grids
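The ``rsplit`` above strips the double ``.pineappl.lz4`` extension down to the bare grid name. A small sketch, using a hypothetical theory ID and grid file::

    from pathlib import Path

    grid = Path("data/grids/40009000/ATLAS_WP_7TEV_46FB.pineappl.lz4")  # hypothetical
    # Path.stem only removes the last suffix, hence the extra rsplit.
    name = grid.stem.rsplit(".", 1)[0]
    assert name == "ATLAS_WP_7TEV_46FB"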
35 changes: 35 additions & 0 deletions src/pineko/utils.py
@@ -0,0 +1,35 @@
"""Shared utilities for pineko.

Common tools typically used by several pineko functions.
"""

from .configs import GENERIC_OPTIONS


def read_grids_from_nnpdf(dataset_name, configs=None):
    """Read the list of fktables given a dataset name.

    Returns None if the NNPDF data is not enabled in the configuration.

    Parameters
    ----------
    dataset_name: str
        name of the dataset
    configs: dict
        dictionary of configuration options;
        if None, it is assumed that the NNPDF data is required
    """
    if configs is not None:
        if not configs.get(GENERIC_OPTIONS, {}).get("nnpdf", False):
            return None

    # Import NNPDF only if we really want it!
    from nnpdf_data import legacy_to_new_map
    from validphys.commondataparser import EXT
    from validphys.loader import Loader

    # We only need the metadata, so this should be enough
    dataset_name, variant = legacy_to_new_map(dataset_name)
    cd = Loader().check_commondata(dataset_name, variant=variant)
    fks = cd.metadata.theory.FK_tables
    # Return it flat
    return [f"{i}.{EXT}" for operand in fks for i in operand]
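A small usage sketch (the configuration dict is a minimal stand-in for the loaded pineko configuration): when the ``general.nnpdf`` flag is off, the helper returns ``None``, which is what lets ``fonll.py`` and ``theory.py`` above fall back to the ``ymldb`` files::

    from pineko.utils import read_grids_from_nnpdf

    no_nnpdf = {"general": {"nnpdf": False}}
    assert read_grids_from_nnpdf("ATLAS_DY_7TEV_46FB_CC", configs=no_nnpdf) is None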
16 changes: 16 additions & 0 deletions tests/test_utils.py
@@ -0,0 +1,16 @@
import pytest

from pineko import utils


@pytest.mark.parametrize(
    "dsname", ["HERA_NC_318GEV_EAVG_CHARM-SIGMARED", "ATLAS_DY_7TEV_46FB_CC"]
)
def test_nnpdf_grids(dsname):
    """Check that the grids can be read from the NNPDF theory metadata."""
    grids = utils.read_grids_from_nnpdf(dsname)
    # Check that we get _something_
    assert len(grids) > 0
    # And that they look like pineappl grids
    for grid_name in grids:
        assert grid_name.endswith(".pineappl.lz4")