
Configuration database(s) #3

Closed
felixhekhorn opened this issue Dec 3, 2020 · 13 comments · Fixed by #12

@felixhekhorn
Contributor

Current status

  • 1 theory database, out of which a single record is selected for a fit
  • the record tentatively contains all the information needed for both evolution and DIS
  • (this is of course mainly due to historical reasons)

Problems

  • the FNS setting should actually only matter for cross sections, but instead it is also (ab-)used for evolution
    • FONLL is not involved in evolution, and its threshold plays no role there
    • instead, a dedicated threshold for cross sections is needed, which can be chosen freely and independently of the evolution thresholds
  • some of the settings are redundant: M_Z, M_W, sin(theta_w), G_F and alphaqed are not mutually independent (see the sketch after this list)
  • in principle the card could be divided into two parts (evolution <-> cross sections) with a few shared settings
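
As an illustration of the redundancy mentioned above (a rough sketch; the numerical values are placeholders and the Gmu-scheme relations are just one possible convention, not necessarily the one used in the theory database):

```python
import math

# Tree-level Gmu-scheme relations, shown only to illustrate that
# (M_Z, M_W, sin^2(theta_w), G_F, alpha_qed) cannot all be set independently.
# The values below are placeholders, not the ones in the theory database.
MZ = 91.1876        # GeV
MW = 80.379         # GeV
GF = 1.1663787e-5   # GeV^-2

sin2_theta_w = 1.0 - MW**2 / MZ**2
alpha_qed = math.sqrt(2.0) * GF * MW**2 * sin2_theta_w / math.pi

print(f"derived sin^2(theta_w) = {sin2_theta_w:.5f}")
print(f"derived alpha_qed      = {alpha_qed:.6f} (~ 1/{1 / alpha_qed:.1f})")
```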

Configurations ("o-cards")

  • both eko and yadism will be shipped without defaults (in marked contrast to APFEL)
  • both eko and yadism require some additional configuration:
    • in eko, the "operators": the target scale of the operators, the discretization and some numerical details
    • in yadism, the "observables": the discretization, the DIS configuration (currents, hadron, lepton) and the target functions, e.g. F2total(x=0.1, Q2=90)
  • we need a mapping from the old settings to the new settings (which both programs already use at this point)
    • our current implementation of this remapping is given here
    • the current status of eko already ignores the FNS setting; instead it has to be fed with the correct kcThr
    • yadism has to be fed with the correct kDIScThr (the name can of course be changed to kxscThr or similar) to determine the thresholds, and with FNS to determine the (re-)combination of heavy/light coefficient functions
    • in order to implement FONLL correctly (to our understanding) we need to deactivate the charm threshold in eko, but not in yadism (see the sketch after this list)
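
Below is a minimal sketch of the kind of remapping meant above; it is not the linked implementation, and apart from FNS, kcThr and kDIScThr every key name (and the way the charm threshold is "deactivated") is an assumption for illustration only:

```python
def split_theory_card(old: dict) -> tuple[dict, dict]:
    """Split an old-style theory record into eko- and yadism-facing settings (sketch)."""
    eko_settings = {
        # evolution only cares about the evolution thresholds, not about FNS
        "kcThr": old["kcThr"],
        "kbThr": old["kbThr"],
        "ktThr": old["ktThr"],
    }
    yadism_settings = {
        # FNS drives the (re-)combination of heavy/light coefficient functions
        "FNS": old["FNS"],
        # dedicated cross-section threshold, independent of the evolution ones
        "kDIScThr": old.get("kDIScThr", old["kcThr"]),
    }
    if old["FNS"].startswith("FONLL"):
        # assumption: "deactivating" the charm threshold in eko is modelled here by
        # pushing it below any scale of interest; yadism keeps its own threshold
        eko_settings["kcThr"] = 0.0
    return eko_settings, yadism_settings
```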

Proposed Workflow

  • pineko should determine a consistent configuration for both eko and yadism
  • ask each of them in turn to compute their ingredients
  • join their respective outputs to provide what is needed: a mapping f_j(x, Q_0) -> theory prediction (a schematic sketch follows below)
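
A schematic of this workflow; every function below is a placeholder stub, not an actual eko/yadism/pineko call, and split_theory_card refers to the hypothetical sketch above:

```python
def run_eko(eko_settings):
    """Placeholder: would return the evolution operator E(Q^2 <- Q_0^2)."""
    return {"operator": None, "settings": eko_settings}

def run_yadism(yadism_settings, observable_card):
    """Placeholder: would return the grid of DIS coefficient functions."""
    return {"grid": None, "settings": yadism_settings, "observables": observable_card}

def convolve(grid, operator):
    """Placeholder: would join grid and operator into an object in terms of f_j(x, Q_0)."""
    return {"fktable": (grid, operator)}

def compute_predictions(theory_card, observable_card):
    # 1. pineko determines a consistent configuration for both programs
    eko_settings, yadism_settings = split_theory_card(theory_card)
    # 2. ask each program in turn for its ingredient
    operator = run_eko(eko_settings)
    grid = run_yadism(yadism_settings, observable_card)
    # 3. join them: the result maps f_j(x, Q_0) -> theory prediction
    return convolve(grid, operator)
```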

Questions

  • how are the "observables" currently determined?
  • how can we organize and maintain the configurations?
  • how can we ensure the consistency between (experimental) dataset and theory?
@cschwan
Contributor

cschwan commented Dec 3, 2020

  • How do you choose your independent parameters in DIS?
  • I think the answer to your question about how the observables are chosen is that this is done by the experiments (obviously), and we then decide which ones we consider for the fit, of course. For the collider processes the corresponding observables are stored in APPLgrids or fastNLO tables and collected in the https://github.com/NNPDF/applgrids repository, but I think the procedure is quite different for fixed-target experiments and DIS. In any case, you can look at the experimental data in https://github.com/NNPDF/nnpdf/tree/master/buildmaster/rawdata, which also contains DIS datasets; from this you should be able to reverse-engineer them. In https://github.com/NNPDF/nnpdf/tree/master/buildmaster/filters you can look into the source files to find the corresponding papers. The apfelcomb database, https://github.com/NNPDF/apfelcomb/blob/master/db/apfelcomb.dat, might contain a few more important parameters (or not).
  • To answer the question of how to maintain this information, I suggest anything that is more central and easier than the above. I think a natural place is the PineAPPL grids, so I suggest storing the information in them, maybe as metadata; see also Add new subcommand to read/write/add metadata pineappl#6.
  • For your last question I have an idea, which is basically to automate the entire toolchain. Let's say I change MW in the theory database; then the toolchain should recalculate all the PineAPPL grids by running mg5_aMC, yadism, and so on, then run eko to convert the PineAPPL grids to a single scale, and finally generate the FK tables. Ideally there is some dependency management that detects which things are sensitive to which changes, and a caching system that maps the run cards for the PineAPPL grids together with all the input parameters to a hash, so we don't always recompute everything (a small sketch of such a cache key follows below). Or we simply follow the pragmatic physicist's approach, which is: be strong and don't screw up ;).
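
A minimal sketch of such a cache key, using only the standard library (the layout of the run card and of the parameter dictionary is assumed):

```python
import hashlib
import json

def cache_key(runcard: dict, parameters: dict) -> str:
    """Hash a run card together with all input parameters into a cache key.

    Assumes both are plain, JSON-serializable dictionaries; any change in either
    one yields a different key, so unchanged grids can be reused from the cache.
    """
    payload = json.dumps({"runcard": runcard, "parameters": parameters}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# e.g. changing MW in the theory parameters invalidates the cached grid
key_old = cache_key({"process": "DIS_NC"}, {"MW": 80.379})
key_new = cache_key({"process": "DIS_NC"}, {"MW": 80.398})
assert key_old != key_new
```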

@alecandido
Member

alecandido commented Dec 3, 2020

Hi Christopher, just a brief reply on two points; tomorrow we can discuss more:

  • "how the observables are chosen" in Felix's post means: where do you specify which function/settings should be called to compute the theoretical prediction for those data?
  • storing information in the PineAPPL grids is fine, but definitely not enough: if we don't have any PineAPPL grid yet, since we are going to generate them, where is the information describing what we should generate stored?

It's also fine to automate things, but in my view this is a second-order problem: the first-order one is how to run things manually in a reproducible way, i.e. keeping all the information used in the generation. Of course, it doesn't make much sense for the information on how to generate the objects to live only in the generated objects (whereas if we are talking about putting it also in them, I perfectly agree: this would have the additional benefit of being able to tell whether an object is or is not up to date with the central information store, e.g. if you have changed the MW value).

@cschwan
Contributor

cschwan commented Dec 3, 2020

I see - and I also completely agree. Naturally, the source where it all comes from is a repository containing all the runcards, so something like https://github.com/scarrazza/applgridew. Thinking this idea further, we have two parts: process-dependent information, like cuts and so on, and process-independent information, like parameter values. The former should be in runcards, and the latter in a database that defines the theory.

@alecandido
Member

alecandido commented Dec 4, 2020

OK, I'm glad that an - at least partial - solution already exists, i.e. since you had to generate the hadronic processes you already had to think about this.

Still, there are three main points:

  1. what you call the theory database, the process-independent information, may indeed be process independent, but it is still almost partitioned between processes and evolution, which deserve almost orthogonal specs
    • nevertheless we can decide to keep everything together all the same; but if so, and if we're going to use a (let's say relational) database, I'd at least keep the partition in different tables, plus a third one that selects one record from each table and allows for annotations (so that each fit can still use a unique ID, and no additional - even slightly more - structure)
  2. I had a look at applgridew, and I saw that what we need is essentially a launch.txt file, while for generating hadronic processes you basically need a further one (output.txt), because madgraph deserves its own language; yadism instead is purely declarative: just theory parameters and one runcard
    • of course I'm not suggesting to change madgraph, but simply that we should account for the two different settings, and I'd like to use different names for yadism, because (apart from "stems", which sounds a bit weird to me, even if I know it comes from madgraph, but it doesn't matter) we would like to use yaml syntax (or in general a proper data serialization format, whatever) and not madgraph's
  3. we also need a launch.txt-like file for eko - I don't know where we'd like to put it - for generating the evolution-specific runcards (not the theory section, but the actual runcards containing VFNS thresholds and so on); at generation time it should be validated against the runcard of the PineAPPL grid you'd like to join: at this point eko is really flexible, but in turn not all combinations are guaranteed to be physical (so essentially every time you make the join you have to check a number of constraints telling you that the thing makes sense; a toy validation sketch follows below)
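
A toy version of the kind of constraint check meant in point 3; all key names, the metadata layout and the specific constraints are assumptions, not eko's or PineAPPL's actual interfaces:

```python
def validate_join(eko_card: dict, grid_meta: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the join makes sense."""
    problems = []
    # the evolution target scale must match the scale the grid was produced at
    if eko_card["Q2"] != grid_meta["Q2"]:
        problems.append("eko target Q2 differs from the grid Q2")
    # heavy-quark parameters shared by both sides must agree
    for quark in ("mc", "mb", "mt"):
        if abs(eko_card[quark] - grid_meta[quark]) > 1e-6:
            problems.append(f"{quark} differs between eko and the grid")
    # the perturbative order of the evolution should not be below that of the grid
    if eko_card["PTO"] < grid_meta["PTO"]:
        problems.append("evolution order is lower than the grid order")
    return problems
```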

@cschwan
Contributor

cschwan commented Dec 4, 2020

Can we maybe talk about this later after the phone conference?

@felixhekhorn
Contributor Author

Ideally there's some dependency management that detects which things are sensitive to which changes

that's why splitting is a good idea

Can we maybe talk about this later after the phone conference?

I think we first should agree on a proposal among ourselves before going "public" to the collaboration ...

@cschwan
Contributor

cschwan commented Dec 4, 2020

What I meant is can we (you, Alessandro, I and possibly Juan and Stefano) talk about it after the phone conference, that is in a meeting after the PC!

@felixhekhorn
Contributor Author

Sorry I missed the important word "after" ;-)

fine by me - after having lunch after the PC though ;-)

@scarlehoff
Member

scarlehoff commented Dec 4, 2020

I would also like to have lunch, but I have no real opinion on how the databases should behave so feel free to do it without me.

My only comment would be on:

To answer the question of how to maintain this information I suggest anything that's more central and easier than the above. I think a natural place are the PineAPPL grids, so I suggest storing the information in them, maybe as metadata; see also NNPDF/pineappl#6.

and

storing information in the PineAPPL grids is fine, but definitely not enough: if we don't have any PineAPPL grid yet, since we are going to generate them, where is the information describing what we should generate stored?
It's also fine to automate things, but in my view this is a second-order problem: the first-order one is how to run things manually in a reproducible way, i.e. keeping all the information used in the generation.

I agree on both accounts, especially with the last point.
I think the natural place for the metadata is the PineAPPL grid. This metadata could, for instance, take the form of a yaml file, so that taken "by itself" it is just the runcard(s) with which the grid was created, plus maybe version information. I think this covers all points?

It's what vp already does: it dumps all the user input but also any defaults that were not set, so that any output is reproducible if run with the same version of the code.

Of course, in this case one would dump the runcards for yadism, eko, whatever Monte Carlo (sherpa, madgraph or anything else), etc. (a minimal sketch follows below)
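
A minimal sketch of that idea, assuming PyYAML and hypothetical default/user dictionaries (this is not how vp or PineAPPL actually handle metadata):

```python
import yaml  # PyYAML

# hypothetical defaults and user input; in practice these would come from the
# generating program (yadism, eko, the Monte Carlo, ...), not be hard-coded
DEFAULTS = {"PTO": 1, "interpolation_points": 50}
user_runcard = {"PTO": 2}

# user settings override the defaults; unset defaults are dumped explicitly,
# together with the code version, so the output is reproducible
full_runcard = {**DEFAULTS, **user_runcard, "code_version": "x.y.z"}

# this YAML document is what one would attach to the grid as metadata
print(yaml.safe_dump(full_runcard, sort_keys=True))
```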

Ideally there's some dependency management that detects which things are sensitive to which changes

We will end up re-implementing reportengine :P

@scarrazza
Member

Sorry, I have to teach after the PC. I think you could write a specification for the database too, so we can discuss in more detail.

@alecandido
Member

alecandido commented Dec 4, 2020

Can we maybe talk about this later after the phone conference?

Later is perfectly fine; right after lunch I would be in trouble, since I have to meet a student. But we should finish no later than 15:00.

Ideally there's some dependency management that detects which things are sensitive to which changes

We will end up re-implementing reportengine :P

I believe that pineko should really be kept minimal, so it should be just a combiner. If we also have to implement a dependency manager, it should really be kept separate from the combiner itself, i.e. the same repo is still fine, but not the same package. If one is already implemented (like reportengine), let's reuse it; if possible we will improve that one, reducing the entropy of the NNPDF code.

@cschwan
Contributor

cschwan commented Dec 4, 2020

Can we maybe talk about this later after the phone conference?

Later is perfectly fine; right after lunch I would be in trouble, since I have to meet a student. But we should finish no later than 15:00.

Please send an email with a link where we can meet!

@felixhekhorn felixhekhorn linked a pull request Mar 24, 2022 that will close this issue
@felixhekhorn
Contributor Author

  • the theory input scheme rework will not take place here (it might instead be discussed at the Code Meeting 2022)
  • the configuration for yadism is a concern for https://github.com/NNPDF/runcards
  • pineko does not deal with configurations, but takes them as input
  • the various paths to these inputs will instead be implemented in Move permanent part of fkutils here #12
