
Configuration database(s) #3

Closed
felixhekhorn opened this issue Dec 3, 2020 · 13 comments · Fixed by #12

@felixhekhorn
Contributor

Current status

  • 1 theory database, out of which a single record is selected for a fit
  • the record tentatively contains all the information needed for both evolution and DIS
  • (this is of course mainly due to historical reasons)

Problems

  • the FNS setting should actually only matter for cross sections, but instead it is also (ab-)used for evolution
    • FONLL is not involved in evolution, and its threshold plays no role there
    • instead, a dedicated threshold for cross sections is needed, which can be chosen freely and independently of the evolution thresholds
  • some of the settings are redundant: M_Z, M_W, sin(theta_w), G_F and alphaqed are not mutually independent (see the sketch after this list)
  • in principle the card could be divided into two parts (evolution <-> cross sections) with a few shared settings
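
As an illustration of the redundancy mentioned above (a rough sketch; the numerical values are placeholders and the Gmu-scheme relations are just one possible convention, not necessarily the one used in the theory database):

```python
import math

# Tree-level Gmu-scheme relations, shown only to illustrate that
# (M_Z, M_W, sin^2(theta_w), G_F, alpha_qed) cannot all be set independently.
# The values below are placeholders, not the ones in the theory database.
MZ = 91.1876        # GeV
MW = 80.379         # GeV
GF = 1.1663787e-5   # GeV^-2

sin2_theta_w = 1.0 - MW**2 / MZ**2
alpha_qed = math.sqrt(2.0) * GF * MW**2 * sin2_theta_w / math.pi

print(f"derived sin^2(theta_w) = {sin2_theta_w:.5f}")
print(f"derived alpha_qed      = {alpha_qed:.6f} (~ 1/{1 / alpha_qed:.1f})")
```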

Configurations ("o-cards")

  • both eko and yadism will be shipped without defaults (in marked contrast to APFEL)
  • both eko and yadism require some additional configuration:
    • in eko, the "operators": the target scale of the operators, the discretization and some numerical details
    • in yadism, the "observables": the discretization, the DIS configuration (currents, hadron, lepton) and the target functions, e.g. F2total(x=0.1, Q2=90)
  • we need a mapping from the old settings to the new settings (which both programs already use at this point)
    • our current implementation of this remapping is given here
    • the current status of eko already ignores the FNS setting; instead it has to be fed with the correct kcThr
    • yadism has to be fed with the correct kDIScThr (the name can of course be changed to kxscThr or similar) to determine the thresholds, and with FNS to determine the (re-)combination of heavy/light coefficient functions
    • in order to implement FONLL correctly (to our understanding) we need to deactivate the charm threshold in eko, but not in yadism (see the sketch after this list)
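
Below is a minimal sketch of the kind of remapping meant above; it is not the linked implementation, and apart from FNS, kcThr and kDIScThr every key name (and the way the charm threshold is "deactivated") is an assumption for illustration only:

```python
def split_theory_card(old: dict) -> tuple[dict, dict]:
    """Split an old-style theory record into eko- and yadism-facing settings (sketch)."""
    eko_settings = {
        # evolution only cares about the evolution thresholds, not about FNS
        "kcThr": old["kcThr"],
        "kbThr": old["kbThr"],
        "ktThr": old["ktThr"],
    }
    yadism_settings = {
        # FNS drives the (re-)combination of heavy/light coefficient functions
        "FNS": old["FNS"],
        # dedicated cross-section threshold, independent of the evolution ones
        "kDIScThr": old.get("kDIScThr", old["kcThr"]),
    }
    if old["FNS"].startswith("FONLL"):
        # assumption: "deactivating" the charm threshold in eko is modelled here by
        # pushing it below any scale of interest; yadism keeps its own threshold
        eko_settings["kcThr"] = 0.0
    return eko_settings, yadism_settings
```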

Proposed Workflow

  • pineko should determine a consistent configuration for both eko and yadism
  • ask each of them in turn to compute their ingredients
  • join their respective outputs to provide what is needed: a mapping f_j(x, Q_0) -> theory prediction (a schematic sketch follows below)
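
A schematic of this workflow; every function below is a placeholder stub, not an actual eko/yadism/pineko call, and split_theory_card refers to the hypothetical sketch above:

```python
def run_eko(eko_settings):
    """Placeholder: would return the evolution operator E(Q^2 <- Q_0^2)."""
    return {"operator": None, "settings": eko_settings}

def run_yadism(yadism_settings, observable_card):
    """Placeholder: would return the grid of DIS coefficient functions."""
    return {"grid": None, "settings": yadism_settings, "observables": observable_card}

def convolve(grid, operator):
    """Placeholder: would join grid and operator into an object in terms of f_j(x, Q_0)."""
    return {"fktable": (grid, operator)}

def compute_predictions(theory_card, observable_card):
    # 1. pineko determines a consistent configuration for both programs
    eko_settings, yadism_settings = split_theory_card(theory_card)
    # 2. ask each program in turn for its ingredient
    operator = run_eko(eko_settings)
    grid = run_yadism(yadism_settings, observable_card)
    # 3. join them: the result maps f_j(x, Q_0) -> theory prediction
    return convolve(grid, operator)
```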

Questions

  • how are the "observables" currently determined?
  • how can we organize and maintain the configurations?
  • how can we ensure the consistency between (experimental) dataset and theory?
@cschwan
Contributor

cschwan commented Dec 3, 2020

  • How do you choose your independent parameters in DIS?
  • I think the answer to your question about how the observables are chosen is that this is done by the experiments (obviously), and we then decide which ones we consider for the fit, of course. For the collider processes the corresponding observables are stored in APPLgrids or fastNLO tables and collected in the https://github.com/NNPDF/applgrids repository, but I think the procedure is quite different for fixed-target experiments and DIS. In any case, you can look at the experimental data in https://github.com/NNPDF/nnpdf/tree/master/buildmaster/rawdata, which also contains DIS datasets; from this you should be able to reverse-engineer them. In https://github.com/NNPDF/nnpdf/tree/master/buildmaster/filters you can look into the source files to find the corresponding papers. The apfelcomb database, https://github.com/NNPDF/apfelcomb/blob/master/db/apfelcomb.dat, might contain a few more important parameters (or not).
  • To answer the question of how to maintain this information, I suggest anything that is more central and easier than the above. I think a natural place is the PineAPPL grids, so I suggest storing the information in them, maybe as metadata; see also Add new subcommand to read/write/add metadata pineappl#6.
  • For your last question I have an idea, which is basically to automate the entire toolchain. Let's say I change MW in the theory database; then the toolchain should recalculate all the PineAPPL grids by running mg5_aMC, yadism, and so on, then run eko to convert the PineAPPL grids to a single scale, and finally generate the FK tables. Ideally there is some dependency management that detects which things are sensitive to which changes, and a caching system that maps the run cards for the PineAPPL grids together with all the input parameters to a hash, so we don't always recompute everything (a small sketch of such a cache key follows below). Or we simply follow the pragmatic physicist's approach, which is: be strong and don't screw up ;).
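
A minimal sketch of such a cache key, using only the standard library (the layout of the run card and of the parameter dictionary is assumed):

```python
import hashlib
import json

def cache_key(runcard: dict, parameters: dict) -> str:
    """Hash a run card together with all input parameters into a cache key.

    Assumes both are plain, JSON-serializable dictionaries; any change in either
    one yields a different key, so unchanged grids can be reused from the cache.
    """
    payload = json.dumps({"runcard": runcard, "parameters": parameters}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# e.g. changing MW in the theory parameters invalidates the cached grid
key_old = cache_key({"process": "DIS_NC"}, {"MW": 80.379})
key_new = cache_key({"process": "DIS_NC"}, {"MW": 80.398})
assert key_old != key_new
```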

@alecandido
Member

alecandido commented Dec 3, 2020

Hi Christopher, just a brief reply on two points; tomorrow we can discuss more:

  • "how the observables are chosen" in Felix's post means: where do you specify which function/settings should be called to compute the theoretical prediction for those data?
  • storing information in the PineAPPL grids is fine, but definitely not enough: if we don't have any PineAPPL grid yet, since we are going to generate them, where is the information describing what we should generate stored?

It's also fine to automate things, but in my view this is a second-order problem: the first-order one is how to run things manually in a reproducible way, i.e. keeping all the information used in the generation. Of course, it doesn't make much sense for the information on how to generate the objects to live only in the generated objects (whereas if we are talking about putting it also in them, I perfectly agree: this would have the additional benefit of being able to tell whether an object is or is not up to date with the central information store, e.g. if you have changed the MW value).

@cschwan
Contributor

cschwan commented Dec 3, 2020

I see - and I also completely agree. Naturally, the source where it all comes from is a repository containing all the runcards, so something like https://github.com/scarrazza/applgridew. Thinking this idea further, we have two parts: process-dependent information, like cuts and so on, and process-independent information, like parameter values. The former should be in runcards, and the latter in a database that defines the theory.

@alecandido
Member

alecandido commented Dec 4, 2020

OK, I'm glad that an - at least partial - solution already exists, i.e. since you had to generate the hadronic processes you already had to think about this.

Still, there are three main points:

  1. what you call the theory database, the process-independent information, may indeed be process independent, but it is still almost partitioned between processes and evolution, which deserve almost orthogonal specs
    • nevertheless we can decide to keep everything together all the same; but if so, and if we're going to use a (let's say relational) database, I'd at least keep the partition in different tables, plus a third one that selects one record from each table and allows for annotations (so that each fit can still use a unique ID, and no additional - even slightly more - structure)
  2. I had a look at applgridew, and I saw that what we need is essentially a launch.txt file, while for generating hadronic processes you basically need a further one (output.txt), because madgraph deserves its own language; yadism instead is purely declarative: just theory parameters and one runcard
    • of course I'm not suggesting to change madgraph, but simply that we should account for the two different settings, and I'd like to use different names for yadism, because (apart from "stems", which sounds a bit weird to me, even if I know it comes from madgraph, but it doesn't matter) we would like to use yaml syntax (or in general a proper data serialization format, whatever) and not madgraph's
  3. we also need a launch.txt-like file for eko - I don't know where we'd like to put it - for generating the evolution-specific runcards (not the theory section, but the actual runcards containing VFNS thresholds and so on); at generation time it should be validated against the runcard of the PineAPPL grid you'd like to join: at this point eko is really flexible, but in turn not all combinations are guaranteed to be physical (so essentially every time you make the join you have to check a number of constraints telling you that the thing makes sense; a toy validation sketch follows below)
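
A toy version of the kind of constraint check meant in point 3; all key names, the metadata layout and the specific constraints are assumptions, not eko's or PineAPPL's actual interfaces:

```python
def validate_join(eko_card: dict, grid_meta: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the join makes sense."""
    problems = []
    # the evolution target scale must match the scale the grid was produced at
    if eko_card["Q2"] != grid_meta["Q2"]:
        problems.append("eko target Q2 differs from the grid Q2")
    # heavy-quark parameters shared by both sides must agree
    for quark in ("mc", "mb", "mt"):
        if abs(eko_card[quark] - grid_meta[quark]) > 1e-6:
            problems.append(f"{quark} differs between eko and the grid")
    # the perturbative order of the evolution should not be below that of the grid
    if eko_card["PTO"] < grid_meta["PTO"]:
        problems.append("evolution order is lower than the grid order")
    return problems
```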

@cschwan
Contributor

cschwan commented Dec 4, 2020

Can we maybe talk about this later after the phone conference?

@felixhekhorn
Contributor Author

Ideally there's some dependency management that detects which things are sensitive to which changes

that's why splitting is a good idea

Can we maybe talk about this later after the phone conference?

I think we first should agree on a proposal among ourselves before going "public" to the collaboration ...

@cschwan
Contributor

cschwan commented Dec 4, 2020

What I meant is can we (you, Alessandro, I and possibly Juan and Stefano) talk about it after the phone conference, that is in a meeting after the PC!

@felixhekhorn
Contributor Author

Sorry I missed the important word "after" ;-)

fine by me - after having lunch after the PC though ;-)

@scarlehoff
Member

scarlehoff commented Dec 4, 2020

I would also like to have lunch, but I have no real opinion on how the databases should behave so feel free to do it without me.

My only comment would be on:

To answer the question of how to maintain this information I suggest anything that's more central and easier than the above. I think a natural place are the PineAPPL grids, so I suggest storing the information in them, maybe as metadata; see also NNPDF/pineappl#6.

and

storing information in the PineAPPL grids is fine, but definitely not enough: if we don't have any PineAPPL grid yet, since we are going to generate them, where is the information describing what we should generate stored?
It's also fine to automate things, but in my view this is a second-order problem: the first-order one is how to run things manually in a reproducible way, i.e. keeping all the information used in the generation.

I agree on both accounts, especially with the last point.
I think the natural place for the metadata is the PineAPPL grid. This metadata could, for instance, take the form of a yaml file, so that taken "by itself" it is just the runcard(s) with which the grid was created, plus maybe version information. I think this covers all points?

It's what vp already does: it dumps all the user input but also any defaults that were not set, so that any output is reproducible if run with the same version of the code.

Of course, in this case one would dump the runcards for yadism, eko, whatever Monte Carlo (sherpa, madgraph or anything else), etc. (a minimal sketch follows below)
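
A minimal sketch of that idea, assuming PyYAML and hypothetical default/user dictionaries (this is not how vp or PineAPPL actually handle metadata):

```python
import yaml  # PyYAML

# hypothetical defaults and user input; in practice these would come from the
# generating program (yadism, eko, the Monte Carlo, ...), not be hard-coded
DEFAULTS = {"PTO": 1, "interpolation_points": 50}
user_runcard = {"PTO": 2}

# user settings override the defaults; unset defaults are dumped explicitly,
# together with the code version, so the output is reproducible
full_runcard = {**DEFAULTS, **user_runcard, "code_version": "x.y.z"}

# this YAML document is what one would attach to the grid as metadata
print(yaml.safe_dump(full_runcard, sort_keys=True))
```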

Ideally there's some dependency management that detects which things are sensitive to which changes

We will end up re-implementing reportengine :P

@scarrazza
Member

Sorry, I have to teach after the PC. I think you could write a specification for the database too, so we can discuss in more detail.

@alecandido
Member

alecandido commented Dec 4, 2020

Can we maybe talk about this later after the phone conference?

Later is perfectly fine; right after lunch I would be in trouble, since I have to meet a student. But we should finish no later than 15:00.

Ideally there's some dependency management that detects which things are sensitive to which changes

We will end up re-implementing reportengine :P

I believe that pineko should really be kept minimal, so it should be just a combiner. If we also have to implement a dependency manager, it should really be kept separate from the combiner itself, i.e. the same repo is still fine, but not the same package. If one is already implemented (like reportengine), let's reuse it; if possible we will improve that one, reducing the entropy of the NNPDF code.

@cschwan
Contributor

cschwan commented Dec 4, 2020

Can we maybe talk about this later after the phone conference?

Later is perfectly fine; right after lunch I would be in trouble, since I have to meet a student. But we should finish no later than 15:00.

Please send an email with a link where we can meet!

@felixhekhorn felixhekhorn linked a pull request Mar 24, 2022 that will close this issue
@felixhekhorn
Contributor Author

  • the theory input scheme rework will not take place here (it might instead be discussed at the Code Meeting 2022)
  • the configuration for yadism is a concern for https://github.com/NNPDF/runcards
  • pineko does not deal with configurations, but takes them as input
  • the various paths to these inputs will instead be implemented in Move permanent part of fkutils here #12
