
Meeting on April 17, 2018

Brandon Kuczenski edited this page Apr 17, 2018 · 8 revisions

Upcoming meeting

Time: 18:00 Paris time

Place: https://yale.zoom.us/j/809419495

Meeting Minutes

Beware of partisan propaganda offered in bad faith: NAS.org is the National Association of Scholars, a right-wing advocacy group in the US.

  • JIE issue

Authors are submitting Python code; Reid wants advice on what the journal should offer or require.

  • BK: A Jupyter notebook is nice because it allows readers to reproduce the analysis, but it may not run for them.
  • RLu: If authors posted it on GitHub, people would be able to read it right away, even without Python installed. The Binder service (mybinder.org) also lets people host their code and run it remotely, but that requires more than just providing a notebook, because you also need to specify which packages are required.
  • RLi: please write that down
  • RLu: will make a wiki page
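Binder builds its environment from a dependency file in the repository (for example a `requirements.txt`), so authors have to list their packages explicitly. As a rough starting point for that list, the sketch below collects the top-level modules imported in a notebook's code cells, assuming the standard JSON `.ipynb` layout with a `cells` list. The function and file names are hypothetical, and module names do not always match installable package names (e.g., `sklearn` is installed as `scikit-learn`), so the output still needs a human check before it goes into a requirements file.

```python
import ast
import json
import sys

# Standard-library modules need not be listed; sys.stdlib_module_names exists from Python 3.10
# (on older versions nothing is filtered out).
STDLIB = getattr(sys, "stdlib_module_names", frozenset())


def imported_modules(notebook_path):
    """Return sorted top-level, non-stdlib module names imported in a notebook's code cells."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    modules = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines or a single string
            source = "".join(source)
        # Drop IPython magics (%...) and shell escapes (!...), which are not valid Python.
        lines = [ln for ln in source.splitlines() if not ln.lstrip().startswith(("%", "!"))]
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError:
            continue  # skip cells that still do not parse
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    modules.add(alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if m not in STDLIB)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python list_imports.py analysis.ipynb
    print("\n".join(imported_modules(sys.argv[1])))
```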

To be specific, the files are .ipynb files; we will probably want to have 'required' and 'optional' parts of the guidelines.

  • RLi: The norm is that the reader should know what the supplementary files are and what information they contain; e.g., in a spreadsheet, the first tab is documentation. What is the equivalent for Jupyter?
  • BK: A notebook supports inline documentation, but you still need to be able to read it.
  • RLu: Zenodo also allows you to archive a report and get a DOI for it (put that on the wiki too).
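One possible Jupyter equivalent of the spreadsheet convention, offered here only as an assumption and not as an agreed rule, is that a supplementary notebook opens with a Markdown cell describing what it contains. A check for that convention could look like the following sketch; the function name is hypothetical and it assumes the standard JSON `.ipynb` layout.

```python
import json


def documentation_cell(notebook_path):
    """Return the first cell's text if it is a non-empty Markdown cell, otherwise None.

    Assumed convention (by analogy with a spreadsheet whose first tab is documentation):
    the notebook opens with a Markdown cell explaining what it contains.
    """
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    cells = nb.get("cells", [])
    if not cells or cells[0].get("cell_type") != "markdown":
        return None
    source = cells[0].get("source", "")
    text = "".join(source) if isinstance(source, list) else source
    return text if text.strip() else None
```

A submission script could flag any notebook for which this returns None, leaving the decision to the editor.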

It is also important to determine what the journal wants to accomplish: should readers be able to run code while reading the article?

  • RLu: you could do much more, e.g. have the notebook reproduce the figures, but that is more than is required.
  • RLi: If the code is self-contained and is just a link that takes you out of the article, then it is only a URL, and no live interactive capability is required in the article.

In the short run, the basic suggestion would be to put it on GitHub and/or Zenodo and simply reference it.

  • NH: Presumably Wiley has an opinion about this? Don't put too much trust in third-party services that have just cropped up and could disappear.
  • RLi: Permanence of the record is a profound issue; e.g., should a large grey-literature report be part of the SI? The publishing world has not grappled with that.
  • BK: A separate question about URLs: inline, footnote, reference, or all three?

References should be references, i.e., they support a statement; inline links should be for interaction, not necessarily for support.

Stefan's spreadsheet

reference: https://docs.google.com/spreadsheets/d/1yupwhtfUiBnW5DcAzOTek03gxzH22hDzn_djzwocDGA

I tried to structure the feedback I got last week. What it boils down to is that there is not an inventory or database, but different stages. My idea was just to report plain-text metadata, but Brandon observed that this may not actually be very useful.

Catalogs:

Type 1: Keyword-based catalog. Ideally, we have all data in a general database and the data are all linked. e.g. a person types 'Sweden' and gets all the information pertaining to Sweden.

Type 2: Data structure catalog without classifications. This is the "sweet spot": a catalog with ontologies for different data types, where data users need to do some work to link their data to the existing types.

Type 3: Data structure catalog with classifications. Like Type 2, but the authors do the work of linking to the types.

Actual databases:

Type 4: metadata included, numeric data excluded

Type 5: metadata and data included
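To make the difference between the stages concrete, here is a minimal sketch of a Type 1 keyword lookup next to the grouping by data type that Types 2 and 3 add, with an optional field standing in for the numeric data only a Type 5 database would hold. The class, field, and function names are hypothetical, and the example entries are placeholders, not real datasets.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Entry:
    title: str
    keywords: Set[str] = field(default_factory=set)  # free keywords (Type 1)
    data_type: Optional[str] = None       # link to an ontology type (Types 2 and 3)
    values: Optional[List[float]] = None  # numeric data; only present in a Type 5 database


def keyword_search(entries: List[Entry], term: str) -> List[Entry]:
    """Type 1 behaviour: every entry tagged with the keyword, case-insensitively."""
    term = term.lower()
    return [e for e in entries if term in {k.lower() for k in e.keywords}]


def by_data_type(entries: List[Entry]) -> Dict[str, List[Entry]]:
    """Types 2-3 behaviour: group entries by the ontology type they are linked to."""
    groups: Dict[str, List[Entry]] = defaultdict(list)
    for e in entries:
        if e.data_type is not None:
            groups[e.data_type].append(e)
    return dict(groups)


if __name__ == "__main__":
    # Placeholder entries for illustration only.
    entries = [
        Entry("Steel flows (placeholder)", {"Sweden", "steel"}, data_type="flow"),
        Entry("Aluminium stocks (placeholder)", {"Norway", "aluminium"}, data_type="stock"),
        Entry("Untyped report (placeholder)", {"Sweden"}),
    ]
    print([e.title for e in keyword_search(entries, "sweden")])
    # ['Steel flows (placeholder)', 'Untyped report (placeholder)']
    print(sorted(by_data_type(entries)))  # ['flow', 'stock']
```

The untyped entry shows the gap between the stages: keyword search finds it, but it cannot be placed in any data type until someone (the user in Type 2, the author in Type 3) links it.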

"Aspects" or "dimensions" are different entries in an n-tuple of data about something.

  • RLu: "measures" are the quantitative extent of dimensions in the 'data cube' context.

  • There is an example of this in RLu's paper on Sankey diagrams.

  • BK: these are problems that other fields must have already dealt with. Incoherent talk about NCEAS

  • Perspective from other fields: e.g., AgMIP http://www.agmip.org, an example of a large-scale synthesis project.

  • Steven Kraines - early work on ontologies in JIE (2004-2005) (e.g. https://doi.org/10.1162/1088198054821690)

  • We are a bit stuck

  • Mathematics + computer science

  • More of a human problem than a CS problem

  • SP: In this MFA software I used this data structure, and I will carry on with it so that we have something to point to and iterate on.

  • An important question is raw data versus proxy data; e.g., steel data in ecoinvent come from observations of one steel plant in Switzerland and get processed into a proxy model.

  • Same thing with IO data: they go through dozens of transformations, but once the MRIO table is generated it is treated as "raw" data, because it satisfies a balancing requirement and because the upstream transformations are not visible.
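For a single-region monetary IO table, one standard form of the balancing requirement is that each sector's total output (its row: intermediate sales plus final demand) equals its total input (its column: intermediate purchases plus value added); an MRIO table applies the same identity to every region-sector pair. The sketch below assumes that form of the requirement and uses plain lists; the function name and the two-sector numbers are made up. It also illustrates the provenance point: a table passes this check regardless of how many transformations produced it.

```python
from typing import Sequence


def is_balanced(Z: Sequence[Sequence[float]],
                final_demand: Sequence[float],
                value_added: Sequence[float],
                rel_tol: float = 1e-9) -> bool:
    """Check the IO accounting identity: row total (output) == column total (input) per sector.

    Z[i][j] is the intermediate flow from sector i to sector j (square, n x n).
    """
    n = len(Z)
    if any(len(row) != n for row in Z) or len(final_demand) != n or len(value_added) != n:
        raise ValueError("Z must be n x n and the vectors must have length n")
    for k in range(n):
        total_output = sum(Z[k]) + final_demand[k]                     # row k: what sector k sells
        total_input = sum(Z[i][k] for i in range(n)) + value_added[k]  # column k: what sector k buys and pays
        if abs(total_output - total_input) > rel_tol * max(1.0, abs(total_output)):
            return False
    return True


if __name__ == "__main__":
    # Made-up two-sector example.
    Z = [[10.0, 20.0], [30.0, 5.0]]
    print(is_balanced(Z, final_demand=[70.0, 15.0], value_added=[60.0, 25.0]))  # True
    print(is_balanced(Z, final_demand=[70.0, 15.0], value_added=[60.0, 30.0]))  # False
```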

Regarding aspects: we need to identify which aspects are "required" for different data types, e.g., a flow needs "at least two processes".
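Treating a record's aspects as the entries of an n-tuple, that tuple can serve as the coordinates of a data-cube cell, with the measure (the quantity) stored at the cell. The sketch below combines that view with the one requirement named in the minutes, that a flow needs at least two processes, read here as two distinct processes (origin and destination). The aspect names, function names, and the use of a plain dict as the cube are assumptions for illustration.

```python
from typing import Dict, List, Mapping, Optional, Tuple

# Hypothetical aspect (dimension) names for a flow; the order fixes each aspect's position in the n-tuple.
FLOW_ASPECTS: Tuple[str, ...] = ("origin_process", "destination_process", "material", "region", "year")

Key = Tuple[Optional[str], ...]


def flow_problems(record: Mapping[str, str]) -> List[str]:
    """Return the reasons a record fails the minimum requirements for a flow (empty if none)."""
    problems = []
    processes = {record.get("origin_process"), record.get("destination_process")} - {None}
    if len(processes) < 2:
        problems.append("a flow needs at least two (distinct) processes")
    return problems


def flow_key(record: Mapping[str, str]) -> Key:
    """Arrange a record's aspects into a fixed-order n-tuple (missing aspects become None)."""
    return tuple(record.get(a) for a in FLOW_ASPECTS)


def add_flow(cube: Dict[Key, float], record: Mapping[str, str], measure: float) -> None:
    """Validate the record, then add its measure to the cube cell addressed by its aspects."""
    problems = flow_problems(record)
    if problems:
        raise ValueError("; ".join(problems))
    key = flow_key(record)
    cube[key] = cube.get(key, 0.0) + measure
```

Other data types (a stock, for instance) would each get their own list of required aspects; deciding those lists is the open question above.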