chron_data causes duplicates that are not in the database #6

MartinHinz · 2024-05-14T13:21:40Z

Try the following standard query from the API page:

chron_data(labnr = "AAR-1847")

The result currently contains 7 observations, although the database holds only 4 records for this lab number (as far as I can tell). The API returns these 4 correctly:

[{"measurement":{"id":21924,"labnr":"AAR-1847","bp":4780,"std":100,"cal_bp":null,"cal_std":null,"delta_c13":-27.3,"source_database":"","lab_name":"","material":"food residue","species":null,"feature":null,"feature_type":"","site":"Øgårde 4","country":"DK","lat":"55.5937","lng":"11.543","site_type":"hoard","periods":[{"period":"Neolithic"},{"period":"Trichterbecher-North Group"}],"typochronological_units":[{"typochronological_unit":"Neolithic"},{"typochronological_unit":"Trichterbecher-North Group"}],"ecochronological_units":[],"reference":[{"reference":"Koch 1998, 307."},{"reference":"CalPal2022"}]}},

{"measurement":{"id":85745,"labnr":"AAR-1847","bp":4780,"std":100,"cal_bp":null,"cal_std":null,"delta_c13":null,"source_database":"","lab_name":"","material":"food residue","species":null,"feature":null,"feature_type":"","site":"Øgårde 4","country":"DK","lat":"55.5937","lng":"11.543","site_type":"hoard","periods":[{"period":"MN"},{"period":"Bernburg"}],"typochronological_units":[{"typochronological_unit":"MN"},{"typochronological_unit":"Bernburg"}],"ecochronological_units":[],"reference":[{"reference":"EUROEVOL"}]}},

{"measurement":{"id":88925,"labnr":"AAR-1847","bp":4780,"std":100,"cal_bp":null,"cal_std":null,"delta_c13":-27.3,"source_database":"","lab_name":"","material":"food remains","species":"\"food crust\".","feature":"Moorfund; \"food crust on funnel beaker Type III\".","feature_type":"","site":"Øgårde 4","country":"DK","lat":"55.5937","lng":"11.543","site_type":"hoard","periods":[{"period":"Bernburg"}],"typochronological_units":[{"typochronological_unit":"Bernburg"}],"ecochronological_units":[],"reference":[{"reference":"Koch 1998, 307; Aud 1995, 319"},{"reference":"RADON"}]}},

{"measurement":{"id":177143,"labnr":"AAR-1847","bp":4780,"std":100,"cal_bp":null,"cal_std":null,"delta_c13":-27.3,"source_database":"","lab_name":"","material":"charcoal","species":null,"feature":null,"feature_type":"","site":"Øgårde 4","country":"DK","lat":"55.5937","lng":"11.543","site_type":"hoard","periods":[],"typochronological_units":[],"ecochronological_units":[],"reference":[{"reference":"Koch 1998 307."},{"reference":"p3k14c"}]}}]

The additional rows are therefore created during processing in the package, not in the database.


MartinHinz · 2024-05-29T10:03:01Z

While working on #7, I realised that this is due to the way references are currently parsed. For this PR, I left the result as it was, but we should consider whether this behaviour is intentional!
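To illustrate the effect, here is a minimal base R sketch, assuming the parsing step emits one row per measurement and reference pair. It is not the package's actual code: the objects `measurements`, `references` and `flat` are hypothetical, and only the ids and reference strings are copied from the API response above.

```r
# Hypothetical stand-in for the four API records: one row per measurement.
measurements <- data.frame(
  id = c(21924, 85745, 88925, 177143),
  labnr = "AAR-1847",
  bp = 4780,
  std = 100,
  stringsAsFactors = FALSE
)

# References per measurement, in the same order as the rows above.
references <- list(
  c("Koch 1998, 307.", "CalPal2022"),
  c("EUROEVOL"),
  c("Koch 1998, 307; Aud 1995, 319", "RADON"),
  c("Koch 1998 307.", "p3k14c")
)

# Long format: repeat each measurement row once per reference it carries.
n_refs <- lengths(references)
flat <- measurements[rep(seq_len(nrow(measurements)), times = n_refs), ]
flat$reference <- unlist(references)
rownames(flat) <- NULL

nrow(measurements)  # 4
nrow(flat)          # 7
```

The four records carry 2, 1, 2 and 2 references respectively, which adds up to the 7 rows observed.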

joeroe · 2024-06-07T08:27:12Z

IIRC the aim was to provide a completely flat table from chron_data(), so in that sense having one row per reference (long format) makes sense. I'm quite fond of nested data frames myself, e.g. holding references in a list of vectors, but I think returning that kind of thing in a package isn't a good idea because many people aren't used to working with them.
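For anyone who needs one row per measurement in the meantime, a rough base R sketch of both shapes follows. The data frame `flat` is a hypothetical stand-in for the long-format result (only the ids and references are taken from the issue), and none of this is part of the package.

```r
# Hypothetical long-format result: one row per (measurement, reference) pair.
flat <- data.frame(
  id = c(21924, 21924, 85745, 88925, 88925, 177143, 177143),
  labnr = "AAR-1847",
  reference = c("Koch 1998, 307.", "CalPal2022", "EUROEVOL",
                "Koch 1998, 307; Aud 1995, 319", "RADON",
                "Koch 1998 307.", "p3k14c"),
  stringsAsFactors = FALSE
)

# One row per measurement: drop the reference column and deduplicate.
per_measurement <- unique(flat[, c("id", "labnr")])

# Nested alternative: hold the references in a list of character vectors.
nested <- per_measurement
refs_by_id <- split(flat$reference, flat$id)
nested$reference <- unname(refs_by_id[as.character(nested$id)])

# Flat alternative: collapse each measurement's references into one string.
collapsed <- per_measurement
collapsed$reference <- vapply(nested$reference, paste, character(1),
                              collapse = " | ")

nrow(nested)  # 4
```

The nested version keeps each reference as a separate value, while the collapsed version stays flat but requires splitting the string again to recover individual references.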

Longer term, my preference would be to rework the architecture of this package so that it has individual functions that more closely match the XRONOS 'v2' API and return classed vectors that can be used to build up multi-stage queries (e.g. xr_get_c14s(...) |> xr_get_references()), rather than one monolithic function. It is difficult to robustly reduce a complex data structure like XRONOS' to a single flat table, both on the server and client side.

MartinHinz changed the title ~~chron_data verursacht Duplikate, die so nicht in der Datenbank sind~~ chron_data causes duplicates that are not in the database May 29, 2024
