chron_data causes duplicates that are not in the database #6

Open
MartinHinz opened this issue May 14, 2024 · 2 comments

Comments

@MartinHinz
Contributor

Try the following standard query from the API page:

chron_data(labnr = "AAR-1847")

The result currently has 7 observations, but the database contains only 4 duplicate entries for this lab number (at least as far as I can tell). The API also returns them correctly:

[{"measurement":{"id":21924,"labnr":"AAR-1847","bp":4780,"std":100,"cal_bp":null,"cal_std":null,"delta_c13":-27.3,"source_database":"","lab_name":"","material":"food residue","species":null,"feature":null,"feature_type":"","site":"Øgårde 4","country":"DK","lat":"55.5937","lng":"11.543","site_type":"hoard","periods":[{"period":"Neolithic"},{"period":"Trichterbecher-North Group"}],"typochronological_units":[{"typochronological_unit":"Neolithic"},{"typochronological_unit":"Trichterbecher-North Group"}],"ecochronological_units":[],"reference":[{"reference":"Koch 1998, 307."},{"reference":"CalPal2022"}]}},

{"measurement":{"id":85745,"labnr":"AAR-1847","bp":4780,"std":100,"cal_bp":null,"cal_std":null,"delta_c13":null,"source_database":"","lab_name":"","material":"food residue","species":null,"feature":null,"feature_type":"","site":"Øgårde 4","country":"DK","lat":"55.5937","lng":"11.543","site_type":"hoard","periods":[{"period":"MN"},{"period":"Bernburg"}],"typochronological_units":[{"typochronological_unit":"MN"},{"typochronological_unit":"Bernburg"}],"ecochronological_units":[],"reference":[{"reference":"EUROEVOL"}]}},

{"measurement":{"id":88925,"labnr":"AAR-1847","bp":4780,"std":100,"cal_bp":null,"cal_std":null,"delta_c13":-27.3,"source_database":"","lab_name":"","material":"food remains","species":"\"food crust\".","feature":"Moorfund; \"food crust on funnel beaker Type III\".","feature_type":"","site":"Øgårde 4","country":"DK","lat":"55.5937","lng":"11.543","site_type":"hoard","periods":[{"period":"Bernburg"}],"typochronological_units":[{"typochronological_unit":"Bernburg"}],"ecochronological_units":[],"reference":[{"reference":"Koch 1998, 307; Aud 1995, 319"},{"reference":"RADON"}]}},

{"measurement":{"id":177143,"labnr":"AAR-1847","bp":4780,"std":100,"cal_bp":null,"cal_std":null,"delta_c13":-27.3,"source_database":"","lab_name":"","material":"charcoal","species":null,"feature":null,"feature_type":"","site":"Øgårde 4","country":"DK","lat":"55.5937","lng":"11.543","site_type":"hoard","periods":[],"typochronological_units":[],"ecochronological_units":[],"reference":[{"reference":"Koch 1998 307."},{"reference":"p3k14c"}]}}]

The extra rows are therefore created during processing in chron_data(), not in the database or the API.

MartinHinz changed the title from "chron_data verursacht Duplikate, die so nicht in der Datenbank sind" to "chron_data causes duplicates that are not in the database" on May 29, 2024
@MartinHinz
Contributor Author

While working on #7, I realised that this is due to the way references are currently parsed. For this PR, I left the result as it was, but we should consider whether this behaviour is intentional!
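To make the row count concrete, here is a minimal base R sketch of how one row per reference turns 4 measurements into 7 rows. The ids and reference strings are copied from the API response quoted above; the expansion step is only an illustration of the effect and is not taken from the package's actual parsing code.

```r
# Measurement ids and reference lists copied from the API response in this issue.
measurements <- data.frame(
  id    = c(21924, 85745, 88925, 177143),
  labnr = "AAR-1847",
  bp    = 4780,
  std   = 100
)
references <- list(
  c("Koch 1998, 307.", "CalPal2022"),
  "EUROEVOL",
  c("Koch 1998, 307; Aud 1995, 319", "RADON"),
  c("Koch 1998 307.", "p3k14c")
)

# Long format: repeat each measurement once per reference (illustrative only).
n_refs <- lengths(references)  # 2, 1, 2, 2
long <- measurements[rep(seq_len(nrow(measurements)), times = n_refs), ]
long$reference <- unlist(references)
rownames(long) <- NULL

nrow(measurements)  # 4
nrow(long)          # 7
```

The reference counts 2 + 1 + 2 + 2 add up to 7, which matches the number of observations reported above.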

@joeroe
Contributor

joeroe commented Jun 7, 2024

IIRC the aim was to provide a completely flat table from chron_data(), so in that sense having one row per reference (long format) makes sense. I'm quite fond of nested data frames myself, e.g. holding references in a list of vectors, but I think returning that kind of thing in a package isn't a good idea because many people aren't used to working with them.
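For comparison, a rough base R sketch of the two alternatives to the long format: a nested list column, and a flat table where the references are collapsed into one string per measurement. The data are taken from the API response in this issue; the `" | "` separator and the object names are just assumptions for illustration, not anything the package currently does.

```r
# Ids and references copied from the API response in this issue.
measurements <- data.frame(
  id    = c(21924, 85745, 88925, 177143),
  labnr = "AAR-1847"
)
references <- list(
  c("Koch 1998, 307.", "CalPal2022"),
  "EUROEVOL",
  c("Koch 1998, 307; Aud 1995, 319", "RADON"),
  c("Koch 1998 307.", "p3k14c")
)

# Option A: nested. One row per measurement, references held in a list column.
nested <- measurements
nested$reference <- I(references)

# Option B: flat. One row per measurement, references collapsed into a string.
# The " | " separator is an arbitrary choice for this sketch.
flat <- measurements
flat$reference <- vapply(references, paste, character(1), collapse = " | ")

nrow(nested)            # 4
nested$reference[[1]]   # "Koch 1998, 307." "CalPal2022"
flat$reference[1]       # "Koch 1998, 307. | CalPal2022"
```

Both keep one row per measurement; the trade-off is that Option A needs users to handle list columns, while Option B makes individual references harder to filter on.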

Longer term, my preference would be to rework the architecture of this package so that it has individual functions that more closely match the XRONOS 'v2' API and return classed vectors that can be used to build up multi-stage queries (e.g. xr_get_c14s(...) |> xr_get_references()), rather than one monolithic function. It is difficult to robustly reduce a complex data structure like XRONOS' to a single flat table, both on the server and client side.
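A very rough sketch of what such a pipeline could look like, using S3 classes. None of these functions exist in the package: `xr_get_c14s()` and `xr_get_references()` are only the names floated above, `new_xr_c14s()` and the `"xr_c14s"` class are made up here, and both stages are offline stubs that return the values from this issue instead of calling the API. It needs R >= 4.1 for the native pipe.

```r
# Hypothetical constructor for a classed vector of measurement ids.
new_xr_c14s <- function(ids) structure(ids, class = "xr_c14s")

# Hypothetical first stage. A real version would query the API;
# this stub only knows the example from this issue.
xr_get_c14s <- function(labnr) {
  stopifnot(identical(labnr, "AAR-1847"))
  new_xr_c14s(c(21924, 85745, 88925, 177143))
}

# Hypothetical second stage. Accepts only the classed vector and returns
# a named list of references, one element per measurement id.
xr_get_references <- function(x) {
  stopifnot(inherits(x, "xr_c14s"))
  lookup <- list(
    `21924`  = c("Koch 1998, 307.", "CalPal2022"),
    `85745`  = "EUROEVOL",
    `88925`  = c("Koch 1998, 307; Aud 1995, 319", "RADON"),
    `177143` = c("Koch 1998 307.", "p3k14c")
  )
  lookup[as.character(unclass(x))]
}

refs <- xr_get_c14s("AAR-1847") |> xr_get_references()
length(refs)        # 4, one entry per measurement
sum(lengths(refs))  # 7 references in total
```

The point of the class check is that each stage can validate its input, so the measurement table never has to be flattened just to carry references along.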
