Defining composed ids considering new lines as different items #58

Open
mgaitan opened this issue Sep 11, 2022 · 1 comment

Comments

mgaitan commented Sep 11, 2022

I'm a newbie to the datasette ecosystem and I'm particularly amazed by the git-scraping technique. Thanks Simon for sharing it!

I need help defining a composed ID for the rows of this CSV, where I'm tracking power outage events in the Buenos Aires metropolitan area every 20 minutes.

https://github.com/OpenDataCordoba/cortes_enre/blob/main/cortes_enre.csv

My problem is that there is no clear ID for each event, and I would like to track how each event changes over time.

Consider this recent commit
OpenDataCordoba/cortes_enre@b3cde1c

Here it seems I could use all the columns but the last two as a composed ID:

latitud,longitud,nn,tipo,empresa,partido,localidad,subestacion,alimentador

Then the columns afectados (affected users) and normalizacion estimada (estimated time to normalization) could change over the next few updates, but eventually the line will be deleted.

The problem is that the composed id basically describes the "place" where the outage is happening, and maybe in the future it could be a totally different event in the same place unrelated to the current event.

So, how could I distinguish different events in the same place? I'm wondering if there is a way to treat it as a new item when the composed ID appears again (i.e. the commit is not updating an existing line but adding it).
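To make the idea concrete, here is a minimal Python sketch of that rule, not a feature of any existing tool. It assumes you can already produce the CSV text for each commit, oldest first (for example with `git show <commit>:cortes_enre.csv`); the function name `assign_event_ids` is hypothetical, and the key columns are the nine listed above.

```python
import csv
import io

# Columns proposed above as the composed id (everything but the last two).
KEY_COLUMNS = ["latitud", "longitud", "nn", "tipo", "empresa",
               "partido", "localidad", "subestacion", "alimentador"]


def assign_event_ids(snapshots):
    """Yield ((key, occurrence), row) for every row of every snapshot.

    `snapshots` is an iterable of CSV texts, one per commit, oldest first
    (hypothetical input; reading them out of git is not shown here).
    A key that was absent from the previous snapshot starts a new event,
    so `occurrence` counts how many separate events that place has had.
    """
    occurrences = {}        # composed key -> number of events started there
    previous_keys = set()   # keys present in the previous snapshot
    for text in snapshots:
        current_keys = set()
        for row in csv.DictReader(io.StringIO(text)):
            key = tuple(row[c] for c in KEY_COLUMNS)
            # New event: the key was not in the previous snapshot and has
            # not already been counted earlier in this same snapshot.
            if key not in previous_keys and key not in current_keys:
                occurrences[key] = occurrences.get(key, 0) + 1
            current_keys.add(key)
            yield (key, occurrences[key]), row
        previous_keys = current_keys
```

One caveat with this approach: if a row disappears for a single scrape and comes back 20 minutes later, it is counted as a new event, so you may want to tolerate short gaps before starting a new occurrence.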

mgaitan changed the title from "Defining composed ids considering new lines as different rows." to "Defining composed ids considering new lines as different items" on Sep 11, 2022

simonw (Owner) commented Sep 12, 2022

This is really difficult!

One idea: you could take the yyyy-mm (the year and month) and use it as part of the ID. This would at least give you a new unique ID for each location for each month.

There are two downsides. If an outage starts on August 31st and continues into September 1st, it would be treated as two separate outages. And if an outage finishes on September 2nd and a new one starts on September 28th in the same location, they would be treated as the same outage.

But it may be the best you can do in this situation?
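A rough Python sketch of the year-and-month idea, under some assumptions: the month is taken from the commit timestamp (the comment does not say where the date comes from), the location columns are the nine from the original post, and `monthly_id` is a hypothetical helper name.

```python
from datetime import datetime

# Location columns from the original post (everything but the last two).
KEY_COLUMNS = ["latitud", "longitud", "nn", "tipo", "empresa",
               "partido", "localidad", "subestacion", "alimentador"]


def monthly_id(row, commit_time):
    """Build an id from the location columns plus the commit's yyyy-mm.

    `row` is a dict for one CSV line; `commit_time` is a datetime for the
    commit that contained it (assumed to be available to the caller).
    """
    parts = [row[c] for c in KEY_COLUMNS]
    parts.append(commit_time.strftime("%Y-%m"))
    return "|".join(parts)
```

Both downsides show up directly: the same location on `datetime(2022, 8, 31)` and `datetime(2022, 9, 1)` gets two different IDs, while `datetime(2022, 9, 2)` and `datetime(2022, 9, 28)` share one.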
