Automate parts of metadata update comparison #1564
Labels: code health (readability, maintainability, best practices, etc), data quality, devops (building, running, deploying, environment stuff, handy utils, repository-related, engineer QoL, etc), documentation, enhancement
The CSV files (derived from a Google spreadsheet) that hold important semantic metadata about our signals and sources are getting quite large. There has also been a lot of recent editing of their content, due to work on the signal documentation app. Together, that means diffs are likely to be bigger and more frequent when updates happen. To ease the process of reviewing such changes, create automated summaries of:
Row comparisons should be keyed by source+signal instead of just by row number/position, to be more resilient to any row reorderings that happen.
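A minimal sketch of a keyed comparison, to illustrate why keying by source+signal survives row reorderings. The column names `source` and `signal`, and the function names `load_keyed` and `summarize_diff`, are assumptions for illustration; the real CSV headers may differ and can be passed in via `key_cols`.

```python
import csv
import io


def load_keyed(csv_text, key_cols=("source", "signal")):
    """Index CSV rows by the values in key_cols (assumed column names)."""
    rows = {}
    reader = csv.DictReader(io.StringIO(csv_text), restkey="_extra", restval="")
    for row in reader:
        key = tuple(row[col] for col in key_cols)
        if key in rows:
            # A duplicated key would make the comparison ambiguous.
            raise ValueError(f"duplicate key: {key}")
        rows[key] = row
    return rows


def summarize_diff(old_text, new_text, key_cols=("source", "signal")):
    """Return added keys, removed keys, and the changed columns per shared key."""
    old = load_keyed(old_text, key_cols)
    new = load_keyed(new_text, key_cols)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for key in sorted(old.keys() & new.keys()):
        cols = sorted(
            col
            for col in old[key].keys() | new[key].keys()
            if old[key].get(col) != new[key].get(col)
        )
        if cols:
            changed[key] = cols
    return {"added": added, "removed": removed, "changed": changed}
```

Because rows are looked up by key rather than by position, a file whose rows were merely shuffled produces an empty summary.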
The CSV files can be found at:
Their generation is kicked off by a GH action and performed by code in https://github.com/cmu-delphi/delphi-epidata/blob/dev/tasks.py.
Ideally, the summary text should be added to the body of the PR produced by the GH action.
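One possible way to turn comparison results into text suitable for a PR body is sketched below. The function name `format_summary`, its input shapes (lists of `(source, signal)` tuples plus a dict mapping such tuples to changed column names), and the Markdown layout are all assumptions; how the string is handed to the GH action step that opens the PR is left out, since that depends on the workflow.

```python
def format_summary(added, removed, changed, max_items=20):
    """Render a Markdown summary of keyed CSV differences.

    added, removed: lists of (source, signal) tuples (hypothetical shape).
    changed: dict mapping (source, signal) to a list of changed column names.
    """
    lines = [
        "### Metadata change summary",
        "",
        f"- {len(added)} added, {len(removed)} removed, {len(changed)} changed",
    ]
    for title, keys in (("Added", added), ("Removed", removed)):
        if not keys:
            continue
        lines += ["", f"**{title}**"]
        lines += [f"- `{src}` / `{sig}`" for src, sig in keys[:max_items]]
        if len(keys) > max_items:
            # Cap the list so a very large diff does not flood the PR body.
            lines.append(f"- ... and {len(keys) - max_items} more")
    if changed:
        items = sorted(changed.items())
        lines += ["", "**Changed**"]
        lines += [
            f"- `{src}` / `{sig}`: {', '.join(cols)}"
            for (src, sig), cols in items[:max_items]
        ]
        if len(items) > max_items:
            lines.append(f"- ... and {len(items) - max_items} more")
    return "\n".join(lines)
```

Capping each list with `max_items` keeps the PR description readable when an update touches many rows; the full diff remains available in the PR itself.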
There is some very rudimentary comparison code that might be useful as a starting point in #1546 (comment).