Version control of data #8
Comments
Hi @nheeren, whilst very useful, this would be a challenging request to satisfy!
There are many methods for version controlling (VC) data. The format of the data being tracked and the type of changes you want to capture determine which option is best for each use case. You can hack around a bit to use GitHub for VC, but as you mention it's not likely to be a good choice. If you're keen to use a spreadsheet, is it an option to use a web-based one such as Google Sheets? The automatic VC there may suit your needs.
If the data is in a 'proper' database running within a database management system (DBMS), then there are some very solid options available. For instance, I used to use Change Data Capture in SQL Server, which probably does what you're looking for. Or, slightly more old-school but fully open source: PostgreSQL triggers.
Basically, the tech stack used by each group in the IE community will determine which body of technical knowledge they'll need to master to achieve reliable tracking of their data. If you're flexible and looking for recommendations on an appropriate tech stack for an upcoming open-science project, then that's another question!
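To make the trigger idea concrete, here is a minimal sketch of trigger-based change tracking. It uses SQLite through Python's built-in `sqlite3` module rather than PostgreSQL or SQL Server, purely so it runs without a database server; the table names (`flows`, `flows_audit`), columns, and values are made up for illustration. The same pattern carries over to PostgreSQL, though the trigger syntax there differs.

```python
import sqlite3


def build_tracked_db():
    """Create an in-memory database whose 'flows' table logs every value change."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical data table (names are illustrative only).
        CREATE TABLE flows (
            id INTEGER PRIMARY KEY,
            material TEXT,
            value REAL
        );

        -- Audit table: one row per change to flows.value.
        CREATE TABLE flows_audit (
            audit_id INTEGER PRIMARY KEY AUTOINCREMENT,
            flow_id INTEGER,
            old_value REAL,
            new_value REAL,
            changed_at TEXT DEFAULT CURRENT_TIMESTAMP
        );

        -- The trigger copies the old and new value into the audit table
        -- each time a row's value is updated.
        CREATE TRIGGER flows_track_update
        AFTER UPDATE OF value ON flows
        BEGIN
            INSERT INTO flows_audit (flow_id, old_value, new_value)
            VALUES (OLD.id, OLD.value, NEW.value);
        END;
        """
    )
    return conn


conn = build_tracked_db()
conn.execute("INSERT INTO flows (material, value) VALUES ('steel', 10.0)")
conn.execute("UPDATE flows SET value = 12.5 WHERE material = 'steel'")
conn.commit()

# The audit table now holds the history of the change.
history = conn.execute(
    "SELECT flow_id, old_value, new_value FROM flows_audit"
).fetchall()
print(history)  # [(1, 10.0, 12.5)]
```

The point is that the database itself records who changed what, so nobody has to remember to save a new copy of the file.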
Thanks @tmillross! Sorry, I should have been clearer about my objectives. The reason I opened this issue was to start a discussion and identify best practices for IE Open Science that could later become part of a guideline. The goal would be to keep the project database as reproducible as possible and very much in the IE Open Science spirit. So VC is only part of the requirements.
Have you seen https://datbase.org/? Looks cool but it's pretty new -- haven't used it for anything serious myself but some people are, they have examples on their blog.
In a larger project, we have the issue that we would like to create a database on GitHub. However, GitHub is meant to keep track of changes in text files and we are using binary files (xlsx) for now. That means uploading new versions of the data files will cause very large overhead over time, and no meaningful version control is possible. I could see the final data being converted to csv at some point, but so far this database is a moving target and we would like to use Excel files for now.
Can we add guidelines or recommendations in the wiki on how to do version control of IE datasets and databases? Any suggestions are very much welcome.
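As a rough illustration of why text formats such as csv version well, the sketch below diffs two versions of a small csv table with Python's standard `difflib` module. The file names and data are invented for the example; the point is only that a one-cell change shows up as a one-line change, which is the kind of diff Git stores and displays for text files but cannot produce for xlsx.

```python
import difflib

# Two hypothetical versions of the same dataset, as csv text.
old_csv = "material,year,value\nsteel,2015,10.0\ncopper,2015,3.2\n"
new_csv = "material,year,value\nsteel,2015,12.5\ncopper,2015,3.2\n"


def csv_diff(old, new):
    """Return a unified diff between two csv texts, line by line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="flows_v1.csv",  # illustrative file names
            tofile="flows_v2.csv",
            lineterm="",
            n=0,  # no context lines, only the changes
        )
    )


diff_lines = csv_diff(old_csv, new_csv)

# Keep only the changed rows, dropping the '---'/'+++' file headers
# and the '@@' hunk markers.
changes = [
    line
    for line in diff_lines
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(changes)  # ['-steel,2015,10.0', '+steel,2015,12.5']
```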