Run table and unit table #124

Merged
merged 40 commits into from
Nov 17, 2024
Changes from 39 commits
Commits
40 commits
7176760
Moved erd test function to test_file_writer and deleted test_plugin.p…
Nov 7, 2024
f874bd5
created test functions for schema, yaml, toml readers
Nov 7, 2024
8544983
updated file writer CI test with extra package installs for erd test
Nov 7, 2024
56e9d15
created a unified dsi_units table for all ingested data and updated h…
Nov 7, 2024
17c2734
Updated other files to handle unified units table and separate back-w…
Nov 7, 2024
10d0e54
Created run table to store metadata for every workflow run and handle…
Nov 7, 2024
7eb97dd
updated backend actions and split back-write and read actions
Nov 13, 2024
fdcb7a3
created set_schema2 which sets file reader data dict to metadata dict
Nov 13, 2024
5a68c19
added inspect artifact handler function example workflow
Nov 13, 2024
bd205d9
updated file writers to handle unified unit table
Nov 13, 2024
f638ee1
updated schema, yaml, toml readers to use set_schema2 instead of old …
Nov 13, 2024
e4229f8
Only commit db insert if all data in workflow is stable/non repetitiv…
Nov 13, 2024
bd3b2f8
only execute sql statements if no insertion error else rollback all p…
Nov 14, 2024
592a218
customized print error if duplicate/error data ingested to backend
Nov 14, 2024
4c3c8d6
updated er diagram writer to generate dot as python object and no ext…
Nov 14, 2024
b41422d
created generic text file reader -- assumes only one table in data so…
Nov 14, 2024
e1afd9b
merged changes with main branch
Nov 15, 2024
0b49b3e
combined and tested sqlite read/write class
Nov 15, 2024
6d458ef
Updated other files to handle unified units table and separate back-w…
Nov 15, 2024
dec406c
updated primary key handling in schema reader
Nov 15, 2024
750931b
Created run table to store metadata for every workflow run and handle…
Nov 7, 2024
0411470
updated backend actions and split back-write and read actions
Nov 13, 2024
3e97340
Only commit db insert if all data in workflow is stable/non repetitiv…
Nov 13, 2024
3536990
changed coreterminal to merge to main
Nov 15, 2024
0a149d6
updated sqlite to merge to main
Nov 15, 2024
c5126a5
Merge branch 'main' into runTable_and_unitTable
Vedant1 Nov 15, 2024
402477b
added graphviz to pip install in CI file
Nov 15, 2024
5895856
updated tests to reflect units table in collections now
Nov 15, 2024
23252d1
updated test file reader again
Nov 15, 2024
718b439
moved nbc and nbf dependencies inline for only inspect artifact
Nov 15, 2024
4679e6e
only backend read can call read and only backend write can call put
Nov 15, 2024
2543522
removed extra imports and updated erd writer if graphviz installed or…
Nov 15, 2024
1e4f2e0
Updated csv reader to be faster and renamed yaml/toml to YAML1 TOML1
Nov 15, 2024
c9e0845
Updated with new toml and yaml reader names
Nov 15, 2024
2047081
Updated init function of csv reader
Nov 15, 2024
61adf5b
Updated name of set schema 2 function call in csv reader
Nov 15, 2024
d9cd059
updated set_schema_2 in metadata to create nested ordered dict
Nov 15, 2024
7fe1964
Extraneous columnns expected in csv dictionary in test function are r…
Nov 15, 2024
ec9b5a0
Logger overwrites not appends to output file
Vedant1 Nov 16, 2024
123c851
fixed description
jpulidojr Nov 17, 2024
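Several commits above describe two database behaviours: one unified `dsi_units` table for all ingested data (56e9d15), and committing an insert only when the workflow's data raises no insertion error, otherwise rolling back (e4229f8, bd3b2f8). The sketch below illustrates that all-or-nothing pattern with Python's built-in `sqlite3`. It is not the DSI backend code: the `math` table, its columns, and the `create_tables`/`ingest` helpers are hypothetical, and the units schema keyed by `(table_name, column_name)` is an assumption.

```python
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    """Create one hypothetical data table plus a single shared units table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS math (id INTEGER PRIMARY KEY, mass REAL, length REAL)"
    )
    # Assumed layout: one units table for every ingested table,
    # keyed by (table_name, column_name).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dsi_units ("
        "table_name TEXT, column_name TEXT, unit TEXT, "
        "PRIMARY KEY (table_name, column_name))"
    )
    conn.commit()


def ingest(conn: sqlite3.Connection, rows, units) -> bool:
    """Insert every row and unit in one transaction; on any error, keep none of them."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.executemany("INSERT INTO math (id, mass, length) VALUES (?, ?, ?)", rows)
            conn.executemany(
                "INSERT INTO dsi_units (table_name, column_name, unit) VALUES ('math', ?, ?)",
                list(units.items()),
            )
    except sqlite3.Error as err:
        # A duplicate primary key (or any other SQL error) lands here after rollback.
        print(f"Ingest rolled back, nothing written: {err}")
        return False
    return True
```

For example, after a first `ingest` of ids 1 and 2, a second call with ids 3 and 1 fails on the duplicate id 1, and the rollback also discards row 3, so the table still holds exactly two rows. Using the connection as a context manager keeps the commit/rollback decision in one place instead of tracking per-statement failures by hand.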
1 change: 1 addition & 0 deletions .github/workflows/test_file_reader.yml
@@ -28,6 +28,7 @@ jobs:
 python -m pip install --upgrade pip
 pip install -r requirements.txt
 pip install .
+pip install graphviz
 - name: Test reader
 run: |
 pip install pytest
5 changes: 4 additions & 1 deletion .github/workflows/test_file_writer.yml
@@ -26,7 +26,10 @@ jobs:
 run: |
 python -m pip install --upgrade pip
 pip install -r requirements.txt
-pip install .
+python -m pip install opencv-python
+pip install .
+pip install graphviz
+sudo apt-get install graphviz
 - name: Test reader
 run: |
 pip install pytest
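The writer workflow installs Graphviz twice because the PyPI `graphviz` package (`pip install graphviz`) is only a Python interface: rendering calls the Graphviz `dot` executable, which comes from the system package (`sudo apt-get install graphviz`). A minimal sketch of an ER-diagram writer that checks for both pieces before rendering is shown below; `graphviz_available`, `erd_dot_source`, and `render_erd` are hypothetical helpers, not the DSI writer's API, and falling back to saving plain DOT text is an assumed behaviour.

```python
import importlib.util
import shutil


def graphviz_available() -> bool:
    """True only if the Python `graphviz` wrapper AND the system `dot` binary are present."""
    has_wrapper = importlib.util.find_spec("graphviz") is not None  # from `pip install graphviz`
    has_dot = shutil.which("dot") is not None  # from `apt-get install graphviz`
    return has_wrapper and has_dot


def erd_dot_source(tables: dict) -> str:
    """Build DOT text for an ER diagram: one record-shaped node per table."""
    lines = ["digraph erd {", "  node [shape=record];"]
    for name, columns in tables.items():
        # Record label: table name on top, then one left-justified (\l) column per row.
        label = "{" + name + "|" + "".join(col + "\\l" for col in columns) + "}"
        lines.append(f'  "{name}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


def render_erd(tables: dict, basename: str) -> str:
    """Render to PNG when Graphviz is usable; otherwise save the plain DOT text."""
    source = erd_dot_source(tables)
    if graphviz_available():
        import graphviz  # deferred: only needed on this branch
        return graphviz.Source(source).render(basename, format="png", cleanup=True)
    with open(basename + ".dot", "w") as fh:
        fh.write(source)
    return basename + ".dot"
```

Building the DOT source as a plain string keeps diagram generation testable even on machines without Graphviz; only the final rendering step depends on the external binary.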
36 changes: 0 additions & 36 deletions .github/workflows/test_plugin.yml

This file was deleted.

5 changes: 3 additions & 2 deletions dsi/backends/parquet.py
@@ -1,7 +1,5 @@
 import pyarrow as pa
 from pyarrow import parquet as pq
-import nbconvert as nbc
-import nbformat as nbf
 import subprocess

 from dsi.backends.filesystem import Filesystem
@@ -46,6 +44,9 @@ def get_cmd_output(cmd: list) -> str:
 return proc.stdout.strip().decode("utf-8")

 def inspect_artifacts(self, collection, interactive=False):
+import nbconvert as nbc
+import nbformat as nbf
+
 """Populate a Jupyter notebook with tools required to look at Parquet data."""
 nb = nbf.v4.new_notebook()
 text = """\
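The Parquet change (commit 718b439) moves the `nbconvert`/`nbformat` imports from module level into `inspect_artifacts`, so importing the backend no longer requires the notebook packages. One side effect visible in the diff: the triple-quoted string now follows the imports, and Python only treats a string as a docstring when it is the first statement, so `inspect_artifacts.__doc__` becomes `None`. The sketch below shows the same deferred-import idea with the docstring kept first; `optional_import` and `ParquetSketch` are hypothetical names used for illustration only.

```python
import importlib
from types import ModuleType


def optional_import(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency only when a feature needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with a message that names the feature needing the package.
        raise ImportError(
            f"'{module_name}' is required for {feature}; install it with pip"
        ) from err


class ParquetSketch:
    """Stand-in for the real backend class; only the import pattern is shown."""

    def inspect_artifacts(self, collection, interactive=False):
        """Populate a Jupyter notebook with tools required to look at Parquet data."""
        # The docstring stays the first statement, so it is still stored in __doc__.
        # nbformat is imported here, on first use, not when the module is loaded.
        nbf = optional_import("nbformat", "inspect_artifacts")
        nb = nbf.v4.new_notebook()
        return nb
```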