Add value-based tests for table ingestion #129
Attention: when reading the files from Parquet back into a pandas DataFrame, integer columns that contain null values are implicitly converted from int to float (read here), which makes the equivalence assertions fail. To make sure the types are identical, even for the new NaN columns, cast them explicitly (with …
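A minimal sketch of the effect described above, using plain pandas instead of an actual Parquet round-trip (the column name `count` and its values are made up for illustration). A missing value forces an integer column to `float64`. Casting both sides explicitly to pandas' nullable `Int64` dtype is one way to make the dtypes identical again.

```python
import numpy as np
import pandas as pd

# Hypothetical reference table: an integer column without missing values.
ref = pd.DataFrame({"count": [10, 20, 30]})

# NumPy-backed int64 cannot hold NaN, so a column with a missing value
# is stored as float64 (the same silent int -> float conversion).
read_back = pd.DataFrame({"count": [10, np.nan, 30]})
print(read_back.dtypes["count"])  # float64

# Cast both sides explicitly to the nullable integer dtype "Int64";
# NaN becomes <NA> and the dtypes now match.
ref_cast = ref.astype({"count": "Int64"})
read_back_cast = read_back.astype({"count": "Int64"})
assert ref_cast.dtypes.equals(read_back_cast.dtypes)
```

Casting to `Int64` only works when the non-missing floats are integral. Casting a NaN column straight to `int64` raises an error instead.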
Totally agree with adding value-based tests: our current approach of only checking the shape is very dangerous!
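To illustrate the concern, here is a small sketch with made-up tables. Two frames with the same shape pass a shape-only check even though one value differs. `pandas.testing.assert_frame_equal` catches the difference.

```python
import pandas as pd

expected = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
actual = pd.DataFrame({"a": [1, 2, 4], "b": ["x", "y", "z"]})

# A shape-only test passes although one value is wrong.
assert actual.shape == expected.shape

# A value-based test detects the mismatch.
try:
    pd.testing.assert_frame_equal(actual, expected)
    values_match = True
except AssertionError:
    values_match = False
print(values_match)  # False
```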
I am adding a documentation snippet that was previously in …:
""" --------- code snippets for testing code ---------
# -------- pandas dataframe alignment ----------------
# (note: missing columns are added with the same names as in
# ref_dataframe, but filled with NaN, so their dtype generally
# differs from the reference dtype and must be cast explicitly.)
dataframe, ref_dataframe_new = dataframe.align(ref_dataframe, join="right", axis=1)
# assert that the reference table has not been modified by the alignment.
assert ref_dataframe_new.equals(ref_dataframe)
# --------- identical pandas schemata -----------------
assert dataframe.dtypes.equals(ref_dataframe.dtypes)
# --------- identical pyarrow schemata ----------------
# (note: use "==" for pyarrow schema comparisons, not "is")
table = pyarrow.Table.from_pandas(dataframe)
assert (table.schema.types == writers_dict[name]["schema"].types)
assert (table.schema.names == writers_dict[name]["schema"].names)
"""
Current tests only check the size (but not the content) of the concatenated tables (for both `--parquet` and `--sqlite`). However, the tables have been modified (the `TableNumber` column, added prefixes) in `get_and_modify_df` and `write_to_disk()`. To compare the written and chopped files with the original ones, value-based tests would have to modify the original tables in the same way.
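A minimal sketch of such a value-based test follows. It assumes, purely for illustration, that the modification is adding a `TableNumber` column holding the index of the source table. The real changes made in `get_and_modify_df` and `write_to_disk()` may differ. `write_tables` is a hypothetical stand-in for the code under test, and an in-memory SQLite database stands in for the `--sqlite` output.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def write_tables(tables, conn):
    """Hypothetical stand-in for the code under test: tag each table with a
    TableNumber column, concatenate, and write the result to SQLite."""
    tagged = [df.assign(TableNumber=i) for i, df in enumerate(tables)]
    pd.concat(tagged, ignore_index=True).to_sql("data", conn, index=False)


# Made-up original tables.
originals = [
    pd.DataFrame({"x": [1, 2], "y": ["a", "b"]}),
    pd.DataFrame({"x": [3], "y": ["c"]}),
]

# Write the tables, then read the result back.
with closing(sqlite3.connect(":memory:")) as conn:
    write_tables(originals, conn)
    written = pd.read_sql_query("SELECT * FROM data", conn)

# Expected result: apply the same modification to the original tables.
expected = pd.concat(
    [df.assign(TableNumber=i) for i, df in enumerate(originals)], ignore_index=True
)

# Shape-only check (what the current tests do) ...
assert written.shape == expected.shape
# ... and the value-based check proposed here.
pd.testing.assert_frame_equal(written, expected)
```

The expected side deliberately re-implements the modification instead of calling the code under test. Otherwise a bug in the modification would appear on both sides and go unnoticed.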