fix: fix naive timestamps and int types in duckdb (#148)
Don't require a timezone with our parsed timestamps (otherwise,
we can't parse a timestamp like YYYY-MM-DD).
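
To illustrate the naive-timestamp case, here is a minimal sketch using only the Python standard library. The `parse_iso8601` helper is hypothetical and is not the library's `_compat_from_iso8601_timestamp`, whose body is not shown in this commit. It only shows that a date-only string parses to a value with no timezone attached, so a return type that requires a timezone cannot represent it.

```python
import datetime


def parse_iso8601(value: str) -> datetime.datetime:
    """Hypothetical stand-in for an ISO 8601 parser (assumption, not the real helper)."""
    return datetime.datetime.fromisoformat(value)


# A date-only string carries no offset, so the result is timezone-naive.
date_only = parse_iso8601("2023-11-29")

# A string with an explicit offset yields a timezone-aware result.
with_offset = parse_iso8601("2023-11-29T12:30:00+00:00")

print(date_only.tzinfo)    # no timezone information
print(with_offset.tzinfo)  # carries the +00:00 offset
```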

And make sure to ask Pandas to use modern nullable columns instead
of coerced-float columns when there are nullable-int datasets
(like you see if you have a powerset output table with an integer
column).
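
The nullable-int behavior can be reproduced with Pandas alone. This is a small sketch under assumptions: the column name `cnt` and its values are made up for illustration, and the DataFrame is built by hand rather than returned from DuckDB.

```python
import pandas

# An integer column with a missing value. Pandas has no NaN for its default
# int64 dtype, so the column is coerced to float64.
raw = pandas.DataFrame({"cnt": [10, None, 25]})

# convert_dtypes() switches to the nullable extension types, so whole-number
# floats become the nullable Int64 dtype and the gap becomes <NA>.
converted = raw.convert_dtypes()

print(raw["cnt"].dtype)
print(converted["cnt"].dtype)
```

With the nullable dtype, values print as `10` and `25` rather than `10.0` and `25.0`, and the missing entry is preserved as `<NA>`.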
mikix authored Nov 29, 2023
1 parent f2aafcd commit 9c9e630
Showing 1 changed file with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions cumulus_library/databases.py
@@ -116,7 +116,7 @@ def __init__(self, db_file: str):
             "from_iso8601_timestamp",
             self._compat_from_iso8601_timestamp,
             None,
-            duckdb.typing.TIMESTAMP_TZ,
+            duckdb.typing.TIMESTAMP,
         )

     def insert_tables(self, tables: dict[str, pyarrow.Table]) -> None:
@@ -151,7 +151,11 @@ def cursor(self) -> duckdb.DuckDBPyConnection:
         return self.connection

     def execute_as_pandas(self, sql: str) -> pandas.DataFrame:
-        return self.connection.execute(sql).df()
+        # We call convert_dtypes here in case there are integer columns.
+        # Pandas will normally cast nullable-int as a float type unless
+        # we call this to convert to its nullable int column type.
+        # PyAthena seems to do this correctly for us, but not DuckDB.
+        return self.connection.execute(sql).df().convert_dtypes()

     def close(self) -> None:
         self.connection.close()