Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/249 support geoparquet #254

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

nathanjmcdougall
Copy link
Contributor

@nathanjmcdougall nathanjmcdougall commented Jul 19, 2024

To resolve #249.

I am developing on Windows so adding a dependency and updating the requirements file is a little cumbersome.

The way I approached this is to use uv with

uv pip compile setup.cfg --extra=doc --extra=test --python-platform=linux --output-file=requirements/dev.txt --unsafe-package=pip --unsafe-package=setuptools

And then I did some massaging of the diff to remove some unnecessary stylistic changes introduced by uv.
Also pip-compile ignores platform specifiers, so uv didn't include appnope==0.1.4 (it is a Darwin-only dependency of ipykernel), but I manually added it back to minimize the diff.

This might be an argument in favour of switching to uv pip compile over pip-compile but it also might be an argument in favour of me developing on Linux 👀

@nathanjmcdougall
Copy link
Contributor Author

The pipeline failed but it seems just to be an intermittent networking issue, from the raw logs
The test pins.boards.BoardManual which uses a GitHub-hosted dataset is returning this error:

requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /repos/rstudio/pins-python

@@ -206,10 +234,20 @@ def default_title(obj, name):
import pandas as pd

if isinstance(obj, pd.DataFrame):
try:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might play really nicely with the changes you've made in #263 👀 What if we merge that PR first and then refactor geopandas dataframes to be part of _get_df_family?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree.

# TODO(compat): title says CSV rather than data.frame
# see https://github.com/machow/pins-python/issues/5
shape_str = " x ".join(map(str, obj.shape))
return f"{name}: a pinned {shape_str} DataFrame"
return f"{name}: a pinned {shape_str} {obj_name}"
Copy link
Collaborator

@isabelizimm isabelizimm Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, as someone who is not a geopandas frequent flyer-- is it important to you to denote that it is a geopandas dataframe, rather than just "DataFrame"? I partially ask since we won't specify between pandas/polars (although I do realize that is mostly due to the fact polars dataframes do not round-trip)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I think GeoDataFrame would be more informative in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GeoDataFrames contain extra metadata such as a coordinate system, an assigned geometry column, and in general "feel" quite different to a standard DataFrame. The naming convention reflects this: using _gdf in variable names instead of _df.

@nathanjmcdougall nathanjmcdougall force-pushed the feature/249-support-geoparquet branch from 701248d to da71966 Compare July 25, 2024 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support GeoParquet
2 participants