All data sets used in geobr are currently stored as GeoPackage .gpkg files. The choice of GeoPackage was an easy one: it is a robust, compact, open-standard format for geospatial data. A key aspect here is that .gpkg files are platform-independent, so we can make sure that geobr data is consistent for both R and Python users.
Nonetheless, we are seeing major advances with the development of GeoParquet, a new data format to store geospatial vector data (points, lines, polygons). GeoParquet is built on top of Apache Parquet, a popular columnar storage format for tabular data. It is much (much!) more efficient than GeoPackage, both in file size and in the speed of reading and writing files. I believe it's safe to say that GeoParquet has a bright future in the geospatial industry because of its flexibility and efficiency.
What to expect:
I would like to migrate all data sets available in geobr from GeoPackage to the GeoParquet .parquet format in geobr v2.0. This should be done in 2023. I need some time to fix some issues in geobr, and it would be good to wait a little longer to see GeoParquet become a stable specification with more robust and stable packages to manipulate GeoParquet in R and Python.
How will this affect geobr users?
The only meaningful way this will affect users is that geobr v2.0 will be much faster. Because GeoParquet files are much smaller and the format is more efficient for I/O, download and reading times should be significantly reduced.
I will keep GeoPackage files stored for a while to make sure we have a very smooth transition.
How will this affect geobr developers?
There are already libraries that can read GeoParquet files in both R and Python (see below). geobr v2.0 will need to include just a couple more package dependencies to be able to read geospatial data in .parquet format. In practice, this should have minimal effect on code development.
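On the Python side, the transition period described above (GeoParquet preferred, GeoPackage kept as a fallback) could look roughly like the sketch below. This is only an illustration, not geobr's actual code: `pick_file`, `read_dataset`, `FORMAT_PRIORITY` and the file names are hypothetical, and it assumes geopandas (with pyarrow) is the reader, using its `read_parquet` and `read_file` functions.

```python
from pathlib import Path

# Preferred format first; GeoPackage stays as a fallback during the transition.
# (Hypothetical constant, not part of geobr.)
FORMAT_PRIORITY = (".parquet", ".gpkg")


def pick_file(available, stem):
    """Return the best available file name for a dataset stem, or None."""
    names = set(available)
    for ext in FORMAT_PRIORITY:
        candidate = f"{stem}{ext}"
        if candidate in names:
            return candidate
    return None


def read_dataset(path):
    """Read a geospatial file into a GeoDataFrame based on its extension."""
    # Imported lazily so that pick_file can be used without geopandas installed.
    import geopandas as gpd

    if Path(path).suffix == ".parquet":
        return gpd.read_parquet(path)  # GeoParquet
    return gpd.read_file(path)  # GeoPackage and other formats
```

For example, `pick_file(["states.gpkg", "states.parquet"], "states")` picks the `.parquet` file, while a listing that only contains `states.gpkg` still resolves to the GeoPackage file.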
The python team supports this decision emphatically.
I just recommend planning the transition carefully, given that the GeoParquet spec is not stable yet. Their current documentation expects stability at version v1.0.0, but they are still at version v0.3.0 (see text below).
Roadmap
Our aim is to get to a 1.0.0 within 'months', not years. The rough plan is:
0.1 - Get the basics established, provide a target for implementations to start building against.
0.2 / 0.3 - Feedback from implementations, 3D coordinates support, geometry types, crs optional.
0.4 - Feedback from implementations, add spatial index.
0.x - Several iterations based on feedback from implementations.
1.0.0-RC.1 - Aim for this when there are at least 6 implementations that all work interoperably and all feel good about the spec.
1.0.0 - Once there are 12(?) implementations in diverse languages we will lock in for 1.0
Our detailed roadmap is in the Milestones and we'll aim to keep it up to date.
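Since stability is only promised from 1.0.0 onwards, one possible safeguard during the transition is to check the spec version a file declares. GeoParquet files carry a JSON document under the "geo" key of the Parquet file metadata, and that document has a `version` field. The sketch below works on that JSON only; `is_stable_spec`, `parse_version` and `STABLE_FROM` are hypothetical names, and treating any pre-release suffix (such as `-rc.1`) as unstable is an assumption.

```python
import json

# Spec version from which the roadmap promises stability.
STABLE_FROM = (1, 0, 0)


def parse_version(text):
    """Split '0.3.0' or '1.0.0-rc.1' into (tuple of ints, has pre-release tag)."""
    core, _, pre = text.lstrip("v").partition("-")
    return tuple(int(part) for part in core.split(".")), bool(pre)


def is_stable_spec(geo_metadata_json):
    """Return True if the 'geo' metadata declares a stable (>= 1.0.0) spec version."""
    meta = json.loads(geo_metadata_json)
    numbers, is_prerelease = parse_version(meta["version"])
    return numbers >= STABLE_FROM and not is_prerelease
```

With pyarrow installed, the JSON can be obtained with something like `pyarrow.parquet.read_schema(path).metadata[b"geo"]`; a file written against v0.3.0 would then be reported as not yet stable.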
In order to implement GeoParquet in geobr, we still need to investigate the best approaches and packages for reading GeoParquet into R and Python. Because this is all very recent, it might take a few months before we have stable R and Python packages to do this.