Updated nyc_buildings to use higher res, have a dashboard, and explain more
jbednar committed Apr 21, 2022
1 parent 914be0e commit 2570798
Showing 1 changed file with 97 additions and 28 deletions.
125 changes: 97 additions & 28 deletions nyc_buildings/nyc_buildings.ipynb
@@ -7,18 +7,18 @@
"# NYC Buildings\n",
"Written by Philipp Rudiger<br>\n",
"Created: January 27, 2021<br>\n",
"Last updated: August 4, 2021"
"Last updated: April 20, 2022"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Many plotting libraries can handle collections of polygons, e.g. [Bokeh](https://docs.bokeh.org/en/latest/docs/gallery/texas.html) or [HoloViews+Bokeh](http://holoviews.org/gallery/demos/bokeh/choropleth_data_link.html). However, because browser-based libraries like Bokeh and Plotly send all the polygon data to the browser, they can struggle when either the collections or the polygons themselves get large. Even natively in Python, typical formats like Shapely for representing polygons scale poorly to large polygon collections, because each polygon is wrapped up as a separate Python object, leading to a lot of duplicated storage overhead when many polygons of the same type are defined.\n",
"Many plotting libraries can handle collections of polygons, including [Bokeh](https://docs.bokeh.org/en/latest/docs/gallery/texas.html) and [HoloViews](http://holoviews.org/gallery/demos/bokeh/choropleth_data_link.html). However, because browser-based libraries like Bokeh and Plotly send all the polygon data to JavaScript running in the browser, they can struggle when either the collections or the individual polygons get large. Even natively in Python, typical formats like Shapely for representing polygons scale poorly to large polygon collections, because each polygon is wrapped up as a full, separate Python object, leading to a lot of duplicated storage overhead when many polygons of the same type are defined.\n",
"\n",
"If you want to work with lots of polygons, here you can see how to use [SpatialPandas](https://github.com/holoviz/spatialpandas) and Dask to represent polygons efficiently in memory, fastparquet to represent them efficiently on disk, and [Datashader](https://datashader.org) to render them quickly in a web browser. This notebook also demonstrates how to support hovering for datashaded polygons, with Bokeh overlaying a single vector-based representation of a polygon where the mouse cursor is, while all the rest are sent to the browser only as rendered pixels. That way hover and other interactive features can be supported fully without ever needing to transfer large amounts of data or store them in the limited memory of the web browser tab. \n",
"If you want to work with lots of polygons, here you can see how to use [SpatialPandas](https://github.com/holoviz/spatialpandas) and [Dask](https://dask.org) to represent large collections of polygons efficiently in memory, [fastparquet](https://fastparquet.readthedocs.io/) to represent them efficiently on disk, [Datashader](https://datashader.org) to render them quickly in a web browser, and [HoloViews](https://holoviews.org) to provide a convenient API. This notebook also demonstrates how to support hovering for datashaded polygons, with HoloViews setting up Bokeh to overlay a single vector-based representation of a polygon where the mouse cursor is, while all the rest are sent to the browser only as rendered pixels. That way hover and other interactive features can be supported fully without ever needing to transfer large amounts of data or store them in the limited memory of the web browser tab. \n",
"\n",
"This example plots the outlines of all the buildings in New York City. See\n",
"This example plots the outlines of all one million+ buildings in New York City. See\n",
"[nyc.gov](https://www1.nyc.gov/site/doitt/residents/gis-2d-data.page) for the original data and its description."
]
},
@@ -35,9 +35,11 @@
"import spatialpandas.io\n",
"\n",
"from dask.diagnostics import ProgressBar\n",
"from holoviews.operation.datashader import (\n",
" rasterize, datashade, inspect_polygons\n",
")\n",
"from holoviews.operation.datashader import rasterize, datashade, inspect_polygons\n",
"\n",
"# Add more resolution to dynamic plots, particularly important for Retina displays\n",
"from holoviews.streams import PlotSize\n",
"PlotSize.scale=2.0\n",
"\n",
"hv.extension('bokeh')"
]
@@ -48,14 +50,18 @@
"metadata": {},
"outputs": [],
"source": [
"ddf = spd.io.read_parquet_dask('./data/nyc_buildings.parq').persist()"
"ddf = spd.io.read_parquet_dask('./data/nyc_buildings.parq').persist()\n",
"print(len(ddf))\n",
"ddf.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we compute the top categories and drop everything else:"
"Here you can see that we have 1.1 million \"MultiPolygons\", some of which have a `type` and `name` declared.\n",
"\n",
"To get a look at this data, let's plot all the polygons, overlaid on a tiled map of the region:"
]
},
{
@@ -64,11 +70,47 @@
"metadata": {},
"outputs": [],
"source": [
"cats = list(ddf.type.value_counts().compute().iloc[:10].index.values) + ['unknown']\n",
"polys = hv.Polygons(ddf, vdims='type')\n",
"tiles = hv.element.tiles.CartoLight()\n",
"tiles * rasterize(polys, aggregator='any').opts(width=600, height=500)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At this scale, the plot looks like a bunch of dots or large colored areas, because each building is smaller than a pixel in the plot. But if you have a live Python server running, you can use the Bokeh tools to zoom in and have the plot dynamically redrawn, showing you the full outline of each polygon. You should see more detail whenever you zoom in, possibly after a short delay while Datashader re-renders the new scene.\n",
"\n",
"Now let's make use of the category information. To get a manageable number of types, we'll compute the top 10 most common categories and drop everything else:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cats = list(ddf.type.value_counts().compute().iloc[:10].index.values) + \\\n",
" ['unknown']\n",
"\n",
"ddf['type'] = ddf.type.replace({None: 'unknown'})\n",
"ddf = ddf[ddf.type.isin(cats)]\n",
"ddf['type'] = ddf['type'].astype('category').cat.as_known()\n",
"\n",
"ddf['type'] = ddf['type'].astype('category').cat.as_known()"
]
},
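Note on the category-filtering cell above: a minimal sketch of the same "top N plus unknown" logic on a plain Python list, in case it helps when reading the Dask version. It is not the notebook's code; the helper names `top_categories` and `keep_known` and the sample labels are made up for illustration, and `collections.Counter` stands in for `value_counts()`.

```python
from collections import Counter


def top_categories(types, n=10, fill="unknown"):
    """Return the n most common non-missing labels, followed by the fill label."""
    # Missing labels (None) are skipped when counting.
    counts = Counter(t for t in types if t is not None)
    return [label for label, _ in counts.most_common(n)] + [fill]


def keep_known(types, cats, fill="unknown"):
    """Replace missing labels with the fill label, then drop labels not in cats."""
    filled = [fill if t is None else t for t in types]
    return [t for t in filled if t in cats]


# Hypothetical sample data, not taken from the NYC dataset.
types = ["house", "garage", None, "house", "shed", "house", "garage", None]
cats = top_categories(types, n=2)
kept = keep_known(types, cats)
print(cats)
print(kept)
```

The order of operations matches the cell: the category list is computed first, missing values are then mapped to `'unknown'`, and only afterwards are rows outside the list dropped, so the rare `shed` label disappears while the unlabeled entries survive.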
{
"cell_type": "markdown",
"metadata": {},
"source": [
"SpatialPandas lets us build a spatial index for accessing spatially organized regions more quickly, so let's do that:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with ProgressBar():\n",
" ddf = ddf.build_sindex().persist()"
]
@@ -77,7 +119,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we build a legend for the categories and declare a tile source as backdrop:"
"Now we can view each category separately with a selector widget:"
]
},
{
@@ -86,21 +128,43 @@
"metadata": {},
"outputs": [],
"source": [
"colors = cc.glasbey_bw_minc_20_maxl_70\n",
"color_key = {cat: tuple(int(e*255.) for e in colors[i]) for i, cat in enumerate(cats)}\n",
"legend = hv.NdOverlay({k: hv.Points([0,0], label=str(k)).opts(\n",
" color=cc.rgb_to_hex(*v), size=0, apply_ranges=False) \n",
" for k, v in color_key.items()}, 'Type')\n",
"polys = hv.Polygons(ddf, vdims='type')\n",
"\n",
"tiles = hv.element.tiles.CartoLight().opts(\n",
" min_height=500, responsive=True, xaxis=None, yaxis=None)"
"hmap = hv.HoloMap({ cat: polys.select(type=cat) for cat in cats}, 'Type')\n",
"rcats = rasterize(hmap, aggregator='any').opts(width=600, height=500)\n",
"\n",
"tiles * rcats"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we put it all together, declaring a `Polygons` element from our data, datashade them and use the `inspect_polygons` operation to allow us to hover on the data:"
"If you look at each one, you can see that unfortunately most of the buildings have an unknown category, but there are interesting patterns (e.g. almost no garages in Manhattan, and apparently all the sheds are in New Jersey).\n",
"\n",
"Since these buildings don't normally overlap, we can actually combine them all into a single plot using color to show all of the categories (though we have to construct a color key manually):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"colors = cc.glasbey_bw_minc_20_maxl_70\n",
"color_key = {cat: tuple(int(e*255.) for e in colors[i]) \n",
" for i, cat in enumerate(cats)}\n",
"legend = hv.NdOverlay({k: hv.Points([(0,0)], label=str(k)).opts(\n",
" color=cc.rgb_to_hex(*v), size=0, \n",
" apply_ranges=False)\n",
" for k, v in color_key.items()}, 'Type')"
]
},
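Note on the color-key cell above: a small standalone sketch of the conversion it performs, mapping each category to a 0-255 integer RGB tuple taken from a palette of 0-1 floats. The three-color palette, the category names, and the `to_hex` helper are hypothetical stand-ins (the notebook uses `cc.glasbey_bw_minc_20_maxl_70` and `cc.rgb_to_hex`), chosen only so the example runs without colorcet.

```python
def to_rgb255(rgb):
    """Scale an (r, g, b) tuple of 0-1 floats to 0-255 integers.

    int() truncates rather than rounds, as in the notebook's expression.
    """
    return tuple(int(c * 255.) for c in rgb)


def to_hex(rgb255):
    """Format a 0-255 integer RGB tuple as a '#rrggbb' string (hypothetical helper)."""
    return "#{:02x}{:02x}{:02x}".format(*rgb255)


# Hypothetical palette and categories, for illustration only.
palette = [(1.0, 0.0, 0.0), (0.0, 0.5, 1.0), (0.25, 0.75, 0.0)]
cats = ["house", "garage", "unknown"]

# Same shape as the notebook's color_key: category -> (r, g, b) integers.
color_key = {cat: to_rgb255(palette[i]) for i, cat in enumerate(cats)}
hex_key = {cat: to_hex(rgb) for cat, rgb in color_key.items()}
print(color_key)
print(hex_key)
```

Pairing categories with palette entries by position means the palette needs at least as many colors as there are categories; with fewer, the lookup raises an `IndexError`.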
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's show the color-coded plot, and use the `inspect_polygons` operation to allow us to hover on the data dynamically:"
]
},
{
@@ -109,20 +173,21 @@
"metadata": {},
"outputs": [],
"source": [
"polys = hv.Polygons(ddf, vdims='type')\n",
"\n",
"shaded = datashade(polys, color_key=color_key, aggregator=ds.by('type', ds.any()))\n",
"\n",
"hover = inspect_polygons(shaded).opts(fill_color='red', tools=['hover'])\n",
"\n",
"tiles * shaded * legend * hover"
"plot = tiles * shaded * legend * hover\n",
"plot.opts(min_height=500, min_width=600, responsive=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally we will plot each category of buildings separately:"
"If you zoom into an area of interest and then hover over the polygons with a mouse, you'll see an overlay of the building at that location, with hover information indicating its type and name when available. \n",
"\n",
"Finally, we'll make this notebook into a shareable app (run with `panel serve nyc_buildings.ipynb`, or `anaconda-project run dashboard`):"
]
},
{
@@ -131,9 +196,13 @@
"metadata": {},
"outputs": [],
"source": [
"hv.NdLayout({\n",
" cat: hv.element.tiles.CartoLight() * rasterize(polys.select(type=cat), aggregator='any') for cat in cats\n",
"}, 'Type').opts('Image', width=250, height=400, xaxis=None, yaxis=None).cols(4)"
"text = \"\"\"\n",
"# [1 million buildings in NYC](https://examples.pyviz.org/nyc_buildings)\n",
"## Rendered using [Datashader](https://datashader.org) and [HoloViews](https://holoviews.org).\n",
"\"\"\"\n",
"\n",
"import panel as pn\n",
"pn.Column(text, pn.panel(plot, sizing_mode='stretch_width')).servable();"
]
}
],