Census 2020 example #430

Azaya89 · 2024-10-17T00:34:24Z

Created a new example using the 2020 US census dataset. The dataset exists locally as a large .parq file that will be uploaded to S3 at a later time.

Minor updates to the 2010 census example: moved to Update census 2010 example #459.

NOTES:

The URL added in the downloads section of the anaconda-project.yml file is not a real link, and that is what is causing the CI build failure.

maximlt · 2024-10-17T07:06:33Z

> I suspect it is due to #429 but I'm not sure how to resolve it.

You need to re-create the conda environment locally following the contributing guide.

> The test file added is a 0.1% sample of the full dataset but it is still about 8MB in size. I don't know if that is too large and should be reduced further.

It's still way too large. You should aim for the minimum dataset size possible, it's fine if it's just a few KB as long as it contains data that is representative of the whole dataset. For instance, if the code expects some data category, then it should be in the sample dataset to let the notebook run entirely.
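A minimal sketch of how such a sample could be drawn, assuming the rows are available as a list of dicts and that `category` stands in for whatever categorical column the notebook depends on (the function name, the column name, and the `per_category` value are all hypothetical). Keeping a couple of rows per category keeps the file tiny while still exercising every code path that filters on that column.

```python
import random
from collections import defaultdict


def representative_sample(rows, category_key, per_category=2, seed=0):
    """Return a small subset of `rows` that still contains every category.

    Hypothetical helper: `rows` is a list of dicts and `category_key` names
    the categorical column the notebook relies on.
    """
    rng = random.Random(seed)  # fixed seed so the test file is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_key]].append(row)

    sample = []
    for category in sorted(groups):
        members = groups[category]
        # Take up to `per_category` rows, fewer if the category is rare.
        k = min(per_category, len(members))
        sample.extend(rng.sample(members, k))
    return sample
```

For example, `representative_sample(rows, "category", per_category=2)` on a dataset with three categories returns at most six rows, one or two per category.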

maximlt

Is there an absolute need to rename the original census project census_one? Without doing anything else, this is going to break all the links to its web page and deployment.

I would also not call the new one census_two but census2020.

Azaya89 · 2024-10-17T09:20:48Z

> Is there an absolute need to rename the original census project census_one? Without doing anything else, this is going to break all the links to its web page and deployment.

> I would also not call the new one census_two but census2020.

I imagine renaming the original from census to something else makes sense, seeing as there is now more than one census notebook in the examples gallery (and possibly more in the future). However, when I tried renaming them to census2010 and census2020, the doit validate step emitted a warning that only lowercase characters and underscores are allowed in the name. I wasn't sure ignoring that warning was ideal, which is why I renamed both to the current names.

maximlt · 2024-10-17T10:39:18Z

> However, I tried renaming both to census2010 and census2020 but the doit validate step emits a warning that only lower case characters and underscore allowed in the naming

Sounds like a bug in the validation code, something like census2020 should be allowed.
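The validation code itself isn't quoted in this thread, so the rule below is an assumption: a sketch of a name check that would accept census2020 while still rejecting uppercase letters, hyphens, and names that start with a digit. `PROJECT_NAME_RE` and `is_valid_project_name` are hypothetical names, not the repository's actual API.

```python
import re

# Hypothetical rule: start with a lowercase letter, then any mix of
# lowercase letters, digits and underscores.
PROJECT_NAME_RE = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_project_name(name: str) -> bool:
    """Return True if `name` matches the rule over its whole length."""
    return PROJECT_NAME_RE.fullmatch(name) is not None
```

Using `fullmatch` rather than `match` matters here: `match` would accept `census-2020` because its prefix `census` already matches.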

Azaya89 · 2024-10-17T11:03:40Z

> You need to re-create the conda environment locally following the contributing guide.

Done. Thanks.

> It's still way too large. You should aim for the minimum dataset size possible, it's fine if it's just a few KB as long as it contains data that is representative of the whole dataset. For instance, if the code expects some data category, then it should be in the sample dataset to let the notebook run entirely.

Reduced it to under 1 MB now.

maximlt · 2024-10-21T06:58:52Z

Replying to your comment elsewhere:

> Thank you. I'm still in favor of renaming the first one to census2010 though.

If you intend to rename it, then redirect links have to be set up:

Full link: https://examples.holoviz.org/gallery/census/census.html to https://examples.holoviz.org/gallery/census2010/census2010.html
Shortcut link: https://examples.holoviz.org/census to https://examples.holoviz.org/census2010
Unfortunately, it's not easy to set up a redirect for the deployment itself (https://census-notebook.holoviz-demo.anaconda.com/notebooks/census.ipynb), so renaming would break it. We recently broke all the deployment links anyway (when they moved to a new subdomain) and, as far as I know, no one complained, so it probably wouldn't be too bad.
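How examples.holoviz.org actually serves redirects isn't described here, so the following is only a sketch under that caveat: it writes static HTML stubs that use a meta refresh to send the two old URLs to the new ones. The mapping keys, in particular `census/index.html` for the shortcut link, are assumptions about the site layout, and `write_redirect_stubs` is a hypothetical helper.

```python
from pathlib import Path

# Hypothetical mapping: old path (relative to the built site root) -> new URL.
REDIRECTS = {
    "gallery/census/census.html": "https://examples.holoviz.org/gallery/census2010/census2010.html",
    # Assumes the /census shortcut is served from census/index.html.
    "census/index.html": "https://examples.holoviz.org/census2010",
}

# Minimal HTML page that immediately redirects the browser to {url}.
REDIRECT_TEMPLATE = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="0; url={url}">
    <link rel="canonical" href="{url}">
  </head>
  <body><a href="{url}">This page has moved to {url}</a></body>
</html>
"""


def write_redirect_stubs(site_root: Path, redirects: dict[str, str]) -> list[Path]:
    """Write one redirect page per old path and return the written paths."""
    written = []
    for old_path, new_url in redirects.items():
        target = site_root / old_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(REDIRECT_TEMPLATE.format(url=new_url), encoding="utf-8")
        written.append(target)
    return written
```

As noted above, this does nothing for the deployment URL, which is served separately from the static site.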

Alternatively, we could just:

Change the title property in the project YAML to Census 2010
Change the notebook top-level heading to Census 2010

Azaya89 · 2024-10-21T11:17:53Z

> Alternatively, we could just:

> Change the title property in the project YAML to Census 2010

> Change the notebook top-level heading to Census 2010

I already did these in this PR. Would that be enough to differentiate both examples eventually?

maximlt · 2024-10-21T11:35:50Z

> Would that be enough to differentiate both examples eventually?

I think so?

Azaya89 · 2024-10-21T14:34:31Z

> I think so?

OK. I will revert the other renaming then.

census2020/census2020.ipynb

hoxbro · 2024-11-07T07:44:35Z

My suggestion was that you use the processing script to save it to disk as new data and use that data in the notebook.

Azaya89 · 2024-11-07T10:07:51Z

> My suggestion was that you use the processing script to save it to disk as new data and use that data in the notebook.

Oh? Alright then. Will do...

maximlt · 2024-11-15T05:49:38Z

@Azaya89 you will need to re-lock the project as the solve is failing:

Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed

PackagesNotFoundError: The following packages are not available from current channels:

  - libcurl==8.11.0=hbbe4b11_0

Not your fault: conda-forge sometimes marks packages as broken (by adding the broken label on conda-forge), which means those packages are no longer available from the conda-forge channel but only from conda-forge/label/broken.

conda-forge/admin-requests#1147

jbednar · 2024-12-02T17:01:04Z

It was hard to follow the discussion above, but it looks like the original one is still called census rather than census2010, and if so, I agree -- let's preserve those links. We'll put a link to census2020 within census so that wherever someone lands they will find both.

Even apart from the file size, the test data seems more complex than necessary. I think you can provide an option to write_parquet to store the test data into a single flat .parq file rather than a directory full of separate part files. Looks like the old census didn't do that, but I don't think there was a good reason for that, as e.g. opensky uses a single parquet file.

Azaya89 · 2024-12-02T20:30:13Z

> It was hard to follow the discussion above, but it looks like the original one is still called census rather than census2010, and if so, I agree -- let's preserve those links.

Correct.

> We'll put a link to census2020 within census so that wherever someone lands they will find both.

OK. That will require a separate PR then.

> Even apart from the file size, the test data seems more complex than necessary. I think you can provide an option to write_parquet to store the test data into a single flat .parq file rather than a directory full of separate part files. Looks like the old census didn't do that, but I don't think there was a good reason for that, as e.g. opensky uses a single parquet file.

OK. I will do that.

maximlt · 2024-12-02T20:37:13Z

> OK. That will require a separate PR then.

It'd make sense to do it in this PR.

Azaya89 requested a review from hoxbro October 17, 2024 00:34

maximlt reviewed Oct 17, 2024


Azaya89 added 17 commits October 19, 2024 09:51

minor updates (7890e28)
renamed 2010 project (91ae8cd)
created 2020 example dir (06fa9d8)
further WIP (0057220)
re-lock files (08c9413)
more WIP (5ed6311)
add thumbnails image and tag (e5cbae6)
more WIP (a6751b2)
renamed 2010 test data dir (5cc1021)
added 2020 test data files (fcf8b95)
renamed both census files (5175db5)
re-lock files (55a1536)
reduced test data size (d15fba4)
renamed 2020 example (abdb229)
renamed 2010 example (ea8a141)
reduced 2020 test files (a53fa76)
more renaming (b18baf4)

Azaya89 force-pushed the azaya/census branch from 5bd8da9 to b18baf4 Compare October 19, 2024 10:32

revert census2010 renaming (5e7610e)

Azaya89 commented Oct 22, 2024

census2020/census2020.ipynb (outdated, resolved)

hoxbro mentioned this pull request Nov 4, 2024

enh(bokeh): Add select to ImageStack holoviz/holoviews#6437

Draft

Azaya89 added 3 commits November 6, 2024 20:27

added data processing script (815892b)
re-lock files (29696ea)
added dashboard WIP (d65cb69)

Azaya89 added 4 commits November 7, 2024 11:44

saved processed file to disk (bf9e6dc)
merged main into branch (91578ce)
update file and test data (448760a)
update dependencies and dashboard (b5f5284)

This was referenced Nov 19, 2024

Update census example to use datashader 0.14.5a1 #271 (Closed)
Drop spatial indexing (for now) #131 (Closed)
Replace the use of stamen tiles in the repo #451 (Merged)

Azaya89 added 6 commits November 29, 2024 16:02

update lock files to close #271 (637db36)
notebook update (2dccfb5)
reverted all changes to census 2010 (a2532b7)
Merge branch 'main' into azaya/census (2f41638)
Merge branch 'main' into azaya/census (7dea01e)
update dashboard (f300bda)

Azaya89 force-pushed the azaya/census branch from f60ed1f to f300bda Compare December 2, 2024 15:29

Azaya89 self-assigned this Dec 2, 2024
