Census 2020 example #430
You need to re-create the conda environment locally following the contributing guide.
It's still way too large. You should aim for the smallest dataset possible; it's fine if it's just a few KB, as long as it contains data that is representative of the whole dataset. For instance, if the code expects some data category, then that category should be present in the sample dataset so that the notebook runs in its entirety.
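As a rough illustration of the point above, here is a minimal sketch of shrinking a dataset while keeping at least one row for every category the notebook relies on. The helper `sample_per_category`, the `race` column, and the sample rows are hypothetical placeholders, not the actual census schema or the project's processing script.

```python
from collections import defaultdict


def sample_per_category(rows, category_key, per_category=1):
    """Keep at most `per_category` rows for each distinct category value."""
    kept = []
    counts = defaultdict(int)
    for row in rows:
        value = row[category_key]
        if counts[value] < per_category:
            counts[value] += 1
            kept.append(row)
    return kept


# Hypothetical rows; the real column names and values may differ.
rows = [
    {"x": 0.1, "y": 1.0, "race": "a"},
    {"x": 0.2, "y": 1.1, "race": "a"},
    {"x": 0.3, "y": 1.2, "race": "b"},
    {"x": 0.4, "y": 1.3, "race": "c"},
    {"x": 0.5, "y": 1.4, "race": "c"},
]

sample = sample_per_category(rows, "race", per_category=1)

# Every category present in the full data is still present in the sample.
assert {r["race"] for r in sample} == {r["race"] for r in rows}
```

With a pandas DataFrame, the same idea could be expressed with `df.groupby("race").head(1)`, assuming the category column is named `race`.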
Is there an absolute need to rename the original `census` project `census_one`? Without doing anything else, this is going to break all the links to its web page and deployment.
I would also not call the new one `census_two` but `census2020`.
I imagine renaming the original from |
Sounds like a bug in the validation code, something like |
Done. Thanks
Reduced it to <1MB now.
Replying to your comment elsewhere:
If you intend to rename it, then redirect links have to be set up:
Alternatively, we could just:
I already did these in this PR. Would that be enough to differentiate both examples eventually?
I think so?
OK. I will revert the other renaming then.
My suggestion was that you use the processing script to save it to disk as new data and use that data in the notebook.
Oh? Alright then. Will do...
@Azaya89 you will need to re-lock the project as the solve is failing:
Not your fault, sometimes conda-forge marks some packages as broken (adding the |
It was hard to follow the discussion above, but it looks like the original one is still called Even apart from the file size, the test data seems more complex than necessary. I think you can provide an option to |
Correct.
OK. That will require a separate PR then.
OK. I will do that.
It'd make sense to do it in this PR.
Created a new example using the 2020 US census dataset. The file exists locally as a large `.parq` file that will be uploaded to S3 at a later time.

NOTES:
The URL in the `downloads` section of the `anaconda-project.yml` file is not a real link, and that is what is causing the CI build failure.