Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

173 analyse network #200

Merged
merged 45 commits into from
Dec 15, 2023
Merged

173 analyse network #200

merged 45 commits into from
Dec 15, 2023

Conversation

SergioRec
Copy link
Contributor

@SergioRec SergioRec commented Nov 9, 2023

Description

Module analyse_network.

Closes #173
Fixes #211

Motivation and Context

This module is a wrapper for r5py to calculate an origin-destination matrix to feed into the matrix module. It takes the outputs from the urban_centres and rasterpop modules.

Module allows selection of a different number of origins to calculate O-D matrices in batches. It saves the outputs as parquet files, one per origin batch. However, it will split any parquet file exceeding a certain size.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

Module run locally, outputs checked manually.
Unit tests run locally.

Test configuration details:

  • OS: MacOS Ventura
  • Python version: 3.9.13
  • Java version: 11
  • Python management system: pip/conda

Advice for reviewer

Run times:

  • Newport: all origins in batch, ~3 minutes
  • Marseille: all origins in batch, ~10 minutes
  • Leeds: all origins in batch, ~ 1 hour
  • London: one origins per batch, ~40 hours

Checklist:

  • My code follows the intended structure of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Additional comments

I've chosen to not explicitly ask for keyword arguments from r5py and leave them as **kwargs. This may not be a good choice, but I wasn't sure if arbitrarily including some arguments and not others would be messy. Happy to change it.

Regarding the defence against dates not being in the GTFS (which also triggers when no valid GTFS is passed and TransportMode.TRANSIT is included), it's not straightforward to access any attribute within TransportNetwork or TravelTimeMatrixComputer that exposes GTFS information. The current solution catches the specific runtime warning and raises an IndexError when present.

@codecov-commenter
Copy link

codecov-commenter commented Nov 9, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1156895) 98.18% compared to head (707dbe5) 98.30%.

❗ Current head 707dbe5 differs from pull request most recent head e351984. Consider uploading reports for the commit e351984 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #200      +/-   ##
==========================================
+ Coverage   98.18%   98.30%   +0.11%     
==========================================
  Files          18       19       +1     
  Lines        1433     1530      +97     
==========================================
+ Hits         1407     1504      +97     
  Misses         26       26              
Flag Coverage Δ
unittests 98.30% <100.00%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@SergioRec SergioRec marked this pull request as draft November 9, 2023 09:01
@SergioRec SergioRec marked this pull request as ready for review November 9, 2023 15:35
@SergioRec SergioRec requested a review from ethan-moss November 9, 2023 15:35
@ethan-moss ethan-moss self-assigned this Dec 6, 2023
Copy link
Collaborator

@ethan-moss ethan-moss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've completed a review of the source code updates. The main functionalities all LGTM. I've tested them with a few urban centres and:

  • we're getting consistent results to those we got previously, and
  • transport performance metrics are identical whether calculated using the batched mode or not.

Just sharing initial comments at this stage, as it would be useful to pick out which of these (if any at all) need to be implemented right away, tagged as tech-debt, or treat as 'won't fix'. Let me know your thoughts on this.

In the meantime I'll continue to review the unit tests and add comments

notebooks/urban_centres/buffer_experiments.py Show resolved Hide resolved
src/transport_performance/analyse_network.py Outdated Show resolved Hide resolved
requirements.txt Show resolved Hide resolved
src/transport_performance/analyse_network.py Show resolved Hide resolved
src/transport_performance/analyse_network.py Outdated Show resolved Hide resolved
src/transport_performance/analyse_network.py Show resolved Hide resolved
src/transport_performance/analyse_network.py Outdated Show resolved Hide resolved
src/transport_performance/analyse_network.py Outdated Show resolved Hide resolved
src/transport_performance/analyse_network.py Outdated Show resolved Hide resolved
src/transport_performance/analyse_network.py Outdated Show resolved Hide resolved
@ethan-moss ethan-moss self-requested a review December 15, 2023 16:14
Copy link
Collaborator

@ethan-moss ethan-moss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed this PR and all LTGM - thanks very much for putting this together, it's a really great job!

As well as reviewing the new features and tests in detail, i've also run back-to-back comparisons with previous results (when this module wasn't available). This gave identical outputs for multiple urban centres. This was also identical in terms of performance for the non-batched approach. I also compared transport performance outputs for batched and non-batched methods - again identical results for multiple urban centres.

Only 2 pieces of tech-debt remain (they do not impact the core functioning of analyse_network and will be dealt with as smaller tickets):

  1. Possible to overwrite parquet outputs when running od_matrix() with different configurations - Overwriting existing parquet files when rerunning od_matrix() with different configurations.  #222
  2. When batching, if there are no destination centroids within distance of the origin then it writes empty parquet files - Handle empty parquet files when batching and no destinations are within distance of origins #225

Also one enhancement off the back of this PR:

  1. OD matrix index naming when writing to parquet - Writing to parquet with a __null_dask_index__  #221

Happy to merge to dev. Thanks very much again for your help with this.

@ethan-moss ethan-moss merged commit cef04e2 into dev Dec 15, 2023
9 checks passed
@ethan-moss ethan-moss deleted the 173-analyse-network branch December 15, 2023 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactoring of od_matrix in analyse_network.py. develop initial analyse_network module
3 participants