173 analyse network #200

SergioRec · 2023-11-09T08:32:18Z

Description

Adds the analyse_network module.

Closes #173
Fixes #211

Motivation and Context

This module is a wrapper for r5py to calculate an origin-destination matrix to feed into the matrix module. It takes the outputs from the urban_centres and rasterpop modules.

The module lets the user choose how many origins to process per batch when calculating O-D matrices. It saves the outputs as parquet files, one per origin batch, and splits any parquet file that exceeds a certain size.
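
As a rough illustration of the batching idea described above (not the module's actual code), the sketch below splits a sequence of origin ids into consecutive batches and derives one output file name per batch. The names `batch_origins`, `batch_file_names`, `num_origins` and the `batch-<i>.parquet` naming pattern are assumptions made for this example only.

```python
from typing import Iterator, List, Sequence


def batch_origins(origin_ids: Sequence[int], num_origins: int) -> Iterator[List[int]]:
    """Yield consecutive batches holding at most `num_origins` origin ids."""
    if num_origins < 1:
        raise ValueError("num_origins must be a positive integer")
    for start in range(0, len(origin_ids), num_origins):
        yield list(origin_ids[start:start + num_origins])


def batch_file_names(n_batches: int, prefix: str = "batch") -> List[str]:
    """Return one hypothetical parquet file name per origin batch."""
    return [f"{prefix}-{i}.parquet" for i in range(n_batches)]


# Seven origins in batches of three: two full batches and one partial batch.
batches = list(batch_origins(range(7), 3))
names = batch_file_names(len(batches))
```

Setting `num_origins` to the total number of origins gives the single-batch case used for the smaller urban centres, while `num_origins=1` corresponds to the one-origin-per-batch run.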

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

Module run locally, outputs checked manually.
Unit tests run locally.

Test configuration details:

OS: macOS Ventura
Python version: 3.9.13
Java version: 11
Python management system: pip/conda

Advice for reviewer

Run times:

Newport: all origins in batch, ~3 minutes
Marseille: all origins in batch, ~10 minutes
Leeds: all origins in batch, ~1 hour
London: one origin per batch, ~40 hours

Checklist:

My code follows the intended structure of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules

Additional comments

I've chosen to not explicitly ask for keyword arguments from r5py and leave them as **kwargs. This may not be a good choice, but I wasn't sure if arbitrarily including some arguments and not others would be messy. Happy to change it.
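
To make the trade-off concrete, here is a minimal sketch of the pass-through pattern. `stand_in_computer` is a hypothetical placeholder for the wrapped routing call and is not r5py's API; `od_matrix_sketch` and the `max_time_minutes` option are likewise invented for illustration.

```python
from typing import Any, Dict, List


def stand_in_computer(
    origins: List[int], destinations: List[int], **options: Any
) -> Dict[str, Any]:
    """Hypothetical stand-in for the wrapped call; it just echoes its inputs."""
    return {"origins": origins, "destinations": destinations, **options}


def od_matrix_sketch(
    origins: List[int], destinations: List[int], **kwargs: Any
) -> Dict[str, Any]:
    """Forward any extra keyword arguments, untouched, to the wrapped call."""
    return stand_in_computer(origins, destinations, **kwargs)


# The wrapper never names the option; it is passed through as-is.
result = od_matrix_sketch([1, 2], [3], max_time_minutes=45)
```

With this pattern the wrapper does not need updating when the wrapped library gains new options, at the cost of a signature that does not document which options are accepted.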

Regarding the defence against dates not being in the GTFS (which also triggers when no valid GTFS is passed and TransportMode.TRANSIT is included), it's not straightforward to access any attribute within TransportNetwork or TravelTimeMatrixComputer that exposes GTFS information. The current solution catches the specific runtime warning and raises an IndexError when present.
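
The general shape of this defence can be sketched with the standard `warnings` module. This is an illustrative approximation, not the code in this PR: `build_computer_stand_in` is a hypothetical placeholder for the wrapped constructor, and the warning text it emits is an assumption rather than the real message.

```python
import warnings


def build_computer_stand_in(dates_in_gtfs: bool) -> str:
    """Hypothetical stand-in that warns when the date is not in the GTFS."""
    if not dates_in_gtfs:
        # Assumed message text, for illustration only.
        warnings.warn("Departure time is outside the GTFS date range", RuntimeWarning)
    return "computer"


def build_with_defence(dates_in_gtfs: bool) -> str:
    """Record warnings during construction and escalate the GTFS one."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is not filtered out
        computer = build_computer_stand_in(dates_in_gtfs)
    for item in caught:
        if issubclass(item.category, RuntimeWarning) and "GTFS" in str(item.message):
            raise IndexError("Date/time is not covered by the GTFS feed")
    return computer
```

Matching on both the warning category and part of its message keeps unrelated runtime warnings from being turned into errors.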

codecov-commenter · 2023-11-09T08:38:39Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1156895) 98.18% compared to head (707dbe5) 98.30%.

❗ Current head 707dbe5 differs from pull request most recent head e351984. Consider uploading reports for the commit e351984 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##              dev     #200      +/-   ##
==========================================
+ Coverage   98.18%   98.30%   +0.11%     
==========================================
  Files          18       19       +1     
  Lines        1433     1530      +97     
==========================================
+ Hits         1407     1504      +97     
  Misses         26       26

Flag	Coverage Δ
unittests	`98.30% <100.00%> (+0.11%)`	⬆️

ethan-moss

I've completed a review of the source code updates. The main functionalities all LGTM. I've tested them with a few urban centres and:

we're getting consistent results to those we got previously, and
transport performance metrics are identical whether calculated using the batched mode or not.

Just sharing initial comments at this stage, as it would be useful to pick out which of these (if any at all) need to be implemented right away, tagged as tech-debt, or treated as 'won't fix'. Let me know your thoughts on this.

In the meantime I'll continue to review the unit tests and add comments.

notebooks/urban_centres/buffer_experiments.py

src/transport_performance/analyse_network.py

requirements.txt

ethan-moss

I've reviewed this PR and all LGTM - thanks very much for putting this together, it's a really great job!

As well as reviewing the new features and tests in detail, I've also run back-to-back comparisons with previous results (from before this module was available). This gave identical outputs for multiple urban centres. Performance was also identical for the non-batched approach. I also compared transport performance outputs for the batched and non-batched methods, which again gave identical results for multiple urban centres.

Only 2 pieces of tech-debt remain (they do not impact the core functioning of analyse_network and will be dealt with as smaller tickets):

Possible to overwrite parquet outputs when running od_matrix() with different configurations - Overwriting existing parquet files when rerunning od_matrix() with different configurations. #222
When batching, if there are no destination centroids within distance of the origin then it writes empty parquet files - Handle empty parquet files when batching and no destinations are within distance of origins #225

Also one enhancement off the back of this PR:

OD matrix index naming when writing to parquet - Writing to parquet with a __null_dask_index__ #221

Happy to merge to dev. Thanks very much again for your help with this.

SergioRec added 3 commits November 9, 2023 08:13

feat: analyse network module

5a3a2ee

test: tests for analyse_network module

0586627

chore: updated requirements

e9e9815

SergioRec added r5py medium labels Nov 9, 2023

SergioRec marked this pull request as draft November 9, 2023 09:01

feat: added defence for GTFS dates in TransportNetwork, with test

cd8090f

SergioRec marked this pull request as ready for review November 9, 2023 15:35

SergioRec requested a review from ethan-moss November 9, 2023 15:35

ethan-moss and others added 16 commits November 28, 2023 10:13

chore: merged with dev

17dd8c7

feat: added notebook with experiments about projections and buffer

036748a

feat: refactored function to include bool flag

5271c08

chore: moved analyse_network.py to parent folder to prevent repetitive import

a65913d

feat: added defence for argument

5ac74ef

chore: minor comment changes

797141f

chore: moved to folder

1e719f6

fix: change module import

8b38c64

chore: merge with dev

a74cc70

fix: corrected reference to unassigned variable

39a0ebf

feat: added example notebook running analyse_network

f3a61af

tests: separated fixtures into different file

75003dd

tests: refactor tests to use a 'name-value' structure in parametrization

3104c47

tests: refactored tests to account for new variable

dacbb35

tests: small refactoring of parquet files assertion

3011223

chore: merged with dev for urban centre updates

f4c79bb

ethan-moss self-assigned this Dec 6, 2023

ethan-moss reviewed Dec 6, 2023

SergioRec added 2 commits December 7, 2023 09:52

added defence for gdf projection, inc. warning and reprojection

80b5d28

removed '== True' instances, dropped index after reset

74d08bd

SergioRec and others added 23 commits December 7, 2023 09:56

changed '<' for '<='

b2e6d13

pinned pyarrow version in requirements

5cdaacf

fix: updated 'dummy_gdf_centroids' to have bool flag instead of int

a653033

tests: added test for crs warning

70eaddb

added defence for 'destination_col' dtype

4f47c31

added test for destination_col dtype check

896bd76

changed type of distance to int or float, modified defence, added test

8e1bfef

change to name of path variable in notebook

4140bf1

added defence for out_path containing parquet files

55672db

changed notebook as result of analyse_network init change

bfbbb85

tests: fixed tests adding a module scoped temp path

ebcb524

tests: added test for files already in directory defence

49077bb

chore: merged latest updates with urban centre mods

f224b3c

chore: removed duplicate package haversine

e426860

chore: resolved conflict with dev after #193 merge

4f9d7f9

fix: move within uc col defence upfront; updated tests

5a7b448

test: added num generator instances check to _gdf_batch_origins_outputs

56aa153

chore: changes tmp_path_factory sub-dir

a4c213a

refactor: move test file to preserve src symmetry

56de54f

refactor: moved common parquet file checks

2fa89ee

chore: updated test comments

d4fafea

refactor: aligning integ runner with updated analyse_network tests

707dbe5

refactor: aligning windows runner with updated analyse_network tests

e351984

ethan-moss self-requested a review December 15, 2023 16:14

ethan-moss approved these changes Dec 15, 2023

ethan-moss merged commit cef04e2 into dev Dec 15, 2023
9 checks passed

ethan-moss deleted the 173-analyse-network branch December 15, 2023 16:33
