185 html report always cleaning #248

CBROWN-ONS · 2024-02-09T08:19:14Z

Description

This PR makes it so that gtfs::validation::GtfsIntance.html_report() does not clean when the clean_feed param is set to False

Fixes #185

Motivation and Context

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

Test configuration details:

OS: Windows 10
Python version: 3.9.13
Java version: N/A
Python management system: MiniConda

Advice for reviewer

Checklist:

My code follows the intended structure of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules

Additional comments

codecov-commenter · 2024-02-20T14:22:40Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.18%. Comparing base (51ee214) to head (9f91aad).

Additional details and impacted files

@@           Coverage Diff           @@
##              dev     #248   +/-   ##
=======================================
  Coverage   98.18%   98.18%           
=======================================
  Files          21       21           
  Lines        1924     1927    +3     
=======================================
+ Hits         1889     1892    +3     
  Misses         35       35

Flag	Coverage Δ
unittests	`98.18% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

SergioRec

When running this with the clipped GTFS for Newport, the new flag clean_feed works as expected, with the report including some errors and warnings when the flag is set to False that would be removed otherwise.

However, when running the function with clean_feed == False in larger GTFS files (all Wales or Leeds), I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File Untitled-1:2
      [1](untitled-1:1) # %%
----> [2](untitled-1:2) instance.html_report(overwrite=True, clean_feed=False)

File [~/src/transport_performance/gtfs/validation.py:1501](~/src/transport_performance/gtfs/validation.py:1501), in GtfsInstance.html_report(self, report_dir, overwrite, summary_type, extended_validation, clean_feed)
   [1499](~/src/transport_performance/gtfs/validation.py:1499) # create extended reports if requested
   [1500](~/src/transport_performance/gtfs/validation.py:1500) if extended_validation:
-> [1501](~/src/transport_performance/gtfs/validation.py:1501)     self._extended_validation(output_path=report_dir)
   [1502](~/src/transport_performance/gtfs/validation.py:1502)     info_href = (
   [1503](~/src/transport_performance/gtfs/validation.py:1503)         validation_dataframe["message"].apply(
   [1504](~/src/transport_performance/gtfs/validation.py:1504)             lambda x: "_".join(x.split(" "))
   (...)
   [1508](~/src/transport_performance/gtfs/validation.py:1508)         + ".html"
   [1509](~/src/transport_performance/gtfs/validation.py:1509)     )
   [1510](~/src/transport_performance/gtfs/validation.py:1510)     validation_dataframe["info"] = [
   [1511](~/src/transport_performance/gtfs/validation.py:1511)         f"""<a href="{href}"> Further Info</a>"""
   [1512](~/src/transport_performance/gtfs/validation.py:1512)         if len(rows) > 1
   [1513](~/src/transport_performance/gtfs/validation.py:1513)         else "Unavailable"
   [1514](~/src/transport_performance/gtfs/validation.py:1514)         for href, rows in zip(info_href, validation_dataframe["rows"])
   [1515](~/src/transport_performance/gtfs/validation.py:1515)     ]

File [~/src/transport_performance/gtfs/validation.py:1375](~/src/transport_performance/gtfs/validation.py:1375), in GtfsInstance._extended_validation(self, output_path, scheme)
   [1370](~/src/transport_performance/gtfs/validation.py:1370)         duplicate_counts[col] = impacted_rows[
   [1371](~/src/transport_performance/gtfs/validation.py:1371)             impacted_rows[f"{col}_original"]
   [1372](~/src/transport_performance/gtfs/validation.py:1372)             == impacted_rows[f"{col}_duplicate"]
   [1373](~/src/transport_performance/gtfs/validation.py:1373)         ].shape[0]
   [1374](~/src/transport_performance/gtfs/validation.py:1374) else:
-> [1375](~/src/transport_performance/gtfs/validation.py:1375)     impacted_rows = table_map[table].copy().iloc[rows]
   [1377](~/src/transport_performance/gtfs/validation.py:1377) # create the html to display the impacted rows (clean possibly)
   [1378](~/src/transport_performance/gtfs/validation.py:1378) table_html = f"""
   [1379](~/src/transport_performance/gtfs/validation.py:1379) <head>
   [1380](~/src/transport_performance/gtfs/validation.py:1380)     <link rel="stylesheet" href="styles.css">
   (...)
   [1389](~/src/transport_performance/gtfs/validation.py:1389)             {msg_type}</span>
   [1390](~/src/transport_performance/gtfs/validation.py:1390) </h1>"""

KeyError: 'full_stop_schedule'

I suspect there's something in those GTFS that doesn't work with the HTML report, and it was getting cleaned by default. Setting the new flag to False is causing the report to fail. Could you please investigate? Thanks!

PS: did some minor changes (correcting a typo and merging with dev).

…datasciencecampus/transport-network-performance into 185-html-report-always-cleaning

SergioRec

Hi @CBROWN-ONS. I am now getting a different KeyError using Leeds:

KeyError: 'multiple_stops_invalid'.

Is this a case where we will need to manually add objects to the table map manually if a new issue appears (e.g. with other unfamiliar GTFS), or will there be a fixed number of them? Just wondering if this issue may appear again when trying a new location (e.g. Germany).

CBROWN-ONS · 2024-02-22T11:01:05Z

Hi @CBROWN-ONS. I am now getting a different KeyError using Leeds:

KeyError: 'multiple_stops_invalid'.

Is this a case where we will need to manually add objects to the table map manually if a new issue appears (e.g. with other unfamiliar GTFS), or will there be a fixed number of them? Just wondering if this issue may appear again when trying a new location (e.g. Germany).

HI Sergio. This shouldn't be the case of 1 by 1 adding unless we add new validation tables manually. Do you have the full error message?
Thanks

SergioRec · 2024-02-22T14:08:46Z

Here's the code I ran:

# %%
from transport_performance.gtfs.multi_validation import MultiGtfsInstance
from pyprojroot import here

# %%
t = MultiGtfsInstance(here('data/interim/gtfs/itm_leeds_filtered_gtfs.zip'))
s = t.instances[0]
# %%
s.html_report(overwrite=True, clean_feed=False)
# %%

Here's the full traceback:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[3], [line 2](vscode-notebook-cell:?execution_count=3&line=2)
      [1](vscode-notebook-cell:?execution_count=3&line=1) # %%
----> [2](vscode-notebook-cell:?execution_count=3&line=2) s.html_report(overwrite=True, clean_feed=False)

File [~/src/transport_performance/gtfs/validation.py:1502](~//src/transport_performance/gtfs/validation.py:1502), in GtfsInstance.html_report(self, report_dir, overwrite, summary_type, extended_validation, clean_feed)
   [1500](~/src/transport_performance/gtfs/validation.py:1500) # create extended reports if requested
   [1501](~/src/transport_performance/gtfs/validation.py:1501) if extended_validation:
-> [1502](~/src/transport_performance/gtfs/validation.py:1502)     self._extended_validation(output_path=report_dir)
   [1503](~/src/transport_performance/gtfs/validation.py:1503)     info_href = (
   [1504](~/src/transport_performance/gtfs/validation.py:1504)         validation_dataframe["message"].apply(
   [1505](~/src/transport_performance/gtfs/validation.py:1505)             lambda x: "_".join(x.split(" "))
   (...)
   [1509](~/src/transport_performance/gtfs/validation.py:1509)         + ".html"
   [1510](~/src/transport_performance/gtfs/validation.py:1510)     )
   [1511](~/src/transport_performance/gtfs/validation.py:1511)     validation_dataframe["info"] = [
   [1512](~/src/transport_performance/gtfs/validation.py:1512)         f"""<a href="{href}"> Further Info</a>"""
   [1513](~/src/transport_performance/gtfs/validation.py:1513)         if len(rows) > 1
   [1514](~/src/transport_performance/gtfs/validation.py:1514)         else "Unavailable"
   [1515](~/src/transport_performance/gtfs/validation.py:1515)         for href, rows in zip(info_href, validation_dataframe["rows"])
   [1516](~/src/transport_performance/gtfs/validation.py:1516)     ]

File [~/src/transport_performance/gtfs/validation.py:1376](~/src/transport_performance/gtfs/validation.py:1376), in GtfsInstance._extended_validation(self, output_path, scheme)
   [1371](~/src/transport_performance/gtfs/validation.py:1371)         duplicate_counts[col] = impacted_rows[
   [1372](~/src/transport_performance/gtfs/validation.py:1372)             impacted_rows[f"{col}_original"]
   [1373](~/src/transport_performance/gtfs/validation.py:1373)             == impacted_rows[f"{col}_duplicate"]
   [1374](~/src/transport_performance/gtfs/validation.py:1374)         ].shape[0]
   [1375](~/src/transport_performance/gtfs/validation.py:1375) else:
-> [1376](~/src/transport_performance/gtfs/validation.py:1376)     impacted_rows = table_map[table].copy().iloc[rows]
   [1378](~/src/transport_performance/gtfs/validation.py:1378) # create the html to display the impacted rows (clean possibly)
   [1379](~/src/transport_performance/gtfs/validation.py:1379) table_html = f"""
   [1380](~/src/transport_performance/gtfs/validation.py:1380) <head>
   [1381](~/src/transport_performance/gtfs/validation.py:1381)     <link rel="stylesheet" href="styles.css">
   (...)
   [1390](~/src/transport_performance/gtfs/validation.py:1390)             {msg_type}</span>
   [1391](~/src/transport_performance/gtfs/validation.py:1391) </h1>"""

KeyError: 'multiple_stops_invalid'

CBROWN-ONS · 2024-02-22T15:04:27Z

Here's the code I ran:

# %%
from transport_performance.gtfs.multi_validation import MultiGtfsInstance
from pyprojroot import here

# %%
t = MultiGtfsInstance(here('data/interim/gtfs/itm_leeds_filtered_gtfs.zip'))
s = t.instances[0]
# %%
s.html_report(overwrite=True, clean_feed=False)
# %%

Here's the full traceback:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[3], [line 2](vscode-notebook-cell:?execution_count=3&line=2)
      [1](vscode-notebook-cell:?execution_count=3&line=1) # %%
----> [2](vscode-notebook-cell:?execution_count=3&line=2) s.html_report(overwrite=True, clean_feed=False)

File [~/src/transport_performance/gtfs/validation.py:1502](~//src/transport_performance/gtfs/validation.py:1502), in GtfsInstance.html_report(self, report_dir, overwrite, summary_type, extended_validation, clean_feed)
   [1500](~/src/transport_performance/gtfs/validation.py:1500) # create extended reports if requested
   [1501](~/src/transport_performance/gtfs/validation.py:1501) if extended_validation:
-> [1502](~/src/transport_performance/gtfs/validation.py:1502)     self._extended_validation(output_path=report_dir)
   [1503](~/src/transport_performance/gtfs/validation.py:1503)     info_href = (
   [1504](~/src/transport_performance/gtfs/validation.py:1504)         validation_dataframe["message"].apply(
   [1505](~/src/transport_performance/gtfs/validation.py:1505)             lambda x: "_".join(x.split(" "))
   (...)
   [1509](~/src/transport_performance/gtfs/validation.py:1509)         + ".html"
   [1510](~/src/transport_performance/gtfs/validation.py:1510)     )
   [1511](~/src/transport_performance/gtfs/validation.py:1511)     validation_dataframe["info"] = [
   [1512](~/src/transport_performance/gtfs/validation.py:1512)         f"""<a href="{href}"> Further Info</a>"""
   [1513](~/src/transport_performance/gtfs/validation.py:1513)         if len(rows) > 1
   [1514](~/src/transport_performance/gtfs/validation.py:1514)         else "Unavailable"
   [1515](~/src/transport_performance/gtfs/validation.py:1515)         for href, rows in zip(info_href, validation_dataframe["rows"])
   [1516](~/src/transport_performance/gtfs/validation.py:1516)     ]

File [~/src/transport_performance/gtfs/validation.py:1376](~/src/transport_performance/gtfs/validation.py:1376), in GtfsInstance._extended_validation(self, output_path, scheme)
   [1371](~/src/transport_performance/gtfs/validation.py:1371)         duplicate_counts[col] = impacted_rows[
   [1372](~/src/transport_performance/gtfs/validation.py:1372)             impacted_rows[f"{col}_original"]
   [1373](~/src/transport_performance/gtfs/validation.py:1373)             == impacted_rows[f"{col}_duplicate"]
   [1374](~/src/transport_performance/gtfs/validation.py:1374)         ].shape[0]
   [1375](~/src/transport_performance/gtfs/validation.py:1375) else:
-> [1376](~/src/transport_performance/gtfs/validation.py:1376)     impacted_rows = table_map[table].copy().iloc[rows]
   [1378](~/src/transport_performance/gtfs/validation.py:1378) # create the html to display the impacted rows (clean possibly)
   [1379](~/src/transport_performance/gtfs/validation.py:1379) table_html = f"""
   [1380](~/src/transport_performance/gtfs/validation.py:1380) <head>
   [1381](~/src/transport_performance/gtfs/validation.py:1381)     <link rel="stylesheet" href="styles.css">
   (...)
   [1390](~/src/transport_performance/gtfs/validation.py:1390)             {msg_type}</span>
   [1391](~/src/transport_performance/gtfs/validation.py:1391) </h1>"""

KeyError: 'multiple_stops_invalid'

This should be fixed now. It was the same error as before (an unfinished TODO)

r-leyshon

Checked reports against fixtures. Behaves as expected with a few caveats. Please see issues below for reference:

#262
#263
#264

* feat: add doctstring;add param;add type defence * feat: add condition on whether to clean or not * fix: fixed small typo in error message * fix: changed match in test to fit error message * fix: update gtfs attr table for html reports * test: Refactor tests that are now warning about expired feed * refactor: Test targets new row thanks to feed expired warning * refactor: Tests assert against fixture with expired feed warning --------- Co-authored-by: Sergio Recio <[email protected]> Co-authored-by: r-leyshon <[email protected]> 4300b44

CBROWN-ONS added 2 commits February 9, 2024 07:59

feat: add doctstring;add param;add type defence

b0930c8

feat: add condition on whether to clean or not

17376fe

CBROWN-ONS added the GTFS label Feb 9, 2024

CBROWN-ONS requested review from r-leyshon, ethan-moss and SergioRec February 9, 2024 08:19

CBROWN-ONS added the x-small label Feb 13, 2024

ethan-moss removed request for r-leyshon and ethan-moss February 15, 2024 11:34

SergioRec self-assigned this Feb 19, 2024

chore: merged with dev

758d1c5

SergioRec added 2 commits February 20, 2024 14:40

fix: fixed small typo in error message

bbd9414

fix: changed match in test to fit error message

a0f1a82

SergioRec suggested changes Feb 20, 2024

View reviewed changes

CBROWN-ONS added 2 commits February 21, 2024 11:28

fix: update gtfs attr table for html reports

3e41456

Merge branch '185-html-report-always-cleaning' of https://github.com/…

e661aa3

…datasciencecampus/transport-network-performance into 185-html-report-always-cleaning

SergioRec suggested changes Feb 22, 2024

View reviewed changes

CBROWN-ONS unassigned SergioRec Mar 19, 2024

r-leyshon added 4 commits May 1, 2024 10:29

Merge branch 'dev' into 185-html-report-always-cleaning

581ea6e

test: Refactor tests that are now warning about expired feed

5040a56

refactor: Test targets new row thanks to feed expired warning

268fde6

refactor: Tests assert against fixture with expired feed warning

9f91aad

r-leyshon self-assigned this May 1, 2024

This was referenced May 1, 2024

html_report KeyError #262

Open

regression test for html_report #263

Open

r-leyshon approved these changes May 1, 2024

View reviewed changes

r-leyshon merged commit 4300b44 into dev May 1, 2024
9 checks passed

r-leyshon deleted the 185-html-report-always-cleaning branch May 1, 2024 13:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

185 html report always cleaning #248

185 html report always cleaning #248

CBROWN-ONS commented Feb 9, 2024

codecov-commenter commented Feb 20, 2024 •

edited

Loading

SergioRec left a comment •

edited

Loading

SergioRec left a comment

CBROWN-ONS commented Feb 22, 2024

SergioRec commented Feb 22, 2024

CBROWN-ONS commented Feb 22, 2024

r-leyshon left a comment

185 html report always cleaning #248

185 html report always cleaning #248

Conversation

CBROWN-ONS commented Feb 9, 2024

Description

Motivation and Context

Type of change

How Has This Been Tested?

Advice for reviewer

Checklist:

Additional comments

codecov-commenter commented Feb 20, 2024 • edited Loading

Codecov Report

SergioRec left a comment • edited Loading

Choose a reason for hiding this comment

SergioRec left a comment

Choose a reason for hiding this comment

CBROWN-ONS commented Feb 22, 2024

SergioRec commented Feb 22, 2024

CBROWN-ONS commented Feb 22, 2024

r-leyshon left a comment

Choose a reason for hiding this comment

codecov-commenter commented Feb 20, 2024 •

edited

Loading

SergioRec left a comment •

edited

Loading