Incrementally generate raptor transfers from constrained transfers for new pattern #6191

Draft
wants to merge 1 commit into
base: dev-2.x

slvlirnoff
Member

Summary

This PR incrementally updates the raptor transfers generated for the transit layer from the constrained transfers. It keeps track of the last RoutingPattern index and only generates transfers that alight from or board newly created patterns.
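
The idea in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `PatternSketch` and `IncrementalGeneratorSketch` are hypothetical names, and the sketch assumes that pattern indices are dense (0..n-1) and that new patterns are only ever appended.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a routing pattern; only the index matters here. */
record PatternSketch(int patternIndex) {}

/** Sketch of a generator that remembers how many patterns it has already processed. */
class IncrementalGeneratorSketch {

  private int processedPatterns = 0;
  private final List<String> generated = new ArrayList<>();

  /** Generates entries only for patterns not seen on an earlier call; returns the number added. */
  int update(List<PatternSketch> allPatterns) {
    int added = 0;
    for (PatternSketch p : allPatterns) {
      // Assumption: indices are dense and append-only, so "new" means index >= processed count
      if (p.patternIndex() >= processedPatterns) {
        generated.add("transfers-for-pattern-" + p.patternIndex());
        added++;
      }
    }
    processedPatterns = allPatterns.size();
    return added;
  }

  /** Returns a defensive copy of everything generated so far. */
  List<String> generated() {
    return List.copyOf(generated);
  }
}
```

Calling `update` twice with the same pattern list does no work the second time; only patterns appended in between are processed.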

Issue

Closes #6190

@slvlirnoff slvlirnoff requested a review from a team as a code owner October 23, 2024 16:41

codecov bot commented Oct 23, 2024

Codecov Report

Attention: Patch coverage is 65.71429% with 12 lines in your changes missing coverage. Please review.

Project coverage is 69.90%. Comparing base (cee960f) to head (dd394a5).
Report is 595 commits behind head on dev-2.x.

Files with missing lines Patch % Lines
...it/constrainedtransfer/TransferIndexGenerator.java 65.71% 8 Missing and 4 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             dev-2.x    #6191      +/-   ##
=============================================
- Coverage      69.90%   69.90%   -0.01%     
- Complexity     17723    17725       +2     
=============================================
  Files           1998     1998              
  Lines          75443    75468      +25     
  Branches        7718     7727       +9     
=============================================
+ Hits           52740    52754      +14     
- Misses         20025    20031       +6     
- Partials        2678     2683       +5     

@slvlirnoff
Member Author

In our deployment (40k patterns, 60k constrained transfers), updating the transit layer goes from a consistent ~6s to ~10-20ms on average, and up to a few hundred ms when new patterns are created.

@slvlirnoff slvlirnoff changed the title Incrementally generate raptor transfers from constrained transfers from new pattern Incrementally generate raptor transfers from constrained transfers for new pattern Oct 23, 2024
@optionsome optionsome requested a review from t2gran October 24, 2024 14:41
@slvlirnoff
Member Author

@t2gran I've investigated further where the performance issue comes from in Switzerland's setup. Of the 50k constrained transfers, the majority are trip-to-trip and don't cause an issue, but there are around 8'000 station-to-station transfers (for minimum transfer time) that create 40'000'000 TransferForPattern objects.
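
The figures above work out to roughly 5'000 generated objects per station-to-station constraint (40'000'000 / 8'000). A minimal sketch of that arithmetic follows. It assumes, and this is an assumption not stated in the thread, that each station-level constraint expands to one object per pair of alighting and boarding patterns; the pattern counts of 50 and 100 are made-up illustration values, and the class name is hypothetical.

```java
/** Back-of-the-envelope arithmetic for the expansion described above. */
class TransferExpansionSketch {

  /** Assumption: one object per (alighting pattern, boarding pattern) pair. */
  static long expandedCount(long fromPatterns, long toPatterns) {
    return fromPatterns * toPatterns;
  }

  /** Average number of generated objects per constraint. */
  static long averagePerConstraint(long totalObjects, long constraints) {
    return totalObjects / constraints;
  }

  public static void main(String[] args) {
    // Figures quoted in the thread: 40'000'000 objects from 8'000 constraints
    System.out.println(averagePerConstraint(40_000_000L, 8_000L));
    // Hypothetical station pair: 50 alighting patterns and 100 boarding patterns
    System.out.println(expandedCount(50, 100));
  }
}
```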

@t2gran
Member

t2gran commented Nov 5, 2024

> @t2gran I've investigated further where the performance issue comes from in Switzerland's setup. Of the 50k constrained transfers, the majority are trip-to-trip and don't cause an issue, but there are around 8'000 station-to-station transfers (for minimum transfer time) that create 40'000'000 TransferForPattern objects.

I am not sure if you are allowed to do it or not, but to me a sensible thing would be to specify a minTransferTime per stop (on the target stop). If there are additional constraints, like only apply the minWaitTime to longDistance rail, we could add that. If you want I would like to discuss this in a meeting. I think the GTFS specification is impossible to implement in a scalable way, and that we should NOT do it. Instead we should come up with some features that cover the needs of the industry. We should probably also inform the GTFS community what we think and what we intend to do.

Is there a time when we can discuss this?

@leonardehrenfried
Member

leonardehrenfried commented Nov 6, 2024

My experience is that often legacy systems required explicit transfers to be defined for them to work at all. When these data sets are being converted to GTFS they are mapped to minimum_transfer_times when in fact it just means that a transfer is possible. In GTFS and OTP it's not necessary to explicitly say that the transfer is possible and the better option is to micromap your stations so that you get good, flexible walking instructions with detailed paths.

I often completely drop the minimum_transfer_times from my German data sets. I suspect the source of @slvlirnoff's data is a German software vendor.

@slvlirnoff
Member Author

Yep, unfortunately we currently need to use this information. In many cases it is difficult to micromap at the station level for several reasons; the main one is that we aren't always in contact with the producer/owner of the specific station data.

They also use these constraints to sometimes prevent or discourage short transfers between specific stops. For instance, there are nearby bus stops up in the mountains where you could theoretically switch buses in under a minute; however, if your bus is late or the other bus is running early, you'll be stuck in the middle of nowhere for an hour, so it is safer to consider only transfers with 10 min of slack.

In other cases the train platforms are very long and you could reach them relatively quickly from multiple connecting bus stops, at the front, the middle, or the back of the platform. This is harder to map and has historically been done with these transfer times. Would you map these with pathways?

In any case, there are two issues:

  • If you create a constrained transfer between stops, it has a significant performance impact that is largely hidden from the operator of OTP (no logs, etc.). OTP displays the number of constrained transfers, but it doesn't display the number of computed raptor transfers, and here it generates 40 million objects from 8k constraints. This practice should perhaps be discouraged in the documentation.
  • The raptor transfers are recomputed on each transit layer update cycle; this PR tries to address this.

@t2gran
Member

t2gran commented Nov 12, 2024

> They also use these constraints to sometimes prevent or discourage short transfers between specific stops. For instance, there are nearby bus stops up in the mountains where you could theoretically switch buses in under a minute; however, if your bus is late or the other bus is running early, you'll be stuck in the middle of nowhere for an hour, so it is safer to consider only transfers with 10 min of slack.

This is supported in NeTEx/OTP by setting stop priority.

> In other cases the train platforms are very long and you could reach them relatively quickly from multiple connecting bus stops, at the front, the middle, or the back of the platform. This is harder to map and has historically been done with these transfer times. Would you map these with pathways?

No, this should be done with a new feature, "minimumTransferTime for stop with an optional submode". The submode is there if you want to distinguish between local and long-distance trains.
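
One way to picture the proposed stop-level rule is the sketch below. It is only an illustration of the idea: no such feature exists in this thread beyond the description, and `MinTransferTimeRule`, `MinTransferTimeLookupSketch`, and the submode strings are all hypothetical. The point of the shape is that the number of rules grows with the number of stops rather than with pairs of patterns.

```java
import java.time.Duration;
import java.util.List;
import java.util.Optional;

/** Hypothetical rule: a minimum transfer time at a stop, optionally limited to one submode. */
record MinTransferTimeRule(String stopId, Optional<String> submode, Duration minTime) {

  /** A rule without a submode applies to every boarding at the stop. */
  boolean matches(String stop, String boardingSubmode) {
    return stopId.equals(stop) && submode.map(s -> s.equals(boardingSubmode)).orElse(true);
  }
}

class MinTransferTimeLookupSketch {

  /** Returns the largest matching minimum, or Duration.ZERO when no rule applies. */
  static Duration minTransferTime(List<MinTransferTimeRule> rules, String stop, String submode) {
    return rules
      .stream()
      .filter(r -> r.matches(stop, submode))
      .map(MinTransferTimeRule::minTime)
      .max(Duration::compareTo)
      .orElse(Duration.ZERO);
  }
}
```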

There is a report in the report API which can be used to look at the number of generated constrained transfers: /otp/report/transfers.csv

@t2gran
Member

t2gran commented Nov 12, 2024

> The raptor transfers are recomputed on each transit layer update cycle; this PR tries to address this.

I also want this. So, I will review the PR, sorry for the delay. We planned to fix this after the refactoring of the transit model, but I think that will take time.

Member

@t2gran t2gran left a comment

The TransferIndexGenerator has a few issues and this PR makes them worse, but I am in favour of fixing this issue now rather than doing the cleanup first. The cleanup is probably not possible without a lot of work on the transit model.

@@ -42,6 +42,10 @@ public class TransferIndexGenerator {
private final Map<Route, Set<RoutingTripPattern>> patternsByRoute = new HashMap<>();
private final Map<Trip, Set<RoutingTripPattern>> patternsByTrip = new HashMap<>();

private int lastPatternIndex = 0;

Rename to prevProcessedNumberOfPatterns, and make it equal to the size of the collection.

Comment on lines +46 to +47
private TransferForPatternByStopPos[] prevForwardTransfers = {};
private TransferForPatternByStopPos[] prevReverseTransfers = {};

The code above keeps a reference to the list (array) of transfers held elsewhere. This may lead to inconsistency problems, but I guess this is also the case for the other cached data in this class, so I think we can ignore it for now. There is no performance gain from keeping these as arrays, so they can be ArrayLists. I will comment below on the changes needed to use ArrayList instead. The copying is also unnecessary, so:

  // We need to use ArrayList to be able to enforce the capacity
  private ArrayList<TransferForPatternByStopPos> forwardTransfers = new ArrayList<>();
  private ArrayList<TransferForPatternByStopPos> reverseTransfers = new ArrayList<>();

Comment on lines +62 to +65
return new ConstrainedTransfersForPatterns(
Arrays.asList(prevForwardTransfers),
Arrays.asList(prevReverseTransfers)
);

@t2gran t2gran Nov 12, 2024

This is not DRY; make a private factory method. If using ArrayList, it will be something like:

/**
 * TODO - Doc on:
 *  - Why we set prevProcessedNumberOfPatterns here
 *  - Why we make a defensive copy of the list or push into ConstrainedTransfersForPatterns
 */
private ConstrainedTransfersForPatterns createConstrainedTransfersForPatterns() {
    // Uses the ArrayList field suggested above; prevForwardTransfers no longer exists after that change
    this.prevProcessedNumberOfPatterns = forwardTransfers.size();
    return new ConstrainedTransfersForPatterns(
      // TODO: Push the copy into the constructor instead of doing it here
      List.copyOf(forwardTransfers),
      List.copyOf(reverseTransfers)
    );
}

Note! No matter how you do this the arrays will be copied at this point, see Arrays.asList(), List.copyOf().
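
A small aside on the two JDK calls named in the note, as a sketch: `Arrays.asList` wraps the array in a fixed-size view without copying the elements, while `List.copyOf` copies a mutable source into an unmodifiable list. Whether the ConstrainedTransfersForPatterns constructor makes its own copy is not shown in this thread. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListCopySketch {

  /** Returns true if a later write to the array is visible through the Arrays.asList view. */
  static boolean asListSeesArrayWrite() {
    String[] source = { "a", "b" };
    List<String> view = Arrays.asList(source);
    source[0] = "changed";
    return view.get(0).equals("changed");
  }

  /** Returns true if a later write to the source list is visible through the List.copyOf result. */
  static boolean copyOfSeesListWrite() {
    List<String> source = new ArrayList<>(List.of("a", "b"));
    List<String> copy = List.copyOf(source);
    source.set(0, "changed");
    return copy.get(0).equals("changed");
  }
}
```

The first method returns true (the view shares the array) and the second returns false (the copy is independent), which is why the defensive `List.copyOf` in the factory method isolates the returned lists from later additions.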

Comment on lines +71 to +74
// Copy previously generated transfers
System.arraycopy(prevForwardTransfers, 0, forwardTransfers, 0, prevForwardTransfers.length);
System.arraycopy(prevReverseTransfers, 0, reverseTransfers, 0, prevReverseTransfers.length);

With ArrayList this can be replaced with (lines 68-73):

 // Ensure we grow the capacity once, not incrementally as we add new transfers
 forwardTransfers.ensureCapacity(nPatterns);
 reverseTransfers.ensureCapacity(nPatterns);

Unsure if this is needed, but at least it can't hurt much (except readability).
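
As a rough illustration of the suggestion, the sketch below keeps the entries already in an `ArrayList`, reserves capacity once with `ensureCapacity`, and then appends only the missing entries, so no `System.arraycopy` of the previous result is needed. The class name and the use of `Integer` placeholders instead of `TransferForPatternByStopPos` are assumptions made for the example.

```java
import java.util.ArrayList;

class EnsureCapacitySketch {

  /** Appends placeholder entries until the list holds targetSize elements; existing entries are kept. */
  static ArrayList<Integer> grow(ArrayList<Integer> list, int targetSize) {
    // Reserve the backing array once instead of letting it grow step by step
    list.ensureCapacity(targetSize);
    for (int i = list.size(); i < targetSize; i++) {
      list.add(i);
    }
    return list;
  }
}
```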

Comment on lines +105 to +109
// Update the last pattern handled index and store the generated transfers
lastPatternIndex = nPatterns - 1;
prevForwardTransfers = forwardTransfers;
prevReverseTransfers = reverseTransfers;

Suggested change (remove these lines)
// Update the last pattern handled index and store the generated transfers
lastPatternIndex = nPatterns - 1;
prevForwardTransfers = forwardTransfers;
prevReverseTransfers = reverseTransfers;

And replace the new Co... with the factory method.

Comment on lines +89 to +90
boolean alightFromPreviouslySeenPattern =
fromPoint.pattern.patternIndex() <= lastPatternIndex;

The pattern.patternIndex() <= lastPatternIndex check is a leaky abstraction, but it is difficult to fix here. I looked into it. At the least you should make sure it is DRY, implemented in just one place with a private method:

private boolean isNewPattern(RoutingTripPattern pattern) {
  return pattern.patternIndex() >= prevProcessedNumberOfPatterns;
}

and

boolean alightFromPreviouslySeenPattern = !isNewPattern(fromPoint.pattern);
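
The two forms of the check can be compared with a small sketch. It uses plain `int` indices instead of `RoutingTripPattern`, and the class and the `seenByLastIndex` helper are hypothetical names. Once at least one pattern has been processed, `index >= count` is the exact negation of `index <= count - 1`. The forms differ only in the initial state, judging by the snippets quoted in this thread: a count of 0 treats index 0 as new, whereas `lastPatternIndex = 0` would treat index 0 as already seen.

```java
class NewPatternCheckSketch {

  private final int prevProcessedNumberOfPatterns;

  NewPatternCheckSketch(int prevProcessedNumberOfPatterns) {
    this.prevProcessedNumberOfPatterns = prevProcessedNumberOfPatterns;
  }

  /** Count-based check, as suggested in the review. */
  boolean isNewPattern(int patternIndex) {
    return patternIndex >= prevProcessedNumberOfPatterns;
  }

  /** Index-based check, in the form used by the original diff. */
  static boolean seenByLastIndex(int patternIndex, int lastPatternIndex) {
    return patternIndex <= lastPatternIndex;
  }
}
```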

@@ -158,6 +186,9 @@ private List<TPoint> findTPoints(StationTransferPoint point) {
var result = new ArrayList<TPoint>();

for (RoutingTripPattern pattern : patterns) {
if (onlyNewPatterns && pattern.patternIndex() <= lastPatternIndex) {

Suggested change
- if (onlyNewPatterns && pattern.patternIndex() <= lastPatternIndex) {
+ if (onlyNewPatterns && !isNewPattern(pattern)) {

@@ -180,6 +211,9 @@ private List<TPoint> findTPoints(StopTransferPoint point) {
var result = new ArrayList<TPoint>();

for (RoutingTripPattern pattern : patterns) {
if (onlyNewPatterns && pattern.patternIndex() <= lastPatternIndex) {

Suggested change
- if (onlyNewPatterns && pattern.patternIndex() <= lastPatternIndex) {
+ if (onlyNewPatterns && !isNewPattern(pattern)) {

@slvlirnoff
Member Author

Thanks for the review @t2gran I'll try to address your comments shortly.

@t2gran t2gran added this to the 2.7 (next release) milestone Dec 11, 2024
@optionsome optionsome marked this pull request as draft December 12, 2024 14:29
Development

Successfully merging this pull request may close these issues.

TransferIndexGenerator ConstrainedTransfer and TransitLayerUpdater performance issue
3 participants