DELFI (and others): Use wikidata entity id as agency_id #143

Open
hbruch opened this issue Apr 4, 2024 · 4 comments
Labels
DELFI Erweiterung Erweiterungs- oder Änderungswunsch

Comments

@hbruch
Member

hbruch commented Apr 4, 2024

Current issue(s)
For all agencies in the DELFI GTFS feed, agency_url is set to https://www.delfi.de, and for most of them further information (besides the name) is missing. Regarding the agency IDs, it's unclear who maintains them and whether they are stable across different feed versions.

Enhancement/addition I'd like to suggest
As all of this information should be publicly available, and many agencies are already present in wikidata, I suggest using wikidata entity IDs as identifiers. Further information could then be linked to agencies, and unique IDs across GTFS feeds would be achieved automatically.

This would also be a step toward promoting linked open data in the transit domain.

Downsides
Currently, the DELFI feed often uses a kind of "dummy agency" (see #107) that would not exist in wikidata. Personally, I consider this bad practice, and the recommendation to use wikidata entity IDs could underline that real agencies should be specified. As long as this is not the case, agency_ids not referring to an existing wikidata entity should at least not use the wikidata entity ID format, i.e. they should not start with a Q followed by numbers.
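
To illustrate the format rule, here is a minimal sketch of a check that flags agency_id values shaped like wikidata entity IDs. It assumes Python and agency.txt content read into a string; the sample rows and the ID `Q123456` are placeholders, not real DELFI data or a real agency entity, and the "no leading zero" detail in the pattern is my assumption.

```python
import csv
import io
import re

# Wikidata item IDs are "Q" followed by digits; this sketch assumes no leading zero.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def looks_like_qid(agency_id: str) -> bool:
    """Return True if the whole agency_id has the shape of a wikidata item ID."""
    return QID_PATTERN.fullmatch(agency_id) is not None


def qid_shaped_agency_ids(agency_txt: str) -> list[str]:
    """Collect the agency_id values in agency.txt content that look like QIDs."""
    reader = csv.DictReader(io.StringIO(agency_txt))
    return [
        row["agency_id"]
        for row in reader
        if looks_like_qid(row.get("agency_id") or "")
    ]


# Placeholder rows, not taken from the DELFI feed.
SAMPLE_AGENCY_TXT = """agency_id,agency_name,agency_url,agency_timezone
Q123456,Example Verkehrsbetrieb,https://www.delfi.de,Europe/Berlin
14901,Dummy agency,https://www.delfi.de,Europe/Berlin
"""

print(qid_shaped_agency_ids(SAMPLE_AGENCY_TXT))
```

Every ID this returns would then have to resolve to an existing wikidata entity; that lookup is not part of the sketch.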

Last update of GTFS Feed
2024-04-02

GTFS Feed Download Link
Open-Data ÖPNV

@hbruch hbruch added Erweiterung Erweiterungs- oder Änderungswunsch DELFI labels Apr 4, 2024
@hbruch
Member Author

hbruch commented Apr 4, 2024

To start collecting the entity identifiers and match them with the current agency_id, I started this DELFI GTFS Agencies Google Sheet. Feel free to create missing agencies in wikidata.

@BeckertAnke

Your suggestion is an interesting approach. It will be included in our internal discussion about adapting agency.txt.
Using the example of the associations in Baden-Württemberg, I would like to point out the following restriction: the company Friedrich Müller Omnibus operates in both the VVS and the HNV, and each association has its own internal ID for Friedrich Müller Omnibus. It will be difficult to merge these two IDs in our data collector and reference them to the wikidata ID.

Best Regards
BeckertAnke (DELFI-Team)

@hbruch
Member Author

hbruch commented Apr 5, 2024

Thanks for considering it in the further discussion. I guess this ID merge restriction is also the reason for the current _G or _D suffixes in stops.txt or routes.txt? If that's the case, I think a general solution for merging equivalent entities provided by different agencies needs to be found. If the collector itself can't merge them, a post-processing step might be required(?)
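
As a rough idea of what such a post-processing step could look like, here is a sketch in Python. It assumes a manually maintained mapping from supplier-internal agency IDs to wikidata entity IDs (for example, exported from a spreadsheet). The function name, the internal IDs `vvs_17` and `hnv_42`, and `Q123456` are all hypothetical, and nothing here reflects how the DELFI collector actually works.

```python
def merge_agencies(agencies, routes, id_to_qid):
    """Rewrite agency_id values via id_to_qid and collapse duplicate agencies.

    agencies and routes are lists of dicts (one per row of agency.txt and
    routes.txt). IDs missing from the mapping are left unchanged.
    """
    merged = {}
    for agency in agencies:
        new_id = id_to_qid.get(agency["agency_id"], agency["agency_id"])
        # Keep the first row seen for each target ID; later duplicates are dropped.
        merged.setdefault(new_id, {**agency, "agency_id": new_id})

    remapped_routes = [
        {**route, "agency_id": id_to_qid.get(route["agency_id"], route["agency_id"])}
        for route in routes
    ]
    return list(merged.values()), remapped_routes


# Hypothetical input: one operator known under two supplier-internal IDs.
agencies = [
    {"agency_id": "vvs_17", "agency_name": "Friedrich Müller Omnibus"},
    {"agency_id": "hnv_42", "agency_name": "Friedrich Müller Omnibus"},
    {"agency_id": "other_1", "agency_name": "Some other agency"},
]
routes = [
    {"route_id": "r1", "agency_id": "vvs_17"},
    {"route_id": "r2", "agency_id": "hnv_42"},
    {"route_id": "r3", "agency_id": "other_1"},
]
id_to_qid = {"vvs_17": "Q123456", "hnv_42": "Q123456"}  # placeholder QID

merged_agencies, remapped_routes = merge_agencies(agencies, routes, id_to_qid)
```

Any other file that references agency_id would need the same remapping, and conflicting attributes between the merged rows (name, URL, phone) would still need a resolution rule; the sketch simply keeps the first row.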

@BeckertAnke

You're right. The _G suffixes are added to the GTFS feed due to data merging. We do not want the _D suffixes either. We are following this up with the data suppliers and asking them to provide us with correct data sets.

Merging agencies is hard work ;-)
