[FEATURE] Add utility to abbreviate road names #14
Comments
mapnik-german-l10n doesn’t produce very good results in English because it has to avoid stepping on the toes of the French abbreviation code and also abbreviates words out of context (e.g., “Court Street” becomes “Ct Street”). I’d imagine it would be straightforward to write more robust abbreviation code from scratch. Ideally, the abbreviator would know the country that the feature is in, allowing it to make language-specific assumptions about |
These projects have different use cases, so they apply different inclusion criteria. The abbreviations in OSRM Text Instructions are used by the Mapbox Directions API, which tags words in

Mapnik has a similar ability to progressively abbreviate labels, but as you’ve noted, MapLibre does not yet have this capability. If you use OSRM Text Instructions’ abbreviations without progressive abbreviation, avoid the
|
Thanks for the feedback @1ec5 ! Of those 3 options above, I was leaning towards geocoder-abbreviations since it includes things like "one" -> "1" and "fifteenth" -> "15th" with something like: point -> country -> default language for that country -> tokens file for that language. Do you foresee any issues going that route? Or alternatively are you aware of any other better data sources to power these kinds of abbreviations? |
geocoder-abbreviations is more comprehensive; however, it doesn’t distinguish the more aggressive abbreviations from the less aggressive ones. You’d need to be careful about applying abbreviations only to words that are unlikely to be the base name, so that “South Park Street” would get abbreviated as “S Park St” rather than “S Pk St”, which wouldn’t be very recognizable. This touches on a broader problem with OSM’s unstructured representation of street names, combined with insisting on spelled-out words in |
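A minimal sketch of the position-aware rule described in the comment above, assuming English names only. The tables and the `abbreviate_english` function are hypothetical and illustrative; they are not taken from geocoder-abbreviations or OSRM Text Instructions. Only a leading directional and a trailing street type are abbreviated, so the base name in the middle is left alone.

```python
# Hypothetical, deliberately tiny tables for illustration only.
DIRECTIONS = {"north": "N", "south": "S", "east": "E", "west": "W"}
SUFFIXES = {"street": "St", "avenue": "Ave", "court": "Ct", "park": "Pk"}


def abbreviate_english(name):
    """Abbreviate only the leading directional and the trailing street type."""
    words = name.split()
    out = list(words)

    # The last word is treated as the street type only if something precedes it.
    if len(words) >= 2 and words[-1].lower() in SUFFIXES:
        out[-1] = SUFFIXES[words[-1].lower()]
        base_end = len(words) - 1
    else:
        base_end = len(words)

    # The first word is treated as a directional only if at least one
    # base-name word remains between it and the street type.
    if base_end >= 2 and words[0].lower() in DIRECTIONS:
        out[0] = DIRECTIONS[words[0].lower()]

    return " ".join(out)


print(abbreviate_english("South Park Street"))  # S Park St
print(abbreviate_english("Court Street"))       # Court St
```

With this rule "Park" and "Court" stay spelled out whenever they are the base name, and "South Street" keeps "South" because nothing else would be left to identify the street.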
Good points, I'll keep a running list of test cases that should pass at the top of this issue. Another thought I had was using libpostal (https://github.com/openvenues/jpostal) to try to extract some more structure from raw street names. Not sure it would handle street names and not addresses though...? |
Moved the openmaptiles profile to https://github.com/openmaptiles/planetiler-openmaptiles. This ticket will track adding the generic capability of abbreviating road-names, and openmaptiles/planetiler-openmaptiles#17 will track using that from the openmaptiles profile. |
This repository “moved” to https://github.com/geocoding/geocoder-abbreviations/. |
Thanks - I was wondering if this might even make more sense as a client-side capability - possibly requiring a new hook added to maplibre? The raw data is the full length name, it seems like a client may either want to render abbreviated or un-abbreviated versions from the same raw tile data? Either way, we need a datasource that describes the abbreviation rules by locale... |
The client-side renderer doesn’t have as much context as the tile generator – and the string processing capabilities inside MapLibre are comically limited. I think ideally the tiles would contain both the unabbreviated and abbreviated forms, maybe even a series of progressively abbreviated forms, that the client would apply as space allows. In osm-americana/openstreetmap-americana#793 (comment), I started prototyping a purely client-side solution for dropping less important words from a street name, which is a similar problem. Identifying the words to delete and then deleting them is massively difficult, but choosing the best-fitting name turned out to be quite feasible, with the caveat of increased CPU usage due to extra symbol placement. If the tiles already contain the candidate names, displaying the best one should be pretty straightforward with MapLibre’s existing capabilities. If MapLibre then adds a built-in way to try different labels as space allows, as Mapnik does, then it would be all that much easier. |
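As a rough illustration of "choosing the best-fitting name" from candidates already stored in the tile, here is a sketch that assumes the candidates are ordered from least to most abbreviated. The `pick_label` function is hypothetical, and character count is used as a crude stand-in for rendered label width; a real renderer would measure glyphs along the line.

```python
def pick_label(candidates, max_chars):
    """Return the first (least abbreviated) candidate that fits, or None."""
    for name in candidates:
        if len(name) <= max_chars:
            return name
    return None  # nothing fits, so the label would be skipped


candidates = ["South Park Street", "S Park Street", "S Park St"]
print(pick_label(candidates, 20))  # South Park Street
print(pick_label(candidates, 10))  # S Park St
```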
The feature properties transform hook could be good for this. I should finish the pull request at some point... maplibre/maplibre-gl-js#4199 |
The transform hook would be desirable – it would make it easier to work around the lack of string processing expression operators. However, there still isn’t very much context, so the client-side code would need to make lots of assumptions, as I did in my proof of concept above. As with the other stuff that OSM Americana does with runtime styling, it isn’t very portable and makes it harder to integrate the stylesheet into an application centered around “basemap layers”. A tile generator is in a much better position to precalculate the abbreviations. It can always stick them in a separate field, just as with localized names, to maximize the tileset’s versatility. |
When you say "context" are you referring to just the country/default language that the feature is in? Including both abbreviated and unabbreviated names in the tiles addresses my concern of letting the client choose which one to use. We could also possibly default to

What do you think the best way to get started on this would be @1ec5 ? TBH the main reason I haven't done this yet is that your earlier comments made it seem like it might not even be possible to do correctly 😆 Maybe we could start with a conservative subset of one of those data sources (i.e., just the ones in your label density branch) and gradually add more? |
Yes, the language and country are important context. Otherwise, we don’t know whether to abbreviate “boulevard” as “Bd.” (French) or as “Blvd.” (English), or whether to abbreviate “Calle” to “C.” (Spanish, but not when borrowed into English). For schemas such as OpenMapTiles, we know the language of most of the name tags but not
Yes,
My branch took advantage of the style’s focus on North America and only targeted English users, but if Planetiler has context about the country and can guess the language, then it doesn’t have to make such assumptions and doesn’t need to be particularly conservative. That said, a small hard-coded list is better than nothing, as long as it’s a coherent list, rather than the arbitrary one OpenMapTiles uses. |
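To make the language dependence concrete, here is a small sketch of a per-language lookup using only the examples mentioned in this thread ("boulevard" and "Calle"). The `ABBREVIATIONS` table and `abbreviate_word` function are hypothetical; a real list would come from a display-oriented data source.

```python
# Hypothetical table keyed by language code, then by lowercase word.
ABBREVIATIONS = {
    "en": {"boulevard": "Blvd."},
    "fr": {"boulevard": "Bd."},
    "es": {"calle": "C."},
}


def abbreviate_word(word, language):
    """Abbreviate a word using the given language's table, if it has an entry."""
    table = ABBREVIATIONS.get(language, {})
    return table.get(word.lower(), word)


print(abbreviate_word("Boulevard", "fr"))  # Bd.
print(abbreviate_word("Boulevard", "en"))  # Blvd.
print(abbreviate_word("Calle", "en"))      # Calle (borrowed word stays spelled out)
```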
Thanks, so a rough approach in planetiler would be:
I guess it would also be possible to generate abbreviations for each name translation like Do you think it would be best to start from one of those existing data sources? Or build up a map-rendering-specific abbreviation data source from scratch? |
The most important heuristic for language identification would be finding a
This is about what language In theory,
Coverage for streets would be very low except in some bilingual cities. Also, the OpenMapTiles schema currently only stores road names in English and German, ignoring other languages. A possible half-measure would be to always produce a
Either is fine I guess, but if we use an existing source, it should be one intended for display rather than search. A search-oriented dataset will tend to be too aggressive. |
I think if you give an |
Yeah the geocoder one (https://github.com/geocoding/geocoder-abbreviations/) seems like it's going to be too aggressive. The OSRM one (https://github.com/Project-OSRM/osrm-text-instructions/tree/master/languages/abbreviations) seems closer to what we want, except the last update was by @1ec5 6 years ago 😆 To Oliver's point - I don't want to be invoking an LLM from planetiler directly, but I could see using one to generate static config files to start from and maybe test cases. It might be 90% of the way there and we can correct from that. |
It’s pretty stable; most languages don’t change their abbreviations very often. The nice thing about that project is that volunteers can sign up to add their language on Transifex today using a decently user-friendly interface.
I don’t think we need to treat OSM like a black box and rely on an LLM to interpret it for us. My only point was that |
OK cool, trying to think how to decouple this into simpler subtasks...it seems like there are 3 distinct steps here:
Or do you think it needs to go down to region instead of country in order to infer a language? I could see building up a set of test cases for 1 and a separate set of test cases for 2 to make sure the rules from that wiki page and heuristics work together to produce the desired language inference. Or do you think we can't really separate 1 and 2 and we'll want the code/test cases to go directly from element and inferred country to abbreviated name? Or even worse that 3 can't be separated and we'd need to go from element and location to abbreviated name? |
Yes, and to add to the complexity, we definitely would have to consider subnational divisions such as Québec and the cantons of Switzerland.
Natural Earth should be fine. We don’t really need precision, because all we’re doing with the data is choosing abbreviations for Rest assured, just about anything we do here is already going to surpass OpenMapTiles, Mapzen, and I think Mapbox Streets in terms of comprehensiveness. |
OK cool, so sounds like a revised approach would be:
|
I think this is a fair starting assumption, though I wouldn’t be surprised if we eventually find out about country-specific abbreviations in some language. |
OK got it, I guess it doesn't change the last step much except it could take a country/region as well. This seems doable, the first step I'd get started on when I have time would be the element -> country/region provider. |
When implementing the basemap layer, I did not port road name abbreviations from https://github.com/giggls/mapnik-german-l10n because of the licensing and just pass road names from OpenStreetMap directly. This is not ideal because clients like MapLibre won't show labels on a line if we can't fit the entire string, so overall shorter label names will increase label density.
I think it would be best to abbreviate all road names (not just long ones), for example:
Planning to use https://github.com/Project-OSRM/osrm-text-instructions/tree/master/languages/abbreviations as a starting data source. Other possibilities are the OpenCageData address formatter and geocoder-abbreviations, but search-specific datasets will likely be too aggressive.
Subtasks to build/test:
- (coordinate) => (admin level 2/admin level 4) or (adm2/adm4)[] - this will support pluggable datasets (natural earth, overture division areas) to choose a resolution and POV and fall back to country-coder geojson files. It will also be usable for profiles to get the country/region for a feature for other purposes besides abbreviation
- (element, country, region?) => language or languages[] - to infer what language a name tag is in, based on its location and other tags on it
- (name, language, country?, region?) => abbreviation - that applies rules to a name based on that language
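
The three subtasks could compose roughly as follows. This is only a sketch under loud assumptions: the bounding boxes are rough placeholders rather than real boundaries (a real provider would use Natural Earth or similar polygons), language inference just takes a per-country default and ignores the element's tags, and every name here (`locate`, `infer_language`, `abbreviate`, `abbreviated_name`, and the tables) is hypothetical.

```python
# Step 1 stand-in: rough, illustrative boxes (min_lon, min_lat, max_lon, max_lat).
COUNTRY_BOXES = {
    "US": (-125.0, 24.0, -66.0, 50.0),
    "FR": (-5.0, 42.0, 8.0, 51.0),
}
# Step 2 stand-in: one default language per country.
DEFAULT_LANGUAGE = {"US": "en", "FR": "fr"}
# Step 3 stand-in: per-language rules and where the street type sits in a name.
RULES = {"en": {"boulevard": "Blvd."}, "fr": {"boulevard": "Bd."}}
GENERIC_POSITION = {"en": -1, "fr": 0}  # last word in English, first in French


def locate(lon, lat):
    """(coordinate) => country code, or None if no box matches."""
    for country, (x0, y0, x1, y1) in COUNTRY_BOXES.items():
        if x0 <= lon <= x1 and y0 <= lat <= y1:
            return country
    return None


def infer_language(country):
    """(country) => language; a real version would also inspect the element's tags."""
    return DEFAULT_LANGUAGE.get(country)


def abbreviate(name, language):
    """(name, language) => abbreviation; unknown languages leave the name as-is."""
    words = name.split()
    table = RULES.get(language)
    if table is None or len(words) < 2:
        return name
    i = GENERIC_POSITION[language]
    words[i] = table.get(words[i].lower(), words[i])
    return " ".join(words)


def abbreviated_name(name, lon, lat):
    return abbreviate(name, infer_language(locate(lon, lat)))


print(abbreviated_name("Sunset Boulevard", -118.3, 34.1))       # Sunset Blvd.
print(abbreviated_name("Boulevard Saint-Germain", 2.35, 48.85)) # Bd. Saint-Germain
```

Keeping the steps as separate functions means each one can get its own test cases, and the location provider can be reused by profiles for purposes other than abbreviation.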