[FEATURE] Add utility to abbreviate road names #14

msbarry · 2021-10-29T10:48:42Z

When implementing the basemap layer, I did not port road name abbreviations from https://github.com/giggls/mapnik-german-l10n because of the licensing and just pass road names from OpenStreetMap directly. This is not ideal because clients like MapLibre won't show labels on a line if we can't fit the entire string, so overall shorter label names will increase label density.

I think it would be best to abbreviate all road names (not just long ones), for example:

Input	Output
Park Avenue	Park Ave
Northeast Boulevard	NE Blvd
East 61st Street	E 61st St
First Street	1st St
South Park Street	S Park St
South Street	South St

Planning to use https://github.com/Project-OSRM/osrm-text-instructions/tree/master/languages/abbreviations as a starting data source (other possibilities include the OpenCageData address formatter and geocoder-abbreviations, but search-specific datasets will likely be too aggressive).

Subtasks to build/test:

1. (coordinate) => (admin level 2/admin level 4) or (adm2/adm4)[] - this will support pluggable datasets (Natural Earth, Overture division areas) to choose a resolution and POV, and fall back to country-coder GeoJSON files. Profiles will also be able to use it to get the country/region of a feature for purposes other than abbreviation
2. (element, country, region?) => language or languages[] - infers what language a name tag is in, based on its location and the other tags on it
3. (name, language, country?, region?) => abbreviation - applies rules to a name based on that language
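A minimal sketch of how the three subtasks might be expressed as pluggable interfaces. It is written in Java because Planetiler is a Java project. Every type and method name here (AbbreviationPipeline, AdminAreaProvider, LanguageGuesser, Abbreviator) is hypothetical and not part of Planetiler's API. The stub lambdas in main stand in for real datasets.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these types exist in Planetiler.
final class AbbreviationPipeline {
  /** An admin area containing a point, e.g. country "CA" with region "CA-QC". */
  record AdminArea(String country, String region) {}

  /** Subtask 1: (coordinate) => admin areas, from Natural Earth, Overture, or country-coder data. */
  interface AdminAreaProvider {
    List<AdminArea> lookup(double lon, double lat);
  }

  /** Subtask 2: (element tags, country, region?) => candidate languages, most likely first. */
  interface LanguageGuesser {
    List<String> guess(Map<String, String> tags, String country, String region);
  }

  /** Subtask 3: (name, language, country?, region?) => abbreviated name. */
  interface Abbreviator {
    String abbreviate(String name, String language, String country, String region);
  }

  private final AdminAreaProvider areas;
  private final LanguageGuesser languages;
  private final Abbreviator abbreviator;

  AbbreviationPipeline(AdminAreaProvider areas, LanguageGuesser languages, Abbreviator abbreviator) {
    this.areas = areas;
    this.languages = languages;
    this.abbreviator = abbreviator;
  }

  /** Chains the three subtasks; returns empty if any step has no answer. */
  Optional<String> abbreviate(Map<String, String> tags, double lon, double lat) {
    String name = tags.get("name");
    if (name == null) return Optional.empty();
    List<AdminArea> found = areas.lookup(lon, lat);
    if (found.isEmpty()) return Optional.empty();
    AdminArea area = found.get(0); // choosing among overlapping claims (POV) is left open here
    List<String> langs = languages.guess(tags, area.country(), area.region());
    if (langs.isEmpty()) return Optional.empty();
    return Optional.of(abbreviator.abbreviate(name, langs.get(0), area.country(), area.region()));
  }

  public static void main(String[] args) {
    // Stub implementations stand in for the real datasets.
    AbbreviationPipeline pipeline = new AbbreviationPipeline(
        (lon, lat) -> List.of(new AdminArea("US", "US-NY")),
        (tags, country, region) -> List.of("en"),
        (name, lang, country, region) -> name.replace("Avenue", "Ave"));
    System.out.println(pipeline.abbreviate(Map.of("name", "Park Avenue"), -73.97, 40.76).orElse("(none)"));
  }
}
```

Keeping each step behind its own interface lets profiles reuse the coordinate lookup for other purposes. It also lets each step get its own set of test cases.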


1ec5 · 2022-02-16T10:37:08Z

mapnik-german-l10n doesn’t produce very good results in English because it has to avoid stepping on the toes of the French abbreviation code and also abbreviates words out of context (e.g., “Court Street” becomes “Ct Street”). I’d imagine it would be straightforward to write more robust abbreviation code from scratch. Ideally, the abbreviator would know the country that the feature is in, allowing it to make language-specific assumptions about name and perhaps avoid abbreviating a French name in France that was copied to name:en.

/ref openmaptiles/openmaptiles#1360

1ec5 · 2022-02-16T10:56:23Z

Some possible data sources:

These projects have different use cases, so they apply different inclusion criteria. The abbreviations in OSRM Text Instructions are used by the Mapbox Directions API, which tags words in name or destination with potential abbreviations that the Mapbox Navigation SDK can progressively apply¹ until the text fits the allotted space. Priority is given to the directions abbreviations, then the classifications abbreviations, then the entries in the abbreviations table as a last-ditch attempt to make the text fit.

Mapnik has a similar ability to progressively abbreviate labels, but as you’ve noted, MapLibre does not yet have this capability. If you use OSRM Text Instructions’ abbreviations without progressive abbreviation, avoid the abbreviations table, which would make the results look desperate but probably wouldn’t salvage many labels of borderline length.
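To make the tiered behavior concrete, here is a sketch that applies each tier cumulatively and then picks the first form that fits. The three tiers mirror the directions, classifications, and abbreviations tables described above. Their entries are made up for illustration, and character count is an assumed stand-in for rendered label width. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical tiers modeled loosely on OSRM Text Instructions' directions, classifications,
// and abbreviations tables; the entries are illustrative, not copied from that dataset.
final class ProgressiveAbbreviator {
  static final List<Map<String, String>> TIERS = List.of(
      Map.of("North", "N", "South", "S", "East", "E", "West", "W"), // directions
      Map.of("Street", "St", "Avenue", "Ave", "Boulevard", "Blvd"), // classifications
      Map.of("Saint", "St", "Mount", "Mt")); // last-ditch abbreviations

  /** Replaces whole words that appear in one tier's table. */
  static String applyTier(String name, Map<String, String> tier) {
    String[] words = name.split(" ");
    for (int i = 0; i < words.length; i++) {
      words[i] = tier.getOrDefault(words[i], words[i]);
    }
    return String.join(" ", words);
  }

  /** The full name, followed by one cumulatively abbreviated form per tier. */
  static List<String> candidates(String name) {
    List<String> out = new ArrayList<>();
    out.add(name);
    String current = name;
    for (Map<String, String> tier : TIERS) {
      current = applyTier(current, tier);
      out.add(current);
    }
    return out;
  }

  /** First candidate within the budget; character count stands in for rendered width. */
  static String fit(String name, int maxChars) {
    List<String> all = candidates(name);
    for (String candidate : all) {
      if (candidate.length() <= maxChars) return candidate;
    }
    return all.get(all.size() - 1); // nothing fits: fall back to the shortest form
  }

  public static void main(String[] args) {
    System.out.println(candidates("North Mount Vernon Avenue"));
  }
}
```

Because MapLibre cannot do this step itself, a tile generator could emit the whole candidates list and leave the choice to the client.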

¹ This is a link to the last open-source version of the code in question.

msbarry · 2022-02-16T12:37:55Z

Thanks for the feedback @1ec5 ! Of those 3 options above, I was leaning towards geocoder-abbreviations since it includes things like "one" -> "1" and "fifteenth" -> "15th" with something like: point -> country -> default language for that country -> tokens file for that language. Do you foresee any issues going that route? Or alternatively are you aware of any other better data sources to power these kinds of abbreviations?

1ec5 · 2022-02-16T13:17:32Z

geocoder-abbreviations is more comprehensive; however, it doesn’t distinguish the more aggressive abbreviations from the less aggressive ones. You’d need to be careful about applying abbreviations only to words that are unlikely to be the base name, so that “South Park Street” would get abbreviated as “S Park St” rather than “S Pk St”, which wouldn’t be very recognizable. This touches on a broader problem with OSM’s unstructured representation of street names, combined with insisting on spelled-out words in name, but some heuristics like avoiding abbreviating the middle word(s) could help.
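One way to implement the "avoid abbreviating the middle word(s)" heuristic is to touch only a leading direction and a trailing street type. The leading direction is skipped when abbreviating it would leave no base name. This is a hypothetical, English-only sketch with tiny illustrative tables, not an existing library.

```java
import java.util.Map;

// Hypothetical English-only sketch; real tables would come from a display-oriented dataset.
final class ConservativeAbbreviator {
  static final Map<String, String> DIRECTIONS = Map.of(
      "North", "N", "South", "S", "East", "E", "West", "W", "Northeast", "NE");
  // "Park" is included on purpose to show that a middle word is never abbreviated.
  static final Map<String, String> SUFFIXES = Map.of(
      "Street", "St", "Avenue", "Ave", "Boulevard", "Blvd", "Park", "Pk");

  /**
   * Abbreviates a trailing street type and a leading direction, never the middle words,
   * and leaves the direction alone when it is the only word left to serve as the base name.
   */
  static String abbreviate(String name) {
    String[] words = name.split(" ");
    int n = words.length;
    if (n < 2) return name;
    boolean hasSuffix = SUFFIXES.containsKey(words[n - 1]);
    boolean hasPrefix = DIRECTIONS.containsKey(words[0]);
    // Number of words that would remain spelled out as the base name.
    int base = n - (hasSuffix ? 1 : 0) - (hasPrefix ? 1 : 0);
    if (hasSuffix) words[n - 1] = SUFFIXES.get(words[n - 1]);
    if (hasPrefix && base > 0) words[0] = DIRECTIONS.get(words[0]);
    return String.join(" ", words);
  }

  public static void main(String[] args) {
    System.out.println(abbreviate("South Park Street"));
    System.out.println(abbreviate("South Street"));
  }
}
```

With "Park" in the suffix table, "South Park Street" still becomes "S Park St" rather than "S Pk St", and "South Street" stays "South St". By the same rule, "Northeast Boulevard" becomes "Northeast Blvd" rather than the "NE Blvd" in the example table, so those two test cases pull in opposite directions.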

msbarry · 2022-02-16T13:33:24Z

Good points, I'll keep a running list of test cases that should pass at the top of this issue.

Another thought I had was using libpostal (https://github.com/openvenues/jpostal) to try to extract some more structure from raw street names. Not sure it would handle street names and not addresses though...?

msbarry · 2022-07-29T10:57:26Z

Moved the openmaptiles profile to https://github.com/openmaptiles/planetiler-openmaptiles. This ticket will track adding the generic capability of abbreviating road-names, and openmaptiles/planetiler-openmaptiles#17 will track using that from the openmaptiles profile.

1ec5 · 2024-09-10T04:08:39Z

https://github.com/mapbox/geocoder-abbreviations (en example abbreviations)

This repository “moved” to https://github.com/geocoding/geocoder-abbreviations/.

msbarry · 2024-09-10T09:37:58Z

Thanks - I was wondering if this might make more sense as a client-side capability, possibly requiring a new hook in MapLibre? The raw data is the full-length name, and it seems like a client may want to render either the abbreviated or the unabbreviated version from the same raw tile data.

Either way, we need a datasource that describes the abbreviation rules by locale...

1ec5 · 2024-09-10T14:45:31Z

The client-side renderer doesn’t have as much context as the tile generator – and the string processing capabilities inside MapLibre are comically limited. I think ideally the tiles would contain both the unabbreviated and abbreviated forms, maybe even a series of progressively abbreviated forms, that the client would apply as space allows.

In osm-americana/openstreetmap-americana#793 (comment), I started prototyping a purely client-side solution for dropping less important words from a street name, which is a similar problem. Identifying the words to delete and then deleting them is massively difficult, but choosing the best-fitting name turned out to be quite feasible, with the caveat of increased CPU usage due to extra symbol placement.

If the tiles already contain the candidate names, displaying the best one should be pretty straightforward with MapLibre’s existing capabilities. If MapLibre then adds a built-in way to try different labels as space allows, as Mapnik does, then it would be all the easier.

wipfli · 2024-09-10T21:10:15Z

The feature properties transform hook could be good for this. I should finish the pull request at some point... maplibre/maplibre-gl-js#4199

1ec5 · 2024-09-10T22:22:31Z

The transform hook would be desirable – it would make it easier to work around the lack of string processing expression operators. However, there still isn’t very much context, so the client-side code would need to make lots of assumptions, as I did in my proof of concept above. As with the other stuff that OSM Americana does with runtime styling, it isn’t very portable and makes it harder to integrate the stylesheet into an application centered around “basemap layers”. A tile generator is in a much better position to precalculate the abbreviations. It can always stick them in a separate field, just as with localized names, to maximize the tileset’s versatility.

msbarry · 2024-09-11T09:35:34Z

When you say "context" are you referring to just the country/default language that the feature is in?

Including both abbreviated and unabbreviated names in the tiles solves my concern about letting the client choose which one to use. We could also possibly default to name:abbreviation, short_name, or ref when available instead of computing it?

What do you think the best way to get started on this would be @1ec5 ? TBH the main reason I haven't done this yet is that your earlier comments made it seem like it might not even be possible to do correctly 😆 Maybe we could start with a conservative subset of one of those data sources (i.e., just the ones in your label density branch) and gradually add more?

1ec5 · 2024-09-11T18:20:59Z

When you say "context" are you referring to just the country/default language that the feature is in?

Yes, the language and country are important context. Otherwise, we don’t know whether to abbreviate “boulevard” as “Bd.” (French) or as “Blvd.” (English), or whether to abbreviate “Calle” to “C.” (Spanish, but not when borrowed into English). For schemas such as OpenMapTiles, we know the language of most of the name tags but not name, so that would have to come from heuristics, like finding a matching name:* tag or a matching Wikidata label, or detecting certain character patterns, and then breaking any ties based on the country it’s in.
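A sketch of why the language has to be known before abbreviating: the same word maps to different abbreviations, or to none, depending on which language table is selected. The tables contain only the examples from this comment plus "street". They are illustrative, and the class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Illustrative per-language tables built from the examples above; not a vetted dataset.
final class LanguageAwareAbbreviator {
  static final Map<String, Map<String, String>> TABLES = Map.of(
      "en", Map.of("boulevard", "Blvd.", "street", "St."),
      "fr", Map.of("boulevard", "Bd."),
      "es", Map.of("calle", "C."));

  /** Abbreviates words found in the table for `language`; other languages pass through unchanged. */
  static String abbreviate(String name, String language) {
    Map<String, String> table = TABLES.getOrDefault(language, Map.of());
    String[] words = name.split(" ");
    for (int i = 0; i < words.length; i++) {
      String replacement = table.get(words[i].toLowerCase(Locale.ROOT));
      if (replacement != null) words[i] = replacement;
    }
    return String.join(" ", words);
  }

  public static void main(String[] args) {
    System.out.println(abbreviate("Boulevard Saint-Michel", "fr"));
    System.out.println(abbreviate("Sunset Boulevard", "en"));
  }
}
```

The "Calle Real" case shows the point about borrowing: with English selected, the Spanish rule never fires.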

We could also possibly default to name:abbreviation, short_name, or ref when available instead of computing it?

Yes, short_name should be one candidate abbreviation, though maybe not the final abbreviation. For example, a street named Dr. Martin Luther King Dr. might have a short name of “MLK Drive”.

TBH the main reason I haven't done this yet is that your earlier comments made it seem like it might not even be possible to do correctly 😆 Maybe we could start with a conservative subset of one of those data sources (i.e., just the ones in your label density branch) and gradually add more?

My branch took advantage of the style’s focus on North America and only targeted English users, but if Planetiler has context about the country and can guess the language, then it doesn’t have to make such assumptions and doesn’t need to be particularly conservative. That said, a small hard-coded list is better than nothing, as long as it’s a coherent list, rather than the arbitrary one OpenMapTiles uses.

msbarry · 2024-09-12T10:59:23Z

Thanks, so a rough approach in planetiler would be:

1. Pick a name to start with: prefer short_name, otherwise name:abbreviation, otherwise name.
2. Collect context about the feature:
- what country it is in (this can be a controversial topic 😬 )
- best-guess language (match a translation from OSM or Wikidata, otherwise fall back to ICU language detection, breaking ties with the predominant language in that country)
3. Apply language-specific (and maybe country-specific) rules to abbreviate that name.
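The first step could be sketched as follows. Returning every candidate in preference order, instead of only the first, is an assumption. It follows the earlier point that short_name is one candidate rather than necessarily the final abbreviation. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for the "pick a name to start with" step.
final class StartingName {
  static final List<String> PREFERRED_KEYS = List.of("short_name", "name:abbreviation", "name");

  /** All distinct, non-blank values of the preferred keys, best first. */
  static List<String> candidates(Map<String, String> tags) {
    List<String> out = new ArrayList<>();
    for (String key : PREFERRED_KEYS) {
      String value = tags.get(key);
      if (value != null && !value.isBlank() && !out.contains(value)) out.add(value);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(candidates(Map.of("name", "Dr. Martin Luther King Dr.", "short_name", "MLK Drive")));
  }
}
```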

I guess it would also be possible to generate abbreviations for each name translation like name:de -> name:abbrev:de which would let us bypass language detection but I assume the coverage wouldn't be as high so clients would want to fall back to the main abbreviated name.

Do you think it would be best to start from one of those existing data sources? Or build up a map-rendering-specific abbreviation data source from scratch?

1ec5 · 2024-09-12T15:05:38Z

The most important heuristic for language identification would be finding a name:* that matches name.
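A hypothetical sketch of that heuristic: collect the name:<lang> keys whose value equals name, then order the matches by the country's languages. When nothing matches, it falls back to the country's languages. The language-code pattern used to skip keys like name:left is an assumption and is not a complete BCP 47 validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch of the "find a name:* that matches name" heuristic.
final class NameLanguageGuesser {
  // Accepts suffixes shaped like language codes ("fr", "gsw", "zh-Hans"), rejecting e.g. name:left.
  private static final Pattern LANG = Pattern.compile("[a-z]{2,3}(-[A-Za-z0-9]{2,8})*");

  /**
   * Languages whose name:<lang> equals name. Matches that are also languages of the country
   * come first, in the country's order; with no match, the country's languages are returned.
   */
  static List<String> guess(Map<String, String> tags, List<String> countryLanguages) {
    String name = tags.get("name");
    TreeSet<String> matches = new TreeSet<>(); // sorted for deterministic output
    if (name != null) {
      for (Map.Entry<String, String> e : tags.entrySet()) {
        String key = e.getKey();
        if (key.startsWith("name:") && name.equals(e.getValue())
            && LANG.matcher(key.substring(5)).matches()) {
          matches.add(key.substring(5));
        }
      }
    }
    List<String> result = new ArrayList<>();
    for (String lang : countryLanguages) {
      if (matches.contains(lang)) result.add(lang);
    }
    for (String lang : matches) {
      if (!result.contains(lang)) result.add(lang);
    }
    if (result.isEmpty()) result.addAll(countryLanguages);
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> tags = Map.of("name", "Bahnhofstrasse", "name:de", "Bahnhofstrasse", "name:gsw", "Bahnhofstrasse");
    System.out.println(guess(tags, List.of("de", "fr", "it", "rm")));
  }
}
```

For a Zürich street where name, name:de and name:gsw are identical, this returns de before gsw because German is listed for Switzerland.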

what country it is in (this can be a controversial topic 😬 )

This is about what language name is in for the country in OSM in practice, not what it should be. In cases like India, where name is in English despite the “on the ground” principle, this functionality would go with English. I think that, more than any controversy, we’ll find many edge cases where there just isn’t a single national language, or where the name contains multiple languages. The conservative approach would ignore these regions for now, since abbreviation isn’t super critical.

In theory, default_language on the surrounding boundary can give you a good idea of the language, but in practice the values are often wrong, and I don’t think it’s a good idea to create that broad and deep a target for vandalism. I think it could be a starting point for a dataset, but I’d be wary of using it uncritically.

I guess it would also be possible to generate abbreviations for each name translation like name:de -> name:abbrev:de which would let us bypass language detection but I assume the coverage wouldn't be as high so clients would want to fall back to the main abbreviated name.

Coverage for streets would be very low except in some bilingual cities. Also, the OpenMapTiles schema currently only stores road names in English and German, ignoring other languages. A possible half-measure would be to always produce a name:en:abbr based on name even if there’s no name:en, but this circumvents the language fallback rules that are currently the client’s responsibility.

Do you think it would be best to start from one of those existing data sources? Or build up a map-rendering-specific abbreviation data source from scratch?

Either is fine I guess, but if we use an existing source, it should be one intended for display rather than search. A search-oriented dataset will tend to be too aggressive.

wipfli · 2024-09-13T10:26:42Z

I think if you give a name to an LLM together with a geolocation, it can guess the language quite well.

msbarry · 2024-09-13T11:03:00Z

Either is fine I guess, but if we use an existing source, it should be one intended for display rather than search. A search-oriented dataset will tend to be too aggressive.

Yeah the geocoder one (https://github.com/geocoding/geocoder-abbreviations/) seems like it's going to be too aggressive. The OSRM one (https://github.com/Project-OSRM/osrm-text-instructions/tree/master/languages/abbreviations) seems closer to what we want, except the last update was by @1ec5 6 years ago 😆

To Oliver's point - I don't want to be invoking an LLM from planetiler directly, but I could see using one to generate static config files to start from and maybe test cases. It might be 90% of the way there and we can correct from that.

1ec5 · 2024-09-13T15:01:12Z

The OSRM one (https://github.com/Project-OSRM/osrm-text-instructions/tree/master/languages/abbreviations) seems closer to what we want, except the last update was by @1ec5 6 years ago 😆

It’s pretty stable; most languages don’t change their abbreviations very often. The nice thing about that project is that volunteers can sign up to add their language on Transifex today using a decently user-friendly interface.

To Oliver's point - I don't want to be invoking an LLM from planetiler directly, but I could see using one to generate static config files to start from and maybe test cases. It might be 90% of the way there and we can correct from that.

I don’t think we need to treat OSM like a black box and rely on an LLM to interpret it for us. My only point was that default_language is unreliable – because it lacks visibility and nuance. That nuance is probably captured on this wiki page, which, sure, we could use an LLM to reformat into something more structured. For any gaps in coverage, CLDR maintains an open-source, industry-standard set of country-language mappings, and there’s no shortage of heuristic-based tools for language ID.

msbarry · 2024-09-15T11:03:15Z

OK cool, trying to think how to decouple this into simpler subtasks...it seems like there are 3 distinct steps here:

1. (name, language) => abbreviated name: should be relatively straightforward based on a data source like OSRM
2. (element, country) => language or language[]: the most complex, based on those rules you were describing
3. (coordinate) => country or country[]: can be controversial, and I'd rather use a pre-constructed data source than build it from the OSM extract (which requires an extra 2 passes). I'm thinking either:
- Natural Earth - it's only a 5 MB download and you can specify a POV to choose the source, but it could be slightly off from OSM
- or Overture country division areas - 62 MB, but it should line up with OSM exactly, and it looks like they plan to eventually populate a "perspectives" field that would let you filter on a perspective from a single download

Or do you think it needs to go down to region instead of country in order to infer a language?

I could see building up a set of test cases for 1 and a separate set of test cases for 2 to make sure the rules from that wiki page and heuristics work together to produce the desired language inference. Or do you think we can't really separate 1 and 2 and we'll want the code/test cases to go directly from element and inferred country to abbreviated name? Or even worse that 3 can't be separated and we'd need to go from element and location to abbreviated name?

1ec5 · 2024-09-15T14:08:25Z

(element, country) => language or language[] most complex based on those rules you were describing

Yes, and to add to the complexity, we definitely would have to consider subnational divisions such as Québec and the cantons of Switzerland. default_language is defined recursively on subnational divisions, though I would probably cross-reference the wiki just in case.
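Region-level entries like Québec or a Swiss canton should override the national default, so the lookup could try the most specific area first. This sketch hard-codes a handful of illustrative entries keyed by ISO 3166-1/3166-2 codes. A real table would be assembled from the OSM wiki, default_language, and CLDR as discussed above. The names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical lookup; entries are illustrative, not an authoritative language table.
final class RegionLanguages {
  static final Map<String, List<String>> BY_AREA = Map.of(
      "CA", List.of("en", "fr"),
      "CA-QC", List.of("fr"),
      "CH", List.of("de", "fr", "it", "rm"),
      "CH-GE", List.of("fr"),
      "CH-TI", List.of("it"),
      "CH-ZH", List.of("de"));

  /** Most specific wins: the ISO 3166-2 region if known, else the ISO 3166-1 country. */
  static List<String> languages(String country, String region) {
    if (region != null && BY_AREA.containsKey(region)) {
      return BY_AREA.get(region);
    }
    return BY_AREA.getOrDefault(country, List.of());
  }

  public static void main(String[] args) {
    System.out.println(languages("CA", "CA-QC"));
    System.out.println(languages("CA", "CA-ON"));
  }
}
```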

(coordinate) => country or country[] can be controversial and I'd rather use a pre-constructed data source than need to build it from the OSM extract (requires an extra 2 passes).

Natural Earth should be fine. We don’t really need precision, because all we’re doing with the data is choosing abbreviations for name. Plus, not every language even abbreviates road names or abbreviates at all. If you’re planning to make it more generic, so that a tileset could expose the geocoded country code on each feature, that would be a different story. Another alternative would be country-coder. It’s in JavaScript, but you could pull out the GeoJSON data.

Rest assured, just about anything we do here is already going to surpass OpenMapTiles, Mapzen, and I think Mapbox Streets in terms of comprehensiveness.

msbarry · 2024-09-16T11:46:20Z

OK cool, so sounds like a revised approach would be:

1. (coordinate) => (admin level 2/admin level 4) or (adm2/adm4)[] - it has come up in other contexts that it would be helpful to have access to the country or region for other purposes as well, so I'd probably make this pluggable, with country-coder JSON as the built-in fallback and Natural Earth or Overture as optional choices for higher resolution or a specific POV
2. (element, country, region?) => language or languages[] - so there can be overrides per province or canton if the geocoder is detailed enough to provide a region as well
3. (name, language) => abbreviation - the abbreviations only seem to change by language, but would this need to be country- or region-aware as well?
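The pluggable first step might look like a chain of providers in which the first one with an answer wins. That way an optional high-resolution source (Natural Earth or Overture) sits in front of the built-in country-coder fallback. The interfaces are hypothetical, and loading the actual GeoJSON and running point-in-polygon tests is left out.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical interfaces; none of these exist in Planetiler.
final class AdminLookup {
  /** ISO 3166-1 country code plus an ISO 3166-2 region code, or null when the source has none. */
  record Area(String country, String region) {}

  /** A point lookup backed by one dataset, e.g. Natural Earth, Overture, or country-coder GeoJSON. */
  interface Provider {
    Optional<Area> lookup(double lon, double lat);
  }

  /** Returns a provider that asks each source in order and keeps the first answer. */
  static Provider firstOf(List<Provider> providers) {
    return (lon, lat) -> {
      for (Provider provider : providers) {
        Optional<Area> area = provider.lookup(lon, lat);
        if (area.isPresent()) {
          return area;
        }
      }
      return Optional.empty();
    };
  }

  public static void main(String[] args) {
    Provider overture = (lon, lat) -> Optional.empty(); // e.g. dataset not configured
    Provider countryCoder = (lon, lat) -> Optional.of(new Area("CH", null));
    System.out.println(firstOf(List.of(overture, countryCoder)).lookup(6.14, 46.20));
  }
}
```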

1ec5 · 2024-09-17T06:46:36Z

the abbreviations only seem to change by language, but would this need to be country or region-aware as well?

I think this is a fair starting assumption, though I wouldn’t be surprised if we eventually find out about country-specific abbreviations in some language.

msbarry · 2024-09-17T09:52:16Z

OK got it, I guess it doesn't change the last step much, except that it could take a country/region as well. This seems doable; the first step I'd get started on when I have time would be the element -> country/region provider.

1ec5 mentioned this issue Feb 16, 2022: Shields of Kentucky osm-americana/openstreetmap-americana#119 (Merged)

1ec5 mentioned this issue Feb 24, 2022: Name translation improvements #86 (Open)

1ec5 mentioned this issue Mar 6, 2022: Change demo map tile server to self-hosted planetiler server osm-americana/openstreetmap-americana#214 (Closed)

msbarry added the omt-compat (OpenMapTiles Compatibility) label Mar 8, 2022

ZeLonewolf mentioned this issue May 12, 2022: North Division Street through Spokane isn't labeled as such osm-americana/openstreetmap-americana#321 (Open)

msbarry changed the title from "[FEATURE] Abbreviate road names" to "[FEATURE] Add utility to abbreviate road names" Jul 29, 2022

msbarry removed the omt-compat (OpenMapTiles Compatibility) label Jul 29, 2022

msbarry mentioned this issue Jul 29, 2022: Abbreviate road names openmaptiles/planetiler-openmaptiles#17 (Open)

msbarry closed this as completed Jul 29, 2022

msbarry reopened this Jul 29, 2022

1ec5 mentioned this issue Feb 16, 2023: Sparse road labels in dense road networks osm-americana/openstreetmap-americana#793 (Open)

msbarry mentioned this issue Oct 24, 2024: add country-coder json [#44] protomaps/basemaps#328 (Open)
