Search for Japan city details using Japanese characters does not lead to correct results via API #3484

Open
patelmm79 opened this issue Jul 23, 2024 · 2 comments

patelmm79 commented Jul 23, 2024

What did you search for?

https://nominatim.openstreetmap.org/search?format=jsonv2&addressdetails=1&limit=10&namedetails=2&polygon_geojson=0&extratags=1&city=%E4%BB%99%E5%8F%B0

What result did you get?

Xiantai, Pingdingshan, Henan, China. This was the only result.

Adding Japan as the country (country=日本) yields 0 results: https://nominatim.openstreetmap.org/search?format=jsonv2&addressdetails=1&limit=10&namedetails=2&polygon_geojson=0&extratags=1&city=%E4%BB%99%E5%8F%B0&country=%E6%97%A5%E6%9C%AC
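
For anyone who wants to reproduce the two queries above from a script, here is a minimal sketch that rebuilds the same URLs with the standard library. The helper name `build_search_url` is made up for illustration; the parameter names and values are copied from the links in this report, and no request is actually sent.

```python
from urllib.parse import urlencode

BASE = "https://nominatim.openstreetmap.org/search"


def build_search_url(city, country=None):
    """Rebuild the structured-search URL used in this report (hypothetical helper)."""
    params = {
        "format": "jsonv2",
        "addressdetails": 1,
        "limit": 10,
        "namedetails": 2,  # value copied as-is from the reported URL
        "polygon_geojson": 0,
        "extratags": 1,
        "city": city,
    }
    if country is not None:
        params["country"] = country
    # urlencode percent-encodes the UTF-8 bytes of the Japanese characters
    return f"{BASE}?{urlencode(params)}"


print(build_search_url("仙台"))
print(build_search_url("仙台", country="日本"))
```

The two printed URLs should match the links in this section, which confirms that the percent-encoded `city` value really is '仙台' and not something mangled on the client side.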

What result did you expect?

Sendai, Japan. Searching by city name in the Nominatim UI does return a relevant result, of class railway:

https://nominatim.openstreetmap.org/ui/details.html?osmtype=N&osmid=3570916502&class=railway

This is the place node for Sendai:

https://nominatim.openstreetmap.org/ui/details.html?osmtype=N&osmid=752184864&class=place

mtmail (Collaborator) commented Jul 24, 2024

So the issue seems to be that the city Sendai has the name '仙台市' in OSM data (https://www.openstreetmap.org/node/752184864) and cannot be found when searching for '仙台'. The '市' suffix stands for 'city'. That's quite common with regional names in Japan. We should check how common it is; we could then add another database entry with the suffix removed.

Adding @miku0, who worked on such a list in the past: https://github.com/miku0/Nominatim/blob/soft_phrase2/nominatim/api/search/icu_tokenizer_japanese.py#L23. It's part of an older PR, #3158.
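
To make the "entry with the suffix removed" idea concrete, here is a rough sketch in plain Python. It is not Nominatim code: `name_variants` and `ADMIN_SUFFIXES` are hypothetical names, and the suffix list contains only '市' because that is the only suffix discussed here. A real list would need to be checked against the data first.

```python
# Hypothetical helper for illustration; not part of Nominatim.
ADMIN_SUFFIXES = ("市",)  # '市' = 'city'; assumption: only this suffix is handled


def name_variants(name, suffixes=ADMIN_SUFFIXES):
    """Return the original name plus, if applicable, one variant
    with a trailing administrative suffix removed."""
    variants = [name]
    for suffix in suffixes:
        # Never strip the whole name down to an empty string.
        if name.endswith(suffix) and len(name) > len(suffix):
            variants.append(name[: -len(suffix)])
            break
    return variants


print(name_variants("仙台市"))  # name as stored in OSM
print(name_variants("仙台"))    # name as typed in the search
```

With both variants indexed, a search for '仙台' would have something to match against, while the full name '仙台市' stays searchable.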

lonvia (Member) commented Jul 26, 2024

We used to have a special rule in the old tokenizer for those suffixes. We can probably bring them back, either via a sanitizer or via variants. It depends a bit on how frequently the characters appear in other contexts as suffixes.

Miku's PR is rather for splitting addresses into words. Not quite the same but related.
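
One way to get a feel for the "how frequently" question is to count trailing characters over a list of names. The sketch below assumes a plain list of name strings as input; `suffix_counts` is a hypothetical helper and the three sample names are hand-picked for illustration, not taken from an OSM extract.

```python
from collections import Counter


def suffix_counts(names, candidates):
    """Count how many names end with each candidate suffix (hypothetical helper)."""
    counts = Counter()
    for name in names:
        for suffix in candidates:
            # Ignore names that consist of the suffix alone.
            if name.endswith(suffix) and len(name) > len(suffix):
                counts[suffix] += 1
    return counts


# Hand-picked sample: 四日市 (Yokkaichi) ends in '市' as part of the name itself,
# so blindly stripping the character there would produce a wrong variant.
sample = ["仙台市", "四日市", "仙台"]
print(suffix_counts(sample, ["市"]))
```

A count like this only says how often the character is in final position. Cases such as 四日市 would still need to be told apart from real administrative suffixes, which is the part that decides between a sanitizer and variants.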
