[arm] finding IPA transcriptions outside of the Pronunciation block #470

jhdeov · 2022-11-07T02:32:51Z

For the word կարկանդակ, WikiPron finds the correct pronunciation [kɑɾkɑndɑk], but it also picks up IPA transcriptions of other words from the Usage notes section, such as [pɛrɑʃˈki]. I'm not sure whether this is an unavoidable glitch on WikiPron's side or something that could be fixed on the Wiktionary side.

It seems that WikiPron is picking up every IPA transcription inside the Armenian entry, even ones that aren't associated with a dialect. For example, if you run `wikipron arm --dialect='ladygaga' --no-skip-parens --narrow > randos.tsv`, you get a handful of IPA transcriptions that aren't associated with any of the predefined dialects. These are either (a) IPA transcriptions in the Usage notes or Etymology sections, or (b) IPA transcriptions for non-standard dialects. This isn't a problem when using WikiPron on a specific language, because the user can filter those out manually. But I wonder whether this glitch causes any other funny business in other languages.

Side note: I wonder whether there have been enough cases where people had to fix Wiktionary entries to get good results from WikiPron's scraper (as in various closed issues). If so, perhaps a tips-and-tricks page would be helpful down the line?


kylebgorman · 2022-11-07T14:19:46Z

It basically finds anything in the Pronunciation section enclosed in // or []. To be fair, it is bizarre to give the pronunciation of an unrelated Russian word here. I'm going to edit the entry.

The Wiktionary people have taken absolutely zero interest in our project so I don't think there's a demand outside of WikiPron developers for this information.

jhdeov · 2022-11-07T19:31:47Z

Heh, the admin ended up agreeing

jhdeov · 2022-11-07T19:34:38Z

> It basically finds anything in the pronunciation section in // or [].

But then this is a glitch, because the Russian word was *not* under the Pronunciation section but under a separate heading. The original example is gone now, but another example is [գրաբար](https://en.m.wiktionary.org/wiki/%D5%A3%D6%80%D5%A1%D5%A2%D5%A1%D6%80). Its Usage notes explain a pronunciation detail; that text is in a separate section, but it's getting scraped too.

kylebgorman · 2022-11-07T19:41:52Z

Yes, it surprised me that it did that all the same.


jhdeov · 2022-11-07T19:50:51Z

WikiPron also found IPA transcriptions in the Etymology section, which comes before the Pronunciation section. This word had a transcription there until I found and removed it (using the "fake dialect" trick above).

This makes me think that WikiPron is looking for IPA anywhere in the entry, not just in the pronunciation box. I'm not sure whether that's a bug (because the code isn't designed to go outside the pronunciation box) or a missing feature (because the code was never designed to stay inside it).
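To make the distinction concrete, here is a toy sketch contrasting an unscoped search (any `/.../` or `[...]` anywhere in the entry) with one restricted to the Pronunciation section. This is not WikiPron's actual implementation: the `(heading, text)` section list and the `extract_ipa` function are hypothetical simplifications, and real Wiktionary pages would need proper HTML parsing.

```python
import re

# Hypothetical pattern: a phonemic /.../ or phonetic [...] span with no nested delimiters.
IPA_PATTERN = re.compile(r"/[^/]+/|\[[^\]]+\]")


def extract_ipa(sections, only_heading=None):
    """Return every bracketed transcription found in the given sections.

    `sections` is a hypothetical list of (heading, text) pairs for one
    language entry. If `only_heading` is set, sections with any other
    heading are skipped, so transcriptions in Usage notes or Etymology
    are ignored.
    """
    found = []
    for heading, text in sections:
        if only_heading is not None and heading != only_heading:
            continue
        found.extend(IPA_PATTERN.findall(text))
    return found


# Toy entry modeled on the կարկանդակ example from this thread.
entry = [
    ("Etymology", "..."),
    ("Pronunciation", "IPA(key): [kɑɾkɑndɑk]"),
    ("Usage notes", "... [pɛrɑʃˈki] ..."),
]

print(extract_ipa(entry))                                # unscoped: both transcriptions
print(extract_ipa(entry, only_heading="Pronunciation"))  # scoped: only [kɑɾkɑndɑk]
```

The unscoped call reproduces the behavior reported here, while the scoped call keeps only the transcription from the Pronunciation section.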
