Glottochronology #1435

Open
vmonakhov opened this issue Jun 4, 2023 · 1 comment


vmonakhov commented Jun 4, 2023

Task: Implement a glottochronology tool.

Theory and realization:

The tool relies on the 100-word Swadesh list, a set of basic-vocabulary words (given here in Russian) used to compare languages and to calculate how closely they are related (their distance). The relationship is based on the etymological links between Swadesh words within each pair of dictionaries.

The resulting distance is calculated using the following formula:
distance = sqrt( ln(linked_words / total_words) / -0.1 / sqrt(linked_words / total_words) ), where:

  1. total_words is the total number of matching Swadesh words in the pair of dictionaries
  2. linked_words is the number of etymologically linked words among those counted in (1)

The maximum distance given by the formula above is about 21.46, reached when linked_words / total_words == 1/100. The minimum possible distance is zero, reached when every matching word is linked.
A hard-coded value, distance == 25, is used when linked_words and/or total_words is zero. A large distance indicates a weak relationship; a small distance indicates that the dictionaries, and the corresponding languages or dialects, are close.
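As a rough illustration, the formula and the hard-coded fallback can be sketched in Python. The function name and the constant name are hypothetical and not taken from the tool's code, and the check for linked_words > total_words is an added assumption rather than documented behaviour.

```python
import math

# Hard-coded fallback from the description, used when there is nothing to compare.
NO_DATA_DISTANCE = 25.0  # hypothetical constant name


def glottochronology_distance(linked_words: int, total_words: int) -> float:
    """Distance between two dictionaries, following the formula in the issue text."""
    if linked_words <= 0 or total_words <= 0:
        return NO_DATA_DISTANCE
    if linked_words > total_words:
        # Assumption: linked words are a subset of the matching words.
        raise ValueError("linked_words cannot exceed total_words")
    ratio = linked_words / total_words
    # sqrt( ln(ratio) / -0.1 / sqrt(ratio) )
    return math.sqrt(math.log(ratio) / -0.1 / math.sqrt(ratio))


if __name__ == "__main__":
    print(round(glottochronology_distance(1, 100), 2))  # 21.46, the stated maximum
    print(glottochronology_distance(0, 100))            # 25.0, the fallback value
```

With linked_words == total_words the logarithm is zero, so the distance is zero, which matches the stated minimum.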

Result:

a) A 2-d constellation, where the dots are the dictionaries and the distances between them correspond to the results of the formula.
b) A 3-d constellation. It has the same meaning as the 2-d one.
c) A table of each-to-each distances. It shows the calculated distance for every pair of dictionaries.
d) A table of cognates. Each row presents one etymological group. Every value has the form:
Swadesh_word [phonological_transcription] original_translation_from_dictionary
A cell of the table can contain more than one such item (i.e. synonyms).
e) A table of single Swadesh words by dictionary. These words have no cognates in table (d).
f) A link to an xlsx file with tables (c), (d) and (e) in the corresponding worksheets.
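The each-to-each table in (c) might be built roughly as follows. This is only a sketch under stated assumptions: the data model (each dictionary maps a Swadesh word to a set of etymological group ids), the function names and the sample data are all hypothetical, and two words are treated as linked when they share at least one group id.

```python
import math
from itertools import combinations


def distance(linked_words, total_words):
    """Formula from the issue text, with the hard-coded fallback of 25."""
    if linked_words == 0 or total_words == 0:
        return 25.0
    ratio = linked_words / total_words
    return math.sqrt(math.log(ratio) / -0.1 / math.sqrt(ratio))


def distance_table(dictionaries):
    """Return a symmetric name -> name -> distance mapping.

    dictionaries: name -> {swadesh_word: set of etymological group ids}
    (hypothetical structure, assumed for illustration).
    """
    names = sorted(dictionaries)
    table = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        # total_words: Swadesh words present in both dictionaries.
        shared = dictionaries[a].keys() & dictionaries[b].keys()
        # linked_words: shared words whose entries share an etymological group.
        linked = sum(1 for w in shared if dictionaries[a][w] & dictionaries[b][w])
        table[a][b] = table[b][a] = distance(linked, len(shared))
    return table


if __name__ == "__main__":
    sample = {  # made-up data, not real dictionaries
        "A": {"I": {1}, "you": {2}, "we": {3}, "water": {4}},
        "B": {"I": {1}, "you": {2}, "we": {9}, "water": {4}},
        "C": {"I": {7}, "you": {8}},
    }
    for name, row in distance_table(sample).items():
        print(name, {other: round(d, 2) for other, d in row.items()})
```

In the sample, A and C share two Swadesh words but no etymological groups, so their distance falls back to 25.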

Limitations:

Any limitations that were applied are reported at the bottom of the modal window. These can be:

g) Hidden tables. If the calculated output is too large, some tables can be hidden. The limit is 1M HTML characters for the combined size of the tables.
h) Not all input dictionaries were processed. If an input dictionary has fewer than 50 Swadesh words, the tool does not process it.
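The two limits could be applied along these lines. The names are hypothetical, "1M" is read here as 1,000,000 characters, and the policy for choosing which tables to hide (keep tables in order while they still fit the budget) is an assumption, since the description does not say which tables are hidden.

```python
MIN_SWADESH_WORDS = 50             # threshold from limitation (h)
MAX_TABLES_HTML_CHARS = 1_000_000  # assumption: "1M" read as 10**6 characters


def split_by_coverage(dictionaries):
    """Separate dictionaries that are processed from those that are skipped.

    dictionaries: name -> collection of Swadesh words found in that dictionary.
    """
    processed = {
        name: words
        for name, words in dictionaries.items()
        if len(words) >= MIN_SWADESH_WORDS
    }
    skipped = sorted(set(dictionaries) - set(processed))
    return processed, skipped


def tables_to_show(tables_html):
    """Split table names into shown and hidden, given name -> rendered HTML.

    A table is shown only if adding it keeps the running total within the limit
    (hypothetical policy).
    """
    shown, hidden, used = [], [], 0
    for name, html in tables_html.items():
        if used + len(html) <= MAX_TABLES_HTML_CHARS:
            shown.append(name)
            used += len(html)
        else:
            hidden.append(name)
    return shown, hidden
```

Both functions return the skipped or hidden items as well, so a note about them can be placed at the bottom of the modal window.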

vmonakhov mentioned this issue Jun 5, 2023
vmonakhov added a commit that referenced this issue Jun 22, 2023
* Reduce multi spaces

* Refactoring
vmonakhov added a commit that referenced this issue Jul 5, 2023
vmonakhov commented

Changed the distance formula. The current one is in the main text above.
