open-dsl-dict/wikidict-dsl-zh

wikidict-dsl-zh - Wikidata Bilingual DSL Dictionaries (Chinese)

This repository makes available a collection of bilingual Chinese dictionaries in DSL format derived from interwiki links (links between article titles in different languages) in Wikipedia. The data has been extracted from Wikidata.
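As a rough illustration of the idea, the sketch below pulls one title pair out of a single Wikidata entity. The `sitelinks` layout (keys such as `enwiki` and `zhwiki`, each carrying a `title`) follows the Wikidata JSON format. The function name `title_pair` and the sample entity are hypothetical, and the extraction code actually used to build these dictionaries may work differently.

```python
from typing import Optional, Tuple


def title_pair(entity: dict, source: str, target: str = "zh") -> Optional[Tuple[str, str]]:
    """Return (source title, target title) if the entity has both sitelinks."""
    sitelinks = entity.get("sitelinks", {})
    src = sitelinks.get(f"{source}wiki")
    dst = sitelinks.get(f"{target}wiki")
    if src is None or dst is None:
        # No article in one of the two languages, so no dictionary entry.
        return None
    return src["title"], dst["title"]


# Hand-written sample shaped like a Wikidata entity (not copied from a dump).
sample = {
    "id": "Q0",
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Example"},
        "zhwiki": {"site": "zhwiki", "title": "例子"},
    },
}

print(title_pair(sample, "en"))
```

Entities that lack a sitelink for either language simply yield no pair, which is one plausible reason the dictionaries differ so much in size.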

Format

ABBYY Lingvo DSL is a flexible dictionary format that can be read by dictionary applications such as GoldenDict and converted to other formats using tools such as PyGlossary. A number of tools for creating DSL dictionaries are also available in the dsl-tools project.
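For readers new to the format, here is a minimal sketch of what a DSL file looks like when generated from title pairs. It assumes only the basic DSL conventions: `#NAME`, `#INDEX_LANGUAGE` and `#CONTENTS_LANGUAGE` header lines, headwords starting in the first column, and indented body lines. The function `render_dsl` is hypothetical, and the files in this repository may use additional markup that is not shown here.

```python
from typing import Iterable, Tuple


def render_dsl(name: str, index_language: str, contents_language: str,
               entries: Iterable[Tuple[str, str]]) -> str:
    """Render (headword, translation) pairs as plain DSL text."""
    lines = [
        f'#NAME "{name}"',
        f'#INDEX_LANGUAGE "{index_language}"',
        f'#CONTENTS_LANGUAGE "{contents_language}"',
        "",
    ]
    for headword, translation in entries:
        # Escaping of DSL special characters is deliberately left out of this sketch.
        lines.append(headword)            # headword starts in the first column
        lines.append("\t" + translation)  # body line is indented with a tab
    return "\n".join(lines) + "\n"


print(render_dsl("Demo (en-zh)", "English", "Chinese", [("Example", "例子")]))
```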

DSL files must be saved as UTF-16 to be usable by dictionary programs. The raw source files in this repository are saved as UTF-8, which makes them significantly smaller and lets git read and diff them. Fully encoded and compressed .dsl.dz dictionaries, ready for use, are available in the Releases section.

You can also use the two provided scripts, rezip_dsl.rb and unzip_dsl.rb, to encode and compress or decode and decompress the dictionaries, either individually or as a group.
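The round trip between the UTF-8 sources and the UTF-16 release files can be sketched in Python as follows. This is not a port of the provided Ruby scripts, whose exact behaviour is not described here. It also assumes plain gzip output: the .dz (dictzip) format is gzip-compatible but adds a random-access index, so a dedicated dictzip tool may be preferable for files meant for real use. The function names `encode_dsl` and `decode_dsl` are hypothetical.

```python
import gzip
from pathlib import Path


def encode_dsl(src: Path, dst: Path) -> None:
    """Convert a UTF-8 .dsl file to gzip-compressed UTF-16 (little-endian, with BOM)."""
    text = src.read_bytes().decode("utf-8-sig")  # tolerate an optional UTF-8 BOM
    payload = b"\xff\xfe" + text.encode("utf-16-le")
    with gzip.open(dst, "wb") as out:
        out.write(payload)


def decode_dsl(src: Path, dst: Path) -> None:
    """Convert a gzip-compressed UTF-16 .dsl file back to UTF-8."""
    with gzip.open(src, "rb") as inp:
        payload = inp.read()
    # The "utf-16" codec reads the BOM to pick the byte order and then drops it.
    dst.write_bytes(payload.decode("utf-16").encode("utf-8"))
```

Working on bytes rather than in text mode keeps line endings exactly as they are in the source file.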

Data

The data directory contains the bilingual dictionaries, one per language pair, named by ISO language code.

The basic filename pattern is [ISO]-zh_wikidict.dsl, with [ISO] being the source language ISO code. A list of all language pairs is below.
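The naming pattern can be sketched as a pair of small helpers. The names `dictionary_filename` and `source_code` are hypothetical. Note that some source codes contain underscores (for example zh_min_nan), so the parser strips the fixed suffix instead of splitting on a separator.

```python
SUFFIX = "-zh_wikidict.dsl"


def dictionary_filename(iso: str) -> str:
    """Build the file name for a given source language code."""
    return f"{iso}{SUFFIX}"


def source_code(filename: str) -> str:
    """Recover the source language code from a dictionary file name."""
    if not filename.endswith(SUFFIX):
        raise ValueError(f"not a wikidict file name: {filename!r}")
    return filename[: -len(SUFFIX)]


print(dictionary_filename("en"))
print(source_code("zh_min_nan-zh_wikidict.dsl"))
```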

Available language pairs

Language codes Language names
af-zh Afrikaans => Chinese
am-zh Amharic => Chinese
ang-zh Anglo-Saxon => Chinese
ar-zh Arabic => Chinese
arc-zh Aramaic => Chinese
bg-zh Bulgarian => Chinese
bi-zh Bislama => Chinese
bn-zh Bengali => Chinese
bo-zh Tibetan => Chinese
br-zh Breton => Chinese
bs-zh Bosnian => Chinese
ca-zh Catalan => Chinese
cdo-zh Min Dong => Chinese
chr-zh Cherokee => Chinese
chy-zh Cheyenne => Chinese
cr-zh Cree => Chinese
cs-zh Czech => Chinese
cy-zh Welsh => Chinese
da-zh Danish => Chinese
de-zh German => Chinese
el-zh Greek => Chinese
en-zh English => Chinese
eo-zh Esperanto => Chinese
es-zh Spanish => Chinese
et-zh Estonian => Chinese
eu-zh Basque => Chinese
fa-zh Persian => Chinese
ff-zh Fula => Chinese
fi-zh Finnish => Chinese
fr-zh French => Chinese
ga-zh Irish => Chinese
gan-zh Gan => Chinese
gd-zh Scottish Gaelic => Chinese
gu-zh Gujarati => Chinese
gv-zh Manx => Chinese
ha-zh Hausa => Chinese
hak-zh Hakka => Chinese
haw-zh Hawaiian => Chinese
he-zh Hebrew => Chinese
hi-zh Hindi => Chinese
hr-zh Croatian => Chinese
ht-zh Haitian => Chinese
hu-zh Hungarian => Chinese
hy-zh Armenian => Chinese
id-zh Indonesian => Chinese
ig-zh Igbo => Chinese
is-zh Icelandic => Chinese
it-zh Italian => Chinese
iu-zh Inuktitut => Chinese
ja-zh Japanese => Chinese
jbo-zh Lojban => Chinese
jv-zh Javanese => Chinese
ka-zh Georgian => Chinese
kg-zh Kongo => Chinese
ki-zh Kikuyu => Chinese
kl-zh Greenlandic => Chinese
km-zh Khmer => Chinese
ko-zh Korean => Chinese
la-zh Latin => Chinese
lg-zh Luganda => Chinese
lo-zh Lao => Chinese
lt-zh Lithuanian => Chinese
lv-zh Latvian => Chinese
mg-zh Malagasy => Chinese
mi-zh Maori => Chinese
mn-zh Mongolian => Chinese
ms-zh Malay => Chinese
mt-zh Maltese => Chinese
nah-zh Nahuatl => Chinese
ne-zh Nepali => Chinese
nl-zh Dutch => Chinese
nn-zh Norwegian (Nynorsk) => Chinese
no-zh Norwegian => Chinese
nv-zh Navajo => Chinese
ny-zh Chichewa => Chinese
oc-zh Occitan => Chinese
pa-zh Punjabi => Chinese
pi-zh Pali => Chinese
pl-zh Polish => Chinese
ps-zh Pashto => Chinese
pt-zh Portuguese => Chinese
qu-zh Quechua => Chinese
ro-zh Romanian => Chinese
ru-zh Russian => Chinese
sa-zh Sanskrit => Chinese
se-zh Northern Sami => Chinese
sh-zh Serbo-Croatian => Chinese
sk-zh Slovak => Chinese
sl-zh Slovenian => Chinese
sn-zh Shona => Chinese
so-zh Somali => Chinese
sq-zh Albanian => Chinese
sr-zh Serbian => Chinese
sv-zh Swedish => Chinese
sw-zh Kiswahili => Chinese
ta-zh Tamil => Chinese
te-zh Telugu => Chinese
th-zh Thai => Chinese
tl-zh Tagalog => Chinese
tpi-zh Tok Pisin => Chinese
tr-zh Turkish => Chinese
ug-zh Uyghur => Chinese
uk-zh Ukrainian => Chinese
ur-zh Urdu => Chinese
vi-zh Vietnamese => Chinese
wo-zh Wolof => Chinese
wuu-zh Wu => Chinese
xh-zh Xhosa => Chinese
yi-zh Yiddish => Chinese
yo-zh Yoruba => Chinese
za-zh Zhuang => Chinese
zh_classical-zh Classical Chinese => Chinese
zh_min_nan-zh Min Nan => Chinese
zh_yue-zh Cantonese => Chinese
zu-zh Zulu => Chinese

Statistics

Dictionary size

Language pair # of entries
af-zh 20508
am-zh 5392
ang-zh 2199
ar-zh 98023
arc-zh 1232
bg-zh 72781
bi-zh 439
bn-zh 19032
bo-zh 2625
br-zh 27205
bs-zh 23758
ca-zh 160059
cdo-zh 2322
chr-zh 448
chy-zh 503
cr-zh 78
cs-zh 93537
cy-zh 21108
da-zh 67620
de-zh 242415
el-zh 43040
en-zh 435515
eo-zh 102452
es-zh 256049
et-zh 47040
eu-zh 109880
fa-zh 126560
ff-zh 190
fi-zh 104571
fr-zh 285450
ga-zh 17306
gan-zh 5171
gd-zh 9951
gu-zh 3668
gv-zh 3690
ha-zh 362
hak-zh 3361
haw-zh 1756
he-zh 66386
hi-zh 22172
hr-zh 50800
ht-zh 11511
hu-zh 93801
hy-zh 45507
id-zh 90922
ig-zh 680
is-zh 18295
it-zh 268870
iu-zh 336
ja-zh 225645
jbo-zh 1130
jv-zh 14367
ka-zh 37929
kg-zh 786
ki-zh 292
kl-zh 1444
km-zh 1577
ko-zh 132383
la-zh 75247
lg-zh 161
lo-zh 1081
lt-zh 54256
lv-zh 30513
mg-zh 37211
mi-zh 2064
mn-zh 9611
ms-zh 106086
mt-zh 1784
nah-zh 6073
ne-zh 6572
nl-zh 252137
nn-zh 51596
no-zh 112905
nv-zh 1421
ny-zh 118
oc-zh 64468
pa-zh 7355
pi-zh 2186
pl-zh 237094
ps-zh 2281
pt-zh 231392
qu-zh 10873
ro-zh 120535
ru-zh 234102
sa-zh 4612
se-zh 5153
sh-zh 89358
sk-zh 92714
sl-zh 39992
sn-zh 1424
so-zh 2180
sq-zh 20562
sr-zh 102246
sv-zh 207634
sw-zh 14774
ta-zh 21201
te-zh 7378
th-zh 44963
tl-zh 33989
tpi-zh 1215
tr-zh 76810
ug-zh 2252
uk-zh 169306
ur-zh 28229
vi-zh 195713
wo-zh 841
wuu-zh 2753
xh-zh 242
yi-zh 6326
yo-zh 21601
za-zh 669
zh_classical-zh 3538
zh_min_nan-zh 11627
zh_yue-zh 26623
zu-zh 514

Top ten dictionaries by number of entries

Language pair # of entries
en-zh 435515
fr-zh 285450
it-zh 268870
es-zh 256049
nl-zh 252137
de-zh 242415
pl-zh 237094
ru-zh 234102
pt-zh 231392
ja-zh 225645
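Figures like these could be recomputed along the following lines. This is a sketch under assumptions, not the method used to produce the tables above: it treats every non-empty line that does not start with whitespace or `#` as one entry, and the names `count_entries` and `top_pairs` are hypothetical. The sample counts are copied from the statistics table.

```python
from typing import Dict, List, Tuple


def count_entries(dsl_text: str) -> int:
    """Count headword lines in decoded DSL text."""
    count = 0
    for line in dsl_text.lstrip("\ufeff").splitlines():
        # Skip blank lines, indented body lines and '#' header lines.
        if not line or line[0] in " \t#":
            continue
        count += 1
    return count


def top_pairs(counts: Dict[str, int], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n largest dictionaries, biggest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


counts = {"en-zh": 435515, "cr-zh": 78, "fr-zh": 285450, "it-zh": 268870}
print(top_pairs(counts, 3))
```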

License

According to the Wikidata website:

"All structured data from the main and property namespace is available under the Creative Commons CC0 License"

The data in this repository is therefore made available under the same Creative Commons CC0 License as that used by the Wikidata project. All of the data has been derived from the Wikidata JSON format database dumps.