Skip to content

Latest commit

 

History

History
217 lines (171 loc) · 12 KB

README.md

File metadata and controls

217 lines (171 loc) · 12 KB

Wn logo
a Python library for wordnets
PyPI link Python Support tests Documentation Status Conda-Forge Version
Available Wordnets | Documentation | FAQ | Migrating from NLTK | Roadmap


Wn is a Python library for exploring information in wordnets.

Installation

Install it from PyPI using pip:

pip install wn

Or install using conda from the conda-forge channel (conda-forge/wn-feedstock):

conda install -c conda-forge wn

Getting Started

First, download some data:

python -m wn download oewn:2024  # the Open English WordNet 2024

Now start exploring:

>>> import wn
>>> en = wn.Wordnet('oewn:2024')        # Create Wordnet object to query
>>> ss = en.synsets('win', pos='v')[0]  # Get the first synset for 'win'
>>> ss.definition()                     # Get the synset's definition
'be the winner in a contest or competition; be victorious'

Features

Available Wordnets

Any WN-LMF-formatted wordnet can be added to Wn's database from a local file or remote URL, but Wn also maintains an index (see wn/index.toml) of available projects, similar to a package manager for software, to aid in the discovery and downloading of new wordnets. The projects in this index are listed below.

English Wordnets

There are several English wordnets available. In general it is recommended to use the latest Open English Wordnet, but if you have stricter compatibility needs for, e.g., experiment replicability, you may try the OMW English Wordnet based on WordNet 3.0 (compatible with the Princeton WordNet 3.0 and with the NLTK), or OpenWordnet-EN (for use with the Portuguese wordnet OpenWordnet-PT).

Name Specifier # Synsets Notes
Open English WordNet oewn:2024
oewn:2023
oewn:2022
oewn:2021
ewn:2020
ewn:2019
120630
120135
120068
120039
120053
117791
Recommended
 
 
 
 
 
OMW English Wordnet based on WordNet 3.0 omw-en:1.4 117659 Included with omw:1.4
OMW English Wordnet based on WordNet 3.1 omw-en31:1.4 117791
OpenWordnet-EN own-en:1.0.0 117659 Included with own:1.0.0

Other Wordnets and Collections

These are standalone non-English wordnets and collections. The wordnets of each collection are listed further down.

Name Specifier # Synsets Language
Open Multilingual Wordnet omw:1.4 n/a multiple [mul]
Open German WordNet odenet:1.4
odenet:1.3
36268
36159
German [de]
Open Wordnets for Portuguese and English own:1.0.0 n/a multiple [mul]
KurdNet kurdnet:1.0 2144 Kurdish [ckb]

Open Multilingual Wordnet (OMW) Collection

The Open Multilingual Wordnet collection (omw:1.4) installs the following lexicons (from here) which can also be downloaded and installed independently:

Name Specifier # Synsets Language
Albanet omw-sq:1.4 4675 Albanian [sq]
Arabic WordNet (AWN v2) omw-arb:1.4 9916 Arabic [arb]
BulTreeBank Wordnet (BTB-WN) omw-bg:1.4 4959 Bulgarian [bg]
Chinese Open Wordnet omw-cmn:1.4 42312 Mandarin (Simplified) [cmn-Hans]
Croatian Wordnet omw-hr:1.4 23120 Croatian [hr]
DanNet omw-da:1.4 4476 Danish [da]
FinnWordNet omw-fi:1.4 116763 Finnish [fi]
Greek Wordnet omw-el:1.4 18049 Greek [el]
Hebrew Wordnet omw-he:1.4 5448 Hebrew [he]
IceWordNet omw-is:1.4 4951 Icelandic [is]
Italian Wordnet omw-iwn:1.4 15563 Italian [it]
Japanese Wordnet omw-ja:1.4 57184 Japanese [ja]
Lithuanian WordNet omw-lt:1.4 9462 Lithuanian [lt]
Multilingual Central Repository omw-ca:1.4 45826 Catalan [ca]
Multilingual Central Repository omw-eu:1.4 29413 Basque [eu]
Multilingual Central Repository omw-gl:1.4 19312 Galician [gl]
Multilingual Central Repository omw-es:1.4 38512 Spanish [es]
MultiWordNet omw-it:1.4 35001 Italian [it]
Norwegian Wordnet omw-nb:1.4 4455 Norwegian (Bokmål) [nb]
Norwegian Wordnet omw-nn:1.4 3671 Norwegian (Nynorsk) [nn]
OMW English Wordnet based on WordNet 3.0 omw-en:1.4 117659 English [en]
Open Dutch WordNet omw-nl:1.4 30177 Dutch [nl]
OpenWN-PT omw-pt:1.4 43895 Portuguese [pt]
plWordNet omw-pl:1.4 33826 Polish [pl]
Romanian Wordnet omw-ro:1.4 56026 Romanian [ro]
Slovak WordNet omw-sk:1.4 18507 Slovak [sk]
sloWNet omw-sl:1.4 42583 Slovenian [sl]
Swedish (SALDO) omw-sv:1.4 6796 Swedish [sv]
Thai Wordnet omw-th:1.4 73350 Thai [th]
WOLF (Wordnet Libre du Français) omw-fr:1.4 59091 French [fr]
Wordnet Bahasa omw-id:1.4 38085 Indonesian [id]
Wordnet Bahasa omw-zsm:1.4 36911 Malaysian [zsm]

Open Wordnet (OWN) Collection

The Open Wordnets for Portuguese and English collection (own:1.0.0) installs the following lexicons (from here) which can also be downloaded and installed independently:

Name Specifier # Synsets Language
OpenWordnet-PT own-pt:1.0.0 52670 Portuguese [pt]
OpenWordnet-EN own-en:1.0.0 117659 English [en]

Collaborative Interlingual Index

While not a wordnet, the Collaborative Interlingual Index (CILI) represents the interlingual backbone of many wordnets. Wn, including interlingual queries, will function without CILI loaded, but adding it to the database makes available the full list of concepts, their status (active, deprecated, etc.), and their definitions.

Name Specifier # Concepts
Collaborative Interlingual Index cili:1.0 117659

Changes to the Index

ewnoewn

The 2021 version of the Open English WordNet (oewn:2021) has changed its lexicon ID from ewn to oewn, so the index is updated accordingly. The previous versions are still available as ewn:2019 and ewn:2020.

pwnomw-en, omw-en31

The wordnet formerly called the Princeton WordNet (pwn:3.0, pwn:3.1) is now called the OMW English Wordnet based on WordNet 3.0 (omw-en) and the OMW English Wordnet based on WordNet 3.1 (omw-en31). This is more accurate, as it is a OMW-produced derivative of the original WordNet data, and it also avoids license or trademark issues.

*wnomw-* for OMW wordnets

All OMW wordnets have changed their ID scheme from ...wn to omw-.. and the version no longer includes +omw (e.g., bulwn:1.3+omw is now omw-bg:1.4).