You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have a query that correctly performs an ILI-expanded synset relation traversal, but it is rather complicated and wouldn't lend itself well to a recursive query (see #38). This means it only works one hop at a time and doesn't grant much performance bonus, although it does make the Python code simpler (while making the SQL more complex):
WITH
rt(rowid, type) AS
(SELECT rowid, type FROM relation_types WHERE type IN (?, ...)),
lex(rowid) AS (VALUES (?), ...),
intra(rowid, source_rowid, target_rowid, type) AS
(SELECTrel.rowid, source_rowid, target_rowid, rt.typeFROM synset_relations AS rel
JOIN rt ONrel.type_rowid=rt.rowidWHERErel.lexicon_rowidIN lex
ANDrel.source_rowid= ?),
inter(rowid, source_rowid, target_rowid, type) AS
(SELECTrel.rowid, source_rowid, target_rowid, rt.typeFROM synset_relations AS rel
JOIN rt ONrel.type_rowid=rt.rowidWHERErel.lexicon_rowidIN (?, ...)
ANDrel.source_rowidIN
(SELECTss1.rowidFROM synsets AS ss1
JOIN ilis ONilis.rowid=ss1.ili_rowidWHEREilis.id= ?
ANDss1.rowid!= ?))
SELECTintra.type, intra.rowid, tgt.id, tgt.pos,
ilis.id, tgt.lexicon_rowid, tgt.rowidFROM intra
JOIN synsets AS tgt ONintra.target_rowid=tgt.rowidLEFT JOIN ilis ONilis.rowid=tgt.ili_rowidWHEREtgt.lexicon_rowidIN lex
UNIONSELECTinter.type, inter.rowid,
CASE WHEN ss.lexicon_rowidIN lex THEN ss.id ELSE null END,
ss.pos, ilis.id,
CASE WHEN ss.lexicon_rowidIN lex THEN ss.lexicon_rowid ELSE null END,
CASE WHEN ss.lexicon_rowidIN lex THEN ss.rowid ELSE null END
FROM inter
JOIN synsets AS _tgt ONinter.target_rowid=_tgt.rowidJOIN synsets AS ss ON_tgt.ili_rowid=ss.ili_rowidJOIN ilis ONilis.rowid=ss.ili_rowidWHERE_tgt.ili_rowidIS NOT NULL;
Since all these joins and checks occur each time a relation is traversed, it's a bit wasteful. We could precompute them into a temporary table with something like this:
CREATE TEMPORARY TABLE IF NOT EXISTS virtual_synset_relations (
wnid INTEGERNOT NULL,
source_rowid INTEGERNOT NULL,
target_rowid INTEGER,
target_ili INTEGER,
synset_relation_rowid INTEGERNOT NULL
);
CREATEINDEXIF NOT EXISTS temp.virtual_synset_relations_sourceON virtual_synset_relations(source_rowid);
WITH
lex(rowid) AS (VALUES (?), ...),
rel(src_ili_rowid, tgt_ili_rowid, rowid) AS
(SELECT_src.ili_rowid, _tgt.ili_rowid, _rel.rowidFROM synset_relations AS _rel
JOIN synsets AS _src ON_src.rowid=_rel.source_rowidJOIN synsets AS _tgt ON_tgt.rowid=_rel.target_rowidWHERE_rel.lexicon_rowidIN (?, ...))
INSERT INTO virtual_synset_relations
SELECT (SELECT ifnull(max(wnid), 0) +1FROM virtual_synset_relations),
src.rowid, tgt.rowid, tgt.ili_rowid, rel.rowidFROM rel
JOIN synsets AS src ONsrc.ili_rowid=rel.src_ili_rowidJOIN synsets AS tgt ONtgt.ili_rowid=rel.tgt_ili_rowidWHEREsrc.lexicon_rowidIN lex
ANDtgt.lexicon_rowidIN lex;
This creates a temporary table that, for a wordnet (defined as a set of target and expand lexicons), lists all relations that would be available via expansion, but not those in the original lexicons (unless those lexicons are also in the expand set). Creating this table for the Chinese or French wordnets, expanded with PWN 3.0, takes less than 2 seconds each on my machine. It might be less if the table were created in memory only.
I have not yet performed any queries with this to estimate the performance of lookup (e.g., if more indexes are required), or estimated the memory requirements. If it doesn't take much memory, it might make sense to also include non-expanded relations so lookup only needs to access one table of relations.
The text was updated successfully, but these errors were encountered:
Forgot to mention that this would be computed when a user creates a Wordnet object and not permanently stored in the database. This is because the temporary table becomes invalid when the data changes (a lexicon is added, removed, or modified), and also because it wouldn't be efficient to precompute all possible permutations of target + expand lexicons. With LMF 1.1 it's possible to anticipate those with the <Requires> element, so those could be done early, but it's not guaranteed the user wants to use that set, and if it doesn't take long to create a temporary table there's little to gain by doing it early.
The 2-seconds to build climbs quite a bit when many lexicons are loaded. When all of OMW is added, it can be over 10 seconds. Using the SQLite pragma for in-memory temporary tables didn't help. This probably shouldn't be done by default when creating a Wordnet object.
I have a query that correctly performs an ILI-expanded synset relation traversal, but it is rather complicated and wouldn't lend itself well to a recursive query (see #38). This means it only works one hop at a time and doesn't grant much performance bonus, although it does make the Python code simpler (while making the SQL more complex):
Since all these joins and checks occur each time a relation is traversed, it's a bit wasteful. We could precompute them into a temporary table with something like this:
This creates a temporary table that, for a wordnet (defined as a set of target and expand lexicons), lists all relations that would be available via expansion, but not those in the original lexicons (unless those lexicons are also in the expand set). Creating this table for the Chinese or French wordnets, expanded with PWN 3.0, takes less than 2 seconds each on my machine. It might be less if the table were created in memory only.
I have not yet performed any queries with this to estimate the performance of lookup (e.g., if more indexes are required), or estimated the memory requirements. If it doesn't take much memory, it might make sense to also include non-expanded relations so lookup only needs to access one table of relations.
The text was updated successfully, but these errors were encountered: