Precompute expanded relations as a temporary table #110

goodmami · 2021-04-07T03:45:40Z

I have a query that correctly performs an ILI-expanded synset relation traversal, but it is rather complicated and wouldn't lend itself well to a recursive query (see #38). This means it only works one hop at a time and doesn't grant much performance bonus, although it does make the Python code simpler (while making the SQL more complex):

WITH
  rt(rowid, type) AS
    (SELECT rowid, type FROM relation_types WHERE type IN (?, ...)),
  lex(rowid) AS (VALUES (?), ...),
  intra(rowid, source_rowid, target_rowid, type) AS
    (SELECT rel.rowid, source_rowid, target_rowid, rt.type
       FROM synset_relations AS rel
       JOIN rt ON rel.type_rowid = rt.rowid
      WHERE rel.lexicon_rowid IN lex
        AND rel.source_rowid = ?),
  inter(rowid, source_rowid, target_rowid, type) AS
    (SELECT rel.rowid, source_rowid, target_rowid, rt.type
       FROM synset_relations AS rel
       JOIN rt ON rel.type_rowid = rt.rowid
      WHERE rel.lexicon_rowid IN (?, ...)
        AND rel.source_rowid IN
            (SELECT ss1.rowid
               FROM synsets AS ss1
               JOIN ilis ON ilis.rowid = ss1.ili_rowid
              WHERE ilis.id = ?
                AND ss1.rowid != ?))
SELECT intra.type, intra.rowid, tgt.id, tgt.pos,
       ilis.id, tgt.lexicon_rowid, tgt.rowid
  FROM intra
  JOIN synsets AS tgt ON intra.target_rowid = tgt.rowid
  LEFT JOIN ilis ON ilis.rowid = tgt.ili_rowid
 WHERE tgt.lexicon_rowid IN lex
 UNION
SELECT inter.type, inter.rowid,
       CASE WHEN ss.lexicon_rowid IN lex THEN ss.id ELSE null END,
       ss.pos, ilis.id,
       CASE WHEN ss.lexicon_rowid IN lex THEN ss.lexicon_rowid ELSE null END,
       CASE WHEN ss.lexicon_rowid IN lex THEN ss.rowid ELSE null END
  FROM inter
  JOIN synsets AS _tgt ON inter.target_rowid = _tgt.rowid
  JOIN synsets AS ss ON _tgt.ili_rowid = ss.ili_rowid
  JOIN ilis ON ilis.rowid = ss.ili_rowid
 WHERE _tgt.ili_rowid IS NOT NULL;

Since all these joins and checks occur each time a relation is traversed, it's a bit wasteful. We could precompute them into a temporary table with something like this:

CREATE TEMPORARY TABLE IF NOT EXISTS virtual_synset_relations (
    wnid INTEGER NOT NULL,
    source_rowid INTEGER NOT NULL,
    target_rowid INTEGER,
    target_ili INTEGER,
    synset_relation_rowid INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS temp.virtual_synset_relations_source
    ON virtual_synset_relations(source_rowid);

WITH
  lex(rowid) AS (VALUES (?), ...),
  rel(src_ili_rowid, tgt_ili_rowid, rowid) AS
    (SELECT _src.ili_rowid, _tgt.ili_rowid, _rel.rowid
       FROM synset_relations AS _rel
       JOIN synsets AS _src ON _src.rowid = _rel.source_rowid
       JOIN synsets AS _tgt ON _tgt.rowid = _rel.target_rowid
      WHERE _rel.lexicon_rowid IN (?, ...))
INSERT INTO virtual_synset_relations
       SELECT (SELECT ifnull(max(wnid), 0) + 1 FROM virtual_synset_relations),
              src.rowid, tgt.rowid, tgt.ili_rowid, rel.rowid
          FROM rel
          JOIN synsets AS src ON src.ili_rowid = rel.src_ili_rowid
          JOIN synsets AS tgt ON tgt.ili_rowid = rel.tgt_ili_rowid
         WHERE src.lexicon_rowid IN lex
           AND tgt.lexicon_rowid IN lex;

This creates a temporary table that, for a wordnet (defined as a set of target and expand lexicons), lists all relations that would be available via expansion, but not those in the original lexicons (unless those lexicons are also in the expand set). Creating this table for the Chinese or French wordnets, expanded with PWN 3.0, takes less than 2 seconds each on my machine. It might be less if the table were created in memory only.

I have not yet performed any queries with this to estimate the performance of lookup (e.g., if more indexes are required), or estimated the memory requirements. If it doesn't take much memory, it might make sense to also include non-expanded relations so lookup only needs to access one table of relations.

goodmami · 2021-04-07T03:50:33Z

Forgot to mention that this would be computed when a user creates a Wordnet object and not permanently stored in the database. This is because the temporary table becomes invalid when the data changes (a lexicon is added, removed, or modified), and also because it wouldn't be efficient to precompute all possible permutations of target + expand lexicons. With LMF 1.1 it's possible to anticipate those with the <Requires> element, so those could be done early, but it's not guaranteed the user wants to use that set, and if it doesn't take long to create a temporary table there's little to gain by doing it early.

goodmami · 2021-06-15T06:13:24Z

The 2-seconds to build climbs quite a bit when many lexicons are loaded. When all of OMW is added, it can be over 10 seconds. Using the SQLite pragma for in-memory temporary tables didn't help. This probably shouldn't be done by default when creating a Wordnet object.

goodmami added the maintenance Improvements to project health label Apr 7, 2021

goodmami mentioned this issue Jun 20, 2021

Speeding up similarity calculations #126

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Precompute expanded relations as a temporary table #110

Precompute expanded relations as a temporary table #110

goodmami commented Apr 7, 2021

goodmami commented Apr 7, 2021

goodmami commented Jun 15, 2021

Precompute expanded relations as a temporary table #110

Precompute expanded relations as a temporary table #110

Comments

goodmami commented Apr 7, 2021

goodmami commented Apr 7, 2021

goodmami commented Jun 15, 2021