Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precompute expanded relations as a temporary table #110

Open
goodmami opened this issue Apr 7, 2021 · 2 comments
Open

Precompute expanded relations as a temporary table #110

goodmami opened this issue Apr 7, 2021 · 2 comments
Labels
maintenance Improvements to project health

Comments

@goodmami
Copy link
Owner

goodmami commented Apr 7, 2021

I have a query that correctly performs an ILI-expanded synset relation traversal, but it is rather complicated and wouldn't lend itself well to a recursive query (see #38). This means it only works one hop at a time and doesn't grant much performance bonus, although it does make the Python code simpler (while making the SQL more complex):

WITH
  rt(rowid, type) AS
    (SELECT rowid, type FROM relation_types WHERE type IN (?, ...)),
  lex(rowid) AS (VALUES (?), ...),
  intra(rowid, source_rowid, target_rowid, type) AS
    (SELECT rel.rowid, source_rowid, target_rowid, rt.type
       FROM synset_relations AS rel
       JOIN rt ON rel.type_rowid = rt.rowid
      WHERE rel.lexicon_rowid IN lex
        AND rel.source_rowid = ?),
  inter(rowid, source_rowid, target_rowid, type) AS
    (SELECT rel.rowid, source_rowid, target_rowid, rt.type
       FROM synset_relations AS rel
       JOIN rt ON rel.type_rowid = rt.rowid
      WHERE rel.lexicon_rowid IN (?, ...)
        AND rel.source_rowid IN
            (SELECT ss1.rowid
               FROM synsets AS ss1
               JOIN ilis ON ilis.rowid = ss1.ili_rowid
              WHERE ilis.id = ?
                AND ss1.rowid != ?))
SELECT intra.type, intra.rowid, tgt.id, tgt.pos,
       ilis.id, tgt.lexicon_rowid, tgt.rowid
  FROM intra
  JOIN synsets AS tgt ON intra.target_rowid = tgt.rowid
  LEFT JOIN ilis ON ilis.rowid = tgt.ili_rowid
 WHERE tgt.lexicon_rowid IN lex
 UNION
SELECT inter.type, inter.rowid,
       CASE WHEN ss.lexicon_rowid IN lex THEN ss.id ELSE null END,
       ss.pos, ilis.id,
       CASE WHEN ss.lexicon_rowid IN lex THEN ss.lexicon_rowid ELSE null END,
       CASE WHEN ss.lexicon_rowid IN lex THEN ss.rowid ELSE null END
  FROM inter
  JOIN synsets AS _tgt ON inter.target_rowid = _tgt.rowid
  JOIN synsets AS ss ON _tgt.ili_rowid = ss.ili_rowid
  JOIN ilis ON ilis.rowid = ss.ili_rowid
 WHERE _tgt.ili_rowid IS NOT NULL;

Since all these joins and checks occur each time a relation is traversed, it's a bit wasteful. We could precompute them into a temporary table with something like this:

CREATE TEMPORARY TABLE IF NOT EXISTS virtual_synset_relations (
    wnid INTEGER NOT NULL,
    source_rowid INTEGER NOT NULL,
    target_rowid INTEGER,
    target_ili INTEGER,
    synset_relation_rowid INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS temp.virtual_synset_relations_source
    ON virtual_synset_relations(source_rowid);

WITH
  lex(rowid) AS (VALUES (?), ...),
  rel(src_ili_rowid, tgt_ili_rowid, rowid) AS
    (SELECT _src.ili_rowid, _tgt.ili_rowid, _rel.rowid
       FROM synset_relations AS _rel
       JOIN synsets AS _src ON _src.rowid = _rel.source_rowid
       JOIN synsets AS _tgt ON _tgt.rowid = _rel.target_rowid
      WHERE _rel.lexicon_rowid IN (?, ...))
INSERT INTO virtual_synset_relations
       SELECT (SELECT ifnull(max(wnid), 0) + 1 FROM virtual_synset_relations),
              src.rowid, tgt.rowid, tgt.ili_rowid, rel.rowid
          FROM rel
          JOIN synsets AS src ON src.ili_rowid = rel.src_ili_rowid
          JOIN synsets AS tgt ON tgt.ili_rowid = rel.tgt_ili_rowid
         WHERE src.lexicon_rowid IN lex
           AND tgt.lexicon_rowid IN lex;

This creates a temporary table that, for a wordnet (defined as a set of target and expand lexicons), lists all relations that would be available via expansion, but not those in the original lexicons (unless those lexicons are also in the expand set). Creating this table for the Chinese or French wordnets, expanded with PWN 3.0, takes less than 2 seconds each on my machine. It might be less if the table were created in memory only.

I have not yet performed any queries with this to estimate the performance of lookup (e.g., if more indexes are required), or estimated the memory requirements. If it doesn't take much memory, it might make sense to also include non-expanded relations so lookup only needs to access one table of relations.

@goodmami goodmami added the maintenance Improvements to project health label Apr 7, 2021
@goodmami
Copy link
Owner Author

goodmami commented Apr 7, 2021

Forgot to mention that this would be computed when a user creates a Wordnet object and not permanently stored in the database. This is because the temporary table becomes invalid when the data changes (a lexicon is added, removed, or modified), and also because it wouldn't be efficient to precompute all possible permutations of target + expand lexicons. With LMF 1.1 it's possible to anticipate those with the <Requires> element, so those could be done early, but it's not guaranteed the user wants to use that set, and if it doesn't take long to create a temporary table there's little to gain by doing it early.

@goodmami
Copy link
Owner Author

The 2-seconds to build climbs quite a bit when many lexicons are loaded. When all of OMW is added, it can be over 10 seconds. Using the SQLite pragma for in-memory temporary tables didn't help. This probably shouldn't be done by default when creating a Wordnet object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Improvements to project health
Projects
None yet
Development

No branches or pull requests

1 participant