More robust parser result handling #1120

myrix · 2024-05-06T04:36:53Z

The current implementation of parser result processing is problematic.

Parser results with disambiguation info are stored as plain-text HTML (see the content attribute of the parserresult DB table) and are displayed in the interface as is,

lingvodoc-react/src/components/OdtMarkupModal/index.js

Line 505 in 39b0000

dangerouslySetInnerHTML={{ __html: this.content }}

and are modified by taking the interface HTML source directly and saving it as is,

lingvodoc-react/src/components/OdtMarkupModal/index.js

Line 396 in 39b0000

    
this.docToSave.getElementsByTagName("body")[0].innerHTML = document.getElementById("markup-content").innerHTML;

This is obviously unsafe and causes problems whenever the interface HTML source is modified unintentionally, e.g. when the page is altered by translation extensions or by the browser's built-in translation functionality, which messes up the HTML markup structure of the parser result.

We need to fix this by storing parser result data in an explicit internal representation format, e.g. JSON, on both the backend and the frontend, so that the interface explicitly displays, modifies and saves this representation, ensuring its integrity.
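As a rough illustration of what such an explicit representation could look like, here is a minimal sketch. The data shape (paragraphs, tokens, variants, approved) and the function name approveVariant are assumptions made up for this example, not a format defined anywhere in this issue. The point is only that edits are applied to data rather than to the rendered page.

```javascript
// Hypothetical shape (an assumption for illustration): a parser result is a
// list of paragraphs, each holding tokens. A token is either plain text or a
// word with candidate analyses ("variants"). All values are placeholders.
const parserResult = {
  paragraphs: [
    {
      id: "p0",
      tokens: [
        {
          type: "word",
          text: "word1",
          variants: [
            { lex: "lemma-a", gr: "tag-a", approved: false },
            { lex: "lemma-b", gr: "tag-b", approved: false },
          ],
        },
        { type: "text", text: " plain text " },
      ],
    },
  ],
};

// Return a new result in which exactly one variant of one word is approved.
// The input object is never mutated, so it can be kept in component state.
function approveVariant(result, paragraphId, tokenIndex, variantIndex) {
  return {
    ...result,
    paragraphs: result.paragraphs.map((p) =>
      p.id !== paragraphId
        ? p
        : {
            ...p,
            tokens: p.tokens.map((t, i) =>
              i !== tokenIndex || t.type !== "word"
                ? t
                : {
                    ...t,
                    variants: t.variants.map((v, j) => ({
                      ...v,
                      approved: j === variantIndex,
                    })),
                  }
            ),
          }
    ),
  };
}

const updated = approveVariant(parserResult, "p0", 0, 1);
console.log(JSON.stringify(updated.paragraphs[0].tokens[0].variants));
```

Because the saved document would be derived from this state instead of from the page's innerHTML, changes made to the DOM by a translation extension would not reach the stored data.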

Naturally, all functionality that uses parser results as source data, in particular valency example extraction, should be updated accordingly. It might also be beneficial to store parser results not as whole big JSON documents but separately by paragraphs, or even by paragraphs and sentences. That would simplify processing and editing, and in particular would minimize data exchange between frontend and backend when saving disambiguation updates. However, it would require more extensive modifications to the parserresult DB table (and perhaps the introduction of additional helper tables) and to the source code of the corresponding functionality, so it should be carefully considered before deciding whether to go for it.
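To make the data-exchange argument concrete, here is a small sketch of how a frontend could send only the paragraphs that changed. The helper name changedParagraphs and the paragraph shape are hypothetical. Comparing via JSON.stringify is a simplification that depends on key order.

```javascript
// Hypothetical helper: compare the last saved state with the edited state and
// return only the paragraphs whose content differs, matched by paragraph id.
// Paragraphs that are new in `edited` are also returned; deletions are not
// detected by this sketch.
function changedParagraphs(saved, edited) {
  const savedById = new Map(saved.map((p) => [p.id, JSON.stringify(p)]));
  return edited.filter((p) => savedById.get(p.id) !== JSON.stringify(p));
}

// Placeholder data, made up for the example.
const saved = [
  { id: "p0", tokens: [{ text: "a", approved: false }] },
  { id: "p1", tokens: [{ text: "b", approved: false }] },
];
const edited = [
  saved[0],
  { id: "p1", tokens: [{ text: "b", approved: true }] },
];

// Only the paragraph with id "p1" differs, so only it would need to be sent.
const toSend = changedParagraphs(saved, edited);
console.log(toSend.map((p) => p.id));
```

With whole-document storage the same save would have to transfer every paragraph, which is the overhead the per-paragraph layout is meant to avoid.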

To a certain extent, work on this issue may well be better done concurrently with other current issues pertaining to the handling of parser results and their derivatives.


* init * get paragraph id * get dedoc data * text from dedoc * full results structure * fixes * fixes * undo doc_parser.py * handling several bold words * fix * next * fixes * better bold font and refactoring * refactoring * save to_json * most correct version * use json any way * cleanup * cleanup * next changes * some fixes * json_to_html * next steps * fixes after testing * fixes * fixed for strange parsers

* get paragraph id * fixes * next * fix * fixes * fixes * fixes * fixes * it works * better bold font and refactoring * refactoring * save to_json * most correct version * cleanup * cleanup * get_by_id * next steps * next steps * right components * selection * selection * next changes * some fixes * show results * toggle variants * toggle unverified * correct setting approved * cleanup * count highlighted * best solution for add markup * refactoring * refactoring * new removeFromMarkup * more smart code * pasteMarkup * paste results but keep prefix * save * parse element * minor * almost complete * UserVariantModal * next steps * fixes after testing * fixes * thin fixes * update on delete key

vmonakhov · 2024-06-25T16:14:05Z

The issue is mostly resolved. Main points:

The OdtMarkupModal component now has an internal representation of the parser result as JSON. It is generated on the backend and transferred over the network.
Parsers generate HTML, and other components (e.g. valency) use HTML. So in the database we still store the parser result as HTML and convert it to JSON and back on the backend side every time.
All elements in the browser are rendered as React components. All actions on them are reflected in the JSON state.
DOM manipulation is still used for handling the browser selection and for finding a text node's index within the whole text.
Many side effects were fixed, so we can now use bold and/or italic font in the text and keep it after manipulations with the markup.
Everywhere I tried to avoid code duplication and to fix such cases in the old code.
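The JSON-to-HTML direction described above can be sketched as follows. This is only an illustration in JavaScript: per the comment, the real conversion happens on the backend, and the token shape ({ text, bold, italic, state }), the class name "verified" and the function names here are all assumptions. The sketch shows bold/italic formatting surviving the conversion and text being escaped so that it cannot alter the markup structure.

```javascript
// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(s) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return s.replace(/[&<>"']/g, (c) => map[c]);
}

// Hypothetical token shape: { text, bold?, italic?, state? }.
// Formatting flags become nested tags; `state` becomes a span class.
function tokenToHtml(token) {
  let html = escapeHtml(token.text);
  if (token.italic) html = `<i>${html}</i>`;
  if (token.bold) html = `<b>${html}</b>`;
  if (token.state) html = `<span class="${escapeHtml(token.state)}">${html}</span>`;
  return html;
}

// A paragraph is rendered as a <p> element containing its tokens in order.
function paragraphToHtml(paragraph) {
  return `<p>${paragraph.tokens.map(tokenToHtml).join("")}</p>`;
}

// Placeholder data, made up for the example.
const html = paragraphToHtml({
  tokens: [
    { text: "a < b", bold: true },
    { text: " and " },
    { text: "word", italic: true, state: "verified" },
  ],
});
console.log(html);
```

Generating the HTML from structured data in one place means the stored markup always has the same shape, whatever happened to the page in the browser.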

myrix added the enhancement, backend and frontend labels May 6, 2024

vmonakhov self-assigned this May 6, 2024
