Extracting Headings and paragraphs in the same order from ODT file #135

ryocchin · 2024-01-23T19:00:14Z

I managed to solve this problem myself.
I created a temporary file in which each heading is replaced with a paragraph that has a special style name. By reloading that file, I was able to read everything back in the original order.
Thanks to anyone who read this.

Hello. I am trying to convert an ODT file to an HTML file using odfpy.
Our files have headings placed between regular paragraphs to mark chapter names. For example:

Chapter 1 xxxx
paragraph bbbb dddd eeee.
paragraph ddddd eeee ffff.
Chapter 2 yyyy
paragraph ggggggg
paragraph hhhhhh
Here "Chapter 1" and "Chapter 2" are headings in the ODT file, and the other lines are ordinary paragraphs. I tried to extract these lines as follows:

from odf import opendocument, text

doc = opendocument.load("somefile.odt")
headers = doc.getElementsByType(text.H)   # all text:h elements
paras = doc.getElementsByType(text.P)     # all text:p elements

The headings and paragraphs are extracted into two separate lists, 'headers' and 'paras', so I cannot reproduce the order they have in the ODT file. Is there any way to get the text of headings and paragraphs in document order?
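One way to keep the order, as an alternative to the temporary-file workaround, is to walk the children of the document body once and handle each heading or paragraph as it is reached, instead of collecting each element type separately. The sketch below does this with only the Python standard library (zipfile and xml.etree.ElementTree) reading content.xml directly, not with odfpy; the function names iter_blocks, to_html and build_sample_odt are hypothetical. It assumes a missing text:outline-level means level 1, and it does not expand text:s, text:tab or text:line-break into whitespace. With odfpy, the analogous idea would presumably be to iterate over the child nodes of doc.text, but that variant is not shown or tested here.

```python
import html
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace URIs from the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

H_TAG = f"{{{TEXT_NS}}}h"
P_TAG = f"{{{TEXT_NS}}}p"
LEVEL_ATTR = f"{{{TEXT_NS}}}outline-level"


def iter_blocks(odt_source):
    """Yield ("h", level, text) or ("p", None, text) in document order.

    odt_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(odt_source) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    body_text = root.find(f"{{{OFFICE_NS}}}body/{{{OFFICE_NS}}}text")
    if body_text is None:
        return
    yield from _walk(body_text)


def _walk(element):
    for child in element:
        if child.tag == H_TAG:
            # Assumption: a heading without outline-level is treated as level 1.
            level = int(child.get(LEVEL_ATTR, "1"))
            yield ("h", level, "".join(child.itertext()))
        elif child.tag == P_TAG:
            # itertext() joins text inside spans; text:s / text:tab are NOT expanded.
            yield ("p", None, "".join(child.itertext()))
        else:
            # Descend into containers such as text:section or text:list.
            yield from _walk(child)


def to_html(blocks):
    """Render the ordered blocks as simple HTML (h1..h6 and p)."""
    parts = []
    for kind, level, content in blocks:
        if kind == "h":
            n = min(max(level, 1), 6)  # HTML only defines h1..h6
            parts.append(f"<h{n}>{html.escape(content)}</h{n}>")
        else:
            parts.append(f"<p>{html.escape(content)}</p>")
    return "\n".join(parts)


def build_sample_odt():
    """Build a tiny in-memory archive with only the parts this sketch reads."""
    content_xml = f"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
  <office:body>
    <office:text>
      <text:h text:outline-level="1">Chapter 1 xxxx</text:h>
      <text:p>paragraph bbbb dddd eeee.</text:p>
      <text:p>paragraph ddddd eeee ffff.</text:p>
      <text:section text:name="Part2">
        <text:h text:outline-level="1">Chapter 2 yyyy</text:h>
        <text:p>paragraph <text:span>ggggggg</text:span></text:p>
      </text:section>
    </office:text>
  </office:body>
</office:document-content>"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", content_xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for block in iter_blocks(build_sample_odt()):
        print(block)
    print(to_html(iter_blocks(build_sample_odt())))
```

Because the walk visits elements in the order they appear in content.xml, headings and paragraphs come out interleaved as in the document, which is exactly what two separate getElementsByType calls lose. For a real file, pass its path: iter_blocks("somefile.odt").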
