Extracting Headings and paragraphs in the same order from ODT file #135

Open
ryocchin opened this issue Jan 23, 2024 · 0 comments

ryocchin commented Jan 23, 2024

I managed to solve this problem myself.
I created a temporary file in which each heading is replaced with a paragraph that has a special style name. By rereading that file, I was able to reproduce the same order as in the original document.
Thanks to anyone who read this.


Hello. I am trying to convert an ODT file to an HTML file using odfpy.
Our files have headings placed between regular paragraphs to indicate chapter names. For example:

Chapter 1 xxxx
paragraph bbbb dddd eeee.
paragraph ddddd eeee ffff.
Chapter 2 yyyy
paragraph ggggggg
paragraph hhhhhh
Here "Chapter 1" and "Chapter 2" are headings in the ODT file and the other lines are ordinary paragraphs. I tried to extract these lines as follows:

from odf import opendocument, text
doc = opendocument.load("somefile.odt")
headers = doc.getElementsByType(text.H)  # all headings
paras = doc.getElementsByType(text.P)    # all paragraphs

Headings and paragraphs are extracted into two separate lists, 'headers' and 'paras',
so I cannot reproduce the order in which they appear in the ODT file. Is there any way to get a single list of texts in document order?
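
For anyone with the same question, here is a minimal sketch of one way to get headings and paragraphs in a single ordered list. It does not use odfpy; it swaps in the Python standard library, relying on the fact that an ODT file is a ZIP archive whose body text lives in content.xml. The function name ordered_blocks and the "heading"/"paragraph" labels are made up for illustration, and the sketch assumes an unencrypted document.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespaces used in content.xml
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def ordered_blocks(odt_file):
    """Return [(kind, text), ...] for headings and paragraphs in document order.

    odt_file may be a path or a binary file-like object.
    (Hypothetical helper, not part of odfpy.)
    """
    with zipfile.ZipFile(odt_file) as zf:
        root = ET.fromstring(zf.read("content.xml"))

    body = root.find(f"{{{OFFICE_NS}}}body/{{{OFFICE_NS}}}text")
    blocks = []
    if body is None:
        return blocks

    # iter() walks the tree in document order, so text:h and text:p
    # elements come out interleaved exactly as they appear in the file.
    for elem in body.iter():
        if elem.tag == f"{{{TEXT_NS}}}h":
            blocks.append(("heading", "".join(elem.itertext())))
        elif elem.tag == f"{{{TEXT_NS}}}p":
            blocks.append(("paragraph", "".join(elem.itertext())))
    return blocks
```

Calling ordered_blocks("somefile.odt") should then give the chapter headings and their paragraphs interleaved as in the document. One limitation: a paragraph nested inside another paragraph (for example in a text frame) would have its text reported twice. With odfpy itself, something similar may be possible by walking doc.text.childNodes and checking each node's qname, but I have not verified that here.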
