-Smooth-E- edited this page Feb 15, 2023 · 9 revisions

Introduction

Unlike higher-level, document-oriented APIs, OdfPy is essentially a thin abstraction layer just above the XML format. It lets you work programmatically with the XML elements in an OpenDocument Format (ODF) file. This means that creating or editing ODF files with OdfPy is an exercise in:

  • Creating elements of the appropriate type, with the right attributes and content.
  • Finding elements of a certain type and changing their attributes and/or content.

Background

The main focus of OdfPy has been to prevent the programmer from creating invalid documents. It has checks that raise an exception if the programmer adds an invalid element, adds an attribute unknown to the grammar, forgets to add a required attribute or adds text to an element that doesn't allow it.

About OpenDocument

An OpenDocument file is essentially an XML structure split across four XML files inside a zip archive. If you unzip the file, you’ll see content.xml, styles.xml, meta.xml and settings.xml. OdfPy holds these as a single in-memory structure that you navigate with its API. The XML can also refer to images, which you add to the zip archive alongside it. Finally, OpenDocument keeps track of all this in the manifest, a META-INF/manifest.xml file that lists the files belonging to the document.

Creating a document

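You start a document by calling one of OpenDocumentChart, OpenDocumentDrawing, OpenDocumentImage, OpenDocumentPresentation, OpenDocumentSpreadsheet or OpenDocumentText. Despite their class-like names, these are factory functions, each of which returns an OpenDocument object set up for that document type.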

All of these provide properties you can attach elements to:

  • meta
  • scripts
  • fontfacedecls
  • settings
  • styles
  • automaticstyles
  • masterstyles
  • body

Additionally, a document created with OpenDocumentText provides the text property, which is where you add your text elements. A quick example is probably the best way to give you an idea.

Example

from odf.opendocument import OpenDocumentText
from odf.style import Style, TextProperties
from odf.text import H, P, Span

textdoc = OpenDocumentText()

# Styles
s = textdoc.styles
h1style = Style(name="Heading 1", family="paragraph")
h1style.addElement(TextProperties(attributes={'fontsize': "24pt", 'fontweight': "bold"}))
s.addElement(h1style)

# An automatic style
boldstyle = Style(name="Bold", family="text")
boldprop = TextProperties(fontweight="bold")
boldstyle.addElement(boldprop)
textdoc.automaticstyles.addElement(boldstyle)

# Text
h = H(outlinelevel=1, stylename=h1style, text="My first text")
textdoc.text.addElement(h)
p = P(text="Hello world. ")
boldpart = Span(stylename=boldstyle, text="This part is bold. ")
p.addElement(boldpart)
p.addText("This is after bold.")
textdoc.text.addElement(p)

textdoc.save("myfirstdocument.odt")

You now have your first script-produced document called “myfirstdocument.odt”.

Manipulating documents

When manipulating ODF files programmatically, the first task is to identify the element you want to change. For styles, the style names can be used. Other types of elements need other attributes: for instance, the 'id' attribute, which can be set on most draw elements, most form elements, text.H and text.P. However, unlike style names, ids are usually not set on the elements of typical ODF files created in an office suite.

Example

To find and annotate the elements in your ODF file, inspect its XML with 'print(doc.xml())' and use loops such as the one below:

from odf.opendocument import load
from odf import text, draw

infile = 'My-file.odt'
outfile = 'My-file2.odt'

doc = load(infile)

# List every text box: its id (None when unset) and its direct children
for item in doc.getElementsByType(draw.TextBox):
    print(item.getAttribute('id'))
    for child in item.childNodes:
        print("\tchild:\t{}".format(child))

# Give an id to the span inside a text box whose text is "magic string"
for item in doc.getElementsByType(draw.TextBox):
    for child in item.getElementsByType(text.Span):
        print("Text-span:\t{}".format(child))
        if str(child) == "magic string":
            child.setAttribute('id', 'some-id')

doc.save(outfile)