Introduction
Unlike higher-level, task-specific APIs, OdfPy is essentially an abstraction layer just above the XML format. It provides a way to programmatically interact with the XML elements in an OpenDocument file (ODF file). This means that creating or editing ODF files using OdfPy is an exercise in:
- Creating elements of the appropriate type, with the appropriate attributes and content.
- Finding elements of a certain type and changing their attributes and/or content.
The main focus of OdfPy has been to prevent the programmer from creating invalid documents. It has checks that raise an exception if the programmer adds an invalid element, adds an attribute unknown to the grammar, forgets to add a required attribute or adds text to an element that doesn't allow it.
An OpenDocument file is essentially an XML structure split into four XML files in a zip file. If you unzip the file, you’ll see content.xml, styles.xml, meta.xml and settings.xml. OdfPy handles these as one in-memory structure, which you can navigate with the API. From within the XML you can refer to images, which you add to the zip archive. Finally, OpenDocument keeps track of all this via a META-INF/manifest.xml file (the manifest), which lists the files that officially belong to the document.
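The archive layout can be inspected with nothing but the Python standard library. The sketch below is illustrative only: `list_odf_parts` is a hypothetical helper, and the archive it reads is a stand-in built in memory with placeholder contents, not a valid ODF document. With a real file you would pass its path instead.

```python
import io
import zipfile


def list_odf_parts(odf_file):
    """Return the member names of an ODF archive (a path or a file-like object)."""
    with zipfile.ZipFile(odf_file) as archive:
        return archive.namelist()


# Build a stand-in archive in memory so the sketch runs without a real document.
# The member contents are placeholders, not valid ODF XML.
member_names = (
    "content.xml",
    "styles.xml",
    "meta.xml",
    "settings.xml",
    "META-INF/manifest.xml",
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in member_names:
        archive.writestr(name, "<placeholder/>")
buffer.seek(0)

parts = list_odf_parts(buffer)
print(parts)
```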
You start the document by instantiating one of the OpenDocumentChart, OpenDocumentDrawing, OpenDocumentImage, OpenDocumentPresentation, OpenDocumentSpreadsheet or OpenDocumentText classes.
All of these provide properties you can attach elements to:
- meta
- scripts
- fontfacedecls
- settings
- styles
- automaticstyles
- masterstyles
- body
Additionally, the OpenDocumentText class provides the text property, which is where you add your text elements. A quick example is probably the best way to give you an idea.
from odf.opendocument import OpenDocumentText
from odf.style import Style, TextProperties
from odf.text import H, P, Span
textdoc = OpenDocumentText()
# Styles
s = textdoc.styles
h1style = Style(name="Heading 1", family="paragraph")
h1style.addElement(TextProperties(attributes={'fontsize': "24pt", 'fontweight': "bold"}))
s.addElement(h1style)
# An automatic style
boldstyle = Style(name="Bold", family="text")
boldprop = TextProperties(fontweight="bold")
boldstyle.addElement(boldprop)
textdoc.automaticstyles.addElement(boldstyle)
# Text
h = H(outlinelevel=1, stylename=h1style, text="My first text")
textdoc.text.addElement(h)
p = P(text="Hello world. ")
boldpart = Span(stylename=boldstyle, text="This part is bold. ")
p.addElement(boldpart)
p.addText("This is after bold.")
textdoc.text.addElement(p)
textdoc.save("myfirstdocument.odt")
You now have your first script-produced document called “myfirstdocument.odt”.
When manipulating ODF files programmatically, you first need to identify the element you want to change. For styles, the style names can be used. Other types of elements need other attributes, for instance the 'id' attribute, which can be set on most draw elements, most form elements, text.H and text.P. However, unlike style names, IDs are usually not set on the elements of ODF files created in an office suite.
To annotate the elements in your ODF file, inspect it with 'print(doc.xml())' and loops such as the ones below:
from odf.opendocument import load
from odf import text, draw
infile = 'My-file.odt'
outfile = 'My-file{}.odt'.format(2)
doc = load(infile)
# List every text box with its id and its direct children
for item in doc.getElementsByType(draw.TextBox):
    print(item.getAttribute('id'))
    for child in item.childNodes:
        print("\tchild:\t{}".format(child))

# Tag the span whose text matches a known marker string
for item in doc.getElementsByType(draw.TextBox):
    for child in item.getElementsByType(text.Span):
        print("Text-span:\t{}".format(child))
        if str(child) == "magic string":
            child.setAttribute('id', 'some-id')

# Write the annotated copy under a new name
doc.save(outfile)