BasicUsage
The client will help you create, add, amend and delete content being held in a SWORD2 repository (this document will use the word server and phrase SWORD2 repository interchangeably).
The basic concept is that the server holds 'collections' of resources, and may have one or more 'workspaces' which group together sets of these collections.
As a client, you will need to specify which collection you are going to work with and the server's Service Document (think of it like a sitemap) should aid you in finding the correct collection to work with.
The server's Service Document is essential: it lets the client discover the capabilities and locations of the available collections.
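To make the idea concrete, here is a minimal sketch of how a Service Document maps onto workspaces and collections. It uses only the Python 3 standard library rather than the sword2 client, and the document text, titles and URLs are hand-written placeholders, not output from a real server. It assumes the usual AtomPub layout (app:service, app:workspace, app:collection with atom:title children).

```python
import xml.etree.ElementTree as ET

APP = "{http://www.w3.org/2007/app}"
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical, hand-written service document used only for illustration.
SERVICE_DOC = """<?xml version="1.0"?>
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Main Site</atom:title>
    <collection href="http://example.org/col-uri/1">
      <atom:title>Collection 1</atom:title>
    </collection>
    <collection href="http://example.org/col-uri/2">
      <atom:title>Collection 2</atom:title>
    </collection>
  </workspace>
</service>"""


def list_collections(xml_text):
    """Return [(workspace title, [(collection title, href), ...]), ...]."""
    root = ET.fromstring(xml_text)
    result = []
    for ws in root.findall(APP + "workspace"):
        cols = [(col.findtext(ATOM + "title"), col.get("href"))
                for col in ws.findall(APP + "collection")]
        result.append((ws.findtext(ATOM + "title"), cols))
    return result


workspaces = list_collections(SERVICE_DOC)
for title, collections in workspaces:
    print("%s: %r" % (title, collections))
```

The client does this parsing for you; the sketch is only meant to show what kind of information the Service Document carries.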
You will need the URL of the server's Service Document. The SWORD2 specification (http://sword-app.svn.sourceforge.net/viewvc/sword-app/spec/trunk/SWORDProfile.html?revision=HEAD#autodiscovery) states that this may be embedded in HTML pages by means of the following element:
<html:link rel="sword" href="[Service Document URL]"/>
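If you only have an HTML page to start from, the element above can be located with a small parser. The following is a rough sketch using the Python 3 standard library; the class name, function name and sample page are made up for illustration and are not part of the sword2 client.

```python
from html.parser import HTMLParser


class SwordLinkFinder(HTMLParser):
    """Collect the href of every link element whose rel includes 'sword'."""

    def __init__(self):
        super().__init__()
        self.sd_urls = []

    def handle_starttag(self, tag, attrs):
        # Accept both <link> and a prefixed form such as <html:link>.
        if tag.split(":")[-1] != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "sword" in rels and attrs.get("href"):
            self.sd_urls.append(attrs["href"])


def find_service_documents(html_text):
    """Return the Service Document URLs advertised in an HTML page."""
    finder = SwordLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.sd_urls


# Hypothetical page used only for illustration.
PAGE = ('<html><head><link rel="sword" href="http://example.org/sd-uri"/>'
        '</head><body></body></html>')
print(find_service_documents(PAGE))
```

Any URL found this way can then be passed to the Connection as the Service Document URL.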
So, to get started (the Service Document URL is often shortened to SD-URI):
from sword2 import Connection
c = Connection("http://my.server.com/sd-uri", user_name="....", user_pass="......")
(It is almost certain that you will also need to supply a username and password to communicate with the server.)
By default, the client will not contact the remote server and retrieve this document until asked to.
c.get_service_document()
The Connection can alternatively be given the service document, should a local copy exist:
with open("servicedoc.xml", "r") as sd:
    c.load_service_document(sd.read())
The Service Document will be loaded and stored in the 'sd' attribute of the Connection and will be accessible from there.
Note that you can set one of the Connection's many startup flags, 'download_service_document', to True to make it attempt to download the Service Document when it is created. Please see http://packages.python.org/sword2/sword2.connection.Connection-class.html within the [API documentation](http://packages.python.org/sword2/) for more details on these flags.
First, make sure that you have a valid service document. One easy way is to check to see if the connection has a service document attribute ('sd'):
>>> c.sd is not None
True
# We can do further checks on this document:
>>> c.sd.parsed # Was the document valid XML and parsable?
True
>>> c.sd.valid # Did the document conform to the SWORD2 profile?
True
If something has gone wrong, it is best to examine some of the low-level actions that occurred. The Connection class keeps a history of all the transactions it attempts (when started with the default flags), and this is logged in an attribute called 'history'.
This can be 'print'ed out to give a reasonably easy-to-scan report of what happened.
Example of a failed load (the service document did not exist at the given URI):
>>> print len(c.history)
2
>>> print c.history
--------------------
Type: 'init' [2011-06-08T09:47:35.509661]
Data:
user_name: None
on_behalf_of: None
sd_iri: http://localhost:8080/sd-iri
--------------------
Type: 'SD_IRI GET' [2011-06-08T09:47:44.675707]
Data:
sd_iri: http://localhost:8080/sd-iri
response: {'transfer-encoding': 'chunked', 'date': 'Wed, 08 Jun 2011 08:47:44 GMT', 'status': '404', 'content-type': 'text/html', 'server': 'CherryPy/3.1.2 WSGI Server'}
process_duration: 0.126793861389
# Or, you can print pretty json (to_pretty_json) or json (to_json) reports:
>>> print c.history.to_pretty_json()
[
{
"timestamp": "2011-06-08T10:01:03.559164",
"type": "init",
"payload": {
"user_name": null,
"on_behalf_of": null,
"sd_iri": "http://localhost:8080/sd-uri"
}
},
{
"timestamp": "2011-06-08T10:01:06.866662",
"type": "SD_IRI GET",
"payload": {
"sd_iri": "http://localhost:8080/sd-uri",
"response": {
"transfer-encoding": "chunked",
"date": "Wed, 08 Jun 2011 09:01:06 GMT",
"status": "401",
"www-authenticate": "Basic realm=\"SSS\"",
"server": "CherryPy/3.1.2 WSGI Server"
},
"process_duration": 0.0077669620513916016
}
}
]
All history 'items' have a type which labels the type of activity they represent and all the data in the report is accessible through the object:
>>> c.history[0]['type']
'init'
>>> c.history[1]['response']['status']
'404'
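Because the JSON report is plain data, it can also be inspected outside the client. The sketch below assumes the structure shown in the to_pretty_json() output above (a list of items with "type" and "payload" keys) and picks out the requests that came back with an error status. The function name is made up for illustration, and the report is an abbreviated copy of the example.

```python
import json

# Abbreviated copy of the to_pretty_json() report shown above.
REPORT = """[
  {"timestamp": "2011-06-08T10:01:03.559164",
   "type": "init",
   "payload": {"user_name": null, "on_behalf_of": null,
               "sd_iri": "http://localhost:8080/sd-uri"}},
  {"timestamp": "2011-06-08T10:01:06.866662",
   "type": "SD_IRI GET",
   "payload": {"sd_iri": "http://localhost:8080/sd-uri",
               "response": {"status": "401",
                            "server": "CherryPy/3.1.2 WSGI Server"},
               "process_duration": 0.0077669620513916016}}
]"""


def failed_requests(report_json):
    """Return (type, status) for every item whose response status is 4xx or 5xx."""
    failures = []
    for item in json.loads(report_json):
        response = item.get("payload", {}).get("response")
        if response and int(response.get("status", 0)) >= 400:
            failures.append((item["type"], int(response["status"])))
    return failures


print(failed_requests(REPORT))
```

Items without a response, such as the 'init' entry, are simply skipped.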
You can also alter the logging levels of the client - it becomes quite verbose when set to 'DEBUG'!
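One way to do this is through Python's standard logging module. The sketch below assumes the client logs under names beginning with "sword2" (the log lines elsewhere on this page show "sword2.connection"); if your version configures logging differently, adjust the logger name accordingly.

```python
import logging

# Send log records to the console with timestamp, logger name and level.
logging.basicConfig(
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")

# "sword2" is assumed to be the parent of the "sword2.connection" logger;
# setting the level here applies to child loggers without their own level.
sword_logger = logging.getLogger("sword2")
sword_logger.setLevel(logging.DEBUG)
```

Setting the level back to logging.INFO or logging.WARNING quietens the output again.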
To correct the incorrect Service Document URI, you can either change the stored URI (c.sd_iri) or just create a new instance of the Connection class (and, optionally, copy across the history object).
Now, to have a look around it:
>>> len(c.workspaces)
1
# 'workspaces' attribute is taken from the 'sd' object:
>>> c.workspaces == c.sd.workspaces
True
The 'workspaces' attribute is a list of paired items: the first item is the title of the workspace, and the second is a list of the collections within it.
So, a short way to grab the first collection from the first workspace is:
>>> collection = c.workspaces[0][1][0]
>>> collection
<sword2.SDCollection - title: Collection 8916f13f-7be6-471f-ba31-c6faf1cbcf25>
>>> print collection # get a human-readable report for it
Collection: 'Collection 8916f13f-7be6-471f-ba31-c6faf1cbcf25' @ 'http://localhost:8080/col-uri/8916f13f-7be6-471f-ba31-c6faf1cbcf25'. Accept:['*/*']
SWORD: Description - 'Collection Description'
SWORD: Collection Policy - 'Collection Policy'
SWORD: Mediation? - 'True'
SWORD: Treatment - 'Treatment description'
SWORD: Accept Packaging: '['http://purl.org/net/sword/package/SimpleZip', 'http://purl.org/net/sword/package/Binary', 'http://purl.org/net/sword/package/METSDSpaceSIP']'
SWORD: Nested Service Documents - '['http://localhost:8080/sd-uri/f7ca101e-d303-413b-ac60-738d5a1507dc']'
Consider that we have a file ("foo.png") to deposit into the collection above. How do we put that file into the collection as a new resource?
We need to tell the client:
- which collection we want to put it in
  - either by supplying the collection's 'href', or by specifying the workspace and collection labels and leaving the client to work it out
- what the file is and what to do with it
...
>>> collection.href
'http://localhost:8080/col-uri/8916f13f-7be6-471f-ba31-c6faf1cbcf25'
>>> with open("foo.png", "rb") as data:
...     receipt = c.create_resource(col_iri = collection.href,
...                                 payload = data,
...                                 mimetype = "image/png",
...                                 filename = "foo.png",
...                                 packaging = "http://purl.org/net/sword/package/Binary")
2011-06-08 10:27:27,844 - sword2.connection - INFO - Received a Resource Created (201) response.
2011-06-08 10:27:27,845 - sword2.connection - INFO - Server response included a Deposit Receipt. Caching a copy in .resources['http://localhost:8080/edit-uri/8916f13f-7be6-471f-ba31-c6faf1cbcf25/11605c9f-b69b-4ba0-ab75-155b5c137995']
>>> receipt
<sword2.deposit_receipt.Deposit_Receipt object at 0x2e86350>
>>> print receipt
atom_author: '
'
atom_generator: 'uri:"http://www.swordapp.org/sss" version:"1.0"'
atom_id: 'tag:container@sss/8916f13f-7be6-471f-ba31-c6faf1cbcf25/11605c9f-b69b-4ba0-ab75-155b5c137995'
atom_summary: 'Content deposited with SWORD client'
atom_title: 'SWORD Deposit'
atom_updated: '2011-06-08T10:27:27Z'
dcterms_abstract: 'Content deposited with SWORD client'
dcterms_creator: 'SWORD Client'
dcterms_title: 'SWORD Deposit'
sword_treatment: 'Treatment description'
sword_verboseDescription: 'SSS has done this, that and the other to process the deposit'
Edit IRI: http://localhost:8080/edit-uri/8916f13f-7be6-471f-ba31-c6faf1cbcf25/11605c9f-b69b-4ba0-ab75-155b5c137995
Edit-Media IRI: http://localhost:8080/em-uri/8916f13f-7be6-471f-ba31-c6faf1cbcf25/11605c9f-b69b-4ba0-ab75-155b5c137995
SWORD2 Add IRI: http://localhost:8080/em-uri/8916f13f-7be6-471f-ba31-c6faf1cbcf25/11605c9f-b69b-4ba0-ab75-155b5c137995
SWORD2 Package formats available: ['http://purl.org/net/sword/package/SimpleZip']
Alternate IRI: http://localhost:8080/html/8916f13f-7be6-471f-ba31-c6faf1cbcf25/11605c9f-b69b-4ba0-ab75-155b5c137995
Link rel:'edit' -- [{'href': 'http://localhost:8080/edit-uri/8916f13f-7be6-471f-ba31-c6faf1cbcf25/11605c9f-b69b-4ba0-ab75-155b5c137995'}]
Link rel:'http://purl.org/net/sword/terms/statement' -- [{'href': 'http://localhost:8080/state-uri/8916f13f-7be6-471f-ba31-c6faf1cbcf25/11605c9f-b69b-4ba0-ab75-155b5c137995.atom', 'type': 'application/atom+xml;type=feed'}, {'href': 'http://localhost:8080/state-uri/8916f13f-7be6-471f-ba31-c6faf1cbcf25/11605c9f-b69b-4ba0-ab75-155b5c137995.rdf', 'type': 'application/rdf+xml'}]
Link rel:'alternate' -- [{'href': 'http://localhost:8080/html/8916f13f-7be6-471f-ba31-c6faf1cbcf25/11605c9f-b69b-4ba0-ab75-155b5c137995'}]
Link rel:'edit-media' -- [{'href': 'http://localhost:8080/em-uri/8916f13f-7be6-471f-ba31-c6faf1cbcf25/11605c9f-b69b-4ba0-ab75-155b5c137995'}, {'href': 'http://localhost:8080/em-uri/8916f13f-7be6-471f-ba31-c6faf1cbcf25/11605c9f-b69b-4ba0-ab75-155b5c137995.atom', 'type': 'application/atom+xml;type=feed'}]
Link rel:'http://purl.org/net/sword/terms/add' -- [{'href': 'http://localhost:8080/em-uri/8916f13f-7be6-471f-ba31-c6faf1cbcf25/11605c9f-b69b-4ba0-ab75-155b5c137995'}]
(Check the API documentation for more information on how to use 'Connection.create_resource')
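The mimetype passed with a deposit should describe the payload itself. If you do not want to hard-code it, the standard library can usually guess it from the filename. This is a small sketch; the helper name and the fallback value are choices made here for illustration, not something the sword2 client provides.

```python
import mimetypes


def guess_mimetype(filename, default="application/octet-stream"):
    """Guess a MIME type from the filename extension, with a generic fallback."""
    mimetype, _encoding = mimetypes.guess_type(filename)
    return mimetype or default


print(guess_mimetype("foo.png"))
print(guess_mimetype("package.zip"))
```

The result can be supplied as the mimetype argument when creating the resource.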
This receipt is a very useful thing and will be used in later transactions.
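Later transactions mostly need the IRIs carried in the receipt's links. As a rough illustration of how those links can be grouped by 'rel', in the same shape as the printed report above, here is a standard-library sketch. The receipt XML is a shortened, hand-written stand-in, and the function is not the client's own implementation.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical, shortened receipt; real receipts carry many more elements.
RECEIPT = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>SWORD Deposit</title>
  <link rel="edit" href="http://localhost:8080/edit-uri/col/item"/>
  <link rel="edit-media" href="http://localhost:8080/em-uri/col/item"/>
  <link rel="edit-media" href="http://localhost:8080/em-uri/col/item.atom"
        type="application/atom+xml;type=feed"/>
</entry>"""


def links_by_rel(receipt_xml):
    """Group the entry's atom:link elements by their rel attribute."""
    links = {}
    for link in ET.fromstring(receipt_xml).findall(ATOM + "link"):
        # Keep every attribute except rel, which becomes the dictionary key.
        details = {k: v for k, v in link.attrib.items() if k != "rel"}
        links.setdefault(link.get("rel"), []).append(details)
    return links


links = links_by_rel(RECEIPT)
edit_iri = links["edit"][0]["href"]
print(edit_iri)
```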
As with the binary file, we have to specify the collection, so the same comments apply here.
The key difference is that we want to put in a metadata record instead of a file or package. In the SWORD2 specification, this should be an atom:entry document. There is a convenience class in the sword2 library which simplifies its creation:
>>> from sword2 import Entry
>>> e = Entry()   # it can be opened blank, but more usefully...
>>> e = Entry(id="atom id",
...           title="atom title",
...           dcterms_identifier="some other id")   # ...plus any other fields you need
That's it - the metadata entry is created and ready to be uploaded:
>>> receipt = c.create_resource(col_iri = collection.href,
...                             metadata_entry = e)
The Entry is a wrapper over the top of an XML document (etree.Element instance) and provides a number of methods to simplify building the Entry:
# Adding fields to the metadata entry
# dcterms (and other, non-atom fields) can be used by passing in a parameter with an underscore between the
# prefix and element name, eg:
>>> e.add_fields(dcterms_title= "dcterms title", dcterms_some_other_field = "other")
# The atom:author field is treated slightly differently from the other fields:
# a dictionary is required
>>> e.add_fields(author={"name":"Ben", "email":"foo@example.org"})
>>> print str(e)
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:dcterms="http://purl.org/dc/terms/">
<generator uri="http://bitbucket.org/beno/python-sword2" version="0.1"/>
<updated>2011-06-05T16:20:34.914474</updated>
<dcterms:identifier>some other id</dcterms:identifier>
<id>atom id</id><title>atom title</title>
<author>
<name>Ben</name>
<email>foo@example.org</email>
</author>
<dcterms:some_other_field>other</dcterms:some_other_field>
<dcterms:title>dcterms title</dcterms:title>
</entry>
>>>
While the SWORD2 profile names only Dublin Core Terms (dcterms) as a potential namespace for additional metadata, it does not restrict you to it.
To add and use other namespaces with 'Entry':
# Other namespaces - use `Entry.register_namespace` to add them to the list of those considered (prefix, URL):
>>> e.register_namespace("myschema", "http://example.org")
>>> e.add_fields(myschema_foo = "bar")
>>> print str(e)
<?xml version="1.0"?><entry xmlns="http://www.w3.org/2005/Atom" xmlns:dcterms="http://purl.org/dc/terms/">
<generator uri="http://bitbucket.org/beno/python-sword2" version="0.1"/>
<updated>2011-06-05T16:20:34.914474</updated>
<dcterms:identifier>some other id</dcterms:identifier>
<id>atom id</id><title>atom title</title>
<author>
<name>Ben</name>
<email>foo@example.org</email>
</author>
<dcterms:some_other_field>other</dcterms:some_other_field>
<dcterms:title>dcterms title</dcterms:title>
<myschema:foo xmlns:myschema="http://example.org">bar</myschema:foo>
</entry>
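The prefix_name convention can be pictured as a lookup from prefix to namespace URL. The following sketch reproduces the idea with the standard library's ElementTree. It is an assumption about the general approach, not the Entry class's actual code, and the add_fields function here is a stand-alone stand-in for the method of the same name.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def add_fields(entry, namespaces, **fields):
    """Append one child element per keyword argument.

    A key such as "dcterms_title" is split at the first underscore; if the
    part before it is a registered prefix, the element goes in that namespace,
    otherwise the whole key is treated as an Atom element name.
    """
    for key, value in fields.items():
        prefix, _, name = key.partition("_")
        if name and prefix in namespaces:
            tag = "{%s}%s" % (namespaces[prefix], name)
        else:
            tag = "{%s}%s" % (ATOM_NS, key)
        ET.SubElement(entry, tag).text = value


namespaces = {"dcterms": "http://purl.org/dc/terms/"}
namespaces["myschema"] = "http://example.org"  # analogous to register_namespace

entry = ET.Element("{%s}entry" % ATOM_NS)
add_fields(entry, namespaces, title="atom title",
           dcterms_title="dcterms title", myschema_foo="bar")
print(ET.tostring(entry).decode("utf-8"))
```

Splitting at the first underscore only is what allows a field such as dcterms_some_other_field to keep the underscores in its element name.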
The Entry class doesn't provide editing/updating functions, as the full etree API is exposed through the attribute 'entry'. For example:
>>> e.entry
<Element {http://www.w3.org/2005/Atom}entry at 0x2e7df50>
>>> dir(e.entry)
['__class__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__doc__', '__format__', '__getattribute__', '__getitem__', '__hash__', '__init__', '__iter__', '__len__', '__new__', '__nonzero__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', '_init', 'addnext', 'addprevious', 'append', 'attrib', 'base', 'clear', 'extend', 'find', 'findall', 'findtext', 'get', 'getchildren', 'getiterator', 'getnext', 'getparent', 'getprevious', 'getroottree', 'index', 'insert', 'items', 'iter', 'iterancestors', 'iterchildren', 'iterdescendants', 'iterfind', 'itersiblings', 'itertext', 'keys', 'makeelement', 'nsmap', 'prefix', 'remove', 'replace', 'set', 'sourceline', 'tag', 'tail', 'text', 'values', 'xpath']
>>> len(e.entry.getchildren())
14
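As a hedged illustration of the kind of editing this allows, the sketch below updates, removes and appends elements on a small Atom entry. It uses the standard library's xml.etree.ElementTree on a hand-written element rather than a real Entry; the find, remove and SubElement calls used here are also available in lxml's etree, so the same pattern should carry over to 'e.entry'.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hand-written stand-in for the element exposed as e.entry.
root = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<id>atom id</id><title>atom title</title>'
    '</entry>'
)

# Update an existing field in place.
root.find(ATOM + "title").text = "revised title"

# Remove a field that is no longer wanted.
root.remove(root.find(ATOM + "id"))

# Append a new one.
ET.SubElement(root, ATOM + "summary").text = "a short summary"

print([child.tag for child in root])
```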
[Next: altering the deposited resource](AlterAmendDelete)