From bba8f5f948821c5afb9355ee596ff644f7d24615 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Wed, 22 Nov 2023 09:33:37 +0100
Subject: [PATCH 01/43] Point to the release page on GitHub instead of PyPI for
 the download link in the README

---
 README.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.rst b/README.rst
index 1d1b19ca..e32e8ce3 100644
--- a/README.rst
+++ b/README.rst
@@ -21,10 +21,10 @@ is based on Suds and extends it with ICAT specific features.
 Download
 --------
 
-The latest release version can be found in the
-`Python Package Index (PyPI)`__.
+The latest release version can be found at the
+`release page on GitHub`__.
 
-.. __: `PyPI site`_
+.. __: `GitHub release`_
 
 
 Documentation
@@ -64,6 +64,6 @@ permissions and limitations under the License.
 
 
 .. _ICAT: https://icatproject.org/
-.. _PyPI site: https://pypi.org/project/python-icat/
+.. _GitHub release: https://github.com/icatproject/python-icat/releases/latest
 .. _Read the Docs site: https://python-icat.readthedocs.io/
 .. _Apache License: https://www.apache.org/licenses/LICENSE-2.0

From 9be8303b0dd96ac5ef991a2fecab34b3830303fd Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Wed, 22 Nov 2023 09:48:27 +0100
Subject: [PATCH 02/43] Put a full download URL in the spec file

---
 python-icat.spec | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python-icat.spec b/python-icat.spec
index b0f4292c..2401ab9f 100644
--- a/python-icat.spec
+++ b/python-icat.spec
@@ -15,7 +15,7 @@ Url:		$url
 Summary:	$description
 License:	Apache-2.0
 Group:		Development/Libraries/Python
-Source:		%{name}-%{version}.tar.gz
+Source:		https://github.com/icatproject/python-icat/releases/latest/download/python-icat-%{version}.tar.gz
 BuildRequires:	python%{pyversfx}-base >= 3.4
 BuildRequires:	python%{pyversfx}-setuptools
 BuildRequires:	fdupes

From 3999092882b89d43295a4c6dfff1f60779ce5db3 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Thu, 21 Dec 2023 10:24:58 +0100
Subject: [PATCH 03/43] Set the table of predefined configuration variables in
 a table directive

---
 doc/src/_static/css/captions.css | 14 +++++++
 doc/src/conf.py                  |  1 +
 doc/src/config.rst               | 66 ++++++++++++++++----------------
 3 files changed, 49 insertions(+), 32 deletions(-)
 create mode 100644 doc/src/_static/css/captions.css

diff --git a/doc/src/_static/css/captions.css b/doc/src/_static/css/captions.css
new file mode 100644
index 00000000..8321eee7
--- /dev/null
+++ b/doc/src/_static/css/captions.css
@@ -0,0 +1,14 @@
+.rst-content div.figure p.caption, .rst-content table.docutils caption, .rst-content div.code-block-caption{
+    color: #404040;
+    font-style: italic;
+    font-size: 90%;
+    line-height: normal;
+    text-align: left;
+}
+.rst-content div.figure p.caption span.caption-number, .rst-content table.docutils caption span.caption-number, .rst-content div.code-block-caption span.caption-number{
+    font-weight: bold;
+}
+.rst-content div.code-block-caption a.headerlink, .rst-content table.docutils caption a.headerlink{
+    display: none;
+    visibility: hidden;
+}
diff --git a/doc/src/conf.py b/doc/src/conf.py
index f41f3773..a5fca0c9 100644
--- a/doc/src/conf.py
+++ b/doc/src/conf.py
@@ -109,6 +109,7 @@
 html_favicon = "images/favicon-32x32.png"
 
 html_css_files = [
+    'css/captions.css',
     'css/spacing.css',
 ]
 
diff --git a/doc/src/config.rst b/doc/src/config.rst
index 0f5c42bd..ff706eed 100644
--- a/doc/src/config.rst
+++ b/doc/src/config.rst
@@ -138,38 +138,40 @@ A few derived variables are also set in
     (username and password if authenticator information is not
     available) suitable to be passed to :meth:`icat.client.Client.login`.
 
-The command line arguments, environment variables, and default values
-for the configuration variables are as follows:
-
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
-| Name            | Command line                | Environment           | Default        | Mandatory | Notes        |
-+=================+=============================+=======================+================+===========+==============+
-| `configFile`    | ``-c``, ``--configfile``    | ``ICAT_CFG``          | depends        | no        | \(1)         |
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
-| `configSection` | ``-s``, ``--configsection`` | ``ICAT_CFG_SECTION``  | :const:`None`  | no        |              |
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
-| `url`           | ``-w``, ``--url``           | ``ICAT_SERVICE``      |                | yes       |              |
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
-| `idsurl`        | ``--idsurl``                | ``ICAT_DATA_SERVICE`` | :const:`None`  | depends   | \(2)         |
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
-| `checkCert`     | ``--check-certificate``,    |                       | :const:`True`  | no        |              |
-|                 | ``--no-check-certificate``  |                       |                |           |              |
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
-| `http_proxy`    | ``--http-proxy``            | ``http_proxy``        | :const:`None`  | no        |              |
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
-| `https_proxy`   | ``--https-proxy``           | ``https_proxy``       | :const:`None`  | no        |              |
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
-| `no_proxy`      | ``--no-proxy``              | ``no_proxy``          | :const:`None`  | no        |              |
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
-| `auth`          | ``-a``, ``--auth``          | ``ICAT_AUTH``         |                | yes       | \(3)         |
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
-| `username`      | ``-u``, ``--user``          | ``ICAT_USER``         |                | yes       | \(3),(4)     |
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
-| `password`      | ``-p``, ``--pass``          |                       | interactive    | yes       | \(3),(4),(5) |
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
-| `promptPass`    | ``-P``, ``--prompt-pass``   |                       | :const:`False` | no        | \(3),(4),(5) |
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
-
+.. table:: Command line arguments, environment variables, and default values
+	   for the configuration variables.
+    :name: tab-config-vars
+
+    +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
+    | Name            | Command line                | Environment           | Default        | Mandatory | Notes        |
+    +=================+=============================+=======================+================+===========+==============+
+    | `configFile`    | ``-c``, ``--configfile``    | ``ICAT_CFG``          | depends        | no        | \(1)         |
+    +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
+    | `configSection` | ``-s``, ``--configsection`` | ``ICAT_CFG_SECTION``  | :const:`None`  | no        |              |
+    +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
+    | `url`           | ``-w``, ``--url``           | ``ICAT_SERVICE``      |                | yes       |              |
+    +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
+    | `idsurl`        | ``--idsurl``                | ``ICAT_DATA_SERVICE`` | :const:`None`  | depends   | \(2)         |
+    +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
+    | `checkCert`     | ``--check-certificate``,    |                       | :const:`True`  | no        |              |
+    |                 | ``--no-check-certificate``  |                       |                |           |              |
+    +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
+    | `http_proxy`    | ``--http-proxy``            | ``http_proxy``        | :const:`None`  | no        |              |
+    +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
+    | `https_proxy`   | ``--https-proxy``           | ``https_proxy``       | :const:`None`  | no        |              |
+    +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
+    | `no_proxy`      | ``--no-proxy``              | ``no_proxy``          | :const:`None`  | no        |              |
+    +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
+    | `auth`          | ``-a``, ``--auth``          | ``ICAT_AUTH``         |                | yes       | \(3)         |
+    +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
+    | `username`      | ``-u``, ``--user``          | ``ICAT_USER``         |                | yes       | \(3),(4)     |
+    +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
+    | `password`      | ``-p``, ``--pass``          |                       | interactive    | yes       | \(3),(4),(5) |
+    +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
+    | `promptPass`    | ``-P``, ``--prompt-pass``   |                       | :const:`False` | no        | \(3),(4),(5) |
+    +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
+
+See the table for an overview of predefined configuration variables.
 Mandatory means that an error will be raised in
 :meth:`icat.config.Config.getconfig` if no value is found for the
 configuration variable in question.

From 299c82bc817dc673e7a1ca981b410a9b9d8f7f8c Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Thu, 21 Dec 2023 11:21:14 +0100
Subject: [PATCH 04/43] ReST style fixes: fix tabs used for indentation,
 remove trailing white space

---
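Note: the two kinds of issues fixed here can be found mechanically.
The following is only a minimal sketch of such a check, not the tool
used for this patch. The function name find_whitespace_issues and the
sample text are made up for illustration.

```python
import re


def find_whitespace_issues(text):
    """Return (line number, message) pairs for tabs in the leading
    indentation and for trailing white space."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Leading run of blanks and tabs, possibly empty.
        indent = re.match(r"[ \t]*", line).group()
        if "\t" in indent:
            issues.append((lineno, "tab used for indentation"))
        if line != line.rstrip():
            issues.append((lineno, "trailing white space"))
    return issues


# Hypothetical sample: line 2 is indented with a tab, line 3 holds
# nothing but two blanks.
sample = "    .. attribute:: client\n\tthe configuration.\n  \n"
for lineno, message in find_whitespace_issues(sample):
    print("%d: %s" % (lineno, message))
```
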
 doc/src/client.rst     | 2 +-
 doc/src/config.rst     | 6 +++---
 doc/src/icatingest.rst | 6 +++---
 doc/src/ingest.rst     | 2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/doc/src/client.rst b/doc/src/client.rst
index 0f0b99dc..e32dc2df 100644
--- a/doc/src/client.rst
+++ b/doc/src/client.rst
@@ -29,7 +29,7 @@ manages the interaction with an ICAT service as a client.
 
         Version of the ICAT server this client connects to.
 
-	.. versionchanged:: 1.0.0
+        .. versionchanged:: 1.0.0
             changed type to :class:`icat.helper.Version`
 
     .. attribute:: autoLogout
diff --git a/doc/src/config.rst b/doc/src/config.rst
index ff706eed..af597211 100644
--- a/doc/src/config.rst
+++ b/doc/src/config.rst
@@ -62,8 +62,8 @@ added.  The main class that client programs interact with is
     .. attribute:: client
 
         The :class:`icat.client.Client` object initialized according to
-	the configuration.  This is also the first element in the
-	return value from :meth:`getconfig`.
+        the configuration.  This is also the first element in the
+        return value from :meth:`getconfig`.
 
     .. attribute:: client_kwargs
 
@@ -139,7 +139,7 @@ A few derived variables are also set in
     available) suitable to be passed to :meth:`icat.client.Client.login`.
 
 .. table:: Command line arguments, environment variables, and default values
-	   for the configuration variables.
+           for the configuration variables.
     :name: tab-config-vars
 
     +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+
diff --git a/doc/src/icatingest.rst b/doc/src/icatingest.rst
index 7cba2199..62e571cb 100644
--- a/doc/src/icatingest.rst
+++ b/doc/src/icatingest.rst
@@ -71,12 +71,12 @@ The following options are specific to icatingest:
 
     **CHECK**
         Compare all attributes from the input object with the already
-	existing object in ICAT.  Throw an error of any attribute
-	differs.
+        existing object in ICAT.  Throw an error of any attribute
+        differs.
 
     **OVERWRITE**
         Overwrite the existing object in ICAT, e.g. update it with all
-	attributes set to the values found in the input object.
+        attributes set to the values found in the input object.
 
     If :option:`--upload-datafiles` is set, this option will be
     ignored for Datafile objects which will then always raise an error
diff --git a/doc/src/ingest.rst b/doc/src/ingest.rst
index 4fed8b7e..a2880030 100644
--- a/doc/src/ingest.rst
+++ b/doc/src/ingest.rst
@@ -121,7 +121,7 @@ class attributes as follows::
   import icat.ingest
 
   class MyFacilityIngestReader(icat.ingest.IngestReader):
-  
+
       # Override the directory to search for XSD and XSLT files:
       SchemaDir = Path("/usr/share/icat/my-facility")
 

From 1b27f82fe3607d917bed830b7a92fdb2a8d49a72 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Thu, 21 Dec 2023 11:24:13 +0100
Subject: [PATCH 05/43] Set the Synopsis section in man pages as line blocks

---
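Note: a ReST line block keeps the line breaks of the source, so a
long synopsis can be wrapped by hand. The sketch below shows one way
to generate such a block. The helper synopsis_line_block and the
width of 79 are assumptions for illustration, the patch itself was
edited manually.

```python
import re


def synopsis_line_block(synopsis, width=79, indent="    "):
    """Wrap a synopsis into a ReST line block without splitting a
    bracketed option group across lines."""
    # A token is either a complete [...] group or a single word.
    tokens = re.findall(r"\[[^\]]*\]|\S+", synopsis)
    lines = []
    current = "|"
    for token in tokens:
        too_long = len(current) + 1 + len(token) > width
        if too_long and current.strip("| "):
            lines.append(current)
            # Continuation lines are indented inside the line block.
            current = "| " + indent + token
        else:
            current = current + " " + token
    lines.append(current)
    return "\n".join(lines)


synopsis = ("**icatingest** [*standard options*] [-i FILE] [-f FORMAT] "
            "[--upload-datafiles] [--datafile-dir DATADIR] "
            "[--duplicate OPTION]")
print(synopsis_line_block(synopsis))
```
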
 doc/src/icatdump.rst   | 2 +-
 doc/src/icatingest.rst | 3 ++-
 doc/src/wipeicat.rst   | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/doc/src/icatdump.rst b/doc/src/icatdump.rst
index 6e7d0caf..0023fca3 100644
--- a/doc/src/icatdump.rst
+++ b/doc/src/icatdump.rst
@@ -7,7 +7,7 @@ icatdump
 Synopsis
 ~~~~~~~~
 
-**icatdump** [*standard options*] [-o FILE] [-f FORMAT]
+| **icatdump** [*standard options*] [-o FILE] [-f FORMAT]
 
 
 Description
diff --git a/doc/src/icatingest.rst b/doc/src/icatingest.rst
index 62e571cb..c260d468 100644
--- a/doc/src/icatingest.rst
+++ b/doc/src/icatingest.rst
@@ -7,7 +7,8 @@ icatingest
 Synopsis
 ~~~~~~~~
 
-**icatingest** [*standard options*] [-i FILE] [-f FORMAT] [--upload-datafiles] [--datafile-dir DATADIR] [--duplicate OPTION]
+| **icatingest** [*standard options*] [-i FILE] [-f FORMAT]
+|     [--upload-datafiles] [--datafile-dir DATADIR] [--duplicate OPTION]
 
 
 Description
diff --git a/doc/src/wipeicat.rst b/doc/src/wipeicat.rst
index 89567684..1c1ca4cd 100644
--- a/doc/src/wipeicat.rst
+++ b/doc/src/wipeicat.rst
@@ -7,7 +7,7 @@ wipeicat
 Synopsis
 ~~~~~~~~
 
-**wipeicat** [*options*]
+| **wipeicat** [*options*]
 
 
 Description

From d1d0f385f7d64f05c70534d3de14c8af1d987287 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Thu, 21 Dec 2023 11:47:07 +0100
Subject: [PATCH 06/43] Add GitHub action to check ReST input files

---
 .github/workflows/rst-lint.yaml | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 .github/workflows/rst-lint.yaml

diff --git a/.github/workflows/rst-lint.yaml b/.github/workflows/rst-lint.yaml
new file mode 100644
index 00000000..b5e7c2fe
--- /dev/null
+++ b/.github/workflows/rst-lint.yaml
@@ -0,0 +1,12 @@
+name: Check ReST input files
+on: [push, pull_request]
+jobs:
+  doc8:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository code
+        uses: actions/checkout@v4
+      - name: doc8-check
+        uses: deep-entertainment/doc8-action@v4
+        with:
+          scanPaths: "doc/src"

From cec25957e021d341bff3a460156748c581d77771 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Fri, 22 Dec 2023 11:09:57 +0100
Subject: [PATCH 07/43] Drop version constraint on Sphinx in RtD requirements,
 i.e. essentially update the Sphinx version used for building the
 documentation

---
 .rtd-require | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/.rtd-require b/.rtd-require
index 2de815cd..2972d516 100644
--- a/.rtd-require
+++ b/.rtd-require
@@ -4,6 +4,4 @@ packaging
 setuptools
 setuptools_scm
 suds
-jinja2<3.1
-sphinx>=2,<3
-sphinx-rtd-theme>=0.5,<1
+sphinx_rtd_theme

From 4306390a8f27b96a4ae33fba0cb93c2bae7ea271 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Fri, 22 Dec 2023 11:44:28 +0100
Subject: [PATCH 08/43] Add sphinx_copybutton extension

---
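Note: the tutorial examples are written as interactive sessions with
">>> " and "... " prompts. sphinx-copybutton can be told about such
prompts with the copybutton_prompt_text and
copybutton_prompt_is_regexp settings. The two settings below are a
possible follow-up for doc/src/conf.py and are not part of this
patch. The helper copied_text is hypothetical and only approximates
in Python what the extension does in the browser.

```python
import re

# Possible additions to doc/src/conf.py (assumption, not in this patch):
copybutton_prompt_text = r">>> |\.\.\. "
copybutton_prompt_is_regexp = True


def copied_text(block):
    """Approximate the copied text: keep only the lines that start
    with a prompt and strip that prompt."""
    prompt = re.compile("^(%s)" % copybutton_prompt_text)
    lines = []
    for line in block.splitlines():
        match = prompt.match(line)
        if match:
            lines.append(line[match.end():])
    return "\n".join(lines)


example = '''>>> query = ("SELECT pt FROM ParameterType pt "
...          "INCLUDE pt.facility, pt.permissibleStringValues")
>>> client.search(query)
[(parameterType){'''
print(copied_text(example))
```
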
 .rtd-require    | 1 +
 doc/src/conf.py | 1 +
 2 files changed, 2 insertions(+)

diff --git a/.rtd-require b/.rtd-require
index 2972d516..99c132bb 100644
--- a/.rtd-require
+++ b/.rtd-require
@@ -4,4 +4,5 @@ packaging
 setuptools
 setuptools_scm
 suds
+sphinx-copybutton
 sphinx_rtd_theme
diff --git a/doc/src/conf.py b/doc/src/conf.py
index a5fca0c9..a75c5c52 100644
--- a/doc/src/conf.py
+++ b/doc/src/conf.py
@@ -40,6 +40,7 @@
 extensions = [
     'sphinx.ext.autodoc',
     'sphinx.ext.intersphinx',
+    'sphinx_copybutton',
 ]
 
 # Add any paths that contain templates here, relative to this directory.

From fafaa64ce9a89f6f50a5b4a0257fe0ed70bf27ef Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Fri, 22 Dec 2023 13:54:12 +0100
Subject: [PATCH 09/43] Add Python scripts to contain the interactive code
 blocks from the tutorial

---
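Note: the scripts separate consecutive code blocks with a comment
line of dashes. Assuming that each separator marks the boundary
between two interactive blocks of the tutorial, a script could be
split back into its blocks roughly as sketched below. The name
split_blocks is made up for illustration.

```python
SEPARATOR = "# --------------------"


def split_blocks(source):
    """Split a tutorial script into code blocks at separator lines."""
    blocks = []
    current = []
    for line in source.splitlines():
        if line.strip() == SEPARATOR:
            blocks.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    blocks.append("\n".join(current).strip())
    # Drop blocks that are empty after stripping blank lines.
    return [block for block in blocks if block]


# Excerpt in the style of doc/tutorial/create.py.
sample = '''\
f1 = client.new("Facility")
f1.name = "Fac1"

# --------------------

f2 = client.new("Facility", name="Fac2", fullName="Facility 2")
f2.create()
'''
for number, block in enumerate(split_blocks(sample), start=1):
    print("block %d:" % number)
    print(block)
```
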
 doc/tutorial/create.py |  58 ++++++++++++
 doc/tutorial/edit.py   |  43 +++++++++
 doc/tutorial/ids.py    |  83 +++++++++++++++++
 doc/tutorial/search.py | 200 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 384 insertions(+)
 create mode 100644 doc/tutorial/create.py
 create mode 100644 doc/tutorial/edit.py
 create mode 100644 doc/tutorial/ids.py
 create mode 100644 doc/tutorial/search.py

diff --git a/doc/tutorial/create.py b/doc/tutorial/create.py
new file mode 100644
index 00000000..9a2fc841
--- /dev/null
+++ b/doc/tutorial/create.py
@@ -0,0 +1,58 @@
+# Tutorial / Creating stuff in the ICAT server
+# interactive code blocks
+
+# Creating simple objects
+
+f1 = client.new("Facility")
+f1.name = "Fac1"
+f1.fullName = "Facility 1"
+f1.id = client.create(f1)
+client.search("SELECT f FROM Facility f")
+
+# --------------------
+
+f2 = client.new("Facility", name="Fac2", fullName="Facility 2")
+f2.create()
+client.search("SELECT f FROM Facility f")
+
+# Relationships to other objects
+
+f1 = client.get("Facility", 1)
+
+# --------------------
+
+pt1 = client.new("ParameterType")
+pt1.name = "Test parameter type 1"
+pt1.units = "pct"
+pt1.applicableToDataset = True
+pt1.valueType = "NUMERIC"
+pt1.facility = f1
+pt1.create()
+
+# --------------------
+
+pt2 = client.new("ParameterType")
+pt2.name = "Test parameter type 2"
+pt2.units = "N/A"
+pt2.applicableToDataset = True
+pt2.valueType = "STRING"
+pt2.facility = f1
+for v in ["buono", "brutto", "cattivo"]:
+    psv = client.new("PermissibleStringValue", value=v)
+    pt2.permissibleStringValues.append(psv)
+
+pt2.create()
+
+# --------------------
+
+query = "SELECT pt FROM ParameterType pt INCLUDE pt.facility, pt.permissibleStringValues"
+client.search(query)
+
+# Access rules
+
+publicTables = [ "Application", "DatafileFormat", "DatasetType",
+                 "Facility", "FacilityCycle", "Instrument",
+                 "InvestigationType", "ParameterType",
+                 "PermissibleStringValue", "SampleType", ]
+queries = [ "SELECT o FROM %s o" % t for t in publicTables ]
+client.createRules("R", queries)
diff --git a/doc/tutorial/edit.py b/doc/tutorial/edit.py
new file mode 100644
index 00000000..ca2aacc4
--- /dev/null
+++ b/doc/tutorial/edit.py
@@ -0,0 +1,43 @@
+# Tutorial / Working with objects in the ICAT server
+# interactive code blocks
+
+client.search("SELECT f FROM Facility f")
+
+# Editing the attributes of objects
+
+for facility in client.search("SELECT f FROM Facility f"):
+    facility.description = "An example facility"
+    facility.daysUntilRelease = 1826
+    facility.fullName = "%s Facility" % facility.name
+    client.update(facility)
+
+client.search("SELECT f FROM Facility f")
+
+# --------------------
+
+for facility in client.search("SELECT f FROM Facility f"):
+    facility.description = None
+    facility.update()
+
+client.search("SELECT f FROM Facility f")
+
+# Copying objects
+
+fac = client.get("Facility f INCLUDE f.parameterTypes", 1)
+print(fac)
+
+# --------------------
+
+facc = fac.copy()
+print(facc.name)
+print(facc.parameterTypes[0].name)
+facc.name = "Fac0"
+facc.parameterTypes[0].name = "Test parameter type 0"
+print(fac.name)
+print(fac.parameterTypes[0].name)
+
+# --------------------
+
+fac.truncateRelations()
+print(fac)
+print(facc)
diff --git a/doc/tutorial/ids.py b/doc/tutorial/ids.py
new file mode 100644
index 00000000..84bc12c1
--- /dev/null
+++ b/doc/tutorial/ids.py
@@ -0,0 +1,83 @@
+# Tutorial / Upload and download files to and from IDS
+# interactive code blocks
+
+client.ids.isReadOnly()
+
+# Upload files
+
+users = [("jdoe", "John"), ("nbour", "Nicolas"), ("rbeck", "Rudolph")]
+for user, name in users:
+    with open("greet-%s.txt" % user, "wt") as f:
+        print("Hello %s!" % name, file=f)
+
+# --------------------
+
+from icat.query import Query
+investigation = client.assertedSearch(Query(client, "Investigation", conditions={"name": "= '12100409-ST'"}))[0]
+dataset = client.new("Dataset")
+dataset.investigation = investigation
+dataset.type = client.assertedSearch(Query(client, "DatasetType", conditions={"name": "= 'other'"}))[0]
+dataset.name = "greetings"
+dataset.complete = False
+dataset.create()
+
+# --------------------
+
+df_format = client.assertedSearch(Query(client, "DatafileFormat", conditions={"name": "= 'Text'"}))[0]
+for fname in ("greet-jdoe.txt", "greet-nbour.txt", "greet-rbeck.txt"):
+    datafile = client.new("Datafile", name=fname, dataset=dataset, datafileFormat=df_format)
+    client.putData(fname, datafile)
+
+# Download files
+
+query = Query(client, "Datafile", conditions={"name": "= 'greet-jdoe.txt'", "dataset.name": "= 'greetings'"})
+df = client.assertedSearch(query)[0]
+data = client.getData([df])
+type(data)
+data.read().decode('utf8')
+
+# --------------------
+
+from io import BytesIO
+from zipfile import ZipFile
+query = Query(client, "Dataset", conditions={"name": "= 'greetings'"})
+ds = client.assertedSearch(query)[0]
+data = client.getData([ds])
+buffer = BytesIO(data.read())
+with ZipFile(buffer) as zipfile:
+    for f in zipfile.namelist():
+        print("file name: %s" % f)
+        print("content: %r" % zipfile.open(f).read().decode('utf8'))
+
+# --------------------
+
+from icat.ids import DataSelection
+selection = DataSelection([ds])
+client.ids.archive(selection)
+
+# --------------------
+
+client.ids.getStatus(selection)
+
+# --------------------
+
+data = client.getData([ds])
+
+# --------------------
+
+client.ids.getStatus(selection)
+data = client.getData([ds])
+len(data.read())
+
+# --------------------
+
+preparedId = client.prepareData(selection)
+preparedId
+
+# --------------------
+
+client.isDataPrepared(preparedId)
+data = client.getData(preparedId)
+buffer = BytesIO(data.read())
+with ZipFile(buffer) as zipfile:
+    zipfile.namelist()
diff --git a/doc/tutorial/search.py b/doc/tutorial/search.py
new file mode 100644
index 00000000..4d2d12f4
--- /dev/null
+++ b/doc/tutorial/search.py
@@ -0,0 +1,200 @@
+# Tutorial / Working with objects in the ICAT server
+# interactive code blocks
+
+client.search("SELECT f FROM Facility f INCLUDE f.parameterTypes LIMIT 1,1")
+
+# Building advanced queries
+
+from icat.query import Query
+
+# --------------------
+
+query = Query(client, "Investigation")
+print(query)
+client.search(query)
+
+# --------------------
+
+query = Query(client, "Investigation", conditions={"name": "= '10100601-ST'"})
+print(query)
+client.search(query)
+
+# --------------------
+
+query = Query(client, "Investigation", conditions={"name": "= '10100601-ST'"}, includes=["datasets"])
+print(query)
+client.search(query)
+
+# --------------------
+
+query = Query(client, "Investigation", conditions={"LENGTH(title)": "= 18"})
+print(query)
+client.search(query)
+
+# --------------------
+
+conditions = {
+    "investigation.name": "= '10100601-ST'",
+    "parameters.type.name": "= 'Magnetic field'",
+    "parameters.type.units": "= 'T'",
+    "parameters.numericValue": "> 5.0",
+}
+query = Query(client, "Dataset", conditions=conditions, includes=["parameters.type"])
+print(query)
+client.search(query)
+
+# --------------------
+
+def get_investigation(client, name, visitId=None):
+    query = Query(client, "Investigation")
+    query.addConditions({"name": "= '%s'" % name})
+    if visitId is not None:
+        query.addConditions({"visitId": "= '%s'" % visitId})
+    print(query)
+    return client.assertedSearch(query)[0]
+
+get_investigation(client, "08100122-EF")
+get_investigation(client, "12100409-ST", "1.1-P")
+
+# --------------------
+
+conditions = {
+    "datafileCreateTime": [">= '2012-01-01'", "< '2013-01-01'"]
+}
+query = Query(client, "Datafile", conditions=conditions)
+print(query)
+client.search(query)
+
+# --------------------
+
+query = Query(client, "Datafile")
+query.addConditions({"datafileCreateTime": ">= '2012-01-01'"})
+query.addConditions({"datafileCreateTime": "< '2013-01-01'"})
+print(query)
+
+# --------------------
+
+query = Query(client, "Dataset", attributes="name")
+print(query)
+client.search(query)
+
+# --------------------
+
+query = Query(client, "Dataset", attributes=["investigation.name", "name", "complete", "type.name"])
+print(query)
+client.search(query)
+
+# --------------------
+
+query = Query(client, "Dataset", aggregate="COUNT")
+print(query)
+client.search(query)
+
+# --------------------
+
+conditions = {
+    "dataset.investigation.name": "= '10100601-ST'",
+    "type.name": "= 'Magnetic field'",
+    "type.units": "= 'T'",
+}
+query = Query(client, "DatasetParameter", conditions=conditions, attributes="numericValue")
+print(query)
+client.search(query)
+query.setAggregate("MIN")
+print(query)
+client.search(query)
+query.setAggregate("MAX")
+print(query)
+client.search(query)
+query.setAggregate("AVG")
+print(query)
+client.search(query)
+
+# --------------------
+
+conditions = {
+    "datasets.parameters.type.name": "= 'Magnetic field'",
+    "datasets.parameters.type.units": "= 'T'",
+}
+query = Query(client, "Investigation", conditions=conditions)
+print(query)
+client.search(query)
+
+# --------------------
+
+query.setAggregate("DISTINCT")
+print(query)
+client.search(query)
+
+# --------------------
+
+conditions = {
+    "datasets.parameters.type.name": "= 'Magnetic field'",
+    "datasets.parameters.type.units": "= 'T'",
+}
+query = Query(client, "Investigation", conditions=conditions, aggregate="COUNT")
+print(query)
+client.search(query)
+query.setAggregate("COUNT:DISTINCT")
+print(query)
+client.search(query)
+
+# --------------------
+
+order = ["type.name", "type.units", ("numericValue", "DESC")]
+query = Query(client, "DatasetParameter", includes=["type"], order=order)
+print(query)
+client.search(query)
+
+# --------------------
+
+query = Query(client, "User", conditions={"fullName": "IS NOT NULL"}, order=[("LENGTH(fullName)", "DESC")])
+print(query)
+for user in client.search(query):
+    print("%d: %s" % (len(user.fullName), user.fullName))
+
+# --------------------
+
+query = Query(client, "Dataset", order=[("endDate", "DESC")], limit=(2, 1))
+print(query)
+client.search(query)
+
+# Useful search methods
+
+res = client.search(Query(client, "Facility"))
+if not res:
+    raise RuntimeError("Facility not found")
+elif len(res) > 1:
+    raise RuntimeError("Facility not unique")
+
+facility = res[0]
+facility = client.assertedSearch(Query(client, "Facility"))[0]
+
+# --------------------
+
+for ds in client.searchChunked(Query(client, "Dataset")):
+    # do something useful with the dataset ds ...
+    print(ds.name)
+
+# --------------------
+
+def get_dataset(client, inv_name, ds_name, ds_type="raw"):
+    """Get a dataset in an investigation.
+    If it already exists, search and return it, create it, if not.
+    """
+    try:
+        dataset = client.new("Dataset")
+        query = Query(client, "Investigation", conditions={
+            "name": "= '%s'" % inv_name
+        })
+        dataset.investigation = client.assertedSearch(query)[0]
+        query = Query(client, "DatasetType", conditions={
+            "name": "= '%s'" % ds_type
+        })
+        dataset.type = client.assertedSearch(query)[0]
+        dataset.complete = False
+        dataset.name = ds_name
+        dataset.create()
+    except icat.ICATObjectExistsError:
+        dataset = client.searchMatching(dataset)
+    return dataset

From 4e3bbe39a03abbde2ec3970285f397b27a34deb7 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Fri, 22 Dec 2023 14:52:12 +0100
Subject: [PATCH 10/43] Fix overly long lines in interactive tutorial examples

---
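Note: the long query strings are split using Python's implicit
concatenation of adjacent string literals inside parentheses, so the
resulting value is unchanged. A minimal check, using the query string
from doc/tutorial/create.py:

```python
# Form before this patch: one long string literal.
before = "SELECT pt FROM ParameterType pt INCLUDE pt.facility, pt.permissibleStringValues"

# Form after this patch: adjacent literals are joined at compile
# time, note the blank at the end of the first literal.
after = ("SELECT pt FROM ParameterType pt "
         "INCLUDE pt.facility, pt.permissibleStringValues")

print(before == after)
```
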
 doc/src/tutorial-create.rst |  3 ++-
 doc/src/tutorial-ids.rst    | 19 ++++++++++++++-----
 doc/src/tutorial-search.rst | 30 +++++++++++++++++++++---------
 doc/tutorial/create.py      |  3 ++-
 doc/tutorial/ids.py         | 19 ++++++++++++++-----
 doc/tutorial/search.py      | 30 +++++++++++++++++++++---------
 6 files changed, 74 insertions(+), 30 deletions(-)

diff --git a/doc/src/tutorial-create.rst b/doc/src/tutorial-create.rst
index c6c56ea8..07977db1 100644
--- a/doc/src/tutorial-create.rst
+++ b/doc/src/tutorial-create.rst
@@ -132,7 +132,8 @@ created together with the ``ParameterType`` object.
 
 We can verify this by searching for the newly created objects::
 
-  >>> query = "SELECT pt FROM ParameterType pt INCLUDE pt.facility, pt.permissibleStringValues"
+  >>> query = ("SELECT pt FROM ParameterType pt "
+  ...          "INCLUDE pt.facility, pt.permissibleStringValues")
   >>> client.search(query)
   [(parameterType){
      createId = "simple/root"
diff --git a/doc/src/tutorial-ids.rst b/doc/src/tutorial-ids.rst
index 0ce2748c..c71d221e 100644
--- a/doc/src/tutorial-ids.rst
+++ b/doc/src/tutorial-ids.rst
@@ -54,10 +54,12 @@ We need a dataset in ICAT that the uploaded files should be put into,
 so let's create one::
 
   >>> from icat.query import Query
-  >>> investigation = client.assertedSearch(Query(client, "Investigation", conditions={"name": "= '12100409-ST'"}))[0]
+  >>> query = Query(client, "Investigation", conditions={"name": "= '12100409-ST'"})
+  >>> investigation = client.assertedSearch(query)[0]
   >>> dataset = client.new("Dataset")
   >>> dataset.investigation = investigation
-  >>> dataset.type = client.assertedSearch(Query(client, "DatasetType", conditions={"name": "= 'other'"}))[0]
+  >>> query = Query(client, "DatasetType", conditions={"name": "= 'other'"})
+  >>> dataset.type = client.assertedSearch(query)[0]
   >>> dataset.name = "greetings"
   >>> dataset.complete = False
   >>> dataset.create()
@@ -65,9 +67,13 @@ so let's create one::
 For each of the files, we create a new datafile object and call the
 :meth:`~icat.client.Client.putData` method to upload it::
 
-  >>> df_format = client.assertedSearch(Query(client, "DatafileFormat", conditions={"name": "= 'Text'"}))[0]
+  >>> query = Query(client, "DatafileFormat", conditions={"name": "= 'Text'"})
+  >>> df_format = client.assertedSearch(query)[0]
   >>> for fname in ("greet-jdoe.txt", "greet-nbour.txt", "greet-rbeck.txt"):
-  ...     datafile = client.new("Datafile", name=fname, dataset=dataset, datafileFormat=df_format)
+  ...     datafile = client.new("Datafile",
+  ...                           name=fname,
+  ...                           dataset=dataset,
+  ...                           datafileFormat=df_format)
   ...     client.putData(fname, datafile)
   ...
   (datafile){
@@ -125,7 +131,10 @@ Download files
 We can request a download of a set of data using the
 :meth:`~icat.client.Client.getData` method::
 
-  >>> query = Query(client, "Datafile", conditions={"name": "= 'greet-jdoe.txt'", "dataset.name": "= 'greetings'"})
+  >>> query = Query(client, "Datafile", conditions={
+  ...     "name": "= 'greet-jdoe.txt'",
+  ...     "dataset.name": "= 'greetings'"
+  ... })
   >>> df = client.assertedSearch(query)[0]
   >>> data = client.getData([df])
   >>> type(data)
diff --git a/doc/src/tutorial-search.rst b/doc/src/tutorial-search.rst
index ed9843ae..9d1c5fec 100644
--- a/doc/src/tutorial-search.rst
+++ b/doc/src/tutorial-search.rst
@@ -122,7 +122,8 @@ appropriate condition.  The `conditions` argument to
 :class:`~icat.query.Query` should be a mapping of attribute names to
 conditions on that attribute::
 
-  >>> query = Query(client, "Investigation", conditions={"name": "= '10100601-ST'"})
+  >>> query = Query(client, "Investigation",
+  ...               conditions={"name": "= '10100601-ST'"})
   >>> print(query)
   SELECT o FROM Investigation o WHERE o.name = '10100601-ST'
   >>> client.search(query)
@@ -144,7 +145,9 @@ conditions on that attribute::
 
 We may also include related objects in the search results::
 
-  >>> query = Query(client, "Investigation", conditions={"name": "= '10100601-ST'"}, includes=["datasets"])
+  >>> query = Query(client, "Investigation",
+  ...               conditions={"name": "= '10100601-ST'"},
+  ...               includes=["datasets"])
   >>> print(query)
   SELECT o FROM Investigation o WHERE o.name = '10100601-ST' INCLUDE o.datasets
   >>> client.search(query)
@@ -208,7 +211,8 @@ python-icat supports the use of some JPQL functions when specifying
 which attribute a condition should be applied to.  Consider the
 following query::
 
-  >>> query = Query(client, "Investigation", conditions={"LENGTH(title)": "= 18"})
+  >>> query = Query(client, "Investigation",
+  ...               conditions={"LENGTH(title)": "= 18"})
   >>> print(query)
   SELECT o FROM Investigation o WHERE LENGTH(o.title) = 18
   >>> client.search(query)
@@ -253,7 +257,8 @@ field larger then 5 Tesla and include its parameters in the result::
   ...     "parameters.type.units": "= 'T'",
   ...     "parameters.numericValue": "> 5.0",
   ... }
-  >>> query = Query(client, "Dataset", conditions=conditions, includes=["parameters.type"])
+  >>> query = Query(client, "Dataset",
+  ...               conditions=conditions, includes=["parameters.type"])
   >>> print(query)
   SELECT o FROM Dataset o JOIN o.investigation AS i JOIN o.parameters AS p JOIN p.type AS pt WHERE i.name = '10100601-ST' AND p.numericValue > 5.0 AND pt.name = 'Magnetic field' AND pt.units = 'T' INCLUDE o.parameters AS p, p.type
   >>> client.search(query)
@@ -456,7 +461,9 @@ multiple attributes at once.  The result will be a tuple of attribute
 values rather then a single value for each object found in the query.
 This requires an ICAT server version 4.11 or newer though::
 
-  >>> query = Query(client, "Dataset", attributes=["investigation.name", "name", "complete", "type.name"])
+  >>> query = Query(client, "Dataset", attributes=[
+  ...     "investigation.name", "name", "complete", "type.name"
+  ... ])
   >>> print(query)
   SELECT i.name, o.name, o.complete, t.name FROM Dataset o JOIN o.investigation AS i JOIN o.type AS t
   >>> client.search(query)
@@ -485,7 +492,8 @@ average magnetic field applied in the measurements::
   ...     "type.name": "= 'Magnetic field'",
   ...     "type.units": "= 'T'",
   ... }
-  >>> query = Query(client, "DatasetParameter", conditions=conditions, attributes="numericValue")
+  >>> query = Query(client, "DatasetParameter",
+  ...               conditions=conditions, attributes="numericValue")
   >>> print(query)
   SELECT o.numericValue FROM DatasetParameter o JOIN o.dataset AS ds JOIN ds.investigation AS i JOIN o.type AS t WHERE i.name = '10100601-ST' AND t.name = 'Magnetic field' AND t.units = 'T'
   >>> client.search(query)
@@ -578,7 +586,8 @@ make sure not to count the same object more then once::
   ...     "datasets.parameters.type.name": "= 'Magnetic field'",
   ...     "datasets.parameters.type.units": "= 'T'",
   ... }
-  >>> query = Query(client, "Investigation", conditions=conditions, aggregate="COUNT")
+  >>> query = Query(client, "Investigation",
+  ...               conditions=conditions, aggregate="COUNT")
   >>> print(query)
   SELECT COUNT(o) FROM Investigation o JOIN o.datasets AS s1 JOIN s1.parameters AS s2 JOIN s2.type AS s3 WHERE s3.name = 'Magnetic field' AND s3.units = 'T'
   >>> client.search(query)
@@ -761,7 +770,9 @@ in the `order` argument to :class:`~icat.query.Query`.  Let's search
 for user sorted by the length of their name, from longest to
 shortest::
 
-  >>> query = Query(client, "User", conditions={"fullName": "IS NOT NULL"}, order=[("LENGTH(fullName)", "DESC")])
+  >>> query = Query(client, "User", conditions={
+  ...     "fullName": "IS NOT NULL"
+  ... }, order=[("LENGTH(fullName)", "DESC")])
   >>> print(query)
   SELECT o FROM User o WHERE o.fullName IS NOT NULL ORDER BY LENGTH(o.fullName) DESC
   >>> for user in client.search(query):
@@ -782,7 +793,8 @@ shortest::
 We may limit the number of returned items.  Search for the third to
 last dataset to have been finished::
 
-  >>> query = Query(client, "Dataset", order=[("endDate", "DESC")], limit=(2, 1))
+  >>> query = Query(client, "Dataset",
+  ...               order=[("endDate", "DESC")], limit=(2, 1))
   >>> print(query)
   SELECT o FROM Dataset o ORDER BY o.endDate DESC LIMIT 2, 1
   >>> client.search(query)
diff --git a/doc/tutorial/create.py b/doc/tutorial/create.py
index 9a2fc841..c6ad80f0 100644
--- a/doc/tutorial/create.py
+++ b/doc/tutorial/create.py
@@ -45,7 +45,8 @@
 
 # --------------------
 
-query = "SELECT pt FROM ParameterType pt INCLUDE pt.facility, pt.permissibleStringValues"
+query = ("SELECT pt FROM ParameterType pt "
+         "INCLUDE pt.facility, pt.permissibleStringValues")
 client.search(query)
 
 # Access rules
diff --git a/doc/tutorial/ids.py b/doc/tutorial/ids.py
index 84bc12c1..f3156039 100644
--- a/doc/tutorial/ids.py
+++ b/doc/tutorial/ids.py
@@ -13,24 +13,33 @@
 # --------------------
 
 from icat.query import Query
-investigation = client.assertedSearch(Query(client, "Investigation", conditions={"name": "= '12100409-ST'"}))[0]
+query = Query(client, "Investigation", conditions={"name": "= '12100409-ST'"})
+investigation = client.assertedSearch(query)[0]
 dataset = client.new("Dataset")
 dataset.investigation = investigation
-dataset.type = client.assertedSearch(Query(client, "DatasetType", conditions={"name": "= 'other'"}))[0]
+query = Query(client, "DatasetType", conditions={"name": "= 'other'"})
+dataset.type = client.assertedSearch(query)[0]
 dataset.name = "greetings"
 dataset.complete = False
 dataset.create()
 
 # --------------------
 
-df_format = client.assertedSearch(Query(client, "DatafileFormat", conditions={"name": "= 'Text'"}))[0]
+query = Query(client, "DatafileFormat", conditions={"name": "= 'Text'"})
+df_format = client.assertedSearch(query)[0]
 for fname in ("greet-jdoe.txt", "greet-nbour.txt", "greet-rbeck.txt"):
-    datafile = client.new("Datafile", name=fname, dataset=dataset, datafileFormat=df_format)
+    datafile = client.new("Datafile",
+                          name=fname,
+                          dataset=dataset,
+                          datafileFormat=df_format)
     client.putData(fname, datafile)
 
 # Download files
 
-query = Query(client, "Datafile", conditions={"name": "= 'greet-jdoe.txt'", "dataset.name": "= 'greetings'"})
+query = Query(client, "Datafile", conditions={
+    "name": "= 'greet-jdoe.txt'",
+    "dataset.name": "= 'greetings'"
+})
 df = client.assertedSearch(query)[0]
 data = client.getData([df])
 type(data)
diff --git a/doc/tutorial/search.py b/doc/tutorial/search.py
index 4d2d12f4..a697581e 100644
--- a/doc/tutorial/search.py
+++ b/doc/tutorial/search.py
@@ -15,19 +15,23 @@
 
 # --------------------
 
-query = Query(client, "Investigation", conditions={"name": "= '10100601-ST'"})
+query = Query(client, "Investigation",
+              conditions={"name": "= '10100601-ST'"})
 print(query)
 client.search(query)
 
 # --------------------
 
-query = Query(client, "Investigation", conditions={"name": "= '10100601-ST'"}, includes=["datasets"])
+query = Query(client, "Investigation",
+              conditions={"name": "= '10100601-ST'"},
+              includes=["datasets"])
 print(query)
 client.search(query)
 
 # --------------------
 
-query = Query(client, "Investigation", conditions={"LENGTH(title)": "= 18"})
+query = Query(client, "Investigation",
+              conditions={"LENGTH(title)": "= 18"})
 print(query)
 client.search(query)
 
@@ -39,7 +43,8 @@
     "parameters.type.units": "= 'T'",
     "parameters.numericValue": "> 5.0",
 }
-query = Query(client, "Dataset", conditions=conditions, includes=["parameters.type"])
+query = Query(client, "Dataset",
+              conditions=conditions, includes=["parameters.type"])
 print(query)
 client.search(query)
 
@@ -80,7 +85,9 @@ def get_investigation(client, name, visitId=None):
 
 # --------------------
 
-query = Query(client, "Dataset", attributes=["investigation.name", "name", "complete", "type.name"])
+query = Query(client, "Dataset", attributes=[
+    "investigation.name", "name", "complete", "type.name"
+])
 print(query)
 client.search(query)
 
@@ -97,7 +104,8 @@ def get_investigation(client, name, visitId=None):
     "type.name": "= 'Magnetic field'",
     "type.units": "= 'T'",
 }
-query = Query(client, "DatasetParameter", conditions=conditions, attributes="numericValue")
+query = Query(client, "DatasetParameter",
+              conditions=conditions, attributes="numericValue")
 print(query)
 client.search(query)
 query.setAggregate("MIN")
@@ -132,7 +140,8 @@ def get_investigation(client, name, visitId=None):
     "datasets.parameters.type.name": "= 'Magnetic field'",
     "datasets.parameters.type.units": "= 'T'",
 }
-query = Query(client, "Investigation", conditions=conditions, aggregate="COUNT")
+query = Query(client, "Investigation",
+              conditions=conditions, aggregate="COUNT")
 print(query)
 client.search(query)
 query.setAggregate("COUNT:DISTINCT")
@@ -148,14 +157,17 @@ def get_investigation(client, name, visitId=None):
 
 # --------------------
 
-query = Query(client, "User", conditions={"fullName": "IS NOT NULL"}, order=[("LENGTH(fullName)", "DESC")])
+query = Query(client, "User", conditions={
+    "fullName": "IS NOT NULL"
+}, order=[("LENGTH(fullName)", "DESC")])
 print(query)
 for user in client.search(query):
     print("%d: %s" % (len(user.fullName), user.fullName))
 
 # --------------------
 
-query = Query(client, "Dataset", order=[("endDate", "DESC")], limit=(2, 1))
+query = Query(client, "Dataset",
+              order=[("endDate", "DESC")], limit=(2, 1))
 print(query)
 client.search(query)
 
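Note on the wrapped query strings in this patch: they rely on Python joining adjacent string literals inside parentheses, so the query text passed to ``client.search()`` should be unchanged. A minimal check, using only the string from the tutorial and no ICAT server (the variable names are illustrative only):

```python
# Original single-line form from the tutorial.
one_line = "SELECT pt FROM ParameterType pt INCLUDE pt.facility, pt.permissibleStringValues"

# Wrapped form: adjacent literals are joined at compile time.  Note the
# trailing space inside the first literal; without it, "pt" and
# "INCLUDE" would run together.
wrapped = ("SELECT pt FROM ParameterType pt "
           "INCLUDE pt.facility, pt.permissibleStringValues")

same = (one_line == wrapped)
print(same)
```

The same reasoning applies to the wrapped ``Query(...)`` calls: only the line breaks between arguments change, not the arguments themselves.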

From c941a9d89ede2f115af91fd99c646e4ac508d066 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Tue, 2 Jan 2024 15:38:10 +0100
Subject: [PATCH 11/43] Restrict running ReST lint on push to branches develop
 and master

---
 .github/workflows/rst-lint.yaml | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/rst-lint.yaml b/.github/workflows/rst-lint.yaml
index b5e7c2fe..b9b239f7 100644
--- a/.github/workflows/rst-lint.yaml
+++ b/.github/workflows/rst-lint.yaml
@@ -1,5 +1,10 @@
 name: Check ReST input files
-on: [push, pull_request]
+on:
+  push:
+    branches:
+      - develop
+      - master
+  pull_request:
 jobs:
   doc8:
     runs-on: ubuntu-latest

From 16443e88aa6c3e9b4319c5f67243ad31d42577eb Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Tue, 2 Jan 2024 15:59:14 +0100
Subject: [PATCH 12/43] Documentation fix: move a version changed note from
 module icat.ingest to class icat.ingest.IngestReader

---
 doc/src/ingest.rst | 4 ----
 src/icat/ingest.py | 4 ++++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/ingest.rst b/doc/src/ingest.rst
index e9abda8e..72eeb07a 100644
--- a/doc/src/ingest.rst
+++ b/doc/src/ingest.rst
@@ -55,10 +55,6 @@ the ``Dataset``.
 .. versionchanged:: 1.2.0
    add version 1.1 of the ingest file format, including references to samples
 
-.. versionchanged:: 1.3.0
-   drop class attribute :attr:`~icat.ingest.IngestReader.XSLT_name` in
-   favour of :attr:`~icat.ingest.IngestReader.XSLT_Map`.
-
 .. autoclass:: icat.ingest.IngestReader
     :members:
     :show-inheritance:
diff --git a/src/icat/ingest.py b/src/icat/ingest.py
index 57f15648..6c725a0f 100644
--- a/src/icat/ingest.py
+++ b/src/icat/ingest.py
@@ -37,6 +37,10 @@ class IngestReader(XMLDumpFileReader):
     :type investigation: :class:`icat.entity.Entity`
     :raise icat.exception.InvalidIngestFileError: if the input in
         metadata is not valid.
+
+    .. versionchanged:: 1.3.0
+       drop class attribute :attr:`~icat.ingest.IngestReader.XSLT_name`
+       in favour of :attr:`~icat.ingest.IngestReader.XSLT_Map`.
     """
 
     SchemaDir = Path("/usr/share/icat")

From 8446200535f437327cfea057c01e7cc1d302c326 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Tue, 2 Jan 2024 16:14:12 +0100
Subject: [PATCH 13/43] Minor doc config fixes

---
 doc/src/conf.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/conf.py b/doc/src/conf.py
index 38c5c319..2f880389 100644
--- a/doc/src/conf.py
+++ b/doc/src/conf.py
@@ -12,10 +12,10 @@
 maindir = Path(__file__).resolve().parent.parent.parent
 buildlib = maindir / "build" / "lib"
 sys.path[0] = str(buildlib)
+sys.dont_write_bytecode = True
 
 import icat._meta
 
-
 # -- Project information -----------------------------------------------------
 
 project = 'python-icat'
@@ -58,7 +58,7 @@
 #
 # This is also used if you do content translation via gettext catalogs.
 # Usually you set "language" from the command line for these cases.
-language = None
+language = 'en'
 
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.

From af28f5de6ade53d8252c1c9021f4ded4ef56ee6f Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Wed, 3 Jan 2024 11:28:37 +0100
Subject: [PATCH 14/43] - Add a new section on file formats to the
 documentation - Move the subsection on ICAT data files from the icat.dumpfile
 module reference into the new file formats section - Add a subsection on
 Metadata ingest files (only a section heading for now)

---
 doc/src/dumpfile.rst        | 64 ++-----------------------------------
 doc/src/file-icatdata.rst   | 58 +++++++++++++++++++++++++++++++++
 doc/src/file-icatingest.rst |  6 ++++
 doc/src/fileformats.rst     | 11 +++++++
 doc/src/index.rst           |  1 +
 5 files changed, 78 insertions(+), 62 deletions(-)
 create mode 100644 doc/src/file-icatdata.rst
 create mode 100644 doc/src/file-icatingest.rst
 create mode 100644 doc/src/fileformats.rst

diff --git a/doc/src/dumpfile.rst b/doc/src/dumpfile.rst
index 1fc44d6e..d87e8c9f 100644
--- a/doc/src/dumpfile.rst
+++ b/doc/src/dumpfile.rst
@@ -6,8 +6,8 @@
 This module provides the base classes
 :class:`icat.dumpfile.DumpFileReader` and
 :class:`icat.dumpfile.DumpFileWriter` that define the API and the
-logic for reading and writing ICAT data files.  The actual work is
-done in file format specific backend modules that should provide
+logic for reading and writing :ref:`ICAT-data-files`.  The actual work
+is done in file format specific backend modules that should provide
 subclasses that must implement the abstract methods.
 
 .. autoclass:: icat.dumpfile.DumpFileReader
@@ -23,63 +23,3 @@ subclasses that must implement the abstract methods.
 .. autofunction:: icat.dumpfile.register_backend
 
 .. autofunction:: icat.dumpfile.open_dumpfile
-
-
-.. _ICAT-data-files:
-
-ICAT data files
----------------
-
-ICAT data files provide a way to serialize ICAT content to a flat
-file.  This section describes the logical structure of ICAT data
-files.  The actual file format depends on the backend, python-icat
-provides backends using XML and YAML.
-
-There is a one-to-one correspondence of the objects in the data
-file and the corresponding object in ICAT according to the ICAT
-schema, including all attributes and relations to other objects.
-Special unique keys are used to encode the relations.
-:meth:`icat.entity.Entity.getUniqueKey` may be used to get such a
-unique key for an entity object and
-:meth:`icat.client.Client.searchUniqueKey` may be used to search an
-object by its key.  Otherwise these keys should be considered as
-opaque ids.
-
-Data files are partitioned in chunks.  This is done to avoid having
-the whole file, e.g. the complete inventory of the ICAT, at once in
-memory.  The problem is that objects contain references to other
-objects (e.g. Datafiles refer to Datasets, the latter refer to
-Investigations, and so forth).  We keep an index of the objects in
-order to resolve these references.  But there is a memory versus time
-tradeoff: we cannot keep all the objects in the index, that would
-again mean the complete inventory of the ICAT.  And we can't know
-beforehand which object is going to be referenced later on, so we
-don't know which one to keep and which one to discard from the index.
-Fortunately we can query objects we discarded once back from the ICAT
-server.  But this is expensive.  So the strategy is as follows: keep
-all objects from the current chunk in the index and discard the
-complete index each time a chunk has been processed.  This will work
-fine if objects are mostly referencing other objects from the same
-chunk and only a few references go across chunk boundaries.
-
-Therefore, we want these chunks to be small enough to fit into memory,
-but at the same time large enough to keep as many relations between
-objects as possible local in a chunk.  It is in the responsibility of
-the writer of the data file to create the chunks in this manner.
-
-The objects that get written to the data file and how this file is
-organized is controlled by lists of ICAT search expressions, see
-:meth:`icat.dumpfile.DumpFileWriter.writeobjs`.  There is some degree
-of flexibility: an object may include related objects in an
-one-to-many relation, just by including them in the search expression.
-In this case, these related objects should not have a search
-expression on their own again.  For instance, the search expression
-for Grouping may include UserGroup.  The UserGroups will then be
-embedded in their respective grouping in the data file.  There should
-not be a search expression for UserGroup then.
-
-Objects related in a many-to-one relation must always be included in
-the search expression.  This is also true if the object is
-indirectly related to one of the included objects.  In this case,
-only a reference to the related object will be included in the data
-file.  The related object must have its own list entry.
diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
new file mode 100644
index 00000000..b8d93ed1
--- /dev/null
+++ b/doc/src/file-icatdata.rst
@@ -0,0 +1,58 @@
+.. _ICAT-data-files:
+
+ICAT data files
+===============
+
+ICAT data files provide a way to serialize ICAT content to a flat
+file.  This section describes the logical structure of ICAT data
+files.  The actual file format depends on the backend, python-icat
+provides backends using XML and YAML.
+
+There is a one-to-one correspondence of the objects in the data
+file and the corresponding object in ICAT according to the ICAT
+schema, including all attributes and relations to other objects.
+Special unique keys are used to encode the relations.
+:meth:`icat.entity.Entity.getUniqueKey` may be used to get such a
+unique key for an entity object and
+:meth:`icat.client.Client.searchUniqueKey` may be used to search an
+object by its key.  Otherwise these keys should be considered as
+opaque ids.
+
+Data files are partitioned in chunks.  This is done to avoid having
+the whole file, e.g. the complete inventory of the ICAT, at once in
+memory.  The problem is that objects contain references to other
+objects (e.g. Datafiles refer to Datasets, the latter refer to
+Investigations, and so forth).  We keep an index of the objects in
+order to resolve these references.  But there is a memory versus time
+tradeoff: we cannot keep all the objects in the index, that would
+again mean the complete inventory of the ICAT.  And we can't know
+beforehand which object is going to be referenced later on, so we
+don't know which one to keep and which one to discard from the index.
+Fortunately we can query objects we discarded once back from the ICAT
+server.  But this is expensive.  So the strategy is as follows: keep
+all objects from the current chunk in the index and discard the
+complete index each time a chunk has been processed.  This will work
+fine if objects are mostly referencing other objects from the same
+chunk and only a few references go across chunk boundaries.
+
+Therefore, we want these chunks to be small enough to fit into memory,
+but at the same time large enough to keep as many relations between
+objects as possible local in a chunk.  It is in the responsibility of
+the writer of the data file to create the chunks in this manner.
+
+The objects that get written to the data file and how this file is
+organized is controlled by lists of ICAT search expressions, see
+:meth:`icat.dumpfile.DumpFileWriter.writeobjs`.  There is some degree
+of flexibility: an object may include related objects in an
+one-to-many relation, just by including them in the search expression.
+In this case, these related objects should not have a search
+expression on their own again.  For instance, the search expression
+for Grouping may include UserGroup.  The UserGroups will then be
+embedded in their respective grouping in the data file.  There should
+not be a search expression for UserGroup then.
+
+Objects related in a many-to-one relation must always be included in
+the search expression.  This is also true if the object is
+indirectly related to one of the included objects.  In this case,
+only a reference to the related object will be included in the data
+file.  The related object must have its own list entry.
diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst
new file mode 100644
index 00000000..04954679
--- /dev/null
+++ b/doc/src/file-icatingest.rst
@@ -0,0 +1,6 @@
+.. _ICAT-ingest-files:
+
+Metadata ingest files
+=====================
+
+
diff --git a/doc/src/fileformats.rst b/doc/src/fileformats.rst
new file mode 100644
index 00000000..c90eaec1
--- /dev/null
+++ b/doc/src/fileformats.rst
@@ -0,0 +1,11 @@
+File formats
+============
+
+Some components of python-icat read input files or write output files.
+This section describes the file formats being used.
+
+.. toctree::
+   :maxdepth: 1
+
+   file-icatdata
+   file-icatingest
diff --git a/doc/src/index.rst b/doc/src/index.rst
index 1fdc3c09..a3d947c0 100644
--- a/doc/src/index.rst
+++ b/doc/src/index.rst
@@ -38,6 +38,7 @@ Parts of the documentation
    tutorial
    moduleref
    scripts
+   fileformats
    known-issues
    changelog
 

From 3b9367ece06dbe4c540f7d6ba5e6abe5b07966ed Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Wed, 3 Jan 2024 12:24:06 +0100
Subject: [PATCH 15/43] Review introduction of ICAT data files section

---
 doc/src/file-icatdata.rst | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
index b8d93ed1..2dbb00c1 100644
--- a/doc/src/file-icatdata.rst
+++ b/doc/src/file-icatdata.rst
@@ -4,9 +4,16 @@ ICAT data files
 ===============
 
 ICAT data files provide a way to serialize ICAT content to a flat
-file.  This section describes the logical structure of ICAT data
-files.  The actual file format depends on the backend, python-icat
-provides backends using XML and YAML.
+file.  These files are read by the :ref:`icatingest` and written by
+the :ref:`icatdump` command line scripts respectively.  The program
+logic for reading and writing the files is provided by the
+:mod:`icat.dumpfile` module.
+
+The actual file format depends on the version of the ICAT schema and
+on the backend: python-icat provides backends using XML and YAML.
+
+Logical structure of ICAT data files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 There is a one-to-one correspondence of the objects in the data
 file and the corresponding object in ICAT according to the ICAT

From 33f8650e84c06ee8bab79a7e80317d7584b5d8e8 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Wed, 3 Jan 2024 13:50:40 +0100
Subject: [PATCH 16/43] Review the wording of the subsection on the
 structure of ICAT data files

---
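The index strategy described in the revised paragraph (keep the objects of the current chunk, discard the index at the end of each chunk, fall back to a server query for references that cross chunk boundaries) can be sketched as follows. This is an illustration only, not the code in ``icat.dumpfile``: the class ``ChunkIndex``, its methods, the ``keep`` flag and the ``fetch_from_server`` callable are hypothetical names, and caching a fetched object for the rest of the chunk is a choice of this sketch.

```python
class ChunkIndex:
    """Cache of objects seen in the current chunk, keyed by unique key."""

    def __init__(self, fetch_from_server):
        # fetch_from_server is a hypothetical callable standing in for a
        # (comparatively expensive) lookup on the ICAT server.
        self._fetch = fetch_from_server
        self._index = {}
        self._kept = set()
        self.server_lookups = 0

    def add(self, key, obj, keep=False):
        # keep=True models objects that are never discarded, as the
        # footnote describes for DataCollections.
        self._index[key] = obj
        if keep:
            self._kept.add(key)

    def resolve(self, key):
        # Cheap path: the referenced object is in the index.
        if key in self._index:
            return self._index[key]
        # Expensive path: the reference crosses a chunk boundary.
        self.server_lookups += 1
        obj = self._fetch(key)
        self._index[key] = obj
        return obj

    def end_chunk(self):
        # Discard the index, except for entries marked to be kept.
        self._index = {k: v for k, v in self._index.items()
                       if k in self._kept}


# A dict stands in for the server in this sketch.
server = {"User_name-db=2Fahau": "ahau (from server)"}
index = ChunkIndex(server.__getitem__)

# Chunk 1: the user is in the same chunk as the object referencing it.
index.add("User_name-db=2Fahau", "ahau (from chunk)")
first = index.resolve("User_name-db=2Fahau")
index.end_chunk()

# Chunk 2: the same reference now crosses a chunk boundary.
second = index.resolve("User_name-db=2Fahau")
print(first, "|", second, "|", index.server_lookups)
```

The sketch shows why the writer should keep related objects in the same chunk: every reference that is not local to the chunk costs one additional lookup.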
 doc/src/file-icatdata.rst | 51 ++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
index 2dbb00c1..06d8f70c 100644
--- a/doc/src/file-icatdata.rst
+++ b/doc/src/file-icatdata.rst
@@ -29,18 +29,19 @@ Data files are partitioned in chunks.  This is done to avoid having
 the whole file, e.g. the complete inventory of the ICAT, at once in
 memory.  The problem is that objects contain references to other
 objects (e.g. Datafiles refer to Datasets, the latter refer to
-Investigations, and so forth).  We keep an index of the objects in
-order to resolve these references.  But there is a memory versus time
-tradeoff: we cannot keep all the objects in the index, that would
-again mean the complete inventory of the ICAT.  And we can't know
-beforehand which object is going to be referenced later on, so we
-don't know which one to keep and which one to discard from the index.
-Fortunately we can query objects we discarded once back from the ICAT
-server.  But this is expensive.  So the strategy is as follows: keep
-all objects from the current chunk in the index and discard the
-complete index each time a chunk has been processed.  This will work
-fine if objects are mostly referencing other objects from the same
-chunk and only a few references go across chunk boundaries.
+Investigations, and so forth).  We keep an index of the objects as
+cache in order to resolve these references.  But there is a memory
+versus time tradeoff: we cannot keep all the objects in the index,
+that would again mean the complete inventory of the ICAT.  And we
+can't know beforehand which object is going to be referenced later on,
+so we don't know which one to keep and which one to discard from the
+index.  Fortunately we can query objects that we discarded once back
+from the ICAT server.  But this is expensive.  So the strategy is as
+follows: keep all objects from the current chunk in the index and
+discard the complete index each time a chunk has been
+processed. [#dc]_ This will work fine if objects are mostly
+referencing other objects from the same chunk and only a few
+references go across chunk boundaries.
 
 Therefore, we want these chunks to be small enough to fit into memory,
 but at the same time large enough to keep as many relations between
@@ -48,18 +49,24 @@ objects as possible local in a chunk.  It is in the responsibility of
 the writer of the data file to create the chunks in this manner.
 
 The objects that get written to the data file and how this file is
-organized is controlled by lists of ICAT search expressions, see
-:meth:`icat.dumpfile.DumpFileWriter.writeobjs`.  There is some degree
-of flexibility: an object may include related objects in an
-one-to-many relation, just by including them in the search expression.
-In this case, these related objects should not have a search
-expression on their own again.  For instance, the search expression
-for Grouping may include UserGroup.  The UserGroups will then be
-embedded in their respective grouping in the data file.  There should
-not be a search expression for UserGroup then.
+organized is controlled by lists of ICAT search expressions or entity
+objects, see :meth:`icat.dumpfile.DumpFileWriter.writeobjs`.  There is
+some degree of flexibility: an object may include related objects in
+a one-to-many relation.  In this case, these related objects should
+not be added on their own again.  For instance, you may write User,
+Grouping, and UserGroup as separate objects into the file.  In this
+case, the UserGroup entries must properly reference related User and
+Grouping.  Alternatively you may include the UserGroups in the
+corresponding Grouping objects.  In this case, you must not add the
+UserGroups again on their own.
 
 Objects related in a many-to-one relation must always be included in
 the search expression.  This is also true if the object is
 indirectly related to one of the included objects.  In this case,
 only a reference to the related object will be included in the data
-file.  The related object must have its own list entry.
+file.  The related object must have its own entry.
+
+
+.. [#dc] There is one exception: DataCollections don't have a
+         uniqueness constraint and can't reliably be searched by
+         attributes.  They are always kept in the index.

From da2415f3097965660069cf55b3558750b595c515 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Wed, 3 Jan 2024 15:10:23 +0100
Subject: [PATCH 17/43] Add simple ICAT data file examples

---
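To show how the ``id`` and ``ref`` attributes in the XML examples relate to each other, here is a small sketch that reads a trimmed-down document modeled on ``icatdump-simple-1.xml`` with the standard library and resolves the user references of each grouping within its chunk. It is not how ``icatingest`` reads the file; the ``SAMPLE`` string and the variable names are made up for illustration, and the XML declaration and ``head`` element are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample in the layout of the example file (assumption:
# only the elements needed for the illustration are kept).
SAMPLE = """<icatdata>
<data>
  <user id="User_name-db=2Fahau">
    <name>db/ahau</name>
  </user>
  <user id="User_name-db=2Fjdoe">
    <name>db/jdoe</name>
  </user>
  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Freader">
    <name>investigation_10100601-ST_reader</name>
    <userGroups>
      <user ref="User_name-db=2Fjdoe"/>
    </userGroups>
  </grouping>
</data>
</icatdata>
"""

root = ET.fromstring(SAMPLE)
members = {}
for chunk in root.findall("data"):
    # Index the objects of this chunk by their id attribute.
    index = {obj.get("id"): obj for obj in chunk if obj.get("id")}
    for grouping in chunk.findall("grouping"):
        names = []
        for user_group in grouping.findall("userGroups"):
            # Each embedded UserGroup references its user by key.
            ref = user_group.find("user").get("ref")
            names.append(index[ref].findtext("name"))
        members[grouping.findtext("name")] = names

print(members)
```

In this sample every reference is local to its chunk, so a plain dictionary lookup is enough; a reference to an object from another chunk would raise ``KeyError`` here and would need a server query in a real reader.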
 doc/examples/icatdump-simple-1.xml  | 103 ++++++++++++++++++++++++++
 doc/examples/icatdump-simple-1.yaml |  71 ++++++++++++++++++
 doc/examples/icatdump-simple-2.xml  | 108 ++++++++++++++++++++++++++++
 doc/examples/icatdump-simple-2.yaml |  79 ++++++++++++++++++++
 4 files changed, 361 insertions(+)
 create mode 100644 doc/examples/icatdump-simple-1.xml
 create mode 100644 doc/examples/icatdump-simple-1.yaml
 create mode 100644 doc/examples/icatdump-simple-2.xml
 create mode 100644 doc/examples/icatdump-simple-2.yaml

diff --git a/doc/examples/icatdump-simple-1.xml b/doc/examples/icatdump-simple-1.xml
new file mode 100644
index 00000000..b2c23038
--- /dev/null
+++ b/doc/examples/icatdump-simple-1.xml
@@ -0,0 +1,103 @@
+<?xml version="1.0" encoding="utf-8"?>
+<icatdata>
+<head>
+  <date>2024-01-03T13:21:15+00:00</date>
+  <service>https://icat.example.com:8181/ICATService/ICAT?wsdl</service>
+  <apiversion>6.0.0</apiversion>
+  <generator>icatdump (python-icat 1.2.0)</generator>
+</head>
+<data>
+  <user id="User_name-db=2Fahau">
+    <affiliation>Goethe University Frankfurt, Faculty of Philosophy and History</affiliation>
+    <email>ahau@example.org</email>
+    <familyName>Hau</familyName>
+    <fullName>Arnold Hau</fullName>
+    <givenName>Arnold</givenName>
+    <name>db/ahau</name>
+    <orcidId>0000-0002-3263</orcidId>
+  </user>
+  <user id="User_name-db=2Fjbotu">
+    <affiliation>Universit&#233; Paul-Val&#233;ry Montpellier 3</affiliation>
+    <email>jbotu@example.org</email>
+    <familyName>Botul</familyName>
+    <fullName>Jean-Baptiste Botul</fullName>
+    <givenName>Jean-Baptiste</givenName>
+    <name>db/jbotu</name>
+    <orcidId>0000-0002-3264</orcidId>
+  </user>
+  <user id="User_name-db=2Fjdoe">
+    <email>jdoe@example.org</email>
+    <familyName>Doe</familyName>
+    <fullName>John Doe</fullName>
+    <givenName>John</givenName>
+    <name>db/jdoe</name>
+  </user>
+  <user id="User_name-db=2Fnbour">
+    <affiliation>University of Nancago</affiliation>
+    <email>nbour@example.org</email>
+    <familyName>Bourbaki</familyName>
+    <fullName>Nicolas Bourbaki</fullName>
+    <givenName>Nicolas</givenName>
+    <name>db/nbour</name>
+    <orcidId>0000-0002-3266</orcidId>
+  </user>
+  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fowner">
+    <name>investigation_10100601-ST_owner</name>
+    <userGroups>
+      <user ref="User_name-db=2Fahau"/>
+    </userGroups>
+  </grouping>
+  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Freader">
+    <name>investigation_10100601-ST_reader</name>
+    <userGroups>
+      <user ref="User_name-db=2Fjbotu"/>
+    </userGroups>
+    <userGroups>
+      <user ref="User_name-db=2Fjdoe"/>
+    </userGroups>
+    <userGroups>
+      <user ref="User_name-db=2Fnbour"/>
+    </userGroups>
+  </grouping>
+  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fwriter">
+    <name>investigation_10100601-ST_writer</name>
+    <userGroups>
+      <user ref="User_name-db=2Fahau"/>
+    </userGroups>
+  </grouping>
+</data>
+<data>
+  <investigation id="Investigation_facility-(name-ESNF)_name-10100601=2DST_visitId-1=2E1=2DN">
+    <doi>DOI:00.0815/inv-00601</doi>
+    <endDate>2010-10-12T15:00:00+00:00</endDate>
+    <fileCount>4</fileCount>
+    <fileSize>127125</fileSize>
+    <name>10100601-ST</name>
+    <startDate>2010-09-30T10:27:24+00:00</startDate>
+    <title>Ni-Mn-Ga flat cone</title>
+    <visitId>1.1-N</visitId>
+    <facility ref="Facility_name-ESNF"/>
+    <investigationGroups>
+      <role>owner</role>
+      <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Fowner"/>
+    </investigationGroups>
+    <investigationGroups>
+      <role>reader</role>
+      <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Freader"/>
+    </investigationGroups>
+    <investigationGroups>
+      <role>writer</role>
+      <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Fwriter"/>
+    </investigationGroups>
+  </investigation>
+</data>
+</icatdata>
diff --git a/doc/examples/icatdump-simple-1.yaml b/doc/examples/icatdump-simple-1.yaml
new file mode 100644
index 00000000..26648f3b
--- /dev/null
+++ b/doc/examples/icatdump-simple-1.yaml
@@ -0,0 +1,71 @@
+%YAML 1.1
+# Date: Wed, 03 Jan 2024 13:24:51 +0000
+# Service: https://icat.example.com:8181/ICATService/ICAT?wsdl
+# ICAT-API: 6.0.0
+# Generator: icatdump (python-icat 1.2.0)
+---
+grouping:
+  Grouping_name-investigation=5F10100601=2DST=5Fowner:
+    name: investigation_10100601-ST_owner
+    userGroups:
+    - user: User_name-db=2Fahau
+  Grouping_name-investigation=5F10100601=2DST=5Freader:
+    name: investigation_10100601-ST_reader
+    userGroups:
+    - user: User_name-db=2Fjbotu
+    - user: User_name-db=2Fjdoe
+    - user: User_name-db=2Fnbour
+  Grouping_name-investigation=5F10100601=2DST=5Fwriter:
+    name: investigation_10100601-ST_writer
+    userGroups:
+    - user: User_name-db=2Fahau
+user:
+  User_name-db=2Fahau:
+    affiliation: Goethe University Frankfurt, Faculty of Philosophy and History
+    email: ahau@example.org
+    familyName: Hau
+    fullName: Arnold Hau
+    givenName: Arnold
+    name: db/ahau
+    orcidId: 0000-0002-3263
+  User_name-db=2Fjbotu:
+    affiliation: "Universit\xE9 Paul-Val\xE9ry Montpellier 3"
+    email: jbotu@example.org
+    familyName: Botul
+    fullName: Jean-Baptiste Botul
+    givenName: Jean-Baptiste
+    name: db/jbotu
+    orcidId: 0000-0002-3264
+  User_name-db=2Fjdoe:
+    email: jdoe@example.org
+    familyName: Doe
+    fullName: John Doe
+    givenName: John
+    name: db/jdoe
+  User_name-db=2Fnbour:
+    affiliation: University of Nancago
+    email: nbour@example.org
+    familyName: Bourbaki
+    fullName: Nicolas Bourbaki
+    givenName: Nicolas
+    name: db/nbour
+    orcidId: 0000-0002-3266
+---
+investigation:
+  Investigation_facility-(name-ESNF)_name-10100601=2DST_visitId-1=2E1=2DN:
+    doi: DOI:00.0815/inv-00601
+    endDate: '2010-10-12T15:00:00+00:00'
+    facility: Facility_name-ESNF
+    fileCount: 4
+    fileSize: 127125
+    investigationGroups:
+    - grouping: Grouping_name-investigation=5F10100601=2DST=5Fowner
+      role: owner
+    - grouping: Grouping_name-investigation=5F10100601=2DST=5Freader
+      role: reader
+    - grouping: Grouping_name-investigation=5F10100601=2DST=5Fwriter
+      role: writer
+    name: 10100601-ST
+    startDate: '2010-09-30T10:27:24+00:00'
+    title: Ni-Mn-Ga flat cone
+    visitId: 1.1-N
diff --git a/doc/examples/icatdump-simple-2.xml b/doc/examples/icatdump-simple-2.xml
new file mode 100644
index 00000000..1c309602
--- /dev/null
+++ b/doc/examples/icatdump-simple-2.xml
@@ -0,0 +1,108 @@
+<?xml version="1.0" encoding="utf-8"?>
+<icatdata>
+<head>
+  <date>2024-01-03T13:27:37+00:00</date>
+  <service>https://icat.example.com:8181/ICATService/ICAT?wsdl</service>
+  <apiversion>6.0.0</apiversion>
+  <generator>icatdump (python-icat 1.2.0)</generator>
+</head>
+<data>
+  <user id="User_name-db=2Fahau">
+    <affiliation>Goethe University Frankfurt, Faculty of Philosophy and History</affiliation>
+    <email>ahau@example.org</email>
+    <familyName>Hau</familyName>
+    <fullName>Arnold Hau</fullName>
+    <givenName>Arnold</givenName>
+    <name>db/ahau</name>
+    <orcidId>0000-0002-3263</orcidId>
+  </user>
+  <user id="User_name-db=2Fahau">
+    <affiliation>Goethe University Frankfurt, Faculty of Philosophy and History</affiliation>
+    <email>ahau@example.org</email>
+    <familyName>Hau</familyName>
+    <fullName>Arnold Hau</fullName>
+    <givenName>Arnold</givenName>
+    <name>db/ahau</name>
+    <orcidId>0000-0002-3263</orcidId>
+  </user>
+  <user id="User_name-db=2Fjbotu">
+    <affiliation>Universit&#233; Paul-Val&#233;ry Montpellier 3</affiliation>
+    <email>jbotu@example.org</email>
+    <familyName>Botul</familyName>
+    <fullName>Jean-Baptiste Botul</fullName>
+    <givenName>Jean-Baptiste</givenName>
+    <name>db/jbotu</name>
+    <orcidId>0000-0002-3264</orcidId>
+  </user>
+  <user id="User_name-db=2Fjdoe">
+    <email>jdoe@example.org</email>
+    <familyName>Doe</familyName>
+    <fullName>John Doe</fullName>
+    <givenName>John</givenName>
+    <name>db/jdoe</name>
+  </user>
+  <user id="User_name-db=2Fnbour">
+    <affiliation>University of Nancago</affiliation>
+    <email>nbour@example.org</email>
+    <familyName>Bourbaki</familyName>
+    <fullName>Nicolas Bourbaki</fullName>
+    <givenName>Nicolas</givenName>
+    <name>db/nbour</name>
+    <orcidId>0000-0002-3266</orcidId>
+  </user>
+  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fowner">
+    <name>investigation_10100601-ST_owner</name>
+  </grouping>
+  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Freader">
+    <name>investigation_10100601-ST_reader</name>
+  </grouping>
+  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fwriter">
+    <name>investigation_10100601-ST_writer</name>
+  </grouping>
+  <userGroup id="UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fowner)">
+    <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Fowner"/>
+    <user ref="User_name-db=2Fahau"/>
+  </userGroup>
+  <userGroup id="UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fwriter)">
+    <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Fwriter"/>
+    <user ref="User_name-db=2Fahau"/>
+  </userGroup>
+  <userGroup id="UserGroup_user-(name-db=2Fjbotu)_grouping-(name-investigation=5F10100601=2DST=5Freader)">
+    <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Freader"/>
+    <user ref="User_name-db=2Fjbotu"/>
+  </userGroup>
+  <userGroup id="UserGroup_user-(name-db=2Fjdoe)_grouping-(name-investigation=5F10100601=2DST=5Freader)">
+    <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Freader"/>
+    <user ref="User_name-db=2Fjdoe"/>
+  </userGroup>
+  <userGroup id="UserGroup_user-(name-db=2Fnbour)_grouping-(name-investigation=5F10100601=2DST=5Freader)">
+    <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Freader"/>
+    <user ref="User_name-db=2Fnbour"/>
+  </userGroup>
+</data>
+<data>
+  <investigation id="Investigation_facility-(name-ESNF)_name-10100601=2DST_visitId-1=2E1=2DN">
+    <doi>DOI:00.0815/inv-00601</doi>
+    <endDate>2010-10-12T15:00:00+00:00</endDate>
+    <fileCount>4</fileCount>
+    <fileSize>127125</fileSize>
+    <name>10100601-ST</name>
+    <startDate>2010-09-30T10:27:24+00:00</startDate>
+    <title>Ni-Mn-Ga flat cone</title>
+    <visitId>1.1-N</visitId>
+    <facility ref="Facility_name-ESNF"/>
+    <investigationGroups>
+      <role>owner</role>
+      <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Fowner"/>
+    </investigationGroups>
+    <investigationGroups>
+      <role>reader</role>
+      <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Freader"/>
+    </investigationGroups>
+    <investigationGroups>
+      <role>writer</role>
+      <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Fwriter"/>
+    </investigationGroups>
+  </investigation>
+</data>
+</icatdata>
diff --git a/doc/examples/icatdump-simple-2.yaml b/doc/examples/icatdump-simple-2.yaml
new file mode 100644
index 00000000..79e4a296
--- /dev/null
+++ b/doc/examples/icatdump-simple-2.yaml
@@ -0,0 +1,79 @@
+%YAML 1.1
+# Date: Wed, 03 Jan 2024 13:27:52 +0000
+# Service: https://icat.example.com:8181/ICATService/ICAT?wsdl
+# ICAT-API: 6.0.0
+# Generator: icatdump (python-icat 1.2.0)
+---
+grouping:
+  Grouping_name-investigation=5F10100601=2DST=5Fowner:
+    name: investigation_10100601-ST_owner
+  Grouping_name-investigation=5F10100601=2DST=5Freader:
+    name: investigation_10100601-ST_reader
+  Grouping_name-investigation=5F10100601=2DST=5Fwriter:
+    name: investigation_10100601-ST_writer
+user:
+  User_name-db=2Fahau:
+    affiliation: Goethe University Frankfurt, Faculty of Philosophy and History
+    email: ahau@example.org
+    familyName: Hau
+    fullName: Arnold Hau
+    givenName: Arnold
+    name: db/ahau
+    orcidId: 0000-0002-3263
+  User_name-db=2Fjbotu:
+    affiliation: "Universit\xE9 Paul-Val\xE9ry Montpellier 3"
+    email: jbotu@example.org
+    familyName: Botul
+    fullName: Jean-Baptiste Botul
+    givenName: Jean-Baptiste
+    name: db/jbotu
+    orcidId: 0000-0002-3264
+  User_name-db=2Fjdoe:
+    email: jdoe@example.org
+    familyName: Doe
+    fullName: John Doe
+    givenName: John
+    name: db/jdoe
+  User_name-db=2Fnbour:
+    affiliation: University of Nancago
+    email: nbour@example.org
+    familyName: Bourbaki
+    fullName: Nicolas Bourbaki
+    givenName: Nicolas
+    name: db/nbour
+    orcidId: 0000-0002-3266
+userGroup:
+  UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fowner):
+    grouping: Grouping_name-investigation=5F10100601=2DST=5Fowner
+    user: User_name-db=2Fahau
+  UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fwriter):
+    grouping: Grouping_name-investigation=5F10100601=2DST=5Fwriter
+    user: User_name-db=2Fahau
+  UserGroup_user-(name-db=2Fjbotu)_grouping-(name-investigation=5F10100601=2DST=5Freader):
+    grouping: Grouping_name-investigation=5F10100601=2DST=5Freader
+    user: User_name-db=2Fjbotu
+  UserGroup_user-(name-db=2Fjdoe)_grouping-(name-investigation=5F10100601=2DST=5Freader):
+    grouping: Grouping_name-investigation=5F10100601=2DST=5Freader
+    user: User_name-db=2Fjdoe
+  UserGroup_user-(name-db=2Fnbour)_grouping-(name-investigation=5F10100601=2DST=5Freader):
+    grouping: Grouping_name-investigation=5F10100601=2DST=5Freader
+    user: User_name-db=2Fnbour
+---
+investigation:
+  Investigation_facility-(name-ESNF)_name-10100601=2DST_visitId-1=2E1=2DN:
+    doi: DOI:00.0815/inv-00601
+    endDate: '2010-10-12T15:00:00+00:00'
+    facility: Facility_name-ESNF
+    fileCount: 4
+    fileSize: 127125
+    investigationGroups:
+    - grouping: Grouping_name-investigation=5F10100601=2DST=5Fowner
+      role: owner
+    - grouping: Grouping_name-investigation=5F10100601=2DST=5Freader
+      role: reader
+    - grouping: Grouping_name-investigation=5F10100601=2DST=5Fwriter
+      role: writer
+    name: 10100601-ST
+    startDate: '2010-09-30T10:27:24+00:00'
+    title: Ni-Mn-Ga flat cone
+    visitId: 1.1-N

From af0ee8dc1fd9b401b73ca6644863da8c1fb2315d Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Wed, 3 Jan 2024 17:15:03 +0100
Subject: [PATCH 18/43] Add subsections on ICAT data XML files and on ICAT data
 YAML files including the example data files, but no other content yet

---
 doc/src/file-icatdata.rst | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
index 06d8f70c..a0126383 100644
--- a/doc/src/file-icatdata.rst
+++ b/doc/src/file-icatdata.rst
@@ -66,6 +66,30 @@ indirectly related to one of the included objects.  In this case,
 only a reference to the related object will be included in the data
 file.  The related object must have its own entry.
 
+ICAT data XML files
+~~~~~~~~~~~~~~~~~~~
+
+In this section we describe the ICAT data file format using the XML
+backend.
+
+.. literalinclude:: ../examples/icatdump-simple-1.xml
+   :language: xml
+
+.. literalinclude:: ../examples/icatdump-simple-2.xml
+   :language: xml
+
+ICAT data YAML files
+~~~~~~~~~~~~~~~~~~~~
+
+In this section we describe the ICAT data file format using the YAML
+backend.
+
+.. literalinclude:: ../examples/icatdump-simple-1.yaml
+   :language: yaml
+
+.. literalinclude:: ../examples/icatdump-simple-2.yaml
+   :language: yaml
+
 
 .. [#dc] There is one exception: DataCollections don't have a
          uniqueness constraint and can't reliably be searched by

From 4ad8e9e9dd3b48907dab282c46ecf037f55cb2d6 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Wed, 3 Jan 2024 19:09:50 +0100
Subject: [PATCH 19/43] Add the text content for the subsection on ICAT data
 XML files

---
 doc/src/file-icatdata.rst | 73 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 72 insertions(+), 1 deletion(-)
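Note on the structure described in this patch: the layout of head,
data chunks, id attributes and ref attributes can be inspected with
the Python standard library alone.  The sketch below is an
illustration under assumptions, not how icatingest works: the function
name summarize_chunks and the shortened SAMPLE document are made up
for this note, and unlike the real ingestion the sketch keeps all ids
in memory across chunks.  It lists the object elements per chunk and
reports every ref value that does not match an id defined earlier in
the file, so that object would have to exist in ICAT already.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical excerpt modelled on the example file.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<icatdata>
<head>
  <generator>icatdump (python-icat 1.2.0)</generator>
</head>
<data>
  <user id="User_name-db=2Fahau">
    <name>db/ahau</name>
  </user>
  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fowner">
    <name>investigation_10100601-ST_owner</name>
    <userGroups>
      <user ref="User_name-db=2Fahau"/>
    </userGroups>
  </grouping>
</data>
<data>
  <investigation id="Investigation_facility-(name-ESNF)_name-10100601=2DST_visitId-1=2E1=2DN">
    <name>10100601-ST</name>
    <facility ref="Facility_name-ESNF"/>
  </investigation>
</data>
</icatdata>
"""


def summarize_chunks(xml_bytes: bytes) -> list:
    """Return one dict per data element: object tags and external refs."""
    root = ET.fromstring(xml_bytes)
    seen_ids = set()  # ids defined so far, anywhere in the file
    chunks = []
    for data in root.findall("data"):
        objects = []
        external_refs = []
        for obj in data:
            # Walk the object and its nested one-to-many subelements.
            for element in obj.iter():
                ref = element.get("ref")
                if ref is not None and ref not in seen_ids:
                    external_refs.append(ref)
            objects.append(obj.tag)
            if obj.get("id") is not None:
                seen_ids.add(obj.get("id"))
        chunks.append({"objects": objects, "external_refs": external_refs})
    return chunks


if __name__ == "__main__":
    for number, chunk in enumerate(summarize_chunks(SAMPLE.encode("utf-8")), 1):
        print(number, chunk["objects"], chunk["external_refs"])
```

In the shortened sample, only the Facility reference is reported as
external, which matches the fact that no facility element is defined
in the example file.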

diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
index a0126383..b856e625 100644
--- a/doc/src/file-icatdata.rst
+++ b/doc/src/file-icatdata.rst
@@ -70,13 +70,83 @@ ICAT data XML files
 ~~~~~~~~~~~~~~~~~~~
 
 In this section we describe the ICAT data file format using the XML
-backend.
+backend.  Consider the following example:
 
 .. literalinclude:: ../examples/icatdump-simple-1.xml
    :language: xml
 
+The root element of ICAT data XML files is ``icatdata``.  It may
+optionally have one ``head`` subelement and one or more ``data``
+subelements.
+
+The ``head`` element will be ignored by :ref:`icatingest`.  It serves
+to provide some information on the context of the creation of the data
+file, which may be useful for debugging in case of issues.
+
+The content of each ``data`` element is one chunk according to the
+logical structure explained above.  The present example contains two
+chunks.  Each element within the ``data`` element corresponds to an
+ICAT object according to the ICAT schema.  In the present example, the
+first chunk contains five User objects and three Grouping objects.
+The second chunk only contains one Investigation.
+
+These object elements should have an ``id`` attribute that may be used
+to reference the object in relations later on.  The ``id`` value has
+no meaning other than this file internal referencing between objects.
+The subelements of the object elements correspond to the object's
+attributes and relations in the ICAT schema.  All many-to-one
+relations must be provided and reference already existing objects,
+i.e. they must either already have existed before starting the
+ingestion or appear earlier in the ICAT data file than the referencing
+object, so that they will be created earlier.  The related object may
+either be referenced by id using the special attribute ``ref`` or by
+the related object's attribute values, using XML attributes of the
+same name.  In the latter case, the attribute values must uniquely
+define the related object.
+
+The object elements may include one-to-many relations.  In this case,
+the related objects will be created along with the parent in one
+single cascading call.  Alternatively, these related objects may be
+added separately as subelements of the ``data`` element later in the
+file.  In the present example, the Grouping object include their
+related UserGroup objects.  Note that these UserGroups include their
+relation to the User.  The User object is referenced by their
+respective id in the ``ref`` attribute.  But the UserGroups do not
+include their relation with Grouping.  That relationship is implied by
+the parent relation of the object in the file.
+
+In a similar way, the Investigation in the second chunk includes
+related InvestigationGroups that will be created along with the
+Investigation.  The InvestigationGroup objects include a reference to
+the corresponding Grouping.  Note that these references go across
+chunk boundaries.  The index that caches the object ids to resolve
+object relations from the first chunk that did contain the ids of the
+Groupings will already have been discarded from memeory when the
+second chunk is read.  But the references use the key that can be
+passed to :meth:`icat.client.Client.searchUniqueKey` to search these
+Groupings from ICAT.
+
+Finally note the the file format also depends on the ICAT schema
+version: the present example can only be ingested into ICAT server 5.0
+or newer, because the attributes fileCount and fileSize have been
+added to Investigation in this version.  With older ICAT versions, it
+will fail because the attributes are not defined.
+
+Consider a second example, it defines a subset of the same content
+as the previous example:
+
 .. literalinclude:: ../examples/icatdump-simple-2.xml
    :language: xml
+   :lines: 1-9,28-52,56-58,70-82,108
+
+The difference is that we now add the Usergroup objects separately in
+direct subelements of ``data`` instead of including them in the
+related Grouping objects.
+
+You will find more extensive examples in the source distribution of
+python-icat.  The distribution also provides XML Schema Definition
+files for the ICAT data XML file format corresponding to various ICAT
+schema versions.
 
 ICAT data YAML files
 ~~~~~~~~~~~~~~~~~~~~
@@ -89,6 +159,7 @@ backend.
 
 .. literalinclude:: ../examples/icatdump-simple-2.yaml
    :language: yaml
+   :lines: 1-7,10-11,14,23-45,52-60
 
 
 .. [#dc] There is one exception: DataCollections don't have a

From ea4b9d60f6035f4a9604dc2579de23232a9b2e73 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Sat, 6 Jan 2024 18:59:53 +0100
Subject: [PATCH 20/43] Typo

---
 doc/src/scripts.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/scripts.rst b/doc/src/scripts.rst
index 82f57d75..f944efde 100644
--- a/doc/src/scripts.rst
+++ b/doc/src/scripts.rst
@@ -2,7 +2,7 @@ Command line scripts
 ====================
 
 This section provides a reference for the command line scripts that
-are alongside with python-icat.
+are installed alongside with python-icat.
 
 .. toctree::
    :maxdepth: 1

From e62dc5ae52a8f4a835e96fad9a4b1976181b19e1 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Tue, 16 Jan 2024 12:36:27 +0100
Subject: [PATCH 21/43] - Review documentation Section "ICAT data XML files",
 adding more   inline examples - Drop icatdump-simple-2.xml example, rename
 icatdump-simple-1.xml to   icatdump-simple.xml

---
 doc/examples/icatdump-simple-2.xml            | 108 ------------------
 ...tdump-simple-1.xml => icatdump-simple.xml} |   0
 doc/src/file-icatdata.rst                     | 108 +++++++++++++-----
 3 files changed, 80 insertions(+), 136 deletions(-)
 delete mode 100644 doc/examples/icatdump-simple-2.xml
 rename doc/examples/{icatdump-simple-1.xml => icatdump-simple.xml} (100%)

diff --git a/doc/examples/icatdump-simple-2.xml b/doc/examples/icatdump-simple-2.xml
deleted file mode 100644
index 1c309602..00000000
--- a/doc/examples/icatdump-simple-2.xml
+++ /dev/null
@@ -1,108 +0,0 @@
-<?xml version="1.0" encoding="utf-8"?>
-<icatdata>
-<head>
-  <date>2024-01-03T13:27:37+00:00</date>
-  <service>https://icat.example.com:8181/ICATService/ICAT?wsdl</service>
-  <apiversion>6.0.0</apiversion>
-  <generator>icatdump (python-icat 1.2.0)</generator>
-</head>
-<data>
-  <user id="User_name-db=2Fahau">
-    <affiliation>Goethe University Frankfurt, Faculty of Philosophy and History</affiliation>
-    <email>ahau@example.org</email>
-    <familyName>Hau</familyName>
-    <fullName>Arnold Hau</fullName>
-    <givenName>Arnold</givenName>
-    <name>db/ahau</name>
-    <orcidId>0000-0002-3263</orcidId>
-  </user>
-  <user id="User_name-db=2Fahau">
-    <affiliation>Goethe University Frankfurt, Faculty of Philosophy and History</affiliation>
-    <email>ahau@example.org</email>
-    <familyName>Hau</familyName>
-    <fullName>Arnold Hau</fullName>
-    <givenName>Arnold</givenName>
-    <name>db/ahau</name>
-    <orcidId>0000-0002-3263</orcidId>
-  </user>
-  <user id="User_name-db=2Fjbotu">
-    <affiliation>Universit&#233; Paul-Val&#233;ry Montpellier 3</affiliation>
-    <email>jbotu@example.org</email>
-    <familyName>Botul</familyName>
-    <fullName>Jean-Baptiste Botul</fullName>
-    <givenName>Jean-Baptiste</givenName>
-    <name>db/jbotu</name>
-    <orcidId>0000-0002-3264</orcidId>
-  </user>
-  <user id="User_name-db=2Fjdoe">
-    <email>jdoe@example.org</email>
-    <familyName>Doe</familyName>
-    <fullName>John Doe</fullName>
-    <givenName>John</givenName>
-    <name>db/jdoe</name>
-  </user>
-  <user id="User_name-db=2Fnbour">
-    <affiliation>University of Nancago</affiliation>
-    <email>nbour@example.org</email>
-    <familyName>Bourbaki</familyName>
-    <fullName>Nicolas Bourbaki</fullName>
-    <givenName>Nicolas</givenName>
-    <name>db/nbour</name>
-    <orcidId>0000-0002-3266</orcidId>
-  </user>
-  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fowner">
-    <name>investigation_10100601-ST_owner</name>
-  </grouping>
-  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Freader">
-    <name>investigation_10100601-ST_reader</name>
-  </grouping>
-  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fwriter">
-    <name>investigation_10100601-ST_writer</name>
-  </grouping>
-  <userGroup id="UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fowner)">
-    <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Fowner"/>
-    <user ref="User_name-db=2Fahau"/>
-  </userGroup>
-  <userGroup id="UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fwriter)">
-    <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Fwriter"/>
-    <user ref="User_name-db=2Fahau"/>
-  </userGroup>
-  <userGroup id="UserGroup_user-(name-db=2Fjbotu)_grouping-(name-investigation=5F10100601=2DST=5Freader)">
-    <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Freader"/>
-    <user ref="User_name-db=2Fjbotu"/>
-  </userGroup>
-  <userGroup id="UserGroup_user-(name-db=2Fjdoe)_grouping-(name-investigation=5F10100601=2DST=5Freader)">
-    <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Freader"/>
-    <user ref="User_name-db=2Fjdoe"/>
-  </userGroup>
-  <userGroup id="UserGroup_user-(name-db=2Fnbour)_grouping-(name-investigation=5F10100601=2DST=5Freader)">
-    <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Freader"/>
-    <user ref="User_name-db=2Fnbour"/>
-  </userGroup>
-</data>
-<data>
-  <investigation id="Investigation_facility-(name-ESNF)_name-10100601=2DST_visitId-1=2E1=2DN">
-    <doi>DOI:00.0815/inv-00601</doi>
-    <endDate>2010-10-12T15:00:00+00:00</endDate>
-    <fileCount>4</fileCount>
-    <fileSize>127125</fileSize>
-    <name>10100601-ST</name>
-    <startDate>2010-09-30T10:27:24+00:00</startDate>
-    <title>Ni-Mn-Ga flat cone</title>
-    <visitId>1.1-N</visitId>
-    <facility ref="Facility_name-ESNF"/>
-    <investigationGroups>
-      <role>owner</role>
-      <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Fowner"/>
-    </investigationGroups>
-    <investigationGroups>
-      <role>reader</role>
-      <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Freader"/>
-    </investigationGroups>
-    <investigationGroups>
-      <role>writer</role>
-      <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Fwriter"/>
-    </investigationGroups>
-  </investigation>
-</data>
-</icatdata>
diff --git a/doc/examples/icatdump-simple-1.xml b/doc/examples/icatdump-simple.xml
similarity index 100%
rename from doc/examples/icatdump-simple-1.xml
rename to doc/examples/icatdump-simple.xml
diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
index b856e625..fa82f96d 100644
--- a/doc/src/file-icatdata.rst
+++ b/doc/src/file-icatdata.rst
@@ -72,7 +72,7 @@ ICAT data XML files
 In this section we describe the ICAT data file format using the XML
 backend.  Consider the following example:
 
-.. literalinclude:: ../examples/icatdump-simple-1.xml
+.. literalinclude:: ../examples/icatdump-simple.xml
    :language: xml
 
 The root element of ICAT data XML files is ``icatdata``.  It may
@@ -88,7 +88,8 @@ logical structure explained above.  The present example contains two
 chunks.  Each element within the ``data`` element corresponds to an
 ICAT object according to the ICAT schema.  In the present example, the
 first chunk contains five User objects and three Grouping objects.
-The second chunk only contains one Investigation.
+The Groupings include related UserGroups.  The second chunk only
+contains one Investigation, including related investigationGroups.
 
 These object elements should have an ``id`` attribute that may be used
 to reference the object in relations later on.  The ``id`` value has
@@ -104,27 +105,87 @@ the related object's attribute values, using XML attributes of the
 same name.  In the latter case, the attribute values must uniquely
 define the related object.
 
+In the present example, consider the first grouping:
+
+.. code-block:: XML
+
+  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fowner">
+    <name>investigation_10100601-ST_owner</name>
+    <userGroups>
+      <user ref="User_name-db=2Fahau"/>
+    </userGroups>
+  </grouping>
+
+It includes a related UserGroup object that in turn references a
+related User.  This User is referenced in the ``ref`` attribute using
+a key defined in the User's ``id`` attribute earlier in the file.
+Another example is how the Investigation references its Facility:
+
+.. code-block:: XML
+
+  <investigation>
+    <!--  ... -->
+    <facility ref="Facility_name-ESNF"/>
+    <!--  ... -->
+  </investigation>
+
+The Facility is not defined in the data file.  It is assumed to exist
+in ICAT before ingesting the file.  In this case, it must be
+referenced by the unique key that could have been obtained by calling
+``facility.getUniqueKey()``.  Alternatively, the Facility could have
+been referenced by attribute as in:
+
+.. code-block:: XML
+
+  <investigation>
+    <!--  ... -->
+    <facility name="ESNF"/>
+    <!--  ... -->
+  </investigation>
+
+
 The object elements may include one-to-many relations.  In this case,
 the related objects will be created along with the parent in one
-single cascading call.  Alternatively, these related objects may be
-added separately as subelements of the ``data`` element later in the
-file.  In the present example, the Grouping object include their
-related UserGroup objects.  Note that these UserGroups include their
-relation to the User.  The User object is referenced by their
-respective id in the ``ref`` attribute.  But the UserGroups do not
-include their relation with Grouping.  That relationship is implied by
-the parent relation of the object in the file.
-
-In a similar way, the Investigation in the second chunk includes
+single cascading call.  In the present example, the Grouping objects
+include their related UserGroup objects.  Note that these UserGroups
+include their relation to the User, but not their relation with
+Grouping.  The latter relationship is implied by the parent relation
+of the object in the file.
+
+As an alternative, the UserGroups could have been added to the file as
+separate objects as direct subelements of ``data`` as in:
+
+.. code-block:: XML
+
+  <data>
+    <user id="User_name-db=2Fahau">
+      <affiliation>Goethe University Frankfurt, Faculty of Philosophy and History</affiliation>
+      <email>ahau@example.org</email>
+      <familyName>Hau</familyName>
+      <fullName>Arnold Hau</fullName>
+      <givenName>Arnold</givenName>
+      <name>db/ahau</name>
+      <orcidId>0000-0002-3263</orcidId>
+    </user>
+    <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fowner">
+      <name>investigation_10100601-ST_owner</name>
+    </grouping>
+    <userGroup id="UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fowner)">
+      <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Fowner"/>
+      <user ref="User_name-db=2Fahau"/>
+    </userGroup>
+  </data>
+
+The Investigation in the second chunk in the present example includes
 related InvestigationGroups that will be created along with the
 Investigation.  The InvestigationGroup objects include a reference to
 the corresponding Grouping.  Note that these references go across
 chunk boundaries.  The index that caches the object ids to resolve
 object relations from the first chunk that did contain the ids of the
-Groupings will already have been discarded from memeory when the
-second chunk is read.  But the references use the key that can be
-passed to :meth:`icat.client.Client.searchUniqueKey` to search these
-Groupings from ICAT.
+Groupings will already have been discarded from memory when the second
+chunk is read.  But the references use the key that can be passed to
+:meth:`icat.client.Client.searchUniqueKey` to search these Groupings
+from ICAT.
 
 Finally note the the file format also depends on the ICAT schema
 version: the present example can only be ingested into ICAT server 5.0
@@ -132,21 +193,12 @@ or newer, because the attributes fileCount and fileSize have been
 added to Investigation in this version.  With older ICAT versions, it
 will fail because the attributes are not defined.
 
-Consider a second example, it defines a subset of the same content
-as the previous example:
-
-.. literalinclude:: ../examples/icatdump-simple-2.xml
-   :language: xml
-   :lines: 1-9,28-52,56-58,70-82,108
-
-The difference is that we now add the Usergroup objects separately in
-direct subelements of ``data`` instead of including them in the
-related Grouping objects.
-
 You will find more extensive examples in the source distribution of
 python-icat.  The distribution also provides XML Schema Definition
 files for the ICAT data XML file format corresponding to various ICAT
-schema versions.
+schema versions.  Note that these XML Schema Definition
+files are provided for reference only.  The :ref:`icatingest` script
+does not validate its input.
 
 ICAT data YAML files
 ~~~~~~~~~~~~~~~~~~~~

From a472fc795e82bc9901d89f869529b063b494242b Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Tue, 16 Jan 2024 14:24:44 +0100
Subject: [PATCH 22/43] Fix duplicate user entry in example ICAT data file

---
 doc/examples/icatdump-simple.xml | 9 ---------
 doc/src/file-icatdata.rst        | 2 +-
 2 files changed, 1 insertion(+), 10 deletions(-)
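
Note (commentary only, not part of the commit): a duplicate like the one
removed here could be caught by a small check along the following lines.
This is a minimal sketch using only the Python standard library.  The
helper name find_duplicate_ids and the sample root element are
assumptions made for illustration; neither is part of python-icat.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def find_duplicate_ids(xml_text):
    """Return the sorted id values that occur more than once among the
    direct subelements of any ``data`` element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for data in root.iter("data"):
        for obj in data:
            key = obj.get("id")
            if key is not None:
                counts[key] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# Reduced sample mirroring the duplicated User entry fixed by this patch.
SAMPLE = """<icatdata>
  <data>
    <user id="User_name-db=2Fahau"><name>db/ahau</name></user>
    <user id="User_name-db=2Fahau"><name>db/ahau</name></user>
    <user id="User_name-db=2Fjdoe"><name>db/jdoe</name></user>
  </data>
</icatdata>"""

print(find_duplicate_ids(SAMPLE))
```

The sketch counts ids over the whole file rather than per chunk, which
is the stricter of the two possible readings.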

diff --git a/doc/examples/icatdump-simple.xml b/doc/examples/icatdump-simple.xml
index b2c23038..63dc689d 100644
--- a/doc/examples/icatdump-simple.xml
+++ b/doc/examples/icatdump-simple.xml
@@ -7,15 +7,6 @@
   <generator>icatdump (python-icat 1.2.0)</generator>
 </head>
 <data>
-  <user id="User_name-db=2Fahau">
-    <affiliation>Goethe University Frankfurt, Faculty of Philosophy and History</affiliation>
-    <email>ahau@example.org</email>
-    <familyName>Hau</familyName>
-    <fullName>Arnold Hau</fullName>
-    <givenName>Arnold</givenName>
-    <name>db/ahau</name>
-    <orcidId>0000-0002-3263</orcidId>
-  </user>
   <user id="User_name-db=2Fahau">
     <affiliation>Goethe University Frankfurt, Faculty of Philosophy and History</affiliation>
     <email>ahau@example.org</email>
diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
index fa82f96d..84d587b2 100644
--- a/doc/src/file-icatdata.rst
+++ b/doc/src/file-icatdata.rst
@@ -87,7 +87,7 @@ The content of each ``data`` element is one chunk according to the
 logical structure explained above.  The present example contains two
 chunks.  Each element within the ``data`` element corresponds to an
 ICAT object according to the ICAT schema.  In the present example, the
-first chunk contains five User objects and three Grouping objects.
+first chunk contains four User objects and three Grouping objects.
 The Groupings include related UserGroups.  The second chunk only
 contains one Investigation, including related investigationGroups.
 

From b3e30520d4dfce0bc92d028cd4f2afea31e5217e Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Tue, 16 Jan 2024 14:55:33 +0100
Subject: [PATCH 23/43] - Review documentation Section "ICAT data YAML files" -
 Drop icatdump-simple-2.yaml example, rename icatdump-simple-1.yaml   to
 icatdump-simple.yaml

---
 doc/examples/icatdump-simple-2.yaml           | 79 -------------------
 ...ump-simple-1.yaml => icatdump-simple.yaml} |  0
 doc/src/file-icatdata.rst                     | 73 +++++++++++++++--
 3 files changed, 67 insertions(+), 85 deletions(-)
 delete mode 100644 doc/examples/icatdump-simple-2.yaml
 rename doc/examples/{icatdump-simple-1.yaml => icatdump-simple.yaml} (100%)
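
Note (commentary only, not part of the commit): the new text states that
the entries of the YAML mappings have no inherent order and that
icatingest reads the entity types in a predefined order.  The idea can
be sketched as follows.  The list ENTITY_ORDER and the function
iter_chunk are hypothetical and only cover the types of the example;
the actual order used by python-icat is not shown in this patch.

```python
# Hypothetical processing order: referenced types precede referencing ones.
ENTITY_ORDER = ["user", "grouping", "userGroup", "investigation"]


def iter_chunk(chunk):
    """Yield (entity type, local key, definition) following ENTITY_ORDER,
    regardless of the order in which the keys appear in the mapping."""
    for etype in ENTITY_ORDER:
        for key, definition in chunk.get(etype, {}).items():
            yield etype, key, definition


# One chunk as it might look after loading a YAML document; the
# userGroup entry deliberately comes first in the mapping.
chunk = {
    "userGroup": {
        "UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fowner)": {
            "grouping": "Grouping_name-investigation=5F10100601=2DST=5Fowner",
            "user": "User_name-db=2Fahau",
        },
    },
    "grouping": {
        "Grouping_name-investigation=5F10100601=2DST=5Fowner": {
            "name": "investigation_10100601-ST_owner",
        },
    },
    "user": {
        "User_name-db=2Fahau": {"name": "db/ahau"},
    },
}

order = [etype for etype, _, _ in iter_chunk(chunk)]
print(order)
```

With this ordering the User and the Grouping are visited before the
UserGroup that references them, whatever the order in the file.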

diff --git a/doc/examples/icatdump-simple-2.yaml b/doc/examples/icatdump-simple-2.yaml
deleted file mode 100644
index 79e4a296..00000000
--- a/doc/examples/icatdump-simple-2.yaml
+++ /dev/null
@@ -1,79 +0,0 @@
-%YAML 1.1
-# Date: Wed, 03 Jan 2024 13:27:52 +0000
-# Service: https://icat.example.com:8181/ICATService/ICAT?wsdl
-# ICAT-API: 6.0.0
-# Generator: icatdump (python-icat 1.2.0)
----
-grouping:
-  Grouping_name-investigation=5F10100601=2DST=5Fowner:
-    name: investigation_10100601-ST_owner
-  Grouping_name-investigation=5F10100601=2DST=5Freader:
-    name: investigation_10100601-ST_reader
-  Grouping_name-investigation=5F10100601=2DST=5Fwriter:
-    name: investigation_10100601-ST_writer
-user:
-  User_name-db=2Fahau:
-    affiliation: Goethe University Frankfurt, Faculty of Philosophy and History
-    email: ahau@example.org
-    familyName: Hau
-    fullName: Arnold Hau
-    givenName: Arnold
-    name: db/ahau
-    orcidId: 0000-0002-3263
-  User_name-db=2Fjbotu:
-    affiliation: "Universit\xE9 Paul-Val\xE9ry Montpellier 3"
-    email: jbotu@example.org
-    familyName: Botul
-    fullName: Jean-Baptiste Botul
-    givenName: Jean-Baptiste
-    name: db/jbotu
-    orcidId: 0000-0002-3264
-  User_name-db=2Fjdoe:
-    email: jdoe@example.org
-    familyName: Doe
-    fullName: John Doe
-    givenName: John
-    name: db/jdoe
-  User_name-db=2Fnbour:
-    affiliation: University of Nancago
-    email: nbour@example.org
-    familyName: Bourbaki
-    fullName: Nicolas Bourbaki
-    givenName: Nicolas
-    name: db/nbour
-    orcidId: 0000-0002-3266
-userGroup:
-  UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fowner):
-    grouping: Grouping_name-investigation=5F10100601=2DST=5Fowner
-    user: User_name-db=2Fahau
-  UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fwriter):
-    grouping: Grouping_name-investigation=5F10100601=2DST=5Fwriter
-    user: User_name-db=2Fahau
-  UserGroup_user-(name-db=2Fjbotu)_grouping-(name-investigation=5F10100601=2DST=5Freader):
-    grouping: Grouping_name-investigation=5F10100601=2DST=5Freader
-    user: User_name-db=2Fjbotu
-  UserGroup_user-(name-db=2Fjdoe)_grouping-(name-investigation=5F10100601=2DST=5Freader):
-    grouping: Grouping_name-investigation=5F10100601=2DST=5Freader
-    user: User_name-db=2Fjdoe
-  UserGroup_user-(name-db=2Fnbour)_grouping-(name-investigation=5F10100601=2DST=5Freader):
-    grouping: Grouping_name-investigation=5F10100601=2DST=5Freader
-    user: User_name-db=2Fnbour
----
-investigation:
-  Investigation_facility-(name-ESNF)_name-10100601=2DST_visitId-1=2E1=2DN:
-    doi: DOI:00.0815/inv-00601
-    endDate: '2010-10-12T15:00:00+00:00'
-    facility: Facility_name-ESNF
-    fileCount: 4
-    fileSize: 127125
-    investigationGroups:
-    - grouping: Grouping_name-investigation=5F10100601=2DST=5Fowner
-      role: owner
-    - grouping: Grouping_name-investigation=5F10100601=2DST=5Freader
-      role: reader
-    - grouping: Grouping_name-investigation=5F10100601=2DST=5Fwriter
-      role: writer
-    name: 10100601-ST
-    startDate: '2010-09-30T10:27:24+00:00'
-    title: Ni-Mn-Ga flat cone
-    visitId: 1.1-N
diff --git a/doc/examples/icatdump-simple-1.yaml b/doc/examples/icatdump-simple.yaml
similarity index 100%
rename from doc/examples/icatdump-simple-1.yaml
rename to doc/examples/icatdump-simple.yaml
diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
index 84d587b2..a568969e 100644
--- a/doc/src/file-icatdata.rst
+++ b/doc/src/file-icatdata.rst
@@ -143,7 +143,6 @@ been referenced by attribute as in:
     <!--  ... -->
   </investigation>
 
-
 The object elements may include one-to-many relations.  In this case,
 the related objects will be created along with the parent in one
 single cascading call.  In the present example, the Grouping objects
@@ -204,14 +203,76 @@ ICAT data YAML files
 ~~~~~~~~~~~~~~~~~~~~
 
 In this section we describe the ICAT data file format using the YAML
-backend.
+backend.  Consider the following example, it corresponds to the same
+ICAT content as the XML example above:
 
-.. literalinclude:: ../examples/icatdump-simple-1.yaml
+.. literalinclude:: ../examples/icatdump-simple.yaml
    :language: yaml
 
-.. literalinclude:: ../examples/icatdump-simple-2.yaml
-   :language: yaml
-   :lines: 1-7,10-11,14,23-45,52-60
+ICAT data YAML files start with a head consisting of a few comment
+lines, followed by one or more YAML documents.  YAML documents are
+separated by a line containing only ``---``.  The comments in the head
+provide some information on the context of the creation of the data
+file, which may be useful for debugging in case of issues.
+
+Each YAML document defines one chunk of data according to the logical
+structure explained above.  It consists of a mapping having the name
+of entity types in the ICAT schema as keys.  The values are in turn
+mappings that map object ids as key to ICAT object definitions as
+value.  The object id may be used to reference that object in
+relations later on.  It has no meaning other than this file internal
+referencing between objects.  In the present example, the first chunk
+contains four User objects and three Grouping objects.  The Groupings
+include related UserGroups.  The second chunk only contains one
+Investigation, including related investigationGroups.
+
+Each of the ICAT object definitions corresponds to an object in the
+ICAT schema.  It is again a mapping with the object's attribute and
+relation names as keys and corresponding values.  All many-to-one
+relations must be provided and reference existing objects, e.g. they
+must either already have existed before starting the ingestion or
+appear in the same or an earlier YAML document in the ICAT data file.
+The values of many-to-one relations are the related object's id,
+either as defined in the same YAML document or the unique key as
+returned by :meth:`icat.entity.Entity.getUniqueKey`.
+
+The object definitions may include one-to-many relations.  In this
+case, the value for the relation name is a list of object definitions
+for the related objects.  These related objects will be created along
+with the parent in one single cascading call.  In the present example,
+the Grouping objects include their related UserGroup objects.  Note
+that these UserGroups include their relation to the User, but not
+their relation with Grouping.  The latter relationship is implied by
+the parent relation of the object in the file.
+
+As an alternative, in the present example, the UserGroups could have
+been added to the file as separate objects as in:
+
+.. code-block:: YAML
+
+  ---
+  grouping:
+    Grouping_name-investigation=5F10100601=2DST=5Fowner:
+      name: investigation_10100601-ST_owner
+  user:
+    User_name-db=2Fahau:
+      affiliation: Goethe University Frankfurt, Faculty of Philosophy and History
+      email: ahau@example.org
+      familyName: Hau
+      fullName: Arnold Hau
+      givenName: Arnold
+      name: db/ahau
+      orcidId: 0000-0002-3263
+  userGroup:
+    UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fowner):
+      grouping: Grouping_name-investigation=5F10100601=2DST=5Fowner
+      user: User_name-db=2Fahau
+  ---
+
+Note that the entries in the mappings have no inherent order.  The
+:ref:`icatingest` script uses a predefined order to read the ICAT
+entity types in order to make sure that referenced objects are created
+before any object that may reference them.
 
 
 .. [#dc] There is one exception: DataCollections don't have a

From acaff9d4ffe9987bc0324f2e4e01459dc5bd547c Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Tue, 16 Jan 2024 21:21:21 +0100
Subject: [PATCH 24/43] Review Section ICAT data files with respect to object
 references, add a Subsection References to ICAT objects and unique keys

---
 doc/src/file-icatdata.rst | 108 ++++++++++++++++++++------------------
 1 file changed, 57 insertions(+), 51 deletions(-)
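
Note (commentary only, not part of the commit): the interplay of local
keys, unique keys and the object index described in the new subsection
can be modelled roughly as below.  This is a toy sketch, not the
python-icat implementation: the class ChunkIndex and its methods are
hypothetical, and the callable passed to it merely stands in for a
search by unique key on the ICAT server.  The DataCollection exception
from the footnote is not modelled.

```python
class ChunkIndex:
    """Toy model of an object index that is discarded after each chunk."""

    def __init__(self, search_unique_key):
        # Stand-in for an (expensive) lookup by unique key on the server.
        self._search = search_unique_key
        self._index = {}
        self.server_lookups = 0

    def add(self, key, obj):
        """Register an object defined in the current chunk under its key."""
        self._index[key] = obj

    def resolve(self, key):
        """Resolve a reference key, preferring the in-memory index."""
        if key in self._index:
            return self._index[key]
        # Not in the index: only a unique key can be resolved this way.
        self.server_lookups += 1
        obj = self._search(key)
        self._index[key] = obj
        return obj

    def end_of_chunk(self):
        """Discard the complete index once a chunk has been processed."""
        self._index.clear()


# A dict plays the role of the server in this sketch.
server = {"Grouping_name-investigation=5F10100601=2DST=5Fowner": "grouping#1"}
index = ChunkIndex(server.__getitem__)
index.add("Grouping_name-investigation=5F10100601=2DST=5Fowner", "grouping#1")
index.resolve("Grouping_name-investigation=5F10100601=2DST=5Fowner")
index.end_of_chunk()
index.resolve("Grouping_name-investigation=5F10100601=2DST=5Fowner")
print(index.server_lookups)
```

The reference inside the chunk is served from the index; the same
reference after the chunk boundary costs one server lookup, which is
why references across chunks must use unique keys.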

diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
index a568969e..6b8730cc 100644
--- a/doc/src/file-icatdata.rst
+++ b/doc/src/file-icatdata.rst
@@ -18,27 +18,19 @@ Logical structure of ICAT data files
 There is a one-to-one correspondence of the objects in the data
 file and the corresponding object in ICAT according to the ICAT
 schema, including all attributes and relations to other objects.
-Special unique keys are used to encode the relations.
-:meth:`icat.entity.Entity.getUniqueKey` may be used to get such a
-unique key for an entity object and
-:meth:`icat.client.Client.searchUniqueKey` may be used to search an
-object by its key.  Otherwise these keys should be considered as
-opaque ids.
 
 Data files are partitioned in chunks.  This is done to avoid having
 the whole file, e.g. the complete inventory of the ICAT, at once in
 memory.  The problem is that objects contain references to other
-objects (e.g. Datafiles refer to Datasets, the latter refer to
-Investigations, and so forth).  We keep an index of the objects as
+objects, e.g. Datafiles refer to Datasets, the latter refer to
+Investigations, and so forth.  We keep an index of the objects as
 cache in order to resolve these references.  But there is a memory
-versus time tradeoff: we cannot keep all the objects in the index,
-that would again mean the complete inventory of the ICAT.  And we
-can't know beforehand which object is going to be referenced later on,
-so we don't know which one to keep and which one to discard from the
-index.  Fortunately we can query objects that we discarded once back
-from the ICAT server.  But this is expensive.  So the strategy is as
-follows: keep all objects from the current chunk in the index and
-discard the complete index each time a chunk has been
+versus time tradeoff: in order to avoid the index to grow beyond
+bounds, objects need to be discarded from the index from time to time.
+References to objects that can not be resolved from the index need to
+be searched from the ICAT server, which of course is expensive.  So
+the strategy is as follows: keep all objects from the current chunk in
+the index and discard the complete index each time a chunk has been
 processed. [#dc]_ This will work fine if objects are mostly
 referencing other objects from the same chunk and only a few
 references go across chunk boundaries.
@@ -66,6 +58,26 @@ indirectly related to one of the included objects.  In this case,
 only a reference to the related object will be included in the data
 file.  The related object must have its own entry.
 
+References to ICAT objects and unique keys
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+References to related objects are encoded in ICAT data files by
+reference keys.  There are two kinds of those keys: local keys and
+unique keys.
+
+When an ICAT object is defined in the file, it generally defines a
+local key at the same time.  Local keys are stored in the object index
+and may be used to reference this object from other obejcts in the
+same data chunk.  Unique keys can be obtained from an object by
+calling :meth:`icat.entity.Entity.getUniqueKey`.  An object can be
+searched by its unique key from the ICAT server by calling
+:meth:`icat.client.Client.searchUniqueKey`.  As a result, it is
+possible to reference an object by its unique key even if the
+reference is not in the object index.  All references that go across
+chunk boundaries must use unique keys. [#dc]_
+
+Reference keys should be considered as opaque ids.
+
 ICAT data XML files
 ~~~~~~~~~~~~~~~~~~~
 
@@ -91,19 +103,17 @@ first chunk contains four User objects and three Grouping objects.
 The Groupings include related UserGroups.  The second chunk only
 contains one Investigation, including related investigationGroups.
 
-These object elements should have an ``id`` attribute that may be used
-to reference the object in relations later on.  The ``id`` value has
-no meaning other than this file internal referencing between objects.
-The subelements of the object elements correspond to the object's
-attributes and relations in the ICAT schema.  All many-to-one
-relations must be provided and reference already existing objects,
-e.g. they must either already have existed before starting the
-ingestion or appear earlier in the ICAT data file than the referencing
-object, so that they will be created earlier.  The related object may
-either be referenced by id using the special attribute ``ref`` or by
-the related object's attribute values, using XML attributes of the
-same name.  In the latter case, the attribute values must uniquely
-define the related object.
+These object elements may have an ``id`` attribute that defines a
+local key to reference the object later on.  The subelements of the
+object elements correspond to the object's attributes and relations in
+the ICAT schema.  All many-to-one relations must be provided and
+reference already existing objects, e.g. they must either already have
+existed before starting the ingestion or appear earlier in the ICAT
+data file than the referencing object, so that they will be created
+earlier.  The related object may either be referenced by reference key
+using the ``ref`` attribute or by the related object's attribute
+values, using XML attributes of the same name.  In the latter case,
+the attribute values must uniquely define the related object.
 
 In the present example, consider the first grouping:
 
@@ -118,8 +128,9 @@ In the present example, consider the first grouping:
 
 It includes a related userGroup object that in turn references a
 related User.  This User is referenced in the ``ref`` attribute using
-a key defined in the User's ``id`` attribute earlier in the file.
-Another example is how the Investigation references its Facility:
+a local key defined in the User's ``id`` attribute earlier in the
+file.  Another example is how the Investigation references its
+Facility:
 
 .. code-block:: XML
 
@@ -131,8 +142,7 @@ Another example is how the Investigation references its Facility:
 
 The Facility is not defined in the data file.  It is assumed to exist
 in ICAT before ingesting the file.  In this case, it must be
-referenced by the unique key that could have been obtained by calling
-``facility.getUniqueKey()``.  Alternatively, the Facility could have
+referenced by its unique key.  Alternatively, the Facility could have
 been referenced by attribute as in:
 
 .. code-block:: XML
@@ -179,14 +189,10 @@ The Investigation in the second chunk in the present example includes
 related InvestigationGroups that will be created along with the
 Investigation.  The InvestigationGroup objects include a reference to
 the corresponding Grouping.  Note that these references go across
-chunk boundaries.  The index that caches the object ids to resolve
-object relations from the first chunk that did contain the ids of the
-Groupings will already have been discarded from memory when the second
-chunk is read.  But the references use the key that can be passed to
-:meth:`icat.client.Client.searchUniqueKey` to search these Groupings
-from ICAT.
-
-Finally note the the file format also depends on the ICAT schema
+chunk boundaries.  Thus, unique keys for the Groupings need to be used
+here.
+
+Finally note that the file format also depends on the ICAT schema
 version: the present example can only be ingested into ICAT server 5.0
 or newer, because the attributes fileCount and fileSize have been
 added to Investigation in this version.  With older ICAT versions, it
@@ -219,12 +225,11 @@ Each YAML document defines one chunk of data according to the logical
 structure explained above.  It consists of a mapping having the name
 of entity types in the ICAT schema as keys.  The values are in turn
 mappings that map object ids as key to ICAT object definitions as
-value.  The object id may be used to reference that object in
-relations later on.  It has no meaning other than this file internal
-referencing between objects.  In the present example, the first chunk
-contains four User objects and three Grouping objects.  The Groupings
-include related UserGroups.  The second chunk only contains one
-Investigation, including related investigationGroups.
+value.  These object ids define local keys that may be used to
+reference the respective object later on.  In the present example, the
+first chunk contains four User objects and three Grouping objects.
+The Groupings include related UserGroups.  The second chunk only
+contains one Investigation, including related investigationGroups.
 
 Each of the ICAT object definitions corresponds to an object in the
 ICAT schema.  It is again a mapping with the object's attribute and
@@ -232,9 +237,8 @@ relation names as keys and corresponding values.  All many-to-one
 relations must be provided and reference existing objects, e.g. they
 must either already have existed before starting the ingestion or
 appear in the same or an earlier YAML document in the ICAT data file.
-The values of many-to-one relations are the related object's id,
-either as defined in the same YAML document or the unique key as
-returned by :meth:`icat.entity.Entity.getUniqueKey`.
+The values of many-to-one relations are reference keys, either local
+keys defined in the same YAML document or unique keys.
 
 The object definitions may include one-to-many relations.  In this
 case, the value for the relation name is a list of object definitions
@@ -277,4 +281,6 @@ before any object that may reference them.
 
 .. [#dc] There is one exception: DataCollections don't have a
          uniqueness constraint and can't reliably be searched by
-         attributes.  They are always kept in the index.
+         attributes.  Therefore local keys for DataCollections are
+         always kept in the object index and may be used to reference
+         them across chunk boundaries.

From 9c5085c63a4b92493a98cfc05adc0362c2f77968 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Thu, 18 Jan 2024 18:51:14 +0100
Subject: [PATCH 25/43] Rework Section ICAT data files once again

---
 doc/src/file-icatdata.rst | 189 +++++++++++++++++++-------------------
 1 file changed, 97 insertions(+), 92 deletions(-)
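
Note (commentary only, not part of the commit): the relation between
``id`` and ``ref`` attributes within one ``data`` element can be
illustrated with the standard library XML parser.  This is a minimal
sketch under the assumption that a ``ref`` value either matches an
``id`` defined earlier in the same chunk or is a unique key to be
resolved by the server.  The function unresolved_refs is hypothetical
and not part of python-icat.

```python
import xml.etree.ElementTree as ET

# Reduced version of the simplified first chunk from the documentation.
CHUNK = """<data>
  <user id="User_name-db=2Fahau">
    <name>db/ahau</name>
  </user>
  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fowner">
    <name>investigation_10100601-ST_owner</name>
    <userGroups>
      <user ref="User_name-db=2Fahau"/>
    </userGroups>
  </grouping>
</data>"""


def unresolved_refs(chunk_xml):
    """Return the ref values that do not match an id defined earlier in
    the chunk.  Such references are not necessarily errors: they would
    have to be unique keys that can be searched from the server."""
    defined = set()
    missing = []
    for obj in ET.fromstring(chunk_xml):
        # Check the references of this object and of any included
        # one-to-many related objects before registering its own id.
        for elem in obj.iter():
            ref = elem.get("ref")
            if ref is not None and ref not in defined:
                missing.append(ref)
        if obj.get("id") is not None:
            defined.add(obj.get("id"))
    return missing


print(unresolved_refs(CHUNK))
```

For the chunk above, every reference is satisfied by a local key.  A
reference such as the Investigation's Facility in the full example
would be reported, since the Facility is not defined in the file.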

diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
index 6b8730cc..57183153 100644
--- a/doc/src/file-icatdata.rst
+++ b/doc/src/file-icatdata.rst
@@ -15,20 +15,16 @@ on the backend: python-icat provides backends using XML and YAML.
 Logical structure of ICAT data files
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-There is a one-to-one correspondence of the objects in the data
-file and the corresponding object in ICAT according to the ICAT
-schema, including all attributes and relations to other objects.
-
 Data files are partitioned in chunks.  This is done to avoid having
 the whole file, e.g. the complete inventory of the ICAT, at once in
 memory.  The problem is that objects contain references to other
 objects, e.g. Datafiles refer to Datasets, the latter refer to
 Investigations, and so forth.  We keep an index of the objects as
-cache in order to resolve these references.  But there is a memory
+a cache in order to resolve these references.  But there is a memory
 versus time tradeoff: in order to avoid the index to grow beyond
 bounds, objects need to be discarded from the index from time to time.
 References to objects that can not be resolved from the index need to
-be searched from the ICAT server, which of course is expensive.  So
+be searched from the ICAT server, which is of course expensive.  So
 the strategy is as follows: keep all objects from the current chunk in
 the index and discard the complete index each time a chunk has been
 processed. [#dc]_ This will work fine if objects are mostly
@@ -40,37 +36,40 @@ but at the same time large enough to keep as many relations between
 objects as possible local in a chunk.  It is in the responsibility of
 the writer of the data file to create the chunks in this manner.
 
-The objects that get written to the data file and how this file is
-organized is controlled by lists of ICAT search expressions or entity
-objects, see :meth:`icat.dumpfile.DumpFileWriter.writeobjs`.  There is
-some degree of flexibility: an object may include related objects in
-an one-to-many relation.  In this case, these related objects should
-not be added on their own again.  For instance, you may write User,
-Grouping, and UserGroup as separate objects into the file.  In this
-case, the UserGroup entries must properly reference related User and
-Grouping.  Alternatively you may include the UserGroups in the
-corresponding Grouping objects.  In this case, you must not add the
-UserGroups again on their own.
-
-Objects related in a many-to-one relation must always be included in
-the search expression.  This is also true if the object is
-indirectly related to one of the included objects.  In this case,
-only a reference to the related object will be included in the data
-file.  The related object must have its own entry.
+The data chunks contain ICAT object definitions, i.e. serializations
+of individual ICAT objects, including all attribute values and
+many-to-one relations.  The many-to-one relations are provided as
+references to other objects that must exist in the ICAT server at the
+moment that this object definition is read.
+
+There is some degree of flexibility with respect to related objects in
+one-to-many relations: object definitions for these related objects
+may be included in the object definitions of the parent object.  When
+the parent is read, these related objects will be created along with
+the parent in one single cascading call.  Thus, the related objects
+must not be included again as a separate object in the ICAT data file.
+For instance, an ICAT data file may include User, Grouping, and
+UserGroup as separate objects.  In this case, the UserGroup entries
+must properly reference User and Grouping as their related objects.
+Alternatively the file may only contain User and Grouping objects,
+with the UserGroups being included into the object definition of the
+corresponding Grouping objects.
 
 References to ICAT objects and unique keys
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 References to related objects are encoded in ICAT data files by
-reference keys.  There are two kinds of those keys: local keys and
-unique keys.
+reference keys.  There are two kinds of those keys, local keys and
+unique keys:
 
 When an ICAT object is defined in the file, it generally defines a
 local key at the same time.  Local keys are stored in the object index
 and may be used to reference this object from other obejcts in the
-same data chunk.  Unique keys can be obtained from an object by
-calling :meth:`icat.entity.Entity.getUniqueKey`.  An object can be
-searched by its unique key from the ICAT server by calling
+same data chunk.
+
+Unique keys can be obtained from an object by calling
+:meth:`icat.entity.Entity.getUniqueKey`.  An object can be searched by
+its unique key from the ICAT server by calling
 :meth:`icat.client.Client.searchUniqueKey`.  As a result, it is
 possible to reference an object by its unique key even if the
 reference is not in the object index.  All references that go across
@@ -95,42 +94,80 @@ The ``head`` element will be ignored by :ref:`icatingest`.  It serves
 to provide some information on the context of the creation of the data
 file, which may be useful for debugging in case of issues.
 
-The content of each ``data`` element is one chunk according to the
-logical structure explained above.  The present example contains two
-chunks.  Each element within the ``data`` element corresponds to an
-ICAT object according to the ICAT schema.  In the present example, the
-first chunk contains four User objects and three Grouping objects.
-The Groupings include related UserGroups.  The second chunk only
-contains one Investigation, including related investigationGroups.
+The content of each ``data`` element is one chunk, its subelements are
+the ICAT object definitions according to the logical structure
+explained above.  The present example contains two chunks: the first
+chunk contains four User objects and three Grouping objects.  The
+Groupings include related UserGroups.  The second chunk only contains
+one Investigation, including related InvestigationGroups.
+
+The object elements may have an ``id`` attribute that defines a local
+key to reference the object later on.  The subelements of the object
+elements correspond to the object's attributes and relations in the
+ICAT schema.  All many-to-one relations must be provided and reference
+already existing objects, e.g. they must either already have existed
+before starting the ingestion or appear earlier in the ICAT data file
+than the referencing object, so that they will be created earlier.
+The related object may either be referenced by reference key using the
+``ref`` attribute or by the related object's attribute values, using
+XML attributes of the same name.  In the latter case, the attribute
+values must uniquely define the related object.
+
+Consider a simplified version of the first chunk from the present
+example, defining only one User, Grouping and UserGroup respectively:
 
-These object elements may have an ``id`` attribute that defines a
-local key to reference the object later on.  The subelements of the
-object elements correspond to the object's attributes and relations in
-the ICAT schema.  All many-to-one relations must be provided and
-reference already existing objects, e.g. they must either already have
-existed before starting the ingestion or appear earlier in the ICAT
-data file than the referencing object, so that they will be created
-earlier.  The related object may either be referenced by reference key
-using the ``ref`` attribute or by the related object's attribute
-values, using XML attributes of the same name.  In the latter case,
-the attribute values must uniquely define the related object.
+.. code-block:: XML
 
-In the present example, consider the first grouping:
+  <data>
+    <user id="User_name-db=2Fahau">
+      <affiliation>Goethe University Frankfurt, Faculty of Philosophy and History</affiliation>
+      <email>ahau@example.org</email>
+      <familyName>Hau</familyName>
+      <fullName>Arnold Hau</fullName>
+      <givenName>Arnold</givenName>
+      <name>db/ahau</name>
+      <orcidId>0000-0002-3263</orcidId>
+    </user>
+    <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fowner">
+      <name>investigation_10100601-ST_owner</name>
+      <userGroups>
+        <user ref="User_name-db=2Fahau"/>
+      </userGroups>
+    </grouping>
+  </data>
+
+The Grouping includes the related UserGroup object that in turn
+references the related User.  This User is referenced in the ``ref``
+attribute using a local key defined in the User's ``id`` attribute.
+Note that the UserGroup does not include its relation with Grouping.
+The latter relationship is implied by the parent relation of the
+object in the file.
+
+As an alternative, the UserGroup could have been added to the file as
+a separate object, as a direct subelement of ``data``:
 
 .. code-block:: XML
 
-  <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fowner">
-    <name>investigation_10100601-ST_owner</name>
-    <userGroups>
+  <data>
+    <user id="User_name-db=2Fahau">
+      <affiliation>Goethe University Frankfurt, Faculty of Philosophy and History</affiliation>
+      <email>ahau@example.org</email>
+      <familyName>Hau</familyName>
+      <fullName>Arnold Hau</fullName>
+      <givenName>Arnold</givenName>
+      <name>db/ahau</name>
+      <orcidId>0000-0002-3263</orcidId>
+    </user>
+    <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fowner">
+      <name>investigation_10100601-ST_owner</name>
+    </grouping>
+    <userGroup id="UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fowner)">
+      <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Fowner"/>
       <user ref="User_name-db=2Fahau"/>
-    </userGroups>
-  </grouping>
+    </userGroup>
+  </data>
 
-It includes a related userGroup object that in turn references a
-related User.  This User is referenced in the ``ref`` attribute using
-a local key defined in the User's ``id`` attribute earlier in the
-file.  Another example is how the Investigation references its
-Facility:
+Another example is how the Investigation references its Facility:
 
 .. code-block:: XML
 
@@ -153,44 +190,12 @@ been referenced by attribute as in:
     <!--  ... -->
   </investigation>
 
-The object elements may include one-to-many relations.  In this case,
-the related objects will be created along with the parent in one
-single cascading call.  In the present example, the Grouping objects
-include their related UserGroup objects.  Note that these UserGroups
-include their relation to the User, but not their relation with
-Grouping.  The latter relationship is implied by the parent relation
-of the object in the file.
-
-As an alternative, the Usergroups could have been added to the file as
-separate objects as direct subelements of ``data`` as in:
-
-.. code-block:: XML
-
-  <data>
-    <user id="User_name-db=2Fahau">
-      <affiliation>Goethe University Frankfurt, Faculty of Philosophy and History</affiliation>
-      <email>ahau@example.org</email>
-      <familyName>Hau</familyName>
-      <fullName>Arnold Hau</fullName>
-      <givenName>Arnold</givenName>
-      <name>db/ahau</name>
-      <orcidId>0000-0002-3263</orcidId>
-    </user>
-    <grouping id="Grouping_name-investigation=5F10100601=2DST=5Fowner">
-      <name>investigation_10100601-ST_owner</name>
-    </grouping>
-    <userGroup id="UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fowner)">
-      <grouping ref="Grouping_name-investigation=5F10100601=2DST=5Fowner"/>
-      <user ref="User_name-db=2Fahau"/>
-    </userGroup>
-  </data>
-
 The Investigation in the second chunk in the present example includes
 related InvestigationGroups that will be created along with the
 Investigation.  The InvestigationGroup objects include a reference to
-the corresponding Grouping.  Note that these references go across
-chunk boundaries.  Thus, unique keys for the Groupings need to be used
-here.
+the corresponding Grouping respectively.  Note that these references
+go across chunk boundaries.  Thus, unique keys for the Groupings need
+to be used here.
 
 Finally note that the file format also depends on the ICAT schema
 version: the present example can only be ingested into ICAT server 5.0

From 0fc8b0030dd2b4e9ccea39cc68c0fdac4fcf1e87 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Thu, 18 Jan 2024 20:13:40 +0100
Subject: [PATCH 26/43] Drop most of the docstring from module icat.dumpfile as
 this is now much better explained in the online documentation

---
 src/icat/dumpfile.py | 40 ----------------------------------------
 1 file changed, 40 deletions(-)
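Note below the ``---`` marker, so not part of the commit message: the
chunking strategy described in the removed docstring can be sketched
roughly as follows.  This is an illustration only, not the code in
icat.dumpfile.  ``ChunkIndex`` and its ``lookup`` callable are
hypothetical names; ``lookup`` stands in for an expensive server query
such as :meth:`icat.client.Client.searchUniqueKey`.

```python
class ChunkIndex:
    """Per-chunk object index with a fallback lookup (illustrative only)."""

    def __init__(self, lookup):
        # `lookup` is a hypothetical callable that fetches an object by
        # its unique key, e.g. by querying the ICAT server.
        self._lookup = lookup
        self._index = {}
        self.fallbacks = 0  # number of expensive lookups performed

    def add(self, key, obj):
        """Register an object defined in the current chunk."""
        self._index[key] = obj

    def resolve(self, key):
        """Return the referenced object, preferring the in-memory index."""
        try:
            return self._index[key]
        except KeyError:
            # The reference goes across a chunk boundary: fall back to
            # the expensive lookup and remember the result for the rest
            # of the current chunk (a choice made in this sketch).
            self.fallbacks += 1
            obj = self._lookup(key)
            self._index[key] = obj
            return obj

    def start_chunk(self):
        """Discard the complete index when a new chunk begins."""
        self._index.clear()
```

The fewer references cross a chunk boundary, the less often ``resolve``
takes the fallback path, which is why the writer of the data file should
try to keep related objects in the same chunk.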

diff --git a/src/icat/dumpfile.py b/src/icat/dumpfile.py
index 099f4364..c5c5a002 100644
--- a/src/icat/dumpfile.py
+++ b/src/icat/dumpfile.py
@@ -5,46 +5,6 @@
 writing ICAT data files.  The actual work is done in file format
 specific modules that should provide subclasses that must implement
 the abstract methods.
-
-Data files are partitioned in chunks.  This is done to avoid having
-the whole file, e.g. the complete inventory of the ICAT, at once in
-memory.  The problem is that objects contain references to other
-objects (e.g. Datafiles refer to Datasets, the latter refer to
-Investigations, and so forth).  We keep an index of the objects in
-order to resolve these references.  But there is a memory versus time
-tradeoff: we cannot keep all the objects in the index, that would
-again mean the complete inventory of the ICAT.  And we can't know
-beforehand which object is going to be referenced later on, so we
-don't know which one to keep and which one to discard from the index.
-Fortunately we can query objects we discarded once back from the ICAT
-server with :meth:`icat.client.Client.searchUniqueKey`.  But this is
-expensive.  So the strategy is as follows: keep all objects from the
-current chunk in the index and discard the complete index each time a
-chunk has been processed.  This will work fine if objects are mostly
-referencing other objects from the same chunk and only a few
-references go across chunk boundaries.
-
-Therefore, we want these chunks to be small enough to fit into memory,
-but at the same time large enough to keep as many relations between
-objects as possible local in a chunk.  It is in the responsibility of
-the writer of the data file to create the chunks in this manner.
-
-The objects that get written to the data file and how this file is
-organized is controlled by lists of ICAT search expressions, see
-:meth:`icat.dumpfile.DumpFileWriter.writeobjs`.  There is some degree
-of flexibility: an object may include related objects in an
-one-to-many relation, just by including them in the search expression.
-In this case, these related objects should not have a search
-expression on their own again.  For instance, the search expression
-for Grouping may include UserGroup.  The UserGroups will then be
-embedded in their respective grouping in the data file.  There should
-not be a search expression for UserGroup then.
-
-Objects related in a many-to-one relation must always be included in
-the search expression.  This is also true if the object is
-indirectly related to one of the included objects.  In this case,
-only a reference to the related object will be included in the data
-file.  The related object must have its own list entry.
 """
 
 from collections import ChainMap

From 504d09179465fb18946bd0a690bff349a22d25b4 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Thu, 18 Jan 2024 20:49:36 +0100
Subject: [PATCH 27/43] Indicate in the documentation of icat.dumpfile which
 methods of class icat.dumpfile.DumpFileReader and class
 icat.dumpfile.DumpFileWriter are abstract and thus need to be implemented in the
 file format specific backend

---
 src/icat/dumpfile.py | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/src/icat/dumpfile.py b/src/icat/dumpfile.py
index c5c5a002..a18832a3 100644
--- a/src/icat/dumpfile.py
+++ b/src/icat/dumpfile.py
@@ -99,6 +99,9 @@ def getdata(self):
         specific to the implementing backend and should be passed as
         the `data` argument to
         :meth:`~icat.dumpfile.DumpFileReader.getobjs_from_data`.
+
+        This abstract method must be implemented in the file format
+        specific backend.
         """
         raise NotImplementedError
 
@@ -107,6 +110,9 @@ def getobjs_from_data(self, data, objindex):
 
         Yield a new entity object in each iteration.  The object is
         initialized from the data, but not yet created at the client.
+
+        This abstract method must be implemented in the file format
+        specific backend.
         """
         raise NotImplementedError
 
@@ -197,7 +203,11 @@ def __exit__(self, type, value, traceback):
             self.outfile.close()
 
     def head(self):
-        """Write a header with some meta information to the data file."""
+        """Write a header with some meta information to the data file.
+
+        This abstract method must be implemented in the file format
+        specific backend.
+        """
         raise NotImplementedError
 
     def startdata(self):
@@ -205,15 +215,26 @@ def startdata(self):
 
         If the current chunk contains any data, write it to the data
         file.
+
+        This abstract method must be implemented in the file format
+        specific backend.
         """
         raise NotImplementedError
 
     def writeobj(self, key, obj, keyindex):
-        """Add an entity object to the current data chunk."""
+        """Add an entity object to the current data chunk.
+
+        This abstract method must be implemented in the file format
+        specific backend.
+        """
         raise NotImplementedError
 
     def finalize(self):
-        """Finalize the data file."""
+        """Finalize the data file.
+
+        This abstract method must be implemented in the file format
+        specific backend.
+        """
         raise NotImplementedError
 
     def writeobjs(self, objs, keyindex, chunksize=100):

From a52714b1c066e55e43bef02cae7a37c767064cd9 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Fri, 19 Jan 2024 11:51:54 +0100
Subject: [PATCH 28/43] Minor language fixes

---
 doc/src/file-icatdata.rst | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
index 57183153..97efd819 100644
--- a/doc/src/file-icatdata.rst
+++ b/doc/src/file-icatdata.rst
@@ -103,15 +103,15 @@ one Investigation, including related InvestigationGroups.
 
 The object elements may have an ``id`` attribute that define a local
 key to reference the object later on.  The subelements of the object
-elements correspond to the object's attributes and relations in the
-ICAT schema.  All many-to-one relations must be provided and reference
-already existing objects, e.g. they must either already have existed
-before starting the ingestion or appear earlier in the ICAT data file
-than the referencing object, so that they will be created earlier.
-The related object may either be referenced by reference key using the
-``ref`` attribute or by the related object's attribute values, using
-XML attributes of the same name.  In the latter case, the attribute
-values must uniquely define the related object.
+elements correspond to the object's attributes and relations according
+to the ICAT schema.  All many-to-one relations must be provided and
+reference already existing objects, e.g. they must either already have
+existed before starting the ingestion or appear earlier in the ICAT
+data file than the referencing object, so that they will be created
+earlier.  The related object may either be referenced by reference key
+using the ``ref`` attribute or by the related object's attribute
+values, using XML attributes of the same name.  In the latter case,
+the attribute values must uniquely define the related object.
 
 Consider a simplified version of the first chunk from the present
 example, defining only one User, Grouping and UserGroup respectively:
@@ -201,7 +201,7 @@ Finally note that the file format also depends on the ICAT schema
 version: the present example can only be ingested into ICAT server 5.0
 or newer, because the attributes fileCount and fileSize have been
 added to Investigation in this version.  With older ICAT versions, it
-will fail because the attributes are not defined.
+will fail because these attributes are not defined.
 
 You will find more extensive examples in the source distribution of
 python-icat.  The distribution also provides XML Schema Definition

From f0112b8f3fa88312ded8a6df1e110f1665176a7c Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Fri, 19 Jan 2024 14:04:54 +0100
Subject: [PATCH 29/43] Add first input to Section Metadata ingest files

---
 doc/src/file-icatdata.rst   |  8 ++++++
 doc/src/file-icatingest.rst | 51 +++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)
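Note below the ``---`` marker, so not part of the commit message: some
of the structural restrictions listed in file-icatingest.rst can be
illustrated with a few lines of Python.  This is a minimal sketch under
assumptions, not how :class:`icat.ingest.IngestReader` works: the class
validates against an XSD, whereas the hypothetical helper
``check_ingest_structure`` below only inspects element names with the
standard library.  The element names in ``ALLOWED_OBJECTS`` are assumed
from the object types named in the documentation.

```python
import xml.etree.ElementTree as ET

# Assumed element names for the allowed object types.
ALLOWED_OBJECTS = {
    "dataset",
    "datasetInstrument",
    "datasetTechnique",
    "datasetParameter",
}


def check_ingest_structure(xml_text):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "icatingest":
        problems.append("root element must be 'icatingest'")
    if "version" not in root.attrib:
        problems.append("root element must have a 'version' attribute")
    chunks = root.findall("data")
    if len(chunks) != 1:
        problems.append("expected exactly one 'data' element, found %d"
                        % len(chunks))
    for chunk in chunks:
        # Direct children of 'data' are the object definitions.
        for obj in chunk:
            if obj.tag not in ALLOWED_OBJECTS:
                problems.append("object type '%s' is not allowed" % obj.tag)
    return problems
```

A check like this covers only the outer structure.  The restrictions on
Dataset attributes and on the kind of reference allowed for each
relation are not covered here.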

diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
index 97efd819..73e84f3e 100644
--- a/doc/src/file-icatdata.rst
+++ b/doc/src/file-icatdata.rst
@@ -12,6 +12,8 @@ logic for reading and writing the files is provided by the
 The actual file format depends on the version of the ICAT schema and
 on the backend: python-icat provides backends using XML and YAML.
 
+.. _ICAT-data-files-structure:
+
 Logical structure of ICAT data files
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -55,6 +57,8 @@ Alternatively the file may only contain User and Grouping objects,
 with the UserGroups being included into the object definition of the
 corresponding Grouping objects.
 
+.. _ICAT-data-files-references:
+
 References to ICAT objects and unique keys
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -77,6 +81,8 @@ chunk boundaries must use unique keys. [#dc]_
 
 Reference keys should be considered as opaque ids.
 
+.. _ICAT-data-xml-files:
+
 ICAT data XML files
 ~~~~~~~~~~~~~~~~~~~
 
@@ -210,6 +216,8 @@ schema versions.  Note the these  XML Schema Definition
 files are provided for reference only.  The :ref:`icatingest` script
 does not validate its input.
 
+.. _ICAT-data-yaml-files:
+
 ICAT data YAML files
 ~~~~~~~~~~~~~~~~~~~~
 
diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst
index 04954679..c7103833 100644
--- a/doc/src/file-icatingest.rst
+++ b/doc/src/file-icatingest.rst
@@ -3,4 +3,55 @@
 Metadata ingest files
 =====================
 
+Metadata ingest files are the input format for class
+:class:`icat.ingest.IngestReader`.  This class is intended to be uesd
+in scripts that read the metadata created by experimments into ICAT.
+The file format is basically a restricted version of
+:ref:`ICAT-data-xml-files`.
 
+The underlying idea is that ICAT data files are in principle suitable
+to encode the metadata to be ingested from the experiment.  The only
+problem is that this file format is too powerful: it can encode any
+ICAT content.  We want the ingest files from the experiment to create
+new Datasets and DatasetParameters, we certainly don't want these
+files to create new Instruments or Users in ICAT.  And we also want to
+control the Investigation that newly created Datasets will be added
+to.  It would be rather difficult to control the power of the input
+format if we would use plain ICAT data files for this purpose.
+
+Class :class:`icat.ingest.IngestReader` takes an ``investigation``
+argument.  We will refer to the Investigation given in this argument
+as the *prescribed Investigation* in the following.  The metadata
+ingest file format restricts ICAT data XML files in the following
+ways:
+
+* ingest files must contain one and only one  ``data`` element,
+  e.g. chunks according to the :ref:`ICAT-data-files-structure`.
+
+* the allowed object types are restricted to Dataset,
+  DatasetInstrument, DatasetTechnique, and DatasetParameter.
+
+* the attributes in the object definitions for Datasets are restricted
+  to name, description, startDate, and endDate.
+
+* object definitions for Datasets can not include a reference to the
+  related Investigation.  The relation with the prescribed
+  Investigation will be implied.
+
+* object definitions for Datasets can reference a related Sample only
+  by name or by pid.  A relation of the related Sample with the
+  prescribed Investigation will be implied.
+
+* references to the related Dataset in DatasetInstrument,
+  DatasetTechnique, and DatasetParameter definitions are restricted to
+  :ref:`local keys <ICAT-data-files-references>`.  These objects can
+  thus only relate to Datasets defined in the same ingest file.
+
+* other object references are restricted to reference by attributes.
+
+These restrictions are enforced by validating the input against an XML
+Schema Definition (XSD).
+
+Another change with respect to ICAT data XML files is that the name of
+the root element is ``icatingest`` and that it must have a ``version``
+attrbute.

From 048a98cb0d151a918262628601716cb7da1092c5 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Fri, 19 Jan 2024 14:29:11 +0100
Subject: [PATCH 30/43] Minor language fixes

---
 doc/src/file-icatingest.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst
index c7103833..bf2af389 100644
--- a/doc/src/file-icatingest.rst
+++ b/doc/src/file-icatingest.rst
@@ -26,7 +26,7 @@ ingest file format restricts ICAT data XML files in the following
 ways:
 
 * ingest files must contain one and only one  ``data`` element,
-  e.g. chunks according to the :ref:`ICAT-data-files-structure`.
+  e.g. one chunk according to the :ref:`ICAT-data-files-structure`.
 
 * the allowed object types are restricted to Dataset,
   DatasetInstrument, DatasetTechnique, and DatasetParameter.
@@ -44,8 +44,8 @@ ways:
 
 * references to the related Dataset in DatasetInstrument,
   DatasetTechnique, and DatasetParameter definitions are restricted to
-  :ref:`local keys <ICAT-data-files-references>`.  These objects can
-  thus only relate to Datasets defined in the same ingest file.
+  :ref:`local keys <ICAT-data-files-references>`.  As a result, these
+  objects can only relate to Datasets defined in the same ingest file.
 
 * other object references are restricted to reference by attributes.
 

From dd7473f5d07a608fdeaf153008ad61d1a2428565 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Fri, 2 Feb 2024 15:50:11 +0100
Subject: [PATCH 31/43] Add an example to the Metadata ingest files Section of
 the documentation

---
 MANIFEST.in                 |  1 +
 doc/examples/metadata.xml   | 94 +++++++++++++++++++++++++++++++++++++
 doc/src/file-icatingest.rst | 34 ++++++++++++++
 3 files changed, 129 insertions(+)
 create mode 100644 doc/examples/metadata.xml
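Note below the ``---`` marker, so not part of the commit message: the
example file relies on local keys, where each ``datasetParameter``
references a Dataset through a ``ref`` attribute that must match an
``id`` defined earlier in the same ``data`` element.  The sketch below
shows one way to check that property with the standard library.  The
helper name ``unresolved_dataset_refs`` is hypothetical and the function
is not part of python-icat.

```python
import xml.etree.ElementTree as ET


def unresolved_dataset_refs(xml_text):
    """Return the Dataset references that no earlier ``id`` defines.

    A reference without a ``ref`` attribute is reported as ``None``.
    """
    root = ET.fromstring(xml_text)
    defined = set()
    missing = []
    for chunk in root.findall("data"):
        # Walk the object definitions in document order, so that only
        # keys defined earlier in the file count as known.
        for obj in chunk:
            if obj.tag == "dataset":
                key = obj.get("id")
                if key is not None:
                    defined.add(key)
            else:
                for ref in obj.findall("dataset"):
                    key = ref.get("ref")
                    if key not in defined:
                        missing.append(key)
    return missing
```

Applied to the content of doc/examples/metadata.xml, every reference
should resolve, because all four Datasets are defined before the first
``datasetParameter``.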

diff --git a/MANIFEST.in b/MANIFEST.in
index a7c92f8b..655665c1 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -10,6 +10,7 @@ include doc/examples/icatdump-*.xml
 include doc/examples/icatdump-*.yaml
 include doc/examples/ingest-*.xml
 include doc/examples/metadata-*.xml
+include doc/examples/metadata.xml
 include doc/icatdata*.xsd
 include doc/man/*
 include doc/tutorial/*.py
diff --git a/doc/examples/metadata.xml b/doc/examples/metadata.xml
new file mode 100644
index 00000000..121b0432
--- /dev/null
+++ b/doc/examples/metadata.xml
@@ -0,0 +1,94 @@
+<?xml version='1.0' encoding='UTF-8'?>
+<icatingest version="1.1">
+  <head>
+    <date>2024-02-02T12:52:00+01:00</date>
+    <generator>metadata-writer 0.28</generator>
+  </head>
+  <data>
+    <dataset id="Dataset_1">
+      <name>e202553</name>
+      <description>Dy01Cp02 at 2.7 K</description>
+      <startDate>2020-09-30T18:02:17+02:00</startDate>
+      <endDate>2020-09-30T20:18:36+02:00</endDate>
+      <sample name="ab3465"/>
+      <datasetInstruments>
+        <instrument pid="DOI:00.0815/inst-00001"/>
+      </datasetInstruments>
+      <datasetTechniques>
+        <technique pid="PaNET:PaNET01217"/>
+      </datasetTechniques>
+    </dataset>
+    <dataset id="Dataset_2">
+      <name>e202554</name>
+      <description>Dy01Cp02 at 5.1 K</description>
+      <startDate>2020-09-30T20:29:19+02:00</startDate>
+      <endDate>2020-09-30T21:23:49+02:00</endDate>
+      <sample name="ab3465"/>
+      <datasetInstruments>
+        <instrument pid="DOI:00.0815/inst-00001"/>
+      </datasetInstruments>
+      <datasetTechniques>
+        <technique pid="PaNET:PaNET01217"/>
+      </datasetTechniques>
+    </dataset>
+    <dataset id="Dataset_3">
+      <name>e202555</name>
+      <description>Dy01Cp02 at 2.7 K</description>
+      <startDate>2020-09-30T21:35:16+02:00</startDate>
+      <endDate>2020-09-30T23:04:27+02:00</endDate>
+      <sample name="ab3466"/>
+      <datasetInstruments>
+        <instrument pid="DOI:00.0815/inst-00001"/>
+      </datasetInstruments>
+      <datasetTechniques>
+        <technique pid="PaNET:PaNET01217"/>
+      </datasetTechniques>
+    </dataset>
+    <dataset id="Dataset_4">
+      <name>e202556</name>
+      <description>reference</description>
+      <startDate>2020-09-30T23:04:31+02:00</startDate>
+      <endDate>2020-10-01T01:26:07+02:00</endDate>
+      <datasetInstruments>
+        <instrument pid="DOI:00.0815/inst-00001"/>
+      </datasetInstruments>
+      <datasetTechniques>
+        <technique pid="PaNET:PaNET01217"/>
+      </datasetTechniques>
+    </dataset>
+    <datasetParameter>
+      <stringValue>neutron</stringValue>
+      <dataset ref="Dataset_1"/>
+      <type name="Probe"/>
+    </datasetParameter>
+    <datasetParameter>
+      <numericValue>5.3</numericValue>
+      <dataset ref="Dataset_1"/>
+      <type name="Reactor power" units="MW"/>
+    </datasetParameter>
+    <datasetParameter>
+      <numericValue>2.74103</numericValue>
+      <rangeBottom>2.7408</rangeBottom>
+      <rangeTop>2.7414</rangeTop>
+      <dataset ref="Dataset_1"/>
+      <type name="Sample temperature" units="K"/>
+    </datasetParameter>
+    <datasetParameter>
+      <stringValue>neutron</stringValue>
+      <dataset ref="Dataset_2"/>
+      <type name="Probe"/>
+    </datasetParameter>
+    <datasetParameter>
+      <numericValue>5.3</numericValue>
+      <dataset ref="Dataset_2"/>
+      <type name="Reactor power" units="MW"/>
+    </datasetParameter>
+    <datasetParameter>
+      <numericValue>5.1239</numericValue>
+      <rangeBottom>5.1045</rangeBottom>
+      <rangeTop>5.1823</rangeTop>
+      <dataset ref="Dataset_2"/>
+      <type name="Sample temperature" units="K"/>
+    </datasetParameter>
+  </data>
+</icatingest>
diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst
index bf2af389..2c650263 100644
--- a/doc/src/file-icatingest.rst
+++ b/doc/src/file-icatingest.rst
@@ -19,6 +19,9 @@ control the Investigation that newly created Datasets will be added
 to.  It would be rather difficult to control the power of the input
 format if we would use plain ICAT data files for this purpose.
 
+Differences compared to ICAT data XML files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Class :class:`icat.ingest.IngestReader` takes an ``investigation``
 argument.  We will refer to the Investigation given in this argument
 as the *prescribed Investigation* in the following.  The metadata
@@ -55,3 +58,34 @@ Schema Definition (XSD).
 Another change with respect to ICAT data XML files is that the name of
 the root element is ``icatingest`` and that it must have a ``version``
 attrbute.
+
+Example
+~~~~~~~
+
+Consider the following example:
+
+.. literalinclude:: ../examples/metadata.xml
+   :language: xml
+
+This file defines four Datasets with related objects.  All datasets
+have a ``name``, ``description``, ``startDate``, and ``endDate``
+attribute and include a relation with an Instrument and a Technique,
+respectively.
+
+Note that the Datasets have no ``complete`` attribute and no relation
+with Investigation or DatasetType respectively.  All of these are
+added with prescribed values by class
+:class:`icat.ingest.IngestReader`.
+
+Some Datasets relate to Samples: the first two Datasets relate to the
+same Sample, the third Dataset to another Sample, while the last
+Dataset has no relation with any Sample.  All Samples a referenced by
+their name.  Class :class:`icat.ingest.IngestReader` will add a
+reference to the Investigation to this, so that only Samples that are
+related to prescribed Investigation can actually be referenced.
+
+Some DatasetParameters are added as separate objects in the file.  They
+respectively reference their related Datasets using local keys that
+are defined in the ``id`` attribute of the corresponding Dataset
+earlier in the file.  Alternatively, the DatasetParameters could have
+been included into the respective Datasets.

From 634a4163b344acfd8360a3a274a479167fcf097b Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Mon, 12 Feb 2024 12:01:05 +0100
Subject: [PATCH 32/43] Language fixes in the documentation

---
 doc/src/file-icatdata.rst   | 12 ++++++------
 doc/src/file-icatingest.rst | 10 +++++-----
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
index 73e84f3e..878c87f6 100644
--- a/doc/src/file-icatdata.rst
+++ b/doc/src/file-icatdata.rst
@@ -6,7 +6,7 @@ ICAT data files
 ICAT data files provide a way to serialize ICAT content to a flat
 file.  These files are read by the :ref:`icatingest` and written by
 the :ref:`icatdump` command line scripts respectively.  The program
-logic for reading and writing the files is provided by the
+logic for reading and writing the files is provided in the
 :mod:`icat.dumpfile` module.
 
 The actual file format depends on the version of the ICAT schema and
@@ -62,13 +62,13 @@ corresponding Grouping objects.
 References to ICAT objects and unique keys
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-References to related objects are encoded in ICAT data files by
+References to related objects are encoded in ICAT data files with
 reference keys.  There are two kinds of those keys, local keys and
 unique keys:
 
 When an ICAT object is defined in the file, it generally defines a
 local key at the same time.  Local keys are stored in the object index
-and may be used to reference this object from other obejcts in the
+and may be used to reference this object from other objects in the
 same data chunk.
 
 Unique keys can be obtained from an object by calling
@@ -149,7 +149,7 @@ Note that the UserGroup does not include its relation with Grouping.
 The latter relationship is implied by the parent relation of the
 object in the file.
 
-As an alternative, the Usergroup could have been added to the file as
+As an alternative, the UserGroup could have been added to the file as
 separate object as direct subelement of ``data``:
 
 .. code-block:: XML
@@ -262,7 +262,7 @@ that these UserGroups include their relation to the User, but not
 their relation with Grouping.  The latter relationship is implied by
 the parent relation of the object in the file.
 
-As an alternative, in the present example, the Usergroups could have
+As an alternative, in the present example, the UserGroups could have
 been added to the file as separate objects as in:
 
 .. code-block:: YAML
@@ -292,7 +292,7 @@ entity types in order to make sure that referenced objects are created
 before any object that may reference them.
 
 
-.. [#dc] There is one exception: DataCollections don't have a
+.. [#dc] There is one exception: DataCollection doesn't have a
          uniqueness constraint and can't reliably be searched by
          attributes.  Therefore local keys for DataCollections are
          always kept in the object index and may be used to reference
diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst
index 2c650263..20b853f2 100644
--- a/doc/src/file-icatingest.rst
+++ b/doc/src/file-icatingest.rst
@@ -4,8 +4,8 @@ Metadata ingest files
 =====================
 
 Metadata ingest files are the input format for class
-:class:`icat.ingest.IngestReader`.  This class is intended to be uesd
-in scripts that read the metadata created by experimments into ICAT.
+:class:`icat.ingest.IngestReader`.  This class is intended to be used
+in scripts that read the metadata created by experiments into ICAT.
 The file format is basically a restricted version of
 :ref:`ICAT-data-xml-files`.
 
@@ -57,7 +57,7 @@ Schema Definition (XSD).
 
 Another change with respect to ICAT data XML files is that the name of
 the root element is ``icatingest`` and that it must have a ``version``
-attrbute.
+attribute.
 
 Example
 ~~~~~~~
@@ -79,8 +79,8 @@ added with prescribed values by class
 
 Some Datasets relate to Samples: the first two Datasets relate to the
 same Sample, the third Dataset to another Sample, while the last
-Dataset has no relation with any Sample.  All Samples a referenced by
-their name.  Class :class:`icat.ingest.IngestReader` will add a
+Dataset has no relation with any Sample.  All Samples are referenced
+by their name.  Class :class:`icat.ingest.IngestReader` will add a
 reference to the Investigation to this, so that only Samples that are
 related to prescribed Investigation can actually be referenced.
 

From 987f22ed0c53df433e7406d7d966c5592109ebb0 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Mon, 12 Feb 2024 12:19:22 +0100
Subject: [PATCH 33/43] Documentation fix: also the relation to DatasetType is
 added by IngestReader

---
 doc/src/file-icatingest.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst
index 20b853f2..4ba46517 100644
--- a/doc/src/file-icatingest.rst
+++ b/doc/src/file-icatingest.rst
@@ -37,9 +37,10 @@ ways:
 * the attributes in the object definitions for Datasets are restricted
   to name, description, startDate, and endDate.
 
-* object definitions for Datasets can not include a reference to the
-  related Investigation.  The relation with the prescribed
-  Investigation will be implied.
+* object definitions for Datasets can not include references to the
+  related Investigation or DatasetType.  These relations will be added
+  by :class:`icat.ingest.IngestReader`.  The relation to the
+  Investigation will be set to the prescribed Investigation.
 
 * object definitions for Datasets can reference a related Sample only
   by name or by pid.  A relation of the related Sample with the

From 21235fadd3be2ae519b358f3d6e4aa923979149f Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Mon, 12 Feb 2024 13:45:56 +0100
Subject: [PATCH 34/43] - add a note on the versioning to metadata ingest file
 documentation - move the versionchanged note about adding icatingest 1.1 from
   documentation on module ingest to the metadata ingest file page

---
 doc/src/file-icatingest.rst | 10 ++++++++++
 doc/src/ingest.rst          |  3 ---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst
index 4ba46517..22c77814 100644
--- a/doc/src/file-icatingest.rst
+++ b/doc/src/file-icatingest.rst
@@ -19,6 +19,16 @@ control the Investigation that newly created Datasets will be added
 to.  It would be rather difficult to control the power of the input
 format if we would use plain ICAT data files for this purpose.
 
+.. note::
+   The metadata ingest file format is versioned.  This version number
+   is independent from the python-icat version.  It is incremented
+   only when the format changes.  The latest version of the metadata
+   ingest file format is 1.1.
+
+.. versionchanged:: 1.2.0
+   add metadata ingest file format version 1.1: add support for
+   relating Datasets with Samples.
+
 Differences compared to ICAT data XML files
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/doc/src/ingest.rst b/doc/src/ingest.rst
index 72eeb07a..ab6db393 100644
--- a/doc/src/ingest.rst
+++ b/doc/src/ingest.rst
@@ -52,9 +52,6 @@ reference to a ``Sample``.  That ``Sample`` objects needs to exist
 beforehand and needs to be related to the same ``Investigation`` as
 the ``Dataset``.
 
-.. versionchanged:: 1.2.0
-   add version 1.1 of the ingest file format, including references to samples
-
 .. autoclass:: icat.ingest.IngestReader
     :members:
     :show-inheritance:

From c3d360b997e25700228fc773e4d9b0625108208b Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Mon, 12 Feb 2024 14:07:53 +0100
Subject: [PATCH 35/43] Update documentation for module icat.ingest taking into
 account the new file format documentation

---
 doc/src/ingest.rst | 31 +++++++++----------------------
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/doc/src/ingest.rst b/doc/src/ingest.rst
index ab6db393..9ab94740 100644
--- a/doc/src/ingest.rst
+++ b/doc/src/ingest.rst
@@ -11,7 +11,7 @@
    even in minor releases of python-icat.
 
 This module provides class :class:`icat.ingest.IngestReader` that
-reads metadata from an XML file to add them to ICAT.  It is designed
+reads :ref:`ICAT-ingest-files` to add them to ICAT.  It is designed
 for the use case of ingesting metadata for datasets created during
 experiments.
 
@@ -21,22 +21,14 @@ that base class in restricting the vocabular of the input file: only
 objects that need to be created during ingestion from the experiment
 may appear in the input.  This restriction is enforced by first
 validating the input against an XML Schema Definition (XSD).  In a
-second step, the input is transformed into generic XML :ref:`ICAT data
-file <ICAT-data-files>` format using an XSL Transformation (XSLT) and
-then fed into :class:`~icat.dumpfile_xml.XMLDumpFileReader`.  The
-format of the input files may be customized to some extent by providing
-custom versions of XSD and XSLT files, see :ref:`ingest-customize`
-below.
-
-The input accepted by :class:`~icat.ingest.IngestReader` consists of
-one or more ``Dataset`` objects that all need to relate to the same
-``Investigation`` and any number of related ``DatasetTechnique``,
-``DatasetInstrument``, and ``DatasetParameter`` objects.  The
-``Investigation`` must exist beforehand in ICAT.  The relation from
-the ``Dataset`` objects to the ``Investigation`` will be set by
-:class:`~icat.ingest.IngestReader` accordingly.  (Actually, the XSLT
-will add that attribute to the datasets in the input.)  The
-``Dataset`` objects will not be created by
+second step, the input is transformed into generic :ref:`ICAT data XML
+file format <ICAT-data-xml-files>` using an XSL Transformation (XSLT)
+and then fed into :class:`~icat.dumpfile_xml.XMLDumpFileReader`.  The
+format of the input files may be customized to some extent by
+providing custom versions of XSD and XSLT files, see
+:ref:`ingest-customize` below.
+
+The ``Dataset`` objects in the input will not be created by
 :class:`~icat.ingest.IngestReader`, because it is assumed that a
 separate workflow in the caller will copy the content of datafiles to
 the storage managed by IDS and create the corresponding ``Dataset``
@@ -47,11 +39,6 @@ of the datasets will be read from the input file and set in the
 ``DatasetTechnique``, ``DatasetInstrument`` and ``DatasetParameter``
 objects read from the input file in ICAT.
 
-Using ingest file format 1.1, ``Dataset`` objects may also include a
-reference to a ``Sample``.  That ``Sample`` objects needs to exist
-beforehand and needs to be related to the same ``Investigation`` as
-the ``Dataset``.
-
 .. autoclass:: icat.ingest.IngestReader
     :members:
     :show-inheritance:

From f2b9657153cbb87d66bf7bfaa272f3fd89466d5e Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Mon, 12 Feb 2024 14:15:12 +0100
Subject: [PATCH 36/43] Another language fix

---
 doc/src/file-icatingest.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst
index 22c77814..9794ba75 100644
--- a/doc/src/file-icatingest.rst
+++ b/doc/src/file-icatingest.rst
@@ -15,8 +15,8 @@ problem is that this file format is too powerful: it can encode any
 ICAT content.  We want the ingest files from the experiment to create
 new Datasets and DatasetParameters, we certainly don't want these
 files to create new Instruments or Users in ICAT.  And we also want to
-control the Investigation that newly created Datasets will be added
-to.  It would be rather difficult to control the power of the input
+control to which Investigation newly created Datasets are going to be
+added.  It would be rather difficult to control the power of the input
 format if we would use plain ICAT data files for this purpose.
 
 .. note::

From 9ad05b5962c3279a4e6be85e56c96337980ab242 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Mon, 12 Feb 2024 14:22:56 +0100
Subject: [PATCH 37/43] Yet another language fix

---
 doc/src/file-icatingest.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst
index 9794ba75..7348259f 100644
--- a/doc/src/file-icatingest.rst
+++ b/doc/src/file-icatingest.rst
@@ -93,7 +93,7 @@ same Sample, the third Dataset to another Sample, while the last
 Dataset has no relation with any Sample.  All Samples are referenced
 by their name.  Class :class:`icat.ingest.IngestReader` will add a
 reference to the Investigation to this, so that only Samples that are
-related to prescribed Investigation can actually be referenced.
+related to the prescribed Investigation can actually be referenced.
 
 Some DatasetParameter are added as separate objects in the file.  They
 respectively reference their related Datasets using local keys that

From 3eefef5a94405e290705a9f89aa2cb208976d7a5 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Mon, 12 Feb 2024 14:41:22 +0100
Subject: [PATCH 38/43] Add link anchors to the entries for each version in the
 changelog in order to provide more stable permalinks

---
 CHANGES.rst | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
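
Reviewer note (commentary only, not part of the commit): the labels added
below appear to follow one convention, the version string with dots replaced
by underscores and prefixed with ``changes-``.  A minimal sketch of that
mapping, assuming the convention holds for every entry; the helper name
``changelog_label`` is hypothetical and does not exist in python-icat:

```python
def changelog_label(version):
    """Return the changelog anchor label for a release version.

    Hypothetical helper, inferred from the labels in this patch:
    dots become underscores and the result is prefixed with 'changes-'.
    """
    return "changes-" + version.replace(".", "_")


if __name__ == "__main__":
    # Print the label for a few of the versions listed in CHANGES.rst.
    for v in ("1.3.0", "0.20.1", "0.1.0"):
        print(v, changelog_label(v))
```

Since each label sits directly before a section title, other pages can refer
to an entry with the standard Sphinx role, for example
``:ref:`changes-1_2_0```.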

diff --git a/CHANGES.rst b/CHANGES.rst
index 1744b6ef..a152e835 100644
--- a/CHANGES.rst
+++ b/CHANGES.rst
@@ -2,6 +2,8 @@ Changelog
 =========
 
 
+.. _changes-1_3_0:
+
 1.3.0 (not yet released)
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -36,6 +38,8 @@ Bug fixes and minor changes
 .. _#147: https://github.com/icatproject/python-icat/pull/147
 
 
+.. _changes-1_2_0:
+
 1.2.0 (2023-10-31)
 ~~~~~~~~~~~~~~~~~~
 
@@ -84,6 +88,8 @@ Bug fixes and minor changes
 .. _#140: https://github.com/icatproject/python-icat/pull/140
 
 
+.. _changes-1_1_0:
+
 1.1.0 (2023-06-30)
 ~~~~~~~~~~~~~~~~~~
 
@@ -139,6 +145,8 @@ Bug fixes and minor changes
 .. _#129: https://github.com/icatproject/python-icat/pull/129
 
 
+.. _changes-1_0_0:
+
 1.0.0 (2022-12-21)
 ~~~~~~~~~~~~~~~~~~
 
@@ -231,6 +239,8 @@ Bug fixes and minor changes
 .. _#106: https://github.com/icatproject/python-icat/pull/106
 
 
+.. _changes-0_21_0:
+
 0.21.0 (2022-01-28)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -249,6 +259,8 @@ New features
 .. _#100: https://github.com/icatproject/python-icat/pull/100
 
 
+.. _changes-0_20_1:
+
 0.20.1 (2021-11-04)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -260,6 +272,8 @@ Bug fixes and minor changes
 .. _#96: https://github.com/icatproject/python-icat/pull/96
 
 
+.. _changes-0_20_0:
+
 0.20.0 (2021-10-29)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -296,6 +310,8 @@ Bug fixes and minor changes
 .. _#95: https://github.com/icatproject/python-icat/pull/95
 
 
+.. _changes-0_19_0:
+
 0.19.0 (2021-07-20)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -324,6 +340,8 @@ Bug fixes and minor changes
 .. _#85: https://github.com/icatproject/python-icat/pull/85
 
 
+.. _changes-0_18_1:
+
 0.18.1 (2021-04-13)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -341,6 +359,8 @@ Bug fixes and minor changes
 .. _#82: https://github.com/icatproject/python-icat/pull/82
 
 
+.. _changes-0_18_0:
+
 0.18.0 (2021-03-29)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -377,6 +397,8 @@ Bug fixes and minor changes
 .. _#80: https://github.com/icatproject/python-icat/pull/80
 
 
+.. _changes-0_17_0:
+
 0.17.0 (2020-04-30)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -468,6 +490,8 @@ Misc
 .. _#72: https://github.com/icatproject/python-icat/issues/72
 
 
+.. _changes-0_16_0:
+
 0.16.0 (2019-09-26)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -492,6 +516,8 @@ Bug fixes and minor changes
 .. _#60: https://github.com/icatproject/python-icat/pull/60
 
 
+.. _changes-0_15_1:
+
 0.15.1 (2019-07-12)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -513,6 +539,8 @@ Bug fixes and minor changes
 .. _#57: https://github.com/icatproject/python-icat/issues/57
 
 
+.. _changes-0_15_0:
+
 0.15.0 (2019-03-27)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -551,6 +579,8 @@ Bug fixes and minor changes
 .. _#54: https://github.com/icatproject/python-icat/issues/54
 
 
+.. _changes-0_14_2:
+
 0.14.2 (2018-10-25)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -563,6 +593,8 @@ Bug fixes and minor changes
   probably not need it.
 
 
+.. _changes-0_14_1:
+
 0.14.1 (2018-06-05)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -573,6 +605,8 @@ Bug fixes and minor changes
   for the Write API call.
 
 
+.. _changes-0_14_0:
+
 0.14.0 (2018-06-01)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -628,6 +662,8 @@ Bug fixes and minor changes
 .. _#48: https://github.com/icatproject/python-icat/issues/48
 
 
+.. _changes-0_13_1:
+
 0.13.1 (2017-07-12)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -640,6 +676,8 @@ Bug fixes and minor changes
 .. _#38: https://github.com/icatproject/python-icat/issues/38
 
 
+.. _changes-0_13_0:
+
 0.13.0 (2017-06-09)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -798,6 +836,8 @@ Bug fixes and minor changes
 .. _pytest-dependency: https://pypi.python.org/pypi/pytest_dependency/
 
 
+.. _changes-0_12_0:
+
 0.12.0 (2016-10-10)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -837,6 +877,8 @@ Bug fixes and minor changes
 .. _#28: https://github.com/icatproject/python-icat/issues/28
 
 
+.. _changes-0_11_0:
+
 0.11.0 (2016-06-01)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -896,6 +938,8 @@ Misc
 .. _distutils_pytest: https://github.com/RKrahl/distutils-pytest
 
 
+.. _changes-0_10_0:
+
 0.10.0 (2015-12-06)
 ~~~~~~~~~~~~~~~~~~~
 
@@ -964,6 +1008,8 @@ Bug fixes and minor changes
 .. _#15: https://github.com/icatproject/python-icat/issues/15
 
 
+.. _changes-0_9_0:
+
 0.9.0 (2015-08-13)
 ~~~~~~~~~~~~~~~~~~
 
@@ -1067,6 +1113,8 @@ Bug fixes and minor changes
 .. _#10: https://github.com/icatproject/python-icat/issues/10
 
 
+.. _changes-0_8_0:
+
 0.8.0 (2015-05-08)
 ~~~~~~~~~~~~~~~~~~
 
@@ -1156,6 +1204,8 @@ Bug fixes and minor changes
   :meth:`icat.query.Query.__repr__`.
 
 
+.. _changes-0_7_0:
+
 0.7.0 (2015-02-11)
 ~~~~~~~~~~~~~~~~~~
 
@@ -1187,6 +1237,8 @@ New features
   :meth:`icat.ids.IDSClient.getLink` method.
 
 
+.. _changes-0_6_0:
+
 0.6.0 (2014-12-15)
 ~~~~~~~~~~~~~~~~~~
 
@@ -1314,6 +1366,8 @@ Minor changes and fixes
 + Add comparison operators to class :class:`icat.listproxy.ListProxy`.
 
 
+.. _changes-0_5_1:
+
 0.5.1 (2014-07-07)
 ~~~~~~~~~~~~~~~~~~
 
@@ -1357,6 +1411,8 @@ Minor changes and fixes
   modifications, such as running 2to3 on them.
 
 
+.. _changes-0_5_0:
+
 0.5.0 (2014-06-24)
 ~~~~~~~~~~~~~~~~~~
 
@@ -1399,6 +1455,8 @@ Minor changes and fixes
 .. __: https://github.com/icatproject/icat.server/issues/112
 
 
+.. _changes-0_4_0:
+
 0.4.0 (2014-02-11)
 ~~~~~~~~~~~~~~~~~~
 
@@ -1446,6 +1504,8 @@ Minor changes and fixes
   :ref:`icatrestore <icatingest>`.
 
 
+.. _changes-0_3_0:
+
 0.3.0 (2014-01-10)
 ~~~~~~~~~~~~~~~~~~
 
@@ -1492,6 +1552,8 @@ Minor changes and fixes
 + Add example scripts :ref:`icatdump` and :ref:`icatrestore <icatingest>`.
 
 
+.. _changes-0_2_0:
+
 0.2.0 (2013-11-18)
 ~~~~~~~~~~~~~~~~~~
 
@@ -1532,6 +1594,8 @@ Minor changes and fixes
   import :mod:`icat` and :mod:`icat.config`.
 
 
+.. _changes-0_1_0:
+
 0.1.0 (2013-11-01)
 ~~~~~~~~~~~~~~~~~~
 

From f1f2b73fe933898a2ed8f6dca5dfa861a9fa2c3d Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Mon, 19 Feb 2024 15:51:29 +0100
Subject: [PATCH 39/43] Dynamically create a file _meta.rst in the
 documentation source that defines substitutions and download links for the
 latest source distribution and signature file

---
 doc/.gitignore  |  1 +
 doc/Makefile    |  1 +
 doc/src/conf.py | 29 ++++++++++++++++++++++++++++-
 3 files changed, 30 insertions(+), 1 deletion(-)
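
Reviewer note (commentary only, not part of the commit): to make the effect
of ``make_meta_rst()`` easier to see, here is a minimal sketch that builds
the same text but returns it instead of writing ``_meta.rst``.  The function
name ``render_meta_rst`` and the release string ``"1.2.0"`` are assumptions
chosen for illustration; the template and URL layout are copied from the
diff below.

```python
GITHUB_REPO = "https://github.com/icatproject/python-icat"

# Same template as in doc/src/conf.py in this patch.
TEMPLATE = """:orphan:

.. |distribution_source| replace:: %(dist_src_name)s
.. |distribution_signature| replace:: %(dist_sig_name)s
.. _distribution_source: %(dist_src_url)s
.. _distribution_signature: %(dist_sig_url)s
"""


def render_meta_rst(last_release):
    """Return the content of _meta.rst for a release (hypothetical helper)."""
    dist_src_name = "python-icat-%s.tar.gz" % last_release
    dist_sig_name = "python-icat-%s.tar.gz.asc" % last_release
    # Release assets live below releases/download/<tag>/ on GitHub.
    base_url = "%s/releases/download/%s" % (GITHUB_REPO, last_release)
    subst = {
        'dist_src_name': dist_src_name,
        'dist_src_url': "%s/%s" % (base_url, dist_src_name),
        'dist_sig_name': dist_sig_name,
        'dist_sig_url': "%s/%s" % (base_url, dist_sig_name),
    }
    return TEMPLATE % subst


if __name__ == "__main__":
    print(render_meta_rst("1.2.0"))
```

A page that includes ``_meta.rst`` can then write ``|distribution_source|_``
to get the file name as link text pointing at the download URL.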

diff --git a/doc/.gitignore b/doc/.gitignore
index e938dd2d..b6a292cd 100644
--- a/doc/.gitignore
+++ b/doc/.gitignore
@@ -1,3 +1,4 @@
+/src/_meta.rst
 /devhelp/
 /dirhtml/
 /doctest/
diff --git a/doc/Makefile b/doc/Makefile
index 9cc7cebc..7358c71a 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -20,6 +20,7 @@ $(BUILDERS): $(STATIC_SOURCEDIRS)
 
 distclean:
 	rm -rf doctrees $(BUILDERS)
+	rm -f src/_meta.rst
 
 $(STATIC_SOURCEDIRS):
 	mkdir $@
diff --git a/doc/src/conf.py b/doc/src/conf.py
index 2f880389..1496d62a 100644
--- a/doc/src/conf.py
+++ b/doc/src/conf.py
@@ -9,7 +9,8 @@
 from pathlib import Path
 import sys
 
-maindir = Path(__file__).resolve().parent.parent.parent
+docsrcdir = Path(__file__).resolve().parent
+maindir = docsrcdir.parent.parent
 buildlib = maindir / "build" / "lib"
 sys.path[0] = str(buildlib)
 sys.dont_write_bytecode = True
@@ -28,6 +29,32 @@
 # The short X.Y version
 version = ".".join(release.split(".")[0:2])
 
+# Write a _meta.rst that defines some custom substitutions
+def make_meta_rst(last_release):
+    template = """:orphan:
+
+.. |distribution_source| replace:: %(dist_src_name)s
+.. |distribution_signature| replace:: %(dist_sig_name)s
+.. _distribution_source: %(dist_src_url)s
+.. _distribution_signature: %(dist_sig_url)s
+"""
+    github_repo = "https://github.com/icatproject/python-icat"
+    dist_src_name = "python-icat-%s.tar.gz" % last_release
+    dist_src_url = ("%s/releases/download/%s/%s"
+                    % (github_repo, last_release, dist_src_name))
+    dist_sig_name = "python-icat-%s.tar.gz.asc" % last_release
+    dist_sig_url = ("%s/releases/download/%s/%s"
+                    % (github_repo, last_release, dist_sig_name))
+    subst = {
+        'dist_src_name': dist_src_name,
+        'dist_src_url': dist_src_url,
+        'dist_sig_name': dist_sig_name,
+        'dist_sig_url': dist_sig_url,
+    }
+    with (docsrcdir / '_meta.rst').open('wt') as f:
+        print(template % subst, file=f)
+
+make_meta_rst(icat._meta.release)
 
 # -- General configuration ---------------------------------------------------
 

From 8f35940836c76ef500a465ca45b73163194fbe9e Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Mon, 19 Feb 2024 17:26:27 +0100
Subject: [PATCH 40/43] Review install instructions, explaining how to verify
 the signature

---
 doc/src/83F336432C7FCC91.pub | 44 ++++++++++++++++++++++++++++++++
 doc/src/install.rst          | 49 ++++++++++++++++++++++++++++--------
 2 files changed, 83 insertions(+), 10 deletions(-)
 create mode 100644 doc/src/83F336432C7FCC91.pub
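
Reviewer note (commentary only, not part of the commit): the instructions
below show the key once as gpg prints it (40 hex digits without spaces) and
once as a fingerprint in groups of four.  A small sketch, using only the two
strings quoted in this patch, that checks both spellings name the same key;
the helper name ``normalize_fingerprint`` is hypothetical:

```python
def normalize_fingerprint(fpr):
    """Drop whitespace and an optional 0x prefix, upper-case the hex digits."""
    compact = "".join(fpr.split()).upper()
    if compact.startswith("0X"):
        compact = compact[2:]
    return compact


# Key as reported in the sample "gpg --verify" output.
GPG_OUTPUT_KEY = "760465DAF652737A61EC0C9D83F336432C7FCC91"
# Fingerprint as listed in the install instructions.
DOCUMENTED_FPR = "7604 65DA F652 737A 61EC  0C9D 83F3 3643 2C7F CC91"


if __name__ == "__main__":
    same = (normalize_fingerprint(GPG_OUTPUT_KEY)
            == normalize_fingerprint(DOCUMENTED_FPR))
    print("fingerprints match:", same)
    # The public key file is named after the last 16 hex digits.
    print("long key ID:", normalize_fingerprint(DOCUMENTED_FPR)[-16:])
```

Comparing the full fingerprint rather than a short key ID is the safer check
when verifying a download by hand.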

diff --git a/doc/src/83F336432C7FCC91.pub b/doc/src/83F336432C7FCC91.pub
new file mode 100644
index 00000000..330f2f80
--- /dev/null
+++ b/doc/src/83F336432C7FCC91.pub
@@ -0,0 +1,44 @@
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+
+mQENBFE3WkEBCADM4jKAQMsVlnU5NxbJ5JmpqhPRj54eSkDcvIjPcEQLkMmQjCDT
+HHwN5ZjzHNTj7nXkvmjjWMgyzjpNmdUAofsh6MBp1etXNzYNkoEs+urRlw1wuRaU
+NMK4Pf0G35THrQ0nJdmmCGkzxiTgQTitLVA52zZclq3Vqo/ZsO26gkLB2ErhZJZE
+2q+TL6BBr98m+1zXpG5kqF/IE4pF4Yl1Oysp8imAAbodr+6X1DGfOM2h1NwMSbAo
+Uw49hR4PIwxKP5Sluv6GNUVgyPaOrk8LVE4c+H0lswmz6nZOlxhhbtplN0KViqki
+6pqyrOuwv3ZgzUXO4bjEexScyWe2PxKUzjFFABEBAAG0K1JvbGYgS3JhaGwgPHJv
+bGYua3JhaGxAaGVsbWhvbHR6LWJlcmxpbi5kZT6JATkEEwECACMFAlE3WkECGwMH
+CwkIBwMCAQYVCAIJCgsEFgIDAQIeAQIXgAAKCRCD8zZDLH/Mkcj5CAC0x2GU88xD
+eBR1MyGq9nUTDgjO/EkiztDZirBg1FLGwCVtXY3yZc0nSriEj4oF8lNiGU539rU1
+R+z76UCDTlq/xq2/a1BazStkHuv+OuUfoA/Hl5/Tvp+dwk7BXG6dlyr6joT3i9Pz
+RgH/kFe1RAJNnT/oy5LTRsydcWb/mCey/O/ON47zlKzNbbGvL6YPwmsyaO22vUmO
+JsH4JZM36BDu3Wt2LPB+A51ZanzlxkfA3Mcc0cIe9PsSqufvnV/kG4cQxJedgXes
+lVniggXbtsudl8EqmUpq/yS+/X3BLBfidTA2Yicx6udmR5ZFQHoCrOlcTfylW0mz
+x5rhClZPgrgaiEYEExECAAYFAlQVluoACgkQUcvGPyCdlGaaRgCg0s2cWgUXWeb7
+noexGZNxnmQIMrgAoJqBXBVVrWfd7bwdWT1IEnyGMiCeiEYEExECAAYFAlQVlyYA
+CgkQO0qCjX1HQDs8HACfduvRjIu+wmrvyN+ikPXHN6ZJYOAAni4k+F5m7P9RkUK/
+MPW34JrqaIg8iQIcBBMBAgAGBQJYRG+HAAoJEAihJkF1ND5uePgP/3okgaIQOwcy
+7lN2SiP1k/UxjmqynrdrsTWdGRm+wyJ9Er9WlHgMQavaxk2XOpTQ8DcAuczpNyOb
+qaYI6l+xd8mDvdJ7lbYZboiZj62nb/yUwRAyN3TJ7PRjuWXqLZjVnywQzYN66Z2v
+kuxewEqZUeLVlUcg7IEwwCOErAmHFfYmIER7Q0Hyvc8gdkbFzgQ5UNHyLUngMe+6
+VGLlkoyRykF9DDCmqMQO06Ork78gsTVTHr0LEMG3HyKiQ8rLZouSQS9tiw7RVIji
+nbf1EWRvVwgSXPSsx545uVwUOSyXlozK7AzFxjlFJU8G9+h1fXYlkviFPrsU2vwa
+6q8GiVnaLpwa2QC9iznPTzSnUFh9Eqg8aO4DqpH28L+o5PTClmWUGncqigmYGipm
+2s0AKdtRFVXcz7fmH8JKi9u9dBtJPIbdA3Kq/D6+1GkiS5V0aELWI+0424RJ5qlO
+MHukVUxg0QH/MJnzfRT3MAV5gBpJC5KrijwS7FN8m+CQN35+OMoiBbpOKt/+wQgF
+K31D/M55CZoaeVtkcLiTRjUig2Dwr/16IMd5IcpetNoIcUILDENcWh0mYo02kaJt
+nldsZIAi77goxdgKu41AIIhEv0FmlXp6OB/QoEJRiDOVtxSW7bG1F+JbularecE2
+t5PehBq5k35vxo8tteL1xQIP+8nnOtUJuQENBFE3WkEBCADB84pLmmsdFjV5R+0e
+zL2COBZBUxUPSIuKOdEfHkR5M5AxbXdg9GwxDMZE1TLAdX8sn1ymwUlZt6dSUFO0
+hg0LdZAOMvjvFb6dF+RE7gfeOsH0usTN32NUzW0/S1E2V8LRlplGIXtHa9YZArQw
+k97gpFATheh4K/QHvrIyneVam+B+6WH8zJtBfGmWtjfBLwSiWohQPQAvYBW6hi86
++I3z0yCrOhgM/N9uylgWu+BQzoQ8/Jv2g22bzSa1mbCP1OVp587HpJy9WbX/aKH4
+7I/vp0qLysWekbuX5OOjsiItW2Yv7oK/S7OtoagTUqX3KG1KRTJZHTTS03dy3DME
+fqNtABEBAAGJAR8EGAECAAkFAlE3WkECGwwACgkQg/M2Qyx/zJEJcAgAsE8NNJYX
+/3Vdd9WQih4Xg2Pvz66Z9jwTyS9Rb3boB0gtZMgqsHQBdF9iYNVxREpiVDPA0YKR
+x1iTjFblt9Ryq7MZVPhRI1cfDfHKCw6bMz1hZDBRr1BSZVjiru74OCebreeOMhzI
+zmyP7GSi0q5edZO0zpYkOlme3dQBatSkEAnSDOA9ct6EEMG3ZsQda1YXa9BMKj7e
+B+UdFUdGb5SB8buW5RKLMTD485gKpvxWpYptP5DD3r3mThc2m5uWdiAM+jqm9Flc
+NlD0bZ8tdZpbPOgxnbAuy7HEPaS/VnGZHouwZWpb484dynCO7+Oi1f2y2tPx0uXV
+DRFDDLLR3oBEag==
+=+2H3
+-----END PGP PUBLIC KEY BLOCK-----
diff --git a/doc/src/install.rst b/doc/src/install.rst
index 78fd935d..de15d475 100644
--- a/doc/src/install.rst
+++ b/doc/src/install.rst
@@ -1,11 +1,11 @@
+.. include:: _meta.rst
+
 Install instructions
 ====================
 
-Release packages of python-icat are published in the `Python Package
-Index (PyPI)`__.  See :ref:`install-using-pip` for the short version
-of the install instructions.
+See :ref:`install-using-pip` for the short version of the install
+instructions.
 
-.. __: `PyPI site`_
 
 
 System requirements
@@ -114,26 +114,54 @@ Installation
 Installation using pip
 ......................
 
-You can install python-icat from PyPI using pip::
+You can install python-icat from the
+`Python Package Index (PyPI) <PyPI site_>`_ using pip::
 
   $ pip install python-icat
 
+Note that while installing from PyPI is convenient, there is no way to
+verify the integrity of the source distribution, which may be
+considered a security risk.
+
 Installation from the source distribution
 .........................................
 
 Steps to manually build from the source distribution:
 
-1. Download the sources, unpack, and change into the source directory.
+1. Download the sources.
+
+   From the `Release Page <GitHub latest release_>`_ you may download
+   the source distribution file |distribution_source|_ and the
+   detached signature file |distribution_signature|_.
+
+2. Check the signature (optional).
+
+   You may verify the integrity of the source distribution by checking
+   the signature (showing the output for version 1.2.0 as an example)::
+
+     $ gpg --verify python-icat-1.2.0.tar.gz.asc
+     gpg: assuming signed data in 'python-icat-1.2.0.tar.gz'
+     gpg: Signature made Tue Oct 31 07:01:55 2023 CET
+     gpg:                using RSA key 760465DAF652737A61EC0C9D83F336432C7FCC91
+     gpg: Good signature from "Rolf Krahl <rolf.krahl@helmholtz-berlin.de>" [full]
 
-2. Build::
+   The signature should be made by the key
+   :download:`0x760465DAF652737A61EC0C9D83F336432C7FCC91
+   <83F336432C7FCC91.pub>`.  The fingerprint of that key is::
+
+     7604 65DA F652 737A 61EC  0C9D 83F3 3643 2C7F CC91
+
+3. Unpack and change into the source directory.
+
+4. Build (optional)::
 
      $ python setup.py build
 
-3. Test (optional, see below)::
+5. Test (optional, see below)::
 
      $ python setup.py test
 
-4. Install::
+6. Install::
 
      $ python setup.py install
 
@@ -179,7 +207,6 @@ You can safely run the tests without configuring any test server.  You
 will just get many skipped tests then.
 
 
-.. _PyPI site: https://pypi.org/project/python-icat/
 .. _setuptools: https://github.com/pypa/setuptools/
 .. _packaging: https://github.com/pypa/packaging/
 .. _suds-jurko: https://pypi.org/project/suds-jurko/
@@ -191,5 +218,7 @@ will just get many skipped tests then.
 .. _pytest: https://docs.pytest.org/en/latest/
 .. _pytest-dependency: https://pypi.org/project/pytest-dependency/
 .. _distutils-pytest: https://github.com/RKrahl/distutils-pytest/
+.. _PyPI site: https://pypi.org/project/python-icat/
+.. _GitHub latest release: https://github.com/icatproject/python-icat/releases/latest/
 .. _GitHub repository: https://github.com/icatproject/python-icat/
 .. _Issue #72: https://github.com/icatproject/python-icat/issues/72

From 67b947c31be3111b1610d0a2baf627006b58e557 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Mon, 19 Feb 2024 17:43:45 +0100
Subject: [PATCH 41/43] Fixup 8f35940: need to run doc/src/conf.py before
 doc8-check now

---
 .github/workflows/rst-lint.yaml | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/.github/workflows/rst-lint.yaml b/.github/workflows/rst-lint.yaml
index b9b239f7..9205803a 100644
--- a/.github/workflows/rst-lint.yaml
+++ b/.github/workflows/rst-lint.yaml
@@ -11,6 +11,17 @@ jobs:
     steps:
       - name: Check out repository code
         uses: actions/checkout@v4
+      - name: Set up Python 3.11
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.11"
+      - name: Install dependencies
+        run: |
+          pip install setuptools packaging git-props suds
+      - name: Run conf.py
+        run: |
+          python setup.py build
+          python doc/src/conf.py
       - name: doc8-check
         uses: deep-entertainment/doc8-action@v4
         with:

From 2a827fdcc2820e7d6f2d76477730d081f96119ef Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Mon, 19 Feb 2024 17:50:30 +0100
Subject: [PATCH 42/43] Aesthetic fix for rst-lint action: unshallow the
 checked out repository in order to get the correct version number in the
 diagnostics

---
 .github/workflows/rst-lint.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.github/workflows/rst-lint.yaml b/.github/workflows/rst-lint.yaml
index 9205803a..187ce87c 100644
--- a/.github/workflows/rst-lint.yaml
+++ b/.github/workflows/rst-lint.yaml
@@ -11,6 +11,8 @@ jobs:
     steps:
       - name: Check out repository code
         uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
       - name: Set up Python 3.11
         uses: actions/setup-python@v4
         with:

From 848a4745abd3aeaab680c2dc25b5f4d47c7fe357 Mon Sep 17 00:00:00 2001
From: Rolf Krahl <rolf.krahl@helmholtz-berlin.de>
Date: Mon, 19 Feb 2024 18:09:44 +0100
Subject: [PATCH 43/43] Some tweaks in the install instructions: - Point out
 that a manual install does not automatically install   dependencies, -
 Removed yet another reference of PyPI to get release versions from, - Minor
 formulation fix.

---
 doc/src/install.rst | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/doc/src/install.rst b/doc/src/install.rst
index de15d475..12d0bd41 100644
--- a/doc/src/install.rst
+++ b/doc/src/install.rst
@@ -126,7 +126,9 @@ considered a security risk.
 Installation from the source distribution
 .........................................
 
-Steps to manually build from the source distribution:
+Note that the manual build does not automatically check the
+dependencies.  So we assume that you have all the system requirements
+installed.  Steps to manually build from the source distribution:
 
 1. Download the sources.
 
@@ -172,9 +174,9 @@ Building from development sources
 .................................
 
 For production use, it is always recommended to use the latest release
-version from PyPI, see above.  If you need some not yet released
-bleeding edge feature or if you want to participate in the
-development, you may also clone the `source repository from GitHub`__.
+version, see above.  If you need some not yet released bleeding edge
+feature or if you want to participate in the development, you may also
+clone the `source repository from GitHub`__.
 
 Note that some source files are dynamically created and thus missing
 in the development sources.  If you want to build from the development
@@ -203,8 +205,8 @@ authentication plugin must also have these users configured.
 from the test server and replace it with example content.  Do not
 configure the tests to access a production server!
 
-You can safely run the tests without configuring any test server.  You
-will just get many skipped tests then.
+You can safely run the tests without configuring any test server.  But
+most of the tests will be skipped then.
 
 
 .. _setuptools: https://github.com/pypa/setuptools/