From bba8f5f948821c5afb9355ee596ff644f7d24615 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Wed, 22 Nov 2023 09:33:37 +0100 Subject: [PATCH 01/43] Point to the release page on GitHub instead of PyPI for the download link in the README --- README.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.rst b/README.rst index 1d1b19ca..e32e8ce3 100644 --- a/README.rst +++ b/README.rst @@ -21,10 +21,10 @@ is based on Suds and extends it with ICAT specific features. Download -------- -The latest release version can be found in the -`Python Package Index (PyPI)`__. +The latest release version can be found at the +`release page on GitHub`__. -.. __: `PyPI site`_ +.. __: `GitHub release`_ Documentation @@ -64,6 +64,6 @@ permissions and limitations under the License. .. _ICAT: https://icatproject.org/ -.. _PyPI site: https://pypi.org/project/python-icat/ +.. _GitHub release: https://github.com/icatproject/python-icat/releases/latest .. _Read the Docs site: https://python-icat.readthedocs.io/ .. 
_Apache License: https://www.apache.org/licenses/LICENSE-2.0 From 9be8303b0dd96ac5ef991a2fecab34b3830303fd Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Wed, 22 Nov 2023 09:48:27 +0100 Subject: [PATCH 02/43] Put a full download URL in the spec file --- python-icat.spec | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/python-icat.spec b/python-icat.spec index b0f4292c..2401ab9f 100644 --- a/python-icat.spec +++ b/python-icat.spec @@ -15,7 +15,7 @@ Url: $url Summary: $description License: Apache-2.0 Group: Development/Libraries/Python -Source: %{name}-%{version}.tar.gz +Source: https://github.com/icatproject/python-icat/releases/latest/download/python-icat-%{version}.tar.gz BuildRequires: python%{pyversfx}-base >= 3.4 BuildRequires: python%{pyversfx}-setuptools BuildRequires: fdupes From 3999092882b89d43295a4c6dfff1f60779ce5db3 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Thu, 21 Dec 2023 10:24:58 +0100 Subject: [PATCH 03/43] Set the table of predefined configuration variables in a table directive --- doc/src/_static/css/captions.css | 14 +++++++ doc/src/conf.py | 1 + doc/src/config.rst | 66 ++++++++++++++++---------------- 3 files changed, 49 insertions(+), 32 deletions(-) create mode 100644 doc/src/_static/css/captions.css diff --git a/doc/src/_static/css/captions.css b/doc/src/_static/css/captions.css new file mode 100644 index 00000000..8321eee7 --- /dev/null +++ b/doc/src/_static/css/captions.css @@ -0,0 +1,14 @@ +.rst-content div.figure p.caption, .rst-content table.docutils caption, .rst-content div.code-block-caption{ + color: #404040; + font-style: italic; + font-size: 90%; + line-height: normal; + text-align: left; +} +.rst-content div.figure p.caption span.caption-number, .rst-content table.docutils caption span.caption-number, .rst-content div.code-block-caption span.caption-number{ + font-weight: bold; +} +.rst-content div.code-block-caption a.headerlink, .rst-content table.docutils caption a.headerlink{ + display: none; + 
visibility: hidden; +} diff --git a/doc/src/conf.py b/doc/src/conf.py index f41f3773..a5fca0c9 100644 --- a/doc/src/conf.py +++ b/doc/src/conf.py @@ -109,6 +109,7 @@ html_favicon = "images/favicon-32x32.png" html_css_files = [ + 'css/captions.css', 'css/spacing.css', ] diff --git a/doc/src/config.rst b/doc/src/config.rst index 0f5c42bd..ff706eed 100644 --- a/doc/src/config.rst +++ b/doc/src/config.rst @@ -138,38 +138,40 @@ A few derived variables are also set in (username and password if authenticator information is not available) suitable to be passed to :meth:`icat.client.Client.login`. -The command line arguments, environment variables, and default values -for the configuration variables are as follows: - -+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ -| Name | Command line | Environment | Default | Mandatory | Notes | -+=================+=============================+=======================+================+===========+==============+ -| `configFile` | ``-c``, ``--configfile`` | ``ICAT_CFG`` | depends | no | \(1) | -+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ -| `configSection` | ``-s``, ``--configsection`` | ``ICAT_CFG_SECTION`` | :const:`None` | no | | -+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ -| `url` | ``-w``, ``--url`` | ``ICAT_SERVICE`` | | yes | | -+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ -| `idsurl` | ``--idsurl`` | ``ICAT_DATA_SERVICE`` | :const:`None` | depends | \(2) | -+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ -| `checkCert` | ``--check-certificate``, | | :const:`True` | no | | -| | ``--no-check-certificate`` | | | | | 
-+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ -| `http_proxy` | ``--http-proxy`` | ``http_proxy`` | :const:`None` | no | | -+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ -| `https_proxy` | ``--https-proxy`` | ``https_proxy`` | :const:`None` | no | | -+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ -| `no_proxy` | ``--no-proxy`` | ``no_proxy`` | :const:`None` | no | | -+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ -| `auth` | ``-a``, ``--auth`` | ``ICAT_AUTH`` | | yes | \(3) | -+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ -| `username` | ``-u``, ``--user`` | ``ICAT_USER`` | | yes | \(3),(4) | -+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ -| `password` | ``-p``, ``--pass`` | | interactive | yes | \(3),(4),(5) | -+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ -| `promptPass` | ``-P``, ``--prompt-pass`` | | :const:`False` | no | \(3),(4),(5) | -+-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ - +.. table:: Command line arguments, environment variables, and default values + for the configuration variables. 
+ :name: tab-config-vars + + +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ + | Name | Command line | Environment | Default | Mandatory | Notes | + +=================+=============================+=======================+================+===========+==============+ + | `configFile` | ``-c``, ``--configfile`` | ``ICAT_CFG`` | depends | no | \(1) | + +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ + | `configSection` | ``-s``, ``--configsection`` | ``ICAT_CFG_SECTION`` | :const:`None` | no | | + +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ + | `url` | ``-w``, ``--url`` | ``ICAT_SERVICE`` | | yes | | + +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ + | `idsurl` | ``--idsurl`` | ``ICAT_DATA_SERVICE`` | :const:`None` | depends | \(2) | + +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ + | `checkCert` | ``--check-certificate``, | | :const:`True` | no | | + | | ``--no-check-certificate`` | | | | | + +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ + | `http_proxy` | ``--http-proxy`` | ``http_proxy`` | :const:`None` | no | | + +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ + | `https_proxy` | ``--https-proxy`` | ``https_proxy`` | :const:`None` | no | | + +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ + | `no_proxy` | ``--no-proxy`` | ``no_proxy`` | :const:`None` | no | | + +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ + | `auth` | ``-a``, 
``--auth`` | ``ICAT_AUTH`` | | yes | \(3) | + +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ + | `username` | ``-u``, ``--user`` | ``ICAT_USER`` | | yes | \(3),(4) | + +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ + | `password` | ``-p``, ``--pass`` | | interactive | yes | \(3),(4),(5) | + +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ + | `promptPass` | ``-P``, ``--prompt-pass`` | | :const:`False` | no | \(3),(4),(5) | + +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ + +See the table for an overview of predefined configuration variables. Mandatory means that an error will be raised in :meth:`icat.config.Config.getconfig` if no value is found for the configuration variable in question. From 299c82bc817dc673e7a1ca981b410a9b9d8f7f8c Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Thu, 21 Dec 2023 11:21:14 +0100 Subject: [PATCH 04/43] ReST style fixes: - fix tabulation used for indentation - remove trailing white space --- doc/src/client.rst | 2 +- doc/src/config.rst | 6 +++--- doc/src/icatingest.rst | 6 +++--- doc/src/ingest.rst | 2 +- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/doc/src/client.rst b/doc/src/client.rst index 0f0b99dc..e32dc2df 100644 --- a/doc/src/client.rst +++ b/doc/src/client.rst @@ -29,7 +29,7 @@ manages the interaction with an ICAT service as a client. Version of the ICAT server this client connects to. - .. versionchanged:: 1.0.0 + .. versionchanged:: 1.0.0 changed type to :class:`icat.helper.Version` .. attribute:: autoLogout diff --git a/doc/src/config.rst b/doc/src/config.rst index ff706eed..af597211 100644 --- a/doc/src/config.rst +++ b/doc/src/config.rst @@ -62,8 +62,8 @@ added. The main class that client programs interact with is .. 
attribute:: client The :class:`icat.client.Client` object initialized according to - the configuration. This is also the first element in the - return value from :meth:`getconfig`. + the configuration. This is also the first element in the + return value from :meth:`getconfig`. .. attribute:: client_kwargs @@ -139,7 +139,7 @@ A few derived variables are also set in available) suitable to be passed to :meth:`icat.client.Client.login`. .. table:: Command line arguments, environment variables, and default values - for the configuration variables. + for the configuration variables. :name: tab-config-vars +-----------------+-----------------------------+-----------------------+----------------+-----------+--------------+ diff --git a/doc/src/icatingest.rst b/doc/src/icatingest.rst index 7cba2199..62e571cb 100644 --- a/doc/src/icatingest.rst +++ b/doc/src/icatingest.rst @@ -71,12 +71,12 @@ The following options are specific to icatingest: **CHECK** Compare all attributes from the input object with the already - existing object in ICAT. Throw an error of any attribute - differs. + existing object in ICAT. Throw an error of any attribute + differs. **OVERWRITE** Overwrite the existing object in ICAT, e.g. update it with all - attributes set to the values found in the input object. + attributes set to the values found in the input object. 
If :option:`--upload-datafiles` is set, this option will be ignored for Datafile objects which will then always raise an error diff --git a/doc/src/ingest.rst b/doc/src/ingest.rst index 4fed8b7e..a2880030 100644 --- a/doc/src/ingest.rst +++ b/doc/src/ingest.rst @@ -121,7 +121,7 @@ class attributes as follows:: import icat.ingest class MyFacilityIngestReader(icat.ingest.IngestReader): - + # Override the directory to search for XSD and XSLT files: SchemaDir = Path("/usr/share/icat/my-facility") From 1b27f82fe3607d917bed830b7a92fdb2a8d49a72 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Thu, 21 Dec 2023 11:24:13 +0100 Subject: [PATCH 05/43] Set the Synopsis section in man pages as line blocks --- doc/src/icatdump.rst | 2 +- doc/src/icatingest.rst | 3 ++- doc/src/wipeicat.rst | 2 +- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/doc/src/icatdump.rst b/doc/src/icatdump.rst index 6e7d0caf..0023fca3 100644 --- a/doc/src/icatdump.rst +++ b/doc/src/icatdump.rst @@ -7,7 +7,7 @@ icatdump Synopsis ~~~~~~~~ -**icatdump** [*standard options*] [-o FILE] [-f FORMAT] +| **icatdump** [*standard options*] [-o FILE] [-f FORMAT] Description diff --git a/doc/src/icatingest.rst b/doc/src/icatingest.rst index 62e571cb..c260d468 100644 --- a/doc/src/icatingest.rst +++ b/doc/src/icatingest.rst @@ -7,7 +7,8 @@ icatingest Synopsis ~~~~~~~~ -**icatingest** [*standard options*] [-i FILE] [-f FORMAT] [--upload-datafiles] [--datafile-dir DATADIR] [--duplicate OPTION] +| **icatingest** [*standard options*] [-i FILE] [-f FORMAT] +| [--upload-datafiles] [--datafile-dir DATADIR] [--duplicate OPTION] Description diff --git a/doc/src/wipeicat.rst b/doc/src/wipeicat.rst index 89567684..1c1ca4cd 100644 --- a/doc/src/wipeicat.rst +++ b/doc/src/wipeicat.rst @@ -7,7 +7,7 @@ wipeicat Synopsis ~~~~~~~~ -**wipeicat** [*options*] +| **wipeicat** [*options*] Description From d1d0f385f7d64f05c70534d3de14c8af1d987287 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Thu, 21 Dec 2023 11:47:07 +0100 
Subject: [PATCH 06/43] Add GitHub action to check ReST input files --- .github/workflows/rst-lint.yaml | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 .github/workflows/rst-lint.yaml diff --git a/.github/workflows/rst-lint.yaml b/.github/workflows/rst-lint.yaml new file mode 100644 index 00000000..b5e7c2fe --- /dev/null +++ b/.github/workflows/rst-lint.yaml @@ -0,0 +1,12 @@ +name: Check ReST input files +on: [push, pull_request] +jobs: + doc8: + runs-on: ubuntu-latest + steps: + - name: Check out repository code + uses: actions/checkout@v4 + - name: doc8-check + uses: deep-entertainment/doc8-action@v4 + with: + scanPaths: "doc/src" From cec25957e021d341bff3a460156748c581d77771 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Fri, 22 Dec 2023 11:09:57 +0100 Subject: [PATCH 07/43] Drop version constraint on Sphinx in RtD requirements, i.e. update the Sphinx version used for building the documentation --- .rtd-require | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/.rtd-require b/.rtd-require index 2de815cd..2972d516 100644 --- a/.rtd-require +++ b/.rtd-require @@ -4,6 +4,4 @@ packaging setuptools setuptools_scm suds -jinja2<3.1 -sphinx>=2,<3 -sphinx-rtd-theme>=0.5,<1 +sphinx_rtd_theme From 4306390a8f27b96a4ae33fba0cb93c2bae7ea271 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Fri, 22 Dec 2023 11:44:28 +0100 Subject: [PATCH 08/43] Add sphinx_copybutton extension --- .rtd-require | 1 + doc/src/conf.py | 1 + 2 files changed, 2 insertions(+) diff --git a/.rtd-require b/.rtd-require index 2972d516..99c132bb 100644 --- a/.rtd-require +++ b/.rtd-require @@ -4,4 +4,5 @@ packaging setuptools setuptools_scm suds +sphinx-copybutton sphinx_rtd_theme diff --git a/doc/src/conf.py b/doc/src/conf.py index a5fca0c9..a75c5c52 100644 --- a/doc/src/conf.py +++ b/doc/src/conf.py @@ -40,6 +40,7 @@ extensions = [ 'sphinx.ext.autodoc', 'sphinx.ext.intersphinx', + 'sphinx_copybutton', ] # Add any paths that contain templates here,
relative to this directory. From fafaa64ce9a89f6f50a5b4a0257fe0ed70bf27ef Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Fri, 22 Dec 2023 13:54:12 +0100 Subject: [PATCH 09/43] Add python scripts to contain the interactive code blocks from the tutorial --- doc/tutorial/create.py | 58 ++++++++++++ doc/tutorial/edit.py | 43 +++++++++ doc/tutorial/ids.py | 83 +++++++++++++++++ doc/tutorial/search.py | 200 +++++++++++++++++++++++++++++++++++++++++ 4 files changed, 384 insertions(+) create mode 100644 doc/tutorial/create.py create mode 100644 doc/tutorial/edit.py create mode 100644 doc/tutorial/ids.py create mode 100644 doc/tutorial/search.py diff --git a/doc/tutorial/create.py b/doc/tutorial/create.py new file mode 100644 index 00000000..9a2fc841 --- /dev/null +++ b/doc/tutorial/create.py @@ -0,0 +1,58 @@ +# Tutorial / Creating stuff in the ICAT server +# interactive code blocks + +# Creating simple objects + +f1 = client.new("Facility") +f1.name = "Fac1" +f1.fullName = "Facility 1" +f1.id = client.create(f1) +client.search("SELECT f FROM Facility f") + +# -------------------- + +f2 = client.new("Facility", name="Fac2", fullName="Facility 2") +f2.create() +client.search("SELECT f FROM Facility f") + +# Relationships to other objects + +f1 = client.get("Facility", 1) + +# -------------------- + +pt1 = client.new("ParameterType") +pt1.name = "Test parameter type 1" +pt1.units = "pct" +pt1.applicableToDataset = True +pt1.valueType = "NUMERIC" +pt1.facility = f1 +pt1.create() + +# -------------------- + +pt2 = client.new("ParameterType") +pt2.name = "Test parameter type 2" +pt2.units = "N/A" +pt2.applicableToDataset = True +pt2.valueType = "STRING" +pt2.facility = f1 +for v in ["buono", "brutto", "cattivo"]: + psv = client.new("PermissibleStringValue", value=v) + pt2.permissibleStringValues.append(psv) + +pt2.create() + +# -------------------- + +query = "SELECT pt FROM ParameterType pt INCLUDE pt.facility, pt.permissibleStringValues" +client.search(query) + +# Access 
rules + +publicTables = [ "Application", "DatafileFormat", "DatasetType", + "Facility", "FacilityCycle", "Instrument", + "InvestigationType", "ParameterType", + "PermissibleStringValue", "SampleType", ] +queries = [ "SELECT o FROM %s o" % t for t in publicTables ] +client.createRules("R", queries) diff --git a/doc/tutorial/edit.py b/doc/tutorial/edit.py new file mode 100644 index 00000000..ca2aacc4 --- /dev/null +++ b/doc/tutorial/edit.py @@ -0,0 +1,43 @@ +# Tutorial / Working with objects in the ICAT server +# interactive code blocks + +client.search("SELECT f FROM Facility f") + +# Editing the attributes of objects + +for facility in client.search("SELECT f FROM Facility f"): + facility.description = "An example facility" + facility.daysUntilRelease = 1826 + facility.fullName = "%s Facility" % facility.name + client.update(facility) + +client.search("SELECT f FROM Facility f") + +# -------------------- + +for facility in client.search("SELECT f FROM Facility f"): + facility.description = None + facility.update() + +client.search("SELECT f FROM Facility f") + +# Copying objects + +fac = client.get("Facility f INCLUDE f.parameterTypes", 1) +print(fac) + +# -------------------- + +facc = fac.copy() +print(facc.name) +print(facc.parameterTypes[0].name) +facc.name = "Fac0" +facc.parameterTypes[0].name = "Test parameter type 0" +print(fac.name) +print(fac.parameterTypes[0].name) + +# -------------------- + +fac.truncateRelations() +print(fac) +print(facc) diff --git a/doc/tutorial/ids.py b/doc/tutorial/ids.py new file mode 100644 index 00000000..84bc12c1 --- /dev/null +++ b/doc/tutorial/ids.py @@ -0,0 +1,83 @@ +# Tutorial / Upload and download files to and from IDS +# interactive code blocks + +client.ids.isReadOnly() + +# Upload files + +users = [("jdoe", "John"), ("nbour", "Nicolas"), ("rbeck", "Rudolph")] +for user, name in users: + with open("greet-%s.txt" % user, "wt") as f: + print("Hello %s!" 
% name, file=f) + +# -------------------- + +from icat.query import Query +investigation = client.assertedSearch(Query(client, "Investigation", conditions={"name": "= '12100409-ST'"}))[0] +dataset = client.new("Dataset") +dataset.investigation = investigation +dataset.type = client.assertedSearch(Query(client, "DatasetType", conditions={"name": "= 'other'"}))[0] +dataset.name = "greetings" +dataset.complete = False +dataset.create() + +# -------------------- + +df_format = client.assertedSearch(Query(client, "DatafileFormat", conditions={"name": "= 'Text'"}))[0] +for fname in ("greet-jdoe.txt", "greet-nbour.txt", "greet-rbeck.txt"): + datafile = client.new("Datafile", name=fname, dataset=dataset, datafileFormat=df_format) + client.putData(fname, datafile) + +# Download files + +query = Query(client, "Datafile", conditions={"name": "= 'greet-jdoe.txt'", "dataset.name": "= 'greetings'"}) +df = client.assertedSearch(query)[0] +data = client.getData([df]) +type(data) +data.read().decode('utf8') + +# -------------------- + +from io import BytesIO +from zipfile import ZipFile +query = Query(client, "Dataset", conditions={"name": "= 'greetings'"}) +ds = client.assertedSearch(query)[0] +data = client.getData([ds]) +buffer = BytesIO(data.read()) +with ZipFile(buffer) as zipfile: + for f in zipfile.namelist(): + print("file name: %s" % f) + print("content: %r" % zipfile.open(f).read().decode('utf8')) + +# -------------------- + +from icat.ids import DataSelection +selection = DataSelection([ds]) +client.ids.archive(selection) + +# -------------------- + +client.ids.getStatus(selection) + +# -------------------- + +data = client.getData([ds]) + +# -------------------- + +client.ids.getStatus(selection) +data = client.getData([ds]) +len(data.read()) + +# -------------------- + +preparedId = client.prepareData(selection) +preparedId + +# -------------------- + +client.isDataPrepared(preparedId) +data = client.getData(preparedId) +buffer = BytesIO(data.read()) +with 
ZipFile(buffer) as zipfile: + zipfile.namelist() diff --git a/doc/tutorial/search.py b/doc/tutorial/search.py new file mode 100644 index 00000000..4d2d12f4 --- /dev/null +++ b/doc/tutorial/search.py @@ -0,0 +1,200 @@ +# Tutorial / Working with objects in the ICAT server +# interactive code blocks + +client.search("SELECT f FROM Facility f INCLUDE f.parameterTypes LIMIT 1,1") + +# Building advanced queries + +from icat.query import Query + +# -------------------- + +query = Query(client, "Investigation") +print(query) +client.search(query) + +# -------------------- + +query = Query(client, "Investigation", conditions={"name": "= '10100601-ST'"}) +print(query) +client.search(query) + +# -------------------- + +query = Query(client, "Investigation", conditions={"name": "= '10100601-ST'"}, includes=["datasets"]) +print(query) +client.search(query) + +# -------------------- + +query = Query(client, "Investigation", conditions={"LENGTH(title)": "= 18"}) +print(query) +client.search(query) + +# -------------------- + +conditions = { + "investigation.name": "= '10100601-ST'", + "parameters.type.name": "= 'Magnetic field'", + "parameters.type.units": "= 'T'", + "parameters.numericValue": "> 5.0", +} +query = Query(client, "Dataset", conditions=conditions, includes=["parameters.type"]) +print(query) +client.search(query) + +# -------------------- + +def get_investigation(client, name, visitId=None): + query = Query(client, "Investigation") + query.addConditions({"name": "= '%s'" % name}) + if visitId is not None: + query.addConditions({"visitId": "= '%s'" % visitId}) + print(query) + return client.assertedSearch(query)[0] + +get_investigation(client, "08100122-EF") +get_investigation(client, "12100409-ST", "1.1-P") + +# -------------------- + +conditions = { + "datafileCreateTime": [">= '2012-01-01'", "< '2013-01-01'"] +} +query = Query(client, "Datafile", conditions=conditions) +print(query) +client.search(query) + +# -------------------- + +query = Query(client, 
"Datafile") +query.addConditions({"datafileCreateTime": ">= '2012-01-01'"}) +query.addConditions({"datafileCreateTime": "< '2013-01-01'"}) +print(query) + +# -------------------- + +query = Query(client, "Dataset", attributes="name") +print(query) +client.search(query) + +# -------------------- + +query = Query(client, "Dataset", attributes=["investigation.name", "name", "complete", "type.name"]) +print(query) +client.search(query) + +# -------------------- + +query = Query(client, "Dataset", aggregate="COUNT") +print(query) +client.search(query) + +# -------------------- + +conditions = { + "dataset.investigation.name": "= '10100601-ST'", + "type.name": "= 'Magnetic field'", + "type.units": "= 'T'", +} +query = Query(client, "DatasetParameter", conditions=conditions, attributes="numericValue") +print(query) +client.search(query) +query.setAggregate("MIN") +print(query) +client.search(query) +query.setAggregate("MAX") +print(query) +client.search(query) +query.setAggregate("AVG") +print(query) +client.search(query) + +# -------------------- + +conditions = { + "datasets.parameters.type.name": "= 'Magnetic field'", + "datasets.parameters.type.units": "= 'T'", +} +query = Query(client, "Investigation", conditions=conditions) +print(query) +client.search(query) + +# -------------------- + +query.setAggregate("DISTINCT") +print(query) +client.search(query) + +# -------------------- + +conditions = { + "datasets.parameters.type.name": "= 'Magnetic field'", + "datasets.parameters.type.units": "= 'T'", +} +query = Query(client, "Investigation", conditions=conditions, aggregate="COUNT") +print(query) +client.search(query) +query.setAggregate("COUNT:DISTINCT") +print(query) +client.search(query) + +# -------------------- + +order = ["type.name", "type.units", ("numericValue", "DESC")] +query = Query(client, "DatasetParameter", includes=["type"], order=order) +print(query) +client.search(query) + +# -------------------- + +query = Query(client, "User", 
conditions={"fullName": "IS NOT NULL"}, order=[("LENGTH(fullName)", "DESC")]) +print(query) +for user in client.search(query): + print("%d: %s" % (len(user.fullName), user.fullName)) + +# -------------------- + +query = Query(client, "Dataset", order=[("endDate", "DESC")], limit=(2, 1)) +print(query) +client.search(query) + +# Useful search methods + +res = client.search(Query(client, "Facility")) +if not res: + raise RuntimeError("Facility not found") +elif len(res) > 1: + raise RuntimeError("Facility not unique") + +facility = res[0] +facility = client.assertedSearch(Query(client, "Facility"))[0] + +# -------------------- + +for ds in client.searchChunked(Query(client, "Dataset")): + # do something useful with the dataset ds ... + print(ds.name) + +# -------------------- + +def get_dataset(client, inv_name, ds_name, ds_type="raw"): + """Get a dataset in an investigation. + If it already exists, search and return it, create it, if not. + """ + try: + dataset = client.new("Dataset") + query = Query(client, "Investigation", conditions={ + "name": "= '%s'" % inv_name + }) + dataset.investigation = client.assertedSearch(query)[0] + query = Query(client, "DatasetType", conditions={ + "name": "= '%s'" % ds_type + }) + dataset.type = client.assertedSearch(query)[0] + dataset.complete = False + dataset.name = ds_name + dataset.create() + except icat.ICATObjectExistsError: + dataset = client.searchMatching(dataset) + return dataset From 4e3bbe39a03abbde2ec3970285f397b27a34deb7 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Fri, 22 Dec 2023 14:52:12 +0100 Subject: [PATCH 10/43] Fix overly long lines in interactive tutorial examples --- doc/src/tutorial-create.rst | 3 ++- doc/src/tutorial-ids.rst | 19 ++++++++++++++----- doc/src/tutorial-search.rst | 30 +++++++++++++++++++++--------- doc/tutorial/create.py | 3 ++- doc/tutorial/ids.py | 19 ++++++++++++++----- doc/tutorial/search.py | 30 +++++++++++++++++++++--------- 6 files changed, 74 insertions(+), 30 deletions(-) diff 
--git a/doc/src/tutorial-create.rst b/doc/src/tutorial-create.rst index c6c56ea8..07977db1 100644 --- a/doc/src/tutorial-create.rst +++ b/doc/src/tutorial-create.rst @@ -132,7 +132,8 @@ created together with the ``ParameterType`` object. We can verify this by searching for the newly created objects:: - >>> query = "SELECT pt FROM ParameterType pt INCLUDE pt.facility, pt.permissibleStringValues" + >>> query = ("SELECT pt FROM ParameterType pt " + ... "INCLUDE pt.facility, pt.permissibleStringValues") >>> client.search(query) [(parameterType){ createId = "simple/root" diff --git a/doc/src/tutorial-ids.rst b/doc/src/tutorial-ids.rst index 0ce2748c..c71d221e 100644 --- a/doc/src/tutorial-ids.rst +++ b/doc/src/tutorial-ids.rst @@ -54,10 +54,12 @@ We need a dataset in ICAT that the uploaded files should be put into, so let's create one:: >>> from icat.query import Query - >>> investigation = client.assertedSearch(Query(client, "Investigation", conditions={"name": "= '12100409-ST'"}))[0] + >>> query = Query(client, "Investigation", conditions={"name": "= '12100409-ST'"}) + >>> investigation = client.assertedSearch(query)[0] >>> dataset = client.new("Dataset") >>> dataset.investigation = investigation - >>> dataset.type = client.assertedSearch(Query(client, "DatasetType", conditions={"name": "= 'other'"}))[0] + >>> query = Query(client, "DatasetType", conditions={"name": "= 'other'"}) + >>> dataset.type = client.assertedSearch(query)[0] >>> dataset.name = "greetings" >>> dataset.complete = False >>> dataset.create() @@ -65,9 +67,13 @@ so let's create one:: For each of the files, we create a new datafile object and call the :meth:`~icat.client.Client.putData` method to upload it:: - >>> df_format = client.assertedSearch(Query(client, "DatafileFormat", conditions={"name": "= 'Text'"}))[0] + >>> query = Query(client, "DatafileFormat", conditions={"name": "= 'Text'"}) + >>> df_format = client.assertedSearch(query)[0] >>> for fname in ("greet-jdoe.txt", "greet-nbour.txt", 
"greet-rbeck.txt"): - ... datafile = client.new("Datafile", name=fname, dataset=dataset, datafileFormat=df_format) + ... datafile = client.new("Datafile", + ... name=fname, + ... dataset=dataset, + ... datafileFormat=df_format) ... client.putData(fname, datafile) ... (datafile){ @@ -125,7 +131,10 @@ Download files We can request a download of a set of data using the :meth:`~icat.client.Client.getData` method:: - >>> query = Query(client, "Datafile", conditions={"name": "= 'greet-jdoe.txt'", "dataset.name": "= 'greetings'"}) + >>> query = Query(client, "Datafile", conditions={ + ... "name": "= 'greet-jdoe.txt'", + ... "dataset.name": "= 'greetings'" + ... }) >>> df = client.assertedSearch(query)[0] >>> data = client.getData([df]) >>> type(data) diff --git a/doc/src/tutorial-search.rst b/doc/src/tutorial-search.rst index ed9843ae..9d1c5fec 100644 --- a/doc/src/tutorial-search.rst +++ b/doc/src/tutorial-search.rst @@ -122,7 +122,8 @@ appropriate condition. The `conditions` argument to :class:`~icat.query.Query` should be a mapping of attribute names to conditions on that attribute:: - >>> query = Query(client, "Investigation", conditions={"name": "= '10100601-ST'"}) + >>> query = Query(client, "Investigation", + ... conditions={"name": "= '10100601-ST'"}) >>> print(query) SELECT o FROM Investigation o WHERE o.name = '10100601-ST' >>> client.search(query) @@ -144,7 +145,9 @@ conditions on that attribute:: We may also include related objects in the search results:: - >>> query = Query(client, "Investigation", conditions={"name": "= '10100601-ST'"}, includes=["datasets"]) + >>> query = Query(client, "Investigation", + ... conditions={"name": "= '10100601-ST'"}, + ... includes=["datasets"]) >>> print(query) SELECT o FROM Investigation o WHERE o.name = '10100601-ST' INCLUDE o.datasets >>> client.search(query) @@ -208,7 +211,8 @@ python-icat supports the use of some JPQL functions when specifying which attribute a condition should be applied to. 
Consider the following query:: - >>> query = Query(client, "Investigation", conditions={"LENGTH(title)": "= 18"}) + >>> query = Query(client, "Investigation", + ... conditions={"LENGTH(title)": "= 18"}) >>> print(query) SELECT o FROM Investigation o WHERE LENGTH(o.title) = 18 >>> client.search(query) @@ -253,7 +257,8 @@ field larger then 5 Tesla and include its parameters in the result:: ... "parameters.type.units": "= 'T'", ... "parameters.numericValue": "> 5.0", ... } - >>> query = Query(client, "Dataset", conditions=conditions, includes=["parameters.type"]) + >>> query = Query(client, "Dataset", + ... conditions=conditions, includes=["parameters.type"]) >>> print(query) SELECT o FROM Dataset o JOIN o.investigation AS i JOIN o.parameters AS p JOIN p.type AS pt WHERE i.name = '10100601-ST' AND p.numericValue > 5.0 AND pt.name = 'Magnetic field' AND pt.units = 'T' INCLUDE o.parameters AS p, p.type >>> client.search(query) @@ -456,7 +461,9 @@ multiple attributes at once. The result will be a tuple of attribute values rather then a single value for each object found in the query. This requires an ICAT server version 4.11 or newer though:: - >>> query = Query(client, "Dataset", attributes=["investigation.name", "name", "complete", "type.name"]) + >>> query = Query(client, "Dataset", attributes=[ + ... "investigation.name", "name", "complete", "type.name" + ... ]) >>> print(query) SELECT i.name, o.name, o.complete, t.name FROM Dataset o JOIN o.investigation AS i JOIN o.type AS t >>> client.search(query) @@ -485,7 +492,8 @@ average magnetic field applied in the measurements:: ... "type.name": "= 'Magnetic field'", ... "type.units": "= 'T'", ... } - >>> query = Query(client, "DatasetParameter", conditions=conditions, attributes="numericValue") + >>> query = Query(client, "DatasetParameter", + ... 
conditions=conditions, attributes="numericValue") >>> print(query) SELECT o.numericValue FROM DatasetParameter o JOIN o.dataset AS ds JOIN ds.investigation AS i JOIN o.type AS t WHERE i.name = '10100601-ST' AND t.name = 'Magnetic field' AND t.units = 'T' >>> client.search(query) @@ -578,7 +586,8 @@ make sure not to count the same object more then once:: ... "datasets.parameters.type.name": "= 'Magnetic field'", ... "datasets.parameters.type.units": "= 'T'", ... } - >>> query = Query(client, "Investigation", conditions=conditions, aggregate="COUNT") + >>> query = Query(client, "Investigation", + ... conditions=conditions, aggregate="COUNT") >>> print(query) SELECT COUNT(o) FROM Investigation o JOIN o.datasets AS s1 JOIN s1.parameters AS s2 JOIN s2.type AS s3 WHERE s3.name = 'Magnetic field' AND s3.units = 'T' >>> client.search(query) @@ -761,7 +770,9 @@ in the `order` argument to :class:`~icat.query.Query`. Let's search for user sorted by the length of their name, from longest to shortest:: - >>> query = Query(client, "User", conditions={"fullName": "IS NOT NULL"}, order=[("LENGTH(fullName)", "DESC")]) + >>> query = Query(client, "User", conditions={ + ... "fullName": "IS NOT NULL" + ... }, order=[("LENGTH(fullName)", "DESC")]) >>> print(query) SELECT o FROM User o WHERE o.fullName IS NOT NULL ORDER BY LENGTH(o.fullName) DESC >>> for user in client.search(query): @@ -782,7 +793,8 @@ shortest:: We may limit the number of returned items. Search for the third to last dataset to have been finished:: - >>> query = Query(client, "Dataset", order=[("endDate", "DESC")], limit=(2, 1)) + >>> query = Query(client, "Dataset", + ... 
order=[("endDate", "DESC")], limit=(2, 1)) >>> print(query) SELECT o FROM Dataset o ORDER BY o.endDate DESC LIMIT 2, 1 >>> client.search(query) diff --git a/doc/tutorial/create.py b/doc/tutorial/create.py index 9a2fc841..c6ad80f0 100644 --- a/doc/tutorial/create.py +++ b/doc/tutorial/create.py @@ -45,7 +45,8 @@ # -------------------- -query = "SELECT pt FROM ParameterType pt INCLUDE pt.facility, pt.permissibleStringValues" +query = ("SELECT pt FROM ParameterType pt " + "INCLUDE pt.facility, pt.permissibleStringValues") client.search(query) # Access rules diff --git a/doc/tutorial/ids.py b/doc/tutorial/ids.py index 84bc12c1..f3156039 100644 --- a/doc/tutorial/ids.py +++ b/doc/tutorial/ids.py @@ -13,24 +13,33 @@ # -------------------- from icat.query import Query -investigation = client.assertedSearch(Query(client, "Investigation", conditions={"name": "= '12100409-ST'"}))[0] +query = Query(client, "Investigation", conditions={"name": "= '12100409-ST'"}) +investigation = client.assertedSearch(query)[0] dataset = client.new("Dataset") dataset.investigation = investigation -dataset.type = client.assertedSearch(Query(client, "DatasetType", conditions={"name": "= 'other'"}))[0] +query = Query(client, "DatasetType", conditions={"name": "= 'other'"}) +dataset.type = client.assertedSearch(query)[0] dataset.name = "greetings" dataset.complete = False dataset.create() # -------------------- -df_format = client.assertedSearch(Query(client, "DatafileFormat", conditions={"name": "= 'Text'"}))[0] +query = Query(client, "DatafileFormat", conditions={"name": "= 'Text'"}) +df_format = client.assertedSearch(query)[0] for fname in ("greet-jdoe.txt", "greet-nbour.txt", "greet-rbeck.txt"): - datafile = client.new("Datafile", name=fname, dataset=dataset, datafileFormat=df_format) + datafile = client.new("Datafile", + name=fname, + dataset=dataset, + datafileFormat=df_format) client.putData(fname, datafile) # Download files -query = Query(client, "Datafile", conditions={"name": "= 
'greet-jdoe.txt'", "dataset.name": "= 'greetings'"}) +query = Query(client, "Datafile", conditions={ + "name": "= 'greet-jdoe.txt'", + "dataset.name": "= 'greetings'" +}) df = client.assertedSearch(query)[0] data = client.getData([df]) type(data) diff --git a/doc/tutorial/search.py b/doc/tutorial/search.py index 4d2d12f4..a697581e 100644 --- a/doc/tutorial/search.py +++ b/doc/tutorial/search.py @@ -15,19 +15,23 @@ # -------------------- -query = Query(client, "Investigation", conditions={"name": "= '10100601-ST'"}) +query = Query(client, "Investigation", + conditions={"name": "= '10100601-ST'"}) print(query) client.search(query) # -------------------- -query = Query(client, "Investigation", conditions={"name": "= '10100601-ST'"}, includes=["datasets"]) +query = Query(client, "Investigation", + conditions={"name": "= '10100601-ST'"}, + includes=["datasets"]) print(query) client.search(query) # -------------------- -query = Query(client, "Investigation", conditions={"LENGTH(title)": "= 18"}) +query = Query(client, "Investigation", + conditions={"LENGTH(title)": "= 18"}) print(query) client.search(query) @@ -39,7 +43,8 @@ "parameters.type.units": "= 'T'", "parameters.numericValue": "> 5.0", } -query = Query(client, "Dataset", conditions=conditions, includes=["parameters.type"]) +query = Query(client, "Dataset", + conditions=conditions, includes=["parameters.type"]) print(query) client.search(query) @@ -80,7 +85,9 @@ def get_investigation(client, name, visitId=None): # -------------------- -query = Query(client, "Dataset", attributes=["investigation.name", "name", "complete", "type.name"]) +query = Query(client, "Dataset", attributes=[ + "investigation.name", "name", "complete", "type.name" +]) print(query) client.search(query) @@ -97,7 +104,8 @@ def get_investigation(client, name, visitId=None): "type.name": "= 'Magnetic field'", "type.units": "= 'T'", } -query = Query(client, "DatasetParameter", conditions=conditions, attributes="numericValue") +query = Query(client, 
"DatasetParameter", + conditions=conditions, attributes="numericValue") print(query) client.search(query) query.setAggregate("MIN") @@ -132,7 +140,8 @@ def get_investigation(client, name, visitId=None): "datasets.parameters.type.name": "= 'Magnetic field'", "datasets.parameters.type.units": "= 'T'", } -query = Query(client, "Investigation", conditions=conditions, aggregate="COUNT") +query = Query(client, "Investigation", + conditions=conditions, aggregate="COUNT") print(query) client.search(query) query.setAggregate("COUNT:DISTINCT") @@ -148,14 +157,17 @@ def get_investigation(client, name, visitId=None): # -------------------- -query = Query(client, "User", conditions={"fullName": "IS NOT NULL"}, order=[("LENGTH(fullName)", "DESC")]) +query = Query(client, "User", conditions={ + "fullName": "IS NOT NULL" +}, order=[("LENGTH(fullName)", "DESC")]) print(query) for user in client.search(query): print("%d: %s" % (len(user.fullName), user.fullName)) # -------------------- -query = Query(client, "Dataset", order=[("endDate", "DESC")], limit=(2, 1)) +query = Query(client, "Dataset", + order=[("endDate", "DESC")], limit=(2, 1)) print(query) client.search(query) From c941a9d89ede2f115af91fd99c646e4ac508d066 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Tue, 2 Jan 2024 15:38:10 +0100 Subject: [PATCH 11/43] Restrict running ReST lint on push to branches develop and master --- .github/workflows/rst-lint.yaml | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/.github/workflows/rst-lint.yaml b/.github/workflows/rst-lint.yaml index b5e7c2fe..b9b239f7 100644 --- a/.github/workflows/rst-lint.yaml +++ b/.github/workflows/rst-lint.yaml @@ -1,5 +1,10 @@ name: Check ReST input files -on: [push, pull_request] +on: + push: + branches: + - develop + - master + pull_request: jobs: doc8: runs-on: ubuntu-latest From 16443e88aa6c3e9b4319c5f67243ad31d42577eb Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Tue, 2 Jan 2024 15:59:14 +0100 Subject: [PATCH 12/43] 
Documentation fix: move a version changed note from module icat.ingest to class icat.ingest.IngestReader --- doc/src/ingest.rst | 4 ---- src/icat/ingest.py | 4 ++++ 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/src/ingest.rst b/doc/src/ingest.rst index e9abda8e..72eeb07a 100644 --- a/doc/src/ingest.rst +++ b/doc/src/ingest.rst @@ -55,10 +55,6 @@ the ``Dataset``. .. versionchanged:: 1.2.0 add version 1.1 of the ingest file format, including references to samples -.. versionchanged:: 1.3.0 - drop class attribute :attr:`~icat.ingest.IngestReader.XSLT_name` in - favour of :attr:`~icat.ingest.IngestReader.XSLT_Map`. - .. autoclass:: icat.ingest.IngestReader :members: :show-inheritance: diff --git a/src/icat/ingest.py b/src/icat/ingest.py index 57f15648..6c725a0f 100644 --- a/src/icat/ingest.py +++ b/src/icat/ingest.py @@ -37,6 +37,10 @@ class IngestReader(XMLDumpFileReader): :type investigation: :class:`icat.entity.Entity` :raise icat.exception.InvalidIngestFileError: if the input in metadata is not valid. + + .. versionchanged:: 1.3.0 + drop class attribute :attr:`~icat.ingest.IngestReader.XSLT_name` + in favour of :attr:`~icat.ingest.IngestReader.XSLT_Map`. """ SchemaDir = Path("/usr/share/icat") From 8446200535f437327cfea057c01e7cc1d302c326 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Tue, 2 Jan 2024 16:14:12 +0100 Subject: [PATCH 13/43] Minor doc config fixes --- doc/src/conf.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/src/conf.py b/doc/src/conf.py index 38c5c319..2f880389 100644 --- a/doc/src/conf.py +++ b/doc/src/conf.py @@ -12,10 +12,10 @@ maindir = Path(__file__).resolve().parent.parent.parent buildlib = maindir / "build" / "lib" sys.path[0] = str(buildlib) +sys.dont_write_bytecode = True import icat._meta - # -- Project information ----------------------------------------------------- project = 'python-icat' @@ -58,7 +58,7 @@ # # This is also used if you do content translation via gettext catalogs. 
# Usually you set "language" from the command line for these cases. -language = None +language = 'en' # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. From af28f5de6ade53d8252c1c9021f4ded4ef56ee6f Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Wed, 3 Jan 2024 11:28:37 +0100 Subject: [PATCH 14/43] - Add a new section on file formats to the documentation - Move the subsection on ICAT data files from the icat.dumpfile module reference into the new file formats section - Add a subsection on Metadata ingest files (only a section heading by now) --- doc/src/dumpfile.rst | 64 ++----------------------------------- doc/src/file-icatdata.rst | 58 +++++++++++++++++++++++++++++++++ doc/src/file-icatingest.rst | 6 ++++ doc/src/fileformats.rst | 11 +++++++ doc/src/index.rst | 1 + 5 files changed, 78 insertions(+), 62 deletions(-) create mode 100644 doc/src/file-icatdata.rst create mode 100644 doc/src/file-icatingest.rst create mode 100644 doc/src/fileformats.rst diff --git a/doc/src/dumpfile.rst b/doc/src/dumpfile.rst index 1fc44d6e..d87e8c9f 100644 --- a/doc/src/dumpfile.rst +++ b/doc/src/dumpfile.rst @@ -6,8 +6,8 @@ This module provides the base classes :class:`icat.dumpfile.DumpFileReader` and :class:`icat.dumpfile.DumpFileWriter` that define the API and the -logic for reading and writing ICAT data files. The actual work is -done in file format specific backend modules that should provide +logic for reading and writing :ref:`ICAT-data-files`. The actual work +is done in file format specific backend modules that should provide subclasses that must implement the abstract methods. .. autoclass:: icat.dumpfile.DumpFileReader @@ -23,63 +23,3 @@ subclasses that must implement the abstract methods. .. autofunction:: icat.dumpfile.register_backend .. autofunction:: icat.dumpfile.open_dumpfile - - -.. 
_ICAT-data-files: - -ICAT data files ---------------- - -ICAT data files provide a way to serialize ICAT content to a flat -file. This section describes the logical structure of ICAT data -files. The actual file format depends on the backend, python-icat -provides backends using XML and YAML. - -There is a one-to-one correspondence of the objects in the data -file and the corresponding object in ICAT according to the ICAT -schema, including all attributes and relations to other objects. -Special unique keys are used to encode the relations. -:meth:`icat.entity.Entity.getUniqueKey` may be used to get such a -unique key for an entity object and -:meth:`icat.client.Client.searchUniqueKey` may be used to search an -object by its key. Otherwise these keys should be considered as -opaque ids. - -Data files are partitioned in chunks. This is done to avoid having -the whole file, e.g. the complete inventory of the ICAT, at once in -memory. The problem is that objects contain references to other -objects (e.g. Datafiles refer to Datasets, the latter refer to -Investigations, and so forth). We keep an index of the objects in -order to resolve these references. But there is a memory versus time -tradeoff: we cannot keep all the objects in the index, that would -again mean the complete inventory of the ICAT. And we can't know -beforehand which object is going to be referenced later on, so we -don't know which one to keep and which one to discard from the index. -Fortunately we can query objects we discarded once back from the ICAT -server. But this is expensive. So the strategy is as follows: keep -all objects from the current chunk in the index and discard the -complete index each time a chunk has been processed. This will work -fine if objects are mostly referencing other objects from the same -chunk and only a few references go across chunk boundaries. 
- -Therefore, we want these chunks to be small enough to fit into memory, -but at the same time large enough to keep as many relations between -objects as possible local in a chunk. It is in the responsibility of -the writer of the data file to create the chunks in this manner. - -The objects that get written to the data file and how this file is -organized is controlled by lists of ICAT search expressions, see -:meth:`icat.dumpfile.DumpFileWriter.writeobjs`. There is some degree -of flexibility: an object may include related objects in an -one-to-many relation, just by including them in the search expression. -In this case, these related objects should not have a search -expression on their own again. For instance, the search expression -for Grouping may include UserGroup. The UserGroups will then be -embedded in their respective grouping in the data file. There should -not be a search expression for UserGroup then. - -Objects related in a many-to-one relation must always be included in -the search expression. This is also true if the object is -indirectly related to one of the included objects. In this case, -only a reference to the related object will be included in the data -file. The related object must have its own list entry. diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst new file mode 100644 index 00000000..b8d93ed1 --- /dev/null +++ b/doc/src/file-icatdata.rst @@ -0,0 +1,58 @@ +.. _ICAT-data-files: + +ICAT data files +=============== + +ICAT data files provide a way to serialize ICAT content to a flat +file. This section describes the logical structure of ICAT data +files. The actual file format depends on the backend, python-icat +provides backends using XML and YAML. + +There is a one-to-one correspondence of the objects in the data +file and the corresponding object in ICAT according to the ICAT +schema, including all attributes and relations to other objects. +Special unique keys are used to encode the relations. 
+:meth:`icat.entity.Entity.getUniqueKey` may be used to get such a +unique key for an entity object and +:meth:`icat.client.Client.searchUniqueKey` may be used to search an +object by its key. Otherwise these keys should be considered as +opaque ids. + +Data files are partitioned in chunks. This is done to avoid having +the whole file, e.g. the complete inventory of the ICAT, at once in +memory. The problem is that objects contain references to other +objects (e.g. Datafiles refer to Datasets, the latter refer to +Investigations, and so forth). We keep an index of the objects in +order to resolve these references. But there is a memory versus time +tradeoff: we cannot keep all the objects in the index, that would +again mean the complete inventory of the ICAT. And we can't know +beforehand which object is going to be referenced later on, so we +don't know which one to keep and which one to discard from the index. +Fortunately we can query objects we discarded once back from the ICAT +server. But this is expensive. So the strategy is as follows: keep +all objects from the current chunk in the index and discard the +complete index each time a chunk has been processed. This will work +fine if objects are mostly referencing other objects from the same +chunk and only a few references go across chunk boundaries. + +Therefore, we want these chunks to be small enough to fit into memory, +but at the same time large enough to keep as many relations between +objects as possible local in a chunk. It is in the responsibility of +the writer of the data file to create the chunks in this manner. + +The objects that get written to the data file and how this file is +organized is controlled by lists of ICAT search expressions, see +:meth:`icat.dumpfile.DumpFileWriter.writeobjs`. There is some degree +of flexibility: an object may include related objects in an +one-to-many relation, just by including them in the search expression. 
+In this case, these related objects should not have a search +expression on their own again. For instance, the search expression +for Grouping may include UserGroup. The UserGroups will then be +embedded in their respective grouping in the data file. There should +not be a search expression for UserGroup then. + +Objects related in a many-to-one relation must always be included in +the search expression. This is also true if the object is +indirectly related to one of the included objects. In this case, +only a reference to the related object will be included in the data +file. The related object must have its own list entry. diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst new file mode 100644 index 00000000..04954679 --- /dev/null +++ b/doc/src/file-icatingest.rst @@ -0,0 +1,6 @@ +.. _ICAT-ingest-files: + +Metadata ingest files +===================== + + diff --git a/doc/src/fileformats.rst b/doc/src/fileformats.rst new file mode 100644 index 00000000..c90eaec1 --- /dev/null +++ b/doc/src/fileformats.rst @@ -0,0 +1,11 @@ +File formats +============ + +Some components of python-icat read input files or write output files. +This section describes the file formats being used. + +.. 
toctree:: + :maxdepth: 1 + + file-icatdata + file-icatingest diff --git a/doc/src/index.rst b/doc/src/index.rst index 1fdc3c09..a3d947c0 100644 --- a/doc/src/index.rst +++ b/doc/src/index.rst @@ -38,6 +38,7 @@ Parts of the documentation tutorial moduleref scripts + fileformats known-issues changelog From 3b9367ece06dbe4c540f7d6ba5e6abe5b07966ed Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Wed, 3 Jan 2024 12:24:06 +0100 Subject: [PATCH 15/43] Review introduction of ICAT data files section --- doc/src/file-icatdata.rst | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst index b8d93ed1..2dbb00c1 100644 --- a/doc/src/file-icatdata.rst +++ b/doc/src/file-icatdata.rst @@ -4,9 +4,16 @@ ICAT data files =============== ICAT data files provide a way to serialize ICAT content to a flat -file. This section describes the logical structure of ICAT data -files. The actual file format depends on the backend, python-icat -provides backends using XML and YAML. +file. These files are read by the :ref:`icatingest` and written by +the :ref:`icatdump` command line scripts respectively. The program +logic for reading and writing the files is provided by the +:mod:`icat.dumpfile` module. + +The actual file format depends on the version of the ICAT schema and +on the backend: python-icat provides backends using XML and YAML. 
+ +Logical structure of ICAT data files +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ There is a one-to-one correspondence of the objects in the data file and the corresponding object in ICAT according to the ICAT From 33f8650e84c06ee8bab79a7e80317d7584b5d8e8 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Wed, 3 Jan 2024 13:50:40 +0100 Subject: [PATCH 16/43] Some formulation review to the subsection on the structure of ICAT data files --- doc/src/file-icatdata.rst | 51 ++++++++++++++++++++++----------------- 1 file changed, 29 insertions(+), 22 deletions(-) diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst index 2dbb00c1..06d8f70c 100644 --- a/doc/src/file-icatdata.rst +++ b/doc/src/file-icatdata.rst @@ -29,18 +29,19 @@ Data files are partitioned in chunks. This is done to avoid having the whole file, e.g. the complete inventory of the ICAT, at once in memory. The problem is that objects contain references to other objects (e.g. Datafiles refer to Datasets, the latter refer to -Investigations, and so forth). We keep an index of the objects in -order to resolve these references. But there is a memory versus time -tradeoff: we cannot keep all the objects in the index, that would -again mean the complete inventory of the ICAT. And we can't know -beforehand which object is going to be referenced later on, so we -don't know which one to keep and which one to discard from the index. -Fortunately we can query objects we discarded once back from the ICAT -server. But this is expensive. So the strategy is as follows: keep -all objects from the current chunk in the index and discard the -complete index each time a chunk has been processed. This will work -fine if objects are mostly referencing other objects from the same -chunk and only a few references go across chunk boundaries. +Investigations, and so forth). We keep an index of the objects as +cache in order to resolve these references. 
But there is a memory +versus time tradeoff: we cannot keep all the objects in the index, +that would again mean the complete inventory of the ICAT. And we +can't know beforehand which object is going to be referenced later on, +so we don't know which one to keep and which one to discard from the +index. Fortunately we can query objects that we discarded once back +from the ICAT server. But this is expensive. So the strategy is as +follows: keep all objects from the current chunk in the index and +discard the complete index each time a chunk has been +processed. [#dc]_ This will work fine if objects are mostly +referencing other objects from the same chunk and only a few +references go across chunk boundaries. Therefore, we want these chunks to be small enough to fit into memory, but at the same time large enough to keep as many relations between @@ -48,18 +49,24 @@ objects as possible local in a chunk. It is in the responsibility of the writer of the data file to create the chunks in this manner. The objects that get written to the data file and how this file is -organized is controlled by lists of ICAT search expressions, see -:meth:`icat.dumpfile.DumpFileWriter.writeobjs`. There is some degree -of flexibility: an object may include related objects in an -one-to-many relation, just by including them in the search expression. -In this case, these related objects should not have a search -expression on their own again. For instance, the search expression -for Grouping may include UserGroup. The UserGroups will then be -embedded in their respective grouping in the data file. There should -not be a search expression for UserGroup then. +organized is controlled by lists of ICAT search expressions or entity +objects, see :meth:`icat.dumpfile.DumpFileWriter.writeobjs`. There is +some degree of flexibility: an object may include related objects in +an one-to-many relation. In this case, these related objects should +not be added on their own again. 
For instance, you may write User, +Grouping, and UserGroup as separate objects into the file. In this +case, the UserGroup entries must properly reference related User and +Grouping. Alternatively you may include the UserGroups in the +corresponding Grouping objects. In this case, you must not add the +UserGroups again on their own. Objects related in a many-to-one relation must always be included in the search expression. This is also true if the object is indirectly related to one of the included objects. In this case, only a reference to the related object will be included in the data -file. The related object must have its own list entry. +file. The related object must have its own entry. + + +.. [#dc] There is one exception: DataCollections don't have a + uniqueness constraint and can't reliably be searched by + attributes. They are always kept in the index. From da2415f3097965660069cf55b3558750b595c515 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Wed, 3 Jan 2024 15:10:23 +0100 Subject: [PATCH 17/43] Add simple ICAT data file examples --- doc/examples/icatdump-simple-1.xml | 103 ++++++++++++++++++++++++++ doc/examples/icatdump-simple-1.yaml | 71 ++++++++++++++++++ doc/examples/icatdump-simple-2.xml | 108 ++++++++++++++++++++++++++++ doc/examples/icatdump-simple-2.yaml | 79 ++++++++++++++++++++ 4 files changed, 361 insertions(+) create mode 100644 doc/examples/icatdump-simple-1.xml create mode 100644 doc/examples/icatdump-simple-1.yaml create mode 100644 doc/examples/icatdump-simple-2.xml create mode 100644 doc/examples/icatdump-simple-2.yaml diff --git a/doc/examples/icatdump-simple-1.xml b/doc/examples/icatdump-simple-1.xml new file mode 100644 index 00000000..b2c23038 --- /dev/null +++ b/doc/examples/icatdump-simple-1.xml @@ -0,0 +1,103 @@ + + + + 2024-01-03T13:21:15+00:00 + https://icat.example.com:8181/ICATService/ICAT?wsdl + 6.0.0 + icatdump (python-icat 1.2.0) + + + + Goethe University Frankfurt, Faculty of Philosophy and History + ahau@example.org + 
Hau + Arnold Hau + Arnold + db/ahau + 0000-0002-3263 + + + Goethe University Frankfurt, Faculty of Philosophy and History + ahau@example.org + Hau + Arnold Hau + Arnold + db/ahau + 0000-0002-3263 + + + Université Paul-Valéry Montpellier 3 + jbotu@example.org + Botul + Jean-Baptiste Botul + Jean-Baptiste + db/jbotu + 0000-0002-3264 + + + jdoe@example.org + Doe + John Doe + John + db/jdoe + + + University of Nancago + nbour@example.org + Bourbaki + Nicolas Bourbaki + Nicolas + db/nbour + 0000-0002-3266 + + + investigation_10100601-ST_owner + + + + + + investigation_10100601-ST_reader + + + + + + + + + + + + investigation_10100601-ST_writer + + + + + + + + DOI:00.0815/inv-00601 + 2010-10-12T15:00:00+00:00 + 4 + 127125 + 10100601-ST + 2010-09-30T10:27:24+00:00 + Ni-Mn-Ga flat cone + 1.1-N + + + owner + + + + reader + + + + writer + + + + + diff --git a/doc/examples/icatdump-simple-1.yaml b/doc/examples/icatdump-simple-1.yaml new file mode 100644 index 00000000..26648f3b --- /dev/null +++ b/doc/examples/icatdump-simple-1.yaml @@ -0,0 +1,71 @@ +%YAML 1.1 +# Date: Wed, 03 Jan 2024 13:24:51 +0000 +# Service: https://icat.example.com:8181/ICATService/ICAT?wsdl +# ICAT-API: 6.0.0 +# Generator: icatdump (python-icat 1.2.0) +--- +grouping: + Grouping_name-investigation=5F10100601=2DST=5Fowner: + name: investigation_10100601-ST_owner + userGroups: + - user: User_name-db=2Fahau + Grouping_name-investigation=5F10100601=2DST=5Freader: + name: investigation_10100601-ST_reader + userGroups: + - user: User_name-db=2Fjbotu + - user: User_name-db=2Fjdoe + - user: User_name-db=2Fnbour + Grouping_name-investigation=5F10100601=2DST=5Fwriter: + name: investigation_10100601-ST_writer + userGroups: + - user: User_name-db=2Fahau +user: + User_name-db=2Fahau: + affiliation: Goethe University Frankfurt, Faculty of Philosophy and History + email: ahau@example.org + familyName: Hau + fullName: Arnold Hau + givenName: Arnold + name: db/ahau + orcidId: 0000-0002-3263 + User_name-db=2Fjbotu: + 
affiliation: "Universit\xE9 Paul-Val\xE9ry Montpellier 3" + email: jbotu@example.org + familyName: Botul + fullName: Jean-Baptiste Botul + givenName: Jean-Baptiste + name: db/jbotu + orcidId: 0000-0002-3264 + User_name-db=2Fjdoe: + email: jdoe@example.org + familyName: Doe + fullName: John Doe + givenName: John + name: db/jdoe + User_name-db=2Fnbour: + affiliation: University of Nancago + email: nbour@example.org + familyName: Bourbaki + fullName: Nicolas Bourbaki + givenName: Nicolas + name: db/nbour + orcidId: 0000-0002-3266 +--- +investigation: + Investigation_facility-(name-ESNF)_name-10100601=2DST_visitId-1=2E1=2DN: + doi: DOI:00.0815/inv-00601 + endDate: '2010-10-12T15:00:00+00:00' + facility: Facility_name-ESNF + fileCount: 4 + fileSize: 127125 + investigationGroups: + - grouping: Grouping_name-investigation=5F10100601=2DST=5Fowner + role: owner + - grouping: Grouping_name-investigation=5F10100601=2DST=5Freader + role: reader + - grouping: Grouping_name-investigation=5F10100601=2DST=5Fwriter + role: writer + name: 10100601-ST + startDate: '2010-09-30T10:27:24+00:00' + title: Ni-Mn-Ga flat cone + visitId: 1.1-N diff --git a/doc/examples/icatdump-simple-2.xml b/doc/examples/icatdump-simple-2.xml new file mode 100644 index 00000000..1c309602 --- /dev/null +++ b/doc/examples/icatdump-simple-2.xml @@ -0,0 +1,108 @@ + + + + 2024-01-03T13:27:37+00:00 + https://icat.example.com:8181/ICATService/ICAT?wsdl + 6.0.0 + icatdump (python-icat 1.2.0) + + + + Goethe University Frankfurt, Faculty of Philosophy and History + ahau@example.org + Hau + Arnold Hau + Arnold + db/ahau + 0000-0002-3263 + + + Goethe University Frankfurt, Faculty of Philosophy and History + ahau@example.org + Hau + Arnold Hau + Arnold + db/ahau + 0000-0002-3263 + + + Université Paul-Valéry Montpellier 3 + jbotu@example.org + Botul + Jean-Baptiste Botul + Jean-Baptiste + db/jbotu + 0000-0002-3264 + + + jdoe@example.org + Doe + John Doe + John + db/jdoe + + + University of Nancago + nbour@example.org + 
Bourbaki + Nicolas Bourbaki + Nicolas + db/nbour + 0000-0002-3266 + + + investigation_10100601-ST_owner + + + investigation_10100601-ST_reader + + + investigation_10100601-ST_writer + + + + + + + + + + + + + + + + + + + + + + + + + DOI:00.0815/inv-00601 + 2010-10-12T15:00:00+00:00 + 4 + 127125 + 10100601-ST + 2010-09-30T10:27:24+00:00 + Ni-Mn-Ga flat cone + 1.1-N + + + owner + + + + reader + + + + writer + + + + + diff --git a/doc/examples/icatdump-simple-2.yaml b/doc/examples/icatdump-simple-2.yaml new file mode 100644 index 00000000..79e4a296 --- /dev/null +++ b/doc/examples/icatdump-simple-2.yaml @@ -0,0 +1,79 @@ +%YAML 1.1 +# Date: Wed, 03 Jan 2024 13:27:52 +0000 +# Service: https://icat.example.com:8181/ICATService/ICAT?wsdl +# ICAT-API: 6.0.0 +# Generator: icatdump (python-icat 1.2.0) +--- +grouping: + Grouping_name-investigation=5F10100601=2DST=5Fowner: + name: investigation_10100601-ST_owner + Grouping_name-investigation=5F10100601=2DST=5Freader: + name: investigation_10100601-ST_reader + Grouping_name-investigation=5F10100601=2DST=5Fwriter: + name: investigation_10100601-ST_writer +user: + User_name-db=2Fahau: + affiliation: Goethe University Frankfurt, Faculty of Philosophy and History + email: ahau@example.org + familyName: Hau + fullName: Arnold Hau + givenName: Arnold + name: db/ahau + orcidId: 0000-0002-3263 + User_name-db=2Fjbotu: + affiliation: "Universit\xE9 Paul-Val\xE9ry Montpellier 3" + email: jbotu@example.org + familyName: Botul + fullName: Jean-Baptiste Botul + givenName: Jean-Baptiste + name: db/jbotu + orcidId: 0000-0002-3264 + User_name-db=2Fjdoe: + email: jdoe@example.org + familyName: Doe + fullName: John Doe + givenName: John + name: db/jdoe + User_name-db=2Fnbour: + affiliation: University of Nancago + email: nbour@example.org + familyName: Bourbaki + fullName: Nicolas Bourbaki + givenName: Nicolas + name: db/nbour + orcidId: 0000-0002-3266 +userGroup: + 
UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fowner): + grouping: Grouping_name-investigation=5F10100601=2DST=5Fowner + user: User_name-db=2Fahau + UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fwriter): + grouping: Grouping_name-investigation=5F10100601=2DST=5Fwriter + user: User_name-db=2Fahau + UserGroup_user-(name-db=2Fjbotu)_grouping-(name-investigation=5F10100601=2DST=5Freader): + grouping: Grouping_name-investigation=5F10100601=2DST=5Freader + user: User_name-db=2Fjbotu + UserGroup_user-(name-db=2Fjdoe)_grouping-(name-investigation=5F10100601=2DST=5Freader): + grouping: Grouping_name-investigation=5F10100601=2DST=5Freader + user: User_name-db=2Fjdoe + UserGroup_user-(name-db=2Fnbour)_grouping-(name-investigation=5F10100601=2DST=5Freader): + grouping: Grouping_name-investigation=5F10100601=2DST=5Freader + user: User_name-db=2Fnbour +--- +investigation: + Investigation_facility-(name-ESNF)_name-10100601=2DST_visitId-1=2E1=2DN: + doi: DOI:00.0815/inv-00601 + endDate: '2010-10-12T15:00:00+00:00' + facility: Facility_name-ESNF + fileCount: 4 + fileSize: 127125 + investigationGroups: + - grouping: Grouping_name-investigation=5F10100601=2DST=5Fowner + role: owner + - grouping: Grouping_name-investigation=5F10100601=2DST=5Freader + role: reader + - grouping: Grouping_name-investigation=5F10100601=2DST=5Fwriter + role: writer + name: 10100601-ST + startDate: '2010-09-30T10:27:24+00:00' + title: Ni-Mn-Ga flat cone + visitId: 1.1-N From af0ee8dc1fd9b401b73ca6644863da8c1fb2315d Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Wed, 3 Jan 2024 17:15:03 +0100 Subject: [PATCH 18/43] Add subsections on ICAT data XML files and on ICAT data YAML files including the example data files, but no other content yet --- doc/src/file-icatdata.rst | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst index 06d8f70c..a0126383 100644 --- 
a/doc/src/file-icatdata.rst +++ b/doc/src/file-icatdata.rst @@ -66,6 +66,30 @@ indirectly related to one of the included objects. In this case, only a reference to the related object will be included in the data file. The related object must have its own entry. +ICAT data XML files +~~~~~~~~~~~~~~~~~~~ + +In this section we describe the ICAT data file format using the XML +backend. + +.. literalinclude:: ../examples/icatdump-simple-1.xml + :language: xml + +.. literalinclude:: ../examples/icatdump-simple-2.xml + :language: xml + +ICAT data YAML files +~~~~~~~~~~~~~~~~~~~~ + +In this section we describe the ICAT data file format using the YAML +backend. + +.. literalinclude:: ../examples/icatdump-simple-1.yaml + :language: yaml + +.. literalinclude:: ../examples/icatdump-simple-2.yaml + :language: yaml + .. [#dc] There is one exception: DataCollections don't have a uniqueness constraint and can't reliably be searched by From 4ad8e9e9dd3b48907dab282c46ecf037f55cb2d6 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Wed, 3 Jan 2024 19:09:50 +0100 Subject: [PATCH 19/43] Add the text content for the subsection on ICAT data XML files --- doc/src/file-icatdata.rst | 73 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 72 insertions(+), 1 deletion(-) diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst index a0126383..b856e625 100644 --- a/doc/src/file-icatdata.rst +++ b/doc/src/file-icatdata.rst @@ -70,13 +70,83 @@ ICAT data XML files ~~~~~~~~~~~~~~~~~~~ In this section we describe the ICAT data file format using the XML -backend. +backend. Consider the following example: .. literalinclude:: ../examples/icatdump-simple-1.xml :language: xml +The root element of ICAT data XML files is ``icatdata``. It may +optionally have one ``head`` subelement and one or more ``data`` +subelements. + +The ``head`` element will be ignored by :ref:`icatingest`. 
It serves +to provide some information on the context of the creation of the data +file, which may be useful for debugging in case of issues. + +The content of each ``data`` element is one chunk according to the +logical structure explained above. The present example contains two +chunks. Each element within the ``data`` element corresponds to an +ICAT object according to the ICAT schema. In the present example, the +first chunk contains five User objects and three Grouping objects. +The second chunk only contains one Investigation. + +These object elements should have an ``id`` attribute that may be used +to reference the object in relations later on. The ``id`` value has +no meaning other than this file internal referencing between objects. +The subelements of the object elements correspond to the object's +attributes and relations in the ICAT schema. All many-to-one +relations must be provided and reference already existing objects, +e.g. they must either already have existed before starting the +ingestion or appear earlier in the ICAT data file than the referencing +object, so that they will be created earlier. The related object may +either be referenced by id using the special attribute ``ref`` or by +the related object's attribute values, using XML attributes of the +same name. In the latter case, the attribute values must uniquely +define the related object. + +The object elements may include one-to-many relations. In this case, +the related objects will be created along with the parent in one +single cascading call. Alternatively, these related objects may be +added separately as subelements of the ``data`` element later in the +file. In the present example, the Grouping object include their +related UserGroup objects. Note that these UserGroups include their +relation to the User. The User object is referenced by their +respective id in the ``ref`` attribute. But the UserGroups do not +include their relation with Grouping. 
That relationship is implied by +the parent relation of the object in the file. + +In a similar way, the Investigation in the second chunk includes +related InvestigationGroups that will be created along with the +Investigation. The InvestigationGroup objects include a reference to +the corresponding Grouping. Note that these references go across +chunk boundaries. The index that caches the object ids to resolve +object relations from the first chunk that did contain the ids of the +Groupings will already have been discarded from memeory when the +second chunk is read. But the references use the key that can be +passed to :meth:`icat.client.Client.searchUniqueKey` to search these +Groupings from ICAT. + +Finally note the the file format also depends on the ICAT schema +version: the present example can only be ingested into ICAT server 5.0 +or newer, because the attributes fileCount and fileSize have been +added to Investigation in this version. With older ICAT versions, it +will fail because the attributes are not defined. + +Consider a second example, it defines a subset of the same content +as the previous example: + .. literalinclude:: ../examples/icatdump-simple-2.xml :language: xml + :lines: 1-9,28-52,56-58,70-82,108 + +The difference is that we now add the Usergroup objects separately in +direct subelements of ``data`` instead of including them in the +related Grouping objects. + +You will find more extensive examples in the source distribution of +python-icat. The distribution also provides XML Schema Definition +files for the ICAT data XML file format corresponding to various ICAT +schema versions. ICAT data YAML files ~~~~~~~~~~~~~~~~~~~~ @@ -89,6 +159,7 @@ backend. .. literalinclude:: ../examples/icatdump-simple-2.yaml :language: yaml + :lines: 1-7,10-11,14,23-45,52-60 .. 
[#dc] There is one exception: DataCollections don't have a From ea4b9d60f6035f4a9604dc2579de23232a9b2e73 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Sat, 6 Jan 2024 18:59:53 +0100 Subject: [PATCH 20/43] Typo --- doc/src/scripts.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/src/scripts.rst b/doc/src/scripts.rst index 82f57d75..f944efde 100644 --- a/doc/src/scripts.rst +++ b/doc/src/scripts.rst @@ -2,7 +2,7 @@ Command line scripts ==================== This section provides a reference for the command line scripts that -are alongside with python-icat. +are installed alongside with python-icat. .. toctree:: :maxdepth: 1 From e62dc5ae52a8f4a835e96fad9a4b1976181b19e1 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Tue, 16 Jan 2024 12:36:27 +0100 Subject: [PATCH 21/43] - Review documentation Section "ICAT data XML files", adding more inline examples - Drop icatdump-simple-2.xml example, rename icatdump-simple-1.xml to icatdump-simple.xml --- doc/examples/icatdump-simple-2.xml | 108 ------------------ ...tdump-simple-1.xml => icatdump-simple.xml} | 0 doc/src/file-icatdata.rst | 108 +++++++++++++----- 3 files changed, 80 insertions(+), 136 deletions(-) delete mode 100644 doc/examples/icatdump-simple-2.xml rename doc/examples/{icatdump-simple-1.xml => icatdump-simple.xml} (100%) diff --git a/doc/examples/icatdump-simple-2.xml b/doc/examples/icatdump-simple-2.xml deleted file mode 100644 index 1c309602..00000000 --- a/doc/examples/icatdump-simple-2.xml +++ /dev/null @@ -1,108 +0,0 @@ - - - - 2024-01-03T13:27:37+00:00 - https://icat.example.com:8181/ICATService/ICAT?wsdl - 6.0.0 - icatdump (python-icat 1.2.0) - - - - Goethe University Frankfurt, Faculty of Philosophy and History - ahau@example.org - Hau - Arnold Hau - Arnold - db/ahau - 0000-0002-3263 - - - Goethe University Frankfurt, Faculty of Philosophy and History - ahau@example.org - Hau - Arnold Hau - Arnold - db/ahau - 0000-0002-3263 - - - Université Paul-Valéry Montpellier 3 - 
jbotu@example.org - Botul - Jean-Baptiste Botul - Jean-Baptiste - db/jbotu - 0000-0002-3264 - - - jdoe@example.org - Doe - John Doe - John - db/jdoe - - - University of Nancago - nbour@example.org - Bourbaki - Nicolas Bourbaki - Nicolas - db/nbour - 0000-0002-3266 - - - investigation_10100601-ST_owner - - - investigation_10100601-ST_reader - - - investigation_10100601-ST_writer - - - - - - - - - - - - - - - - - - - - - - - - - DOI:00.0815/inv-00601 - 2010-10-12T15:00:00+00:00 - 4 - 127125 - 10100601-ST - 2010-09-30T10:27:24+00:00 - Ni-Mn-Ga flat cone - 1.1-N - - - owner - - - - reader - - - - writer - - - - - diff --git a/doc/examples/icatdump-simple-1.xml b/doc/examples/icatdump-simple.xml similarity index 100% rename from doc/examples/icatdump-simple-1.xml rename to doc/examples/icatdump-simple.xml diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst index b856e625..fa82f96d 100644 --- a/doc/src/file-icatdata.rst +++ b/doc/src/file-icatdata.rst @@ -72,7 +72,7 @@ ICAT data XML files In this section we describe the ICAT data file format using the XML backend. Consider the following example: -.. literalinclude:: ../examples/icatdump-simple-1.xml +.. literalinclude:: ../examples/icatdump-simple.xml :language: xml The root element of ICAT data XML files is ``icatdata``. It may @@ -88,7 +88,8 @@ logical structure explained above. The present example contains two chunks. Each element within the ``data`` element corresponds to an ICAT object according to the ICAT schema. In the present example, the first chunk contains five User objects and three Grouping objects. -The second chunk only contains one Investigation. +The Groupings include related UserGroups. The second chunk only +contains one Investigation, including related investigationGroups. These object elements should have an ``id`` attribute that may be used to reference the object in relations later on. 
The ``id`` value has @@ -104,27 +105,87 @@ the related object's attribute values, using XML attributes of the same name. In the latter case, the attribute values must uniquely define the related object. +In the present example, consider the first grouping: + +.. code-block:: XML + + + investigation_10100601-ST_owner + + + + + +It includes a related userGroup object that in turn references a +related User. This User is referenced in the ``ref`` attribute using +a key defined in the User's ``id`` attribute earlier in the file. +Another example is how the Investigation references its Facility: + +.. code-block:: XML + + + + + + + +The Facility is not defined in the data file. It is assumed to exist +in ICAT before ingesting the file. In this case, it must be +referenced by the unique key that could have been obtained by calling +``facility.getUniqueKey()``. Alternatively, the Facility could have +been referenced by attribute as in: + +.. code-block:: XML + + + + + + + + The object elements may include one-to-many relations. In this case, the related objects will be created along with the parent in one -single cascading call. Alternatively, these related objects may be -added separately as subelements of the ``data`` element later in the -file. In the present example, the Grouping object include their -related UserGroup objects. Note that these UserGroups include their -relation to the User. The User object is referenced by their -respective id in the ``ref`` attribute. But the UserGroups do not -include their relation with Grouping. That relationship is implied by -the parent relation of the object in the file. - -In a similar way, the Investigation in the second chunk includes +single cascading call. In the present example, the Grouping objects +include their related UserGroup objects. Note that these UserGroups +include their relation to the User, but not their relation with +Grouping. 
The latter relationship is implied by the parent relation +of the object in the file. + +As an alternative, the Usergroups could have been added to the file as +separate objects as direct subelements of ``data`` as in: + +.. code-block:: XML + + + + Goethe University Frankfurt, Faculty of Philosophy and History + ahau@example.org + Hau + Arnold Hau + Arnold + db/ahau + 0000-0002-3263 + + + investigation_10100601-ST_owner + + + + + + + +The Investigation in the second chunk in the present example includes related InvestigationGroups that will be created along with the Investigation. The InvestigationGroup objects include a reference to the corresponding Grouping. Note that these references go across chunk boundaries. The index that caches the object ids to resolve object relations from the first chunk that did contain the ids of the -Groupings will already have been discarded from memeory when the -second chunk is read. But the references use the key that can be -passed to :meth:`icat.client.Client.searchUniqueKey` to search these -Groupings from ICAT. +Groupings will already have been discarded from memory when the second +chunk is read. But the references use the key that can be passed to +:meth:`icat.client.Client.searchUniqueKey` to search these Groupings +from ICAT. Finally note the the file format also depends on the ICAT schema version: the present example can only be ingested into ICAT server 5.0 @@ -132,21 +193,12 @@ or newer, because the attributes fileCount and fileSize have been added to Investigation in this version. With older ICAT versions, it will fail because the attributes are not defined. -Consider a second example, it defines a subset of the same content -as the previous example: - -.. 
literalinclude:: ../examples/icatdump-simple-2.xml - :language: xml - :lines: 1-9,28-52,56-58,70-82,108 - -The difference is that we now add the Usergroup objects separately in -direct subelements of ``data`` instead of including them in the -related Grouping objects. - You will find more extensive examples in the source distribution of python-icat. The distribution also provides XML Schema Definition files for the ICAT data XML file format corresponding to various ICAT -schema versions. +schema versions. Note the these XML Schema Definition +files are provided for reference only. The :ref:`icatingest` script +does not validate its input. ICAT data YAML files ~~~~~~~~~~~~~~~~~~~~ From a472fc795e82bc9901d89f869529b063b494242b Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Tue, 16 Jan 2024 14:24:44 +0100 Subject: [PATCH 22/43] Fix duplicate user entry in example ICAT data file --- doc/examples/icatdump-simple.xml | 9 --------- doc/src/file-icatdata.rst | 2 +- 2 files changed, 1 insertion(+), 10 deletions(-) diff --git a/doc/examples/icatdump-simple.xml b/doc/examples/icatdump-simple.xml index b2c23038..63dc689d 100644 --- a/doc/examples/icatdump-simple.xml +++ b/doc/examples/icatdump-simple.xml @@ -7,15 +7,6 @@ icatdump (python-icat 1.2.0) - - Goethe University Frankfurt, Faculty of Philosophy and History - ahau@example.org - Hau - Arnold Hau - Arnold - db/ahau - 0000-0002-3263 - Goethe University Frankfurt, Faculty of Philosophy and History ahau@example.org diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst index fa82f96d..84d587b2 100644 --- a/doc/src/file-icatdata.rst +++ b/doc/src/file-icatdata.rst @@ -87,7 +87,7 @@ The content of each ``data`` element is one chunk according to the logical structure explained above. The present example contains two chunks. Each element within the ``data`` element corresponds to an ICAT object according to the ICAT schema. 
In the present example, the -first chunk contains five User objects and three Grouping objects. +first chunk contains four User objects and three Grouping objects. The Groupings include related UserGroups. The second chunk only contains one Investigation, including related investigationGroups. From b3e30520d4dfce0bc92d028cd4f2afea31e5217e Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Tue, 16 Jan 2024 14:55:33 +0100 Subject: [PATCH 23/43] - Review documentation Section "ICAT data YAML files" - Drop icatdump-simple-2.yaml example, rename icatdump-simple-1.yaml to icatdump-simple.yaml --- doc/examples/icatdump-simple-2.yaml | 79 ------------------- ...ump-simple-1.yaml => icatdump-simple.yaml} | 0 doc/src/file-icatdata.rst | 73 +++++++++++++++-- 3 files changed, 67 insertions(+), 85 deletions(-) delete mode 100644 doc/examples/icatdump-simple-2.yaml rename doc/examples/{icatdump-simple-1.yaml => icatdump-simple.yaml} (100%) diff --git a/doc/examples/icatdump-simple-2.yaml b/doc/examples/icatdump-simple-2.yaml deleted file mode 100644 index 79e4a296..00000000 --- a/doc/examples/icatdump-simple-2.yaml +++ /dev/null @@ -1,79 +0,0 @@ -%YAML 1.1 -# Date: Wed, 03 Jan 2024 13:27:52 +0000 -# Service: https://icat.example.com:8181/ICATService/ICAT?wsdl -# ICAT-API: 6.0.0 -# Generator: icatdump (python-icat 1.2.0) ---- -grouping: - Grouping_name-investigation=5F10100601=2DST=5Fowner: - name: investigation_10100601-ST_owner - Grouping_name-investigation=5F10100601=2DST=5Freader: - name: investigation_10100601-ST_reader - Grouping_name-investigation=5F10100601=2DST=5Fwriter: - name: investigation_10100601-ST_writer -user: - User_name-db=2Fahau: - affiliation: Goethe University Frankfurt, Faculty of Philosophy and History - email: ahau@example.org - familyName: Hau - fullName: Arnold Hau - givenName: Arnold - name: db/ahau - orcidId: 0000-0002-3263 - User_name-db=2Fjbotu: - affiliation: "Universit\xE9 Paul-Val\xE9ry Montpellier 3" - email: jbotu@example.org - familyName: Botul - 
fullName: Jean-Baptiste Botul - givenName: Jean-Baptiste - name: db/jbotu - orcidId: 0000-0002-3264 - User_name-db=2Fjdoe: - email: jdoe@example.org - familyName: Doe - fullName: John Doe - givenName: John - name: db/jdoe - User_name-db=2Fnbour: - affiliation: University of Nancago - email: nbour@example.org - familyName: Bourbaki - fullName: Nicolas Bourbaki - givenName: Nicolas - name: db/nbour - orcidId: 0000-0002-3266 -userGroup: - UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fowner): - grouping: Grouping_name-investigation=5F10100601=2DST=5Fowner - user: User_name-db=2Fahau - UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fwriter): - grouping: Grouping_name-investigation=5F10100601=2DST=5Fwriter - user: User_name-db=2Fahau - UserGroup_user-(name-db=2Fjbotu)_grouping-(name-investigation=5F10100601=2DST=5Freader): - grouping: Grouping_name-investigation=5F10100601=2DST=5Freader - user: User_name-db=2Fjbotu - UserGroup_user-(name-db=2Fjdoe)_grouping-(name-investigation=5F10100601=2DST=5Freader): - grouping: Grouping_name-investigation=5F10100601=2DST=5Freader - user: User_name-db=2Fjdoe - UserGroup_user-(name-db=2Fnbour)_grouping-(name-investigation=5F10100601=2DST=5Freader): - grouping: Grouping_name-investigation=5F10100601=2DST=5Freader - user: User_name-db=2Fnbour ---- -investigation: - Investigation_facility-(name-ESNF)_name-10100601=2DST_visitId-1=2E1=2DN: - doi: DOI:00.0815/inv-00601 - endDate: '2010-10-12T15:00:00+00:00' - facility: Facility_name-ESNF - fileCount: 4 - fileSize: 127125 - investigationGroups: - - grouping: Grouping_name-investigation=5F10100601=2DST=5Fowner - role: owner - - grouping: Grouping_name-investigation=5F10100601=2DST=5Freader - role: reader - - grouping: Grouping_name-investigation=5F10100601=2DST=5Fwriter - role: writer - name: 10100601-ST - startDate: '2010-09-30T10:27:24+00:00' - title: Ni-Mn-Ga flat cone - visitId: 1.1-N diff --git 
a/doc/examples/icatdump-simple-1.yaml b/doc/examples/icatdump-simple.yaml similarity index 100% rename from doc/examples/icatdump-simple-1.yaml rename to doc/examples/icatdump-simple.yaml diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst index 84d587b2..a568969e 100644 --- a/doc/src/file-icatdata.rst +++ b/doc/src/file-icatdata.rst @@ -143,7 +143,6 @@ been referenced by attribute as in: - The object elements may include one-to-many relations. In this case, the related objects will be created along with the parent in one single cascading call. In the present example, the Grouping objects @@ -204,14 +203,76 @@ ICAT data YAML files ~~~~~~~~~~~~~~~~~~~~ In this section we describe the ICAT data file format using the YAML -backend. +backend. Consider the following example, it corresponds to the same +ICAT content as the XML example above: -.. literalinclude:: ../examples/icatdump-simple-1.yaml +.. literalinclude:: ../examples/icatdump-simple.yaml :language: yaml -.. literalinclude:: ../examples/icatdump-simple-2.yaml - :language: yaml - :lines: 1-7,10-11,14,23-45,52-60 +ICAT data YAML files start with a head consisting of a few comment +lines, followed by one or more YAML documents. YAML documents are +separated by a line containing only ``---``. The comments in the head +provide some information on the context of the creation of the data +file, which may be useful for debugging in case of issues. + +Each YAML document defines one chunk of data according to the logical +structure explained above. It consists of a mapping having the name +of entity types in the ICAT schema as keys. The values are in turn +mappings that map object ids as key to ICAT object definitions as +value. The object id may be used to reference that object in +relations later on. It has no meaning other than this file internal +referencing between objects. In the present example, the first chunk +contains four User objects and three Grouping objects. 
The Groupings +include related UserGroups. The second chunk only contains one +Investigation, including related investigationGroups. + +Each of the ICAT object definitions corresponds to an object in the +ICAT schema. It is again a mapping with the object's attribute and +relation names as keys and corresponding values. All many-to-one +relations must be provided and reference existing objects, e.g. they +must either already have existed before starting the ingestion or +appear in the same or an earlier YAML document in the ICAT data file. +The values of many-to-one relations are the related object's id, +either as defined in the same YAML document or the unique key as +returned by :meth:`icat.entity.Entity.getUniqueKey`. + +The object definitions may include one-to-many relations. In this +case, the value for the relation name is a list of object definitions +for the related objects. These related objects will be created along +with the parent in one single cascading call. In the present example, +the Grouping objects include their related UserGroup objects. Note +that these UserGroups include their relation to the User, but not +their relation with Grouping. The latter relationship is implied by +the parent relation of the object in the file. + +As an alternative, in the present example, the Usergroups could have +been added to the file as separate objects as in: + +.. 
code-block:: YAML + + --- + grouping: + Grouping_name-investigation=5F10100601=2DST=5Fowner: + name: investigation_10100601-ST_owner + user: + User_name-db=2Fahau: + affiliation: Goethe University Frankfurt, Faculty of Philosophy and History + email: ahau@example.org + familyName: Hau + fullName: Arnold Hau + givenName: Arnold + name: db/ahau + orcidId: 0000-0002-3263 + userGroup: + UserGroup_user-(name-db=2Fahau)_grouping-(name-investigation=5F10100601=2DST=5Fowner): + grouping: Grouping_name-investigation=5F10100601=2DST=5Fowner + user: User_name-db=2Fahau + --- + +Note that the entries in the mappings have no inherent order. The +:ref:`icatingest` script uses a predefined order to read the ICAT +entity types in order to make sure that referenced objects are created +before any object that may reference them. .. [#dc] There is one exception: DataCollections don't have a From acaff9d4ffe9987bc0324f2e4e01459dc5bd547c Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Tue, 16 Jan 2024 21:21:21 +0100 Subject: [PATCH 24/43] Review Section ICAT data files with respect to object references, add a Subsection References to ICAT objects and unique keys --- doc/src/file-icatdata.rst | 108 ++++++++++++++++++++------------------ 1 file changed, 57 insertions(+), 51 deletions(-) diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst index a568969e..6b8730cc 100644 --- a/doc/src/file-icatdata.rst +++ b/doc/src/file-icatdata.rst @@ -18,27 +18,19 @@ Logical structure of ICAT data files There is a one-to-one correspondence of the objects in the data file and the corresponding object in ICAT according to the ICAT schema, including all attributes and relations to other objects. -Special unique keys are used to encode the relations. -:meth:`icat.entity.Entity.getUniqueKey` may be used to get such a -unique key for an entity object and -:meth:`icat.client.Client.searchUniqueKey` may be used to search an -object by its key. 
Otherwise these keys should be considered as -opaque ids. Data files are partitioned in chunks. This is done to avoid having the whole file, e.g. the complete inventory of the ICAT, at once in memory. The problem is that objects contain references to other -objects (e.g. Datafiles refer to Datasets, the latter refer to -Investigations, and so forth). We keep an index of the objects as +objects, e.g. Datafiles refer to Datasets, the latter refer to +Investigations, and so forth. We keep an index of the objects as cache in order to resolve these references. But there is a memory -versus time tradeoff: we cannot keep all the objects in the index, -that would again mean the complete inventory of the ICAT. And we -can't know beforehand which object is going to be referenced later on, -so we don't know which one to keep and which one to discard from the -index. Fortunately we can query objects that we discarded once back -from the ICAT server. But this is expensive. So the strategy is as -follows: keep all objects from the current chunk in the index and -discard the complete index each time a chunk has been +versus time tradeoff: in order to avoid the index to grow beyond +bounds, objects need to be discarded from the index from time to time. +References to objects that can not be resolved from the index need to +be searched from the ICAT server, which of course is expensive. So +the strategy is as follows: keep all objects from the current chunk in +the index and discard the complete index each time a chunk has been processed. [#dc]_ This will work fine if objects are mostly referencing other objects from the same chunk and only a few references go across chunk boundaries. @@ -66,6 +58,26 @@ indirectly related to one of the included objects. In this case, only a reference to the related object will be included in the data file. The related object must have its own entry. 
+References to ICAT objects and unique keys +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +References to related objects are encoded in ICAT data files by +reference keys. There are two kinds of those keys: local keys and +unique keys. + +When an ICAT object is defined in the file, it generally defines a +local key at the same time. Local keys are stored in the object index +and may be used to reference this object from other obejcts in the +same data chunk. Unique keys can be obtained from an object by +calling :meth:`icat.entity.Entity.getUniqueKey`. An object can be +searched by its unique key from the ICAT server by calling +:meth:`icat.client.Client.searchUniqueKey`. As a result, it is +possible to reference an object by its unique key even if the +reference is not in the object index. All references that go across +chunk boundaries must use unique keys. [#dc]_ + +Reference keys should be considered as opaque ids. + ICAT data XML files ~~~~~~~~~~~~~~~~~~~ @@ -91,19 +103,17 @@ first chunk contains four User objects and three Grouping objects. The Groupings include related UserGroups. The second chunk only contains one Investigation, including related investigationGroups. -These object elements should have an ``id`` attribute that may be used -to reference the object in relations later on. The ``id`` value has -no meaning other than this file internal referencing between objects. -The subelements of the object elements correspond to the object's -attributes and relations in the ICAT schema. All many-to-one -relations must be provided and reference already existing objects, -e.g. they must either already have existed before starting the -ingestion or appear earlier in the ICAT data file than the referencing -object, so that they will be created earlier. The related object may -either be referenced by id using the special attribute ``ref`` or by -the related object's attribute values, using XML attributes of the -same name. 
In the latter case, the attribute values must uniquely -define the related object. +These object elements may have an ``id`` attribute that defines a +local key to reference the object later on. The subelements of the +object elements correspond to the object's attributes and relations in +the ICAT schema. All many-to-one relations must be provided and +reference already existing objects, e.g. they must either already have +existed before starting the ingestion or appear earlier in the ICAT +data file than the referencing object, so that they will be created +earlier. The related object may either be referenced by reference key +using the ``ref`` attribute or by the related object's attribute +values, using XML attributes of the same name. In the latter case, +the attribute values must uniquely define the related object. In the present example, consider the first grouping: @@ -118,8 +128,9 @@ In the present example, consider the first grouping: It includes a related userGroup object that in turn references a related User. This User is referenced in the ``ref`` attribute using -a key defined in the User's ``id`` attribute earlier in the file. -Another example is how the Investigation references its Facility: +a local key defined in the User's ``id`` attribute earlier in the +file. Another example is how the Investigation references its +Facility: .. code-block:: XML @@ -131,8 +142,7 @@ Another example is how the Investigation references its Facility: The Facility is not defined in the data file. It is assumed to exist in ICAT before ingesting the file. In this case, it must be -referenced by the unique key that could have been obtained by calling -``facility.getUniqueKey()``. Alternatively, the Facility could have +referenced by its unique key. Alternatively, the Facility could have been referenced by attribute as in: .. 
code-block:: XML @@ -179,14 +189,10 @@ The Investigation in the second chunk in the present example includes related InvestigationGroups that will be created along with the Investigation. The InvestigationGroup objects include a reference to the corresponding Grouping. Note that these references go across -chunk boundaries. The index that caches the object ids to resolve -object relations from the first chunk that did contain the ids of the -Groupings will already have been discarded from memory when the second -chunk is read. But the references use the key that can be passed to -:meth:`icat.client.Client.searchUniqueKey` to search these Groupings -from ICAT. - -Finally note the the file format also depends on the ICAT schema +chunk boundaries. Thus, unique keys for the Groupings need to be used +here. + +Finally note that the file format also depends on the ICAT schema version: the present example can only be ingested into ICAT server 5.0 or newer, because the attributes fileCount and fileSize have been added to Investigation in this version. With older ICAT versions, it @@ -219,12 +225,11 @@ Each YAML document defines one chunk of data according to the logical structure explained above. It consists of a mapping having the name of entity types in the ICAT schema as keys. The values are in turn mappings that map object ids as key to ICAT object definitions as -value. The object id may be used to reference that object in -relations later on. It has no meaning other than this file internal -referencing between objects. In the present example, the first chunk -contains four User objects and three Grouping objects. The Groupings -include related UserGroups. The second chunk only contains one -Investigation, including related investigationGroups. +value. These object ids define local keys that may be used to +reference the respective object later on. In the present example, the +first chunk contains four User objects and three Grouping objects. 
+The Groupings include related UserGroups. The second chunk only +contains one Investigation, including related investigationGroups. Each of the ICAT object definitions corresponds to an object in the ICAT schema. It is again a mapping with the object's attribute and @@ -232,9 +237,8 @@ relation names as keys and corresponding values. All many-to-one relations must be provided and reference existing objects, e.g. they must either already have existed before starting the ingestion or appear in the same or an earlier YAML document in the ICAT data file. -The values of many-to-one relations are the related object's id, -either as defined in the same YAML document or the unique key as -returned by :meth:`icat.entity.Entity.getUniqueKey`. +The values of many-to-one relations are reference keys, either local +keys defined in the same YAML document or unique keys. The object definitions may include one-to-many relations. In this case, the value for the relation name is a list of object definitions @@ -277,4 +281,6 @@ before any object that may reference them. .. [#dc] There is one exception: DataCollections don't have a uniqueness constraint and can't reliably be searched by - attributes. They are always kept in the index. + attributes. Therefore local keys for DataCollections are + always kept in the object index and may be used to reference + them across chunk boundaries. From 9c5085c63a4b92493a98cfc05adc0362c2f77968 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Thu, 18 Jan 2024 18:51:14 +0100 Subject: [PATCH 25/43] Rework Section ICAT data files once again --- doc/src/file-icatdata.rst | 189 +++++++++++++++++++------------------- 1 file changed, 97 insertions(+), 92 deletions(-) diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst index 6b8730cc..57183153 100644 --- a/doc/src/file-icatdata.rst +++ b/doc/src/file-icatdata.rst @@ -15,20 +15,16 @@ on the backend: python-icat provides backends using XML and YAML. 
Logical structure of ICAT data files ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -There is a one-to-one correspondence of the objects in the data -file and the corresponding object in ICAT according to the ICAT -schema, including all attributes and relations to other objects. - Data files are partitioned in chunks. This is done to avoid having the whole file, e.g. the complete inventory of the ICAT, at once in memory. The problem is that objects contain references to other objects, e.g. Datafiles refer to Datasets, the latter refer to Investigations, and so forth. We keep an index of the objects as -cache in order to resolve these references. But there is a memory +a cache in order to resolve these references. But there is a memory versus time tradeoff: in order to avoid the index to grow beyond bounds, objects need to be discarded from the index from time to time. References to objects that can not be resolved from the index need to -be searched from the ICAT server, which of course is expensive. So +be searched from the ICAT server, which is of course expensive. So the strategy is as follows: keep all objects from the current chunk in the index and discard the complete index each time a chunk has been processed. [#dc]_ This will work fine if objects are mostly @@ -40,37 +36,40 @@ but at the same time large enough to keep as many relations between objects as possible local in a chunk. It is in the responsibility of the writer of the data file to create the chunks in this manner. -The objects that get written to the data file and how this file is -organized is controlled by lists of ICAT search expressions or entity -objects, see :meth:`icat.dumpfile.DumpFileWriter.writeobjs`. There is -some degree of flexibility: an object may include related objects in -an one-to-many relation. In this case, these related objects should -not be added on their own again. For instance, you may write User, -Grouping, and UserGroup as separate objects into the file. 
In this -case, the UserGroup entries must properly reference related User and -Grouping. Alternatively you may include the UserGroups in the -corresponding Grouping objects. In this case, you must not add the -UserGroups again on their own. - -Objects related in a many-to-one relation must always be included in -the search expression. This is also true if the object is -indirectly related to one of the included objects. In this case, -only a reference to the related object will be included in the data -file. The related object must have its own entry. +The data chunks contain ICAT object definitions, e.g. serializations +of individual ICAT objects, including all attribute values and +many-to-one relations. The many-to-one relations are provided as +references to other objects that must exist in the ICAT server at the +moment that this object definition is read. + +There is some degree of flexibility with respect to related objects in +one-to-many relations: object definitions for these related objects +may be included in the object definitions of the parent object. When +the parent is read, these related objects will be created along with +the parent in one single cascading call. Thus, the related objects +must not be included again as a separate object in the ICAT data file. +For instance, an ICAT data file may include User, Grouping, and +UserGroup as separate objects. In this case, the UserGroup entries +must properly reference User and Grouping as their related objects. +Alternatively the file may only contain User and Grouping objects, +with the UserGroups being included into the object definition of the +corresponding Grouping objects. References to ICAT objects and unique keys ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ References to related objects are encoded in ICAT data files by -reference keys. There are two kinds of those keys: local keys and -unique keys. +reference keys. 
There are two kinds of those keys, local keys and +unique keys: When an ICAT object is defined in the file, it generally defines a local key at the same time. Local keys are stored in the object index and may be used to reference this object from other obejcts in the -same data chunk. Unique keys can be obtained from an object by -calling :meth:`icat.entity.Entity.getUniqueKey`. An object can be -searched by its unique key from the ICAT server by calling +same data chunk. + +Unique keys can be obtained from an object by calling +:meth:`icat.entity.Entity.getUniqueKey`. An object can be searched by +its unique key from the ICAT server by calling :meth:`icat.client.Client.searchUniqueKey`. As a result, it is possible to reference an object by its unique key even if the reference is not in the object index. All references that go across @@ -95,42 +94,80 @@ The ``head`` element will be ignored by :ref:`icatingest`. It serves to provide some information on the context of the creation of the data file, which may be useful for debugging in case of issues. -The content of each ``data`` element is one chunk according to the -logical structure explained above. The present example contains two -chunks. Each element within the ``data`` element corresponds to an -ICAT object according to the ICAT schema. In the present example, the -first chunk contains four User objects and three Grouping objects. -The Groupings include related UserGroups. The second chunk only -contains one Investigation, including related investigationGroups. +The content of each ``data`` element is one chunk, its subelements are +the ICAT object definitions according to the logical structure +explained above. The present example contains two chunks: the first +chunk contains four User objects and three Grouping objects. The +Groupings include related UserGroups. The second chunk only contains +one Investigation, including related InvestigationGroups. 
+ +The object elements may have an ``id`` attribute that define a local +key to reference the object later on. The subelements of the object +elements correspond to the object's attributes and relations in the +ICAT schema. All many-to-one relations must be provided and reference +already existing objects, e.g. they must either already have existed +before starting the ingestion or appear earlier in the ICAT data file +than the referencing object, so that they will be created earlier. +The related object may either be referenced by reference key using the +``ref`` attribute or by the related object's attribute values, using +XML attributes of the same name. In the latter case, the attribute +values must uniquely define the related object. + +Consider a simplified version of the first chunk from the present +example, defining only one User, Grouping and UserGroup respectively: -These object elements may have an ``id`` attribute that defines a -local key to reference the object later on. The subelements of the -object elements correspond to the object's attributes and relations in -the ICAT schema. All many-to-one relations must be provided and -reference already existing objects, e.g. they must either already have -existed before starting the ingestion or appear earlier in the ICAT -data file than the referencing object, so that they will be created -earlier. The related object may either be referenced by reference key -using the ``ref`` attribute or by the related object's attribute -values, using XML attributes of the same name. In the latter case, -the attribute values must uniquely define the related object. +.. 
code-block:: XML -In the present example, consider the first grouping: + + + Goethe University Frankfurt, Faculty of Philosophy and History + ahau@example.org + Hau + Arnold Hau + Arnold + db/ahau + 0000-0002-3263 + + + investigation_10100601-ST_owner + + + + + + +The Grouping includes the related UserGroup object that in turn +references the related User. This User is referenced in the ``ref`` +attribute using a local key defined in the User's ``id`` attribute. +Note that the UserGroup does not include its relation with Grouping. +The latter relationship is implied by the parent relation of the +object in the file. + +As an alternative, the Usergroup could have been added to the file as +separate object as direct subelement of ``data``: .. code-block:: XML - - investigation_10100601-ST_owner - + + + Goethe University Frankfurt, Faculty of Philosophy and History + ahau@example.org + Hau + Arnold Hau + Arnold + db/ahau + 0000-0002-3263 + + + investigation_10100601-ST_owner + + + - - + + -It includes a related userGroup object that in turn references a -related User. This User is referenced in the ``ref`` attribute using -a local key defined in the User's ``id`` attribute earlier in the -file. Another example is how the Investigation references its -Facility: +Another example is how the Investigation references its Facility: .. code-block:: XML @@ -153,44 +190,12 @@ been referenced by attribute as in: -The object elements may include one-to-many relations. In this case, -the related objects will be created along with the parent in one -single cascading call. In the present example, the Grouping objects -include their related UserGroup objects. Note that these UserGroups -include their relation to the User, but not their relation with -Grouping. The latter relationship is implied by the parent relation -of the object in the file. - -As an alternative, the Usergroups could have been added to the file as -separate objects as direct subelements of ``data`` as in: - -.. 
code-block:: XML - - - - Goethe University Frankfurt, Faculty of Philosophy and History - ahau@example.org - Hau - Arnold Hau - Arnold - db/ahau - 0000-0002-3263 - - - investigation_10100601-ST_owner - - - - - - - The Investigation in the second chunk in the present example includes related InvestigationGroups that will be created along with the Investigation. The InvestigationGroup objects include a reference to -the corresponding Grouping. Note that these references go across -chunk boundaries. Thus, unique keys for the Groupings need to be used -here. +the corresponding Grouping respectively. Note that these references +go across chunk boundaries. Thus, unique keys for the Groupings need +to be used here. Finally note that the file format also depends on the ICAT schema version: the present example can only be ingested into ICAT server 5.0 From 0fc8b0030dd2b4e9ccea39cc68c0fdac4fcf1e87 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Thu, 18 Jan 2024 20:13:40 +0100 Subject: [PATCH 26/43] Drop most of the docstring from module icat.dumpfile as this is now much better explained in the online documentation --- src/icat/dumpfile.py | 40 ---------------------------------------- 1 file changed, 40 deletions(-) diff --git a/src/icat/dumpfile.py b/src/icat/dumpfile.py index 099f4364..c5c5a002 100644 --- a/src/icat/dumpfile.py +++ b/src/icat/dumpfile.py @@ -5,46 +5,6 @@ writing ICAT data files. The actual work is done in file format specific modules that should provide subclasses that must implement the abstract methods. - -Data files are partitioned in chunks. This is done to avoid having -the whole file, e.g. the complete inventory of the ICAT, at once in -memory. The problem is that objects contain references to other -objects (e.g. Datafiles refer to Datasets, the latter refer to -Investigations, and so forth). We keep an index of the objects in -order to resolve these references. 
But there is a memory versus time -tradeoff: we cannot keep all the objects in the index, that would -again mean the complete inventory of the ICAT. And we can't know -beforehand which object is going to be referenced later on, so we -don't know which one to keep and which one to discard from the index. -Fortunately we can query objects we discarded once back from the ICAT -server with :meth:`icat.client.Client.searchUniqueKey`. But this is -expensive. So the strategy is as follows: keep all objects from the -current chunk in the index and discard the complete index each time a -chunk has been processed. This will work fine if objects are mostly -referencing other objects from the same chunk and only a few -references go across chunk boundaries. - -Therefore, we want these chunks to be small enough to fit into memory, -but at the same time large enough to keep as many relations between -objects as possible local in a chunk. It is in the responsibility of -the writer of the data file to create the chunks in this manner. - -The objects that get written to the data file and how this file is -organized is controlled by lists of ICAT search expressions, see -:meth:`icat.dumpfile.DumpFileWriter.writeobjs`. There is some degree -of flexibility: an object may include related objects in an -one-to-many relation, just by including them in the search expression. -In this case, these related objects should not have a search -expression on their own again. For instance, the search expression -for Grouping may include UserGroup. The UserGroups will then be -embedded in their respective grouping in the data file. There should -not be a search expression for UserGroup then. - -Objects related in a many-to-one relation must always be included in -the search expression. This is also true if the object is -indirectly related to one of the included objects. In this case, -only a reference to the related object will be included in the data -file. 
The related object must have its own list entry. """ from collections import ChainMap From 504d09179465fb18946bd0a690bff349a22d25b4 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Thu, 18 Jan 2024 20:49:36 +0100 Subject: [PATCH 27/43] Indicate in the documentation of icat.dumpfile which methods of class icat.dumpfile.DumpFileReader and class icat.dumpfile.DumpFileWriter are abstract and thus need to be implemented in the file format specific backend --- src/icat/dumpfile.py | 27 ++++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/src/icat/dumpfile.py b/src/icat/dumpfile.py index c5c5a002..a18832a3 100644 --- a/src/icat/dumpfile.py +++ b/src/icat/dumpfile.py @@ -99,6 +99,9 @@ def getdata(self): specific to the implementing backend and should be passed as the `data` argument to :meth:`~icat.dumpfile.DumpFileReader.getobjs_from_data`. + + This abstract method must be implemented in the file format + specific backend. """ raise NotImplementedError @@ -107,6 +110,9 @@ def getobjs_from_data(self, data, objindex): Yield a new entity object in each iteration. The object is initialized from the data, but not yet created at the client. + + This abstract method must be implemented in the file format + specific backend. """ raise NotImplementedError @@ -197,7 +203,11 @@ def __exit__(self, type, value, traceback): self.outfile.close() def head(self): - """Write a header with some meta information to the data file.""" + """Write a header with some meta information to the data file. + + This abstract method must be implemented in the file format + specific backend. + """ raise NotImplementedError def startdata(self): @@ -205,15 +215,26 @@ def startdata(self): If the current chunk contains any data, write it to the data file. + + This abstract method must be implemented in the file format + specific backend.
""" raise NotImplementedError def writeobj(self, key, obj, keyindex): - """Add an entity object to the current data chunk.""" + """Add an entity object to the current data chunk. + + This abstract method must be implemented in the file format + specific backend. + """ raise NotImplementedError def finalize(self): - """Finalize the data file.""" + """Finalize the data file. + + This abstract method must be implemented in the file format + specific backend. + """ raise NotImplementedError def writeobjs(self, objs, keyindex, chunksize=100): From a52714b1c066e55e43bef02cae7a37c767064cd9 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Fri, 19 Jan 2024 11:51:54 +0100 Subject: [PATCH 28/43] Minor language fixes --- doc/src/file-icatdata.rst | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst index 57183153..97efd819 100644 --- a/doc/src/file-icatdata.rst +++ b/doc/src/file-icatdata.rst @@ -103,15 +103,15 @@ one Investigation, including related InvestigationGroups. The object elements may have an ``id`` attribute that define a local key to reference the object later on. The subelements of the object -elements correspond to the object's attributes and relations in the -ICAT schema. All many-to-one relations must be provided and reference -already existing objects, e.g. they must either already have existed -before starting the ingestion or appear earlier in the ICAT data file -than the referencing object, so that they will be created earlier. -The related object may either be referenced by reference key using the -``ref`` attribute or by the related object's attribute values, using -XML attributes of the same name. In the latter case, the attribute -values must uniquely define the related object. +elements correspond to the object's attributes and relations according +to the ICAT schema. All many-to-one relations must be provided and +reference already existing objects, e.g. 
they must either already have +existed before starting the ingestion or appear earlier in the ICAT +data file than the referencing object, so that they will be created +earlier. The related object may either be referenced by reference key +using the ``ref`` attribute or by the related object's attribute +values, using XML attributes of the same name. In the latter case, +the attribute values must uniquely define the related object. Consider a simplified version of the first chunk from the present example, defining only one User, Grouping and UserGroup respectively: @@ -201,7 +201,7 @@ Finally note that the file format also depends on the ICAT schema version: the present example can only be ingested into ICAT server 5.0 or newer, because the attributes fileCount and fileSize have been added to Investigation in this version. With older ICAT versions, it -will fail because the attributes are not defined. +will fail because these attributes are not defined. You will find more extensive examples in the source distribution of python-icat. The distribution also provides XML Schema Definition From f0112b8f3fa88312ded8a6df1e110f1665176a7c Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Fri, 19 Jan 2024 14:04:54 +0100 Subject: [PATCH 29/43] Add first input to Section Metadata ingest files --- doc/src/file-icatdata.rst | 8 ++++++ doc/src/file-icatingest.rst | 51 +++++++++++++++++++++++++++++++++++++ 2 files changed, 59 insertions(+) diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst index 97efd819..73e84f3e 100644 --- a/doc/src/file-icatdata.rst +++ b/doc/src/file-icatdata.rst @@ -12,6 +12,8 @@ logic for reading and writing the files is provided by the The actual file format depends on the version of the ICAT schema and on the backend: python-icat provides backends using XML and YAML. +.. 
_ICAT-data-files-structure: + Logical structure of ICAT data files ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -55,6 +57,8 @@ Alternatively the file may only contain User and Grouping objects, with the UserGroups being included into the object definition of the corresponding Grouping objects. +.. _ICAT-data-files-references: + References to ICAT objects and unique keys ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -77,6 +81,8 @@ chunk boundaries must use unique keys. [#dc]_ Reference keys should be considered as opaque ids. +.. _ICAT-data-xml-files: + ICAT data XML files ~~~~~~~~~~~~~~~~~~~ @@ -210,6 +216,8 @@ schema versions. Note the these XML Schema Definition files are provided for reference only. The :ref:`icatingest` script does not validate its input. +.. _ICAT-data-yaml-files: + ICAT data YAML files ~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst index 04954679..c7103833 100644 --- a/doc/src/file-icatingest.rst +++ b/doc/src/file-icatingest.rst @@ -3,4 +3,55 @@ Metadata ingest files ===================== +Metadata ingest files are the input format for class +:class:`icat.ingest.IngestReader`. This class is intended to be uesd +in scripts that read the metadata created by experimments into ICAT. +The file format is basically a restricted version of +:ref:`ICAT-data-xml-files`. +The underlying idea is that ICAT data files are in principle suitable +to encode the metadata to be ingested from the experiment. The only +problem is that this file format is too powerful: it can encode any +ICAT content. We want the ingest files from the experiment to create +new Datasets and DatasetParameters, we certainly don't want these +files to create new Instruments or Users in ICAT. And we also want to +control the Investigation that newly created Datasets will be added +to. It would be rather difficult to control the power of the input +format if we would use plain ICAT data files for this purpose. 
+ +Class :class:`icat.ingest.IngestReader` takes an ``investigation`` +argument. We will refer to the Investigation given in this argument +as the *prescribed Investigation* in the following. The metadata +ingest file format restricts ICAT data XML files in the following +ways: + +* ingest files must contain one and only one ``data`` element, + e.g. chunks according to the :ref:`ICAT-data-files-structure`. + +* the allowed object types are restricted to Dataset, + DatasetInstrument, DatasetTechnique, and DatasetParameter. + +* the attributes in the object definitions for Datasets are restricted + to name, description, startDate, and endDate. + +* object definitions for Datasets can not include a reference to the + related Investigation. The relation with the prescribed + Investigation will be implied. + +* object definitions for Datasets can reference a related Sample only + by name or by pid. A relation of the related Sample with the + prescribed Investigation will be implied. + +* references to the related Dataset in DatasetInstrument, + DatasetTechnique, and DatasetParameter definitions are restricted to + :ref:`local keys `. These objects can + thus only relate to Datasets defined in the same ingest file. + +* other object references are restricted to reference by attributes. + +These restrictions are enforced by validating the input against an XML +Schema Definition (XSD). + +Another change with respect to ICAT data XML files is that the name of +the root element is ``icatingest`` and that it must have a ``version`` +attrbute. 
From 048a98cb0d151a918262628601716cb7da1092c5 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Fri, 19 Jan 2024 14:29:11 +0100 Subject: [PATCH 30/43] Minor language fixes --- doc/src/file-icatingest.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst index c7103833..bf2af389 100644 --- a/doc/src/file-icatingest.rst +++ b/doc/src/file-icatingest.rst @@ -26,7 +26,7 @@ ingest file format restricts ICAT data XML files in the following ways: * ingest files must contain one and only one ``data`` element, - e.g. chunks according to the :ref:`ICAT-data-files-structure`. + e.g. one chunk according to the :ref:`ICAT-data-files-structure`. * the allowed object types are restricted to Dataset, DatasetInstrument, DatasetTechnique, and DatasetParameter. @@ -44,8 +44,8 @@ ways: * references to the related Dataset in DatasetInstrument, DatasetTechnique, and DatasetParameter definitions are restricted to - :ref:`local keys `. These objects can - thus only relate to Datasets defined in the same ingest file. + :ref:`local keys `. As a result, these + objects can only relate to Datasets defined in the same ingest file. * other object references are restricted to reference by attributes. 
From dd7473f5d07a608fdeaf153008ad61d1a2428565 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Fri, 2 Feb 2024 15:50:11 +0100 Subject: [PATCH 31/43] Add an example to the Metadata ingest files Section of the documentation --- MANIFEST.in | 1 + doc/examples/metadata.xml | 94 +++++++++++++++++++++++++++++++++++++ doc/src/file-icatingest.rst | 34 ++++++++++++++ 3 files changed, 129 insertions(+) create mode 100644 doc/examples/metadata.xml diff --git a/MANIFEST.in b/MANIFEST.in index a7c92f8b..655665c1 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -10,6 +10,7 @@ include doc/examples/icatdump-*.xml include doc/examples/icatdump-*.yaml include doc/examples/ingest-*.xml include doc/examples/metadata-*.xml +include doc/examples/metadata.xml include doc/icatdata*.xsd include doc/man/* include doc/tutorial/*.py diff --git a/doc/examples/metadata.xml b/doc/examples/metadata.xml new file mode 100644 index 00000000..121b0432 --- /dev/null +++ b/doc/examples/metadata.xml @@ -0,0 +1,94 @@ + + + + 2024-02-02T12:52:00+01:00 + metadata-writer 0.28 + + + + e202553 + Dy01Cp02 at 2.7 K + 2020-09-30T18:02:17+02:00 + 2020-09-30T20:18:36+02:00 + + + + + + + + + + e202554 + Dy01Cp02 at 5.1 K + 2020-09-30T20:29:19+02:00 + 2020-09-30T21:23:49+02:00 + + + + + + + + + + e202555 + Dy01Cp02 at 2.7 K + 2020-09-30T21:35:16+02:00 + 2020-09-30T23:04:27+02:00 + + + + + + + + + + e202556 + reference + 2020-09-30T23:04:31+02:00 + 2020-10-01T01:26:07+02:00 + + + + + + + + + neutron + + + + + 5.3 + + + + + 2.74103 + 2.7408 + 2.7414 + + + + + neutron + + + + + 5.3 + + + + + 5.1239 + 5.1045 + 5.1823 + + + + + diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst index bf2af389..2c650263 100644 --- a/doc/src/file-icatingest.rst +++ b/doc/src/file-icatingest.rst @@ -19,6 +19,9 @@ control the Investigation that newly created Datasets will be added to. It would be rather difficult to control the power of the input format if we would use plain ICAT data files for this purpose. 
+Differences compared to ICAT data XML files +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Class :class:`icat.ingest.IngestReader` takes an ``investigation`` argument. We will refer to the Investigation given in this argument as the *prescribed Investigation* in the following. The metadata @@ -55,3 +58,34 @@ Schema Definition (XSD). Another change with respect to ICAT data XML files is that the name of the root element is ``icatingest`` and that it must have a ``version`` attrbute. + +Example +~~~~~~~ + +Consider the following example: + +.. literalinclude:: ../examples/metadata.xml + :language: xml + +This file defines four Datasets with related objects. All datasets +have a ``name``, ``description``, ``startDate``, and ``endDate`` +attribute and include a relation with an Instrument and a Technique, +respectively. + +Note that the Datasets have no ``complete`` attribute and no relation +with Investigation or DatasetType respectively. All of these are +added with prescribed values by class +:class:`icat.ingest.IngestReader`. + +Some Datasets relate to Samples: the first two Datasets relate to the +same Sample, the third Dataset to another Sample, while the last +Dataset has no relation with any Sample. All Samples a referenced by +their name. Class :class:`icat.ingest.IngestReader` will add a +reference to the Investigation to this, so that only Samples that are +related to prescribed Investigation can actually be referenced. + +Some DatasetParameter are added as separate objects in the file. They +respectively reference their related Datasets using local keys that +are defined in the ``id`` attribute of the corresponding Dataset +earlier in the file. Alternatively, the DatasetParameter could have +been included into into the respective Datasets. 
From 634a4163b344acfd8360a3a274a479167fcf097b Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Mon, 12 Feb 2024 12:01:05 +0100 Subject: [PATCH 32/43] Language fixes in the documentation --- doc/src/file-icatdata.rst | 12 ++++++------ doc/src/file-icatingest.rst | 10 +++++----- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst index 73e84f3e..878c87f6 100644 --- a/doc/src/file-icatdata.rst +++ b/doc/src/file-icatdata.rst @@ -6,7 +6,7 @@ ICAT data files ICAT data files provide a way to serialize ICAT content to a flat file. These files are read by the :ref:`icatingest` and written by the :ref:`icatdump` command line scripts respectively. The program -logic for reading and writing the files is provided by the +logic for reading and writing the files is provided in the :mod:`icat.dumpfile` module. The actual file format depends on the version of the ICAT schema and @@ -62,13 +62,13 @@ corresponding Grouping objects. References to ICAT objects and unique keys ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -References to related objects are encoded in ICAT data files by +References to related objects are encoded in ICAT data files with reference keys. There are two kinds of those keys, local keys and unique keys: When an ICAT object is defined in the file, it generally defines a local key at the same time. Local keys are stored in the object index -and may be used to reference this object from other obejcts in the +and may be used to reference this object from other objects in the same data chunk. Unique keys can be obtained from an object by calling @@ -149,7 +149,7 @@ Note that the UserGroup does not include its relation with Grouping. The latter relationship is implied by the parent relation of the object in the file. 
-As an alternative, the Usergroup could have been added to the file as +As an alternative, the UserGroup could have been added to the file as separate object as direct subelement of ``data``: .. code-block:: XML @@ -262,7 +262,7 @@ that these UserGroups include their relation to the User, but not their relation with Grouping. The latter relationship is implied by the parent relation of the object in the file. -As an alternative, in the present example, the Usergroups could have +As an alternative, in the present example, the UserGroups could have been added to the file as separate objects as in: .. code-block:: YAML @@ -292,7 +292,7 @@ entity types in order to make sure that referenced objects are created before any object that may reference them. -.. [#dc] There is one exception: DataCollections don't have a +.. [#dc] There is one exception: DataCollections doesn't have a uniqueness constraint and can't reliably be searched by attributes. Therefore local keys for DataCollections are always kept in the object index and may be used to reference diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst index 2c650263..20b853f2 100644 --- a/doc/src/file-icatingest.rst +++ b/doc/src/file-icatingest.rst @@ -4,8 +4,8 @@ Metadata ingest files ===================== Metadata ingest files are the input format for class -:class:`icat.ingest.IngestReader`. This class is intended to be uesd -in scripts that read the metadata created by experimments into ICAT. +:class:`icat.ingest.IngestReader`. This class is intended to be used +in scripts that read the metadata created by experiments into ICAT. The file format is basically a restricted version of :ref:`ICAT-data-xml-files`. @@ -57,7 +57,7 @@ Schema Definition (XSD). Another change with respect to ICAT data XML files is that the name of the root element is ``icatingest`` and that it must have a ``version`` -attrbute. +attribute. 
Example ~~~~~~~ @@ -79,8 +79,8 @@ added with prescribed values by class Some Datasets relate to Samples: the first two Datasets relate to the same Sample, the third Dataset to another Sample, while the last -Dataset has no relation with any Sample. All Samples a referenced by -their name. Class :class:`icat.ingest.IngestReader` will add a +Dataset has no relation with any Sample. All Samples are referenced +by their name. Class :class:`icat.ingest.IngestReader` will add a reference to the Investigation to this, so that only Samples that are related to prescribed Investigation can actually be referenced. From 987f22ed0c53df433e7406d7d966c5592109ebb0 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Mon, 12 Feb 2024 12:19:22 +0100 Subject: [PATCH 33/43] Documentation fix: also the relation to DatasetType is added by IngestReader --- doc/src/file-icatingest.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst index 20b853f2..4ba46517 100644 --- a/doc/src/file-icatingest.rst +++ b/doc/src/file-icatingest.rst @@ -37,9 +37,10 @@ ways: * the attributes in the object definitions for Datasets are restricted to name, description, startDate, and endDate. -* object definitions for Datasets can not include a reference to the - related Investigation. The relation with the prescribed - Investigation will be implied. +* object definitions for Datasets can not include references to the + related Investigation or DatasetType. These relation will be added + by :class:`icat.ingest.IngestReader`. The relation to the + Investigation will be set to the prescribed Investigation. * object definitions for Datasets can reference a related Sample only by name or by pid. 
A relation of the related Sample with the From 21235fadd3be2ae519b358f3d6e4aa923979149f Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Mon, 12 Feb 2024 13:45:56 +0100 Subject: [PATCH 34/43] - add a note on the versioning to metadata ingest file documentation - move the versionchanged note about adding icatingest 1.1 from documentation on module ingest to the metadata ingest file page --- doc/src/file-icatingest.rst | 10 ++++++++++ doc/src/ingest.rst | 3 --- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst index 4ba46517..22c77814 100644 --- a/doc/src/file-icatingest.rst +++ b/doc/src/file-icatingest.rst @@ -19,6 +19,16 @@ control the Investigation that newly created Datasets will be added to. It would be rather difficult to control the power of the input format if we would use plain ICAT data files for this purpose. +.. note:: + The metadata ingest file format is versioned. This version number + is independent from the python-icat version. It is incremented + only when the format changes. The latest version of the metadata + ingest file format is 1.1. + +.. versionchanged:: 1.2.0 + add metadata ingest file format version 1.1: add support for + relating Datasets with Samples. + Differences compared to ICAT data XML files ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/src/ingest.rst b/doc/src/ingest.rst index 72eeb07a..ab6db393 100644 --- a/doc/src/ingest.rst +++ b/doc/src/ingest.rst @@ -52,9 +52,6 @@ reference to a ``Sample``. That ``Sample`` objects needs to exist beforehand and needs to be related to the same ``Investigation`` as the ``Dataset``. -.. versionchanged:: 1.2.0 - add version 1.1 of the ingest file format, including references to samples - .. 
autoclass:: icat.ingest.IngestReader :members: :show-inheritance: From c3d360b997e25700228fc773e4d9b0625108208b Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Mon, 12 Feb 2024 14:07:53 +0100 Subject: [PATCH 35/43] Update documentation for module icat.ingest taking into account the new file format documentation --- doc/src/ingest.rst | 31 +++++++++---------------------- 1 file changed, 9 insertions(+), 22 deletions(-) diff --git a/doc/src/ingest.rst b/doc/src/ingest.rst index ab6db393..9ab94740 100644 --- a/doc/src/ingest.rst +++ b/doc/src/ingest.rst @@ -11,7 +11,7 @@ even in minor releases of python-icat. This module provides class :class:`icat.ingest.IngestReader` that -reads metadata from an XML file to add them to ICAT. It is designed +reads :ref:`ICAT-ingest-files` to add them to ICAT. It is designed for the use case of ingesting metadata for datasets created during experiments. @@ -21,22 +21,14 @@ that base class in restricting the vocabular of the input file: only objects that need to be created during ingestion from the experiment may appear in the input. This restriction is enforced by first validating the input against an XML Schema Definition (XSD). In a -second step, the input is transformed into generic XML :ref:`ICAT data -file ` format using an XSL Transformation (XSLT) and -then fed into :class:`~icat.dumpfile_xml.XMLDumpFileReader`. The -format of the input files may be customized to some extent by providing -custom versions of XSD and XSLT files, see :ref:`ingest-customize` -below. - -The input accepted by :class:`~icat.ingest.IngestReader` consists of -one or more ``Dataset`` objects that all need to relate to the same -``Investigation`` and any number of related ``DatasetTechnique``, -``DatasetInstrument``, and ``DatasetParameter`` objects. The -``Investigation`` must exist beforehand in ICAT. The relation from -the ``Dataset`` objects to the ``Investigation`` will be set by -:class:`~icat.ingest.IngestReader` accordingly. 
(Actually, the XSLT -will add that attribute to the datasets in the input.) The -``Dataset`` objects will not be created by +second step, the input is transformed into generic :ref:`ICAT data XML +file format ` using an XSL Transformation (XSLT) +and then fed into :class:`~icat.dumpfile_xml.XMLDumpFileReader`. The +format of the input files may be customized to some extent by +providing custom versions of XSD and XSLT files, see +:ref:`ingest-customize` below. + +The ``Dataset`` objects in the input will not be created by :class:`~icat.ingest.IngestReader`, because it is assumed that a separate workflow in the caller will copy the content of datafiles to the storage managed by IDS and create the corresponding ``Dataset`` @@ -47,11 +39,6 @@ of the datasets will be read from the input file and set in the ``DatasetTechnique``, ``DatasetInstrument`` and ``DatasetParameter`` objects read from the input file in ICAT. -Using ingest file format 1.1, ``Dataset`` objects may also include a -reference to a ``Sample``. That ``Sample`` objects needs to exist -beforehand and needs to be related to the same ``Investigation`` as -the ``Dataset``. - .. autoclass:: icat.ingest.IngestReader :members: :show-inheritance: From f2b9657153cbb87d66bf7bfaa272f3fd89466d5e Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Mon, 12 Feb 2024 14:15:12 +0100 Subject: [PATCH 36/43] Another language fix --- doc/src/file-icatingest.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst index 22c77814..9794ba75 100644 --- a/doc/src/file-icatingest.rst +++ b/doc/src/file-icatingest.rst @@ -15,8 +15,8 @@ problem is that this file format is too powerful: it can encode any ICAT content. We want the ingest files from the experiment to create new Datasets and DatasetParameters, we certainly don't want these files to create new Instruments or Users in ICAT. 
And we also want to -control the Investigation that newly created Datasets will be added -to. It would be rather difficult to control the power of the input +control to which Investigation newly created Datasets are going to be +added. It would be rather difficult to control the power of the input format if we would use plain ICAT data files for this purpose. .. note:: From 9ad05b5962c3279a4e6be85e56c96337980ab242 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Mon, 12 Feb 2024 14:22:56 +0100 Subject: [PATCH 37/43] Yet another language fix --- doc/src/file-icatingest.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/src/file-icatingest.rst b/doc/src/file-icatingest.rst index 9794ba75..7348259f 100644 --- a/doc/src/file-icatingest.rst +++ b/doc/src/file-icatingest.rst @@ -93,7 +93,7 @@ same Sample, the third Dataset to another Sample, while the last Dataset has no relation with any Sample. All Samples are referenced by their name. Class :class:`icat.ingest.IngestReader` will add a reference to the Investigation to this, so that only Samples that are -related to prescribed Investigation can actually be referenced. +related to the prescribed Investigation can actually be referenced. Some DatasetParameter are added as separate objects in the file. They respectively reference their related Datasets using local keys that From 3eefef5a94405e290705a9f89aa2cb208976d7a5 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Mon, 12 Feb 2024 14:41:22 +0100 Subject: [PATCH 38/43] Add kink anchors to the entries for each version in the changelog in order to provide more stable permalinks --- CHANGES.rst | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) diff --git a/CHANGES.rst b/CHANGES.rst index 1744b6ef..a152e835 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -2,6 +2,8 @@ Changelog ========= +.. _changes-1_3_0: + 1.3.0 (not yet released) ~~~~~~~~~~~~~~~~~~~~~~~~ @@ -36,6 +38,8 @@ Bug fixes and minor changes .. 
_#147: https://github.com/icatproject/python-icat/pull/147 +.. _changes-1_2_0: + 1.2.0 (2023-10-31) ~~~~~~~~~~~~~~~~~~ @@ -84,6 +88,8 @@ Bug fixes and minor changes .. _#140: https://github.com/icatproject/python-icat/pull/140 +.. _changes-1_1_0: + 1.1.0 (2023-06-30) ~~~~~~~~~~~~~~~~~~ @@ -139,6 +145,8 @@ Bug fixes and minor changes .. _#129: https://github.com/icatproject/python-icat/pull/129 +.. _changes-1_0_0: + 1.0.0 (2022-12-21) ~~~~~~~~~~~~~~~~~~ @@ -231,6 +239,8 @@ Bug fixes and minor changes .. _#106: https://github.com/icatproject/python-icat/pull/106 +.. _changes-0_21_0: + 0.21.0 (2022-01-28) ~~~~~~~~~~~~~~~~~~~ @@ -249,6 +259,8 @@ New features .. _#100: https://github.com/icatproject/python-icat/pull/100 +.. _changes-0_20_1: + 0.20.1 (2021-11-04) ~~~~~~~~~~~~~~~~~~~ @@ -260,6 +272,8 @@ Bug fixes and minor changes .. _#96: https://github.com/icatproject/python-icat/pull/96 +.. _changes-0_20_0: + 0.20.0 (2021-10-29) ~~~~~~~~~~~~~~~~~~~ @@ -296,6 +310,8 @@ Bug fixes and minor changes .. _#95: https://github.com/icatproject/python-icat/pull/95 +.. _changes-0_19_0: + 0.19.0 (2021-07-20) ~~~~~~~~~~~~~~~~~~~ @@ -324,6 +340,8 @@ Bug fixes and minor changes .. _#85: https://github.com/icatproject/python-icat/pull/85 +.. _changes-0_18_1: + 0.18.1 (2021-04-13) ~~~~~~~~~~~~~~~~~~~ @@ -341,6 +359,8 @@ Bug fixes and minor changes .. _#82: https://github.com/icatproject/python-icat/pull/82 +.. _changes-0_18_0: + 0.18.0 (2021-03-29) ~~~~~~~~~~~~~~~~~~~ @@ -377,6 +397,8 @@ Bug fixes and minor changes .. _#80: https://github.com/icatproject/python-icat/pull/80 +.. _changes-0_17_0: + 0.17.0 (2020-04-30) ~~~~~~~~~~~~~~~~~~~ @@ -468,6 +490,8 @@ Misc .. _#72: https://github.com/icatproject/python-icat/issues/72 +.. _changes-0_16_0: + 0.16.0 (2019-09-26) ~~~~~~~~~~~~~~~~~~~ @@ -492,6 +516,8 @@ Bug fixes and minor changes .. _#60: https://github.com/icatproject/python-icat/pull/60 +.. 
_changes-0_15_1: + 0.15.1 (2019-07-12) ~~~~~~~~~~~~~~~~~~~ @@ -513,6 +539,8 @@ Bug fixes and minor changes .. _#57: https://github.com/icatproject/python-icat/issues/57 +.. _changes-0_15_0: + 0.15.0 (2019-03-27) ~~~~~~~~~~~~~~~~~~~ @@ -551,6 +579,8 @@ Bug fixes and minor changes .. _#54: https://github.com/icatproject/python-icat/issues/54 +.. _changes-0_14_2: + 0.14.2 (2018-10-25) ~~~~~~~~~~~~~~~~~~~ @@ -563,6 +593,8 @@ Bug fixes and minor changes probably not need it. +.. _changes-0_14_1: + 0.14.1 (2018-06-05) ~~~~~~~~~~~~~~~~~~~ @@ -573,6 +605,8 @@ Bug fixes and minor changes for the Write API call. +.. _changes-0_14_0: + 0.14.0 (2018-06-01) ~~~~~~~~~~~~~~~~~~~ @@ -628,6 +662,8 @@ Bug fixes and minor changes .. _#48: https://github.com/icatproject/python-icat/issues/48 +.. _changes-0_13_1: + 0.13.1 (2017-07-12) ~~~~~~~~~~~~~~~~~~~ @@ -640,6 +676,8 @@ Bug fixes and minor changes .. _#38: https://github.com/icatproject/python-icat/issues/38 +.. _changes-0_13_0: + 0.13.0 (2017-06-09) ~~~~~~~~~~~~~~~~~~~ @@ -798,6 +836,8 @@ Bug fixes and minor changes .. _pytest-dependency: https://pypi.python.org/pypi/pytest_dependency/ +.. _changes-0_12_0: + 0.12.0 (2016-10-10) ~~~~~~~~~~~~~~~~~~~ @@ -837,6 +877,8 @@ Bug fixes and minor changes .. _#28: https://github.com/icatproject/python-icat/issues/28 +.. _changes-0_11_0: + 0.11.0 (2016-06-01) ~~~~~~~~~~~~~~~~~~~ @@ -896,6 +938,8 @@ Misc .. _distutils_pytest: https://github.com/RKrahl/distutils-pytest +.. _changes-0_10_0: + 0.10.0 (2015-12-06) ~~~~~~~~~~~~~~~~~~~ @@ -964,6 +1008,8 @@ Bug fixes and minor changes .. _#15: https://github.com/icatproject/python-icat/issues/15 +.. _changes-0_9_0: + 0.9.0 (2015-08-13) ~~~~~~~~~~~~~~~~~~ @@ -1067,6 +1113,8 @@ Bug fixes and minor changes .. _#10: https://github.com/icatproject/python-icat/issues/10 +.. _changes-0_8_0: + 0.8.0 (2015-05-08) ~~~~~~~~~~~~~~~~~~ @@ -1156,6 +1204,8 @@ Bug fixes and minor changes :meth:`icat.query.Query.__repr__`. +.. 
_changes-0_7_0: + 0.7.0 (2015-02-11) ~~~~~~~~~~~~~~~~~~ @@ -1187,6 +1237,8 @@ New features :meth:`icat.ids.IDSClient.getLink` method. +.. _changes-0_6_0: + 0.6.0 (2014-12-15) ~~~~~~~~~~~~~~~~~~ @@ -1314,6 +1366,8 @@ Minor changes and fixes + Add comparison operators to class :class:`icat.listproxy.ListProxy`. +.. _changes-0_5_1: + 0.5.1 (2014-07-07) ~~~~~~~~~~~~~~~~~~ @@ -1357,6 +1411,8 @@ Minor changes and fixes modifications, such as running 2to3 on them. +.. _changes-0_5_0: + 0.5.0 (2014-06-24) ~~~~~~~~~~~~~~~~~~ @@ -1399,6 +1455,8 @@ Minor changes and fixes .. __: https://github.com/icatproject/icat.server/issues/112 +.. _changes-0_4_0: + 0.4.0 (2014-02-11) ~~~~~~~~~~~~~~~~~~ @@ -1446,6 +1504,8 @@ Minor changes and fixes :ref:`icatrestore `. +.. _changes-0_3_0: + 0.3.0 (2014-01-10) ~~~~~~~~~~~~~~~~~~ @@ -1492,6 +1552,8 @@ Minor changes and fixes + Add example scripts :ref:`icatdump` and :ref:`icatrestore `. +.. _changes-0_2_0: + 0.2.0 (2013-11-18) ~~~~~~~~~~~~~~~~~~ @@ -1532,6 +1594,8 @@ Minor changes and fixes import :mod:`icat` and :mod:`icat.config`. +.. 
_changes-0_1_0: + 0.1.0 (2013-11-01) ~~~~~~~~~~~~~~~~~~ From f1f2b73fe933898a2ed8f6dca5dfa861a9fa2c3d Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Mon, 19 Feb 2024 15:51:29 +0100 Subject: [PATCH 39/43] Dynamically create a file _meta.rst in the documentation source that defines substitutions and download links for the latest source distribution and signature file --- doc/.gitignore | 1 + doc/Makefile | 1 + doc/src/conf.py | 29 ++++++++++++++++++++++++++++- 3 files changed, 30 insertions(+), 1 deletion(-) diff --git a/doc/.gitignore b/doc/.gitignore index e938dd2d..b6a292cd 100644 --- a/doc/.gitignore +++ b/doc/.gitignore @@ -1,3 +1,4 @@ +/src/_meta.rst /devhelp/ /dirhtml/ /doctest/ diff --git a/doc/Makefile b/doc/Makefile index 9cc7cebc..7358c71a 100644 --- a/doc/Makefile +++ b/doc/Makefile @@ -20,6 +20,7 @@ $(BUILDERS): $(STATIC_SOURCEDIRS) distclean: rm -rf doctrees $(BUILDERS) + rm -f src/_meta.rst $(STATIC_SOURCEDIRS): mkdir $@ diff --git a/doc/src/conf.py b/doc/src/conf.py index 2f880389..1496d62a 100644 --- a/doc/src/conf.py +++ b/doc/src/conf.py @@ -9,7 +9,8 @@ from pathlib import Path import sys -maindir = Path(__file__).resolve().parent.parent.parent +docsrcdir = Path(__file__).resolve().parent +maindir = docsrcdir.parent.parent buildlib = maindir / "build" / "lib" sys.path[0] = str(buildlib) sys.dont_write_bytecode = True @@ -28,6 +29,32 @@ # The short X.Y version version = ".".join(release.split(".")[0:2]) +# Write a _meta.rst that defines some custom substitutions +def make_meta_rst(last_release): + template = """:orphan: + +.. |distribution_source| replace:: %(dist_src_name)s +.. |distribution_signature| replace:: %(dist_sig_name)s +.. _distribution_source: %(dist_src_url)s +.. 
_distribution_signature: %(dist_sig_url)s +""" + github_repo = "https://github.com/icatproject/python-icat" + dist_src_name = "python-icat-%s.tar.gz" % last_release + dist_src_url = ("%s/releases/download/%s/%s" + % (github_repo, last_release, dist_src_name)) + dist_sig_name = "python-icat-%s.tar.gz.asc" % last_release + dist_sig_url = ("%s/releases/download/%s/%s" + % (github_repo, last_release, dist_sig_name)) + subst = { + 'dist_src_name': dist_src_name, + 'dist_src_url': dist_src_url, + 'dist_sig_name': dist_sig_name, + 'dist_sig_url': dist_sig_url, + } + with (docsrcdir / '_meta.rst').open('wt') as f: + print(template % subst, file=f) + +make_meta_rst(icat._meta.release) # -- General configuration --------------------------------------------------- From 8f35940836c76ef500a465ca45b73163194fbe9e Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Mon, 19 Feb 2024 17:26:27 +0100 Subject: [PATCH 40/43] Review install instructions, explaining how to verify the signature --- doc/src/83F336432C7FCC91.pub | 44 ++++++++++++++++++++++++++++++++ doc/src/install.rst | 49 ++++++++++++++++++++++++++++-------- 2 files changed, 83 insertions(+), 10 deletions(-) create mode 100644 doc/src/83F336432C7FCC91.pub diff --git a/doc/src/83F336432C7FCC91.pub b/doc/src/83F336432C7FCC91.pub new file mode 100644 index 00000000..330f2f80 --- /dev/null +++ b/doc/src/83F336432C7FCC91.pub @@ -0,0 +1,44 @@ +-----BEGIN PGP PUBLIC KEY BLOCK----- + +mQENBFE3WkEBCADM4jKAQMsVlnU5NxbJ5JmpqhPRj54eSkDcvIjPcEQLkMmQjCDT +HHwN5ZjzHNTj7nXkvmjjWMgyzjpNmdUAofsh6MBp1etXNzYNkoEs+urRlw1wuRaU +NMK4Pf0G35THrQ0nJdmmCGkzxiTgQTitLVA52zZclq3Vqo/ZsO26gkLB2ErhZJZE +2q+TL6BBr98m+1zXpG5kqF/IE4pF4Yl1Oysp8imAAbodr+6X1DGfOM2h1NwMSbAo +Uw49hR4PIwxKP5Sluv6GNUVgyPaOrk8LVE4c+H0lswmz6nZOlxhhbtplN0KViqki +6pqyrOuwv3ZgzUXO4bjEexScyWe2PxKUzjFFABEBAAG0K1JvbGYgS3JhaGwgPHJv +bGYua3JhaGxAaGVsbWhvbHR6LWJlcmxpbi5kZT6JATkEEwECACMFAlE3WkECGwMH +CwkIBwMCAQYVCAIJCgsEFgIDAQIeAQIXgAAKCRCD8zZDLH/Mkcj5CAC0x2GU88xD 
+eBR1MyGq9nUTDgjO/EkiztDZirBg1FLGwCVtXY3yZc0nSriEj4oF8lNiGU539rU1 +R+z76UCDTlq/xq2/a1BazStkHuv+OuUfoA/Hl5/Tvp+dwk7BXG6dlyr6joT3i9Pz +RgH/kFe1RAJNnT/oy5LTRsydcWb/mCey/O/ON47zlKzNbbGvL6YPwmsyaO22vUmO +JsH4JZM36BDu3Wt2LPB+A51ZanzlxkfA3Mcc0cIe9PsSqufvnV/kG4cQxJedgXes +lVniggXbtsudl8EqmUpq/yS+/X3BLBfidTA2Yicx6udmR5ZFQHoCrOlcTfylW0mz +x5rhClZPgrgaiEYEExECAAYFAlQVluoACgkQUcvGPyCdlGaaRgCg0s2cWgUXWeb7 +noexGZNxnmQIMrgAoJqBXBVVrWfd7bwdWT1IEnyGMiCeiEYEExECAAYFAlQVlyYA +CgkQO0qCjX1HQDs8HACfduvRjIu+wmrvyN+ikPXHN6ZJYOAAni4k+F5m7P9RkUK/ +MPW34JrqaIg8iQIcBBMBAgAGBQJYRG+HAAoJEAihJkF1ND5uePgP/3okgaIQOwcy +7lN2SiP1k/UxjmqynrdrsTWdGRm+wyJ9Er9WlHgMQavaxk2XOpTQ8DcAuczpNyOb +qaYI6l+xd8mDvdJ7lbYZboiZj62nb/yUwRAyN3TJ7PRjuWXqLZjVnywQzYN66Z2v +kuxewEqZUeLVlUcg7IEwwCOErAmHFfYmIER7Q0Hyvc8gdkbFzgQ5UNHyLUngMe+6 +VGLlkoyRykF9DDCmqMQO06Ork78gsTVTHr0LEMG3HyKiQ8rLZouSQS9tiw7RVIji +nbf1EWRvVwgSXPSsx545uVwUOSyXlozK7AzFxjlFJU8G9+h1fXYlkviFPrsU2vwa +6q8GiVnaLpwa2QC9iznPTzSnUFh9Eqg8aO4DqpH28L+o5PTClmWUGncqigmYGipm +2s0AKdtRFVXcz7fmH8JKi9u9dBtJPIbdA3Kq/D6+1GkiS5V0aELWI+0424RJ5qlO +MHukVUxg0QH/MJnzfRT3MAV5gBpJC5KrijwS7FN8m+CQN35+OMoiBbpOKt/+wQgF +K31D/M55CZoaeVtkcLiTRjUig2Dwr/16IMd5IcpetNoIcUILDENcWh0mYo02kaJt +nldsZIAi77goxdgKu41AIIhEv0FmlXp6OB/QoEJRiDOVtxSW7bG1F+JbularecE2 +t5PehBq5k35vxo8tteL1xQIP+8nnOtUJuQENBFE3WkEBCADB84pLmmsdFjV5R+0e +zL2COBZBUxUPSIuKOdEfHkR5M5AxbXdg9GwxDMZE1TLAdX8sn1ymwUlZt6dSUFO0 +hg0LdZAOMvjvFb6dF+RE7gfeOsH0usTN32NUzW0/S1E2V8LRlplGIXtHa9YZArQw +k97gpFATheh4K/QHvrIyneVam+B+6WH8zJtBfGmWtjfBLwSiWohQPQAvYBW6hi86 ++I3z0yCrOhgM/N9uylgWu+BQzoQ8/Jv2g22bzSa1mbCP1OVp587HpJy9WbX/aKH4 +7I/vp0qLysWekbuX5OOjsiItW2Yv7oK/S7OtoagTUqX3KG1KRTJZHTTS03dy3DME +fqNtABEBAAGJAR8EGAECAAkFAlE3WkECGwwACgkQg/M2Qyx/zJEJcAgAsE8NNJYX +/3Vdd9WQih4Xg2Pvz66Z9jwTyS9Rb3boB0gtZMgqsHQBdF9iYNVxREpiVDPA0YKR +x1iTjFblt9Ryq7MZVPhRI1cfDfHKCw6bMz1hZDBRr1BSZVjiru74OCebreeOMhzI +zmyP7GSi0q5edZO0zpYkOlme3dQBatSkEAnSDOA9ct6EEMG3ZsQda1YXa9BMKj7e +B+UdFUdGb5SB8buW5RKLMTD485gKpvxWpYptP5DD3r3mThc2m5uWdiAM+jqm9Flc 
+NlD0bZ8tdZpbPOgxnbAuy7HEPaS/VnGZHouwZWpb484dynCO7+Oi1f2y2tPx0uXV +DRFDDLLR3oBEag== +=+2H3 +-----END PGP PUBLIC KEY BLOCK----- diff --git a/doc/src/install.rst b/doc/src/install.rst index 78fd935d..de15d475 100644 --- a/doc/src/install.rst +++ b/doc/src/install.rst @@ -1,11 +1,11 @@ +.. include:: _meta.rst + Install instructions ==================== -Release packages of python-icat are published in the `Python Package -Index (PyPI)`__. See :ref:`install-using-pip` for the short version -of the install instructions. +See :ref:`install-using-pip` for the short version of the install +instructions. -.. __: `PyPI site`_ System requirements @@ -114,26 +114,54 @@ Installation Installation using pip ...................... -You can install python-icat from PyPI using pip:: +You can install python-icat from the +`Python Package Index (PyPI) `_ using pip:: $ pip install python-icat +Note that while installing from PyPI is convenient, there is no way to +verify the integrity of the source distribution, which may be +considered a security risk. + Installation from the source distribution ......................................... Steps to manually build from the source distribution: -1. Download the sources, unpack, and change into the source directory. +1. Download the sources. + + From the `Release Page `_ you may download + the source distribution file |distribution_source|_ and the + detached signature file |distribution_signature|_ + +2. Check the signature (optional). + + You may verify the integrity of the source distribution by checking + the signature (showing the output for version 1.2.0 as an example):: + + $ gpg --verify python-icat-1.2.0.tar.gz.asc + gpg: assuming signed data in 'python-icat-1.2.0.tar.gz' + gpg: Signature made Tue Oct 31 07:01:55 2023 CET + gpg: using RSA key 760465DAF652737A61EC0C9D83F336432C7FCC91 + gpg: Good signature from "Rolf Krahl " [full] -2. 
Build:: + The signature should be made by the key + :download:`0x760465DAF652737A61EC0C9D83F336432C7FCC91 + <83F336432C7FCC91.pub>`. The fingerprint of that key is:: + + 7604 65DA F652 737A 61EC 0C9D 83F3 3643 2C7F CC91 + +3. Unpack and change into the source directory. + +4. Build (optional):: $ python setup.py build -3. Test (optional, see below):: +5. Test (optional, see below):: $ python setup.py test -4. Install:: +6. Install:: $ python setup.py install @@ -179,7 +207,6 @@ You can safely run the tests without configuring any test server. You will just get many skipped tests then. -.. _PyPI site: https://pypi.org/project/python-icat/ .. _setuptools: https://github.com/pypa/setuptools/ .. _packaging: https://github.com/pypa/packaging/ .. _suds-jurko: https://pypi.org/project/suds-jurko/ @@ -191,5 +218,7 @@ will just get many skipped tests then. .. _pytest: https://docs.pytest.org/en/latest/ .. _pytest-dependency: https://pypi.org/project/pytest-dependency/ .. _distutils-pytest: https://github.com/RKrahl/distutils-pytest/ +.. _PyPI site: https://pypi.org/project/python-icat/ +.. _GitHub latest release: https://github.com/icatproject/python-icat/releases/latest/ .. _GitHub repository: https://github.com/icatproject/python-icat/ .. 
_Issue #72: https://github.com/icatproject/python-icat/issues/72 From 67b947c31be3111b1610d0a2baf627006b58e557 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Mon, 19 Feb 2024 17:43:45 +0100 Subject: [PATCH 41/43] Fixup 8f35940: need to run doc/src/conf.py before doc8-check now --- .github/workflows/rst-lint.yaml | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/.github/workflows/rst-lint.yaml b/.github/workflows/rst-lint.yaml index b9b239f7..9205803a 100644 --- a/.github/workflows/rst-lint.yaml +++ b/.github/workflows/rst-lint.yaml @@ -11,6 +11,17 @@ jobs: steps: - name: Check out repository code uses: actions/checkout@v4 + - name: Set up Python 3.11 + uses: actions/setup-python@v4 + with: + python-version: 3.11 + - name: Install dependencies + run: | + pip install setuptools packaging git-props suds + - name: Run conf.py + run: | + python setup.py build + python doc/src/conf.py - name: doc8-check uses: deep-entertainment/doc8-action@v4 with: From 2a827fdcc2820e7d6f2d76477730d081f96119ef Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Mon, 19 Feb 2024 17:50:30 +0100 Subject: [PATCH 42/43] Aesthetic fix for rst-lint action: unshallow the checked out repository in order to get the correct version number in the diagnostics --- .github/workflows/rst-lint.yaml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/workflows/rst-lint.yaml b/.github/workflows/rst-lint.yaml index 9205803a..187ce87c 100644 --- a/.github/workflows/rst-lint.yaml +++ b/.github/workflows/rst-lint.yaml @@ -11,6 +11,8 @@ jobs: steps: - name: Check out repository code uses: actions/checkout@v4 + with: + fetch-depth: 0 - name: Set up Python 3.11 uses: actions/setup-python@v4 with: From 848a4745abd3aeaab680c2dc25b5f4d47c7fe357 Mon Sep 17 00:00:00 2001 From: Rolf Krahl Date: Mon, 19 Feb 2024 18:09:44 +0100 Subject: [PATCH 43/43] Some tweaks in the install instructions: - Point out that a manual install does not automatically install dependencies, - Removed yet another reference of 
PyPI to get release versions from, - Minor formulation fix. --- doc/src/install.rst | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/doc/src/install.rst b/doc/src/install.rst index de15d475..12d0bd41 100644 --- a/doc/src/install.rst +++ b/doc/src/install.rst @@ -126,7 +126,9 @@ considered a security risk. Installation from the source distribution ......................................... -Steps to manually build from the source distribution: +Note that the manual build does not automatically check the +dependencies. So we assume that you have all the systems requirements +installed. Steps to manually build from the source distribution: 1. Download the sources. @@ -172,9 +174,9 @@ Building from development sources ................................. For production use, it is always recommended to use the latest release -version from PyPI, see above. If you need some not yet released -bleeding edge feature or if you want to participate in the -development, you may also clone the `source repository from GitHub`__. +version, see above. If you need some not yet released bleeding edge +feature or if you want to participate in the development, you may also +clone the `source repository from GitHub`__. Note that some source files are dynamically created and thus missing in the development sources. If you want to build from the development @@ -203,8 +205,8 @@ authentication plugin must also have these users configured. from the test server and replace it with example content. Do not configure the tests to access a production server! -You can safely run the tests without configuring any test server. You -will just get many skipped tests then. +You can safely run the tests without configuring any test server. But +most of the test will be skipped then. .. _setuptools: https://github.com/pypa/setuptools/