Generators, slug rewrites and generic collections #2

stephenwf · 2024-01-04T12:46:49Z

Adds a few more hooks and fixes some bugs.

Note: This release will not have a build --watch option.

Generators

A generator is a script that can be added to a headless project and is called before the rest of the pipeline. A generator is tasked with creating a folder of IIIF manifests. It is given a folder in the cache and, unless specified otherwise, the contents are automatically added to the rest of the headless static site pipeline.

The generator is structured to maximise caching and parallelism. However, these are optional and you can just run it as a single monolithic function. The built-in example uses the NASA photo archive to make a search query and build manifests from the results. It uses the parallel steps.

All the steps available are:

Prepare - this step is required and should return the list of Manifests expected to be generated. It can also be used to do the whole generation. It is ALWAYS called when a build/generation happens, so any caching here has to be manual. However, a cached fetch() is provided.
Invalidate - this is a function that returns a boolean. If it returns true, the following generate steps will be called.
Generate Each - The list of resources returned by the prepare step is gathered and generateEach is called on each of them in parallel. Useful for async requests. Currently the generator does not have batching, so be careful if you are making lots of HTTP requests. In generateEach you are passed a directory to save the resource to, along with the data returned by the prepare step and the caches. As with other steps in the pipeline, you can return cache entries specific to this resource (by id) using return { cache: { anything: '...' } }
Generate - This is similar to Generate Each, but is only called once and passed a list of all the resources. It also has a single cache. You can implement both steps, or just one.
Post generate - This is called after the previous generation steps are done. You are passed the list of resources and the directory that they should now be saved to.
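
The order of the steps above can be sketched as a small stand-alone runner. This is only an illustration of the flow described in this PR, not the iiif-hss implementation: runGenerator, the postGenerate property name and the argument lists are assumptions, and it is also an assumption that the post generate step runs when invalidate returns false.

```javascript
// Hypothetical runner: shows the step order only, not iiif-hss internals.
async function runGenerator(gen, dir) {
  // prepare is always called and returns the list of expected resources.
  const resources = await gen.prepare();

  // If invalidate is defined and returns false, the generate steps are skipped.
  const shouldGenerate = gen.invalidate ? await gen.invalidate(resources) : true;
  const caches = {};

  if (shouldGenerate) {
    if (gen.generateEach) {
      // Every resource is processed in parallel, with no batching.
      const results = await Promise.all(
        resources.map((resource) => gen.generateEach(resource, dir))
      );
      results.forEach((result, i) => {
        // Cache entries returned by generateEach are kept per resource id.
        caches[resources[i].id] = (result && result.cache) || {};
      });
    }
    if (gen.generate) {
      // Called once with the whole list.
      await gen.generate(resources, dir);
    }
  }

  // Assumption: post generate runs even when generation was skipped.
  if (gen.postGenerate) {
    await gen.postGenerate(resources, dir);
  }
  return { resources, caches };
}
```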

There is also an invalidateEach step that will be passed the resource-specific cache from the generateEach step, so you can only build things that have changed.
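
As a rough sketch of how invalidateEach could be used: if generateEach stored a value such as a last-modified date in its cache, invalidateEach can compare it with the value from the prepare step. The (resource, cache) argument order and the modified field are assumptions made for illustration.

```javascript
// Hypothetical: assumes generateEach previously returned
// { cache: { modified: resource.data.modified } } for this resource.
function invalidateEach(resource, cache) {
  // No cache entry means the resource has never been generated.
  if (!cache || !cache.modified) return true;
  // Rebuild only when the source data changed since the last run.
  return cache.modified !== resource.data.modified;
}
```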

The NASA example uses:

Prepare
Generate Each

The prepare step makes the search request to the NASA API and returns a list of results. It uses the configuration from the .iiifrc.yaml to limit the results and pages. It then returns a list of resources (NOT IIIF YET). E.g.

return [
  {
    id: 'KSC-97PC1225',
    type: 'Manifest',
    data: { nasa_id: 'KSC-97PC1225' }
  }
];

The identifier needs to be unique, but doesn't have to be a URL and the data can be anything serialisable.
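
A small check like the following could be run on the prepared list to catch duplicate identifiers or data that cannot be serialised. checkPrepared is a hypothetical helper written for this example and is not part of iiif-hss.

```javascript
// Hypothetical helper: validates the list returned by a prepare step.
function checkPrepared(resources) {
  const seen = new Set();
  for (const resource of resources) {
    if (seen.has(resource.id)) {
      throw new Error(`Duplicate id: ${resource.id}`);
    }
    seen.add(resource.id);
    // JSON.stringify throws on circular structures and BigInt values.
    JSON.stringify(resource.data);
  }
  return resources;
}
```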

The returned list of results is then passed to generateEach, which makes a further call to the NASA APIs to get image information and metadata. It then constructs a IIIF Manifest (using IIIF Builder, an instance of which is passed in as a helper) and saves it to disk.
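
Because generateEach makes one further request per result and the generator does not batch, a generator with many results may prefer to implement the single generate step and limit concurrency itself. A minimal sketch, where inBatches is a hypothetical helper and not something iiif-hss provides:

```javascript
// Hypothetical helper: runs `worker` over `items`, at most `size` at a time.
async function inBatches(items, size, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += size) {
    const batch = items.slice(i, i + size);
    // Items inside a batch run in parallel; batches run one after another.
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}
```

Inside a generate step this could be called as inBatches(list, 5, fetchOne), where fetchOne stands for whatever per-resource request the generator needs to make.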

There is a helper for saving data to disk (save and forget, no await).
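
To give an idea of what ends up on disk, here is a hand-written sketch of a minimal IIIF Presentation 3 Manifest built from a prepared resource. The real example uses IIIF Builder; this version builds plain JSON instead, and buildManifest, baseUrl and the title field are assumptions made for illustration.

```javascript
// Hypothetical: builds a bare Presentation 3 Manifest as plain JSON.
function buildManifest(resource, baseUrl) {
  return {
    '@context': 'http://iiif.io/api/presentation/3/context.json',
    id: `${baseUrl}/${resource.id}/manifest.json`,
    type: 'Manifest',
    // Fall back to the identifier when no title is available.
    label: { en: [resource.data.title || resource.id] },
    // Canvases with the image information would be added here.
    items: [],
  };
}
```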

If there is no output in the .iiifrc.yaml configuration, then the result will be automatically built into the static site using the iiif-json preset. You can see the "virtual" store it generates in the build folder (.iiif/build/config/stores.json). If you set an output folder, that will be used for building instead, and you can then wire it up manually if you want to save into source control or change the filter rules.

Example .iiifrc.yaml using the built-in example generator (NASA):

server:
  url: http://localhost:7111

generators:
  sts-shuttles:
    type: nasa-generator
    config:
      label: STS Shuttles
      query: "sts orbit"
      maxResults: 10

By default, generators are run during a build, but you can pass --no-generate to prevent this. You can also run generation as a distinct step using iiif-hss generate.

Caching is aggressive, but --no-cache will disable the remote URL cache for generators.

To create a new generator, you can create a script - similar to extract/enrich.

import { generator } from 'iiif-hss';

generator({
  id: 'my-generator',
  name: 'My generator',
  async prepare() {
    return [ ... ];
  },
  async generate(list, dir, api) {
    // ...
  }
})

Rewrites

A new hook was added for rewriting slugs. This is always called, so ideally the rewrite should do minimal work and not make slow requests. There is a bundled rewrite for flattening manifests/collections. If you implemented this in the scripts/ folder, it might look like this:

// scripts/flat-manifests.js
import { rewrite } from 'iiif-hss';

rewrite({
  id: 'flat-manifests',
  name: 'Flat manifests',
  types: ['Manifest'],
  rewrite(slug, resource) {
    const parts = slug.split('/');
    const lastPart = parts.pop();
    return `manifests/${lastPart}`;
  },
});

This will rewrite the slug (the URL of the manifest, minus the /manifest.json at the end). Currently rewrites can only be added to the top-level run: in the config, and not per-store. This might change in the future, but you need to ensure you add it into the run configuration.
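
The effect of the rewrite above on individual slugs can be checked in isolation. flattenSlug below is just the body of the rewrite function extracted into a stand-alone function for illustration.

```javascript
// Same logic as the rewrite() hook above, without the iiif-hss wrapper.
function flattenSlug(slug) {
  // Keep only the last path segment of the slug.
  const lastPart = slug.split('/').pop();
  return `manifests/${lastPart}`;
}

console.log(flattenSlug('exhibitions/manifest-1')); // manifests/manifest-1
console.log(flattenSlug('a/b/c/manifest-2')); // manifests/manifest-2
```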

Generic collections

There is some work to do with collection processing, and this feature is a functional but not customisable implementation of creating generic collections during the extract step.

In addition to returning indices, caches and meta from an extraction step, you can also return a list of collection slugs, e.g. path/to/my/collection. All manifests that are tagged in this way will be gathered together and a IIIF collection created. The labels are bad and you cannot customise them yet. However, there is a plan to improve the collection enrichment step, including these collections and index collections.

There is a built-in generic collection extraction: folder-collections. This will create a collection from each folder that contains manifests.

For example:

exhibitions/manifest-1.json
exhibitions/manifest-2.json
exhibitions/manifest-3.json
objects/manifest-4.json
objects/manifest-5.json
objects/manifest-6.json

Will create 2 collections, each with 3 manifests:

collections/exhibitions/collection.json
collections/objects/collection.json

And if you pair this with flat manifests, they would be rewritten to:

manifests/manifest-1.json
manifests/manifest-2.json
manifests/manifest-3.json
manifests/manifest-4.json
manifests/manifest-5.json
manifests/manifest-6.json

So they are flat, but you have the folder structure preserved.
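
The grouping in the example above can be sketched as follows. This is not the folder-collections implementation, only an illustration of the mapping from manifest paths to collection slugs; folderCollections is a hypothetical name.

```javascript
// Hypothetical: groups manifest paths by their containing folder.
function folderCollections(paths) {
  const collections = {};
  for (const path of paths) {
    const parts = path.split('/');
    parts.pop(); // drop the file name
    const folder = parts.join('/');
    if (!folder) continue; // manifests at the root belong to no folder
    const slug = `collections/${folder}`;
    if (!collections[slug]) collections[slug] = [];
    collections[slug].push(path);
  }
  return collections;
}

const grouped = folderCollections([
  'exhibitions/manifest-1.json',
  'exhibitions/manifest-2.json',
  'exhibitions/manifest-3.json',
  'objects/manifest-4.json',
  'objects/manifest-5.json',
  'objects/manifest-6.json',
]);
console.log(Object.keys(grouped)); // [ 'collections/exhibitions', 'collections/objects' ]
```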

stephenwf added 7 commits October 26, 2023 15:38

0.1.5

aa5211b

Collection parsing + bug fixes

8b8d5aa

0.1.6

76d0243

Fix build

3c3c6e2

Fixed bugs with store request cache

841165d

Created load-scripts helper

ad99e9b

Added generators, slug rewrites and generic collections

e8424e8

stephenwf merged commit df932b8 into main Jan 4, 2024
