Releases: NYPL/drb-etl-pipeline

v0.10.0

04 Apr 19:46

This release contains a number of notable improvements and fixes. In addition to minor bug fixes, it includes:

  • Added citation generator endpoint and rules implementing MLA citation format
  • Added per-language indexing and search via Elasticsearch 8.x
  • Improved speed and stability of the clustering process
  • Enforced sanity checks during ingest process, including for publication dates
  • Removed polyglot and replaced it with fasttext for reasons related to support and dependency resolution

v0.9.6

25 Jan 19:58

This is a patch release targeting the Project MUSE ingest process. It aligns the import service with a change to the source data and improves web scraping from Project MUSE edition detail pages.

v0.9.5

22 Nov 22:43

This is a minor patch release that mainly includes edge case handling for the proxy endpoint. Also included in this release is a new deployment for feature branches via the tugboat.qa service. This should make development easier by allowing more QA processes to shift to these environments.

v0.9.4

28 Oct 19:48

This release makes a minor tweak to how links to resources are served, prioritizing webpub manifests, which are the preferred format for the new web reader component.

v0.9.3

04 Oct 18:45

This release adds a _meta block to search results that holds metadata about the search result object itself. In this initial iteration it contains only highlight metadata, which describes how search results were matched.

v0.9.2

04 Oct 16:26

This is a patch release that includes several minor new features and several improvements to back-end processes:

Added

  • Detect and add file types in s3
  • Add readerVersion parameter for /search, /work and /edition endpoints
  • Add Elasticsearch query highlighting in API response

Fixed

  • Improve error handling in the clustering process
  • Handle relative links in the proxy endpoint
  • Add embed flag for HTML links
  • Extend settings for utils/proxy endpoint
  • Resolve issue with display of links when filtering by format
  • Improve release stability with production tag
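
The relative-link handling mentioned under "Fixed" can be illustrated with a minimal sketch. This is not the pipeline's actual proxy code; the function name and the example URLs are hypothetical, and only the standard library's urljoin is assumed:

```python
from urllib.parse import urljoin


def resolve_proxied_link(page_url: str, href: str) -> str:
    """Resolve an href found in a proxied page against that page's URL.

    Hypothetical helper for illustration only. urljoin returns absolute
    hrefs unchanged and resolves relative ones against page_url.
    """
    return urljoin(page_url, href)


if __name__ == "__main__":
    page = "https://example.org/books/1/index.html"  # placeholder URL
    print(resolve_proxied_link(page, "chapter2.html"))
    print(resolve_proxied_link(page, "../cover.jpg"))
    print(resolve_proxied_link(page, "https://cdn.example.org/a.css"))
```

A proxy that rewrites links this way can then route the resolved absolute URL back through itself, so resources referenced by relative paths still load.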

v0.9.1

09 Sep 14:42

This patch release includes minor improvements to the creation process for DRB works, ensuring that Internet Archive links are present where available and that matching records are combined into as few works as possible.

v0.9.0

19 Aug 16:51

This is a minor release that adds functionality for managing collections of DRB records as OPDS2 objects. This is demonstration functionality that shows how such a service would operate on the back end and what data would be exposed to consuming front-end applications. The following endpoints were added:

  • /collection (POST) Creates a new collection of edition records with a title, creator and description. (Authentication required)
  • /collection/<uuid> (GET) Fetches a collection identified by a UUID
  • /collection/<uuid> (DELETE) Deletes specified collection. (Authentication required)
  • /collection/list (GET) Returns a list of all collections in the system
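
As a rough sketch of how a client might address these endpoints, the snippet below builds (but does not send) the four requests. The paths and HTTP methods come from the list above; the host name, the JSON request body, and the use of an Authorization header are assumptions made for illustration and may not match the real API:

```python
import json
from urllib.request import Request

BASE_URL = "https://drb-api.example.org"  # hypothetical host, not the real API


def create_collection_request(title, creator, description, auth_header):
    # Assumption: the collection fields are sent as a JSON body and
    # authentication uses an Authorization header.
    body = json.dumps(
        {"title": title, "creator": creator, "description": description}
    ).encode("utf-8")
    return Request(
        f"{BASE_URL}/collection",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": auth_header},
    )


def fetch_collection_request(uuid):
    return Request(f"{BASE_URL}/collection/{uuid}", method="GET")


def delete_collection_request(uuid, auth_header):
    # Assumption: same Authorization header as for creation.
    return Request(
        f"{BASE_URL}/collection/{uuid}",
        method="DELETE",
        headers={"Authorization": auth_header},
    )


def list_collections_request():
    return Request(f"{BASE_URL}/collection/list", method="GET")
```

Each returned Request could be passed to urllib.request.urlopen once the real host and authentication scheme are known.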

v0.8.0

03 Aug 18:23

This is a minor release that extends two features of the API for use with the newly developed web reader application. Both pertain to the display of PDF files in the web reader:

  1. A proxy endpoint has been added for use when proxying PDF resources is necessary for display. Access to this endpoint is controlled via a CORS header, which can be configured via an environment variable.
  2. The default Webpub Manifest implementation was extended with a conformsTo field in the metadata block to allow the webreader to identify the type of resources contained within a manifest before loading the necessary renderer/reader.
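
To illustrate the second point, here is a minimal sketch of how a consumer might use conformsTo to choose a renderer before loading a manifest's resources. Only the field name and its location in the metadata block come from the notes above; the profile URI, renderer names, and helper functions are hypothetical:

```python
from typing import Optional


def manifest_profile(manifest: dict) -> Optional[str]:
    """Return metadata.conformsTo from a manifest, or None if it is absent."""
    return manifest.get("metadata", {}).get("conformsTo")


def pick_renderer(manifest: dict, renderers: dict, default: str = "default") -> str:
    """Map the manifest's declared profile to a renderer name.

    `renderers` maps profile URIs to renderer names; both sides of that
    mapping are placeholders for whatever the consuming reader defines.
    """
    return renderers.get(manifest_profile(manifest), default)


if __name__ == "__main__":
    PDF_PROFILE = "https://example.org/profiles/pdf"  # hypothetical profile URI
    manifest = {"metadata": {"title": "Example", "conformsTo": PDF_PROFILE}}
    print(pick_renderer(manifest, {PDF_PROFILE: "pdf-renderer"}))
```

Manifests without a conformsTo value fall through to the default renderer, which keeps older manifests working unchanged.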

v0.7.1

21 Jul 21:11

This is a patch release that increases the stability/durability of several aspects of the ingest process. Specifically these aspects are:

  • The timeout for the OCLC catalog has been increased to deal with delays coming from the Classify process (specifically when it is processing large records)
  • Timeouts in the Elasticsearch process (also caused by processing times for large records) are now handled
  • Restricted the types of records read by the clustering process to make it more efficient
  • Set max execution time of cover process to 12 hours to allow for API rate limits to reset