Releases: NYPL/drb-etl-pipeline
v0.10.0
This release contains a number of notable improvements and fixes. In addition to minor bug fixes, it includes:
- Added citation generator endpoint and rules implementing MLA citation format
- Added per-language indexing and search via ElasticSearch 8.x
- Improved speed and stability of the clustering process
- Enforced sanity checks during ingest process, including for publication dates
- Removed polyglot and replaced it with fasttext for reasons related to support and dependency resolution
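As a rough illustration of the kind of publication date sanity check mentioned above, here is a minimal sketch. The function name and the year bounds are assumptions made for this example; the release notes do not describe the actual rules used during ingest.

```python
from datetime import date
from typing import Optional

# Assumed lower bound for illustration only; not taken from the release notes.
EARLIEST_YEAR = 1400


def is_plausible_publication_year(year: Optional[int],
                                  today: Optional[date] = None) -> bool:
    """Return True when `year` lies between EARLIEST_YEAR and the current year."""
    if year is None:
        return False
    current_year = (today or date.today()).year
    return EARLIEST_YEAR <= year <= current_year
```

A record that fails such a check could be logged or have its date dropped rather than indexed; which of these the pipeline does is not stated here.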
v0.9.6
This is a patch release targeting the Project MUSE ingest process. It aligns the import service with a change to the source data and improves web scraping from Project MUSE edition detail pages.
v0.9.5
This is a minor patch release that mainly includes edge case handling for the proxy endpoint. Also included in this release is a new deployment for feature branches via the tugboat.qa service. This should make development easier by allowing more QA processes to shift to these environments.
v0.9.4
Provides a minor tweak to how links to resources are served, privileging webpub manifests, which are the preferred format with the new web reader component.
v0.9.3
This adds the `_meta` block in search results to contain metadata concerning the search result object itself. In this initial iteration it will contain only `highlight` metadata, providing information about how search results were matched.
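The sketch below shows one way a `_meta` block carrying highlight data might be attached to a search result. It relies on the fact that Elasticsearch returns highlights per hit as a mapping of field name to a list of fragments; the helper name and the overall result shape are assumptions, not the API's documented format.

```python
from typing import Any, Dict


def attach_meta(hit: Dict[str, Any]) -> Dict[str, Any]:
    """Copy an Elasticsearch hit's source and add a `_meta` block.

    Hypothetical helper: the real response shape may differ.
    """
    result = dict(hit.get("_source", {}))
    # Elasticsearch reports highlights as {field_name: [fragment, ...]}
    result["_meta"] = {"highlight": hit.get("highlight", {})}
    return result
```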
v0.9.2
This is a patch version release that includes several minor new features as well as several improvements to back-end processes. This includes:
Added
- Detect and add file types in S3
- Add `readerVersion` parameter for `/search`, `/work` and `/edition` endpoints
- Add ElasticSearch query highlighting in API response
Fixed
- Improve error handling in the clustering process
- Handle relative links in the proxy endpoint
- Add `embed` flag for HTML links
- Extend settings for `utils/proxy` endpoint
- Resolve issue with display of links when filtering by format
- Improve release stability with `production` tag
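Handling relative links in a proxy generally means resolving each link against the URL of the page being proxied. The following is a minimal sketch of that idea using the standard library; the function name is hypothetical and this is not necessarily how the proxy endpoint implements it.

```python
from urllib.parse import urljoin


def resolve_link(page_url: str, href: str) -> str:
    """Resolve `href` against the URL of the page it appeared on.

    Absolute links are returned unchanged; relative links are joined
    to the page URL following standard URL resolution rules.
    """
    return urljoin(page_url, href)
```

For example, with a page at `https://example.org/books/1/index.html`, the link `images/cover.png` resolves to `https://example.org/books/1/images/cover.png`, while `/css/site.css` resolves to `https://example.org/css/site.css`.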
v0.9.1
This patch release includes minor improvements to the creation process of DRB works, ensuring that InternetArchive links are present where available and that matching records are combined into as few works as possible.
v0.9.0
This is a minor release that adds functionality for managing collections of DRB records as OPDS2 objects. This is demonstration functionality to show how such a service would operate on the back-end, and what data would be exposed to consuming front-end applications. Specifically, the following endpoints have been added:
- `/collection` (`POST`) Creates a new collection of edition records with a title, creator and description. (Authentication required)
- `/collection/<uuid>` (`GET`) Fetches a collection identified by a UUID
- `/collection/<uuid>` (`DELETE`) Deletes specified collection. (Authentication required)
- `/collection/list` (`GET`) Returns a list of all collections in the system
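To make the endpoint list more concrete, here is a hedged client-side sketch that builds (without sending) requests for these endpoints. The base URL is a placeholder, and the JSON field names and the use of an `Authorization` header are assumptions; the release notes state only that a title, creator and description are supplied and that authentication is required.

```python
import json
from typing import Optional
from urllib.request import Request

# Placeholder host; the release notes do not name a base URL.
BASE_URL = "https://drb-api.example.org"


def create_collection_request(title: str, creator: str, description: str,
                              auth_header: str) -> Request:
    """Build (but do not send) a POST /collection request.

    The JSON field names and the Authorization header are assumptions.
    """
    body = json.dumps(
        {"title": title, "creator": creator, "description": description}
    ).encode("utf-8")
    return Request(
        f"{BASE_URL}/collection",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": auth_header},
    )


def collection_request(uuid: str, method: str = "GET",
                       auth_header: Optional[str] = None) -> Request:
    """Build a GET or DELETE request for /collection/<uuid>."""
    headers = {"Authorization": auth_header} if auth_header else {}
    return Request(f"{BASE_URL}/collection/{uuid}", method=method, headers=headers)
```

Either request object can be passed to `urllib.request.urlopen` to actually perform the call against a real deployment.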
v0.8.0
This is a minor release that extends two features of the API for use with the newly developed web reader application. Both pertain to the display of PDF files in the web reader:
- A proxy endpoint has been added for use when proxying PDF resources is necessary for display. Access to this endpoint is controlled via a CORS header which can be configured via an environment variable
- The default Webpub Manifest implementation was extended with a `conformsTo` field in the metadata block to allow the web reader to identify the type of resources contained within a manifest before loading the necessary renderer/reader.
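A consumer of the manifest might use the `conformsTo` field roughly as sketched below. The profile URI and the renderer names are hypothetical placeholders; the release notes do not give the actual values used by the API or the web reader.

```python
from typing import Any, Dict

# Hypothetical profile URI used only for this example.
PDF_PROFILE = "https://example.org/profiles/pdf"


def pick_renderer(manifest: Dict[str, Any]) -> str:
    """Choose a renderer name based on metadata.conformsTo.

    Falls back to a generic HTML renderer when the field is missing
    or does not match a known profile.
    """
    conforms_to = manifest.get("metadata", {}).get("conformsTo")
    if conforms_to == PDF_PROFILE:
        return "pdf"
    return "html"
```

Checking the field before loading any resources lets a reader avoid fetching and inspecting the first item in the reading order just to find out what kind of content it holds.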
v0.7.1
This is a patch release that increases the stability/durability of several aspects of the ingest process. Specifically these aspects are:
- The timeout for the OCLC catalog has been increased to deal with delays coming from the Classify process (specifically when it is processing large records)
- Handle timeouts in the Elasticsearch process (also due to processing times for large records)
- Restricted the types of records read by the clustering process to make it more efficient
- Set max execution time of cover process to 12 hours to allow for API rate limits to reset