Archipelago 2021 (second quarter) Roadmap:1.0.0-RC3 #172

Open
DiegoPino opened this issue Apr 6, 2022 · 1 comment
Labels
Composer.json Keep your Libraries fresh Deployment Strategies What every vendor would love to Copy and pasta documentation Improvements or additions to documentation Drupal9 Drupal9 is the new Drupal8 which was the new Drupal7 wich was the... Future Release Duties We are all duty here, heavy duty Service Settings Docker settings, Service Settings. What allows us to run the thing Site Building Things you do via the UI with a lot of Browser tabs open tigresses and bears Community work and Archipelago Travel

@DiegoPino
Member

DiegoPino commented Apr 6, 2022

Archipelago autumn 2021 Roadmap

See also #5 and #35 and #79 and #80 and #103 for a complete historic recreation.
This is our working enumeration of concrete tasks until November of 2021 (a long year), per Component and Service, for public evaluation (new ideas, requests, critiques and comments welcome).

Still calm waters and sunny shores around Archipelago allow us and you to navigate smoothly. The number of deployed, public and running instances is growing, and the home-alone repositories we suspect exist make us wonder how much more we can all discover or re-discover together.

Checked tasks are ready, unchecked are in progress or planned. Priority is not given by this order.

Please feel free to comment, request more info or ask for clarification. Feature requests are also highly appreciated and taken into account (always, please!).

Strawberryfield

  • Field Property exposure to Drupal strategies
    • JSON KEY Provider (flattener)
    • JSON Flatten Keys
    • JSONPATH/JMESPATH
    • Entity Reference Casting Provider (Using UUID loading and configurable entity type) using JSON based hints to expose any semantic relationship to Search API. New to RC2 with Entity Type Selector allowing also Terms/Taxonomies, etc.
    • JSON stored Service Endpoints with extended logic (e.g HOCR) - A.k.a Strawberry Flavor Data Source.
    • Multi Map/join: many properties to a single one. E.g. all keys (Authorities) referring to creators, contributors, etc. unified as Agents keys. This leads to Fractal Ontologies and our Buckets approach.
  • File downloads and streaming
    • Ranged Request Streamer with back-to-front S3 management and buffer/memory management. For any exposed Binary Endpoint. Also streaming, fixed Local Files. This took some not sleeping much!
  • Strawberry Flavor Data structures can now hold NLP data and metadata
  • Strawberry Flavor Data structures and indexed Documents in Solr have cleanup on deletes and caching management
  • SMART (very) Breadcrumb generation with strategy selection (Longest Path, common repeating Path)
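
As an illustration of the "JSON KEY Provider (flattener)" idea above, here is a minimal sketch in Python. It is not the Strawberryfield implementation (which is a Drupal module); the function name `flatten_json`, the dotted key separator and the sample record are assumptions made up for this example. It only shows how nested JSON can be collapsed into multi-valued flat keys that a search index could consume.

```python
from typing import Any, Dict, List


def flatten_json(data: Any, sep: str = ".") -> Dict[str, List[Any]]:
    """Collect every leaf value under a flattened key path.

    List items share the path of their parent key, so repeated
    structures collapse into a single multi-valued entry.
    """
    flat: Dict[str, List[Any]] = {}

    def walk(node: Any, path: str) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}{sep}{key}" if path else key)
        elif isinstance(node, list):
            for item in node:
                walk(item, path)
        else:
            # Leaf value: append it under its flattened key
            flat.setdefault(path, []).append(node)

    walk(data, "")
    return flat


# Hypothetical record, for illustration only
record = {
    "label": "A book",
    "creator": [
        {"name": "Ada", "role": "author"},
        {"name": "Grace", "role": "editor"},
    ],
}
flat = flatten_json(record)
```

Here `flat["creator.name"]` holds both names, which is the kind of multi-valued property a flattener would hand over to Search API.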

JSON representation and enrichment

  • Better File management (Better than Drupal)
    • File referencing via UUID instead of via Entity ID
    • Handle temporary files when moving from TEMP storage to PERMANENT
    • Increment file usage count on new versions
    • Decrement file usage count on version removal
    • Change file usage on Delete, EDIT on existing active content and versions
    • Add Webform based UI management (reorder, replace, delete) for files
    • File based Post processing
      • TECHMD
      • ~~ZIP/UNZIP~~ MOVED to Strawberry Runners.
      • Derivative for larger MEDIA (video and Sound) MOVED to Strawberry Runners.
      • Pronom Service/Preservation
  • New JSON Service Architecture reference
  • On Node save, deposit/save the whole, self-sustaining Strawberry JSON blob in S3/Minio/FileSystem
  • Keep track of Service and action on Ingest/edit using Activity Streams
  • Add more agent information on our activity streams for provenance and tracking. AMI now also adds Set IDs
  • Add More Event Driven Subscribers. And better
  • Hook-able and override-able storage Pattern for files. @alliomeria we need to doc this for Developers.
  • Selective size of TECHMD generation based on amount of files present on a single ADO
  • MEDIAINFO processing

Webforms integration

  • Webform Driven UI Ingest with custom handler and widget
  • Handler allows direct CRUD without any node attached and also prepopulation of data using an existing node UUID @alliomeria we need docs here too 🥰 🥰
  • Create a set of Demo Webforms that cover base of our GLAM source data needs
  • Full Autosaving during Creation (sessions are kept alive for a week). Users can skip Steps, jump back and forth, and Validation will still happen, but at the end. Log out, come back, continue.
  • Allow Webform Field Widget selection be driven by RDF type and permissions.
  • Webform Widgets can start Open/Rendered or closed via settings and have "cancel edit" hiding to avoid users leaving the edit realm.
  • New Solr Aware Entity Select Views (with code to handle Solr to Entity) which allow
    • Complex autocomplete elements (like: get me all Digital Objects of Type Book with a green Cover the user can see)
  • New Fine grained Entity (node to node) reference possible through this.
  • CSV to JSON importer element
  • XML to JSON importer element
  • Strawberry transplanter. Any JSON into filled Webform Elements (display) using a twig template.
  • Special Date element ISO8601, with Ranges, Single Dates and free form representation.
  • Create new, better, LoD Webform elements
    • WIKIDATA
    • LoC (with support for any Suggest endpoint)
    • LoC with support MADS RDF Types
    • WIKIDATA Agents with LD Roles
    • WIKIDATA using custom SPARQL
    • Viaf
    • EUROPEANA
    • SNAC/Orgs/Names/Family Names
    • MeSH (PubMed)
    • Multi Source, Multi Agent Element. Agents/Corporate can use now multiple Authority Controls.
    • Getty with exact and fuzzy search (updated to be better!)
    • Nominatim Geo reconciliation. Normal and Reverse.
    • Panorama Tour Building App (like 1200 lines of code, gosh!)
    • Image and EXIF extraction on upload for UI/facing previews.
  • Create Stub (temporary) WIKIDATA entities if query shows desired WIKIDATA entity does not exist upstream.
    • "publish" to wikibase functionality
    • Replace repo wide stub uri with official one once pushed.
    • Keep track, on the stub, of who is referencing it (bidirectional reference?)
  • Move Strawberryfield harvest Webform handler's logic to Event Subscribers. Stronger capabilities now.
    • Deal with as:images
    • Deal with as:documents, as:video, as:sound, as:dataset elements
    • Deal with as:models
  • Allow anonymous submits to be converted into proper Nodes by Admin (Self deposit, crowd sourced metadata) WOHO! This also allows self standing endpoints and custom mappings.
  • Make Webform API Interaction work with States (JS) by removing one Form wrapper.
  • Make Webform API Interaction more versatile for our use. Use as schema validator. WIP. AMI.
  • Add JS to avoid main node CRUD to submit/validate embedded Webform as widget

Media Displays Entities

  • Display settings, new tab that shows only the active View Mode for an ADO
  • Admin/contextual block that shows how ADO to Type was chosen by the system (admin hint)
  • Add expected mime/type output to Media displays. Allows to tag media displays as JSON, XML, CSV, JSON-LD or HTML only.
    • React to mime type to allow JSON or XML output to be downloaded too.
    • Native/self rendering and Content-Type tagging with caching.
    • Automatic extraction from the template of required/used variables (context). Not front facing yet, but for sure useful for building a pick-and-choose (or Data color picker) to aid in Twig Template building
  • Webforms are injected as Context. So a Webform Element Title can be used to match its value.
  • AMI set id and URLs are injected as Context during batch ingest
  • Add new Data Views Plugin integration to allow Media Displays to preprocess values on views exposed as API endpoints
  • Version/Revision Media Display Entities (This is config, annotations and Update Hooks)
  • Inline Preview with ADO selection. Means users can see the data, test the data and see the output with Live Updates even without saving
  • Per Metadata Display Extra data injection via any strawberry field that is added. @alliomeria we need docs!
  • Provide example Twig templates for
    • MODS
    • DC and
    • JSON-LD
    • GEOJSON
    • IIIF Manifest 2.1
    • IIIF Manifest 3.0
    • EAD2002 (With recursive C Element generation from CSV)
    • EAD3 (With recursive C Element generation from CSV)
    • IIIF Manifests for Creative WorkSeries and Children based on Views
    • a Carousel
  • Metadata Display Exposed endpoints (reuse as Standalone API/download/streams)
  • New Twig Extensions:
  • Functions: sbf_entity_ids_by_label()
  • Filters: markdown_2_html, html_2_markdown, sbf_json_decode
  • API builder via UI using Endpoints. Any API, OAI, IIIF, etc. Allows a VIEW to be injected to feed data. Arguments are filtered and fully customizable. WIP. Coming to 1.0.0

Field Formatters

  • Static IIIF Images
  • Open Seadragon IIIF Images
    • W3C Web Annotations! Box and Polygon, fully IIIF compliant with CRUD endpoints. Caches until you are ready to save.
    • Add thumbnail navigation
  • IABookreader IIIF Images
  • Panorama via IIIF now with webGL max texture calculator and max Image size/memory preprocessing to avoid breaking Cantaloupe when using 400MP images.
  • Panorama Tours via other Panorama Objects and IIIF, including Hotspots of many types
  • Metadata up-casters
  • Metadata up-casters with download endpoint (Metadata Display Exposed endpoints)
  • Video (HTML5) with Subtitles (with grouping, multi Video, multi Subtitle)
  • Audio (HTML5) with Subtitles (with grouping, multi Audio, multi Subtitle)
  • PDF with multi file selection (custom, derived from the base PDF.js library. Not fancy, but Mozilla asks people to NOT use their fancy one directly and we agreed).
  • Web annotations (IIIF) with JMESPATH fine grained selector of which Files to attach
  • Complex nested structures (Whole graphs)
  • 3D! (Three + JSM) with Full Material Support and UV Textures
  • 3D UV Mapping using IIIF Sources and Scene/Light settings
  • 3D Point Clouds from JSON or URLS
  • Mirador 3.0 (With Resource comparison and multi sourced IIIF manifests, using full release now)
  • Mirador 3 (second JS) with HOCR/Text Highlights using https://github.com/dbmdz/mirador-textoverlay
  • Expose View Mode to JSON Type value mapping that triggers automatic View Mode Selection
  • Webrecorder.io native player (WARC replay) with WACZ capabilities version 1.3.2
  • Lazy Image Loading via CSS class. JS driven, only loads (when used) Images when visible by the user (+100 px to give them some time to load while users navigate)
  • All formatters can handle Embargoes based on Time and IP address/ranges with caching. Includes alternative Source for Media when embargoed
  • All formatters can handle a JMESPATH fine grained selector of which Files to attach

API Ingest, Migration and backup

  • Strawberryfield Normalizer: expands JSON string as a JSON when exporting
  • Strawberryfield denormalizer: string-ify JSON when importing
  • Wrap JSONAPI in a set of Drush scripts (Strawberry Seeds) to
    • Allow Single command line invoke files and node ingest
    • Create virtual field Entity "bucket" to allow Media to be ingested into those as links and routed to internal Strawberryfield elements (utility methods for ingest)
  • AMI (Archipelago Multi Import)
    • API Source (Other repos, ContentDM, generic Solr)
    • API Source (ISLANDORA Solr)
    • Google Spreadsheets (same as IMI)
    • Complete Drush 9 integration
    • AMI Set Entities
    • AMI Sets Entity processing via Batch or Enqueuing (for Hydroponics)
    • Separate processing for remote/single files allowing longer processing
    • AMI Sets Delete Ingested ADOs by this Set via batch (to clear and reingest)
    • LoD Reconciliation with complete per Label Processing and multiple Endpoint calls. Can be edited/refined and reused in a Metadata Display
    • Reusable, canned public facing AMI ingest strategies. Users can only add the source data, all the rest is pre-setup.
    • S3 Sources for AMI
    • Local file (server) Sources for AMI
    • Remote HTTP sources for AMI
    • ZIP (in the works)
    • Folder as a source (in the works)
    • Vouchers
  • Filesystem drop-and-forget ingest. You save a JSON file into S3, Archipelago creates entities and relationships.
  • Use JSON API to allow seamless moving of dependent assets between repositories and also for backups
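
The Normalizer/denormalizer pair listed at the top of this section can be pictured with a small round trip. This is only a sketch in Python, assuming a stored record keeps its metadata as a JSON string under a single field; the function names and the `field_descriptive_metadata` key are hypothetical and are not taken from the Archipelago code base.

```python
import json
from typing import Any, Dict


def export_record(stored: Dict[str, Any], json_field: str) -> Dict[str, Any]:
    """Normalize: expand the JSON string in json_field into real JSON."""
    out = dict(stored)
    out[json_field] = json.loads(stored[json_field])
    return out


def import_record(payload: Dict[str, Any], json_field: str) -> Dict[str, Any]:
    """Denormalize: turn the nested JSON back into a string for storage."""
    out = dict(payload)
    out[json_field] = json.dumps(payload[json_field], sort_keys=True)
    return out


# Hypothetical stored record, for illustration only
stored = {
    "title": "Demo object",
    "field_descriptive_metadata": '{"label": "Demo object", "type": "Book"}',
}
exported = export_record(stored, "field_descriptive_metadata")
reimported = import_record(exported, "field_descriptive_metadata")
```

Exporting gives API consumers a nested structure instead of an escaped string, and importing restores the string form without losing data.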

Service Architecture (Strawberry Runners)

  • Develop webhook driven notification service for derivatives
  • Custom, user facing Plugins. Build your own derivative workflows (system calls, JSON processing, etc)
  • Document/deploy webhook triggers for minio S3 per mimetype
  • Document/deploy webhook triggers for AWS S3 (via lambda) per mimetype
  • Develop Shell processing using Custom Plugins (Processors) and user configurable for each case (rule system)
  • Allow Processors to be chained! And have multiple outputs.
  • Queue-worker processing
  • Generate JSON reference-able Services (plugins) for complex non descriptive metadata and data
    • HOCR
    • TECHMD
    • WACZ
    • Web Annotations
    • Tabular datasets
    • Transcripts (similar to Web Annotations, mostly dependent)
    • File Conversions (any that your Shell allows) with reingest
    • Smart checks on existing processed output to avoid double processing.
    • ~~Build slim Content entity that can be used to index natively that content into Solr via search API~~ This is now a fully capable Search API Datasource that can hold any output. One (node) to many (files) to even more sequences.
    • Allow Services to be self explaining of its capabilities. WIP how we expose this to the world. Probably GET will be allowed
    • Two Hydroponics approaches. A single-threaded linear one (default) and a Multi Child one, with a configurable number of spawned children. All using ReactPHP
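
To make "Processors can be chained and have multiple outputs" more concrete, here is a rough queue-driven sketch in Python. It is an assumption-laden toy, not how Strawberry Runners or Hydroponics work internally (those are PHP/ReactPHP): `split_pages`, `fake_ocr` and `run_chain` are hypothetical names, and the processors only fake their work.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Item = Dict[str, str]
# A processor takes one item and may return several outputs
Processor = Callable[[Item], List[Item]]


def split_pages(item: Item) -> List[Item]:
    # Hypothetical: pretend the source file yields two pages
    return [{"source": item["source"], "page": str(n)} for n in (1, 2)]


def fake_ocr(item: Item) -> List[Item]:
    # Hypothetical: stand-in for a real text extraction step
    text = f"text of {item['source']} page {item['page']}"
    return [{**item, "text": text}]


def run_chain(first: Item, chain: List[Processor]) -> List[Item]:
    """Push every output of step N into the queue for step N + 1."""
    queue: Deque[Tuple[int, Item]] = deque([(0, first)])
    finished: List[Item] = []
    while queue:
        step, item = queue.popleft()
        if step == len(chain):
            # No processors left: this item is a final output
            finished.append(item)
            continue
        for output in chain[step](item):
            queue.append((step + 1, output))
    return finished


results = run_chain({"source": "book.pdf"}, [split_pages, fake_ocr])
```

Each output of one processor becomes its own queue item for the next one, so a single file can fan out into many results.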

SEO and API

  • Allow Media displays output to be embedded in HTML head for SEO
  • Test/Develop nested DATA VIEWS integration for OAI-ORE and OAI-PMH (See Format Strawberryfield and API builder)
  • Create (TWIG, metadata displays) and expose as endpoints full set of IIIF API JSON outputs.
  • Add helper methods and twig extensions to allow Metadata displays to access pre existing views (like object listings for a collection) to help build those lists.

ACL / Permissions

  • Integrate custom ACL with JSON Paths into per NODE ACL. Allowing this way to apply permissions to individual metadata elements/paths.
  • Embargoes with JSON key setup for dates/IPs (Individual and ranges). Includes Cron "release" system (deletes caches) and applies to Formatters, Metadata endpoints too
  • Same but needs better UI for referenced Services and Media
  • Allow Metadata (rule) to trigger ACL permissions. e.g if embargo_date == bla bla = remove public access
  • Allow for ACL inheritance (from parent, recursive) without hard copies.
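
The embargo rule described above (dates plus individual IPs and ranges) could be evaluated roughly as follows. This is a hedged Python sketch, not the Archipelago logic: the function name and its parameters are hypothetical, and it assumes that a client inside an allowed IP or range bypasses a date embargo that is still running.

```python
from datetime import date
from ipaddress import ip_address, ip_network
from typing import Iterable, Optional


def is_embargoed(
    today: date,
    until: Optional[date],
    client_ip: str,
    allowed_networks: Iterable[str],
) -> bool:
    """Return True when access should be withheld from this client."""
    addr = ip_address(client_ip)
    # Assumption: single IPs ("10.0.0.5") and ranges ("10.0.0.0/8")
    # are both given as strings; a bare IP becomes a one-address network
    for net in allowed_networks:
        if addr in ip_network(net, strict=False):
            return False
    # No date means no time-based embargo
    return until is not None and today < until


blocked = is_embargoed(date(2021, 6, 1), date(2022, 1, 1), "203.0.113.9", ["10.0.0.0/8"])
on_campus = is_embargoed(date(2021, 6, 1), date(2022, 1, 1), "10.1.2.3", ["10.0.0.0/8"])
```

With these inputs the outside client is blocked while the one inside `10.0.0.0/8` is not. A cron "release" would then only need to clear caches once `today` reaches the embargo date.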

Deployment and DevOPS

  • Sync Configurations and remove non used ones for minio branch / periodic for each Drupal release
  • Site-build and remove orphan blocks
  • Add more utility views
  • Enable JSONAPI by default on minio branch
  • Create jsonapi user with jsonapi credentials for minio branch
  • Create basic scripts to automate Docker/Bash operations
  • Update AWS deployer to match minio including docs and Cloud Services integration
  • XDEBUG integration. 2 PHP 7.4.9 Containers, Cookie based, routed by NGINX
  • Natural Language processing Service via Docker
  • Catmandu Docker container for large data mangling
  • Update all Strawberryfield modules script.
  • Drupal 8.9.x (Last Time) and bumps on every module
  • Drupal 9.2.9 now Primary and bumps on every module
  • Solr 8.11, MYSQL 8.
  • D9 readiness proven and working of course (our new default) 😄
  • DDEV deployment strategy
  • Archipelago Live with optimized folder structure and Production ready AWS EC2 Docker deployment

Batch Operations

  • Bulk Batch Views PURE TEXT plugin to (All this via JSONPATCH so supports any operation)
    • Replace existing JSON values
  • Bulk Batch Views JSONPATH plugin to (All this via JSONPATCH so supports any operation)
    • Replace existing JSON values
    • Add to existing Values
    • Respect data type casted values, (entities, file references)
  • Bulk Batch Views Webform Element based plugin to
    • Replace existing JSON values using a given Webform and a UX driven From/To option
  • Bulk Batch Views MEDIA plugin to
    • Replace Media
    • Add Media
  • Bulk Batch Views ACL plugin to
    • Replace ACL and inheritance
    • Replace ACL individual Control List Elements
    • Add ACL individual Control List Elements
  • Integrate into Solr Results and Strawberryfield Taxonomy Term pages
  • CSV based export with selective type and AMI Set generation for future "Update" operation
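
Since the bulk plugins above are described as going through JSONPATCH, a tiny worked example may help. The sketch below, in Python, implements only the `add` and `replace` operations of JSON Patch (RFC 6902) with JSON Pointer paths (RFC 6901). It is a teaching subset under those assumptions, not the code Archipelago runs, and the sample document is invented.

```python
import copy
from typing import Any, Dict, List


def pointer_tokens(pointer: str) -> List[str]:
    """Split a JSON Pointer into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def apply_patch(doc: Any, patch: List[Dict[str, Any]]) -> Any:
    """Return a patched copy of doc; only 'add' and 'replace' are handled."""
    result = copy.deepcopy(doc)
    for op in patch:
        kind, value = op["op"], copy.deepcopy(op["value"])
        if kind not in ("add", "replace"):
            raise ValueError(f"unsupported operation: {kind}")
        tokens = pointer_tokens(op["path"])
        if not tokens:
            # Empty pointer targets the whole document
            result = value
            continue
        parent = result
        for token in tokens[:-1]:
            parent = parent[int(token)] if isinstance(parent, list) else parent[token]
        last = tokens[-1]
        if isinstance(parent, list):
            if kind == "add":
                # "-" means append at the end of the array
                index = len(parent) if last == "-" else int(last)
                if not 0 <= index <= len(parent):
                    raise IndexError(f"index out of range: {last}")
                parent.insert(index, value)
            else:
                index = int(last)
                if not 0 <= index < len(parent):
                    raise IndexError(f"index out of range: {last}")
                parent[index] = value
        else:
            if kind == "replace" and last not in parent:
                raise KeyError(last)
            parent[last] = value
    return result


# Hypothetical metadata, for illustration only
doc = {"label": "Old title", "subjects": ["maps"]}
patch = [
    {"op": "replace", "path": "/label", "value": "New title"},
    {"op": "add", "path": "/subjects/-", "value": "atlases"},
]
patched = apply_patch(doc, patch)
```

"Replace existing JSON values" maps to `replace` and "Add to existing Values" maps to `add`; the original `doc` is left untouched so a batch can be previewed before saving.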

Future roadmap

  • Solr Cloud/ Consortial ensemble
  • Native Wikibase/Wikidata publishing

Documentation:

  • https://docs.archipelago.nyc (First large iteration with Search and tags)
  • Devops and new repository deployers
  • Migration to and from.
  • Backup and restoring
  • Permissions, access and ACLs.
  • Twig Template Primer
  • AMI Ingest, Process
  • Metadata Professionals, JSON schema and schema-less. AS, DR and AP internal ontologies. UPDATED
  • Metadata Professionals, Key concepts of Archipelago
  • Metadata, Ingest and edit workflows.
  • Displays, Formatters and Media Plugins (Twig)
  • LoD Reconciliation for AMI
  • Views Integration (Solr and Blocks)
  • Strawberry Field Exposed Keys and Plugins
    • Property Exposing strategies and configs
  • Media Management
  • Solr and Discovery
  • Extending and Coding
  • SEO
@DiegoPino DiegoPino added documentation Improvements or additions to documentation Service Settings Docker settings, Service Settings. What allows us to run the thing Drupal9 Drupal9 is the new Drupal8 which was the new Drupal7 wich was the... Composer.json Keep your Libraries fresh Release Duties We are all duty here, heavy duty Deployment Strategies What every vendor would love to Copy and pasta Future tigresses and bears Community work and Archipelago Travel Site Building Things you do via the UI with a lot of Browser tabs open labels Apr 6, 2022
@DiegoPino DiegoPino pinned this issue Apr 6, 2022
@DiegoPino DiegoPino added this to the 1.0.0 milestone Apr 6, 2022
@giancarlobi
Contributor

@DiegoPino I'd like to suggest a couple of things related to biological objects, such as GBIF and NCBI taxonomy links, and Darwin Core as one more example Twig template, just in case

@DiegoPino DiegoPino changed the title Archipelago 2022 (second quarter) Roadmap:1.0.0-RC3 Archipelago 2021 (second quarter) Roadmap:1.0.0-RC3 Jul 12, 2022
@DiegoPino DiegoPino modified the milestones: 1.0.0, 1.0.0-RC3 Jul 12, 2022
@alliomeria alliomeria unpinned this issue Jun 17, 2024

2 participants