
infrastructure-team

Site for high-level documentation of infrastructure team projects and other useful concepts.

Concepts and Interactions

A bit more focused on how the services fit together and why.

Types of Digital Objects

We started with more, but we’re kinda down to three

(ask Andrew)

Object Registration and Accessioning

How people get objects into SDR

Argo

H2

Use H2 to create an object - https://github.com/sul-dlss/infrastructure-integration-test/blob/main/spec/features/create_object_h2_spec.rb

ETD

Create a new ETD - https://github.com/sul-dlss/infrastructure-integration-test/blob/main/spec/features/create_etd_spec.rb

Google Books

TODO: link to code or brief explanation of gbooks accessioning

was-registrar-app

TODO: link to code or brief explanation of was-registrar-app accessioning

cocina-models

The SDR data model, written so that it can be syntactically validated with dry-struct, dry-types, and OpenAPI
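To make "syntactically validatable" concrete, here is a minimal sketch in Python rather than the Ruby gems named above. It is not the cocina-models schema: the class `DigitalObject`, its field names, the `is_valid` helper, and the simplified identifier pattern are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified identifier pattern; the real model's rules may differ.
IDENTIFIER_PATTERN = re.compile(r"^druid:[a-z]{2}[0-9]{3}[a-z]{2}[0-9]{4}$")


class ValidationError(ValueError):
    """Raised when a value does not match the declared shape."""


@dataclass(frozen=True)
class DigitalObject:
    # All field names are illustrative, not the actual cocina-models schema.
    external_identifier: str
    label: str
    version: int

    def __post_init__(self):
        # Syntactic checks only: types and formats, not business rules.
        ident = self.external_identifier
        if not isinstance(ident, str) or not IDENTIFIER_PATTERN.match(ident):
            raise ValidationError(f"bad identifier: {ident!r}")
        if not isinstance(self.label, str):
            raise ValidationError("label must be a string")
        version = self.version
        if isinstance(version, bool) or not isinstance(version, int) or version < 1:
            raise ValidationError("version must be a positive integer")


def is_valid(**attrs):
    """Return True when attrs build a syntactically valid object."""
    try:
        DigitalObject(**attrs)
        return True
    except (ValidationError, TypeError):
        return False
```

The point of the sketch is that a malformed object is rejected when it is constructed, before any service tries to act on it.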

(ask JCoyne or JLitt)

Workflows and Robots

Robots are what we call our individual processing steps, which are grouped into “workflows” and coordinated by the workflow server (along with Resque and resque-pool). They do things like updating the SDR metadata store (currently Fedora datastreams), generating technical metadata, and handing off a copy of each object version to the preservation system.

Together, workflow server and the robots provide a system for managing the SDR accessioning pipeline: the robots ingest content into the SDR; workflow server coordinates by queuing tasks to a shared Redis instance.

The robots inherit their worker functionality from the lyber-core gem.

Each type of robot-suite has one or more VMs of its own, but all share the same Redis instance, and all are managed on their respective VMs by resque-pool.
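The division of labor described above can be sketched roughly as follows. This is a toy, single-process model in Python, not the actual Ruby implementation: the `shared_queues` dict stands in for the shared Redis instance, the `Robot` base class stands in for what lyber-core provides, and the workflow name `exampleWF`, the step name, and the `enqueue` function are hypothetical.

```python
from collections import deque

# Stand-in for the shared Redis instance: one queue per (workflow, step).
shared_queues = {}


def enqueue(workflow, step, object_id):
    """What the workflow server does: push a job onto the step's queue."""
    shared_queues.setdefault((workflow, step), deque()).append(object_id)


class Robot:
    """Base worker; in the real system this role is played by lyber-core."""

    workflow = None
    step = None

    def perform(self, object_id):
        raise NotImplementedError

    def work_one(self):
        """Pop one job from this robot's queue and process it, if any."""
        jobs = shared_queues.get((self.workflow, self.step))
        if not jobs:
            return None
        return self.perform(jobs.popleft())


class TechnicalMetadataRobot(Robot):
    # Hypothetical workflow and step names, chosen for illustration.
    workflow = "exampleWF"
    step = "technical-metadata"

    def perform(self, object_id):
        return f"generated technical metadata for {object_id}"
```

A usage sketch: calling `enqueue("exampleWF", "technical-metadata", "druid:bc123df4567")` and then `TechnicalMetadataRobot().work_one()` processes that one job; a second `work_one()` finds the queue empty and returns `None`.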

This architecture was chosen before open source workflow automation tools like Airflow were available/mature.

Another thing that adds indirection and confusion: sometimes the robots perform the heavy lifting of a task themselves, but sometimes they call out to other services. For example, in the preservationIngestWF, the validate-moab step in preservation_robots makes a REST call to preservation_catalog, which does the validation asynchronously (because that service does all auditing and cloud replication after the robot finishes updating the content). When it is done, preservation_catalog tells the workflow service, and the workflow service tells preservation_robots to do the next step.
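The hand-off in that example can be sketched as below. This is a simplified in-memory model in Python, not the real services: the class and method names are invented for illustration, the REST call is replaced by a direct method call, and `hypothetical-next-step` is a placeholder rather than an actual step of preservationIngestWF.

```python
class WorkflowService:
    """Tracks step status and queues the next step when one completes."""

    def __init__(self, steps):
        self.steps = steps
        self.status = {}  # (object_id, step) -> "queued" | "started" | "completed"
        self.queued = []  # stand-in for jobs pushed to the shared queue

    def start(self, object_id):
        self._queue(object_id, self.steps[0])

    def _queue(self, object_id, step):
        self.status[(object_id, step)] = "queued"
        self.queued.append((object_id, step))

    def mark_completed(self, object_id, step):
        self.status[(object_id, step)] = "completed"
        index = self.steps.index(step)
        if index + 1 < len(self.steps):
            self._queue(object_id, self.steps[index + 1])


class PreservationCatalogStub:
    """Accepts validation requests now and finishes them later."""

    def __init__(self, workflow):
        self.workflow = workflow
        self.pending = []

    def request_validation(self, object_id, step):
        # Corresponds to the robot's REST call; returns immediately.
        self.pending.append((object_id, step))

    def run_pending(self):
        # Later, the catalog validates and reports back to the workflow service.
        while self.pending:
            object_id, step = self.pending.pop(0)
            self.workflow.mark_completed(object_id, step)


def validate_moab_robot(workflow, catalog, object_id):
    """The robot only hands off; it does not complete the step itself."""
    workflow.status[(object_id, "validate-moab")] = "started"
    catalog.request_validation(object_id, "validate-moab")
```

In this model the step stays in the "started" state after the robot returns, and the next step is queued only once the catalog stub reports completion.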

(ask anyone and hand waving will occur. Perhaps Peter? Andrew? Naomi? JCoyne? - we can re-figure out the layers of legacy code as a team)

Access Rights for digital content

(ask JCoyne, John)

Embargoes

SDR APIs

Google-books

A github repository, a deployed app and so much more

(ask JLitt, JCoyne or Mike)

ETD app

Electronic Theses and Dissertations

(ask Naomi or Mike)

Pre-assembly

When you have complex objects or objects with certain characteristics, this app will organize the files appropriately for getting them into the SDR

(ask Peter or Naomi)

Preservation

Used to keep digital content safe, both on-prem and in the cloud

(ask John)

Technical Metadata app

Service to extract and record technical metadata for files deposited into SDR

(ask JLitt)

SDR Tags database

Accessioneers and Argo users and some of our apps “tag” digital objects in the SDR

(ask Mike)

Modsulator

A gem and an app that can turn spreadsheets into MODS in a single bound … or something

Bulk jobs/actions in Argo

When making changes to objects one at a time is the wrong approach

Web archiving

Yes, Virginia, DLSS does crawl or download to get WARCs, ingest them into SDR, and serve them out

(ask JLitt or Naomi)

Solr indexing of SDR

How we currently index SDR content for searching; the index is used by Argo and dor-services-app. Fedora does not have robust search/query built in. We are occasionally bitten by this non-transactional separation between metadata persistence and indexing, e.g. the occasional failure to prevent re-use of a sourceID when registering items.
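One way such a gap can arise is sketched below in Python. It is a toy model, not the actual registration code: `MetadataStore`, `SearchIndex`, and `register` are hypothetical names, and the only behavior borrowed from Solr is that newly added documents are not visible to searches until a commit happens.

```python
class MetadataStore:
    """Stand-in for the persistence layer; it offers no query by sourceID."""

    def __init__(self):
        self.objects = {}

    def save(self, object_id, source_id):
        self.objects[object_id] = source_id


class SearchIndex:
    """Stand-in for Solr; documents become searchable only after commit()."""

    def __init__(self):
        self.committed = {}
        self.uncommitted = {}

    def add(self, object_id, source_id):
        self.uncommitted[object_id] = source_id

    def commit(self):
        self.committed.update(self.uncommitted)
        self.uncommitted.clear()

    def source_id_in_use(self, source_id):
        return source_id in self.committed.values()


def register(store, index, object_id, source_id):
    """Check uniqueness against the index, then persist and index separately."""
    if index.source_id_in_use(source_id):
        raise ValueError(f"sourceID already in use: {source_id}")
    store.save(object_id, source_id)
    index.add(object_id, source_id)
```

If two registrations with the same sourceID arrive before the index commits, both pass the check and both are saved; only registrations that arrive after the commit are rejected. Nothing ties the save and the index update together in one transaction, which is the separation the paragraph above describes.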

(ask JCoyne)

Events (e.g. in Argo object view)

A way to capture an object's changes over time

(ask JCoyne)

Goobi

A workflow tool used by DLSS digitization staff that then hooks into common-accessioning

(ask Peter)