DCB Source Ingest Concurrency Control (FOLIO)

This page explains the approach taken to controlling concurrency when ingesting large volumes of records from hosted FOLIO systems. Specific environment variables should be documented in the project README found here: https://github.com/openlibraryenvironment/dcb-service

Approach

DCB treats all its source systems as a single set. The system uses Project Reactor to set up a full set of reactive record streams, which are enumerated using a reactive threading model. This means that even with 1000 sources, we can control concurrency by adjusting the number of threads used to make requests to those sources.

At a deeper level, sources can be grouped and concurrency controlled only within that group. For example, “The Hosted FOLIO group” may be configured with a max concurrency of 2. This means that against a set of 60 hosted FOLIO instances, only 2 OAI requests will be in flight at any one time. Page requests can be made randomly to any of the servers in the group, each based on that server's resumption token. Requests will be made from the group until all sources have completed. Be aware that when a new source is added, the existing sources will likely complete their deltas quickly, while the new source will take longer to complete.
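The group-level limit described above can be illustrated with a small sketch. This is not DCB's implementation (DCB uses Project Reactor on the JVM); it is a minimal Python asyncio analogy, and every name in it (fetch_page, ingest_group, the source dictionaries, the integer tokens standing in for resumption tokens) is hypothetical. It assumes one permit is taken per page request rather than per source, so requests can interleave across servers while the number in flight never exceeds the group limit.

```python
import asyncio


async def fetch_page(source, token):
    """Hypothetical stand-in for one OAI page request.

    Returns (records, next_token); next_token is None when the source is done.
    """
    await asyncio.sleep(0.001)  # simulated network latency
    next_token = token + 1 if token + 1 < source["pages"] else None
    return [f"{source['name']}-rec-{token}"], next_token


async def ingest_group(sources, max_concurrency):
    """Drain every source, keeping at most max_concurrency requests in flight."""
    limiter = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0  # highest number of simultaneous requests observed
    records = []

    async def drain(source):
        nonlocal in_flight, peak
        token = 0
        while token is not None:
            async with limiter:  # one permit per page request, not per source
                in_flight += 1
                peak = max(peak, in_flight)
                try:
                    page, token = await fetch_page(source, token)
                finally:
                    in_flight -= 1
            records.extend(page)

    await asyncio.gather(*(drain(s) for s in sources))
    return records, peak


def run_demo():
    # Six simulated hosted sources with three pages each, group limit of 2.
    sources = [{"name": f"folio-{i}", "pages": 3} for i in range(6)]
    return asyncio.run(ingest_group(sources, max_concurrency=2))
```

Calling run_demo() returns all 18 simulated records together with the peak number of simultaneous requests, which stays at the configured limit of 2 regardless of how many sources are in the group.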

Each OAI client will stop after an environment-defined number of records has been fetched and wait for an environment-defined pause (default 2 minutes). This is primarily to give the Aurora autovacuum space to run. Since OAI polling attempts to start every 2 minutes, hosting providers will see a repeating pattern (numbers based on defaults, but tunable): ~10000 records, 2 minute pause, ~10000 records, 2 minute pause, and so on until complete.

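The figures are ~10000 records because OAI page size cannot always be controlled, so the triggering condition for a pause is record.count > threshold, where record.count increments by the number of records in an OAI page on each pass. The pause can therefore only trigger at a page boundary, and the actual count will usually overshoot the threshold slightly.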
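The pause condition can be sketched as follows. This is an illustrative Python sketch, not DCB code: the function name, parameter names, and the injectable sleep are hypothetical, and it assumes the counter resets to zero after each pause, which the repeating pattern above suggests but which should be checked against the actual client. The defaults (10000 records, 120 seconds) mirror the default figures quoted above.

```python
import time


def ingest_with_pauses(pages, threshold=10_000, pause_seconds=120, sleep=time.sleep):
    """Consume OAI pages, pausing whenever the running count exceeds threshold.

    pages: iterable of pages, each page being a list of records.
    Returns (total_records, counts_at_each_pause).
    """
    record_count = 0  # records fetched since the last pause
    total = 0
    pause_points = []
    for page in pages:
        total += len(page)
        record_count += len(page)  # grows by a whole page on each pass
        if record_count > threshold:  # can only trigger at a page boundary
            pause_points.append(record_count)
            sleep(pause_seconds)
            record_count = 0  # assumption: the counter restarts after a pause
    return total, pause_points


def run_pause_demo():
    # Ten pages of 3000 records; sleeping is stubbed out so the demo is instant.
    pages = [[0] * 3000 for _ in range(10)]
    return ingest_with_pauses(pages, sleep=lambda seconds: None)
```

With pages of 3000 records, run_pause_demo() pauses twice, each time at a count of 12000 rather than exactly 10000, which is why the figures above are approximate.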

FOLIO Ingest FAQ

  • How are deletes managed?

    • The DCB OAI client requires OAI sources (like FOLIO) to support deletes. For OAI sources that do not support deletes, DCB will not be able to reflect deleted resources. DCB does not do a clear-down / full re-ingest to work around OAI sources that do not comply with the profile.

  • How is concurrency controlled?
