Catalog Build and Views in DCB Admin
Catalog Build Process
Since Ingest v2, catalog build is split into two separate processes.
Harvest
New and changed records are pulled from the LMS host.
This process is also known as Import.
When the service configuration setting sheduled-tasks.enabled (sic) is true, local hosts are polled on a recurring cycle for new and changed records to harvest.
If this setting is false, no harvest or ingest processing occurs. (Note: this setting also determines whether circulation tracking occurs.)
The polling frequency is determined by the harvest.interval service configuration setting (default: 2 minutes).
Only hosts that have the flag ingest:enabled set to true are polled for new or changed records.
If DCB cannot connect to a host that is enabled for ingest polling, it is skipped after 3 retries.
A general exception log entry (“….“) is recorded after each failed connection attempt. There is no specific “harvest aborted after 3 attempts” log message.
The next time the harvest cycle runs, another attempt will be made to connect to the host.
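The polling and retry behaviour described above can be sketched roughly as follows. This is an illustrative model only, not DCB's implementation: `Host`, `run_harvest_cycle`, `connect`, `MAX_ATTEMPTS` and the log wording are all hypothetical names, and the sketch assumes "3 retries" means three connection attempts in total.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

MAX_ATTEMPTS = 3  # assumption: "skipped after 3 retries" is read as 3 attempts in total


@dataclass
class Host:
    name: str
    ingest_enabled: bool  # stands in for the host's ingest:enabled flag


def run_harvest_cycle(
    hosts: Iterable[Host],
    connect: Callable[[Host], None],
    scheduled_tasks_enabled: bool = True,
    log: Callable[[str], None] = print,
) -> List[str]:
    """Return the names of the hosts that were reached in this cycle."""
    if not scheduled_tasks_enabled:
        return []  # no harvest processing when scheduled tasks are disabled
    reached = []
    for host in hosts:
        if not host.ingest_enabled:
            continue  # only hosts enabled for ingest are polled
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                connect(host)
            except ConnectionError as exc:
                # one general log entry per failed attempt (wording is hypothetical)
                log(f"connection to {host.name} failed (attempt {attempt}): {exc}")
            else:
                reached.append(host.name)
                break
        # no dedicated "aborted" message: a failing host is simply
        # skipped here and tried again on the next cycle
    return reached
```

Calling `run_harvest_cycle` again models the next cycle, in which a previously unreachable host gets a fresh set of attempts.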
New or changed records that are harvested are stored in the source_record table with the status set to PROCESSING_REQUIRED.
For OAI-PMH, DCB requests a delta from the local host.
An empty return from OAI-PMH may indicate either no changes or a problem (especially if this is the first attempt to harvest a newly configured host).
For Sierra and Polaris, DCB interrogates the local catalog database for changes.
The job_checkpoint table is updated based on the response received from the local host, and serves as a marker for where to resume when the next cycle looks for new changes. This is useful for diagnosing harvest history in case of issues.
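As a rough illustration of how a checkpoint drives delta harvesting, the sketch below keeps a per-host marker and only stores changes newer than it. Everything here is an assumption made for illustration: the in-memory `HarvestState`, the integer change markers, and the choice to leave the checkpoint untouched on an empty response are not taken from DCB itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class HarvestState:
    # hypothetical in-memory stand-ins for the job_checkpoint and source_record tables
    job_checkpoint: Dict[str, int] = field(default_factory=dict)
    source_record: List[dict] = field(default_factory=list)


def harvest_delta(state: HarvestState, host: str, changes: List[Tuple[int, str]]) -> int:
    """Store changes newer than the host's checkpoint; return how many were stored.

    `changes` is a list of (marker, record_id) pairs offered by the host.
    """
    since = state.job_checkpoint.get(host, 0)
    delta = [(marker, rid) for marker, rid in changes if marker > since]
    for marker, rid in delta:
        state.source_record.append(
            {"host": host, "id": rid, "status": "PROCESSING_REQUIRED"}
        )
    if delta:
        # advance the marker so the next cycle resumes after the newest change seen
        state.job_checkpoint[host] = max(marker for marker, _ in delta)
    return len(delta)
```

In this model, a return value of 0 on the first harvest of a newly configured host is the case worth investigating, since it cannot be distinguished from "no changes" by the count alone.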
Ingest
Harvested records are processed, which may result in bib and cluster records being created, updated or deleted.
Processing records for ingest happens in its own processing cycle, independently of harvesting.
It requires the setting sheduled-tasks.enabled (sic) to be set to true.
The cycle interval is TBC.
The ingest service processes only entries in the source_record table that have a PROCESSING_REQUIRED status.
If a corresponding bib_record entry exists for the source_record entry, the bib record will be:
deleted, if the source record entry is now flagged as deleted or as suppressed
updated (for other changes to existing records)
If the harvested source_record is new, or indicates a switch from being previously suppressed, and there is no existing corresponding bib_record entry for it, a new bib record is created and associated with the source record.
In any case where the creation, deletion, removal or update of a bib record and its associated cluster_record does not complete, the source record status is changed to FAILURE.
If the bib record already exists, it will not reflect the harvested changes, and will be at variance with the corresponding source record.
A record that has failed to complete ingest processing will not be automatically re-processed. It must be manually updated to reset it.
Otherwise the source record status is updated to SUCCESS, and the bib record (if it still exists), cluster record and source record should correspond.
Changes to the shared index are queued for processing in the shared_index_queue_entry table (as part of the ingest cycle), based on the revised cluster_record entry.
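The status transitions in the ingest step can be modelled in a few lines. The sketch below is a simplified illustration under stated assumptions: `ingest_one`, the dictionary shapes and the `title` field are hypothetical, a missing field stands in for any processing error, and clustering and shared index queuing are left out entirely.

```python
from typing import Dict


def ingest_one(source: dict, bibs: Dict[str, dict]) -> str:
    """Process one source record against `bibs` (source id -> bib); return its new status."""
    if source["status"] != "PROCESSING_REQUIRED":
        # SUCCESS and FAILURE records are not picked up again automatically
        return source["status"]
    try:
        sid = source["id"]
        if source.get("deleted") or source.get("suppressed"):
            bibs.pop(sid, None)  # remove the bib record if one exists
        elif sid in bibs:
            bibs[sid].update(title=source["title"])  # update the existing bib record
        else:
            bibs[sid] = {"title": source["title"]}  # create a new bib record
        source["status"] = "SUCCESS"
    except Exception:
        # any incomplete change leaves the existing bib as it was and flags the source record
        source["status"] = "FAILURE"
    return source["status"]
```

A record left in FAILURE is returned unchanged on later calls, which mirrors the need for a manual reset before it is processed again.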
The Bib Record Count by Host LMS page shows the number of records in different states in the catalog build process.
DCB Admin Catalog Views
Within DCB Admin, there are three different views of catalog data.
Bib Record Count by Host LMS
Bib Records
Shared Index
Bib Record Count by Host LMS
Provides a summary view of harvest and ingest metrics for each catalog host.
Can be used to indicate ingest processing anomalies.
Column | Description | Notes |
---|---|---|
Source system name | Name of host library management system | Corresponds to Host LMS name across DCB Admin |
Harvest enabled | Whether the host will be polled for catalog updates (which applies to harvesting). | A host may not currently be enabled for ingest polling, but may have been previously. In this case, there will be source and bib records in DCB but they will not be updated. |
Harvested | Count of harvested records for the catalog host | Based on |
Awaiting ingest | Count of records that have been harvested from the source host but have not yet been processed by DCB. | Based on |
Ingest failed | Count of failures to reflect changes indicated in harvested source records to corresponding bib records. | Based on |
Ingested | Count of source records that have been successfully processed for creation, update or removal of corresponding bib records. | Based on |
Bib records | Count of bib records for the catalog host | Based on |
Source/bib difference | Difference between count of records in source and bib tables. | This can be used to indicate the number of deleted and suppressed entries for a host. Any source records in state FAILURE will skew this number, depending on whether the failed processing is on a source record that does or does not yet have a corresponding bib record. |
Source system UUID | Unique system reference for catalog host | Hidden by default |
Checkpoint UUID | | Hidden by default |
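To make the relationship between the columns concrete, here is a small sketch that derives the counts from a list of source record statuses and a bib record count. The mapping of columns to statuses is an assumption inferred from the column descriptions (for example, that "Harvested" counts all source records for the host); the function and key names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def summarize(source_statuses: List[str], bib_count: int) -> Dict[str, int]:
    """Derive the per-host summary counts from source record statuses (assumed mapping)."""
    counts = Counter(source_statuses)
    harvested = len(source_statuses)  # assumption: every source record counts as harvested
    return {
        "harvested": harvested,
        "awaiting_ingest": counts["PROCESSING_REQUIRED"],
        "ingest_failed": counts["FAILURE"],
        "ingested": counts["SUCCESS"],
        "bib_records": bib_count,
        "source_bib_difference": harvested - bib_count,
    }
```

For instance, eight source records (five SUCCESS, one FAILURE, two PROCESSING_REQUIRED) against four bib records give a difference of four, only part of which can be attributed to deleted or suppressed entries.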
Bib Records
Presents an index of individual titles from each catalog host.
The record count indicates the total entries in the bib_record table, rather than the source_record table.
This view does not include suppressed or deleted records.
Shared Index
Presents a searchable view of titles in the union catalog, rather than the bib_record or source_record tables.
Use for reviewing match criteria for title clustering.