Catalog Build and Views in DCB Admin

     


    Catalog Build Process

    Since Ingest v2, catalog build is split into two separate processes.

    Harvest

    New and changed records are pulled from the LMS host.

    1. This process is also known as Import.

    2. When the service configuration setting sheduled-tasks.enabled (sic) is true, local hosts are polled on a recurring cycle for new and changed records to harvest.

      1. If this setting is false, no harvest or ingest processing occurs. (Note: this setting also determines whether circulation tracking occurs.)

    3. The polling frequency is determined by the harvest.interval service configuration setting (default: 2 minutes).

    4. Only hosts that have the flag ingest:enabled set to true are polled for new or changed records.

      1. If DCB cannot connect to a host that is enabled for ingest polling, it is skipped after 3 retries.

      2. A general exception log entry (“….“) is recorded after each failed connection attempt. There is no specific “harvest aborted after 3 attempts” log message.

      3. The next time the harvest cycle runs, another attempt will be made to connect to the host.

    5. New or changed records that are harvested are stored in the source_record table with the status set to PROCESSING_REQUIRED.

      1. For OAI-PMH, DCB requests a delta from the local host.

        1. An empty return from OAI-PMH may indicate either no changes or a problem (especially if this is the first attempt to harvest a newly configured host).

      2. For Sierra and Polaris, DCB interrogates the local catalog database for changes.

      3. The job_checkpoint table is updated based on the response received from the local host, and marks the point from which DCB requests new changes the next time the cycle runs. This is useful for diagnosing harvest history in case of issues.
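    The harvest steps above can be sketched roughly as follows. This is a minimal illustration, not DCB's implementation: the Host and Store classes, the fetch_changes callable and the in-memory lists are all hypothetical stand-ins for the host connection, the source_record table and the job_checkpoint table. It also assumes that "skipped after 3 retries" means three connection attempts per cycle.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

MAX_ATTEMPTS = 3  # assumption: "3 retries" is read as three attempts per cycle


@dataclass
class Host:
    name: str
    ingest_enabled: bool
    # Hypothetical callable: takes the last checkpoint, returns (records, new checkpoint).
    fetch_changes: Callable[[Optional[str]], Tuple[List[dict], str]]


@dataclass
class Store:
    source_records: List[dict] = field(default_factory=list)   # stands in for source_record
    checkpoints: Dict[str, str] = field(default_factory=dict)  # stands in for job_checkpoint
    log: List[str] = field(default_factory=list)


def harvest_cycle(hosts: List[Host], store: Store, scheduled_tasks_enabled: bool) -> None:
    """Run one polling pass over all hosts."""
    if not scheduled_tasks_enabled:
        return  # no harvest processing occurs when the setting is false
    for host in hosts:
        if not host.ingest_enabled:
            continue  # only hosts enabled for ingest are polled
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                records, checkpoint = host.fetch_changes(store.checkpoints.get(host.name))
            except ConnectionError as exc:
                # One general log entry per failed attempt; no "aborted" summary entry.
                store.log.append(f"{host.name}: attempt {attempt} failed: {exc}")
                continue
            for record in records:
                store.source_records.append(
                    {**record, "host": host.name, "status": "PROCESSING_REQUIRED"}
                )
            store.checkpoints[host.name] = checkpoint
            break
        # If every attempt failed, the host is tried again on the next cycle.


def reachable(checkpoint: Optional[str]) -> Tuple[List[dict], str]:
    return [{"id": "rec-1"}], "2024-01-01T00:00:00Z"


def unreachable(checkpoint: Optional[str]) -> Tuple[List[dict], str]:
    raise ConnectionError("connection refused")


store = Store()
hosts = [
    Host("alpha", True, reachable),
    Host("beta", True, unreachable),
    Host("gamma", False, reachable),  # not enabled, so never polled
]
harvest_cycle(hosts, store, scheduled_tasks_enabled=True)
```

    In this run only the reachable, enabled host contributes a source record and a checkpoint. The unreachable host leaves three log entries and nothing else, which mirrors the absence of a dedicated "aborted" message described above.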

    Ingest

    Harvested records are processed, which may result in bib and cluster records being created, updated or deleted.

    1. Processing records for ingest happens in its own processing cycle, independently of harvesting.

      1. It requires the setting sheduled-tasks.enabled (sic) to be set to true.

      2. The cycle interval is TBC.

    2. The ingest service processes only those entries in the source_record table that have a PROCESSING_REQUIRED status.

    3. If a corresponding bib_record entry exists for the source_record entry, the bib record will be:

      1. deleted

        1. if the source record entry is now flagged as deleted

        2. if the source record entry is now flagged as suppressed

      2. updated (for other changes to existing records)

    4. If the harvested source_record is new, or indicates that a previously suppressed record is no longer suppressed, and there is no existing corresponding bib_record entry for it, a new bib record is created and associated with the source record.

    5. In any case where the creation, update or deletion of a bib record and its associated cluster_record does not complete, the source record status is changed to FAILURE.

      1. If the bib record already exists, it will not reflect changes harvested, and will be at variance with the corresponding source record.

      2. A record that has failed to complete ingest processing will not be automatically re-processed. The source record must be manually updated to reset it.

    6. Otherwise the source record status is updated to SUCCESS, and the bib record (if it still exists), cluster record and source record should correspond.

    7. Changes to the shared index are queued for processing in the shared_index_queue_entry table (as part of the ingest cycle), based on the revised cluster_record entry.
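    The ingest rules above can be sketched roughly as follows. This is an illustration under stated assumptions rather than DCB's code: records are plain dictionaries, the field names id, title, deleted and suppressed are hypothetical, and cluster records and the shared index queue are left out. Doing nothing for a deleted or suppressed source record that has no bib record is also an assumption, as the text does not cover that case.

```python
from typing import Dict, List


def apply_to_bib(src: dict, bib_records: Dict[str, dict]) -> str:
    """Apply one source record to the bib records and report the action taken."""
    key = src["id"]
    gone = bool(src.get("deleted") or src.get("suppressed"))
    if key in bib_records:
        if gone:
            del bib_records[key]  # deleted or suppressed at source: remove the bib
            return "deleted"
        bib_records[key] = {"title": src["title"]}  # other changes: update in place
        return "updated"
    if not gone:
        bib_records[key] = {"title": src["title"]}  # new or unsuppressed: create
        return "created"
    return "skipped"  # assumption: nothing to do when there is no bib to remove


def ingest_cycle(source_records: List[dict], bib_records: Dict[str, dict]) -> None:
    """Process every pending source record once."""
    for src in source_records:
        if src["status"] != "PROCESSING_REQUIRED":
            continue  # only pending entries are processed
        try:
            apply_to_bib(src, bib_records)
        except Exception:
            src["status"] = "FAILURE"  # stays failed until reset manually
            continue
        src["status"] = "SUCCESS"


sources = [
    {"id": "r1", "title": "New title", "status": "PROCESSING_REQUIRED"},
    {"id": "r2", "title": "Old title", "deleted": True, "status": "PROCESSING_REQUIRED"},
    {"id": "r3", "status": "PROCESSING_REQUIRED"},  # malformed: no title, so it fails
    {"id": "r4", "title": "Already done", "status": "SUCCESS"},
]
bibs = {"r2": {"title": "Old title"}, "r3": {"title": "Stale title"}}
ingest_cycle(sources, bibs)
```

    After this run, r3 is left in FAILURE and its bib record still holds the stale title, which is the variance between source and bib records described in step 5.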

    The Bib Record Count by Host LMS page shows the number of records in different states in the catalog build process.

     

    DCB Admin Catalog Views

    Within DCB Admin, there are three different views of catalog data.

    1. Bib Record Count by Host LMS

    2. Bib Records

    3. Shared Index

     

    Bib Record Count by Host LMS

    Provides a summary view of harvest and ingest metrics for each catalog host.

    Can be used to indicate ingest processing anomalies.

    | Column | Description | Notes |
    |---|---|---|
    | Source system name | Name of host library management system | Corresponds to Host LMS name across DCB Admin |
    | Harvest enabled | Whether the host will be polled for catalog updates (which applies to harvesting). | A host may not currently be enabled for ingest polling, but may have been previously. In this case, there will be source and bib records in DCB but they will not be updated. |
    | Harvested | Count of harvested records for the catalog host | Based on source_record table. Includes suppressed and deleted entries. |
    | Awaiting ingest | Count of records that have been harvested from the source host but have not yet been processed by DCB. | Based on source_record table, where status is PROCESSING_REQUIRED. |
    | Ingest failed | Count of failures to reflect changes indicated in harvested source records to corresponding bib records. | Based on source_record table, where status is FAILURE. |
    | Ingested | Count of source records that have been successfully processed for creation, update or removal of corresponding bib records. | Based on source_record table, where status is SUCCESS. |
    | Bib records | Count of bib records for the catalog host | Based on bib_record table. Includes successfully ingested source records. Does not include suppressed or deleted entries. |
    | Source/bib difference | Difference between count of records in source and bib tables. | This is the source_record count minus the bib_record count per host. It can be used to indicate the number of deleted and suppressed entries for a host. Any source records in state FAILURE will skew this number, depending on whether the failed processing is on a source record that does or does not yet have a corresponding bib record. |
    | Source system UUID | Unique system reference for catalog host | Hidden by default |
    | Checkpoint UUID | | Hidden by default |
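    The counts in this view can be approximated with simple aggregate queries. The sketch below uses an in-memory SQLite database purely for illustration. The table names source_record and bib_record come from this page, but the column names id, host and status are assumptions and the real schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_record (id TEXT, host TEXT, status TEXT);
    CREATE TABLE bib_record (id TEXT, host TEXT);
    """
)
conn.executemany(
    "INSERT INTO source_record VALUES (?, ?, ?)",
    [
        ("s1", "alpha", "SUCCESS"),
        ("s2", "alpha", "SUCCESS"),
        ("s3", "alpha", "SUCCESS"),  # e.g. deleted at source, so no bib row
        ("s4", "alpha", "PROCESSING_REQUIRED"),
        ("s5", "alpha", "FAILURE"),
    ],
)
conn.executemany(
    "INSERT INTO bib_record VALUES (?, ?)",
    [("b1", "alpha"), ("b2", "alpha")],
)


def host_counts(conn: sqlite3.Connection, host: str) -> dict:
    """Return the per-host figures shown in the summary view."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM source_record WHERE host = ? GROUP BY status",
        (host,),
    ).fetchall()
    by_status = dict(rows)
    harvested = sum(by_status.values())
    bibs = conn.execute(
        "SELECT COUNT(*) FROM bib_record WHERE host = ?", (host,)
    ).fetchone()[0]
    return {
        "harvested": harvested,
        "awaiting_ingest": by_status.get("PROCESSING_REQUIRED", 0),
        "ingest_failed": by_status.get("FAILURE", 0),
        "ingested": by_status.get("SUCCESS", 0),
        "bib_records": bibs,
        "source_bib_difference": harvested - bibs,
    }


counts = host_counts(conn, "alpha")
```

    With this sample data the difference is 3, yet only one source record was actually deleted at source. The pending and failed records account for the rest, which shows how non-SUCCESS statuses skew the figure.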

     

    Bib Records

    Presents an index of individual titles from each catalog host.

    The record count indicates the total entries in the bib_record table, rather than the source_record table.

    This view does not include suppressed or deleted records.

     

    Shared Index

    Presents a searchable view of titles in the union catalog, rather than of the bib_record or source_record tables.

    Use for reviewing match criteria for title clustering.

     

     

     

    Operated as a Community Resource by the Open Library Foundation