Docstore Initial Data Loading
Initial Data Loading (Bulk Ingest) in Docstore
Requirements
Specifications
Data handled in initial load
Bib MARC records and metadata such as createdBy, dateEntered, updatedBy, lastUpdated, fastAddFlag, supressFromPublic, status, statusUpdatedBy, statusUpdatedOn, staffOnlyFlag, harvestable, etc. This metadata is carried outside the bib record, in the request XML element /request/requestDocuments/ingestDocument/additionalAttributes.
Instance OLEML records (holdings, items) with metadata similar to that of bib records. This metadata is carried inside the holdings and item OLEML records, at /instanceCollection/instance/oleHoldings/extension/additionalAttributes and /instanceCollection/instance/items/item/extension/additionalAttributes.
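As a rough illustration of where the bib metadata lives, the following Python sketch reads the additionalAttributes element of each ingestDocument. Only the element path comes from the specification above. The sample request, the representation of each attribute as a child element, and the function name read_additional_attributes are assumptions made for this example and may not match the real request schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the child-element layout inside additionalAttributes
# is an assumption; only the path to additionalAttributes is from the spec.
SAMPLE_REQUEST = """
<request>
  <requestDocuments>
    <ingestDocument>
      <additionalAttributes>
        <createdBy>loader</createdBy>
        <staffOnlyFlag>false</staffOnlyFlag>
      </additionalAttributes>
    </ingestDocument>
    <ingestDocument>
      <additionalAttributes>
        <createdBy>admin</createdBy>
        <harvestable>true</harvestable>
      </additionalAttributes>
    </ingestDocument>
  </requestDocuments>
</request>
"""


def read_additional_attributes(request_xml):
    """Return one dict of attribute name -> text per ingestDocument."""
    root = ET.fromstring(request_xml.strip())  # root element is <request>
    result = []
    for doc in root.findall("requestDocuments/ingestDocument"):
        attrs = doc.find("additionalAttributes")
        if attrs is None:
            # An ingestDocument without metadata yields an empty dict.
            result.append({})
        else:
            result.append({child.tag: (child.text or "").strip() for child in attrs})
    return result


if __name__ == "__main__":
    for attributes in read_additional_attributes(SAMPLE_REQUEST):
        print(attributes)
```

The same approach would apply to the instance paths, with the lookup starting from instanceCollection instead of request.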
Format of data files
Several files, each containing an OLE Docstore Request with n (configurable, e.g. 50000) ingestDocuments, each of which contains a bib MARC record and its additional attributes.
Several files, each containing an OLE Docstore Request with n (configurable, e.g. 50000) ingestDocuments, each of which contains an instance in OLEML format.
Each bib MARC record has control field 001 with the local identifier as its value. This value is preserved in Docstore.
Each instance has a repeatable resourceIdentifier field whose value is a bib ID.
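The file layout above can be sketched in Python under some assumptions. The sketch splits a stream of records into groups of at most n, one group per request file, and lists instance resourceIdentifier values that do not match any bib 001 value. The function names chunked and unresolved_identifiers, and the dictionary shape used for instances, are hypothetical and chosen only for this example. The sketch also assumes that the bib ID in resourceIdentifier is the same local identifier found in field 001.

```python
from itertools import islice


def chunked(records, n):
    """Yield lists of at most n records, preserving input order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, n))
        if not batch:
            return
        yield batch


def unresolved_identifiers(bib_ids, instances):
    """Return (instance position, identifier) pairs with no matching bib ID.

    Each instance is assumed to be a dict with a 'resourceIdentifiers' list,
    mirroring the repeatable resourceIdentifier field.
    """
    known = set(bib_ids)
    missing = []
    for position, instance in enumerate(instances):
        for identifier in instance["resourceIdentifiers"]:
            if identifier not in known:
                missing.append((position, identifier))
    return missing


if __name__ == "__main__":
    bib_ids = ["b1", "b2", "b3", "b4", "b5"]
    for file_number, batch in enumerate(chunked(bib_ids, 2), start=1):
        print(file_number, batch)
    instances = [
        {"resourceIdentifiers": ["b1"]},
        {"resourceIdentifiers": ["b2", "b9"]},
    ]
    print(unresolved_identifiers(bib_ids, instances))
```

Running a check like this before loading the instance files could catch instances that point at bibs missing from the bib files.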
Location of data files
Indexing of data
Currently, each batch of records is saved in Docstore and indexed in Solr in one transaction.
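The all-or-nothing behaviour of saving and indexing a batch together can be pictured with a small in-memory Python sketch. It is not how Docstore or Solr implement this. The dictionaries standing in for the store and the index, the record fields id and title, and the function name ingest_batch are all assumptions made for illustration.

```python
def ingest_batch(store, index, batch):
    """Save and index a batch; restore both on any failure.

    store and index are plain dicts standing in for the document store and
    the search index (an assumption for this sketch).
    """
    store_snapshot = dict(store)
    index_snapshot = dict(index)
    try:
        # Step 1: save every record of the batch.
        for record in batch:
            store[record["id"]] = record
        # Step 2: index every record of the batch.
        for record in batch:
            index[record["id"]] = record["title"].lower()
    except Exception:
        # Undo the partial work so the batch is either fully applied or not at all.
        store.clear()
        store.update(store_snapshot)
        index.clear()
        index.update(index_snapshot)
        raise


if __name__ == "__main__":
    store, index = {}, {}
    ingest_batch(store, index, [{"id": "1", "title": "First"}])
    try:
        # The second record has no title, so indexing fails and the batch is undone.
        ingest_batch(store, index, [{"id": "2", "title": "Second"}, {"id": "3"}])
    except KeyError:
        pass
    print(sorted(store), sorted(index))
```

One consequence of this design is that a single bad record causes its whole batch to be rejected, so the batch size affects how much work is repeated after a failure.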