
1. Overview

DocumentStore for OLE is a content management system that provides checkin, checkout, versioning, locking and similar features for library records such as Bibliographic, Item, Holdings, Patron and License records. Most of the records are in XML format, but the DocumentStore is format agnostic: it stores content as is, without any type conversion. The stored data is also indexed to support efficient search and retrieval. Although the DocumentStore is an independent system with a basic UI for the supported operations, most interaction comes from OLE, such as ingesting new records, editing existing records, and search and retrieval.

2. Core Technologies

DocumentStore for OLE uses Apache Jackrabbit 2.0 for content storage and Apache Solr 3.0 for indexing and searching. 

3. Architecture

The architecture was designed primarily around the need to store a variety of document types and formats at the expected volume. Jackrabbit organizes content as a hierarchy of "items". An item is either a node or a property; properties store the actual values. For example, to store a person's first, middle and last names in Jackrabbit, there would be a person node with first name, middle name and last name properties holding the values. Compared with a traditional RDBMS, nodes can be thought of as tables and properties as columns that hold the actual values.
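The node and property model described above can be illustrated with a minimal sketch. This is plain Python rather than the Jackrabbit (JCR) API; the `Node` class, the property names and the sample values are all hypothetical and exist only to show the shape of the data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Illustrative stand-in for a repository node (not the JCR Node interface)."""
    name: str
    properties: Dict[str, str] = field(default_factory=dict)  # property name -> value
    children: List["Node"] = field(default_factory=list)      # child nodes

    def add_child(self, child: "Node") -> "Node":
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child


# A person node whose properties hold the actual values (sample data).
root = Node("root")
person = root.add_child(
    Node("person", {"firstName": "Jane", "middleName": "Q", "lastName": "Doe"})
)
```

In this picture the node plays the role of the table and each property plays the role of a column holding a value.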

Although the architecture is flexible, in that implementors can specify the content hierarchy of the data at setup time, the default content hierarchy assumes three levels: document category, document type and document format, as shown in the diagram below.


1 - Denotes the 1st level of content hierarchy under the root node which represents document categories. Here it is "Work".

2 - Denotes the 2nd level which represents the document types such as "Bibliographic" and "Instance"

3 - Denotes the 3rd level which represents the document formats such as "MARC" and "Dublin-core" for Bibliographic records and "OLEML" (OLE defined) for Instance records.
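As a rough sketch of the default three-level hierarchy listed above, the categories, types and formats can be written as a nested mapping and turned into a node path. The `HIERARCHY` table, the `node_path` function and the path layout are assumptions made for illustration; the actual node names used in the repository may differ.

```python
# Default hierarchy as described in the text: category -> type -> formats.
HIERARCHY = {
    "Work": {
        "Bibliographic": ["MARC", "Dublin-core"],
        "Instance": ["OLEML"],
    }
}


def node_path(category: str, doc_type: str, doc_format: str) -> str:
    """Return a hypothetical repository path for a category/type/format triple."""
    formats = HIERARCHY.get(category, {}).get(doc_type)
    if formats is None or doc_format not in formats:
        raise KeyError(f"unknown combination: {category}/{doc_type}/{doc_format}")
    return "/" + "/".join([category, doc_type, doc_format])
```

For example, `node_path("Work", "Bibliographic", "MARC")` yields `/Work/Bibliographic/MARC`, while an unsupported combination raises `KeyError`.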


L1 - This denotes level 1 where you can have up to 1K nodes.

L2 - This denotes level 2 where you can have up to 1K nodes.

L3 - This denotes level 3 where you can have up to 1K nodes.

Total number of MARC records at L1

Jackrabbit recommends a small number of nodes per parent node for efficiency.

The total number of nodes (i.e. the resulting number of files) that can be accommodated at L1 is 100M, with the possibility of adding more.
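One way to keep the number of children per parent small is to spread records over the L1, L2 and L3 levels by number. The sketch below assumes sequentially numbered records and a fan-out of 1,000 per level ("1K" in the text); the function name and the path format are hypothetical and are not taken from the DocumentStore code.

```python
FAN_OUT = 1000  # assumed maximum number of child nodes per parent


def bucket_path(record_number: int, fan_out: int = FAN_OUT) -> str:
    """Map a sequential record number to a hypothetical L1/L2/L3 bucket path."""
    if record_number < 0 or record_number >= fan_out ** 3:
        raise ValueError("record number outside the range this sketch can address")
    l3 = record_number % fan_out                # position within the L2 bucket
    l2 = (record_number // fan_out) % fan_out   # position within the L1 bucket
    l1 = record_number // (fan_out * fan_out)   # top-level bucket
    return f"L1-{l1:03d}/L2-{l2:03d}/L3-{l3:03d}"
```

With this scheme, record 1234567 lands in `L1-001/L2-234/L3-567`, and no parent in the sketch ever holds more than 1,000 children.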


 


L1 - This denotes level 1 with up to 1K nodes.

L2 - This denotes level 2 with up to 1K nodes.

L3 - This denotes level 3 with up to 1K nodes.

Total number of Instance records at L1

Jackrabbit recommends a small number of nodes per parent node for efficiency.

The total number of nodes (i.e. the resulting number of files) that can be accommodated at L1 is 100M, with the possibility of adding more.
  

4. Features (Available through RESTful services)

DocumentStore comes with two sets of RESTful service application programming interfaces (APIs). The first set consists of services for operations against the DocumentStore, which are as follows:

a. Ingest (Single file with one or more records) - Allows storing of documents in the DocumentStore. The input file has to conform to a standard schema.

b. Checkout (Requires the Universally Unique Identifier, or UUID, of the document to be checked out) - Allows for checking out a single file from the DocumentStore.

c. Checkin (Similar to the Ingest) - Checks in a file with versioning.

d. Browse - Shows the repository file count.

e. Link (Requires UUIDs of two documents) - Allows two records to be linked to each other via their UUIDs.
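The semantics of the five operations above can be sketched with a toy in-memory model. This is not the DocumentStore REST API: the class, method names and signatures are hypothetical, and details such as locking and schema validation are left out. Content is stored as is, in keeping with the format-agnostic design.

```python
import uuid
from typing import Dict, List, Set


class ToyDocumentStore:
    """In-memory illustration of ingest, checkout, checkin, browse and link."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}  # UUID -> list of versions
        self._links: Dict[str, Set[str]] = {}      # UUID -> linked UUIDs

    def ingest(self, content: str) -> str:
        """Store a new document unchanged and return its UUID."""
        doc_id = str(uuid.uuid4())
        self._versions[doc_id] = [content]
        self._links[doc_id] = set()
        return doc_id

    def checkout(self, doc_id: str) -> str:
        """Return the latest version of the document with the given UUID."""
        return self._versions[doc_id][-1]

    def checkin(self, doc_id: str, content: str) -> int:
        """Add a new version and return the resulting version count."""
        self._versions[doc_id].append(content)
        return len(self._versions[doc_id])

    def browse(self) -> int:
        """Return the number of documents in the store."""
        return len(self._versions)

    def link(self, first_id: str, second_id: str) -> None:
        """Link two existing documents to each other by UUID."""
        if first_id not in self._versions or second_id not in self._versions:
            raise KeyError("both documents must exist before linking")
        self._links[first_id].add(second_id)
        self._links[second_id].add(first_id)

    def linked(self, doc_id: str) -> Set[str]:
        """Return a copy of the UUIDs linked to the given document."""
        return set(self._links[doc_id])
```

A typical sequence would be to ingest a Bibliographic record and an Instance record, link the two UUIDs, then check the Bibliographic record out, edit it and check it back in as a new version.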

The second set of services is for the discovery layer. Currently these are straightforward Solr APIs whose queries are constructed from the search criteria; examples will be provided in the technical section.
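Until those examples are available, the following sketch shows how a Solr select URL might be assembled from search criteria. The base URL and the field names are assumptions; only the `q`, `rows` and `wt` parameters of the standard Solr select handler are relied on, and the actual discovery queries may be built differently.

```python
from urllib.parse import urlencode

# Hypothetical location of the Solr select handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(criteria: dict, rows: int = 10,
                     base_url: str = SOLR_SELECT_URL) -> str:
    """Build a Solr select URL that ANDs together field:"value" clauses."""
    clauses = []
    for field_name, value in sorted(criteria.items()):
        # Escape backslashes and quotes so the value stays inside the phrase.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{field_name}:"{escaped}"')
    query = " AND ".join(clauses) if clauses else "*:*"  # match all if no criteria
    return base_url + "?" + urlencode({"q": query, "rows": rows, "wt": "json"})
```

For instance, `build_search_url({"Title": "history"}, rows=5)` produces a URL whose `q` parameter is the URL-encoded form of `Title:"history"`.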

5. 
