...
No Format |
---|
<response> <documents> <document id="715e92f0-b3ab-4263-96d9-58183a23e6d5"></document> </documents> <user>ole-khuntley</user> <operation>delete</operation> <status>Success</status> </response> |
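A client that receives a response like the one above may want to pull out the status and the affected document ids. The following is a minimal sketch using Python's standard library; the function name `summarize_response` is hypothetical and is not part of DocStore.

```python
import xml.etree.ElementTree as ET

# Response text copied from the sample above.
RESPONSE_XML = (
    '<response> <documents> '
    '<document id="715e92f0-b3ab-4263-96d9-58183a23e6d5"></document> '
    '</documents> <user>ole-khuntley</user> <operation>delete</operation> '
    '<status>Success</status> </response>'
)


def summarize_response(xml_text):
    """Return the user, operation, status and document ids of a response."""
    root = ET.fromstring(xml_text)
    return {
        "user": root.findtext("user"),
        "operation": root.findtext("operation"),
        "status": root.findtext("status"),
        "document_ids": [doc.get("id") for doc in root.iter("document")],
    }


summary = summarize_response(RESPONSE_XML)
print(summary["operation"], summary["status"], summary["document_ids"])
```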
2.8 Bulk Ingest (Admin function)
The bulk ingest process loads large amounts of document data into the docstore repository.
It is usually run by an admin user during off-peak hours so that end users do not experience a slowdown of docstore.
The input data is copied to a "Batch Upload" directory on the server (specified in documentstore.properties).
Go to /admin.jsp and open the 'Bulk Ingest' tab.
Click the button to start the process. Once started, the process runs in the background and waits for input files to become available in the "Batch Upload" directory.
As soon as a file is available, it is picked up for processing. After a file is ingested it is moved to a ".done" sub-directory and the next available file is picked up.
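The pick-up-and-move behaviour described above can be illustrated with a short sketch. This is not DocStore's implementation: the function `process_available_files`, the `ingest` callback and the file names are hypothetical, and the sketch makes a single pass over the directory instead of waiting in the background.

```python
import shutil
import tempfile
from pathlib import Path


def process_available_files(upload_dir, ingest):
    """Ingest each file currently in upload_dir, then move it to '.done'."""
    upload_dir = Path(upload_dir)
    done_dir = upload_dir / ".done"
    done_dir.mkdir(exist_ok=True)
    processed = []
    # Sorted order is an assumption; the guide does not state the pick-up order.
    for path in sorted(p for p in upload_dir.iterdir() if p.is_file()):
        ingest(path)
        shutil.move(str(path), str(done_dir / path.name))
        processed.append(path.name)
    return processed


# Demonstration against a throwaway directory, not a real DocStore server.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "batch1.xml").write_text("<request/>")
    (Path(tmp) / "batch2.xml").write_text("<request/>")
    ingested = []
    result = process_available_files(tmp, lambda p: ingested.append(p.name))
    remaining = [p.name for p in Path(tmp).iterdir() if p.is_file()]
    done = sorted(p.name for p in (Path(tmp) / ".done").iterdir())
```

After the pass, the upload directory holds no more input files and both files sit in the ".done" sub-directory.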
To verify that the data is stored in DocumentStore, go to the URL
http://localhost:8080/oledocstore
Click the "Refresh Summary" button in the Summary tab.
Note the count for each DocType.
To verify that the data is indexed in DocStore, go to the URL
http://localhost:8080/oledocstore/discovery
Click the "Refresh" button in the Summary tab.
Note the count for each DocType.
2.9 Rebuild indexes (Admin function)
The indexed data in Docstore may occasionally become corrupted, or the data may need to be reindexed after changes to the indexing criteria or to the search/sort/facet rules.
In these cases the Docstore data can be reindexed. This is also done by the Admin user.
Go to /admin.jsp and open the 'Rebuild Indexes' tab.
Click the 'Start' button to start the process.
Click the 'Status' button to view the status of the process.
Click the 'Stop' button to stop the process. The process is stopped after the current batch of data is reindexed.
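The 'Stop' behaviour described above, where the current batch finishes before the process halts, can be sketched as a loop that checks a stop flag only between batches. This is an illustration under assumptions, not DocStore's code: `rebuild_indexes`, `index_batch`, the batch size and the document names are all hypothetical.

```python
import threading


def rebuild_indexes(documents, index_batch, stop_requested, batch_size=2):
    """Reindex documents in batches; honor a stop request between batches."""
    batches_done = 0
    for start in range(0, len(documents), batch_size):
        if stop_requested.is_set():
            break
        index_batch(documents[start:start + batch_size])
        batches_done += 1
    return batches_done


stop = threading.Event()
indexed = []


def index_batch(batch):
    indexed.extend(batch)
    # Simulate the admin clicking 'Stop' while the second batch is running.
    if len(indexed) == 4:
        stop.set()


batches = rebuild_indexes(["d1", "d2", "d3", "d4", "d5", "d6"], index_batch, stop)
```

Because the flag is checked only at batch boundaries, the second batch is indexed in full and the third is never started.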
To verify that the data is indexed in DocStore, go to the URL
http://localhost:8080/oledocstore/discovery
Click the "Refresh" button in the Summary tab.
Note the count for each DocType.
2.10 Ingest Binary data (BagIt Requests)
When the document content is in a non-text (binary) format such as PDF or DOC, as is the case for License Agreement documents, it is difficult to send it to docstore through a web page.
...
The response.xml, along with the temp folder name, is returned to the browser.
...
3.0 Appendix
...
3.0.1 Sample Input XML for Ingest
No Format |
---|
<request> <user>ole-khuntley</user> <operation>batchIngest</operation> <requestDocuments> <ingestDocument id="1" category="work" type="bibliographic" format="marc"> <content><![CDATA[ <collection xmlns="http://www.loc.gov/MARC21/slim"> <record> <leader>01142cam 2200301 a 4500</leader> <controlfield tag="001">92005291</controlfield> <controlfield tag="003">DLC</controlfield> <controlfield tag="005">19930521155141.9</controlfield> <controlfield tag="008">920219s1993 caua j 000 0 eng</controlfield> <datafield tag="010" ind1=" " ind2=" "> <subfield code="a">92005291</subfield> </datafield> <datafield tag="020" ind1=" " ind2=" "> <subfield code="a">0152038655 :</subfield> <subfield code="c">$15.95</subfield> </datafield> <datafield tag="040" ind1=" " ind2=" "> <subfield code="a">DLC</subfield> <subfield code="c">DLC</subfield> <subfield code="d">DLC</subfield> </datafield> <datafield tag="042" ind1=" " ind2=" "> <subfield code="a">lcac</subfield> </datafield> <datafield tag="050" ind1="0" ind2="0"> <subfield code="a">PS3537.A618</subfield> <subfield code="b">A88 1993</subfield> </datafield> <datafield tag="082" ind1="0" ind2="0"> <subfield code="a">811/.52</subfield> <subfield code="2">20</subfield> </datafield> <datafield tag="100" ind1="1" ind2=" "> <subfield code="a">Sandburg, Carl,</subfield> <subfield code="d">1878-1967.</subfield> </datafield> <datafield tag="245" ind1="1" ind2="0"> <subfield code="a">Arithmetic /</subfield> <subfield code="c"> Carl Sandburg ; illustrated as an anamorphic adventure by Ted Rand. </subfield> </datafield> <datafield tag="250" ind1=" " ind2=" "> <subfield code="a">1st ed.</subfield> </datafield> <datafield tag="260" ind1=" " ind2=" "> <subfield code="a">San Diego :</subfield> <subfield code="b">Harcourt Brace Jovanovich,</subfield> <subfield code="c">c1993.</subfield> </datafield> <datafield tag="300" ind1=" " ind2=" "> <subfield code="a">1 v. (unpaged) :</subfield> <subfield code="b">ill. (some col.) ;</subfield> <subfield code="c">26 cm.</subfield> </datafield> <datafield tag="500" ind1=" " ind2=" "> <subfield code="a">One Mylar sheet included in pocket.</subfield> </datafield> <datafield tag="520" ind1=" " ind2=" "> <subfield code="a"> A poem about numbers and their characteristics. Features anamorphic, or distorted, drawings which can be restored to normal by viewing from a particular angle or by viewing the image's reflection in the provided Mylar cone. </subfield> </datafield> <datafield tag="650" ind1=" " ind2="0"> <subfield code="a">Arithmetic</subfield> <subfield code="x">Juvenile poetry.</subfield> </datafield> <datafield tag="650" ind1=" " ind2="0"> <subfield code="a">Children's poetry, American.</subfield> </datafield> <datafield tag="650" ind1=" " ind2="1"> <subfield code="a">Arithmetic</subfield> <subfield code="x">Poetry.</subfield> </datafield> <datafield tag="650" ind1=" " ind2="1"> <subfield code="a">American poetry.</subfield> </datafield> <datafield tag="650" ind1=" " ind2="1"> <subfield code="a">Visual perception.</subfield> </datafield> <datafield tag="700" ind1="1" ind2=" "> <subfield code="a">Rand, Ted,</subfield> <subfield code="e">ill.</subfield> </datafield> </record> </collection> ]]> </content> </ingestDocument> </requestDocuments> </request> |
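A request envelope of the shape shown above can be assembled programmatically rather than by hand. The sketch below uses Python's standard `xml.dom.minidom` to wrap a MARC XML string in a CDATA section; the helper name `build_ingest_request` is hypothetical, and the shortened MARC record is only a stand-in for a full record like the one in the sample.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import Document


def build_ingest_request(user, marc_xml, doc_id="1"):
    """Build a batchIngest request with the MARC XML wrapped in CDATA."""
    doc = Document()
    request = doc.createElement("request")
    doc.appendChild(request)

    for tag, text in (("user", user), ("operation", "batchIngest")):
        element = doc.createElement(tag)
        element.appendChild(doc.createTextNode(text))
        request.appendChild(element)

    request_documents = doc.createElement("requestDocuments")
    request.appendChild(request_documents)

    ingest_document = doc.createElement("ingestDocument")
    # Attribute values are taken from the sample request above.
    for name, value in (("id", doc_id), ("category", "work"),
                        ("type", "bibliographic"), ("format", "marc")):
        ingest_document.setAttribute(name, value)
    request_documents.appendChild(ingest_document)

    content = doc.createElement("content")
    content.appendChild(doc.createCDATASection(marc_xml))
    ingest_document.appendChild(content)
    return request.toxml()


# Shortened stand-in for a full MARC record.
marc = ('<collection xmlns="http://www.loc.gov/MARC21/slim"><record>'
        '<leader>01142cam 2200301 a 4500</leader></record></collection>')
request_xml = build_ingest_request("ole-khuntley", marc)

# Parse the result back to confirm the CDATA payload survives unchanged.
round_trip = ET.fromstring(request_xml)
payload = round_trip.findtext("requestDocuments/ingestDocument/content")
```

Wrapping the record in CDATA means the MARC markup travels as literal text inside `<content>` and does not need to be escaped.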
...
3.0.2 Sample Input file for Check In
The "id" attribute of <ingestDocument> must be the UUID of a previously ingested document.
No Format |
---|
<request> <user>ole-khuntley</user> <operation>checkIn</operation> <requestDocuments> <ingestDocument id="1" category="work" type="bibliographic" format="marc"> <content><![CDATA[ <collection xmlns="http://www.loc.gov/MARC21/slim"> <record> <leader>01142cam 2200301 a 4500</leader> <controlfield tag="001">92005291</controlfield> <controlfield tag="003">DLC</controlfield> <controlfield tag="005">19930521155141.9</controlfield> <controlfield tag="008">920219s1993 caua j 000 0 eng</controlfield> <datafield tag="010" ind1=" " ind2=" "> <subfield code="a">92005291</subfield> </datafield> <datafield tag="020" ind1=" " ind2=" "> <subfield code="a">0152038655 :</subfield> <subfield code="c">$15.95</subfield> </datafield> <datafield tag="040" ind1=" " ind2=" "> <subfield code="a">DLC</subfield> <subfield code="c">DLC</subfield> <subfield code="d">DLC</subfield> </datafield> <datafield tag="042" ind1=" " ind2=" "> <subfield code="a">lcac</subfield> </datafield> <datafield tag="050" ind1="0" ind2="0"> <subfield code="a">PS3537.A618</subfield> <subfield code="b">A88 1993</subfield> </datafield> <datafield tag="082" ind1="0" ind2="0"> <subfield code="a">811/.52</subfield> <subfield code="2">20</subfield> </datafield> <datafield tag="100" ind1="1" ind2=" "> <subfield code="a">Sandburg, Carl,</subfield> <subfield code="d">1878-1967.</subfield> </datafield> <datafield tag="245" ind1="1" ind2="0"> <subfield code="a">Arithmetic /</subfield> <subfield code="c"> Carl Sandburg ; illustrated as an anamorphic adventure by Ted Rand. </subfield> </datafield> <datafield tag="250" ind1=" " ind2=" "> <subfield code="a">1st ed.</subfield> </datafield> </record> </collection> ]]> </content> </ingestDocument> </requestDocuments> </request> |
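Because the check-in id is expected to be a UUID, a client could validate it before sending the request. The sketch below uses Python's standard `uuid` module; the helper `is_valid_uuid` is hypothetical, and requiring the hyphenated form is an assumption based on the id shown in the delete response earlier in this guide.

```python
import uuid


def is_valid_uuid(value):
    """Return True if value is a UUID in canonical hyphenated form."""
    try:
        # Assumption: only the hyphenated 8-4-4-4-12 form is accepted.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


real_id_ok = is_valid_uuid("715e92f0-b3ab-4263-96d9-58183a23e6d5")
placeholder_ok = is_valid_uuid("1")
```

Note that the value `1` in the sample above does not pass this check, so it would need to be replaced with the UUID of a real, previously ingested document.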
...
4. Search
This functionality allows documents to be searched for by keywords or phrases. Searches can be restricted by category, type, format, and search fields.
...
4.1 Quick Search
Select Doc Category : Work
...
The system shows records in which any field matches one or more of the keywords.
...
4.2 Advanced Search
Select Doc Category : Work
...