PERFORMANCE BENCHMARK FOR INGEST

...

ARCHITECTURE

DOCSTORE ARCHITECTURE

See the OLE DocStore wiki for the detailed architecture design and for how data is modeled and organized in DocStore. DocStore currently hosts the bib, instance, and license/agreement data.

COMMENTS ON DOCSTORE ARCHITECTURE

JOHN PILLANS suggested that we should not use JackRabbit to store Bib and Instance data, since we cannot make full use of the features JackRabbit provides and, in his view, JCR performs poorly here. He proposed storing Bib and Instance data in a database (blob field) instead of DocStore. He expects the architecture to be much simpler and the performance much faster.
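To make the blob-field suggestion concrete, here is a minimal sketch of what storing records as blobs in a relational table could look like. It uses SQLite from the Python standard library purely as a stand-in database. The table name `record`, its columns, and the helper functions are hypothetical and do not reflect any actual OLE schema, and the sketch makes no claim about performance.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a database and create the (hypothetical) record table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS record ("
        " id TEXT PRIMARY KEY,"
        " doc_type TEXT NOT NULL,"   # e.g. 'bib' or 'instance'
        " content BLOB NOT NULL)"    # raw document bytes, stored as-is
    )
    return conn


def ingest_batch(conn, rows):
    """Insert many (id, doc_type, content) rows in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO record (id, doc_type, content) VALUES (?, ?, ?)",
            rows,
        )


def fetch(conn, record_id):
    """Return the stored bytes for record_id, or None if it is absent."""
    row = conn.execute(
        "SELECT content FROM record WHERE id = ?", (record_id,)
    ).fetchone()
    return None if row is None else row[0]


conn = open_store()
ingest_batch(conn, [
    ("bib-1", "bib", b"<record>example bib</record>"),
    ("inst-1", "instance", b"<instance>example instance</instance>"),
])
print(fetch(conn, "bib-1"))
```

Batching the inserts into one transaction is a common way to keep per-record overhead low in bulk loads. Whether it helps enough here would have to be measured.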

WOULD LIKE John to provide more detailed information about the IU library system architecture and the performance evaluation of the database and Solr.

CURRENT INGEST PERFORMANCE & REQUIREMENT

INGEST PERFORMANCE MINIMUM REQUIREMENT

From John Pillans: Ingest of about 20 million legacy records (including bib, instance..) needs to finish in one week!
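The requirement above can be turned into a sustained throughput target. The sketch below assumes that "one week" means seven full days of continuous processing and that "about 20 million" is exactly 20,000,000 records. The function and constant names are illustrative only.

```python
# Assumption: the ingest runs continuously for 7 full days.
TOTAL_RECORDS = 20_000_000
WINDOW_SECONDS = 7 * 24 * 3600  # 604,800 seconds in one week


def required_rate(total_records, window_seconds):
    """Return the minimum average throughput in records per second."""
    return total_records / window_seconds


rate = required_rate(TOTAL_RECORDS, WINDOW_SECONDS)
print(f"required: {rate:.1f} records/s")
```

Under these assumptions the ingest must average roughly 33 records per second. Any downtime during the week raises that figure.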

CURRENT PERFORMANCE FOR INGESTING BIB DATA

Legacy data ingest:

Ingest 6 million bib records, processing time = 60 hrs
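As a rough reading of this figure, the sketch below converts it to records per second and projects how long 20 million records would take at the same rate. It assumes throughput scales linearly with record count and that every record ingests as fast as a bib record. Both are simplifications, and the names are illustrative.

```python
BIB_RECORDS = 6_000_000
BIB_HOURS = 60


def throughput(records, hours):
    """Average records per second over the given number of hours."""
    return records / (hours * 3600)


def projected_hours(records, rate):
    """Hours needed to ingest `records` at `rate` records per second."""
    return records / rate / 3600


bib_rate = throughput(BIB_RECORDS, BIB_HOURS)
hours_for_20m = projected_hours(20_000_000, bib_rate)
print(f"bib rate: {bib_rate:.1f} records/s")
print(f"20 million at that rate: {hours_for_20m:.0f} hrs "
      f"({hours_for_20m / 24:.1f} days)")
```

At about 28 records per second, 20 million records would need about 200 hours, a little over eight days, so even the bib rate alone falls short of a one-week window.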

...

Total Process Time: 0:4:51.854(H:M:S.ms)

BULK INGEST FOR INSTANCE DATA

Legacy data ingest:

Ingest 10 million instance records, processing time may take 42 days
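To show how far this is from the one-week target for about 20 million records stated on this page, the sketch below compares the two rates. It assumes continuous processing and treats the 42-day figure as a linear estimate. The names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def rate_per_second(records, days):
    """Average records per second over the given number of days."""
    return records / (days * SECONDS_PER_DAY)


instance_rate = rate_per_second(10_000_000, 42)  # estimated instance ingest
target_rate = rate_per_second(20_000_000, 7)     # one-week requirement
speedup_needed = target_rate / instance_rate

print(f"instance rate: {instance_rate:.2f} records/s")
print(f"target rate:   {target_rate:.2f} records/s")
print(f"speedup needed: {speedup_needed:.0f}x")
```

Under these assumptions instance ingest would have to run about 12 times faster to match the target rate.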

...

Time for Linking to Bib records: 2.51 minutes

For a more detailed time breakdown, please read the xls file.

PERFORMANCE ISSUES

ARCHITECTURE REVIEW

COMMENTS ON DOCSTORE ARCHITECTURE

...