Record matching

When bibliographic records are ingested into a consortial Shared Inventory app, ReShare applies an algorithm designed to match and deduplicate records that represent the same entity. The desired result of this process is a single instance record linked to holdings and items from multiple contributors.

Match keys

ReShare matches records by creating a match key for each incoming record. A match key is built from portions of core bibliographic fields, including title, author, and date of publication. If two records have the same match key, they are grouped together in the Shared Inventory.
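
The grouping behavior described above can be illustrated with a minimal Python sketch. This is not the Gold Rush algorithm that ReShare uses: the function names, the choice of fields, the field widths, and the underscore padding are all assumptions made for illustration only.

```python
import re
from collections import defaultdict


def toy_match_key(record):
    """Build a simplified, hypothetical match key from three fields.

    Assumption: the real algorithm uses more fields and different rules.
    """
    def norm(value, width):
        # Lowercase, drop everything except letters and digits,
        # then truncate or pad with "_" to a fixed width.
        cleaned = re.sub(r"[^a-z0-9]", "", str(value or "").lower())
        return cleaned[:width].ljust(width, "_")

    return (
        norm(record.get("title"), 20)
        + norm(record.get("date"), 4)
        + norm(record.get("author"), 10)
    )


def group_by_match_key(records):
    """Group record ids that share the same (toy) match key."""
    groups = defaultdict(list)
    for record in records:
        groups[toy_match_key(record)].append(record["id"])
    return dict(groups)


# Hypothetical records from three contributors.
records = [
    {"id": "lib-a-1", "title": "The Handmaid's Tale",
     "author": "Atwood, Margaret", "date": "1985"},
    {"id": "lib-b-7", "title": "The handmaid's tale.",
     "author": "Atwood, Margaret", "date": "1985"},
    {"id": "lib-c-3", "title": "The Handmaid's Tale",
     "author": "Atwood, Margaret", "date": "1986"},
]
groups = group_by_match_key(records)
```

In this sketch the first two records differ only in capitalization and punctuation, so they produce the same key and are grouped together. The third record has a different date, so it produces a different key and stays separate.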

ReShare uses a match key algorithm developed by the Colorado Alliance of Research Libraries for its Gold Rush Library Content Comparison system. Full details of the algorithm as it is currently implemented can be found at https://docs.google.com/document/d/1XjH6N31jS1jaDk2n1mWUfjdDfoCtqNC5IEeFWGfZe9Q/edit.

ReShare match keys are currently stored in the “Index Title” field on the Instance record, but will be moved to a dedicated field in the 1.1 release.

Troubleshooting matches

The match key is a useful tool for troubleshooting match errors. Match errors include records that appear to represent the same resource but have not matched (false negatives) and records that have matched but do not appear to represent the same resource (false positives). At this time, false negatives are much more common.

To investigate false negatives, copy the match keys from the records you are examining into a document where they can be compared side by side, and identify where the keys differ. Consult the match key algorithm to determine which source field produced the differing characters. In the example above, row 1 is missing the characters “1939” (the author’s birthdate), while row 2 is missing “365” (the number of pages). The missing data in the original source records has prevented a match.
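
The side-by-side comparison can also be done programmatically. The following is a minimal sketch using Python's standard difflib module. The helper name key_differences and the sample keys are hypothetical and do not reflect the actual layout of a ReShare match key.

```python
from difflib import SequenceMatcher


def key_differences(key_a, key_b):
    """Return the spans where two match keys differ.

    Each result is (operation, text_in_key_a, text_in_key_b), where
    operation is "replace", "delete", or "insert" relative to key_a.
    """
    matcher = SequenceMatcher(None, key_a, key_b, autojunk=False)
    return [
        (tag, key_a[i1:i2], key_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical keys: the second lacks the characters "1939".
diffs = key_differences("smithj1939xyz", "smithjxyz")
```

Each reported span can then be checked against the match key algorithm to see which source field it came from.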

Fixing match errors

Match errors must be fixed in the source bib record in your library’s local catalog, rather than directly in the shared index. When you fix a record at the source, any changes will be picked up during your next data refresh and the record should match correctly.

Note that, at this time, fixing a match does not delete the original, unmatched record. This is a known issue and will be fixed in an upcoming release of the Shared Inventory software.

 

Operated as a Community Resource by the Open Library Foundation