Record matching

When bibliographic records are ingested into the shared inventory, ReShare applies an algorithm designed to match and deduplicate records that represent the same entity. The desired result of this process is a single cluster record linked to holdings and items from multiple contributors.

Match keys

ReShare matches records by creating a match key for each incoming record. Match keys are built from portions of core bibliographic fields, such as title, author, and date of publication. If two records have the same match key, they are clustered together in the shared inventory.
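The idea of clustering on a shared key can be illustrated with a minimal Python sketch. This is not the Gold Rush algorithm: the function names, the record layout, the choice of fields, and the truncation lengths are all assumptions made for illustration.

```python
from collections import defaultdict


def toy_match_key(record):
    """Build a simplified key from a few fields (illustrative only, not Gold Rush)."""
    # Normalize by lowercasing and dropping everything except letters and digits.
    title = "".join(ch for ch in record["title"].lower() if ch.isalnum())[:20]
    author = "".join(ch for ch in record["author"].lower() if ch.isalnum())[:10]
    year = str(record.get("year", ""))[:4]
    return "|".join([title, author, year])


def cluster_by_key(records):
    """Group record ids by match key; identical keys land in the same cluster."""
    clusters = defaultdict(list)
    for record in records:
        clusters[toy_match_key(record)].append(record["id"])
    return dict(clusters)


# Hypothetical records from three contributors.
records = [
    {"id": "lib-a-1", "title": "The Handmaid's Tale", "author": "Atwood, Margaret", "year": 1985},
    {"id": "lib-b-7", "title": "The handmaid's tale.", "author": "Atwood, Margaret", "year": 1985},
    {"id": "lib-c-3", "title": "The Handmaid's Tale", "author": "Atwood, Margaret", "year": 1986},
]
clusters = cluster_by_key(records)
```

In this sketch the first two records normalize to the same key and cluster together, while the third differs only in its year and so forms its own cluster. A single differing field is enough to prevent a match.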

ReShare uses a match key algorithm developed by the Colorado Alliance of Research Libraries for its Gold Rush Library Content Comparison system. Full details of the algorithm as it is currently implemented can be found at Gold Rush MARC Match Key.

ReShare match keys are stored in the 999 10 $m field in the cluster record. The match key can be seen in the “Staff View” of a title in VuFind.
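If you copy the 999 field out of the Staff View as a line of text, a short Python helper can pull out the $m value. This is a sketch under assumptions: the line layout shown here (tag, indicators, then `$`-prefixed subfields), the `$i` subfield, and the key value are all made up, and the helper assumes the key itself contains no `$` character.

```python
def subfield_values(field_line, code):
    """Return the values of subfield `code` from a line like '999 10 $i x $m y'."""
    # Everything before the first '$' is the tag and indicators, so skip it.
    parts = field_line.split("$")[1:]
    # Each part starts with its one-character subfield code.
    return [part[1:].strip() for part in parts if part[:1] == code]


# Hypothetical field content; real keys are longer and follow the Gold Rush layout.
line = "999 10 $i 12345 $m examplekey1985"
keys = subfield_values(line, "m")
```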

Troubleshooting matches

The match key is a helpful tool for troubleshooting match errors. These errors include records that appear to represent the same resource but have not matched (false negatives) and records that have matched but don’t appear to represent the same resource (false positives). At this time, false negatives are much more common.

To investigate false negatives, copy the match keys from the records you are comparing into a document where they can be viewed side by side, then identify the differences between the lines. Consult the match key algorithm to identify which source field produced the differing characters. In the example above, row 1 is missing the characters “1939” (the author’s birth date), while row 2 is missing “365” (the number of pages). The missing data in the original source records has prevented a match.
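The side-by-side comparison can also be done with a short script. The sketch below uses Python's standard `difflib` module to list the spans where two keys differ. The two key strings are invented for illustration and do not follow the real Gold Rush layout, and the function name is an assumption.

```python
from difflib import SequenceMatcher


def key_differences(key_a, key_b):
    """List the spans where two match keys differ, with their position in key_a."""
    diffs = []
    matcher = SequenceMatcher(None, key_a, key_b, autojunk=False)
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            # Record the kind of change, where it starts, and the text on each side.
            diffs.append((tag, a0, key_a[a0:a1], key_b[b0:b1]))
    return diffs


# Made-up keys: row 1 lacks a birth date, row 2 lacks a page count.
row_1 = "sometitle/365/smith/"
row_2 = "sometitle//smith/1939"
diffs = key_differences(row_1, row_2)

for tag, position, only_in_a, only_in_b in diffs:
    print(f"{tag} at {position}: row 1 has {only_in_a!r}, row 2 has {only_in_b!r}")
```

Each reported span points to a position in the key, which you can then look up in the match key algorithm to find the source field that needs attention.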

Fixing match errors

Match errors must be fixed in the source bib record in your library’s local catalog, rather than directly in the shared index. When you fix a record at the source, any changes will be picked up during your next data refresh and the record should match correctly.

Operated as a Community Resource by the Open Library Foundation