This page documents ways to address recurring errors in package files during the OpenRefine ingest process. Recurring errors may include missing or incorrect ISSNs or eISSNs, missing dates, spelling errors, and other typos.
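As an illustration of how an incorrect ISSN or eISSN can be spotted before it is researched by hand, the sketch below checks the standard ISSN check digit (weights 8 down to 2 over the first seven digits, modulo 11, with "X" standing for 10). It is a standalone Python helper, not part of OpenRefine or GOKb, and the function name `is_valid_issn` is hypothetical.

```python
import re

# Accepts "1234-5679" or "12345679"; the last character may be a digit or X.
ISSN_PATTERN = re.compile(r"^(\d{4})-?(\d{3})([\dXx])$")


def is_valid_issn(value):
    """Return True if value is a well-formed ISSN with a correct check digit."""
    match = ISSN_PATTERN.match(value.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)  # first seven digits
    # Weight the seven digits 8, 7, 6, 5, 4, 3, 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    expected = (11 - total % 11) % 11
    check = "X" if expected == 10 else str(expected)
    return match.group(3).upper() == check
```

A check like this only confirms that an identifier is internally consistent; it cannot tell you whether the ISSN belongs to the title in question.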

Macros

We have created macros for existing GOKb data providers and will continue to create new macros as needed. To run a macro in an OpenRefine project:

Capture-Edit

In addition to the Macros, you can also save changes you make at the cell level by selecting "Capture Edit" from the dialog box. This generates valid JSON, which you can copy and reuse the next time you update the package. For example, to save a correction to a missing or incorrect ISSN:

Document repeated errors

If you work with the same data every month, you'll quickly realize how frustrating it is to fix the same errors again and again. One useful strategy is to document repeated errors so you don't have to research them each time you process a file. You can use an Excel template to document two kinds of errors – cell-level errors and rows to delete.

For cell-level errors, you should document the title that is affected, the field name of the cell that you have to change, and the original and new values of that cell.
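To show why these four pieces of information are enough, here is a minimal Python sketch that reapplies documented cell-level corrections to a freshly loaded file. It assumes the rows have already been read into dictionaries (for example with `csv.DictReader`); the key names `title`, `field`, `original`, and `new` and the function name are hypothetical and do not reflect the actual template columns.

```python
def apply_cell_corrections(rows, corrections):
    """Apply documented cell-level fixes in place and return how many were applied.

    rows: list of dicts, one per package row (column name -> value).
    corrections: list of dicts with the hypothetical keys
                 'title', 'field', 'original', and 'new'.
    """
    applied = 0
    for row in rows:
        for fix in corrections:
            same_title = row.get("title") == fix["title"]
            # Only change the cell if it still holds the documented bad value,
            # so a fix made at the source is never overwritten.
            still_wrong = row.get(fix["field"]) == fix["original"]
            if same_title and still_wrong:
                row[fix["field"]] = fix["new"]
                applied += 1
    return applied
```

Matching on the original value as well as the title means the correction quietly stops applying once the supplier fixes the data.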

For entire rows that need to be deleted, you will need to document enough information to identify the row in the future – usually the title and at least one identifier.
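A documented deletion can be reapplied in the same spirit. The sketch below, in Python, drops any row that matches every value recorded for a deletion. The column names are hypothetical and the rows are assumed to be dictionaries; this is an illustration, not the template's actual layout.

```python
def drop_documented_rows(rows, deletions):
    """Return the rows that do not match any documented deletion.

    rows: list of dicts, one per package row (column name -> value).
    deletions: list of dicts; each holds the identifying values recorded
               for one row to delete, e.g. a title plus one identifier.
    """
    def matches(row, rule):
        # A row matches only if every documented value agrees.
        return all(row.get(column) == value for column, value in rule.items())

    return [row for row in rows
            if not any(matches(row, rule) for rule in deletions)]
```

Requiring both the title and an identifier to match reduces the chance of deleting a different row that happens to share a title.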

Download the template

Work with the supplier

Consider asking the data supplier (usually a publisher or aggregator) whether they are willing to fix the errors at the source. You could open the conversation by asking whether they would like to be notified of errors in their data and, if so, in what format they prefer to receive them.