This page documents ways to address recurring errors in package files during the OpenRefine ingest process. Recurring errors may include missing or incorrect ISSNs or eISSNs, missing dates, spelling errors, and other typos.

Macros

We have created macros for existing GOKb data providers and will continue to create new macros as needed. To run a macro in an OpenRefine project:

...

  • Click in the cell you would like to edit
  • Update the ISSN to the correct one
  • Check the box next to Capture Edit
  • Confirm that PublicationTitle is selected in the drop-down menu
  • Click Apply to complete the change and save the edit
    • To view the saved edit, go to the Undo/Redo tab and you will see your edit in the last documented step.

  • After you have completed all edits for this package, go to the Undo/Redo tab and select Extract. Copy the JSON and paste it into a text file. You can reuse this code the next time you update the package, so you do not have to recreate the editing steps.
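
The last step above, saving the extracted JSON for reuse, can be sketched in Python as follows. This is a minimal illustration, not part of the GOKb tooling: the function names and the file name are hypothetical, and it assumes the extracted text is a JSON array of operation objects, each usually carrying an "op" and a "description" field. Checking that the pasted text parses before saving it guards against a truncated copy and paste.

```python
import json
from pathlib import Path


def save_operation_history(extracted_json: str, path: str) -> list:
    """Validate JSON copied from the Extract dialog, then write it to a file.

    Hypothetical helper: assumes the extracted text is a JSON array of
    operation objects.
    """
    # json.loads raises ValueError (JSONDecodeError) if the paste is incomplete.
    operations = json.loads(extracted_json)
    if not isinstance(operations, list):
        raise ValueError("expected a JSON array of operations")
    Path(path).write_text(json.dumps(operations, indent=2), encoding="utf-8")
    return operations


def describe_operations(operations: list) -> list:
    """Return one numbered summary line per saved operation."""
    return [
        f"{number}. {op.get('op', '?')}: {op.get('description', '')}"
        for number, op in enumerate(operations, start=1)
    ]
```

For example, calling save_operation_history(pasted_text, "package_edits.json") and printing the lines returned by describe_operations gives a quick check that every edit was captured before the file is reused.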

Document repeated errors

If you work with the same data every month, you'll quickly realize how frustrating it is to fix the same errors again and again. One useful strategy is to document repeated errors so you don't have to research them each time you process a file. You can use an Excel template to document two kinds of errors: cell-level errors and rows to delete.
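
To show how the two kinds of documented errors could be applied to a file, here is a minimal Python sketch. It is an illustration only and does not reflect the layout of the Excel template: the function name, the "ISSN" column, and the keys of each documented fix ("title", "column", "wrong", "correct") are assumptions, and rows are matched on PublicationTitle purely for the example.

```python
def apply_documented_fixes(rows, cell_fixes, titles_to_delete, key="PublicationTitle"):
    """Apply documented corrections to a list of row dictionaries.

    rows:             list of dicts, one per row of the package file
    cell_fixes:       list of dicts with hypothetical keys
                      'title', 'column', 'wrong', 'correct'
    titles_to_delete: set of titles whose rows should be dropped
    """
    fixed_rows = []
    for row in rows:
        # Rows to delete: skip any row whose title is on the documented list.
        if row[key] in titles_to_delete:
            continue
        row = dict(row)  # copy, so the caller's data is left unchanged
        # Cell-level errors: replace a value only if it still matches the known bad one.
        for fix in cell_fixes:
            if row[key] == fix["title"] and row.get(fix["column"]) == fix["wrong"]:
                row[fix["column"]] = fix["correct"]
        fixed_rows.append(row)
    return fixed_rows
```

Replacing a value only when it still matches the documented wrong value means a provider's own correction in a later file is not overwritten.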

...