Many files that are processed through OpenRefine may contain repeated errors; in other words, the source file contains the same error each time you use it to update GOKb. We would like to develop a long-term strategy for dealing with repeated errors, but in the meantime, we need our users to work with them so that we can base our tools on their experience. This page documents a few ways that users can address repeated errors in their data.

Macros

We have created macros for existing GOKb data providers and will continue to create new macros as needed. To run a macro in an OpenRefine project:

  • In any cell, click Edit
  • Right-click in the box and select Apply Macro
  • In the search box, type the name of your provider to locate the correct macro
  • Click OK to run the macro.
    • Note: the macro will run automatically and should only take a few seconds. When the macro functions are complete, you should see fewer Error and Warning messages in the left-hand column.

Capture-Edit

In addition to macros, you can save changes you make at the cell level by selecting "Capture Edit" in the edit dialog box. This generates valid JSON code, which you can copy and reuse the next time you update the package. For example, to save a correction to a missing or incorrect ISSN:

  • Click in the cell you would like to edit
  • Update the ISSN to the correct one
  • Check the box next to Capture Edit
  • Check to see that the PublicationTitle is selected in the drop-down menu
  • Click Apply to complete the change and save the edit
    • To view the saved edit, go to the Undo/Redo tab; your edit appears as the most recent step.

  • After you have completed all edits for this package, go to the Undo/Redo tab and select Extract. Copy the JSON and paste it into a text file. The next time you update the package, you can reuse this code instead of recreating all of the editing steps.
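If you keep the extracted JSON for each package, a small script can check that what you pasted is intact before you save it. The sketch below is an illustration only, not part of GOKb or OpenRefine. It assumes the extracted text is a JSON array of operation objects, each with an "op" key, which is the usual shape of OpenRefine's extracted history; the file name and the sample operation are hypothetical placeholders.

```python
import json
from pathlib import Path


def save_extracted_operations(json_text: str, path: Path) -> int:
    """Check text copied from Undo/Redo > Extract, then save it to a file.

    Assumption: the extracted text is a JSON array of operation objects,
    each carrying an "op" key. Returns the number of operations saved.
    """
    # json.loads raises a ValueError subclass if the paste is not valid JSON
    # (for example, if the copy was cut off partway through).
    operations = json.loads(json_text)
    if not isinstance(operations, list):
        raise ValueError("expected a JSON array of operations")
    for index, operation in enumerate(operations):
        if not isinstance(operation, dict) or "op" not in operation:
            raise ValueError(f"item {index} does not look like an operation")
    # Indent the saved file so it is easy to read and compare between updates.
    path.write_text(json.dumps(operations, indent=2), encoding="utf-8")
    return len(operations)


def load_saved_operations(path: Path) -> str:
    """Return the saved operations as compact JSON text, ready to paste back in."""
    return json.dumps(json.loads(path.read_text(encoding="utf-8")))


if __name__ == "__main__":
    import tempfile

    # Hypothetical placeholder; real text comes from the Extract dialog.
    sample = '[{"op": "core/text-transform", "description": "Illustrative placeholder"}]'
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "provider-edits.json"  # hypothetical file name
        print(save_extracted_operations(sample, target))
        print(load_saved_operations(target))
```

Validating before saving means a truncated copy is caught now rather than the next time you try to reuse the edits.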

Document repeated errors

If you work with the same data every month, you'll quickly realize how frustrating it is to fix the same errors again and again. One useful strategy is to document repeated errors so you don't have to research them each time you process a file. You can use an Excel template to document two kinds of errors: cell-level errors and rows to delete.

...
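The two kinds of entries in such a log (cell-level errors and rows to delete) map naturally onto a simple data structure. The Python sketch below is a hypothetical illustration, not a GOKb tool: it assumes each row can be identified by a key column (here a made-up "title" column) and that the log records the key, the column, the wrong value, and the correct value. All column names and values in the example are placeholders.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class CellFix:
    """One documented cell-level error (hypothetical log format)."""
    key: str      # value that identifies the row, e.g. a title
    column: str   # column that holds the error
    wrong: str    # value the provider keeps sending
    correct: str  # value to use instead


def apply_error_log(rows, key_column, cell_fixes, rows_to_delete):
    """Apply a documented error log to a list of row dictionaries.

    rows_to_delete is a set of key values for rows to drop entirely.
    Returns a new list; the input rows are left unchanged.
    """
    fixes = {(f.key, f.column, f.wrong): f.correct for f in cell_fixes}
    cleaned = []
    for row in rows:
        key = row[key_column]
        if key in rows_to_delete:
            continue  # documented "row to delete"
        fixed = dict(row)
        for column, value in row.items():
            # Replace the value only if this exact error was documented.
            fixed[column] = fixes.get((key, column, value), value)
        cleaned.append(fixed)
    return cleaned


if __name__ == "__main__":
    # Hypothetical provider file with placeholder titles and ISSNs.
    sample = "title,issn\nJournal A,1234-567X\nJournal B,0000-0000\nJournal C,2222-2222\n"
    rows = list(csv.DictReader(io.StringIO(sample)))
    log = [CellFix("Journal B", "issn", "0000-0000", "1111-1111")]
    print(apply_error_log(rows, "title", log, {"Journal C"}))
```

Matching on the exact wrong value means a fix stops applying once the provider corrects the error at the source, which keeps an outdated log entry from overwriting good data.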

You may also want to work with the data supplier (usually a publisher or aggregator) to see whether they are willing to fix the errors at the source. A good way to start the conversation is to ask whether they would like to receive notice of errors in their data and, if so, in what format.

Future development in this area

Your experience working with repeated errors will help inform future development. In the short term, we'd like to experiment with creating macros to handle known errors. Longer term, we hope to build tools that are integrated with the OpenRefine and GOKb environments to help in this area.