Many files that are processed through OpenRefine contain repeated errors: the source file contains the same error each time you use it to update GOKb. We would like to develop a long-term strategy to deal with repeated errors, but in the meantime we need our users to work with repeated errors so that we can develop our tools based on their experience. This page documents a few ways that users can address repeated errors in their data.

Document repeated errors

If you work with the same data every month, you'll quickly realize how frustrating it is to fix the same errors again and again. One useful strategy is to document repeated errors so you don't have to research them each time you process a file. You can use an Excel template to document two kinds of errors: cell-level errors and rows to delete.

For cell-level errors, you should document the title that is affected, the field name of the cell that you have to change, and the original and new values of that cell.

For entire rows that need to be deleted, you will need to document enough information to identify the row in the future – usually the title and at least one identifier.

Download the template
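
As a rough illustration of how a log of documented errors could be reused, the sketch below applies both kinds of entries (cell-level fixes and rows to delete) to a list of rows in Python. It is not part of GOKb or OpenRefine. The class names, the function name, and the column names `title` and `print_issn` are all hypothetical and are not taken from the template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellFix:
    """One documented cell-level error (hypothetical structure)."""
    title: str      # title that is affected
    field: str      # field name of the cell to change
    original: str   # value as it appears in the source file
    new: str        # corrected value


@dataclass(frozen=True)
class RowDeletion:
    """One documented row to delete (hypothetical structure)."""
    title: str
    identifier_field: str
    identifier: str


def apply_documented_errors(rows, cell_fixes, row_deletions):
    """Return a cleaned copy of rows; the input rows are left unchanged."""
    cleaned = []
    for row in rows:
        # Skip rows that match a documented deletion on title and identifier.
        if any(
            row.get("title") == d.title
            and row.get(d.identifier_field) == d.identifier
            for d in row_deletions
        ):
            continue
        fixed = dict(row)
        for fix in cell_fixes:
            # Change the cell only if it still holds the documented original value.
            if fixed.get("title") == fix.title and fixed.get(fix.field) == fix.original:
                fixed[fix.field] = fix.new
        cleaned.append(fixed)
    return cleaned


# Example data; the column names are assumptions made for this illustration.
rows = [
    {"title": "Journal A", "print_issn": "1234-567"},
    {"title": "Journal B", "print_issn": "0000-0000"},
]
fixes = [CellFix("Journal A", "print_issn", "1234-567", "1234-5678")]
deletions = [RowDeletion("Journal B", "print_issn", "0000-0000")]
cleaned_rows = apply_documented_errors(rows, fixes, deletions)
```

Matching on the documented original value means that a fix is skipped once the supplier corrects the error at the source, so a stale entry in the log does not overwrite good data.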

Work with the supplier

You may want to work with the data supplier (usually a publisher or aggregator) to see if they are willing to fix the errors at the source. A good way to start the conversation is to ask whether they would be interested in receiving notice of errors in their data and, if so, in what format they would prefer to receive them.

Future development in this area

Your experiences working with repeated errors will help inform future development. In the short term, we'd like to experiment with the creation of macros to handle known errors. Longer term, we hope to build tools that are integrated with the OpenRefine and GOKb environments to help in this area.

 
