Clean up data in OpenRefine (Unformatted Files)

When you view a GOKb project in OpenRefine, a series of error and warning messages appears in the left-hand navigation pane. The checks behind these messages primarily confirm that all required fields are present and that the data in each cell is formatted correctly. Fixing the problems they report ensures that the data ingested into GOKb is of uniform high quality.

Error messages must be resolved before you can ingest a file into GOKb. Warning messages do not need to be resolved: they highlight missing data that you may want to add but that is not required.

Steps

Step 1: Open your project

  • From the GOKb tab in OpenRefine, click the "check out" link next to the project you want to open.
  • If your project doesn't appear in the list, you may need to check it in for the first time.
  • If your project is checked out by another user, you will not be able to open or edit it. Checked-out projects show the email address of the user who has them checked out, so you can contact that user if necessary.

Step 2: Resolve all error messages

  • Each error message identifies a problem with your data that must be fixed before it can be ingested into GOKb.
  • Each error message has a button next to it that offers one or more quick resolutions.
  • You can resolve an error by applying the quick resolution or by manually fixing the data.
  • Consult the Columns page for more information about the requirements for each field.
  • Consult the OpenRefine guide for more information about how to use the features of Refine to make changes to your data.
  • Once an error is resolved, the message will disappear from the list.
  • When all the errors are resolved, you will see a new option to ingest the project into GOKb.

Step 3: Resolve warning messages as needed

  • Each warning message identifies a possible problem with your data that you may want to fix before ingesting it into GOKb.
  • Each warning message has a button next to it that offers one or more quick resolutions.
  • You can resolve a warning by applying the quick resolution or by manually fixing the data.
  • Consult the Columns page for more information about the requirements for each field.
  • Consult the OpenRefine guide for more information about how to use the features of Refine to make changes to your data.
  • Once you have resolved all the warnings that you intend to fix, you can ingest your data.
  • You do not need to resolve all warnings before ingest.
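The rule that governs Steps 2 and 3 can be expressed as a short Python sketch. It is a hypothetical model, not GOKb's own code: the `Message` class, its `severity` values, and the `can_ingest` function are names invented for illustration, and the message texts are placeholders. It shows only that ingest depends on whether any error messages remain, never on warnings.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Message:
    """One entry from the validation pane (hypothetical model)."""
    severity: str  # "error" or "warning"
    text: str


def can_ingest(messages: Iterable[Message]) -> bool:
    """Return True when no error messages remain.

    Warnings are ignored: they never block ingest, however many are left.
    """
    return not any(m.severity == "error" for m in messages)


# Hypothetical project state: one error and one warning outstanding.
pending = [
    Message("error", "required field missing (example)"),
    Message("warning", "optional data missing (example)"),
]
print(can_ingest(pending))  # False: the error still blocks ingest

# After the error is resolved, only the warning remains.
pending = [m for m in pending if m.severity != "error"]
print(can_ingest(pending))  # True: warnings alone do not block ingest
```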

Step 4: Add extra fields

Your GOKb project may contain additional fields that are not standard, but that may still be useful. You can load these fields into GOKb as custom fields.

  • If a field applies to the title across packages, add it at the title level by using the column heading: gokb.ti.{fieldname}
  • If a field applies only to the title as it appears in that particular package and platform, add it at the TIPP level by using the column heading: gokb.tipp.{fieldname}
  • If an identifier applies to a title across all packages and platforms, add it at the title level by using the column heading: title.identifier.{namespace}
  • There are some common fields that occur in many title lists. To keep this data standardized, please use the following field names:
    • Publisher's proprietary ID: title.identifier.{publisher}
    • Imprint: gokb.ti.imprint
    • Title history free text notes: gokb.ti.titleHistoryNotes 
    • Previous title ID: gokb.ti.precedingPublicationTitleID
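The naming conventions above can be summarized in a small Python sketch that works out the new heading for each non-standard column before you rename the columns in OpenRefine. The level names (`title`, `tipp`, `identifier`), the helper functions, the example headers, the namespace `examplepress`, and the field `packageNote` are all hypothetical; only the heading patterns and the standardized field names come from this guide.

```python
# Hypothetical helper for choosing GOKb custom-field column headings.
# The level names "title", "tipp" and "identifier" are labels invented for
# this sketch; only the resulting heading patterns come from the guide.

PREFIXES = {
    "title": "gokb.ti.",                # applies to the title across packages
    "tipp": "gokb.tipp.",               # applies only in this package/platform
    "identifier": "title.identifier.",  # identifier namespace for the title
}

# Standardized names for common fields, as listed in the guide.
STANDARD_FIELDS = {
    "imprint": ("title", "imprint"),
    "title history notes": ("title", "titleHistoryNotes"),
    "previous title id": ("title", "precedingPublicationTitleID"),
}


def custom_heading(level: str, name: str) -> str:
    """Build a column heading such as 'gokb.ti.imprint'."""
    if level not in PREFIXES:
        raise ValueError(f"unknown level: {level!r}")
    return PREFIXES[level] + name


def rename_headers(headers: list[str], plan: dict[str, tuple[str, str]]) -> list[str]:
    """Return headers with each column listed in `plan` renamed.

    `plan` maps an existing header to a (level, name) pair; headers not in
    the plan are left unchanged.
    """
    return [custom_heading(*plan[h]) if h in plan else h for h in headers]


# Example: hypothetical source headers from a publisher's title list.
headers = ["Title", "Imprint", "Publisher ID", "Package note"]
plan = {
    "Imprint": STANDARD_FIELDS["imprint"],
    "Publisher ID": ("identifier", "examplepress"),  # hypothetical namespace
    "Package note": ("tipp", "packageNote"),         # hypothetical field
}
print(rename_headers(headers, plan))
# ['Title', 'gokb.ti.imprint', 'title.identifier.examplepress', 'gokb.tipp.packageNote']
```

Keeping the standardized names in one lookup table is a design choice for this sketch: it makes it harder to load the same common field under two different headings across projects.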

Operated as a Community Resource by the Open Library Foundation