GOKb Data Loading Cheat Sheet

This page is designed to be a quick reference guide for the GOKb data loading and ingest process. For more detailed information on each topic, please refer to the tutorials linked within the page. If you have not already taken training, contact Jennifer Solomon, GOKb Editor, to set up a time.

Load a file into OpenRefine

  • Open OpenRefine and log into the GOKb extension. Choose Create Project from the left-hand menu. Click Browse and locate the file you want to work with. Click Next. OpenRefine will show you a preview of your data. Scan it to make sure everything looks correct.

  • If there is extra text at the top of your file, you can use the Ignore first check-box at the bottom of the screen to prevent that data from being imported.

  • Uncheck the box labeled Parse cell text into numbers, dates... Leaving it checked may reformat your dates and ISSNs, and the change can't be undone after import.

  • Choose UTF-8 as your character encoding standard. This will ensure that diacritics are correctly displayed.

  • Edit your project's name in the text field at the top of the screen, following the format Organization Name: Package Name: YYYYMMDD (where the date is the day you load the project into OpenRefine).

  • Click Create Project in the top right corner. Your project will automatically open.
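The project name format above can be generated rather than typed by hand, which helps keep the date portion consistent. The following is a minimal sketch in Python; the helper name project_name and the example organization and package values are hypothetical and are not part of OpenRefine or the GOKb extension.

```python
from datetime import date


def project_name(organization: str, package: str, loaded_on: date) -> str:
    """Build 'Organization Name: Package Name: YYYYMMDD'."""
    # %Y%m%d produces the compact date form used in the naming convention.
    stamp = loaded_on.strftime("%Y%m%d")
    return f"{organization.strip()}: {package.strip()}: {stamp}"


# Hypothetical example values; substitute your own organization and package.
print(project_name("Example Press", "Example Journals Package", date(2015, 3, 2)))
# prints: Example Press: Example Journals Package: 20150302
```

Pass date.today() as the third argument to stamp the name with the day you load the project.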

Check a File Into GOKb

  • Click the GOKb button located in the top right corner of the screen. Select Check in this project for the first time.

  • You will be asked to provide the Source, Provider, Name, Description, and Notes (optional). Click Save and Check In.

Clean up data in OpenRefine

Use Macros to quickly rename columns

  • Click Edit in any cell and then right-click to show the Apply Macro option.

  • Click Apply Macro and then search for the KBART column transformation or the provider name macro (ex: American Society of Chemical Engineers).

  • Double-click the KBART macro and then wait while the application processes. Your columns will be renamed and you should see several error messages disappear.

  • The following three columns always need to be added, and each requires you to look up a controlled value to populate it: platform.host.name, package.name, org.publisher.name

  • The following columns are optional, and you may choose to add them if your data happens to contain extra information about these fields. If you are missing this information, you can omit these columns: TIPPPayment, TIPPStatus, Title.OAStatus

  • Address remaining invalid data errors and warnings, and capture repeat errors

  • Review additional fields and load these as custom columns
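Before ingesting, it can help to confirm outside OpenRefine that an exported file carries the columns listed above. The following is a minimal sketch, assuming a tab-separated export whose first row is the header. The function name check_columns and the sample data are hypothetical; only the column names come from this cheat sheet.

```python
import csv
import io

# Column names as listed in this cheat sheet.
REQUIRED = ["platform.host.name", "package.name", "org.publisher.name"]
OPTIONAL = ["TIPPPayment", "TIPPStatus", "Title.OAStatus"]


def check_columns(header):
    """Return (missing required columns, optional columns present)."""
    present = {name.strip() for name in header}
    missing = [col for col in REQUIRED if col not in present]
    optional_found = [col for col in OPTIONAL if col in present]
    return missing, optional_found


# Hypothetical sample: tab-separated, header in the first row.
sample = (
    "publication_title\tplatform.host.name\tTIPPStatus\n"
    "Example Journal\tExample Platform\tplaceholder\n"
)
header = next(csv.reader(io.StringIO(sample), delimiter="\t"))
missing, optional_found = check_columns(header)
print(missing)         # ['package.name', 'org.publisher.name']
print(optional_found)  # ['TIPPStatus']
```

An empty first list means all three required columns are present. The check looks only at column names and does not validate the controlled values inside them.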

Ingest a Project Into GOKb

  • In the left-hand navigation pane of OpenRefine, go to the Errors tab. Click the Update GOKb pane, then click Proceed with Ingest. Make any necessary changes, then click Save and Check-In.

  • Wait until the project is 100% ingested before moving on to the GOKb web app.

Verify a package record

  • In the GOKb web application, use the Search > Packages menu to locate the package you are working with.

  • Confirm that your package name is correct. If you think that people might search for the package using a term that isn't contained in the name, add a variant name.

Update package metadata fields (upper section)

  • List verifier: Check to make sure that you are the list verifier, or update this field with your name.

  • List verifier date: Update this field with today's date (YYYY-MM-DD)

  • Provider: The organization responsible for making the package available. This should be automatically populated with the provider selected in Refine.

  • Source: The location of the original data used to create the package. This should be automatically populated with the source selected in Refine.

  • Edit status: This field describes whether the package itself has been verified and is ready to use.

Populate package details (lower section)

  • Scope: Indicates how the provider has scoped the content of the package. Choices are Aggregator, Back file, Front file, Master file, Undefined.

  • Breakable: Indicates whether or not a package can be broken up by a subscribing library (Yes) or must be subscribed to as a whole (No).

  • Consistent: Indicates whether this package has the same content for all libraries that subscribe (Yes) or can be customized (No).

  • Fixed: Indicates whether the package can change contents once purchased (No) or remains static (Yes).

  • Global: Indicates whether the package is offered to all customers (Global) or to a limited group of users (Consortial).

  • List status: Indicates whether the list of TIPPs associated with the package has been verified and is ready to use.
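The choices described above can be written down as a small lookup table, which is handy for checking planned package details before entering them in the web application. This is an illustrative sketch only. The names ALLOWED and invalid_fields are hypothetical, the allowed values are limited to those named in this cheat sheet (the web application may offer others), and List status is omitted because its values are not listed here.

```python
# Allowed values per field, taken only from the descriptions in this cheat sheet.
ALLOWED = {
    "Scope": {"Aggregator", "Back file", "Front file", "Master file", "Undefined"},
    "Breakable": {"Yes", "No"},
    "Consistent": {"Yes", "No"},
    "Fixed": {"Yes", "No"},
    "Global": {"Global", "Consortial"},
}


def invalid_fields(details):
    """Return the sorted names of fields whose value is missing or not allowed."""
    return sorted(
        field
        for field, choices in ALLOWED.items()
        if details.get(field) not in choices
    )


# Hypothetical package details with one value outside the listed choices.
details = {
    "Scope": "Front file",
    "Breakable": "No",
    "Consistent": "Yes",
    "Fixed": "No",
    "Global": "Regional",
}
print(invalid_fields(details))  # ['Global']
```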

Celebrate! You’ve completed the data loading and ingest process for your package!

If you have received training on Processing Review Tasks, then go to the next step.

Otherwise, contact Jennifer and let her know that the package is ready for review.







Operated as a Community Resource by the Open Library Foundation