Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Compare one Refine project to another Refine project
  • Look up lines in a Refine project against a web site or API

Comparing two Refine

...

projects

The simplest way to compare two lists of journals from different sources is to first get both lists into Refine and to use the 'cross' function to lookup values in common or missing between the two projects. To do this the two projects will need to have at least one column containing the same type of data. Such columns do not have to be named the same but do need to contain the same information - for example ISSN. The method described below relies of matching values between two Refine projects, so values that are likely to be consistent are the most useful. ISSN, eISSN, DOI and other identifiers are useful. When dealing with KBART data the title_id or title_url columns can prove useful when you are comparing two files from the same content provider, but not when dealing with different content providers.

...

Doing a lookup from ListA to ListB would tell you that Journal 1 was missing from ListB, but would not be able to tell you about Journal 5, and vice versa.

Anchor
simple-comparison-eg
simple-comparison-eg
Simple comparison example

  • Two lists of journals have been used to create Refine projects (ListA and ListB), and both projects have been edited so they contain a column named 'title.identifier.issn' containing ISSNs for the journals in the list
  • Open ListA, click the 'title.identifier.issn' column header and choose 'Edit Column' -> 'Add column based on this column ...' from the dropdown menu
  • In the 'New column name' box enter a column name such as "ListB Comparison" (any name will do)
  • In the 'Expression' box enter the text:
    • cell.cross("ListB","title.identifier.issn").length()

...

  • Click 'OK'. This will populate a new column with a number representing the number of matching lines found in ListB
    • If the "ListB Comparison" column contains a zero (0) then no match has been found
    • If the "ListB Comparison" column contains a one (1) then a single match has been found
    • If the "ListB Comparison" column contains a two (2) then two matches have been found
    • etc.
  • Facet on the new 'ListB Comparison' column to find those lines in ListA that do not appear in ListB (a zero in the column)
  • To identify journals that are in ListB but not in ListA, the same process is carried out starting with the 'title.identifier.issn' column in the 'ListB' project

Anchor
comparison-with-title-eg
comparison-with-title-eg
Comparison with titles example

  • Two lists of journals have been used to create Refine projects (ListA and ListB), and both projects have been edited so they contain a column named 'title.identifier.issn' containing ISSNs and a column named 'PublicationTitle' containing journal titles
  • Open ListA, click the 'title.identifier.issn' column header and choose 'Edit Column' -> 'Add column based on this column ...' from the dropdown menu
  • In the 'New column name' box enter a column name such as "ListB Comparison" (any name will do)
  • In the 'Expression' box enter the text:
    • forEach(cell.cross("ListB","title.identifier.issn"),r,r.cells.PublicationTitle.value).join("|")
  • Click 'OK'. This will populate a new column with all matching titles found for the relevant ISSN, with a 'pipe' character | between each title. The pipe character was chosen simply because it is unlikely to appear in any journal title. Some other separator by amending the expression above. If there is no match the relevant cell will be blank
  • To find all the rows in ListA which found no match in ListB, click on the column header drop down in the new "ListB Comparison" column and choose 'Facet' -> 'Customized Facets' -> 'Facet by blank'
    • In the resulting Facet those rows found by the 'true' facet (i.e. there is a blank in the column) are the ones which found no match

...

forEach(cell.cross("ListB","title.identifier.issn"),r,r.cells.CoverageNotes.value).join("|")

Anchor
comparison-iterative-match-eg
comparison-iterative-match-eg
Comparison using iterative matching example

While ISSN and eISSN are good match points due to their consistency and prevalence in journal lists, it is not unusual in a single list for some titles to have an ISSN but no eISSN and vice-versa. It is also not unusual for some titles to have neither an ISSN or an eISSN. Because the 'cross' lookup relies on finding matches, if in one list Journal Title A only has an ISSN, and in the other list Journal Title A only has an eISSN (in a different column) then a comparison would suggest (falsely) that Journal Title A was missing from the second list. Therefor it is sometimes useful to run matches for several columns across the two lists before looking at the outcome of the comparison.

...

  • Click 'OK'. This will populate a new column with a number representing the number of matching lines found in ListB
  • Facet on the new 'ListB Comparison' column to find those lines in ListA that do not appear in ListB (a zero in the column) based on a title.identifier.issn match, and select the 'true' facet to ensure you have filtered the list to only those for which this first step did not find a match
  • Click on the column header of the new 'ListB Comparison' column and choose 'Edit Cells' -> 'Transform ...'
  • In the 'Expression' box enter the text:
    • cells["title.identifier.eissn"].cross("ListB","title.identifier.eissn").length()
  • When you click 'OK', this will use the contents of the title.identifier.eissn column in ListA to check for matches in the title.identifier.eissn column in ListB

Anchor
normalise-titles
normalise-titles
Normalise titles to do comparison

The above examples use ISSN to lookup between two sheets because generally ISSNs are used consistently. While it is sometimes necessary to do some small changes to ensure consistency between ISSNs (e.g. adding hyphens if they are missing, or ensuring both sets of ISSNs use uppercase 'X' - both easily fixed via Quick Fix Transformations), generally you can use ISSNs for comparisons without a lot of effort. However sometimes you want to compare using title strings, and as these vary much more between different journal lists simply using the title 'as is' can lead to you missing matches.

...

When you've created both 'Normalised Title' columns you can then use the 'cross' function as described in the examples above to compare the two lists based on title.

Further

...

examples

To

...

-do

  • screencast of lookups using the 'cross' function
  • Looking up via an API