Jira Link

https://jira.kuali.org/browse/OLE-2194

From Requirements:
1. The 2nd indicator in MARC (e.g. field 245) gives the number of non-filing characters. Sort/display rules must honor it. Example: 245 1 3 $aAn April Shower. The 2nd indicator is "3", so ignore the first 3 characters ("An" plus a space) when applying sort rules.
2. NISO standards, Section 3: follow the sort order of characters very closely for Search Results display and for the Browse/More display of facets (the main results view of facets is still ordered by number of hits, high to low).
3. NISO standards, Section 4 (Headings): the current choice is "word by word" (see 4.1.2.1). The following are in word-by-word sort order: cream, cream cheese, cream corn.
4. NISO standards, Section 7 (Symbols): the current choice is 7.1, ASCII order.
5. We are NOT yet addressing non-roman/Unicode characters, i.e. the treatment of Chinese, Russian, etc. We will still index and sort on their "romanized" values.
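Requirement 1 can be illustrated with a minimal Python sketch. It is not the OLE or Solr implementation; `strip_nonfiling` is a hypothetical helper, and treating a blank or non-digit indicator as 0 is an assumption made here for illustration.

```python
def strip_nonfiling(title: str, second_indicator: str) -> str:
    """Drop the leading non-filing characters counted by the 2nd indicator."""
    # Assumption: a blank or non-digit indicator means "skip nothing".
    is_digit = second_indicator.isascii() and second_indicator.isdigit()
    count = int(second_indicator) if is_digit else 0
    return title[count:]


# 245 1 3 $aAn April Shower: indicator "3" skips "An" plus the space.
print(strip_nonfiling("An April Shower", "3"))  # April Shower
```

The stripped value would feed only the sort key; the full title, including the article, is still what gets displayed.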

 

| NISO Rule/Recommendation | Meaning | Example | Implemented? (Y/N) | Comments |
|---|---|---|---|---|
| 3 Order of Characters | | | | |
| 3.1 Spaces | Two or more consecutive spaces are treated as a single space. | | Y | |
| 3.2 Punctuation Marks Treated as Spaces (-, ---, /) | The hyphen, dash (of any length), or slash is treated as a space. | | N | Need to replace the hyphen with a space in the _sort field. |
| 3.3 Punctuation Marks Ignored (other than -, ---, /) | The following punctuation marks are disregarded for arrangement purposes: period (full stop), comma, semicolon, colon, parentheses, square brackets, angle brackets, braces (curved brackets), apostrophe, quotation marks (single or double), exclamation mark, question mark. They are not treated as spaces. | | N | Need to remove these characters in the _sort field. |
| 3.4 Symbols Other Than Numerals, Letters and Punctuation Marks | Such symbols are arranged after a space but before a numeral. Two or more contiguous symbols are treated as a single character. | These symbols should sort in the given order: ¥, $$, %, $10 | N | Need to investigate Solr. |
| 3.5 Numerals (0 through 9) | All data beginning with a numeral is arranged ahead of any data beginning with a letter. | | Y | |
| 3.6 Letters (A through Z) | Records are arranged in the order of the English alphabet. Upper-case and lower-case letters have equal arrangement value. | | N | Convert the _sort field to lower case. |
| 3.6.1 Modified Letters | Letters modified by diacritical marks, and ligatures of two letters, are arranged like their nearest basic equivalent letters in the English alphabet. | | N | Need to investigate Solr. |
| 3.7 Superscript and Subscript Characters | Superscript and subscript characters are arranged as "on-the-line" characters. | | N | Need to investigate Solr. |
| 4 Headings | | | | |
| 4.1 Arrangement of Headings | | | | |
| 4.1.1 Single-Word Headings | Data consisting of a single word precedes any data beginning with the same word followed by other words. | | Y | |
| 4.1.2 Multi-Word Headings (Word-by-Word) | This method is preferred because it keeps together data beginning with the same word (or words). | Order is: N. E. Zenith Co. → networks → new moon → Newton, Isaac → Newton's rings | N | Can be done by modifying the _sort field. |
| 4.2 Headings with Qualifiers | Parentheses and square brackets are ignored when the data looks like: bill (Bank note), Bill Clinton, bill (weapon). | | N | Can be done by modifying the _sort field. |
| 4.3 Headings with Identical Initial Words | Data beginning with identical initial words is arranged in the following sequence: a) single-word headings; b) multi-word headings, including headings with qualifiers. | | N | Can be done by modifying the _sort field. |
| 4.4 Headings with Cross-References | Cross-references are not part of a heading and therefore do not affect the arrangement of a heading. | | N | Cannot identify cross-references. |
| 4.5 Subheadings | Subheadings are normally arranged in alphanumeric sequence. They are subject to the same arrangement rules as the headings they modify. | | N | No subheadings seen in the data. |
| 4.6 Headings Beginning with Articles | Data beginning with articles (a, an, the) is displayed in ascending order. | | Y | |
| 5 Abbreviations | Abbreviations are alphabetized exactly as written, not as spelled out. | Expected order: M'Bow, Ahmadu → Mr. Adams → Mrs. Smith. Current output: Mr. Adams → Mrs. Smith → M'Bow, Ahmadu | N | Can be done by modifying the _sort field. |
| 6 Numbers | | | | |
| 6.1 Headings Containing Numbers | Numbers at the beginning of or within the data are arranged in arithmetical order and sorted in ascending order. | | N | Need to investigate Solr. |
| 6.2 Punctuation in Numbers | Punctuation in numbers, as in other text, has no arrangement value (numbers are sorted in ascending order). | | N | Need to investigate Solr. |
| 6.3 Decimal Fractions | Decimal fractions are arranged according to their arithmetical value (and sorted in ascending order). | | N | Need to investigate Solr. |
| 6.4 Roman Numbers | Roman numbers are arranged by their arithmetical value. | | N | Cannot identify Roman numbers. |
| 7 Arrangement of Symbols Other than Numerals and Letters | | | | |
| 7.1 Arrangement in Standardized Sequence | Symbols are arranged according to a standardized sequence, for example ASCII (ANSI X3.4, American National Standard Code for Information Interchange). | | Y | |
| 7.2 Arrangement in Order of Appearance | Not recommended as per Jira: OLE-2194 | | | |
| 7.3 Arrangement by Verbal Equivalent | Not recommended as per Jira: OLE-2194 | | | |
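
Several of the rules marked "N" above are described as fixable by modifying the _sort field. The following Python sketch shows one possible normalization covering rules 3.1, 3.2, 3.3, 3.6 and part of 3.6.1. It is an illustration under assumptions, not the OLE or Solr configuration: `sort_key` and the two pattern names are hypothetical, and the sketch assumes the key is compared by plain code-point order. Rules 3.4 (symbol ordering) and 6.x (numeric ordering) are out of scope here.

```python
import re
import unicodedata

# Rule 3.2: hyphen, dashes of any length, and slash are treated as a space.
AS_SPACE = re.compile(r"[-\u2010-\u2015/]")
# Rule 3.3: these punctuation marks are removed, not replaced by a space.
IGNORED = re.compile(r"""[.,;:()\[\]<>{}'"!?\u2018\u2019\u201c\u201d]""")


def sort_key(value: str) -> str:
    """Build a normalized key for word-by-word, case-insensitive sorting."""
    # Rule 3.6.1 (partial): decompose, then drop combining diacritics.
    decomposed = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = AS_SPACE.sub(" ", text)            # rule 3.2
    text = IGNORED.sub("", text)              # rule 3.3
    text = re.sub(r"\s+", " ", text).strip()  # rule 3.1
    return text.lower()                       # rule 3.6


titles = ["Newton's rings", "new moon", "N. E. Zenith Co.",
          "Newton, Isaac", "networks"]
for title in sorted(titles, key=sort_key):
    print(title)
```

Because a space sorts before every digit and letter in code-point order, comparing these keys yields the word-by-word order from the 4.1.2 example, and the same key puts M'Bow, Ahmadu ahead of Mr. Adams and Mrs. Smith as rule 5 expects. Letters without a Unicode decomposition, such as æ or ø, are not folded by this sketch and would need an explicit mapping.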