Jira Link
From the requirements, the decision points are:
1. The 2nd indicator in MARC is the non-filing character count, and it must be honored when applying sort/display standards. Example: in 245 1 3 $aAn April Shower, the 2nd indicator is "3", so the first 3 characters, i.e. "An(space)", are ignored when applying sort rules.
2. NISO standards, Section 3: follow the sort order of characters very closely for the Search Results display and for the Browse/More display of facets (the main results view of facets is still ordered by number of hits, high to low).
3. NISO standards, Section 4 (Headings): the current choice is "word by word" (4.1.2.1). The following are in word-by-word sort order: cream, cream cheese, cream corn.
4. NISO standards, Section 7 (Symbols): the current choice is 7.1, for ASCII.
5. We are NOT yet addressing any non-roman/Unicode characters, i.e. treatment of Chinese, Russian, etc. We will still index or sort on their "romanized" values.
But the implementation still needs to address the NISO standards for Section 5 (Abbreviations) and Section 6 (Numbers).
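As a rough illustration of decision points 1 and 3, the sketch below strips non-filing characters and builds a word-by-word sort key. It is written in Python for readability only and is not the indexing code; the function names strip_nonfiling and word_by_word_key are hypothetical, and the indicator is assumed to arrive as a one-character string.

```python
def strip_nonfiling(title: str, second_indicator: str) -> str:
    """Drop the leading non-filing characters named by the MARC 2nd indicator.

    Assumption: the indicator is a single character; anything that is not
    an ASCII digit (blank, '#', ...) is treated as 0.
    """
    is_count = second_indicator.isascii() and second_indicator.isdigit()
    skip = int(second_indicator) if is_count else 0
    return title[skip:]


def word_by_word_key(heading: str) -> list:
    """Compare headings one whole word at a time (NISO 4.1.2)."""
    return heading.lower().split()


if __name__ == "__main__":
    print(strip_nonfiling("An April Shower", "3"))  # April Shower
    headings = ["cream corn", "creamery", "cream", "cream cheese"]
    print(sorted(headings, key=word_by_word_key))
    # ['cream', 'cream cheese', 'cream corn', 'creamery']
```

Because the key is a list of words, "cream corn" sorts ahead of "creamery": the first word "cream" is compared as a complete unit before any later word is considered.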
| Section | NISO Rule/Recommendation | Meaning | Example | Implemented? (Y/N) | Comments |
|---|---|---|---|---|---|
| 3 | Order of Characters | | | | |
| 3.1 | Spaces | Multiple consecutive spaces in the data are treated as a single space. | | Y | |
| 3.2 | Punctuation Marks Treated as Spaces (-, ---, /) | The hyphen, dash (of any length), or slash is to be treated as a space. | | N | Need to replace the hyphen with a space in the _sort field. |
| 3.3 | Punctuation Marks Ignored (other than -, ---, /) | The following punctuation marks should be disregarded for arrangement purposes: period (full stop), comma, semi-colon, colon, parentheses, square brackets, angle brackets, braces (curved brackets), apostrophe, quotation marks (single or double), exclamation mark, question mark. They are not to be treated as spaces. | Ambassador hotel | N | Need to remove these characters in the _sort field. |
| 3.4 | Symbols Other Than Numerals, Letters and Punctuation Marks | Such symbols are arranged after a space but before a numeral. | ¥ £ $ exchange | N | Need to investigate Solr. |
| 3.5 | Numerals (0 through 9) | All data beginning with a numeral should be arranged ahead of any data beginning with a letter. | 007 James Bond | Y | |
| 3.6 | Letters (A through Z) | Records should be arranged in the order of the English alphabet (upper case and lower case have equal arrangement value). | Abalone | N | Convert the _sort field to lower case. |
| 3.6.1 | Modified Letters | Letters modified by diacritical marks, and ligatures of two letters, should be arranged like their nearest basic equivalent letters in the English alphabet. | á, à, â, å, ä are arranged as a | N | Need to investigate Solr. |
| 3.7 | Superscript and Subscript Characters | Superscript and subscript characters are arranged as "on-the-line" characters. | | N | Need to investigate Solr. |
| 4 | Headings | | | | |
| 4.1 | Arrangement of Headings | | | | |
| 4.1.1 | Single-Word Headings | Data consisting of a single word precedes any data beginning with the same word and followed by other words. | New | Y | |
| 4.1.2 | Multi-Word Headings (Word-by-Word) | This method is preferred because it keeps together data beginning with the same word (or words). | networks | N | Can be done by modifying the _sort field. |
| 4.2 | Headings with Qualifiers | Qualifying or explanatory terms are integral parts of a heading and should be arranged | bill (bank note) | N | Can be done by modifying the _sort field. |
| 4.3 | Headings with Identical Initial Words | Data beginning with identical initial words should be arranged in the following sequence. | New | N | Can be done by modifying the _sort field. |
| 4.4 | Headings with Cross-References | Cross-references are not part of a heading, and therefore do not affect the arrangement of a heading. | fathers see parents | N | Cannot identify cross-references. |
| 4.5 | Subheadings | Subheadings are normally arranged in alphanumeric sequence. Subheadings are subject to the same arrangement rules as the headings they modify. | memory | N | No subheadings seen in the data. |
| 4.6 | Headings Beginning with Articles | Data beginning with articles (a, an, and the) are displayed in ascending order. | A man | Y | |
| 5 | Abbreviations | Abbreviations should be alphabetized exactly as written, not as spelled out. | Order is : | N | Can be done by modifying the _sort field. |
| 6 | Numbers | | | | |
| 6.1 | Headings Containing Numbers | Numbers at the beginning of or within the data should be arranged in arithmetical order and sorted in ascending order. | 007 James Bond | N | Need to investigate Solr. |
| 6.2 | Punctuation in Numbers | Punctuation in numbers, as in other text, has no arrangement value (and sorted in ascending order). | $5000 reward | N | Need to investigate Solr. |
| 6.3 | Decimal Fractions | Decimal fractions should be arranged according to their arithmetical value (and sorted in ascending order). | 0.25 mm | N | Need to investigate Solr. |
| 6.4 | Roman Numbers | Roman numbers should be arranged by their arithmetical value. | 17 days to better living | N | Cannot identify Roman numbers. |
| 7 | Arrangement of Symbols Other than Numerals and Letters | | | | |
| 7.1 | Arrangement in Standardized Sequence | Symbols that form part of a standardized sequence, for example ASCII (ANSI X3.4, American National Standard Code for Information Interchange). | " | Y | |
| 7.2 | Arrangement in Order of Appearance | Not recommended as per Jira: OLE-2194 | | | |
| 7.3 | Arrangement by Verbal Equivalent | Not recommended as per Jira: OLE-2194 | | | |
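
Several Comments entries above describe changes to the _sort field (replace hyphens with spaces, remove ignored punctuation, lower-case, fold modified letters). A minimal sketch of such a normalization is shown below, assuming it runs on the value before it is written to the _sort field. It is plain Python, not Solr configuration, and the name sort_key is hypothetical. It covers rules 3.1, 3.2, 3.3, 3.6, 3.6.1 and 3.7 only. Symbols (3.4) and numbers (Section 6) are not handled, and letters with no Unicode decomposition (for example æ or ø) pass through unchanged.

```python
import re
import unicodedata

# 3.2: hyphen, dashes of any length, and slash are treated as a space.
_AS_SPACE = re.compile(r"[-\u2010-\u2015/]+")
# 3.3: punctuation that is ignored outright (not replaced by a space).
_IGNORED = re.compile(r"[.,;:()\[\]<>{}'\"!?\u2018\u2019\u201c\u201d]")
# 3.1: runs of whitespace collapse to a single space.
_SPACES = re.compile(r"\s+")


def sort_key(value: str) -> str:
    """Return a normalized string suitable for a _sort-style field."""
    # 3.6.1 / 3.7: NFKD splits base letters from diacritics and maps
    # superscripts and subscripts to their on-the-line characters.
    decomposed = unicodedata.normalize("NFKD", value)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    folded = _AS_SPACE.sub(" ", folded)
    folded = _IGNORED.sub("", folded)
    folded = _SPACES.sub(" ", folded).strip()
    # 3.6: upper and lower case have equal arrangement value.
    return folded.lower()


if __name__ == "__main__":
    for raw in ["Ambassador  hotel", "Self-Help / Guide", "Ángel's (Café)!"]:
        print(repr(raw), "->", repr(sort_key(raw)))
```

The order of the steps matters: hyphens and slashes are turned into spaces before whitespace is collapsed, so "Self-Help / Guide" ends up with single spaces between words rather than a run of three.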