Jira Link

https://jira.kuali.org/browse/OLE-2194

From Requirements, below are decision-points:
1. The 2nd indicator in MARC gives the number of non-filing characters. These must be honored when applying sort/display standards. Ex. 245 1 3 $aAn April Shower: the 2nd indicator is "3", so ignore the first 3 characters, i.e. "An(space)", when applying sort rules.
2. NISO standards, Section 3- follow Sort order of characters very closely for Search Results display, and Browse/More display of Facets (main results view of facets is still by # of hits, hi to lo)
3. NISO standards, Section 4- Headings. Choice for current is "word by word". The following are in word-by-word sort order: cream, cream cheese, cream corn. 4.1.2.1
4. NISO standards, Section 7- Symbols. Choice for current is #7.1 for ASCII.
5. We are NOT yet addressing any non-roman/unicode characters, i.e. for treatment of Chinese, Russian etc. We will still index or sort on their "romanized" values.
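
Decision point 1 can be sketched as follows. This is a minimal illustration, assuming the title text and the 2nd indicator have already been read from the MARC record; `strip_nonfiling` is a hypothetical helper name, not an OLE API.

```python
def strip_nonfiling(title, second_indicator):
    """Drop the leading non-filing characters counted by the MARC 2nd indicator."""
    # Assumption: the indicator is one character; anything that is not a
    # digit 0-9 (for example a blank) is treated as "skip nothing".
    is_digit = len(second_indicator) == 1 and second_indicator in "0123456789"
    skip = int(second_indicator) if is_digit else 0
    return title[skip:]


# 245 1 3 $aAn April Shower: skip "An" plus the following space.
print(strip_nonfiling("An April Shower", "3"))  # April Shower
```

The stripped string is what the sort rules would then be applied to; the displayed title stays unchanged.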

But the implementation still needs to address NISO standards #5 Abbreviations and #6 Numbering.
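
For #6 Numbering, the open question in row 6.1 of the table is whether numbers sort digit by digit or as whole values. The sketch below contrasts the two, using the "apt.11a" / "apt.7a" case from that row. Both function names are hypothetical, and this is only an illustration of the difference, not the OLE implementation.

```python
import re


def digit_by_digit_key(text):
    """Plain character comparison: '1' sorts before '7', so 11 precedes 7."""
    return text.lower()


def whole_number_key(text):
    """Compare runs of ASCII digits by arithmetical value instead."""
    # re.split with a capture group alternates text, digits, text, ...
    # so odd positions are always digit runs and can be turned into ints.
    parts = re.split(r"([0-9]+)", text.lower())
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


headings = ["apt.7a", "apt.11a"]
print(sorted(headings, key=digit_by_digit_key))
print(sorted(headings, key=whole_number_key))
```

The first ordering is the accepted digit-by-digit behavior; the second is what NISO 6.1 asks for.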

	NISO Rule/Recommendation	Meaning	SME Decisions	Example	Implementation Status	Comments

	The 2nd indicator in MARC gives the number of non-filing characters. These must be honored when applying sort/display standards.	Ex. 245 1 3 $aAn April Shower: the 2nd indicator is "3", so ignore the first 3 characters, i.e. "An(space)", when applying sort rules.	Apply for MARC indexing, esp. on Titles, possibly publishers, subjects, corporate authors, prior to applying the sort orders below		Implemented in 0.8f
3	Order of Characters	The basic order of characters should be in the following sequence: (1) spaces; (2) symbols other than numerals, letters, and punctuation marks; (3) numerals (0 through 9); (4) letters (A through Z).	Follow the sort order of characters very closely for Search Results display, and Browse/More display of Facets (main results view of facets is still by # of hits, hi to lo)	$$$ and sense | 1, 2, buckle my shoe | A-1 steak sauce	Implemented
3.1	Spaces	If the data contains more than one space then it should be treated as a single space			Implemented
3.2	Punctuation Marks Treated as Spaces( -,---,/)	The hyphen, dash (of any length), or slash is to be treated as a space.			Implemented in 0.8f
3.3	Punctuation Marks Ignored (other than -,---,/)	The following punctuation marks should be disregarded for arrangement purposes: period (full stop), comma, semi-colon, colon, parentheses, square brackets, angle brackets, braces (curved brackets), apostrophe, quotation marks (single or double), exclamation mark, question mark. They are not to be treated as spaces.		Ambassador hotel ...and so to bed	Implemented in 0.8f
3.4	Symbols Other Than Numerals, Letters and Punctuation Marks	Such symbols are arranged after a space but before a numeral. Two or more contiguous symbols should be treated as a single character.		¥ £ $ exchange $$$ and sense % of gain $10 a day 20 funny stories	Implemented
3.5	Numerals (0 through 9)	All data beginning with a numeral should be arranged ahead of any data beginning with a letter.		007 James Bond James	Implemented
3.6	Letters (A through Z)	The records should be arranged in the order of English alphabet ( Upper case and lower case has equal arrangement value)		Abalone abdomen Ambassador hotel	Implemented in 0.8f
3.6.1	Modified Letters	Letters modified by diacritical marks and ligatures of two letters should be arranged like their nearest basic equivalent letters in the English alphabet	Bob: This is OK for now -- may have to be refined later, in that some European languages alphabetize letters with diacritics separately from their base letter. This is important and needs to be researched and effort estimated in sprint 0.8f, but should not hold up other items. Development will be scheduled in a future sprint. Impact on performance to be kept in mind.	á, à, â, å, ä are arranged as a; ñ is arranged as n; ø is arranged as o; æ is arranged as ae; œ is arranged as oe	Implemented in 0.8f
3.7	Superscript and Subscript Characters	Superscript and subscript characters are arranged as “on-the-line” characters. Basic characters followed by both sub- and superscript characters are arranged in the sequence: basic character - subscript - superscript.	Should be implemented. This can happen in Roman and non-Roman chars. Non-Roman chars will be taken up in future sprint. Need sample data also for research.	H2 H24 H34. When a character (H) has both subscript ₂ and superscript ⁴ characters, it should be coded as H24. Then the ordering will be as specified by NISO.	Implemented in 0.8f

4.	Headings		Choice for current is "word by word" (4.1.2.1).	The following are in word-by-word sort order: cream, cream cheese, cream corn.
4.1	Arrangement of Headings	Headings shall be arranged exactly as written, printed or otherwise displayed. The arrangement of a heading among other headings should be based solely on the sequence of numbers in arithmetical order and on the sequence of the 26 letters of the English alphabet.
4.1.1	Single-Word Headings	Data consisting of a single word precedes any data beginning with the same word and followed by other words.		New New Zealand	Implemented
4.1.2	Multi-word Headings(Word-by-Word)	This method is preferred, because it keeps together data beginning with the same word (or words).	Use 4.1.2.1 Word-by-Word application of Headings arrangement (do not apply 4.1.2.2 letter-by-letter)	networks New, Agnes New, Thomas New Zealand news agencies Newton, Isaac	Implemented in 0.8f
4.2	Headings with Qualifiers	Qualifying or explanatory terms are integral parts of a heading and should be arranged as any other words in the heading. Punctuation marks enclosing or preceding such terms are ignored.		bill (bank note) Bill Clinton; a life bill (ornithology) bill (request for payment) bill (weapon)	Implemented in 0.8f
4.3	Headings with Identical Initial Words	Data beginning with identical initial words should be arranged in the following sequence: a) Single-word headings; b) Multi-word headings, including headings with qualifiers.		New New York New (Zealand)	Implemented in 0.8f
4.4	Headings with Cross-References	Cross-references are not part of a heading, and therefore do not affect the arrangement of a heading.	No need to do anything for MARC and other formats. In case it is required for non-MARC formats, Bob will let us know.	fathers see parents Father’s Day see also Mother’s Day	Implementation not needed (as the sortable data does not have cross-references)	Difficult to identify cross references.
4.5	Subheadings	Subheadings are normally arranged in alphanumeric sequence. Subheadings are subject to the same arrangement rules as the headings they modify.	Nothing to do here as there are no subheadings for sortable fields.	memory Alzheimer’s disease and psychoses long-term loss of childhood events short-term	Implementation not needed (as the sortable data does not have sub-headings)	No subheadings seen in the data
4.6	Headings Beginning with Articles	Data beginning with articles (a, an and the) are displayed in ascending order.	See MARC indicators. Bob- if Dublin Core or other format, should we use generic rules if "A, An, The" used at beginning of heading? Ignore and start with next full word? Bob: yes, but we'll need a longer list of initial words to ignore, including the most common foreign ones (El, Le, La, Il, etc.) See chart at http://en.wikipedia.org/wiki/Article_%28grammar%29 for example. Articles should be considered for sorting. But if there is a second indicator (in case of MARC records), it should be enforced.	A man Man Man, A see A man Man, The see The man The man	Implemented in 0.8f

5	Abbreviations	Abbreviations should be alphabetized exactly as written, not as spelled out.	Ignore punctuation chars.	Order is : A B C Aarhus abacus A.B.C. abdomen Cmdr. Smith CO2 lasers M. Flip ignorait sa mort M’Bow, Ahmadu Mlle. Henriette Mme. Pompadour Monsieur Verdoux Mr. Adams Mrs. Miniver No. 10, Downing Street No and yes	Implemented in 0.8f

6	Numbers
6.1	Headings Containing Numbers	Numbers at the beginning of or within the data should be arranged in arithmetical order and sorted in ascending order. Headings beginning with numbers written in Arabic numerals should be sorted in ascending arithmetical order before headings beginning with a letter sequence.	Can the index treat numbers as whole entities, rather than digit by digit? The former is preferable -- if the latter, then "apt.11a" will come before "apt.7a". But it won't come up that often, so if it has to go digit-by-digit, we can live with that. Digit-by-digit is ok	007 James Bond 2 kinetic sculptors 2-phase flow in turbines 1984, Nineteen Eighty-four The 14th Amendment Zero-sum	Not implemented as per NISO. But the current implementation (digit-by-digit) is acceptable.	The ordering is digit by digit. (Difficult to order by value)
6.2	Punctuation in Numbers	Punctuation in numbers, as in other text, has no arrangement value (and sorted in ascending order).		$5000 reward 5,000- and 10,000-year 5000 años de historia	Implemented in 0.8f
6.3	Decimal Fractions	Decimal fractions should be arranged according to their arithmetical value (and sorted in ascending order).	Digit-by-digit is ok	0.25 mm .30 Vickers machine gun .303-inch machine guns	Not implemented as per NISO. But the current implementation (digit-by-digit) is acceptable.
6.4	Roman Numbers	Roman numbers should be arranged by their arithmetical value. To achieve this, the sequence of letters must first be tagged as a number by human intervention, and it may then be sorted as a Roman numeral, either manually or by an algorithm.	See text and also notes/jira on non-roman characters BP: may be practically impossible to identify them in library metadata -- probably not worth too much effort Word-by-word sorting is ok.	17 days to better living XX century encyclopedia 20 short stories John II	Implemented in 0.8f	Cannot identify Roman numbers.


7	Arrangement of Symbols Other than Numerals and Letters	symbols, whether single or forming a contiguous sequence, are arranged after a space but before any numerals or letters		see image- for special character handling
7.1	Arrangement in Standardized sequence	Symbols that form part of a standardized sequence. for example, ASCII (ANSI X3.4, American National Standard Code for Information Interchange)	Choice for current is #7.1 for ASCII. ASCII chars will be ordered in ASCII sequence. Ordering of Non-ASCII chars is unspecified.	# + & % $ *	Implemented
7.2	Arrangement in Order of Appearance	Not recommended as per Jira: OLE-2194	Do not use		Not in scope
7.3	Arrangement by Verbal Equivalent	Not recommended as per Jira: OLE-2194	Do not use		Not in scope
	Non-Roman (OLE 0.8)		We are NOT yet addressing any non-roman/unicode characters for full features or indexing, i.e. for treatment of Chinese, Russian etc., but will still index or sort on their "romanized" values, and need display and edits to diacritics/non-roman available- https://jira.kuali.org/browse/OLE-2934
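
Several of the section 3 and 4 rules in the table can be combined into a single sort key. The following is a sketch under stated assumptions, not the OLE indexing code: `sort_key` and the constant names are hypothetical, symbols are ordered by code point (which matches ASCII sequence for ASCII characters, per 7.1), and non-Roman letters are left in whatever order their code points give.

```python
import unicodedata

AS_SPACE = set("-/\u2013\u2014")  # 3.2: hyphen, slash and dashes count as a space
IGNORED = set(".,;:()[]<>{}'\"!?\u2018\u2019\u201c\u201d")  # 3.3: dropped entirely
LIGATURES = {"æ": "ae", "œ": "oe", "ø": "o"}  # 3.6.1: no Unicode decomposition exists


def sort_key(heading):
    """Build a comparable key ranking space < symbol < numeral < letter."""
    text = "".join(LIGATURES.get(ch, ch) for ch in heading.lower())
    # NFKD splits "á" into "a" plus a combining accent and maps "₂" to "2".
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(" " if ch in AS_SPACE else ch for ch in text if ch not in IGNORED)
    key = []
    for ch in " ".join(text.split()):  # 3.1: runs of spaces collapse to one
        if ch == " ":
            rank = (0, "")
        elif ch.isdigit():
            rank = (2, ch)
        elif ch.isalpha():
            rank = (3, ch)
        else:
            rank = (1, ch)
        # 3.4: two or more contiguous symbols count as a single character.
        if rank[0] == 1 and key and key[-1][0] == 1:
            continue
        key.append(rank)
    return tuple(key)


headings = ["Newton, Isaac", "news agencies", "New Zealand", "New, Agnes", "networks"]
print(sorted(headings, key=sort_key))
```

Because a space ranks below every other character, comparing these keys gives word-by-word order (4.1.2.1) without any extra step. Non-filing characters (MARC 2nd indicator) would need to be removed before the key is built.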
