Info: https://jira.kuali.org/browse/OLE-1144 (OLE Search Executive; see also linked tasks and sub-tasks)
...
1.1.1 Searchable fields for Bibliographic
Field Name | Work-Bib-MARC | Work-Bib-DublinQ | Work-Bib-DublinUnQ | |||
---|---|---|---|---|---|---|
Title | Yes | Yes | Yes | |||
Author | Yes | Yes | Yes | |||
Subject | Yes | Yes | Yes | |||
Description | Yes | Yes | Yes | |||
Date of Publication | Yes | Yes | Yes | |||
Format | Yes | Yes | Yes | |||
Language | Yes | Yes | Yes | |||
Publisher | Yes | Yes | Yes | |||
ISSN/ISBN/other (last for dc identifier) | Yes | Yes | Yes | |||
Genre (marc genre/dc type) | Yes | Yes | Yes | |||
Edition | Yes | No | No | |||
Bib Identifier | Yes | |||||
Holdings Identifier | Yes | |||||
Local Id | Yes | |||||
Doc Category | Yes | |||||
Doc Format | Yes | |||||
Doc Type | Yes | |||||
Status | Yes | |||||
Status Updated On | Yes | |||||
Staff Only Flag | Yes | |||||
Created By | Yes | |||||
Updated By | Yes | |||||
Date Entered | Yes | |||||
Date Updated | Yes |
1.1.2 Searchable fields for Instance
Field Name | Work-Instance-OLEML | Work-Holdings-OLEML | Work-Item-OLEML |
---|---|---|---|
Barcode | No | No | Yes |
Location | No | No | Yes |
Source | Yes | No | No |
Record Type | No | Yes | No |
Encoding Level | No | Yes | No |
Receipt Status | No | Yes | No |
Acquisition Method | No | Yes | No |
Policy Type | No | Yes | No |
Copies Reported | No | Yes | No |
Item Type | No | No | Yes |
Location Status | No | No | Yes |
Shelving Scheme | No | No | Yes |
Shelving Order | No | No | Yes |
Address | No | No | Yes |
Copy Number | No | No | Yes |
Volume Number | No | No | Yes |
1.1.
...
4 Searchable fields for License
Field Name | Work-License-ONIXPL | Work-License-PDF |
---|---|---|
Contract Number | Yes | No |
Licensee | Yes | No |
Licensor | Yes | No |
Status | Yes | No |
Method | Yes | No |
Type | Yes | No |
Name | No | Yes |
File Name | No | Yes |
Date Uploaded | No | Yes |
Owner | No | Yes |
Notes | No | Yes |
1.2 Facet fields for all document categories, types and formats
Facet Field | Work-Bib-MARC | Work-Bib-DublinQ | Work-Bib-DublinUnQ | Work-Instance-OLEML | Work-Holdings-OLEML | Work-Item-OLEML | Work-License-ONIXPL | Work-License-PDF |
---|---|---|---|---|---|---|---|---|
Subject | Yes | Yes | Yes | No | No | No | No | No |
Author | Yes | Yes | Yes | No | No | No | No | No |
Format | Yes | Yes | Yes | No | No | No | No | No |
Language | Yes | Yes | Yes | No | No | No | No | No |
Publication Date | Yes | Yes | Yes | No | No | No | No | No |
Genre | Yes | Yes | Yes | No | No | No | No | No |
1.3 Field definitions for Work-Bib-MARC documents
Field | Data fields for search (MV- indicates multi-valued) | Data fields for short display | Data fields for detailed display | Data fields for Facet |
---|---|---|---|---|
ISSN | 022 - a,z (MV) | first value | all values | same as search field |
ISBN | 020 - a,z (MV) | first value | all values | same as search field |
Author/Creator | For each 100, 110: every subfield except $6 (gives us 2 values for every tag). Also every subfield except $t and $6 for: 111, 700, 710, 711, 800, 810, 811, 400, 410, 411 | first non-empty value | All non-empty | All non-empty |
Title | 245 - all subfields except $c and $6. Also 130, 240, 246, 247, 440, 490, 730, 740, 773, 774, 780, 785, 830, 840 (MV) | 245$a and 245$b | all values | |
Place of Publication | 260 - a (MV) | first value | all values | same as search field |
Description | 505 - a (MV) | first value | all values | same as search field |
Subject | 600, 610, 611, 630, 650, 651, 653, 69X: every subfield except $6, $2, $=, $? across these tags | first non-empty value of 600, 610, 611, 650, 651, 653, 69X | All non-empty indexed values | All non-empty indexed values |
Date of Publication | <marc:controlfield tag="008">[Date 1 in the 7-10 positions]</marc:controlfield> LR: Can also include 260 $c (260 $c holds the same value as the control field; use it if the control field does not have a publication date value) (MV) | first value | all values | same as search field |
Edition | 250 - a,b (MV) | first value | all values | same as search field |
Form/Genre | 655 - a, v (MV) | first value | all values | same as search field |
Language | <marc:controlfield tag="008">[language code in the 35-37 positions]</marc:controlfield> LR: Add 546 $a (MV) | all values | all values | same as search field |
Format | 856 - q | first value | all values | same as search field |
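The fixed-position rules for Date of Publication and Language in the table above can be sketched as plain string slicing on the 008 control field. This is only an illustration: the function names and the sample 008 value are hypothetical, and the fallback to 260 $c follows the LR note rather than any confirmed indexer behavior.

```python
def publication_date(field_008, field_260c=None):
    """Return Date 1 (positions 7-10 of the 008 field).

    Falls back to the 260 $c value when the control field carries
    no date, as suggested in the LR note (assumption).
    """
    date1 = field_008[7:11].strip() if len(field_008) >= 11 else ""
    return date1 if date1 else field_260c


def language_code(field_008):
    """Return the language code held in positions 35-37 of the 008 field."""
    code = field_008[35:38].strip() if len(field_008) >= 38 else ""
    return code or None


# Hypothetical 40-character 008 value, built piecewise so that the
# positions are easy to verify.
sample_008 = "850101" + "s" + "1985" + "    " + "nyu" + " " * 17 + "eng" + " " + "d"

print(publication_date(sample_008))           # Date 1 from positions 7-10
print(language_code(sample_008))              # language code from positions 35-37
print(publication_date(" " * 40, "c1990."))   # falls back to the 260 $c value
```

Positions are zero-based, so "7-10" corresponds to the slice `[7:11]` and "35-37" to `[35:38]`.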
...
1.4 Format field definitions for Work-Bib-MARC documents
Label | Marc Fields | Comments |
---|---|---|
Manuscript | Has any holdings with "manuscripts" in location_name (gets only this value) | LR: MARC XML does not have location_name so this is irrelevant to the IU data that OLE has for November. Manuscript could be determined by the Leader 06/07. 06 values a, f, t equal manuscripts on their own. 07 values c and d seem to imply manuscript/archival collections/series. We should check with the SMEs on this one. |
Microformat | Has 245 $h containing "micro" OR has any holdings with "micro" in location_name OR call_number starts "micro" (gets only this value) | LR: the 245 $h "micro" will work for the IU OLE MARCXML we have, but the remaining text is specific to UPenn. |
Archive | Has any holdings with "archive" in location_name (gets only this value) | LR: This is specific to UPenn. We may need to talk to IU about if they include Archive descriptions in their MARC records and how they designate them as such. |
Thesis/Dissertation | bib_format is 'tm' AND has a 502 field | LR: UPenn's bib_format seems to be a combination of the data values found in the 06/07 Leader fields. For example, t in the 06 is Manuscript and m in the 07 is Monograph/Item and together they equal a Thesis/Dissertation. |
Conference/Event | Has a 111 or 711 field [LR: Include 611 or 811] | |
Book | bib_format is 'aa', 'am' or 'ac' or 'tm'; exclude $h [micro*] and $k [kit] | LR: the 2 characters are from the Leader 06/07; $h and $k are 245 subfields |
Sound recording | bib_format is 'im' or 'jm' or 'jc' or 'jd' or 'js' | LR: the 2 characters are from the Leader 06/07 |
Musical score | bib_format is cm, dm, ca, cb, cd or cs | LR: the 2 characters are from the Leader 06/07 |
Map/Atlas | bib_format is 'e*' or 'fm' | LR: the 2 characters are from the Leader 06/07 |
Video | bib_format is 'gm' AND 007/0 = v | LR: the 2 characters are from the Leader 06/07 |
Projected graphic | bib_format is 'gm' AND 007/0 = g | LR: 007 is a controlled field that indicates the format/physical description at general level and then associated subfields are more specific. |
Journal/Periodical | bib_format is 'as' or 'gs' | LR: 007 is a controlled field that indicates the format/physical description at general level and then associated subfields are more specific. |
Image | bib_format is 'km' | LR: the 2 characters are from the Leader 06/07 |
Datafile | bib_format is 'mm' | LR: the 2 characters are from the Leader 06/07 |
Newspaper | bib_format is 'as' AND (008/21 = 'n' OR 008/22 = 'e') | LR: the 2 characters are from the Leader 06/07. The 008 controlled field in those 2 positions provides the "form". |
3D object | bib_format is 'r*' | LR: the single character maps to the 06 position in the leader. |
Database/Website | bib_format is '*i' | LR: the single character maps to the 06 position in the leader. |
Government document | bib_format is NOT c*, d*, i*, j* AND ( (008/28 = f, i, o and 260$b not 'press') ) | LR: the single character maps to the 06 position in the leader. 008 is a fixed length controlled field and 260 $b is a type of publication. |
Other | any bib_format not caught above | LR: Presumably relates to other 06/07 Leader data values not represented. |
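A minimal sketch of how a few of the rules above might be evaluated, assuming (per the LR comments) that bib_format is the two characters at Leader positions 06 and 07. It covers only rules that need nothing beyond the leader and the presence of a 502 field; the rules that depend on holdings, 245 subfields, 007 or 008 are left out, and the evaluation order is an assumption. The function names are hypothetical.

```python
def bib_format(leader):
    """Two-character code taken from Leader positions 06 and 07."""
    return leader[6:8]


def format_label(leader, has_502=False):
    """Return a format label for the subset of rules sketched here.

    Returns None when no rule in this subset matches; such records
    would still need the remaining rules from the table.
    """
    fmt = bib_format(leader)
    if fmt == "tm" and has_502:
        return "Thesis/Dissertation"
    # The 245 $h [micro*] and $k [kit] exclusions are not checked here.
    if fmt in {"aa", "am", "ac", "tm"}:
        return "Book"
    if fmt in {"im", "jm", "jc", "jd", "js"}:
        return "Sound recording"
    if fmt in {"cm", "dm", "ca", "cb", "cd", "cs"}:
        return "Musical score"
    if fmt.startswith("e") or fmt == "fm":
        return "Map/Atlas"
    if fmt == "km":
        return "Image"
    if fmt == "mm":
        return "Datafile"
    if fmt.startswith("r"):
        return "3D object"
    return None


# Hypothetical leader values; only positions 06/07 matter for this sketch.
print(format_label("00000nam a2200000 a 4500"))
print(format_label("00000ntm a2200000 a 4500", has_502=True))
```

Checking the thesis rule before the book rule matters because 'tm' appears in both; whether the indexer resolves that overlap the same way is not stated in the table.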
...
Field | DC-UnQ fields for Search | DC-Q fields for Search | Data fields for short display | Data fields for detailed display | Data fields for Facet |
---|---|---|---|---|---|
Author | <dc:creator> | <dcvalue element="contributor" qualifier="author"> | first value | All non-empty | All non-empty |
Description | <dc:description> (MV) | Per Bob P.: Do not show Abstract description. | first value | all values | same as search field |
Language | <dc:language> (MV) | <dcvalue element="language" qualifier="iso">en_US</dcvalue> | first value | all values | same as search field |
Subject | <dc:subject> (R) | <dcvalue element="subject" qualifier="none"> | first value | all values | same as search field |
Title | <dc:title> | <dcvalue element="title" qualifier="none"> | first value | all values | same as search field |
Type | <dc:type> (MV) | <dcvalue element="type" qualifier="none"> | first value | all values | same as search field |
Date of Publication | <dc:date> | <dcvalue element="date" qualifier="issued"> | first value | all values | same as search field |
Format | <dc:format> (MV) | <dcvalue element="type"> (this is covered in a separate field, so do not include it in Format) | first value | all values | same as search field |
Publisher | <dc:publisher> (MV) | <dcvalue element="publisher"> | first value | all values | same as search field |
ISBN/ISSN/other | <dc:identifier>(ISSN)0198-9669</dc:identifier> (MV) | <dcvalue element="identifier" qualifier="isbn">0-918006-48-1</dcvalue> | first value | all values | same as search field |
...
Output: "To", "be,", "or", "what?"
2.3.2 Synonym Filtering
Synonym filtering maps tokens to their synonyms. Each token is looked up in the list of synonyms; if a match is found, the synonym is emitted in place of the token. The position values of the new tokens are set so that they all occur at the same position as the original token.
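The behavior described here can be illustrated with a small sketch. It is not the filter used by the search engine; the token representation as (term, position) pairs, the function name and the synonym list are all hypothetical.

```python
def synonym_filter(tokens, synonyms):
    """Replace each token that has synonyms with those synonyms.

    tokens:   list of (term, position) pairs
    synonyms: dict mapping a term to a list of replacement terms
    Every emitted synonym keeps the position of the original token.
    """
    output = []
    for term, position in tokens:
        replacements = synonyms.get(term)
        if replacements:
            output.extend((synonym, position) for synonym in replacements)
        else:
            output.append((term, position))
    return output


# Hypothetical synonym list, for illustration only.
SYNONYMS = {"tv": ["television", "telly"]}

tokens = [("small", 1), ("tv", 2), ("guide", 3)]
print(synonym_filter(tokens, SYNONYMS))
```

Because both replacements for "tv" share position 2, a phrase query that expects a single term between "small" and "guide" can still match either synonym.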
...
The file named stopwords.txt specifies such words. Currently they are:
No Format |
---|
an and are as at
be but by
for
if in into is it
no not
of on or
s such
t that the their then there these they this to
was will with
|
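As a rough illustration of what the stop word list does, the sketch below drops any token found in the list above. The function name is hypothetical, and the case-insensitive comparison is an assumption, since the text does not say whether matching ignores case.

```python
# Words copied from the stopwords.txt listing above.
STOPWORDS = frozenset(
    "an and are as at be but by for if in into is it no not of on or "
    "s such t that the their then there these they this to was will with".split()
)


def remove_stopwords(tokens):
    """Return the tokens that are not stop words.

    Comparison is done on the lowercased token (assumption).
    """
    return [token for token in tokens if token.lower() not in STOPWORDS]


print(remove_stopwords(["the", "history", "of", "libraries"]))
```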
...
Field Info/Attributes
Field Attribute | Purpose | Example |
---|---|---|
Id | Unique identifier of a field with a given [category, type, format] | id="ISBN_search" |
Name | Name of the field suitable for display | name="ISBN" |
Type | Indicates the type of value of the field (informative purpose only) | type="text" |
Field Definition Example:
No Format |
---|
<field id="ISBN_search" name="ISBN" type="text">
  <mapping type="custom">
    <include>020-a;z</include>
    <exclude/>
  </mapping>
</field>
|
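To show how such a definition might be read, here is a minimal sketch that parses the example with Python's standard `xml.etree.ElementTree`. The function names are hypothetical, and the reading of `020-a;z` as "tag 020, subfields a and z" is an assumption based on the ISBN row of the Work-Bib-MARC field table (020 - a,z).

```python
import xml.etree.ElementTree as ET

FIELD_XML = """
<field id="ISBN_search" name="ISBN" type="text">
  <mapping type="custom">
    <include>020-a;z</include>
    <exclude/>
  </mapping>
</field>
"""


def parse_field(xml_text):
    """Read the field attributes and mapping info into a dict."""
    field = ET.fromstring(xml_text)
    mapping = field.find("mapping")
    return {
        "id": field.get("id"),
        "name": field.get("name"),
        "type": field.get("type"),
        "mapping_type": mapping.get("type"),
        "include": (mapping.findtext("include") or "").strip(),
        "exclude": (mapping.findtext("exclude") or "").strip(),
    }


def parse_custom_include(spec):
    """Split 'TAG-sub1;sub2' into (tag, [subfields]) (assumed syntax)."""
    tag, _, subfields = spec.partition("-")
    return tag.strip(), [s.strip() for s in subfields.split(";") if s.strip()]


definition = parse_field(FIELD_XML)
print(definition)
print(parse_custom_include(definition["include"]))
```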
...
Mapping Info/Attributes
Mapping Attribute | Purpose | Example |
---|---|---|
Type | Indicates how the mapping info is to be interpreted | type="custom" |
Include | Values to be included | <include>020-a;z</include> |
Exclude | Values to be excluded | <exclude/> |
xpath Mapping Example:
No Format |
---|
<field id="ContractNumber_search" name="Contract Number" type="text">
  <mapping type="xpath">
    <include>/publicationsLicenseExpression/licenseDetail/licenseIdentifier/IDValue/value</include>
  </mapping>
</field>
|
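The following sketch shows one way the include path of an xpath mapping could be evaluated against a license document. The sample document is hypothetical (no namespaces, invented contract number) and only mirrors the element names in the path. Because `xml.etree.ElementTree` supports a limited XPath subset and does not accept absolute paths on an element, the sketch checks the root step itself and evaluates the rest as a relative path.

```python
import xml.etree.ElementTree as ET

INCLUDE_PATH = (
    "/publicationsLicenseExpression/licenseDetail/"
    "licenseIdentifier/IDValue/value"
)

# Hypothetical sample document shaped to match the path above.
SAMPLE_LICENSE = """
<publicationsLicenseExpression>
  <licenseDetail>
    <licenseIdentifier>
      <IDValue><value>CN-0001</value></IDValue>
    </licenseIdentifier>
  </licenseDetail>
</publicationsLicenseExpression>
"""


def evaluate_include(xml_text, path):
    """Return the text of every element matched by an absolute path."""
    root = ET.fromstring(xml_text)
    steps = path.strip("/").split("/")
    if steps[0] != root.tag:
        return []  # the path does not start at this document's root
    if len(steps) == 1:
        matches = [root]
    else:
        matches = root.findall("/".join(steps[1:]))
    return [el.text.strip() for el in matches if el.text and el.text.strip()]


print(evaluate_include(SAMPLE_LICENSE, INCLUDE_PATH))
```

A full XPath engine would be needed for predicates, namespaces or axes beyond simple child steps.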
...