...
This functionality allows documents to be searched for by giving keywords or phrases. Searching can be based on category, type, format, and search fields.
2.1 Quick Search
Select Doc Category : Work
...
System shows records with any field matching one or more keywords.
2.2 Advanced Search
Select Doc Category : Work
...
Search is performed based on the conditions entered by the user.
2.3 Solr-specific search rules
Solr allows us to specify how the input data is indexed and searched for.
...
Output: "To", "be,", "or", "what?"
2.3.2 Synonym Filtering
It is the process of synonym mapping. Each token is looked up in the list of synonyms, and if a match is found, the synonym is emitted in place of the token. The position values of the new tokens are set such that they all occur at the same position as the original token.
...
# Synonym mappings can be used for spelling correction too
pixima => pixma
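The mapping behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Solr's implementation: the function names `parse_synonyms` and `synonym_filter` and the `(text, position)` token representation are assumptions made for this example, and only explicit `=>` rules are handled.

```python
def parse_synonyms(text):
    """Parse explicit 'lhs => rhs1, rhs2' rules into a dict.

    Blank lines and lines starting with '#' are ignored (hypothetical helper).
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=>" not in line:
            continue
        lhs, rhs = line.split("=>", 1)
        replacements = [t.strip() for t in rhs.split(",") if t.strip()]
        for source in lhs.split(","):
            source = source.strip()
            if source:
                mapping[source] = replacements
    return mapping


def synonym_filter(tokens, mapping):
    """Replace matching tokens; every replacement keeps the original position."""
    out = []
    for text, position in tokens:
        for replacement in mapping.get(text, [text]):
            out.append((replacement, position))
    return out


RULES = """
# Synonym mappings can be used for spelling correction too
pixima => pixma
"""

mapping = parse_synonyms(RULES)
result = synonym_filter([("canon", 1), ("pixima", 2)], mapping)
print(result)
```

The misspelled token is replaced by its mapped form and keeps position 2, so phrase matching against the corrected token still lines up with the neighbouring tokens.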
2.3.3 Stop word filtering
It is the process of discarding tokens that are on the given stop words list.
...
an and are as at be but by for if in into is it no not of on or s such t that the their then there these they this to was will with
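As a rough illustration, stop word filtering amounts to dropping every token that appears in the list. The sketch below copies the list shown above; the name `stop_filter` is hypothetical, and matching here is exact and case-sensitive, which is an assumption because this section does not state whether case is ignored.

```python
# Stop words copied from the list above.
STOP_WORDS = frozenset(
    "an and are as at be but by for if in into is it no not of on or s such "
    "t that the their then there these they this to was will with".split()
)


def stop_filter(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not on the stop word list (exact match)."""
    return [token for token in tokens if token not in stop_words]


filtered = stop_filter(["to", "be", "or", "what"])
print(filtered)  # ['what']
```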
2.3.4 Word delimiter (splitting)
It is the process of splitting tokens at word delimiters. The rules for determining delimiters are as follows:
- A change in case within a word: "CamelCase" -> "Camel", "Case"
- A transition from alpha to numeric characters or vice versa: "Gonzo5000" -> "Gonzo", "5000"; "4500XL" -> "4500", "XL"
- Non-alphanumeric characters (discarded): "hot-spot" -> "hot", "spot"
- A trailing "'s" is removed: "O'Reilly's" -> "O", "Reilly"
- Any leading or trailing delimiters are discarded: "-hot-spot" -> "hot", "spot"
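The rules above can be approximated with two regular expressions. This is a minimal sketch that assumes ASCII input and treats only a lowercase-to-uppercase transition as a case change; the function name `word_delimiter` is hypothetical, and the sketch does not reproduce every option of the actual Solr filter.

```python
import re

# A trailing possessive "'s" is removed before splitting.
_POSSESSIVE = re.compile(r"'s$")
# Anything that is not an ASCII letter or digit acts as a delimiter.
_DELIMITERS = re.compile(r"[^A-Za-z0-9]+")
# Sub-words: a run of digits, an optionally capitalised lowercase run,
# or a run of uppercase letters.
_SUBWORD = re.compile(r"[0-9]+|[A-Z]*[a-z]+|[A-Z]+")


def word_delimiter(token):
    """Split one token into sub-words following the rules listed above."""
    token = _POSSESSIVE.sub("", token)
    parts = []
    for chunk in _DELIMITERS.split(token):
        if chunk:  # leading/trailing delimiters produce empty chunks
            parts.extend(_SUBWORD.findall(chunk))
    return parts


for example in ["CamelCase", "Gonzo5000", "4500XL", "hot-spot", "O'Reilly's", "-hot-spot"]:
    print(example, "->", word_delimiter(example))
```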
2.3.5 Lower case conversion
Any uppercase letters in a token are converted to the equivalent lowercase letters. All other characters are left unchanged.
2.3.6 Keyword protection
It is the process of protecting words from being modified by stemmers.
...
No such words are specified at this time.
2.3.7 Stemming
It is the process of reducing any of the forms of a word, such as "walks", "walking", and "walked", to its elemental root, e.g. "walk".
The Porter Stemming Algorithm is used. It is appropriate only for English-language text.
2.3.8 Remove duplicates
It is the process of removing duplicate tokens from the stream. Tokens are considered to be duplicates if they have the same text and position values.
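A minimal sketch of this rule, assuming each token is represented as a `(text, position)` pair (a simplification chosen for illustration; the function name `remove_duplicates` is hypothetical):

```python
def remove_duplicates(tokens):
    """Drop tokens whose (text, position) pair has already been seen."""
    seen = set()
    unique = []
    for text, position in tokens:
        key = (text, position)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


stream = [("walk", 1), ("walk", 1), ("walk", 2), ("dog", 2)]
deduped = remove_duplicates(stream)
print(deduped)  # [('walk', 1), ('walk', 2), ('dog', 2)]
```

Only the second `("walk", 1)` is dropped. The token `("walk", 2)` is kept because it has the same text at a different position, and `("dog", 2)` is kept because it has different text at the same position.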
...
Records with empty or null values will appear at the top of the search results.
3.4 Solr-specific sorting features
- Multiple consecutive spaces in the data are treated as a single space.
- All data beginning with a numeral are arranged ahead of any data beginning with a letter.
- Data consisting of a single word precedes any data beginning with the same word and followed by other words.
- Data beginning with an article (a, an, or the) are displayed in ascending order.
- Numbers at the beginning of or within the data are arranged in arithmetical order and sorted in ascending order.
- Punctuation in numbers, as in other text, has no arrangement value (the data are sorted in ascending order).
- Decimal fractions are arranged according to their arithmetical value (and sorted in ascending order).
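Some of the rules listed above can be illustrated with a Python sort key. This is a sketch under stated assumptions, not the actual Solr field configuration: the name `sort_key` is hypothetical, comparison is case-sensitive, and only whitespace collapsing, numerals-before-letters, and arithmetical ordering of whole and decimal numbers are covered. Punctuation inside numbers is not handled.

```python
import re
from decimal import Decimal

# A chunk is either a number (with an optional decimal fraction) or a run of non-digits.
_CHUNK = re.compile(r"[0-9]+(?:\.[0-9]+)?|[^0-9]+")


def sort_key(value):
    """Build a key so that numbers compare arithmetically and precede letters."""
    normalized = " ".join(value.split())  # collapse runs of whitespace
    key = []
    for chunk in _CHUNK.findall(normalized):
        if "0" <= chunk[0] <= "9":
            key.append((0, Decimal(chunk), ""))  # numeric chunks sort first
        else:
            key.append((1, Decimal(0), chunk))   # text chunks sort after numbers
    return key


titles = ["item 10", "data base", "10 items", "item  2", "2.5 mm", "data", "9 items", "2.25 mm"]
ordered = sorted(titles, key=sort_key)
print(ordered)
```

With this key, "9 items" precedes "10 items", "2.25 mm" precedes "2.5 mm", "data" precedes "data base", and "item  2" (two spaces) sorts as if it contained a single space.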
...