...

This functionality allows documents to be searched for by giving keywords or phrases. Searching can be based on category, type, format, and search fields.

2.1  Quick Search

            Select Doc Category : Work

...

   The system shows records in which any field matches one or more keywords.

2.2  Advanced Search

            Select Doc Category : Work

...

   Search is performed based on the conditions entered by the user.

2.3 Solr-specific search rules

Solr allows us to specify how the input data is indexed and searched for.

...

Output: "To", "be,", "or", "what?"

2.3.2 Synonym Filtering

It is the process of synonym mapping. Each token is looked up in the list of synonyms and, if a match is found, the synonym is emitted in place of the token. The position values of the new tokens are set so that they all occur at the same position as the original token.

...

# Synonym mappings can be used for spelling correction too

pixima => pixma
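
The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Solr's implementation: the `(text, position)` token shape, the `apply_synonyms` function, and the "tv" entry are assumptions made for the example; only the `pixima => pixma` mapping comes from the list above.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, int]  # (token text, position in the stream)

# Hypothetical mapping table. Only "pixima" is taken from the rules above;
# the "tv" entry is invented to show several synonyms sharing one position.
SYNONYMS: Dict[str, List[str]] = {
    "pixima": ["pixma"],
    "tv": ["television", "telly"],
}


def apply_synonyms(tokens: List[Token], synonyms: Dict[str, List[str]]) -> List[Token]:
    """Replace each matching token by its synonyms, keeping the original position."""
    result: List[Token] = []
    for text, position in tokens:
        replacements = synonyms.get(text)
        if replacements is None:
            # No match: the token passes through unchanged.
            result.append((text, position))
        else:
            # Match: every synonym is emitted at the original token's position.
            result.extend((synonym, position) for synonym in replacements)
    return result


print(apply_synonyms([("canon", 1), ("pixima", 2)], SYNONYMS))
```

Keeping the position unchanged is what lets phrase queries continue to line up after a token has been swapped for its synonym.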

2.3.3 Stop word filtering

It is the process of discarding tokens that are on the given stop words list.

...

an and are as at

be but by

for

if in into is it

no not

of on or

s such

t that the their then there these they this to

was will with
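
As a rough illustration of stop word filtering, the Python sketch below drops every token found in the list shown above. The function name `remove_stop_words` and the case-insensitive comparison are assumptions made for the example; the actual behaviour depends on how the filter is configured in Solr.

```python
from typing import List

# Stop words exactly as listed above.
STOP_WORDS = frozenset(
    "an and are as at be but by for if in into is it no not of on or s such "
    "t that the their then there these they this to was will with".split()
)


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Discard tokens that appear in the stop word list."""
    # Assumption: matching ignores case, so "The" is dropped like "the".
    return [token for token in tokens if token.lower() not in STOP_WORDS]


print(remove_stop_words(["The", "quick", "fox", "is", "in", "the", "box"]))
```
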
2.3.4 Word delimiter (splitting)

It is the process of splitting tokens at word delimiters. The rules for determining delimiters are as follows:

  • A change in case within a word: "CamelCase" -> "Camel", "Case"
  • A transition from alpha to numeric characters or vice versa: "Gonzo5000" -> "Gonzo", "5000"; "4500XL" -> "4500", "XL"
  • Non-alphanumeric characters (discarded): "hot-spot" -> "hot", "spot"
  • A trailing "'s" is removed: "O'Reilly's" -> "O", "Reilly"
  • Any leading or trailing delimiters are discarded: "-hot-spot" -> "hot", "spot"
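
The splitting rules above can be approximated with a regular expression. The Python sketch below is an illustration under stated assumptions, not the Solr filter itself: the function name `split_on_delimiters` is hypothetical, and only ASCII letters and digits are handled.

```python
import re
from typing import List

# One subword is either a run of capitals not followed by a lowercase letter
# (e.g. "XL"), an optionally capitalised lowercase run (e.g. "Camel"), or a
# run of digits (e.g. "5000"). Anything else acts as a discarded delimiter.
_SUBWORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_on_delimiters(token: str) -> List[str]:
    """Split a token at case changes, letter/digit transitions and punctuation."""
    # A trailing "'s" is removed before splitting.
    token = re.sub(r"'s$", "", token)
    return _SUBWORD.findall(token)


for example in ["CamelCase", "Gonzo5000", "4500XL", "hot-spot", "O'Reilly's", "-hot-spot"]:
    print(example, "->", split_on_delimiters(example))
```
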
2.3.5 Lower case conversion

Any uppercase letters in a token are converted to the equivalent lowercase letters. All other characters are left unchanged.

2.3.6 Keyword protection

It is the process of protecting words from being modified by stemmers.

...

No such words are specified at this time.

2.3.7 Stemming

It is the process of reducing any of the forms of a word, such as "walks", "walking", "walked", to its elemental root, e.g. "walk".

The Porter stemming algorithm is used. It is only appropriate for English-language text.

2.3.8 Remove duplicates

It is the process of removing duplicate tokens from the stream. Tokens are considered to be duplicates if they have the same text and position values.
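
The duplicate test can be sketched in Python as follows. The `(text, position)` token shape and the function name `remove_duplicates` are assumptions made for the example.

```python
from typing import List, Set, Tuple

Token = Tuple[str, int]  # (token text, position in the stream)


def remove_duplicates(tokens: List[Token]) -> List[Token]:
    """Keep the first occurrence of each (text, position) pair, in stream order."""
    seen: Set[Token] = set()
    result: List[Token] = []
    for token in tokens:
        if token not in seen:
            seen.add(token)
            result.append(token)
    return result


# The same text at a different position is not a duplicate and is kept.
print(remove_duplicates([("walk", 1), ("walk", 1), ("walk", 2)]))
```

Duplicates of this kind can arise, for example, when two different tokens at the same position are reduced to the same root by stemming.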

...

Records with empty or null values will appear at the top of the search results.

3.4 Solr-specific sorting features

  • If the data contains more than one consecutive space, the spaces are treated as a single space.
  • Data beginning with a numeral is arranged ahead of data beginning with a letter.
  • Data consisting of a single word precedes data beginning with the same word and followed by other words.
  • Data beginning with an article (a, an, the) is displayed in ascending order.
  • Numbers at the beginning of or within the data are arranged in arithmetical order and sorted in ascending order.
  • Punctuation in numbers, as in other text, has no arrangement value (and is sorted in ascending order).
  • Decimal fractions are arranged according to their arithmetical value (and sorted in ascending order).
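
Several of the rules above (collapsing spaces, numerals before letters, arithmetical ordering of numbers and decimal fractions, ignoring punctuation inside numbers) can be approximated with a sort key. The Python sketch below is only one possible reading of those rules, not the actual Solr configuration: the function name `sort_key` is hypothetical, case-insensitive comparison is an assumption, and the rule about articles is not modelled.

```python
import re
from typing import List, Tuple, Union

# A number is a digit run that may contain commas and one decimal fraction.
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

KeyPart = Tuple[int, Union[float, str]]


def sort_key(value: str) -> List[KeyPart]:
    """Build a key that sorts numbers by value and places them ahead of letters."""
    # Runs of whitespace count as one space. Lowercasing is an assumption.
    text = " ".join(value.split()).lower()
    key: List[KeyPart] = []
    cursor = 0
    for match in _NUMBER.finditer(text):
        if match.start() > cursor:
            key.append((1, text[cursor:match.start()]))  # text chunk
        # Commas inside a number carry no arrangement value.
        key.append((0, float(match.group().replace(",", ""))))  # numeric chunk
        cursor = match.end()
    if cursor < len(text):
        key.append((1, text[cursor:]))
    return key


data = ["item 10", "item 9", "apple  pie", "apple", "2 cats", "1,000 dogs", "1.5 eggs"]
print(sorted(data, key=sort_key))
```

Tagging numeric chunks with 0 and text chunks with 1 is what places numerals ahead of letters, and it also avoids comparing a number with a string directly.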

...