Remedy Knowledge Management Application : How RKM search works - INCLUDES VIDEO
Knowledge Article
An explanation of how RKM search works - INCLUDES VIDEO
Remedy Knowledge Management Application
ITSM 9.x, 18.x, 19.x, 20.x
This knowledge article may contain information that does not apply to version 21.05 or later which runs in a container environment. Please refer to Article Number 000385088 for more information about troubleshooting BMC products in containers.
How Search Works in RKM
This document explains how RKM search works and how to ensure that you get relevant search results.
RKM Search Stack
Let's first take a look at the RKM Search Stack. RKM provides Global Search and Knowledge Search functionality for searching different indexed entities in RKM/ITSM/SRM, such as Tickets, Tasks, and Articles. Here is the sequence of steps which occurs when you perform a search:

1. [RKM] You enter the search text and perform the search. RKM:SearchDialog workflows are triggered, which build a search qualification and query AR's FTS engine via the "AR System Multi-Form Search" interface form.
2. [AR FTS Engine] The AR Full Text Search Engine plugin is invoked and uses the Apache Lucene search engine to perform the actual search.
3. [Apache Lucene] Lucene (https://lucene.apache.org/core/) performs the search using an index which has already been generated by indexing the various entities [Tickets, Tasks, Articles, ...]. Lucene uses its own relevance algorithm to order the search results. By default, the multiple terms provided as part of the search query are "OR"ed together [i.e. even if only one term matches, the record is still returned as part of the results].
4. [AR Post-processing] AR performs post-processing and ensures Row Level Security by eliminating records not meant to be visible to the current user.
5. [RKM] The search results are displayed ordered by the relevance score/weight returned by FTS.
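The steps above can be sketched as a toy Python pipeline. The records, scoring rule, and group check below are made-up stand-ins for illustration only, not actual AR System APIs:

```python
# Illustrative sketch of the RKM search pipeline described above.
# The data and helper functions are toy stand-ins, not AR System APIs.

RECORDS = [
    {"id": "KBA-1", "text": "outlook crash on startup", "groups": {"Support"}},
    {"id": "KBA-2", "text": "outlook slow", "groups": {"Admins"}},
]

def lucene_search(terms):
    """Toy scorer: score = number of matching terms (terms are OR'ed)."""
    hits = []
    for rec in RECORDS:
        score = sum(1 for t in terms if t in rec["text"].split())
        if score > 0:                     # any single matching term qualifies
            hits.append((rec, score))
    return hits

def rkm_search(search_text, user_groups):
    terms = search_text.lower().split()   # 1. RKM builds the qualification
    hits = lucene_search(terms)           # 2-3. FTS engine / Lucene search
    visible = [(r, s) for r, s in hits    # 4. AR row-level security filter
               if r["groups"] & user_groups]
    # 5. RKM displays results ordered by relevance score
    return [r["id"] for r, s in sorted(visible, key=lambda p: -p[1])]

print(rkm_search("outlook crash", {"Support"}))  # ['KBA-1']
```

Note how KBA-2 matches one OR'ed term but is dropped by the row-level security step, mirroring step 4 above.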
In order for search to find relevant entities in ITSM/SRM, BMC indexes Tickets, Articles, etc., when that information is either created or modified. In the sections below, the terms AR, FTS, and Lucene are used interchangeably when describing the indexing and relevance concepts. Similarly, the terms Document and Record [or Entry in AR] are used interchangeably.
Lucene Relevance Scoring
Indexing
When information like a Ticket or Article is created or modified, Lucene re-indexes it. This involves analyzing the information. To Lucene, everything is a document and a field within the document. From the AR perspective, a record or entry is a document for Lucene, and a column of the record marked for indexing is a field within the document. So a record of the HPD:Help Desk form becomes a document, and the Summary or Notes fields which are marked for "MFS Only" or "FTS and MFS" indexing become the fields Lucene indexes within that document. Note that Lucene builds its index at the field level.
While indexing a field, Lucene:
Extracts keywords and calculates the number of occurrences per field and per document [Term Frequency]
Reduces words to their "root words" [Stemming]
Can be supplied a dictionary of similar words [Synonyms]
Can be supplied an "ignore words" list [Stop Words]
Searching

Based on the search terms supplied, Lucene uses the already generated index to find similar or matching documents. It determines the "relevancy" of each document against the supplied search terms and gives each document a score. The score is determined by three key factors, described below:
1. How often do the search terms appear in the document?
The more often a term is found, the higher the score. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
Technical: Term Frequency tf = √(frequency of the term in the field)
2. How often does the search term appear across the documents in the collection?
The more often the term is found across the collection, the lower the score. Common terms like "go" or "find" contribute little to relevance, unlike uncommon terms like "MongoDB" or "Outlook".
Technical: Inverse Document Frequency idf = 1 + log(numDocs / (docFreq + 1))
3. How long is the field in which the search term appears?
The shorter the field, the higher the score. If a term appears in a shorter field like Title or Keywords, it is more likely to describe the whole document than a match in, say, a body field.
Technical: Field-length Norm norm = 1 / √numTerms
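Taken together, the three factors can be computed numerically. A minimal sketch, using the square-root term frequency and field-length norm formulas quoted above plus the classic Lucene inverse-document-frequency formula; the example counts are invented:

```python
import math

def tf(term_freq_in_field):
    # Term frequency: more mentions -> higher score (square-root damped)
    return math.sqrt(term_freq_in_field)

def idf(num_docs, doc_freq):
    # Inverse document frequency: terms common across the collection
    # score lower (classic Lucene TF-IDF formula)
    return 1 + math.log(num_docs / (doc_freq + 1))

def field_length_norm(num_terms_in_field):
    # Shorter fields (Title, Keywords) weigh more than long bodies
    return 1 / math.sqrt(num_terms_in_field)

# A rare term in a short field beats a common term in a long one:
rare_short = tf(1) * idf(1000, 5) * field_length_norm(8)       # e.g. Title
common_long = tf(5) * idf(1000, 600) * field_length_norm(400)  # e.g. Notes
print(rare_short > common_long)  # True
```

Even though the common term appears five times, the single mention of the rare term in a short field wins, which is why Title and Keywords matter so much in the tuning guidelines below.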
If multiple fields on the same document are set up for FTS indexing, the field-level scores are aggregated into an overall score for each document. The boost factors explained in the sections below also contribute to the overall score. This can be further combined with other factors, such as Term Proximity in Phrase Queries and Term Similarity in Fuzzy Queries. RKM does not use Phrase Queries by default and does not support Fuzzy Queries.
Summarizing Lucene Relevance
In layman's terms, all of the following considerations come into play when deciding the relevancy of the results [and hence the order of documents in the results]:
Documents containing all the search terms stand the best chance of appearing at the top
Matches on rare words are better than matches on common words [i.e. words most commonly found across documents]
A long document, or longer field text content, is not as good as a short one
Documents which mention the search terms many times are good
Improving Relevance of RKM Searches
Let's look at some ways to improve the relevance of RKM searches, so that the most appropriate articles appear at the top of the results.
Tuning Guidelines
RKM searches can be tuned through a two-phase process: first verify the current state, then tune the search relevance.

Phase 1: Verification

Indexing status of Knowledge Templates
Midtier -> Knowledge Management Console -> Manage Knowledge Sources -> Knowledge Template date

Relevancy fields correctly mapped
Developer Studio -> the various *_Manageable_Join forms -> Definitions View -> FullTextSearch -> Title/Environment/Keyword Relevancy Field Mapping

Relevancy field weights
Midtier -> Server Information -> FTS Tab -> Title/Env/Keyword Field Weights

Ignore words list
Midtier -> Server Information -> FTS Tab -> Ignore Words List

For articles which appear at the top but should not: check the Use and View Counts
Do the counts look reasonable, or do they have excessively high values?

Use and View Count boost
Midtier -> Application Admin -> Custom Configuration -> Knowledge Management -> Application Settings. Do you need a defensive or an aggressive boost?

Role of article-level attachments
Check how the search text is matching. Do you want the attachments to be indexed?

Phase 2: Search Tuning

Identify keywords to search and identify the articles which should be at the top
Is the Title correctly worded? Does the Keywords field contain the required search terms? Which words can be ignored during search? Edit the article and add the needed keywords.

Decide which sections/fields FTS should search and which it should not
Set the fields to index for the KCS template [decide whether all out-of-the-box fields need to be indexed for FTS, and remove unwanted fields].

Set the relevancy field weights
Midtier -> Server Information -> FTS Tab -> Title/Env/Keyword Field Weights. Typically observed settings are Title: 5, Keywords: 2. The recommended range is Title: 4-6, Keywords: 2-4.

Set the words to ignore
Midtier -> Server Information -> FTS Tab -> Ignore Words List

Decide whether the default boost for viewing and usage/linking of articles is too aggressive
Midtier -> Application Admin -> Custom Configuration -> Knowledge Management -> Application Settings. This can have the biggest impact on relevancy. If the articles are used very frequently and accumulate large Use or View counts [e.g. 500 or 1000], then make sure the Use/View boost is more defensive in nature [e.g. View Boost ~= 0.000001 and Use Boost ~= 0.00001]. Overall, the multiplying factor should not exceed a value of 3 or 4.
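The arithmetic behind the "defensive" recommendation can be illustrated with a sketch. The linear formula below is an assumed model for illustration only, not RKM's documented boost implementation:

```python
# Hypothetical illustration of why large Use/View counts need small
# ("defensive") boost values. This linear formula is an assumption
# made for illustration, not RKM's documented implementation.

def boost_multiplier(view_count, use_count,
                     view_boost=0.000001, use_boost=0.00001):
    # Assumed model: base relevance multiplied by a count-driven factor.
    return 1 + view_count * view_boost + use_count * use_boost

# With defensive boosts, even heavily used articles stay well under
# the recommended ceiling of 3-4x:
print(boost_multiplier(view_count=1000, use_count=500))   # ≈ 1.006

# An aggressive boost (e.g. 0.001) lets the same counts start to
# dominate relevance:
print(boost_multiplier(1000, 500, view_boost=0.001, use_boost=0.001))  # ≈ 2.5
```

Under this assumed model, an article viewed 1000 times with an aggressive boost would outrank a textually better match, which is exactly the symptom the verification phase checks for.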
Finally, perform a complete re-indexing:
Clean up the FTS collection folder
Remove the records from the FTPending table
Re-index from Midtier -> Server Information -> FTS Tab
Article Writing Guidelines
After performing the Search Tuning described earlier, there are a few practices you can follow while writing articles:
The Title should contain representative words describing the problem the article is trying to solve.
The section inside the article which describes the problem statement should be brief and written in the language and terminology which the consumers of the article would use.
Make sure to utilize the Keywords field, entering single-word keywords as well as synonyms which represent the article.
At the FTS/Lucene level, make sure that you utilize the dictionary or synonym facility to define similar words.
Identify stop-words which could be creating a lot of clutter in search results.
Study the "No Search Results" Report and identify either missing articles or missing keywords in the existing articles.
Visibility Groups should be in place to reduce the clutter in search results.
Note: You may want to check the out-of-the-box BMC Knowledge Management reports.
Technical Reference on Algorithm
When a search query is provided to Lucene, it finds the documents matching the query. As soon as a matching document is found, Lucene calculates the document's score against the supplied query by combining the score of each matching term. The actual formula used for calculating the relevancy score is:

score(q,d) = queryNorm(q) × coord(q,d) × ∑(t in q) [ tf(t in d) × idf(t)² × t.getBoost() × norm(t,d) ]

Here:
score(q,d) is the score of document d for query q.
queryNorm(q) is the query normalization factor for query q, which brings all queries to the same normalization level.
coord(q,d) is the query coordination factor, which gives more weight to documents containing a higher percentage of the query terms. A document containing more of the query terms is expected to be a better match for the query.
∑(t in q) is the sum of the weights of each search term t in query q for document d.
tf(t in d) is the term frequency of term t in document d. The more often a term appears in the document, the higher the weight (i.e. the more relevant the document is to the query).
idf(t)² is the inverse document frequency. If a term appears commonly across all documents in the collection/database, its weight is lower (i.e. the document is less relevant to the query).
t.getBoost() is the query-time boost of a field in the document over other fields.
norm(t,d) is the field-length norm, which is influenced by how long the field contents are. The shorter the field content, the higher the weight (i.e. the more relevant the document is to the query). This is combined with index-time boosting of a field, where the boost (or multiplication) is applied to every term in the field rather than to the field itself.
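The formula can be transcribed directly into code. A minimal sketch, assuming the classic (pre-BM25) Lucene TF-IDF formulas quoted in this article; all per-term statistics below are invented example numbers:

```python
import math

def lucene_score(query_terms, query_norm, coord):
    """score(q,d) = queryNorm(q) * coord(q,d) *
       sum over t of tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d)"""
    total = 0.0
    for t in query_terms:
        tf = math.sqrt(t["freq_in_doc"])                     # tf(t in d)
        idf = 1 + math.log(t["num_docs"] / (t["doc_freq"] + 1))
        total += tf * idf ** 2 * t["boost"] * t["norm"]      # per-term weight
    return query_norm * coord * total

# One matching term, e.g. a rare word appearing once in a 16-term
# Title field, with a Title field weight of 5:
terms = [{"freq_in_doc": 1, "num_docs": 1000, "doc_freq": 9,
          "boost": 5.0,               # query-time field boost (t.getBoost())
          "norm": 1 / math.sqrt(16)}] # field-length norm
score = lucene_score(terms, query_norm=1.0, coord=1.0)
print(round(score, 2))  # 39.27
```

Raising the field weight (boost) or shrinking the field scales the score linearly through t.getBoost() and norm(t,d), which is how the Title/Keywords weight settings described earlier influence ranking.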