Search – Find – Retrieve

All we do is talk about searching. Finding is what we should be talking about. Hence the interest in findability among content publishers on the World Wide Web: it leads people to their web sites. Authors motivated by earnings, as in a content marketing environment, adopt a writing style that influences search engine algorithms and earns them a higher ranking in the search results. Inside the company, a lot of content is written in the context of a specific business process or to support the execution of a task, and when it does not concern marketing, the author does not tweak the text for easy finding, discovery and retrieval. The information in that content, however, can have extremely high knowledge value at a time when organisations are pressed for efficiency and for innovative capacity.
The search and search experience
However, not all searches are equal. We often refer to Google as the reference search experience. You often hear “Why can’t we have a Google at the office? Then we would find the information we need.” There might even be a full-text search engine already available in that organisation. What makes us so happy with Google’s finding capability?
We list some elements:
The boundaries of the ‘collection’ – when we use Google, with which most of us are familiar, we search the internet, a vague concept from the point of view of a library or repository. The internet has no clear boundaries, and we have no idea of the content available. The amount of information is huge. As a consequence, the statistical chance of finding something interesting or relevant is high. In a corporate environment we are talking about a well-circumscribed library of content with a relatively small number of information artefacts (at least compared to the World Wide Web).
Related to the boundaries of the collection is the group of contributors: fairly limited in an organisation, but practically endless on the internet, even without counting the free services provided by volunteer groups such as Wikipedia, which summarise information on most general topics.
Search coverage – often not all information sources in an organisation are indexed or accessible through the same search engine. On the internet this is largely irrelevant, given its vague boundaries and the sheer volume of information available.
The context of the search performed – generally speaking, internet searches serve to get acquainted with or informed about a certain topic. In a business context, searches are much more targeted, often case-oriented or driven by the execution and deadline of a specific task.
This leads us to the goal of the information search – in general, when searching the internet our attitude is more along the lines of ‘I want to know something about …’. A number of internet sources are specifically geared toward these ‘what about’ questions. In contrast, the majority of requests in the corporate environment are aimed at confirming factual data, and the searcher expects one specific result. In a number of cases we are looking for confirmation or proof and have a particular document or information artefact in mind, but we have forgotten the specific wording or the document reference.
The feedback model – in Google’s case it is strongly tied to the earning model, which pushes the company to deliver. The more targeted the search result is, the more likely one is to click on it and visit the site, and Google gets paid for the associated advertising. Internal search engines are generally licensed per server. The licence includes no incentive based on the quality of the service provided.
Optimising search and discovery is a challenge that search technology providers have not really taken up. The majority of solutions are driven by an inverted file index: a list of all keywords used in the indexed content, each with a reference to its source, much like the index at the back of a book, with a search interface running against it. Basically, the search request is matched against the entries available in the index.
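To make the inverted file index concrete, here is a minimal sketch in Python. It is not how any particular product works: the tokenizer, the function names and the three sample documents are assumptions made for illustration, and real engines add stemming, ranking and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map every keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every keyword of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical mini collection, for illustration only.
documents = {
    "doc1": "Travel expense policy for sales staff",
    "doc2": "Sales report for the third quarter",
    "doc3": "Policy on remote work",
}
index = build_inverted_index(documents)
print(search(index, "sales policy"))
```

The query is only matched against keywords that literally occur in the index, which is exactly the limitation the rest of this article is concerned with.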
Although people searching for information highly appreciate precise and relevant search results, precision has to be balanced against volume. The higher the volume of information, the larger the result set, and the more precisely relevance has to be indicated within it. In smaller collections the diversity in the result set may be higher, but the drop in precision further down the result list is more visible to the searcher.
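One common way to express the drop in precision further down a result list is "precision at k": the share of the first k results that are relevant. The sketch below uses hypothetical document ids and relevance judgements; dividing by the number of results actually returned (rather than by k) is a choice made here, not a universal convention.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Share of the first k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(top_k)


# Hypothetical ranked result list and relevance judgements.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]
relevant = {"d1", "d2", "d5"}

for k in (2, 4, 6):
    print(k, precision_at_k(ranked, relevant, k))
```

In this made-up list the first two results are both relevant, while only half of the first four are, which is the kind of decline a searcher notices when scrolling.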
Where content volume is limited, introducing synonym rings (lists of terms with similar meaning) can improve recall, that is, produce a somewhat larger search result list. This gives value to searches in a multi-disciplinary environment that uses different terminology, or in a multi-lingual environment. Setting up the synonym lists requires considerable effort. In a similar effort to enhance search results, the introduction of semantic web applications that control vocabulary has certainly helped search result quality. A controlled vocabulary makes it possible to navigate the collection without requiring extensive human indexing effort. Alternatively, upstream tweaks at the intake of content, such as automatic classification, try to take over the work of the human indexer by automating it after a training period.
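A synonym ring can be sketched as a set of terms that the search treats as interchangeable. The rings, the sample documents and the function names below are hypothetical, and the whitespace tokenisation is deliberately naive; the point is only to show how expanding a query term increases recall across terminologies and languages.

```python
# Hypothetical synonym rings; each ring lists terms treated as equivalent.
SYNONYM_RINGS = [
    {"invoice", "bill", "factuur", "facture"},
    {"employee", "staff", "personnel"},
]


def expand_term(term, rings=SYNONYM_RINGS):
    """Return the term together with every member of the rings it belongs to."""
    term = term.lower()
    expanded = {term}
    for ring in rings:
        if term in ring:
            expanded |= ring
    return expanded


def search_with_synonyms(documents, term, rings=SYNONYM_RINGS):
    """Return the sorted ids of documents containing the term or a synonym."""
    variants = expand_term(term, rings)
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if variants & set(text.lower().split())
    )


# Hypothetical documents written by different teams in different languages.
ring_documents = {
    "a": "Pay the bill before Friday",
    "b": "Facture du fournisseur",
    "c": "Staff meeting notes",
}
print(search_with_synonyms(ring_documents, "invoice"))
```

A search for "invoice" now also retrieves the documents that only say "bill" or "facture", at the cost of someone having to maintain the rings.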
Clustering at the level of search result presentation helps searchers target the information they need. While the overall result list may be long, topical grouping guides them to the required information faster. Think of requesting information on “Milan”. The clusters will show whether the results cover tourist information on the Italian city and capital of Lombardy, people with Milan as a first name (with their last name as lower-level sort order) or AC Milan, the soccer club.
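The “Milan” example can be sketched as follows. Note the simplification: the sketch assumes every result already carries a topic label and only groups on it, whereas a real clustering engine would derive the groups from the content itself. All titles, labels and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical result list for the query "Milan"; topic labels are assumed
# to have been assigned already (by a classifier or by metadata).
results = [
    {"title": "Milan Novak", "topic": "person", "last_name": "Novak"},
    {"title": "Sights in Milan, capital of Lombardy", "topic": "city"},
    {"title": "AC Milan match schedule", "topic": "soccer club"},
    {"title": "Milan Horvat", "topic": "person", "last_name": "Horvat"},
    {"title": "Milan public transport", "topic": "city"},
]


def group_results(results):
    """Group results by topic; people are sorted by last name within their group."""
    clusters = defaultdict(list)
    for result in results:
        clusters[result["topic"]].append(result)
    if "person" in clusters:
        clusters["person"].sort(key=lambda r: r["last_name"])
    return dict(clusters)


clusters = group_results(results)
for topic in sorted(clusters):
    print(topic, [r["title"] for r in clusters[topic]])
```

Presenting the three groups side by side lets the searcher discard two of them at a glance instead of reading through one long mixed list.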
Characteristics of corporate context
This focus on the content index ignores the characteristics of the corporate context in which the employee is looking for information. Contrary to generic web search, we know a lot about the person launching the search. Like Google and public social platforms such as Facebook, we can work with the search and response history when tuning search results. Which items of the result list were visited? Is there a pattern in the visits? Coupling the search results to content ratings that evaluate information found on the corporate network can be used to define the searcher’s domain of interest.
Unlike on the internet, content types with associated metadata can be identified. In the corporate environment, adding metadata can be much more controlled, supported by value lists established in a specific business or process context by trained business analysts. This does not prevent additional free tags from being added to content items. Belonging to the same organisation and to a specific department by definition enriches the external qualifiers or content attributes. All people live in the same terminology cloud, defined by their practice, corporate culture and corporate speak. Although it varies slightly by unit or team, the vocabulary is much more coherent than that of the widely scattered internet population.
Completely unlike on the internet, organisational and functional data is available on the user launching the search. We know in which department, team and project the person works and what the main focus of their activities is. For each organisational unit it is possible to indicate the semantic field of activity, linking the corporate directory to the semantic map declared and maintained in the corporate triple store.
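The link between the corporate directory and the semantic map can be pictured as a handful of triples. The sketch below keeps them as plain Python tuples; a real triple store would normally be queried with SPARQL. The user names, unit names and the predicates `memberOf` and `hasTopic` are invented for illustration.

```python
# Hypothetical (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "memberOf", "unit:procurement"),
    ("bob", "memberOf", "unit:legal"),
    ("unit:procurement", "hasTopic", "supplier contracts"),
    ("unit:procurement", "hasTopic", "tendering"),
    ("unit:legal", "hasTopic", "compliance"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def topics_for_user(triples, user):
    """Follow user -> organisational unit -> semantic field of activity."""
    topics = []
    for unit in objects(triples, user, "memberOf"):
        topics.extend(objects(triples, unit, "hasTopic"))
    return topics


print(topics_for_user(TRIPLES, "alice"))
```

The topics returned for a user are the kind of signal a ranking function could use to favour results from that user's field of activity.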
These elements can be built into the relevance ranking algorithm and combined with the capacities of big data and AI, with probabilistic reasoning, and with learning from previous search and retrieval behaviour and from occasional feedback on content. Search results can then be targeted much better, reducing searcher frustration even with a smaller library and the fundamentally different search motivation in corporate contexts. Combined with already existing technological solutions, this can lead to a superior search and retrieval experience.
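As a rough illustration of building user context into relevance ranking, the sketch below adds fixed boosts to a text-match score when a result comes from the searcher's own department or was visited before. This is a deliberately simple additive model, not the probabilistic or learned ranking the paragraph has in mind; the field names, weights and sample data are all assumptions.

```python
# Hypothetical boost weights; in practice these would be tuned or learned.
DEFAULT_WEIGHTS = {"department": 0.3, "history": 0.2}


def rerank(results, user, weights=None):
    """Sort results by text score plus boosts derived from the user's context."""
    weights = weights or DEFAULT_WEIGHTS

    def score(result):
        total = result["text_score"]
        if result["department"] == user["department"]:
            total += weights["department"]
        if result["id"] in user["visited"]:
            total += weights["history"]
        return total

    return sorted(results, key=score, reverse=True)


# Hypothetical results as returned by a keyword index, best text match first.
hits = [
    {"id": "r1", "text_score": 0.80, "department": "legal"},
    {"id": "r2", "text_score": 0.70, "department": "finance"},
    {"id": "r3", "text_score": 0.60, "department": "finance"},
]
finance_user = {"department": "finance", "visited": {"r3"}}

print([r["id"] for r in rerank(hits, finance_user)])
```

For the finance user the weakest text match moves to the top, because it is both from the user's own department and a document the user opened earlier.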
Thesauri and ontologies can standardise language while also adding flexibility to searches. Relationships built into the model can take variations into account, whether they arise on the author’s side or on the side of the searcher, who may use sub-vocabularies created in sub-domains or teams. Some discipline-oriented thesauri are available, but customising them to the in-house dialects is time-consuming and costly; the same goes for building specific ones when no such material is available. Feedback on the use of search queries and on the interaction with the result lists proposed by the search engine, combined with big data and AI, may kick-start this process.
Example implementations, not necessarily covering all elements of the model, are:
Open Semantic Search (www.opensemanticsearch.org)
Nuance in the medical sector (https://www.nuance.com/healthcare.html)
