Search – Find – Retrieve

All we do is talk about searching. Finding is what we should be talking about. Hence the interest in findability among content publishers on the World Wide Web, as it leads people to their web sites. Authors motivated by earnings, as in a content marketing environment, adopt a specific writing style that influences search engine algorithms and gives them a higher ranking in the search results. Inside the company, much content is written in the context of a specific business process or to support the execution of a task, and when it does not concern marketing, the author does not tweak the text for easy finding, discovery and retrieval. The information in that content, however, can have extremely high knowledge value at a time when efficiency and the innovative capacity of organisations are stressed.
The search and search experience
However, not all searches are equal. We often refer to Google as the reference search experience. You often hear “Why can’t we have a Google at the office? Then we would find the information we need”. There might even be a full-text engine already available in that organisation. What makes us so happy with Google’s finding capability?
We list some elements:
The boundaries of the ‘collection’ – in the case of Google, with which most of us are familiar, we search the internet, a vague concept when looked at from the point of view of a library or repository. The internet as such has vague boundaries. We have no idea of the content available. The amount of information is huge. As a consequence, the statistical chance of finding something interesting and/or relevant is high. In a corporate environment we are talking of a well circumscribed library of content with a relatively small number of information artefacts (at least compared to the World Wide Web).
Related to the boundaries of the collection is the group of contributors, which is fairly limited in a company whilst practically endless on the internet, even without counting the free services provided by volunteer groups, as in the case of Wikipedia, who summarise information about most general topics.
Search coverage – Often not all information sources in an organisation are indexed or accessible through the same search engine. This is largely irrelevant for information on the internet, given its vague boundaries and sheer volume.
The context of the search performed – generally speaking, internet searches serve to get acquainted with or informed about a certain topic. In a business context, searches are much more targeted, often case oriented or driven by the execution and deadline of a specific task.
This leads us to the goal of the information search – in general, when searching the internet our attitude is more of the nature of ‘I want to know something about …’. A number of internet sources are specifically geared toward these ‘what about’ questions. In contrast, the majority of requests in the corporate environment are very targeted, aimed at confirming factual data. The searcher expects to find one specific result. In a number of cases we are looking for confirmation or proof and have the document or information artefact in mind, but have forgotten the specific wording or document reference.
The feedback model, which in the case of Google is strongly related to the earning model, pushes them to deliver. The more targeted the search result is, the more likely one will click on it and visit the site, and Google gets paid for the associated advertising. Internal search engines are generally licensed per server. The licence includes no incentive based on the quality of the service provided.
Optimising search and discovery is a challenge that is not really taken up by search technology providers. The majority of solutions are driven by an inverted file index, which lists all keywords used in the indexed content with a reference to the source, like the index you find in the back of a book, with a search interface running against it. Basically, the search request is matched against the entries available in the index.
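The idea of an inverted file index can be illustrated with a minimal Python sketch. It is not how any particular product works; the function names, the sample documents and the simple "all terms must match" rule are assumptions made for illustration only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical mini collection, invented for this example.
documents = {
    "doc1": "Travel policy for business trips",
    "doc2": "Expense report template for business travel",
    "doc3": "Holiday request procedure",
}
index = build_index(documents)
print(sorted(search(index, "business travel")))  # ['doc1', 'doc2']
```

The query is only matched literally against the index entries, which is exactly why a forgotten wording or a different vocabulary leads to an empty result list.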
Although precision and high relevance of the search results are highly appreciated by people searching for information, they have to be balanced against volume. The higher the volume of information, the larger the result set, and the more precisely relevance must be indicated within it. In smaller sets the diversity in the result set may be higher, but the decrease in precision while going through the result list is more visible to the searcher.
In a context of limited content volume, introducing synonym rings (lists of terms with a similar meaning) may improve recall and produce a somewhat longer search result list. This gives value to searches in a multi-disciplinary environment that uses different terminology, or in a multi-lingual environment. Setting up the synonym lists requires considerable effort. In a similar effort to enhance search results, the introduction of semantic web applications that control vocabulary has certainly helped search result quality. A controlled vocabulary makes it possible to navigate the collection without requiring extensive human indexing effort. Alternatively, upstream tweaks at the intake of content, such as automatic classification, try to take over the work of the human indexer by automating it after a training period.
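A synonym ring can be sketched in a few lines of Python. The rings, the sample documents and the function names below are hypothetical; a real deployment would maintain the rings per domain and language and would plug them into the search engine rather than scan documents one by one.

```python
# Hypothetical synonym rings: every term in a ring is treated as equivalent.
SYNONYM_RINGS = [
    {"invoice", "bill", "factuur"},
    {"employee", "staff", "personnel"},
]


def expand_term(term, rings=SYNONYM_RINGS):
    """Return the term together with all terms sharing a ring with it."""
    term = term.lower()
    expanded = {term}
    for ring in rings:
        if term in ring:
            expanded |= ring
    return expanded


def search_with_synonyms(documents, query, rings=SYNONYM_RINGS):
    """Return ids of documents matching each query term or one of its synonyms."""
    query_terms = query.lower().split()
    if not query_terms:
        return set()
    hits = set()
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        if all(words & expand_term(term, rings) for term in query_terms):
            hits.add(doc_id)
    return hits


# Invented sample documents using different words for the same concept.
documents = {
    "a": "Bill for consulting services",
    "b": "Factuur maart 2019",
    "c": "Staff handbook",
}
print(sorted(search_with_synonyms(documents, "invoice")))  # ['a', 'b']
```

None of the sample documents contains the word "invoice", yet two of them are found: the ring raises recall across vocabularies and languages, at the cost of maintaining the lists.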
The use of clustering when presenting search results helps searchers target the information they need. While the overall result list may be long, topical grouping guides him or her to the required information faster. Think of requesting information on “Milan”. The clusters will show whether the results cover tourist information on the Italian city and capital of Lombardy, people having Milan as a first name (with their last name as a lower-level sort order) or AC Milan, the soccer club.
Characteristics of corporate context
This focus on the content index for searching ignores the characteristics of the corporate context in which the employee is trying to find information. Contrary to the generic web search, we know a lot about the person launching the search. Like Google and other public social platforms, such as Facebook, we can work with the search and response history when tuning search results. Which items of the result list were visited? Is there a pattern in the visits? Coupling the search results to content ratings that evaluate information found on corporate networks can be used to define the domain of interest.
Unlike on the internet, content types with associated meta data can be identified. In the corporate environment, adding meta data can be much more controlled, supported by value lists established in a specific business or process context by trained business analysts. This does not prevent additional free tags from being added to content items. Belonging to the same organisation and a specific department will by definition enhance external qualifiers or content attributes. All people live in the same terminology cloud defined by their practice, corporate culture and corporate speak. Although it varies slightly by unit or team, vocabulary coherence is much higher than among the widely scattered internet population.
Quite unlike on the internet, organisational and functional data is available on the user launching the search. We know in which department, team and project the person works and what the main focus of activities is. For each of the organisational units it is possible to indicate the semantic field of activity, linking the corporate directory to the semantic map declared and maintained in the corporate triple store.
Building these elements into the relevance ranking algorithm, combined with the capacities of big data and AI, probabilistic reasoning, and learning from previous search and retrieval behaviour and occasional feedback on content, allows search results to be targeted much better. This reduces searcher frustration, even with a smaller library and the fundamentally different search motivation in corporate contexts. Combined with already existing technological solutions, this can lead to a superior search and retrieval experience.
Thesauri and ontologies can both standardise language and add flexibility to searches. Relationships built into the model can take variations into account as they are created on the author side or on the side of the searcher, who may use sub-vocabularies created in sub-domains or teams. Some discipline-oriented thesauri are available, but customising them to the in-house dialects is time consuming and costly, and the same goes for building specific ones when no such material is available. Feedback on the use of search queries and the interaction with the result lists proposed by the search engine, combined with big data and AI, may kick-start this process.
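How such signals might be folded into a ranking can be sketched as a simple weighted score in Python. Everything here is an assumption for illustration: the field names, the three signals chosen (full-text score, overlap with the semantic field of the user's unit, earlier visits) and the weights, which in practice would have to be learned from search behaviour rather than fixed by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    doc_id: str
    text_score: float                          # full-text engine score, assumed in 0..1
    topics: set = field(default_factory=set)   # topics assigned to the document


@dataclass
class UserContext:
    department_topics: set                     # semantic field of the user's unit
    clicked_docs: dict = field(default_factory=dict)  # doc_id -> earlier visits


def rerank(hits, user, w_text=0.6, w_topic=0.3, w_history=0.1):
    """Order hits by a weighted mix of text score, topic overlap and history.

    The weights are hypothetical defaults, not tuned values.
    """
    def score(hit):
        if hit.topics:
            overlap = len(hit.topics & user.department_topics) / len(hit.topics)
        else:
            overlap = 0.0
        visited = 1.0 if user.clicked_docs.get(hit.doc_id, 0) > 0 else 0.0
        return w_text * hit.text_score + w_topic * overlap + w_history * visited

    return sorted(hits, key=score, reverse=True)


# Invented example: the same "Milan" query launched by someone in marketing.
hits = [
    Hit("milan-tourism", 0.9, {"travel"}),
    Hit("ac-milan-sponsoring", 0.7, {"marketing", "sport"}),
]
user = UserContext({"marketing"}, {"ac-milan-sponsoring": 2})
print([hit.doc_id for hit in rerank(hits, user)])
```

With these example values the sponsoring document overtakes the tourism page, although its pure text score is lower, because it matches the user's field of activity and was visited before.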
Example implementations, not necessarily covering all elements of the model, are:
Open Semantic Search (www.opensemanticsearch.org)
Nuance in the medical sector (https://www.nuance.com/healthcare.html)

Knowledge Management in times of Google and the internet

“Just Google it”. An often heard statement, certainly among the younger and more tech-savvy parts of the work force. The most popular of the public and freely accessible search engines is most certainly a portal to huge amounts of information and knowledge shared by any willing person on the internet. With all this information available, should we still organise the collecting, curating and creating of knowledge within the confines of individual organisations?
From a pragmatic and economic point of view one would say no. Why should an organisation bother to document and organise knowledge and information that is already available? This argument may hold for generally applicable or domain-independent knowledge. You may find a lot of IT-related ‘how-to’ material, ranging from general information on programming to videos presenting technological solutions and how to use them. Moreover, this information evolves at a high pace. However, it was also the information that publishers distribute as printed books or online materials that led to the creation of libraries, for those who did not want every member of the organisation to buy a copy of the book they might need. Digital availability has virtualised the library, and what once was the beginning of knowledge management has been made redundant, certainly when you look at the corporate space.
Another argument not to invest in knowledge management is the fast pace of change and innovation in current society. Why would you invest in knowledge that is short lived? Shouldn’t we rather invest in the capacity and competence to learn and adapt? Most certainly, staff capable of learning new skills and techniques will be worth more and will ensure organisational agility and sustainability. The counterargument is of course that innovation can only be generated by sufficiently knowledgeable and talented people building further on an already available corpus of knowledge.
In positioning the issue a question comes to mind. Is there something like corporate knowledge? Thus, is there value in governing corporate knowledge?
From a protectionist or legal asset point of view, corporate knowledge is all that should be protected, that is part of intellectual property (IP) and that sets an organisation apart in the innovation landscape. It builds on a notion of ‘knowledge as property’, an asset or tool for building innovation, new products and services. Organisations with a strong research and development wing would certainly fall in this category of corporate knowledge use. But here too, open innovation initiatives break down barriers and put the property reflex into perspective. Would this then mean that there is a knowledge of practice (as Cook and Brown state in their 1999 article on the epistemologies of knowledge)? As they indicate, the common knowledge pool is situated on the level of an activity domain, such as medicine or mechanical engineering, or more specifically on the level of a company or a group adhering to specific working methods.
The technological asset view strives for the creation of an analytical inventory of knowledge elements. These are expressed as business rules by business analysts and IT people, or derived by learning algorithms, to be integrated in business applications that automate a maximum of processes and tasks. Knowledge is seen here as a productivity asset, increasingly standardising and commoditising knowledge work of a repetitive nature.
A human resource based vantage point sees knowledge as the possession or attribute of somebody and reduces knowledge management to recruiting for the context and task at hand. A proponent of the resource approach will look for somebody with the required training and/or experience to get a specific job done. Extending the resource view to a social level, this implies that knowledge is shared either through education and training or via professional networks of similarly trained professionals who build up a certain experience. As a consequence, organisations are not stable; they are oriented towards short-term goals and populated with itinerants renting out their skills.
The culture stance on the issue stresses the existence of organisational knowledge expressed in working methods, specific vocabulary, shared experiences and understanding, and historical cases that transmit knowledge. Reasoning in a culture-building logic requires strong group building and socialisation for all newcomers, embedding them in the group logic.
In a social group logic, discussion is central to transferring knowledge and attaching meaning and applicability to the context of use. Knowledge is contextually requested, explained and adapted to the needs of solving the problem at hand. In this way no stable corpus is established, and this logic therefore relates strongly to the human resources approach.
Tsoukas and Vladimirou conclude that “individuals understand generalisations only through connecting the latter to particular circumstances”, so that knowledge is developed by employees while doing their job. In that view stable knowledge does not exist; it is a continuous succession of learning experiences. Tuning the wheels of learning efficiency, “knowledge management then is primarily the dynamic process of turning an unreflective practice into a reflective one by elucidating the rules guiding the activities of practice.” In this way one should consider the need for guiding information and knowledge presented on a structured meta level, balanced with social sharing of knowledge through discussion initiated by a problem to be solved, related to the task at hand at a specific point in time.
In this context knowledge management will focus on
  • education and training, not necessarily limited to ad hoc, concrete learning needs
  • providing structure, context and finding aids giving access to the materials needed for tackling the task at hand
  • providing (access to) a social infrastructure that makes it possible to contact and interact with knowledgeable people or with those who have faced a similar challenge
This will give opportunities to address problem-solving tactics that differ across generations. The older generations will mainly draw from memory, from what they have learned in the past, and turn to documentation when the answer is not known. Younger people tend to start by googling and, as a next resort, launch requests in their social network.