Automatic Keywording and Classification
Knowledge management with AI-based services: Group-wide information search improvement
Time & Materials
The international mobility group wants to increase the innovative capacity of its thousands of employees and has had an intranet solution developed with a state-of-the-art search function for supplying information. It is supplemented by numerous specialist information services, which provide trade literature in the form of articles and books. For a satisfactory information access experience, users expect search, browsing, and navigation functions like those they know from leading Internet search engines and e-commerce portals. These functions usually rely on very extensive, quality-assured metadata. However, little metadata was available to describe the trade literature documents, for example only title and author. Consequently, the quality of search results fell short of user expectations.
The new “knowledge search” on the intranet could not reach its intended potential. Despite high-quality content and the new portal technology, user acceptance was low. More metadata, and better-optimized metadata, was needed to support the desired filters and search functions. Avantgarde Labs was engaged to procure the required data and to adapt the search solution.
Metadata enrichment was accomplished by classifying the content objects according to a standardized system, the THEMA classification scheme. The core of the search solution was to be realized through an interplay of artificial intelligence and statistical analysis. To stabilize the classification results, the largest possible pre-classified data set was required. Avantgarde Labs enlisted the leading German library service provider, EKZ Bibliotheksservice GmbH, from its partner network to provide this training data set. Since both the search queries and the content objects were available in English and German, a language-independent representation of the data had to be created.
This was realized with the BERT transformer, which uses a neural network to group semantically identical terms under so-called tokens. Using Elasticsearch’s Significant Terms analysis, these tokens could then be mapped to the THEMA classes, yielding a document classification system. This allowed all content objects to be enriched with the best-fitting THEMA classes. Combined with an optimized search solution, this improved retrieval: like the content objects, the user’s search queries are first abstracted into tokens and then enriched with topic classes. This enables both intelligent, language-independent full-text search and topic faceting.
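The token-to-class mapping can be illustrated with a minimal sketch. It is not the project's implementation: it assumes documents are already reduced to tokens (plain word lists stand in for BERT tokens here), and it re-implements a JLH-style score, the default heuristic of Elasticsearch's Significant Terms aggregation, in plain Python instead of querying Elasticsearch. All function names, class labels, and training data are hypothetical; the labels are placeholders, not real THEMA codes.

```python
from collections import Counter, defaultdict


def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """JLH-style significance: absolute change times relative change."""
    fg_pct = fg_count / fg_total   # share of class documents containing the token
    bg_pct = bg_count / bg_total   # share of all documents containing the token
    if fg_pct <= bg_pct:
        return 0.0                 # token is not over-represented in the class
    return (fg_pct - bg_pct) * (fg_pct / bg_pct)


def significant_tokens(labeled_docs):
    """Build {class: {token: score}} from pre-classified documents.

    labeled_docs is a list of (tokens, class_label) pairs; counts are
    document frequencies, so repeated tokens in one document count once.
    """
    background = Counter()
    foreground = defaultdict(Counter)
    class_sizes = Counter()
    for tokens, label in labeled_docs:
        unique = set(tokens)
        background.update(unique)
        foreground[label].update(unique)
        class_sizes[label] += 1

    total_docs = len(labeled_docs)
    table = {}
    for label, counts in foreground.items():
        scores = {}
        for token, count in counts.items():
            score = jlh_score(count, class_sizes[label], background[token], total_docs)
            if score > 0:
                scores[token] = score
        table[label] = scores
    return table


def classify(tokens, table, top_k=1):
    """Rank classes by the summed significance of the tokens they share."""
    totals = {
        label: sum(scores.get(token, 0.0) for token in set(tokens))
        for label, scores in table.items()
    }
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [label for label, score in ranked[:top_k] if score > 0]


# Hypothetical pre-classified training data (placeholder labels).
TRAINING = [
    (["rail", "signal", "brake"], "transport"),
    (["rail", "track", "schedule"], "transport"),
    (["budget", "team", "strategy"], "management"),
    (["team", "leadership", "budget"], "management"),
]
TABLE = significant_tokens(TRAINING)

# The same function serves content objects and user queries alike.
print(classify(["rail", "brake", "team"], TABLE))
```

Because a query passes through the same tokenization and scoring as a content object, both end up described by the same topic classes, which is what makes class-based faceting of search results possible in this sketch.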
Avantgarde Labs supplemented the existing intranet solution with an AI-based search solution and document classification. User queries are better understood during the search process itself, and because the content objects are enriched with metadata, users have additional options for narrowing their search results. The improved supply of information and interest-centered search results strengthen users’ ability to innovate.
In addition, AI-based metadata enrichment automates an important step in the editorial process and supports thousands of employees in their daily knowledge work. This increases user productivity by eliminating resource-intensive administrative work. By the end of the project, user confidence in the intranet search had demonstrably increased: empty search results were avoided, and the results delivered matched the search queries precisely.