Using information in structured text documents

Traditional knowledge management tools relied on structured data, often stored in relational databases. Using artificial intelligence (AI), natural language processing (NLP), and information retrieval (IR) technologies, it is now possible for automated systems to organize and use information in structured text documents such as web pages, PDF files, and word processing documents.

The KnowledgeBooks.com product

The KnowledgeBooks.com system is a general-purpose document repository with tools that automatically map semantic relationships in documents. These semantic mappings enhance IR by providing both a natural language query interface and enhanced semantic search capabilities.

A key part of the KnowledgeBooks.com product is the real-time, asynchronous display of information from outside the local document repository that is likely to be relevant to what the user is currently browsing there. The product uses either the public DBPedia Lookup service or, for better performance, a local installation of DBPedia Lookup. The system also uses the Microsoft Bing search APIs, so if you install the KnowledgeBooks.com product on your own servers you will need a Microsoft Marketplace account and must sign up for the Bing APIs.

The KnowledgeBooks.com system is currently in internal development, and a public beta release is planned for the first quarter of 2015.

How does it work?

We are glad you asked! Input documents are processed to form an in-memory graph containing semantic links across all documents in a repository. Entities (or "things" like people, companies, organizations, and geographic objects) are identified in the text, and semantic relationships between them are calculated. These relationships help the system automatically tie together different sections within a document (and sections in different documents) and help users discover all available information for their current work tasks.
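The general idea of linking document sections through shared entities can be sketched in a few lines of Python. This is an illustrative toy, not the KnowledgeBooks.com implementation (which is written in Clojure and Java): the KNOWN_ENTITIES table, the substring matching, and the function names are all assumptions made for the example, and a real system would use proper entity recognition.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical gazetteer mapping entity names to entity types.
# A real system would use a named-entity recognizer instead.
KNOWN_ENTITIES = {
    "IBM": "company",
    "Microsoft": "company",
    "Bill Gates": "person",
    "Seattle": "place",
}


def find_entities(text):
    """Return the set of known entity names that appear in the text."""
    return {name for name in KNOWN_ENTITIES if name in text}


def build_graph(sections):
    """Build an in-memory graph from {(doc_id, section_id): text}.

    Returns (entity_index, links):
      entity_index maps entity name -> set of section keys mentioning it;
      links maps a pair of section keys -> set of entities they share.
    """
    entity_index = defaultdict(set)
    for key, text in sections.items():
        for entity in find_entities(text):
            entity_index[entity].add(key)

    links = defaultdict(set)
    for entity, keys in entity_index.items():
        for a, b in combinations(sorted(keys), 2):
            links[(a, b)].add(entity)
    return entity_index, links


def related_sections(links, key):
    """Return {other_section_key: shared_entities} for one section."""
    related = {}
    for (a, b), shared in links.items():
        if key == a:
            related[b] = shared
        elif key == b:
            related[a] = shared
    return related
```

Given one section that mentions both "Bill Gates" and "Microsoft" and a section in another document that mentions "Microsoft" and "Seattle", related_sections links the two through the shared entity "Microsoft". This is the kind of cross-document connection the paragraph above describes.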

How well does it work?

In all honesty, NLP and IR are still evolving technologies. The product goal for KnowledgeBooks.com is to make it easier for users to do their work through some automation, but the human user remains the key resource. The KnowledgeBooks.com system helps, but the end user still "does the work."

Technology stack

The server-side code is 80% Clojure and 20% Java. The "fat client" web browser UI is written in ClojureScript.

Please try my NLP demos

kbsportal.com hosts my experiments using natural language processing. The KBSportal software is available for purchase.

wizard.knowledgebooks.com is an experimental Natural Language Processing search system to answer who/where/when questions using the DBPedia linked data.