Using information in unstructured text documents

Traditional knowledge management tools relied on structured data, often stored in relational databases. Using artificial intelligence (AI), natural language processing (NLP), and information retrieval (IR) technologies, it is now possible for automated systems to organize and use information in unstructured text documents like web pages, PDF files, and word processing documents.

New March 25, 2015: Video demo of product

This is just a quick video (one take, no script) showing a few features that I am working on: Quicktime demo movie

I will upload a better demo before the end of April when the system is feature complete.

The product

The system is a general purpose document repository with tools to automatically map semantic relationships in documents. These semantic mappings enhance IR by providing both a natural language query interface and enhanced semantic search capabilities.
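To illustrate what semantic search adds over plain keyword matching, here is a minimal Python sketch. It assumes a hand-built alias table (`ALIASES`, hypothetical) that maps different surface strings to one canonical entity. The functions `entities_in`, `build_index`, and `semantic_search` are illustrative names, not the product's API. A document that mentions "Big Blue" is found by a query about "IBM" because both strings resolve to the same entity.

```python
import re
from collections import Counter, defaultdict

# Hypothetical alias table: lowercase surface string -> canonical entity id.
ALIASES = {
    "ibm": "IBM",
    "international business machines": "IBM",
    "big blue": "IBM",
    "new york city": "New_York_City",
    "nyc": "New_York_City",
}


def entities_in(text):
    """Return canonical entity ids whose aliases occur in text as whole words."""
    lowered = text.lower()
    return {entity for alias, entity in ALIASES.items()
            if re.search(r"\b" + re.escape(alias) + r"\b", lowered)}


def build_index(docs):
    """Map each entity id to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for entity in entities_in(text):
            index[entity].add(doc_id)
    return index


def semantic_search(query, index):
    """Rank documents by how many of the query's entities they mention."""
    scores = Counter()
    for entity in entities_in(query):
        for doc_id in index.get(entity, ()):
            scores[doc_id] += 1
    return [doc_id for doc_id, _ in scores.most_common()]
```

Ranking by the number of shared entities is the simplest possible choice; a real system would more likely weight entities and combine this score with ordinary text relevance.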

A key part of the product is the real-time, asynchronous display of information found outside the local document repository that is hopefully relevant to what the user is browsing in the local repository. The product uses either the public DBPedia Lookup service or, for better performance, a local installation of DBPedia Lookup. The system also uses the Microsoft Bing search APIs, so if you install the product on your own servers you will need a Microsoft Marketplace account with a subscription to the Bing APIs.
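The asynchronous lookup of outside information can be sketched as a fan-out: query each external source concurrently and show whatever returns within a short time budget, so one slow service never blocks the page. In this Python sketch the endpoint URLs, the `QueryString`/`MaxHits` parameters of the older DBPedia Lookup KeywordSearch API, and the local port are assumptions to check against your installation. `fake_dbpedia_lookup` and `fake_slow_source` are hypothetical stand-ins for real HTTP calls; a Bing call would be another such callable and is not shown.

```python
import concurrent.futures as cf
import time
from urllib.parse import urlencode

# Assumed endpoints (verify for your deployment): the public DBPedia Lookup
# KeywordSearch service and a hypothetical local installation.
PUBLIC_LOOKUP = "http://lookup.dbpedia.org/api/search/KeywordSearch"
LOCAL_LOOKUP = "http://localhost:1111/api/search/KeywordSearch"


def lookup_url(query, use_local=False, max_hits=5):
    """Build a KeywordSearch URL; a local server is the faster option."""
    base = LOCAL_LOOKUP if use_local else PUBLIC_LOOKUP
    return base + "?" + urlencode({"QueryString": query, "MaxHits": max_hits})


def fetch_related(query, sources, timeout=2.0):
    """Call every source concurrently; keep results from those that finish in time.

    sources maps a display name to a callable that takes the query string.
    """
    if not sources:
        return {}
    results = {}
    pool = cf.ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(fn, query): name for name, fn in sources.items()}
    try:
        for future in cf.as_completed(futures, timeout=timeout):
            try:
                results[futures[future]] = future.result()
            except Exception:
                pass  # a failing source is skipped, not fatal
    except cf.TimeoutError:
        pass  # slow sources are left out of this refresh
    finally:
        pool.shutdown(wait=False)  # do not block on stragglers
    return results


def fake_dbpedia_lookup(query):
    """Hypothetical stand-in for an HTTP GET of lookup_url(query)."""
    return ["http://dbpedia.org/resource/" + query.replace(" ", "_")]


def fake_slow_source(query):
    """Hypothetical stand-in for a slow remote search service."""
    time.sleep(0.5)
    return ["late result for " + query]
```

With a 0.1 second budget, `fetch_related("Berlin", {"dbpedia": fake_dbpedia_lookup, "slow": fake_slow_source}, timeout=0.1)` returns only the `dbpedia` entry; a browser client could poll again later to pick up slower sources.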

The system is currently in internal development and a public beta release is planned for the first quarter of 2015.

How does it work?

We are glad you asked! Input documents are processed to form an in-memory graph containing semantic links across all documents in a repository. Entities (or "things" like people, companies, organizations, and geographic objects) are identified in text and semantic relationships are calculated. These relationships help the system automatically tie together different sections within a document (and sections across documents), and help users discover all available information for their current work tasks.
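A toy version of the graph-building step, as a Python sketch: a fixed set of known entity names (`KNOWN_ENTITIES`, hypothetical) stands in for a real named-entity recognizer, each document section is a node, and two sections are linked when they mention the same entity. This shared-entity heuristic is an assumption for illustration, not the product's actual relationship calculation, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical entity list standing in for a real named-entity recognizer,
# which would also assign types such as person, company, or place.
KNOWN_ENTITIES = {"Clojure", "DBPedia", "Microsoft", "Seattle"}


def find_entities(text):
    """Return the known entity names that occur in text as whole words."""
    return {name for name in KNOWN_ENTITIES
            if re.search(r"\b" + re.escape(name) + r"\b", text)}


def build_section_graph(documents):
    """documents maps doc id -> list of section strings.

    Returns an in-memory adjacency dict:
    (doc id, section index) -> {neighbor node: set of shared entities}.
    """
    entity_to_sections = defaultdict(set)
    for doc_id, sections in documents.items():
        for i, text in enumerate(sections):
            for entity in find_entities(text):
                entity_to_sections[entity].add((doc_id, i))
    graph = defaultdict(dict)
    for entity, nodes in entity_to_sections.items():
        # Link every pair of sections that mention this entity.
        for a, b in combinations(sorted(nodes), 2):
            graph[a].setdefault(b, set()).add(entity)
            graph[b].setdefault(a, set()).add(entity)
    return graph


def related_sections(graph, node):
    """Neighbors of node, strongest link (most shared entities) first."""
    return sorted(graph.get(node, {}).items(),
                  key=lambda kv: (-len(kv[1]), kv[0]))
```

Because sections within one document are linked exactly like sections in different documents, `related_sections` returns both kinds of neighbors, ordered by how many entities they share.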

How well does it work?

In all honesty, NLP and IR are evolving technologies. The product goal is to make it easier for users to do their work with some automation, but the human user is still the key resource. The system helps, but the end user still "does the work."

Technology stack

The server-side code is 80% Clojure and 20% Java. The "fat client" web browser UI is written in ClojureScript.

Please try my NLP demos: … contains my experiments using natural language processing, and … is an experimental natural language processing search system that answers who/where/when questions using DBPedia linked data. The KBSportal software is available for purchase.