MediaWiki as a Document-Driven Decision Support System: a Memex Project

This is the first article in the Semantic Web series. Not all of the articles are available yet, but they are in the works:

MediaWiki as a Document Driven decision support system: a Memex Project (this one)
Building an enterprise wiki (Pending)
A small business knowledge manager – The advanced MediaWiki (Pending)

Companies are literally spending billions each year on huge and complex decision support systems, which chew through tens of gigabytes of data. However, on most of them you cannot publish a link to a web service without putting it in a Microsoft Word file. This is as much the companies’ fault as it is the users’.
Even today, some users find it unimaginable to send a photo without putting it inside a Word file. Linking different files or contents through … a link is pure witchcraft to those people. Yet those are the behaviors some companies value the most. In those companies, there is no such thing as information outside a file. How wrong they are…

This hasn’t always been the case…

There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers—conclusions which he cannot find time to grasp, much less to remember, as they appear.

This is as true today as when it was first published, in 1945, in The Atlantic magazine, by Dr. Vannevar Bush. Today, there is literally a world of research and knowledge. There is not only scholarly and scientific literature but also, no less important, technical documentation: how to accomplish a task, how to do something. All of it is totally useless, however, if no one can learn of its existence and access it quickly and unencumbered.

This obviously links information and documents to knowledge: knowledge that can be accessed and used, instead of staying buried beneath dozens of papers, documents or specifications. Knowledge is useless if it can’t reach its intended users. History has already shown how this can happen, and when it does it impairs science for decades: the founder of modern genetics, Gregor Mendel, had his work go unnoticed throughout his life because no peer was able to access it. This has happened countless times since writing was invented.

Dr. Vannevar Bush described a tool, something he called a Memex, a device which mimics the way the human brain works, where concepts are linked in a very personal way, unique to the individual. This is what we currently call a cognitive map. The tool would be able to hold all knowledge, easily accessible to everyone, where it could be consulted and improved upon. Although in 1945 he could not look beyond microfilm and levers to implement this tool, Dr. Bush did set out some remarkable requirements for it, namely:

Indexing – the Memex shall be able to properly index all the content it holds;
Associative indexing – the ability to associate two documents or, in today’s language, to link them;
Create comments – the user shall be able to add comments to documents or links, and share them with other people;
Search – the user shall be able to search through all the content present. The search results shall be ranked by importance. Also, when going through the search results, it’s important to have access to a short summary of each document’s contents in relation to the search query;
Add new documents – the user shall be able to add new documents.
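To make the five requirements above a little more concrete, here is a minimal sketch of them as a toy in-memory data structure. It is only an illustration under my own assumptions: the class and method names (Memex, Document, add, index, link, comment, search) are hypothetical, and "importance" is approximated by simply counting how often the query appears in a document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Document:
    title: str
    text: str
    links: Set[str] = field(default_factory=set)       # associative index
    comments: List[str] = field(default_factory=list)  # user comments


class Memex:
    """Toy in-memory store; every name here is hypothetical."""

    def __init__(self) -> None:
        self.documents: Dict[str, Document] = {}

    def add(self, title: str, text: str) -> Document:
        """Add new documents."""
        doc = Document(title, text)
        self.documents[title] = doc
        return doc

    def index(self) -> List[str]:
        """Indexing: an alphabetical list of everything held."""
        return sorted(self.documents)

    def link(self, source: str, target: str) -> None:
        """Associative indexing: connect one document to another."""
        if target not in self.documents:
            raise KeyError(target)
        self.documents[source].links.add(target)

    def comment(self, title: str, note: str) -> None:
        """Create comments: attach a note to a document."""
        self.documents[title].comments.append(note)

    def search(self, query: str) -> List[Tuple[str, int, str]]:
        """Search: return (title, score, summary), best score first."""
        needle = query.lower()
        if not needle:
            return []
        results = []
        for doc in self.documents.values():
            haystack = doc.text.lower()
            score = haystack.count(needle)  # crude stand-in for importance
            if score == 0:
                continue
            # Summary: a short window of text around the first match.
            start = max(haystack.find(needle) - 20, 0)
            results.append((doc.title, score, doc.text[start:start + 60]))
        return sorted(results, key=lambda item: item[1], reverse=True)
```

A real system would of course persist its content and rank results far more cleverly; the point is only that each requirement maps to one small operation.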

Only recently has the human race been able to create a tool which mimics most of the Memex features and, to the amusement of most, it’s free. MediaWiki, the software on which Wikipedia runs, is one of the tools that most resemble a Memex. Although MediaWiki is almost universally known for Wikipedia, it’s actually used for knowledge management in both for-profit and non-profit organizations, especially in highly technical environments.

Let’s check its features one by one, using Wikipedia as the use case:

Indexing

Most people don’t use Wikipedia’s index, but it is there. As there are more than 4,000,000 documents on the English Wikipedia, several levels of indexes are needed (to put things in perspective, if a single-level index were to be used, as in an encyclopedia, more than 30,000 pages would be needed for the index alone). If you’ve used Wikipedia for years but have never seen its index, here it is:

Associative indexing

This is one of the most used features, and it underscores the whole raison d’être of the Memex. The ability to connect or link two related documents allows the user to create a trail of thoughts (as Dr. Bush describes). It’s all too common to open one Wikipedia page and go through a large number of related pages before noticing you have just spent too much time on it. For example, starting from the index page, I ended up on the Cherry Valley Battle page and, right after, on the Cherry Valley page, Otsego County, New York State, and I could go on forever…
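As a rough illustration of how such a trail can be followed mechanically, the sketch below pulls the targets out of MediaWiki's [[Target]] and [[Target|label]] link syntax and then hops from page to page. The function names and the tiny page texts are made up for the example; real wikitext has many more cases (templates, namespaces, redirects) that this deliberately ignores.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# only the target page title is captured.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def outgoing_links(wikitext):
    """Return the linked page titles, in order, without duplicates."""
    titles = []
    for match in WIKILINK.finditer(wikitext):
        title = match.group(1).strip()
        if title and title not in titles:
            titles.append(title)
    return titles


def follow_trail(pages, start, max_hops=5):
    """Follow the first not-yet-visited link from each page."""
    trail = [start]
    current = start
    for _ in range(max_hops):
        candidates = [
            title
            for title in outgoing_links(pages.get(current, ""))
            if title in pages and title not in trail
        ]
        if not candidates:
            break
        current = candidates[0]
        trail.append(current)
    return trail


# Invented page texts, loosely modelled on the trail described above.
SAMPLE_PAGES = {
    "Index": "Start at [[Cherry Valley Battle|the battle]].",
    "Cherry Valley Battle": "Fought near [[Cherry Valley]].",
    "Cherry Valley": "A village in [[Otsego County#History|Otsego County]].",
    "Otsego County": "A county in [[New York]].",
    "New York": "A state.",
}

trail = follow_trail(SAMPLE_PAGES, "Index")
```

Each hop is nothing more than a lookup of the first unvisited link, which is about as close to Dr. Bush's trails as a few lines of code can get.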

Creating comments

More than in most other places, on Wikipedia there is a need for users to place comments on pages. This is due to the known lack of trust in the information it contains, as it is mostly based on user-generated content, whereas the Memex was designed for expert-generated content. Regardless of the reason, comments are more useful on Wikipedia than in most other uses, especially in the enterprise context.
However, in MediaWiki all comments are public, whereas the Memex also defined private comments and notes. Although this is unfeasible in MediaWiki itself, it should be easy enough to implement client-side, in the web browser. This is the case with the Note Anywhere Chrome extension, which offers far more functionality than was expected of the Memex.

Search

Usually people don’t reach Wikipedia directly but through a search engine, typically Google. However, for standalone use, a Memex does need search capabilities, which MediaWiki has. It’s not as powerful as Google, and for the most part it doesn’t search through binary files (.doc, .xls, etc.). This is one of the main reasons why an organization should not rely on binary files to store its knowledge. As far as MediaWiki goes, predictive search (offering hints while you type) is in fact available out of the box, and this should be enough to demonstrate its search capabilities. Also, as Dr. Bush described, the search results shall include a summary of the content, which is also supported by MediaWiki.
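For readers who want to try both kinds of search programmatically, the sketch below builds request URLs for the MediaWiki web API: list=search for full-text results (each hit carries a title and a snippet) and action=opensearch for type-ahead suggestions. The helper names are my own, the English Wikipedia endpoint is only a default, and nothing is actually sent over the network here.

```python
from urllib.parse import urlencode

# English Wikipedia's API endpoint; a private wiki would use its own api.php.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def full_text_search_url(query, limit=5, endpoint=API_ENDPOINT):
    """URL for a ranked full-text search with result snippets."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def suggestion_url(prefix, limit=5, endpoint=API_ENDPOINT):
    """URL for predictive (type-ahead) title suggestions."""
    params = {
        "action": "opensearch",
        "search": prefix,
        "limit": limit,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def titles_and_snippets(response):
    """Extract (title, snippet) pairs from a decoded list=search reply."""
    hits = response.get("query", {}).get("search", [])
    return [(hit["title"], hit.get("snippet", "")) for hit in hits]
```

Fetching one of these URLs with any HTTP client and decoding the JSON gives a dictionary that titles_and_snippets can read; the snippet is the short summary in relation to the query that the Memex requirements ask for.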

Add new documents

Now, this is one of the best-known features of Wikipedia, and also one of its biggest weaknesses: everyone can add and edit content, hopefully by adding new and solid knowledge. Some people have tried to add a peer-review workflow, but with limited success. Regardless, within a more limited scope, such as an organization, this is less of a problem.

In sum

After some thought, one realizes that most decision support systems aim to generate information out of piles of data by throwing huge amounts of resources at the problem. However, few address the need to manage knowledge. It’s knowledge that matters, and it affects the way people think and interact with each other.

Editor’s note: if you find this text helpful, please leave a message. It’s interesting to evaluate how people react to this.