July 17, 2006

Should ontology management be open sourced?

I’ve argued previously that enterprises need serious ontologies, and that the lack of them is holding back growth in multiple areas of text technology: search, text mining and knowledge extraction, various forms of speech recognition, and so on. The core point was:

The ideal ontology would consist mainly of four aspects:

1. A conceptual part that’s language-independent.
2. A general language-dependent part.
3. A sensitivity to different kinds of text – language is used differently when spoken, for instance, than it is in edited newspaper articles.
4. An enterprise-specific part. For example, a company has product names, it has competitors with product names, those names have abbreviations, and so on.
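
To make those four aspects a bit more concrete, here is a minimal Python sketch of how they might be layered in a single lookup. It is only an illustration under my own assumptions: the class name, the field names, the concept identifiers, and the precedence order (enterprise terms first, then text-kind-specific usage, then the general lexicon) are all hypothetical, not a description of any existing product.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class LayeredOntology:
    """Hypothetical container for the four aspects listed above."""

    # 1. Language-independent concepts: concept id -> broader concept id (or None).
    concepts: Dict[str, Optional[str]] = field(default_factory=dict)
    # 2. General language-dependent part: (language, term) -> concept id.
    lexicon: Dict[Tuple[str, str], str] = field(default_factory=dict)
    # 3. Text-kind sensitivity: (language, text kind, term) -> concept id.
    text_kind_lexicon: Dict[Tuple[str, str, str], str] = field(default_factory=dict)
    # 4. Enterprise-specific part: product names, abbreviations, etc. -> concept id.
    enterprise_terms: Dict[str, str] = field(default_factory=dict)

    def resolve(self, term: str, language: str = "en",
                text_kind: Optional[str] = None) -> Optional[str]:
        """Map a term to a concept id, most specific layer first (an assumed order)."""
        key = term.lower()
        if key in self.enterprise_terms:
            return self.enterprise_terms[key]
        if text_kind is not None:
            hit = self.text_kind_lexicon.get((language, text_kind, key))
            if hit is not None:
                return hit
        return self.lexicon.get((language, key))


# Made-up example data, purely for illustration.
ontology = LayeredOntology(
    concepts={"concept:automobile": "concept:vehicle", "concept:vehicle": None},
    lexicon={("en", "car"): "concept:automobile"},
    text_kind_lexicon={("en", "spoken", "wheels"): "concept:automobile"},
    enterprise_terms={"xl200": "product:widget_xl200"},
)
```

With this toy data, `ontology.resolve("XL200")` hits the enterprise layer, `ontology.resolve("wheels", text_kind="spoken")` hits the text-kind layer, and `ontology.resolve("car")` falls through to the general lexicon. In this picture, layers 1 through 3 are the enterprise-independent part and layer 4 is what each enterprise adds.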

There are actually two different requirements – the enterprise-independent ontology, and the software to manage and use it in an enterprise-specific way. But while I continue to believe that this dual product category will emerge, my faith has wavered somewhat. The big vendors don’t “get it,” and the little ones lack the resources even if they do see the opportunity.

I discussed this with David Thede of dtSearch last week, and he raised an interesting question: Should this be an open source project? Some initial responses follow; also, I’d be very interested to know what the rest of the community thinks.

Peripheral parts of the software clearly can be drawn from the open source community. IBM has open-sourced UIMA, which seems like a perfectly good framework for modularity, integration, interoperability, and so on. Development tools can probably be based on Eclipse. There are other examples too; open source has plenty of applicability these days.

The core software needs a profit-motivated vendor. Open source, to date, has been much more about implementing alternative versions of known technology than about difficult first-time invention. And there’s a lot of invention still to be done here, especially in the area of taxonomy federation. Could there be an open source business model in which the vendor gives away the code and sells services? Sure. If nothing else, Monash’s Second Law of Commercial Semantics states “Where there are ontologies, there is consulting.” But somebody has to own the product, in every sense of “own,” or it never will see the light of day.

An enterprise-independent ontology ideally should be open-sourced. Probably, the leader of this effort should be the supplier of the ontology management software. WordNet is more or less public domain. Various taxonomies of proper nouns and industry jargon can also be found. Individual contributors can naturally provide new small pieces. So there’s a lot of reason to think that public domain/open source is naturally the way to go for an ontology.

That said, a big problem comes to mind – how do you get everyone to agree on what the structure will be? But isn’t that the same kind of problem that open source software development projects solve all the time? I think so.

But unfortunately it’s also the kind of problem that standards committees botch all the time. From that, I conclude that a public-domain ontology will need strong central leadership. Who’s qualified and incentivized to provide that leadership? I can’t think of anybody better suited than whoever emerges to seize the billion-dollar opportunity for ontology management software.

