By: Text Technologies » Blog Archive » The text mining vendors continue to lack constructive vision

Wed, 19 Dec 2007 02:39:08 +0000

[…] active in text search, except to some extent in the custom-publishing vertical, despite the huge reliance of search vendors on text mining technologies. They aren’t getting traction in the archiving/compliance area. There don’t seem to be […]

By: Paolo Cavone

Paolo Cavone — Fri, 16 Mar 2007 16:10:34 +0000

I agree with Patrick. Text mining should (or must!) in effect *learn* (and store semantic structures).

[quote]
…Yes, search engines like Google already return both documents and extracted elements from them. But these search engines don’t appear to have semantics built into them.
[/quote]

…at least until the semantic structures are inferred (perhaps too heavily) from the anchor text of backlinks…

By: Patrick Herron

Patrick Herron — Sat, 11 Nov 2006 19:52:14 +0000

You got me, that’s for sure.

I understand that search and text mining are related and overlap. Part of what’s commonly called text mining, namely information extraction (IE), powers some advanced features of search engines. Sure enough. But search engines take keywordese input, do some sort of finding operation, and return relevant documents. That’s the purpose of information retrieval. The most promising, interesting, and innovative applications of text mining are not information extraction engines but rather systems that perform operations and return results at the level of synthesis, not at the document level or even the extraction level. Such applications take extracted elements and put them together in order to generate new information. They use inductive logic or some set of domain rules (taxonomic rules, if you like) to create information that did not previously exist. That’s not information-finding/search but information-generating. Some positive examples include Arrowsmith, the Robot Scientist, BioPubMiner, and pieces of Etzioni’s Machine Reading (MR) research. While many of the technologies that power search are present in good text mining applications, text mining applications should do a whole lot more. I mean, look at how token-driven Google is. Search regurgitates. Text mining should in effect *learn*.
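To make the retrieval-versus-synthesis distinction concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `EXTRACTED` data is hand-written toy data standing in for the output of an extraction step, the names `search` and `synthesize` are hypothetical, and the bridging logic is a simple co-occurrence heuristic, not how Arrowsmith or any of the other systems named above is actually implemented.

```python
from collections import defaultdict
from itertools import combinations

# Assumption: hand-written toy data standing in for an IE step's output,
# mapping a document id to the set of terms extracted from that document.
EXTRACTED = {
    "doc1": {"fish oil", "blood viscosity"},
    "doc2": {"blood viscosity", "raynaud's syndrome"},
    "doc3": {"fish oil", "platelet aggregation"},
    "doc4": {"platelet aggregation", "raynaud's syndrome"},
}


def search(term, extracted=EXTRACTED):
    """Retrieval: return the ids of documents that mention the term."""
    return sorted(doc for doc, terms in extracted.items() if term in terms)


def synthesize(extracted=EXTRACTED):
    """Synthesis: propose term pairs that no single document links,
    each supported by intermediate terms that co-occur with both."""
    neighbors = defaultdict(set)
    for terms in extracted.values():
        for a, b in combinations(sorted(terms), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    hypotheses = {}
    for a, c in combinations(sorted(neighbors), 2):
        if c in neighbors[a]:
            continue  # some document already states this link directly
        bridges = neighbors[a] & neighbors[c]
        if bridges:
            hypotheses[(a, c)] = sorted(bridges)
    return hypotheses


if __name__ == "__main__":
    print(search("fish oil"))
    for pair, bridges in synthesize().items():
        print(pair, "via", bridges)
```

In this toy setup, `search` can only hand back documents that already contain the query term, while `synthesize` returns candidate links that appear in no single document, which is the "information-generating" behavior described above.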

Numerous credible sources consider IE applications to be text mining, or more oxymoronically, “knowledge discovery.” Hearst’s widely read essays, or Weiss et al.’s definitive text mining text, place IE under the text mining umbrella. But IE is really just search taken one more step, from returning documents to returning document elements. Yes, search engines like Google already return both documents and extracted elements from them. But these search engines don’t appear to have semantics built into them.

If anything, the relationship started out very tight and has loosened over the years. Text mining will likely change names, as it already appears to be doing, which will make the process of separation somewhat more difficult to interpret.

Comments on: Text mining and search, joined at the hip
