<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Text Technologies &#187; BI integration</title>
	<atom:link href="http://www.texttechnologies.com/category/bi-integration/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.texttechnologies.com</link>
	<description>Understanding technology ... in both senses of the phrase</description>
	<lastBuildDate>Sun, 28 Feb 2010 05:30:01 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>MEN ARE FROM EARTH, COMPUTERS ARE FROM VULCAN</title>
		<link>http://www.texttechnologies.com/2009/05/30/men-are-from-earth-computers-are-from-vulcan/</link>
		<comments>http://www.texttechnologies.com/2009/05/30/men-are-from-earth-computers-are-from-vulcan/#comments</comments>
		<pubDate>Sat, 30 May 2009 06:15:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[BI integration]]></category>
		<category><![CDATA[IBM and UIMA]]></category>
		<category><![CDATA[Language recognition]]></category>
		<category><![CDATA[Natural language processing (NLP)]]></category>
		<category><![CDATA[Progress and EasyAsk]]></category>
		<category><![CDATA[Search engines]]></category>
		<category><![CDATA[Speech recognition]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=331</guid>
		<description><![CDATA[The newsletter/column excerpted below was originally published in 1998.  Some of the specific references are obviously very dated.  But the general points about the requirements for successful natural language computer interfaces still hold true.  Less progress has been made in the intervening decade-plus than I would have hoped, but some recent efforts &#8212; especially in [...]]]></description>
			<content:encoded><![CDATA[<p><em>The newsletter/column excerpted below was originally published in 1998.  Some of the specific references are obviously very dated.  But the general points about the requirements for successful natural language computer interfaces still hold true.  Less progress has been made in the intervening decade-plus than I would have hoped, but some recent efforts &#8212; especially in the area of search-over-business-intelligence &#8212; are at least mildly encouraging.  Emphasis added.<br />
</em></p>
<p>Natural language computer interfaces were introduced commercially about 15 years ago*.  They failed miserably.</p>
<p><em>*I.e., the early 1980s</em></p>
<p style="margin-bottom: 0in;">For example, Artificial Intelligence Corporation&#8217;s Intellect was a natural language DBMS query/reporting/charting tool.  It was actually a pretty good product.  But it&#8217;s infamous among industry insiders as the product for which IBM, in one of its first software licensing deals, got about 1700 trial installations &#8212; and less than a 1% sales close rate.  Even its successor, Linguistic Technologies&#8217; English Wizard*, doesn&#8217;t seem to be attracting many customers, despite consistently good product reviews.</p>
<p style="margin-bottom: 0in;"><em>*These days (i.e., in 2009) it&#8217;s owned by Progress and called EasyAsk. It still doesn&#8217;t seem to be selling well.</em></p>
<p style="margin-bottom: 0in;">Another example was HAL, the natural language command interface to 1-2-3.  HAL is the product that first made Bill Gross (subsequently the founder of Knowledge Adventure and idealab!) and his brother Larry famous.  However, it achieved no success*, and was quickly dropped from Lotus&#8217; product line.</p>
<p style="margin-bottom: 0in;"><em>*I loved the product personally. But I was sadly alone.</em></p>
<p style="margin-bottom: 0in;"><strong>In retrospect, it&#8217;s obvious why natural language interfaces failed.</strong> First of all, <strong>they offered little advantage over the  forms-and-menus paradigm</strong> that dominated enterprise computing in  both the online-character-based and client-server-GUI eras.  If you  couldn&#8217;t meet an application need with forms and menus, you couldn&#8217;t meet it with natural language either.<span id="more-331"></span></p>
<p style="margin-bottom: 0in;">Even worse, NL actually had a couple of clear disadvantages versus traditional interfaces.  First of all,<strong> it required (ick!) typing,</strong> often more typing than the forms and menus did.  Second, <strong>forms and menus tell the user exactly what he can do.</strong> Natural language, however, lets him give orders the computer doesn&#8217;t know how to follow.  This is inefficient, not to mention frustrating.</p>
<p style="margin-bottom: 0in;">However, even in 1983, it was obvious that the typing objection would go away some day, because of speech recognition &#8212; once desktop computers reached 100 MIPS or so.  (Effective keyboard-replacement speech recognition &#8212; as opposed to true natural language understanding &#8212; is mainly a matter of processing power.)  15 years later, standard PCs exceed 100 MIPS (assuming that 1 MIPS = a couple of megahertz for these purposes), and speech recognition is indeed getting practical.</p>
<p style="margin-bottom: 0in;">In fact, as has become increasingly evident recently, speech recognition is now a hot technology.  Bill Gates has been talking it up for a couple of years.  Increasingly, the press has swung to believing him &#8230; And my parents just bought a PC with two speech recognition products on it.</p>
<p style="margin-bottom: 0in;">That said, speech recognition is as misunderstood (no pun intended) as most artificial intelligence technologies.  Yes, it beats typing, in a number of circumstances:</p>
<ul>
<li>On the telephone (duh!)</li>
<li>&#8220;Busy hands&#8221; and/or &#8220;busy eyes&#8221; applications and locales (doctors&#8217; offices, trading floors, warehouses, etc. &#8212; and, some day in the future, your kitchen and car)</li>
<li>People simply reluctant to type (e.g., anybody with sufficient wrist or back problems, and many males over the age of 45)</li>
</ul>
<p>But before our computers talk back and forth with us in the voice of Majel Barrett Roddenberry, applications are going to have to add several important elements required for truly functional natural-language  interfaces:</p>
<ul>
<li><strong>Intuitively clear names for 	everything on (or just behind) the screen</strong></li>
<li><strong>Application-specific 	disambiguation logic</strong></li>
</ul>
<p style="margin-bottom: 0in;">For most practical purposes, the latter requirement equates to</p>
<ul>
<li>
<p style="margin-bottom: 0in;">A new generation of document 	selection technology</p>
</li>
</ul>
<p style="margin-bottom: 0in;">THE RULE OF NAMES</p>
<p>According to legend, knowing something&#8217;s name gives you power over it.  When that &#8220;something&#8221; is a button or menu choice on a speech-enabled computer, the legend is literally true.  But when a feature doesn&#8217;t have an obvious name, you can&#8217;t easily invoke it.</p>
<p>When applications consisted mainly of forms and menus, this was rarely a problem.  Everything had a clear role and label.  But web pages are less organized.  Hyperlinks can be scattered all over the place, with little rhyme or reason.</p>
<p>Frankly, I don&#8217;t think this is a hard problem to solve.  It wouldn&#8217;t take a lot of XML to divide the page into clear regions, so that commands like &#8220;Show me article #3&#8221; (on a search results list) could be interpreted in the obvious way.  But it does take at least some discipline; random web pages will not necessarily be easy to &#8220;talk&#8221; to.</p>
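<p>A minimal sketch of that idea, under stated assumptions: the region and item element names, the sample titles and URLs, and the command pattern are all hypothetical, chosen only to show how a page divided into named regions lets a spoken reference such as &#8220;article #3&#8221; resolve to one specific link.</p>

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page markup: named regions, each holding ordered items.
# The "item" attribute records what a user would call one entry.
page = ET.Element("page")
results = ET.SubElement(page, "region", name="search results", item="article")
for title, url in [
    ("Speech recognition survey", "/articles/speech"),
    ("Natural language queries", "/articles/nlq"),
    ("Text search comeback", "/articles/search"),
]:
    ET.SubElement(results, "item", href=url).text = title
nav = ET.SubElement(page, "region", name="navigation", item="link")
ET.SubElement(nav, "item", href="/home").text = "Home"

# Assumed command shape: "show me NOUN #NUMBER".
COMMAND = re.compile(r"show me (\w+) #(\d+)", re.IGNORECASE)

def resolve(command, page):
    """Return the href the command refers to, or None if nothing matches."""
    match = COMMAND.search(command)
    if match is None:
        return None
    noun, position = match.group(1).lower(), int(match.group(2))
    for region in page.iter("region"):
        if region.get("item") != noun:
            continue
        items = region.findall("item")
        # Positions are spoken 1-based; ignore out-of-range numbers.
        if position in range(1, len(items) + 1):
            return items[position - 1].get("href")
    return None

print(resolve("Show me article #3", page))
```

<p>A page with no such region structure gives the resolver nothing to match against, which is the discipline problem noted above.</p>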
<p>CYBERNETIC LISTENING SKILLS</p>
<p><strong>The bigger challenge is to make sure that the application can respond in some useful way, no matter what command it&#8217;s given. </strong> This is even more difficult than it was 15 years ago, because of the radical increase in &#8220;casual&#8221; computer usage.  In the old days, we could assume the user had some clear business reason for using the application, and if necessary that s/he had time to be trained (even if people rarely sat still for as much training as they really needed).  Therefore, we could assume that the users had at least a general idea of what the application did, and hence of which commands the computer could obey.  From an NL standpoint, we could assume that what they actually &#8220;said&#8221; (which in those days meant &#8220;typed&#8221;) was at least reasonably close to what they were &#8220;supposed&#8221; to say.</p>
<p>Now, however, some of the most important applications are internet e-commerce and portals, competing and begging for the user&#8217;s attention.  The user is there strictly on a voluntary basis, and if he doesn&#8217;t get immediate gratification, he&#8217;s gone, history, hasta la bye-bye.  Site-specific training isn&#8217;t even a consideration. And even if somebody did actually take a class on &#8220;How to use Excite,&#8221; the knowledge would be obsolete in six months.  So <strong>applications, if they are to have natural language interfaces that please and respond to users, have to be able to respond pretty much to any command.</strong></p>
<p>Ideally, voice-enabled systems would be like the computers on Star Trek, which can return information from vast archives, brew a pot of Earl Grey tea, play three parts of a quartet, create self-aware life forms, or answer questions like &#8220;Computer, what is the nature of the universe?&#8221;  More realistically, they should be able, for example, to respond to a command like &#8220;Tell me about flights to Miami&#8221; by automatically giving the user a travel-reservation application or web page, and entering Miami in the appropriate form field.</p>
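<p>The flights-to-Miami case can be sketched roughly as follows. Everything named here is an assumption made for illustration: the application registry, its keyword sets, and the field patterns are hypothetical, and real systems would need far richer language handling than keyword overlap and a capitalized-word pattern.</p>

```python
import re

# Hypothetical pattern for a capitalized place name such as "Miami" or "New York".
PLACE = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"

# Hypothetical registry: each application lists trigger words and the
# form fields it can pre-fill from the wording of the command.
APPLICATIONS = {
    "travel-reservations": {
        "keywords": {"flight", "flights", "fly", "airfare"},
        "fields": {"destination": re.compile(r"\bto " + PLACE)},
    },
    "weather": {
        "keywords": {"weather", "forecast", "temperature"},
        "fields": {"city": re.compile(r"\bin " + PLACE)},
    },
}

def route(command):
    """Pick the application whose keywords overlap the command most,
    then pre-fill whatever form fields its patterns can extract.
    Returns (application name, form values), or None if nothing applies."""
    words = set(re.findall(r"[a-z]+", command.lower()))
    scores = {
        name: len(words.intersection(app["keywords"]))
        for name, app in APPLICATIONS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None
    form = {}
    for field, pattern in APPLICATIONS[best]["fields"].items():
        match = pattern.search(command)
        if match:
            form[field] = match.group(1)
    return best, form

print(route("Tell me about flights to Miami"))
```

<p>A command that matches no registered application returns nothing at all here, which is exactly the gap the rest of this column is about.</p>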
<p>If one thinks about the complications in such a system, it becomes clear that there are only two possible ways an application system can be designed to respond meaningfully to an enormous range of reasonable possible requests.</p>
<p>1. It can do the equivalent of saying &#8220;I&#8217;m sorry, I didn&#8217;t understand that,&#8221; &#8220;I&#8217;m sorry, I can&#8217;t do that,&#8221; and so on.</p>
<p>2. It can interpret many commands as text-search strings, and return appropriate results.</p>
<p>The first strategy &#8212; application-specific disambiguation logic, clear responses to &#8220;errors,&#8221; etc. &#8212; is absolutely necessary.  No software is perfectly intelligent; <strong>the user will have to be asked for disambiguation help from time to time</strong> (just as clerks today ask customers to repeat their requests!). I&#8217;m not going to go into much detail about how that works because, frankly, it&#8217;s a tricky thing to get right.  Users hate unnecessary disambiguation steps. They also hate the incorrect responses that result from ambiguity, and do tolerate being asked for help when it&#8217;s truly needed.  In short, whatever you build the first time around will probably be wrong.  So build something fast; then run, don&#8217;t walk, to the nearest usability lab, find out how you screwed up, and redo your system until you get it right.</p>
<p>I&#8217;m convinced that the second strategy &#8212; <strong>heavy reliance on text search technology &#8212; is a requirement as well. </strong> Just try to name a major web site that doesn&#8217;t use text search.  True, text search has gotten a bad rap recently, mainly because a whole generation of search engines didn&#8217;t really work.  But it will stage a comeback.</p>
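<p>One way the two strategies might fit together is sketched below. This is not a description of any real product: the command table, the document collection, and the word-overlap scoring are hypothetical stand-ins. The point is only the order of fallbacks: run an unambiguous command, ask for help when several commands are plausible, otherwise treat the words as a text-search string, and apologize only when search finds nothing either.</p>

```python
import re

# Hypothetical command table and document collection, for illustration only.
COMMANDS = {
    "open calendar": "calendar",
    "open calculator": "calculator",
    "check mail": "mail",
}
DOCUMENTS = {
    "faq": "how to book flights and hotels online",
    "news": "speech recognition is getting practical on standard PCs",
}

def tokens(text):
    """Lowercased set of the words and numbers in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def respond(utterance):
    """Return (kind, payload), where kind is run, ask, search, or sorry."""
    said = tokens(utterance)
    # A command is an exact candidate when all of its words were spoken.
    exact = [c for c in COMMANDS if tokens(c).issubset(said)]
    if len(exact) == 1:
        return "run", COMMANDS[exact[0]]
    # Otherwise any command sharing a word with the utterance is plausible.
    candidates = exact or [c for c in COMMANDS if tokens(c).intersection(said)]
    if len(candidates) >= 2:
        return "ask", candidates
    # Zero or one weak candidate: fall back to text search by word overlap.
    hits = {
        name: len(tokens(text).intersection(said))
        for name, text in DOCUMENTS.items()
    }
    ranked = sorted((n for n in hits if hits[n] > 0), key=lambda n: -hits[n])
    if ranked:
        return "search", ranked
    return "sorry", "I'm sorry, I didn't understand that."

for phrase in ["Open calendar", "open", "flights to Miami", "xyzzy"]:
    print(phrase, respond(phrase))
```

<p>Where exactly to draw the line between asking and searching is the tricky part, and is the sort of thing a usability lab would have to settle.</p>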
<p><em><strong>Related links</strong></em></p>
<ul>
<li>My <a href="http://www.texttechnologies.com/2007/12/02/voice-dictation-nuance-dragon-naturallyspeaking/" >December, 2007 survey of speech recognition technology</a></li>
<li><a href="http://www.monashreport.com/2009/05/12/star-trek-companions/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monashreport.com');">Star Trek fun</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2009/05/30/men-are-from-earth-computers-are-from-vulcan/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The phrase &#8220;business intelligence&#8221; was COINED for text analytics</title>
		<link>http://www.texttechnologies.com/2008/07/11/the-phrase-business-intelligence-was-coined-for-text-analytics/</link>
		<comments>http://www.texttechnologies.com/2008/07/11/the-phrase-business-intelligence-was-coined-for-text-analytics/#comments</comments>
		<pubDate>Fri, 11 Jul 2008 07:31:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Categorization and filtering]]></category>
		<category><![CDATA[IBM and UIMA]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[knowledge management]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=264</guid>
		<description><![CDATA[Late last year, there was a little flap about who invented the phrase business intelligence.  Credit turns out to go to an IBM researcher named H. P. Luhn, as per this 1958 paper.  Well, I finally took a look at the paper, after Jeff Jones of IBM sent over another copy.  And [...]]]></description>
			<content:encoded><![CDATA[<p>Late last year, there was <a href="http://www.softwarememories.com/2007/12/02/disputed-history-of-the-term-business-intelligence/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.softwarememories.com');">a little flap about who invented the phrase <em>business intelligence</em></a>.  Credit turns out to go to an IBM researcher named H. P. Luhn, as per <a href="http://www.research.ibm.com/journal/rd/024/ibmrd0204H.pdf" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.research.ibm.com');">this 1958 paper</a>.  Well, I finally took a look at the paper, after Jeff Jones of IBM sent over another copy.  And guess what?  It&#8217;s all about text analytics.  Specifically, it&#8217;s about what we might now call a combination of classification and knowledge management.</p>
<p>Half a century later, the industry is finally poised to deliver on that vision.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/07/11/the-phrase-business-intelligence-was-coined-for-text-analytics/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>6 trends that could shake up the text analytics market</title>
		<link>http://www.texttechnologies.com/2008/06/19/6-trends-that-could-shake-up-the-text-analytics-market/</link>
		<comments>http://www.texttechnologies.com/2008/06/19/6-trends-that-could-shake-up-the-text-analytics-market/#comments</comments>
		<pubDate>Thu, 19 Jun 2008 08:33:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Enterprise search]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Search engines]]></category>
		<category><![CDATA[Social software and online media]]></category>
		<category><![CDATA[Text mining]]></category>
		<category><![CDATA[Cache']]></category>
		<category><![CDATA[Intersystems]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=251</guid>
		<description><![CDATA[My last two posts were based on the introductory slide to my talk The Text Analytics Marketplace: Competitive landscape and trends. I&#8217;ll now jump straight ahead to the talk&#8217;s conclusion.
Text analytics vendors participate in the same trends as other software and technology vendors.  For example, relational business intelligence and data warehousing products are increasingly [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in; font-style: normal;"><span>My <a href="http://www.texttechnologies.com/2008/06/19/text-analytics-marketplace-competitive-landscape-trends/" >last</a> <a href="http://www.texttechnologies.com/2008/06/19/3-specialized-markets-for-text-analytics/" >two</a> posts were based on the introductory slide to my talk </span><em><span>The Text Analytics Marketplace: Competitive landscape and trends. </span></em><span style="font-style: normal;"><span>I&#8217;ll now jump straight ahead to the talk&#8217;s conclusion.</span></span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span style="font-style: normal;"><span>Text analytics vendors participate in the same trends as other software and technology vendors.  For example, relational business intelligence and data warehousing products are increasingly being sold to departmental buyers.  Those buyers place particularly high value on ease of installation.  And golly gee whiz, both parts of that are also true in text mining. </span></span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span style="font-style: normal;"><span>But beyond such general trends, I&#8217;ve identified six developments that I think could radically transform the text analytics market landscape.  Indeed, they could invalidate the neat little eight-bucket categorization I laid out in the prior post.  Each is highly likely to occur, although in some cases the timing remains greatly in doubt.</span></span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span style="font-style: normal;"><span>These six market-transforming trends are:</span></span></p>
<ol>
<li> Web/enterprise/messaging 	integration</li>
<li> BI 	integration</li>
<li> Universal 	message retention</li>
<li> Portable 	personal profiles</li>
<li> Electronic 	health records</li>
<li> Voice 	command &amp; control</li>
</ol>
<p style="margin-bottom: 0in; font-style: normal;"><span id="more-251"></span><span>I&#8217;ll explain briefly.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>1.  Google and Microsoft are two of the three leaders in web search.  Now that Microsoft has bought FAST, they are also two of the leaders in enterprise search.  They are also two of the leaders in hosted email. Ditto instant messaging.  So </span><strong>there&#8217;s a good chance these various disciplines will converge.</strong></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>2.  There are a number of ways text analytics and traditional analytics can be, and are being, integrated:</span></p>
<ul>
<li><span>Enterprise 	search and business intelligence are akin; both involve digging 	information out of the data you already have.</span></li>
<li><span>Text 	mining is naturally integrated with business intelligence and/or 	data mining.</span></li>
<li><span>There&#8217;s 	a trend toward using text search to dig up business intelligence 	documents such as specific reports, spreadsheets, etc.</span></li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;"><span>To date, the last of these is focused on reports that already exist, rather than queries that could be run on the fly, but I hope and trust the technology will be extended over time.  Natural language queries have merit anyway; </span><strong>I&#8217;d like to see the search box be extended in functionality to a true data-retrieval command line.</strong></p>
<p style="margin-bottom: 0in; font-style: normal;">3.  One of the big purchase drivers of storage, search, and clustering technology is mandates to preserve information and make it available to auditors, regulators, and/or people who want to sue you.  Email in particular is changing from being ephemeral to becoming part of the permanent record.  Well, if the information is being retained anyway, then maybe it&#8217;s time to see how to get useful insight from it.</p>
<p style="margin-bottom: 0in; font-style: normal;"><strong>Right now, a company&#8217;s overall text archives aren&#8217;t being leveraged in the same way data warehouses are.  That will change.</strong></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>4.  For over a decade, online companies have fought to exploit the fact that users were registered with their sites or services, but not with others.  Huge amounts of investment money were wasted in the dot-com bubble because people thought “registered users” was a significant metric, or that ISP subscribers could be directed to proprietary content.  Enormous valuations are being assigned to Facebook and LinkedIn on similar theories today.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>But as site owners and other marketers get ever more aggressive about exploiting user-specific information, users will get ever more sophisticated about controlling it. </span><strong>The obvious solution is for each internet user to control a sophisticated database of their contact information, presence information, actions, preferences, and writings, and to be very selective about which online services are allowed to see which portions of the data. </strong><span>I think that will come about some day, but I don&#8217;t know when.  When it does, text analytics will be affected in a variety of interesting ways.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>5.  Electronic health records are almost unique in IT.  What other enterprise app can you think of for which relational DBMS aren&#8217;t the default underpinning?  (InterSystems&#8217; object-oriented DBMS Cach&#233; has huge share in the clinical records market.)   Normal tabular data, text, images, sensor output streams – health records have it all.  What&#8217;s more, the health records area is coming upon some very interesting times in the area of data sharing, at least in the US.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>Just as retailing went from being an IT backwater (through the mid-1980s), to a sophisticated user of database technology (1990s), to the leader of the internet revolution (rise of e-commerce), </span><strong>I think health care is due to take a leadership role in IT advances</strong><span>.   And when it does, search, text mining, and voice recognition will all play important roles.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>6.  Most people reading this far have probably watched Star Trek.</span><strong> Well, what is keeping us from being able to command computers in a Star Trek fashion?  Not really that much. </strong><span> Sure, there are some big missing pieces.  We need a mapping from commands to the specific applications that would carry them out.  We also need a more structured kind of analytic middle tier so that there&#8217;s something to map questions to.  But those are solvable problems.  And by the way – when everybody wears headphones, voice commands emanating from the next cubicle are no longer the big annoyance they would be today.  Mobile/small devices only add to the business case for voice recognition advances.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>When voice becomes a primary mode of human/device communication, “text” analytics will be affected in any number of ways.</span></p>
<p style="margin-bottom: 0in;"><em><strong>Related links:</strong></em></p>
<ul>
<li><a href="http://www.texttechnologies.com/2008/06/19/text-analytics-marketplace-competitive-landscape-trends/" >The introductory post in this series</a></li>
<li><a href="http://www.texttechnologies.com/2008/02/03/microsoft-yahoo-synergies/" >19 possible Microsoft/Yahoo synergies</a>, many of them related to text technology convergence, e.g. between web search and enterprise search</li>
<li>The compelling case for <a href="http://www.monashreport.com/2008/01/04/early-thoughts-on-outsourcing-to-google-mail/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monashreport.com');">letting Google handle your enterprise email</a></li>
<li>An old post on <a href="http://www.texttechnologies.com/2006/09/01/why-the-bi-vendors-are-integrating-with-google-onebox/" >why BI vendors flocked to integrate with Google OneBox</a></li>
<li>A proposal to <a href="http://www.texttechnologies.com/2007/02/06/what-is-linkedin-needed-for-absolutely-nothing-and-the-same-goes-for-myspace/" >refactor social networks</a></li>
<li>An old post in which I outlined some of the criteria for <a href="http://www.dbms2.com/2005/11/17/native-xml-storage-part-2-apps/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.dbms2.com');">Profiles 2.0</a></li>
<li><a href="http://www.networkworld.com/community/node/29109" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.networkworld.com');">Why text technologies are going to recombine</a> (in <em>A World of Bytes</em>)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/06/19/6-trends-that-could-shake-up-the-text-analytics-market/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Text Analytics Marketplace: Competitive landscape and trends</title>
		<link>http://www.texttechnologies.com/2008/06/19/text-analytics-marketplace-competitive-landscape-trends/</link>
		<comments>http://www.texttechnologies.com/2008/06/19/text-analytics-marketplace-competitive-landscape-trends/#comments</comments>
		<pubDate>Thu, 19 Jun 2008 07:35:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Audio and video search]]></category>
		<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Custom publishing]]></category>
		<category><![CDATA[Enterprise search]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Natural language processing (NLP)]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[Progress and EasyAsk]]></category>
		<category><![CDATA[Search engines]]></category>
		<category><![CDATA[Social software and online media]]></category>
		<category><![CDATA[Spam and antispam]]></category>
		<category><![CDATA[Speech recognition]]></category>
		<category><![CDATA[Structured search]]></category>
		<category><![CDATA[Text Analytics Summit]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=249</guid>
		<description><![CDATA[As I see it, there are eight distinct market areas that each depend heavily on linguistic technology. Five are off-shoots of what used to be called “information retrieval”:
1.  Web search
2.  Public-facing site search
3.  Enterprise search and knowledge management
4.  Custom publishing
5.  Text mining and extraction
Three are more standalone:
6.  Spam filtering
7. [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">As I see it, there are eight distinct market areas that each depend heavily on linguistic technology. Five are off-shoots of what used to be called “information retrieval”:</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">1.  Web search</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">2.  Public-facing site search</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">3.  Enterprise search and knowledge management</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">4.  Custom publishing</p>
<p style="padding-left: 30px;">5.  Text mining and extraction</p>
<p style="margin-bottom: 0in; font-style: normal;">Three are more standalone:</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">6.  Spam filtering</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">7.  Voice recognition</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">8.  Machine translation</p>
<p><span id="more-249"></span></p>
<p style="margin-bottom: 0in;">This list comes from a talk I gave Monday at the Text Analytics Summit called <em>The Text Analytics Marketplace: Competitive landscape and trends. </em>In half an hour, I covered the first five areas (in Sue Feldman&#8217;s word, at a “gallop”). The slide deck has been uploaded to the link below.  <span style="font-style: normal;"><span>I plan to break out the material from the talk into a series of blog posts over the next few (or perhaps not-so-few) weeks. </span></span></p>
<p style="margin-bottom: 0in;"><em><strong>Slides:</strong></em></p>
<ul>
<li><a href="http://www.monash.com/Text-analytics-markets-June-2008.ppt" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monash.com');"><span>The Text Analytics Marketplace: Competitive landscape and trends</span></a></li>
</ul>
<p style="margin-bottom: 0in;"><strong><em>Other posts based on those slides:</em></strong></p>
<ul>
<li><span><a href="http://www.texttechnologies.com/2008/06/19/3-specialized-markets-for-text-analytics/" >Three specialized markets for text analytics</a> (based on Slide 2)</span></li>
<li><span><a href="http://www.texttechnologies.com/2008/06/19/6-trends-that-could-shake-up-the-text-analytics-market/" >6 trends that could shake up the text analytics market</a> (based on Slide 19)</span></li>
<li><span><a href="http://www.networkworld.com/community/node/29109">Why search technologies are going to recombine</a> (in <em>A World of Bytes</em>, based on Slide 19)<br />
</span></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/06/19/text-analytics-marketplace-competitive-landscape-trends/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Attivio tries to do it all</title>
		<link>http://www.texttechnologies.com/2007/12/12/attivio-tries-to-do-it-all/</link>
		<comments>http://www.texttechnologies.com/2007/12/12/attivio-tries-to-do-it-all/#comments</comments>
		<pubDate>Wed, 12 Dec 2007 04:38:55 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Attivio]]></category>
		<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Open source text analytics]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/2007/12/12/attivio-tries-to-do-it-all/</guid>
		<description><![CDATA[When Andrew McKay was at FAST, I grumped about his search/BI integration story.   Now that he&#8217;s trying to do the same thing at a startup called Attivio, it sounds more plausible.
Attivio is having a house party and product rollout in the latter part of January, and details are scarce in the meantime. [...]]]></description>
			<content:encoded><![CDATA[<p>When Andrew McKay was at FAST, I grumped about his <a href="http://www.texttechnologies.com/2007/02/01/what%e2%80%99s-interesting-about-the-fast-venture-in-bi/" >search/BI integration story</a>.   Now that he&#8217;s trying to do the same thing at a startup called <a href="http://www.attivio.com" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.attivio.com');">Attivio</a>, it sounds more plausible.</p>
<p>Attivio is having a house party and product rollout in the latter part of January, and details are scarce in the meantime.  But here are some highlights.</p>
<ul>
<li>Attivio was founded in August.  It has 21 people and 1 VC.  The VC has invested &gt;$6 million and committed &gt;$12 million total.</li>
<li>Attivio has ambitious plans for a fully integrated data management/real-time BI stack.  It&#8217;s currently called the &#8220;Active Intelligence Engine.&#8221;<span id="more-151"></span></li>
<li>The data management part combines tabular, text, and XML data.  The tabular part is some kind of bitmap.  The text part is fairly traditional, and based on Lucene.</li>
<li>One point of this architecture is that one can more or less seamlessly join different kinds of data.</li>
<li>Another point is surely that &#8212; with everything being more or less like a column or bitmap &#8212; memory management and administration are manageable issues.</li>
<li>Despite containing all these wonders, the code is under 10 megs total.  At least right now.  But then &#8212; how much code can one write in a few months?</li>
<li>Andrew didn&#8217;t want me to repeat everything he said about target markets, but clearly Wall Street is one of the top possibilities.</li>
</ul>
<p>Stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2007/12/12/attivio-tries-to-do-it-all/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Clarabridge does SaaS, sees Inxight</title>
		<link>http://www.texttechnologies.com/2007/11/14/clarabridge-saas-inxight-uima-ibm-cognos/</link>
		<comments>http://www.texttechnologies.com/2007/11/14/clarabridge-saas-inxight-uima-ibm-cognos/#comments</comments>
		<pubDate>Wed, 14 Nov 2007 18:11:28 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Clarabridge]]></category>
		<category><![CDATA[Comprehensive or exhaustive extraction]]></category>
		<category><![CDATA[IBM and UIMA]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Text mining]]></category>
		<category><![CDATA[Text mining SaaS]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[Business Objects]]></category>
		<category><![CDATA[Inxight]]></category>
		<category><![CDATA[software as a service]]></category>
		<category><![CDATA[uima]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/2007/11/14/clarabridge-saas-inxight-uima-ibm-cognos/</guid>
		<description><![CDATA[I just had a quick chat with text mining vendor Clarabridge&#8217;s CEO Sid Banerjee.  Naturally, I asked the standard “So who are you seeing in the marketplace the most?” question.  Attensity is unsurprisingly #1.  What&#8217;s new, however, is that Inxight – heretofore not a text mining presence vs. commercially-focused Clarabridge – has [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in">I just had a quick chat with text mining vendor Clarabridge&#8217;s CEO Sid Banerjee.  Naturally, I asked the standard “So who are you seeing in the marketplace the most?” question.  Attensity is unsurprisingly #1.  What&#8217;s new, however, is that Inxight – heretofore not a text mining presence vs. commercially-focused Clarabridge – has begun to show up a bit this quarter, via the Business Objects sales force.  Sid was of course dismissive of their current level of technological readiness and integration – but at least BOBJ/Inxight is showing up now.</p>
<p style="margin-bottom: 0in">The most interesting point was text mining SaaS (Software as a Service).  When Clarabridge first put out its “<a href="http://www.clarabridge.com/PressRelease/tabid/87/Default.aspx?&amp;PressReleaseID=200" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.clarabridge.com');">We offer SaaS now</a>!” announcement, I yawned.  But Sid tells me that about half of Clarabridge&#8217;s deals now are actually SaaS.  The way the SaaS technology works is pretty simple.  The customer gathers together text into a staging database – typically daily or weekly – and it gets sucked into a Clarabridge-managed Clarabridge installation in some high-end SaaS data center.  If there&#8217;s a desire to join the results of the text analysis with some tabular data from the client&#8217;s data warehouse, the needed columns get sent over as well.  And then Clarabridge does its thing. <span id="more-139"></span></p>
<p style="margin-bottom: 0in">Business intelligence has always been an IT systems software technology that often winds up being sold on an application basis to end-user departments.  Clarabridge very much fits that model.  And while BI adoption used to be pretty simple, that&#8217;s increasingly not the case, which is one reason SaaS is appealing.  So this all makes a lot of sense.</p>
<p style="margin-bottom: 0in">Even so, I was surprised to hear that SaaS had so quickly become half of Clarabridge&#8217;s business.  Wow.</p>
<p style="margin-bottom: 0in">Since Clarabridge touts Cognos as an important partner, and <a href="http://www.texttechnologies.com/2007/11/12/everybodys-talking-about-structuredunstructured-integration/" >Cognos is being bought by IBM</a>, I also asked Sid about UIMA.   He basically responded that UIMA was unlikely to become relevant to Clarabridge any time soon, because the way Clarabridge interfaces with other software is SQL.  Up to a point, that makes great sense to me.  But if we buy into the comprehensive/exhaustive extraction story &#8212; as Clarabridge does &#8212; then the day should and will come when serious linguistic processing gets done on text <strong>after</strong> it is extracted into a relational database.   And if that happens, then all of a sudden SQL won&#8217;t be the only interface integrating text analytics with BI.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2007/11/14/clarabridge-saas-inxight-uima-ibm-cognos/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Everybody&#8217;s talking about structured/unstructured integration</title>
		<link>http://www.texttechnologies.com/2007/11/12/everybodys-talking-about-structuredunstructured-integration/</link>
		<comments>http://www.texttechnologies.com/2007/11/12/everybodys-talking-about-structuredunstructured-integration/#comments</comments>
		<pubDate>Mon, 12 Nov 2007 17:04:46 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Business Objects and Inxight]]></category>
		<category><![CDATA[IBM and UIMA]]></category>
		<category><![CDATA[Cognos]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[unstructured data]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/2007/11/12/everybodys-talking-about-structuredunstructured-integration/</guid>
		<description><![CDATA[Today&#8217;s big news is IBM&#8217;s $5 billion acquisition of Cognos.  Part of the analyst conference call was two customer examples of how the companies had worked together in the past &#8212; and one of those two had a lot of &#8220;integration of structured and unstructured data.&#8221;  The application sounded more like a 360-degree [...]]]></description>
			<content:encoded><![CDATA[<p>Today&#8217;s big news is <a href="http://www.dbms2.com/2007/11/12/ibm-is-buying-cognos-%e2%80%93-quick-reactions/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.dbms2.com');">IBM&#8217;s $5 billion acquisition of Cognos</a>.  Part of the analyst conference call was two customer examples of how the companies had worked together in the past &#8212; and one of those two had a lot of &#8220;integration of structured and unstructured data.&#8221;  The application sounded more like a 360-degree customer view, retrieving text documents alongside relational records, than it did like hardcore text analytics.  Even so, it illustrates a trend that I was seeing even before BOBJ&#8217;s buy of Inxight, namely an increasing focus in the business intelligence world on <a href="http://www.texttechnologies.com/2006/09/01/why-the-bi-vendors-are-integrating-with-google-onebox/" >at least the trappings of text analytics</a>.<br />
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2007/11/12/everybodys-talking-about-structuredunstructured-integration/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Business Objects-Inxight update</title>
		<link>http://www.texttechnologies.com/2007/10/17/business-objects-inxight-update/</link>
		<comments>http://www.texttechnologies.com/2007/10/17/business-objects-inxight-update/#comments</comments>
		<pubDate>Wed, 17 Oct 2007 19:24:22 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Application areas]]></category>
		<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Business Objects and Inxight]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Voice of the Customer]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/2007/10/17/business-objects-inxight-update/</guid>
		<description><![CDATA[I&#8217;m at the Business Objects annual user conference, and had a couple of chances to talk with Inxight/text analytics folks. When I asked about areas of commercial application traction, answers were similar to those I got from Attensity and Clarabridge, but not quite the same.  Specifically:

Voice of the Customer is definitely tops.
Some of the [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in">I&#8217;m at the Business Objects annual user conference, and had a couple of chances to talk with Inxight/text analytics folks. When I asked about areas of commercial application traction, answers were similar to those I got from <a href="http://www.texttechnologies.com/2007/10/05/text-mining-applications-as-per-attensity-and-clarabridge/" >Attensity and Clarabridge</a>, but not quite the same.  Specifically:</p>
<ul>
<li>Voice of the Customer is definitely tops.</li>
<li>Some of the other applications Attensity and Clarabridge mentioned appear as well (e.g., antifraud).</li>
<li>Business Objects also has a couple of customers looking at text mining as an aid to medical records, e.g. by helping to catch errors in tabular-field coding.</li>
<li>There are some projects in actual investment research/analysis/trading, e.g. in correlating news announcements and stock price movements.</li>
</ul>
<p style="margin-bottom: 0in">The Business Objects/Inxight folks also made a couple of interesting general technical points.  <span id="more-134"></span>When I challenged the usefulness of text analytics in dashboards, they pointed out how it can at least be a good drill-down. (Example: You&#8217;re getting an unusually large number of customer complaints in a particular time frame; you drill down into a text mining-based graphic to see which particular areas of complaint have spiked.) Also, when I mentioned exhaustive extraction, Ian Hersey pointed out that in many cases the intermediate results of Inxight tagging and so on happen to be stored in an RDBMS.</p>
<p style="margin-bottom: 0in">Perhaps most important, I got a general feeling that Business Objects is serious about integrating Inxight into its general product offerings, which is not good news for independent text mining vendors such as Attensity or Temis (except insofar as it heats up the acquisition market for same). On the other hand, I&#8217;ve gotten nothing but confirmation of my view that Business Objects plans to remain a good OEM partner, even to competitors such as SAS.</p>
<p style="margin-bottom: 0in"><em>This is my first post from the conference.  There surely will be more soon on </em><a href="http://www.dbms2.com/category/products-and-vendors/business-objects/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.dbms2.com');"><span style="font-style: normal">DBMS2</span></a><em> and the</em><span style="font-style: normal"> <a href="http://www.monashreport.com/category/vendors/business-objects/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monashreport.com');">Monash Report</a>.</span></p>
<p style="margin-bottom: 0in"><em>Get great research about text mining, data warehouse appliances, and other hot analytics-related topics! Subscribe to our comprehensive (if not exhaustive) <a href="http://www.monash.com/blogs.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monash.com');">feed</a>, by RSS/Atom or e-mail! We recommend taking the integrated feed for all our blogs, but blog-specific ones are also easily available.</em></p>
<p><em>Technorati Tags: <a href="http://technorati.com/tag/Business+Objects" onclick="javascript:pageTracker._trackPageview('/outbound/article/technorati.com');" rel="tag">Business Objects</a>, <a href="http://technorati.com/tag/Inxight" onclick="javascript:pageTracker._trackPageview('/outbound/article/technorati.com');" rel="tag">Inxight</a>, <a href="http://technorati.com/tag/text+analytics" onclick="javascript:pageTracker._trackPageview('/outbound/article/technorati.com');" rel="tag">text analytics</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2007/10/17/business-objects-inxight-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SAP is acquiring Inxight</title>
		<link>http://www.texttechnologies.com/2007/10/08/sap-is-acquiring-inxight/</link>
		<comments>http://www.texttechnologies.com/2007/10/08/sap-is-acquiring-inxight/#comments</comments>
		<pubDate>Mon, 08 Oct 2007 18:37:51 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Business Objects and Inxight]]></category>
		<category><![CDATA[SAP]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/2007/10/08/sap-is-acquiring-inxight/</guid>
		<description><![CDATA[More precisely, SAP is acquiring Business Objects, and of course Business Objects already acquired Inxight.
 This could be interesting &#8230;
]]></description>
			<content:encoded><![CDATA[<p>More precisely, <a href="http://www.monashreport.com/2007/10/08/some-quick-thoughts-on-sap-acquiring-business-objects/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monashreport.com');">SAP is acquiring Business Objects</a>, and of course Business Objects already acquired Inxight.</p>
<p> This could be interesting &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2007/10/08/sap-is-acquiring-inxight/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Clarabridge approach to text mining</title>
		<link>http://www.texttechnologies.com/2007/10/06/the-clarabridge-approach-to-text-mining/</link>
		<comments>http://www.texttechnologies.com/2007/10/06/the-clarabridge-approach-to-text-mining/#comments</comments>
		<pubDate>Sun, 07 Oct 2007 00:14:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Clarabridge]]></category>
		<category><![CDATA[Comprehensive or exhaustive extraction]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/2007/10/06/the-clarabridge-approach-to-text-mining/</guid>
		<description><![CDATA[And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story.  (Sorry if it sounds clipped, but I&#8217;m a bit burned out &#8230;)

Like Attensity, Clarabridge practices exhaustive extraction.*  That is, they do linguistics against documents, extract all sorts of entities and relationships among the entities from [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in">And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story.  (Sorry if it sounds clipped, but I&#8217;m a bit burned out &#8230;)</p>
<ul>
<li>Like Attensity, Clarabridge practices <em>exhaustive extraction.*  </em>That is, they do linguistics against documents, extract all sorts of entities and relationships among the entities from each document, and dump the results into a relational database.</li>
<li>Unlike Attensity, which uses <a href="http://www.texttechnologies.com/2006/06/24/attensity-extractive-exhaustion-and-the-frn/" >a simple normalized relational schema</a>, Clarabridge dumps the extracted data into a star schema.  (The Clarabridge folks are from MicroStrategy, which – surely not coincidentally – also favors star schemas.)<span id="more-132"></span></li>
<li>For now, the linguistic part of the analysis is within a sentence, or else based on proximity, or (this sounded minor) based on the whole document.   But actual <em><a href="http://en.wikipedia.org/wiki/Anaphora_(linguistics)" onclick="javascript:pageTracker._trackPageview('/outbound/article/en.wikipedia.org');">anaphora</a> resolution</em> is coming soon.</li>
<li>The other big thing that goes into Clarabridge&#8217;s star schema is a category hierarchy, which has two aspects.  One is categories fixed in advance.  When I asked how many, CTO Justin Langseth cited an example range of 10-400.  I.e., it varies widely.  In principle, these are established by line-of-business folks at Clarabridge customers, but I&#8217;d venture to guess that professional services play a significant role as well.</li>
<li>The other kind of categories – subcategories to the first group – are created automagically at data load time via document clustering.  Indeed, they&#8217;re called “clusters.” These are available for drilldown via business intelligence tools.</li>
<li>Obviously it is good practice to have dashboards and scheduled reports depend only on the fixed categories, not the clusters.</li>
</ul>
<p><em>*I should note that Clarabridge understandably bristles a bit at my use of this Attensity-introduced term to describe what they do too. If Clarabridge wants to start talking about, say, “comprehensive extraction,” I&#8217;ll consider adopting that term as well. But for now I&#8217;m going with what&#8217;s most widely used.</em></p>
<p><em>Want to continue getting great research about text mining, data warehouse appliances, and other hot analytics-related topics? Then subscribe to our comprehensive (if not exhaustive) <a href="http://www.monash.com/blogs.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monash.com');">feed</a>, by RSS/Atom or e-mail! We recommend taking the integrated feed for all our blogs, but blog-specific ones are also easily available.</em></p>
<p style="margin-bottom: 0in"><em>Technorati Tags: <a href="http://technorati.com/tag/Clarabridge" onclick="javascript:pageTracker._trackPageview('/outbound/article/technorati.com');" rel="tag">Clarabridge</a>, <a href="http://technorati.com/tag/text+mining" onclick="javascript:pageTracker._trackPageview('/outbound/article/technorati.com');" rel="tag">text mining</a>, <a href="http://technorati.com/tag/exhaustive+extraction" onclick="javascript:pageTracker._trackPageview('/outbound/article/technorati.com');" rel="tag">exhaustive extraction</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2007/10/06/the-clarabridge-approach-to-text-mining/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
