<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Text Technologies &#187; Text mining</title>
	<atom:link href="http://www.texttechnologies.com/category/text-mining/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.texttechnologies.com</link>
	<description>Understanding technology ... in both senses of the phrase</description>
	<lastBuildDate>Sat, 05 Jun 2010 04:23:24 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Maybe text mining SHOULD be playing a bigger role in data warehousing</title>
		<link>http://www.texttechnologies.com/2008/10/24/text-mining-data-warehousin/</link>
		<comments>http://www.texttechnologies.com/2008/10/24/text-mining-data-warehousin/#comments</comments>
		<pubDate>Fri, 24 Oct 2008 04:39:36 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Attensity]]></category>
		<category><![CDATA[Comprehensive or exhaustive extraction]]></category>
		<category><![CDATA[Sentiment analysis]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=289</guid>
		<description><![CDATA[When I chatted last week with David Bean of Attensity, I commented to him on a paradox: 
Many people think text information is important to analyze, but even so data warehouses don&#8217;t seem to wind up holding very much of it. 
My working theory explaining this has two parts, both of which purport to show [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><span style="font-style: normal;">When <a href="http://www.texttechnologies.com/2008/10/24/attensity-update-2/" >I chatted last week with </a></span><a href="http://www.texttechnologies.com/2008/10/24/attensity-update-2/" >David Bean of Attensity</a>, <span style="font-style: normal;">I commented to him on a paradox: </span></p>
<p style="margin-bottom: 0in;"><strong><span style="font-style: normal;">Many people think text information is important to analyze, but even so data warehouses don&#8217;t seem to wind up holding very much of it. </span></strong></p>
<p style="margin-bottom: 0in;"><span id="more-289"></span><span style="font-style: normal;">My working theory explaining this has two parts, both of which purport to show why text data generally doesn&#8217;t fit well into BI or data mining systems. One is that it&#8217;s just too messy and inconsistently organized.  The other </span><span style="font-style: normal;"><span>is that text corpuses generally don&#8217;t contain enough information.</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;"><span>Now, I know that these theories aren&#8217;t wholly true, for I know of counterexamples.  E.g., while I haven&#8217;t written it up yet, I did a call confirming that a recently published </span></span><a href="http://www.spss.com/press/template_view.cfm?PR_ID=1059" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.spss.com');"><span>SPSS text/tabular integrated data mining story</span></a><span style="font-style: normal;"><span> is quite real.  Still, it has felt for a while as if truth lies in those directions.</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;"><span>Anyhow, David offered one useful number range:</span></span></p>
<p><span style="font-style: normal;"><strong>If you do exhaustive extraction on a text corpus, you wind up with 10-20X as much tabular data as you had in text format in the first place.</strong></span><span style="font-style: normal;"><span> (Comparing total bytes to total bytes.)</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;"><span>So how big are those corpuses? I think most text mining installations usually have at least 10s of thousands of documents or verbatims to play with.  Special cases aside, the upper bound seems to usually be about two orders of magnitude higher. And most text-mined documents probably tend to be short, as they commonly are just people&#8217;s reports on a single product/service experience – perhaps 1 KB or so, give or take a factor of 2-3?  So we&#8217;re probably looking at 10 gigabytes of text at the low end, and a few terabytes at the high end, before applying David&#8217;s 10-20X multiplier.</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;"><span>Hmm – that IS enough data for respectable data warehousing &#8230;</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;"><span>Obviously, special cases like national intelligence or very broad-scale web surveys could run larger, as per <a href="http://www.dbms2.com/2008/10/05/marklogic-architecture-deep-dive/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.dbms2.com');">the biggest Marklogic databases</a>.  Medline runs larger too.</span></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/10/24/text-mining-data-warehousin/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Attensity update</title>
		<link>http://www.texttechnologies.com/2008/10/24/attensity-update-2/</link>
		<comments>http://www.texttechnologies.com/2008/10/24/attensity-update-2/#comments</comments>
		<pubDate>Fri, 24 Oct 2008 04:29:24 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Attensity]]></category>
		<category><![CDATA[Clarabridge]]></category>
		<category><![CDATA[Competitive intelligence]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Text mining]]></category>
		<category><![CDATA[Text mining SaaS]]></category>
		<category><![CDATA[Voice of the Customer]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=288</guid>
		<description><![CDATA[I had a brief chat with the Attensity guys at their Teradata Partners Conference booth – mainly CTO David Bean, although he did buck one question to sales chief Jeff Johnson.  The business trends story remained the same as it was in June:  The sweet spot for new sales remains Voice of the [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I had a brief chat with the Attensity guys at their Teradata Partners Conference booth – mainly CTO David Bean, although he did buck one question to sales chief Jeff Johnson.  The business trends story remained the same as it was in <a href="http://www.texttechnologies.com/2008/06/16/attensity-update-updated/" >June</a>:  The sweet spot for new sales remains Voice of the Customer/Voice of the Market, while on-premise/SaaS new-name accounts are split around 50-50 (by number, not revenue).</p>
<p style="margin-bottom: 0in;">David&#8217;s thoughts as to why the SaaS share isn&#8217;t even higher – as it seems to be for <a href="http://www.texttechnologies.com/2008/06/04/clarabridge-is-now-all-about-text-mining-saas/" >Clarabridge</a>* – centered on the point that some customers want to blend internal and external data, and may not want to ship the internal part out to a SaaS provider.  Besides, if it&#8217;s tabular data, I suspect Attensity isn&#8217;t the right place to ship it anyway.</p>
<p style="margin-bottom: 0in;"><em>*Speaking of Clarabridge, CEO Sid Banerjee recently posted a thoughtful company update in <a href="http://www.texttechnologies.com/2008/09/08/attensit-layered-messaging-marketing-model/" >this comment thread.</a></em></p>
<p style="margin-bottom: 0in;">When I challenged him on ease of use, David said that <strong>Attensity is readying a Microstrategy-based offering,</strong> which is obviously meant to compete with Clarabridge and any of its perceived advantages head-on.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/10/24/attensity-update-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Low-latency text mining in the investment market</title>
		<link>http://www.texttechnologies.com/2008/09/19/low-latency-text-mining-in-the-investment-market/</link>
		<comments>http://www.texttechnologies.com/2008/09/19/low-latency-text-mining-in-the-investment-market/#comments</comments>
		<pubDate>Fri, 19 Sep 2008 09:15:58 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[ClearForest/Reuters]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Sentiment analysis]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=282</guid>
		<description><![CDATA[I&#8217;m not at Gartner&#8217;s Event Processing conference, but there seem to be some interesting posts and articles coming out of it.  Seth Grimes has one on Reuters&#8217; integration of text mining and event processing, including sentiment analysis.  Well worth reading.  Lots more detail than I&#8217;ve ever posted on similar applications.
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m not at Gartner&#8217;s Event Processing conference, but there seem to be some interesting posts and articles coming out of it.  Seth Grimes has one on <a href="http://www.intelligententerprise.com/blog/archives/2008/09/event_processin_1.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.intelligententerprise.com');">Reuters&#8217; integration of text mining and event processing</a>, including sentiment analysis.  Well worth reading.  Lots more detail than I&#8217;ve ever posted on <a href="http://www.texttechnologies.com/2006/12/27/text-analytics-is-finally-being-used-for-investment-analysis/" >similar</a> <a href="http://www.texttechnologies.com/2007/08/03/more-on-text-processing-in-cep/" >applications</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/09/19/low-latency-text-mining-in-the-investment-market/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The layered messaging marketing model as applied to Attensity</title>
		<link>http://www.texttechnologies.com/2008/09/08/attensit-layered-messaging-marketing-model/</link>
		<comments>http://www.texttechnologies.com/2008/09/08/attensit-layered-messaging-marketing-model/#comments</comments>
		<pubDate>Mon, 08 Sep 2008 06:52:15 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Attensity]]></category>
		<category><![CDATA[Competitive intelligence]]></category>
		<category><![CDATA[Text mining]]></category>
		<category><![CDATA[Voice of the Customer]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=279</guid>
		<description><![CDATA[My general layered messaging theory survived its first test against an IT vendor example – Netezza.  Let&#8217;s try another, in this case a company that&#8217;s not a Monash Research client.
Attensity is a text mining vendor with a lot of cool technology.  Like other text mining vendors, it&#8217;s had mixed market success at best. [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">My general <a href="http://www.strategicmessaging.com/enterprise-technology-marketing-layered-messaging-model/2008/09/08/#more-35" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.strategicmessaging.com');"><strong>layered messaging</strong></a> theory survived its first test against an IT vendor example – Netezza.  Let&#8217;s try another, in this case a company that&#8217;s not a <a href="http://www.monash.com/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monash.com');"><em>Monash Research</em></a> client.<span id="more-279"></span></p>
<p style="margin-bottom: 0in;">Attensity is a text mining vendor with a lot of cool technology.  Like other text mining vendors, it&#8217;s had mixed market success at best.  However, <a href="../2008/06/10/attensity-update/">sales activity suggests that Attensity recently put together its strongest marketing story ever</a>, specifically in its new <a href="http://www.texttechnologies.com/category/text-analytics-applications/voice-of-the-customer/" >Voice of the Customer</a> / <a href="http://www.texttechnologies.com/category/text-analytics-applications/competitive-intelligence-voice-of-the-market/" >Voice of the Market</a> (VotC/VotM) focus.</p>
<p style="margin-bottom: 0in;"><em><strong>Attensity Voice of the Market messaging stack</strong></em></p>
<ul>
<li>Know what real consumers think 	about your products/services, how they react to your marketing, and 	what stories are being told about you</li>
<li><em>The only way to listen in on actual consumer conversations.  Humans can&#8217;t begin to do this.</em></li>
<li>Mine the Web to find out what&#8217;s 	being said about you; easy SaaS install</li>
<li><em>See – here are real, usable 	results</em></li>
<li>Extraction of the essence from any 	kind of text, as exhibited via proofs-of-concept</li>
</ul>
<p style="margin-bottom: 0in;">That&#8217;s a good story.  The technology works. Prospects can see that it works.  The benefits are self-evident, because the technology gives unique access to highly desirable information. (Obviously, you can&#8217;t have employees sit at their screens and try to read the whole Web on your behalf.)  The cost, time to installation, and so on are attractive.  All is good.</p>
<p style="margin-bottom: 0in;">Let&#8217;s now compare that to what probably was Attensity&#8217;s prior commercial focus, warranty analysis, for products like automobiles, other vehicles, and consumer electronics.  In this market, the story was something like:</p>
<p><em><strong>Attensity warranty messaging stack</strong></em></p>
<ul>
<li>Faster, more 	accurate warning of product problems</li>
<li><em>Human 	reading of the warranty claims is too slow or costly</em></li>
<li>Mine your 	warranty claims to see why your products break</li>
<li><em>See – here are real, usable 	results</em></li>
<li>Extraction of 	the essence from warranty claims, as exhibited via proofs-of-concept</li>
</ul>
<p style="margin-bottom: 0in;">That worked up to a point, which is a big part of why Attensity remained in business.  But in fact, there were relatively few customers for whom the assertion “Human reading of the warranty claims is too slow or costly” was true.  So relatively few sales on that basis were ever made.</p>
<p style="margin-bottom: 0in;">Now, as a market-success-prediction tool, this kind of analysis may seem like overkill.  In essence, all I&#8217;ve done is reiterate:</p>
<ul>
<li>Text mining 	has shown slow growth because too few customers had internal 	corpuses large enough to need it.</li>
<li>If you&#8217;re 	mining the whole Web, however, your corpus is enormous.</li>
</ul>
<p style="margin-bottom: 0in;">But this analysis has another point.  There&#8217;s a text mining industry consensus saying, more or less:</p>
<p style="margin-bottom: 0in;"><em>The text mining industry used to be too focused on the minutiae of technology and especially semantics, but now we&#8217;ve seen the light and are selling straight to business users who don&#8217;t really care about how the stuff works. </em></p>
<p style="margin-bottom: 0in;">As with most views held by a broad consensus of smart people, that one contains a lot of truth. But it&#8217;s missing a next act. Whether or not Attensity, Clarabridge, and TEMIS get acquired soon – as most industry participants seem to expect – it seems inevitable that there will be large, technology-rich contenders in the text mining market.  SAP/Business Objects/Inxight? Oracle/somebody? The enterprise search players? Dow Jones/Factiva?   One way or another, there will eventually be big companies in the text mining market.  Attensity (and the same goes for Clarabridge) isn&#8217;t doing much these days to position itself in advance of such an onslaught.</p>
<p style="margin-bottom: 0in;">Anyhow, whatever you think of my market-evolution views, it sure seems as if the layered-messaging template works in this example as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/09/08/attensit-layered-messaging-marketing-model/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Lexalytics has merged with part of Infonic</title>
		<link>http://www.texttechnologies.com/2008/08/07/lexalytics-has-merged-with-part-of-infonics/</link>
		<comments>http://www.texttechnologies.com/2008/08/07/lexalytics-has-merged-with-part-of-infonics/#comments</comments>
		<pubDate>Thu, 07 Aug 2008 19:59:01 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Lexalytics]]></category>
		<category><![CDATA[Sentiment analysis]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=269</guid>
		<description><![CDATA[As reported on the Lexalytics blog, sentiment analysis specialist Lexalytics has merged with the text analytics division of Infonic to form Lexalytics Limited.   The deal seems to have a screwy financial structure &#8212; which Seth Grimes made a valiant effort to decipher (I think from vacation, poor guy) &#8212; as is common when [...]]]></description>
			<content:encoded><![CDATA[<p>As reported on the Lexalytics blog, sentiment analysis specialist <a href="http://www.lexalytics.com/lexablog/?p=68" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.lexalytics.com');">Lexalytics has merged with the text analytics division of Infonic to form Lexalytics Limited</a>.   The deal seems to have a screwy financial structure &#8212; which Seth Grimes made <a href="http://www.intelligententerprise.com/blog/archives/2008/08/lexalytics_and.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.intelligententerprise.com');">a valiant effort to decipher</a> (I think from vacation, poor guy) &#8212; as is common when companies much too small to be public wind up trading publicly anyway.</p>
<p><em><strong>Related links</strong></em></p>
<ul>
<li><a href="http://www.lexalytics.com/lexablog/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.lexalytics.com');">Lexalytics&#8217; blog</a></li>
<li><a href="http://www.texttechnologies.com/2008/06/17/intro-to-lexalytics/" >Introduction to Lexalytics</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/08/07/lexalytics-has-merged-with-part-of-infonics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>If you think sentiment analysis technology can detect idiom, I have a bridge I&#8217;d like to sell you</title>
		<link>http://www.texttechnologies.com/2008/06/20/if-you-think-sentiment-analysis-technology-can-detect-idiom-i-have-a-bridge-id-like-to-sell-you/</link>
		<comments>http://www.texttechnologies.com/2008/06/20/if-you-think-sentiment-analysis-technology-can-detect-idiom-i-have-a-bridge-id-like-to-sell-you/#comments</comments>
		<pubDate>Fri, 20 Jun 2008 11:40:52 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Lexalytics]]></category>
		<category><![CDATA[Sentiment analysis]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=254</guid>
		<description><![CDATA[Text mining tools are just WONDERFUL at detecting idiom, sarcasm, and figurative speech &#8230; Yeah, right.  I asked Lexalytics CEO Jeff Catlin whether his tool could do that kind of thing, and he looked at me like I&#8217;d just grown a third ear. 
Actually, he didn&#8217;t.  But just like every other sentiment analysis [...]]]></description>
			<content:encoded><![CDATA[<p><em>Text mining tools are just WONDERFUL at detecting idiom, sarcasm, and figurative speech &#8230; Yeah, right.  I asked Lexalytics CEO Jeff Catlin whether his tool could do that kind of thing, and he looked at me like I&#8217;d just grown a third ear. </em></p>
<p>Actually, he didn&#8217;t.  But just like every other sentiment analysis vendor I encountered at the Text Analytics Summit or spoke to beforehand, he made it clear that his tool could only handle straightforward, literal expressions of opinion.  Idiom, irony, sarcasm, metaphor, et al. are beyond the current reach of the technology.</p>
<p><em>Aren&#8217;t you just thrilled that I shared that earth-shattering news with you?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/06/20/if-you-think-sentiment-analysis-technology-can-detect-idiom-i-have-a-bridge-id-like-to-sell-you/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>6 trends that could shake up the text analytics market</title>
		<link>http://www.texttechnologies.com/2008/06/19/6-trends-that-could-shake-up-the-text-analytics-market/</link>
		<comments>http://www.texttechnologies.com/2008/06/19/6-trends-that-could-shake-up-the-text-analytics-market/#comments</comments>
		<pubDate>Thu, 19 Jun 2008 08:33:31 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Enterprise search]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Search engines]]></category>
		<category><![CDATA[Social software and online media]]></category>
		<category><![CDATA[Text mining]]></category>
		<category><![CDATA[Cache']]></category>
		<category><![CDATA[Intersystems]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=251</guid>
		<description><![CDATA[My last two posts were based on the introductory slide to my talk The Text Analytics Marketplace: Competitive landscape and trends. I&#8217;ll now jump straight ahead to the talk&#8217;s conclusion.
Text analytics vendors participate in the same trends as other software and technology vendors.  For example, relational business intelligence and data warehousing products are increasingly [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in; font-style: normal;"><span>My <a href="http://www.texttechnologies.com/2008/06/19/text-analytics-marketplace-competitive-landscape-trends/" >last</a> <a href="http://www.texttechnologies.com/2008/06/19/3-specialized-markets-for-text-analytics/" >two</a> posts were based on the introductory slide to my talk </span><em><span>The Text Analytics Marketplace: Competitive landscape and trends. </span></em><span style="font-style: normal;"><span>I&#8217;ll now jump straight ahead to the talk&#8217;s conclusion.</span></span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span style="font-style: normal;"><span>Text analytics vendors participate in the same trends as other software and technology vendors.  For example, relational business intelligence and data warehousing products are increasingly being sold to departmental buyers.  Those buyers place particularly high value on ease of installation.  And golly gee whiz, both parts of that are also true in text mining. </span></span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span style="font-style: normal;"><span>But beyond such general trends, I&#8217;ve identified six developments that I think could radically transform the text analytics market landscape.  Indeed, they could invalidate the neat little eight-bucket categorization I laid out in the prior post.  Each is highly likely to occur, although in some cases the timing remains greatly in doubt.</span></span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span style="font-style: normal;"><span>These six market-transforming trends are:</span></span></p>
<ol>
<li> Web/enterprise/messaging 	integration</li>
<li> BI 	integration</li>
<li> Universal 	message retention</li>
<li> Portable 	personal profiles</li>
<li> Electronic 	health records</li>
<li> Voice 	command &amp; control</li>
</ol>
<p style="margin-bottom: 0in; font-style: normal;"><span id="more-251"></span><span>I&#8217;ll explain briefly.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>1.  Google and Microsoft are two of the three leaders in web search.  Now that Microsoft has bought FAST, they are also two of the leaders in enterprise search.  They are also two of the leaders in hosted email. Ditto instant messaging.  So </span><strong>there&#8217;s a good chance these various disciplines will converge.</strong></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>2.  There are a number of ways text analytics and traditional analytics can be, and are being, integrated:</span></p>
<ul>
<li><span>Enterprise 	search and business intelligence are akin; both involve digging 	information out of the data you already have.</span></li>
<li><span>Text 	mining is naturally integrated with business intelligence and/or 	data mining.</span></li>
<li><span>There&#8217;s 	a trend toward using text search to dig up business intelligence 	documents such as specific reports, spreadsheets, etc.</span></li>
</ul>
<p style="margin-bottom: 0in; font-style: normal;"><span>To date, the last of these is focused on reports that already exist, rather than queries that could be run on the fly, but I hope and trust the technology will be extended over time.  Natural language queries have merit anyway; </span><strong>I&#8217;d like to see the search box be extended in functionality to a true data-retrieval command line.</strong></p>
<p style="margin-bottom: 0in; font-style: normal;">3.  One of the big purchase drivers of storage, search, and clustering technology is mandates to preserve information and make it available to auditors, regulators, and/or people who want to sue you.  Email in particular is changing from being ephemeral to becoming part of the permanent record.  Well, if the information is being retained anyway, then maybe it&#8217;s time to see how to get useful insight from it.</p>
<p style="margin-bottom: 0in; font-style: normal;"><strong>Right now, a company&#8217;s overall text archives aren&#8217;t being leveraged in the same way data warehouses are.  That will change.</strong></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>4.  For over a decade, online companies have fought to exploit the fact that users were registered with their sites or services, but not with others.  Huge amounts of investment money were wasted in the dot-com bubble because people thought “registered users” was a significant metric, or that ISP subscribers could be directed to proprietary content.  Enormous valuations are being assigned to Facebook and LinkedIn on similar theories today.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>But as site owners and other marketers get ever more aggressive about exploiting user-specific information, users will get ever more sophisticated about controlling it. </span><strong>The obvious solution is for each internet user to control a sophisticated database of their contact information, presence information, actions, preferences, and writings, and to be very selective about which online services are allowed to see which portions of the data. </strong><span>I think that will come about some day, but I don&#8217;t know when.  When it does, text analytics will be affected in a variety of interesting ways.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>5.  Electronic health records are almost unique in IT.  What other enterprise app can you think of for which relational DBMS aren&#8217;t the default underpinning?  (Intersystems&#8217; object-oriented DBMS Cache&#8217; has huge share in the clinical records market.)   Normal tabular data, text, images, sensor output streams – health records have it all.  What&#8217;s more, the health records area is coming upon some very interesting times in the area of data sharing, at least in the US.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>Just as retailing went from being an IT backwater (through the mid-1980s), to a sophisticated user of database technology (1990s), to the leader of the internet revolution (rise of e-commerce), </span><strong>I think health care is due to take a leadership role in IT advances</strong><span>.   And when it does, search, text mining, and voice recognition will all play important roles.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>6.  Most people reading this far have probably watched Star Trek.</span><strong> Well, what is keeping us from being able to command computers in a Star Trek fashion?  Not really that much. </strong><span> Sure, there are some big missing pieces.  We need a mapping from commands to the specific applications that would carry them out.  We also need a more structured kind of analytic middle tier so that there&#8217;s something to map questions to.  But those are solvable problems.  And by the way – when everybody wears headphones, voice commands emanating from the next cubicle are no longer the big annoyance they would be today.  Mobile/small devices only add to the business case for voice recognition advances.</span></p>
<p style="margin-bottom: 0in; font-style: normal;"><span>When voice becomes a primary mode of human/device communication, “text” analytics will be affected in any number of ways.</span></p>
<p style="margin-bottom: 0in;"><em><strong>Related links:</strong></em></p>
<ul>
<li><a href="http://www.texttechnologies.com/2008/06/19/text-analytics-marketplace-competitive-landscape-trends/" >The introductory post in this series</a></li>
<li><a href="http://www.texttechnologies.com/2008/02/03/microsoft-yahoo-synergies/" >19 possible Microsoft/Yahoo synergies</a>, many of them related to text technology convergence, e.g. between web search and enterprise search</li>
<li>The compelling case for <a href="http://www.monashreport.com/2008/01/04/early-thoughts-on-outsourcing-to-google-mail/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monashreport.com');">letting Google handle your enterprise email</a></li>
<li>An old post on <a href="http://www.texttechnologies.com/2006/09/01/why-the-bi-vendors-are-integrating-with-google-onebox/" >why BI vendors flocked to integrate with Google OneBox</a></li>
<li>A proposal to <a href="http://www.texttechnologies.com/2007/02/06/what-is-linkedin-needed-for-absolutely-nothing-and-the-same-goes-for-myspace/" >refactor social networks</a></li>
<li>An old post in which I outlined some of the criteria for <a href="http://www.dbms2.com/2005/11/17/native-xml-storage-part-2-apps/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.dbms2.com');">Profiles 2.0</a></li>
<li><a href="http://www.networkworld.com/community/node/29109" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.networkworld.com');">Why text technologies are going to recombine</a> (in <em>A World of Bytes</em>)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/06/19/6-trends-that-could-shake-up-the-text-analytics-market/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Text Analytics Marketplace: Competitive landscape and trends</title>
		<link>http://www.texttechnologies.com/2008/06/19/text-analytics-marketplace-competitive-landscape-trends/</link>
		<comments>http://www.texttechnologies.com/2008/06/19/text-analytics-marketplace-competitive-landscape-trends/#comments</comments>
		<pubDate>Thu, 19 Jun 2008 07:35:39 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Audio and video search]]></category>
		<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Custom publishing]]></category>
		<category><![CDATA[Enterprise search]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Natural language processing (NLP)]]></category>
		<category><![CDATA[Nuance]]></category>
		<category><![CDATA[Progress and EasyAsk]]></category>
		<category><![CDATA[Search engines]]></category>
		<category><![CDATA[Social software and online media]]></category>
		<category><![CDATA[Spam and antispam]]></category>
		<category><![CDATA[Speech recognition]]></category>
		<category><![CDATA[Structured search]]></category>
		<category><![CDATA[Text Analytics Summit]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=249</guid>
		<description><![CDATA[As I see it, there are eight distinct market areas that each depend heavily on linguistic technology. Five are offshoots of what used to be called “information retrieval”:
1.  Web search
2.  Public-facing site search
3.  Enterprise search and knowledge management
4.  Custom publishing
5.  Text mining and extraction
Three are more standalone:
6.  Spam filtering
7. [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">As I see it, there are eight distinct market areas that each depend heavily on linguistic technology. Five are offshoots of what used to be called “information retrieval”:</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">1.  Web search</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">2.  Public-facing site search</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">3.  Enterprise search and knowledge management</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">4.  Custom publishing</p>
<p style="padding-left: 30px;">5.  Text mining and extraction</p>
<p style="margin-bottom: 0in; font-style: normal;">Three are more standalone:</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">6.  Spam filtering</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">7.  Voice recognition</p>
<p style="margin-bottom: 0in; font-style: normal; padding-left: 30px;">8.  Machine translation</p>
<p><span id="more-249"></span></p>
<p style="margin-bottom: 0in;">This list comes from a talk I gave Monday at the Text Analytics Summit called <em>The Text Analytics Marketplace: Competitive landscape and trends</em>. In half an hour, I covered the first five areas (in Sue Feldman&#8217;s word, at a “gallop”). The slide deck is available at the link below. <span style="font-style: normal;"><span>I plan to break out the material from the talk into a series of blog posts over the next few (or perhaps not-so-few) weeks. </span></span></p>
<p style="margin-bottom: 0in;"><em><strong>Slides:</strong></em></p>
<ul>
<li><a href="http://www.monash.com/Text-analytics-markets-June-2008.ppt" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.monash.com');"><span>The Text Analytics Marketplace: Competitive landscape and trends</span></a></li>
</ul>
<p style="margin-bottom: 0in;"><strong><em>Other posts based on those slides:</em></strong></p>
<ul>
<li><span><a href="http://www.texttechnologies.com/2008/06/19/3-specialized-markets-for-text-analytics/" >Three specialized markets for text analytics</a> (based on Slide 2)</span></li>
<li><span><a href="http://www.texttechnologies.com/2008/06/19/6-trends-that-could-shake-up-the-text-analytics-market/" >6 trends that could shake up the text analytics market</a> (based on Slide 19)</span></li>
<li><span><a href="http://www.networkworld.com/community/node/29109">Why search technologies are going to recombine</a> (in <em>A World of Bytes</em>, based on Slide 19)</span></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/06/19/text-analytics-marketplace-competitive-landscape-trends/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>SPSS update</title>
		<link>http://www.texttechnologies.com/2008/06/17/spss-update/</link>
		<comments>http://www.texttechnologies.com/2008/06/17/spss-update/#comments</comments>
		<pubDate>Tue, 17 Jun 2008 06:51:45 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[SPSS]]></category>
		<category><![CDATA[Text Analytics Summit]]></category>
		<category><![CDATA[Text mining]]></category>
		<category><![CDATA[Voice of the Customer]]></category>
		<category><![CDATA[Attensity]]></category>
		<category><![CDATA[Clarabridge]]></category>
		<category><![CDATA[data mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=245</guid>
		<description><![CDATA[I emailed a bit with Olivier Jouve last week, and chatted with him at the Text Analytics Summit yesterday.  He cited a figure of 2400 SPSS text mining users (unique user organizations).  The majority of these are for a low-cost, desktop-based surveys product.  But when I pressed him, he eventually gave a [...]]]></description>
			<content:encoded><![CDATA[<p>I emailed a bit with Olivier Jouve last week, and chatted with him at the Text Analytics Summit yesterday.  He cited a figure of 2400 SPSS text mining users (unique user organizations).  The majority of these are for a low-cost, desktop-based surveys product.  But when I pressed him, he eventually gave a 500-1000 figure for actual Text Mining For Clementine users.<span id="more-245"></span></p>
<p>That is, of course, far more than any of the independents (e.g. Attensity and Clarabridge) have.  And it&#8217;s focused on marketing-oriented apps &#8212; especially Voice of the Customer &#8212; just as those vendors are.  Even so, they report rarely seeing SPSS, and SPSS agrees with that assessment.</p>
<p>The obvious explanation &#8212; which Olivier does not dispute &#8212; is that Text Mining For Clementine sales are focused on Clementine data mining users.  But that raises an interesting follow-up &#8212; how much data mining are these users really doing on text data?  Attensity and Clarabridge customers do little true data mining, but Olivier asserts that SPSS customers do quite a bit &#8212; predictive modeling, real-time scoring, and the whole enchilada.</p>
<p>By the way, Olivier actually no longer runs SPSS&#8217; text mining business.  He&#8217;s moved to Chicago as VP of Corporate Development, focused on acquisitions.  Coincidentally, he has a glum view of the prospects for independent text analytics companies, and believes the best course for them is to be acquired.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/06/17/spss-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TEMIS tidbits</title>
		<link>http://www.texttechnologies.com/2008/06/17/temis-tidbits/</link>
		<comments>http://www.texttechnologies.com/2008/06/17/temis-tidbits/#comments</comments>
		<pubDate>Tue, 17 Jun 2008 05:27:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Competitive intelligence]]></category>
		<category><![CDATA[Expert System S.p.A.]]></category>
		<category><![CDATA[Sentiment analysis]]></category>
		<category><![CDATA[TEMIS]]></category>
		<category><![CDATA[Text Analytics Summit]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=244</guid>
		<description><![CDATA[The usual TEMIS execs didn&#8217;t make the trip to the Text Analytics Summit this year.  But cofounder Alessandro Zanasi did come, and I chatted with him for a bit.  Alessandro is also author of a recent book on text mining, and pretty much a one-man Italian operation for France-based TEMIS.   Despite [...]]]></description>
			<content:encoded><![CDATA[<p>The usual TEMIS execs didn&#8217;t make the trip to the Text Analytics Summit this year.  But cofounder Alessandro Zanasi did come, and I chatted with him for a bit.  Alessandro is also author of a recent book on text mining, and pretty much a one-man Italian operation for France-based TEMIS.   Despite his nominal 100:1 manpower disadvantage vs. Italian national-champion text analytics vendor Expert System S.p.A., Alessandro proudly rattled off four different Italian accounts he&#8217;d won vs. Expert System, all of them apparently in the government area.</p>
<p>Beyond that, Alessandro denies all the rumors that have grown out of TEMIS being hard to reach recently.  He reports that pharma is still TEMIS&#8217;s big market, but stresses that this covers a range of apps, from research to Voice of the Market. I do get the sense that TEMIS&#8217;s sentiment extraction capabilities are less sophisticated than some of the other vendors&#8217; &#8212; but the other vendors I&#8217;m thinking of are pretty focused on English, SPSS aside.  If you need sentiment analysis in non-English languages &#8212; e.g., French or Italian &#8212; TEMIS should definitely be on your vendor shortlist.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/06/17/temis-tidbits/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
