<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Text Technologies &#187; Comprehensive or exhaustive extraction</title>
	<atom:link href="http://www.texttechnologies.com/category/text-mining/comprehensive-exhaustive-extraction/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.texttechnologies.com</link>
	<description>Understanding technology ... in both senses of the phrase</description>
	<lastBuildDate>Wed, 18 Jan 2012 17:02:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.3</generator>
		<item>
		<title>Maybe text mining SHOULD be playing a bigger role in data warehousing</title>
		<link>http://www.texttechnologies.com/2008/10/24/text-mining-data-warehousin/</link>
		<comments>http://www.texttechnologies.com/2008/10/24/text-mining-data-warehousin/#comments</comments>
		<pubDate>Fri, 24 Oct 2008 04:39:36 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Attensity]]></category>
		<category><![CDATA[Comprehensive or exhaustive extraction]]></category>
		<category><![CDATA[Sentiment analysis]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=289</guid>
		<description><![CDATA[When I chatted last week with David Bean of Attensity, I commented to him on a paradox: Many people think text information is important to analyze, but even so data warehouses don&#8217;t seem to wind up holding very much of it. My working theory explaining this has two parts, both of which purport to show [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><span style="font-style: normal;">When <a href="http://www.texttechnologies.com/2008/10/24/attensity-update-2/" >I chatted last week with </a></span><a href="http://www.texttechnologies.com/2008/10/24/attensity-update-2/" >David Bean of Attensity</a>, <span style="font-style: normal;">I commented to him on a paradox: </span></p>
<p style="margin-bottom: 0in;"><strong><span style="font-style: normal;">Many people think text information is important to analyze, but even so data warehouses don&#8217;t seem to wind up holding very much of it. </span></strong></p>
<p style="margin-bottom: 0in;"><span id="more-289"></span><span style="font-style: normal;">My working theory explaining this has two parts, both of which purport to show why text data generally doesn&#8217;t fit well into BI or data mining systems. One is that it&#8217;s just too messy and inconsistently organized.  The other </span><span style="font-style: normal;"><span>is that text corpuses generally don&#8217;t contain enough information.</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;"><span>Now, I know that these theories aren&#8217;t wholly true, for I know of counterexamples.  E.g., while I haven&#8217;t written it up yet, I did a call confirming that a recently published </span></span><a href="http://www.spss.com/press/template_view.cfm?PR_ID=1059" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.spss.com');"><span>SPSS text/tabular integrated data mining story</span></a><span style="font-style: normal;"><span> is quite real.  Still, it has felt for a while as if truth lies in those directions.</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;"><span>Anyhow, David offered one useful number range:</span></span></p>
<p><span style="font-style: normal;"><strong>If you do exhaustive extraction on a text corpus, you wind up with 10-20X as much tabular data as you had in text format in the first place.</strong></span><span style="font-style: normal;"><span> (Comparing total bytes to total bytes.)</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;"><span>So how big are those corpuses? I think most text mining installations usually have at least 10s of thousands of documents or verbatims to play with.  Special cases aside, the upper bound seems to usually be about two orders of magnitude higher. And most text-mined documents probably tend to be short, as they commonly are just people&#8217;s reports on a single product/service experience – perhaps 1 KB or so, give or take a factor of 2-3?  So we&#8217;re probably looking at 10 gigabytes of text at the low end, and a few terabytes at the high end, before applying David&#8217;s 10-20X multiplier.</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;"><span>Hmm – that IS enough data for respectable data warehousing &#8230;</span></span></p>
<p style="margin-bottom: 0in;"><span style="font-style: normal;"><span>Obviously, special cases like national intelligence or very broad-scale web surveys could run larger, as per <a href="http://www.dbms2.com/2008/10/05/marklogic-architecture-deep-dive/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.dbms2.com');">the biggest Marklogic databases</a>.  Medline runs larger too.</span></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/10/24/text-mining-data-warehousin/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>5 ideas for how to pick between Attensity and Clarabridge</title>
		<link>http://www.texttechnologies.com/2008/06/10/attensity-clarabridge/</link>
		<comments>http://www.texttechnologies.com/2008/06/10/attensity-clarabridge/#comments</comments>
		<pubDate>Tue, 10 Jun 2008 23:43:51 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Attensity]]></category>
		<category><![CDATA[Clarabridge]]></category>
		<category><![CDATA[Competitive intelligence]]></category>
		<category><![CDATA[Comprehensive or exhaustive extraction]]></category>
		<category><![CDATA[Text mining]]></category>
		<category><![CDATA[Voice of the Customer]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/?p=236</guid>
		<description><![CDATA[Jim D. of UPS asked in the comment thread to the recent Attensity update post how one should decide between Attensity and Clarabridge. I wrote an answer, and then decided to just split it out in a separate post. Here are five ideas about how to pick between Attensity and Clarabridge for the kind of [...]]]></description>
			<content:encoded><![CDATA[<p>Jim D. of UPS asked in the comment thread to the recent <a href="http://www.texttechnologies.com/2008/06/10/attensity-update/" >Attensity update</a> post how one should decide between Attensity and Clarabridge.  I wrote an answer, and then decided to just split it out in a separate post.  Here are five ideas about how to pick between Attensity and Clarabridge for the kind of Voice of the Customer/Market application both companies are focusing on.</p>
<p>1.  Attensity is an older company than Clarabridge, and is good at more things.  Is Clarabridge really good at everything you want them to be?</p>
<p>2.  In particular, Attensity has more overall sophistication at linguistic extraction.  Do any of the differences matter to you?</p>
<p>3.  Both companies are working hard on ease of use, for multiple kinds of user (business user tweaking linguistic rules, IT user, etc.).  Whose approach and feature set do you like better?</p>
<p>4.  Usually, buying one of these products involves some professional services.  Whose organization do you like better?</p>
<p>5. Attensity&#8217;s default database schema for its exhaustive extraction is pretty flat and normalized, as befits a happy Teradata partner.  Clarabridge&#8217;s is more of a star schema, as befits a bunch of ex-Microstrategy guys.  Either can be straightforwardly translated into the other, so you may not care &#8212; but do you?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2008/06/10/attensity-clarabridge/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Clarabridge does SaaS, sees Inxight</title>
		<link>http://www.texttechnologies.com/2007/11/14/clarabridge-saas-inxight-uima-ibm-cognos/</link>
		<comments>http://www.texttechnologies.com/2007/11/14/clarabridge-saas-inxight-uima-ibm-cognos/#comments</comments>
		<pubDate>Wed, 14 Nov 2007 18:11:28 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Clarabridge]]></category>
		<category><![CDATA[Comprehensive or exhaustive extraction]]></category>
		<category><![CDATA[IBM and UIMA]]></category>
		<category><![CDATA[Software as a Service (SaaS)]]></category>
		<category><![CDATA[Text mining]]></category>
		<category><![CDATA[Text mining SaaS]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[Business Objects]]></category>
		<category><![CDATA[Inxight]]></category>
		<category><![CDATA[software as a service]]></category>
		<category><![CDATA[uima]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/2007/11/14/clarabridge-saas-inxight-uima-ibm-cognos/</guid>
		<description><![CDATA[I just had a quick chat with text mining vendor Clarabridge&#8217;s CEO Sid Banerjee. Naturally, I asked the standard “So who are you seeing in the marketplace the most?” question. Attensity is unsurprisingly #1. What&#8217;s new, however, is that Inxight – heretofore not a text mining presence vs. commercially-focused Clarabridge – has begun to show [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in">I just had a quick chat with text mining vendor Clarabridge&#8217;s CEO Sid Banerjee.  Naturally, I asked the standard “So who are you seeing in the marketplace the most?” question.  Attensity is unsurprisingly #1.  What&#8217;s new, however, is that Inxight – heretofore not a text mining presence vs. commercially-focused Clarabridge – has begun to show up a bit this quarter, via the Business Objects sales force.  Sid was of course dismissive of their current level of technological readiness and integration – but at least BOBJ/Inxight is showing up now.</p>
<p style="margin-bottom: 0in">The most interesting point was text mining SaaS (Software as a Service).  When Clarabridge first put out its “<a href="http://www.clarabridge.com/PressRelease/tabid/87/Default.aspx?&amp;PressReleaseID=200" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.clarabridge.com');">We offer SaaS now</a>!” announcement, I yawned.  But Sid tells me that about half of Clarabridge&#8217;s deals now are actually SaaS.  The way the SaaS technology works is pretty simple.  The customer gathers together text into a staging database – typically daily or weekly – and it gets sucked into a Clarabridge-managed Clarabridge installation in some high-end SaaS data center.  If there&#8217;s a desire to join the results of the text analysis with some tabular data from the client&#8217;s data warehouse, the needed columns get sent over as well.  And then Clarabridge does its thing. <span id="more-139"></span></p>
<p style="margin-bottom: 0in">It has always been the case that business intelligence was an IT systems software technology that often wound up being sold on an application basis to end-user departments.  Clarabridge very much fits that model.  And while it used to be the case that BI adoption was pretty simple, that&#8217;s increasingly not the case, which is one reason SaaS is appealing.  So this all makes a lot of sense.</p>
<p style="margin-bottom: 0in">Even so, I was surprised to hear that SaaS had so quickly become half of Clarabridge&#8217;s business.  Wow.</p>
<p style="margin-bottom: 0in">Since Clarabridge touts Cognos as an important partner, and <a href="http://www.texttechnologies.com/2007/11/12/everybodys-talking-about-structuredunstructured-integration/" >Cognos is being bought by IBM</a>, I also asked Sid about UIMA.   He basically responded that UIMA was unlikely to become relevant to Clarabridge any time soon, because the way Clarabridge interfaces with other software is SQL.  Up to a point, that makes great sense to me.  But if we buy into the comprehensive/exhaustive extraction story &#8212; as Clarabridge does &#8212; then the day should and will come when serious linguistic processing gets done on text <strong>after</strong> it is extracted into a relational database.   And if that happens, then all of a sudden SQL won&#8217;t be the only interface integrating text analytics with BI.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2007/11/14/clarabridge-saas-inxight-uima-ibm-cognos/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Clarabridge approach to text mining</title>
		<link>http://www.texttechnologies.com/2007/10/06/the-clarabridge-approach-to-text-mining/</link>
		<comments>http://www.texttechnologies.com/2007/10/06/the-clarabridge-approach-to-text-mining/#comments</comments>
		<pubDate>Sun, 07 Oct 2007 00:14:23 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Clarabridge]]></category>
		<category><![CDATA[Comprehensive or exhaustive extraction]]></category>
		<category><![CDATA[Ontologies]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/2007/10/06/the-clarabridge-approach-to-text-mining/</guid>
		<description><![CDATA[And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story. (Sorry if it sounds clipped, but I&#8217;m a bit burned out &#8230;) Like Attensity, Clarabridge practices exhaustive extraction.* That is, they do linguistics against documents, extract all sorts of entities and relationships among the entities from each [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in">And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story.  (Sorry if it sounds clipped, but I&#8217;m a bit burned out &#8230;)</p>
<ul>
<li>Like Attensity, Clarabridge practices <em>exhaustive extraction.*  </em>That is, they do linguistics against documents, extract all sorts of entities and relationships among the entities from each document, and dump the results into a relational database.</li>
<li>Unlike Attensity, which uses <a href="http://www.texttechnologies.com/2006/06/24/attensity-extractive-exhaustion-and-the-frn/" >a simple normalized relational schema</a>, Clarabridge dumps the extracted data into a star schema.  (The Clarabridge folks are from Microstrategy, which – surely not coincidentally – also favors star schemas.)<span id="more-132"></span></li>
<li>For now, the linguistic part of the analysis is within a sentence, or else based on proximity, or (this sounded minor) based on the whole document.   But actual <em><a href="http://en.wikipedia.org/wiki/Anaphora_(linguistics)" onclick="javascript:pageTracker._trackPageview('/outbound/article/en.wikipedia.org');">anaphora</a> resolution</em> is coming soon.</li>
<li>The other big thing that goes into Clarabridge&#8217;s star schema is a category hierarchy, which has two aspects.  One is categories fixed in advance.  When I asked how many, CTO Justin Langseth cited an example range of 10-400.  I.e., it varies widely.  In principle, these are established by line-of-business folks at Clarabridge customers, but I&#8217;d venture to guess that professional services play a significant role as well.</li>
<li>The other kind of categories – subcategories to the first group – are created automagically at data load time via document clustering.  Indeed, they&#8217;re called “clusters.” These are available for drilldown via business intelligence tools.</li>
<li>Obviously it is good practice to have dashboards and scheduled reports depend only on the fixed categories, not the clusters.</li>
</ul>
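<p style="margin-bottom: 0in">The category setup described in the bullets above can be sketched as a tiny star schema. This is an illustration only, written in Python with the standard-library sqlite3 module. The table names (fact_mention, dim_category), the columns, and the sample rows are all hypothetical and are not taken from Clarabridge&#8217;s actual schema.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension table: fixed categories have no parent; clusters hang under one.
# Every name here is an assumption made up for this sketch.
conn.execute(
    "CREATE TABLE dim_category ("
    " category_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " kind TEXT NOT NULL CHECK (kind IN ('fixed', 'cluster')),"
    " parent_id INTEGER REFERENCES dim_category(category_id))"
)
# Fact table: one row per extracted mention, keyed to its most specific category.
conn.execute(
    "CREATE TABLE fact_mention ("
    " mention_id INTEGER PRIMARY KEY,"
    " doc_id INTEGER NOT NULL,"
    " category_id INTEGER NOT NULL REFERENCES dim_category(category_id))"
)

conn.executemany(
    "INSERT INTO dim_category VALUES (?, ?, ?, ?)",
    [
        (1, "Room", "fixed", None),
        (2, "Staff", "fixed", None),
        (3, "cold shower", "cluster", 1),    # clusters would be created at load time
        (4, "noisy hallway", "cluster", 1),
    ],
)
conn.executemany(
    "INSERT INTO fact_mention VALUES (?, ?, ?)",
    [(1, 100, 3), (2, 101, 3), (3, 102, 4), (4, 103, 2)],
)

def report_by_fixed_category(db):
    # Roll every mention up to its fixed category, so the report keeps the
    # same shape when the clusters are regenerated.
    rows = db.execute(
        "SELECT COALESCE(parent.name, cat.name) AS fixed_name, COUNT(*)"
        " FROM fact_mention AS f"
        " JOIN dim_category AS cat ON cat.category_id = f.category_id"
        " LEFT JOIN dim_category AS parent ON parent.category_id = cat.parent_id"
        " GROUP BY fixed_name ORDER BY fixed_name"
    ).fetchall()
    return dict(rows)

print(report_by_fixed_category(conn))
```

<p style="margin-bottom: 0in">The report query never names a cluster, which is the good-practice point in the last bullet: clusters stay available for drilldown, but a scheduled report built this way should survive a reload that reshuffles them.</p>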
<p><em>*I should note that Clarabridge understandably bristles a bit at my use of this Attensity-introduced term to describe what they do too. If Clarabridge wants to start talking about, say, “comprehensive extraction,” I&#8217;ll consider adopting that term as well. But for now I&#8217;m going with what&#8217;s most widely used.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2007/10/06/the-clarabridge-approach-to-text-mining/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>When to use exhaustive extraction</title>
		<link>http://www.texttechnologies.com/2007/10/05/when-to-use-exhaustive-extraction/</link>
		<comments>http://www.texttechnologies.com/2007/10/05/when-to-use-exhaustive-extraction/#comments</comments>
		<pubDate>Sat, 06 Oct 2007 00:54:52 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Attensity]]></category>
		<category><![CDATA[Clarabridge]]></category>
		<category><![CDATA[Comprehensive or exhaustive extraction]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/2007/10/05/when-to-use-exhaustive-extraction/</guid>
		<description><![CDATA[I&#8217;ve been emailing and/or talking with both Clarabridge and Attensity this week. Since they&#8217;re the two big proponents of exhaustive extraction, I naturally asked whether there are any cases exhaustive extraction should not be used. In Clarabridge&#8217;s case, it turns out exhaustive extraction is the default, and no customer has ever turned this default off. [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been emailing and/or talking with both Clarabridge and Attensity this week.   Since they&#8217;re the two big proponents of <a href="http://www.texttechnologies.com/2006/06/24/attensity-extractive-exhaustion-and-the-frn/" >exhaustive extraction</a>, I naturally asked whether there are any cases exhaustive extraction should not be used.   In Clarabridge&#8217;s case, it turns out exhaustive extraction is the default, and no customer has ever turned this default off.   However, their current high end is several million documents* per year.  They suspect that in some current projects with much higher volumes the default may finally be turned off.<span id="more-129"></span></p>
<p><em>*Actually, the word Clarabridge CTO Justin Langseth used was &#8220;verbatim.&#8221;  But that&#8217;s essentially a synonym for document, only with the connotation that these documents will probably be people&#8217;s statements (think warranty cards, customer surveys, email, call center notes, etc.), with all that implies for their grammar, structure (or lack thereof), and so on. </em></p>
<p>I didn&#8217;t push Attensity for an answer that clear.   What they said was simply that all their capabilities were integrated together, so everybody uses exhaustive extraction.  I imagine they&#8217;d say something similar to Clarabridge if pressed, but it seems I should follow up a little bit further &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2007/10/05/when-to-use-exhaustive-extraction/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>David Bean of Attensity explains sentiment and other qualifiers</title>
		<link>http://www.texttechnologies.com/2007/10/05/david-bean-of-attensity-explains-sentiment-and-other-qualifiers/</link>
		<comments>http://www.texttechnologies.com/2007/10/05/david-bean-of-attensity-explains-sentiment-and-other-qualifiers/#comments</comments>
		<pubDate>Sat, 06 Oct 2007 00:36:38 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Attensity]]></category>
		<category><![CDATA[Comprehensive or exhaustive extraction]]></category>
		<category><![CDATA[Sentiment analysis]]></category>
		<category><![CDATA[Text mining]]></category>
		<category><![CDATA[Voice of the Customer]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/2007/10/05/david-bean-of-attensity-explains-sentiment-and-other-qualifiers/</guid>
		<description><![CDATA[David Bean of Attensity is rightly one of the most popular explainers of text mining, for his clarity and personality alike. I shot a question to him about how Attensity&#8217;s exhaustive extraction strategy handled sentiment and so on. He responded with an email that contains the best overall explanation of sentiment analysis in text mining [...]]]></description>
			<content:encoded><![CDATA[<p>David Bean of Attensity is rightly one of the most popular explainers of text mining, for his clarity and personality alike.   I shot a question to him about how Attensity&#8217;s <a href="http://www.texttechnologies.com/2006/06/24/attensity-extractive-exhaustion-and-the-frn/" >exhaustive extraction</a> strategy handled sentiment and so on.  He responded with an email that contains the best overall explanation of sentiment analysis in text mining I&#8217;ve seen anywhere.   Naturally, this is rolled into an Attensity-specific worldview and sales pitch &#8212; but so what?<span id="more-128"></span></p>
<blockquote>
<p style="margin-bottom: 0in">Our exhaustive extraction approach doesn&#8217;t compromise detection of qualifiers* because we recognize the qualifications while we have access to the complete linguistic information of the input.  Much of that information is later stripped away, since it&#8217;s way more information than a user would want.  We make sure we project qualifications like you mention in the final representations.  In fact, we&#8217;ve put a lot of effort into recognizing &#8220;voicing,&#8221; i.e. distinguishing among negations, conditional statements, and variations in the degree of sentiment.</p>
<p style="margin-bottom: 0in">Examples will help here:</p>
<p style="margin-bottom: 0in">    (1)     I want to return the espresso machine.  (intention to return)</p>
<p style="margin-bottom: 0in">    (2)     I plan on returning the espresso machine. (intention to return)</p>
<p style="margin-bottom: 0in">    (3)     I won&#8217;t return the espresso machine. (negation &#8211; not a return)</p>
<p style="margin-bottom: 0in">    (4)     I returned no espresso machines. (negation &#8211; not a return)</p>
<p style="margin-bottom: 0in">    (5)     I failed to return the espresso machine.  (negation &#8211; not a return)</p>
<p style="margin-bottom: 0in">    (6)     If you don&#8217;t return my phone call, I will return the espresso machine.  (conditional &#8211; threat to return)</p>
<p style="margin-bottom: 0in">    (7)     I&#8217;ve returned espresso machines twice already.  (recurrence &#8211; repeated returns)</p>
<p style="margin-bottom: 0in">    (8)     I tried to return the espresso machine.  (attempt to return, negation &#8211; not a return)</p>
<p style="margin-bottom: 0in">    (9)     I failed to return the espresso machine.  (failed attempt, negation &#8211; not a return)</p>
<p style="margin-bottom: 0in">    (10)    I refuse to return the espresso machine.  (negation &#8211; not a return)</p>
<p style="margin-bottom: 0in">    (11)    I need to return the espresso machine now/asap.  (urgency)</p>
<p style="margin-bottom: 0in">    (12)    I&#8217;m unhappy.  (unhappy, duh)</p>
<p style="margin-bottom: 0in">    (13)    I&#8217;m really unhappy.  (augmented unhappiness)</p>
<p style="margin-bottom: 0in">    (14)    The tires were over-inflated.  (augmented inflation&#8230;works on non-sentiment qualities too)</p>
<p style="margin-bottom: 0in">    (15)    The breakfast was under-cooked.  (diminished)</p>
<p style="margin-bottom: 0in">    (16)    The water in the shower this morning was way too cold. (augmented coldness)</p>
<p style="margin-bottom: 0in">    (17)    I will speak to the customer about returning the espresso machine.  (indefinite &#8211; not a return, yet)</p>
<p style="margin-bottom: 0in">&nbsp;</p>
<p style="margin-bottom: 0in">If we&#8217;re using our Fact Relationship Network style of extraction to look at these sentences, those voicing variations get represented on the mode* (typically), so you&#8217;d see output like:</p>
<p>return (intent)<br />
return (not)<br />
return (if/then)<br />
return (again)<br />
return (urgency)<br />
happy (not)<br />
happy (not, augmented)<br />
cooked (diminished)<br />
cold (augmented)</p></blockquote>
<p style="margin-bottom: 0in"><em>*Editor&#8217;s note:  “Mode” means, in effect, “behavior or action.”  It&#8217;s not a typo for “node.”</em></p>
<blockquote>
<p style="margin-bottom: 0in">Post-extraction, any of these voicings can be used to roll up several FRN extractions into a collection that makes sense to the business, e.g. &#8220;water | cold (augmented)&#8221; and &#8220;water | hot (not).&#8221;  What makes all that possible is that the core engine has access to a great deal of linguistic information before it turns the extraction into a specific type of representation like an FRN.  Such linguistic information includes the notions of negating verbs (failed to &lt;x&gt;), double negatives, negative quantifiers that transfer their negation to the verb (no animals were harmed&#8230;), adverbial prepositional phrases (I returned the espresso machine in a fit of rage.) and so on.  We think that&#8217;s a big deal &#8211; it lets us get a true count of, in these examples, product returns &#8211; not the returns of phone calls, or the threatened returns, the intentional returns, or the non-returns.  We used this kind of distinctive power to show a retailer how they could identify customers who were threatening to return products, thereby detecting a set of product recalls that could be saved (before they ended up costing the retailer $$$).</p>
</blockquote>
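<p style="margin-bottom: 0in">To make the counting point concrete, here is a small Python sketch. The (mode, voicings) tuples are a made-up stand-in for FRN output, labeled by hand from the numbered examples above. The names NOT_A_RETURN, count_true_returns, threatened_returns, and ROLLUP are hypothetical, and the rule for which voicings disqualify a return is an assumption, not Attensity&#8217;s logic or API.</p>

```python
from collections import Counter

# Hand-labeled stand-ins for extracted (mode, voicings) pairs, loosely following
# the numbered examples in the email. The format is an assumption for illustration.
extractions = [
    ("return", ("intent",)),     # (1) I want to return ...
    ("return", ("not",)),        # (3) I won't return ...
    ("return", ("if/then",)),    # (6) If you don't return my phone call ...
    ("return", ("again",)),      # (7) I've returned ... twice already
    ("return", ()),              # hypothetical plain statement: I returned the machine
    ("return", ("urgency",)),    # (11) I need to return ... now/asap
]

# Voicings taken to mean that no return has actually happened (assumed rule).
NOT_A_RETURN = {"intent", "not", "if/then", "urgency"}

def count_true_returns(rows):
    # Count only return extractions that carry no disqualifying voicing.
    return sum(
        1
        for mode, voicings in rows
        if mode == "return" and not NOT_A_RETURN.intersection(voicings)
    )

def threatened_returns(rows):
    # Conditional returns are the threats a retailer might still head off.
    return sum(
        1 for mode, voicings in rows if mode == "return" and "if/then" in voicings
    )

# Roll several extractions up into one business-level label, as in
# "water | cold (augmented)" and "water | hot (not)".
ROLLUP = {
    ("water", "cold", ("augmented",)): "water too cold",
    ("water", "hot", ("not",)): "water too cold",
}

def roll_up(facts):
    return Counter(ROLLUP.get(fact, "uncategorized") for fact in facts)

print(count_true_returns(extractions))
print(threatened_returns(extractions))
print(roll_up([("water", "cold", ("augmented",)), ("water", "hot", ("not",))]))
```

<p style="margin-bottom: 0in">With this sample, only the plain statement and the recurrence example count as actual returns, and the conditional one is counted separately as a threat. A naive keyword count of &#8220;return&#8221; would have reported all six.</p>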
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2007/10/05/david-bean-of-attensity-explains-sentiment-and-other-qualifiers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Clarabridge takes on Attensity</title>
		<link>http://www.texttechnologies.com/2007/03/26/clarabridge-takes-on-attensity/</link>
		<comments>http://www.texttechnologies.com/2007/03/26/clarabridge-takes-on-attensity/#comments</comments>
		<pubDate>Tue, 27 Mar 2007 00:36:38 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Attensity]]></category>
		<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Clarabridge]]></category>
		<category><![CDATA[Comprehensive or exhaustive extraction]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/2007/03/26/clarabridge-takes-on-attensity/</guid>
		<description><![CDATA[Text mining newbie Clarabridge gave me the all-too-customary “Please let us brief you, but then don’t write about it for a while” routine. Now that it’s OK to post, what I’m up for offering is a few salient points in bullet form. The closest analogy to what Clarabridge does is Attensity’s new(ish) strategy – extract [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal">Text mining newbie <a href="http://www.clarabridge.com/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.clarabridge.com');">Clarabridge</a> gave me the all-too-customary “Please let us brief you, but then don’t write about it for a while” routine.  Now that it’s OK to post, what I’m up for offering is a few salient points in bullet form.</p>
<ul>
<li>The closest analogy to what Clarabridge does is <a href="http://www.texttechnologies.com/2006/06/24/attensity-extractive-exhaustion-and-the-frn/" >Attensity’s new(ish) strategy</a> – extract “facts” from documents and dump them into a relational database management system.  In particular, Clarabridge and Attensity alike make the case “Our categorization is more flexible because it’s applied only after the extraction happens.”</li>
<li>Clarabridge’s sweet spot is extracting user opinions from short documents. E.g., the customer use cases they talk about are customer feedback forms, public blog postings, etc. about A. hotels and B. consumer software products.</li>
<li>Clarabridge has a strong business intelligence mentality, describing the product as “ETL for unstructured data.”  But then, it’s spun out of <a href="http://www.claraview.com/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.claraview.com');">a BI consultancy</a> that itself was founded by <a href="http://www.microstrategy.com/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.microstrategy.com');">Microstrategy</a> veterans.</li>
<li>Clarabridge uses a different database schema than Attensity.  Attensity’s fact-relationship network (FRN) is basically just two thin, long tables.  Clarabridge, however, uses a Microstrategy-like star schema, in which different kinds of things that you can tokenize correspond to different dimensions.</li>
</ul>
<p class="MsoNormal">Frankly, if somebody wants an alternative to the Attensity/Teradata/Business Objects partnership they could do worse than talk with Clarabridge.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2007/03/26/clarabridge-takes-on-attensity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Attensity, extractive exhaustion, and the FRN</title>
		<link>http://www.texttechnologies.com/2006/06/24/attensity-extractive-exhaustion-and-the-frn/</link>
		<comments>http://www.texttechnologies.com/2006/06/24/attensity-extractive-exhaustion-and-the-frn/#comments</comments>
		<pubDate>Sun, 25 Jun 2006 02:40:27 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Attensity]]></category>
		<category><![CDATA[BI integration]]></category>
		<category><![CDATA[Comprehensive or exhaustive extraction]]></category>
		<category><![CDATA[Text Analytics Summit]]></category>
		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.texttechnologies.com/2006/06/24/attensity-extractive-exhaustion-and-the-frn/</guid>
		<description><![CDATA[Two of the clearest and most charismatic speakers in the text mining business are Attensity cofounders Todd Wakefield and David Bean. Last year, Todd’s Text Mining Summit speech gave an excellent overview of the various application areas in which text mining was being adopted; vestiges of that material may be found in a blog post [...]]]></description>
			<content:encoded><![CDATA[<p>Two of the clearest and most charismatic speakers in the text mining business are Attensity cofounders Todd Wakefield and David Bean.  Last year, Todd’s Text Mining Summit speech gave an excellent overview of the various application areas in which text mining was being adopted; vestiges of that material may be found in <a href="http://www.computerworld.com/blogs/node/336" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.computerworld.com');">a blog post</a> I made at the time, and on <a href="http://www.attensity.com/www/solutions/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.attensity.com');">Attensity’s web site</a>.  This time, David’s Text Analytics Summit speech was basically a <a href="http://www.attensity.com/www/news_events/press_releases/061906.php" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.attensity.com');">pitch</a> for Attensity’s latest product release – and it was a pitch well worth hearing.<br />
<span id="more-23"></span></p>
<p class="MsoNormal">The basic story is that selective fact extraction from text is a knowledge-engineering-intensive process.  You need to determine which facts to extract, and then determine how to extract those particular kinds of facts.  So Attensity has a better idea; it will extract all facts, not just some, and dump them in a “fact relationship network” (FRN).  The FRN is two relational tables, one for facts and one for relationships, suitable for copying to a Teradata machine.  Attensity calls this “exhaustive extraction.”</p>
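<p class="MsoNormal">As a rough illustration of the two-table idea, here is a minimal Python sketch using the standard-library sqlite3 module. The table and column names (facts, relationships, doc_id, and so on) and the sample rows are hypothetical, invented for this example. Nothing here reflects Attensity’s actual schema, only the description above of one table for facts and one for relationships.</p>

```python
import sqlite3

def build_frn(conn):
    # Two thin tables: one row per extracted fact, one row per link between facts.
    # All names are illustrative assumptions, not a real product layout.
    conn.execute(
        "CREATE TABLE facts ("
        " fact_id INTEGER PRIMARY KEY,"
        " doc_id INTEGER NOT NULL,"
        " fact_type TEXT NOT NULL,"
        " fact_text TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE relationships ("
        " from_fact INTEGER NOT NULL REFERENCES facts(fact_id),"
        " to_fact INTEGER NOT NULL REFERENCES facts(fact_id),"
        " relation TEXT NOT NULL)"
    )

def load_example(conn):
    # Pretend extraction output for one document:
    # "The customer returned the espresso machine."
    conn.executemany(
        "INSERT INTO facts VALUES (?, ?, ?, ?)",
        [
            (1, 100, "entity", "customer"),
            (2, 100, "action", "return"),
            (3, 100, "entity", "espresso machine"),
        ],
    )
    conn.executemany(
        "INSERT INTO relationships VALUES (?, ?, ?)",
        [(1, 2, "agent_of"), (2, 3, "acts_on")],
    )

def objects_of(conn, action):
    # Decide after extraction which facts matter: here, what was acted upon.
    rows = conn.execute(
        "SELECT obj.fact_text FROM facts AS act"
        " JOIN relationships AS r ON r.from_fact = act.fact_id"
        " JOIN facts AS obj ON obj.fact_id = r.to_fact"
        " WHERE act.fact_text = ? AND r.relation = 'acts_on'"
        " ORDER BY obj.fact_text",
        (action,),
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
build_frn(conn)
load_example(conn)
print(objects_of(conn, "return"))
```

<p class="MsoNormal">The question being asked (what got returned?) is posed in SQL only after everything has been loaded, which is the separation of extraction from later decisions that the pitch emphasizes.</p>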
<p class="MsoNormal">To some extent, exhaustive extraction amounts to what in the math biz is called restating the problem.</p>
<ul style="margin-top: 0in;" type="disc">
<li class="MsoNormal">Old version:  You need to determine which kinds of facts to get out of the documents, and what those facts might look like.</li>
<li class="MsoNormal">New version:  Same two challenges, but now vis-à-vis the FRN.</li>
</ul>
<p class="MsoNormal">Still, this approach would seem to offer some nice advantages.  Separating the initial extraction from later lexicography is pure goodness, for all the reasons that modularity is generally good.  The same goes for separating the initial extraction from later decisions as to just what information it is you care about anyway.  And generally, this approach should help in applications where somebody might say, in David’s phrase, “I don’t know what I’m looking for, but I’ll know it when I see it.”</p>
]]></content:encoded>
			<wfw:commentRss>http://www.texttechnologies.com/2006/06/24/attensity-extractive-exhaustion-and-the-frn/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

