David Bean of Attensity is rightly one of the most popular explainers of text mining, as much for his personality as for his clarity. I shot him a question about how Attensity’s exhaustive extraction strategy handles sentiment qualifiers – negation, conditionals, degree, and the like. He responded with an email containing the best overall explanation of sentiment handling in text mining I’ve seen anywhere. Naturally, it’s wrapped in an Attensity-specific worldview and sales pitch – but so what?
Our exhaustive extraction approach doesn’t compromise detection of qualifiers, because we recognize the qualifications while we still have access to the complete linguistic information of the input. Much of that information is later stripped away, since it’s far more than a user would want. We make sure qualifications like the ones you mention are projected into the final representations. In fact, we’ve put a lot of effort into recognizing “voicing,” i.e., distinguishing among negations, conditional statements, and variations in the degree of sentiment.
Examples will help here:
(1) I want to return the espresso machine. (intention to return)
(2) I plan on returning the espresso machine. (intention to return)
(3) I won’t return the espresso machine. (negation – not a return)
(4) I returned no espresso machines. (negation – not a return)
(5) I failed to return the espresso machine. (negation – not a return)
(6) If you don’t return my phone call, I will return the espresso machine. (conditional – threat to return)
(7) I’ve returned espresso machines twice already. (recurrence – repeated returns)
(8) I tried to return the espresso machine. (attempt to return, negation – not a return)
(9) I failed to return the espresso machine. (failed attempt, negation – not a return)
(10) I refuse to return the espresso machine. (negation – not a return)
(11) I need to return the espresso machine now/asap. (urgency)
(12) I’m unhappy. (unhappy, duh)
(13) I’m really unhappy. (augmented unhappiness)
(14) The tires were over-inflated. (augmented inflation…works on non-sentiment qualities too)
(15) The breakfast was under-cooked. (diminished)
(16) The water in the shower this morning was way too cold. (augmented coldness)
(17) I will speak to the customer about returning the espresso machine. (indefinite – not a return, yet)
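Bean’s voicing categories can be approximated, very roughly, with surface pattern rules. The sketch below is my own toy illustration, not Attensity’s engine, which works from full linguistic analyses rather than regular expressions; the rule patterns and labels are guesses keyed to the examples above:

```python
import re

# Toy rules mapping surface patterns to the voicings illustrated above.
# Illustrative only: a real engine would use full parses, not regexes.
VOICING_RULES = [
    (r"\bwon't\b|\brefuse to\b|\bno\b|\bfailed to\b", "negation"),
    (r"^if\b", "conditional"),
    (r"\bwant to\b|\bplan on\b", "intention"),
    (r"\btried to\b", "attempt"),
    (r"\bnow\b|\basap\b", "urgency"),
    (r"\breally\b|\bway too\b|\bover-", "augmented"),
    (r"\bunder-", "diminished"),
]

def voicings(sentence: str) -> list[str]:
    """Return the voicing labels whose patterns match the sentence."""
    s = sentence.lower()
    return [label for pattern, label in VOICING_RULES if re.search(pattern, s)]
```

For example, `voicings("I refuse to return the espresso machine.")` yields `["negation"]` – the sort of distinction that keeps a non-return out of a return count.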
If we’re using our Fact Relationship Network style of extraction to look at these sentences, those voicing variations get represented on the mode* (typically), so you’d see output like:
happy (not, augmented)
*Editor’s note: “Mode” means, in effect, “behavior or action.” It’s not a typo for “node.”
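As a reader’s aid, here is one guess at how a mode-qualified extraction might be modeled as a record. The field names and `render` helper are hypothetical – this is not Attensity’s actual FRN schema:

```python
from dataclasses import dataclass

@dataclass
class Extraction:
    """A guessed stand-in for one FRN-style extraction: a quality plus
    the voicing qualifiers carried on its mode."""
    topic: str                  # what the quality attaches to, e.g. "water"
    quality: str                # e.g. "cold", "happy"
    mode: tuple[str, ...] = ()  # voicing qualifiers, e.g. ("not", "augmented")

    def render(self) -> str:
        # Reproduces the "happy (not, augmented)" style shown above.
        qualifier = f" ({', '.join(self.mode)})" if self.mode else ""
        return f"{self.quality}{qualifier}"
```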
Post-extraction, any of these voicings can be used to roll up several FRN extractions into a collection that makes sense to the business, e.g. “water | cold (augmented)” and “water | hot (not).” What makes all that possible is that the core engine has access to a great deal of linguistic information before it turns the extraction into a specific type of representation like an FRN. That linguistic information includes negating verbs (failed to <x>), double negatives, negative quantifiers that transfer their negation to the verb (no animals were harmed…), adverbial prepositional phrases (I returned the espresso machine in a fit of rage.), and so on.

We think that’s a big deal – it lets us get a true count of, in these examples, product returns – not returns of phone calls, threatened returns, merely intended returns, or non-returns. We used this kind of discriminating power to show a retailer how to identify customers who were threatening to return products, thereby catching a set of threatened returns that could still be headed off (before they ended up costing the retailer $$$).
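The roll-up and “true count” ideas can be sketched in a few lines. The triple format, the `EXCLUDE` set, and the function names below are my own illustrative choices, not Attensity’s API:

```python
from collections import Counter

# Voicings that mean "this is not an actual event" in the examples above.
EXCLUDE = {"not", "conditional", "intention", "attempt", "indefinite"}

def roll_up(extractions):
    """Group (topic, quality, modes) triples into business-friendly
    buckets like 'water | cold (augmented)'."""
    buckets = Counter()
    for topic, quality, modes in extractions:
        suffix = f" ({', '.join(modes)})" if modes else ""
        buckets[f"{topic} | {quality}{suffix}"] += 1
    return buckets

def true_count(extractions, quality="return"):
    """Count only genuine instances of `quality`: those carrying none
    of the excluding voicings."""
    return sum(1 for _, q, modes in extractions
               if q == quality and not EXCLUDE & set(modes))
```

Given one actual return, one negated return, one conditional (threatened) return, and one “augmented cold” complaint, `true_count` reports a single real return while `roll_up` keeps the threatened returns visible as their own bucket – which is exactly the retailer scenario.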