Comments on: If you think sentiment analysis technology can detect idiom, I have a bridge I’d like to sell you

By: The rise of machine-written journalism | Fullrunner

The rise of machine-written journalism | Fullrunner — Mon, 03 Jan 2011 13:36:41 +0000

[…] the way, of course, intelligent systems will need to start coping with the complexities of human language that have so far confounded them, including idiom, metaphor and […]

By: Curt Monash

Curt Monash — Tue, 12 Aug 2008 20:30:01 +0000

Jane,

I think it’s harder than you’re suggesting. For one thing, idioms change with fashion. For another, idioms don’t all have fixed forms. E.g., I’ve seen “Pot: Meet kettle” and “pot-kettle-black” both used fairly often to refer to “the pot is calling the kettle black”. And I’ve been known to post about “the relative colors of cookware”.
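A minimal sketch of why variable forms are awkward for a matcher, assuming a hand-written set of cue words per idiom (the `IDIOM_CUES` table and the function names are hypothetical, not taken from any product discussed here). Matching on cue words rather than a fixed string catches the first two variants above, but has no way to catch the third:

```python
import re

# Hypothetical cue sets: an idiom is flagged when all of its cue words appear.
IDIOM_CUES = {
    "pot calling the kettle black": {"pot", "kettle"},
}

def tokens(text):
    """Lowercase the text and return its alphabetic word tokens as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))

def match_idiom_cues(text):
    """Return the idioms whose cue words all occur somewhere in the text."""
    words = tokens(text)
    return [name for name, cues in IDIOM_CUES.items() if cues <= words]

print(match_idiom_cues("Pot: Meet kettle"))
print(match_idiom_cues("pot-kettle-black"))
print(match_idiom_cues("the relative colors of cookware"))
```

The cue-word approach also over-matches (any sentence about a pot and a kettle would be flagged), so it trades one kind of error for another.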

CAM

By: jane

jane — Mon, 11 Aug 2008 14:54:58 +0000

Two different aspects of idiom sentiment analysis (or any text analysis) should be recognized.
One is real-world usage; the other is unrealistic cases created in labs. The vast majority of the population uses fairly common idioms and phrases that are not very complex and not very tricky. The problem is that once the software is tested by so-called “experts”, it is given extra, unrealistic phrases that are barely used in the real world.
So idiom detection should not be hard if it tries to mimic reality.

By: Kirk Daly

Kirk Daly — Thu, 10 Jul 2008 12:37:50 +0000

Hi all,

The comment I would make is that yes, while irony, sarcasm etc. are beyond most technologies today (primarily in my view because they require a degree of real world knowledge that pushes processing times beyond what is acceptable), that doesn’t undermine the massive contribution sentiment analysis software can make to certain business applications.

A good example of this would be the deployment of Infonic’s Sentiment Analysis technology within Thomson Reuters’ NewsScope product. Trading desks receive a large volume of market news in a very standard format, typically delivered from the journalist without irony or other such “difficult” language.

While in individual instances Sentiment’s linguistic algorithms may well be defeated by the language used, on average the technology does a proven job of extracting the sentiment of the coverage as it relates to the entities mentioned. This enables accurate real time tracking of the average sentiment of the news flow pertaining to specific stocks – with the obvious application to automated trading.
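The averaging idea can be sketched as follows. This is only an illustration under stated assumptions: the tickers, the score range of -1 to 1, and the `average_sentiment` function are all hypothetical, and nothing here reflects how the actual product computes its scores.

```python
from collections import defaultdict

def average_sentiment(scored_items):
    """Average the sentiment scores per entity over (entity, score) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for entity, score in scored_items:
        totals[entity] += score
        counts[entity] += 1
    return {entity: totals[entity] / counts[entity] for entity in totals}

# Hypothetical scores in [-1, 1], one per news item and entity mentioned.
# The second ACME item stands in for a story the scorer got wrong.
news_flow = [
    ("ACME", 0.5),
    ("ACME", -1.0),
    ("ACME", 1.0),
    ("ACME", 0.5),
    ("GLOBEX", -0.5),
]
averages = average_sentiment(news_flow)
print(averages)
```

With enough items per entity, a single misread story moves the average only a little, which is the sense in which the aggregate can be useful even when individual scores are not.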

It is not necessary to take our word on this. Sceptics need only look to the large trading banks who are purchasing the software in increasing numbers for use within their algorithmic trading systems. They are doing this having rigorously tested the software and proven to their own satisfaction the correlation of movements in our average sentiment score with movements in the stock price.

To take on the specific “… riot after…” example discussed, Sentiment would treat “pretty” as an intensifier of “ugly (riot)” but handle “beautiful, ugly” quite differently. From what David has said above I think both Infonic and Attensity would think that all bets are very much ON…

– Kirk

By: Tim Estes

Tim Estes — Sun, 06 Jul 2008 19:12:31 +0000

David,

Thanks for that explication. It was quite informative and honest in stating with clarity the strengths and limitations of what can be done right now. It’s quite an impressive bit of engineering to handle subtleties such as that on the syntactic/role level with that kind of potential ambiguity. The “voicing” piece is particularly cool.

As for the comments on the semantic model problem… that is one way to handle it – at least with a schematic bias. Of course, there is another approach that might look at it more as a rich model of features with particular expectations such that a use in the way described is novel and suggests that the representation used (i.e. the word) is actually not representative of the expected underlying idea.

So Chris… how would Inxight/BO/SAP handle that? You know you asked for that question. 😉

-Tim

By: Chris Riopel

Chris Riopel — Wed, 02 Jul 2008 00:07:06 +0000

David,

I think you meant to say that the parse result was

riot : ugly

Unless you were being brutally honest (above and beyond, really) about how your software might have incorrectly parsed it…

Chris

By: David Bean

David Bean — Tue, 24 Jun 2008 22:31:14 +0000

On the second issue – semantic understanding and the sample sentence…

“The riot after the Celtics’ win on Tuesday was pretty ugly.”

Since we’d parse that, we’d get something like this bracketed form to work with:

[[The riot]np/subj [after [the Celtics’ win]np]pp [on Tuesday]advp [was]vp [pretty ugly]adjp ]clause

We’d also recognize that pretty is a modifier of ugly. (btw, I’ve left out things like POS tags, entity identification, semantic class tags, etc.)

From this kind of syntactic analysis, we could perform a number of extraction processes to turn the parse into something more abstract and recognizable to an analyst. In most cases, we’d map the issue of interest to something that looks like this:

win : ugly [more]

The term before the colon is a thing, the term after the colon represents an action performed on that thing or a characteristic or quality of that thing. In this case, the win was ugly. In addition, because we’ve mapped “pretty” into a collection of terms that augment head adjectives, we’d represent this as [more] to distinguish it from a simple case of “the win was ugly.” We call this nuance in expression “voicing” and we use it to pick up on augmentation/diminishment of adjectives, plus negation, recurrence, conditionality, and a bunch of other stuff on verb phrases.
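A toy sketch of the voicing step on adjective phrases, assuming small hand-written modifier lists and assuming the head adjective is the last word of the phrase. The word lists, the `[less]` tag, and the function names are hypothetical illustrations, not the actual implementation; the "thing" is passed in by hand because choosing it is the parser's job.

```python
# Hypothetical word lists; a real system would use far larger lexicons.
AUGMENTERS = {"pretty", "very", "really", "extremely"}
DIMINISHERS = {"slightly", "somewhat", "barely"}

def voice(adjp):
    """Split an adjective phrase into (head adjective, voicing tag or None)."""
    words = adjp.lower().split()
    head = words[-1]  # assumption: the head adjective comes last
    tag = None
    for modifier in words[:-1]:
        if modifier in AUGMENTERS:
            tag = "more"
        elif modifier in DIMINISHERS:
            tag = "less"
    return head, tag

def to_fact(thing, adjp):
    """Render a thing and its adjective phrase as 'thing : quality [voicing]'."""
    head, tag = voice(adjp)
    return f"{thing} : {head}" + (f" [{tag}]" if tag else "")

print(to_fact("win", "pretty ugly"))
print(to_fact("win", "ugly"))
```

Note that the sketch treats "pretty" as an augmenter only because it sits before another adjective in the phrase; a bare "pretty" would come out as the head.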

Now, to the semantic model question. We try to do a fair amount of semantic disambiguation by virtue of parsing into thematic roles, i.e. getting at actors, actions, recipients, instruments, etc., at a level above the syntax. That’s what’s letting us understand that “pretty” augments “ugly” in the example above. At a whole ‘nother level, what exactly an “ugly win” means in a larger context — is that a sports-related victory that involved a lot of poor play or is that a political race victory that included lots of negative advertising — requires a ton of real world knowledge, and that’s always been a bug-a-boo in the AI world. I’ve seen some taxonomy-based search engines that could distinguish word senses like that, but tying the correct notion of an “ugly win” to a larger understanding of the world gets, yep, ugly, really fast.

I’ve seen a number of government uses where extracted data is used to trigger ontologies and reasoning over predefined concepts and relationships, and that’s probably approaching the level of semantic modeling you’d need to detect sarcasm, but it would still be difficult to distinguish why some piece of data didn’t fit the model — was it due to sarcasm, or was it incorrectly extracted data, or was it new knowledge that the ontology is missing? I know enough about ontologies and automated reasoning to be just a little bit more than dangerous, so there may be better answers out there.

– David

By: David Bean

David Bean — Tue, 24 Jun 2008 17:55:05 +0000

Hi guys,

Yah, I’ve got a couple of responses for you ;-). First, I completely agree with you on detecting idiomatic usage…that’s beyond state-of-the-art, at least as I understand it. In fact, at last year’s TA Summit, I spoke directly to this issue. I had a slide in my preso about sentiment analysis titled “Reality Check” with examples of content that can’t be handled:

Sarcasm – “You really know how to make a customer feel appreciated, don’t you?”

Sarcasm with Tone of Voice – “Oh….that makes me sooooo happy.”

Metaphor – “I’m as happy as a turkey on November 24th”

Idioms – “I’m just like a bug in a rug.” “Happy as a clam”

Of those, I actually think idiomatic usage may be the more addressable since you could enumerate a slew of idioms and a simple lexical match would suffice for many of them.
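A minimal sketch of that enumerate-and-match idea, assuming a small hand-built idiom list with hand-assigned polarity (the list, the polarity values, and `find_idioms` are hypothetical). It finds the two idiom examples above verbatim and, as expected, finds nothing in the turkey example:

```python
# Hypothetical idiom list with hand-assigned polarity (+1 positive, -1 negative).
IDIOM_POLARITY = {
    "happy as a clam": 1,
    "bug in a rug": 1,
    "over the moon": 1,
    "down in the dumps": -1,
}

def find_idioms(text):
    """Return (idiom, polarity) for each listed idiom found verbatim in the text."""
    lowered = text.lower()
    return [(idiom, polarity)
            for idiom, polarity in IDIOM_POLARITY.items()
            if idiom in lowered]

print(find_idioms("I'm just like a bug in a rug."))
print(find_idioms("Happy as a clam"))
print(find_idioms("I'm as happy as a turkey on November 24th"))
```

Plain substring matching only covers idioms that appear in their listed form, so reworded or newly fashionable variants fall through.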

But in general, this sort of thing is just beyond our reach, and by “our” I mean the field.

By: Tim Estes

Tim Estes — Tue, 24 Jun 2008 04:35:32 +0000

Touché. Sometimes puns really do land.

By: Curt Monash

Curt Monash — Sat, 21 Jun 2008 20:14:34 +0000

Tim,

OK, I see what you’re getting at. If “pretty ugly” can’t be tagged — based on syntax and semantics — as meaning something quite different from “beautiful, ugly”, all bets are off.

And the syntactic clues are indeed — well, they’re PRETTY slim.

CAM