Due to various transatlantic communication glitches, I’d never had a serious briefing with text mining vendor TEMIS until yesterday, when I finally connected with CEO Eric Bregand. So here’s a quick TEMIS overview; I’ll discuss what they actually do in a separate post.
- TEMIS has 50 people; three main businesses and a couple of secondary ones; two larger offices in France; and smaller offices in Germany and the US. As would be expected, TEMIS’ customer base is concentrated in Continental Europe. The US exceptions seem concentrated in the life sciences vertical (not coincidentally, the US office is outside Philadelphia).
- Like Inxight, TEMIS is at least partly a spin-off from Xerox’s text analytics efforts. Indeed, its Grenoble office was acquired from Xerox. Unlike Inxight, TEMIS doesn’t seriously pursue OEM business, but a couple of exceptions have occurred (Eric mentioned Convera and Documentum).
- TEMIS claims to follow a middle course between ClearForest on the one hand and Attensity and Clarabridge on the other, in that it doesn’t offer exhaustive extraction but does offer “iterative extraction.” (More on that below.) Frankly, I’m not yet sure that there’s much of a difference in this regard between TEMIS and ClearForest. Like ClearForest – and I’m not sure Attensity would completely dispute this – TEMIS believes that really sophisticated semantic analysis is hard in an exhaustive-extraction scenario. Eric also raised size/performance issues about exhaustive extraction, but I found those unconvincing in this era of cheap and powerful data warehouse engines.
- Unlike most of the rest of the text analytics industry, TEMIS really likes UIMA, having committed to it a year and a half ago. So, apparently, does the customer for at least one large deal jointly won with IBM (Europol). The big benefit of UIMA is openness/connectivity, but load-balancing/failover also got mentioned a few times, and that’s attributed to UIMA as well.
I’ll confess to being a little unclear about “iterative extraction,” and indeed to suspecting that it conflates two different things. One would just be the waterfall-style processing inherent to UIMA and, for that matter, to most other approaches to tokenization. The other is the idea that you can do a decent job of identifying what’s in each document in a large corpus in one pass, then do another pass focusing more intently on the ones that might have exactly what you’re looking for.
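To make the second reading concrete, here is a minimal Python sketch of a two-pass approach. It is purely my own illustration, not TEMIS's implementation or any UIMA API: the function names, the trigger-term filter, and the toy "acquired" pattern are all hypothetical stand-ins for whatever cheap and expensive analysis a real product would run.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical stand-in for expensive semantic analysis:
# a toy pattern that pulls out "<Buyer> acquired <Target>" facts.
ACQUISITION = re.compile(r"\b([A-Z]\w+) acquired ([A-Z]\w+)")


def coarse_pass(docs: Dict[str, str], triggers: List[str]) -> List[str]:
    """First pass (assumed cheap): keep ids of documents that mention any trigger term."""
    lowered = [t.lower() for t in triggers]
    return [
        doc_id
        for doc_id, text in docs.items()
        if any(t in text.lower() for t in lowered)
    ]


def focused_pass(docs: Dict[str, str], candidate_ids: List[str]) -> List[Tuple[str, str, str]]:
    """Second pass (assumed costly): run the detailed extraction on candidates only."""
    facts = []
    for doc_id in candidate_ids:
        for buyer, target in ACQUISITION.findall(docs[doc_id]):
            facts.append((doc_id, buyer, target))
    return facts


def iterative_extract(docs: Dict[str, str], triggers: List[str]) -> List[Tuple[str, str, str]]:
    """Chain the two passes: filter the corpus, then look harder at what survives."""
    return focused_pass(docs, coarse_pass(docs, triggers))
```

Calling `iterative_extract(docs, ["acquired"])` on a dictionary of document ids and texts runs the detailed pattern only on documents that survive the first filter. That is the contrast with exhaustive extraction, which would extract everything from every document up front.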
- Attensity FRN (fact-relationship network)
- Attensity vs. ClearForest
- Clarabridge overview
- DBMS2 coverage of data warehouse appliances and other data warehouse engines
- The French presence in text analytics