Discussion of efforts to integrate text analytics with business intelligence and other analytic technologies.
And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story. (Sorry if it sounds clipped, but I’m a bit burned out …)
- Like Attensity, Clarabridge practices exhaustive extraction. That is, they run linguistic analysis on each document, extract all sorts of entities and the relationships among them, and dump the results into a relational database.
- Unlike Attensity, which uses a simple normalized relational schema, Clarabridge dumps the extracted data into a star schema. (The Clarabridge folks are from Microstrategy, which – surely not coincidentally – also favors star schemas.)
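To make “exhaustive extraction” concrete, here is a minimal Python sketch. It is not Clarabridge’s or Attensity’s actual pipeline. It assumes a deliberately crude stand-in for real linguistics: runs of capitalized words count as entities, and two entities in the same sentence count as a relationship. Everything found is loaded, unfiltered, into SQLite tables, and all analysis happens afterwards in SQL. The table and function names are hypothetical.

```python
import re
import sqlite3
from itertools import combinations

# Crude, hypothetical stand-in for real linguistics: a run of capitalized
# words counts as an entity (so sentence-initial words like "The" would too).
ENTITY_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract(doc_id, text):
    """Extract every entity and every same-sentence entity pair from a document."""
    entities, relations = [], []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        found = sorted(set(ENTITY_PATTERN.findall(sentence)))
        entities.extend((doc_id, name) for name in found)
        # Co-occurrence in a sentence is the assumed notion of "relationship".
        relations.extend((doc_id, a, b) for a, b in combinations(found, 2))
    return entities, relations


def load(conn, docs):
    """Dump all extraction results into relational tables, with no filtering."""
    conn.execute("CREATE TABLE IF NOT EXISTS entity (doc_id INTEGER, name TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS relation (doc_id INTEGER, a TEXT, b TEXT)"
    )
    for doc_id, text in docs.items():
        entities, relations = extract(doc_id, text)
        conn.executemany("INSERT INTO entity VALUES (?, ?)", entities)
        conn.executemany("INSERT INTO relation VALUES (?, ?, ?)", relations)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, {1: "Alice stayed at the Grand Hotel in Boston. Bob liked Boston."})
    # Analysis happens after extraction, in plain SQL.
    for row in conn.execute(
        "SELECT name, COUNT(*) FROM entity GROUP BY name ORDER BY name"
    ):
        print(row)
```

The design point is that nothing is discarded at extraction time; deciding which entities or relationships matter is left to queries run later against the database.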
Categories: BI integration, Clarabridge, Comprehensive or exhaustive extraction, Ontologies, Text mining
I’ve been pretty skeptical about Inxight’s Awareness Server. My theory is that ordinary enterprise search engines can index remotely anyway, and they offer much better search functionality. Inxight’s Ian Hersey was kind enough to write in and offer two counter-arguments.
First, Ian points out that there are circumstances when, due to security and permissions, you can’t really index everything via one search engine. Specifically, he offers the government as an example. OK, I can see that in the government, with its classified and/or regulated silos. However, I have trouble thinking of many more examples. While there certainly are plenty of instances where a variety of organizations share information on a somewhat arm’s-length basis, it’s tough to think of such cases where federated text search would come into play.
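Federated search, in the sense Ian describes, leaves each silo with its own index and access rules, sends a query only to the silos a given user is cleared for, and merges whatever comes back. The Python sketch below shows that shape under simplifying assumptions: each silo is an in-memory dictionary searched by naive substring match, and a single hypothetical clearance label stands in for real security and permissions.

```python
from dataclasses import dataclass, field


@dataclass
class Silo:
    """One separately indexed collection (hypothetical stand-in for a search engine)."""

    name: str
    clearance: str  # hypothetical label a user must hold to query this silo
    docs: dict = field(default_factory=dict)  # doc_id -> text

    def search(self, term):
        # Naive substring match; a real silo would query its own index.
        term = term.lower()
        return [(self.name, doc_id) for doc_id, text in self.docs.items()
                if term in text.lower()]


def federated_search(silos, user_clearances, term):
    """Send the query to permitted silos only, then merge the results."""
    results = []
    for silo in silos:
        if silo.clearance in user_clearances:  # access is checked per silo
            results.extend(silo.search(term))
    return sorted(results)


if __name__ == "__main__":
    silos = [
        Silo("agency_a", "unclassified", {"a1": "Budget report", "a2": "Weather"}),
        Silo("agency_b", "secret", {"b1": "Budget annex"}),
    ]
    print(federated_search(silos, {"unclassified"}, "budget"))
```

Nothing is ever copied into a central index here, which is the property that matters when silos can’t legally or administratively be merged.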
Second, Ian in essence disputes my claim of inferior functionality. While implicitly conceding (as well he should!) that Inxight’s Awareness Server doesn’t do some things full-featured search engines do, he points out analytic features that may not be found in conventional search engine offerings. The big one he calls out is faceted search, which of course was the core of Intelliseek, the acquisition Awareness Server came from. Hmm. Faceted search has a checkered history, with Excite and Northern Light being perhaps the most visible among many failures. On the other hand, it’s a great idea that keeps being tried, and some versions, notably Endeca’s, have turned out well.
I guess I’ll have to reserve judgment on that part until I look at Inxight’s product and see what they do and don’t actually have.
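Faceted search, as the term is generally used, filters results on structured attributes (facets) and shows a count for each remaining attribute value, so users can keep drilling down. The sketch below is a generic illustration, not Inxight’s, Intelliseek’s, or Endeca’s implementation; the facet names and sample documents are invented.

```python
from collections import Counter


def faceted_search(docs, selected, facet_names):
    """Filter by the user's selected facet values, then count what remains.

    docs: list of dicts mapping attribute -> value (hypothetical metadata)
    selected: facet -> chosen value, e.g. {"city": "Boston"}
    """
    hits = [d for d in docs if all(d.get(f) == v for f, v in selected.items())]
    # Per-facet counts tell the user how many hits each further click would keep.
    counts = {f: Counter(d[f] for d in hits if f in d) for f in facet_names}
    return hits, counts


if __name__ == "__main__":
    docs = [
        {"id": 1, "type": "hotel review", "city": "Boston"},
        {"id": 2, "type": "hotel review", "city": "Paris"},
        {"id": 3, "type": "software review", "city": "Boston"},
    ]
    hits, counts = faceted_search(docs, {"city": "Boston"}, ["type", "city"])
    print([d["id"] for d in hits], dict(counts["type"]))
```

The counts are what distinguish this from a plain filter: they let the interface show, before the user clicks, how many results each refinement would leave.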
Categories: BI integration, Business Objects and Inxight, Endeca, Enterprise search, Search engines
I dropped by Progress a couple of weeks ago for back-to-back briefings on Apama and EasyAsk. EasyAsk is Larry Harris’ second try at natural language query, after the Intellect product fell by the wayside at Trinzic, the company Artificial Intelligence Corporation grew into.* If my memory is correct, after a friendly divorce from the company he founded, Larry was able to build EasyAsk very directly on top of the Intellect intellectual property.
*Other company or product names in the mix at various times include AI Corp and English Wizard. Not inappropriately, it seems that Larry has quite an affinity for synonyms …
EasyAsk is still a small business. The bulk of it is still enterprise query, but new activity is concentrated on e-commerce applications. While Larry thinks that they’ve solved most of the other technical problems that have bedeviled him over the past three decades, the system still takes too long to implement.
Categories: BI integration, Language recognition, Mercado, Natural language processing (NLP), Progress and EasyAsk, Speech recognition
When a company announces an acquisition, it usually does a round of limited-content briefings, in no small part because the antitrust lawyers won’t let it do anything else. Once the deal closes, antitrust restrictions are lifted, and it does another round of briefings. These, typically, are vague and platitudinous.
Business Objects/Inxight have now reached that point. Even so, my briefing yesterday had some aspects worth writing up.
Categories: BI integration, Business Objects and Inxight, Enterprise search, Search engines
In a comment posted to an Andy Hayler blog entry, a former Inxight board member mentions Inxight’s broad patent portfolio. I don’t know what defensible value is or isn’t there, but I do know that patent positions are important to Business Objects.
The press conference is a little ways off, but the news has come across the wire that Business Objects is acquiring text analytics/text mining vendor Inxight.
Quick context on Business Objects: BOBJ is a pioneer, perhaps THE pioneer, of modern business intelligence. Recently it has pursued an acquisition-heavy bulking-up strategy. There is no assumption that ALL its pieces will fit into one seamless whole. For large enterprises, it is increasing its professional services emphasis (as a complement to new license sales, not a replacement for them).
Quick context on Inxight: Inxight spun off from Xerox PARC with all sorts of cool text-related technologies. But while it’s somewhat of a competitor in generic text mining, visualization, and so on, the one market where it has really succeeded is OEM software for filtering and tokenization, serving search and text mining vendors alike.
Categories: BI integration, Business Objects and Inxight, Companies and products, Text mining
Text mining newbie Clarabridge gave me the all-too-customary “Please let us brief you, but then don’t write about it for a while” routine. Now that it’s OK to post, what I’m up for offering is a few salient points in bullet form.
- The closest analogy to what Clarabridge does is Attensity’s new(ish) strategy – extract “facts” from documents and dump them into a relational database management system. In particular, Clarabridge and Attensity alike make the case “Our categorization is more flexible because it’s applied only after the extraction happens.”
- Clarabridge’s sweet spot is extracting user opinions from short documents. E.g., the customer use cases they talk about are customer feedback forms, public blog postings, etc. about A. hotels and B. consumer software products.
- Clarabridge has a strong business intelligence mentality, describing the product as “ETL for unstructured data.” But then, it was spun out of a BI consultancy that itself was founded by Microstrategy veterans.
- Clarabridge uses a different database schema than Attensity. Attensity’s fact-relationship network (FRN) is basically just two thin, long tables. Clarabridge, however, uses a Microstrategy-like star schema, in which different kinds of things that you can tokenize correspond to different dimensions.
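A star schema of the kind described above can be sketched in SQLite as follows. The choice of dimensions (product, feature, opinion word) and all sample rows are hypothetical, picked to fit the hotel-review use case; Clarabridge’s real schema isn’t described in this post. The final query also illustrates the “categorization after extraction” argument: a “complaints” category is just a SQL filter over facts already stored, so it can be redefined without re-extracting anything.

```python
import sqlite3

SCHEMA = """
-- Dimension tables: one per kind of extracted thing (hypothetical choice).
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_feature (feature_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_opinion (opinion_key INTEGER PRIMARY KEY, word TEXT, polarity INTEGER);
-- Fact table: one row per extracted opinion, keyed to the dimensions.
CREATE TABLE fact_opinion (
    doc_id INTEGER,
    product_key INTEGER REFERENCES dim_product,
    feature_key INTEGER REFERENCES dim_feature,
    opinion_key INTEGER REFERENCES dim_opinion
);
"""


def build(conn):
    """Create the star schema and load invented sample extraction results."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "Hotel A"), (2, "Hotel B")])
    conn.executemany("INSERT INTO dim_feature VALUES (?, ?)",
                     [(1, "room"), (2, "staff"), (3, "wifi")])
    conn.executemany("INSERT INTO dim_opinion VALUES (?, ?, ?)",
                     [(1, "clean", 1), (2, "rude", -1), (3, "slow", -1)])
    conn.executemany("INSERT INTO fact_opinion VALUES (?, ?, ?, ?)",
                     [(101, 1, 1, 1), (101, 1, 2, 2), (102, 2, 3, 3), (103, 1, 3, 3)])
    conn.commit()


# Categorization applied after extraction: "complaints" is just a query.
COMPLAINTS_BY_PRODUCT = """
SELECT p.name, COUNT(*) FROM fact_opinion f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_opinion o ON f.opinion_key = o.opinion_key
WHERE o.polarity < 0
GROUP BY p.name ORDER BY p.name
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    print(conn.execute(COMPLAINTS_BY_PRODUCT).fetchall())
```

In a generic thin, long layout, by contrast, every kind of fact sits in the same few tables, so a question like this one is usually answered by joining those tables to themselves rather than against named dimensions.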
Frankly, if somebody wants an alternative to the Attensity/Teradata/Business Objects partnership they could do worse than talk with Clarabridge.
Categories: Attensity, BI integration, Clarabridge, Comprehensive or exhaustive extraction, Text mining
I caught up with Dennis Moore today to talk about SAP’s search strategy. And the biggest thing I learned was – it’s not about the search. Rather, it’s about a general interface, of which search and natural language just happen to be major parts.
Dennis didn’t actually give me a lot of details, at least not ones he’s eager to see published at this time. That said, SAP has long had a bare-bones search engine, TREX. (TREX was also adapted to create the columnar relational data manager BI Accelerator.) But we didn’t talk about TREX enhancements at all, and I’m guessing there haven’t really been many. Rather, SAP’s focus seems to be on:
A. Finding business objects.
B. Helping users do things with them.
Categories: BI integration, Enterprise search, Language recognition, Natural language processing (NLP), SAP, Search engines
FAST is annoying me a bit these days. It’s nothing serious, but travel schedule screw-ups, an annoying embargo, and a screw-up in the annoying embargo have all hit at once. So I’ll keep this telegraphic and move on to other subjects.
- They’re doing fast queries without using a lot of RAM.
- They’re doing the usual text search thing of indexing across multiple “databases,” only now it’s applied to, well, databases. (Not that there’s much new about that particular aspect. Actually, there seems to be a bit of a kludge in that they export the databases to some kind of simple text files.)
- They’re doing some level of concept identification à la the text mining guys. (They don’t call it “entity extraction” because the results aren’t dumped into a database anywhere, but instead are just used on the fly.) Of course, the text mining/search convergence goes both ways.
- They bought a BI/dashboard tool and are using it both to analyze query logs and also to do normal BI/dashboard kinds of things.
- They have big references for this stuff, at least the single-web-site query aspect. Well, actually, the customer names are confidential. Oh well.
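The general idea of exporting database rows to text and then indexing them like documents can be sketched in a few lines of Python. This is a hypothetical illustration of the technique (flatten each row into text, build one inverted index across several tables), not FAST’s actual export format or index structure, which this post doesn’t describe. The function names and sample rows are invented.

```python
import re
from collections import defaultdict


def rows_to_documents(table_name, rows):
    """Flatten each row into a text 'document' keyed by (table, row id)."""
    for row in rows:
        key = (table_name, row["id"])
        # Column names are kept in the text, so "city oslo" can match too.
        text = " ".join(f"{col} {val}" for col, val in row.items() if col != "id")
        yield key, text


def build_index(documents):
    """Inverted index: token -> set of (table, row id) keys."""
    index = defaultdict(set)
    for key, text in documents:
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(key)
    return index


def search(index, query):
    """AND query: return keys whose text contains every query token."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


if __name__ == "__main__":
    customers = [{"id": 1, "name": "Acme Corp", "city": "Oslo"},
                 {"id": 2, "name": "Globex", "city": "Boston"}]
    orders = [{"id": 7, "customer": "Acme Corp", "item": "widgets"}]
    docs = list(rows_to_documents("customers", customers)) + list(
        rows_to_documents("orders", orders))
    index = build_index(docs)
    print(sorted(search(index, "acme")))
```

One query thus spans several tables at once, which is the “indexing across multiple databases” behavior, at the cost of losing relational structure once rows become flat text.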
And as another example of how this wasn’t the smoothest PR month for FAST, Steve Arnold somehow got the false idea that they were getting out of true text search altogether.
Dave Kellogg thinks FAST will be ineffective and defocused because of its efforts in business intelligence. I can’t comment on whether that analysis is brilliant, self-serving, or both, because anything I’ve been told on the subject is under embargo.
Embargoes were a crucial PR tactic when Regis McKenna exploited them for the original rollout of the Macintosh in 1984. But I suspect that in many cases they’ve quite outlived their usefulness. If I wait until the embargo is up to write something, my thoughts about it get fuzzy. If I write something at the time and put it on ice, it may be obsolete because of what other people write in the meantime.
More and more, if something is embargoed, I wind up not writing about it at all.
EDIT: Point #4 of my post on the mismatch between relational databases and text search is pretty relevant here.