Analysis of ontologies, of the role they play in text analytics, and of technology and techniques to build and manage them.

September 20, 2009

Data marts in the world of text

Content management system (CMS) and search expert Alan Pelz-Sharpe recently decried “Shadow IT”, by which he seems to mean the departmental proliferation of data stores outside the IT department’s control. In other words, he’s talking about data marts, only for documents rather than tabular data.

Notwithstanding the manifest virtues of centralization, there are numerous reasons you might want data marts, in the tabular and document worlds alike. For example:

Bottom line: Text data marts, much like relational data marts, are almost surely here to stay.


December 29, 2008

Where “semantic” technology is or isn’t important

At Lynda Moulton’s behest, I spoke a couple of times recently on where “semantic” technology is or isn’t likely to be important. One talk was at the Gilbane conference in early December. Its slides were based on the deck I previously posted for a June talk giving an overview of the text analytics market. The actual Gilbane slides may be found here.

My opinions about the applicability of semantic technology include:

So what would your list look like?

June 15, 2008

How text search has evolved over the past 15 years

I just stumbled across a brilliant summary of the evolution of text search technology, written four years ago. It’s equally valid today (which in itself says something). I found it on the Prism Legal blog, but the actual author is Sharon Flank. My own comments are interspersed in bold.

June 11, 2008

Expert System S.p.A. update

I chatted with Brooke Aker, the new CEO of Expert System’s US subsidiary, for quite a while last week. Unfortunately, we had some cell phone problems, and email follow-up hasn’t gone well, so I’m hazy on a few details. But here are some highlights, as best I understood them.

January 31, 2008

The biggest text analytics company you probably never heard of

I caught up with Expert System S.p.A. last week. They turn out to be doing $10 million in annual text technology revenue. That alone is surprising (sadly), but what’s really remarkable is that they did it almost entirely in the Italian market. As you might guess, that figure includes a little bit of everything, from search engines to Italian-language filters for Microsoft Office to text mining. But only $3½ million of Expert System’s revenue comes from the government (and I think that includes civilian agencies), and under 30% is professional services, so on the whole it seems like a real accomplishment. Oh yes, Expert System says it’s entirely self-funded.

As of last year, Expert System also has English-language products, and a couple of minor OEM sales in the US (for mobile search and semantic web applications). German- and Arabic-language products are in beta test. The company says that its market focus going forward is national security – surely the reason for the Arabic – and competitive intelligence. It envisions selling through partners such as system integrators, although I think that makes more sense for the government market than it does vis-a-vis civilian companies. In February the company is introducing a market intelligence product focused on sentiment analysis.

Expert System is a bit of a throwback, in that it talks lovingly of the semantic network that informs its products.

October 6, 2007

The Clarabridge approach to text mining

And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story. (Sorry if it sounds clipped, but I’m a bit burned out …)

April 30, 2007

Wise Crowds of Long-Tailed Ants, or something like that

Baynote sells a recommendation engine whose motto appears to be “popularity implies accuracy.” While that leads to some interesting technological ideas (below), Baynote carries that principle to an unfortunate extreme in its marketing, which is jam-packed with inaccurate buzzspeak. Most of that is focused on a few trendy meme-oriented books, but the low point of my briefing today was probably the insistence, despite my pushback, that “95%” of Google’s results depend on “PageRank.” (I think what Baynote really meant is “all off-page factors combined,” but anyhow I sure didn’t get the sense that accuracy was an important metric for them in setting their briefing strategy. And by the way, one reason I repeat the company’s name rather than referring to Baynote by a pronoun is that on-page factors DO matter in search engine rankings.)

That said, here’s the essence of Baynote’s story, as best I could figure it out.

March 26, 2007

So THAT’S why Andrew Orlowski still has a job (Part 2)

Andrew Orlowski is an over-the-top jerk, and a pretty sloppy reporter and analyst to boot. But he occasionally makes a good point even so. In the most recent instance, he confronted Tim Berners-Lee. As the article makes clear, Berners-Lee reacted badly to Orlowski, reflecting an attitude that is probably shared by 99% of the people who encounter the guy, and in the future will probably be adopted by sentient computers as well. Even so, Orlowski’s underlying point is valid: If the Semantic Web is going to be any more spam-free than the current Web, nobody has adequately explained why.

February 15, 2007

InQuira’s and Mercado’s approaches to structured search

InQuira and Mercado have both broadened their marketing pitches beyond their traditional specialty of structured search for e-commerce. Even so, it’s well worth talking about those search technologies, which offer features and precision that you just don’t get from generic search engines. There’s a lot going on in these rather cool products.

In broad outline, Mercado and InQuira each combine three basic search approaches:

Of the two, InQuira seems to have the more sophisticated ontology. Indeed, the not-wholly-absurd claim is that InQuira does natural-language processing (NLP). Both vendors incorporate user information in deciding which search results to show, in ways that may be harbingers of what generic search engines like Google and Yahoo will do down the road.

February 7, 2007

Is DMOZ the cure to Wikipedia’s spam problem?

Joost de Valk makes an interesting suggestion, namely that Wikipedia should drop all external links other than those to DMOZ, and rely on DMOZ as its outside link directory. As a division of labor, it makes perfect sense. However, it’s a total non-starter until at least two problems are solved.
