January 31st, 2008 Curt Monash
I caught up with Expert System S.p.A. last week. They turn out to be doing $10 million in text technology annual revenue. That alone is surprising (sadly), but what’s really remarkable is that they did it almost entirely in the Italian market. As you might guess, that figure includes a little bit of everything, from search engines to Italian language filters for Microsoft Office to text mining. But only $3 ½ million of Expert System’s revenue is from the government (and I think that includes civilian agencies), and under 30% is professional services, so on the whole it seems like a pretty real accomplishment. Oh yes – Expert Systems says it’s entirely self-funded.
As of last year, Expert System also has English-language products, and a couple of minor OEM sales in the US (for mobile search and semantic web applications). German- and Arabic-language products are in beta test. The company says that its market focus going forward is national security – surely the reason for the Arabic – and competitive intelligence. It envisions selling through partners such as system integrators, although I think that makes more sense for the government market than it does vis-a-vis civilian companies. In February the company is introducing a market intelligence product focused on sentiment analysis.
Expert System is a bit of a throwback, in that it talks lovingly of the semantic network that informs its products.
Read the rest of this entry »
Posted in Application areas, Enterprise search, Expert System S.p.A., Ontologies and context identification, Search and text storage, Text mining, Voice of the Market/competitive intelligence | No Comments »
October 6th, 2007 Curt Monash
And for my sixth text mining post this weekend, here are some highlights of the Clarabridge technology story. (Sorry if it sounds clipped, but I’m a bit burned out …)
- Like Attensity, Clarabridge practices exhaustive extraction.* That is, they do linguistics against documents, extract all sorts of entities and relationships among the entities from each document, and dump the results into a relational database.
- Unlike Attensity, which uses a simple normalized relational schema, Clarabridge dumps the extracted data into a star schema. (The Clarabridge folks are from Microstrategy, which – surely not coincidentally – also favors star schemas.)
Read the rest of this entry »
Posted in BI integration, Clarabridge, Comprehensive or exhaustive extraction, Ontologies and context identification, Text mining | 1 Comment »
April 30th, 2007 Curt Monash
Baynote sells a recommendation engine whose motto appears to be “popularity implies accuracy.” While that leads to some interesting technological ideas (below), Baynote carries that principle to an unfortunate extreme in its marketing, which is jam-packed with inaccurate buzzspeak. While most of that is focused on a few trendy meme-oriented books, the low point of my briefing today was the probably the insistence against pushback that “95%” of Google’s results depend on “PageRank.” (I think what Baynote really meant is “all off-page factors combined,” but anyhow I sure didn’t get the sense that accuracy was an important metric for them in setting their briefing strategy. And by the way, one reason I repeat the company’s name rather than referring to Baynote by a pronoun is that on-page factors DO matter in search engine rankings.)
That said, here’s the essence of Baynote’s story, as best I could figure it out.
Read the rest of this entry »
Posted in Baynote, Google, Ontologies and context identification, Search and text storage, Search engine optimization (SEO), Social software and media, Specialized search engines | 3 Comments »
March 26th, 2007 Curt Monash
Andrew Orlowski is an over-the-top jerk, and a pretty sloppy reporter and analyst to boot. But he occasionally makes a good point even so. In the most recent instance, he confronted Tim Berners-Lee. As the article makes clear, Berners-Lee reacted badly to Orlowski, reflecting an attitude that is probably shared by 99% of the people who encounter the guy, and in the future will probably be adopted by sentient computers as well. Even so, Orlowski’s underlying point is valid: If the Semantic Web is going to be any more spam-free than the current Web, nobody has adequately explained why.
Want to continue getting great research about search engines, directories, and other hot internet topics? Then subscribe to our feed, by RSS/Atom or e-mail! We recommend taking the integrated feed for all our blogs, but blog-specific ones are also easily available.
Posted in Ontologies and context identification, Spam and antispam | 2 Comments »
February 15th, 2007 Curt Monash
InQuira and Mercado both have broadened their marketing pitches beyond their traditional specialties of structured search for e-commerce. Even so, it’s well worth talking about those search technologies, which offer features and precision that you just don’t get from generic search engines. There’s a lot going on in these rather cool products.
In broad outline, Mercado and InQuira each combine three basic search approaches:
- Generic text indexing.
- Augmentation via an ontology.
- A rules engine that helps the site owner determine which results and responses are shown under various circumstances.
Of the two, InQuira seems to have the more sophisticated ontology. Indeed, the not-wholly-absurd claim is that InQuira does natural-language processing (NLP). Both vendors incorporate user information in deciding which search results to show, in ways that may be harbingers of what generic search engines like Google and Yahoo will do down the road.
Read the rest of this entry »
Posted in InQuira, Mercado, Natural language processing (NLP), Ontologies and context identification, Search and text storage, Structured search | 2 Comments »
February 7th, 2007 Curt Monash
Joost de Valk makes an interesting suggestion, namely that Wikipedia should drop all external links other than to DMOZ, and rely on DMOZ as the outside link directory. As division of labor, it makes perfect sense. However, it’s a total non-starter until at least two problems are solved. Read the rest of this entry »
Posted in Directories, Directories and filtering, ODP and DMOZ, Ontologies and context identification, Spam and antispam | 4 Comments »
November 11th, 2006 Curt Monash
Most people in the text analytics market realize that text mining and search are somewhat related. But I don’t think they often stop to contemplate just how close the relationship is, could be, or someday probably will become. Here’s part of what I mean:
- Text mining powers search. The biggest text mining outfits in the world, possibly excepting the US intelligence community, are surely Google, Yahoo, and perhaps Microsoft.
- Search powers text mining. Restricting the corpus of documents to mine, even via a keyword search, makes tons of sense. That’s one of the good ideas in Attensity 4.
- Text mining and search are powered by the same underlying technologies. For starters, there’s all the tokenization, extraction, etc. that vendors in both areas license from Inxight and its competitors. Beyond that, I think there’s a future play in integrated taxonomy management that will rearrange the text analytics market landscape.
Read the rest of this entry »
Posted in Attensity, Business Objects and Inxight, Enterprise search, FAST, Google, IBM and UIMA, Ontologies and context identification, Open source text analytics, Search and text storage, Text mining | 3 Comments »
August 3rd, 2006 Curt Monash
My August Computerworld column starts where July’s left off, and suggests principles for enterprise text technology architecture. This will not run Monday, August 7, as I was originally led to believe, but rather in my usual second-Monday slot, namely August 14. Thus, I finished it a week earlier than necessary, and I apologize to those of you I inconvenienced with the unnecessary rush to meet that deadline.
The principles I came up with are:
- Deploy search widely across the enterprise.
- It’s OK for your text data to be distributed across a range of silos.
- Integrate fact extraction/text mining aggressively into your predictive analytics and dashboards.
- Having a preferred enterprise text technology tool suite is nice, but accept that there will probably be lots of departmental exceptions.
- Reinvent your customer communication (and other) processes to exploit text technologies.
- Integrate your taxonomies.
I’ll provide a link when the column is actually posted.
Posted in Enterprise search, Ontologies and context identification, Search and text storage, Text mining | 1 Comment »
July 29th, 2006 Curt Monash
Once upon a time, more than a decade before the founding of Autonomy, a New Mexico inventor had the idea for a generic pattern recognition tool. He implemented it on a PC add-in board that, if I recall correctly, plugged into the Apple II. This was the genesis of the company Excalibur Technologies.
Read the rest of this entry »
Posted in Convera, Enterprise search, Ontologies and context identification, Search and text storage | 2 Comments »
July 17th, 2006 Curt Monash
I’ve argued previously that enterprises need serious ontologies, and that this lack is holding back growth in multiple areas of text technology – search, text mining and knowledge extraction, various forms of speech recognition, and so on. The core point was:
The ideal ontology would consist mainly of four aspects:
1. A conceptual part that’s language-independent.
2. A general language-dependent part.
3. A sensitivity to different kinds of text – language is used differently when spoken, for instance, than it is in edited newspaper articles.
4. An enterprise-specific part. For example, a company has product names, it has competitors with product names, those names have abbreviations, and so on.
Read the rest of this entry »
Posted in Ontologies and context identification, Open source text analytics | No Comments »