April 29th, 2008 Curt Monash
I’m putting up two posts this morning on Mark Logic and its MarkLogic product family. The main one, over on DBMS2, outlines the technical architecture — focusing on MarkLogic as an XML database management system — and provides a bit of overall context. This post attempts to position MarkLogic against alternative kinds of text analytics engine.
For the most part, MarkLogic is indeed sold (and bought) for the storage, manipulation, and retrieval of text. (One long-confidential exception to this rule is scheduled to be unveiled at the June user conference.) Most applications seem to fit a custom publishing/enhanced search paradigm:
- Ingest text.
- Enhance it.
- Serve it up in chunks, typically via a sophisticated search interface.
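To make that three-step paradigm concrete, here is a minimal Python sketch. It is purely illustrative and has nothing to do with how MarkLogic itself is built; the `Chunk` class, the function names, and the tiny vocabulary are all hypothetical, and simple keyword tagging stands in for real enrichment such as entity or fact extraction.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One servable piece of a document (hypothetical structure)."""
    doc_id: str
    text: str
    tags: set = field(default_factory=set)


def ingest(doc_id, raw_text):
    """Step 1: split a document into paragraph-sized chunks."""
    return [Chunk(doc_id, p.strip()) for p in raw_text.split("\n\n") if p.strip()]


def enhance(chunks, vocabulary):
    """Step 2: tag each chunk with the vocabulary terms it mentions."""
    for chunk in chunks:
        lowered = chunk.text.lower()
        chunk.tags = {term for term in vocabulary if term in lowered}
    return chunks


def search(chunks, tag):
    """Step 3: serve up only the chunks carrying the requested tag."""
    return [c for c in chunks if tag in c.tags]


corpus = enhance(
    ingest("doc-1", "Aspirin reduces fever.\n\nXQuery is a query language for XML."),
    vocabulary={"aspirin", "xquery"},
)
hits = search(corpus, "xquery")
```

The point of the sketch is only that the unit being served is the enriched chunk rather than the whole document.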
Differences vs. conventional search engines include:
- Documents are indexed on the fly, and available for query immediately upon ingestion.
- MarkLogic is a real, ACID-compliant DBMS. So everything else, such as a user tag or comment, is also available for immediate query. Mark Logic says customers are making a lot of use of this feature.
- MarkLogic has a real programming language, specifically XQuery. (Note: XQuery is a much fuller language than, say, standard SQL, with conditional logic, arithmetic, try/catch, and so on.)
- MarkLogic handles fielded information, document chunks, and whole documents in a completely integrated fashion. Truth be told, I don’t know exactly to what extent Autonomy or FAST do or don’t fall short of this standard, but it’s never seemed to be as much of a priority on their part as I’ve felt it should be.
Mark Logic also claims huge advantages in corpus administration. Scalability seems good too; there’s a national-intelligence customer with a 200 terabyte database. And they’re proud of a feature called lexicons, although it seems so obvious to me that I’ve so far failed to muster what they’d probably regard as the proper level of excitement about it. (In SQL terms, it seems to be a combination of SELECT and COUNT DISTINCT, both of which are capabilities I’d think would be in XQuery anyway.)
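For what the SQL analogy amounts to, here is a small Python sketch of a lexicon as a list of the distinct values of a field together with how many documents carry each one. This is my reading of the analogy only, not Mark Logic's implementation; the mini-corpus, the `author` field, and the `value_lexicon` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical mini-corpus: each document carries one fielded "author" value.
documents = [
    {"id": 1, "author": "Smith", "text": "XML databases and search"},
    {"id": 2, "author": "Jones", "text": "Text mining for publishers"},
    {"id": 3, "author": "Smith", "text": "Custom publishing workflows"},
]


def value_lexicon(docs, field):
    """Return the sorted distinct values of a field, each with its document count."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return sorted(counts.items())


# Roughly: SELECT author, COUNT(*) ... GROUP BY author ORDER BY author
author_lexicon = value_lexicon(documents, "author")

# Roughly: SELECT COUNT(DISTINCT author) ...
distinct_authors = len(author_lexicon)
```

Presumably the vendor's pride lies in having such lists precomputed and kept current as documents arrive, rather than in the query being hard to write.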
Posted in Application areas, Mark Logic | 3 Comments »
November 1st, 2007 Curt Monash
CEO Eric Bregand of Temis recently checked in by email with an update on text mining market activity. Highlights of Eric’s views include:
- Yep, Voice Of The Customer is hot, in “many markets”; Eric specifically mentioned banking, car, energy, food, and retail. He further sees IBM backing VotC as text’s “killer app.” (Note: Temis has a history of partnering with IBM, most notably via its unusually strong commitment to UIMA.)
- Specifically, THE hot topics in the European market these days are competitive intelligence and sentiment analysis. (Note: I’ve always thought Temis got serious about competitive analysis a little earlier than most other text mining vendors did.)
- Life sciences is an ever growing focus for Temis.
- I confused him a bit with how I phrased my question about custom publishing and Temis’ Mark Logic partnership. But he did express favorable views of the market, specifically in the area of integrating text mining and native XML database management, and even volunteered that nStein appears to be doing well.
Posted in Application areas, IBM and UIMA, Investment research and trading, Mark Logic, TEMIS, Text mining, Voice of the Customer, Voice of the Market/competitive intelligence, nStein | 1 Comment »
July 22nd, 2007 Curt Monash
It was tough to judge user demand at the recent Text Analytics Summit because, well, very few users showed up. And frankly, I wasn’t as aggressive at pumping vendors for trends as I sometimes am. That said, I have talked with most text analytics vendors recently,* and here are my impressions of what’s going on. Any contrary (or confirming!) opinions would be most welcome.
*Factiva is the most significant exception. Hint, hint.
If you think about it, text analytics is a “secret ingredient” in search, antispam, and data cleaning,* and this dominates all other uses of the technology. A significant minority of the research effort at companies that do any kind of text filtering is (duh) text analytics. Cold comfort for specialist text analytics vendors, to be sure, but that’s the way it is.
*I.e., part of the “T” in “ETL” (Extract/Transform/Load).
Text-analytics-enhanced custom publishing will surely at some point become a must-have for business and technical publishers. However, it appears that we’re not quite there yet, as large publishers make do with simple-minded search and the like. In what I suspect is a telling market commentary, there’s no headlong rush among vendors to dump text mining for custom publishing, notwithstanding the examples of nStein and (sort of) ClearForest. I don’t want to be overly negative – either my friends at Mark Logic are doing just fine or else they’re putting up a mighty brave front – but I don’t think the nonspecialist publishing market is there yet.
Posted in ClearForest and Reuters, Factiva and Dow Jones, Mark Logic, SAS, Search and text storage, Spam and antispam, Text Analytics Summit, Text mining, Voice of the Customer, nStein | 1 Comment »
March 21st, 2007 Curt Monash
We’ve now solidified the membership of the Text Analytics Summit marketing panel. It is:
- Curt Monash, President, Monash Information Services
- Dave Kellogg, CEO, Mark Logic Corporation
- Michelle De Haaff, VP Marketing, Attensity Corporation
- Michel Lemay, VP Marketing, nstein Technologies
- Mary Crissey, SAS Analytics Marketing Manager, SAS Institute
Michelle, Michel, and Mary are all obvious choices, responsible for marketing at leading text mining vendors. In addition, Mary has excelled on the same panel in the past, Michel sent me e-mail with some brilliant thoughts on the panel subject, and Attensity has one of the most interesting strategies in the text analytics market.
As for Dave, he’s simply one of the most astute marketing theorists working in software today. And he runs a very interesting text technology company. And he used to be the most senior marketing guy in all of business intelligence, when he was SVP at Business Objects. In his copious free time, he writes a really cool blog.
Posted in Attensity, Mark Logic, SAS, Text Analytics Summit, Text mining, nStein | 3 Comments »
December 27th, 2006 Curt Monash
So far as I can tell, Attensity’s strategy when the company was originally founded was rather like ClearForest’s strategy today – and vice-versa. That said, here’s where they seem to stand at this time:
- Attensity wants to make text analytics very easy to integrate into business intelligence and data mining – at the moment, they’re not too focused on the differences between those two disciplines – and is trying to deliver the best possible fact extraction consistent with that charter.
- ClearForest wants to provide really great information extraction — to the limits of what can be done without excessive knowledge engineering – and is trying to integrate as well as possible with other technologies, the better to serve the customers who need what they offer.
Posted in Attensity, ClearForest and Reuters, Mark Logic, TEMIS, Text mining | No Comments »
October 3rd, 2006 Curt Monash
Last July I wrote about Google’s text-based project management system. Dave Kellogg of Mark Logic offers links to discussion of a related Google project, and adds news of his own — Mark Logic built a text-based bug tracking system in its own MarkLogic technology.
Posted in Enterprise search, Google, Mark Logic, Search and text storage, Specialized search engines | No Comments »
August 26th, 2006 Curt Monash
I talked again with Mark Logic, makers of MarkLogic Server, and they continue to have an interesting story. Basically, their technology is better search/retrieval through XML. The retrieval part is where their major differentiation lies. Accordingly, their initial market focus (they’re up to 46 customers now, including lots of big names) is on custom publishing. And by the way, they’re a good partner for fact-extraction companies, at least in the case of ClearForest.
Here, as best I understand, is the story of the custom publishing business.
Posted in ClearForest and Reuters, Mark Logic, Search and text storage, Specialized search engines | 2 Comments »