April 29, 2008

Mark Logic viewed as a different kind of text search technology vendor

I’m putting up two posts this morning on Mark Logic and its MarkLogic product family. The main one, over on DBMS2, outlines the technical architecture — focusing on MarkLogic as an XML database management system — and provides a bit of overall context. This post attempts to position MarkLogic against alternative kinds of text analytics engine.

For the most part, MarkLogic is indeed sold (and bought) for the storage, manipulation, and retrieval of text. (One long-confidential exception to this rule is scheduled to be unveiled at the June user conference.) Most applications seem to fit a custom publishing/enhanced search paradigm:

  1. Ingest text.

  2. Enhance it.

  3. Serve it up in chunks, typically via a sophisticated search interface.

Differences vs. conventional search engines include:

Mark Logic also claims huge advantages in corpus administration. Scalability seems good too; there’s a national-intelligence customer with a 200 terabyte database. And they’re proud of a feature called lexicons, although it seems so obvious to me that I’ve so far failed to muster what they’d probably regard as the proper level of excitement about it. (In SQL terms, it seems to be a combination of SELECT and COUNT DISTINCT, both of which are capabilities I’d think would be in XQuery anyway.)
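To make that SQL analogy concrete, here is a minimal sketch in Python. It assumes a lexicon amounts to the distinct values of some element plus a count of how often each occurs. That reading is my assumption based on the description above, not Mark Logic's definition, and the function name `build_value_lexicon` and the sample documents are hypothetical.

```python
from collections import Counter


def build_value_lexicon(documents, field):
    """Map each distinct value of `field` to the number of documents carrying it.

    Hypothetical helper: roughly SELECT field, COUNT(*) ... GROUP BY field,
    not an actual MarkLogic API.
    """
    counts = Counter()
    for doc in documents:
        value = doc.get(field)
        if value is not None:  # skip documents that lack the field
            counts[value] += 1
    # Sort by value so the distinct values can be browsed in order.
    return dict(sorted(counts.items()))


# Made-up sample corpus, for illustration only.
docs = [
    {"author": "Smith", "title": "A"},
    {"author": "Jones", "title": "B"},
    {"author": "Smith", "title": "C"},
    {"title": "D"},
]

lexicon = build_value_lexicon(docs, "author")
distinct_values = list(lexicon)  # the "SELECT DISTINCT" half of the analogy
```

The point of the sketch is only that the distinct values and their counts come out of one precomputed structure, rather than from a scan at query time.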

April 25, 2008

Twitter is indeed replaceable

Dennis Howlett believes any hope of monetizing [Twitter] rests upon reliability at scale. He’s partially right. Michael Arrington disagrees, essentially asserting that Twitter has become an unshakable monopoly due to the network effect, but his reasoning is flawed.

April 25, 2008

Investment text mining job listing

As per this job listing, at least one “major NYC investment bank” plans to do text mining on a proprietary trading desk.

The successful candidate will mine text data from numerous news sources and incorporate the information [into] the proprietary trading systems.

April 25, 2008

Drive-by Google de-listing

As previously noted, we got hit with some hidden text, probably by SQL injection, and that led to a Google de-listing. Of the three blogs affected by the attack, I got a de-indexing notice for one (DBMS2); another was de-listed without a notice (Text Technologies); and a third seems to have waltzed through still indexed (Software Memories). I also received a de-indexing notice for another site I have nothing to do with and indeed had never heard of before. Go figure …

We’ve now upgraded to WordPress 2.5, which should close the vulnerability. (Thank you Melissa Bradshaw!) Fearing our old, buggy theme would degrade further, we upgraded to a new one, Biru, designed by Bob. There are some teething-pain stability issues, but if they don’t cause a reversion in the next day, I’ll apply to Google for re-inclusion. (Uh, does anybody have some boundaries around how long that’s likely to take?)

All these hours of aggravation because some criminal wanted a bit of SEO advantage …

April 7, 2008

Yahoo indeed seems to want an all cash deal

The Microsoft/Yahoo negotiation is in a very public phase right now. In its latest letter, the Yahoo board makes two references to “certainty,” in one case spelling out that this encompasses “certainty of value” and “certainty of closing.”

It’s hard to imagine what the former could mean other than “Please make an all-cash offer (or, better yet, go away).” But as I previously noted, Microsoft can indeed afford to buy Yahoo entirely for cash.

The latter part is a reference to the antitrust boogeyman, obviously a non-trivial concern whenever Microsoft is involved.
