July 26, 2007

Event stream processors active in text filtering

OK. I’ve secured permission to quote the details on something I’d previously only hinted at: stream processing for text messages. Traditionally, that’s been the province of enterprise search companies. A decade ago, Verity had a kernel group of 6-7 engineers under Phil Nelson. They managed to produce not only a decent search engine, but a search engine “turned on its side” as well. I.e., instead of running one query against a whole corpus, they could run many stored queries against each document as it arrived, one document at a time. Since then, the same idea has been implemented by most enterprise search providers, at least those that are serious about the intelligence market.
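To make the “turned on its side” idea concrete, here is a minimal Python sketch. It is not Verity’s actual design. It assumes each stored query is a simple AND-list of terms, indexes each query under one of its terms, and checks each arriving document only against the queries whose index term the document contains. The class and method names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a document and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class StandingQueryMatcher:
    """Match each arriving document against many stored AND-queries."""

    def __init__(self):
        self.queries = {}              # query id -> set of required terms
        self.index = defaultdict(set)  # term -> ids of queries indexed on it

    def add_query(self, query_id, terms):
        required = {t.lower() for t in terms}  # assumes a non-empty term list
        self.queries[query_id] = required
        # Index the query under one of its terms: any matching document
        # must contain that term, so other documents never touch the query.
        self.index[min(required)].add(query_id)

    def match(self, document):
        words = tokenize(document)
        candidates = set()
        for word in words:
            candidates |= self.index.get(word, set())
        # Verify that every required term of each candidate is present.
        return sorted(q for q in candidates if self.queries[q] <= words)


matcher = StandingQueryMatcher()
matcher.add_query("q1", ["merger", "acquisition"])
matcher.add_query("q2", ["earnings"])
matcher.add_query("q3", ["merger", "lawsuit"])
print(matcher.match("Merger and acquisition talks resumed"))  # ['q1']
print(matcher.match("Quarterly earnings beat estimates"))     # ['q2']
```

Picking the index term with `min()` is just a deterministic placeholder. A real system would more plausibly index each query under its rarest term, so that fewer documents trigger a full check.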

Well, the event-processing guys are active in that market too. At least StreamBase is. It was an obvious guess to ask whether they were, and over the past few months I’ve gotten confirmation (including that they partner with Inxight, just like almost everybody else does). Here, quoted with permission and lightly edited, is what StreamBase VP Marketing Bill Hobbib has to say on the subject:

Regarding text filtering and processing, we’ve done work in financial services for a hedge fund and extensive work with the federal government and intelligence community. The existing StreamBase schema and message/field structure supports a wide array of message types, and text parsing can be done in either the SMTP adapter or the StreamBase engine, where the text processing occurs. Multiplexing onto different streams/schemas is supported.

In terms of text processing, StreamBase can process emails, documents, sentences, individual words or letters using StreamSQL’s existing capabilities, including a variety of built-in standard string operations, or user-defined functions, custom operators, and user-defined aggregates. We can also partner with text-mining companies.

Though you didn’t ask about processing media such as audio or video data, the new BLOB datatype is designed for this purpose. We partner with feature extraction vendors (e.g. speech to text) where necessary, as we don’t process the native audio or video data, just the metadata or the extracted features and converted files.
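The quote doesn’t spell out StreamBase’s SMTP adapter or StreamSQL syntax, so here is only a rough plain-Python sketch of the general flow Hobbib describes. An adapter-like step parses raw email into flat tuples. A router then multiplexes those tuples onto two streams with different schemas. Finally, an aggregate-style operator keeps word counts over a sliding window of recent messages. Every name here is a hypothetical illustration, not a StreamBase construct: `EmailTuple`, `AlertTuple`, `WATCH_WORDS`, `parse_email`, `multiplex`, and `WindowedTermCount`. The same goes for the routing rule and the count-based window.

```python
import re
from collections import Counter, deque
from dataclasses import dataclass
from email import message_from_string

WATCH_WORDS = ("urgent", "breach")  # hypothetical routing rule


@dataclass
class EmailTuple:
    """Schema for the general message stream."""
    sender: str
    subject: str
    body: str


@dataclass
class AlertTuple:
    """Schema for the alert stream: records the watch word that triggered it."""
    sender: str
    subject: str
    keyword: str


def parse_email(raw):
    """Adapter-side parsing of a single-part RFC 822 message into a flat tuple."""
    msg = message_from_string(raw)
    return EmailTuple(msg.get("From", ""), msg.get("Subject", ""), msg.get_payload())


def multiplex(tuples):
    """Route parsed messages onto two streams with different schemas."""
    streams = {"alerts": [], "general": []}
    for t in tuples:
        hit = next((w for w in WATCH_WORDS if w in t.subject.lower()), None)
        if hit:
            streams["alerts"].append(AlertTuple(t.sender, t.subject, hit))
        else:
            streams["general"].append(t)
    return streams


class WindowedTermCount:
    """Aggregate-style operator: word counts over the last `size` messages."""

    def __init__(self, size):
        self.size = size
        self.window = deque()    # per-message Counters currently in the window
        self.totals = Counter()  # running sum of those Counters

    def add(self, text):
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        self.window.append(counts)
        self.totals.update(counts)
        if len(self.window) > self.size:
            # Evict the oldest message and subtract its contribution.
            self.totals.subtract(self.window.popleft())
            self.totals += Counter()  # drop zero and negative counts
```

For example, feeding each general-stream body into `WindowedTermCount(size=100)` would keep a running word count over the 100 most recent messages. Keeping a per-message `Counter` in the window makes eviction a simple subtraction instead of a recount.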

More on StreamBase can be found over on DBMS2, e.g. in this post or this one.

Comments

2 Responses to “Event stream processors active in text filtering”

  1. Text Technologies » Blog Archive » More on text processing in CEP on August 3rd, 2007 9:21 pm

    [...] isn’t the only complex event/stream processing (CEP) vendor doing text processing. Progress Apama is as well. Stemming, fuzzy matching, and so on seem to happen all the time. But [...]

  2. DBMS2 — DataBase Management System Services » Blog Archive » Applications for not-so-low-latency CEP on April 25th, 2008 12:07 am

    [...] matching and filtering requirements are just a better fit for the CEP paradigm. For example, StreamBase, Apama, and Coral8 each have some degree of activity in text [...]
