OK. I secured permission to actually quote the details on something I’d previously dropped a small hint about: stream processing for text messages. Traditionally, that’s been the province of enterprise search companies. A decade ago, Verity had a kernel group of 6-7 engineers under Phil Nelson. They managed to produce not only a decent search engine, but a search engine “turned on its side” as well. I.e., instead of running one query against a whole corpus, they could run many stored queries against each document as it arrived, one document at a time. Subsequently, the same idea has been implemented by most enterprise search providers, at least those that are serious about the intelligence market.
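To make the “turned on its side” idea concrete, here is a minimal Python sketch. It is my own illustration, not Verity’s or anybody else’s implementation: the names `StandingQuery`, `tokenize`, and `match_stream` are hypothetical, and the matching rule (every query term must appear in the document) is a deliberately simple assumption.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, List, Set, Tuple


@dataclass(frozen=True)
class StandingQuery:
    """A stored query (hypothetical): matches when all terms appear in a document."""
    name: str
    terms: FrozenSet[str]


def tokenize(text: str) -> Set[str]:
    # Lowercase the text and keep runs of letters and digits as words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_stream(
    queries: List[StandingQuery],
    documents: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, List[str]]]:
    """Run every stored query against each document as it arrives."""
    for doc_id, text in documents:
        words = tokenize(text)
        # The queries are fixed; the documents are the moving part.
        hits = [q.name for q in queries if q.terms <= words]
        yield doc_id, hits


queries = [
    StandingQuery("wire-transfer", frozenset({"wire", "transfer"})),
    StandingQuery("merger", frozenset({"merger"})),
]
arriving = [
    ("d1", "Please confirm the wire transfer today."),
    ("d2", "Merger talks resume; wire services report."),
    ("d3", "Lunch?"),
]
results = list(match_stream(queries, arriving))
```

A real engine would index the queries rather than loop over all of them for every document, but the inversion is the same: the queries sit still and the documents flow past.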
Well, the event-processing guys are active in that market too. At least StreamBase is. It was an obvious thing to ask about, and over the past few months I’ve gotten confirmation (including that they partner with Inxight just like almost everybody else does). Here, quoted with permission and lightly edited, is what StreamBase VP Marketing Bill Hobbib has to say on the subject:
Regarding text filtering and processing, we’ve done work in financial services for a hedge fund and extensive work with the federal government and intelligence community. The existing StreamBase schema and message/field structure supports a wide array of message types, and text parsing can be done in either the SMTP adapter or the StreamBase engine, where the text processing occurs. Multiplexing onto different streams/schemas is supported.
In terms of text processing, StreamBase can process emails, documents, sentences, individual words or letters using StreamSQL’s existing capabilities, including a variety of built-in standard string operations, or user-defined functions, custom operators, and user-defined aggregates. We can also partner with text-mining companies.
Though you didn’t ask about processing media such as audio or video data, the new BLOB datatype is designed for this purpose. We partner with feature extraction vendors (e.g. speech to text) where necessary, as we don’t process the native audio or video data–just the metadata or the extracted features and converted files.
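For readers who want a feel for what parsing a message into fields and multiplexing it onto different streams might look like, here is a rough Python sketch. To be clear, this is neither StreamSQL nor any StreamBase API: `parse_message`, `route`, the field names, and the keyword rules are all hypothetical, and plain substring matching stands in for whatever string operations or user-defined functions a real deployment would use.

```python
from email import message_from_string
from typing import Dict, List, Tuple


def parse_message(raw: str) -> Dict[str, str]:
    """Parse a raw, non-multipart email into a flat record (assumed schema)."""
    msg = message_from_string(raw)
    return {
        "sender": msg.get("From", ""),
        "subject": msg.get("Subject", ""),
        "body": msg.get_payload(),  # a str for simple single-part messages
    }


def route(
    record: Dict[str, str],
    rules: List[Tuple[str, str]],
    default: str = "unmatched",
) -> List[str]:
    """Return the names of every stream whose keyword appears in the message."""
    text = (record["subject"] + " " + record["body"]).lower()
    streams = [stream for stream, keyword in rules if keyword in text]
    return streams or [default]


# Hypothetical (stream name, keyword) routing rules.
rules = [("trading", "volume"), ("security", "password")]

raw = (
    "From: analyst@example.com\n"
    "Subject: Trade alert\n"
    "\n"
    "Unusual volume in ACME options.\n"
)
record = parse_message(raw)
streams = route(record, rules)
```

One message can land on several streams at once, which is the point of multiplexing: each downstream consumer sees only the slice it cares about.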