December 31st, 2007 Curt Monash
Shortly after my first reference to Shoemoney’s DMOZ issues (who did you think I meant with “shoe in his mouth”?), I got mailbombed big time. Things calmed down after a month or so, although I did change web hosting companies in the fallout.
Starting Christmas Eve — which coincidentally was shortly after a forum mention of various Shoemoney flaps, and of the first attack — I got hit again. And there was another wave right after Christmas. A fair amount of email was lost forever, possibly both professional and personal. My blogs also were down for a while, as were other sites on the same server. (And if you sent me any email over that time period, please resend it.)
It seems that I should move my email/MX record to a different service from the one that hosts my websites, perhaps one that has invested in technology to efficiently deflect DDoS attacks. (Or perhaps I should move one domain with it, if a traditional hosting deal seems best.) Does anybody have any recommendations for such services? Read the rest of this entry »
Posted in Search engine optimization (SEO), Spam and antispam | No Comments »
December 23rd, 2007 Curt Monash
Text mining is science-project artificial intelligence. Fiction. Text mining is proven in many practical applications.
To implement text mining, you need computational linguists. Fact. Monash’s Second Law of Commercial Semantics states “Where there are ontologies, there is consulting.” And it’s linguists, or reasonable facsimiles of same, who do the consulting.
To use text mining, you need computational linguists. Fiction. When last I counted, the number of known computational linguists working for end-user organizations, worldwide, was precisely 1, at Procter & Gamble. (Intelligence agencies excepted, of course.) I’d guess it’s higher now, but I probably could still count them all without taking my socks off.
CRM applications are driving the growth of text mining. Fact. Most current growth in text mining seems to come from Voice of the Customer and Voice of the Market/competitive intelligence applications. And a couple of years ago, when SAS and SPSS had a joint boom in text mining, a lot of that was coming from CRM.
Text mining products are useful mainly for large enterprises. More fact than fiction. Text mining makes the most sense when you have too much text for humans to read and summarize.
Text mining doesn’t fit well with relational databases. Fiction. The fastest-growing text mining companies seem to be Attensity and Clarabridge, who consistently extract textual information into relational databases.
Text mining imposes structure on unstructured* data. More fact than fiction. Most text mining applications involve examining free-text documents and creating entries in relational or XML databases. Most people would call that a transition from unstructured to structured form.
*I still don’t like the “structured/unstructured” distinction, but with repetition I’m getting somewhat inured to it.
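To make the "unstructured to structured" transition concrete, here is a minimal Python sketch that pulls facts out of free text and stores them as rows in a relational table. It is a toy under stated assumptions, not how any vendor named here actually works: the `LEXICON` keyword table, the `facts` schema, and the function names are all hypothetical, and simple keyword lookup stands in for real linguistic extraction.

```python
import re
import sqlite3

# Hypothetical toy lexicon: term -> (topic, sentiment). Keyword lookup is a
# stand-in for real linguistic extraction; this is an illustration only.
LEXICON = {
    "late": ("delivery", "negative"),
    "rude": ("service", "negative"),
    "friendly": ("service", "positive"),
    "fast": ("delivery", "positive"),
}


def extract_facts(doc_id, text):
    """Return (doc_id, topic, sentiment, term) tuples found in free text."""
    words = re.findall(r"[a-z']+", text.lower())
    return [(doc_id, *LEXICON[w], w) for w in words if w in LEXICON]


def load_facts(conn, documents):
    """Create the (assumed) facts table and fill it from {doc_id: text}."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts "
        "(doc_id INTEGER, topic TEXT, sentiment TEXT, term TEXT)"
    )
    for doc_id, text in documents.items():
        conn.executemany(
            "INSERT INTO facts VALUES (?, ?, ?, ?)",
            extract_facts(doc_id, text),
        )
    conn.commit()


def sentiment_counts(conn):
    """Run ordinary SQL over the extracted rows."""
    return conn.execute(
        "SELECT topic, sentiment, COUNT(*) FROM facts "
        "GROUP BY topic, sentiment ORDER BY topic, sentiment"
    ).fetchall()
```

Once the rows exist, everything downstream is plain SQL, for example calling `load_facts` on an in-memory `sqlite3` connection with a handful of customer comments and then `sentiment_counts` to tally them by topic.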
Enterprise search is an alternative to text mining. Fact. You can use a high-end search engine to cluster documents and look for trends and insight. It’s not the real McCoy, but in some cases it gives you 80% of the benefit of the real thing.
Text mining is an ingredient, not a product category. Part fact, part fiction. The biggest text mining efforts in the world are probably at Google, Yahoo, Microsoft search, and Dow Jones/Factiva. Antispam vendors also invest a lot in text mining. Two of the top five independent text mining vendors were acquired this year (ClearForest and Inxight). And of the many dozens of small text mining independents, most are focused on specific niches.
Even so, Attensity, Clarabridge, and Temis show that, at least for now, text mining remains a legitimate product category.
The text mining industry is in trouble. Part fact, part fiction. As I recently ranted, even the leading text mining vendors are letting many opportunities pass them by. And like many software sectors, text mining seems poised to be absorbed via large-company acquisition. SAP has already secured a text mining business via BOBJ/Inxight, but at least one vendor each could easily be bought by Oracle, Microsoft (despite the in-house expertise from its search arm), and IBM (despite or even in connection with UIMA).
But in the meantime, a few small text mining vendors are still showing rapid growth.
Previous “fact and fiction” post: Data warehouse appliances.
Stay informed! No hassle, no spam — all it takes is an email address or an RSS subscription! Get all our research — on text analytics, DBMS, BI, and everything else — or just the text analytics part, or even just a very few notifications of our most important news.
Technorati Tags: Text mining, text analytics
Posted in Text mining | 2 Comments »
December 19th, 2007 Curt Monash
Scout Labs sounds like even more of what I was thinking of than Summize. It’s a shame that the “traditional” text mining vendors didn’t get there first.
Posted in Text mining, Voice of the Market/competitive intelligence | 2 Comments »
December 18th, 2007 Curt Monash
I’ve been thinking for a long time that the various text mining companies doing sentiment analysis should try some public-facing (or at least multi-customer) services. Investors might love such a thing. So might marketing managers (actually, Factiva claims to be active there, at least as per their web site). And as a key part of the strategy, text mining companies selling to enterprises might brand such a site and gain massive awareness accordingly. Well, it seems that public-facing sentiment analysis sites are springing up. At least, Summize has. (Hat tip to TechCrunch.) And the text mining vendors are nowhere to be seen.
So what else is new? Read the rest of this entry »
Posted in Application areas, Factiva and Dow Jones, Investment research and trading, Text mining | 1 Comment »
December 12th, 2007 Curt Monash
When Andrew McKay was at FAST, I grumped about his search/BI integration story. Now that he’s trying to do the same thing at a startup called Attivio, it sounds more plausible.
Attivio is having a house party and product rollout in the latter part of January, and details are scarce in the meantime. But here are some highlights.
- Attivio was founded in August. It has 21 people and 1 VC. The VC has invested >$6 million and committed >$12 million total.
- Attivio has ambitious plans for a fully integrated data management/real-time BI stack. It’s currently called the “Active Intelligence Engine.”
Read the rest of this entry »
Posted in Attivio, BI integration, Investment research and trading, Lucene, Open source text analytics | 1 Comment »
December 11th, 2007 Curt Monash
As part of the Monash Advantage program, I published a proprietary Monash Letter about online marketing … and another one … and some further stuff so proprietary I’m not even putting out teasers about it. Now I’ve taken the next step, and written another Letter with a complete overview of software-centric marketing strategy and tactics (lead generation aside). That’s proprietary too, and only available in full if you have access to the secure Monash Advantage website, but here are some semi-random highlights for public consumption. Read the rest of this entry »
Posted in Online marketing | No Comments »
December 9th, 2007 Curt Monash
Ina Fried reports on a Russian chatbot that sure sounds like it passes the Turing test. To wit (emphasis mine):
A program that can mimic online flirtation and then extract personal information from its unsuspecting conversation partners is making the rounds in Russian chat forums, according to security software firm PC Tools.
The artificial intelligence of CyberLover’s automated chats is good enough that victims have a tough time distinguishing the “bot” from a real potential suitor, PC Tools said. The software can work quickly too, establishing up to 10 relationships in 30 minutes, PC Tools said.
That said, threat reports from PC security companies are notoriously hyped, so I wouldn’t get too excited until there’s stronger confirmation. Read the rest of this entry »
Posted in Social software and media | 10 Comments »
December 8th, 2007 Curt Monash
Until the middle of this year, I got negligible search engine traffic from either MSN or Yahoo, or indeed any other search engine except Google. We’re literally talking a 90-95% share for Google, on each of my three main blogs, most months.
But in November, the Windows Live share was 19% on DBMS2, 29% on Text Technologies, and 41% on the Monash Report. And those aren’t blips; in each case there was steady August-November monthly growth. But on the other hand, early December month-to-date figures are all back down. Weird. Read the rest of this entry »
Posted in Microsoft and Windows Live Search, Search and text storage, Search engine optimization (SEO) | No Comments »
December 7th, 2007 Curt Monash
Here are some highlights of the QL2 story, per exec Mike McDermott.
- QL2’s main business is scraping price and other product offering data from the web for high-speed competitive analysis. For example, of their 250ish customers overall, over 90 are airlines. Online retailers are another big chunk of their customer base.
- QL2 also commonly partners with text mining companies in applications such as Voice of the Market or competitive intelligence. E.g., QL2 has been brought into a few deals each by Attensity, Clarabridge, and especially Temis.
- QL2 goes well beyond basic crawling. Notably, the system fills in forms with parameters. And of course it monitors pages for changes.
- QL2’s scripting language is, Mike tells me, very SQL-like. Hence the “QL” in the name.
- QL2 rolls its own filters, rather than using INSO or whoever. (Actually, what are the main file-reading filter choices these days? I’ve lost track.) Indeed, Mike fondly believes QL2 does a better job with PDFs than Adobe does.
- QL2 doesn’t want to be thought of as web-only. Rather, Mike likes my formulation of “text data ETL, web or otherwise.” That said, he freely admits QL2’s strength is in Extract rather than in Transform or Load.
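The "monitors pages for changes" point in the list above can be illustrated with a small Python sketch. This is one generic way to detect changes (hash a normalized snapshot and compare it with the previous one), not a description of QL2's actual method. The `fingerprint` function and `ChangeMonitor` class are hypothetical names, and fetching the page is deliberately left out, so the HTML is passed in as a string.

```python
import hashlib
import re


def fingerprint(html):
    """Hash page content after collapsing whitespace, so trivial
    reformatting does not register as a change."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ChangeMonitor:
    """Remembers the last fingerprint seen for each URL."""

    def __init__(self):
        self._seen = {}

    def check(self, url, html):
        """Return 'new', 'changed', or 'unchanged' for this snapshot."""
        digest = fingerprint(html)
        previous = self._seen.get(url)
        self._seen[url] = digest
        if previous is None:
            return "new"
        return "unchanged" if previous == digest else "changed"
```

A whole-page hash is the bluntest possible detector: any rotating ad or timestamp will trip it. A price-monitoring setup would more plausibly fingerprint only the extracted fields, which is where the Extract step matters.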
Read the rest of this entry »
Posted in Application areas, QL2, Text mining, Voice of the Market/competitive intelligence | No Comments »
December 2nd, 2007 Curt Monash
Danny Sullivan thinks blended vertical search — which he’s calling Search 3.0 — is a game changer. (In this context, “vertical” search denotes alternate result types such as video, image, map coordinates, or product listings.) In saying that, he’s focused on search marketers, who now have a lot more ways to try to get their messages onto Google searchers’ top result pages. But I presume what he’s really saying is that there will be a feedback effect — if Google tells all web searchers about videos and product listings, then internet marketers will be more motivated to post videos and product listings, and hence there will be more interesting choices of videos and product listings — which Google will naturally wind up featuring more prominently in its search results. And so on.
Given the YouTube explosion, I find it hard to argue with his claim.
Posted in Google, Search and text storage, Search engine optimization (SEO), Specialized search engines, Structured search | No Comments »