January 31st, 2008 Curt Monash
I caught up with Expert System S.p.A. last week. They turn out to be doing $10 million in annual text technology revenue. That alone is surprising (sadly), but what’s really remarkable is that they did it almost entirely in the Italian market. As you might guess, that figure includes a little bit of everything, from search engines to Italian-language filters for Microsoft Office to text mining. But only $3 ½ million of Expert System’s revenue is from the government (and I think that includes civilian agencies), and under 30% is professional services, so on the whole it seems like a pretty real accomplishment. Oh yes – Expert System says it’s entirely self-funded.
As of last year, Expert System also has English-language products, and a couple of minor OEM sales in the US (for mobile search and semantic web applications). German- and Arabic-language products are in beta test. The company says that its market focus going forward is national security – surely the reason for the Arabic – and competitive intelligence. It envisions selling through partners such as system integrators, although I think that makes more sense for the government market than it does vis-a-vis civilian companies. In February the company is introducing a market intelligence product focused on sentiment analysis.
Expert System is a bit of a throwback, in that it talks lovingly of the semantic network that informs its products.
Posted in Application areas, Enterprise search, Expert System S.p.A., Ontologies and context identification, Search and text storage, Text mining, Voice of the Market/competitive intelligence | No Comments »
January 28th, 2008 Curt Monash
We all know how “The Year of X” kinds of predictions go. Still, when I read that Forrester Research says enterprises are ready to seriously adopt wikis and message forums, it made sense to me. Email threads — via Notes/Exchange or otherwise — aren’t doing the job any more. It’s time to go straight to communally-created web pages.
Personally, I think it’s also time to replace more of those email disasters with broadcasts over something like an enterprise version of Twitter. Clearly, enterprise Twitter would have to have a lot more tagging, group filtering, and automated censorship (::sigh::) than current public Twitter. But that all fits very well into the CEP-based architecture (or some near equivalent) that I believe to be the future of Twitter anyway. So would a complete integration between enterprise Twitter and point-to-point enterprise instant messaging.
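The post doesn’t spell out what such a system would look like. Purely as an illustration, here’s a minimal Python sketch of just the filtering step, assuming each broadcast carries topic tags plus a list of groups allowed to see it, and treating “automated censorship” as a simple blocked-word check. All names and rules here are hypothetical, and this is plain filter matching rather than a real CEP engine, which would add event windows and correlation on top.

```python
from dataclasses import dataclass, field

# Hypothetical blocked-word list standing in for "automated censorship".
BLOCKED_WORDS = {"confidential"}


@dataclass
class Message:
    author: str
    text: str
    tags: set = field(default_factory=set)    # topic tags, e.g. {"pricing"}
    groups: set = field(default_factory=set)  # groups allowed to see it; empty = everyone


@dataclass
class Subscription:
    user: str
    user_groups: set                               # groups this user belongs to
    wanted_tags: set = field(default_factory=set)  # empty = wants everything


def passes_censor(msg: Message) -> bool:
    """Reject messages containing any blocked word (case-insensitive)."""
    words = {w.strip(".,!?;:").lower() for w in msg.text.split()}
    return not (words & BLOCKED_WORDS)


def route(msg: Message, subs: list) -> list:
    """Return the users who should receive msg, in subscription order."""
    if not passes_censor(msg):
        return []
    recipients = []
    for s in subs:
        # Group filtering: the reader must belong to one of the allowed groups.
        visible = not msg.groups or bool(msg.groups & s.user_groups)
        # Tag filtering: the reader must want at least one of the message's tags.
        interested = not s.wanted_tags or bool(msg.tags & s.wanted_tags)
        if visible and interested:
            recipients.append(s.user)
    return recipients
```

For example, a pricing update restricted to the sales group reaches only sales staff who follow the pricing tag, while an unrestricted outage notice reaches everyone who either follows outages or hasn’t narrowed their subscription at all.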
Technorati Tags: social software, Enterprise 2.0, forums, Forrester Research
Posted in Social networking, Social software and media, Twitter | No Comments »
January 26th, 2008 Curt Monash
Here’s a post that gives you a clear sense of how gobbledygook is automatically generated (from another knowledgeable black-hat SEO who can’t be bothered to make his permalink structure sensible).
Posted in Online marketing, Search engine optimization (SEO), Spam and antispam | No Comments »
January 18th, 2008 Curt Monash
I don’t know how pronounced this trend is, but Google web search seems to be putting more emphasis on phrases than it used to.
For starters, Google doesn’t always ignore stopwords. The queries The Fly and Fly produce different search results. Beyond that, “or” is sometimes assumed to be a word you’re searching on, not an operator; for example, try live free or die and see the line of text that comes back under the search box. (I’m not sure whether this ever works for “and” as well; even Sanford and Son returns the usual harangue that “the AND operator is unnecessary”.) This is all a pretty clear indicator that Google is looking at phrases. Bill Slawski’s patent-analysis-heavy SEO blog has a lot more to say on that subject, specifically on an indexing scheme that addresses the problems that indexing stopwords might otherwise cause.
Also, there’s a direct series of patents on “Phrase-Based Indexing.”
Finally, although I don’t recall a link, there seems to be a belief that:
- Google is using or moving to Latent Semantic Indexing (LSI)
- Word-based LSI is patented by somebody else.
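Nobody outside Google knows exactly how its phrase indexing works; the patents Slawski analyzes describe ideas, not production code. As a toy illustration of why keeping stopwords matters, here’s a minimal positional index in Python (the function names and sample documents are hypothetical) that indexes “the” and “or” like any other word, so a phrase query for the fly matches a narrower set of documents than fly alone.

```python
from collections import defaultdict


def build_index(docs):
    """Positional inverted index: term -> {doc_id: [positions]}.

    Stopwords such as "the" and "or" are indexed like any other word.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return the ids of documents containing the words of `phrase` consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return set()
    # Only documents that contain every term can match.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = set()
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            # The i-th query term must sit exactly i positions after the first one.
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits
```

If the indexer dropped stopwords instead, the fly would collapse to fly and both queries would return the same documents, which is roughly the older behavior.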
Posted in Google, Search and text storage | 3 Comments »
January 17th, 2008 Curt Monash
Here are the top 200 tags (words? subjects? themes?) in the Iliad, per IBM Research.
Neither Paris nor Helen makes the list. Either Homer couldn’t stay on topic, or else the ostensible reasons for the war had little to do with the real issues. I say it’s the latter. Plus ça change, plus c’est la même chose.
Evidently one can upload one’s own data there to make one’s own visualizations.
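IBM’s page doesn’t explain how the tags were picked. A minimal sketch, assuming a “tag” is simply one of the most frequent words once common stopwords are thrown out (the real tool may stem, weight, or merge terms differently, and the stopword list below is a hypothetical stand-in), fits in a few lines of Python:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real tool would use a much longer one.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "his", "he", "him", "with", "that", "for"}


def top_tags(text, n=200, stopwords=STOPWORDS):
    """Return the n most frequent non-stopwords as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)
```

On this reading, a name misses the top 200 only if it is mentioned less often than 200 other words.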
Posted in Uncategorized | No Comments »
January 17th, 2008 Curt Monash
Lynda Moulton and I see enterprise search quite similarly, as I discovered when she called me yesterday to praise my post on the many differences between enterprise and web search, and followed up with this one of her own. One of Lynda’s big themes is that large enterprises, much as they use multiple database management systems, use multiple search engines too.
Posted in Business Objects and Inxight, Enterprise search, Search and text storage | 4 Comments »
January 17th, 2008 Curt Monash
The Reg passes along a Reuters story that Hungarian scientists have built a system to automatically understand canine vocalizations. I’d like to say it’s a woof-to-Magyar translator, but apparently all it does is recognize the doggies’ emotional states. The story reports that the system has 43% accuracy, vs. 40% for humans.
I must confess, however, to being somewhat puzzled about how they measure success. Does the pooch fill out a survey form afterwards? Do they conclude that the beast wasn’t angry if the experimenter doesn’t get bitten?
I need to know a bit more about the research protocol before I know what to think about this.
EDIT: The CBC has a little more detail. The underlying research paper is appearing in Animal Cognition.
Posted in Natural language and speech recognition, Speech recognition | 1 Comment »
January 16th, 2008 Curt Monash
XMCP writes one of the better black hat SEO blogs. In a post last November, he laid out a ton of advice about automating black hat SEO. Personally, I don’t approve of doing black hat SEO. Still, it’s an intellectually interesting subject. What’s more, black hat SEOs create a large fraction of all websites, and certainly of all blog comments, links, and so on. So it’s interesting to track them.
Most interesting to me and probably to most readers here is the part that shows where black hat SEOs get their content: Read the rest of this entry »
Posted in Search engine optimization (SEO), Spam and antispam | 2 Comments »
January 14th, 2008 Curt Monash
Stephan Spencer has a great interview with Matt Cutts of Google, from last month’s Pubcon. Almost all of it is SEO-related. But it also contains a few tidbits that may be interesting even if one doesn’t care about SEO, such as:
- Google now indexes up to 1/2 a megabyte per page, up from the old 101K limit.
- Google needs to do a fair amount of image recognition, but they’re going fairly plain-vanilla. For Flash they use an Adobe-supplied SDK. For detecting hidden text (e.g., white-on-white) they use what Matt characterizes as pretty simple heuristics.
- As I noted recently, Google seems to have a lot of heuristics for identifying particular types of pages. In this interview, the example was that a page that would otherwise seem spammy because it consisted only of links would be fine if it were serving as a true site map or archive.
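Matt didn’t say what those heuristics actually are. Purely to illustrate the flavor of “pretty simple heuristics,” here are two toy checks in Python: one flags text whose color nearly matches its background, the other flags pages that are mostly links unless they’re marked as a site map or archive. The function names, thresholds, and the assumptions that colors arrive as hex strings and that link and word counts have already been extracted are all hypothetical.

```python
def _hex_to_rgb(color):
    """Convert '#fff' or '#ffffff' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    if len(color) == 3:
        color = "".join(c * 2 for c in color)
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def looks_hidden(text_color, background_color, threshold=30):
    """Flag text whose color is nearly identical to its background (e.g. white-on-white)."""
    fg, bg = _hex_to_rgb(text_color), _hex_to_rgb(background_color)
    distance = sum(abs(a - b) for a, b in zip(fg, bg))  # Manhattan distance in RGB space
    return distance < threshold


def looks_like_link_farm(num_links, num_words, is_sitemap_or_archive, max_ratio=0.5):
    """Flag pages that are mostly links, unless they serve as a site map or archive."""
    if is_sitemap_or_archive:
        return False
    if num_words == 0:
        return num_links > 0
    return num_links / num_words > max_ratio
```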
SEO highlights included: Read the rest of this entry »
Posted in Google, Search and text storage, Search engine optimization (SEO) | No Comments »
January 14th, 2008 Curt Monash
Eric Lai wrote in this week’s Computerworld about “Why is enterprise search harder than Google Web search?” Highlights included:
- He described enterprise search as consisting mainly of a search box plus faceted searching, with maybe some automated tagging as well.
- He observed that off-page factors such as PageRank don’t work nearly as well in an enterprise as they do on the Web, and that manual tagging by enterprise users falls far short of closing the gap.
- He stumbled a bit when comparing and contrasting search engines with “structured” DBMS.
- He basically endorsed the worldview of Ali Riaz, late of FAST, now of Attivio.
On the whole, that’s not bad. If this were an easy subject to write about, I’d have explained it a lot more clearly in the past myself. OK. Let me get off my duff and give it a whirl now.
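To make the point about off-page factors concrete, here’s a minimal textbook PageRank in Python (power iteration with the commonly used damping factor of 0.85; this is the classic published formulation, not whatever Google actually runs today). The two link graphs in the usage example are hypothetical: on a web-like graph the page everyone links to stands out, while in an intranet-style pile of documents that don’t link to one another, every page ends up with the same score, so the signal tells you nothing.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over {page: [pages it links to]}."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
intranet = {"memo1": [], "memo2": [], "spec": [], "policy": []}
print(pagerank(web))
print(pagerank(intranet))
```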
Posted in Attivio, Enterprise search, FAST, Google, Search and text storage | 12 Comments »