I believe there are two ways search will improve significantly in the future. First, since talking is easier than typing, speech recognition will allow longer and more accurate input strings. Second, search will be informed by much more persistent user information, with search companies developing a very detailed understanding of searchers. Based on that, I expect:
- A small oligopoly dominating the conjoined businesses of mobile device software and search. The companies most obviously positioned for membership are Google and Apple.
- The continued and growing combination of search, advertisement/recommendation, and alerting. The same user-specific data will be needed for all three.
- A whole lot of privacy concerns.
My reasoning starts from several observations:
- Enterprise search is greatly disappointing. My main reason for saying that is anecdotal evidence — I don’t notice users being much happier with search than they were 15 years ago. But business results are suggestive too:
  - HP just disclosed serious problems with Autonomy.
  - Microsoft’s acquisition of FAST was a similar debacle.
  - Lesser enterprise search outfits never prospered much. (E.g., when’s the last time you heard mention of Coveo?)
  - My favorable impressions of the e-commerce site search business turned out to be overdone. (E.g., Mercado’s assets were sold for a pittance soon after I wrote that, while Endeca and Inquira were absorbed into Oracle.)
  - Lucene/Solr’s recent stirrings aren’t really in the area of search.
- Web search, while superior to the enterprise kind, is disappointing people as well. Are Google’s results any better than they were 8 years ago? Google’s ongoing hard work notwithstanding, are they even as good?
- Consumer computer usage is swinging toward mobile devices. I hope I don’t have to convince you about that one.
In principle, there are two main ways to make search better:
- Understand more about the documents being searched over. But Google’s travails, combined with the rather dismal history of enterprise search, suggest we’re well into the diminishing-returns part of that project.
- Understand more about what the searcher wants.
The latter, I think, is where significant future improvement will be found.
SOPA (Stop Online Piracy Act) is getting blasted all over the Internet. Even so, one of its major dangers has not yet been widely discussed. People seem to realize that SOPA can create censorship by governments, or businesses, or as collateral damage when governments and businesses pursue other interests. But they may not yet grasp that SOPA can allow individuals to stifle free speech as well.
To quote the owner of a popular sports fan discussion forum (emphasis mine):
The problem is several of the provisions in SOPA will force ISPs hosting websites (ie: the company that hosts our servers) to potentially disconnect us from the Internet if there’s a claim – unsubstantiated or not – that we’re infringing against copyright, regardless of if it has not been fully proved in court. The argument is that this would make it easy for someone to make false or weak claims against the site to take a us offline until we went to court.
That’s a headache I’m not prepared to deal with. The number of threats I get each year via e-mail from angry members from other teams we remove are pretty unreal and obviously you guys don’t see them, so giving any additional ammunition backed up by a law like this would be a potentially huge issue. I’ve been talking with other sites and it’s a very real concern that we’re all potentially going to be faced with if this goes through, unless it’s rewritten to better target the sites that are really the ones they’re looking to address.
And that’s just from the passions of sports fandom. The passions of politics, or the commercial interests of those being criticized, are of even greater concern.
Indeed, SOPA-like legislation creates an easy way to take down any forum, blog, or other site that allows user-generated content: flood it with copyrighted content, then run to the regulators. We must never, ever, ever accept a legal regime in which publishers may be censored before they are PROVED to be guilty of wrongdoing.
In case you missed it, Sarah Lacy has launched Pando Daily, aka “Spawn of TechCrunch”. It has a clear mission statement, which she phrased as
the site-of-record for that startup root-system and everything that springs up from it, cycle-after-cycle
and mentor/investor/board member Mike Arrington simply called
to be the paper of record for Silicon Valley
That, I believe, is the form a journalistic mission statement should take:
- “We (will) offer the best X about Y”, where …
- … “X” is something like news or analysis or opinion and …
- … “Y” is a particular subject area.
But there’s a problem with that template. One would ideally wish a mission statement of the form “We do the best A” to be followed up by “and, obviously, people will pay lots of money for A”. Journalistic mission statements don’t have that nice property.
Fortunately, at least in the case of tech blogging, they do tend to have a nice substitute. Let me explain.
The recent Dreamforce conference (i.e., salesforce.com’s extravaganza) focused attention on “the social enterprise” or, more generally, enterprises’ uses of social technology. salesforce is evidently serious about this push, with development/acquisition investment (e.g., Chatter, Radian6), marketing focus (e.g., much of Dreamforce) and sales effort (Marc Benioff says he got thrown out of a CIO’s office because he wouldn’t stop talking about the “social” subject) all aligned.
It’s a cool story, and worthy of attention. But I’d like to step back and remind us that there are numerous different ways to use social technology in the enterprise, which probably shouldn’t be confused with each other. And then I’d like to discuss one area of social technology that’s relatively new to me: integration between social and operational applications.
I wasn’t asked to moderate a panel at the Text Analytics Summit because the guy running it — NOT Seth Grimes — didn’t feel “comfortable” with me doing so. (I wanted real discussion; Ezra evidently just wanted to buy off sponsors and partners with marketing-opportunity slots.) I also wasn’t given a press pass.* (Although uninterested in the sessions, I was interested in stopping by and meeting some newer vendors.)
*This is despite my having spoken at four prior versions of the event, and having responded to their request for free consulting as recently as this year.
They have a business model that does not apply well to the IT conference space.
The Text Analytics Summit has been troubled for years, but evidently things have gotten worse.
This is more than an incidental problem. Interest in text data is exploding, and marketplace confusion about text analytics technology abounds. More clarity is needed, but too few folks have found an economic model for providing it. (The industry shares some of the blame for that.) I’m glad Seth is doing other conference work, notably on sentiment analysis, but yet more is needed.
If I get into the conference business — and it seems natural that I would — I’ll try to help fill the gap. But if somebody else beats me to the punch, more power to you, and please let me know how I can help.
Text analytics application areas typically fall into one or more of three broad, often overlapping domains:
- Understanding the opinions of customers, prospects, or other groups. This can be based on documents the user organization controls (email, surveys, warranty reports, call center logs, etc.), on public-domain documents such as blogs, forum posts, and tweets, or on any combination of the two. The former is usually called Voice of the Customer (VotC), while the latter is Voice of the Market (VotM).
- Detecting and identifying problems. This can happen across many domains — VotC, VotM, diagnosing equipment malfunctions, identifying bad guys (from terrorists to fraudsters), or even getting early warnings of infectious disease outbreaks.
- Aiding text search, custom publishing, and other electronic document-shuffling use cases, often via document augmentation.
For several years, I’ve been distressed at the lack of progress in text analytics or, as it used to be called, text mining. Yes, the rise of sentiment analysis has been impressive, and higher volumes of text data are being processed than were before. But otherwise, there’s been a lot of the same old, same old. Most actual deployed applications of text analytics or text mining go something like this:
- A bunch of documents are analyzed to ascertain the ideas expressed in them.
- A count is made as to how many times each idea turns up.
- The application user notices any surprisingly large numbers and, as a result, pays attention to the corresponding ideas.
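To make that three-step pattern concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor’s method: the keyword lexicon `IDEAS`, the helper names, the 2× `ratio` threshold, and the `min_count` floor are all hypothetical, and naive keyword matching stands in for real concept extraction. The `min_count` floor reflects the fact that a big percentage jump on a tiny count may mean very little.

```python
import re
from collections import Counter

# Hypothetical keyword-to-idea lexicon; a stand-in for real concept extraction.
IDEAS = {
    "battery": "battery life",
    "screen": "display quality",
    "refund": "refunds",
}


def extract_ideas(doc: str) -> set[str]:
    """Step 1: return the ideas a document expresses (naive keyword matching)."""
    words = set(re.findall(r"[a-z]+", doc.lower()))
    return {idea for keyword, idea in IDEAS.items() if keyword in words}


def count_ideas(docs: list[str]) -> Counter:
    """Step 2: count how many documents mention each idea."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(extract_ideas(doc))
    return counts


def surprising(current: Counter, baseline: Counter,
               ratio: float = 2.0, min_count: int = 2) -> dict[str, tuple[int, int]]:
    """Step 3: flag ideas whose count rose by at least `ratio` versus a baseline.

    `min_count` skips ideas with too few mentions, where a single stray
    article could produce a huge percentage jump.
    """
    flagged = {}
    for idea, n in current.items():
        base = baseline.get(idea, 0)
        if n >= min_count and n >= ratio * max(base, 1):
            flagged[idea] = (base, n)
    return flagged


# Hypothetical sample data for two periods.
last_month = ["The battery died", "Great screen", "I want a refund"]
this_month = ["Battery drains fast", "battery died again",
              "the battery is hot", "screen cracked"]
print(surprising(count_ideas(this_month), count_ideas(last_month)))
# prints {'battery life': (1, 3)}
```

Everything interesting in real deployments happens inside `extract_ideas`; the counting and flagging steps are, as the list above suggests, fairly mundane.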
Often, it seems desirable to integrate text analytics with business intelligence and/or predictive analytics tools that operate on tabular data. Even so, such integration is most commonly weak or nonexistent. Apart from the usual reasons for silos of automation, I blame this on a mismatch in precision. A 500% increase in mentions of a subject could be simple coincidence, or the result of a single identifiable press article. By comparison, a 5% increase in a conventional business metric might be much more important.
But in fairness, the text analytics innovation picture hasn’t been quite as bleak as what I’ve been painting so far.
Time for a notes/links/comments post just for Text Technologies:
Jonathan Stray reminds us of an excellent point:
New Media journalism should be thought of as a product that people use, not as a collection of stories or other pieces.
In particular, he argues:
- The value of journalism can only be assessed in connection with how people use it …
- … and their lack of enthusiasm about New Media news is a warning sign.
- Technology and form factor matter; imitating old media is likely not the best way to go.
- Personalization and targeting need to be a lot better. In particular:
  - What’s most important is getting stories to the people who are likely to want to act on what’s in them. The true value of journalism lies in informing people’s choices and actions. (By contrast, he seems to denigrate the other main benefits of news, which are pure entertainment and/or the facilitation of social interaction.)
  - It’s OK and natural that the people inclined to act, on a given story or indeed at all, are only a small fraction of the overall population.
I am in vehement agreement with much of what Stray has to say, although I think he understates the importance of general knowledge and the often serendipitous benefits of pursuing same.
It is common to say that “On the whole, journalism will be fine even as the media industry is disrupted, but the investigative part of journalism may not fare so well.” Indeed, I took something like that stance in my May 2009 post on where the information ecosystem is headed and even more directly in an earlier piece that month. However, I’ve changed my mind in an optimistic direction, and now believe:
There are still some things we need to do to preserve and extend the societal benefits of investigative reporting. But they are straightforward and very likely to happen.
Specifically, I recommend: