Text Technologies

The future of search

Curt Monash — Mon, 26 Nov 2012 03:07:34 +0000

I believe there are two ways search will improve significantly in the future. First, since talking is easier than typing, speech recognition will allow longer and more accurate input strings. Second, search will be informed by much more persistent user information, with search companies having very detailed understanding of searchers. Based on that, I expect:

A small oligopoly dominating the conjoined businesses of mobile device software and search. The companies most obviously positioned for membership are Google and Apple.
The continued and growing combination of search, advertisement/recommendation, and alerting. The same user-specific data will be needed for all three.
A whole lot of privacy concerns.

My reasoning starts from several observations:

Enterprise search is greatly disappointing. My main reason for saying that is anecdotal evidence — I don’t notice users being much happier with search than they were 15 years ago. But business results are suggestive too:
- HP just disclosed serious problems with Autonomy.
- Microsoft’s acquisition of FAST was a similar debacle.
- Lesser enterprise search outfits never prospered much. (E.g., when’s the last time you heard mention of Coveo?)
- My favorable impressions of the e-commerce site search business turned out to be overdone. (E.g., Mercado’s assets were sold for a pittance soon after I wrote that, while Endeca and Inquira were absorbed into Oracle.)
- Lucene/Solr’s recent stirrings aren’t really in the area of search.
Web search, while superior to the enterprise kind, is disappointing people as well. Are Google’s results any better than they were 8 years ago? Google’s ongoing hard work notwithstanding, are they even as good?
Consumer computer usage is swinging toward mobile devices. I hope I don’t have to convince you about that one.

In principle, there are two main ways to make search better:

Understand more about the documents being searched over. But Google’s travails, combined with the rather dismal history of enterprise search, suggest we’re well into the diminishing-returns part of that project.
Understand more about what the searcher wants.

The latter, I think, is where significant future improvement will be found.

So how does a search engine understand what you want? It can listen to you directly, parsing your search string. It can ask for more clarity, through some kind of disambiguation interface. Or it can make inferences, based on — well, based on just about any kind of information that might exist about you and your online behavior.

Search strings are short, typically four words or less. That doesn’t leave room for a lot of innovative parsing. Not a lot of progress can be made until search strings get a lot longer, and that is unlikely except perhaps through the convenience of speech recognition.

Faceted/parameterized selection has its place. For example, when I search on Amazon.com, the site encourages me to also select a department from its dropdown menu; otherwise, it refuses to rank the search results. And when I buy shirts from Lands’ End, I just click through and never search at all. Still, Google’s been around for 15 years, and its successes in searcher-does-the-work disambiguation boil down to little more than:

A list of a few major subcategories to search (News, YouTube, etc.).
Spelling correction.
A desultory list of related/more specific searches, perhaps just longer search strings other people have recently entered.
Well-hidden “Advanced Search” features, which look much like AltaVista’s and AllTheWeb’s similar features did late in the 20th Century.

Whatever the user attitudes and behaviors are that constrain Google’s or its competitors’ success in this area, I can’t imagine them changing much — except, once again, in the event that speech recognition leads to richer human-computer conversations.

I’ve now highlighted two different ways in which there’s a search-interface challenge that will be tough to beat without turning to speech recognition. But the case for speech recognition is even stronger than that. We’re moving to small, mobile devices, and:

Traditional search interfaces work worse on mobile devices than on desktop computers. Typing is harder. So is dealing with picky forms.
Speech may work as well on mobile devices as at your desk, or even better. If you have upgraded your Apple device to iOS 6, you have both a microphone and Siri. The same may not be true of your desktop gear.

And so I conclude that speech recognition is a big part of the future of search.

What will that allow? Since talking is easier than typing, speech is a way to get longer text strings as search inputs, or more of them. It’s plausible that people might speak queries as complex as:

“I want to buy a recharger for an iPad 3 with delivery this week.”
“Where is 10gen’s Northern California office?” … “Which nearby restaurants have good Yelp reviews?”
“Tell me about the David Reed who went to the Kennedy School of Government around 1977, went to Dartmouth before that, and worked for the Federal Communications Commission.”

Getting search engines to the point that they can handle such queries will be difficult but straightforward — but even more progress is needed. Search results for various queries will be greatly improved if the search engine “knows” things like:

The location of your home and office, and the distance you’re willing to go from them to eat or shop.
Your tastes in food, clothing, and gadgetry.
The level of sophistication at which you like to read about medicine, finance, or electronics.
Which people are or might be in your extended social network.

And that will cement internet search squarely in the world of — for once I approve of the term — big data.

SOPA’s potentially chilling effect on public debate

Curt Monash — Wed, 18 Jan 2012 17:02:59 +0000

SOPA (Stop Online Piracy Act) is getting blasted all over the Internet. Even so, one of its major dangers has not yet been widely discussed. People seem to realize that SOPA can create censorship by governments, or businesses, or as collateral damage when governments and businesses pursue other interests. But they may not yet grasp that SOPA can allow individuals to stifle free speech as well.

To quote the owner of a popular sports fan discussion forum (emphasis mine):

The problem is several of the provisions in SOPA will force ISPs hosting websites (ie: the company that hosts our servers) to potentially disconnect us from the Internet if there’s a claim – unsubstantiated or not – that we’re infringing against copyright, regardless of if it has not been fully proved in court. The argument is that this would make it easy for someone to make false or weak claims against the site to take a us offline until we went to court.

That’s a headache I’m not prepared to deal with. The number of threats I get each year via e-mail from angry members from other teams we remove are pretty unreal and obviously you guys don’t see them, so giving any additional ammunition backed up by a law like this would be a potentially huge issue. I’ve been talking with other sites and it’s a very real concern that we’re all potentially going to be faced with if this goes through, unless it’s rewritten to better target the sites that are really the ones they’re looking to address.

And that’s just from the passions of sports fandom. The passions of politics, or the commercial interests of those being criticized, are of even greater concern.

Indeed, SOPA-like legislation creates an easy way to take down any forum, blog, or other site that allows user-generated content: flood it with copyrighted content, then run to the regulators. We must never, ever, ever accept a legal regime in which publishers may be censored before they are PROVED to be guilty of wrongdoing.

Freemium journalism business models, or the Launch of the Spawn of TechCrunch

Curt Monash — Tue, 17 Jan 2012 10:44:41 +0000

In case you missed it, Sarah Lacy has launched Pando Daily, aka “Spawn of TechCrunch”. It has a clear mission statement, which she phrased as

the site-of-record for that startup root-system and everything that springs up from it, cycle-after-cycle

and mentor/investor/board member Mike Arrington simply called

to be the paper of record for Silicon Valley

That, I believe, is the form a journalistic mission statement should take:

“We (will) offer the best X about Y”, where …
… “X” is something like news or analysis or opinion and …
… “Y” is a particular subject area.

But there’s a problem with that template. One would ideally wish a mission statement of the form “We do the best A” to be followed up by “and, obviously, people will pay lots of money for A”. Journalistic mission statements don’t have that nice property.

Fortunately, at least in the case of tech blogging, they do tend to have a nice substitute. Let me explain.

TechCrunch and Pando Daily seem to have the same business plan:

Create a popular and respected blog.
Use the access provided by that popularity and respect to populate great conferences.
Use the readership provided by that blog to promote the conferences.
Ka-ching.

I have an analogous plan for DBMS 2:

Create a popular and respected blog.
Use the access provided by that popularity and respect to inform great consulting.
Use the readership provided by that blog to promote the consulting.
Ka-ching.

Other business models, such as GigaOm’s, would seem to be a hybrid of our two. All are what could be called “freemium” models, even if the other guys (and gals) sell a few ads as well. All seem to work.

Here’s what I think is the non-obvious part of our models:

Different parts of our readership are important for different reasons.

To a first approximation:

Everybody who reads our work and benefits from it makes us feel good, and motivates us to do more.
Everybody who reads our work and is influenced by it makes tech vendors want to be on our good side, talk to us, give us insight, please us by speaking at our events, and so on.
A moderate fraction of our readers help us expand our readership by word-of-mouth.
Only a small fraction of our readers chip in with helpful blog comments, insightful/tip-off e-mail, and the like, or by publicly throwing us links/tweets.
Only a small fraction of our readers are likely to ever give us money.

I think a lot of successful journalistic (or quasi-journalistic) business models will be similarly layered.

Social technology in the enterprise

Curt Monash — Wed, 14 Sep 2011 06:04:36 +0000

The recent Dreamforce conference (i.e., salesforce.com’s extravaganza) focused attention on “the social enterprise” or, more generally, enterprises’ uses of social technology. salesforce is evidently serious about this push, with development/acquisition investment (e.g. Chatter, Radian6), marketing focus (e.g. much of Dreamforce) and sales effort (Marc Benioff says he got thrown out of a CIO’s office because he wouldn’t stop talking about the “social” subject) all aligned.

Denis Pombriant obviously attended the same Marc Benioff session I did. Dion Hinchcliffe blogged the whole story in considerable detail.

It’s a cool story, and worthy of attention. But I’d like to step back and remind us that there are numerous different ways to use social technology in the enterprise, which probably shouldn’t be confused with each other. And then I’d like to discuss one area of social technology that’s relatively new to me: integration between social and operational applications.

Suppose we split up social technology use cases by saying it can help you:

Communicate and collaborate internally …
… and also with small groups of outsiders, such as your supply chain.
Observe, listen to, and interact with consumers (and the world at large).

The biggest buzz, of course, is around social technology that reaches out to the buying public or world at large. You can use social technology to:

Observe and listen to consumers — i.e., classic Voice of the Customer/Voice of the Market text analytics.
Publish to consumers, influencers, etc., via blogging, broadcast-oriented Twitter, and other social media, or go even further and …
… communicate with consumers interactively, whether through loosely-structured interaction (e.g. Twitter), or in the more structured ways that Attensity and others provide.

I support all that, and indeed participate ferociously myself. But for now, let’s move on.

On the internal collaboration/communication side, I’d say:

Any communication tool useful for communicating with the public may be valuable internally as well — portals, blogs, Twitter-imitators, and so on.
Pure email “push” may not always be the best tool for point-to-point internal communication.
Text analytics on internal communication can have a variety of uses, e.g.:
- Compliance (yet another privacy intrusion, but sometimes a legitimate one).
- Internal expert-finding. (In principle, this is the traditional genuine benefit of elaborate “knowledge management” implementations, but without the burdens of traditional knowledge management. In practice, that didn’t work out so great for Tacit Software.)
- Project management.

That all gives plenty of scope for useful adoption, on both the email-replacement and text-analytic sides. But again, let’s keep going.

The part of the social technology story that’s relatively new to me (notwithstanding the “portals” link above) is integration between social and operational applications. While at Dreamforce, I talked with two manufacturing application SaaS vendors, Kenandy and Rootstock Software. In both cases I asked “So what are you doing that’s an advance over where MRP was 20 years ago?”* In both cases the main answer was “Now users can use social technology to track and communicate about particular orders or issues.”

*MRP stood for “Material Requirements Planning” and then “Manufacturing Resources Planning”, and is essentially the forerunner of ERP. By “Kenandy” I specifically mean Kenandy’s founder — ASK Computer Systems founder and thus MRP legend Sandy Kurtzig.

Good point. Of course, it can be generalized; one can communicate and collaborate around almost any kind of business process. I’ve mentioned this before in analytic contexts; it’s an important concept on the monitoring-oriented side of business intelligence and — if Oliver Ratzesberger is to be believed — in investigative analytics as well. But the operational side may actually be more important.

Some things one does in the business world actually involve using one’s body, from manufacturing products to repairing power stations to standing in a store and serving customers. Most of the rest fits into one or more of three buckets:

Creating (a product, a marketing plan, a marketing document, a compensation plan, a program for internal use, an analytic insight, …)
Relating (to an employee, a sales prospect, a reporter, …)
Participating in a fairly routine business process (data entry, accounting, mortgage approval, parts ordering, …)

And why can’t we just automate those routine business processes away? Because there’s so often a need for manual intervention. And when there’s a need for manual intervention, there’s usually also an element of communicating with other people. This is almost always true in cases of trouble-shooting or exception-handling (an order is late, a system is down, the automated result violates common sense). It may be present in other cases as well (the new account calls for a personal thank you note, the food order needs to be annotated with special requests). General email is commonly an awkward medium for these communications; automated messages are worse. Newer social technologies, however, have the potential to do much better.

So what do you think? Have I drunk too much Kool-Aid, or is this stuff for real?

The Text Analytics Summit needs to be replaced

Curt Monash — Fri, 13 May 2011 00:17:34 +0000

I wasn’t asked to moderate a panel at the Text Analytics Summit because the guy running it — NOT Seth Grimes — didn’t feel “comfortable” with me doing so. (I wanted real discussion; Ezra evidently just wanted to buy off sponsors and partners with marketing-opportunity slots.) I also wasn’t given a press pass.* (Although uninterested in the sessions, I was interested in stopping by and meeting some newer vendors.)

*This is even though I’ve spoken at four prior versions of the event, and responded to their request for free consulting as recently as this year.

OK, that might have been personal in some way — but Nick Patience tweets a very similar story. Even Seth himself tweets that

They have a business model that does not apply well to the IT conference space.

The Text Analytics Summit has been troubled for years, but evidently things have gotten worse.

This is more than an incidental problem. Interest in text data is exploding, and marketplace confusion about text analytic technology abounds. More clarity is needed, but too few folks have found an economic model for providing it. (The industry shares some of the blame for that.) I’m glad Seth is doing other conference work, notably on sentiment analysis, but yet more is needed.

If I get into the conference business — and it seems natural that I would — I’ll try to help fill the gap. But if somebody else beats me to the punch, more power to you, and please let me know how I can help.

The state of the art in text analytics applications

Curt Monash — Thu, 02 Dec 2010 02:06:54 +0000

Text analytics application areas typically fall into one or more of three broad, often overlapping domains:

Understanding the opinions of customers, prospects, or other groups. This can be based on documents the user organization controls (email, surveys, warranty reports, call center logs, etc.), on public-domain documents such as blogs, forum posts, and tweets, or on any combination of the two. Analysis of the former is usually called Voice of the Customer (VotC), while analysis of the latter is Voice of the Market (VotM).
Detecting and identifying problems. This can happen across many domains — VotC, VotM, diagnosing equipment malfunctions, identifying bad guys (from terrorists to fraudsters), or even getting early warnings of infectious disease outbreaks.
Aiding text search, custom publishing, and other electronic document-shuffling use cases, often via document augmentation.

For several years, I’ve been distressed at the lack of progress in text analytics or, as it used to be called, text mining. Yes, the rise of sentiment analysis has been impressive, and higher volumes of text data are being processed than were before. But otherwise, there’s been a lot of the same old, same old. Most actual deployed applications of text analytics or text mining go something like this:

A bunch of documents are analyzed to ascertain the ideas expressed in them.
A count is made as to how many times each idea turns up.
The application user notices any surprisingly large numbers, and as a result of noticing pays attention to the corresponding ideas.
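
The three-step template above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not any vendor's method: the keyword-to-idea map `IDEA_KEYWORDS`, the sample documents, and the `spike_ratio` threshold are all hypothetical, and real products use far richer linguistic extraction than keyword matching.

```python
from collections import Counter

# Hypothetical keyword-to-idea map; real tools use far richer extraction.
IDEA_KEYWORDS = {
    "billing": {"invoice", "overcharged", "refund"},
    "outage": {"down", "outage", "offline"},
}

def extract_ideas(document):
    """Step 1: return the set of ideas whose keywords appear in a document."""
    words = set(document.lower().split())
    return {idea for idea, keys in IDEA_KEYWORDS.items() if words & keys}

def count_ideas(documents):
    """Step 2: count how many documents mention each idea."""
    counts = Counter()
    for doc in documents:
        counts.update(extract_ideas(doc))
    return counts

def surprising(current, baseline, spike_ratio=3.0):
    """Step 3: flag ideas whose count is at least spike_ratio times the baseline."""
    return sorted(
        idea for idea, n in current.items()
        if n >= spike_ratio * max(baseline.get(idea, 0), 1)
    )

# Made-up documents for two reporting periods.
last_week = ["please refund my invoice", "site was down briefly"]
this_week = [
    "service outage all morning",
    "still offline after lunch",
    "outage again tonight",
    "need a refund",
]

print(surprising(count_ideas(this_week), count_ideas(last_week)))
```

Note that the last step still leaves the actual noticing, and any follow-up, to a human reader, which is the passivity complained about here.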

Often, it seems desirable to integrate text analytics with business intelligence and/or predictive analytics tools that operate on tabular data. Even so, such integration is most commonly weak or nonexistent. Apart from the usual reasons for silos of automation, I blame this lack on a mismatch in precision, among other reasons. A 500% increase in mentions of a subject could be simple coincidence, or the result of a single identifiable press article. In comparison, a 5% increase in a conventional business metric might be much more important.
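
To make the precision mismatch concrete, here is a small worked example. All figures are invented for illustration: a topic that goes from 2 mentions to 12, and a revenue line that goes from $10.0 million to $10.5 million.

```python
def percent_increase(before, after):
    """Percentage change from before to after."""
    return (after - before) * 100 / before

# Hypothetical figures, chosen only to match the percentages in the text.
mentions_before, mentions_after = 2, 12
revenue_before, revenue_after = 10_000_000, 10_500_000

mention_pct = percent_increase(mentions_before, mentions_after)
revenue_pct = percent_increase(revenue_before, revenue_after)

print(f"Mentions: +{mention_pct:.0f}% ({mentions_after - mentions_before} extra documents)")
print(f"Revenue:  +{revenue_pct:.0f}% (${revenue_after - revenue_before:,} extra)")
```

The 500% figure here rests on ten additional documents, which one widely quoted article could plausibly generate, while the 5% figure represents half a million dollars. Putting the two percentages side by side on one dashboard invites exactly the wrong comparison.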

But in fairness, the text analytics innovation picture hasn’t been quite as bleak as what I’ve been painting so far. While standalone, passively-reported text analytics is indeed the baseline, there are some interesting exceptions. For example:

I once confirmed that SPSS customer Cablecom’s statistical models for churn and the like absolutely included text data; Cablecom even assigned different weights to the same apparent level of emotion depending on whether the text was in German, French, or Italian. Vertica recently told me of a Vertica/Hadoop customer doing something similar, except for the multilingual aspect. And the end of a 2008 SAS-based paper makes similar claims.
There long* have been some examples of fact extraction that don’t really fit into my three buckets above. For example, researchers mine collections of articles to try to determine biochemical or biological pathways that would not be apparent from examining single research studies alone.
It also has long* been the case that some bad-guy-finding applications — especially in the anti-terrorism area — used text analytics to populate state-of-the-art graph-oriented data analysis tools.

*When it comes to text analytics, “long” means “at least for the past several years.”

In more recent examples:

Greenplum built a document recommender for law firms that does hard-core statistical analysis to determine which .1% of a document set lawyers might actually want to see, and which then learns from users’ feedback after they respond to initial result sets.
Information extracted from investment news gets incorporated into automated trading algorithms. This was unusual technology a couple of years ago, but is more common today.
After a series of mergers, Attensity now uses marketing-oriented text analytics in at least three different ways:
- Attensity text analytics feeds marketing dashboards just as it always did.
- Attensity text analytics triggers alerts, as I wish dashboards and business intelligence tools more often did, the false positives problem notwithstanding.
- Attensity text analytics triggers concrete workflows, for example routing specific social media hits for priority response.
And in one example that did not actually get into production, a very large social networking company correlated word usage (e.g., choice among different synonyms) against user characteristics such as age and gender.

Finally there are some applications that, while fitting the standard template, just strike me as getting to unusually sophisticated levels of analysis. For example, Vertica told me of another Vertica/Hadoop case where VotM document analysis is carried out to the level of observing which order brand names appear in, and adjusting that for whether or not it was just an alphabetical list.
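
As a rough illustration of that last idea, and not a description of what the Vertica customer actually built, here is a sketch that records which brand each document mentions first and sets aside documents whose brands simply appear in alphabetical order. The brand names, documents, and function names are all hypothetical, and the substring matching is deliberately naive.

```python
def brand_order(text, brands):
    """Return the known brands in order of first appearance in the text."""
    lowered = text.lower()
    positions = {b: lowered.find(b.lower()) for b in brands}
    found = [b for b in brands if positions[b] >= 0]
    return sorted(found, key=lambda b: positions[b])

def is_alphabetical(ordered):
    """True when two or more brands appear in plain alphabetical order."""
    return len(ordered) > 1 and ordered == sorted(ordered, key=str.lower)

def first_mention_counts(documents, brands):
    """Count how often each brand is mentioned first, skipping alphabetical lists."""
    counts = {b: 0 for b in brands}
    for doc in documents:
        order = brand_order(doc, brands)
        if not order or is_alphabetical(order):
            continue  # no brands found, or the order probably carries no signal
        counts[order[0]] += 1
    return counts

# Hypothetical brands and documents.
BRANDS = ["Acme", "Borealis", "Zenith"]
DOCS = [
    "I compared Zenith with Acme and liked Zenith more.",
    "We sell Acme, Borealis, and Zenith products.",
    "Borealis beats Acme on price.",
]

print(first_mention_counts(DOCS, BRANDS))
```

Discarding every alphabetically ordered document is crude, since a short list can be alphabetical by chance; a more careful adjustment would presumably down-weight such documents rather than drop them.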

I suspect text analytics is about to become more interesting again.

Related links

The enabling technology for text/tabular data integration has existed for years.
In 2006, I listed major application areas for data mining/predictive analytics. That list overlaps pretty closely with the similar list for text mining/text analytics.
Before being acquired by IBM, SPSS boasted a rather large text mining user base.

Notes, links, and comments, October 24, 2010

Curt Monash — Sun, 24 Oct 2010 08:58:25 +0000

Time for a notes/links/comments post just for Text Technologies:

TechCrunch got sold, GigaOm raised money, and VentureBeat/MediaBeat provided a good starting link for both those stories and more. Since TechCrunch and GigaOm are/were both private, financial details are murky, but:
- TechCrunch is variously reported as having revenue in the $6-10 million range, probably mainly from events. (If you believe that they sell ~3000 total tickets at ~$2000 each to two annual versions of TechCrunch Disrupt, that makes sense.)
- GigaOm reports >10,000 subscribers to its (sort of) market research service GigaOm Pro, at $199, apparently concentrated on the vendor side.
John Gruber straightforwardly posts both ad rates and circulation for his blog. It’s a simple $5000/week for readership that exceeds mine by >1 order of magnitude.
The New Yorker points out Gawker Media may not yet have crossed $20 million in revenue.
An “ASCAP for news” seems to finally be on the way.
Business Week/Bloomberg notices a trend that social-media/Voice of the Customer/Voice of the Market text analytics firms are getting acquired by bigger marketing-oriented firms. Seth Grimes, however, argues that the same trend is already passé.
TechCrunch accused the Wall Street Journal of killing a story about sister company MySpace, then quickly running it after TechCrunch caught them.
LinkedIn has a really cool-looking tech blog. One recent post describes LinkedIn’s approach to socially-informed search. I read about it in a thoughtful post on Daniel Tunkelang’s blog.
Bill Simmons took 3843 words to explain the story of a two-word tweet — “moss Vikings.” Somewhere in there are a few interesting ruminations about media in the current age.
Some notes and links that actually belong here instead went up on DBMS 2 a few weeks ago.
About half of what I write about liberty and privacy is highly relevant to the subjects of this blog, including almost all of today’s post.
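
The revenue figures quoted in the first few notes above can be sanity-checked with back-of-envelope arithmetic. The inputs below are the rough numbers from this post; the assumptions that each GigaOm Pro subscriber pays $199 once a year and that John Gruber sells all 52 weeks at his posted rate are mine, not stated in the sources.

```python
# Rough figures quoted in the notes above ("~" and ">" in the post mean approximate).
techcrunch_tickets = 3000       # ~3000 total tickets across two Disrupt events
techcrunch_ticket_price = 2000  # ~$2000 each
gigaom_subscribers = 10_000     # ">10,000", so treat as a lower bound
gigaom_price = 199              # $199; assumed here to be paid once per year
gruber_weekly_rate = 5000       # $5000/week

techcrunch_events = techcrunch_tickets * techcrunch_ticket_price
gigaom_pro = gigaom_subscribers * gigaom_price
gruber_annual = gruber_weekly_rate * 52  # assumes every week of the year is sold

for name, dollars in [
    ("TechCrunch events", techcrunch_events),
    ("GigaOm Pro (lower bound)", gigaom_pro),
    ("Gruber sponsorships (if sold out)", gruber_annual),
]:
    print(f"{name}: ${dollars:,}")
```

The TechCrunch events estimate lands at the bottom of the reported $6-10 million revenue range, which is consistent with events being the main but not the only revenue source.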

A framework for thinking about New Media journalism

Curt Monash — Tue, 28 Sep 2010 05:54:10 +0000

Jonathan Stray reminds us of an excellent point:

New Media journalism should be thought of as a product that people use, not as a collection of stories or other pieces.

In particular, he argues:

The value of journalism can only be assessed in connection with how people use it …
… and their lack of enthusiasm about New Media news is a warning sign.
Technology and form factor matter; imitating old media is likely not the best way to go.
Personalization and targeting need to be a lot better. In particular:
- What’s most important is getting stories to the people who are likely to want to act on what’s in them. The true value of journalism lies in informing people’s choices and actions. (By contrast, he seems to denigrate the other main benefits of news, which are pure entertainment and/or the facilitation of social interaction.)
- It’s OK and natural that the people inclined to act — on a given story or indeed at all — are only a small fraction of the overall population.

I am in vehement agreement with much of what Stray has to say, although I think he understates the importance of general knowledge and the often serendipitous benefits of pursuing same. For example:

I tend to assume that what we write affects people’s choices by supporting their informed judgments.
I think it is neither necessary nor acceptable to let investigative reporting wane.
I have witheringly negative opinions about vacuous “news.”

And I indeed try to practice what Stray preaches. Most of my own posts — especially when you weight them by length and/or time spent researching and writing them — are designed to help at least some people make on-the-job decisions.

I do just mean “help,” the assumption being that people read my work as part of a general research process.
That lots of you read more for general interest or education is great. I suspect you still like the standard of quality to which I aspire, namely that what I write should in most cases actually be informative even to people who have reason to be well-informed in the area already.

How to preserve investigative reporting in the New Media Era

Curt Monash — Sun, 26 Sep 2010 12:18:36 +0000

It is common to say that “On the whole, journalism will be fine even as the media industry is disrupted – but the investigative part of journalism may not fare so well.” Indeed, I took something like that stance in my May, 2009 post on where the information ecosystem is headed and even more directly in an earlier piece that month. However, I’ve changed my mind in an optimistic direction, and now believe:

There are still some things we need to do to preserve and extend the societal benefits of investigative reporting. But they are straightforward and very likely to happen.

Specifically, I recommend:

Public-spirited law-oriented types should do a better job of popularizing tips for how to get information out of government (Freedom of Information Act and all that). And back it up with more pro bono or charitably-funded legal assistance – not just for specific causes, but for general corruption investigations as well.
- I’m sure quite a bit of that is happening, but it should be much more visible and active.
Domain-specific websites should be created and promoted that seek out and call attention to negative stories in their particular areas, especially for specific industries or geographical regions.
- A lot of those exist targeted at specific large companies people have grudges against, but otherwise they’re much too hard to find.
Reporters need to be in the habit of seeking out stories first uncovered by other people.
- They do this already, but they need to get better.

Below, at considerable length, is why I think those developments are both necessary and sufficient to carry the tradition of investigative journalism forward into the new media era.

For there to be public benefit from reporting, three things generally need to occur:

Disclosure or discovery of the raw facts. Without that, you don’t have reporting or news.
Analysis or interpretation. This stage can be optional when the purpose of news is entertainment, societal bonding, or whatever. But it’s pretty central to investigative journalism.
Distribution and popularization. It doesn’t do much good to uncover an important story unless people notice and care about it. Old media, with its emphases on writing, curation, and physical distribution, almost defines itself by this stage. (E.g., “paper” is part of the word “newspaper.”)

Disclosure and discovery come in two main forms:

Serendipity.
Spadework.

The serendipity part often seems to work well in the new media. Let’s go to some examples.

Wikileaks is a hugely successful case – people send Wikileaks documents or other files (a process that only makes sense with modern technology), and Wikileaks posts them.
- Note: There was an article yesterday about “internal strife” at Wikileaks – but the gist turned out to be that Wikileaks, already highly influential, could be doing even more than it already is.
Michael Arrington found out about a meeting of major angel investors – perhaps originally via a tweet – and kicked off a major technology industry news story now known as “Angelgate”.
An anonymous tipster spent 2 ½ hours IMing with me to reveal the true cause of the JP Morgan Chase site outages.
- Motivation: Because s/he felt Chase’s technology organization was being unfairly maligned by prior coverage.
- Why me: Because my previous speculative post about the JP Morgan Chase outages had shown up in the search engines and looked pretty credible.
- Result: Enough accurate tech details of a major consumer embarrassment to create a “teachable moment,” even though the concerned parties were trying to cover them up.
An assisted living/nursing home in Dublin, Ohio called Friendship Village misbehaved toward my parents and me. I blogged about the problem, and it’s in the search engines now. If this turns out to be a pattern of behavior rather than an isolated incident, they’ll have some deserved trouble.

The story on the spadework side is more mixed. For example, there’s evidence I did as good a job on the JP Morgan Chase story as conventional media could today – Computerworld ran a story based on my post, without being able to uncover a single detail I hadn’t already found. But perhaps in the old-media-economics days, Computerworld would have had the resources to try harder and find something I didn’t. (E.g., I screwed up and didn’t actually get the details of the specific Oracle bug.) A bigger problem is outlined in this story on the uncovering of massive corruption in the California town of Bell. To wit (emphasis mine):

The new media ecosystem, in which citizen bloggers, small news outlets and big old-school media outlets effectively draw upon one another’s work to collaborate, didn’t quite work out in this case.

One blogger actually has anonymously and exhaustively alleged corruption in Bell for years …

The paper’s reporters say the blogger gave them tips. Though he’s a bit frustrated not to get more credit, he says the newspaper’s reporting muscle and much bigger audience gave life to the story in a way his website simply couldn’t. He counts his readers in the scores; The L.A. Times has hundreds of thousands of subscribers …

… some residents said they had gone to city hall to get their own answers. In essence, they were trying to do their own reporting on why their tax bills were so high and on rumors city officials were making a ton of money.

They got nowhere. …

“As a common citizen, I don’t know what my rights are with the city. I don’t know really how to attack them,” Sanchez said. “The Times, they have their legal departments. Of course, they’re able to get it more than a regular Joe like me.”

The citizens of Bell needed some place to turn for help, other than the overworked LA Times reporters who eventually uncovered the story on their own. Hence my first recommendation near the top of this post.

In many ways, analysis and interpretation work well in the new media era already. After all, there’s a whole world wide-web of self-appointed volunteer analysts on any issue you’d care to name! Yes, there are legitimate concerns about fragmentation and echo chambers, in which people only listen to the analysis of those folks who shared their biases to begin with. But those are hardly a barrier to muckraking – if anything, quite the contrary, as illustrated by the bogus ACORN prostitute/pimp advice scandal. (If your politics lean to the conservative side, think instead of something like a Michael Moore film.)

Or returning to the examples above:

Wikileaks’ biggest leaks are widely analyzed by top-flight mainstream media people and a broad variety of online commentators alike. I’ll confess I didn’t find any analysis of Wikileaks’ revelations about, say, Iceland or the Turks & Caicos Islands, but I’ll also confess to not looking very hard.
For the technology news uncovered respectively by Arrington and me, pretty much the ideal people to analyze it were, respectively – well, they were Arrington and me.
- In the case of Angelgate, much other analysis (and news) ensued.
- Analysis of the JP Morgan Chase outage details hasn’t yet gone all that far past me – but I already turned it into a “don’t make the same mistake JP Morgan Chase did” lesson.
The Friendship Village case is being used as a cornerstone of my slowly-unfolding analysis of the general problem with medical records.

And that brings us to distribution and popularization. The most brilliant sleuthing in the world doesn’t help people very much if they – or their lawmakers/regulators/advisers/whatever – don’t find out about it.

Wikileaks has that problem solved for its biggest leaks, but perhaps not for the others.
Arrington’s TechCrunch is a top news outlet in his area, so the problem was automatically solved for him.
DBMS 2 is a fairly serious outlet for database-related news. But in any case the JP Morgan Chase story was picked up by general trade press and financial-industry-specific press alike.
As noted in the story on Bell, CA, nobody was paying attention to a blogger who apparently had worked quite a bit of it out.
And if there’s anything you found lacking in my list of analysis/interpretation examples – well, if a story were picked up more broadly, then its analysis/interpretation might be stronger as well.

Almost nobody would ever see my Friendship Village story if I didn’t happen to own some websites with strong search engine authority. And how high it stays in the rankings as it ages still remains to be seen.

Possible answers take two main forms:

Aggregation and curation,* in which various contributions are bundled together at go-to websites or the like.
A reporting feeding chain, in which journalists with broader reach:
- Steal/borrow/take ideas from more specialized contributors.
- Repackage them.
- Perhaps add additional value in reporting, analysis, or presentation. (Several examples of this may be found in the links above.)

Investigative reporting needs more of each.

*The latter is the more high-falutin’ version of the former.

Consider my story about Friendship Village. Standing alone, it’s not going to influence much of anybody, except insofar as I can personally influence the course of medical database design or privacy law. But suppose one person each reported similar things at 20 different institutions. A journalist who wrote a story based on those reports could carry a lot of sway, perhaps:

Influencing the course of medical information exchange in the United States, or at least
Alerting people to the lengths they have to go to get proper information about and before their sick relatives.

Similarly, suppose there were a go-to website for complaints about assisted living facilities. Well, people considering moving into Friendship Village would have a little concern to address. Even better, the very existence of that site might help motivate people to share more stories. Bad institutions would need to reform, and bad practices might be reformed under the spotlight of public scrutiny.

If this isn’t my longest blog post ever, it’s surely close. So while I have much more to say on these subjects, I’ll stop here. Comments and examples are warmly encouraged.

Ike Pigott on the future of reporting

Curt Monash — Sun, 04 Apr 2010 13:47:06 +0000

Ike Pigott argues that, as the number of conventional journalists plummets, corporations will have to hire their own “embedded” journalists to fill the void. As he puts it:

The embeds of the future will work for the company, and be paid by the company to provide news about the company in a multitude of formats. Print, newsletter, video, blog, podcast, moving billboards, tattoos — whatever it takes. Because the bits and pieces of Corporate America that have a story to tell will still have their stories – just no ready outlets.

How is this different than what you have today? Surely there are corporate PR departments and external agencies already doing these things, right?

No.

What is required is an internal producer who writes in external voice — like the neutral point-of-view so often described by Wikipedia. People can smell marketing and propaganda coming around the corner, and they know when the pitches and puff pieces are missing that edge of neutrality. An accurate and fair piece is accurate and fair, no matter who writes it.

It’s an interesting theory, but it seems to presuppose dual marketing communication efforts, with separate departments of “Straightforwardness” and “Hype”. That may work at some companies, but in most cases I think it will be more practical to try to infuse straightforwardness through multiple parts of the marcom effort.

My more specific quick responses include:

That sure sounds a lot like Robert Scoble in his Microsoft days.
It also sounds like “community managers” at MMO game companies. (Both of the MMOs I’ve played have had great ones.) They often only use one or two channels (forums and the associated general website), but otherwise they fit the bill.
Ike’s views fit very well with mine on the future of the information ecosystem.
I’m getting ever more sympathetic to the idea that you need people whose main job is external communication of a straightforward kind. Reasons include:
- Senior executives who write great blogs commonly don’t keep them up. And even when they’re active, the blogging is pretty sparse. E.g., among companies I follow closely, Vertica, Aster Data, and Netezza have all done some outstanding blogging in the past, but do very little of it now. Only Dave Kellogg at Mark Logic really keeps going.
- It’s not obvious that senior executives are wrong to spend their time at something other than blogging. One of the greatest vendor blogs ever was Jonathan Schwartz’s at Sun. Umm — how sure are we that he actually did much good for his company with that effort?
- I frequently tell vendors “If you tell Story X in your own words, I’ll gladly point to it or post it for you.” They usually agree this is a wonderful idea — but then usually don’t free up the rather limited resources that would be required to take me up on it.
That said, the kinds of people who provide customer support (pre- or post-sales) are often very well suited to fill the role Ike is describing. At least, that’s the case in enterprise tech companies.
The media mix isn’t really as complex as Ike was suggesting. It basically falls into two groups: Text, and audio/video.
That said, text/graphics and audio/video media people are increasingly the same. (Just think of sports media, where the newspaper folks make their big bucks on radio or TV. That’s a harbinger of the future. Or think again of Scoble.)
One flaw of Ike’s idea is that in its pure form it only makes sense for companies large enough to have multi-person PR staffs. Other firms would have to use part-timers, or outsource. And if you’re going to do that, might it not make more sense to pay part of the cost of sponsoring, you guessed it, an independent blog?
I know that’s text/graphics-only, or at least text/graphics-mainly, but I happen to think audio/visual business news/PR is minor anyway. People may pay enough attention to, say, listen to audio from a company if it purports to teach them something. But news ABOUT a company? Who’s interested enough in that to sit still for audio/video, unless they happen to be employees or investors in its stock?

Bottom line: I think he’s wrong about some of his detailed views, but Ike Pigott is directionally very right in suggesting that newsmakers will increasingly become content creators for news about themselves.