Twitter is a rather new communications service, wildly popular in the technology blogging and podcasting communities. There are close to a million registered accounts or users, but I’d guess the active users number in the low-mid five figures. Even at that low usage, Twitter is on overload, plagued with outages and data loss.
Scaling Twitter is a huge challenge. Doing so will involve changing just about every aspect of what Twitter is. A number of commentators have suggested lesser fixes, but none that I’ve seen is apt to work. (Generally, they forget that UI options will need to change as usage grows.) However, I think I’ve come up with an approach that would indeed work, for:
- Arbitrarily high levels of public Twitter use.
- Twitter integration with other communication tools such as instant messaging or IRC-style chat.
- Enterprise or integrated personal/enterprise use of Twitter.
The sections below cover:
- Future metadata needed by Twitter “tweets” (i.e., posts)
- Filtering enhancements Twitter will need as usage scales (and could greatly use already today)
- Present and future Twitter use cases
- Twitter CEP and database architecture (almost everybody else I see writing about Twitter gets this wrong)
- Enterprise Twitter
- Twitter’s competitive vulnerabilities
If you’re not familiar with Twitter, you probably should be. Crunchbase gives a decent overview, and the link above is a live look.
Twitter posts need more metadata
Twitter’s limit of 140 characters per message is cute, and maybe even sustainable for actual text. But that doesn’t allow for much metadata, @ replies and # tags notwithstanding. And the reliance on TinyURL is a kludge. The minimum metadata Twitter posts (aka tweets) need going forward is:
- URL being linked to
- Target individual(s)
- Target group(s), selectable by readers and writers alike
- Level of urgency
- Level of protection (e.g., totally open, target group only, friends only, etc.)
- Subject tags (this could be combined with group tags as a temporary hack)
- What’s already there (date/time, author, etc.)
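To make the list above concrete, here is a minimal sketch of a tweet record that carries the metadata outside the 140-character text. Every name in it (`Tweet`, `Protection`, the field names, the integer urgency scale) is my own assumption for illustration, not anything Twitter actually exposes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import FrozenSet, Optional


class Protection(Enum):
    """Hypothetical protection levels, following the examples in the list."""
    OPEN = "open"
    GROUP_ONLY = "group_only"
    FRIENDS_ONLY = "friends_only"


@dataclass(frozen=True)
class Tweet:
    # What is already there today
    author: str
    text: str
    created_at: datetime
    # Proposed metadata, stored alongside the text rather than inside it
    url: Optional[str] = None
    target_users: FrozenSet[str] = frozenset()
    target_groups: FrozenSet[str] = frozenset()
    urgency: int = 0  # assumed scale: 0 = routine, higher = more urgent
    protection: Protection = Protection.OPEN
    tags: FrozenSet[str] = frozenset()

    def __post_init__(self) -> None:
        # Only the text counts against the limit; the URL and tags do not.
        if len(self.text) > 140:
            raise ValueError("text exceeds 140 characters")


example = Tweet(
    author="alice",
    text="Dinner before the meetup?",
    created_at=datetime(2008, 5, 1, 18, 0, tzinfo=timezone.utc),
    url="http://example.com/restaurant",
    target_groups=frozenset({"meetup"}),
    urgency=2,
    protection=Protection.GROUP_ONLY,
    tags=frozenset({"food"}),
)
```

With the URL held in its own field, there would be no need to shorten it just to fit the character limit.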
Twitter needs many more tweet filters
Even today, Twitter writers and readers would benefit from more ability to filter tweets. If the number of users went up 10X or 100X, better filtering would become an absolute need. Even absent such growth, if users join who are less technosocial than the early adopters, or if current users tire of the distraction Twitter now causes, filtering will be a need for them too.
Examples of filters that I think Twitter should develop or support include:
- User groups (both ways — targets selected by the writer or authors selected by the reader)
- Subject (whether by explicit tag or content analysis)
- Taboo words (foul language, and perhaps a lot more than that for enterprise use)
User-group filters are crucial, because the current model of listening to a whole “stream” doesn’t scale. Right now, Twitterers only fit into two groups – those you listen to and those you don’t. But as usage grows, we’ll need to be able to deploy filters such as:
- The group I’m discussing tonight’s meetup with, archived back through the six hours I’ve been traveling.
- My usual high-priority groups, because otherwise I’m too busy to tweet today.
- Business-oriented groups only, plus my immediate family, because I’m at work.
- Fellow political enthusiasts, because there’s a big primary election tonight.
- NOT sports, because March Madness has started and I don’t really care about college hoops.
The need goes even further than that. Already today, some people tweet publicly that they want to read Dave Winer’s views on technology but not on politics, or Robert Scoble’s actual tweets but not his automated notifications of podcasts. What’s more, we may prefer different filter sets for real-time streams on our phones, real-time streams on our PCs, and occasional archival lookups.
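One way to picture such filters is as small predicates that can be combined freely. The sketch below assumes tweets carry group and subject tags; the helper names (`in_groups`, `has_tag`, `from_author`, `all_of`, `any_of`, `not_`) and the author handle are hypothetical, chosen only to mirror the examples above.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Tweet:
    author: str
    text: str
    groups: FrozenSet[str] = frozenset()
    tags: FrozenSet[str] = frozenset()


Filter = Callable[[Tweet], bool]


def in_groups(*names: str) -> Filter:
    wanted = frozenset(names)
    return lambda tweet: bool(tweet.groups & wanted)


def has_tag(tag: str) -> Filter:
    return lambda tweet: tag in tweet.tags


def from_author(name: str) -> Filter:
    return lambda tweet: tweet.author == name


def not_(inner: Filter) -> Filter:
    return lambda tweet: not inner(tweet)


def all_of(*filters: Filter) -> Filter:
    return lambda tweet: all(f(tweet) for f in filters)


def any_of(*filters: Filter) -> Filter:
    return lambda tweet: any(f(tweet) for f in filters)


# "Business-oriented groups only, plus my immediate family", and NOT sports.
at_work = all_of(in_groups("business", "family"), not_(has_tag("sports")))

# One author's technology views but not his politics ("dave" is a made-up handle).
dave_on_tech = all_of(from_author("dave"), not_(has_tag("politics")))

# Different filter sets for different devices.
filters_by_device = {"phone": at_work, "pc": any_of(at_work, dave_on_tech)}
```

Because each filter is just a function from a tweet to a yes/no answer, the same building blocks would serve both reader-side and writer-side selection.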
Twitter needs to expand its use cases
Right now, there are two main ways to use Twitter – like high-tech CB radio, broadcasting to all who listen, or in “private update” mode, communicating only with your friends. As I’ve suggested above, there needs to be a lot more variety than that, with user groups and subjects freely filtered in and out. If that functionality is added, Twitter could have a number of major uses, including:
- General socializing (arguably one of Twitter’s two core uses today)
- General issues discussion (arguably the other one)
- General advice (Twitter is a great way to get immediate tech help)
- Meeting planning (another major Twitter use)
- General workgroup collaboration
- Narrowcast news dissemination – local snow days, daily enterprise news, breaking fantasy sports alerts, and many more
In addition, Twitter should be integrated with instant messaging. Right now, many people use Twitter through AIM or GoogleTalk. The tighter that integration gets, the better. Seamless switching between mass Twitter and reciprocal IM would be a nice improvement. (Just remember not to broadcast intimate love notes to your entire Twitter following.)
And it’s not just IM integration. For example, a group of Twitterites tweeting just at each other would be a whole lot like an IRC or AOL chat room, if filtering functionality worked that way. However – and this is a big advantage – it would be easy to be “in” multiple rooms at once.
Twitter needs a different architecture (CEP/database)
The essence of Twitter is accepting and distributing messages in real time. As I’ve already pointed out, this should be done via complex event/stream processing (CEP), not by writing everything first to a database. The need for much more complex filters just makes the case for CEP overwhelming. Of course, there also has to be a persistent message store, but database writes should happen only after real-time needs have been met.
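As a rough illustration of "deliver first, persist afterwards", here is a toy in-memory router. It is not a real CEP engine; `StreamRouter`, its method names, and the list standing in for the persistent store are all assumptions made for the sketch.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List

Predicate = Callable[[Dict], bool]


class StreamRouter:
    """Toy router: fan tweets out to readers in memory, write to the store later."""

    def __init__(self, batch_size: int = 100) -> None:
        self.subscriptions = defaultdict(list)  # author -> [(reader, predicate)]
        self.inboxes = defaultdict(deque)       # reader -> tweets delivered in real time
        self.pending: List[Dict] = []           # delivered but not yet persisted
        self.store: List[Dict] = []             # stand-in for the persistent message store
        self.batch_size = batch_size

    def subscribe(self, reader: str, author: str,
                  predicate: Predicate = lambda tweet: True) -> None:
        self.subscriptions[author].append((reader, predicate))

    def publish(self, tweet: Dict) -> None:
        # Step 1: real-time delivery, applying each reader's filter in the stream.
        for reader, predicate in self.subscriptions[tweet["author"]]:
            if predicate(tweet):
                self.inboxes[reader].append(tweet)
        # Step 2: queue for persistence; the write happens only in batches.
        self.pending.append(tweet)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        self.store.extend(self.pending)
        self.pending.clear()


router = StreamRouter(batch_size=2)
router.subscribe("bob", "alice", lambda tweet: "sports" not in tweet["tags"])
router.publish({"author": "alice", "text": "new build is out", "tags": ["tech"]})
```

After that single `publish`, bob's inbox already holds the tweet while the store is still empty, which is the ordering argued for above.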
This could scale nicely. Suppose there were 1,000,000 users online in any given hour. Suppose for each of those users the system maintained a cache of 500 16-byte message IDs. We’d only be talking about 8 gigabytes of RAM for that portion, no matter how many followers the most popular Twitterers each have.
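The arithmetic behind that figure is easy to check. The variable names below are mine; the numbers are the ones from the paragraph above.

```python
users_online = 1_000_000   # users online in a given hour
ids_per_user = 500         # cached message IDs per user
bytes_per_id = 16          # size of one message ID

total_bytes = users_online * ids_per_user * bytes_per_id
total_gb = total_bytes / 10**9    # decimal gigabytes
total_gib = total_bytes / 2**30   # binary gibibytes

print(f"{total_bytes:,} bytes = {total_gb:.1f} GB = {total_gib:.2f} GiB")
# prints "8,000,000,000 bytes = 8.0 GB = 7.45 GiB"
```

The total depends only on the number of readers and the cache depth, which is why follower counts do not enter into it.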
So far, I’ve sidestepped the question of whether
- Each user would get a personal representation of her full Twitter stream on disk, or
- Her Twitter stream would be recreated by a full database query each time she logged on or drilled back in her archives.
What I suggest is a hybrid. When a user is online, whichever tweets she sees should eventually be persisted out to disk, in batches (at least their message IDs). When she first signs on (assuming she’s a frequent user), there should be a cache of tweets waiting for her in memory. But if she ever wants to do an archival search beyond those two groups of tweets, a slowish database lookup will have to do. That said, if it turns out to be a useful performance speedup hack to persist complete Twitter streams for the most active users, I won’t be at all astonished.
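A minimal sketch of that hybrid, under my own assumptions: a bounded in-memory cache of message IDs, a batch of seen IDs flushed to a per-user list that stands in for disk, and a caller-supplied function that stands in for the slow archival lookup. The class and method names are hypothetical.

```python
from collections import deque
from typing import Callable, List


class UserStream:
    def __init__(self, cache_size: int = 500, batch_size: int = 50) -> None:
        self.cache = deque(maxlen=cache_size)  # recent message IDs held in RAM
        self.unsaved: List[int] = []           # seen, but not yet written out
        self.on_disk: List[int] = []           # stand-in for the persisted stream
        self.batch_size = batch_size

    def deliver(self, message_id: int) -> None:
        # The oldest ID falls out automatically once the cache is full.
        self.cache.append(message_id)

    def mark_seen(self, message_id: int) -> None:
        self.unsaved.append(message_id)
        if len(self.unsaved) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Persist seen IDs in batches rather than one write per tweet.
        self.on_disk.extend(self.unsaved)
        self.unsaved.clear()

    def find(self, message_id: int, archive_lookup: Callable[[int], bool]) -> str:
        # Try the cheap tiers first; fall back to the slowish database query.
        if message_id in self.cache:
            return "memory"
        if message_id in self.on_disk:
            return "disk"
        return "archive" if archive_lookup(message_id) else "missing"
```

A linear scan of `on_disk` is obviously not how a real store would work; the point is only the order in which the three tiers are consulted.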
Sometimes there would indeed be a complex query to fetch all or part of somebody’s Twitter stream. It would start with a set of rules that generated a list of tweet authors, perhaps executed against a persistent list of all the authors that user has ever followed (or against some other kind of cache). Then it would look for all messages, in an appropriate time period (key point for performance optimizations), on the desired subjects. And last it would apply any negative filters (e.g., strong language). But if this were done against a real data warehouse DBMS, I don’t see why it would be a terribly big deal at all.
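The three steps could look roughly like this, with SQLite standing in for the data warehouse DBMS purely so the example runs anywhere. The table layout, column names, sample rows, and the `fetch_stream` function are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (
        id INTEGER PRIMARY KEY, author TEXT, posted_at INTEGER,
        subject TEXT, body TEXT
    );
    CREATE INDEX tweets_by_time ON tweets (posted_at);
""")
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?, ?)", [
    (1, "alice", 100, "tech", "new build is out"),
    (2, "bob", 110, "sports", "what a game"),
    (3, "alice", 120, "tech", "this darn bug again"),
    (4, "carol", 130, "tech", "anyone tried the beta?"),
    (5, "alice", 900, "tech", "too late for the window"),
])


def fetch_stream(conn, authors, start, end, subjects, taboo_words):
    """Author list, then time window and subjects, then negative filters."""
    author_marks = ",".join("?" * len(authors))
    subject_marks = ",".join("?" * len(subjects))
    sql = (
        "SELECT id, author, body FROM tweets "
        f"WHERE author IN ({author_marks}) "
        "AND posted_at BETWEEN ? AND ? "
        f"AND subject IN ({subject_marks}) "
        "ORDER BY posted_at"
    )
    rows = conn.execute(sql, [*authors, start, end, *subjects]).fetchall()
    # Negative filter applied last: drop any tweet containing a taboo word.
    return [
        row for row in rows
        if not any(word in row[2].lower().split() for word in taboo_words)
    ]


stream = fetch_stream(conn, ["alice", "carol"], 100, 200, ["tech"], {"darn"})
```

Here `stream` ends up holding tweets 1 and 4: tweet 2 fails the author and subject tests, tweet 3 trips the taboo filter, and tweet 5 falls outside the time window.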
Twitter needs an enterprise version
I think Twitter could be a valuable enterprise tool. In particular, much of what email is used for would work better on a sufficiently spruced-up Twitter, namely quick notifications, often with an associated URL. (There should be fewer emails with file attachments in the world anyway, as those should be replaced by URLs. This is especially true at enterprises where good downloading connectivity can be assumed.)
Obviously, enterprise Twitter would need better archiving and integrations than the public version. I think it would actually need better filtering too. On the other hand, scalability would be much less of a challenge.
Voila! We have a monetization model for Twitter. However, we also have a huge reason for Microsoft to competitively blow Twitter out of the water. Make that “another huge reason” — the first one lies in the potential for Twitter to be a major enhancement to IM.
Twitter is very vulnerable to competitors
As popular as Twitter is, it doesn’t have a lot of built-in loyalty. Tweets are ephemeral; walking away from one’s archive of them would not be a terrible loss. Rebuilding the network of people one follows is a bit of a pain, but we’ve all done that multiple times before. And a new improved version could build a user base quickly by being more proactive about invites than Twitter is.
Above all, there’s rampant dissatisfaction with Twitter’s system robustness. As I’ve noted above, there’s also a lot of room for feature improvement.
Twitter is very vulnerable to being blown away.