Audio and video search
Analysis of search technologies focused on returning images, sounds, or video.
As I see it, there are eight distinct market areas that each depend heavily on linguistic technology. Five are off-shoots of what used to be called “information retrieval”:
1. Web search
2. Public-facing site search
3. Enterprise search and knowledge management
4. Custom publishing
5. Text mining and extraction
Three are more standalone:
6. Spam filtering
7. Voice recognition
8. Machine translation
One of the great music videos of all time is Madonna’s Material Girl. With two exceptions, all the “related videos” listed by YouTube are just what one would expect: either other Madonna videos, or other versions of Material Girl. One exception is Cyndi Lauper’s Girls Just Want to Have Fun, while the other is Marilyn Monroe’s Diamonds Are A Girl’s Best Friend. The connection with the Monroe video is particularly strong, with each being #3 on each other’s “Related” list.
And that’s an outstanding result. Material Girl is obviously a direct reference, conceptually and visually, to Diamonds Are A Girl’s Best Friend. So my question is: How does YouTube know that? Are there favorite videos lists on which they co-exist? Did somebody hand-enter the connection? Is it inferred from their comment threads (which I definitely have not paged through)? Or — by far the least likely but most interesting of all — is there some sort of direct visual comparison?
Other than popularity presumably having something to do with it (both videos are, deservedly, very often watched and commented on), I haven’t figured out which it is.
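To make the first of those guesses concrete, here is a minimal sketch of how co-occurrence on favorites lists could produce a "related" ranking. It is not a description of what YouTube actually does. The function name `related_by_cooccurrence` and the sample lists are hypothetical, invented purely for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def related_by_cooccurrence(lists, top_n=3):
    """For each video, rank the videos that most often share a list with it."""
    counts = defaultdict(Counter)
    for videos in lists:
        # Count each unordered pair once per list, in both directions.
        for a, b in combinations(sorted(set(videos)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {
        video: [other for other, _ in c.most_common(top_n)]
        for video, c in counts.items()
    }


# Hypothetical favorites lists (assumed data, not taken from YouTube).
favorites = [
    ["Material Girl", "Diamonds Are A Girl's Best Friend", "Like a Virgin"],
    ["Material Girl", "Diamonds Are A Girl's Best Friend"],
    ["Material Girl", "Girls Just Want to Have Fun"],
]

related = related_by_cooccurrence(favorites)
print(related["Material Girl"])
```

With data like this, the two videos rank each other highly simply because they keep showing up together. That would explain a symmetric result without any visual comparison at all.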
I talked yesterday with enterprise search vendor Coveo. Here are some highlights.
- Coveo spun out of Copernic a few years ago. The only relationship between the companies now is that Coveo licenses Copernic’s desktop search product.
- Coveo has 60 employees.
- Coveo has 500-600 customers, including lots of big-name companies.
- Coveo’s pitch boils down to “inexpensive, easy to install, and no-apologies functionality.” Actually, Coveo also claims superior relevance and performance, but I’m not going to comment much on those until I have a chance for a more technical discussion.
- Example of ease of set-up: Coveo says Factiva downloaded the product on a Monday, called up and bought it on Thursday, and deployed it in production that Friday. This may be a growing industry trend. Attivio also features a “download first, talk to us second” distribution model. So do vendors of other kinds of “platform” software such as database management systems, application servers, or complex event/stream processing.
- Average selling price: $50K. Everything is included for one price unless it requires bundled third-party software (as is the case for audio, video, and OCR search).
- Coveo claims 90% head-to-head win rates vs. Google OneBox and Microsoft SharePoint search. Generally, customers have other search products too (I guess that’s obvious, since Coveo has only been around 2-3 years). Sometimes they even have all-you-can-eat licenses to competitive products, but buy from Coveo anyway. Rule of thumb: Nobody’s head-to-head win rate is truly as high as they like to think, but companies that think their rate is 90% generally are doing quite well.
- Coveo cites a strong demand for text search of relational databases. Based on specific examples cited, this seems to mean text fields such as call center notes.
- Coveo offers audio/video search. Really, it’s just an audio search technology; what’s being searched on in videos is the audio part. And the audio search boils down to a speech-to-text transcription, with a search of the resulting text. Coveo’s key claim is that the error-laden text you get from speech-to-text conversion is sufficient for useful searching. Specifically, you do best searching for unusual words, such as proper names. In the case of telephone calls, which are low quality – perhaps 32 kb/sec – Coveo says there’s only 10-20% accuracy in word transcription. However, Coveo also says that the words that do come through are exactly the unusual ones most usefully searched on.
- Coveo also says that its speech-to-text lexicon is initially strengthened by text crawls. In general, while I didn’t ask, I would guess that the easy-installation story involves a fair amount of automated lexicon enhancement.
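The audio search idea above (transcribe, then search the error-laden text, leaning on unusual words) can be sketched roughly as follows. This is a toy illustration under my own assumptions, not Coveo's implementation. The transcripts are made up, the speech-to-text step is assumed to have already happened, and the scoring is a plain inverse-document-frequency weighting that I chose because it rewards rare words.

```python
import math
from collections import Counter, defaultdict


def build_index(transcripts):
    """Map each word to the set of recording ids whose transcript contains it."""
    index = defaultdict(set)
    for rec_id, text in transcripts.items():
        for word in text.lower().split():
            index[word].add(rec_id)
    return index


def search(index, n_docs, query):
    """Score recordings by the summed inverse document frequency of matched words."""
    scores = Counter()
    for word in query.lower().split():
        hits = index.get(word, set())
        if not hits:
            continue  # the word was never transcribed (or was mis-transcribed)
        # Rare words get a high weight; words found everywhere get a low one.
        idf = math.log(n_docs / len(hits)) + 1.0
        for rec_id in hits:
            scores[rec_id] += idf
    return scores.most_common()


# Hypothetical, deliberately noisy transcripts (assumed speech-to-text output).
transcripts = {
    "call-1": "the uh customer said the coveo index was the slow",
    "call-2": "the the account was um the closed",
    "call-3": "the factiva renewal the was the discussed",
}

index = build_index(transcripts)
results = search(index, len(transcripts), "the coveo index")
print(results)
```

The common word "the" matches every call and contributes little to the score, while "coveo" and "index" each match a single call and dominate the ranking. That is the sense in which a transcript with many wrong words can still be searchable, as long as the unusual words survive.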