May 19, 2008

How is YouTube relating videos?

One of the great music videos of all time is Madonna’s Material Girl. With two exceptions, all the “related videos” listed by YouTube are just what one would expect: either other Madonna videos, or other versions of Material Girl. One exception is Cyndi Lauper’s Girls Just Want to Have Fun, while the other is Marilyn Monroe’s Diamonds Are A Girl’s Best Friend. The connection with the Monroe video is particularly strong, with each being #3 on each other’s “Related” list.

And that’s an outstanding result. Material Girl is obviously a direct reference, conceptually and visually, to Diamonds Are A Girl’s Best Friend. So my question is: How does YouTube know that? Are there favorite videos lists on which they co-exist? Did somebody hand-enter the connection? Is it inferred from their comment threads (which I definitely have not paged through)? Or — by far the least likely but most interesting of all — is there some sort of direct visual comparison?

Other than popularity presumably having something to do with it (both videos are, deservedly, very often watched and commented on), I haven’t figured out which it is.


4 Responses to “How is YouTube relating videos?”

  1. rtl on May 19th, 2008 10:50 pm

    Almost certainly normalized co-occurrences of favorites or views.

    The tags are completely different, so it’s not shared tags. No freaking way on the visual comparison. There’s no place to manually enter the “historical” connection, and that obviously wouldn’t scale. Mining the comment text is messy. One other option would be to show candidate videos randomly in that space and rank them by eCTR, but the candidate list would still have to be generated using one of these techniques.
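
As a rough illustration of what "normalized co-occurrences of favorites" could mean, here is a minimal Python sketch. It is only a guess at the general idea, not YouTube's actual method: the favorites lists are made up, the function names are hypothetical, and cosine-style normalization (dividing by the square root of the product of the two videos' counts) is just one of several reasonable choices.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical favorites lists, one set of video titles per user (made-up data).
favorites = [
    {"Material Girl", "Diamonds Are A Girl's Best Friend", "Like a Virgin"},
    {"Material Girl", "Diamonds Are A Girl's Best Friend"},
    {"Material Girl", "Like a Virgin", "Vogue"},
    {"Diamonds Are A Girl's Best Friend", "Some Like It Hot trailer"},
    {"Vogue", "Like a Virgin"},
]

def related_scores(lists):
    """Score each video pair by co-occurrence count, normalized by popularity."""
    count = Counter()       # how many lists each video appears in
    pair_count = Counter()  # how many lists each pair appears in together
    for favs in lists:
        count.update(favs)
        for a, b in combinations(sorted(favs), 2):
            pair_count[(a, b)] += 1
    scores = {}
    for (a, b), n in pair_count.items():
        # Cosine-style normalization keeps very popular videos from
        # dominating every "related" list just by appearing everywhere.
        s = n / sqrt(count[a] * count[b])
        scores.setdefault(a, {})[b] = s
        scores.setdefault(b, {})[a] = s
    return scores

def top_related(scores, video, k=3):
    """Return the k highest-scoring neighbours, ties broken by title."""
    ranked = sorted(scores.get(video, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]

scores = related_scores(favorites)
for title, score in top_related(scores, "Material Girl"):
    print(f"{score:.3f}  {title}")
```

The same code would work on view histories instead of favorites; only the input lists change.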

  2. Curt Monash on May 20th, 2008 2:54 am

    Better theory than any I came up with. Thanks!


  3. Lawrence Miao on May 20th, 2008 10:55 am

    I read a paper today that solves the same problem for a similar site. The model is a bipartite graph: one set of nodes is the videos, the other is the keywords. The keywords are extracted with natural language processing tools from the user-provided tags and from the title. The paper co-clusters the graph using information-theoretic metrics. Once the co-clusters are found, ranking them yields the ‘hot’ topics, and traversing the bipartite graph (or using its connectivity) yields related videos.

    Applying the technique:
    The tags of Material Girl are:
    self-parody economic metaphor high camp golddigga bling ice cashmoney holla OG madonna material girl

    The tags of Girls Just Want to Have Fun:
    Girls Just Want to Have Fun Cyndi Lauper Music

    The tags of Diamonds Are A Girl’s Best Friend:
    marilyn monroe diamonds girl best friend gentlemen prefer blondes song songs movie movies

    All of them contain ‘girl’, so all three are connected in the bipartite graph; other techniques can then be used for ranking. There are lots of ranking schemes available. For example, if YouTube’s ranking uses data mining techniques such as frequent (sequential) pattern mining over users’ click histories, there is a high probability that users like all three and watch them in sequence.

    This is the first time I have commented here. Your site is very useful to me. Thanks for your work. : )

    btw, the paper title is
    Web video topic discovery and tracking via bipartite graph reinforcement model,
    Lu Liu, Lifeng Sun, Yong Rui, Yao Shi, Shiqiang Yang,
    appeared in WWW 2008
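
The tag-sharing argument in this comment can be sketched in a few lines of Python. This is only an illustration of the video/keyword bipartite graph, using the three tag lists quoted above; it does not implement the paper's co-clustering or ranking. The function names are hypothetical, and the crude plural stripping (so that "Girls" matches "girl") is an assumption standing in for real natural language processing.

```python
from collections import defaultdict

# Tag lists as quoted in the comment above.
TAGS = {
    "Material Girl": "self-parody economic metaphor high camp golddigga bling "
                     "ice cashmoney holla OG madonna material girl",
    "Girls Just Want to Have Fun": "Girls Just Want to Have Fun Cyndi Lauper Music",
    "Diamonds Are A Girl's Best Friend": "marilyn monroe diamonds girl best friend "
                                         "gentlemen prefer blondes song songs movie movies",
}

def normalize(token):
    """Lowercase and strip a trailing 's' (a crude stand-in for stemming)."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s"):
        token = token[:-1]
    return token

def build_graph(tags):
    """Build both sides of the bipartite graph: video->keywords, keyword->videos."""
    video_to_kw = {v: {normalize(t) for t in s.split()} for v, s in tags.items()}
    kw_to_video = defaultdict(set)
    for video, keywords in video_to_kw.items():
        for kw in keywords:
            kw_to_video[kw].add(video)
    return video_to_kw, kw_to_video

def related(video, video_to_kw, kw_to_video):
    """Walk video -> keyword -> video and collect the shared keywords."""
    shared = defaultdict(set)
    for kw in video_to_kw[video]:
        for other in kw_to_video[kw]:
            if other != video:
                shared[other].add(kw)
    return dict(shared)

video_to_kw, kw_to_video = build_graph(TAGS)
for other, keywords in related("Material Girl", video_to_kw, kw_to_video).items():
    print(other, sorted(keywords))
```

With only these three tag lists, the single keyword linking the videos is "girl", which shows why a separate ranking step would be needed: a shared keyword finds candidates but says nothing about their order.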

  4. Susheel on November 13th, 2008 3:12 am

    There is a paper published on this:
