Teshub | by kevin dooley


#301 in Explore, 8.20.2007


So... I am a stats guy, so I ran some simple statistical models on my own photos to see if I could understand anything about Interestingness or Explore candidacy. If I were designing such a system, I would make Interestingness a function of the photos in the person's set, and have Explore use Interestingness, amongst other variables, to determine what to pick. I believe that is in fact what Flickr does.


My insights are objective, but are tainted by a sample size of one (me). If others want to, and can, run similar models, please share.


Sample: My 40 most interesting photos as ranked on a particular day

Variables: Interestingness (1-40), # Views, # Faves, # Comments, Explore highest rank (0=no, 1=401-500, 2=301-400, etc.), (Log) Age, # Groups posted to, and ratios between views, faves, comments, and groups.

Model: Simple linear correlation, alpha=0.05 significance test
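
For anyone who wants to try the same thing on their own photos, here is a minimal sketch of that model in plain Python. It is not the exact tool I used. The function names are made up for illustration, and the critical value 2.024 is the two-tailed t cutoff for alpha=0.05 with 38 degrees of freedom (40 photos minus 2), so it only applies to a sample of 40.

```python
import math

def pearson_r(xs, ys):
    """Simple linear (Pearson) correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for testing whether a correlation r from n pairs differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Assumption: two-tailed critical t for alpha=0.05 and df=38 (n=40 photos).
T_CRIT_DF38 = 2.024

def is_significant(r, n=40, t_crit=T_CRIT_DF38):
    """True if the correlation passes the alpha=0.05 test for the given cutoff."""
    return abs(t_statistic(r, n)) > t_crit
```

With 40 photos, a correlation needs to be about 0.31 or larger in absolute value to pass this test, so `is_significant(0.55)` is true and `is_significant(0.2)` is not.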



1. Posting to more (mostly comment-required) groups increases comments most, then views, then faves. Faves tend to come more from contacts.

2. Interestingness is moderately, positively correlated (0.55) about equally with views, faves, and comments... and groups.

3. Faves, views, and comments are very strongly correlated to one another (0.90).

4. Explore status was a moderate function of # groups and then, less so, of comments, views, and Interestingness (not faves, interestingly).


Interpretation: My guess is that about 40% of Interestingness is probably a function of "popularity", incorporating views, comments, and faves; another 40% based on who views, comments, whether it's required in the group, etc.; and another mystery 20%.


My second guess is that Explore is a much more complex (semi-heuristic) SAMPLING method. As many people say, many great pictures never get into Explore, and many mediocre ones do. I think that misses the point. Explore is not just a sample of photos, it is a sample of photographers and Flickr cliques. Again, if I were designing Explore, I would want to sample from relatively disconnected parts of the Flickr web (as determined by contacts, co-posting, cross-faving and cross-commenting). If this is true, then, e.g., if you got into Explore, it would be less likely that a close Flickr colleague would.
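
To make the sampling idea concrete, here is a toy sketch of what I mean. This is purely my guess at a design, not anything Flickr has confirmed. The names `find_clusters` and `sample_one_per_cluster` are hypothetical, and a "clique" is approximated here as a connected component of the link graph (any two people joined by a chain of contacts, co-posts, faves or comments).

```python
import random
from collections import defaultdict

def find_clusters(users, links):
    """Group users into connected components using a simple union-find."""
    parent = {u: u for u in users}

    def find(u):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in links:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for u in users:
        groups[find(u)].append(u)
    return list(groups.values())

def sample_one_per_cluster(users, links, rng=random):
    """Pick one photographer at random from each disconnected cluster."""
    return [rng.choice(members) for members in find_clusters(users, links)]
```

Under this toy rule, if you are picked then nobody else in your cluster is, which is the "less likely that a close Flickr colleague would" effect taken to its extreme.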


Finally, I guess that because of the way the rankings shift across the day, and the way the numbers change, they are using a SPRING EMBEDDING type solution to determine Explore rank, which would make sense if they're doing network calculations. The "reverberations" we see in rank look a lot like a system fine-tuning to a stable solution. If the stable Explore solution were "disturbed" by a randomness filter (e.g. qualify 5000, show 500), this could explain the behavior that we see.


Or, I could be all wrong. It's just fun to figure out black box models!

147 faves
Taken on August 5, 2007