Tuesday, July 29, 2008

Missing the value of information

This begins as a critique, but I'm sure ideas will surface.

Mike of Mike On Ads has written a JavaScript example that mines the user's browser history for top sites and uses statistics to work out the user's gender. In the comments, many people have posted their scores and whether the score was correct. Some are right, some are wrong.

But there are two complications here. The first is the quality of the data: many PCs have multiple users, particularly in family homes, which will render their scores useless. Also, no frequency data is included, so a hundred visits to espn.com will be cancelled out by a single visit to salon.com.
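To make the frequency point concrete, here is a toy sketch in Python. It is not Mike's script or his statistics: the MALE_SHARE figures are invented for illustration, and both scoring functions are hypothetical simplifications. It only shows how a score that sees which sites were visited differs from one that also sees how often.

```python
# Hypothetical share of male visitors per site. These numbers are invented
# for illustration; they are not from Mike's script or any real dataset.
MALE_SHARE = {"espn.com": 0.80, "salon.com": 0.20}


def presence_score(visited_sites):
    """Average the male share over the distinct sites seen (no frequency)."""
    known = [site for site in visited_sites if site in MALE_SHARE]
    if not known:
        return None  # nothing recognisable in the history
    return sum(MALE_SHARE[site] for site in known) / len(known)


def weighted_score(visit_counts):
    """Average the male share, weighted by the number of visits per site."""
    known = {s: n for s, n in visit_counts.items() if s in MALE_SHARE}
    total = sum(known.values())
    if total == 0:
        return None
    return sum(MALE_SHARE[s] * n for s, n in known.items()) / total


# A hundred visits to espn.com and a single visit to salon.com.
visits = {"espn.com": 100, "salon.com": 1}

print(presence_score(visits))   # about 0.5: the one salon.com visit cancels espn.com
print(weighted_score(visits))   # about 0.79: the hundred espn.com visits dominate
```

With only a visited/not-visited flag per site, as in the history trick, the second function cannot be computed at all.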

The second, and more interesting, point is this: why do you want their info? For any normal commercial reason - in particular serving ads or choosing content - what is interesting is not the true gender of the viewer; it is their statistical gender. If someone arrives at your sport-and-sewing site, should you pitch it as a sport site or a sewing site based on their gender, or on the ratio of crochet sites to live sport news sites they visit? You might think you want their gender, but what you really want is correlation.

I should mention that I wouldn't endorse the practice (the script relies on visited links being drawn in a different colour), and that soon enough it should be prevented. I suggest returning the default link colours rather than the live colours for anything but links from the parent domain - i.e. strengthen the XSS protection.

Even without this kind of temporary naughtiness, you can still find out about your general user base - watch what links they click, videos they watch, news items they find interesting. Then build useful statistics - perhaps your site is visited by three fairly separate groups, which might determine your site development strategy. This time, you have frequency data, timing data, everything.
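As a rough sketch of what "build useful statistics" could look like, assume your own site logs each click as a (visitor, category) pair. The log below is invented and the function names are hypothetical. Assigning each visitor to their most-clicked category is a crude stand-in for proper clustering, but it is enough to show whether separate groups exist.

```python
from collections import Counter, defaultdict

# Hypothetical first-party click log: (visitor id, content category).
# The data is invented purely for illustration.
CLICKS = [
    ("u1", "sport"), ("u1", "sport"), ("u1", "sewing"),
    ("u2", "sewing"), ("u2", "sewing"),
    ("u3", "video"), ("u3", "video"), ("u3", "sport"),
    ("u4", "sport"),
]


def click_profiles(clicks):
    """Count clicks per category for each visitor, keeping frequency data."""
    profiles = defaultdict(Counter)
    for visitor, category in clicks:
        profiles[visitor][category] += 1
    return profiles


def segment_sizes(profiles):
    """Put each visitor in their most-clicked category and count the groups."""
    return Counter(
        profile.most_common(1)[0][0] for profile in profiles.values()
    )


segments = segment_sizes(click_profiles(CLICKS))
print(segments)  # sport: 2 visitors, sewing: 1, video: 1
```

Because this uses only what visitors do on your own site, it needs no access to their history elsewhere.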

But remember not to be evil.