avram wrote, 2003-01-04 07:33 pm

Google News

I wonder how Google News selects its headlines. For example, look at this chunk of screenshot:

[ Google News paragraph ]

What quirk of software was responsible for choosing a headline from The Hindu, an Indian paper, as the top headline, and relegating the Washington Post and Yahoo to second bananahood? Not that I mind — I like the diversity of sources — I’m just curious.

I suppose The Hindu might have more readers, India being such a populous nation, but in another section a Newsday headline is top and the NY Times headline comes in second, and I doubt Newsday has more readers. More likely it’s based on filing time, or linked-to rankings.

miramon.livejournal.com 2003-01-05 03:02 am (UTC)
Because The Hindu's headline actually bears some relation to the story (which was about Viagra for women)? The Yahoo headline is irrelevant, and the Washington Post's doesn't include the keyword "women".

miramon.livejournal.com 2003-01-05 10:32 am (UTC)
If I understand the way that Google News works, they do it by clustering (this is based on that next-in-sequence functionality that they were testing last year). They detect that a number of stories cover a particular subject, so that subject becomes a group. They then see how many stories fit into the group.

The lead story is the one whose headline best fits the group; the next two are ones whose content fits the group. If you look at the Washington Post's story, it is very specifically about Pfizer inventing female sexual disorder. But the headline doesn't reflect that, so it doesn't get the lead position.
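
To make that guess concrete, here is a rough Python sketch of "lead by headline fit, runners-up by content fit". It is only an illustration of the idea described above, not Google's actual method: the word-count cosine similarity, the `rank_cluster` function, and the sample stories ("Paper A" and so on) are all made up for the example.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_cluster(stories):
    """Pick a lead by headline fit, then up to two runners-up by body fit.

    Assumption: the "group" is summarized as the combined word counts of
    every headline and body in the cluster.
    """
    centroid = Counter()
    for story in stories:
        centroid.update(tokens(story["headline"] + " " + story["body"]))

    def headline_fit(story):
        return cosine(Counter(tokens(story["headline"])), centroid)

    def body_fit(story):
        return cosine(Counter(tokens(story["body"])), centroid)

    lead = max(stories, key=headline_fit)
    others = sorted(
        (s for s in stories if s is not lead), key=body_fit, reverse=True
    )
    return [lead] + others[:2]


# Hypothetical cluster: three stories on the same subject, but only one
# headline actually uses the words that the stories have in common.
stories = [
    {"source": "Paper B", "headline": "A new market",
     "body": "drug makers test pill for women in trials"},
    {"source": "Paper A", "headline": "Drug makers seek pill for women",
     "body": "drug makers test pill for women"},
    {"source": "Paper C", "headline": "Health roundup",
     "body": "pill for women tested by drug makers"},
]

ranked = rank_cluster(stories)
for story in ranked:
    print(story["source"], "-", story["headline"])
```

In this toy cluster the story from "Paper A" comes out on top because its headline shares the most vocabulary with the group as a whole, even though all three bodies cover the same subject. That matches the behavior described above, where a well-known source with a vague headline ends up in second or third place.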