When I posted my blog entry “How Aggregating – Google to Launch News Search Site”, I followed up by emailing Mark Pilgrim to see if he was going to offer a cool Python interface that might lob news from the Google API over to the Blogger API. His reply caught me off guard because:
- He pointed out an ambiguity in my original post: I implied the Google API could be approached with XML-RPC, when it is in fact a SOAP API.
- He also brought to my attention an ethical issue I overlooked because I was viewing things from a purely technical point of view.
Hence, I’ve categorized this message under “take a plank out of my eye” and am posting his messages because they are informative, instructive and accurate:
Not a chance. Google goes to great lengths to block all scrapers and other scripts that try to automatedly pull content from anywhere on their site. Their SOAP API only covers the main search results (no image search, no directory search, no groups search, no news search). In other words, unless they provide an interface for it, it’ll be next to impossible to grab the raw data and repurpose it.
Here is Mark’s reply when I asked permission to reprint the above email:
Thanks MARK! I sometimes get so keyed up with new toys and ideas that I forget it’s only fun until someone gets hurt!