Friday, November 23, 2007

Kindle RSS Ripper

After trying out blogs and periodicals on the Kindle, I'm left reasonably impressed. It's really neat to pick up the Kindle and find that one of the blogs you watch has an update you haven't read yet. The push functionality is cool, and I dig what amounts to the pay-as-you-go approach to the EVDO bandwidth. I mean, someone has to pay for it somewhere, right? And so what if Amazon are playing the margins for profit? They ARE a business, right?

But what about RSS feeds that aren't on the Kindle service yet? For that matter, what about RSS feeds that you don't care about quite enough to need them that up to date? I have a few blogs that I read once in a blue moon, and entertained as I am by them, I can't see shelling out cash for them. Clearly Amazon don't want to trumpet free content when they're offering their upgraded cool stuff for pay, but they've graciously left a trail of breadcrumbs for anyone so inclined to follow.

First off, the format of Kindle documents appears to just be a wrapper around Mobipocket files. These are compiled binary files created by tools available at the Mobipocket site. Believe it or not, there are free versions of the tools there, including a command line compiler.

What are the input files? Turns out they're just plain old HTML. Mobipocket recognize a few extra tags aimed at books (e.g. a page break tag), but otherwise they're just HTML. Reference for the whole thing is available at the site.

So we have XML (RSS) feed documents on the one hand, and a command line compiler that takes HTML files and builds our desired output. Piece of cake!
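To make that concrete, here's a rough sketch of the first half of the pipeline in Python (not the C# the actual utility is written in, and not its real code). It assumes a plain RSS 2.0 feed, and the function name and sample feed are made up for illustration. Each feed item becomes a chunk of HTML, separated by Mobipocket's page break tag.

```python
import xml.etree.ElementTree as ET
from html import escape

# A tiny made-up RSS 2.0 feed, standing in for a real download.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First post</title>
      <description>&lt;p&gt;Hello from the feed.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Second post</title>
      <description>&lt;p&gt;Another entry.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>"""


def rss_to_html(rss_text):
    """Turn an RSS 2.0 document into one HTML page, one feed item per 'page'."""
    channel = ET.fromstring(rss_text).find("channel")
    feed_title = channel.findtext("title", default="Untitled feed")
    parts = []
    for item in channel.findall("item"):
        title = item.findtext("title", default="(no title)")
        # Descriptions usually carry escaped HTML; once the XML is parsed the
        # text is ordinary markup, so it is inserted as-is.
        body = item.findtext("description", default="")
        parts.append("<h2>%s</h2>\n%s" % (escape(title), body))
    # <mbp:pagebreak/> is the Mobipocket page break tag mentioned above.
    joined = "\n<mbp:pagebreak/>\n".join(parts)
    return "<html><head><title>%s</title></head><body>\n%s\n</body></html>" % (
        escape(feed_title),
        joined,
    )


if __name__ == "__main__":
    print(rss_to_html(SAMPLE_RSS))
```

The resulting HTML file is what you would then hand to the command line compiler.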

I've whipped up a little command line utility of my own in C# that will process a list of RSS feeds into a set of .mobi Mobipocket books. I'm posting it here for anyone else to try, but I make no guarantees about whether it'll work for you, nor will I offer any support or additional features. I don't have a lot of free time these days, and I'm only likely to work on it further should I need something more of it. Hey, if we're lucky this will inspire someone else to go make a fancier version!

Having said that, this does work reasonably well. So far I've tried it with Kotaku, Slashdot, GayGamer, XKCD.com and a bunch of other sites. If it fails for you on some other feed, I'd be inclined to suspect that it's yet another weird variant of what it means to be an RSS feed. I didn't know that the whole concept was such a mess!
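For a taste of the mess, here's a hedged Python sketch of one way to cope with it (again, not what my utility actually does). RSS 2.0 calls its entries "item" and puts the text in "description", Atom calls them "entry" and uses "content" or "summary", and plenty of feeds add a namespaced "content:encoded" on top. The helper names below are invented for the example, and it simply ignores namespaces and takes the first body-like field it finds.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_entries(feed_text):
    """Return (title, body) pairs from an RSS 2.0, RSS 1.0 or Atom document."""
    root = ET.fromstring(feed_text)
    entries = []
    for elem in root.iter():
        # RSS variants use <item>, Atom uses <entry>.
        if local(elem.tag) not in ("item", "entry"):
            continue
        fields = {}
        for child in elem:
            # Keep the first occurrence of each field, keyed by local name.
            fields.setdefault(local(child.tag), (child.text or "").strip())
        title = fields.get("title", "(no title)")
        body = ""
        # Prefer the fullest body field; fall back to shorter summaries.
        for name in ("encoded", "content", "description", "summary"):
            if fields.get(name):
                body = fields[name]
                break
        entries.append((title, body))
    return entries


RSS = (
    "<rss version='2.0'><channel><item><title>A</title>"
    "<description>rss body</description></item></channel></rss>"
)
ATOM = (
    "<feed xmlns='http://www.w3.org/2005/Atom'><entry><title>B</title>"
    "<summary>short</summary><content>atom body</content></entry></feed>"
)

if __name__ == "__main__":
    print(extract_entries(RSS))
    print(extract_entries(ATOM))
```

One known gap: Atom content marked as XHTML holds child elements rather than text, so this sketch would come back with an empty body for those entries.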

To run this:
  • Download the RSSKindleSync.zip file from here and unzip it to any directory of your choice.
  • Download the mobigen utility from the Mobipocket site and copy the actual mobigen.exe file to the same directory where you left RSSKindleSync. You should now have 4 files there: RSSKindleSync.exe, RSSKindleSync.exe.config, feed list.xml and mobigen.exe.
  • Now edit the feed list.xml file in that folder to add all the sites you're interested in ripping. Just copy and paste the existing tag and change the URL parameter for each.
  • Run the RSSKindleSync.exe file and away you go!
  • You should end up with a pile of .mobi files in a /books/ subdirectory under that exe file, one for each RSS feed you listed. Just copy these over to your Kindle's documents directory and you're done.
  • Whenever you'd like a refresh, just rerun RSSKindleSync.exe and all your RSS books will be rebuilt.
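If you're curious what the loop behind those steps might look like, here's a loose Python sketch. Big caveats: the layout of the feed list below (a "feed" tag with a "url" attribute) is a guess for illustration rather than the actual contents of feed list.xml, the helper names are made up, and it assumes mobigen accepts the path of an HTML file as its argument. Check the tool's own help for the real options and output naming.

```python
import re
import subprocess
import xml.etree.ElementTree as ET

# Hypothetical feed list layout; the real feed list.xml may look different.
FEED_LIST = """<feeds>
  <feed url="http://example.com/rss.xml" />
  <feed url="http://example.org/blog/atom.xml" />
</feeds>"""


def read_feed_urls(feed_list_text):
    """Collect the url attribute of every <feed> tag in the list."""
    root = ET.fromstring(feed_list_text)
    return [feed.get("url") for feed in root.iter("feed") if feed.get("url")]


def book_name(url):
    """Derive a filesystem-safe base name from a feed URL."""
    stem = re.sub(r"^[a-z]+://", "", url)
    return re.sub(r"[^A-Za-z0-9]+", "_", stem).strip("_")


def mobigen_command(html_path, mobigen="mobigen.exe"):
    """Build the argument list for the compiler (assumed: mobigen <input file>)."""
    return [mobigen, str(html_path)]


def build_book(html_path):
    """Run the compiler on one HTML file and return its exit code."""
    return subprocess.run(mobigen_command(html_path), check=False).returncode
```

A driver would call read_feed_urls, write one HTML file per feed under a name from book_name, and pass each file to build_book.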
Enjoy!

5 comments:

Anonymous said...

How about sharing the source? so people don't have to use ildasm to see what you're doing.

Anonymous said...

Hey baby, it's been a while (more than a year!!!) since your last post!

Aren't ya gonna show up?!

How's the RPG going on?

Anonymous said...

OK, I can't take it anymore, where the hell are you? (j/k)

Hey buddy, how's the RPG coming? Still kicking, huh?

Well, I hope so... good luck!

Unknown said...

Hey man, I don't believe I'm posting this after all these years!

Just wanted to say that I loved your RPG game W.I.P posts ;)

Hope you're OK and doing great stuff like you did here, I've been waiting for you like 3 years or so!

But that's OK, you're gonna show up some day, huh? ;)

Unknown said...

What about http://www.tomykindle.com/rss?

* gets your favourite rss feed
* reads links to real articles
* parses the articles to strip pictures and unimportant stuff
* sends the nicely formatted article to your Kindle for free

Nice, eh?