Penny Arcade and Control-Alt-Del are great comics. I love reading them, at least when I’m not feeling really lazy. See, I pretty much live inside my Google Calendar, GMail, Remember the Milk, Google Reader, and Netvibes tabs. I’m lazy when it comes to some forms of content, and if a comic doesn’t show up in Google Reader, I’ll typically ignore it until someone points out a particularly good strip to me.
While playing around with Yahoo! Pipes, I realized I could finally do something about this. I built a couple of pipes that read in the RSS feeds, look for all comic entries, and change the content. The trick was to copy the location the feed item was pointing to (the page that contains the actual comic image) into the description, and then apply a regular expression to the description to turn it into an <img> tag pointing to the image itself.
I got lucky. To my knowledge, Pipes currently has no way to fetch an arbitrary HTML page and do something with a piece of that page. I suspect the Fetch Data module might let me, but I haven’t managed to get it to work just yet. I was able to pull this off because the comic image was stored at a predictable path based on the date of the comic, and the URL of the page being linked to also contained the date. A regular expression was all that was needed to parse out the date and rebuild the path.
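For anyone who would rather see the date trick as code, here is a rough Python sketch of the same idea. The URL layouts are made up for illustration (everything on example.com, plus the names LINK_DATE, IMAGE_TEMPLATE, and inline_comic, is hypothetical) and are not the real paths either comic uses. The pipes themselves do this with a regular expression on the description, not with a script.

```python
import re

# Hypothetical layout: the item link ends in /YYYY/MM/DD (assumption).
LINK_DATE = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/?$")

# Hypothetical layout: the image lives at /images/YYYY/YYYYMMDD.jpg (assumption).
IMAGE_TEMPLATE = "http://example.com/images/{y}/{y}{m}{d}.jpg"


def inline_comic(link):
    """Return an <img> tag built from the date in the link, or None."""
    match = LINK_DATE.search(link)
    if match is None:
        # Not a comic entry (news post, etc.), so leave it alone.
        return None
    y, m, d = match.groups()
    return '<img src="{}" />'.format(IMAGE_TEMPLATE.format(y=y, m=m, d=d))


if __name__ == "__main__":
    print(inline_comic("http://example.com/comic/2007/04/06"))
```

Links that don’t contain a date return None, which is roughly how non-comic entries get skipped.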
Anyway, the end result is that I now have inline comics in my Penny Arcade and Control-Alt-Del RSS feeds! You can add them to your RSS reader below, or take a look at how they were made.
Awesome!
Thanks a lot! BTW I put together a feed of Dr. Fun (no longer published, alas) via the URL listed above. I just used a Python script called from cron. I’m definitely going to have to check out Pipes, it seems pretty handy.
You rock!
Wow, that is terrifically neat. Going to have to play around with this later.
Some artists specifically discourage RSS feeds from linking directly to their comics, on the basis that they’re losing the advertising revenue that helps pay bandwidth costs. Do either of these comics have such a policy?
Simon: Good question, and I’m not sure. If they complain, I’ll take it down. I think it would be beneficial to them, however, to have inline comics *and* inline ads in a feed provided by them. I certainly wouldn’t mind seeing both if it meant getting to see the comic. As it is, I almost never look at the comics because it means an extra click, and sometimes that’s all it takes.
Ever tried feed43.com? You give a couple simple regex-like things and it scrapes HTML. Surprisingly easy (no, I am not being paid to say this).
E.g. http://feed43.com/4227712333774324.xml
Nice work. I’m trying to duplicate your work, but with User Friendly (http://userfriendly.org) – I can’t get it to inline, because the author stores the images in a directory that includes the name of the month in the URL. (for example, today’s cartoon: http://www.userfriendly.org/cartoons/archives/07apr/xuf010203.gif). Any ideas how to make this work?
I’ve been doing the same thing with Greasemonkey for a few months now. It works great.
Just a question… how does the domain buying and selling over at Penny Arcade work? Who does the domain appraisal?
Thanks
I guess they changed it a little bit, so here’s a new pipe I made for Penny Arcade Inline Comics