PhD Comics, by Jorge Cham, provides a funny (but accurate) glimpse into the life of a graduate student. Being a graduate student myself, I have always enjoyed reading this comic.
At some point, I decided to read a bunch of these comics while traveling, when I would not have access to the Internet. Since this is an online comic, that was a problem. So I wrote a small Python scraping script that visits the comic pages on the phdcomics website, locates the comic GIFs, and downloads them to the local disk for reading later.
Usage: python downloadphdcomics.py
Download: GitHub: https://github.com/lifeofpentester/phdcomics
You can store all of these GIFs in a ZIP file and change the extension from .zip to .cbz (CBZ is the ZIP-based comic book archive format; CBR is its RAR-based counterpart). Then, you can use any comic book reader to read these comics.
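If you would rather not do the packing by hand, here is a minimal sketch of that step using Python 3's standard zipfile module. The function name pack_cbz and the .cbz output name are my own choices for illustration, not part of the downloader script, and the sketch assumes the GIFs are named by comic number (1.gif, 2.gif, ...), as the downloader names them.

```python
import zipfile
from pathlib import Path


def pack_cbz(gif_dir, cbz_path):
    """Pack every .gif in gif_dir into a ZIP archive written to cbz_path.

    pack_cbz is a hypothetical helper, not part of the downloader script.
    Returns the number of GIFs that were added.
    """
    def sort_key(path):
        # Numeric names sort by value (2.gif before 10.gif);
        # any other name sorts after them, alphabetically.
        if path.stem.isdecimal():
            return (0, int(path.stem), "")
        return (1, 0, path.name)

    gifs = sorted(Path(gif_dir).glob("*.gif"), key=sort_key)

    # ZIP_STORED: GIFs are already compressed, so recompressing gains little.
    with zipfile.ZipFile(cbz_path, "w", zipfile.ZIP_STORED) as archive:
        for gif in gifs:
            if gif.stem.isdecimal():
                # Zero-pad so readers that sort pages by name keep the order.
                name = "%04d.gif" % int(gif.stem)
            else:
                name = gif.name
            archive.write(gif, arcname=name)
    return len(gifs)
```

For example, pack_cbz("/root", "/root/phdcomics.cbz") would bundle the GIFs saved in /root into a single file. The zero-padding is there because many readers order pages by file name, where "10.gif" would otherwise come before "2.gif".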
In case you are interested in reading the downloader's code, here it is:
#!/usr/bin/python
"""The PhD Comics Downloader"""

"""
This code fetches PhD comics from
www.phdcomics.com and saves to '/root/'

Written by: Pranshu
bajpai [dot] pranshu [at] gmail [dot] com
"""

from bs4 import BeautifulSoup
from urllib import urlretrieve
import urllib2
import re

for i in range(1, 1699):
    url = "http://www.phdcomics.com/comics/archive.php?comicid=%d" %i
    html = urllib2.urlopen(url)
    content = html.read()
    soup = BeautifulSoup(content)
    for image in soup.find_all('img', src=re.compile('http://www.phdcomics.com/comics/archive/' + 'phd.*gif$')):
        print "[+] Fetched Comic " + "%d" %i + ": " + image["src"]
        outfile = "/root/" + "%d" %i + ".gif"
        urlretrieve(image["src"], outfile)
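The script above is written for Python 2 (urllib2 and the print statement), so it will not run unchanged on Python 3. Below is a rough Python 3 sketch of the same idea, not a tested replacement. To avoid a third-party dependency it swaps BeautifulSoup for the standard library's html.parser, which is a different technique from the original. The names ComicImageParser, find_comic_urls, and download_comic are hypothetical, and the sketch assumes the site still serves pages and images at the same URLs, which it does not verify.

```python
import os
import re
import urllib.request
from html.parser import HTMLParser

# Same URL pattern the original script passes to re.compile().
COMIC_SRC = re.compile(r"http://www.phdcomics.com/comics/archive/phd.*gif$")


class ComicImageParser(HTMLParser):
    """Collect the src of every <img> tag that matches COMIC_SRC."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src and COMIC_SRC.search(src):
            self.sources.append(src)


def find_comic_urls(html_text):
    """Return the comic image URLs found in one archive page."""
    parser = ComicImageParser()
    parser.feed(html_text)
    return parser.sources


def download_comic(comic_id, out_dir="."):
    """Fetch one archive page and save its comic GIF(s) into out_dir.

    Assumes the archive URL scheme from the original script still works.
    """
    url = "http://www.phdcomics.com/comics/archive.php?comicid=%d" % comic_id
    with urllib.request.urlopen(url) as response:
        html_text = response.read().decode("utf-8", errors="replace")

    saved = []
    for src in find_comic_urls(html_text):
        outfile = os.path.join(out_dir, "%d.gif" % comic_id)
        urllib.request.urlretrieve(src, outfile)
        print("[+] Fetched Comic %d: %s" % (comic_id, src))
        saved.append(outfile)
    return saved
```

To mirror the original loop, you would call download_comic(i, "/root") for each i in range(1, 1699). Keeping the parsing in find_comic_urls, separate from the network calls, makes it possible to check the matching logic against a saved page without touching the site.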
Hi,
I'm new to Python (in fact, this is the very first time I have used it), and I get this error when running this program:
urlretrieve(image["src"], outfile)
NameError: name 'image' is not defined
Anyway, thanks for your effort :)
Cheers
Move the last two lines right below print, at the same indentation, so that they run inside the for loop:
for image in soup.find_all('img', src=re.compile('http://www.phdcomics.com/comics/archive/' + 'phd.*gif$')):
    print "[+] Fetched Comic " + "%d" %i + ": " + image["src"]
    outfile = "/root/" + "%d" %i + ".gif"
    urlretrieve(image["src"], outfile)
Hi,
I am unable to get this code to work in its current form.