Web scraping is the technique to extract data from a website.
The module BeautifulSoup is designed for web scraping. The BeautifulSoup module can handle HTML and XML. It provides simple method for searching, navigating and modifying the parse tree.
from BeautifulSoup import BeautifulSoup import urllib2 import re
html_page = urllib2.urlopen("https://arstechnica.com") soup = BeautifulSoup(html_page) for link in soup.findAll('a', attrs={'href': re.compile("^http://")}): print link.get('href')
Leave a Reply: