Sarath's Web Log

The memories are written in INK,for just THINK…

Web scraping using Python

Today I'm covering some basics of web scraping using Python. I use Python 2.7. After some googling and exploring the Python docs, I got some interesting results.
(Web scraping is a software technique for extracting information from websites.)

A simple way to scrape information from the web is to use the built-in Python library urllib (urllib2).

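import urllib2

# The URL was left empty in the original post; put the page address here.
response = urllib2.urlopen('')
html = response.read()  # read the whole response body as a string
print html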

urllib2 is a Python module for fetching URLs (Uniform Resource Locators). It offers a very simple interface, in the form of the urlopen function, which can fetch URLs using a variety of different protocols. It also offers a slightly more complex interface for handling common situations, like basic authentication, cookies, proxies and so on. These are provided by objects called handlers and openers.
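To make the idea of handlers and openers a bit more concrete, here is a minimal sketch of an opener that keeps cookies and sends a custom User-Agent header. The helper name build_scraper_opener and the agent string "my-scraper/0.1" are my own placeholders, not something from urllib2 itself, and the try/except import is only there so the same code also runs on Python 3, where these classes live in urllib.request.

```python
# Compatibility import: urllib2 on Python 2.7 (used in this post),
# urllib.request on Python 3, where the same classes live.
try:
    import urllib2 as request_lib
    from cookielib import CookieJar
except ImportError:
    import urllib.request as request_lib
    from http.cookiejar import CookieJar


def build_scraper_opener(user_agent):
    """Return (opener, cookie_jar); build_scraper_opener is a hypothetical helper."""
    cookie_jar = CookieJar()
    # HTTPCookieProcessor is a handler: it stores cookies from responses
    # and sends them back on later requests made through this opener.
    opener = request_lib.build_opener(request_lib.HTTPCookieProcessor(cookie_jar))
    # addheaders is a list of (name, value) pairs sent with every request.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, cookie_jar


# "my-scraper/0.1" is a made-up agent string for illustration.
opener, cookie_jar = build_scraper_opener("my-scraper/0.1")
# Once you have a URL, opener.open(url).read() works like urlopen(url).read().
```

Nothing is fetched here; the opener is only configured. Any cookies a site sets would end up in cookie_jar after the first request made through the opener.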

Now the real part starts: scraping data out of the pages behind the URLs. For that I used BeautifulSoup. Parsing data is very easy with it.

Before trying it, you have to install it; download the tarball.
Windows: install
Ubuntu: sudo apt-get install python-beautifulsoup

Parsing is as simple as below:

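import urllib2
from BeautifulSoup import BeautifulSoup

url = ""  # left empty in the original post; put the page address here
html = urllib2.urlopen(url).read()  # fetch the raw HTML
data = BeautifulSoup(html)  # parse it into a navigable tree
print data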

It will print the whole parsed data of the URL. Then you can navigate the soup and collect the HTML tag values you need 🙂 .
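As a rough illustration of navigating the soup, here is a small sketch that pulls the page title and all the links out of a parsed document. A few assumptions: it uses bs4 (the newer BeautifulSoup 4 package) rather than the BeautifulSoup 3 import shown above, it parses a made-up inline HTML string instead of a fetched page, and the variable names are mine. In BeautifulSoup 3 the method is spelled findAll instead of find_all.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4, not the version 3 import used above

# A small made-up page stands in for the html fetched with urllib2.
html = """
<html><head><title>Sample page</title></head>
<body>
  <h1>Links</h1>
  <a href="/first">First</a>
  <a href="/second">Second</a>
</body></html>
"""

# "html.parser" selects Python's built-in parser, so nothing extra is needed.
soup = BeautifulSoup(html, "html.parser")

# Tags can be reached by name as attributes of the soup.
page_title = soup.title.string

# find_all returns every matching tag; get() reads an attribute,
# get_text() returns the text inside the tag.
links = [(a.get("href"), a.get_text()) for a in soup.find_all("a")]

print(page_title)
for href, text in links:
    print("%s -> %s" % (href, text))
```

The same pattern works for any tag: swap "a" for the tag you are after and read whichever attribute or text you need.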

