XPath (libxml2) in Python


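Step 1: Install libxml2 and its Python bindings, for example with the Synaptic package manager (on Debian and Ubuntu the bindings are packaged as python-libxml2).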

Step 2: Create or choose the XML file that you want to traverse.

For example, I am using the w3schools sample document http://www.w3schools.com/xpath/books.xml.
A local file on the file system works as well.

Step 3: Create a Python file, for example xpathcode.py.

Open xpathcode.py, import libxml2 and urllib, and parse the XML file. The examples use Python 2 syntax.

import libxml2
import urllib
rss=libxml2.parseDoc(urllib.urlopen('http://www.w3schools.com/xpath/books.xml').read())


Note: If the file is on the local file system, read it like this instead (urllib is not needed):

import libxml2
rss=libxml2.parseDoc(open('books.xml', 'r').read())

Step 4: Now try the following XPath queries one by one.

a. Selects the first book element that is a child of the bookstore element

nodes=rss.xpathEval('/bookstore/book[1]')
print nodes[0]

Output:

<book category="COOKING">
 <title lang="en">Everyday Italian</title>
 <author>Giada De Laurentiis</author>
 <year>2005</year>
 <price>30.00</price>
</book>

b. Selects the last book element that is a child of the bookstore element

nodes=rss.xpathEval('/bookstore/book[last()]')
print nodes[0]

Output:

<book category="WEB">
 <title lang="en">Learning XML</title>
 <author>Erik T. Ray</author>
 <year>2003</year>
 <price>39.95</price>
</book>

c. Selects the last but one book element that is a child of the bookstore element

nodes=rss.xpathEval('/bookstore/book[last()-1]')
print nodes[0]

Output:

<book category="WEB">
 <title lang="en">XQuery Kick Start</title>
 <author>James McGovern</author>
 <author>Per Bothner</author>
 <author>Kurt Cagle</author>
 <author>James Linn</author>
 <author>Vaidyanathan Nagarajan</author>
 <year>2003</year>
 <price>49.99</price>
</book>

d. Selects the first two book elements that are children of the bookstore element

nodes=rss.xpathEval('/bookstore/book[position()<3]')
for i in nodes:
    print i

Output:

<book category="COOKING">
 <title lang="en">Everyday Italian</title>
 <author>Giada De Laurentiis</author>
 <year>2005</year>
 <price>30.00</price>
</book>
<book category="CHILDREN">
 <title lang="en">Harry Potter</title>
 <author>J K. Rowling</author>
 <year>2005</year>
 <price>29.99</price>
</book>
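
As a cross-check on queries a through d, the same selections can be sketched with the standard library's xml.etree.ElementTree instead of libxml2. This is a swapped-in technique, not the libxml2 API used in this post: ElementTree supports only a subset of XPath (it has no position() function, so the first two books are taken with a list slice), and the inline BOOKS_XML string is an assumed, trimmed copy of books.xml holding only titles and prices. The sketch is written for Python 3.

```python
import xml.etree.ElementTree as ET

# Assumption: a trimmed inline copy of books.xml (titles and prices only).
BOOKS_XML = """
<bookstore>
  <book category="COOKING"><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book category="CHILDREN"><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book category="WEB"><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book category="WEB"><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>
"""

root = ET.fromstring(BOOKS_XML)  # root is the <bookstore> element


def titles(books):
    """Return the title text of each <book> element in the list."""
    return [book.find("title").text for book in books]


first = titles(root.findall("book[1]"))               # like /bookstore/book[1]
last = titles(root.findall("book[last()]"))           # like /bookstore/book[last()]
second_last = titles(root.findall("book[last()-1]"))  # like /bookstore/book[last()-1]
first_two = titles(root.findall("book")[:2])          # stands in for book[position()<3]

print(first, last, second_last, first_two)
```

Paths here are relative to the bookstore root element, which is why they start at book rather than /bookstore/book.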

e. Selects all the title elements that have an attribute named lang

nodes=rss.xpathEval('//title[@lang]')
for i in nodes:
    print i

Output:

<title lang="en">Everyday Italian</title>
<title lang="en">Harry Potter</title>
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>

f. Selects all the title elements that have an attribute named lang with a value of ‘eng’

nodes=rss.xpathEval("//title[@lang='eng']")
if not nodes:
    print 'eng not exist'

Output:
eng not exist
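
Query f returns nothing because every title in the sample data carries lang="en", not "eng". A minimal sketch of that check follows, using the standard library's xml.etree.ElementTree as a stand-in for libxml2 (an assumption for illustration, written for Python 3); the inline XML is a trimmed copy of the sample data with titles only.

```python
import xml.etree.ElementTree as ET

# Assumption: a trimmed inline copy of books.xml (titles only).
BOOKS_XML = """
<bookstore>
  <book><title lang="en">Everyday Italian</title></book>
  <book><title lang="en">Harry Potter</title></book>
  <book><title lang="en">XQuery Kick Start</title></book>
  <book><title lang="en">Learning XML</title></book>
</bookstore>
"""

root = ET.fromstring(BOOKS_XML)

with_lang = root.findall(".//title[@lang]")       # any lang attribute
lang_eng = root.findall(".//title[@lang='eng']")  # exact value 'eng'
lang_en = root.findall(".//title[@lang='en']")    # exact value 'en'

# The attribute test is an exact string match, so 'eng' finds nothing.
print(len(with_lang), len(lang_eng), len(lang_en))
```

Changing the value in query f from 'eng' to 'en' should therefore select all four titles.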

g. Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00

nodes=rss.xpathEval("/bookstore/book[price>35.00]/title")
for i in nodes:
    print i

Output:

<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>
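
In the predicate price>35.00, XPath converts the text of each price element to a number before comparing. The same filter can be sketched by hand in plain Python. This uses the standard library's xml.etree.ElementTree rather than libxml2, because ElementTree's XPath subset has no numeric comparisons; the inline XML is an assumed, trimmed copy of the sample data, and the sketch is written for Python 3.

```python
import xml.etree.ElementTree as ET

# Assumption: a trimmed inline copy of books.xml (titles and prices only).
BOOKS_XML = """
<bookstore>
  <book><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>
"""

root = ET.fromstring(BOOKS_XML)

# Convert each price's text to a float, then keep titles of books above 35.00.
expensive = [
    book.find("title").text
    for book in root.findall("book")
    if float(book.find("price").text) > 35.00
]

print(expensive)
```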

h. Selects all the title AND price elements of all book elements

nodes=rss.xpathEval("//book/title | //book/price")
for i in nodes:
    print i

Output:

<title lang="en">Everyday Italian</title>
<price>30.00</price>
<title lang="en">Harry Potter</title>
<price>29.99</price>
<title lang="en">XQuery Kick Start</title>
<price>49.99</price>
<title lang="en">Learning XML</title>
<price>39.95</price>

i. Selects all the title elements of the book element of the bookstore element AND all the price elements in the document

nodes=rss.xpathEval("/bookstore/book/title | //price")
for i in nodes:
    print i

Output:

<title lang="en">Everyday Italian</title>
<price>30.00</price>
<title lang="en">Harry Potter</title>
<price>29.99</price>
<title lang="en">XQuery Kick Start</title>
<price>49.99</price>
<title lang="en">Learning XML</title>
<price>39.95</price>
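
Queries h and i print the same nodes because every price element in this document sits inside a book, and the output above lists the union in document order rather than all titles followed by all prices. A rough sketch of that behavior follows, using the standard library's xml.etree.ElementTree as a stand-in (it has no | operator, so the tree is walked in document order and title and price elements are kept). The inline XML is an assumed, trimmed copy of the sample data, and the sketch is written for Python 3.

```python
import xml.etree.ElementTree as ET

# Assumption: a trimmed inline copy of books.xml (titles and prices only).
BOOKS_XML = """
<bookstore>
  <book><title lang="en">Everyday Italian</title><price>30.00</price></book>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title lang="en">XQuery Kick Start</title><price>49.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>
"""

root = ET.fromstring(BOOKS_XML)

# iter() visits elements in document order; keep only the two wanted tags.
wanted = {"title", "price"}
union = [(el.tag, el.text) for el in root.iter() if el.tag in wanted]

for tag, text in union:
    print(tag, text)
```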

j. Selects the text of all the title elements

nodes=rss.xpathEval("/bookstore/book/title/text()")
for i in nodes:
    print i

Output:

Everyday Italian
Harry Potter
XQuery Kick Start
Learning XML

For more detail on XPath, please visit: http://www.w3schools.com/xpath/default.asp

Categories: Python, XML, XPath
  1. February 7, 2011 at 4:38 am

    nice dude.

  2. February 7, 2011 at 4:40 am

    @thanks mir and recluze

  3. Karan
    May 7, 2012 at 9:18 am

    Hey it works fine but I only get the node address as output and not the complete output as yours:/

  4. Georgia
    June 30, 2012 at 3:23 pm

    Excellent examples — Just what i was searching for — thanks!!!!

  5. Selva
    January 23, 2013 at 7:49 pm

    Hi Mohsin, Lot of thanks for your gut work.

    I am interested particularly on traverse to the element with attribute value specific.

    Something like this:
    nodes=rss.xpathEval("//title[@lang='eng']") from your example

    Once I have identified this, I want to remove that element itself. How can I select this and remove it?

    Pls help me
