Python - SPLessons

Python XML Processing


Description

XML (Extensible Markup Language) is a markup language for storing, transporting, and exchanging data. Unlike HTML, which also describes how content is laid out, XML describes only what the data means.

Conceptual Figure

When a client sends a request to an application, the web server control loads the XML document and in turn forwards it to the application server. The application server then looks the data up in an XML database, such as an XML-enabled relational database or a native XML (non-relational) database.

Issues behind the evolution of XML

Description

=> According to the XML recommendation, an XML document has a physical structure and a logical structure. XML was originally developed for creating printed documents, which often contain images, headings, and text. A computer cannot by itself tell which elements are present on such a page. XML was introduced to overcome this problem by letting the computer understand the data in the document.
=> It is also common to split a document into multiple files, for example to make it usable on various devices. The physical structure of the XML document is what ties those pieces together. The DTD (Document Type Definition) is part of the physical structure of the XML document and can be used to define that structure.

Example

The following example creates a sample XML file named movies.xml.
[c]
<collection shelf="New Arrivals">
  <movie title="Enemy Behind">
    <type>War, Thriller</type>
    <format>DVD</format>
    <year>2003</year>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Talk about a US-Japan war</description>
  </movie>
  <movie title="Transformers">
    <type>Anime, Science Fiction</type>
    <format>DVD</format>
    <year>1989</year>
    <rating>R</rating>
    <stars>8</stars>
    <description>A schientific fiction</description>
  </movie>
  <movie title="Trigun">
    <type>Anime, Action</type>
    <format>DVD</format>
    <episodes>4</episodes>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Vash the Stampede!</description>
  </movie>
  <movie title="Ishtar">
    <type>Comedy</type>
    <format>VHS</format>
    <rating>PG</rating>
    <stars>2</stars>
    <description>Viewable boredom</description>
  </movie>
</collection>
[/c]
A SAX parser reads the input from top to bottom. When a certain event occurs, it invokes the callback method that was provided for that event. For example, to extract the titles of news articles from a blog feed, the start-tag callback (startElement() in Python's xml.sax) is invoked for each opening tag and checks whether the element name is "title". If it is, the handler records the element's content. When the end-tag event arrives, the handler checks whether it closes the "title" element. After that, it ignores all further events until either the input ends or another start tag named "title" comes along. Because it is event based, a SAX parser does not build a DOM tree in memory. It only detects events and calls the corresponding callback method, in which whatever functionality is required can be implemented. The following is an example.
[c]
import xml.sax


class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.rating = ""
        self.stars = ""
        self.description = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "movie":
            print("*****Movie*****")
            title = attributes["title"]
            print("Title:", title)

    # Called when an element ends
    def endElement(self, tag):
        if self.CurrentData == "type":
            print("Type:", self.type)
        elif self.CurrentData == "format":
            print("Format:", self.format)
        elif self.CurrentData == "year":
            print("Year:", self.year)
        elif self.CurrentData == "rating":
            print("Rating:", self.rating)
        elif self.CurrentData == "stars":
            print("Stars:", self.stars)
        elif self.CurrentData == "description":
            print("Description:", self.description)
        self.CurrentData = ""

    # Called when character data is read. A long text node may arrive in
    # several calls; this simple handler keeps only the most recent piece.
    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content
        elif self.CurrentData == "format":
            self.format = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "rating":
            self.rating = content
        elif self.CurrentData == "stars":
            self.stars = content
        elif self.CurrentData == "description":
            self.description = content


if __name__ == "__main__":
    # Create an XMLReader
    parser = xml.sax.make_parser()
    # Turn off namespace processing
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    # Install our handler in place of the default ContentHandler
    Handler = MovieHandler()
    parser.setContentHandler(Handler)
    parser.parse("movies.xml")
[/c]
Running the script with Python 3 produces the following output.
[c]
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Year: 2003
Rating: PG
Stars: 10
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Year: 1989
Rating: R
Stars: 8
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Stars: 10
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Stars: 2
Description: Viewable boredom
[/c]
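The title-extraction scenario described earlier (record the text of each "title" element and ignore everything else) can be sketched in the same event-based style. This is only a minimal illustration, not part of the original lesson: the feed content, the TitleHandler class, and the extract_titles() helper are all made up for this example. The handler joins the pieces passed to characters(), since a parser may deliver one text node in several calls.

```python
import xml.sax

# Hypothetical feed content, invented for this illustration.
FEED = b"""<feed>
  <entry><title>First post</title><body>Hello</body></entry>
  <entry><title>Second post</title><body>World</body></entry>
</feed>"""


class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of every <title> element and ignores the rest."""

    def __init__(self):
        super().__init__()
        self.titles = []        # finished titles, in document order
        self._in_title = False  # True between <title> and </title>
        self._chunks = []       # text pieces of the title being read

    def startElement(self, name, attrs):
        # Start recording only when a <title> element opens.
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        # Text outside <title> is skipped; text inside is accumulated.
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        # On </title>, store the joined text and stop recording.
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False


def extract_titles(xml_bytes):
    """Parse XML given as bytes and return the list of title texts."""
    handler = TitleHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles


if __name__ == "__main__":
    print(extract_titles(FEED))
```

Because the handler keeps only the titles it has seen, memory use does not depend on how large the rest of the document is, which is the main reason to choose SAX over a tree-building parser.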

Summary

Key Points

  • XML is Extensible Markup Language.
  • XML stores, transports and exchanges data.
  • SAX provides a programmatic, event-based way to parse an XML document.
  • A SAX parser reads the input from top to bottom.