Python XML Processing

XML processing is one of the more advanced topics in Python. XML (Extensible Markup Language) is a markup language that lets developers produce documents that other applications can read. XML is a portable, open standard. It encodes documents according to a set of rules, in a format that is both machine-readable and human-readable. Derived from SGML (Standard Generalized Markup Language), it also describes the structure of the document. In XML, we can define custom tags. We can also use XML as a standard format to exchange information.

APIs used in XML:

There are two commonly used APIs in Python XML processing:

  • Document Object Model (DOM) API
  • Simple API for XML (SAX)

DOM: It allows changes to the XML file and is a World Wide Web Consortium (W3C) recommendation. The entire file is read into memory and stored in a hierarchical (tree) form that represents all the features of the XML document.

SAX: It is a read-only API. You register callbacks for events of interest and then let the parser proceed through the document. It is useful when your documents are large or memory is limited, because it parses the file as it reads it from disk and never stores the entire file in memory.
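
The difference between the two APIs can be sketched with the standard library. This is only a minimal illustration: the sample document, the `ItemCounter` class and the `checked` attribute are assumptions made up for the example, not part of any particular application.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document used only for this illustration.
XML_DATA = "<menu><item>Tea</item><item>Coffee</item></menu>"


class ItemCounter(xml.sax.ContentHandler):
    """SAX handler that counts <item> start tags as the parser streams past them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # The parser calls this once for every opening tag it encounters.
        if name == "item":
            self.count += 1


# SAX: register the handler, then let the parser proceed through the document.
handler = ItemCounter()
xml.sax.parseString(XML_DATA.encode("utf-8"), handler)
sax_count = handler.count

# DOM: the whole document is loaded into a tree, which can then be changed.
dom = minidom.parseString(XML_DATA)
first_item = dom.getElementsByTagName("item")[0]
first_item.setAttribute("checked", "yes")
dom_count = len(dom.getElementsByTagName("item"))

print(sax_count, dom_count)
print(first_item.toxml())
```

The SAX handler only sees events and keeps a counter, while the DOM version holds the full tree and can modify it in place.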

Python XML Parsing Modules:

Parsing means reading information from a file and splitting it into pieces by identifying the parts of that XML file.

This article covers two XML parsing modules from the Python standard library:

  • xml.etree.ElementTree module
  • xml.dom.minidom module (Minimal DOM implementation)

1-xml.etree.ElementTree module:

The xml.etree.ElementTree module represents XML data as a tree, which is the natural representation of hierarchical data. ElementTree is a class that wraps the element structure and allows conversion to and from XML. The Element type stores hierarchical data structures in memory. Each element has the properties given below:

Tail String: An optional string holding the text that follows the element's closing tag.
Attributes: A number of attributes, stored in a dictionary.
Tag: A string that identifies the type of data the element stores.
Text String: A text string holding the information to be displayed.
Child Elements: A number of child elements, stored as a sequence.

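
As a rough sketch, the properties listed above can be inspected on a small element. The `<p>` snippet here is a hypothetical example, chosen only so that every property has a value.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet, chosen so that every property has a value.
para = ET.fromstring('<p class="note">Hello <b>bold</b> world</p>')
bold = para[0]  # child elements are accessed like items of a sequence

tag = para.tag            # the element's tag name
attributes = para.attrib  # attributes as a dictionary
text = para.text          # text before the first child element
tail = bold.tail          # text that follows the closing </b> tag
children = list(para)     # child elements as a list

print(tag, attributes, repr(text), repr(tail), len(children))
```

Note that the tail belongs to the child element `<b>`, not to its parent: it is the text between `</b>` and the next tag.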
i-Writing XML Files:
#writing xml files

import xml.etree.ElementTree as ET

#creating the root element
data = ET.Element('GAME')

#adding the sub-element `OPEN` and its children `A1` and `B2`
element1 = ET.SubElement(data, 'OPEN')
s_elem1 = ET.SubElement(element1, 'A1')
s_elem2 = ET.SubElement(element1, 'B2')

#adding the attribute `KIND` to `A1` and `B2`
s_elem1.set('KIND', 'APPROVED')
s_elem2.set('KIND', 'CANCELED')

#adding text inside `A1` and `B2`
s_elem1.text = "Order Approved"
s_elem2.text = "Order Canceled"

#converting the XML data to a bytes object
b_xml = ET.tostring(data)

#opening the file `GFG.xml` in `wb` (write + binary) mode and writing the bytes
with open("GFG.xml", "wb") as f:
    f.write(b_xml)

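
An alternative way to write a file is the `write()` method of the `ElementTree` class, which can also emit the XML declaration. The sketch below assumes a throwaway temporary directory and a made-up file name (`example.xml`); neither comes from the example above.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.Element("GAME")
child = ET.SubElement(root, "OPEN")
child.text = "Order Approved"

# Hypothetical output location: a file inside a throwaway temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "example.xml")

# ElementTree.write() serializes the tree; xml_declaration=True adds the
# <?xml ...?> line at the top of the file.
ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)

# Read the file back to see what was written.
with open(out_path, "rb") as f:
    written = f.read()

print(written.decode("utf-8"))
```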
ii-Reading XML Files:
#reading XML files

import xml.etree.ElementTree as ET

#passing the path of the XML document
tree = ET.parse('dict.xml')

#getting the parent (root) tag
root = tree.getroot()

#printing the parent tag along with its memory location
print(root)

#printing the attributes of the parent tag
print(root.attrib)

#printing the text contained within the parent tag
print(root.text)

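
When no file is at hand, the same reading steps can be tried on a string with `ET.fromstring()`. This is a small sketch: the document below is a hypothetical stand-in that mirrors the `GAME` structure built in the writing example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document standing in for the contents of an XML file.
XML_DATA = """<GAME>
    <OPEN>
        <A1 KIND="APPROVED">Order Approved</A1>
        <B2 KIND="CANCELED">Order Canceled</B2>
    </OPEN>
</GAME>"""

# fromstring() parses the text and returns the root element directly.
root = ET.fromstring(XML_DATA)
root_tag = root.tag

# Walk the children of <OPEN> and collect (tag, attributes, text) for each one.
orders = [(child.tag, child.attrib, child.text) for child in root[0]]

print(root_tag)
for tag, attrib, text in orders:
    print(tag, attrib, text)
```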
iii-Adding to XML:
#myroot is the root element of a parsed tree, e.g. myroot = tree.getroot()
for description in myroot.iter('description'):
    new_desc = str(description.text) + ' will be served'
    description.text = str(new_desc)
    description.set('updated', 'yes')

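
A self-contained version of this step might look like the sketch below. The menu document and the `price` element are assumptions made up for the illustration; only the `description` tag and the `updated` attribute come from the snippet above.

```python
import xml.etree.ElementTree as ET

# Hypothetical menu document used only for this illustration.
myroot = ET.fromstring(
    "<menu>"
    "<food><description>Pancakes</description></food>"
    "<food><description>Waffles</description></food>"
    "</menu>"
)

# Change the text of every <description> and mark it as updated.
for description in myroot.iter("description"):
    description.text = str(description.text) + " will be served"
    description.set("updated", "yes")

# Add a brand-new child element to the first <food> entry.
price = ET.SubElement(myroot[0], "price")
price.text = "5"

result = ET.tostring(myroot, encoding="unicode")
print(result)
```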
iv-Deleting from XML:
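#deleting an attribute

#removing the `name` attribute from the first grandchild of the root;
#the default None avoids a KeyError if the attribute is absent
myroot[0][0].attrib.pop('name', None)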
#creating new XML file with the results
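
The following sketch shows both kinds of deletion, removing an attribute and removing a whole element, and then writes the result to a new file. The sample document and the output file name (`output.xml`, placed in a temporary directory) are hypothetical and chosen only for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical document for illustration only.
myroot = ET.fromstring(
    "<menu>"
    '<food><item name="breakfast">Pancakes</item><price>5</price></food>'
    "</menu>"
)

# Delete an attribute: attrib is a dictionary, and the None default
# means pop() does not raise KeyError when the attribute is missing.
myroot[0][0].attrib.pop("name", None)

# Delete a whole element: remove() is called on the element's direct parent.
food = myroot[0]
price = food.find("price")
if price is not None:
    food.remove(price)

result = ET.tostring(myroot, encoding="unicode")
print(result)

# Write the result to a new file (hypothetical name, in a temporary directory).
out_path = os.path.join(tempfile.mkdtemp(), "output.xml")
ET.ElementTree(myroot).write(out_path)
```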
v-Finding Elements in XML:
#finding element
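
One possible way to find elements with ElementTree is through `find()`, `findall()` and `iter()`. The sketch below is an assumption about what this step could look like; the menu document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for this illustration.
root = ET.fromstring(
    "<menu>"
    '<food name="pancakes"><price>5</price></food>'
    '<food name="waffles"><price>7</price></food>'
    "</menu>"
)

first_food = root.find("food")                 # first matching direct child, or None
all_foods = root.findall("food")               # list of all matching direct children
waffles = root.find("food[@name='waffles']")   # limited XPath: attribute predicate
prices = [p.text for p in root.iter("price")]  # iter() searches the whole subtree

print(first_food.get("name"), len(all_foods), waffles.get("name"), prices)
```

`find()` and `findall()` look only at direct children unless a path is given, whereas `iter()` walks every descendant.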


2-xml.dom.minidom module:

The xml.dom.minidom module is mainly used by people who are proficient with the Document Object Model (DOM). DOM applications often start by parsing XML into a DOM.

In xml.dom.minidom, parsing can be done in the ways given below:
i-By using parseString() Method:

Use this method when you want to supply the XML to be parsed as a string.

from xml.dom import minidom
p3 = minidom.parseString('<myxml>Using<empty/> parseString</myxml>')

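
To see what `parseString()` returns, the parsed document can be inspected as in this short sketch. It reuses the string from the example above; the variable names `root`, `root_name` and `node_names` are introduced only for illustration.

```python
from xml.dom import minidom

p3 = minidom.parseString("<myxml>Using<empty/> parseString</myxml>")

root = p3.documentElement   # the top-level <myxml> element
root_name = root.tagName

# childNodes holds text nodes and element nodes in document order;
# text nodes report the node name "#text".
node_names = [node.nodeName for node in root.childNodes]

print(root_name)
print(node_names)
print(root.toxml())
```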
ii-By using parse() Method:

Use the parse() function when you want to supply the XML file as a parameter, either as a file name or as an open file object.

#using parse()

from xml.dom import minidom
p1 = minidom.parse("demo.xml")

#splitting the XML file

iii-Finding Elements in XML:
#finding element

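iv-Accessing Elements using getElementsByTagName():
#accessing element

from xml.dom import minidom
#parsing the document first; `demo.xml` is the file used in the parse() example
dat = minidom.parse("demo.xml")
#getting the first `item` element in the document
tagname = dat.getElementsByTagName('item')[0]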
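
A runnable variant that does not depend on a file could look like the sketch below. The document string and the `name` attribute are hypothetical; only the `item` tag name comes from the line above.

```python
from xml.dom import minidom

# Hypothetical document used only for this illustration.
dat = minidom.parseString(
    '<items><item name="first">One</item><item name="second">Two</item></items>'
)

items = dat.getElementsByTagName("item")   # NodeList of every <item> in the document
tagname = items[0]

first_name = tagname.getAttribute("name")  # attribute value as a string
first_text = tagname.firstChild.data       # text held by the first child (a text node)
all_names = [item.getAttribute("name") for item in items]

print(first_name, first_text, all_names)
```

`getElementsByTagName()` always returns a list-like NodeList, even for a single match, which is why the example indexes it with `[0]`.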