XML processing is one of the more advanced data-handling topics in Python. XML (Extensible Markup Language) is a markup language for structuring data so that it can be read and exchanged by different applications. XML is a portable, open standard. It encodes documents according to a set of rules, in a format that is both machine-readable and human-readable. Derived from SGML (Standard Generalized Markup Language), it also describes the structure of the document. In XML, we can define custom tags. We can also use XML as a standard format to exchange information.
APIs used in XML:
There are two commonly used APIs in Python XML processing:
- Document Object Model (DOM) API
- Simple API for XML (SAX)
API | Description |
DOM | Allows changes to the XML file. It is a World Wide Web Consortium (W3C) recommendation. The entire file is read into memory and stored in a hierarchical (tree) form that represents all the features of the XML document. |
SAX | A read-only API. You register callbacks for events of interest and then let the parser proceed through the document. It is useful when documents are large or memory is limited, because it parses the file as it reads it from disk and the entire file is never stored in memory. |
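The difference between the two APIs can be sketched with the standard library's xml.sax and xml.dom.minidom modules. This is a minimal illustration, not a recommended design: the sample document and the ItemCounter class are hypothetical names introduced only for this example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document used only for this illustration
XML_DATA = "<menu><item>Tea</item><item>Coffee</item></menu>"


class ItemCounter(xml.sax.ContentHandler):
    """Counts <item> start tags as the parser streams through the input."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called by the parser each time an opening tag is encountered
        if name == "item":
            self.count += 1


# SAX: register the handler, then let the parser walk the document
handler = ItemCounter()
xml.sax.parseString(XML_DATA.encode("utf-8"), handler)
sax_count = handler.count

# DOM: load the whole document into memory, then query the tree
dom = minidom.parseString(XML_DATA)
dom_count = len(dom.getElementsByTagName("item"))

print(sax_count, dom_count)
```

Both approaches find the same elements. The SAX handler never holds the whole document, while the DOM tree stays in memory and can be modified afterwards.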
Python XML Parsing Modules:
Parsing means reading information from a file and splitting it into pieces by identifying the parts of that XML file.
This section covers two parsing modules from the Python standard library:
- xml.etree.ElementTree module
- xml.dom.minidom module (Minimal DOM Implementation)
1-xml.etree.ElementTree module:
The xml.etree.ElementTree module represents XML data as a tree structure (the natural representation of hierarchical data). ElementTree is a class that wraps the element structure and allows conversion to and from XML. The Element type stores hierarchical data structures in memory. Each element has the properties given below:
Property | Description |
Tail String | An optional string holding the text that follows the element's closing tag, up to the next tag (the tail attribute). |
Attributes | The element's attributes, stored as a dictionary (the attrib attribute). |
Tag | A string identifying the type of data the element represents (the tag attribute). |
Text String | The text between the element's opening tag and its first child or closing tag (the text attribute). |
Child Elements | Any number of child elements, stored as a sequence. |
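To make these properties concrete, here is a small sketch that reads each of them from a parsed element. The XML fragment is a hypothetical one, chosen only so that every property has a value.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment chosen to exercise every property
root = ET.fromstring('<order id="7"><item>Tea</item> and more</order>')
item = root[0]          # child elements are accessed by index, like a sequence

print(root.tag)         # the element name
print(root.attrib)      # the attributes as a dictionary
print(item.text)        # the text inside <item>
print(item.tail)        # the text after </item>, before the next tag
print(len(root))        # the number of direct child elements
```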
i-Writing XML Files:
Example:
    # Writing XML files
    import xml.etree.ElementTree as ET

    # Create the root element
    data = ET.Element('GAME')

    # Add the OPEN sub-element under the root, then A1 and B2 under OPEN
    element1 = ET.SubElement(data, 'OPEN')
    s_elem1 = ET.SubElement(element1, 'A1')
    s_elem2 = ET.SubElement(element1, 'B2')

    # Add a KIND attribute to A1 and B2
    s_elem1.set('KIND', 'APPROVED')
    s_elem2.set('KIND', 'CANCELED')

    # Add text inside A1 and B2
    s_elem1.text = "Order Approved"
    s_elem2.text = "Order Canceled"

    # Convert the XML data to a bytes object
    b_xml = ET.tostring(data)

    # Open GFG.xml in 'wb' (write + binary) mode and write the bytes
    with open("GFG.xml", "wb") as f:
        f.write(b_xml)
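As an alternative to calling tostring() and writing the bytes by hand, the tree can be wrapped in an ElementTree object and saved with its write() method, which can also emit the XML declaration. The sketch below assumes a temporary directory and the file name game.xml purely for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Build a small tree; attributes can also be passed as keyword arguments
data = ET.Element("GAME")
opening = ET.SubElement(data, "OPEN")
a1 = ET.SubElement(opening, "A1", KIND="APPROVED")
a1.text = "Order Approved"

# Wrap the root element so the whole document can be written in one call
tree = ET.ElementTree(data)

# Hypothetical output location: a temporary directory, so no stray files remain
out_path = os.path.join(tempfile.mkdtemp(), "game.xml")
tree.write(out_path, encoding="utf-8", xml_declaration=True)

# Read the file back to inspect what was written
with open(out_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```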
ii-Reading XML Files:
Example:
    # Reading XML files
    import xml.etree.ElementTree as ET

    # Pass the path of the XML document
    tree = ET.parse('dict.xml')

    # Get the root (parent) element
    root = tree.getroot()

    # Print the root element object (its tag and memory location)
    print(root)

    # Print the attributes of the root's first child
    print(root[0].attrib)

    # Print the text of the first child of the root's sixth child
    print(root[5][0].text)
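The same reading steps can be tried without a file on disk by parsing a string with ET.fromstring(), which returns the root element directly. The menu document below is a hypothetical stand-in for a real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical document standing in for a file on disk
XML_DATA = """
<menu>
  <food name="Idly"><price>2.50</price></food>
  <food name="Dosa"><price>4.00</price></food>
</menu>
"""

# fromstring() returns the root element rather than an ElementTree
root = ET.fromstring(XML_DATA)

# Iterating over an element yields its direct children in document order
for child in root:
    print(child.tag, child.attrib, child[0].text)

# Index-based access: first child of the root, then its first child
first_price = root[0][0].text
```

Index-based access such as root[0][0] depends on the exact layout of the document, so it raises IndexError when the expected child is missing.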
iii-Adding to XML:
Example:
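    # mytree is the ElementTree returned by ET.parse() and myroot is
    # mytree.getroot(), as in the reading example
    # Append text to every <description> element and mark it as updated
    for description in myroot.iter('description'):
        new_desc = str(description.text) + ' will be served'
        description.text = new_desc
        description.set('updated', 'yes')

    # Save the modified tree to a new file
    mytree.write('new.xml')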
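A self-contained version of this idea, which also appends a brand-new element with SubElement(), might look like the following. The menu content is hypothetical, and the result is serialized to a string instead of a file so it can be inspected directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical starting document
root = ET.fromstring("<menu><food><description>Idly</description></food></menu>")

# Update the existing text and flag each changed element with an attribute
for description in root.iter("description"):
    description.text = f"{description.text} will be served"
    description.set("updated", "yes")

# Add a new <food> element with its own <description> child
new_food = ET.SubElement(root, "food")
ET.SubElement(new_food, "description").text = "Dosa"

# encoding="unicode" returns a str instead of bytes
result = ET.tostring(root, encoding="unicode")
print(result)
```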
iv-Deleting from XML:
Example:
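    # Remove the 'name' attribute from the first child of the root's first child
    # (myroot and mytree come from ET.parse(), as in the reading example)
    myroot[0][0].attrib.pop('name', None)

    # Create a new XML file with the result
    mytree.write('output5.xml')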
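The example above removes an attribute. Whole elements can be removed with the parent's remove() method, as in this minimal sketch over a hypothetical menu document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two kinds of children
root = ET.fromstring(
    '<menu><food name="Idly"/><food name="Dosa"/><drink name="Tea"/></menu>'
)

# Remove an attribute; the None default avoids a KeyError if it is absent
root[0].attrib.pop("name", None)

# Remove whole elements; findall() returns a list, so it is safe to
# modify the parent while looping over the result
for drink in root.findall("drink"):
    root.remove(drink)

remaining = [child.tag for child in root]
print(remaining)
```

remove() only works on direct children of the element it is called on, so a nested element has to be removed through its own parent.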
v-Finding Elements in XML:
Example:
    # Print the tag name of the root's first child
    # (myroot comes from ET.parse(), as in the reading example)
    print(myroot[0].tag)
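Besides index-based access, ElementTree offers find(), findall() and iter() for locating elements by name, plus a limited subset of XPath. The sketch below uses a hypothetical menu document to show each of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for this illustration
root = ET.fromstring(
    '<menu><food name="Idly"><price>2.50</price></food>'
    '<food name="Dosa"><price>4.00</price></food></menu>'
)

# First matching direct child, or None when nothing matches
first_food = root.find("food")

# List of all matching direct children
all_foods = root.findall("food")

# iter() walks all descendants, at any depth, with the given tag
prices = [p.text for p in root.iter("price")]

# Limited XPath support: select by attribute value
dosa = root.find("./food[@name='Dosa']")

print(first_food.get("name"), len(all_foods), prices, dosa.get("name"))
```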
2-xml.dom.minidom module:
The xml.dom.minidom module is mainly used by people who are already proficient with the Document Object Model (DOM). DOM applications typically start by parsing XML into a DOM.
In xml.dom.minidom, parsing can be done in the ways given below:
i-By using parseString() Method:
Use this method when the XML to be parsed is available as a string.
Example:
    from xml.dom import minidom

    p3 = minidom.parseString('<myxml>Using<empty/> parseString</myxml>')
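To see what parseString() produces, the following sketch inspects the resulting Document: its root element and the child nodes of that root. It reuses the same input string; the variable names are chosen for illustration.

```python
from xml.dom import minidom

p3 = minidom.parseString('<myxml>Using<empty/> parseString</myxml>')

# documentElement is the root element of the parsed document
root = p3.documentElement
print(root.tagName)

# Text and elements are both nodes, so the root has three children here:
# the text 'Using', the <empty/> element, and the text ' parseString'
node_names = [node.nodeName for node in root.childNodes]
print(node_names)

# toxml() serializes a node back to an XML string
print(root.toxml())
```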
ii-By using parse() Method:
Use the parse() function when the XML is stored in a file. It accepts either a filename or an open file object as its parameter.
Example:
    # Using parse()
    from xml.dom import minidom

    # Parse by passing the filename
    p1 = minidom.parse("demo.xml")

    # Parse by passing an open file object
    with open('demo.xml') as dat:
        p2 = minidom.parse(dat)
iii-Finding Elements in XML:
Example:
    from xml.dom import minidom

    # Parse the file; printing the result shows the Document object
    dat = minidom.parse('sample.xml')
    print(dat)
iv-Accessing Elements using getElementsByTagName():
Example:
    # Get the first <item> element in the parsed document (dat)
    tagname = dat.getElementsByTagName('item')[0]
    print(tagname)
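Printing the element only shows the node object. To read its attributes and text, a sketch along the following lines can be used. The items document and its name attribute are hypothetical, and the XML is parsed from a string so the example runs without sample.xml.

```python
from xml.dom import minidom

# Hypothetical document standing in for sample.xml
dat = minidom.parseString(
    '<items><item name="first">One</item><item name="second">Two</item></items>'
)

# getElementsByTagName() returns every matching element in document order
items = dat.getElementsByTagName("item")
first = items[0]

# Attribute values are read with getAttribute()
print(first.getAttribute("name"))

# The element's text lives in a child text node
print(first.firstChild.data)

names = [node.getAttribute("name") for node in items]
print(names)
```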