As we know, today’s web technology advances quickly, for better and for worse. Almost any technology can have devastating results if it is not used properly. Many programmers have never been introduced to the vulnerabilities that can arise when working with and parsing XML files, which is why I wrote this article. I hope you like it.
2. What is XML?
XML stands for Extensible Markup Language and is mostly used for representing structured information. XML is widely employed in today’s web technology: web services (SOAP, REST, WSDL), RSS and Atom feeds, and configuration and document files (Microsoft Office and many other desktop applications). XML is standardized by the World Wide Web Consortium (W3C) and is a subset of SGML (ISO 8879). XML was created in 1996 by Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau, and John Cowan. The first XML specification was published as a W3C Recommendation on 10 Feb 1998. W3schools.com has a nice short description of what XML represents (http://www.w3schools.com/xml/xml_whatis.asp):
XML stands for Extensible Markup Language
XML is a markup language much like HTML
XML was designed to carry data, not to display data
XML tags are not predefined. You must define your own tags
XML is designed to be self-descriptive
XML is a W3C Recommendation
- Designing an XML structure
XML Header (Document Type Definition – DTD)
Designing an XML structure is pretty straightforward. An XML document normally begins with a header, the XML declaration, of the form <?xml version="1.0" encoding="UTF-8"?>.
Code 1: Sample of header declaration
For the current example, the header defines the XML version and the character encoding. The header can also be followed by a <!DOCTYPE> declaration. This introduces the DTD (Document Type Definition), a set of declarations that describe which elements, attributes and entities the XML file may use (for the tags used in a DTD, visit http://www.w3schools.com/dtd/).
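To make the header more concrete, here is a minimal Python sketch that reads the declaration and the DOCTYPE name with the standard library's expat bindings. The sample document and the helper name read_header are made up for this illustration; they are not taken from the original code sample.

```python
from xml.parsers import expat

# Hypothetical sample document: XML declaration, a small DTD, one element.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>
<note>Hello</note>
"""


def read_header(xml_bytes):
    """Return the XML declaration fields and the DOCTYPE name, if present."""
    info = {"version": None, "encoding": None, "doctype": None}

    def on_xml_decl(version, encoding, standalone):
        # Called once for <?xml version="..." encoding="..."?>
        info["version"] = version
        info["encoding"] = encoding

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called when the parser reaches <!DOCTYPE name ...>
        info["doctype"] = name

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)
    return info


header = read_header(SAMPLE)
print(header)
```

The same two handlers become useful later on, because most of the attacks described in this article start inside the DOCTYPE.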
XML Elements
Each XML file contains elements, whose names can be built from almost any characters you want except special characters. An element starts with an opening tag such as “<tag>” and ends with the matching closing tag “</tag>”.
Names can contain letters, numbers, and other characters
Names cannot start with a number or punctuation character
Names cannot start with the letters xml (or XML, or Xml, etc)
Names cannot contain spaces
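The naming rules above can be sketched as a small check. This is a simplified, ASCII-only approximation written for this article (the real XML specification also allows colons and many Unicode characters), and the function name is hypothetical.

```python
import re

# Simplified assumption: a letter or underscore first, then letters,
# digits, underscores, dots or hyphens. No spaces anywhere.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")


def is_plausible_element_name(name):
    """Apply the four rules listed above to a candidate element name."""
    if not _NAME_RE.match(name):
        return False
    # Names starting with "xml" in any letter case are excluded by the list.
    return not name.lower().startswith("xml")


for candidate in ["note", "first-name", "1note", "my note", "XmlData"]:
    print(candidate, is_plausible_element_name(candidate))
```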
XML Attributes
Instead of nesting a child element inside an element, you can make the child an attribute of its parent element. It sounds confusing when explained, but in practice it is very easy.
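As a small illustration, the sketch below stores the same piece of data once as a child element and once as an attribute, then reads both with Python's xml.etree.ElementTree. The element and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

# The same information, modelled in two ways (hypothetical names).
AS_CHILD = "<person><name>Ana</name></person>"
AS_ATTRIBUTE = '<person name="Ana"/>'

child_form = ET.fromstring(AS_CHILD)
attr_form = ET.fromstring(AS_ATTRIBUTE)

# A child element is reached with find(); an attribute with get().
name_from_child = child_form.find("name").text
name_from_attr = attr_form.get("name")
print(name_from_child, name_from_attr)
```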
XML Validation
There are also web sites where you can validate an XML file to see whether it is properly designed: http://www.xmlvalidation.com/, http://www.w3schools.com/xml/xml_validator.asp, http://www.validome.org/xml/ and many more.
Today many web and desktop applications use XML as part of their structure, and the RSS feed is one example. RSS stands for Rich Site Summary, or more colloquially Really Simple Syndication, and its main function is to present a summary of recently published blog posts, news and so on. Many news aggregators, including Google News, work with RSS feeds. Here is a sample PHP script that generates an RSS feed. It is only a sample to show how it works; I definitely wouldn’t recommend using it in a real-life project!
[php] "; $rss.="
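For comparison, here is a rough Python sketch of the same idea: building a tiny RSS 2.0 feed. It is not a translation of the PHP script; the function name, titles and URLs are placeholders. It builds the feed with xml.etree.ElementTree instead of string concatenation, so special characters in titles are escaped automatically.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, items):
    """Build a minimal RSS 2.0 document from (title, link) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    # encoding="unicode" returns a str instead of bytes
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "Example blog",
    "http://example.com/",
    "Recent posts",
    [("Tom & Jerry <review>", "http://example.com/posts/1")],
)
print(feed)
```

Gluing the feed together from raw strings would require escaping "&" and "<" by hand, which is easy to forget.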
Figure 1: XML file in browser
In Python, you can easily parse XML files. There are many modules that can be used for this purpose; this sample uses BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/).
[xml]
import urllib.request
from bs4 import BeautifulSoup

def parse_score(link):
    # Step 1: load the XML from the link
    with urllib.request.urlopen(link) as response:
        xml_content = response.read()
    # Step 2: parse the content (the "xml" parser requires lxml to be installed)
    soup = BeautifulSoup(xml_content, "xml")
    # Step 3: extract and print the contents of every <item> element
    for result in soup.find_all("item"):
        print(result.contents)
[/xml]
Code 6: Sample of XML parsing
The code is straightforward and involves three steps: loading the XML link (or file), parsing the content with BeautifulSoup, and extracting the XML content.
6. Common XML vulnerabilities (sample of vulnerable code: https://gist.github.com/hakre/2416846)
Every application has vulnerabilities, and XML parsers are no exception. This is a list of well-known XML vulnerabilities that might occur in your application:
Billion laughs
This vulnerability is a DoS (Denial of Service) attack aimed at XML parsers. It is also known as an XML bomb or an entity expansion XML bomb. A document carrying this payload can be well formed and may even pass XML schema validation. Consider the following tag:
Code 7: DTD tag
Now consider the following vulnerable code (taken from http://cytinus.wordpress.com/2011/07/26/37/):
Figure 2: Billion laughs vulnerable code
As you can see, we have ten “lol” entities. So what is happening here? The document body contains a single reference to “lol9”. When &lol9; is parsed, the entity lol9 is expanded into ten “lol8” references. Each lol8 expands into ten “lol7” references, and so on. After nine levels of tenfold expansion there are 10^9 = 1,000,000,000 “lol” strings, a billion of them. Expanding all of them can exhaust the parser’s memory and cause a DoS (Denial of Service). That’s why it is called the Billion Laughs vulnerability. For more information about the vulnerability, check the link http://cytinus.wordpress.com/2011/07/26/37/.
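The arithmetic behind the attack can be sketched in a few lines of Python. The entity layout below (a base entity "lol" plus "lol1" through "lol9") is an assumption based on the description above, not a copy of the figure. The demonstration expands only a deliberately tiny document (three levels, two references each), so it is safe to run.

```python
import xml.etree.ElementTree as ET


def expanded_lol_count(levels=9, fanout=10):
    """Number of "lol" strings produced once every level is expanded."""
    return fanout ** levels


def build_laughs_document(levels, fanout):
    """Build a nested-entity document in the style of the billion laughs attack."""
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE lolz [", ' <!ENTITY lol "lol">']
    previous = "lol"
    for level in range(1, levels + 1):
        name = "lol%d" % level
        refs = ("&%s;" % previous) * fanout
        lines.append(' <!ENTITY %s "%s">' % (name, refs))
        previous = name
    lines.append("]>")
    lines.append("<lolz>&%s;</lolz>" % previous)
    return "\n".join(lines)


# Full-size payload: we only measure it, we never parse it.
full_payload = build_laughs_document(levels=9, fanout=10)
print(len(full_payload), "characters expand to", expanded_lol_count(), "lols")

# Tiny variant: 2 ** 3 = 8 "lol" strings after expansion.
tiny = build_laughs_document(levels=3, fanout=2)
tiny_text = ET.fromstring(tiny).text
print(tiny_text)
```

The point of the sketch is the ratio: the document grows by a constant amount per level, while the expanded text grows by a factor of ten per level.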
Quadratic blowup
Another entity expansion XML bomb is the quadratic blowup vulnerability, discovered by Amit Klein of Trusteer. Here a single entity “a” is defined as a string of 50,000 characters and is then referenced 50,000 times as “&a;” inside the “kaboom” element. When parsed, the document grows from about 200 KB to about 2.5 GB, causing a DoS. Still, for a document of the same size, billion laughs produces a much larger result than quadratic blowup.
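The numbers quoted above can be checked with a short calculation. The helper names below are hypothetical, and the demonstration parses only a scaled-down document (100 characters referenced 100 times) rather than the real payload.

```python
import xml.etree.ElementTree as ET


def quadratic_blowup_sizes(entity_length=50_000, reference_count=50_000, entity_name="a"):
    """Return (approximate document size, expanded size) in characters."""
    reference = "&%s;" % entity_name  # "&a;" is 3 characters long
    document_size = entity_length + reference_count * len(reference)
    expanded_size = entity_length * reference_count
    return document_size, expanded_size


def build_blowup_document(entity_length, reference_count):
    """Build a small document with one long entity referenced many times."""
    return (
        "<!DOCTYPE kaboom [\n"
        ' <!ENTITY a "%s">\n'
        "]>\n"
        "<kaboom>%s</kaboom>" % ("a" * entity_length, "&a;" * reference_count)
    )


doc_size, expanded = quadratic_blowup_sizes()
print(doc_size, expanded)

# Scaled-down demonstration: 100 x 100 = 10,000 characters after expansion.
small = build_blowup_document(100, 100)
small_text = ET.fromstring(small).text
print(len(small), len(small_text))
```

Because there is no nesting, a parser limit on entity depth alone does not stop this attack; the total expanded size has to be limited instead.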
DTD retrieval
An entity declaration can also use a URL as its replacement text (for the definition of replacement, see the previous vulnerability). With a SYSTEM identifier, the parser downloads content from an external location and embeds it in your XML file.
[xml] ]>
[/xml]
Code 8. Remote entity expansion retrieval example
The same vulnerability can also be used to read a local file, by pointing the SYSTEM identifier at a file:// URL instead of a remote one.
An attacker can circumvent firewalls and gain access to restricted resources as all the requests are made from an internal and trustworthy IP address, not from the outside.
An attacker can abuse a service to attack, spy on or DoS your servers but also third party services. The attack is disguised with the IP address of the server and the attacker is able to utilize the high bandwidth of a big machine.
An attacker can exhaust additional resources on the machine, e.g. with requests to a service that doesn’t respond or responds with very large files.
An attacker may gain knowledge of when, how often and from which IP address an XML document is accessed.
An attacker could send mail from inside your network if the URL handler supports smtp:// URIs.
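One way to catch such documents before anything is fetched is to look at the entity declarations themselves. The following is a minimal sketch using the standard library's expat bindings; the exception class, the function name and the sample URLs are all made up for illustration. The parser never resolves the SYSTEM identifier here, it only reports the declaration.

```python
from xml.parsers import expat


class ExternalEntityForbidden(ValueError):
    """Raised when a document declares an entity with a SYSTEM identifier."""


def check_no_external_entities(xml_text):
    """Parse xml_text and raise if it declares any external entity."""

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # Internal entities carry a value; external ones carry a system id.
        if system_id is not None:
            raise ExternalEntityForbidden("%s -> %s" % (name, system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)


# Hypothetical inputs: a remote entity, a local file entity, a harmless document.
REMOTE = '<!DOCTYPE r [<!ENTITY ee SYSTEM "http://example.com/expand.xml">]><r>&ee;</r>'
LOCAL = '<!DOCTYPE r [<!ENTITY ee SYSTEM "file:///path/to/secret.txt">]><r>&ee;</r>'
BENIGN = "<r>nothing to see</r>"

for label, document in [("remote", REMOTE), ("local", LOCAL), ("benign", BENIGN)]:
    try:
        check_no_external_entities(document)
        print(label, "accepted")
    except ExternalEntityForbidden as error:
        print(label, "rejected:", error)
```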
- How to defend
Figure 3: Modules that lack protection from XML exploits (http://blog.python.org/2013/02/announcing-defusedxml-fixes-for-xml.html)
Tips:
http://msdn.microsoft.com/en-us/magazine/ee335713.aspx
http://www.cisco.com/en/US/docs/app_ntwk_services/data_center_app_services/ace_waf/v61/user/guide/waf_ug_xmldefense.pdf
Using parsers that use safe functions: http://docs.python.org/2/library/xml.html#xml-vulnerabilities
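If your application never needs a DTD, one blunt but effective defence is to refuse any document that contains a DOCTYPE at all, since every attack described above starts there. The sketch below shows the idea with the standard library only; the names DTDForbidden and parse_without_dtd are hypothetical. The defusedxml package announced in the blog post linked above is a maintained alternative built around the same kind of check.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when an untrusted document contains a DOCTYPE declaration."""


def parse_without_dtd(xml_text):
    """Reject documents with a DOCTYPE, then parse the rest with ElementTree."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Stop as soon as the DOCTYPE starts, before any entity is declared.
        raise DTDForbidden("DOCTYPE %r is not allowed" % name)

    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(xml_text, True)
    # Only documents that passed the check reach the real parser.
    return ET.fromstring(xml_text)


root = parse_without_dtd("<feed><item>first</item><item>second</item></feed>")
print([item.text for item in root.findall("item")])

try:
    parse_without_dtd('<!DOCTYPE feed [<!ENTITY a "aaaa">]><feed>&a;</feed>')
except DTDForbidden as error:
    print("rejected:", error)
```

The document is read twice in this sketch, which keeps the example short; a production version would hook the check into a single parsing pass.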
- Conclusion
I think this topic is interesting because it is something that many programmers are not aware of. We should care more about the security of web applications: XML is increasingly a part of them, which increases the risk of being exploited. We saw that the results of exploiting these vulnerabilities can be devastating, and that is why we should be more careful to use safe modules and functions.
- References
http://docs.python.org/2/library/xml.html#xml-vulnerabilities
http://msdn.microsoft.com/en-us/magazine/ee335713.aspx
http://en.wikipedia.org/wiki/Billion_laughs
http://www.xponentsoftware.com/Articles/XML_vulnerabilities.aspx
http://clawslab.nds.rub.de/wiki/index.php/XML_Generic_Entity_Expansion
http://cytinus.wordpress.com/2011/07/26/37/
http://stackoverflow.com/questions/10212752/how-can-i-use-phps-various-xml-libraries-to-get-dom-like-functionality-and-avoi
http://www.w3schools.com/dtd/
http://www.w3schools.com/xml/default.asp
http://blog.python.org/2013/02/announcing-defusedxml-fixes-for-xml.html