XPath injection

Need

Implementation of input validation and sanitization for XPath statements

Context

  • Usage of C# for building robust and scalable applications
  • Usage of Microsoft.AspNetCore.Mvc for building web applications using the MVC pattern
  • Usage of System.Xml for XML data processing and manipulation

Description

Non compliant code

using System.Xml;
using Microsoft.AspNetCore.Mvc;

public class BookController : Controller
{
    private readonly XmlDocument _doc;

    public BookController()
    {
        _doc = new XmlDocument();
        _doc.LoadXml("<books><book><title>Book Title</title><author>Author Name</author></book></books>");
    }

    [HttpGet]
    public IActionResult Get(string title)
    {
        // VULNERABLE: the request parameter is interpolated straight into the XPath expression.
        string xPath = $"//book[title='{title}']";
        XmlNode bookNode = _doc.SelectSingleNode(xPath);

        if (bookNode != null)
        {
            return Ok(bookNode.OuterXml);
        }

        return NotFound();
    }
}

The code above is a simple ASP.NET Core controller that runs an XPath query against an XML document to find a book by its title. The Get method takes a title parameter from the HTTP GET request and uses it to build the query.

The vulnerability lies in the line where the XPath query is built:

string xPath = $"//book[title='{title}']";

This line inserts the title parameter into the XPath query without any validation or sanitization. An attacker can supply a specially crafted title that changes the structure of the query, so that it returns data the endpoint was never meant to expose. This is known as an XPath injection attack.

For example, an attacker could provide a title parameter like ' or '1'='1, which would alter the XPath query to //book[title='' or '1'='1']. The predicate is now true for every book, so the query matches all books regardless of their title, potentially leaking sensitive data.

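In addition, if the XML document being queried contains sensitive data, an attacker could craft input that reaches it. For example, if the XML document contained user passwords, an attacker could provide a title parameter like '] | //user/password/text() | //book[title=' to turn the query into a union that also selects the password text nodes. Because Get calls SelectSingleNode, each request returns only one node, but a positional predicate such as (//user/password)[2] lets the attacker step through the remaining values one request at a time.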
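The effect of such input can be reproduced outside the controller. The following is a minimal sketch, not part of the original example: the class name XPathInjectionDemo and the two-book sample document are hypothetical, and the query is built the same way as in the vulnerable Get method.

```csharp
using System;
using System.Xml;

public static class XPathInjectionDemo
{
    // Hypothetical sample data: two books, used only for this illustration.
    private const string SampleXml =
        "<books>" +
        "<book><title>Book Title</title><author>Author Name</author></book>" +
        "<book><title>Other Title</title><author>Other Author</author></book>" +
        "</books>";

    // Builds the query exactly as the vulnerable controller does.
    public static string BuildQuery(string title)
    {
        return $"//book[title='{title}']";
    }

    // Returns how many nodes the interpolated query selects in the sample document.
    public static int CountMatches(string title)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        XmlNodeList nodes = doc.SelectNodes(BuildQuery(title));
        return nodes == null ? 0 : nodes.Count;
    }
}
```

With an honest title such as Book Title, CountMatches selects one node. With the input ' or '1'='1 the predicate becomes always true and both books are selected, even though neither has that title.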

Steps

  • Validate and sanitize user input before using it in the XPath statement.
  • Use parameterized queries or prepared statements to prevent XPath injection.
  • Avoid constructing dynamic XPath statements using string concatenation.
  • Consider using a whitelist approach to validate user input for the XPath statement.
  • Implement input validation and filtering to ensure that only expected values are used in the XPath statement.
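One way to follow the advice about avoiding dynamic XPath statements is to keep the XPath constant and do the comparison in C#. The sketch below assumes the same books/book/title layout as the example document; the class SafeBookLookup and its FindByTitle method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class SafeBookLookup
{
    // Returns the first <book> whose first <title> child has exactly the given text, or null.
    public static XmlNode FindByTitle(XmlDocument doc, string title)
    {
        // The XPath is a constant, so user input never reaches the XPath parser.
        XmlNodeList books = doc.SelectNodes("//book");
        if (books == null)
        {
            return null;
        }

        // The user-supplied title is only ever compared as plain string data.
        return books.Cast<XmlNode>()
            .FirstOrDefault(book =>
                string.Equals(book["title"]?.InnerText, title, StringComparison.Ordinal));
    }
}
```

Because the title is never parsed as XPath, quotes and brackets in it have no special meaning, and titles containing spaces or apostrophes can be looked up without widening any whitelist. The trade-off is that every book node is visited in application code.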

Compliant code

using System.Linq;
using System.Xml;
using Microsoft.AspNetCore.Mvc;

public class BookController : Controller
{
    private readonly XmlDocument _doc;

    public BookController()
    {
        _doc = new XmlDocument();
        _doc.LoadXml("<books><book><title>Book Title</title><author>Author Name</author></book></books>");
    }

    [HttpGet]
    public IActionResult Get(string title)
    {
        // Whitelist validation: reject empty input and any character that is not a letter or digit.
        // Quotes, brackets, and operators therefore never reach the query.
        // Note: spaces are rejected too, so the sample title "Book Title" does not pass as written.
        if (string.IsNullOrWhiteSpace(title) || title.Any(c => !char.IsLetterOrDigit(c)))
        {
            return BadRequest("Invalid title");
        }

        // The validated value cannot close the string literal, so interpolating it is safe here.
        // This is not parameter binding: the protection comes entirely from the check above.
        string xPath = $"//book[title='{title}']";
        XmlNode bookNode = _doc.SelectSingleNode(xPath);

        if (bookNode != null)
        {
            return Ok(bookNode.OuterXml);
        }

        return NotFound();
    }
}

The original code was vulnerable to XPath injection because it built the XPath statement by interpolating unvalidated user input into a string. This allowed an attacker to change the structure of the statement and extract sensitive data.

The fixed code mitigates this vulnerability by implementing several security measures:

  1. Input Validation and Sanitization: The code rejects a title that is null, empty, or whitespace, or that contains any character other than a letter or digit, and returns a "Bad Request" response. Because quotes, brackets, and operators never reach the query, the input cannot close the string literal or add new XPath syntax. This check also rejects spaces, so the sample title Book Title would fail it; widen the whitelist deliberately if such titles must be searchable.

  2. Query Construction: The validated title is still inserted into the XPath string. System.Xml offers no one-call parameter binding for SelectSingleNode, and compiling an expression that already contains the input does not add any; real binding requires supplying XPath variables through a custom XsltContext. The query is safe here only because validation runs first.

  3. Whitelist Approach: Accepting only known-good characters, instead of trying to strip known-bad ones, means that an unanticipated metacharacter is rejected by default. This further reduces the chance of an XPath injection attack.

By implementing these measures, the code is now more secure and resistant to XPath injection attacks.
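Where validation alone is too restrictive, the user input can be bound as an XPath variable so that it is never parsed as part of the expression. The sketch below is one possible approach under stated assumptions, not the original fix: VariableContext, StringVariable, and ParameterizedBookLookup are hypothetical helper names, built on the XsltContext and IXsltContextVariable types from System.Xml.Xsl.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical helper: exposes a dictionary of string values as XPath variables.
public sealed class VariableContext : XsltContext
{
    private readonly Dictionary<string, string> _values = new Dictionary<string, string>();

    public void Set(string name, string value)
    {
        _values[name] = value;
    }

    // Called by the XPath engine for each $name reference in the expression.
    public override IXsltContextVariable ResolveVariable(string prefix, string name)
    {
        if (!_values.TryGetValue(name, out var value))
        {
            throw new InvalidOperationException("Unknown XPath variable: " + name);
        }
        return new StringVariable(value);
    }

    // No custom XPath functions are offered in this sketch.
    public override IXsltContextFunction ResolveFunction(string prefix, string name, XPathResultType[] argTypes)
    {
        return null;
    }

    public override bool Whitespace => true;
    public override bool PreserveWhitespace(XPathNavigator node) => true;
    public override int CompareDocument(string baseUri, string nextBaseUri) => 0;

    // Wraps a single string so the engine treats it as data, never as XPath syntax.
    private sealed class StringVariable : IXsltContextVariable
    {
        private readonly string _value;
        public StringVariable(string value) { _value = value; }
        public bool IsLocal => false;
        public bool IsParam => true;
        public XPathResultType VariableType => XPathResultType.String;
        public object Evaluate(XsltContext xsltContext) => _value;
    }
}

public static class ParameterizedBookLookup
{
    public static XmlNode FindByTitle(XmlDocument doc, string title)
    {
        var context = new VariableContext();
        context.Set("title", title);

        // The expression text is constant; $title is resolved through the context at evaluation time.
        return doc.SelectSingleNode("//book[title=$title]", context);
    }
}
```

Since the expression text never changes, a payload such as ' or '1'='1 is compared as an ordinary string and matches nothing, while legitimate titles containing spaces or apostrophes still work. Input validation remains useful as a second layer.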
