
XPath injection


Implementation of input validation and sanitization to prevent XPath injection attacks.


  • Usage of Ruby for building dynamic and object-oriented applications
  • Usage of nokogiri for parsing and manipulating XML and HTML documents


Non compliant code

require 'nokogiri'

def search(query)
  doc = Nokogiri::XML(File.read("books.xml"))
  # Vulnerable: user input is interpolated directly into the XPath expression
  doc.xpath("//book[title[contains(., '#{query}')]]")
end

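In the above code snippet, we are using the xpath method provided by the Nokogiri library in Ruby to search an XML document for books whose title contains the query. The query parameter is interpolated directly into a single-quoted string literal inside the XPath expression, so any quote or parenthesis in the input becomes part of the XPath syntax. This makes the code vulnerable to XPath injection.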

An attacker can manipulate the query parameter to change the structure of the XPath query and access data they are not supposed to see. For example, the input ') or ('1'='1 closes the string literal and the contains() call, then appends an always-true condition, producing the expression //book[title[contains(., '') or ('1'='1')]]. This returns every book, regardless of its title.
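The string the vulnerable method builds can be reproduced without parsing any XML. The sketch below is illustrative only: vulnerable_expression is a hypothetical helper that copies the interpolation from search, and the payload ') or ('1'='1 is one example of input that keeps the resulting expression syntactically valid.

```ruby
# Hypothetical helper that mirrors the string interpolation in the vulnerable search method
def vulnerable_expression(query)
  "//book[title[contains(., '#{query}')]]"
end

# Example payload: closes the literal and contains(), then adds an always-true condition
payload = "') or ('1'='1"
expression = vulnerable_expression(payload)
puts expression
# prints //book[title[contains(., '') or ('1'='1')]]
```

Nothing in the payload is exotic: a quote, a parenthesis and the or operator are enough to turn data into XPath syntax.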

This vulnerability can lead to unauthorized access to data and, in some cases, even denial of service. It is therefore crucial to validate and sanitize the query parameter before it reaches the XPath expression.


  • Sanitize and validate the input query to prevent any malicious XPath injection.
  • Use parameterized XPath queries (XPath variables whose values are bound through the library's API) instead of directly interpolating the query into the XPath expression.
  • Implement proper error handling and logging to detect any potential XPath injection attempts.
  • Prefer XPath APIs that provide built-in protection against injection, such as variable binding, over building expressions from strings.
  • Regularly update and patch the XML parsing library to ensure the latest security fixes are applied.
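As a rough illustration of the first and third points, the sketch below validates the query against an allow-list and logs anything it rejects. The method name validated_query, the ALLOWED_QUERY pattern and the 100-character limit are assumptions chosen for the example, not requirements from this article; adjust them to the titles your application actually stores.

```ruby
require 'logger'
require 'stringio'

# Assumed policy: 1 to 100 characters, letters (any script), digits and spaces only
ALLOWED_QUERY = /\A[\p{Alnum} ]{1,100}\z/

# Returns the query if it passes the allow-list; otherwise logs the attempt and returns nil
def validated_query(query, logger)
  text = query.to_s
  return text if ALLOWED_QUERY.match?(text)

  # inspect escapes newlines and quotes, so the input cannot forge extra log lines
  logger.warn("Rejected XPath search input: #{text.inspect}")
  nil
end

# Usage: log to an in-memory buffer for demonstration
log_output = StringIO.new
logger = Logger.new(log_output)

accepted = validated_query("Ruby Basics", logger)    # returns "Ruby Basics"
rejected = validated_query("') or ('1'='1", logger)  # returns nil and writes a WARN entry
```

Rejecting input, rather than silently stripping characters, makes probing attempts visible in the logs, where they can feed alerting.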

Compliant code

require 'nokogiri'

def search(query)
  doc = Nokogiri::XML(File.read("books.xml"))
  sanitized_query = sanitize(query)
  # $query is an XPath variable; its value is bound separately and never parsed as XPath
  doc.xpath('//book[title[contains(., $query)]]', nil, { query: sanitized_query })
end

def sanitize(input)
  # Use a simple regex to remove any character that is not a letter, digit or space
  input.to_s.gsub(/[^0-9a-z ]/i, '')
end

The original code was vulnerable to XPath injection because it directly interpolated the user's query into the XPath expression. This means that a malicious user could rewrite the query to read data they should not be able to access.

The fixed code mitigates this vulnerability in two ways:

  1. Input sanitization: The sanitize method removes every character that is not a letter, digit, or space from the user's query. Because quotes, parentheses, brackets, and operators such as = and | are stripped, the input can no longer close the string literal or add new conditions. The trade-off is that legitimate characters, such as apostrophes, hyphens, and accented letters, are removed as well.

  2. Parameterized XPath queries: Instead of directly interpolating the user's query into the XPath statement, the fixed code passes the value to the XPath engine as a bound variable. The engine compares the bound value as a plain string and never parses it as XPath syntax, so even input that slipped past sanitization cannot change the structure of the query.
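To make the behaviour of the sanitize method concrete, the sketch below repeats its regular expression and applies it to an injection payload and to a legitimate title containing an apostrophe. The sample inputs are illustrative assumptions.

```ruby
# Same allow-list filter as the sanitize method in the compliant code
def sanitize(input)
  input.to_s.gsub(/[^0-9a-z ]/i, '')
end

neutralized = sanitize("') or ('1'='1")
# → " or 11"   (quotes, parentheses and '=' are gone)

legit = sanitize("O'Reilly")
# → "OReilly"  (a legitimate apostrophe is lost too)
```

The payload loses the characters it needs to break out of the string literal, but the legitimate title is altered too, which is why sanitization works best as a complement to variable binding rather than a replacement for it.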

In addition to these changes, it's also recommended to implement proper error handling and logging, and to prefer XPath APIs that provide built-in protection against injection, such as variable binding. Regularly updating and patching the XML parsing library will also help to ensure that the latest security fixes are applied.