
Apache Lucene query injection


Prevention of Apache Lucene query injection


  • Usage of Ruby for building dynamic and object-oriented applications
  • Usage of Lucene for full-text search and indexing


Non-compliant code

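def search(query)
  index = LuceneIndex.new('index_directory') # placeholder; original class name lost
  index.search(query) # vulnerable: raw user input is parsed as query syntax
end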

In the code snippet above, the search method is vulnerable to Apache Lucene query injection because it takes the query parameter and passes it directly to the index search without any sanitization or validation.

An attacker can therefore manipulate the query parameter to change the search query that the Lucene index executes. This could expose documents the user is not meant to see, leak information about the contents of the index, or cause other unexpected and undesirable behavior.

For example, an attacker could pass a query string containing Lucene query syntax that expands the search to all documents (*:*) or targets fields the application never intended to be searchable. They could also craft a query designed to consume excessive resources, for example with broad wildcard, fuzzy, or regular-expression terms, and cause a denial-of-service condition.
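To make the first attack concrete, the sketch below reproduces the vulnerable pattern in isolation: user input interpolated straight into query syntax. The `build_query` helper and the `title` field are hypothetical, chosen only to show how a crafted input rewrites the structure of the query.

```ruby
# Hypothetical helper that mirrors the vulnerable pattern: raw user input is
# interpolated into Lucene query syntax meant to search only the title field.
def build_query(user_input)
  "title:(#{user_input})"
end

puts build_query("ruby")         # title:(ruby)
puts build_query("x) OR (*:*")   # title:(x) OR (*:*)
```

The `)` in the second input closes the intended group early and `OR (*:*)` appends a match-all clause, so a parser receiving this string would return every document in the index.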

The vulnerability arises because the application builds Lucene queries dynamically from untrusted input and parses that input as query syntax. It does not use the Lucene equivalent of a parameterized query, which is to construct query objects (such as term queries) programmatically so that user input is only ever treated as data.
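Lucene's Java API lets an application construct `TermQuery` and `BooleanQuery` objects directly, so user input becomes a term value and is never parsed as syntax. The Ruby sketch below mimics that idea with plain structs and an in-memory matcher; `TermClause`, `build_clauses`, and `matches?` are hypothetical stand-ins, not part of any real Ruby Lucene binding.

```ruby
# Hypothetical stand-in for a Lucene TermQuery: one field plus one literal term.
TermClause = Struct.new(:field, :text)

# Turn each whitespace-separated token into a literal clause. Nothing is parsed,
# so inputs such as "*:*" or "OR" are just text to look up.
def build_clauses(field, user_input)
  user_input.to_s.downcase.split.map { |token| TermClause.new(field, token) }
end

# Toy matcher standing in for the index: a document matches only when its field
# contains every term. An empty query matches nothing.
def matches?(doc, clauses)
  return false if clauses.empty?

  clauses.all? { |c| doc.fetch(c.field, "").downcase.split.include?(c.text) }
end

docs = [
  { "title" => "Ruby security basics" },
  { "title" => "Private roadmap" }
]

p docs.select { |d| matches?(d, build_clauses("title", "ruby")) }
p docs.select { |d| matches?(d, build_clauses("title", "*:*")) }   # => []
```

Because the input is only ever compared as a literal term, `*:*` matches nothing instead of everything.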


  • Validate and sanitize user input before generating the Lucene query
  • Build the Lucene query programmatically (for example, from term queries) instead of parsing user input as query syntax
  • Reject input that does not match the expected format instead of passing it on
  • Use a whitelist that allows only specific characters or patterns in the query
  • Implement proper error handling and logging to detect and respond to potential attacks
  • Prefer a query builder API that provides built-in protection against query injection
  • Regularly update and patch the Lucene library to address known security vulnerabilities
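The validation, whitelist, and logging recommendations can be combined into a single check that rejects suspicious input rather than silently rewriting it. In this sketch the 100-character limit, the allowed pattern, `validate_query!`, and the `InvalidQueryError` class are all hypothetical choices for illustration; tune them to what the search feature actually needs.

```ruby
require 'logger'

# Hypothetical error raised when a query fails validation.
class InvalidQueryError < StandardError; end

MAX_QUERY_LENGTH = 100                  # illustrative limit
ALLOWED_QUERY    = /\A[a-zA-Z0-9 ]+\z/  # whitelist: letters, digits and spaces only

LOGGER = Logger.new($stderr)

# Returns the query unchanged if it passes the whitelist; otherwise logs and raises.
def validate_query!(query)
  text = query.to_s
  if text.length > MAX_QUERY_LENGTH || !ALLOWED_QUERY.match?(text)
    # Log a truncated, inspected copy so oversized input cannot flood the log
    LOGGER.warn("Rejected search query: #{text[0, 50].inspect}")
    raise InvalidQueryError, "query contains disallowed characters or is too long"
  end
  text
end

begin
  validate_query!("*:* OR secret")
rescue InvalidQueryError => e
  puts "Rejected: #{e.message}"
end
```

Rejecting instead of stripping makes attacks visible in the logs, at the cost of refusing some legitimate searches.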

Compliant code

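# Characters with special meaning in Lucene query syntax
LUCENE_SPECIAL_CHARS = /[+\-&|!(){}\[\]^"~*?:\\\/]/

def search(query)
  index = LuceneIndex.new('index_directory') # placeholder; original class name lost
  sanitized_query = sanitize_query(query)
  index.search(sanitized_query)
end

def sanitize_query(query)
  # Remove any special characters used in Lucene query syntax
  # (gsub rather than gsub! so the caller's string is not mutated)
  cleaned = query.to_s.gsub(LUCENE_SPECIAL_CHARS, '')

  # Implement a whitelist of allowed characters
  cleaned = cleaned.gsub(/[^a-zA-Z0-9\s]/, '')

  # Escape any remaining special characters with a backslash, as Lucene expects
  # (CGI.escape is URL encoding and would turn spaces into '+', a Lucene operator)
  cleaned.gsub(LUCENE_SPECIAL_CHARS) { |char| "\\#{char}" }
end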

The search method is used to perform a search on a Lucene index. The query for this search is provided by the user and is passed to the search method as a parameter.

In the original code, the user-provided query was used directly in the search without any validation or sanitization. This could allow an attacker to perform a query injection attack by providing a specially crafted query.

The updated code includes a new sanitize_query method that sanitizes the user-provided query before it is used in the search. This method removes the special characters used in Lucene query syntax, applies a whitelist of allowed characters, and escapes any special characters that remain. Together these steps make query injection attacks considerably harder.
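Stripping characters changes what the user searched for: a search for `c++` becomes `c`. Where that matters, an alternative is to keep the characters and escape each one with a backslash, which is what `QueryParser.escape` does in Lucene's Java API. The Ruby `escape_lucene` method below is a hypothetical port of that idea, not part of any Ruby library.

```ruby
# Characters the classic Lucene query parser treats as syntax
# (the same set that Lucene's Java QueryParser.escape handles).
LUCENE_SPECIAL = /[\\+\-!():^\[\]"{}~*?|&\/]/

# Hypothetical helper: prefix every special character with a backslash so the
# parser reads it as a literal part of a term instead of as an operator.
def escape_lucene(text)
  text.to_s.gsub(LUCENE_SPECIAL) { |char| "\\#{char}" }
end

puts escape_lucene("c++")          # c\+\+
puts escape_lucene("x) OR (*:*")   # x\) OR \(\*\:\*
```

Escaping neutralizes the punctuation, but the word `OR` in the second example survives, so escaping is best combined with the keyword handling or programmatic query building described elsewhere on this page.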

The sanitize_query method is called within the search method before the query is used. This ensures that the query is always sanitized, regardless of where the search method is called from.
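One gap remains even when every query passes through sanitize_query: the whitelist keeps letters, so the uppercase words `AND`, `OR`, and `NOT` survive, and the classic Lucene query parser reads them as boolean operators. A simple countermeasure, sketched below on the assumption that the index is built with a lowercasing analyzer (as Lucene's StandardAnalyzer is), is to lowercase the whitelisted text so these words become ordinary terms. `neutralize_operators` is a hypothetical helper name.

```ruby
# Hypothetical pipeline: whitelist first, then neutralize boolean keywords.
# Assumes the index was built with an analyzer that lowercases terms.
def neutralize_operators(query)
  allowed = query.to_s.gsub(/[^a-zA-Z0-9\s]/, '') # same whitelist as sanitize_query
  # The classic parser only treats uppercase AND/OR/NOT as operators,
  # so lowercasing turns them into plain search terms.
  allowed.downcase.squeeze(" ").strip
end

puts neutralize_operators("secret OR *:*")   # secret or
```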

In addition to these changes, it is also recommended to implement proper error handling and logging, build queries through a query builder API rather than by concatenating strings, and regularly update and patch the Lucene library to address known security vulnerabilities.