Paint it Black: The Role of Taint Detection in Java Application Security

RASP is changing the negative perception of data tainting as a tool for application security, writes John Matthew Holt

‘Data tainting’, sometimes misleadingly called ‘taint checking’, has been used in several computer programming languages to provide crude application protection and vulnerability identification features. These implementations have sought to use data tainting as a mechanism to prevent malicious users from executing commands on a host computer.

However, its real-world effectiveness for protecting web applications from business-logic vulnerabilities like SQL injection and cross-site scripting (XSS) has been limited by a simplistic security assumption that untainted data is good and tainted data is bad. In practice, this type of security decision-making only works for a small category of basic security vulnerabilities and circumstances.

In Perl, one popular programming language with data tainting support, the runtime has the ability to flag as tainted a piece of data that arrives from outside the program. Similarly, the Ruby runtime also provides a mechanism where an object can have a taint flag that, when set, indicates the data came from an unreliable source and cannot be used for certain operations. Several other languages also support data tainting in one form or another.

Generally, all of these implementations work by setting a ‘taint-bit’ for an object when that object contains data from an untrusted source. Later, if an attempt is made to pass a tainted parameter to a sensitive or important API (a ‘sink’ in programmer parlance), the language’s runtime environment will produce an error message or abort the operation. In this way, data tainting monitors the use of externally supplied, unvalidated data by an application’s business logic.
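The taint-bit mechanism described above can be sketched in a few lines of Java. This is only an illustration, since Java has no native taint support: `TaintedString` and `CommandSink` are hypothetical names invented for this example, and the ‘sink’ merely returns a string rather than executing anything.

```java
import java.util.Objects;

/** Hypothetical wrapper that carries a taint bit alongside a string value. */
final class TaintedString {
    private final String value;
    private final boolean tainted;

    private TaintedString(String value, boolean tainted) {
        this.value = Objects.requireNonNull(value);
        this.tainted = tainted;
    }

    /** Data arriving from outside the program (e.g. a request parameter) is marked tainted. */
    static TaintedString fromUntrustedSource(String value) {
        return new TaintedString(value, true);
    }

    /** Data created by the program itself is untainted. */
    static TaintedString literal(String value) {
        return new TaintedString(value, false);
    }

    /** Taint propagates: the result is tainted if either operand is tainted. */
    TaintedString concat(TaintedString other) {
        return new TaintedString(value + other.value, tainted || other.tainted);
    }

    boolean isTainted() {
        return tainted;
    }

    String value() {
        return value;
    }
}

/** Hypothetical sink: a sensitive API that refuses tainted arguments. */
final class CommandSink {
    static String run(TaintedString command) {
        if (command.isTainted()) {
            // The traditional rule: tainted data reaching a sink aborts the operation.
            throw new SecurityException("tainted data passed to sink");
        }
        return "executed: " + command.value(); // stand-in; no real command is run
    }
}
```

Under these assumptions, `CommandSink.run(TaintedString.literal("ls ").concat(TaintedString.literal("/tmp")))` succeeds, while concatenating any value created with `fromUntrustedSource` makes the whole command tainted and causes the sink to throw a `SecurityException`.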

Monitoring non-validated data, and blocking it when necessary, is a non-trivial matter. Without input validation the application is free to make unrestrained use of tainted data, and becomes vulnerable to several types of attacks. These include cross-site scripting, where maliciously written URLs can insert executable text into a dynamically generated webpage; hidden field tampering, where the hidden fields of a web page are manipulated; or cookie poisoning, where false data is inserted into user cookies to trick a website.

Beyond these, command injection attacks pose the most serious threat, especially SQL injection, where an attacker adds malicious code to a query input field to change the meaning of the query. If the attack is successful, the attacker can trick an SQL database into executing commands and gain partial or complete access to the database.

The result can be catastrophic. By simply tricking the SQL server into interpreting carefully crafted text strings as commands, an attacker may not only read an entire database table but modify it at will, and possibly even gain control of the operating system itself.
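A minimal sketch shows how crafted text changes the meaning of a query when input is spliced directly into SQL. The class name, table and column below are hypothetical, and no database is involved; the example only builds the query string.

```java
/** Hypothetical query builder illustrating the vulnerable pattern. */
final class QueryBuilderDemo {
    /** Vulnerable: user input is concatenated directly into the SQL text. */
    static String buildQueryUnsafe(String userName) {
        return "SELECT * FROM accounts WHERE owner = '" + userName + "'";
    }
}
```

With the input `alice`, the method returns `SELECT * FROM accounts WHERE owner = 'alice'`, as intended. With the input `x' OR '1'='1`, it returns `SELECT * FROM accounts WHERE owner = 'x' OR '1'='1'`: the attacker's quote closes the string literal, and the rest of the input is interpreted as SQL syntax, producing a condition that is true for every row.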

For simple security vulnerabilities and exploit scenarios, the traditional notion that ‘tainted data is unsafe’ may be sufficient.  However, as application business logic complexity has grown – driven in large part by the growth of web applications – this naïve notion has revealed itself as too crude for many real-world applications.  As a result, a perception has grown in some communities that data tainting has no value.  

Java, the most popular programming language in financial services and many other industries, and the most commonly attacked, has never possessed native support for data tainting functionality. While multiple attempts have been made to provide data tainting on the Java platform, most have introduced unacceptable performance overheads, effectively ruling out their practicality for most applications.  

However, the recent emergence of Runtime Application Self-Protection (RASP) technologies has renewed interest and development in using data tainting on Java.

The reason is simple. RASP technologies are built on the principle of analyzing the runtime behavior of an application as it executes and correlating application events in real time to identify attacks and vulnerability exploits. This ‘real-time’ correlation gathers as much information as possible about application behavior (what Gartner calls ‘application context’), including:

  1. what values have been input to the application, and from where,
  2. where those input values are flowing through the application, and
  3. what those input values are being used for by the application.

When combined with RASP, data tainting fills a significant security void, namely the ability to identify ‘data origin’. Only data tainting provides the intelligence to distinguish, at runtime, between data which originated from outside and data which originated from inside the application. For RASP, this intelligence unlocks a higher level of runtime event correlation which provides new, more accurate and reliable application security capabilities.
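To illustrate what ‘data origin’ can add, the following sketch records, for each character of a query, whether it came from inside or outside the application, and then asks how the external characters are being used. It is not how any particular RASP product works: `OriginTrackedText` and `SqlOriginCheck` are hypothetical names, and the check is deliberately conservative, accepting external data only inside single-quoted string literals.

```java
import java.util.BitSet;

/** Hypothetical text buffer that remembers which characters came from outside. */
final class OriginTrackedText {
    private final StringBuilder text = new StringBuilder();
    private final BitSet external = new BitSet();

    OriginTrackedText appendInternal(String s) {
        text.append(s);
        return this;
    }

    OriginTrackedText appendExternal(String s) {
        int start = text.length();
        text.append(s);
        external.set(start, start + s.length()); // mark these characters as external
        return this;
    }

    boolean isExternal(int index) {
        return external.get(index);
    }

    String text() {
        return text.toString();
    }
}

/** Hypothetical check: is external data used as SQL syntax rather than as data? */
final class SqlOriginCheck {
    static boolean externalDataAltersSyntax(OriginTrackedText query) {
        String sql = query.text();
        boolean inLiteral = false;
        for (int i = 0; i < sql.length(); i++) {
            boolean ext = query.isExternal(i);
            if (sql.charAt(i) == '\'') {
                boolean doubled = inLiteral && i + 1 < sql.length()
                        && sql.charAt(i + 1) == '\''
                        && query.isExternal(i + 1) == ext;
                if (doubled) {
                    i++; // '' inside a literal is an escaped quote, so it stays data
                    continue;
                }
                if (ext) {
                    return true; // external data opened or closed a string literal
                }
                inLiteral = !inLiteral;
            } else if (!inLiteral && ext) {
                return true; // external characters appear outside any literal
            }
        }
        return false;
    }
}
```

In this sketch, a query built from the internal text `SELECT * FROM accounts WHERE owner = '`, the external value `alice` and an internal closing quote is accepted even though it contains tainted data, because that data stays inside a literal. The external value `x' OR '1'='1` is flagged because its quote closes the literal. The decision rests on the use the data is put to, not on the taint flag alone.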

RASP is changing the negative perception of data tainting as a tool for application security. As new low-overhead data tainting systems become integrated with RASP, the two technologies are taking application security to the next level.


About the Author

John Matthew Holt is founder and CTO of Waratek, and a recognized expert in Java, virtualization and application security. He has been awarded more than 50 patents for inventions that span virtual machines, dynamic recompilation, distributed computing and virtualization technologies. 
