Analyzing Hybrid Analysis Mapping (HAM) – Part 1

This post will start a new series on ThreadFix’s Hybrid Analysis Mapping (HAM) library. Today I’ll cover the background on the SBIR contract, why ThreadFix was a good candidate for the program, and why HAM tastes so good in sandwiches.


Denim Group won a DHS-funded Small Business Innovation Research (SBIR) contract to develop a technology we call Hybrid Analysis Mapping. Understanding the point of the contract takes some explanation, so I’ll break it down. Since we’re talking about ThreadFix, everything here is in the application context (and for HAM, the web application context) as opposed to the network context.

A Quick Word on Scanners

Static scanners find vulnerabilities by looking at source or binary code. Therefore, they have the following pieces of information:

  1. File name and line number, or code location (for each part of the code that executes when the vulnerability is exploited)
  2. The type of vulnerability (XSS, SQLi, etc.)

Dynamic scanners find vulnerabilities by sending requests to live applications and analyzing responses. These scanners have a different set of information:

  1. URL
  2. Parameter (sometimes)
  3. The type of vulnerability
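To make the mismatch concrete, here is a minimal sketch of the two kinds of records. The class and field names are illustrative assumptions, not ThreadFix’s actual data model, and the sample values are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StaticFinding:
    """What a static scanner typically reports (hypothetical shape)."""
    file_path: str    # source file containing the flagged code
    line_number: int  # line within that file
    vuln_type: str    # the scanner's own name for the issue


@dataclass(frozen=True)
class DynamicFinding:
    """What a dynamic scanner typically reports (hypothetical shape)."""
    url: str                  # path that was requested
    parameter: Optional[str]  # HTTP parameter, when the scanner identifies one
    vuln_type: str            # the scanner's own name for the issue


# Made-up sample values for illustration only.
static = StaticFinding("UserController.java", 57, "SQL Injection")
dynamic = DynamicFinding("/manage/users", "name", "SQLi")

# The only field name the two records have in common is the vulnerability type.
shared = set(vars(static)) & set(vars(dynamic))
print(shared)  # {'vuln_type'}
```

Neither record carries a location field the other understands, and even the shared type field uses scanner-specific names.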

The Multiple Scanner Theory

Also important to this discussion: no single scanner can find all the vulnerabilities in an application. For that matter, no combination of tools can find all the vulnerabilities in an application. However, having a combination of static and dynamic scanners, or better yet multiple static and multiple dynamic scanners, greatly increases the number of vulnerabilities you find. It should go without saying that we want to find as many vulnerabilities as possible, just like we want to find as many bugs as possible. So, we should use as many different scanners as possible.

Trouble in Paradise

Using multiple scanners has an associated management cost. If you find the same vulnerability with a static and a dynamic scanner, recognizing that fact requires poring through both reports and knowing the application fairly well. Such analysis can grow quite complicated: you may need to know that UserController on line 57 corresponds to the URL /manage/users, but UserController on line 23 corresponds to the URL /manage/users/3, for example. And if one pair of scans is complicated, managing weekly scans from multiple tools is way worse.

Problem Statement

The problem we originally set out to solve was: “Determine the feasibility of developing a system that can reliably and efficiently correlate and merge the results of automated static and dynamic security scans of web applications.” DHS wanted an automated system that would let security teams reap the benefits of a mixed scanner system without dealing with the drawbacks of managing such a system by hand.

Enter ThreadFix

ThreadFix, as it turns out, was already part of the way there. We already had a number of scanner report parsers, and we had algorithms to merge together scans from different types of scanners (dynamic to dynamic and static to static). The main strategies we employed to do this were:

  1. Using the CWE standard allowed us to identify when different scanners reported the same vulnerability type under different names: Fortify’s “Cross Site Scripting (reflected)” and AppScan’s “Reflected XSS” both correspond to CWE-79, and that fact lets us identify them as the same type.
  2. We used the URL (dynamic) or file (static) and HTTP parameter (both) to see if the vulnerability was in the same place on the attack surface.
  3. Diffing between scans from the same scanner let us track when vulnerabilities were added and removed by developers – no more manually looking through the report to see if your developers actually fixed the vulnerability or just closed the JIRA ticket.
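The three strategies can be sketched roughly as follows. This is a simplified illustration, not ThreadFix’s actual merge code: the lookup table, the function names, and the `/search.jsp` location with its `q` parameter are all assumptions made for the example. Only the two scanner-specific names and CWE-79 come from the list above.

```python
# Hypothetical lookup table: (scanner, scanner-specific name) -> CWE ID.
CWE_BY_SCANNER_NAME = {
    ("Fortify", "Cross Site Scripting (reflected)"): 79,
    ("AppScan", "Reflected XSS"): 79,
}


def merge_key(scanner, vuln_name, location, parameter):
    """Reduce a finding to (CWE ID, location, parameter).

    `location` is a URL for dynamic results or a file for static results.
    Two findings with equal keys are treated as the same vulnerability.
    """
    cwe_id = CWE_BY_SCANNER_NAME[(scanner, vuln_name)]
    return (cwe_id, location, parameter)


def diff_scans(previous, current):
    """Compare two scans from the same scanner, each given as a set of keys."""
    return {
        "new": current - previous,         # appeared since the last scan
        "closed": previous - current,      # no longer reported
        "still_open": previous & current,  # reported in both scans
    }


# Strategies 1 and 2: different names, same CWE, same place -> same key.
a = merge_key("Fortify", "Cross Site Scripting (reflected)", "/search.jsp", "q")
b = merge_key("AppScan", "Reflected XSS", "/search.jsp", "q")
print(a == b)  # True

# Strategy 3: the finding from last week's scan is gone this week.
last_week = {a}
this_week = set()
print(diff_scans(last_week, this_week)["closed"] == {a})  # True
```

Note that the key only lines up when both scanners describe the location the same way, which is why a scheme like this holds up for same-type merging and for simple file-per-page frameworks.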

This set of techniques was effective for dynamic to dynamic and static to static merging. Prior to our work on HAM, for dynamic to static merging, ThreadFix performed acceptably with some simple frameworks (raw PHP, JSP) but fell flat on its face trying to merge together results for more complicated frameworks (Spring MVC, ASP.NET MVC, etc.). The goal of this research was to address the greatest deficiency in our merge algorithm.



We knew we needed to make some significant upgrades to ThreadFix to properly solve this problem. The first step was identifying why static and dynamic results weren’t merging:

  1. ThreadFix can’t match file names to corresponding URLs if the two differ greatly, as in MVC frameworks.
  2. Static scanners usually don’t include information about HTTP parameters, but we need them to differentiate between static results.

With this in mind, we decided to pursue source code parsing. If the framework can figure out which code to run given a URL and parameters, we can too. There were a couple of ways to get the information we needed from the source code:

First we looked at adding a runtime environment, starting the code, then examining the result. This is what the IAST folks do with their tools, and that approach has a lot of promise, but requires a heavyweight investment for each execution environment that needs to be supported.

The second strategy was to do light parsing of source to extract just the information we needed. This could be done in a more time- and memory-efficient manner and wouldn’t require updated libraries every time we wanted to run a newer version of a framework. We decided to go this route to be able to more efficiently support additional languages and frameworks.
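As a rough idea of what light parsing can look like, here is a minimal sketch that scans a Spring MVC controller with regular expressions and pulls out URL, line number, and HTTP parameters. It is an illustrative assumption, not the HAM implementation: the function name, the sample controller, and the regex approach are all made up for this example, and it only handles the simplest annotation forms.

```python
import re

# Matches @RequestMapping("/path") and @RequestMapping(value = "/path").
MAPPING = re.compile(r'@RequestMapping\(\s*(?:value\s*=\s*)?"([^"]*)"')
# Matches @RequestParam("name") and @RequestParam(value = "name").
PARAM = re.compile(r'@RequestParam\(\s*(?:value\s*=\s*)?"([^"]*)"')


def extract_endpoints(file_name, source):
    """Return one dict per mapped handler method found in `source`."""
    endpoints = []
    class_prefix = ""    # path from a class-level @RequestMapping, if any
    pending_path = None  # path from an annotation not yet attached to code
    for line_number, line in enumerate(source.splitlines(), start=1):
        match = MAPPING.search(line)
        if match:
            pending_path = match.group(1)
            continue
        stripped = line.strip()
        if pending_path is None or not stripped or stripped.startswith("@"):
            continue
        if re.search(r"\bclass\b", line):
            class_prefix = pending_path  # annotation belonged to the class
        else:
            endpoints.append({
                "url": class_prefix + pending_path,
                "file": file_name,
                "line": line_number,
                "parameters": PARAM.findall(line),
            })
        pending_path = None
    return endpoints


# Hypothetical controller used only to exercise the sketch.
SAMPLE = '''\
@Controller
@RequestMapping("/manage")
public class UserController {

    @RequestMapping("/users")
    public String list(@RequestParam("name") String name) {
        return "users";
    }

    @RequestMapping("/users/{id}")
    public String detail(@PathVariable("id") int id) {
        return "user";
    }
}
'''

for endpoint in extract_endpoints("UserController.java", SAMPLE):
    print(endpoint["url"], endpoint["line"], endpoint["parameters"])
# /manage/users 6 ['name']
# /manage/users/{id} 11 []
```

With a table like this, a static finding at a given file and line can be matched to the URL a dynamic scanner would report, without ever starting the application.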

Check back soon for Part 2.

About Dan Cornell

A globally recognized application security expert and the creator of ThreadFix, Dan Cornell holds 20 years of experience architecting, developing and securing web-based software systems. As the Chief Technology Officer and a Principal at Denim Group, Ltd, the parent company of ThreadFix, he leads the technology team to help Fortune 500 companies and government organizations integrate security throughout the development process.