Seeding Web Application Scanners with Attack Surface Data

We’ve talked previously on this blog about web application attack surface – why it is important to understand all of the URLs an application will respond to and all of the injection points where an application will “listen” for inputs that will impact its behavior. As part of the Phase 1 Hybrid Analysis Mapping (HAM) research we did for the Department of Homeland Security (DHS) we even created and released a command-line tool that will perform a quick static scan of an application’s source code and dump a list of these URLs and injection points out to standard output. This is a handy way to get a quick look at what an application’s attack surface looks like. So we decided to take this a step further – what if you could seed your web application scanner with this sort of data? If the scanner knows what the application’s attack surface is, you will get a much more thorough analysis of the target application.

Why is this? Well, a web application scanner can’t test attack surface if it doesn’t know that the attack surface exists. How do most scanners determine an application’s attack surface? They will typically:

  • Spider the application – The web scanner will act like Google does when it indexes pages on the Internet by analyzing a web page’s HTML, JavaScript and other assets to determine links to other pages that have not yet been visited as well as any GET or POST parameters that may be passed to those pages. Scanners will also look for the cookies an application sets so those can become part of the attack surface to be tested as well. For applications that require a user to be logged in to access functionality, this spidering process must be coupled with login and session-management capabilities in the scanner.
  • Guess – Web scanners can also try to make guesses about additional URLs that might be exposed as well as parameters that can be passed in. These may be guesses at common URLs (like an admin/ directory) or permutations of previously identified objects (like looking for /index.php.bak if a site exposes the URL /index.php).
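
To make those two discovery strategies concrete, here is a minimal Python sketch using only the standard library. It is not how any particular scanner is implemented; the class name `LinkCollector`, the function `guess_variants`, the suffix list and the `example.test` host are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects href/src/action targets, roughly what a very simple spider does."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                # Resolve relative references against the page's URL.
                self.links.add(urljoin(self.base_url, value))


def guess_variants(url, suffixes=(".bak", ".old", "~")):
    """Guess backup-style permutations of an already discovered URL.

    The suffix list is a hypothetical example, not a scanner's real wordlist.
    """
    path = urlsplit(url).path
    if not path or path.endswith("/"):
        return []  # nothing file-like to permute
    return [urljoin(url, path + suffix) for suffix in suffixes]
```

Feeding a page’s HTML to `LinkCollector` yields the URLs the spider can see, and `guess_variants` adds guesses like /index.php.bak on top of them. Anything that is neither linked nor guessable stays invisible to both.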

That’s all well and good, but what about pages and parameters that are missed by these approaches? Elements can be missed for a variety of reasons:

  • Weaknesses in the spider – No scanner is perfect so the spidering process could potentially miss exposed attack surface. URLs might be discoverable by analyzing JavaScript, Flash and other site elements, but not all spiders take these into account.
  • Pages with no inbound links – Complex applications may have all sorts of pages that are not exposed to the spidering process. One example is a landing page that serves as an entry point to the application and links back into other parts of it, but that has no inbound links pointing to it.
  • Invisible parameters – I once did some work on an application where every page would respond to a parameter of “d” being passed in with a request by attempting to delete the order with the value of the “d” parameter. (yikes!) This appeared to be some utility functionality that a site developer had placed in the application for debugging and convenience and, thankfully, you would never find any “d” parameters when crawling the application. But the application behavior was there nonetheless. This isn’t the only such time we’ve found application behaviors like this and when we do find them, the impact of exploiting those behaviors is pretty severe.

So seeding application scans with attack surface data gives us the opportunity to jump-start the spidering process and can help get us better scan coverage for applications that expose these sorts of hidden capabilities.
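
One way to picture seeding is as a merge of two maps from URL path to parameter names: what the spider found and what static analysis found. The sketch below assumes that simple representation; the function name `merge_surface` and the sample paths and parameters are hypothetical and do not reflect how ThreadFix or any scanner stores this data.

```python
def merge_surface(spidered, seeded):
    """Combine spidered and statically derived attack surface.

    Both arguments map a URL path to an iterable of parameter names.
    Returns a new dict mapping each path to the union of its parameters.
    """
    merged = {path: set(params) for path, params in spidered.items()}
    for path, params in seeded.items():
        merged.setdefault(path, set()).update(params)
    return merged


# Hypothetical data: the spider never sees the hidden "d" parameter
# or the unlinked admin page, but the source code analysis does.
spidered = {"/orders.jsp": {"id"}}
seeded = {"/orders.jsp": {"id", "d"}, "/admin.jsp": set()}
surface = merge_surface(spidered, seeded)
```

In this made-up case the merged surface gains both the unlinked page and the invisible parameter, which is the kind of coverage the spider alone would not provide.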

Let’s take a look at how this works in ThreadFix for the OWASP ZAP scanner using the Bodgeit Store example application:

  1. Set up an application in ThreadFix and provide a pointer to the application’s source code (see more about this in the Hybrid Analysis Mapping Configuration wiki page).
  2. ThreadFix does a lightweight static analysis of the source code to create a database of mappings between attack surface points and the source code responsible for that attack surface (screenshot is of ThreadFix’s command-line Hybrid Analysis Mapping tool to illustrate the underlying analysis).
  3. From OWASP ZAP, you configure the ThreadFix server and API key to be used to pull the attack surface data.
  4. Then you select the application whose attack surface you want to retrieve.
  5. And provide a relative base URL where the scanning will occur.
  6. OWASP ZAP then pulls the attack surface data from ThreadFix, consisting of URLs and GET/POST parameters that will be used to seed ZAP’s spidering and scanning. Note that the ZAP scan now knows about the “admin.jsp” page and multiple “debug” parameters that would not have been found otherwise. This results in a more thorough scan and an identification of vulnerabilities that would have otherwise been missed.
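
To illustrate what “seeding” could look like once the data has been pulled, here is a rough Python sketch that turns a base URL plus a list of endpoints into concrete requests a scanner could start from. The record layout (`path`, `method`, `params`), the function name `build_seed_requests`, the placeholder value and the localhost base URL are assumptions for this example only; this is not the format ThreadFix returns or the code the ZAP plugin runs. Only “admin.jsp” and the “debug” parameter come from the walkthrough above; the other page and parameter names are made up.

```python
from urllib.parse import urlencode, urljoin


def build_seed_requests(base_url, endpoints, placeholder="1"):
    """Turn endpoint records into (method, url, body) tuples.

    Each endpoint is a dict with "path", "method" and "params" keys
    (a hypothetical layout chosen for this sketch).
    """
    requests = []
    for endpoint in endpoints:
        url = urljoin(base_url.rstrip("/") + "/", endpoint["path"].lstrip("/"))
        params = {name: placeholder for name in endpoint["params"]}
        if endpoint["method"] == "GET" and params:
            # GET parameters travel in the query string.
            requests.append(("GET", url + "?" + urlencode(params), None))
        else:
            # Other methods carry parameters in the body, if there are any.
            requests.append((endpoint["method"], url, urlencode(params) or None))
    return requests


seeds = build_seed_requests(
    "http://localhost:8080/bodgeit",
    [
        {"path": "/admin.jsp", "method": "GET", "params": []},
        {"path": "/basket.jsp", "method": "POST", "params": ["debug"]},
        {"path": "/search.jsp", "method": "GET", "params": ["q", "debug"]},
    ],
)
```

Each tuple gives the scanner a page and a set of parameters to start fuzzing, whether or not a crawl would ever have linked to them.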

This attack surface calculation provides the scanner with an exhaustive list of all the URLs and parameters it will need to fuzz to get a thorough examination of the application. This isn’t foolproof – a scanner won’t be able to fuzz parts of the application it knows about but doesn’t know how to get to. One example of a situation like this is a multi-step process like an e-commerce checkout. However, this sets the stage for running an analysis of the URLs a scanner should have hit, but did not. Down the road we might look to automate some of this checking as well.
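
The “should have hit, but did not” check is essentially a set difference. A minimal sketch, assuming you have the expected paths from the attack surface calculation and a list of URLs the scanner actually requested (for example from its request log); the function name `missed_paths` and the sample data are hypothetical:

```python
from urllib.parse import urlsplit


def missed_paths(expected_paths, requested_urls):
    """Return the expected paths that never appear in the scanner's requests."""
    hit = {urlsplit(url).path for url in requested_urls}
    return sorted(set(expected_paths) - hit)


# Hypothetical data for illustration only.
expected = ["/bodgeit/admin.jsp", "/bodgeit/basket.jsp", "/bodgeit/home.jsp"]
requested = [
    "http://localhost:8080/bodgeit/home.jsp?x=1",
    "http://localhost:8080/bodgeit/basket.jsp",
]
gaps = missed_paths(expected, requested)
```

This compares paths only. A fuller check would also compare parameters, and a path showing up in the log does not prove the scanner reached it in the right application state, as with the checkout example.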

These examples show how this technique can be used with ThreadFix and the open source OWASP ZAP dynamic scanner. We also have a plugin for Portswigger BurpSuite and I’ll follow up with a Burp-specific blog post before too much longer.

Calculating application attack surface and feeding it to dynamic scanners is one of the new cool things we’ve done with the 2.0 release of ThreadFix. To download ThreadFix 2.0 and the OWASP ZAP and BurpSuite plugins check out the ThreadFix download center. As always, please feel free to post any questions to the ThreadFix Google Group and post any bugs or feature requests (such as other scanners you’d like to see supported) to the ThreadFix GitHub page.


dan _at_


About Dan Cornell

A globally recognized application security expert and the creator of ThreadFix, Dan Cornell holds 20 years of experience architecting, developing and securing web-based software systems. As the Chief Technology Officer and a Principal at Denim Group, Ltd, the parent company of ThreadFix, he leads the technology team to help Fortune 500 companies and government organizations integrate security throughout the development process.