I'm a Computer Science Master's candidate at Case Western Reserve University, Cleveland, OH. My research focuses on static analysis of software, specifically, predicting and reducing the occurence of regression defects, and how to prevent them. My advisor Lee White has examined this topic using what he calls a "software testing firewall". I am developing software to automate the process of such a firewall. (Last updated 07 Mar. 2008.)

What is a firewall?

What does your software require?

How will your tool work?

Future work

What is a firewall

A software testing firewall is akin to the real-world definition of a firewall. In an automobile or building, a firewall is a literal wall, protecting humans from potential danger from fire, be it a furnace or an internal combustion engine, or an unexpected conflagaration.

In software, often bug fixes or feature enhancements are released after an initial release of a piece of software. Anybody who knows much about Microsoft Windows knows that sometimes these releases break functionality, or have new bugs in place of the bugs they were trying to fix. These are known as regression faults. A "high" rate of faults may be indicative of poor software design, necessitating rewriting portions of code from scratch. Analysis after solving these bugs often reveals that the cause was lack of understanding of the way that a seemingly inconsequential change cascades through the code and breaks a module in a different portion of the code. Thus this problem is partially a program comprehension problem, and partly a testing problem. However, in practice, it seems that testers usually have less knowledge about the code than the programmers, so either the programmers need to be far more conservative in introducing change, or the testers (and the programmers) need better tools to help them know what really changed when a portion of code changes.

A software testing firewall analyzes the changes to a body of software, looking at function calls, message passing, class inheritance, and data paths (variable usage), to determine what portion of a program need to be (re)tested and to define any new tests that may now be required by these changes. A software testing firewall is not a magic bullet that will catch pre-existing bugs, but it is a systematic method for analyzing the "impact" of the changes made to a pre-existing body of code.

For the purposes of my research, I'm restraining myself to instances where a developer and the automated tool have full access to all source code. In the real world this isn't always the case, and there is theory from previous software testing firewall research on how to handle those cases.

What does your software require?

How will your tool work?

My tool will need full access to all source code. It will first analyze the differences between two pieces of code. These "pieces" can be defined as a folder hierachy, single file, or portion of a single file. The tool for performing the difference analysis is Araxis Merge, as my tool has a few requirements. The tool must have a GUI, the tool must run on Windows, the tool must be able to handle Windows MFC code and calls to dlls. Merge provides a .Net-compliant API, which allows me to analyze differences, launch a graphical window showing those differences, and send the analysis forward to the next stage of my tool.

Also, the company I'm developing my tool with uses Araxis Merge in-house, so it's essential that the differences my tool finds have a 1:1 correlation with changes they would find when using a typical version of Araxis Merge to perform the previous manual firewall creation process.

Knowing the differences, the tool needs to map the changes to specific lines of code to functions, or data, or a class hierarchy, etc. I need to be able to efficiently and accurately parse C++, C, and C# code. srcML is a tool for static analysis of code, which marks up the code with XML tags. srcML is nice in that it maintains spacing and line numbers of the source code, and less nice in that by inserting substantial markup tags, the column numbers of a code element are no longer reliable to find a code element. This is what we in Computer Science like to famously call a trade-off.

After problems building gcc-xml under Windows, I decided to adopt srcML as the parser / markup tool of choice. I then feed the marked up version of source code to a .Net-based XML parser that picks out function prototypes and line numbers where these functions begin and end. I am able to feed my tool a line number of a difference (from the previous phase) and get back the function prototype, (assuming such change occurs in a function at all) and then pass the function prototype on to the next phase to determine impact.

I do similar work to get an analysis of the impact on data of changing a variable. That work was difficult, and came down to a problem that to solve properly, probably requires handling an entire C/C++ syntax tree. I've determined that the problem of parsing out single differences from an actual difference block as output by diff / Araxis Merge is very difficult. Humans can see the similarities, and know what sections in a difference block have changed. It's not nearly as easy to describe algorithmically what a human does when they look at a line of code before and after, and know exactly what changed. Is the change to a comment, is the change before or after an assignment, etc. That work is going to be left incomplete for my research, and maybe somebody in the future will come up with a satisfactory resolution. I found other people getting stuck with the same problem in my literature search, so I'm less bothered by leaving this unsolved.

I need to map functions to a class hierarchy, which doxygen does wonderfully. doxygen makes beautiful graphical call graphs and class hierarchy diagrams. Problem is these are graphical. I want to be able to analyze the callers and callees of a given function. doxygen depends on Graphviz for drawing these graphs. I've modified doxygen to make a text representation of the callgraphs and NOT draw the actual diagrams. I parse the text-based Graphviz graph descriptions for callers and callees, thereby generating the last stage of my impact analysis for my automated firewall generation. Given a function prototype, my tool returns the callers and callees, provided it has access to a complete set of pre-produced callgraphs generated with my modified version of doxygen.

For a description of my changes to doxygen, see my doxygen page. I deliberately made changes to doxygen in such a way that if doxygen drastically changes the code internals, my code will still attach and run properly, it will just take longer. My modified doxygen binary runs about 20% faster by not having to call Graphviz to make the grapical callgraph images, the overwhelming majority of which I don't use. In my final report stage, I then generate only the needed graph images, using a call to Graphviz using the text-based graph description generated by doxygen.

There are essentially two versions of my tool, and one of them has a GUI based in Windows Forms. The command line interface version of my tool is more convenient for purposes like scripting automating firewall generation against daily builds.

My development process over the last year-plus has generally validated the usefulness of C# as a language. I have not tried to run my tool under Mono. Right now, my tool depends on multiple commercial software products, and to move the tool to a non-Windows platform would require replacing my difference tool with something freely available, like GNU diff. I would need to wrap several functions that come built-in with the API for Araxis Merge, and that would take me a while. I also call the API for Microsoft Word to generate the firewall report, inserting figures and text. I could do something similar by wrapping calls to something like TeX (or maybe OpenOffice) and then generate a PDF. srcML is free software and could readily be released. The same goes for doxygen, and as I mentioned earlier, while I have patched the doxygen source, this isn't strictly necessary for function, it just increases performance for my narrow usage of doxygen, and a typical doxygen usage would never want my patch to become part of the real doxygen project.

I had a persistent problem earlier in the project involving illegal XML characters. The .Net XML parser would choke on these, throw an exception, and refuse to parse them. I made a simple class for pre-treating my XML documents, and the solution is a stage just after doxgyen finishes, and one more pass of the XML-cleaning parser just after each time I call srcML against a source file with changes in it. I discovered that the source of these problems was some stray ascii characters in the source code I was testing with. Good program design dictates that you have to deal with the input you're given, and that's how I handled that problem. I don't make changes to the input, only to my representations of the input.

I also solved my earlier problem with how to get data impact data out of doxygen. Doxygen is used by most people to generate an html report, but I use the XML report, then parse the XML output with an array of custom XML parsers to extract data impact. This work went through multiple revisions. At one point I rewrote the parsers from scratch using a better approach. This improved maintainability of my code and helped my solve a problem that was ultimately traced to my overzealous use of a regular expression search. After revising a parser to work against index.xml, I was getting much more useful information from doxygen, with less runtime.


Future Work

At this stage of development, I'm analyzing the accuracy and performance of my tool. It's hard to say whether any piece of software is truly "finished", it's just released, and there's always room for improvement. There's a lot of room for improvement for my software.

Open problems I haven't yet solved:

  • For better comprehensibility, software testing firewall theory says that call graphs with common nodes should be merged on those nodes, creating a single larger graph, and this process should repeat until all common nodes are joined in a macro graph. If you get a graph that represents the entire program, this suggests I'm hoping somebody has already done this using graphviz, but I haven't attacked this problem yet. This was originally going to be part of my tool, but it has been dropped in favor of analyzing what I do have.
  • My GUI is really simple right now. I need to add several more features to it, including the ability to select single files rather than folders.