So You Think You're Covered?

One of the more disturbing trends in testing is a reliance — in some cases, a corporate mandate — on code coverage metrics. Don't get me wrong: I wholeheartedly support coverage tools, and use both Cobertura and Emma on a regular basis (Cobertura has a plugin for Maven, Emma has a plugin for Eclipse). But I don't rely on either of them to tell me how well my unit tests exercise my code.

Because coverage metrics lie.

Perhaps no more than any other metric, and definitely less than some. But as with any metric, the numbers that you get out of your coverage tool are an indication of how well your code is being exercised, not an absolute statement. The rest of this article examines some of the ways that coverage tools lie, and what you can do about it.

How Coverage Tools Work

To understand why coverage tools lie, it's necessary first to know how they work. Which turns out to be very simple: the coverage tool adds code to the class to track execution, either via a custom classloader or as a post-compilation step. For example, here's how javap disassembles the “Hello, World” main method:

public static void main(java.lang.String[])   throws java.lang.Exception;
  Code:
   0:   getstatic       #19; //Field java/lang/System.out:Ljava/io/PrintStream;
   3:   ldc     #25; //String Hello, World
   5:   invokevirtual   #27; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   8:   return

And here's what it looks like after being instrumented by Emma:

public static void main(java.lang.String[])   throws java.lang.Exception;
  Code:
   0:   getstatic       #36; //Field $VR4019:[[Z
   3:   iconst_1
   4:   aaload
   5:   astore_1
   6:   getstatic       #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   9:   ldc     #3; //String Hello, World
   11:  invokevirtual   #4; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   14:  aload_1
   15:  iconst_0
   16:  iconst_1
   17:  bastore
   18:  return

As you can see, the bytecode has more than doubled in size. If you don't read bytecode, what's happening is that Emma creates a boolean array associated with the method, where each element in the array corresponds to a particular piece of code. When that piece of code is executed, the associated array element gets set to true. In our case, there's only one line of code, System.out.println(), which happens between bytecode indices 6 and 11. The flag is set between indices 14 and 17, and in a larger method similar bytecode would be repeated for each tracked piece of program code.
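In source terms, the instrumented method behaves roughly like the following sketch. The array name $VR4019 comes from the disassembly above; everything else is my reconstruction of what the bytecode does, not Emma's actual generated source (Emma works directly on bytecode, so no such source ever exists):

```java
public class HelloInstrumented
{
    // Emma generates a per-class array of per-method flag arrays; this
    // sketch hard-codes one method's flags at index 1, matching the
    // "iconst_1 / aaload" sequence in the disassembly
    static boolean[][] $VR4019 = new boolean[][] { {}, new boolean[1] };

    public static void main(String[] argv) throws Exception
    {
        boolean[] flags = $VR4019[1];        // getstatic / iconst_1 / aaload / astore_1
        System.out.println("Hello, World");  // the original program code
        flags[0] = true;                     // aload_1 / iconst_0 / iconst_1 / bastore
    }
}
```

At JVM shutdown (or on demand), the tool walks these arrays and reports any element still false as uncovered code.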

And this brings up the first issue with coverage tools: how granular is their coverage? Both Emma and Cobertura track coverage to the level of a “basic block” — a sequence of bytecode instructions delimited by branches. In other words, they will tell you if you haven't exercised both parts of a ternary expression. This is a very good thing. Other coverage tools aren't so thorough: they report coverage of lines, methods, or (worst of all) classes. Know what level of coverage you get from your tools!
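As an illustration of my own (not taken from any tool's documentation), consider this one-line method. A single test call gives 100% line coverage, but executes only one of the method's two basic blocks — a block-level tool will flag the difference, a line-level tool won't:

```java
public class Granularity
{
    // one line of source, but two basic blocks: a line-level tool reports
    // 100% coverage after a single call; a block-level tool reports that
    // only one arm of the ternary has been exercised
    public static String describe(int x)
    {
        return (x >= 0) ? "non-negative" : "negative";
    }
}
```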

The Independent Path Problem

Consider the following piece of code. How many different paths are there through this code?

public static int testMe(int a, int b, int c)
{
    if (a > 5)
        c += a;
    if (b > 5)
        c += b;
    return c;
}

If you build a truth table for the if statements, you'll see the answer is four:

         | a <= 5 | a > 5
  b <= 5 | path 1 | path 2
  b > 5  | path 3 | path 4

OK, so how many times do you have to call this method to get 100% coverage, even with the block-level coverage of Emma and Cobertura? Two.

public void test100PercentCoverage() throws Exception
{
    assertEquals(2, testMe(2, 2, 2));
    assertEquals(21, testMe(7, 7, 7));
}

While this is a trivial example, every non-trivial program in existence exhibits the same trait: there are multiple independent paths through the code. This makes a mockery of any mandated coverage percentages: 100% reported coverage may not validate all possible paths — in fact, it almost certainly won't.

There are several approaches to mitigating this problem. One is to use truth (path) tables like the one shown above. Each box in the table should contain a test method, and empty boxes need more tests regardless of what your coverage tool says.
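For the testMe() example, a filled-in table corresponds to four test methods, one per box. This sketch repeats testMe() and substitutes a plain stand-in for JUnit's assertEquals() so that it's self-contained; the method names are my own:

```java
public class TestMePaths
{
    // repeated from above so that this example is self-contained
    public static int testMe(int a, int b, int c)
    {
        if (a > 5)
            c += a;
        if (b > 5)
            c += b;
        return c;
    }

    // one test method per box in the truth table
    public void testNeitherBranch() throws Exception { assertEquals(2,  testMe(2, 2, 2)); } // a <= 5, b <= 5
    public void testFirstBranch()   throws Exception { assertEquals(9,  testMe(7, 2, 2)); } // a > 5,  b <= 5
    public void testSecondBranch()  throws Exception { assertEquals(9,  testMe(2, 7, 2)); } // a <= 5, b > 5
    public void testBothBranches()  throws Exception { assertEquals(21, testMe(7, 7, 7)); } // a > 5,  b > 5

    // stand-in for JUnit's assertEquals(), to keep the sketch self-contained
    private static void assertEquals(int expected, int actual)
    {
        if (expected != actual)
            throw new AssertionError("expected " + expected + ", was " + actual);
    }
}
```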

Unfortunately, this technique quickly breaks down. Real applications rarely have simple two-dimensional path combinations, and once you get above three or four dimensions the number of paths is overwhelming. A better approach is to refactor the code, typically using Extract Method to move the code inside the branch into an easily tested, linear method. That still leaves the branch intact, but it means that you can write simpler tests because you don't have to validate the branch and the code inside it. In fact, in some cases you can move the branch testing out of the realm of unit tests, and into the realm of acceptance tests.
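Here's a hedged sketch of what that refactoring looks like. The domain (order discounts), the method names, and the constant are all invented for illustration; the point is the shape — condition and body become separate linear methods, each trivially unit-tested, while the branch itself shrinks to something an acceptance test can cover:

```java
public class DiscountExample
{
    private static final double DISCOUNT = 0.10;    // hypothetical business rule

    // the branch condition, extracted into a linear, easily tested method
    public static boolean discountApplies(double total, boolean isPreferred)
    {
        return isPreferred && (total > 100.0);
    }

    // the code that used to live inside the branch, also linear
    public static double applyDiscount(double total)
    {
        return total - (total * DISCOUNT);
    }

    // the branch remains, but is now trivial: two method calls and an if
    public static double process(double total, boolean isPreferred)
    {
        if (discountApplies(total, isPreferred))
            total = applyDiscount(total);
        return total;
    }
}
```

The tests for discountApplies() and applyDiscount() don't need to know about each other, which is exactly what makes them simple.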

One way to identify code that needs to be refactored is to calculate the cyclomatic complexity of the code under test. High complexity values indicate code with many independent paths, which consequently require more effort to properly test. Cobertura has an edge here, as it reports cyclomatic complexity along with coverage numbers.

Uncovered Dependencies

Are you looking for full coverage of just your code, or of your code and its dependencies? Usually you consider libraries to be separate from your code, but is that a reasonable approach?

True story from a previous job: while running instrumented tests against our server (written in C++), we discovered memory leaks and access violations in code from a major commercial database vendor, as well as in code from another division of our own company. The former was met with “thank you, we'll investigate,” the latter with “you have no business poking around in that code?!?” While neither of these problems ultimately affected us, knowing of their existence meant that we could avoid code that triggered them.

Clearly, you can't justify writing tests for all the libraries that you use (although, if you're using open source, such tests would be welcomed). But the point remains: even if you reach 100% coverage of your own code, you won't necessarily be bug-free.

Exceptions

Coverage-driven testing tends to focus on the “happy path” through the tested code: does it do the right thing given expected input? Consider the following code, which expects a string in the form “foo:bar:baz” and extracts the middle element:

    public static String extractMiddle(String s)
    {
        int idx1 = s.indexOf(':');
        int idx2 = s.indexOf(':', idx1 + 1);
        return s.substring(idx1 + 1, idx2);
    }

    public void testExtractMiddle() throws Exception
    {
        assertEquals("bar", extractMiddle("foo:bar:baz"));
    }

This test gives 100% coverage, but completely ignores cases where the string is null, or where it's improperly formatted and idx1 is -1, causing substring() to throw an exception. Perhaps that's OK. After all, this code is probably deep within your application, and there's no good reason to exercise all of its failure modes — but at some point you had better test input validation, to ensure that only good strings get to this code.
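To make the unreported gap concrete, here are the tests that no coverage tool will ever ask for. The sketch repeats extractMiddle() and uses a thrown AssertionError in place of JUnit's fail(), so that it stands alone; the expected exception types follow from how indexOf() and substring() behave on bad input:

```java
public class ExtractMiddleFailureTest
{
    // repeated from above so that this example is self-contained
    public static String extractMiddle(String s)
    {
        int idx1 = s.indexOf(':');
        int idx2 = s.indexOf(':', idx1 + 1);
        return s.substring(idx1 + 1, idx2);
    }

    public void testNullInput() throws Exception
    {
        try
        {
            extractMiddle(null);                 // s.indexOf() dereferences null
            throw new AssertionError("expected NullPointerException");
        }
        catch (NullPointerException expected) { /* success */ }
    }

    public void testMissingDelimiters() throws Exception
    {
        try
        {
            extractMiddle("foobarbaz");          // idx1 == -1, idx2 == -1,
                                                 // so substring(0, -1) throws
            throw new AssertionError("expected StringIndexOutOfBoundsException");
        }
        catch (StringIndexOutOfBoundsException expected) { /* success */ }
    }
}
```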

Missing Features

And that brings me to the final problem: a coverage tool can tell you to write more tests, but it can't tell you to write more mainline code. Or put another way, if a missing feature is never tested, you'll never know — until the code reaches production.

What you need is some way to determine how well your tests cover the application's specifications. The test-driven-development contingent will respond that tests are the specifications: if there isn't a test for a feature, that feature doesn't exist. Unfortunately, in a large application, where you may have thousands of unit tests, it's remarkably hard to identify missing features. Worse, you may think that you have a feature tested, when in fact it's only tested for a subset of the possible application states.

Ultimately, this comes back to the Independent Path Problem, and there's no good solution to that problem save careful thought. While test-as-specification may be appropriate at the level of a single class or set of interacting classes, it's not appropriate at the level of an application. And acceptance tests, while closer to a usable specification, don't have the granularity of good unit tests; they especially tend to miss boundary conditions and exceptional behavior.

Conclusion

As I said at the top of this article, I think coverage tools are great. A big red block in the middle of your code is a strong incentive to write more tests. But more important, from the perspective of quality code, is to think about what your code is doing and put time into ensuring that your tests fully exercise it, especially at the boundaries. Because 100% coverage isn't that important when you get a midnight phone call asking why your code is throwing NullPointerException.

For More Information

“Uncle Bob” Martin has several articles on code coverage, cyclomatic complexity, and (of course) test-driven development. I'll leave explanations of “CRAP” to him.

There aren't many examples in this article, and the ones that exist are meant for shock value rather than education. If you feel the need to shock someone, here they are:

I gave a presentation on this material at the Philadelphia Java Users Group, in the fall of 2009. It doesn't say anything that you haven't already read, but it's one of the better-looking presentations that I've done. You can find the slides here.

Copyright © Keith D Gregory, all rights reserved