So You Think You're Covered?

One of the more disturbing trends in testing is a reliance — in some cases, a corporate mandate — on code coverage metrics. Don't get me wrong: I wholeheartedly support coverage tools, and use both Cobertura and Emma on a regular basis (Cobertura has a plugin for Maven, Emma has a plugin for Eclipse). But I don't rely on either of them to tell me how well my unit tests exercise my code.

Because coverage metrics lie.

Perhaps no more than any other metric, and definitely less than some. But as with any metric, the numbers that you get out of your coverage tool are an indication of how well your code is being exercised, not an absolute statement. The rest of this article examines some of the ways that coverage tools lie, and what you can do about it.

How Coverage Tools Work

To understand why coverage tools lie, it's necessary first to know how they work. Which turns out to be very simple: the coverage tool adds code to the class to track execution, either via a custom classloader or as a post-compilation step. For example, here's how javap disassembles the “Hello, World” main method:

public static void main(java.lang.String[])   throws java.lang.Exception;
  Code:
   0:   getstatic       #19; //Field java/lang/System.out:Ljava/io/PrintStream;
   3:   ldc     #25; //String Hello, World
   5:   invokevirtual   #27; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   8:   return

And here's what it looks like after being instrumented by Emma:

public static void main(java.lang.String[])   throws java.lang.Exception;
  Code:
   0:   getstatic       #36; //Field $VR4019:[[Z
   3:   iconst_1
   4:   aaload
   5:   astore_1
   6:   getstatic       #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   9:   ldc     #3; //String Hello, World
   11:  invokevirtual   #4; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   14:  aload_1
   15:  iconst_0
   16:  iconst_1
   17:  bastore
   18:  return

As you can see, the bytecode has more than doubled in size. If you don't read bytecode, what's happening is that Emma creates a boolean array associated with the method, where each element in the array corresponds to a particular piece of code. When that piece of code is executed, the associated array element gets set to true. In our case, there's only one line of code, System.out.println(), which happens between bytecode indices 6 and 11. The flag is set between indices 14 and 17, and in a larger method similar bytecode would be repeated for each tracked piece of program code.
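In source terms, the instrumented method behaves roughly like the following sketch. The array name $VR4019 comes from the disassembly above; everything else is my reconstruction of what the bytecode does, not Emma's actual generated source (Emma works directly on bytecode, so no such source ever exists):

```java
public class HelloInstrumented
{
    // Emma generates a per-class array of per-method flag arrays; this
    // sketch hard-codes one method's flags at index 1, matching the
    // "iconst_1 / aaload" sequence in the disassembly
    static boolean[][] $VR4019 = new boolean[][] { {}, new boolean[1] };

    public static void main(String[] argv) throws Exception
    {
        boolean[] flags = $VR4019[1];        // getstatic / iconst_1 / aaload / astore_1
        System.out.println("Hello, World");  // the original program code
        flags[0] = true;                     // aload_1 / iconst_0 / iconst_1 / bastore
    }
}
```

At JVM shutdown (or on demand), the tool walks these arrays and reports any element still false as uncovered code.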

And this brings up the first issue with coverage tools: how granular is their coverage? Both Emma and Cobertura track coverage to the level of a “basic block” — a sequence of bytecode instructions delimited by branches. In other words, they will tell you if you haven't exercised both parts of a ternary expression. This is a very good thing. Other coverage tools aren't so thorough: they report coverage of lines, methods, or (worst of all) classes. Know what level of coverage you get from your tools!
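As an illustration of my own (not taken from any tool's documentation), consider this one-line method. A single test call gives 100% line coverage, but executes only one of the method's two basic blocks — a block-level tool will flag the difference, a line-level tool won't:

```java
public class Granularity
{
    // one line of source, but two basic blocks: a line-level tool reports
    // 100% coverage after a single call; a block-level tool reports that
    // only one arm of the ternary has been exercised
    public static String describe(int x)
    {
        return (x >= 0) ? "non-negative" : "negative";
    }
}
```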

The Independent Path Problem

Consider the following piece of code. How many different paths are there through this code?

public static int testMe(int a, int b, int c)
{
    if (a > 5)
        c += a;
    if (b > 5)
        c += b;
    return c;
}

If you build a truth table for the if statements, you'll see the answer is four:

         | a <= 5 | a > 5
  b <= 5 | path 1 | path 2
  b > 5  | path 3 | path 4

OK, so how many times do you have to call this method to get 100% coverage, even with the block-level coverage of Emma and Cobertura? Two.

public void test100PercentCoverage() throws Exception
{
    assertEquals(2, testMe(2, 2, 2));
    assertEquals(21, testMe(7, 7, 7));
}

While this is a trivial example, every non-trivial program in existence exhibits the same trait: there are multiple independent paths through the code. This makes a mockery of any mandated coverage percentages: 100% reported coverage may not validate all possible paths — in fact, it almost certainly won't.

There are several approaches to mitigating this problem. One is to use truth (path) tables like the one shown above. Each box in the table should contain a test method, and empty boxes need more tests regardless of what your coverage tool says.
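For the testMe() example, a filled-in table corresponds to four test methods, one per box. This sketch repeats testMe() and substitutes a plain stand-in for JUnit's assertEquals() so that it's self-contained; the method names are my own:

```java
public class TestMePaths
{
    // repeated from above so that this example is self-contained
    public static int testMe(int a, int b, int c)
    {
        if (a > 5)
            c += a;
        if (b > 5)
            c += b;
        return c;
    }

    // one test method per box in the truth table
    public void testNeitherBranch() throws Exception { assertEquals(2,  testMe(2, 2, 2)); } // a <= 5, b <= 5
    public void testFirstBranch()   throws Exception { assertEquals(9,  testMe(7, 2, 2)); } // a > 5,  b <= 5
    public void testSecondBranch()  throws Exception { assertEquals(9,  testMe(2, 7, 2)); } // a <= 5, b > 5
    public void testBothBranches()  throws Exception { assertEquals(21, testMe(7, 7, 7)); } // a > 5,  b > 5

    // stand-in for JUnit's assertEquals(), to keep the sketch self-contained
    private static void assertEquals(int expected, int actual)
    {
        if (expected != actual)
            throw new AssertionError("expected " + expected + ", was " + actual);
    }
}
```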

Unfortunately, this technique quickly breaks down. Real applications rarely have simple two-dimensional path combinations, and once you get above three or four dimensions the number of paths is overwhelming. A better approach is to refactor the code, typically using Extract Method to move the code inside the branch into an easily tested, linear method. That still leaves the branch intact, but it means that you can write simpler tests because you don't have to validate the branch and the code inside it. In fact, in some cases you can move the branch testing out of the realm of unit tests, and into the realm of acceptance tests.
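Here's a hedged sketch of what that refactoring looks like. The domain (order discounts), the method names, and the constant are all invented for illustration; the point is the shape — condition and body become separate linear methods, each trivially unit-tested, while the branch itself shrinks to something an acceptance test can cover:

```java
public class DiscountExample
{
    private static final double DISCOUNT = 0.10;    // hypothetical business rule

    // the branch condition, extracted into a linear, easily tested method
    public static boolean discountApplies(double total, boolean isPreferred)
    {
        return isPreferred && (total > 100.0);
    }

    // the code that used to live inside the branch, also linear
    public static double applyDiscount(double total)
    {
        return total - (total * DISCOUNT);
    }

    // the branch remains, but is now trivial: two method calls and an if
    public static double process(double total, boolean isPreferred)
    {
        if (discountApplies(total, isPreferred))
            total = applyDiscount(total);
        return total;
    }
}
```

The tests for discountApplies() and applyDiscount() don't need to know about each other, which is exactly what makes them simple.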

One way to identify code that needs to be refactored is to calculate the cyclomatic complexity of the code under test. High complexity values indicate code with many independent paths, which consequently require more effort to properly test. Cobertura has an edge here, as it reports cyclomatic complexity along with coverage numbers.

Uncovered Dependencies

Are you looking for full coverage of just your code, or of your code and its dependencies? Usually you consider libraries to be separate from your code, but is that a reasonable approach?

True story from a previous job: while running instrumented tests against our server (written in C++), we discovered memory leaks and access violations in code from a major commercial database vendor, as well as in code from another division of our own company. The former was met with “thank you, we'll investigate,” the latter with “you have no business poking around in that code?!?” While neither of these problems ultimately affected us, knowing of their existence meant that we could avoid code that triggered them.

Clearly, you can't justify writing tests for all the libraries that you use (although, if you're using open source, such tests would be welcomed). But the point remains: even if you reach 100% coverage of your own code, you won't necessarily be bug-free.

Exceptions

Coverage-driven testing tends to focus on the “happy path” through the tested code: does it do the right thing given expected input? Consider the following code, which expects a string in the form “foo:bar:baz” and extracts the middle element:

    public static String extractMiddle(String s)
    {
        int idx1 = s.indexOf(':');
        int idx2 = s.indexOf(':', idx1 + 1);
        return s.substring(idx1 + 1, idx2);
    }

    public void testExtractMiddle() throws Exception
    {
        assertEquals("bar", extractMiddle("foo:bar:baz"));
    }

This test gives 100% coverage, but completely ignores cases where the string is null, or where it's improperly formatted and idx1 is -1, causing substring() to throw an exception. Perhaps that's OK. After all, this code is probably deep within your application, and there's no good reason to exercise all of its failure modes — but at some point you had better test input validation, to ensure that only good strings get to this code.
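To make the unreported gap concrete, here are the tests that no coverage tool will ever ask for. The sketch repeats extractMiddle() and uses a thrown AssertionError in place of JUnit's fail(), so that it stands alone; the expected exception types follow from how indexOf() and substring() behave on bad input:

```java
public class ExtractMiddleFailureTest
{
    // repeated from above so that this example is self-contained
    public static String extractMiddle(String s)
    {
        int idx1 = s.indexOf(':');
        int idx2 = s.indexOf(':', idx1 + 1);
        return s.substring(idx1 + 1, idx2);
    }

    public void testNullInput() throws Exception
    {
        try
        {
            extractMiddle(null);                 // s.indexOf() dereferences null
            throw new AssertionError("expected NullPointerException");
        }
        catch (NullPointerException expected) { /* success */ }
    }

    public void testMissingDelimiters() throws Exception
    {
        try
        {
            extractMiddle("foobarbaz");          // idx1 == -1, idx2 == -1,
                                                 // so substring(0, -1) throws
            throw new AssertionError("expected StringIndexOutOfBoundsException");
        }
        catch (StringIndexOutOfBoundsException expected) { /* success */ }
    }
}
```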

Missing Features

And that brings me to the final problem: a coverage tool can tell you to write more tests, but it can't tell you to write more mainline code. Or put another way, if a missing feature is never tested, you'll never know — until the code reaches production.

What you need is some way to determine how well your tests cover the application's specifications. The test-driven-development contingent will respond that tests are the specifications: if there isn't a test for a feature, that feature doesn't exist. Unfortunately, in a large application, where you may have thousands of unit tests, it's remarkably hard to identify missing features. Worse, you may think that you have a feature tested, when in fact it's only tested for a subset of the possible application states.

Ultimately, this comes back to the Independent Path Problem, and there's no good solution to that problem save careful thought. While test-as-specification may be appropriate at the level of a single class or set of interacting classes, it's not appropriate at the level of an application. And acceptance tests, while closer to a usable specification, don't have the granularity of good unit tests; they especially tend to miss boundary conditions and exceptional behavior.

Conclusion

As I said at the top of this article, I think coverage tools are great. A big red block in the middle of your code is a strong incentive to write more tests. But more important, from the perspective of quality code, is to think about what your code is doing and put time into ensuring that your tests fully exercise it, especially at the boundaries. Because 100% coverage isn't that important when you get a midnight phone call asking why your code is throwing NullPointerException.

For More Information

“Uncle Bob” Martin has several articles on code coverage, cyclomatic complexity, and (of course) test-driven development. I'll leave explanations of “CRAP” to him.

There aren't many examples in this article, and the ones that exist are meant for shock value rather than education. If you feel the need to shock someone, here they are:

I gave a presentation on this material at the Philadelphia Java Users Group, in the fall of 2009. It doesn't say anything that you haven't already read, but it's one of the better-looking presentations that I've done. You can find the slides here.

Copyright © Keith D Gregory, all rights reserved