Refactoring Unit Tests

Originally published: 2009-05-18

Last updated: 2015-04-27

One of the great things about unit tests is that they allow you to refactor and restructure your code in safety, because you get immediate feedback when you break it. Because you can (and do!) constantly refactor your code, it's simpler, more modular, and easier to extend and maintain. But what about the tests themselves? Over time, you will develop a testing codebase that may be larger than the mainline code. Should you — can you — refactor those tests?

The answer, of course, is a loud YES! In fact, you'll probably find more opportunities for refactoring in tests than in mainline code. Unit tests tend to be very repetitive, exploring different aspects of the same class. As a result, they'll use similar setup, execution, and teardown, often changing just a parameter or two.

After refactoring, your tests will better convey exactly what you're trying to test. There's a hidden prize, however: in many cases you'll discover or create methods that can be reused. For example, you might write an assertion that uses XPath to verify the contents of a generated XML document. When you find such code, pull it out into a “testing library” that's available for your next project.

This article describes my approach to unit test refactoring. Its examples are written with JUnit 3, not JUnit 4. I continue to use JUnit 3 for my personal work, and I suspect many companies are still using it as well. Plus, there's a large body of JUnit 3 test suites that could benefit from refactoring. The techniques, however, apply equally well to JUnit 4.

A Case Study

I'm currently working on a Forth interpreter project. This is an almost too-perfect study for testcase refactoring, because Forth is built around “words” that make some change in a simple environment. As a result, all tests follow the same pattern: set up the context, execute the word(s), assert that the context was changed appropriately.

The need for refactoring became apparent after approximately 80 tests, or roughly 1700 lines of code (including comments and whitespace). With this many tests, the testing structure that worked very well at the start of the project was showing strain: although the tests remained relatively small and self-contained, their logic was hidden under a mass of contextual code.

For example: the following test verifies that the Compiler word properly creates a new definition. It is a black-box test: rather than look inside the new definition, it verifies that the defined word produces the expected output. Once you know how it works, the test is readable. I would prefer, however, that you didn't need that background knowledge to understand it.

    public void testCompiler() throws Exception
    {
        // note: ":" would already be consumed
        StringReader in = new StringReader("argle foo bar ;");
        StringWriter out = new StringWriter();
        Context ctx = new Context(in, out);
        ctx.dictionary().store(new LoggingWord("foo"))
                        .store(new LoggingWord("bar"));

        new Compiler().execute(ctx);
        Word word = ctx.dictionary().lookup("argle");
        word.execute(ctx);

        String result = out.toString();
        int idx1 = result.indexOf("foo");
        int idx2 = result.indexOf("bar");
        assertTrue(idx1 >= 0);
        assertTrue(idx2 > idx1);
    }

The Approach

The goal of refactoring unit tests is slightly different from refactoring mainline code. For the latter, your goal is to modularize the codebase and eliminate tightly coupled relationships. For unit tests, those goals are secondary to creating simple, human-readable tests.

As a result, the number of appropriate refactorings is lower: I find that Extract Method and Introduce Explanatory Variable are pretty much all that I use. I'd further suggest that, if you find yourself turning to any of the refactorings that involve encapsulation (such as Extract Class or Replace Conditional With Polymorphism), these are signs that you need more refactoring in your mainline code — you should spend time making that code easier to test and use, rather than making your tests use it more easily.


One of my better testing habits is to create an AbstractTestCase class at the start of every project, and have all tests derive from it. This gives you a convenient place to put refactored methods, along with common objects such as a logger instance.

My one caution is to keep this base class from becoming a dumping ground for everything that you think you might want to share. When you refactor code, first refactor within the test class itself. Think about moving refactored code to the base class only when you see the same refactorings happening in more than one test class. And in a large system, don't be afraid to create several layers of abstract classes: inheritance is a tool; use it when it makes your life easier.

Extract Method

In the Extract Method refactoring, you identify repeated pieces of code and replace them with a method call. For example, many of the Forth interpreter tests have to push values onto the operand stack before executing the word under test. The easiest way to do this is to execute the same objects that the interpreter would:

    new NumericLiteral(12).execute(ctx);
    new NumericLiteral(13).execute(ctx);

This code works, but it isn't very readable: the values that we're pushing are lost in the boilerplate code that creates a “word” object and executes it. Extracting that boilerplate into a helper method that takes just the value vastly improves readability.

Even better is to recognize that you'll often be pushing multiple values onto the stack, and use varargs:

    pushValues(12, 13);
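A minimal sketch of what such helpers might look like, written against a hypothetical Deque<Integer> rather than the interpreter's Context, which this article doesn't show. The single-value name pushValue() is an assumption for illustration; in the real tests each value would be pushed by executing a NumericLiteral against the shared context, and the helpers would live in AbstractTestCase.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a test base class; a Deque plays the role of
// the interpreter's operand stack.
class PushValuesSketch {
    protected final Deque<Integer> operandStack = new ArrayDeque<>();

    // Extract Method: callers state only the value being pushed.
    // (A real helper would execute new NumericLiteral(value) against the context.)
    protected void pushValue(int value) {
        operandStack.push(value);
    }

    // Varargs variant: values are pushed left to right, so the last argument
    // ends up on top, as if the interpreter had read them in that order.
    protected void pushValues(int... values) {
        for (int value : values) {
            pushValue(value);
        }
    }
}
```

After pushValues(12, 13), the stack holds two values with 13 on top.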

Introduce Explanatory Variable

This refactoring creates a temporary variable whose only purpose is to encapsulate some complex piece of code for readability. You can see an example of this in the (unrefactored) test at the top of this article: it verifies that an output string contains the words “foo” and “bar,” in that order. The “idx” variables are explanatory variables: they indicate that we're comparing the index of the two substrings.

The alternative, without explanatory variables, is below. At first glance, this may seem simpler; it's definitely fewer lines of code. However, it quickly becomes unreadable when you build long chains of assertions that use the same base information. For example, what if you wanted to test that the output was “foo bar bar foo”?

    assertTrue(result.indexOf("foo") >= 0);
    assertTrue(result.indexOf("bar") > result.indexOf("foo"));
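To sketch one answer to the question above: with explanatory variables, each position gets a name, and each search starts just past the previous match, so the chain stays readable as it grows. The class and the check() helper are hypothetical; check() stands in for JUnit's assertTrue(String, boolean) so the example runs without JUnit.

```java
// Explanatory variables for a longer chain: verifying that the output
// contains "foo", "bar", "bar", "foo" in that order.
class ExplanatoryVariableSketch {
    // Stand-in for JUnit's assertTrue(String, boolean).
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    static void assertFooBarBarFoo(String result) {
        // Each search starts one character past the start of the previous
        // match, and each variable names the position it holds.
        int firstFoo  = result.indexOf("foo");
        int firstBar  = result.indexOf("bar", firstFoo + 1);
        int secondBar = result.indexOf("bar", firstBar + 1);
        int secondFoo = result.indexOf("foo", secondBar + 1);

        // Checked in order, so the first failure reports the missing piece.
        check(firstFoo >= 0, "no \"foo\" in: " + result);
        check(firstBar > firstFoo, "no \"bar\" after first \"foo\" in: " + result);
        check(secondBar > firstBar, "no second \"bar\" in: " + result);
        check(secondFoo > secondBar, "no \"foo\" after second \"bar\" in: " + result);
    }
}
```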

Replace Literal by Constant

This isn't one of Fowler's documented refactorings, probably because it's a standard coding practice. However, I'm constantly amazed at how often tests have literal values sprinkled through them, even tests that I wrote, and I'm a fanatic about using constants. I think this comes from the immediacy of test writing: you are writing one test, with specific values, and don't think about how those values will be used elsewhere.

There are, of course, times when it's reasonable to use literals. The mathematical example above would be less readable if I were to introduce variables (or worse, class-level constants). However, most tests benefit: as a rule of thumb, if a test refers to the same literal value in both setup and assertion, give it a name:

    // don't do this
    String resultA = someOperation("foo");
    // instead do this
    final String value = "foo";
    String resultB = someOperation(value);

Another rule of thumb: if you find yourself reusing the same literals, define them as class-level constants. I find that this drives my tests toward a consistent structure: for example, an XML testcase might define constants EL_ROOT, EL_CHILD_1, and so on for element names. It's then a matter of copying — and refactoring — boilerplate code that creates and asserts the generated XML.
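A small sketch of that structure using the JDK's DOM API. The toXml() method is a hypothetical stand-in for the code under test, and the names and values are illustrative; the point is that each literal is spelled once and shared by setup and assertions.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XmlConstantsSketch {
    // Class-level constants: each literal is spelled once and shared by
    // the setup and the assertions of every test in the class.
    static final String EL_ROOT = "root";
    static final String EL_ITEM = "item";
    static final String VALUE_1 = "alpha";
    static final String VALUE_2 = "beta";

    // Hypothetical code under test: wraps each value in an item element.
    static Document toXml(String rootName, String itemName, String... values) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                                                 .newDocumentBuilder()
                                                 .newDocument();
            Element root = doc.createElement(rootName);
            doc.appendChild(root);
            for (String value : values) {
                Element item = doc.createElement(itemName);
                item.setTextContent(value);
                root.appendChild(item);
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // The test: setup and assertions refer to the same named constants.
    static void testToXml() {
        Document doc = toXml(EL_ROOT, EL_ITEM, VALUE_1, VALUE_2);
        Element root = doc.getDocumentElement();
        check(EL_ROOT.equals(root.getTagName()), "root element");
        NodeList items = root.getElementsByTagName(EL_ITEM);
        check(items.getLength() == 2, "item count");
        check(VALUE_1.equals(items.item(0).getTextContent()), "first item");
        check(VALUE_2.equals(items.item(1).getTextContent()), "second item");
    }
}
```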

That said, do not reuse constants defined by the class under test. Your tests exist to validate that this class does not unexpectedly change its interface, and defined constants are part of that interface. It may be more work to define your own set of constants, but it checks that production code won't break unexpectedly.
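A sketch of the difference, using a hypothetical Greeter class whose public constant is part of its interface. The test keeps its own copy of the expected value, so a change to Greeter.DEFAULT_NAME makes the test fail instead of silently following along.

```java
// Hypothetical class under test: its public constant is part of its interface.
class Greeter {
    public static final String DEFAULT_NAME = "world";

    static String greet(String name) {
        return "Hello, " + (name == null ? DEFAULT_NAME : name);
    }
}

// The test defines its own constant instead of reusing Greeter.DEFAULT_NAME.
// If someone changes that constant's value, this test fails; a test that
// referenced Greeter.DEFAULT_NAME would change along with it.
class GreeterTest {
    private static final String EXPECTED_DEFAULT_NAME = "world";

    static void testDefaultGreeting() {
        String expected = "Hello, " + EXPECTED_DEFAULT_NAME;
        String actual = Greeter.greet(null);
        if (!expected.equals(actual)) {
            throw new AssertionError("expected: " + expected + "; was: " + actual);
        }
    }
}
```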

Where to Refactor

A good refactoring will reduce complexity and eliminate duplication. The former comes from taking a complex block of code and giving it a name; it can be done anywhere. The latter comes from taking code that is repeated, perhaps with slight differences, and creating a canonical version that parameterizes those differences. In JUnit tests, duplication tends to happen in two areas: setup/teardown and assertions.

Setup and Teardown

All tests follow the same pattern: set up initial conditions, execute some operation, and assert that you got the expected results. This pattern is embedded into the design of junit.framework.TestCase, which provides a setUp() method that's called before each test method. Perhaps paradoxically, I find that I rarely use the built-in setup and teardown methods, preferring instead to do setup within the body of the test.

My reasoning is that a true unit test should focus on a limited aspect of the code being tested. As a result, setup code tends to be specific to the test. That doesn't mean that there isn't an opportunity to share code, just that the opportunity doesn't fit into JUnit's setUp() method.

For the Forth interpreter, each of the test cases needs to create a Context object. However, some testcases (those for the interpreter and compiler) need to read input or access words already in the dictionary, while the rest don't. The answer is a pair of factory methods in AbstractTestCase: one for an empty Context, one for a preloaded Context (note that I've also moved the output object into a member variable):

    protected void createContext()
    {
        createContext("");
    }

    protected void createContext(String in, Word... toStore)
    {
        _out = new StringWriter();
        _ctx = new Context(new StringReader(in), _out);
        for (Word word : toStore)
            _ctx.dictionary().store(word);
    }

This reduces the per-test setup to something resembling the following. It also improved my mainline code: I originally had a no-argument constructor for Context, which was used only for testing. By creating the factory method, I had the same functionality for tests but was able to keep my mainline code aligned with real-world use.

    createContext("argle foo bar ;",
                  new LoggingWord("foo"),
                  new LoggingWord("bar"));

Assertions

Assertions represent one of the best areas for refactoring: indeed, this article was originally about writing custom assertions. In the case of the Forth interpreter, most assertions revolve around the contents of the stack. For example, the math operations started out with assertions like these:

    assertEquals(1, ctx.operandStack().size());
    assertEquals(25, ctx.operandStack().pop().getValue());

This asserts that the stack contains a single item, and that its value is 25. A lot of typing, particularly when it's replicated throughout a test class. Taking a hint from the pushValues() method described above, we can refactor this into assertStack():

    assertStack(25);
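One possible shape for assertStack(), sketched against a hypothetical list-backed stack rather than the interpreter's real Context. It assumes expected values are listed bottom to top, mirroring pushValues(), and it throws AssertionError directly where the real method would call JUnit's assertions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the interpreter's operand stack, stored as a
// list from bottom to top.
class AssertStackSketch {
    private final List<Integer> operandStack = new ArrayList<>();

    void pushValues(int... values) {
        for (int value : values) {
            operandStack.add(value);
        }
    }

    // Asserts the whole stack, listed bottom to top so the arguments read
    // the same way as pushValues(). Like the original assertions, it checks
    // the size first, which catches leftover values.
    void assertStack(int... expected) {
        if (operandStack.size() != expected.length) {
            throw new AssertionError("expected " + expected.length
                                     + " value(s) on stack; was: " + operandStack);
        }
        for (int i = 0; i < expected.length; i++) {
            int actual = operandStack.get(i);
            if (actual != expected[i]) {
                throw new AssertionError("stack[" + i + "]: expected " + expected[i]
                                         + "; was: " + operandStack);
            }
        }
    }
}
```

A math-word test then reduces to pushValues(), execute, and a single assertStack() call.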

For another example, look back at testCompiler(): we have a lot of code to verify that particular pieces of text appear in the output, in a given order. This is also a prime candidate for refactoring (and makes use of the fact that our output is held in a member variable):

    assertOutputContains("foo", "bar");

You've Got Tools, Use Them

While the assertOutputContains() method is fine as far as it goes, that isn't very far: it basically takes the original index-based assertions, and puts them in a loop. However, we've got another tool that can do the same thing: a regular expression.


The original implementation of assertOutputContains() isn't worth showing; the new implementation is a mere three lines of code:

    protected void assertOutputContains(String regex)
    {
        String output = getOutput();
        assertTrue("expected: " + regex + "; was: " + output,
                   Pattern.compile(regex).matcher(output).find());
    }

Whenever you find yourself writing complex code to perform an assertion, particularly when it involves text, ask yourself if there's a tool that can make the code simpler. Regular expressions for text, XPath for XML, even the classes in java.text; they're all tools that can make your life easier.
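As an illustration of the XPath case, here is a sketch of a string-valued XPath assertion built only on the JDK's javax.xml.xpath and DOM APIs. The class and method names are hypothetical; this is not the API of any particular assertion library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathAssertSketch {
    // Parses a string into a DOM; a real test would usually get the
    // Document from the code under test.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                                         .newDocumentBuilder()
                                         .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("unable to parse: " + xml, e);
        }
    }

    // Evaluates the expression as a string (XPath's string() conversion)
    // and compares it with the expected value.
    static void assertXPath(String expected, Document doc, String xpath) {
        try {
            String actual = XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
            if (!expected.equals(actual)) {
                throw new AssertionError(xpath + ": expected \"" + expected
                                         + "\"; was \"" + actual + "\"");
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + xpath, e);
        }
    }
}
```

One line such as assertXPath("2", doc, "count(/root/child)") replaces a loop over child nodes.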

Assertion Libraries

As you're refactoring assertions, chances are very good that you'll find one or more that aren't specific to your project. For example, asserting a regular expression is going to be useful for any project that manipulates text. If you don't already have a cross-project “commons” library, it's time to create one and put these assertions in it. Over time, this library will grow into an asset that you'll rely on when you start a test suite.

When implementing this library, be aware that your custom assertions will have to explicitly import the “standard” assertions from junit.framework.Assert, and you'll have to explicitly import your assertions into your test cases. The JDK 1.5 static import directive will make this easier. I strongly suggest that you do not follow the JUnit approach, where all testcases inherit from your custom assertion class, in part because you'll almost certainly want multiple such classes.
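A sketch of such a library class, holding the regular-expression assertion as a static method. The class name TextAsserts and the package mentioned below are hypothetical. It's a final class with a private constructor rather than a TestCase subclass, and it throws plain AssertionError to stay dependency-free; a JUnit 3 version might delegate to junit.framework.Assert instead.

```java
import java.util.regex.Pattern;

// Cross-project assertion library: static methods only, meant to be used
// through "import static" rather than inherited. In a real project this
// would be a public class in its own file and shared package.
final class TextAsserts {
    private TextAsserts() {
        // utility class; not instantiable
    }

    // Passes if the regex matches anywhere in the actual text. find() gives
    // "contains" semantics; Matcher.matches() would require the whole string.
    public static void assertContainsMatch(String regex, String actual) {
        if (actual == null || !Pattern.compile(regex).matcher(actual).find()) {
            throw new AssertionError("expected: " + regex + "; was: " + actual);
        }
    }
}
```

A test would then use something like import static com.example.testing.TextAsserts.assertContainsMatch; (hypothetical package) and write assertContainsMatch("foo.*bar.*bar.*foo", output). If the output spans several lines, note that "." doesn't match line terminators by default; the embedded flag (?s), equivalent to Pattern.DOTALL, changes that.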


I take a very narrow view of refactoring: it re-arranges code, but preserves the overall behavior. Sometimes, however, you realize that there's a better way to do something. Often, this realization comes about because you've eliminated distracting code by refactoring.

In the case of my Forth interpreter, I had this sort of realization about LoggingWord, a mock object that would record its invocation in the output stream. When I originally created this object, I wrote not only its name but also its identity hashcode: my intent was to allow assertions on the actual instance, not just the name. This decision drove the complex positional output assertions, and turned out to be a YAGNI (You Ain't Gonna Need It) violation: I never wrote a test that needed to differentiate between instances. After removing the superfluous instance ID, output assertions became simple string comparisons.
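A sketch of a logging mock in its simplified form, writing only its name. The nested Word interface is a hypothetical stand-in: the interpreter's real words execute against a Context, while this one takes a StringWriter directly so the example is self-contained.

```java
import java.io.StringWriter;

class LoggingWordSketch {
    // Hypothetical minimal word interface; the interpreter's real words
    // execute against a Context rather than a bare StringWriter.
    interface Word {
        String name();
        void execute(StringWriter out);
    }

    // Mock word that records each invocation by writing only its name.
    // The earlier design also wrote the identity hashcode, which forced
    // positional assertions; with names alone, output compares as a string.
    static class LoggingWord implements Word {
        private final String name;

        LoggingWord(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public void execute(StringWriter out) {
            out.write(name);
            out.write(" ");
        }
    }
}
```

With this version, executing “foo” and then “bar” leaves exactly "foo bar " in the writer (the trailing space is a detail of this sketch), so a test can use assertEquals() instead of searching for positions.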

The risk of such restructuring is that you're changing the meaning of the test. As long as the test still passes against unchanged mainline code, this should not be a major issue. However, you should carefully consider any such changes to make sure that you're not simplifying tests to the point where they leave a hole for bugs to walk through (coverage tools can help here).

Closing Thoughts

Here's the testCompiler() method post-refactoring (and with another assertion added). The overall lines of code have not decreased dramatically, in part due to my indentation style. However, it is far more readable: you're no longer distracted by boilerplate code that creates the input and output streams, nor do you have to puzzle through a mass of indexOf()s that assert the output.

    public void testCompiler() throws Exception
    {
        // ":" would already be consumed
        createContext("argle foo bar ;",
                      new LoggingWord("foo"),
                      new LoggingWord("bar"));

        execute(new Compiler());
        Word word = getContext().dictionary().lookup("argle");
        assertStringValues(word, "argle", ": argle foo bar ;");
    }


For More Information

Everybody should own a copy of Martin Fowler's Refactoring, even if they never read it. Just having it sitting on the shelf will be an incentive to keep your code well-factored.

The Forth interpreter used as an example is still a work in progress; it will be linked here once complete.

The Practical XML library contains an example “common assertions” class, DomAsserts, which uses XPath to assert various characteristics of an XML document. This class (and its unit tests) demonstrates how an assertion looks when not contained within a subclass of junit.framework.TestCase.

Copyright © Keith D Gregory, all rights reserved
