Testing the code means verifying that the code works correctly and as expected. But how do you know the tests evaluating your code are good enough or provide adequate coverage? This is where mutation testing comes into play.
Mutation testing helps assess the quality of a test suite by verifying whether your tests are effective and can detect bugs as intended. It is especially useful for enterprise Java applications with thousands of unit tests, as it enables developers to quickly identify code that is not properly checked by tests.
In this article, we will discuss the key concepts, metrics, and best practices of mutation testing and learn how to perform it for Java applications using the PITest library.
The code used in this article is available on GitHub.
What is mutation testing?
Mutation testing is a technique that introduces bugs into the code to see whether the existing tests can detect them. If none of the tests fails after a bug is introduced, the bug went undetected, and the test suite is inadequate.
Mutation testing shows which parts of the code are not covered by tests, but it is not the same as line coverage.
Line coverage is a code coverage metric that measures the percentage of code lines executed during the tests, but it says nothing about the quality of the tests. Branch coverage measures the percentage of branches executed, most typically the true and false outcomes of if statements. Mutation coverage is closer to branch coverage but gives even more information: it shows whether the tests actually fail when the behavior of the code changes.
Let’s look at the following example. We have a simple Task class:
@Entity
@Table(name = "tasks")
public class Task {
@Id
@Column(name = "id", nullable = false)
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@Column(name = "name", length = 150)
private String name;
@Column(name = "start_time")
@DateTimeFormat(pattern = "yyyy-MM-dd HH:mm")
private LocalDateTime start;
@Column(name = "end_time")
@DateTimeFormat(pattern = "yyyy-MM-dd HH:mm")
private LocalDateTime end;
}
In the TaskService class, we have only one method, which checks whether the time slot of a new task overlaps with that of an existing task:
public boolean isOverlap(LocalDateTime newStart,
LocalDateTime newEnd, Task task) {
return !(newEnd.isBefore(task.getStart())
|| newStart.isAfter(task.getEnd()));
}
Suppose we have written this test to verify that it functions correctly:
@Test
void shouldFindOverlap() {
Task task = new Task(
"Make lunch",
"2025-02-01 12:00", "2025-02-01 12:30");
LocalDateTime newStart = DateTimeParser.parseToLocalDateTime("2025-02-01 11:00");
LocalDateTime newEnd = DateTimeParser.parseToLocalDateTime("2025-02-01 12:15");
assertTrue(taskService.isOverlap(newStart, newEnd, task));
}
So far, we have 100% line coverage (but not 100% branch coverage) for TaskService, as we have executed all lines of its only method. But this is by no means a good test suite. We don’t know what happens if there’s no time overlap. We also can’t be sure that the test will fail if somebody breaks the logic of the method.
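To see what the single test leaves unchecked, here is a minimal plain-Java sketch of the same isOverlap() expression. It assumes a simplified Slot record in place of the JPA Task entity. Slot, OverlapCases, and the at() helper are hypothetical names used for illustration only. The sketch exercises the cases the test above misses: a new task entirely before or after the existing one, and a new task that only touches its boundary.

```java
import java.time.LocalDateTime;

// Hypothetical stand-in for the Task entity: only the two fields isOverlap() reads
record Slot(LocalDateTime start, LocalDateTime end) {}

final class OverlapCases {
    // Same expression as TaskService.isOverlap(): the slots overlap unless the new
    // task ends strictly before the existing one starts or starts strictly after it ends
    static boolean isOverlap(LocalDateTime newStart, LocalDateTime newEnd, Slot task) {
        return !(newEnd.isBefore(task.start()) || newStart.isAfter(task.end()));
    }

    // Helper: a time on 2025-02-01, the date used in the article's tests
    static LocalDateTime at(int hour, int minute) {
        return LocalDateTime.of(2025, 2, 1, hour, minute);
    }

    public static void main(String[] args) {
        Slot lunch = new Slot(at(12, 0), at(12, 30));
        // Partial overlap: the only case shouldFindOverlap() checks
        System.out.println(isOverlap(at(11, 0), at(12, 15), lunch)); // true
        // New task entirely before the existing one
        System.out.println(isOverlap(at(10, 0), at(11, 0), lunch));  // false
        // New task entirely after the existing one
        System.out.println(isOverlap(at(13, 0), at(14, 0), lunch));  // false
        // New task ends exactly when the existing one starts: isBefore() is strict,
        // so back-to-back tasks still count as overlapping
        System.out.println(isOverlap(at(11, 0), at(12, 0), lunch));  // true
    }
}
```

Whether back-to-back tasks should conflict is a design decision. Until a test pins that boundary down, the behavior can change without any test noticing.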
Here is another example of a poor-quality test, one that always passes:
import static org.mockito.Mockito.*;
@Test
public void testThatDoesNothing() {
// Create a mock object
TaskService mockService = mock(TaskService.class);
// Set up a method to do nothing when called
doNothing().when(mockService).deleteTask(1L);
// Call the method on the mock, not on the real service
mockService.deleteTask(1L);
// Verify that the mock received the call made one line above
verify(mockService).deleteTask(1L);
}
This test compiles and passes every time, but it checks nothing. It calls deleteTask() on a mock and then verifies that the mock received that very call, so the real TaskService code is never executed. (A related slip is writing verify(mockService.deleteTask(1L)) instead of verify(mockService).deleteTask(1L); when deleteTask() returns void, that version does not even compile.) Such mistakes are easy to make, so there is a non-zero risk that specimens like this lurk among hundreds or thousands of unit tests in an enterprise application.
Mutation testing aims to reveal such weak spots in your tests by introducing mutations.
What is a mutation
A mutation is a slight change made to the program’s code to introduce a bug or regression. These altered code versions are called mutants. After the mutations have been inserted, the mutated versions of the program are run with the existing test suite.
If at least one test fails, the mutant is killed. If all tests pass, the mutant has survived. In the latter case, the test suite should be improved to kill the remaining mutants; in most cases, this means adding the missing tests for the code that wasn’t tested.
Note that mutation testing does not aim to verify the quality of the code. Its goal is to assess the quality of the test suite.
Types of mutation
Mutants are typically very small and can be divided into several types:
- Statement mutations: statements are changed or removed. For example, sum += i becomes sum = i;
- Value mutations: a value is changed. For instance, int age = 18 becomes int age = 16;
- Operator mutations: arithmetic, relational, or logical operators are changed. For instance, a && b becomes a || b, or a > b becomes a < b.
Let’s look at the isOverlap() method again. Suppose we replace the boolean expression it returns with true:
public boolean isOverlap(LocalDateTime newStart,
LocalDateTime newEnd, Task task) {
return true;
}
We introduced this mutation manually as an example. Our only test, shouldFindOverlap(), expects true, so it still passes and the mutant survives. Of course, you shouldn’t create mutants yourself or change the production code. There are tools, which we will discuss in more detail below, that automatically create mutated copies of the code and provide detailed reports on the testing results.
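To make "killed" and "survived" concrete, the following toy harness runs two checks against the original isOverlap() logic and against the hand-made return true mutant. This is not how PITest works internally: PITest mutates bytecode and reruns your real tests. TimeSlot, OverlapCheck, MutantDemo, and the extra SHOULD_NOT_FIND_OVERLAP check are hypothetical names introduced for this sketch.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical simplified Task with only the two time fields
record TimeSlot(LocalDateTime start, LocalDateTime end) {}

// The overlap check as a functional interface, so that the original
// implementation and a mutant can be swapped freely
interface OverlapCheck {
    boolean isOverlap(LocalDateTime newStart, LocalDateTime newEnd, TimeSlot task);
}

final class MutantDemo {
    static LocalDateTime at(int hour, int minute) {
        return LocalDateTime.of(2025, 2, 1, hour, minute);
    }

    static final TimeSlot LUNCH = new TimeSlot(at(12, 0), at(12, 30));

    // Original logic from TaskService.isOverlap()
    static final OverlapCheck ORIGINAL =
            (s, e, t) -> !(e.isBefore(t.start()) || s.isAfter(t.end()));

    // The manual mutant: the whole expression replaced with true
    static final OverlapCheck RETURNS_TRUE = (s, e, t) -> true;

    // Each "test" returns true when the implementation behaves as expected
    static final Predicate<OverlapCheck> SHOULD_FIND_OVERLAP =
            c -> c.isOverlap(at(11, 0), at(12, 15), LUNCH);
    // Hypothetical extra test for the non-overlapping case
    static final Predicate<OverlapCheck> SHOULD_NOT_FIND_OVERLAP =
            c -> !c.isOverlap(at(13, 0), at(14, 0), LUNCH);

    // A mutant is killed if at least one test in the suite fails against it
    static boolean isKilled(OverlapCheck impl, List<Predicate<OverlapCheck>> suite) {
        return suite.stream().anyMatch(test -> !test.test(impl));
    }

    public static void main(String[] args) {
        List<Predicate<OverlapCheck>> original = List.of(SHOULD_FIND_OVERLAP);
        List<Predicate<OverlapCheck>> improved =
                List.of(SHOULD_FIND_OVERLAP, SHOULD_NOT_FIND_OVERLAP);
        System.out.println(isKilled(RETURNS_TRUE, original)); // false: the mutant survives
        System.out.println(isKilled(RETURNS_TRUE, improved)); // true: the new test kills it
        System.out.println(isKilled(ORIGINAL, improved));     // false: the real code passes both
    }
}
```

The last line matters too. A mutation run only makes sense when the unmutated code passes the whole suite.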
Key metrics of mutation testing
Mutation score is used to measure the effectiveness of the test suite. It is the number of killed mutants divided by the total number of mutants, multiplied by 100:
Mutation score = (killed mutants / total mutants) × 100%
The mutation score of 100% means that all mutants were killed, and the test suite is effective.
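The formula is easy to check by hand. Here is a minimal sketch; MutationScore and percent() are hypothetical names. The sample figures (24 mutants with 9 killed, and 11 mutants with 7 killed) come from the PITest report examined later in this article, which rounds them to 38% and 64%.

```java
import java.util.Locale;

final class MutationScore {
    // Mutation score = killed mutants / total mutants * 100
    static double percent(int killed, int total) {
        if (total <= 0 || killed < 0 || killed > total) {
            throw new IllegalArgumentException("expected total > 0 and 0 <= killed <= total");
        }
        return 100.0 * killed / total;
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps the decimal separator a dot regardless of system settings
        System.out.println(String.format(Locale.ROOT, "%.1f%%", percent(9, 24))); // 37.5%
        System.out.println(String.format(Locale.ROOT, "%.1f%%", percent(7, 11))); // 63.6%
    }
}
```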
Note that some mutants may turn out to be equivalent mutants: syntactic changes that behave exactly like the original program. For instance, look at the following snippet:
public int compare(Item item1, Item item2) {
if (item1.price > item2.price) {
return 1;
} else if (item1.price < item2.price) {
return -1;
} else {
return 0;
}
}
If we change the return value from 1 to 2 (or from -1 to -2), the program’s behavior does not change. The Comparator contract only requires a negative integer, zero, or a positive integer, so callers such as sorting methods look only at the sign of the result.
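The equivalence is easy to demonstrate. Sorting with the original comparator and with a mutant that returns 2 and -2 produces the same order, because List.sort() only looks at the sign of each comparison. This is a minimal sketch that assumes Item is a simple record with name and price components; the original snippet only implies a price field. EquivalentMutantDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Assumed shape of Item: only the price matters for the comparison
record Item(String name, int price) {}

final class EquivalentMutantDemo {
    // Original comparison logic from the snippet above
    static final Comparator<Item> ORIGINAL = (a, b) -> {
        if (a.price() > b.price()) return 1;
        else if (a.price() < b.price()) return -1;
        else return 0;
    };

    // Value mutant: 1 -> 2 and -1 -> -2; the sign of every result is unchanged
    static final Comparator<Item> MUTANT = (a, b) -> {
        if (a.price() > b.price()) return 2;
        else if (a.price() < b.price()) return -2;
        else return 0;
    };

    // Sorts a copy so the input list is left untouched
    static List<Item> sorted(List<Item> items, Comparator<Item> cmp) {
        List<Item> copy = new ArrayList<>(items);
        copy.sort(cmp);
        return copy;
    }

    public static void main(String[] args) {
        List<Item> items = List.of(new Item("pen", 3), new Item("book", 12), new Item("mug", 7));
        // Both lines print the items ordered by price: pen, mug, book
        System.out.println(sorted(items, ORIGINAL));
        System.out.println(sorted(items, MUTANT));
        System.out.println(sorted(items, ORIGINAL).equals(sorted(items, MUTANT))); // true
    }
}
```

Only a test that asserts the exact value returned by compare(), rather than its sign, could tell the two versions apart. Such a test would rely on more than the Comparator contract guarantees.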
Equivalent mutants can’t be killed, so you have to either rewrite the code (if they can potentially lead to bugs) or exclude them from the overall mutation score.
Why you should use mutation testing
Mutation testing is a good but often overlooked practice that should be taken on board as an integral part of the development/testing processes for the following reasons:
- As demonstrated above, line coverage is not the best indicator of test quality. Mutation testing helps find flaws in your test suite.
- When you introduce changes to the program, mutation testing helps to see whether the tests still function correctly.
- When you start working on an existing project, mutation testing can help you rapidly assess the quality of its test suite.
- Sometimes, your program may include code that is difficult to test with unit tests; mutation testing can help you with such code.
- Mutation testing helps to make testing different scenarios, not just happy paths, a habit.
Advantages and disadvantages of mutation testing
Integration of mutation tests into the project has several advantages:
- Mutation tests can help you find missing tests and improve the reliability of the test suite;
- Enterprises can assess the quality of the existing test suite using the mutation score;
- Although mutation tests are not meant for checking code quality, the presence of equivalent mutants can, in some cases, point to potential weak spots in the code;
- Mutation testing helps to safely refactor the test suite after introducing changes to the code.
Possible disadvantages of mutation testing:
- Improving the mutation score may take a lot of time, depending on the size of the application and the test suite;
- Mutation testing becomes an additional step in the testing process, prolonging the overall testing time;
- Mutation testing without a specialized tool is time-consuming.
Using a tool for automatic mutation testing and integrating mutation tests at an earlier stage of project development can reduce the disadvantages.
Best practices of mutation testing
There are two key recommendations for efficient mutation testing:
- If possible, integrate mutation testing at an early stage of development to reduce the time spent on testing, assessing, and refactoring;
- Do not run the whole test suite with mutations every time. Use mutation tests for changed code parts to save time.
Enough theory: let’s kill some mutants!
How to perform Java mutation testing with PITest
In this tutorial on using mutation testing for Java applications, we will use an automated tool for mutation testing. Several mutation testing systems exist for Java applications, including PITest, Javalanche, µJava, and Jumble. We will use PITest.
What is PITest
PITest is an open-source tool for performing mutation testing on Java and Kotlin applications. It is fast, easy to use, and aimed at real-world applications. PITest integrates with common build tools such as Maven and Gradle, with JUnit, and with the most popular mocking frameworks.
PITest inserts mutations into the bytecode generated by the compiler, which is faster than working with source code. However, it can sometimes be hard to map mutations back to the source code.
After running the tests, PITest produces easy-to-understand reports with information about line coverage and mutation coverage.
Set up the project
For this tutorial, we will use a small Spring Boot demo application, snippets from which you have already seen above. The code is available on GitHub.
You can also use your own application; just make sure it has some unit tests. We are testing the tests, right?
Let’s see how PITest works with Spring Boot. Go to Spring Initializr, select Java 23, Maven, Jar, and a couple of dependencies: H2 Database and Spring Data JPA.
Generate the project and open it in your favorite IDE. Make sure that you have Java 23 installed: you can download Liberica JDK, which is recommended by Spring.
Now, let’s integrate PITest. You need to add the following plugin to pom.xml:
<plugin>
<groupId>org.pitest</groupId>
<artifactId>pitest-maven</artifactId>
<version>1.17.4</version>
<dependencies>
<dependency>
<groupId>org.pitest</groupId>
<artifactId>pitest-junit5-plugin</artifactId>
<version>1.2.1</version>
</dependency>
</dependencies>
</plugin>
That’s it, no more configurations required!
Nevertheless, you can configure the PITest tests as you see fit. We'll talk about it later.
Our application consists of a Task entity class:
@Entity
@Table(name = "tasks")
public class Task {
@Id
@Column(name = "id", nullable = false)
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@Column(name = "name", length = 150)
private String name;
@Column(name = "start_time")
@DateTimeFormat(pattern = "yyyy-MM-dd HH:mm")
private LocalDateTime start;
@Column(name = "end_time")
@DateTimeFormat(pattern = "yyyy-MM-dd HH:mm")
private LocalDateTime end;
//getters, setters, constructors
}
An auxiliary class, DateTimeParser:
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
public class DateTimeParser {
static final DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");
public static LocalDateTime parseToLocalDateTime(String date) {
return LocalDateTime.parse(date, dateTimeFormatter);
}
}
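The tests in this article create tasks with new Task("Make lunch", "2025-02-01 12:00", "2025-02-01 12:30"), so the elided constructors presumably include one that takes the dates as strings. Here is a minimal sketch of what it might look like, assuming it delegates to DateTimeParser. JPA annotations are omitted so the snippet compiles on its own, and SimpleTask is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Same parser as above, repeated so this snippet compiles on its own
final class DateTimeParser {
    static final DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    static LocalDateTime parseToLocalDateTime(String date) {
        return LocalDateTime.parse(date, dateTimeFormatter);
    }
}

// Hypothetical, annotation-free version of Task showing only the assumed
// String-based constructor used by the tests
final class SimpleTask {
    private final String name;
    private final LocalDateTime start;
    private final LocalDateTime end;

    SimpleTask(String name, String start, String end) {
        this.name = name;
        // Parse the "yyyy-MM-dd HH:mm" strings into LocalDateTime values
        this.start = DateTimeParser.parseToLocalDateTime(start);
        this.end = DateTimeParser.parseToLocalDateTime(end);
    }

    String getName() { return name; }
    LocalDateTime getStart() { return start; }
    LocalDateTime getEnd() { return end; }
}
```

For example, new SimpleTask("Make lunch", "2025-02-01 12:00", "2025-02-01 12:30").getStart() returns 2025-02-01T12:00. Keep in mind that a real JPA entity also needs a no-argument constructor, which may be protected.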
TaskRepository:
public interface TaskRepository extends JpaRepository<Task, Long> {
}
And finally, TaskService with business logic, which is quite simple. We can save the task to a database, but only if it doesn’t collide with other tasks in terms of schedule:
@Service
public class TaskService {
private final TaskRepository taskRepository;
public TaskService(TaskRepository taskRepository) {
this.taskRepository = taskRepository;
}
public boolean isOverlap(LocalDateTime newStart,
LocalDateTime newEnd, Task task) {
return !(newEnd.isBefore(task.getStart())
|| newStart.isAfter(task.getEnd()));
}
protected boolean checkSlots(LocalDateTime newStart, LocalDateTime newEnd) {
return taskRepository.findAll()
.stream()
.noneMatch(t -> isOverlap(newStart, newEnd, t));
}
public int createTask(Task newTask) {
if (newTask.getStart().isBefore(newTask.getEnd())
&& checkSlots(newTask.getStart(), newTask.getEnd())) {
taskRepository.save(newTask);
return 1;
}
return -1;
}
}
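To follow the createTask() logic without Spring or a database, here is a hypothetical in-memory variant in which a plain List stands in for TaskRepository. InMemoryTaskService and the Booking record are assumptions made for this sketch. It keeps the same rules as the service above: a task is saved only if it starts before it ends and does not overlap any stored task, and the method returns 1 or -1.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal task: a name plus a time range
record Booking(String name, LocalDateTime start, LocalDateTime end) {}

final class InMemoryTaskService {
    // Stands in for TaskRepository
    private final List<Booking> saved = new ArrayList<>();

    boolean isOverlap(LocalDateTime newStart, LocalDateTime newEnd, Booking task) {
        return !(newEnd.isBefore(task.start()) || newStart.isAfter(task.end()));
    }

    boolean checkSlots(LocalDateTime newStart, LocalDateTime newEnd) {
        return saved.stream().noneMatch(t -> isOverlap(newStart, newEnd, t));
    }

    int createTask(Booking newTask) {
        if (newTask.start().isBefore(newTask.end())
                && checkSlots(newTask.start(), newTask.end())) {
            saved.add(newTask);
            return 1;
        }
        return -1;
    }

    int size() {
        return saved.size();
    }

    public static void main(String[] args) {
        InMemoryTaskService service = new InMemoryTaskService();
        LocalDateTime noon = LocalDateTime.of(2025, 2, 1, 12, 0);
        // Empty store, valid range: saved
        System.out.println(service.createTask(new Booking("Make lunch", noon, noon.plusMinutes(30)))); // 1
        // Overlaps the stored lunch: rejected
        System.out.println(service.createTask(new Booking("Call", noon.plusMinutes(15), noon.plusMinutes(45)))); // -1
        // Ends before it starts: rejected regardless of other tasks
        System.out.println(service.createTask(new Booking("Oops", noon.plusHours(3), noon.plusHours(2)))); // -1
    }
}
```

The result of createTask() depends on what is already stored. The same is likely true of Spring-based tests such as shouldSaveTask() that share a database.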
We already have one test in the TaskServiceTest class. Let’s add one more to verify that we can successfully save the task:
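import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import java.time.LocalDateTime;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;
@SpringBootTest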
public class TaskServiceTest {
@Autowired
private TaskService taskService;
@Test
void shouldFindOverlap() {
Task task = new Task(
"Make lunch",
"2025-02-01 12:00", "2025-02-01 12:30");
LocalDateTime newStart = DateTimeParser.parseToLocalDateTime("2025-02-01 11:00");
LocalDateTime newEnd = DateTimeParser.parseToLocalDateTime("2025-02-01 12:15");
assertTrue(taskService.isOverlap(newStart, newEnd, task));
}
@Test
void shouldSaveTask() {
Task task = new Task(
"Make lunch",
"2025-02-01 12:00", "2025-02-01 12:30");
int val = taskService.createTask(task);
assertEquals(1, val);
}
}
That’s it, we can now set PITest loose on our test suite!
Run mutation tests
To run mutation tests, you need this command:
mvn test-compile org.pitest:pitest-maven:mutationCoverage
The mutationCoverage goal analyzes all classes that match the target classes and target tests.
PITest will automatically analyze the coverage, create mutants, and generate the reports you will find in the target directory under pit-reports.
You can also use the -DwithHistory flag to speed up repeated analysis of the same codebase. This flag saves the analysis data to a temporary directory, so it is good for local development. In other cases, you can use incremental analysis and point to the location of the mutation analysis results with historyInputLocation and historyOutputLocation, or, for Maven, with historyInputFile and historyOutputFile.
Analyze the results of mutation testing
The pit-reports directory includes several files: index.html with a summary of testing results and files with a more detailed overview of inserted mutations into each class.
Let’s first look at index.html.
PITest Summary report
As you can see, our mutation coverage is far from perfect at 38%: 24 mutants were introduced, and only 9 were killed. This is mostly because we didn’t write any tests for the Task class.
But let’s focus on the TaskService class. The mutation score for this class is better: 64%, with 11 mutants injected and seven killed. Let’s open the TaskService.java.html file to see where the surviving mutants were planted.
TaskService testing results
Here’s our code highlighted with four colors:
- light green shows line coverage,
- dark green shows mutation coverage,
- light red shows lack of line coverage, and
- dark red shows lack of mutation coverage.
If you scroll down a bit, you will see the summary of mutants that were killed or have survived and where they were planted.
Description of mutants used in tests
The report also lists types of mutations used:
- CONDITIONALS_BOUNDARY
- EMPTY_RETURNS
- FALSE_RETURNS
- INCREMENTS
- INVERT_NEGS
- MATH
- NEGATE_CONDITIONALS
- NULL_RETURNS
- PRIMITIVE_RETURNS
- TRUE_RETURNS
- VOID_METHOD_CALLS
The full list of available mutators can be found in the PITest documentation.
Let’s try to improve our mutation score!
Improve the mutation score
For the sake of brevity, let’s concentrate on the TaskService class coverage.
On line 29, the boolean return value was replaced with false. On line 35, the conditional was negated. In addition, we have no coverage for line 39.
Let’s add one more test to see if it kills the rest of the mutants. This time, we’ll verify that we can’t save the task if there are no available slots.
@Test
void shouldNotSaveTaskIfOverlap() {
Task task = new Task(
"Make lunch",
"2025-02-01 13:00", "2025-02-01 14:30");
int val = taskService.createTask(task);
assertEquals(-1, val);
}
Run the mutation tests again and check the report.
PITest mutant summary
Great, all the mutants in TaskService were killed!
You can do a little exercise and improve the mutation score for the Task class.
Configure PITest
As mentioned above, it is recommended to run mutation tests on the code that has changed rather than on the whole project, because mutating all classes and running all tests can be very time-consuming.
You can configure which classes should be mutated in the plugin:
<plugin>
<groupId>org.pitest</groupId>
<artifactId>pitest-maven</artifactId>
<version>1.17.4</version>
<dependencies>
<dependency>
<groupId>org.pitest</groupId>
<artifactId>pitest-junit5-plugin</artifactId>
<version>1.2.1</version>
</dependency>
</dependencies>
<configuration>
<targetClasses>
<param>com.myproject.somepackage.*</param>
<param>com.myproject.otherpackage.Utils*</param>
</targetClasses>
</configuration>
</plugin>
The same can be done to tests:
<configuration>
<targetTests>
<param>com.myproject.mutations.test.*</param>
</targetTests>
</configuration>
You can also specify which mutators should be applied:
<configuration>
<mutators>
<mutator>NEGATE_CONDITIONALS</mutator>
<mutator>MATH</mutator>
</mutators>
</configuration>
A full list of configuration options can be found in the PITest Maven plugin documentation.
Conclusion
In this article, we discussed mutation testing and its value in assessing the quality of a test suite. It is an excellent complement to the other testing metrics used in your company.
With the right tool, mutation testing is not difficult, as we saw when we performed it on a Java application using the PITest library.
FAQ
How does mutation testing differ from standard unit tests?
Unit tests are aimed at validating the expected behavior of the code. Mutation testing evaluates whether the test suite is complete enough to detect bugs in the executed code.
Can I use mutation testing for enterprise projects?
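Yes. Mutation testing is useful for projects of any size and purpose. For large enterprise codebases, consider limiting each run to the changed code (for example, with the targetClasses and targetTests options) to keep the analysis time manageable.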
What does mutation score below 100% mean?
A mutation score below 100% means that not all mutants planted into the code were killed. Unless the survivors are equivalent mutants, this points to flaws in the existing test suite.
How to improve mutation score?
In most cases, you need to add the missing tests to improve the mutation score.