If we had half-assembled tangible products piling up and overloading our store, we’d be alarmed. If we had staff fighting in the store, we’d be alarmed, too. If students at school were not progressing in their skills, we’d be apprehensive. Yet, whenever I ask people, “Why are you using code reviews?” the answer is usually:
We’ve been doing this for ages, and it’s been working just fine.
Human beings have been manipulating information since the dawn of civilization, but IT as an industry is very young. The term “Information Technology” was only coined in 1958. IT is also quite complex to reason about: we work with information, and it’s intangible. We cannot physically see queues of half-done tasks obstructing our view and slowing everything down, we don’t physically fight over conflicts in reviews, we can’t assess how well we learn from reading code, and it’s very hard to rationally evaluate the quality of tasks to see how certain practices improve it.
I’m Vitaly Sharovatov, Developer Advocate for Qase, an all-in-one QA and test management platform. I’m a quality enthusiast with more than 20 years of experience and a passion for helping people build great products. In this article, I want to present four perspectives on classical pull-request-based asynchronous code review, which I believe explain why this practice isn’t as efficient as most people think.
Table of Contents
Psychological perspective
I have a hobby, leathercraft. I’m a novice, and I know I have years of study and practice ahead of me. I am learning it alone by watching YouTube and reading articles. Every single task I approach, I do it the way I know it should be done. If I show my crafted wallet to someone, and that person criticizes it a lot and gives hundreds of comments like “you should have thought of skiving the edges more,” I would perceive this criticism badly, even knowing that the person might be right.
Work done well fulfills our need for competence. Every human being associates the results of their work with themselves, and if I’m told my wallet is bad, I’m still going to be hurt. Valuing something we put effort into is innate to our nature. The more effort a person puts into a task, the more vulnerable they become to criticism, especially if the criticism feels unfair.
Fairness is key here. If I were studying leathercraft with a teacher who showed me how to do it right and then made mistakes while working on it, I would perceive their criticism as fair: we synchronized the “how,” the approach, and the methods before the work.
With traditional code reviews, there’s no synchronization before the work. The person does the task how they think is best, passes it for review, and receives criticism. In many cases, this triggers a feeling of unfairness, which, in turn, triggers negativity and sometimes conflicts or irritation.
The internet is full of posts and articles on “solving” this negativity problem. Most of these posts talk about providing more positive feedback or wrapping criticism between two layers of positive feedback, known as the “sandwich technique.” However, studies have shown that these approaches do not yield good results, and candid criticism is more beneficial.
This creates a vicious circle: negative feedback is perceived badly, while positive feedback doesn’t highlight problems and is essentially useless.
The root cause of this problem is that code review is done without prior synchronization on how the task is to be approached.
This problem disappears completely with pair programming, where two people work on a task together, synchronizing constantly. If I were crafting my leather wallet with a colleague, there would be no need for a “code review” after the work is done, as we both would see and correct issues while working together.
The problem would also be less significant if two or more engineers agreed on how the work was to be done before the actual work began. This could be implemented as a pair (or ensemble/mob) session to create a technical plan for feature implementation, after which the developer would work on the feature alone. In this case, even if a developer encounters some issues, they will be smaller and will not require significant refactoring.
Educational / knowledge sharing perspective
Sometimes, code reviews are said to have a knowledge-sharing goal. The idea is that the author of the code will learn something from the reviewer’s comments, and the reviewer will learn something from the code under review.
From the educational sciences perspective, there are a few issues with using code reviews to learn and share knowledge.
For the reviewer, reading code does not lead to much knowledge acquisition. The reason is simple: we learn much more through practice. If a reviewer were to interact with the code while reading — breaking, refactoring, running, testing, and debugging — they would learn much more than from just reading it.
Additionally, a reviewer only sees the final solution, not the problem analysis and solution synthesis process. These two aspects are the most important parts of feature development, while the code is just a by-product.
For the author of the code, the learning process is even less efficient, as all they get after the code review are some comments. While comments are feedback that can be valuable, with code reviews, this feedback is too late from an educational perspective.
Traditional education follows this process:
- The teacher gives all the necessary guidance and information.
- The student practices to learn it.
- The teacher checks the results of the practice and potentially highlights issues.
In code review, the first stage is absent: no one gives the author any guidance or information, so the author practices without knowing if they are learning correctly or incorrectly. The reviewer does not synchronize their understanding of “how” with the reviewee before the task is done. As a result, the practice step only reinforces existing knowledge and does not lead to new knowledge acquisition. The author might rework the code upon receiving comments, but this is simply inefficient.
Modern educational systems focus on group learning, where students learn and practice something new together with the teacher. With this approach, feedback is instant, and students never practice and learn something they will have to unlearn and redo later.
This approach is very similar to pair or ensemble programming, where two or more employees solve problems and write code together, learning from each other all the time.
Quality Perspective
Most companies believe quality gates to be efficient measures against bugs.
When presented with a pull request for review, a reviewer usually has two options:
- Study the requirements or user story and the problem behind it. Come up with a solution. Write the code for the solution. Compare this solution with what the author coded.
- Simply read the code.
Option one could potentially allow a decent level of quality control because the author and the reviewer would be comparing two solutions to the same problem based on their optimality. However, this approach doubles the Lead Time for the feature, as the reviewer has to do the same work the author did, but asynchronously.
Option two takes much less time and, as studies show, is very ineffective. Most of the issues found are either “taste”-based and aren’t defects or could be detected using linters or more powerful static analysis tools like Qodana.
You can check how many actual defects your code review process finds. And if it finds many, ask yourself: should the quality assurance procedures be shifted left a bit? Would pair or ensemble programming allow for avoiding issues altogether before the code is completed while also yielding a more optimal solution to the problem?
Economical perspective
Most companies aim to be faster — faster development, faster testing, and faster deployment. They want clients to get new features as soon as possible. Many managers are already aware of the cost of delay, which is the economic impact of a delay in project delivery. Additionally, most companies do not want to sacrifice quality while aiming to be really fast.
As noted above in the Quality Perspective section, code reviews rarely yield significant quality improvement and come at the cost of a substantial slow-down in the workflow. In many companies, code reviews can take from 20 to 40% of the total lead time.
The graph below is just an illustration. I suggest you run your own analysis to see how much you usually pay in the cost of delay for doing code reviews. I have yet to find a team where pair programming wasn’t more efficient and effective than solo work plus code reviews.
Code review vs pair programming
Alternatives to code reviews
As you might have noticed, in all perspectives I compared code reviews with pair programming — a collaborative practice where two engineers work together on one machine on a single task.
Pair programming prevents all the inefficiencies associated with pull-request-based code reviews. It increases the quality of the code written, significantly improves knowledge sharing, and speeds up the overall development process.
The only approach found to be even more efficient than pairing is ensemble (mob) programming.
However, both pair and ensemble (mob) programming require a certain effort to start. Both practices demand a significant change of habits for engineers and managers.
Many engineers, particularly experienced ones, resist collaborative practices mostly because they have never tried them. Managers, on the other hand, sometimes still think in terms of “why pay twice for the same feature,” failing to understand the economics of software development.
Changing habits is hard, and changing beliefs is even harder.
A good way to improve the existing culture and code reviews is to start development in a pair. During this stage, two engineers decide on the architecture and technical plan for the feature or user story. Once the plan is composed, the pair can disassemble and proceed working individually.
Code review vs pair research vs pair programming
This approach allows the code review practice to remain intact after development, so those engineers and managers who insist on it will not have to change their beliefs. Additionally, this approach significantly speeds up the overall development, almost reaching the speed of pair programming, because the hardest part — research — is done in a pair. Two engineers do the research and come up with an artifact — a technical plan — which is the result of their synchronization on “how” the feature is to be developed. This dramatically reduces the probability of rework later.
The collaborative research and synchronization enable better knowledge sharing than any code review: during the pair research, both engineers will remember the problem they are solving, the solutions they considered, and which one they agreed to implement.
Additionally, the code review stage in this scenario will be much shorter and will not trigger conflicts. The time required is smaller simply because a second engineer must only confirm that the feature was developed according to the plan they both created. Conflicts do not arise for the same reason: there is nothing to argue about — the plan is either followed or not.
What’s most beautiful about this “pair research” approach is that it’s simple, so most engineers do not object to it, and most managers do not see it as “overpaying.”
I hope this article provoked you to re-think the code-review practice and maybe implement at least a “pair research” approach at work.
References
- Chapter 11 - Factors Critical to the Success of Knowledge Management
- The Knowledge-Creating Company
- A review of software Inspections, Porter, Siy, Votta, 1996
- Expectations, Outcomes, and Challenges of Modern Code Review, 2013
- Investigating technical and non-technical factors influencing modern code review, 2015
- Information Needs in Contemporary Code Review, Pascarella, Spadini, Palomba, Bruntink, Bachelli, 2018
- The Cost of Interrupted Work: More Speed and Stress, Mark, Gudith, Klocke, 2008
- The myth of multitasking, Nass, 2013
- Code Reviews Do Not Find Bugs. How the Current Code Review Best Practice Slows Us Down
- Modern code review: a case study at google, 2018
- Reconfiguration of task-set: Is it easier to switch to the weaker task? Psychological Research, Monsell, S., Yeung, N., & Azuma, R. (2000)
- Multitasking: Switching costs (american psychological association)
- Executive Control of Cognitive Processes in Task Switching — Joshua S. Rubinstein, David E. Meyer, Jeffrey E. Evans
- The correlation between organizational culture and knowledge conversion on corporate performance
- A Meta-Analysis of Ten Learning Techniques
- The Costs and Benefits of Pair Programming
- The effectiveness of pair programming: A meta-analysis
- Evaluating Effectiveness of Pair Programming as a Teaching Tool in Programming Courses
- Increasing Quality with Pair Programming - An Investigation of Using Pair Programming as a Quality Tool
- The Case for Collaborative Programming
- Strengthening the Case for Pair Programming
- Mob vs Pair: Comparing the two programming practices – a case study
- Mob Programming: A Qualitative Study from the Perspective of a Development Team
- Leveraging the Mob Mentality: An Experience Report on Mob Programming