Systematically reviewing source code is an essential part of ensuring code quality and correctness. Yet most developers review code ad hoc, which means they have no formal, or even explicit, code review approach that they can easily describe. While a few formal code review techniques exist, they are not widely known. In this article, I show you different code review techniques you can use to make your code review approach more systematic, explicit, and effective. These are either dedicated code review techniques or techniques from other development disciplines, such as code reading, testing, or debugging, that I have adapted for code review to help you find more bugs and better understand the code under review.
Modern code review is ad hoc
Most programmers review code in an ad hoc manner. This means there is no explicit, formal, or systematic process that the team has agreed upon for reviewing code. Instead, developers review source code files intuitively, based on their experience.
Which files, methods, and even statements developers look at, and in which order, is up to the individual developer, and often even up to their current mood. Similarly, which aspects or issues they look for in code is not well defined and can change from developer to developer.
Junior developers struggle without concrete review techniques
While programmers develop their individual techniques for code reviewing over time, those techniques are implicit, vary from person to person, and often only live in the developers’ heads. Junior developers therefore often wonder how exactly they should review code, and what to look for. They learn how to review code by copying and mimicking what they see their peers do in code reviews. However, this way of learning to review code is neither efficient nor effective. It also means that each team member ends up with a slightly (or vastly) different approach to reviewing code.
Transforming implicit to explicit knowledge
One of the best code review hacks a team can embrace is to transform the implicit knowledge that the experienced developers on the team carry in their heads into explicit knowledge.
During such a transformation process, experienced developers describe in detail what they do during a code review. Over time, these explicit guides can be merged into one overall guideline that describes how code is reviewed within the team. Such a guideline serves two purposes: first, it streamlines how the team reviews code, and second, it allows people new to the team to quickly learn what is important during code review.
Formal, systematic, and explicit review techniques
In addition to learning from the experience of senior developers, we can also learn from formal, systematic, and explicitly described techniques for code inspection, code review and code reading.
In this blog post, I show you some of the code review techniques I also teach in my code review workshops. They can help you transform your implicit, ad hoc reviewing habits into explicit and more systematic approaches.
Systematic Code Review Approaches
We will look at the following systematic code review approaches:
- Checklist-based Code Reviews
- Test-Driven Code Reviews
- Bottom-up versus Top-Down Code Reading
- Control-flow versus Data-flow Code Reviews
- Pattern Recognition
- Cross-Referencing
- Change-Impact Analysis
- Trace-Based Code Reading
- Abstract-Driven Code Reviews
- Functionality or Use-Case-Driven Code Reviews
1. Checklist-based Code Reviews
One of my favorite code review techniques involves using a checklist. Checklist-based reviews are by far the most widely used systematic code review technique because the approach is straightforward to implement and does not require a lot of additional effort. Empirical research has also shown checklist-based code reviews to be very effective.
Checklist-based code reviews involve using a predefined list of items or criteria to assess the code. A checklist can focus on a variety of factors such as coding standards, design principles, security guidelines, and performance considerations. You can find a code review checklist that covers a wide range of topics here. Sometimes, only one of those factors, for example, security, is the focus of such a checklist. This makes sense when you want to perform a more thorough review in this specific area. Have a look at this security-focused code review checklist as an example.
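Some checklist items can even be partially automated. The following Python sketch encodes a small, hypothetical security checklist as data. Items with an `auto_check` function are pre-checked against the source text, and the reviewer must answer the remaining items. The item wording, the regular expressions, and the `open_items` helper are illustrative assumptions. They are not part of any particular checklist or tool, and regex-based checks like these will miss many real issues.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ChecklistItem:
    question: str
    # Optional automated pre-check: returns True when the item passes for the given source.
    # Items without a check must be answered manually by the reviewer.
    auto_check: Optional[Callable[[str], bool]] = None


# Hypothetical security-focused checklist; the items are illustrative only.
SECURITY_CHECKLIST = [
    ChecklistItem(
        "No eval() calls on dynamic input?",
        lambda src: re.search(r"\beval\(", src) is None,
    ),
    ChecklistItem(
        "No hard-coded passwords or API keys?",
        lambda src: re.search(r"(password|api_key)\s*=\s*['\"]", src, re.IGNORECASE) is None,
    ),
    ChecklistItem("Is every user input validated before use?"),  # manual item
]


def open_items(source: str, checklist: list[ChecklistItem], manual_answers: dict[str, bool]) -> list[str]:
    """Return the questions that failed their automated check or were not answered 'yes'."""
    remaining = []
    for item in checklist:
        if item.auto_check is not None:
            passed = item.auto_check(source)
        else:
            passed = manual_answers.get(item.question, False)
        if not passed:
            remaining.append(item.question)
    return remaining
```

For example, `open_items('api_key = "abc123"\nresult = eval(data)\n', SECURITY_CHECKLIST, {})` leaves all three questions open. Both automated checks fail, and the reviewer has not yet answered the manual question.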
Pros:
- Provides a systematic and structured method for reviewing code, ensuring consistency and thoroughness.
- Helps reviewers focus on important aspects that might be overlooked otherwise.
- Assists in educating new team members about coding standards and best practices.
- Functions as a memory aid, ensuring important aspects are checked.
Cons:
- Correct usage of the checklist might be unclear or confusing.
- Aspects not covered by the list might be overlooked.
- May lack flexibility, as not all checklist items are applicable to every piece of code.
- It might not be clear where to look, as a checklist usually does not prescribe an order of inspection.
Learn more about checklist-based reading here.
2. Test-Driven Code Reviews
During test-driven code reviews, the code reviewer reviews the test code before the production code. The rationale behind this approach is to use the test cases as use cases that explain the code. Thus, the reviewer learns about the production code while reading the test code, which helps them build a mental model. The tests serve as a specification of what the software is supposed to do, and through them the reviewer learns about the assumptions the code author made about the code.
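As an illustration, consider a hypothetical `apply_discount` function and its unit tests. Reading the tests first tells the reviewer two things before they see any production code: a 10% discount on 100.00 yields 90.00, and the author assumes discounts above 50% are capped rather than rejected. The gaps in the tests then become review questions. The function, its cap, and the test names are invented for this sketch.

```python
import unittest


# Production code under review (hypothetical example).
def apply_discount(price: float, percent: float) -> float:
    """Return the price after a percentage discount, capped at 50%."""
    percent = min(percent, 50)
    return round(price * (1 - percent / 100), 2)


# Test code: the reviewer reads this FIRST and treats it as the specification.
class ApplyDiscountTest(unittest.TestCase):
    def test_regular_discount(self):
        # Spec: 10% off 100.00 is 90.00.
        self.assertEqual(apply_discount(100.0, 10), 90.0)

    def test_discount_is_capped_at_50_percent(self):
        # Spec: the author assumes discounts above 50% are capped, not rejected.
        self.assertEqual(apply_discount(100.0, 80), 50.0)

    # Review questions raised by what the tests do NOT cover:
    # - What should happen for negative percentages or negative prices?
    # - Is a 0% discount a valid input?


if __name__ == "__main__":
    unittest.main(exit=False)
```

Running the file directly executes both tests. The comments at the end of the test class show how missing tests turn into concrete questions for the author.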
Pros:
- Helps improve the quality of the test code, and thus the maintainability and quality of the code.
- Promotes high test coverage and test quality.
- Encourages writing meaningful tests that truly validate the code.
- Tests can help the reviewer understand the production code.
Cons:
- Depends heavily on the quality of the tests themselves.
- Can’t be done in the absence of tests.
- Tests might replicate the misunderstandings of the code author about the code’s specification.
Learn more about test-driven code reviewing here.
3. Bottom-up and Top-Down Code Reading
All code reading techniques can also be broadly categorized as bottom-up or top-down reading. Most of the time, we use a combination of the two approaches. So, let’s have a more detailed look!
Bottom-up Code Reading
Bottom-up reading means that the developer starts by reading, understanding, and reviewing small fragments of the code. Starting from those small fragments, such as single statements or small methods, the developer works their way up to understand and evaluate more and more of the functionality and quality of the software under review. This way, the developer gradually builds a picture of how code elements interact to form the larger system.
Bottom-up code reviewing works well when reviewing a small, self-contained piece of code, like a function or a class. It also works well when you have to review a larger, undocumented piece of software that you are completely unfamiliar with.
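Here is a minimal sketch of this reading order using two hypothetical functions. The reviewer first examines `to_cents` on its own and checks edge cases statement by statement. Only then do they move up to `invoice_total`, which relies on it. The function names and the choice of `Decimal` are assumptions made for this example.

```python
from decimal import Decimal


# Step 1: review the smallest fragment in isolation, statement by statement.
def to_cents(amount: str) -> int:
    """Convert a decimal string such as '19.99' or '-1.50' to integer cents."""
    # Reviewer checks at this level: signs ("-1.50"), missing fractions ("3"),
    # single-digit fractions ("0.5"). Decimal avoids float rounding errors;
    # int() truncates any digits beyond two decimal places toward zero.
    return int(Decimal(amount) * 100)


# Step 2: once to_cents is understood and trusted, move up one level.
def invoice_total(line_amounts: list[str]) -> int:
    """Sum the line amounts of an invoice, in cents."""
    # Reviewer checks at this level: an empty invoice yields 0,
    # and no float arithmetic sneaks in between the fragments.
    return sum(to_cents(a) for a in line_amounts)
```

For example, `invoice_total(["19.99", "-1.50", "0.5"])` returns `1899`. The reviewer can trust that result only after checking each of these edge cases in `to_cents` on its own.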
Pros:
- Developers build a thorough understanding of the nitty-gritty details of the code.
- Helps to effectively identify local issues, bugs, and inefficiencies.
- The developer needs less overall context about the system, making it suitable for new team members or for reviewing isolated components.
- Works well for unclear or unfamiliar legacy code, or code that isn’t documented.
Cons:
- It is a time-consuming approach due to the focus on details.
- Reviewers might miss the big picture, as well as architectural or higher-level design problems.
- When confronted with a large code change, starting at the lowest level can be overwhelming and inefficient.
Top-down Reading
In a top-down reading approach, the software developer starts by reading, for example, a user story or the documentation of the software. These often non-code artifacts already explain the purpose of the code under review and give the reviewer an overview of what to expect. Then, the reviewer examines the overall structure and components of the system. The goal is to build a mental model of the system before looking at the code in more detail. In a later step, the reviewer tries to find the most significant method(s) that drive the functionality and starts reviewing in more detail from there.
Pros:
- This approach helps build an understanding of the system’s architecture and high-level design.
- It is effective in spotting problems with the overall design, structure, and integration points.
- The approach is suited for larger code changes or even complete codebases.
- It helps prioritize which parts of the code to focus on for a more detailed review.
Cons:
- Reviewers using only a top-down approach might overlook specific low-level coding issues or bugs.
- Requires the reviewers to have a good understanding of the overall system, which might not be feasible for newcomers.
- By examining only high-level artifacts, the reviewer might make incorrect assumptions about how high-level designs are implemented at the lower levels.
Learn more about top-down code reviews here.
Combining bottom-up and top-down code reviewing
In practice, a combination of both approaches is often used. Starting with a top-down review helps the reviewer understand the system’s architecture and major components. This sets the stage for a more detailed, bottom-up review of specific areas of interest or concern. This combined approach helps ensure both the architectural integrity and the detailed correctness of the code.
4. Control-flow and Data-flow Code Reviews
Two other complementary code review approaches are control-flow and data-flow code reading. Control-flow reading follows the execution of the program, whereas data-flow reading follows the data.
Control-flow Code Reviews
Control-flow reading focuses on understanding how the program’s execution progresses. It examines the order in which statements, instructions, or function calls are executed and how the program moves from one part to another. This includes looking at loops, conditional statements, function calls, and recursion. Control-flow driven code reviews are great for finding logical errors.
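The hypothetical function below shows the kind of logical error that control-flow reading tends to catch. Tracing each path through the loop of the submitted version reveals a bug: the `else` branch returns after the first element, so a negative value at any later index is never found. Both functions are invented for this example.

```python
def find_first_negative_as_submitted(values: list[int]) -> int:
    """Version under review: meant to return the index of the first negative value, or -1."""
    for i, v in enumerate(values):
        if v < 0:
            return i
        else:
            return -1  # control-flow bug: exits after inspecting only values[0]
    return -1  # reached only for an empty list


def find_first_negative(values: list[int]) -> int:
    """Corrected version after the control-flow review."""
    for i, v in enumerate(values):
        if v < 0:
            return i  # path 1: a negative value was found
    return -1  # path 2: reached only after the loop has visited every element
```

`find_first_negative_as_submitted([3, -1])` returns `-1`, while the corrected `find_first_negative([3, -1])` returns `1`.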
Pros:
- Highly effective in spotting logical problems with the flow of execution, such as loops, conditionals, and sequence of operations.
- Essential for understanding the execution order and for debugging issues like infinite loops or unexpected branches in logic.
- Helps in understanding the overall structure of the program, particularly the execution order and the interrelation of various components.
Cons:
- Reviewers might miss issues related to data handling or data integrity.
- It is less effective for data-intensive applications.
- Complex to perform in event-driven systems due to the non-linear progression of execution.
Data-Flow Code Reviewing
Data-flow reading focuses on how data moves through the program. This includes tracking the source, use, and modification of variables and data structures, understanding how data is passed between functions, and examining how data state changes over time. Data-flow code reviewing is particularly useful when looking at the code with a security lens. It helps identify data leaks or points where sensitive data might be compromised.
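To make this concrete, the sketch below follows a single value, `username`, from where it enters (the source) to where it is used (the sink). In the unsafe version, the data flows unchanged into an SQL string, which allows SQL injection. In the safe version, the value is passed as a bound parameter using the `?` placeholder of Python's standard `sqlite3` module. The table, the function names, and the demo database are hypothetical.

```python
import sqlite3


def find_user_ids_unsafe(conn: sqlite3.Connection, username: str) -> list[tuple[int]]:
    # Source: `username` arrives from an untrusted caller (e.g. a web form).
    # Flow: it is copied unchanged into `query` ...
    query = f"SELECT id FROM users WHERE name = '{username}'"
    # Sink: ... and reaches the database as executable SQL, an injection risk.
    return conn.execute(query).fetchall()


def find_user_ids_safe(conn: sqlite3.Connection, username: str) -> list[tuple[int]]:
    # Same source and sink, but the value travels as a bound parameter,
    # so the database never interprets it as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ?", (username,)).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two users (ids 1 and 2)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn
```

With the input `' OR '1'='1`, the unsafe function returns every user in the table, while the safe function returns no rows.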
Pros:
- Effective in identifying problems related to data handling, such as data corruption, improper use of data structures, and issues with data lifecycle.
- Particularly useful for uncovering security flaws that stem from the misuse of data, like buffer overflows or injection attacks.
- Helps in tracing how data moves through the system, which is crucial for understanding complex interactions and dependencies.
- Can identify inefficiencies in how data is processed, leading to opportunities for performance optimization.
Cons:
- Tracing data flow can be challenging, especially in large codebases with numerous data paths and interactions.
- Requires a detailed and thorough examination of how data is handled, manipulated, and transformed, which can be time-intensive.
- Focusing primarily on data flow can lead to overlooking issues related to control flow, such as logic errors or incorrect program sequencing.
Combining control-flow and data-flow code reviewing
In practice, both control-flow and data-flow reading modes are often used together during code review. Understanding how data is handled (data-flow) is crucial for evaluating the logic and sequence of operations (control-flow). For example, in a function with complex logic, you might use control-flow reading to understand the sequence of operations while simultaneously using data-flow reading to see how data is manipulated throughout those operations. This integrated approach provides a comprehensive understanding of the code, helping to identify both logical errors and data-related issues.
Code Reading, Debugging and Program Understanding techniques
While reviewing code, we can make use of general code reading, debugging and program comprehension techniques. In the following, I’ll briefly highlight some of the most useful techniques.
5. Pattern Recognition
During pattern recognition, the code reviewer mostly uses a top-down approach to identify common patterns in the code. Such patterns include design patterns, algorithmic patterns, or idiomatic expressions in the programming language. Pattern recognition during code review helps to create a mental model of the software system, but also to identify problems with the software’s architecture or the design of algorithms.
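Pattern recognition is mainly a human activity, but some idiomatic patterns and anti-patterns are regular enough for a script to spot. The sketch below uses Python's standard `ast` module to flag a well-known Python anti-pattern, mutable default arguments. The sample code contrasts it with the usual `None`-sentinel idiom. Established linters such as Pylint already report this pattern. The `find_mutable_defaults` helper and the sample code are hypothetical and only illustrate the idea.

```python
import ast


def find_mutable_defaults(source: str) -> list[str]:
    """Return names of functions whose default arguments are list, dict, or set literals.

    A mutable default is created once, when the function is defined, and is then
    shared between calls, which is a well-known Python anti-pattern.
    """
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # kw_defaults contains None for keyword-only parameters without a default.
            defaults = node.args.defaults + node.args.kw_defaults
            if any(isinstance(d, (ast.List, ast.Dict, ast.Set)) for d in defaults):
                flagged.append(node.name)
    return flagged


CODE_UNDER_REVIEW = '''
def add_tag(tag, tags=[]):          # anti-pattern: shared mutable default
    tags.append(tag)
    return tags

def add_tag_fixed(tag, tags=None):  # idiom: None sentinel, new list per call
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
'''
```

`find_mutable_defaults(CODE_UNDER_REVIEW)` returns `['add_tag']`. Only the function with a list literal as its default is flagged, and the `None`-sentinel version passes.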
Pros:
- Helps create a mental model of the code change and the structure of the code.
- Streamlines the review process and leads to more consistent review approaches even amongst different team members.
- Well suited to find problems within the software architecture or design.
Cons:
- Due to the focus on patterns, reviewers might overlook other problems.
- Judging code against known, common patterns can bias reviewers, and they might dismiss good new approaches just because those approaches are not established yet.
- Reviewers need a broad knowledge of patterns and anti-patterns to look for.
6. Cross-Referencing
During cross-referencing, the code reviewer systematically reviews the dependencies and relationships between code elements, such as classes, functions, or variables. This approach is suitable for understanding the interdependencies and the impact a change can have on the system. As manual cross-referencing is a tedious task, code reviewers can use tools to extract, display, or visualize this information. Most IDEs provide built-in cross-referencing tools, which are also often used during refactoring. These tools build an index of all code elements and link them, which helps with navigation.
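To illustrate what such an index contains, here is a minimal, single-file cross-referencer built on Python's standard `ast` module. For each function name, it records the lines where the function is defined and the lines where it is called. It is a deliberately simplified stand-in for an IDE index. `build_xref` and the sample `SOURCE` are hypothetical, and the sketch ignores method calls, imports, and other files.

```python
import ast
from collections import defaultdict


def build_xref(source: str) -> dict[str, dict[str, list[int]]]:
    """Map each function name to the lines where it is defined and where it is called.

    Only plain-name calls such as tax(x) are indexed; method calls such as obj.tax(x)
    and calls across files are ignored, unlike the index an IDE builds.
    """
    xref: dict[str, dict[str, list[int]]] = defaultdict(lambda: {"defined": [], "called": []})
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            xref[node.name]["defined"].append(node.lineno)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            xref[node.func.id]["called"].append(node.lineno)
    # ast.walk visits nodes breadth-first, so sort the line numbers for readability.
    for entry in xref.values():
        entry["defined"].sort()
        entry["called"].sort()
    return dict(xref)


SOURCE = """\
def tax(amount):
    return amount * 0.2

def net(amount):
    return amount - tax(amount)

def report(items):
    return [net(i) for i in items] + [tax(sum(items))]
"""
```

For the sample source, `build_xref(SOURCE)["tax"]` is `{'defined': [1], 'called': [5, 8]}`. This tells the reviewer that a change to `tax` affects both `net` and `report`.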
Pros:
- Reveals dependencies and relationships, and shows how a change impacts the rest of the system.
- Allows the reviewer to build a deep understanding of the codebase, and the code change under review.
- Reviewers get a good understanding of the control flow and the data flow, thus allowing them to identify errors in execution and data manipulation.
Cons:
- If done manually, cross-referencing is tedious and error-prone.
- For larger code changes and systems, it might be overwhelming for the reviewer.
- Focusing too much on relationships and dependencies between artifacts can lead to overlooking other issues.
7. Change-Impact Analysis
During change-impact analysis, the reviewer gathers information to be able to judge how the code change will impact the rest of the system.
While cross-referencing can be used as part of a change-impact analysis, a change-impact analysis is broader. It also involves analyzing higher-level artifacts, such as specifications and documentation, to understand the impact of the code change. A reviewer will normally start by understanding the rationale behind the code change and its description (top-down). Then, the reviewer identifies affected components, analyzes the dependency graph, and assesses the impact of the change on the functionality of the overall system. During such an analysis, the reviewer should also look at the impact on the test suite: do existing tests need to change, or do new tests need to be added?
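One mechanical part of this analysis is finding everything that transitively depends on a changed module. This step can be sketched as a breadth-first search over an inverted dependency graph. The module names and the `DEPENDS_ON` graph below are hypothetical. In practice, the graph would come from the build system or a dependency analysis tool. Including test modules in the graph also shows which tests are affected.

```python
from collections import deque

# Hypothetical dependency graph: each module maps to the modules it imports.
DEPENDS_ON: dict[str, list[str]] = {
    "currency": [],
    "db": [],
    "pricing": ["currency"],
    "billing": ["pricing", "db"],
    "auth": ["db"],
    "reports": ["billing"],
    "test_pricing": ["pricing"],
    "test_billing": ["billing"],
    "test_auth": ["auth"],
}


def impacted_by(changed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every module that directly or transitively depends on `changed`."""
    # Invert the edges: for each module, collect the modules that import it.
    dependents: dict[str, list[str]] = {module: [] for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(module)
    # Breadth-first search over the inverted graph.
    impacted: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, []):
            if user not in impacted:
                impacted.add(user)
                queue.append(user)
    return impacted
```

Changing `currency` impacts `pricing`, `billing`, and `reports`, plus the test modules `test_pricing` and `test_billing`. `auth` and `test_auth` are unaffected.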
Pros:
- Discussions on the impact of code changes can improve the understanding a team has of the overall codebase.
- Allows reviewers to direct their efforts towards impacted and high-risk parts of the codebase.
- Reduces the risk of introducing bugs or regressions.
Cons:
- Change impact analysis can be very time-consuming.
- The reviewer must already have a good and deep understanding of the codebase to understand the impact.
- It can lead to analysis paralysis, because the reviewer cannot envision all potential impacts of the code change and is thus reluctant to approve it.
8. Trace-Based Code Reading
Trace-based reading, often used during debugging, involves following the execution of code at runtime. The code reviewer can do this by stepping through the code in a debugger or by examining logs or output traces.
Most code reviewers review code statically, for example by following its control flow, rather than using this dynamic technique, which focuses on understanding how the code behaves at runtime.
This dynamic evaluation is useful when the code reviewer is puzzled by a certain part of the program, or expects a bug. It allows us to understand the state of the program at various points of execution, to track the sequence of function calls, and to observe changes in variable states over time.
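The most common way to do this is to step through the code in a debugger such as Python's `pdb`. As a scriptable alternative, the sketch below uses the standard `sys.settrace` hook to record the sequence of Python function calls and their arguments while a hypothetical piece of code under review runs. The `trace_calls` helper and the sample functions are assumptions made for this example.

```python
import sys


def trace_calls(func, *args):
    """Run func(*args) and record each Python function call with its arguments.

    Returns (result, events), where events is a list of (function name, arguments).
    """
    events = []

    def tracer(frame, event, arg):
        if event == "call":
            # At a 'call' event, f_locals holds only the function's arguments.
            events.append((frame.f_code.co_name, dict(frame.f_locals)))
        return None  # do not trace individual lines inside each call

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the hook, even if func raises
    return result, events


# Hypothetical code under review.
def normalize(values):
    total = sum(values)
    shares = []
    for v in values:
        shares.append(v / total)
    return shares


def market_shares(sales):
    return normalize(sales)
```

`trace_calls(market_shares, [1, 3])` returns the result `[0.25, 0.75]` together with two call events, first `market_shares` and then `normalize`, each with the arguments it received. Calls to built-in functions such as `sum` are not recorded, because `sys.settrace` only reports calls to Python functions.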
Pros:
- Allows the reviewer to see how the system behaves at runtime.
- Effective for diagnosing or evaluating “potential” bugs and problems.
- Helps build a mental model of the code, thus helpful if the reviewer is unfamiliar with the system.
- Can help catch side effects or anomalies that cannot be found during a static inspection.
- Suitable for detecting performance problems.
Cons:
- Setting up the reviewer’s system to run the code might be cumbersome and time-intensive.
- Might distract the reviewer from actually reviewing the source code.
- The review is limited to the scenarios the reviewer traces.
Academic Code Reading Techniques
Code review techniques described in academic papers aim at making code reviews more systematic. Those approaches detail exactly where a reviewer should look, the order of inspection, and also which questions to ask and issues to look for.
I group these approaches into two types (abstract-driven and functionality-driven) and only highlight the aspects that help us design our own code reading and reviewing approaches.
9. Abstract-Driven Code Reviews
During abstract-driven code reading, code reviewers read small pieces of code and write down their understanding of it (that is, they create descriptions or specifications). Then, the reviewers compare the descriptions they created with the existing specification or documentation of the code. A mismatch indicates a problem. To determine the order in which methods or functions are reviewed, academic approaches often use dependency information. Commonly, reviewers are instructed to start with the methods that have the fewest dependencies on other parts of the system. This is in line with the bottom-up code review style.
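The ordering rule can be sketched with Python's standard `graphlib.TopologicalSorter`, available since Python 3.9. It takes a hypothetical call graph that maps each function to the functions it calls. A topological order places every function after all of its dependencies, so functions without dependencies come first. The function names in `CALLS` are invented for illustration. Writing and comparing the abstracts remains a manual step.

```python
from graphlib import TopologicalSorter

# Hypothetical call graph of the code under review: function -> functions it calls.
CALLS: dict[str, set[str]] = {
    "format_report": {"compute_total", "format_currency"},
    "compute_total": {"apply_tax"},
    "apply_tax": set(),
    "format_currency": set(),
}


def review_order(calls: dict[str, set[str]]) -> list[str]:
    """Order functions so that each one is reviewed after every function it calls."""
    # TopologicalSorter treats each mapped set as the node's predecessors,
    # so callees are emitted before their callers.
    return list(TopologicalSorter(calls).static_order())


# For each function in this order, the reviewer writes a short abstract of what
# the code actually does and compares it with the documented specification.
```

The order among independent functions (for example, `apply_tax` and `format_currency`) is not fixed. `format_report`, however, depends on everything else and always comes last.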
10. Functionality or Use-Case-Driven Code Reviews
Another distinct set of reading techniques focuses less on the entities of the system (as described before) and more on concrete use cases or functionality of the system. Here, the aim is to check whether the code behaves correctly. For that, the reviewer first defines use cases and then inspects whether the code implements them correctly. This includes defining the preconditions of each use case, its success and failure conditions, and its exceptions. This approach resembles systematic testing. Here, however, the focus is on statically inspecting the code using control-flow and data-flow reading techniques rather than on dynamically executing the codebase.
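A use case can be written down in a structured form before the reviewer inspects the code. The sketch below records a hypothetical "Withdraw cash" use case with its preconditions, success condition, and failure conditions. It also annotates a hypothetical `withdraw` function with the use-case condition each path implements. Reading the code this way also exposes what is missing. `withdraw` never checks the precondition that the account is open, so the reviewer has to confirm that a caller enforces it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    preconditions: tuple[str, ...]
    success: str
    failures: tuple[str, ...]


# Hypothetical use case, written down BEFORE reading the code.
WITHDRAW = UseCase(
    name="Withdraw cash",
    preconditions=("account is open", "amount > 0"),
    success="balance decreases by exactly the amount",
    failures=(
        "amount <= 0: reject, balance unchanged",
        "amount > balance: reject, balance unchanged",
    ),
)


class InsufficientFunds(Exception):
    pass


# Hypothetical code under review, annotated with the use-case condition each path covers.
def withdraw(balance: int, amount: int) -> int:
    # Precondition "account is open" is NOT checked here:
    # the reviewer must confirm that every caller enforces it.
    if amount <= 0:
        # Failure 1: rejected before any new balance is computed.
        raise ValueError("amount must be positive")
    if amount > balance:
        # Failure 2: overdraft rejected; no new balance is returned.
        raise InsufficientFunds(f"balance {balance} is less than amount {amount}")
    # Success: the new balance is lower by exactly `amount`.
    return balance - amount
```

Each path in `withdraw` now maps to exactly one entry of `WITHDRAW`, and the one entry without a matching path becomes a review comment.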
Want to learn more about the presented code review techniques and see them in action? Have a look at my member-only articles on code review techniques here, or sign up for a free remote learning session.