
Using Copilot for Automated Code Reviews

One of the most intriguing applications of GitHub Copilot is its potential to assist in code reviews.
In this tutorial, we’ll explore how Copilot can help you understand code, suggest improvements, explain functionality, and answer specific questions about the code under review.

You’ll learn two main ways to use Copilot:

  1. Copilot in the IDE (Visual Studio Code)
  2. Copilot on GitHub for reviewing pull requests (currently in beta)

Along the way, I’ll share my thoughts on the use cases and workflows for AI-assisted code reviews, as well as the weaknesses of this approach.
While GitHub Copilot can transform how we conduct code reviews, we will see that human oversight remains essential throughout the process.

GitHub Copilot

GitHub Copilot is a coding assistant powered by large language models (LLMs). It integrates seamlessly with various integrated development environments (IDEs), providing developers with real-time suggestions, code completions, and review functionality.
Recently, Copilot has also become available (in beta) directly on GitHub, where it assists developers in reviewing pull requests (PRs).

Code Under Review

In this tutorial, you’ll explore how to use GitHub Copilot to review code effectively.
Let’s start with a snippet of code from a file named auth.js. Below, you’ll find the first code sample we’ll review.

You don’t need to analyze the code in detail right now, but take a quick look.

Do you notice any opportunities for improvement?


import React, { createContext, useState, useEffect } from "react"
import firebase from "gatsby-plugin-firebase"

export const AuthContext = createContext({})

const AuthProvider = ({ children }) => {
  const [user, setUser] = useState()

  useEffect(() => {
    firebase.auth().onAuthStateChanged(user => setUser(user))
  }, [])

  return (
    <AuthContext.Provider value={{ user, setUser }}>
      {children}
    </AuthContext.Provider>
  )
}

export default AuthProvider

Copilot in the IDE

As mentioned earlier, GitHub Copilot is available in IDEs and on GitHub itself.
We’ll begin by reviewing this piece of code within the IDE and later compare it to the review experience on GitHub in the PR view.

In Visual Studio Code, a small context menu provides access to Copilot’s AI-powered capabilities.
This menu allows you to Explain, Fix, or start a chat (Add File to Chat) about a specific piece of code.
You can also use it to generate documentation or tests with the help of AI.
Recently, Copilot introduced a Review and Comment option for reviewing code directly in the IDE, which is what you’ll use today to get started.


Screenshot of the context menu of Copilot


Review and Comment

Previously, the primary option for reviewing code with GitHub Copilot was to start a chat with the LLM. This was particularly useful when you wanted an exploratory and interactive approach to understanding and improving your code.

Now, with the Review and Comment option, you have access to a more direct, guided, and structured way to review your code.
To start a guided review, select the parts of the code you want to examine and ask GitHub Copilot to Review and Comment.

Copilot then opens a review interface that highlights suggested changes line by line within the file under review.
This interface resembles tools commonly used for refactoring or resolving merge conflicts, making it intuitive for developers.

Here’s an example of the interface in action while reviewing the auth.js file in the IDE:


Copilot in action


Line by Line Review

The AI-suggested code change is embedded directly in the code. Copilot injects the review interface just below line 10, the main line of code that this review comment addresses.
The review interface highlights the original lines in red and the suggested changes in green, following a familiar diff-style format.

You can either accept the changes or discard them and move to the next suggestion.
What stands out to me is that Copilot also provides a brief explanation of the reasoning behind each suggestion, making it easier to understand the proposed improvements.

However, one usability issue I’ve encountered is that accepting a change automatically moves you to the next one.
This behavior means you can’t review or verify the outcome of the change Copilot applied.
I’d prefer an option to accept the change and stay on the same line (e.g., line 10) to inspect the result before proceeding.

Overall, the Review and Comment functionality is straightforward and enables a highly structured approach to code review, aligning well with automated code review practices.

Let Copilot Explain Code

Another way Copilot can assist in a review is by explaining the code.
This feature helps you build a mental model of the code changes, improving your ability to provide meaningful feedback during the review process.

To prompt Copilot for an explanation, select the code and right-click to open the context menu. Then, click Explain.
The executed prompt is: “@workspace /explain Write an explanation for the active selection as paragraphs of text.”

This prompt triggers GitHub Copilot to generate a detailed explanation of the selected code, as shown in the screenshot below.
The explanation highlights objects, components, and function calls, making it easier for developers to associate the description with the code itself.
Additionally, functions are clickable, linking you directly to the file where they are defined.

After the explanation, a chat interface appears, allowing you to interact further with the LLM about the code.


Explanation of the auth.js component


This chat interface is also a great starting point when you want a more exploratory experience with the LLM and the code.

Inline Chat with Copilot

Another way to interact with GitHub Copilot in the IDE is by using the inline chat (Editor Inline Chat) interface, which provides explanations directly within the file view.


Inline chat of Copilot


One challenge with inline prompts is that the chat window is quite small, while the explanations or suggestions are often lengthy.
This results in extensive scrolling, which can feel cumbersome and less user-friendly, as shown in the screenshot below.

Fortunately, you can prompt Copilot to View in Chat, which opens the conversation in a larger, dedicated window (in my case, on the right-hand side).


Long inline explanation of Copilot


Asking Copilot for Changes

The inline interface is particularly effective for prompting Copilot to make changes.
For example, you can ask Copilot, “Change this function call to a more secure one.”
It will then present you with suggested changes, though without much explanation by default.

However, the interface initially displays only the newly generated code, not the original code.
This can make it difficult to compare and evaluate whether the suggested change is truly helpful.


Inline Suggestion of Copilot


To address this, you can enable the Toggle Changes option, which shows a side-by-side comparison of the original and new code—similar to what many developers are accustomed to in code reviews.


Inline Suggestion of Copilot


If you’re unsure about the suggested changes, you can also ask Copilot to explain its suggestions, adding clarity and helping you assess their value.

Chatting with Copilot About the Code

One of the most common ways LLMs assist developers today is through “traditional” chat interfaces.
In this interface, developers can ask the LLMs questions about the code or explore it interactively.
GitHub Copilot also offers this method of interaction, providing a versatile way to engage with the code.

The inline chat we discussed earlier is another representation of this chat interface.
Similarly, the Explain option brings you directly into this interface with a prefilled standardized prompt, enabling a seamless transition between exploration and explanation.

Copilot to Review PRs on GitHub

The GitHub Copilot experience for pull request reviews is currently similar to the chat functionality in the IDE or tools like ChatGPT on the web.
However, it provides a less directed and structured approach compared to the Review and Comment interface available for automated code reviews in the IDE.

There are several ways to trigger Copilot directly on GitHub during a PR review. For instance, you can click on a line of code, which opens a small Copilot context menu.
From there, you can select options such as Explain or Attach to current thread to add Copilot’s response to an existing conversation.


Context menu of Copilot on GitHub


AI Code Explanation for Code Review

When you click Explain, it triggers a pop-up window that resembles the familiar chat interface.

Using AI to explain code is an excellent use case during a code review.
It helps the reviewer understand the code more effectively and build a mental model of the changes.
However, this feature does not provide improvement suggestions from Copilot, as shown in the example below.


Copilot chat on GitHub


Generating a PR Summary from Multiple Files

Another way to trigger GitHub Copilot is through the menu located above all the file changes, as shown in the screenshot below. This option allows you to select multiple files from the change set and start a chat with Copilot about them.

One practical use case is to have AI automatically summarize the pull request for us. While there are already tools and plugins available that offer automatic PR summarization, GitHub now provides this functionality out of the box.

To try it out, I selected all source code files changed in this PR, along with relevant configuration files. I omitted content-related files such as markdown files, as well as certain configuration or generated files like package.json.

The auth.js file is part of a rather large PR, as shown below. I ended up adding 14 source code files to Copilot’s context.


Selecting multiple files


For the sheer amount of code changes, Copilot’s automated code review summary is surprisingly short and lacks meaningful insights. It primarily lists some of the source code files included in the change set, grouping them logically.


Automated PR summary


I also asked Copilot to indicate where I should focus my attention during the code review. While it provided an answer, I found it underwhelming. The response listed a few files but offered no explanation for why these files were highlighted. Additionally, the auth.js file—one that I, as a human reviewer, consider highly relevant—was notably absent from the list.


Asking Copilot for areas to review


I am disappointed with both responses. Perhaps my prompt wasn’t clear or specific enough. To address this, I tried again with the prompt: “Summarize the PR in more detail.” Copilot processed the request but returned the same short and unhelpful summary as before. Bummer.

Next, I sent another prompt focused on risk assessment: “What are the most risky changes in this PR?” Once again, Copilot took its time to create the plan, but the answer was almost identical to the previous response. Unfortunately, it offered little insight into the PR, and the risk assessment did not seem appropriate or actionable.

Don’t use AI for Risk Assessment

A chat interface is a powerful tool, letting you ask any question or provide any instruction you want to the LLM. While this flexibility is valuable, it doesn’t prevent us from making unwise requests. For example, asking an automated code reviewer to assess the risk of a pull request is one such misstep.

LLMs are fundamentally unsuited for assessing risk in source code. They lack the deep understanding of domain-specific logic, semantic nuances, and file interdependencies required for accurate analysis. Risk often arises from how files interact or their role within the system. These are all factors LLMs are incapable of dynamically evaluating.

LLMs treat all input equally, making it hard for them to prioritize files based on critical factors like functionality, security, or business relevance. Without domain-specific knowledge or explicit input, LLMs cannot evaluate the importance or business-critical nature of a file.

As these models often respond regardless of their suitability for the task, it is crucial to critically evaluate whether this type of AI is appropriate for the specific task at hand.

Asking Copilot for Code Improvements

The experience of interacting with Copilot on files may feel smoother when focusing on a single file. For instance, let’s choose only auth.js. Once selected, the chat window opens, and Copilot is ready to assist by responding to your questions or following your instructions.


Copilot on GitHub with multi-file selection


Let’s ask Copilot: “How can this code be improved?” and review its response.

When you prompt Copilot, the page reloads. If you were examining a specific change in the long list of PR changes, you’ll find yourself back at the beginning of the list. This behavior can be frustrating and unhelpful, as it causes you to lose the context of the code you just reviewed and sought assistance for.

Another negative UX aspect is that the chat window is small, requiring significant scrolling to read Copilot’s full response. Unlike the IDE experience, there is no option to open the conversation in a larger, more convenient window.

Copilot’s response is detailed and starts with a list of improvement suggestions, including enhancements for error handling, type safety, and dependency management.


Copilot Chat on GitHub


When we scroll down, our personal automated code reviewer, Copilot, provides an updated version of the code that you can copy and paste.
However, during a code review, the ability to copy and paste the entire updated code is not particularly useful.

What’s more important is mapping the new code back to the code under review, identifying the specific lines that were changed, and determining where to leave comments.
Since the piece of code under review is relatively small, it’s manageable to map the updated code in the chat window to the corresponding lines in the diff view.


Updated version of the code


If you find Copilot’s suggestion useful, you can add a comment or make a concrete code change suggestion directly in the PR.
However, for non-trivial changes, I personally feel uneasy making edits this way.

For anything beyond a simple adjustment, I prefer to make the changes within the IDE, where I can immediately verify if I’ve made a mistake (maybe even just a typo or copy/paste error).
This ensures a smoother and more reliable workflow, particularly for complex updates.


Commenting based on Copilot Suggestions


Should We Ask AI to Review a PR?

The idea of a peer reviewing code already submitted as a PR by prompting an LLM to improve it feels rather strange.
Ideally, the process of prompting AI to refine code should be driven by the developer who wrote the code and should occur in the IDE before the PR is submitted for peer review. This use case, essentially a self-review, is nicely covered by Copilot in the IDE.

On the other hand, using AI to explain code during a PR review is a valuable and practical use case.
Similarly, if a reviewer has a suspicion or an improvement suggestion, interacting with the LLM can help clarify their thoughts.

In such scenarios, it’s unlikely that the reviewer would use a prompt as general as the one in this example.
More specific prompts tend to yield better results. A code review checklist can provide inspiration for crafting effective code review prompts.
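
For inspiration, here are a few examples of more targeted prompts (hypothetical, written for the auth.js file under review):

  • “Does this effect clean up its Firebase listener when the component unmounts?”
  • “Which error cases in this authentication flow are not handled?”
  • “Is there a more idiomatic React pattern for sharing this state across components?”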

In most cases, reviewers will have a concrete question or suspicion about the code and will ask the AI code reviewer for targeted feedback directly.

AI Helps Reviewers Verify Their Suggestions

One of the most significant benefits AI offers to code reviewers is helping them build a mental model of the changes by explaining the code and providing feedback on their suggestions.

For example, let’s imagine we’re reviewing the code and notice:

const [user, setUser] = useState()

We think it might be better to change this to:

const [user, setUser] = useState(null)

Yet we feel uncertain and insecure about making the suggestion. What if we are wrong? Insecurity, and even anxiety, during code reviews is a common challenge. This is where AI can help us investigate our idea.

Let’s ask Copilot if our suggestion makes sense and why.


Evaluating review suggestions using AI


Copilot provides a detailed explanation with three reasons why initializing useState() with null is a better approach. We can take it a step further and ask our AI code review helper to summarize this explanation, making it easier to use as a concise review comment.


Summarizing Copilot exploration for comment
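
To see why the explicit null matters, here is a minimal sketch of a hypothetical consumer of AuthContext (this component is not part of the PR, and the name Greeting is made up for illustration):

import React, { useContext } from "react"
import { AuthContext } from "./auth"

// Hypothetical consumer component, for illustration only.
const Greeting = () => {
  const { user } = useContext(AuthContext)

  // With useState(), `user` starts out as undefined, while the Firebase
  // listener later reports either a user object or null. Initializing the
  // state with null keeps the "no user" value consistent from the first
  // render on, so a strict check like the one below behaves predictably.
  if (user === null) {
    return <p>Please sign in.</p>
  }

  return <p>Hello, {user.displayName}</p>
}

export default Greeting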


AI Trustworthiness Is a Problem

One of the biggest challenges with using LLMs for code reviews is the trustworthiness of their suggestions. Although it might seem like AI understands or is intelligent, this is not the case.

The code suggestions provided by Copilot, or any other automated code review tool that uses LLMs, rely purely on statistical relationships. The LLMs are designed to predict the next most likely token, resulting in outputs that appear plausible.
However, even though the output may seem correct, it can sometimes be entirely wrong.

I’ve personally observed AI fabricating API calls that don’t exist, suggesting catching exceptions from libraries not in use, and don’t get me started on the suggestions I received for improving CSS code. Even while working on this tutorial, there were moments when Copilot generated incorrect or incomplete code, as shown in the appendix.

While some of the AI-generated code is correct, the potential for errors is too significant to ignore, especially during a code review. A code review is not the right place to take chances with unverified suggestions. This is why it’s critical to thoroughly evaluate any AI-generated code review feedback before acting on it.

Evaluating AI Code Review Suggestions

An LLM can be a valuable companion during a code review, especially when you need it to remind you of best practices or highlight potential issues in the code. However, it’s crucial that you have the competency to evaluate the proposed changes critically (and quickly).

For developers who are less familiar with a framework or programming language, relying on AI for code reviews can be tempting. Yet, the less skilled you are at writing or improving the code yourself, the harder it becomes to assess the validity of the AI’s code review suggestions.

Research indicates that even experienced developers struggle to fully understand, modify, or debug code generated by LLMs.
As a result, evaluating AI-generated code review suggestions—some of which can be entirely wrong—remains a challenging task.

Evaluating Our AI Code Review Suggestion

Let’s examine Copilot’s recommendation to modify the line of code where we call onAuthStateChanged. Our automated code reviewer suggested using a pattern to ensure that any subscriptions or side effects are properly cleaned up when a component is unmounted or before the effect is re-executed. However, if we’re not familiar with React or the Firebase API, it can be difficult to determine whether this pattern is applicable in this context.
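
As a reminder, the core of the suggested pattern inside AuthProvider’s useEffect looks like this (a condensed sketch of Copilot’s proposal; the full version appears in the appendix):

useEffect(() => {
  // onAuthStateChanged returns an unsubscribe function
  const unsubscribe = firebase.auth().onAuthStateChanged(user => setUser(user))

  // Cleanup: stop listening when the component unmounts
  return () => unsubscribe()
}, [])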

If we refer to the official Firebase documentation, we notice that the code snippets provided there do not use this pattern. This omission could also explain why it’s not present in the existing codebase.


Firebase Authentication Snippet - Official website


After reviewing the proposed solution, the reviewer might feel even more uncertain:

  • Is this a valid pattern?
  • Should it be applied in our case?
  • Why wasn’t it included in the official documentation?

If the reviewer is unsure about the AI-generated suggestion, there are three possible approaches:

  1. Trust the AI and recommend the change anyway.
  2. Stay cautious and decide against recommending the change.
  3. Dig deeper by conducting further research to arrive at a conclusion.

Option one is not recommended, as it assumes AI is infallible. Option two is a safe choice but may overlook opportunities for improvement. Option three can lead to valuable learning and potentially improve the codebase quality, though it requires more time and effort.

In our case, we chose the third option and found additional documentation on a Firebase-related website. This resource indicates that onAuthStateChanged returns an unsubscribe function that can, or should, be used to clean up when the component unmounts.


Another (unofficial) Firebase snippet


Does this evidence provide enough justification to propose the change?

Or should we involve a human/colleague to review the AI-based code suggestion?

😜

AI Code Review Belongs in the IDE

When an automated code review is conducted in the IDE, the process of iteratively evaluating AI suggestions becomes a natural part of the development workflow. AI-assisted code reviews are an excellent tool for self-review, allowing developers to inspect and improve their own code before involving others. There’s no need to include a peer in this semi-automated code review process—unless during an AI pair programming session.

In the IDE, developers can thoroughly modify, enhance, and test the AI code review suggestions. Yet, by the time the code reaches the code review stage, the developer should have already leveraged AI to assess and refine their solution.

Once a code review begins, the stakes for changing the code based on AI suggestions increase significantly, while the reviewer’s ability to evaluate the suggestion diminishes.

AI Code Reviews Can Be Time-Consuming

While Copilot produced straightforward and sensible suggestions for our code in this tutorial, this is not always the case. Using AI for code reviews can lead to incorrect suggestions, as discussed before, but also to many recommendations that are irrelevant or don’t apply in the given context.

This challenge is similar to the struggle developers face when using static analysis tools for automated code reviews. These tools often generate numerous false positives, which means they are reporting issues that aren’t actual problems.
In contrast, LLMs add another layer of complexity because they are non-deterministic, meaning they can produce different outputs for the same input.

Sifting through irrelevant or inconsistent suggestions is a time-consuming task, making it essential to critically evaluate the AI’s output during a code review, but also to question its usefulness and efficiency.

Conclusion: AI Code Reviews Need Human Supervision

Despite advancements in AI, the human element remains irreplaceable in code reviews. There is no such thing as a fully automated code review yet.

AI-generated suggestions can sometimes be terribly incorrect, making code reviews the wrong place to take these risks.
While tools like Copilot are excellent companions in the IDE, they are still assisting technologies rather than autonomous code reviewers.

That said, don’t underestimate the power of AI assistance. An AI reviewer can help developers focus on details, detect errors, improve code quality, handle repetitive tasks, and build a mental model of the code.

By combining human expertise with AI capabilities, we can create a balanced approach that leverages the strengths of both.
As AI continues to evolve, its role in code reviews will likely expand, but the need for human oversight will remain critical for a long time.

What’s your experience with AI code reviews? Let me know in the comment section on LinkedIn!

Appendix: Code Generated by Copilot on GitHub

Here’s the complete, improved code snippet that Copilot suggested during our automated code review session.

What do you think?

AI-Improved Code

import React, { createContext, useState, useEffect } from "react";
import firebase from "gatsby-plugin-firebase";
import PropTypes from "prop-types";

export const AuthContext = createContext({});

const AuthProvider = ({ children }) => {
  const [user, setUser] = useState(null);

  useEffect(() => {
    const unsubscribe = firebase.auth().onAuthStateChanged(
      (user) => setUser(user),
      (error) => console.error("Auth state change error:", error)
    );
    return () => unsubscribe();
  }, []);

  return (
    <AuthContext.Provider value={{ user, setUser }}>
      {children}
    </AuthContext.Provider>
  );
};

AuthProvider.propTypes = {
  children: PropTypes.node.isRequired,
};

export default AuthProvider;

Code Explanation from Copilot

The code snippet is a React useEffect hook that listens for changes in the authentication state using Firebase. Here’s a step-by-step explanation:

  1. useEffect Hook: This hook runs a piece of code after the component renders. The empty dependency array [] means it runs only once when the component mounts.

  2. firebase.auth().onAuthStateChanged: This is a Firebase method that sets up an observer on the Auth object to listen for changes in the user’s sign-in state.

  3. Callback Function: The callback function user => setUser(user) is called whenever the authentication state changes. It updates the state with the current user object.

  4. Cleanup Function: The return () => unsubscribe() part ensures that the observer is unsubscribed when the component unmounts, preventing memory leaks.


Copilot's Incorrect Code Suggestions

Below is an “improved version of the code”, as Copilot calls it.

Can you spot the issue with the proposed code?

import { useEffect } from 'react';
import firebase from 'firebase/app';
import 'firebase/auth';

const useAuth = (setUser) => {
  useEffect(() => {
    const unsubscribe = firebase.auth().onAuthStateChanged(
      user => setUser(user),
      error => console.error('Auth state change error:', error)
    );
    return () => unsubscribe();
  }, [setUser]);
};

export default useAuth;

This article first appeared on https://awesomecodereviews.com. Last updated: November 22, 2024

Profile picture of Michaela Greiler

Written by Dr. Michaela Greiler who is obsessed with making code reviews your superpower. Learn more about her workshops.