Optimal pull request size

October 16, 2017 / Blaine Osepchuk / 8 Comments

Every team has an optimal pull request size, it’s likely much smaller than you think, and making your pull requests your optimal size will improve the performance of your team. I’m going to convince you that these three statements are true by the end of this post.

What does the research say?

A large body of research has shown that code reviews are the most cost-effective way of finding defects in code¹. But most of the value of a code review comes from the first reviewer in the first hour².

How much code can my team review in an hour?

My team tracks how much time we spend on each code review. I compared those times to the number of lines changed in our pull requests to produce this scatter plot³.

The data is pretty messy but if we want a 90% chance of completing a code review in an hour, we probably need to cap our pull requests at around 200 lines of code.

Finding your optimal pull request size

This is really just an optimization problem. You have a cost associated with creating, reviewing, managing, and merging a pull request, which we can think of like a transaction cost. But you also have a cost associated with letting the pull request remain as work in progress, which we can think of like a holding cost. I’ll define these costs in more detail in a minute. I just want to establish that you are trying to find the pull request size that minimizes the sum of your costs (the lowest point on the blue line below).

Pull request transaction costs

It takes time and effort to:

break stories into bite sized chunks
make the them independent or schedule them so they don’t block each other
then actually make the changes required
create a testing plan
test the changes
create a pull request
review it
merge it
deploy it to production (we do continuous delivery)

If you did this for every line you changed, you would be at the far left on graph above. Not good. But making larger pull requests creates costs of its own.

Pull request holding costs

The larger the pull request, the:

longer it takes for the reviewer to find a big enough slot of time to address it (and the less motivated the reviewer will be to review it)
less chance it will pass code review on the first attempt and therefore require multiple reviews
longer it will take to review
more chance the review will exceed an hour (or whatever your personal limit is for effective code reviews) and become be ineffective
more chance it will grow larger still and need even more review
bigger the risk that the author made a bad assumption and will have to redo much more work
bigger the risk that the requirements will change before the code is released
more demoralizing it becomes to work on this code (multiple reviews of the same code sucks for everyone)
more it sucks to have to re-review and re-test the whole pull request if some little part of it fails
less certainty we have over when the code will be merged
more chance of a merge conflict
more chance it will block other stories in the sprint and derail the schedule
longer it will take to get the code into production and start delivering business value

If someone submitted a 10,000 line pull request in our project I can guarantee that all of these points would come into play. But our data and experience show that much smaller pull requests are still incurring huge holding costs.

(As an Amazon Associate I earn from qualifying purchases.)

Why would smaller pull requests by better?

Smaller batch sizes increase flow, according to Donald Reinertsen in his book: The Principles of Product Development (paid link).”

Specifically:

Reducing batch size reduces cycle time
Reducing cycle time reduces variability in flow
Reducing batch sizes accelerates feedback
Reducing batch sizes reduces risk
Reducing batch sizes reduces overhead
Large batch sizes reduce efficiency
Large batch sizes inherently lower motivation and urgency
Large batch sizes cause exponential cost and schedule growth
Large batches lead to even larger batches
The entire batch is limited by its worst element

Applying lean manufacturing concepts to pull requests

Some very bright people at Toyota invented the Toyota Production System, which includes single-piece flow. And they used this system to take over the auto industry. This is a similar idea but Reinertsen tweaked it for product development. The key to making this really work is to lower your transaction costs. If you become really efficient at processing pull requests, your overall cost curve will shift down. And your optimal batch size will shift to the left (get smaller). It’s a virtuous cycle.

You want to get stories moving across your sprint board very rapidly. And you want to get new code into production as quickly as possible. You don’t want pull requests stalled in review for days at a time. And you really don’t want failed pull requests and all the rework that goes with them.

Still not convinced that the optimal pull request size is small?

Let’s do some math. The more code in your pull request, the more chance that you’ve introduced a defect. I know software defect rates vary widely but let’s use the often quoted 10-50 defects/kloc for released commercial software. But, we are talking about a code review–not released software–so let’s assume that your defect rate is 50 defects/kloc. That’s one defect every 20 lines of code on average⁴.

Good code reviews catch 60-90%¹ of the defects in the code so let’s pick a middle value and say that you can catch 75% of them. That means you should find one defect for every 27 lines of code on average.

Smaller pull requests contain fewer defects

A 200 line pull request should contain 10 defects and you should expect to find 7.5 of them. That means it’s extremely unlikely that it will pass code review on your first attempt. Now let’s redo the math with a 50 line pull request. There should be 2.5 defects in this pull request and you should expect to find 1.9 of them. You’ve got a much better chance of passing code review on the first try with a smaller pull request.

When your big pull request fails code review, you have to fix the whole pull request, re-review the whole pull request, re-test the whole pull request, and hope that the nobody has introduced a new error along the way. When your small pull request fails code review, you still have to do all the same steps. But it will go much more quickly because you are dealing with much less code.

What’s wrong with the way my team handles pull requests?

My team is struggling with our pull requests and our problems are all related to what I like to call “mega pull requests”. Mega pull requests are pull requests that have gotten out of control and have become difficult to review and deploy. We recognize them by their large diffs, multitude of comments, multiple reviews, many days spent in review, and many hours spent trying to get them into the ‘done’ column.

How a mega pull request is born

Suppose my colleague creates a 200 line pull request. I see it in Jira and just from the title I know it’s going to need my full attention. So I schedule it in a big block of time tomorrow and plan to tackle it all at once. I review it and find a couple of problems. I write my comments in BitBucket and bounce it back to the author. He’s moved onto something else so it takes him until the next morning to review my comments, change the code, update the testing plan, retest the branch, and put it back in review.

We repeat this cycle once or twice while we deal with other stories, meetings, unplanned work, and other interruptions. By now, this pull request has been in review for the better part of a week and now it might have merge conflicts. If so, the author fixes them, retests, and sends it back to review. At some point I eventually get all the tests in the testing plan to pass, I approve the pull request, merge into master, and move on to the next story.

We probably have one pull request like this per sprint⁵. But on rare occasions, we’ve had pull requests bounce several times over several weeks. I’m not joking. Mega pull requests kill our productivity.

What’s helped

We’ve tried all kinds of things to prevent the formation of mega pull requests but our ideas have mostly been ineffective. The only things that helped a little were:

automating our code styling and formatting with a commit hook so it’s never an issue
doing a refactoring story in front of the main story
avoiding stray changes in the main story no matter how ugly the code – we just put it in the backlog and make a comment in the code with the Jira story ID

We think we’ve still got room to improve so we’re interested in finding our optimal pull request size.

How do we find our optimal pull request size?

We conduct experiments. Our current policy is to only create a pull request that will take less than an hour to review. But we suspect our velocity will improve if we create smaller pull requests. So we’ll try a 45 minute limit for two months, examine the data, set a new target, and repeat the experiment until we find our optimal pull request size.

Wrapping up

Every team has an optimal pull request size, it’s likely much smaller than you think, and making your pull requests your optimal size will improve the performance of your team. You can find your optimal pull request size by doing experiments and measuring the change in your velocity.

I think I’ve made a pretty strong case for smaller pull requests. But I’d love to hear what works for your team. Have you experimented with the size of your pull requests? Have you found your optimal pull request size? How big are they and how has your velocity changed since you found it?

Fagan ME (1976) Design and code inspections to reduce errors in program development. IBM Syst J 15: 182-211. ↩ ↩
Cohen J (2010) Modern code review. Oram A, Wilson G, editors. Making software: what really works, and why we believe it. Sepastopol (California): O’Reilly. pp. 329-336. ↩
I removed spurious data like file renames, automatically generated files, and changes caused by running a formatting tool over the code. I also removed pull requests that didn’t pass review. ↩
This is just a back-of-the-napkin math. You can use a wide range of defect and detection rates and still reach the conclusion that large code changes should contain detectable errors that cause your code review to fail. Please don’t clobber me in the comments about the exact numbers I’ve used here. ↩
We have a ‘zero known defects’ policy so code goes back in forth in code review until we think it has zero defects and meets our other quality objectives. We’re trying to improve the quality of our code base and repay our technical debt. This will change our optimal pull request size compared to a team that has a ‘ship it and we’ll fix it in maintenance’ policy. ↩

Uncategorized

Code Quality Effectiveness Software Engineering

6 Comments

calebcauthon (@calebcauthon)
January 28, 2018 at 9:49 am

doing a refactoring story in front of the main story

That sounds very interesting. Fairly new to stories so I dont know what that would look like, but I’d like to try it on our team.

Of your tips, the one that rang true for us was “avoid stray changes…”, and we are generally a “pay down the debt ” team.

Reply
- Blaine Osepchuk (Post author)
  January 28, 2018 at 11:50 am
  
  Thanks for your comments.
  
  The more we do the refactoring stories in front of the new feature, the more we like them. We find it faster to clean up a mess and then add a new feature to the newly cleaned code as a separate step than trying to cleanly add a new feature to messy code. Doing it as two steps is faster and results in cleaner code all around.
  
  I should also tell you that we don’t strictly follow Scrum when it comes to stories. We allow refactoring stories that have no user-facing benefits if their purpose is to prepare the code for a story that does provide user benefit. It also helps keep the pull requests smaller.
  
  Reply
B Anon
February 14, 2018 at 1:28 pm

What is the outcome of your test with smaller pull requests after a couple of months?

Reply
- Blaine Osepchuk (Post author)
  February 14, 2018 at 1:48 pm
  
  I haven’t analyzed the data yet but I believe it is working. Our reports show less work in progress and faster cycle times. And now when I get a pull request that takes 45 minutes to review, it feels way too big.
  
  Our next step might be to limit pull requests to 30 minutes maximum and see how that goes.
  
  I should mention, that it’s not all smooth sailing. Some stories are difficult to break up into PRs that can be reviewed in under 45 minutes. And some people have more difficulty with this kind of planning than others.
  
  I’ll post an update when I have a chance to write one.
  
  Thanks for your question.
  
  Reply
  - ibres
    June 7, 2019 at 6:48 am
    
    Hello Blaine, find your approach sound and I would like to know if you have seen improvements in the last year?
    
    Reply
    - Blaine Osepchuk (Post author)
      June 7, 2019 at 12:39 pm
      
      Thanks for the question, ibres. I’ve been meaning to write a followup post about our results but haven’t gotten around to it yet.
      
      Yes, we’ve seen lots of improvements. We don’t get mega-pull requests any more and we are getting much better at judging how to break work up into reviewable bites. Since I wrote this post we’ve lowered the time limit for reviews to 40 minutes. My team rejects PRs when we hit the 40 minute mark–“too long” is the feedback the author gets. But that’s our limit. The sweet spot seems to be around 20 minutes.
      
      We use chained pull requests to handle large changes. And we’ve a procedure for them.
      
      We create one or more PRs to refactor the code where we want to make our change and/or add unit tests. They are tightly focused and we usually mark commits/PRs that are exclusively done with refactoring tools and keep them separate from manual code changes, which we scrutinize more closely.
      
      Once we’ve got a nice clean place for our new code we branch off of the last refactoring branch and start creating the classes we’ll need for our new functionality. That’s often one class plus its unit tests per PR. The new classes aren’t called by any existing code yet but they give the reviewer a chance to think about them, look at the tests, and see if there are any errors or missing test cases. We keep creating classes with unit tests and joining them with the other classes we’ve created in small steps until we’ve built everything we need for the feature.
      
      Then the final PR joins the new feature to the existing system and activates it.
      
      What else? The author usually discusses his design/code/PR plan with the reviewer at the beginning of the process. They agree on the design and the approach before a line of code is changed. The reviewer is also reviewing the PRs as soon as possible after they are created so if things are going off the rails the author can get feedback from the reviewer before he wastes a bunch of time.
      
      Note: this is what works for my team. We are a permanent team, we work together all time, and we value low defect, highly maintainable code. This approach might not work with teams with different priorities.
      
      Reply

Optimal pull request size

What does the research say?

How much code can my team review in an hour?

Finding your optimal pull request size

Pull request transaction costs

Pull request holding costs

Why would smaller pull requests by better?

Applying lean manufacturing concepts to pull requests

Still not convinced that the optimal pull request size is small?

Smaller pull requests contain fewer defects

What’s wrong with the way my team handles pull requests?

How a mega pull request is born

What’s helped

How do we find our optimal pull request size?

Wrapping up

Related

6 Comments

2 Pingbacks

Leave a Reply Cancel reply

Top Posts

Tags

Recent Posts

Archives

Recent Posts

Optimal pull request size

What does the research say?

How much code can my team review in an hour?

Finding your optimal pull request size

Pull request transaction costs

Pull request holding costs

Why would smaller pull requests by better?

Applying lean manufacturing concepts to pull requests

Still not convinced that the optimal pull request size is small?

Smaller pull requests contain fewer defects

What’s wrong with the way my team handles pull requests?

How a mega pull request is born

What’s helped

How do we find our optimal pull request size?

Wrapping up

Share this:

Related

6 Comments

2 Pingbacks

Leave a Reply Cancel reply

Top Posts

Tags

Recent Posts

Archives

Recent Posts

Discover more from Small Business Programming