henryfarleyjohnson

A Lil Cubs Post

On Tuesday night, the Cubs put up 17 runs against the Washington Nationals. While it's delightful to see your team score over and over again, it can also be frustrating: there's no bonus for winning by a wider margin, and you wish you could bank those runs for a night when the bats are quiet.

With this in mind, I decided to look at the Cubs' distribution of runs this year to see where they fall relative to the rest of the league. I'll be using my old pal the Gini coefficient as a measure of run consistency (I'll also use coefficient of variation as another measure just to be thorough, but know that my heart lies with the Gini).

As a reminder, the Gini coefficient expresses inequality on a scale from 0 to 1, with 1 being the most unequal. So if the Cubs scored all of their runs in a single game and were shut out in all other games, their Gini would be just about 1, and if they scored an identical number of runs in every game, their Gini would be 0.
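For the curious, here's a minimal sketch of how both consistency measures could be computed from a list of per-game run totals. The function names are my own for illustration, and I'm assuming the standard sorted-rank formula for the Gini and the population standard deviation for the coefficient of variation; treat it as one reasonable way to do it, not necessarily the exact calculation behind the numbers in this post.

```python
from statistics import mean, pstdev


def gini(runs):
    """Gini coefficient of per-game run totals (0 means perfectly even)."""
    xs = sorted(runs)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no games or no runs: treat as perfectly even
    # Sorted-rank formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def coefficient_of_variation(runs):
    """Population standard deviation divided by the mean."""
    return pstdev(runs) / mean(runs)


# Toy example (made-up numbers): identical scoring vs. one big blowout
even = [4, 4, 4, 4]
lumpy = [0, 0, 0, 16]
print(gini(even), gini(lumpy))
print(coefficient_of_variation(even), coefficient_of_variation(lumpy))
```

With this formula, the all-in-one-game case tops out at (n - 1) / n rather than exactly 1, so the four-game toy example gives 0.75, and a full 162-game season would land a hair under 1.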

I have three interwoven hypotheses:

1) The 2023 Chicago Cubs are bad at scoring runs BUT

2) When they do score, they score a lot, which is bad BECAUSE

3) Teams are more successful when their run scoring is balanced across games.

Welp, hypothesis #1 is easily disprovable. It turns out the Cubs are 10th in the majors this year in runs scored. My apologies to the Ricketts family for the pessimism.

Hypothesis #2 is kinda true, but only kinda. It turns out the Cubs are 11th in the majors this year when it comes to inconsistency of runs (that is, their Gini coefficient is the 11th highest of any team). If you'd like the league-wide results, feast your eyes upon the homeliest table you ever did see:

Finally, hypothesis #3 seems to be correct! Using Baseball-Reference, I pulled the last decade's worth of games for every MLB team, and there is a significant, negative correlation between a team's run imbalance in a given season and its record that season.
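If you want to poke at this kind of relationship yourself, here's a rough sketch of a Pearson correlation between run imbalance and winning percentage across team-seasons. The function name and the numbers are invented for illustration (they are not the Baseball-Reference data), and this only computes the correlation itself; checking significance would take a separate test.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up team-seasons: Gini of runs scored and winning percentage
imbalance = [0.40, 0.45, 0.50]
win_pct = [0.60, 0.50, 0.45]
print(pearson_r(imbalance, win_pct))  # negative for this toy data
```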

In fact, if we look at a team's over- and under-performance relative to the total runs it scored, the relationship gets even stronger: the teams that are able to squeeze the most wins out of their runs are the ones with a balanced scoring distribution. This checks with intuition: if you could give a team, say, 700 runs to use across a season, it would do no good for that team to spend them all in one game! They'd want to parcel their scoring out more consistently.
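One way to put a number on "over- and under-performance" is to fit a straight line predicting wins from total runs scored, then take each team's actual wins minus its predicted wins. To be clear, that's an assumption on my part for this sketch: the function name is hypothetical, the numbers are made up, and other definitions of expected wins would work too.

```python
def wins_vs_expectation(runs_scored, wins):
    """Residuals from a least-squares line of wins on total runs scored.

    Positive values mean a team won more than its run total would predict.
    """
    n = len(runs_scored)
    mean_runs = sum(runs_scored) / n
    mean_wins = sum(wins) / n
    sxx = sum((r - mean_runs) ** 2 for r in runs_scored)
    sxy = sum((r - mean_runs) * (w - mean_wins)
              for r, w in zip(runs_scored, wins))
    slope = sxy / sxx
    intercept = mean_wins - slope * mean_runs
    return [w - (intercept + slope * r) for r, w in zip(runs_scored, wins)]


# Made-up seasons: total runs scored and wins for three teams
runs = [600, 700, 800]
wins = [70, 85, 90]
print(wins_vs_expectation(runs, wins))
```

In this toy data the middle team comes out ahead of its expectation and the other two come out slightly behind, which is the kind of gap the imbalance measures are trying to explain.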

OK, but what about runs allowed? It turns out that the same logic applies (but in reverse)! The more successful teams are the ones whose runs allowed are the most imbalanced. Here again, this aligns with intuition: if a team is going to give up a certain number of runs across the season, it serves them better to have those runs clustered in as few games as possible.

For your viewing pleasure, here's a chart of wins above/below expectation vs. imbalance of runs scored:

It's noisy, but there is a negative relationship!

And here's a similar chart but with imbalance of runs allowed on the x-axis:

A messy but positive relationship! Lastly, here are the results of a linear model that predicts wins above/below expectation using runs-scored imbalance and runs-allowed imbalance:

