Can you make money by betting on underdogs in the NBA?

You might be surprised to learn that the NBA teams that win the most games do NOT win bettors the most money!

In this post, I will analyze NBA betting data and run simulations to show you why this has historically been true — and explain which teams you should bet on instead. You won’t even need basketball knowledge to successfully implement this strategy.

Although you don’t need to be an NBA expert, this post assumes you have some knowledge of sports betting concepts and terminology. If you aren’t familiar with them, check out this short guide for a primer.

How accurate are betting odds?

Let’s start with the basics. When two NBA teams play each other, one team is considered more likely to win (the favorite) while the other is considered more likely to lose (the underdog). If you were to place a moneyline bet on the underdog, you’d win more money than if you had bet on the favorite; after all, you deserve a bigger reward for picking the less likely winner. Based on the payouts set by the sportsbook, you can calculate the implied probability of each team winning the game.

In theory, the implied probability of winning your bet should be identical to the probability of your team winning the game. In reality, sportsbooks don’t set their odds that way. Instead, they invite action on both sides so the amount of money at stake is balanced, thus reducing their risk and maximizing their profit.

For example, let’s say the Milwaukee Bucks (the team with the best record last year) are playing against the New York Knicks (a team with… not the best record). If bettors rush to bet on the Bucks, the sportsbooks may get nervous about paying out a lot of money if the Bucks win. So they decide to reduce their payout for the Bucks and increase their payout for the Knicks, thus incentivizing more money to be placed on the Knicks’ side.

These payouts could swing such that the implied probability of a Bucks win is 90%, when in reality it might only be 80%. In this case we have a value bet: the Knicks are priced as if they win 10% of the time when their true chance is 20%, so it would be profitable in the long run to bet on the Knicks.

Figure: Implied probabilities. Bettors can make money in the long run if there are differences between “true probability” and “implied probability.” Source: Alpha Sports Betting
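
To make the value-bet idea concrete, here is a minimal sketch of the expected-value arithmetic for the Bucks/Knicks example. It ignores the sportsbook's cut for simplicity, and the helper names (`fair_moneyline`, `expected_profit`) are my own illustrative choices rather than part of any library.

```python
def fair_moneyline(prob):
    """Convert a win probability into a vig-free American moneyline."""
    if prob >= 0.5:
        return -100.0 * prob / (1.0 - prob)
    return 100.0 * (1.0 - prob) / prob


def expected_profit(true_prob, moneyline, stake=100.0):
    """Average profit per bet, given the true win probability and the offered moneyline."""
    if moneyline < 0:
        payout = stake * 100.0 / -moneyline
    else:
        payout = stake * moneyline / 100.0
    return true_prob * payout - (1.0 - true_prob) * stake


# The odds imply the Knicks win 10% of the time, but suppose the true chance is 20%.
knicks_ml = fair_moneyline(0.10)           # roughly +900
print(expected_profit(0.20, knicks_ml))    # about +100 per $100 bet

# The overpriced Bucks side loses money on average.
bucks_ml = fair_moneyline(0.90)            # roughly -900
print(expected_profit(0.80, bucks_ml))     # about -11 per $100 bet
```

The same 10-point gap in probability is worth far more on the underdog side, because the underdog's payout is so much larger relative to the stake.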

Of course, the big challenge that all bettors face is finding the true win probability of each bet AND figuring it out before everyone else does. It’s impossible to know these win probabilities for certain, but can we find games where the implied odds are more likely to be inefficient? This could help us spot opportunities for value bets.

Step 1: Getting data

First things first, we need a large dataset for our analysis. We’ll need data for the following:

  1. The betting odds of each game. We use this info to determine which team to bet on, and if we win the bet, the amount of our winnings.
  2. The winner of each game. We need to know whether we won our bets or not.

I ended up using Sportsbook Review, a site that aggregates historical betting odds from many different sportsbooks. From there, I found an open-source repository with a script that can effectively scrape betting data from any historical NBA game. I modified the script to also scrape the final score of each game and to run for an entire NBA season, which I did for three regular seasons: 2017-18, 2018-19, and 2019-20 (only including games before the pandemic suspended the season). Here’s a snapshot of what the dataset looks like:

There are a lot of different ways to place bets on NBA games, like moneylines, point spreads, point totals, etc. For the sake of this post, I’ll focus exclusively on the profitability of moneylines. Similarly, there are many different sportsbooks you can use to bet. For the sake of simplicity, I will focus exclusively on Pinnacle, which is regarded as having some of the most accurate odds in the industry.

Step 2: Finding differences in implied probability

The goal here is to examine the implied odds of past NBA games and determine if they’ve been historically accurate. If there are big discrepancies, then there could be an opportunity to make money.

I calculated the implied win probability for each bet based on its moneyline odds. You might notice that the sum of the win probabilities in each game is greater than one, which shouldn’t be possible! However, sportsbooks do this on purpose: the excess is the margin they keep from the total betting action. To adjust for this, I normalized the win probabilities to add up to one, which results in the REAL implied win probability for each bet.

# Function that calculates the implied win probability from an American moneyline
def ml_to_win_prob(ml):
    if ml < 0:
        # Favorite, e.g. -200 -> 200 / 300
        prob = -ml / (-ml + 100)
    else:
        # Underdog, e.g. +170 -> 100 / 270
        prob = 100 / (ml + 100)
    return prob

# Data wrangling: add new columns for W/L outcomes and win probabilities
# Consuming the same iterator twice in zip() walks the rows two at a time (one pair per game)
df_generator = df_nba_lines.iterrows()
for (i, row1), (j, row2) in zip(df_generator, df_generator):
    # Determine winner of each game and insert values in new column
    if row1['score'] < row2['score']:
        df_nba_lines.at[i, 'outcome'] = 'L'
        df_nba_lines.at[j, 'outcome'] = 'W'
    elif row1['score'] > row2['score']:
        df_nba_lines.at[i, 'outcome'] = 'W'
        df_nba_lines.at[j, 'outcome'] = 'L'
    # Calculate implied win probabilities for each team
    row1_win_prob = ml_to_win_prob(row1['ml_PIN'])
    row2_win_prob = ml_to_win_prob(row2['ml_PIN'])
    df_nba_lines.at[i, 'win_prob_PIN'] = row1_win_prob
    df_nba_lines.at[j, 'win_prob_PIN'] = row2_win_prob
    # Calculate NORMALIZED win probabilities for each team
    prob_sum = row1_win_prob + row2_win_prob
    df_nba_lines.at[i, 'win_prob_norm_PIN'] = row1_win_prob / prob_sum
    df_nba_lines.at[j, 'win_prob_norm_PIN'] = row2_win_prob / prob_sum
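
As a quick sanity check on the conversion and normalization above, here is a small standalone sketch for a single game. The moneylines (-200 and +170) are made-up illustrative numbers, not rows from the dataset, and `implied_prob` and `normalize_pair` are hypothetical helper names.

```python
def implied_prob(ml):
    """Raw implied win probability from an American moneyline."""
    if ml < 0:
        return -ml / (-ml + 100.0)
    return 100.0 / (ml + 100.0)


def normalize_pair(ml_a, ml_b):
    """Rescale both raw probabilities so that they sum to one."""
    p_a, p_b = implied_prob(ml_a), implied_prob(ml_b)
    total = p_a + p_b
    return p_a / total, p_b / total


raw_fav, raw_dog = implied_prob(-200), implied_prob(170)
print(round(raw_fav + raw_dog, 3))     # 1.037, slightly more than 1

fav, dog = normalize_pair(-200, 170)
print(round(fav, 3), round(dog, 3))    # 0.643 0.357
```

The raw probabilities add up to about 1.037, and the normalized pair adds up to exactly one.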

Next, I created “bins” so that all bets with similar implied win probabilities are grouped together. Why do this?

Suppose we have a bet with an implied win probability of 11.7% (or +755 moneyline). It’s hard to find many other bets that have this exact moneyline. But if we include it in a bin of all bets from 10% to 15%, then we have quite a few data points to look at in each bin. We then calculate each bin’s actual win rate (number of real-life wins divided by total number of games) and expected win rate (average implied win probability of all bets in the bin).

import pandas as pd

### Bucket games by ML and compare to actual win percentages. Are the ML actually predictive of final results?
# For now, we will only consider moneylines from Pinnacle!
# Place outcomes into bins based upon their pregame implied win probabilities
bins = 20
# .copy() avoids a SettingWithCopyWarning when the 'bin' column is added below
df_pin = df_nba_lines[['key', 'date', 'ml_time', 'team', 'opp_team', 'score', 'ml_PIN', 'outcome', 'win_prob_PIN', 'win_prob_norm_PIN']].copy()
df_pin['bin'] = pd.cut(df_pin['win_prob_norm_PIN'], bins=bins)

### Now, the goal is to calculate the win rate for each bin
# Start with grouping by bin and game outcome (W or L)
outcomes = df_pin.groupby(['bin', 'outcome']).size()
# Calculate the win AND loss rates for each bin by dividing each count by its bin total
win_rate = outcomes / outcomes.groupby(level=0).transform('sum')
# Convert to df
df_win_rate = win_rate.reset_index(name='actual_win_rate')
# Filter only for win rate (remove loss rate and unneeded columns)
df_win_rate = df_win_rate[df_win_rate['outcome'] == 'W'][['bin', 'actual_win_rate']]
# Add column for the average implied win rate of each bin. This will be the "expected win rate."
expected_win_rate_series = df_pin.groupby('bin')['win_prob_norm_PIN'].mean()
df_win_rate = df_win_rate.assign(expected_win_rate = expected_win_rate_series.values)
# Calculate residuals (actual minus expected)
df_win_rate['residual'] = df_win_rate['actual_win_rate'] - df_win_rate['expected_win_rate']
# Add column for the number of lines (teams) in each bin
size_series = df_pin.groupby('bin').size()
df_win_rate = df_win_rate.assign(count = size_series.values)

# Nicely formatted HTML table
df_win_rate.style.format({
    'actual_win_rate': '{:,.2%}'.format,
    'expected_win_rate': '{:,.2%}'.format,
    'residual': '{:,.2%}'.format
})
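
The same binning idea can be tried on data where the answer is known in advance. The sketch below is not the analysis from this post: it generates synthetic bets whose outcomes are drawn from their stated probabilities (so the odds are perfectly calibrated by construction) and shows how large the residuals get from random noise alone. It uses only the standard library, fixed bins on [0, 1] rather than `pd.cut`, and a hypothetical `calibration_table` helper.

```python
import random
from collections import defaultdict


def calibration_table(bets, n_bins=10):
    """bets: iterable of (win_prob, won) pairs. Returns one dict per non-empty bin."""
    grouped = defaultdict(list)
    for prob, won in bets:
        idx = min(int(prob * n_bins), n_bins - 1)   # equal-width bins on [0, 1]
        grouped[idx].append((prob, won))
    rows = []
    for idx in sorted(grouped):
        probs = [p for p, _ in grouped[idx]]
        wins = [w for _, w in grouped[idx]]
        actual = sum(wins) / len(wins)        # share of bets in the bin that won
        expected = sum(probs) / len(probs)    # average stated probability in the bin
        rows.append({
            "bin": (idx / n_bins, (idx + 1) / n_bins),
            "actual_win_rate": actual,
            "expected_win_rate": expected,
            "residual": actual - expected,
            "count": len(wins),
        })
    return rows


# Synthetic, perfectly calibrated bets (NOT real NBA data)
random.seed(0)
bets = []
for _ in range(20_000):
    p = random.uniform(0.05, 0.95)
    bets.append((p, random.random() < p))

for row in calibration_table(bets):
    print(row["bin"], f"{row['residual']:+.3f}", row["count"])
```

Even with perfectly calibrated odds, the residuals are not exactly zero, and they get noisier as the number of bets in a bin shrinks. That is worth keeping in mind when reading the residuals of small bins.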

From there, we can take the difference between the actual win rates and expected win rates – which I’ll call the residual – and see if there are any large discrepancies. Here are the results when dividing all bets into 20 bins. Each bin covers an implied win probability interval of about 5 percentage points.

The bin column shows the probability interval in decimal form. The count column shows the total number of bets in each bin. Notice that the counts are symmetric across the middle row, with the exception of the (0.456, 0.5] interval having 12 games (24 bets) with exactly 50/50 odds.

It turns out that the implied win probabilities (and therefore the moneylines) are pretty accurate! In general, the actual and expected win rates don’t differ by more than 5 percentage points. However, there is a slight negative correlation between residual and expected win rate: huge underdogs appear to have been slightly underrated, while huge favorites have been slightly overrated.

Step 3: Simulating the strategy

Now, it’s time to put my (imaginary) money where my mouth is. In the last section, we found that huge underdogs might actually be slightly undervalued. What happens if we simulate betting on underdogs over the last three NBA seasons? We’ll backtest with actual game results and Pinnacle moneylines from the 2017-18 to 2019-20 seasons.

I wrote a function that simulates a betting strategy and tracks our winnings over time. We first must set a bet amount, which will be $100 every time for the sake of simplicity. We must also set a “win probability threshold,” which determines the underdog teams we’ll bet on. If we set it to 0.5, then we bet on any team with a win probability less than 50% (aka the underdog of every game). If we set it to 0.2, then we only bet on the big underdogs of lopsided games, where the win probability is less than 20%.

# Function that calculates the profit from a winning moneyline bet
def get_ml_bet_winnings(ml_odds, bet_amount):
    if ml_odds < 0:
        # Favorite: risk |ml_odds| to win 100
        returns = (100.0 / -ml_odds) * bet_amount
    else:
        # Underdog: risk 100 to win ml_odds
        returns = (bet_amount / 100.0) * ml_odds
    return returns

# Function that simulates a betting strategy
def simulate_bets(dataset_df=df_pin, bet_amount=100, win_prob_threshold=0.5):
    total_profit = 0
    all_winnings = []
    running_profits = []
    df_generator = dataset_df.iterrows()
    for (i, row1), (j, row2) in zip(df_generator, df_generator):
        ### 1. Determine who to bet on for this game
        t1_ml = row1['ml_PIN']
        t2_ml = row2['ml_PIN']
        t1_score = row1['score']
        t2_score = row2['score']
        t1_win_prob = row1['win_prob_norm_PIN']
        t2_win_prob = row2['win_prob_norm_PIN']
        # If any odds are missing, just skip the game
        if pd.isnull(t1_ml) or pd.isnull(t2_ml):
            continue
        if t1_win_prob < win_prob_threshold:
            placed_bet_on = 1
            bet_ml = t1_ml
        # Note: '<=' means an exactly even game (0.5 vs 0.5) is bet on team 2 at a 0.5 threshold
        elif t2_win_prob <= win_prob_threshold:
            placed_bet_on = 2
            bet_ml = t2_ml
        else:
            # Not a favorable bet. Skip to next game
            continue
        ### 2. Determine if we won or lost this bet
        if t1_score > t2_score:
            game_winner = 1
        elif t2_score > t1_score:
            game_winner = 2
        else:
            # Throw out any games missing scores (shouldn't be any)
            continue
        is_win = game_winner == placed_bet_on
        ### 3. Calculate winnings from this bet
        if is_win:
            # If our bet won, then add the payout to net profits
            winnings = get_ml_bet_winnings(bet_ml, bet_amount)
        else:
            # If our bet lost, then deduct the stake from net profits
            winnings = -bet_amount
        ### 4. Record winnings and update total profits
        total_profit += winnings
        all_winnings.append(winnings)
        running_profits.append(total_profit)
    return total_profit, all_winnings, running_profits

We’re ready to go now. What happens if we run the simulation with a threshold of 0.5, which places a $100 bet on the underdog of every game?

Oh no, we lost money! We lost $3,903 after three seasons, with an especially brutal stretch from bets 600 to 1,000.

It’s worth noting that the expected value of every bet is negative. As I mentioned earlier, the sportsbook takes a cut of every bet through the vigorish or “vig.” Pinnacle has a vig of about 2-3%, which is actually quite low. Any profitable strategy must beat the odds by enough to cover that cut!
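
One common way to gauge the sportsbook's cut is the "overround": the amount by which the two raw implied probabilities in a game exceed one. The sketch below is a rough illustration under that assumption, using made-up moneylines rather than actual Pinnacle lines; `implied_prob` and `overround` are hypothetical helper names.

```python
def implied_prob(ml):
    """Raw implied win probability from an American moneyline."""
    if ml < 0:
        return -ml / (-ml + 100.0)
    return 100.0 / (ml + 100.0)


def overround(ml_a, ml_b):
    """Amount by which the two raw implied probabilities exceed 1."""
    return implied_prob(ml_a) + implied_prob(ml_b) - 1.0


print(f"{overround(-200, 170):.2%}")    # 3.70%
print(f"{overround(-110, -110):.2%}")   # 4.76%
print(f"{implied_prob(-110):.1%}")      # 52.4%, the win rate needed to break even at -110
```

The raw implied probability of a line is also its break-even win rate: a bettor taking -110 on both sides of a coin flip needs to win about 52.4% of the time just to stay level.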

Next, let’s try the exact opposite strategy and bet on the favorite of every game. After a tiny tweak to my simulator function, we get the following results over time:

What an absolute nightmare! With this strategy, we lost $10,352. Comparing this graph with the previous one, we can see that the trends move in opposite directions (as they should), but the winnings are completely outweighed by the magnitude of the losses.

Finally, let’s test our hyped-up strategy of betting on huge underdogs. What happens if we run the simulation with a threshold of 0.2?

We make a total profit of $7,182!! Not bad at all!

One really important caveat is that we were down nearly $3,000 before the strategy turned profitable. In order to survive with this strategy, you will need to have a large bankroll and/or make small-sized bets. Otherwise, it’d be easy to go on a long losing streak and completely run out of cash.
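
To put a number on how deep a losing stretch gets, one option is to compute the maximum drawdown of the cumulative profit series (the kind of list that `simulate_bets` returns as `running_profits`). This is a minimal sketch: `max_drawdown` is a hypothetical helper, and the example numbers are made up rather than taken from the simulation.

```python
def max_drawdown(running_profits):
    """Largest peak-to-trough drop in cumulative profit, starting from a balance of 0."""
    peak = 0.0
    worst = 0.0
    for profit in running_profits:
        peak = max(peak, profit)            # highest balance seen so far
        worst = max(worst, peak - profit)   # deepest drop below that high
    return worst


# Made-up cumulative profits after each $100 bet (not results from this post)
example = [-100, -200, 555, 455, 355, 255, 1010]
print(max_drawdown(example))   # a $300 drop, from the 555 peak down to 255
```

A bankroll smaller than the worst drawdown would have been wiped out before the strategy recovered, which is one way to decide how small each bet needs to be.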

If you’d like to check out my complete Jupyter notebook, you can find that here.

Conclusion

Based on this analysis, betting on big-time underdogs may be a profitable strategy. You could think of every bet as a lottery ticket with a high likelihood of losing but a large upside.

I do think it’s feasible that these underdogs are relatively “underpriced” while the heavy favorites are “overpriced.” There could be a psychological explanation: many bettors might prefer to win more often, even at the cost of long-term monetary returns.

In summary, this underdog strategy requires patiently enduring long losing streaks, placing small-sized bets, and having a big enough bankroll. However, if the bottom-feeders of the NBA can pull off enough rare Ws, you just might be able to make cash.

And by the way, the Knicks already beat the Bucks this season, despite having a 12% implied win probability! Perhaps a sign of things to come.


Thank you for taking the time to read my article! Please keep in mind that there are disclaimers that come with my findings:

  1. This post is meant for cultivating gambling and data science knowledge. Use this info at your own risk!
  2. I have never wagered any of my own, real-life money using this strategy (yet).
  3. Despite hearing very positive reviews, I have never used Pinnacle.
  4. Even though it’s useful for simulations, backtesting is by no means a guarantee for future outcomes.
  5. The SBR website is missing odds data entirely from 1/4/18 to 1/18/18, so there are about two weeks of games that aren’t considered in my analysis.
