# Comparing Home and Away Wins of Kenny Dalglish’s Managerial Runs (Also, Did It Matter?)

After my first post, @HighlandDataSci on Twitter had a great question: Was Kenny Dalglish’s home and away win odds ratio different during his first run as manager than his second?

About those circumstances: Dalglish lead Liverpool to 10 trophies between 1985 and 1991. He returned in 2011 and won the League Cup during his short run of games, but his mission in the end was to guide the club between the managerial run of Roy Hodsgon (who managed too few games for me to justify including in the analysis) and that of Brendan Rodgers.

In this analysis we’ll compare the home and away win odds of Dalglish’s two seasons, but we’ll also start exploring the question “What does all this mean for winning games?”

# Setting Up the Analysis

For the sake of brevity, I’ll refer readers to my first post to look at the code I used to build the initial Liverpool managers dataset. We’ll be using the `engsoccerdata` package and a few others.

``````#install_github("jalapic/engsoccerdata")
library(engsoccerdata)
library(tidyverse)
library(lubridate)
library(stringr)
library(rvest)
library(knitr)``````

We’ll be starting with the same dataset I used in the first post, which we assigned to the variable `lfc`. We’ll narrow it down to only Kenny Dalglish’s games and split his games up so we can look at his two runs separately.

``````dalglish <- lfc %>%
# Add a column for wins
mutate(win = 0,
win = ifelse(home == "Liverpool" & result == "H", 1, win),
win = ifelse(visitor == "Liverpool" & result == "A", 1, win),
res = "",
res = ifelse(home == "Liverpool" & result == "H", 2, res),
res = ifelse(visitor == "Liverpool" & result == "A", 2, res),
res = ifelse(home == "Liverpool" & result == "A", 0, res),
res = ifelse(visitor == "Liverpool" & result == "H", 0, res),
res = ifelse(result == "D", 1, res)) %>%
select(Date, mngr, at_anf, win, res, hgoal, vgoal, goaldif) %>%
filter(mngr == "Dalglish") %>%
# Split into first and second run
mutate(run = "",
run = ifelse(Date >= ymd(19850817) & Date <= ymd(19910209),
"first",
"second")) ``````

# Comparing Odds Ratios

When we calculate the ratio of home odds to away odds for each of Dalglish’s seasons, we see that his home/away win odds ratio was slightly higher during his first run than it was during his second. This means that in his first run, his home win odds were 2.5 times bigger than his away win odds. In his second run, his odds of winning at home were the same as his odds of winning away.

``````dalglish_odds <- dalglish %>%
# Add a column for wins
group_by(mngr, run) %>%
# Add columns for home and away win totals and total games
summarise(total_hwins = sum(at_anf == "Anfield" & win == 1),
total_hgames = sum(at_anf == "Anfield"),
total_awins = sum(at_anf != "Anfield" & win == 1),
total_agames = sum(at_anf != "Anfield"),
total_games = total_hgames + total_agames) %>%
ungroup() %>%
# Calculate the the odds ratios
mutate(h_odds = (total_hwins / total_hgames) / (1 - total_hwins / total_hgames),
a_odds = (total_awins / total_agames) / (1 - total_awins / total_agames),
odds_ratio = round(h_odds / a_odds, 2)) %>%
select(mngr, run, total_games, odds_ratio)

kable(dalglish_odds, caption = "Dalglish's Home and Away Wins Odds Ratio")``````
Table 1: Dalglish’s Home and Away Wins Odds Ratio
mngr run total_games odds_ratio
Dalglish first 224 2.5
Dalglish second 56 1.0

Before looking at these numbers, I guessed that a longer run of games and a much different era of winning for Liverpool would require a more even performance between home and away games. We can see that the first condition is definitely there. King Kenny’s first run as Liverpool manager was for 224 games, compared to his second run of only 56. You’d expect that with a larger bucket of games to examine, we’d begin to see the randomness shake out to reveal the odds ratio that Dalglish eventually settled on. So was the second odds ratio a result of too few games for Dalglish to settle in? Let’s look at that by plotting a game by game cumulative odds ratio for each managerial run.

We’ll create the plot by adding a column of game numbers for each run and then calculating the home/away win odds ratio as it changed game by game. Then we’ll plot the two running odds ratios together so we can compare.

``````# Add running totals and odds ratios to dataset

odds_ratio <- function(h, tot_h, a, tot_a) {
# Calculates odd ratios
#
# Args
#   h: home wins
#   tot_h: total home games
#   a: away wins
#   tot_a: total away games
((h/tot_h) / (1 - h/tot_a)) / ((a/tot_a) / (1 - a/tot_a))
}

running_odds <- dalglish %>%
group_by(run) %>%
mutate(run_hwin = ifelse(at_anf == "Anfield" & win == 1, 1, 0),
# Running total of home wins
run_hwin = cumsum(run_hwin),
# Running total of home games
run_hgames = ifelse(at_anf == "Anfield", 1, 0),
run_hgames = cumsum(run_hgames),
run_awin = ifelse(at_anf == "Away" & win == 1, 1, 0),
run_awin = cumsum(run_awin),
run_agames = ifelse(at_anf == "Away", 1, 0),
run_agames = cumsum(run_agames),
run_odds_ratio = odds_ratio(
run_hwin, run_hgames, run_awin, run_agames
)) %>%
ungroup() %>%
split(.\$run) %>%
# Number the games of each run
map(~mutate(., game = seq(1:nrow(.)))) %>%
bind_rows() %>%
# Remove NaNs caused by 0 home and away games played
filter(!is.nan(run_odds_ratio))``````
``````# Make the plot

# Parameters for annotate
first <- running_odds %>%
filter(run == "first") %>%
filter(row_number() == nrow(.))

second <- running_odds %>%
filter(run == "second") %>%
filter(row_number() == nrow(.))

ggplot(data = running_odds, aes(x = game, y = run_odds_ratio, color = run)) +
geom_line(size = .75, alpha = .5) +
scale_color_manual(values = c("red", "purple"),
label = c("1985-1991", "2011-2012")) +
annotate("text",
x = nrow(filter(running_odds, run == "first")),
y = 0,
label = first\$run_odds_ratio,
color = "red",
size = 5) +
annotate("text",
x = nrow(filter(running_odds, run == "second")),
y = -2,
label = second\$run_odds_ratio,
color = "purple",
size = 5) +
labs(title = "Home/Away Win Odds Ratio Stabilized With More Games",
subtitle = "Dalglish's Running Odds Ratio",
x = "Number of Games",
y = "Running Odds Ratio",
color = "")`````` The odds ratio jumps up and down quite a bit before 25 games before it starts to settle down, but it looks like the odds ratio flattened out earlier during the second run than on the first. There’s no way of knowing for sure how Dalglish’s second run would have progressed beyond the fifty or so games that he managed, but what is striking is how consistent he was in both of these seasons compared to other managers. By comparison, Bill Shankly’s odds of winning a game at home were almost five times the odds of winning away, and he managed Liverpool for 611 games.

### Quick Detour: How The Number of Games Affects the Odds

About 6 percent of the odds ratio values in the Dalglish dataset came out as infinity and non-numbers, so I did some exploring to understand why that was. I’ll share that here, but feel free to skip this part if you’re not interested.

It makes sense that the odds ratio jumps up and down a lot in the first few games, since we know that when the total home and away games are low each additional win makes a bigger difference. Here are some odds calculations that compare the value of 1 win out of 2 games with the value of 1 win out of 100 games. If you play less games, every additional win tends to shake the odds ratio up more than if you are on a longer run of games.

``````# Value of 1 win if you've played 2 games
(1/2) / (1 - 1/2) ``````
``##  1``
``````# Value of 1 win of you've played 100 games
(1/100) / (1 - 1/100)``````
``##  0.01010101``

# But What Really Matters Here? (aka, Ryan overthinks it again)

I was chatting with a friend and fellow Liverpool supporter about Klopp’s superb consistency of home odds to away odds. At the time of writing, his odds of winning at home are about the same as winning away over about 50 games. But it struck me later that the discussion may be missing the point. Evaluating a manager using their relative difference between home win odds and away win odds is a bit like comparing highway and city gas mileage in cars: you can find a car that is the most consistent between the two efficiency measurements, but does that necessarily mean that car is getting you the most miles for a gallon of gas?

Football is about winning games, so let’s examine the managers that had the highest percentage of wins to see if they were also the most consistent home and away winners.

``````wins <- lfc %>%
# Add a column for wins
mutate(win = 0,
win = ifelse(home == "Liverpool" & result == "H", 1, win),
win = ifelse(visitor == "Liverpool" & result == "A", 1, win)) %>%
select(mngr, at_anf, win) %>%
group_by(mngr) %>%
# Add columns for home and away win totals and total games
summarise(total_hwins = sum(at_anf == "Anfield" & win == 1),
total_hgames = sum(at_anf == "Anfield"),
total_awins = sum(at_anf != "Anfield" & win == 1),
total_agames = sum(at_anf != "Anfield"),
total_games = total_hgames + total_agames,
win_perc = round(
100 * ((total_hwins + total_awins) / total_games), 2)
) %>%
ungroup() %>%
# Calculate the the odds ratios
mutate(h_odds = (total_hwins / total_hgames) / (1 - total_hwins / total_hgames),
a_odds = (total_awins / total_agames) / (1 - total_awins / total_agames),
odds_ratio = round(h_odds / a_odds, 2)) %>%
select(mngr, total_games, odds_ratio, win_perc) %>%
filter(total_games >= 30) %>%
arrange(desc(win_perc))

kable(head(wins, n = 5), caption = "Top Five Win Percentages")``````
Table 2: Top Five Win Percentages
mngr total_games odds_ratio win_perc
Barclay 88 5.40 57.95
Dalglish 280 2.03 57.14
Paisley 375 4.20 56.00
Benítez 228 2.98 55.26
Ashworth 136 3.85 52.94

The five managers with the best win percentage do not all share home and away consistency. Though it’s worth noting that none of them had the lopsidedness of Graeme Souness, who had home win odds approaching 8 times that of his away win odds and finished his managerial run winning 41 percent of his games, fifth lowest on our win percentage list. Are the odds ratios just as varied among managers who didn’t win as many games overall? We can plot all the managers in the dataset to see how spread out the big winners were along an axis of home/win odds ratios.

``````top_5 <- head(wins, n = 5)

ggplot(data = wins, aes(x = odds_ratio, y = win_perc)) +
geom_point(aes(size = total_games), alpha = .5) +
geom_point(data = top_5, color = "red", aes(size = total_games)) +
geom_text(data = wins, aes(label = mngr), nudge_x = .6, check_overlap = T) +
labs(title = "Win Percentage Varies Across Home/Away Win Odds Ratio",
subtitle = "Liverpool Top Flight League Games 1893 - 2017",
x = "home/away win odds ratio",
y = "win percentage")`````` Pretty spread out! Dalglish’s odds ratio was about as consistent as Fagan, Rodgers, and Houllier over both his runs, but he had a higher win percentage. On the other hand, Barclay’s win percentage was comparable to Dalglish’s but he somehow managed a home/away odds ratio almost three times that of Dalglish’s. This is all a long way of saying that there are more ways to get a high win percentage than being just as good at home as you are away. ##### Ryan Estrellado
###### Executive Consultant at South County SELPA, Educator, and Data Scientist

I work at the San Diego South County SELPA, where I design solutions to help SELPAs build equitable systems for students with disabilities.