Statistical association football predictions


Post by UnyPlewly » 05.02.2021 00:56

Predicting Football Results With Statistical Modelling.
Combining the world’s most popular sport with everyone’s favourite discrete probability distribution, this post predicts football matches using the Poisson distribution.
David Sheehan.
Data scientist interested in sports, politics and Simpsons references.
Football (or soccer to my American readers) is full of clichés: “It’s a game of two halves”, “taking it one game at a time” and “Liverpool have failed to win the Premier League”. You’re less likely to hear “Treating the number of goals scored by each team as independent Poisson processes, statistical modelling suggests that the home team have a 60% chance of winning today”. But this is actually a bit of a cliché too (it has been discussed here, here, here, here and particularly well here). As we’ll discover, a simple Poisson model is, well, overly simplistic. But it’s a good starting point and a nice intuitive way to learn about statistical modelling. So, if you came here looking to make money, I hear this guy makes £5000 per month without leaving the house.
Poisson Distribution.
   HomeTeam        AwayTeam    HomeGoals  AwayGoals
0  Burnley         Swansea             0          1
1  Crystal Palace  West Brom           0          1
2  Everton         Tottenham           1          1
3  Hull            Leicester           2          1
4  Man City        Sunderland          2          1
We imported a CSV as a pandas dataframe, which contains various information for each of the 380 games in the 2016-17 English Premier League (EPL) season. We restricted the dataframe to the columns in which we’re interested (specifically, team names and number of goals scored by each team). I’ll omit most of the code that produces the graphs in this post. But don’t worry, you can find that code on my GitHub page. Our task is to model the final round of fixtures in the season, so we must remove the last 10 rows (each gameweek consists of 10 matches).
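A minimal sketch of that preparation step, assuming the raw file uses `FTHG` and `FTAG` as the full-time home and away goal columns (an assumption; the post does not show the raw headers). The helper `prepare_matches`, the `Extra` column and the tiny in-memory CSV are illustrative stand-ins, not the author's code; only the five fixtures come from the table above.

```python
import io

import pandas as pd

# Stand-in for the season CSV. Column names FTHG/FTAG and the "Extra"
# column are assumptions made for this sketch.
RAW_CSV = """HomeTeam,AwayTeam,FTHG,FTAG,Extra
Burnley,Swansea,0,1,x
Crystal Palace,West Brom,0,1,x
Everton,Tottenham,1,1,x
Hull,Leicester,2,1,x
Man City,Sunderland,2,1,x
"""


def prepare_matches(raw: pd.DataFrame, holdout: int = 10) -> pd.DataFrame:
    """Keep team names and goals, then drop the last `holdout` fixtures."""
    cols = ["HomeTeam", "AwayTeam", "FTHG", "FTAG"]
    tidy = raw[cols].rename(columns={"FTHG": "HomeGoals", "FTAG": "AwayGoals"})
    # The trailing rows are the fixtures we want to predict, so they must
    # not be part of the data used to fit the model.
    return tidy.iloc[:-holdout] if holdout > 0 else tidy


raw = pd.read_csv(io.StringIO(RAW_CSV))
train = prepare_matches(raw, holdout=1)  # toy data: hold out a single row
print(train)
print(train[["HomeGoals", "AwayGoals"]].mean())
```

With the full season file, `holdout=10` would correspond to removing the final gameweek.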
You’ll notice that, on average, the home team scores more goals than the away team. This is the so-called ‘home (field) advantage’ (discussed here) and isn’t specific to soccer. This is a convenient time to introduce the Poisson distribution. It’s a discrete probability distribution that describes the probability of the number of events within a specific time period (e.g. 90 mins) with a known average rate of occurrence. A key assumption is that the number of events is independent of time. In our context, this means that goals don’t become more/less likely by the number of goals already scored in the match. Instead, the number of goals is expressed purely as a function of an average rate of goals. If that was unclear, maybe this mathematical formulation will make it clearer:
$$P(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad \lambda > 0$$
Here $\lambda$ represents the average rate (e.g. average number of goals, average number of letters you receive, etc.). So, we can treat the number of goals scored by the home and away team as two independent Poisson distributions. The plot below shows the proportion of goals scored compared to the number of goals estimated by the corresponding Poisson distributions.
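To make the distribution concrete, here is a small sketch that evaluates the Poisson probability mass function, P(x) = λ^x e^(-λ) / x!, using only the standard library. The rate of 1.6 goals per game is an illustrative value chosen for this example, not a figure taken from the post.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the average rate is lam."""
    return lam ** k * exp(-lam) / factorial(k)


# Illustrative rate only; in practice this would be the observed
# mean number of goals per game for the side in question.
home_rate = 1.6
for goals in range(6):
    print(goals, round(poisson_pmf(goals, home_rate), 3))
```

The probabilities over all possible goal counts sum to one, and the chance of a scoreless side is simply e^(-λ).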
We can use this statistical model to estimate the probability of specific events.
The probability of a draw is simply the sum of the probabilities of the events where the two teams score the same number of goals.
Note that we consider the number of goals scored by each team to be independent events (i.e. P(A ∩ B) = P(A) P(B)). The difference of two independent Poisson-distributed variables follows what is called a Skellam distribution. So we can calculate the probability of a draw by inputting the mean goal values into this distribution.
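The calculation described here can be sketched by summing joint probabilities over a grid of scorelines. The rates 1.6 and 1.2 are illustrative values, and the cut-off of 20 goals per side is an assumption made to truncate the infinite sum (the neglected tail is negligible at these rates). The function name `outcome_probabilities` is hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the average rate is lam."""
    return lam ** k * exp(-lam) / factorial(k)


def outcome_probabilities(home_rate: float, away_rate: float, max_goals: int = 20):
    """Return (home win, draw, away win) under independent Poisson goals."""
    home = [poisson_pmf(i, home_rate) for i in range(max_goals + 1)]
    away = [poisson_pmf(j, away_rate) for j in range(max_goals + 1)]
    win = draw = loss = 0.0
    for i, p_home in enumerate(home):
        for j, p_away in enumerate(away):
            joint = p_home * p_away  # independence: P(A and B) = P(A) P(B)
            if i > j:
                win += joint
            elif i == j:
                draw += joint
            else:
                loss += joint
    return win, draw, loss


win, draw, loss = outcome_probabilities(1.6, 1.2)  # illustrative rates
print(f"home win {win:.3f}, draw {draw:.3f}, away win {loss:.3f}")
```

If SciPy is available, `scipy.stats.skellam.pmf(0, 1.6, 1.2)` should agree with the `draw` value, since a draw is the event that the goal difference equals zero.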
So, hopefully you can see how we can adapt this approach to model specific matches. We just need to know the average number of goals scored by each team and feed this data into a Poisson model. Let’s have a look at the distribution of goals scored by Chelsea and Sunderland (teams who finished 1st and last, respectively).
Building A Model.
You should now be convinced that the number of goals scored by each team can be approximated by a Poisson distribution. Due to a relatively small sample size (each team plays at most 19 home/away games), the accuracy of this approximation can vary significantly (especially earlier in the season when teams have played fewer games). Similar to before, we could now calculate the probability of various events in this Chelsea-Sunderland match. But rather than treat each match separately, we’ll build a more general Poisson regression model (what is that?).
Generalized Linear Model Regression Results

Dep. Variable:    goals               No. Observations:   740
Model:            GLM                 Df Residuals:       700
Model Family:     Poisson             Df Model:           39
Link Function:    log                 Scale:              1.0
Method:           IRLS                Log-Likelihood:     -1042.4
Date:             Sat, 10 Jun 2017    Deviance:           776.11
Time:             11:17:38            Pearson chi2:       659.
No. Iterations:   8

                                coef  std err        z    P>|z|  [95.0% Conf. Int.]
Intercept                     0.3725    0.198    1.880    0.060   -0.016    0.761
team[T.Bournemouth]          -0.2891    0.179   -1.612    0.107   -0.641    0.062
team[T.Burnley]              -0.6458    0.200   -3.230    0.001   -1.038   -0.254
team[T.Chelsea]               0.0789    0.162    0.488    0.626   -0.238    0.396
team[T.Crystal Palace]       -0.3865    0.183   -2.107    0.035   -0.746   -0.027
team[T.Everton]              -0.2008    0.173   -1.161    0.246   -0.540    0.138
team[T.Hull]                 -0.7006    0.204   -3.441    0.001   -1.100   -0.302
team[T.Leicester]            -0.4204    0.187   -2.249    0.025   -0.787   -0.054
team[T.Liverpool]             0.0162    0.164    0.099    0.921   -0.306    0.338
team[T.Man City]              0.0117    0.164    0.072    0.943   -0.310    0.334
team[T.Man United]           -0.3572    0.181   -1.971    0.049   -0.713   -0.002
team[T.Middlesbrough]        -1.0087    0.225   -4.481    0.000   -1.450   -0.568
team[T.Southampton]          -0.5804    0.195   -2.976    0.003   -0.963   -0.198
team[T.Stoke]                -0.6082    0.197   -3.094    0.002   -0.994   -0.223
team[T.Sunderland]           -0.9619    0.222   -4.329    0.000   -1.397   -0.526
team[T.Swansea]              -0.5136    0.192   -2.673    0.008   -0.890   -0.137
team[T.Tottenham]             0.0532    0.162    0.328    0.743   -0.265    0.371
team[T.Watford]              -0.5969    0.197   -3.035    0.002   -0.982   -0.211
team[T.West Brom]            -0.5567    0.194   -2.876    0.004   -0.936   -0.177
team[T.West Ham]             -0.4802    0.189   -2.535    0.011   -0.851   -0.109
opponent[T.Bournemouth]       0.4109    0.196    2.092    0.036    0.026    0.796
opponent[T.Burnley]           0.1657    0.206    0.806    0.420   -0.237    0.569
opponent[T.Chelsea]          -0.3036    0.234   -1.298    0.194   -0.762    0.155
opponent[T.Crystal Palace]    0.3287    0.200    1.647    0.100   -0.062    0.720
opponent[T.Everton]          -0.0442    0.218   -0.202    0.840   -0.472    0.384
opponent[T.Hull]              0.4979    0.193    2.585    0.010    0.120    0.875
opponent[T.Leicester]         0.3369    0.199    1.694    0.090   -0.053    0.727
opponent[T.Liverpool]        -0.0374    0.217   -0.172    0.863   -0.463    0.389
opponent[T.Man City]         -0.0993    0.222   -0.448    0.654   -0.534    0.335
opponent[T.Man United]       -0.4220    0.241   -1.754    0.079   -0.894    0.050
opponent[T.Middlesbrough]     0.1196    0.208    0.574    0.566   -0.289    0.528
opponent[T.Southampton]       0.0458    0.211    0.217    0.828   -0.369    0.460
opponent[T.Stoke]             0.2266    0.203    1.115    0.265   -0.172    0.625
opponent[T.Sunderland]        0.3707    0.198    1.876    0.061   -0.017    0.758
opponent[T.Swansea]           0.4336    0.195    2.227    0.026    0.052    0.815
opponent[T.Tottenham]        -0.5431    0.252   -2.156    0.031   -1.037   -0.049
opponent[T.Watford]           0.3533    0.198    1.782    0.075   -0.035    0.742
opponent[T.West Brom]         0.0970    0.209    0.463    0.643   -0.313    0.507
opponent[T.West Ham]          0.3485    0.198    1.758    0.079   -0.040    0.737
home                          0.2969    0.063    4.702    0.000    0.173    0.421
If you’re curious about the smf.glm(...) part, you can find more information here (edit: earlier versions of this post had erroneously employed a Generalised Estimating Equation (GEE); what’s the difference?). I’m more interested in the values presented in the coef column in the model summary table, which are analogous to the slopes in linear regression. Similar to logistic regression, we take the exponent of the parameter values. A positive value implies more goals (e^x > 1 for all x > 0), while values closer to zero represent more neutral effects (e^0 = 1). Towards the bottom of the table you might notice that home has a coef of 0.2969. This captures the fact that home teams generally score more goals than the away team (specifically, e^0.2969 ≈ 1.35 times as many). But not all teams are created equal. Chelsea has a coef of 0.0789, while the corresponding value for Sunderland is -0.9619 (sort of saying Chelsea (Sunderland) are better (much worse!) scorers than the baseline team, Arsenal, which is the one EPL side absent from the table because its effect is absorbed into the intercept). Finally, the opponent* values penalize/reward teams based on the quality of the opposition. This reflects the defensive strength of each team (Chelsea: -0.3036; Sunderland: 0.3707). In other words, you’re less likely to score against Chelsea. Hopefully, that all makes both statistical and intuitive sense.
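As a quick check on that interpretation, the coefficients quoted above can be exponentiated to turn them into multiplicative effects on the expected number of goals. This is a sketch using only values copied from the summary table; the dictionary layout is just a convenience for this example.

```python
from math import exp

# Coefficients copied from the model summary in the post.
coefs = {
    "home": 0.2969,
    "team[T.Chelsea]": 0.0789,
    "team[T.Sunderland]": -0.9619,
    "opponent[T.Chelsea]": -0.3036,
    "opponent[T.Sunderland]": 0.3707,
}

# With a log link, exp(coef) is the factor by which expected goals are scaled.
multipliers = {name: exp(value) for name, value in coefs.items()}

for name, factor in multipliers.items():
    print(f"{name:24s} x{factor:.2f}")
```

A factor above 1 scales expected goals up and a factor below 1 scales them down, so the home multiplier of roughly 1.35 is the home advantage expressed as a ratio.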
Let’s start making some predictions for the upcoming matches. We simply pass our teams into poisson_model and it’ll return the expected average number of goals for that team (we need to run it twice, since we calculate the expected average number of goals for each team separately). So let’s see how many goals we expect Chelsea and Sunderland to score.
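The prediction itself is just the exponential of a sum of coefficients, so it can be reproduced by hand from the summary table. The sketch below does that for this fixture, assuming Chelsea are the home side (an assumption; the post does not state the venue). The helper `expected_goals` is hypothetical and covers only the two teams whose coefficients are discussed in the text.

```python
from math import exp

# Values copied from the model summary in the post.
INTERCEPT = 0.3725
HOME = 0.2969
TEAM = {"Chelsea": 0.0789, "Sunderland": -0.9619}
OPPONENT = {"Chelsea": -0.3036, "Sunderland": 0.3707}


def expected_goals(team: str, opponent: str, at_home: bool) -> float:
    """Expected goals for `team` against `opponent` under the log-link model."""
    linear = INTERCEPT + TEAM[team] + OPPONENT[opponent]
    if at_home:
        linear += HOME
    return exp(linear)


# Assumes Chelsea host Sunderland.
chelsea = expected_goals("Chelsea", "Sunderland", at_home=True)
sunderland = expected_goals("Sunderland", "Chelsea", at_home=False)
print(f"Chelsea: {chelsea:.2f}, Sunderland: {sunderland:.2f}")
```

Under these coefficients Chelsea would be expected to score roughly three goals and Sunderland well under one; those two rates can then be fed into independent Poisson distributions to get scoreline probabilities.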


Predicting Football Results With Statistical Modelling.
Combining the world’s most popular sport with everyone’s favourite discrete probability distribution, this post predicts football matches using the Poisson distribution.
David Sheehan.
Data scientist interested in sports, politics and Simpsons references.
Football (or soccer to my American readers) is full of clichés: “It’s a game of two halves”, “taking it one game at a time” and “Liverpool have failed to win the Premier League”. You’re less likely to hear “Treating the number of goals scored by each team as independent Poisson processes, statistical modelling suggests that the home team have a 60% chance of winning today”. But this is actually a bit of cliché too (it has been discussed here, here, here, here and particularly well here). As we’ll discover, a simple Poisson model is, well, overly simplistic. But it’s a good starting point and a nice intuitive way to learn about statistical modelling. So, if you came here looking to make money, I hear this guy makes £5000 per month without leaving the house.
Poisson Distribution.
HomeTeam AwayTeam HomeGoals AwayGoals 0 Burnley Swansea 0 1 1 Crystal Palace West Brom 0 1 2 Everton Tottenham 1 1 3 Hull Leicester 2 1 4 Man City Sunderland 2 1.
We imported a csv as a pandas dataframe, which contains various information for each of the 380 EPL games in the 2016-17 English Premier League season. We restricted the dataframe to the columns in which we’re interested (specifically, team names and numer of goals scored by each team). I’ll omit most of the code that produces the graphs in this post. But don’t worry, you can find that code on my github page. Our task is to model the final round of fixtures in the season, so we must remove the last 10 rows (each gameweek consists of 10 matches).
You’ll notice that, on average, the home team scores more goals than the away team. This is the so called ‘home (field) advantage’ (discussed here) and isn’t specific to soccer. This is a convenient time to introduce the Poisson distribution. It’s a discrete probability distribution that describes the probability of the number of events within a specific time period (e.g 90 mins) with a known average rate of occurrence. A key assumption is that the number of events is independent of time. In our context, this means that goals don’t become more/less likely by the number of goals already scored in the match. Instead, the number of goals is expressed purely as function an average rate of goals. If that was unclear, maybe this mathematical formulation will make clearer:
represents the average rate (e.g. average number of goals, average number of letters you receive, etc.). So, we can treat the number of goals scored by the home and away team as two independent Poisson distributions. The plot below shows the proportion of goals scored compared to the number of goals estimated by the corresponding Poisson distributions.
We can use this statistical model to estimate the probability of specfic events.
The probability of a draw is simply the sum of the events where the two teams score the same amount of goals.
Note that we consider the number of goals scored by each team to be independent events (i.e. P(A n B) = P(A) P(B)). The difference of two Poisson distribution is actually called a Skellam distribution. So we can calculate the probability of a draw by inputting the mean goal values into this distribution.
So, hopefully you can see how we can adapt this approach to model specific matches. We just need to know the average number of goals scored by each team and feed this data into a Poisson model. Let’s have a look at the distribution of goals scored by Chelsea and Sunderland (teams who finished 1st and last, respectively).
Building A Model.
You should now be convinced that the number of goals scored by each team can be approximated by a Poisson distribution. Due to a relatively sample size (each team plays at most 19 home/away games), the accuracy of this approximation can vary significantly (especially earlier in the season when teams have played fewer games). Similar to before, we could now calculate the probability of various events in this Chelsea Sunderland match. But rather than treat each match separately, we’ll build a more general Poisson regression model (what is that?).
Generalized Linear Model Regression Results Dep. Variable: goals No. Observations: 740 Model: GLM Df Residuals: 700 Model Family: Poisson Df Model: 39 Link Function: log Scale: 1.0 Method: IRLS Log-Likelihood: -1042.4 Date: Sat, 10 Jun 2017 Deviance: 776.11 Time: 11:17:38 Pearson chi2: 659. No. Iterations: 8 coef std err z P>|z| [95.0% Conf. Int.] Intercept 0.3725 0.198 1.880 0.060 -0.016 0.761 team[T.Bournemouth] -0.2891 0.179 -1.612 0.107 -0.641 0.062 team[T.Burnley] -0.6458 0.200 -3.230 0.001 -1.038 -0.254 team[T.Chelsea] 0.0789 0.162 0.488 0.626 -0.238 0.396 team[T.Crystal Palace] -0.3865 0.183 -2.107 0.035 -0.746 -0.027 team[T.Everton] -0.2008 0.173 -1.161 0.246 -0.540 0.138 team[T.Hull] -0.7006 0.204 -3.441 0.001 -1.100 -0.302 team[T.Leicester] -0.4204 0.187 -2.249 0.025 -0.787 -0.054 team[T.Liverpool] 0.0162 0.164 0.099 0.921 -0.306 0.338 team[T.Man City] 0.0117 0.164 0.072 0.943 -0.310 0.334 team[T.Man United] -0.3572 0.181 -1.971 0.049 -0.713 -0.002 team[T.Middlesbrough] -1.0087 0.225 -4.481 0.000 -1.450 -0.568 team[T.Southampton] -0.5804 0.195 -2.976 0.003 -0.963 -0.198 team[T.Stoke] -0.6082 0.197 -3.094 0.002 -0.994 -0.223 team[T.Sunderland] -0.9619 0.222 -4.329 0.000 -1.397 -0.526 team[T.Swansea] -0.5136 0.192 -2.673 0.008 -0.890 -0.137 team[T.Tottenham] 0.0532 0.162 0.328 0.743 -0.265 0.371 team[T.Watford] -0.5969 0.197 -3.035 0.002 -0.982 -0.211 team[T.West Brom] -0.5567 0.194 -2.876 0.004 -0.936 -0.177 team[T.West Ham] -0.4802 0.189 -2.535 0.011 -0.851 -0.109 opponent[T.Bournemouth] 0.4109 0.196 2.092 0.036 0.026 0.796 opponent[T.Burnley] 0.1657 0.206 0.806 0.420 -0.237 0.569 opponent[T.Chelsea] -0.3036 0.234 -1.298 0.194 -0.762 0.155 opponent[T.Crystal Palace] 0.3287 0.200 1.647 0.100 -0.062 0.720 opponent[T.Everton] -0.0442 0.218 -0.202 0.840 -0.472 0.384 opponent[T.Hull] 0.4979 0.193 2.585 0.010 0.120 0.875 opponent[T.Leicester] 0.3369 0.199 1.694 0.090 -0.053 0.727 opponent[T.Liverpool] -0.0374 0.217 -0.172 0.863 -0.463 0.389 opponent[T.Man City] 
-0.0993 0.222 -0.448 0.654 -0.534 0.335 opponent[T.Man United] -0.4220 0.241 -1.754 0.079 -0.894 0.050 opponent[T.Middlesbrough] 0.1196 0.208 0.574 0.566 -0.289 0.528 opponent[T.Southampton] 0.0458 0.211 0.217 0.828 -0.369 0.460 opponent[T.Stoke] 0.2266 0.203 1.115 0.265 -0.172 0.625 opponent[T.Sunderland] 0.3707 0.198 1.876 0.061 -0.017 0.758 opponent[T.Swansea] 0.4336 0.195 2.227 0.026 0.052 0.815 opponent[T.Tottenham] -0.5431 0.252 -2.156 0.031 -1.037 -0.049 opponent[T.Watford] 0.3533 0.198 1.782 0.075 -0.035 0.742 opponent[T.West Brom] 0.0970 0.209 0.463 0.643 -0.313 0.507 opponent[T.West Ham] 0.3485 0.198 1.758 0.079 -0.040 0.737 home 0.2969 0.063 4.702 0.000 0.173 0.421.
If you’re curious about the smf.glm(. ) part, you can find more information here (edit: earlier versions of this post had erroneously employed a Generalised Estimating Equation (GEE)- what’s the difference?). I’m more interested in the values presented in the coef column in the model summary table, which are analogous to the slopes in linear regression. Similar to logistic regression, we take the exponent of the parameter values. A positive value implies more goals (), while values closer to zero represent more neutral effects (). Towards the bottom of the table you might notice that home has a coef of 0.2969. This captures the fact that home teams generally score more goals than the away team (specifically, =1.35 times more likely). But not all teams are created equal. Chelsea has a coef of 0.0789, while the corresponding value for Sunderland is -0.9619 (sort of saying Chelsea (Sunderland) are better (much worse!) scorers than average). Finally, the opponent* values penalize/reward teams based on the quality of the opposition. This relfects the defensive strength of each team (Chelsea: -0.3036; Sunderland: 0.3707). In other words, you’re less likely to score against Chelsea. Hopefully, that all makes both statistical and intuitive sense.
Let’s start making some predictions for the upcoming matches. We simply pass our teams into poisson_model and it’ll return the expected average number of goals for that team (we need to run it twice- we calculate the expected average number of goals for each team separately). So let’s see how many goals we expect Chelsea and Sunderland to score.


Predicting Football Results With Statistical Modelling.
Combining the world’s most popular sport with everyone’s favourite discrete probability distribution, this post predicts football matches using the Poisson distribution.
David Sheehan.
Data scientist interested in sports, politics and Simpsons references.
Football (or soccer to my American readers) is full of clichés: “It’s a game of two halves”, “taking it one game at a time” and “Liverpool have failed to win the Premier League”. You’re less likely to hear “Treating the number of goals scored by each team as independent Poisson processes, statistical modelling suggests that the home team have a 60% chance of winning today”. But this is actually a bit of cliché too (it has been discussed here, here, here, here and particularly well here). As we’ll discover, a simple Poisson model is, well, overly simplistic. But it’s a good starting point and a nice intuitive way to learn about statistical modelling. So, if you came here looking to make money, I hear this guy makes £5000 per month without leaving the house.
Poisson Distribution.
HomeTeam AwayTeam HomeGoals AwayGoals 0 Burnley Swansea 0 1 1 Crystal Palace West Brom 0 1 2 Everton Tottenham 1 1 3 Hull Leicester 2 1 4 Man City Sunderland 2 1.
We imported a csv as a pandas dataframe, which contains various information for each of the 380 EPL games in the 2016-17 English Premier League season. We restricted the dataframe to the columns in which we’re interested (specifically, team names and numer of goals scored by each team). I’ll omit most of the code that produces the graphs in this post. But don’t worry, you can find that code on my github page. Our task is to model the final round of fixtures in the season, so we must remove the last 10 rows (each gameweek consists of 10 matches).
You’ll notice that, on average, the home team scores more goals than the away team. This is the so called ‘home (field) advantage’ (discussed here) and isn’t specific to soccer. This is a convenient time to introduce the Poisson distribution. It’s a discrete probability distribution that describes the probability of the number of events within a specific time period (e.g 90 mins) with a known average rate of occurrence. A key assumption is that the number of events is independent of time. In our context, this means that goals don’t become more/less likely by the number of goals already scored in the match. Instead, the number of goals is expressed purely as function an average rate of goals. If that was unclear, maybe this mathematical formulation will make clearer:
represents the average rate (e.g. average number of goals, average number of letters you receive, etc.). So, we can treat the number of goals scored by the home and away team as two independent Poisson distributions. The plot below shows the proportion of goals scored compared to the number of goals estimated by the corresponding Poisson distributions.
We can use this statistical model to estimate the probability of specfic events.
The probability of a draw is simply the sum of the events where the two teams score the same amount of goals.
Note that we consider the number of goals scored by each team to be independent events (i.e. P(A n B) = P(A) P(B)). The difference of two Poisson distribution is actually called a Skellam distribution. So we can calculate the probability of a draw by inputting the mean goal values into this distribution.
So, hopefully you can see how we can adapt this approach to model specific matches. We just need to know the average number of goals scored by each team and feed this data into a Poisson model. Let’s have a look at the distribution of goals scored by Chelsea and Sunderland (teams who finished 1st and last, respectively).
Building A Model.
You should now be convinced that the number of goals scored by each team can be approximated by a Poisson distribution. Due to a relatively sample size (each team plays at most 19 home/away games), the accuracy of this approximation can vary significantly (especially earlier in the season when teams have played fewer games). Similar to before, we could now calculate the probability of various events in this Chelsea Sunderland match. But rather than treat each match separately, we’ll build a more general Poisson regression model (what is that?).
Generalized Linear Model Regression Results Dep. Variable: goals No. Observations: 740 Model: GLM Df Residuals: 700 Model Family: Poisson Df Model: 39 Link Function: log Scale: 1.0 Method: IRLS Log-Likelihood: -1042.4 Date: Sat, 10 Jun 2017 Deviance: 776.11 Time: 11:17:38 Pearson chi2: 659. No. Iterations: 8 coef std err z P>|z| [95.0% Conf. Int.] Intercept 0.3725 0.198 1.880 0.060 -0.016 0.761 team[T.Bournemouth] -0.2891 0.179 -1.612 0.107 -0.641 0.062 team[T.Burnley] -0.6458 0.200 -3.230 0.001 -1.038 -0.254 team[T.Chelsea] 0.0789 0.162 0.488 0.626 -0.238 0.396 team[T.Crystal Palace] -0.3865 0.183 -2.107 0.035 -0.746 -0.027 team[T.Everton] -0.2008 0.173 -1.161 0.246 -0.540 0.138 team[T.Hull] -0.7006 0.204 -3.441 0.001 -1.100 -0.302 team[T.Leicester] -0.4204 0.187 -2.249 0.025 -0.787 -0.054 team[T.Liverpool] 0.0162 0.164 0.099 0.921 -0.306 0.338 team[T.Man City] 0.0117 0.164 0.072 0.943 -0.310 0.334 team[T.Man United] -0.3572 0.181 -1.971 0.049 -0.713 -0.002 team[T.Middlesbrough] -1.0087 0.225 -4.481 0.000 -1.450 -0.568 team[T.Southampton] -0.5804 0.195 -2.976 0.003 -0.963 -0.198 team[T.Stoke] -0.6082 0.197 -3.094 0.002 -0.994 -0.223 team[T.Sunderland] -0.9619 0.222 -4.329 0.000 -1.397 -0.526 team[T.Swansea] -0.5136 0.192 -2.673 0.008 -0.890 -0.137 team[T.Tottenham] 0.0532 0.162 0.328 0.743 -0.265 0.371 team[T.Watford] -0.5969 0.197 -3.035 0.002 -0.982 -0.211 team[T.West Brom] -0.5567 0.194 -2.876 0.004 -0.936 -0.177 team[T.West Ham] -0.4802 0.189 -2.535 0.011 -0.851 -0.109 opponent[T.Bournemouth] 0.4109 0.196 2.092 0.036 0.026 0.796 opponent[T.Burnley] 0.1657 0.206 0.806 0.420 -0.237 0.569 opponent[T.Chelsea] -0.3036 0.234 -1.298 0.194 -0.762 0.155 opponent[T.Crystal Palace] 0.3287 0.200 1.647 0.100 -0.062 0.720 opponent[T.Everton] -0.0442 0.218 -0.202 0.840 -0.472 0.384 opponent[T.Hull] 0.4979 0.193 2.585 0.010 0.120 0.875 opponent[T.Leicester] 0.3369 0.199 1.694 0.090 -0.053 0.727 opponent[T.Liverpool] -0.0374 0.217 -0.172 0.863 -0.463 0.389 opponent[T.Man City] 
-0.0993 0.222 -0.448 0.654 -0.534 0.335 opponent[T.Man United] -0.4220 0.241 -1.754 0.079 -0.894 0.050 opponent[T.Middlesbrough] 0.1196 0.208 0.574 0.566 -0.289 0.528 opponent[T.Southampton] 0.0458 0.211 0.217 0.828 -0.369 0.460 opponent[T.Stoke] 0.2266 0.203 1.115 0.265 -0.172 0.625 opponent[T.Sunderland] 0.3707 0.198 1.876 0.061 -0.017 0.758 opponent[T.Swansea] 0.4336 0.195 2.227 0.026 0.052 0.815 opponent[T.Tottenham] -0.5431 0.252 -2.156 0.031 -1.037 -0.049 opponent[T.Watford] 0.3533 0.198 1.782 0.075 -0.035 0.742 opponent[T.West Brom] 0.0970 0.209 0.463 0.643 -0.313 0.507 opponent[T.West Ham] 0.3485 0.198 1.758 0.079 -0.040 0.737 home 0.2969 0.063 4.702 0.000 0.173 0.421.
If you’re curious about the smf.glm(. ) part, you can find more information here (edit: earlier versions of this post had erroneously employed a Generalised Estimating Equation (GEE)- what’s the difference?). I’m more interested in the values presented in the coef column in the model summary table, which are analogous to the slopes in linear regression. Similar to logistic regression, we take the exponent of the parameter values. A positive value implies more goals (), while values closer to zero represent more neutral effects (). Towards the bottom of the table you might notice that home has a coef of 0.2969. This captures the fact that home teams generally score more goals than the away team (specifically, =1.35 times more likely). But not all teams are created equal. Chelsea has a coef of 0.0789, while the corresponding value for Sunderland is -0.9619 (sort of saying Chelsea (Sunderland) are better (much worse!) scorers than average). Finally, the opponent* values penalize/reward teams based on the quality of the opposition. This relfects the defensive strength of each team (Chelsea: -0.3036; Sunderland: 0.3707). In other words, you’re less likely to score against Chelsea. Hopefully, that all makes both statistical and intuitive sense.
Let’s start making some predictions for the upcoming matches. We simply pass our teams into poisson_model and it’ll return the expected average number of goals for that team (we need to run it twice- we calculate the expected average number of goals for each team separately). So let’s see how many goals we expect Chelsea and Sunderland to score.


Predicting Football Results With Statistical Modelling.
Combining the world’s most popular sport with everyone’s favourite discrete probability distribution, this post predicts football matches using the Poisson distribution.
David Sheehan.
Data scientist interested in sports, politics and Simpsons references.
Football (or soccer to my American readers) is full of clichés: “It’s a game of two halves”, “taking it one game at a time” and “Liverpool have failed to win the Premier League”. You’re less likely to hear “Treating the number of goals scored by each team as independent Poisson processes, statistical modelling suggests that the home team have a 60% chance of winning today”. But this is actually a bit of cliché too (it has been discussed here, here, here, here and particularly well here). As we’ll discover, a simple Poisson model is, well, overly simplistic. But it’s a good starting point and a nice intuitive way to learn about statistical modelling. So, if you came here looking to make money, I hear this guy makes £5000 per month without leaving the house.
Poisson Distribution.
HomeTeam AwayTeam HomeGoals AwayGoals 0 Burnley Swansea 0 1 1 Crystal Palace West Brom 0 1 2 Everton Tottenham 1 1 3 Hull Leicester 2 1 4 Man City Sunderland 2 1.
We imported a csv as a pandas dataframe, which contains various information for each of the 380 EPL games in the 2016-17 English Premier League season. We restricted the dataframe to the columns in which we’re interested (specifically, team names and numer of goals scored by each team). I’ll omit most of the code that produces the graphs in this post. But don’t worry, you can find that code on my github page. Our task is to model the final round of fixtures in the season, so we must remove the last 10 rows (each gameweek consists of 10 matches).
You’ll notice that, on average, the home team scores more goals than the away team. This is the so called ‘home (field) advantage’ (discussed here) and isn’t specific to soccer. This is a convenient time to introduce the Poisson distribution. It’s a discrete probability distribution that describes the probability of the number of events within a specific time period (e.g 90 mins) with a known average rate of occurrence. A key assumption is that the number of events is independent of time. In our context, this means that goals don’t become more/less likely by the number of goals already scored in the match. Instead, the number of goals is expressed purely as function an average rate of goals. If that was unclear, maybe this mathematical formulation will make clearer:
represents the average rate (e.g. average number of goals, average number of letters you receive, etc.). So, we can treat the number of goals scored by the home and away team as two independent Poisson distributions. The plot below shows the proportion of goals scored compared to the number of goals estimated by the corresponding Poisson distributions.
We can use this statistical model to estimate the probability of specfic events.
The probability of a draw is simply the sum of the events where the two teams score the same amount of goals.
Note that we consider the number of goals scored by each team to be independent events (i.e. P(A n B) = P(A) P(B)). The difference of two Poisson distribution is actually called a Skellam distribution. So we can calculate the probability of a draw by inputting the mean goal values into this distribution.
So, hopefully you can see how we can adapt this approach to model specific matches. We just need to know the average number of goals scored by each team and feed this data into a Poisson model. Let’s have a look at the distribution of goals scored by Chelsea and Sunderland (teams who finished 1st and last, respectively).
Building A Model.
You should now be convinced that the number of goals scored by each team can be approximated by a Poisson distribution. Due to a relatively sample size (each team plays at most 19 home/away games), the accuracy of this approximation can vary significantly (especially earlier in the season when teams have played fewer games). Similar to before, we could now calculate the probability of various events in this Chelsea Sunderland match. But rather than treat each match separately, we’ll build a more general Poisson regression model (what is that?).
Generalized Linear Model Regression Results

Dep. Variable:     goals               No. Observations:   740
Model:             GLM                 Df Residuals:       700
Model Family:      Poisson             Df Model:           39
Link Function:     log                 Scale:              1.0
Method:            IRLS                Log-Likelihood:     -1042.4
Date:              Sat, 10 Jun 2017    Deviance:           776.11
Time:              11:17:38            Pearson chi2:       659.
No. Iterations:    8

                                 coef  std err        z    P>|z|[95.0% Conf. Int.]
Intercept                      0.3725    0.198    1.880    0.060   -0.016    0.761
team[T.Bournemouth]           -0.2891    0.179   -1.612    0.107   -0.641    0.062
team[T.Burnley]               -0.6458    0.200   -3.230    0.001   -1.038   -0.254
team[T.Chelsea]                0.0789    0.162    0.488    0.626   -0.238    0.396
team[T.Crystal Palace]        -0.3865    0.183   -2.107    0.035   -0.746   -0.027
team[T.Everton]               -0.2008    0.173   -1.161    0.246   -0.540    0.138
team[T.Hull]                  -0.7006    0.204   -3.441    0.001   -1.100   -0.302
team[T.Leicester]             -0.4204    0.187   -2.249    0.025   -0.787   -0.054
team[T.Liverpool]              0.0162    0.164    0.099    0.921   -0.306    0.338
team[T.Man City]               0.0117    0.164    0.072    0.943   -0.310    0.334
team[T.Man United]            -0.3572    0.181   -1.971    0.049   -0.713   -0.002
team[T.Middlesbrough]         -1.0087    0.225   -4.481    0.000   -1.450   -0.568
team[T.Southampton]           -0.5804    0.195   -2.976    0.003   -0.963   -0.198
team[T.Stoke]                 -0.6082    0.197   -3.094    0.002   -0.994   -0.223
team[T.Sunderland]            -0.9619    0.222   -4.329    0.000   -1.397   -0.526
team[T.Swansea]               -0.5136    0.192   -2.673    0.008   -0.890   -0.137
team[T.Tottenham]              0.0532    0.162    0.328    0.743   -0.265    0.371
team[T.Watford]               -0.5969    0.197   -3.035    0.002   -0.982   -0.211
team[T.West Brom]             -0.5567    0.194   -2.876    0.004   -0.936   -0.177
team[T.West Ham]              -0.4802    0.189   -2.535    0.011   -0.851   -0.109
opponent[T.Bournemouth]        0.4109    0.196    2.092    0.036    0.026    0.796
opponent[T.Burnley]            0.1657    0.206    0.806    0.420   -0.237    0.569
opponent[T.Chelsea]           -0.3036    0.234   -1.298    0.194   -0.762    0.155
opponent[T.Crystal Palace]     0.3287    0.200    1.647    0.100   -0.062    0.720
opponent[T.Everton]           -0.0442    0.218   -0.202    0.840   -0.472    0.384
opponent[T.Hull]               0.4979    0.193    2.585    0.010    0.120    0.875
opponent[T.Leicester]          0.3369    0.199    1.694    0.090   -0.053    0.727
opponent[T.Liverpool]         -0.0374    0.217   -0.172    0.863   -0.463    0.389
opponent[T.Man City]          -0.0993    0.222   -0.448    0.654   -0.534    0.335
opponent[T.Man United]        -0.4220    0.241   -1.754    0.079   -0.894    0.050
opponent[T.Middlesbrough]      0.1196    0.208    0.574    0.566   -0.289    0.528
opponent[T.Southampton]        0.0458    0.211    0.217    0.828   -0.369    0.460
opponent[T.Stoke]              0.2266    0.203    1.115    0.265   -0.172    0.625
opponent[T.Sunderland]         0.3707    0.198    1.876    0.061   -0.017    0.758
opponent[T.Swansea]            0.4336    0.195    2.227    0.026    0.052    0.815
opponent[T.Tottenham]         -0.5431    0.252   -2.156    0.031   -1.037   -0.049
opponent[T.Watford]            0.3533    0.198    1.782    0.075   -0.035    0.742
opponent[T.West Brom]          0.0970    0.209    0.463    0.643   -0.313    0.507
opponent[T.West Ham]           0.3485    0.198    1.758    0.079   -0.040    0.737
home                           0.2969    0.063    4.702    0.000    0.173    0.421
If you’re curious about the smf.glm(...) part, you can find more information here (edit: earlier versions of this post had erroneously employed a Generalised Estimating Equation (GEE); what’s the difference?). I’m more interested in the values presented in the coef column in the model summary table, which are analogous to the slopes in linear regression. Similar to logistic regression, we take the exponent of the parameter values. A positive value implies more goals ($e^{x} > 1$ for all $x > 0$), while values closer to zero represent more neutral effects ($e^{0} = 1$). Towards the bottom of the table you might notice that home has a coef of 0.2969. This captures the fact that home teams generally score more goals than the away team (specifically, $e^{0.2969} \approx 1.35$ times as many). But not all teams are created equal. Chelsea has a coef of 0.0789, while the corresponding value for Sunderland is -0.9619. Each team coefficient is measured against the baseline team left out of the table (Arsenal), so this is sort of saying that Chelsea are slightly better scorers than the baseline and Sunderland are much worse. Finally, the opponent* values penalize/reward teams based on the quality of the opposition. This reflects the defensive strength of each team (Chelsea: -0.3036; Sunderland: 0.3707). In other words, you’re less likely to score against Chelsea. Hopefully, that all makes both statistical and intuitive sense.
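To make the exponent step concrete, here is a minimal sketch that turns a handful of coefficients from the model summary table into multiplicative effects on the expected goal rate. The coefficient values are copied from the table; the dictionary layout and the rate_multiplier helper are illustrative names of my own, not part of the original code.

```python
import math

# Coefficients copied from the model summary table (log scale).
coefs = {
    "home": 0.2969,
    "team[T.Chelsea]": 0.0789,
    "team[T.Sunderland]": -0.9619,
    "opponent[T.Chelsea]": -0.3036,
    "opponent[T.Sunderland]": 0.3707,
}


def rate_multiplier(coef):
    """Hypothetical helper: a log-link coefficient becomes a multiplier via exp()."""
    return math.exp(coef)


# Multipliers above 1 raise the expected goals, multipliers below 1 lower them.
multipliers = {name: round(rate_multiplier(c), 3) for name, c in coefs.items()}

for name, value in multipliers.items():
    direction = "more" if value > 1 else "fewer"
    print(f"{name:24s} x{value:<6} ({direction} goals)")
```

Because the model uses a log link, these multipliers combine by multiplication, which is the same thing as adding the coefficients before taking the exponent.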
Let’s start making some predictions for the upcoming matches. We simply pass our teams into poisson_model and it’ll return the expected average number of goals for that team (we need to run it twice, because the expected number of goals is calculated for each team separately). So let’s see how many goals we expect Chelsea and Sunderland to score.
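The fitted poisson_model object isn't reproduced here, but the same expected goals can be rebuilt by hand from the coefficients in the summary table. The sketch below assumes Chelsea are the home side (the text doesn't say who hosts), and expected_goals is a hypothetical helper that stands in for the model's own prediction call.

```python
import math

# Values copied from the model summary table (log scale).
INTERCEPT = 0.3725
HOME = 0.2969
TEAM = {"Chelsea": 0.0789, "Sunderland": -0.9619}      # attacking strength
OPPONENT = {"Chelsea": -0.3036, "Sunderland": 0.3707}  # how easy they are to score against


def expected_goals(team, opponent, at_home):
    """Hypothetical helper: add up the relevant coefficients, then exponentiate."""
    log_rate = INTERCEPT + TEAM[team] + OPPONENT[opponent]
    if at_home:
        log_rate += HOME
    return math.exp(log_rate)


# Assumption: Chelsea at home, Sunderland away.
chelsea_goals = expected_goals("Chelsea", "Sunderland", at_home=True)
sunderland_goals = expected_goals("Sunderland", "Chelsea", at_home=False)

print(f"Chelsea expected goals:    {chelsea_goals:.2f}")
print(f"Sunderland expected goals: {sunderland_goals:.2f}")
```

With the rounded coefficients from the table, this gives roughly 3.06 expected goals for Chelsea and 0.41 for Sunderland.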






UnyPlewly
Student

Posts: 22
Joined: 30.01.2021 13:43
Location: UK
