How to talk a good soccer game with data
Sunday, June 09, 2024

Soccer and similar sports matches resemble war. One must be well prepared to get into it. Fans are part of it, cheering their teams and taunting their opponents from stadium terraces and social media, or at a sports bar in a city like Kigali.

Sun Tzu, the much-quoted Chinese military general and strategist, once advised, “Know thy enemy and know yourself; in a hundred battles, you will never be defeated.”

It can’t be any different in soccer or sports, or even in life where we are urged to understand a problem to craft a fitting solution.

Sun Tzu’s injunction is, however, more dramatic. But the real issue is how things have changed since he gave his advice more than 2,000 years ago.

The application of data science denotes that change. In military spheres, as in soccer and similar sports, every conceivable piece of data on thy enemy, as on thyself, now informs how battles are gauged, won or lost.

Ardent soccer fans are already well-versed in this. But let me illustrate. The verdict of many observers during the UEFA Champions League final two weeks ago between Spain’s Real Madrid and Germany’s Borussia Dortmund was that Dortmund appeared the better team.

The game statistics, or stats in sports parlance, seemed to prove it. Expected goals (xG), a statistical metric measuring the quality of goal-scoring chances, were far better for Dortmund than for Real in the first half, at 1.16 to 0.09.

In the second half, Real took the lead with an xG of 1.05 against Dortmund’s 0.07, a figure still lower than Dortmund’s first-half tally.

If how they arrived at these calculations confounds you as it does me, it only emphasises how complex things have become in the name of sports analytics.
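For the curious, the basic arithmetic is less daunting than it sounds. In the commonly used approach, each shot is assigned an estimated probability of becoming a goal, and a team's xG is simply the sum of those probabilities. The sketch below illustrates only that summing step. The function name and the shot values are invented for illustration and are not figures from the final; how analysts arrive at each shot's probability is the genuinely complex part and is not shown.

```python
def expected_goals(shot_probabilities):
    """Sum per-shot scoring probabilities into a team xG total.

    shot_probabilities: one number between 0 and 1 per shot, each an
    estimate of how likely that shot was to become a goal.
    """
    for p in shot_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return round(sum(shot_probabilities), 2)


# Hypothetical shots, invented for illustration only:
# one clear chance, one decent chance, two speculative efforts.
example_shots = [0.40, 0.25, 0.10, 0.05]
print(expected_goals(example_shots))
```

A team with many low-quality shots can therefore post a lower xG than a team with a handful of clear chances, which is why the metric is said to measure the quality of chances rather than the number of attempts.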

Recall that while goal-scoring chances are created by teamwork, it also takes individual flair.

At the individual level, therefore, there is what is called possession value, a measure of how a player’s actions increase or decrease their team’s chances of scoring. Data for every kick, pass, and run is captured.

Real Madrid’s Toni Kroos, for instance, made a total of 94 passes and completed 91 of them in the final, a 97 per cent completion rate.
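The completion rate is the simplest of these calculations: completed passes divided by attempted passes. A minimal sketch using the two numbers quoted above (the function name is an assumption made for illustration):

```python
def pass_completion_rate(completed, attempted):
    """Return the share of attempted passes that were completed, in per cent."""
    if attempted <= 0:
        raise ValueError("attempted passes must be positive")
    if not 0 <= completed <= attempted:
        raise ValueError("completed passes must be between 0 and attempted")
    return 100.0 * completed / attempted


# Figures quoted in the article: 91 completed out of 94 attempted.
rate = pass_completion_rate(91, 94)
print(f"{rate:.1f} per cent")  # 96.8 per cent, which rounds to 97
```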

The next highest number of passes, 74, also came from a Real player, Antonio Rüdiger. No Dortmund player registered more than 65.

Did these stats play a role in Real taking the title for the 15th time? Maybe they did. Some pundits are not convinced and say Real got lucky.

But we are in 2024, when everybody is potentially an informed pundit. Teams and ardent fans usually have the stats at their fingertips and bravely enter the game knowing their enemy intimately.

How did we get here? As chronicled on soccertake.com, reference has often been made to the 2011 movie Moneyball, which starred Brad Pitt. It tells the story of a small-budget baseball team that used data analytics to compete with bigger and wealthier teams.

The team used sabermetrics – baseball statistics that measure player and game activity – to scout and analyse undervalued players to build the winning team it became.

A key lesson was the recognition of previously undervalued players, whose brilliance was often lost in the glare of the goal scorers.

The movie is credited with introducing the idea of using data science in sports. It showed how sports analytics can change the fortunes of not only a poor and perennially losing team but also the fortunes of a big team with lots of mean big-team competitors.

Reaching that point, however, was a gradual build-up. Before Moneyball, metrics such as assists, saves, and shots on goal were recorded, but they were rudimentary and team-oriented.

Things started to change with the spread of computer technology in the 1990s, which saw the digitisation of the data and the early beginnings of gauging player performance.

The Moneyball revolution saw the introduction of wearable tech to track player biometric data, alongside satellite technology such as GPS, and video analysis, to capture player movement and positioning throughout a game.

With this leap came statistical models and software to interpret the data. Today, with the advent of artificial intelligence, including large language models such as ChatGPT a couple of years ago, the focus is on predictive analytics: algorithms that run game simulations to identify the strategies most likely to improve a team’s chances of winning.

This has led some to call data the 12th player on the pitch. And as TV sports pundits have thrashed out what happened and why it happened, the fans have caught on.

You might hear them rattling off stats to one another before and after a big match, whether the match is real or one of those fantasy football computer games that are all the rage.