
Genius Sports Project: Analysing Data Logging Errors in English and Scottish Football

A striking regional disparity emerged: 64% of major mistakes in the Scottish Premiership were VAR-related, compared with 37.5% in England, pointing to significant differences in how match events are reported in the two countries. The analysis also showed that 50% of major statistician errors in the English Premier League occurred in matches decided by a single goal, suggesting that tightly contested games may be especially prone to data logging mistakes.

This project analysed officiating error data provided by Genius Sports, a global sports data technology company responsible for delivering real-time match data to leagues, media organisations, and sportsbooks worldwide. The study focused on football competitions in England and Scotland between the 2021/22 and 2025/26 seasons, with the objective of identifying patterns in officiating mistakes and improving data quality within the statistician reporting network.

Using Microsoft Power BI and Data Analysis Expressions (DAX), the project transformed raw match-event datasets into structured analytical dashboards capable of revealing patterns in refereeing decisions, error severity, and regional variation between leagues.

Aim and Objectives

The primary goal of the analysis was to convert raw Genius Sports event data into a reliable analytical framework capable of identifying recurring officiating error patterns.

Key objectives included:

  • Cleaning and standardising match-event data across multiple competitions and seasons
  • Building Power BI dashboards capable of filtering mistakes by competition, severity, and season
  • Using DAX measures to enable time-series trend analysis across football seasons (2021/22–2025/26)
  • Comparing error distribution between English and Scottish football leagues

Together, these objectives were intended to ensure that the dataset could be analysed consistently while yielding meaningful insights into the performance of data collection processes across competitions.

Methodology

The analysis followed a structured data analytics workflow consisting of three main stages: data preparation, modelling, and visualisation.

Data Cleaning and Preparation

The original dataset contained 919 rows of match-event data. After cleaning, the dataset was reduced to 895 valid football records.

Key data preparation steps included:

  • Removing 14 ice hockey records and 1 basketball record that were incorrectly included
  • Removing 9 rows with missing mistake-type information
  • Correcting a misclassified record (Hibernian vs St Johnstone) that had been incorrectly labelled as an English match instead of a Scottish Premiership fixture

These steps ensured the dataset accurately represented football competitions in England and Scotland.
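The cleaning steps above were carried out in Power BI. As a language-neutral illustration, the same logic can be sketched in plain Python. The field names (`sport`, `mistake_type`, `home_team`, `away_team`, `country`) are hypothetical stand-ins for the real Genius Sports export columns, and the toy rows are invented for the example. The sketch also confirms that the reported row counts reconcile: 919 − (14 + 1 + 9) = 895.

```python
# Illustrative cleaning pass; the project itself used Power BI.
# Field names are hypothetical stand-ins for the real export columns.

RAW_ROWS = 919
REMOVED = {"ice hockey": 14, "basketball": 1, "missing mistake type": 9}


def clean(records):
    """Return football rows with a mistake type, fixing the known mislabelled fixture."""
    cleaned = []
    for rec in records:
        if str(rec.get("sport", "")).strip().lower() != "football":
            continue  # drops ice hockey, basketball and any other sport
        if not rec.get("mistake_type"):
            continue  # drops rows with no mistake-type information
        fixed = dict(rec)  # copy so the raw record is left untouched
        if {fixed.get("home_team"), fixed.get("away_team")} == {"Hibernian", "St Johnstone"}:
            fixed["country"] = "Scotland"  # Scottish Premiership fixture, not English
        cleaned.append(fixed)
    return cleaned


# The reported row counts reconcile: 919 - (14 + 1 + 9) = 895
assert RAW_ROWS - sum(REMOVED.values()) == 895

toy_rows = [  # invented rows for illustration only
    {"sport": "Football", "mistake_type": "Corner", "home_team": "Hibernian",
     "away_team": "St Johnstone", "country": "England"},
    {"sport": "Ice Hockey", "mistake_type": "Penalty"},
    {"sport": "Basketball", "mistake_type": "Foul"},
    {"sport": "Football", "mistake_type": None},
]
result = clean(toy_rows)
print(len(result), result[0]["country"])  # 1 Scotland
```

Copying each record before correcting it keeps the raw data intact, which makes it easier to audit exactly which rows the cleaning step changed.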

Data Modelling

Data modelling was carried out using Power BI and DAX.

Key modelling techniques included:

  • Creation of a Date Table using the CALENDARAUTO() function to enable time-series analysis
  • Development of calculated columns to standardise competition names (removing season-year text from competition titles)
  • Classification of teams into Senior, U23, and U21 categories
  • Establishing relationships between the dataset and the date hierarchy to enable seasonal filtering and analysis

These transformations enabled dynamic analysis of error trends across multiple seasons and competitions.
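The modelling steps can be approximated outside Power BI. With its default settings, CALENDARAUTO() returns a contiguous date range that starts on 1 January of the earliest year in the model and ends on 31 December of the latest year. The Python sketch below mirrors that behaviour. It also shows the kind of logic the calculated columns encode. Three points are assumptions rather than details from the project: the July-to-June season convention, the regular expression for trailing season text, and the "U21"/"U23" naming pattern used to classify teams.

```python
import re
from datetime import date, timedelta


def calendar_auto(dates):
    """Daily calendar from 1 Jan of the earliest year to 31 Dec of the latest year,
    mirroring CALENDARAUTO() with its default December year end."""
    start = date(min(dates).year, 1, 1)
    end = date(max(dates).year, 12, 31)
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def season_label(d, start_month=7):
    """Assumed convention: a season runs July to June, so 14 Aug 2021 -> '2021/22'."""
    first_year = d.year if d.month >= start_month else d.year - 1
    return f"{first_year}/{(first_year + 1) % 100:02d}"


# Trailing season text such as "2023/24", "2022-2023" or "(2021/22)"
SEASON_SUFFIX = re.compile(r"\s*\(?\b\d{4}(?:[/-]\d{2,4})?\)?\s*$")


def standardise_competition(name):
    """Strip trailing season text so every season maps to one competition name."""
    return SEASON_SUFFIX.sub("", name).strip()


def team_category(team_name):
    """Classify a team as U21, U23 or Senior from an age tag in its name (assumed pattern)."""
    tag = team_name.upper()
    if "U21" in tag or "UNDER 21" in tag:
        return "U21"
    if "U23" in tag or "UNDER 23" in tag:
        return "U23"
    return "Senior"


calendar = calendar_auto([date(2021, 8, 14), date(2022, 5, 22)])
print(calendar[0], calendar[-1], len(calendar))           # 2021-01-01 2022-12-31 730
print(season_label(date(2022, 5, 22)))                     # 2021/22
print(standardise_competition("Premier League 2023/24"))   # Premier League
print(team_category("Arsenal U21"))                        # U21
```

Anchoring the season suffix to the end of the name means that a competition such as "Premier League 2" keeps its trailing "2" and loses only the season text.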

Key Findings

Distribution of Officiating Errors

Analysis revealed a significant difference in the number of recorded mistakes between England and Scotland.

Between 2022 and 2025:

  • England accounted for 75.9% of all mistakes (679 incidents)
  • Scotland accounted for 24.1% (216 incidents)

In both countries, moderate mistakes were far more common than major mistakes, with yellow card and corner decisions the most frequent error types.
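The country split is a simple share-of-total calculation. In Power BI this would normally be a measure that divides a filtered count by the total count. The short Python sketch below reproduces the reported percentages from the two country totals given above.

```python
def percentage_shares(counts, places=1):
    """Each category's share of the total, as a rounded percentage."""
    total = sum(counts.values())
    return {key: round(100 * value / total, places) for key, value in counts.items()}


mistakes_by_country = {"England": 679, "Scotland": 216}
print(sum(mistakes_by_country.values()))        # 895
print(percentage_shares(mistakes_by_country))   # {'England': 75.9, 'Scotland': 24.1}
```

The two totals add up to the 895 cleaned records, so the percentages are shares of the full cleaned dataset.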

Most Common Error Types

Across the dataset, the most frequent officiating mistakes were:

  • Yellow Card decisions – 334 mistakes
  • Corner decisions – 246 mistakes
  • Goal decisions – 89 mistakes
  • Red Card decisions – 84 mistakes

Although moderate mistakes occurred more frequently, major errors such as goals, penalties, and red cards were particularly significant because they had the potential to alter match outcomes.
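A ranking like the one above can be produced with a frequency count, alongside a severity split. In the Python sketch below, the severity mapping is an assumption inferred from this write-up: goal, penalty and red card decisions are treated as major, and every other type as moderate. Only the four reported types are reconstructed, so the severity totals cover those 753 rows rather than the full dataset.

```python
from collections import Counter

# Assumed severity mapping, inferred from the write-up: goal, penalty and red
# card decisions are major; every other mistake type is treated as moderate.
MAJOR_TYPES = {"Goal", "Penalty", "Red Card"}


def summarise(mistake_types, top_n=4):
    """Rank mistake types by frequency and split the same rows by severity."""
    by_type = Counter(mistake_types)
    by_severity = Counter("Major" if t in MAJOR_TYPES else "Moderate" for t in mistake_types)
    return by_type.most_common(top_n), by_severity


# Only the four reported types are rebuilt here; other types are omitted.
rows = ["Yellow Card"] * 334 + ["Corner"] * 246 + ["Goal"] * 89 + ["Red Card"] * 84
top, severity = summarise(rows)
print(top)             # [('Yellow Card', 334), ('Corner', 246), ('Goal', 89), ('Red Card', 84)]
print(dict(severity))  # {'Moderate': 580, 'Major': 173}
```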

VAR Decision Errors

One of the most important insights from the analysis was the disparity in VAR-related mistakes between England and Scotland.

The data showed that:

  • 64% of major mistakes in the Scottish Premiership were linked to ongoing VAR decisions
  • In comparison, VAR-related errors accounted for 37.5% of major mistakes in England

This difference suggests potential challenges associated with the later implementation of VAR in Scotland and highlights the importance of training and operational consistency in VAR decision-making.
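The VAR comparison restricts the data to major mistakes and then measures, for each country, the share linked to a VAR review. The Python sketch below uses hypothetical `severity`, `country` and `var_related` fields and a handful of invented toy records. Its outputs are illustrative only and are not the project's figures.

```python
from collections import defaultdict


def var_share_of_major(records):
    """Percentage of major mistakes linked to a VAR review, per country."""
    major = defaultdict(int)
    var_linked = defaultdict(int)
    for rec in records:
        if rec["severity"] != "Major":
            continue  # moderate mistakes are excluded from the denominator
        major[rec["country"]] += 1
        if rec["var_related"]:
            var_linked[rec["country"]] += 1
    return {country: 100 * var_linked[country] / major[country] for country in major}


# Invented toy records, not the project data
toy = (
    [{"country": "England", "severity": "Major", "var_related": v} for v in (True, False, False, False)]
    + [{"country": "England", "severity": "Moderate", "var_related": True}]
    + [{"country": "Scotland", "severity": "Major", "var_related": v} for v in (True, True, True, False, False)]
)
print(var_share_of_major(toy))  # {'England': 25.0, 'Scotland': 60.0}
```

Because the denominator is the count of major mistakes only, a moderate VAR-related mistake (like the fifth toy record) does not affect either percentage.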

Competitive Impact of Errors

Further analysis revealed that 50% of major officiating errors in the English Premier League occurred in matches decided by a single goal. This indicates that inaccurate event reporting or officiating mistakes may have a substantial impact on match outcomes and competitive balance.
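The single-goal figure depends only on each match's final score. The Python sketch below counts matches with a final margin of exactly one goal, so draws are excluded. The scores are invented toy data. Applying the same function to every fixture as well as to error-affected matches gives a baseline: it shows whether error-affected matches are closer than a typical fixture.

```python
def one_goal_share(matches):
    """Percentage of (home, away) final scores with a margin of exactly one goal."""
    if not matches:
        return 0.0
    close = sum(1 for home, away in matches if abs(home - away) == 1)
    return 100 * close / len(matches)


# Invented toy scores: the matches in which major errors occurred...
error_matches = [(2, 1), (3, 0), (1, 1), (0, 2), (1, 0)]
# ...and a toy set of all fixtures to serve as the baseline
all_fixtures = [(2, 1), (3, 0), (1, 1), (0, 2), (1, 0),
                (4, 0), (2, 2), (0, 3), (2, 0), (1, 3)]
print(one_goal_share(error_matches))  # 40.0
print(one_goal_share(all_fixtures))   # 20.0
```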

Limitations

Several limitations were identified during the analysis.

Firstly, coverage differed sharply between the two countries: approximately 177 statisticians monitored English football, compared with 44 in Scotland. This imbalance may affect reporting consistency and means the two samples differ considerably in size.

Secondly, VAR was introduced in England in August 2019 but not in Scotland until October 2022, so the two systems had accumulated different amounts of training time and operational experience.

Finally, some mistakes, particularly those involving VAR decisions, require contextual information that statisticians logging live match data may not have in real time.
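One rough way to account for the difference in statistician numbers is to express mistakes per statistician. The Python sketch below applies this to the reported totals. It is an illustrative normalisation, not a result from the project. It assumes that the statistician counts and the mistake counts cover the same period and that workloads per statistician are broadly comparable, and neither assumption is confirmed by the data above.

```python
# Reported totals: (mistakes, statisticians) per country.
# Assumes both figures cover the same period; this is a rough normalisation only.
coverage = {"England": (679, 177), "Scotland": (216, 44)}

rates = {country: round(mistakes / staff, 2) for country, (mistakes, staff) in coverage.items()}
print(rates)  # {'England': 3.84, 'Scotland': 4.91}
```

Under those assumptions, Scotland records more mistakes per statistician than England, even though its overall total is much lower.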

Conclusion and Client Value

This case study demonstrates how data analytics tools such as Power BI and DAX can be used to monitor operational performance within sports data environments.

By transforming raw match-event datasets into structured analytical dashboards, the project identified key patterns in officiating errors and provided actionable insights for improving data accuracy and reporting processes.

Several recommendations emerged from the analysis:

  • Implement targeted training for statisticians covering Scottish Premiership matches
  • Improve procedures for recording VAR decisions, potentially allowing delayed logging when decisions require extended review
  • Provide regular referee education sessions to ensure consistent interpretation of football rules
  • Closely monitor matches decided by narrow scorelines, where officiating mistakes are most likely to influence outcomes
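The delayed-logging recommendation could be implemented in several ways. The Python sketch below shows one hypothetical design in which an event under VAR review is held as provisional and released downstream only once the decision is upheld. The class and method names (`VarAwareLog`, `record`, `resolve`, `published`) are invented for illustration and do not describe any existing Genius Sports system.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PROVISIONAL = "provisional"
    CONFIRMED = "confirmed"
    OVERTURNED = "overturned"


@dataclass
class LoggedEvent:
    event_id: str
    event_type: str
    status: Status = Status.PROVISIONAL


@dataclass
class VarAwareLog:
    """Hypothetical log that holds events under VAR review until the decision is final."""
    events: dict = field(default_factory=dict)

    def record(self, event_id, event_type, under_review):
        status = Status.PROVISIONAL if under_review else Status.CONFIRMED
        self.events[event_id] = LoggedEvent(event_id, event_type, status)

    def resolve(self, event_id, upheld):
        event = self.events[event_id]
        if event.status is not Status.PROVISIONAL:
            raise ValueError(f"{event_id} is not awaiting review")
        event.status = Status.CONFIRMED if upheld else Status.OVERTURNED

    def published(self):
        """Only confirmed events are released downstream."""
        return [e for e in self.events.values() if e.status is Status.CONFIRMED]


log = VarAwareLog()
log.record("e1", "Goal", under_review=True)     # goal awaiting a VAR check
log.record("e2", "Corner", under_review=False)
print([e.event_id for e in log.published()])    # ['e2']
log.resolve("e1", upheld=False)                 # VAR rules the goal out
print(log.events["e1"].status.value)            # overturned
```

Because provisional events stay out of the published feed, an overturned VAR decision is never logged as a mistake.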

Overall, the project highlights the growing role of sports analytics and performance monitoring in maintaining the accuracy and integrity of football data systems.

Contribution

I contributed across all stages of the project, including data preparation, analytical modelling, dashboard development, and the interpretation of findings. This involved cleaning and standardising the dataset, removing non-football records, and correcting inconsistencies to ensure accurate analysis. I also worked on developing the Power BI data model, implementing DAX measures and calculated columns to enable time-series filtering, competition-level comparisons, and mistake-type analysis. In addition, I supported the development of the visual dashboards used to explore error distribution across competitions in England and Scotland, and contributed to the interpretation of key insights such as VAR-related mistakes, error severity patterns, and the potential impact of data logging errors on closely contested matches.

Acknowledgements

This project was completed collaboratively with my MSc Data Analytics group members Rohan Sharma, Ishita Jadwan, Prashanth Babu, and Nikita Musquri. I would also like to thank our course instructors and Genius Sports for providing the dataset used in this analysis.
