Abstract 2: Trained on what? A case study on the misapplication of machine learning in women’s ice hockey

Résumé

With growing awareness of advanced analytics in professional sport, the market for third-party providers of video and statistical analysis has grown. These providers supply coaches, athletes, and organizations with statistical information, which often includes outputs from proprietary machine learning models. In ice hockey, one of the most common modelling approaches is the expected goal model, which assigns a value to each shot attempt and gives teams a better understanding of their game play beyond the box score. In this study, an expected goal model was created for use in women’s collegiate ice hockey from over 14,000 unblocked shot attempts collected during the 2023 and 2024 Ontario University Athletics (OUA) women’s ice hockey season. Using a single team as a case study, our model predicted that a total of 112.43 goals would be scored in games featuring this team, while the third-party proprietary model predicted 179.4 goals. Ultimately, 126 non-empty net goals were scored: the third-party model overestimated scoring by 53.4 goals (about 42% of the actual total), whereas our model fell 13.57 goals (about 11%) short of it. The findings from this case study highlight multiple issues: limited transparency in proprietary models, the overapplication of machine learning to populations for which a model is not appropriate, and ethical concerns about the consequences of incorrect performance evaluations for athletes of all ages. Using this case study, we encourage coaches and practitioners to understand the information sources available to them and to appropriately vet any technologies used in sport before implementation.
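The size of the discrepancy can be checked directly from the three totals reported above. The following is a minimal sketch of that arithmetic only; the variable and function names are illustrative assumptions, and it does not reproduce either expected goal model.

```python
# Totals reported in the abstract for games featuring the case-study team.
ACTUAL_GOALS = 126  # non-empty net goals actually scored

# Predicted goal totals; the dictionary keys are illustrative labels.
predicted = {
    "study model": 112.43,
    "third-party model": 179.4,
}


def prediction_error(predicted_goals, actual_goals):
    """Return (signed error in goals, signed error as a percentage of actual)."""
    diff = predicted_goals - actual_goals
    return diff, 100.0 * diff / actual_goals


errors = {
    name: prediction_error(goals, ACTUAL_GOALS)
    for name, goals in predicted.items()
}

for name, (diff, pct) in errors.items():
    # A positive value means the model overestimated scoring.
    print(f"{name}: {diff:+.2f} goals ({pct:+.1f}%)")
```

Under these figures, the study model falls about 13.6 goals below the actual total, while the third-party model exceeds it by about 53.4 goals.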