
Generating Commentary of Football Matches Using Natural Language Processing


Alec Cook

01/12/2023

Supervised by Oktay Karakus; Moderated by Hantao Liu

Generating Commentary of Sporting Events Using Natural Language Processing

I would like to create a rudimentary model that takes in raw sporting data (focusing on football matches) and produces commentary about the events in a match. I’ve been a big fan of football from a young age and have always had a slightly unhealthy obsession with the Football Manager game series. I was always intrigued by how the game simulates fictional football matches and would like to attempt to replicate this with real-world matches. A vast number of football games are played every year, and a tool like this could help people keep up to date with teams that don’t receive much media attention. Most importantly, I am very interested in the potential of natural language processing, and this idea is mainly a vehicle for me to explore and learn about this fascinating subject in more depth.

The general idea is to see whether an NLP model can provide engaging and accurate commentary on real-time events as they happen. If a player receives a yellow card, the model should comment on it in the same way a human commentator would, giving details such as the player involved and the reason for the card. This type of commentary will be extended to enough event types to give reasonable coverage of the events that can occur in a football match and their permutations.
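As a rough illustration of the yellow-card example, the sketch below shows one way a raw event could be flattened into a text prompt for a language model. It is only a sketch under stated assumptions: the `MatchEvent` class, its field names, the `build_prompt` helper and the prompt wording are all hypothetical and are not taken from any particular data source or from the final system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchEvent:
    """One raw event from a match feed (all field names are hypothetical)."""
    minute: int
    event_type: str                # e.g. "yellow_card" or "goal"
    player: str
    team: str
    detail: Optional[str] = None   # e.g. the reason for a card


def build_prompt(event: MatchEvent) -> str:
    """Flatten a structured event into a single text prompt."""
    parts = [
        f"minute: {event.minute}",
        f"event: {event.event_type.replace('_', ' ')}",
        f"player: {event.player}",
        f"team: {event.team}",
    ]
    # Only mention the detail (such as the reason for a card) when it is known.
    if event.detail:
        parts.append(f"detail: {event.detail}")
    return "Write one line of football commentary. " + "; ".join(parts)


# Placeholder names, not real players or teams.
event = MatchEvent(67, "yellow_card", "Player A", "Home Team", "late tackle")
prompt = build_prompt(event)
print(prompt)
```

The resulting string could then be passed to whichever language model is being tested. Keeping the event fields explicit in the prompt makes it easier to check afterwards whether the generated comment mentions the right player and reason.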

Furthermore, the project will explore the possibilities of the recent wave of large language models by conducting all of the computational work 'locally', that is, without the aid of cloud computing resources. The project aims to test what is achievable when training and deploying language models on consumer-grade hardware. Various training strategies will be formulated and implemented to examine how large language models behave under different circumstances. Thus, the project has an experimental edge to it: what can be done on consumer-grade hardware, and what is the best way to do it?

The main issue will be acquiring enough data. Fortunately, thanks to the passion and dedication of football fans, there are swathes of datasets available for free on the internet, as well as football data APIs, that will help me with this. I will be relying on my background as a former English teacher to help me direct the model to construct engaging and informative comments. Python will be the main tool for building this project, and I am confident that my proficiency in Python is sufficient for me to take it on.

There is a question about copyright and trademark law when using real player and team names. If this is a problem, I could easily anonymise the commentary (e.g. “Blue number 12” instead of the player’s name, “The home team” instead of the team’s name), or I could get the express consent of teams to use their names. Further guidance on the ethical considerations is needed. The use of the data itself should present no problem, as most leagues (as far as my research shows) allow football statistics to be used for free as long as the use is not commercial.
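The anonymisation option could be as simple as a text substitution applied to the generated commentary. The following is a minimal sketch, assuming the real names are known in advance; the `anonymise` function, its mapping format, and the example names are hypothetical. Plain substring replacement is used here for brevity, and it would not cope with nicknames or partial names.

```python
def anonymise(text, players, teams):
    """Replace real names with neutral labels.

    players maps a player name to a (kit colour, shirt number) pair;
    teams maps a team name to a neutral label such as "the home team".
    """
    replacements = {
        name: f"{colour} number {number}"
        for name, (colour, number) in players.items()
    }
    replacements.update(teams)
    # Replace longer names first so a short name cannot clobber part of a longer one.
    for name in sorted(replacements, key=len, reverse=True):
        text = text.replace(name, replacements[name])
    return text


# Placeholder names, not real players or teams.
comment = "J. Smith of Example FC is booked for a late tackle."
anonymised = anonymise(
    comment,
    players={"J. Smith": ("Blue", 12)},
    teams={"Example FC": "the home team"},
)
print(anonymised)
```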

I am open to amendments both large and small if this is not a feasible project. My motivation for writing this proposal is to demonstrate my interest in NLP and to attract the interest of suitable supervisors.


Final Report (01/12/2023) [Zip Archive]
