From big data to smart data

There is an arms race in today’s sporting environment. Teams, athletes, coaches and support staff aren’t just fighting for the best facilities and talent; they’re also competing to see who can collect the most numbers. This is the era of big data, and the past decade has seen an unprecedented amount of information become available in sport, gathered through a wide variety of methods, including GPS systems, electronic timing gates, force platforms, blood testing, and general wellness questionnaires. The richness and vastness of this information, however, can also be a curse: both teams and individuals can feel that they have to collect more and more data in the hope of gaining an edge over their competitors and enhancing athletic performance. As with many things, we need to shift the focus from quantity to quality.

More might seem better, but more information can often harm decision-making, especially given the potential issues associated with collecting vast swathes of it. Every time you collect data, there is a convenience cost to the athlete. For example, they may have to put on a GPS monitoring unit, open a smartphone app numerous times per day to input information, remember to charge and wear their sleep tracker, or present to a doctor or phlebotomist for bloods to be drawn. Where possible, we want these data collection techniques to be as minimally invasive as possible and, ideally, combined into fewer data collection time points. A further problem is that, in many cases, the technologies used are poorly validated, and so may either not be accurate, or may not be providing you with the data you think you’re getting. How do coaches know whether it is worth starting to collect more data?

Key questions in data collection

In my experience, there are three main issues that require consideration before using a specific piece or type of technology in practice:

  1. What will you do with the data you collect?
  2. What changes will you make based on this data?
  3. What is your threshold for making this change?

Understanding what you’re going to do with the data before you collect it is crucial, and, in my experience, it is where most people go wrong. There are a number of reasons why you might collect certain types of data. You might collect wellness data over the course of a season and then match it to injuries and illnesses, to see whether the wellness scores correlate with these problems; if so, you have a somewhat reliable and valid basis on which to recommend changes when future wellness data starts to match the trends historically associated with illness or injury. You might be collecting performance data and using it either to track performance or to compare against a benchmark, such as during the return to play process, where you want an athlete to achieve a set percentage of their previous best in a given exercise before they can return to training. Whatever you wish to use the information for is fine, but have a plan in place to ensure that you do use what you collect, and don’t just collect it for the sake of it.
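As a rough illustration of the return to play benchmark described above, the check could be sketched in Python along these lines. The function name and the 90% default are assumptions made purely for illustration; the appropriate percentage is a decision for the coach and medical staff, not something this sketch prescribes.

```python
def meets_return_benchmark(
    current_result: float,
    previous_best: float,
    required_fraction: float = 0.9,  # hypothetical default, not a recommended value
) -> bool:
    """Return True if the current result reaches the set fraction of the previous best.

    Assumes a measure where higher is better (e.g. load lifted or jump height).
    """
    if previous_best <= 0:
        raise ValueError("previous_best must be positive")
    return current_result / previous_best >= required_fraction


# Example: an athlete whose previous best was 150 and who now manages 140.
print(meets_return_benchmark(140.0, 150.0))
```

Writing the rule down this explicitly, even on paper, forces the decision about the percentage to be made before testing starts rather than after the result is in.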

Second, if you’re using the data to inform training program design, either through exercise selection or the monitoring of training load, it’s important to have at least some idea of what changes you will make based on the data before you collect it. For example, if you’re collecting blood data, will you be using it to inform nutritional recommendations based on the micronutrient composition of that blood? Do you know what the “normal” values are for athletes? Do you know which recommendations to make? All are important considerations, but perhaps the third question in the list above is the most important: at what point will you make the change? If a player’s wellness data is tracking downwards, at what point do you intervene? After one day of a downward trend, or five? Or is this based on a set threshold, below which training should be modified? Having a plan in place regarding what you will do with the information you have, and at which point you will act, helps prevent the collection of useless data.
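To make the two kinds of trigger discussed above concrete, here is a minimal Python sketch that flags a player either when the latest wellness score falls below a set floor or when scores have dropped for several days in a row. The function names, the scoring scale, and the default values (a floor of 4.0 and three declining days) are all hypothetical; they stand in for whatever thresholds you decide on in advance.

```python
from typing import Sequence


def consecutive_declines(scores: Sequence[float]) -> int:
    """Count how many of the most recent day-to-day changes were decreases."""
    count = 0
    # Walk backwards through adjacent (previous day, current day) pairs.
    for previous, current in zip(reversed(scores[:-1]), reversed(scores[1:])):
        if current < previous:
            count += 1
        else:
            break
    return count


def should_intervene(
    scores: Sequence[float],
    floor: float = 4.0,           # hypothetical absolute threshold
    max_declining_days: int = 3,  # hypothetical trend threshold
) -> bool:
    """Flag a player if the latest score is below the floor or the decline is sustained."""
    if not scores:
        return False
    below_floor = scores[-1] < floor
    sustained_decline = consecutive_declines(scores) >= max_declining_days
    return below_floor or sustained_decline


# Daily wellness scores, oldest first (made-up values for illustration).
print(should_intervene([7, 8, 7, 6, 5]))
```

The point is less the code than the discipline: both thresholds are fixed before any data comes in, so the decision to modify training is not improvised on the day.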

Think before you buy

In summary, my belief is that haphazard collection of data is costly, both in terms of finances and time. It places a burden on the coach, athlete, and support staff which, if lifted, would free up resources that could be better spent on enhancing performance. That said, there is the potential for many important insights to be gained from targeted data collection. As a result, it is important to identify which data points you require, and how you will go about collecting them in a reliable manner.

Prior to collecting specific types of data, you also need to understand what you’re going to use it for, what modifications you will make based on it, and when you will make them. Doing so will help reduce data waste, which is a major time-sink in high-level sport. We can, and should, do better!