Penetrating the data smog through better visualization

As anyone who witnessed the UK government’s recent presentation on COVID-19 data will know, data is only useful if it can be read and understood clearly. For those who missed the broadcast, the presentation consisted of many slides of data with the BBC banner on the screen blocking out the titles! The audience of millions was left looking at lines and bar charts that had no context or explanation. Unfortunately, such examples can easily be found in the world of sport too.

Sport science is very good at collecting data, but how much of that information can the athlete actually digest? In this article I shall give a set of guidelines on effective data visualization, with an example from one of the athletes that I coach. This is something that I learned on the MOOC ‘Introduction to Data Visualization and Infographics’ led by Alberto Cairo at the Knight Center in Texas.

Why use graphics

The aim of using graphics is to reduce the amount of work that the eyes have to do and to allow the brain to process the information efficiently. The less sorting the brain has to do, the more understanding it can do.

A common error is to try to show as much information as possible, or to add as many different colors as possible, in an effort to look good. I have seen this many times: ‘Look at all the cool colors and charts,’ and then the athletes’ eyes glaze over.

Instead, we should justify every piece of information, every annotation and every line on the graphic. If we can’t justify it then it should be taken out, much like our coaching vernacular and training plans. To do this we also have to really understand what the data means so we can sort out the value from the rubbish. We have to know where the data came from, what it represents, and why it is even needed.

To see how this looks in practice it is best to take a real case study from training. Below I shall take you through my thought process and illustrations when dealing with a small amount of data from one of my athletes. You can then use those ideas with any data that you choose to collect.

Case Study: 17-year-old football midfielder

Subject X is a 17-year-old football midfielder who is looking to transition from youth football to an under-23s developmental side. I had been coaching him once a month before lockdown but moved to weekly during the summer months when he had no school and no football matches.

My coaching eye had told me that his speed and running mechanics needed improvement. His coordination was good and once we had built up his structural integrity we both wanted to improve his speed. Part of this was pragmatic: we could only train outside in the summer. Part of it was purposeful: speed is a decisive element in football (along with skill, decision-making and other elements) and coaches are always looking for faster players.

The goal: to improve running mechanics and top speed. We had a successful summer of training and the subject resumed training and playing. Anecdotal comments such as ‘He’s not pushed off the ball now, people are bouncing off him,’ and ‘He’s been man of the match’ were nice to hear, as was the fact that he had made the starting line-up in almost all of the matches. He had got better as a player.

But, had he got faster?

Luckily, he wears a GPS tracker when playing matches, and I was able to ask for the relevant data. I wanted his top 3 speeds in 3 matches pre-lockdown and his top 3 speeds in 3 matches post-lockdown. I didn’t want averages or every number; I just wanted to see how fast he was in the matches.

Version 1: Raw data

Here is what I was sent (the raw numbers):

Matches from Dec 2019/ Jan 2020 – (1) 7.0, 6.4, 6.3.  (2) 7.06, 7.0, 6.8. (3) 7.12, 7.1, 6.7

Matches from Sept/Oct 2020 – (1) 8.6, 8.2, 7.9.  (2) 8.3, 7.5, 7.5.  (3) 7.9, 7.8, 7.6.

A quick scan shows me that the numbers on the bottom line are higher than the top line. He was running faster in matches. With only 18 numbers my brain could process this information. As there appeared to be a big difference, it was even easier.
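The same quick scan can be written as a short check. This is only a minimal sketch: the variable and function names are illustrative, the values are copied from the two lines of raw data above, and the units are assumed to be metres per second.

```python
# Top three GPS speeds per match, copied from the raw data above.
# Assumption: units are metres per second.
pre_lockdown = [
    [7.0, 6.4, 6.3],
    [7.06, 7.0, 6.8],
    [7.12, 7.1, 6.7],
]
post_lockdown = [
    [8.6, 8.2, 7.9],
    [8.3, 7.5, 7.5],
    [7.9, 7.8, 7.6],
]


def flatten(matches):
    """Turn a list of per-match lists into one flat list of speeds."""
    return [speed for match in matches for speed in match]


fastest_before = max(flatten(pre_lockdown))
slowest_after = min(flatten(post_lockdown))

print(f"Fastest pre-lockdown speed:  {fastest_before}")
print(f"Slowest post-lockdown speed: {slowest_after}")
print("No overlap between the two periods:", slowest_after > fastest_before)
```

With these numbers the slowest post-lockdown speed (7.5) is higher than the fastest pre-lockdown speed (7.12), which is why the difference is visible even in the raw data.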

Version 2: Tables

I then put the numbers into this table. Surprisingly, this takes more time to process: the word ‘Match’ slows down the scanning, as do the dates.

I thought that segregating the speeds into slow (<7 m/s), medium (7-7.5 m/s) and fast (>7.5 m/s) might help, so I added the colors accordingly (bottom left). You can also see that I have added hard grid lines. This was a mistake, so I removed them and also abbreviated the word ‘Match’ to ‘M’ in another version (bottom right). I quite like the final table, as my brain doesn’t have to read the word ‘Match’ 6 times and the colors help highlight the different speed categories. It looks like a scorecard and it’s quite clear to see the improvement in speed. My categorization of slow, medium and fast is completely arbitrary and may have to change over time.
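For anyone who would rather apply the color bands in a script than by hand, the categorization could be sketched roughly as follows. The function name is hypothetical, and the treatment of the boundaries (7.0 and 7.5 both counted as medium) is an assumption, since the bands above do not say which side the edges fall on.

```python
from collections import Counter


def speed_category(speed):
    """Label a speed (m/s) using the arbitrary bands described above.

    Assumption: exactly 7.0 and exactly 7.5 both count as 'medium'.
    """
    if speed < 7.0:
        return "slow"
    if speed <= 7.5:
        return "medium"
    return "fast"


# Speeds copied from the raw data (three matches of three speeds each).
pre = [7.0, 6.4, 6.3, 7.06, 7.0, 6.8, 7.12, 7.1, 6.7]
post = [8.6, 8.2, 7.9, 8.3, 7.5, 7.5, 7.9, 7.8, 7.6]

pre_counts = Counter(speed_category(s) for s in pre)
post_counts = Counter(speed_category(s) for s in post)

print("Pre-lockdown: ", dict(pre_counts))
print("Post-lockdown:", dict(post_counts))
```

Keeping the thresholds in one small function means that if the bands change over time, only that function needs editing.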

Version 3: Charts

Not everyone can process numbers well, so I tried using a bar chart. The first two iterations were failures. The first choice (below left) is one ugly chart: the colors are meaningless, and there is no legend on the side explaining what the numbers mean. The legend on the bottom takes up as much brain space as the numbers, if not more. Even when I abbreviated the words (below right) it still wasn’t clear. I tried several different variations of the bar chart (sideways, different colors and sizes) but none seemed clear. The space under the top speed is irrelevant and so a bar is the wrong tool: a bar better conveys total work done, not top speeds.


Then I tried new chart formats: the line chart and the box stock chart. The line chart (below left) was a step forwards. There is far less distracting information and fewer colors for the reader to process. However, the lines are confusing because they just link up the top speeds from each match, the middle speeds and the bottom speeds. As there is not much of a range in some of the matches, it becomes muddied. The bottom legend is good and there is no side legend.

Finally I tried the box stock chart (below right). There are no meaningless colors, no grid lines, the title is clear and removes the need for the legend on the side. The improvement in speed is clear. This will do for me and for my subject. I am sure I could tinker with it even more, but since I found a graphic that effectively conveys my message, my job is done.
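The idea behind a stock-style chart is that each match is reduced to a high, a middle and a low value, drawn as a single range. The sketch below is a plain-text approximation of that idea, not the chart from the spreadsheet: the function names, the axis limits (6 to 9 m/s) and the row width are all assumptions chosen for illustration.

```python
# Top three GPS speeds per match, copied from the raw data (assumed m/s).
matches = {
    "Pre M1": [7.0, 6.4, 6.3],
    "Pre M2": [7.06, 7.0, 6.8],
    "Pre M3": [7.12, 7.1, 6.7],
    "Post M1": [8.6, 8.2, 7.9],
    "Post M2": [8.3, 7.5, 7.5],
    "Post M3": [7.9, 7.8, 7.6],
}


def high_mid_low(speeds):
    """Return (highest, middle, lowest) of a match's three top speeds."""
    low, mid, high = sorted(speeds)
    return high, mid, low


def range_row(label, speeds, axis_min=6.0, axis_max=9.0, width=30):
    """Draw one match as text: '-' spans low to high, 'o' marks the middle speed."""
    high, mid, low = high_mid_low(speeds)
    scale = (width - 1) / (axis_max - axis_min)

    def column(value):
        # Map a speed onto a character position along the axis.
        return round((value - axis_min) * scale)

    cells = [" "] * width
    for i in range(column(low), column(high) + 1):
        cells[i] = "-"
    cells[column(mid)] = "o"
    return f"{label:<7}|{''.join(cells)}| {low:.2f}-{high:.2f}"


for label, speeds in matches.items():
    print(range_row(label, speeds))
```

Each row carries only the range and the middle value, so there is no area under the bar to mislead the reader and no legend to decode.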

5 lessons in data visualization

Coaching is about effective communication. Data visualization allows us to communicate with our athletes more effectively. Bad graphics obfuscate and confuse. Good graphics enlighten and help understanding.

What can we learn from the above case study? There is no ‘best graphic’ for all the data out there, but there are some best practices. My key takeaways from this case study are:

  1. Before we present data we have to collect it. Think about WHY we need the information, WHAT information we need, and finally HOW to collect the information.
  2. If you don’t have to put it into a graphic, don’t. If in doubt, leave it out.
  3. Remove unnecessary grid lines and words. Again, simpler is better. Don’t distract the reader, help them focus.
  4. The software’s default may be simple to use but it is often the incorrect option. Take the extra time to make it fit the purpose.
  5. Know when to finish. Unless this is your full-time job, stop fiddling and get back to coaching.

These points should help you discover simpler and cleaner ways of sharing information. Best of luck!