Digital twins and the future of data modeling in sport

On April 11, 1970, the crew of Apollo 13 blasted off from Kennedy Space Center at the start of their mission to the moon. Following the recent successes of Apollo missions 11 and 12, James Lovell and Fred Haise were due to become the fifth and sixth humans to walk on the moon. However, just under 56 hours after taking off, and 330,000 km from Earth, disaster struck.

During a reasonably routine procedure, stirring the oxygen tanks, a loud bang was heard, and the crew momentarily lost radio contact with Mission Control. Once communication with Earth resumed, it was determined that an explosion had occurred in one of the oxygen tanks that supplied the fuel cells, severely limiting Apollo 13’s ability to produce power.

Finding remote solutions

Although the crew were not in immediate danger, they and NASA faced two problems. First, with the Command Module losing power, the crew moved to the Lunar Module, which created an issue with carbon dioxide accumulation. The original plan was for Lovell and Haise to spend 45 hours on the moon in the Lunar Module, but now all three crew members were living in it, and its carbon dioxide clearance capacity was insufficient to support three astronauts for the duration of the “new” mission. Second, in order to re-enter the Earth’s atmosphere, the crew had to re-start the power on the Command Module without draining the batteries, something they had no idea how to do.

To solve both these problems, NASA turned to their model of the spacecraft back on Earth. With regard to the carbon dioxide problem, the ground crew made a list of all the items on the spacecraft, and then used them to build a working CO2 scrubber, passing the instructions to the astronauts in space. To deal with the re-start issue, the ground crew spent hours testing different start-up procedures in their simulator to see which used the least power; eventually, they arrived at a workable solution, which the astronauts used to return safely to Earth.

The value of simulation

This example, now 50 years old, shows the value of simulation. Almost none of the ground staff at NASA had ever been into space, and yet they were able to replicate the conditions faced by their teammates, hundreds of thousands of kilometers away, and devise an innovative solution. The lessons from this approach are now being applied to a technology that a variety of sports organizations are exploring: the digital twin.

A digital twin is defined as a digital replica of a living or non-living entity. Whereas NASA’s simulation existed as a bricks-and-mortar model, digital twins exist virtually. Here, various pieces of data are fed into the virtual model, in order to calculate and predict the output. The digital twin owner can then manipulate a variety of variables to explore how this changes the outcome. This approach has been used by Babylon Health, who use a variety of health inputs to create a person’s digital twin, and model how lifestyle changes may affect this in future—hopefully stimulating that person to change any negative behaviors they may have.

The potential for the use of digital twins within sport is huge. Sport is essentially a long process of trial, error, and refinement. We try something, see what happens, and then make changes based on the results. Digital twins provide the opportunity to try multiple interventions, all at once, and see which is the most effective. For example, when manufacturing a prosthesis for a Paralympic sprinter, the various components could be adjusted and tested virtually, and then the most effective setup used. Perhaps more promisingly, the outcomes of training sessions might be tested in advance. In this case, the coach wants a specific adaptation to take place, without undue fatigue or an injury occurring. By using an athlete’s digital twin, the coach could try a variety of possible training options to see which delivers the best result, and then use that session in real life.
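
As a rough illustration of trying several training options on a virtual athlete, the sketch below scores a few candidate plans and keeps the best one that stays under a fatigue limit. It is loosely modeled on the Banister fitness-fatigue impulse-response idea, which is an assumption introduced here and not a method described in this article. The names TwinParams, simulate_plan, and best_plan, and every parameter value, are hypothetical and not fitted to any real athlete.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwinParams:
    """Hypothetical athlete parameters; the defaults are illustrative only."""
    fitness_gain: float = 1.0
    fatigue_gain: float = 2.0
    fitness_decay_days: float = 45.0
    fatigue_decay_days: float = 15.0


def simulate_plan(daily_loads, params):
    """Return (predicted performance on the final day, peak fatigue) for one plan."""
    fitness = fatigue = peak_fatigue = 0.0
    for load in daily_loads:
        # Each day the previous effects decay, then today's load is added.
        fitness = fitness * math.exp(-1 / params.fitness_decay_days) + load
        fatigue = fatigue * math.exp(-1 / params.fatigue_decay_days) + load
        peak_fatigue = max(peak_fatigue, params.fatigue_gain * fatigue)
    performance = params.fitness_gain * fitness - params.fatigue_gain * fatigue
    return performance, peak_fatigue


def best_plan(candidates, params, fatigue_cap):
    """Pick the plan with the highest predicted performance under the fatigue cap.

    Returns (winner_name_or_None, results), where results maps each plan name
    to its (performance, peak_fatigue) pair.
    """
    results = {name: simulate_plan(loads, params) for name, loads in candidates.items()}
    allowed = {name: r for name, r in results.items() if r[1] <= fatigue_cap}
    if not allowed:
        return None, results
    winner = max(allowed, key=lambda name: allowed[name][0])
    return winner, results


if __name__ == "__main__":
    # Three made-up six-week plans (arbitrary load units), each ending in a taper.
    plans = {
        "steady": [40.0] * 21 + [0.0] * 21,
        "heavy_block": [80.0] * 14 + [0.0] * 28,
        "light": [20.0] * 28 + [0.0] * 14,
    }
    winner, results = best_plan(plans, TwinParams(), fatigue_cap=1500.0)
    for name, (performance, peak) in results.items():
        print(f"{name}: performance={performance:.1f}, peak fatigue={peak:.1f}")
    print("selected:", winner)
```

A real twin would replace the toy model with one calibrated on the athlete's own data, but the pattern is the same: simulate every option, discard those that breach a constraint, and take the best of what remains into real training.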

No simulation without information

At present, one of the biggest problems affecting the use of digital twins in sport is the quality and depth of information required to accurately model a response. Returning to the training session example, the digital twin model would require accurate information across a wide variety of areas, including athlete genotype, previous training history, life stress, and current biochemistry. Collecting this information is both time-consuming and invasive, and, at least in the case of genotype, we’re still not entirely sure of the influence of a variety of genes on exercise response. Given the complexity of developing a personal digital twin, we’re likely some way off from accurate models and representations.

Whilst the area of personalized training response modeling is perhaps some way off, there is plenty of potential around modeling the aggregate/average, an approach which is currently used by a couple of different sporting organizations. The NFL, for example, recently signed a deal with Amazon Web Services to simulate the effects of changes in rules and equipment on player safety, allowing them to make the changes which best support player health. Performance analysts may be able to create a digital model of both their team and their upcoming opponents, in order to understand how changes in tactics or selection may influence the outcome of the game—although again this requires reliable information to be fed into the model.
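
To make the team-level idea concrete, the sketch below estimates how a change in tactics might shift the chance of winning by simulating many matches. It assumes each side's goals follow a Poisson distribution with a fixed scoring rate, a common simplification that is not taken from this article. The function names and all scoring rates are hypothetical, and a real model would estimate the rates from reliable match data.

```python
import math
import random


def sample_goals(rate, rng):
    """Draw a goal count from a Poisson distribution (Knuth's method)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def win_probability(team_rate, opponent_rate, n_matches=10_000, seed=0):
    """Estimate the share of simulated matches the team wins outright."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    wins = 0
    for _ in range(n_matches):
        if sample_goals(team_rate, rng) > sample_goals(opponent_rate, rng):
            wins += 1
    return wins / n_matches


def compare_tactics(tactics, n_matches=10_000, seed=0):
    """Map each tactic name to its estimated win probability.

    tactics: name -> (team scoring rate, opponent scoring rate), both assumed.
    """
    return {
        name: win_probability(team_rate, opponent_rate, n_matches, seed)
        for name, (team_rate, opponent_rate) in tactics.items()
    }


if __name__ == "__main__":
    # Made-up rates: attacking play scores more but also concedes more.
    options = {
        "balanced": (1.3, 1.1),
        "attacking": (1.8, 1.6),
        "defensive": (0.9, 0.7),
    }
    for name, probability in compare_tactics(options).items():
        print(f"{name}: estimated win probability {probability:.3f}")
```

The estimates are only as good as the scoring rates fed in, which is the same dependence on reliable information noted above.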

From data to digital twins

In summary, as we move into an era of sport, and life in general, in which more data than ever is both produced and captured, the potential to use this data effectively is massive. One promising area is the digital twin, in which collected data is used to run simulations and inform decisions. These can be basic, such as a choice of equipment or when to make a substitution, or increasingly complex, such as forecasting injury risk or training response. As always with the use of technology, there are ethical questions around the collection and use of data, especially when the technology is intrusive or the decisions made involve selection. Nevertheless, the ability to simulate a variety of scenarios, and therefore avoid costly errors, has the potential to be incredibly useful across a variety of domains, not just sport, as illustrated by NASA all those years ago.