28 June 2016

Introduction

Outline

  • Motivation
  • Functionality of trackeR
  • Code examples and implementation detail

Questions to answer in the next 15 minutes:

  • Why care?
  • What can you do with trackeR?
  • How does it work?

Motivation

  • Increasing amount of fitness and sports data from GPS-enabled tracking devices.
  • Proprietary software for analysis from device manufacturers and various apps.
  • Open-source software Golden Cheetah for cycling analysis.
  • These tools typically offer descriptive analysis only, whereas R provides a far wider range of statistical methods.
  • trackeR aims to bridge the gap between the routine collection of such data and their analysis in R.
  • Relatively few R packages relate to sports: some cover sport management, team ranking and scraping betting odds, while cycleRtools and elpatron handle cycling data.

Package structure

What can you do?

  • Read data from Training Center XML (TCX) files, SQLite databases and Golden Cheetah JSON files.
  • Store the data in session-based, unit- and operation-aware objects of class trackeRdata.
  • Basic operations: summarise, plot, handle units, etc.
  • Analyse time in zones, work capacity above critical power/speed, distribution of training time.
  • Utilise the rich tool set of statistical methodology provided in R base and other packages.

Read data

Package structure: read functionality

Read data

library("trackeR")
filepath <- system.file("extdata", "2013-06-08-090442.TCX", 
                        package = "trackeR")
runDF <- readTCX(file = filepath, timezone = "GMT")
str(runDF)
## 'data.frame':    1191 obs. of  9 variables:
##  $ time      : POSIXct, format: "2013-06-08 08:04:42" ...
##  $ latitude  : num  51.4 51.4 51.4 51.4 51.4 ...
##  $ longitude : num  1.04 1.04 1.04 1.04 1.04 ...
##  $ altitude  : num  6.2 6.2 6.2 6.2 6.2 ...
##  $ distance  : num  0 1.68 5.28 8.33 14.88 ...
##  $ heart.rate: num  83 84 84 86 89 93 96 98 101 102 ...
##  $ speed     : num  0 0.594 1.416 1.928 2.614 ...
##  $ cadence   : num  NA 54 74 97 97 97 97 98 97 97 ...
##  $ power     : num  NA NA NA NA NA NA NA NA NA NA ...

runTr0 <- trackeRdata(runDF)
runTr1 <- readContainer(filepath, type = "tcx", timezone = "GMT")
identical(runTr0, runTr1)
## [1] TRUE
runTr2 <- readDirectory(system.file("extdata", package = "trackeR"), 
                        timezone = "GMT")
## Reading file /home/frick/lib/R/trackeR/extdata/2013-06-08-090442.TCX (file 1 out of 1) ...
## Cleaning up...Done
identical(runTr0, runTr2)
## [1] TRUE

Session-based data structure

  • Very basic data cleaning: no missing time stamps, heart rate of 0 set to NA.
  • Split observations into sessions based on ordered time stamps: any two consecutive observations further apart than a threshold (default: 2 hours) are considered to be in different sessions.
  • Short gaps in recordings can occur within sessions, usually due to a reduced sampling rate or to the device being paused. We impute zero speed/power during such periods.
  • Store each session in a multivariate zoo object.
  • A trackeRdata object is a list of session objects with attributes for units of measurement and operations such as smoothing.
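The session-splitting rule can be sketched in base R. This is a minimal illustration, not trackeR's own implementation: the time stamps are made-up toy data and `split_sessions` is a hypothetical helper introduced only for this example.

```r
# Toy time stamps (hypothetical data): two recordings on the same day,
# the second starting 3 hours after the first
times <- as.POSIXct("2013-06-08 08:00:00", tz = "GMT") +
  c(0, 5, 10, 15, 3 * 3600, 3 * 3600 + 5, 3 * 3600 + 10)

# Hypothetical helper: start a new session whenever the gap between
# consecutive (ordered) time stamps exceeds `threshold` hours
split_sessions <- function(times, threshold = 2) {
  times <- sort(times)
  gaps <- as.numeric(diff(times), units = "hours")
  cumsum(c(TRUE, gaps > threshold))
}

session_id <- split_sessions(times)
session_id
## [1] 1 1 1 1 2 2 2
```

The 5-second gaps stay within one session, while the 3-hour gap exceeds the 2-hour default and starts a second one.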

Example data

  • 27 sessions by a male runner in June 2013 available via
data("runs", package = "trackeR")
  • Different visualisations available
plot(runs, session = c(3,13))
plotRoute(runs, session = 3, source = "osm")
leafletRoute(runs, session = c(6,7,10,12,13,21))

Visualise sessions

Summarise sessions

summary(runs, session = 1)
##  *** Session 1 ***
##  
##  Session times: 2013-06-01 17:32:15 - 2013-06-01 18:37:56 
##  Distance: 14130.7 m 
##  Duration: 1.09 hours 
##  Moving time: 1.07 hours 
##  Average speed: 3.59 m_per_s 
##  Average speed moving: 3.67 m_per_s 
##  Average pace (per 1 km): 4:38 min:sec
##  Average pace moving (per 1 km): 4:32 min:sec
##  Average cadence: 88.66 steps_per_min 
##  Average cadence moving: 88.87 steps_per_min 
##  Average power: NA W 
##  Average power moving: NA W 
##  Average heart rate: 141.11 bpm 
##  Average heart rate moving: 141.13 bpm 
##  Average heart rate resting: 136.76 bpm 
##  Work to rest ratio: 42.31 
##       
##  Moving threshold: 1 m_per_s 
runsSummary <- summary(runs)
plot(runsSummary, what = c("avgSpeed", "distance"))
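The pace values in the summary output follow from the average speeds: a pace per kilometre is 1000 m divided by the speed in m/s, expressed as minutes and seconds. A small sketch, where `speed_to_pace` is a hypothetical helper and truncating to whole seconds is an assumption rather than trackeR's documented behaviour:

```r
# Hypothetical helper: convert speed in m/s to pace per 1 km ("min:sec")
speed_to_pace <- function(speed_m_per_s) {
  secs <- 1000 / speed_m_per_s  # seconds needed to cover 1 km
  sprintf("%d:%02d",
          as.integer(secs %/% 60),         # whole minutes
          as.integer(floor(secs %% 60)))   # remaining seconds, truncated
}

# Average speed and average speed moving from session 1
pace <- speed_to_pace(c(3.59, 3.67))
pace
## [1] "4:38" "4:32"
```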

Distribution and concentration profiles

Summarise sessions with regard to speed, heart rate, etc:

  • Via time spent above a (single) arbitrary threshold such as maximum aerobic speed.
  • Via time in zones, i.e., the differences of time spent above a set of thresholds.
  • Extend this idea: a distribution profile is a function of the threshold, returning the time spent exercising above that threshold (Kosmidis and Passfield, 2015).
  • A concentration profile is the negative first derivative of the distribution profile, revealing concentrations around certain speeds, heart rates, etc.
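These definitions can be made concrete with a toy example in base R. This is only a sketch under stated assumptions: the speed series is made up, each observation is assumed to represent one second, and a plain first difference stands in for the derivative (no smoothing is applied here).

```r
# Toy session (hypothetical data): speed in m/s, one observation per second
speed <- c(0, 1, 2, 3, 3, 3, 4, 4, 2, 1)
dt <- 1  # seconds represented by each observation (assumption)

# Grid of speed thresholds
grid <- seq(0, 5, by = 1)

# Distribution profile: time spent strictly above each threshold
d_profile <- sapply(grid, function(v) sum(dt * (speed > v)))

# Concentration profile: negative first difference of the distribution
# profile, a crude stand-in for the negative derivative
c_profile <- -diff(d_profile) / diff(grid)

# Time in a zone is a difference of the distribution profile,
# e.g. seconds spent at speeds in (1, 3] m/s
time_1_to_3 <- d_profile[grid == 1] - d_profile[grid == 3]

d_profile
## [1] 9 7 5 2 0 0
c_profile
## [1] 2 2 3 2 0
time_1_to_3
## [1] 5
```

The distribution profile is non-increasing in the threshold, and the peak of the concentration profile (3 seconds between 2 and 3 m/s) marks the speed range where this toy session spends most of its time.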

Illustration

Distribution and concentration profiles

## Change speed unit to miles per hour
runs <- changeUnits(runs, variable = "speed", unit = "mi_per_h")
## Calculate and plot distribution profiles
dProfiles <- distributionProfile(runs, what = "speed", 
  grid = seq(0, 22, by = 0.1))
plot(dProfiles, multiple = TRUE)
## Calculate and plot concentration profiles
cProfiles <- concentrationProfile(dProfiles)
plot(cProfiles, multiple = TRUE)
## Functional principal components analysis (PCA)
cpPCA <- funPCA(cProfiles, what = "speed", nharm = 4)
plot(cpPCA, harm = 1:2)

Functional PCA

Summary

  • Motivation: bring fitness and sports tracking data to the analytic capacity of R.
  • Functionality: Read data, session-based data structure, basic analytic tools and visualisations.
  • Available from CRAN and GitHub:
  • Further information in vignettes:
    • trackeR: more details on implementation.
    • Tour de trackeR: a brief tour of functionality.
  • Future work: more input formats, stronger link with Golden Cheetah, incorporate team structure, …

References

  • Kosmidis I, Passfield L (2015). "Linking the Performance of Endurance Runners to Training and Physiological Effects via Multi-Resolution Elastic Net." ArXiv e-print arXiv:1506.01388.
  • Zeileis A, Grothendieck G (2005). "zoo: S3 Infrastructure for Regular and Irregular Time Series." Journal of Statistical Software, 14(6), 1-27.
  • Wickham H (2009). "ggplot2: Elegant Graphics for Data Analysis." Springer-Verlag New York.
  • Ramsay JO, Silverman BW (2005). "Functional Data Analysis." Springer-Verlag New York.
  • Pya N, Wood SN (2015). "Shape Constrained Additive Models." Statistics and Computing, 25(3), 543-559.