
Reproducible statistical analysis plans with R and Quarto

TL;DR

A statistical analysis plan that exists only as a Word document is half a plan. When the SAP and the analysis code are the same document, via Quarto, you eliminate the drift between what you said you would do and what you actually did.

The SAP as a living document

The statistical analysis plan (SAP) is the contract between the study design and the reported results. It specifies, before data are seen, what will be analyzed, how, with what covariates, using what population, and how missing data will be handled. Its purpose is to prevent cherry-picking and post-hoc analytic flexibility.

In practice, most SAPs are Word documents that describe analyses in natural language. The actual analysis happens separately in SAS or R scripts. The SAP and the code are two separate artifacts that can, and often do, drift apart.

Why Word documents fail as SAPs

A Word-based SAP has three structural problems:

  • No executable specification: "A mixed model with treatment and visit as fixed effects and subject as a random effect" is ambiguous. Which R function? Which optimizer? Which degrees-of-freedom method?
  • No automatic deviation detection: if the analyst changes the model (e.g., adds a covariate that was not in the SAP), there is no technical mechanism to detect or document this deviation.
  • Versioning is manual: SAP v1.0, v1.1, v2.0-final-FINAL. The history of analytical decisions is buried in filenames and email threads.

Quarto as the SAP format

Quarto is a scientific publishing system that mixes prose and executable code. A Quarto SAP contains both the natural language description of the analysis and the actual R code that will run it. The document renders to PDF for submission and executes for analysis. They are the same file.

This eliminates the drift problem. When you run the analysis, you are running the SAP. Any deviation from the originally committed SAP is detectable through Git history.
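As a minimal sketch of what "detectable through Git history" can look like in practice, the helper below lists every commit that touched the SAP file. The function name sap_history is hypothetical, and the sketch assumes the SAP is tracked in a Git repository and that the command is run from inside it.

```shell
# sap_history FILE
# Prints one line per commit that touched FILE, newest first:
# short hash, commit date, and commit subject.
# --follow keeps the history across renames (e.g. v1.0 -> v1.1 filenames).
sap_history() {
  git log --follow --date=short --format='%h %ad %s' -- "$1"
}
```

Calling sap_history SAP_COSM-2025-01_v1.0.qmd replaces the "v2.0-final-FINAL" filename trail with a dated list of changes, each carrying its commit message as the rationale.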

SAP structure and required sections

A minimal SAP for a Phase II cosmetic or clinical study should contain:

  • Study overview (design, endpoints, populations)
  • Analysis populations (ITT, PP, safety set), with R code that creates each flag
  • Primary endpoint analysis (model specification with executable code)
  • Secondary and exploratory analyses
  • Missing data handling
  • Multiple comparison adjustments
  • Sensitivity analyses
  • Software and environment (renv lockfile reference)

Executable code blocks in the SAP

---
title: "Statistical Analysis Plan - Study COSM-2025-01"
version: "1.0"
date: "2025-03-15"
status: "LOCKED - do not edit after this date"
---

## 3.1 Primary Endpoint Analysis

The primary endpoint is the change from baseline in skin hydration score
(Corneometer units) at Day 28. The primary analysis uses a linear mixed model
with treatment group and visit as fixed effects and subject as a random intercept.

```{r primary-analysis, eval=FALSE}
# eval=FALSE in the SAP version; eval=TRUE in the analysis run
library(lme4)
library(emmeans)

model_primary <- lmer(
  hydration ~ treatment * visit + (1 | subject_id),
  data = analysis_data,
  REML = TRUE
)

# Treatment contrast within each visit; the Day 28 row is the primary comparison.
# The sign of the contrast depends on the factor level order of treatment.
emmeans(model_primary, ~ treatment | visit) |>
  contrast("revpairwise") |>
  summary(infer = TRUE)
```

Setting eval=FALSE in the SAP version prevents execution during document rendering. When the analysis run begins, the locked SAP document is copied to an analysis directory, eval flags are set to TRUE, and the code runs.
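The copy-and-flip step can be scripted so that the analysis copy always starts from the locked version rather than from whatever is in the working tree. This is a sketch under stated assumptions: the function name prepare_analysis_run is hypothetical, and it assumes eval is written as eval=FALSE inside the chunk header (as in the example above), not as a `#| eval: false` option line.

```shell
# prepare_analysis_run TAG SAP_FILE RUN_FILE
# Copies SAP_FILE as committed at TAG into RUN_FILE and enables its R chunks.
prepare_analysis_run() {
  local tag="$1" sap_file="$2" run_file="$3"
  mkdir -p "$(dirname "$run_file")" || return 1
  # Read the file from the tag, not from the working tree.
  git show "${tag}:${sap_file}" > "$run_file" || return 1
  # Flip eval only on R chunk header lines; prose and chunk bodies are untouched.
  sed -i.bak -E '/^```[{]r/ s/eval *= *FALSE/eval=TRUE/' "$run_file" || return 1
  rm -f "${run_file}.bak"
}
```

For the study above, the call would be prepare_analysis_run SAP-v1.0-LOCKED SAP_COSM-2025-01_v1.0.qmd analysis/COSM-2025-01_analysis.qmd. Because only the header flags change, the eval switch is the single expected difference between the locked SAP and the analysis file.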

Version control and locking

# Standard workflow:
# 1. Create SAP, commit to Git
git add SAP_COSM-2025-01_v1.0.qmd
git commit -m "SAP v1.0: initial version"

# 2. When approved/locked, tag the commit
git tag -s SAP-v1.0-LOCKED -m "SAP locked by statistician and PI"
# -s creates a GPG-signed tag: cryptographically tied to the tagger's key

# 3. Any post-lock changes become v1.1 with documented rationale
# 4. Final analysis runs from the locked tag, recorded in the analysis header

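The signed tag creates an audit trail. Anyone holding the signer's public key can run git verify-tag SAP-v1.0-LOCKED to confirm that the tag is authentic, and git diff SAP-v1.0-LOCKED -- SAP_COSM-2025-01_v1.0.qmd to confirm that the SAP has not been modified since locking.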
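Signature verification only proves that the tag is authentic; a separate content check shows whether the file in the working tree still matches the locked version. A minimal sketch of that content check follows. The function name sap_unchanged_since is hypothetical.

```shell
# sap_unchanged_since TAG SAP_FILE
# Returns 0 if the working-tree SAP_FILE is identical to the version at TAG,
# and 1 if it has been modified since.
sap_unchanged_since() {
  local tag="$1" sap_file="$2"
  # --quiet suppresses the patch and reports the result through the exit code.
  if git diff --quiet "$tag" -- "$sap_file"; then
    echo "unchanged since ${tag}: ${sap_file}"
  else
    echo "MODIFIED since ${tag}: ${sap_file}" >&2
    return 1
  fi
}
```

A typical check runs git verify-tag SAP-v1.0-LOCKED first and then sap_unchanged_since SAP-v1.0-LOCKED SAP_COSM-2025-01_v1.0.qmd, so that both the tag and the file content are covered.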

The pre-analysis plan and deviations report

When the analysis is complete, generate a deviations report: a comparison between the SAP code (from the locked tag) and the analysis code (as run). In R:

library(diffr)

# Side-by-side diff of the SAP file and the analysis file as run.
# The SAP file should be the version checked out from the locked tag.
diffr(
  "SAP_COSM-2025-01_v1.0.qmd",
  "analysis/COSM-2025-01_analysis.qmd"
)

Every difference beyond the expected eval flag change is a deviation. Each deviation must be documented with a rationale (data quality issue, protocol amendment, etc.) in the clinical study report.
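A whole-file comparison also flags prose edits and the eval switch itself. One possible refinement, sketched here with hypothetical helper names (extract_r_chunks, deviations_diff), is to compare only the R chunks, read the SAP directly from the locked tag, and ignore the eval option in chunk headers. It assumes chunks are fenced as in the examples above and that eval is set in the chunk header.

```shell
# extract_r_chunks FILE
# Prints only the R chunks of a .qmd file: each chunk header (with its eval
# option removed) followed by the chunk body. Prose and closing fences are dropped.
extract_r_chunks() {
  awk '
    /^```[{]r/ { in_chunk = 1; sub(/,? *eval *=[^,}]*/, ""); print; next }
    /^``` *$/  { in_chunk = 0; next }
    in_chunk   { print }
  ' "$1"
}

# deviations_diff TAG SAP_FILE ANALYSIS_FILE
# Unified diff of the R chunks in SAP_FILE as committed at TAG against those in
# ANALYSIS_FILE. Returns 0 if they match, 1 if they differ, 2 on error.
deviations_diff() {
  local tag="$1" sap_file="$2" analysis_file="$3" work rc
  work="$(mktemp -d)" || return 2
  git show "${tag}:${sap_file}" > "$work/locked.qmd" || { rm -rf "$work"; return 2; }
  extract_r_chunks "$work/locked.qmd" > "$work/locked.R"
  extract_r_chunks "$analysis_file" > "$work/as_run.R"
  diff -u "$work/locked.R" "$work/as_run.R"
  rc=$?
  rm -rf "$work"
  return "$rc"
}
```

With this filter, an empty diff means the code that ran is the code that was locked, and each hunk in a non-empty diff corresponds to one item that needs a documented rationale.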

A minimal SAP template

The minimal structure for a Quarto SAP document:

---
title: "SAP - [Study ID]"
date: "[Date]"
author: "[Statistician name]"
version: "1.0"
params:
  analysis_ready: false  # set to true when analysis begins
---

# 1. Study overview
# 2. Analysis populations
```{r populations, eval=params$analysis_ready}
# create ITT, PP, safety flags
```

# 3. Primary analysis
```{r primary, eval=params$analysis_ready}
# primary model
```

# 4. Secondary analyses
# 5. Missing data
# 6. Software
```{r session-info, eval=params$analysis_ready}
sessionInfo()
renv::snapshot()
```

Key takeaway

A Quarto SAP is not just a document: it is the analysis. The same file that describes your methods is the file that runs them. Combined with Git signed tags and a deviations report, it gives you an audit trail that a Word document can never provide.


Aslane Mortreau

Freelance Data & AI specialist working with pharmaceutical, biotech, and cosmetic R&D teams. Statistical modeling, analytical pipelines, and custom applications.
