The SAP as a living document
The statistical analysis plan (SAP) is the contract between the study design and the reported results. It specifies, before data are seen, what will be analyzed, how, with what covariates, using what population, and how missing data will be handled. Its purpose is to prevent cherry-picking and post-hoc analytic flexibility.
In practice, most SAPs are Word documents that describe analyses in natural language. The actual analysis happens separately in SAS or R scripts. The SAP and the code are two separate artifacts that can, and often do, drift apart.
Why Word documents fail as SAPs
A Word-based SAP has three structural problems:
- No executable specification: "A mixed model with treatment and visit as fixed effects and subject as a random effect" is ambiguous. Which R function? Which optimizer? Which degrees-of-freedom method?
- No automatic deviation detection: if the analyst changes the model (e.g., adds a covariate that was not in the SAP), there is no technical mechanism to detect or document this deviation.
- Versioning is manual: SAP v1.0, v1.1, v2.0-final-FINAL. The history of analytical decisions is buried in filenames and email threads.
Quarto as the SAP format
Quarto is a scientific publishing system that mixes prose and executable code. A Quarto SAP contains both the natural language description of the analysis and the actual R code that will run it. The document renders to PDF for submission and executes for analysis: the submission document and the analysis script are the same file.
This removes the main source of drift. When you run the analysis, you are running the SAP, and any deviation from the originally committed SAP shows up in the Git history.
SAP structure and required sections
A minimal SAP for a Phase II cosmetic or clinical study should contain:
- Study overview (design, endpoints, populations)
- Analysis populations (ITT, PP, and safety set, each with R code that creates its flag)
- Primary endpoint analysis (model specification with executable code)
- Secondary and exploratory analyses
- Missing data handling
- Multiple comparison adjustments
- Sensitivity analyses
- Software and environment (renv lockfile reference)
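As an illustration of the "R code that creates each flag" item, here is a minimal sketch in base R. The column names (`randomized`, `doses_taken`, `major_deviation`, `completed_day28`) and the population definitions are assumptions made for this example; in a real study the protocol defines them.

```r
# Toy subject-level data; every column name here is hypothetical.
subjects <- data.frame(
  subject_id      = c("S01", "S02", "S03", "S04"),
  randomized      = c(TRUE, TRUE, TRUE, FALSE),
  doses_taken     = c(28, 0, 25, 0),
  major_deviation = c(FALSE, FALSE, TRUE, FALSE),
  completed_day28 = c(TRUE, FALSE, TRUE, FALSE)
)

# Assumed definitions (replace with the protocol's wording):
#   ITT    = all randomized subjects
#   Safety = subjects who received at least one application of study product
#   PP     = ITT subjects who completed Day 28 with no major protocol deviation
add_population_flags <- function(d) {
  d$itt_fl    <- d$randomized
  d$safety_fl <- d$doses_taken > 0
  d$pp_fl     <- d$itt_fl & d$completed_day28 & !d$major_deviation
  d
}

analysis_subjects <- add_population_flags(subjects)
print(analysis_subjects[, c("subject_id", "itt_fl", "safety_fl", "pp_fl")])
```

Writing the flags as code forces each definition to be unambiguous: a reviewer can see exactly which condition excludes a subject from the per-protocol set.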
Executable code blocks in the SAP
---
title: "Statistical Analysis Plan - Study COSM-2025-01"
version: "1.0"
date: "2025-03-15"
status: "LOCKED - do not edit after this date"
---
## 3.1 Primary Endpoint Analysis
The primary endpoint is the change from baseline in skin hydration score
(Corneometer units) at Day 28. The primary analysis uses a linear mixed model
with treatment group, visit, and their interaction as fixed effects and subject as a random intercept.
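```{r primary-analysis, eval=FALSE}
# eval=FALSE in the SAP version; eval=TRUE in the analysis run
library(lme4)
library(emmeans)

# treatment * visit expands to both main effects plus their interaction
model_primary <- lmer(
  hydration ~ treatment * visit + (1 | subject_id),
  data = analysis_data,
  REML = TRUE
)

# Active vs placebo at each visit; the Day 28 row is the primary contrast
# (this reading assumes `hydration` holds change-from-baseline values).
# To pin the degrees-of-freedom method, pass e.g. lmer.df = "satterthwaite" to emmeans().
emmeans(model_primary, ~ treatment | visit) |>
  contrast("revpairwise") |>
  summary(infer = TRUE)
```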
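Setting eval=FALSE in the SAP version prevents execution when the document is rendered for submission. When the analysis run begins, the locked SAP document is copied to an analysis directory, the eval flags are set to TRUE, and the code runs. Editing the flags by hand changes the file, so the template later in this article drives eval from a document parameter instead.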
Version control and locking
# Standard workflow:
# 1. Create SAP, commit to Git
git add SAP_COSM-2025-01_v1.0.qmd
git commit -m "SAP v1.0: initial version"
# 2. When approved/locked, tag the commit
git tag -s SAP-v1.0-LOCKED -m "SAP locked by statistician and PI"
# -s creates a signed tag: cryptographically tied to the tagger's signing key (GPG by default)
# 3. Any post-lock changes become v1.1 with documented rationale
# 4. Final analysis runs from the locked tag, recorded in the analysis header
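The signed tag creates an audit trail. Anyone with the signer's public key can confirm that the tag is authentic using git verify-tag SAP-v1.0-LOCKED, and git diff SAP-v1.0-LOCKED -- SAP_COSM-2025-01_v1.0.qmd then shows whether the file has changed since locking.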
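The content check can also be scripted so that an analysis run refuses to start from a modified SAP. A minimal sketch, assuming `git` is on the PATH and the working directory is inside the repository; `sap_matches_tag()` is a hypothetical helper written for this example, not part of any package.

```r
# Returns TRUE when `file` in the working tree is identical to its version
# at `tag`. Relies on `git diff --quiet`, which exits with status 0 when
# there are no differences and 1 when there are.
sap_matches_tag <- function(tag, file) {
  status <- system2(
    "git",
    c("diff", "--quiet", shQuote(tag), "--", shQuote(file)),
    stdout = FALSE, stderr = FALSE
  )
  identical(as.integer(status), 0L)
}

# Example call with the tag and file name from the workflow above.
# Guarded so the snippet does nothing where git or the file is unavailable.
if (nzchar(Sys.which("git")) && file.exists("SAP_COSM-2025-01_v1.0.qmd")) {
  stopifnot(sap_matches_tag("SAP-v1.0-LOCKED", "SAP_COSM-2025-01_v1.0.qmd"))
}
```

This complements the signature check rather than replacing it: the signature shows who created the tag, and the diff shows that the file being run is the one that was tagged.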
The pre-analysis plan and deviations report
When the analysis is complete, generate a deviations report: a comparison between the SAP code (from the locked tag) and the analysis code (as run). In R:
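library(diffr)
# Write out the SAP exactly as it was at the locked tag, not the working copy.
# "SAP_locked.qmd" is a scratch file name chosen for this example.
system2("git", c("show", "SAP-v1.0-LOCKED:SAP_COSM-2025-01_v1.0.qmd"),
        stdout = "SAP_locked.qmd")
# Side-by-side diff of the two documents (prose and code chunks alike)
diffr(
  "SAP_locked.qmd",
  "analysis/COSM-2025-01_analysis.qmd"
)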
Every difference in the analysis code, other than a flipped eval flag, is a deviation. Each deviation must be documented with a rationale (data quality issue, protocol amendment, etc.) in the clinical study report.
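A whole-file diff also reports prose edits and changed chunk options. One way to narrow the report to code is to extract the chunk bodies from both files and compare them by chunk label. The sketch below uses base R only. `extract_chunks()` and `chunk_deviations()` are hypothetical helpers, and the parser assumes that every chunk has a knitr-style header whose first option is the label (as in the examples in this article) and closes with a line containing only three backticks.

```r
# Three backticks, built with strrep() so this snippet can sit inside a fence.
fence <- strrep("`", 3)

# Extract R chunk bodies from a .qmd file, keyed by chunk label.
extract_chunks <- function(path) {
  lines  <- readLines(path, warn = FALSE)
  header <- paste0("^", fence, "\\{r\\s+([^,}\\s]+)")
  closer <- paste0("^", fence, "\\s*$")
  chunks <- list()
  label  <- NULL
  for (ln in lines) {
    if (is.null(label)) {
      m <- regmatches(ln, regexec(header, ln, perl = TRUE))[[1]]
      if (length(m) == 2) {            # chunk header: start collecting
        label <- m[2]
        chunks[[label]] <- character(0)
      }
    } else if (grepl(closer, ln, perl = TRUE)) {
      label <- NULL                    # closing fence: stop collecting
    } else {
      chunks[[label]] <- c(chunks[[label]], ln)
    }
  }
  chunks
}

# Compare chunk bodies by label, ignoring indentation and blank lines.
chunk_deviations <- function(sap_path, analysis_path) {
  normalize <- function(x) {
    x <- trimws(x)
    x[nzchar(x)]
  }
  sap <- lapply(extract_chunks(sap_path), normalize)
  run <- lapply(extract_chunks(analysis_path), normalize)
  labels <- as.character(union(names(sap), names(run)))
  status <- vapply(labels, function(lb) {
    if (is.null(run[[lb]])) {
      "missing from analysis"
    } else if (is.null(sap[[lb]])) {
      "not in SAP"
    } else if (identical(sap[[lb]], run[[lb]])) {
      "unchanged"
    } else {
      "modified"
    }
  }, character(1))
  data.frame(chunk = labels, status = unname(status), stringsAsFactors = FALSE)
}

# Demo on two throwaway files (contents are illustrative only).
sap_file <- tempfile(fileext = ".qmd")
run_file <- tempfile(fileext = ".qmd")
writeLines(c(
  "Primary model as planned.",
  paste0(fence, "{r primary, eval=FALSE}"),
  "fit <- lm(y ~ treatment, data = d)",
  fence,
  paste0(fence, "{r secondary, eval=FALSE}"),
  "summary(fit)",
  fence
), sap_file)
writeLines(c(
  "Primary model, reworded prose.",
  paste0(fence, "{r primary, eval=TRUE}"),
  "fit <- lm(y ~ treatment + age, data = d)",  # covariate added after lock
  fence,
  paste0(fence, "{r secondary, eval=TRUE}"),
  "summary(fit)",
  fence
), run_file)

deviations <- chunk_deviations(sap_file, run_file)
print(deviations)
```

In the demo, the reworded prose and the changed eval flags are ignored, while the added covariate marks the `primary` chunk as modified. Each "modified", "missing from analysis", or "not in SAP" row is a candidate entry for the deviations report.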
A minimal SAP template
The minimal structure for a Quarto SAP document:
---
title: "SAP - [Study ID]"
date: "[Date]"
author: "[Statistician name]"
version: "1.0"
params:
  analysis_ready: false # set to true when analysis begins
---
# 1. Study overview
# 2. Analysis populations
```{r populations, eval=params$analysis_ready}
# create ITT, PP, safety flags
```
# 3. Primary analysis
```{r primary, eval=params$analysis_ready}
# primary model
```
# 4. Secondary analyses
# 5. Missing data
# 6. Software
```{r session-info, eval=params$analysis_ready}
sessionInfo()
renv::status()  # reports drift from the committed renv.lock; run renv::snapshot() before locking, not here
```
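With the `analysis_ready` parameter in place, the analysis run does not need to edit the file: the parameter can be overridden at render time. A sketch using `quarto_render()` from the `quarto` R package and its `execute_params` argument; the file name is the one used earlier in this article, and the guard is there only so the snippet does nothing where Quarto or the file is unavailable.

```r
sap_file <- "SAP_COSM-2025-01_v1.0.qmd"

# Parameter override for the analysis run; the name must match the
# `params:` entry in the document's YAML header.
analysis_params <- list(analysis_ready = TRUE)

if (requireNamespace("quarto", quietly = TRUE) && file.exists(sap_file)) {
  quarto::quarto_render(
    input          = sap_file,
    execute_params = analysis_params
  )
}
```

The command-line equivalent is `quarto render SAP_COSM-2025-01_v1.0.qmd -P analysis_ready:true`. Because the source file is untouched either way, a diff against the locked tag stays empty unless the analysis itself changed.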
Key takeaway
A Quarto SAP is not just a document: it is the analysis. The same file that describes your methods is the file that runs them. Combined with Git signed tags and a deviations report, it gives you an audit trail that a Word document cannot provide on its own.