What ICH E6(R2) actually requires
ICH E6(R2), the GCP guideline governing clinical trial conduct, requires that all trial data be attributable, legible, contemporaneous, original, and accurate. The 2016 revision extended these ALCOA principles explicitly to electronic systems and analytical outputs: data must be traceable to their source, changes must be auditable, and results must be reproducible from the underlying dataset.
These are not abstract principles. They translate into concrete requirements for how results are generated, stored, and communicated. The question is not whether your statistical analysis is correct. The question is whether you can demonstrate that it is correct, to an auditor who was not in the room when it was run.
The audit question that exposes most teams
"Can you reproduce the result on slide 14 from your locked analysis dataset?" This is a legitimate GCP question. If your answer involves opening R, re-running a script, and hoping the number matches, you have a traceability gap.
Why PowerPoint fails on all three counts
A standard internal reporting workflow - run analysis in R or SAS, copy results into a slide deck, circulate for review - fails the ICH E6(R2) requirements in three specific ways.
Attributability. There is no link between a number on a slide and the dataset and code that produced it. If someone asks which version of the analysis dataset was used, which protocol deviation exclusions were applied, or which version of the analysis code produced that specific output, the slide cannot answer.
Reproducibility. Copying a number from statistical output to a slide is a manual transcription step, and manual transcription introduces errors. ICH E6(R2)'s requirements for electronic data handling treat exactly this kind of untracked transfer of data between systems as a data integrity risk.
Change control. When a dataset is amended after data cleaning, the slide deck does not update automatically. There is no systematic way to verify that every number in a 47-slide deck reflects the current locked dataset.
What a compliant workflow requires
A GCP-compliant internal reporting workflow does not require a specific technology. It requires three properties, regardless of what tools you use:
- Programmatic generation: outputs are computed from source data by code, not produced by manual transcription.
- Version control: the code, the dataset version, and the outputs are linked and versioned together.
- Source linkage: every reported number carries metadata identifying the dataset it came from, the analysis version, and the date it was generated.
SAS with a documented macro library and a locked dataset satisfies these requirements. R with a version-controlled script and renv satisfies them. A Python notebook committed to Git with a pinned environment satisfies them. A PowerPoint file does not, regardless of how carefully it was prepared.
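As one illustration of the source-linkage property, a small helper can stamp every table with the dataset, code version, and run date. This is a sketch, not a prescribed method: the function name is hypothetical, `gt::tab_source_note` is used as one convenient place to put the metadata, and the Git call assumes the analysis runs inside a repository.

```r
# Illustrative sketch: attach provenance metadata to a gt table.
# The helper name and footer format are hypothetical.
library(gt)

provenance_note <- function(tbl, dataset_path, dataset_checksum) {
  # Current analysis commit; falls back if Git is unavailable
  commit <- tryCatch(
    system2("git", c("rev-parse", "--short", "HEAD"), stdout = TRUE),
    error = function(e) "uncommitted"
  )
  tbl |>
    tab_source_note(sprintf(
      "Source: %s (sha256 %s) | code %s | generated %s",
      dataset_path, dataset_checksum, commit, Sys.Date()
    ))
}
```

Applied to each summary table in a report, this makes every output self-identifying: the question "which dataset and code version produced this number?" is answered on the output itself.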
The practical standard regulators apply
FDA and EMA do not audit internal slide decks. What they audit is the clinical study report and the data package that supports it. But the internal reporting workflow matters for two reasons.
First, internal results are often the basis for decisions that affect the trial - dose escalation, cohort expansion, protocol amendments. If those decisions were based on results that cannot be traced to their source, the decision-making process is not GCP-compliant even if the final CSR is clean.
Second, when an auditor asks to see the analytical process, "we put the results in PowerPoint and sent them around" is not an answer that inspires confidence. A documented, reproducible pipeline demonstrates that the team understands data integrity, not just statistics.
A minimum viable compliant pipeline
The bar is not high. A minimum viable compliant pipeline for internal clinical trial reporting looks like this:
- Analysis code in a version-controlled repository (Git), with each analysis tied to a specific commit
- Datasets referenced by path and checksum, so the code fails explicitly if the dataset has changed since it was locked
- Outputs generated by the code directly into the report document, never pasted manually
- The report document itself generated programmatically, so re-running it from the locked dataset produces identical output

In R Markdown, a minimal skeleton with these properties looks like this:
---
title: "Interim Analysis - Study XYZ-001"
date: "`r Sys.Date()`"
params:
  dataset_path: "data/adlb_locked_v2.sas7bdat"
  lock_checksum: "a3f8c2d1e9b7..."
---
```{r setup, include=FALSE}
library(haven)
library(dplyr)
library(gt)

# Verify the file on disk is the locked version before reading it;
# checksumming the file (not the parsed object) ties the check to the lock
stopifnot(
  "Dataset checksum mismatch - verify lock status" =
    digest::digest(params$dataset_path, algo = "sha256", file = TRUE) ==
      params$lock_checksum
)

adlb <- read_sas(params$dataset_path)
```
```{r primary-table}
adlb |>
filter(SAFFL == "Y", PARAMCD == "ALT") |>
group_by(VISIT, ARM) |>
summarise(mean = mean(AVAL, na.rm = TRUE), n = n(), .groups = "drop") |>
gt() |>
tab_header(title = "ALT by Visit and Treatment - Safety Analysis Set")
```
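Re-running the report from the locked dataset is then a single parameterised render call. A sketch, with file names and the output path as placeholder values for this study:

```r
# Illustrative: regenerate the report from the locked dataset.
# The .Rmd file name and output file are placeholders.
rmarkdown::render(
  "interim_report.Rmd",
  params = list(
    dataset_path  = "data/adlb_locked_v2.sas7bdat",
    lock_checksum = "a3f8c2d1e9b7..."
  ),
  output_file = "interim_report_xyz001.html"
)
```

Because the checksum guard in the setup chunk runs first, a changed dataset halts the render with an explicit error instead of silently producing new numbers.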