Why SAS became the CDISC default
The FDA began accepting electronic submissions in the late 1990s, with datasets delivered as SAS transport files. SAS was the dominant statistical software at most major pharma companies. When CDISC developed the SDTM and ADaM standards in the 2000s, they were operationalized in SAS. FDA reviewers learned to use SAS. CROs built their entire data management infrastructure around it.
The result is a path dependency: CDISC implementation guidance is written with SAS examples, FDA review and validation tools (JMP Clinical, Pinnacle 21) were built around SAS transport files, and the people who know how to do CDISC submissions learned on SAS.
None of this means SAS is technically required. It means the ecosystem developed around SAS.
The regulatory reality in 2026
The FDA's Data Standards Catalog specifies CDISC standards, not SAS as a software product. The Study Data Technical Conformance Guide does not mandate any particular software. The SAS XPORT version 5 transport format (XPT) is still required for submission, but it is an openly documented file format and can be written from R.
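Because the transport format is what the submission depends on, it helps to check a dataset against the format's limits before writing it. The sketch below uses only base R and checks three XPORT version 5 constraints: variable names of at most 8 characters, labels of at most 40 characters, and character values of at most 200 bytes. The function name check_xpt_v5 and the example data are hypothetical and used only for illustration.

```r
# Check a data frame against SAS XPORT version 5 limits before writing it.
# Returns a character vector of problems; an empty vector means none found.
check_xpt_v5 <- function(df) {
  problems <- character(0)
  for (nm in names(df)) {
    # Variable names: at most 8 characters, letters, digits, underscores
    if (nchar(nm) > 8) {
      problems <- c(problems, sprintf("%s: name longer than 8 characters", nm))
    }
    if (!grepl("^[A-Za-z_][A-Za-z0-9_]*$", nm)) {
      problems <- c(problems, sprintf("%s: name has invalid characters", nm))
    }
    # Variable labels: at most 40 characters
    label <- attr(df[[nm]], "label")
    if (!is.null(label) && nchar(label) > 40) {
      problems <- c(problems, sprintf("%s: label longer than 40 characters", nm))
    }
    # Character values: at most 200 bytes
    if (is.character(df[[nm]])) {
      widths <- nchar(df[[nm]], type = "bytes")
      if (any(widths > 200, na.rm = TRUE)) {
        problems <- c(problems, sprintf("%s: value longer than 200 bytes", nm))
      }
    }
  }
  problems
}

# Made-up example: AGE_AT_CONSENT is too long to be an XPT v5 variable name
dm_example <- data.frame(
  STUDYID = "ABC-001",
  USUBJID = "ABC-001-0001",
  AGE_AT_CONSENT = 54,
  stringsAsFactors = FALSE
)
attr(dm_example$USUBJID, "label") <- "Unique Subject Identifier"

problems <- check_xpt_v5(dm_example)
print(problems)
```

If you write the file with haven directly rather than through xportr, note that haven::write_xpt() defaults to version 8, so version = 5 has to be passed explicitly.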
Several companies have now submitted successfully to FDA and EMA using R-generated CDISC datasets. Roche, GSK, and others have published their open-source tooling. The Pharmaverse consortium exists specifically to make R-based CDISC implementation tractable.
What R can do: SDTM with Pharmaverse
The Pharmaverse is a curated set of R packages for clinical reporting. For SDTM, the key packages are sdtm.oak (mapping raw data to SDTM domains) and xportr (producing XPT files with proper metadata).
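library(sdtm.oak)
library(xportr)
library(dplyr)

# Add the oak_id, raw_source, and patient_number keys that sdtm.oak joins on.
# PATNUM and the raw column "race" are assumed names; substitute your own.
raw_dm <- raw_demographics %>%
  generate_oak_id_vars(pat_var = "PATNUM", raw_src = "raw_demographics")

# Study controlled terminology spec (sdtm.oak reads it from a CSV file)
study_ct <- read_ct_spec("ct_dm.csv")

# Map raw demographics to SDTM DM domain
# (STUDYID, USUBJID, and the remaining DM variables are mapped the same way)
dm <- assign_no_ct(raw_dat = raw_dm, raw_var = "age_at_consent", tgt_var = "AGE") %>%
  hardcode_ct(
    raw_dat = raw_dm, raw_var = "age_at_consent", tgt_var = "AGEU",
    tgt_val = "YEARS", ct_spec = study_ct, ct_clst = "C66781"
  ) %>%
  assign_ct(
    raw_dat = raw_dm, raw_var = "sex_at_birth", tgt_var = "SEX",
    ct_spec = study_ct, ct_clst = "C66731"
  ) %>%
  assign_ct(
    raw_dat = raw_dm, raw_var = "race", tgt_var = "RACE",
    ct_spec = study_ct, ct_clst = "C74457"
  )

# Drop the sdtm.oak helper keys, then write XPT with proper labels and formats
dm %>%
  select(-all_of(oak_id_vars())) %>%
  xportr_label(metadata = dm_metadata, domain = "DM") %>%
  xportr_format(metadata = dm_metadata, domain = "DM") %>%
  xportr_write("DM.xpt")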
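Provided the metadata specification is complete, the output is a structurally valid XPT file that Pinnacle 21 can read. Pinnacle 21 will still report any conformance issues in the data itself, so a clean structural result is where validation starts, not where it ends.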
Building an ADaM dataset in R
For ADaM, the admiral package (developed by a Roche-GSK consortium and now maintained by Pharmaverse) provides a tidy, pipe-based API for building ADSL, ADAE, ADLB, and other standard datasets.
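library(admiral)
library(dplyr)
library(xportr)

# Convert character --DTC values to Date variables before merging
ds_ext <- derive_vars_dt(ds, dtc = DSSTDTC, new_vars_prefix = "DSST")
ex_ext <- ex %>%
  derive_vars_dt(dtc = EXSTDTC, new_vars_prefix = "EXST") %>%
  derive_vars_dt(dtc = EXENDTC, new_vars_prefix = "EXEN")

# Build ADSL from SDTM DM, DS, EX
adsl <- dm %>%
  # Treatment start and end: first and last exposure dates
  derive_vars_merged(
    dataset_add = ex_ext, by_vars = exprs(STUDYID, USUBJID), filter_add = !is.na(EXSTDT),
    new_vars = exprs(TRTSDT = EXSTDT), order = exprs(EXSTDT, EXSEQ), mode = "first"
  ) %>%
  derive_vars_merged(
    dataset_add = ex_ext, by_vars = exprs(STUDYID, USUBJID), filter_add = !is.na(EXENDT),
    new_vars = exprs(TRTEDT = EXENDT), order = exprs(EXENDT, EXSEQ), mode = "last"
  ) %>%
  derive_var_trtdurd() %>%
  # End-of-study date and a simplified status; assumes one disposition event per subject
  derive_vars_merged(
    dataset_add = ds_ext, by_vars = exprs(STUDYID, USUBJID),
    filter_add = DSCAT == "DISPOSITION EVENT",
    new_vars = exprs(EOSDT = DSSTDT, EOSSTT = if_else(DSDECOD == "COMPLETED", "COMPLETED", "DISCONTINUED")),
    missing_values = exprs(EOSSTT = "ONGOING")
  )

adsl %>%
  xportr_label(adsl_spec, "ADSL") %>%
  xportr_write("ADSL.xpt")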
The admiral API is verbose but structured to mirror the CDISC ADaM implementation guide. Each derive_* function corresponds to a documented derivation step.
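As a rough illustration of what one such derivation step computes, the sketch below reproduces the treatment duration rule in base R. admiral documents derive_var_trtdurd() as the number of days from TRTSDT to TRTEDT plus one, so a subject dosed on a single day has a duration of 1. The data frame and subject identifiers are made up for illustration.

```r
# Made-up ADSL rows with treatment start and end dates
adsl_example <- data.frame(
  USUBJID = c("ABC-001-0001", "ABC-001-0002"),
  TRTSDT = as.Date(c("2025-01-10", "2025-02-01")),
  TRTEDT = as.Date(c("2025-01-10", "2025-03-02")),
  stringsAsFactors = FALSE
)

# Treatment duration in days: end date minus start date, plus one,
# so that the first day of treatment counts as day 1
adsl_example$TRTDURD <-
  as.integer(adsl_example$TRTEDT - adsl_example$TRTSDT) + 1L

print(adsl_example)
```

Writing the rule out by hand like this is also a cheap way to spot-check a derived dataset independently of the package that produced it.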
Validation without SAS: xportr and Pinnacle 21
The standard validation workflow for CDISC datasets is Pinnacle 21 Community (free) or Enterprise. It accepts XPT files regardless of how they were generated. The checks are against CDISC conformance rules, not against SAS.
The xportr package adds a pre-submission validation layer in R: it enforces variable labels, formats, and lengths before writing the XPT file, catching metadata errors that would otherwise surface in Pinnacle 21.
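A minimal sketch of that pre-submission layer is shown below, assuming xportr is installed and that the specification uses xportr's default column names (dataset, variable, label, type, length, format, order). The spec and data are made up for illustration. Setting verbose = "stop" asks xportr to raise an error rather than a message when the data and the specification disagree.

```r
library(xportr)

# Illustrative variable-level specification (hypothetical content)
var_spec <- data.frame(
  dataset  = "DM",
  variable = c("STUDYID", "USUBJID", "AGE"),
  label    = c("Study Identifier", "Unique Subject Identifier", "Age"),
  type     = c("character", "character", "numeric"),
  length   = c(12, 20, 8),
  format   = NA_character_,
  order    = 1:3,
  stringsAsFactors = FALSE
)

# Made-up DM records
dm_example <- data.frame(
  STUDYID = "ABC-001",
  USUBJID = c("ABC-001-0001", "ABC-001-0002"),
  AGE     = c(54, 61),
  stringsAsFactors = FALSE
)

# xportr expects a short dataset-style file name, so avoid tempfile() names
out_path <- file.path(tempdir(), "dm.xpt")

dm_example |>
  xportr_type(var_spec, domain = "DM", verbose = "stop") |>    # types match spec
  xportr_length(var_spec, domain = "DM", verbose = "stop") |>  # apply lengths
  xportr_label(var_spec, domain = "DM", verbose = "stop") |>   # apply labels
  xportr_write(out_path)                                       # write the XPT file
```

Running with verbose = "stop" in a scripted pipeline means a specification mismatch fails the build early, before the file ever reaches Pinnacle 21.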
What you still need a specialist for
R can do the heavy lifting, but some things still require domain expertise:
- Define.xml generation: the submission metadata file. Tools exist (metacore, defineR) but the output requires careful review against FDA requirements.
- SDTM/ADaM reviewer's guides (cSDRG and ADRG): these are narrative documents, typically drafted in Word, written by a human who understands both the trial design and CDISC. No tool generates them.
- Controlled terminology alignment: mapping your data collection terms to CDISC CT requires judgment, especially for adverse event coding (MedDRA) and lab parameters (LOINC/CDISC CT).
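The controlled terminology point can be made concrete with a small base R sketch. It maps collected values to submission values and flags anything that does not map, which is the list a specialist then has to review. The codelist here is an illustrative subset, not the full CDISC SEX codelist, and the function name map_ct is hypothetical.

```r
# Illustrative subset of a codelist: collected term -> submission value
sex_codelist <- c(Male = "M", Female = "F", Unknown = "U")

# Made-up collected values, including one that has no mapping
collected <- c("Male", "Female", "female", "Prefer not to say", NA)

map_ct <- function(collected, codelist) {
  # Case-insensitive lookup, ignoring leading and trailing whitespace
  key <- match(toupper(trimws(collected)), toupper(names(codelist)))
  data.frame(
    collected = collected,
    mapped = unname(codelist[key]),
    # Non-missing values with no match need a human decision
    needs_review = !is.na(collected) & is.na(key),
    stringsAsFactors = FALSE
  )
}

ct_result <- map_ct(collected, sex_codelist)
print(ct_result[ct_result$needs_review, ])
```

The code only surfaces the unmapped terms. Deciding what each one should map to, or whether the collection form needs to change, is the part that requires judgment.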
A practical roadmap
For a small biotech team preparing a first submission with no SAS infrastructure:
- Week 1-2: Set up the Pharmaverse environment (renv, admiral, xportr) and install Pinnacle 21 Community, which is a separate desktop application rather than an R package. Build ADSL for your study.
- Week 3-4: Build primary efficacy ADaM dataset (typically ADEFF or ADRS). Run through Pinnacle 21. Resolve structural errors.
- Week 5-6: Build ADAE and ADLB. Generate Define.xml. Review against the FDA Study Data Technical Conformance Guide.
- Ongoing: Document derivations in annotated SDTM and ADaM reviewer's guides.
Key takeaway
CDISC compliance without SAS is not only possible, it is now a well-trodden path for small biotech teams. The Pharmaverse ecosystem has reached production maturity. The remaining gap is not technical: it is the domain expertise to make correct mapping decisions, and documentation that satisfies FDA reviewers.