Why Excel NCA is a regulatory risk
Most pharmacokineticists learn NCA in Excel. It is fast to set up for a small study and produces readable output. The problem emerges at scale and under regulatory scrutiny.
An Excel-based NCA workflow has four structural deficiencies in a GxP context:
- No audit trail: cell edits are not logged unless you specifically enable change tracking, which most users do not.
- Formula fragility: a drag-fill error, a row insertion, or a filter left applied can silently produce wrong results.
- No reproducibility: running the same spreadsheet on a different machine, with a different Excel version, can produce different outputs due to floating-point handling.
- Manual CDISC mapping: translating Excel output to CDISC PC domain is a manual, error-prone step that typically happens at the last minute before submission.
The FDA has cited Excel-based PK analysis errors in Complete Response Letters. This is not a theoretical risk.
Setting up the PKNCA pipeline
PKNCA is the primary R package for NCA. It is actively maintained, peer-reviewed, and produces output that maps directly to CDISC terminology. Start with a clean project structure:
# Install and snapshot with renv
renv::init()
install.packages(c("PKNCA", "dplyr", "tidyr", "readr"))
renv::snapshot()
# Project structure
# pk_nca/
# data/raw/ # locked raw data, never modified
# data/processed/ # cleaned, CDISC-structured input
# R/01_prepare.R # data preparation
# R/02_nca.R # NCA execution
# R/03_output.R # CDISC output generation
# reports/nca_report.qmd
Data preparation: the CDISC PC structure
PKNCA expects concentration data in a specific structure. The good news is it maps almost directly to the CDISC PC domain. Key variables:
library(dplyr)
library(PKNCA)
# Required columns: USUBJID, PCTPTNUM (nominal time), PCSTRESN (concentration)
# PCSTAT handles BLQ values
pc_data <- raw_plasma %>%
transmute(
USUBJID = subject_id,
PCTPTNUM = nominal_time_h,
PCSTRESN = ifelse(concentration_ngml < lloq, lloq / 2, concentration_ngml),
PCSTAT = ifelse(concentration_ngml < lloq, "BLQ", ""),
PCLLOQ = lloq,
PCSTRESU = "ng/mL",
VISIT = visit_label,
PERIOD = period
)
# Dose data
ex_data <- raw_dosing %>%
transmute(
USUBJID = subject_id,
EXDOSE = dose_mg,
EXSTDTC = dose_datetime
)
Running the NCA and extracting parameters
Once the data is structured, the NCA itself is a few lines:
conc_obj <- PKNCAconc(pc_data,
PCSTRESN ~ PCTPTNUM | USUBJID / PERIOD,
time.nominal = "PCTPTNUM")
dose_obj <- PKNCAdose(ex_data,
EXDOSE ~ 0 | USUBJID / PERIOD)
data_obj <- PKNCAdata(conc_obj, dose_obj,
intervals = data.frame(
start = 0, end = Inf,
auclast = TRUE, cmax = TRUE, tmax = TRUE,
half.life = TRUE, aucinf.obs = TRUE
))
results <- pk.nca(data_obj)
nca_df <- as.data.frame(results)
Generating a CDISC-compliant output
PKNCA parameter names map directly to CDISC PP domain controlled terminology. The mapping is straightforward:
cdisc_map <- c(
"auclast" = "AUClast",
"cmax" = "Cmax",
"tmax" = "Tmax",
"half.life" = "t1/2",
"aucinf.obs" = "AUCinf"
)
pp_domain <- nca_df %>%
filter(PPTESTCD %in% names(cdisc_map)) %>%
transmute(
USUBJID = USUBJID,
PPTESTCD = cdisc_map[PPTESTCD],
PPSTRESC = as.character(round(PPORRES, 4)),
PPSTRESN = round(PPORRES, 4),
PPSTRESU = PPUNIT,
PPSPEC = "PLASMA"
)
readr::write_csv(pp_domain, "data/processed/pp_domain.csv")
Validation and QC checks
A validated NCA pipeline needs automated QC checks. The minimum set:
# Check 1: No subjects missing from output
stopifnot(all(unique(pc_data$USUBJID) %in% unique(pp_domain$USUBJID)))
# Check 2: AUCinf > AUClast for all subjects
auc_check <- pp_domain %>%
filter(PPTESTCD %in% c("AUClast", "AUCinf")) %>%
tidyr::pivot_wider(names_from = PPTESTCD, values_from = PPSTRESN) %>%
mutate(flag = AUCinf < AUClast)
if (any(auc_check$flag, na.rm = TRUE)) warning("AUCinf < AUClast for some subjects")
# Check 3: Tmax within observation window
stopifnot(all(pp_domain$PPSTRESN[pp_domain$PPTESTCD == "Tmax"] <= max(pc_data$PCTPTNUM)))
Integrating into a reproducible report
The final step is wrapping the pipeline in a Quarto document that executes every step from raw data to formatted tables. When the locked data changes (protocol amendment, late samples), re-rendering the document regenerates all outputs in seconds with zero manual intervention.
---
title: "NCA Report - Study STUDY-001"
params:
study_id: "STUDY-001"
data_version: "v1.0_locked"
execute:
echo: false
cache: false
---
```{r}
source("R/01_prepare.R")
source("R/02_nca.R")
source("R/03_output.R")
knitr::kable(nca_summary_table, caption = "NCA Parameter Summary")
```
Key takeaway
An R-based NCA pipeline using PKNCA takes a few days to set up correctly and pays back that investment on the first study it runs. The CDISC mapping is nearly automatic, the QC checks catch errors that Excel would silently propagate, and the Quarto report means you never manually update a table again.