What AlphaFold actually proved
AlphaFold 2 is a genuine scientific achievement. It solved the protein structure prediction problem (given a sequence of amino acids, predict the three-dimensional folded structure) at near-experimental accuracy for most proteins. The problem had resisted decades of effort by thousands of researchers, and a deep learning system solved it.
What AlphaFold proved is that some well-defined biological prediction problems can be solved at scale by deep learning. Protein structure prediction is such a problem because it has a clear input (sequence), a clear output (3D coordinates), and decades of experimental structures to train on. The loss function is unambiguous. The benchmark is unambiguous.
The well-defined problem problem
Most of drug discovery does not look like this. Consider target identification: given a disease phenotype, which proteins should you try to modulate? This is not a prediction problem with a clear output. It is a causal inference problem with incomplete mechanistic knowledge, confounded observational data, and no ground truth: you only discover whether you were right years later, in humans.
Or consider ADMET prediction (absorption, distribution, metabolism, excretion, toxicity), for example predicting whether a molecule will be toxic in humans from its structure alone. Models can be trained on in vitro data. But in vitro toxicity does not reliably predict in vivo toxicity, which does not reliably predict human toxicity. The prediction problem is technically tractable. The scientific problem is not.
Where AI genuinely adds value in drug discovery
There are domains where AI applications have delivered real, reproducible value:
- Molecular property prediction: predicting physicochemical properties (solubility, logP, TPSA) from molecular structure. Well-defined inputs, measurable outputs, large training datasets. GNNs and transformer-based molecular models work well here.
- Hit identification and virtual screening: ranking large compound libraries against a known target structure. Faster and cheaper than wet-lab screening for known target classes.
- Synthesis planning: predicting feasible synthetic routes for target molecules. Commercially deployed in tools such as Synthia (formerly Chematica, now owned by Merck KGaA) and others.
- Image-based phenotypic screening: analyzing high-content microscopy data to identify compounds that produce desired cell morphology changes. Computer vision on well-controlled experimental data.
- Biomarker discovery: identifying molecular signatures associated with disease subtypes or treatment response in omics data. Primarily a pattern recognition task.
Where AI is being oversold
These claims are being stretched beyond what the evidence supports:
- De novo drug design: generative models that design novel drug candidates from scratch have produced many molecules that look good on paper. Experimental hit rates when synthesized and tested are not obviously better than traditional methods. The models are better at generating drug-like molecules than at generating drugs.
- End-to-end pipeline automation: the idea that AI can take you from target to clinical candidate with minimal human intervention. The bottleneck has never been molecule generation: it has been mechanistic understanding, experimental validation, and ADMET. AI has not solved these.
- Reducing clinical failure rates: roughly 90% of drug candidates that enter clinical trials fail. AI-designed candidates are now entering trials. We will know in 5-10 years whether failure rates have changed. The current confidence expressed in press releases is not supported by clinical outcomes data.
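To see why a handful of early trial results cannot settle the failure-rate question, here is a minimal sketch of a Wilson score interval for a success rate. The counts (4 successes out of 20 candidates) are hypothetical numbers chosen only to illustrate interval width, not real trial data.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical counts: 4 approvals out of 20 candidates (20% observed success).
low, high = wilson_interval(successes=4, n=20)
print(f"95% interval for success rate: {low:.3f} to {high:.3f}")
```

Under these assumed counts the interval runs from roughly 8% to roughly 42%, so an observed 20% success rate would still be statistically compatible with the historical figure of about 10%. Many more completed trials are needed before a change in failure rates can be distinguished from noise.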
The data quality problem nobody talks about
Every ML application in drug discovery is constrained by the quality of its training data. Bioactivity data in public databases (ChEMBL, BindingDB) contains systematic errors: assay variability, inconsistent units, duplicate measurements with contradictory values, reporting bias (positive results are overrepresented).
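Two of these error types, inconsistent units and contradictory duplicates, can be screened for mechanically. The following is a minimal sketch in plain Python, not a description of how ChEMBL or BindingDB curate their data. The record layout, the identifiers, the unit table, and the one-log-unit threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from math import log10

# Conversion factors to molar; an assumed subset of units for illustration.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_pic50(value, unit):
    """Convert an IC50 in the given unit to pIC50 = -log10(IC50 in molar)."""
    return -log10(value * UNIT_TO_MOLAR[unit])


def flag_conflicts(records, max_spread=1.0):
    """Group records by (compound, target) and return the groups whose
    pIC50 values differ by more than max_spread log units.

    records: iterable of (compound_id, target_id, value, unit) tuples.
    Returns a dict mapping (compound_id, target_id) to a sorted pIC50 list.
    """
    groups = defaultdict(list)
    for compound_id, target_id, value, unit in records:
        groups[(compound_id, target_id)].append(to_pic50(value, unit))
    return {
        key: sorted(vals)
        for key, vals in groups.items()
        if max(vals) - min(vals) > max_spread
    }


# Hypothetical records; identifiers and values are made up.
records = [
    ("CPD-1", "TGT-A", 10.0, "nM"),  # consistent pair: same value,
    ("CPD-1", "TGT-A", 0.01, "uM"),  # reported in different units
    ("CPD-2", "TGT-A", 5.0, "nM"),   # contradictory pair: three log units
    ("CPD-2", "TGT-A", 5.0, "uM"),   # apart, possibly a unit error
]
conflicts = flag_conflicts(records)
print(conflicts)
```

A check like this only surfaces suspicious entries. Deciding whether a conflict reflects a transcription error, a different assay protocol, or genuine variability still requires someone to read the source.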
Models trained on this data learn patterns that partially reflect biology and partially reflect assay artifacts. The validation benchmarks used in academic papers are often drawn from the same contaminated data as the training set. A model that achieves 90% accuracy on such a benchmark is not necessarily 90% accurate on prospective predictions.
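One way to make a retrospective benchmark behave more like a prospective one is to split by time and check for overlap between training and test compounds. The sketch below assumes records are simple (compound_id, year) pairs; the function names and the data are hypothetical. Exact-ID overlap is also the weakest form of leakage check, since close structural analogs can leak information without sharing an identifier.

```python
def overlap_fraction(train_ids, test_ids):
    """Fraction of distinct test compounds that also appear in training."""
    test = set(test_ids)
    if not test:
        return 0.0
    return len(test & set(train_ids)) / len(test)


def time_split(records, cutoff_year):
    """Split (compound_id, year) records so that the test set contains only
    compounds first seen after the cutoff, mimicking prospective use."""
    train = [cid for cid, year in records if year <= cutoff_year]
    seen = set(train)
    # Drop later measurements of compounds already present in training.
    test = [cid for cid, year in records if year > cutoff_year and cid not in seen]
    return train, test


# Hypothetical records; compound "A" is measured both before and after the cutoff.
records = [("A", 2015), ("B", 2016), ("C", 2019), ("A", 2020), ("D", 2021)]
train, test = time_split(records, cutoff_year=2017)

print(train, test)
print(overlap_fraction(train, test))
print(overlap_fraction(["A", "B"], ["A", "C"]))  # a leaky random split
```

A model's score on the time-split test set is usually a more sober estimate than its score on a random split of the same data, though neither substitutes for genuinely prospective testing.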
What this means for AI drug discovery teams
The useful framing is not "can AI discover drugs?" but "which specific tasks in drug discovery are well-defined prediction problems with high-quality training data and unambiguous benchmarks?" Those tasks, and there are genuine ones, are where AI tools should be applied.
For everything else, AI provides useful signals that need to be interpreted by scientists with mechanistic domain knowledge. The value is in augmentation, not replacement. The scientist's job does not disappear. It shifts toward better experimental design, better data curation, and better interpretation of model outputs.
A more honest framing
AI in drug discovery is a set of powerful pattern-recognition tools being applied to a domain where most of the hard problems are not pattern-recognition problems. The tools are genuinely useful for the subset of problems they fit. The error is claiming they fit more of the problem than they do.
The honest version of the AI drug discovery story is this: we can now do certain computational tasks faster and more cheaply than before. That reduces some bottlenecks. It does not change the fundamental difficulty of predicting clinical efficacy and safety in humans.
Key takeaway
AI adds real value in drug discovery for well-defined prediction tasks with good training data. It does not change the fundamental challenge: predicting which molecules will be safe and effective in humans is a scientific problem, not a data problem. Until we have better mechanistic models, better data, and outcomes from AI-designed drugs in clinical trials, the hype is running ahead of the evidence.