AI for Drug Discovery: How AlphaFold Reinvented Biology
Introduction
Drug development is one of humanity's most expensive and failure-prone endeavours. A new medicine typically takes 10 to 15 years (commonly cited as approximately 12 years on average) to reach patients, costs over one billion dollars in research and development, and has a greater than 90 percent chance of failing in clinical trials despite looking promising in the laboratory.
That bottleneck has a root cause: biology is extraordinarily complex, and until recently our ability to model it computationally was primitive. Researchers would spend years just figuring out what a single protein looked like in three dimensions before they could even begin designing a drug to interact with it.
Then, in 2020, DeepMind's AlphaFold 2 largely solved the structure-prediction side of what biologists call the protein folding problem: predicting a protein's three-dimensional shape from its amino acid sequence, a challenge that had resisted 50 years of scientific effort. The solution was a deep learning model. In 2024, Demis Hassabis and John Jumper shared the Nobel Prize in Chemistry for this work, alongside David Baker of the University of Washington for his independent contributions to computational protein design.
This article explains why drug discovery is fundamentally a machine learning problem, how AlphaFold works, what generative chemistry is doing to design molecules from scratch, and which AI-discovered drugs are already in human clinical trials.
Problem Statement
Drug discovery is a search problem, and the search space is incomprehensibly large. The set of all possible drug-like molecules, called chemical space, contains an estimated 10 to the power of 60 candidates. The entire history of pharmaceutical research has sampled only a tiny fraction of this space through physical experimentation. No brute-force physical search is possible.
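To make the scale concrete, here is a back-of-the-envelope calculation. The testing throughput is a deliberately optimistic, hypothetical figure chosen for illustration, not a number from any real screening facility.

```python
# Rough estimate of how long an exhaustive physical search would take.
CHEMICAL_SPACE_SIZE = 10**60        # estimate quoted in the text
ASSUMED_TESTS_PER_SECOND = 10**6    # hypothetical, far beyond any real laboratory
SECONDS_PER_YEAR = 365 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10     # approximate

years_needed = CHEMICAL_SPACE_SIZE / (ASSUMED_TESTS_PER_SECOND * SECONDS_PER_YEAR)
universe_ages = years_needed / AGE_OF_UNIVERSE_YEARS

print(f"Years needed: {years_needed:.2e}")
print(f"Multiples of the age of the universe: {universe_ages:.2e}")
```

Even under this generous assumption, the search would outlast the universe many times over, which is why the problem has to be approached by prediction rather than enumeration.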
A drug works by binding to a target protein and altering its activity. To design a molecule that fits a protein precisely, like a key fitting a lock, you must know the protein's three-dimensional shape. That shape is determined by the protein's one-dimensional amino acid sequence, but predicting three-dimensional structure from a one-dimensional sequence involves exploring a conformational space so vast it makes chess look simple. As of 2020, only about 170,000 protein structures had ever been experimentally solved, out of hundreds of millions of known protein sequences. The bottleneck was not lack of desire but lack of a method fast enough to close the gap.
Even when a promising molecule is found, optimising it requires satisfying conflicting constraints simultaneously: high binding affinity to the target, selectivity against off-target proteins, good absorption and stability in the body, low toxicity, and practical synthesisability in a laboratory. Improving one property often degrades another. Finding candidates on the Pareto frontier across all dimensions by physical experimentation alone takes years.
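The idea of a Pareto frontier can be sketched in a few lines. In this minimal example, the candidate names and scores are made up, every objective is scaled so that higher is better, and a candidate is kept only if no other candidate is at least as good on every objective and strictly better on at least one.

```python
from typing import Any, Dict, List, Sequence

Candidate = Dict[str, Any]

def dominates(a: Candidate, b: Candidate, objectives: Sequence[str]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(a[k] >= b[k] for k in objectives)
    strictly_better = any(a[k] > b[k] for k in objectives)
    return no_worse and strictly_better

def pareto_front(candidates: List[Candidate], objectives: Sequence[str]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c, objectives) for other in candidates)
    ]

# Hypothetical scores in [0, 1]; higher is better for every objective.
objectives = ("affinity", "safety", "synthesisability")
candidates = [
    {"name": "mol_A", "affinity": 0.9, "safety": 0.3, "synthesisability": 0.6},
    {"name": "mol_B", "affinity": 0.7, "safety": 0.8, "synthesisability": 0.7},
    {"name": "mol_C", "affinity": 0.6, "safety": 0.7, "synthesisability": 0.6},
    {"name": "mol_D", "affinity": 0.4, "safety": 0.9, "synthesisability": 0.9},
]

front_names = [c["name"] for c in pareto_front(candidates, objectives)]
print(front_names)
```

Here mol_C drops out because mol_B beats it on every objective, while the other three each represent a different trade-off that a project team would still have to choose between.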
Core Concepts and Terminology
| Term | Definition |
|---|---|
| Protein folding | The process by which a protein's linear amino acid sequence spontaneously adopts its functional three-dimensional shape. |
| ADMET | Absorption, Distribution, Metabolism, Excretion, and Toxicity: the four pharmacokinetic properties plus the safety profile that a drug candidate must satisfy to be viable in humans. |
| Chemical space | The theoretical set of all possible drug-like molecules, estimated at around 10 to the power of 60 candidates. |
| Virtual screening | Using computational models to evaluate millions of candidate molecules in silico and identify the most promising ones for laboratory testing. |
| SMILES | Simplified Molecular Input Line Entry System, a text-based notation for representing molecular structure, widely used in cheminformatics and as input/output format for generative chemistry models. |
| Graph Neural Network (GNN) | A neural network architecture designed to process graph-structured data. Molecules are natural graphs where atoms are nodes and chemical bonds are edges. |
| Message passing | The mechanism by which GNNs update atom representations: at each layer, every atom aggregates information from its bonded neighbours, so information spreads further across the molecule with each additional layer. |
| Binding affinity | A measure of how strongly a drug molecule binds to its target protein. Higher affinity generally correlates with lower required dose. |
| Generative chemistry | The use of generative AI models, including reinforcement learning, diffusion models, and variational autoencoders, to design novel molecular structures rather than simply screening existing ones. |
| Target identification | The process of identifying which protein or biological pathway is responsible for a disease and could be modified by a drug to produce a therapeutic effect. |
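The SMILES, graph, and message-passing entries above fit together in a small example. This is a minimal sketch: the graph for ethanol (SMILES `CCO`) is written out by hand rather than parsed, the atom features are a simple one-hot encoding, and the update is a plain sum with no learned weights, whereas a real GNN would apply trained transformations at every layer.

```python
from typing import List, Tuple

# Ethanol, SMILES "CCO": heavy atoms only, hand-built graph.
atoms = ["C", "C", "O"]
bonds: List[Tuple[int, int]] = [(0, 1), (1, 2)]

# One-hot features per atom: [is_carbon, is_oxygen].
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

def message_passing_round(
    features: List[List[float]], bonds: List[Tuple[int, int]]
) -> List[List[float]]:
    """Each atom adds the feature vectors of its bonded neighbours to its own."""
    neighbours: List[List[int]] = [[] for _ in features]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    updated = []
    for v, own in enumerate(features):
        new_vec = list(own)
        for u in neighbours[v]:
            new_vec = [a + b for a, b in zip(new_vec, features[u])]
        updated.append(new_vec)
    return updated

one_round = message_passing_round(features, bonds)
two_rounds = message_passing_round(one_round, bonds)
print(one_round)
print(two_rounds)
```

After one round the two carbons are no longer identical, because only the middle carbon has an oxygen neighbour. After two rounds the terminal carbon also carries a trace of the oxygen, which illustrates how stacking layers widens each atom's view of the molecule.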
How It Works
Step 1, Target Identification: Finding the Right Lock
Before designing a drug, researchers must identify the protein whose activity, when altered, will produce a therapeutic effect. AI accelerates this step through genomics analysis that identifies which genes are differentially expressed in diseased versus healthy tissue, knowledge graph mining that links known disease pathways to unexplored proteins, and literature mining that surfaces connections across decades of published research that no human team could read in full. Identifying the right target can now take months instead of years for diseases with established genomic data.
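The differential-expression idea can be illustrated with a toy ranking by log2 fold change. The gene names and expression values below are invented, and the pseudocount is an assumption to avoid division by zero. Real pipelines would add statistical testing and multiple-testing correction before trusting any ranking.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical expression levels (arbitrary units): gene -> (diseased, healthy).
expression: Dict[str, Tuple[List[float], List[float]]] = {
    "GENE_A": ([80.0, 95.0, 90.0], [10.0, 12.0, 11.0]),
    "GENE_B": ([20.0, 22.0, 19.0], [21.0, 20.0, 20.0]),
    "GENE_C": ([5.0, 6.0, 4.0], [40.0, 38.0, 42.0]),
}

def log2_fold_change(
    diseased: List[float], healthy: List[float], pseudocount: float = 1.0
) -> float:
    """Positive means higher in diseased tissue, negative means lower."""
    return math.log2((mean(diseased) + pseudocount) / (mean(healthy) + pseudocount))

fold_changes = {gene: log2_fold_change(d, h) for gene, (d, h) in expression.items()}

# Rank genes by the size of the change, regardless of direction.
ranked = sorted(fold_changes, key=lambda g: abs(fold_changes[g]), reverse=True)

for gene in ranked:
    print(f"{gene}: log2FC = {fold_changes[gene]:+.2f}")
```

Genes at the top of such a list are only starting points. The knowledge graph and literature mining steps described above are what connect an expression change to a plausible, druggable mechanism.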
Step 2, Structure Prediction: AlphaFold Solves the Lock's Shape
Once a target protein is identified, drug designers need to know its three-dimensional shape to design something that will bind to it. AlphaFold 2 cracked this problem using two complementary ideas. The first is multiple sequence alignment: when a protein's sequence has evolved across hundreds of species, amino acid positions that co-evolve tend to be physically close to each other in 3D space because they interact. This evolutionary signal encodes geometric constraints without any physics simulation. The second is geometry-aware attention: AlphaFold 2's structure module uses invariant point attention, a specialised form of the transformer attention mechanism whose outputs do not change when the whole protein is rotated or translated, allowing the model to predict atomic coordinates directly.
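The co-evolution signal can be illustrated with mutual information between alignment columns, a classical statistic for this purpose. AlphaFold 2 does not compute this quantity explicitly; it learns from the alignment with a neural network, so the sketch below only shows why an alignment carries pairwise information at all. The toy alignment is invented.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence, Tuple

# Invented alignment: columns 1 and 3 always change together (K with E, E with K).
msa = ["AKLE", "AKLE", "AELK", "AELK", "GKVE", "GELK"]

def mutual_information(col_a: Sequence[str], col_b: Sequence[str]) -> float:
    """Mutual information in bits between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi

columns = list(zip(*msa))  # transpose: one tuple of residues per position
scores: Dict[Tuple[int, int], float] = {
    (i, j): mutual_information(columns[i], columns[j])
    for i, j in combinations(range(len(columns)), 2)
}

best_pair = max(scores, key=scores.get)
print(best_pair, round(scores[best_pair], 3))
```

The pair with the highest score is the one whose residues always swap together, which is the kind of pattern that hints at two positions being in contact in the folded protein.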
At the CASP14 competition in 2020, AlphaFold 2 achieved a median GDT_TS score of 92.4 across all targets and a median backbone accuracy of 0.96 angstroms RMSD (measured on Cα atoms at 95 percent residue coverage), making many of its predictions competitive with expensive laboratory methods. In 2022, DeepMind and EMBL-EBI expanded the freely accessible AlphaFold Protein Structure Database to over 200 million predicted structures, covering nearly every protein sequence catalogued in UniProt. Compared with the roughly 170,000 experimentally solved structures available in 2020, this expanded structural coverage by more than a thousandfold.
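For readers unfamiliar with these metrics, the sketch below computes simplified versions of both from per-residue Cα deviations between a predicted and an experimental structure. It assumes the two structures have already been superimposed once. The official GDT_TS procedure searches over many superpositions to maximise the count at each cutoff, so this version is an approximation, and the distances are invented.

```python
import math
from typing import Sequence

def rmsd(distances: Sequence[float]) -> float:
    """Root-mean-square deviation over per-residue distances (angstroms)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))

def gdt_ts(
    distances: Sequence[float],
    cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0),
) -> float:
    """Average percentage of residues within 1, 2, 4 and 8 angstroms."""
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical per-residue deviations for a five-residue fragment.
deviations = [0.5, 0.8, 1.5, 3.0, 9.0]

print(f"RMSD   = {rmsd(deviations):.2f} angstroms")
print(f"GDT_TS = {gdt_ts(deviations):.1f}")
```

The example also shows why the two metrics are reported together: a single badly placed residue inflates RMSD sharply, while GDT_TS only counts it as one miss at each cutoff.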
Step 3, Hit Finding and Lead Optimisation: Designing the Right Key
With the target structure known, the next challenge is finding or designing a molecule that binds to it effectively. Three AI approaches are currently most active. Reinforcement learning treats molecule construction as a sequential decision problem: an agent starts with a partial structure and at each step chooses which atom or functional group to add, guided by reward signals that score binding affinity, drug-likeness, and synthesisability. Diffusion models apply the same framework that generates photorealistic images to molecular design, learning to denoise random starting structures into realistic candidates with desired properties. Graph neural networks serve as fast surrogate models: trained on experimental databases, they predict binding affinity, solubility, metabolic stability, and toxicity risk for millions of candidate structures in seconds rather than the days laboratory measurement requires.
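The surrogate-model screening pattern can be sketched as a score-and-keep-the-best filter. The scoring function here is a placeholder with no chemical meaning; in practice it would be a trained property predictor. The SMILES strings are a tiny stand-in for a library of millions.

```python
import heapq
from typing import List

def toy_surrogate_score(smiles: str) -> float:
    """Placeholder for a trained model. The formula is arbitrary, not chemistry."""
    return smiles.count("N") * 1.0 + smiles.count("O") * 0.5 - 0.1 * len(smiles)

def screen(library: List[str], top_k: int) -> List[str]:
    """Score every candidate and keep only the top_k for laboratory follow-up."""
    return heapq.nlargest(top_k, library, key=toy_surrogate_score)

# Tiny illustrative library of SMILES strings.
library = ["CCO", "CCN", "c1ccccc1", "CC(=O)O", "NCCO", "CC(=O)Nc1ccc(O)cc1"]

top_hits = screen(library, top_k=2)
print(top_hits)
```

Using `heapq.nlargest` keeps memory proportional to the number of hits rather than the size of the library, which matters once the library no longer fits in a single sorted list.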
Step 4, Clinical Validation: The Step AI Cannot Yet Compress
Even with a promising AI-designed candidate, clinical validation in humans must follow the same path it always has. Phase 1 tests safety and dosing, usually in healthy volunteers. Phase 2 tests efficacy in a small patient cohort. Phase 3 tests in large, diverse populations with long follow-up periods. This process takes 6 to 10 years and cannot be significantly accelerated by AI. The overall clinical failure rate of around 90 percent is driven mostly by Phase 2 and 3, where drugs that worked in cells and animals fail in humans, and this remains the domain where AI has contributed least so far.
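A short calculation shows how attrition at each stage compounds into an overall failure rate of this size. The per-stage success probabilities below are illustrative round numbers assumed for the example, not figures from this article or from any specific study.

```python
import math

# Illustrative, assumed probabilities of passing each stage.
assumed_success = {
    "Phase 1": 0.60,
    "Phase 2": 0.30,
    "Phase 3": 0.60,
    "Regulatory review": 0.85,
}

overall_success = math.prod(assumed_success.values())
bottleneck = min(assumed_success, key=assumed_success.get)

print(f"Overall success: {overall_success:.1%}")
print(f"Overall failure: {1 - overall_success:.1%}")
print(f"Weakest stage:   {bottleneck}")
```

Because the probabilities multiply, improving the weakest stage has the largest effect on the overall rate, which is why predicting efficacy in humans is the prize that pre-clinical AI has not yet reached.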
Practical Example
Insilico Medicine's ISM001-055, described by Insilico Medicine as the first fully AI-designed drug to enter human clinical trials (though the definition of "fully AI-designed" is debated across the industry), illustrates the end-to-end AI drug discovery pipeline. The target for idiopathic pulmonary fibrosis, a progressive lung disease with no cure, was identified by an AI system that analysed transcriptomic data from diseased lung tissue and surfaced a previously unexplored pathway. A generative chemistry model then designed candidate molecules optimised simultaneously for binding affinity to that target, predicted ADMET properties, and synthesisability. The team synthesised and tested only the top candidates experimentally, compressing the hit-to-candidate timeline from years to months. ISM001-055 reached Phase 2 clinical trials, where it is being tested in human patients, a milestone Insilico presents as the first time an end-to-end AI-driven discovery process produced a molecule that entered human testing.
Advantages
- AlphaFold democratised structural biology: Before 2020, determining a protein structure required months of laboratory work. Today, any researcher can look up AlphaFold predictions for 200 million proteins for free in seconds, enabling target identification and virtual screening that was previously impossible without specialist crystallography expertise.
- Generative chemistry explores regions of chemical space that humans would never visit: AI models are not constrained by human chemical intuition. They routinely propose molecules with unusual structural features that turn out to have superior properties, features that a human medicinal chemist might dismiss based on experience that does not generalise to the new target.
- GNN surrogate models compress years of experimental screening into hours: A GNN trained on millions of bioactivity records can score very large virtual libraries for predicted properties in GPU-hours, work that would take years of physical high-throughput screening. It acts as a computational filter that directs experimental resources to the most promising candidates.
- Multi-objective optimisation produces better drug candidates than sequential optimisation: AI approaches that simultaneously optimise binding affinity, ADMET properties, and synthesisability find candidates that human medicinal chemists, who typically optimise one property at a time, would not converge to through iterative refinement.
Limitations and Trade-offs
- In-silico to in-vivo translation remains the central unsolved problem: A molecule that looks perfect in a computational model can fail completely in a living system. Biology is messy in ways that models do not capture: selective membranes, metabolic enzymes, compensatory cellular pathways, and individual genetic variation. No computational model fully represents this complexity.
- AI cannot predict clinical success: The 90 percent failure rate in clinical trials is dominated by failures in Phase 2 and 3. AI has significantly accelerated pre-clinical discovery but cannot yet predict which candidates will pass human trials. The variables that matter most, such as patient heterogeneity, off-target effects at therapeutic doses, and long-term safety, are not well represented in any current training dataset.
- Data quality and IP barriers constrain learning: The highest-value bioactivity data from pharmaceutical companies' internal screening libraries is proprietary and not shared. Models trained on public databases like ChEMBL are therefore trained on a biased subset of known bioactivity space and may not generalise well to the novel chemical territories that generative models are trying to explore.
- Synthesisability predictions are imperfect: A molecule that a generative model proposes as synthesisable may in practice require reaction steps that are difficult, expensive, or not yet established in the literature. Experimental chemists still need to validate synthetic routes, and some AI-proposed candidates turn out to be practically impossible to make.
Common Mistakes
- Treating binding affinity as the only objective: A molecule with excellent predicted binding affinity that also inhibits the hERG cardiac ion channel will fail clinical trials due to cardiac toxicity. ADMET properties must be optimised alongside binding affinity from the first round of generation, not retrofitted at the end.
- Using Morgan fingerprints for all molecular property prediction: Fixed fingerprints treat all structural patterns equally regardless of the specific target. Given enough labelled data, task-specific GNNs that learn which features matter for a given property often outperform fingerprint-based models, especially for novel chemical series. Fingerprints remain a sensible baseline on small datasets, so compare both rather than defaulting to either.
- Overinterpreting low-confidence AlphaFold predictions: AlphaFold predictions come with per-residue confidence scores (pLDDT, on a scale of 0 to 100). Regions with low scores, particularly intrinsically disordered regions, should not be treated as reliable structural predictions. Designing drugs against low-confidence binding pockets risks targeting a structure that does not exist in the physiological form of the protein.
- Skipping wet-lab validation of computational predictions too long: AI models are most valuable when they are rapidly validated by experiment, not when they are allowed to generate candidates for months before any laboratory feedback enters the loop. Short, tight feedback cycles between computational prediction and experimental validation produce better outcomes than long purely computational campaigns.
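Checking AlphaFold confidence before trusting a pocket is easy to automate. AlphaFold's PDB-format files store the per-residue confidence score (pLDDT, 0 to 100) in the B-factor column, so a short parser can flag weak regions. The sketch below builds synthetic records rather than reading a real file, and the cutoff of 70 is an assumed threshold that a project may want to set differently.

```python
from typing import List

def make_ca_line(res_seq: int, plddt: float) -> str:
    """Build a fixed-width PDB ATOM record for a CA atom (synthetic data)."""
    return (
        "ATOM  {serial:>5} {name:<4}{alt}{res:>3} {chain}{seq:>4}{icode}   "
        "{x:>8.3f}{y:>8.3f}{z:>8.3f}{occ:>6.2f}{b:>6.2f}"
    ).format(
        serial=res_seq, name=" CA", alt=" ", res="ALA", chain="A",
        seq=res_seq, icode=" ", x=0.0, y=0.0, z=0.0, occ=1.0, b=plddt,
    )

def low_confidence_residues(pdb_text: str, threshold: float = 70.0) -> List[int]:
    """Return residue numbers whose CA atom has pLDDT below the threshold."""
    flagged = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, 23-26 the residue number,
        # and 61-66 the B-factor field where pLDDT is stored.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            if float(line[60:66]) < threshold:
                flagged.append(int(line[22:26]))
    return flagged

pdb_text = "\n".join(
    make_ca_line(seq, score)
    for seq, score in [(1, 95.2), (2, 88.0), (3, 64.5), (4, 41.3)]
)

flagged = low_confidence_residues(pdb_text)
print(flagged)
```

If the residues lining a proposed binding pocket show up in the flagged list, that is a signal to seek experimental structural data before committing design effort to that site.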
Best Practices
- Use AlphaFold 3 for protein-ligand docking predictions rather than relying solely on classic docking software like AutoDock. AlphaFold 3's co-folding predictions incorporate binding-induced conformational changes that classical rigid-receptor docking misses.
- Train property prediction models on task-specific data whenever possible. A GNN trained on binding affinity data for your specific target class will outperform a general-purpose model trained on the full ChEMBL database.
- Include synthesisability scoring as a hard constraint in generative chemistry workflows, not as a soft penalty. Molecules that cannot be synthesised consume experimental resources and delay timelines.
- Design multi-objective reward functions for reinforcement learning generators carefully. Weights that over-penalise toxicity risk at the expense of binding affinity will generate safe molecules that are therapeutically useless. Calibrate weights against a known benchmark set of approved drugs first.
- Establish a rapid wet-lab validation loop with short cycles between computational generation and experimental testing. The fastest teams generate a small set of high-confidence candidates, synthesise and test them within weeks, use the results to update the model, and repeat.
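Two of the practices above, treating synthesisability as a hard constraint and calibrating reward weights, can be shown in one small sketch. All property names, scores, weights, and the 0.5 cutoff are hypothetical, with each property scaled so that 1.0 is best.

```python
from typing import Dict

def reward(
    props: Dict[str, float],
    weights: Dict[str, float],
    min_synthesisability: float = 0.5,
) -> float:
    """Weighted sum of objectives, gated by a hard synthesisability cutoff."""
    if props["synthesisability"] < min_synthesisability:
        return 0.0  # hard constraint: unmakeable molecules earn nothing
    return sum(weight * props[name] for name, weight in weights.items())

# Hypothetical candidates; "safety" stands for 1 minus predicted toxicity risk.
molecules = {
    "potent_but_risky": {"affinity": 0.9, "safety": 0.5, "synthesisability": 0.8},
    "inert_but_safe": {"affinity": 0.1, "safety": 1.0, "synthesisability": 0.9},
    "unmakeable": {"affinity": 1.0, "safety": 1.0, "synthesisability": 0.2},
}

balanced = {"affinity": 0.5, "safety": 0.5}
safety_heavy = {"affinity": 0.1, "safety": 0.9}

def best(weights: Dict[str, float]) -> str:
    """Name of the molecule with the highest reward under the given weights."""
    return max(molecules, key=lambda name: reward(molecules[name], weights))

print("balanced weights ->", best(balanced))
print("safety-heavy weights ->", best(safety_heavy))
```

With the safety-heavy weights the generator would be rewarded for a molecule that barely binds its target, which is the failure mode the calibration step is meant to catch. Scoring a benchmark set of approved drugs with the same function is one way to check that the weights rank known good molecules highly.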
Comparison: Key AI Techniques in Drug Discovery
| Technique | Primary Use | Strengths | Limitations |
|---|---|---|---|
| AlphaFold 2 and 3 | Protein structure prediction and protein-ligand docking | Atomic-resolution accuracy, free database of 200 million proteins | Low-confidence predictions for disordered regions; limited for membrane protein classes |
| Graph Neural Networks | Molecular property prediction (binding affinity, ADMET) | Learn task-specific representations; outperform fingerprints on novel series | Require large labelled datasets; may not generalise outside training distribution |
| Reinforcement Learning | Molecule generation and lead optimisation | Direct multi-objective optimisation; explores novel chemical space | Reward function design is critical and error-prone; may exploit model artefacts |
| Diffusion models | Protein backbone design, 3D molecular generation | High-quality 3D structures; can incorporate geometric constraints | Computationally expensive; limited training data for 3D molecular structures |
| Protein language models (ESM-2) | Protein function prediction, sequence design | No multiple sequence alignment required; scales to billions of parameters | Structure prediction less accurate than AlphaFold; best used as complement |
FAQ
Has any AI-designed drug been approved for patients?
As of 2026, no fully AI-designed drug has completed Phase 3 trials and received regulatory approval. Insilico Medicine's ISM001-055 is the furthest along, currently in Phase 2 for idiopathic pulmonary fibrosis. Several AI-assisted candidates are in Phase 1 and Phase 2 trials. Regulatory approval, if successful, is likely 3 to 5 years away for the current leading candidates given trial timelines.
Does AlphaFold make drug discovery easy?
AlphaFold dramatically accelerated target characterisation and structure-based drug design, but drug discovery remains hard. Knowing the structure of a target protein is necessary but not sufficient: you still need to design a molecule with the right binding specificity, optimised ADMET properties, and a practical synthetic route. The 90 percent clinical failure rate is not primarily due to lack of structural information, and AlphaFold does not address the clinical translation problem.
What is AlphaFold 3 and how does it differ from AlphaFold 2?
AlphaFold 2 predicts the structure of a single protein from its amino acid sequence. AlphaFold 3, released in 2024, extends this to predict the structure of complexes involving proteins, DNA, RNA, and small molecules together. This directly supports drug design by predicting how a candidate molecule docks into a protein binding pocket, rather than requiring a separate docking simulation. It also incorporates post-translational modifications and ligand conformations, which AlphaFold 2 could not model.
Why do AI-designed drugs still fail in clinical trials?
AI models predict properties based on patterns in training data. The training data comes from cell cultures and animal models, which are imperfect proxies for the human body. Phase 2 and 3 failures are typically due to lack of efficacy in humans, unexpected off-target effects at therapeutic doses, or long-term safety signals that take years to manifest. These failure modes are not well-represented in current training datasets, so AI models trained on pre-clinical data cannot yet predict clinical outcomes reliably.
What is the role of graph neural networks in drug discovery?
Molecules are naturally represented as graphs, with atoms as nodes and bonds as edges. GNNs are architectures specifically designed to learn from graph-structured data. In drug discovery, GNNs are trained to predict molecular properties from structure, serving as fast surrogate models that can screen millions of candidates in seconds. Given enough training data, they often outperform traditional fingerprint-based methods because they learn task-specific representations rather than treating all structural patterns equally.
References
- Jumper, J., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583–589.
- Stokes, J.M., et al. (2020). A Deep Learning Approach to Antibiotic Discovery. Cell, 180(4), 688–702.
- Zhavoronkov, A., et al. (2019). Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nature Biotechnology, 37, 1038–1040.
- Watson, J.L., et al. (2023). De novo design of protein structure and function with RFdiffusion. Nature, 620, 1089–1100.
- Abramson, J., et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, 630, 493–500.
- Bender, A., and Cortés-Ciriano, I. (2021). Artificial intelligence in drug discovery: what is realistic, what are illusions? Drug Discovery Today, 26(2), 511–524.
Key Takeaways
- Drug discovery is fundamentally a prediction and search problem across vast chemical and biological spaces, exactly the kind of problem machine learning is suited to at scale.
- AlphaFold 2 brought protein structure prediction to near-experimental accuracy, and the AlphaFold database now makes over 200 million structure predictions freely available. Hassabis and Jumper shared the 2024 Nobel Prize in Chemistry for this work.
- Generative chemistry approaches such as reinforcement learning and diffusion models, guided by GNN property predictors, can now design novel molecules optimised simultaneously for binding affinity, ADMET properties, and synthesisability.
- AI-designed drugs are in Phase 2 clinical trials as of 2026. Insilico Medicine's ISM001-055 is described by Insilico Medicine as the first fully AI-designed molecule to reach human testing, though the definition of "fully AI-designed" is debated across the industry.
- The remaining bottleneck is clinical validation. AI accelerates the discovery and pre-clinical stages dramatically but cannot yet predict which molecules will succeed in human trials.
- The next frontiers are protein language models that work without multiple sequence alignment, personalised cancer vaccines designed from individual patient mutation profiles, and digital cell twin models that simulate whole-cell response to drug treatment.