AI has already demonstrated its mettle as an effective tool for discovering novel drugs. Over two dozen candidates developed with AI are in clinical trials, according to an analysis published late last year.
Now, scientists are looking to one of the next frontiers.
Rather than using AI to create the right drug, some new models are being trained to pinpoint the underlying triggers of disease, an approach that could reveal new targets for drugs to go after. In other words, rather than finding the keys to treat a disease, AI could help reveal which locks have gone unexplored.
“It’s about finding new targets altogether … new biology, new mechanisms for solving disease,” said Nick Naclerio, founding partner at Illumina Ventures. “That’s a much harder problem.”
Pulling off this feat will mean building models that unravel the intricacies of human biology.
“AI is trained on what’s in the literature, but there’s a finite understanding of how human biology works,” Naclerio said.
With that goal in mind, Illumina, which specializes in R&D technologies and diagnostics, announced a research consortium this year to build a comprehensive map of human disease biology.
Called the Billion Cell Atlas, the genome-wide perturbation dataset is being created with an alliance including industry bigwigs like Merck & Co., Eli Lilly and AstraZeneca. The goal is to build a “curated set of cell lines to drive drug target validation, train advanced AI models at scale, and advance research into fundamental disease mechanisms that had previously been out of reach,” according to the press release.
The Atlas model will develop over 200 cell lines relevant to a range of disease areas, including cancer as well as cardiometabolic and neurological conditions. Researchers then plan to use CRISPR technology to test what happens to billions of individual cells when thousands of different genes are switched on and off.
“Imagine you bought a new house and wanted to know what the light switches do,” Naclerio said. “In this model, each switch could impact thousands of systems … and you’ve got around 20,000 genes you’re going to turn on and off multiple times in hundreds of tissues. Now you’ve got an enormous dataset.”
The insights the model generates into human biology could then be used to identify drug targets with the help of AI, Naclerio said.
Illumina’s efforts add to a growing number of AI approaches to computational biology.
Google DeepMind’s AlphaFold platform has become known for its ability to predict complex protein structures — a breakthrough that snagged a Nobel Prize in 2024.
Other biotechs are pushing these bounds further. Australia-based Omnigeniq, for example, developed a platform that can visualize proteins in their ligand-ready state, which will help “engineer therapies that come from an understanding of the changes that lead to disease propagation,” the company’s CEO, Jordana Blackman, told PharmaVoice earlier this year.
New disease-focused techniques from the academic arena could also set the stage for upending long-held approaches to drug R&D.
Instead of the traditional method of testing one protein or drug at a time, an AI-driven tool developed by researchers at Harvard Medical School can identify multiple disease drivers at once and then pinpoint which genes could be targeted to address them. This new model could steer drug development toward more effective targets, researchers reported in September.
“The next few years will be a race to build foundational models that can make leaps in generating new insights into understanding biology,” Naclerio said.
The data cycle
AI systems are only as good as the data that feeds them — and models for disease target prediction are no different. Pulling together those datasets will be a chief focus for the industry.
“This is where the competition is right now,” Naclerio said. “Pharma is looking for big datasets to train models to find new targets.”
The current need for more data follows a technology development cycle that’s emerged recently.
“Over the last 20 years, the amount of data we could generate has gone up by many orders of magnitude. It got to the point where we had more data than we knew what to do with. Then came AI to ingest all that data,” Naclerio said. “And now we’re back around to having new tools like AI, but we need to ramp up experimentation to extract new data for them.”
Although the industry is not at the point where AI can “instantly find cures,” Naclerio said, the timing is almost right for the available data and toolsets to converge for the next leap forward.
“Over time, we are going to bend the curve of drug discovery productivity and see huge improvements,” he said.