The Simulation Layer: Where Biology Becomes Software
- Tung

- Apr 2
The Core Investment Thesis
The simulation layer acts as the primary bottleneck for the Bio × AI stack. It is not merely a category of tools, but the point at which biological discovery transitions from physical experimentation to computational modeling. Today, preclinical drug discovery alone represents roughly $50 billion in annual spend, the majority of which is allocated to iterative wet lab experimentation. Simulation threatens to make 60–70% of that process computational.
Investors underprice this shift because they treat simulation as basic software for researchers. In reality, it is productivity infrastructure: a marginal improvement in simulation accuracy or cost does not produce linear gains; it compounds across the entire discovery pipeline. The non-consensus position, therefore, is that simulation is not an incremental efficiency layer but the central bottleneck, and resolving that bottleneck transforms discovery from a slow, expensive gamble into a faster, cheaper, engineered process.
However, this thesis carries a specific condition: the bottleneck only shifts value if simulation outputs are trusted enough to replace, not merely supplement, physical experiments. The entire investment case rests on that transition happening at scale. Until it does, simulation remains a cost-reduction tool, not a discovery engine, and thus a meaningfully smaller prize.
The Simulation Ladder
Think of the simulation layer as a ladder of levels: atoms, cells, organs, individuals, and entire populations. This is a precision scale on which biology is modeled at every resolution. Success depends on connecting the levels, so that tiny molecular changes can be traced up to effects on whole populations. Each step up the ladder corresponds to a larger addressable economic domain, but also to exponentially greater complexity and validation difficulty.
From an investment view, value is not found in the most complex science. It appears when a simulation becomes "good enough" to replace a physical lab test: once digital models are accurate enough to skip expensive real-world experiments, costs and timelines compress. Molecular simulation has already crossed this threshold in several domains, particularly in protein structure prediction and aspects of molecular interaction modeling. Cellular simulation is approaching this boundary, while full organism-level or digital twin simulations remain aspirational.
This scale provides a natural framework for time horizon layering. Near-term value capture is concentrated at the molecular level, where digital models are already replacing physical experiments. Medium-term opportunities lie at the cellular level, where early validation signals are emerging. Long-term upside resides in system-level simulations, but remains highly uncertain.
The critical investor question at each rung is not "how good is the science?" but "what wet lab budget line does this eliminate?" That reframe converts an abstract technological capability into a concrete revenue displacement estimate. At the molecular level, the displaced budget is primarily lead optimization in medicinal chemistry, a large and well-defined spend category. At the cellular level, it is early phenotypic screening. At the organism level, it is phase I/II clinical costs. Each rung up the ladder represents a larger prize but a longer wait.
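The framework can be summarized in a compact lookup. This is just a sketch restating the mapping above, not an analytical model:

```python
# The ladder-to-budget-line mapping from the text, as a simple lookup.
# Each rung names the wet lab spend it would displace and the time
# horizon over which that displacement is plausible.

SIMULATION_LADDER = {
    "molecular": {
        "displaced_budget": "lead optimization in medicinal chemistry",
        "horizon": "near term (0-2 years)",
    },
    "cellular": {
        "displaced_budget": "early phenotypic screening",
        "horizon": "medium term (2-5 years)",
    },
    "organism": {
        "displaced_budget": "phase I/II clinical costs",
        "horizon": "long term (5-10 years)",
    },
}

for rung, info in SIMULATION_LADDER.items():
    print(f"{rung:>9}: displaces {info['displaced_budget']} ({info['horizon']})")
```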
Compute Cost as Primary Catalyst
The primary driver enabling this transition is the collapse in computational cost. Classical molecular dynamics simulations have historically required tens of thousands of dollars per candidate, limiting their use to a small subset of high-priority problems. Machine learning-based surrogate models are reducing this cost to single-digit dollars per simulation, a reduction of four to six orders of magnitude.
To make this concrete: Schrödinger's physics-based Free Energy Perturbation (FEP) pipeline costs roughly $10,000-$50,000 per compound in compute alone. Machine-learning surrogate models are running equivalent approximations at under $10 per compound. That is not an incremental improvement; it is a structural repricing of who can afford to run simulation at scale.
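As a back-of-envelope illustration of what that repricing means for screening throughput (the dollar figures are the ones cited above, treated as representative rather than quoted vendor prices):

```python
# Back-of-envelope throughput comparison under the two cost regimes.

FEP_COST = 25_000        # assumed midpoint of the $10,000-$50,000 FEP range
ML_SURROGATE_COST = 10   # upper bound cited for ML surrogate models
BUDGET = 1_000_000       # hypothetical $1M compute budget

fep_compounds = BUDGET // FEP_COST          # 40 compounds
ml_compounds = BUDGET // ML_SURROGATE_COST  # 100,000 compounds

print(f"FEP:          {fep_compounds:>7,} compounds per $1M")
print(f"ML surrogate: {ml_compounds:>7,} compounds per $1M")
print(f"Throughput multiple: {ml_compounds // fep_compounds:,}x")  # 2,500x
```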
This cost compression is analogous to what cloud computing did for enterprise software. As simulation becomes cheaper, it becomes ubiquitous. More experiments can be run, more hypotheses tested, and more players can participate in the discovery process. The immediate beneficiaries are not necessarily the end applications, but the platforms that enable and scale this increased activity.
Hardware remains a critical enabler. NVIDIA continues to dominate general-purpose compute, but specialized architectures, from wafer-scale systems to domain-specific chips, suggest that the hardware layer itself may continue to evolve in response to biological workloads. The investor implication is clear: cost curve collapse expands the TAM and shifts value toward those who control the interfaces through which simulation is accessed.
The secondary implication is often missed: cost collapse does not only benefit large Pharma. It democratizes access to simulation for smaller biotechs and academic spinouts. This expands the customer base for platform providers and accelerates the volume of simulation runs, which in turn accelerates data flywheel dynamics for whoever sits at the center of that activity.
Competitive Moats
The simulation layer is not defensible by default. Moats must be actively constructed, and they tend to fall into four categories: data, architecture, workflow integration, and network effects.
Data moats arise from proprietary experimental datasets that improve model accuracy and generalization. Architecture moats are based on novel model designs that are not easily replicated. Workflow moats are created through deep integration into pharmaceutical R&D pipelines, increasing switching costs. Network effects, while theoretically powerful, remain underdeveloped in this domain.
The absence of these moats is a critical red flag. Companies that merely wrap open-source models without proprietary data or integration risk rapid commoditization. This is particularly relevant in a field where foundational capabilities are increasingly being released into the public domain.
Ranked by current defensibility:
-Workflow moats are the most durable in the near term because Pharma R&D pipelines change slowly and switching costs are high; Schrödinger's enterprise contracts reflect this.
-Data moats are the most valuable long-term but also the most fragile: they hold until a foundation model trained on all published biology absorbs the advantage.
-Architecture moats have the shortest half-life: novel designs get replicated or open-sourced within 12-24 months in this field.
Network effects remain largely theoretical in simulation today. The first company to close the loop, where more users generate more training data that improves the model for all users, will have the most durable position of all.
Commoditization vs. Differentiation
The defining tension of the simulation layer is the speed at which capabilities are becoming commoditized. Open-source breakthroughs in protein folding and generative modeling have dramatically lowered the barrier to entry. What was once a proprietary advantage can become a baseline capability within a relatively short time frame.
This creates a divergence. At the lower end of the stack, simulation tools become commoditized utilities, with limited pricing power and high competition. At the upper end, platforms that combine proprietary data, continuous feedback loops, and integrated workflows can maintain defensibility and capture outsized value.
The middle ground, companies that rely on incremental improvements to widely available techniques without building deeper moats, is structurally unstable. Their competitive advantage erodes as the baseline advances, compressing margins and limiting long-term viability.
The rate of commoditization is accelerating. AlphaFold 2 was released in 2021. Within 18 months, it was integrated into every major academic and commercial pipeline as a free baseline. RFdiffusion, released in 2022, followed a similar trajectory. The practical implication for investors is that moat half-lives are shortening. What took five years to commoditize in traditional software takes 18-24 months here. This raises the bar for what counts as a durable competitive advantage and should make investors deeply skeptical of any simulation company whose primary asset is a single model or technique rather than a system of compounding advantages.
The Bear Case
No serious investment thesis is complete without confronting the scenarios under which it fails. For the simulation layer, three bear cases are worth taking seriously.
The first is Pharma adoption inertia. The pharmaceutical industry has a well-documented history of slow external platform adoption. Internal resistance, validation requirements, regulatory conservatism, and institutional incentives to protect existing workflows all create friction. Simulation could be technically superior and still fail to capture value if Pharma refuses to change how it buys and deploys discovery tools.
The second is foundational model absorption. If a sufficiently powerful general biology foundational model, trained on all published experimental data, emerges and is made available at low cost, it could commoditize the entire simulation layer simultaneously. This is not a remote scenario: it is arguably what AlphaFold did to protein structure prediction in a single paper. A similar event in molecular interaction modeling or cell biology would reset the competitive landscape overnight.
The third is clinical validation failure. The entire thesis assumes that simulation-guided candidates perform better in the clinic. If phase II success rates for simulation-validated drugs do not outperform the historical baseline by a measurable margin, the productivity argument collapses and with it, the justification for premium pricing and platform valuations.
The Public Market Landscape
Public markets offer early exposure to the simulation layer, but the available set of companies reflects an industry still in transition.
Schrödinger ($SDGR) represents the closest approximation to a pure-play simulation platform, combining physics-based modeling with machine learning. Its strength lies in workflow integration and enterprise adoption. The bear case is specific: Schrödinger's core FEP pipeline is expensive and slow relative to emerging machine-learning alternatives. If machine learning surrogate accuracy continues to close the gap with physics-based methods, which the trajectory suggests it will, Schrödinger's primary technical differentiator erodes. The workflow moat buys time, but not indefinitely. Monitor software revenue growth rate and enterprise contract renewal rates as leading indicators.
Certara ($CERT) operates closer to the regulatory interface, focusing on pharmacokinetics and clinical modeling. It is more defensible because its moat is regulatory workflow integration, not model performance. Certara's software is embedded in FDA submission processes, and switching costs are exceptionally high because changing tools mid-program introduces regulatory risk that Pharma will not accept. The tradeoff is limited upside: this is a steady compounder, not an asymmetric position.
Recursion Pharmaceuticals ($RXRX) represents a different approach, integrating data generation, simulation, and drug development into a single vertically integrated platform. The bull case is that Recursion's phenomics data library, generated from automated biology at massive scale, is the most credible data flywheel in public markets today. The bear case is that vertical integration makes clinical failure an existential event, not just a pipeline setback. The Roche/Genentech partnership validates the data asset; future clinical readouts will determine whether the integrated model actually delivers on its promise.
In this sector, valuations often track software-industry multiples rather than the binary, high-reward economics of drug pipelines. That gap between market price and underlying value will likely close as clinical readouts begin to prove, or disprove, that the technology works, aligning investor expectations with real-world biological results.
Private Markets
The most significant asymmetry currently resides in private markets, where foundational models and new architectures are being developed. Entities such as Isomorphic Labs and emerging foundation model companies are pushing the frontier of what biological simulation can achieve.
Beyond Isomorphic Labs, several private companies deserve attention. Unlearn AI is building digital twins for clinical trials, specifically synthetic control arms that reduce the number of patients needed in phase II/III studies. The FDA has engaged directly with its methodology under the Complex Innovative Trial Design program, making this one of the few simulation plays with a near-term regulatory revenue path.
Turbine AI is simulating cancer cell responses to combinatorial drug treatments, a problem that is computationally tractable and commercially urgent.
EvolutionaryScale is commercializing ESM3, a protein language model that operates across sequence, structure, and function simultaneously; it is the closest thing to a unified protein foundation model currently available.
Atomic AI is focused specifically on RNA structure simulation, an under-explored area with direct relevance to mRNA therapeutics and RNA-targeted drug discovery.
These companies are often not directly investable, but their partnerships and funding rounds serve as critical signals. When large pharmaceutical companies commit significant upfront capital to these platforms, they are not merely outsourcing research; they are validating the underlying paradigm. Tracking these signals provides insight into where conviction is forming within the industry.
Partnerships as Validation Signals
Partnerships between simulation companies and pharmaceutical firms provide a window into real-world validation. The structure of these deals is particularly informative. Upfront payments indicate a willingness to pay for certainty, while milestone-based structures suggest a more cautious approach.
Two deals illustrate this framework in practice. Isomorphic Labs signed agreements with Eli Lilly and Novartis in early 2024, with reported upfront payments in the range of $45-70 million per deal, plus milestone-based payments. The upfront component is the signal: these companies are not paying for future outcomes alone; they are paying for present access to a capability they cannot replicate internally. Recursion's partnership with Roche/Genentech follows a similar logic: a $150 million upfront payment against a total deal value of up to $12 billion. The ratio of upfront to total is small, but the absolute upfront number reflects genuine conviction in the data asset.
As these partnerships evolve, the ratio of upfront to contingent payments can be used as a proxy for confidence in simulation capabilities. An increase in upfront commitments would signal that simulation is moving from experimental to essential within the drug discovery process.
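A sketch of that proxy, using the one deal above where both figures are public:

```python
# Upfront-to-total ratio as a crude conviction proxy, computed from the
# Recursion/Roche-Genentech terms cited above. The total is a capped
# maximum, so the ratio is a lower bound on realized upfront share.

upfront = 150_000_000             # reported upfront payment
total_potential = 12_000_000_000  # reported maximum deal value

ratio = upfront / total_potential
print(f"Upfront share of headline value: {ratio:.2%}")  # ~1.25%

# Tracking this ratio across successive deals is the proposed signal:
# a rising ratio means Pharma is paying more for present capability
# and less for contingent future outcomes.
```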
The Data Flywheel
The long-term winners in the simulation layer will be defined by their ability to build and sustain data flywheels. The loop is straightforward: real-world experiments generate data, which improves simulation models, which in turn generate better predictions and synthetic data, reducing the need for further experimentation. This compounding advantage creates increasing returns to scale. Companies that establish early leadership in data acquisition and model training can widen their lead over time.
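The loop can be made concrete with a toy model. All parameters below are invented for illustration; only the compounding structure is the point:

```python
# Toy model of the data flywheel: data improves the model, and a better
# model makes each experimental cycle yield more usable data.

data = 1.0             # proprietary dataset size (arbitrary units)
EXPERIMENTS_PER_CYCLE = 100

for cycle in range(1, 6):
    # More data -> lower model error (assumed power-law scaling).
    model_error = data ** -0.5
    # Lower error -> fewer experiments wasted on bad predictions, so the
    # same experimental budget yields more informative data points.
    new_data = EXPERIMENTS_PER_CYCLE * (1 - 0.5 * model_error)
    data += new_data
    print(f"cycle {cycle}: data={data:7.1f}, relative error={model_error:.3f}")
```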
In domains where experimental data is expensive to generate and not publicly available, specialized flywheels will hold for longer than most investors expect. Recursion's phenomics data cannot be replicated from published literature; it requires physical automation at scale. Relay Therapeutics' conformational ensemble data requires proprietary experimental infrastructure. These are not model advantages; they are data generation infrastructure advantages, which are harder to commoditize than algorithmic ones.
The risk shifts when a large foundational model is trained on all published experimental data and begins to approximate what these flywheels produce; that is the moment to reassess.
Platform vs. Application
The simulation layer offers multiple entry points for investors, each with different risk-reward profiles. Platform companies provide infrastructure to a wide range of users and benefit from broad adoption, but may face commoditization pressure. Vertically integrated companies, which combine simulation with drug development, offer higher upside but also greater execution risk. Infrastructure providers (compute, data pipelines, and cloud services) offer indirect exposure, often with more liquidity and lower volatility.
The asymmetric position is in vertically integrated players with demonstrated data flywheels. These offer the highest convexity: if the simulation-to-clinic pathway validates, they capture value across the entire stack. The risk is binary in the sense that clinical failure is expensive and public. Platform plays are lower variance but also lower ceiling.
Time Horizon
The simulation layer must be evaluated across multiple time horizons.
-Near term (0-2 years): Molecular simulation is generating measurable revenue. Watch for: Schrödinger software renewal rates, machine-learning surrogate accuracy benchmarks vs. FEP, and first clinical readouts from simulation-guided programs at Recursion and Relay Therapeutics.
-Medium term (2-5 years): Cellular modeling and generative design approach clinical validation. Watch for: Unlearn AI regulatory approval of synthetic control arm methodology, phase II data from simulation-heavy pipelines, and whether EvolutionaryScale's ESM3 becomes the default protein foundation model.
-Long term (5-10 years): Digital twins at patient level could transform clinical trials. Watch for: FDA formal guidance on in silico trial methodology and any phase III trial that uses simulation as a primary evidence source rather than a supplementary one.
Falsifiable Thesis
For the simulation layer thesis to hold, certain measurable outcomes must occur.
-Clinical validation: If simulation-validated drug candidates do not demonstrate a phase II success rate above 55%* (vs. the historical baseline of roughly 40%*) by end of 2027, the core productivity argument is unproven.
-Software revenue: If Schrödinger's software segment revenue growth falls below 15% CAGR through 2026, despite machine-learning integration, commoditization is outpacing value capture.
-Partnership signal: If the ratio of upfront to milestone payments in major simulation partnerships declines over the next 12-18 months, Pharma conviction in simulation is weakening, not strengthening.
-Regulatory acceptance: If no synthetic control arm methodology receives FDA approval by end of 2026, the regulatory tailwind thesis is at least two years premature.
These are not arbitrary thresholds; each one is tied to a specific assumption in the thesis. If more than one breaks simultaneously, the position should be reassessed in full.
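For monitoring purposes, the four tripwires can be written down as explicit checks. A minimal sketch: the thresholds come from the list above, while the input observations are placeholders to be replaced with real data:

```python
# The four falsification conditions, encoded as explicit checks.

def thesis_tripwires(phase2_success_rate, sdgr_software_cagr,
                     upfront_ratio_trend, synthetic_arm_approved):
    """Return the list of tripwires currently triggered."""
    triggered = []
    if phase2_success_rate < 0.55:
        triggered.append("clinical validation below 55% threshold")
    if sdgr_software_cagr < 0.15:
        triggered.append("SDGR software growth below 15% CAGR")
    if upfront_ratio_trend < 0:
        triggered.append("upfront-to-milestone ratio declining")
    if not synthetic_arm_approved:
        triggered.append("no FDA-accepted synthetic control arm")
    return triggered

# Example with hypothetical placeholder observations (not real data):
broken = thesis_tripwires(0.48, 0.12, -0.01, False)
print(f"{len(broken)} tripwire(s) triggered: {broken}")
# Per the text: more than one triggered -> reassess the position in full.
```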
*The 40% figure is a standard industry benchmark for Phase II success rates. Phase II is where most drugs fail because it's the first time they are tested for actual efficacy in patients (rather than just safety in healthy volunteers). Historically, roughly 30% to 40% of drugs that enter Phase II successfully transition to Phase III. Because the baseline is so low, companies spend billions on drugs that look good in a lab but fail in humans.
The argument is that by using conformational ensembles (mapping how proteins move) rather than static snapshots, these companies can pick drug candidates that are more likely to work. To prove the platform isn't just "expensive software", they need to beat the industry average by a statistically significant margin. Moving from 40% to 55% would represent a roughly 37% relative improvement in R&D productivity.
The year 2027 is cited because that is when the current wave of "simulation-designed" drugs (from Relay, Recursion, etc.) will have enough completed phase II readouts to create a meaningful sample size.
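To make the sample-size point concrete, here is a minimal one-sided power calculation using the normal approximation to the binomial. The 5% significance and 80% power levels are conventional assumptions, not figures from the text:

```python
# How many phase II readouts are needed to distinguish a 55% success
# rate from the 40% baseline?

from math import sqrt, ceil

p0, p1 = 0.40, 0.55             # baseline vs. hypothesized success rate
z_alpha, z_beta = 1.645, 0.842  # one-sided 5% significance, 80% power

n = ((z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1)))
     / (p1 - p0)) ** 2
print(f"Required readouts: ~{ceil(n)}")  # ~67 completed phase II trials
```

The scale of that number is the real content of the footnote: distinguishing the two rates with confidence takes dozens of completed readouts, which is why any single trial result, positive or negative, should move conviction less than headlines suggest.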
Conclusion
The simulation layer sits at the intersection of computation and biology, representing a shift from experimentation to engineering. Its significance lies not in its current revenue, but in its potential to restructure the economics of discovery across the life sciences.
The correct posture is selective conviction, not broad exposure. The middle of the market (companies without proprietary data, without workflow integration, and without a clear path to clinical validation) will be structurally challenged as the baseline advances. The edges of the market, deep workflow integrators on one side and vertically integrated data flywheel players on the other, offer the most defensible risk-reward.
The signal to watch above all others is clinical trial performance. Everything else (partnerships, software revenue, regulatory acceptance) is leading-indicator noise until a simulation-guided drug outperforms the historical baseline in a statistically meaningful way. That event, when it occurs, will reprice the entire layer.


