David Friedberg has pushed many of us to ask a simple question that hides a hard truth: where should founders actually build next in AI and biology to create real value now, not in ten years?
I wrote this roadmap to answer that question with practical steps founders can use in synthetic biology and biomanufacturing today.
You will get a no-nonsense view of data, lab automation, machine learning in biotech, and the exact choke points where a startup can win.
I will also show how to sequence your work from first demo to GMP, and how to stay capital efficient while you scale.
I see founders riding a wave that finally has enough data, compute, and automation to matter.
Machine learning models are better at structure, sequence, and process prediction, but the bottleneck is still getting high-quality experimental data fast.
That is where lab automation and smart experimental design turn AI from a deck slide into a product.
I think of this moment as applied biology’s cloud era, where the winners will ship workflows, not just science.
When people reference David Friedberg, they often mean a focus on systems thinking, unit economics, and building full-stack solutions across agriculture, food, materials, and climate.
Those are the markets where AI and biology can compound and create defensible moats.
For more on founder operating systems and venture readiness, see our blog post: Capitaly.vc Blog.
I map every AI x biology company against five milestones.
This is the shortest path I know from idea to revenue-grade production.
I keep the bar simple at each step.
Can you prove a unit model, validate cost curves, and show repeatability?
Can you show data density per dollar and time-to-learning improving each month?
That is the heartbeat of an AI-first bio company.
I see many teams underestimating dataset design and overestimating off-the-shelf models.
Your edge is not just the model but the dataset you can generate repeatedly under controlled conditions.
I plan these datasets like products.
Each has a schema, metadata, controls, and a path to become a reusable asset.
Labeling quality and experimental confounders will make or break your model performance.
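Here is a minimal sketch of what I mean by planning a dataset like a product, written as a simple Python record per measurement. The field names are illustrative, not a standard.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class AssayRecord:
    """One labeled example, captured with enough metadata to audit or retrain from later."""
    sample_id: str            # barcode scanned at the bench
    design_id: str            # which candidate sequence or strain this measures
    protocol_version: str     # version-controlled protocol that generated it
    instrument_id: str        # plate reader or analyzer that produced the raw value
    operator: str             # who, or which robot, ran the step
    batch: str                # plate or run identifier, used to model batch effects
    is_control: bool          # positive and negative controls live in the same schema
    raw_value: float          # unnormalized readout
    normalized_value: float   # readout after plate-level normalization
    timestamp: datetime = field(default_factory=datetime.utcnow)

# Confounders you plan to correct for should be explicit columns, not tribal knowledge.
```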
For more on structuring data for venture-grade progress, see our blog post: Capitaly.vc Blog.
I start with throughput and control, not fancy robots.
If you double your design cycles per week, you double your learning velocity.
Automation is about cutting human-in-the-loop variance and latency.
I aim for a 48-hour design-build-test-learn (DBTL) cycle for enzymes and 7 to 14 days for strain improvements early on.
That speed compounds into data density and better models.
I recommend putting automation dollars where the hands are bleeding today.
Start with the top three repetitive steps causing delays or errors.
Measure ROI by time saved, variability reduced, and experiments per week.
A lean stack with an Opentrons liquid handler, a plate reader, an incubator shaker, and barcode tracking can generate outsized returns if it is aligned with your DBTL loop.
Do not buy robots you cannot program or maintain.
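A back-of-the-envelope model keeps those ROI conversations honest. Every number below is an illustrative placeholder; the structure is what matters: time saved, extra experiments, and payback.

```python
# Rough ROI model for automating one repetitive step (all numbers are illustrative).
hours_saved_per_week = 12         # hands-on time removed from the top bottleneck
loaded_cost_per_hour = 90         # fully loaded scientist cost, USD
extra_experiments_per_week = 8    # additional design-test cycles enabled
value_per_experiment = 150        # what one extra data point is worth to you, USD
capex = 12_000                    # liquid handler plus integration, USD

weekly_value = (hours_saved_per_week * loaded_cost_per_hour
                + extra_experiments_per_week * value_per_experiment)
payback_weeks = capex / weekly_value
print(f"Weekly value ${weekly_value:,.0f}, payback in {payback_weeks:.1f} weeks")
```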
I use cloud labs to de-risk capex and to standardize workflows.
Cloud facilities like Emerald Cloud Lab and Strateos can compress time to data when I have clear protocols and acceptance criteria.
I still validate critical steps in-house to keep IP knowledge and edge cases close.
Think of cloud labs as elastic capacity for methods that are already locked.
When your process is still a moving target, in-house control often wins.
I treat protein language models, structure prediction, and diffusion models as senior interns, not oracles.
They are fast and creative, but they need a manager called the wet lab.
Use ESM-style embeddings, AlphaFold-style structure prediction, and generative models to propose candidates.
Then couple that to rigorous screening and negative controls.
The same logic applies to metabolic pathway design and regulatory sequences.
Foundation models are multipliers when coupled with thoughtful priors and Bayesian updates.
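Here is one way ranking proposed candidates before they hit the wet lab can look in code. This sketch assumes the open-source fair-esm package and an ESM-2 checkpoint for embeddings, with a plain ridge regressor as the downstream surrogate; all of those choices are placeholders, not recommendations.

```python
import torch
import esm                                   # fair-esm package
from sklearn.linear_model import Ridge

# Load a small pretrained ESM-2 model (larger checkpoints trade speed for accuracy).
model, alphabet = esm.pretrained.esm2_t12_35M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

def embed(sequences):
    """Mean-pooled per-sequence embeddings; crude, but a workable baseline."""
    data = [(f"seq{i}", s) for i, s in enumerate(sequences)]
    _, _, tokens = batch_converter(data)
    with torch.no_grad():
        out = model(tokens, repr_layers=[12])
    reps = out["representations"][12]        # (batch, tokens, dim)
    return reps.mean(dim=1).numpy()          # pool over tokens

# Fit a cheap surrogate on whatever labeled data the wet lab has produced so far.
train_seqs = ["MKTAYIAKQR", "MKTAYIAKQW"]    # placeholder sequences
train_labels = [0.8, 1.3]                    # placeholder measured activities
surrogate = Ridge().fit(embed(train_seqs), train_labels)

# Score generated candidates, then send the top-ranked ones to screening with controls.
candidates = ["MKTAYIAKQL", "MKTAYIAKQV"]
scores = surrogate.predict(embed(candidates))
```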
I set up active learning early, even with small datasets.
A simple loop of propose, test, update, and repeat lowers the number of experiments needed to reach a target.
Make the optimizer a product feature, not a research project.
Show investors you learn faster than peers, not just that you can run more plates.
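Here is a minimal version of that loop, sketched with scikit-learn's Gaussian process and an upper confidence bound acquisition. The featurization, batch size, and exploration weight are assumptions you would tune to your own assay.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def propose_next_batch(X_tested, y_tested, X_pool, batch_size=8, beta=2.0):
    """Propose the next designs to test: fit a surrogate, pick high mean plus high uncertainty."""
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X_tested, y_tested)
    mean, std = gp.predict(X_pool, return_std=True)
    ucb = mean + beta * std                  # upper confidence bound acquisition
    return np.argsort(-ucb)[:batch_size]     # indices of the designs to run next

# Loop: propose -> run in the lab -> append results -> repeat.
# X_tested / X_pool are whatever numeric features you use (embeddings, one-hot mutations, ...).
```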
I build a digital twin at the 1 to 5 liter stage with mass balances and kinetic models, and I update it weekly.
It does not need to be perfect, but it must be useful.
The twin informs feed strategies, oxygen transfer, and scale-down models.
Link it to your process analytical technology (PAT) signals like Raman spectroscopy and off-gas analysis, and you will prevent failures later.
A rough but updated twin beats a glossy model that never touches reality.
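The rough end of that spectrum can be surprisingly small. Below is a fed-batch mass balance with Monod kinetics integrated with SciPy; every parameter is an illustrative placeholder meant to be refit weekly against your own runs.

```python
from scipy.integrate import solve_ivp

# Illustrative parameters; refit these against each week's batch data.
mu_max, Ks, Yxs, Yps = 0.4, 0.5, 0.5, 0.3   # 1/h, g/L, g biomass/g substrate, g product/g biomass
F, Sf = 0.05, 400.0                          # feed rate (L/h), feed substrate concentration (g/L)

def fed_batch(t, y):
    X, S, P, V = y                           # biomass, substrate, product (g/L), volume (L)
    mu = mu_max * S / (Ks + S)               # Monod growth rate
    D = F / V                                # dilution from feeding
    dX = mu * X - D * X
    dS = -mu * X / Yxs + D * (Sf - S)
    dP = Yps * mu * X - D * P
    dV = F
    return [dX, dS, dP, dV]

sol = solve_ivp(fed_batch, (0, 48), [0.1, 20.0, 0.0, 1.0], dense_output=True)
# sol.y gives trajectories to compare against PAT signals and update weekly.
```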
I recommend a scale-up ladder that proves transferability before you order expensive steel.
Use scale-down models to mimic large-vessel gradients and test worst cases.
Build a transfer package with clear critical process parameters (CPPs), critical quality attributes (CQAs), and acceptable ranges.
That package is your passport to any CDMO or internal plant.
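I find the package travels better as structured data than as prose. A minimal sketch, with parameter names and ranges that are purely illustrative:

```python
# Illustrative transfer package entry: every CPP carries its acceptable range and rationale.
transfer_package = {
    "product": "enzyme_X",
    "cpps": {
        "temperature_C": {"target": 30.0, "range": (29.0, 31.0), "why": "activity drops above 32 C"},
        "pH":            {"target": 6.8,  "range": (6.6, 7.0),   "why": "protease activation below 6.5"},
        "DO_percent":    {"target": 30,   "range": (20, 50),     "why": "oxygen limitation hits titer"},
    },
    "cqas": {
        "titer_g_per_L":  {"spec": ">= 5.0"},
        "purity_percent": {"spec": ">= 95"},
    },
    "sampling_plan": "every 4 h: OD, glucose, off-gas; end of run: titer, purity",
}
```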
I think of quality by design (QbD) as a live model with guardrails.
It is not a PDF.
Your software should map inputs, parameters, and outputs to product quality in real time.
Put alarms and playbooks on top of that map so operators know what to do when drift happens.
That is how you turn process understanding into margin.
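The alarm layer can start as something this small: check each live reading against its acceptable range and name the playbook step. The ranges and playbook text below are illustrative.

```python
# Minimal drift check: compare a live reading against its acceptable range and name the playbook.
RANGES = {"temperature_C": (29.0, 31.0), "pH": (6.6, 7.0), "DO_percent": (20, 50)}
PLAYBOOKS = {"pH": "check base pump and probe calibration", "DO_percent": "raise agitation, then airflow"}

def check(parameter: str, value: float) -> str | None:
    low, high = RANGES[parameter]
    if low <= value <= high:
        return None
    action = PLAYBOOKS.get(parameter, "page the on-call process engineer")
    return f"ALARM {parameter}={value} outside [{low}, {high}]: {action}"

print(check("pH", 6.4))   # prints an alarm with the next step an operator should take
```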
I do not bolt on GxP at the end.
I write SOPs, validation plans, and data integrity rules while the system is small.
Even if you are not pharma, food and materials customers care about traceability.
Build your ELN, LIMS, and MES with audit trails, user roles, and change control from day one.
This saves months at commercialization.
I separate three layers so I can scale without replatforming.
I add a fourth layer that many skip.
That is the analytics and model hub that pulls from all three and pushes recommendations back.
Use open standards like SiLA 2 where possible to reduce custom glue code.
For more on systems that speed up fundraising diligence, see our blog post: Capitaly.vc Blog.
I pick gear based on software ecosystem, support, and the fit to my workflows, not just specs.
Opentrons is flexible and affordable for early loops.
Tecan and Hamilton shine for robust, validated high-throughput operations.
Custom rigs only make sense when your workflow is your moat and you can service it.
I avoid vendor lock-in by keeping protocols in version-controlled repositories and using adapters where possible.
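Keeping protocols as code in a repository is straightforward with the Opentrons Python API. A minimal sketch, with labware, slots, and volumes as placeholders:

```python
# protocols/normalize_v1.py: lives in git next to the analysis code that consumes its output.
from opentrons import protocol_api

metadata = {"apiLevel": "2.13", "protocolName": "normalize_v1"}

def run(protocol: protocol_api.ProtocolContext):
    tips = protocol.load_labware("opentrons_96_tiprack_300ul", "1")
    source = protocol.load_labware("corning_96_wellplate_360ul_flat", "2")
    dest = protocol.load_labware("corning_96_wellplate_360ul_flat", "3")
    pipette = protocol.load_instrument("p300_single_gen2", "right", tip_racks=[tips])
    for i in range(8):                      # placeholder step: dilute the first column
        pipette.transfer(100, source.wells()[i], dest.wells()[i])
```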
I look for customers with a burning platform and a short time-to-value.
These are usually teams with revenue pressure or regulatory deadlines.
I lead with a pilot that has a clear ROI and a success metric tied to money.
That can be cost per variant screened, time to hit target titer, or reduced batch failure rate.
I see three models that work, along with hybrids of them.
Services are a bridge to product and a data engine if you treat metadata like gold.
Software multiplies margin and reach if you can standardize integrations.
Cells-as-a-Service gives you recurring revenue but requires careful IP and quality control.
I do not rely on novelty alone as a moat in synbio.
I build moats from data quality, proprietary workflows, and integrations that become switching costs.
When a customer says switching will cost them months of validation, you have a moat.
Document your edge like a product, not folklore.
I staff small, cross-functional squads aligned to outcomes.
Each squad owns a loop from design through test and analysis.
Put an ML engineer next to an automation scientist and a fermentation lead.
Give them a weekly cadence and a dashboard tied to milestones.
Hire people who automate themselves out of a job and then go find the next bottleneck.
I build budgets around learning velocity, not vanity equipment.
Here is a rough blueprint I have used.
Track burn per learning milestone, not just runway in months.
If your cost per unit learning drops each quarter, you are on the right path.
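One way to make cost per unit learning concrete, with purely illustrative numbers:

```python
quarters = ["Q1", "Q2", "Q3"]
spend = [220_000, 260_000, 300_000]          # USD burned that quarter (illustrative)
titer_gain = [0.5, 1.0, 1.5]                 # g/L improvement delivered that quarter (illustrative)
for q, s, g in zip(quarters, spend, titer_gain):
    print(f"{q}: ${s / g:,.0f} per g/L of titer gained")
# The trend of this number, not the absolute burn, is what should improve every quarter.
```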
For more on capital planning and investor expectations, see our blog post: Capitaly.vc Blog.
I would build in three areas that feel under-served and near-term valuable.
Each of these has clear buyers, measurable ROI, and a path to defensible data assets.
They align with the practical mindset I associate with the best AI x biology founders.
I pick the first wedge using two filters: high pain and low adoption friction.
I validate with five design partner calls and one paid pilot before I start building.
If I cannot define a single number that improves within 90 days, I do not ship it.
Examples include cutting analytical turnaround by 50 percent or improving titer by 20 percent in six weeks.
I bias toward assays with high repeatability, strong dynamic range, and automation potential.
If the assay is noisy, the model will be blind.
I run gage R&R studies and control charts even in research to spot drift early.
I standardize sample prep and normalization and I annotate everything that could confound learning.
That discipline compounds into model performance.
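Control charts do not need special software at research scale. Here is a minimal Shewhart-style check on a positive control that runs on every plate, with illustrative readings:

```python
import numpy as np

def control_limits(history):
    """Shewhart-style 3-sigma limits from historical positive-control readings."""
    mean, sd = np.mean(history), np.std(history, ddof=1)
    return mean - 3 * sd, mean + 3 * sd

history = [1.02, 0.98, 1.05, 0.97, 1.01, 0.99, 1.03]   # illustrative control readings
low, high = control_limits(history)

todays_control = 0.82
if not (low <= todays_control <= high):
    print("Assay drift suspected: quarantine today's plate before it reaches the model.")
```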
I negotiate data and IP rights upfront with any CDMO, cloud lab, or instrumentation vendor.
Your assay outputs, metadata, and derived models should be yours to use and improve.
I avoid proprietary file formats that lock me into a tool.
Export to open schemas and mirror your data to your own warehouse.
Long-term leverage comes from portability.
I write tech transfer playbooks that an external team can follow without tribal knowledge.
They include CPPs, CQAs, sampling plans, out-of-spec protocols, and contact trees.
I attach a simplified digital twin and a control strategy page that explains why each parameter matters.
This is how I reduce transfer cycles and build credibility with partners.
I time raises around milestones that are obvious on one slide.
Pre-seed is about proving your loop and getting first performance gains.
Seed is about model lift, scale-down validation, and one or two paying pilots.
Series A is about pilot scale economics, QbD, and a pipeline of commercial contracts.
I tell the story in cost per learning and cost per unit improvement because it maps to value creation.
For more on fundraising narratives in deep tech, see our blog post: Capitaly.vc Blog.
I hire builders who have shipped in messy environments.
I probe for people who wrote their own scripts, automated their job, and taught others.
My first ten include a lead automation scientist, an ML engineer, a fermentation lead, a data engineer, and a scrappy operations owner.
I pick athletes who can cover two positions each for the first year.
Culture is weekly demos, blameless postmortems, and relentless documentation.
I keep the story simple.
We make biology faster, cheaper, and more reliable, and here is the proof.
For regulators, I lead with control strategies and data integrity.
For customers, I lead with time-to-value and risk reduction.
One slide per audience is all you need when the work is real.
I see five patterns that waste time and burn cash.
When in doubt, trim scope and tighten loops.
Your cadence is your moat.
Synthetic biology is the design of biological systems and parts, while biomanufacturing is the scaled production using those systems.
Think of synbio as design and build, and biomanufacturing as produce and control.
You can get value with hundreds to low thousands of labeled examples if your assay is clean and you use active learning.
The right priors beat raw volume early on.
Automate one high-impact bottleneck, standardize one robust assay, and ship a simple active learning loop.
Expand only after you see weekly gains.
You do not need full GxP from day one, but GxP principles help with traceability, reliability, and customer trust.
Build the bones early and scale the formality later.
Use cloud labs for standardized, scalable methods and burst capacity.
Keep variable, IP-critical, or evolving workflows in-house.
Start with a liquid handler, reliable incubator shakers, a plate reader, and barcode-based tracking.
Add colony picking and analytics as throughput grows.
Negotiate ownership of raw data, metadata, and derived models.
Export data to open formats and mirror to your own storage.
Combine technoeconomic analysis with pilot data on titer, rate, yield, media cost, and batch success probability.
Build sensitivity ranges and update monthly.
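A minimal sketch of that roll-up, with every number an illustrative placeholder; rerun it across your sensitivity ranges rather than trusting a single point estimate.

```python
# Illustrative cost-per-kg roll-up; replace with your own pilot data and refresh monthly.
titer_g_per_L = 20.0
batch_volume_L = 10_000
batch_success = 0.9                      # probability a batch passes spec
media_cost_per_batch = 15_000            # USD
other_cost_per_batch = 25_000            # labor, utilities, depreciation, QC (USD)

kg_per_good_batch = titer_g_per_L * batch_volume_L / 1000
expected_kg = kg_per_good_batch * batch_success
cost_per_kg = (media_cost_per_batch + other_cost_per_batch) / expected_kg
print(f"~${cost_per_kg:.0f}/kg at {titer_g_per_L} g/L")   # sweep titer, rate, and yield for sensitivity
```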
Lead with learning velocity, repeatability, and a clear line from data to dollars.
Show a timeline that reduces risk at each stage with evidence.
The common failure modes are assay drift, unscalable workflows, lack of integration, and underestimated scale-up complexity.
Fix those with PAT, digital twins, and disciplined process design.
The next generation of AI and biology companies will win by running tighter loops, owning their data, and delivering measurable outcomes across design and manufacturing.
Use this roadmap to pick a wedge, build your DBTL engine, and scale with process intelligence as your product.
If you lead with clarity on datasets, automation, and unit economics, you will separate yourself from the noise.
That is the practical path I see for founders in synthetic biology and biomanufacturing who are inspired by the operator mindset often associated with David Friedberg.
Subscribe to Capitaly.vc Substack (https://capitaly.substack.com/) to raise capital at the speed of AI.