Claude Code for Science, automating code generation and research workflows.
Using agentic scientific code generation, retrieval-augmented scientific synthesis from databases, and domain-specific LLM fine-tuning.

Scientific Computing | YC W26
Last Updated: March 19, 2026

Synthetic Sciences builds an agentic AI platform that integrates Anthropic's Claude Code to automate scientific code generation, data analysis, and research workflows for computational biology, chemistry, and physics.
Synthetic Sciences publicly positions itself as "Claude Code for Science," with announced integrations to scientific databases (PubMed, ChEMBL, ClinicalTrials.gov, Benchling, 10x Genomics), support for multiple Claude models (Opus, Sonnet, Haiku), modular "Agent Skills" for reproducible bioinformatics workflows, and planned connectors to arXiv, bioRxiv, chemRxiv, GitHub, Zenodo, and citation managers like Zotero and Mendeley.
GitHub activity and technical architecture signals suggest investment in vector database infrastructure (RedisVL), RAG pipelines, and domain-specific fine-tuning of Claude models for scientific reasoning. The two-person team, one member with AI product and patent experience and the other with elite competitive programming credentials, hints at a deeply technical, code-first approach. The company is likely operating in a closed, invite-only beta with select research labs. Conference and community engagement patterns suggest partnerships with academic institutions are being explored, and the absence of hiring signals points to a near-term fundraise or accelerator application (YC S26 or similar) to scale the team.
<p>Agentic AI autonomously generates, executes, and iterates on scientific code (Python, R, Julia) for data analysis, simulation, and visualization across biology, chemistry, and physics research workflows.</p>
It's like having a tireless PhD-level research programmer who writes, debugs, and reruns your entire analysis pipeline while you focus on the actual science.
Synthetic Sciences deploys Anthropic's Claude Code as an autonomous agent that receives high-level scientific objectives—such as "perform differential expression analysis on this RNA-seq dataset" or "run molecular dynamics simulation with these parameters"—and generates complete, executable code pipelines. The agent handles environment setup, library imports, data ingestion, quality control, statistical analysis, figure generation, and report compilation. It leverages Claude's advanced reasoning to select appropriate algorithms, handle edge cases in messy experimental data, and iterate on failed runs with intelligent debugging. Modular "Agent Skills" allow researchers to compose reproducible workflows from pre-built components (e.g., QC → normalization → clustering → DE analysis) while retaining the flexibility to customize any step. The system integrates with GitHub for version control and cloud compute for scalable execution, enabling entire research teams to share and reproduce agentic workflows.
It's like replacing your lab's overworked bioinformatics postdoc with an AI that never sleeps, never forgets to set a random seed, and actually documents its code.
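The generate-execute-iterate loop described above can be sketched in a few lines. This is a minimal, self-contained illustration, not Synthetic Sciences' actual implementation: `call_model` is a stub standing in for a real Claude Code call, and the deliberately buggy first draft exists only so the repair loop has something to fix.

```python
import os
import subprocess
import sys
import tempfile

def call_model(objective, error=None):
    """Stub standing in for a Claude API call. A real agent would send the
    objective (plus any traceback from a failed run) and receive revised code."""
    if error is None:
        # First draft: deliberately buggy import, so the loop must iterate.
        return "import mth\nprint(sum([1, 2, 3]))"
    # "Repaired" draft returned once the traceback is fed back.
    return "print(sum([1, 2, 3]))"

def run_pipeline(objective, max_iters=3):
    """Generate code, execute it in a subprocess, and feed failures back."""
    error = None
    for _ in range(max_iters):
        code = call_model(objective, error)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True)
        os.unlink(path)  # clean up the scratch script
        if result.returncode == 0:
            return result.stdout.strip()
        error = result.stderr  # the traceback informs the next attempt
    raise RuntimeError("agent failed to converge")

print(run_pipeline("sum the replicate counts"))  # → 6
```

Running code in a subprocess rather than `exec` keeps the agent's scratch work isolated from the orchestrator, which is the same separation a production sandbox would enforce more strictly.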
<p>RAG-powered system retrieves and synthesizes evidence from scientific literature, databases, and experimental data to provide citation-backed answers and inform research decisions in real time.</p>
It's like having a research librarian with photographic memory who instantly reads every relevant paper and database entry, then writes you a perfectly cited summary.
Synthetic Sciences implements a Retrieval-Augmented Generation (RAG) architecture that connects Claude LLMs to a continuously updated index of scientific literature (PubMed, arXiv, bioRxiv, chemRxiv), chemical databases (ChEMBL), clinical trial registries (ClinicalTrials.gov), and proprietary experimental datasets uploaded by users. When a researcher poses a question—such as "What are the known off-target effects of this CRISPR guide RNA?" or "Summarize recent phase II trial results for GLP-1 agonists in NASH"—the system performs semantic search across vector-embedded document stores, retrieves the most relevant passages, and feeds them as context to Claude for grounded, citation-backed synthesis. The vector database infrastructure (likely RedisVL or similar) enables sub-second retrieval across millions of documents. The system distinguishes between high-confidence claims supported by multiple sources and speculative findings, flagging uncertainty for the researcher. Integration with Zotero and Mendeley allows automatic export of cited references into the researcher's existing citation workflow.
It's like Google Scholar, Wikipedia, and a tenured professor had a baby that actually reads the full text of every paper instead of just the abstract.
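The retrieve-then-synthesize flow can be sketched with a toy in-memory index. This is an illustration of the RAG pattern only: the document IDs are invented, the bag-of-words "embedding" stands in for a learned model, and a production system would use a real vector store (the profile suggests RedisVL) and pass the retrieved context to Claude rather than returning it directly.

```python
import math
from collections import Counter

# Hypothetical mini-corpus; real IDs and abstracts would come from PubMed etc.
DOCS = {
    "pmid:001": "CRISPR guide RNA off-target effects measured by GUIDE-seq",
    "pmid:002": "GLP-1 agonists show weight loss in phase II NASH trial",
    "pmid:003": "Single-cell RNA-seq clustering with Leiden algorithm",
}

def embed(text):
    """Toy bag-of-words vector; production would use a learned embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Rank documents by similarity to the query and return the top k IDs."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(DOCS[d])), reverse=True)
    return ranked[:k]

def answer(query):
    """Assemble a citation-tagged context block; a real system would hand
    this to Claude for grounded synthesis instead of returning it."""
    hits = retrieve(query)
    return "\n".join(f"[{d}] {DOCS[d]}" for d in hits)

print(answer("off-target effects of CRISPR guide RNA"))
```

Prefixing each retrieved passage with its source ID is what makes the final synthesis citation-backed: the model can only quote IDs that were actually retrieved.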
<p>Fine-tunes and optimizes Claude foundation models on domain-specific scientific corpora to improve accuracy, reasoning depth, and task performance for specialized research applications across computational biology, chemistry, and physics.</p>
It's like teaching a brilliant generalist the specialized vocabulary and reasoning patterns of each scientific field so it stops confusing a Western blot with a Rorschach test.
Synthetic Sciences invests in adapting Anthropic's Claude foundation models to the unique demands of scientific domains through targeted fine-tuning on curated scientific corpora—peer-reviewed papers, experimental protocols, code repositories, and structured databases. For computational biology, this means training on genomics pipelines, protein structure data, and single-cell analysis workflows. For chemistry, the model ingests reaction databases, spectroscopy data, and molecular property datasets. For physics, it learns from simulation code, mathematical derivations, and experimental measurement conventions. This fine-tuning improves the model's ability to generate syntactically correct domain-specific code (e.g., Bioconductor in R, RDKit in Python), reason about experimental design trade-offs, interpret statistical results in context, and avoid common scientific errors (unit mismatches, inappropriate statistical tests, biologically implausible conclusions). The fine-tuned models are deployed as selectable options within the platform, allowing researchers to choose the model variant best suited to their discipline. Continuous feedback loops from user corrections further refine model performance over time, creating a flywheel effect where platform usage directly improves scientific accuracy.
It's like sending ChatGPT to grad school in three different departments simultaneously—except it actually finishes all three PhDs and remembers everything.
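Domain fine-tuning of the kind described above starts with a curated corpus of prompt/completion pairs. The sketch below shows one common serialization shape for supervised fine-tuning data; the examples and field names are illustrative assumptions, not Synthetic Sciences' actual training format.

```python
import json

# Hypothetical curated pairs linking scientific prompts to vetted code.
EXAMPLES = [
    {
        "prompt": "Normalize this count matrix with DESeq2 size factors",
        "completion": "dds <- estimateSizeFactors(dds)\n"
                      "counts(dds, normalized = TRUE)",
    },
    {
        "prompt": "Compute Morgan fingerprints for a list of SMILES strings",
        "completion": "from rdkit import Chem\n"
                      "from rdkit.Chem import AllChem\n"
                      "fps = [AllChem.GetMorganFingerprintAsBitVect("
                      "Chem.MolFromSmiles(s), 2) for s in smiles]",
    },
]

def to_jsonl(examples):
    """Serialize (prompt, completion) pairs as JSONL, one record per line,
    the shape most supervised fine-tuning pipelines ingest."""
    return "\n".join(json.dumps(e) for e in examples)

corpus = to_jsonl(EXAMPLES)
print(corpus.count("\n") + 1)  # → 2 records
```

Keeping completions as vetted, runnable domain code (Bioconductor R, RDKit Python) is what teaches the model the syntactic conventions each field expects, rather than generic programming style.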
Synthetic Sciences combines deep expertise with Anthropic's Claude API and elite competitive programming talent, enabling it to build agentic scientific workflows that go far beyond literature search, autonomously generating, executing, and iterating on backend research code in ways competitors like Elicit and Consensus cannot.