Shorten bug fix cycles and boost developer productivity.
Technology | Engineering | Generative AI
Automates code repair by using a fine-tuned LLM to analyze diagnostic errors and generate code fixes directly within the development environment, accelerating the debugging process.
Imagine your word processor not only highlighted spelling mistakes but also suggested how to rewrite the entire awkward sentence. Replit's Code Repair does that for programmers. It finds bugs and, instead of just pointing them out, it offers a ready-to-use fix, saving developers from the headache of figuring it out themselves.
Replit developed an automated code repair feature to address the significant time developers spend on debugging. The company observed that while the Language Server Protocol (LSP) identifies errors, it provides automated fixes in only about 10% of Python diagnostic cases on its platform. To close that gap, the company fine-tuned a large language model (LLM) and integrated it natively with its cloud-based development environment.
The approach leverages Replit's unique data streams, including edit-by-edit session histories known as Operational Transformations (OTs) and other session events like LSP diagnostics and code executions. The data pipeline reconstructs the project's state at the moment an error occurs. Instead of using noisy, user-generated fixes for training, the system uses LLMs to generate synthetic, high-quality line diffs as training targets. These synthetic fixes are then verified for correctness and applicability before being used.
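The state-reconstruction step can be illustrated with a minimal sketch. It assumes a simplified OT encoding in which each edit is a list of `skip`, `insert`, and `delete` operations applied left to right. The `Edit` class, the operation keys, and the timestamps are hypothetical stand-ins for illustration, not Replit's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical OT operation: exactly one of {"skip": n}, {"insert": s}, {"delete": n}.
Op = Dict[str, Union[int, str]]


@dataclass
class Edit:
    timestamp: float  # when the edit was recorded (assumed field)
    ops: List[Op]     # operations applied left to right over the file text


def apply_ops(text: str, ops: List[Op]) -> str:
    """Apply one edit's operations to the file text and return the new text."""
    cursor = 0
    out: List[str] = []
    for op in ops:
        if "skip" in op:
            n = int(op["skip"])
            out.append(text[cursor:cursor + n])  # keep n characters unchanged
            cursor += n
        elif "insert" in op:
            out.append(str(op["insert"]))        # add new text at the cursor
        elif "delete" in op:
            cursor += int(op["delete"])          # drop n characters
        else:
            raise ValueError(f"unknown operation: {op!r}")
    out.append(text[cursor:])                    # keep whatever follows the last op
    return "".join(out)


def reconstruct_at(initial: str, edits: List[Edit], error_time: float) -> str:
    """Replay edits in time order, stopping at the moment the diagnostic fired."""
    text = initial
    for edit in sorted(edits, key=lambda e: e.timestamp):
        if edit.timestamp > error_time:
            break
        text = apply_ops(text, edit.ops)
    return text


# Example session: the user adds a line, introduces a bug, then fixes it later.
session = [
    Edit(1.0, [{"insert": "x = 1\n"}]),
    Edit(2.0, [{"skip": 6}, {"delete": 8}, {"insert": "print(y)"}]),
    Edit(3.0, [{"skip": 6}, {"delete": 8}, {"insert": "print(x)"}]),
]
state_at_error = reconstruct_at("print(x)\n", session, error_time=2.5)
```

Replaying only the edits recorded before the diagnostic yields the file exactly as the language server saw it, which is the snapshot that would be paired with the error to form a training example.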
The core of the system is a 7B-parameter model, DeepSeek-Coder-Instruct-v1.5, chosen for its balance of performance, latency, and cost. The model was fine-tuned with supervised learning to predict a specific output format, a numbered line diff, given the LSP error, the problematic line of code, and the full file content as input. This structured input and output schema, which uses special tokens and explicit line numbers, ensures the generated fixes can be applied unambiguously and reliably within the IDE.
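To show why explicit line numbers make a fix unambiguous to apply, here is a minimal sketch of a numbered line diff. The format is an assumption made for illustration: each diff line is `-N text` (remove original line N, which must match `text`) or `+N text` (place `text` at original position N). The source does not specify the actual diff syntax or sentinel tokens, and `number_lines` and `apply_numbered_diff` are hypothetical helper names.

```python
import re
from typing import Dict, List, Set

# Assumed diff line shape: a sign, a 1-indexed line number, one space, the content.
DIFF_LINE = re.compile(r"^([+-])(\d+) ?(.*)$")


def number_lines(source: str) -> str:
    """Prefix every line with its 1-indexed number, as it might appear in a prompt."""
    return "\n".join(
        f"{i} {line}" for i, line in enumerate(source.splitlines(), start=1)
    )


def apply_numbered_diff(source: str, diff: str) -> str:
    """Apply a numbered line diff, rejecting it if it does not match the file."""
    lines = source.splitlines()
    removed: Set[int] = set()
    added: Dict[int, List[str]] = {}

    for raw in diff.splitlines():
        match = DIFF_LINE.match(raw)
        if match is None:
            raise ValueError(f"malformed diff line: {raw!r}")
        sign, number, content = match.group(1), int(match.group(2)), match.group(3)
        if sign == "-":
            # A removal only applies if the numbered line really holds that text.
            if not 1 <= number <= len(lines) or lines[number - 1] != content:
                raise ValueError(f"diff does not match file at line {number}")
            removed.add(number)
        else:
            if not 1 <= number <= len(lines) + 1:
                raise ValueError(f"line {number} is out of range")
            added.setdefault(number, []).append(content)

    out: List[str] = []
    for i in range(1, len(lines) + 2):
        out.extend(added.get(i, []))           # new text lands at position i
        if i <= len(lines) and i not in removed:
            out.append(lines[i - 1])           # keep untouched original lines
    return "\n".join(out) + "\n"


buggy = "x = 1\nprint(y)\n"
fixed = apply_numbered_diff(buggy, "-2 print(y)\n+2 print(x)")
```

Because every removal names both a line number and the expected text, a diff generated against a stale or different version of the file is rejected instead of silently patching the wrong line. The same kind of check could serve as a cheap applicability filter when verifying synthetic fixes.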
It's like having a master mechanic looking over your shoulder while you work on a car. The moment a tool slips or you grab the wrong part, they do not just grunt; they hand you the correct tool and show you exactly how to use it, getting you back on track in seconds.
4/5
The project's novelty stems from its data engineering, which uses proprietary real-time user interaction data (Operational Transformations) to precisely reconstruct the coding environment at the moment of an error. Combined with using LLMs to synthetically generate and verify a clean training dataset, this yields a context-aware feature that goes beyond standard fine-tuning on public code. While other tools use codebase context, Replit's method of recreating the exact temporal state is a leading-edge practical application of generative AI.
Timeline: 10 months
Cost: $1,331,650
Headcount: 11