Improved retrieval-stage classification accuracy (acc@k) by up to 60% while standardizing industry data on NAICS codes.
Finance | Data | Generative AI
Ramp built an in-house RAG system to automatically classify businesses into standardized industry codes, replacing their inconsistent homegrown classification system.
Ramp needed a better way to categorize what type of business their customers operate. Their old system was inconsistent and often put very different companies into the same vague category like "Professional Services." They built a smart system that reads business descriptions and automatically assigns the correct standardized industry code, making it much easier for different teams to understand and serve customers appropriately.
Ramp migrated from a problematic homegrown industry classification system to standardized NAICS codes using a custom Retrieval-Augmented Generation (RAG) system. The old system suffered from four major issues: incorrect categories, overly generic classifications, inconsistent categorization of similar businesses, and lack of auditability. For example, diverse companies like law firms, dating apps, and consulting firms were all lumped into "Professional Services."
The RAG system operates in three stages: embedding generation creates text embeddings for business queries and the NAICS knowledge base; similarity search computes scores to produce recommended NAICS codes; and LLM-based selection makes the final classification decision. The first two stages achieved up to a 60% improvement in accuracy at k (acc@k), while prompt engineering and parameter tuning in the LLM stage delivered a 5-15% improvement in fuzzy accuracy.
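The similarity-search stage and the acc@k metric can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the three-dimensional toy vectors stand in for real text embeddings, the function names are hypothetical, and cosine similarity is one common scoring choice rather than a confirmed detail of Ramp's system. The LLM selection stage is omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_codes(query_vec, knowledge_base, k):
    """Return the k codes whose embeddings are most similar to the query."""
    ranked = sorted(
        knowledge_base.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [code for code, _ in ranked[:k]]

def acc_at_k(recommendations, true_codes):
    """Fraction of queries whose true code appears in its top-k list."""
    hits = sum(1 for recs, truth in zip(recommendations, true_codes) if truth in recs)
    return hits / len(true_codes)

# Toy knowledge base: code -> made-up embedding (not real model output).
KB = {
    "541110": [1.0, 0.0, 0.0],
    "541611": [0.7, 0.7, 0.0],
    "722511": [0.0, 0.0, 1.0],
}

query = [0.9, 0.1, 0.0]  # hypothetical embedding of a business description
candidates = top_k_codes(query, KB, k=2)
print(candidates)
```

In a pipeline of this shape, the candidate list would then be passed to the LLM stage, and acc@k would be computed over a labeled set of businesses to compare embedding models or values of k.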
The technical infrastructure uses ClickHouse for fast embedding storage and retrieval, Kafka for comprehensive logging of intermediate results, and internal services for embeddings and LLM prompts. Guardrails filter out invalid NAICS code outputs to prevent hallucinations. This approach gives Ramp full control over the algorithm, enabling real-time hyperparameter adjustments and rapid iteration compared to third-party solutions.
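One way such a guardrail could work is to accept an LLM answer only if it contains a code from the known NAICS set. The sketch below is an assumption, not Ramp's implementation: the function name is hypothetical, and it assumes codes are six-digit strings and that the valid set is loaded elsewhere.

```python
import re

def first_valid_code(llm_output, valid_codes):
    """Return the first six-digit code in the LLM output that is a known
    code, or None if the output contains no valid code."""
    for candidate in re.findall(r"\b\d{6}\b", llm_output):
        if candidate in valid_codes:
            return candidate
    return None

# Hypothetical valid set; a real system would load the full NAICS list.
VALID = {"541110", "541611"}

print(first_valid_code("The best match is 999999, or possibly 541110.", VALID))
print(first_valid_code("I am not sure.", VALID))
```

A `None` result signals that the output should be rejected, at which point a caller might retry the prompt or fall back to the top similarity-search candidate.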
It's like having a librarian who used to randomly shelve books wherever there was space, making it impossible to find anything. Now they have a smart filing system that reads each book's summary and automatically places it in the correct section using the standard library classification system, so everyone knows exactly where to look.
4/5
Strong applied innovation in finance with a custom RAG system, but primarily an advanced deployment of existing techniques rather than new algorithmic breakthroughs.
Timeline: 9 months
Cost: $1,100,000
Headcount: 8