Why ChatGPT Failed Your SKU Matching

And How RAG Fixes It


A surgical instruments distributor came to me with a challenge: thousands of SKUs in their catalog, and customers describing products in ways that never quite match the official names. "That clamp thingy for vascular surgery" could be any of fifty products. Manual matching was eating hours every day.

They'd tried ChatGPT. The results? "Relatively weak," they said. Here's why that happened—and what actually works.

The Problem: Your Catalog Isn't in ChatGPT's Training Data

ChatGPT has never seen your SKU list. It doesn't know your product codes, your proprietary names, or your specific catalog structure. So it guesses—and three fundamental problems emerge:

  • Hallucination risk: The model confidently generates product codes that don't exist
  • Domain terminology gaps: Medical terminology has variations, abbreviations, and synonyms that ChatGPT may misinterpret
  • No source of truth: Without access to your actual catalog, every answer is a guess

Enter RAG: Give the AI Your Data at Query Time

Retrieval Augmented Generation (RAG) takes a different approach: instead of baking your catalog into the model, you give it access to your data at query time. Here's how:

[Pipeline diagram] Customer inquiry ("clamp for vessels") → vector search over your embedded catalog → LLM compares the candidates (Claude picks the best match) → best match ("VC-1204-C")
1. Retrieval: the customer inquiry is matched against your catalog. Example: "Vascular clamp, curved" → finds 5 similar products.
2. Augmentation: the retrieved products are added to the AI prompt. Example: the LLM sees actual product specs, codes, and prices.
3. Generation: the AI reasons about which product best matches. Example: "Based on specifications, VC-1204-C is the best match."

The key insight: the LLM's role shifts from "knowing everything" to "reasoning well." It doesn't need to memorize your catalog—it just needs to intelligently compare what the customer asked for against the actual products you retrieved.

How RAG Works for SKU Mapping (Technical but Accessible)

Let me break down the technical implementation without drowning you in code.

Step 1: Embed Your Product Catalog

First, you convert your catalog into a searchable format. Each product description gets transformed into a numerical representation (an "embedding") that captures its semantic meaning.

"14cm curved hemostatic forceps with ratchet lock" becomes a point in high-dimensional space where similar products cluster together. This lets you find semantically similar items even when the exact words don't match.
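As a minimal sketch of the idea: here a toy bag-of-words vector stands in for a real embedding model (in production you'd call an embedding API such as Voyage AI or Google's text embeddings and get a dense float vector back), and the second SKU is invented for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector. A real embedding
    # model returns a dense vector that also captures synonyms.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

catalog = {
    "VC-1204-C": "vascular clamp curved 14cm",          # from the article
    "HF-0730-S": "hemostatic forceps straight ratchet", # hypothetical SKU
}
vectors = {sku: embed(desc) for sku, desc in catalog.items()}

# Descriptions that share vocabulary score higher than unrelated ones.
query = embed("curved vascular clamp")
print(cosine(query, vectors["VC-1204-C"]))  # high
print(cosine(query, vectors["HF-0730-S"]))  # near zero
```

A real embedding model goes further: it would also score "clamp" and "forceps" as related, which simple word counting cannot.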

Step 2: Search for Similar Products

When a customer inquiry comes in, you embed their description the same way and search for the nearest products in that semantic space. This is called "vector search" or "similarity search."

The beauty: "that clamp thingy for blood vessels" and "hemostatic forceps" end up near each other in embedding space, even though they share few words.
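A sketch of the search step, assuming the embeddings already exist. The four-dimensional vectors and the extra SKUs below are made up for illustration; real embeddings have hundreds of dimensions, and a vector database (Pinecone, pgvector, ChromaDB) replaces the brute-force loop with an index.

```python
import math

# Pretend these came from an embedding model (values are illustrative).
catalog_vectors = {
    "VC-1204-C": [0.9, 0.1, 0.3, 0.0],  # vascular clamp, curved
    "HF-0730-S": [0.2, 0.8, 0.1, 0.4],  # hemostatic forceps, straight
    "RS-0411-B": [0.1, 0.1, 0.9, 0.2],  # retractor, self-retaining
}

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec: list, vectors: dict, k: int = 2) -> list:
    # Brute-force similarity search: rank every product by cosine
    # similarity to the query and keep the top k candidates.
    ranked = sorted(vectors, key=lambda sku: cosine(query_vec, vectors[sku]),
                    reverse=True)
    return ranked[:k]

# A query embedding that lands near the vascular-clamp region.
query = [0.85, 0.15, 0.25, 0.05]
print(nearest(query, catalog_vectors))
```

The top results become the candidate list handed to the LLM in the next step.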

Step 3: Let the LLM Decide

You present the top candidate products to the LLM with instructions: "Given this customer inquiry and these product options, which is the best match? Explain your reasoning."

The LLM can now make an informed decision because it has real data to work with. It's not guessing—it's reasoning.
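The augmentation step can be as simple as assembling a prompt from the retrieved candidates. A sketch (the second SKU and the dict shape are assumptions for illustration; the assembled string would then be sent to the LLM, e.g. via the Anthropic Messages API):

```python
def build_prompt(inquiry: str, candidates: list) -> str:
    # The candidate list comes from the vector search step, so the
    # LLM only chooses among real products instead of inventing codes.
    option_lines = "\n".join(
        f"- {c['sku']}: {c['description']}" for c in candidates
    )
    return (
        "Given this customer inquiry and these product options, "
        "which is the best match? Explain your reasoning.\n\n"
        f"Inquiry: {inquiry}\n\nCandidates:\n{option_lines}"
    )

candidates = [
    {"sku": "VC-1204-C", "description": "Vascular clamp, curved, 14cm"},
    {"sku": "VC-1187-S", "description": "Vascular clamp, straight, 12cm"},
]
prompt = build_prompt("that clamp thingy for blood vessels, curved", candidates)
print(prompt)
```

Because every candidate in the prompt is a real catalog entry, a hallucinated product code is caught immediately: it simply isn't in the list.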

Implementation Approaches

There are several ways to implement RAG, depending on your technical resources:

  • DIY: Voyage AI or Google Embeddings plus Pinecone, Supabase pgvector, or ChromaDB. Best for full control and cost optimization.
  • Low-Code: LangChain, LlamaIndex, or Haystack. Best for rapid prototyping and flexibility.
  • Enterprise: Azure AI Search, AWS Bedrock, or Google Vertex AI. Best for scale, security, and compliance.

For a surgical instruments company, I'd likely start with LangChain and Supabase pgvector—fast to prototype, easy to integrate with existing systems, and cost-effective at moderate scale.

Real-World Considerations

RAG isn't magic. Here's what separates successful implementations from failed ones:

Data Quality Matters

"Garbage in, garbage out" applies doubly here. If your product descriptions are inconsistent, poorly written, or missing key details, the embeddings won't capture the right semantics.

Before building RAG, audit your catalog. Standardize descriptions. Add synonyms and common misspellings. The AI can only match what's in your data.

Handling Medical Terminology

Surgical instruments have particularly tricky terminology. Abbreviations ("hemo" for hemostatic), brand names vs. generic names, and regional variations all complicate matching.

Solutions: include synonyms in your product data, use medical-specific embedding models, or add a preprocessing step that expands abbreviations.
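The preprocessing idea can be sketched in a few lines. The abbreviation map below is hypothetical; a real one would be built with the distributor's product specialists and applied to inquiries before embedding them.

```python
# Hypothetical abbreviation map; extend with input from domain experts.
EXPANSIONS = {
    "hemo": "hemostatic",
    "vasc": "vascular",
    "fcps": "forceps",
}

def expand_abbreviations(text: str) -> str:
    # Normalize abbreviations before embedding, so "hemo fcps" and
    # "hemostatic forceps" land in the same region of embedding space.
    return " ".join(EXPANSIONS.get(tok, tok) for tok in text.lower().split())

print(expand_abbreviations("Hemo fcps curved"))
```

The same table can hold brand-name → generic-name mappings and common misspellings.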

Confidence Thresholds and Human-in-the-Loop

Not every match will be certain. Build in confidence scoring—when the top matches are close, flag them for human review rather than guessing.

For high-stakes industries like medical equipment, a human-in-the-loop workflow is essential. The AI handles the 80% of clear matches; humans verify the edge cases.
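The routing logic is a small function. A sketch, with illustrative thresholds (in practice you'd tune `accept` and `margin` against a labeled test set):

```python
def route(matches: list, accept: float = 0.85, margin: float = 0.10) -> tuple:
    # `matches` is a list of (sku, score) pairs sorted by descending score.
    # Auto-accept only when the top match is both strong in absolute terms
    # and clearly ahead of the runner-up; otherwise flag for human review.
    top_sku, top_score = matches[0]
    runner_up = matches[1][1] if len(matches) > 1 else 0.0
    if top_score >= accept and (top_score - runner_up) >= margin:
        return ("auto", top_sku)
    return ("review", top_sku)

print(route([("VC-1204-C", 0.93), ("VC-1187-S", 0.61)]))  # clear winner
print(route([("VC-1204-C", 0.78), ("VC-1187-S", 0.74)]))  # too close to call
```

The margin check matters as much as the absolute threshold: two candidates scoring 0.90 and 0.89 is exactly the ambiguity a human should resolve.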

Measuring Success

Track metrics that matter: match accuracy, time saved per inquiry, reduction in manual lookups, and customer satisfaction. Build a test set of known inquiry→product mappings to benchmark against.
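Benchmarking against that test set is straightforward. A sketch, with a hypothetical two-item test set and a stub predictor standing in for the full RAG pipeline:

```python
# Hypothetical labeled test set: known inquiry → correct-SKU pairs.
test_set = [
    ("clamp for vessels, curved", "VC-1204-C"),
    ("hemo forceps with lock", "HF-0730-S"),
]

def match_accuracy(predict, test_set: list) -> float:
    # Fraction of inquiries where the pipeline's top match is correct.
    correct = sum(1 for inquiry, sku in test_set if predict(inquiry) == sku)
    return correct / len(test_set)

# Stub predictor for illustration; in practice this runs the whole
# embed → search → LLM pipeline and returns the chosen SKU.
def fake_predict(inquiry: str) -> str:
    return "VC-1204-C" if "clamp" in inquiry else "HF-0730-S"

print(match_accuracy(fake_predict, test_set))
```

Re-running this benchmark after every change (new embedding model, new synonyms, new prompt) tells you whether the change actually helped.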

Why This Matters Beyond SKU Matching

RAG is a pattern, not a product. The same approach works for:

  • Customer support: Match inquiries to relevant documentation
  • Legal research: Find relevant precedents and clauses
  • Sales enablement: Surface relevant case studies and materials
  • Internal knowledge: Answer questions from company documentation

Any time you need AI to reason about your proprietary data, RAG is the answer. It bridges the gap between powerful language models and the specific knowledge that lives in your systems.


Case study based on a real engagement with a surgical instruments distributor. Details anonymized to protect client confidentiality.


Struggling with Data Matching Challenges?

We build RAG systems that connect AI to your business data—product catalogs, documentation, knowledge bases. Let's discuss your use case.


About the Author

David Liew learned the languages of business—numbers under Unity's global CFO and at Meta, operating as employee #1 scaling SG Code Campus from $100K to $2M, and systems as a full-stack builder. AI became his force multiplier. He now translates complexity into practical solutions for Singapore SMEs.
