Hosted /search tuned for messy catalogs

Search infrastructure that can handle Hindi, Hinglish, misspellings, and brand drift.

You bring a catalog. We tune a domain-specific search stack on it and hand back a hosted /search endpoint. Same integration shape as a search SaaS, but built for how your customers actually type.

Reference domain: Indian auto parts, because it is one of the worst-behaved search surfaces anywhere.
Current stack: Hybrid BM25 + fine-tuned BGE-m3 + class-routed reranker + Hinglish bridge vocabulary.
Delivery model: Catalog in, benchmark out, hosted endpoint back. No in-house IR team required.
Product direction: The endpoint is the wedge. The real product is the agentic tuning system behind it.
Where defaults fail

Auto parts is the example. The pattern generalizes.

The problem is not “search” in the abstract. It is domain language, transliteration, catalog messiness, and user behavior that default search products are not tuned for.

Code-switched input

Customers switch between Hindi, Romanized Hindi, and English inside the same search journey. Default tokenization and vocabulary coverage are not built for that.
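A bridge vocabulary is one way to handle this: map known romanized-Hindi tokens onto the catalog's English terms before retrieval. A minimal sketch, where `BRIDGE` is a hypothetical four-entry excerpt standing in for the full dictionary:

```python
# Sketch: normalizing code-switched queries with a bridge vocabulary.
# BRIDGE is a hypothetical excerpt for illustration; the production
# dictionary covers thousands of Hinglish/romanized-Hindi terms.
BRIDGE = {
    "brek": "brake",
    "gaadi": "car",
    "tel": "oil",
    "batti": "light",
}

def normalize_query(query: str) -> str:
    """Replace known Hinglish tokens with their catalog-English forms."""
    tokens = query.lower().split()
    return " ".join(BRIDGE.get(t, t) for t in tokens)

# normalize_query("brek pad") -> "brake pad"
```

Unknown tokens pass through untouched, so English queries are unaffected.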

Messy lexical reality

Queries arrive as `brek pad`, symptom language, OEM-ish part numbers, and genericized brand names. Good search needs routing, not one universal ranking rule.
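Routing starts with classifying the query. A heuristic sketch of that idea, with illustrative rules and cue words that are assumptions, not the production classifier:

```python
import re

# Sketch: heuristic query-class routing. The rules and cue words below
# are illustrative assumptions; each class gets its own retrieval path.
def classify_query(query: str) -> str:
    q = query.strip().lower()
    # OEM-ish part numbers: a single long alphanumeric token with digits
    if re.fullmatch(r"[a-z0-9\-]{6,}", q) and any(c.isdigit() for c in q):
        return "part_number"
    # Symptom language: the user describes a problem, not a part
    symptom_cues = {"noise", "leak", "vibration", "not", "problem"}
    if symptom_cues & set(q.split()):
        return "symptom"
    return "keyword"

# classify_query("31110-2s000")          -> "part_number"
# classify_query("engine noise on start") -> "symptom"
```

Part-number queries can then skip semantic retrieval entirely, while symptom queries lean on embeddings.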

Catalog-specific adaptation

The work is reproducible: tune tokenizer assumptions, bridge vocabulary, retrieval weights, and reranking behavior against a benchmark built from the catalog’s own reality.

Benchmark

Measured against the right baseline

This is a 149-query graded evaluation over a 26,835-document Indian auto-parts catalog. The default baseline is raw Meilisearch, used as a proxy for default SaaS search behavior on Indian catalogs. OpenAI stays here as a research ceiling, not as the product competitor.

| Metric | Default SaaS | Our /search | OpenAI ceiling |
| --- | --- | --- | --- |
| Overall nDCG@10 | 0.23 | 0.45 | 0.47 |
| Hindi / Hinglish | 0.14 | 0.47 | 0.54 |
| Misspelled queries | 0.20 | 0.76 | 0.53 |
| Symptom queries | 0.14 | 0.50 | 0.55 |
| Zero-result queries | 44 / 149 | 0 / 149 | n/a |

Default SaaS = raw keyword search without Indic-aware preprocessing, Hinglish bridge vocabulary, hybrid fusion, or reranking. That is the category-level failure mode we are targeting.
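For reference, nDCG@10 is computed per query as the discounted cumulative gain of the returned top 10, normalized by the gain of an ideal ranking. A minimal sketch (function names are illustrative, not part of the product API):

```python
import math

# Sketch: nDCG@10 for one graded query, the metric used in the table.
def dcg(gains):
    """Discounted cumulative gain: graded relevance, log-discounted by rank."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_10(ranked_gains, all_gains):
    """Normalize DCG of the returned top 10 by the best achievable DCG."""
    ideal = sorted(all_gains, reverse=True)[:10]
    denom = dcg(ideal)
    return dcg(ranked_gains[:10]) / denom if denom else 0.0

# A perfect ranking scores 1.0:
# ndcg_at_10([3, 2, 1], [3, 2, 1]) -> 1.0
```

The benchmark numbers above are the mean of this per-query score over all 149 queries.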

[Chart: benchmark comparison of default SaaS search, our tuned search, and the OpenAI ceiling]
What you get

Not a model dump. A working search surface.

The delivery is a scoped search API and a benchmark-backed configuration, not a vague “AI search” promise.

Search stack

  • Fine-tuned BGE-m3 multilingual embedding model
  • BM25 retrieval with custom Indic tokenization
  • 2,700-pair Hinglish bridge dictionary
  • Query-class routing and cross-encoder reranking where it helps
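The hybrid step merges the BM25 and embedding result lists into one ranking. Reciprocal-rank fusion is one common way to do that; a sketch under that assumption (the stack's exact fusion method is not specified here):

```python
# Sketch: reciprocal-rank fusion (RRF) of two ranked result lists.
# RRF is an assumed fusion choice for illustration, not necessarily
# the production method. k=60 is the conventional RRF constant.
def rrf_fuse(bm25_ranking, vector_ranking, k=60):
    scores = {}
    for ranking in (bm25_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1/(k + rank); top ranks dominate.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Documents near the top of either list rise in the fused order.
fused = rrf_fuse(["d1", "d2", "d3"], ["d3", "d1", "d4"])
```

RRF needs no score calibration between the lexical and vector retrievers, which is why it is a common default for hybrid stacks.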

Delivery shape

  • You share a catalog sample and real user queries
  • We benchmark the default state against the tuned state
  • You get a hosted /search endpoint and a clear lift narrative
  • Integration stays close to the way teams already consume search APIs
Process

The bigger product is the playbook

Auto parts is the first reference implementation. The long-term product is the repeatable, agentic tuning workflow that reproduces this result on other catalog domains.

1. Catalog + query intake. Start with the messy reality: product rows, user queries, failure cases, and category nuance.
2. Benchmark and error taxonomy. Build the evaluation set, classify where default search breaks, and measure against the right baseline.
3. Tune retrieval, routing, and ranking. Adjust the stack to the catalog instead of pretending one generic retrieval recipe is enough.
4. Ship the endpoint. Hand back a scoped API plus the benchmark narrative that proves why it is better than the default path.