
Designing Multilingual Search for Patents, Law, and Academic Papers - From BM25 to RRF and Cross-encoders

Specialist-domain multilingual search is not just a multilingual embedding plus RAG. This post breaks it down into multilingual BM25, language-specific analyzers, cross-lingual matching, sentence-to-document RRF fusion, and a side-by-side OpenSearch+Qdrant vs Elasticsearch+Milvus build-out.

Data Dynamics · May 18, 2026 · 22 min read

This post was written in response to a reader's request — "please cover multilingual search in domains like patents, law, and academic literature: not just multilingual embeddings and LLM RAG, but keyword matching (multilingual BM25), multilingual similarity, sentence-to-document RRF strategies." Thank you for the excellent prompt — this post is the answer.

Search for patents, law, and academic papers differs from web search in one decisive way: a single missed result can mean losing an invalidation trial, missing the controlling precedent in oral argument, or shipping a literature review that ignores prior work. Meanwhile users ask in Korean, but the answers live in English claims, Japanese rulings, and Chinese journals.

This post tackles that gap without trying to solve it with one embedding model, decomposing it into four axes:

Multilingual keyword matching (BM25 plus per-language analyzers)
Cross-lingual information retrieval (CLIR)
Multilingual semantic similarity
Sentence-to-document RRF fusion plus cross-encoder reranking

The general RAG pipeline is covered in the RAG Complete Guide. This post goes deep on the retrieval stage only.

1. Why "multilingual embedding + RAG" alone falls short

Specialist-domain search differs from generic search for three decisive reasons.

(1) The vocabulary itself is the answer. A patent claim's "a plurality of", a legal "third party", a paper's "p < 0.05" — these phrases cannot be paraphrased without changing legal or scientific meaning. Embedding models are trained to pull semantically similar text together, which makes them weakest exactly where you need exact lexical matching: abbreviations, proper nouns, statute numbers, chemical formulas.

(2) Recall has to approach 1.0. Web search succeeds if the answer is in the top three. Patent prior-art search has to find that one filing that might exist somewhere in the world. Precision can be fixed by a human reviewer in the second stage; recall losses stay invisible, because nobody reviews a document that was never retrieved.

(3) The unit is not the sentence — it is the document. One patent has dozens of claims and hundreds of specification paragraphs. The user wants the most relevant patent, not the most relevant claim. Score by chunk and present the chunk list, and five chunks from the same patent will dominate the top five results.

These three properties together mean the naive "shove all docs into a multilingual embedder, top-k into an LLM" RAG fails on day one in this domain.

Takeaway: specialist-domain search must be evaluated along three axes — lexical precision × domain recall × document-level aggregation.

2. Per-domain requirement decomposition

"Specialist search" is not one problem. Units, ranking signals, and evaluation criteria all differ across the three domains. Spec them out before you touch the index.

Item	Patents	Law	Academic papers
Search unit	1 patent (application no.)	1 case or 1 article of a statute	1 paper (DOI)
Chunk unit	Claim, spec paragraph	Holding paragraph, reasoning section	Abstract, section, figure caption
Hard-match signals	IPC class, applicant	Court, instance, year, cited cases	Authors, journal, citation graph
Critical vocabulary	Claim wording, drawing references	Statute numbers, case numbers	Taxonomic names, formulas, abbreviations
Recall standard	Even one miss = invalidation risk	Missing the key case = losing	Missing the key prior work
Multilingual pattern	en/ko/ja/zh parallel filings	Local language + English summary	English body + non-English abstract

These three deserve separate indices, separate weights, and separate evaluation sets, even if they share the same engine.

3. Multilingual keyword matching: pushing BM25 across languages

BM25 is a 1990s algorithm, but in specialist domains it is still the strongest single signal. The trick to taking it multilingual lies not in the algorithm but in per-language analyzers and field weights.

3.1 Per-language analyzer map

Language	OpenSearch / Elasticsearch analyzer	Strategy	Caveats
Korean	`nori` (`analysis-nori` plugin; preinstalled on Amazon OpenSearch Service)	Morphological + compound noun decomposition	Drop particles with `nori_part_of_speech`
Japanese	`kuromoji`	Morphological + kana/kanji normalization	Kyūjitai→shinjitai (old→new character form) conversion needed
Chinese	`smartcn` or `ik`	Word segmentation	Traditional↔simplified normalization (`stconvert`)
English	`standard` + `english` stemmer	Stemming + lowercase	Disable stemmer for domain abbreviations
DE/FR/ES	Per-language stemmer	Stemming	Compound splitter (`hyphenation_decompounder`)

3.2 Same Korean sentence, four analyzers

Looking at what each analyzer emits makes the choice obvious.

Input: "This invention relates to a channel estimation method in wireless communication systems." (Korean: "본 발명은 무선 통신 시스템에서의 채널 추정 방법에 관한 것이다.")

Analyzer	Tokens
`standard` (the default analyzer)	본, 발명은, 무선, 통신, 시스템에서의, 채널, 추정, 방법에, 관한, 것이다
`nori_tokenizer` alone (no token filters)	본, 발명, 은, 무선, 통신, 시스템, 에서, 의, 채널, 추정, 방법, 에, 관한, 것, 이다
`nori` + POS filter	발명, 무선, 통신, 시스템, 채널, 추정, 방법
CJK bigram (fallback)	본발, 발명, 명은, 은무, 무선, … (n-gram)

standard glues "시스템에서의" into one token so a search for "시스템" finds nothing. nori + POS filter compresses to seven content words, giving the cleanest BM25 signal. The CJK bigram fallback is the dictionary-less safety net for new terms and proper nouns — always index it alongside.
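To see why the bigram fallback rescues a query like "시스템", here is a minimal pure-Python sketch. It is a simplification for illustration, not a reproduction of the `cjk_bigram` token filter: the function name `char_bigrams` and the punctuation list are assumptions, and bigrams are formed inside each whitespace-delimited token only.

```python
def char_bigrams(text: str) -> list[str]:
    """Dictionary-free character bigrams, computed per whitespace token."""
    grams: list[str] = []
    for token in text.split():
        # Strip surrounding punctuation so "것이다." and "것이다" index identically.
        token = token.strip(".,;:!?()[]\"'")
        if not token:
            continue
        if len(token) == 1:
            grams.append(token)  # keep single characters as unigrams
            continue
        grams.extend(token[i:i + 2] for i in range(len(token) - 1))
    return grams


query_grams = set(char_bigrams("시스템"))
doc_grams = set(char_bigrams("무선 통신 시스템에서의 채널 추정"))
# Every query bigram also occurs in the document, even though the
# whitespace token "시스템에서의" never equals "시스템".
overlap = query_grams & doc_grams
```

Because every bigram of the query appears inside the longer token, the bigram field still matches where a whole-token comparison fails.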

3.3 OpenSearch index mapping example (patents)

{
  "settings": {
    "analysis": {
      "analyzer": {
        "ko_nori": {
          "type": "custom",
          "tokenizer": "nori_tokenizer",
          "filter": ["nori_part_of_speech", "lowercase", "ko_synonyms"]
        },
        "ko_bigram": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "cjk_bigram"]
        },
        "en_domain": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "english_minimal_stem", "ipc_synonyms"]
        }
      },
      "filter": {
        "english_minimal_stem": { "type": "stemmer", "language": "minimal_english" },
        "ko_synonyms": { "type": "synonym_graph", "synonyms_path": "synonyms/ko-patent.txt" },
        "ipc_synonyms": { "type": "synonym_graph", "synonyms_path": "synonyms/ipc.txt" }
      }
    }
  },
  "mappings": {
    "properties": {
      "application_no": { "type": "keyword" },
      "ipc":             { "type": "keyword" },
      "filing_date":     { "type": "date" },
      "applicant":       { "type": "keyword", "fields": { "text": { "type": "text", "analyzer": "ko_nori" } } },
      "title_ko":        { "type": "text", "analyzer": "ko_nori",
                           "fields": { "bigram": { "type": "text", "analyzer": "ko_bigram" } } },
      "title_en":        { "type": "text", "analyzer": "en_domain" },
      "claims_ko":       { "type": "text", "analyzer": "ko_nori" },
      "claims_en":       { "type": "text", "analyzer": "en_domain" },
      "abstract_ko":     { "type": "text", "analyzer": "ko_nori" },
      "abstract_en":     { "type": "text", "analyzer": "en_domain" }
    }
  }
}

Two things to notice: (a) the same field indexed by two analyzers (title_ko + title_ko.bigram) and (b) separate per-language fields (*_ko, *_en) so queries can re-weight by the user's language.

3.4 BM25F: field weights encode domain knowledge

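Claims weigh more than the spec, and titles weigh more than abstracts. A boosted multi-field BM25 query encodes that prior. Strictly speaking this approximates BM25F rather than implementing it: `best_fields` scores each field separately, takes the best one, and adds `tie_breaker` times the scores of the remaining fields.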

{
  "query": {
    "multi_match": {
      "query": "wireless communication channel estimation",
      "type": "best_fields",
      "fields": [
        "title_en^4",
        "claims_en^3",
        "abstract_en^2",
        "title_ko^2",
        "claims_ko^1.5"
      ],
      "tie_breaker": 0.3
    }
  }
}

Tune weights by grid search against the evaluation set from chapter 7. The heuristic "claims are 1.5–3× more important than spec" holds across nearly every patent corpus we have seen.
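One way to keep those weights tunable is to generate the `fields` list instead of hand-writing it. The sketch below is an illustration under assumptions: `FIELD_WEIGHTS`, `build_multi_match`, and the `cross_lang_discount` parameter are hypothetical names, and the 0.5 discount simply reproduces the `title_ko^2` / `claims_ko^1.5` ratios from the query above.

```python
# Base weights for the user's own language (values taken from the query above).
FIELD_WEIGHTS = {"title": 4.0, "claims": 3.0, "abstract": 2.0}


def build_multi_match(
    query: str,
    user_lang: str,
    other_langs: tuple[str, ...] = (),
    cross_lang_discount: float = 0.5,
    tie_breaker: float = 0.3,
) -> dict:
    """Build a multi_match body with full weights for the user's language
    and discounted weights for every other indexed language."""
    fields = [f"{name}_{user_lang}^{w:g}" for name, w in FIELD_WEIGHTS.items()]
    for lang in other_langs:
        fields += [
            f"{name}_{lang}^{w * cross_lang_discount:g}"
            for name, w in FIELD_WEIGHTS.items()
        ]
    return {
        "query": {
            "multi_match": {
                "query": query,
                "type": "best_fields",
                "fields": fields,
                "tie_breaker": tie_breaker,
            }
        }
    }


body = build_multi_match("wireless communication channel estimation", "en", ("ko",))
```

A grid search then only has to vary `FIELD_WEIGHTS` and `cross_lang_discount` rather than edit query JSON.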

Takeaway: multilingual BM25 quality is decided by analyzer × field split × synonym dictionary, not the algorithm.

4. Query, translation, and cross-lingual matching (CLIR)

A user enters "무선 통신 채널 추정" in Korean and you have to find an English patent for "channel estimation in wireless communication". Three approaches.

Strategy	How	Pros	Cons
(A) Query Translation	Translate the query into every indexed language, run BM25 separately	Index untouched, simple	Short queries translate badly
(B) Document Translation	Pre-translate every document to every language, single index	Fastest at query time	Index size × N, large translation cost
(C) Cross-lingual Embedding	Dense search in a language-agnostic embedding space	Single representation for query and doc	Loses exact lexical matches

Recommended in practice: (A) + (C) hybrid. Translate the query with a domain glossary first, complement with a separate multilingual encoder, then fuse with the chapter-6 RRF. Avoid (B) in domains like patent claims and statutes where translation itself can change legal meaning.

4.1 Glossary-driven query expansion

LLM translation might render "채널 추정" as "channel guess". A pinned domain glossary is safer.

GLOSSARY_KO_EN = {
    "채널 추정": ["channel estimation"],
    "무선 통신": ["wireless communication", "radio communication"],
    "직교 주파수 분할 다중화": ["OFDM", "orthogonal frequency-division multiplexing"],
    "제3자":     ["third party"],
    "선행기술":   ["prior art"],
}
 
def expand_query(q_ko: str) -> dict:
    en_terms = []
    for ko_term, en_list in GLOSSARY_KO_EN.items():
        if ko_term in q_ko:
            en_terms.extend(en_list)
    return {"ko": q_ko, "en": " ".join(en_terms) or None}

The expanded query is matched against the *_ko and *_en fields from 3.3 in a single multi-match query.

4.2 Tokens that must never reach the translator

Chemical and math formulas: H2SO4, O(n log n)
Abbreviations and proper nouns: OFDM, LSTM, K-NN
Statute and case numbers: Supreme Court 2019Da12345
IPC/CPC class codes: H04L 27/26
Drawing reference numerals: 100, 100a

Standard practice: regex-mask before translation, then unmask the output.
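A minimal sketch of that mask/unmask step follows. The patterns are illustrative assumptions covering only the examples listed above (they are neither exhaustive nor production-grade), and the `__TERMn__` placeholder format is a hypothetical choice; pick one your translator is known to leave untouched.

```python
import re

# Illustrative patterns only; extend per domain.
PROTECTED = re.compile(
    r"\b(?:"
    r"[A-H]\d{2}[A-Z]\s\d{1,4}/\d{2,6}"   # IPC/CPC class codes, e.g. H04L 27/26
    r"|\d{4}[A-Z][a-z]{1,2}\d+"           # romanized case numbers, e.g. 2019Da12345
    r"|[A-Z]+(?:-[A-Z]+)+"                # hyphenated abbreviations, e.g. K-NN
    r"|(?:[A-Z][a-z]?\d*){2,}"            # formulas and acronyms, e.g. H2SO4, OFDM
    r")\b"
)


def mask(text: str) -> tuple[str, list[str]]:
    """Replace protected tokens with numbered placeholders."""
    kept: list[str] = []

    def _swap(match: re.Match) -> str:
        kept.append(match.group(0))
        return f"__TERM{len(kept) - 1}__"

    return PROTECTED.sub(_swap, text), kept


def unmask(text: str, kept: list[str]) -> str:
    """Restore the original tokens after translation."""
    for i, original in enumerate(kept):
        text = text.replace(f"__TERM{i}__", original)
    return text


text = "OFDM 채널 추정 H04L 27/26 2019Da12345 H2SO4 K-NN"
masked, kept = mask(text)
```

Note that `\b` treats Hangul as word characters, so a term glued to a Korean particle (for example "OFDM을") would need a looser boundary than this sketch uses.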

5. Multilingual semantic similarity

BM25 alone cannot bridge the gap between "channel estimation", "a method of channel estimation", and "radio channel identification". That is where dense search enters. Picking a multilingual embedding model for a specialist domain is more delicate than the general RAG case — see the Embedding Model Guide for a broader comparison. Here we list only the domain-search-specific criteria.

5.1 Multilingual embedding candidates

Model	Dim	Langs	Strengths	Weaknesses
LaBSE	768	109	Strong on short query/sentence alignment, BERT-stable	Loses on long documents
multilingual-e5-large	1024	100+	Trained for query/doc asymmetry (`query:` / `passage:` prefixes)	Prefix mandatory
BGE-M3	1024	100+	Emits dense + sparse + multi-vector in a single pass	Indexing complexity
Cohere embed-multilingual-v3	1024	100+	API, commercial quality	External call

Default for specialist domains: BGE-M3. A single pass yields dense + sparse + multi-vector (ColBERT-style) representations, so keyword and semantic signals come from the same model.

5.2 Asymmetric length problem

Patent claims often run 500+ characters in a single sentence; user queries are under 10 characters. Address the asymmetry from both sides.

(a) Unify chunk size: split each claim again on period + semicolon to 100–200 tokens. (b) Expand the query: for short queries, generate a one-to-two sentence hypothetical answer with an LLM and embed that (HyDE).

def hyde_expand(query: str, llm) -> str:
    prompt = f"""Write a one-paragraph patent abstract that would directly answer:
"{query}"
Use technical vocabulary. Do not add disclaimers."""
    return llm.complete(prompt)
 
q_vec = embed(hyde_expand(user_query, llm))

HyDE typically lifts dense recall on short queries by 8–15 percentage points on BEIR-class and domain evaluation sets.
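Option (a), re-splitting long claims, can be sketched as follows. This is a rough illustration under assumptions: `split_claim` is a hypothetical name, and whitespace-separated words stand in for model tokens, so the budget should be recalibrated against your embedder's tokenizer (especially for Korean, Japanese, and Chinese text).

```python
import re


def split_claim(claim: str, max_tokens: int = 200) -> list[str]:
    """Split a claim on '.' and ';' boundaries, then greedily pack the
    clauses into chunks of at most max_tokens whitespace tokens.
    A single clause longer than the budget is kept whole."""
    clauses = [c.strip() for c in re.split(r"(?<=[.;])\s+", claim) if c.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for clause in clauses:
        n = len(clause.split())
        # Flush the running chunk if adding this clause would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(clause)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


claim = ("A method comprising: receiving a signal; "
         "estimating a channel; and decoding the signal.")
small_chunks = split_claim(claim, max_tokens=5)
```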

5.3 Domain adaptation

The base model alone is not enough for IPC codes, statute numbering, or taxonomy. Two options.

LoRA fine-tune: collect 10k–100k domain (query, positive doc, negative doc) triples and train 1–2 contrastive epochs. Typical nDCG@10 lift of 5–10 percentage points.
In-context query rewriting: keep the model frozen; do dictionary lookup and abbreviation expansion in query preprocessing.

Start with the latter when domain data is scarce; switch to the former once you have collected roughly 10k training triples.
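The frozen-model option can be as small as an abbreviation expander run before embedding. The sketch below is an assumption-laden illustration: the `ABBREVIATIONS` table and the `rewrite_query` name are hypothetical, and a real dictionary would come from your domain glossary.

```python
import re

# Hypothetical abbreviation table; populate it from the domain glossary.
ABBREVIATIONS = {
    "OFDM": "orthogonal frequency-division multiplexing",
    "LSTM": "long short-term memory",
}

_ABBR_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")\b"
)


def rewrite_query(query: str) -> str:
    """Append the long form after each known abbreviation, keeping the
    abbreviation itself so exact lexical matches still work."""
    return _ABBR_PATTERN.sub(
        lambda m: f"{m.group(0)} {ABBREVIATIONS[m.group(0)]}", query
    )


rewritten = rewrite_query("OFDM channel estimation")
```

Keeping the original abbreviation in the rewritten query matters here: the long form helps the dense encoder, while the short form preserves the BM25 hit.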

6. Sentence-to-document RRF fusion (the core of this post)

After chapters 3 (BM25) and 5 (dense), assume both have returned a top-K. Two problems remain.

How do you combine two result lists with different score scales?
When you searched at sentence level and the same document has multiple chunks in the result, how do you aggregate to a per-document ranking?

Both answers are RRF (Reciprocal Rank Fusion). RRF uses only ranks, never raw scores, so combining incomparable systems is safe. And summing the reciprocal-rank contributions of multiple chunks from the same document naturally produces a document-level score.

6.1 RRF formula

$$\mathrm{RRF}(d) = \sum_{r \in R(d)} \frac{1}{k + \mathrm{rank}_r(d)}$$

d : document (or sentence chunk)
R(d) : every ranking system in which d appeared
k : flattening constant, typically 60
rank : 1-based
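A small worked example makes the behaviour concrete. The documents and ranks below are made up for illustration, and `rrf_score` is a hypothetical helper. With k = 60, a document ranked 1st and 3rd scores 1/61 + 1/63 ≈ 0.03227, one ranked 2nd in both lists scores 2/62 ≈ 0.03226, and one ranked 1st in a single list scores only 1/61 ≈ 0.01639.

```python
def rrf_score(ranks: list[int], k: int = 60) -> float:
    """Sum of reciprocal-rank contributions for one document."""
    return sum(1.0 / (k + r) for r in ranks)


# Ranks (1-based) that each document received across two systems.
doc_ranks = {
    "A": [1, 3],  # 1st in BM25, 3rd in dense
    "B": [1],     # 1st in dense, absent from BM25
    "C": [2, 2],  # 2nd in both
}
scores = {doc: rrf_score(ranks) for doc, ranks in doc_ranks.items()}
ordering = sorted(scores, key=scores.get, reverse=True)
```

The example shows the two properties the pipeline relies on: appearing in more than one list beats a single top hit, and the large k keeps the gap between nearby ranks small.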

6.2 Full pipeline

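The end-to-end flow: query → glossary expansion (4.1) and HyDE (5.2) → BM25 top-100 chunks and dense top-100 chunks → document-level RRF → cross-encoder rerank of the top 20–50 → results.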

The crucial pattern: search at sentence level, aggregate at document level. BM25 and dense each score sentence chunks and return top-100. Chunks sharing the same application_no (patent) / case_id (precedent) / doi (paper) have their reciprocal-rank contributions summed by RRF into a document score.

6.3 RRF implementation (Python, stack-agnostic)

from collections import defaultdict
 
def rrf_fuse(
    rankings: list[list[tuple[str, str]]],  # list of [(doc_id, chunk_id), ...]
    k: int = 60,
    weights: list[float] | None = None,
) -> list[tuple[str, float, list[str]]]:
    """Sentence-chunk rankings -> document-level RRF scores.
    Each ranking list is assumed already sorted by rank.
    """
    weights = weights or [1.0] * len(rankings)
    doc_score: dict[str, float] = defaultdict(float)
    doc_chunks: dict[str, list[str]] = defaultdict(list)
 
    for ranking, w in zip(rankings, weights):
        for rank, (doc_id, chunk_id) in enumerate(ranking, start=1):
            # If you want to dampen the contribution of the 2nd+ chunk of
            # the same doc within one run, add a decay here.
            doc_score[doc_id] += w / (k + rank)
            if chunk_id not in doc_chunks[doc_id]:
                doc_chunks[doc_id].append(chunk_id)
 
    fused = sorted(
        ((d, s, doc_chunks[d]) for d, s in doc_score.items()),
        key=lambda x: x[1],
        reverse=True,
    )
    return fused

6.4 Tuning weights and k

k=60 is the original value (Cormack et al., 2009) and works in most cases. Weights typically start at [1.0, 1.0, 1.0] and grid-search toward something like [BM25, expanded BM25, dense] = [1.0, 0.6, 1.2].

Per-domain bias:

Patents: nudge BM25 up to 1.2–1.5 (lexical precision matters most)
Law: keep BM25 and dense roughly equal (statute citations are exact, facts are paraphrased)
Papers: lift dense to 1.3–1.5 (authors phrase the same idea many ways)
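A grid search over those weights needs little more than a loop. The sketch below is self-contained and deliberately simplified: it fuses document-level ID lists, scores with Recall@k, and uses a toy evaluation set; `fuse`, `grid_search`, and the `eval_set` layout are hypothetical names and structures, not part of any library.

```python
from collections import defaultdict
from itertools import product


def fuse(rankings: list[list[str]], weights: tuple[float, ...], k: int = 60) -> list[str]:
    """Weighted RRF over lists of document IDs; returns IDs, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking, w in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += w / (k + rank)
    return [d for d, _ in sorted(scores.items(), key=lambda x: x[1], reverse=True)]


def recall_at_k(retrieved: list[str], gold: set[str], k: int) -> float:
    return len(set(retrieved[:k]) & gold) / len(gold) if gold else 0.0


def grid_search(eval_set: list[dict], grid: tuple[float, ...], k_eval: int = 10):
    """Try every weight combination and keep the one with the best mean Recall@k."""
    n_runs = len(eval_set[0]["rankings"])
    best_weights, best_recall = None, -1.0
    for weights in product(grid, repeat=n_runs):
        mean_recall = sum(
            recall_at_k(fuse(q["rankings"], weights), q["gold"], k_eval)
            for q in eval_set
        ) / len(eval_set)
        if mean_recall > best_recall:
            best_weights, best_recall = weights, mean_recall
    return best_weights, best_recall


# Toy evaluation set: one query, two runs (BM25, dense), one gold document.
eval_set = [{"rankings": [["d1", "d2"], ["d3", "d4"]], "gold": {"d3"}}]
best_weights, best_recall = grid_search(eval_set, grid=(0.5, 1.0, 1.5), k_eval=1)
```

On a real evaluation set, hold out part of the queries so the chosen weights are not simply fitted to the tuning queries.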

6.5 Cross-encoder reranking

After RRF, rerank only the top 20–50 with a cross-encoder. A cross-encoder concatenates (query, passage) into one transformer pass to produce a direct score — expensive but accurate. In specialist-domain search, the final nDCG is nearly always best with cross-encoder reranking enabled.

from sentence_transformers import CrossEncoder
 
reranker = CrossEncoder("BAAI/bge-reranker-v2-m3", max_length=512)
 
def rerank(query: str, candidates: list[dict], top_n: int = 20) -> list[dict]:
    # candidates: [{"doc_id": ..., "best_chunk_text": ...}, ...]
    pairs = [(query, c["best_chunk_text"]) for c in candidates]
    scores = reranker.predict(pairs, batch_size=32)
    for c, s in zip(candidates, scores):
        c["rerank_score"] = float(s)
    return sorted(candidates, key=lambda x: x["rerank_score"], reverse=True)[:top_n]

When you rerank at the document level, do not feed the whole document — feed the single chunk that ranked highest under RRF for that document. That cost/accuracy tradeoff has consistently been best in our deployments.
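Picking that single chunk can be sketched as below. The helper name `best_chunk_per_doc` is an assumption, and the sketch presumes both retrievers emit the same chunk IDs for the same chunk; if they do not, map them to a shared key first.

```python
from collections import defaultdict


def best_chunk_per_doc(
    rankings: list[list[tuple[str, str]]],  # each run: [(doc_id, chunk_id), ...]
    k: int = 60,
) -> dict[str, str]:
    """For every document, return the chunk with the highest chunk-level
    RRF score accumulated across all runs."""
    chunk_score: dict[tuple[str, str], float] = defaultdict(float)
    for ranking in rankings:
        for rank, (doc_id, chunk_id) in enumerate(ranking, start=1):
            chunk_score[(doc_id, chunk_id)] += 1.0 / (k + rank)

    best: dict[str, tuple[str, float]] = {}
    for (doc_id, chunk_id), score in chunk_score.items():
        if doc_id not in best or score > best[doc_id][1]:
            best[doc_id] = (chunk_id, score)
    return {doc_id: chunk_id for doc_id, (chunk_id, _) in best.items()}


bm25_run = [("P1", "c1"), ("P1", "c2"), ("P2", "c9")]
dense_run = [("P1", "c2"), ("P2", "c9")]
chosen = best_chunk_per_doc([bm25_run, dense_run])
```

Here chunk c2 of patent P1 beats c1 because both systems retrieved it, even though c1 held the top BM25 rank.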

Takeaway: RRF combines the rankings, RRF also changes the unit from chunk to document, and a cross-encoder recovers the final accuracy. Those three steps decide ~80% of specialist-search quality.

7. Evaluation — how to measure specialist search

Tuning a search system without an evaluation set is gambling. Specialist evaluation differs from generic IR in two ways.

(a) There is no single correct answer. One query may have 50 relevant patents. You need graded relevance (0/1/2/3), not binary.

(b) Domain metrics are first-class. Track business measures separately: "What fraction of invalidating prior art did we catch?", "Is the controlling precedent in our top-N?".

7.1 Core metrics

Metric	Definition	Domain use
Recall@k	Fraction of golds inside top-k	Patent prior-art: Recall@100
nDCG@10	Graded relevance weighted cumulative gain	Proxy for user satisfaction
MRR	Mean reciprocal rank of first hit	"Find one fast" UX
MAP	Mean average precision	Balanced single number
Coverage@k	Did domain must-have docs land in top-k (custom)	Hit rate of pivotal precedents

7.2 Multilingual benchmarks

MIRACL: ad-hoc retrieval across 18 languages. Useful for ranking generic multilingual dense models. Includes Korean.
mMARCO: MS MARCO translated into 13 languages. More useful for training data.
BEIR: 14 English domains. Standard for domain generalization.

Those three measure generalization. A final domain evaluation requires your own eval set — typically a minimum of 200–500 queries with graded relevance from at least two annotators per query.

7.3 Evaluation code (minimal)

import math
 
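def ndcg_at_k(gains: list[int], k: int, all_gains: list[int] | None = None) -> float:
    # gains: graded relevance of the retrieved docs, in ranked order.
    # all_gains: grades of every judged doc for the query. Without it the ideal
    # ranking is built from the retrieved docs only, which inflates nDCG.
    dcg = sum((2**g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = sorted(all_gains if all_gains is not None else gains, reverse=True)[:k]
    idcg = sum((2**g - 1) / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0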
 
def recall_at_k(retrieved_ids: list[str], gold_ids: set[str], k: int) -> float:
    if not gold_ids:
        return 0.0
    return len(set(retrieved_ids[:k]) & gold_ids) / len(gold_ids)
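MRR and Coverage@k from the metrics table can be sketched in the same minimal style. Coverage@k is a custom metric, so the all-or-nothing definition below (every must-have document inside the top k) is one interpretation chosen for this sketch, and the function names are hypothetical.

```python
def reciprocal_rank(retrieved_ids: list[str], gold_ids: set[str]) -> float:
    """1 / rank of the first relevant hit, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in gold_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (retrieved_ids, gold_ids) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, g) for r, g in runs) / len(runs)


def coverage_at_k(retrieved_ids: list[str], must_have_ids: set[str], k: int) -> bool:
    """True only if every must-have document appears in the top k
    (all-or-nothing interpretation, assumed for this sketch)."""
    return must_have_ids <= set(retrieved_ids[:k])


mrr = mean_reciprocal_rank([(["a", "b"], {"b"}), (["x"], {"y"})])
```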

8. Reference architecture: two stacks side by side

Two combinations that show up most often in production, broken down by responsibility. Either stack runs the chapter-6 pipeline unchanged.

Responsibility	Stack A: OpenSearch + Qdrant	Stack B: Elasticsearch + Milvus
Sparse (BM25)	OpenSearch (Apache 2.0, nori plugin)	Elasticsearch (ELv2, kuromoji/nori plugin)
Dense	Qdrant (Rust, strong payload filters)	Milvus (large scale, GPU index)
Rerank	OpenSearch ML Commons or external	ES inference API or external
Managed	AWS OpenSearch Service, self-host	Elastic Cloud, Zilliz Cloud
License	OSS-friendly across the board	ES is ELv2, Milvus is Apache 2.0

8.1 Stack A: OpenSearch + Qdrant

Dense: load chunks into Qdrant

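import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

qc = QdrantClient(url="http://qdrant:6333")

# Drop and recreate the collection (destructive; use only for a full reload).
if qc.collection_exists("patent_chunks"):
    qc.delete_collection("patent_chunks")
qc.create_collection(
    collection_name="patent_chunks",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)

def point_id(chunk_key: str) -> str:
    # Qdrant point IDs must be unsigned integers or UUIDs, so derive a
    # deterministic UUID from the human-readable chunk key.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, chunk_key))

# embed() and iter_chunks() are your own helpers: embed() returns a 1024-dim
# vector and iter_chunks() yields (doc, chunk) pairs.
points = [
    PointStruct(
        id=point_id(f"{doc['application_no']}#{chunk['idx']}"),
        vector=embed(chunk["text"]),
        payload={
            "chunk_id":       f"{doc['application_no']}#{chunk['idx']}",
            "application_no": doc["application_no"],
            "lang":           chunk["lang"],
            "field":          chunk["field"],     # "claim" | "abstract" | "spec"
            "ipc":            doc["ipc"],
            "filing_date":    doc["filing_date"],
            "text":           chunk["text"],
        },
    )
    for doc, chunk in iter_chunks()
]
qc.upsert(collection_name="patent_chunks", points=points)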

Sparse: OpenSearch multi-match

from opensearchpy import OpenSearch
 
os_client = OpenSearch(hosts=["https://opensearch:9200"])
 
def bm25_search(q_ko: str, q_en: str | None, size: int = 100):
    fields = ["title_ko^4", "claims_ko^3", "abstract_ko^2"]
    if q_en:
        fields += ["title_en^3", "claims_en^2", "abstract_en^1.5"]
    body = {
        "size": size,
        "_source": ["application_no", "field_id"],
        "query": {
            "multi_match": {
                "query": q_ko if not q_en else f"{q_ko} {q_en}",
                "type": "best_fields",
                "fields": fields,
                "tie_breaker": 0.3,
            }
        },
    }
    res = os_client.search(index="patent_chunks", body=body)
    return [(h["_source"]["application_no"], h["_source"]["field_id"])
            for h in res["hits"]["hits"]]

Dense search + RRF fusion

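from qdrant_client.models import FieldCondition, Filter, MatchAny

def dense_search(q_vec, size: int = 100, ipc_filter: list[str] | None = None):
    # Restrict to the given IPC classes when a filter is supplied.
    flt = (
        Filter(must=[FieldCondition(key="ipc", match=MatchAny(any=ipc_filter))])
        if ipc_filter else None
    )
    res = qc.search(
        collection_name="patent_chunks",
        query_vector=q_vec,
        limit=size,
        query_filter=flt,
    )
    # Prefer a chunk_id stored in the payload (if you index one) so IDs line up
    # with the BM25 side; otherwise fall back to the Qdrant point ID.
    return [(p.payload["application_no"], p.payload.get("chunk_id", p.id)) for p in res]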
 
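def hybrid_search(query: str, llm, embedder, ipc_filter=None, top_n=20):
    expanded = expand_query(query)                     # 4.1
    rankings = [bm25_search(expanded["ko"], None)]
    weights = [1.0]
    if expanded["en"]:
        # Only add the expanded run when the glossary produced English terms;
        # otherwise it would duplicate the first run.
        rankings.append(bm25_search(expanded["ko"], expanded["en"]))
        weights.append(0.6)
    q_vec = embedder.encode(hyde_expand(query, llm))   # 5.2
    rankings.append(dense_search(q_vec, ipc_filter=ipc_filter))
    weights.append(1.2)
    fused = rrf_fuse(rankings, k=60, weights=weights)  # 6.3
    # build_candidates() is your own helper: it attaches the best chunk's text per doc.
    candidates = build_candidates(fused[:top_n * 3])
    return rerank(query, candidates, top_n=top_n)      # 6.5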

8.2 Stack B: Elasticsearch + Milvus

The same functionality with Elasticsearch 8.x hybrid search and Milvus 2.x.

Milvus collection definition

from pymilvus import MilvusClient, DataType
 
mc = MilvusClient(uri="http://milvus:19530")
 
schema = mc.create_schema(auto_id=False, enable_dynamic_field=False)
schema.add_field("id",             DataType.VARCHAR, is_primary=True, max_length=128)
schema.add_field("application_no", DataType.VARCHAR, max_length=32)
schema.add_field("lang",           DataType.VARCHAR, max_length=4)
schema.add_field("field",          DataType.VARCHAR, max_length=16)
schema.add_field("ipc",            DataType.VARCHAR, max_length=16)
schema.add_field("text",           DataType.VARCHAR, max_length=2048)
schema.add_field("vector",         DataType.FLOAT_VECTOR, dim=1024)
 
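# MilvusClient takes an IndexParams object rather than a list of dicts.
index_params = mc.prepare_index_params()
index_params.add_index(field_name="vector", index_type="HNSW", metric_type="COSINE",
                       params={"M": 16, "efConstruction": 200})
# Passing index_params at creation builds the index and loads the collection.
mc.create_collection(collection_name="patent_chunks", schema=schema,
                     index_params=index_params)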

Elasticsearch: BM25 + kNN in one query (own RRF recommended)

ES 8.x supports native RRF (`rank: { rrf: ... }`, and the `rrf` retriever in later 8.x releases), but sentence-to-document aggregation still needs custom code, so it is simpler to fetch both result lists separately and fuse them in Python.

from elasticsearch import Elasticsearch
 
es = Elasticsearch("https://es:9200")
 
def bm25_search_es(q_ko, q_en, size=100):
    body = {
        "size": size,
        "_source": ["application_no", "chunk_id"],
        "query": {
            "multi_match": {
                "query": f"{q_ko} {q_en or ''}".strip(),
                "fields": ["title_ko^4", "claims_ko^3", "abstract_ko^2",
                           "title_en^3", "claims_en^2"],
                "tie_breaker": 0.3,
            }
        },
    }
    res = es.search(index="patent_chunks", body=body)
    return [(h["_source"]["application_no"], h["_source"]["chunk_id"])
            for h in res["hits"]["hits"]]
 
def dense_search_milvus(q_vec, size=100):
    res = mc.search(
        collection_name="patent_chunks",
        data=[q_vec],
        limit=size,
        output_fields=["application_no"],
        search_params={"metric_type": "COSINE", "params": {"ef": 128}},
    )[0]
    return [(hit["entity"]["application_no"], hit["id"]) for hit in res]

From here the rrf_fuse() → rerank() flow is identical to Stack A.

8.3 How to pick between the two stacks

Situation	Pick
Heavy Korean BM25 traffic	Stack A (the OpenSearch nori plugin is mature)
100M+ chunks, GPU index needed	Stack B (Milvus scales better)
AWS single-cloud	Stack A (OpenSearch Service + self-hosted Qdrant)
Already on Elastic licensing	Stack B
Diverse payload filters (IPC, court, year)	Stack A (Qdrant payload index is more flexible)

8.4 Operations checklist

Korean orthography rules and loanword variants (데이타 / 데이터) — register both directions in the synonym dictionary
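Japanese shinjitai vs kyūjitai (国 / 國): use a `mapping` char filter with a kyūjitai→shinjitai table, since Unicode normalization via `icu_normalizer` does not fold these pairs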
Traditional vs simplified Chinese (資料 / 资料) — stconvert filter
English abbreviation getting stemmed (SAS → sa) — protect with keyword_marker
Drawing reference numbers and claim numbers disappearing through tokenization — check word_delimiter_graph
Use index aliases — zero-downtime reindex via alias swap
Dense vector recompute policy — stamp the embedding-model version onto the chunk metadata
Never hardcode RRF k in production — externalize to config
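The last two checklist items can share one small config object. The sketch below is an illustration only: the environment variable names (`RRF_K`, `RRF_WEIGHTS`, `EMBEDDING_MODEL_VERSION`) and the helper names are hypothetical, and a real deployment might read the same values from a config service instead.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionConfig:
    rrf_k: int = 60
    weights: tuple[float, ...] = (1.0, 0.6, 1.2)  # [BM25, expanded BM25, dense]
    embedding_model_version: str = "unversioned"


def load_fusion_config(env=None) -> FusionConfig:
    """Read fusion settings from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    defaults = FusionConfig()
    raw_weights = env.get("RRF_WEIGHTS")  # e.g. "1.0,0.6,1.2"
    return FusionConfig(
        rrf_k=int(env.get("RRF_K", defaults.rrf_k)),
        weights=(
            tuple(float(w) for w in raw_weights.split(","))
            if raw_weights else defaults.weights
        ),
        embedding_model_version=env.get(
            "EMBEDDING_MODEL_VERSION", defaults.embedding_model_version
        ),
    )


def stamp_chunk(payload: dict, cfg: FusionConfig) -> dict:
    """Return a copy of the chunk payload tagged with the embedding-model version."""
    return {**payload, "embedding_model_version": cfg.embedding_model_version}


cfg = load_fusion_config({"RRF_K": "40", "RRF_WEIGHTS": "1.2,0.6,1.0"})
```

Stamping the model version on every chunk makes it possible to find and re-embed stale vectors after a model upgrade.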

Wrap-up

Multilingual specialist-domain search is not a "good embedding model" problem. It needs all six stages working together — per-language analyzers → domain dictionary → multilingual BM25 → multilingual dense → document-level RRF → cross-encoder rerank — before you get a system that does not lose a single piece of prior art. With the pipeline diagram and RRF implementation in this post, plus either of the two stack code samples, you should be able to stand up a first working system in a week.

The next post will cover what to do after this retrieval stage — how to bolt on LLM answer generation with citation, and how to reduce hallucinations in legal and patent answers. Suggestions for topics are always welcome.