PROJECT : ASCENSION

Hybrid search engine for publications

Hybrid Search for Large-Scale Religious Publishing Content

TL;DR:

Built a hybrid search engine for Ascension, one of the world’s largest Catholic publishers, combining traditional keyword search with vector-based semantic retrieval in Elasticsearch. The system improves content discovery across a large number of documents and covers the retrieval layer of a RAG architecture, without implementing an LLM generation component.

For Ascension, one of the world’s largest Catholic publishers, I developed a hybrid search engine designed to improve content discovery across a large and diverse document corpus. The goal was to move beyond purely keyword-based search and make it easier for users to find relevant material, even when exact terminology was not used.

The solution combines traditional keyword matching with vector-based semantic search, implemented using Elasticsearch. By blending lexical and semantic retrieval techniques, the system can surface results that are both precise and contextually relevant. As a result, it supports a wide range of search behaviors, from exact queries to more exploratory or concept-driven searches.
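One common way to blend a lexical ranking with a semantic ranking is reciprocal rank fusion (RRF). The sketch below is illustrative only: the project description does not say which fusion method was used, and the function name, the document IDs, and the constant k=60 are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for the query "forgiveness".
keyword_hits = ["ccc-1422", "luke-15", "matt-18"]   # from keyword matching
semantic_hits = ["luke-15", "matt-18", "ps-51"]     # from vector similarity

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Documents that appear in both lists rise to the top, while documents found by only one method still make it into the final ranking.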

My work focused on the retrieval layer of a Retrieval-Augmented Generation (RAG) architecture. Rather than building a full end-to-end system with an LLM generation layer, the emphasis was on designing and implementing a robust, scalable retrieval component that could later be extended or integrated with downstream generation or summarization workflows.

Key aspects of the project included:

  • Designing a hybrid search strategy that balances keyword relevance with semantic similarity

  • Indexing and querying documents using both traditional inverted indices and vector embeddings in Elasticsearch

  • Tuning retrieval behavior to handle varied document types and theological terminology

  • Ensuring the system could scale to a large publishing archive while maintaining acceptable query performance
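
As a rough illustration of the indexing and querying points above, the sketch below builds an index mapping and a hybrid search request body as plain Python dictionaries. It assumes Elasticsearch 8.x, where a `dense_vector` field can be searched through a top-level `knn` section alongside a regular `query`. The field names `text` and `embedding`, the function names, and all numeric defaults are hypothetical; the project's real schema is not described here, and OpenSearch uses a different syntax for vector fields and queries.

```python
def build_index_mapping(dims):
    """Mapping with one inverted-index text field and one vector field."""
    return {
        "mappings": {
            "properties": {
                # Analyzed text, searched through the inverted index.
                "text": {"type": "text"},
                # Embedding vector, searched with approximate kNN.
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }


def build_hybrid_query(query_text, query_vector, size=10,
                       keyword_boost=1.0, vector_boost=1.0):
    """Search body combining a keyword match with a kNN vector search."""
    return {
        "size": size,
        # Lexical part: match against the analyzed text field.
        "query": {
            "match": {"text": {"query": query_text, "boost": keyword_boost}}
        },
        # Semantic part: nearest neighbours of the query embedding.
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": size,
            "num_candidates": max(100, size * 10),
            "boost": vector_boost,
        },
    }


# The vector would normally come from an embedding model; this one is made up.
body = build_hybrid_query("forgiveness", [0.1, 0.2, 0.3], size=5)
print(body)
```

The two boost values are the tuning knobs: raising `keyword_boost` favors exact wording, while raising `vector_boost` favors conceptual similarity.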

The result is a flexible search foundation that significantly improves findability across Ascension’s content library and provides a solid base for future AI-assisted features.

My role in this project

Implemented the search engine and the data loads into the vector database. Created a hybrid search engine that combines semantic search with exact search, and loaded the client's data (the Bible, the Catechism, etc.) as embeddings into the vector database.

Challenges and trade-offs

How do you turn semantic search into something interesting for app users? The solution was to combine semantic search with exact search, so that documents can be found by keyword (e.g. "John 1:14") as well as by meaning (e.g. "forgiveness").
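
To make the trade-off concrete, here is a minimal sketch of one possible way to treat the two kinds of query differently: detect a scripture-style reference with a regular expression and lean on keyword matching for it, and lean on semantic similarity otherwise. This is an assumption for illustration, not a description of the deployed logic; the pattern, the function name `retrieval_weights`, and the weight values are all hypothetical.

```python
import re

# Matches references such as "John 1:14", "1 Corinthians 13:4-7"
# or "Song of Songs 2:1": optional book number, book name, chapter:verse.
VERSE_REF = re.compile(
    r"^\s*(?:[1-3]\s+)?[A-Za-z]+(?:\s+[A-Za-z]+)*\s+\d+:\d+(?:-\d+)?\s*$"
)


def retrieval_weights(query):
    """Return (keyword_boost, vector_boost) for a user query.

    The numbers are placeholders chosen for the example, not tuned values.
    """
    if VERSE_REF.match(query):
        # Exact reference: the user wants that specific passage.
        return (2.0, 0.5)
    # Concept-style query: favor semantic similarity.
    return (0.5, 2.0)


print(retrieval_weights("John 1:14"))
print(retrieval_weights("forgiveness"))
```

A query that merely contains a reference, such as "what does John 1:14 mean", does not match the anchored pattern and is treated as a concept query in this sketch.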

Tech stack used in this project

OpenSearch/Elasticsearch, AWS, Vector Database