RAG Implementation Guide

Preparation Phase

1. Define Business Requirements

Clearly outline the specific use cases and goals for the RAG system
Identify key performance indicators (KPIs) to measure success

2. Gather Test Documents

Pertinence - Documents must meet the business requirements
Representativeness - Documents should be representative of all the types of documents that your solution will use
Physical document quality - Documents need to be in usable condition, e.g. clear, legible scans
Document content quality - Document content should be accurate and well written, e.g. free of misspellings and grammatical errors

Hints:

Remember to redact PII from real documents
Have at least two documents for each document type
If using synthetic documents, ensure they are as close to real data as possible
You can use LLMs to help evaluate the document quality

3. Gather Test Queries

Create a diverse set of queries covering various use cases
Include edge cases and potential challenging scenarios

Chunking Phase

1. Perform Document Analysis

What information should be ignored or excluded?
What information should be captured in chunks?
How should the document be chunked? (e.g. by sentence or fixed size with overlap)

2. Choose Chunking Method

Sentence-based parsing - Breaks text into chunks of complete sentences
Fixed-size parsing (with overlap) - Breaks text into fixed-size chunks with overlap
Custom code - Uses text parsing techniques like regex
LLM augmentation - Generates textual representations of images or summaries of tables using an LLM
Document layout analysis - Combines OCR with deep learning to extract document structure and text
Prebuilt model - Uses models pre-trained for specific document types or genres
Custom model - Uses custom models for structured documents where no prebuilt model exists
Manual - Uses human-curated chunks

Method	Tool Examples	Effort	Processing Cost
Sentence-based parsing	spaCy sentence tokenizer, NLTK sentence tokenizer	Low	Low
Fixed-size parsing	LangChain recursive text splitter, Hugging Face chunk visualizer	Low	Low
Custom code	Python (re, regex, BeautifulSoup), R (stringr, xml2)	Medium	Low
LLM augmentation	Azure OpenAI, OpenAI	Medium	High
Document layout analysis	Azure AI Document Intelligence, Donut	Medium	Medium
Prebuilt model	Azure AI Document Intelligence, Power Automate	Low	Medium/High
Custom model	Azure AI Document Intelligence, Tesseract	High	Medium/High
Manual	Human reviewers, Label Studio	High	Low
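
The two lowest-effort methods, fixed-size parsing with overlap and sentence-based parsing, can be sketched with the standard library alone. This is a minimal illustration, not a replacement for the tools in the table: the function names, the character-based sizes and the naive regex sentence splitter are all assumptions made for the example.

```python
import re


def chunk_fixed_size(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters; consecutive windows share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


def chunk_by_sentence(text: str, max_chars: int = 200) -> list[str]:
    """Group complete sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole rather than cut.
    """
    # Naive splitter (assumption): a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

For real documents, a proper sentence tokenizer such as the spaCy or NLTK ones listed above will handle abbreviations and decimal numbers far better than this regex.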

Enrichment Phase

1. Clean Chunks

Lowercase text - Embeddings are usually case-sensitive, meaning "Cheetah" and "cheetah" can produce different vectors despite having the same logical meaning
Remove stop words - Removing stop words like "a", "an", "is" and "the" reduces both "a cheetah is faster than a puma" and "the cheetah is faster than the puma" to the same text, "cheetah faster than puma", so both produce the same vector
Fix spelling mistakes
Remove Unicode characters
Normalisation (expand abbreviations, convert numbers to words, expand contractions)
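
A few of these cleaning steps can be combined into one small function. The sketch below is illustrative only: the stop-word set and contraction table are tiny hand-picked assumptions (a real pipeline would use a fuller list), and the token regex deliberately drops punctuation and any non-ASCII characters.

```python
import re

# Illustrative, hand-picked values (assumptions), not a complete list.
STOP_WORDS = {"a", "an", "the", "is"}
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}


def clean_chunk(text: str) -> str:
    """Lowercase, expand a few contractions, drop punctuation/non-ASCII, remove stop words."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Keep only runs of ASCII letters and digits; everything else acts as a separator.
    tokens = re.findall(r"[a-z0-9]+", text)
    return " ".join(token for token in tokens if token not in STOP_WORDS)
```

With these settings, "A cheetah is faster than a puma" and "The cheetah is faster than the puma." both clean to the same string, so they would be embedded identically. Whether that loss of detail helps or hurts retrieval depends on the embedding model, so it is worth testing with and without cleaning.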

2. Augment Chunks with Metadata

Metadata can help filter the chunks prior to the semantic search or be used as part of it.

Examples of metadata fields:

Title
Summary
Keywords
Source
Language
Questions that the chunk can answer
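
One way to picture this is a chunk record that carries the fields above, plus a pre-filter that narrows the candidate set before any semantic search runs. The class and function names here are hypothetical, chosen only for this sketch; most vector stores offer their own metadata filtering.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A chunk of text plus the example metadata fields listed above."""
    text: str
    title: str = ""
    summary: str = ""
    keywords: list[str] = field(default_factory=list)
    source: str = ""
    language: str = ""
    questions: list[str] = field(default_factory=list)  # questions the chunk can answer


def filter_chunks(chunks, language=None, keyword=None):
    """Return the chunks that match every metadata condition that was supplied."""
    selected = []
    for chunk in chunks:
        if language is not None and chunk.language != language:
            continue
        if keyword is not None and keyword.lower() not in (k.lower() for k in chunk.keywords):
            continue
        selected.append(chunk)
    return selected
```

Filtering first keeps the semantic search focused on chunks that could plausibly be relevant, for example only chunks in the user's language.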

Embedding Phase

1. Choose Embedding Model

Compare candidate models using the Hugging Face MTEB leaderboard

2. Evaluate Embedding Model

Visualise your embeddings using tools such as t-SNE from scikit-learn
Calculate embedding distances using Euclidean or Manhattan distance
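
Both distance measures are simple to compute directly, which makes for a quick sanity check: chunks you consider related should sit closer together than unrelated ones. The sketch below uses plain Python lists as vectors; the function names are assumptions for illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors of equal length."""
    return math.dist(a, b)


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))
```

For example, for the vectors [0, 0] and [3, 4] the Euclidean distance is 5 and the Manhattan distance is 7.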

Persisting Phase

Vector storage examples:

- Pure vector databases: Pinecone, Weaviate
- Full-text search databases: Elasticsearch, OpenSearch
- Vector-capable SQL databases: PostgreSQL with pgvector
- Vector-capable NoSQL databases: MongoDB

Retrieval Phase

1. Implement Semantic Search

Choose between approximate nearest neighbor (ANN) or exact nearest neighbor search based on performance requirements
Implement hybrid search combining semantic and keyword-based approaches for improved results
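
As a rough illustration of both points, the sketch below runs an exact (brute-force) nearest neighbor scan and blends the semantic score with a keyword score. It is a toy under stated assumptions: the weighting parameter alpha, the term-overlap keyword score and all function names are invented for the example, and production systems typically use an ANN index and a ranking function such as BM25 instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms that also appear in the text (whitespace tokenised)."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(query, query_vector, chunks, alpha=0.5, top_k=3):
    """Score every (text, vector) pair exactly and return the top_k (score, text) pairs.

    alpha weights the semantic score; (1 - alpha) weights the keyword score.
    """
    scored = []
    for text, vector in chunks:
        semantic = cosine_similarity(query_vector, vector)
        lexical = keyword_score(query, text)
        scored.append((alpha * semantic + (1 - alpha) * lexical, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

The exact scan compares the query against every chunk, so its cost grows linearly with the collection. That is usually acceptable for small test sets and is the point at which ANN becomes attractive for larger ones.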

2. Fine-tune Retrieval

Experiment with different similarity thresholds
Implement re-ranking strategies to improve relevance of retrieved chunks
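
Both tuning steps can be expressed as small post-processing functions over the retrieved results. In this sketch the results are assumed to be (score, text) pairs, the function names are hypothetical, and term_overlap is only a toy stand-in for whatever relevance scorer you choose to re-rank with.

```python
def filter_by_threshold(results, threshold):
    """Keep only the (score, text) pairs whose score is at least `threshold`."""
    return [(score, text) for score, text in results if score >= threshold]


def term_overlap(query, text):
    """Toy relevance score: fraction of query terms present in the text."""
    terms = set(query.lower().split())
    return len(terms & set(text.lower().split())) / len(terms) if terms else 0.0


def rerank(query, results, score_fn):
    """Re-score each retrieved text with score_fn(query, text) and sort best first."""
    rescored = [(score_fn(query, text), text) for _, text in results]
    return sorted(rescored, key=lambda pair: pair[0], reverse=True)
```

Running the threshold first keeps the re-ranking step cheap, since it only has to score the chunks that survived. Trying several threshold values against your test queries shows how much recall you give up for that saving.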

Conclusion

Implementing a successful RAG solution requires careful consideration of each phase, from preparation to retrieval. By following this guide, you can create a robust and effective RAG system tailored to your specific needs.