Retrieval
The Heart of Your RAG System
Welcome back! So far, you've mastered loading and splitting documents. Now, we arrive at the "R" in RAG: Retrieval. This is where the magic happens. Retrieval is the process of finding and pulling the most relevant pieces of information from your data to answer a user's query. A powerful retrieval system ensures that your Language Model (LLM) gets the best possible context to formulate a smart, accurate response.
Let's fire up the vector database we created earlier and see how we can search it in different ways.
Vectorstore Retrieval: Getting Started
First, we need to load our existing Chroma vectorstore. This database already contains the embedded text chunks from our documents. We'll use the same GoogleGenerativeAIEmbeddings model to ensure consistency.
from getpass import getpass
import os
from langchain.vectorstores import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings
# Ensure the Google API key is set
if "GOOGLE_API_KEY" not in os.environ:
os.environ["GOOGLE_API_KEY"] = getpass("Enter your Google API key: ")
# Directory where the vector database was persisted earlier
persist_directory = 'learn/chroma/'
# Initialize the embedding model
embedding = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
# Load the vector database from disk
vectordb = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding
)
print(f"Vectors in DB: {vectordb._collection.count()}")
# Expected Output: 330
Similarity Search: The Default Retriever
The most common way to retrieve information is through similarity search. It finds the text chunks whose vector representations are closest to the vector of your question. In simple terms, it looks for the best matches based on semantic meaning.
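Under the hood, "closest" usually means highest cosine similarity between the query embedding and each chunk embedding. Here is a toy sketch of that comparison using made-up three-dimensional vectors; real embeddings from GoogleGenerativeAIEmbeddings have hundreds of dimensions, and Chroma handles this scoring for you internally.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means same direction, near 0.0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for the query and two chunk embeddings
query_vec = np.array([0.9, 0.1, 0.0])
chunk_vecs = {
    "chunk about large fruiting bodies": np.array([0.8, 0.2, 0.1]),
    "chunk about an unrelated topic": np.array([0.0, 0.1, 0.9]),
}

# The retriever returns the chunks with the highest scores
for name, vec in chunk_vecs.items():
    print(name, round(cosine_similarity(query_vec, vec), 3))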
Let's test this with a small, focused database about mushrooms.
# A small sample of texts
texts = [
    """The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).""",
    """A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.""",
    """A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.""",
]
# Create a temporary, small vector database
smalldb = Chroma.from_texts(texts, embedding=embedding)
question = "Tell me about all-white mushrooms with large fruiting bodies"
# Perform a standard similarity search for the top 2 results
docs_ss = smalldb.similarity_search(question, k=2)
print(docs_ss)
Output:
[Document(page_content='A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.'),
Document(page_content='The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).')]
Notice that both results are highly relevant to a "large fruiting body." However, they are also quite repetitive. What if we want more varied information?
Addressing Diversity: Maximum Marginal Relevance (MMR)
Sometimes, you don't just want the most similar results; you want a set of results that are both relevant to the query and diverse from each other. This prevents getting multiple chunks that say the same thing. That's where Maximum Marginal Relevance (MMR) comes in.
MMR first fetches a larger set of documents and then selects the most relevant ones while penalizing similarity among the selected documents.
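Conceptually, MMR scores each candidate by balancing its relevance to the query against its similarity to documents already chosen. The following is an illustrative sketch of that greedy selection loop, not LangChain's internal implementation; the lambda_mult weight trades off relevance (closer to 1) against diversity (closer to 0).
import numpy as np

def mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: pick k candidates, balancing query relevance against redundancy."""
    def sim(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    selected = []  # indices of chosen candidates
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = sim(query_vec, candidate_vecs[i])
            # Penalize candidates that look too much like something already selected
            redundancy = max((sim(candidate_vecs[i], candidate_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected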
Let's try our mushroom query again, but this time with MMR.
# Perform an MMR search
docs_mmr = smalldb.max_marginal_relevance_search(question, k=2, fetch_k=3)
print(docs_mmr)
Output:
[Document(page_content='A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.'),
Document(page_content='A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.')]
Look at the difference! The first document is the same—it's the most relevant. But the second document is different. Instead of repeating the "large fruiting body" fact, MMR chose a document that introduces a new, diverse concept: the mushroom's toxicity. MMR helps you get a broader, more comprehensive context.
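You can also tune how aggressively MMR trades relevance for diversity. In LangChain, max_marginal_relevance_search accepts a lambda_mult parameter (1 favors pure relevance, 0 favors maximum diversity) alongside fetch_k, the size of the initial candidate pool. On a tiny three-document database the effect is limited, but the knob matters on larger collections.
# Favor diversity more strongly (lambda_mult closer to 0 penalizes redundancy harder)
docs_diverse = smalldb.max_marginal_relevance_search(
    question, k=2, fetch_k=3, lambda_mult=0.25
)
print(docs_diverse)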
Addressing Specificity: Working with Metadata
What if your query is about a specific document or a particular section? A general similarity search might pull in relevant text from anywhere in your database. To narrow the search, we can use metadata.
When we split our documents, LangChain automatically attaches metadata to each chunk, such as the source file and page number. We can use this to our advantage by filtering our search.
Let's first inspect the metadata from a search on our main vectordb.
question = "what is the blue planet?"
docs = vectordb.similarity_search(question, k=5)
# Print the metadata for each retrieved document
for d in docs:
    print(d.metadata)
Output:
{'page': 93, 'source': 'English_THE_CREATION_OF_THE_UNIVERSE.pdf'}
{'page': 9, 'source': 'English_THE_CREATION_OF_THE_UNIVERSE.pdf'}
{'page': 79, 'source': 'English_THE_CREATION_OF_THE_UNIVERSE.pdf'}
{'page': 86, 'source': 'English_THE_CREATION_OF_THE_UNIVERSE.pdf'}
{'page': 84, 'source': 'English_THE_CREATION_OF_THE_UNIVERSE.pdf'}
Because vectorstores support filtering on this metadata, you can create a retriever that only pulls information from page: 9, for example, or from a specific source PDF. Taking this further, a self-query retriever can infer such filters from the question itself, using the documents' metadata to inform the retrieval process and produce highly specific, relevant results.
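As a simple, manual example, Chroma lets you pass a metadata filter directly to the search call. The sketch below restricts results to a single page; the filter syntax assumes Chroma's where-clause format, and the page value is taken from the metadata we just inspected.
# Only return chunks whose metadata says they come from page 9
docs_filtered = vectordb.similarity_search(
    "what is the blue planet?",
    k=3,
    filter={"page": 9},
)
for d in docs_filtered:
    print(d.metadata)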
Why This Matters
Mastering retrieval is key to building a state-of-the-art RAG application. By understanding and using different retrieval strategies, you can control the context provided to your LLM.
Similarity Search: Your go-to for finding the most relevant information quickly.
Maximum Marginal Relevance: Your tool for avoiding repetition and providing a broader, more diverse context.
Metadata Filtering: Your solution for zeroing in on specific sources, chapters, or sections within your knowledge base.
By choosing the right retriever for the job, you ensure your LLM has the precise information it needs to generate truly intelligent and helpful responses.
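In practice, most RAG chains consume these strategies through a retriever object rather than by calling the search methods directly. Here is a minimal sketch of configuring each approach via as_retriever; the parameter values are illustrative, not prescriptive.
# Default similarity search, top 4 chunks
sim_retriever = vectordb.as_retriever(search_kwargs={"k": 4})

# MMR for relevant but diverse results
mmr_retriever = vectordb.as_retriever(
    search_type="mmr", search_kwargs={"k": 4, "fetch_k": 20}
)

# Similarity search restricted by metadata
filtered_retriever = vectordb.as_retriever(
    search_kwargs={"k": 4, "filter": {"source": "English_THE_CREATION_OF_THE_UNIVERSE.pdf"}}
)

docs = mmr_retriever.get_relevant_documents("what is the blue planet?")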