Question Answering
Bringing It All Together
You've successfully loaded, split, and stored your documents in a searchable vectorstore. Now it's time for the final, most exciting step in the Retrieval Augmented Generation (RAG) workflow: Question Answering.
This is where we combine the power of our retriever with a Language Model (LLM) to generate answers based on the content of our documents.
The RAG Workflow: A Quick Recap
Document Loading: Ingesting data from sources.
Splitting: Breaking documents into smaller, manageable chunks.
Storage & Retrieval: Embedding chunks and storing them in a vectorstore for efficient searching.
Question Answering (Generation): Using a retriever and an LLM to generate an answer to a user's query.
Let's begin by loading our vector database and initializing our LLM.
from getpass import getpass
import os
from langchain.vectorstores import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
# Set up the API Key and load the database
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass("Enter your Google API key: ")
persist_directory = 'learn/chroma/'
embedding = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
# Initialize the LLM
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro-latest")
print(f"Vectors in DB: {vectordb._collection.count()}")
# Expected Output: a number like 660
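Before wiring up a chain, it can help to confirm that retrieval itself returns sensible chunks. The quick check below is just a sketch; the query string and the value of k are arbitrary examples.
# Optional sanity check: fetch the top chunks for a sample query
docs = vectordb.similarity_search("tell me about universe", k=3)
print(len(docs))                   # should print 3 if the store has enough chunks
print(docs[0].page_content[:200])  # preview the most relevant chunk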
The RetrievalQA Chain
The simplest way to perform question answering over your documents is with the RetrievalQA chain. This chain performs the core RAG logic for you:
It takes your question.
It uses the provided retriever to fetch the most relevant documents.
It stuffs those documents and your question into a prompt.
It sends that prompt to the LLM to generate the final answer.
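Conceptually, those four steps boil down to something like the following sketch. This is only an illustration of the idea; the chain's real prompt and internals differ.
# Manual, simplified version of what RetrievalQA automates
retriever = vectordb.as_retriever()
docs = retriever.get_relevant_documents("tell me about universe")  # steps 1-2: fetch relevant chunks
context = "\n\n".join(doc.page_content for doc in docs)            # step 3: stuff the chunks into one context string
prompt = f"Answer the question using the context below.\n\nContext:\n{context}\n\nQuestion: tell me about universe"
answer = llm.invoke(prompt)                                        # step 4: generate the final answer
print(answer.content)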
Here's how to set it up:
from langchain.chains import RetrievalQA
question = "tell me about universe"
# Create the RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)
# Run the chain
result = qa_chain({"query": question})
# Let's clean up and print the result
print(result["result"])
Result:
The LLM synthesizes the retrieved documents to provide a comprehensive answer, drawing directly from the knowledge stored in our vector database.
## Our Vast and Mysterious Universe: A Glimpse
The universe is everything. It's all of space and time and everything in it, from the smallest atom to the largest galaxy cluster...
...
Studying it helps us understand our place in the cosmos and inspires us to keep asking questions about the nature of existence itself.
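By default, as_retriever() performs a plain similarity search with the vectorstore's default number of results. If you want to control how many chunks end up in the prompt, you can pass search_kwargs when creating the retriever; the sketch below assumes k=3 purely as an example.
# Limit the retriever to the 3 most similar chunks per question
retriever_top3 = vectordb.as_retriever(search_type="similarity", search_kwargs={"k": 3})
qa_chain_top3 = RetrievalQA.from_chain_type(llm, retriever=retriever_top3)
result_top3 = qa_chain_top3({"query": question})
print(result_top3["result"])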
Customizing with Prompts
What if you want the LLM to answer in a specific style, persona, or format? You can gain fine-grained control by providing a custom PromptTemplate. The template instructs the model on how to behave and formats the input, using the retrieved documents ({context}) and the user's query ({question}).
from langchain.prompts import PromptTemplate
# Build a detailed prompt to guide the LLM's persona and output
template = """Role: You are Universe Knowledge, a vast and knowledgeable AI dedicated to unraveling the mysteries of the cosmos. You excel at providing comprehensive, accurate, and engaging answers about the universe, from the smallest particles to the largest cosmic structures. Here's how to approach user queries: * Understand the User's Question: Before formulating your response, carefully analyze the user's question to identify the specific information they seek. Pay close attention to keywords related to astronomical objects, phenomena, theories, or historical events. * Provide Accurate and Up-to-Date Information: Base your answers on the latest scientific discoveries, theories, and research in astronomy, astrophysics, and cosmology.
Cite reputable sources whenever possible, such as scientific journals, research institutions, or space agencies like NASA and ESA. * Explain Complex Concepts Clearly: Break down complex astronomical concepts into simpler terms that are easily understandable by a wide audience. Utilize analogies, comparisons, and real-world examples to illustrate your points effectively. * Engage the User with Visuals: Whenever relevant, enhance your responses with captivating visuals such as images, diagrams, simulations, or videos to provide a more immersive and memorable learning experience. * Encourage Curiosity and Exploration: Inspire users to delve deeper into the wonders of the universe by suggesting related topics, further reading materials, or online resources like astronomy websites, documentaries, or virtual tours of space.
Example Interaction: User: What is a black hole, and how is it formed? Universe Knowledge: Imagine a place in space where gravity is so strong that nothing, not even light, can escape. That's a black hole! These enigmatic objects form when massive stars collapse at the end of their life cycle. [Insert a captivating image or animation of a black hole] The star's core implodes, compressing its mass into an infinitely small point called a singularity. The gravitational pull around the singularity becomes so intense that it warps the fabric of spacetime, creating a region from which nothing can return. Want to learn more about the different types of black holes or how we detect them? Remember: * Your primary goal is to educate and inspire users about the marvels of the universe.
* Always maintain a factual, objective, and engaging tone. * Encourage users to continue their exploration of the cosmos.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
# Re-create the chain, but with our custom prompt
qa_chain_prompt = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,  # Ask the chain to return the source documents it used
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)
# Run the new chain
result = qa_chain_prompt({"query": question})
# Print the more structured result
print(result["result"])
Result with Custom Prompt:
Now the answer follows our instructions, providing a structured response and suggesting further reading, just as we asked.
The universe is everything. It encompasses all of space and time...
Here are some key points about the universe:
* Vastness and Scale: ...
* The Big Bang: ...
...
To learn more about the universe, you might explore:
* Books: "Cosmos" by Carl Sagan, "A Brief History of Time" by Stephen Hawking
* Websites: NASA (nasa.gov), European Space Agency (esa.int)
* Documentaries: "Cosmos: A Spacetime Odyssey," "Planet Earth"
Verifying the Source
A key advantage of RAG is transparency. Since we set return_source_documents=True, we can inspect exactly which chunks of text the LLM used to generate its answer. This is invaluable for fact-checking and debugging.
print(result["source_documents"][0])
Output:
Document(page_content='www.har un ya hya.com - www.har un ya hya.net\nen.harunyahya.tv', metadata={'page': 9, 'source': 'English_THE_CREATION_OF_THE_UNIVERSE.pdf'})
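The first retrieved chunk here happens to be a header page from the source PDF, which is exactly the kind of thing you want to catch. To review every chunk the chain passed to the LLM, along with where each one came from, you can loop over the list (a short sketch):
# List every retrieved chunk with its source file and page number
for doc in result["source_documents"]:
    print(doc.metadata.get("source"), "- page", doc.metadata.get("page"))
    print(doc.page_content[:150])
    print("---")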
A Key Limitation: No Memory
The standard RetrievalQA chain is stateless. This means it treats every query as a brand-new question and has no memory of your previous interactions.
Notice how each question is independent:
# Ask about the moon
question_moon = "what the moon is?"
result_moon = qa_chain_prompt({"query": question_moon})
print(result_moon["result"])
# Ask about the sun
question_sun = "what is sun?"
result_sun = qa_chain_prompt({"query": question_sun})
print(result_sun["result"])
The chain answers the question about the sun without any awareness that you just asked about the moon. This is fine for simple Q&A, but it falls short for building a conversational chatbot.
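To carry context from one question to the next, LangChain provides chains that pair retrieval with conversation memory. The sketch below assumes ConversationalRetrievalChain with a ConversationBufferMemory; it's a minimal illustration rather than a full chatbot.
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
# The memory object stores the running chat history between calls
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chat_qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=vectordb.as_retriever(),
    memory=memory
)
chat_qa({"question": "what is the moon?"})
followup = chat_qa({"question": "how does it affect the tides?"})  # "it" now resolves to the moon
print(followup["answer"])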
Keep Exploring 🚀
You've now built a complete Retrieval Augmented Generation (RAG) pipeline! You've gone from raw documents to an intelligent Q&A system. But this is just the beginning. The world of conversational AI is vast and constantly evolving.