Adding Memory for Conversational RAG
You've built a powerful Question-Answering system using the RetrievalQA chain. It's great for answering standalone questions, but conversations are rarely a series of disconnected queries. Humans naturally ask follow-up questions that rely on previous context.
To build a true chatbot, your application needs memory. This section will show you how to upgrade your RAG pipeline from a simple Q&A bot to a conversational agent that remembers your chat history.
The Problem: Stateless Chains
The RetrievalQA chain we used before is stateless. It treats every question as if it's the first one it has ever heard, with no memory of past interactions.
For example, when you ask it a question, it answers using only the context retrieved for that single query:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# A simple prompt for the QA chain
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use ten sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)

# Create the stateless QA chain (llm and vectordb were set up in earlier sections)
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

# Ask a question
question = "Are there aliens in the universe?"
result = qa_chain({"query": question})
print(result["result"])
Output:
The provided text focuses on the unique characteristics of Earth that make it suitable for life... it does not provide information about the existence or non-existence of extraterrestrial life. Therefore, I cannot answer your question. Thanks for asking!
If you were to then ask a follow-up like, "Where might they live?", the chain would have no idea what "they" refers to, because it has already forgotten the original question about aliens.
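You can see this failure directly. If you send the follow-up to the stateless chain on its own, the retriever has nothing to anchor "they" to, so the answer will typically be some variation of "I don't know" (the exact wording depends on your model):

# The stateless chain sees only this string; "they" is unresolvable
result = qa_chain({"query": "Where might they live?"})
print(result["result"])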
The Solution: ConversationalRetrievalChain
To solve this, LangChain provides the ConversationalRetrievalChain. This chain is designed specifically to handle conversational history.
It works in two steps:
1. Generate a Standalone Question: It first takes the chat history and the new follow-up question, and uses an LLM to condense them into a single standalone question. For example, it would combine "Are there aliens?" and "Where might they live?" into "Where might aliens live?" (you can inspect the prompt used for this step below).
2. Answer the New Question: It then passes this standalone question to the retriever and generates a final answer, just like the RetrievalQA chain.
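If you're curious what the condensing step looks like, you can print the default prompt the chain uses for it. The import path below matches the classic langchain package used throughout this tutorial; newer releases organize these modules differently.

from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT

# The template takes {chat_history} and {question} and instructs the LLM
# to rewrite the follow-up as a standalone question
print(CONDENSE_QUESTION_PROMPT.template)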
First, we need to add a memory component to our application.
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",  # The key where conversation history is stored
    return_messages=True        # Return history as a list of message objects
)
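To get a feel for what this object does, you can write to a scratch copy by hand and read it back. This is purely for illustration; in normal use the chain reads and writes the memory for you.

# Illustration only -- the chain manages memory automatically
scratch = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
scratch.chat_memory.add_user_message("Are there aliens?")
scratch.chat_memory.add_ai_message("We don't know yet.")
print(scratch.load_memory_variables({})["chat_history"])
# [HumanMessage(content='Are there aliens?'), AIMessage(content="We don't know yet.")]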
Now we can create the ConversationalRetrievalChain, bringing together our LLM, retriever, and the newly created memory.
from langchain.chains import ConversationalRetrievalChain

retriever = vectordb.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory
)
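One gotcha worth knowing about: if you also pass return_source_documents=True here, the chain produces several output keys, and the memory must be told which one to store. With the classic API, that looks roughly like this (note the output_key argument):

# Variant that returns sources alongside the answer
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer"  # tell the memory which output to record
)
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory,
    return_source_documents=True
)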
Putting It All Together: A Real Conversation
Let's see the chain in action.
The First Question
We ask our initial question. The chain fetches relevant documents and provides a detailed answer. This interaction is now stored in our memory object.
question = "Are there aliens in the universe?"
result = qa({"question": question})
print(result['answer'])
Output:
The existence of extraterrestrial life, including intelligent aliens, is one of the most profound and unanswered questions in science... While the existence of aliens remains unproven, the sheer scale of the universe and the growing evidence of life's potential ingredients make it a possibility that many scientists take seriously.
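You can confirm the exchange was recorded by peeking at the memory (output abbreviated here):

print(memory.load_memory_variables({})["chat_history"])
# [HumanMessage(content='Are there aliens in the universe?'),
#  AIMessage(content='The existence of extraterrestrial life, ...')]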
The Follow-up Question
Now for the real test. We ask a follow-up question that uses the pronoun "they".
question = "Where do they live?"
result = qa({"question": question})
print(result['answer'])
Output:
The universe is vast and the possibilities for life are equally vast. While we haven't yet found definitive proof of alien life, here are some of the most promising places scientists are looking:
Within Our Solar System:
Mars: ...
Europa (moon of Jupiter): ...
Enceladus (moon of Saturn): ...
Beyond Our Solar System (Exoplanets): ...
Success! 🥳 The chain correctly understood that "they" referred to the aliens from the previous turn. It used this context to generate a new search query and provide a relevant, helpful answer about potentially habitable locations. By incorporating memory, you've transformed a simple Q&A bot into a far more intelligent and natural conversational partner.
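One practical caveat before you go: ConversationBufferMemory stores the entire transcript, so long conversations steadily inflate your prompts. To start a fresh conversation, reset it; if you only ever need the last few turns, the classic package also offers ConversationBufferWindowMemory as a drop-in alternative.

memory.clear()  # wipe the stored chat history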
Now, go build something wonderful. 🤖