Follow the instructions below to create the agentic RAG workflow shown above, which includes some advanced capabilities such as:
- user intention detection - the agent can detect if the user wants to activate the web search tool to look for information not present in the documents;
- dynamic chunk retrieval - the number of retrieved chunks is not fixed, but determined dynamically using the reranker's relevance scores and the user-provided `relevance_score_threshold`;
- web search - the agent can search the web for more information if needed.
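The dynamic chunk retrieval step can be sketched in a few lines. This is a simplified illustration of the idea (not Quivr's internal code): instead of passing a fixed top-k to the LLM, only the chunks whose reranker relevance score clears the user-provided threshold are kept.

```python
# Illustration only: the reranker returns (chunk, relevance_score) pairs,
# and the threshold decides how many of them reach the LLM.

def dynamic_retrieve(scored_chunks, relevance_score_threshold):
    """scored_chunks: list of (chunk_text, relevance_score) pairs,
    already sorted by the reranker (highest score first)."""
    return [
        chunk
        for chunk, score in scored_chunks
        if score >= relevance_score_threshold
    ]

reranked = [("chunk A", 0.92), ("chunk B", 0.41), ("chunk C", 0.03)]
kept = dynamic_retrieve(reranked, relevance_score_threshold=0.1)
# only "chunk A" and "chunk B" clear the 0.1 threshold
```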
- Add your API Keys to your environment variables
```python
import os

os.environ["OPENAI_API_KEY"] = "my_openai_api_key"
os.environ["TAVILY_API_KEY"] = "my_tavily_api_key"
```
Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
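If you prefer not to hard-code keys in the script, a small stdlib-only helper (a sketch, `ensure_env` is a hypothetical name) can set them only when they are not already present, so values exported in your shell or loaded from a `.env` file take precedence:

```python
import os

def ensure_env(defaults):
    """Set each key only if it is absent from the environment,
    so real exported values are never overwritten."""
    for key, value in defaults.items():
        os.environ.setdefault(key, value)

ensure_env({
    "OPENAI_API_KEY": "my_openai_api_key",  # placeholder, not a real key
    "TAVILY_API_KEY": "my_tavily_api_key",
})
```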
- Create the YAML file `rag_with_web_search_workflow.yaml` and copy the following content into it:
```yaml
workflow_config:
  name: "RAG with web search"

  # List of tools that the agent can activate if the user instructions require it
  available_tools:
    - "web search"

  nodes:
    - name: "START"
      conditional_edge:
        routing_function: "routing_split"
        conditions: ["edit_system_prompt", "filter_history"]

    - name: "edit_system_prompt"
      edges: ["filter_history"]

    - name: "filter_history"
      edges: ["dynamic_retrieve"]

    - name: "dynamic_retrieve"
      conditional_edge:
        routing_function: "tool_routing"
        conditions: ["run_tool", "generate_rag"]

    - name: "run_tool"
      edges: ["generate_rag"]

    - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user
      edges: ["END"]

  tools:
    - name: "cited_answer"

# Maximum number of previous conversation iterations
# to include in the context of the answer
max_history: 10

# Number of chunks returned by the retriever
k: 40

# Reranker configuration
reranker_config:
  # The reranker supplier to use
  supplier: "cohere"

  # The model to use for the reranker for the given supplier
  model: "rerank-multilingual-v3.0"

  # Number of chunks returned by the reranker
  top_n: 5

  # Among the chunks returned by the reranker, only those with relevance
  # scores equal to or above the relevance_score_threshold will be returned
  # to the LLM to generate the answer (allowed values are between 0 and 1,
  # a value of 0.1 works well with the cohere and jina rerankers)
  relevance_score_threshold: 0.01

# LLM configuration
llm_config:
  # maximum number of tokens passed to the LLM to generate the answer
  max_input_tokens: 8000

  # temperature for the LLM
  temperature: 0.7
```
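To make the interplay of `max_history` and `max_input_tokens` concrete, here is a rough illustration (not Quivr's actual implementation, and token counts are crudely approximated at ~4 characters per token) of how the two limits jointly bound the conversation context sent to the LLM:

```python
def trim_history(history, max_history, max_input_tokens):
    """Keep at most max_history recent turns, then drop the oldest of
    those until the (estimated) token budget is respected."""
    recent = history[-max_history:]      # at most max_history turns
    kept, budget = [], max_input_tokens
    for turn in reversed(recent):        # newest turns first
        cost = max(1, len(turn) // 4)    # crude ~4 chars/token estimate
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    return list(reversed(kept))          # restore chronological order
```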
- Create a Brain with the default configuration
```python
from quivr_core import Brain

brain = Brain.from_files(
    name="my smart brain",
    file_paths=["./my_first_doc.pdf", "./my_second_doc.txt"],
)
```
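The paths passed to `Brain.from_files` need to exist on disk. If you just want to try the workflow without real documents, you can create a small plain-text file first (a real PDF needs valid PDF bytes, so this sketch sticks to the `.txt` example; the file name is taken from the snippet above and the content is a placeholder):

```python
from pathlib import Path

# Create a small placeholder document so Brain.from_files has
# something to ingest; replace with your real files in practice.
doc = Path("./my_second_doc.txt")
doc.write_text(
    "Quivr is a framework for building RAG assistants on your own documents."
)
```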
- Launch a Chat
```python
brain.print_info()

from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt

from quivr_core.config import RetrievalConfig

config_file_name = "./rag_with_web_search_workflow.yaml"
retrieval_config = RetrievalConfig.from_yaml(config_file_name)

console = Console()
console.print(Panel.fit("Ask your brain !", style="bold magenta"))

while True:
    # Get user input
    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

    # Check if user wants to exit
    if question.lower() == "exit":
        console.print(Panel("Goodbye!", style="bold yellow"))
        break

    answer = brain.ask(question, retrieval_config=retrieval_config)
    # Print the answer with typing effect
    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")

    console.print("-" * console.width)

brain.print_info()
```
- You are now all set up to talk with your brain and test different retrieval strategies by simply changing the configuration file!
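One quick way to experiment is to generate config variants programmatically. The helper below is hypothetical (`write_threshold_variant` is not part of Quivr): it rewrites the `relevance_score_threshold` line of a base YAML so each variant file can be fed to `RetrievalConfig.from_yaml` in turn.

```python
def write_threshold_variant(base_yaml: str, threshold: float) -> str:
    """Return a copy of base_yaml with the relevance_score_threshold
    value replaced, preserving the original indentation."""
    out = []
    for line in base_yaml.splitlines():
        if line.strip().startswith("relevance_score_threshold:"):
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}relevance_score_threshold: {threshold}"
        out.append(line)
    return "\n".join(out)

base = """\
reranker_config:
  supplier: "cohere"
  relevance_score_threshold: 0.01
"""
variant = write_threshold_variant(base, 0.1)
# everything except the threshold line is unchanged
```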