chore: docs on quivr-core workflows (#3420)

# Description

Added initial documentation on RAG workflows, along with some Excalidraw diagrams.


## Checklist before requesting a review


- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged

## Screenshots (if appropriate):
Jacopo Chevallard 2024-10-23 11:12:23 +02:00 committed by GitHub
parent 973c678369
commit 8c7277e9ec
14 changed files with 408 additions and 123 deletions

View File

@ -0,0 +1,46 @@
# Configuration
The configuration classes are based on [Pydantic](https://docs.pydantic.dev/latest/) and allow the configuration of the ingestion and retrieval workflows via YAML files.
Below is an example of a YAML configuration file for a basic RAG retrieval workflow.
```yaml
workflow_config:
  name: "standard RAG"
  nodes:
    - name: "START"
      edges: ["filter_history"]
    - name: "filter_history"
      edges: ["rewrite"]
    - name: "rewrite"
      edges: ["retrieve"]
    - name: "retrieve"
      edges: ["generate_rag"]
    - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user, should always start with "generate"
      edges: ["END"]

# Maximum number of previous conversation iterations
# to include in the context of the answer
max_history: 10

prompt: "my prompt"

max_files: 20

reranker_config:
  # The reranker supplier to use
  supplier: "cohere"
  # The model to use for the reranker for the given supplier
  model: "rerank-multilingual-v3.0"
  # Number of chunks returned by the reranker
  top_n: 5

llm_config:
  max_context_tokens: 2000
  temperature: 0.7
  streaming: true
```
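As a quick sanity check, the YAML above can be loaded back into the Pydantic configuration classes. The snippet below is a minimal sketch assuming the file is saved as `./my_retrieval_config.yaml`; `RetrievalConfig.from_yaml` is the loader used in the workflow examples, and `model_dump()` is standard Pydantic v2 for inspecting the parsed fields.
```python
from quivr_core.config import RetrievalConfig

# Assumes the YAML above was saved as ./my_retrieval_config.yaml
retrieval_config = RetrievalConfig.from_yaml("./my_retrieval_config.yaml")

# Pydantic v2 models can be dumped to a plain dict for inspection
print(retrieval_config.model_dump())
```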

View File

@ -22,27 +22,31 @@ brain = Brain.from_files(name = "my smart brain",
3. Launch a Chat
```python
brain.print_info()

from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt

console = Console()
console.print(Panel.fit("Ask your brain !", style="bold magenta"))

while True:
    # Get user input
    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

    # Check if user wants to exit
    if question.lower() == "exit":
        console.print(Panel("Goodbye!", style="bold yellow"))
        break

    answer = brain.ask(question)

    # Print the answer with typing effect
    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")
    console.print("-" * console.width)

brain.print_info()
```
And now you are all set up to talk with your brain!
@ -50,7 +54,7 @@ And now you are all set up to talk with your brain !
## Custom Brain
If you want to change the language model or the embedding model, you can modify the parameters of the brain.
Let's say you want to use an LLM from Mistral and a specific embedding model:
```python
from quivr_core import Brain
from langchain_core.embeddings import Embeddings
@ -68,7 +72,7 @@ Note : [Embeddings](https://python.langchain.com/docs/integrations/text_embeddin
## Launch with Chainlit
If you want to quickly launch an interface with Streamlit, you can simply run the following at the root of the project:
```bash
cd examples/chatbot
rye sync
```

Binary file not shown (image added, 134 KiB).

View File

@ -0,0 +1,77 @@
# Basic ingestion
![](basic_ingestion.excalidraw.png)
Creating a basic ingestion workflow like the one above is simple; here are the steps:
1. Add your API Keys to your environment variables
```python
import os
os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
```
Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
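If you prefer not to hard-code keys in your script, one common alternative (not required by Quivr, shown here only as a sketch) is to keep them in a local `.env` file and load it with the `python-dotenv` package:
```python
# Optional alternative: load OPENAI_API_KEY and friends from a local .env file.
# Requires `pip install python-dotenv`; Quivr itself does not depend on it.
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current working directory
```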
2. Create the YAML file ``basic_ingestion_workflow.yaml`` and copy the following content in it
```yaml
parser_config:
  megaparse_config:
    strategy: "auto" # for unstructured, it can be "auto", "fast", "hi_res", "ocr_only", see https://docs.unstructured.io/open-source/concepts/partitioning-strategies#partitioning-strategies
    pdf_parser: "unstructured"
  splitter_config:
    chunk_size: 400 # in tokens
    chunk_overlap: 100 # in tokens
```
3. Create a Brain using the above configuration and the list of files you want to ingest
```python
from quivr_core import Brain
from quivr_core.config import IngestionConfig

config_file_name = "./basic_ingestion_workflow.yaml"

ingestion_config = IngestionConfig.from_yaml(config_file_name)

processor_kwargs = {
    "megaparse_config": ingestion_config.parser_config.megaparse_config,
    "splitter_config": ingestion_config.parser_config.splitter_config,
}

brain = Brain.from_files(
    name="my smart brain",
    file_paths=["./my_first_doc.pdf", "./my_second_doc.txt"],
    processor_kwargs=processor_kwargs,
)
```
4. Launch a Chat
```python
brain.print_info()

from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt

console = Console()
console.print(Panel.fit("Ask your brain !", style="bold magenta"))

while True:
    # Get user input
    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

    # Check if user wants to exit
    if question.lower() == "exit":
        console.print(Panel("Goodbye!", style="bold yellow"))
        break

    answer = brain.ask(question)

    # Print the answer with typing effect
    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")
    console.print("-" * console.width)

brain.print_info()
```
5. You are now all set up to talk with your brain and test different chunking strategies by simply changing the configuration file!
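For instance, a minimal sketch of such a comparison builds one brain per configuration file and asks each the same question. The second file name, `./large_chunks_workflow.yaml`, is hypothetical; it would simply be a copy of the YAML above with a different `chunk_size`.
```python
from quivr_core import Brain
from quivr_core.config import IngestionConfig

# "./large_chunks_workflow.yaml" is a hypothetical second config,
# e.g. the same YAML as above but with chunk_size: 1000
for config_file in ["./basic_ingestion_workflow.yaml", "./large_chunks_workflow.yaml"]:
    ingestion_config = IngestionConfig.from_yaml(config_file)
    processor_kwargs = {
        "megaparse_config": ingestion_config.parser_config.megaparse_config,
        "splitter_config": ingestion_config.parser_config.splitter_config,
    }
    brain = Brain.from_files(
        name=f"brain ({config_file})",
        file_paths=["./my_first_doc.pdf", "./my_second_doc.txt"],
        processor_kwargs=processor_kwargs,
    )
    answer = brain.ask("What is this document about?")
    print(config_file, "->", answer.answer)
```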

Binary file not shown (image added, 191 KiB).

View File

@ -0,0 +1,106 @@
# Basic RAG
![](basic_rag.excalidraw.png)
Creating a basic RAG workflow like the one above is simple; here are the steps:
1. Add your API Keys to your environment variables
```python
import os
os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
```
Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
2. Create the YAML file ``basic_rag_workflow.yaml`` and copy the following content in it
```yaml
workflow_config:
  name: "standard RAG"
  nodes:
    - name: "START"
      edges: ["filter_history"]
    - name: "filter_history"
      edges: ["rewrite"]
    - name: "rewrite"
      edges: ["retrieve"]
    - name: "retrieve"
      edges: ["generate_rag"]
    - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user
      edges: ["END"]

# Maximum number of previous conversation iterations
# to include in the context of the answer
max_history: 10

# Reranker configuration
reranker_config:
  # The reranker supplier to use
  supplier: "cohere"
  # The model to use for the reranker for the given supplier
  model: "rerank-multilingual-v3.0"
  # Number of chunks returned by the reranker
  top_n: 5

# Configuration for the LLM
llm_config:
  # maximum number of tokens passed to the LLM to generate the answer
  max_input_tokens: 4000
  # temperature for the LLM
  temperature: 0.7
```
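Before wiring up the chat loop, it can be useful to check that the file parses into a `RetrievalConfig` (the same loader used in step 4 below). The attribute names printed in this sketch are an assumption based on the YAML keys above.
```python
from quivr_core.config import RetrievalConfig

retrieval_config = RetrievalConfig.from_yaml("./basic_rag_workflow.yaml")

# Attribute names assumed to mirror the YAML keys above
print(retrieval_config.reranker_config)
print(retrieval_config.llm_config)
```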
3. Create a Brain with the default configuration
```python
from quivr_core import Brain

brain = Brain.from_files(
    name="my smart brain",
    file_paths=["./my_first_doc.pdf", "./my_second_doc.txt"],
)
```
4. Launch a Chat
```python
brain.print_info()

from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt

from quivr_core.config import RetrievalConfig

config_file_name = "./basic_rag_workflow.yaml"
retrieval_config = RetrievalConfig.from_yaml(config_file_name)

console = Console()
console.print(Panel.fit("Ask your brain !", style="bold magenta"))

while True:
    # Get user input
    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

    # Check if user wants to exit
    if question.lower() == "exit":
        console.print(Panel("Goodbye!", style="bold yellow"))
        break

    answer = brain.ask(question, retrieval_config=retrieval_config)

    # Print the answer with typing effect
    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")
    console.print("-" * console.width)

brain.print_info()
```
5. You are now all set up to talk with your brain and test different retrieval strategies by simply changing the configuration file!

View File

@ -1,88 +0,0 @@
# Chat
Creating a custom brain workflow is simple; here are the steps:
1. Create a workflow
2. Create a Brain with this workflow and append your files
3. Launch a chat
4. Chat with your brain!
### Use AssistantConfig
First, create a YAML configuration file in the rag_config_workflow.yaml format (see workflows):
```yaml
ingestion_config:
  parser_config:
    megaparse_config:
      strategy: "fast"
      pdf_parser: "unstructured"
    splitter_config:
      chunk_size: 400
      chunk_overlap: 100

retrieval_config:
  workflow_config:
    name: "standard RAG"
    nodes:
      - name: "START"
        edges: ["filter_history"]
      - name: "filter_history"
        edges: ["generate_chat_llm"]
      - name: "generate_chat_llm" # the name of the last node, from which we want to stream the answer to the user, should always start with "generate"
        edges: ["END"]

  # Maximum number of previous conversation iterations
  # to include in the context of the answer
  max_history: 10

  #prompt: "my prompt"

  max_files: 20

  reranker_config:
    # The reranker supplier to use
    supplier: "cohere"
    # The model to use for the reranker for the given supplier
    model: "rerank-multilingual-v3.0"
    # Number of chunks returned by the reranker
    top_n: 5

  llm_config:
    # The LLM supplier to use
    supplier: "openai"
    # The model to use for the LLM for the given supplier
    model: "gpt-3.5-turbo-0125"
    # Maximum number of tokens to pass to the LLM
    # as a context to generate the answer
    max_input_tokens: 2000
    max_output_tokens: 2000
    temperature: 0.7
    streaming: true
```
This brain is set up to:
* Filter history and keep only the latest conversations
* Ask the question to the brain
* Generate the answer
Then, when instantiating your Brain, add the custom config you created:
```python
# Import path assumed by analogy with the other configuration classes
from quivr_core.config import AssistantConfig

assistant_config = AssistantConfig.from_yaml("my_config_file.yaml")

processor_kwargs = {
    "assistant_config": assistant_config,
}
```
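The snippet above builds `processor_kwargs` but does not use it yet. Assuming it is forwarded to `Brain.from_files` in the same way as in the ingestion example, the remaining step would look roughly like this:
```python
from quivr_core import Brain

# Assumption: the assistant_config is passed through processor_kwargs,
# exactly as the ingestion example passes its parser and splitter configs.
brain = Brain.from_files(
    name="my smart brain",
    file_paths=["./my_first_doc.pdf", "./my_second_doc.txt"],
    processor_kwargs=processor_kwargs,
)
```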

View File

@ -1 +0,0 @@
# RAG with Internet

Binary file not shown (image added, 236 KiB).

View File

@ -0,0 +1,135 @@
# RAG with web search
![](rag_with_web_search.excalidraw.png)
Follow the instructions below to create the agentic RAG workflow shown above, which includes some advanced capabilities such as:
* **user intention detection** - the agent can detect if the user wants to activate the web search tool to look for information not present in the documents;
* **dynamic chunk retrieval** - the number of retrieved chunks is not fixed, but determined dynamically using the reranker's relevance scores and the user-provided ``relevance_score_threshold`` (see the short sketch after this list);
* **web search** - the agent can search the web for more information if needed.
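To make the dynamic retrieval idea concrete, here is a small self-contained sketch (an illustration only, not quivr-core's implementation) of the thresholding step: among the reranked chunks, only those whose relevance score clears the user-provided threshold are kept.
```python
# Illustration only (not quivr-core's code): keep the chunks whose reranker
# relevance score is greater than or equal to the user-provided threshold.
def filter_by_relevance(
    scored_chunks: list[tuple[str, float]],
    relevance_score_threshold: float,
) -> list[str]:
    return [
        chunk
        for chunk, score in scored_chunks
        if score >= relevance_score_threshold
    ]

# With a threshold of 0.1, only the first two chunks are kept.
chunks = [("chunk A", 0.82), ("chunk B", 0.15), ("chunk C", 0.04)]
print(filter_by_relevance(chunks, relevance_score_threshold=0.1))
# -> ['chunk A', 'chunk B']
```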
---
1. Add your API Keys to your environment variables
```python
import os
os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
```
Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
2. Create the YAML file ``rag_with_web_search_workflow.yaml`` and copy the following content in it
```yaml
workflow_config:
  name: "RAG with web search"

  # List of tools that the agent can activate if the user instructions require it
  available_tools:
    - "web search"

  nodes:
    - name: "START"
      conditional_edge:
        routing_function: "routing_split"
        conditions: ["edit_system_prompt", "filter_history"]
    - name: "edit_system_prompt"
      edges: ["filter_history"]
    - name: "filter_history"
      edges: ["dynamic_retrieve"]
    - name: "dynamic_retrieve"
      conditional_edge:
        routing_function: "tool_routing"
        conditions: ["run_tool", "generate_rag"]
    - name: "run_tool"
      edges: ["generate_rag"]
    - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user
      edges: ["END"]
      tools:
        - name: "cited_answer"

# Maximum number of previous conversation iterations
# to include in the context of the answer
max_history: 10

# Number of chunks returned by the retriever
k: 40

# Reranker configuration
reranker_config:
  # The reranker supplier to use
  supplier: "cohere"
  # The model to use for the reranker for the given supplier
  model: "rerank-multilingual-v3.0"
  # Number of chunks returned by the reranker
  top_n: 5
  # Among the chunks returned by the reranker, only those with relevance
  # scores equal or above the relevance_score_threshold will be returned
  # to the LLM to generate the answer (allowed values are between 0 and 1,
  # a value of 0.1 works well with the cohere and jina rerankers)
  relevance_score_threshold: 0.01

# LLM configuration
llm_config:
  # maximum number of tokens passed to the LLM to generate the answer
  max_input_tokens: 8000
  # temperature for the LLM
  temperature: 0.7
```
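The `conditional_edge` entries name a routing function that decides which of the listed `conditions` to follow next. The sketch below is purely conceptual (quivr-core resolves `routing_function` names internally; this is not its API): it only illustrates the idea of mapping the current workflow state to one of the node names.
```python
# Conceptual illustration only: not quivr-core's implementation or API.
# A routing function inspects the workflow state and returns the name of
# the next node, chosen among the "conditions" declared in the YAML.
def tool_routing(state: dict) -> str:
    # Hypothetical state key: assume an earlier node recorded which tools
    # (if any) the user's instructions activated.
    if state.get("activated_tools"):
        return "run_tool"      # run the web search tool before answering
    return "generate_rag"      # otherwise answer directly from the documents
```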
3. Create a Brain with the default configuration
```python
from quivr_core import Brain

brain = Brain.from_files(
    name="my smart brain",
    file_paths=["./my_first_doc.pdf", "./my_second_doc.txt"],
)
```
4. Launch a Chat
```python
brain.print_info()

from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt

from quivr_core.config import RetrievalConfig

config_file_name = "./rag_with_web_search_workflow.yaml"
retrieval_config = RetrievalConfig.from_yaml(config_file_name)

console = Console()
console.print(Panel.fit("Ask your brain !", style="bold magenta"))

while True:
    # Get user input
    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

    # Check if user wants to exit
    if question.lower() == "exit":
        console.print(Panel("Goodbye!", style="bold yellow"))
        break

    answer = brain.ask(question, retrieval_config=retrieval_config)

    # Print the answer with typing effect
    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")
    console.print("-" * console.width)

brain.print_info()
```
5. You are now all set up to talk with your brain and test different retrieval strategies by simply changing the configuration file!
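Since the agent decides on its own when to activate the web search tool, an easy way to exercise it is to ask something that clearly cannot be answered from your documents. The question below is just an example; any explicit request for fresh, external information should trigger the same path.
```python
# A question unlikely to be answered by the ingested documents, so the agent
# should route through the "run_tool" (web search) node before answering.
answer = brain.ask(
    "Search the web: what is the latest stable release of LangChain?",
    retrieval_config=retrieval_config,
)
print(answer.answer)
```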

View File

@ -1 +1,3 @@
# Workflows
In this section, you will find examples of workflows that you can use to create your own agentic RAG systems.

View File

@ -79,10 +79,14 @@ nav:
- Workflows:
  - workflows/index.md
  - Examples:
    - workflows/examples/basic_ingestion.md
    - workflows/examples/basic_rag.md
    - workflows/examples/rag_with_web_search.md
- Configuration:
  - config/index.md
  - config/config.md
  - config/base_config.md
- Examples:
  - examples/index.md
  - examples/custom_storage.md
- Enterprise: https://docs.quivr.app/

View File

@ -25,6 +25,8 @@ anthropic==0.36.1
anyio==4.6.2.post1
# via anthropic
# via httpx
appnope==0.1.4
# via ipykernel
asttokens==2.4.1
# via stack-data
attrs==24.2.0
@ -76,8 +78,6 @@ fsspec==2024.9.0
# via huggingface-hub
ghp-import==2.1.0
# via mkdocs
greenlet==3.1.1
# via sqlalchemy
griffe==1.2.0
# via mkdocstrings-python
h11==0.14.0

View File

@ -25,6 +25,8 @@ anthropic==0.36.1
anyio==4.6.2.post1
# via anthropic
# via httpx
appnope==0.1.4
# via ipykernel
asttokens==2.4.1
# via stack-data
attrs==24.2.0
@ -76,8 +78,6 @@ fsspec==2024.9.0
# via huggingface-hub
ghp-import==2.1.0
# via mkdocs
greenlet==3.1.1
# via sqlalchemy
griffe==1.2.0
# via mkdocstrings-python
h11==0.14.0