chore: docs on quivr-core workflows (#3420)

# Description

Added some initial documentation on RAG workflows, including some Excalidraw diagrams.


## Checklist before requesting a review

Please delete options that are not relevant.

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged

## Screenshots (if appropriate):
Commit 8c7277e9ec by Jacopo Chevallard, 2024-10-23 11:12:23 +02:00, committed via GitHub (parent 973c678369).
14 changed files with 408 additions and 123 deletions

View File

@@ -0,0 +1,46 @@
# Configuration
The configuration classes are based on [Pydantic](https://docs.pydantic.dev/latest/) and allow the configuration of the ingestion and retrieval workflows via YAML files.
Below is an example of a YAML configuration file for a basic RAG retrieval workflow.
```yaml
workflow_config:
  name: "standard RAG"
  nodes:
    - name: "START"
      edges: ["filter_history"]
    - name: "filter_history"
      edges: ["rewrite"]
    - name: "rewrite"
      edges: ["retrieve"]
    - name: "retrieve"
      edges: ["generate_rag"]
    - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user, should always start with "generate"
      edges: ["END"]

# Maximum number of previous conversation iterations
# to include in the context of the answer
max_history: 10

prompt: "my prompt"

max_files: 20

reranker_config:
  # The reranker supplier to use
  supplier: "cohere"
  # The model to use for the reranker for the given supplier
  model: "rerank-multilingual-v3.0"
  # Number of chunks returned by the reranker
  top_n: 5

llm_config:
  max_context_tokens: 2000
  temperature: 0.7
  streaming: true
```
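The retrieval examples in the workflows section load such a file through the corresponding config class. Below is a minimal sketch of that pattern, assuming the YAML above is saved under the hypothetical name `retrieval_workflow.yaml` and that `RetrievalConfig.from_yaml` accepts a file path, as shown in those examples:

```python
from quivr_core.config import RetrievalConfig

# Hypothetical file name; point this at wherever you saved the YAML above.
config_file_name = "./retrieval_workflow.yaml"

# Parse and validate the YAML into the Pydantic configuration model.
retrieval_config = RetrievalConfig.from_yaml(config_file_name)

# The validated values are then available as attributes (names assumed to
# mirror the YAML keys), e.g. before passing the config to brain.ask(...).
print(retrieval_config.max_history)
```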

View File

@@ -3,7 +3,7 @@
If you need to quickly start talking to your list of files, here are the steps.

1. Add your API Keys to your environment variables

```python
import os

os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
```

@@ -11,51 +11,55 @@ os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.

2. Create a Brain with Quivr default configuration

```python
from quivr_core import Brain

brain = Brain.from_files(name = "my smart brain",
            file_paths = ["/my_smart_doc.pdf", "/my_intelligent_doc.txt"],
            )
```

3. Launch a Chat

```python
brain.print_info()

from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt

console = Console()
console.print(Panel.fit("Ask your brain !", style="bold magenta"))

while True:
    # Get user input
    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

    # Check if user wants to exit
    if question.lower() == "exit":
        console.print(Panel("Goodbye!", style="bold yellow"))
        break

    answer = brain.ask(question)
    # Print the answer with typing effect
    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")

    console.print("-" * console.width)

brain.print_info()
```

And now you are all set up to talk with your brain!

## Custom Brain

If you want to change the language or embeddings model, you can modify the parameters of the brain.

Let's say you want to use an LLM from Mistral and a specific embedding model:

```python
from quivr_core import Brain
from langchain_core.embeddings import Embeddings

brain = Brain.from_files(name = "my smart brain",
            file_paths = ["/my_smart_doc.pdf", "/my_intelligent_doc.txt"],
            llm=LLMEndpoint(
                llm_config=LLMEndpointConfig(model="mistral-small-latest", llm_base_url="https://api.mistral.ai/v1/chat/completions"),
                # ... (snippet continues beyond this hunk)
```
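The diff cuts this example off at the hunk boundary. Purely as a hedged sketch of how the call above might be completed: the `LLMEndpoint`/`LLMEndpointConfig` import paths below are assumptions, and the embeddings argument is omitted because it is not visible in this hunk.

```python
from quivr_core import Brain
from quivr_core.config import LLMEndpointConfig  # assumed import path
from quivr_core.llm import LLMEndpoint           # assumed import path

# Hypothetical completion of the truncated snippet above: a Brain whose answers
# are generated by a Mistral model reached through its OpenAI-compatible API.
brain = Brain.from_files(
    name="my smart brain",
    file_paths=["/my_smart_doc.pdf", "/my_intelligent_doc.txt"],
    llm=LLMEndpoint(
        llm_config=LLMEndpointConfig(
            model="mistral-small-latest",
            llm_base_url="https://api.mistral.ai/v1/chat/completions",
        ),
    ),
)
```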
@@ -68,12 +72,12 @@ Note : [Embeddings](https://python.langchain.com/docs/integrations/text_embeddin

## Launch with Chainlit

If you want to quickly launch an interface with Chainlit, you can simply run the following at the root of the project:

```bash
cd examples/chatbot
rye sync
rye run chainlit run chainlit.py
```

For more detail, see [examples/chatbot/chainlit.md](https://github.com/QuivrHQ/quivr/tree/main/examples/chatbot)

Note: Modify the Brain configs directly in examples/chatbot/main.py;

Binary file not shown (new image, 134 KiB).

View File

@@ -0,0 +1,77 @@
# Basic ingestion
![](basic_ingestion.excalidraw.png)
Creating a basic ingestion workflow like the one above is simple; here are the steps:
1. Add your API Keys to your environment variables
```python
import os
os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
```
Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
2. Create the YAML file ``basic_ingestion_workflow.yaml`` and copy the following content into it
```yaml
parser_config:
  megaparse_config:
    strategy: "auto" # for unstructured, it can be "auto", "fast", "hi_res", "ocr_only", see https://docs.unstructured.io/open-source/concepts/partitioning-strategies#partitioning-strategies
    pdf_parser: "unstructured"
  splitter_config:
    chunk_size: 400 # in tokens
    chunk_overlap: 100 # in tokens
```
3. Create a Brain using the above configuration and the list of files you want to ingest
```python
from quivr_core import Brain
from quivr_core.config import IngestionConfig

config_file_name = "./basic_ingestion_workflow.yaml"

ingestion_config = IngestionConfig.from_yaml(config_file_name)

processor_kwargs = {
    "megaparse_config": ingestion_config.parser_config.megaparse_config,
    "splitter_config": ingestion_config.parser_config.splitter_config,
}

brain = Brain.from_files(name = "my smart brain",
            file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
            processor_kwargs=processor_kwargs,
            )
```
4. Launch a Chat
```python
brain.print_info()

from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt

console = Console()
console.print(Panel.fit("Ask your brain !", style="bold magenta"))

while True:
    # Get user input
    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

    # Check if user wants to exit
    if question.lower() == "exit":
        console.print(Panel("Goodbye!", style="bold yellow"))
        break

    answer = brain.ask(question)
    # Print the answer with typing effect
    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")

    console.print("-" * console.width)

brain.print_info()
```
5. You are now all set up to talk with your brain and test different chunking strategies by simply changing the configuration file!
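Building on step 5, here is a minimal sketch of one way to compare two chunking strategies side by side; ``small_chunks_workflow.yaml`` is a hypothetical copy of the file above with a smaller ``chunk_size``, and the helper function is only for illustration:

```python
from quivr_core import Brain
from quivr_core.config import IngestionConfig


def brain_from_ingestion_yaml(name: str, yaml_path: str) -> Brain:
    # Load the ingestion workflow and pass its parser settings to the
    # processors, exactly as in step 3 above.
    ingestion_config = IngestionConfig.from_yaml(yaml_path)
    processor_kwargs = {
        "megaparse_config": ingestion_config.parser_config.megaparse_config,
        "splitter_config": ingestion_config.parser_config.splitter_config,
    }
    return Brain.from_files(
        name=name,
        file_paths=["./my_first_doc.pdf", "./my_second_doc.txt"],
        processor_kwargs=processor_kwargs,
    )


# "./small_chunks_workflow.yaml" is a hypothetical copy of
# basic_ingestion_workflow.yaml with chunk_size: 200.
brain_400 = brain_from_ingestion_yaml("400-token chunks", "./basic_ingestion_workflow.yaml")
brain_200 = brain_from_ingestion_yaml("200-token chunks", "./small_chunks_workflow.yaml")

question = "What is this document about?"
for brain in (brain_400, brain_200):
    print(brain.ask(question).answer)
```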

Binary file not shown (new image, 191 KiB).

View File

@@ -0,0 +1,106 @@
# Basic RAG
![](basic_rag.excalidraw.png)
Creating a basic RAG workflow like the one above is simple; here are the steps:
1. Add your API Keys to your environment variables
```python
import os
os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
```
Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
2. Create the YAML file ``basic_rag_workflow.yaml`` and copy the following content into it
```yaml
workflow_config:
  name: "standard RAG"
  nodes:
    - name: "START"
      edges: ["filter_history"]
    - name: "filter_history"
      edges: ["rewrite"]
    - name: "rewrite"
      edges: ["retrieve"]
    - name: "retrieve"
      edges: ["generate_rag"]
    - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user
      edges: ["END"]

# Maximum number of previous conversation iterations
# to include in the context of the answer
max_history: 10

# Reranker configuration
reranker_config:
  # The reranker supplier to use
  supplier: "cohere"
  # The model to use for the reranker for the given supplier
  model: "rerank-multilingual-v3.0"
  # Number of chunks returned by the reranker
  top_n: 5

# Configuration for the LLM
llm_config:
  # maximum number of tokens passed to the LLM to generate the answer
  max_input_tokens: 4000
  # temperature for the LLM
  temperature: 0.7
```
3. Create a Brain with the default configuration
```python
from quivr_core import Brain
brain = Brain.from_files(name = "my smart brain",
file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
)
```
4. Launch a Chat
```python
brain.print_info()

from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt

from quivr_core.config import RetrievalConfig

config_file_name = "./basic_rag_workflow.yaml"

retrieval_config = RetrievalConfig.from_yaml(config_file_name)

console = Console()
console.print(Panel.fit("Ask your brain !", style="bold magenta"))

while True:
    # Get user input
    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

    # Check if user wants to exit
    if question.lower() == "exit":
        console.print(Panel("Goodbye!", style="bold yellow"))
        break

    answer = brain.ask(question, retrieval_config=retrieval_config)
    # Print the answer with typing effect
    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")

    console.print("-" * console.width)

brain.print_info()
```
5. You are now all set up to talk with your brain and test different retrieval strategies by simply changing the configuration file!
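Because the configuration classes are Pydantic models, a loaded configuration can also be tweaked in code before it is passed to `brain.ask`. A minimal sketch, assuming the attribute names mirror the YAML keys above:

```python
from quivr_core import Brain
from quivr_core.config import RetrievalConfig

brain = Brain.from_files(name = "my smart brain",
            file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
            )

retrieval_config = RetrievalConfig.from_yaml("./basic_rag_workflow.yaml")

# Attribute names are assumed to mirror the YAML keys above.
retrieval_config.reranker_config.top_n = 3     # keep only the 3 best chunks
retrieval_config.llm_config.temperature = 0.2  # make answers more deterministic

answer = brain.ask("Summarise my documents", retrieval_config=retrieval_config)
print(answer.answer)
```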

View File

@@ -1,88 +0,0 @@
# Chat
Creating a custom brain workflow is simple; here are the steps:
1. Create a workflow
2. Create a Brain with this workflow and append your files
3. Launch a chat
4. Chat with your brain!
### Use AssistantConfig
First, create a YAML configuration file in the rag_config_workflow.yaml format (see workflows):
```yaml
ingestion_config:
  parser_config:
    megaparse_config:
      strategy: "fast"
      pdf_parser: "unstructured"
    splitter_config:
      chunk_size: 400
      chunk_overlap: 100

retrieval_config:
  workflow_config:
    name: "standard RAG"
    nodes:
      - name: "START"
        edges: ["filter_history"]
      - name: "filter_history"
        edges: ["generate_chat_llm"]
      - name: "generate_chat_llm" # the name of the last node, from which we want to stream the answer to the user, should always start with "generate"
        edges: ["END"]

  # Maximum number of previous conversation iterations
  # to include in the context of the answer
  max_history: 10

  #prompt: "my prompt"

  max_files: 20

  reranker_config:
    # The reranker supplier to use
    supplier: "cohere"
    # The model to use for the reranker for the given supplier
    model: "rerank-multilingual-v3.0"
    # Number of chunks returned by the reranker
    top_n: 5

  llm_config:
    # The LLM supplier to use
    supplier: "openai"
    # The model to use for the LLM for the given supplier
    model: "gpt-3.5-turbo-0125"
    max_input_tokens: 2000
    # Maximum number of tokens to pass to the LLM
    # as a context to generate the answer
    max_output_tokens: 2000
    temperature: 0.7
    streaming: true
```
This brain is set up to:
* Filter history and keep only the latest conversations
* Ask the question to the brain
* Generate answer
Then, when instantiating your Brain, add the custom config you created:
```python
assistant_config = AssistantConfig.from_yaml("my_config_file.yaml")
processor_kwargs = {
    "assistant_config": assistant_config
}
```

View File

@@ -1 +0,0 @@
# RAG with Internet

Binary file not shown (new image, 236 KiB).

View File

@@ -0,0 +1,135 @@
# RAG with web search
![](rag_with_web_search.excalidraw.png)
Follow the instructions below to create the agentic RAG workflow shown above, which includes some advanced capabilities such as:
* **user intention detection** - the agent can detect if the user wants to activate the web search tool to look for information not present in the documents;
* **dynamic chunk retrieval** - the number of retrieved chunks is not fixed, but determined dynamically using the reranker's relevance scores and the user-provided ``relevance_score_threshold`` (see the sketch after this list);
* **web search** - the agent can search the web for more information if needed.
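To make the dynamic-retrieval bullet concrete, here is a purely illustrative sketch (not quivr-core's actual implementation) of what thresholding reranker scores looks like:

```python
def filter_by_relevance(scored_chunks, relevance_score_threshold=0.01):
    """Keep only the chunks whose reranker relevance score clears the threshold.

    scored_chunks: list of (chunk, score) pairs as returned by a reranker.
    The number of surviving chunks therefore varies from question to question.
    """
    return [chunk for chunk, score in scored_chunks if score >= relevance_score_threshold]


# Example: only the first two chunks survive a 0.01 threshold.
chunks = [("chunk A", 0.92), ("chunk B", 0.15), ("chunk C", 0.004)]
print(filter_by_relevance(chunks))  # ['chunk A', 'chunk B']
```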
---
1. Add your API Keys to your environment variables
```python
import os
os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
```
Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
2. Create the YAML file ``rag_with_web_search_workflow.yaml`` and copy the following content into it
```yaml
workflow_config:
  name: "RAG with web search"

  # List of tools that the agent can activate if the user instructions require it
  available_tools:
    - "web search"

  nodes:
    - name: "START"
      conditional_edge:
        routing_function: "routing_split"
        conditions: ["edit_system_prompt", "filter_history"]
    - name: "edit_system_prompt"
      edges: ["filter_history"]
    - name: "filter_history"
      edges: ["dynamic_retrieve"]
    - name: "dynamic_retrieve"
      conditional_edge:
        routing_function: "tool_routing"
        conditions: ["run_tool", "generate_rag"]
    - name: "run_tool"
      edges: ["generate_rag"]
    - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user
      edges: ["END"]

  tools:
    - name: "cited_answer"

# Maximum number of previous conversation iterations
# to include in the context of the answer
max_history: 10

# Number of chunks returned by the retriever
k: 40

# Reranker configuration
reranker_config:
  # The reranker supplier to use
  supplier: "cohere"
  # The model to use for the reranker for the given supplier
  model: "rerank-multilingual-v3.0"
  # Number of chunks returned by the reranker
  top_n: 5
  # Among the chunks returned by the reranker, only those with relevance
  # scores equal or above the relevance_score_threshold will be returned
  # to the LLM to generate the answer (allowed values are between 0 and 1,
  # a value of 0.1 works well with the cohere and jina rerankers)
  relevance_score_threshold: 0.01

# LLM configuration
llm_config:
  # maximum number of tokens passed to the LLM to generate the answer
  max_input_tokens: 8000
  # temperature for the LLM
  temperature: 0.7
```
3. Create a Brain with the default configuration
```python
from quivr_core import Brain
brain = Brain.from_files(name = "my smart brain",
            file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
            )
```
4. Launch a Chat
```python
brain.print_info()

from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt

from quivr_core.config import RetrievalConfig

config_file_name = "./rag_with_web_search_workflow.yaml"

retrieval_config = RetrievalConfig.from_yaml(config_file_name)

console = Console()
console.print(Panel.fit("Ask your brain !", style="bold magenta"))

while True:
    # Get user input
    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

    # Check if user wants to exit
    if question.lower() == "exit":
        console.print(Panel("Goodbye!", style="bold yellow"))
        break

    answer = brain.ask(question, retrieval_config=retrieval_config)
    # Print the answer with typing effect
    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")

    console.print("-" * console.width)

brain.print_info()
```
5. You are now all set up to talk with your brain and test different retrieval strategies by simply changing the configuration file!
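As a hedged sketch of how the web search tool might come into play: according to the workflow above, an instruction that explicitly asks for information from the web can route through the ``run_tool`` node, but whether the tool is actually activated is decided by the LLM's intent detection, not by this code.

```python
from quivr_core import Brain
from quivr_core.config import RetrievalConfig

brain = Brain.from_files(name = "my smart brain",
            file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
            )
retrieval_config = RetrievalConfig.from_yaml("./rag_with_web_search_workflow.yaml")

# An explicit request to look something up online gives the agent a reason to
# activate the "web search" tool listed under available_tools.
answer = brain.ask(
    "Search the web for the latest quivr-core release notes and relate them to my documents",
    retrieval_config=retrieval_config,
)
print(answer.answer)
```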

View File

@@ -1 +1,3 @@
-# Configuration
+# Workflows
+
+In this section, you will find examples of workflows that you can use to create your own agentic RAG systems.

View File

@@ -79,10 +79,14 @@ nav:
   - Workflows:
     - workflows/index.md
     - Examples:
-      - workflows/examples/chat.md
-      - workflows/examples/rag_with_internet.md
+      - workflows/examples/basic_ingestion.md
+      - workflows/examples/basic_rag.md
+      - workflows/examples/rag_with_web_search.md
   - Configuration:
     - config/index.md
+    - config/base_config.md
     - config/config.md
-    - config/base_config.md
+  - Examples:
+    - examples/index.md
+    - examples/custom_storage.md
   - Enterprise: https://docs.quivr.app/

View File

@@ -25,6 +25,8 @@ anthropic==0.36.1
 anyio==4.6.2.post1
     # via anthropic
     # via httpx
+appnope==0.1.4
+    # via ipykernel
 asttokens==2.4.1
     # via stack-data
 attrs==24.2.0
@@ -76,8 +78,6 @@ fsspec==2024.9.0
     # via huggingface-hub
 ghp-import==2.1.0
     # via mkdocs
-greenlet==3.1.1
-    # via sqlalchemy
 griffe==1.2.0
     # via mkdocstrings-python
 h11==0.14.0

View File

@@ -25,6 +25,8 @@ anthropic==0.36.1
 anyio==4.6.2.post1
     # via anthropic
     # via httpx
+appnope==0.1.4
+    # via ipykernel
 asttokens==2.4.1
     # via stack-data
 attrs==24.2.0
@@ -76,8 +78,6 @@ fsspec==2024.9.0
     # via huggingface-hub
 ghp-import==2.1.0
     # via mkdocs
-greenlet==3.1.1
-    # via sqlalchemy
 griffe==1.2.0
     # via mkdocstrings-python
 h11==0.14.0