chore: docs on quivr-core workflows (#3420)

# Description

Added some initial documentation on RAG workflows, including some Excalidraw diagrams.

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

## Checklist before requesting a review

Please delete options that are not relevant.

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged

## Screenshots (if appropriate):
2024-11-21 16:12:42 +03:00 · 2024-10-23 11:12:23 +02:00 · 2024-10-23 11:12:23 +02:00 · 8c7277e9ec
commit 8c7277e9ec
parent 973c678369
14 changed files with 408 additions and 123 deletions
--- a/docs/docs/config/index.md
+++ b/docs/docs/config/index.md
@ -0,0 +1,46 @@
+# Configuration
+
+The configuration classes are based on [Pydantic](https://docs.pydantic.dev/latest/) and allow the configuration of the ingestion and retrieval workflows via YAML files.
+
+Below is an example of a YAML configuration file for a basic RAG retrieval workflow.
+```yaml
+workflow_config:
+  name: "standard RAG"
+  nodes:
+    - name: "START"
+      edges: ["filter_history"]
+
+    - name: "filter_history"
+      edges: ["rewrite"]
+
+    - name: "rewrite"
+      edges: ["retrieve"]
+
+    - name: "retrieve"
+      edges: ["generate_rag"]
+
+    - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user, should always start with "generate"
+      edges: ["END"]
+# Maximum number of previous conversation iterations
+# to include in the context of the answer
+max_history: 10
+
+prompt: "my prompt"
+
+max_files: 20
+reranker_config:
+  # The reranker supplier to use
+  supplier: "cohere"
+
+  # The model to use for the reranker for the given supplier
+  model: "rerank-multilingual-v3.0"
+
+  # Number of chunks returned by the reranker
+  top_n: 5
+llm_config:
+
+  max_context_tokens: 2000
+
+  temperature: 0.7
+  streaming: true
+```
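To illustrate how the `nodes` and `edges` entries in the YAML above describe a graph, here is a minimal sketch that walks a linear workflow from `START` to `END`. The `nodes` list is a hand-written stand-in for the parsed YAML, and `linear_path` is a hypothetical helper, not part of quivr-core; the actual library builds its graph through its own configuration classes.

```python
# Hypothetical stand-in for the parsed `workflow_config.nodes` list shown above.
nodes = [
    {"name": "START", "edges": ["filter_history"]},
    {"name": "filter_history", "edges": ["rewrite"]},
    {"name": "rewrite", "edges": ["retrieve"]},
    {"name": "retrieve", "edges": ["generate_rag"]},
    {"name": "generate_rag", "edges": ["END"]},
]


def linear_path(nodes: list[dict]) -> list[str]:
    """Follow the first edge of each node from START until END is reached."""
    edges = {node["name"]: node["edges"] for node in nodes}
    path = ["START"]
    while path[-1] != "END":
        current = path[-1]
        if current not in edges:
            raise ValueError(f"edge points to undefined node: {current!r}")
        next_node = edges[current][0]
        if next_node in path:
            raise ValueError(f"cycle detected at node: {next_node!r}")
        path.append(next_node)
    return path


print(" -> ".join(linear_path(nodes)))
```

A check like this can catch a misspelled node name in an `edges` list before the configuration is handed to the library.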
--- a/docs/docs/quickstart.md
+++ b/docs/docs/quickstart.md
@ -3,7 +3,7 @@
 If you need to quickly start talking to your list of files, here are the steps.

 1. Add your API Keys to your environment variables
-```python 
+```python
 import os
 os.environ["OPENAI_API_KEY"] = "myopenai_apikey"

@ -11,51 +11,55 @@ os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
 Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.

 2. Create a Brain with Quivr default configuration
-```python 
+```python
 from quivr_core import Brain

-brain = Brain.from_files(name = "my smart brain", 
+brain = Brain.from_files(name = "my smart brain",
                        file_paths = ["/my_smart_doc.pdf", "/my_intelligent_doc.txt"],
                        )

 ```

 3. Launch a Chat
-```python 
-    brain.print_info()
+```python
+brain.print_info()

-    console = Console()
-    console.print(Panel.fit("Ask your brain !", style="bold magenta"))
+from rich.console import Console
+from rich.panel import Panel
+from rich.prompt import Prompt

-    while True:
-        # Get user input
-        question = Prompt.ask("[bold cyan]Question[/bold cyan]")
+console = Console()
+console.print(Panel.fit("Ask your brain !", style="bold magenta"))

-        # Check if user wants to exit
-        if question.lower() == "exit":
-            console.print(Panel("Goodbye!", style="bold yellow"))
-            break
+while True:
+    # Get user input
+    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

-        answer = brain.ask(question)
-        # Print the answer with typing effect
-        console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")
+    # Check if user wants to exit
+    if question.lower() == "exit":
+        console.print(Panel("Goodbye!", style="bold yellow"))
+        break

-        console.print("-" * console.width)
+    answer = brain.ask(question)
+    # Print the answer
+    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")

-    brain.print_info()
+    console.print("-" * console.width)
+
+brain.print_info()
 ```

-And now you are all set up to talk with your brain ! 
+And now you are all set up to talk with your brain!

 ## Custom Brain
 If you want to change the language or embeddings model, you can modify the parameters of the brain.

-Let's say you want to use Mistral llm and a specific embedding model :
-```python 
+Let's say you want to use an LLM from Mistral and a specific embedding model:
+```python
 from quivr_core import Brain
 from langchain_core.embeddings import Embeddings

-brain = Brain.from_files(name = "my smart brain", 
+brain = Brain.from_files(name = "my smart brain",
                        file_paths = ["/my_smart_doc.pdf", "/my_intelligent_doc.txt"],
                        llm=LLMEndpoint(
                            llm_config=LLMEndpointConfig(model="mistral-small-latest", llm_base_url="https://api.mistral.ai/v1/chat/completions"),
@ -68,12 +72,12 @@ Note : [Embeddings](https://python.langchain.com/docs/integrations/text_embeddin

 ## Launch with Chainlit

-If you want to quickly launch an interface with Chainlit, you can simply do at the root of the project : 
-```bash 
+If you want to quickly launch an interface with Chainlit, run the following commands from the root of the project:
+```bash
 cd examples/chatbot
 rye sync
 rye run chainlit run chainlit.py
 ```
 For more details, see [examples/chatbot/chainlit.md](https://github.com/QuivrHQ/quivr/tree/main/examples/chatbot)

-Note : Modify the Brain configs directly in examples/chatbot/main.py;
+Note : Modify the Brain configs directly in examples/chatbot/main.py;
--- a/docs/docs/workflows/examples/basic_ingestion.excalidraw.png
+++ b/docs/docs/workflows/examples/basic_ingestion.excalidraw.png
--- a/docs/docs/workflows/examples/basic_ingestion.md
+++ b/docs/docs/workflows/examples/basic_ingestion.md
@ -0,0 +1,77 @@
+# Basic ingestion
+
+![](basic_ingestion.excalidraw.png)
+
+
+Creating a basic ingestion workflow like the one above is simple. Here are the steps:
+
+1. Add your API Keys to your environment variables
+```python
+import os
+os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
+
+```
+Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
+
+2. Create the YAML file ``basic_ingestion_workflow.yaml`` and copy the following content into it:
+```yaml
+parser_config:
+  megaparse_config:
+    strategy: "auto" # for unstructured, it can be "auto", "fast", "hi_res", "ocr_only", see https://docs.unstructured.io/open-source/concepts/partitioning-strategies#partitioning-strategies
+    pdf_parser: "unstructured"
+  splitter_config:
+    chunk_size: 400 # in tokens
+    chunk_overlap: 100 # in tokens
+```
+
+3. Create a Brain using the above configuration and the list of files you want to ingest
+```python
+from quivr_core import Brain
+from quivr_core.config import IngestionConfig
+
+config_file_name = "./basic_ingestion_workflow.yaml"
+
+ingestion_config = IngestionConfig.from_yaml(config_file_name)
+
+processor_kwargs = {
+    "megaparse_config": ingestion_config.parser_config.megaparse_config,
+    "splitter_config": ingestion_config.parser_config.splitter_config,
+}
+
+brain = Brain.from_files(name = "my smart brain",
+                        file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
+                        processor_kwargs=processor_kwargs,
+                        )
+
+```
+
+4. Launch a Chat
+```python
+brain.print_info()
+
+from rich.console import Console
+from rich.panel import Panel
+from rich.prompt import Prompt
+
+console = Console()
+console.print(Panel.fit("Ask your brain !", style="bold magenta"))
+
+while True:
+    # Get user input
+    question = Prompt.ask("[bold cyan]Question[/bold cyan]")
+
+    # Check if user wants to exit
+    if question.lower() == "exit":
+        console.print(Panel("Goodbye!", style="bold yellow"))
+        break
+
+    answer = brain.ask(question)
+    # Print the answer
+    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")
+
+    console.print("-" * console.width)
+
+brain.print_info()
+```
+
+5. You are now all set up to talk with your brain and test different chunking strategies by simply changing the configuration file!
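To make the effect of `chunk_size` and `chunk_overlap` concrete, the following sketch splits a token list with a sliding window. It is only an illustration under stated assumptions: `split_with_overlap` is a hypothetical helper, and the tokens are placeholder strings, whereas the real splitter counts model tokens and may respect document structure.

```python
def split_with_overlap(tokens, chunk_size=400, chunk_overlap=100):
    """Split `tokens` into windows of `chunk_size` that share `chunk_overlap` items."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the document
    return chunks


# Placeholder "document" of 1000 tokens (assumption: one string per token).
tokens = [f"tok{i}" for i in range(1000)]
chunks = split_with_overlap(tokens)

print(len(chunks), [len(chunk) for chunk in chunks])
# The last 100 tokens of one chunk are the first 100 tokens of the next.
print(chunks[0][-100:] == chunks[1][:100])
```

A larger overlap preserves more context across chunk boundaries, at the cost of producing more chunks to embed and store.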
--- a/docs/docs/workflows/examples/basic_rag.excalidraw.png
+++ b/docs/docs/workflows/examples/basic_rag.excalidraw.png
--- a/docs/docs/workflows/examples/basic_rag.md
+++ b/docs/docs/workflows/examples/basic_rag.md
@ -0,0 +1,106 @@
+# Basic RAG
+
+![](basic_rag.excalidraw.png)
+
+
+Creating a basic RAG workflow like the one above is simple. Here are the steps:
+
+
+1. Add your API Keys to your environment variables
+```python
+import os
+os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
+
+```
+Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
+
+2. Create the YAML file ``basic_rag_workflow.yaml`` and copy the following content into it:
+```yaml
+workflow_config:
+  name: "standard RAG"
+  nodes:
+    - name: "START"
+      edges: ["filter_history"]
+
+    - name: "filter_history"
+      edges: ["rewrite"]
+
+    - name: "rewrite"
+      edges: ["retrieve"]
+
+    - name: "retrieve"
+      edges: ["generate_rag"]
+
+    - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user
+      edges: ["END"]
+
+# Maximum number of previous conversation iterations
+# to include in the context of the answer
+max_history: 10
+
+# Reranker configuration
+reranker_config:
+  # The reranker supplier to use
+  supplier: "cohere"
+
+  # The model to use for the reranker for the given supplier
+  model: "rerank-multilingual-v3.0"
+
+  # Number of chunks returned by the reranker
+  top_n: 5
+
+# Configuration for the LLM
+llm_config:
+
+  # maximum number of tokens passed to the LLM to generate the answer
+  max_input_tokens: 4000
+
+  # temperature for the LLM
+  temperature: 0.7
+```
+
+3. Create a Brain with the default configuration
+```python
+from quivr_core import Brain
+
+brain = Brain.from_files(name = "my smart brain",
+                        file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
+                        )
+
+```
+
+4. Launch a Chat
+```python
+brain.print_info()
+
+from rich.console import Console
+from rich.panel import Panel
+from rich.prompt import Prompt
+from quivr_core.config import RetrievalConfig
+
+config_file_name = "./basic_rag_workflow.yaml"
+
+retrieval_config = RetrievalConfig.from_yaml(config_file_name)
+
+console = Console()
+console.print(Panel.fit("Ask your brain !", style="bold magenta"))
+
+while True:
+    # Get user input
+    question = Prompt.ask("[bold cyan]Question[/bold cyan]")
+
+    # Check if user wants to exit
+    if question.lower() == "exit":
+        console.print(Panel("Goodbye!", style="bold yellow"))
+        break
+
+    answer = brain.ask(question, retrieval_config=retrieval_config)
+    # Print the answer
+    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")
+
+    console.print("-" * console.width)
+
+brain.print_info()
+```
+
+5. You are now all set up to talk with your brain and test different retrieval strategies by simply changing the configuration file!
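The `max_history` setting in the YAML above limits how many previous conversation iterations reach the LLM. A minimal sketch of that idea is shown below; `filter_history` here is a hypothetical function that keeps only the most recent exchanges, and the actual `filter_history` node in quivr-core may apply additional criteria.

```python
def filter_history(history, max_history=10):
    """Keep only the `max_history` most recent (question, answer) pairs."""
    if max_history <= 0:
        return []  # guard: history[-0:] would return the whole list
    return history[-max_history:]


# Placeholder conversation with 12 exchanges (illustrative data only).
history = [(f"question {i}", f"answer {i}") for i in range(1, 13)]
kept = filter_history(history)

print(len(kept), kept[0][0], kept[-1][0])
```

Lowering `max_history` reduces the prompt size, while raising it lets the model refer back to older turns.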
--- a/docs/docs/workflows/examples/chat.md
+++ b/docs/docs/workflows/examples/chat.md
@ -1,88 +0,0 @@
-# Chat
-
-Creating a custom brain workflow is simple, here are the steps : 
-
-1. Create a workflow
-2. Create a Brain with this workflow and append your files
-3. Launch a chat
-4. Chat with your brain ! 
-
-### Use AssistantConfig
-
-First create a json configuration file in the rag_config_workflow.yaml format (see workflows):
-```yaml
-ingestion_config:
-  parser_config:
-    megaparse_config:
-      strategy: "fast"
-      pdf_parser: "unstructured"
-    splitter_config:
-      chunk_size: 400
-      chunk_overlap: 100
-
-retrieval_config:
-  workflow_config:
-    name: "standard RAG"
-    nodes:
-      - name: "START"
-        edges: ["filter_history"]
-
-      - name: "filter_history"
-        edges: ["generate_chat_llm"]
-
-      - name: "generate_chat_llm" # the name of the last node, from which we want to stream the answer to the user, should always start with "generate"
-        edges: ["END"]
-  # Maximum number of previous conversation iterations
-  # to include in the context of the answer
-  max_history: 10
-
-  #prompt: "my prompt"
-
-  max_files: 20
-  reranker_config:
-    # The reranker supplier to use
-    supplier: "cohere"
-
-    # The model to use for the reranker for the given supplier
-    model: "rerank-multilingual-v3.0"
-
-    # Number of chunks returned by the reranker
-    top_n: 5
-  llm_config:
-    # The LLM supplier to use
-    supplier: "openai"
-
-    # The model to use for the LLM for the given supplier
-    model: "gpt-3.5-turbo-0125"
-
-    max_input_tokens: 2000
-
-    # Maximum number of tokens to pass to the LLM
-    # as a context to generate the answer
-    max_output_tokens: 2000
-
-    temperature: 0.7
-    streaming: true
-
-```
-This brain is set up to : 
-    * Filter history and keep only the latest conversations
-    * Ask the question to the brain
-    * Generate answer
-
-    
-Then, when instanciating your Brain, add the custom config you created: 
-
-```python
-    assistant_config = AssistantConfig.from_yaml("my_config_file.yaml")
-    processor_kwargs = {
-        "assistant_config": assistant_config
-    }
-
-```
-
-
-
-
-
-
--- a/docs/docs/workflows/examples/rag_with_internet.md
+++ b/docs/docs/workflows/examples/rag_with_internet.md
@ -1 +0,0 @@
-# RAG with Internet
--- a/docs/docs/workflows/examples/rag_with_web_search.excalidraw.png
+++ b/docs/docs/workflows/examples/rag_with_web_search.excalidraw.png
--- a/docs/docs/workflows/examples/rag_with_web_search.md
+++ b/docs/docs/workflows/examples/rag_with_web_search.md
@ -0,0 +1,135 @@
+# RAG with web search
+
+
+![](rag_with_web_search.excalidraw.png)
+
+Follow the instructions below to create the agentic RAG workflow shown above, which includes some advanced capabilities such as:
+
+* **user intention detection** - the agent can detect if the user wants to activate the web search tool to look for information not present in the documents;
+* **dynamic chunk retrieval** - the number of retrieved chunks is not fixed, but determined dynamically using the reranker's relevance scores and the user-provided ``relevance_score_threshold``;
+* **web search** - the agent can search the web for more information if needed.
+
+
+---
+
+1. Add your API Keys to your environment variables
+```python
+import os
+os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
+
+```
+Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
+
+2. Create the YAML file ``rag_with_web_search_workflow.yaml`` and copy the following content into it:
+```yaml
+workflow_config:
+  name: "RAG with web search"
+
+  # List of tools that the agent can activate if the user instructions require it
+  available_tools:
+    - "web search"
+
+  nodes:
+    - name: "START"
+      conditional_edge:
+        routing_function: "routing_split"
+        conditions: ["edit_system_prompt", "filter_history"]
+
+    - name: "edit_system_prompt"
+      edges: ["filter_history"]
+
+    - name: "filter_history"
+      edges: ["dynamic_retrieve"]
+
+    - name: "dynamic_retrieve"
+      conditional_edge:
+        routing_function: "tool_routing"
+        conditions: ["run_tool", "generate_rag"]
+
+    - name: "run_tool"
+      edges: ["generate_rag"]
+
+    - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user
+      edges: ["END"]
+      tools:
+        - name: "cited_answer"
+
+# Maximum number of previous conversation iterations
+# to include in the context of the answer
+max_history: 10
+
+# Number of chunks returned by the retriever
+k: 40
+
+# Reranker configuration
+reranker_config:
+  # The reranker supplier to use
+  supplier: "cohere"
+
+  # The model to use for the reranker for the given supplier
+  model: "rerank-multilingual-v3.0"
+
+  # Number of chunks returned by the reranker
+  top_n: 5
+
+  # Among the chunks returned by the reranker, only those with relevance
+  # scores equal or above the relevance_score_threshold will be returned
+  # to the LLM to generate the answer (allowed values are between 0 and 1,
+  # a value of 0.1 works well with the cohere and jina rerankers)
+  relevance_score_threshold: 0.01
+
+# LLM configuration
+llm_config:
+
+  # maximum number of tokens passed to the LLM to generate the answer
+  max_input_tokens: 8000
+
+  # temperature for the LLM
+  temperature: 0.7
+```
+
+3. Create a Brain with the default configuration
+```python
+from quivr_core import Brain
+
+brain = Brain.from_files(name = "my smart brain",
+                        file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
+                        )
+
+```
+
+4. Launch a Chat
+```python
+brain.print_info()
+
+from rich.console import Console
+from rich.panel import Panel
+from rich.prompt import Prompt
+from quivr_core.config import RetrievalConfig
+
+config_file_name = "./rag_with_web_search_workflow.yaml"
+
+retrieval_config = RetrievalConfig.from_yaml(config_file_name)
+
+console = Console()
+console.print(Panel.fit("Ask your brain !", style="bold magenta"))
+
+while True:
+    # Get user input
+    question = Prompt.ask("[bold cyan]Question[/bold cyan]")
+
+    # Check if user wants to exit
+    if question.lower() == "exit":
+        console.print(Panel("Goodbye!", style="bold yellow"))
+        break
+
+    answer = brain.ask(question, retrieval_config=retrieval_config)
+    # Print the answer
+    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")
+
+    console.print("-" * console.width)
+
+brain.print_info()
+```
+
+5. You are now all set up to talk with your brain and test different retrieval strategies by simply changing the configuration file!
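The dynamic chunk retrieval described above combines `top_n` with `relevance_score_threshold`. The sketch below shows one way that selection could work: `select_chunks` is a hypothetical helper and the scores are made-up values, not output from a real reranker.

```python
def select_chunks(scored_chunks, top_n=5, relevance_score_threshold=0.01):
    """Keep at most `top_n` chunks whose score is at or above the threshold."""
    ranked = sorted(scored_chunks, key=lambda item: item[1], reverse=True)
    return [
        (chunk, score)
        for chunk, score in ranked[:top_n]
        if score >= relevance_score_threshold
    ]


# Illustrative (chunk, relevance score) pairs, as a reranker might return them.
scored = [
    ("a", 0.9), ("b", 0.5), ("c", 0.005), ("d", 0.02),
    ("e", 0.3), ("f", 0.001), ("g", 0.6),
]

print([chunk for chunk, _ in select_chunks(scored)])
print([chunk for chunk, _ in select_chunks(scored, relevance_score_threshold=0.1)])
```

With a higher threshold, fewer chunks reach the LLM, so the number of retrieved chunks varies with how relevant the documents are to each question.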
--- a/docs/docs/workflows/index.md
+++ b/docs/docs/workflows/index.md
@ -1 +1,3 @@
-# Configuration
+# Workflows
+
+In this section, you will find examples of workflows that you can use to create your own agentic RAG systems.
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@ -79,10 +79,14 @@ nav:
      - Workflows:
          - workflows/index.md
          - Examples:
-              - workflows/examples/chat.md
-              - workflows/examples/rag_with_internet.md
+              - workflows/examples/basic_ingestion.md
+              - workflows/examples/basic_rag.md
+              - workflows/examples/rag_with_web_search.md
      - Configuration:
          - config/index.md
-          - config/base_config.md
          - config/config.md
+          - config/base_config.md
+      - Examples:
+          - examples/index.md
+          - examples/custom_storage.md
  - Enterprise: https://docs.quivr.app/
--- a/docs/requirements-dev.lock
+++ b/docs/requirements-dev.lock
@ -25,6 +25,8 @@ anthropic==0.36.1
 anyio==4.6.2.post1
    # via anthropic
    # via httpx
+appnope==0.1.4
+    # via ipykernel
 asttokens==2.4.1
    # via stack-data
 attrs==24.2.0
@ -76,8 +78,6 @@ fsspec==2024.9.0
    # via huggingface-hub
 ghp-import==2.1.0
    # via mkdocs
-greenlet==3.1.1
-    # via sqlalchemy
 griffe==1.2.0
    # via mkdocstrings-python
 h11==0.14.0
--- a/docs/requirements.lock
+++ b/docs/requirements.lock
@ -25,6 +25,8 @@ anthropic==0.36.1
 anyio==4.6.2.post1
    # via anthropic
    # via httpx
+appnope==0.1.4
+    # via ipykernel
 asttokens==2.4.1
    # via stack-data
 attrs==24.2.0
@ -76,8 +78,6 @@ fsspec==2024.9.0
    # via huggingface-hub
 ghp-import==2.1.0
    # via mkdocs
-greenlet==3.1.1
-    # via sqlalchemy
 griffe==1.2.0
    # via mkdocstrings-python
 h11==0.14.0