chore: docs on quivr-core workflows (#3420)

# Description

Added some initial documentation on RAG workflows, including some Excalidraw diagrams.


## Checklist before requesting a review

Please delete options that are not relevant.

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged

## Screenshots (if appropriate):
Commit 8c7277e9ec by Jacopo Chevallard, 2024-10-23 11:12:23 +02:00, committed via GitHub (parent 973c678369).
14 changed files with 408 additions and 123 deletions

View File

@@ -0,0 +1,46 @@
# Configuration
The configuration classes are based on [Pydantic](https://docs.pydantic.dev/latest/) and allow the configuration of the ingestion and retrieval workflows via YAML files.
Below is an example of a YAML configuration file for a basic RAG retrieval workflow.
```yaml
workflow_config:
  name: "standard RAG"
  nodes:
    - name: "START"
      edges: ["filter_history"]
    - name: "filter_history"
      edges: ["rewrite"]
    - name: "rewrite"
      edges: ["retrieve"]
    - name: "retrieve"
      edges: ["generate_rag"]
    - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user, should always start with "generate"
      edges: ["END"]

# Maximum number of previous conversation iterations
# to include in the context of the answer
max_history: 10

prompt: "my prompt"

max_files: 20

reranker_config:
  # The reranker supplier to use
  supplier: "cohere"
  # The model to use for the reranker for the given supplier
  model: "rerank-multilingual-v3.0"
  # Number of chunks returned by the reranker
  top_n: 5

llm_config:
  max_context_tokens: 2000
  temperature: 0.7
  streaming: true
```
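The retrieval examples in the workflows section load such a file through the corresponding config class. Below is a minimal sketch of that pattern, assuming the YAML above is saved under the hypothetical name `retrieval_workflow.yaml` and that `RetrievalConfig.from_yaml` accepts a file path, as shown in those examples:

```python
from quivr_core.config import RetrievalConfig

# Hypothetical file name; point this at wherever you saved the YAML above.
config_file_name = "./retrieval_workflow.yaml"

# Parse and validate the YAML into the Pydantic configuration model.
retrieval_config = RetrievalConfig.from_yaml(config_file_name)

# The validated values are then available as attributes (names assumed to
# mirror the YAML keys), e.g. before passing the config to brain.ask(...).
print(retrieval_config.max_history)
```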

View File

@@ -3,7 +3,7 @@
If you need to quickly start talking to your list of files, here are the steps.

1. Add your API Keys to your environment variables

```python
import os

os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
```

@@ -11,51 +11,55 @@ os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.

2. Create a Brain with Quivr default configuration

```python
from quivr_core import Brain

brain = Brain.from_files(name = "my smart brain",
            file_paths = ["/my_smart_doc.pdf", "/my_intelligent_doc.txt"],
            )
```

3. Launch a Chat

```python
brain.print_info()

from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt

console = Console()
console.print(Panel.fit("Ask your brain !", style="bold magenta"))

while True:
    # Get user input
    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

    # Check if user wants to exit
    if question.lower() == "exit":
        console.print(Panel("Goodbye!", style="bold yellow"))
        break

    answer = brain.ask(question)
    # Print the answer with typing effect
    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")

    console.print("-" * console.width)

brain.print_info()
```

And now you are all set up to talk with your brain!

## Custom Brain

If you want to change the language or embeddings model, you can modify the parameters of the brain.

Let's say you want to use an LLM from Mistral and a specific embedding model:

```python
from quivr_core import Brain
from langchain_core.embeddings import Embeddings

brain = Brain.from_files(name = "my smart brain",
            file_paths = ["/my_smart_doc.pdf", "/my_intelligent_doc.txt"],
            llm=LLMEndpoint(
                llm_config=LLMEndpointConfig(model="mistral-small-latest", llm_base_url="https://api.mistral.ai/v1/chat/completions"),
                # ... (snippet continues beyond this hunk)
```
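The diff cuts this example off at the hunk boundary. Purely as a hedged sketch of how the call above might be completed: the `LLMEndpoint`/`LLMEndpointConfig` import paths below are assumptions, and the embeddings argument is omitted because it is not visible in this hunk.

```python
from quivr_core import Brain
from quivr_core.config import LLMEndpointConfig  # assumed import path
from quivr_core.llm import LLMEndpoint           # assumed import path

# Hypothetical completion of the truncated snippet above: a Brain whose answers
# are generated by a Mistral model reached through its OpenAI-compatible API.
brain = Brain.from_files(
    name="my smart brain",
    file_paths=["/my_smart_doc.pdf", "/my_intelligent_doc.txt"],
    llm=LLMEndpoint(
        llm_config=LLMEndpointConfig(
            model="mistral-small-latest",
            llm_base_url="https://api.mistral.ai/v1/chat/completions",
        ),
    ),
)
```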
@@ -68,12 +72,12 @@ Note : [Embeddings](https://python.langchain.com/docs/integrations/text_embeddin

## Launch with Chainlit

If you want to quickly launch an interface with Chainlit, you can simply run the following at the root of the project:

```bash
cd examples/chatbot
rye sync
rye run chainlit run chainlit.py
```

For more detail, see [examples/chatbot/chainlit.md](https://github.com/QuivrHQ/quivr/tree/main/examples/chatbot)

Note: Modify the Brain configs directly in examples/chatbot/main.py;

Binary file not shown (new image, 134 KiB).

View File

@@ -0,0 +1,77 @@
# Basic ingestion
![](basic_ingestion.excalidraw.png)
Creating a basic ingestion workflow like the one above is simple; here are the steps:
1. Add your API Keys to your environment variables
```python
import os
os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
```
Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
2. Create the YAML file ``basic_ingestion_workflow.yaml`` and copy the following content into it
```yaml
parser_config:
  megaparse_config:
    strategy: "auto" # for unstructured, it can be "auto", "fast", "hi_res", "ocr_only", see https://docs.unstructured.io/open-source/concepts/partitioning-strategies#partitioning-strategies
    pdf_parser: "unstructured"
  splitter_config:
    chunk_size: 400 # in tokens
    chunk_overlap: 100 # in tokens
```
3. Create a Brain using the above configuration and the list of files you want to ingest
```python
from quivr_core import Brain
from quivr_core.config import IngestionConfig

config_file_name = "./basic_ingestion_workflow.yaml"

ingestion_config = IngestionConfig.from_yaml(config_file_name)

processor_kwargs = {
    "megaparse_config": ingestion_config.parser_config.megaparse_config,
    "splitter_config": ingestion_config.parser_config.splitter_config,
}

brain = Brain.from_files(name = "my smart brain",
            file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
            processor_kwargs=processor_kwargs,
            )
```
4. Launch a Chat
```python
brain.print_info()

from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt

console = Console()
console.print(Panel.fit("Ask your brain !", style="bold magenta"))

while True:
    # Get user input
    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

    # Check if user wants to exit
    if question.lower() == "exit":
        console.print(Panel("Goodbye!", style="bold yellow"))
        break

    answer = brain.ask(question)
    # Print the answer with typing effect
    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")

    console.print("-" * console.width)

brain.print_info()
```
5. You are now all set up to talk with your brain and test different chunking strategies by simply changing the configuration file!
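Building on step 5, here is a minimal sketch of one way to compare two chunking strategies side by side; ``small_chunks_workflow.yaml`` is a hypothetical copy of the file above with a smaller ``chunk_size``, and the helper function is only for illustration:

```python
from quivr_core import Brain
from quivr_core.config import IngestionConfig


def brain_from_ingestion_yaml(name: str, yaml_path: str) -> Brain:
    # Load the ingestion workflow and pass its parser settings to the
    # processors, exactly as in step 3 above.
    ingestion_config = IngestionConfig.from_yaml(yaml_path)
    processor_kwargs = {
        "megaparse_config": ingestion_config.parser_config.megaparse_config,
        "splitter_config": ingestion_config.parser_config.splitter_config,
    }
    return Brain.from_files(
        name=name,
        file_paths=["./my_first_doc.pdf", "./my_second_doc.txt"],
        processor_kwargs=processor_kwargs,
    )


# "./small_chunks_workflow.yaml" is a hypothetical copy of
# basic_ingestion_workflow.yaml with chunk_size: 200.
brain_400 = brain_from_ingestion_yaml("400-token chunks", "./basic_ingestion_workflow.yaml")
brain_200 = brain_from_ingestion_yaml("200-token chunks", "./small_chunks_workflow.yaml")

question = "What is this document about?"
for brain in (brain_400, brain_200):
    print(brain.ask(question).answer)
```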

Binary file not shown (new image, 191 KiB).

View File

@@ -0,0 +1,106 @@
# Basic RAG
![](basic_rag.excalidraw.png)
Creating a basic RAG workflow like the one above is simple; here are the steps:
1. Add your API Keys to your environment variables
```python
import os
os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
```
Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
2. Create the YAML file ``basic_rag_workflow.yaml`` and copy the following content into it
```yaml
workflow_config:
  name: "standard RAG"
  nodes:
    - name: "START"
      edges: ["filter_history"]
    - name: "filter_history"
      edges: ["rewrite"]
    - name: "rewrite"
      edges: ["retrieve"]
    - name: "retrieve"
      edges: ["generate_rag"]
    - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user
      edges: ["END"]

# Maximum number of previous conversation iterations
# to include in the context of the answer
max_history: 10

# Reranker configuration
reranker_config:
  # The reranker supplier to use
  supplier: "cohere"
  # The model to use for the reranker for the given supplier
  model: "rerank-multilingual-v3.0"
  # Number of chunks returned by the reranker
  top_n: 5

# Configuration for the LLM
llm_config:
  # maximum number of tokens passed to the LLM to generate the answer
  max_input_tokens: 4000
  # temperature for the LLM
  temperature: 0.7
```
3. Create a Brain with the default configuration
```python
from quivr_core import Brain
brain = Brain.from_files(name = "my smart brain",
file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
)
```
4. Launch a Chat
```python
brain.print_info()

from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt

from quivr_core.config import RetrievalConfig

config_file_name = "./basic_rag_workflow.yaml"

retrieval_config = RetrievalConfig.from_yaml(config_file_name)

console = Console()
console.print(Panel.fit("Ask your brain !", style="bold magenta"))

while True:
    # Get user input
    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

    # Check if user wants to exit
    if question.lower() == "exit":
        console.print(Panel("Goodbye!", style="bold yellow"))
        break

    answer = brain.ask(question, retrieval_config=retrieval_config)
    # Print the answer with typing effect
    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")

    console.print("-" * console.width)

brain.print_info()
```
5. You are now all set up to talk with your brain and test different retrieval strategies by simply changing the configuration file!
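Because the configuration classes are Pydantic models, a loaded configuration can also be tweaked in code before it is passed to `brain.ask`. A minimal sketch, assuming the attribute names mirror the YAML keys above:

```python
from quivr_core import Brain
from quivr_core.config import RetrievalConfig

brain = Brain.from_files(name = "my smart brain",
            file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
            )

retrieval_config = RetrievalConfig.from_yaml("./basic_rag_workflow.yaml")

# Attribute names are assumed to mirror the YAML keys above.
retrieval_config.reranker_config.top_n = 3     # keep only the 3 best chunks
retrieval_config.llm_config.temperature = 0.2  # make answers more deterministic

answer = brain.ask("Summarise my documents", retrieval_config=retrieval_config)
print(answer.answer)
```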

View File

@@ -1,88 +0,0 @@
# Chat
Creating a custom brain workflow is simple; here are the steps:
1. Create a workflow
2. Create a Brain with this workflow and append your files
3. Launch a chat
4. Chat with your brain!
### Use AssistantConfig
First, create a YAML configuration file in the rag_config_workflow.yaml format (see workflows):
```yaml
ingestion_config:
  parser_config:
    megaparse_config:
      strategy: "fast"
      pdf_parser: "unstructured"
    splitter_config:
      chunk_size: 400
      chunk_overlap: 100

retrieval_config:
  workflow_config:
    name: "standard RAG"
    nodes:
      - name: "START"
        edges: ["filter_history"]
      - name: "filter_history"
        edges: ["generate_chat_llm"]
      - name: "generate_chat_llm" # the name of the last node, from which we want to stream the answer to the user, should always start with "generate"
        edges: ["END"]

  # Maximum number of previous conversation iterations
  # to include in the context of the answer
  max_history: 10

  #prompt: "my prompt"

  max_files: 20

  reranker_config:
    # The reranker supplier to use
    supplier: "cohere"
    # The model to use for the reranker for the given supplier
    model: "rerank-multilingual-v3.0"
    # Number of chunks returned by the reranker
    top_n: 5

  llm_config:
    # The LLM supplier to use
    supplier: "openai"
    # The model to use for the LLM for the given supplier
    model: "gpt-3.5-turbo-0125"
    max_input_tokens: 2000
    # Maximum number of tokens to pass to the LLM
    # as a context to generate the answer
    max_output_tokens: 2000
    temperature: 0.7
    streaming: true
```
This brain is set up to:
* Filter history and keep only the latest conversations
* Ask the question to the brain
* Generate answer
Then, when instantiating your Brain, add the custom config you created:
```python
assistant_config = AssistantConfig.from_yaml("my_config_file.yaml")
processor_kwargs = {
    "assistant_config": assistant_config
}
```

View File

@@ -1 +0,0 @@
# RAG with Internet

Binary file not shown (new image, 236 KiB).

View File

@@ -0,0 +1,135 @@
# RAG with web search
![](rag_with_web_search.excalidraw.png)
Follow the instructions below to create the agentic RAG workflow shown above, which includes some advanced capabilities such as:
* **user intention detection** - the agent can detect if the user wants to activate the web search tool to look for information not present in the documents;
* **dynamic chunk retrieval** - the number of retrieved chunks is not fixed, but determined dynamically using the reranker's relevance scores and the user-provided ``relevance_score_threshold`` (see the sketch after this list);
* **web search** - the agent can search the web for more information if needed.
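To make the dynamic-retrieval bullet concrete, here is a purely illustrative sketch (not quivr-core's actual implementation) of what thresholding reranker scores looks like:

```python
def filter_by_relevance(scored_chunks, relevance_score_threshold=0.01):
    """Keep only the chunks whose reranker relevance score clears the threshold.

    scored_chunks: list of (chunk, score) pairs as returned by a reranker.
    The number of surviving chunks therefore varies from question to question.
    """
    return [chunk for chunk, score in scored_chunks if score >= relevance_score_threshold]


# Example: only the first two chunks survive a 0.01 threshold.
chunks = [("chunk A", 0.92), ("chunk B", 0.15), ("chunk C", 0.004)]
print(filter_by_relevance(chunks))  # ['chunk A', 'chunk B']
```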
---
1. Add your API Keys to your environment variables
```python
import os
os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
```
Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
2. Create the YAML file ``rag_with_web_search_workflow.yaml`` and copy the following content into it
```yaml
workflow_config:
  name: "RAG with web search"

  # List of tools that the agent can activate if the user instructions require it
  available_tools:
    - "web search"

  nodes:
    - name: "START"
      conditional_edge:
        routing_function: "routing_split"
        conditions: ["edit_system_prompt", "filter_history"]
    - name: "edit_system_prompt"
      edges: ["filter_history"]
    - name: "filter_history"
      edges: ["dynamic_retrieve"]
    - name: "dynamic_retrieve"
      conditional_edge:
        routing_function: "tool_routing"
        conditions: ["run_tool", "generate_rag"]
    - name: "run_tool"
      edges: ["generate_rag"]
    - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user
      edges: ["END"]

  tools:
    - name: "cited_answer"

# Maximum number of previous conversation iterations
# to include in the context of the answer
max_history: 10

# Number of chunks returned by the retriever
k: 40

# Reranker configuration
reranker_config:
  # The reranker supplier to use
  supplier: "cohere"
  # The model to use for the reranker for the given supplier
  model: "rerank-multilingual-v3.0"
  # Number of chunks returned by the reranker
  top_n: 5
  # Among the chunks returned by the reranker, only those with relevance
  # scores equal or above the relevance_score_threshold will be returned
  # to the LLM to generate the answer (allowed values are between 0 and 1,
  # a value of 0.1 works well with the cohere and jina rerankers)
  relevance_score_threshold: 0.01

# LLM configuration
llm_config:
  # maximum number of tokens passed to the LLM to generate the answer
  max_input_tokens: 8000
  # temperature for the LLM
  temperature: 0.7
```
3. Create a Brain with the default configuration
```python
from quivr_core import Brain
brain = Brain.from_files(name = "my smart brain",
            file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
            )
```
4. Launch a Chat
```python
brain.print_info()

from rich.console import Console
from rich.panel import Panel
from rich.prompt import Prompt

from quivr_core.config import RetrievalConfig

config_file_name = "./rag_with_web_search_workflow.yaml"

retrieval_config = RetrievalConfig.from_yaml(config_file_name)

console = Console()
console.print(Panel.fit("Ask your brain !", style="bold magenta"))

while True:
    # Get user input
    question = Prompt.ask("[bold cyan]Question[/bold cyan]")

    # Check if user wants to exit
    if question.lower() == "exit":
        console.print(Panel("Goodbye!", style="bold yellow"))
        break

    answer = brain.ask(question, retrieval_config=retrieval_config)
    # Print the answer with typing effect
    console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}")

    console.print("-" * console.width)

brain.print_info()
```
5. You are now all set up to talk with your brain and test different retrieval strategies by simply changing the configuration file!
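As a hedged sketch of how the web search tool might come into play: according to the workflow above, an instruction that explicitly asks for information from the web can route through the ``run_tool`` node, but whether the tool is actually activated is decided by the LLM's intent detection, not by this code.

```python
from quivr_core import Brain
from quivr_core.config import RetrievalConfig

brain = Brain.from_files(name = "my smart brain",
            file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
            )
retrieval_config = RetrievalConfig.from_yaml("./rag_with_web_search_workflow.yaml")

# An explicit request to look something up online gives the agent a reason to
# activate the "web search" tool listed under available_tools.
answer = brain.ask(
    "Search the web for the latest quivr-core release notes and relate them to my documents",
    retrieval_config=retrieval_config,
)
print(answer.answer)
```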

View File

@@ -1 +1,3 @@
-# Configuration
+# Workflows
+
+In this section, you will find examples of workflows that you can use to create your own agentic RAG systems.

View File

@@ -79,10 +79,14 @@ nav:
   - Workflows:
     - workflows/index.md
     - Examples:
-      - workflows/examples/chat.md
-      - workflows/examples/rag_with_internet.md
+      - workflows/examples/basic_ingestion.md
+      - workflows/examples/basic_rag.md
+      - workflows/examples/rag_with_web_search.md
   - Configuration:
     - config/index.md
+    - config/base_config.md
     - config/config.md
-    - config/base_config.md
+  - Examples:
+    - examples/index.md
+    - examples/custom_storage.md
   - Enterprise: https://docs.quivr.app/

View File

@@ -25,6 +25,8 @@ anthropic==0.36.1
 anyio==4.6.2.post1
     # via anthropic
     # via httpx
+appnope==0.1.4
+    # via ipykernel
 asttokens==2.4.1
     # via stack-data
 attrs==24.2.0
@@ -76,8 +78,6 @@ fsspec==2024.9.0
     # via huggingface-hub
 ghp-import==2.1.0
     # via mkdocs
-greenlet==3.1.1
-    # via sqlalchemy
 griffe==1.2.0
     # via mkdocstrings-python
 h11==0.14.0

View File

@@ -25,6 +25,8 @@ anthropic==0.36.1
 anyio==4.6.2.post1
     # via anthropic
     # via httpx
+appnope==0.1.4
+    # via ipykernel
 asttokens==2.4.1
     # via stack-data
 attrs==24.2.0
@@ -76,8 +78,6 @@ fsspec==2024.9.0
     # via huggingface-hub
 ghp-import==2.1.0
     # via mkdocs
-greenlet==3.1.1
-    # via sqlalchemy
 griffe==1.2.0
     # via mkdocstrings-python
 h11==0.14.0