diff --git a/docs/docs/config/index.md b/docs/docs/config/index.md index e69de29bb..a39401a9a 100644 --- a/docs/docs/config/index.md +++ b/docs/docs/config/index.md @@ -0,0 +1,46 @@ +# Configuration + +The configuration classes are based on [Pydantic](https://docs.pydantic.dev/latest/) and allow the configuration of the ingestion and retrieval workflows via YAML files. + +Below is an example of a YAML configuration file for a basic RAG retrieval workflow. +```yaml +workflow_config: + name: "standard RAG" + nodes: + - name: "START" + edges: ["filter_history"] + + - name: "filter_history" + edges: ["rewrite"] + + - name: "rewrite" + edges: ["retrieve"] + + - name: "retrieve" + edges: ["generate_rag"] + + - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user, should always start with "generate" + edges: ["END"] +# Maximum number of previous conversation iterations +# to include in the context of the answer +max_history: 10 + +prompt: "my prompt" + +max_files: 20 +reranker_config: + # The reranker supplier to use + supplier: "cohere" + + # The model to use for the reranker for the given supplier + model: "rerank-multilingual-v3.0" + + # Number of chunks returned by the reranker + top_n: 5 +llm_config: + + max_context_tokens: 2000 + + temperature: 0.7 + streaming: true +``` diff --git a/docs/docs/quickstart.md b/docs/docs/quickstart.md index 840a5b1ec..25a67343e 100644 --- a/docs/docs/quickstart.md +++ b/docs/docs/quickstart.md @@ -3,7 +3,7 @@ If you need to quickly start talking to your list of files, here are the steps. 1. Add your API Keys to your environment variables -```python +```python import os os.environ["OPENAI_API_KEY"] = "myopenai_apikey" @@ -11,51 +11,55 @@ os.environ["OPENAI_API_KEY"] = "myopenai_apikey" Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama. 2. Create a Brain with Quivr default configuration -```python +```python from quivr_core import Brain -brain = Brain.from_files(name = "my smart brain", +brain = Brain.from_files(name = "my smart brain", file_paths = ["/my_smart_doc.pdf", "/my_intelligent_doc.txt"], ) ``` 3. Launch a Chat -```python - brain.print_info() +```python +brain.print_info() - console = Console() - console.print(Panel.fit("Ask your brain !", style="bold magenta")) +from rich.console import Console +from rich.panel import Panel +from rich.prompt import Prompt - while True: - # Get user input - question = Prompt.ask("[bold cyan]Question[/bold cyan]") +console = Console() +console.print(Panel.fit("Ask your brain !", style="bold magenta")) - # Check if user wants to exit - if question.lower() == "exit": - console.print(Panel("Goodbye!", style="bold yellow")) - break +while True: + # Get user input + question = Prompt.ask("[bold cyan]Question[/bold cyan]") - answer = brain.ask(question) - # Print the answer with typing effect - console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}") + # Check if user wants to exit + if question.lower() == "exit": + console.print(Panel("Goodbye!", style="bold yellow")) + break - console.print("-" * console.width) + answer = brain.ask(question) + # Print the answer with typing effect + console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}") - brain.print_info() + console.print("-" * console.width) + +brain.print_info() ``` -And now you are all set up to talk with your brain ! 
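+
+If you prefer a plain script over the interactive loop, you can also call `brain.ask` directly. The snippet below is a minimal sketch based on the loop above: it reuses the same `brain` object from step 2 and relies on the answer text being exposed via `answer.answer`, as in step 3.
+```python
+# Ask a single question without the interactive prompt
+answer = brain.ask("What are these documents about?")
+
+# Print the generated answer text
+print(answer.answer)
+```
+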
+And now you are all set up to talk with your brain!
 
 ## Custom Brain
 
 If you want to change the language or embeddings model, you can modify the parameters of the brain.
 
-Let's say you want to use Mistral llm and a specific embedding model :
-```python 
+Let's say you want to use an LLM from Mistral and a specific embedding model:
+```python
 from quivr_core import Brain
 from langchain_core.embeddings import Embeddings
 
-brain = Brain.from_files(name = "my smart brain", 
+brain = Brain.from_files(name = "my smart brain",
                          file_paths = ["/my_smart_doc.pdf", "/my_intelligent_doc.txt"],
                          llm=LLMEndpoint(
                              llm_config=LLMEndpointConfig(model="mistral-small-latest", llm_base_url="https://api.mistral.ai/v1/chat/completions"),
@@ -68,12 +72,12 @@ Note : [Embeddings](https://python.langchain.com/docs/integrations/text_embeddin
 
 ## Launch with Chainlit
 
-If you want to quickly launch an interface with Chainlit, you can simply do at the root of the project :
-```bash 
+If you want to quickly launch an interface with Chainlit, you can simply run the following at the root of the project:
+```bash
 cd examples/chatbot / rye sync / rye run chainlit run chainlit.py
 ```
 
 For more detail, go in [examples/chatbot/chainlit.md](https://github.com/QuivrHQ/quivr/tree/main/examples/chatbot)
 
-Note : Modify the Brain configs directly in examples/chatbot/main.py;
\ No newline at end of file
+Note: Modify the Brain configs directly in examples/chatbot/main.py.
diff --git a/docs/docs/workflows/examples/basic_ingestion.excalidraw.png b/docs/docs/workflows/examples/basic_ingestion.excalidraw.png
new file mode 100644
index 000000000..4fa23af25
Binary files /dev/null and b/docs/docs/workflows/examples/basic_ingestion.excalidraw.png differ
diff --git a/docs/docs/workflows/examples/basic_ingestion.md b/docs/docs/workflows/examples/basic_ingestion.md
new file mode 100644
index 000000000..feb4bcd43
--- /dev/null
+++ b/docs/docs/workflows/examples/basic_ingestion.md
@@ -0,0 +1,77 @@
+# Basic ingestion
+
+![](basic_ingestion.excalidraw.png)
+
+
+Creating a basic ingestion workflow like the one above is simple; here are the steps:
+
+1. Add your API Keys to your environment variables
+```python
+import os
+os.environ["OPENAI_API_KEY"] = "myopenai_apikey"
+
+```
+Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama.
+
+2. Create the YAML file ``basic_ingestion_workflow.yaml`` and copy the following content into it
+```yaml
+parser_config:
+  megaparse_config:
+    strategy: "auto" # for unstructured, it can be "auto", "fast", "hi_res", "ocr_only", see https://docs.unstructured.io/open-source/concepts/partitioning-strategies#partitioning-strategies
+    pdf_parser: "unstructured"
+  splitter_config:
+    chunk_size: 400 # in tokens
+    chunk_overlap: 100 # in tokens
+```
+
+3. Create a Brain using the above configuration and the list of files you want to ingest
+```python
+from quivr_core import Brain
+from quivr_core.config import IngestionConfig
+
+config_file_name = "./basic_ingestion_workflow.yaml"
+
+ingestion_config = IngestionConfig.from_yaml(config_file_name)
+
+processor_kwargs = {
+    "megaparse_config": ingestion_config.parser_config.megaparse_config,
+    "splitter_config": ingestion_config.parser_config.splitter_config,
+}
+
+brain = Brain.from_files(name = "my smart brain",
+                         file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"],
+                         processor_kwargs=processor_kwargs,
+                         )
+
+```
+
+4. 
Launch a Chat +```python +brain.print_info() + +from rich.console import Console +from rich.panel import Panel +from rich.prompt import Prompt + +console = Console() +console.print(Panel.fit("Ask your brain !", style="bold magenta")) + +while True: + # Get user input + question = Prompt.ask("[bold cyan]Question[/bold cyan]") + + # Check if user wants to exit + if question.lower() == "exit": + console.print(Panel("Goodbye!", style="bold yellow")) + break + + answer = brain.ask(question) + # Print the answer with typing effect + console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}") + + console.print("-" * console.width) + +brain.print_info() +``` + +5. You are now all set up to talk with your brain and test different chunking strategies by simply changing the configuration file! diff --git a/docs/docs/workflows/examples/basic_rag.excalidraw.png b/docs/docs/workflows/examples/basic_rag.excalidraw.png new file mode 100644 index 000000000..69ee0ff1a Binary files /dev/null and b/docs/docs/workflows/examples/basic_rag.excalidraw.png differ diff --git a/docs/docs/workflows/examples/basic_rag.md b/docs/docs/workflows/examples/basic_rag.md new file mode 100644 index 000000000..f58264450 --- /dev/null +++ b/docs/docs/workflows/examples/basic_rag.md @@ -0,0 +1,106 @@ +# Basic RAG + +![](basic_rag.excalidraw.png) + + +Creating a basic RAG workflow like the one above is simple, here are the steps: + + +1. Add your API Keys to your environment variables +```python +import os +os.environ["OPENAI_API_KEY"] = "myopenai_apikey" + +``` +Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama. + +2. Create the YAML file ``basic_rag_workflow.yaml`` and copy the following content in it +```yaml +workflow_config: + name: "standard RAG" + nodes: + - name: "START" + edges: ["filter_history"] + + - name: "filter_history" + edges: ["rewrite"] + + - name: "rewrite" + edges: ["retrieve"] + + - name: "retrieve" + edges: ["generate_rag"] + + - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user + edges: ["END"] + +# Maximum number of previous conversation iterations +# to include in the context of the answer +max_history: 10 + +# Reranker configuration +reranker_config: + # The reranker supplier to use + supplier: "cohere" + + # The model to use for the reranker for the given supplier + model: "rerank-multilingual-v3.0" + + # Number of chunks returned by the reranker + top_n: 5 + +# Configuration for the LLM +llm_config: + + # maximum number of tokens passed to the LLM to generate the answer + max_input_tokens: 4000 + + # temperature for the LLM + temperature: 0.7 +``` + +3. Create a Brain with the default configuration +```python +from quivr_core import Brain + +brain = Brain.from_files(name = "my smart brain", + file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"], + ) + +``` + +4. 
Launch a Chat +```python +brain.print_info() + +from rich.console import Console +from rich.panel import Panel +from rich.prompt import Prompt +from quivr_core.config import RetrievalConfig + +config_file_name = "./basic_rag_workflow.yaml" + +retrieval_config = RetrievalConfig.from_yaml(config_file_name) + +console = Console() +console.print(Panel.fit("Ask your brain !", style="bold magenta")) + +while True: + # Get user input + question = Prompt.ask("[bold cyan]Question[/bold cyan]") + + # Check if user wants to exit + if question.lower() == "exit": + console.print(Panel("Goodbye!", style="bold yellow")) + break + + answer = brain.ask(question, retrieval_config=retrieval_config) + # Print the answer with typing effect + console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}") + + console.print("-" * console.width) + +brain.print_info() +``` + +5. You are now all set up to talk with your brain and test different retrieval strategies by simply changing the configuration file! diff --git a/docs/docs/workflows/examples/chat.md b/docs/docs/workflows/examples/chat.md deleted file mode 100644 index c4e8a6f00..000000000 --- a/docs/docs/workflows/examples/chat.md +++ /dev/null @@ -1,88 +0,0 @@ -# Chat - -Creating a custom brain workflow is simple, here are the steps : - -1. Create a workflow -2. Create a Brain with this workflow and append your files -3. Launch a chat -4. Chat with your brain ! - -### Use AssistantConfig - -First create a json configuration file in the rag_config_workflow.yaml format (see workflows): -```yaml -ingestion_config: - parser_config: - megaparse_config: - strategy: "fast" - pdf_parser: "unstructured" - splitter_config: - chunk_size: 400 - chunk_overlap: 100 - -retrieval_config: - workflow_config: - name: "standard RAG" - nodes: - - name: "START" - edges: ["filter_history"] - - - name: "filter_history" - edges: ["generate_chat_llm"] - - - name: "generate_chat_llm" # the name of the last node, from which we want to stream the answer to the user, should always start with "generate" - edges: ["END"] - # Maximum number of previous conversation iterations - # to include in the context of the answer - max_history: 10 - - #prompt: "my prompt" - - max_files: 20 - reranker_config: - # The reranker supplier to use - supplier: "cohere" - - # The model to use for the reranker for the given supplier - model: "rerank-multilingual-v3.0" - - # Number of chunks returned by the reranker - top_n: 5 - llm_config: - # The LLM supplier to use - supplier: "openai" - - # The model to use for the LLM for the given supplier - model: "gpt-3.5-turbo-0125" - - max_input_tokens: 2000 - - # Maximum number of tokens to pass to the LLM - # as a context to generate the answer - max_output_tokens: 2000 - - temperature: 0.7 - streaming: true - -``` -This brain is set up to : - * Filter history and keep only the latest conversations - * Ask the question to the brain - * Generate answer - - -Then, when instanciating your Brain, add the custom config you created: - -```python - assistant_config = AssistantConfig.from_yaml("my_config_file.yaml") - processor_kwargs = { - "assistant_config": assistant_config - } - -``` - - - - - - diff --git a/docs/docs/workflows/examples/rag_with_internet.md b/docs/docs/workflows/examples/rag_with_internet.md deleted file mode 100644 index 7ff5f3792..000000000 --- a/docs/docs/workflows/examples/rag_with_internet.md +++ /dev/null @@ -1 +0,0 @@ -# RAG with Internet \ No newline at end of file diff --git 
a/docs/docs/workflows/examples/rag_with_web_search.excalidraw.png b/docs/docs/workflows/examples/rag_with_web_search.excalidraw.png new file mode 100644 index 000000000..886e64e4a Binary files /dev/null and b/docs/docs/workflows/examples/rag_with_web_search.excalidraw.png differ diff --git a/docs/docs/workflows/examples/rag_with_web_search.md b/docs/docs/workflows/examples/rag_with_web_search.md new file mode 100644 index 000000000..a9cd74aec --- /dev/null +++ b/docs/docs/workflows/examples/rag_with_web_search.md @@ -0,0 +1,135 @@ +# RAG with web search + + +![](rag_with_web_search.excalidraw.png) + +Follow the instructions below to create the agentic RAG workflow shown above, which includes some advanced capabilities such as: + +* **user intention detection** - the agent can detect if the user wants to activate the web search tool to look for information not present in the documents; +* **dynamic chunk retrieval** - the number of retrieved chunks is not fixed, but determined dynamically using the reranker's relevance scores and the user-provided ``relevance_score_threshold``; +* **web search** - the agent can search the web for more information if needed. + + +--- + +1. Add your API Keys to your environment variables +```python +import os +os.environ["OPENAI_API_KEY"] = "myopenai_apikey" + +``` +Check our `.env.example` file to see the possible environment variables you can configure. Quivr supports APIs from Anthropic, OpenAI, and Mistral. It also supports local models using Ollama. + +2. Create the YAML file ``rag_with_web_search_workflow.yaml`` and copy the following content in it +```yaml +workflow_config: + name: "RAG with web search" + + # List of tools that the agent can activate if the user instructions require it + available_tools: + - "web search" + + nodes: + - name: "START" + conditional_edge: + routing_function: "routing_split" + conditions: ["edit_system_prompt", "filter_history"] + + - name: "edit_system_prompt" + edges: ["filter_history"] + + - name: "filter_history" + edges: ["dynamic_retrieve"] + + - name: "dynamic_retrieve" + conditional_edge: + routing_function: "tool_routing" + conditions: ["run_tool", "generate_rag"] + + - name: "run_tool" + edges: ["generate_rag"] + + - name: "generate_rag" # the name of the last node, from which we want to stream the answer to the user + edges: ["END"] + tools: + - name: "cited_answer" + +# Maximum number of previous conversation iterations +# to include in the context of the answer +max_history: 10 + +# Number of chunks returned by the retriever +k: 40 + +# Reranker configuration +reranker_config: + # The reranker supplier to use + supplier: "cohere" + + # The model to use for the reranker for the given supplier + model: "rerank-multilingual-v3.0" + + # Number of chunks returned by the reranker + top_n: 5 + + # Among the chunks returned by the reranker, only those with relevance + # scores equal or above the relevance_score_threshold will be returned + # to the LLM to generate the answer (allowed values are between 0 and 1, + # a value of 0.1 works well with the cohere and jina rerankers) + relevance_score_threshold: 0.01 + +# LLM configuration +llm_config: + + # maximum number of tokens passed to the LLM to generate the answer + max_input_tokens: 8000 + + # temperature for the LLM + temperature: 0.7 +``` + +3. Create a Brain with the default configuration +```python +from quivr_core import Brain + +brain = Brain.from_files(name = "my smart brain", + file_paths = ["./my_first_doc.pdf", "./my_second_doc.txt"], + ) + +``` + +4. 
Launch a Chat +```python +brain.print_info() + +from rich.console import Console +from rich.panel import Panel +from rich.prompt import Prompt +from quivr_core.config import RetrievalConfig + +config_file_name = "./rag_with_web_search_workflow.yaml" + +retrieval_config = RetrievalConfig.from_yaml(config_file_name) + +console = Console() +console.print(Panel.fit("Ask your brain !", style="bold magenta")) + +while True: + # Get user input + question = Prompt.ask("[bold cyan]Question[/bold cyan]") + + # Check if user wants to exit + if question.lower() == "exit": + console.print(Panel("Goodbye!", style="bold yellow")) + break + + answer = brain.ask(question, retrieval_config=retrieval_config) + # Print the answer with typing effect + console.print(f"[bold green]Quivr Assistant[/bold green]: {answer.answer}") + + console.print("-" * console.width) + +brain.print_info() +``` + +5. You are now all set up to talk with your brain and test different retrieval strategies by simply changing the configuration file! diff --git a/docs/docs/workflows/index.md b/docs/docs/workflows/index.md index af4abbfe2..1e0ecb6e4 100644 --- a/docs/docs/workflows/index.md +++ b/docs/docs/workflows/index.md @@ -1 +1,3 @@ -# Configuration \ No newline at end of file +# Workflows + +In this section, you will find examples of workflows that you can use to create your own agentic RAG systems. diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index 6c8e1b2ec..5614b8bc6 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -79,10 +79,14 @@ nav: - Workflows: - workflows/index.md - Examples: - - workflows/examples/chat.md - - workflows/examples/rag_with_internet.md + - workflows/examples/basic_ingestion.md + - workflows/examples/basic_rag.md + - workflows/examples/rag_with_web_search.md - Configuration: - config/index.md - - config/base_config.md - config/config.md + - config/base_config.md + - Examples: + - examples/index.md + - examples/custom_storage.md - Enterprise: https://docs.quivr.app/ diff --git a/docs/requirements-dev.lock b/docs/requirements-dev.lock index 2348d21d1..9c734393b 100644 --- a/docs/requirements-dev.lock +++ b/docs/requirements-dev.lock @@ -25,6 +25,8 @@ anthropic==0.36.1 anyio==4.6.2.post1 # via anthropic # via httpx +appnope==0.1.4 + # via ipykernel asttokens==2.4.1 # via stack-data attrs==24.2.0 @@ -76,8 +78,6 @@ fsspec==2024.9.0 # via huggingface-hub ghp-import==2.1.0 # via mkdocs -greenlet==3.1.1 - # via sqlalchemy griffe==1.2.0 # via mkdocstrings-python h11==0.14.0 diff --git a/docs/requirements.lock b/docs/requirements.lock index 2348d21d1..9c734393b 100644 --- a/docs/requirements.lock +++ b/docs/requirements.lock @@ -25,6 +25,8 @@ anthropic==0.36.1 anyio==4.6.2.post1 # via anthropic # via httpx +appnope==0.1.4 + # via ipykernel asttokens==2.4.1 # via stack-data attrs==24.2.0 @@ -76,8 +78,6 @@ fsspec==2024.9.0 # via huggingface-hub ghp-import==2.1.0 # via mkdocs -greenlet==3.1.1 - # via sqlalchemy griffe==1.2.0 # via mkdocstrings-python h11==0.14.0