This pull request includes updates to the `docker-compose.dev.yml` and
`Dockerfile.dev` files. The changes aim to improve performance and fix
bugs. The updates include:
- Removing unnecessary workers configuration in the
`docker-compose.dev.yml` file.
- Updating the base image in the `Dockerfile.dev` to use a slim version.
- Adjusting the schedule for a specific task in the code.
- Modifying the time interval for retrieving active syncs.
- Changing the loader class for processing PowerPoint files.
- Refactoring the file existence check logic.
- Adding debug logs for file existence check and file removal.
- Adjusting the file synchronization logic.
These changes are intended to enhance the performance and stability of
the application.
# Description
Hey,
Here's a breakdown of what I've done:
- Reducing the number of opened fd and memory footprint: Previously, for
each uploaded file, we were opening a temporary NamedTemporaryFile to
write existing content read from Supabase. However, due to the
dependency on `langchain` loader classes, we couldn't use memory buffers
for the loaders. Now, with the changes made, we only open a single
temporary file for each `process_file_and_notify`, cutting down on
excessive file opening, read syscalls, and memory buffer usage. This
could cause stability issues when ingesting and processing large volumes
of documents. Unfortunately, there is still reopening of temporary files
in some code paths but this can be improved further in later work.
- Removing `UploadFile` class from File: The `UploadFile` ( a FastAPI
abstraction over a SpooledTemporaryFile for multipart upload) was
redundant in our `File` setup since we already downloaded the file from
remote storage and read it into memory + wrote the file into a temp
file. By removing this abstraction, we streamline our code and eliminate
unnecessary complexity.
- `async` function Adjustments: I've removed the async labeling from
functions where it wasn't truly asynchronous. For instance, calling
`filter_file` for processing files isn't genuinely async, ass async file
reading isn't actually asynchronous—it [uses a threadpool for reading
the
file](9f16bf5c25/starlette/datastructures.py (L458))
. Given that we're already leveraging `celery` for parallelism (one
worker per core), we need to ensure that reading and processing occur in
the same thread, or at least minimize thread spawning. Additionally,
since the rest of the code isn't inherently asynchronous, our bottleneck
lies in CPU operations rather than asynchronous processing.
These changes aim to improve performance and streamline our codebase.
Let me know if you have any questions or suggestions for further
improvements!
## Checklist before requesting a review
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my code
- [x] I have ideally added tests that prove my fix is effective or that
my feature works
---------
Signed-off-by: aminediro <aminediro@github.com>
Co-authored-by: aminediro <aminediro@github.com>
Co-authored-by: Stan Girard <girard.stanislas@gmail.com>
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Change the prompt of the thoughts feature to have more steps.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
---------
Co-authored-by: chloedia <chloedaems0@gmail.com>
This pull request refactors the generate_answer and generate_stream
functions in order to improve code readability and maintainability. It
also adds new fields to the cited_answer model and updates the system
message template.
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
This pull request fixes the sender email address in the
`resend_invitation_email` function in the `resend_invitation_email.py`
file. The `from` field has been changed to `sender` to ensure that the
correct email address is used when sending the invitation email.
This pull request updates the ChatLiteLLM model to "gpt-4o" and adds a
row-level security (RLS) optimization for notifications. It also
includes a new SQL script to drop and create a policy for allowing user
access to all notifications.
This pull request adds support for the gpt-4o model to the existing
codebase. It includes changes to the BrainConfig, openAiFreeModels,
defineMaxTokens, model_compatible_with_function_calling, create_graph,
main, and process_assistant functions.
This pull request fixes the import statements for OllamaEmbeddings in
multiple files. The import statements are updated to use the correct
package name "langchain_community.embeddings" instead of
"langchain.embeddings.ollama". This ensures that the code can be
compiled and executed without any import errors.
This pull request adds comprehensive docstrings to the Brain classes
within the `backend/modules/brain/integrations` directory, enhancing
code documentation and readability. The changes include:
- **BigBrain (`Big/Brain.py`)**: Adds a class-level docstring explaining
the purpose and functionality of the BigBrain class, along with
method-level docstrings detailing the operations performed by each
method.
- **ClaudeBrain (`Claude/Brain.py`)**: Introduces a class-level
docstring that describes the ClaudeBrain class's integration with the
Claude model for conversational AI capabilities, and method-level
docstrings that clarify the purpose of each method.
- **GPT4Brain (`GPT4/Brain.py`)**: Updates include a detailed
class-level docstring outlining the GPT4Brain's integration with GPT-4
for real-time answers and tool support, along with method-level
docstrings explaining the functionality of each method.
- **NotionBrain (`Notion/Brain.py`)**: Adds a class-level docstring that
describes the NotionBrain's role in leveraging Notion data for
knowledge-based responses.
- **ProxyBrain (`Proxy/Brain.py`)**: Incorporates a class-level
docstring explaining the ProxyBrain's function as a dynamic language
model selector and method-level docstrings detailing the operations of
each method.
These additions ensure that each Brain class and its methods are
well-documented, providing clear insights into their purposes and
functionalities.
---
For more details, open the [Copilot Workspace
session](https://copilot-workspace.githubnext.com/QuivrHQ/quivr?shareId=b4e301ad-828e-4424-95ec-6e378d5d3849).
Updates the GPT-4 documentation and the `GPT4Brain` class to include
detailed information about the tools available for GPT4Brain and their
use cases.
- **Documentation (`docs/brains/gpt4.mdx`):**
- Adds a new section titled "Tools Available for GPT4Brain" that
describes specific tools: WebSearchTool, ImageGeneratorTool,
URLReaderTool, and EmailSenderTool.
- Provides use cases for each tool, demonstrating how they can be
utilized within GPT4Brain for various scenarios, such as generating
images, reading content from URLs, and sending emails.
- **Code (`backend/modules/brain/integrations/GPT4/Brain.py`):**
- Updates the class documentation to include information about the tools
available for GPT4Brain and outlines use cases for WebSearchTool,
ImageGeneratorTool, URLReaderTool, and EmailSenderTool.
- Maintains the existing functionality of the `GPT4Brain` class,
ensuring compatibility with the newly documented tools and use cases.
---
For more details, open the [Copilot Workspace
session](https://copilot-workspace.githubnext.com/QuivrHQ/quivr?shareId=2c2c1666-e5fb-4a06-bb08-ca967f4fe276).
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
This pull request fixes the value of NEXT_PUBLIC_AUTH_MODES in the
docker-compose.yml file. The previous value was incorrect and has been
updated to the correct value.
This pull request adds a new feature to generate images using the OpenAI
DALL-E model. The `ImageGeneratorTool` class is implemented to handle
the image generation functionality.
This pull request adds the Playwright library for web crawling. It
includes the necessary dependencies and updates the code to use
Playwright for crawling websites.
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
This pull request adds a new config parameter to the
`conversational_qa_chain` function. The config parameter allows for
passing metadata, specifically the conversation ID, to the function.
This change ensures that the conversation ID is included in the metadata
when invoking the `conversational_qa_chain` function.
This pull request adds the ProxyBrain integration to the project. The
ProxyBrain class is responsible for handling conversational QA and
generating answers based on the provided chat history and question.
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
---------
Co-authored-by: Stan Girard <girard.stanislas@gmail.com>
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
This pull request removes the citation metadata from the generate_answer
and generate_stream functions. The citation metadata was previously
being added to the streamed_chat_history and metadata dictionaries, but
it is no longer necessary. This change improves the efficiency and
clarity of the code.
This pull request updates the chunk size and overlap parameters in the
File class to improve performance. It also increases the top_n value in
the compressor for both the CohereRerank and FlashrankRerank models.
Additionally, it ensures that the page content is encoded in UTF-8
before processing.
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
… to the DB API
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
This pull request adds the flashrank and contextual compression
retriever to the codebase. The flashrank reranker model is used for
compression, and the contextual compression retriever combines the base
compressor and base retriever to improve document retrieval.
This pull request updates the API documentation to include new sections
on configuring Quivr and contacting the Quivr team. It also removes the
"API Brains" section from the documentation.
This pull request fixes the issue of duplicate sources in the model
response and adds metadata to the response. It removes duplicate sources
with the same name and creates a list of unique sources. Additionally,
it includes the generated URLs and sources in the metadata of the model
response.
This pull request refactors the GPT4Brain and KnowledgeBrainQA classes
to add the functionality of saving non-streaming answers. It includes
changes to the `generate_answer` method and the addition of the
`save_non_streaming_answer` method. This enhancement improves the
overall functionality and performance of the code.