This pull request includes updates to the `docker-compose.dev.yml` and
`Dockerfile.dev` files. The changes aim to improve performance and fix
bugs. The updates include:
- Removing unnecessary workers configuration in the
`docker-compose.dev.yml` file.
- Updating the base image in the `Dockerfile.dev` to use a slim version.
- Adjusting the schedule for a specific task in the code.
- Modifying the time interval for retrieving active syncs.
- Changing the loader class for processing PowerPoint files.
- Refactoring the file existence check logic.
- Adding debug logs for file existence check and file removal.
- Adjusting the file synchronization logic.
These changes are intended to enhance the performance and stability of
the application.
# Description
Hey,
Here's a breakdown of what I've done:
- Reducing the number of opened fd and memory footprint: Previously, for
each uploaded file, we were opening a temporary NamedTemporaryFile to
write existing content read from Supabase. However, due to the
dependency on `langchain` loader classes, we couldn't use memory buffers
for the loaders. Now, with the changes made, we only open a single
temporary file for each `process_file_and_notify`, cutting down on
excessive file opening, read syscalls, and memory buffer usage. This
could cause stability issues when ingesting and processing large volumes
of documents. Unfortunately, there is still reopening of temporary files
in some code paths but this can be improved further in later work.
- Removing `UploadFile` class from File: The `UploadFile` ( a FastAPI
abstraction over a SpooledTemporaryFile for multipart upload) was
redundant in our `File` setup since we already downloaded the file from
remote storage and read it into memory + wrote the file into a temp
file. By removing this abstraction, we streamline our code and eliminate
unnecessary complexity.
- `async` function Adjustments: I've removed the async labeling from
functions where it wasn't truly asynchronous. For instance, calling
`filter_file` for processing files isn't genuinely async, ass async file
reading isn't actually asynchronous—it [uses a threadpool for reading
the
file](9f16bf5c25/starlette/datastructures.py (L458))
. Given that we're already leveraging `celery` for parallelism (one
worker per core), we need to ensure that reading and processing occur in
the same thread, or at least minimize thread spawning. Additionally,
since the rest of the code isn't inherently asynchronous, our bottleneck
lies in CPU operations rather than asynchronous processing.
These changes aim to improve performance and streamline our codebase.
Let me know if you have any questions or suggestions for further
improvements!
## Checklist before requesting a review
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my code
- [x] I have ideally added tests that prove my fix is effective or that
my feature works
---------
Signed-off-by: aminediro <aminediro@github.com>
Co-authored-by: aminediro <aminediro@github.com>
Co-authored-by: Stan Girard <girard.stanislas@gmail.com>
This pull request updates the parsing instructions in the `common.py`
file for the `llamaparse` feature. The previous parsing instruction for
transforming checkboxes into text has been modified to also extract
tables and transform them into key-value pairs. Additionally, the
instruction now allows for duplicate keys if needed. The example
instructions have also been updated to provide clearer examples for both
tables and checkboxes.
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
---------
Co-authored-by: chloedia <chloedaems0@gmail.com>
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
This pull request adds the Playwright library for web crawling. It
includes the necessary dependencies and updates the code to use
Playwright for crawling websites.
# Description
Delete the replacement of non ASCII characters into spaces
## Checklist before requesting a review
Please delete options that are not relevant.
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
This pull request fixes the parsing instruction in the common.py file.
The result_type has been corrected to "markdown" and the
parsing_instruction has been updated to handle checkboxes, tables, and
other elements that are hard to parse in a meaningful way.
This pull request adds Llama Parse integration for complex document
parsing in Quivr. Llama Parse is a tool from Llama Index that allows you
to read complex documents in Quivr. It provides an API key that needs to
be added to the `.env` file as `LLAMA_CLOUD_API_KEY`. Once configured,
you can use the Llama Parse tool to read `pdf`, `docx`, and `doc` files
in Quivr.
This pull request adds the pyinstrument package and updates the Makefile
and backend code. The pyinstrument package is used for profiling and the
Makefile and backend code have been modified to support profiling.
This pull request updates the chunk size and overlap parameters in the
File class to improve performance. It also increases the top_n value in
the compressor for both the CohereRerank and FlashrankRerank models.
Additionally, it ensures that the page content is encoded in UTF-8
before processing.
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
<!--
ELLIPSIS_HIDDEN
-->
----
| 🚀 This description was created by
[Ellipsis](https://www.ellipsis.dev) for commit
997576d577 |
|--------|
### Summary:
This PR involves a significant refactoring of the codebase, with file
and class relocations and renames, and updated import paths, without
introducing new functionality.
**Key points**:
- Significant refactoring of the codebase for improved organization and
clarity.
- File and class relocations and renames.
- Updated import paths to reflect new locations and names.
- No new functionality introduced.
----
Generated with ❤️ by [ellipsis.dev](https://www.ellipsis.dev)
<!--
ELLIPSIS_HIDDEN
-->
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
<!--
ELLIPSIS_HIDDEN
-->
----
| <a href="https://ellipsis.dev" target="_blank"><img
src="https://avatars.githubusercontent.com/u/80834858?s=400&u=31e596315b0d8f7465b3ee670f25cea677299c96&v=4"
alt="Ellipsis" width="30px" height="30px"/></a> | 🚀 This PR
description was created by [Ellipsis](https://www.ellipsis.dev) for
commit 5350e0a071. |
|--------|--------|
### Summary:
This PR adds a new script for evaluating the RAG model using the Ragas
library, with results saved as a JSON file and printed to the console.
**Key points**:
- New script `run_evaluation.py` added to
`backend/tests/ragas_evaluation/`.
- The script processes documents, generates replies using a QuivrRAG
chain, and evaluates the replies using specified metrics.
- Results are saved as a JSON file and printed to the console.
- The script can be run from the command line with various options.
----
Generated with ❤️ by [ellipsis.dev](https://www.ellipsis.dev)
<!--
ELLIPSIS_HIDDEN
-->
---------
Co-authored-by: Damien Mourot <damien.mourot@gmail.com>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Added ability to upload .bib files to brain using langchain bibtex
document loader. Also changed frontend to allow the choosing of said
file.
Looking for guidance on how to run unit tests locally!
## Checklist before requesting a review
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my code
- [x] I have commented hard-to-understand areas
- [x] I have ideally added tests that prove my fix is effective or that
my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] Any dependent changes have been merged
This pull request adds the ingestion module and routes to the project.
It includes the necessary files and code changes to implement the
ingestion functionality.
This pull request updates the langchain.prompts and
langchain_core.messages modules. It includes changes to the code that
improve functionality and fix any existing issues.
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
moved to brains
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
---------
Co-authored-by: Antoine Dewez <44063631+Zewed@users.noreply.github.com>
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
logic
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
- Refactor knowledge to a module
- This PR breaks the Github Processor function -> need to comment brain
and file imports as it creates a circular dependency issue. Should be
fixed and reverted in next PR.
fixed
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
added explore button and removed unused feature openai key
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
New Modules folder with "user" module:
- controller: contains the current route
- entity: contains the current Models (TO be renamed DTO)
- repository: contains the current repo
- service: methods used by other modules
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):
# Description
Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.
## Checklist before requesting a review
Please delete options that are not relevant.
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged
## Screenshots (if appropriate):