Commit Graph

15 Commits

Author SHA1 Message Date
AmineDiro
675885c762
feat(upload): async improved (#2544)
# Description
Hey,

Here's a breakdown of what I've done:

- Reducing the number of opened fd and memory footprint: Previously, for
each uploaded file, we were opening a temporary NamedTemporaryFile to
write existing content read from Supabase. However, due to the
dependency on `langchain` loader classes, we couldn't use memory buffers
for the loaders. Now, with the changes made, we only open a single
temporary file for each `process_file_and_notify`, cutting down on
excessive file opening, read syscalls, and memory buffer usage. This
could cause stability issues when ingesting and processing large volumes
of documents. Unfortunately, there is still reopening of temporary files
in some code paths but this can be improved further in later work.
- Removing `UploadFile` class from File: The `UploadFile` ( a FastAPI
abstraction over a SpooledTemporaryFile for multipart upload) was
redundant in our `File` setup since we already downloaded the file from
remote storage and read it into memory + wrote the file into a temp
file. By removing this abstraction, we streamline our code and eliminate
unnecessary complexity.
- `async` function Adjustments: I've removed the async labeling from
functions where it wasn't truly asynchronous. For instance, calling
`filter_file` for processing files isn't genuinely async, ass async file
reading isn't actually asynchronous—it [uses a threadpool for reading
the
file](9f16bf5c25/starlette/datastructures.py (L458))
. Given that we're already leveraging `celery` for parallelism (one
worker per core), we need to ensure that reading and processing occur in
the same thread, or at least minimize thread spawning. Additionally,
since the rest of the code isn't inherently asynchronous, our bottleneck
lies in CPU operations rather than asynchronous processing.

These changes aim to improve performance and streamline our codebase. 
Let me know if you have any questions or suggestions for further
improvements!

## Checklist before requesting a review
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my code
- [x] I have ideally added tests that prove my fix is effective or that
my feature works

---------

Signed-off-by: aminediro <aminediro@github.com>
Co-authored-by: aminediro <aminediro@github.com>
Co-authored-by: Stan Girard <girard.stanislas@gmail.com>
2024-06-04 06:29:27 -07:00
Stan Girard
5c28f16a09
Add additional modules to celery.autodiscover_tasks() (#2607)
# Description

Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.

## Checklist before requesting a review

Please delete options that are not relevant.

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged

## Screenshots (if appropriate):
2024-05-21 14:13:25 -07:00
Stan Girard
d41a0b4be4
Feat/auth-playground (#2605)
# Description

Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.

## Checklist before requesting a review

Please delete options that are not relevant.

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged

## Screenshots (if appropriate):
2024-05-21 13:20:35 -07:00
Stan Girard
848aed46ea
Revert "feat: Add Google Drive & Sharepoint sync in backend" (#2603)
Reverts QuivrHQ/quivr#2592
2024-05-21 08:37:08 -07:00
Stan Girard
8303aca9d9
feat: Add Google Drive & Sharepoint sync in backend (#2592)
# Description

Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.

## Checklist before requesting a review

Please delete options that are not relevant.

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged

## Screenshots (if appropriate):
2024-05-21 07:53:04 -07:00
Stan Girard
cd73412e8f
Revert "feat(celery): moved assistant summary to celery" (#2558)
Reverts QuivrHQ/quivr#2557
2024-05-07 09:20:42 -07:00
Stan Girard
55834365b2
feat(celery): moved assistant summary to celery (#2557)
This pull request moves the assistant summary functionality to the
celery module for better organization and separation of concerns.
2024-05-07 09:12:31 -07:00
Stan Girard
f5fb444bc8 fix: Update Celery config to remove SSL certificate requirement 2024-02-20 19:56:14 -08:00
Stan Girard
19c11944bc
feat: implement elasticache (#2234)
# Description

Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.

## Checklist before requesting a review

Please delete options that are not relevant.

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged

## Screenshots (if appropriate):
2024-02-20 19:52:14 -08:00
Stan Girard
2a8014d3a2 fix: Refactor Celery configuration to use environment variables 2024-02-20 10:14:29 -08:00
Stan Girard
42db236e40 Add Redis password logging in celery_config.py 2024-02-20 01:03:43 -08:00
Stan Girard
4406f1deb2 Add result_backend to Celery configuration 2024-02-20 01:02:38 -08:00
Stan Girard
41057cd58c Add logger to celery_config.py 2024-02-20 00:21:40 -08:00
Stan Girard
f406afcc45
Add Redis configuration to celery_config.py (#2227)
This pull request adds Redis configuration to the `celery_config.py`
file. The `CELERY_BROKER_URL` has been replaced with `REDIS_HOST`,
`REDIS_PORT`, and `REDIS_PASS` environment variables to configure the
Redis broker and backend. This change allows for better integration with
Redis in the application.
2024-02-19 23:51:18 -08:00
Stan Girard
4d91d1cadc
feat(integrations): integration with Notion in the backend (#2123)
moved to brains

# Description

Please include a summary of the changes and the related issue. Please
also include relevant motivation and context.

## Checklist before requesting a review

Please delete options that are not relevant.

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented hard-to-understand areas
- [ ] I have ideally added tests that prove my fix is effective or that
my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] Any dependent changes have been merged

## Screenshots (if appropriate):

---------

Co-authored-by: Antoine Dewez <44063631+Zewed@users.noreply.github.com>
2024-02-05 21:02:46 -08:00