2024-04-10 14:28:22 +03:00
|
|
|
import logging
|
feat(upload): async improved (#2544)
# Description
Hey,
Here's a breakdown of what I've done:
- Reducing the number of opened fd and memory footprint: Previously, for
each uploaded file, we were opening a temporary NamedTemporaryFile to
write existing content read from Supabase. However, due to the
dependency on `langchain` loader classes, we couldn't use memory buffers
for the loaders. Now, with the changes made, we only open a single
temporary file for each `process_file_and_notify`, cutting down on
excessive file opening, read syscalls, and memory buffer usage. This
could cause stability issues when ingesting and processing large volumes
of documents. Unfortunately, there is still reopening of temporary files
in some code paths but this can be improved further in later work.
- Removing `UploadFile` class from File: The `UploadFile` ( a FastAPI
abstraction over a SpooledTemporaryFile for multipart upload) was
redundant in our `File` setup since we already downloaded the file from
remote storage and read it into memory + wrote the file into a temp
file. By removing this abstraction, we streamline our code and eliminate
unnecessary complexity.
- `async` function Adjustments: I've removed the async labeling from
functions where it wasn't truly asynchronous. For instance, calling
`filter_file` for processing files isn't genuinely async, ass async file
reading isn't actually asynchronous—it [uses a threadpool for reading
the
file](https://github.com/encode/starlette/blob/9f16bf5c25e126200701f6e04330864f4a91a898/starlette/datastructures.py#L458)
. Given that we're already leveraging `celery` for parallelism (one
worker per core), we need to ensure that reading and processing occur in
the same thread, or at least minimize thread spawning. Additionally,
since the rest of the code isn't inherently asynchronous, our bottleneck
lies in CPU operations rather than asynchronous processing.
These changes aim to improve performance and streamline our codebase.
Let me know if you have any questions or suggestions for further
improvements!
## Checklist before requesting a review
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my code
- [x] I have ideally added tests that prove my fix is effective or that
my feature works
---------
Signed-off-by: aminediro <aminediro@github.com>
Co-authored-by: aminediro <aminediro@github.com>
Co-authored-by: Stan Girard <girard.stanislas@gmail.com>
2024-06-04 16:29:27 +03:00
|
|
|
import os
|
2024-04-10 14:28:22 +03:00
|
|
|
|
2024-04-07 04:35:57 +03:00
|
|
|
import litellm
|
2024-04-10 14:28:22 +03:00
|
|
|
import sentry_sdk
|
feat(upload): async improved (#2544)
# Description
Hey,
Here's a breakdown of what I've done:
- Reducing the number of opened fd and memory footprint: Previously, for
each uploaded file, we were opening a temporary NamedTemporaryFile to
write existing content read from Supabase. However, due to the
dependency on `langchain` loader classes, we couldn't use memory buffers
for the loaders. Now, with the changes made, we only open a single
temporary file for each `process_file_and_notify`, cutting down on
excessive file opening, read syscalls, and memory buffer usage. This
could cause stability issues when ingesting and processing large volumes
of documents. Unfortunately, there is still reopening of temporary files
in some code paths but this can be improved further in later work.
- Removing `UploadFile` class from File: The `UploadFile` ( a FastAPI
abstraction over a SpooledTemporaryFile for multipart upload) was
redundant in our `File` setup since we already downloaded the file from
remote storage and read it into memory + wrote the file into a temp
file. By removing this abstraction, we streamline our code and eliminate
unnecessary complexity.
- `async` function Adjustments: I've removed the async labeling from
functions where it wasn't truly asynchronous. For instance, calling
`filter_file` for processing files isn't genuinely async, ass async file
reading isn't actually asynchronous—it [uses a threadpool for reading
the
file](https://github.com/encode/starlette/blob/9f16bf5c25e126200701f6e04330864f4a91a898/starlette/datastructures.py#L458)
. Given that we're already leveraging `celery` for parallelism (one
worker per core), we need to ensure that reading and processing occur in
the same thread, or at least minimize thread spawning. Additionally,
since the rest of the code isn't inherently asynchronous, our bottleneck
lies in CPU operations rather than asynchronous processing.
These changes aim to improve performance and streamline our codebase.
Let me know if you have any questions or suggestions for further
improvements!
## Checklist before requesting a review
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my code
- [x] I have ideally added tests that prove my fix is effective or that
my feature works
---------
Signed-off-by: aminediro <aminediro@github.com>
Co-authored-by: aminediro <aminediro@github.com>
Co-authored-by: Stan Girard <girard.stanislas@gmail.com>
2024-06-04 16:29:27 +03:00
|
|
|
from dotenv import load_dotenv # type: ignore
|
2024-04-28 16:10:21 +03:00
|
|
|
from fastapi import FastAPI, HTTPException, Request
|
|
|
|
from fastapi.responses import HTMLResponse, JSONResponse
|
|
|
|
from pyinstrument import Profiler
|
2024-06-27 13:51:01 +03:00
|
|
|
from quivr_api.logger import get_logger
|
|
|
|
from quivr_api.middlewares.cors import add_cors_middleware
|
|
|
|
from quivr_api.modules.analytics.controller.analytics_routes import analytics_router
|
|
|
|
from quivr_api.modules.api_key.controller import api_key_router
|
|
|
|
from quivr_api.modules.assistant.controller import assistant_router
|
|
|
|
from quivr_api.modules.brain.controller import brain_router
|
|
|
|
from quivr_api.modules.chat.controller import chat_router
|
|
|
|
from quivr_api.modules.knowledge.controller import knowledge_router
|
|
|
|
from quivr_api.modules.misc.controller import misc_router
|
2024-08-06 15:51:27 +03:00
|
|
|
from quivr_api.modules.models.controller.model_routes import model_router
|
2024-06-27 13:51:01 +03:00
|
|
|
from quivr_api.modules.onboarding.controller import onboarding_router
|
|
|
|
from quivr_api.modules.prompt.controller import prompt_router
|
|
|
|
from quivr_api.modules.sync.controller import sync_router
|
|
|
|
from quivr_api.modules.upload.controller import upload_router
|
|
|
|
from quivr_api.modules.user.controller import user_router
|
|
|
|
from quivr_api.packages.utils import handle_request_validation_error
|
|
|
|
from quivr_api.packages.utils.telemetry import maybe_send_telemetry
|
|
|
|
from quivr_api.routes.crawl_routes import crawl_router
|
|
|
|
from quivr_api.routes.subscription_routes import subscription_router
|
2023-11-27 12:08:00 +03:00
|
|
|
from sentry_sdk.integrations.fastapi import FastApiIntegration
|
2023-11-28 16:27:39 +03:00
|
|
|
from sentry_sdk.integrations.starlette import StarletteIntegration
|
2023-05-31 14:51:23 +03:00
|
|
|
|
feat(upload): async improved (#2544)
# Description
Hey,
Here's a breakdown of what I've done:
- Reducing the number of opened fd and memory footprint: Previously, for
each uploaded file, we were opening a temporary NamedTemporaryFile to
write existing content read from Supabase. However, due to the
dependency on `langchain` loader classes, we couldn't use memory buffers
for the loaders. Now, with the changes made, we only open a single
temporary file for each `process_file_and_notify`, cutting down on
excessive file opening, read syscalls, and memory buffer usage. This
could cause stability issues when ingesting and processing large volumes
of documents. Unfortunately, there is still reopening of temporary files
in some code paths but this can be improved further in later work.
- Removing `UploadFile` class from File: The `UploadFile` ( a FastAPI
abstraction over a SpooledTemporaryFile for multipart upload) was
redundant in our `File` setup since we already downloaded the file from
remote storage and read it into memory + wrote the file into a temp
file. By removing this abstraction, we streamline our code and eliminate
unnecessary complexity.
- `async` function Adjustments: I've removed the async labeling from
functions where it wasn't truly asynchronous. For instance, calling
`filter_file` for processing files isn't genuinely async, ass async file
reading isn't actually asynchronous—it [uses a threadpool for reading
the
file](https://github.com/encode/starlette/blob/9f16bf5c25e126200701f6e04330864f4a91a898/starlette/datastructures.py#L458)
. Given that we're already leveraging `celery` for parallelism (one
worker per core), we need to ensure that reading and processing occur in
the same thread, or at least minimize thread spawning. Additionally,
since the rest of the code isn't inherently asynchronous, our bottleneck
lies in CPU operations rather than asynchronous processing.
These changes aim to improve performance and streamline our codebase.
Let me know if you have any questions or suggestions for further
improvements!
## Checklist before requesting a review
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my code
- [x] I have ideally added tests that prove my fix is effective or that
my feature works
---------
Signed-off-by: aminediro <aminediro@github.com>
Co-authored-by: aminediro <aminediro@github.com>
Co-authored-by: Stan Girard <girard.stanislas@gmail.com>
2024-06-04 16:29:27 +03:00
|
|
|
load_dotenv()
|
|
|
|
|
2024-04-07 04:35:57 +03:00
|
|
|
# Set the logging level for all loggers to WARNING
|
|
|
|
logging.basicConfig(level=logging.INFO)
|
|
|
|
logging.getLogger("httpx").setLevel(logging.WARNING)
|
|
|
|
logging.getLogger("LiteLLM").setLevel(logging.WARNING)
|
|
|
|
logging.getLogger("litellm").setLevel(logging.WARNING)
|
2024-07-15 20:10:03 +03:00
|
|
|
get_logger("quivr_core")
|
2024-04-07 04:35:57 +03:00
|
|
|
litellm.set_verbose = False
|
2023-05-21 02:20:55 +03:00
|
|
|
|
2023-10-30 12:18:04 +03:00
|
|
|
|
2024-04-07 04:35:57 +03:00
|
|
|
logger = get_logger(__name__)
|
2023-10-09 16:23:13 +03:00
|
|
|
|
|
|
|
|
2024-02-24 06:32:32 +03:00
|
|
|
def before_send(event, hint):
|
|
|
|
# If this is a transaction event
|
|
|
|
if event["type"] == "transaction":
|
|
|
|
# And the transaction name contains 'healthz'
|
|
|
|
if "healthz" in event["transaction"]:
|
|
|
|
# Drop the event by returning None
|
|
|
|
return None
|
|
|
|
# For other events, return them as is
|
|
|
|
return event
|
|
|
|
|
|
|
|
|
2023-07-14 22:02:26 +03:00
|
|
|
sentry_dsn = os.getenv("SENTRY_DSN")
|
|
|
|
if sentry_dsn:
|
2023-07-01 22:12:13 +03:00
|
|
|
sentry_sdk.init(
|
2023-07-14 22:02:26 +03:00
|
|
|
dsn=sentry_dsn,
|
2023-11-28 16:27:39 +03:00
|
|
|
sample_rate=0.1,
|
|
|
|
enable_tracing=True,
|
2024-02-23 01:48:45 +03:00
|
|
|
traces_sample_rate=0.1,
|
2023-11-28 16:27:39 +03:00
|
|
|
integrations=[
|
2024-02-24 06:32:32 +03:00
|
|
|
StarletteIntegration(transaction_style="url"),
|
|
|
|
FastApiIntegration(transaction_style="url"),
|
2023-11-28 16:27:39 +03:00
|
|
|
],
|
2024-02-24 06:32:32 +03:00
|
|
|
before_send=before_send,
|
2023-07-01 22:12:13 +03:00
|
|
|
)
|
|
|
|
|
2023-05-21 02:20:55 +03:00
|
|
|
app = FastAPI()
|
2023-06-04 00:12:42 +03:00
|
|
|
add_cors_middleware(app)
|
2023-06-20 22:53:04 +03:00
|
|
|
|
2023-06-20 17:17:13 +03:00
|
|
|
app.include_router(brain_router)
|
2023-06-11 00:59:16 +03:00
|
|
|
app.include_router(chat_router)
|
|
|
|
app.include_router(crawl_router)
|
2024-04-10 14:28:22 +03:00
|
|
|
app.include_router(assistant_router)
|
2024-05-21 23:20:35 +03:00
|
|
|
app.include_router(sync_router)
|
2023-10-05 12:31:26 +03:00
|
|
|
app.include_router(onboarding_router)
|
2023-06-11 00:59:16 +03:00
|
|
|
app.include_router(misc_router)
|
2024-04-10 12:20:21 +03:00
|
|
|
app.include_router(analytics_router)
|
2023-06-11 00:59:16 +03:00
|
|
|
app.include_router(upload_router)
|
|
|
|
app.include_router(user_router)
|
2023-06-14 22:21:13 +03:00
|
|
|
app.include_router(api_key_router)
|
2023-07-11 19:20:31 +03:00
|
|
|
app.include_router(subscription_router)
|
2023-08-03 10:53:38 +03:00
|
|
|
app.include_router(prompt_router)
|
2023-09-20 10:35:37 +03:00
|
|
|
app.include_router(knowledge_router)
|
2024-08-06 15:51:27 +03:00
|
|
|
app.include_router(model_router)
|
2023-06-22 18:50:06 +03:00
|
|
|
|
2024-04-28 16:10:21 +03:00
|
|
|
PROFILING = os.getenv("PROFILING", "false").lower() == "true"
|
|
|
|
|
|
|
|
|
|
|
|
if PROFILING:
|
|
|
|
|
|
|
|
@app.middleware("http")
|
|
|
|
async def profile_request(request: Request, call_next):
|
|
|
|
profiling = request.query_params.get("profile", False)
|
|
|
|
if profiling:
|
|
|
|
profiler = Profiler()
|
|
|
|
profiler.start()
|
|
|
|
await call_next(request)
|
|
|
|
profiler.stop()
|
|
|
|
return HTMLResponse(profiler.output_html())
|
|
|
|
else:
|
|
|
|
return await call_next(request)
|
|
|
|
|
2023-07-14 22:02:26 +03:00
|
|
|
|
2023-06-22 18:50:06 +03:00
|
|
|
@app.exception_handler(HTTPException)
|
2023-06-23 11:36:55 +03:00
|
|
|
async def http_exception_handler(_, exc):
|
2023-06-22 18:50:06 +03:00
|
|
|
return JSONResponse(
|
|
|
|
status_code=exc.status_code,
|
|
|
|
content={"detail": exc.detail},
|
|
|
|
)
|
2023-08-01 10:24:57 +03:00
|
|
|
|
|
|
|
|
|
|
|
handle_request_validation_error(app)
|
2023-08-08 18:01:31 +03:00
|
|
|
|
2024-02-08 07:44:37 +03:00
|
|
|
if os.getenv("TELEMETRY_ENABLED") == "true":
|
|
|
|
logger.info("Telemetry enabled, we use telemetry to collect anonymous usage data.")
|
|
|
|
logger.info(
|
|
|
|
"To disable telemetry, set the TELEMETRY_ENABLED environment variable to false."
|
|
|
|
)
|
2024-04-07 04:35:57 +03:00
|
|
|
maybe_send_telemetry("booting", {"status": "ok"})
|
2024-04-25 17:22:13 +03:00
|
|
|
maybe_send_telemetry("ping", {"ping": "pong"})
|
2024-02-08 07:44:37 +03:00
|
|
|
|
|
|
|
|
2023-08-08 18:01:31 +03:00
|
|
|
if __name__ == "__main__":
|
|
|
|
# run main.py to debug backend
|
|
|
|
import uvicorn
|
|
|
|
|
2024-06-26 10:58:55 +03:00
|
|
|
uvicorn.run(app, host="0.0.0.0", port=5050, log_level="debug", access_log=False)
|