quivr/backend/Dockerfile

42 lines
1012 B
Docker
Raw Normal View History

2023-11-13 17:03:55 +03:00
# Using a slim version for a smaller base image
FROM python:3.11.6-slim-bullseye
ARG DEV_MODE
ENV DEV_MODE=$DEV_MODE
2023-11-13 17:03:55 +03:00
# Install GEOS library, Rust, and other dependencies, then clean up
RUN apt-get clean && apt-get update && apt-get install -y \
libgeos-dev \
libcurl4-openssl-dev \
libssl-dev \
binutils \
pandoc \
curl \
git \
Adds pytesseract, tesseract and poopler-utils (#1648) To enable the ingestion of copy protected PDF via OCR instead of text extraction # Description Copy protected PDF can't be properly imported via the standard langchain loader. See the following errors: ``` 2023-11-15 14:16:31,927 [INFO] models.files: Computing documents from file Cradle to Cradle Criteria for the built environmen.pdf [nltk_data] Downloading package punkt to [nltk_data] /home/pascal_gula_luccid_ai/nltk_data... [nltk_data] Unzipping tokenizers/punkt.zip. [nltk_data] Downloading package averaged_perceptron_tagger to [nltk_data] /home/pascal_gula_luccid_ai/nltk_data... [nltk_data] Unzipping taggers/averaged_perceptron_tagger.zip. Error processing file: detectron2 is not installed, pytesseract is not installed and the text of the PDF is not extractable. To process this file, install detectron2, install pytesseract, or remove copy protection from the PDF. ``` ``` 2023-11-15 15:04:14,624 [INFO] models.files: Computing documents from file Cradle to Cradle Criteria for the built environmen.pdf Error processing file: Unable to get page count. Is poppler installed and in PATH? ``` ``` 023-11-15 15:59:11,886 [INFO] models.files: Computing documents from file Cradle to Cradle Criteria for the built environmen.pdf Error processing file: tesseract is not installed or it's not in your PATH. See README file for more information. ``` ## Checklist before requesting a review Please delete options that are not relevant. - [x] My code follows the style guidelines of this project - [x] I have performed a self-review of my code - [x] I have commented hard-to-understand areas - [x] I have ideally added tests that prove my fix is effective or that my feature works - [x] New and existing unit tests pass locally with my changes - [x] Any dependent changes have been merged ## Screenshots (if appropriate): None Co-authored-by: Stan Girard <girard.stanislas@gmail.com>
2023-11-22 19:26:11 +03:00
poppler-utils \
tesseract-ocr \
build-essential && \
rm -rf /var/lib/apt/lists/* && apt-get clean
New Webapp migration (#56) * feat(v2): loaders added * feature: Add scroll animations * feature: upload ui * feature: upload multiple files * fix: Same file name and size remove * feat(crawler): added * feat(parsers): v2 added more * feat(v2): audio now working * feat(v2): all loaders * feat(v2): explorer * chore: add links * feat(api): added status in return message * refactor(website): remove old code * feat(upload): return type for messages * feature: redirect to upload if ENV=local * fix(chat): fixed some issues * feature: respect response type * loading state * feature: Loading stat * feat(v2): added explore and chat pages * feature: modal settings * style: Chat UI * feature: scroll to bottom when chatting * feature: smooth scroll in chat * feature(anim): Slide chat in * feature: markdown chat * feat(explorer): list * feat(doc): added document item * feat(explore): added modal * Add clarification on Project API keys and web interface for migration scripts to Readme (#58) * fix(demo): changed link * add support to uploading zip file (#62) * Catch UnicodeEncodeError exception (#64) * feature: fixed chatbar * fix(loaders): missing argument * fix: layout * fix: One whole chatbox * fix: Scroll into view * fix(build): vercel issues * chore(streamlit): moved to own file * refactor(api): moved to backend folder * feat(docker): added docker compose * Fix a bug where langchain memories were not being cleaned (#71) * Update README.md (#70) * chore(streamlit): moved to own file * refactor(api): moved to backend folder * docs(readme): updated for new version * docs(readme): added old readme * docs(readme): update copy dot env file * docs(readme): cleanup --------- Co-authored-by: iMADi-ARCH <nandanaditya985@gmail.com> Co-authored-by: Matt LeBel <github@lebel.io> Co-authored-by: Evan Carlson <45178375+EvanCarlson@users.noreply.github.com> Co-authored-by: Mustafa Hasan Khan <65130881+mustafahasankhan@users.noreply.github.com> Co-authored-by: zhulixi <48713110+zlxxlz1026@users.noreply.github.com> Co-authored-by: Stanisław Tuszyński <stanislaw@tuszynski.me>
2023-05-21 02:20:55 +03:00
# Add Rust binaries to the PATH
ENV PATH="/root/.cargo/bin:${PATH}"
New Webapp migration (#56) * feat(v2): loaders added * feature: Add scroll animations * feature: upload ui * feature: upload multiple files * fix: Same file name and size remove * feat(crawler): added * feat(parsers): v2 added more * feat(v2): audio now working * feat(v2): all loaders * feat(v2): explorer * chore: add links * feat(api): added status in return message * refactor(website): remove old code * feat(upload): return type for messages * feature: redirect to upload if ENV=local * fix(chat): fixed some issues * feature: respect response type * loading state * feature: Loading stat * feat(v2): added explore and chat pages * feature: modal settings * style: Chat UI * feature: scroll to bottom when chatting * feature: smooth scroll in chat * feature(anim): Slide chat in * feature: markdown chat * feat(explorer): list * feat(doc): added document item * feat(explore): added modal * Add clarification on Project API keys and web interface for migration scripts to Readme (#58) * fix(demo): changed link * add support to uploading zip file (#62) * Catch UnicodeEncodeError exception (#64) * feature: fixed chatbar * fix(loaders): missing argument * fix: layout * fix: One whole chatbox * fix: Scroll into view * fix(build): vercel issues * chore(streamlit): moved to own file * refactor(api): moved to backend folder * feat(docker): added docker compose * Fix a bug where langchain memories were not being cleaned (#71) * Update README.md (#70) * chore(streamlit): moved to own file * refactor(api): moved to backend folder * docs(readme): updated for new version * docs(readme): added old readme * docs(readme): update copy dot env file * docs(readme): cleanup --------- Co-authored-by: iMADi-ARCH <nandanaditya985@gmail.com> Co-authored-by: Matt LeBel <github@lebel.io> Co-authored-by: Evan Carlson <45178375+EvanCarlson@users.noreply.github.com> Co-authored-by: Mustafa Hasan Khan <65130881+mustafahasankhan@users.noreply.github.com> Co-authored-by: zhulixi <48713110+zlxxlz1026@users.noreply.github.com> Co-authored-by: Stanisław Tuszyński <stanislaw@tuszynski.me>
2023-05-21 02:20:55 +03:00
WORKDIR /code
# Copy just the requirements first
COPY ./requirements.txt .
New Webapp migration (#56) * feat(v2): loaders added * feature: Add scroll animations * feature: upload ui * feature: upload multiple files * fix: Same file name and size remove * feat(crawler): added * feat(parsers): v2 added more * feat(v2): audio now working * feat(v2): all loaders * feat(v2): explorer * chore: add links * feat(api): added status in return message * refactor(website): remove old code * feat(upload): return type for messages * feature: redirect to upload if ENV=local * fix(chat): fixed some issues * feature: respect response type * loading state * feature: Loading stat * feat(v2): added explore and chat pages * feature: modal settings * style: Chat UI * feature: scroll to bottom when chatting * feature: smooth scroll in chat * feature(anim): Slide chat in * feature: markdown chat * feat(explorer): list * feat(doc): added document item * feat(explore): added modal * Add clarification on Project API keys and web interface for migration scripts to Readme (#58) * fix(demo): changed link * add support to uploading zip file (#62) * Catch UnicodeEncodeError exception (#64) * feature: fixed chatbar * fix(loaders): missing argument * fix: layout * fix: One whole chatbox * fix: Scroll into view * fix(build): vercel issues * chore(streamlit): moved to own file * refactor(api): moved to backend folder * feat(docker): added docker compose * Fix a bug where langchain memories were not being cleaned (#71) * Update README.md (#70) * chore(streamlit): moved to own file * refactor(api): moved to backend folder * docs(readme): updated for new version * docs(readme): added old readme * docs(readme): update copy dot env file * docs(readme): cleanup --------- Co-authored-by: iMADi-ARCH <nandanaditya985@gmail.com> Co-authored-by: Matt LeBel <github@lebel.io> Co-authored-by: Evan Carlson <45178375+EvanCarlson@users.noreply.github.com> Co-authored-by: Mustafa Hasan Khan <65130881+mustafahasankhan@users.noreply.github.com> Co-authored-by: zhulixi <48713110+zlxxlz1026@users.noreply.github.com> Co-authored-by: Stanisław Tuszyński <stanislaw@tuszynski.me>
2023-05-21 02:20:55 +03:00
2023-11-13 17:03:55 +03:00
# Upgrade pip
RUN pip install --upgrade pip
2023-11-13 17:03:55 +03:00
# Increase timeout to wait for the new installation
RUN pip install --no-cache-dir -r requirements.txt --timeout 200
2023-11-13 17:03:55 +03:00
RUN if [ "$DEV_MODE" = "true" ]; then pip install --no-cache debugpy --timeout 200; fi
# Copy the rest of the application
COPY . .
New Webapp migration (#56) * feat(v2): loaders added * feature: Add scroll animations * feature: upload ui * feature: upload multiple files * fix: Same file name and size remove * feat(crawler): added * feat(parsers): v2 added more * feat(v2): audio now working * feat(v2): all loaders * feat(v2): explorer * chore: add links * feat(api): added status in return message * refactor(website): remove old code * feat(upload): return type for messages * feature: redirect to upload if ENV=local * fix(chat): fixed some issues * feature: respect response type * loading state * feature: Loading stat * feat(v2): added explore and chat pages * feature: modal settings * style: Chat UI * feature: scroll to bottom when chatting * feature: smooth scroll in chat * feature(anim): Slide chat in * feature: markdown chat * feat(explorer): list * feat(doc): added document item * feat(explore): added modal * Add clarification on Project API keys and web interface for migration scripts to Readme (#58) * fix(demo): changed link * add support to uploading zip file (#62) * Catch UnicodeEncodeError exception (#64) * feature: fixed chatbar * fix(loaders): missing argument * fix: layout * fix: One whole chatbox * fix: Scroll into view * fix(build): vercel issues * chore(streamlit): moved to own file * refactor(api): moved to backend folder * feat(docker): added docker compose * Fix a bug where langchain memories were not being cleaned (#71) * Update README.md (#70) * chore(streamlit): moved to own file * refactor(api): moved to backend folder * docs(readme): updated for new version * docs(readme): added old readme * docs(readme): update copy dot env file * docs(readme): cleanup --------- Co-authored-by: iMADi-ARCH <nandanaditya985@gmail.com> Co-authored-by: Matt LeBel <github@lebel.io> Co-authored-by: Evan Carlson <45178375+EvanCarlson@users.noreply.github.com> Co-authored-by: Mustafa Hasan Khan <65130881+mustafahasankhan@users.noreply.github.com> Co-authored-by: zhulixi <48713110+zlxxlz1026@users.noreply.github.com> Co-authored-by: Stanisław Tuszyński <stanislaw@tuszynski.me>
2023-05-21 02:20:55 +03:00
EXPOSE 5050
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5050", "--workers", "6"]