quivr/backend/utils/vectors.py

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.schema import Document
from llm.brainpicking import BrainPicking
from llm.summarization import llm_evaluate_summaries, llm_summerize
from logger import get_logger
from models.chats import ChatMessage
from models.settings import BrainSettings, CommonsDep
from pydantic import BaseModel

logger = get_logger(__name__)

class Neurons(BaseModel):
    """Helpers for storing document vectors and searching summaries by similarity."""

    commons: CommonsDep
    settings = BrainSettings()

    def create_vector(self, user_id, doc, user_openai_api_key=None):
        """Embed `doc`, store it, and tag the stored row with `user_id`."""
        logger.info("Creating vector for document")
        logger.info(f"Document: {doc}")
        if user_openai_api_key:
            # Embed with the user's own OpenAI key when one is supplied.
            self.commons['documents_vector_store']._embedding = OpenAIEmbeddings(openai_api_key=user_openai_api_key)
        try:
            sids = self.commons['documents_vector_store'].add_documents([doc])
            if sids:
                # Attach the owner to the first stored row.
                self.commons['supabase'].table("vectors").update({"user_id": user_id}).match({"id": sids[0]}).execute()
        except Exception as e:
            logger.error(f"Error creating vector for document: {e}")

    def create_embedding(self, content):
        """Return the embedding vector for `content`."""
        return self.commons['embeddings'].embed_query(content)

    def similarity_search(self, query, table='match_summaries', top_k=5, threshold=0.5):
        """Return up to `top_k` rows matching `query` above `threshold`."""
        query_embedding = self.create_embedding(query)
        # Despite its name, `table` is passed to rpc() as the function name.
        summaries = self.commons['supabase'].rpc(
            table, {'query_embedding': query_embedding,
                    'match_count': top_k, 'match_threshold': threshold}
        ).execute()
        return summaries.data


def create_summary(commons: CommonsDep, document_id, content, metadata):
    """Summarize `content`, store the summary, and link it to `document_id`."""
    logger.info(f"Summarizing document {content[:100]}")
    summary = llm_summerize(content)
    logger.info(f"Summary: {summary}")
    # Note: this mutates the caller's `metadata` dict in place.
    metadata['document_id'] = document_id
    summary_doc_with_metadata = Document(
        page_content=summary, metadata=metadata)
    sids = commons['summaries_vector_store'].add_documents(
        [summary_doc_with_metadata])
    if sids:
        # Link the first stored summary row back to its source document.
        commons['supabase'].table("summaries").update(
            {"document_id": document_id}).match({"id": sids[0]}).execute()