quivr/backend/vectorstore/supabase.py

from typing import Any, List

from langchain.docstore.document import Document
from langchain.embeddings.base import Embeddings
from langchain.vectorstores import SupabaseVectorStore
from logger import get_logger
from supabase.client import Client

logger = get_logger(__name__)


class CustomSupabaseVectorStore(SupabaseVectorStore):
    """A custom vector store that queries Supabase RPC functions
    (match_vectors by default) scoped to a brain or user, instead of the
    default vectors lookup."""

    brain_id: str = "none"
    user_id: str = "none"

    def __init__(
        self,
        client: Client,
        embedding: Embeddings,
        table_name: str,
        brain_id: str = "none",
        user_id: str = "none",
    ):
        super().__init__(client, embedding, table_name)
        self.brain_id = brain_id
        self.user_id = user_id

    def find_brain_closest_query(
        self,
        user_id: str,
        query: str,
        k: int = 6,
        table: str = "match_brain",
        threshold: float = 0.5,
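    ) -> List[dict]:
        """Return the id, name, and similarity of the brains closest to `query`.

        Calls the `table` RPC (default "match_brain") with the query embedding.
        Note: the RPC is filtered by `self.user_id`; the `user_id` and
        `threshold` arguments are currently unused.
        """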
        vectors = self._embedding.embed_documents([query])
        query_embedding = vectors[0]

        res = self._client.rpc(
            table,
            {
                "query_embedding": query_embedding,
                "match_count": k,
                "p_user_id": str(self.user_id),
            },
        ).execute()

        # Collect the id, name, and similarity of the brains most similar to the query
        brain_details = [
            {
                "id": item.get("id", None),
                "name": item.get("name", None),
                "similarity": item.get("similarity", 0.0),
            }
            for item in res.data
        ]
        return brain_details

    def similarity_search(
        self,
        query: str,
        k: int = 6,
        table: str = "match_vectors",
        threshold: float = 0.5,
        **kwargs: Any,
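    ) -> List[Document]:
        """Return documents similar to `query` within this store's brain.

        Calls the `table` RPC (default "match_vectors") filtered by
        `self.brain_id`, requesting `k` matches; rows without content are
        skipped. `threshold` and `**kwargs` are accepted but currently unused.
        """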
        vectors = self._embedding.embed_documents([query])
        query_embedding = vectors[0]
        res = self._client.rpc(
            table,
            {
                "query_embedding": query_embedding,
                "match_count": k,
                "p_brain_id": str(self.brain_id),
            },
        ).execute()

        # Build one Document per returned row, skipping rows without content.
        documents = [
            Document(
                metadata={
                    **search.get("metadata", {}),
                    "id": search.get("id", ""),
                    "similarity": search.get("similarity", 0.0),
                },
                page_content=search.get("content", ""),
            )
            for search in res.data
            if search.get("content")
        ]

        return documents