docs: transform vectordb page into subdirectory

[DOCS-1197]: https://hasurahq.atlassian.net/browse/DOCS-1197?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

PR-URL: https://github.com/hasura/graphql-engine-mono/pull/10078
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
GitOrigin-RevId: e1e218318c115c469b7e90cd4b2fbb4b2bbbd767
Rob Dominguez 2023-08-09 06:01:03 -04:00 committed by hasura-bot
parent 975b905999
commit 378aba7588
16 changed files with 242 additions and 129 deletions

@@ -0,0 +1,4 @@
{
"label": "Vector Databases",
"position": 1.45
}

@@ -0,0 +1,60 @@
---
sidebar_label: Vector Databases
keywords:
- hasura
- docs
- databases
- vector databases
- ai
- machine learning
sidebar_position: 1
---
# How Does Hasura Work with Vector Databases?
## What are vectors?
Vectors are mathematical representations of unstructured data such as text, audio, or video. Vectors generated by deep
neural network models, such as large language models (LLMs), are high-dimensional so that they can capture multiple
latent features, which can then be used to classify text or cluster related text.
Word vectors are numerical representations of individual words that capture their meaning and usage patterns. Each word
is represented as a vector in a high-dimensional space, where the dimensions correspond to different features or
attributes of the word, such as its context, syntactic role, and semantic properties.
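To make the idea of "closeness" in vector space concrete, here is a minimal Python sketch that compares word vectors
using cosine similarity. The three-dimensional vectors are hand-made values chosen purely for illustration; they are
not real embeddings, which typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (an assumption for illustration, not model output).
word_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "car": [0.1, 0.2, 0.95],
}

cat_vs_kitten = cosine_similarity(word_vectors["cat"], word_vectors["kitten"])
cat_vs_car = cosine_similarity(word_vectors["cat"], word_vectors["car"])
print(cat_vs_kitten > cat_vs_car)  # related words score higher
```

A score near 1 means the two vectors point in almost the same direction, which is how related words end up "close"
to each other.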
## Vectors in the context of Large Language Models
LLMs are trained on public datasets, and only up to a certain point in time. On their own, they can't answer questions
about your new organizational data.
For instance, at the time of writing, the last training date for OpenAI's `text-davinci-003` model was June 2021, so it
has no knowledge of events in 2023.
However, we can steer an LLM to answer queries relevant to our new data by providing the context in which it should
answer the question. This is typically done by supplying additional information alongside the question. Because the
input size is limited, we have to pick the right data to feed to it as context.
We do this by taking all the textual data and chunking it, that is, splitting it into smaller pieces. Chunking is an
important strategy for your LLM application.
When a user query comes in, we search for chunks that have similar context and feed them to our LLM.
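As a rough illustration of chunking, the sketch below splits text into fixed-size word windows that overlap slightly,
so that a sentence cut at a boundary still appears whole in one of the chunks. The function name, the window size, and
the overlap are assumptions made for this example; real applications often chunk by sentence, paragraph, or token
count instead.

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into chunks of up to `chunk_size` words, sharing `overlap` words."""
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9"
for chunk in chunk_text(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk would then be converted to a vector by an embedding model, which is outside the scope of this sketch.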
## Why do we need vector databases?
Vector databases are optimized to search for similar vectors.
For large systems with a large amount of text, chunking and then searching for relevant chunks can't be done at query
time.
Instead, we chunk the text ahead of time and store a vector for each chunk in a vector database, so that we can find
relevant chunks using semantic search at query time.
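The sketch below shows, in plain Python, the kind of lookup a vector database performs: given a query vector, return
the stored chunks whose vectors are most similar. It uses a brute-force scan over a small in-memory dictionary, which
is an assumption made for readability; vector databases generally use specialized indexes so that they don't have to
compare the query against every stored vector. The chunk IDs and vector values are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vector, stored_vectors, k=2):
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), chunk_id)
        for chunk_id, vector in stored_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [chunk_id for _, chunk_id in scored[:k]]


# Hypothetical pre-computed chunk vectors (toy values, not real embeddings).
stored_vectors = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.7, 0.7],
    "chunk-c": [0.0, 1.0],
}

nearest = top_k([1.0, 0.05], stored_vectors, k=2)
print(nearest)
```

The returned chunks are what would be passed to the LLM as context for the user's question.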
## How does Hasura work with vector databases?
Hasura connects to a vector database in much the same way as it connects to a relational database. You can quickly and
easily deploy a custom data connector agent to connect to your vector database.
## Next steps
- [Connect to a Weaviate vector database](/databases/vector-databases/weaviate.mdx)
- [Learn more about Hasura Data Connectors](/databases/data-connectors/index.mdx)
- [Deploy a custom data connector agent to Hasura Cloud](/hasura-cli/connector-plugin/index.mdx)

@@ -0,0 +1,178 @@
---
sidebar_label: Weaviate
keywords:
- hasura
- docs
- databases
- vector databases
- ai
- machine learning
- weaviate
sidebar_position: 2
---
import Thumbnail from '@site/src/components/Thumbnail';
# Connect Hasura to Weaviate
[Weaviate](https://weaviate.io/) is a cloud-native, modular, real-time vector search engine that allows you to build
intelligent applications by using machine learning models as the data layer. It is open-source and can be deployed
on-premise or in the cloud.
:::info Connecting vector databases to Hasura
To connect a vector database to Hasura, you'll need to take advantage of
[Hasura Data Connectors](/databases/data-connectors/index.mdx). You can deploy any custom data connector agent to Hasura
Cloud using our CLI plugin. For more information, refer to the [docs](/hasura-cli/connector-plugin/index.mdx).
If you're curious what other connectors are available, check out our [NDC Hub](https://github.com/hasura/ndc-hub).
:::
## Step 1: Deploy a data connector agent
We'll use the Hasura CLI to deploy a custom data connector agent to Hasura Cloud. Below, we're using the `create`
command and naming our connector `weaviate-connector:v1`. We're also passing in the GitHub repo URL for the connector
agent using the `--github-repo-url` flag:
```bash
hasura connector create weaviate-connector:v1 --github-repo-url https://github.com/hasura/weaviate_gdc/tree/main
```
We can check on the progress of the deployment using the `status` command:
```bash
hasura connector status weaviate-connector:v1
```
Once the `DONE` status is returned, we can grab the URL for our data connector agent using the `list` command:
```bash
hasura connector list
```
This will return a list of all the custom data connector agents you own. **The second value returned is the URL which
we'll use in the next step; copy it to your clipboard.**
## Step 2: Add the data connector agent to your Hasura Cloud project
In your Cloud project, navigate to the `Data` tab and click `Manage` in the left-hand sidebar.
At the bottom of the screen, you'll see an expandable section titled `Data Connector Agents`.
<Thumbnail
src="/img/databases/vector-dbs/weaviate/weaviate_add-agent.png"
alt="Add the agent for a Weaviate database"
width="1000px"
/>
Click this and scroll down to `Add Agent`.
Name this agent `weaviate`, paste the URL you copied from the CLI into the `URL` field, and click `Connect`.
<Thumbnail
src="/img/databases/vector-dbs/weaviate/weaviate_configure-agent.png"
alt="Configure the agent for a Weaviate database"
width="1000px"
/>
## Step 3: Select the driver
Navigate to the `Data` tab and select `Connect Database`, then select `Weaviate` from the list of drivers:
<Thumbnail
src="/img/databases/vector-dbs/weaviate/weaviate_connect-db.png"
alt="Select the Weaviate driver"
width="1000px"
/>
## Step 4: Connect your database
At this point, we'll need to configure a few parameters:
<Thumbnail
src="/img/databases/vector-dbs/weaviate/connect-weaveate-database.png"
alt="Connect Weaviate database"
width="1000px"
/>
| Parameter | Description |
| ------------- | ------------------------------------------------------- |
| Database Name | The name of your Weaviate database. |
| `apiKey` | The API key for your Weaviate database. |
| `host` | The URL of your Weaviate database. |
| `openAPIKey` | The OpenAI key for use with your Weaviate database. |
| `scheme` | The URL scheme for your Weaviate database (http/https). |
:::info Where can I find these parameters?
For the Weaviate-specific parameters, on the
[Weaviate Cloud Services' Console](https://console.weaviate.cloud/dashboard), you can see your cluster's connection
information on the cluster's card.
You can register for an OpenAI key [here](https://openai.com/blog/openai-api).
:::
## Step 5: Track your tables
To make schemas accessible for querying using GraphQL, we'll need to track them. In the example below, we're tracking a
schema called `Resume` by checking the box next to it and clicking `Track Selected`:
<Thumbnail src="/img/databases/vector-dbs/weaviate/track-tables.png" alt="Track Weaviate tables" width="1000px" />
Tracking this schema will generate a type available in your GraphQL API that you can query against 🎉
:::info Don't have any tables to track?
You will need to define the schema in your vector database. For a walkthrough of setting up a Weaviate schema, refer to
this [tutorial](https://weaviate.io/developers/weaviate/configuration/schema-configuration).
:::
## Step 6: Define a remote relationship
The information stored in Weaviate is vectorized and not in a human-readable format. We want to be able to return the
information from our relational database using the vectorized data from Weaviate. To do this, we need to define a remote
relationship.
In the example below, we're defining a remote relationship between the `Resume` schema in our vector database and the
`application` table in our relational database. This way, whenever we query the vectorized information in our `Resume`
table, we can return the information from our relational database.
<Thumbnail
src="/img/databases/vector-dbs/weaviate/define-remote-relationship.png"
alt="Define remote relationship"
width="1000px"
/>
## Step 7: Query your data
You can now query across both your vector database and your existing relational database tables as if they were in one
location!
In our example, we have two tables in our relational database:
1. `candidate`
<Thumbnail src="/img/databases/vector-dbs/weaviate/candidate.png" alt="Candidate table" width="425px" />
2. `application`
<Thumbnail src="/img/databases/vector-dbs/weaviate/application.png" alt="Application table" width="700px" />
Our vector database stores the resumes as:
<Thumbnail src="/img/databases/vector-dbs/weaviate/resume-store.png" alt="Resume store" width="1000px" />
If we head to the `API` tab in the Hasura Console, a single GraphQL query can fetch all the candidate and application
information for a resume. Hasura brings this all together to provide a seamless querying experience.
<Thumbnail src="/img/databases/vector-dbs/weaviate/execute-query.png" alt="Execute query" width="1000px" />
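The same kind of query can also be sent from application code. The Python sketch below only builds the HTTP request
for Hasura's `/v1/graphql` endpoint and does not send it. The project URL, the admin secret placeholder, and every
field and relationship name in the query (`content`, `application`, `id`, `candidate`, `name`) are assumptions made
for illustration; use the names that your own tracked schema and remote relationship expose.

```python
import json
import urllib.request

# Hypothetical values: replace with your own project URL and admin secret.
HASURA_ENDPOINT = "https://my-project.hasura.app/v1/graphql"
ADMIN_SECRET = "<your-admin-secret>"

# Hypothetical field and relationship names; yours depend on your schema.
QUERY = """
query ResumesWithApplications {
  Resume {
    content
    application {
      id
      candidate {
        name
      }
    }
  }
}
"""


def build_request(query, endpoint=HASURA_ENDPOINT, secret=ADMIN_SECRET):
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-hasura-admin-secret": secret,
        },
        method="POST",
    )


request = build_request(QUERY)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would execute the query against a real project and return a JSON
response.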
## Next Steps
- Check out our [Learn tutorial](https://hasura.io/learn/graphql/vectordbs/introduction/) on Generative AI using Hasura,
Weaviate, Next.js and Tailwind CSS 🎉
- Learn more about [Hasura Data Connectors](/databases/data-connectors/index.mdx).
- Check out the available connectors on the [NDC Hub](https://github.com/hasura/ndc-hub)... or build your own!

@@ -1,129 +0,0 @@
---
sidebar_label: Vector Databases
keywords:
- hasura
- docs
- databases
- vector databases
- ai
- machine learning
sidebar_position: 1.45
---
import Thumbnail from '@site/src/components/Thumbnail';
# How Does Hasura Work With Vector Databases?
## What are vectors?
Vectors are mathematical representations for unstructured data like text, audio, or video data. Vectors generated from
deep neural network models like large language models (LLMs) are of a high-dimension to capture multiple latent
features, which can be then used to classify text, or cluster related text.
Word vectors are numerical representations of individual words that capture their meaning and usage patterns. Each word
is represented as a vector in a high-dimensional space, where the dimensions correspond to different features or
attributes of the word, such as its context, syntactic role, and semantic properties.
## Vectors in the context of Large Language Models
LLMs are trained on public datasets and until a certain point in time. They can't be used to answer
questions on your new organizational data.
For instance, at the moment, the last training date for OpenAI's `text-davinci-003` model was June 2021 and so it has
no idea about events in 2023.
However, we can steer an LLM to answer queries relevant to our new data by providing the context in which it should
answer the question. This is typically done by providing additional information that it can use. But, given the
limitation on the input size we have to pick the right data to feed to it as context.
We do this by taking all the textual data and chunking it, which is an important strategy for your LLM application.
Now when an input user query comes in, we search for chunks that have similar context and feed to our LLMs.
## Why do we need vector databases?
Vector databases are optimized to search for similar vectors.
Chunking and then searching for relevant chunks can't be done at query time for large systems with a large amount of
text.
We first chunk and then store all chunked vectors in vector databases so that we can find relevant chunks using semantic
search at the time of query.
## How does Hasura work with vector databases?
Hasura connects with vector databases just the same way as you would any other relational database. Supported vector
databases will be available for you to integrate in the `Connect Database` section in Console.
To demo these features please [check out our blog post](https://hasura.io/blog/hasura-brings-the-power-of-generative-ai-to-your-data/)
on how to set it up with Weaviate.
### Step 1: Select the driver
In our case the driver is Weaviate:
<Thumbnail src="/img/databases/vector-dbs/vector-db-connect-db.png" alt="Add database source" width="700px" />
### Step 2: Connect your database
At this step you have to configure a few parameters, such as:
- API to access your vector db
- Host of your vector db
- URL scheme: http/https
- The model you would like to use for vectorizing text. This demo example only supports OpenAI, so you are asked to
provide an OpenAI key.
<Thumbnail src="/img/databases/vector-dbs/connect-weaveate-database.png" alt="Connect Weaviate database" width="1000px" />
### Step 3: Review your database in the `Data Manager` tab
<Thumbnail src="/img/databases/vector-dbs/data-manager-tab.png" alt="Data manager tab" width="1000px" />
### Step 4: Create your vector database table (schema)
You will need to define the schema in your vector database. For a walkthrough of setting up a Weaviate schema, refer
to the [tutorial](https://weaviate.io/developers/weaviate/configuration/schema-configuration).
### Step 5: Track your tables
In order for schemas to be accessible for querying using GraphQL, you will need to track them.
<Thumbnail src="/img/databases/vector-dbs/track-tables.png" alt="Connect Weaviate database" width="1000px" />
### Step 6: Define the remote relationship
Define the remote relationship from your vector database to your relational database
<Thumbnail src="/img/databases/vector-dbs/define-remote-relationship.png" alt="Define remote relationship"
width="1000px" />
### Step 7: Go nuts! Query query query!
You can now query across both your vector database and your existing relational database tables as if they were in
one location.
We have 2 tables in our relational database:
1. Candidate 1
<Thumbnail src="/img/databases/vector-dbs/candidate.png" alt="Candidate 1 table" width="425px" />
2. Application 2
<Thumbnail src="/img/databases/vector-dbs/application.png" alt="Application 2 table" width="700px" />
Our vector database stores the resume as:
<Thumbnail src="/img/databases/vector-dbs/resume-store.png" alt="Resume store" width="1000px" />
In our GraphQL query we are able to fetch all the candidate and application information for a resume. Hasura brings
them all together to provide this seamless querying experience.
<Thumbnail src="/img/databases/vector-dbs/execute-query.png" alt="Execute query" width="1000px" />

Binary image files (not shown): one removed (65 KiB); eight with identical before and after sizes (23, 16, 71, 68, 126, 178, 65, and 83 KiB); three added (66, 65, and 61 KiB).