zed/crates/semantic_index/Cargo.toml

[package]
name = "semantic_index"
description = "Process, chunk, and embed text as vectors for semantic search."
version = "0.1.0"
edition = "2021"
publish = false
license = "GPL-3.0-or-later"

[lints]
workspace = true

[lib]
path = "src/semantic_index.rs"

[[example]]
name = "index"
path = "examples/index.rs"
crate-type = ["bin"]

[dependencies]
anyhow.workspace = true
client.workspace = true
clock.workspace = true
collections.workspace = true
fs.workspace = true
futures.workspace = true
futures-batch.workspace = true
gpui.workspace = true
language.workspace = true
log.workspace = true
heed.workspace = true
http.workspace = true
open_ai.workspace = true
parking_lot.workspace = true
project.workspace = true
settings.workspace = true
serde.workspace = true
serde_json.workspace = true
sha2.workspace = true
smol.workspace = true
theme.workspace = true
tree-sitter.workspace = true
ui. workspace = true
util. workspace = true
unindent.workspace = true
workspace.workspace = true
worktree.workspace = true

[dev-dependencies]
env_logger.workspace = true
client = { workspace = true, features = ["test-support"] }
fs = { workspace = true, features = ["test-support"] }
futures.workspace = true
gpui = { workspace = true, features = ["test-support"] }
language = { workspace = true, features = ["test-support"] }
languages.workspace = true
project = { workspace = true, features = ["test-support"] }
tempfile.workspace = true
util = { workspace = true, features = ["test-support"] }
worktree = { workspace = true, features = ["test-support"] }
workspace = { workspace = true, features = ["test-support"] }
http = { workspace = true, features = ["test-support"] }
Semantic Index (#10329) This introduces semantic indexing in Zed based on chunking text from files in the developer's workspace and creating vector embeddings using an embedding model. As part of this, we've created an embeddings provider trait that allows us to work with OpenAI, a local Ollama model, or a Zed hosted embedding. The semantic index is built by breaking down text for known (programming) languages into manageable chunks that are smaller than the max token size. Each chunk is then fed to a language model to create a high dimensional vector which is then normalized to a unit vector to allow fast comparison with other vectors with a simple dot product. Alongside the vector, we store the path of the file and the range within the document where the vector was sourced from. Zed will soon grok contextual similarity across different text snippets, allowing for natural language search beyond keyword matching. This is being put together both for human-based search as well as providing results to Large Language Models to allow them to refine how they help developers. Remaining todo: * [x] Change `provider` to `model` within the zed hosted embeddings database (as its currently a combo of the provider and the model in one name) Release Notes: - N/A --------- Co-authored-by: Nathan Sobo <nathan@zed.dev> Co-authored-by: Antonio Scandurra <me@as-cii.com> Co-authored-by: Conrad Irwin <conrad@zed.dev> Co-authored-by: Marshall Bowers <elliott.codes@gmail.com> Co-authored-by: Antonio <antonio@zed.dev> 2024-04-12 20:40:59 +03:00			`[package]`
			`name = "semantic_index"`
			`description = "Process, chunk, and embed text as vectors for semantic search."`
			`version = "0.1.0"`
			`edition = "2021"`
			`publish = false`
			`license = "GPL-3.0-or-later"`

Move `lints` section to the top of `Cargo.toml`, to match the others 2024-04-18 22:53:48 +03:00			`[lints]`
			`workspace = true`

Semantic Index (#10329) This introduces semantic indexing in Zed based on chunking text from files in the developer's workspace and creating vector embeddings using an embedding model. As part of this, we've created an embeddings provider trait that allows us to work with OpenAI, a local Ollama model, or a Zed hosted embedding. The semantic index is built by breaking down text for known (programming) languages into manageable chunks that are smaller than the max token size. Each chunk is then fed to a language model to create a high dimensional vector which is then normalized to a unit vector to allow fast comparison with other vectors with a simple dot product. Alongside the vector, we store the path of the file and the range within the document where the vector was sourced from. Zed will soon grok contextual similarity across different text snippets, allowing for natural language search beyond keyword matching. This is being put together both for human-based search as well as providing results to Large Language Models to allow them to refine how they help developers. Remaining todo: * [x] Change `provider` to `model` within the zed hosted embeddings database (as its currently a combo of the provider and the model in one name) Release Notes: - N/A --------- Co-authored-by: Nathan Sobo <nathan@zed.dev> Co-authored-by: Antonio Scandurra <me@as-cii.com> Co-authored-by: Conrad Irwin <conrad@zed.dev> Co-authored-by: Marshall Bowers <elliott.codes@gmail.com> Co-authored-by: Antonio <antonio@zed.dev> 2024-04-12 20:40:59 +03:00			`[lib]`
			`path = "src/semantic_index.rs"`

New revision of the Assistant Panel (#10870) This is a crate only addition of a new version of the AssistantPanel. We'll be putting this behind a feature flag while we iron out the new experience. Release Notes: - N/A --------- Co-authored-by: Nathan Sobo <nathan@zed.dev> Co-authored-by: Antonio Scandurra <me@as-cii.com> Co-authored-by: Conrad Irwin <conrad@zed.dev> Co-authored-by: Marshall Bowers <elliott.codes@gmail.com> Co-authored-by: Antonio Scandurra <antonio@zed.dev> Co-authored-by: Nate Butler <nate@zed.dev> Co-authored-by: Nate Butler <iamnbutler@gmail.com> Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com> Co-authored-by: Max <max@zed.dev> 2024-04-24 02:23:26 +03:00			`[[example]]`
			`name = "index"`
			`path = "examples/index.rs"`
			`crate-type = ["bin"]`

Semantic Index (#10329) This introduces semantic indexing in Zed based on chunking text from files in the developer's workspace and creating vector embeddings using an embedding model. As part of this, we've created an embeddings provider trait that allows us to work with OpenAI, a local Ollama model, or a Zed hosted embedding. The semantic index is built by breaking down text for known (programming) languages into manageable chunks that are smaller than the max token size. Each chunk is then fed to a language model to create a high dimensional vector which is then normalized to a unit vector to allow fast comparison with other vectors with a simple dot product. Alongside the vector, we store the path of the file and the range within the document where the vector was sourced from. Zed will soon grok contextual similarity across different text snippets, allowing for natural language search beyond keyword matching. This is being put together both for human-based search as well as providing results to Large Language Models to allow them to refine how they help developers. Remaining todo: * [x] Change `provider` to `model` within the zed hosted embeddings database (as its currently a combo of the provider and the model in one name) Release Notes: - N/A --------- Co-authored-by: Nathan Sobo <nathan@zed.dev> Co-authored-by: Antonio Scandurra <me@as-cii.com> Co-authored-by: Conrad Irwin <conrad@zed.dev> Co-authored-by: Marshall Bowers <elliott.codes@gmail.com> Co-authored-by: Antonio <antonio@zed.dev> 2024-04-12 20:40:59 +03:00			`[dependencies]`
			`anyhow.workspace = true`
			`client.workspace = true`
			`clock.workspace = true`
			`collections.workspace = true`
			`fs.workspace = true`
			`futures.workspace = true`
			`futures-batch.workspace = true`
			`gpui.workspace = true`
			`language.workspace = true`
			`log.workspace = true`
			`heed.workspace = true`
Extract `http` from `util` (#11680) This avoids the CLI linking libssl etc... Release Notes: - N/A 2024-05-11 00:50:20 +03:00			`http.workspace = true`
Semantic Index (#10329) This introduces semantic indexing in Zed based on chunking text from files in the developer's workspace and creating vector embeddings using an embedding model. As part of this, we've created an embeddings provider trait that allows us to work with OpenAI, a local Ollama model, or a Zed hosted embedding. The semantic index is built by breaking down text for known (programming) languages into manageable chunks that are smaller than the max token size. Each chunk is then fed to a language model to create a high dimensional vector which is then normalized to a unit vector to allow fast comparison with other vectors with a simple dot product. Alongside the vector, we store the path of the file and the range within the document where the vector was sourced from. Zed will soon grok contextual similarity across different text snippets, allowing for natural language search beyond keyword matching. This is being put together both for human-based search as well as providing results to Large Language Models to allow them to refine how they help developers. Remaining todo: * [x] Change `provider` to `model` within the zed hosted embeddings database (as its currently a combo of the provider and the model in one name) Release Notes: - N/A --------- Co-authored-by: Nathan Sobo <nathan@zed.dev> Co-authored-by: Antonio Scandurra <me@as-cii.com> Co-authored-by: Conrad Irwin <conrad@zed.dev> Co-authored-by: Marshall Bowers <elliott.codes@gmail.com> Co-authored-by: Antonio <antonio@zed.dev> 2024-04-12 20:40:59 +03:00			`open_ai.workspace = true`
Semantic index progress (#11071) Release Notes: - N/A --------- Co-authored-by: Antonio Scandurra <me@as-cii.com> Co-authored-by: Kyle <kylek@zed.dev> Co-authored-by: Marshall <marshall@zed.dev> Co-authored-by: Marshall Bowers <elliott.codes@gmail.com> 2024-04-27 03:06:05 +03:00			`parking_lot.workspace = true`
Semantic Index (#10329) This introduces semantic indexing in Zed based on chunking text from files in the developer's workspace and creating vector embeddings using an embedding model. As part of this, we've created an embeddings provider trait that allows us to work with OpenAI, a local Ollama model, or a Zed hosted embedding. The semantic index is built by breaking down text for known (programming) languages into manageable chunks that are smaller than the max token size. Each chunk is then fed to a language model to create a high dimensional vector which is then normalized to a unit vector to allow fast comparison with other vectors with a simple dot product. Alongside the vector, we store the path of the file and the range within the document where the vector was sourced from. Zed will soon grok contextual similarity across different text snippets, allowing for natural language search beyond keyword matching. This is being put together both for human-based search as well as providing results to Large Language Models to allow them to refine how they help developers. Remaining todo: * [x] Change `provider` to `model` within the zed hosted embeddings database (as its currently a combo of the provider and the model in one name) Release Notes: - N/A --------- Co-authored-by: Nathan Sobo <nathan@zed.dev> Co-authored-by: Antonio Scandurra <me@as-cii.com> Co-authored-by: Conrad Irwin <conrad@zed.dev> Co-authored-by: Marshall Bowers <elliott.codes@gmail.com> Co-authored-by: Antonio <antonio@zed.dev> 2024-04-12 20:40:59 +03:00			`project.workspace = true`
			`settings.workspace = true`
			`serde.workspace = true`
			`serde_json.workspace = true`
			`sha2.workspace = true`
			`smol.workspace = true`
More fixes to the semantic index's chunking (#11376) This fixes a tricky intermittent issue I was seeing, where failed to chunk certain files correctly because of the way we reuse Tree-sitter `Parser` instances across parses. I've also accounted for leading comments in chunk boundaries, so that items are grouped with their leading comments whenever possible when chunking. Finally, we've changed the `debug project index` action so that it opens a simple debug view in a pane, instead of printing paths to the console. This lets you click into a path and see how it was chunked. Release Notes: - N/A --------- Co-authored-by: Marshall <marshall@zed.dev> 2024-05-04 05:00:18 +03:00			`theme.workspace = true`
Use outline queries to chunk files syntactically (#11283) This chunking strategy uses the existing `outline` query to chunk files. We try to find chunk boundaries that are: * at starts or ends of lines * nested within as few outline items as possible Release Notes: - N/A 2024-05-02 22:28:21 +03:00			`tree-sitter.workspace = true`
More fixes to the semantic index's chunking (#11376) This fixes a tricky intermittent issue I was seeing, where failed to chunk certain files correctly because of the way we reuse Tree-sitter `Parser` instances across parses. I've also accounted for leading comments in chunk boundaries, so that items are grouped with their leading comments whenever possible when chunking. Finally, we've changed the `debug project index` action so that it opens a simple debug view in a pane, instead of printing paths to the console. This lets you click into a path and see how it was chunked. Release Notes: - N/A --------- Co-authored-by: Marshall <marshall@zed.dev> 2024-05-04 05:00:18 +03:00			`ui. workspace = true`
Semantic Index (#10329) This introduces semantic indexing in Zed based on chunking text from files in the developer's workspace and creating vector embeddings using an embedding model. As part of this, we've created an embeddings provider trait that allows us to work with OpenAI, a local Ollama model, or a Zed hosted embedding. The semantic index is built by breaking down text for known (programming) languages into manageable chunks that are smaller than the max token size. Each chunk is then fed to a language model to create a high dimensional vector which is then normalized to a unit vector to allow fast comparison with other vectors with a simple dot product. Alongside the vector, we store the path of the file and the range within the document where the vector was sourced from. Zed will soon grok contextual similarity across different text snippets, allowing for natural language search beyond keyword matching. This is being put together both for human-based search as well as providing results to Large Language Models to allow them to refine how they help developers. Remaining todo: * [x] Change `provider` to `model` within the zed hosted embeddings database (as its currently a combo of the provider and the model in one name) Release Notes: - N/A --------- Co-authored-by: Nathan Sobo <nathan@zed.dev> Co-authored-by: Antonio Scandurra <me@as-cii.com> Co-authored-by: Conrad Irwin <conrad@zed.dev> Co-authored-by: Marshall Bowers <elliott.codes@gmail.com> Co-authored-by: Antonio <antonio@zed.dev> 2024-04-12 20:40:59 +03:00			`util. workspace = true`
Use outline queries to chunk files syntactically (#11283) This chunking strategy uses the existing `outline` query to chunk files. We try to find chunk boundaries that are: * at starts or ends of lines * nested within as few outline items as possible Release Notes: - N/A 2024-05-02 22:28:21 +03:00			`unindent.workspace = true`
More fixes to the semantic index's chunking (#11376) This fixes a tricky intermittent issue I was seeing, where failed to chunk certain files correctly because of the way we reuse Tree-sitter `Parser` instances across parses. I've also accounted for leading comments in chunk boundaries, so that items are grouped with their leading comments whenever possible when chunking. Finally, we've changed the `debug project index` action so that it opens a simple debug view in a pane, instead of printing paths to the console. This lets you click into a path and see how it was chunked. Release Notes: - N/A --------- Co-authored-by: Marshall <marshall@zed.dev> 2024-05-04 05:00:18 +03:00			`workspace.workspace = true`
Semantic Index (#10329) This introduces semantic indexing in Zed based on chunking text from files in the developer's workspace and creating vector embeddings using an embedding model. As part of this, we've created an embeddings provider trait that allows us to work with OpenAI, a local Ollama model, or a Zed hosted embedding. The semantic index is built by breaking down text for known (programming) languages into manageable chunks that are smaller than the max token size. Each chunk is then fed to a language model to create a high dimensional vector which is then normalized to a unit vector to allow fast comparison with other vectors with a simple dot product. Alongside the vector, we store the path of the file and the range within the document where the vector was sourced from. Zed will soon grok contextual similarity across different text snippets, allowing for natural language search beyond keyword matching. This is being put together both for human-based search as well as providing results to Large Language Models to allow them to refine how they help developers. Remaining todo: * [x] Change `provider` to `model` within the zed hosted embeddings database (as its currently a combo of the provider and the model in one name) Release Notes: - N/A --------- Co-authored-by: Nathan Sobo <nathan@zed.dev> Co-authored-by: Antonio Scandurra <me@as-cii.com> Co-authored-by: Conrad Irwin <conrad@zed.dev> Co-authored-by: Marshall Bowers <elliott.codes@gmail.com> Co-authored-by: Antonio <antonio@zed.dev> 2024-04-12 20:40:59 +03:00			`worktree.workspace = true`

			`[dev-dependencies]`
			`env_logger.workspace = true`
			`client = { workspace = true, features = ["test-support"] }`
			`fs = { workspace = true, features = ["test-support"] }`
			`futures.workspace = true`
			`gpui = { workspace = true, features = ["test-support"] }`
			`language = { workspace = true, features = ["test-support"] }`
			`languages.workspace = true`
			`project = { workspace = true, features = ["test-support"] }`
			`tempfile.workspace = true`
			`util = { workspace = true, features = ["test-support"] }`
			`worktree = { workspace = true, features = ["test-support"] }`
More fixes to the semantic index's chunking (#11376) This fixes a tricky intermittent issue I was seeing, where failed to chunk certain files correctly because of the way we reuse Tree-sitter `Parser` instances across parses. I've also accounted for leading comments in chunk boundaries, so that items are grouped with their leading comments whenever possible when chunking. Finally, we've changed the `debug project index` action so that it opens a simple debug view in a pane, instead of printing paths to the console. This lets you click into a path and see how it was chunked. Release Notes: - N/A --------- Co-authored-by: Marshall <marshall@zed.dev> 2024-05-04 05:00:18 +03:00			`workspace = { workspace = true, features = ["test-support"] }`
Extract `http` from `util` (#11680) This avoids the CLI linking libssl etc... Release Notes: - N/A 2024-05-11 00:50:20 +03:00			`http = { workspace = true, features = ["test-support"] }`