### Pull Request Title
Introduce `max_output_tokens` Field for OpenAI Models
https://platform.deepseek.com/api-docs/news/news0725/#4-8k-max_tokens-betarelease-longer-possibilities
### Description
This commit introduces a new field `max_output_tokens` to the OpenAI
models, which allows specifying the maximum number of tokens that can be
generated in the output. This field is now integrated into the request
handling across multiple crates, ensuring that the output token limit is
respected during language model completions.
Changes include:
- Adding `max_output_tokens` to the `Custom` variant of the
`open_ai::Model` enum.
- Updating the `into_open_ai` method in `LanguageModelRequest` to accept
and use `max_output_tokens`.
- Modifying the `OpenAiLanguageModel` and `CloudLanguageModel`
implementations to pass `max_output_tokens` when converting requests.
- Ensuring that the `max_output_tokens` field is correctly serialized
and deserialized in relevant structures.
This enhancement provides more control over the output length of OpenAI
model responses, improving the flexibility and accuracy of language
model interactions.
### Changes
- Added `max_output_tokens` to the `Custom` variant of the
`open_ai::Model` enum.
- Updated the `into_open_ai` method in `LanguageModelRequest` to accept
and use `max_output_tokens`.
- Modified the `OpenAiLanguageModel` and `CloudLanguageModel`
implementations to pass `max_output_tokens` when converting requests.
- Ensured that the `max_output_tokens` field is correctly serialized and
deserialized in relevant structures.
### Related Issue
https://github.com/zed-industries/zed/pull/16358
### Screenshots / Media
N/A
### Checklist
- [x] Code compiles correctly.
- [x] All tests pass.
- [ ] Documentation has been updated accordingly.
- [ ] Additional tests have been added to cover new functionality.
- [ ] Relevant documentation has been updated or added.
### Release Notes
- Added `max_output_tokens` field to OpenAI models for controlling
output token length.
This PR is a refactor to pave the way for allowing the user to view and
edit workflow step resolutions. I've made tool calls work more like
normal streaming completions for all providers. The `use_any_tool`
method returns a stream of strings (which contain chunks of JSON). I've
also done some minor cleanup of language model providers in general,
removing the duplication around handling streaming responses.
Release Notes:
- N/A
This PR updates the LLM service to authorize access to language model
providers based on the requester's country.
We detect the country using Cloudflare's
[`CF-IPCountry`](https://developers.cloudflare.com/fundamentals/reference/http-request-headers/#cf-ipcountry)
header.
The country code is then checked against the list of supported countries
for the given LLM provider. Countries that are not supported will
receive an `HTTP 451: Unavailable For Legal Reasons` response.
Release Notes:
- N/A
- [x] OpenAI
- [ ] ~Google~ Moved into a separate branch at:
https://github.com/zed-industries/zed/tree/tool-calls-in-google-ai I've
ran into issues with having the API digest our schema without tripping
over itself - the function call parameters are malformed and whatnot. We
can resume from that branch if needed.
- [x] Ollama
- [x] Cloud
- [ ] ~Copilot Chat (?)~
Release Notes:
- Added tool calling capabilities to OpenAI and Ollama models.
In this pull request, we change the zed.dev protocol so that we pass the
raw JSON for the specified provider directly to our server. This avoids
the need to define a protobuf message that's a superset of all these
formats.
@bennetbo: We also changed the settings for available_models under
zed.dev to be a flat format, because the nesting seemed too confusing.
Can you help us upgrade the local provider configuration to be
consistent with this? We do whatever we need to do when parsing the
settings to make this simple for users, even if it's a bit more complex
on our end. We want to use versioning to avoid breaking existing users,
but need to keep making progress.
```json
"zed.dev": {
"available_models": [
{
"provider": "anthropic",
"name": "some-newly-released-model-we-havent-added",
"max_tokens": 200000
}
]
}
```
Release Notes:
- N/A
---------
Co-authored-by: Nathan <nathan@zed.dev>
This PR adds the missing workspace lint configuration for the following
crates that were missing it:
- `google_ai`
- `open_ai`
- `tab_switcher`
Release Notes:
- N/A
<img width="624" alt="image"
src="https://github.com/user-attachments/assets/f492b0bd-14c3-49e2-b2ff-dc78e52b0815">
- [x] Correctly set custom model token count
- [x] How to count tokens for Gemini models?
- [x] Feature flag zed.dev provider
- [x] Figure out how to configure custom models
- [ ] Update docs
Release Notes:
- Added support for quickly switching between multiple language model
providers in the assistant panel
---------
Co-authored-by: Antonio <antonio@zed.dev>
Here is an image of my now getting assistance responses!
![2024-07-03_08-45-37_swappy](https://github.com/zed-industries/zed/assets/20910163/904adc51-cb40-4622-878e-f679e0212426)
I ended up adding a function to handle the use case of not serializing
the tool_calls response if it is either null or empty to keep the
functionality of the existing implementation (not deserializing if vec
is empty). I'm sorta a noob, so happy to make changes if this isn't done
correctly, although it does work and it does pass tests!
Thanks a bunch to [amtoaer](https://github.com/amtoaer) for pointing me
in the direction on how to fix it.
Release Notes:
- Fixed some responses being dropped from OpenAI-compatible providers
([#13741](https://github.com/zed-industries/zed/issues/13741)).
This PR adds a setting to allow configuring the low-speed timeout for
the Assistant when using the OpenAI provider.
The `low_speed_timeout_in_seconds` accepts a number of seconds that the
HTTP client can go below a minimum speed limit (currently set to 100
bytes/second) before it times out.
```json
{
"assistant": {
"version": "1",
"provider": { "name": "openai", "low_speed_timeout_in_seconds": 60 }
},
}
```
This should help the case where the `openai` provider is being used with
a local model that requires higher timeouts.
Issue: https://github.com/zed-industries/zed/issues/9913
Release Notes:
- Added a `low_speed_timeout_in_seconds` setting to the Assistant's
OpenAI provider
([#9913](https://github.com/zed-industries/zed/issues/9913)).
This is a crate only addition of a new version of the AssistantPanel.
We'll be putting this behind a feature flag while we iron out the new
experience.
Release Notes:
- N/A
---------
Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Conrad Irwin <conrad@zed.dev>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
Co-authored-by: Antonio Scandurra <antonio@zed.dev>
Co-authored-by: Nate Butler <nate@zed.dev>
Co-authored-by: Nate Butler <iamnbutler@gmail.com>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Max <max@zed.dev>
This introduces semantic indexing in Zed based on chunking text from
files in the developer's workspace and creating vector embeddings using
an embedding model. As part of this, we've created an embeddings
provider trait that allows us to work with OpenAI, a local Ollama model,
or a Zed hosted embedding.
The semantic index is built by breaking down text for known
(programming) languages into manageable chunks that are smaller than the
max token size. Each chunk is then fed to a language model to create a
high dimensional vector which is then normalized to a unit vector to
allow fast comparison with other vectors with a simple dot product.
Alongside the vector, we store the path of the file and the range within
the document where the vector was sourced from.
Zed will soon grok contextual similarity across different text snippets,
allowing for natural language search beyond keyword matching. This is
being put together both for human-based search as well as providing
results to Large Language Models to allow them to refine how they help
developers.
Remaining todo:
* [x] Change `provider` to `model` within the zed hosted embeddings
database (as its currently a combo of the provider and the model in one
name)
Release Notes:
- N/A
---------
Co-authored-by: Nathan Sobo <nathan@zed.dev>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Conrad Irwin <conrad@zed.dev>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Antonio <antonio@zed.dev>
Resurrected this from some assistant work I did in Spring of 2023.
- [x] Resurrect streaming responses
- [x] Use streaming responses to enable AI via Zed's servers by default
(but preserve API key option for now)
- [x] Simplify protobuf
- [x] Proxy to OpenAI on zed.dev
- [x] Proxy to Gemini on zed.dev
- [x] Improve UX for switching between openAI and google models
- We current disallow cycling when setting a custom model, but we need a
better solution to keep OpenAI models available while testing the google
ones
- [x] Show remaining tokens correctly for Google models
- [x] Remove semantic index
- [x] Delete `ai` crate
- [x] Cloud front so we can ban abuse
- [x] Rate-limiting
- [x] Fix panic when using inline assistant
- [x] Double check the upgraded `AssistantSettings` are
backwards-compatible
- [x] Add hosted LLM interaction behind a `language-models` feature
flag.
Release Notes:
- We are temporarily removing the semantic index in order to redesign it
from scratch.
---------
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Thorsten <thorsten@zed.dev>
Co-authored-by: Max <max@zed.dev>