
# Ollama

Ollama enables you to easily run large language models (LLMs) locally. It supports Llama 3, Mistral, Gemma and many others.

> ❄️ You can now perform LLM inference with Ollama in services-flake! https://t.co/rtHIYdnPfb
>
> — NixOS Asia (@nixos_asia), June 12, 2024

## Getting Started

```nix
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1".enable = true;
}
```
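
The snippet above goes inside a `process-compose` definition of a flake-parts flake. For orientation, here is a minimal sketch of the surrounding `flake.nix`; the output name `ollama-dev` and the pinned input branches are illustrative assumptions, not part of the module:

```nix
# flake.nix — a minimal sketch wiring services-flake into process-compose-flake.
# The output name "ollama-dev" and the input pins are illustrative assumptions.
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    flake-parts.url = "github:hercules-ci/flake-parts";
    process-compose-flake.url = "github:Platonic-Systems/process-compose-flake";
    services-flake.url = "github:juspay/services-flake";
  };
  outputs = inputs:
    inputs.flake-parts.lib.mkFlake { inherit inputs; } {
      systems = [ "x86_64-linux" "aarch64-darwin" ];
      imports = [ inputs.process-compose-flake.flakeModule ];
      perSystem = { ... }: {
        process-compose."ollama-dev" = {
          imports = [ inputs.services-flake.processComposeModules.default ];
          services.ollama."ollama1".enable = true;
        };
      };
    };
}
```

With this in place, `nix run .#ollama-dev` starts the Ollama server under process-compose.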

## Acceleration

By default, Ollama uses the CPU for inference. To enable GPU acceleration, set the `acceleration` option for your GPU:

### CUDA

For NVIDIA GPUs.

```nix
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "cuda";
  };
}
```
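
Note that the CUDA libraries in nixpkgs are unfree, so the package set evaluated by your flake will most likely need `allowUnfree`. One way to do that with flake-parts is sketched below; this is an assumption about your setup, not something the Ollama module configures for you:

```nix
# A sketch, placed inside the same flake as above where `inputs` is in scope:
# evaluate nixpkgs with unfree packages allowed so that the CUDA-enabled
# Ollama build can be used.
perSystem = { system, ... }: {
  _module.args.pkgs = import inputs.nixpkgs {
    inherit system;
    config.allowUnfree = true;
  };
};
```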

### ROCm

For Radeon GPUs.

```nix
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "rocm";
  };
}
```
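
Since `acceleration` is just a module option, its value can also be computed. Below is a hedged sketch that enables CUDA only on Linux hosts and otherwise keeps the default CPU backend; it assumes the option accepts `null` to mean "no explicit acceleration", mirroring the nixpkgs Ollama module:

```nix
# A sketch: choose the acceleration backend per platform.
# Assumption: `acceleration = null` falls back to the default (CPU) backend.
{ pkgs, ... }:
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = if pkgs.stdenv.isLinux then "cuda" else null;
  };
}
```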