
# Ollama

Ollama enables you to easily run large language models (LLMs) locally. It supports Llama 3, Mistral, Gemma and many others.

> ❄️ You can now perform LLM inference with Ollama in services-flake! https://t.co/rtHIYdnPfb
>
> — NixOS Asia (@nixos_asia), June 12, 2024

## Getting Started

```nix
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1".enable = true;
}
```
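
The snippet above goes inside a `process-compose` definition of a flake-parts flake. For orientation, here is a minimal sketch of the surrounding `flake.nix`; the output name `ollama-dev` and the pinned input branches are illustrative assumptions, not part of the module:

```nix
# flake.nix — a minimal sketch wiring services-flake into process-compose-flake.
# The output name "ollama-dev" and the input pins are illustrative assumptions.
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    flake-parts.url = "github:hercules-ci/flake-parts";
    process-compose-flake.url = "github:Platonic-Systems/process-compose-flake";
    services-flake.url = "github:juspay/services-flake";
  };
  outputs = inputs:
    inputs.flake-parts.lib.mkFlake { inherit inputs; } {
      systems = [ "x86_64-linux" "aarch64-darwin" ];
      imports = [ inputs.process-compose-flake.flakeModule ];
      perSystem = { ... }: {
        process-compose."ollama-dev" = {
          imports = [ inputs.services-flake.processComposeModules.default ];
          services.ollama."ollama1".enable = true;
        };
      };
    };
}
```

With this in place, `nix run .#ollama-dev` starts the Ollama server under process-compose.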

## Acceleration

By default, Ollama uses the CPU for inference. To enable GPU acceleration, set the `acceleration` option for your GPU:

### CUDA

For NVIDIA GPUs.

```nix
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "cuda";
  };
}
```
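
Note that the CUDA libraries in nixpkgs are unfree, so the package set evaluated by your flake will most likely need `allowUnfree`. One way to do that with flake-parts is sketched below; this is an assumption about your setup, not something the Ollama module configures for you:

```nix
# A sketch, placed inside the same flake as above where `inputs` is in scope:
# evaluate nixpkgs with unfree packages allowed so that the CUDA-enabled
# Ollama build can be used.
perSystem = { system, ... }: {
  _module.args.pkgs = import inputs.nixpkgs {
    inherit system;
    config.allowUnfree = true;
  };
};
```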

### ROCm

For Radeon GPUs.

```nix
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "rocm";
  };
}
```
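
Since `acceleration` is just a module option, its value can also be computed. Below is a hedged sketch that enables CUDA only on Linux hosts and otherwise keeps the default CPU backend; it assumes the option accepts `null` to mean "no explicit acceleration", mirroring the nixpkgs Ollama module:

```nix
# A sketch: choose the acceleration backend per platform.
# Assumption: `acceleration = null` falls back to the default (CPU) backend.
{ pkgs, ... }:
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = if pkgs.stdenv.isLinux then "cuda" else null;
  };
}
```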