docs: restructure + add nixpkgs-improvements.md

DavHau 2021-09-23 18:51:55 +01:00
parent 9692a6dde1
commit d6fc258333
3 changed files with 257 additions and 182 deletions

193
README.md

@ -1,203 +1,32 @@
## [WIP] dream2nix - A generic framework for 2nix tools
dream2nix is a generic framework for 2nix tools (tools translating from other build systems to nix).
dream2nix is a generic framework for 2nix converters (converting from other build systems to nix).
It focuses on the following aspects:
- Modularity
- Customizability
- Maintainability
- Nixpkgs Compatibility (not enforcing IFD)
- Code de-duplication across 2nix tools
- Code de-duplication across 2nix converters
- Code de-duplication in nixpkgs
- Risk free opt-in FOD fetching (no reproducibility issues)
- Common UI across 2nix tools
- Common UI across 2nix converters
- Reduce effort to develop new 2nix solutions
- Exploration and adoption of new nix features
- Simplified updating of packages
### Motivation
2nix tools, or in other words, tools converting instructions of other build systems to nix build instructions, are an important part of the nix/nixos ecosystem. These tools make packaging workflows easier and often make it possible to manage complexity that would otherwise be hard or impossible to handle.
2nix converters, or in other words, tools converting instructions of other build systems to nix build instructions, are an important part of the nix/nixos ecosystem. These converters make packaging workflows easier and often make it possible to manage complexity that would otherwise be hard or impossible to handle.
Yet the current landscape of 2nix tools has certain weaknesses. Existing 2nix tools are very monolithic. Authors of these tools are often motivated by some specific use case and therefore the individual approaches are strongly biased and not flexible. All existing tools have quite different user interfaces and use different strategies for parsing, resolving, fetching and building, with significantly different options for customization. As a user of these tools it often feels like there is some part of the tool that suits the needs well, but at the same time it has undesirable hard-coded behaviour. Often one would like to use some aspect of one tool combined with some aspect of another tool. One tool, for example, might do a good job of reading a specific lock file format, but lack customizability for building. Another tool might come with a good customization interface, but be unable to parse the lock file format. Some tools are restricted to use IFD or FOD, while others enforce code generation.
Yet the current landscape of 2nix converters has certain weaknesses. Existing 2nix converters are very monolithic. Authors of these converters are often motivated by some specific use case and therefore the individual approaches are strongly biased and not flexible. All existing converters have quite different user interfaces and use different strategies for parsing, resolving, fetching and building, with significantly different options for customization. As a user of these converters it often feels like there is some part of a converter that suits the needs well, but at the same time it has undesirable hard-coded behaviour. Often one would like to use some aspect of one converter combined with some aspect of another converter. One converter might do a good job of reading a specific lock file format, but lack customizability for building. Another converter might come with a good customization interface, but be unable to parse the lock file format. Some tools are restricted to use IFD or FOD, while others enforce code generation.
The idea of this project is therefore to create a standardized, generic, modular framework for 2nix solutions, aiming for better flexibility, maintainability and usability.
Ideally this repository will become a hub for re-usable nix code delivered with a nice UI to allow for simpler, more automated packaging.
The plan is to integrate many existing 2nix converters into this framework, thereby improving many of the aspects named above and providing a unified UI for all 2nix solutions.
### Further Reading
- [Summary of the core concepts and benefits](/docs/concepts-and-benefits.md)
- [How would this improve the packaging situation in nixpkgs](/docs/nixpkgs-improvements.md)
### Modularity:
The following phases, which are generic to basically all existing 2nix solutions:
- parsing project metadata
- resolving/locking dependencies (not always required)
- fetching sources
- building/installing packages
... should be separated from each other with well-defined interfaces.
This will allow for free composition of different approaches for these phases.
The user should be able to freely choose between:
- input metadata formats (e.g. lock file formats)
- metadata fetching/translation strategies: IFD vs. in-tree
- source fetching strategies: granular fetching vs fetching via single large FOD to minimize expression file size
- installation strategies: build dependencies individually vs inside a single derivation.
### Customizability
Every phase mentioned in the previous section should be customizable to a high degree via override functions. Practical examples:
- Inject extra requirements/dependencies
- fetch sources from alternative locations
- replace or modify sources
- customize the build/installation procedure
### Maintainability
Due to the modular architecture with strict interfaces, contributors can more easily add support for new lock-file formats or new strategies for fetching, building and installing.
### Compatibility
Depending on where the nix code is used, different approaches are desired or discouraged. While IFD might be desired for some out of tree projects to achieve simplified UX, it is strictly prohibited in nixpkgs due to nix/hydra limitations.
All solutions which follow the dream2nix specification will be compatible with both approaches without having to re-invent the tool.
### Code de-duplication
Common problems that apply to many 2nix solutions can be solved once by the framework. Examples:
- handling cyclic dependencies
- handling sources from various origins (http, git, local, ...)
- generate nixpkgs/hydra friendly output (no IFD)
- good user interface
### Code de-duplication in nixpkgs
Essential components like package update scripts or fetching and override logic are provided by the dream2nix framework and are stored only once in the source tree instead of several times.
### Risk free opt-in FOD fetching
Optionally, to save more storage space, individual hashes for sources can be omitted and a single large FOD used instead.
Due to a unified minimalistic fetching layer, the risk of FOD hash breakages should be very low.
### Common UI across many 2nix solutions
2nix solutions which follow the dream2nix framework will have a unified UI for workflows like project initialization or code generation. This will allow quicker onboarding of new users by providing familiar workflows across different build systems.
### Reduced effort to develop new 2nix solutions
Since the framework already solves common problems and provides an interface for integrating new build systems, developers will have an easier time creating their next 2nix solution.
### Architecture
The general architecture should consist of these components:
`Input -> Translation -> Generic Lock -> Fetching -> Building`
```
┌───────┐
│ Input │◄── Arbitrary
└────┬──┘                  URLs + Metadata containing Build instructions
     │   ┌──────────┐      in standardized minimalistic form (json)
     └──►│Translator│      │
         └───────┬──┘      ▼
    ▲            │   ┌────────────┐
    │            └──►│Generic Lock│
                     └─────────┬──┘
 impure/pure                   │   ┌────────┐
 online/offline                ├──►│Fetcher │◄── Same across all
 pure-nix/IFD/external         │   └────────┘    languages/frameworks
                               │        ▼
                               │   ┌────────┐
                               └──►│Builder │◄── Reads extra metadata
                                   └────────┘    from generic lock
```
Input:
- can consist of:
  - requirement constraints
  - requirement files
  - lock-files
  - project's source tree

Translator:
- read input and generate generic lock format containing:
  - URLs + hashes of sources
  - metadata for building
- different strategies can be used:
  - `pure-nix`: translate input by using the nix language only
  - `IFD/recursive`: translate using a nix build
  - `external`: translate using an external tool which resolves against an online package index
- for more information about translators and how nixpkgs compatibility is guaranteed, check [docs/translators.md](/docs/translators.md)

Generic Lock (standardized format):
- Produced by `Translator`. Contains URLs + hashes for sources and metadata relevant for building.
- The contained format for sources and dependency relations is independent of the build system. Fetching always works the same way.
- The metadata also contains build system specific attributes as individual approaches are required here. A specific builder for the individual build system will later read this metadata and transform it into nix derivations.
- It is not relevant which steps/strategies have been taken to create this lock. From this point on, there are no impurities. This format will contain everything necessary for a fully reproducible build.
- This format can always be put into nixpkgs, not requiring any IFD (given the nix code for the builder exists within nixpkgs).
- In case of a pure-nix translator, the generic lock data can be generated on the fly and passed directly to the builder, preventing unnecessary usage of IFD.

Fetcher:
- Since a generic lock was produced in the previous step, the fetching layer can be the same across all build systems.

Builder:
- Receives sources from fetcher and metadata produced by the translator.
- The builder transforms the metadata into nix derivation(s).
- Strictly separating the builder from previous phases allows:
  - switching between different build strategies or upgrading the builder without having to re-run the translator each time.
  - reducing code duplication if a project contains multiple packages built via dream2nix.
### Example (walk through the phases)
#### python project with poetry.lock
As an example we package a python project that uses poetry for dependency management.
Poetry uses `pyproject.toml` and `poetry.lock` to lock dependencies.
- Input: pyproject.toml, poetry.lock (toml)
- Translator: written in pure nix, reading the toml input and generating the generic lock format
- Generic Lock (for explanatory purposes dumped to json and commented):
```json5
{
  // generic lock format version
  "version": 1,
  // format for sources is always the same (not specific to python)
  "sources": {
    "requests": {
      "type": "tarball",
      "url": "https://download.pypi.org/requests/2.28.0",
      "hash": "deadbeefdeadbeefdeadbeefdeadbeefdeadbeef",
    },
    "certifi": {
      "type": "github",
      "owner": "certifi",
      "repo": "python-certifi",
      "hash": "deadbeefdeadbeefdeadbeefdeadbeefdeadbeef"
    }
  },
  // generic metadata (not specific to python)
  "generic": {
    // this indicates which builder must be used
    "buildSystem": "python",
    // translator which generated this file
    // (not relevant for building)
    "producedBy": "translator-poetry-1",
    // dependency graph of the packages
    "dependencyGraph": {
      "requests": [
        "certifi"
      ]
    }
  },
  // all fields inside 'buildSystem' are specific to
  // the selected buildSystem (python)
  "buildSystem": {
    // tell the python builder how the inputs must be handled
    "sourceFormats": {
      "requests": "sdist", // triggers build instructions for sdist
      "certifi": "wheel" // triggers build instructions for wheel
    }
  }
}
```
- This lock data can now either:
  - be dumped to a .json file and committed to a repo
  - be passed directly to the fetching/building layer
- the fetcher will only read the sources section and translate it to standard fetcher calls.
- the building layer will read the "buildSystem" attribute and select the python builder for building.
- the python builder will read all information from "buildSystem" and translate the data to a final derivation.
Notes on IFD, FOD and code generation:
- No matter which type of translator is used, it is always possible to export the generic lock to a file, which can later be evaluated without using IFD or FOD, similar to current nix code generators, just with a standardized format.
- If the translator supports IFD or is written in pure nix, the user can optionally skip exporting the generic lock and instead evaluate everything on the fly.


@ -0,0 +1,176 @@
### Modularity:
The following phases, which are generic to basically all existing 2nix solutions:
- parsing project metadata
- resolving/locking dependencies (not always required)
- fetching sources
- building/installing packages
... should be separated from each other with well-defined interfaces.
This will allow for free composition of different approaches for these phases.
The user should be able to freely choose between:
- input metadata formats (e.g. lock file formats)
- metadata fetching/translation strategies: IFD vs. in-tree
- source fetching strategies: granular fetching vs fetching via single large FOD to minimize expression file size
- installation strategies: build dependencies individually vs inside a single derivation.
### Customizability
Every phase mentioned in the previous section should be customizable to a high degree via override functions. Practical examples:
- Inject extra requirements/dependencies
- fetch sources from alternative locations
- replace or modify sources
- customize the build/installation procedure
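As a rough illustration of the override idea, the sketch below models a package description as a plain dictionary and an override as a function from one description to another. The field names (`source`, `dependencies`) and the helper names are assumptions made for this example only; they are not the dream2nix interface.

```python
from copy import deepcopy
from typing import Any, Callable, Dict, List

# Hypothetical shape of a per-package description; field names are
# illustrative and not taken from dream2nix.
BuildSpec = Dict[str, Any]
Override = Callable[[BuildSpec], BuildSpec]


def apply_overrides(spec: BuildSpec, overrides: List[Override]) -> BuildSpec:
    """Apply each override in order to a copy, leaving the input untouched."""
    result = deepcopy(spec)
    for override in overrides:
        result = override(result)
    return result


def add_dependency(name: str) -> Override:
    """Override that injects an extra requirement/dependency."""
    def override(spec: BuildSpec) -> BuildSpec:
        return {**spec, "dependencies": [*spec.get("dependencies", []), name]}
    return override


def replace_source_url(url: str) -> Override:
    """Override that fetches the source from an alternative location."""
    def override(spec: BuildSpec) -> BuildSpec:
        return {**spec, "source": {**spec["source"], "url": url}}
    return override


base = {
    "name": "requests",
    "source": {"type": "tarball", "url": "https://example.org/requests.tar.gz"},
    "dependencies": ["certifi"],
}
customized = apply_overrides(
    base,
    [
        add_dependency("idna"),
        replace_source_url("https://mirror.example.org/requests.tar.gz"),
    ],
)
```

Because every override has the same shape, overrides for different phases can be combined freely and applied in a predictable order.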
### Maintainability
Due to the modular architecture with strict interfaces, contributors can more easily add support for new lock-file formats or new strategies for fetching, building and installing.
### Compatibility
Depending on where the nix code is used, different approaches are desired or discouraged. While IFD might be desired for some out of tree projects to achieve simplified UX, it is strictly prohibited in nixpkgs due to nix/hydra limitations.
All solutions which follow the dream2nix specification will be compatible with both approaches without having to re-invent the tool.
### Code de-duplication
Common problems that apply to many 2nix solutions can be solved once by the framework. Examples:
- handling cyclic dependencies
- handling sources from various origins (http, git, local, ...)
- generate nixpkgs/hydra friendly output (no IFD)
- good user interface
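To make the first item concrete: before cyclic dependencies can be handled, they have to be found. The following is a minimal sketch of cycle detection by depth-first search over a dependency graph given as a name-to-dependencies mapping. It only shows the detection step; how dream2nix actually resolves cycles is not described here, and the function name is hypothetical.

```python
from typing import Dict, List, Optional

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of names, or None if acyclic."""
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: a cycle is closed
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


acyclic = {"requests": ["certifi"], "certifi": []}
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}
```

The recursive form is fine for small graphs; a very deep dependency chain would need an explicit stack instead.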
### Code de-duplication in nixpkgs
Essential components like package update scripts or fetching and override logic are provided by the dream2nix framework and are stored only once in the source tree instead of several times.
### Risk free opt-in FOD fetching
Optionally, to save more storage space, individual hashes for sources can be omitted and a single large FOD used instead.
Due to a unified minimalistic fetching layer, the risk of FOD hash breakages should be very low.
### Common UI across many 2nix solutions
2nix solutions which follow the dream2nix framework will have a unified UI for workflows like project initialization or code generation. This will allow quicker onboarding of new users by providing familiar workflows across different build systems.
### Reduced effort to develop new 2nix solutions
Since the framework already solves common problems and provides an interface for integrating new build systems, developers will have an easier time creating their next 2nix solution.
### Architecture
The general architecture should consist of these components:
`Input -> Translation -> Generic Lock -> Fetching -> Building`
```
┌───────┐
│ Input │◄── Arbitrary
└────┬──┘                  URLs + Metadata containing Build instructions
     │   ┌──────────┐      in standardized minimalistic form (json)
     └──►│Translator│      │
         └───────┬──┘      ▼
    ▲            │   ┌────────────┐
    │            └──►│Generic Lock│
                     └─────────┬──┘
 impure/pure                   │   ┌────────┐
 online/offline                ├──►│Fetcher │◄── Same across all
 pure-nix/IFD/external         │   └────────┘    languages/frameworks
                               │        ▼
                               │   ┌────────┐
                               └──►│Builder │◄── Reads extra metadata
                                   └────────┘    from generic lock
```
Input:
- can consist of:
  - requirement constraints
  - requirement files
  - lock-files
  - project's source tree

Translator:
- read input and generate generic lock format containing:
  - URLs + hashes of sources
  - metadata for building
- different strategies can be used:
  - `pure-nix`: translate input by using the nix language only
  - `IFD/recursive`: translate using a nix build
  - `external`: translate using an external tool which resolves against an online package index
- for more information about translators and how nixpkgs compatibility is guaranteed, check [./translators.md](/docs/translators.md)

Generic Lock (standardized format):
- Produced by `Translator`. Contains URLs + hashes for sources and metadata relevant for building.
- The contained format for sources and dependency relations is independent of the build system. Fetching always works the same way.
- The metadata also contains build system specific attributes as individual approaches are required here. A specific builder for the individual build system will later read this metadata and transform it into nix derivations.
- It is not relevant which steps/strategies have been taken to create this lock. From this point on, there are no impurities. This format will contain everything necessary for a fully reproducible build.
- This format can always be put into nixpkgs, not requiring any IFD (given the nix code for the builder exists within nixpkgs).
- In case of a pure-nix translator, the generic lock data can be generated on the fly and passed directly to the builder, preventing unnecessary usage of IFD.

Fetcher:
- Since a generic lock was produced in the previous step, the fetching layer can be the same across all build systems.

Builder:
- Receives sources from fetcher and metadata produced by the translator.
- The builder transforms the metadata into nix derivation(s).
- Strictly separating the builder from previous phases allows:
  - switching between different build strategies or upgrading the builder without having to re-run the translator each time.
  - reducing code duplication if a project contains multiple packages built via dream2nix.
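The separation of phases described above can be sketched as three interchangeable functions wired together by a small driver. This is an illustration only: the function names, the `url#hash` input format and the fake store paths are assumptions invented for the example, and nothing is actually downloaded or built.

```python
from typing import Any, Callable, Dict

GenericLock = Dict[str, Any]
Translator = Callable[[Dict[str, str]], GenericLock]
Fetcher = Callable[[Dict[str, str]], str]
Builder = Callable[[GenericLock, Dict[str, str]], Dict[str, Any]]


def run_pipeline(
    project_input: Dict[str, str],
    translate: Translator,
    fetch: Fetcher,
    build: Builder,
) -> Dict[str, Any]:
    """Input -> Translation -> Generic Lock -> Fetching -> Building."""
    lock = translate(project_input)
    fetched = {name: fetch(src) for name, src in lock["sources"].items()}
    return build(lock, fetched)


def translate_pinned_urls(project_input: Dict[str, str]) -> GenericLock:
    """Toy translator: the input maps a package name to 'url#hash'."""
    sources = {}
    for name, pinned in project_input.items():
        url, _, digest = pinned.partition("#")
        sources[name] = {"type": "tarball", "url": url, "hash": digest}
    return {
        "version": 1,
        "sources": sources,
        "generic": {"buildSystem": "python"},
    }


def fetch_placeholder(source: Dict[str, str]) -> str:
    """Stand-in fetcher: returns a fake path instead of downloading."""
    return f"/fake-store/{source['hash']}"


def build_placeholder(lock: GenericLock, fetched: Dict[str, str]) -> Dict[str, Any]:
    """Stand-in builder: records which build system and inputs it received."""
    return {"buildSystem": lock["generic"]["buildSystem"], "inputs": fetched}


result = run_pipeline(
    {"requests": "https://example.org/requests.tar.gz#abc123"},
    translate_pinned_urls,
    fetch_placeholder,
    build_placeholder,
)
```

Swapping in a different translator (for another lock file format) leaves the fetcher and builder untouched, which is the point of routing everything through the generic lock.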
### Example (walk through the phases)
#### python project with poetry.lock
As an example we package a python project that uses poetry for dependency management.
Poetry uses `pyproject.toml` and `poetry.lock` to lock dependencies.
- Input: pyproject.toml, poetry.lock (toml)
- Translator: written in pure nix, reading the toml input and generating the generic lock format
- Generic Lock (for explanatory purposes dumped to json and commented):
```json5
{
  // generic lock format version
  "version": 1,
  // format for sources is always the same (not specific to python)
  "sources": {
    "requests": {
      "type": "tarball",
      "url": "https://download.pypi.org/requests/2.28.0",
      "hash": "deadbeefdeadbeefdeadbeefdeadbeefdeadbeef",
    },
    "certifi": {
      "type": "github",
      "owner": "certifi",
      "repo": "python-certifi",
      "hash": "deadbeefdeadbeefdeadbeefdeadbeefdeadbeef"
    }
  },
  // generic metadata (not specific to python)
  "generic": {
    // this indicates which builder must be used
    "buildSystem": "python",
    // translator which generated this file
    // (not relevant for building)
    "producedBy": "translator-poetry-1",
    // dependency graph of the packages
    "dependencyGraph": {
      "requests": [
        "certifi"
      ]
    }
  },
  // all fields inside 'buildSystem' are specific to
  // the selected buildSystem (python)
  "buildSystem": {
    // tell the python builder how the inputs must be handled
    "sourceFormats": {
      "requests": "sdist", // triggers build instructions for sdist
      "certifi": "wheel" // triggers build instructions for wheel
    }
  }
}
```
- This lock data can now either:
  - be dumped to a .json file and committed to a repo
  - be passed directly to the fetching/building layer
- the fetcher will only read the sources section and translate it to standard fetcher calls.
- the building layer will read the "buildSystem" attribute and select the python builder for building.
- the python builder will read all information from "buildSystem" and translate the data to a final derivation.
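The three steps above can be sketched against the example lock. In this illustration the fetcher only looks at `sources` and dispatches on `type`, while the builder is selected through `generic.buildSystem`. The fetch-plan tuples, the `BUILDERS` registry and all function names are hypothetical stand-ins, not dream2nix code.

```python
from typing import Any, Callable, Dict, Tuple

HASH = "deadbeef" * 5  # placeholder hash, as in the example lock

lock: Dict[str, Any] = {
    "version": 1,
    "sources": {
        "requests": {
            "type": "tarball",
            "url": "https://download.pypi.org/requests/2.28.0",
            "hash": HASH,
        },
        "certifi": {
            "type": "github",
            "owner": "certifi",
            "repo": "python-certifi",
            "hash": HASH,
        },
    },
    "generic": {
        "buildSystem": "python",
        "producedBy": "translator-poetry-1",
        "dependencyGraph": {"requests": ["certifi"]},
    },
    "buildSystem": {"sourceFormats": {"requests": "sdist", "certifi": "wheel"}},
}

FetchPlan = Tuple[str, str, str]


def plan_fetch(source: Dict[str, str]) -> FetchPlan:
    """Translate one generic source entry into a (hypothetical) fetcher call."""
    kind = source["type"]
    if kind == "tarball":
        return ("fetch-tarball", source["url"], source["hash"])
    if kind == "github":
        return ("fetch-github", f"{source['owner']}/{source['repo']}", source["hash"])
    raise ValueError(f"unsupported source type: {kind}")


def python_builder(lock: Dict[str, Any], plans: Dict[str, FetchPlan]) -> Dict[str, Any]:
    """Read the python-specific section and describe one build per package."""
    formats = lock["buildSystem"]["sourceFormats"]
    return {name: {"format": fmt, "fetch": plans[name]} for name, fmt in formats.items()}


# Registry of builders, keyed by the value of generic.buildSystem.
BUILDERS: Dict[str, Callable[..., Dict[str, Any]]] = {"python": python_builder}

fetch_plan = {name: plan_fetch(src) for name, src in lock["sources"].items()}
builder = BUILDERS[lock["generic"]["buildSystem"]]
derivations = builder(lock, fetch_plan)
```

Note that `plan_fetch` never touches the `buildSystem` section, which is what allows the same fetching layer to serve every build system.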
Notes on IFD, FOD and code generation:
- No matter which type of translator is used, it is always possible to export the generic lock to a file, which can later be evaluated without using IFD or FOD, similar to current nix code generators, just with a standardized format.
- If the translator supports IFD or is written in pure nix, the user can optionally skip exporting the generic lock and instead evaluate everything on the fly.


@ -0,0 +1,70 @@
## List of problems which currently exist in nixpkgs
### Generated Code Size/Duplication:
- large .nix files containing auto generated code for fetching sources (example: nodejs)
- many duplicated .nix files containing build logic
### Update Scripts Duplication/Complexity:
- update scripts are largely duplicated
- update scripts are complex
### Fetching / Caching issues (large FODs):
- non-reproducible large FOD fetchers (example: rust)
- updating FODs is not risk free (it is easy to forget to update the hash)
- bad caching properties due to large FODs
### Update Workflows:
- package update workflows can be complicated
- package update workflows vary significantly depending on the language/framework
### Merge Conflicts for shared dependencies:
- Due to badly organized shared dependencies, merge conflicts are likely (example: global node-packages.nix)
### Customizability / Overriding:
- Capabilities vary depending on the underlying solution.
- UI is different depending on the underlying solution.
### Inefficient/Slow Innovation
- Design issues (FOD-impurity, Merge Conflicts, etc.) cannot be fixed easily and lead to long term suffering of maintainers.
- Innovation often happens in individual solutions and is not adopted ecosystem-wide.
- New nix features will not be easily adopted, as this requires updating many individual solutions.
---
## How dream2nix intends to fix these issues
### Generated Code Size/Duplication:
- dream2nix minimizes the amount of generated nix code, as all the fetch/build logic resides in the framework and therefore is not duplicated across packages.
- If the upstream lock file format can be interpreted with pure nix, then generating intermediary code can be omitted if the upstream lock file is stored instead.
- Once any kind of recursive nix (IFD, recursive-nix, RFC-92) is enabled in nixpkgs, dream2nix will utilize it and eliminate the requirement of generating nix code or storing upstream lock files.
### Update Scripts Duplication/Complexity:
- storing `update.sh` scripts alongside packages will not be necessary anymore. dream2nix can generate update procedures on the fly by reading the package declaration.
- The UI for updating packages is the same across all languages/frameworks
### Fetching / Caching issues:
- dream2nix's upstream metadata translators always produce a clear list of URLs to fetch
- large-FOD fetching is not necessary and never enforced
- large-FOD fetching can be used optionally (to reduce amount of hashes to be stored)
- even if large-FOD fetching is used, it won't have any of the known reproducibility issues, since dream2nix never uses the upstream toolchain for fetching, and potentially impure operations like dependency resolution are never done inside an FOD.
- updating hashes of FODs is done via dream2nix CLI, which ensures that the correct hashes are in place
- As the use of large-FOD fetching is not necessary and therefore minimized, dependencies are cached on an individual basis and shared between packages.
### Update Workflows:
The workflow for updating packages will be unified and largely independent of the underlying language/framework.
### Merge Conflicts for shared dependencies:
- Having a central set of shared dependencies can make sense to reduce the code size of nixpkgs and load on the cache.
- To eliminate merge conflicts, the global package set can be maintained via a two-stage process. Individual package maintainers can manage their dependencies independently. Once every staging cycle, common dependencies can be found via graph analysis and moved into a global package set.
- The total amount of dependency versions used can also be minimized by re-running the resolver on individual packages, prioritizing dependencies from the global set of common packages.
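One simple reading of the "graph analysis" step is to count how many packages use each pinned dependency and promote those shared by several packages. The sketch below assumes each package's resolved dependencies are available as a set of `name@version` strings; that representation, the threshold and the function name are assumptions for illustration, not a description of an existing dream2nix feature.

```python
from collections import Counter
from typing import Dict, List, Set


def common_dependencies(
    package_deps: Dict[str, Set[str]], min_users: int = 2
) -> List[str]:
    """Return pinned dependencies used by at least `min_users` packages."""
    usage: Counter = Counter()
    for deps in package_deps.values():
        usage.update(deps)  # each package counts once per dependency
    return sorted(dep for dep, count in usage.items() if count >= min_users)


# Hypothetical per-package dependency sets, managed independently.
packages = {
    "app-a": {"libfoo@1.2.0", "libbar@0.9.1"},
    "app-b": {"libfoo@1.2.0"},
    "app-c": {"libfoo@1.1.0", "libbar@0.9.1"},
}
shared = common_dependencies(packages)
```

Here `libfoo@1.1.0` stays local to `app-c`; re-running the resolver with a preference for the shared set, as described in the last bullet, could move `app-c` onto `libfoo@1.2.0` if its constraints allow it.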
### Customizability / Overriding:
dream2nix provides good interfaces for customization, unified as far as possible and independent of the underlying build systems.
### Inefficient/Slow Innovation
- Since dream2nix centrally handles many core elements of packaging, like different strategies for fetching and building, it is much easier to fix problems at large scale and apply innovations to all underlying build systems at once.
- Experimenting with and adding support for new nix features will be easier as the framework offers better abstractions than existing 2nix converters and allows adding/modifying strategies more easily.