mirror of https://github.com/rui314/mold.git synced 2024-09-11 13:06:59 +03:00

Update README

Rui Ueyama 2020-10-21 14:55:50 +09:00
parent 2d49c77717
commit 8aadae6b35

README.md

@ -1,6 +1,123 @@
# mold: modern linker
# mold: A Modern Linker
mold is an experimental linker aiming to be almost as fast as the
"cat" command.
![mold image](mold.jpg)
This is a repository of a linker I'm currently developing as an
independent project for my master's degree.
My goal is to make a linker that is almost as fast as concatenating
object files with the `cat` command. Concretely speaking, I want to use the
linker to link a Chromium executable (about 1.8 GiB in size) in just 2
seconds. LLVM's lld, the fastest open-source linker, which I originally
created a few years ago, takes about 12 seconds to link Chromium on my
machine. So the goal is a 6x performance bump over lld. I don't know if
I can ever achieve that, but it's worth a try. I need to create
something anyway to earn units to graduate, and I want to (at least
try to) create something useful.
I have quite a few new ideas as to how to achieve that speedup, though
they are still just random unproved thoughts which need to be
implemented and tested with benchmarks. Here is a brain dump:
- In order to achieve `cat`-like performance, the most important
thing is to fix the layout of an output file as soon as possible, so
that we can start copying actual data from input object files to an
output executable/shared library file.
- The number of cores on a PC has increased a lot lately, and this
trend is expected to continue. We'll see many more cores on our
development machines. We should design the linker in such a way that
it scales well with many more cores. I have a 64-core/128-thread
machine, so my goal is to use that CPU nicely.
- Copying data from input files to an output file is I/O-bound, so
there should be room for doing computationally-intensive tasks while
copying data from one file to another.
- We should focus on shortening the link time for the usual
modify-build-debug development cycle because linker speed
matters most in that situation. It is OK to take a little more
time for a cold build because, in that case, the time taken by the
compiler is the dominant factor anyway.
- After the first invocation of the linker, the linker should not exit
but instead become a daemon to keep parsed input files in memory.
The daemonized linker keeps an eye on the build directories using
[inotify(2)](https://en.wikipedia.org/wiki/Inotify), and as soon as
a new file is created or an existing file is updated, it reloads the
file into memory.
- Daemonizing alone wouldn't make the linker magically faster. We need
to split the linker into two in such a way that the latter half of
the process finishes as quickly as possible by speculatively parsing
and preprocessing input files in the first half of the process. The
key to success would be to design nice data structures for
passing data between the first half and the second half that allow us
to offload as much processing as possible from the second to the
first half.
- One of the most time-consuming linker stages is symbol
resolution. To resolve symbols, we basically have to throw all
symbol strings into a hash table to match undefined symbols with
defined symbols. But with [string
interning](https://en.wikipedia.org/wiki/String_interning), this can
be done in the daemon rather than after the actual command line is
specified.
- Object files may contain a special section called a mergeable string
section. The section contains lots of null-terminated strings, and
the linker is expected to gather all mergeable string sections and
merge their contents. So, if two object files contain the same
string literal, for example, the resulting output will contain a
single merged string. This step is time-consuming, but string
merging can be done in the daemon using string interning.
- Static archives (.a files) contain object files, but a static
archive's symbol table contains only the defined symbols of its member
object files and lacks other types of symbols. That makes static
archives unsuitable for speculative parsing. The daemon should
ignore the symbol table of a static archive and directly read all
member object files to get the whole picture of all possible input
files.
- If there's a relocation that uses a symbol's GOT entry, then we have to
create a GOT entry for that symbol. Otherwise, we shouldn't. That
means we need to scan all relocation tables to fix the length and
the contents of a .got section. This is perhaps time-consuming, but
we can do that while copying data from input files to an output
file. After the data copy is done, we can attach a .got section at
the end of the output file.
- If we aim for the 2-second goal for Chromium, every millisecond
counts. We can't ignore the latency of process exit. If we mmap a
lot of files, \_exit(2) is not instantaneous but takes a few hundred
milliseconds because the kernel has to clean up a lot of
resources. As a workaround, we should organize the linker command as
two processes; the first process forks the second process, and the
second process does the actual work. As soon as the second process
writes the result file to the filesystem, it notifies the first process,
and the first process exits. The second process can take time to
exit, because it is not an interactive process.
- GNU ld, GNU gold and LLVM lld support essentially the same set of
command line options and features. mold doesn't have to be
completely compatible with them. As long as it can be used for
linking large user-land programs, I'm fine with that. It is OK to
leave some command line options unimplemented; if mold is blazingly
fast, other projects would still be happy to adopt it by modifying
their projects' build files.
- I don't want to support the linker script language in mold because
it's so complicated and inevitably slows down the linker. User-land
programs rarely use linker scripts, so it shouldn't be a roadblock
for most projects.
- [Intel Threading Building
Blocks](https://github.com/oneapi-src/oneTBB) (TBB) is a good
library for parallel execution and has several concurrent
containers. We are particularly interested in using
`parallel_for_each` and `concurrent_hash_map`.
- The output from the linker should be deterministic for the sake of
[build reproducibility](https://en.wikipedia.org/wiki/Reproducible_builds).
This might add a little bit of overhead to the linker, but that
shouldn't be too much.
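
To make the symbol-resolution bullet above more concrete, here is a minimal sketch of how string interning could move the string hashing into the daemon. It is written in Python purely for brevity, and `Interner`, `ParsedObject`, `preload` and `resolve` are hypothetical names invented for this illustration, not part of any existing linker. The assumption is that the daemon hashes every symbol name once while preloading files, so that the actual link only compares small integers.

```python
from dataclasses import dataclass


class Interner:
    """Maps each distinct string to a small integer ID, assigned on first sight."""

    def __init__(self):
        self._ids = {}    # symbol name -> ID (the only place strings are hashed)
        self.names = []   # ID -> symbol name

    def intern(self, name):
        sym_id = self._ids.get(name)
        if sym_id is None:
            sym_id = len(self.names)
            self._ids[name] = sym_id
            self.names.append(name)
        return sym_id


@dataclass
class ParsedObject:
    path: str
    defined: list     # interned IDs of symbols this file defines
    undefined: list   # interned IDs of symbols this file references


def preload(interner, path, defined, undefined):
    """Daemon phase: intern every symbol name before any link is requested."""
    return ParsedObject(
        path,
        [interner.intern(name) for name in defined],
        [interner.intern(name) for name in undefined],
    )


def resolve(interner, objects):
    """Link phase: match undefined symbols to definitions using IDs only."""
    definer = [None] * len(interner.names)   # indexed by symbol ID
    for obj in objects:
        for sym_id in obj.defined:
            if definer[sym_id] is None:      # first definition wins (a simplification)
                definer[sym_id] = obj.path
    unresolved = sorted(
        {sym_id for obj in objects for sym_id in obj.undefined
         if definer[sym_id] is None}
    )
    return definer, unresolved
```

For example, after `preload(interner, "a.o", ["main"], ["foo"])` and `preload(interner, "b.o", ["foo"], [])`, calling `resolve` finds `foo` in `b.o` by indexing a list rather than by hashing the string again.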
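
The mergeable-string bullet above can be illustrated with a similar sketch. This is only an illustration of the merging step under simplifying assumptions: each section is a plain `bytes` object made of null-terminated strings, alignment and suffix merging are ignored, and `split_strings` and `merge_string_sections` are hypothetical names. Besides the merged section, it returns a table that tells us where each input string ended up, which is what relocations pointing into those sections would need.

```python
def split_strings(section):
    """Yield (input_offset, string_including_terminator) for one section.

    Raises ValueError if the section does not end with a null byte.
    """
    offset = 0
    while offset < len(section):
        end = section.index(b"\0", offset) + 1
        yield offset, section[offset:end]
        offset = end


def merge_string_sections(sections):
    """Merge sections given as (file_name, bytes) pairs, in input order.

    Returns the merged section and a map from (file_name, input_offset)
    to the string's offset in the merged section.
    """
    out_offsets = {}      # string bytes -> offset in the merged section
    merged = bytearray()
    remap = {}
    for file_name, data in sections:
        for in_off, string in split_strings(data):
            out_off = out_offsets.get(string)
            if out_off is None:
                # First time we see this string: append it to the output.
                out_off = len(merged)
                out_offsets[string] = out_off
                merged += string
            remap[(file_name, in_off)] = out_off
    return bytes(merged), remap
```

Because the sections are processed in input order and each new string is appended at the end, the merged layout depends only on the inputs, not on timing.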
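
The GOT bullet above boils down to one pass over all relocation tables. Here is a rough sketch of that pass, assuming relocations have already been parsed into `(type, symbol)` pairs. The relocation type names are real x86-64 ones, but the set is deliberately incomplete, and `scan_got_relocs` is a hypothetical helper. In the design described above this scan would run while file contents are being copied; the sketch shows only the scan itself.

```python
# A deliberately incomplete set of x86-64 relocation types that need a GOT entry.
GOT_RELOC_TYPES = frozenset({
    "R_X86_64_GOTPCREL",
    "R_X86_64_GOTPCRELX",
    "R_X86_64_REX_GOTPCRELX",
})

GOT_ENTRY_SIZE = 8  # bytes per GOT entry, assuming a 64-bit target


def scan_got_relocs(objects):
    """Scan relocations given as (path, [(reloc_type, symbol), ...]) pairs.

    Returns a map from symbol name to its offset within .got, and the
    total size of the .got section in bytes.
    """
    slots = {}
    for _path, relocs in objects:
        for reloc_type, symbol in relocs:
            # Only GOT-using relocations create an entry, once per symbol.
            if reloc_type in GOT_RELOC_TYPES and symbol not in slots:
                slots[symbol] = len(slots) * GOT_ENTRY_SIZE
    return slots, len(slots) * GOT_ENTRY_SIZE
```

Only after this pass finishes are the length and contents of `.got` known, which is why the section would be attached at the end of the output file.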
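
The two-process trick from the process-exit bullet above could look roughly like the following on a POSIX system. This is a sketch under assumptions, not a finished design: `link_in_child` and the `do_link` callback are hypothetical, and a pipe is just one possible way for the second process to notify the first.

```python
import os


def link_in_child(output_path, do_link):
    """Fork a worker, wait only until it reports that the output is written.

    Returns True if the worker signalled success. The caller (the first
    process) can then exit immediately; the worker's slow teardown no
    longer delays the user.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Second process: does the actual work.
        status = 1
        try:
            os.close(read_fd)
            do_link(output_path)
            os.write(write_fd, b"\x01")   # notify: the result file is ready
            status = 0
            # Any slow cleanup (unmapping files, etc.) would happen from here on.
        finally:
            os._exit(status)              # never return into the caller's code

    # First process: block until the worker writes one byte or dies.
    os.close(write_fd)
    ok = os.read(read_fd, 1) == b"\x01"   # b"" means the worker exited early
    os.close(read_fd)
    return ok
```

The first process deliberately does not wait for the worker to exit; it only waits for the single notification byte, so the user-visible latency ends as soon as the output file is complete.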