2020-10-21 08:55:50 +03:00
|
|
|
# mold: A Modern Linker
|
2020-10-21 06:52:11 +03:00
|
|
|
|
2020-10-21 08:55:50 +03:00
|
|
|
![mold image](mold.jpg)
|
2020-10-21 06:52:11 +03:00
|
|
|
|
2021-01-16 11:27:06 +03:00
|
|
|
This is a repository of a linker I'm currently developing as a
|
|
|
|
replacement for existing Unix linkers such as GNU BFD, GNU gold or
|
|
|
|
LLVM lld.
|
2020-10-21 08:55:50 +03:00
|
|
|
|
2021-01-16 11:27:06 +03:00
|
|
|
My goal was to make a linker that is as fast as concatenating input
|
2020-11-02 14:57:08 +03:00
|
|
|
object files with `cat` command. It may sound like an impossible goal,
|
2021-01-16 11:27:06 +03:00
|
|
|
but it's not entirely impossible because of the following two reasons:
|
2020-11-02 14:57:08 +03:00
|
|
|
|
|
|
|
1. `cat` is a simple single-threaded program which isn't the fastest
|
|
|
|
one as a file copy command. My linker can use multiple threads to
|
|
|
|
copy file contents more efficiently to save time to do extra work.
|
|
|
|
|
|
|
|
2. Copying file contents is I/O-bounded, and many CPU cores should be
|
|
|
|
available during file copy. We can use them to do extra work while
|
|
|
|
copying file contents.
|
|
|
|
|
2021-01-16 11:27:06 +03:00
|
|
|
Concretely speaking, I wanted to use the linker to link a Chromium
|
|
|
|
executable with full debug info (~2 GiB in size) just in 1 second.
|
|
|
|
LLVM's lld, the fastest open-source linker which I originally created
|
|
|
|
a few years ago, takes about 12 seconds to link Chromium on my machine.
|
|
|
|
So the goal is 12x performance bump over lld. Compared to GNU gold,
|
|
|
|
it's more than 50x.
|
|
|
|
|
|
|
|
It looks like mold has achieved the goal. It can link Chromium in 2
|
|
|
|
seconds with 8-cores/16-threads, and if I enable the preloading
|
|
|
|
feature (I'll explain it later), the latency of the linker for an
|
|
|
|
interactive use is less than 900 milliseconds. It is actualy faster
|
|
|
|
than `cat`.
|
|
|
|
|
|
|
|
Note that even though mold can create a runnable Chrome executable,
|
|
|
|
it is far from complete and not usable for production. mold is still
|
|
|
|
just a toy linker, and this is still just my pet project.
|
2020-10-21 08:55:50 +03:00
|
|
|
|
2020-10-22 07:52:34 +03:00
|
|
|
## Background
|
|
|
|
|
|
|
|
- Even though lld has significantly improved the situation, linking is
|
2020-10-26 16:22:06 +03:00
|
|
|
still one of the slowest steps in a build. It is especially
|
2020-10-22 07:52:34 +03:00
|
|
|
annoying when I changed one line of code and had to wait for a few
|
|
|
|
seconds or even more for a linker to complete. It should be
|
|
|
|
instantaneous. There's a need for a faster linker.
|
|
|
|
|
|
|
|
- The number of cores on a PC has increased a lot lately, and this
|
|
|
|
trend is expected to continue. However, the existing linkers can't
|
2020-10-26 16:26:44 +03:00
|
|
|
take the advantage of that because they don't scale well for more
|
2020-10-22 07:52:34 +03:00
|
|
|
cores. I have a 64-core/128-thread machine, so my goal is to create
|
|
|
|
a linker that uses the CPU nicely. mold should be much faster than
|
|
|
|
other linkers on 4 or 8-core machines too, though.
|
|
|
|
|
|
|
|
- It looks to me that the designs of the existing linkers are somewhat
|
|
|
|
similar, and I believe there are a lot of drastically different
|
|
|
|
designs that haven't been explored yet. Develoeprs generally don't
|
|
|
|
care about linkers as long as they work correctly, and they don't
|
|
|
|
even think about creating a new one. So there may be lots of low
|
|
|
|
hanging fruits there in this area.
|
|
|
|
|
|
|
|
## Basic design
|
|
|
|
|
2020-10-21 08:55:50 +03:00
|
|
|
- In order to achieve a `cat`-like performance, the most important
|
2020-10-21 10:51:40 +03:00
|
|
|
thing is to fix the layout of an output file as quickly as possible, so
|
2020-10-21 08:55:50 +03:00
|
|
|
that we can start copying actual data from input object files to an
|
2020-10-26 16:23:44 +03:00
|
|
|
output file as soon as possible.
|
2020-10-21 08:55:50 +03:00
|
|
|
|
|
|
|
- Copying data from input files to an output file is I/O-bounded, so
|
|
|
|
there should be room for doing computationally-intensive tasks while
|
|
|
|
copying data from one file to another.
|
|
|
|
|
2020-12-21 14:52:02 +03:00
|
|
|
- We should allow the linker to preload object files from disk and
|
|
|
|
parse them in memory before a complete set of input object files
|
|
|
|
is ready. My idea is this: if a user invokes the linker with
|
|
|
|
`--preload` flag along with other command line flags a few seconds
|
|
|
|
before the actual linker invocation, then the following actual
|
|
|
|
linker invocation with the same command line options (except
|
|
|
|
`--preload` flag) becomes magically faster. Behind the scenes, the
|
|
|
|
linker starts preloading object files on the first invocation and
|
|
|
|
becomes a daemon. The second invocation of the linker notifies the
|
|
|
|
daemon to reload updated object files and then proceed.
|
2020-10-21 08:55:50 +03:00
|
|
|
|
|
|
|
- Daemonizing alone wouldn't make the linker magically faster. We need
|
|
|
|
to split the linker into two in such a way that the latter half of
|
|
|
|
the process finishes as quickly as possible by speculatively parsing
|
2020-10-21 10:51:40 +03:00
|
|
|
and preprocessing input files in the first half of the process. The
|
2020-10-22 07:52:34 +03:00
|
|
|
key factor of success would be to design nice data structures that
|
|
|
|
allows us to offload as much processing as possible from the second
|
|
|
|
to the first half.
|
2020-10-21 08:55:50 +03:00
|
|
|
|
|
|
|
- One of the most time-consuming stage among linker stages is symbol
|
|
|
|
resolution. To resolve symbols, we basically have to throw all
|
|
|
|
symbol strings into a hash table to match undefined symbols with
|
2020-10-26 16:24:51 +03:00
|
|
|
defined symbols. But this can be done in the daemon using [string
|
2020-10-21 08:55:50 +03:00
|
|
|
interning](https://en.wikipedia.org/wiki/String_interning).
|
|
|
|
|
|
|
|
- Object files may contain a special section called a mergeable string
|
|
|
|
section. The section contains lots of null-terminated strings, and
|
|
|
|
the linker is expected to gather all mergeable string sections and
|
|
|
|
merge their contents. So, if two object files contain the same
|
|
|
|
string literal, for example, the resulting output will contain a
|
|
|
|
single merged string. This step is time-consuming, but string
|
|
|
|
merging can be done in the daemon using string interning.
|
|
|
|
|
|
|
|
- Static archives (.a files) contain object files, but the static
|
|
|
|
archive's string table contains only defined symbols of member
|
|
|
|
object files and lacks other types of symbols. That makes static
|
|
|
|
archives unsuitable for speculative parsing. The daemon should
|
|
|
|
ignore the string table of static archive and directly read all
|
2020-10-21 10:51:40 +03:00
|
|
|
member object files of all archives to get the whole picture of
|
|
|
|
all possible input files.
|
2020-10-21 08:55:50 +03:00
|
|
|
|
|
|
|
- If there's a relocation that uses a GOT of a symbol, then we have to
|
|
|
|
create a GOT entry for that symbol. Otherwise, we shouldn't. That
|
|
|
|
means we need to scan all relocation tables to fix the length and
|
|
|
|
the contents of a .got section. This is perhaps time-consuming, but
|
2020-12-21 14:52:02 +03:00
|
|
|
this step is parallelizable.
|
2020-10-21 08:55:50 +03:00
|
|
|
|
2020-10-22 07:52:34 +03:00
|
|
|
## Compatibility
|
2020-10-21 08:55:50 +03:00
|
|
|
|
|
|
|
- GNU ld, GNU gold and LLVM lld support essentially the same set of
|
|
|
|
command line options and features. mold doesn't have to be
|
|
|
|
completely compatible with them. As long as it can be used for
|
|
|
|
linking large user-land programs, I'm fine with that. It is OK to
|
|
|
|
leave some command line options unimplemented; if mold is blazingly
|
2020-10-22 07:52:34 +03:00
|
|
|
fast, other projects would still be happy to adopt it by modifying
|
2020-10-21 08:55:50 +03:00
|
|
|
their projects' build files.
|
|
|
|
|
2020-10-22 14:25:35 +03:00
|
|
|
- mold emits Linux executables and runs only on Linux. I won't avoid
|
|
|
|
Unix-ism when writing code (e.g. I'll probably use fork(2)).
|
|
|
|
I don't want to think about portability until mold becomes a thing
|
|
|
|
that's worth to be ported.
|
2020-10-22 07:52:34 +03:00
|
|
|
|
2021-01-12 16:49:47 +03:00
|
|
|
## Linker Script
|
|
|
|
|
|
|
|
Linker script is an embedded language for the linker. It is mainly
|
|
|
|
used to control how input sections are mapped to output sections and
|
|
|
|
the layout of the output, but it can also do a lot of tricky stuff.
|
|
|
|
Its feature is useful especially for embedded programming, but it's
|
|
|
|
also an awfully underdocumented and complex language.
|
|
|
|
|
|
|
|
We have to implement a subset of the linker script language anwyay,
|
|
|
|
because on Linux, /usr/lib/x86_64-linux-gnu/libc.so is (despite its
|
|
|
|
name) not a shared object file but actually an ASCII file containing
|
|
|
|
linker script code to load the _actual_ libc.so file. But the feature
|
|
|
|
set for this purpose is very limited, and it is okay to implement them
|
|
|
|
to mold.
|
|
|
|
|
|
|
|
Besides that, we really don't want to implement the linker script
|
|
|
|
langauge. But at the same time, we want to satisfy the user needs that
|
|
|
|
are currently satisfied with the linker script langauge. So, what
|
|
|
|
should we do? Here is my observation:
|
|
|
|
|
|
|
|
- Linker script allows to do a lot of tricky stuff, such as specifying
|
|
|
|
the exact layout of a file, inserting arbitrary bytes between
|
|
|
|
sections, etc. But most of them can be done with a post-link binary
|
|
|
|
editing tool (such as `objcopy`).
|
|
|
|
|
|
|
|
- It looks like there are two things that truely cannot be done by a
|
|
|
|
post-link editing tool: (a) mapping input sections to output
|
|
|
|
sections, and (b) applying relocations.
|
|
|
|
|
|
|
|
From the above observation, I believe we need to provide only the
|
|
|
|
following features instead of the entire linker script langauge:
|
|
|
|
|
|
|
|
- A method to specify how input sections are mapped to output
|
|
|
|
sections, and
|
|
|
|
|
|
|
|
- a method to set addresses to output sections, so that relocations
|
|
|
|
are applied based on desired adddresses.
|
|
|
|
|
|
|
|
I believe everything else can be done with a post-link binary editing
|
|
|
|
tool.
|
|
|
|
|
2020-10-22 07:52:34 +03:00
|
|
|
## Details
|
|
|
|
|
2020-11-02 15:05:36 +03:00
|
|
|
- If we aim to the 1 second goal for Chromium, every millisecond
|
2020-10-22 07:52:34 +03:00
|
|
|
counts. We can't ignore the latency of process exit. If we mmap a
|
|
|
|
lot of files, \_exit(2) is not instantaneous but takes a few hundred
|
|
|
|
milliseconds because the kernel has to clean up a lot of
|
|
|
|
resources. As a workaround, we should organize the linker command as
|
|
|
|
two processes; the first process forks the second process, and the
|
|
|
|
second process does the actual work. As soon as the second process
|
|
|
|
writes a result file to a filesystem, it notifies the first process,
|
|
|
|
and the first process exits. The second process can take time to
|
|
|
|
exit, because it is not an interactive process.
|
|
|
|
|
2020-11-19 09:57:16 +03:00
|
|
|
- At least on Linux, it looks like the filesystem's performance to
|
|
|
|
allocate new blocks to a new file is the limiting factor when
|
|
|
|
creating a new large file and filling its contents using mmap.
|
|
|
|
If you already have a large file on a filesystem, writing to it is
|
|
|
|
much faster than creating a new fresh file and writing to it.
|
|
|
|
Based on this observation, mold should overwrite to an existing
|
|
|
|
executable file if exists. My quick benchmark showed that I could
|
|
|
|
save 300 milliseconds when creating a 2 GiB output file.
|
|
|
|
Linux doesn't allow to open an executable for writing if it is
|
|
|
|
running (you'll get "text busy" error if you attempt). mold should
|
|
|
|
fall back to the usual way if it fails to open an output file.
|
|
|
|
|
2021-01-25 09:00:38 +03:00
|
|
|
- As an implementation strategy, we do _not_ care about memory leak
|
|
|
|
because we really can't save that much memory by doing precise
|
|
|
|
memory management. It is because most objects that are allocated
|
|
|
|
during an execution of mold are needed until the very end of the
|
|
|
|
program. I'm sure this is an odd memory management scheme (or the
|
|
|
|
lack thereof), but this is what LLVM lld does too.
|
|
|
|
|
2020-11-02 15:06:44 +03:00
|
|
|
- The output from the linker should be deterministic for the sake of
|
|
|
|
[build reproducibility](https://en.wikipedia.org/wiki/Reproducible_builds)
|
|
|
|
and ease of debugging. This might add a little bit of overhead to
|
|
|
|
the linker, but that shouldn't be too much.
|
|
|
|
|
2020-10-24 16:05:40 +03:00
|
|
|
- A .build-id, a unique ID embedded to an output file, is usually
|
|
|
|
computed by applying a cryptographic hash function (e.g. SHA-1) to
|
2021-01-16 09:16:15 +03:00
|
|
|
an output file. This is a slow step, but we can speed it up by
|
|
|
|
splitting a file into small chunks, computing SHA-1 for each chunk,
|
|
|
|
and then computing SHA-1 of the concatenated SHA-1 hashes
|
|
|
|
(i.e. constructing a [Markle
|
|
|
|
Tree](https://en.wikipedia.org/wiki/Merkle_tree) of height 2).
|
|
|
|
Modern x86 processors have purpose-built instructions for SHA-1 and
|
2021-01-23 02:35:31 +03:00
|
|
|
can compute SHA-1 pretty quickly at about 2 GiB/s rate. Using 16
|
|
|
|
cores, a build-id for a 2 GiB executable can be computed in 60 to 70
|
|
|
|
milliseconds.
|
2020-10-24 16:05:40 +03:00
|
|
|
|
2020-10-21 08:55:50 +03:00
|
|
|
- [Intel Threading Building
|
|
|
|
Blocks](https://github.com/oneapi-src/oneTBB) (TBB) is a good
|
|
|
|
library for parallel execution and has several concurrent
|
|
|
|
containers. We are particularly interested in using
|
|
|
|
`parallel_for_each` and `concurrent_hash_map`.
|
|
|
|
|
2021-01-16 09:24:52 +03:00
|
|
|
- TBB provides `tbbmalloc` which works better for multi-threaded
|
|
|
|
applications than the glib'c malloc, but it looks like
|
2021-01-18 06:54:28 +03:00
|
|
|
[jemalloc](https://github.com/jemalloc/jemalloc) and
|
|
|
|
[mimalloc](https://github.com/microsoft/mimalloc) are a little bit
|
2021-01-16 09:24:52 +03:00
|
|
|
more scalable than `tbbmalloc`.
|
|
|
|
|
2020-10-22 19:14:11 +03:00
|
|
|
## Size of the problem
|
|
|
|
|
|
|
|
When linking Chrome, a linker reads 3,430,966,844 bytes of data in
|
|
|
|
total. The data contains the following items:
|
|
|
|
|
2020-10-23 07:17:21 +03:00
|
|
|
| Data item | Number
|
|
|
|
| ------------------------ | ------
|
|
|
|
| Object files | 30,723
|
2020-10-26 16:26:44 +03:00
|
|
|
| Public undefined symbols | 1,428,149
|
|
|
|
| Mergeable strings | 1,579,996
|
|
|
|
| Comdat groups | 9,914,510
|
2021-01-14 09:48:39 +03:00
|
|
|
| Regular sections¹ | 10,345,314
|
2020-10-26 16:26:44 +03:00
|
|
|
| Public defined symbols | 10,512,135
|
|
|
|
| Symbols | 23,953,607
|
|
|
|
| Sections | 27,543,225
|
|
|
|
| Relocations against SHF_ALLOC sections | 39,496,375
|
|
|
|
| Relocations | 62,024,719
|
2020-10-23 07:23:12 +03:00
|
|
|
|
2021-01-14 09:48:39 +03:00
|
|
|
¹ Sections that have to be copied from input object files to an
|
2020-10-23 07:23:12 +03:00
|
|
|
output file. Sections that contain relocations or symbols are for
|
2020-10-24 09:59:38 +03:00
|
|
|
example excluded.
|
2021-01-22 18:09:13 +03:00
|
|
|
|
2021-01-26 16:45:27 +03:00
|
|
|
## Internals
|
|
|
|
|
|
|
|
In this section, I'll explain the internals of mold linker.
|
|
|
|
|
|
|
|
### A brief history of Unix and the Unix linker
|
|
|
|
|
|
|
|
Conceptually, what a linker does is pretty simple. A compiler compiles
|
|
|
|
a fragment of a program (a single source file) into a fragment of
|
|
|
|
machine code and data (an object file, which typically has the .o
|
|
|
|
extension), and a linker stiches them together into a single
|
|
|
|
executable image.
|
|
|
|
|
|
|
|
In reality, modern linkers for Unix-like systems are much more
|
|
|
|
compilcated than the naive understanding because it have gradually
|
|
|
|
gained one feature at a time over the 50 years history of Unix, and
|
|
|
|
it's now something like a bag of lots of miscellaneous features in
|
|
|
|
which none of the features is more important than the others. It is
|
|
|
|
very easy to miss the forest for the trees, since for those who don't
|
|
|
|
know the details of the Unix linker, it is not clear which feature is
|
|
|
|
essential and which is not.
|
|
|
|
|
|
|
|
That being said, one thing is clear that at any point of Unix history,
|
|
|
|
a Unix linker has a coherent feature set for the Unix of that age. So,
|
|
|
|
let me entangle the history to see how the operating system, runtime
|
|
|
|
and linker have gained features that we see today. That should give
|
|
|
|
you an idea why a particular feature has been added to a linker in the
|
|
|
|
first place.
|
|
|
|
|
|
|
|
1. Original Unix doesn't support shared library, and a program is
|
|
|
|
always loaded to a fixed address. An executable is something like a
|
|
|
|
memory dump which is just loaded to a particular address by the
|
|
|
|
kernel. After loading, the kernel starts executing the program by
|
|
|
|
setting an instruction pointer to a particular address.
|
|
|
|
|
|
|
|
A linker for such a simple execution environment is pretty simple.
|
|
|
|
It concatenates all object files given to a linker to create an
|
|
|
|
executable memory image and then fixes up references between object
|
|
|
|
files so that object files can use functions and variables defined
|
|
|
|
in other object files.
|
|
|
|
|
|
|
|
Static library support, which is still an important feature of Unix
|
|
|
|
linker, dates back to this early period of Unix history.
|
|
|
|
To understand what it is, imagine that you are trying to compile
|
|
|
|
a program for the early Unix. You don't want to waste time to
|
|
|
|
compile the libc functions every time you compile your program (the
|
|
|
|
computers of the era was incredibly slow), so you have already
|
|
|
|
placed each libc function into a separate source file and compiled
|
|
|
|
them individually. That means, you have object files for each libc
|
|
|
|
function, e.g., printf.o, scanf.o, atoi.o, write.o, etc.
|
|
|
|
|
|
|
|
Given this configuration, all you have to do to link your program
|
|
|
|
against libc functions is to pick up a right set of libc object
|
|
|
|
files and give them to the linker along with object files of your
|
|
|
|
program. But, keeping the command line options in sync with the
|
|
|
|
libc functions you are using in your program is bothersome. You can
|
|
|
|
be conservative; you can specify all libc object files to the
|
|
|
|
command line, but that leads to a program bloat because the linker
|
|
|
|
unconditionally link all object files given to it. So, a new
|
|
|
|
feature was added to the linker to fix the problem. That is the
|
|
|
|
archive file.
|
|
|
|
|
|
|
|
If you put all libc object files into an archive file (which is
|
|
|
|
similar to zip but is not compressed, and typically named libc.a)
|
|
|
|
and pass the archive file to the linker, the linker pulls out an
|
|
|
|
object file _only when_ it is referenced by your program. In other
|
|
|
|
words, unlike object files directly given to a linker, object files
|
|
|
|
wrapped in an archive are not linked to an output by default.
|
|
|
|
Object files in an archive work as supplements to complete your
|
|
|
|
program.
|
|
|
|
|
|
|
|
Even today, you can still find a libc archive file. Run `ar t
|
|
|
|
/usr/lib/x86_64-linux-gnu/libc.a` on Linux should give you a list
|
|
|
|
of object files in the libc archive.
|
|
|
|
|
|
|
|
2. In '80s, Sun Microsystems, a leading commercial Unix vendor at the
|
|
|
|
time, added a shared library support to their Unix variant, SunOS.
|
|
|
|
|
|
|
|
(This section is incomplete.)
|
|
|
|
|
2021-01-22 18:09:13 +03:00
|
|
|
## Rejected ideas
|
|
|
|
|
|
|
|
In this section, I'll explain the alternative designs I currently do
|
|
|
|
not plan to implement and why I turned them down.
|
|
|
|
|
|
|
|
- Placing variable-length sections at end of an output file and start
|
|
|
|
copying file contents before fixing the output file layout
|
|
|
|
|
2021-01-23 06:11:57 +03:00
|
|
|
Idea: Fixing the layout of regular sections seems easy, and if we
|
2021-01-23 02:35:31 +03:00
|
|
|
place them at beginning of a file, we can start copying their
|
|
|
|
contents from their input files to an output file. While copying
|
|
|
|
file contents, we can compute the sizes of variable-length sections
|
2021-01-23 07:56:18 +03:00
|
|
|
such as .got or .plt and place them at end of the file.
|
2021-01-22 18:09:13 +03:00
|
|
|
|
2021-01-23 02:35:31 +03:00
|
|
|
Reason for rejection: I did not choose this design because I doubt
|
|
|
|
if it could actually shorten link time and I think I don't need it
|
|
|
|
anyway.
|
2021-01-22 18:09:13 +03:00
|
|
|
|
|
|
|
The linker has to de-duplicate comdat sections (i.e. inline
|
|
|
|
functions that are included into multiple object files), so we
|
2021-01-23 02:19:58 +03:00
|
|
|
cannot compute the layout of regular sections until we resolve all
|
2021-01-22 18:09:13 +03:00
|
|
|
symbols and de-duplicate comdats. That takes a few hundred
|
|
|
|
milliseconds. After that, we can compute the sizes of
|
|
|
|
variable-length sections in less than 100 milliseconds. It's quite
|
|
|
|
fast, so it doesn't seem to make much sense to proceed without
|
|
|
|
fixing the final file layout.
|
|
|
|
|
2021-01-23 07:56:18 +03:00
|
|
|
The other reason to reject this idea is because there's good a
|
|
|
|
chance for this idea to have a negative impact on linker's overall
|
|
|
|
performance. If we copy file contents before fixing the layout, we
|
|
|
|
can't apply relocations to them while copying because symbol
|
|
|
|
addresses are not available yet. If we fix the file layout first, we
|
|
|
|
can apply relocations while copying, which is effectively zero-cost
|
|
|
|
due to a very good data locality. On the other hand, if we apply
|
|
|
|
relocations long after we copy file contents, it's pretty expensive
|
|
|
|
because section contents are very likely to have been evicted from
|
|
|
|
CPU cache.
|
2021-01-23 02:54:48 +03:00
|
|
|
|
2021-01-22 18:09:13 +03:00
|
|
|
- Incremental linking
|
|
|
|
|
2021-01-23 02:35:31 +03:00
|
|
|
Idea: Incremental linking is a technique to patch a previous linker's
|
2021-01-22 18:09:13 +03:00
|
|
|
output file so that only functions or data that are updated from the
|
|
|
|
previous build are written to it. It is expected to significantly
|
|
|
|
reduce the amount of data copied from input files to an output file
|
|
|
|
and thus speed up linking. GNU BFD and gold linkers support it.
|
|
|
|
|
2021-01-23 02:35:31 +03:00
|
|
|
Reason for rejection: I turned it down because it (1) is
|
|
|
|
complicated, (2) doesn't seem to speed it up that much and (3) has
|
|
|
|
several practical issues. Let me explain each of them.
|
2021-01-22 18:09:13 +03:00
|
|
|
|
|
|
|
First, incremental linking for real C/C++ programs is not as easy as
|
|
|
|
one might think. Let me take malloc as an example. malloc is usually
|
|
|
|
defined by libc, but you can implement it in your program, and if
|
|
|
|
that's the case, the symbol `malloc` will be resolved to your
|
|
|
|
function instead of the one in libc. If you include a library that
|
|
|
|
defines malloc (such as libjemalloc or libtbbmallc) before libc,
|
|
|
|
their malloc will override libc's malloc.
|
|
|
|
|
2021-01-23 02:19:58 +03:00
|
|
|
Assume that you are using a nonstandard malloc. What if you remove
|
|
|
|
your malloc from your code, or remove `-ljemalloc` from your
|
|
|
|
Makefile? The linker has to include a malloc from libc, which may
|
|
|
|
include more object files to satisfy its dependencies. Such code
|
|
|
|
change can affect the entire program rather than just replacing one
|
2021-01-23 02:54:48 +03:00
|
|
|
function. The same is true to adding malloc to your program. Making
|
|
|
|
a local change doesn't necessarily result in a local change in the
|
|
|
|
binary level. It can easily have cascading effects.
|
|
|
|
|
|
|
|
Some ELF fancy features make incremental linking even harder to
|
|
|
|
implement. Take the weak symbol as an example. If you define `atoi`
|
|
|
|
as a weak symbol in your program, and if you are not using `atoi`
|
|
|
|
at all in your program, that symbol will be resolved to address
|
|
|
|
0. But if you start using some libc function that indirectly calls
|
|
|
|
`atoi`, then `atoi` will be included to your program, and your weak
|
|
|
|
symbol will be resolved to that function. I don't know how to
|
|
|
|
efficiently fix up a binary for this case.
|
2021-01-22 18:09:13 +03:00
|
|
|
|
|
|
|
This is a hard problem, so existing linkers don't try too hard to
|
2021-01-23 02:19:58 +03:00
|
|
|
solve it. For example, IIRC, gold falls back to full link if any
|
|
|
|
function is removed from a previous build. If you want to not annoy
|
|
|
|
users in the fallback case, you need to make full link fast anyway.
|
2021-01-22 18:09:13 +03:00
|
|
|
|
|
|
|
Second, incremental linking itself has an overhead. It has to detect
|
2021-01-23 02:54:48 +03:00
|
|
|
updated files, patch an existing output file and write additional
|
2021-01-23 02:19:58 +03:00
|
|
|
data to an output file for future incremental linking. GNU gold, for
|
|
|
|
instance, takes almost 30 seconds on my machine to do a null
|
|
|
|
incremental link (i.e. no object files are updated from a previous
|
|
|
|
build) for chrome. It's just too slow.
|
2021-01-22 18:09:13 +03:00
|
|
|
|
2021-01-23 02:19:58 +03:00
|
|
|
Third, there are other practical issues in incremental linking. It's
|
2021-01-23 02:54:48 +03:00
|
|
|
not reproducible, so your binary isn't going to be the same as other
|
|
|
|
binaries even if you are compiling the same source tree using the
|
|
|
|
same compiler toolchain. Or, it is complex and there might be a bug
|
|
|
|
in it. If something doesn't work correctly, "remove --incremental
|
|
|
|
from your Makefile and try again" could be a piece of advise, but
|
|
|
|
that isn't ideal.
|
2021-01-22 18:09:13 +03:00
|
|
|
|
2021-01-23 02:54:48 +03:00
|
|
|
So, all in all, incremental linking is tricky. I wanted to make full
|
|
|
|
link as fast as possible, so that we don't have to think about how
|
|
|
|
to workaround the slowness of full link.
|
2021-01-22 18:36:30 +03:00
|
|
|
|
|
|
|
- Defining a completely new file format and use it
|
|
|
|
|
2021-01-23 02:35:31 +03:00
|
|
|
Idea: Sometimes, the ELF file format itself seems to be a limiting
|
|
|
|
factor of improving linker's performance. We might be able to make a
|
|
|
|
far better one if we create a new file format.
|
2021-01-22 18:36:30 +03:00
|
|
|
|
2021-01-23 02:35:31 +03:00
|
|
|
Reason for rejection: I rejected the idea because it apparently has
|
|
|
|
a practical issue (backward compatibility issue) and also doesn't
|
|
|
|
seem to improve performance of linkers that much. As clearly
|
|
|
|
demonstrated by mold, we can create a fast linker for ELF. I believe
|
|
|
|
ELF isn't that bad, after all. The semantics of the existing Unix
|
2021-01-22 18:36:30 +03:00
|
|
|
linkers, such as the name resolution algorithm or the linker script,
|
|
|
|
have slowed the linkers down, but that's not a problem of the file
|
|
|
|
format itself.
|
2021-01-23 02:35:31 +03:00
|
|
|
|
|
|
|
- Watching object files using inotify(2)
|
|
|
|
|
|
|
|
Idea: When mold is running as a daemon for preloading, use
|
2021-01-23 02:54:48 +03:00
|
|
|
inotify(2) to watch file system updates so that it can reload files
|
|
|
|
as soon as they are updated.
|
2021-01-23 02:35:31 +03:00
|
|
|
|
|
|
|
Reason for rejection: Just like the maximum number of files you can
|
|
|
|
simultaneously open, the maximum number of files you can watch using
|
2021-01-23 02:54:48 +03:00
|
|
|
inotify(2) isn't that large. Maybe just a single instance of mold is
|
|
|
|
fine with inotify(2), but it may fail if you run multiple of it.
|
|
|
|
|
|
|
|
The other reason for not doing it is because mold is quite fast
|
|
|
|
without it anyway. Invoking stat(2) on each file for file update
|
|
|
|
check takes less than 100 milliseconds for Chrome, and if most of
|
|
|
|
the input files are not updated, parsing updated files takes almost
|
|
|
|
no time.
|