Our assert() statements are pretty cheap, so we want to enable them
even for release builds.
By default, CMake passes -DNDEBUG for release builds, which disables
assert(). So, in this patch, we undefine NDEBUG to re-enable assertions.
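The sketch below shows how undefining NDEBUG in source re-enables assert(). It assumes the `#undef` comes before `<cassert>` is included, for example at the top of a common header. The patch does not say exactly where the undefine lives. The helper `assertions_enabled()` is hypothetical.

```cpp
// Undefine NDEBUG before including <cassert> so that assert() stays active
// even when the build system passes -DNDEBUG (as CMake does for Release).
#undef NDEBUG
#include <cassert>

// Hypothetical helper: reports whether assert() is compiled in at this point.
bool assertions_enabled() {
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}
```

<cassert> is special because each inclusion re-evaluates NDEBUG. What matters is therefore the NDEBUG state at the point of the last inclusion before assert() is used.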
You can now build mold with the following commands:
$ mkdir -p out/debug
$ cd out/debug
$ cmake -GNinja -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug ../..
$ ninja
To run tests, use the following commands:
$ cd out/debug
$ ctest -j$(nproc)
By default, std::atomic<T>::operator=() stores a given value with
the sequentially consistent memory ordering, which is a very strong
memory order. We only need the store itself to be atomic in this case,
so we can use memory_order_relaxed instead.
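Here is a minimal sketch of the difference. The commit does not name the atomic variable, so the `Fragment` struct and its `is_alive` flag are hypothetical stand-ins. The plain assignment uses seq_cst ordering. The explicit store with memory_order_relaxed guarantees only that the store is atomic, which is all a "mark this as alive" flag needs when nothing else is ordered relative to it.

```cpp
#include <atomic>

// Hypothetical stand-in for an object with an atomic flag set by many threads.
struct Fragment {
  std::atomic<bool> is_alive{false};
};

// operator= is equivalent to store(true, std::memory_order_seq_cst).
void mark_alive_seq_cst(Fragment &frag) {
  frag.is_alive = true;
}

// Only atomicity is required, so relaxed ordering is sufficient and cheaper
// on architectures where seq_cst stores need extra fences.
void mark_alive_relaxed(Fragment &frag) {
  frag.is_alive.store(true, std::memory_order_relaxed);
}
```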
This change improves performance when we have a lot of section
fragments.
This change reduces the garbage collection time when linking clang-13
from 170 ms to 100 ms.
"oneTBB" used to be a git submodule, and it is now just a subdirectory.
Using the same directory name makes it difficult to switch between
branches. So I renamed it to "tbb".
Linking clang-13 with debug info takes ~3.6 seconds on a simulated
10-core/20-thread machine. mold spends most of that time (~2.3 seconds)
merging string literals in .debug_str. Input .debug_str sections contain
70 million string literals in total, which are reduced to 2 million after
de-duplication, so the input object files contain a lot of duplicates.
clang-13 with debug info is enormous: it is ~3.1 GiB after linking.
It looks like TBB's concurrent hashmap doesn't scale well with this
input.
In this patch, I implemented our own concurrent hashmap. The hashmap
is extremely lightweight and supports only the key-value insertion
operation. It doesn't even support rehashing. It aborts once the hash
table becomes full.
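Below is a minimal sketch of an insert-only concurrent hash table with linear probing, no rehashing, and an abort when full. It is not mold's actual code, and it rests on these assumptions:

- Keys are nonzero 64-bit integers, such as string hashes; 0 marks an empty slot.
- The returned slot index serves as the handle for the associated value.
- The class name and the splitmix64-style mixer are hypothetical choices.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <utility>

// Hypothetical insert-only concurrent hash table (sketch, not mold's code).
class InsertOnlyHashMap {
public:
  explicit InsertOnlyHashMap(size_t nbuckets)
      : nbuckets_(nbuckets), keys_(new std::atomic<uint64_t>[nbuckets]) {
    for (size_t i = 0; i < nbuckets_; i++)
      keys_[i].store(0, std::memory_order_relaxed);  // 0 = empty slot
  }

  // Claims a slot for `key` (must be nonzero). Returns the slot index and
  // whether this call inserted the key. Safe to call from many threads.
  std::pair<size_t, bool> insert(uint64_t key) {
    size_t idx = mix(key) % nbuckets_;
    for (size_t probe = 0; probe < nbuckets_; probe++) {
      uint64_t expected = 0;
      // Try to claim an empty slot. On failure, `expected` is updated to
      // the key already stored there; keys are never removed or changed.
      if (keys_[idx].compare_exchange_strong(expected, key,
                                             std::memory_order_acq_rel))
        return {idx, true};
      if (expected == key)
        return {idx, false};       // another insert of the same key won
      idx = (idx + 1) % nbuckets_; // linear probing
    }
    std::abort();  // no rehashing: a full table is a fatal error
  }

private:
  // splitmix64 finalizer to spread keys across buckets.
  static uint64_t mix(uint64_t x) {
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
  }

  size_t nbuckets_;
  std::unique_ptr<std::atomic<uint64_t>[]> keys_;
};
```

A slot's key never changes after the CAS succeeds, so threads that insert the same key follow the same probe sequence and agree on one slot. That is why the sketch needs no locks. The cost is that the table must be sized correctly up front.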
In order to know the correct size for the hashmap before inserting
strings into it, I also implemented the HyperLogLog algorithm in this
patch. HyperLogLog gives a fairly accurate estimate of the number of
unique elements while using only a small, fixed amount of memory.
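A minimal, single-threaded sketch of textbook HyperLogLog follows. It is not necessarily the variant in this patch. These choices are assumptions:

- 4096 registers (P = 12).
- A splitmix64 hash over 64-bit items.
- The class name `HyperLogLog`.

A concurrent version would update each register with an atomic max.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Textbook HyperLogLog cardinality estimator (illustrative sketch).
class HyperLogLog {
public:
  void insert(uint64_t item) {
    uint64_t h = hash(item);
    size_t idx = h >> (64 - P);  // top P bits pick a register
    uint64_t w = h << P;         // remaining 64 - P bits
    // rank = 1 + number of leading zeros in w, capped at 64 - P + 1.
    uint8_t rank = 1;
    while (rank <= 64 - P && !(w >> 63)) {
      w <<= 1;
      rank++;
    }
    regs[idx] = std::max(regs[idx], rank);
  }

  double estimate() const {
    const double m = NREGS;
    double sum = 0;
    size_t zeros = 0;
    for (uint8_t r : regs) {
      sum += std::ldexp(1.0, -r);  // 2^-r
      if (r == 0)
        zeros++;
    }
    double alpha = 0.7213 / (1 + 1.079 / m);  // bias constant for m >= 128
    double e = alpha * m * m / sum;
    // Small-range correction: fall back to linear counting.
    if (e <= 2.5 * m && zeros > 0)
      e = m * std::log(m / zeros);
    return e;
  }

private:
  static constexpr int P = 12;
  static constexpr size_t NREGS = size_t(1) << P;

  static uint64_t hash(uint64_t x) {  // splitmix64
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
  }

  std::vector<uint8_t> regs = std::vector<uint8_t>(NREGS, 0);
};
```

Inserting duplicates does not change the registers, so the estimate tracks distinct elements. With 4096 registers the typical relative error is about 1.6%. A caller would then size the hashmap from estimate() with some headroom, since the table cannot grow. The exact headroom factor is an assumption here.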
With this patch, mold can link clang-13 in ~2.5 seconds, which is ~30%
faster than before.
https://github.com/rui314/mold/issues/73
Previously, a GOT relocation (e.g. R_X86_64_REX_GOTPCRELX) and an
R_X86_64_64 relocation referring to the same imported symbol were
resolved to different addresses. Here is why:
- When we saw an R_X86_64_64 relocation against an imported symbol,
we created a PLT entry and resolved the relocation to it.
- A GOT relocation is resolved to a GOT entry, which holds the real
address of the imported function at runtime. That address differs
from the PLT entry, which only redirects calls to the real function.
With this patch, we no longer create a PLT entry for R_X86_64_64.
Instead, we emit a dynamic relocation so that the dynamic loader always
resolves it to the function's real address.
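The toy model below illustrates the before/after behavior for relocations against an imported symbol. The enums and functions are hypothetical and heavily simplified; they do not come from mold's source. The point is that after the patch, both relocation types end up at the address the dynamic loader writes, instead of one of them pointing at a PLT stub.

```cpp
// Hypothetical, simplified model of the change (not mold's actual code).
enum class RelType {
  Abs64,         // stands for R_X86_64_64
  RexGotPcRelX,  // stands for R_X86_64_REX_GOTPCRELX
};

enum class Action {
  UseGotEntry,     // address loaded from a GOT slot filled by the loader
  UsePltEntry,     // address of a PLT stub that jumps to the function
  EmitDynamicRel,  // loader writes the real address directly
};

// Before: R_X86_64_64 resolved to the PLT stub, so it disagreed with the GOT.
Action resolve_imported_before(RelType type) {
  switch (type) {
  case RelType::RexGotPcRelX:
    return Action::UseGotEntry;
  case RelType::Abs64:
    return Action::UsePltEntry;
  }
  return Action::UsePltEntry;
}

// After: R_X86_64_64 gets a dynamic relocation, so it resolves to the same
// real address that the GOT entry holds at runtime.
Action resolve_imported_after(RelType type) {
  switch (type) {
  case RelType::RexGotPcRelX:
    return Action::UseGotEntry;
  case RelType::Abs64:
    return Action::EmitDynamicRel;
  }
  return Action::EmitDynamicRel;
}
```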
Fixes GNU MP's `make check` failure, which was reported at
https://github.com/rui314/mold/issues/81
The grammar of the VERSION command is as follows:
VERSION { <version-script> }
where <version-script> has the same syntax as a version script that
you can specify with the --version-script option.