mirror of https://github.com/rui314/mold.git synced 2024-10-05 00:57:08 +03:00
240 Commits

Author SHA1 Message Date
Rui Ueyama
31ca2cc2ae [ELF] Compile an Aho-Corasick tree into a DFA 2022-01-22 12:35:49 +09:00
Rui Ueyama
e9a5d1cee3 [ELF] Handle glob patterns directly instead of converting them to regex
Converting a glob pattern to a regex is fragile, and std::regex is slow.
So we should handle glob patterns ourselves.
2022-01-21 23:32:27 +09:00
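As an illustration of matching a glob without going through a regex, here is a minimal sketch. It is not mold's code: the name `glob_match` is hypothetical, and only `*` and `?` are handled (bracket expressions such as `[a-z]` are left out to keep it short).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not taken from mold.
// Returns true if `pat` matches the whole of `str`.
//   '*' matches any run of characters (possibly empty)
//   '?' matches exactly one character
bool glob_match(std::string_view pat, std::string_view str) {
  constexpr std::size_t npos = std::string_view::npos;
  std::size_t p = 0, s = 0;
  std::size_t star = npos; // index of the most recent '*' in pat
  std::size_t mark = 0;    // index in str where that '*' started matching

  while (s < str.size()) {
    if (p < pat.size() && pat[p] == '*') {
      // Remember the star and first try to let it match nothing.
      star = p++;
      mark = s;
    } else if (p < pat.size() && (pat[p] == '?' || pat[p] == str[s])) {
      p++;
      s++;
    } else if (star != npos) {
      // Mismatch: backtrack and let the last '*' absorb one more character.
      p = star + 1;
      s = ++mark;
    } else {
      return false;
    }
  }

  // Only trailing stars may remain in the pattern.
  while (p < pat.size() && pat[p] == '*')
    p++;
  return p == pat.size();
}
```

For example, `glob_match("_ZN*foo*", "_ZN3foo3barEv")` is true, while `glob_match("foo?", "foo")` is false.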
Rui Ueyama
3f4e570320 Simplify 2022-01-21 20:59:19 +09:00
Rui Ueyama
d0c1c4db19 [ELF] Use Aho-Corasick algo to handle version script patterns
To process version scripts, we have to match glob patterns against
symbol strings. Sometimes, we have hundreds or thousands of glob
patterns and have to match them against millions of mangled long
C++ symbol names. This step can be very slow.

In this patch, I implemented the Aho-Corasick algorithm to match glob
patterns to symbol strings as quickly as possible. For the details
of the algorithm, see https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm.

This patch improves mold's performance for programs that use large
version scripts. For example, the link time of libQt6Gui.so.6.3.0 dropped from
1.10s to 0.05s with this patch.

This patch also changes how symbol versions are applied if two or more
version patterns match a single symbol string. Previously, the last
one in a script file took precedence. Now, the first one takes
precedence. I believe the new behavior is compatible with GNU ld.

Fixes https://github.com/rui314/mold/issues/156
Fixes https://github.com/rui314/mold/issues/287
2022-01-21 20:30:05 +09:00
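To make the idea concrete, here is a minimal sketch of an Aho-Corasick automaton with its failure links folded into a full transition table, so that scanning needs one table lookup per input byte. It is not mold's implementation: the class name `AhoCorasick` and the method `find_first_pattern` are hypothetical, and the sketch only searches for literal substrings (how the literal parts relate to `*` wildcards is not covered). When several patterns occur, the lowest pattern index is reported, mirroring the "first one takes precedence" rule described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch, not taken from mold.
class AhoCorasick {
public:
  explicit AhoCorasick(const std::vector<std::string> &patterns) {
    nodes.emplace_back(); // node 0 is the root
    for (std::size_t i = 0; i < patterns.size(); i++)
      insert(patterns[i], (int32_t)i);
    build();
  }

  // Returns the smallest index of a pattern that occurs somewhere in
  // `text`, or -1 if none of the patterns occurs.
  int find_first_pattern(std::string_view text) const {
    int32_t best = -1;
    int32_t cur = 0;
    for (unsigned char c : text) {
      cur = nodes[cur].next[c]; // always valid after build()
      int32_t id = nodes[cur].match;
      if (id != -1 && (best == -1 || id < best))
        best = id;
    }
    return best;
  }

private:
  struct Node {
    std::array<int32_t, 256> next;
    int32_t fail = 0;   // longest proper suffix that is also a trie path
    int32_t match = -1; // smallest pattern index ending at this state
    Node() { next.fill(-1); }
  };

  void insert(const std::string &pat, int32_t id) {
    int32_t cur = 0;
    for (unsigned char c : pat) {
      if (nodes[cur].next[c] == -1) {
        nodes[cur].next[c] = (int32_t)nodes.size();
        nodes.emplace_back();
      }
      cur = nodes[cur].next[c];
    }
    if (nodes[cur].match == -1)
      nodes[cur].match = id;
  }

  // Breadth-first pass that computes failure links and fills every
  // missing transition, which turns the trie into a DFA.
  void build() {
    std::queue<int32_t> q;
    for (int c = 0; c < 256; c++) {
      int32_t v = nodes[0].next[c];
      if (v == -1)
        nodes[0].next[c] = 0;
      else
        q.push(v); // fail stays 0 for depth-1 nodes
    }
    while (!q.empty()) {
      int32_t u = q.front();
      q.pop();
      int32_t f = nodes[u].fail;
      // A match at the failure target is also a match here.
      int32_t fm = nodes[f].match;
      if (fm != -1 && (nodes[u].match == -1 || fm < nodes[u].match))
        nodes[u].match = fm;
      for (int c = 0; c < 256; c++) {
        int32_t v = nodes[u].next[c];
        if (v == -1) {
          nodes[u].next[c] = nodes[f].next[c];
        } else {
          nodes[v].fail = nodes[f].next[c];
          q.push(v);
        }
      }
    }
  }

  std::vector<Node> nodes;
};
```

Scanning a symbol name costs time proportional to its length, regardless of how many patterns were inserted, which is where the speedup over trying each pattern in turn comes from.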
Rui Ueyama
92876820cb [ELF/ARM64] Support range extension thunks
ARM64 branch instructions have only a 26-bit displacement. Non-thumb
instructions are always aligned to 4-byte boundaries, but with that
implicit trailing two zeros considered, they can represent only a 28-bit
displacement. That means branch instructions can jump to a location only
if it is within a ±128 MiB range.

If a branch destination is farther than that, a linker has to emit a machine
code sequence that constructs a full 32-bit address in a register to jump
to the final destination, and redirect the branch to that code sequence.
Such a code sequence is called a "range extension thunk" or just "thunk".

Previously, mold didn't support range extension thunks, so it couldn't link
large programs. That would fail with a "relocation out of range" error.
Now, mold can create thunks and link large programs.

Thunk creation is an interesting algorithmic problem. We need to insert
at least one thunk in every 128 MiB chunk, because otherwise branch
instructions wouldn't be able to jump to a thunk. Adding an entry to a thunk
could slightly enlarge the distance between a branch instruction location
and its destination if the thunk is in between them. That could make a
branch that was previously reachable unreachable.

Usually, this problem is solved by an iterative algorithm. With the
iterative algorithm, a linker checks the reachability of all relocations,
creates new thunks if necessary, and repeats until no new thunks are
created.

In this patch, I implemented a different algorithm. It is guaranteed to
run in O(n) time, where n is the number of relocations. This algorithm
might be novel.

The algorithm is also quite fast. It can create thunks in 80 milliseconds
on a 16-core Amazon Graviton 2 machine for clang-14, which has a ~100 MB
.text section.
2022-01-20 03:16:50 +00:00
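The reachability rule that decides whether a branch needs a thunk can be sketched as follows. This is only the range check, not the thunk placement algorithm from the commit; the names `kBranchRange` and `is_reachable` are hypothetical. It relies on the A64 `B`/`BL` encoding, which stores a signed 26-bit immediate counted in 4-byte units, i.e. a signed 28-bit byte displacement.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, not taken from mold.
// Signed 28-bit byte displacement: [-2^27, 2^27), i.e. about ±128 MiB.
constexpr int64_t kBranchRange = int64_t(1) << 27;

// Returns true if a B/BL instruction at `branch_addr` can jump directly
// to `target_addr` without going through a thunk.
bool is_reachable(uint64_t branch_addr, uint64_t target_addr) {
  int64_t disp = (int64_t)(target_addr - branch_addr);
  // The low two bits are not encoded, so the displacement must be a
  // multiple of 4.
  if (disp % 4 != 0)
    return false;
  return -kBranchRange <= disp && disp < kBranchRange;
}
```

A linker would run a check of this kind for each branch relocation and route the branch through a nearby thunk whenever it returns false.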
Rui Ueyama
be41fdbc21 [ELF] Rename variables
`ctx.arg` should contain only variables that can directly be mapped
to command line arguments.
2022-01-20 08:35:52 +09:00
Rui Ueyama
43b2a1e423 [ELF] Do not scan .text for .relr.dyn 2022-01-17 20:40:22 +09:00
Rui Ueyama
04ba8f4cbd [ELF] Rename a function 2022-01-17 10:26:23 +09:00
Rui Ueyama
1152b6eb38 [ELF] Remove redundant assignment 2022-01-16 08:41:26 +09:00
Rui Ueyama
bd6afa1b23 [ELF] Add -pack-dyn-relocs=relr
.relr.dyn is a new section that has been implemented in other linkers
recently. That section contains only the RELATIVE-type dynamic
relocations (i.e. base relocations). Compared to the regular
.rela.dyn, a .relr.dyn's size is typically less than 1/10 because the
section is compressed.

Since PIEs (position-independent executables) tend to contain lots of
RELATIVE-type relocations and PIEs are now the default on many Linux
distributions for security reasons, .relr.dyn is more effective than
it was. It can reduce binary size by a few percent or more.

Note that runtime support is still catching up, so binaries built with
`-pack-dyn-relocs=relr` may not work on your system unless you are
running a very recent version of Linux.
2022-01-14 20:54:37 +09:00
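The commit message does not spell out how .relr.dyn is compressed. The sketch below follows the RELR layout for 64-bit targets as it is commonly documented: an even word is the address of one relocated word, and each following odd word is a bitmap (lowest bit set as a tag) whose remaining 63 bits cover the next 63 words. The function name `encode_relr` is hypothetical and this is not mold's encoder; it assumes the offsets are sorted and 8-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not taken from mold.
// Input: sorted, 8-byte-aligned offsets of words that need a RELATIVE
// relocation. Output: the corresponding .relr.dyn words.
std::vector<uint64_t> encode_relr(const std::vector<uint64_t> &offsets) {
  constexpr uint64_t word = 8;   // size of a relocated word
  constexpr uint64_t nbits = 63; // words covered by one bitmap entry
  std::vector<uint64_t> out;

  std::size_t i = 0;
  while (i < offsets.size()) {
    // Address entry: must be even so it cannot be mistaken for a bitmap.
    assert(offsets[i] % 2 == 0);
    out.push_back(offsets[i]);
    uint64_t base = offsets[i] + word;
    i++;

    // Emit bitmap entries for as long as they cover at least one offset.
    for (;;) {
      uint64_t bitmap = 0;
      while (i < offsets.size()) {
        uint64_t d = offsets[i] - base;
        if (d >= nbits * word || d % word != 0)
          break;
        bitmap |= uint64_t(1) << (d / word);
        i++;
      }
      if (bitmap == 0)
        break;
      out.push_back((bitmap << 1) | 1);
      base += nbits * word;
    }
  }
  return out;
}
```

For four consecutive relocated words starting at 0x1000, this yields two 8-byte entries (`0x1000` and the bitmap `0xf`), where .rela.dyn would need four 24-byte `Elf64_Rela` records.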
Rui Ueyama
2ee2243c63 Refactor 2022-01-13 23:53:53 +09:00
Rui Ueyama
d8dc8a6787 [ELF] Optimize relocation processing for non-alloc sections
In mold, a relocation refers to either a symbol or a section piece.
A section piece is a segment of a mergeable section such as
.rodata.str.1 or .debug_str.

Previously, we preprocessed all relocations referring to mergeable
sections to find their corresponding section pieces. We did this
because we want to find a relocation target as quickly as possible
for `gc_sections` and `scan_rels`.

However, we didn't actually have to do that for non-alloc sections,
as non-alloc sections are subject to neither `gc_sections` nor
`scan_rels`. So, we can skip relocation preprocessing for non-alloc
sections. This commit implements that optimization.

This looks like an effective optimization for programs that have
large debug info, because debug sections tend to contain a lot of
relocations referring to .debug_str, which is a mergeable section.
Here are a few notable examples.

            Output size  Before  After
  clang-14  2.2 GiB      1.455s  1.396s  (4% faster)
  mongodb   4.8 GiB      2.341s  1.925s  (17% faster)

These numbers were measured on a simulated 16-core 32-thread machine.
2022-01-13 23:22:03 +09:00
Rui Ueyama
d3cd2e780f Remove a redundant data structure 2022-01-13 20:38:43 +09:00
Rui Ueyama
d2363f1819 Fix a warning that appears when -DDEBUG is given 2022-01-13 16:51:12 +09:00
Rui Ueyama
9d27ee0839 [ELF] Skip incompatible files specified by linker script GROUP command
Fixes https://github.com/rui314/mold/issues/260
2022-01-13 10:42:13 +09:00
Rui Ueyama
e78608a37a [ELF] Rename a file
We collectively call .o and .so files input files, so we should
name this file input-files.cc.
2022-01-12 20:41:49 +09:00
Rui Ueyama
d2c57e99ca Refactor 2022-01-12 20:35:41 +09:00
Rui Ueyama
205138f16e Refactor SharedFile class
This change factors out `elf_syms` and `first_global` from SharedFile
and moves them into the base class.
2022-01-12 15:58:33 +09:00
Rui Ueyama
cafc57fa65 Simplify 2022-01-12 11:16:21 +09:00
Rui Ueyama
ce5749ce2f [ELF] Make symbol resolution deterministic
Previously, our parallel symbol resolution algorithm was not
deterministic in edge cases. As an example, consider the following two
source files:

  foo.c:
    inline void fn1() { ... }

  bar.c:
    inline void fn1() { ... }
    void fn2() { ... }

Let's say you compile these files and put them into an archive file.
If mold decided to pull out `foo.o` first for `fn1` and then `bar.o`
for `fn2`, then both `foo.o` and `bar.o` would be included in the result.
However, if mold pulled out `bar.o` first, then there would be no chance
for `foo.o` to be pulled out, so only `bar.o` would be included in the
result.

The algorithm implemented in this commit should be deterministic.
We do not override symbols when we mark live objects.
2022-01-12 11:16:21 +09:00
Rui Ueyama
9ca6a9dc5e [ELF] Add -z ibt
Fixes https://github.com/rui314/mold/issues/229
2022-01-09 12:38:38 +09:00
Rui Ueyama
31a43a7ba6 [ELF] Add -z cet-report
Fixes https://github.com/rui314/mold/issues/229
2022-01-09 12:34:32 +09:00
Rui Ueyama
778f8629eb Refactor 2022-01-09 12:34:32 +09:00
Rui Ueyama
e29bd8f42b [ELF] Add -z shstk
Fixes https://github.com/rui314/mold/issues/229
2022-01-09 12:34:24 +09:00
Rui Ueyama
fbfa01dcd1 [ELF] Implement -z ibtplt
https://github.com/rui314/mold/issues/229
2022-01-08 14:09:12 +09:00
Rui Ueyama
18367e69a8 Refactor 2022-01-08 13:24:29 +09:00
Rui Ueyama
644fdc8a2f Simplify 2022-01-07 20:42:25 +09:00
Rui Ueyama
0e17dbeda8 [ELF] Make --defsym'ed symbols absolute
If a symbol is defined in the form of --defsym=foo=0x<hexvalue>,
it should be defined as an absolute symbol with the given value.
2022-01-07 16:28:14 +09:00
Rui Ueyama
79e397ea03 Add a missing #include 2022-01-06 20:54:23 +09:00
Rui Ueyama
daa88f2f06 [ELF] Handle '[]' in glob patterns
Previously, mold crashed due to an invalid regex pattern exception
when `[...]` is given as a version script pattern.

Fixes https://github.com/rui314/mold/issues/258
2022-01-06 20:45:46 +09:00
Rui Ueyama
43fa021d48 [ELF] Do not place non-exported symbols into .gnu.hash
Previously, mold put all global symbols into .gnu.hash. Although I
believe it was not an error, it bloated the size of .gnu.hash because
.gnu.hash needs only exported symbols.

https://github.com/rui314/mold/issues/255
2022-01-06 17:47:19 +09:00
Rui Ueyama
907f713e51 [ELF] Remove NEEDS_DYNSYM flag from symbol
I wanted to make sure that a symbol X is in .dynsym if and only if
(X.is_imported || X.is_exported).
2022-01-06 15:27:39 +09:00
Rui Ueyama
be49d673a5 [ELF] Deduce emulation from input files if -m is not given 2022-01-05 19:55:32 +09:00
Rui Ueyama
a5029d19a8 [ELF] Automatically fall back to ld.bfd or ld.lld if LTO is in use
This is very hacky but highly practical, so I couldn't resist
implementing it. We should support LTO natively in the future. In the
meantime, this feature should work as a poor man's replacement.

Fixes https://github.com/rui314/mold/issues/242
2022-01-04 20:50:16 +09:00
Rui Ueyama
a2839f60dd [ELF] Improve an error message
Fixes https://github.com/rui314/mold/issues/233
2022-01-04 13:49:48 +09:00
Rui Ueyama
9894b3173b [ELF] Add --default-symver
Fixes https://github.com/rui314/mold/issues/228
2022-01-03 20:29:33 +09:00
Rui Ueyama
d100a735a0 Refactor: rename a function 2021-12-31 21:28:16 +09:00
Rui Ueyama
902f23c456 Refactor 2021-12-31 21:04:07 +09:00
Rui Ueyama
c92e192e56 [ELF] Remove redundant inline keywords 2021-12-31 21:02:06 +09:00
Rui Ueyama
440ff27e58 [ELF] Refactor 2021-12-31 21:01:59 +09:00
Rui Ueyama
085e3c3abe Simplify 2021-12-31 18:04:39 +09:00
Rui Ueyama
58e43633fc Revert "Revert "Use C++17 filesystem API""
This reverts commit f29a85f20a with a fix
for a build failure.
2021-12-31 16:39:54 +09:00
Rui Ueyama
f29a85f20a Revert "Use C++17 filesystem API"
This reverts commit f6e91df440 because
it causes build breakages for a lot of Gentoo packages.
2021-12-30 22:15:52 +09:00
Rui Ueyama
f6e91df440 Use C++17 filesystem API 2021-12-30 21:32:01 +09:00
Rui Ueyama
d49c50ddaf [ELF] Add --defsym
Fixes https://github.com/rui314/mold/issues/208
2021-12-30 14:48:39 +09:00
Rui Ueyama
f3766cda81 [ELF] Add -z {max,common}-page-size
Fixes https://github.com/rui314/mold/issues/203
2021-12-29 17:14:08 +09:00
Rui Ueyama
6e290aab3e [ELF] Implement --color-diagnostics 2021-12-25 16:55:51 +09:00
Rui Ueyama
5601cf4236 [ELF] Add -z separate-code, -z noseparate-code and -z separate-loadable-segments
Fixes https://github.com/rui314/mold/issues/172
2021-12-24 20:28:45 +09:00
Rui Ueyama
8c86c28496 Add -z nodefaultlib
Fixes https://github.com/rui314/mold/issues/184
2021-12-23 15:01:57 +09:00
Rui Ueyama
530568e662 [ELF] Use a better type 2021-12-20 20:08:25 +09:00
Rui Ueyama
455e729393 [ELF] Make version script application faster
Instead of calling regex match multiple times, create a single
regex and call regex match function only once.

https://github.com/rui314/mold/issues/156
2021-12-17 19:09:26 +09:00
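The "single regex" idea can be sketched as follows: translate each glob into regex syntax, join the results with `|`, and compile once. The helper names `glob_to_regex` and `combine_globs` are hypothetical and this is not mold's code; later entries in this log replace the regex-based approach altogether.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helpers, not taken from mold.
// Translates one glob into ECMAScript regex syntax:
// '*' becomes ".*", '?' becomes ".", and regex metacharacters are escaped.
std::string glob_to_regex(const std::string &glob) {
  std::string out;
  for (char c : glob) {
    switch (c) {
    case '*':
      out += ".*";
      break;
    case '?':
      out += '.';
      break;
    case '.': case '\\': case '+': case '(': case ')': case '[':
    case ']': case '{': case '}': case '^': case '$': case '|':
      out += '\\';
      out += c;
      break;
    default:
      out += c;
    }
  }
  return out;
}

// Builds one regex that matches if any of the globs matches.
std::regex combine_globs(const std::vector<std::string> &globs) {
  std::string pat;
  for (const std::string &g : globs) {
    if (!pat.empty())
      pat += '|';
    pat += "(?:" + glob_to_regex(g) + ")";
  }
  return std::regex(pat, std::regex::ECMAScript | std::regex::optimize);
}
```

Each symbol name is then tested with a single `std::regex_match(name, re)` call instead of one call per pattern.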
Rui Ueyama
04d5abc9eb Attempt to fix an ODR violation 2021-12-16 20:34:13 +09:00
Rui Ueyama
733bb6354f Rename variables 2021-12-11 21:40:57 +09:00
Rui Ueyama
54d75153d2 Rename variables 2021-12-07 21:55:04 +09:00
Rui Ueyama
84cd674ff1 Revert "[ELF] Rename section piece -> subsection"
This reverts commit 3fc9c6c3ed.
2021-12-07 14:44:32 +09:00
Rui Ueyama
58b74bf961 Revert "[ELF] s/SectionFragment/Subsection/g"
This reverts commit aa46df7735.
2021-12-07 14:44:11 +09:00
Rui Ueyama
22116629f1 [ELF] Add --start-lib and --end-lib
Fixes https://github.com/rui314/mold/issues/133
2021-12-06 20:09:46 +09:00
Rui Ueyama
986516545a [ELF] Do not share OutputFile between ELF and Mach-O 2021-10-26 14:06:11 +09:00
Rui Ueyama
ca9a6d0843 Move code from elf/output-file.cc to output-file.h 2021-10-05 11:37:30 +09:00
Rui Ueyama
30b45d2263 [ELF] Rename Symbol<E>::intern -> intern 2021-10-05 08:31:28 +09:00
Rui Ueyama
3fc9c6c3ed [ELF] Rename section piece -> subsection 2021-10-05 08:22:38 +09:00
Rui Ueyama
77069156cb [ELF] Rename OutputChunk -> Chunk 2021-10-04 10:16:24 +09:00
Rui Ueyama
928c39937a Refactor 2021-10-03 16:26:31 +09:00
Rui Ueyama
001cf042d9 Refactor 2021-10-03 16:01:59 +09:00
Rui Ueyama
fcd10254b7 Rename variables 2021-10-03 16:01:59 +09:00
Rui Ueyama
ebc8a68cb3 [ELF] Rename variables 2021-10-02 14:12:42 +09:00
Rui Ueyama
52c7793326 Move archive-file.cc out of elf directory 2021-09-30 23:13:27 +09:00
Rui Ueyama
7c9205f68d Move memory-mapped-file.cc out of elf directory
So that we can use the class from mold/mach-o.
2021-09-30 23:13:22 +09:00
Rui Ueyama
a2c9d0ad4d [ELF] Reduce the size of Subsection struct 2021-09-29 16:30:04 +09:00
Rui Ueyama
aa46df7735 [ELF] s/SectionFragment/Subsection/g 2021-09-29 16:18:41 +09:00
Rui Ueyama
2844f1f573 [ELF] Refactor 2021-09-28 13:23:58 +09:00
Rui Ueyama
bc2bd0e26a [ELF] Refactor 2021-09-27 23:01:51 +09:00
Rui Ueyama
c40b9aea5f [Mach-O] Generalize perf.cc so that we can use the feature in mold/mach-o 2021-09-27 18:14:56 +09:00
Rui Ueyama
541399911d [ELF] Refactor 2021-09-26 19:55:46 +09:00
Rui Ueyama
a5d02b18c7 Refactor 2021-09-25 21:47:09 +09:00
Rui Ueyama
7232178513 [Mach-O] wip 2021-09-16 14:20:38 +09:00
Rui Ueyama
150565601b Move error handlers from mold::elf to mold 2021-09-15 15:32:42 +09:00
Rui Ueyama
6ab3ddaf8e Move cleanup handlers from mold::elf to mold 2021-09-15 15:25:14 +09:00
Rui Ueyama
7ffc3a4545 [Mach-O] Add a feature to dump an executable
This is not a linker feature, but in order to learn how Mach-O
executables are constructed, I'll implement a dump feature.
I'll remove the feature once I understand the structure of Mach-O
binaries.
2021-09-13 18:16:13 +09:00
Rui Ueyama
e051ad2a9a Do not define _GNU_SOURCE
We should not depend on glibc-specific features.
2021-09-13 17:29:49 +09:00
Rui Ueyama
08b61f29d2 [ELF] Add --require-defined 2021-09-12 18:34:42 +09:00
Rui Ueyama
0a4305f82a Inline a few functions 2021-09-11 14:54:55 +09:00
Rui Ueyama
2e5f480a97 Simplify input file preloading logic 2021-09-10 20:41:20 +09:00
Rui Ueyama
8558a98bf6 Simplify 2021-09-10 16:05:45 +09:00
Rui Ueyama
68ff4a8408 Rename functions 2021-09-08 20:15:03 +09:00
Rui Ueyama
dc7e0ca4bf Remove a useless macro 2021-09-08 19:51:02 +09:00
Rui Ueyama
78bddacd5b Move target-independent files to the top directory 2021-09-08 19:49:51 +09:00
Rui Ueyama
cab0ccf0bd [Mach-O] Add a stub for Mach-O 2021-09-08 19:03:07 +09:00
Rui Ueyama
c21302e28a Do not use strerror
strerror is not guaranteed to be thread-safe.
2021-09-05 17:15:15 +09:00
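One possible thread-safe replacement is POSIX `strerror_r`, which writes into a caller-supplied buffer. The sketch below is an assumption about how this could be done, not what this commit actually does; `errno_string` and `pick_message` are hypothetical names. Because `strerror_r` returns `int` in the XSI flavour and `char *` in the GNU flavour, two overloads select the right handling at compile time.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helpers, not taken from mold.
// XSI strerror_r: returns 0 on success and fills `buf`.
inline std::string pick_message(int rc, const char *buf) {
  return rc == 0 ? buf : "unknown error";
}

// GNU strerror_r: returns a pointer to the message, which may or may
// not be `buf`.
inline std::string pick_message(const char *msg, const char *) {
  return msg;
}

// Returns a description of the error code `err` without touching any
// shared static buffer.
std::string errno_string(int err) {
  char buf[256] = {};
  return pick_message(strerror_r(err, buf, sizeof(buf)), buf);
}
```

A caller would typically capture `errno` right after the failing call and pass it in, for example `errno_string(errno)`.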
Rui Ueyama
459b5973bb Move code to elf sub-directory 2021-09-02 23:16:49 +09:00