mirror of https://github.com/rui314/mold.git synced 2024-12-27 10:23:41 +03:00

This is a note about interesting bugs that I encountered during the
development of the mold linker.

## GNU IFUNC
Problem: A statically-linked "hello world" program mysteriously
crashed in the `__libc_start_main` function, which is called just
after `_start`.

Investigation: I opened up gdb and found that the program read a
bogus value from some array. It looked like `memcpy` had failed to
copy the proper data there. After some investigation, I noticed that
`memcpy` didn't copy data at all but instead returned the address of
the `__memcpy_avx_unaligned` function, which is a real `memcpy`
implementation optimized for machines with AVX registers.

This odd issue was caused by the GNU IFUNC mechanism. That is, if a
function symbol has type `STT_GNU_IFUNC`, the function does not do
what its name suggests but instead returns a pointer to a function
that does the actual job. In this case, `memcpy` is an IFUNC function,
and it returns the address of `__memcpy_avx_unaligned`, which is a
real `memcpy` function.

IFUNC function addresses are stored in the `.got` section of an ELF
executable. The dynamic loader executes all IFUNC functions at
startup to replace their GOT entries with their return values. This
mechanism allows programs to choose the best implementation among
variants of the same function at runtime based on the machine info.

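
The resolver-plus-slot idea can be sketched in portable C without any linker support. This is only a simulation under stated assumptions: `my_memcpy_slot` stands in for a GOT entry, `resolve_slot` stands in for the work the dynamic loader does at startup, and every name here (including `cpu_has_fast_path`) is hypothetical rather than taken from glibc or mold.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Two hypothetical implementations of the same function. */
static void *copy_bytewise(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

static void *copy_with_libc(void *dst, const void *src, size_t n) {
    return memcpy(dst, src, n);
}

/* Stand-in for a CPU feature check; a real resolver would inspect the CPU. */
static int cpu_has_fast_path(void) { return 1; }

/* The "IFUNC function": it copies nothing, it only picks an implementation. */
static memcpy_fn resolve_my_memcpy(void) {
    return cpu_has_fast_path() ? copy_with_libc : copy_bytewise;
}

/* Stand-in for the GOT entry. Callers always go through this slot. */
static memcpy_fn my_memcpy_slot = NULL;

/* What the loader does at startup: run the resolver and store its
   return value in the slot. */
void resolve_slot(void) {
    my_memcpy_slot = resolve_my_memcpy();
}

/* Copies a short string through the slot; returns 0 on success. */
int demo_copy(char *out, size_t out_size) {
    static const char msg[] = "hello";
    if (out_size < sizeof msg)
        return -1;
    /* If nobody ran the resolver, the call below would jump to garbage. */
    assert(my_memcpy_slot != NULL);
    my_memcpy_slot(out, msg, sizeof msg);
    return 0;
}
```

Calling `resolve_slot()` once before `demo_copy()` mirrors the startup step; skipping it is the simulated equivalent of a GOT entry that was never rewritten.
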
If a program is statically linked, there's no dynamic loader to
rewrite the GOT entries, so libc's startup routine does that on behalf
of the dynamic loader. Concretely, the startup routine interprets all
dynamic relocations between the `__rela_iplt_start` and
`__rela_iplt_end` symbols. It is the linker's responsibility to emit
dynamic relocations for IFUNC symbols even when it is producing a
statically-linked program, and to mark the beginning and the end of
the `.rela.dyn` section with these symbols, so that the startup
routine can find the relocations.

The bug was that my linker didn't define the `__rela_iplt_start` and
`__rela_iplt_end` symbols. Since these symbols are weak, they resolve
to zero when left undefined. From the startup routine's point of view,
there were no dynamic relocations between `__rela_iplt_start` and
`__rela_iplt_end`, so the GOT entries for IFUNC symbols were left
untouched.

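
A simplified model of that startup loop shows why two zero-valued marker symbols fail silently. The `struct irel` type below is a hypothetical stand-in for a real relocation entry (which holds raw addresses and an addend rather than typed pointers), and `apply_irel` only imitates the shape of the libc routine; it is not glibc's code.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*impl_fn)(int);
typedef impl_fn (*resolver_fn)(void);

/* Hypothetical, simplified relocation entry: which slot to patch and
   which resolver to run. */
struct irel {
    impl_fn *slot;
    resolver_fn resolver;
};

int twice(int x) { return 2 * x; }
impl_fn resolve_twice(void) { return twice; }

/* Stand-in for a GOT entry that has not been resolved yet. */
impl_fn twice_slot = NULL;

/* Visit every entry in [start, end) and replace the slot with the
   resolver's return value. Returns the number of entries processed. */
size_t apply_irel(const struct irel *start, const struct irel *end) {
    size_t n = 0;
    for (const struct irel *r = start; r != end; r++) {
        *r->slot = r->resolver();
        n++;
    }
    return n;
}
```

With both markers equal to zero, `apply_irel(NULL, NULL)` processes nothing and reports no error, which matches the symptom: the slot keeps its unresolved value until something calls through it.
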
The proper fix was to emit dynamic relocations for IFUNC symbols and
define the linker-synthesized symbols. I did that, and the bug was
fixed.
## stdio buffering
Problem: A statically-linked "hello world" program prints out the
message if executed as `./hello`, but it doesn't output anything if
executed as `./hello | cat`.

Investigation: I knew that the default buffering mode for stdout is
line buffering (the buffer is flushed on every '\n'), but if stdout is
not connected to a terminal (i.e. `isatty` returns 0 for
`STDOUT_FILENO`), it automatically switches to full buffering (the
buffer is flushed when it becomes full). So, it looked like libc
failed to flush stdout on program exit for some reason.

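
The effect of full buffering can be observed on an ordinary file. The sketch below assumes a POSIX system (it relies on `fileno` and `fstat`); the helper names `bytes_written` and `demo_full_buffering` are made up for this illustration. It writes one line to a fully buffered temporary file and compares the file size the kernel reports before and after an explicit `fflush`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file behind `f` as the kernel sees it, ignoring any bytes
   still sitting in the stdio buffer. Returns -1 on error. */
long bytes_written(FILE *f) {
    struct stat st;
    if (fstat(fileno(f), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Writes a line to a fully buffered temporary file and records the file
   size before and after an explicit flush. Returns 0 on success. */
int demo_full_buffering(long *before_flush, long *after_flush) {
    static char buf[BUFSIZ];
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    if (setvbuf(f, buf, _IOFBF, sizeof buf) != 0) {
        fclose(f);
        return -1;
    }
    fputs("Hello world\n", f);          /* 12 bytes, far below BUFSIZ */
    *before_flush = bytes_written(f);   /* nothing has reached the file yet */
    fflush(f);
    *after_flush = bytes_written(f);    /* now all 12 bytes are visible */
    fclose(f);
    return 0;
}
```

If the process ended without the flush, those 12 bytes would be lost, which is the same thing that happened to the piped "hello world" output.
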
I traced all function calls using gdb and noticed that `__libc_atexit`
was not called. That function seemed to be responsible for buffer
flushing. I don't know exactly how I found the root cause, but after
spending an hour or two, I found that `__start___libc_atexit` and
`__stop___libc_atexit` had the value 0 in my linker's output, while
they marked a section containing the address of `__libc_atexit` in GNU
ld's output.

So, libc doesn't call `__libc_atexit` directly but instead calls all
function pointers between the `__start___libc_atexit` and
`__stop___libc_atexit` symbols. libc puts the address of
`__libc_atexit` in the `__libc_atexit` section, expecting the linker
to automatically create the start and end marker symbols for the
section.

There's an obscure linker feature: if a section name is valid as a C
identifier (e.g. `foo` or `_foo_bar` but not `.foo`), the linker
automatically creates marker symbols by prepending `__start_` and
`__stop_` to the section name. My linker lacked the feature.

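
The feature is easy to exercise from C. The sketch below assumes GCC or Clang targeting ELF, with a linker that implements the marker symbols (GNU ld does); the section name `my_hooks` and all function names are hypothetical. Nothing in the source defines `__start_my_hooks` or `__stop_my_hooks`, so the program links only if the linker synthesizes them.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*hook_fn)(void);

static int calls = 0;
static void flush_a(void) { calls += 1; }
static void flush_b(void) { calls += 10; }

/* Place two function pointers in a section whose name, my_hooks, is a
   valid C identifier. `used` keeps the compiler from discarding them. */
static hook_fn hook_a __attribute__((used, section("my_hooks"))) = flush_a;
static hook_fn hook_b __attribute__((used, section("my_hooks"))) = flush_b;

/* Declared but never defined here: the linker is expected to provide
   these symbols at the start and end of the my_hooks section. */
extern hook_fn __start_my_hooks[];
extern hook_fn __stop_my_hooks[];

/* Calls every function pointer stored in the section and returns the
   accumulated counter. */
int run_hooks(void) {
    for (hook_fn *p = __start_my_hooks; p != __stop_my_hooks; p++)
        (*p)();
    return calls;
}
```

A linker without the feature either rejects the undefined symbols or, when they are declared weak as in libc, leaves both at zero so that the loop runs no hooks at all.
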
I implemented the feature, and the bug was fixed.
## TLS variable initialization
Problem: A statically-linked "hello world" program crashes after
reading a thread-local variable.

Investigation: Thread-local variables are very different from other
types of variables because there may be more than one instance of the
same variable in memory. Each thread has its own copy of the
thread-local variables. The `%fs` segment register points to the end
of the variable area for the current thread, and the variables are
accessed at an offset from `%fs`.

Thread-local variables may be initialized (e.g.
`thread_local int x = 5;`). The linker gathers all thread-local
variables and puts them into the `PT_TLS` segment. At runtime, the
contents of the segment are used as an "initialization image" for new
threads. When a new thread is created, the image is memcpy'ed to the
new thread's thread-local variable area. The initialization image
itself is read-only at runtime.

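
The copy-the-image step can be modeled in plain C. This is a conceptual sketch only: `struct tls_block`, `tls_image`, and `new_thread_area` are hypothetical names, and a real runtime places the area relative to the thread pointer instead of calling `malloc`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical set of thread-local variables, laid out as one block. */
struct tls_block {
    int x;      /* models: thread_local int x = 5; */
    int count;  /* models: thread_local int count; (zero-initialized) */
};

/* The initialization image: never modified at runtime. */
static const struct tls_block tls_image = { 5, 0 };

/* What thread creation does, conceptually: allocate a fresh area and
   copy the image into it. Returns NULL if allocation fails. */
struct tls_block *new_thread_area(void) {
    struct tls_block *area = malloc(sizeof *area);
    if (area != NULL)
        memcpy(area, &tls_image, sizeof *area);
    return area;
}
```

Each call yields an independent copy that starts from the same initial values, so a write through one area never shows up in another.
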
It took me more than a day to find out that the location the memcpy
call copies the initialization image to was different from the
location where the thread-local variables reside. As a result,
thread-local variables had garbage as their initial values, and the
program crashed when using them.

The problem was that I had set a very large value (4096) as the
alignment of the `PT_TLS` segment. All `PT_LOAD` segments are
naturally aligned to the page boundary, so I used the same value for
`PT_TLS`, but that was a mistake. When a thread initialization routine
sets `%fs`, it first aligns the end address of the thread-local
variable area to the `PT_TLS` alignment value. So, if you set a large
value as the `PT_TLS` alignment, `%fs` is set to the wrong place.

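
A little arithmetic makes the mismatch concrete. The sketch follows the simplified model described above and nothing more: the runtime rounds the end of the variable area up to the `PT_TLS` alignment to obtain `%fs`, while the code addresses variables backwards from `%fs`. The function names, the start address 0x10000, and the 24-byte size are all made-up values for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to a multiple of align; align must be a power of two. */
uint64_t align_up(uint64_t n, uint64_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* Thread pointer in this model: the end of the variable area, rounded
   up to the PT_TLS alignment. */
uint64_t thread_pointer(uint64_t area_start, uint64_t memsz,
                        uint64_t p_align) {
    return align_up(area_start + memsz, p_align);
}

/* Where code that addresses variables as "%fs - memsz + offset"
   believes the variable area starts. */
uint64_t assumed_area_start(uint64_t tp, uint64_t memsz) {
    return tp - memsz;
}
```

With an 8-byte alignment, an area at 0x10000 holding 24 bytes gives a thread pointer of 0x10018, and the assumed start is 0x10000, which is where the image was copied. With 4096, the thread pointer becomes 0x11000 and the assumed start is 0x10FE8, so every access lands in memory the image never reached.
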
I fixed `PT_TLS` alignment, and the bug was gone.