2020-11-05 12:38:14 +03:00

This is a note about interesting bugs that I ran into during the
development of the mold linker.

## GNU IFUNC

Problem: A statically-linked "hello world" program mysteriously
crashed in the `__libc_start_main` function, which is called just after
`_start`.

Investigation: I opened up gdb and found that the program read a
bogus value from some array. It looked like `memcpy` had failed to copy
the proper data there. After some investigation, I noticed that `memcpy`
didn't copy data at all but instead returned the address of the
`__memcpy_avx_unaligned` function, which is a real `memcpy`
implementation optimized for machines with AVX registers.

This odd issue was caused by the GNU IFUNC mechanism. That is, if a
function symbol has type `STT_GNU_IFUNC`, the function does not do
what its name suggests but instead returns a pointer to a
function that does the actual job. In this case, `memcpy` is an IFUNC
function, and it returned the address of `__memcpy_avx_unaligned`, which
is a real `memcpy` implementation.

IFUNC function addresses are stored in the `.got` section of an ELF
executable. The dynamic loader executes all IFUNC functions at
startup and replaces their GOT entries with their return values. This
mechanism allows programs to choose the best implementation among
variants of the same function at runtime based on the machine info.

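The GOT rewriting described above can be modeled in a few lines of Python. This is only a toy model, not real loader or libc code: the `got` dictionary, the function names, and the `cpu_has_avx` flag are all made up for illustration.

```python
# Toy model of GNU IFUNC resolution. All names here are illustrative
# assumptions; nothing in this block is real loader or libc code.

def memcpy_generic(dst, src):
    """Baseline implementation: copy byte by byte."""
    for i, b in enumerate(src):
        dst[i] = b
    return dst

def memcpy_avx_unaligned(dst, src):
    """Stand-in for an optimized variant: copy with one slice assignment."""
    dst[:len(src)] = src
    return dst

def memcpy_resolver(cpu_has_avx):
    """The IFUNC function itself: it copies nothing and only returns
    the implementation that should be used on this machine."""
    return memcpy_avx_unaligned if cpu_has_avx else memcpy_generic

def resolve_ifuncs(got, cpu_has_avx):
    """What the loader does at startup: call every resolver and replace
    its GOT entry with the returned function."""
    for name in list(got):
        got[name] = got[name](cpu_has_avx)

# Before resolution the GOT entry still refers to the resolver, so a
# "call to memcpy" hands back a function instead of copying anything.
got = {"memcpy": memcpy_resolver}
unresolved_result = got["memcpy"](True)

# After resolution the same GOT entry performs the real copy.
resolve_ifuncs(got, cpu_has_avx=True)
buf = bytearray(5)
got["memcpy"](buf, b"hello")
```

The `unresolved_result` line mirrors what I saw in gdb: calling through an unresolved entry returns the address of the real implementation and leaves the destination untouched.
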
If a program is statically linked, there's no dynamic loader to
rewrite the GOT entries. Instead, libc's startup routine does that on
behalf of the dynamic loader. Concretely, the startup routine
interprets all dynamic relocations between the `__rela_iplt_start` and
`__rela_iplt_end` symbols. It is the linker's responsibility to emit
dynamic relocations for IFUNC symbols even when it is producing a
statically-linked program, and to mark the beginning and the end of
the `.rela.dyn` section with these symbols, so that the startup
routine can find the relocations.

The bug was that my linker didn't define the `__rela_iplt_start` and
`__rela_iplt_end` symbols. Since these symbols are weak, they were
resolved to zero. From the point of view of the initializer function,
there were no dynamic relocations between `__rela_iplt_start` and
`__rela_iplt_end`. That left the GOT entries for IFUNC symbols
untouched.

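The startup loop and the failure mode can be sketched as follows. Again this is a toy model under stated assumptions: addresses are replaced by list indices, and `apply_irel`, `relocs`, and the resolver are hypothetical names, not glibc code.

```python
# Toy model of the static startup routine that applies IFUNC
# relocations. Addresses are list indices and all names are
# illustrative assumptions, not glibc code.

def apply_irel(relocs, rela_iplt_start, rela_iplt_end, got):
    """Process relocations in [rela_iplt_start, rela_iplt_end): call each
    resolver and store its result in the GOT slot it names."""
    applied = 0
    for i in range(rela_iplt_start, rela_iplt_end):
        slot, resolver = relocs[i]
        got[slot] = resolver()
        applied += 1
    return applied

def real_memcpy(dst, src):
    dst[:len(src)] = src

def memcpy_resolver():
    return real_memcpy

relocs = [("memcpy", memcpy_resolver)]

# Correct linker output: the two symbols bracket the relocation table.
good_got = {"memcpy": memcpy_resolver}
good_count = apply_irel(relocs, 0, len(relocs), good_got)

# Buggy linker output: both undefined weak symbols are 0, so the range
# is empty and the GOT entry keeps pointing at the resolver.
bad_got = {"memcpy": memcpy_resolver}
bad_count = apply_irel(relocs, 0, 0, bad_got)
```

In the buggy case the loop body never runs, which is why nothing complained at startup and the crash only showed up later, at the first `memcpy` call.
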
The proper fix was to emit dynamic relocations for IFUNC symbols and
define the linker-synthesized symbols. I did that, and the bug was
fixed.

## stdio buffering

Problem: A statically-linked "hello world" program prints out the
message if executed as `./hello`, but it doesn't output anything if
executed as `./hello | cat`.

Investigation: I knew that the default buffering mode for stdout is
line buffering (the buffer is flushed on every '\n'), but if stdout is
not connected to a terminal (i.e. `isatty(3)` returns 0 for
`STDOUT_FILENO`), it automatically switches to full buffering (the
buffer is flushed when it becomes full). So, it looked like libc failed
to flush stdout on program exit for some reason.

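The effect of full buffering on a pipe is easy to reproduce. The sketch below uses Python's own I/O layer as an analogue of C stdio, so it illustrates the behavior rather than libc itself; it assumes a POSIX system (it relies on `select` working on pipes), and `pipe_has_data` is a helper made up for this example.

```python
import os
import select

def pipe_has_data(fd):
    """Return True if a read on fd would not block."""
    readable, _, _ = select.select([fd], [], [], 0)
    return bool(readable)

read_fd, write_fd = os.pipe()

# Fully buffered text stream on top of the pipe, similar in spirit to
# stdout when it is not connected to a terminal.
out = os.fdopen(write_fd, "w", buffering=4096)
is_terminal = out.isatty()               # a pipe is not a terminal

# The message is short, so it stays in the buffer.
out.write("Hello world\n")
visible_before_flush = pipe_has_data(read_fd)

# This is the step that has to happen at program exit.
out.flush()
visible_after_flush = pipe_has_data(read_fd)
data = os.read(read_fd, 100)

out.close()
os.close(read_fd)
```

If the final flush is skipped, as it was in my linker's output, the reader on the other end of the pipe never sees the message.
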
I traced all function calls using gdb and noticed that `__libc_atexit`
was not called. That function seemed to be responsible for buffer
flushing. I don't remember exactly how I found the root cause, but
after spending an hour or two, I found that `__start___libc_atexit` and
`__stop___libc_atexit` had value 0 in my linker's output, while they
marked a section containing the address of `__libc_atexit` in GNU ld's
output.

So, libc doesn't directly call `__libc_atexit` but instead calls all
function pointers between the `__start___libc_atexit` and
`__stop___libc_atexit` symbols. libc puts the address of
`__libc_atexit` in the `__libc_atexit` section, expecting that the
linker automatically creates the start and end marker symbols for the
section.

There's an obscure linker feature: if a section name is valid as a C
identifier (e.g. `foo` or `_foo_bar` but not `.foo`), the linker
automatically creates marker symbols by prepending `__start_` and
`__stop_` to the section name. My linker lacked the feature.

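The rule itself is small. Here is a minimal sketch of the linker-side logic, assuming sections are given as a name-to-(address, size) mapping; the function names and the sample layout are hypothetical and this is not mold's actual code.

```python
import re

# A section name qualifies if it is spelled like a C identifier.
C_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_c_identifier(name):
    return C_IDENT.fullmatch(name) is not None

def marker_symbols(sections):
    """sections maps a section name to (address, size).
    Returns the synthesized symbols as {symbol name: address}."""
    symbols = {}
    for name, (addr, size) in sections.items():
        if is_c_identifier(name):
            symbols["__start_" + name] = addr
            symbols["__stop_" + name] = addr + size
    return symbols

# Hypothetical layout; the addresses and sizes are made up.
sections = {
    ".text": (0x401000, 0x2000),
    "__libc_atexit": (0x404000, 8),
    ".foo": (0x405000, 16),
}
symbols = marker_symbols(sections)
```

Only `__libc_atexit` gets markers here, because `.text` and `.foo` start with a dot. The stop symbol points one past the end of the section, so code can walk the entries with a simple half-open range.
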
I implemented the feature, and the bug was fixed.

## TLS variable initialization

Problem: A statically-linked "hello world" program crashes after
reading a thread-local variable.

Investigation: Thread-local variables are very different from other
types of variables because there may be more than one instance of the
same variable in memory. Each thread has its own copy of thread-local
variables. The `%fs` segment register points to the end of the variable
area for the current thread, and the variables are accessed at an
offset from `%fs`.

Thread-local variables may be initialized (e.g.
`thread_local int x = 5;`). The linker gathers all thread-local
variables and puts them into the `PT_TLS` segment. At runtime, the
contents of the segment are used as an "initialization image" for new
threads. When a new thread is created, the image is memcpy'ed to the
new thread's thread-local variable area. The initialization image
itself is read-only at runtime.

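The relationship between the image and the per-thread copies can be modeled like this. It is a toy model: the image layout (a single 4-byte little-endian integer) and the helper name `new_thread_tls` are assumptions made for illustration.

```python
# Toy model of per-thread TLS setup. The image contents and the
# helper name are illustrative assumptions.

# Read-only initialization image, e.g. for `thread_local int x = 5;`
# stored as a 4-byte little-endian integer.
TLS_IMAGE = (5).to_bytes(4, "little")

def new_thread_tls(image):
    """Give a new thread its own writable copy of the image."""
    return bytearray(image)

thread_a = new_thread_tls(TLS_IMAGE)
thread_b = new_thread_tls(TLS_IMAGE)

# Thread A assigns x = 42; thread B and the image are unaffected.
thread_a[0:4] = (42).to_bytes(4, "little")

x_in_a = int.from_bytes(thread_a, "little")
x_in_b = int.from_bytes(thread_b, "little")
x_in_image = int.from_bytes(TLS_IMAGE, "little")
```

Every thread starts from the same image but writes only to its own copy, which is why the image can stay read-only.
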
It took more than a day to find out that the location to which the
memcpy call copies the initialization image was different from the
location where the thread-local variables reside. As a result,
thread-local variables had garbage as initial values, and the program
crashed when using them.

The problem was that I set a very large value (4096) as the alignment
of the `PT_TLS` segment. All `PT_LOAD` segments are naturally aligned to
the page boundary, so I used the same value for `PT_TLS`, but that was
a mistake. When a thread initialization routine sets a value to `%fs`,
it first aligns the end address of the thread-local variable area to
the `PT_TLS` alignment value. So, if you set a large value as the
`PT_TLS` alignment, `%fs` is set to the wrong place.

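One way such a mismatch can arise is sketched below. This is a simplified model, and it rests on an assumption that is not spelled out above: that the linker computes variable offsets from the unaligned end of the TLS block, while the runtime rounds the block size up to the `PT_TLS` alignment before placing the image below the thread pointer. All function names and numbers are hypothetical.

```python
def align_up(value, alignment):
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def linker_offset(var_offset, tls_size):
    """Offset from the thread pointer that the linker bakes into the
    code, ASSUMING it measures from the unaligned end of the block."""
    return var_offset - tls_size

def runtime_offset(var_offset, tls_size, tls_align):
    """Offset from the thread pointer where the runtime places the
    variable, ASSUMING it rounds the block size up to the alignment."""
    return var_offset - align_up(tls_size, tls_align)

tls_size = 16        # bytes of thread-local data (made-up value)
var_offset = 8       # offset of one variable inside the image

# Distance between where the code looks and where the data was copied.
mismatch_page = (linker_offset(var_offset, tls_size)
                 - runtime_offset(var_offset, tls_size, 4096))
mismatch_small = (linker_offset(var_offset, tls_size)
                  - runtime_offset(var_offset, tls_size, 16))
```

In this model a 4096-byte alignment makes the two sides disagree by 4080 bytes, while a 16-byte alignment makes them agree, which matches the symptom of reading garbage instead of the initial values.
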
I fixed `PT_TLS` alignment, and the bug was gone.