In C++, the size of an empty class is 1 rather than 0 because every
object is guaranteed to have a unique address. But if we do not need
the unique address guarantee, we can add `[[no_unique_address]]` to
a class member to save 1 byte. This is a new C++20 feature.
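As a rough illustration of the size difference (a sketch, not code from
this commit; the type names `Empty`, `Plain` and `Compact` are made up
for the example, and whether the attribute actually shrinks the struct
is left to the compiler):

```cpp
// An empty class still occupies one byte when used as an ordinary object.
struct Empty {};

// Without the attribute, `tag` needs its own address, so it takes up
// storage (plus padding) after `value`.
struct Plain {
  int value;
  Empty tag;
};

// With the attribute, the compiler is allowed to overlap `tag` with
// other members, so it may take no extra space.
struct Compact {
  int value;
  [[no_unique_address]] Empty tag;
};

static_assert(sizeof(Empty) == 1);
static_assert(sizeof(Compact) <= sizeof(Plain));
```

With GCC or Clang on x86-64, `Plain` is typically 8 bytes and `Compact`
is typically 4, so the saving can be larger than 1 byte once padding is
taken into account.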
Previously, we set the address of `_GLOBAL_OFFSET_TABLE_` to `.got.plt`
on i386 and x86-64 and to `.got` on other targets. But it looks like
this special treatment of x86 isn't necessary. The x86-64 psABI says
that the symbol can even be in the middle of `.got` (*1). If `.got.plt`
is missing, the GNU linker sets it to `.got`.
This commit unconditionally sets the symbol value to `.got`.
(*1) x86-64 psABI 1.0 p.77: "The symbol _GLOBAL_OFFSET_TABLE_ may
reside in the middle of the .got section, allowing both negative and
non-negative offsets into the array of addresses."
Despite its comment, gcc sometimes failed to optimize the loop into a
single `bswap` instruction. So it looks like we should explicitly use
`__builtin_bswap` functions.
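The two approaches can be compared with a minimal sketch like the one
below. The function names are hypothetical and this is not the code
changed by this commit; `__builtin_bswap32` is the GCC/Clang builtin
for 32-bit values.

```cpp
#include <cassert>
#include <cstdint>

// Byte-by-byte loop. The compiler may or may not recognize this
// pattern and turn it into a single `bswap` instruction.
inline uint32_t bswap32_loop(uint32_t x) {
  uint32_t r = 0;
  for (int i = 0; i < 4; i++) {
    r = (r << 8) | (x & 0xff);
    x >>= 8;
  }
  return r;
}

// Explicit builtin. This does not depend on pattern recognition.
inline uint32_t bswap32_builtin(uint32_t x) {
  return __builtin_bswap32(x);
}

// Small self-check: both versions must reverse the byte order.
inline void check_bswap32() {
  assert(bswap32_loop(0x11223344) == 0x44332211);
  assert(bswap32_builtin(0x11223344) == 0x44332211);
}
```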
Usually, .init_array is readable and writable, but it looks like rustc
with `-C lto=fat` creates a read-only .init_array. mold used to create
two output .init_array sections, one for read/write and one for
read-only .init_array input sections. As a result, one of them was not
executed on startup.
Now, mold creates a read/write .init_array output section even if an
input .init_array section is read-only.
Fixes https://github.com/rui314/mold/issues/363
We used to emit the following sequence for each PLT entry.
0xf3, 0x0f, 0x1e, 0xfa, // endbr64
0xff, 0x25, 0, 0, 0, 0, // jmp *foo@GOTPLT
0xe8, 0, 0, 0, 0, // call PLT[0]
In the above instruction sequence, `jmp` was expected to jump to the
following `call` instruction if the PLT entry is executed for the
first time. However, it wouldn't work because we needed another
`endbr64` before `call`.
This problem should be fixed in this commit. Now, the PLT header is
32 bytes long and each entry is 16 bytes.
The IBT-enabled PLT section is sparse, as 2/3 of the section contents
are NOP instructions. In this patch, I implemented alternative
instruction sequences for IBTPLT, so that each IBTPLT entry occupies
only 16 bytes instead of 32 bytes.
In return, our new IBTPLT section needs a 64-byte header instead of a
16-byte one, but it's overall a good deal because we usually have many
PLT entries.
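The trade-off can be checked with simple arithmetic. The sketch below
uses only the header and entry sizes stated above; the function names
are made up for the example.

```cpp
#include <cstdint>

// Old layout: 16-byte header, 32 bytes per entry.
constexpr uint64_t old_ibtplt_size(uint64_t num_entries) {
  return 16 + 32 * num_entries;
}

// New layout: 64-byte header, 16 bytes per entry.
constexpr uint64_t new_ibtplt_size(uint64_t num_entries) {
  return 64 + 16 * num_entries;
}

// Break-even at 3 entries (112 bytes each); smaller from 4 entries on.
static_assert(old_ibtplt_size(3) == new_ibtplt_size(3));
static_assert(new_ibtplt_size(4) < old_ibtplt_size(4));

// With 1000 entries the section is roughly half the size.
static_assert(old_ibtplt_size(1000) == 32016);
static_assert(new_ibtplt_size(1000) == 16064);
```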
When we relax R_X86_64_TLSGD or R_X86_64_TLSLD, we rewrite two
instructions (the one referred to by TLSGD/TLSLD and the following one
referred to by the following relocation). The second instruction is
usually 5 bytes, but it can be longer than that, and if that's the
case, we need to emit a nop to fill the gap at the end of the longer
instruction.
We didn't do that. As a result, the remaining garbage of the second
instruction was executed and caused an unpredictable result (illegal
instruction or segv).
This patch fixes the issue.
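The padding step described above can be sketched as follows. This is a
simplified illustration, not the code in this patch: the helper name
and its parameters are hypothetical, and it assumes the replacement is
never longer than the instruction it overwrites. 0x90 is the one-byte
x86 `nop`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Overwrite an instruction of `old_len` bytes at `loc` with a
// replacement of `insn_len` bytes, then fill any leftover bytes with
// NOPs so that no stale bytes of the old instruction get executed.
inline void overwrite_with_nop_padding(uint8_t *loc, size_t old_len,
                                       const uint8_t *insn,
                                       size_t insn_len) {
  assert(insn_len <= old_len);
  std::memcpy(loc, insn, insn_len);
  std::memset(loc + insn_len, 0x90, old_len - insn_len);
}
```

For example, if the original instruction is 6 bytes and the replacement
is 5 bytes, the last byte becomes a `nop` instead of a leftover byte of
the old encoding.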
Fixes https://github.com/rui314/mold/issues/360