mirror of
https://github.com/rui314/mold.git
synced 2024-09-20 09:27:45 +03:00
213 lines
9.4 KiB
C++
// This file contains helper functions for thread-local storage (TLS).
// TLS is probably the most obscure feature the linker has to support,
// so I'll explain it in detail in this comment.
//
// TLS is per-thread storage. Thread-local variables (TLVs) live in TLS
// so that each thread has its own set of them. Taking the address of a
// TLV returns a unique value for each thread. For example, `&foo` for the
// following code returns different pointer values for different threads.
//
//   thread_local int foo;
//
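// As a standalone illustration (a sketch written for this comment, not
// code that mold compiles), the following program prints `&foo` from the
// main thread and from a child thread. Both threads are alive at the same
// time, so they have distinct TLS blocks and the two printed addresses
// should differ:
//
//   #include <cstdio>
//   #include <thread>
//
//   thread_local int foo;
//
//   int main() {
//     std::printf("main:  %p\n", (void *)&foo);
//     std::thread t([] { std::printf("child: %p\n", (void *)&foo); });
//     t.join();
//   }
//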
// TLV is a relatively new feature. C, for example, didn't officially
// support it until C11, which introduced the `_Thread_local` keyword
// (`thread_local` is provided as a macro by <threads.h>). TLVs need
// coordination between the compiler, the linker and the runtime to work
// correctly.
//
// An ELF executable or a shared library using TLVs contains a "TLS
// template image" in the PT_TLS segment. For each newly created thread,
// including the initial one, the runtime allocates a contiguous region of
// memory for the TLS template images of the executable and its dependent
// shared libraries, and copies the template images there. That per-thread
// memory is called the "TLS block". After allocating and initializing a
// TLS block, the runtime sets a register to refer to the TLS block, so
// that the thread-local variables are accessible relative to the register.
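//
// The copy step can be sketched as follows for one module's part of a
// new TLS block. This is a minimal sketch that ignores alignment, and
// `init_tls_area` is a hypothetical name, not a real runtime function.
// It relies on the PT_TLS layout: initialized TLVs (.tdata) occupy the
// first p_filesz bytes, and zero-initialized ones (.tbss) make up the
// rest of p_memsz.
//
//   #include <cstring>
//   #include <elf.h>
//
//   void init_tls_area(char *dst, const char *image, const Elf64_Phdr &tls) {
//     std::memcpy(dst, image, tls.p_filesz);               // .tdata
//     std::memset(dst + tls.p_filesz, 0,
//                 tls.p_memsz - tls.p_filesz);             // .tbss
//   }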
//
// The register referring to the per-thread storage is called the Thread
// Pointer (TP). TP is part of the thread's context. When the kernel
// scheduler switches threads, TP is saved and restored automatically just
// like other registers are.
//
// The TLS template image is read-only. It contains TLVs' initial values
// for new threads, and no one writes to it at runtime.
//
// Now, let's think about how to access a TLV. To access it, we need to
// know its address, which can be computed in various ways as follows:
//
// 1. If we are creating an executable, we know the exact size of the TLS
//    template image we are creating, and we know where the TP will be
//    set to after the template is copied to the TLS block. Therefore,
//    the TP-relative address of a TLV in the main executable is known at
//    link-time. That means computing a TLV's address can be as easy as
//    `add %dst, %tp, <link-time constant>`.
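//
//    Expressed in C++ terms, the generated code behaves roughly like this
//    sketch, where `read_tp()` is a hypothetical helper that returns TP as
//    a `char *` and `foo_tpoff` stands for the link-time constant:
//
//      int *p = (int *)(read_tp() + foo_tpoff);  // no memory load needed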
//
// 2. If we are creating a shared library, we don't know exactly where
//    its TLS template image will be copied to in terms of the
//    TP-relative address, because we don't know how large the main
//    executable's and other libraries' TLS template images are. Only the
//    runtime knows the exact TP-relative address.
//
//    We can solve the problem with an indirection. Specifically, for
//    each TLV whose TP-relative address is only known at process startup
//    time, we create a GOT entry to store its TP-relative address. We
//    also emit a dynamic relocation to let the runtime fill the GOT
//    entry with a per-TLV TP-relative address.
//
//    Computing a TLV address in this scheme needs at least two machine
//    instructions; the first one loads a value from a GOT entry, and the
//    second adds the loaded value to TP.
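//
//    In the same sketch style (`read_tp()` is a hypothetical helper that
//    returns TP as a `char *`, and `foo_got_tpoff` stands for the GOT
//    entry the runtime fills at startup):
//
//      int *p = (int *)(read_tp() + foo_got_tpoff);  // one load, one add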
//
// 3. Now, think about libraries that you dynamically load with dlopen.
//    Even the runtime doesn't know how many bytes have to be reserved for
//    such libraries, so TLVs in dlopen'ed libraries cannot be allocated
//    in the initial TLS block.
//
//    The address of such a TLV is obtained by calling a libc-provided
//    function, __tls_get_addr(). The function takes a pointer to a pair
//    of values: a module ID to identify the ELF file and the TLV's offset
//    within the ELF file's TLS template image. Accessing a TLV is
//    sometimes compiled to a function call! The module ID and the offset
//    are usually stored in the GOT as two consecutive words.
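//
//    On x86-64, for example, glibc declares the structure and the function
//    roughly as follows (other targets and libcs may differ in detail;
//    `foo_tls_index` is a hypothetical name for the two-word GOT entry):
//
//      struct tls_index {
//        unsigned long ti_module;  // module ID
//        unsigned long ti_offset;  // offset within the TLS template image
//      };
//      extern "C" void *__tls_get_addr(tls_index *ti);
//
//      int *p = (int *)__tls_get_addr(&foo_tls_index);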
//
// The last access method is the most generic, so the compiler emits such
// code by default for position-independent code. But it's also the most
// expensive one, so the linker rewrites instructions when possible so
// that 3) is relaxed to 2) or even to 1).
//
// 1) is often called the Local Exec access model. 2) is Initial Exec, and
// 3) is called General Dynamic.
//
// There's also a trick that the compiler can use if it knows two TLVs are
// in the same ELF file (usually in the same file as the code is). In this
// case, we can call __tls_get_addr() only once with a module ID and the
// offset 0 to obtain the base address of the ELF file's TLS block.
// The base address obtained this way is sometimes called Dynamic Thread
// Pointer or DTP. We can then compute TLVs' addresses by adding their
// DTP-relative addresses to DTP. This access model is called Local
// Dynamic.
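//
// A sketch of Local Dynamic with hypothetical names: `mod_index` is a
// two-word GOT entry holding the current module's ID and offset 0, and
// `a_dtpoff` and `b_dtpoff` are the link-time DTP-relative offsets of two
// TLVs `a` and `b` defined in the same module:
//
//   char *dtp = (char *)__tls_get_addr(&mod_index);  // the only call
//   int *a = (int *)(dtp + a_dtpoff);
//   int *b = (int *)(dtp + b_dtpoff);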
//
//
// === TLS Descriptor access model ===
//
// As described above, there are arguably too many different TLS access
// models, from the most generic one you can use in any ELF file to the
// most efficient one you can use only in the main executable. Compiling
// source code with an appropriate TLS access model is bothersome.
// To solve the problem, a new TLS access model was proposed. It is
// called the TLS Descriptor (TLSDESC) model.
//
// For a TLV compiled with TLSDESC, we allocate two consecutive GOT slots
// and create a TLSDESC dynamic relocation for them. The dynamic linker
// stores a function pointer in the first GOT slot and its argument in the
// second slot.
//
// To access the TLV, we call the function pointer with the argument we
// read from the second GOT slot. The function returns the TLV's
// TP-relative address.
//
// The runtime chooses the best access method depending on the situation
// and stores a pointer to the most efficient code in the first GOT slot.
// For example, if a TLV's TP-relative address is known at process startup
// time, the runtime stores that address in the second GOT slot and a
// function that just returns its argument in the first GOT slot.
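//
// A rough C++ sketch of the mechanism (all names are hypothetical, and
// `read_tp()` stands for reading TP as a `char *`). Real resolver
// functions are written in assembly with a special calling convention;
// they receive the descriptor's address and load the second slot
// themselves, which is what this sketch mimics:
//
//   #include <cstdint>
//
//   struct TlsDesc {
//     intptr_t (*fn)(TlsDesc *);  // first GOT slot
//     intptr_t arg;               // second GOT slot
//   };
//
//   // What the runtime might install when the TP-relative address is
//   // known at process startup.
//   intptr_t return_arg(TlsDesc *desc) { return desc->arg; }
//
//   // The access code is the same whichever function was installed.
//   int *p = (int *)(read_tp() + foo_desc.fn(&foo_desc));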
//
// With TLSDESC, the compiler can always emit the same code for TLVs
// without sacrificing runtime performance.
//
// TLSDESC is better than the traditional, non-TLSDESC TLS access models.
// It's the default on ARM64, but on other targets, TLSDESC is
// unfortunately either optional or not supported at all. So we still
// need to support both the traditional TLS models and the TLSDESC model.

#include "mold.h"

namespace mold::elf {

template <typename E>
static ElfPhdr<E> *get_tls_segment(Context<E> &ctx) {
  if (ctx.phdr)
    for (ElfPhdr<E> &phdr : ctx.phdr->phdrs)
      if (phdr.p_type == PT_TLS)
        return &phdr;
  return nullptr;
}

template <typename E>
u64 get_tls_begin(Context<E> &ctx) {
  if (ElfPhdr<E> *phdr = get_tls_segment(ctx))
    return phdr->p_vaddr;
  return 0;
}

// Returns the TP address, which can be used for efficient TLV accesses in
// the main executable. At runtime, TP refers to a per-thread TLS block
// whose address is not known at link-time, so the address returned from
// this function is the value TP would have if the TLS template image
// itself were the TLS block.
template <typename E>
u64 get_tp_addr(Context<E> &ctx) {
  ElfPhdr<E> *phdr = get_tls_segment(ctx);
  if (!phdr)
    return 0;

  // On x86, SPARC and s390x, TP (%gs on i386, %fs on x86-64, %g7 on SPARC
  // and %a0/%a1 on s390x) refers past the end of the TLS block for
  // historical reasons. TLVs are accessed with negative offsets from TP.
  if constexpr (is_x86<E> || is_sparc<E> || is_s390x<E>)
    return align_to(phdr->p_vaddr + phdr->p_memsz, phdr->p_align);

  // On ARM, SH4 and Alpha, the runtime inserts two words before the TLS
  // template image when copying TLVs to the TLS block, so we need to
  // offset TP accordingly.
  if constexpr (is_arm<E> || is_sh4<E> || is_alpha<E>)
    return align_down(phdr->p_vaddr - sizeof(Word<E>) * 2, phdr->p_align);

  // On PPC and m68k, TP is 0x7000 (28 KiB) past the beginning of the TLS
  // block to maximize the addressable range for load/store instructions
  // with 16-bit signed immediates. It's not exactly 0x8000 (32 KiB) off
  // because there's a small implementation-defined piece of data before
  // the TLS block, and the runtime wants to access it efficiently too.
  if constexpr (is_ppc<E> || is_m68k<E>)
    return phdr->p_vaddr + 0x7000;

  // RISC-V just uses the beginning of the main executable's TLS block as
  // TP. RISC-V load/store instructions usually take 12-bit signed
  // immediates, so the beginning of the TLS block ± 2 KiB is accessible
  // with a single load/store instruction.
  assert(is_riscv<E>);
  return phdr->p_vaddr;
}
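
// A worked example with made-up numbers: if the PT_TLS segment has
// p_vaddr = 0x1000, p_memsz = 0x24 and p_align = 16, and a TLV `foo` is
// at 0x1008, get_tp_addr() returns the following values.
//
//   x86-64:  align_to(0x1000 + 0x24, 16)     = 0x1030  (foo at TP - 0x28)
//   ARM64:   align_down(0x1000 - 8 * 2, 16)  = 0xff0   (foo at TP + 0x18)
//   PPC64:   0x1000 + 0x7000                 = 0x8000  (foo at TP - 0x6ff8)
//   RISC-V:  0x1000                                    (foo at TP + 0x8)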

// Returns the address that __tls_get_addr would return if called with
// offset 0.
template <typename E>
u64 get_dtp_addr(Context<E> &ctx) {
  ElfPhdr<E> *phdr = get_tls_segment(ctx);
  if (!phdr)
    return 0;

  // On PPC and m68k, DTP-relative offsets (R_DTPOFF) are computed relative
  // to the address 0x8000 (32 KiB) past the start of the TLS block. The
  // bias maximizes the accessible range for load/store instructions with
  // 16-bit signed immediates. That is, if DTP pointed right at the start
  // of the TLS block, half of the addressable space (negative immediates)
  // would be wasted.
  if constexpr (is_ppc<E> || is_m68k<E>)
    return phdr->p_vaddr + 0x8000;

  // On RISC-V, the bias is 0x800 as the load/store instructions in the
  // ISA usually have a 12-bit immediate.
  if constexpr (is_riscv<E>)
    return phdr->p_vaddr + 0x800;

  // On other targets, DTP simply refers to the beginning of the TLS block.
  return phdr->p_vaddr;
}
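
// A worked example with made-up numbers: if the PT_TLS segment starts at
// p_vaddr = 0x1000 and a TLV `foo` is at 0x1008, get_dtp_addr() returns
// the following values, and foo's DTP-relative offset is foo - DTP.
//
//   PPC, m68k:  0x1000 + 0x8000 = 0x9000  (foo at DTP - 0x7ff8)
//   RISC-V:     0x1000 + 0x800  = 0x1800  (foo at DTP - 0x7f8)
//   Others:     0x1000                    (foo at DTP + 0x8)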

using E = MOLD_TARGET;

template u64 get_tls_begin<E>(Context<E> &);
template u64 get_tp_addr<E>(Context<E> &);
template u64 get_dtp_addr<E>(Context<E> &);

} // namespace mold::elf