Update a document

2024-07-15 00:30:25 +03:00 · 2022-02-22 19:26:08 +09:00 · 2022-02-22 19:26:08 +09:00 · fb6967c0cf
commit fb6967c0cf
parent 385c1c494d
1 changed files with 120 additions and 120 deletions
--- a/docs/glossary.md
+++ b/docs/glossary.md
@ -1,158 +1,158 @@
-The concept of linking is very simple: a compiler compiles a piece of
+The very concept of linking is simple: a compiler compiles a piece of
 source code into an object file (a file containing machine code), and
 a linker combines object files into a single executable or a shared
 library file. However, the actual implementation of the linker for
 modern systems is much more complicated because hardware, operating
 system, compiler and linker all have many more features.

-In this file, I'll explain random topics that you need to understand
-to read mold code in the glossary format.
+In this file, I'll explain random topics in the glossary format that
+you need to understand to read mold code.

- DSO
+## DSO

-  A .so file. Short for Dynamic Shared Object. Often called as a
-  shared library, a dynamic libray or a shared object as well.
+A .so file. Short for Dynamic Shared Object. Often called as a
+shared library, a dynamic libray or a shared object as well.

-  An DSO contains common functions and data that are used by multiple
-  executables and/or other DSOs. At runtime, a DSO is loaded to a
-  contiguous region in the virtual address.
+An DSO contains common functions and data that are used by multiple
+executables and/or other DSOs. At runtime, a DSO is loaded to a
+contiguous region in the virtual address.

- Object file
+## Object file

-  A .o file. An object file contains machine code and data, but it
-  cannot be executed because it's not self-contained. For example,
-  if you compile a C source file containing a call of `printf`,
-  the actual function code of `printf` is not included in the resulting
-  object file. You include `stdio.h`, but that teaches the compiler
-  only about `printf`'s type, and the compiler still don't know what
-  `printf` actually does. Therefore, it cannot emit code for `printf`.
+A .o file. An object file contains machine code and data, but it
+cannot be executed because it's not self-contained. For example,
+if you compile a C source file containing a call of `printf`,
+the actual function code of `printf` is not included in the resulting
+object file. You include `stdio.h`, but that teaches the compiler
+only about `printf`'s type, and the compiler still don't know what
+`printf` actually does. Therefore, it cannot emit code for `printf`.

-  You need to link an object file with other object file or a shared
-  library to make it exectuable.
+You need to link an object file with other object file or a shared
+library to make it exectuable.

- Virtual address space
+## Virtual address space

-  A pointer has a value like 0x803020 which is an address of the
-  pointee. But it doesn't mean that the pointee resides at the
-  physical memory address 0x803020 on the computer. Modern CPUs
-  contains so-called Mmeory Management Unit (MMU), and all access to
-  the memory are first translated by MMU to the physical address.
-  The address before translation is called the "virtual address".
-  Unless you are doing the kernel programming, all addresses you
-  handle are virtual addresses.
+A pointer has a value like 0x803020 which is an address of the
+pointee. But it doesn't mean that the pointee resides at the
+physical memory address 0x803020 on the computer. Modern CPUs
+contains so-called Mmeory Management Unit (MMU), and all access to
+the memory are first translated by MMU to the physical address.
+The address before translation is called the "virtual address".
+Unless you are doing the kernel programming, all addresses you
+handle are virtual addresses.

-  The OS kernel controls the MMU so that each process owns the entire
-  virtual address space. So, even if two process uses the same virtual
-  address, they don't conflict. They are mapped to different physical
-  addresses.
+The OS kernel controls the MMU so that each process owns the entire
+virtual address space. So, even if two process uses the same virtual
+address, they don't conflict. They are mapped to different physical
+addresses.

-  The existence of MMU has several implications to the linker. First,
-  we can link the main executable to a specific address. On process
-  startup, there's no code or data in the virtual address space, so
-  the mapping of the main executable always succeed. However, it's not
-  true to DSOs because they are loaded after the main executable and
-  possibly other DSOs. Therefore, shared libraries must be linked in a
-  way that they can be loaded to any address in the virtual address
-  space.
+The existence of MMU has several implications to the linker. First,
+we can link the main executable to a specific address. On process
+startup, there's no code or data in the virtual address space, so
+the mapping of the main executable always succeed. However, it's not
+true to DSOs because they are loaded after the main executable and
+possibly other DSOs. Therefore, shared libraries must be linked in a
+way that they can be loaded to any address in the virtual address
+space.

- Relocation
+## Relocation

-  A piece of information for the linker as to how to link object files
-  or a dynamic objects.
+A piece of information for the linker as to how to link object files
+or a dynamic objects.

-  Object files can refer functions or data in other object files. For
-  example, if you compile a function which calls a non-local function
-  `foo`, the resulting code contains something like this:
+Object files can refer functions or data in other object files. For
+example, if you compile a function which calls a non-local function
+`foo`, the resulting code contains something like this:

-    ```
-      26:   e8 00 00 00 00          callq  2b <bar+0xb>
-                            27: R_X86_64_PLT32      foo-0x4
-    ```
+```
+  26:   e8 00 00 00 00          callq  2b <bar+0xb>
+                        27: R_X86_64_PLT32      foo-0x4
+```

-  The above `callq` is the instruction to call a function at the
-  machine code level. It's opcode is `0xe8` in x86-64, so the
-  instruction begins with `0xe8`. The following four bytes are
-  displacement; that is, the address of the branch target relative to
-  the end of this `callq` instruction. Notice that the displacement is
-  0. The compiler couldn't fill the displacement because it has no
-  idea as to where `foo` will be at runtime. So, the compiler write 0
-  as a placeholder and instead write a relocation `R_X86_64_PLT32`
-  with `foo` as its associated symbol. The linker reads this
-  relocation, computes the offsets between this call instruction and
-  function `foo` and overwrite the placeholder value 0 with an actual
-  displacement.
+The above `callq` is the instruction to call a function at the
+machine code level. It's opcode is `0xe8` in x86-64, so the
+instruction begins with `0xe8`. The following four bytes are
+displacement; that is, the address of the branch target relative to
+the end of this `callq` instruction. Notice that the displacement is
+0. The compiler couldn't fill the displacement because it has no
+idea as to where `foo` will be at runtime. So, the compiler write 0
+as a placeholder and instead write a relocation `R_X86_64_PLT32`
+with `foo` as its associated symbol. The linker reads this
+relocation, computes the offsets between this call instruction and
+function `foo` and overwrite the placeholder value 0 with an actual
+displacement.

-  There are many different types of relocations. For example, if you
-  want to fix up not with a displacement but with an absolute address
-  of a symbol, you need to use `R_X86_64_ABS64` instead.
+There are many different types of relocations. For example, if you
+want to fix up not with a displacement but with an absolute address
+of a symbol, you need to use `R_X86_64_ABS64` instead.

- Static library
+## Static library

-  A .a file. Often called as an archive file or just archive as well.
+A .a file. Often called as an archive file or just archive as well.

-  A static library is a container just like tar or zip. Actually,
-  there's no technical reason to not use tar or (uncompressed) zip,
-  but traditionally the .a file format is used by the linker.
+A static library is a container just like tar or zip. Actually,
+there's no technical reason to not use tar or (uncompressed) zip,
+but traditionally the .a file format is used by the linker.

-  A static library contains object files and can be passed to the
-  linker along with other object files and/or archives.
+A static library contains object files and can be passed to the
+linker along with other object files and/or archives.

-  A linker pulls out object files from an archive only if it is needed
-  to resolve undefined symbols. In other words, object files in an
-  archive are not linked by default and used as a complement to supply
-  missing definitions. This is ideal for a library because you don't
-  want to link library functions unless you are actually using them.
+A linker pulls out object files from an archive only if it is needed
+to resolve undefined symbols. In other words, object files in an
+archive are not linked by default and used as a complement to supply
+missing definitions. This is ideal for a library because you don't
+want to link library functions unless you are actually using them.

-  Contrary to archive files, object files directly given to a linker
-  are always linked to the output.
+Contrary to archive files, object files directly given to a linker
+are always linked to the output.

-  To maximize the benefit of archive files, a library often used as a
-  static library is broken down to small files to separate each
-  function individually (for example, look at
-  https://git.musl-libc.org/cgit/musl/tree/src/stdio). By doing this,
-  you import only used functions.
+To maximize the benefit of archive files, a library often used as a
+static library is broken down to small files to separate each
+function individually (for example, look at
+https://git.musl-libc.org/cgit/musl/tree/src/stdio). By doing this,
+you import only used functions.

-  A static file is created by `ar`, whose command line arguments are
-  similar to `tar`. A static library contains the symbol table which
-  offers a quick way to look up an object file for a defined symbol,
-  but mold does not use the static library's symbol table. mold
-  doesdn't need a symbol table to exist in an archive, and if exists,
-  mold just ignores it.
+A static file is created by `ar`, whose command line arguments are
+similar to `tar`. A static library contains the symbol table which
+offers a quick way to look up an object file for a defined symbol,
+but mold does not use the static library's symbol table. mold
+doesdn't need a symbol table to exist in an archive, and if exists,
+mold just ignores it.

-  See also: DSO (dynamic library)
+See also: DSO (dynamic library)

- Symbol
+## Symbol

-  A symbol is a label assigned to a specific location in an input file
-  or an output file. For example, if you define function `foo` and
-  compile it, the resulting object file contains a symbol `foo`
-  pointing to the beginning of the machine code for `foo`.
+A symbol is a label assigned to a specific location in an input file
+or an output file. For example, if you define function `foo` and
+compile it, the resulting object file contains a symbol `foo`
+pointing to the beginning of the machine code for `foo`.

-  Usually, a symbol name is a function or a variable name. If an
-  object is anonymous (such the one for a string literal), a compiler
-  generated a unique symbol, which often starts with `.` to avoid
-  conflict with user-defined symbols.
+Usually, a symbol name is a function or a variable name. If an
+object is anonymous (such the one for a string literal), a compiler
+generated a unique symbol, which often starts with `.` to avoid
+conflict with user-defined symbols.

-  For C++, symbol name is a complex "mangled" name. We need to mangle
-  identifiers because a simple name such as `foo` cannot be uniquely
-  identify a function or a data in C++, because for example `foo` may
-  be in a namespace or defined as a static member in some class. If
-  `foo` is an overloaded function, we need to distinguish different
-  `foo`s by its type. Therefore, C++ compiler mangles an identifier by
-  appending nmaepsace names, type information and such so that
-  different things get different names.
+For C++, symbol name is a complex "mangled" name. We need to mangle
+identifiers because a simple name such as `foo` cannot be uniquely
+identify a function or a data in C++, because for example `foo` may
+be in a namespace or defined as a static member in some class. If
+`foo` is an overloaded function, we need to distinguish different
+`foo`s by its type. Therefore, C++ compiler mangles an identifier by
+appending nmaepsace names, type information and such so that
+different things get different names.

-  For example, a function `int foo(int)` in a namespace `bar` is
-  mangled as `_ZN3bar3fooEi`.
+For example, a function `int foo(int)` in a namespace `bar` is
+mangled as `_ZN3bar3fooEi`.

-  A symbol can be either defined or undefined. A defined symbol points
-  to some location in a file which may contain the function's machine
-  code or the variable's initial value. An undefined symbol does not
-  point to anywhere. It needs to be merged with a defined symbol with
-  the same name at link-time. This merging process is called "name
-  resolution".
+A symbol can be either defined or undefined. A defined symbol points
+to some location in a file which may contain the function's machine
+code or the variable's initial value. An undefined symbol does not
+point to anywhere. It needs to be merged with a defined symbol with
+the same name at link-time. This merging process is called "name
+resolution".

-  For example, if your program is using `printf`, it usually contains
-  `printf` as an undefined symbol. You need to link it with `libc.a`
-  or `libc.so`, which contain a defined symbol of `printf`, to make a
-  complete program.
+For example, if your program is using `printf`, it usually contains
+`printf` as an undefined symbol. You need to link it with `libc.a`
+or `libc.so`, which contain a defined symbol of `printf`, to make a
+complete program.