This isn't needed for Process / Thread as they only reference it
by pointer and it's already part of Kernel/Forward.h. So just include
it where the implementation needs to call it.
To add a new per-CPU data structure, add an ID for it to the
ProcessorSpecificDataID enum.
Then call ProcessorSpecific<T>::initialize() when you are ready to
construct the per-CPU data structure on the current CPU. It can then
be accessed via ProcessorSpecific<T>::get().
This patch replaces the existing hard-coded mechanisms for Scheduler
and MemoryManager per-CPU data structure.
This enables further work on implementing KASLR by adding relocation
support to the pre-kernel and updating the kernel to be less dependent
on specific virtual memory layouts.
Despite what the declaration would have us believe these are not "u8*".
If they were we wouldn't have to use the & operator to get the address
of them and then cast them to "u8*"/FlatPtr afterwards.
Depending on the values it might be difficult to figure out whether a
value is decimal or hexadecimal. So let's make this more obvious. Also
this allows copying and pasting those numbers into GNOME calculator and
probably also other apps which auto-detect the base.
The entire process is not needed, just require the user to pass in the
Space. Also provide no_lock variant to use when you already have the
VM/Space lock acquired, to avoid unnecessary recursive spinlock
acquisitions.
The non CPU specific code of the kernel shouldn't need to deal with
architecture specific registers, and should instead deal with an
abstract view of the machine. This allows us to remove a variety of
architecture specific ifdefs and helps keep the code slightly more
portable.
We do this by exposing the abstract representation of instruction
pointer, stack pointer, base pointer, return register, etc on the
RegisterState struct.
This switches tracking CPU usage to more accurately measure time in
user and kernel land using either the TSC or another time source.
This will also come in handy when implementing a tickless kernel mode.
This implements a simple bootloader that is capable of loading ELF64
kernel images. It does this by using QEMU/GRUB to load the kernel image
from disk and pass it to our bootloader as a Multiboot module.
The bootloader then parses the ELF image and sets it up appropriately.
The kernel's entry point is a C++ function with architecture-native
code.
Co-authored-by: Liav A <liavalb@gmail.com>
The kernel doesn't currently boot when using an address other than
0xc0000000 because the page tables aren't set up properly for that
but this at least lets us build the kernel.
The 32-bit boot code jumps to 0xc0000000 + entry address once page
tables are set up. This is unnecessary for 64-bit mode because we'll
do another far jump just moments later.
By moving the PhysicalPage classes out of the kernel heap into a static
array, one for each physical page, we can avoid the added overhead and
easily find them by indexing into an array.
This also wraps the PhysicalPage into a PhysicalPageEntry, which allows
us to re-use each slot with information where to find the next free
page.
We already use PAE for the NX bit, but this changes the PhysicalAddress
structure to be able to hold 64 bit physical addresses. This allows us
to use all the available physical memory.
Because the remainder variable will always be 0 unless a fault happened
we should not use it to decide if we have nothing left to memset when
finishing the fast path. This caused not all bytes to be zeroed
if the size was not an exact multiple of sizeof(size_t).
Fixes#8352
When retrieving and setting x86 MSRs two registers are required. The
existing setter and getter for the MSR class made this implementation
detail visible to the caller. This changes the setter and getter to
use u64 instead.
Right now we're using the FS segment for our per-CPU struct. On x86_64
there's an instruction to switch between a kernel and usermode GS
segment (swapgs) which we could use.
This patch doesn't update the rest of the code to use swapgs but it
prepares for that by using the GS segment instead of the FS segment.
The intention is to add dynamic mechanism for notifying the userspace
about hotplug events. Currently, the DMI (SMBIOS) blobs and ACPI tables
are exposed in the new filesystem.
Userland faulted on the very first instruction before because the
PML4T/PDPT/etc. weren't marked as user-accessible. For some reason
x86 doesn't care about that.
Also, we need to provide an appropriate userspace stack segment
selector to iretq.
This isn't particularly useful because by the time we've entered
init() the CPU had better support x86_64 anyway. However this shows the
CPU flag in System Monitor - even in 32-bit mode.
This adds just enough stubs to make the kernel compile on x86_64. Obviously
it won't do anything useful - in fact it won't even attempt to boot because
Multiboot doesn't support ELF64 binaries - but it gets those compiler errors
out of the way so more progress can be made getting all the missing
functionality in place.
This doesn't really matter in terms of writability for the kernel text
because we set up proper page mappings anyway which prohibit writing
to the text segment. However, this makes the profiler happy which
previously died when validating the kernel's ELF program headers.
If FXSR is not present, fall back to fnsave and frstor instructions.
These instructions aren't available on the canonical i686 CPU which is
the Pentium Pro processor.
Currently in SMP mode we hard code support for up to only 8 processors.
There is no reason for this to be a dynamic allocation that needs to be
guarded by a spinlock. Instead use a Array<T* with inline storage of 8,
allowing each processor to initialize it self in place, avoiding all
the need for locks.
Hook the kernel page fault handler and capture page fault events when
the fault has a current thread attached in TLS. We capture the eip and
ebp so we can unwind the stack and locate which pieces of code are
generating the most page faults.
Co-authored-by: Gunnar Beutner <gbeutner@serenityos.org>
As we removed the support of VBE modesetting that was done by GRUB early
on boot, we need to determine if we can modeset the resolution with our
drivers, and if not, we should enable text mode and ensure that
SystemServer knows about it too.
Also, SystemServer should first check if there's a framebuffer device
node, which is an indication that text mode was not even if it was
requested. Then, if it doesn't find it, it should check what boot_mode
argument the user specified (in case it's self-test). This way if we
try to use bochs-display device (which is not VGA compatible) and
request a text mode, it will not honor the request and will continue
with graphical mode.
Also try to print critical messages with mininum memory allocations
possible.
In LibVT, We make the implementation flexible for kernel-specific
methods that are implemented in ConsoleImpl class.
We used GRUB to modeset the resolution for a long time, but for good
reasons I see no point with keeping it supported in our kernel. We
support bochs-display device on QEMU (both the VGA compatible and
non-VGA compatible variants), so for QEMU we can still boot the system
in graphical mode even without GRUB help.
Also, we now have a native driver for Intel graphics and although it
doesn't support most Intel graphics cards out there yet, it's a good
starting point to support more cards. If a user wants to boot on
bare-metal in graphical mode, all he needs to do is to add the removed
flag back again, as the kernel still supports pre-set framebuffers.
By constraining two implementations, the compiler will select the best
fitting one. All this will require is duplicating the implementation and
simplifying for the `void` case.
This constraining also informs both the caller and compiler by passing
the callback parameter types as part of the constraint
(e.g.: `IterationFunction<int>`).
Some `for_each` functions in LibELF only take functions which return
`void`. This is a minimal correctness check, as it removes one way for a
function to incompletely do something.
There seems to be a possible idiom where inside a lambda, a `return;` is
the same as `continue;` in a for-loop.
If we are attempting to emit debugging information about an unhandleable
page fault, don't crash trying to kill threads or dump processes if the
current_thread isn't set in TLS. Attempt to keep proceeding in order to
dump as much useful information as possible.
Related: #6948
The variety of checks for Processor::id() == 0 could use some assistance
in the readability department. This change adds a new function to
represent this check, and replaces the comparison everywhere it's used.
This solves a problem where checking whether a thread is an idle
thread may require iterating all processors if it is not the idle
thread of the current processor.
GCC with -flto is more aggressive when it comes to inlining and
discarding functions which is why we must mark some of the functions
as NEVER_INLINE (because they contain asm labels which would be
duplicated in the object files if the compiler decides to inline
the function elsewhere) and __attribute__((used)) for others so
that GCC doesn't discard them.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
This adds PT_PEEKDEBUG and PT_POKEDEBUG to allow for reading/writing
the debug registers, and updates the Kernel's debug handler to read the
new information from the debug status register.
Alot of code is shared between i386/i686/x86 and x86_64
and a lot probably will be used for compatability modes.
So we start by moving the headers into one Directory.
We will probalby be able to move some cpp files aswell.
According to the Intel manual: "After reset, all bits (except bit 0) in
XCR0 are cleared to zero; XCR0[0] is set to 1."
Sadly we can't trust this, for example VirtualBox starts with
bits 0-4 set, so let's do it ourselves.
Fixes#5653
Previously, the instruction fetch flag of the page fault handler
did not have the currect binary representation, and would always
return false. This aligns these flags.
Because registering and unregistering interrupt handlers triggers
calls to virtual functions, we can't do this in the constructor
and destructor.
Fixes#5539
This was necessary in the past when crash handling would modify
various global things, but all that stuff is long gone so we can
simplify crashes by leaving the interrupt flag alone.
Make more of the kernel compile in 64-bit mode, and make some things
pointer-size-agnostic (by using FlatPtr.)
There's a lot of work to do here before the kernel will even compile.
We were only 448 KiB away from filling up the old slot size we reserve
for the kernel above the 3 GiB mark. This expands the slot to 16 MiB,
which allows us to continue booting the kernel until somebody takes
the time to improve our loader.
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)
Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.
We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
When building the kernel with -O2, we somehow ended up with the kernel
command line outside of the lower 8MB of physical memory. Since we don't
map that area in our initial page table setup, we would triple fault
when trying to parse the command line.
This patch sidesteps the issue by copying the (first 4KB of) the kernel
command line to a buffer in a known safe location at boot.
We want to make sure these functions actually do get unmapped. If they
were inlined somewhere, the inlined version(s) would remain mapped.
Thanks to "thislooksfun" for the suggestion! :^)
There's no real system here, I just added it to various functions
that I don't believe we ever want to call after initialization
has finished.
With these changes, we're able to unmap 60 KiB of kernel text
after init. :^)
You can now declare functions with UNMAP_AFTER_INIT and they'll get
segregated into a separate kernel section that gets completely
unmapped at the end of initialization.
This can be used for anything we don't need to call once we've booted
into userspace.
There are two nice things about this mechanism:
- It allows us to free up entire pages of memory for other use.
(Note that this patch does not actually make use of the freed
pages yet, but in the future we totally could!)
- It allows us to get rid of obviously dangerous gadgets like
write-to-CR0 and write-to-CR4 which are very useful for an attacker
trying to disable SMAP/SMEP/etc.
I've also made sure to include a helpful panic message in case you
hit a kernel crash because of this protection. :^)
You can now use the READONLY_AFTER_INIT macro when declaring a variable
and we will put it in a special ".ro_after_init" section in the kernel.
Data in that section remains writable during the boot and init process,
and is then marked read-only just before launching the SystemServer.
This is based on an idea from the Linux kernel. :^)
Since kernel stacks are much smaller (64 KiB) than userspace stacks,
we only add a small bit of randomness here (0-256 bytes, 16b aligned.)
This makes the location of the task context switch buffer not be
100% predictable. Note that we still also add extra randomness upon
syscall entry, so this patch primarily affects context switching.
If we try to align a number above 0xfffff000 to the next multiple of
the page size (4 KiB), it would wrap around to 0. This is most likely
never what we want, so let's assert if that happens.
If we're flushing user space pointers and the process only has one
thread, we do not need to broadcast this to other processors as
they will all discard that request anyway.
Attempt to wake idle processors to get threads to be scheduled more quickly.
We don't want to wait until the next timer tick if we have processors that
aren't doing anything.
This eliminates the window between calling Processor::current and
the member function where a thread could be moved to another
processor. This is generally not as big of a concern as with
Processor::current_thread, but also slightly more light weight.
Change Thread::current to be a static function and read using the fs
register, which eliminates a window between Processor::current()
returning and calling a function on it, which can trigger preemption
and a move to a different processor, which then causes operating
on the wrong object.
We also need to store m_in_critical in the Thread upon switching,
and we need to restore it. This solves a problem where threads
moving between different processors could end up with an unexpected
value.
This allows us to determine what the previous mode (user or kernel)
was, e.g. in the timer interrupt. This is used e.g. to determine
whether a signal handler should be set up.
Fixes#5096
We were enabling interrupts too early, before the first context switch to
a thread was complete. This could then trigger another context switch
within the context switch, which lead to a crash.
This was done with the help of several scripts, I dump them here to
easily find them later:
awk '/#ifdef/ { print "#cmakedefine01 "$2 }' AK/Debug.h.in
for debug_macro in $(awk '/#ifdef/ { print $2 }' AK/Debug.h.in)
do
find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/#ifdef '$debug_macro'/#if '$debug_macro'/' {} \;
done
# Remember to remove WRAPPER_GERNERATOR_DEBUG from the list.
awk '/#cmake/ { print "set("$2" ON)" }' AK/Debug.h.in
Booting old computers without RDRAND/RDSEED and without a disk makes
the system severely starved for entropy. Uses interrupts as a source
to side-step that issue.
Also warn whenever the system is starved of entropy, because that's
a non-obvious failure mode.
It was possible to signal a process while it was paging in an inode
backed VM object. This would cause the inode read to EINTR, and the
page fault handler would assert.
Solve this by simply not unblocking threads due to signals if they are
currently busy handling a page fault. This is probably not the best way
to solve this issue, so I've added a FIXME to that effect.
Both ESP and GDTR are left undefined by the Multiboot specification and
OS images must not rely on these values to be valid. Fix the undefined
behaviors so that booting with PXELINUX does not triple-fault the CPU.
These changes are arbitrarily divided into multiple commits to make it
easier to find potentially introduced bugs with git bisect.Everything:
The modifications in this commit were automatically made using the
following command:
find . -name '*.cpp' -exec sed -i -E 's/dbg\(\) << ("[^"{]*");/dbgln\(\1\);/' {} \;
If a TLB flush request is broadcast to other processors and the addresses
to flush are user mode addresses, we can ignore such a request on the
target processor if the page directory currently in use doesn't match
the addresses to be flushed. We still need to broadcast to all processors
in that case because the other processors may switch to that same page
directory at any time.