This replaces all uses of LexicalPath in the Kernel with the functions
from KLexicalPath. This also allows the Kernel to stop including
AK::LexicalPath.
This adds KLexicalPath, which are a few static functions which aim to
mostly emulate AK::LexicalPath. They are however constrained to work
with absolute paths only, containing no '.' or '..' path segments and no
consecutive slashes. This way, it is possible to avoid use StringView
for the return values and thus avoid allocating new String objects.
As explained above, the functions are currently very strict about the
allowed input paths. This seems to not be a problem currently. Since the
functions VERIFY this, potential bugs caused by this will become
immediately obvious.
Instead of using one file for the entire "backend" of the ProcFS data
and metadata, we could split that file into two files that represent
2 logical chunks of the ProcFS exposed objects:
1. Global and inter-process information. This includes all fixed data in
the root folder of the ProcFS, networking information and the bus
folder.
2. Per-process information. This includes all dynamic data about a
process that resides in the /proc/PID/ folder.
This change makes it more easier to read the code and to change it,
hence we do it although there's no technical benefit from it now :)
The new ProcFS design consists of two main parts:
1. The representative ProcFS class, which is derived from the FS class.
The ProcFS and its inodes are much more lean - merely 3 classes to
represent the common type of inodes - regular files, symbolic links and
directories. They're backed by a ProcFSExposedComponent object, which
is responsible for the functional operation behind the scenes.
2. The backend of the ProcFS - the ProcFSComponentsRegistrar class
and all derived classes from the ProcFSExposedComponent class. These
together form the entire backend and handle all the functions you can
expect from the ProcFS.
The ProcFSExposedComponent derived classes split to 3 types in the
manner of lifetime in the kernel:
1. Persistent objects - this category includes all basic objects, like
the root folder, /proc/bus folder, main blob files in the root folders,
etc. These objects are persistent and cannot die ever.
2. Semi-persistent objects - this category includes all PID folders,
and subdirectories to the PID folders. It also includes exposed objects
like the unveil JSON'ed blob. These object are persistent as long as the
the responsible process they represent is still alive.
3. Dynamic objects - this category includes files in the subdirectories
of a PID folder, like /proc/PID/fd/* or /proc/PID/stacks/*. Essentially,
these objects are always created dynamically and when no longer in need
after being used, they're deallocated.
Nevertheless, the new allocated backend objects and inodes try to use
the same InodeIndex if possible - this might change only when a thread
dies and a new thread is born with a new thread stack, or when a file
descriptor is closed and a new one within the same file descriptor
number is opened. This is needed to actually be able to do something
useful with these objects.
The new design assures that many ProcFS instances can be used at once,
with one backend for usage for all instances.
The intention is to add dynamic mechanism for notifying the userspace
about hotplug events. Currently, the DMI (SMBIOS) blobs and ACPI tables
are exposed in the new filesystem.
Neither the kernel nor LibELF support loading libraries with larger
PT_LOAD alignment. The default on x86 is 4096 while it's 2MiB on x86_64.
This changes the alignment to 4096 on all platforms.
Currently, Kernel::Graphics::FramebufferConsole is written assuming that
the underlying framebuffer memory exists in physically contiguous
memory. There are a bunch of framebuffer devices that would need to use
the components of FramebufferConsole (in particular access to the kernel
bitmap font rendering logic). To reduce code duplication, framebuffer
console has been split into two parts, the abstract
GenericFramebufferConsole class which does the rendering, and the
ContiguousFramebufferConsole class which contains all logic related to
managing the underling vm object.
Also, a new flush method has been added to the class, to support devices
that require an extra flush step to render.
Multiboot only supports ELF32 executables. This changes the build
process to build an ELF32 executable which has a 32-bit entry point,
but consists of mostly 64-bit code.
This adds just enough stubs to make the kernel compile on x86_64. Obviously
it won't do anything useful - in fact it won't even attempt to boot because
Multiboot doesn't support ELF64 binaries - but it gets those compiler errors
out of the way so more progress can be made getting all the missing
functionality in place.
This doesn't really matter in terms of writability for the kernel text
because we set up proper page mappings anyway which prohibit writing
to the text segment. However, this makes the profiler happy which
previously died when validating the kernel's ELF program headers.
These are the actual structures that allow USB to work (i.e the ones
actually defined in the specification). This should provide us enough
of a baseline implementation that we can build on to support
different types of USB device.
These are pretty common on older LGA1366 & LGA1150 motherboards.
NOTE: Since the registers datasheets for all versions of the chip
besides versions 1 - 3 are still under NDAs i had to collect
several "magical vendor constants" from the *BSD driver and the
linux driver that i was not able to name verbosely, and as such
these are labeled with the comment "vendor magic values".
We call it E1000E, because the layout for these cards is somewhat not
the same like E1000 supported cards.
Also, this card supports advanced features that are not supported on
8254x cards.
Instead of initializing network adapters in init.cpp, let's move that
logic into a separate class to handle this.
Also, it seems like a good idea to shift responsiblity on enumeration
of network adapters after the boot process, so this singleton will take
care of finding the appropriate network adapter when asked to with an
IPv4 address or interface name.
With this change being merged, we simplify the creation logic of
NetworkAdapter derived classes, so we enumerate the PCI bus only once,
searching for driver candidates when doing so, and we let each driver
to test if it is resposible for the specified PCI device.
Previously we'd incur the costs for a function call via the PLT even
for the most trivial ref-count actions like increasing/decreasing the
reference count.
By moving the code to the header file we allow the compiler to inline
this code into the caller's function.
This is a simple string class for use in the kernel. It encapsulates
a length + character array in a single-allocation object.
Main differences from AK::String:
- Single-owner (no reference counting.)
- Allocation failures are exposed, not hidden.
The basic idea is to allow better and more precise string management
in the kernel.
It seems like overly-specific classes were written for no good reason.
Instead of making each adapter to have its own unique FramebufferDevice
class, let's generalize everything to keep implementation more
consistent.
When debugging kernel code, it's necessary to set extra flags. Normal
advice is to set -ggdb3. Sometimes that still doesn't provide enough
debugging information for complex functions that still get optimized.
Compiling with -Og gives the best optimizations for debugging, but can
sometimes be broken by changes that are innocuous when the compiler gets
more of a chance to look at them. The new CMake option enables both
compile options for kernel code.
This had very bad interactions with ccache, often leading to rebuilds
with 100% cache misses, etc. Ali says it wasn't that big of a speedup
in the end anyway, so let's not bother with it.
We can always bring it back in the future if it seems like a good idea.
This simple driver simply finds a device in a device definitions list
and then sets up a SerialDevice instance based on the definition.
The driver currently only supports "WCH CH382 2S" pci serial boards,
as that is the only device available for me to test with, but most
other pci serial devices should be as easily addable as adding a
board_definitions entry.
As we removed the support of VBE modesetting that was done by GRUB early
on boot, we need to determine if we can modeset the resolution with our
drivers, and if not, we should enable text mode and ensure that
SystemServer knows about it too.
Also, SystemServer should first check if there's a framebuffer device
node, which is an indication that text mode was not even if it was
requested. Then, if it doesn't find it, it should check what boot_mode
argument the user specified (in case it's self-test). This way if we
try to use bochs-display device (which is not VGA compatible) and
request a text mode, it will not honor the request and will continue
with graphical mode.
Also try to print critical messages with mininum memory allocations
possible.
In LibVT, We make the implementation flexible for kernel-specific
methods that are implemented in ConsoleImpl class.
This new subsystem is replacing the old code that was used to
create device nodes of framebuffer devices in /dev.
This subsystem includes for now 3 roles:
1. GraphicsManagement singleton object that is used in the boot
process to enumerate and initialize display devices.
2. GraphicsDevice(s) that are used to control the display adapter.
3. FramebufferDevice(s) that are used to control the device node in
/dev.
For now, we support the Bochs display adapter and any other
generic VGA compatible adapter that was configured by the boot
loader to a known and fixed resolution.
Two improvements in the Bochs display adapter code are that
we can support native bochs-display device (this device doesn't
expose any VGA capabilities) and also that we use the MMIO region,
to configure the device, instead of setting IO ports for such tasks.
Previously ByteBuffer would internally hold a RefPtr to the byte
buffer and would behave like a reference type, i.e. copying a
ByteBuffer would not create a duplicate byte buffer, but rather
two objects which refer to the same internal buffer.
This also changes ByteBuffer so that it has some internal capacity
much like the Vector<T> type. Unlike Vector<T> however a byte
buffer's data may be uninitialized.
With this commit ByteBuffer makes use of the kmalloc_good_size()
API to pick an optimal allocation size for its internal buffer.
This commit replaces the former, hand-written parser with a new one that
can be generated automatically according to a state change diagram.
The new `EscapeSequenceParser` class provides a more ergonomic interface
to dealing with escape sequences. This interface has been inspired by
Alacritty's [vte library](https://github.com/alacritty/vte/).
I tried to avoid changing the application logic inside the `Terminal`
class. While this code has not been thoroughly tested, I can't find
regressions in the basic command line utilities or `vttest`.
`Terminal` now displays nicer debug messages when it encounters an
unknown escape sequence. Defensive programming and bounds checks have
been added where we access parameters, and as a result, we can now
endure 4-5 seconds of `cat /dev/urandom`. :D
We generate EscapeSequenceStateMachine.h when building the in-kernel
LibVT, and we assume that the file is already in place when the userland
library is being built. This will probably cause problems later on, but
I can't find a way to do it nicely.
Make messages which should be fatal, actually fail the build.
- FATAL is not a valid mode keyword. The full list is available in the
docs: https://cmake.org/cmake/help/v3.19/command/message.html
- SEND_ERROR doesn't immediately stop processing, FATAL_ERROR does.
We should immediately stop if the Toolchain is not present.
- The app icon size validation was just a WARNING that is easy to
overlook. We should promote it to a FATAL_ERROR so that people will
not overlook the issue when adding a new application. We can only make
the small icon message FATAL_ERROR, as there is currently one
violation of the medium app icon validation.
This patch modifies InodeWatcher to switch to a one watcher, multiple
watches architecture. The following changes have been made:
- The watch_file syscall is removed, and in its place the
create_iwatcher, iwatcher_add_watch and iwatcher_remove_watch calls
have been added.
- InodeWatcher now holds multiple WatchDescriptions for each file that
is being watched.
- The InodeWatcher file descriptor can be read from to receive events on
all watched files.
Co-authored-by: Gunnar Beutner <gunnar@beutner.name>
Lots of people are confused by the error message you get when the
Toolchain is behind/messed up:
'initializer-list: No such file or directory'
Before this error can happen, catch the problem at CMake configure time,
and provide them with an actionable error message.
Make this stuff a bit easier to maintain by using the
root level variables to build up the Toolchain paths.
Also leave a note for future editors of BuildIt.sh to
give them warning about the other changes they'll need
to make.
Until we get the goodness that C++ modules are supposed to be, let's try
to shave off some parse time using precompiled headers.
This commit only adds some very common AK headers, only to binaries,
libraries and the kernel (tests are not covered due to incompatibility
with AK/TestSuite.h).
This option is on by default, but can be disabled by passing
`-DPRECOMPILE_COMMON_HEADERS=OFF` to cmake, which will disable all
header precompilations.
This makes the build about 30 seconds faster on my machine (about 7%).
This enables building usermode programs with exception handling. It also
builds a libstdc++ without exception support for the kernel.
This is necessary because the libstdc++ that gets built is different
when exceptions are enabled. Using the same library binary would
require extensive stubs for exception-related functionality in the
kernel.
This is a very basic implementation that only requests 4096 bytes
of entropy from the host once, but its still high quality entropy
so it should be a good fix for #4490 (boot-time entropy starvation)
for virtualized environments.
Co-authored-by: Sahan <sahan.h.fernando@gmail.com>
This allows converting a single virtual buffer into its non-physically
contiguous parts, this is especially useful for DMA-based devices that
support scatter/gather-like functionality, as it eliminates the need
to clone outgoing buffers into one physically contiguous buffer.
This patch allocates a physical page for each of the virtqueues and
memcpys to it when receiving a buffer to get a physical, aligned
contiguous buffer as required by the virtio specification.
Co-authored-by: Sahan <sahan.h.fernando@gmail.com>
Based on pull #3236 by tomuta, this is still very much WIP but will
eventually allow us to switch from the considerably slower method of
knocking on port 0xe9 for each character
Co-authored-by: Tom <tomut@yahoo.com>
Co-authored-by: Sahan <sahan.h.fernando@gmail.com>
Based on pull #3236 by tomuta, this adds helper methods for generic
device initialization, and partily-broken virtqueue helper methods
Co-authored-by: Tom <tomut@yahoo.com>
Co-authored-by: Sahan <sahan.h.fernando@gmail.com>
`-Wnull-dereference` has found a lot of "possible null dereferences" in userland.
However, in the kernel, no warnings occurred. To keep it that way,
preemptivly add the flag here and remove it once it is enabled system
wide.
This flag makes the compiler check statically for a null deref. It does
not take into account any human-imposed invariants and as such, may need a
`VERIFY(ptr);`, `if(ptr)`, or `if(!ptr)` before using a pointer.
However, as long as a pointer is not reassigned,
the verify will be valid, meaning that adding `VERIFY` can be done sparingly.
Now the kernel supports 2 ECAM access methods.
MMIOAccess was renamed to WindowedMMIOAccess and is what we had until
now - each device that is detected on boot is assigned to a
memory-mapped window, so IO operations on multiple devices can occur
simultaneously due to creating multiple virtual mappings, hence the name
is a memory-mapped window.
This commit adds a new class called MMIOAccess (not to be confused with
the old MMIOAccess class). This class creates one memory-mapped window.
On each IO operation on a configuration space of a device, it maps the
requested PCI bus region to that window. Therefore it holds a SpinLock
during the operation to ensure that no other PCI bus region was mapped
during the call.
A user can choose to either use PCI ECAM with memory-mapped window
for each device, or for an entire bus. By default, the kernel prefers to
map the entire PCI bus region.
The end goal of this commit is to allow to boot on bare metal with no
PS/2 device connected to the system. It turned out that the original
code relied on the existence of the PS/2 keyboard, so VirtualConsole
called it even though ACPI indicated the there's no i8042 controller on
my real machine because I didn't plug any PS/2 device.
The code is much more flexible, so adding HID support for other type of
hardware (e.g. USB HID) could be much simpler.
Briefly describing the change, we have a new singleton called
HIDManagement, which is responsible to initialize the i8042 controller
if exists, and to enumerate its devices. I also abstracted a bit
things, so now every Human interface device is represented with the
HIDDevice class. Then, there are 2 types of it - the MouseDevice and
KeyboardDevice classes; both are responsible to handle the interface in
the DevFS.
PS2KeyboardDevice, PS2MouseDevice and VMWareMouseDevice classes are
responsible for handling the hardware-specific interface they are
assigned to. Therefore, they are inheriting from the IRQHandler class.
If the user requests to force PIO mode, we just create IDEChannel
objects which are capable of sending PIO commands only.
However, if the user doesn't force PIO mode, we create BMIDEChannel
objects, which are sending DMA commands.
This change is somewhat simplifying the code, so each class is
supporting its type of operation - PIO or DMA. The PATADiskDevice
should not care if DMA is enabled or not.
Later on, we could write an IDEChannel class for UDMA modes,
that are available and documented on Intel specifications for their IDE
controllers.
We can't use deferred functions for anything that may require preemption,
such as copying from/to user or accessing the disk. For those purposes
we should use a work queue, which is essentially a kernel thread that
may be preempted or blocked.
Alot of code is shared between i386/i686/x86 and x86_64
and a lot probably will be used for compatability modes.
So we start by moving the headers into one Directory.
We will probalby be able to move some cpp files aswell.
This returns ENOSYS if you are running in the real kernel, and some
other result if you are running in UserspaceEmulator.
There are other ways we could check if we're inside an emulator, but
it seemed easier to just ask. :^)
The hierarchy is AHCIController, AHCIPortHandler, AHCIPort and
SATADiskDevice. Each AHCIController has at least one AHCIPortHandler.
An AHCIPortHandler is an interrupt handler that takes care of
enumeration of handled AHCI ports when an interrupt occurs. Each
AHCIPort takes care of one SATADiskDevice, and later on we can add
support for Port multiplier.
When we implement support of Message signalled interrupts, we can spawn
many AHCIPortHandlers, and allow each one of them to be responsible for
a set of AHCIPorts.
For some reason I don't yet understand, building the kernel with -O2
produces a way-too-large kernel on some people's systems.
Since there are some really nice performance benefits from -O2 in
userspace, let's do a compromise and build Userland with -O2 but
put Kernel back into the -Os box for now.
Due to the non-standard way the boot assembler code is linked into
the kernel (not and actual dependency, but linked via linker.ld script)
both make and ninja weren't re-linking the kernel when boot.S was
changed. This should theoretically work since we use the cmake
`add_dependencies(..)` directive to express a manual dependency
on boot from Kernel, but something is obviously broken in cmake.
We can work around that with a hack, which forces a dependency on
a file we know will always exist in the kernel (init.cpp). So if
boot.S is rebuilt, then init.cpp is forced to be rebuilt, and then
we re-link the kernel. init.cpp is also relatively small, so it
compiles fast.
With the kernel command line issue fixed, we can now enable these
KUBSAN options without getting triple faults on startup:
* alignment
* null
* pointer-overflow
Clangd (CLion) was choking on some of the -fsanitize options, and since
we're not building the kernel with Clang anyway, let's just disable
the options for non-GCC compilers for now.
This removes some hard references to the toolchain, some unnecessary
uses of an external install command, and disables a -Werror flag (for
the time being) - only if run inside serenity.
With this, we can build and link the kernel :^)
KASAN is a dynamic analysis tool that finds memory errors. It focuses
mostly on finding use-after-free and out-of-bound read/writes bugs.
KASAN works by allocating a "shadow memory" region which is used to store
whether each byte of memory is safe to access. The compiler then instruments
the kernel code and a check is inserted which validates the state of the
shadow memory region on every memory access (load or store).
To fully integrate KASAN into the SerenityOS kernel we need to:
a) Implement the KASAN interface to intercept the injected loads/stores.
void __asan_load*(address);
void __asan_store(address);
b) Setup KASAN region and determine the shadow memory offset + translation.
This might be challenging since Serenity is only 32bit at this time.
Ex: Linux implements kernel address -> shadow address translation like:
static inline void *kasan_mem_to_shadow(const void *addr)
{
return ((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
+ KASAN_SHADOW_OFFSET;
}
c) Integrating KASAN with Kernel allocators.
The kernel allocators need to be taught how to record allocation state
in the shadow memory region.
This commit only implements the initial steps of this long process:
- A new (default OFF) CMake build flag `ENABLE_KERNEL_ADDRESS_SANITIZER`
- Stubs out enough of the KASAN interface to allow the Kernel to link clean.
Currently the KASAN kernel crashes on boot (triple fault because of the crash
in strlen other sanitizer are seeing) but the goal here is to just get started,
and this should help others jump in and continue making progress on KASAN.
References:
* ASAN Paper: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37752.pdf
* KASAN Docs: https://github.com/google/kasan
* NetBSD KASAN Blog: https://blog.netbsd.org/tnf/entry/kernel_address_sanitizer_part_3
* LWN KASAN Article: https://lwn.net/Articles/612153/
* Tracking Issue #5351
Let's be a little more expressive when inducing a kernel panic. :^)
PANIC(...) passes any arguments you give it to dmesgln(), then prints
a backtrace and hangs the machine.
CLion doesn't understand that we switch compilers mid-build (which I
can understand since it's a bit unusual.) Defining __serenity__ makes
the majority of IDE features work correctly in the kernel context.
This patch adds Space, a class representing a process's address space.
- Each Process has a Space.
- The Space owns the PageDirectory and all Regions in the Process.
This allows us to reorganize sys$execve() so that it constructs and
populates a new Space fully before committing to it.
Previously, we would construct the new address space while still
running in the old one, and encountering an error meant we had to do
tedious and error-prone rollback.
Those problems are now gone, replaced by what's hopefully a set of much
smaller problems and missing cleanups. :^)
We now build the kernel with partial UBSAN support.
The following -fsanitize sub-options are enabled:
* nonnull-attribute
* bool
If the kernel detects UB at runtime, it will now print a debug message
with a stack trace. This is very cool! I'm leaving it on by default for
now, but we'll probably have to re-evaluate this as more options are
enabled and slowdown increases.
This adds support for FUTEX_WAKE_OP, FUTEX_WAIT_BITSET, FUTEX_WAKE_BITSET,
FUTEX_REQUEUE, and FUTEX_CMP_REQUEUE, as well well as global and private
futex and absolute/relative timeouts against the appropriate clock. This
also changes the implementation so that kernel resources are only used when
a thread is blocked on a futex.
Global futexes are implemented as offsets in VMObjects, so that different
processes can share a futex against the same VMObject despite potentially
being mapped at different virtual addresses.
All users of this mechanism have been switched to anonymous files and
passing file descriptors with sendfd()/recvfd().
Shbufs got us where we are today, but it's time we say good-bye to them
and welcome a much more idiomatic replacement. :^)
This patch adds a new AnonymousFile class which is a File backed by
an AnonymousVMObject that can only be mmap'ed and nothing else, really.
I'm hoping that this can become a replacement for shbufs. :^)
This patch merges the profiling functionality in the kernel with the
performance events mechanism. A profiler sample is now just another
perf event, rather than a dedicated thing.
Since perf events were already per-process, this now makes profiling
per-process as well.
Processes with perf events would already write out a perfcore.PID file
to the current directory on death, but since we may want to profile
a process and then let it continue running, recorded perf events can
now be accessed at any time via /proc/PID/perf_events.
This patch also adds information about process memory regions to the
perfcore JSON format. This removes the need to supply a core dump to
the Profiler app for symbolication, and so the "profiler coredump"
mechanism is removed entirely.
There's still a hard limit of 4MB worth of perf events per process,
so this is by no means a perfect final design, but it's a nice step
forward for both simplicity and stability.
Fixes#4848Fixes#4849
This patch adds sys$abort() which immediately crashes the process with
SIGABRT. This makes assertion backtraces a lot nicer by removing all
the gunk that otherwise happens between __assertion_failed() and
actually crashing from the SIGABRT.
Insert stack canaries to find stack corruptions in the kernel.
It looks like this was enabled in the past (842716a) but appears to have been
lost during the CMake conversion.
The `-fstack-protector-strong` variant was chosen because it catches more issues
than `-fstack-protector`, but doesn't have substantial performance impact like
`-fstack-protector-all`.
Instead of specifying the boot argument to be root=/dev/hdXY, now
one can write root=PARTUUID= with the right UUID, and if the partition
is found, the kernel will boot from it.
This feature is mainly used with GUID partitions, and is considered to
be the most reliable way for the kernel to identify partitions.
RTTI is still disabled for the Kernel, and for the Dynamic Loader. This
allows for much less awkward navigation of class heirarchies in LibCore,
LibGUI, LibWeb, and LibJS (eventually). Measured RootFS size increase
was < 1%, and libgui.so binary size was ~3.3%. The small binary size
increase here seems worth it :^)
* Add SERENITY_ARCH option to CMake for selecting the target toolchain
* Port all build scripts but continue to use i686
* Update GitHub Actions cache to include BuildIt.sh
The partitioning code was very outdated, and required a full refactor.
The new subsystem removes duplicated code and uses more AK containers.
The most important change is that all implementations of the
PartitionTable class conform to one interface, which made it possible
to remove unnecessary code in the EBRPartitionTable class.
Finding partitions is now done in the StorageManagement singleton,
instead of doing so in init.cpp.
Also, now we don't try to find partitions on demand - the kernel will
try to detect if a StorageDevice is partitioned, and if so, will check
what is the partition table, which could be MBR, GUID or EBR.
Then, it will create DiskPartitionMetadata object for each partition
that is available in the partition table. This object will be used
by the partition enumeration code to create a DiskPartition with the
correct minor number.
The DevFS along with DevPtsFS give a complete solution for populating
device nodes in /dev. The main purpose of DevFS is to eliminate the
need of device nodes generation when building the system.
Later on, DevFS will assist with exposing disk partition nodes.
This new flag controls two things:
- Whether the kernel will generate core dumps for the process
- Whether the EUID:EGID should own the process's files in /proc
Processes are automatically made non-dumpable when their EUID or EGID is
changed, either via syscalls that specifically modify those ID's, or via
sys$execve(), when a set-uid or set-gid program is executed.
A process can change its own dumpable flag at any time by calling the
new sys$prctl(PR_SET_DUMPABLE) syscall.
Fixes#4504.
This commit gets rid of ELF::Loader entirely since its very ambiguous
purpose was actually to load executables for the kernel, and that is
now handled by the kernel itself.
This patch includes some drive-by cleanup in LibDebug and CrashDaemon
enabled by the fact that we no longer need to keep the ref-counted
ELF::Loader around.
The StorageManagement class has 2 roles:
1. During boot, it should find all storage controllers in the machine,
and then determine what is the boot device.
2. Later on boot, it is a registrar of all storage controllers and
storage devices. Thus, it could be used to show information about these
devices when implemented.
This change allows the user to specify a boot driver other than /dev/hda
and if it's connected in the machine - it will boot.
This new subsystem is somewhat replacing the IDE disk code we had with a
new flexible design.
StorageDevice is a generic class that represent a generic storage
device. It is meant that specific storage hardware will override the
interface. StorageController is a generic class that represent
a storage controller that can be found in a machine.
The IDEController class governs two IDEChannels. An IDEChannel is
responsible to manage the master & slave devices of the channel,
therefore an IDEChannel is an IRQHandler.
Such device is not an IRQHandler by itself, but actually a controller of
many IRQ or MSI devices. The purpose of this class is to manage multiple
sources of interrupts.
For example, a generic ISA IDE controller controls 2 IRQ sources - 14
and 15. So, when we initialize the IDE controller, it will initialize
two IDE channels (also known as PATAChannels) to utilize IRQ 14 and 15,
respectively. NVMe with MSI-X support can theoretically handle up to
2048 interrupts.
When a process crashes, we generate a coredump file and write it in
/tmp/coredumps/.
The coredump file is an ELF file of type ET_CORE.
It contains a segment for every userspace memory region of the process,
and an additional PT_NOTE segment that contains the registers state for
each thread, and a additional data about memory regions
(e.g their name).
This makes the Scheduler a lot leaner by not having to evaluate
block conditions every time it is invoked. Instead evaluate them as
the states change, and unblock threads at that point.
This also implements some more waitid/waitpid/wait features and
behavior. For example, WUNTRACED and WNOWAIT are now supported. And
wait will now not return EINTR when SIGCHLD is delivered at the
same time.
This allows issuing asynchronous requests for devices and waiting
on the completion of the request. The requests can cascade into
multiple sub-requests.
Since IRQs may complete at any time, if the current process is no
longer the same that started the process, we need to swich the
paging context before accessing user buffers.
Change the PATA driver to use this model.
Rework the PS/2 keyboard and mouse drivers to use a common 8042
controller driver. Also, reset and reconfigure the 8042 controller
as they are not guaranteed to be in the state that we expect.
This allows issuing asynchronous requests for devices and waiting
on the completion of the request. The requests can cascade into
multiple sub-requests.
Since IRQs may complete at any time, if the current process is no
longer the same that started the process, we need to swich the
paging context before accessing user buffers.
Change the PATA driver to use this model.
This enables the APIC timer on all CPUs, which means Scheduler::timer_tick
is now called on all CPUs independently. We still don't do anything on
the APs as it instantly crashes due to a number of other problems.
This finally takes care of the kind-of excessive boilerplate code that were the
ctype adapters. On the other hand, I had to link `LibC/ctype.cpp` to the Kernel
(for `AK/JsonParser.cpp` and `AK/Format.cpp`). The previous commit actually makes
sense now: the `string.h` includes in `ctype.{h,cpp}` would require to link more LibC
stuff to the Kernel when it only needs the `_ctype_` array of `ctype.cpp`, and there
wasn't any string stuff used in ctype.
Instead of all this I could have put static derivatives of `is_any_of()` in the
concerned AK files, however that would have meant more boilerplate and workarounds;
so I went for the Kernel approach.
This function is not avaliable in the kernel.
In the future it would be nice to have some sort of <charconv> header
that does this for all integer types and then call it in strtoull and et
cetera.
The difference would be that this function say 'from_chars' would return
an Optional and not just interpret anything invalid as zero.
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.
So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.
To make it easier to read/write from/to either kernel or user mode
data add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.
Last but not least we need to keep syscall params trivial as we
need to copy them from/to user mode using copy_from/to_user.
The JS tests pointed out that the implementation in DateTime
had an off-by-one in the month when doing the leap year check,
so this change fixes that bug.
This does not add any behaviour change to the processes, but it ties a
TTY to an active process group via TIOCSPGRP, and returns the TTY to the
kernel when all processes in the process group die.
Also makes the TTY keep a link to the original controlling process' parent (for
SIGCHLD) instead of the process itself.
This is racy in userspace and non-racy in kernelspace so let's keep
it in kernelspace.
The behavior change where CLOEXEC is preserved when dup2() is called
with (old_fd == new_fd) was good though, let's keep that.
By having a separate list of constructors for the kernel heap
code, we can properly use constructors without re-running them
after the heap was already initialized. This solves some problems
where values were wiped out because they were overwritten by
running their constructors later in the initialization process.
This syscall allows a parent process to disown a child process, setting
its parent PID to 0.
Unparented processes are automatically reaped by the kernel upon exit,
and no sys$waitid() is required. This will make it much nicer to do
spawn-and-forget which is common in the GUI environment.
This is something I've been meaning to do for a long time, and here we
finally go. This patch moves all sys$foo functions out of Process.cpp
and into files in Kernel/Syscalls/.
It's not exactly one syscall per file (although it could be, but I got
a bit tired of the repetitive work here..)
This makes hacking on individual syscalls a lot less painful since you
don't have to rebuild nearly as much code every time. I'm also hopeful
that this makes it easier to understand individual syscalls. :^)