Previously we would unintentionally leave them zero-initialized,
resulting in any threads created post fork (but without execve) having
invalid thread local storage pointers stored in their FS register.
This commit adds all necessary includes, so all functions are properly
declared.
PCI.cpp is moved to PCI/Initializer.cpp, as that matches the header
path.
The `[[gnu::packed]]` attribute apparently lowered the required
alignment of the structs, which caused the compiler to generate two
1 byte loads/stores on RISC-V. This caused the kernel to read/write
incorrect values, as the device only seems to accept 2 byte operations.
`MM.protect_kernel_image` would otherwise make the contents of these
sections read-only, as they were for some reason placed before `.data`
and after the start of `.text`.
RAMFS was passing 0, which lead to the userspace seeing all entries as
DT_UNKNOWN when iterating over the directory contents.
To repro prior to this commit, simply check `echo /tmp/*/`.
Following 77441079dd, the code in Kernel/Devices/HID/MouseDevice.cpp
is used by both USB and PS2 rodents. Make sure not to emit misleading
debug messages that could suggest that a USB mouse is a PS/2 one.
Other arches don't use the prekernel, so don't try to unmap it on
non-x86 platforms.
For some reason, this didn't cause aarch64 to crash, but on riscv64 this
would cause a panic.
This doesn't affect system functionality, but somewhat reduces the
reliance on complicated hardcoded paths. It also allows the user to
simply link /init (which is normally a symbolic link) to another program
to run it instead of SystemServer as the default option.
We could technically copy the dynamic loader to other path and run it
from there, so let's not assume paths.
If the user is so determined to do such thing, then a warning is quite
meaningless.
When writing to /sys/kernel/request_panic it will do a kernel panic.
Trying to truncate the node will result in kernel panic with a slightly
different message.
The networking subsystem currently assumes all adapters are Ethernet
adapters, including the LoopbackAdapter, so all packets are pre-pended
with an Ethernet Frame header. Since the MTU must not include any
overhead added by the data-link (Ethernet in this case) or physical
layers, we need to subtract it from the MTU.
This fixes a kernel panic which occurs when sending a packet that is at
least 65523 bytes long through the loopback adapter, which results in
the kernel "receiving" a packet which is larger than the support MTU
out the other end. (As the actual final size was increased by the
addition of the ethernet frame header)
As per POSIX, the behavior of listen() with a backlog value of 0 is
implementation defined: "A backlog argument of 0 may allow the socket
to accept connections, in which case the length of the listen queue may
be set to an implementation-defined minimum value."
Since creating a socket that can't accept any connections seems
relatively useless, and as other platforms (Linux, FreeBSD, etc) chose
to support accepting connections with this backlog value, support it as
well by normalizing it to 1.
This is necessary for being able to use the qemu `-kernel` option.
The QEMU virt machine uses OpenSBI's FW_DYNAMIC feature to pass
the kernel entry address, which is the virtual entry point address
specified in the kernel ELF. If we instead `objcopy` the kernel into a
raw binary, OpenSBI will jump to the physical kernel load address, which
is what we want it to do.
When the FileSystem does a sync, it gathers up all the inodes with
dirty metadata into a vector. The inode mutex is not held while
checking the inode dirty bit, which can lead to a kernel panic
due to concurrent inode modifications.
Fixes: #21796
It seems like the current implementation returns 0 in case we do not
have enough data for a whole packet yet. The 0 value gets propagated
to the return value of the syscall which according to the spec
should return non-zero values for non-errors cases. This causes panic,
as there is a VERIFY guard checking that more than > 0 bytes are
written if no error has occurred.
There's no need to have separate syscall for this kind of functionality,
as we can just have a device node in /dev, called "beep", that allows
writing tone generation packets to emulate the same behavior.
In addition to that, we remove LibC sysbeep function, as this function
was never being used by any C program nor it was standardized in any
way.
Instead, we move the userspace implementation to LibCore.
Previously, attempting to update an ext2 inode with a UID or GID
larger than 65535 would overflow. We now write the high bits of UIDs
and GIDs to the same place that Linux does within the `osd2` struct.
A bit old but a relatively uncomplicated device capable of outputting
1920x1080 video with 32-bit color. Tested with a Voodoo 3 3000 16MB
PCI card. Resolution switching from DisplaySettings also works.
If the requested mode contains timing information, it is used directly.
Otherwise, display timing values are selected from the EDID. First the
detailed timings are checked, and then standard and established
timings for which there is a matching DMT mode. The driver does not
(yet) read the actual EDID, so the generic EDID in DisplayConnector now
includes a set of common display modes to make this work.
The driver should also be compatible with the Voodoo Banshee, 4 and 5
but I don't have these cards to test this with. The PCI IDs of these
cards are included as a commented line in case someone wants to give it
a try.
Moving the DeviceManagement initialization, which is only needed by
userland in the first place, to after interrupt and time management
initialization (like other things that require randomness) allows the
SipHash initialization to access good randomness without problems.
Note: There currently is another, unrelated boot problem on aarch64,
which is not caused by SipHash as far as we know. This commit therefore
only fixes the SipHash regression.
This view is really nice to check flags, but when clearing them we must
make sure that we only ever try to set 1 bit at a time, which makes
setting bits through the structured view a footgun, as that fetches,
ors in and then sets, potentially resetting other flags.
According to multiboot spec if flag for framebuffer isn't
set then corresponding fields are invalid. In reality they're set
to 0 but let's be defensive.
Loaders try to put modules as low as reasonable but on
EFI often "reasonable" is much higher than on BIOS. As
a result target can be easily higher than source.
Then we have 2 problems:
* memmove compares virtual address and since target
is mapped higher it ends up going backwards which
is wrong if target is physically below source
* order of copying of sections must be inverted if
target is below source
Prekernel code currently assumes that mapping until MAX_KERNEL_SIZE
is enough to make the modules accessible. GRUB tries to load as low
as possible but higher than 1 MiB. Hence this is usually true.
However on EFI some ranges may already be used by boot services and
GRUB tries to avoid them if possible. This pushes modules higher.
The simplest solution is to map entire 4 GiB space.
As an additional benefit it makes the framebuffer accessible that
can be used for the debugging.
About half of the Processor code is common across architectures, so
let's share it with a templated base class. Also, other code that can be
shared in some ways, like FPUState and TrapFrame functions, is adjusted
here. Functions which cannot be shared trivially (without internal
refactoring) are left alone for now.
SipHash is highly HashDoS-resistent, initialized with a random seed at
startup (i.e. non-deterministic) and usable for security-critical use
cases with large enough parameters. We just use it because it's
reasonably secure with parameters 1-3 while having excellent properties
and not being significantly slower than before.
This subtraction is necessary to ensure that the section has the correct
address. Also, without this change, the Kernel ELF binary would explode
in size. This was forgotten in a0dd6ec6b1.
This field is in a packed struct, which makes it possibly misaligned.
This knowledge is lost when invoking `dbgln` triggering an unaligned
access to it, aka UB. By explicitely copying it we avoid this issue.
Simplify core methods in the VirtIO bus handling code by ensuring proper
error propagation. This makes initialization of queues, handling changes
in device configuration, and other core patterns more readable as well.
It also allows us to remove the obnoxious pattern of checking for
boolean "success" and if we get false answer then returning an actual
errno code.
When a device is plugged into the machine (and hence, when
`Device::try_create()` is called), then we attempt to load a driver by
calling that driver's probe function.
At any one given time, there can be an abitrary number of USB drivers in
the system. The way driver mapping works (i.e, a device is inserted, and
a potentially matching driver is probed) requires us to have
instantiated driver objects _before_ a device is inserted. This leaves
us with a slight "chicken and egg" problem. We cannot call the probe
function before the driver is initialised, but we need to know _what_
driver to initialise.
This section is designed to store pointers to functions that are called
during the last stage of the early `_init` sequence in the Kernel. The
accompanying macro in `USBDriver` emits a symbol, based on the driver
name, into this table that is then automatically called.
This way, we enforce a "common" driver model; driver developers are not
only required to write their driver and inherit from `USB::Driver`, but
are also required to have a free floating init function that registers
their driver with the USB Core.
The VirtIO specification defines many types of devices with different
purposes, and it also defines 3 possible transport mediums where devices
could be connected to the host machine.
We only care about the PCIe transport, but this commit puts the actual
foundations for supporting the lean MMIO transport too in the future.
To ensure things are kept abstracted but still functional, the VirtIO
transport code is responsible for what is deemed as related to an actual
transport type - allocation of interrupt handlers and tinkering with low
level transport-related registers, etc.
We should consider whether the selected Thread is within the same jail
or not.
Therefore let's make it clear to callers with jail semantics if a called
method checks if the desired Thread object is within the same jail.
As for Thread::for_each_* methods, currently nothing in the kernel
codebase needs iteration with consideration for jails, so the old
Thread::for_each* were simply renamed to include "ignoring_jails" suffix
in their names.
Some syscalls could be simplified by using the non-static method
Process::get_thread_from_thread_list which should ensure that the
specified tid is of a Thread in the same Process of the current Thread.
The Kernel/API directory in general shouldn't include userspace code,
but structure definitions that both are shared between the Kernel and
userspace.
All users of the ioctl API obviously use LibC so LibC is the most common
and shared library for the affected programs.
The Kernel/API directory in general shouldn't include userspace code,
but structure definitions that both are shared between the Kernel and
userspace.
LibC is the most appropriate place for these methods as they're already
included in the sys/sysmacros.h file to create a set of convenient
macros for these methods.
Userspace initially didn't have any sort of mechanism to handle
device hotplug (either removing or inserting a device).
This meant that after a short term of scanning all known devices, by
fetching device events (DeviceEvent packets) from /dev/devctl, we
basically never try to read it again after SystemServer initialization
code.
To accommodate hotplug needs, we change SystemServer by ensuring it will
generate a known set of device nodes at their location during the its
main initialization code. This includes devices like /dev/mem, /dev/zero
and /dev/full, etc.
The actual responsible userspace program to handle hotplug events is a
new userspace program called DeviceMapper, with following key points:
- Its current task is to to constantly read the /dev/devctl device node.
Because we already created generic devices, we only handle devices
that are dynamically-generated in nature, like storage devices, audio
channels, etc.
- Since dynamically-generated device nodes could have an infinite minor
numbers, but major numbers are decoded to a device type, we create an
internal registry based on two structures - DeviceNodeFamily, and
RegisteredDeviceNode. DeviceNodeFamily objects are attached in the
main logic code, when handling a DeviceEvent device insertion packet.
A DeviceNodeFamily object has an internal HashTable to hold objects of
RegisteredDeviceNode class.
- Because some device nodes could still share the same major number (TTY
and serial TTY devices), we have two modes of allocation - limited
allocation (so a range is defined for a major number), or infinite
range. Therefore, two (or more) separate DeviceNodeFamily objects can
can exist albeit sharing the same major number, but they are required
to allocate from a different minor numbers' range to ensure there are
no collisions.
- As for KCOV, we handle this device differently. In case the user
compiled the kernel with such support - this happens to be a singular
device node that we usually don't need, so it's dynamically-generated
too, and because it has only one instance, we don't register it in our
internal registry to not make it complicated needlessly.
The Kernel code is modified to allow proper blocking in case of no
events in the DeviceControlDevice class, because otherwise we will need
to poll periodically the device to check if a new event is available,
which would waste CPU time for no good reason.
This is an initial implementation, about as basic as intended by the
RFC, and not configurable from userspace at the moment. It should reduce
the amount of low-sized packets sent, reducing overhead and thereby
network traffic.
The name for this directory is a bit awkward. Also, the distinction of
constant information is not really valuable as I thought it would be, so
let's bring that information back into the /sys/kernel directory.
The name "variables" is a bit awkward and what the directory entries are
really about is kernel configuration so let's make it clear with the new
name.
These options are not relevant and are actually meaningless on pure TTY
devices, as they are meant to be effective only for the VirtualConsole
devices.
This also removes the virtual marking from two methods because they're
no longer declared in the TTY class as well.
These syscalls are not necessary on their own, and they give the false
impression that a caller could set or get the thread name of any process
in the system, which is not true.
Therefore, move the functionality of these syscalls to be options in the
prctl syscall, which makes it abundantly clear that these operations
could only occur from a running thread in a process that sees other
threads in that process only.
This new Kernel StdLib function will be used to copy contents of a
FixedStringBuffer with a null character to a user process.
The first user of this new function is the prctl option of
PR_GET_PROCESS_NAME which would copy a process name including a null
character to a user provided buffer.
When doing PR_{SET,GET}_PROCESS_NAME, it's not expected to pass a signed
integer for the buffer size (in arg2). Therefore, cast it immediately to
a size_t integer type, and let the FixedStringBuffer StdLib memory copy
functions in such cases to worry about possible overflows.
There was a small mishmash of argument order, as seen on the table:
| Traits<T>::equals(U, T) | Traits<T>::equals(T, U)
============= | ======================= | =======================
uses equals() | HashMap | Vector, HashTable
defines equals() | *String[^1] | ByteBuffer
[^1]: String, DeprecatedString, their Fly-type equivalents and KString.
This mostly meant that you couldn't use a StringView for finding a value
in Vector<String>.
I'm changing the order of arguments to make the trait type itself first
(`Traits<T>::equals(T, U)`), as I think it's more expected and makes us
more consistent with the rest of the functions that put the stored type
first (like StringUtils functions and binary_serach). I've also renamed
the variable name "other" in find functions to "entry" to give more
importance to the value.
With this change, each of the following lines will now compile
successfully:
Vector<String>().contains_slow("WHF!"sv);
HashTable<String>().contains("WHF!"sv);
HashMap<ByteBuffer, int>().contains("WHF!"sv.bytes());
As "\n" is translated to "\r\n" in TTYs, the condition for a write
to succeed on a pseudoterminal should check if the underlying buffer
has 2 bytes empty rather than 1.
Fixes SerenityOS#18888
This patch ensures that the shutdown procedure can complete due to the
fact we don't kill kernel processes anymore, and only stop the scheduler
from running after the filesystems unmount procedure.
We also need kernel processes during the shutdown procedure, because we
rely on the WorkQueue threads to run WorkQueue items to complete async
IO requests initiated by filesystem sync & unmounting, etc.
This is also simplifying the code around the killing processes, because
we don't need to worry about edge cases such as the FinalizerTask
anymore.
The process could be long gone by the point the async IO request has
completed so hold a weak reference pointer to the requesting Process and
try get a strong reference only when needed.
This patch is necessary because otherwise async IO requests can hold
Process objects long after they were terminated, which would make it
impossible to perform certain tasks in the system, like killing all user
processes during the shutdown procedure.
We first must flush the superblock through the BlockBasedFileSystem
methods properly and only then clear the DiskCache pointer, to prevent a
possible kernel panic due to nullptr dereference.
Previously we would set the KeyCode correctly to the appropriate
extended keys values, like Home and End, but keep the code point of the
original keys, like 1, 2, 3, etc. Because of this, the keys would just
print the original keys, instead of behaving like the extended ones.
This is a minimal set of changes to allow `serenity.sh build riscv64` to
successfully generate the build environment and start building. This
includes some, but not all, assembly stubs that will be needed later on;
they are currently empty.
Shadow doorbell feature was added in the NVMe spec to improve
the performance of virtual devices.
Typically, ringing a doorbell involves writing to an MMIO register in
QEMU, which can be expensive as there will be a trap for the VM.
Shadow doorbell mechanism was added for the VM to communicate with the
OS when it needs to do an MMIO write, thereby avoiding it when it is
not necessary.
There is no performance improvement with this support in Serenity
at the moment because of the block layer constraint of not batching
multiple IOs. Once the command batching support is added to the block
layer, shadow doorbell support can improve performance by avoiding many
MMIO writes.
Default to old MMIO mechanism if shadow doorbell is not supported.
Introduce a new Struct Doorbell that encapsulates the mmio doorbell
register.
This commit does not introduce any functional changes and it is added
in preparation to adding shadow doorbell support.
This was the root cause of zombie processes showing up randomly and
disappearing after some disk activity, such as running shell commands -
The NVMeIO AsyncBlockDeviceRequest member simply held a pointer to a
Process object, therefore it could keep it alive a for a long time after
it ceased to actually function at all.
While LLD and mold support RELR "packed" relocations on all
architectures, the BFD linker currently only implements them on x86-64
and POWER.
This fixes two issues:
- The Kernel had it enabled even for AArch64 + GCC, which led to the
following being printed: `warning: -z pack-relative-relocs ignored`.
- The userland always had it disabled, even in the supported AArch64 +
Clang/mold scenarios.
Two non-functional changes:
- Remove pointless `-latomic` flag. It was specified via
`add_compile_options`, which only affects compilation and not linking,
so the library was never actually linked into the kernel. In fact, we
do not even build `libatomic` for our toolchain.
- Do not disable `-Wnonnull`. The warning-causing code was fixed at some
point.
This commit also removes `-mstrict-align` from the userland. Our target
AArch64 hardware natively supports unaligned accesses without a
significant performance penalty. Allowing the compiler to insert
unaligned accesses into aligned-as-written code allows for some
performance optimizations in fact. We keep this option turned on in the
kernel to preserve correctness for MMIO, as that might be sensitive to
alignment.
Add the device ID for PCI serial port cards that use the WCH CH351
chip. This device has been tested with real hardware where the serial
debug output could succesfully be received.
Now that support for 32-bit x86 has been removed, we don't have to worry
about the top half of `off_t`/`u64` values being chopped off when we try
to pass them in registers. Therefore, we no longer need the workaround
of pointers to stack-allocated values to syscalls.
Note that this changes the system call ABI, so statically linked
programs will have to be re-linked.
Using the kernel stack is preferable, especially when the examined
strings should be limited to a reasonable length.
This is a small improvement, because if we don't actually move these
strings then we don't need to own heap allocations for them during the
syscall handler function scope.
In addition to that, some kernel strings are known to be limited, like
the hostname string, for these strings we also can use FixedStringBuffer
to store and copy to and from these buffers, without using any heap
allocations at all.
Instead, use the FixedCharBuffer class to ensure we always use a static
buffer storage for these names. This ensures that if a Process or a
Thread were created, there's a guarantee that setting a new name will
never fail, as only copying of strings should be done to that static
storage.
The limits which are set are 32 characters for processes' names and 64
characters for thread names - this is because threads' names could be
more verbose than processes' names.
This class encapsulates a fixed Array with compile-time size definition
for storing ASCII characters.
There are also new Kernel StdLib functions to copy user data into such
objects so this class will be useful later on.
Previously we could get a raw pointer to a Mount object which might be
invalid when actually dereferencing it.
To ensure this could not happen, we should just use a callback that will
be used immediately after finding the appropriate Mount entry, while
holding the mount table lock.
We don't really need this method anymore, because we could just try to
find the mount entry based on the given mount point host custody.
This also allows us to remove the is_vfs_root and root_inode_id methods
from the VirtualFileSystem class.
We could easily encounter a case where we do the following:
```
mkdir -p /tmp2
mount /dev/hda /tmp2
```
would produce a bug that doing `ls /tmp2/tmp2` will give the contents
on `/dev/hda` ext2 root directory and also on `/tmp2/tmp2/tmp2` and so
on.
To prevent this, we must compare the current custody against each mount
entry's custody to ensure their paths match.
This is not useful, as we have literally zero knowledge about where this
inode is actually located at with respect to the entire global path tree
so we could easily encounter a case where we do the following:
```
mkdir -p /tmp2
mount /dev/hda /tmp2
```
and when traversing the /tmp2 directory entries, we will see the root
inode of /dev/hda on "/tmp2/tmp2", even if it was not mounted.
Therefore, we should just plainly give the raw directory entries as they
are written "on the disk". Anything else that needs to exactly know if
there's an underlying mounted filesystem, can just use the stat syscall
instead.
This ensures that the host mount point custody path is not the same like
the new to-be-mounted custody.
A scenario that could happen before adding this check is:
```
mkdir -p /tmp2
mount /dev/hda /tmp2/
mount /dev/hda /tmp2/
mount /dev/hda /tmp2/ # this will fail here
```
and after adding this check, the following scenario is now this:
```
mkdir -p /tmp2
mount /dev/hda /tmp2/
mount /dev/hda /tmp2/ # this will fail here
mount /dev/hda /tmp2/ # this will fail here too
```