In order to reduce our reliance on __builtin_{ffs, clz, ctz, popcount},
this commit removes all calls to these functions and replaces them with
the equivalent functions in AK/BuiltinWrappers.h.
Not much to say here, this is an implementation of this call that
accesses the actual limit constant that's used by the VirtualFileSystem
class.
As a side note, this is required for my eventual Qt port.
As pointed out by BertalanD on Discord, POSIX specifies that
_SC_SYMLOOP_MAX (implemented in the following commit) always needs to be
equal or more than _POSIX_SYMLOOP_MAX (8, defined in
LibC/bits/posix1_lim.h), hence I've increased it to that value to
comply with the standard.
The move to header is required for the following commit - to make this
constant accessible outside of the VFS class, namely in sysconf.
For setreuid and setresuid syscalls, -1 means to set the current
uid/euid/gid/egid value, to be more convenient for programming.
However, for other syscalls where we pass only one argument, there's no
justification to specify -1.
This behavior is identical to how Linux handles the value -1, and is
influenced by the fact that the manual pages for the group of one
argument syscalls that handle ID operations is ambiguous about this
topic.
Unsurprisingly, the /proc/PID/stacks/TID stack walk had the same
arbitrary memory read problem as the perf event stack walk.
It would be nice if the kernel had a single stack walk implementation,
but that's outside the scope of this commit.
Since perfcore files can be generated during process finalization,
we can't just allow them to contain sensitive kernel information
if they're gonna be owned by the process's own UID+GID.
So instead, perfcores are now owned by 0:0. This is not the most
ergonomic solution, but I'm not sure what we could do to make it nicer.
We'll have to think more about that. In the meantime, this patches up
a kernel info leak. :^)
When walking the stack to generate a perf_event sample, we now check
if a userspace stack frame points back into kernel memory.
It was possible to use this as an arbitrary kernel memory read. :^)
We can leave the .ksyms section mapped-but-read-only and then have the
symbols index simply point into it.
Note that we manually insert null-terminators into the symbols section
while parsing it.
This gets rid of ~950 KiB of kmalloc_eternal() at startup. :^)
We used to build with -Os in order to fit within a certain size, but
there isn't really a good reason for that kind of restriction.
Switching to -O2 yields a significant improvement in throughput,
for example `test-js` is roughly 20% faster on my machine. :^)
This fixes at least half of our LibC includes in the kernel. The source
of truth for errno codes and their description strings now lives in
Kernel/API/POSIX/errno.h as an enumeration, which LibC includes.
Before this commit, we only checked the receive buffer on the socket,
which is unused on datagram streams. Now we return the actual size of
the datagram without the protocol headers, which required the protocol
to tell us what the size of the payload is.
This small change allows to use the IOAPIC by default without to enable
SMP mode, which emulates Uni-Processor setup with IOAPIC instead of
using the PIC.
This opens the opportunity to utilize other types of interrupts like MSI
and MSI-X interrupts.
The MADT data could be on unaligned boundary - for example, a GSI number
(u32) on unaligned address which leads to a KUBSAN error and halting the
system.
Using the phrase "create" doesn't give information on whether the object
must be allocated or a failure to do so can be handled gracefully.
Therefore, we must use better phrase for such purpose, so "must_create"
for the allocate-and-construct static methods is definitely good choice.
Instead, allocate before constructing the object and pass NonnullOwnPtr
of KString to the object if needed. Some classes can determine their
names as they have a known attribute to look for or have a static name.
- dump_backtrace was using ebp instead of rbp on x86_64, only using the
lower 32-bits of rbp.
- The symbol loader was only fetching half of the pointer from the
symbol table. (8 chars instead of 16 chars)
Since it's possible to determine where the small zones will start to
occur for each PhysicalRegion, we can use arithmetic so that the call
time for both large and small zones is identical.
This file refers to the controlling terminal associated with the current
process. It's specified by POSIX, and is used by ports like openssh to
interface with the terminal even if the standard input/output is
redirected to somewhere else.
Our implementation leverages ProcFS's existing facilities to create
process-specific symbolic links. In our setup, `/dev/tty` is a symbolic
link to `/proc/self/tty`, which itself is a symlink to the appropriate
`/dev/pts` entry. If no TTY is attached, `/dev/tty` is left dangling.
Now that the userland has a compatiblity wrapper for select(), the
kernel doesn't need to implement this syscall natively. The poll()
interface been around since 1987, any code still using select()
should be slapped silly.
Note: the SerenityOS source tree mostly uses select() and not poll()
despite SerenityOS having support for poll() since early 2019...
This includes a new Thread::Blocker called SignalBlocker which blocks
until a signal of a matching type is pending. The current Blocker
implementation in the Kernel is very complicated, but cleaning it up is
a different yak for a different day.
As required by posix. Also rename Thread::clear_signals to
Thread::reset_signals_for_exec since it doesn't actually clear any
pending signals, but rather does execve related signal book-keeping.
This option is already enabled when building Lagom, so let's enable it
for the main build too. We will no longer be surprised by Lagom Clang
CI builds failing while everything compiles locally.
Furthermore, the stronger `-Wsuggest-override` warning is enabled in
this commit, which enforces the use of the `override` keyword in all
classes, not just those which already have some methods marked as
`override`. This works with both GCC and Clang.
It's not enough to just find the largest-address-not-above the argument,
we must also check that the found region actually contains the argument.
Regressed in a23edd42b8, thanks to Idan
for pointing this out.
Most of the time, we will be freeing physical pages within the
full-sized zones. We can do some simple math to find the right zone
immediately instead of looping through the zones, checking each one.
We still do loop through the slack/remainder zones at the end.
There's probably an even nicer way to solve this, but this is already a
nice improvement. :^)
We were already doing this for userspace memory regions (in the
Memory::AddressSpace class), so let's do it for kernel regions as well.
This gives a nice speed-up on test-js and probably basically everything
else as well. :^)
There is no use to create a temporary String of a char const* to just
cast it to a StringView on SysFSComponent construction again.
Also this could have lead to a UAF bug.
If we're sending an urgent signal (i.e. due to unexpected conditions)
and the Process did not setup any signal handler, we should immediately
terminate the Thread, to ensure the current trap frame is preserved for
the impending core dump.
Returning 'result.error().code()' erroneously creates an
ErrorOr<FlatPtr> of the positive errno code, which breaks our
error-returning convention.
This seems to be due to a forgotten minus-sign during the refactoring in
9e51e295cf. This latent bug was never
discovered, because currently the error-handling paths are rarely
exercised.
Also, remove incomplete, superfluous check.
Incomplete, because only the byte at the provided address was checked;
this misses the last bytes of the "jerk page".
Superfluous, because it is already correctly checked by peek_user_data
(which calls copy_from_user).
The caller/tracer should not typically attempt to read non-userspace
addresses, we don't need to "hot-path" it either.
Creating pointers from arbitrary values is not a valid thing to do in
constexpr functions. Furthermore, this functions is always called with
runtime values anyways, so there's no use in having it be constexpr.
Instead, make it ALWAYS_INLINE.
Clang rejects binding a reference to a null pointer at compile-time.
Let's just crash explicitly, instead of waiting for a null dereference
to mess things up.
Since we're iterating over multiple regions that interesect with the
requested range, just one of them having the requested access flags
is not enough to finish the syscall early.
SIGSTKFLT is a signal that signifies a stack fault in a x87 coprocessor,
this signal is not POSIX and also unused by Linux and the BSDs, so let's
use SIGSEGV so programs that setup signal handlers for the common
signals could still handle them in serenity.
The advices are almost always exclusive of one another, and while POSIX
does not define madvise, most other unix-like and *BSD systems also only
accept a singular value per call.
In the continuous effort of better handling OOM in the kernel,
we want to move away from all AK::String usage. One of the final
pieces left to accomplish this goal is replacing all of the usages
of `String::formatted` with something that can actually propagate
failure.
The StringBuilder API was enhanced in the recent past to propagate
failure and thus a slightly modified version of what exists in
`AK::format` will work well for implementing failable format with
`KString`.
Previously, Virtio console ports would not show up in `/sys/dev/char/`.
Also adds support to `SystemServer` to create more than one console
port device in `/dev/` in the multiport case.
The fixes are:
1. Don't copy PCI::DeviceIdentifier during construction. This is a heavy
structure to copy so we definitely don't want to do that. Instead, use
a const reference to it like what happens in other parts in the Kernel.
2. Declare the constructor as explicit to avoid construction errors.
Previously we `VERIFY()`ed that the device supports variable-rate audio
(VRA). Now, we query the VRA bit and if VRA is not supported, we do not
enable double-rate audio and disallow setting any sample rate except
the fixed 48kHz rate as defined by the AC'97 specification. This should
allow the driver to function on a wider array of hardware.
Note that in the AC'97 specification, DRA without VRA is allowed when
supported: this effectively doubles the sample rate to 96kHZ. For now,
we ignore that possibility and let it default to 48kHZ.