Make use of the new FileDescription::try_serialize_absolute_path() to
avoid String in favor of KString throughout much of sys$execve() and
its helpers.
A couple of things were changed:
1. Semantic changes - PCI segments are now called PCI domains, to better
match what they are really. It's also the name that Linux gave, and it
seems that Wikipedia also uses this name.
We also remove PCI::ChangeableAddress, because it was used in the past
but now it's no longer being used.
2. There are no WindowedMMIOAccess or MMIOAccess classes anymore, as
they made a bunch of unnecessary complexity. Instead, Windowed access is
removed entirely (this was tested, but never was benchmarked), so we are
left with IO access and memory access options. The memory access option
is essentially mapping the PCI bus (from the chosen PCI domain), to
virtual memory as-is. This means that unless needed, at any time, there
is only one PCI bus being mapped, and this is changed if access to
another PCI bus in the same PCI domain is needed. For now, we don't
support mapping of different PCI buses from different PCI domains at the
same time, because basically it's still a non-issue for most machines
out there.
2. OOM-safety is increased, especially when constructing the Access
object. It means that we pre-allocating any needed resources, and we try
to find PCI domains (if requested to initialize memory access) after we
attempt to construct the Access object, so it's possible to fail at this
point "gracefully".
3. All PCI API functions are now separated into a different header file,
which means only "clients" of the PCI subsystem API will need to include
that header file.
4. Functional changes - we only allow now to enumerate the bus after
a hardware scan. This means that the old method "enumerate_hardware"
is removed, so, when initializing an Access object, the initializing
function must call rescan on it to force it to find devices. This makes
it possible to fail rescan, and also to defer it after construction from
both OOM-safety terms and hotplug capabilities.
This change adds a static lock hierarchy / ranking to the Kernel with
the goal of reducing / finding deadlocks when running with SMP enabled.
We have seen quite a few lock ordering deadlocks (locks taken in a
different order, on two different code paths). As we properly annotate
locks in the system, then these facilities will find these locking
protocol violations automatically
The `LockRank` enum documents the various locks in the system and their
rank. The implementation guarantees that a thread holding one or more
locks of a lower rank cannot acquire an additional lock with rank that
is greater or equal to any of the currently held locks.
There are certain checks that we should skip if the system is crashing.
The system can avoid stack overflow during crash, or even triple
faulting while while handling issues that can causes recursive panics
or aborts.
I had to look at this for a moment before I realized that sys$execve()
and the spawning of /bin/SystemServer at boot are taking two different
paths out of exec().
Add a comment to help the next person looking at it. :^)
Before this patch, we were leaking a ref on the open file description
used for the interpreter (the dynamic loader) in sys$execve().
This surfaced when adapting the syscall to use TRY(), since we were now
correctly transferring ownership of the interpreter to Process::exec()
and no longer holding on to a local copy of it (in `elf_result`).
Fixing the leak uncovered another problem. The interpreter description
would now get destroyed when returning from do_exec(), which led to a
kernel panic when attempting to acquire a mutex.
This happens because we're in a particularly delicate state when
returning from do_exec(). Everything is primed for the upcoming context
switch into the new executable image, and trying to block the thread
at this point will panic the kernel.
We fix this by destroying the interpreter description earlier in
do_exec(), at the point where we no longer need it.
When executing a dynamically linked program, we need to pass the main
program executable via a file descriptor to the dynamic loader.
Before this patch, we were allocating an FD for this purpose long after
it was safe to do anything fallible. If we were unable to allocate an
FD we would simply panic the kernel(!)
We now hoist the allocation so it can fail before we've committed to
a new executable.
Due to the use of ELF::Image::for_each_program_header(), we were
previously unable to use TRY() in the ELF loading code (since the return
statement inside TRY() would only return from the iteration callback.)
Instead of creating a bunch of ByteBuffers and concatenating them to
generate the "notes" segment, we now simply create a KBufferBuilder
and tell each of the notes generator helpers to write into the builder.
This allows the code to flow more naturally, with some bonus additional
error propagation. :^)
Once we commit to a new executable image in sys$execve(), we can no
longer return with an error to whoever called us from userspace.
We must make sure to surface any potential errors before that point.
This patch moves signal trampoline allocation before the commit.
A number of other things remain to be moved.
We previously allowed Thread to exist in a state where its m_name was
null, and had to work around that in various places.
This patch removes that possibility and forces those who would create a
thread (or change the name of one) to provide a NonnullOwnPtr<KString>
with the name.
- Renamed try_create_absolute_path() => try_serialize_absolute_path()
- Use KResultOr and TRY() to propagate errors
- Don't call this when it's only for debug logging
- Use KResultOr and TRY() to propagate errors
- Check for OOM errors
- Move allocation out of constructors
There's still a lot more to do here, as SysFS is still quite brittle
in the face of memory pressure.
This expands the reach of error propagation greatly throughout the
kernel. Sadly, it also exposes the fact that we're allocating (and
doing other fallible things) in constructors all over the place.
This patch doesn't attempt to address that of course. That's work for
our future selves.
The default template argument is only used in one place, and it
looks like it was probably just an oversight. The rest of the Kernel
code all uses u8 as the type. So lets make that the default and remove
the unused template argument, as there doesn't seem to be a reason to
allow the size to be customizable.
Instead of checking it at every call site (to generate EBADF), we make
file_description(fd) return a KResultOr<NonnullRefPtr<FileDescription>>.
This allows us to wrap all the calls in TRY(). :^)
The only place that got a little bit messier from this is sys$mount(),
and there's a whole bunch of things there in need of cleanup.
Try to do both FD allocations up front instead of interleaved between
assigning them to the descriptor table. This prevents us from failing
in the middle of setting up the pipes.
This patch adds release_error() and release_value() to KResult, making
it usable with TRY().
Note that release_value() returns void, since there is no value inside
a KResult.
This commit moves the KResult and KResultOr objects to Kernel/API to
signify that they may now be freely used by userspace code at points
where a syscall-related error result is to be expected. It also exposes
KResult and KResultOr to the global namespace to make it nicer to use
for userspace code.
This is the idiomatic way to declare type aliases in modern C++.
Flagged by Sonar Cloud as a "Code Smell", but I happen to agree
with this particular one. :^)
This function is currently only ever used to create the init process
(SystemServer). It had a few idiosyncratic things about it that this
patch cleans up:
- Errors were returned in an int& out-param.
- It had a path for non-0 process PIDs which was never taken.
REQUIRE_PROMISE and REQUIRE_NO_PROMISES were macros for some reason,
and used all over the place.
This patch adds require_promise(Pledge) and require_no_promises()
to Process and makes the macros call these on the current process
instead of inlining code everywhere.
According to the VirtIO 1.0 specification:
"Non-transitional devices SHOULD have a PCI Device ID in the range
0x1040 to 0x107f. Non-transitional devices SHOULD have a PCI Revision ID
of 1 or higher. Non-transitional devices SHOULD have a PCI Subsystem
Device ID of 0x40 or higher."
It also says that:
"Transitional devices MUST have a PCI Revision ID of 0. Transitional
devices MUST have the PCI Subsystem Device ID matching the Virtio
Device ID, as indicated in section 5. Transitional devices MUST have the
Transitional PCI Device ID in the range 0x1000 to 0x103f."
So, for legacy devices, we know that revision ID in the PCI header won't
be 1, so we probe for PCI_SUBSYSTEM_ID value.
Instead of using the subsystem device ID, we can probe the DEVICE_ID
value directly in case it's not a legacy device.
This should cover all possibilities for identifying VirtIO devices, both
per the specification of 0.9.5, and future revisions from 1.0 onwards.
This ensures we safely handle interrupts (which can call virtual
functions), so they don't happen in the constructor - this pattern can
lead to a crash, if we are still in the constructor context because
not all methods are available for usage (some are pure virtual,
so it's possible to call __cxa_pure_virtual).
Also, under some conditions like adding a PCI device via PCI-passthrough
mechanism in QEMU, it became exposed to the eye that the code asserts on
RNG::handle_device_config_change(). That device has no configuration but
if the hypervisor still misbehaves and tries to configure it, we should
simply return false to indicate nothing happened.
Like with the ProcFS, description data can change at anytime, so it's
wise to ensure that when the userland reads from an Inode, data is
consistent unless the userland indicated it wants to refresh the data
(by seeking to offset 0, or re-attaching the Inode).
Otherwise, if the data changes in the middle of the reading, it can
cause silent corruption in output which can lead to random crashes.
This makes calling value() on a temporary KResultOr be a compile-time
error. This exposed a number of missing error checks (fixed in the
preceding commits.)
Our existing implementation did not check the element type of the other
pointer in the constructors and move assignment operators. This meant
that some operations that would require explicit casting on raw pointers
were done implicitly, such as:
- downcasting a base class to a derived class (e.g. `Kernel::Inode` =>
`Kernel::ProcFSDirectoryInode` in Kernel/ProcFS.cpp),
- casting to an unrelated type (e.g. `Promise<bool>` => `Promise<Empty>`
in LibIMAP/Client.cpp)
This, of course, allows gross violations of the type system, and makes
the need to type-check less obvious before downcasting. Luckily, while
adding the `static_ptr_cast`s, only two truly incorrect usages were
found; in the other instances, our casts just needed to be made
explicit.
And also try_create<T> => try_make_ref_counted<T>.
A global "create" was a bit much. The new name matches make<T> better,
which we've used for making single-owner objects since forever.
NetworkOrdered is a non trivial type, and it's undefined behavior to
cast a random pointer to it and then pretend it's that type.
Instead just call AK::convert_between_host_and_network_endian on the
individual u16*. This suppresses static analysis warnings.
I don't think there was a "bug" in the previous code, it worked, but
it was very brittle.
This leads to a bad pattern where anyone could create an RNG or a
Console object. Instead, let's just use the common pattern of a static
method to instantiate a new object and return it wrapped by a
NonnullRefPtr.
Before of this change, many specific classes to VirtIO were in the
Kernel namespace, which polluted it.
Everything should be more organized now, but there's still room for
improvement later.
This class member was used only to determine the device type when
printing messages to the debug log. Instead, remove this class member,
and add a quick way to find the device type according to how the VirtIO
specification says to do that.
This simplifies construction of VirtIODevices a bit, because now the
constructor doesn't need to ask for a String identified with the device
type.
This class as a CharacterDevice really was not useful, because you
couldn't even read from it.
Also, the random number generator interface should be the /dev/random,
so any other interface to get random numbers is generally not a good
idea.
Instead, let's keep this functionality as an entropy source for random
numbers generation, but without exposing a device node.
We now expose the `USBDevice`'s address in the SysFS object. This means
that device addresses are no longer determined by the name of the file
in the `/bus/usb/` directory. This was an incorrect way of determining
device address, as a standard PC can have multiple USB controllers
(and hence multiple buses) that can have overlapping device IDs.
Previous implementation sometimes didn't release the key after pressing
and holding shift due to repeating key updates when holding keys. This
meant repeating updates would set/unset `m_both_shift_keys_pressed`
repeatedly, sometimes resulting in shift still being considered pressed
even after you released it.
Simplify left and right shift key pressed logic by tracking both key
states separately and always updating modifiers based on them.
Initializing the variable this way fixes a kernel panic in Clang where
the object was zero-initialized, so the `m_in_scheduler` contained the
wrong value. GCC got it right, but we're better off making this change,
as leaving uninitialized fields in constant-initialized objects can
cause other weird situations like this. Also, initializing only a single
field to a non-zero value isn't worth the cost of no longer fitting in
`.bss`.
Another two variables suffer from the same problem, even though their
values are supposed to be zero. Removing these causes the
`_GLOBAL_sub_I_` function to no longer be generated and the (not
handled) `.init_array` section to be omitted.
This avoids a race between getting the processor-specific SchedulerData
and accessing it. (Switching to a different CPU in that window means
that we're operating on the wrong SchedulerData.)
Co-authored-by: Tom <tomut@yahoo.com>
Prior to this change, both uid_t and gid_t were typedef'ed to `u32`.
This made it easy to use them interchangeably. Let's not allow that.
This patch adds UserID and GroupID using the AK::DistinctNumeric
mechanism we've already been employing for pid_t/ProcessID.
Add a dummy Arch/aarch64/boot.S that for now does nothing but
let all processor cores sleep.
For now, none of the actual Prekernel code is built for aarch64.
This should help prevent deadlocks where a thread blocks on a Mutex
while interrupts are disabled, and makes it impossible for the holder of
the Mutex to make forward progress because it cannot be scheduled in.
Hide it behind a new debug macro LOCK_IN_CRITICAL_DEBUG for now, because
Ext2FS takes a series of Mutexes from the page fault handler, which
executes with interrupts disabled.
Previously, we would try to acquire a reference to the all processes
lock or other contended resources while holding both the scheduler lock
and the thread's blocker lock. This could lead to a deadlock if we
actually have to block on those other resources.
There are callers of processes().with or processes().for_each that
require interrupts to be disabled. Taking a Mutexe with interrupts
disabled is a recipe for deadlock, so convert this to a Spinlock.