We previously allowed Thread to exist in a state where its m_name was
null, and had to work around that in various places.
This patch removes that possibility and forces those who would create a
thread (or change the name of one) to provide a NonnullOwnPtr<KString>
with the name.
- Renamed try_create_absolute_path() => try_serialize_absolute_path()
- Use KResultOr and TRY() to propagate errors
- Don't call this when it's only for debug logging
- Use KResultOr and TRY() to propagate errors
- Check for OOM errors
- Move allocation out of constructors
There's still a lot more to do here, as SysFS is still quite brittle
in the face of memory pressure.
This expands the reach of error propagation greatly throughout the
kernel. Sadly, it also exposes the fact that we're allocating (and
doing other fallible things) in constructors all over the place.
This patch doesn't attempt to address that of course. That's work for
our future selves.
The default template argument is only used in one place, and it
looks like it was probably just an oversight. The rest of the Kernel
code all uses u8 as the type. So lets make that the default and remove
the unused template argument, as there doesn't seem to be a reason to
allow the size to be customizable.
Instead of checking it at every call site (to generate EBADF), we make
file_description(fd) return a KResultOr<NonnullRefPtr<FileDescription>>.
This allows us to wrap all the calls in TRY(). :^)
The only place that got a little bit messier from this is sys$mount(),
and there's a whole bunch of things there in need of cleanup.
Try to do both FD allocations up front instead of interleaved between
assigning them to the descriptor table. This prevents us from failing
in the middle of setting up the pipes.
This patch adds release_error() and release_value() to KResult, making
it usable with TRY().
Note that release_value() returns void, since there is no value inside
a KResult.
This commit moves the KResult and KResultOr objects to Kernel/API to
signify that they may now be freely used by userspace code at points
where a syscall-related error result is to be expected. It also exposes
KResult and KResultOr to the global namespace to make it nicer to use
for userspace code.
This is the idiomatic way to declare type aliases in modern C++.
Flagged by Sonar Cloud as a "Code Smell", but I happen to agree
with this particular one. :^)
This function is currently only ever used to create the init process
(SystemServer). It had a few idiosyncratic things about it that this
patch cleans up:
- Errors were returned in an int& out-param.
- It had a path for non-0 process PIDs which was never taken.
REQUIRE_PROMISE and REQUIRE_NO_PROMISES were macros for some reason,
and used all over the place.
This patch adds require_promise(Pledge) and require_no_promises()
to Process and makes the macros call these on the current process
instead of inlining code everywhere.
According to the VirtIO 1.0 specification:
"Non-transitional devices SHOULD have a PCI Device ID in the range
0x1040 to 0x107f. Non-transitional devices SHOULD have a PCI Revision ID
of 1 or higher. Non-transitional devices SHOULD have a PCI Subsystem
Device ID of 0x40 or higher."
It also says that:
"Transitional devices MUST have a PCI Revision ID of 0. Transitional
devices MUST have the PCI Subsystem Device ID matching the Virtio
Device ID, as indicated in section 5. Transitional devices MUST have the
Transitional PCI Device ID in the range 0x1000 to 0x103f."
So, for legacy devices, we know that revision ID in the PCI header won't
be 1, so we probe for PCI_SUBSYSTEM_ID value.
Instead of using the subsystem device ID, we can probe the DEVICE_ID
value directly in case it's not a legacy device.
This should cover all possibilities for identifying VirtIO devices, both
per the specification of 0.9.5, and future revisions from 1.0 onwards.
This ensures we safely handle interrupts (which can call virtual
functions), so they don't happen in the constructor - this pattern can
lead to a crash, if we are still in the constructor context because
not all methods are available for usage (some are pure virtual,
so it's possible to call __cxa_pure_virtual).
Also, under some conditions like adding a PCI device via PCI-passthrough
mechanism in QEMU, it became exposed to the eye that the code asserts on
RNG::handle_device_config_change(). That device has no configuration but
if the hypervisor still misbehaves and tries to configure it, we should
simply return false to indicate nothing happened.
Like with the ProcFS, description data can change at anytime, so it's
wise to ensure that when the userland reads from an Inode, data is
consistent unless the userland indicated it wants to refresh the data
(by seeking to offset 0, or re-attaching the Inode).
Otherwise, if the data changes in the middle of the reading, it can
cause silent corruption in output which can lead to random crashes.
This makes calling value() on a temporary KResultOr be a compile-time
error. This exposed a number of missing error checks (fixed in the
preceding commits.)
Our existing implementation did not check the element type of the other
pointer in the constructors and move assignment operators. This meant
that some operations that would require explicit casting on raw pointers
were done implicitly, such as:
- downcasting a base class to a derived class (e.g. `Kernel::Inode` =>
`Kernel::ProcFSDirectoryInode` in Kernel/ProcFS.cpp),
- casting to an unrelated type (e.g. `Promise<bool>` => `Promise<Empty>`
in LibIMAP/Client.cpp)
This, of course, allows gross violations of the type system, and makes
the need to type-check less obvious before downcasting. Luckily, while
adding the `static_ptr_cast`s, only two truly incorrect usages were
found; in the other instances, our casts just needed to be made
explicit.
And also try_create<T> => try_make_ref_counted<T>.
A global "create" was a bit much. The new name matches make<T> better,
which we've used for making single-owner objects since forever.
NetworkOrdered is a non trivial type, and it's undefined behavior to
cast a random pointer to it and then pretend it's that type.
Instead just call AK::convert_between_host_and_network_endian on the
individual u16*. This suppresses static analysis warnings.
I don't think there was a "bug" in the previous code, it worked, but
it was very brittle.
This leads to a bad pattern where anyone could create an RNG or a
Console object. Instead, let's just use the common pattern of a static
method to instantiate a new object and return it wrapped by a
NonnullRefPtr.
Before of this change, many specific classes to VirtIO were in the
Kernel namespace, which polluted it.
Everything should be more organized now, but there's still room for
improvement later.
This class member was used only to determine the device type when
printing messages to the debug log. Instead, remove this class member,
and add a quick way to find the device type according to how the VirtIO
specification says to do that.
This simplifies construction of VirtIODevices a bit, because now the
constructor doesn't need to ask for a String identified with the device
type.
This class as a CharacterDevice really was not useful, because you
couldn't even read from it.
Also, the random number generator interface should be the /dev/random,
so any other interface to get random numbers is generally not a good
idea.
Instead, let's keep this functionality as an entropy source for random
numbers generation, but without exposing a device node.
We now expose the `USBDevice`'s address in the SysFS object. This means
that device addresses are no longer determined by the name of the file
in the `/bus/usb/` directory. This was an incorrect way of determining
device address, as a standard PC can have multiple USB controllers
(and hence multiple buses) that can have overlapping device IDs.
Previous implementation sometimes didn't release the key after pressing
and holding shift due to repeating key updates when holding keys. This
meant repeating updates would set/unset `m_both_shift_keys_pressed`
repeatedly, sometimes resulting in shift still being considered pressed
even after you released it.
Simplify left and right shift key pressed logic by tracking both key
states separately and always updating modifiers based on them.
Initializing the variable this way fixes a kernel panic in Clang where
the object was zero-initialized, so the `m_in_scheduler` contained the
wrong value. GCC got it right, but we're better off making this change,
as leaving uninitialized fields in constant-initialized objects can
cause other weird situations like this. Also, initializing only a single
field to a non-zero value isn't worth the cost of no longer fitting in
`.bss`.
Another two variables suffer from the same problem, even though their
values are supposed to be zero. Removing these causes the
`_GLOBAL_sub_I_` function to no longer be generated and the (not
handled) `.init_array` section to be omitted.
This avoids a race between getting the processor-specific SchedulerData
and accessing it. (Switching to a different CPU in that window means
that we're operating on the wrong SchedulerData.)
Co-authored-by: Tom <tomut@yahoo.com>
Prior to this change, both uid_t and gid_t were typedef'ed to `u32`.
This made it easy to use them interchangeably. Let's not allow that.
This patch adds UserID and GroupID using the AK::DistinctNumeric
mechanism we've already been employing for pid_t/ProcessID.
Add a dummy Arch/aarch64/boot.S that for now does nothing but
let all processor cores sleep.
For now, none of the actual Prekernel code is built for aarch64.
This should help prevent deadlocks where a thread blocks on a Mutex
while interrupts are disabled, and makes it impossible for the holder of
the Mutex to make forward progress because it cannot be scheduled in.
Hide it behind a new debug macro LOCK_IN_CRITICAL_DEBUG for now, because
Ext2FS takes a series of Mutexes from the page fault handler, which
executes with interrupts disabled.
Previously, we would try to acquire a reference to the all processes
lock or other contended resources while holding both the scheduler lock
and the thread's blocker lock. This could lead to a deadlock if we
actually have to block on those other resources.
There are callers of processes().with or processes().for_each that
require interrupts to be disabled. Taking a Mutexe with interrupts
disabled is a recipe for deadlock, so convert this to a Spinlock.
There's no need to disable interrupts when trying to access an inode's
shared vmobject. Additionally, Inode::shared_vmobject() acquires a Mutex
which is a recipe for deadlock if acquired with interrupts disabled.
Two new ioctl requests are used to get and set the sample rate of the
sound card. The SB16 device keeps track of the sample rate separately,
because I don't want to figure out how to read the sample rate from the
device; it's easier that way.
The soundcard write doesn't set the sample rate to 44100 Hz every time
anymore, as we want to change it externally.
In order to use VirtualAddresses as compile time constants in the
AddressSanitizer implementation, we need to be able to use these
methods in constexpr functions / variable initializations.
The pattern of having Prekernel inherit all of the build flags of the
Kernel, and then disabling some flags by adding `-fno-<flag>` options
to then disable those options doesn't work in all scenarios. For example
the ASAN flag `-fasan-shadow-offset=<offset>` has no option to disable
it once it's been passed, so in a future change where this flag is added
we need to be able to disable it cleanly.
The cleaner way is to just allow the Prekernel CMake logic to filter out
the COMPILE_OPTIONS specified for that specific target. This allows us
to remove individual options without trashing all inherited options.
We have seen cases where the map fails, but we return the region
to the caller, causing them to page fault later on when they touch
the region.
The fix is to always observe the return code of map/remap.
Let's use an RAII helper to avoid having to update this on every path
out of block().
Note that this extends the time under `m_in_block == true` by a little
but that should be harmless.
The `m_should_block` member variable that many of the Thread::Blocker
subclasses had was really only used to carry state from the constructor
to the immediate-unblock-without-blocking escape hatch.
This patch refactors the blockers so that we don't need to hold on
to this flag after setup_blocker(), and instead the return value from
setup_blocker() is the authority on whether the unblock conditions
are already met.
This was previously used after construction to check for early unblock
conditions that couldn't be communicated from the constructor.
Now that we've moved early unblock checks from the constructor into
setup_blocker(), we don't need should_block() anymore.
Instead of registering with blocker sets and whatnot in the various
Blocker subclass constructors, this patch moves such initialization
to a separate setup_blocker() virtual.
setup_blocker() returns false if there's no need to actually block
the thread. This allows us to bail earlier in Thread::block().
Same deal as WaitQueueBlocker, we can get the blocked thread from
Blocker::thread() now, so there's no need to register the current
thread as custom data.
When adding a WaitQueueBlocker to a WaitQueue, it stored the blocked
thread in the registration's custom "void* data" slot.
This was only used to print the Thread* in some debug logging.
Now that Blocker always knows its origin Thread, we can simply add
a Blocker::thread() accessor and then get the blocked Thread& from
there. No need to register custom data.
There's no harm in the blocker always knowing which thread it originated
from. It also simplifies some logic since we don't need to think about
it ever being null.
The BlockerSet stores its blockers along with a "void* data" that may
contain some blocker-specific context relevant to the specific blocker
registration (for example, SelectBlocker stores a pointer to the
relevant entry in an array of SelectBlocker::FDInfo structs.)
When unregistering a blocker from a set, we don't need to key the
blocker by both the Blocker* and the data. Just the Blocker* is enough,
since all registrations for that blocker need to be removed anyway as
the blocker is about to be destroyed.
So we stop passing the "void* data" to BlockerSet::remove_blocker(),
which also allows us to remove the now-unneeded Blocker::m_block_data.
Namely, will_unblock_immediately_without_blocking(Reason).
This virtual function is called on a blocker *before any block occurs*,
if it turns out that we don't need to block the thread after all.
This can happens for one of two reasons:
- UnblockImmediatelyReason::UnblockConditionAlreadyMet
We don't need to block the thread because the condition for
unblocking it is already met.
- UnblockImmediatelyReason::TimeoutInThePast
We don't need to block the thread because a timeout was specified
and that timeout is already in the past.
This patch does not introduce any behavior changes, it's only meant to
clarify this part of the blocking logic.
Now that the old PCI::Device was removed, we can complete the PCI
changes by making the PCI::DeviceController to be named PCI::Device.
Really the entire purpose and the distinction between the two was about
interrupts, but since this is no longer a problem, just rename it to
simplify things further.
I created this class a long time ago just to be able to quickly make a
PCI device to also represent an interrupt handler (because PCI devices
have this capability for most devices).
Then after a while I introduced the PCI::DeviceController, which is
really almost the same thing (a PCI device class that has Address member
in it), but is not tied to interrupts so it can have no interrupts, or
spawn interrupt handlers however it wants to seems fit.
However I decided it's time to say goodbye for this class for
a couple of reasons:
1. It made a whole bunch of weird patterns where you had a PCI::Device
and a PCI::DeviceController being used in the topic of implementation,
where originally, they meant to be used mutually exclusively (you
can't and really don't want to use both).
2. We can really make all the classes that inherit from PCI::Device
to inherit from IRQHandler at this point. Later on, when we have MSI
interrupts support, we can go further and untie things even more.
3. It makes it possible to simplify the VirtIO implementation to a great
extent. While this commit almost doesn't change it, future changes
can untangle some complexity in the VirtIO code.
For UHCIController, E1000NetworkAdapter, NE2000NetworkAdapter,
RTL8139NetworkAdapter, RTL8168NetworkAdapter, E1000ENetworkAdapter we
are simply making them to inherit the IRQHandler. This makes some sense,
because the first 3 devices will never support anything besides IRQs.
For the last 2, they might have MSI support, so when we start to utilize
those, we might need to untie these classes from IRQHandler and spawn
IRQHandler(s) or MSIHandler(s) as needed.
The VirtIODevice class is also a case where we currently need to use
both PCI::DeviceController and IRQHandler classes as parents, but it
could also be untied from the latter.
Namely, unblock_all_blockers_whose_conditions_are_met().
The old name made it sound like things were getting unblocked no matter
what, but that's not actually the case.
What this actually does is iterate through the set of blockers,
unblocking those whose conditions are met. So give it a (very) verbose
name that errs on the side of descriptiveness.
This was only ever called immediately after FutexQueue::try_remove()
to VERIFY() that the state looks exactly like it should after returning
from try_remove().
By the time we end up destroying a BlockerSet, we don't need to take
the internal spinlock. And nobody else should be holding it either.
So replace the SpinlockLocker with a VERIFY().
The quickmap_page() and unquickmap_page() functions are used to map a
single physical page at a kernel virtual address for temporary access.
These use the per-CPU quickmap buffer in the page tables, and access to
this is guarded by the MM lock. To prevent bugs, quickmap_page() should
not *take* the MM lock, but rather verify that it is already held!
This exposed two situations where we were using quickmap without holding
the MM lock during page fault handling. This patch is forced to fix
these issues (which is great!) :^)
Don't assume that a platform machine will provide at least the 32 bit
version of SMBIOS tables. If there's no SMBIOS tables, don't expose
directory entries in the /sys/bios/ directory.
This patch removes the MutexContendedResource<T> helper class,
and MutexProtected<T> no longer inherits from T.
Instead, MutexProtected<T> simply has a T and a Mutex.
The LockedResource<T, LockMode> helper class is made a private nested
class in MutexProtected.
This has several benefits:
1) We no longer just blindly derefence a null pointer in various places
2) We will get nicer runtime error messages if the current process does
turn out to be null in the call location
3) GCC no longer complains about possible nullptr dereferences when
compiling without KUBSAN
This is primarily to be able to remove the GenericLexer include out of
Format.h as well. A subsequent commit will add AK::Result to
GenericLexer, which will cause naming conflicts with other structures
named Result. This can be avoided (for now) by preventing nearly every
file in the system from implicitly including GenericLexer.
Other changes in this commit are to add the GenericLexer include to
files where it is missing.
The previous version of this was pretty bad and caused a lot of
odd behevaiour to occur. We now abstract a lot of the allocation
behind a `template`d pool class that handles all of the memory
allocation.
The number of UHCI related files is starting to expand to the point
where it's best if we move this into their own subdirectory. It'll
also make it easier to manage when we decide to add some more
controller types (whenever that may be)
...in a few more places, at least.
find(1) is about to start relying on the reported types more or less
reflecting reality. This is especially relevant for magic symlinks
in ProcFS.
The `-z,text` linker flag causes the linker to reject shared libraries
and PIE executables that have textrels. Our code mostly did not use
these except in one place in LibC, which is changed in this commit.
This makes GNU ld match LLD's behavior, which has this option enabled by
default.
TEXTRELs pose a security risk, as performing these relocations require
executable pages to be written to by the dynamic linker. This can
significantly weaken W^X hardening mitigations.
Note that after this change, TEXTRELs can still be used in ports, as the
dynamic loader code is not changed. There are also uses of it in the
kernel, removing which are outside the scope of this PR. To allow those,
`-z,notext` is added.
This typo / bug in the Traits<T> implementation for StringView caused
AK::HashMap methods to return a `String` when looking up values out of
a hash map of type HashTable<StringView,StringView>.
This change fixes the typo, and fixes the only consumer, the kernel
Commandline class.
We don't need to access the Custody cache in IRQs or anything like that,
so it should be fine to use a regular Mutex (via ProtectedValue.)
This allows threads to block while waiting for the custody cache.
Thanks to Sergey for pointing this out. :^)
Usage patterns mean we are more likely to need a Custody we just cached.
Because lookup walks the list from the beginning, prepending new items
instead of appending means they will be found quicker.
This reduces the number of items in the cache we need to walk by 50% for
boot and application startups.
The VMObject class now manages its own instance list (it was previously
a member of MemoryManager.) Removal from the list is done safely on the
last unref(), closing a race window in the previous implementation.
Note that VMObject::all_instances() now has its own lock instead of
using the global MM lock.
Use a StringBuilder to generate a temporary string buffer with the
slave PTY names. (This works because StringBuilder has 128 bytes of
inline stack capacity before it does any heap allocations.)
This simplifies the DevPtsFS implementation somewhat, as it no longer
requires SlavePTY to register itself with it, since it can now simply
use the list of SlavePTY instances.
This class implements the emerging "ref-counted object that participates
in a lock-protected list that requires safe removal on unref()" pattern
as a base class that can be inherited in place of RefCounted<T>.
Make File inherit from RefCountedBase and provide a custom unref()
implementation. This will allow subclasses that participate in lists to
remove themselves in a safe way when being destroyed.
This patch adds a (spinlock-protected) custody cache. It's a simple
intrusive list containing all currently live custody objects.
This allows us to re-use existing custodies instead of creating them
again and again.
This gives a pretty decent performance improvement on "find /" :^)
We don't need to create a new string from a number in order to compare
an existing string to that number. Converting the existing string to a
number is much cheaper, since it does not require any heap allocations.
Ran into this while profiling "find /" :^)
There's no need for generated files in SysFS to tell you their precise
file size when you stat() them.
I noticed when profiling "find /" that we were spending a chunk of time
generating and throwing away SysFS content just so we could tell you
exactly how large it would be. :^)
Previously when trying to debug the system's routing, the ARP
information would clutter the output and make it difficult to focus on
the routing decisions. It would be better to specify these
debug messages under ARP_DEBUG.
This makes for nicer handling of errors compared to checking whether a
RefPtr is null. Additionally, this will give way to return different
types of errors in the future.