Because we were holding a strong ref to the OpenFileDescription in
LocalSocket and a strong ref to the LocalSocket in Inode, we were
creating a reference cycle in the event of the socket being cleaned up
after the file description did (i.e. unlinking the file before closing
the socket), because the file description never got destructed.
The TimeWait state is intended to prevent another socket from taking the
address tuple in case any packets are still in transit after the final
close. Since this state never delivers packets to userspace, it doesn't
make sense to keep the receive buffer around.
Replace the old logic where we would start with a host build, and swap
all the CMake compiler and target variables underneath it to trick
CMake into building for Serenity after we configured and built the Lagom
code generators.
The SuperBuild creates two ExternalProjects, one for Lagom and one for
Serenity. The Serenity project depends on the install stage for the
Lagom build. The SuperBuild also generates a CMakeToolchain file for the
Serenity build to use that replaces the old toolchain file that was only
used for Ports.
To ensure that code generators are rebuilt when core libraries such as
AK and LibCore are modified, developers will need to direct their manual
`ninja` invocations to the SuperBuild's binary directory instead of the
Serenity binary directory.
This commit includes warning coalescing and option style cleanup for the
affected CMakeLists in the Kernel, top level, and runtime support
libraries. A large part of the cleanup is replacing USE_CLANG_TOOLCHAIN
with the proper CMAKE_CXX_COMPILER_ID variable, which will no longer be
confused by a host clang compiler.
There are a few violations with signal handling that I won't be able to
fix it until later this week. So lets put lock rank enforcement under a
debug option for now so other folks don't hit these crashes until rank
enforcement is more fleshed out.
The better fix is to have a linker script. We'll need this to set
the entry point to 0x80000 for bare-metal builds anyways. But I'd
like to get some UART output in qemu before I add this (otherwise
I can't check if the bare-metal version does anything), so put
in this temporary kludge for now.
Needed for functions that have local variables.
In time we need to share this between aarch64 and intel, but while
we figure out what exactly the aarch64 Prekernel should do, let's
duplicate this.
Else, function-local statics create calls to
__cxa_guard_acquire / __cxa_guard_release on aarch64, which we don't
(yet?) implement. Since Prekernel is single-threaded, just sidestep
that for now.
PVS-Studio flagged these as uninitialized. While there is no bug here,
it is our policy to always initialize members to avoid potential bugs
in the future.
This change removes the halt and reboot syscalls, and create a new
mechanism to change the power state of the machine.
Instead of how power state was changed until now, put a SysFS node as
writable only for the superuser, that with a defined value, can result
in either reboot or poweroff.
In the future, a power group can be assigned to this node (which will be
the GroupID responsible for power management).
This opens an opportunity to permit to shutdown/reboot without superuser
permissions, so in the future, a userspace daemon can take control of
this node to perform power management operations without superuser
permissions, if we enforce different UserID/GroupID on that node.
Both should reside in the SysFS firmware directory which is normally
located in /sys/firmware.
Also, apply some OOM-safety patterns when creating the BIOS and ACPI
directories.
This will somwhat help unify them also under the same SysFS directory in
the commit.
Also, it feels much more like this change reflects the reality that both
ACPI and the BIOS are part of the firmware on x86 computers.
These interfaces are broken for about 9 months, maybe longer than that.
At this point, this is just a dead code nobody tests or tries to use, so
let's remove it instead of keeping a stale code just for the sake of
keeping it and hoping someone will fix it.
To better justify this, I read that OpenBSD removed loadable kernel
modules in 5.7 release (2014), mainly for the same reason we do -
nobody used it so they had no good reason to maintain it.
Still, OpenBSD had LKMs being effectively working, which is not the
current state in our project for a long time.
An arguably better approach to minimize the Kernel image size is to
allow dropping drivers and features while compiling a new image.
I forgot that we need to also initialize SerialDevice and also to ensure
it creates a sysfs node properly. Although I had a better fix for this,
it keeps the CI happy, so for now it's more than enough :)
Instead of doing so in the constructor, let's do immediately after the
constructor, so we can safely pass a reference of a Device, so the
SysFSDeviceComponent constructor can use that object to identify whether
it's a block device or a character device.
This allows to us to not hold a device in SysFSDeviceComponent with a
RefPtr.
Also, we also call the before_removing method in both SlavePTY::unref
and File::unref, so because Device has that method being overrided, it
can ensure the device is removed always cleanly.
This function was checking 1 byte after the provided range, which caused
it to reject valid userspace ranges that happened to end exactly at the
top of the user address space.
This fixes a long-standing issue with mysterious Optional errors in
Coredump::write_regions(). (It happened when trying to add a memory
region at the very top of the address space to a coredump.)
Let's remove the DynamicParser class, as it really did nothing yet in
the Kernel. Instead, when we add support for AML parsing, we can figure
out how to do it properly without the need of a derived class that just
complicates everything for no good reason.
This variant of dbgputstr does not lock the global log lock, as it is
called before the current or any other processor was initialized,
meaning that:
A) The $gs base was not setup yet, so we cannot enter into critical
sections, and as a result we cannot use SpinLocks
B) No other processors may try to print at the same time anyway
This fixes a triple fault that occurs when compiling serenity with
the i686 clang toolchain. (The underlying issue is that the old inline
assembly did not specify that it clobbered the eax/ecx/edx registers
and as such the compiler assumed they were not changed and used their
values across it)
Co-authored-by: Brian Gianforcaro <bgianf@serenityos.org>
This was a mistake in the move away from KBuffer-as-a-value type.
We need to check `packet` here, not `packet->data`.
Regressed in b300f9aa2f.
Fixes#9888.
This patch converts all the usage of AK::String around sys$execve() to
using KString instead, allowing us to catch and propagate OOM errors.
It also required changing the kernel CommandLine helper class to return
a vector of KString for the userspace init program arguments.
This is a fix so the VirtIO code doesn't lead to assertion because we
try to determine the name based on the PCI values of the VirtIO device,
because trying to read from the PCI configuration space requires to
acquire a Mutex, which fails in an IRQ context.
To ensure we never encounter a situation when we call a pure virtual
function in an IRQ context, let's make class_name() method to be a
non-pure virtual function, so it can be still called at anytime.
It isn't needed.
Also, we stopped linking Kernel against it in 67f0c0d5f0. libsupc++
depends on symbols like free() or realloc() which we removed from
Kernel/StdLib.cpp after 67f0c0d5f0 and which don't exist in Prekernel
either.
(It also happens to make the aarc64 link fail in less obvious ways.)
This is really a basic support for AHCI hotplug events, so we know how
to add a node representing the device in /sys/dev/block and removing it
according to the event type (insertion/removal).
This change doesn't take into account what happens if the device was
mounted or a read/write operation is being handled.
For this to work correctly, StorageManagement now uses the Singleton
container, as it might be accessed simultaneously from many CPUs
for hotplug events. DiskPartition holds a WeakPtr instead of a RefPtr,
to allow removal of a StorageDevice object from the heap.
StorageDevices are now stored and being referenced to via an
IntrusiveList to make it easier to remove them on hotplug event.
In future changes, all of the stated above might change, but for now,
this commit represents the least amount of changes to make everything
to work correctly.
We are no longer have a separate Inode object class for the pts
directory. With a small exception to this, all chmod and chown code
is now at one place.
It's now possible to create any name of a sub-directory in the
filesystem.
The current implementation of DevFS resembles the linux devtmpfs, and
not the traditional DevFS, so let's rename it to better represent the
direction of the development in regard to this filesystem.
The abbreviation for DevTmpFS is still "dev", because it doesn't add
value as a commandline option to make it longer.
In quick summary - DevFS in unix OSes is simply a static filesystem, so
device nodes are generated and removed by the kernel code. DevTmpFS
is a "modern reinvention" of the DevFS, so it is much more like a TmpFS
in the sense that not only it's stored entirely in RAM, but the userland
is responsible to add and remove devices nodes as it sees fit, and no
kernel code is directly being involved to keep the filesystem in sync.
In order to make this kind of operation simpler, we no longer use a
Vector to store pointers to DevFSDeviceInode, but an IntrusiveList is
used instead. Also, we only allow to remove device nodes for now, but
in theory we can allow to remove all kinds of files from the DevFS.
These files are not marked as block devices or character devices so they
are not meant to be used as device nodes. The filenames are formatted to
the pattern "major:minor", but a Userland program need to call the parse
these format and inspect the the major and minor numbers and create the
real device nodes in /dev.
Later on, it might be a good idea to ensure we don't create new
SysFSComponents on the heap for each Device, but rather generate
them only when required (and preferably to not create a SysFSComponent
at all if possible).
Devices might be removed and inserted at anytime, so let's ensure we
always do these kind of operations with a good known state of the
HashMap.
The VirtIO code was modified to create devices outside the IRQ handler,
so now it works with the new locking of the devices singleton, but a
better approach might be needed later on.
These methods are no longer needed because SystemServer is able to
populate the DevFS on its own.
Device absolute_path no longer assume a path to the /dev location,
because it really should not assume any path to a Device node.
Because StorageManagement still needs to know the storage name, we
declare a virtual method only for StorageDevices to override, but this
technique should really be removed later on.
Don't create these device nodes in the Kernel, so we essentially enforce
userspace (SystemServer) to take control of this operation and to decide
how to create these device nodes.
This makes the DevFS to resemble linux devtmpfs, and allows us to remove
a bunch of unneeded overriding implementations of device name creation
in the Kernel.
TmpFS inodes rely on the call to Inode::one_ref_left() to unregister
themselves from the inode cache in TmpFS.
When moving various kernel classes to ListedRefCounted for safe unref()
while participating on lists, I forgot to make ListedRefCounted check
for (and call) one_ref_left() & will_be_destroyed() on the CRTP class.
This patch moves everything from KBufferImpl into KBuffer instead.
One layer of indirection is removed, and the whole thing is massively
simplified. :^)
This patch adds KBufferBuilder::try_create() and treats it like anything
else that can fail. And so, failure to allocate the initial internal
buffer of the builder will now propagate an ENOMEM to the caller. :^)
This was a weird KBuffer API that assumed failure was impossible.
This patch converts it to a modern KResultOr<NonnullOwnPtr<KBuffer>> API
and updates the two clients to the new style.
Sockets remember their last error code in the SO_ERROR field, so we need
to take special care to remember this when returning an error.
This patch adds a SOCKET_TRY() that works like TRY() but also calls
set_so_error() on the failure path.
There's probably a lot more code that should be using this, but that's
outside the scope of this patch.
We don't really have anywhere to propagate the error in NetworkTask at
the moment, since it runs in its own kernel thread and has no direct
userspace caller.
Make use of the new FileDescription::try_serialize_absolute_path() to
avoid String in favor of KString throughout much of sys$execve() and
its helpers.
A couple of things were changed:
1. Semantic changes - PCI segments are now called PCI domains, to better
match what they are really. It's also the name that Linux gave, and it
seems that Wikipedia also uses this name.
We also remove PCI::ChangeableAddress, because it was used in the past
but now it's no longer being used.
2. There are no WindowedMMIOAccess or MMIOAccess classes anymore, as
they made a bunch of unnecessary complexity. Instead, Windowed access is
removed entirely (this was tested, but never was benchmarked), so we are
left with IO access and memory access options. The memory access option
is essentially mapping the PCI bus (from the chosen PCI domain), to
virtual memory as-is. This means that unless needed, at any time, there
is only one PCI bus being mapped, and this is changed if access to
another PCI bus in the same PCI domain is needed. For now, we don't
support mapping of different PCI buses from different PCI domains at the
same time, because basically it's still a non-issue for most machines
out there.
2. OOM-safety is increased, especially when constructing the Access
object. It means that we pre-allocating any needed resources, and we try
to find PCI domains (if requested to initialize memory access) after we
attempt to construct the Access object, so it's possible to fail at this
point "gracefully".
3. All PCI API functions are now separated into a different header file,
which means only "clients" of the PCI subsystem API will need to include
that header file.
4. Functional changes - we only allow now to enumerate the bus after
a hardware scan. This means that the old method "enumerate_hardware"
is removed, so, when initializing an Access object, the initializing
function must call rescan on it to force it to find devices. This makes
it possible to fail rescan, and also to defer it after construction from
both OOM-safety terms and hotplug capabilities.
This change adds a static lock hierarchy / ranking to the Kernel with
the goal of reducing / finding deadlocks when running with SMP enabled.
We have seen quite a few lock ordering deadlocks (locks taken in a
different order, on two different code paths). As we properly annotate
locks in the system, then these facilities will find these locking
protocol violations automatically
The `LockRank` enum documents the various locks in the system and their
rank. The implementation guarantees that a thread holding one or more
locks of a lower rank cannot acquire an additional lock with rank that
is greater or equal to any of the currently held locks.
There are certain checks that we should skip if the system is crashing.
The system can avoid stack overflow during crash, or even triple
faulting while while handling issues that can causes recursive panics
or aborts.
I had to look at this for a moment before I realized that sys$execve()
and the spawning of /bin/SystemServer at boot are taking two different
paths out of exec().
Add a comment to help the next person looking at it. :^)
Before this patch, we were leaking a ref on the open file description
used for the interpreter (the dynamic loader) in sys$execve().
This surfaced when adapting the syscall to use TRY(), since we were now
correctly transferring ownership of the interpreter to Process::exec()
and no longer holding on to a local copy of it (in `elf_result`).
Fixing the leak uncovered another problem. The interpreter description
would now get destroyed when returning from do_exec(), which led to a
kernel panic when attempting to acquire a mutex.
This happens because we're in a particularly delicate state when
returning from do_exec(). Everything is primed for the upcoming context
switch into the new executable image, and trying to block the thread
at this point will panic the kernel.
We fix this by destroying the interpreter description earlier in
do_exec(), at the point where we no longer need it.
When executing a dynamically linked program, we need to pass the main
program executable via a file descriptor to the dynamic loader.
Before this patch, we were allocating an FD for this purpose long after
it was safe to do anything fallible. If we were unable to allocate an
FD we would simply panic the kernel(!)
We now hoist the allocation so it can fail before we've committed to
a new executable.
Due to the use of ELF::Image::for_each_program_header(), we were
previously unable to use TRY() in the ELF loading code (since the return
statement inside TRY() would only return from the iteration callback.)
Instead of creating a bunch of ByteBuffers and concatenating them to
generate the "notes" segment, we now simply create a KBufferBuilder
and tell each of the notes generator helpers to write into the builder.
This allows the code to flow more naturally, with some bonus additional
error propagation. :^)
Once we commit to a new executable image in sys$execve(), we can no
longer return with an error to whoever called us from userspace.
We must make sure to surface any potential errors before that point.
This patch moves signal trampoline allocation before the commit.
A number of other things remain to be moved.
We previously allowed Thread to exist in a state where its m_name was
null, and had to work around that in various places.
This patch removes that possibility and forces those who would create a
thread (or change the name of one) to provide a NonnullOwnPtr<KString>
with the name.
- Renamed try_create_absolute_path() => try_serialize_absolute_path()
- Use KResultOr and TRY() to propagate errors
- Don't call this when it's only for debug logging
- Use KResultOr and TRY() to propagate errors
- Check for OOM errors
- Move allocation out of constructors
There's still a lot more to do here, as SysFS is still quite brittle
in the face of memory pressure.
This expands the reach of error propagation greatly throughout the
kernel. Sadly, it also exposes the fact that we're allocating (and
doing other fallible things) in constructors all over the place.
This patch doesn't attempt to address that of course. That's work for
our future selves.
The default template argument is only used in one place, and it
looks like it was probably just an oversight. The rest of the Kernel
code all uses u8 as the type. So lets make that the default and remove
the unused template argument, as there doesn't seem to be a reason to
allow the size to be customizable.
Instead of checking it at every call site (to generate EBADF), we make
file_description(fd) return a KResultOr<NonnullRefPtr<FileDescription>>.
This allows us to wrap all the calls in TRY(). :^)
The only place that got a little bit messier from this is sys$mount(),
and there's a whole bunch of things there in need of cleanup.
Try to do both FD allocations up front instead of interleaved between
assigning them to the descriptor table. This prevents us from failing
in the middle of setting up the pipes.
This patch adds release_error() and release_value() to KResult, making
it usable with TRY().
Note that release_value() returns void, since there is no value inside
a KResult.