This function is currently only ever used to create the init process
(SystemServer). It had a few idiosyncratic things about it that this
patch cleans up:
- Errors were returned in an int& out-param.
- It had a path for non-0 process PIDs which was never taken.
REQUIRE_PROMISE and REQUIRE_NO_PROMISES were macros for some reason,
and used all over the place.
This patch adds require_promise(Pledge) and require_no_promises()
to Process and makes the macros call these on the current process
instead of inlining code everywhere.
Prior to this change, both uid_t and gid_t were typedef'ed to `u32`.
This made it easy to use them interchangeably. Let's not allow that.
This patch adds UserID and GroupID using the AK::DistinctNumeric
mechanism we've already been employing for pid_t/ProcessID.
There are callers of processes().with or processes().for_each that
require interrupts to be disabled. Taking a Mutexe with interrupts
disabled is a recipe for deadlock, so convert this to a Spinlock.
This has several benefits:
1) We no longer just blindly derefence a null pointer in various places
2) We will get nicer runtime error messages if the current process does
turn out to be null in the call location
3) GCC no longer complains about possible nullptr dereferences when
compiling without KUBSAN
We are not using this for anything and it's just been sitting there
gathering dust for well over a year, so let's stop carrying all this
complexity around for no good reason.
This makes the following scenario impossible with an SMP setup:
1) CPU A enters unref() and decrements the link count to 0.
2) CPU B sees the process in the process list and ref()s it.
3) CPU A removes the process from the list and continues destructing.
4) CPU B is now holding a destructed Process object.
By holding the process list lock before doing anything with it, we
ensure that other CPUs can't find this process in the middle of it being
destructed.
This allows us to 1) let go of the Process when an inode is ref'ing for
ProcFSExposedComponent related reasons, and 2) change our ref/unref
implementation.
This patch removes KResult::operator int() and deals with the fallout.
This forces a lot of code to be more explicit in its handling of errors,
greatly improving readability.
The only two paths for copying strings in the kernel should be going
through the existing Userspace<char const*>, or StringArgument methods.
Lets enforce this by removing the option for using the raw cstring APIs
that were previously available.
The compiler can re-order the structure (class) members if that's
necessary, so if we make Process to inherit from ProcFSExposedComponent,
even if the declaration is to inherit first from ProcessBase, then from
ProcFSExposedComponent and last from Weakable<Process>, the members of
class ProcFSExposedComponent (including the Ref-counted parts) are the
first members of the Process class.
This problem made it impossible to safely use the current toggling
method with the write-protection bit on the ProcessBase members, so
instead of inheriting from it, we make its members the last ones in the
Process class so we can safely locate and modify the corresponding page
write protection bit of these values.
We make sure that the Process class doesn't expand beyond 8192 bytes and
the protected values are always aligned on a page boundary.
I had to move the thread list out of the protected base area of Process
so that it could live with its lock (which needs to be mutable).
Ideally it would live in the protected area, so maybe we can figure out
a way to do that later.
This isn't needed for Process / Thread as they only reference it
by pointer and it's already part of Kernel/Forward.h. So just include
it where the implementation needs to call it.
The way the Process::FileDescriptions::allocate() API works today means
that two callers who allocate back to back without associating a
FileDescription with the allocated FD, will receive the same FD and thus
one will stomp over the other.
Naively tracking which FileDescriptions are allocated and moving onto
the next would introduce other bugs however, as now if you "allocate"
a fd and then return early further down the control flow of the syscall
you would leak that fd.
This change modifies this behavior by tracking which descriptions are
allocated and then having an RAII type to "deallocate" the fd if the
association is not setup the end of it's scope.
This enables further work on implementing KASLR by adding relocation
support to the pre-kernel and updating the kernel to be less dependent
on specific virtual memory layouts.
GCC and Clang allow us to inject a call to a function named
__sanitizer_cov_trace_pc on every edge. This function has to be defined
by us. By noting down the caller in that function we can trace the code
we have encountered during execution. Such information is used by
coverage guided fuzzers like AFL and LibFuzzer to determine if a new
input resulted in a new code path. This makes fuzzing much more
effective.
Additionally this adds a basic KCOV implementation. KCOV is an API that
allows user space to request the kernel to start collecting coverage
information for a given user space thread. Furthermore KCOV then exposes
the collected program counters to user space via a BlockDevice which can
be mmaped from user space.
This work is required to add effective support for fuzzing SerenityOS to
the Syzkaller syscall fuzzer. :^) :^)
This adds a ".profile" extension to perfcore files written by the
Kernel. Also, the process name is now visible in the perfcore filename.
Furthermore, this patch adds error handling for the case where the
filename generated by the Kernel is already taken. In that case, a digit
will be added to the filename (before the extension).
This also adds some more error logging to dump_perfcore().
This implements a simple bootloader that is capable of loading ELF64
kernel images. It does this by using QEMU/GRUB to load the kernel image
from disk and pass it to our bootloader as a Multiboot module.
The bootloader then parses the ELF image and sets it up appropriately.
The kernel's entry point is a C++ function with architecture-native
code.
Co-authored-by: Liav A <liavalb@gmail.com>
We leak a ref() onto every user process when constructing them,
either via Process::create_user_process(), or via Process::sys$fork().
This ref() is balanced by a corresponding unref() in
Thread::WaitBlockCondition::finalize().
Since kernel processes don't have a leaked ref() on them, this led to
an extra Process::unref() on kernel processes during finalization.
This happened during every boot, with the `init_stage2` process.
Found by turning off kfree() scrubbing. :^)
It's possible that another thread might try to exit the process just
about the same time another thread does the same, or a crash happens.
Also, we may not be able to kill all other threads instantly as they
may be blocked in the kernel (though in this case they would get killed
before ever returning back to user mode. So keep track of whether
Process::die was already called and ignore it on subsequent calls.
Fixes#8485
This enables the Lock class to block a thread even while the thread is
working on a BlockCondition. A thread can still only be either blocked
by a Lock or a BlockCondition.
This also establishes a linked list of threads that are blocked by a
Lock and unblocking directly unlocks threads and wakes them directly.
This re-arranges the order of how things are initialized so that we
try to initialize process and thread management earlier. This is
neccessary because a lot of the code uses the Lock class, which really
needs to have a running scheduler in place so that we can properly
preempt.
This also enables us to potentially initialize some things in parallel.
The ProtectedDataMutationScope cannot blindly assume that there is only
exactly one thread at a time that may want to unprotect the Process.
Most of the time the big lock guaranteed this, but there are some cases
such as finalization (among others) where this is not necessarily
guaranteed.
This fixes random panics due to access violations when the
ProtectedDataMutationScope protects the Process instance while another
is still modifying it.
Fixes#8512
The new ProcFS design consists of two main parts:
1. The representative ProcFS class, which is derived from the FS class.
The ProcFS and its inodes are much more lean - merely 3 classes to
represent the common type of inodes - regular files, symbolic links and
directories. They're backed by a ProcFSExposedComponent object, which
is responsible for the functional operation behind the scenes.
2. The backend of the ProcFS - the ProcFSComponentsRegistrar class
and all derived classes from the ProcFSExposedComponent class. These
together form the entire backend and handle all the functions you can
expect from the ProcFS.
The ProcFSExposedComponent derived classes split to 3 types in the
manner of lifetime in the kernel:
1. Persistent objects - this category includes all basic objects, like
the root folder, /proc/bus folder, main blob files in the root folders,
etc. These objects are persistent and cannot die ever.
2. Semi-persistent objects - this category includes all PID folders,
and subdirectories to the PID folders. It also includes exposed objects
like the unveil JSON'ed blob. These object are persistent as long as the
the responsible process they represent is still alive.
3. Dynamic objects - this category includes files in the subdirectories
of a PID folder, like /proc/PID/fd/* or /proc/PID/stacks/*. Essentially,
these objects are always created dynamically and when no longer in need
after being used, they're deallocated.
Nevertheless, the new allocated backend objects and inodes try to use
the same InodeIndex if possible - this might change only when a thread
dies and a new thread is born with a new thread stack, or when a file
descriptor is closed and a new one within the same file descriptor
number is opened. This is needed to actually be able to do something
useful with these objects.
The new design assures that many ProcFS instances can be used at once,
with one backend for usage for all instances.
The types for asm_signal_trampoline and asm_signal_trampoline_end
were incorrect. They both point into the text segment but they're
not really functions.
This commit converts naked `new`s to `AK::try_make` and `AK::try_create`
wherever possible. If the called constructor is private, this can not be
done, so we instead now use the standard-defined and compiler-agnostic
`new (nothrow)`.
This adds just enough stubs to make the kernel compile on x86_64. Obviously
it won't do anything useful - in fact it won't even attempt to boot because
Multiboot doesn't support ELF64 binaries - but it gets those compiler errors
out of the way so more progress can be made getting all the missing
functionality in place.
When profiling a single process we didn't disable the profile timer.
enable_profile_timer()/disable_profiler_timer() support nested calls
so no special care has to be taken here to only disable the timer when
nobody else is using it.
By constraining two implementations, the compiler will select the best
fitting one. All this will require is duplicating the implementation and
simplifying for the `void` case.
This constraining also informs both the caller and compiler by passing
the callback parameter types as part of the constraint
(e.g.: `IterationFunction<int>`).
Some `for_each` functions in LibELF only take functions which return
`void`. This is a minimal correctness check, as it removes one way for a
function to incompletely do something.
There seems to be a possible idiom where inside a lambda, a `return;` is
the same as `continue;` in a for-loop.
This change looks more involved than it actually is. This simply
reshuffles the previous Process constructor and splits out the
parts which can fail (resource allocation) into separate methods
which can be called from a factory method. The factory is then
used everywhere instead of the constructor.
Modify the API so it's possible to propagate error on OOM failure.
NonnullOwnPtr<T> is not appropriate for the ThreadTracer::create() API,
so switch to OwnPtr<T>, use adopt_own_if_nonnull() to handle creation.
GCC with -flto is more aggressive when it comes to inlining and
discarding functions which is why we must mark some of the functions
as NEVER_INLINE (because they contain asm labels which would be
duplicated in the object files if the compiler decides to inline
the function elsewhere) and __attribute__((used)) for others so
that GCC doesn't discard them.
This turns the perfcore format into more a log than it was before,
which lets us properly log process, thread and region
creation/destruction. This also makes it unnecessary to dump the
process' regions every time it is scheduled like we did before.
Incidentally this also fixes 'profile -c' because we previously ended
up incorrectly dumping the parent's region map into the profile data.
Log-based mmap support enables profiling shared libraries which
are loaded at runtime, e.g. via dlopen().
This enables profiling both the parent and child process for
programs which use execve(). Previously we'd discard the profiling
data for the old process.
The Profiler tool has been updated to not treat thread IDs as
process IDs anymore. This enables support for processes with more
than one thread. Also, there's a new widget to filter which
process should be displayed.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
While profiling all processes the profile buffer lives forever.
Once you have copied the profile to disk, there's no need to keep it
in memory. This syscall surfaces the ability to clear that buffer.
Alot of code is shared between i386/i686/x86 and x86_64
and a lot probably will be used for compatability modes.
So we start by moving the headers into one Directory.
We will probalby be able to move some cpp files aswell.