Commit Graph

791 Commits

Author SHA1 Message Date
Liav A
5e062414c1 Kernel: Add support for jails
Our implementation for Jails resembles much of how FreeBSD jails are
working - it's essentially only a matter of using a RefPtr in the
Process class to a Jail object. Then, when we iterate over all processes
in various cases, we could ensure if either the current process is in
jail and therefore should be restricted what is visible in terms of
PID isolation, and also to be able to expose metadata about Jails in
/sys/kernel/jails node (which does not reveal anything to a process
which is in jail).

A lifetime model for the Jail object is currently plain simple - there's
simpy no way to manually delete a Jail object once it was created. Such
feature should be carefully designed to allow safe destruction of a Jail
without the possibility of releasing a process which is in Jail from the
actual jail. Each process which is attached into a Jail cannot leave it
until the end of a Process (i.e. when finalizing a Process). All jails
are kept being referenced in the JailManagement. When a last attached
process is finalized, the Jail is automatically destroyed.
2022-11-05 18:00:58 -06:00
kleines Filmröllchen
b8567d7a9d Kernel: Make scheduler control syscalls more generic
The syscalls are renamed as they no longer reflect the exact POSIX
functionality. They can now handle setting/getting scheduler parameters
for both threads and processes.
2022-10-27 11:30:19 +01:00
Idan Horowitz
4ce326205e Kernel: Stop verifying interrupts are disabled in Process::for_each
This is a left-over from back when we didn't have any locking on the
global Process list, nor did we have SMP support, so this acted as some
kind of locking mechanism. We now have proper locks around the Process
list, so this is no longer relevant.
2022-08-27 21:54:13 +03:00
Andreas Kling
d3e8eb5918 Kernel: Make file-backed memory regions remember description permissions
This allows sys$mprotect() to honor the original readable & writable
flags of the open file description as they were at the point we did the
original sys$mmap().

IIUC, this is what Dr. POSIX wants us to do:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/mprotect.html

Also, remove the bogus and racy "W^X" checking we did against mappings
based on their current inode metadata. If we want to do this, we can do
it properly. For now, it was not only racy, but also did blocking I/O
while holding a spinlock.
2022-08-24 14:57:51 +02:00
Andreas Kling
cf16b2c8e6 Kernel: Wrap process address spaces in SpinlockProtected
This forces anyone who wants to look into and/or manipulate an address
space to lock it. And this replaces the previous, more flimsy, manual
spinlock use.

Note that pointers *into* the address space are not safe to use after
you unlock the space. We've got many issues like this, and we'll have
to track those down as wlel.
2022-08-24 14:57:51 +02:00
Samuel Bowman
91574ed677 Kernel: Fix boot profiling
Boot profiling was previously broken due to init_stage2() passing the
event mask to sys$profiling_enable() via kernel pointer, but a user
pointer is expected.

To fix this, I added Process::profiling_enable() as an alternative to
Process::sys$profiling_enable which takes a u64 rather than a
Userspace<u64 const*>. It's a bit of a hack, but it works.
2022-08-23 11:48:50 +02:00
Anthony Iacono
ec3d8a7a18 Kernel: Remove unused Process::in_group() 2022-08-23 01:01:48 +02:00
Anthony Iacono
f86b671de2 Kernel: Use Process::credentials() and remove user ID/group ID helpers
Move away from using the group ID/user ID helpers in the process to
allow for us to take advantage of the immutable credentials instead.
2022-08-22 12:46:32 +02:00
Andreas Kling
8ed06ad814 Kernel: Guard Process "protected data" with a spinlock
This ensures that both mutable and immutable access to the protected
data of a process is serialized.

Note that there may still be multiple TOCTOU issues around this, as we
have a bunch of convenience accessors that make it easy to introduce
them. We'll need to audit those as well.
2022-08-21 12:25:14 +02:00
Andreas Kling
728c3fbd14 Kernel: Use RefPtr instead of LockRefPtr for Custody
By protecting all the RefPtr<Custody> objects that may be accessed from
multiple threads at the same time (with spinlocks), we remove the need
for using LockRefPtr<Custody> (which is basically a RefPtr with a
built-in spinlock.)
2022-08-21 12:25:14 +02:00
Andreas Kling
122d7d9533 Kernel: Add Credentials to hold a set of user and group IDs
This patch adds a new object to hold a Process's user credentials:

- UID, EUID, SUID
- GID, EGID, SGID, extra GIDs

Credentials are immutable and child processes initially inherit the
Credentials object from their parent.

Whenever a process changes one or more of its user/group IDs, a new
Credentials object is constructed.

Any code that wants to inspect and act on a set of credentials can now
do so without worrying about data races.
2022-08-20 18:32:50 +02:00
Andreas Kling
bec314611d Kernel: Move InodeMetadata methods out of line 2022-08-20 17:20:44 +02:00
Andreas Kling
11eee67b85 Kernel: Make self-contained locking smart pointers their own classes
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:

- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable

This patch renames the Kernel classes so that they can coexist with
the original AK classes:

- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable

The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
2022-08-20 17:20:43 +02:00
kleines Filmröllchen
4314c25cf2 Kernel: Require lock rank for Spinlock construction
All users which relied on the default constructor use a None lock rank
for now. This will make it easier to in the future remove LockRank and
actually annotate the ranks by searching for None.
2022-08-19 20:26:47 -07:00
Linus Groh
146903a3b5 Kernel: Require semicolon after VERIFY_{NO_,}PROCESS_BIG_LOCK_ACQUIRED
This matches out general macro use, and specifically other verification
macros like VERIFY(), VERIFY_NOT_REACHED(), VERIFY_INTERRUPTS_ENABLED(),
and VERIFY_INTERRUPTS_DISABLED().
2022-08-17 22:56:51 +02:00
Andreas Kling
b6d0636656 Kernel: Don't leak file descriptors in sys$pipe()
If the final copy_to_user() call fails when writing the file descriptors
to the output array, we have to make sure the file descriptors don't
remain in the process file descriptor table. Otherwise they are
basically leaked, as userspace is not aware of them.

This matches the behavior of our sys$socketpair() implementation.
2022-08-16 20:35:32 +02:00
zzLinus
ca74443012 Kernel/LibC: Implement posix syscall clock_getres() 2022-07-25 15:33:50 +02:00
Idan Horowitz
9db10887a1 Kernel: Clean up sys$futex and add support for cross-process futexes 2022-07-21 16:39:22 +02:00
Idan Horowitz
55c7496200 Kernel: Propagate OOM conditions out of sys$futex 2022-07-21 16:39:22 +02:00
Hendiadyoin1
d783389877 Kernel+LibC: Add posix_fallocate syscall 2022-07-15 12:42:43 +02:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
gggggg-gggggg
d728017578 Kernel+LibC+LibCore: Pass fcntl extra argument as pointer-sized variable
The extra argument to fcntl is a pointer in the case of F_GETLK/F_SETLK
and we were pulling out a u32, leading to pointer truncation on x86_64.
Among other things, this fixes Assistant on x86_64 :^)
2022-07-10 20:09:11 +02:00
Tim Schumacher
cf0ad3715e Kernel: Implement sigsuspend using a SignalBlocker
`sigsuspend` was previously implemented using a poll on an empty set of
file descriptors. However, this broke quite a few assumptions in
`SelectBlocker`, as it verifies at least one file descriptor to be
ready after waking up and as it relies on being notified by the file
descriptor.

A bare-bones `sigsuspend` may also be implemented by relying on any of
the `sigwait` functions, but as `sigsuspend` features several (currently
unimplemented) restrictions on how returns work, it is a syscall on its
own.
2022-07-08 22:27:38 +00:00
Tim Schumacher
add4dd3589 Kernel: Do a POSIX-correct signal handler reset on exec 2022-07-05 20:58:38 +03:00
Andrew Kaster
940be19259 Kernel: Create /proc/pid/cmdline to expose process arguments in procfs
In typical serenity style, they are just a JSON array
2022-06-19 09:05:35 +02:00
Ariel Don
9a6bd85924 Kernel+LibC+VFS: Implement utimensat(3)
Create POSIX utimensat() library call and corresponding system call to
update file access and modification times.
2022-05-21 18:15:00 +02:00
MacDue
d951e2ca97 Kernel: Add /proc/{pid}/children to ProcFS
This exposes the child processes for a process as a directory
of symlinks to the respective /proc entries for each child.

This makes for an easier and possibly more efficient way
to find and count a process's children. Previously the only
method was to parse the entire /proc/all JSON file.
2022-05-06 02:12:51 +04:30
sin-ack
bc7c8879c5 Kernel+LibC+LibCore: Implement the unlinkat(2) syscall 2022-04-23 10:43:32 -07:00
Luke Wilde
1682b0b6d8 Kernel: Remove big lock from sys$set_coredump_metadata
The only requirement for this syscall is to make
Process::m_coredump_properties SpinlockProtected.
2022-04-09 21:51:16 +02:00
Jelle Raaijmakers
7826729ab2 Kernel: Track big lock blocked threads in separate list
When we lock a mutex, eventually `Thread::block` is invoked which could
in turn invoke `Process::big_lock().restore_exclusive_lock()`. This
would then try to add the current thread to a different blocked thread
list then the one in use for the original mutex being locked, and
because it's an intrusive list, the thread is removed from its original
list during the `.append()`. When the original mutex eventually
unblocks, we no longer have the thread in the intrusive blocked threads
list and we panic.

Solve this by making the big lock mutex special and giving it its own
blocked thread list. Because the process big lock is temporary and is
being actively removed from e.g. syscalls, it's a matter of time before
we can also remove the fix introduced by this commit.

Fixes issue #9401.
2022-04-06 18:27:19 +02:00
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Ali Mohammad Pur
8233da3398 Kernel: Add a 'no_error' pledge promise
This makes pledge() ignore promises that would otherwise cause it to
fail with EPERM, which is very useful for allowing programs to run under
a "jail" so to speak, without having them termiate early due to a
failing pledge() call.
2022-03-26 21:34:56 +04:30
Liav A
b5ef900ccd Kernel: Don't assume paths of TTYs and pseudo terminals anymore
The obsolete ttyname and ptsname syscalls are removed.
LibC doesn't rely on these anymore, and it helps simplifying the Kernel
in many places, so it's an overall an improvement.

In addition to that, /proc/PID/tty node is removed too as it is not
needed anymore by userspace to get the attached TTY of a process, as
/dev/tty (which is already a character device) represents that as well.
2022-03-22 20:26:05 +01:00
int16
256744ebdf Kernel: Make mmap validation functions return ErrorOr<void> 2022-03-22 12:20:19 +01:00
int16
4b96d9c813 Kernel: Move mmap validation functions to Process 2022-03-22 12:20:19 +01:00
Andreas Kling
580d89f093 Kernel: Put Process unveil state in a SpinlockProtected container
This makes path resolution safe to perform without holding the big lock.
2022-03-08 00:19:49 +01:00
Andreas Kling
24f02bd421 Kernel: Put Process's current directory in a SpinlockProtected
Also let's call it "current_directory" instead of "cwd" everywhere.
2022-03-08 00:19:49 +01:00
Ali Mohammad Pur
e14e919b78 Kernel: Fill some siginfo and ucontext fields on SA_SIGINFO
There's no reason to fill in any of these fields if SA_SIGINFO is not
given, as the signal handler won't be reading from them at all.
2022-03-04 20:07:05 +01:00
Ali Mohammad Pur
cf63447044 Kernel: Move signal handlers from being thread state to process state
POSIX requires that sigaction() and friends set a _process-wide_ signal
handler, so move signal handlers and flags inside Process.
This also fixes a "pid/tid confusion" FIXME, as we can now send the
signal to the process and let that decide which thread should get the
signal (which is the thread with tid==pid, but that's now the Process's
problem).
Note that each thread still retains its signal mask, as that is local to
each thread.
2022-03-04 20:07:05 +01:00
Lucas CHOLLET
839d3d9f74 Kernel: Add getrusage() syscall
Only the two timeval fields are maintained, as required by the POSIX
standard.
2022-02-28 20:09:37 +01:00
Idan Horowitz
feb00b7105 Everywhere: Make JSON serialization fallible
This allows us to eliminate a major source of infallible allocation in
the Kernel, as well as lay down the groundwork for OOM fallibility in
userland.
2022-02-27 20:37:57 +01:00
Idan Horowitz
064b93c2ad Kernel: Add OpenFileDescriptions::try_enumerate for fallible iteration
This API will allow users to short circuit iteration and properly
propagate errors.
2022-02-27 20:37:57 +01:00
Idan Horowitz
aca6c0c031 Kernel: Add Process::try_for_each_thread() for fallible iteration
This API will allow users to short circuit iteration and properly
propagate errors.
2022-02-27 20:37:57 +01:00
Jakub Berkop
895a050e04 Kernel: Fixed argument passing for profiling_enable syscall
Arguments larger than 32bit need to be passed as a pointer on a 32bit
architectures. sys$profiling_enable has u64 event_mask argument,
which means that it needs to be passed as an pointer. Previously upper
32bits were filled by garbage.
2022-02-19 11:37:02 +01:00
Idan Horowitz
316fa0c3f3 AK+Kernel: Specialize Trie for NNOP<KString> and use it in UnveilNode
This let's us avoid the infallible String allocations.
2022-02-16 22:21:37 +01:00
Ali Mohammad Pur
a1cb2c371a AK+Kernel: OOM-harden most parts of Trie
The only part of Unveil that can't handle OOM gracefully is the
String::formatted() use in the node metadata.
2022-02-15 18:03:02 +02:00
Jakub Berkop
4916c892b2 Kernel/Profiling: Add profiling to read syscall
Syscalls to read can now be profiled, allowing us to monitor
filesystem usage by different applications.
2022-02-14 11:38:13 +01:00
Idan Horowitz
c8ab7bde3b Kernel: Use try_make_weak_ptr() instead of make_weak_ptr() 2022-02-13 23:02:57 +01:00
Andrew Kaster
b4a7d148b1 Kernel: Expose maximum argument limit in sysconf
Move the definitions for maximum argument and environment size to
Process.h from execve.cpp. This allows sysconf(_SC_ARG_MAX) to return
the actual argument maximum of 128 KiB to userspace.
2022-02-13 22:06:54 +02:00
Idan Horowitz
e28af4a2fc Kernel: Stop using HashMap in Mutex
This commit removes the usage of HashMap in Mutex, thereby making Mutex
be allocation-free.

In order to achieve this several simplifications were made to Mutex,
removing unused code-paths and extra VERIFYs:
 * We no longer support 'upgrading' a shared lock holder to an
   exclusive holder when it is the only shared holder and it did not
   unlock the lock before relocking it as exclusive. NOTE: Unlike the
   rest of these changes, this scenario is not VERIFY-able in an
   allocation-free way, as a result the new LOCK_SHARED_UPGRADE_DEBUG
   debug flag was added, this flag lets Mutex allocate in order to
   detect such cases when debugging a deadlock.
 * We no longer support checking if a Mutex is locked by the current
   thread when the Mutex was not locked exclusively, the shared version
   of this check was not used anywhere.
 * We no longer support force unlocking/relocking a Mutex if the Mutex
   was not locked exclusively, the shared version of these functions
   was not used anywhere.
2022-01-29 16:45:39 +01:00