Commit Graph

699 Commits

Author SHA1 Message Date
Brian Gianforcaro
ede1483e48 Kernel: Make Process creation APIs OOM safe
This change looks more involved than it actually is. This simply
reshuffles the previous Process constructor and splits out the
parts which can fail (resource allocation) into separate methods
which can be called from a factory method. The factory is then
used everywhere instead of the constructor.
2021-05-15 09:01:32 +02:00
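A minimal sketch of the factory split described above, written in standard C++ rather than SerenityOS's AK types. The constructor does only infallible work, and a static factory runs the fallible allocation and returns nullptr instead of a half-built object. `try_create()`, `create_fd_table()` and the simulated memory cap are hypothetical names used for illustration only.

```c++
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a kernel object; not SerenityOS's Process class.
class Process {
public:
    // Factory: build the infallible part, then run each fallible step and
    // return nullptr (instead of a half-initialized object) on failure.
    static std::unique_ptr<Process> try_create(std::string name, std::size_t fd_table_size)
    {
        std::unique_ptr<Process> process(new (std::nothrow) Process(std::move(name)));
        if (!process)
            return nullptr;
        if (!process->create_fd_table(fd_table_size))
            return nullptr; // unique_ptr frees the partially built object
        return process;
    }

    const std::string& name() const { return m_name; }
    std::size_t fd_count() const { return m_fds.size(); }

private:
    // The constructor only does work that cannot fail.
    explicit Process(std::string name)
        : m_name(std::move(name))
    {
    }

    // Fallible resource allocation, split out of the constructor. Failure is
    // simulated with a hypothetical cap so the example is deterministic.
    bool create_fd_table(std::size_t size)
    {
        constexpr std::size_t simulated_memory_cap = 1024;
        if (size > simulated_memory_cap)
            return false;
        m_fds.assign(size, -1);
        return true;
    }

    std::string m_name;
    std::vector<int> m_fds;
};
```

Keeping fallible work out of the constructor means there is never a Process object that exists but lacks its resources; callers only have to check the factory's result.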
Andreas Kling
e46343bf9a Kernel: Make UserOrKernelBuffer R/W helpers return KResultOr<size_t>
This makes error propagation less cumbersome (and also exposed some
places where we were not doing it).
2021-05-13 23:28:40 +02:00
Brian Gianforcaro
956314f0a1 Kernel: Make Process::start_tracing_from API OOM safe
Modify the API so it's possible to propagate an error on OOM failure.
NonnullOwnPtr<T> is not appropriate for the ThreadTracer::create() API,
so switch to OwnPtr<T> and use adopt_own_if_nonnull() to handle creation.
2021-05-13 16:21:53 +02:00
sin-ack
fe5ca6ca27 Kernel: Implement multi-watch InodeWatcher :^)
This patch modifies InodeWatcher to switch to a one-watcher,
multiple-watches architecture. The following changes have been made:

- The watch_file syscall is removed, and in its place the
  create_iwatcher, iwatcher_add_watch and iwatcher_remove_watch calls
  have been added.
- InodeWatcher now holds multiple WatchDescriptions for each file that
  is being watched.
- The InodeWatcher file descriptor can be read from to receive events on
  all watched files.

Co-authored-by: Gunnar Beutner <gunnar@beutner.name>
2021-05-12 22:38:20 +02:00
Brian Gianforcaro
ccdcb6a635 Kernel: Add PerformanceManager static class, move perf event APIs there
The current method of emitting performance events requires a bit of
boilerplate at every invocation, as well as having to ignore the
return code, which isn't used outside of the perf event syscall. This
change attempts to clean that up by exposing high-level APIs that
can be used across the codebase.
2021-05-07 15:35:23 +02:00
Itamar
6bbd2ebf83 Kernel+LibELF: Support initializing values of TLS data
Previously, TLS data was always zero-initialized.

To support initializing the values of TLS data, sys$allocate_tls now
receives a buffer with the desired initial data, and copies it to the
master TLS region of the process.

The DynamicLinker gathers the initial TLS image and passes it to
sys$allocate_tls.

We also now require the size passed to sys$allocate_tls to be
page-aligned, to make things easier. Note that this doesn't waste memory
as the TLS data has to be allocated in separate pages anyway.
2021-04-30 18:47:39 +02:00
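A rough sketch of the allocate_tls contract described above, under stated assumptions: a 4 KiB page size, a `std::vector` standing in for the master TLS region, and hypothetical helper names. It rejects sizes that are not page-aligned, copies the initial TLS image to the start of the region in this sketch, and leaves the remainder zeroed, as zero-initialized TLS data would be.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Assumed page size for the sketch (x86 uses 4 KiB pages).
constexpr std::size_t page_size = 4096;

// Hypothetical model of a process's master TLS region.
struct MasterTlsRegion {
    std::vector<std::uint8_t> bytes;
};

// Hypothetical allocate_tls-style call: the size must be page-aligned, the
// initial image is copied in, and everything after it stays zero.
std::optional<MasterTlsRegion> allocate_tls_sketch(const std::uint8_t* initial_data,
                                                   std::size_t initial_size,
                                                   std::size_t region_size)
{
    if (region_size == 0 || region_size % page_size != 0)
        return std::nullopt; // would be an EINVAL-style error
    if (initial_size > region_size)
        return std::nullopt;
    MasterTlsRegion region { std::vector<std::uint8_t>(region_size, 0) };
    if (initial_size > 0)
        std::memcpy(region.bytes.data(), initial_data, initial_size);
    return region;
}

// Hypothetical loader-side helper: round the TLS image size up to whole pages.
constexpr std::size_t round_up_to_page(std::size_t size)
{
    return (size + page_size - 1) / page_size * page_size;
}
```

Rounding on the loader side keeps the kernel check trivial, which matches the commit's note that page alignment costs no extra memory because TLS lives in its own pages anyway.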
Jesse Buhagiar
60cdbc9397 Kernel/LibC: Implement setreuid 2021-04-30 11:35:17 +02:00
Gunnar Beutner
aa792062cb Kernel+LibC: Implement the socketpair() syscall 2021-04-28 14:19:45 +02:00
Gunnar Beutner
eb798d5538 Kernel+Profiler: Improve profiling subsystem
This turns the perfcore format into more of a log than it was before,
which lets us properly log process, thread and region
creation/destruction. This also makes it unnecessary to dump the
process' regions every time it is scheduled like we did before.

Incidentally this also fixes 'profile -c' because we previously ended
up incorrectly dumping the parent's region map into the profile data.

Log-based mmap support enables profiling shared libraries which
are loaded at runtime, e.g. via dlopen().

This enables profiling both the parent and child process for
programs which use execve(). Previously we'd discard the profiling
data for the old process.

The Profiler tool has been updated to not treat thread IDs as
process IDs anymore. This enables support for processes with more
than one thread. Also, there's a new widget to filter which
process should be displayed.
2021-04-26 17:13:55 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Brian Gianforcaro
4ed682aebc Kernel: Add a syscall to clear the profiling buffer
While profiling all processes, the profile buffer lives forever.
Once you have copied the profile to disk, there's no need to keep it
in memory. This syscall surfaces the ability to clear that buffer.
2021-04-19 18:30:37 +02:00
Gunnar Beutner
f033416893 Kernel+LibC: Clean up how assertions work in the kernel and LibC
This also brings LibC's abort() function closer to the spec.
2021-04-18 11:11:15 +02:00
Gunnar Beutner
6cb28ecee8 LibC+LibELF: Implement support for the dl_iterate_phdr helper
This helper is used by libgcc_s to figure out where the .eh_frame sections
are located for all loaded shared objects.
2021-04-18 10:55:25 +02:00
AnotherTest
e4412f1f59 AK+Kernel: Make IntrusiveList capable of holding non-raw pointers
This should allow creating intrusive lists that have smart pointers,
while remaining free (compared to the impl before this commit) when
holding raw pointers :^)
As a sidenote, this also adds a `RawPtr<T>` type, which is just
equivalent to `T*`.
Note that this does not actually use such functionality, but is only
expected to pave the way for #6369, to replace NonnullRefPtrVector<T>
with intrusive lists.

As it is with zero-cost things, this makes the interface a bit less nice
by requiring the type name of what an `IntrusiveListNode` holds (and
optionally its container, if not RawPtr), and also requiring the type of
the container (normally `RawPtr`) on the `IntrusiveList` instance.
2021-04-16 22:26:52 +02:00
Andreas Kling
0b8226811f Kernel+CrashReporter: Add metadata about page faults to crash reports
Crash reports for page faults now tell you what kind of memory access
failed and where. :^)
2021-04-04 20:13:55 +02:00
Jean-Baptiste Boric
7a079f7780 LibC+Kernel: Switch off_t to 64 bits 2021-03-17 23:22:42 +01:00
Andreas Kling
ef1e5db1d0 Everywhere: Remove klog(), dbg() and purge all LogStream usage :^)
Good-bye LogStream. Long live AK::Format!
2021-03-12 17:29:37 +01:00
Andreas Kling
1608ef37d8 Kernel: Move process termination status/signal into protected data 2021-03-11 14:24:08 +01:00
Andreas Kling
4916b5c130 Kernel: Move process thread lists into protected data 2021-03-11 14:21:49 +01:00
Andreas Kling
b7b7a48c66 Kernel: Move process signal trampoline address into protected data 2021-03-11 14:21:49 +01:00
Andreas Kling
08e0e2eb41 Kernel: Move process umask into protected data :^) 2021-03-11 14:21:49 +01:00
Andreas Kling
90c0f9664e Kernel: Don't keep protected Process data in a separate allocation
The previous architecture had a huge flaw: the pointer to the protected
data was itself unprotected, allowing you to overwrite it at any time.

This patch reorganizes the protected data so it's part of the Process
class itself. (Actually, it's a new ProcessBase helper class.)

We use the first 4 KB of Process objects themselves as the new storage
location for protected data. Then we make Process objects page-aligned
using MAKE_ALIGNED_ALLOCATED.

This allows us to easily turn on/off write-protection for everything in
the ProcessBase portion of Process. :^)

Thanks to @bugaevc for pointing out the flaw! This is still not perfect
but it's an improvement.
2021-03-11 14:21:49 +01:00
Andreas Kling
de6c5128fd Kernel: Move process pledge promises into protected data 2021-03-10 22:50:00 +01:00
Andreas Kling
37ad880660 Kernel: Move process "dumpable" flag into protected data 2021-03-10 22:42:07 +01:00
Andreas Kling
3d27269f13 Kernel: Move process parent PID into protected data :^) 2021-03-10 22:30:02 +01:00
Andreas Kling
d677a73b0e Kernel: Move process extra_gids into protected data :^) 2021-03-10 22:30:02 +01:00
Andreas Kling
cbcf891040 Kernel: Move select Process members into protected memory
Process member variables like m_euid are very valuable targets for
kernel exploits and until now they have been writable at all times.

This patch moves m_euid along with a whole bunch of other members
into a new Process::ProtectedData struct. This struct is remapped
as read-only memory whenever we don't need to write to it.

This means that a kernel write primitive is no longer enough to
overwrite a process's effective UID; you must first unprotect the
protected data where the UID is stored. :^)
2021-03-10 22:30:02 +01:00
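The kernel does this with its own page tables, but the idea can be approximated in userspace with POSIX `mmap()`/`mprotect()`: keep the sensitive fields on their own page and make that page writable only inside an explicit window. This is an analogy under that assumption, not the kernel mechanism; `ProtectedCredentials` and its members are hypothetical.

```c++
#include <cassert>
#include <cstddef>
#include <new>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical fields standing in for Process::ProtectedData.
struct ProtectedData {
    unsigned euid;
    unsigned egid;
};

// Keeps ProtectedData on a dedicated page that is read-only by default.
class ProtectedCredentials {
public:
    ProtectedCredentials() = default;
    ProtectedCredentials(const ProtectedCredentials&) = delete;
    ProtectedCredentials& operator=(const ProtectedCredentials&) = delete;
    ~ProtectedCredentials()
    {
        if (m_data)
            munmap(m_data, m_size);
    }

    bool init(unsigned euid, unsigned egid)
    {
        m_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        void* page = mmap(nullptr, m_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED)
            return false;
        m_data = new (page) ProtectedData { euid, egid };
        // From now on a stray write to the page faults instead of succeeding.
        return mprotect(page, m_size, PROT_READ) == 0;
    }

    const ProtectedData& data() const { return *m_data; }

    // Every legitimate write reopens the page briefly, then seals it again.
    bool set_euid(unsigned euid)
    {
        if (mprotect(m_data, m_size, PROT_READ | PROT_WRITE) != 0)
            return false;
        m_data->euid = euid;
        return mprotect(m_data, m_size, PROT_READ) == 0;
    }

private:
    ProtectedData* m_data { nullptr };
    std::size_t m_size { 0 };
};
```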
Andreas Kling
84725ef3a5 Kernel+UserspaceEmulator: Add sys$emuctl() system call
This returns ENOSYS if you are running in the real kernel, and some
other result if you are running in UserspaceEmulator.

There are other ways we could check if we're inside an emulator, but
it seemed easier to just ask. :^)
2021-03-09 08:58:26 +01:00
Andreas Kling
b425c2602c Kernel: Better handling of allocation failure in profiling
If we can't allocate a PerformanceEventBuffer to store the profiling
events, we now fail sys$profiling_enable() and sys$perf_event()
with ENOMEM instead of carrying on with a broken buffer.
2021-03-02 22:38:06 +01:00
Ben Wiederhake
336303bda4 Kernel: Make kgettimeofday use AK::Time 2021-03-02 08:36:08 +01:00
Ben Wiederhake
05d5e3fad9 Kernel: Remove duplicative kgettimeofday(timeval&) function 2021-03-02 08:36:08 +01:00
Ben Wiederhake
c040e64b7d Kernel: Make TimeManagement use AK::Time internally
I don't dare touch the multi-threading logic and locking mechanism, so it stays
timespec for now. However, this could and should be changed to AK::Time, and I
bet it will simplify the "increment_time_since_boot()" code.
2021-03-02 08:36:08 +01:00
Andreas Kling
272c2e6ec5 Kernel: Use Userspace<T> in sys${munmap,mprotect,madvise,msyscall}() 2021-03-01 15:53:33 +01:00
Andreas Kling
bebceaa32c Kernel: Use Userspace<T> in sys$select() 2021-03-01 15:07:01 +01:00
Andreas Kling
a1a82c1d95 Kernel: Use Userspace<T> in sys$get_dir_entries() 2021-03-01 15:04:31 +01:00
Andreas Kling
b5f32be577 Kernel: Use Userspace<T> in sys$get_stack_bounds() 2021-03-01 14:50:36 +01:00
Andreas Kling
122c7b6cbb Kernel: Use Userspace<T> in sys$write() 2021-03-01 14:35:06 +01:00
Andreas Kling
6a6eb8844a Kernel: Use Userspace<T> in sys$sigaction()
fuzz-syscalls found a bunch of unaligned accesses into struct sigaction
via this syscall. This patch fixes that issue by porting the syscall
to Userspace<T> which we should have done anyway. :^)

Fixes #5500.
2021-03-01 14:06:20 +01:00
Andreas Kling
ac71775de5 Kernel: Make all syscall functions return KResultOr<T>
This makes it a lot easier to return errors since we no longer have to
worry about negating EFOO errors and can just return them flat.
2021-03-01 13:54:32 +01:00
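A simplified sketch of the KResultOr<T> idea using `std::variant`: a function returns either a value or a positive errno code, and callers pass errors up unchanged. `ResultOr`, `sys_write_sketch()`, `write_twice()` and `to_register_value()` are hypothetical; converting to a negative value once at the syscall boundary is an assumption about how such a design maps onto the old convention, not a quote of the kernel's dispatcher.

```c++
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <utility>
#include <variant>

// Hypothetical, simplified stand-in for KResultOr<T>: a value or an errno.
template<typename T>
class ResultOr {
public:
    ResultOr(T value) : m_storage(std::move(value)) {}
    static ResultOr error(int code) { return ResultOr(Error { code }); }

    bool is_error() const { return std::holds_alternative<Error>(m_storage); }
    int error_code() const { return std::get<Error>(m_storage).code; }
    const T& value() const { return std::get<T>(m_storage); }

private:
    struct Error {
        int code;
    };
    explicit ResultOr(Error error) : m_storage(error) {}
    std::variant<T, Error> m_storage;
};

// Hypothetical syscall body: errors are returned "flat" (EFAULT, not -EFAULT).
ResultOr<std::size_t> sys_write_sketch(const char* buffer, std::size_t size)
{
    if (!buffer)
        return ResultOr<std::size_t>::error(EFAULT);
    return size; // pretend every byte was written
}

// Callers propagate the error as-is instead of re-encoding it.
ResultOr<std::size_t> write_twice(const char* buffer, std::size_t size)
{
    auto first = sys_write_sketch(buffer, size);
    if (first.is_error())
        return first;
    auto second = sys_write_sketch(buffer, size);
    if (second.is_error())
        return second;
    return first.value() + second.value();
}

// Assumed boundary step: the only place where an error becomes negative.
long to_register_value(const ResultOr<std::size_t>& result)
{
    return result.is_error() ? -static_cast<long>(result.error_code())
                             : static_cast<long>(result.value());
}
```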
Linus Groh
e265054c12 Everywhere: Remove a bunch of redundant 'AK::' namespace prefixes
This is basically just for consistency; it's quite strange to see
multiple AK container types next to each other, some with and some
without the namespace prefix - we're 'using AK::Foo;' a lot and should
leverage that. :^)
2021-02-26 16:59:56 +01:00
Andreas Kling
8eeb8db2ed Kernel: Don't disable interrupts while dealing with a process crash
This was necessary in the past when crash handling would modify
various global things, but all that stuff is long gone so we can
simplify crashes by leaving the interrupt flag alone.
2021-02-25 19:36:36 +01:00
Andreas Kling
8f70528f30 Kernel: Take some baby steps towards x86_64
Make more of the kernel compile in 64-bit mode, and make some things
pointer-size-agnostic (by using FlatPtr.)

There's a lot of work to do here before the kernel will even compile.
2021-02-25 16:27:12 +01:00
Andreas Kling
5d180d1f99 Everywhere: Rename ASSERT => VERIFY
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)

Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.

We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
2021-02-23 20:56:54 +01:00
Andreas Kling
84b2d4c475 Kernel: Add "map_fixed" pledge promise
This is a new promise that guards access to mmap() with MAP_FIXED.

Fixed-address mappings are rarely used, but can be useful if you are
trying to groom the process address space for malicious purposes.

None of our programs need this at the moment, as the only user of
MAP_FIXED is DynamicLoader, but the fixed mappings are constructed
before the process has had a chance to pledge anything.
2021-02-21 01:08:48 +01:00
Andreas Kling
55a9a4f57a Kernel: Use KResult a bit more in sys$execve() 2021-02-18 09:37:33 +01:00
AnotherTest
a3a7ab83c4 Kernel+LibC: Implement readv
We already had writev, so let's just add readv too.
2021-02-15 17:32:56 +01:00
Andreas Kling
781d29a337 Kernel+Userland: Give sys$recvfd() an options argument for O_CLOEXEC
@bugaevc pointed out that we shouldn't be setting this flag in
userspace, and he's right of course.
2021-02-14 10:39:48 +01:00
Andreas Kling
1593219a41 Kernel: Map signal trampoline into each process's address space
The signal trampoline was previously in kernelspace memory, but with
a special exception to make it user-accessible.

This patch moves it into each process's regular address space so we
can stop supporting user-allowed memory above 0xc0000000.
2021-02-14 01:33:17 +01:00
Andreas Kling
1ef43ec89a Kernel: Move get_interpreter_load_offset() out of Process class
This is only used inside the sys$execve() implementation so just make
it an execve.cpp-local function.
2021-02-12 16:30:29 +01:00
Andreas Kling
4ff0f971f7 Kernel: Prevent execve/ptrace race
Add a per-process ptrace lock and use it to prevent ptrace access to a
process after it decides to commit to a new executable in sys$execve().

Fixes #5230.
2021-02-08 23:05:41 +01:00
Andreas Kling
0d7af498d7 Kernel: Move ShouldAllocateTls enum from Process to execve.cpp 2021-02-08 22:24:37 +01:00
Andreas Kling
8bda30edd2 Kernel: Move memory statistics helpers from Process to Space 2021-02-08 22:23:29 +01:00
Andreas Kling
f1b5def8fd Kernel: Factor address space management out of the Process class
This patch adds Space, a class representing a process's address space.

- Each Process has a Space.
- The Space owns the PageDirectory and all Regions in the Process.

This allows us to reorganize sys$execve() so that it constructs and
populates a new Space fully before committing to it.

Previously, we would construct the new address space while still
running in the old one, and encountering an error meant we had to do
tedious and error-prone rollback.

Those problems are now gone, replaced by what's hopefully a set of much
smaller problems and missing cleanups. :^)
2021-02-08 18:27:28 +01:00
Andreas Kling
cf5ab665e0 Kernel: Remove unused Process::for_each_thread_in_coredump() 2021-02-08 18:27:28 +01:00
Ben Wiederhake
0a2304ba05 Everywhere: Fix weird includes 2021-02-08 18:03:57 +01:00
Andreas Kling
5c1c82cd33 Kernel: Remove unused function Process::backtrace() 2021-02-07 19:27:00 +01:00
Andreas Kling
b1813e5dae Kernel: Remove some unused declarations from Process 2021-02-07 19:27:00 +01:00
Andreas Kling
823186031d Kernel: Add a way to specify which memory regions can make syscalls
This patch adds sys$msyscall() which is loosely based on an OpenBSD
mechanism for preventing syscalls from non-blessed memory regions.

It works similarly to pledge and unveil: you can call it as many
times as you like, and when you're finished, you call it with a null
pointer and it will stop accepting new regions from then on.

If a syscall later happens and doesn't originate from one of the
previously blessed regions, the kernel will simply crash the process.
2021-02-02 20:13:44 +01:00
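The bookkeeping described above can be modeled in a few lines. This is a hypothetical sketch, not the kernel's code: it tracks blessed regions as [base, base + size) ranges, treats a null address as "seal the list", and, as an assumption, only starts enforcing once the list has been sealed.

```c++
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of msyscall-style syscall region filtering.
class SyscallRegionFilter {
public:
    // Bless one region per call; a null base seals the list for good.
    // Returns false once the list has already been sealed.
    bool msyscall(std::uintptr_t base, std::uintptr_t size)
    {
        if (m_sealed)
            return false;
        if (base == 0) {
            m_sealed = true;
            return true;
        }
        m_regions.push_back({ base, size });
        return true;
    }

    // Checked on syscall entry with the caller's instruction pointer.
    bool syscall_allowed_from(std::uintptr_t instruction_pointer) const
    {
        if (!m_sealed)
            return true; // assumption: no enforcement before sealing
        for (const auto& region : m_regions) {
            if (instruction_pointer >= region.base && instruction_pointer - region.base < region.size)
                return true;
        }
        return false; // the kernel would crash the process here
    }

private:
    struct Range {
        std::uintptr_t base;
        std::uintptr_t size;
    };
    std::vector<Range> m_regions;
    bool m_sealed { false };
};
```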
Ben Wiederhake
cbee0c26e1 Kernel+keymap+KeyboardMapper: New pledge for getkeymap 2021-02-01 09:54:32 +01:00
Ben Wiederhake
a2c21a55e1 Kernel+LibKeyboard: Enable querying the current keymap 2021-02-01 09:54:32 +01:00
Andreas Kling
d0c5979d96 Kernel: Add "prot_exec" pledge promise and require it for PROT_EXEC
This prevents sys$mmap() and sys$mprotect() from creating executable
memory mappings in pledged programs that don't have this promise.

Note that the dynamic loader runs before pledging happens, so it's
unaffected by this.
2021-01-29 18:56:34 +01:00
Andreas Kling
5ff355c0cd Kernel: Generate coredump backtraces from "threads for coredump" list
This broke with the change that gave each process a list of its own
threads. Since threads are removed slightly earlier from that list
during process teardown, we're not able to use it for generating
coredump backtraces. Fortunately we have the "threads for coredump"
list for just this purpose. :^)
2021-01-28 08:41:18 +01:00
Tom
8104abf640 Kernel: Remove colonel special-case from Process::for_each_thread
Since each Process now has its own list of threads, we don't need
to treat colonel any differently anymore. This also means that it
reports all kernel threads, not just the idle threads.
2021-01-28 08:14:12 +01:00
Tom
ac3927086f Kernel: Keep a list of threads per Process
This allows us to iterate only the threads of the process.
2021-01-27 22:48:41 +01:00
Tom
03a9ee79fa Kernel: Implement thread priority queues
Rather than walking all Thread instances and putting them into
a vector to be sorted by priority, queue them into priority sorted
linked lists as soon as they become ready to be executed.
2021-01-27 22:48:41 +01:00
Andreas Kling
e67402c702 Kernel: Remove Range "valid" state and use Optional<Range> instead
It's easier to understand VM ranges if they are always valid. We can
simply use an empty Optional<Range> to encode absence when needed.
2021-01-27 21:14:42 +01:00
Tom
21d288a10e Kernel: Make Thread::current smp-safe
Change Thread::current to be a static function that reads the current
thread via the fs register. This eliminates a window between
Processor::current() returning and a function being called on the
result: preemption in that window can move the thread to a different
processor, causing it to operate on the wrong object.
2021-01-27 21:12:24 +01:00
Andreas Kling
c7858622ec Kernel: Update process promise states on execve() and fork()
We now move the execpromises state into the regular promises, and clear
the execpromises state.

Also make sure to duplicate the promise state on fork.

This fixes an issue where "su" would launch a shell which immediately
crashed due to not having pledged "stdio".
2021-01-26 15:26:37 +01:00
Andreas Kling
1e25d2b734 Kernel: Remove allocate_region() functions that don't take a Range
Let's force callers to provide a VM range when allocating a region.
This makes ENOMEM error handling more visible and removes implicit
VM allocation which felt a bit magical.
2021-01-26 14:13:57 +01:00
Linus Groh
629180b7d8 Kernel: Support pledge() with empty promises
This tells the kernel that the process wants to use pledge, but without
pledging anything - effectively restricting it to syscalls that don't
require a certain promise. This is part of OpenBSD's pledge() as well,
which served as the basis for Serenity's.
2021-01-25 23:22:21 +01:00
Linus Groh
678919e9c1 Kernel: Set "pledge_violation" coredump metadata in REQUIRE_PROMISE()
Similar to LibC storing an assertion message before aborting, process
death by pledge violation now sets a "pledge_violation" key with the
respective pledge name as value in its coredump metadata, which the
CrashReporter will then show.
2021-01-20 21:01:15 +01:00
Tom
1d621ab172 Kernel: Some futex improvements
This adds support for FUTEX_WAKE_OP, FUTEX_WAIT_BITSET, FUTEX_WAKE_BITSET,
FUTEX_REQUEUE, and FUTEX_CMP_REQUEUE, as well as global and private
futexes and absolute/relative timeouts against the appropriate clock. This
also changes the implementation so that kernel resources are only used when
a thread is blocked on a futex.

Global futexes are implemented as offsets in VMObjects, so that different
processes can share a futex against the same VMObject despite potentially
being mapped at different virtual addresses.
2021-01-17 20:30:31 +01:00
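The point of keying global futexes by VMObject offset is that two processes mapping the same object at different addresses still compute the same key. A minimal sketch of that computation, assuming hypothetical `Mapping` and `FutexTable` types; the table keeps an entry only while someone is blocked, echoing the note above that kernel resources are used only for blocked threads.

```c++
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical description of one process's mapping of a shared VMObject.
struct Mapping {
    std::uintptr_t base;            // virtual address where this process mapped it
    std::uint64_t vmobject_id;      // identity of the shared backing object
    std::uint64_t offset_in_object; // object offset that `base` corresponds to
};

// (VMObject identity, byte offset within it): independent of virtual addresses.
using FutexKey = std::pair<std::uint64_t, std::uint64_t>;

FutexKey global_futex_key(const Mapping& mapping, std::uintptr_t user_address)
{
    return { mapping.vmobject_id, mapping.offset_in_object + (user_address - mapping.base) };
}

// Waiter counts per key; an entry exists only while some thread is blocked.
class FutexTable {
public:
    void add_waiter(const FutexKey& key) { ++m_waiters[key]; }

    // Wakes every waiter on `key` and returns how many were woken.
    unsigned wake_all(const FutexKey& key)
    {
        auto it = m_waiters.find(key);
        if (it == m_waiters.end())
            return 0;
        unsigned woken = it->second;
        m_waiters.erase(it);
        return woken;
    }

private:
    std::map<FutexKey, unsigned> m_waiters;
};
```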
Andreas Kling
bf0719092f Kernel+Userland: Remove shared buffers (shbufs)
All users of this mechanism have been switched to anonymous files and
passing file descriptors with sendfd()/recvfd().

Shbufs got us where we are today, but it's time we say good-bye to them
and welcome a much more idiomatic replacement. :^)
2021-01-17 09:07:32 +01:00
Andreas Kling
05dbfe9ab6 Kernel: Remove sys$shbuf_seal() and userland wrappers
There are no remaining users of this syscall so let it go. :^)
2021-01-17 00:18:01 +01:00
Andreas Kling
b818cf898e Kernel+Userland: Remove sys$shbuf_allow_all() and userland wrappers
Nobody is using globally shared shbufs anymore, so let's remove them.
2021-01-16 22:43:03 +01:00
Ben Wiederhake
ea5825f2c9 Kernel+LibC: Make sys$getcwd truncate the result silently
This gives us the superpower of knowing the ideal buffer length if it fails.
See also https://github.com/SerenityOS/serenity/discussions/4357
2021-01-16 22:40:53 +01:00
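A sketch of the calling convention this enables: the call copies what fits and returns the full length needed (including the terminating NUL), so a caller can grow its buffer and retry. `getcwd_sketch()` and `current_directory()` are hypothetical, and always NUL-terminating the truncated copy is an assumption of this sketch rather than documented kernel behavior.

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical getcwd-style call: copy as much as fits (NUL-terminated when
// size > 0) and always report the ideal buffer length.
std::size_t getcwd_sketch(const std::string& cwd, char* buffer, std::size_t size)
{
    std::size_t needed = cwd.size() + 1;
    if (size > 0) {
        std::size_t to_copy = std::min(cwd.size(), size - 1);
        std::memcpy(buffer, cwd.data(), to_copy);
        buffer[to_copy] = '\0';
    }
    return needed;
}

// Caller side: start small and grow to exactly the reported length.
std::string current_directory(const std::string& cwd)
{
    std::string buffer(4, '\0');
    for (;;) {
        std::size_t needed = getcwd_sketch(cwd, buffer.data(), buffer.size());
        if (needed <= buffer.size())
            return std::string(buffer.c_str());
        buffer.resize(needed);
    }
}
```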
Andreas Kling
01c2480eb3 Kernel+LibC+WindowServer: Remove unused thread/process boost mechanism
The priority boosting mechanism has been broken for a very long time.
Let's remove it from the codebase and we can bring it back the day
someone feels like implementing it in a working way. :^)
2021-01-16 14:52:04 +01:00
Andreas Kling
43109f9614 Kernel: Remove unused syscall sys$minherit()
This is no longer used. We can bring it back the day we need it.
2021-01-16 14:52:04 +01:00
Andreas Kling
de31e82f97 Kernel: Remove sys$shbuf_set_volatile() and userland wrappers
There are no remaining users of this syscall so let's remove it! :^)
2021-01-16 14:52:04 +01:00
Linus Groh
1ccc2e6482 Kernel: Store process arguments and environment in coredumps
Currently they're only pushed onto the stack but not easily accessible
from the Process class, so this adds a Vector<String> for both.
2021-01-15 23:26:47 +01:00
Linus Groh
057ae36e32 Kernel: Prevent threads from being destructed between die() and finalize()
Killing remaining threads already happens in Process::die(), but
coredumps are only written in Process::finalize(). We need to keep a
reference to each of those threads to prevent them from being destructed
between those two functions, otherwise coredumps will only ever contain
information about the last remaining thread.

Fixes the underlying problem of #4778, though the UI will need
refinements to not show every thread's backtrace mashed together.
2021-01-15 23:26:47 +01:00
Andreas Kling
64b0d89335 Kernel: Make Process::allocate_region*() return KResultOr<Region*>
This allows region allocation to return specific errors and we don't
have to assume every failure is an ENOMEM.
2021-01-15 19:10:30 +01:00
Andreas Kling
fb4993f067 Kernel: Add anonymous files, created with sys$anon_create()
This patch adds a new AnonymousFile class which is a File backed by
an AnonymousVMObject that can only be mmap'ed and nothing else, really.

I'm hoping that this can become a replacement for shbufs. :^)
2021-01-15 13:56:47 +01:00
Andreas Kling
f03800cee3 Kernel: Add dedicated "ptrace" pledge promise
The vast majority of programs don't ever need to use sys$ptrace(),
and it seems like a high-value system call to prevent a compromised
process from using.

This patch moves sys$ptrace() from the "proc" promise to its own,
new "ptrace" promise and updates the affected apps.
2021-01-11 22:32:59 +01:00
Andreas Kling
5c73c1bff8 Kernel: Don't dump perfcore for non-dumpable processes
Fixes #4904
2021-01-11 18:53:45 +01:00
Andreas Kling
5dafb72370 Kernel+Profiler: Make profiling per-process and without core dumps
This patch merges the profiling functionality in the kernel with the
performance events mechanism. A profiler sample is now just another
perf event, rather than a dedicated thing.

Since perf events were already per-process, this now makes profiling
per-process as well.

Processes with perf events would already write out a perfcore.PID file
to the current directory on death, but since we may want to profile
a process and then let it continue running, recorded perf events can
now be accessed at any time via /proc/PID/perf_events.

This patch also adds information about process memory regions to the
perfcore JSON format. This removes the need to supply a core dump to
the Profiler app for symbolication, and so the "profiler coredump"
mechanism is removed entirely.

There's still a hard limit of 4MB worth of perf events per process,
so this is by no means a perfect final design, but it's a nice step
forward for both simplicity and stability.

Fixes #4848
Fixes #4849
2021-01-11 11:36:00 +01:00
Itamar
f259d96871 Kernel: Avoid collision between dynamic loader and main program
When loading non-position-independent programs, we now take care not to
load the dynamic loader at an address that collides with the location
the main program wants to load at.

Fixes #4847.
2021-01-10 22:04:43 +01:00
Itamar
40a8159c62 Kernel: Plumb the elf header of the main program down to Process::load
This will enable us to take the desired load address of
non-position-independent programs into account when randomizing the load address
of the dynamic loader.
2021-01-10 22:04:43 +01:00
asynts
019c9eb749 Everywhere: Replace a bundle of dbg with dbgln.
These changes are arbitrarily divided into multiple commits to make it
easier to find potentially introduced bugs with git bisect.
2021-01-09 21:11:09 +01:00
Andreas Kling
d991658794 Kernel+LibC: Tidy up assertion failures with a dedicated syscall
This patch adds sys$abort() which immediately crashes the process with
SIGABRT. This makes assertion backtraces a lot nicer by removing all
the gunk that otherwise happens between __assertion_failed() and
actually crashing from the SIGABRT.
2021-01-04 21:57:30 +01:00
Tom
901ef3f1c8 Kernel: Specify default memory order for some non-synchronizing Atomics 2021-01-04 19:13:52 +01:00
Linus Groh
0571a17f57 Kernel+LibELF: Store termination signal in coredump ProcessInfo 2021-01-03 22:12:42 +01:00
William Marlow
747e8de96a Kernel+Loader.so: Allow dynamic executables without an interpreter
Commit a3a9016701 removed the PT_INTERP header
from Loader.so, which cleaned up some kernel code in execve. Unfortunately,
it prevents Loader.so from being run as an executable.
2021-01-03 19:45:16 +01:00
Andreas Kling
5dae85afe7 Kernel: Pass "shared" flag to Region constructor
Before this change, we would sometimes map a region into the address
space with !is_shared(), and then moments later call set_shared(true).

I found this very confusing while debugging, so this patch makes us pass
the initial shared flag to the Region constructor, ensuring that it's in
the correct state by the time we first map the region.
2021-01-02 16:57:31 +01:00
Tom
476f17b3f1 Kernel: Merge PurgeableVMObject into AnonymousVMObject
This implements memory commitments and lazy-allocation of committed
memory.
2021-01-01 23:43:44 +01:00
Tom
c3451899bc Kernel: Add MAP_NORESERVE support to mmap
Rather than lazily committing regions by default, we now commit
the entire region unless MAP_NORESERVE is specified.

This solves random crashes in low-memory situations where, e.g., the
malloc heap had allocated memory, but touching pages that hadn't been
used before triggered a crash when no more physical memory was available.

Use this flag to create large regions without actually committing
the backing memory. madvise() can be used to commit arbitrary areas
of such regions after creating them.
2021-01-01 23:43:44 +01:00
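A toy model of the commit accounting described above, assuming a single global budget of physical pages: a normal mapping reserves all of its pages up front (failing immediately if it can't), while a MAP_NORESERVE-style mapping reserves nothing and can commit pieces later, roughly as madvise() does in the commit message. `CommitLedger`, `RegionSketch`, `mmap_sketch()` and `commit_more()` are hypothetical.

```c++
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical global budget of physical pages that can be promised out.
class CommitLedger {
public:
    explicit CommitLedger(std::size_t physical_pages) : m_available(physical_pages) {}

    bool commit(std::size_t pages)
    {
        if (pages > m_available)
            return false;
        m_available -= pages;
        return true;
    }

    std::size_t available() const { return m_available; }

private:
    std::size_t m_available;
};

struct RegionSketch {
    std::size_t page_count;
    std::size_t committed_pages;
};

// Without "noreserve", the whole region is committed now or mmap fails
// (ENOMEM up front instead of a crash on first touch later).
std::optional<RegionSketch> mmap_sketch(CommitLedger& ledger, std::size_t pages, bool noreserve)
{
    if (noreserve)
        return RegionSketch { pages, 0 };
    if (!ledger.commit(pages))
        return std::nullopt;
    return RegionSketch { pages, pages };
}

// madvise-like step: commit part of a noreserve region after creation.
bool commit_more(CommitLedger& ledger, RegionSketch& region, std::size_t pages)
{
    if (region.committed_pages + pages > region.page_count)
        return false;
    if (!ledger.commit(pages))
        return false;
    region.committed_pages += pages;
    return true;
}
```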
Linus Groh
91332515a6 Kernel: Add sys$set_coredump_metadata() syscall
This can be used by applications to store information (key/value pairs)
likely useful for debugging, which will then be embedded in the coredump.
2020-12-30 16:28:27 +01:00
Andreas Kling
30dbe9c78a Kernel+LibC: Add a very limited sys$mremap() implementation
This syscall can currently only remap a shared file-backed mapping into
a private file-backed mapping.
2020-12-29 02:20:43 +01:00
Andreas Kling
0e2b7f9c9a Kernel: Remove the per-process icon_id and sys$set_process_icon()
This was a goofy kernel API where you could assign an icon_id (int) to
a process which referred to a global shbuf with a 16x16 icon bitmap
inside it.

Instead of this, programs that want to display a process icon now
retrieve it from the process executable instead.
2020-12-27 01:16:56 +01:00
Andreas Kling
21ccbc2167 Kernel: Expose process executable paths in /proc/all 2020-12-27 01:16:56 +01:00
AnotherTest
7b5aa06702 Kernel: Allow 'elevating' unveil permissions if implicitly inherited from '/'
This can happen when an unveil follows another with a path that is a
sub-path of the other one:
```c++
unveil("/home/anon/.config/whoa.ini", "rw");
unveil("/home/anon", "r"); // this would fail, as "/home/anon" inherits
                           // the permissions of "/", which is None.
```
2020-12-26 16:10:04 +01:00
AnotherTest
a9184fcb76 Kernel: Implement unveil() as a prefix-tree
Fixes #4530.
2020-12-26 11:54:54 +01:00
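A hypothetical sketch of unveil as a prefix tree, covering the two commits above: each path component is a node, the deepest explicitly unveiled prefix decides a path's permissions, and a node that only exists as an intermediate component of a longer unveiled path may be given any permissions. The permission encoding, the names and the exact elevation rule are simplifications, not the kernel's implementation.

```c++
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical permission bits for the sketch.
enum Permission : unsigned { None = 0, Read = 1, Write = 2 };

class UnveilTree {
public:
    bool unveil(const std::string& path, unsigned permissions)
    {
        Node* node = &m_root;
        for (const auto& part : split(path)) {
            auto& child = node->children[part];
            if (!child)
                child = std::make_unique<Node>(); // implicit intermediate node
            node = child.get();
        }
        // An explicitly unveiled path may only lose permissions; a node that
        // was merely created as part of a longer path may be set freely.
        if (node->explicitly_unveiled && (permissions & ~node->permissions) != 0)
            return false;
        node->permissions = permissions;
        node->explicitly_unveiled = true;
        return true;
    }

    // The deepest explicitly unveiled prefix of `path` decides its permissions.
    unsigned permissions_for(const std::string& path) const
    {
        const Node* node = &m_root;
        unsigned result = m_root.permissions;
        for (const auto& part : split(path)) {
            auto it = node->children.find(part);
            if (it == node->children.end())
                break;
            node = it->second.get();
            if (node->explicitly_unveiled)
                result = node->permissions;
        }
        return result;
    }

private:
    struct Node {
        unsigned permissions { None };
        bool explicitly_unveiled { false };
        std::map<std::string, std::unique_ptr<Node>> children;
    };

    static std::vector<std::string> split(const std::string& path)
    {
        std::vector<std::string> parts;
        std::stringstream stream(path);
        std::string part;
        while (std::getline(stream, part, '/')) {
            if (!part.empty())
                parts.push_back(part);
        }
        return parts;
    }

    Node m_root; // "/" starts with no permissions
};
```

With this rule, the `unveil("/home/anon", "r")` call from the earlier example succeeds, because "/home/anon" was only created implicitly by the first call.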
Andreas Kling
82f86e35d6 Kernel+LibC: Introduce a "dumpable" flag for processes
This new flag controls two things:
- Whether the kernel will generate core dumps for the process
- Whether the EUID:EGID should own the process's files in /proc

Processes are automatically made non-dumpable when their EUID or EGID is
changed, either via syscalls that specifically modify those IDs, or via
sys$execve(), when a set-uid or set-gid program is executed.

A process can change its own dumpable flag at any time by calling the
new sys$prctl(PR_SET_DUMPABLE) syscall.

Fixes #4504.
2020-12-25 19:35:55 +01:00
Andreas Kling
89d3b09638 Kernel: Allocate new main thread stack before committing to exec
If the allocation fails (e.g. ENOMEM), we want to simply return an error
from sys$execve() and continue executing the current executable.

This patch also moves make_userspace_stack_for_main_thread() out of the
Thread class since it had nothing in particular to do with Thread.
2020-12-25 16:22:01 +01:00
Andreas Kling
2f1712cc29 Kernel: Move ELF auxiliary vector building out of Process class
Process had a couple of members whose only purpose was holding on to
some temporary data while building the auxiliary vector. Remove those
members and move the vector building to a free function in execve.cpp.
2020-12-25 15:23:35 +01:00
Andreas Kling
40e9edd798 LibELF: Move AuxiliaryValue into the ELF namespace 2020-12-25 14:48:30 +01:00
Andreas Kling
d7ad082afa Kernel+LibELF: Stop doing ELF symbolication in the kernel
Now that the CrashDaemon symbolicates crashes in userspace, let's take
this one step further and stop trying to symbolicate userspace programs
in the kernel at all.
2020-12-25 01:03:46 +01:00
Andreas Kling
8e79bde2b7 Kernel: Move KBufferBuilder to the fallible KBuffer API
KBufferBuilder::build() now returns an OwnPtr<KBuffer> and can fail.
Clients of the API have been updated to handle that situation.
2020-12-18 19:22:26 +01:00
Itamar
b4842d33bb Kernel: Generate a coredump file when a process crashes
When a process crashes, we generate a coredump file and write it in
/tmp/coredumps/.

The coredump file is an ELF file of type ET_CORE.
It contains a segment for every userspace memory region of the process,
and an additional PT_NOTE segment that contains the register state for
each thread and additional data about memory regions
(e.g. their names).
2020-12-14 23:05:53 +01:00
Itamar
efe4da57df Loader: Stabilize loader & Use shared libraries everywhere :^)
The dynamic loader is now stable enough to be used everywhere in the
system - so this commit does just that.
No More .a Files, Long Live .so's!
2020-12-14 23:05:53 +01:00
Itamar
9ca1a0731f Kernel: Support TLS allocation from userspace
This adds an allocate_tls syscall through which a userspace process
can request the allocation of a TLS region with a given size.

This will be used by the dynamic loader to allocate TLS for the main
executable & its libraries.
2020-12-14 23:05:53 +01:00
Itamar
5b87904ab5 Kernel: Add ability to load interpreter instead of main program
When the main executable needs an interpreter, we load the requested
interpreter program and pass it an open file descriptor to the main
executable via the auxiliary vector.

Note that we do not allocate a TLS region for the interpreter.
2020-12-14 23:05:53 +01:00
Tom
c455fc2030 Kernel: Change wait blocking to Process-only blocking
This prevents zombies created by multi-threaded applications and brings
our model closer to what other OSs do.

This also means that SIGSTOP needs to halt all threads, and SIGCONT needs
to resume those threads.
2020-12-12 21:28:12 +01:00
Tom
4bbee00650 Kernel: disown should unblock any potential waiters
This is necessary because if a process changes the state to Stopped
or resumes from that state, a wait entry is created in the parent
process. So, if a child process does this before disown is called,
we need to clear those entries to avoid leaking references/zombies
that won't be cleaned up until the former parent exits.

This should also solve an even more unlikely corner case where one
thread is waiting on a pid that is being disowned by another thread.
2020-12-12 21:28:12 +01:00
Tom
4c1e27ec65 Kernel: Use TimerQueue for SIGALRM 2020-12-02 13:02:04 +01:00
Tom
046d6855f5 Kernel: Move block condition evaluation out of the Scheduler
This makes the Scheduler a lot leaner by not having to evaluate
block conditions every time it is invoked. Instead evaluate them as
the states change, and unblock threads at that point.

This also implements some more waitid/waitpid/wait features and
behavior. For example, WUNTRACED and WNOWAIT are now supported. Also,
wait no longer returns EINTR when SIGCHLD is delivered at the
same time.
2020-11-30 13:17:02 +01:00
Tom
6a620562cc Kernel: Allow passing a thread argument for new kernel threads
This adds the ability to pass a pointer to a kernel thread/process.
It also adds the ability to use a closure as the thread function, which
makes it easier to pass information to a kernel thread.
2020-11-30 13:17:02 +01:00
Sergey Bugaev
098070b767 Kernel: Add unveil('b')
This is a new "browse" permission that lets you open (and subsequently list
contents of) directories underneath the path, but not regular files or any other
types of files.
2020-11-23 18:37:40 +01:00
Nico Weber
323e727a4c Kernel+LibC: Add adjtime(2)
Most systems (Linux, OpenBSD) adjust 0.5 ms per second, or 0.5 us per
1 ms tick. That is, the clock is sped up or slowed down by at most
0.05%. This means adjusting the clock by 1 s takes 2000 s, and the
clock can be adjusted by at most 1.8 s per hour.

FreeBSD adjusts 5 ms per second (0.5%) if the remaining time adjustment
is >= 1 s, else it adjusts by 0.5 ms as well. This allows adjusting
by (almost) 18 s per hour.

Since Serenity OS can lose more than 22 s per hour (#3429), this
picks an adjustment rate of up to 1% for now. This allows us to
adjust up to 36 s per hour, which should be sufficient to adjust
the clock fast enough to keep up with how much time the clock
currently loses. Once we have a fancier NTP implementation that can
adjust tick rate in addition to offset, we can think about reducing
this.

adjtime is a bit old-school and most current POSIX-y OSs instead
implement adjtimex/ntp_adjtime, but a) we have to start somewhere and
b) ntp_adjtime() is a fairly gnarly API. OpenBSD's adjfreq looks
like it might provide similar functionality with a nicer API. But
before worrying about all this, it's probably a good idea to get
to a place where the kernel APIs are (barely) good enough so that
we can write an ntp service, and once we have that we should write
a way to automatically evaluate how well it keeps the time adjusted,
and only then should we add improvements to the adjustment mechanism.
2020-11-10 19:03:08 +01:00
Tom
838d9fa251 Kernel: Make Thread refcounted
Similar to Process, we need to make Thread refcounted. This will solve
problems that will appear once we schedule threads on more than one
processor. This allows us to hold onto threads without necessarily
holding the scheduler lock for the entire duration.
2020-09-27 19:46:04 +02:00
Nico Weber
b36a2d6686 Kernel+LibC+UserspaceEmulator: Mostly add recvmsg(), sendmsg()
The implementation only supports a single iovec for now.
Some might say having more than one iovec is the main point of
recvmsg() and sendmsg(), but I'm interested in the control message
bits.
2020-09-17 17:23:01 +02:00
Nico Weber
c9a3a5b488 Kernel: Use Userspace<> for sys$writev 2020-09-15 20:20:38 +02:00
Tom
c8d9f1b9c9 Kernel: Make copy_to/from_user safe and remove unnecessary checks
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.

So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.

To make it easier to read/write from/to either kernel or user mode
data add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.

Last but not least we need to keep syscall params trivial as we
need to copy them from/to user mode using copy_from/to_user.
2020-09-13 21:19:15 +02:00
Tom
0fab0ee96a Kernel: Rename Process::is_ring0/3 to Process::is_kernel/user_process
Since "rings" typically refer to code execution and user processes
can also execute in ring 0, rename these functions to more accurately
describe what they mean: kernel processes and user processes.
2020-09-10 19:57:15 +02:00
asynts
ec1080b18a Refactor: Replace usages of FixedArray with Vector. 2020-09-08 14:01:21 +02:00
AnotherTest
688e54eac7 Kernel: Distinguish between new and old process groups with equal pgids
This does not add any behaviour change to the processes, but it ties a
TTY to an active process group via TIOCSPGRP, and returns the TTY to the
kernel when all processes in the process group die.
It also makes the TTY keep a link to the original controlling process's parent (for
SIGCHLD) instead of to the process itself.
2020-08-19 21:21:34 +02:00
Brian Gianforcaro
8e97de2df9 Kernel: Use Userspace<T> for the recvfrom syscall, and Socket implementation
This fixes a bunch of unchecked kernel reads and writes; it seems like they
might have been exploitable :). Write of sockaddr_in size to any address you
please...
2020-08-19 21:05:28 +02:00
Brian Gianforcaro
9f9b05ba0f Kernel: Use Userspace<T> for the sendto syscall, and Socket implementation
Note that the data member is of type ImmutableBufferArgument, which has
no Userspace<T> usage. I left it alone for now, to be fixed in a future
change holistically for all usages.
2020-08-19 21:05:28 +02:00
Andreas Kling
9ddd540ca9 Kernel: Bump process thread count to a 32-bit value
We should support more than 65535 threads, after all. :^)
2020-08-17 18:05:35 +02:00
Andreas Kling
65f2270232 Kernel+LibC+UserspaceEmulator: Bring back sys$dup2()
This is racy in userspace and non-racy in kernelspace so let's keep
it in kernelspace.

The behavior change where CLOEXEC is preserved when dup2() is called
with (old_fd == new_fd) was good though, let's keep that.
2020-08-15 11:11:34 +02:00
Andreas Kling
bf247fb45f Kernel+LibC+UserspaceEmulator: Remove sys$dup() and sys$dup2()
We can just implement these in userspace, so yay, two fewer syscalls!
2020-08-15 01:30:22 +02:00
Brian Gianforcaro
0e627b0273 Kernel: Use Userspace<T> for the exit_thread syscall
Userspace<void*> is a bit strange here, as it would appear to the
user that we intend to dereference the pointer in kernel mode.

However, I think it does a good job of illustrating that we are
treating the void* as a value type, instead of a pointer type.
2020-08-10 12:52:15 +02:00
Brian Gianforcaro
d3847b3489 Kernel: Use Userspace<T> for the join_thread syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
e8917cc5f3 Kernel: Use Userspace<T> for the chroot syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
20e2a5c111 Kernel: Use Userspace<T> for the module_unload syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
c4927ceb08 Kernel: Use Userspace<T> for the module_load syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
b5a2a215f6 Kernel: Use Userspace<T> for the getrandom syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
c8ae244ab8 Kernel: Use Userspace<T> for the shbuf_get syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
e073f2b59e Kernel: Use Userspace<T> for the get_thread_name syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
9652b0ae2b Kernel: Use Userspace<T> for the set_thread_name syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
0e20a6df0a Kernel: Use Userspace<T> for the connect syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
8bd9dbc220 Kernel: Use Userspace<T> for the accept syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
02660b5d60 Kernel: Use Userspace<T> for the bind syscall, and implementation 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
2bac7190c8 Kernel: Use Userspace<T> for the chmod syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
82bf6e8133 Kernel: Use Userspace<T> for the umount syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
317800324c Kernel: Use Userspace<T> for the unlink syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
ecfe20efd2 Kernel: Use Userspace<T> for the sigpending syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
fbb26b28b9 Kernel: Use Userspace<T> for the sigprocmask syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
431145148e Kernel: Use Userspace<T> for the fstat syscall 2020-08-10 12:52:15 +02:00
Brian Gianforcaro
8dd78201a4 Kernel: Use Userspace<T> for the uname syscall 2020-08-10 12:52:15 +02:00