Commit Graph

209 Commits

Author SHA1 Message Date
Andreas Kling
cc0f5917d3 Kernel: Slap a handful more things with UNMAP_AFTER_INIT 2021-02-20 00:00:19 +01:00
Andreas Kling
4188373020 Kernel: Fix TOCTOU in syscall entry region validation
We were doing stack and syscall-origin region validations before
taking the big process lock. There was a window of time where those
regions could then be unmapped/remapped by another thread before we
proceed with our syscall.

This patch closes that window, and makes sys$get_stack_bounds() rely
on the fact that we now know the userspace stack pointer to be valid.

Thanks to @BenWiederhake for spotting this! :^)
2021-02-14 11:47:14 +01:00
Andreas Kling
10b7f6b77e Kernel: Mark handle_crash() as [[noreturn]] 2021-02-14 11:47:14 +01:00
Andreas Kling
3131281747 Kernel: Panic on syscall from process with IOPL != 0
If this happens then the kernel is in an undefined state, so we should
rather panic than attempt to limp along.
2021-02-14 10:51:17 +01:00
Andreas Kling
f1b5def8fd Kernel: Factor address space management out of the Process class
This patch adds Space, a class representing a process's address space.

- Each Process has a Space.
- The Space owns the PageDirectory and all Regions in the Process.

This allows us to reorganize sys$execve() so that it constructs and
populates a new Space fully before committing to it.

Previously, we would construct the new address space while still
running in the old one, and encountering an error meant we had to do
tedious and error-prone rollback.

Those problems are now gone, replaced by what's hopefully a set of much
smaller problems and missing cleanups. :^)
2021-02-08 18:27:28 +01:00
Andreas Kling
823186031d Kernel: Add a way to specify which memory regions can make syscalls
This patch adds sys$msyscall() which is loosely based on an OpenBSD
mechanism for preventing syscalls from non-blessed memory regions.

It works similarly to pledge and unveil, you can call it as many
times as you like, and when you're finished, you call it with a null
pointer and it will stop accepting new regions from then on.

If a syscall later happens and doesn't originate from one of the
previously blessed regions, the kernel will simply crash the process.
2021-02-02 20:13:44 +01:00
Tom
0bd558081e Kernel: Track previous mode when entering/exiting traps
This allows us to determine what the previous mode (user or kernel)
was, e.g. in the timer interrupt. This is used e.g. to determine
whether a signal handler should be set up.

Fixes #5096
2021-01-27 21:12:24 +01:00
Andreas Kling
6bfbc5f5f5 Kernel: Don't allow modifying IOPL via sys$ptrace() or sys$sigreturn()
It was possible to overwrite the entire EFLAGS register since we didn't
do any masking in the ptrace and sigreturn syscalls.

This made it trivial to gain IO privileges by raising IOPL to 3 and
then you could talk to hardware to do all kinds of nasty things.

Thanks to @allesctf for finding these issues! :^)

Their exploit/write-up: https://github.com/allesctf/writeups/blob/master/2020/hxpctf/wisdom2/writeup.md
2020-12-22 19:38:25 +01:00
Tom
c455fc2030 Kernel: Change wait blocking to Process-only blocking
This prevents zombies created by multi-threaded applications and brings
our model back to closer to what other OSs do.

This also means that SIGSTOP needs to halt all threads, and SIGCONT needs
to resume those threads.
2020-12-12 21:28:12 +01:00
Tom
da5cc34ebb Kernel: Fix some issues related to fixes and block conditions
Fix some problems with join blocks where the joining thread block
condition was added twice, which lead to a crash when trying to
unblock that condition a second time.

Deferred block condition evaluation by File objects were also not
properly keeping the File object alive, which lead to some random
crashes and corruption problems.

Other problems were caused by the fact that the Queued state didn't
handle signals/interruptions consistently. To solve these issues we
remove this state entirely, along with Thread::wait_on and change
the WaitQueue into a BlockCondition instead.

Also, deliver signals even if there isn't going to be a context switch
to another thread.

Fixes #4336 and #4330
2020-12-12 21:28:12 +01:00
Tom
046d6855f5 Kernel: Move block condition evaluation out of the Scheduler
This makes the Scheduler a lot leaner by not having to evaluate
block conditions every time it is invoked. Instead evaluate them as
the states change, and unblock threads at that point.

This also implements some more waitid/waitpid/wait features and
behavior. For example, WUNTRACED and WNOWAIT are now supported. And
wait will now not return EINTR when SIGCHLD is delivered at the
same time.
2020-11-30 13:17:02 +01:00
Andreas Kling
1951dfa46a Kernel: Convert dbg() to dbgln() in Syscall.cpp 2020-11-23 16:08:42 +01:00
Ben Wiederhake
64cc3f51d0 Meta+Kernel: Make clang-format-10 clean 2020-09-25 21:18:17 +02:00
Nico Weber
df62e54d1e
Kernel: Request random numbers for syscall stack noise in larger chunks (#3125)
Cuts time needed for `disasm /bin/id` from 2.5s to 1s -- identical
to the time it needs when not doing the random adjustment at all.

The downside is that it's now very easy to get the random offsets
with out-of-bounds reads, so it does make this mitigation less
effective.
2020-08-13 21:05:08 +02:00
Brian Gianforcaro
0e627b0273 Kernel: Use Userspace<T> for the exit_thread syscall
Userspace<void*> is a bit strange here, as it would appear to the
user that we intend to de-refrence the pointer in kernel mode.

However I think it does a good join of illustrating that we are
treating the void* as a value type,  instead of a pointer type.
2020-08-10 12:52:15 +02:00
Andreas Kling
83a4fbf548 Kernel: Tidy up the syscalls list by reorganizing the enumerator macro 2020-08-04 18:17:16 +02:00
Tom
f4a5c9b6c2 Kernel: Consolidate timeout logic
Allow passing in an optional timeout to Thread::block and move
the timeout check out of Thread::Blocker. This way all Blockers
implicitly support timeouts and don't need to implement it
themselves. Do however allow them to override timeouts (e.g.
for sockets).
2020-08-03 18:23:00 +02:00
Andreas Kling
be7add690d Kernel: Rename region_from_foo() => find_region_from_foo()
Let's emphasize that these functions actually go out and find regions.
2020-07-30 23:52:28 +02:00
Andreas Kling
8ec8ec8b1c Kernel: Remove special-casing of sys$gettid() in syscall entry
We had a fast-path for the gettid syscall that was useful before
we started caching the thread ID in LibC. Just get rid of it. :^)
2020-07-18 00:25:02 +02:00
Andreas Kling
11c4a28660 Kernel: Move headers intended for userspace use into Kernel/API/ 2020-07-04 17:22:23 +02:00
Tom
16783bd14d Kernel: Turn Thread::current and Process::current into functions
This allows us to query the current thread and process on a
per processor basis
2020-07-01 12:07:01 +02:00
Tom
fb41d89384 Kernel: Implement software context switching and Processor structure
Moving certain globals into a new Processor structure for
each CPU allows us to eventually run an instance of the
scheduler on each CPU.
2020-07-01 12:07:01 +02:00
Itamar
6b74d38aab Kernel: Add 'ptrace' syscall
This commit adds a basic implementation of
the ptrace syscall, which allows one process
(the tracer) to control another process (the tracee).

While a process is being traced, it is stopped whenever a signal is
received (other than SIGCONT).

The tracer can start tracing another thread with PT_ATTACH,
which causes the tracee to stop.

From there, the tracer can use PT_CONTINUE
to continue the execution of the tracee,
or use other request codes (which haven't been implemented yet)
to modify the state of the tracee.

Additional request codes are PT_SYSCALL, which causes the tracee to
continue exection but stop at the next entry or exit from a syscall,
and PT_GETREGS which fethces the last saved register set of the tracee
(can be used to inspect syscall arguments and return value).

A special request code is PT_TRACE_ME, which is issued by the tracee
and causes it to stop when it calls execve and wait for the
tracer to attach.
2020-03-28 18:27:18 +01:00
Liav A
dbc536e917 Interrupts: Assert if trying to install an handler on syscall vector
Installing an interrupt handler on the syscall IDT vector can lead to
fatal results, so we must assert if that happens.
2020-03-24 16:15:33 +01:00
Shannon Booth
81adefef27 Kernel: Run clang-format on files
Let's rip off the band-aid
2020-03-22 01:22:32 +01:00
Liav A
f0ca29eb4b Kernel: Run clang-format on various files 2020-03-02 22:23:39 +01:00
Liav A
0fc60e41dd Kernel: Use klog() instead of kprintf()
Also, duplicate data in dbg() and klog() calls were removed.
In addition, leakage of virtual address to kernel log is prevented.
This is done by replacing kprintf() calls to dbg() calls with the
leaked data instead.
Also, other kprintf() calls were replaced with klog().
2020-03-02 22:23:39 +01:00
Liav A
9e520fd0d6 Syscall: Use dbg() instead of dbgprintf() 2020-02-27 13:05:12 +01:00
Cristian-Bogdan SIRB
5aa5ce53bc Kernel: Fix the gettid syscall
syscall_handler was not actually updating the value in regs->eax, so the
gettid() was always returning 85: the value of regs->eax was not
actually updated, and it remained the one from Userland (the value of
SC_gettid).

The syscall_handler was modified to actually get a pointer to
RegisterState, so any changes to it will actually be saved.

NOTE: This was actually more of a compiler optimization:
On the SC_gettid flow, we saved in regs.eax the return value of
sys$gettid(), but the compiler discarded it, since it followed a return.
On a normal flow, the value of regs.eax was reused in
tracer->did_syscall, so the compiler actually updated the value.
2020-02-27 10:58:43 +01:00
Andreas Kling
48f7c28a5c Kernel: Replace "current" with Thread::current and Process::current
Suggested by Sergey. The currently running Thread and Process are now
Thread::current and Process::current respectively. :^)
2020-02-17 15:04:27 +01:00
Andreas Kling
a356e48150 Kernel: Move all code into the Kernel namespace 2020-02-16 01:27:42 +01:00
Andreas Kling
0341ddc5eb Kernel: Rename RegisterDump => RegisterState 2020-02-16 00:15:37 +01:00
Andreas Kling
e576c9e952 Kernel: Clear ESI and EDI on syscall entry
Since these are not part of the system call convention, we don't care
what userspace had in there. Might as well scrub it before entering
the kernel.

I would scrub EBP too, but that breaks the comfy kernel-thru-userspace
stack traces we currently get. It can be done with some effort.
2020-01-25 10:34:32 +01:00
Andreas Kling
5ce1cc89a0 Kernel: Add fast-path for sys$gettid()
The userspace locks are very aggressively calling sys$gettid() to find
out which thread ID they have.

Since syscalls are quite heavy, this can get very expensive for some
programs. This patch adds a fast-path for sys$gettid(), which makes it
skip all of the usual syscall validation and just return the thread ID
right away.

This cuts Kernel/Process.cpp compile time by ~18%, from ~29 to ~24 sec.
2020-01-19 17:32:05 +01:00
Andreas Kling
94ca55cefd Meta: Add license header to source files
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.

For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.

Going forward, all new source files should include a license header.
2020-01-18 09:45:54 +01:00
Andreas Kling
8b54ba0d61 Kernel: Dispatch pending signals when returning from a syscall
It was quite easy to put the system into a heavy churn state by doing
e.g "cat /dev/zero".

It was then basically impossible to kill the "cat" process, even with
"kill -9", since signals are only delivered in two conditions:

a) The target thread is blocked in the kernel
b) The target thread is running in userspace

However, since "cat /dev/zero" command spends most of its time actively
running in the kernel, not blocked, the signal dispatch code just kept
postponing actually handling the signal indefinitely.

To fix this, we now check before returning from a syscall if there are
any pending unmasked signals, and if so, we take a dramatic pause by
blocking the current thread, knowing it will immediately be unblocked
by signal dispatch anyway. :^)
2020-01-12 15:04:33 +01:00
Andreas Kling
17ef5bc0ac Kernel: Rename {ss,esp}_if_crossRing to userspace_{ss,esp}
These were always so awkwardly named.
2020-01-09 18:02:01 +01:00
Andreas Kling
9eef39d68a Kernel: Start implementing x86 SMAP support
Supervisor Mode Access Prevention (SMAP) is an x86 CPU feature that
prevents the kernel from accessing userspace memory. With SMAP enabled,
trying to read/write a userspace memory address while in the kernel
will now generate a page fault.

Since it's sometimes necessary to read/write userspace memory, there
are two new instructions that quickly switch the protection on/off:
STAC (disables protection) and CLAC (enables protection.)
These are exposed in kernel code via the stac() and clac() helpers.

There's also a SmapDisabler RAII object that can be used to ensure
that you don't forget to re-enable protection before returning to
userspace code.

THis patch also adds copy_to_user(), copy_from_user() and memset_user()
which are the "correct" way of doing things. These functions allow us
to briefly disable protection for a specific purpose, and then turn it
back on immediately after it's done. Going forward all kernel code
should be moved to using these and all uses of SmapDisabler are to be
considered FIXME's.

Note that we're not realizing the full potential of this feature since
I've used SmapDisabler quite liberally in this initial bring-up patch.
2020-01-05 18:14:51 +01:00
Andreas Kling
f081990717 Kernel: Use get_fast_random() for the random syscall stack offset 2020-01-03 12:48:28 +01:00
Andreas Kling
1d94b5eb04 Kernel: Add a random offset to kernel stacks upon syscall entry
When entering the kernel from a syscall, we now insert a small bit of
stack padding after the RegisterDump. This makes kernel stacks less
deterministic across syscalls and may make some bugs harder to exploit.

Inspired by Elena Reshetova's talk on kernel stack exploitation.
2020-01-01 23:21:24 +01:00
Andreas Kling
f01fd54d1b Kernel: Make separate kernel entry points for each PIC IRQ
Instead of having a common entry point and looking at the PIC ISR to
figure out which IRQ we're servicing, just make a separate entryway
for each IRQ that pushes the IRQ number and jumps to a common routine.

This fixes a weird issue where incoming network packets would sometimes
cause the mouse to stop working. I didn't track it down further than
realizing we were sometimes EOI'ing the wrong IRQ.
2019-12-15 12:47:53 +01:00
Andreas Kling
e49d6cc7e9 Kernel: Tidy up kernel entry points a little bit
Now that we can see the kernel entry points all the time in profiles,
let's tweak the names a little bit and switch to named exceptions.
2019-12-14 16:16:57 +01:00
Andreas Kling
e56daf547c Kernel: Disallow syscalls from writeable memory
Processes will now crash with SIGSEGV if they attempt making a syscall
from PROT_WRITE memory.

This neat idea comes from OpenBSD. :^)
2019-11-29 16:30:05 +01:00
Andreas Kling
5b8cf2ee23 Kernel: Make syscall counters and page fault counters per-thread
Now that we show individual threads in SystemMonitor and "top",
it's also very nice to have individual counters for the threads. :^)
2019-11-26 21:37:38 +01:00
Andreas Kling
c23addd1fb Kernel: When userspaces calls a removed syscall, fail with ENOSYS
This is a bit gentler than jumping to 0x0, which always crashes the
whole process. Also log a debug message about what happened, and let
the user know that it's probably time to rebuild the program.
2019-11-18 12:35:14 +01:00
Andreas Kling
3da6d89d1f Kernel+LibC: Remove the isatty() syscall
This can be implemented entirely in userspace by calling tcgetattr().
To avoid screwing up the syscall indexes, this patch also adds a
mechanism for removing a syscall without shifting the index of other
syscalls.

Note that ports will still have to be rebuilt after this change,
as their LibC code will try to make the isatty() syscall on startup.
2019-11-17 20:03:42 +01:00
Andreas Kling
794758df3a Kernel: Implement some basic stack pointer validation
VM regions can now be marked as stack regions, which is then validated
on syscall, and on page fault.

If a thread is caught with its stack pointer pointing into anything
that's *not* a Region with its stack bit set, we'll crash the whole
process with SIGSTKFLT.

Userspace must now allocate custom stacks by using mmap() with the new
MAP_STACK flag. This mechanism was first introduced in OpenBSD, and now
we have it too, yay! :^)
2019-11-17 12:15:43 +01:00
Sergey Bugaev
1e1ddce9d8 Kernel: Unwind kernel stacks before dying
While executing in the kernel, a thread can acquire various resources
that need cleanup, such as locks and references to RefCounted objects.
This cleanup normally happens on the exit path, such as in destructors
for various RAII guards. But we weren't calling those exit paths when
killing threads that have been executing in the kernel, such as threads
blocked on reading or sleeping, thus causing leaks.

This commit changes how killing threads works. Now, instead of killing
a thread directly, one is supposed to call thread->set_should_die(),
which will unblock it and make it unwind the stack if it is blocked
in the kernel. Then, just before returning to the userspace, the thread
will automatically die.
2019-11-14 20:05:58 +01:00
Andreas Kling
69ca9cfd78 LibPthread: Start working on a POSIX threading library
This patch adds pthread_create() and pthread_exit(), which currently
simply wrap our existing create_thread() and exit_thread() syscalls.

LibThread is also ported to using LibPthread.
2019-11-13 21:49:24 +01:00
Andreas Kling
b285a1944e Kernel: Clear the x86 DF flag when entering the kernel
The SysV ABI says that the DF flag should be clear on function entry.
That means we have to clear it when jumping into the kernel from some
random userspace context.
2019-11-09 22:42:19 +01:00