Commit Graph

1068 Commits

Author SHA1 Message Date
Brian Gianforcaro
25a620a573 Kernel: Enable timeout support for sys$futex(FUTEX_WAIT)
Utilize the new Thread::wait_on timeout parameter to implement
timeout support for FUTEX_WAIT.

As we compute the relative time from the user specified absolute
time, we try to delay that computation as long as possible before
we call into Thread::wait_on(..). To enable this a small bit of
refactoring was done pull futex_queue fetching out and timeout fetch
and calculation separation.
2020-04-26 21:31:52 +02:00
Andreas Kling
fb826aa59a Kernel: Make sys$sethostname() superuser-only
Also take the hostname string lock exclusively.
2020-04-26 15:51:57 +02:00
Luke Payne
f191b84b50 Kernel: Added the ability to set the hostname via new syscall
Userland/hostname: Now takes parameter to set the hostname
LibC/unistd: Added sethostname function
2020-04-26 12:59:09 +02:00
Brian Gianforcaro
0f3990cfa3 Kernel: Support signaling all processes with pid == -1
This is a special case that was previously not implemented.
The idea is that you can dispatch a signal to all other processes
the calling process has access to.

There was some minor refactoring to make the self signal logic
into a function so it could easily be easily re-used from do_killall.
2020-04-26 12:54:10 +02:00
Brian Gianforcaro
1f64e3eb16 Kernel: Implement FUTEX_WAKE of arbitrary count.
Previously we just woke all waiters no matter how many were
requested. Fix this by implementing WaitQueue::wake_n(..).
2020-04-26 12:35:35 +02:00
Drew Stratford
4a37362249 LibPthread: implicitly call pthread_exit on return from start routine.
Previously, when returning from a pthread's start_routine, we would
segfault. Now we instead implicitly call pthread_exit as specified in
the standard.

pthread_create now creates a thread running the new
pthread_create_helper, which properly manages the calling and exiting
of the start_routine supplied to pthread_create. To accomplish this,
the thread's stack initialization has been moved out of
sys$create_thread and into the userspace function create_thread.
2020-04-25 16:51:35 +02:00
Itamar
edaa9c06d9 LibELF: Make ELF::Loader RefCounted 2020-04-20 17:25:50 +02:00
Sergey Bugaev
54550365eb Kernel: Use shared locking mode in some places
The notable piece of code that remains to be converted is Ext2FS.
2020-04-18 13:58:29 +02:00
Sergey Bugaev
f18d6610d3 Kernel: Don't include null terminator in sys$readlink() result
POSIX says, "Conforming applications should not assume that the returned
contents of the symbolic link are null-terminated."

If we do include the null terminator into the returning string, Python
believes it to actually be a part of the returned name, and gets unhappy
about that later. This suggests other systems Python runs in don't include
it, so let's do that too.

Also, make our userspace support non-null-terminated realpath().
2020-04-14 18:40:24 +02:00
Andreas Kling
815b73bdcc Kernel: Simplify sys$setgroups(0, ...)
If we're dropping all groups, just clear the extra_gids and return.
2020-04-14 15:30:25 +02:00
Andreas Kling
9962db5bf8 Kernel: Remove SmapDisablers in {peek,poke}_user_data() 2020-04-14 09:52:49 +02:00
Itamar
3e9a7175d1 Debugger: Add DebugSession
The DebugSession class wraps the usage of Ptrace.
It is intended to be used by cli & gui debugger programs.

Also, call objdump for disassemly
2020-04-13 00:53:22 +02:00
Itamar
aae3f7b914 Process: Fix siginfo for code CLD_STOPPED
si_code, si_status where swapped
2020-04-13 00:53:22 +02:00
Itamar
9e51e295cf ptrace: Add PT_SETREGS
PT_SETTREGS sets the regsiters of the traced thread. It can only be
used when the tracee is stopped.

Also, refactor ptrace.
The implementation was getting long and cluttered the alraedy large
Process.cpp file.

This commit moves the bulk of the implementation to Kernel/Ptrace.cpp,
and factors out peek & poke to separate methods of the Process class.
2020-04-13 00:53:22 +02:00
Itamar
0431712660 ptrace: Stop a traced thread when it exists from execve
This was a missing feature in the PT_TRACEME command.

This feature allows the tracer to interact with the tracee before the
tracee has started executing its program.

It will be useful for automatically inserting a breakpoint at a
debugged program's entry point.
2020-04-13 00:53:22 +02:00
Itamar
b306ac9b2b ptrace: Add PT_POKE
PT_POKE writes a single word to the tracee's address space.

Some caveats:
- If the user requests to write to an address in a read-only region, we
temporarily change the page's protections to allow it.

- If the user requests to write to a region that's backed by a
SharedInodeVMObject, we replace the vmobject with a PrivateIndoeVMObject.
2020-04-13 00:53:22 +02:00
Itamar
984ff93406 ptrace: Add PT_PEEK
PT_PEEK reads a single word from the tracee's address space and returns
it to the tracer.
2020-04-13 00:53:22 +02:00
Andreas Kling
c19b56dc99 Kernel+LibC: Add minherit() and MAP_INHERIT_ZERO
This patch adds the minherit() syscall originally invented by OpenBSD.
Only the MAP_INHERIT_ZERO mode is supported for now. If set on an mmap
region, that region will be zeroed out on fork().
2020-04-12 20:22:26 +02:00
Andrew Kaster
61acca223f LibELF: Move validation methods to their own file
These validate_elf_* methods really had no business being static
methods of ELF::Image. Now that the ELF namespace exists, it makes
sense to just move them to be free functions in the namespace.
2020-04-11 22:41:05 +02:00
Andrew Kaster
21b5909dc6 LibELF: Move ELF classes into namespace ELF
This is for consistency with other namespace changes that were made
a while back to the other libraries :)
2020-04-11 22:41:05 +02:00
Andreas Kling
dec352dacd Kernel: Ignore zero-length PROGBITS sections in sys$module_load() 2020-04-10 16:36:01 +02:00
Andreas Kling
c06d5ef114 Kernel+LibC: Remove ESUCCESS
There's no official ESUCCESS==0 errno code, and it keeps breaking the
Lagom build when we use it, so let's just say 0 instead.
2020-04-10 13:09:35 +02:00
Andreas Kling
871d450b93 Kernel: Remove redundant "ACPI" from filenames in ACPI/ 2020-04-09 18:17:27 +02:00
Andreas Kling
4644217094 Kernel: Remove "non-operational" ACPI parser state
If we don't support ACPI, just don't instantiate an ACPI parser.
This is way less confusing than having a special parser class whose
only purpose is to do nothing.

We now search for the RSDP in ACPI::initialize() instead of letting
the parser constructor do it. This allows us to defer the decision
to create a parser until we're sure we can make a useful one.
2020-04-09 17:19:11 +02:00
Andreas Kling
dc7340332d Kernel: Update cryptically-named functions related to symbolication 2020-04-08 17:19:46 +02:00
Liav A
23fb985f02 Kernel & Userland: Allow to mount image files formatted with Ext2FS 2020-04-06 15:36:36 +02:00
Andreas Kling
9ae3cced76 Revert "Kernel & Userland: Allow to mount image files formatted with Ext2FS"
This reverts commit a60ea79a41.

Reverting these changes since they broke things.
Fixes #1608.
2020-04-03 21:28:57 +02:00
Liav A
a60ea79a41 Kernel & Userland: Allow to mount image files formatted with Ext2FS 2020-04-02 12:03:08 +02:00
Itamar
6b74d38aab Kernel: Add 'ptrace' syscall
This commit adds a basic implementation of
the ptrace syscall, which allows one process
(the tracer) to control another process (the tracee).

While a process is being traced, it is stopped whenever a signal is
received (other than SIGCONT).

The tracer can start tracing another thread with PT_ATTACH,
which causes the tracee to stop.

From there, the tracer can use PT_CONTINUE
to continue the execution of the tracee,
or use other request codes (which haven't been implemented yet)
to modify the state of the tracee.

Additional request codes are PT_SYSCALL, which causes the tracee to
continue exection but stop at the next entry or exit from a syscall,
and PT_GETREGS which fethces the last saved register set of the tracee
(can be used to inspect syscall arguments and return value).

A special request code is PT_TRACE_ME, which is issued by the tracee
and causes it to stop when it calls execve and wait for the
tracer to attach.
2020-03-28 18:27:18 +01:00
Shannon Booth
757c14650f Kernel: Simplify process assertion checking if region is in range
Let's use the helper function for this :)
2020-03-22 08:51:40 +01:00
Liav A
b536547c52 Process: Use monotonic time for timeouts 2020-03-19 15:48:00 +01:00
Liav A
4484513b45 Kernel: Add new syscall to allow changing the system date 2020-03-19 15:48:00 +01:00
Liav A
9db291d885 Kernel: Introduce the new Time management subsystem
This new subsystem includes better abstractions of how time will be
handled in the OS. We take advantage of the existing RTC timer to aid
in keeping time synchronized. This is standing in contrast to how we
handled time-keeping in the kernel, where the PIT was responsible for
that function in addition to update the scheduler about ticks.
With that new advantage, we can easily change the ticking dynamically
and still keep the time synchronized.

In the process context, we no longer use a fixed declaration of
TICKS_PER_SECOND, but we call the TimeManagement singleton class to
provide us the right value. This allows us to use dynamic ticking in
the future, a feature known as tickless kernel.

The scheduler no longer does by himself the calculation of real time
(Unix time), and just calls the TimeManagment singleton class to provide
the value.

Also, we can use 2 new boot arguments:
- the "time" boot argument accpets either the value "modern", or
  "legacy". If "modern" is specified, the time management subsystem will
  try to setup HPET. Otherwise, for "legacy" value, the time subsystem
  will revert to use the PIT & RTC, leaving HPET disabled.
  If this boot argument is not specified, the default pattern is to try
  to setup HPET.
- the "hpet" boot argumet accepts either the value "periodic" or
  "nonperiodic". If "periodic" is specified, the HPET will scan for
  periodic timers, and will assert if none are found. If only one is
  found, that timer will be assigned for the time-keeping task. If more
  than one is found, both time-keeping task & scheduler-ticking task
  will be assigned to periodic timers.
  If this boot argument is not specified, the default pattern is to try
  to scan for HPET periodic timers. This boot argument has no effect if
  HPET is disabled.

In hardware context, PIT & RealTimeClock classes are merely inheriting
from the HardwareTimer class, and they allow to use the old i8254 (PIT)
and RTC devices, managing them via IO ports. By default, the RTC will be
programmed to a frequency of 1024Hz. The PIT will be programmed to a
frequency close to 1000Hz.

About HPET, depending if we need to scan for periodic timers or not,
we try to set a frequency close to 1000Hz for the time-keeping timer
and scheduler-ticking timer. Also, if possible, we try to enable the
Legacy replacement feature of the HPET. This feature if exists,
instructs the chipset to disconnect both i8254 (PIT) and RTC.
This behavior is observable on QEMU, and was verified against the source
code:
ce967e2f33

The HPETComparator class is inheriting from HardwareTimer class, and is
responsible for an individual HPET comparator, which is essentially a
timer. Therefore, it needs to call the singleton HPET class to perform
HPET-related operations.

The new abstraction of Hardware timers brings an opportunity of more new
features in the foreseeable future. For example, we can change the
callback function of each hardware timer, thus it makes it possible to
swap missions between hardware timers, or to allow to use a hardware
timer for other temporary missions (e.g. calibrating the LAPIC timer,
measuring the CPU frequency, etc).
2020-03-19 15:48:00 +01:00
Alex Muscar
d013753f83
Kernel: Resolve relative paths when there is a veil (#1474) 2020-03-19 09:57:34 +01:00
Andreas Kling
ad92a1e4bc Kernel: Add sys$get_stack_bounds() for finding the stack base & size
This will be useful when implementing conservative garbage collection.
2020-03-16 19:06:33 +01:00
Andreas Kling
3803196edb Kernel: Get rid of SmapDisabler in sys$fstat() 2020-03-10 13:34:24 +01:00
Liav A
0f45a1b5e7 Kernel: Allow to reboot in ACPI via PCI or MMIO access
Also, we determine if ACPI reboot is supported by checking the FADT
flags' field.
2020-03-09 10:53:13 +01:00
Ben Wiederhake
b066586355 Kernel: Fix race in waitid
This is similar to 28e1da344d
and 4dd4dd2f3c.

The crux is that wait verifies that the outvalue (siginfo* infop)
is writable *before* waiting, and writes to it *after* waiting.
In the meantime, a concurrent thread can make the output region
unwritable, e.g. by deallocating it.
2020-03-08 14:12:12 +01:00
Ben Wiederhake
d8cd4e4902 Kernel: Fix race in select
This is similar to 28e1da344d
and 4dd4dd2f3c.

The crux is that select verifies that the filedescriptor sets
are writable *before* blocking, and writes to them *after* blocking.
In the meantime, a concurrent thread can make the output buffer
unwritable, e.g. by deallocating it.
2020-03-08 14:12:12 +01:00
Andreas Kling
b1058b33fb AK: Add global FlatPtr typedef. It's u32 or u64, based on sizeof(void*)
Use this instead of uintptr_t throughout the codebase. This makes it
possible to pass a FlatPtr to something that has u32 and u64 overloads.
2020-03-08 13:06:51 +01:00
Andreas Kling
c6693f9b3a Kernel: Simplify a bunch of dbg() and klog() calls
LogStream can handle VirtualAddress and PhysicalAddress directly.
2020-03-06 15:00:44 +01:00
Liav A
85eb1d26d5 Kernel: Run clang-format on Process.cpp & ACPIDynamicParser.h 2020-03-05 19:04:04 +01:00
Liav A
1b8cd6db7b Kernel: Call ACPI reboot method first if possible
Now we call ACPI reboot method first if possible, and if ACPI reboot is
not available, we attempt to reboot via the keyboard controller.
2020-03-05 19:04:04 +01:00
Ben Wiederhake
4dd4dd2f3c Kernel: Fix race in clock_nanosleep
This is a complete fix of clock_nanosleep, because the thread holds the
process lock again when returning from sleep()/sleep_until().
Therefore, no further concurrent invalidation can occur.
2020-03-03 20:13:32 +01:00
Liav A
0fc60e41dd Kernel: Use klog() instead of kprintf()
Also, duplicate data in dbg() and klog() calls were removed.
In addition, leakage of virtual address to kernel log is prevented.
This is done by replacing kprintf() calls to dbg() calls with the
leaked data instead.
Also, other kprintf() calls were replaced with klog().
2020-03-02 22:23:39 +01:00
Andreas Kling
47beab926d Kernel: Remove ability to create kernel-only regions at user addresses
This was only used by the mechanism for mapping executables into each
process's own address space. Now that we remap executables on demand
when needed for symbolication, this can go away.
2020-03-02 11:20:34 +01:00
Andreas Kling
e56f8706ce Kernel: Map executables at a kernel address during ELF load
This is both simpler and more robust than mapping them in the process
address space.
2020-03-02 11:20:34 +01:00
Andreas Kling
678c87087d Kernel: Load executables on demand when symbolicating
Previously we would map the entire executable of a program in its own
address space (but make it unavailable to userspace code.)

This patch removes that and changes the symbolication code to remap
the executable on demand (and into the kernel's own address space
instead of the process address space.)

This opens up a couple of further simplifications that will follow.
2020-03-02 11:20:34 +01:00
Andreas Kling
0acac186fb Kernel: Make the "entire executable" region shared
This makes Region::clone() do the right thing with it on fork().
2020-03-02 06:13:29 +01:00
Andreas Kling
5c2a296a49 Kernel: Mark read-only PT_LOAD mappings as shared regions
This makes Region::clone() do the right thing for these now that we
differentiate based on Region::is_shared().
2020-03-01 21:26:36 +01:00
Andreas Kling
ecfde5997b Kernel: Use SharedInodeVMObject for executables after all
I had the wrong idea about this. Thanks to Sergey for pointing it out!

Here's what he says (reproduced for posterity):

> Private mappings protect the underlying file from the changes made by
> you, not the other way around. To quote POSIX, "If MAP_PRIVATE is
> specified, modifications to the mapped data by the calling process
> shall be visible only to the calling process and shall not change the
> underlying object. It is unspecified whether modifications to the
> underlying object done after the MAP_PRIVATE mapping is established
> are visible through the MAP_PRIVATE mapping." In practice that means
> that the pages that were already paged in don't get updated when the
> underlying file changes, and the pages that weren't paged in yet will
> load the latest data at that moment.
> The only thing MAP_FILE | MAP_PRIVATE is really useful for is mapping
> a library and performing relocations; it's definitely useless (and
> actively harmful for the system memory usage) if you only read from
> the file.

This effectively reverts e2697c2ddd.
2020-03-01 21:16:27 +01:00
Andreas Kling
bb7dd63f74 Kernel: Run clang-format on Process.cpp 2020-03-01 21:16:27 +01:00
Andreas Kling
687b52ceb5 Kernel: Name perfcore files "perfcore.PID"
This way we can trace many things and we get one perfcore file per
process instead of everyone trying to write to "perfcore"
2020-03-01 20:59:02 +01:00
Andreas Kling
fee20bd8de Kernel: Remove some more harmless InodeVMObject miscasts 2020-03-01 12:27:03 +01:00
Andreas Kling
95e3aec719 Kernel: Fix harmless type miscast in Process::amount_clean_inode() 2020-03-01 11:23:23 +01:00
Andreas Kling
e2697c2ddd Kernel: Use PrivateInodeVMObject for loading program executables
This will be a memory usage pessimization until we actually implement
CoW sharing of the memory pages with SharedInodeVMObject.

However, it's a huge architectural improvement, so let's take it and
improve on this incrementally.

fork() should still be neutral, since all private mappings are CoW'ed.
2020-03-01 11:23:10 +01:00
Andreas Kling
88b334135b Kernel: Remove some Region construction helpers
It's now up to the caller to provide a VMObject when constructing a new
Region object. This will make it easier to handle things going wrong,
like allocation failures, etc.
2020-03-01 11:23:10 +01:00
Andreas Kling
4badef8137 Kernel: Return bytes written if sys$write() fails after writing some
If we wrote anything we should just inform userspace that we did,
and not worry about the error code. Userspace can call us again if
it wants, and we'll give them the error then.
2020-02-29 18:42:35 +01:00
Andreas Kling
7cd1bdfd81 Kernel: Simplify some dbg() logging
We don't have to log the process name/PID/TID, dbg() automatically adds
that as a prefix to every line.

Also we don't have to do .characters() on Strings passed to dbg() :^)
2020-02-29 13:39:06 +01:00
Andreas Kling
8fbdda5a2d Kernel: Implement basic support for sys$mmap() with MAP_PRIVATE
You can now mmap a file as private and writable, and the changes you
make will only be visible to you.

This works because internally a MAP_PRIVATE region is backed by a
unique PrivateInodeVMObject instead of using the globally shared
SharedInodeVMObject like we always did before. :^)

Fixes #1045.
2020-02-28 23:25:00 +01:00
Andreas Kling
aa1e209845 Kernel: Remove some unnecessary indirection in InodeFile::mmap()
InodeFile now directly calls Process::allocate_region_with_vmobject()
instead of taking an awkward detour via a special Region constructor.
2020-02-28 20:29:14 +01:00
Andreas Kling
651417a085 Kernel: Split InodeVMObject into two subclasses
We now have PrivateInodeVMObject and SharedInodeVMObject, corresponding
to MAP_PRIVATE and MAP_SHARED respectively.

Note that PrivateInodeVMObject is not used yet.
2020-02-28 20:20:35 +01:00
Andreas Kling
07a26aece3 Kernel: Rename InodeVMObject => SharedInodeVMObject 2020-02-28 20:07:51 +01:00
Andreas Kling
5af95139fa Kernel: Make Process::m_master_tls_region a WeakPtr
Let's not keep raw Region* variables around like that when it's so easy
to avoid it.
2020-02-28 14:05:30 +01:00
Andreas Kling
b0623a0c58 Kernel: Remove SmapDisabler in sys$connect() 2020-02-28 13:20:26 +01:00
Andreas Kling
dcd619bd46 Kernel: Merge the shbuf_get_size() syscall into shbuf_get()
Add an extra out-parameter to shbuf_get() that receives the size of the
shared buffer. That way we don't need to make a separate syscall to
get the size, which we always did immediately after.
2020-02-28 12:55:58 +01:00
Andreas Kling
f72e5bbb17 Kernel+LibC: Rename shared buffer syscalls to use a prefix
This feels a lot more consistent and Unixy:

    create_shared_buffer()   => shbuf_create()
    share_buffer_with()      => shbuf_allow_pid()
    share_buffer_globally()  => shbuf_allow_all()
    get_shared_buffer()      => shbuf_get()
    release_shared_buffer()  => shbuf_release()
    seal_shared_buffer()     => shbuf_seal()
    get_shared_buffer_size() => shbuf_get_size()

Also, "shared_buffer_id" is shortened to "shbuf_id" all around.
2020-02-28 12:55:58 +01:00
Liav A
db23703570 Process: Use dbg() instead of dbgprintf()
Also, fix a bad derefernce in sys$create_shared_buffer() method.
2020-02-27 13:05:12 +01:00
Andreas Kling
4997dcde06 Kernel: Always disable interrupts in do_killpg()
Will caught an assertion when running "kill 9999999999999" :^)
2020-02-27 11:05:16 +01:00
Andreas Kling
4a293e8a21 Kernel: Ignore signals sent to threadless (zombie) processes
If a process doesn't have any threads left, it's in a zombie state and
we can't meaningfully send signals to it. So just ignore them.

Fixes #1313.
2020-02-27 11:04:15 +01:00
Andreas Kling
0c1497846e Kernel: Don't allow profiling a dead process
Work towards #1313.
2020-02-27 10:42:31 +01:00
Cristian-Bogdan SIRB
05ce8586ea Kernel: Fix ASSERTION failed in join_thread syscall
set_interrupted_by_death was never called whenever a thread that had
a joiner died, so the joiner remained with the joinee pointer there,
resulting in an assertion fail in JoinBlocker: m_joinee pointed to
a freed task, filled with garbage.

Thread::current->m_joinee may not be valid after the unblock

Properly return the joinee exit value to the joiner thread.
2020-02-27 10:09:44 +01:00
Andreas Kling
d28fa89346 Kernel: Don't assert on sys$kill() with pid=INT32_MIN
On 32-bit platforms, INT32_MIN == -INT32_MIN, so we can't expect this
to always work:

    if (pid < 0)
        positive_pid = -pid; // may still be negative!

This happens because the -INT32_MIN expression becomes a long and is
then truncated back to an int.

Fixes #1312.
2020-02-27 10:02:04 +01:00
Cristian-Bogdan SIRB
717cd5015e Kernel: Allow process with multiple threads to call exec and exit
This allows a process wich has more than 1 thread to call exec, even
from a thread. This kills all the other threads, but it won't wait for
them to finish, just makes sure that they are not in a running/runable
state.

In the case where a thread does exec, the new program PID will be the
thread TID, to keep the PID == TID in the new process.

This introduces a new function inside the Process class,
kill_threads_except_self which is called on exit() too (exit with
multiple threads wasn't properly working either).

Inside the Lock class, there is the need for a new function,
clear_waiters, which removes all the waiters from the
Process::big_lock. This is needed since after a exit/exec, there should
be no other threads waiting for this lock, the threads should be simply
killed. Only queued threads should wait for this lock at this point,
since blocked threads are handled in set_should_die.
2020-02-26 13:06:40 +01:00
Andreas Kling
ceec1a7d38 AK: Make Vector use size_t for its size and capacity 2020-02-25 14:52:35 +01:00
Andreas Kling
d0f5b43c2e Kernel: Use Vector::unstable_remove() when deallocating a region
Process::m_regions is not sorted, so we can use unstable_remove()
to avoid shifting the vector contents. :^)
2020-02-24 18:34:49 +01:00
Andreas Kling
30a8991dbf Kernel: Make Region weakable and use WeakPtr<Region> instead of Region*
This turns use-after-free bugs into null pointer dereferences instead.
2020-02-24 13:32:45 +01:00
Andreas Kling
79576f9280 Kernel: Clear the region lookup cache on exec()
Each process has a 1-level lookup cache for fast repeated lookups of
the same VM region (which tends to be the majority of lookups.)
The cache is used by the following syscalls: munmap, madvise, mprotect
and set_mmap_name.

After a succesful exec(), there could be a stale Region* in the lookup
cache, and the new executable was able to manipulate it using a number
of use-after-free code paths.
2020-02-24 12:37:27 +01:00
Liav A
895e874eb4 Kernel: Include the new PIT class in system components 2020-02-24 11:27:03 +01:00
Andreas Kling
fc5ebe2a50 Kernel: Disown shared buffers on sys$execve()
When committing to a new executable, disown any shared buffers that the
process was previously co-owning.

Otherwise accessing the same shared buffer ID from the new program
would cause the kernel to find a cached (and stale!) reference to the
previous program's VM region corresponding to that shared buffer,
leading to a Region* use-after-free.

Fixes #1270.
2020-02-22 12:29:38 +01:00
Andreas Kling
ece2971112 Kernel: Disable profiling during the critical section of sys$execve()
Since we're gonna throw away these stacks at the end of exec anyway,
we might as well disable profiling before starting to mess with the
process page tables. One less weird situation to worry about in the
sampling code.
2020-02-22 11:09:03 +01:00
Andreas Kling
d7a13dbaa7 Kernel: Reset profiling state on exec() (but keep it going)
We now log the new executable on exec() and throw away all the samples
we've accumulated so far. But profiling keeps going.
2020-02-22 10:54:50 +01:00
Andreas Kling
2a679f228e Kernel: Fix bitrotted DEBUG_IO logging 2020-02-21 15:49:30 +01:00
Andreas Kling
bead20c40f Kernel: Remove SmapDisabler in sys$create_shared_buffer() 2020-02-18 14:12:39 +01:00
Andreas Kling
9aa234cc47 Kernel: Reset FPU state on exec() 2020-02-18 13:44:27 +01:00
Andreas Kling
a7dbb3cf96 Kernel: Use a FixedArray for a process's extra GIDs
There's not really enough of these to justify using a HashTable.
2020-02-18 11:35:47 +01:00
Andreas Kling
48f7c28a5c Kernel: Replace "current" with Thread::current and Process::current
Suggested by Sergey. The currently running Thread and Process are now
Thread::current and Process::current respectively. :^)
2020-02-17 15:04:27 +01:00
Andreas Kling
4f4af24b9d Kernel: Tear down process address space during finalization
Process teardown is divided into two main stages: finalize and reap.

Finalization happens in the "Finalizer" kernel and runs with interrupts
enabled, allowing destructors to take locks, etc.

Reaping happens either in sys$waitid() or in the scheduler for orphans.

The more work we can do in finalization, the better, since it's fully
pre-emptible and reduces the amount of time the system runs without
interrupts enabled.
2020-02-17 14:33:06 +01:00
Andreas Kling
31e1af732f Kernel+LibC: Allow sys$mmap() callers to specify address alignment
This is exposed via the non-standard serenity_mmap() call in userspace.
2020-02-16 12:55:56 +01:00
Andreas Kling
7a8be7f777 Kernel: Remove SmapDisabler in sys$accept() 2020-02-16 08:20:54 +01:00
Andreas Kling
7717084ac7 Kernel: Remove SmapDisabler in sys$clock_gettime() 2020-02-16 08:13:11 +01:00
Andreas Kling
16818322c5 Kernel: Reduce header dependencies of Process and Thread 2020-02-16 02:01:42 +01:00
Andreas Kling
e28809a996 Kernel: Add forward declaration header 2020-02-16 01:50:32 +01:00
Andreas Kling
1d611e4a11 Kernel: Reduce header dependencies of MemoryManager and Region 2020-02-16 01:33:41 +01:00
Andreas Kling
a356e48150 Kernel: Move all code into the Kernel namespace 2020-02-16 01:27:42 +01:00
Andreas Kling
1f55079488 Kernel: Remove SmapDisabler in sys$getgroups() 2020-02-16 00:30:00 +01:00
Andreas Kling
eb7b0c76a8 Kernel: Remove SmapDisabler in sys$setgroups() 2020-02-16 00:27:10 +01:00
Andreas Kling
0341ddc5eb Kernel: Rename RegisterDump => RegisterState 2020-02-16 00:15:37 +01:00
Andreas Kling
580a94bc44 Kernel+LibC: Merge sys$stat() and sys$lstat()
There is now only one sys$stat() instead of two separate syscalls.
2020-02-10 19:49:49 +01:00
Liav A
e559af2008 Kernel: Apply changes to use LibBareMetal definitions 2020-02-09 19:38:17 +01:00
Andreas Kling
7291370478 Kernel: Make File::truncate() take a u64
No point in taking a signed type here. We validate at the syscall layer
and then pass around a u64 from then on.
2020-02-08 12:07:04 +01:00
Andreas Kling
88ea152b24 Kernel: Merge unnecessary DiskDevice class into BlockDevice 2020-02-08 02:20:03 +01:00
Andreas Kling
2b0b7cc5a4 Net: Add a basic sys$shutdown() implementation
Calling shutdown prevents further reads and/or writes on a socket.
We should do a few more things based on the type of socket, but this
initial implementation just puts the basic mechanism in place.

Work towards #428.
2020-02-08 00:54:43 +01:00
Andreas Kling
f3a5985bb2 Kernel: Remove two bad FIXME's
We should absolutely *not* create a new thread in sys$exec().
There's also no sys$spawn() anymore.
2020-02-08 00:06:15 +01:00
Andreas Kling
d04fcccc90 Kernel: Truncate addresses stored by getsockname() and getpeername()
If there's not enough space in the output buffer for the whole sockaddr
we now simply truncate the address instead of returning EINVAL.

This patch also makes getpeername() actually return the peer address
rather than the local address.. :^)
2020-02-07 23:43:32 +01:00
Andreas Kling
dc18859695 Kernel: memset() all siginfo_t structs after creating them 2020-02-06 14:12:20 +01:00
Sergey Bugaev
1b866bbf42 Kernel: Fix sys$waitid(P_ALL, WNOHANG) return value
According to POSIX, waitid() should fill si_signo and si_pid members
with zeroes if there are no children that have already changed their
state by the time of the call. Let's just fill the whole structure
with zeroes to avoid leaking kernel memory.
2020-02-06 16:06:30 +03:00
Andreas Kling
75cb125e56 Kernel: Put sys$waitid() debug logging behind PROCESS_DEBUG 2020-02-05 19:14:56 +01:00
Sergey Bugaev
b3a24d732d Kernel+LibC: Add sys$waitid(), and make sys$waitpid() wrap it
sys$waitid() takes an explicit description of whether it's waiting for a single
process with the given PID, all of the children, a group, etc., and returns its
info as a siginfo_t.

It also doesn't automatically imply WEXITED, which clears up the confusion in
the kernel.
2020-02-05 18:14:37 +01:00
Andreas Kling
3879e5b9d4 Kernel: Start working on a syscall for logging performance events
This patch introduces sys$perf_event() with two event types:

- PERF_EVENT_MALLOC
- PERF_EVENT_FREE

After the first call to sys$perf_event(), a process will begin keeping
these events in a buffer. When the process dies, that buffer will be
written out to "perfcore" in the current directory unless that filename
is already taken.

This is probably not the best way to do this, but it's a start and will
make it possible to start doing memory allocation profiling. :^)
2020-02-02 20:26:27 +01:00
Andreas Kling
934b1d8a9b Kernel: Finalizer should not go back to sleep if there's more to do
Before putting itself back on the wait queue, the finalizer task will
now check if there's more work to do, and if so, do it first. :^)

This patch also puts a bunch of process/thread debug logging behind
PROCESS_DEBUG and THREAD_DEBUG since it was unbearable to debug this
stuff with all the spam.
2020-02-01 10:56:17 +01:00
Andreas Kling
6634da31d9 Kernel: Disallow empty ranges in munmap/mprotect/madvise 2020-01-30 21:55:49 +01:00
Andreas Kling
31d1c82621 Kernel: Reject non-user address ranges in mmap/munmap/mprotect/madvise
There's no valid reason to allow non-userspace address ranges in these
system calls.
2020-01-30 21:51:27 +01:00
Andreas Kling
afd2b5a53e Kernel: Copy "stack" and "mmap" bits when splitting a Region 2020-01-30 21:51:27 +01:00
Andreas Kling
c9e877a294 Kernel: Address validation helpers should take size_t, not ssize_t 2020-01-30 21:51:27 +01:00
Andreas Kling
c64904a483 Kernel: sys$readlink() should return the number of bytes written out 2020-01-27 21:50:51 +01:00
Andreas Kling
8b49804895 Kernel: sys$waitpid() only needs the waitee thread in the stopped case
If the waitee process is dead, we don't need to inspect the thread.

This fixes an issue with sys$waitpid() failing before reap() since
dead processes will have no remaining threads alive.
2020-01-27 21:21:48 +01:00
Andreas Kling
f4302b58fb Kernel: Remove SmapDisablers in sys$getsockname() and sys$getpeername()
Instead use the user/kernel copy helpers to only copy the minimum stuff
needed from to/from userspace.

Based on work started by Brian Gianforcaro.
2020-01-27 21:11:36 +01:00
Andreas Kling
5163c5cc63 Kernel: Expose the signal that stopped a thread via sys$waitpid() 2020-01-27 20:47:10 +01:00
Andreas Kling
638fe6f84a Kernel: Disable interrupts while looking into the thread table
There was a race window in a bunch of syscalls between calling
Thread::from_tid() and checking if the found thread was in the same
process as the calling thread.

If the found thread object was destroyed at that point, there was a
use-after-free that could be exploited by filling the kernel heap with
something that looked like a thread object.
2020-01-27 14:04:57 +01:00
Andreas Kling
c1f74bf327 Kernel: Never validate access to the kmalloc memory range
Memory validation is used to verify that user syscalls are allowed to
access a given memory range. Ring 0 threads never make syscalls, and
so will never end up in validation anyway.

The reason we were allowing kmalloc memory accesses is because kernel
thread stacks used to be allocated in kmalloc memory. Since that's no
longer the case, we can stop making exceptions for kmalloc in the
validation code.
2020-01-27 12:43:21 +01:00
Andreas Kling
137a45dff2 Kernel: read()/write() should respect timeouts when used on a sockets
Move timeout management to the ReadBlocker and WriteBlocker classes.
Also get rid of the specialized ReceiveBlocker since it no longer does
anything that ReadBlocker can't do.
2020-01-26 17:54:23 +01:00
Andreas Kling
b011857e4f Kernel: Make writev() work again
Vector::ensure_capacity() makes sure the underlying vector buffer can
contain all the data, but it doesn't update the Vector::size().

As a result, writev() would simply collect all the buffers to write,
and then do nothing.
2020-01-26 10:10:15 +01:00
Andreas Kling
b93f6b07c2 Kernel: Make sched_setparam() and sched_getparam() operate on threads
Instead of operating on "some random thread in PID", these now operate
on the thread with a specific TID. This matches other systems better.
2020-01-26 09:58:58 +01:00
Andreas Kling
f4e7aecec2 Kernel: Preserve CoW bits when splitting VM regions 2020-01-25 17:57:10 +01:00
Andreas Kling
7cc0b18f65 Kernel: Only open a single description for stdio in non-fork processes 2020-01-25 17:05:02 +01:00
Andreas Kling
81ddd2dae0 Kernel: Make sys$setsid() clear the calling process's controlling TTY 2020-01-25 14:53:48 +01:00
Andreas Kling
2bf11b8348 Kernel: Allow empty strings in validate_and_copy_string_from_user()
Sergey pointed out that we should just allow empty strings everywhere.
2020-01-25 14:14:11 +01:00
Andreas Kling
69de90a625 Kernel: Simplify Process constructor
Move all the fork-specific inheritance logic to sys$fork(), and all the
stuff for setting up stdio for non-fork ring 3 processes moves to
Process::create_user_process().

Also: we were setting up the PGID, SID and umask twice. Also the code
for copying the open file descriptors was overly complicated. Now it's
just a simple Vector copy assignment. :^)
2020-01-25 14:13:47 +01:00
Andreas Kling
0f5221568b Kernel: sys$execve() should not EFAULT for empty argument strings
It's okay to exec { "/bin/echo", "" } and it should not EFAULT.
2020-01-25 12:21:30 +01:00
Andreas Kling
30ad7953ca Kernel: Rename UnveilState to VeilState 2020-01-21 19:28:59 +01:00
Andreas Kling
f38cfb3562 Kernel: Tidy up debug logging a little bit
When using dbg() in the kernel, the output is automatically prefixed
with [Process(PID:TID)]. This makes it a lot easier to understand which
thread is generating the output.

This patch also cleans up some common logging messages and removes the
now-unnecessary "dbg() << *current << ..." pattern.
2020-01-21 16:16:20 +01:00
Andreas Kling
6081c76515 Kernel: Make O_RDONLY non-zero
Sergey suggested that having a non-zero O_RDONLY would make some things
less confusing, and it seems like he's right about that.

We can now easily check read/write permissions separately instead of
dancing around with the bits.

This patch also fixes unveil() validation for O_RDWR which previously
forgot to check for "r" permission.
2020-01-21 13:27:08 +01:00
Andreas Kling
1b3cac2f42 Kernel: Don't forget about unveiled paths with zero permissions
We need to keep these around, otherwise the calling process can remove
and re-add a path to increase its permissions.
2020-01-21 11:42:28 +01:00
Andreas Kling
22cfb1f3bd Kernel: Clear unveiled state on exec() 2020-01-21 10:46:31 +01:00
Andreas Kling
cf48c20170 Kernel: Forked children should inherit unveil()'ed paths 2020-01-21 09:44:32 +01:00
Andreas Kling
0569123ad7 Kernel: Add a basic implementation of unveil()
This syscall is a complement to pledge() and adds the same sort of
incremental relinquishing of capabilities for filesystem access.

The first call to unveil() will "drop a veil" on the process, and from
now on, only unveiled parts of the filesystem are visible to it.

Each call to unveil() specifies a path to either a directory or a file
along with permissions for that path. The permissions are a combination
of the following:

- r: Read access (like the "rpath" promise)
- w: Write access (like the "wpath" promise)
- x: Execute access
- c: Create/remove access (like the "cpath" promise)

Attempts to open a path that has not been unveiled with fail with
ENOENT. If the unveiled path lacks sufficient permissions, it will fail
with EACCES.

Like pledge(), subsequent calls to unveil() with the same path can only
remove permissions, not add them.

Once you call unveil(nullptr, nullptr), the veil is locked, and it's no
longer possible to unveil any more paths for the process, ever.

This concept comes from OpenBSD, and their implementation does various
things differently, I'm sure. This is just a first implementation for
SerenityOS, and we'll keep improving on it as we go. :^)
2020-01-20 22:12:04 +01:00
Andreas Kling
e901a3695a Kernel: Use the templated copy_to/from_user() in more places
These ensure that the "to" and "from" pointers have the same type,
and also that we copy the correct number of bytes.
2020-01-20 13:41:21 +01:00
Sergey Bugaev
d5426fcc88 Kernel: Misc tweaks 2020-01-20 13:26:06 +01:00
Sergey Bugaev
9bc6157998 Kernel: Return new fd from sys$fcntl(F_DUPFD)
This fixes GNU Bash getting confused after performing a redirection.
2020-01-20 13:26:06 +01:00
Andreas Kling
4b7a89911c Kernel: Remove some unnecessary casts to uintptr_t
VirtualAddress is constructible from uintptr_t and const void*.
PhysicalAddress is constructible from uintptr_t but not const void*.
2020-01-20 13:13:03 +01:00
Andreas Kling
a246e9cd7e Use uintptr_t instead of u32 when storing pointers as integers
uintptr_t is 32-bit or 64-bit depending on the target platform.
This will help us write pointer size agnostic code so that when the day
comes that we want to do a 64-bit port, we'll be in better shape.
2020-01-20 13:13:03 +01:00
Andreas Kling
8d9dd1b04b Kernel: Add a 1-deep cache to Process::region_from_range()
This simple cache gets hit over 70% of the time on "g++ Process.cpp"
and shaves ~3% off the runtime.
2020-01-19 16:44:37 +01:00
Andreas Kling
ae0c435e68 Kernel: Add a Process::add_region() helper
This is a private helper for adding a Region to Process::m_regions.
It's just for convenience since it's a bit cumbersome to do this.
2020-01-19 16:26:42 +01:00
Andreas Kling
1dc9fa9506 Kernel: Simplify PageDirectory swapping in sys$execve()
Swap out both the PageDirectory and the Region list at the same time,
instead of doing the Region list slightly later.
2020-01-19 16:05:42 +01:00
Andreas Kling
6eab7b398d Kernel: Make ProcessPagingScope restore CR3 properly
Instead of restoring CR3 to the current process's paging scope when a
ProcessPagingScope goes out of scope, we now restore exactly whatever
the CR3 value was when we created the ProcessPagingScope.

This fixes breakage in situations where a process ends up with nested
ProcessPagingScopes. This was making profiling very fragile, and with
this change it's now possible to profile g++! :^)
2020-01-19 13:44:53 +01:00
Andreas Kling
f7b394e9a1 Kernel: Assert that copy_to/from_user() are called with user addresses
This will panic the kernel immediately if these functions are misused
so we can catch it and fix the misuse.

This patch fixes a couple of misuses:

    - create_signal_trampolines() writes to a user-accessible page
      above the 3GB address mark. We should really get rid of this
      page but that's a whole other thing.

    - CoW faults need to use copy_from_user rather than copy_to_user
      since it's the *source* pointer that points to user memory.

    - Inode faults need to use memcpy rather than copy_to_user since
      we're copying a kernel stack buffer into a quickmapped page.

This should make the copy_to/from_user() functions slightly less useful
for exploitation. Before this, they were essentially just glorified
memcpy() with SMAP disabled. :^)
2020-01-19 09:18:55 +01:00
Andreas Kling
5ce9382e98 Kernel: Only require "stdio" pledge for sending signals to self
This should match what OpenBSD does. Sending a signal to yourself seems
basically harmless.
2020-01-19 08:50:55 +01:00
Sergey Bugaev
3e1ed38d4b Kernel: Do not return ENOENT for unresolved symbols
ENOENT means "no such file or directory", not "no such symbol". Return EINVAL
instead, as we already do in other cases.
2020-01-18 23:51:22 +01:00
Sergey Bugaev
d0d13e2bf5 Kernel: Move setting file flags and r/w mode to VFS::open()
Previously, VFS::open() would only use the passed flags for permission checking
purposes, and Process::sys$open() would set them on the created FileDescription
explicitly. Now, they should be set by VFS::open() on any files being opened,
including files that the kernel opens internally.

This also lets us get rid of the explicit check for whether or not the returned
FileDescription was a preopen fd, and in fact, fixes a bug where a read-only
preopen fd without any other flags would be considered freshly opened (due to
O_RDONLY being indistinguishable from 0) and granted a new set of flags.
2020-01-18 23:51:22 +01:00
Sergey Bugaev
544b8286da Kernel: Do not open stdio fds for kernel processes
Kernel processes just do not need them.

This also avoids touching the file (sub)system early in the boot process when
initializing the colonel process.
2020-01-18 23:51:22 +01:00
Sergey Bugaev
6466c3d750 Kernel: Pass correct permission flags when opening files
Right now, permission flags passed to VFS::open() are effectively ignored, but
that is going to change.

* O_RDONLY is 0, but it's still nicer to pass it explicitly
* POSIX says that binding a Unix socket to a symlink shall fail with EADDRINUSE
2020-01-18 23:51:22 +01:00
Andreas Kling
862b3ccb4e Kernel: Enforce W^X between sys$mmap() and sys$execve()
It's now an error to sys$mmap() a file as writable if it's currently
mapped executable by anyone else.

It's also an error to sys$execve() a file that's currently mapped
writable by anyone else.

This fixes a race condition vulnerability where one program could make
modifications to an executable while another process was in the kernel,
in the middle of exec'ing the same executable.

Test: Kernel/elf-execve-mmap-race.cpp
2020-01-18 23:40:12 +01:00
Andreas Kling
4e6fe3c14b Kernel: Symbolicate kernel EIP on process crash
Process::crash() was assuming that EIP was always inside the ELF binary
of the program, but it could also be in the kernel.
2020-01-18 14:38:39 +01:00
Andreas Kling
9c9fe62a4b Kernel: Validate the requested range in allocate_region_with_vmobject() 2020-01-18 14:37:22 +01:00
Andreas Kling
aa63de53bd Kernel: Use get_syscall_path_argument() in sys$execve()
Paths passed to sys$execve() should certainly be subject to all the
usual path validation checks.
2020-01-18 11:43:28 +01:00
Andreas Kling
b65572b3fe Kernel: Disallow mmap names longer than PATH_MAX 2020-01-18 11:34:53 +01:00
Andreas Kling
94ca55cefd Meta: Add license header to source files
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.

For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.

Going forward, all new source files should include a license header.
2020-01-18 09:45:54 +01:00
Andreas Kling
19c31d1617 Kernel: Always dump kernel regions when dumping process regions 2020-01-18 08:57:18 +01:00
Sergey Bugaev
064cd2278c Kernel: Remove the use of FileSystemPath in sys$realpath()
Now that VFS::resolve_path() canonicalizes paths automatically, we don't need to
do that here anymore.
2020-01-17 21:49:58 +01:00
Sergey Bugaev
8642a7046c Kernel: Let inodes provide pre-open file descriptions
Some magical inodes, such as /proc/pid/fd/fileno, are going to want to open() to
a custom FileDescription, so add a hook for that.
2020-01-17 21:49:58 +01:00
Sergey Bugaev
e0013a6b4c Kernel+LibC: Unify sys$open() and sys$openat()
The syscall is now called sys$open(), but it behaves like the old sys$openat().
In userspace, open_with_path_length() is made a wrapper over openat_with_path_length().
2020-01-17 21:49:58 +01:00
Andreas Kling
4d4d5e1c07 Kernel: Drop futex queues/state on exec()
This state is not meaningful to the new process image so just drop it.
2020-01-17 16:08:00 +01:00
Andreas Kling
26a31c7efb Kernel: Add "accept" pledge promise for accepting incoming connections
This patch adds a new "accept" promise that allows you to call accept()
on an already listening socket. This lets programs set up a socket for
for listening and then dropping "inet" and/or "unix" so that only
incoming (and existing) connections are allowed from that point on.
No new outgoing connections or listening server sockets can be created.

In addition to accept() it also allows getsockopt() with SOL_SOCKET
and SO_PEERCRED, which is used to find the PID/UID/GID of the socket
peer. This is used by our IPC library when creating shared buffers that
should only be accessible to a specific peer process.

This allows us to drop "unix" in WindowServer and LookupServer. :^)

It also makes the debugging/introspection RPC sockets in CEventLoop
based programs work again.
2020-01-17 11:19:06 +01:00
Andreas Kling
c6e552ac8f Kernel+LibELF: Don't blindly trust ELF symbol offsets in symbolication
It was possible to craft a custom ELF executable that when symbolicated
would cause the kernel to read from user-controlled addresses anywhere
in memory. You could then fetch this memory via /proc/PID/stack

We fix this by making ELFImage hand out StringView rather than raw
const char* for symbol names. In case a symbol offset is outside the
ELF image, you get a null StringView. :^)

Test: Kernel/elf-symbolication-kernel-read-exploit.cpp
2020-01-16 22:11:31 +01:00
Andreas Kling
d79de38bd2 Kernel: Don't allow userspace to sys$open() literal symlinks
The O_NOFOLLOW_NOERROR is an internal kernel mechanism used for the
implementation of sys$readlink() and sys$lstat().

There is no reason to allow userspace to open symlinks directly.
2020-01-15 21:19:26 +01:00
Andreas Kling
e23536d682 Kernel: Use Vector::unstable_remove() in a couple of places 2020-01-15 19:26:41 +01:00
Liav A
d2b41010c5 Kernel: Change Region allocation helpers
We now can create a cacheable Region, so when map() is called, if a
Region is cacheable then all the virtual memory space being allocated
to it will be marked as not cache disabled.

In addition to that, OS components can create a Region that will be
mapped to a specific physical address by using the appropriate helper
method.
2020-01-14 15:38:58 +01:00
Andreas Kling
65cb406327 Kernel: Allow unlocking a held Lock with interrupts disabled
This is needed to eliminate a race in Thread::wait_on() where we'd
otherwise have to wait until after unlocking the process lock before
we can disable interrupts.
2020-01-13 18:56:46 +01:00
Andrew Kaster
7a7e7c82b5 Kernel: Tighten up exec/do_exec and allow for PT_INTERP iterpreters
This patch changes how exec() figures out which program image to
actually load. Previously, we opened the path to our main executable in
find_shebang_interpreter_for_executable, read the first page (or less,
if the file was smaller) and then decided whether to recurse with the
interpreter instead. We then then re-opened the main executable in
do_exec.

However, since we now want to parse the ELF header and Program Headers
of an elf image before even doing any memory region work, we can change
the way this whole process works. We open the file and read (up to) the
first page in exec() itself, then pass just the page and the amount read
to find_shebang_interpreter_for_executable. Since we now have that page
and the FileDescription for the main executable handy, we can do a few
things. First, validate the ELF header and ELF program headers for any
shenanigans. ELF32 Little Endian i386 only, please. Second, we can grab
the PT_INTERP interpreter from any ET_DYN files, and open that guy right
away if it exists. Finally, we can pass the main executable's and
optionally the PT_INTERP interpreter's file descriptions down to do_exec
and not have to feel guilty about opening the file twice.

In do_exec, we now have a choice. Are we going to load the main
executable, or the interpreter? We could load both, but it'll be way
easier for the inital pass on the RTLD if we only load the interpreter.
Then it can load the main executable itself like any old shared object,
just, the one with main in it :). Later on we can load both of them
into memory and the RTLD can relocate itself before trying to do
anything. The way it's written now the RTLD will get dibs on its
requested virtual addresses being the actual virtual addresses.
2020-01-13 13:03:30 +01:00
Brian Gianforcaro
4cee441279 Kernel: Combine validate and copy of user mode pointers (#1069)
Right now there is a significant amount of boiler plate code required
to validate user mode parameters in syscalls. In an attempt to reduce
this a bit, introduce validate_read_and_copy_typed which combines the
usermode address check and does the copy internally if the validation
passes. This cleans up a little bit of code from a significant amount
of syscalls.
2020-01-13 11:19:17 +01:00
Brian Gianforcaro
9cac205d67 Kernel: Fix SMAP in setkeymap syscall
It looks like setkeymap was missed when
the SMAP functionality was introduced.

Disable SMAP only in the scope where we
actually read the usermode addresses.
2020-01-13 11:17:10 +01:00
Brian Gianforcaro
02704a73e9 Kernel: Use the templated copy_from_user where possible
Now that the templated version of copy_from_user exists
their is normally no reason to use the version which
takes the number of bytes to copy. Move to the templated
version where possible.
2020-01-13 11:07:39 +01:00
Andreas Kling
20b2bfcafd Kernel: Fix SMAP violation in sys$getrandom() 2020-01-12 20:10:53 +01:00
Sergey Bugaev
33c0dc08a7 Kernel: Don't forget to copy & destroy root_directory_for_procfs
Also, rename it to root_directory_relative_to_global_root.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
dd54d13d8d Kernel+LibC: Allow passing mount flags to chroot()
Since a chroot is in many ways similar to a separate root mount, we can also
apply mount flags to it as if it was an actual mount. These flags will apply
whenever the chrooted process accesses its root directory, but not when other
processes access this same directory for the outside. Since it's common to
chdir("/") immediately after chrooting (so that files accessed through the
current directory inherit the same mount flags), this effectively allows one to
apply additional limitations to a process confined inside a chroot.

To this effect, sys$chroot() gains a mount_flags argument (exposed as
chroot_with_mount_flags() in userspace) which can be set to all the same values
as the flags argument for sys$mount(), and additionally to -1 to keep the flags
set for that file system. Note that passing 0 as mount_flags will unset any
flags that may have been set for the file system, not keep them.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
93ff911473 Kernel: Properly propagate bind mount flags
Previously, when performing a bind mount flags other than MS_BIND were ignored.
Now, they're properly propagated the same way a for any other mount.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
b620ed25ab Kernel: Simplify Ext2FS mount code path
Instead of looking up device metadata and then looking up a device by that
metadata explicitly, just use VFS::open(). This also means that attempting to
mount a device residing on a MS_NODEV file system will properly fail.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
35b0f10f20 Kernel: Don't dump backtrace on successful exits
This was getting really annoying.
2020-01-12 20:02:11 +01:00
Andreas Kling
d1839ae0c9 Kernel: Clearing promises with pledge("") should fail
Thanks Sergey for catching this brain-fart. :^)
2020-01-12 12:16:17 +01:00
Andreas Kling
114a770c6f Kernel: Reduce pledge requirement for recvfrom()+sendto() to "stdio"
Since these only operate on already-open sockets, we should treat them
the same as we do read() and write() by putting them into "stdio".
2020-01-12 11:52:37 +01:00
Andreas Kling
955034e86e Kernel: Remove manual STAC/CLAC in create_thread() 2020-01-12 11:51:31 +01:00
Andreas Kling
a6cef2408c Kernel: Add sigreturn() to "stdio" with all the other signal syscalls 2020-01-12 10:32:56 +01:00
Andreas Kling
7b53699e6f Kernel: Require the "thread" pledge promise for futex() 2020-01-12 10:31:21 +01:00
Andreas Kling
c32d65ae9f Kernel: Put some more syscalls in the "stdio" bucket
yield() and get_kernel_info_page() seem like decent fits for "stdio".
2020-01-12 10:31:21 +01:00
Andreas Kling
ca609ce5a3 Kernel: Put fcntl() debug spam behind DEBUG_IO 2020-01-12 10:01:22 +01:00
Andreas Kling
017b34e1ad Kernel: Add "video" pledge for accessing framebuffer devices
WindowServer becomes the only user.
2020-01-12 02:18:30 +01:00
Andreas Kling
f187374c1b Kernel: fork()ed children should inherit pledge promises :^)
Update various places that now need wider promises as they are not
reset by fork() anymore.
2020-01-11 23:28:41 +01:00
Andreas Kling
409a4f7756 ping: Use pledge() 2020-01-11 20:48:43 +01:00
Sergey Bugaev
0cb0f54783 Kernel: Implement bind mounts
You can now bind-mount files and directories. This essentially exposes an
existing part of the file system in another place, and can be used as an
alternative to symlinks or hardlinks.

Here's an example of doing this:

    # mkdir /tmp/foo
    # mount /home/anon/myfile.txt /tmp/foo -o bind
    # cat /tmp/foo
    This is anon's file.
2020-01-11 18:57:53 +01:00
Sergey Bugaev
61c1106d9f Kernel+LibC: Implement a few mount flags
We now support these mount flags:
* MS_NODEV: disallow opening any devices from this file system
* MS_NOEXEC: disallow executing any executables from this file system
* MS_NOSUID: ignore set-user-id bits on executables from this file system

The fourth flag, MS_BIND, is defined, but currently ignored.
2020-01-11 18:57:53 +01:00
Sergey Bugaev
2fcbb846fb Kernel+LibC: Add O_EXEC, move exec permission checking to VFS::open()
O_EXEC is mentioned by POSIX, so let's have it. Currently, it is only used
inside the kernel to ensure the process has the right permissions when opening
an executable.
2020-01-11 18:57:53 +01:00
Sergey Bugaev
4566c2d811 Kernel+LibC: Add support for mount flags
At the moment, the actual flags are ignored, but we correctly propagate them all
the way from the original mount() syscall to each custody that resides on the
mounted FS.
2020-01-11 18:57:53 +01:00
Andreas Kling
83f59419cd Kernel: Oops, recvfrom() is not quite ready for SMAP protections yet 2020-01-11 13:03:44 +01:00
Andreas Kling
24c736b0e7 Kernel: Use the Syscall string and buffer types more
While I was updating syscalls to stop passing null-terminated strings,
I added some helpful struct types:

    - StringArgument { const char*; size_t; }
    - ImmutableBuffer<Data, Size> { const Data*; Size; }
    - MutableBuffer<Data, Size> { Data*; Size; }

The Process class has some convenience functions for validating and
optionally extracting the contents from these structs:

    - get_syscall_path_argument(StringArgument)
    - validate_and_copy_string_from_user(StringArgument)
    - validate(ImmutableBuffer)
    - validate(MutableBuffer)

There's still so much code around this and I'm wondering if we should
generate most of it instead. Possible nice little project.
2020-01-11 12:47:47 +01:00
Andreas Kling
1434f30f92 Kernel: Remove SmapDisabler in bind() 2020-01-11 12:07:45 +01:00
Andreas Kling
2d7ae42f75 Kernel: Remove SmapDisabler in clock_nanosleep() 2020-01-11 11:51:03 +01:00
Andreas Kling
0ca6d6c8d2 Kernel: Remove validate_read_str() as nothing uses it anymore :^) 2020-01-11 10:57:50 +01:00
Andreas Kling
f5092b1c7e Kernel: Pass a parameter struct to mount()
This was the last remaining syscall that took a null-terminated string
and figured out how long it was by walking it in kernelspace *shudder*.
2020-01-11 10:56:02 +01:00
Andreas Kling
e380142853 Kernel: Pass a parameter struct to rename() 2020-01-11 10:36:54 +01:00
Andreas Kling
46830a0c32 Kernel: Pass a parameter struct to symlink() 2020-01-11 10:31:33 +01:00
Andreas Kling
c97bfbd609 Kernel: Pass a parameter struct to mknod() 2020-01-11 10:27:37 +01:00
Andreas Kling
6536a80aa9 Kernel: Pass a parameter struct to chown() 2020-01-11 10:17:44 +01:00
Andreas Kling
29b3d95004 Kernel: Expose a process's filesystem root as a /proc/PID/root symlink
In order to preserve the absolute path of the process root, we save the
custody used by chroot() before stripping it to become the new "/".
There's probably a better way to do this.
2020-01-10 23:48:44 +01:00
Andreas Kling
ddd0b19281 Kernel: Add a basic chroot() syscall :^)
The chroot() syscall now allows the superuser to isolate a process into
a specific subtree of the filesystem. This is not strictly permanent,
as it is also possible for a superuser to break *out* of a chroot, but
it is a useful mechanism for isolating unprivileged processes.

The VFS now uses the current process's root_directory() as the root for
path resolution purposes. The root directory is stored as an uncached
Custody in the Process object.
2020-01-10 23:14:04 +01:00
Andreas Kling
485443bfca Kernel: Pass characters+length to link() 2020-01-10 21:26:47 +01:00
Andreas Kling
416c7ac2b5 Kernel: Rename Syscall::SyscallString => Syscall::StringArgument 2020-01-10 20:16:18 +01:00
Andreas Kling
0695ff8282 Kernel: Pass characters+length to readlink()
Note that I'm developing some helper types in the Syscall namespace as
I go here. Once I settle on some nice types, I will convert all the
other syscalls to use them as well.
2020-01-10 20:13:23 +01:00
Andreas Kling
8c5cd97b45 Kernel: Fix kernel null deref on process crash during join_thread()
The join_thread() syscall is not supposed to be interruptible by
signals, but it was. And since the process death mechanism piggybacked
on signal interrupts, it was possible to interrupt a pthread_join() by
killing the process that was doing it, leading to confusing due to some
assumptions being made by Thread::finalize() for threads that have a
pending joiner.

This patch fixes the issue by making "interrupted by death" a distinct
block result separate from "interrupted by signal". Then we handle that
state in join_thread() and tidy things up so that thread finalization
doesn't get confused by the pending joiner being gone.

Test: Tests/Kernel/null-deref-crash-during-pthread_join.cpp
2020-01-10 19:23:45 +01:00
Andreas Kling
de69f84868 Kernel: Remove SmapDisablers in fchmod() and fchown() 2020-01-10 14:20:14 +01:00
Andreas Kling
952bb95baa Kernel: Enable SMAP protection during the execve() syscall
The userspace execve() wrapper now measures all the strings and puts
them in a neat and tidy structure on the stack.

This way we know exactly how much to copy in the kernel, and we don't
have to use the SMAP-violating validate_read_str(). :^)
2020-01-10 12:20:36 +01:00
Andreas Kling
197e73ee31 Kernel+LibELF: Enable SMAP protection during non-syscall exec()
When loading a new executable, we now map the ELF image in kernel-only
memory and parse it there. Then we use copy_to_user() when initializing
writable regions with data from the executable.

Note that the exec() syscall still disables SMAP protection and will
require additional work. This patch only affects kernel-originated
process spawns.
2020-01-10 10:57:06 +01:00
Andreas Kling
ff16298b44 Kernel: Removed an unused global variable 2020-01-09 18:02:37 +01:00
Andreas Kling
17ef5bc0ac Kernel: Rename {ss,esp}_if_crossRing to userspace_{ss,esp}
These were always so awkwardly named.
2020-01-09 18:02:01 +01:00
Andreas Kling
4b4d369c5d Kernel: Take path+length in the unlink() and umount() syscalls 2020-01-09 16:23:41 +01:00
Andrew Kaster
e594724b01 Kernel: mmap(..., MAP_PRIVATE, fd, offset) is not supported
Make mmap return -ENOTSUP in this case to make sure users don't get
confused and think they're using a private mapping when it's actually
shared. It's currenlty not possible to open a file and mmap it
MAP_PRIVATE, and change the perms of the private mapping to ones that
don't match the permissions of the underlying file.
2020-01-09 09:29:36 +01:00
Andreas Kling
e1d4b19461 Kernel: open() and openat() should ignore non-permission bits in mode 2020-01-08 15:21:06 +01:00
Andreas Kling
532f240f24 Kernel: Remove unused syscall for setting the signal mask 2020-01-08 15:21:06 +01:00
Andreas Kling
200459d644 Kernel: Fix SMAP violation in join_thread() 2020-01-08 15:21:05 +01:00
Andreas Kling
50056d1d84 Kernel: mmap() should fail with ENODEV for directories 2020-01-08 12:47:37 +01:00
Andreas Kling
fe9680f0a4 Kernel: Validate PROT_READ and PROT_WRITE against underlying file
This patch fixes some issues with the mmap() and mprotect() syscalls,
neither of whom were checking the permission bits of the underlying
files when mapping an inode MAP_SHARED.

This made it possible to subvert execution of any running program
by simply memory-mapping its executable and replacing some of the code.

Test: Kernel/mmap-write-into-running-programs-executable-file.cpp
2020-01-07 19:32:32 +01:00
Andreas Kling
5387a19268 Kernel: Make Process::file_description() vend a RefPtr<FileDescription>
This encourages callers to strongly reference file descriptions while
working with them.

This fixes a use-after-free issue where one thread would close() an
open fd while another thread was blocked on it becoming readable.

Test: Kernel/uaf-close-while-blocked-in-read.cpp
2020-01-07 15:53:42 +01:00
Andreas Kling
6a4b376021 Kernel: Validate ftruncate(fd, length) syscall inputs
- EINVAL if 'length' is negative
- EBADF if 'fd' is not open for writing
2020-01-07 14:48:43 +01:00
Andreas Kling
78a63930cc Kernel+LibELF: Validate PT_LOAD and PT_TLS offsets before memcpy()'ing
Before this, you could make the kernel copy memory from anywhere by
setting up an ELF executable with a program header specifying file
offsets outside the file.

Since ELFImage didn't even know how large it was, we had no clue that
we were copying things from outside the ELF.

Fix this by adding a size field to ELFImage and validating program
header ranges before memcpy()'ing to them.

The ELF code is definitely going to need more validation and checking.
2020-01-06 21:04:57 +01:00
Andreas Kling
8088fa0556 Kernel: Process::send_signal() should prefer main thread
The main/first thread in a process always has the same TID as the PID.
2020-01-06 14:37:26 +01:00
Andreas Kling
a803312eb4 Kernel: Send SIGCHLD to the thread with same PID as my PPID
Instead of delivering SIGCHLD to "any thread" in the process with PPID,
send it to the thread with the same TID as my PPID.
2020-01-06 14:35:42 +01:00
Andreas Kling
cd42ccd686 Kernel: The waitpid() syscall was not storing to "wstatus" in all cases 2020-01-06 14:34:04 +01:00
Andreas Kling
47cc3e68c6 Kernel: Remove bogus kernel image access validation checks
This code had been misinterpreting the Multiboot ELF section headers
since the beginning. Furthermore QEMU wasn't even passing us any
headers at all, so this wasn't checking anything.
2020-01-06 13:27:14 +01:00
Andreas Kling
53bda09d15 Kernel: Make utime() take path+length, remove SmapDisabler 2020-01-06 12:23:30 +01:00
Andreas Kling
1226fec19e Kernel: Remove SmapDisablers in stat() and lstat() 2020-01-06 12:13:48 +01:00
Andreas Kling
a47f0c93de Kernel: Pass name+length to mmap() and remove SmapDisabler 2020-01-06 12:04:55 +01:00
Andreas Kling
33025a8049 Kernel: Pass name+length to set_mmap_name() and remove SmapDisabler 2020-01-06 11:56:59 +01:00
Andreas Kling
6af8392cf8 Kernel: Remove SmapDisabler in futex() 2020-01-06 11:44:15 +01:00
Andreas Kling
a30fb5c5c1 Kernel: SMAP fixes for module_load() and module_unload()
Remove SmapDisabler in module_load() + use get_syscall_path_argument().
Also fix a SMAP violation in module_unload().
2020-01-06 11:36:16 +01:00
Andreas Kling
7c916b9fe9 Kernel: Make realpath() take path+length, get rid of SmapDisabler 2020-01-06 11:32:25 +01:00
Andreas Kling
d6b06fd5a3 Kernel: Make watch_file() syscall take path length as a size_t
We don't care to handle negative path lengths anyway.
2020-01-06 11:15:49 +01:00
Andreas Kling
cf7df95ffe Kernel: Use get_syscall_path_argument() for syscalls that take paths 2020-01-06 11:15:49 +01:00
Andreas Kling
0df72d4712 Kernel: Pass path+length to mkdir(), rmdir() and chmod() 2020-01-06 11:15:49 +01:00
Andreas Kling
642137f014 Kernel: Make access() take path+length
Also, let's return EFAULT for nullptr at the LibC layer. We can't do
all bad addresses this way, but we can at least do null. :^)
2020-01-06 11:15:48 +01:00
Andreas Kling
2c3a6c37ac Kernel: Paper over SMAP violations in clock_{gettime,nanosleep}()
Just put some SmapDisablers here to unbreak the nesalizer port.
2020-01-05 23:20:33 +01:00
Andreas Kling
c5890afc8b Kernel: Make chdir() take path+length 2020-01-05 22:06:25 +01:00
Andreas Kling
f231e9ea76 Kernel: Pass path+length to the stat() and lstat() syscalls
It's not pleasant having to deal with null-terminated strings as input
to syscalls, so let's get rid of them one by one.
2020-01-05 22:02:54 +01:00
Andreas Kling
152a83fac5 Kernel: Remove SmapDisabler in watch_file() 2020-01-05 21:55:20 +01:00
Andreas Kling
80cbb72f2f Kernel: Remove SmapDisablers in open(), openat() and set_thread_name()
This patch introduces a helpful copy_string_from_user() function
that takes a bounded null-terminated string from userspace memory
and copies it into a String object.
2020-01-05 21:51:06 +01:00
Andreas Kling
c4a1ea34c2 Kernel: Fix SMAP violation in writev() syscall 2020-01-05 19:20:08 +01:00
Andreas Kling
9eef39d68a Kernel: Start implementing x86 SMAP support
Supervisor Mode Access Prevention (SMAP) is an x86 CPU feature that
prevents the kernel from accessing userspace memory. With SMAP enabled,
trying to read/write a userspace memory address while in the kernel
will now generate a page fault.

Since it's sometimes necessary to read/write userspace memory, there
are two new instructions that quickly switch the protection on/off:
STAC (disables protection) and CLAC (enables protection.)
These are exposed in kernel code via the stac() and clac() helpers.

There's also a SmapDisabler RAII object that can be used to ensure
that you don't forget to re-enable protection before returning to
userspace code.

THis patch also adds copy_to_user(), copy_from_user() and memset_user()
which are the "correct" way of doing things. These functions allow us
to briefly disable protection for a specific purpose, and then turn it
back on immediately after it's done. Going forward all kernel code
should be moved to using these and all uses of SmapDisabler are to be
considered FIXME's.

Note that we're not realizing the full potential of this feature since
I've used SmapDisabler quite liberally in this initial bring-up patch.
2020-01-05 18:14:51 +01:00
Andreas Kling
1525c11928 Kernel: Add missing iovec base validation for writev() syscall
We were forgetting to validate the base pointers of iovecs passed into
the writev() syscall.

Thanks to braindead for finding this bug! :^)
2020-01-05 10:38:02 +01:00
Andreas Kling
c89fe8a6a3 Kernel: Fix bad TOCTOU pattern in syscalls that take a parameter struct
Our syscall calling convention only allows passing up to 3 arguments in
registers. For syscalls that take more arguments, we bake them into a
struct and pass a pointer to that struct instead.

When doing pointer validation, this is what we would do:

    1) Validate the "params" struct
    2) Validate "params->some_pointer"
    3) ... other stuff ...
    4) Use "params->some_pointer"

Since the parameter struct is stored in userspace, it can be modified
by userspace after validation has completed.

This was a recurring pattern in many syscalls that was further hidden
by me using structured binding declarations to give convenient local
names to things in the parameter struct:

    auto& [some_pointer, ...] = *params;
    memcpy(some_pointer, ...);

This devilishly makes "some_pointer" look like a local variable but
it's actually more like an alias for "params->some_pointer" and will
expand to a dereference when accessed!

This patch fixes the issues by explicitly copying out each member from
the parameter structs before validating them, and then never using
the "param" pointers beyond that.

Thanks to braindead for finding this bug! :^)
2020-01-05 10:37:57 +01:00
Andreas Kling
3a27790fa7 Kernel: Use Thread::from_tid() in more places 2020-01-04 18:56:04 +01:00
Andreas Kling
95ba0d5a02 Kernel: Remove unused "putch" syscall 2020-01-04 16:00:25 +01:00
Andreas Kling
5abc30e057 Kernel: Allow setgroups() to drop all groups with nullptr
Previously we'd EFAULT for setgroups(0, nullptr), but we can just as
well tolerate it if someone wants to drop groups without a pointer.
2020-01-04 13:47:54 +01:00
Andreas Kling
d84299c7be Kernel: Allow fchmod() and fchown() on pre-bind() local sockets
In order to ensure a specific owner and mode when the local socket
filesystem endpoint is instantiated, we need to be able to call
fchmod() and fchown() on a socket fd between socket() and bind().

This is because until we call bind(), there is no filesystem inode
for the socket yet.
2020-01-03 20:14:56 +01:00
Andreas Kling
1dc64ec064 Kernel: Remove unnecessary logic in kill() and killpg() syscalls
As Sergey pointed out, do_killpg() already interprets PID 0 as the
PGID of the calling process.
2020-01-03 12:58:59 +01:00
Andreas Kling
9026598999 Kernel: Add a more expressive API for getting random bytes
We now have these API's in <Kernel/Random.h>:

    - get_fast_random_bytes(u8* buffer, size_t buffer_size)
    - get_good_random_bytes(u8* buffer, size_t buffer_size)
    - get_fast_random<T>()
    - get_good_random<T>()

Internally they both use x86 RDRAND if available, otherwise they fall
back to the same LCG we had in RandomDevice all along.

The main purpose of this patch is to give kernel code a way to better
express its needs for random data.

Randomness is something that will require a lot more work, but this is
hopefully a step in the right direction.
2020-01-03 12:43:07 +01:00
Andreas Kling
24cc67d199 Kernel: Remove read_tsc() syscall
Since nothing is using this, let's just remove it. That's one less
thing to worry about.
2020-01-03 09:27:09 +01:00
Andreas Kling
8cc5fa5598 Kernel: Unbreak module loading (broke with NX bit changes)
Modules are now mapped fully RWX. This can definitely be improved,
but at least it unbreaks the feature for now.
2020-01-03 03:44:55 +01:00
Andreas Kling
0a1865ebc6 Kernel: read() and write() should fail with EBADF for wrong mode fd's
It was previously possible to write to read-only file descriptors,
and read from write-only file descriptors.

All FileDescription objects now start out non-readable + non-writable,
and whoever is creating them has to "manually" enable reading/writing
by calling set_readable() and/or set_writable() on them.
2020-01-03 03:29:59 +01:00
Andreas Kling
15f3abc849 Kernel: Handle O_DIRECTORY in VFS::open() instead of in each syscall
Just taking care of some FIXMEs.
2020-01-03 03:16:29 +01:00
Andreas Kling
05653a9189 Kernel: killpg() with pgrp=0 should signal every process in the group
In the same group as the calling process, that is.
2020-01-03 03:16:29 +01:00
Andreas Kling
005313df82 Kernel: kill() with signal 0 should not actually send anything
Also kill() with pid 0 should send to everyone in the same process
group as the calling process.
2020-01-03 03:16:29 +01:00
Andreas Kling
8345f51a24 Kernel: Remove unnecessary wraparound check in Process::validate_read()
This will be checked moments later by MM.validate_user_read().
2020-01-03 03:16:29 +01:00
Andreas Kling
fdde5cdf26 Kernel: Don't include the process GID in the "extra GIDs" table
Process::m_extra_gids is for supplementary GIDs only.
2020-01-02 23:45:52 +01:00
Andreas Kling
9fe316c2d8 Kernel: Add some missing error checks to the setpgid() syscall 2020-01-02 19:40:04 +01:00
Andreas Kling
285130cc55 Kernel: Remove debug spam about marking threads for death 2020-01-02 13:45:22 +01:00
Andreas Kling
7f843ef3b2 Kernel: Make the purge() syscall superuser-only
I don't think we need to give unprivileged users access to what is
essentially a kernel testing mechanism.
2020-01-02 13:39:49 +01:00
Andreas Kling
c01f766fb2 Kernel: writev() should fail with EINVAL if total length > INT32_MAX 2020-01-02 13:01:41 +01:00
Andreas Kling
7f04334664 Kernel: Remove broken implementation of Unix SHM
This code never worked, as was never used for anything. We can build
a much better SHM implementation on top of TmpFS or similar when we
get to the point when we need one.
2020-01-02 12:44:21 +01:00
Andrew Kaster
bc50a10cc9 Kernel: sys$mprotect protects sub-regions as well as whole ones
Split a region into two/three if the desired mprotect range is a strict
subset of an existing region. We can then set the access bits on a new
region that is just our desired range and add both the new
desired subregion and the leftovers back to our page tables.
2020-01-02 12:27:13 +01:00
Andreas Kling
3f7de2713e Kernel: Make mknod() respect the process umask
Otherwise the /bin/mknod command would create world-writable inodes
by default (when run by superuser) which you probably don't want.
2020-01-02 02:40:43 +01:00
Andreas Kling
c7eb3ff1b3 Kernel: mknod() should not allow unprivileged users to create devices
In fact, unless you are superuser, you may only create a regular file,
a named pipe, or a local domain socket. Anything else should EPERM.
2020-01-02 02:36:12 +01:00
Andreas Kling
3dcec260ed Kernel: Validate the full range of user memory passed to syscalls
We now validate the full range of userspace memory passed into syscalls
instead of just checking that the first and last byte of the memory are
in process-owned regions.

This fixes an issue where it was possible to avoid rejection of invalid
addresses that sat between two valid ones, simply by passing a valid
address and a size large enough to put the end of the range at another
valid address.

I added a little test utility that tries to provoke EFAULT in various
ways to help verify this. I'm sure we can think of more ways to test
this but it's at least a start. :^)

Thanks to mozjag for pointing out that this code was still lacking!

Incidentally this also makes backtraces work again.

Fixes #989.
2020-01-02 02:17:12 +01:00
Andreas Kling
38f93ef13b Kernel: Disable x86 RDTSC instruction in userspace
It's still possible to read the TSC via the read_tsc() syscall, but we
will now clear some of the bottom bits for unprivileged users.
2020-01-01 18:22:20 +01:00
Andreas Kling
f598bbbb1d Kernel: Prevent executing I/O instructions in userspace
All threads were running with iomapbase=0 in their TSS, which the CPU
interprets as "there's an I/O permission bitmap starting at offset 0
into my TSS".

Because of that, any bits that were 1 inside the TSS would allow the
thread to execute I/O instructions on the port with that bit index.

Fix this by always setting the iomapbase to sizeof(TSS32), and also
setting the TSS descriptor's limit to sizeof(TSS32), effectively making
the I/O permissions bitmap zero-length.

This should make it no longer possible to do I/O from userspace. :^)
2020-01-01 17:31:41 +01:00
Andreas Kling
14cdd3fdc1 Kernel: Make module_load() and module_unload() be superuser-only
These should just fail with EPERM if you're not the superuser.
2020-01-01 00:46:08 +01:00
Tibor Nagy
624116a8b1 Kernel: Implement AltGr key support 2019-12-31 19:31:42 +01:00
Andreas Kling
36f1de3c89 Kernel: Pointer range validation should fail on wraparound
Let's reject address ranges that wrap around the 2^32 mark.
2019-12-31 18:23:17 +01:00
Andreas Kling
903b159856 Kernel: Write address validation was only checking end of write range
Thanks to yyyyyyy for finding the bug! :^)
2019-12-31 18:18:54 +01:00
Andreas Kling
3f254bfbc8 Kernel+ping: Only allow superuser to create SOCK_RAW sockets
/bin/ping is now setuid-root, and will drop privileges immediately
after opening a raw socket.
2019-12-31 01:42:34 +01:00
Andreas Kling
a69734bf2e Kernel: Also add a process boosting mechanism
Let's also have set_process_boost() for giving all threads in a process
the same boost.
2019-12-30 20:10:00 +01:00
Andreas Kling
610f3ad12f Kernel: Add a basic thread boosting mechanism
This patch introduces a syscall:

    int set_thread_boost(int tid, int amount)

You can use this to add a permanent boost value to the effective thread
priority of any thread with your UID (or any thread in the system if
you are the superuser.)

This is quite crude, but opens up some interesting opportunities. :^)
2019-12-30 19:23:13 +01:00
Andreas Kling
50677bf806 Kernel: Refactor scheduler to use dynamic thread priorities
Threads now have numeric priorities with a base priority in the 1-99
range.

Whenever a runnable thread is *not* scheduled, its effective priority
is incremented by 1. This is tracked in Thread::m_extra_priority.
The effective priority of a thread is m_priority + m_extra_priority.

When a runnable thread *is* scheduled, its m_extra_priority is reset to
zero and the effective priority returns to base.

This means that lower-priority threads will always eventually get
scheduled to run, once its effective priority becomes high enough to
exceed the base priority of threads "above" it.

The previous values for ThreadPriority (Low, Normal and High) are now
replaced as follows:

    Low -> 10
    Normal -> 30
    High -> 50

In other words, it will take 20 ticks for a "Low" priority thread to
get to "Normal" effective priority, and another 20 to reach "High".

This is not perfect, and I've used some quite naive data structures,
but I think the mechanism will allow us to build various new and
interesting optimizations, and we can figure out better data structures
later on. :^)
2019-12-30 18:46:17 +01:00
Andrew Kaster
cdcab7e5f4 Kernel: Retry mmap if MAP_FIXED is not in flags and addr is not 0
If an mmap fails to allocate a region, but the addr passed in was
non-zero, non-fixed mmaps should attempt to allocate at any available
virtual address.
2019-12-29 23:01:27 +01:00
Andreas Kling
fed3416bd2 Kernel: Embrace the SerenityOS name 2019-12-29 19:08:02 +01:00
Andreas Kling
1f31156173 Kernel: Add a mode flag to sys$purge and allow purging clean inodes 2019-12-29 13:16:53 +01:00
Andreas Kling
c74cde918a Kernel+SystemMonitor: Expose amount of per-process clean inode memory
This is memory that's loaded from an inode (file) but not modified in
memory, so still identical to what's on disk. This kind of memory can
be freed and reloaded transparently from disk if needed.
2019-12-29 12:45:58 +01:00
Andreas Kling
0d5e0e4cad Kernel+SystemMonitor: Expose amount of per-process dirty private memory
Dirty private memory is all memory in non-inode-backed mappings that's
process-private, meaning it's not shared with any other process.

This patch exposes that number via SystemMonitor, giving us an idea of
how much memory each process is responsible for all on its own.
2019-12-29 12:28:32 +01:00
Andreas Kling
95034fdfbd Kernel: Move PC speaker beep timing logic from scheduler to the syscall
I don't know why I put this in the scheduler to begin with.. the caller
can just block until the beeping is finished.
2019-12-26 22:31:26 +01:00
Andreas Kling
4a8683ea68 Kernel+LibPthread+LibC: Add a naive futex and use it for pthread_cond_t
This patch implements a simple version of the futex (fast userspace
mutex) API in the kernel and uses it to make the pthread_cond_t API's
block instead of busily sched_yield().

An arbitrary userspace address is passed to the kernel as a "token"
that identifies the futex and you can then FUTEX_WAIT and FUTEX_WAKE
that specific userspace address.

FUTEX_WAIT corresponds to pthread_cond_wait() and FUTEX_WAKE is used
for pthread_cond_signal() and pthread_cond_broadcast().

I'm pretty sure I'm missing something in this implementation, but it's
hopefully okay for a start. :^)
2019-12-25 23:54:06 +01:00
Andreas Kling
9e55bcb7da Kernel: Make kernel memory regions be non-executable by default
From now on, you'll have to request executable memory specifically
if you want some.
2019-12-25 22:41:34 +01:00
Andreas Kling
56a28890eb Kernel: Clarify the various input validity checks in mmap()
Also share some validation logic between mmap() and mprotect().
2019-12-25 21:50:13 +01:00
Andreas Kling
419e0ced27 Kernel: Don't allow mmap()/mprotect() to set up PROT_WRITE|PROT_EXEC
..but also allow mprotect() to set PROT_EXEC on a region, something
we were just ignoring before.
2019-12-25 13:35:57 +01:00
Conrad Pankoff
efa7141d14 Kernel: Fail module loading if any symbols can not be resolved 2019-12-24 11:52:01 +01:00
Conrad Pankoff
9a8032b479 Kernel: Disallow loading a module twice without explicitly unloading it
This ensures that a module has the chance to run its cleanup functions
before it's taken out of service.
2019-12-24 02:20:37 +01:00
Conrad Pankoff
3aaeff483b Kernel: Add a size argument to validate_read_from_kernel 2019-12-24 01:28:38 +01:00
Andreas Kling
4b8851bd01 Kernel: Make TID's be unique PID's
This is a little strange, but it's how I understand things should work.

The first thread in a new process now has TID == PID.
Additional threads subsequently spawned in that process all have unique
TID's generated by the PID allocator. TIDs are now globally unique.
2019-12-22 12:38:01 +01:00
Andreas Kling
16812f0f98 Kernel: Get rid of "main thread" concept
The idea of all processes reliably having a main thread was nice in
some ways, but cumbersome in others. More importantly, it didn't match
up with POSIX thread semantics, so let's move away from it.

This thread gets rid of Process::main_thread() and you now we just have
a bunch of Thread objects floating around each Process.

When the finalizer nukes the last Thread in a Process, it will also
tear down the Process.

There's a bunch of more things to fix around this, but this is where we
get started :^)
2019-12-22 12:37:58 +01:00
Andreas Kling
b6ee8a2c8d Kernel: Rename vmo => vmobject everywhere 2019-12-19 19:15:27 +01:00
Andreas Kling
8ea4217c01 Kernel: Merge Process::fork() into sys$fork()
There was no good reason for this to be a separate function.
2019-12-19 19:07:41 +01:00
Andreas Kling
3012b224f0 Kernel: Fix intermittent assertion failure in sys$exec()
While setting up the main thread stack for a new process, we'd incur
some zero-fill page faults. This was to be expected, since we allocate
a huge stack but lazily populate it with physical pages.

The problem is that page fault handlers may enable interrupts in order
to grab a VMObject lock (or to page in from an inode.)

During exec(), a process is reorganizing itself and will be in a very
unrunnable state if the scheduler should interrupt it and then later
ask it to run again. Which is exactly what happens if the process gets
pre-empted while the new stack's zero-fill page fault grabs the lock.

This patch fixes the issue by creating new main thread stacks before
disabling interrupts and going into the critical part of exec().
2019-12-18 23:03:23 +01:00
Andreas Kling
72ec2fae6e Kernel: Ignore MADV_SET_NONVOLATILE if already non-volatile
Just return 0 right away without changing any region flags.
2019-12-18 20:48:58 +01:00