Commit Graph

955 Commits

Author SHA1 Message Date
Itamar
9e51e295cf ptrace: Add PT_SETREGS
PT_SETTREGS sets the regsiters of the traced thread. It can only be
used when the tracee is stopped.

Also, refactor ptrace.
The implementation was getting long and cluttered the alraedy large
Process.cpp file.

This commit moves the bulk of the implementation to Kernel/Ptrace.cpp,
and factors out peek & poke to separate methods of the Process class.
2020-04-13 00:53:22 +02:00
Itamar
0431712660 ptrace: Stop a traced thread when it exists from execve
This was a missing feature in the PT_TRACEME command.

This feature allows the tracer to interact with the tracee before the
tracee has started executing its program.

It will be useful for automatically inserting a breakpoint at a
debugged program's entry point.
2020-04-13 00:53:22 +02:00
Itamar
b306ac9b2b ptrace: Add PT_POKE
PT_POKE writes a single word to the tracee's address space.

Some caveats:
- If the user requests to write to an address in a read-only region, we
temporarily change the page's protections to allow it.

- If the user requests to write to a region that's backed by a
SharedInodeVMObject, we replace the vmobject with a PrivateIndoeVMObject.
2020-04-13 00:53:22 +02:00
Itamar
984ff93406 ptrace: Add PT_PEEK
PT_PEEK reads a single word from the tracee's address space and returns
it to the tracer.
2020-04-13 00:53:22 +02:00
Andreas Kling
c19b56dc99 Kernel+LibC: Add minherit() and MAP_INHERIT_ZERO
This patch adds the minherit() syscall originally invented by OpenBSD.
Only the MAP_INHERIT_ZERO mode is supported for now. If set on an mmap
region, that region will be zeroed out on fork().
2020-04-12 20:22:26 +02:00
Andrew Kaster
61acca223f LibELF: Move validation methods to their own file
These validate_elf_* methods really had no business being static
methods of ELF::Image. Now that the ELF namespace exists, it makes
sense to just move them to be free functions in the namespace.
2020-04-11 22:41:05 +02:00
Andrew Kaster
21b5909dc6 LibELF: Move ELF classes into namespace ELF
This is for consistency with other namespace changes that were made
a while back to the other libraries :)
2020-04-11 22:41:05 +02:00
Andreas Kling
dec352dacd Kernel: Ignore zero-length PROGBITS sections in sys$module_load() 2020-04-10 16:36:01 +02:00
Andreas Kling
c06d5ef114 Kernel+LibC: Remove ESUCCESS
There's no official ESUCCESS==0 errno code, and it keeps breaking the
Lagom build when we use it, so let's just say 0 instead.
2020-04-10 13:09:35 +02:00
Andreas Kling
871d450b93 Kernel: Remove redundant "ACPI" from filenames in ACPI/ 2020-04-09 18:17:27 +02:00
Andreas Kling
4644217094 Kernel: Remove "non-operational" ACPI parser state
If we don't support ACPI, just don't instantiate an ACPI parser.
This is way less confusing than having a special parser class whose
only purpose is to do nothing.

We now search for the RSDP in ACPI::initialize() instead of letting
the parser constructor do it. This allows us to defer the decision
to create a parser until we're sure we can make a useful one.
2020-04-09 17:19:11 +02:00
Andreas Kling
dc7340332d Kernel: Update cryptically-named functions related to symbolication 2020-04-08 17:19:46 +02:00
Liav A
23fb985f02 Kernel & Userland: Allow to mount image files formatted with Ext2FS 2020-04-06 15:36:36 +02:00
Andreas Kling
9ae3cced76 Revert "Kernel & Userland: Allow to mount image files formatted with Ext2FS"
This reverts commit a60ea79a41.

Reverting these changes since they broke things.
Fixes #1608.
2020-04-03 21:28:57 +02:00
Liav A
a60ea79a41 Kernel & Userland: Allow to mount image files formatted with Ext2FS 2020-04-02 12:03:08 +02:00
Itamar
6b74d38aab Kernel: Add 'ptrace' syscall
This commit adds a basic implementation of
the ptrace syscall, which allows one process
(the tracer) to control another process (the tracee).

While a process is being traced, it is stopped whenever a signal is
received (other than SIGCONT).

The tracer can start tracing another thread with PT_ATTACH,
which causes the tracee to stop.

From there, the tracer can use PT_CONTINUE
to continue the execution of the tracee,
or use other request codes (which haven't been implemented yet)
to modify the state of the tracee.

Additional request codes are PT_SYSCALL, which causes the tracee to
continue exection but stop at the next entry or exit from a syscall,
and PT_GETREGS which fethces the last saved register set of the tracee
(can be used to inspect syscall arguments and return value).

A special request code is PT_TRACE_ME, which is issued by the tracee
and causes it to stop when it calls execve and wait for the
tracer to attach.
2020-03-28 18:27:18 +01:00
Shannon Booth
757c14650f Kernel: Simplify process assertion checking if region is in range
Let's use the helper function for this :)
2020-03-22 08:51:40 +01:00
Liav A
b536547c52 Process: Use monotonic time for timeouts 2020-03-19 15:48:00 +01:00
Liav A
4484513b45 Kernel: Add new syscall to allow changing the system date 2020-03-19 15:48:00 +01:00
Liav A
9db291d885 Kernel: Introduce the new Time management subsystem
This new subsystem includes better abstractions of how time will be
handled in the OS. We take advantage of the existing RTC timer to aid
in keeping time synchronized. This is standing in contrast to how we
handled time-keeping in the kernel, where the PIT was responsible for
that function in addition to update the scheduler about ticks.
With that new advantage, we can easily change the ticking dynamically
and still keep the time synchronized.

In the process context, we no longer use a fixed declaration of
TICKS_PER_SECOND, but we call the TimeManagement singleton class to
provide us the right value. This allows us to use dynamic ticking in
the future, a feature known as tickless kernel.

The scheduler no longer does by himself the calculation of real time
(Unix time), and just calls the TimeManagment singleton class to provide
the value.

Also, we can use 2 new boot arguments:
- the "time" boot argument accpets either the value "modern", or
  "legacy". If "modern" is specified, the time management subsystem will
  try to setup HPET. Otherwise, for "legacy" value, the time subsystem
  will revert to use the PIT & RTC, leaving HPET disabled.
  If this boot argument is not specified, the default pattern is to try
  to setup HPET.
- the "hpet" boot argumet accepts either the value "periodic" or
  "nonperiodic". If "periodic" is specified, the HPET will scan for
  periodic timers, and will assert if none are found. If only one is
  found, that timer will be assigned for the time-keeping task. If more
  than one is found, both time-keeping task & scheduler-ticking task
  will be assigned to periodic timers.
  If this boot argument is not specified, the default pattern is to try
  to scan for HPET periodic timers. This boot argument has no effect if
  HPET is disabled.

In hardware context, PIT & RealTimeClock classes are merely inheriting
from the HardwareTimer class, and they allow to use the old i8254 (PIT)
and RTC devices, managing them via IO ports. By default, the RTC will be
programmed to a frequency of 1024Hz. The PIT will be programmed to a
frequency close to 1000Hz.

About HPET, depending if we need to scan for periodic timers or not,
we try to set a frequency close to 1000Hz for the time-keeping timer
and scheduler-ticking timer. Also, if possible, we try to enable the
Legacy replacement feature of the HPET. This feature if exists,
instructs the chipset to disconnect both i8254 (PIT) and RTC.
This behavior is observable on QEMU, and was verified against the source
code:
ce967e2f33

The HPETComparator class is inheriting from HardwareTimer class, and is
responsible for an individual HPET comparator, which is essentially a
timer. Therefore, it needs to call the singleton HPET class to perform
HPET-related operations.

The new abstraction of Hardware timers brings an opportunity of more new
features in the foreseeable future. For example, we can change the
callback function of each hardware timer, thus it makes it possible to
swap missions between hardware timers, or to allow to use a hardware
timer for other temporary missions (e.g. calibrating the LAPIC timer,
measuring the CPU frequency, etc).
2020-03-19 15:48:00 +01:00
Alex Muscar
d013753f83
Kernel: Resolve relative paths when there is a veil (#1474) 2020-03-19 09:57:34 +01:00
Andreas Kling
ad92a1e4bc Kernel: Add sys$get_stack_bounds() for finding the stack base & size
This will be useful when implementing conservative garbage collection.
2020-03-16 19:06:33 +01:00
Andreas Kling
3803196edb Kernel: Get rid of SmapDisabler in sys$fstat() 2020-03-10 13:34:24 +01:00
Liav A
0f45a1b5e7 Kernel: Allow to reboot in ACPI via PCI or MMIO access
Also, we determine if ACPI reboot is supported by checking the FADT
flags' field.
2020-03-09 10:53:13 +01:00
Ben Wiederhake
b066586355 Kernel: Fix race in waitid
This is similar to 28e1da344d
and 4dd4dd2f3c.

The crux is that wait verifies that the outvalue (siginfo* infop)
is writable *before* waiting, and writes to it *after* waiting.
In the meantime, a concurrent thread can make the output region
unwritable, e.g. by deallocating it.
2020-03-08 14:12:12 +01:00
Ben Wiederhake
d8cd4e4902 Kernel: Fix race in select
This is similar to 28e1da344d
and 4dd4dd2f3c.

The crux is that select verifies that the filedescriptor sets
are writable *before* blocking, and writes to them *after* blocking.
In the meantime, a concurrent thread can make the output buffer
unwritable, e.g. by deallocating it.
2020-03-08 14:12:12 +01:00
Andreas Kling
b1058b33fb AK: Add global FlatPtr typedef. It's u32 or u64, based on sizeof(void*)
Use this instead of uintptr_t throughout the codebase. This makes it
possible to pass a FlatPtr to something that has u32 and u64 overloads.
2020-03-08 13:06:51 +01:00
Andreas Kling
c6693f9b3a Kernel: Simplify a bunch of dbg() and klog() calls
LogStream can handle VirtualAddress and PhysicalAddress directly.
2020-03-06 15:00:44 +01:00
Liav A
85eb1d26d5 Kernel: Run clang-format on Process.cpp & ACPIDynamicParser.h 2020-03-05 19:04:04 +01:00
Liav A
1b8cd6db7b Kernel: Call ACPI reboot method first if possible
Now we call ACPI reboot method first if possible, and if ACPI reboot is
not available, we attempt to reboot via the keyboard controller.
2020-03-05 19:04:04 +01:00
Ben Wiederhake
4dd4dd2f3c Kernel: Fix race in clock_nanosleep
This is a complete fix of clock_nanosleep, because the thread holds the
process lock again when returning from sleep()/sleep_until().
Therefore, no further concurrent invalidation can occur.
2020-03-03 20:13:32 +01:00
Liav A
0fc60e41dd Kernel: Use klog() instead of kprintf()
Also, duplicate data in dbg() and klog() calls were removed.
In addition, leakage of virtual address to kernel log is prevented.
This is done by replacing kprintf() calls to dbg() calls with the
leaked data instead.
Also, other kprintf() calls were replaced with klog().
2020-03-02 22:23:39 +01:00
Andreas Kling
47beab926d Kernel: Remove ability to create kernel-only regions at user addresses
This was only used by the mechanism for mapping executables into each
process's own address space. Now that we remap executables on demand
when needed for symbolication, this can go away.
2020-03-02 11:20:34 +01:00
Andreas Kling
e56f8706ce Kernel: Map executables at a kernel address during ELF load
This is both simpler and more robust than mapping them in the process
address space.
2020-03-02 11:20:34 +01:00
Andreas Kling
678c87087d Kernel: Load executables on demand when symbolicating
Previously we would map the entire executable of a program in its own
address space (but make it unavailable to userspace code.)

This patch removes that and changes the symbolication code to remap
the executable on demand (and into the kernel's own address space
instead of the process address space.)

This opens up a couple of further simplifications that will follow.
2020-03-02 11:20:34 +01:00
Andreas Kling
0acac186fb Kernel: Make the "entire executable" region shared
This makes Region::clone() do the right thing with it on fork().
2020-03-02 06:13:29 +01:00
Andreas Kling
5c2a296a49 Kernel: Mark read-only PT_LOAD mappings as shared regions
This makes Region::clone() do the right thing for these now that we
differentiate based on Region::is_shared().
2020-03-01 21:26:36 +01:00
Andreas Kling
ecfde5997b Kernel: Use SharedInodeVMObject for executables after all
I had the wrong idea about this. Thanks to Sergey for pointing it out!

Here's what he says (reproduced for posterity):

> Private mappings protect the underlying file from the changes made by
> you, not the other way around. To quote POSIX, "If MAP_PRIVATE is
> specified, modifications to the mapped data by the calling process
> shall be visible only to the calling process and shall not change the
> underlying object. It is unspecified whether modifications to the
> underlying object done after the MAP_PRIVATE mapping is established
> are visible through the MAP_PRIVATE mapping." In practice that means
> that the pages that were already paged in don't get updated when the
> underlying file changes, and the pages that weren't paged in yet will
> load the latest data at that moment.
> The only thing MAP_FILE | MAP_PRIVATE is really useful for is mapping
> a library and performing relocations; it's definitely useless (and
> actively harmful for the system memory usage) if you only read from
> the file.

This effectively reverts e2697c2ddd.
2020-03-01 21:16:27 +01:00
Andreas Kling
bb7dd63f74 Kernel: Run clang-format on Process.cpp 2020-03-01 21:16:27 +01:00
Andreas Kling
687b52ceb5 Kernel: Name perfcore files "perfcore.PID"
This way we can trace many things and we get one perfcore file per
process instead of everyone trying to write to "perfcore"
2020-03-01 20:59:02 +01:00
Andreas Kling
fee20bd8de Kernel: Remove some more harmless InodeVMObject miscasts 2020-03-01 12:27:03 +01:00
Andreas Kling
95e3aec719 Kernel: Fix harmless type miscast in Process::amount_clean_inode() 2020-03-01 11:23:23 +01:00
Andreas Kling
e2697c2ddd Kernel: Use PrivateInodeVMObject for loading program executables
This will be a memory usage pessimization until we actually implement
CoW sharing of the memory pages with SharedInodeVMObject.

However, it's a huge architectural improvement, so let's take it and
improve on this incrementally.

fork() should still be neutral, since all private mappings are CoW'ed.
2020-03-01 11:23:10 +01:00
Andreas Kling
88b334135b Kernel: Remove some Region construction helpers
It's now up to the caller to provide a VMObject when constructing a new
Region object. This will make it easier to handle things going wrong,
like allocation failures, etc.
2020-03-01 11:23:10 +01:00
Andreas Kling
4badef8137 Kernel: Return bytes written if sys$write() fails after writing some
If we wrote anything we should just inform userspace that we did,
and not worry about the error code. Userspace can call us again if
it wants, and we'll give them the error then.
2020-02-29 18:42:35 +01:00
Andreas Kling
7cd1bdfd81 Kernel: Simplify some dbg() logging
We don't have to log the process name/PID/TID, dbg() automatically adds
that as a prefix to every line.

Also we don't have to do .characters() on Strings passed to dbg() :^)
2020-02-29 13:39:06 +01:00
Andreas Kling
8fbdda5a2d Kernel: Implement basic support for sys$mmap() with MAP_PRIVATE
You can now mmap a file as private and writable, and the changes you
make will only be visible to you.

This works because internally a MAP_PRIVATE region is backed by a
unique PrivateInodeVMObject instead of using the globally shared
SharedInodeVMObject like we always did before. :^)

Fixes #1045.
2020-02-28 23:25:00 +01:00
Andreas Kling
aa1e209845 Kernel: Remove some unnecessary indirection in InodeFile::mmap()
InodeFile now directly calls Process::allocate_region_with_vmobject()
instead of taking an awkward detour via a special Region constructor.
2020-02-28 20:29:14 +01:00
Andreas Kling
651417a085 Kernel: Split InodeVMObject into two subclasses
We now have PrivateInodeVMObject and SharedInodeVMObject, corresponding
to MAP_PRIVATE and MAP_SHARED respectively.

Note that PrivateInodeVMObject is not used yet.
2020-02-28 20:20:35 +01:00
Andreas Kling
07a26aece3 Kernel: Rename InodeVMObject => SharedInodeVMObject 2020-02-28 20:07:51 +01:00
Andreas Kling
5af95139fa Kernel: Make Process::m_master_tls_region a WeakPtr
Let's not keep raw Region* variables around like that when it's so easy
to avoid it.
2020-02-28 14:05:30 +01:00
Andreas Kling
b0623a0c58 Kernel: Remove SmapDisabler in sys$connect() 2020-02-28 13:20:26 +01:00
Andreas Kling
dcd619bd46 Kernel: Merge the shbuf_get_size() syscall into shbuf_get()
Add an extra out-parameter to shbuf_get() that receives the size of the
shared buffer. That way we don't need to make a separate syscall to
get the size, which we always did immediately after.
2020-02-28 12:55:58 +01:00
Andreas Kling
f72e5bbb17 Kernel+LibC: Rename shared buffer syscalls to use a prefix
This feels a lot more consistent and Unixy:

    create_shared_buffer()   => shbuf_create()
    share_buffer_with()      => shbuf_allow_pid()
    share_buffer_globally()  => shbuf_allow_all()
    get_shared_buffer()      => shbuf_get()
    release_shared_buffer()  => shbuf_release()
    seal_shared_buffer()     => shbuf_seal()
    get_shared_buffer_size() => shbuf_get_size()

Also, "shared_buffer_id" is shortened to "shbuf_id" all around.
2020-02-28 12:55:58 +01:00
Liav A
db23703570 Process: Use dbg() instead of dbgprintf()
Also, fix a bad derefernce in sys$create_shared_buffer() method.
2020-02-27 13:05:12 +01:00
Andreas Kling
4997dcde06 Kernel: Always disable interrupts in do_killpg()
Will caught an assertion when running "kill 9999999999999" :^)
2020-02-27 11:05:16 +01:00
Andreas Kling
4a293e8a21 Kernel: Ignore signals sent to threadless (zombie) processes
If a process doesn't have any threads left, it's in a zombie state and
we can't meaningfully send signals to it. So just ignore them.

Fixes #1313.
2020-02-27 11:04:15 +01:00
Andreas Kling
0c1497846e Kernel: Don't allow profiling a dead process
Work towards #1313.
2020-02-27 10:42:31 +01:00
Cristian-Bogdan SIRB
05ce8586ea Kernel: Fix ASSERTION failed in join_thread syscall
set_interrupted_by_death was never called whenever a thread that had
a joiner died, so the joiner remained with the joinee pointer there,
resulting in an assertion fail in JoinBlocker: m_joinee pointed to
a freed task, filled with garbage.

Thread::current->m_joinee may not be valid after the unblock

Properly return the joinee exit value to the joiner thread.
2020-02-27 10:09:44 +01:00
Andreas Kling
d28fa89346 Kernel: Don't assert on sys$kill() with pid=INT32_MIN
On 32-bit platforms, INT32_MIN == -INT32_MIN, so we can't expect this
to always work:

    if (pid < 0)
        positive_pid = -pid; // may still be negative!

This happens because the -INT32_MIN expression becomes a long and is
then truncated back to an int.

Fixes #1312.
2020-02-27 10:02:04 +01:00
Cristian-Bogdan SIRB
717cd5015e Kernel: Allow process with multiple threads to call exec and exit
This allows a process wich has more than 1 thread to call exec, even
from a thread. This kills all the other threads, but it won't wait for
them to finish, just makes sure that they are not in a running/runable
state.

In the case where a thread does exec, the new program PID will be the
thread TID, to keep the PID == TID in the new process.

This introduces a new function inside the Process class,
kill_threads_except_self which is called on exit() too (exit with
multiple threads wasn't properly working either).

Inside the Lock class, there is the need for a new function,
clear_waiters, which removes all the waiters from the
Process::big_lock. This is needed since after a exit/exec, there should
be no other threads waiting for this lock, the threads should be simply
killed. Only queued threads should wait for this lock at this point,
since blocked threads are handled in set_should_die.
2020-02-26 13:06:40 +01:00
Andreas Kling
ceec1a7d38 AK: Make Vector use size_t for its size and capacity 2020-02-25 14:52:35 +01:00
Andreas Kling
d0f5b43c2e Kernel: Use Vector::unstable_remove() when deallocating a region
Process::m_regions is not sorted, so we can use unstable_remove()
to avoid shifting the vector contents. :^)
2020-02-24 18:34:49 +01:00
Andreas Kling
30a8991dbf Kernel: Make Region weakable and use WeakPtr<Region> instead of Region*
This turns use-after-free bugs into null pointer dereferences instead.
2020-02-24 13:32:45 +01:00
Andreas Kling
79576f9280 Kernel: Clear the region lookup cache on exec()
Each process has a 1-level lookup cache for fast repeated lookups of
the same VM region (which tends to be the majority of lookups.)
The cache is used by the following syscalls: munmap, madvise, mprotect
and set_mmap_name.

After a succesful exec(), there could be a stale Region* in the lookup
cache, and the new executable was able to manipulate it using a number
of use-after-free code paths.
2020-02-24 12:37:27 +01:00
Liav A
895e874eb4 Kernel: Include the new PIT class in system components 2020-02-24 11:27:03 +01:00
Andreas Kling
fc5ebe2a50 Kernel: Disown shared buffers on sys$execve()
When committing to a new executable, disown any shared buffers that the
process was previously co-owning.

Otherwise accessing the same shared buffer ID from the new program
would cause the kernel to find a cached (and stale!) reference to the
previous program's VM region corresponding to that shared buffer,
leading to a Region* use-after-free.

Fixes #1270.
2020-02-22 12:29:38 +01:00
Andreas Kling
ece2971112 Kernel: Disable profiling during the critical section of sys$execve()
Since we're gonna throw away these stacks at the end of exec anyway,
we might as well disable profiling before starting to mess with the
process page tables. One less weird situation to worry about in the
sampling code.
2020-02-22 11:09:03 +01:00
Andreas Kling
d7a13dbaa7 Kernel: Reset profiling state on exec() (but keep it going)
We now log the new executable on exec() and throw away all the samples
we've accumulated so far. But profiling keeps going.
2020-02-22 10:54:50 +01:00
Andreas Kling
2a679f228e Kernel: Fix bitrotted DEBUG_IO logging 2020-02-21 15:49:30 +01:00
Andreas Kling
bead20c40f Kernel: Remove SmapDisabler in sys$create_shared_buffer() 2020-02-18 14:12:39 +01:00
Andreas Kling
9aa234cc47 Kernel: Reset FPU state on exec() 2020-02-18 13:44:27 +01:00
Andreas Kling
a7dbb3cf96 Kernel: Use a FixedArray for a process's extra GIDs
There's not really enough of these to justify using a HashTable.
2020-02-18 11:35:47 +01:00
Andreas Kling
48f7c28a5c Kernel: Replace "current" with Thread::current and Process::current
Suggested by Sergey. The currently running Thread and Process are now
Thread::current and Process::current respectively. :^)
2020-02-17 15:04:27 +01:00
Andreas Kling
4f4af24b9d Kernel: Tear down process address space during finalization
Process teardown is divided into two main stages: finalize and reap.

Finalization happens in the "Finalizer" kernel and runs with interrupts
enabled, allowing destructors to take locks, etc.

Reaping happens either in sys$waitid() or in the scheduler for orphans.

The more work we can do in finalization, the better, since it's fully
pre-emptible and reduces the amount of time the system runs without
interrupts enabled.
2020-02-17 14:33:06 +01:00
Andreas Kling
31e1af732f Kernel+LibC: Allow sys$mmap() callers to specify address alignment
This is exposed via the non-standard serenity_mmap() call in userspace.
2020-02-16 12:55:56 +01:00
Andreas Kling
7a8be7f777 Kernel: Remove SmapDisabler in sys$accept() 2020-02-16 08:20:54 +01:00
Andreas Kling
7717084ac7 Kernel: Remove SmapDisabler in sys$clock_gettime() 2020-02-16 08:13:11 +01:00
Andreas Kling
16818322c5 Kernel: Reduce header dependencies of Process and Thread 2020-02-16 02:01:42 +01:00
Andreas Kling
e28809a996 Kernel: Add forward declaration header 2020-02-16 01:50:32 +01:00
Andreas Kling
1d611e4a11 Kernel: Reduce header dependencies of MemoryManager and Region 2020-02-16 01:33:41 +01:00
Andreas Kling
a356e48150 Kernel: Move all code into the Kernel namespace 2020-02-16 01:27:42 +01:00
Andreas Kling
1f55079488 Kernel: Remove SmapDisabler in sys$getgroups() 2020-02-16 00:30:00 +01:00
Andreas Kling
eb7b0c76a8 Kernel: Remove SmapDisabler in sys$setgroups() 2020-02-16 00:27:10 +01:00
Andreas Kling
0341ddc5eb Kernel: Rename RegisterDump => RegisterState 2020-02-16 00:15:37 +01:00
Andreas Kling
580a94bc44 Kernel+LibC: Merge sys$stat() and sys$lstat()
There is now only one sys$stat() instead of two separate syscalls.
2020-02-10 19:49:49 +01:00
Liav A
e559af2008 Kernel: Apply changes to use LibBareMetal definitions 2020-02-09 19:38:17 +01:00
Andreas Kling
7291370478 Kernel: Make File::truncate() take a u64
No point in taking a signed type here. We validate at the syscall layer
and then pass around a u64 from then on.
2020-02-08 12:07:04 +01:00
Andreas Kling
88ea152b24 Kernel: Merge unnecessary DiskDevice class into BlockDevice 2020-02-08 02:20:03 +01:00
Andreas Kling
2b0b7cc5a4 Net: Add a basic sys$shutdown() implementation
Calling shutdown prevents further reads and/or writes on a socket.
We should do a few more things based on the type of socket, but this
initial implementation just puts the basic mechanism in place.

Work towards #428.
2020-02-08 00:54:43 +01:00
Andreas Kling
f3a5985bb2 Kernel: Remove two bad FIXME's
We should absolutely *not* create a new thread in sys$exec().
There's also no sys$spawn() anymore.
2020-02-08 00:06:15 +01:00
Andreas Kling
d04fcccc90 Kernel: Truncate addresses stored by getsockname() and getpeername()
If there's not enough space in the output buffer for the whole sockaddr
we now simply truncate the address instead of returning EINVAL.

This patch also makes getpeername() actually return the peer address
rather than the local address.. :^)
2020-02-07 23:43:32 +01:00
Andreas Kling
dc18859695 Kernel: memset() all siginfo_t structs after creating them 2020-02-06 14:12:20 +01:00
Sergey Bugaev
1b866bbf42 Kernel: Fix sys$waitid(P_ALL, WNOHANG) return value
According to POSIX, waitid() should fill si_signo and si_pid members
with zeroes if there are no children that have already changed their
state by the time of the call. Let's just fill the whole structure
with zeroes to avoid leaking kernel memory.
2020-02-06 16:06:30 +03:00
Andreas Kling
75cb125e56 Kernel: Put sys$waitid() debug logging behind PROCESS_DEBUG 2020-02-05 19:14:56 +01:00
Sergey Bugaev
b3a24d732d Kernel+LibC: Add sys$waitid(), and make sys$waitpid() wrap it
sys$waitid() takes an explicit description of whether it's waiting for a single
process with the given PID, all of the children, a group, etc., and returns its
info as a siginfo_t.

It also doesn't automatically imply WEXITED, which clears up the confusion in
the kernel.
2020-02-05 18:14:37 +01:00
Andreas Kling
3879e5b9d4 Kernel: Start working on a syscall for logging performance events
This patch introduces sys$perf_event() with two event types:

- PERF_EVENT_MALLOC
- PERF_EVENT_FREE

After the first call to sys$perf_event(), a process will begin keeping
these events in a buffer. When the process dies, that buffer will be
written out to "perfcore" in the current directory unless that filename
is already taken.

This is probably not the best way to do this, but it's a start and will
make it possible to start doing memory allocation profiling. :^)
2020-02-02 20:26:27 +01:00
Andreas Kling
934b1d8a9b Kernel: Finalizer should not go back to sleep if there's more to do
Before putting itself back on the wait queue, the finalizer task will
now check if there's more work to do, and if so, do it first. :^)

This patch also puts a bunch of process/thread debug logging behind
PROCESS_DEBUG and THREAD_DEBUG since it was unbearable to debug this
stuff with all the spam.
2020-02-01 10:56:17 +01:00
Andreas Kling
6634da31d9 Kernel: Disallow empty ranges in munmap/mprotect/madvise 2020-01-30 21:55:49 +01:00
Andreas Kling
31d1c82621 Kernel: Reject non-user address ranges in mmap/munmap/mprotect/madvise
There's no valid reason to allow non-userspace address ranges in these
system calls.
2020-01-30 21:51:27 +01:00
Andreas Kling
afd2b5a53e Kernel: Copy "stack" and "mmap" bits when splitting a Region 2020-01-30 21:51:27 +01:00
Andreas Kling
c9e877a294 Kernel: Address validation helpers should take size_t, not ssize_t 2020-01-30 21:51:27 +01:00
Andreas Kling
c64904a483 Kernel: sys$readlink() should return the number of bytes written out 2020-01-27 21:50:51 +01:00
Andreas Kling
8b49804895 Kernel: sys$waitpid() only needs the waitee thread in the stopped case
If the waitee process is dead, we don't need to inspect the thread.

This fixes an issue with sys$waitpid() failing before reap() since
dead processes will have no remaining threads alive.
2020-01-27 21:21:48 +01:00
Andreas Kling
f4302b58fb Kernel: Remove SmapDisablers in sys$getsockname() and sys$getpeername()
Instead use the user/kernel copy helpers to only copy the minimum stuff
needed from to/from userspace.

Based on work started by Brian Gianforcaro.
2020-01-27 21:11:36 +01:00
Andreas Kling
5163c5cc63 Kernel: Expose the signal that stopped a thread via sys$waitpid() 2020-01-27 20:47:10 +01:00
Andreas Kling
638fe6f84a Kernel: Disable interrupts while looking into the thread table
There was a race window in a bunch of syscalls between calling
Thread::from_tid() and checking if the found thread was in the same
process as the calling thread.

If the found thread object was destroyed at that point, there was a
use-after-free that could be exploited by filling the kernel heap with
something that looked like a thread object.
2020-01-27 14:04:57 +01:00
Andreas Kling
c1f74bf327 Kernel: Never validate access to the kmalloc memory range
Memory validation is used to verify that user syscalls are allowed to
access a given memory range. Ring 0 threads never make syscalls, and
so will never end up in validation anyway.

The reason we were allowing kmalloc memory accesses is because kernel
thread stacks used to be allocated in kmalloc memory. Since that's no
longer the case, we can stop making exceptions for kmalloc in the
validation code.
2020-01-27 12:43:21 +01:00
Andreas Kling
137a45dff2 Kernel: read()/write() should respect timeouts when used on a sockets
Move timeout management to the ReadBlocker and WriteBlocker classes.
Also get rid of the specialized ReceiveBlocker since it no longer does
anything that ReadBlocker can't do.
2020-01-26 17:54:23 +01:00
Andreas Kling
b011857e4f Kernel: Make writev() work again
Vector::ensure_capacity() makes sure the underlying vector buffer can
contain all the data, but it doesn't update the Vector::size().

As a result, writev() would simply collect all the buffers to write,
and then do nothing.
2020-01-26 10:10:15 +01:00
Andreas Kling
b93f6b07c2 Kernel: Make sched_setparam() and sched_getparam() operate on threads
Instead of operating on "some random thread in PID", these now operate
on the thread with a specific TID. This matches other systems better.
2020-01-26 09:58:58 +01:00
Andreas Kling
f4e7aecec2 Kernel: Preserve CoW bits when splitting VM regions 2020-01-25 17:57:10 +01:00
Andreas Kling
7cc0b18f65 Kernel: Only open a single description for stdio in non-fork processes 2020-01-25 17:05:02 +01:00
Andreas Kling
81ddd2dae0 Kernel: Make sys$setsid() clear the calling process's controlling TTY 2020-01-25 14:53:48 +01:00
Andreas Kling
2bf11b8348 Kernel: Allow empty strings in validate_and_copy_string_from_user()
Sergey pointed out that we should just allow empty strings everywhere.
2020-01-25 14:14:11 +01:00
Andreas Kling
69de90a625 Kernel: Simplify Process constructor
Move all the fork-specific inheritance logic to sys$fork(), and all the
stuff for setting up stdio for non-fork ring 3 processes moves to
Process::create_user_process().

Also: we were setting up the PGID, SID and umask twice. Also the code
for copying the open file descriptors was overly complicated. Now it's
just a simple Vector copy assignment. :^)
2020-01-25 14:13:47 +01:00
Andreas Kling
0f5221568b Kernel: sys$execve() should not EFAULT for empty argument strings
It's okay to exec { "/bin/echo", "" } and it should not EFAULT.
2020-01-25 12:21:30 +01:00
Andreas Kling
30ad7953ca Kernel: Rename UnveilState to VeilState 2020-01-21 19:28:59 +01:00
Andreas Kling
f38cfb3562 Kernel: Tidy up debug logging a little bit
When using dbg() in the kernel, the output is automatically prefixed
with [Process(PID:TID)]. This makes it a lot easier to understand which
thread is generating the output.

This patch also cleans up some common logging messages and removes the
now-unnecessary "dbg() << *current << ..." pattern.
2020-01-21 16:16:20 +01:00
Andreas Kling
6081c76515 Kernel: Make O_RDONLY non-zero
Sergey suggested that having a non-zero O_RDONLY would make some things
less confusing, and it seems like he's right about that.

We can now easily check read/write permissions separately instead of
dancing around with the bits.

This patch also fixes unveil() validation for O_RDWR which previously
forgot to check for "r" permission.
2020-01-21 13:27:08 +01:00
Andreas Kling
1b3cac2f42 Kernel: Don't forget about unveiled paths with zero permissions
We need to keep these around, otherwise the calling process can remove
and re-add a path to increase its permissions.
2020-01-21 11:42:28 +01:00
Andreas Kling
22cfb1f3bd Kernel: Clear unveiled state on exec() 2020-01-21 10:46:31 +01:00
Andreas Kling
cf48c20170 Kernel: Forked children should inherit unveil()'ed paths 2020-01-21 09:44:32 +01:00
Andreas Kling
0569123ad7 Kernel: Add a basic implementation of unveil()
This syscall is a complement to pledge() and adds the same sort of
incremental relinquishing of capabilities for filesystem access.

The first call to unveil() will "drop a veil" on the process, and from
now on, only unveiled parts of the filesystem are visible to it.

Each call to unveil() specifies a path to either a directory or a file
along with permissions for that path. The permissions are a combination
of the following:

- r: Read access (like the "rpath" promise)
- w: Write access (like the "wpath" promise)
- x: Execute access
- c: Create/remove access (like the "cpath" promise)

Attempts to open a path that has not been unveiled with fail with
ENOENT. If the unveiled path lacks sufficient permissions, it will fail
with EACCES.

Like pledge(), subsequent calls to unveil() with the same path can only
remove permissions, not add them.

Once you call unveil(nullptr, nullptr), the veil is locked, and it's no
longer possible to unveil any more paths for the process, ever.

This concept comes from OpenBSD, and their implementation does various
things differently, I'm sure. This is just a first implementation for
SerenityOS, and we'll keep improving on it as we go. :^)
2020-01-20 22:12:04 +01:00
Andreas Kling
e901a3695a Kernel: Use the templated copy_to/from_user() in more places
These ensure that the "to" and "from" pointers have the same type,
and also that we copy the correct number of bytes.
2020-01-20 13:41:21 +01:00
Sergey Bugaev
d5426fcc88 Kernel: Misc tweaks 2020-01-20 13:26:06 +01:00
Sergey Bugaev
9bc6157998 Kernel: Return new fd from sys$fcntl(F_DUPFD)
This fixes GNU Bash getting confused after performing a redirection.
2020-01-20 13:26:06 +01:00
Andreas Kling
4b7a89911c Kernel: Remove some unnecessary casts to uintptr_t
VirtualAddress is constructible from uintptr_t and const void*.
PhysicalAddress is constructible from uintptr_t but not const void*.
2020-01-20 13:13:03 +01:00
Andreas Kling
a246e9cd7e Use uintptr_t instead of u32 when storing pointers as integers
uintptr_t is 32-bit or 64-bit depending on the target platform.
This will help us write pointer size agnostic code so that when the day
comes that we want to do a 64-bit port, we'll be in better shape.
2020-01-20 13:13:03 +01:00
Andreas Kling
8d9dd1b04b Kernel: Add a 1-deep cache to Process::region_from_range()
This simple cache gets hit over 70% of the time on "g++ Process.cpp"
and shaves ~3% off the runtime.
2020-01-19 16:44:37 +01:00
Andreas Kling
ae0c435e68 Kernel: Add a Process::add_region() helper
This is a private helper for adding a Region to Process::m_regions.
It's just for convenience since it's a bit cumbersome to do this.
2020-01-19 16:26:42 +01:00
Andreas Kling
1dc9fa9506 Kernel: Simplify PageDirectory swapping in sys$execve()
Swap out both the PageDirectory and the Region list at the same time,
instead of doing the Region list slightly later.
2020-01-19 16:05:42 +01:00
Andreas Kling
6eab7b398d Kernel: Make ProcessPagingScope restore CR3 properly
Instead of restoring CR3 to the current process's paging scope when a
ProcessPagingScope goes out of scope, we now restore exactly whatever
the CR3 value was when we created the ProcessPagingScope.

This fixes breakage in situations where a process ends up with nested
ProcessPagingScopes. This was making profiling very fragile, and with
this change it's now possible to profile g++! :^)
2020-01-19 13:44:53 +01:00
Andreas Kling
f7b394e9a1 Kernel: Assert that copy_to/from_user() are called with user addresses
This will panic the kernel immediately if these functions are misused
so we can catch it and fix the misuse.

This patch fixes a couple of misuses:

    - create_signal_trampolines() writes to a user-accessible page
      above the 3GB address mark. We should really get rid of this
      page but that's a whole other thing.

    - CoW faults need to use copy_from_user rather than copy_to_user
      since it's the *source* pointer that points to user memory.

    - Inode faults need to use memcpy rather than copy_to_user since
      we're copying a kernel stack buffer into a quickmapped page.

This should make the copy_to/from_user() functions slightly less useful
for exploitation. Before this, they were essentially just glorified
memcpy() with SMAP disabled. :^)
2020-01-19 09:18:55 +01:00
Andreas Kling
5ce9382e98 Kernel: Only require "stdio" pledge for sending signals to self
This should match what OpenBSD does. Sending a signal to yourself seems
basically harmless.
2020-01-19 08:50:55 +01:00
Sergey Bugaev
3e1ed38d4b Kernel: Do not return ENOENT for unresolved symbols
ENOENT means "no such file or directory", not "no such symbol". Return EINVAL
instead, as we already do in other cases.
2020-01-18 23:51:22 +01:00
Sergey Bugaev
d0d13e2bf5 Kernel: Move setting file flags and r/w mode to VFS::open()
Previously, VFS::open() would only use the passed flags for permission checking
purposes, and Process::sys$open() would set them on the created FileDescription
explicitly. Now, they should be set by VFS::open() on any files being opened,
including files that the kernel opens internally.

This also lets us get rid of the explicit check for whether or not the returned
FileDescription was a preopen fd, and in fact, fixes a bug where a read-only
preopen fd without any other flags would be considered freshly opened (due to
O_RDONLY being indistinguishable from 0) and granted a new set of flags.
2020-01-18 23:51:22 +01:00
Sergey Bugaev
544b8286da Kernel: Do not open stdio fds for kernel processes
Kernel processes just do not need them.

This also avoids touching the file (sub)system early in the boot process when
initializing the colonel process.
2020-01-18 23:51:22 +01:00
Sergey Bugaev
6466c3d750 Kernel: Pass correct permission flags when opening files
Right now, permission flags passed to VFS::open() are effectively ignored, but
that is going to change.

* O_RDONLY is 0, but it's still nicer to pass it explicitly
* POSIX says that binding a Unix socket to a symlink shall fail with EADDRINUSE
2020-01-18 23:51:22 +01:00
Andreas Kling
862b3ccb4e Kernel: Enforce W^X between sys$mmap() and sys$execve()
It's now an error to sys$mmap() a file as writable if it's currently
mapped executable by anyone else.

It's also an error to sys$execve() a file that's currently mapped
writable by anyone else.

This fixes a race condition vulnerability where one program could make
modifications to an executable while another process was in the kernel,
in the middle of exec'ing the same executable.

Test: Kernel/elf-execve-mmap-race.cpp
2020-01-18 23:40:12 +01:00
Andreas Kling
4e6fe3c14b Kernel: Symbolicate kernel EIP on process crash
Process::crash() was assuming that EIP was always inside the ELF binary
of the program, but it could also be in the kernel.
2020-01-18 14:38:39 +01:00
Andreas Kling
9c9fe62a4b Kernel: Validate the requested range in allocate_region_with_vmobject() 2020-01-18 14:37:22 +01:00
Andreas Kling
aa63de53bd Kernel: Use get_syscall_path_argument() in sys$execve()
Paths passed to sys$execve() should certainly be subject to all the
usual path validation checks.
2020-01-18 11:43:28 +01:00
Andreas Kling
b65572b3fe Kernel: Disallow mmap names longer than PATH_MAX 2020-01-18 11:34:53 +01:00
Andreas Kling
94ca55cefd Meta: Add license header to source files
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.

For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.

Going forward, all new source files should include a license header.
2020-01-18 09:45:54 +01:00
Andreas Kling
19c31d1617 Kernel: Always dump kernel regions when dumping process regions 2020-01-18 08:57:18 +01:00
Sergey Bugaev
064cd2278c Kernel: Remove the use of FileSystemPath in sys$realpath()
Now that VFS::resolve_path() canonicalizes paths automatically, we don't need to
do that here anymore.
2020-01-17 21:49:58 +01:00
Sergey Bugaev
8642a7046c Kernel: Let inodes provide pre-open file descriptions
Some magical inodes, such as /proc/pid/fd/fileno, are going to want to open() to
a custom FileDescription, so add a hook for that.
2020-01-17 21:49:58 +01:00
Sergey Bugaev
e0013a6b4c Kernel+LibC: Unify sys$open() and sys$openat()
The syscall is now called sys$open(), but it behaves like the old sys$openat().
In userspace, open_with_path_length() is made a wrapper over openat_with_path_length().
2020-01-17 21:49:58 +01:00
Andreas Kling
4d4d5e1c07 Kernel: Drop futex queues/state on exec()
This state is not meaningful to the new process image so just drop it.
2020-01-17 16:08:00 +01:00
Andreas Kling
26a31c7efb Kernel: Add "accept" pledge promise for accepting incoming connections
This patch adds a new "accept" promise that allows you to call accept()
on an already listening socket. This lets programs set up a socket for
for listening and then dropping "inet" and/or "unix" so that only
incoming (and existing) connections are allowed from that point on.
No new outgoing connections or listening server sockets can be created.

In addition to accept() it also allows getsockopt() with SOL_SOCKET
and SO_PEERCRED, which is used to find the PID/UID/GID of the socket
peer. This is used by our IPC library when creating shared buffers that
should only be accessible to a specific peer process.

This allows us to drop "unix" in WindowServer and LookupServer. :^)

It also makes the debugging/introspection RPC sockets in CEventLoop
based programs work again.
2020-01-17 11:19:06 +01:00
Andreas Kling
c6e552ac8f Kernel+LibELF: Don't blindly trust ELF symbol offsets in symbolication
It was possible to craft a custom ELF executable that when symbolicated
would cause the kernel to read from user-controlled addresses anywhere
in memory. You could then fetch this memory via /proc/PID/stack

We fix this by making ELFImage hand out StringView rather than raw
const char* for symbol names. In case a symbol offset is outside the
ELF image, you get a null StringView. :^)

Test: Kernel/elf-symbolication-kernel-read-exploit.cpp
2020-01-16 22:11:31 +01:00
Andreas Kling
d79de38bd2 Kernel: Don't allow userspace to sys$open() literal symlinks
The O_NOFOLLOW_NOERROR is an internal kernel mechanism used for the
implementation of sys$readlink() and sys$lstat().

There is no reason to allow userspace to open symlinks directly.
2020-01-15 21:19:26 +01:00
Andreas Kling
e23536d682 Kernel: Use Vector::unstable_remove() in a couple of places 2020-01-15 19:26:41 +01:00
Liav A
d2b41010c5 Kernel: Change Region allocation helpers
We now can create a cacheable Region, so when map() is called, if a
Region is cacheable then all the virtual memory space being allocated
to it will be marked as not cache disabled.

In addition to that, OS components can create a Region that will be
mapped to a specific physical address by using the appropriate helper
method.
2020-01-14 15:38:58 +01:00
Andreas Kling
65cb406327 Kernel: Allow unlocking a held Lock with interrupts disabled
This is needed to eliminate a race in Thread::wait_on() where we'd
otherwise have to wait until after unlocking the process lock before
we can disable interrupts.
2020-01-13 18:56:46 +01:00
Andrew Kaster
7a7e7c82b5 Kernel: Tighten up exec/do_exec and allow for PT_INTERP iterpreters
This patch changes how exec() figures out which program image to
actually load. Previously, we opened the path to our main executable in
find_shebang_interpreter_for_executable, read the first page (or less,
if the file was smaller) and then decided whether to recurse with the
interpreter instead. We then then re-opened the main executable in
do_exec.

However, since we now want to parse the ELF header and Program Headers
of an elf image before even doing any memory region work, we can change
the way this whole process works. We open the file and read (up to) the
first page in exec() itself, then pass just the page and the amount read
to find_shebang_interpreter_for_executable. Since we now have that page
and the FileDescription for the main executable handy, we can do a few
things. First, validate the ELF header and ELF program headers for any
shenanigans. ELF32 Little Endian i386 only, please. Second, we can grab
the PT_INTERP interpreter from any ET_DYN files, and open that guy right
away if it exists. Finally, we can pass the main executable's and
optionally the PT_INTERP interpreter's file descriptions down to do_exec
and not have to feel guilty about opening the file twice.

In do_exec, we now have a choice. Are we going to load the main
executable, or the interpreter? We could load both, but it'll be way
easier for the inital pass on the RTLD if we only load the interpreter.
Then it can load the main executable itself like any old shared object,
just, the one with main in it :). Later on we can load both of them
into memory and the RTLD can relocate itself before trying to do
anything. The way it's written now the RTLD will get dibs on its
requested virtual addresses being the actual virtual addresses.
2020-01-13 13:03:30 +01:00
Brian Gianforcaro
4cee441279 Kernel: Combine validate and copy of user mode pointers (#1069)
Right now there is a significant amount of boiler plate code required
to validate user mode parameters in syscalls. In an attempt to reduce
this a bit, introduce validate_read_and_copy_typed which combines the
usermode address check and does the copy internally if the validation
passes. This cleans up a little bit of code from a significant amount
of syscalls.
2020-01-13 11:19:17 +01:00
Brian Gianforcaro
9cac205d67 Kernel: Fix SMAP in setkeymap syscall
It looks like setkeymap was missed when
the SMAP functionality was introduced.

Disable SMAP only in the scope where we
actually read the usermode addresses.
2020-01-13 11:17:10 +01:00
Brian Gianforcaro
02704a73e9 Kernel: Use the templated copy_from_user where possible
Now that the templated version of copy_from_user exists
their is normally no reason to use the version which
takes the number of bytes to copy. Move to the templated
version where possible.
2020-01-13 11:07:39 +01:00
Andreas Kling
20b2bfcafd Kernel: Fix SMAP violation in sys$getrandom() 2020-01-12 20:10:53 +01:00
Sergey Bugaev
33c0dc08a7 Kernel: Don't forget to copy & destroy root_directory_for_procfs
Also, rename it to root_directory_relative_to_global_root.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
dd54d13d8d Kernel+LibC: Allow passing mount flags to chroot()
Since a chroot is in many ways similar to a separate root mount, we can also
apply mount flags to it as if it was an actual mount. These flags will apply
whenever the chrooted process accesses its root directory, but not when other
processes access this same directory for the outside. Since it's common to
chdir("/") immediately after chrooting (so that files accessed through the
current directory inherit the same mount flags), this effectively allows one to
apply additional limitations to a process confined inside a chroot.

To this effect, sys$chroot() gains a mount_flags argument (exposed as
chroot_with_mount_flags() in userspace) which can be set to all the same values
as the flags argument for sys$mount(), and additionally to -1 to keep the flags
set for that file system. Note that passing 0 as mount_flags will unset any
flags that may have been set for the file system, not keep them.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
93ff911473 Kernel: Properly propagate bind mount flags
Previously, when performing a bind mount flags other than MS_BIND were ignored.
Now, they're properly propagated the same way a for any other mount.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
b620ed25ab Kernel: Simplify Ext2FS mount code path
Instead of looking up device metadata and then looking up a device by that
metadata explicitly, just use VFS::open(). This also means that attempting to
mount a device residing on a MS_NODEV file system will properly fail.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
35b0f10f20 Kernel: Don't dump backtrace on successful exits
This was getting really annoying.
2020-01-12 20:02:11 +01:00
Andreas Kling
d1839ae0c9 Kernel: Clearing promises with pledge("") should fail
Thanks Sergey for catching this brain-fart. :^)
2020-01-12 12:16:17 +01:00
Andreas Kling
114a770c6f Kernel: Reduce pledge requirement for recvfrom()+sendto() to "stdio"
Since these only operate on already-open sockets, we should treat them
the same as we do read() and write() by putting them into "stdio".
2020-01-12 11:52:37 +01:00
Andreas Kling
955034e86e Kernel: Remove manual STAC/CLAC in create_thread() 2020-01-12 11:51:31 +01:00
Andreas Kling
a6cef2408c Kernel: Add sigreturn() to "stdio" with all the other signal syscalls 2020-01-12 10:32:56 +01:00
Andreas Kling
7b53699e6f Kernel: Require the "thread" pledge promise for futex() 2020-01-12 10:31:21 +01:00
Andreas Kling
c32d65ae9f Kernel: Put some more syscalls in the "stdio" bucket
yield() and get_kernel_info_page() seem like decent fits for "stdio".
2020-01-12 10:31:21 +01:00
Andreas Kling
ca609ce5a3 Kernel: Put fcntl() debug spam behind DEBUG_IO 2020-01-12 10:01:22 +01:00
Andreas Kling
017b34e1ad Kernel: Add "video" pledge for accessing framebuffer devices
WindowServer becomes the only user.
2020-01-12 02:18:30 +01:00
Andreas Kling
f187374c1b Kernel: fork()ed children should inherit pledge promises :^)
Update various places that now need wider promises as they are not
reset by fork() anymore.
2020-01-11 23:28:41 +01:00
Andreas Kling
409a4f7756 ping: Use pledge() 2020-01-11 20:48:43 +01:00
Sergey Bugaev
0cb0f54783 Kernel: Implement bind mounts
You can now bind-mount files and directories. This essentially exposes an
existing part of the file system in another place, and can be used as an
alternative to symlinks or hardlinks.

Here's an example of doing this:

    # mkdir /tmp/foo
    # mount /home/anon/myfile.txt /tmp/foo -o bind
    # cat /tmp/foo
    This is anon's file.
2020-01-11 18:57:53 +01:00
Sergey Bugaev
61c1106d9f Kernel+LibC: Implement a few mount flags
We now support these mount flags:
* MS_NODEV: disallow opening any devices from this file system
* MS_NOEXEC: disallow executing any executables from this file system
* MS_NOSUID: ignore set-user-id bits on executables from this file system

The fourth flag, MS_BIND, is defined, but currently ignored.
2020-01-11 18:57:53 +01:00
Sergey Bugaev
2fcbb846fb Kernel+LibC: Add O_EXEC, move exec permission checking to VFS::open()
O_EXEC is mentioned by POSIX, so let's have it. Currently, it is only used
inside the kernel to ensure the process has the right permissions when opening
an executable.
2020-01-11 18:57:53 +01:00
Sergey Bugaev
4566c2d811 Kernel+LibC: Add support for mount flags
At the moment, the actual flags are ignored, but we correctly propagate them all
the way from the original mount() syscall to each custody that resides on the
mounted FS.
2020-01-11 18:57:53 +01:00
Andreas Kling
83f59419cd Kernel: Oops, recvfrom() is not quite ready for SMAP protections yet 2020-01-11 13:03:44 +01:00
Andreas Kling
24c736b0e7 Kernel: Use the Syscall string and buffer types more
While I was updating syscalls to stop passing null-terminated strings,
I added some helpful struct types:

    - StringArgument { const char*; size_t; }
    - ImmutableBuffer<Data, Size> { const Data*; Size; }
    - MutableBuffer<Data, Size> { Data*; Size; }

The Process class has some convenience functions for validating and
optionally extracting the contents from these structs:

    - get_syscall_path_argument(StringArgument)
    - validate_and_copy_string_from_user(StringArgument)
    - validate(ImmutableBuffer)
    - validate(MutableBuffer)

There's still so much code around this and I'm wondering if we should
generate most of it instead. Possible nice little project.
2020-01-11 12:47:47 +01:00
Andreas Kling
1434f30f92 Kernel: Remove SmapDisabler in bind() 2020-01-11 12:07:45 +01:00
Andreas Kling
2d7ae42f75 Kernel: Remove SmapDisabler in clock_nanosleep() 2020-01-11 11:51:03 +01:00
Andreas Kling
0ca6d6c8d2 Kernel: Remove validate_read_str() as nothing uses it anymore :^) 2020-01-11 10:57:50 +01:00
Andreas Kling
f5092b1c7e Kernel: Pass a parameter struct to mount()
This was the last remaining syscall that took a null-terminated string
and figured out how long it was by walking it in kernelspace *shudder*.
2020-01-11 10:56:02 +01:00
Andreas Kling
e380142853 Kernel: Pass a parameter struct to rename() 2020-01-11 10:36:54 +01:00
Andreas Kling
46830a0c32 Kernel: Pass a parameter struct to symlink() 2020-01-11 10:31:33 +01:00
Andreas Kling
c97bfbd609 Kernel: Pass a parameter struct to mknod() 2020-01-11 10:27:37 +01:00
Andreas Kling
6536a80aa9 Kernel: Pass a parameter struct to chown() 2020-01-11 10:17:44 +01:00
Andreas Kling
29b3d95004 Kernel: Expose a process's filesystem root as a /proc/PID/root symlink
In order to preserve the absolute path of the process root, we save the
custody used by chroot() before stripping it to become the new "/".
There's probably a better way to do this.
2020-01-10 23:48:44 +01:00
Andreas Kling
ddd0b19281 Kernel: Add a basic chroot() syscall :^)
The chroot() syscall now allows the superuser to isolate a process into
a specific subtree of the filesystem. This is not strictly permanent,
as it is also possible for a superuser to break *out* of a chroot, but
it is a useful mechanism for isolating unprivileged processes.

The VFS now uses the current process's root_directory() as the root for
path resolution purposes. The root directory is stored as an uncached
Custody in the Process object.
2020-01-10 23:14:04 +01:00
Andreas Kling
485443bfca Kernel: Pass characters+length to link() 2020-01-10 21:26:47 +01:00
Andreas Kling
416c7ac2b5 Kernel: Rename Syscall::SyscallString => Syscall::StringArgument 2020-01-10 20:16:18 +01:00
Andreas Kling
0695ff8282 Kernel: Pass characters+length to readlink()
Note that I'm developing some helper types in the Syscall namespace as
I go here. Once I settle on some nice types, I will convert all the
other syscalls to use them as well.
2020-01-10 20:13:23 +01:00
Andreas Kling
8c5cd97b45 Kernel: Fix kernel null deref on process crash during join_thread()
The join_thread() syscall is not supposed to be interruptible by
signals, but it was. And since the process death mechanism piggybacked
on signal interrupts, it was possible to interrupt a pthread_join() by
killing the process that was doing it, leading to confusing due to some
assumptions being made by Thread::finalize() for threads that have a
pending joiner.

This patch fixes the issue by making "interrupted by death" a distinct
block result separate from "interrupted by signal". Then we handle that
state in join_thread() and tidy things up so that thread finalization
doesn't get confused by the pending joiner being gone.

Test: Tests/Kernel/null-deref-crash-during-pthread_join.cpp
2020-01-10 19:23:45 +01:00
Andreas Kling
de69f84868 Kernel: Remove SmapDisablers in fchmod() and fchown() 2020-01-10 14:20:14 +01:00
Andreas Kling
952bb95baa Kernel: Enable SMAP protection during the execve() syscall
The userspace execve() wrapper now measures all the strings and puts
them in a neat and tidy structure on the stack.

This way we know exactly how much to copy in the kernel, and we don't
have to use the SMAP-violating validate_read_str(). :^)
2020-01-10 12:20:36 +01:00
Andreas Kling
197e73ee31 Kernel+LibELF: Enable SMAP protection during non-syscall exec()
When loading a new executable, we now map the ELF image in kernel-only
memory and parse it there. Then we use copy_to_user() when initializing
writable regions with data from the executable.

Note that the exec() syscall still disables SMAP protection and will
require additional work. This patch only affects kernel-originated
process spawns.
2020-01-10 10:57:06 +01:00
Andreas Kling
ff16298b44 Kernel: Removed an unused global variable 2020-01-09 18:02:37 +01:00