Commit Graph

228 Commits

Author SHA1 Message Date
Andreas Kling
f020081a38 Kernel: Put "Couldn't find user region" spam behind MM_DEBUG
This basically never tells us anything actionable anyway, and it's a
real annoyance when doing something validation-heavy like profiling.
2020-02-22 10:09:54 +01:00
Andreas Kling
b298c01e92 Kernel: Log instead of crashing when getting a page fault during IRQ
This is definitely a bug, but it seems to happen randomly every now
and then and we need more info to track it down, so let's log for now.
2020-02-21 19:05:45 +01:00
Andreas Kling
04e40da188 Kernel: Fix crash when reading /proc/PID/vmobjects
InodeVMObjects can have nulled-out physical page slots. That just means
we haven't cached that page from disk right now.
2020-02-21 16:03:56 +01:00
Andreas Kling
59b9e49bcd Kernel: Don't trigger page faults during profiling stack walk
The kernel sampling profiler will walk thread stacks during the timer
tick handler. Since it's not safe to trigger page faults during IRQ's,
we now avoid this by checking the page tables manually before accessing
each stack location.
2020-02-21 15:49:39 +01:00
Andreas Kling
d46071c08f Kernel: Assert on page fault during IRQ
We're not equipped to deal with page faults during an IRQ handler,
so add an assertion so we can immediately tell what's wrong.

This is why profiling sometimes hangs the system -- walking the stack
of the profiled thread causes a page fault and things fall apart.
2020-02-21 15:49:34 +01:00
Andreas Kling
a87544fe8b Kernel: Refuse to allocate 0 bytes of virtual address space 2020-02-19 22:19:55 +01:00
Andreas Kling
f17c377a0c Kernel: Use bitfields in Region
This makes Region 4 bytes smaller and we can use bitfield initializers
since they are allowed in C++20. :^)
2020-02-19 12:03:11 +01:00
Andreas Kling
4b16ac0034 Kernel: Purging a page should point it back to the shared zero page
Anonymous VM objects should never have null entries in their physical
page list. Instead, "empty" or untouched pages should refer to the
shared zero page.

Fixes #1237.
2020-02-18 09:56:11 +01:00
Andreas Kling
48f7c28a5c Kernel: Replace "current" with Thread::current and Process::current
Suggested by Sergey. The currently running Thread and Process are now
Thread::current and Process::current respectively. :^)
2020-02-17 15:04:27 +01:00
Andreas Kling
31e1af732f Kernel+LibC: Allow sys$mmap() callers to specify address alignment
This is exposed via the non-standard serenity_mmap() call in userspace.
2020-02-16 12:55:56 +01:00
Andreas Kling
7533d61458 Kernel: Fix weird whitespace mistake in RangeAllocator 2020-02-16 08:01:33 +01:00
Andreas Kling
635ae70b8f Kernel: More header dependency reduction work 2020-02-16 02:15:33 +01:00
Andreas Kling
e28809a996 Kernel: Add forward declaration header 2020-02-16 01:50:32 +01:00
Andreas Kling
1d611e4a11 Kernel: Reduce header dependencies of MemoryManager and Region 2020-02-16 01:33:41 +01:00
Andreas Kling
a356e48150 Kernel: Move all code into the Kernel namespace 2020-02-16 01:27:42 +01:00
Andreas Kling
5507945306 Kernel: Widen PhysicalPage refcount to 32 bits
A 16-bit refcount is just begging for trouble right nowl.
A 32-bit refcount will be begging for trouble later down the line,
so we'll have to revisit this eventually. :^)
2020-02-15 22:34:48 +01:00
Andreas Kling
c624d3875e Kernel: Use a shared physical page for zero-filled pages until written
This patch adds a globally shared zero-filled PhysicalPage that will
be mapped into every slot of every zero-filled AnonymousVMObject until
that page is written to, achieving CoW-like zero-filled pages.

Initial testing show that this doesn't actually achieve any sharing yet
but it seems like a good design regardless, since it may reduce the
number of page faults taken by programs.

If you look at the refcount of MM.shared_zero_page() it will have quite
a high refcount, but that's just because everything maps it everywhere.
If you want to see the "real" refcount, you can build with the
MAP_SHARED_ZERO_PAGE_LAZILY flag, and we'll defer mapping of the shared
zero page until the first NP read fault.

I've left this behavior behind a flag for future testing of this code.
2020-02-15 13:17:40 +01:00
Andreas Kling
27f0102bbe Kernel: Add getter and setter for the X86 CR3 register
This gets rid of a bunch of inline assembly.
2020-02-10 20:00:32 +01:00
Andreas Kling
ccfee3e573 Kernel: Remove more <LibBareMetal/Output/kstdio.h> includes 2020-02-10 12:07:48 +01:00
Andreas Kling
6cbd72f54f AK: Remove bitrotted Traits::dump() mechanism
This was only used by HashTable::dump() which I used when doing the
first HashTable implementation. Removing this allows us to also remove
most includes of <AK/kstdio.h>.
2020-02-10 11:55:34 +01:00
Liav A
99ea80695e Kernel: Use VirtualAddress & PhysicalAddress classes from LibBareMetal 2020-02-09 19:38:17 +01:00
Liav A
e559af2008 Kernel: Apply changes to use LibBareMetal definitions 2020-02-09 19:38:17 +01:00
Andreas Kling
00d8ec3ead Kernel: The inode fault handler should grab the VMObject lock earlier
It doesn't look healthy to create raw references into an array before
a temporary unlock. In fact, that temporary unlock looks generally
unhealthy, but it's a different problem.
2020-02-08 12:55:21 +01:00
Andreas Kling
a9d7902bb7 x86: Simplify region unmapping a bit
Add PageTableEntry::clear() to zero out a whole PTE, and use that for
unmapping instead of clearing individual fields.
2020-02-08 12:49:38 +01:00
Andreas Kling
f91b3aab47 Kernel: Cloned shared regions should also be marked as shared 2020-02-08 02:39:46 +01:00
Andreas Kling
bf5b7c32d8 Kernel: Add some sanity assertions in RangeAllocator::deallocate()
We should never end up deallocating an empty range, or a range that
ends before it begins.
2020-01-30 21:51:27 +01:00
Andreas Kling
31a141bd10 Kernel: Range::contains() should reject ranges with 2^32 wrap-around 2020-01-30 21:51:27 +01:00
Andreas Kling
a27c5d2fb7 Kernel: Fail with EFAULT for any address+size that would wrap around
Previously we were only checking that each of the virtual pages in the
specified range were valid.

This made it possible to pass in negative buffer sizes to some syscalls
as long as (address) and (address+size) were on the same page.
2020-01-29 12:56:07 +01:00
Andreas Kling
c17f80e720 Kernel: AnonymousVMObject::create_for_physical_range() should fail more
Previously it was not possible for this function to fail. You could
exploit this by triggering the creation of a VMObject whose physical
memory range would wrap around the 32-bit limit.

It was quite easy to map kernel memory into userspace and read/write
whatever you wanted in it.

Test: Kernel/bxvga-mmap-kernel-into-userspace.cpp
2020-01-28 20:48:07 +01:00
Andreas Kling
8131875da6 Kernel: Remove outdated comment in MemoryManager
Regions *do* zero-fill on demand now. :^)
2020-01-28 10:28:04 +01:00
Andreas Kling
3de5439579 AK: Let's call decrementing reference counts "unref" instead of "deref"
It always bothered me that we're using the overloaded "dereference"
term for this. Let's call it "unreference" instead. :^)
2020-01-23 15:14:21 +01:00
Andreas Kling
f38cfb3562 Kernel: Tidy up debug logging a little bit
When using dbg() in the kernel, the output is automatically prefixed
with [Process(PID:TID)]. This makes it a lot easier to understand which
thread is generating the output.

This patch also cleans up some common logging messages and removes the
now-unnecessary "dbg() << *current << ..." pattern.
2020-01-21 16:16:20 +01:00
Liav A
200a5b0649 Kernel: Remove map_for_kernel() in MemoryManager
We don't need to have this method anymore. It was a hack that was used
in many components in the system but currently we use better methods to
create virtual memory mappings. To prevent any further use of this
method it's best to just remove it completely.

Also, the APIC code is disabled for now since it doesn't help booting
the system, and is broken since it relies on identity mapping to exist
in the first 1MB. Any call to the APIC code will result in assertion
failed.

In addition to that, the name of the method which is responsible to
create an identity mapping between 1MB to 2MB was changed, to be more
precise about its purpose.
2020-01-21 11:29:58 +01:00
Andreas Kling
a0b716cfc5 Add AnonymousVMObject::create_with_physical_page()
This can be used to create a VMObject for a single PhysicalPage.
2020-01-20 13:13:03 +01:00
Andreas Kling
4ebff10bde Kernel: Write-only regions should still be mapped as present
There is no real "read protection" on x86, so we have no choice but to
map write-only pages simply as "present & read/write".

If we get a read page fault in a non-readable region, that's still a
correctness issue, so we crash the process. It's by no means a complete
protection against invalid reads, since it's trivial to fool the kernel
by first causing a write fault in the same region.
2020-01-20 13:13:03 +01:00
Andreas Kling
4b7a89911c Kernel: Remove some unnecessary casts to uintptr_t
VirtualAddress is constructible from uintptr_t and const void*.
PhysicalAddress is constructible from uintptr_t but not const void*.
2020-01-20 13:13:03 +01:00
Andreas Kling
a246e9cd7e Use uintptr_t instead of u32 when storing pointers as integers
uintptr_t is 32-bit or 64-bit depending on the target platform.
This will help us write pointer size agnostic code so that when the day
comes that we want to do a 64-bit port, we'll be in better shape.
2020-01-20 13:13:03 +01:00
Andreas Kling
05836757c6 Kernel: Oops, fix bad sort order of available VM ranges
This made the allocator perform worse, so here's another second off of
the Kernel/Process.cpp compile time from a simple bugfix! (31s to 30s)
2020-01-19 15:53:43 +01:00
Andreas Kling
6eab7b398d Kernel: Make ProcessPagingScope restore CR3 properly
Instead of restoring CR3 to the current process's paging scope when a
ProcessPagingScope goes out of scope, we now restore exactly whatever
the CR3 value was when we created the ProcessPagingScope.

This fixes breakage in situations where a process ends up with nested
ProcessPagingScopes. This was making profiling very fragile, and with
this change it's now possible to profile g++! :^)
2020-01-19 13:44:53 +01:00
Andreas Kling
ad3f931707 Kernel: Optimize VM range deallocation a bit
Previously, when deallocating a range of VM, we would sort and merge
the range list. This was quite slow for large processes.

This patch optimizes VM deallocation in the following ways:

- Use binary search instead of linear scan to find the place to insert
  the deallocated range.

- Insert at the right place immediately, removing the need to sort.

- Merge the inserted range with any adjacent range(s) in-line instead
  of doing a separate merge pass into a list copy.

- Add Traits<Range> to inform Vector that Range objects are trivial
  and can be moved using memmove().

I've also added an assertion that deallocated ranges are actually part
of the RangeAllocator's initial address range.

I've benchmarked this using g++ to compile Kernel/Process.cpp.
With these changes, compilation goes from ~41 sec to ~35 sec.
2020-01-19 13:29:59 +01:00
Andreas Kling
f7b394e9a1 Kernel: Assert that copy_to/from_user() are called with user addresses
This will panic the kernel immediately if these functions are misused
so we can catch it and fix the misuse.

This patch fixes a couple of misuses:

    - create_signal_trampolines() writes to a user-accessible page
      above the 3GB address mark. We should really get rid of this
      page but that's a whole other thing.

    - CoW faults need to use copy_from_user rather than copy_to_user
      since it's the *source* pointer that points to user memory.

    - Inode faults need to use memcpy rather than copy_to_user since
      we're copying a kernel stack buffer into a quickmapped page.

This should make the copy_to/from_user() functions slightly less useful
for exploitation. Before this, they were essentially just glorified
memcpy() with SMAP disabled. :^)
2020-01-19 09:18:55 +01:00
Andreas Kling
2cd212e5df Kernel: Let's say that everything < 3GB is user virtual memory
Technically the bottom 2MB is still identity-mapped for the kernel and
not made available to userspace at all, but for simplicity's sake we
can just ignore that and make "address < 0xc0000000" the canonical
check for user/kernel.
2020-01-19 08:58:33 +01:00
Andreas Kling
862b3ccb4e Kernel: Enforce W^X between sys$mmap() and sys$execve()
It's now an error to sys$mmap() a file as writable if it's currently
mapped executable by anyone else.

It's also an error to sys$execve() a file that's currently mapped
writable by anyone else.

This fixes a race condition vulnerability where one program could make
modifications to an executable while another process was in the kernel,
in the middle of exec'ing the same executable.

Test: Kernel/elf-execve-mmap-race.cpp
2020-01-18 23:40:12 +01:00
Andreas Kling
6fea316611 Kernel: Move all CPU feature initialization into cpu_setup()
..and do it very very early in boot.
2020-01-18 10:11:29 +01:00
Andreas Kling
94ca55cefd Meta: Add license header to source files
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.

For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.

Going forward, all new source files should include a license header.
2020-01-18 09:45:54 +01:00
Andreas Kling
19c31d1617 Kernel: Always dump kernel regions when dumping process regions 2020-01-18 08:57:18 +01:00
Andreas Kling
345f92d5ac Kernel: Remove two unused MemoryManager functions 2020-01-18 08:57:18 +01:00
Andreas Kling
3e8b60c618 Kernel: Clean up MemoryManager initialization a bit more
Move the CPU feature enabling to functions in Arch/i386/CPU.cpp.
2020-01-18 00:28:16 +01:00
Andreas Kling
a850a89c1b Kernel: Add a random offset to the base of the per-process VM allocator
This is not ASLR, but it does de-trivialize exploiting the ELF loader
which would previously always parse executables at 0x01001000 in every
single exec(). I've taken advantage of this multiple times in my own
toy exploits and it's starting to feel cheesy. :^)
2020-01-17 23:29:54 +01:00
Andreas Kling
536c0ff3ee Kernel: Only clone the bottom 2MB of mappings from kernel to processes 2020-01-17 22:34:36 +01:00