Commit Graph

475 Commits

Author SHA1 Message Date
Tom
30d36a3ad1 Kernel: Remove assertion from Region::commit
We should be able to gracefully fail a commit in low-memory situations.
2020-09-01 22:08:43 +02:00
Tom
eb1cc5d665 Kernel: Only remap regions if memory was purged from them 2020-09-01 22:08:43 +02:00
Andreas Kling
171868e4f7 Kernel: Preserve internal state in cloned PurgeableVMObjects
When cloning a purgeable memory region (which happens on fork),
we need to preserve the "was purged" and "volatile" state of the
original region, or they will always appear as non-volatile and
unpurged regions in the child process.

Fixes #3374.
2020-09-01 17:45:28 +02:00
Andreas Kling
cc5403f77b Kernel: Remove unused variable PhysicalRegion::m_last 2020-08-30 13:13:55 +02:00
Tom
4b66692a55 Kernel: Make Heap implementation reusable, and make kmalloc expandable
Add an ExpandableHeap and switch kmalloc to use it, which allows
for the kmalloc heap to grow as needed.

In order to make heap expansion to work, we keep around a 1 MiB backup
memory region, because creating a region would require space in the
same heap. This means, the heap will grow as soon as the reported
utilization is less than 1 MiB. It will also return memory if an entire
subheap is no longer needed, although that is rarely possible.
2020-08-30 11:39:38 +02:00
Ben Wiederhake
081bb29626 Kernel: Unbreak building with extra debug macros, part 2 2020-08-30 09:43:49 +02:00
Tom
67dbb56444 Kernel: Release page tables when no longer needed
When unmapping regions, check if page tables can be freed.

This is a follow-up change for #3254.
2020-08-28 09:21:24 +02:00
Tom
dd83f6a266 Kernel: Fix losing PTEs
We can't use a HashMap with a small key that doesn't guarantee
collisions. Change it to a HashTable instead.

Fixes #3254
2020-08-26 01:25:06 +02:00
Tom
bcbe2fe525 Kernel: Protect looping over VMObject regions
We need to hold the memory manager lock so nobody else can modify
these lists while we're iterating them.
2020-08-26 01:25:06 +02:00
Tom
d89582880e Kernel: Switch singletons to use new Singleton class
MemoryManager cannot use the Singleton class because
MemoryManager::initialize is called before the global constructors
are run. That caused the Singleton to be re-initialized, causing
it to create another MemoryManager instance.

Fixes #3226
2020-08-25 09:48:48 +02:00
Tom
ba6e4fb77f Kernel: Fix kmalloc memory corruption
Rather than hardcoding where the kmalloc pool should be, place
it at the end of the kernel image instead. This avoids corrupting
global variables or other parts of the kernel as it grows.

Fixes #3257
2020-08-25 09:48:48 +02:00
Tom
08a569fbe0 Kernel: Make PhysicalPage not movable and use atomic ref counting
We should not be moving ref-counted objects.
2020-08-25 09:48:48 +02:00
Andreas Kling
2fd9e72264 Revert "Kernel: Switch singletons to use new Singleton class"
This reverts commit f48feae0b2.
2020-08-22 18:01:59 +02:00
Andreas Kling
8925ad3fa0 Revert "Kernel: Move Singleton class to AK"
This reverts commit f0906250a1.
2020-08-22 16:34:49 +02:00
Andreas Kling
b0a24a83be Revert "Kernel: Fix regression where MemoryManager is initialized twice"
This reverts commit 8a75e0b892.
2020-08-22 16:34:15 +02:00
Andreas Kling
68580d5a8d Revert "AK: Get rid of make_singleton function"
This reverts commit 5a98e329d1.
2020-08-22 16:34:14 +02:00
Andreas Kling
0db7e04c2e Revert "Kernel: Make PhysicalPage not movable and use atomic ref counting"
This reverts commit a89ccd842b.
2020-08-22 16:34:11 +02:00
Tom
a89ccd842b Kernel: Make PhysicalPage not movable and use atomic ref counting
We should not be moving ref-counted objects.
2020-08-22 10:46:24 +02:00
Tom
5a98e329d1 AK: Get rid of make_singleton function
Just default the InitFunction template argument.
2020-08-22 10:46:24 +02:00
Tom
8a75e0b892 Kernel: Fix regression where MemoryManager is initialized twice
MemoryManager cannot use the Singleton class because
MemoryManager::initialize is called before the global constructors
are run. That caused the Singleton to be re-initialized, causing
it to create another MemoryManager instance.
2020-08-22 10:46:24 +02:00
Tom
f0906250a1 Kernel: Move Singleton class to AK 2020-08-22 10:46:24 +02:00
Tom
cf8ce839da Kernel: Fix assertion when releasing contiguous memory region
There is no guarantee that the memory manager lock is held when
physical pages are released, so just acquire the memory manager
lock.
2020-08-21 12:03:20 +02:00
Tom
f48feae0b2 Kernel: Switch singletons to use new Singleton class
Fixes #3226
2020-08-21 11:47:35 +02:00
Nico Weber
b31aa2e918 Kernel: Switch a comment to GiB 2020-08-16 16:33:28 +02:00
Nico Weber
430b265cd4 AK: Rename KB, MB, GB to KiB, MiB, GiB
The SI prefixes "k", "M", "G" mean "10^3", "10^6", "10^9".
The IEC prefixes "Ki", "Mi", "Gi" mean "2^10", "2^20", "2^30".

Let's use the correct name, at least in code.

Only changes the name of the constants, no other behavior change.
2020-08-16 16:33:28 +02:00
Nico Weber
6e1d6d1ff5 Kernel: Don't request a random u32 when all but 5 bits are immediately masked off 2020-08-13 18:52:24 +02:00
Muhammad Zahalqa
615ba0f368
AK: Fix overflow and mixed-signedness issues in binary_search() (#2961) 2020-08-02 21:10:35 +02:00
Andreas Kling
be7add690d Kernel: Rename region_from_foo() => find_region_from_foo()
Let's emphasize that these functions actually go out and find regions.
2020-07-30 23:52:28 +02:00
Andreas Kling
949aef4aef Kernel: Move syscall implementations out of Process.cpp
This is something I've been meaning to do for a long time, and here we
finally go. This patch moves all sys$foo functions out of Process.cpp
and into files in Kernel/Syscalls/.

It's not exactly one syscall per file (although it could be, but I got
a bit tired of the repetitive work here..)

This makes hacking on individual syscalls a lot less painful since you
don't have to rebuild nearly as much code every time. I'm also hopeful
that this makes it easier to understand individual syscalls. :^)
2020-07-30 23:40:57 +02:00
Andreas Kling
fe6474e692 Kernel: Switch to using AK::is and AK::downcast 2020-07-26 17:51:00 +02:00
asynts
707d92db61 Refactor: Change the AK::binary_search signature to use AK::Span. 2020-07-26 16:49:06 +02:00
Tom
06d50f64b0 Kernel: Aggregate TLB flush requests for Regions for SMP
Rather than sending one TLB flush request for each page,
aggregate them so that we're not spamming the other
processors with FlushTLB IPIs.
2020-07-06 22:39:06 +02:00
Tom
655f4daeb1 Kernel: Minor MM optimization for SMP
MemoryManager::quickmap_pd and MemoryManager::quickmap_pt can only
be called by one processor at the time anyway, since anything using
these must have the MM lock held. So, no need to inform the other
CPUs to flush their TLBs, we can just flush our own.
2020-07-06 17:17:24 +02:00
Tom
bc107d0b33 Kernel: Add SMP IPI support
We can now properly initialize all processors without
crashing by sending SMP IPI messages to synchronize memory
between processors.

We now initialize the APs once we have the scheduler running.
This is so that we can process IPI messages from the other
cores.

Also rework interrupt handling a bit so that it's more of a
1:1 mapping. We need to allocate non-sharable interrupts for
IPIs.

This also fixes the occasional hang/crash because all
CPUs now synchronize memory with each other.
2020-07-06 17:07:44 +02:00
Tom
9b4e6f6a23 Kernel: Consolidate features into CPUFeature enum
This allows us to consolidate printing out all the CPU features
into one log statement. Also expose them in /proc/cpuinfo
2020-07-03 19:32:34 +02:00
Tom
e373e5f007 Kernel: Fix signal delivery
When delivering urgent signals to the current thread
we need to check if we should be unblocked, and if not
we need to yield to another process.

We also need to make sure that we suppress context switches
during Process::exec() so that we don't clobber the registers
that it sets up (eip mainly) by a context switch. To be able
to do that we add the concept of a critical section, which are
similar to Process::m_in_irq but different in that they can be
requested at any time. Calls to Scheduler::yield and
Scheduler::donate_to will return instantly without triggering
a context switch, but the processor will then asynchronously
trigger a context switch once the critical section is left.
2020-07-03 19:32:34 +02:00
Tom
2a38cc9a12 Kernel: Add a quickmap region for each processor
Threads need to be able to concurrently quickmap things.
2020-07-01 12:07:01 +02:00
Tom
16783bd14d Kernel: Turn Thread::current and Process::current into functions
This allows us to query the current thread and process on a
per processor basis
2020-07-01 12:07:01 +02:00
Tom
d98edb3171 Kernel: List all CPUs in /proc/cpuinfo 2020-07-01 12:07:01 +02:00
Tom
fb41d89384 Kernel: Implement software context switching and Processor structure
Moving certain globals into a new Processor structure for
each CPU allows us to eventually run an instance of the
scheduler on each CPU.
2020-07-01 12:07:01 +02:00
Peter Elliott
e1aef94a40 Kernel: Make Random work on CPUs without rdrand
- If rdseed is not available, fallback to rdrand.
- If rdrand is not available, block for entropy, or use insecure prng
  depending on if user wants fast or good random.
2020-06-27 19:40:33 +02:00
Tom
841364b609 Kernel: Add mechanism to identity map the lowest 2MB 2020-06-04 18:15:23 +02:00
etaIneLp
826dc94187 Kernel: Create page structures correctly in boot.s 2020-05-26 09:50:12 +02:00
Andreas Kling
e870b936c3 Kernel: Add non-const version of TypedMapping::operator->() 2020-05-23 15:57:19 +02:00
Andreas Kling
4b847810bf Kernel: Simplify scanning BIOS/EBDA and MP parser initialization
Add a MappedROM::find_chunk_starting_with() helper since that's a very
common usage pattern in clients of this code.

Also convert MultiProcessorParser from a persistent singleton object
to a temporary object constructed via a failable factory function.
2020-05-22 13:36:57 +02:00
Andreas Kling
84b7bc5e14 Kernel: Add convenient ways to map whole BIOS and EBDA into memory
This patch adds a MappedROM abstraction to the Kernel VM subsystem.
It's basically the read-only byte buffer equivalent of a TypedMapping.

We use this in the ACPI and MP table parsers to scan for interesting
stuff in low memory instead of doing a bunch of address arithmetic.
2020-05-22 13:17:38 +02:00
Sergey Bugaev
746db0bedb Kernel: Validate access to whole regions 2020-05-20 14:11:13 +02:00
Sergey Bugaev
0dd68a2949 Kernel: Look for a user region first
We're far more likely to be looking for a user region than otherwise, so
optimize for that case.
2020-05-20 14:11:13 +02:00
Andreas Kling
21d5f4ada1 Kernel: Absorb LibBareMetal back into the kernel
This was supposed to be the foundation for some kind of pre-kernel
environment, but nobody is working on it right now, so let's move
everything back into the kernel and remove all the confusion.
2020-05-16 12:00:04 +02:00
Sergey Bugaev
450a2a0f9c Build: Switch to CMake :^)
Closes https://github.com/SerenityOS/serenity/issues/2080
2020-05-14 20:15:18 +02:00
Andreas Kling
85a3678b4f Kernel: Assert on startup if we don't find any physical pages
Instead of checking this on every page allocation, just check it once
on startup. :^)
2020-05-08 22:15:02 +02:00
Andreas Kling
55f61c0004 Kernel: Add for_each_vmobject_of_type<T>
This makes iterating over a specific type of VMObjects a bit nicer.
2020-05-08 22:10:47 +02:00
Andreas Kling
042b1f6814 Kernel: Propagate failure to commit VM regions in more places
Ultimately we should not panic just because we can't fully commit a VM
region (by populating it with physical pages.)

This patch handles some of the situations where commit() can fail.
2020-05-08 21:47:08 +02:00
Andreas Kling
d74650e80d Kernel: Use NonnullRefPtrVector<T> instead of Vector<RefPtr<T>> some 2020-05-08 21:12:16 +02:00
Andreas Kling
beaec6bd2d Kernel: Memory purging was incorrectly "purging" the shared zero page
This caused us to report one purged page per occurrence of the shared
zero page in a purgeable memory region, despite it being a no-op.

Thanks to Sergey for spotting the bad assertion removal that led to
this being found!
2020-05-07 09:44:41 +02:00
Andreas Kling
6fe83b0ac4 Kernel: Crash the current process on OOM (instead of panicking kernel)
This patch adds PageFaultResponse::OutOfMemory which informs the fault
handler that we were unable to allocate a necessary physical page and
cannot continue.

In response to this, the kernel will crash the current process. Because
we are OOM, we can't symbolicate the crash like we normally would
(since the ELF symbolication code needs to allocate), so we also
communicate to Process::crash() that we're out of memory.

Now we can survive "allocate 300 MB" (only the allocate process dies.)
This is definitely not perfect and can easily end up killing a random
innocent other process who happened to allocate one page at the wrong
time, but it's a *lot* better than panicking on OOM. :^)
2020-05-06 22:28:23 +02:00
Andreas Kling
c633c1c2ea Kernel: Assert on OOM in Region::commit()
This function has a lot of callers that don't bother checking if it
returns successfully or not. We'll need to handle failure in a bunch
of places and then we can remove this assertion.
2020-05-06 22:28:23 +02:00
Andreas Kling
43593455db Kernel: Don't assert on OOM in allocate_user_physical_page()
We now give callers a chance to react to OOM situations.
2020-05-06 22:28:23 +02:00
Andreas Kling
4419685b7e Kernel: Leave VMObject alone on OOM during CoW fault
If we OOM during a CoW fault and fail to allocate a new page for the
writing process, just leave the original VMObject alone so everyone
else can keep using it.
2020-04-28 17:05:14 +02:00
Andreas Kling
9c856811b2 Kernel: Add Region helpers for accessing underlying physical pages
Since a Region is basically a view into a potentially larger VMObject,
it was always necessary to include the Region starting offset when
accessing its underlying physical pages.

Until now, you had to do that manually, but this patch adds a simple
Region::physical_page() for read-only access and a physical_page_slot()
when you want a mutable reference to the RefPtr<PhysicalPage> itself.

A lot of code is simplified by making use of this.
2020-04-28 17:05:14 +02:00
Andreas Kling
1d43544e08 Kernel: Switch the first-8MB-of-upper-3GB pseudo mappings to 4KB pages
This memory range was set up using 2MB pages by the code in boot.S.
Because of that, the kernel image protection code didn't work, since it
assumed 4KB pages.

We now switch to 4KB pages during MemoryManager initialization. This
makes the kernel image protection code work correctly again. :^)
2020-04-13 22:35:37 +02:00
Itamar
b306ac9b2b ptrace: Add PT_POKE
PT_POKE writes a single word to the tracee's address space.

Some caveats:
- If the user requests to write to an address in a read-only region, we
temporarily change the page's protections to allow it.

- If the user requests to write to a region that's backed by a
SharedInodeVMObject, we replace the vmobject with a PrivateIndoeVMObject.
2020-04-13 00:53:22 +02:00
Andreas Kling
c19b56dc99 Kernel+LibC: Add minherit() and MAP_INHERIT_ZERO
This patch adds the minherit() syscall originally invented by OpenBSD.
Only the MAP_INHERIT_ZERO mode is supported for now. If set on an mmap
region, that region will be zeroed out on fork().
2020-04-12 20:22:26 +02:00
Andreas Kling
f614f0e2cb Kernel: Add typed_map<T>(PhysicalAddress) and use it in ACPI parsing
There was a frequently occurring pattern of "map this physical address
into kernel VM, then read from it, then unmap it again".

This new typed_map() encapsulates that logic by giving you back a
typed pointer to the kind of structure you're interested in accessing.

It returns a TypedMapping<T> that can be used mostly like a pointer.
When destroyed, the TypedMapping object will unmap the memory. :^)
2020-04-09 17:19:11 +02:00
Andreas Kling
522d8c5d71 Kernel: Non-readable-but-writable regions should still be mapped
Fixes #1436.
2020-04-03 10:10:56 +02:00
Andreas Kling
7d862dd5fc AK: Reduce header dependency graph of String.h
String.h no longer pulls in StringView.h. We do this by moving a bunch
of String functions out-of-line.
2020-03-23 13:48:44 +01:00
Shannon Booth
81adefef27 Kernel: Run clang-format on files
Let's rip off the band-aid
2020-03-22 01:22:32 +01:00
Liav A
d6e122fd3a Kernel: Allow contiguous allocations in physical memory
For that, we have a new type of VMObject, called
ContiguousVMObject, that is responsible for allocating contiguous
physical pages.
2020-03-08 14:13:30 +01:00
Andreas Kling
fa9fba6901 Kernel: Add missing #includes now that <AK/StdLibExtras.h> is smaller 2020-03-08 13:06:51 +01:00
Andreas Kling
b1058b33fb AK: Add global FlatPtr typedef. It's u32 or u64, based on sizeof(void*)
Use this instead of uintptr_t throughout the codebase. This makes it
possible to pass a FlatPtr to something that has u32 and u64 overloads.
2020-03-08 13:06:51 +01:00
Andreas Kling
7a6c4a72d5 LibWeb: Move everything into the Web namespace 2020-03-07 10:27:02 +01:00
Andreas Kling
c6693f9b3a Kernel: Simplify a bunch of dbg() and klog() calls
LogStream can handle VirtualAddress and PhysicalAddress directly.
2020-03-06 15:00:44 +01:00
Andreas Kling
94f287b1c0 Kernel: Unmap non-readable pages
This was caught by running all crash tests with "crash -A".
Basically, non-readable pages need to not be mapped *at all* so that
a "page not present" exception is provoked on access.

Unfortunately x86 does not support write-only mappings, so this is
the best we can do.

Fixes #1336.
2020-03-06 09:58:59 +01:00
Liav A
0fc60e41dd Kernel: Use klog() instead of kprintf()
Also, duplicate data in dbg() and klog() calls were removed.
In addition, leakage of virtual address to kernel log is prevented.
This is done by replacing kprintf() calls to dbg() calls with the
leaked data instead.
Also, other kprintf() calls were replaced with klog().
2020-03-02 22:23:39 +01:00
Andreas Kling
918ebabf60 Kernel: MemoryManager should create cacheable regions by default 2020-03-02 13:04:17 +01:00
Andreas Kling
ecdd9a5bc6 Kernel: Reduce code duplication a little bit in Region allocation
This patch reduces the number of code paths that lead to the allocation
of a Region object. It's quite hard to follow the various ways in which
this can happen, so this is an effort to simplify.
2020-03-01 15:56:23 +01:00
Andreas Kling
5e0c4d689f Kernel: Move ProcessPagingScope to its own files 2020-03-01 15:38:09 +01:00
Andreas Kling
fee20bd8de Kernel: Remove some more harmless InodeVMObject miscasts 2020-03-01 12:27:03 +01:00
Andreas Kling
b614462079 Kernel: Include the dirty bits when cloning an InodeVMObject
Now that (private) InodeVMObjects can be CoW-cloned on fork(), we need
to make sure we clone the dirty bits as well.
2020-03-01 12:11:50 +01:00
Andreas Kling
48bbfe51fb Kernel: Add some InodeVMObject type assertions in Region::clone()
Let's make sure that we're never cloning shared inode-backed objects
as if they were private, and vice versa.
2020-03-01 11:23:10 +01:00
Andreas Kling
88b334135b Kernel: Remove some Region construction helpers
It's now up to the caller to provide a VMObject when constructing a new
Region object. This will make it easier to handle things going wrong,
like allocation failures, etc.
2020-03-01 11:23:10 +01:00
Andreas Kling
fddc3c957b Kernel: CoW-clone private inode-backed memory regions on fork()
When forking a process, we now turn all of the private inode-backed
mmap() regions into copy-on-write regions in both the parent and child.

This patch also removes an assertion that becomes irrelevant.
2020-03-01 11:23:10 +01:00
Andreas Kling
7cd1bdfd81 Kernel: Simplify some dbg() logging
We don't have to log the process name/PID/TID, dbg() automatically adds
that as a prefix to every line.

Also we don't have to do .characters() on Strings passed to dbg() :^)
2020-02-29 13:39:06 +01:00
Andreas Kling
5f7056d62c Kernel: Expose the VMObject type of each Region in /proc/PID/vm 2020-02-28 23:25:40 +01:00
Andreas Kling
aa1e209845 Kernel: Remove some unnecessary indirection in InodeFile::mmap()
InodeFile now directly calls Process::allocate_region_with_vmobject()
instead of taking an awkward detour via a special Region constructor.
2020-02-28 20:29:14 +01:00
Andreas Kling
651417a085 Kernel: Split InodeVMObject into two subclasses
We now have PrivateInodeVMObject and SharedInodeVMObject, corresponding
to MAP_PRIVATE and MAP_SHARED respectively.

Note that PrivateInodeVMObject is not used yet.
2020-02-28 20:20:35 +01:00
Andreas Kling
07a26aece3 Kernel: Rename InodeVMObject => SharedInodeVMObject 2020-02-28 20:07:51 +01:00
Liav A
d16b26f83a MemoryManager: Use dbg() instead of dbgprintf() 2020-02-27 13:05:12 +01:00
Liav A
42665817d1 RangeAllocator: Use dbg() instead of dbgprintf() 2020-02-27 13:05:12 +01:00
Liav A
3f2d5f2774 PhysicalPage: Use dbg() instead of dbgprintf() 2020-02-27 13:05:12 +01:00
Liav A
24d2aeda8e Region: Use dbg() instead of dbgprintf() 2020-02-27 13:05:12 +01:00
Liav A
3f95a7fc97 InodeVMObject: Use dbg() instead of dbgprintf() 2020-02-27 13:05:12 +01:00
Liav A
62adbbc598 PageDirectory: Use dbg() instead of dbgprintf() 2020-02-27 13:05:12 +01:00
Andreas Kling
ceec1a7d38 AK: Make Vector use size_t for its size and capacity 2020-02-25 14:52:35 +01:00
Andreas Kling
30a8991dbf Kernel: Make Region weakable and use WeakPtr<Region> instead of Region*
This turns use-after-free bugs into null pointer dereferences instead.
2020-02-24 13:32:45 +01:00
Andreas Kling
0763f67043 AK: Make Bitmap use size_t for its size
Also rework its API's to return Optional<size_t> instead of int with -1
as the error value.
2020-02-24 09:56:07 +01:00
Andreas Kling
7ec758773c Kernel: Dump all kernel regions when we hit a page fault during IRQ
This way you can try to figure out what the faulting address is.
2020-02-23 11:10:52 +01:00
Andreas Kling
f020081a38 Kernel: Put "Couldn't find user region" spam behind MM_DEBUG
This basically never tells us anything actionable anyway, and it's a
real annoyance when doing something validation-heavy like profiling.
2020-02-22 10:09:54 +01:00
Andreas Kling
b298c01e92 Kernel: Log instead of crashing when getting a page fault during IRQ
This is definitely a bug, but it seems to happen randomly every now
and then and we need more info to track it down, so let's log for now.
2020-02-21 19:05:45 +01:00
Andreas Kling
04e40da188 Kernel: Fix crash when reading /proc/PID/vmobjects
InodeVMObjects can have nulled-out physical page slots. That just means
we haven't cached that page from disk right now.
2020-02-21 16:03:56 +01:00
Andreas Kling
59b9e49bcd Kernel: Don't trigger page faults during profiling stack walk
The kernel sampling profiler will walk thread stacks during the timer
tick handler. Since it's not safe to trigger page faults during IRQ's,
we now avoid this by checking the page tables manually before accessing
each stack location.
2020-02-21 15:49:39 +01:00
Andreas Kling
d46071c08f Kernel: Assert on page fault during IRQ
We're not equipped to deal with page faults during an IRQ handler,
so add an assertion so we can immediately tell what's wrong.

This is why profiling sometimes hangs the system -- walking the stack
of the profiled thread causes a page fault and things fall apart.
2020-02-21 15:49:34 +01:00
Andreas Kling
a87544fe8b Kernel: Refuse to allocate 0 bytes of virtual address space 2020-02-19 22:19:55 +01:00
Andreas Kling
f17c377a0c Kernel: Use bitfields in Region
This makes Region 4 bytes smaller and we can use bitfield initializers
since they are allowed in C++20. :^)
2020-02-19 12:03:11 +01:00
Andreas Kling
4b16ac0034 Kernel: Purging a page should point it back to the shared zero page
Anonymous VM objects should never have null entries in their physical
page list. Instead, "empty" or untouched pages should refer to the
shared zero page.

Fixes #1237.
2020-02-18 09:56:11 +01:00
Andreas Kling
48f7c28a5c Kernel: Replace "current" with Thread::current and Process::current
Suggested by Sergey. The currently running Thread and Process are now
Thread::current and Process::current respectively. :^)
2020-02-17 15:04:27 +01:00
Andreas Kling
31e1af732f Kernel+LibC: Allow sys$mmap() callers to specify address alignment
This is exposed via the non-standard serenity_mmap() call in userspace.
2020-02-16 12:55:56 +01:00
Andreas Kling
7533d61458 Kernel: Fix weird whitespace mistake in RangeAllocator 2020-02-16 08:01:33 +01:00
Andreas Kling
635ae70b8f Kernel: More header dependency reduction work 2020-02-16 02:15:33 +01:00
Andreas Kling
e28809a996 Kernel: Add forward declaration header 2020-02-16 01:50:32 +01:00
Andreas Kling
1d611e4a11 Kernel: Reduce header dependencies of MemoryManager and Region 2020-02-16 01:33:41 +01:00
Andreas Kling
a356e48150 Kernel: Move all code into the Kernel namespace 2020-02-16 01:27:42 +01:00
Andreas Kling
5507945306 Kernel: Widen PhysicalPage refcount to 32 bits
A 16-bit refcount is just begging for trouble right nowl.
A 32-bit refcount will be begging for trouble later down the line,
so we'll have to revisit this eventually. :^)
2020-02-15 22:34:48 +01:00
Andreas Kling
c624d3875e Kernel: Use a shared physical page for zero-filled pages until written
This patch adds a globally shared zero-filled PhysicalPage that will
be mapped into every slot of every zero-filled AnonymousVMObject until
that page is written to, achieving CoW-like zero-filled pages.

Initial testing show that this doesn't actually achieve any sharing yet
but it seems like a good design regardless, since it may reduce the
number of page faults taken by programs.

If you look at the refcount of MM.shared_zero_page() it will have quite
a high refcount, but that's just because everything maps it everywhere.
If you want to see the "real" refcount, you can build with the
MAP_SHARED_ZERO_PAGE_LAZILY flag, and we'll defer mapping of the shared
zero page until the first NP read fault.

I've left this behavior behind a flag for future testing of this code.
2020-02-15 13:17:40 +01:00
Andreas Kling
27f0102bbe Kernel: Add getter and setter for the X86 CR3 register
This gets rid of a bunch of inline assembly.
2020-02-10 20:00:32 +01:00
Andreas Kling
ccfee3e573 Kernel: Remove more <LibBareMetal/Output/kstdio.h> includes 2020-02-10 12:07:48 +01:00
Andreas Kling
6cbd72f54f AK: Remove bitrotted Traits::dump() mechanism
This was only used by HashTable::dump() which I used when doing the
first HashTable implementation. Removing this allows us to also remove
most includes of <AK/kstdio.h>.
2020-02-10 11:55:34 +01:00
Liav A
99ea80695e Kernel: Use VirtualAddress & PhysicalAddress classes from LibBareMetal 2020-02-09 19:38:17 +01:00
Liav A
e559af2008 Kernel: Apply changes to use LibBareMetal definitions 2020-02-09 19:38:17 +01:00
Andreas Kling
00d8ec3ead Kernel: The inode fault handler should grab the VMObject lock earlier
It doesn't look healthy to create raw references into an array before
a temporary unlock. In fact, that temporary unlock looks generally
unhealthy, but it's a different problem.
2020-02-08 12:55:21 +01:00
Andreas Kling
a9d7902bb7 x86: Simplify region unmapping a bit
Add PageTableEntry::clear() to zero out a whole PTE, and use that for
unmapping instead of clearing individual fields.
2020-02-08 12:49:38 +01:00
Andreas Kling
f91b3aab47 Kernel: Cloned shared regions should also be marked as shared 2020-02-08 02:39:46 +01:00
Andreas Kling
bf5b7c32d8 Kernel: Add some sanity assertions in RangeAllocator::deallocate()
We should never end up deallocating an empty range, or a range that
ends before it begins.
2020-01-30 21:51:27 +01:00
Andreas Kling
31a141bd10 Kernel: Range::contains() should reject ranges with 2^32 wrap-around 2020-01-30 21:51:27 +01:00
Andreas Kling
a27c5d2fb7 Kernel: Fail with EFAULT for any address+size that would wrap around
Previously we were only checking that each of the virtual pages in the
specified range were valid.

This made it possible to pass in negative buffer sizes to some syscalls
as long as (address) and (address+size) were on the same page.
2020-01-29 12:56:07 +01:00
Andreas Kling
c17f80e720 Kernel: AnonymousVMObject::create_for_physical_range() should fail more
Previously it was not possible for this function to fail. You could
exploit this by triggering the creation of a VMObject whose physical
memory range would wrap around the 32-bit limit.

It was quite easy to map kernel memory into userspace and read/write
whatever you wanted in it.

Test: Kernel/bxvga-mmap-kernel-into-userspace.cpp
2020-01-28 20:48:07 +01:00
Andreas Kling
8131875da6 Kernel: Remove outdated comment in MemoryManager
Regions *do* zero-fill on demand now. :^)
2020-01-28 10:28:04 +01:00
Andreas Kling
3de5439579 AK: Let's call decrementing reference counts "unref" instead of "deref"
It always bothered me that we're using the overloaded "dereference"
term for this. Let's call it "unreference" instead. :^)
2020-01-23 15:14:21 +01:00
Andreas Kling
f38cfb3562 Kernel: Tidy up debug logging a little bit
When using dbg() in the kernel, the output is automatically prefixed
with [Process(PID:TID)]. This makes it a lot easier to understand which
thread is generating the output.

This patch also cleans up some common logging messages and removes the
now-unnecessary "dbg() << *current << ..." pattern.
2020-01-21 16:16:20 +01:00
Liav A
200a5b0649 Kernel: Remove map_for_kernel() in MemoryManager
We don't need to have this method anymore. It was a hack that was used
in many components in the system but currently we use better methods to
create virtual memory mappings. To prevent any further use of this
method it's best to just remove it completely.

Also, the APIC code is disabled for now since it doesn't help booting
the system, and is broken since it relies on identity mapping to exist
in the first 1MB. Any call to the APIC code will result in assertion
failed.

In addition to that, the name of the method which is responsible to
create an identity mapping between 1MB to 2MB was changed, to be more
precise about its purpose.
2020-01-21 11:29:58 +01:00
Andreas Kling
a0b716cfc5 Add AnonymousVMObject::create_with_physical_page()
This can be used to create a VMObject for a single PhysicalPage.
2020-01-20 13:13:03 +01:00
Andreas Kling
4ebff10bde Kernel: Write-only regions should still be mapped as present
There is no real "read protection" on x86, so we have no choice but to
map write-only pages simply as "present & read/write".

If we get a read page fault in a non-readable region, that's still a
correctness issue, so we crash the process. It's by no means a complete
protection against invalid reads, since it's trivial to fool the kernel
by first causing a write fault in the same region.
2020-01-20 13:13:03 +01:00
Andreas Kling
4b7a89911c Kernel: Remove some unnecessary casts to uintptr_t
VirtualAddress is constructible from uintptr_t and const void*.
PhysicalAddress is constructible from uintptr_t but not const void*.
2020-01-20 13:13:03 +01:00
Andreas Kling
a246e9cd7e Use uintptr_t instead of u32 when storing pointers as integers
uintptr_t is 32-bit or 64-bit depending on the target platform.
This will help us write pointer size agnostic code so that when the day
comes that we want to do a 64-bit port, we'll be in better shape.
2020-01-20 13:13:03 +01:00
Andreas Kling
05836757c6 Kernel: Oops, fix bad sort order of available VM ranges
This made the allocator perform worse, so here's another second off of
the Kernel/Process.cpp compile time from a simple bugfix! (31s to 30s)
2020-01-19 15:53:43 +01:00
Andreas Kling
6eab7b398d Kernel: Make ProcessPagingScope restore CR3 properly
Instead of restoring CR3 to the current process's paging scope when a
ProcessPagingScope goes out of scope, we now restore exactly whatever
the CR3 value was when we created the ProcessPagingScope.

This fixes breakage in situations where a process ends up with nested
ProcessPagingScopes. This was making profiling very fragile, and with
this change it's now possible to profile g++! :^)
2020-01-19 13:44:53 +01:00
Andreas Kling
ad3f931707 Kernel: Optimize VM range deallocation a bit
Previously, when deallocating a range of VM, we would sort and merge
the range list. This was quite slow for large processes.

This patch optimizes VM deallocation in the following ways:

- Use binary search instead of linear scan to find the place to insert
  the deallocated range.

- Insert at the right place immediately, removing the need to sort.

- Merge the inserted range with any adjacent range(s) in-line instead
  of doing a separate merge pass into a list copy.

- Add Traits<Range> to inform Vector that Range objects are trivial
  and can be moved using memmove().

I've also added an assertion that deallocated ranges are actually part
of the RangeAllocator's initial address range.

I've benchmarked this using g++ to compile Kernel/Process.cpp.
With these changes, compilation goes from ~41 sec to ~35 sec.
2020-01-19 13:29:59 +01:00
Andreas Kling
f7b394e9a1 Kernel: Assert that copy_to/from_user() are called with user addresses
This will panic the kernel immediately if these functions are misused
so we can catch it and fix the misuse.

This patch fixes a couple of misuses:

    - create_signal_trampolines() writes to a user-accessible page
      above the 3GB address mark. We should really get rid of this
      page but that's a whole other thing.

    - CoW faults need to use copy_from_user rather than copy_to_user
      since it's the *source* pointer that points to user memory.

    - Inode faults need to use memcpy rather than copy_to_user since
      we're copying a kernel stack buffer into a quickmapped page.

This should make the copy_to/from_user() functions slightly less useful
for exploitation. Before this, they were essentially just glorified
memcpy() with SMAP disabled. :^)
2020-01-19 09:18:55 +01:00
Andreas Kling
2cd212e5df Kernel: Let's say that everything < 3GB is user virtual memory
Technically the bottom 2MB is still identity-mapped for the kernel and
not made available to userspace at all, but for simplicity's sake we
can just ignore that and make "address < 0xc0000000" the canonical
check for user/kernel.
2020-01-19 08:58:33 +01:00
Andreas Kling
862b3ccb4e Kernel: Enforce W^X between sys$mmap() and sys$execve()
It's now an error to sys$mmap() a file as writable if it's currently
mapped executable by anyone else.

It's also an error to sys$execve() a file that's currently mapped
writable by anyone else.

This fixes a race condition vulnerability where one program could make
modifications to an executable while another process was in the kernel,
in the middle of exec'ing the same executable.

Test: Kernel/elf-execve-mmap-race.cpp
2020-01-18 23:40:12 +01:00
Andreas Kling
6fea316611 Kernel: Move all CPU feature initialization into cpu_setup()
..and do it very very early in boot.
2020-01-18 10:11:29 +01:00
Andreas Kling
94ca55cefd Meta: Add license header to source files
As suggested by Joshua, this commit adds the 2-clause BSD license as a
comment block to the top of every source file.

For the first pass, I've just added myself for simplicity. I encourage
everyone to add themselves as copyright holders of any file they've
added or modified in some significant way. If I've added myself in
error somewhere, feel free to replace it with the appropriate copyright
holder instead.

Going forward, all new source files should include a license header.
2020-01-18 09:45:54 +01:00
Andreas Kling
19c31d1617 Kernel: Always dump kernel regions when dumping process regions 2020-01-18 08:57:18 +01:00
Andreas Kling
345f92d5ac Kernel: Remove two unused MemoryManager functions 2020-01-18 08:57:18 +01:00
Andreas Kling
3e8b60c618 Kernel: Clean up MemoryManager initialization a bit more
Move the CPU feature enabling to functions in Arch/i386/CPU.cpp.
2020-01-18 00:28:16 +01:00
Andreas Kling
a850a89c1b Kernel: Add a random offset to the base of the per-process VM allocator
This is not ASLR, but it does de-trivialize exploiting the ELF loader
which would previously always parse executables at 0x01001000 in every
single exec(). I've taken advantage of this multiple times in my own
toy exploits and it's starting to feel cheesy. :^)
2020-01-17 23:29:54 +01:00
Andreas Kling
536c0ff3ee Kernel: Only clone the bottom 2MB of mappings from kernel to processes 2020-01-17 22:34:36 +01:00
Andreas Kling
122c76d7fa Kernel: Don't allocate per-process PDPT from super pages either
The default system is now down to 3 super pages allocated on boot. :^)
2020-01-17 22:34:36 +01:00
Andreas Kling
ad1f79fb4a Kernel: Stop allocating page tables from the super pages pool
We now use the regular "user" physical pages for on-demand page table
allocations. This was by far the biggest source of super physical page
exhaustion, so that bug should be a thing of the past now. :^)

We still have super pages, but they are barely used. They remain useful
for code that requires memory with a low physical address.

Fixes #1000.
2020-01-17 22:34:36 +01:00
Andreas Kling
f71fc88393 Kernel: Re-enable protection of the kernel image in memory 2020-01-17 22:34:36 +01:00
Andreas Kling
59b584d983 Kernel: Tidy up the lowest part of the address space
After MemoryManager initialization, we now only leave the lowest 1MB
of memory identity-mapped. The very first (null) page is not present.
All other pages are RW but not X. Supervisor only.
2020-01-17 22:34:36 +01:00
Andreas Kling
545ec578b3 Kernel: Tidy up the types imported from boot.S a little bit 2020-01-17 22:34:36 +01:00
Andreas Kling
7e6f0efe7c Kernel: Move Multiboot memory map parsing to its own function 2020-01-17 22:34:36 +01:00
Andreas Kling
ba8275a48e Kernel: Clean up ensure_pte() 2020-01-17 22:34:36 +01:00
Andreas Kling
e362b56b4f Kernel: Move kernel above the 3GB virtual address mark
The kernel and its static data structures are no longer identity-mapped
in the bottom 8MB of the address space, but instead move above 3GB.

The first 8MB above 3GB are pseudo-identity-mapped to the bottom 8MB of
the physical address space. But things don't have to stay this way!

Thanks to Jesse who made an earlier attempt at this, it was really easy
to get device drivers working once the page tables were in place! :^)

Fixes #734.
2020-01-17 22:34:26 +01:00
Liav A
d2b41010c5 Kernel: Change Region allocation helpers
We now can create a cacheable Region, so when map() is called, if a
Region is cacheable then all the virtual memory space being allocated
to it will be marked as not cache disabled.

In addition to that, OS components can create a Region that will be
mapped to a specific physical address by using the appropriate helper
method.
2020-01-14 15:38:58 +01:00
Andreas Kling
5c3c2a9bac Kernel: Copy Region's "is_mmap" flag when cloning regions for fork()
Otherwise child processes will not be allowed to munmap(), madvise(),
etc. on the cloned regions!
2020-01-10 19:24:01 +01:00
Andreas Kling
62c45850e1 Kernel: Page allocation should not use memset_user() when zeroing
We're not zeroing new pages through a userspace address, so this should
not use memset_user().
2020-01-10 10:57:33 +01:00
Andreas Kling
197e73ee31 Kernel+LibELF: Enable SMAP protection during non-syscall exec()
When loading a new executable, we now map the ELF image in kernel-only
memory and parse it there. Then we use copy_to_user() when initializing
writable regions with data from the executable.

Note that the exec() syscall still disables SMAP protection and will
require additional work. This patch only affects kernel-originated
process spawns.
2020-01-10 10:57:06 +01:00
Andreas Kling
8e7420ddf2 Kernel: Harden memory mapping of the kernel image
We now map the kernel's text and rodata segments read+execute.
We also make the data and bss segments non-executable.

Thanks to q3k for the idea! :^)
2020-01-06 13:55:39 +01:00
Andreas Kling
9eef39d68a Kernel: Start implementing x86 SMAP support
Supervisor Mode Access Prevention (SMAP) is an x86 CPU feature that
prevents the kernel from accessing userspace memory. With SMAP enabled,
trying to read/write a userspace memory address while in the kernel
will now generate a page fault.

Since it's sometimes necessary to read/write userspace memory, there
are two new instructions that quickly switch the protection on/off:
STAC (disables protection) and CLAC (enables protection.)
These are exposed in kernel code via the stac() and clac() helpers.

There's also a SmapDisabler RAII object that can be used to ensure
that you don't forget to re-enable protection before returning to
userspace code.

THis patch also adds copy_to_user(), copy_from_user() and memset_user()
which are the "correct" way of doing things. These functions allow us
to briefly disable protection for a specific purpose, and then turn it
back on immediately after it's done. Going forward all kernel code
should be moved to using these and all uses of SmapDisabler are to be
considered FIXME's.

Note that we're not realizing the full potential of this feature since
I've used SmapDisabler quite liberally in this initial bring-up patch.
2020-01-05 18:14:51 +01:00
Andreas Kling
aba7829724 Kernel: InodeVMObject can't call Inode::size() with interrupts disabled
Inode::size() may try to take a lock, so we can't be calling it with
interrupts disabled.

This fixes a kernel hang when trying to execute a binary in a TmpFS.
2020-01-03 15:40:03 +01:00
Andreas Kling
0f9800ca57 Kernel: Make the loop that marks the bottom 1MB NX a little less busy 2020-01-02 22:02:29 +01:00
Andreas Kling
32ec1e5aed Kernel: Mask kernel addresses in backtraces and profiles
Addresses outside the userspace virtual range will now show up as
0xdeadc0de in backtraces and profiles generated by unprivileged users.
2020-01-02 20:51:31 +01:00
Andreas Kling
3dcec260ed Kernel: Validate the full range of user memory passed to syscalls
We now validate the full range of userspace memory passed into syscalls
instead of just checking that the first and last byte of the memory are
in process-owned regions.

This fixes an issue where it was possible to avoid rejection of invalid
addresses that sat between two valid ones, simply by passing a valid
address and a size large enough to put the end of the range at another
valid address.

I added a little test utility that tries to provoke EFAULT in various
ways to help verify this. I'm sure we can think of more ways to test
this but it's at least a start. :^)

Thanks to mozjag for pointing out that this code was still lacking!

Incidentally this also makes backtraces work again.

Fixes #989.
2020-01-02 02:17:12 +01:00
Andreas Kling
ea1911b561 Kernel: Share code between Region::map() and Region::remap_page()
These were doing mostly the same things, so let's just share the code.
2020-01-01 19:32:55 +01:00
Andreas Kling
5aeaab601e Kernel: Move CPU feature detection to Arch/x86/CPU.{cpp.h}
We now refuse to boot on machines that don't support PAE since all
of our paging code depends on it.

Also let's only enable SSE and PGE support if the CPU advertises it.
2020-01-01 12:57:00 +01:00
Andreas Kling
8602fa5b49 Kernel: Enable x86 SMEP (Supervisor Mode Execution Protection)
This prevents the kernel from jumping to code in userspace memory.
2020-01-01 01:59:52 +01:00
Andreas Kling
c9ec415e2f Kernel: Always reject never-userspace addresses before checking regions
At the moment, addresses below 8MB and above 3GB are never accessible
to userspace, so just reject them without even looking at the current
process's memory regions.
2019-12-31 03:45:54 +01:00
Andreas Kling
66d5ebafa6 Kernel: Let's also not consider kernel regions to be valid user stacks
This one is less obviously exploitable than the previous one, but still
a bug nonetheless.
2019-12-31 00:28:14 +01:00
Andreas Kling
0fc24fe256 Kernel: User pointer validation should reject kernel-only addresses
We were happily allowing syscalls with pointers into kernel-only
regions (virtual address >= 0xc0000000).

This patch fixes that by only considering user regions in the current
process, and also double-checking the Region::is_user_accessible() flag
before approving an access.

Thanks to Fire30 for finding the bug! :^)
2019-12-31 00:24:35 +01:00
Andreas Kling
1f31156173 Kernel: Add a mode flag to sys$purge and allow purging clean inodes 2019-12-29 13:16:53 +01:00
Andreas Kling
c74cde918a Kernel+SystemMonitor: Expose amount of per-process clean inode memory
This is memory that's loaded from an inode (file) but not modified in
memory, so still identical to what's on disk. This kind of memory can
be freed and reloaded transparently from disk if needed.
2019-12-29 12:45:58 +01:00
Andreas Kling
0d5e0e4cad Kernel+SystemMonitor: Expose amount of per-process dirty private memory
Dirty private memory is all memory in non-inode-backed mappings that's
process-private, meaning it's not shared with any other process.

This patch exposes that number via SystemMonitor, giving us an idea of
how much memory each process is responsible for all on its own.
2019-12-29 12:28:32 +01:00
Andreas Kling
c1f8291ce4 Kernel: When physical page allocation fails, try to purge something
Instead of panicking right away when we run out of physical pages,
we now try to find a PurgeableVMObject with some volatile pages in it.
If we find one, we purge that entire object and steal one of its pages.

This makes it possible for the kernel to keep going instead of dying.
Very cool. :^)
2019-12-26 11:45:36 +01:00
Conrad Pankoff
17aef7dc99 Kernel: Detect support for no-execute (NX) CPU features
Previously we assumed all hosts would have support for IA32_EFER.NXE.
This is mostly true for newer hardware, but older hardware will crash
and burn if you try to use this feature.

Now we check for support via CPUID.80000001[20].
2019-12-26 10:05:51 +01:00
Andreas Kling
9e55bcb7da Kernel: Make kernel memory regions be non-executable by default
From now on, you'll have to request executable memory specifically
if you want some.
2019-12-25 22:41:34 +01:00
Andreas Kling
0b7a2e0a5a Kernel: Set NX bit for virtual addresses 0-1MB and 2-8MB
This removes the ability to jump into kmalloc memory, etc.
Only the kernel image itself is allowed to exec, located between 1-2MB.
2019-12-25 22:24:28 +01:00
Andreas Kling
ce5f7f6c07 Kernel: Use the CPU's NX bit to enforce PROT_EXEC on memory mappings
Now that we have PAE support, we can ask the CPU to crash processes for
trying to execute non-executable memory. This is pretty cool! :^)
2019-12-25 13:35:57 +01:00
Andreas Kling
52deb09382 Kernel: Enable PAE (Physical Address Extension)
Introduce one more (CPU) indirection layer in the paging code: the page
directory pointer table (PDPT). Each PageDirectory now has 4 separate
PageDirectoryEntry arrays, governing 1 GB of VM each.

A really neat side-effect of this is that we can now share the physical
page containing the >=3GB kernel-only address space metadata between
all processes, instead of lazily cloning it on page faults.

This will give us access to the NX (No eXecute) bit, allowing us to
prevent execution of memory that's not supposed to be executed.
2019-12-25 13:35:57 +01:00
Andreas Kling
c087abc48d Kernel: Rename PageDirectory::find_by_pdb() => find_by_cr3()
I caught myself wondering what "pdb" stood for, so let's rename this
to something more obvious.
2019-12-25 02:58:03 +01:00
Andreas Kling
7a0088c4d2 Kernel: Clean up Region access bit setters a little 2019-12-25 02:58:03 +01:00
Andreas Kling
c9a5253ac2 Kernel: Uh, actually *actually* turn on CR4.PGE
I'm not sure how I managed to misread the location of this bit twice.
But I did! Here is finally the correct value, according to Intel:

    "Page Global Enable (bit 7 of CR4)"

Jeez! :^)
2019-12-25 02:58:03 +01:00
Andreas Kling
3623e35978 Kernel: Oops, actually enable CR4.PGE (page table global bit)
Turns out we were setting the wrong bit here. Now we will actually keep
kernel memory mappings in the TLB across context switches.
2019-12-24 22:45:27 +01:00
Andreas Kling
ae2d72377d Kernel: Enable the x86 WP bit to catch invalid memory writes in ring 0
Setting this bit will cause the CPU to generate a page fault when
writing to read-only memory, even if we're executing in the kernel.

Seemingly the only change needed to make this work was to have the
inode-backed page fault handler use a temporary mapping for writing
the read-from-disk data into the newly-allocated physical page.
2019-12-21 16:21:13 +01:00
Andreas Kling
62c2309336 Kernel: Fix some warnings about passing non-POD to kprintf 2019-12-20 20:19:46 +01:00
Andreas Kling
b6ee8a2c8d Kernel: Rename vmo => vmobject everywhere 2019-12-19 19:15:27 +01:00
Andreas Kling
1d4d6f16b2 Kernel: Add a specific-page variant of Region::commit() 2019-12-18 22:43:32 +01:00
Andreas Kling
0a75a46501 Kernel: Make sure the kernel info page is read-only for userspace
To enforce this, we create two separate mappings of the same underlying
physical page. A writable mapping for the kernel, and a read-only one
for userspace (the one returned by sys$get_kernel_info_page.)
2019-12-15 22:21:28 +01:00
Andreas Kling
931e4b7f5e Kernel+SystemMonitor: Prevent userspace access to process ELF image
Every process keeps its own ELF executable mapped in memory in case we
need to do symbol lookup (for backtraces, etc.)

Until now, it was mapped in a way that made it accessible to the
program, despite the program not having mapped it itself.
I don't really see a need for userspace to have access to this right
now, so let's lock things down a little bit.

This patch makes it inaccessible to userspace and exposes that fact
through /proc/PID/vm (per-region "user_accessible" flag.)
2019-12-15 20:11:57 +01:00
Andreas Kling
05a441afb2 Kernel: Don't turn private read-only regions into shared ones on fork
Even if they are read-only now, they can be mprotect(PROT_WRITE)'d in
the future, so we have to make sure they are CoW mapped.
2019-12-15 16:53:46 +01:00
Andreas Kling
3fbc50a350 Kernel+SystemMonitor: Expose the number of set CoW bits in each Region
This number tells us how many more pages in a given region will trigger
a CoW fault if written to.
2019-12-15 16:53:00 +01:00
Andreas Kling
9ad151c665 Kernel: Improve comment about the system virtual memory map a bit 2019-12-15 16:13:08 +01:00
Andreas Kling
65229a4082 Kernel: Move VMObject::for_each_region() to MemoryManager.h
It can't be in VMObject.h since it depends on MemoryManager.h
2019-12-09 20:06:03 +01:00
Andreas Kling
a22b7f96fc Kernel: Remap all regions referring to a PurgeableVMObject on purge
Otherwise we won't get page faults next time you try to access the
purged memory.
2019-12-09 20:05:04 +01:00
Andreas Kling
dbb644f20c Kernel: Start implementing purgeable memory support
It's now possible to get purgeable memory by using mmap(MAP_PURGEABLE).
Purgeable memory has a "volatile" flag that can be set using madvise():

- madvise(..., MADV_SET_VOLATILE)
- madvise(..., MADV_SET_NONVOLATILE)

When in the "volatile" state, the kernel may take away the underlying
physical memory pages at any time, without notifying the owner.
This gives you a guilt discount when caching very large things. :^)

Setting a purgeable region to non-volatile will return whether or not
the memory has been taken away by the kernel while being volatile.
Basically, if madvise(..., MADV_SET_NONVOLATILE) returns 1, that means
the memory was purged while volatile, and whatever was in that piece
of memory needs to be reconstructed before use.
2019-12-09 19:12:38 +01:00
Andreas Kling
05c65fb4f1 Kernel: Don't CoW non-writable pages
A page fault in a page marked for CoW should not trigger a CoW if the
page is non-writable. I think this makes sense.
2019-12-02 19:20:09 +01:00
Andreas Kling
f41ae755ec Kernel: Crash on memory access in non-readable regions
This patch makes it possible to make memory regions non-readable.
This is enforced using the "present" bit in the page tables.
A process that hits an not-present page fault in a non-readable
region will be crashed.
2019-12-02 19:18:52 +01:00
Andreas Kling
7dc9c90f83 Kernel: Fix bug where mprotect() would ignore setting PROT_WRITE
A typo in Region::set_writable() caused us to update the readable flag
rather than the writable flag.
2019-12-02 18:15:36 +01:00
Andreas Kling
cde0a1eeb5 Kernel: Put some debug spam behind PAGE_FAULT_DEBUG 2019-12-01 16:03:24 +01:00