This allows sys$mprotect() to honor the original readable & writable
flags of the open file description as they were at the point we did the
original sys$mmap().
IIUC, this is what Dr. POSIX wants us to do:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/mprotect.html
Also, remove the bogus and racy "W^X" checking we did against mappings
based on their current inode metadata. If we want to do this, we can do
it properly. For now, it was not only racy, but also did blocking I/O
while holding a spinlock.
We were holding the MM lock across all of the region unmapping code.
This was previously necessary since the quickmaps used during unmapping
required holding the MM lock.
Now that it's no longer necessary, we can leave the MM lock alone here.
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:
- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable
This patch renames the Kernel classes so that they can coexist with
the original AK classes:
- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable
The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
Region::physical_page() now takes the VMObject lock while accessing the
physical pages array, and returns a RefPtr<PhysicalPage>. This ensures
that the array access is safe.
Region::physical_page_slot() now VERIFY()'s that the VMObject lock is
held by the caller. Since we're returning a reference to the physical
page slot in the VMObject's physical page array, this is the best we
can do here.
We really only need the VMObject lock when accessing the physical pages
array, so once we have a strong pointer to the physical page we want to
remap, we can give up the VMObject lock.
This fixes a deadlock I encountered while building DOOM on SMP.
When handling a page fault, we only need to remap the faulting region in
the current process. There's no need to traverse *all* regions that map
the same VMObject and remap them cross-process as well.
Those other regions will get remapped lazily by their own page fault
handlers eventually. Or maybe they won't and we avoided some work. :^)
This patch move AddressSpace (the per-process memory manager) to using
the new atomic "place" APIs in RegionTree as well, just like we did for
MemoryManager in the previous commit.
This required updating quite a few places where VM allocation and
actually committing a Region object to the AddressSpace were separated
by other code.
All you have to do now is call into AddressSpace once and it'll take
care of everything for you.
Instead of first allocating the VM range, and then inserting a region
with that range into the MM region tree, we now do both things in a
single atomic operation:
- RegionTree::place_anywhere(Region&, size, alignment)
- RegionTree::place_specifically(Region&, address, size)
To reduce the number of things we do while locking the region tree,
we also require callers to provide a constructed Region object.
This patch ports MemoryManager to RegionTree as well. The biggest
difference between this and the userspace code is that kernel regions
are owned by extant OwnPtr<Region> objects spread around the kernel,
while userspace regions are owned by the AddressSpace itself.
For kernelspace, there are a couple of situations where we need to make
large VM reservations that never get backed by regular VMObjects
(for example the kernel image reservation, or the big kmalloc range.)
Since we can't make a VM reservation without a Region object anymore,
this patch adds a way to create unbacked Region objects that can be
used for this exact purpose. They have no internal VMObject.)
RegionTree holds an IntrusiveRedBlackTree of Region objects and vends a
set of APIs for allocating memory ranges.
It's used by AddressSpace at the moment, and will be used by MM soon.
If we crashed in the middle of mapping in Regions, some of the regions
may not have a page directory yet, and will result in a crash when
Region::remap() is called.
When a page fault led to the mapping of a new physical page, we were
updating the page tables for *every* region that shared the same
underlying VMObject.
Let's just not do that, avoiding a bunch of unnecessary page table
updates and TLB invalidations.
This allows us to enable Write-Combine on e.g. framebuffers,
significantly improving performance on bare metal.
To keep things simple we right now only use one of up to three bits
(bit 7 in the PTE), which maps to the PA4 entry in the PAT MSR, which
we set to the Write-Combine mode on each CPU at boot time.
We were already using a non-intrusive RedBlackTree, and since the kernel
regions tree is non-owning, this is a trivial conversion that makes a
bunch of the tree operations infallible (by being allocation-free.) :^)
When deleting an entire AddressSpace, we don't need to do TLB flushes
at all (since the entire page directory is going away anyway).
We also don't need to deallocate VM ranges one by one, since the entire
VM range allocator will be deleted anyway.
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.
Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
...and also RangeAllocator => VirtualRangeAllocator.
This clarifies that the ranges we're dealing with are *virtual* memory
ranges and not anything else.