Commit Graph

7546 Commits

Author SHA1 Message Date
Liav A
76aace6f19 Kernel: Move x86-specific init sequence code to the x86/Arch directory
The code in init.cpp is specific to the x86 initialization sequence, so
move it to the Arch/x86 directory in the same fashion like the aarch64
pattern.
2022-09-20 18:43:05 +01:00
Liav A
1b7b360ca1 Kernel: Move x86-specific IRQ controller code to Arch/x86 directory
The PIC and APIC code are specific to x86 platforms, so move them out of
the general Interrupts directory to Arch/x86/common/Interrupts directory
instead.
2022-09-20 18:43:05 +01:00
Liav A
aeef1c52bc Kernel: Move PCI IDE driver code to the Arch/x86 directory
That code heavily relies on x86-specific instructions, and while other
CPU architectures and platforms can have PCI IDE controllers, currently
we don't support those, so this code is a special case which needs to be
in the Arch/x86 directory.
In the future it could be put back to the original place when we make it
more generic and suitable for other platforms.
2022-09-20 18:43:05 +01:00
Liav A
8d6da9863f Kernel: Move x86 Bochs VBE code to the Arch/x86 directory
To do this, we make the QEMUDisplayConnector class more standalone so it
does not need to inherit from the BochsDisplayConnector class.
2022-09-20 18:43:05 +01:00
Liav A
c50a81e93e Kernel: Move x86-specific HID code to the Arch/x86 directory
The i8042 controller with its attached devices, the PS2 keyboard and
mouse, rely on x86-specific IO instructions to work. Therefore, move
them to the Arch/x86 directory to make it easier to omit the handling
code of these devices.
2022-09-20 18:43:05 +01:00
Liav A
948be9674a Kernel: Don't compile ISA IDE controller code in non-x86 builds
The ISA IDE controller code makes sense to be compiled in a x86 build as
it relies on access to the x86 IO space. For other architectures, we can
just omit the code as there's no way we can use that code again.
To ensure we can omit the code easily, we move it to the Arch/x86
directory.
2022-09-20 18:43:05 +01:00
Liav A
bb6f61ee5d Kernel/PCI: Convert PCI BAR number to a strong typed enum class 2022-09-20 18:43:05 +01:00
Liav A
f510c0ba04 Kernel: Remove stale includes of x86 IO header file
The AHCI code doesn't rely on x86 IO at all as it only uses memory
mapped IO so we can simply remove the header.

We also simply don't use x86 IO in the Intel graphics driver, so we can
simply remove the include of the x86 IO header there too.

Everything else was a bunch of stale includes to the x86 IO header and
are actually not necessary, so let's remove them to make it easier to
compile non-x86 Kernel builds.
2022-09-20 18:43:05 +01:00
Liav A
485d4e01ed Kernel: Move VMWare backdoor communication code to the x86 directory
The VMWare backdoor handling code involves many x86-specific
instructions and therefore should be in the Arch/x86 directory. This
ensures we can easily omit the code in compile-time for non-x86 builds.
2022-09-20 18:43:05 +01:00
Liav A
e39086f2c6 Kernel: Move PCI initialization x86-specific code to the arch directory
It seems more correct to let each platform to define its own sequence of
initialization of the PCI bus, so let's remove the #if flags and just
put the entire Initializer.cpp file in the appropriate code directory.
2022-09-20 18:43:05 +01:00
Liav A
5576151e68 Kernel: Don't blindly compile Bochs debug output code in ConsoleDevice
Only use the Bochs debug output if we compile a x86 build since bochs
debug output relies on x86 specific instructions.

We also remove the CONSOLE_OUT_TO_BOCHS_DEBUG_PORT flag as we always
compile bochs debug output for x86 builds and we always want to include
the bochs debug output capability as it is very handy and doesn't hurt
bare metal hardware or do any other problem besides taking a small
amount of CPU cycles.
2022-09-20 18:43:05 +01:00
Liav A
fdef8d0d37 Kernel: Move PCSpeaker code to the x86-specific architecture directory
The PCSpeaker code is specific to x86 platforms, thus it makes sense to
put in the Arch/x86 subdirectory.
2022-09-20 18:43:05 +01:00
Liav A
1596ee241f Kernel/PCI: Move IO based HostBridge code to x86 arch-specific directory
The simple PCI::HostBridge class implements access to the PCI
configuration space by using x86 IO instructions. Therefore, it should
be put in the Arch/x86/PCI directory so it can be easily omitted for
non-x86 builds.
2022-09-20 18:43:05 +01:00
Liav A
a02c9c9569 Kernel: Abstract platform-specific serial port access from kprintf
kprintf should not really care about the hardware-specific details of
each UART or serial port out there, so instead of using x86 specific
instructions, let's ensure that we will compile only the relevant code
for debug output for a targeted-specific platform.
2022-09-20 18:43:05 +01:00
Liav A
d5ee03ef5b Kernel/x86: Move RTC and CMOS code to x86 arch-specific subdirectory
The RTC and CMOS are currently only supported for x86 platforms and use
specific x86 instructions to produce only certain x86 plaform operations
and results, therefore, we move them to the Arch/x86 specific directory.
2022-09-20 18:43:05 +01:00
Liav A
e740d959df Kernel: Move CMOS code to the Kernel namespace 2022-09-20 18:43:05 +01:00
Liav A
84fbab6803 Kernel: Move IO delay code to x86 architecture subdirectory
Many code patterns and hardware procedures rely on reliable delay in the
microseconds granularity, and since they are using such delays which are
valid cases, but should not rely on x86 specific code, we allow to
determine in compile time the proper platform-specific code to use to
invoke such delays.
2022-09-20 18:43:05 +01:00
Liav A
cac72259d0 Kernel: Put the RTC code in the Kernel namespace
We only use the RTC code in the Kernel, so it doesn't make sense to make
the RTC namespace outside of it. In addition to that, we will need later
on to use the RTC in an x86 specific manner and this will help us to use
this code in such fashion.
2022-09-20 18:43:05 +01:00
Liav A
16428e4d4c Kernel: Convert NVMe code includes to absolute paths 2022-09-20 18:43:05 +01:00
Liav A
9252a892bb Kernel: Abstracts x86 reboot and shutdown specific methods
We move QEMU and VirtualBox shutdown sequences to a separate file, as
well as moving the i8042 reboot code sequence too to another file.

This allows us to abstract specific methods from the power state node
code of the SysFS filesystem, to allow other architectures to put their
methods there too in the future.
2022-09-20 18:43:05 +01:00
Liav A
0a220a413f Kernel/PCI: Don't use x86 initialization methods in non-x86 builds
Using the IO address space is only relevant for x86 machines, so let's
not compile instructions to access the PCI configuration space when we
don't target x86 platforms.
2022-09-20 18:43:05 +01:00
Liav A
4555cac639 Kernel: Move QEMU shutdown code to the x86 subdirectory
QEMU VM shutdown code is really x86 specific, so let's ensure we only
use it when compiling a Kernel for x86 machines.
2022-09-20 18:43:05 +01:00
Ben Wiederhake
55d78ca40d Kernel: Replace KString::must_create, fix init_args 2022-09-18 18:47:34 -07:00
Ben Wiederhake
f11a69aafb Kernel: Fix misplaced #include in ATA/Definitions.h 2022-09-18 18:30:05 -07:00
Ben Wiederhake
c214d31c5e Everywhere: Fix order of includes and #pragma once 2022-09-18 18:30:05 -07:00
Ben Wiederhake
87eac0e424 Kernel: Add missing include in API
This remained undetected for a long time as HeaderCheck is disabled by
default. This commit makes the following file compile again:

    // file: compile_me.cpp
    #include <Kernel/API/POSIX/ucontext.h>
    // That's it, this was enough to cause a compilation error.
2022-09-18 13:27:24 -04:00
b14ckcat
3452cbd1ed Kernel/USB: Hotplug multiple USB device crash hotfix 2022-09-17 17:11:13 +02:00
Liav A
0c675192c9 Kernel: Send SIGBUS to threads that use after valid Inode mmaped range
According to Dr. POSIX, we should allow to call mmap on inodes even on
ranges that currently don't map to any actual data. Trying to read or
write to those ranges should result in SIGBUS being sent to the thread
that did violating memory access.
2022-09-16 14:55:45 +03:00
Liav A
3ad0e1a1d5 Kernel: Handle mmap requests on zero-length data file inodes safely 2022-09-16 14:55:45 +03:00
Liav A
c88cc8557f Kernel/FileSystem: Make Inode::{write,read}_bytes methods non-virtual
We make these methods non-virtual because we want to ensure we properly
enforce locking of the m_inode_lock mutex. Also, for write operations,
we want to call prepare_to_write_data before the actual write. The
previous design required us to ensure the callers do that at various
places which lead to hard-to-find bugs. By moving everything to a place
where we call prepare_to_write_data only once, we eliminate a possibilty
of forgeting to call it on some code path in the kernel.
2022-09-16 14:55:45 +03:00
Liav A
4f4717e351 Kernel/FileSystem: Mark ext2 inode block list non-const
The block list required a bit of work, and now the only method being
declared const to bypass its const-iness is the read_bytes method that
calls a new method called compute_block_list_with_exclusive_locking that
takes care of proper locking before trying to update the block list data
of the ext2 inode.
2022-09-16 14:55:45 +03:00
Liav A
843bd43c5b Kernel/FileSystem: Mark ext2 inode lookup cache non-const
For the lookup cache, no method being declared const tried to modify it,
so it was easy to drop the mutable declaration on the HashMap member.
2022-09-16 14:55:45 +03:00
Tim Schumacher
8763dbcccc Everywhere: Remove a bunch of dead write-only variables
LLVM 15 now warns (and thus errors) about this, and there is really no
point in keeping them.
2022-09-16 05:39:28 +00:00
Brian Gianforcaro
d0a1775369 Everywhere: Fix a variety of typos
Spelling fixes found by `codespell`.
2022-09-14 04:46:49 +00:00
Andreas Kling
2cc947ede4 Kernel: Use correct timestamp in sys$utimens()
We were mixing up the nanosecond and second parts of the timestamps.

Regressed in 280694bb46.
2022-09-13 17:03:31 +02:00
Filiph Sandström
7e1e208d08 Kernel: Add basic aarch64 support to MemoryManager
FIXME: There's still a lot to do like for example, port `quickmap_page`.
This does however get us further into the boot process than before.
2022-09-12 00:56:44 +01:00
Filiph Sandström
14fe03569a Kernel: Add support for displaying critical output on aarch64 2022-09-12 00:56:44 +01:00
Filiph Sandström
3b331a83e2 Kernel: Include CommandLine as a part of aarch64 2022-09-12 00:56:44 +01:00
Filiph Sandström
fcd1cf4e1b Kernel: Include DeviceManagement as a part of aarch64 2022-09-12 00:56:44 +01:00
demostanis
c56cbf8027 CMake: Quote all CMAKE_COMMAND occurences
Building might fail if the cmake command path contains
whitespace. See https://stackoverflow.com/a/35853080.
2022-09-02 23:34:47 +01:00
Tim Schumacher
5e11a512d6 Kernel: Buffer an entire region when generating coredumps
This allows us to unlock the region tree lock early, to avoid keeping
the lock while we are doing IO.
2022-08-31 16:28:47 +02:00
Tim Schumacher
32a03cffeb Kernel: Work using copies of specific region data during a coredump
This limits our interaction with the "real" region tree (and therefore
its lock) to the time where we actually read from the user address
space.
2022-08-31 16:28:47 +02:00
Liav A
2c84466ad8 Kernel/Storage: Introduce new boot device addressing modes
Before of this patch, we supported two methods to address a boot device:
1. Specifying root=/dev/hdXY, where X is a-z letter which corresponds to
a boot device, and Y as number from 1 to 16, to indicate the partition
number, which can be omitted to instruct the kernel to use a raw device
rather than a partition on a raw device.
2. Specifying root=PARTUUID: with a GUID string of a GUID partition. In
case of existing storage device with GPT partitions, this is most likely
the safest option to ensure booting from persistent storage.

While option 2 is more advanced and reliable, the first option has 2
caveats:
1. The string prefix "/dev/hd" doesn't mean anything beside a convention
on Linux installations, that was taken into use in Serenity. In Serenity
we don't mount DevTmpFS before we mount the boot device on /, so the
kernel doesn't really access /dev anyway, so this convention is only a
big misleading relic that can easily make the user to assume we access
/dev early on boot.
2. This convention although resemble the simple linux convention, is
quite limited in specifying a correct boot device across hardware setup
changes, so option 2 was recommended to ensure the system is always
bootable.

With these caveats in mind, this commit tries to fix the problem with
adding more addressing options as well as to remove the first option
being mentioned above of addressing.
To sum it up, there are 4 addressing options:
1. Hardware relative address - Each instance of StorageController is
assigned with a index number relative to the type of hardware it handles
which makes it possible to address storage devices with a prefix of the
commandset ("ata" for ATA, "nvme" for NVMe, "ramdisk" for Plain memory),
and then the number for the parent controller relative hardware index,
another number LUN target_id, and a third number for LUN disk_id.
2. LUN address - Similar to the previous option, but instead we rely on
the parent controller absolute index for the first number.
3. Block device major and minor numbers - by specifying the major and
minor numbers, the kernel can simply try to get the corresponding block
device and use it as the boot device.
4. GUID string, in the same fashion like before, so the user use the
"PARTUUID:" string prefix and add the GUID of the GPT partition.

For the new address modes 1 and 2, the user can choose to also specify a
partition out of the selected boot device. To do that, the user needs to
append the semicolon character and then add the string "partX" where X
is to be changed for the partition number. We start counting from 0, and
therefore the first partition number is 0 and not 1 in the kernel boot
argument.
2022-08-30 00:50:15 +01:00
b14ckcat
550b3c7330 Kernel/USB: Rework UHCI interrupt transfer schedule
This reworks the way the UHCI schedule is set up to handle interrupt
transfers, creating 11 queue heads each assigned a different
period/latency, so that interrupt transfers can be linked into the
schedule with their specified period more easily.
2022-08-28 13:40:07 +02:00
b14ckcat
4a3a0ac19e Kernel/USB: Rework queued transfer schedule
Modifies the way the UHCI schedule is set up & modified to allow for
multiple transfers of the same type, from one or more devices, to be
queued up and handled simultaneously.
2022-08-28 13:40:07 +02:00
Idan Horowitz
12300b7d0b Kernel: Dump OOM debug info after releasing the MM global data lock
Otherwise we would be holding the MM global data lock and the Process
address space locks in reversed order to the rest of the system, which
can lead to deadlocks.
2022-08-27 21:54:13 +03:00
Idan Horowitz
4ce326205e Kernel: Stop verifying interrupts are disabled in Process::for_each
This is a left-over from back when we didn't have any locking on the
global Process list, nor did we have SMP support, so this acted as some
kind of locking mechanism. We now have proper locks around the Process
list, so this is no longer relevant.
2022-08-27 21:54:13 +03:00
Skye Sprung
eca85f2050 Kernel: Changed serial debug functions and variables for readability
Change the name of set_serial_debug(bool on_or_off) to
set_serial_debug_enabled(bool desired_state). This is to make the names
more expressive and less unclear as to what the function does, as it
only sets the enabled state.
Likewise, change the name of get_serial_debug() to
is_serial_debug_enabled() in order to make clear from the name that
this is simply the state of s_serial_debug_enabled.
Change the name of serial_debug to s_serial_debug_enabled since this is
a static bool describing this state.
Finally, change the signature of set_serial_debug_enabled to return a
bool, as this is more logical and understandable.
2022-08-27 19:42:52 +01:00
Tim Schumacher
8ba6e96d05 Kernel: Reorganize and colorize the scheduler thread list dump 2022-08-26 13:07:07 +02:00
Tim Schumacher
2bf5052608 Kernel: Show more (b)locking info when dumping the process list 2022-08-26 13:07:07 +02:00
Timon Kruiper
47af812a23 Kernel/aarch64: Implement VERIFY_INTERRUPTS_{ENABLED, DISABLED}
This requires to stub `Process::all_instances()`.
2022-08-26 12:51:57 +02:00
Timon Kruiper
026f37b031 Kernel: Move Spinlock functions back to arch independent Locking folder
Now that the Spinlock code is not dependent on architectural specific
code anymore, we can move it back to the Locking folder. This also means
that the Spinlock implemenation is now used for the aarch64 kernel.
2022-08-26 12:51:57 +02:00
Timon Kruiper
c9118de5a6 Kernel/aarch64: Implement critical section related functions 2022-08-26 12:51:57 +02:00
Timon Kruiper
e8aff0c1c8 Kernel: Use InterruptsState in Spinlock code
This commit updates the lock function from Spinlock and
RecursiveSpinlock to return the InterruptsState of the processor,
instead of the processor flags. The unlock functions would only look at
the interrupt flag of the processor flags, so we now use the
InterruptsState enum to clarify the intent, and such that we can use the
same Spinlock code for the aarch64 build.

To not break the build, all the call sites are updated aswell.
2022-08-26 12:51:57 +02:00
Timon Kruiper
6432f3eee8 Kernel: Add enum InterruptsState and helper functions
This commit adds the concept of an InterruptsState to the kernel. This
will be used to make the Spinlock code architecture independent. A new
Processor.cpp file is added such that we don't have to duplicate the
code.
2022-08-26 12:51:57 +02:00
Timon Kruiper
c1b0d254fd Kernel/aarch64: Add stubs for Mutex::lock and Mutex::unlock
Also add the g_scheduler_lock to the Dummy.cpp. This makes the aarch64
build succeeded again.
2022-08-26 12:51:57 +02:00
Andreas Kling
a3b2b20782 Kernel: Remove global MM lock in favor of SpinlockProtected
Globally shared MemoryManager state is now kept in a GlobalData struct
and wrapped in SpinlockProtected.

A small set of members are left outside the GlobalData struct as they
are only set during boot initialization, and then remain constant.
This allows us to access those members without taking any locks.
2022-08-26 01:04:51 +02:00
Andreas Kling
2c72d495a3 Kernel: Use RefPtr instead of LockRefPtr for PhysicalPage
I believe this to be safe, as the main thing that LockRefPtr provides
over RefPtr is safe copying from a shared LockRefPtr instance. I've
inspected the uses of RefPtr<PhysicalPage> and it seems they're all
guarded by external locking. Some of it is less obvious, but this is
an area where we're making continuous headway.
2022-08-24 18:35:41 +02:00
Andreas Kling
5a804b9a1d Kernel: Make PhysicalPage::ref() use relaxed memory order
When incrementing a reference count, it should be sufficient to use
relaxed ordering. Note that unref() still uses acquire-release.
2022-08-24 18:35:41 +02:00
Andreas Kling
0dd88fd836 Kernel: Remove unnecessary forward declaration of s_mm_lock 2022-08-24 14:57:51 +02:00
Andreas Kling
ac3ea277aa Kernel: Don't take MM lock in ~PageDirectory()
We don't need the MM lock to unregister a PageDirectory from the CR3
map. This is already protected by the CR3 map's own lock.
2022-08-24 14:57:51 +02:00
Andreas Kling
5beed613ca Kernel: Don't take MM lock in MemoryManager::dump_kernel_regions()
We have to hold the region tree lock while dumping its regions anyway,
and taking the MM lock here was unnecessary.
2022-08-24 14:57:51 +02:00
Andreas Kling
05156cac94 Kernel: Don't take MM lock in MemoryManager::enter_address_space()
We're not accessing any of the MM members here. Also remove some
redundant code to update CR3, since it calls activate_page_directory()
which does exactly the same thing.
2022-08-24 14:57:51 +02:00
Andreas Kling
2607a6a4bd Kernel: Update comment about what the MM lock protects 2022-08-24 14:57:51 +02:00
Andreas Kling
da24a937f5 Kernel: Don't wrap AddressSpace's RegionTree in SpinlockProtected
Now that AddressSpace itself is always SpinlockProtected, we don't
need to also wrap the RegionTree. Whoever has the AddressSpace locked
is free to poke around its tree.
2022-08-24 14:57:51 +02:00
Andreas Kling
d3e8eb5918 Kernel: Make file-backed memory regions remember description permissions
This allows sys$mprotect() to honor the original readable & writable
flags of the open file description as they were at the point we did the
original sys$mmap().

IIUC, this is what Dr. POSIX wants us to do:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/mprotect.html

Also, remove the bogus and racy "W^X" checking we did against mappings
based on their current inode metadata. If we want to do this, we can do
it properly. For now, it was not only racy, but also did blocking I/O
while holding a spinlock.
2022-08-24 14:57:51 +02:00
Andreas Kling
30861daa93 Kernel: Simplify the File memory-mapping API
Before this change, we had File::mmap() which did all the work of
setting up a VMObject, and then creating a Region in the current
process's address space.

This patch simplifies the interface by removing the region part.
Files now only have to return a suitable VMObject from
vmobject_for_mmap(), and then sys$mmap() itself will take care of
actually mapping it into the address space.

This fixes an issue where we'd try to block on I/O (for inode metadata
lookup) while holding the address space spinlock. It also reduces time
spent holding the address space lock.
2022-08-24 14:57:51 +02:00
Andreas Kling
cf16b2c8e6 Kernel: Wrap process address spaces in SpinlockProtected
This forces anyone who wants to look into and/or manipulate an address
space to lock it. And this replaces the previous, more flimsy, manual
spinlock use.

Note that pointers *into* the address space are not safe to use after
you unlock the space. We've got many issues like this, and we'll have
to track those down as wlel.
2022-08-24 14:57:51 +02:00
Andreas Kling
d6ef18f587 Kernel: Don't hog the MM lock while unmapping regions
We were holding the MM lock across all of the region unmapping code.
This was previously necessary since the quickmaps used during unmapping
required holding the MM lock.

Now that it's no longer necessary, we can leave the MM lock alone here.
2022-08-24 14:57:51 +02:00
Andreas Kling
dc9d2c1b10 Kernel: Wrap RegionTree objects in SpinlockProtected
This makes locking them much more straightforward, and we can remove
a bunch of confusing use of AddressSpace::m_lock. That lock will also
be converted to use of SpinlockProtected in a subsequent patch.
2022-08-24 14:57:51 +02:00
James Bellamy
9c1ee8cbd1 Kernel: Remove big lock from sys$socket
With the implementation of the credentials object the socket syscall no
longer needs the big lock.
2022-08-23 20:29:50 +02:00
Timon Kruiper
d62bd3c635 Kernel/aarch64: Properly initialize T0SZ and T1SZ fields in TCR_EL1
By default these 2 fields were zero, which made it rely on
implementation defined behavior whether these fields internally would be
set to the correct value. The ARM processor in the Raspberry PI (and
QEMU 6.x) would actually fixup these values, whereas QEMU 7.x now does
not do that anymore, and a translation fault would be generated instead.

For more context see the relevant QEMU issue:
 - https://gitlab.com/qemu-project/qemu/-/issues/1157

Fixes #14856
2022-08-23 09:23:27 -04:00
Samuel Bowman
91574ed677 Kernel: Fix boot profiling
Boot profiling was previously broken due to init_stage2() passing the
event mask to sys$profiling_enable() via kernel pointer, but a user
pointer is expected.

To fix this, I added Process::profiling_enable() as an alternative to
Process::sys$profiling_enable which takes a u64 rather than a
Userspace<u64 const*>. It's a bit of a hack, but it works.
2022-08-23 11:48:50 +02:00
Anthony Iacono
ec3d8a7a18 Kernel: Remove unused Process::in_group() 2022-08-23 01:01:48 +02:00
Andreas Kling
434d77cd43 Kernel/ProcFS: Silently ignore attempts to update ProcFS timestamps
We have to override Inode::update_timestamps() for ProcFS inodes,
otherwise we'll get the default behavior of erroring with ENOTIMPL.
2022-08-23 01:00:40 +02:00
Andreas Kling
5307e1bf01 Kernel/SysFS: Silently ignore attempts to update SysFS timestamps
We have to override Inode::update_timestamps() for SysFS inodes,
otherwise we'll get the default behavior of erroring with ENOTIMPL.
2022-08-23 00:55:41 +02:00
Andreas Kling
4c081e0479 Kernel/x86: Protect the CR3->PD map with a spinlock
This can be accessed from multiple CPUs at the same time, so relying on
the interrupt flag is clearly insufficient.
2022-08-22 17:56:03 +02:00
Andreas Kling
6cd3695761 Kernel: Stop taking MM lock while using regular quickmaps
You're still required to disable interrupts though, as the mappings are
per-CPU. This exposed the fact that our CR3 lookup map is insufficiently
protected (but we'll address that in a separate commit.)
2022-08-22 17:56:03 +02:00
Andreas Kling
c8375c51ff Kernel: Stop taking MM lock while using PD/PT quickmaps
This is no longer required as these quickmaps are now per-CPU. :^)
2022-08-22 17:56:03 +02:00
Andreas Kling
a838fdfd88 Kernel: Make the page table quickmaps per-CPU
While the "regular" quickmap (used to temporarily map a physical page
at a known address for quick access) has been per-CPU for a while,
we also have the PD (page directory) and PT (page table) quickmaps
used by the memory management code to edit page tables. These have been
global, which meant that SMP systems had to keep fighting over them.

This patch makes *all* quickmaps per-CPU. We reserve virtual addresses
for up to 64 CPUs worth of quickmaps for now.

Note that all quickmaps are still protected by the MM lock, and we'll
have to fix that too, before seeing any real throughput improvements.
2022-08-22 17:56:03 +02:00
Andreas Kling
930dedfbd8 Kernel: Make sys$utime() and sys$utimensat() not take the big lock 2022-08-22 17:56:03 +02:00
Andreas Kling
280694bb46 Kernel: Update atime/ctime/mtime timestamps atomically
Instead of having three separate APIs (one for each timestamp),
there's now only Inode::update_timestamps() and it takes 3x optional
timestamps. The non-empty timestamps are updated while holding the inode
mutex, and the outside world no longer has to look at intermediate
timestamp states.
2022-08-22 17:56:03 +02:00
Andreas Kling
35b2e9c663 Kernel: Make sys$mknod() not take the big lock 2022-08-22 17:56:03 +02:00
Anthony Iacono
f86b671de2 Kernel: Use Process::credentials() and remove user ID/group ID helpers
Move away from using the group ID/user ID helpers in the process to
allow for us to take advantage of the immutable credentials instead.
2022-08-22 12:46:32 +02:00
Andreas Kling
42435ce5e4 Kernel: Make sys$recvfrom() with MSG_DONTWAIT not so racy
Instead of temporary changing the open file description's "blocking"
flag while doing a non-waiting recvfrom, we instead plumb the currently
wanted blocking behavior all the way through to the underlying socket.
2022-08-21 16:45:42 +02:00
Andreas Kling
8997c6a4d1 Kernel: Make Socket::connect() take credentials as input 2022-08-21 16:35:03 +02:00
Andreas Kling
51318d51a4 Kernel: Make Socket::bind() take credentials as input 2022-08-21 16:33:09 +02:00
Andreas Kling
8d0bd3f225 Kernel: Make LocalSocket do chown/chmod through VFS
This ensures that all the permissions checks are made against the
provided credentials. Previously we were just calling through directly
to the inode setters, which did no security checks!
2022-08-21 16:22:34 +02:00
Andreas Kling
dbe182f1c6 Kernel: Make Inode::resolve_as_link() take credentials as input 2022-08-21 16:17:13 +02:00
Andreas Kling
006f753647 Kernel: Make File::{chown,chmod} take credentials as input
...instead of getting them from Process::current(). :^)
2022-08-21 16:15:29 +02:00
Andreas Kling
c3351d4b9f Kernel: Make VirtualFileSystem functions take credentials as input
Instead of getting credentials from Process::current(), we now require
that they be provided as input to the various VFS functions.

This ensures that an atomic set of credentials is used throughout an
entire VFS operation.
2022-08-21 16:02:24 +02:00
James Bellamy
9744dedb50 Kernel: Use credentials object in Socket set_origin/acceptor 2022-08-21 14:55:01 +02:00
James Bellamy
2686640baf Kernel: Use credentials object in LocalSocket constructor 2022-08-21 14:55:01 +02:00
James Bellamy
386642ffcf Kernel: Use credentials object in VirtualFileSystem
Use credentials object in mknod, create, mkdir, and symlink
2022-08-21 14:55:01 +02:00
James Bellamy
8ef5dbed21 Kernel: Use credentials object in Coredump:try_create_target_file 2022-08-21 14:55:01 +02:00
Andreas Kling
18abba2c4d Kernel: Make sys$getppid() not take the big lock
This only needs to access the process PPID, which is protected by the
"protected data" lock.
2022-08-21 13:29:36 +02:00
Andreas Kling
8ed06ad814 Kernel: Guard Process "protected data" with a spinlock
This ensures that both mutable and immutable access to the protected
data of a process is serialized.

Note that there may still be multiple TOCTOU issues around this, as we
have a bunch of convenience accessors that make it easy to introduce
them. We'll need to audit those as well.
2022-08-21 12:25:14 +02:00
Andreas Kling
728c3fbd14 Kernel: Use RefPtr instead of LockRefPtr for Custody
By protecting all the RefPtr<Custody> objects that may be accessed from
multiple threads at the same time (with spinlocks), we remove the need
for using LockRefPtr<Custody> (which is basically a RefPtr with a
built-in spinlock.)
2022-08-21 12:25:14 +02:00
Liav A
5331d243c6 Kernel/Syscall: Make anon_create to not use Process::allocate_fd method
Instead, allocate when acquiring the lock on m_fds struct, which is
safer to do in terms of safely mutating the m_fds struct, because we
don't use the big process lock in this syscall.
2022-08-21 10:56:48 +01:00
Andreas Kling
619ac65302 Kernel: Get GID from credentials object in sys$setgroups()
I missed one instance of these. Thanks Anthony Iacono for spotting it!
2022-08-20 22:41:49 +02:00
Andreas Kling
9eeee24a39 Kernel+LibC: Enforce a limit on the number of supplementary group IDs
This patch adds the NGROUPS_MAX constant and enforces it in
sys$setgroups() to ensure that no process has more than 32 supplementary
group IDs.

The number doesn't mean anything in particular, just had to pick a
number. Perhaps one day we'll have a reason to change it.
2022-08-20 22:39:56 +02:00
Andreas Kling
998c1152ef Kernel: Mark syscalls that get/set user/group ID as not needing big lock
Now that these operate on the neatly atomic and immutable Credentials
object, they should no longer require the process big lock for
synchronization. :^)
2022-08-20 18:36:47 +02:00
Andreas Kling
122d7d9533 Kernel: Add Credentials to hold a set of user and group IDs
This patch adds a new object to hold a Process's user credentials:

- UID, EUID, SUID
- GID, EGID, SGID, extra GIDs

Credentials are immutable and child processes initially inherit the
Credentials object from their parent.

Whenever a process changes one or more of its user/group IDs, a new
Credentials object is constructed.

Any code that wants to inspect and act on a set of credentials can now
do so without worrying about data races.
2022-08-20 18:32:50 +02:00
Andreas Kling
bec314611d Kernel: Move InodeMetadata methods out of line 2022-08-20 17:20:44 +02:00
Andreas Kling
11eee67b85 Kernel: Make self-contained locking smart pointers their own classes
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:

- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable

This patch renames the Kernel classes so that they can coexist with
the original AK classes:

- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable

The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
2022-08-20 17:20:43 +02:00
Andreas Kling
e475263113 AK+Kernel: Add AK::AtomicRefCounted and use everywhere in the kernel
Instead of having two separate implementations of AK::RefCounted, one
for userspace and one for kernelspace, there is now RefCounted and
AtomicRefCounted.
2022-08-20 17:15:52 +02:00
Liav A
00e59e8ab7 Kernel: Annotate SpinlockProtected<PacketList> in NetworkAdapter class 2022-08-19 23:50:28 -07:00
kleines Filmröllchen
4314c25cf2 Kernel: Require lock rank for Spinlock construction
All users which relied on the default constructor use a None lock rank
for now. This will make it easier to in the future remove LockRank and
actually annotate the ranks by searching for None.
2022-08-19 20:26:47 -07:00
Idan Horowitz
ae9c6a9ded Kernel: Add 8-byte atomics for i686 GCC
Unlike Clang, GCC does not support 8-byte atomics on i686 with the
-mno-80387 flag set, so until that is fixed, implement a minimal set of
atomics that are currently required.
2022-08-19 19:49:38 +03:00
Tim Schumacher
df4ba7b430 Kernel: Put too small unused network packets back into the list 2022-08-19 14:51:58 +02:00
Tim Schumacher
9e7faff181 Kernel: Protect the list of unused network packets with a Spinlock 2022-08-19 14:51:58 +02:00
Andreas Kling
766bf5c89e Kernel: Don't take thread lock for signal dispatch
Signal dispatch is already protected by the global scheduler lock, but
in some cases we also took Thread::m_lock for some reason. This led to
a number of different deadlocks that started showing up with 4+ CPU's
attached to the system.

As a first step towards solving this, simply don't take the thread lock
and let the scheduler lock cover it.

Eventually, we should work in the other direction and break the
scheduler lock into much finer-grained locks, but let's get out of the
deadlock swamp first.
2022-08-19 14:39:15 +02:00
Liav A
a1a1462a22 Kernel/Memory: Use scope guard to remove a region if we failed to map it 2022-08-19 15:26:04 +03:00
Andreas Kling
23902d46f1 Kernel: Don't lock scheduler in ~Thread()
This is not necessary, and is a leftover from before Thread started
using the ListedRefCounted pattern to be safely removed from lists on
the last call to unref().
2022-08-19 14:06:03 +02:00
Andreas Kling
806ade1367 Kernel: Don't lock scheduler while updating thread scheduling times
We can use simple atomic variables with relaxed ordering for this,
and avoid locking altogether.
2022-08-19 14:05:57 +02:00
Andreas Kling
5ada38f9c3 Kernel: Reduce time under VMObject lock while handling zero faults
We only need to hold the VMObject lock while inspecting and/or updating
the physical page array in the VMObject.
2022-08-19 12:52:48 +02:00
Andreas Kling
a84d893af8 Kernel/x86: Re-enable interrupts ASAP when handling page faults
As soon as we've saved CR2 (the faulting address), we can re-enable
interrupt processing. This should make the kernel more responsive under
heavy fault loads.
2022-08-19 12:14:57 +02:00
Andreas Kling
e1476788ad Kernel: Make sys$anon_create() allocate physical pages immediately
This fixes an issue where a sharing process would map the "lazy
committed page" early and then get stuck with that page even after
it had been replaced in the VMObject by a page fault.

Regressed in 27c1135d30, which made it
happen every time with the backing bitmaps used for WebContent.
2022-08-18 20:59:04 +02:00
Andreas Kling
4bc3745ce6 Kernel: Make Region's physical page accessors safer to use
Region::physical_page() now takes the VMObject lock while accessing the
physical pages array, and returns a RefPtr<PhysicalPage>. This ensures
that the array access is safe.

Region::physical_page_slot() now VERIFY()'s that the VMObject lock is
held by the caller. Since we're returning a reference to the physical
page slot in the VMObject's physical page array, this is the best we
can do here.
2022-08-18 19:20:33 +02:00
Andreas Kling
c3ad4ffcec Kernel: Schedule threads on all processors when SMP is enabled
Note that SMP is still off by default, but this basically removes the
weird "SMP on but threads don't get scheduled" behavior we had by
default. If you pass "smp=on" to the kernel, you now get SMP. :^)
2022-08-18 18:58:33 +02:00
Andreas Kling
b560442fe1 Kernel: Don't hog VMObject lock when remapping a region page
We really only need the VMObject lock when accessing the physical pages
array, so once we have a strong pointer to the physical page we want to
remap, we can give up the VMObject lock.

This fixes a deadlock I encountered while building DOOM on SMP.
2022-08-18 18:56:35 +02:00
Andreas Kling
10399a258f Kernel: Move Region physical page accessors out of line 2022-08-18 18:52:34 +02:00
Andreas Kling
c14dda14c4 Kernel: Add a comment about what the MM lock protects 2022-08-18 18:52:34 +02:00
Andreas Kling
75348bdfd3 Kernel: Don't require MM lock for Region::set_page_directory()
The MM lock is not required for this, it's just a simple ref-counted
pointer assignment.
2022-08-18 18:52:34 +02:00
Andreas Kling
abb84b9fcd Kernel: Fix inconsistent lock acquisition order in kmalloc
We always want to grab the page directory lock before the MM lock.
This fixes a deadlock I encountered when building DOOM with make -j4.
2022-08-18 18:52:34 +02:00
Andreas Kling
27c1135d30 Kernel: Don't remap all regions from Region::remap_vmobject_page()
When handling a page fault, we only need to remap the faulting region in
the current process. There's no need to traverse *all* regions that map
the same VMObject and remap them cross-process as well.

Those other regions will get remapped lazily by their own page fault
handlers eventually. Or maybe they won't and we avoided some work. :^)
2022-08-18 18:52:34 +02:00
Andreas Kling
45e6123de8 Kernel: Shorten time under spinlocks while handling inode faults
- Instead of holding the VMObject lock across physical page allocation
  and quick-map + copy, we now only hold it when updating the VMObject's
  physical page slot.
2022-08-18 18:52:34 +02:00
Andreas Kling
04c362b4dd Kernel: Fix TOCTOU in sys$unveil()
Make sure we reject the unveil attempt with EPERM if the veil was locked
by another thread while we were parsing argument (and not holding the
veil state spinlock.)

Thanks Brian for spotting this! :^)

Amendment to #14907.
2022-08-18 01:04:28 +02:00
Andreas Kling
ae3fa20252 Kernel/x86: Don't re-enable interrupts too soon when unlocking spinlocks
To ensure that we stay on the same CPU that acquired the spinlock until
we're completely unlocked, we now leave the critical section *before*
re-enabling interrupts.
2022-08-18 00:58:34 +02:00
Andreas Kling
cb04caa18e Kernel: Protect the Custody cache with a spinlock
Protecting it with a mutex meant that anyone unref()'ing a Custody
might need to block on said mutex.
2022-08-18 00:58:34 +02:00
Andreas Kling
17de393253 Kernel: Remove outdated FIXME in Custody.h 2022-08-18 00:58:34 +02:00
Andreas Kling
ec330c2ce6 Kernel: Use consistent lock acquisition order in Thread::block*()
We want to grab g_scheduler_lock *before* Thread::m_block_lock.
This appears to have fixed a deadlock that I encountered while building
DOOM with make -j2.
2022-08-18 00:58:34 +02:00
Andreas Kling
ae8558dd5c Kernel: Don't do path resolution in sys$chdir() while holding spinlock
Path resolution may do blocking I/O so we must not do it while holding
a spinlock. There are tons of problems like this throughout the kernel
and we need to find and fix all of them.
2022-08-18 00:58:34 +02:00
Andreas Kling
51bc87d15a Kernel/x86: Disable interrupts when leaving critical sections
This fixes an issue where we could get preempted after acquiring the
current Processor pointer, but before calling methods on it.

I strongly suspect this was the cause of "Processor::current() == this"
assertion failures.
2022-08-18 00:58:34 +02:00
Andreas Kling
cd348918f9 Kernel/x86: Move Processor::{leave,clear}_critical() out of line
I don't think this code needs to be ALWAYS_INLINE, and moving it out
of line will make backtraces nicer.
2022-08-18 00:58:34 +02:00
Samuel Bowman
b5a2f59320 Kernel: Make sys$unveil() not take the big process lock
The unveil syscall uses the UnveilData struct which is already
SpinlockProtected, so there is no need to take the big lock.
2022-08-18 00:04:31 +02:00
Linus Groh
146903a3b5 Kernel: Require semicolon after VERIFY_{NO_,}PROCESS_BIG_LOCK_ACQUIRED
This matches out general macro use, and specifically other verification
macros like VERIFY(), VERIFY_NOT_REACHED(), VERIFY_INTERRUPTS_ENABLED(),
and VERIFY_INTERRUPTS_DISABLED().
2022-08-17 22:56:51 +02:00
Andreas Kling
ce6e93d96b Kernel: Make sys$socketpair() not take the big lock
This system call mainly accesses the file descriptor table, and this is
already guarded by MutexProtected.
2022-08-16 20:43:23 +02:00
Andreas Kling
164c9617c3 Kernel: Only lock file descriptor table once in sys$pipe()
Instead of locking it twice, we now frontload all the work that doesn't
touch the fd table, and then only lock it towards the end of the
syscall.

The benefit here is simplicity. The downside is that we do a bit of
unnecessary work in the EMFILE error case, but we don't need to optimize
that case anyway.
2022-08-16 20:39:45 +02:00
Andreas Kling
b6d0636656 Kernel: Don't leak file descriptors in sys$pipe()
If the final copy_to_user() call fails when writing the file descriptors
to the output array, we have to make sure the file descriptors don't
remain in the process file descriptor table. Otherwise they are
basically leaked, as userspace is not aware of them.

This matches the behavior of our sys$socketpair() implementation.
2022-08-16 20:35:32 +02:00
Andreas Kling
307932857e Kernel: Make sys$pipe() not take the big lock
This system call mainly accesses the file descriptor table, and this is
already guarded by MutexProtected.
2022-08-16 20:20:11 +02:00
Andreas Kling
0b58fd5aef Kernel: Remove unnecessary TOCTOU bug in sys$pipe()
We don't need to explicitly check for EMFILE conditions before doing
anything in sys$pipe(). The fd allocation code will take care of it
for us anyway.
2022-08-16 20:16:17 +02:00
Mike Akers
de980de0e4 Kernel: Lock the inode before writing in SharedInodeVMObject::sync
We ensure that when we call SharedInodeVMObject::sync we lock the inode
lock before calling Inode virtual write_bytes method directly to avoid
assertion on the unlocked inode lock, as it was regressed recently. This
is not a complete fix as the need to lock from each path before calling
the write_bytes method should be avoided because it can lead to
hard-to-find bugs, and this commit only fixes the problem temporarily.
2022-08-16 16:54:03 +02:00
dylanbobb
8180211431 Kernel: Release 1 page instead of all pages when starved for pages
Previously, when starved for pages, *all* clean file-backed memory
would be released, which is quite excessive.

This patch instead releases just 1 page, since only 1 page is needed
to satisfy the request to `allocate_physical_page()`
2022-08-16 01:13:17 +02:00
dylanbobb
09d0fae2c6 Kernel: Allow release of a specific amount of of clean pages
Previously, we could only release *all* clean pages.
This patch makes it possible to release a specific amount of clean
pages. If the attempted number of pages to release is more than the
amount of clean pages, all clean pages will be released.
2022-08-16 01:13:17 +02:00
Andreas Kling
e7d2696a56 Kernel: Shrink default userspace stack size from 4 MiB to 1 MiB
This knocks 70 MiB off our idle footprint, (from 350 MiB to 280 MiB.)
2022-08-15 17:18:11 +02:00
Idan Horowitz
4edae21bd1 Kernel: Remove regions from the region tree after failing to map them
At the point at which we try to map the Region it was already added to
the Process region tree, so we have to make sure to remove it before
freeing it in the mapping failure path, otherwise the tree will contain
a dangling pointer to the free'd instance.
2022-08-15 02:42:28 +02:00
Andreas Kling
ae8f1c7dc8 Kernel: Leak a ref() on the new Process ASAP in sys$fork()
This fixes an issue where failing the fork due to OOM or other error,
we'd end up destroying the Process too early. By the time we got to
WaitBlockerSet::finalize(), it was long gone.
2022-08-15 00:53:28 +02:00
Jorropo
ec4b83326b Kernel: Don't release file-pages if volatile memory purge did it 2022-08-15 00:11:33 +02:00
Andreas Kling
3c7b0dab0b Kernel: Dump list of processes and their memory usage when OOMing 2022-08-14 23:33:28 +02:00