The emulated aarch64 CPU does not contain the RNG cpu feature, so the
random number generator was not seeded. This commit adds a fallback to
use TimeManagement as a entropy source, such that get_good_random_bytes
works, which is needed for running the first userspace application on
aarch64.
This sets up the correct ThreadRegisters state when a process is
exec'ed, which happens when the first userspace application is executed.
Also changes Processor.cpp to get the stack pointer from the
ThreadRegisters.
This allows the function to be called from other translation units, in
particular this allows the CrashHandler.cpp file to be shared between
aarch64 and x86_64.
Setting the kernel_load_base variable caused backtracking to regress, so
to have proper backtracing the calculation of the symbol address in
KSyms.cpp needs to keep into account that the aarch64 kernel is linked
at a high virtual memory address.
When we execute in userspace, the exception level is EL0, so to handle
exceptions, such as interrupts, and syscalls, we need to add handlers to
vector_table.S. For now we only support running userspace applications
in AArch64 mode, so this commit only adds the handlers for that mode.
To detect instruction aborts, a helper to Registers.h is added, and used
in Interrupts.cpp. Additionally, the PageFault class gets a setter to
set the PageFaults m_is_instruction_fetch bool, and is also used in
Interrupts.cpp.
This reverts commit 4e0f85432a as the
ramdisk code is useful for the bring-up of the aarch64 port. Once the
kernel supports better ram-based filesystems, this code will be removed
again.
This replaces manually grabbing the thread's main lock.
This lets us remove the `get_thread_name` and `set_thread_name` syscalls
from the big lock. :^)
This filesystem is based on the code of the long-lived TmpFS. It differs
from that filesystem in one keypoint - its root inode doesn't have a
sticky bit on it.
Therefore, we mount it on /dev, to ensure only root can modify files on
that directory. In addition to that, /tmp is mounted directly in the
SystemServer main (start) code, so it's no longer specified in the fstab
file. We ensure that /tmp has a sticky bit and has the value 0777 for
root directory permissions, which is certainly a special case when using
RAM-backed (and in general other) filesystems.
Because of these 2 changes, it's no longer needed to maintain the TmpFS
filesystem, hence it's removed (renamed to RAMFS), because the RAMFS
represents the purpose of this filesystem in a much better way - it
relies on being backed by RAM "storage", and therefore it's easy to
conclude it's temporary and volatile, so its content is gone on either
system shutdown or unmounting of the filesystem.
When doing PT_SETREGS, we want to verify that the debugged thread is
executing in usermode.
b2f7ccf refactored things and flipped the relevant check around, which
broke things that use PT_SETREGS (for example, stepping over
breakpoints with sdb).
Splitting the I2C-related code lets the DisplayConnector code to utilize
I2C operations without caring about the specific details of the hardware
and allow future expansion of the driver to other newer generations
sharing the same GMBus code.
We should require a timeout for GMBus operations always, because faulty
hardware could let us just spin forever. Also, if nothing is listening
to the bus (which should result in a NAK), we could also spin forever.
Thanks to Andrew Kaster, which gave a review back in October, about a
big PR I opened (#15502), I managed to figure out why we always had a
problem with the first byte being read into the EDID buffer with the
GMBus code. It turns out that this simple invalid cast was making the
entire problem and using the correct AK::Array::data() method fixed this
notorious long standing problem for good.
This patch removes the x86 mechanism for calling syscalls, favoring
the more modern syscall instruction. It also moves architecture
dependent code from functions that are meant to be architecture
agnostic therefore paving the way for adding more architectures.
The function signal_trampoline_dummy was using int 0x82 to call
SC_sigreturn. Since x86 is no longer supported, the correct way
to call a syscall is using the syscall instruction.
This paves the way to remove the syscall trap handling mechanism.
This is done by merging all scattered pieces of derived classes from the
ProcFSInode class into that one class, so we don't use inheritance but
rather simplistic checks to determine the proper code for each ProcFS
inode with its specific characteristics.
This will cause page faults to be generated. Since the previous commits
introduced the handling of page faults, we can now actually correctly
handle page faults.
The code in PageDirectory.cpp now keeps track of the registered page
directories, and actually sets the TTBR0_EL1 to the page table base of
the currently executing thread. When context switching, we now also
change the TTBR0_EL1 to the page table base of the thread that we
context switch into.
The handling of page tables is very architecture specific, so belongs
in the Arch directory. Some parts were already architecture-specific,
however this commit moves the rest of the PageDirectory class into the
Arch directory.
While we're here the aarch64/PageDirectory.{h,cpp} files are updated to
be aarch64 specific, by renaming some members and removing x86_64
specific code.
The class used to look at the x86_64 specific exception code to figure
out what kind of page fault happend, however this refactor allows
aarch64 to use the same class.
Various places in the kernel were manually checking the cs register for
x86_64, however to share this with aarch64 a function in RegisterState
is added, and the call-sites are updated. While we're here the
PreviousMode enum is renamed to ExecutionMode.
Until now the kernel was always executing with SP_EL0, as this made the
initial dropping to EL1 a bit easier. This commit changes this behaviour
to use the corresponding SP_ELx for each exception level.
To make sure that the execution of the C++ code can continue, the
current stack pointer is copied into the corresponding SP_ELx just
before dropping an exception level.
For each exposed PCI device in sysfs, there's a new node called "rom"
and by reading it, it exposes the raw data of a PCI option ROM blob to
a user for examining the blob.
There are now 2 separate classes for almost the same object type:
- EnumerableDeviceIdentifier, which is used in the enumeration code for
all PCI host controller classes. This is allowed to be moved and
copied, as it doesn't support ref-counting.
- DeviceIdentifier, which inherits from EnumerableDeviceIdentifier. This
class uses ref-counting, and is not allowed to be copied. It has a
spinlock member in its structure to allow safely executing complicated
IO sequences on a PCI device and its space configuration.
There's a static method that allows a quick conversion from
EnumerableDeviceIdentifier to DeviceIdentifier while creating a
NonnullRefPtr out of it.
The reason for doing this is for the sake of integrity and reliablity of
the system in 2 places:
- Ensure that "complicated" tasks that rely on manipulating PCI device
registers are done in a safe manner. For example, determining a PCI
BAR space size requires multiple read and writes to the same register,
and if another CPU tries to do something else with our selected
register, then the result will be a catastrophe.
- Allow the PCI API to have a united form around a shared object which
actually holds much more data than the PCI::Address structure. This is
fundamental if we want to do certain types of optimizations, and be
able to support more features of the PCI bus in the foreseeable
future.
This patch already has several implications:
- All PCI::Device(s) hold a reference to a DeviceIdentifier structure
being given originally from the PCI::Access singleton. This means that
all instances of DeviceIdentifier structures are located in one place,
and all references are pointing to that location. This ensures that
locking the operation spinlock will take effect in all the appropriate
places.
- We no longer support adding PCI host controllers and then immediately
allow for enumerating it with a lambda function. It was found that
this method is extremely broken and too much complicated to work
reliably with the new paradigm being introduced in this patch. This
means that for Volume Management Devices (Intel VMD devices), we
simply first enumerate the PCI bus for such devices in the storage
code, and if we find a device, we attach it in the PCI::Access method
which will scan for devices behind that bridge and will add new
DeviceIdentifier(s) objects to its internal Vector. Afterwards, we
just continue as usual with scanning for actual storage controllers,
so we will find a corresponding NVMe controllers if there were any
behind that VMD bridge.
There’s similar RDRAND register (encoded as ‘s3_3_c2_c4_0ʼ) to be
added if needed. RNG CPU feature on Aarch64 guarantees existence of both
RDSEED and RDRAND registers simultaneously—in contrast to x86-64, where
respective instructions are independent of each other.
This is the same address that the x86_64 kernel runs at, and allows us
to run the kernel at a high virtual memory address. Since we now run
completely in high virtual memory, we can also unmap the identity
mapping. Additionally some changes in MMU.cpp are required to
successfully boot.
Since we link the kernel at a high virtual memory address, the addresses
of global variables are also at virtual addresses. To be able to access
them without the MMU enabled, we have to subtract the
KERNEL_MAPPING_BASE.
Compile source files that run early in the boot process without the MMU
enabled, without stack protector and sanitizers. Enabling them will
cause the compiler to insert accesses to global variables, such as
__stack_chk_guard, which cause the CPU to crash, because these variables
are linked at high virtual addresses, which the CPU cannot access
without the MMU enabled.
This is a separate file that behaves similar to the Prekernel for
x86_64, and makes sure the CPU is dropped to EL1, the MMU is enabled,
and makes sure the CPU is running in high virtual memory. This code then
jumps to the usual init function of the kernel.
This was previously hardcoded this to be the physical memory range,
since we identity mapped the memory, however we now run the kernel at
a high virtual memory address.
Also changes PageDirectory.h to store up-to 512 pages, as the code now
needs access to more than 4 pages.
As the kernel is now linked at high address in virtual memory, we cannot
use absolute addresses as they refer to high addresses in virtual
memory. At this point in the boot process we are still running with the
MMU off, so we have to make sure the accesses are using physical memory
addresses.
This function will be used once the kernel runs in high virtual memory
to unmap the identity mapping as userspace will later on use this memory
range instead.
And use it the code that will be part of the early boot process.
The PANIC macro and dbgln functions cannot be used as it accesses global
variables, which in the early boot process do not work, since the MMU is
not yet enabled.
In the upcoming commits, we'll change the kernel to run at a virtual
address in high memory. This commit prepares for that by making sure the
kernel and mmio are mapped into high virtual memory.
A lot of places were relying on AK/Traits.h to give it strnlen, memcmp,
memcpy and other related declarations.
In the quest to remove inclusion of LibC headers from Kernel files, deal
with all the fallout of this included-everywhere header including less
things.
This header has always been fundamentally a Kernel API file. Move it
where it belongs. Include it directly in Kernel files, and make
Userland applications include it via sys/ioctl.h rather than directly.
Reduce inclusion of limits.h as much as possible at the same time.
This does mean that kmalloc.h is now including Kernel/API/POSIX/limits.h
instead of LibC/limits.h, but the scope could be limited a lot more.
Basically every file in the kernel includes kmalloc.h, and needs the
limits.h include for PAGE_SIZE.
A lot of interrupt numbers are initialized with the unhandled interrupt
handler. Whenever a new handler is registered on one of these
interrupts, the old handler is unregistered first. Let's not be verbose
about this since it is perfectly normal.
Following registers accessors are updated and put in use:
* ID_AA64ISAR0_EL1, Instruction Set Attribute Register 0
Accessors for following registers are added and put in use:
* ID_AA64ISAR1_EL1, Instruction Set Attribute Register 1
* ID_AA64ISAR2_EL1, Instruction Set Attribute Register 2
* ID_AA64MMFR1_EL1, AArch64 Memory Model Feature Register 1
* ID_AA64MMFR2_EL1, AArch64 Memory Model Feature Register 2
* ID_AA64MMFR3_EL1, AArch64 Memory Model Feature Register 3
* ID_AA64MMFR4_EL1, AArch64 Memory Model Feature Register 4
* ID_AA64PFR0_EL1, AArch64 Processor Feature Register 0
* ID_AA64PFR1_EL1, AArch64 Processor Feature Register 1
* ID_AA64PFR2_EL1, AArch64 Processor Feature Register 2
* ID_AA64ZFR0_EL1, AArch64 SVE Feature ID register 0
* ID_AA64SMFR0_EL1, AArch64 SME Feature ID register 0
* ID_AA64DFR0_EL1, AArch64 Debug Feature Register 0
* ID_AA64DFR1_EL1, AArch64 Debug Feature Register 1
Additionally, there are few CPU features detected with
* TCR_EL1, Translation Control Register
but detection mechanism using it (for LPA/LPA2) is probably wrong as
this is control register, not a id register, and needs further work.
Finally, following registers are provided. Former one is already used,
while latter is given for future use:
* MIDR_EL1, Main ID Register
* AIDR_EL1, Auxiliary ID Register
Settled for `cpu_feature_to_name` as that naming is more descriptive
and similarly named `cpu_feature_to_description` function will be
provided for Aarch64.
If USING_AK_GLOBALLY is not defined, the name IsLvalueReference might
not be available in the global namespace. Follow the pattern established
in LibTest to fully qualify AK types in macros to avoid this problem.
These are formatters that can only be used with debug print
functions, such as dbgln(). Currently this is limited to
Formatter<ErrorOr<T>>. With this you can still debug log ErrorOr
values (good for debugging), but trying to use them in any
String::formatted() call will fail (which prevents .to_string()
errors with the new failable strings being ignored).
You make a formatter debug only by adding a constexpr method like:
static constexpr bool is_debug_only() { return true; }
When calling ioctl on a socket with SIOCGIFHWADDR, return the correct
physical interface type. This value was previously hardcoded to
ARPHRD_ETHER (Ethernet), and now can also return ARPHRD_LOOPBACK for the
loopback adapter.
Before this patch, Core::SessionManagement::parse_path_with_sid() would
figure out the root session ID by sifting through /sys/kernel/processes.
That file can take quite a while to generate (sometimes up to 40ms on my
machine, which is a problem on its own!) and with no caching, many of
our programs were effectively doing this multiple times on startup when
unveiling something in /tmp/session/%sid/
While we should find ways to make generating /sys/kernel/processes fast
again, this patch addresses the specific problem by introducing a new
syscall: sys$get_root_session_id(). This extracts the root session ID
by looking directly at the process table and takes <1ms instead of 40ms.
This cuts WebContent process startup time by ~100ms on my machine. :^)
Resolves issue where a panic would occur if the file system failed to
initialize or mount, due to how the FileSystem was already added to
VFS's list. The newly-created FileSystem destructor would fail as a
result of the object still remaining in the IntrusiveList.
Nobody tests this network card as the person who added it, Jean-Baptiste
Boric (known as boricj) is not an active contributor in the project now.
After a discussion with him on the Discord server, we agreed it's for
the best to remove the driver, as for two reasons:
- The original author (boricj) agreed to do this, stating that he will
not be able to test the driver anymore after his Athlon XP machine is
no longer supported after the removal of the i686 port.
- It was agreed that the NE2000 network card family is far from the
ideal hardware we would want to support, similarly to the RTL8139 that
got removed recently for almost the same reason.
Instead of using a clunky switch-case paradigm, we now have all drivers
being declaring two methods for their adapter class - create and probe.
These methods are linked in each PCIGraphicsDriverInitializer structure,
in a new s_initializers static list of them.
Then, when we probe for a PCI device, we use each probe method and if
there's a match, then the corresponding create method is called.
As a result of this change, it's much more easy to add more drivers and
the initialization code is more readable.
We try our best to ensure a DisplayConnector initialization succeeds,
and this makes the Intel driver to work again, because if we can't
allocate a Region for the whole PCI BAR mapped region, then we will try
to allocate a Region with 16 MiB window size, so it doesn't eat the
entire Kernel-allocated virtual memory space.
Instead of just returning nothing, let's return Error or nothing.
This would help later on with error propagation in case of failure
during this method.
This also makes us more paranoid about failure in this method, so when
initializing a DisplayConnector we safely tear down the internal members
of the object. This applies the same for a StorageDevice object, but its
after_inserting method is much smaller compared to the DisplayConnector
overriden method.
Nobody tests this network card, and the driver has bugs (see the issue
https://github.com/SerenityOS/serenity/issues/10198 for more details),
so it's almost certain that this happened due to code being rotting when
there's simply no testing of it.
Essentially this has been determined to be dead-code so this is the most
important reason to drop this code. Another good reason to do so is
because the RTL8139 only supports Fast Ethernet connections (10/100
Megabits per second), and is considered obsolete even for bare metal
setups.
Instead of using a clunky if-statement paradigm, we now have all drivers
being declaring two methods for their adapter class - create and probe.
These methods are linked in each PCINetworkDriverInitializer structure,
in a new s_initializers static list of them.
Then, when we probe for a PCI device, we use each probe method and if
there's a match, then the corresponding create method is called. After
the adapter instance is created, we call the virtual initialize method
on it, because many drivers actually require a sort of post-construction
initialization sequence to ensure the network adapter can properly
function.
As a result of this change, it's much more easy to add more drivers and
the initialization code is more readable and it's easier to understand
when and where things could fail in the whole initialization sequence.
Instead of allocating those regions in the constructor, which makes it
impossible to fail in case of OOM condition, allocate them in the static
factory method so we could propagate errors in case of failure.
Instead of allocating after the construction point ensure that all Intel
drivers are allocating necessary buffer regions and then pass them to
the constructors.
This could let us fail early in case of OOM, so we don't touch a network
adapter before we ensure we have all the appropriate mappings in place.
We really don't want callers of this function to accidentally change
the jail, or even worse - remove the Process from an attached jail.
To ensure this never happens, we can just declare this method as const
so nobody can mutate it this way.
Use this helper function in various places to replace the old code of
acquiring the SpinlockProtected<RefPtr<Jail>> of a Process to do that
validation.
This seems to work perfectly OK on my ICH7 test machine and also it
works on QEMU, so it is probably OK to restore this.
This will ensure we always get scan code set 1 input, because we enable
scan code set 2 and PS/2 translation on the first (keyboard) port.
The setting of scan code set sequence is removed, as it's buggy and
could lead the controller to fail immediately when doing self-test
afterwards. We will restore it when we understand how to do so safely.
Allow the user to determine a preferred detection path with a new kernel
command line argument. The defualt option is to check i8042 presence
with an ACPI check and if necessary - an "aggressive" test to determine
i8042 existence in the system.
Also, keep the i8042 controller pointer on the stack, so don't assign
m_i8042_controller member pointer if it does not exist.
Only do so after a brief check if we are in a Jail or not. This fixes
SMP, because apparently it is crashing when calling try_generate()
from the SysFSGlobalInformation::refresh_data method, so the fix for
this is to simply not do that inside the Process' Jail spinlock scope,
because otherwise we will simply have a possible flow of taking
multiple conflicting Spinlocks (in the wrong order multiple times), for
the SysFSOverallProcesses generation code:
Process::current().jail(), and then Process::for_each_in_same_jail being
called, we take Process::all_instances(), and Process::current().jail()
again.
Therefore, we should at the very least eliminate the first taking of the
Process::current().jail() spinlock, in the refresh_data method of the
SysFSGlobalInformation class.
A virtual method named device_name() was added to
Kernel::PCI to support logging the PCI::Device name
and address using dmesgln_pci. Previously, PCI::Device
did not store the device name.
All devices inheriting from PCI::Device now use dmesgln_pci where
they previously used dmesgln.
`inline` already assigns vague linkage, so there's no need to
also assign per-TU linkage. Allows the linker to dedup these
functions across TUs (and is almost always just the Right Thing
to do in C++ -- this ain't C).
* Fix bug where last character of a filename or extension would be
truncated (HELLO.TXT -> HELL.TX).
* Fix bug where additional NULL characters would be added to long
filenames that did not completely fill one of the Long Filename Entry
character fields.
There are places in the kernel that would like to have access
to `pgid` credentials in certain circumstances.
I haven't found any use cases for `sid` yet, but `sid` and `pgid` are
both changed with `sys$setpgid`, so it seemed sensical to add it.
In Linux, `man 7 credentials` also mentions both the session id and
process group id, so this isn't unprecedented.
These instances were detected by searching for files that include
AK/Memory.h, but don't match the regex:
\\b(fast_u32_copy|fast_u32_fill|secure_zero|timing_safe_compare)\\b
This regex is pessimistic, so there might be more files that don't
actually use any memory function.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
These instances were detected by searching for files that include
AK/Concepts.h, but don't match the regex:
\\b(AnyString|Arithmetic|ArrayLike|DerivedFrom|Enum|FallibleFunction|Flo
atingPoint|Fundamental|HashCompatible|Indexable|Integral|IterableContain
er|IteratorFunction|IteratorPairWith|OneOf|OneOfIgnoringCV|SameAs|Signed
|SpecializationOf|Unsigned|VoidFunction)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any concepts.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
These instances were detected by searching for files that include
AK/StdLibExtras.h, but don't match the regex:
\\b(abs|AK_REPLACED_STD_NAMESPACE|array_size|ceil_div|clamp|exchange|for
ward|is_constant_evaluated|is_power_of_two|max|min|mix|move|_RawPtr|RawP
tr|round_up_to_power_of_two|swap|to_underlying)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any "extra stdlib" functions.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
These instances were detected by searching for files that include
AK/Format.h, but don't match the regex:
\\b(CheckedFormatString|critical_dmesgln|dbgln|dbgln_if|dmesgln|FormatBu
ilder|__FormatIfSupported|FormatIfSupported|FormatParser|FormatString|Fo
rmattable|Formatter|__format_value|HasFormatter|max_format_arguments|out
|outln|set_debug_enabled|StandardFormatter|TypeErasedFormatParams|TypeEr
asedParameter|VariadicFormatParams|v_critical_dmesgln|vdbgln|vdmesgln|vf
ormat|vout|warn|warnln|warnln_if)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any formatting functions.
Observe that this revealed that Userland/Libraries/LibC/signal.cpp is
missing an include.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
These instances were detected by searching for files that include
Kernel/Debug.h, but don't match the regex:
\\bdbgln_if\(|_DEBUG\\b
This regex is pessimistic, so there might be more files that don't check
for any real *_DEBUG macro. There seem to be no corner cases anyway.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
This step would ideally not have been necessary (increases amount of
refactoring and templates necessary, which in turn increases build
times), but it gives us a couple of nice properties:
- SpinlockProtected inside Singleton (a very common combination) can now
obtain any lock rank just via the template parameter. It was not
previously possible to do this with SingletonInstanceCreator magic.
- SpinlockProtected's lock rank is now mandatory; this is the majority
of cases and allows us to see where we're still missing proper ranks.
- The type already informs us what lock rank a lock has, which aids code
readability and (possibly, if gdb cooperates) lock mismatch debugging.
- The rank of a lock can no longer be dynamic, which is not something we
wanted in the first place (or made use of). Locks randomly changing
their rank sounds like a disaster waiting to happen.
- In some places, we might be able to statically check that locks are
taken in the right order (with the right lock rank checking
implementation) as rank information is fully statically known.
This refactoring even more exposes the fact that Mutex has no lock rank
capabilites, which is not fixed here.
Using policy based design `SinglyLinkedList` and
`SinglyLinkedListWithCount` can be combined into one class which takes
a policy to determine how to keep track of the size of the list. The
default policy is to use list iteration to count the items in the list
each time. The `WithCount` form is a different policy which tracks the
size, but comes with the overhead of storing the count and
incrementing/decrementing on each modification.
This model is extensible to have other forms of counting by
implementing only a new policy instead of implementing a totally new
type.
These instances were detected by searching for files that include
Array.h, but don't match the regex:
\\b(Array(?!\.h>)|iota_array|integer_sequence_generate_array)\\b
These are the three symbols defined by Array.h.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
If a page fault occurs while interrupts are disabled, we were wrongly
enabling interrupts right away in the page fault handler.
Instead, we should only do this if interrupts were enabled when the
page fault occurred.
We were already handling the rmdir("..") case by refusing to remove
directories that were not empty.
This patch removes a FIXME from January 2019 and adds a test. :^)
Dr. POSIX says that we should reject attempts to rmdir() the file named
"." so this patch does exactly that. We also add a test.
This solves a FIXME from January 2019. :^)
This has been done in multiple ways:
- Each time we modeset the resolution via the VirtIOGPU DisplayConnector
we ensure that the framebuffer is updated with the new resolution.
- Each time the cursor is updated we ensure that the framebuffer console
is marked dirty so the IO Work Queue task which is scheduled to check
if it is dirty, will flush the surface.
- We only initialize a framebuffer console after we ensure that at the
very least a DisplayConnector has being set with a known resolution.
- We only call GenericFramebufferConsole::enable() when enabling the
console after the important variables of the console (m_width, m_pitch
and m_height) have been set.
Check if the process we are currently running is in a jail, and if that
is the case, fail early with the EPERM error code.
Also, as Brian noted, we should also disallow attaching to a jail in
case of already running within a setid executable, as this leaves the
user with false thinking of being secure (because you can't exec new
setid binaries), but the current program is still marked setid, which
means that at the very least we gained permissions while we didn't
expect it, so let's block it.
The hand-written assembly does not compile under Clang due to register
size mismatches. Using a loop is slower (~6 instructions on O2 as
opposed to 2 with hand-written assembly), but using the pause
instruction makes this more efficient even under TCG.
For pause we use isb sy which will put the processor to sleep while the
pipeline is being flushed. This instruction is also used by Rust in spin
loops and found to be more efficient, as well as being a rough
equivalent to the x86 pause instruction which we also use here.
For wait_check we use yield, which is a hinted nop that is faster to
execute, and I leave a FIXME for processing SMP messages once we support
SMP.
These two changes probably make spin loops work on aarch64 :^)
This commit changes the init.cpp file to start and initialize the
Scheduler, and actually runs init_stage2. To show that it actually
works, another thread is spawned and executed simultaneously, by context
switching between the two!
This requires two new functions, context_first_init and
restore_context_and_eret. With this code in place, we are now running
the first idle thread! :^)
This changes the stack pointer to the initial_thread stack pointer, and
pushes two pointers onto the stack that point to the initial_thread. The
function then jumps to the ip of the initial_thread, which will be
thread_context_first_enter, and hangs there because that function is not
yet implemented.
This does not handle everything correctly yet, such as setting the
correct state for running userspace applications, however this should be
enough to get kernel scheduling to work.
These are architecture-specific anyway, so they belong in the Arch
directory. This commit also adds ThreadRegisters::set_initial_state to
factor out the logic in Thread.cpp.
I can't think of a reason why copying the Processor class makes sense,
so lets make sure it's not possible to do it by accident by declaring
the copy constructor as deleted.
This removes the x86 specific hlt instruction from the scheduler, and
allows us to run the scheduler code for aarch64 by implementing
Processor::wait_for_interrupt for aarch64.
This file does not contain any architecture specific implementations,
so we can move it to the Kernel base directory. Also update the relevant
include paths.
We are actually storing tpidr_el0, as can be seen in vector_table.S, but
the RegisterState.h incorrectly had tpidr_el1. This will probably save
some annoying debugging later on.
This is necessary to allow transferring frame buffers larger than
~500x500 pixels back to user space. Until the buffer management is
improved this allows us to at least test the existing game ports.
The ARM CPU is set up to trap on unaligned accesses, however the
compiler will still generate them if this flag is not set. We also need
the -Wno-cast-align as there are some files in AK that don't build
without the flag.
This happens to be a sad truth for the VirtIOGPU driver - it lacked any
error propagation measures and generally relied on clunky assumptions
that most operations with the GPU device are infallible, although in
reality much of them could fail, so we do need to handle errors.
To fix this, synchronous GPU commands no longer rely on the wait queue
mechanism anymore, so instead we introduce a timeout-based mechanism,
similar to how other Kernel drivers use a polling based mechanism with
the assumption that hardware could get stuck in an error state and we
could abort gracefully.
Then, we change most of the VirtIOGraphicsAdapter methods to propagate
errors properly to the original callers, to ensure that if a synchronous
GPU command failed, either the Kernel or userspace could do something
meaningful about this situation.
The performance that we achieve from this technique is visually worse
compared to turning off this feature, so let's not use this until we
figure out why it happens.
`OwnPtrWithCustomDeleter` was a decorator which provided the ability
to add a custom deleter to `OwnPtr` by wrapping and taking the deleter
as a run-time argument to the constructor. This solution means that no
additional space is needed for the `OwnPtr` because it doesn't need to
store a pointer to the deleter, but comes at the cost of having an
extra type that stores a pointer for every instance.
This logic is moved directly into `OwnPtr` by adding a template
argument that is defaulted to the default deleter for the type. This
means that the type itself stores the pointer to the deleter instead
of every instance and adds some type safety by encoding the deleter in
the type itself instead of taking a run-time argument.
We now move the ErrorOr returning functions in the constructor to the
try_to_initialize() factory, which allows us to handle the errors and
removes two FIXME's :))
We add this basic functionality to the Kernel so Userspace can request a
particular virtual memory mapping to be immutable. This will be useful
later on in the DynamicLoader code.
The annotation of a particular Kernel Region as immutable implies that
the following restrictions apply, so these features are prohibited:
- Changing the region's protection bits
- Unmapping the region
- Annotating the region with other virtual memory flags
- Applying further memory advises on the region
- Changing the region name
- Re-mapping the region
This syscall will be used later on to ensure we can declare virtual
memory mappings as immutable (which means that the underlying Region is
basically immutable for both future annotations or changing the
protection bits of it).
As is, we never *deallocate* them, so we will run out eventually.
Creating a context, or allocating a context ID, now returns ErrorOr if
there are no available free context IDs.
`number_of_fixmes--;` :^)
This now only requires `size + alignment` bytes while searching for a
free memory location. For the actual allocation, the memory area is
properly trimmed to the required alignment.
This keeps us from tripping strict aliasing, which previously made TCP
connections inoperable when building without `-fsanitize=undefined` or
`-fno-strict-aliasing`.
This patch validates that the size of the auxiliary vector does not
exceed `Process::max_auxiliary_size`. The auxiliary vector is a range
of memory in userspace stack where the kernel can pass information to
the process that will be created via `Process:do_exec`.
The reason the kernel needs to validate its size is that the about to
be created process needs to have remaining space on the stack.
Previously only `argv` and `envp` were taken into account for the
size validation, with this patch, the size of `auxv` is also
checked. All three elements contain values that a user (or an
attacker) can specify.
This patch adds the constant `Process::max_auxiliary_size` which is
defined to be one eight of the user-space stack size. This is the
approach taken by `Process:max_arguments_size` and
`Process::max_environment_size` which are used to check the sizes
of `argv` and `envp`.
Note that this still keeps the old behaviour of putting things in std by
default on serenity so the tools can be happy, but if USING_AK_GLOBALLY
is unset, AK behaves like a good citizen and doesn't try to put things
in the ::std namespace.
std::nothrow_t and its friends get to stay because I'm being told that
compilers assume things about them and I can't yeet them into a
different namespace...for now.
When scanning for network adapters, we give each driver a chance to
claim the PCI device and whoever claims it first gets to keep it.
Before this patch, the driver API returned a LockRefPtr<AdapterType>,
which made it impossible to propagate errors that occurred during
detection and/or initialization.
This patch changes the API so that errors can bubble all the way out
the PCI enumeration in NetworkingManagement::initialize() where we
perform all the network adapter auto-detection on boot.
When we eventually start to support hot-plugging network adapter in the
future, it will be even more important to propagate errors instead of
swallowing them.
Importantly, before this patch, some errors were "handled" by panicking
the kernel. This is no longer the case.
7 FIXMEs were killed in the making of this commit. :^)
We were previously using a 32-bit unsigned integer for this, which
caused us to start truncating region sizes when multiplied with
`PAGE_SIZE` on hardware with a lot of memory.
Some programs explicitly ask for a different initial stack size than
what the OS provides. This is implemented in ELF by having a
PT_GNU_STACK header which has its p_memsz set to the amount that the
program requires. This commit implements this policy by reading the
p_memsz of the header and setting the main thread stack size to that.
ELF::Image::validate_program_headers ensures that the size attribute is
a reasonable value.
This commit makes it possible for a process to downgrade a file lock it
holds from a write (exclusive) lock to a read (shared) lock. For this,
the process must point to the exact range of the flock, and must be the
owner of the lock.
Joining dead threads is allowed for two main reasons:
- Thread join behavior should not be racy when a thread is joined and
exiting at roughly the same time. This is common behavior when threads
are given a signal to end (meaning they are going to exit ASAP) and
then joined.
- POSIX requires that exited threads are joinable (at least, there is no
language in the specification forbidding it).
The behavior is still well-defined; e.g. it doesn't allow a dead
detached thread to be joined or a thread to be joined more than once.
The fact that we used a Vector meant that even if creating a Mount
object succeeded, we were still at a risk that appending to the actual
mounts Vector could fail due to OOM condition. To guard against this,
the mount table is now an IntrusiveList, which always means that when
allocation of a Mount object succeeded, then inserting that object to
the list will succeed, which allows us to fail early in case of OOM
condition.
From now on, we don't allow jailed processes to open all device nodes in
/dev, but only allow jailed processes to open /dev/full, /dev/zero,
/dev/null, and various TTY and PTY devices (and not including virtual
consoles) so we basically restrict applications to what they can do when
they are in jail.
The motivation for this type of restriction is to ensure that even if a
remote code execution occurred, the damage that can be done is very
small.
We also don't restrict reading and writing on device nodes that were
already opened, because that limit seems not useful, especially in the
case where we do want to provide an OpenFileDescription to such device
but nothing further than that.
`SysFSComponentRegistry`, `ProcFSComponentRegistry` and
`attach_null_device` "just work" already; let's include them to match
x86_64 as closely as possible.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This patch adds a way to ask the allocator to skip its internal
scrubbing memset operation. Before this change, kcalloc() would scrub
twice: once internally in kmalloc() and then again in kcalloc().
The same mechanism already existed in LibC malloc, and this patch
brings it over to the kernel heap allocator as well.
This solves one FIXME in kcalloc(). :^)
This solves one of the security issues being mentioned in issue #15996.
We simply don't allow creating hardlinks on paths that were not unveiled
as writable to prevent possible bypass on a certain path that was
unveiled as non-writable.
Instead, allow userspace to decide on the coredump directory path. By
default, SystemServer sets it to the /tmp/coredump directory, but users
can now change this by writing a new path to the sysfs node at
/sys/kernel/variables/coredump_directory, and also to read this node to
check where coredumps are currently generated at.