It didn't feel right to have a "DHCPClient" in a "Servers" directory.
Rename this to Services to better reflect the type of programs we'll
be putting in there.
Ultimately we should not panic just because we can't fully commit a VM
region (by populating it with physical pages.)
This patch handles some of the situations where commit() can fail.
This caused us to report one purged page per occurrence of the shared
zero page in a purgeable memory region, despite it being a no-op.
Thanks to Sergey for spotting the bad assertion removal that led to
this being found!
This patch adds PageFaultResponse::OutOfMemory which informs the fault
handler that we were unable to allocate a necessary physical page and
cannot continue.
In response to this, the kernel will crash the current process. Because
we are OOM, we can't symbolicate the crash like we normally would
(since the ELF symbolication code needs to allocate), so we also
communicate to Process::crash() that we're out of memory.
Now we can survive "allocate 300 MB" (only the allocate process dies.)
This is definitely not perfect and can easily end up killing a random
innocent other process who happened to allocate one page at the wrong
time, but it's a *lot* better than panicking on OOM. :^)
This function has a lot of callers that don't bother checking if it
returns successfully or not. We'll need to handle failure in a bunch
of places and then we can remove this assertion.
The detection works very similarly to how we detect a mouse wheel, just
another magical sequence of "set sample rate" requests to the mouse
followed by an ID check.
If we OOM during a CoW fault and fail to allocate a new page for the
writing process, just leave the original VMObject alone so everyone
else can keep using it.
Since a Region is basically a view into a potentially larger VMObject,
it was always necessary to include the Region starting offset when
accessing its underlying physical pages.
Until now, you had to do that manually, but this patch adds a simple
Region::physical_page() for read-only access and a physical_page_slot()
when you want a mutable reference to the RefPtr<PhysicalPage> itself.
A lot of code is simplified by making use of this.
Previously we blindly just called update_next_timer_due() when
ever we modified the timer list. Since we know the list is sorted
this is a bit wasteful, and we can do better.
This change refactors the code so we only update the next due time
when necessary. In places where it was possible the code was modified
to directly modify the next due time, instead of having to go to the
front of the list to fetch it.
The public consumers of the timer API shouldn't need to know
the how timer id's are tracked internally. Expose a typedef
instead to allow the internal implementation to be protected
from potential churn in the future.
It's also just good API design.
Utilize the new Thread::wait_on timeout parameter to implement
timeout support for FUTEX_WAIT.
As we compute the relative time from the user specified absolute
time, we try to delay that computation as long as possible before
we call into Thread::wait_on(..). To enable this a small bit of
refactoring was done pull futex_queue fetching out and timeout fetch
and calculation separation.
This change plumbs a new optional timeout option to wait_on.
The timeout is enabled by enqueing a timer on the timer queue
while we are waiting. We can then see if we were woken up or
timed out by checking if we are still on the wait queue or not.
The current API of add_timer makes it hard to use as
you are forced to do a bunch of time arithmetic at the
caller. Ideally we would have overloads for common time
types like timespec or timeval to keep the API as straight
forward as possible. This change moves us in that direction.
While I'm here, we should really also use the machines actual
ticks per second, instead of the OPTIMAL_TICKS_PER_SECOND_RATE.
This is a special case that was previously not implemented.
The idea is that you can dispatch a signal to all other processes
the calling process has access to.
There was some minor refactoring to make the self signal logic
into a function so it could easily be easily re-used from do_killall.
Previously, when returning from a pthread's start_routine, we would
segfault. Now we instead implicitly call pthread_exit as specified in
the standard.
pthread_create now creates a thread running the new
pthread_create_helper, which properly manages the calling and exiting
of the start_routine supplied to pthread_create. To accomplish this,
the thread's stack initialization has been moved out of
sys$create_thread and into the userspace function create_thread.
A Lock can now be held either in shared or exclusive mode. Multiple threads can
hold the same lock in shared mode at one time, but if any thread holds the lock
in exclusive mode, no other thread can hold it at the same time in either mode.
The next commit is going to make it bigger again by increasing the size of Lock,
so make use of bitfields to make sure FileDescription still fits into 64 bytes,
and so can still be allocated with the SlabAllocator.
We don't need a wrapper Function object that just forwards the timer
callback to the scheduler tick function. It already has the same
signature, so we can just plug it in directly. :^)
Same with the clock updating function.
The IOAPIC manual states that "Interrupt Mask-R/W. When this bit is 1,
the interrupt signal is masked. Edge-sensitive interrupts signaled on
a masked interrupt pin are ignored." - Therefore we have to ensure that
we disable interrupts globally with cli(), but also to ensure that we
invoke enable_irq() before sending the hardware command that generates
an IRQ almost immediately.
First, before this change, specifying 'force_pio' in the kernel
commandline was meaningless because we nevertheless set the DMA flag to
be enabled.
Also, we had a problem in which we used IO::repeated_out16() in PIO
write method. This might work on buggy emulators, but I suspect that on
real hardware this code will fail.
The most difficult problem was to restore the PIO read operation.
Apparently, it seems that we can't use IO::repeated_in16() here because
it will read zeroed data. Currently we rely on a simple loop that
invokes IO::in16() to a buffer. Also, the interrupt handling stage in
the PIO read method is moved to be handled inside the loop of reading
the requested sectors.
POSIX says, "Conforming applications should not assume that the returned
contents of the symbolic link are null-terminated."
If we do include the null terminator into the returning string, Python
believes it to actually be a part of the returned name, and gets unhappy
about that later. This suggests other systems Python runs in don't include
it, so let's do that too.
Also, make our userspace support non-null-terminated realpath().
Output address validation should be done for the tracer's address space
and not the tracee's.
Also use copy_to_user() instead of copy_from_user(). The two are really
identical at the moment, but maybe we can add some assertions to make
sure we're doing what we think we're doing.
Thanks to Sergey for spotting these!
We currently only care about debug exceptions that are triggered
by the single-step execution mode.
The debug exception is translated to a SIGTRAP, which can be caught
and handled by the tracing thread.
This memory range was set up using 2MB pages by the code in boot.S.
Because of that, the kernel image protection code didn't work, since it
assumed 4KB pages.
We now switch to 4KB pages during MemoryManager initialization. This
makes the kernel image protection code work correctly again. :^)
The syscall wrapper for ptrace needs to return the peeked value when
using PT_PEEK.
Because of this, the user has to check errno to detect an error in
PT_PEEK.
This commit changes the actual syscall's interface (only for PT_PEEK) to
allow the syscall wrapper to detect an error and change errno.
PT_SETTREGS sets the regsiters of the traced thread. It can only be
used when the tracee is stopped.
Also, refactor ptrace.
The implementation was getting long and cluttered the alraedy large
Process.cpp file.
This commit moves the bulk of the implementation to Kernel/Ptrace.cpp,
and factors out peek & poke to separate methods of the Process class.
This was a missing feature in the PT_TRACEME command.
This feature allows the tracer to interact with the tracee before the
tracee has started executing its program.
It will be useful for automatically inserting a breakpoint at a
debugged program's entry point.
Before this commit, m_blocker was only set to null in Thread::block,
after the thread has been unblocked.
Starting with this commit, m_blocker is also set to null in
Thread::unblock.
This change will allow us to implement a missing feature of the PT_TRACE
command of the ptrace syscall - stopping the traced thread when it
exits the execve syscall.
That feature will be implemented by sending a blocking SIGSTOP to the
traced thread after it has executed the execve logic and before it
starts executing the new program in userspace.
However, since Process::exec arranges the tss to return to userspace
(the so-called "yield-teleport"), the code in Thread::block that should
be run after the thread unblocks, and sets m_blocker to null, never
actually runs.
Setting m_blocker to null in Thread::unblock allows us to avoid an
incorrect state where the thread is in a Running state but conatins a
pointer to a Blocker.
PT_POKE writes a single word to the tracee's address space.
Some caveats:
- If the user requests to write to an address in a read-only region, we
temporarily change the page's protections to allow it.
- If the user requests to write to a region that's backed by a
SharedInodeVMObject, we replace the vmobject with a PrivateIndoeVMObject.
This patch adds the minherit() syscall originally invented by OpenBSD.
Only the MAP_INHERIT_ZERO mode is supported for now. If set on an mmap
region, that region will be zeroed out on fork().
We now store the previous thread state in m_stop_state for all
transitions to the Stopped state via Thread::set_state.
Fixes#1752 whereupon resuming a thread that was stopped with SIGTSTP,
the previous state of the thread is not remembered correctly, resulting
in m_stop_state == State::Invalid and the associated assertion fails.
These validate_elf_* methods really had no business being static
methods of ELF::Image. Now that the ELF namespace exists, it makes
sense to just move them to be free functions in the namespace.
The plan is to extend what currently is known as "CPUGraph" and let the
SystemServer spawn multiple instances of it - which then can show memory
or network usages as well :^)
Simply renaming the applet is the first step.
This commit is one step forward for pluggable driver modules.
Instead of creating instances of network adapter classes, we let
their detect() methods to figure out if there are existing devices
to initialize.
There was a frequently occurring pattern of "map this physical address
into kernel VM, then read from it, then unmap it again".
This new typed_map() encapsulates that logic by giving you back a
typed pointer to the kind of structure you're interested in accessing.
It returns a TypedMapping<T> that can be used mostly like a pointer.
When destroyed, the TypedMapping object will unmap the memory. :^)
If we don't support ACPI, just don't instantiate an ACPI parser.
This is way less confusing than having a special parser class whose
only purpose is to do nothing.
We now search for the RSDP in ACPI::initialize() instead of letting
the parser constructor do it. This allows us to defer the decision
to create a parser until we're sure we can make a useful one.
- Get rid of the PCI::Initializer object which was not serving any real
purpose or holding any data members.
- Move command line parsing from init to PCI::initialize().
Instead of nesting a bunch of heap allocations, just store them in
a simple HashMap<u16, MMIOSegment>.
Also fix a bunch of double hash lookups like this:
ASSERT(map.contains(key));
auto thing = map.get(key).value();
They now look like this instead:
auto thing = map.get(key);
ASSERT(thing.has_value());
- Make things const when they don't need to be non-const.
- Don't return AK::String when it's always a string literal anyway.
- Remove excessive get_ prefixes per coding style.
The PCI access layer was composed of a bunch of virtual functions that
did nothing but call other virtual functions. The first layer was never
overridden so there was no need for them to be virtual.
This patch removes the indirection and moves logic from PCI::Access
down into the various PCI::get_foo() helpers that were the sole users.
- If there is no VMWare backdoor, don't allocate memory for it.
- Remove the "unsupported" state, instead just don't instantiate.
- Move the command-line parsing from init to the driver.
- Move mouse packet reception from PS2MouseDevice to VMWareBackdoor.
The purpose of init() is to get multi-tasking up and running. We don't
want to do anything in init() that doesn't advance that goal.
This patch moves some things from init() to init_stage2(), and adds a
comment block explaining the split.
In contrast to the previous patchset that was reverted, this time we use
a "special" method to access a file with block size of 512 bytes (like
a harddrive essentially).
We were allowing this dangerous kind of thing:
RefPtr<Base> base;
RefPtr<Derived> derived = base;
This patch changes the {Nonnull,}RefPtr constructors so this is no
longer possible.
To downcast one of these pointers, there is now static_ptr_cast<T>:
RefPtr<Derived> derived = static_ptr_cast<Derived>(base);
Fixing this exposed a ton of cowboy-downcasts in various places,
which we're now forced to fix. :^)
This patch adds a way for a socket to ask to be routed through a
specific interface.
Currently, this option only applies to sending, however, it should also
apply to receiving...somehow :^)
This patch relaxes how we think about UDP packets being "for us" a bit;
the proper way to handle this would be to also check if the matched
socket has SO_BROADCAST set, but we don't have that :)
This commit adds a basic implementation of
the ptrace syscall, which allows one process
(the tracer) to control another process (the tracee).
While a process is being traced, it is stopped whenever a signal is
received (other than SIGCONT).
The tracer can start tracing another thread with PT_ATTACH,
which causes the tracee to stop.
From there, the tracer can use PT_CONTINUE
to continue the execution of the tracee,
or use other request codes (which haven't been implemented yet)
to modify the state of the tracee.
Additional request codes are PT_SYSCALL, which causes the tracee to
continue exection but stop at the next entry or exit from a syscall,
and PT_GETREGS which fethces the last saved register set of the tracee
(can be used to inspect syscall arguments and return value).
A special request code is PT_TRACE_ME, which is issued by the tracee
and causes it to stop when it calls execve and wait for the
tracer to attach.
Previosuly, if we sent a SIGCONT to a stopped thread
and then waitpid() with WSTOPPED on that thread before
the signal was dispatched,
then the WaitBlocker would first unblock (because the thread is stopped)
and only after that the thread would get the SIGCONT signal.
This would mean that when waitpid returns
the waitee is not stopped.
To fix this, we do not unblock the waiting thread
if the waitee thread has a pending SIGCONT.
The test runner currently depends on the bash port being installed.
If you have it, you can run the LibJS test suite inside Serenity
by simply entering /home/anon/js-tests and doing ./run-tests :^)
Instead of blindly setting masks, if we want to disable an IRQ and it's
already masked, we just return. The same happens if we want to enable an
IRQ and it's unmasked.
Setting the m_enabled variable to true or false can help
with monitoring the IRQHandler object(s) later, and there's no good
reason to have an if-else statement in those methods anyway.
Before this change, we did a non-specific EOI, which could lead to
problems with other IRQs that are handled in the PIC. Since the original
8259A datasheet permits such functionality and we are not losing any
functionality, this change is acceptable even though we don't experience
problems with the EOI currently.
This is not a complete fix, since spurious IRQs under heavy loads can
still occur. However, this fix limits the amount of spurious IRQs.
It is encouraged to provide a better fix in the future, probably
something that takes into account handling of PCI level-triggered
interrupts.
Now we don't send raw numbers, but we let the IRQController object to
figure out the correct IRQ number.
This helps in a situation when we have 2 or more IOAPICs, so if IOAPIC
1 is assigned for IRQs 0-23 and IOAPIC 2 is assigned for IRQs 24-47,
if an IRQHandler of IRQ 25 invokes disable() for example, it will call
his responsible IRQController (IOAPIC 2), and the IRQController will
subtract the IRQ number with his assigned offset, and the result is that
the second redirection entry in IOAPIC 2 will be masked.
We don't return blindly the IRQ controller's model(), if the Spurious
IRQ handler is installed in IOAPIC environment, it's misleading to
return "IOAPIC" string since IOAPIC doesn't really handle Spurious
IRQs, therefore we return a "" string.
FlyString is a flyweight string class that wraps a RefPtr<StringImpl>
known to be unique among the set of FlyStrings. The class is very
unoptimized at the moment.
When to use FlyString:
- When you want O(1) string comparison
- When you want to deduplicate a lot of identical strings
When not to use FlyString:
- For strings that don't need either of the above features
- For strings that are likely to be unique
This new subsystem includes better abstractions of how time will be
handled in the OS. We take advantage of the existing RTC timer to aid
in keeping time synchronized. This is standing in contrast to how we
handled time-keeping in the kernel, where the PIT was responsible for
that function in addition to update the scheduler about ticks.
With that new advantage, we can easily change the ticking dynamically
and still keep the time synchronized.
In the process context, we no longer use a fixed declaration of
TICKS_PER_SECOND, but we call the TimeManagement singleton class to
provide us the right value. This allows us to use dynamic ticking in
the future, a feature known as tickless kernel.
The scheduler no longer does by himself the calculation of real time
(Unix time), and just calls the TimeManagment singleton class to provide
the value.
Also, we can use 2 new boot arguments:
- the "time" boot argument accpets either the value "modern", or
"legacy". If "modern" is specified, the time management subsystem will
try to setup HPET. Otherwise, for "legacy" value, the time subsystem
will revert to use the PIT & RTC, leaving HPET disabled.
If this boot argument is not specified, the default pattern is to try
to setup HPET.
- the "hpet" boot argumet accepts either the value "periodic" or
"nonperiodic". If "periodic" is specified, the HPET will scan for
periodic timers, and will assert if none are found. If only one is
found, that timer will be assigned for the time-keeping task. If more
than one is found, both time-keeping task & scheduler-ticking task
will be assigned to periodic timers.
If this boot argument is not specified, the default pattern is to try
to scan for HPET periodic timers. This boot argument has no effect if
HPET is disabled.
In hardware context, PIT & RealTimeClock classes are merely inheriting
from the HardwareTimer class, and they allow to use the old i8254 (PIT)
and RTC devices, managing them via IO ports. By default, the RTC will be
programmed to a frequency of 1024Hz. The PIT will be programmed to a
frequency close to 1000Hz.
About HPET, depending if we need to scan for periodic timers or not,
we try to set a frequency close to 1000Hz for the time-keeping timer
and scheduler-ticking timer. Also, if possible, we try to enable the
Legacy replacement feature of the HPET. This feature if exists,
instructs the chipset to disconnect both i8254 (PIT) and RTC.
This behavior is observable on QEMU, and was verified against the source
code:
ce967e2f33
The HPETComparator class is inheriting from HardwareTimer class, and is
responsible for an individual HPET comparator, which is essentially a
timer. Therefore, it needs to call the singleton HPET class to perform
HPET-related operations.
The new abstraction of Hardware timers brings an opportunity of more new
features in the foreseeable future. For example, we can change the
callback function of each hardware timer, thus it makes it possible to
swap missions between hardware timers, or to allow to use a hardware
timer for other temporary missions (e.g. calibrating the LAPIC timer,
measuring the CPU frequency, etc).