We don't need a wrapper Function object that just forwards the timer
callback to the scheduler tick function. It already has the same
signature, so we can just plug it in directly. :^)
Same with the clock updating function.
The IOAPIC manual states: "Interrupt Mask-R/W. When this bit is 1,
the interrupt signal is masked. Edge-sensitive interrupts signaled on
a masked interrupt pin are ignored." Therefore we have to disable
interrupts globally with cli(), and also make sure that we invoke
enable_irq() before sending a hardware command that generates an IRQ
almost immediately. Otherwise the edge can arrive while the pin is
still masked, and it is lost.
First, before this change, specifying 'force_pio' on the kernel
command line was meaningless because we set the DMA flag to enabled
regardless.
Also, we used IO::repeated_out16() in the PIO write method. This might
work on buggy emulators, but I suspect that this code will fail on real
hardware.
The most difficult problem was restoring the PIO read operation. It
seems that we can't use IO::repeated_in16() here because it reads
zeroed data. Instead, we now rely on a simple loop that invokes
IO::in16() into a buffer. Also, the interrupt handling stage of the PIO
read method is moved inside the loop that reads the requested sectors.
POSIX says, "Conforming applications should not assume that the returned
contents of the symbolic link are null-terminated."
If we include the null terminator in the returned string, Python
believes it to actually be part of the returned name, and gets unhappy
about that later. This suggests that other systems Python runs on don't
include it, so let's do the same.
Also, make our userspace support non-null-terminated realpath().
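On the caller's side, the POSIX contract means the only reliable length
is readlink()'s return value. The sketch below is plain host userspace
code, not Serenity's LibC; read_link_target() is a hypothetical helper.
It builds the string from the returned byte count and retries with a
larger buffer when the result may have been truncated.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>
#include <vector>

// Reads a symlink target without assuming readlink() null-terminates.
// Returns an empty string on error (errno is left as set by readlink()).
std::string read_link_target(const char* path)
{
    std::vector<char> buffer(64);
    for (;;) {
        ssize_t length = readlink(path, buffer.data(), buffer.size());
        if (length < 0)
            return {};
        // A result that fills the whole buffer may have been truncated,
        // so grow the buffer and try again.
        if (static_cast<size_t>(length) < buffer.size())
            return std::string(buffer.data(), static_cast<size_t>(length));
        buffer.resize(buffer.size() * 2);
    }
}
```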
Output address validation should be done for the tracer's address space
and not the tracee's.
Also use copy_to_user() instead of copy_from_user(). The two are
functionally identical at the moment, but we may add some assertions
later to make sure we're doing what we think we're doing.
Thanks to Sergey for spotting these!
We currently only care about debug exceptions that are triggered
by the single-step execution mode.
The debug exception is translated to a SIGTRAP, which can be caught
and handled by the tracing thread.
This memory range was set up using 2MB pages by the code in boot.S.
Because of that, the kernel image protection code didn't work, since it
assumed 4KB pages.
We now switch to 4KB pages during MemoryManager initialization. This
makes the kernel image protection code work correctly again. :^)
The syscall wrapper for ptrace needs to return the peeked value when
using PT_PEEK.
Because of this, the user has to check errno to detect an error in
PT_PEEK.
This commit changes the actual syscall's interface (only for PT_PEEK)
so that the syscall wrapper can detect an error and set errno.
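The wrapper side of this convention can be sketched as below. This is an
illustration, not the actual LibC code: the kernel entry point is
modelled as a hypothetical function pointer that writes the peeked word
through an out-pointer and returns 0 or a negative errno value. Because
0xffffffff is also a legitimate peeked value, callers should set errno
to 0 before calling and check it afterwards.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Hypothetical kernel entry point: writes the peeked word to *out and
// returns 0, or returns a negative errno value on failure.
using PeekSyscall = int (*)(int pid, uintptr_t address, uint32_t* out);

// Userspace wrapper in the style of ptrace(PT_PEEK, ...): the peeked word
// is the return value, so failures can only be reported through errno.
uint32_t ptrace_peek(PeekSyscall syscall, int pid, uintptr_t address)
{
    uint32_t value = 0;
    int rc = syscall(pid, address, &value);
    if (rc < 0) {
        errno = -rc;
        return static_cast<uint32_t>(-1);
    }
    return value;
}
```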
PT_SETREGS sets the registers of the traced thread. It can only be
used when the tracee is stopped.
Also, refactor ptrace.
The implementation was getting long and cluttered the already large
Process.cpp file.
This commit moves the bulk of the implementation to Kernel/Ptrace.cpp,
and factors out peek & poke into separate methods of the Process class.
This was a missing feature in the PT_TRACE_ME command.
This feature allows the tracer to interact with the tracee before the
tracee has started executing its program.
It will be useful for automatically inserting a breakpoint at a
debugged program's entry point.
Before this commit, m_blocker was only set to null in Thread::block,
after the thread has been unblocked.
Starting with this commit, m_blocker is also set to null in
Thread::unblock.
This change will allow us to implement a missing feature of the
PT_TRACE_ME command of the ptrace syscall: stopping the traced thread
when it exits the execve syscall.
That feature will be implemented by sending a blocking SIGSTOP to the
traced thread after it has executed the execve logic and before it
starts executing the new program in userspace.
However, since Process::exec arranges the tss to return to userspace
(the so-called "yield-teleport"), the code in Thread::block that should
be run after the thread unblocks, and sets m_blocker to null, never
actually runs.
Setting m_blocker to null in Thread::unblock allows us to avoid an
incorrect state where the thread is in the Running state but contains a
pointer to a Blocker.
PT_POKE writes a single word to the tracee's address space.
Some caveats:
- If the user requests to write to an address in a read-only region, we
temporarily change the page's protections to allow it.
- If the user requests to write to a region that's backed by a
SharedInodeVMObject, we replace the vmobject with a PrivateInodeVMObject.
This patch adds the minherit() syscall originally invented by OpenBSD.
Only the MAP_INHERIT_ZERO mode is supported for now. If set on an mmap
region, that region will be zeroed out on fork().
We now store the previous thread state in m_stop_state for all
transitions to the Stopped state via Thread::set_state.
Fixes #1752, where, upon resuming a thread that was stopped with
SIGTSTP, the previous state of the thread was not remembered correctly,
resulting in m_stop_state == State::Invalid and a failed assertion.
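The idea can be shown with a simplified toy model. The Thread and State
types below are illustrative stand-ins, not the kernel's classes. Every
transition into Stopped records the previous state, so resuming
(e.g. on SIGCONT) restores it instead of finding State::Invalid. The
guard against Stopped -> Stopped overwriting the saved state is my own
assumption for the sketch.

```cpp
#include <cassert>

enum class State { Invalid, Runnable, Running, Blocked, Stopped };

// Toy model: every transition *into* Stopped remembers where we came from.
class Thread {
public:
    State state() const { return m_state; }

    void set_state(State new_state)
    {
        // Assumption: a redundant Stopped -> Stopped transition must not
        // clobber the remembered state.
        if (new_state == State::Stopped && m_state != State::Stopped)
            m_stop_state = m_state;
        m_state = new_state;
    }

    // Called on SIGCONT: go back to whatever we were doing before the stop.
    void resume_from_stopped()
    {
        assert(m_state == State::Stopped);
        assert(m_stop_state != State::Invalid);
        m_state = m_stop_state;
        m_stop_state = State::Invalid;
    }

private:
    State m_state { State::Runnable };
    State m_stop_state { State::Invalid };
};
```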
These validate_elf_* methods really had no business being static
methods of ELF::Image. Now that the ELF namespace exists, it makes
sense to just move them to be free functions in the namespace.
The plan is to extend what currently is known as "CPUGraph" and let the
SystemServer spawn multiple instances of it - which then can show memory
or network usages as well :^)
Simply renaming the applet is the first step.
This commit is one step forward for pluggable driver modules.
Instead of creating instances of network adapter classes, we let
their detect() methods figure out whether there are existing devices
to initialize.
There was a frequently occurring pattern of "map this physical address
into kernel VM, then read from it, then unmap it again".
This new typed_map() encapsulates that logic by giving you back a
typed pointer to the kind of structure you're interested in accessing.
It returns a TypedMapping<T> that can be used mostly like a pointer.
When destroyed, the TypedMapping object will unmap the memory. :^)
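The RAII shape of TypedMapping<T> can be sketched in portable C++. In
this sketch the kernel's map and unmap primitives are replaced by
hypothetical callbacks so that it runs in userspace. The real
typed_map() takes a physical address and talks to the MemoryManager
directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-ins for the kernel's "map physical range" and
// "unmap" primitives.
using MapFn = std::function<void*(uintptr_t physical, size_t length)>;
using UnmapFn = std::function<void(void*)>;

// Behaves mostly like a T*, and unmaps the memory when destroyed.
template<typename T>
class TypedMapping {
public:
    TypedMapping(T* pointer, UnmapFn unmap)
        : m_pointer(pointer)
        , m_unmap(std::move(unmap))
    {
    }
    TypedMapping(TypedMapping&& other) noexcept
        : m_pointer(std::exchange(other.m_pointer, nullptr))
        , m_unmap(std::move(other.m_unmap))
    {
    }
    TypedMapping(const TypedMapping&) = delete;
    TypedMapping& operator=(const TypedMapping&) = delete;
    ~TypedMapping()
    {
        if (m_pointer)
            m_unmap(m_pointer);
    }

    T* operator->() const { return m_pointer; }
    T& operator*() const { return *m_pointer; }

private:
    T* m_pointer { nullptr };
    UnmapFn m_unmap;
};

template<typename T>
TypedMapping<T> typed_map(uintptr_t physical, const MapFn& map, UnmapFn unmap)
{
    return TypedMapping<T>(static_cast<T*>(map(physical, sizeof(T))), std::move(unmap));
}
```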
If we don't support ACPI, just don't instantiate an ACPI parser.
This is way less confusing than having a special parser class whose
only purpose is to do nothing.
We now search for the RSDP in ACPI::initialize() instead of letting
the parser constructor do it. This allows us to defer the decision
to create a parser until we're sure we can make a useful one.
- Get rid of the PCI::Initializer object which was not serving any real
purpose or holding any data members.
- Move command line parsing from init to PCI::initialize().
Instead of nesting a bunch of heap allocations, just store them in
a simple HashMap<u16, MMIOSegment>.
Also fix a bunch of double hash lookups like this:
ASSERT(map.contains(key));
auto thing = map.get(key).value();
They now look like this instead:
auto thing = map.get(key);
ASSERT(thing.has_value());
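The same single-lookup pattern can be expressed with the standard
library, using std::unordered_map and std::optional as stand-ins for
AK::HashMap and AK::Optional. The helper name lookup() is hypothetical.
The key is hashed once by find(), and the caller then checks the
optional result instead of calling contains() followed by get().

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>

// Single lookup: hash the key once, then let the caller check the result.
template<typename Map>
std::optional<typename Map::mapped_type> lookup(const Map& map, const typename Map::key_type& key)
{
    auto it = map.find(key);
    if (it == map.end())
        return std::nullopt;
    return it->second;
}
```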
- Make things const when they don't need to be non-const.
- Don't return AK::String when it's always a string literal anyway.
- Remove excessive get_ prefixes per coding style.
The PCI access layer was composed of a bunch of virtual functions that
did nothing but call other virtual functions. The first layer was never
overridden so there was no need for them to be virtual.
This patch removes the indirection and moves logic from PCI::Access
down into the various PCI::get_foo() helpers that were the sole users.
- If there is no VMWare backdoor, don't allocate memory for it.
- Remove the "unsupported" state, instead just don't instantiate.
- Move the command-line parsing from init to the driver.
- Move mouse packet reception from PS2MouseDevice to VMWareBackdoor.
The purpose of init() is to get multi-tasking up and running. We don't
want to do anything in init() that doesn't advance that goal.
This patch moves some things from init() to init_stage2(), and adds a
comment block explaining the split.
In contrast to the previous patchset that was reverted, this time we use
a "special" method to access a file with a block size of 512 bytes
(essentially like a hard drive).
We were allowing this dangerous kind of thing:
RefPtr<Base> base;
RefPtr<Derived> derived = base;
This patch changes the {Nonnull,}RefPtr constructors so this is no
longer possible.
To downcast one of these pointers, there is now static_ptr_cast<T>:
RefPtr<Derived> derived = static_ptr_cast<Derived>(base);
Fixing this exposed a ton of cowboy-downcasts in various places,
which we're now forced to fix. :^)
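The constructor constraint can be reproduced on a minimal,
non-refcounting pointer wrapper. Ptr and this static_ptr_cast are
illustrative, not AK's code. The converting constructor is only enabled
when U* converts implicitly to T*, so upcasts still work, and a downcast
has to be spelled out explicitly.

```cpp
#include <cassert>
#include <type_traits>

// Minimal stand-in for RefPtr (no reference counting) to show the
// constructor constraint.
template<typename T>
class Ptr {
public:
    Ptr() = default;
    explicit Ptr(T* pointer)
        : m_pointer(pointer)
    {
    }

    // Only allow implicit conversion when U* -> T* is itself implicit,
    // i.e. upcasts (Derived -> Base), never downcasts.
    template<typename U, typename = std::enable_if_t<std::is_convertible<U*, T*>::value>>
    Ptr(const Ptr<U>& other)
        : m_pointer(other.ptr())
    {
    }

    T* ptr() const { return m_pointer; }

private:
    T* m_pointer { nullptr };
};

// Explicit, greppable downcast in the spirit of static_ptr_cast<T>.
template<typename T, typename U>
Ptr<T> static_ptr_cast(const Ptr<U>& other)
{
    return Ptr<T>(static_cast<T*>(other.ptr()));
}
```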
This patch adds a way for a socket to ask to be routed through a
specific interface.
Currently, this option only applies to sending; however, it should also
apply to receiving... somehow :^)
This patch relaxes how we think about UDP packets being "for us" a bit;
the proper way to handle this would be to also check if the matched
socket has SO_BROADCAST set, but we don't have that :)
This commit adds a basic implementation of the ptrace syscall, which
allows one process (the tracer) to control another process (the
tracee).
While a process is being traced, it is stopped whenever a signal is
received (other than SIGCONT).
The tracer can start tracing another thread with PT_ATTACH, which
causes the tracee to stop.
From there, the tracer can use PT_CONTINUE to continue the execution of
the tracee, or use other request codes (which haven't been implemented
yet) to modify the state of the tracee.
Additional request codes are PT_SYSCALL, which causes the tracee to
continue execution but stop at the next entry or exit from a syscall,
and PT_GETREGS, which fetches the last saved register set of the tracee
(it can be used to inspect syscall arguments and return values).
A special request code is PT_TRACE_ME, which is issued by the tracee
and causes it to stop when it calls execve and wait for the tracer to
attach.
Previously, if we sent a SIGCONT to a stopped thread and then called
waitpid() with WSTOPPED on that thread before the signal was dispatched,
the WaitBlocker would first unblock (because the thread is stopped) and
only after that would the thread get the SIGCONT signal.
This meant that when waitpid() returned, the waitee was no longer
stopped.
To fix this, we do not unblock the waiting thread if the waitee thread
has a pending SIGCONT.
The test runner currently depends on the bash port being installed.
If you have it, you can run the LibJS test suite inside Serenity
by simply entering /home/anon/js-tests and doing ./run-tests :^)
Instead of blindly setting masks, if we want to disable an IRQ and it's
already masked, we just return. The same happens if we want to enable an
IRQ and it's already unmasked.
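A toy model of the early-return logic follows. MaskRegister is
hypothetical, and the write counter stands in for the port I/O that the
change avoids. When the requested mask state already matches, nothing is
written.

```cpp
#include <cassert>
#include <cstdint>

// Toy 16-line mask register (bit set = masked); assumes irq < 16.
// write_count stands in for the port I/O we want to skip.
class MaskRegister {
public:
    void disable(uint8_t irq)
    {
        uint16_t bit = static_cast<uint16_t>(1u << irq);
        if (m_mask & bit)
            return; // already masked, nothing to do
        m_mask |= bit;
        ++write_count;
    }

    void enable(uint8_t irq)
    {
        uint16_t bit = static_cast<uint16_t>(1u << irq);
        if (!(m_mask & bit))
            return; // already unmasked, nothing to do
        m_mask &= static_cast<uint16_t>(~bit);
        ++write_count;
    }

    bool is_masked(uint8_t irq) const { return m_mask & (1u << irq); }

    int write_count { 0 };

private:
    uint16_t m_mask { 0xffff }; // everything masked initially
};
```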
Setting the m_enabled variable to true or false can help
with monitoring the IRQHandler object(s) later, and there's no good
reason to have an if-else statement in those methods anyway.
Before this change, we sent a non-specific EOI, which could lead to
problems with other IRQs that are handled by the PIC. Since the original
8259A datasheet permits specific EOIs and we are not losing any
functionality, this change is acceptable even though we don't currently
experience problems with the EOI.
This is not a complete fix, since spurious IRQs can still occur under
heavy load. However, this fix reduces the number of spurious IRQs.
A better fix should be provided in the future, probably something that
takes the handling of PCI level-triggered interrupts into account.
Now we don't send raw numbers; instead, we let the IRQController object
figure out the correct IRQ number.
This helps when we have 2 or more IOAPICs. For example, if IOAPIC 1 is
assigned IRQs 0-23 and IOAPIC 2 is assigned IRQs 24-47, and an
IRQHandler for IRQ 25 invokes disable(), it will call its responsible
IRQController (IOAPIC 2), which subtracts its assigned offset (24) from
the IRQ number. As a result, the second redirection entry in IOAPIC 2
will be masked.
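The worked example above can be modelled as follows. IOAPICModel and
responsible_controller() are hypothetical names, not the kernel's
classes. Each controller owns a contiguous range of IRQ numbers
starting at its base and converts a global IRQ number into its own
redirection-entry index, so IRQ 25 ends up masking entry 1 of the
second controller.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Each controller owns IRQs [gsi_base, gsi_base + redirection_entries).
struct IOAPICModel {
    uint32_t gsi_base;
    uint32_t redirection_entries;
    std::vector<bool> masked;

    IOAPICModel(uint32_t base, uint32_t entries)
        : gsi_base(base)
        , redirection_entries(entries)
        , masked(entries, false)
    {
    }

    bool handles(uint32_t irq) const { return irq >= gsi_base && irq < gsi_base + redirection_entries; }

    // The controller, not the caller, subtracts its own offset.
    void disable(uint32_t irq) { masked.at(irq - gsi_base) = true; }
};

IOAPICModel& responsible_controller(std::vector<IOAPICModel>& controllers, uint32_t irq)
{
    for (auto& controller : controllers) {
        if (controller.handles(irq))
            return controller;
    }
    throw std::out_of_range("no controller handles this IRQ");
}
```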
We don't blindly return the IRQ controller's model(). If the spurious
IRQ handler is installed in an IOAPIC environment, it's misleading to
return the "IOAPIC" string, since the IOAPIC doesn't really handle
spurious IRQs; therefore we return an empty string.
FlyString is a flyweight string class that wraps a RefPtr<StringImpl>
known to be unique among the set of FlyStrings. The class is very
unoptimized at the moment.
When to use FlyString:
- When you want O(1) string comparison
- When you want to deduplicate a lot of identical strings
When not to use FlyString:
- For strings that don't need either of the above features
- For strings that are likely to be unique
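The core of the flyweight idea fits in a few lines. This is a
simplified interning sketch, not AK's FlyString, which wraps a
RefPtr<StringImpl>. Each distinct value is stored once in a global
table, so equality is a pointer comparison. For simplicity, entries
here are never freed.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Every distinct string value is stored once in a global table, and a
// FlyString is just a pointer to that table entry.
class FlyString {
public:
    explicit FlyString(const std::string& value)
        : m_impl(&*table().insert(value).first)
    {
    }

    // O(1): equal contents imply the same table entry.
    bool operator==(const FlyString& other) const { return m_impl == other.m_impl; }
    bool operator!=(const FlyString& other) const { return m_impl != other.m_impl; }
    const std::string& view() const { return *m_impl; }

private:
    static std::unordered_set<std::string>& table()
    {
        // Element addresses in unordered_set stay valid across rehashing.
        static std::unordered_set<std::string> s_table;
        return s_table;
    }

    const std::string* m_impl;
};
```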
This new subsystem includes better abstractions of how time is
handled in the OS. We take advantage of the existing RTC timer to help
keep time synchronized. This contrasts with how we previously handled
time-keeping in the kernel, where the PIT was responsible for that
function in addition to updating the scheduler about ticks.
With this new advantage, we can easily change the ticking dynamically
and still keep time synchronized.
In the process context, we no longer use a fixed declaration of
TICKS_PER_SECOND; instead we call the TimeManagement singleton class to
provide the right value. This allows us to use dynamic ticking in the
future, a feature known as a tickless kernel.
The scheduler no longer calculates real time (Unix time) by itself, and
just calls the TimeManagement singleton class to provide the value.
Also, we can use 2 new boot arguments:
- The "time" boot argument accepts either "modern" or "legacy". If
"modern" is specified, the time management subsystem will try to set up
the HPET. If "legacy" is specified, the time subsystem will fall back to
the PIT & RTC, leaving the HPET disabled.
If this boot argument is not specified, the default is to try to set up
the HPET.
- The "hpet" boot argument accepts either "periodic" or "nonperiodic".
If "periodic" is specified, the HPET will scan for periodic timers, and
will assert if none are found. If only one is found, that timer will be
assigned the time-keeping task. If more than one is found, both the
time-keeping task & the scheduler-ticking task will be assigned to
periodic timers.
If this boot argument is not specified, the default is to scan for HPET
periodic timers. This boot argument has no effect if the HPET is
disabled.
In the hardware context, the PIT & RealTimeClock classes simply inherit
from the HardwareTimer class, and allow using the old i8254 (PIT) and
RTC devices, managing them via IO ports. By default, the RTC will be
programmed to a frequency of 1024Hz. The PIT will be programmed to a
frequency close to 1000Hz.
For the HPET, depending on whether we need to scan for periodic timers,
we try to set a frequency close to 1000Hz for the time-keeping timer
and the scheduler-ticking timer. Also, if possible, we try to enable the
Legacy Replacement feature of the HPET. This feature, if present,
instructs the chipset to disconnect both the i8254 (PIT) and the RTC.
This behavior is observable on QEMU, and was verified against the source
code:
ce967e2f33
The HPETComparator class inherits from the HardwareTimer class, and is
responsible for an individual HPET comparator, which is essentially a
timer. Therefore, it needs to call the singleton HPET class to perform
HPET-related operations.
The new abstraction of hardware timers opens the door to more features
in the foreseeable future. For example, we can change the callback
function of each hardware timer, which makes it possible to swap tasks
between hardware timers, or to use a hardware timer for other temporary
tasks (e.g. calibrating the LAPIC timer, measuring the CPU frequency,
etc).
A new IP address or a new network mask can be specified in the command
line arguments of ifconfig to replace the old values of a given network
adapter. Additionally, more information is being printed for each adapter.
This is similar to 28e1da344d
and 4dd4dd2f3c.
The crux is that wait verifies that the outvalue (siginfo* infop)
is writable *before* waiting, and writes to it *after* waiting.
In the meantime, a concurrent thread can make the output region
unwritable, e.g. by deallocating it.
This is similar to 28e1da344d
and 4dd4dd2f3c.
The crux is that select verifies that the file descriptor sets
are writable *before* blocking, and writes to them *after* blocking.
In the meantime, a concurrent thread can make the output buffer
unwritable, e.g. by deallocating it.
This was causing some obvious-in-hindsight but hard to spot bugs where
we'd implicitly convert the bool to an integer type and carry on with
the number 1 instead of the actual value().
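The failure mode is easy to reproduce with a toy type. The types below
are illustrative only, and this is not necessarily how the commit fixed
it. An implicit operator bool lets `int x = opt;` compile and yield 1.
One standard remedy, which std::optional uses, is an explicit operator
bool. It keeps `if (opt)` working but rejects the silent integer
conversion.

```cpp
#include <cassert>
#include <type_traits>

// Implicit operator bool: `int x = opt;` compiles and yields 0 or 1
// instead of the contained value.
struct ImplicitOptional {
    int value;
    bool has_value;
    operator bool() const { return has_value; }
};

// Explicit operator bool: `if (opt)` still works, `int x = opt;` does not.
struct ExplicitOptional {
    int value;
    bool has_value;
    explicit operator bool() const { return has_value; }
};

static_assert(std::is_convertible<ImplicitOptional, int>::value, "the bug: bool sneaks into int");
static_assert(!std::is_convertible<ExplicitOptional, int>::value, "explicit operator bool prevents it");
```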
Also, InterruptDisablers were added to prevent critical functions from
being interrupted. In addition, interrupt numbers are now abstracted
from IDT offsets, allowing a better routing scheme when using IOAPICs
for interrupt redirection.
This was caught by running all crash tests with "crash -A".
Basically, non-readable pages need to not be mapped *at all* so that
a "page not present" exception is provoked on access.
Unfortunately x86 does not support write-only mappings, so this is
the best we can do.
Fixes #1336.
This is a complete fix of clock_nanosleep, because the thread holds the
process lock again when returning from sleep()/sleep_until().
Therefore, no further concurrent invalidation can occur.
Now it actually defaults to "a < b" comparison, instead of forcing you
to provide a trivial less-than comparator. Also you can pass in any
collection type that has .begin() and .end() and we'll sort it for you.
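The interface described above can be sketched as follows. This is a
plain Lomuto-partition quicksort, not AK's implementation. The
comparator defaults to std::less<> ("a < b"), and a convenience
overload accepts any collection that std::begin()/std::end() work on.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

// Sorts [begin, end) in place; LessThan defaults to "a < b".
template<typename Iterator, typename LessThan = std::less<>>
void quick_sort(Iterator begin, Iterator end, LessThan less_than = {})
{
    if (std::distance(begin, end) < 2)
        return;
    // Lomuto partition with the last element as the pivot.
    Iterator pivot = std::prev(end);
    Iterator store = begin;
    for (Iterator it = begin; it != pivot; ++it) {
        if (less_than(*it, *pivot)) {
            std::iter_swap(it, store);
            ++store;
        }
    }
    std::iter_swap(store, pivot);
    quick_sort(begin, store, less_than);
    quick_sort(std::next(store), end, less_than);
}

// Convenience overload: any collection with begin()/end().
template<typename Collection, typename LessThan = std::less<>>
void quick_sort(Collection& collection, LessThan less_than = {})
{
    using std::begin;
    using std::end;
    quick_sort(begin(collection), end(collection), less_than);
}
```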
Also, duplicate data in dbg() and klog() calls was removed.
In addition, leaking virtual addresses to the kernel log is prevented.
This is done by replacing the kprintf() calls that printed the leaked
data with dbg() calls instead.
Also, other kprintf() calls were replaced with klog().
This was only used by the mechanism for mapping executables into each
process's own address space. Now that we remap executables on demand
when needed for symbolication, this can go away.
Previously we would map the entire executable of a program in its own
address space (but make it unavailable to userspace code.)
This patch removes that and changes the symbolication code to remap
the executable on demand (and into the kernel's own address space
instead of the process address space.)
This opens up a couple of further simplifications that will follow.
I had the wrong idea about this. Thanks to Sergey for pointing it out!
Here's what he says (reproduced for posterity):
> Private mappings protect the underlying file from the changes made by
> you, not the other way around. To quote POSIX, "If MAP_PRIVATE is
> specified, modifications to the mapped data by the calling process
> shall be visible only to the calling process and shall not change the
> underlying object. It is unspecified whether modifications to the
> underlying object done after the MAP_PRIVATE mapping is established
> are visible through the MAP_PRIVATE mapping." In practice that means
> that the pages that were already paged in don't get updated when the
> underlying file changes, and the pages that weren't paged in yet will
> load the latest data at that moment.
> The only thing MAP_FILE | MAP_PRIVATE is really useful for is mapping
> a library and performing relocations; it's definitely useless (and
> actively harmful for the system memory usage) if you only read from
> the file.
This effectively reverts e2697c2ddd.
This patch reduces the number of code paths that lead to the allocation
of a Region object. It's quite hard to follow the various ways in which
this can happen, so this is an effort to simplify.
When stopping a thread with the SIGSTOP signal, we now store the thread
state in Thread::m_stop_state. That state is then restored on SIGCONT.
This fixes an issue where previously-blocked threads would unblock
upon resume. Now they simply resume in the Blocked state, and it's up
to the regular unblocking mechanism to unblock them.
Fixes #1326.
This will be a memory usage pessimization until we actually implement
CoW sharing of the memory pages with SharedInodeVMObject.
However, it's a huge architectural improvement, so let's take it and
improve on this incrementally.
fork() should still be neutral, since all private mappings are CoW'ed.
It's now up to the caller to provide a VMObject when constructing a new
Region object. This will make it easier to handle things going wrong,
like allocation failures, etc.
When forking a process, we now turn all of the private inode-backed
mmap() regions into copy-on-write regions in both the parent and child.
This patch also removes an assertion that becomes irrelevant.
If we wrote anything we should just inform userspace that we did,
and not worry about the error code. Userspace can call us again if
it wants, and we'll give them the error then.
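The rule can be sketched as a chunked write loop. write_all_chunks() and
the WriteChunk callback are hypothetical stand-ins for the kernel's
write path. If an error occurs after some bytes have already been
written, the partial count is returned and the error is dropped. A
later call will run into the error again and report it then.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>
#include <sys/types.h>

// Hypothetical per-chunk writer: returns bytes written (> 0) or -errno.
using WriteChunk = std::function<ssize_t(const char* data, size_t size)>;

// Report partial progress in preference to an error code.
ssize_t write_all_chunks(const WriteChunk& write_chunk, const char* data, size_t size, size_t chunk_size)
{
    size_t total = 0;
    while (total < size) {
        size_t this_chunk = size - total < chunk_size ? size - total : chunk_size;
        ssize_t rc = write_chunk(data + total, this_chunk);
        if (rc < 0)
            return total > 0 ? static_cast<ssize_t>(total) : rc;
        if (rc == 0)
            break;
        total += static_cast<size_t>(rc);
    }
    return static_cast<ssize_t>(total);
}
```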
We don't have to log the process name/PID/TID, dbg() automatically adds
that as a prefix to every line.
Also we don't have to do .characters() on Strings passed to dbg() :^)
The IRQController object is RefCounted, and is shared between the
InterruptManagement class & the IRQ handler classes.
IRQHandler, SharedIRQHandler & SpuriousInterruptHandler classes
use a responsible IRQ controller directly instead of calling
InterruptManagement for disable(), enable() or eoi().
Also, the initialization process of InterruptManagement is
simplified, so it doesn't rely on an ACPI parser to be initialized.
More namespaces have been added to organize the declarations
in a more sensible way.
Also, a namespace StaticParsing has been added to allow early
access to ACPI tables.
You can now mmap a file as private and writable, and the changes you
make will only be visible to you.
This works because internally a MAP_PRIVATE region is backed by a
unique PrivateInodeVMObject instead of using the globally shared
SharedInodeVMObject like we always did before. :^)
Fixes #1045.
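The semantics being implemented are the standard POSIX ones, and they
can be observed on any POSIX host. The sketch below uses Linux/POSIX
mmap(), not Serenity-specific APIs, and the helper name is
hypothetical. It maps a file MAP_PRIVATE and writable, scribbles over
the mapping, and reads the file back to show that it is unchanged.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Maps `path` MAP_PRIVATE with PROT_WRITE, overwrites the mapping, and
// returns the file's contents as seen through pread() afterwards.
std::string write_through_private_mapping(const char* path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return {};
    off_t size = lseek(fd, 0, SEEK_END);
    void* map = mmap(nullptr, static_cast<size_t>(size), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return {};
    }
    // This write only touches our private copy-on-write pages.
    std::memset(map, 'X', static_cast<size_t>(size));
    munmap(map, static_cast<size_t>(size));

    std::string contents(static_cast<size_t>(size), '\0');
    ssize_t n = pread(fd, &contents[0], contents.size(), 0);
    close(fd);
    if (n < 0)
        return {};
    contents.resize(static_cast<size_t>(n));
    return contents;
}
```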
We now have PrivateInodeVMObject and SharedInodeVMObject, corresponding
to MAP_PRIVATE and MAP_SHARED respectively.
Note that PrivateInodeVMObject is not used yet.
Add an extra out-parameter to shbuf_get() that receives the size of the
shared buffer. That way we don't need to make a separate syscall to
get the size, which we always did immediately after.
This feels a lot more consistent and Unixy:
create_shared_buffer() => shbuf_create()
share_buffer_with() => shbuf_allow_pid()
share_buffer_globally() => shbuf_allow_all()
get_shared_buffer() => shbuf_get()
release_shared_buffer() => shbuf_release()
seal_shared_buffer() => shbuf_seal()
get_shared_buffer_size() => shbuf_get_size()
Also, "shared_buffer_id" is shortened to "shbuf_id" all around.
Now, before we set an FBResolution, we check whether the BXVGA device is
capable of setting the requested resolution.
If not, we revert to the previous resolution and return an error to
userspace.
Fixes #451.