There was a race window in a bunch of syscalls between calling
Thread::from_tid() and checking if the found thread was in the same
process as the calling thread.
If the found thread object was destroyed at that point, there was a
use-after-free that could be exploited by filling the kernel heap with
something that looked like a thread object.
While I was bringing up multitasking, I put the current PID in the SS2
(ring 2 stack segment) slot of the TSS. This was so I could see which
PID was currently running when just inspecting the CPU state.
Memory validation is used to verify that user syscalls are allowed to
access a given memory range. Ring 0 threads never make syscalls, and
so will never end up in validation anyway.
The reason we were allowing kmalloc memory accesses is because kernel
thread stacks used to be allocated in kmalloc memory. Since that's no
longer the case, we can stop making exceptions for kmalloc in the
validation code.
Move timeout management to the ReadBlocker and WriteBlocker classes.
Also get rid of the specialized ReceiveBlocker since it no longer does
anything that ReadBlocker can't do.
Vector::ensure_capacity() makes sure the underlying vector buffer can
contain all the data, but it doesn't update the Vector::size().
As a result, writev() would simply collect all the buffers to write,
and then do nothing.
It was possible to read uninitialized kernel memory via getsockname().
Of course, kmalloc() is a good boy and scrubs new allocations with 0xBB
so all you got was a bunch of 0xBB.
This implementation uses the new helper method of Bitmap called
find_longest_range_of_unset_bits. This method looks for the biggest
range of contiguous bits unset in the bitmap and returns the start of
the range back to the caller.
We should use dbg() instead of dbgprintf() as much as possible to
protect ourselves against format string bugs. Here's a bunch of
conversions in Ext2FS.
Move all the fork-specific inheritance logic to sys$fork(), and all the
stuff for setting up stdio for non-fork ring 3 processes moves to
Process::create_user_process().
Also: we were setting up the PGID, SID and umask twice. Also the code
for copying the open file descriptors was overly complicated. Now it's
just a simple Vector copy assignment. :^)
Since these are not part of the system call convention, we don't care
what userspace had in there. Might as well scrub it before entering
the kernel.
I would scrub EBP too, but that breaks the comfy kernel-thru-userspace
stack traces we currently get. It can be done with some effort.
This changes copyright holder to myself for the source code files that I've
created or have (almost) completely rewritten. Not included are the files
that were significantly changed by others even though it was me who originally
created them (think HtmlView), or the many other files I've contributed code to.
System components that need an IRQ handling are now inheriting the
InterruptHandler class.
In addition to that, the initialization process of PATAChannel was
changed to fit the changes.
PATAChannel, E1000NetworkAdapter and RTL8139NetworkAdapter are now
inheriting from PCI::Device instead of InterruptHandler directly.
The support is very basic - Each component that needs to handle IRQs
inherits from InterruptHandler class. When the InterruptHandler
constructor is called it registers itself in a SharedInterruptHandler.
When an IRQ is fired, the SharedInterruptHandler is invoked, then it
iterates through a list of the registered InterruptHandlers.
Also, InterruptEnabler class was created to provide a way to enable IRQ
handling temporarily, similar to InterruptDisabler (in CPU.h, which does
the opposite).
In addition to that a PCI::Device class has been added, that inherits
from InterruptHandler.
A process has one of three veil states:
- None: unveil() has never been called.
- Dropped: unveil() has been called, and can be called again.
- Locked: unveil() has been called, and cannot be called again.
When using dbg() in the kernel, the output is automatically prefixed
with [Process(PID:TID)]. This makes it a lot easier to understand which
thread is generating the output.
This patch also cleans up some common logging messages and removes the
now-unnecessary "dbg() << *current << ..." pattern.
Sergey suggested that having a non-zero O_RDONLY would make some things
less confusing, and it seems like he's right about that.
We can now easily check read/write permissions separately instead of
dancing around with the bits.
This patch also fixes unveil() validation for O_RDWR which previously
forgot to check for "r" permission.
We don't need to have this method anymore. It was a hack that was used
in many components in the system but currently we use better methods to
create virtual memory mappings. To prevent any further use of this
method it's best to just remove it completely.
Also, the APIC code is disabled for now since it doesn't help booting
the system, and is broken since it relies on identity mapping to exist
in the first 1MB. Any call to the APIC code will result in assertion
failed.
In addition to that, the name of the method which is responsible to
create an identity mapping between 1MB to 2MB was changed, to be more
precise about its purpose.
The problem was mostly in the initialization code, since in that stage
the parser assumed that there is an identity mapping in the first 1MB of
the address space. Now during initialization the parser will create the
correct mappings to locate the required data.
This syscall is a complement to pledge() and adds the same sort of
incremental relinquishing of capabilities for filesystem access.
The first call to unveil() will "drop a veil" on the process, and from
now on, only unveiled parts of the filesystem are visible to it.
Each call to unveil() specifies a path to either a directory or a file
along with permissions for that path. The permissions are a combination
of the following:
- r: Read access (like the "rpath" promise)
- w: Write access (like the "wpath" promise)
- x: Execute access
- c: Create/remove access (like the "cpath" promise)
Attempts to open a path that has not been unveiled with fail with
ENOENT. If the unveiled path lacks sufficient permissions, it will fail
with EACCES.
Like pledge(), subsequent calls to unveil() with the same path can only
remove permissions, not add them.
Once you call unveil(nullptr, nullptr), the veil is locked, and it's no
longer possible to unveil any more paths for the process, ever.
This concept comes from OpenBSD, and their implementation does various
things differently, I'm sure. This is just a first implementation for
SerenityOS, and we'll keep improving on it as we go. :^)