This replaces the previous virtual address allocator which was basically
just "m_next_address += size;"
With this in place, virtual addresses can get reused, which cuts down on
the number of page tables created. When we implement ASLR some day, we'll
probably have to do page table deallocation, but for now page tables are
only deallocated once the process dies.
Stash away the ELFLoader used to load an executable in Process so we can use
it for symbolicating userspace addresses later on. This will make debugging
userspace programs a lot nicer. :^)
Calling systrace(pid) gives you a file descriptor with a stream of the
syscalls made by a peer process. The process must be owned by the same
UID who calls systrace(). :^)
We can't have multiple threads in the same process running in the kernel
at the same time, so let's have a per-process lock that threads have to
acquire on syscall entry/exit (and yield while blocked.)
It's basically a userspace port of the kernel's Lock class.
Added gettid() and donate() syscalls to support the timeslice donation
feature we already enjoyed in the kernel.
The scheduler now operates on threads, rather than on processes.
Each process has a main thread, and can have any number of additional
threads. The process exits when the main thread exits.
This patch doesn't actually spawn any additional threads, it merely
does all the plumbing needed to make it possible. :^)
This is accomplished using a new Alarm class and a BlockedSnoozing state.
Basically, you call Process::snooze_until(some_alarm) and then the scheduler
won't wake up the process until some_alarm.is_ringing() returns true.
Only the receive timeout is hooked up yet. You can change the timeout by
calling setsockopt(..., SOL_SOCKET, SO_RCVTIMEO, ...).
Use this mechanism to make /bin/ping report timeouts.
Finally fixed the weird flaky crashing when resizing Terminal windows.
It was because we were dispatching a signal to "current" from the scheduler.
Yet another thing I dislike about even having a "current" process while
we're in the scheduler. Not sure yet how to fix this.
Let the signal handler's kernel stack be a kmalloc() allocation for now.
Once we can do allocation of consecutive physical pages in the supervisor
memory region, we can use that for all types of kernel stacks.
It's now possible to create symbolic links! :^)
This exposed an issue in Ext2FS where we'd write uninitialized data past
the end of an inode's content. Fix this by zeroing out the tail end of
the last block in a file.
When the kernel performs a successful exec(), whatever was on the kernel
stack for that process before goes away. For this reason, we need to make
sure we don't have any stack objects holding onto kmalloc memory.
This is a monster patch that required changing a whole bunch of things.
There are performance and stability issues all over the place, but it works.
Pretty cool, I have to admit :^)
Currently you can only mmap the entire framebuffer.
Using this when starting up the WindowServer gets us yet another step
closer towards it moving into userspace. :^)
This is really cool! :^)
Apps currently refuse to start if the WindowServer isn't listening on the
socket in /wsportal. This makes sense, but I guess it would also be nice
to have some sort of "wait for server on startup" mode.
This has performance issues, and I'll work on those, but this stuff seems
to actually work and I'm very happy with that.
In order to move the WindowServer to userspace, I have to eliminate its
dependence on system call facilities. The communication channel with each
client needs to be message-based in both directions.
For now, the WindowServer process will run with high priority,
while the Finalizer process will run with low priority.
Everyone else gets to be "normal".
At the moment, priority simply determines the size of your time slices.
Since we know who's holding the lock, and we're gonna have to yield anyway,
we can just ask the scheduler to donate any remaining ticks to that process.
Instead of processes themselves getting scheduled to finish dying,
let's have a Finalizer process that wakes up whenever someone is dying.
This way we can do all kinds of lock-taking in process cleanup without
risking reentering the scheduler.
- Don't cli() in Process::do_exec() unless current is execing.
Eventually this should go away once the scheduler is less retarded
in the face of interrupts.
- Improved memory access validation for ring0 processes.
We now look at the kernel ELF header to determine if an access
is appropriate. :^) It's very hackish but also kinda neat.
- Have Process::die() put the process into a new "Dying" state where
it can still get scheduled but no signals will be dispatched.
This way we can keep executing in die() but won't get our EIP
hijacked by signal dispatch. The main problem here was that die()
wanted to take various locks.
Instead of cowboy-calling the VESA BIOS in the bootloader, find the emulator
VGA adapter by scanning the PCI bus. Then set up the desired video mode by
sending device commands.
Font now uses the same in-memory format as the font files we have on disk.
This allows us to simply mmap() the font files and not use any additional
memory for them. Very cool! :^)
Hacking on this exposed a bug in file-backed VMObjects where the first client
to instantiate a VMObject for a specific inode also got to decide its size.
Since file-backed VMObjects always have the same size as the underlying file,
this made no sense, so I removed the ability to even set a size in that case.
Also use an enum for the rather-confusing return value in dispatch_signal().
I will go through the rest of the signals and set them up with the
appropriate default dispositions at some other point.
Now the filesystem is generated on-the-fly instead of manually adding and
removing inodes as processes spawn and die.
The code is convoluted and bloated as I wrote it while sleepless. However,
it's still vastly better than the old ProcFS, so I'm committing it.
I also added /proc/PID/fd/N symlinks for each of a process's open fd's.
GObjects can now register a timer with the GEventLoop. This will eventually
cause GTimerEvents to be dispatched to the GObject.
This needed a few supporting changes in the kernel:
- The PIT now ticks 1000 times/sec.
- select() now supports an arbitrary timeout.
- gettimeofday() now returns something in the tv_usec field.
With these changes, the clock window in guitest2 finally ticks on its own.
Use this to fix a use-after-free in ~GraphicsBitmap(). We'd hit this when
the WindowServer was doing a deferred destruction of a WSWindow whose
backing store referred to a now-reaped Process.
This required a fair bit of plumbing. The CharacterDevice::close() virtual
will now be closed by ~FileDescriptor(), allowing device implementations to
do custom cleanup at that point.
One big problem remains: if the master PTY is closed before the slave PTY,
we go into crashy land.
Only raw octal modes are supported right now.
This patch also changes mode_t from 32-bit to 16-bit to match the on-disk
type used by Ext2FS.
I also ran into EPERM being errno=0 which was confusing, so I inserted an
ESUCCESS in its place.
It's really only supported in Ext2FS since SynthFS doesn't really want you
mucking around with its files. This is pretty neat though :^)
I ran into some trouble with HashMap while working on this but opted to work
around it and leave that for a separate investigation.
Instead of clients painting whenever they feel like it, we now ask that they
paint in response to a paint message.
After finishing painting, clients notify the WindowServer about the rect(s)
they painted into and then flush eventually happens, etc.
This stuff leaves us with a lot of badly named things. Need to fix that.
This means we only have to do one fill_rect() per line and the whole process
ends up being ~10% faster than before.
Also added a read_tsc() syscall to give userspace access to the TSC.
To start painting, call:
gui$get_window_backing_store()
Then finish up with:
gui$release_window_backing_store()
Process will retain the underlying GraphicsBitmap behind the scenes.
This fixes racing between the WindowServer and GUI clients.
This patch also adds a WSWindowLocker that is exactly what it sounds like.
This patch adds most of the plumbing for working file deletion in Ext2FS.
Directory entries are removed and inode link counts updated.
We don't yet update the inode or block bitmaps, I will do that separately.
Make PageDirectory retainable and have each Region co-own the PageDirectory
they're mapped into. When unmapped, Region has no associated PageDirectory.
This allows Region to automatically unmap itself when destroyed.
The system can finally idle without burning CPU. :^)
There are some issues with scheduling making the mouse cursor sloppy
and unresponsive that need to be dealt with.
This is pretty cool. :^)
GraphicsBitmaps are now mapped into both the server and the client address
space (usually at different addresses but that doesn't matter.)
Added a GUI syscall for getting a window's backing store, and another one
for invalidating a window so that the server redraws it.
Userspace programs can now open /dev/gui_events and read a stream of GUI_Event
structs one at a time.
I was stuck on a stupid problem where we'd reenter Scheduler::yield() due to
having one of the has_data_available_for_reading() implementations using locks.
This is a lot better than having them in kmalloc memory. I'm gonna need
a way to keep track of which process owns which bitmap eventually,
maybe through some sort of resource keying system. We'll see.
This synchronous approach to inodes is silly, obviously. I need to rework
it so that the in-memory CoreInode object is the canonical inode, and then
we just need a sync() that flushes pending changes to disk.
The kernel now bills processes for time spent in kernelspace and userspace
separately. The accounting is forwarded to the parent process in reap().
This makes the "time" builtin in bash work.
This way the scheduler doesn't need to plumb the exit status into the waiter.
We still plumb the waitee pid though, I don't love it but it can be fixed.
mmap() will now map uncommitted pages that get allocated and zeroed upon the
first access. I also made /proc/PID/vm show number of "committed" bytes in
each region. This is so cool! :^)
I was surprised to find that dup()'ed fds don't share the close-on-exec flag.
That means it has to be stored separately from the FileDescriptor object.
I didn't even put the { } properly around everything that would leak.
Let's make sure this works correctly by splitting out the work into a
helper called do_exec().
- Process::exec() needs to restore the original paging scope when called
on a non-current process.
- Add missing InterruptDisabler guards around g_processes access.
- Only flush the TLB when modifying the active page tables.
This is really sweet! :^) The four instances of /bin/sh spawned at
startup now share their read-only text pages.
There are problems and limitations here, and plenty of room for
improvement. But it kinda works.
All right, we can now mmap() a file and it gets magically paged in from fs
in response to an NP page fault. This is really cool :^)
I need to refactor this to support sharing of read-only file-backed pages,
but it's cool to just have something working.
First of all, change sys$mmap to take a struct SC_mmap_params since our
sycsall calling convention can't handle more than 3 arguments.
This exposed a bug in Syscall::invoke() needing to use clobber lists.
It was a bit confusing to debug. :^)
This is dirty but pretty cool! If we have a pending, unmasked signal for
a process that's blocked inside the kernel, we set up alternate stacks
for that process and unblock it to execute the signal handler.
A slightly different return trampoline is used here: since we need to
get back into the kernel, a dedicated syscall is used (sys$sigreturn.)
This restores the TSS contents of the process to the state it was in
while we were originally blocking in the kernel.
NOTE: There's currently only one "kernel resume TSS" so signal nesting
definitely won't work.
Processes are either alive (with many substates), dead or forgiven.
A dead process is forgiven when the parent waitpid()s on it.
Dead orphans are also forgiven.
There's a lot of work to be done around this.
It only works for sending a signal to a process that's in userspace code.
We implement reception by synthesizing a PUSHA+PUSHF in the receiving process
(operating on values in the TSS.)
The TSS CS:EIP is then rerouted to the signal handler and a tiny return
trampoline is constructed in a dedicated region in the receiving process.
Also hacked up /bin/kill to be able to send arbitrary signals (kill -N PID)
Implemented some syscalls: dup(), dup2(), getdtablesize().
FileHandle is now a retainable, since that's needed for dup()'ed fd's.
I didn't really test any of this beyond a basic smoke check.
sys$fork() now clones all writable regions with per-page COW bits.
The pages are then mapped read-only and we handle a PF by COWing the pages.
This is quite delightful. Obviously there's lots of work to do still,
and it needs better data structures, but the general concept works.
This turned out way better than the old code. ELF loading is now quite
straightforward, and we don't need the weird concept of subregions anymore.
Next step is to respect the is_writable flag.
This is quite cool! The syscall entry point plumbs the register dump
down to sys$fork(), which uses it to set up the child process's TSS
in order to resume execution right after the int 0x80 fork() call. :^)
This works pretty well, although there is some problem with the kernel
alias mappings used to clone the parent process's regions. If I disable
the MM::release_page_directory() code, there's no problem. Probably there's
a premature freeing of a physical page somehow.