I was surprised to find that dup()'ed fds don't share the close-on-exec flag.
That means it has to be stored separately from the FileDescriptor object.
I didn't even put the { } properly around everything that would leak.
Let's make sure this works correctly by splitting out the work into a
helper called do_exec().
- Process::exec() needs to restore the original paging scope when called
on a non-current process.
- Add missing InterruptDisabler guards around g_processes access.
- Only flush the TLB when modifying the active page tables.
This is really sweet! :^) The four instances of /bin/sh spawned at
startup now share their read-only text pages.
There are problems and limitations here, and plenty of room for
improvement. But it kinda works.
All right, we can now mmap() a file and it gets magically paged in from fs
in response to an NP page fault. This is really cool :^)
I need to refactor this to support sharing of read-only file-backed pages,
but it's cool to just have something working.
First of all, change sys$mmap to take a struct SC_mmap_params since our
sycsall calling convention can't handle more than 3 arguments.
This exposed a bug in Syscall::invoke() needing to use clobber lists.
It was a bit confusing to debug. :^)
This is dirty but pretty cool! If we have a pending, unmasked signal for
a process that's blocked inside the kernel, we set up alternate stacks
for that process and unblock it to execute the signal handler.
A slightly different return trampoline is used here: since we need to
get back into the kernel, a dedicated syscall is used (sys$sigreturn.)
This restores the TSS contents of the process to the state it was in
while we were originally blocking in the kernel.
NOTE: There's currently only one "kernel resume TSS" so signal nesting
definitely won't work.
Processes are either alive (with many substates), dead or forgiven.
A dead process is forgiven when the parent waitpid()s on it.
Dead orphans are also forgiven.
There's a lot of work to be done around this.
It only works for sending a signal to a process that's in userspace code.
We implement reception by synthesizing a PUSHA+PUSHF in the receiving process
(operating on values in the TSS.)
The TSS CS:EIP is then rerouted to the signal handler and a tiny return
trampoline is constructed in a dedicated region in the receiving process.
Also hacked up /bin/kill to be able to send arbitrary signals (kill -N PID)
Implemented some syscalls: dup(), dup2(), getdtablesize().
FileHandle is now a retainable, since that's needed for dup()'ed fd's.
I didn't really test any of this beyond a basic smoke check.
sys$fork() now clones all writable regions with per-page COW bits.
The pages are then mapped read-only and we handle a PF by COWing the pages.
This is quite delightful. Obviously there's lots of work to do still,
and it needs better data structures, but the general concept works.
This turned out way better than the old code. ELF loading is now quite
straightforward, and we don't need the weird concept of subregions anymore.
Next step is to respect the is_writable flag.
This is quite cool! The syscall entry point plumbs the register dump
down to sys$fork(), which uses it to set up the child process's TSS
in order to resume execution right after the int 0x80 fork() call. :^)
This works pretty well, although there is some problem with the kernel
alias mappings used to clone the parent process's regions. If I disable
the MM::release_page_directory() code, there's no problem. Probably there's
a premature freeing of a physical page somehow.