All right, we can now mmap() a file and it gets magically paged in from fs
in response to an NP page fault. This is really cool :^)
I need to refactor this to support sharing of read-only file-backed pages,
but it's cool to just have something working.
First of all, change sys$mmap to take a struct SC_mmap_params since our
sycsall calling convention can't handle more than 3 arguments.
This exposed a bug in Syscall::invoke() needing to use clobber lists.
It was a bit confusing to debug. :^)
This is dirty but pretty cool! If we have a pending, unmasked signal for
a process that's blocked inside the kernel, we set up alternate stacks
for that process and unblock it to execute the signal handler.
A slightly different return trampoline is used here: since we need to
get back into the kernel, a dedicated syscall is used (sys$sigreturn.)
This restores the TSS contents of the process to the state it was in
while we were originally blocking in the kernel.
NOTE: There's currently only one "kernel resume TSS" so signal nesting
definitely won't work.
Processes are either alive (with many substates), dead or forgiven.
A dead process is forgiven when the parent waitpid()s on it.
Dead orphans are also forgiven.
There's a lot of work to be done around this.
Also keep the canonical errno list in LibC for now. The kernel gets it
from there. This makes building 3rd party code easier.
..also fix broken strchr().
It only works for sending a signal to a process that's in userspace code.
We implement reception by synthesizing a PUSHA+PUSHF in the receiving process
(operating on values in the TSS.)
The TSS CS:EIP is then rerouted to the signal handler and a tiny return
trampoline is constructed in a dedicated region in the receiving process.
Also hacked up /bin/kill to be able to send arbitrary signals (kill -N PID)
Implemented some syscalls: dup(), dup2(), getdtablesize().
FileHandle is now a retainable, since that's needed for dup()'ed fd's.
I didn't really test any of this beyond a basic smoke check.
sys$fork() now clones all writable regions with per-page COW bits.
The pages are then mapped read-only and we handle a PF by COWing the pages.
This is quite delightful. Obviously there's lots of work to do still,
and it needs better data structures, but the general concept works.
This turned out way better than the old code. ELF loading is now quite
straightforward, and we don't need the weird concept of subregions anymore.
Next step is to respect the is_writable flag.
This was the fix:
-process.m_page_directory[0] = m_kernel_page_directory[0];
-process.m_page_directory[1] = m_kernel_page_directory[1];
+process.m_page_directory->entries[0] = m_kernel_page_directory->entries[0];
+process.m_page_directory->entries[1] = m_kernel_page_directory->entries[1];
I spent a good two hours scratching my head, not being able to figure out why
user process page directories felt they had ownership of page tables in the
kernel page directory.
It was because I was copying the entire damn kernel page directory into
the process instead of only sharing the two first PDE's. Dang!
This is quite cool! The syscall entry point plumbs the register dump
down to sys$fork(), which uses it to set up the child process's TSS
in order to resume execution right after the int 0x80 fork() call. :^)
This works pretty well, although there is some problem with the kernel
alias mappings used to clone the parent process's regions. If I disable
the MM::release_page_directory() code, there's no problem. Probably there's
a premature freeing of a physical page somehow.
We no longer disable interrupts around the whole affair.
Since MM manages per-process data structures, this works quite smoothly now.
Only procfs had to be tweaked with an InterruptDisabler.
I'm still playing around with finding a style that I like.
This is starting to feel pleasing to the eye. I guess this is how long
it took me to break free from the habit of my previous Qt/WK coding style.
This is way better than walking the region lists. I suppose we could
even let the hardware trigger a page fault and handle that. That'll
be the next step in the evolution here I guess.
I added an RAII helper called OtherTaskPagingScope. While present,
it switches the kernel over to using another task's page directory.
This is perfect for e.g walking the stack in /proc/PID/stack.
I spent some time stuck on a problem where processes would clobber each
other's stacks. Took me a moment to figure out that their stacks
were allocated in the sub-4MB linear address range which is shared
between all processes. Oops!
This isn't finished but I'll commit as I go. We need to get to where context
switching only needs to change CR3 and everything's ready to go.
My basic idea is:
- The first 4 kB is off-limits. This catches null dereferences.
- Up to the 4 MB mark is identity-mapped and kernel-only.
- The rest is available to everyone!
While the first 4 MB is only available to the kernel, it's still mapped in
every process, for convenience when entering the kernel.
Ran into a horrendous bug where VirtualConsole would overrun its buffer
and scribble right into some other object if we were interrupted while
processing a character. Slapped an InterruptDisabler onto onChar for now.
This provokes an interesting question though.. if a process is killed
while its in kernel space, how the heck do we release any locks it held?
I'm sure there are many different solutions to this problem, but I'll
have to think about it.
I ran out of steam writing library routines and imported two
BSD-licensed libc routines: sscanf() and getopt().
I will most likely rewrite them sooner or later. For now
I just wanted to see figlet running.
I'm not sure why this was using a global, but it was very racy and made
processes walk over each other when multiple processes were doing
syscalls simultaneously.
We now make three VirtualConsoles at boot: tty0, tty1, and tty2.
We launch an instance of /bin/sh in each one.
You switch between them with Alt+1/2/3
How very very cool :^)
This doesn't mean we get any line editing just yet. But the keyboard device
now recognizes the backspace key, and the console device knows what to do
with the backspace characters.
The SpinLock was all backwards and didn't actually work. Fixing it exposed
how wrong most of the locking here is.
I need to come up with a better granularity here.
- sys$readlink + readlink()
- Add a /proc/PID/exe symlink to the process's executable.
- Print symlink contents in ls output.
- Some work on plumbing options into VFS::open().
This is pretty inefficient for ext2fs. We walk the entire block group
containing the inode, searching through every directory for an entry
referencing this inode.
It might be a good idea to cache this information somehow. I'm not sure
how often we'll be searching for it.
Obviously there are multiple caching layers missing in the file system.
Sweet, now we can look at all the zones (physical memory) currently in play.
Building the procfs files with ksprintf and rickety buffer presizing feels
pretty shoddy but I'll fix it up eventually.
This shows some info about the MM. Right now it's just the zone count
and the number of free physical pages. Lots more can be added.
Also added "exit" to sh so we can nest shells and exit from them.
I also noticed that we were leaking all the physical pages, so fixed that.
This took me a couple hours. :^)
The ELF loading code now allocates a single region for the entire
file and creates virtual memory mappings for the sections as needed.
Very nice!
I also added a generator cache to FileHandle. This way, multiple
reads to a generated file (i.e in a synthfs) can transparently
handle multiple calls to read() without the contents changing
between calls.
The cache is discarded at EOF (or when the FileHandle is destroyed.)
FileHandle gets a hasDataAvailableForRead() getter.
If this returns true in sys$read(), the task will block(BlockedRead) + yield.
The fd blocked on is stored in Task::m_fdBlockedOnRead.
The scheduler then looks at the state of that fd during the unblock phase.
This makes "sh" restful. :^)
There's still some problem with the kernel not surviving the colonel task
getting scheduled. I need to figure that out and fix it.
(also increase the number of sectors loaded by the bootloader in the
since I just noticed we were at the limit. This is not the ideal way
of doing this.)
It's implemented as a separate process. How cute is that.
Tasks now have a current working directory. Spawned tasks inherit their
parent task's working directory.
Currently everyone just uses "/" as there's no way to chdir().
I added a dead-simple malloc that only allows allocations < 4096 bytes.
It just forwards the request to mmap() every time.
I also added simplified versions of opendir() and readdir().
The naive spinlock was not nearly enough to protect kmalloc from
reentrancy problems.
I don't want to deal with coming up with a fancy lock for kmalloc
right now, so I made an InterruptDisabler thingy instead.
It does CLI and then STI iff interrupts were previously enabled.
I know I'm praying for cargo here, but this does fix a weird issue
where logging the sum_alloc and sum_free globals wouldn't display
symmetric values all the time.
Add a separate lock to protect the VFS. I think this might be a good idea.
I'm not sure it's a good approach though. I'll fiddle with it as I go along.
It's really fun to figure out all these things on my own.
- putch syscall now directly calls Console::putChar().
- /proc/summary includes some info about kmalloc stats.
- Syscall entry is guarded by a simple spinlock.
- Unmap regions for crashed tasks.
- Turn Keyboard into a CharacterDevice (85,1) at /dev/keyboard.
- Implement MM::unmapRegionsForTask() and MM::unmapRegion()
- Save SS correctly on interrupt.
- Add a simple Spawn syscall for launching another process.
- Move a bunch of IO syscall debug output behind DEBUG_IO.
- Have ASSERT do a "cli" immediately when failing.
This makes the output look proper every time.
- Implement a bunch of syscalls in LibC.
- Add a simple shell ("sh"). All it can do now is read a line
of text from /dev/keyboard and then try launching the specified
executable by calling spawn().
There are definitely bugs in here, but we're moving on forward.
- More work on funneling console output through Console.
- init() now breaks off into a separate task ASAP.
- ..this leaves the "colonel" task as a simple hlt idle loop.
- Mask all IRQs on startup (except IRQ2 for slave passthru)
- Fix underallocation bug in Task::allocateRegion().
- Remember how many times each Task has been scheduled.
The panel and scheduling banner are disabled until I get things
working nicely in the (brave) new Console world.
This leaves interrupts enabled while we're in the kernel, which is
precisely what we want.
This uncovered a horrendous problem with kernel tasks silently
overflowing their stacks. For now I've simply increased the stack size
but I need a more MMU-y solution for this eventually.
We load /_hello.o which just prints out a simple message.
It executes inside the kernel itself, so no fancy userspace process
or anything, but this is still so cool!