The idea here is to combine a potential syscall error code with an arbitrary
type in the case of success. I feel like this will end up much less error
prone than returning some arbitrary type that kinda sorta has bool semantics
(but sometimes not really) and passing the error through an out-param.
This patch only converts a few syscalls to using it. More to come.
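A minimal sketch of the idea, assuming a hypothetical `ResultOr<T>` template and a made-up `lookup_size()` helper. The names and interface are illustrative only and are not taken from the patch:

```cpp
#include <cerrno>
#include <utility>

// Hypothetical wrapper for an errno value, so an error can never be
// confused with a successful int result.
struct Error {
    int code; // positive errno value, e.g. EBADF
};

// Hypothetical sketch: holds either an error code or a value of type T.
// For brevity this version requires T to be default-constructible.
template<typename T>
class ResultOr {
public:
    ResultOr(Error error) : m_error(error.code) {}
    ResultOr(T value) : m_value(std::move(value)) {}

    bool is_error() const { return m_error != 0; }
    int error() const { return m_error; }
    const T& value() const { return m_value; }

private:
    T m_value {};
    int m_error { 0 };
};

// Made-up syscall-style helper: fails with EBADF for a negative fd,
// otherwise returns a fake size.
inline ResultOr<int> lookup_size(int fd)
{
    if (fd < 0)
        return Error { EBADF };
    return 4096;
}
```

A caller checks `is_error()` first and only then touches `value()`, so there is no out-param and no "is 0 success or failure?" guessing.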
It was very confusing that you had to open a FileDescriptor in order to stat
a file. This patch gives VFS a separate stat() function and uses it to
implement the stat() and lstat() syscalls.
I set it up so that TIOCSWINSZ on a master PTY gets forwarded to the slave.
This feels intuitively right. Terminal can then use that to inform the shell
or whoever is inside the slave that the window size has changed.
TIOCSWINSZ also triggers the generation of a SIGWINCH signal. :^)
Turns out FD_CLOEXEC and O_CLOEXEC are different values. Silly mistake.
I noticed that Terminal's shell process still had the Terminal's window
server connection open, albeit in a broken state.
When the kernel performs a successful exec(), whatever was on the kernel
stack for that process before goes away. For this reason, we need to make
sure we don't have any stack objects holding onto kmalloc memory.
The mismatch between the two was causing some trouble if you'd mmap e.g. 1KB
and then try to munmap() it. The kernel would whine that it couldn't find
any such mapping (because mmap() actually rounded the 1KB to a 4KB page.)
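The rounding in question can be sketched as below. This assumes a 4KB page size; `kPageSize` and `page_round_up()` are hypothetical names. Presumably both mmap() and munmap() need to apply the same rounding so that the lookup in munmap() matches what mmap() recorded:

```cpp
#include <cstddef>

// Assumed page size for this sketch (4KB, as in the example above).
constexpr std::size_t kPageSize = 4096;

// Round a byte count up to the next multiple of the page size.
// Relies on kPageSize being a power of two.
constexpr std::size_t page_round_up(std::size_t size)
{
    return (size + kPageSize - 1) & ~(kPageSize - 1);
}
```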
This is a monster patch that required changing a whole bunch of things.
There are performance and stability issues all over the place, but it works.
Pretty cool, I have to admit :^)
Currently you can only mmap the entire framebuffer.
Using this when starting up the WindowServer gets us yet another step
closer towards it moving into userspace. :^)
We were reading one client message per client per event loop iteration.
That was not very snappy. Make the sockets non-blocking and read() until
there are no messages left.
It would be even better to make as few calls to read() as possible to
reduce context switching, but this is already a huge improvement.
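A rough sketch of the "read() until there are no messages left" loop, using plain POSIX calls. The fixed-size `Message` struct and the function names are assumptions made for illustration; the real message format is not described here:

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical fixed-size client message.
struct Message {
    int type;
    int value;
};

// Put the descriptor into non-blocking mode; returns false on failure.
inline bool make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Read every queued message from fd and pass each one to handle().
// Returns the number of messages handled, or -1 on a short read or
// any error other than "nothing left to read".
template<typename Handler>
int drain_messages(int fd, Handler handle)
{
    int count = 0;
    for (;;) {
        Message message;
        ssize_t nread = read(fd, &message, sizeof(message));
        if (nread == static_cast<ssize_t>(sizeof(message))) {
            handle(message);
            ++count;
            continue;
        }
        if (nread < 0 && errno == EINTR)
            continue; // Interrupted by a signal, just retry.
        if (nread < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return count; // Queue is empty for now.
        if (nread == 0)
            return count; // Peer closed the connection.
        return -1;
    }
}
```

This still costs one read() per message plus one final read() that reports an empty queue; reading into a larger buffer would cut that down further.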
This is really cool! :^)
Apps currently refuse to start if the WindowServer isn't listening on the
socket in /wsportal. This makes sense, but I guess it would also be nice
to have some sort of "wait for server on startup" mode.
This has performance issues, and I'll work on those, but this stuff seems
to actually work and I'm very happy with that.
Since we know who's holding the lock, and we're gonna have to yield anyway,
we can just ask the scheduler to donate any remaining ticks to that process.
Instead of processes themselves getting scheduled to finish dying,
let's have a Finalizer process that wakes up whenever someone is dying.
This way we can do all kinds of lock-taking in process cleanup without
risking reentering the scheduler.
- Don't cli() in Process::do_exec() unless current is execing.
Eventually this should go away once the scheduler copes better
with interrupts.
- Improved memory access validation for ring0 processes.
We now look at the kernel ELF header to determine if an access
is appropriate. :^) It's very hackish but also kinda neat.
- Have Process::die() put the process into a new "Dying" state where
it can still get scheduled but no signals will be dispatched.
This way we can keep executing in die() but won't get our EIP
hijacked by signal dispatch. The main problem here was that die()
wanted to take various locks.
Also add an assertion in Lock that the scheduler isn't currently active.
I've been seeing occasional failures that I suspect might be someone called
by the scheduler trying to take a busy lock.
Instead of cowboy-calling the VESA BIOS in the bootloader, find the emulator
VGA adapter by scanning the PCI bus. Then set up the desired video mode by
sending device commands.
The current strategy is simply to nuke all physical pages and force
reload them from disk. This is obviously not optimal, but improving it
later should be fairly straightforward.
Font now uses the same in-memory format as the font files we have on disk.
This allows us to simply mmap() the font files and not use any additional
memory for them. Very cool! :^)
Hacking on this exposed a bug in file-backed VMObjects where the first client
to instantiate a VMObject for a specific inode also got to decide its size.
Since file-backed VMObjects always have the same size as the underlying file,
this made no sense, so I removed the ability to even set a size in that case.
Also use an enum for the rather-confusing return value in dispatch_signal().
I will go through the rest of the signals and set them up with the
appropriate default dispositions at some other point.
Now the filesystem is generated on-the-fly instead of manually adding and
removing inodes as processes spawn and die.
The code is convoluted and bloated as I wrote it while sleepless. However,
it's still vastly better than the old ProcFS, so I'm committing it.
I also added /proc/PID/fd/N symlinks for each of a process's open fd's.
GObjects can now register a timer with the GEventLoop. This will eventually
cause GTimerEvents to be dispatched to the GObject.
This needed a few supporting changes in the kernel:
- The PIT now ticks 1000 times/sec.
- select() now supports an arbitrary timeout.
- gettimeofday() now returns something in the tv_usec field.
With these changes, the clock window in guitest2 finally ticks on its own.
FileDescriptor will now keep a pointer to the original inode even after
opening it resolves to a character device.
Fixed up /bin/ls to display major and minor device numbers instead of size
for device files.
This required a fair bit of plumbing. The CharacterDevice::close() virtual
will now be called by ~FileDescriptor(), allowing device implementations to
do custom cleanup at that point.
One big problem remains: if the master PTY is closed before the slave PTY,
we go into crashy land.
Only raw octal modes are supported right now.
This patch also changes mode_t from 32-bit to 16-bit to match the on-disk
type used by Ext2FS.
I also ran into EPERM being errno=0 which was confusing, so I inserted an
ESUCCESS in its place.
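For illustration, parsing a raw octal mode string into a 16-bit mode could look roughly like this. `parse_octal_mode()` is a hypothetical helper, not the code from this patch:

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: parse a raw octal mode such as "755" into a
// 16-bit mode value. Returns false for anything that is not a plain
// octal number in the range 0..07777 (so symbolic modes like "u+x"
// are rejected).
inline bool parse_octal_mode(const char* text, std::uint16_t& mode)
{
    if (!text || !*text)
        return false;
    char* end = nullptr;
    unsigned long value = std::strtoul(text, &end, 8);
    if (*end != '\0' || value > 07777)
        return false;
    mode = static_cast<std::uint16_t>(value);
    return true;
}
```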
It's really only supported in Ext2FS since SynthFS doesn't really want you
mucking around with its files. This is pretty neat though :^)
I ran into some trouble with HashMap while working on this but opted to work
around it and leave that for a separate investigation.
This means we only have to do one fill_rect() per line and the whole process
ends up being ~10% faster than before.
Also added a read_tsc() syscall to give userspace access to the TSC.
This patch adds most of the plumbing for working file deletion in Ext2FS.
Directory entries are removed and inode link counts updated.
We don't yet update the inode or block bitmaps; I will do that separately.
Make PageDirectory retainable and have each Region co-own the PageDirectory
they're mapped into. When unmapped, Region has no associated PageDirectory.
This allows Region to automatically unmap itself when destroyed.
It's a bit confusing that the "current" process is not actually running
while we're inside the scheduler. Perhaps the scheduler should redirect
"current" to its own dummy Process. I'm not sure.
Regardless, this patch improves responsiveness by allowing the scheduler
to unblock a process right after it calls select() in case it already has
a pending wakeup request.
The system can finally idle without burning CPU. :^)
There are some issues with scheduling making the mouse cursor sloppy
and unresponsive that need to be dealt with.
When you open /dev/ptmx, you get a file descriptor pointing to one of the
available MasterPTY's. If none are available, you get an EBUSY.
This makes it possible to open multiple (up to 4) Terminals. :^)
To support this, I also added a CharacterDevice::open() that gets control
when VFS is opening a CharacterDevice. This is useful when we want to return
a custom FileDescriptor like we do here.
Userspace programs can now open /dev/gui_events and read a stream of GUI_Event
structs one at a time.
I was stuck on a stupid problem where we'd reenter Scheduler::yield() due to
having one of the has_data_available_for_reading() implementations using locks.
This is a lot better than having them in kmalloc memory. I'm gonna need
a way to keep track of which process owns which bitmap eventually,
maybe through some sort of resource keying system. We'll see.
The old approach only worked because of an overpermissive accident.
There's now a concept of supervisor physical pages that can be allocated.
They all sit in the low 4 MB of physical memory and are identity mapped,
shared between all processes, and only ring 0 can access them.
Also use a simple array of { dword, const char* } for the KSyms and put the
whole shebang in kmalloc_eternal() memory. This was a fugly source of
kmalloc perma-frag.
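A sketch of what a flat symbol table and lookup might look like. The `KSym` layout follows the { dword, const char* } description above, but the struct name, field names, the sorted-by-address assumption and `ksymbolicate()` are all made up for illustration:

```cpp
#include <cstddef>
#include <cstdint>

// One entry per kernel symbol: a 32-bit address and a name.
struct KSym {
    std::uint32_t address;
    const char* name;
};

// Assumes the table is sorted by ascending address. Returns the last
// symbol whose address is <= the queried address, or nullptr if the
// address lies below the first symbol.
inline const KSym* ksymbolicate(const KSym* symbols, std::size_t count, std::uint32_t address)
{
    const KSym* best = nullptr;
    for (std::size_t i = 0; i < count; ++i) {
        if (symbols[i].address > address)
            break;
        best = &symbols[i];
    }
    return best;
}
```

Since the table is one contiguous array that never changes after boot, it can live in memory that is never freed without fragmenting the regular heap.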
It walks all the live Inode objects and flushes pending metadata changes
wherever needed.
This could be optimized by keeping a separate list of dirty Inodes,
but let's not get ahead of ourselves.
Use a little template magic to have Retainable::release() call out to
T::will_be_destroyed() if such a function exists before actually calling
the destructor. This gives us full access to virtual functions in the
pre-destruction code.
This synchronous approach to inodes is silly, obviously. I need to rework
it so that the in-memory CoreInode object is the canonical inode, and then
we just need a sync() that flushes pending changes to disk.
The kernel now bills processes for time spent in kernelspace and userspace
separately. The accounting is forwarded to the parent process in reap().
This makes the "time" builtin in bash work.
This way the scheduler doesn't need to plumb the exit status into the waiter.
We still plumb the waitee pid though. I don't love it, but it can be fixed.
mmap() will now map uncommitted pages that get allocated and zeroed upon the
first access. I also made /proc/PID/vm show the number of "committed" bytes in
each region. This is so cool! :^)
I was surprised to find that dup()'ed fds don't share the close-on-exec flag.
That means it has to be stored separately from the FileDescriptor object.
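Roughly, the flag ends up living in the per-process fd table entry rather than in the shared object. A toy sketch, where `FdTableEntry`, `FdTable` and the stub `FileDescriptor` are hypothetical stand-ins for the kernel types:

```cpp
#include <memory>
#include <utility>
#include <vector>

// Stand-in for the shared open-file state (offset, inode, ...).
struct FileDescriptor {
    int offset;
};

// Per-fd slot: dup()'ed fds share the descriptor but not the flag.
struct FdTableEntry {
    std::shared_ptr<FileDescriptor> descriptor;
    bool close_on_exec;
};

class FdTable {
public:
    int add(std::shared_ptr<FileDescriptor> descriptor)
    {
        m_entries.push_back({ std::move(descriptor), false });
        return static_cast<int>(m_entries.size()) - 1;
    }

    // The new fd shares the descriptor and starts with the flag cleared.
    int dup(int fd) { return add(m_entries.at(fd).descriptor); }

    void set_close_on_exec(int fd, bool value) { m_entries.at(fd).close_on_exec = value; }
    bool is_open(int fd) const { return m_entries.at(fd).descriptor != nullptr; }

    // What exec() would do: drop only the slots marked close-on-exec.
    void close_all_marked_for_exec()
    {
        for (auto& entry : m_entries) {
            if (entry.close_on_exec)
                entry = FdTableEntry { nullptr, false };
        }
    }

private:
    std::vector<FdTableEntry> m_entries;
};
```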
Instead of memcpy'ing the entire screen every time we press enter at the
bottom, use the VGA start address register to make a "view" onto the
underlying memory that moves downward as we scroll.
Eventually we run out of memory and have to reset to the start of the
buffer. That's when we memcpy everything. It would be cool if there was
some way to get the hardware to act like a ring buffer with automatic
wrapping here but I don't know how to do that.
I didn't even put the { } properly around everything that would leak.
Let's make sure this works correctly by splitting out the work into a
helper called do_exec().
- Process::exec() needs to restore the original paging scope when called
on a non-current process.
- Add missing InterruptDisabler guards around g_processes access.
- Only flush the TLB when modifying the active page tables.
This is really sweet! :^) The four instances of /bin/sh spawned at
startup now share their read-only text pages.
There are problems and limitations here, and plenty of room for
improvement. But it kinda works.
All right, we can now mmap() a file and it gets magically paged in from fs
in response to a not-present (NP) page fault. This is really cool :^)
I need to refactor this to support sharing of read-only file-backed pages,
but it's cool to just have something working.
First of all, change sys$mmap to take a struct SC_mmap_params since our
syscall calling convention can't handle more than 3 arguments.
This exposed a bug in Syscall::invoke() needing to use clobber lists.
It was a bit confusing to debug. :^)
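The params-struct trick looks something like the sketch below. Only the name SC_mmap_params comes from this patch; the field names (mirroring the POSIX mmap() arguments) and the `*_sketch` functions, which fake the syscall trap with a normal function call, are assumptions:

```cpp
#include <cstddef>
#include <cstdint>

// All six mmap() arguments bundled so they fit in one syscall argument.
// Field names are assumed, not taken from the patch.
struct SC_mmap_params {
    std::uintptr_t addr;
    std::size_t size;
    int prot;
    int flags;
    int fd;
    long offset;
};

// Toy "kernel side": unpacks the struct. Pretends every mapping without
// an address hint lands at a fixed made-up address.
inline std::uintptr_t sys_mmap_sketch(const SC_mmap_params* params)
{
    return params->addr ? params->addr : 0x10000000;
}

// Stand-in for the syscall trap, which passes at most three
// register-sized arguments.
inline std::uintptr_t invoke_sketch(std::uintptr_t arg1, std::uintptr_t = 0, std::uintptr_t = 0)
{
    return sys_mmap_sketch(reinterpret_cast<const SC_mmap_params*>(arg1));
}

// "Userspace side": packs the arguments and passes a single pointer.
inline void* mmap_sketch(void* addr, std::size_t size, int prot, int flags, int fd, long offset)
{
    SC_mmap_params params { reinterpret_cast<std::uintptr_t>(addr), size, prot, flags, fd, offset };
    return reinterpret_cast<void*>(invoke_sketch(reinterpret_cast<std::uintptr_t>(&params)));
}
```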
This is dirty but pretty cool! If we have a pending, unmasked signal for
a process that's blocked inside the kernel, we set up alternate stacks
for that process and unblock it to execute the signal handler.
A slightly different return trampoline is used here: since we need to
get back into the kernel, a dedicated syscall is used (sys$sigreturn.)
This restores the TSS contents of the process to the state it was in
while we were originally blocking in the kernel.
NOTE: There's currently only one "kernel resume TSS" so signal nesting
definitely won't work.
Processes are either alive (with many substates), dead or forgiven.
A dead process is forgiven when the parent waitpid()s on it.
Dead orphans are also forgiven.
There's a lot of work to be done around this.
It only works for sending a signal to a process that's in userspace code.
We implement reception by synthesizing a PUSHA+PUSHF in the receiving process
(operating on values in the TSS.)
The TSS CS:EIP is then rerouted to the signal handler and a tiny return
trampoline is constructed in a dedicated region in the receiving process.
Also hacked up /bin/kill to be able to send arbitrary signals (kill -N PID).
Implemented some syscalls: dup(), dup2(), getdtablesize().
FileHandle is now a retainable, since that's needed for dup()'ed fd's.
I didn't really test any of this beyond a basic smoke check.
sys$fork() now clones all writable regions with per-page COW bits.
The pages are then mapped read-only and we handle a PF by COWing the pages.
This is quite delightful. Obviously there's lots of work to do still,
and it needs better data structures, but the general concept works.
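The general concept can be modeled in userspace like this. It is a toy simulation, not kernel code: a `shared_ptr` stands in for a shared physical page, the `cow` flag stands in for the per-page COW bit, and the check in `write()` plays the role of the page fault handler. All names are hypothetical:

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

using Page = std::array<unsigned char, 4096>;

// One mapped page: the backing page plus its copy-on-write bit.
struct Mapping {
    std::shared_ptr<Page> page;
    bool cow;
};

struct AddressSpace {
    std::vector<Mapping> pages;

    // Mark every page COW in the parent, then share them with the child.
    AddressSpace fork_sketch()
    {
        for (auto& mapping : pages)
            mapping.cow = true;
        AddressSpace child;
        child.pages = pages;
        return child;
    }

    unsigned char read(std::size_t page_index, std::size_t offset) const
    {
        return (*pages[page_index].page)[offset];
    }

    // A write to a COW page "faults": copy the page if anyone else
    // still shares it, then clear the bit and retry the write.
    void write(std::size_t page_index, std::size_t offset, unsigned char value)
    {
        auto& mapping = pages[page_index];
        if (mapping.cow) {
            if (mapping.page.use_count() > 1)
                mapping.page = std::make_shared<Page>(*mapping.page);
            mapping.cow = false;
        }
        (*mapping.page)[offset] = value;
    }
};
```

Pages that neither side ever writes to stay shared, which is the whole point of doing it lazily.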