Commit Graph

541 Commits

Author SHA1 Message Date
Andreas Kling
3f254bfbc8 Kernel+ping: Only allow superuser to create SOCK_RAW sockets
/bin/ping is now setuid-root, and will drop privileges immediately
after opening a raw socket.
2019-12-31 01:42:34 +01:00
Andreas Kling
a69734bf2e Kernel: Also add a process boosting mechanism
Let's also have set_process_boost() for giving all threads in a process
the same boost.
2019-12-30 20:10:00 +01:00
Andreas Kling
610f3ad12f Kernel: Add a basic thread boosting mechanism
This patch introduces a syscall:

    int set_thread_boost(int tid, int amount)

You can use this to add a permanent boost value to the effective thread
priority of any thread with your UID (or any thread in the system if
you are the superuser.)

This is quite crude, but opens up some interesting opportunities. :^)
2019-12-30 19:23:13 +01:00
Andreas Kling
50677bf806 Kernel: Refactor scheduler to use dynamic thread priorities
Threads now have numeric priorities with a base priority in the 1-99
range.

Whenever a runnable thread is *not* scheduled, its effective priority
is incremented by 1. This is tracked in Thread::m_extra_priority.
The effective priority of a thread is m_priority + m_extra_priority.

When a runnable thread *is* scheduled, its m_extra_priority is reset to
zero and the effective priority returns to base.

This means that lower-priority threads will always eventually get
scheduled to run, once its effective priority becomes high enough to
exceed the base priority of threads "above" it.

The previous values for ThreadPriority (Low, Normal and High) are now
replaced as follows:

    Low -> 10
    Normal -> 30
    High -> 50

In other words, it will take 20 ticks for a "Low" priority thread to
get to "Normal" effective priority, and another 20 to reach "High".

This is not perfect, and I've used some quite naive data structures,
but I think the mechanism will allow us to build various new and
interesting optimizations, and we can figure out better data structures
later on. :^)
2019-12-30 18:46:17 +01:00
Andrew Kaster
cdcab7e5f4 Kernel: Retry mmap if MAP_FIXED is not in flags and addr is not 0
If an mmap fails to allocate a region, but the addr passed in was
non-zero, non-fixed mmaps should attempt to allocate at any available
virtual address.
2019-12-29 23:01:27 +01:00
Andreas Kling
fed3416bd2 Kernel: Embrace the SerenityOS name 2019-12-29 19:08:02 +01:00
Andreas Kling
1f31156173 Kernel: Add a mode flag to sys$purge and allow purging clean inodes 2019-12-29 13:16:53 +01:00
Andreas Kling
c74cde918a Kernel+SystemMonitor: Expose amount of per-process clean inode memory
This is memory that's loaded from an inode (file) but not modified in
memory, so still identical to what's on disk. This kind of memory can
be freed and reloaded transparently from disk if needed.
2019-12-29 12:45:58 +01:00
Andreas Kling
0d5e0e4cad Kernel+SystemMonitor: Expose amount of per-process dirty private memory
Dirty private memory is all memory in non-inode-backed mappings that's
process-private, meaning it's not shared with any other process.

This patch exposes that number via SystemMonitor, giving us an idea of
how much memory each process is responsible for all on its own.
2019-12-29 12:28:32 +01:00
Andreas Kling
95034fdfbd Kernel: Move PC speaker beep timing logic from scheduler to the syscall
I don't know why I put this in the scheduler to begin with.. the caller
can just block until the beeping is finished.
2019-12-26 22:31:26 +01:00
Andreas Kling
4a8683ea68 Kernel+LibPthread+LibC: Add a naive futex and use it for pthread_cond_t
This patch implements a simple version of the futex (fast userspace
mutex) API in the kernel and uses it to make the pthread_cond_t API's
block instead of busily sched_yield().

An arbitrary userspace address is passed to the kernel as a "token"
that identifies the futex and you can then FUTEX_WAIT and FUTEX_WAKE
that specific userspace address.

FUTEX_WAIT corresponds to pthread_cond_wait() and FUTEX_WAKE is used
for pthread_cond_signal() and pthread_cond_broadcast().

I'm pretty sure I'm missing something in this implementation, but it's
hopefully okay for a start. :^)
2019-12-25 23:54:06 +01:00
Andreas Kling
9e55bcb7da Kernel: Make kernel memory regions be non-executable by default
From now on, you'll have to request executable memory specifically
if you want some.
2019-12-25 22:41:34 +01:00
Andreas Kling
56a28890eb Kernel: Clarify the various input validity checks in mmap()
Also share some validation logic between mmap() and mprotect().
2019-12-25 21:50:13 +01:00
Andreas Kling
419e0ced27 Kernel: Don't allow mmap()/mprotect() to set up PROT_WRITE|PROT_EXEC
..but also allow mprotect() to set PROT_EXEC on a region, something
we were just ignoring before.
2019-12-25 13:35:57 +01:00
Conrad Pankoff
efa7141d14 Kernel: Fail module loading if any symbols can not be resolved 2019-12-24 11:52:01 +01:00
Conrad Pankoff
9a8032b479 Kernel: Disallow loading a module twice without explicitly unloading it
This ensures that a module has the chance to run its cleanup functions
before it's taken out of service.
2019-12-24 02:20:37 +01:00
Conrad Pankoff
3aaeff483b Kernel: Add a size argument to validate_read_from_kernel 2019-12-24 01:28:38 +01:00
Andreas Kling
4b8851bd01 Kernel: Make TID's be unique PID's
This is a little strange, but it's how I understand things should work.

The first thread in a new process now has TID == PID.
Additional threads subsequently spawned in that process all have unique
TID's generated by the PID allocator. TIDs are now globally unique.
2019-12-22 12:38:01 +01:00
Andreas Kling
16812f0f98 Kernel: Get rid of "main thread" concept
The idea of all processes reliably having a main thread was nice in
some ways, but cumbersome in others. More importantly, it didn't match
up with POSIX thread semantics, so let's move away from it.

This thread gets rid of Process::main_thread() and you now we just have
a bunch of Thread objects floating around each Process.

When the finalizer nukes the last Thread in a Process, it will also
tear down the Process.

There's a bunch of more things to fix around this, but this is where we
get started :^)
2019-12-22 12:37:58 +01:00
Andreas Kling
b6ee8a2c8d Kernel: Rename vmo => vmobject everywhere 2019-12-19 19:15:27 +01:00
Andreas Kling
8ea4217c01 Kernel: Merge Process::fork() into sys$fork()
There was no good reason for this to be a separate function.
2019-12-19 19:07:41 +01:00
Andreas Kling
3012b224f0 Kernel: Fix intermittent assertion failure in sys$exec()
While setting up the main thread stack for a new process, we'd incur
some zero-fill page faults. This was to be expected, since we allocate
a huge stack but lazily populate it with physical pages.

The problem is that page fault handlers may enable interrupts in order
to grab a VMObject lock (or to page in from an inode.)

During exec(), a process is reorganizing itself and will be in a very
unrunnable state if the scheduler should interrupt it and then later
ask it to run again. Which is exactly what happens if the process gets
pre-empted while the new stack's zero-fill page fault grabs the lock.

This patch fixes the issue by creating new main thread stacks before
disabling interrupts and going into the critical part of exec().
2019-12-18 23:03:23 +01:00
Andreas Kling
72ec2fae6e Kernel: Ignore MADV_SET_NONVOLATILE if already non-volatile
Just return 0 right away without changing any region flags.
2019-12-18 20:48:58 +01:00
Andreas Kling
487f9b373b Kernel: Add MADV_GET_VOLATILE for checking the volatile flag
Sometimes you might want to know if a purgeable region is volatile.
2019-12-18 20:48:24 +01:00
Andreas Kling
0a75a46501 Kernel: Make sure the kernel info page is read-only for userspace
To enforce this, we create two separate mappings of the same underlying
physical page. A writable mapping for the kernel, and a read-only one
for userspace (the one returned by sys$get_kernel_info_page.)
2019-12-15 22:21:28 +01:00
Andreas Kling
77cf607cda Kernel+LibC: Publish a "kernel info page" and use it for gettimeofday()
This patch adds a single "kernel info page" that is mappable read-only
by any process and contains the current time of day.

This is then used to implement a version of gettimeofday() that doesn't
have to make a syscall.

To protect against race condition issues, the info page also has a
serial number which is incremented whenever the kernel updates the
contents of the page. Make sure to verify that the serial number is the
same before and after reading the information you want from the page.
2019-12-15 21:29:26 +01:00
Andreas Kling
931e4b7f5e Kernel+SystemMonitor: Prevent userspace access to process ELF image
Every process keeps its own ELF executable mapped in memory in case we
need to do symbol lookup (for backtraces, etc.)

Until now, it was mapped in a way that made it accessible to the
program, despite the program not having mapped it itself.
I don't really see a need for userspace to have access to this right
now, so let's lock things down a little bit.

This patch makes it inaccessible to userspace and exposes that fact
through /proc/PID/vm (per-region "user_accessible" flag.)
2019-12-15 20:11:57 +01:00
Andreas Kling
5292f6e78f Kernel+FileManager: Disallow watch_file() in unsupported file systems
Currently only Ext2FS and TmpFS supports InodeWatchers. We now fail
with ENOTSUPP if watch_file() is called on e.g ProcFS.

This fixes an issue with FileManager chewing up all the CPU when /proc
was opened. Watchers don't keep the watched Inode open, and when they
close, the watcher FD will EOF.

Since nothing else kept /proc open in FileManager, the watchers created
for it would EOF immediately, causing a refresh over and over.

Fixes #879.
2019-12-15 19:33:39 +01:00
Andreas Kling
d723d9844f Kernel: Remove spammy log message in sys$sendto() 2019-12-14 11:30:45 +01:00
Andreas Kling
b32e961a84 Kernel: Implement a simple process time profiler
The kernel now supports basic profiling of all the threads in a process
by calling profiling_enable(pid_t). You finish the profiling by calling
profiling_disable(pid_t).

This all works by recording thread stacks when the timer interrupt
fires and the current thread is in a process being profiled.
Note that symbolication is deferred until profiling_disable() to avoid
adding more noise than necessary to the profile.

A simple "/bin/profile" command is included here that can be used to
start/stop profiling like so:

    $ profile 10 on
    ... wait ...
    $ profile 10 off

After a profile has been recorded, it can be fetched in /proc/profile

There are various limits (or "bugs") on this mechanism at the moment:

- Only one process can be profiled at a time.
- We allocate 8MB for the samples, if you use more space, things will
  not work, and probably break a bit.
- Things will probably fall apart if the profiled process dies during
  profiling, or while extracing /proc/profile
2019-12-11 20:36:56 +01:00
Andreas Kling
0317ca5ccc Kernel+LibC: Make all SharedBuffers purgeable (default: non-volatile)
This patch makes SharedBuffer use a PurgeableVMObject as its underlying
memory object.

A new syscall is added to control the volatile flag of a SharedBuffer.
2019-12-09 20:06:47 +01:00
Andreas Kling
dbb644f20c Kernel: Start implementing purgeable memory support
It's now possible to get purgeable memory by using mmap(MAP_PURGEABLE).
Purgeable memory has a "volatile" flag that can be set using madvise():

- madvise(..., MADV_SET_VOLATILE)
- madvise(..., MADV_SET_NONVOLATILE)

When in the "volatile" state, the kernel may take away the underlying
physical memory pages at any time, without notifying the owner.
This gives you a guilt discount when caching very large things. :^)

Setting a purgeable region to non-volatile will return whether or not
the memory has been taken away by the kernel while being volatile.
Basically, if madvise(..., MADV_SET_NONVOLATILE) returns 1, that means
the memory was purged while volatile, and whatever was in that piece
of memory needs to be reconstructed before use.
2019-12-09 19:12:38 +01:00
Andreas Kling
7248c34e35 AK: SinglyLinkedList::size_slow() should return size_t 2019-12-09 17:51:21 +01:00
Andreas Kling
6f4c380d95 AK: Use size_t for the length of strings
Using int was a mistake. This patch changes String, StringImpl,
StringView and StringBuilder to use size_t instead of int for lengths.
Obviously a lot of code needs to change as a result of this.
2019-12-09 17:51:21 +01:00
Andrew Kaster
9058962712 Kernel: Allow setting thread names
The main thread of each kernel/user process will take the name of
the process. Extra threads will get a fancy new name
"ProcessName[<tid>]".

Thread backtraces now list the thread name in addtion to tid.

Add the thread name to /proc/all (should it get its own proc
file?).

Add two new syscalls, set_thread_name and get_thread_name.
2019-12-08 14:09:29 +01:00
Andreas Kling
95b086f47f Kernel+LibPthread: Implement pthread_detach() 2019-12-07 14:52:27 +01:00
Andreas Kling
23e802518d Kernel: Add getsockopt(SO_PEERCRED) for local sockets
This sockopt gives you a struct with the PID, UID and GID of a socket's
peer process.
2019-12-06 18:38:36 +01:00
Andreas Kling
f41ae755ec Kernel: Crash on memory access in non-readable regions
This patch makes it possible to make memory regions non-readable.
This is enforced using the "present" bit in the page tables.
A process that hits an not-present page fault in a non-readable
region will be crashed.
2019-12-02 19:18:52 +01:00
Andreas Kling
272d65e3e2 WindowServer: Port to the new IPC system
This patch introduces code generation for the WindowServer IPC with
its clients. The client/server endpoints are defined by the two .ipc
files in Servers/WindowServer/: WindowServer.ipc and WindowClient.ipc

It now becomes significantly easier to add features and capabilities
to WindowServer since you don't have to know nearly as much about all
the intricate paths that IPC messages take between LibGUI and WSWindow.

The new system also uses significantly less IPC bandwidth since we're
now doing packed serialization instead of passing fixed-sized structs
of ~600 bytes for each message.

Some repaint coalescing optimizations are lost in this conversion and
we'll need to look at how to implement those in the new world.

The old CoreIPC::Client::Connection and CoreIPC::Server::Connection
classes are removed by this patch and replaced by use of ConnectionNG,
which will be renamed eventually.

Goodbye, old WindowServer IPC. You served us well :^)
2019-12-02 11:11:05 +01:00
Andreas Kling
ef32c71683 Kernel: Have modules export their name in a "module_name" string
This will show up in /proc/modules, and is also the name you can pass
to the module_unload() syscall for unloading the module.
2019-11-29 21:31:17 +01:00
Andreas Kling
4ef6be8212 Kernel: Allow modules to link against anything in kernel.map :^)
We now use the symbols from kernel.map to link modules as they are
loaded into the kernel. This is pretty fricken cool!
2019-11-28 21:30:20 +01:00
Andreas Kling
a43b115a6c Kernel: Implement basic module unloading :^)
Kernel modules can now be unloaded via a syscall. They get a chance to
run some code of course. Before deallocating them, we call their
"module_fini" symbol.
2019-11-28 21:07:22 +01:00
Andreas Kling
6b150c794a Kernel: Implement very simple kernel module loading
It's now possible to load a .o file into the kernel via a syscall.
The kernel will perform all the necessary ELF relocations, and then
call the "module_init" symbol in the loaded module.
2019-11-28 20:59:11 +01:00
Andreas Kling
23a2e84873 Kernel: listen() should fail with EINVAL for already-connected sockets
This according to POSIX.
2019-11-27 16:01:22 +01:00
Andreas Kling
bd86ebbcc0 Kernel: Remove outdated FIXME about EINTR in select()
This is actually already implemented. :^)
2019-11-27 15:31:23 +01:00
Hüseyin ASLITÜRK
794ca16cca Kernel: Implement the setkeymap() syscall. 2019-11-25 11:53:02 +01:00
Andreas Kling
3dc87be891 Kernel: Mark mmap()-created regions with a special bit
Then only allow regions with that bit to be manipulated via munmap()
and mprotect(). This prevents messing with non-mmap()ed regions in
a process's address space (stacks, shared buffers, ...)
2019-11-24 12:26:21 +01:00
Sergey Bugaev
16b9b3f228 Kernel: Don't interrupt short writes
Remove explicit checking for pending signals from writing code paths,
since this is handled automatically when blocking, and should not
happen if the write() call is "short", i.e. doesn't block. All the
other syscalls already work like this.

Fixes https://github.com/SerenityOS/serenity/issues/797
2019-11-19 14:07:24 +01:00
Andrew Kaster
618aebdd8a Kernel+LibPthread: pthread_create handles pthread_attr_t
Add an initial implementation of pthread attributes for:
  * detach state (joinable, detached)
  * schedule params (just priority)
  * guard page size (as skeleton) (requires kernel support maybe?)
  * stack size and user-provided stack location (4 or 8 MB only, must be aligned)

Add some tests too, to the thread test program.

Also, LibC: Move pthread declarations to sys/types.h, where they belong.
2019-11-18 09:04:32 +01:00
Andreas Kling
3da6d89d1f Kernel+LibC: Remove the isatty() syscall
This can be implemented entirely in userspace by calling tcgetattr().
To avoid screwing up the syscall indexes, this patch also adds a
mechanism for removing a syscall without shifting the index of other
syscalls.

Note that ports will still have to be rebuilt after this change,
as their LibC code will try to make the isatty() syscall on startup.
2019-11-17 20:03:42 +01:00