Commit Graph

300 Commits

Author SHA1 Message Date
Andreas Kling
fe1bf067b8 ProcFS: Reads past the end of a generated file should be zero-length 2020-01-08 12:59:06 +01:00
Andreas Kling
28ee5b0e98 TmpFS: Reads past the end of a file should be zero-length 2020-01-08 12:47:41 +01:00
Andreas Kling
faf32153f6 Kernel: Take const Process& in InodeMetadata::may_{read,write,execute} 2020-01-07 19:24:06 +01:00
Andreas Kling
5387a19268 Kernel: Make Process::file_description() vend a RefPtr<FileDescription>
This encourages callers to strongly reference file descriptions while
working with them.

This fixes a use-after-free issue where one thread would close() an
open fd while another thread was blocked on it becoming readable.

Test: Kernel/uaf-close-while-blocked-in-read.cpp
2020-01-07 15:53:42 +01:00
Andreas Kling
a49d9c774f TmpFS: Add ASSERT(offset >= 0) to read_bytes() and write_bytes() 2020-01-07 15:25:56 +01:00
Andreas Kling
bb9db9d430 TmpFS: Add "." and ".." entries to all directories
It was so weird not seeing them in "ls -la" output :^)
2020-01-07 14:48:43 +01:00
Andreas Kling
56a2c21e0c Kernel: Don't leak kmalloc pointers through FIFO absolute paths
Instead of using the FIFO's memory address as part of its absolute path
identity, just use an incrementing FIFO index instead.

Note that this is not used for anything other than debugging (it helps
you identify which file descriptors refer to the same FIFO by looking
at /proc/PID/fds
2020-01-07 10:29:47 +01:00
Andreas Kling
9eef39d68a Kernel: Start implementing x86 SMAP support
Supervisor Mode Access Prevention (SMAP) is an x86 CPU feature that
prevents the kernel from accessing userspace memory. With SMAP enabled,
trying to read/write a userspace memory address while in the kernel
will now generate a page fault.

Since it's sometimes necessary to read/write userspace memory, there
are two new instructions that quickly switch the protection on/off:
STAC (disables protection) and CLAC (enables protection.)
These are exposed in kernel code via the stac() and clac() helpers.

There's also a SmapDisabler RAII object that can be used to ensure
that you don't forget to re-enable protection before returning to
userspace code.

THis patch also adds copy_to_user(), copy_from_user() and memset_user()
which are the "correct" way of doing things. These functions allow us
to briefly disable protection for a specific purpose, and then turn it
back on immediately after it's done. Going forward all kernel code
should be moved to using these and all uses of SmapDisabler are to be
considered FIXME's.

Note that we're not realizing the full potential of this feature since
I've used SmapDisabler quite liberally in this initial bring-up patch.
2020-01-05 18:14:51 +01:00
Andreas Kling
12eb1f5d74 Kernel: Entries in /dev/pts should be accessible only to the owner
This fixes an issue where anyone could snoop on any pseudoterminal.
2020-01-04 12:46:48 +01:00
Andreas Kling
b5da0b78eb Kernel: File::open() should apply r/w mode from the provided options
This has been a FIXME for a long time. We now apply the provided
read/write permissions to the constructed FileDescription when opening
a File object via File::open().
2020-01-04 12:30:55 +01:00
Andreas Kling
e79c33eabb Kernel: The root inode of a TmpFS should have the sticky bit set
We were running without the sticky bit and mode 777, which meant that
the /tmp directory was world-writable *without* protection.

With this fixed, it's no longer possible for everyone to steal root's
files in /tmp.
2020-01-04 11:33:36 +01:00
Andreas Kling
d84299c7be Kernel: Allow fchmod() and fchown() on pre-bind() local sockets
In order to ensure a specific owner and mode when the local socket
filesystem endpoint is instantiated, we need to be able to call
fchmod() and fchown() on a socket fd between socket() and bind().

This is because until we call bind(), there is no filesystem inode
for the socket yet.
2020-01-03 20:14:56 +01:00
Andreas Kling
4abbedb6e4 Kernel: Allow passing initial UID and GID when creating new inodes
If we're creating something that should have a different owner than the
current process's UID/GID, we need to plumb that all the way through
VFS down to the FS functions.
2020-01-03 20:13:21 +01:00
Andreas Kling
82760998a9 Ext2FS: Take the inode lock in Ext2FSInode::metadata()
Remove an unnecessary InterruptDisabler to make this not assert. :^)
2020-01-03 17:48:02 +01:00
Andreas Kling
889ecd1375 Kernel: The superuser is allowed to utime() on any file
Before this patch, root was not able to "touch" someone else's file.
2020-01-03 04:14:41 +01:00
Andreas Kling
3f74e66e82 Kernel: rename() should fail with EXDEV for cross-device requests
POSIX does not support rename() from one file system to another.
2020-01-03 04:10:05 +01:00
Andreas Kling
3be1c7b514 Kernel: Fix awkward bug where "touch /foo/bar/baz" could create "/baz"
To accomodate file creation, path resolution optionally returns the
last valid parent directory seen while traversing the path.

Clients will then interpret "ENOENT, but I have a parent for you" as
meaning that the file doesn't exist, but its immediate parent directory
does. The client then goes ahead and creates a new file.

In the case of "/foo/bar/baz" where there is no "/foo", it would fail
with ENOENT and "/" as the last seen parent directory, causing e.g the
open() syscall to create "/baz".

Covered by test_io.
2020-01-03 03:57:10 +01:00
Andreas Kling
0a1865ebc6 Kernel: read() and write() should fail with EBADF for wrong mode fd's
It was previously possible to write to read-only file descriptors,
and read from write-only file descriptors.

All FileDescription objects now start out non-readable + non-writable,
and whoever is creating them has to "manually" enable reading/writing
by calling set_readable() and/or set_writable() on them.
2020-01-03 03:29:59 +01:00
Andreas Kling
064e46e581 Kernel: Don't allow open() with (O_CREAT | O_DIRECTORY) 2020-01-03 03:16:29 +01:00
Andreas Kling
15f3abc849 Kernel: Handle O_DIRECTORY in VFS::open() instead of in each syscall
Just taking care of some FIXMEs.
2020-01-03 03:16:29 +01:00
Andreas Kling
fdde5cdf26 Kernel: Don't include the process GID in the "extra GIDs" table
Process::m_extra_gids is for supplementary GIDs only.
2020-01-02 23:45:52 +01:00
Andreas Kling
32ec1e5aed Kernel: Mask kernel addresses in backtraces and profiles
Addresses outside the userspace virtual range will now show up as
0xdeadc0de in backtraces and profiles generated by unprivileged users.
2020-01-02 20:51:31 +01:00
Andreas Kling
7f04334664 Kernel: Remove broken implementation of Unix SHM
This code never worked, as was never used for anything. We can build
a much better SHM implementation on top of TmpFS or similar when we
get to the point when we need one.
2020-01-02 12:44:21 +01:00
Liav A
e5ffa960d7 Kernel: Create support for PCI ECAM
The new PCI subsystem is initialized during runtime.
PCI::Initializer is supposed to be called during early boot, to
perform a few tests, and initialize the proper configuration space
access mechanism. Kernel boot parameters can be specified by a user to
determine what tests will occur, to aid debugging on problematic
machines.
After that, PCI::Initializer should be dismissed.

PCI::IOAccess is a class that is derived from PCI::Access
class and implements PCI configuration space access mechanism via x86
IO ports.
PCI::MMIOAccess is a class that is derived from PCI::Access
and implements PCI configurtaion space access mechanism via memory
access.

The new PCI subsystem also supports determination of IO/MMIO space
needed by a device by checking a given BAR.
In addition, Every device or component that use the PCI subsystem has
changed to match the last changes.
2020-01-02 00:50:09 +01:00
Andreas Kling
d8ef13a426 ProcFS: Supervisor-only inodes should be owned by UID 0, GID 0 2019-12-31 13:22:43 +01:00
Andreas Kling
9af054af9e ProcFS: Reduce the amount of info accessible to non-superusers
This patch hardens /proc a bit by making many things only accessible
to UID 0, and also disallowing access to /proc/PID/ for anyone other
than the UID of that process (and superuser, obviously.)
2019-12-31 01:32:27 +01:00
Andreas Kling
54d182f553 Kernel: Remove some unnecessary leaking of kernel pointers into dmesg
There's a lot more of this and we need to stop printing kernel pointers
anywhere but the debug console.
2019-12-31 01:22:00 +01:00
Andreas Kling
50677bf806 Kernel: Refactor scheduler to use dynamic thread priorities
Threads now have numeric priorities with a base priority in the 1-99
range.

Whenever a runnable thread is *not* scheduled, its effective priority
is incremented by 1. This is tracked in Thread::m_extra_priority.
The effective priority of a thread is m_priority + m_extra_priority.

When a runnable thread *is* scheduled, its m_extra_priority is reset to
zero and the effective priority returns to base.

This means that lower-priority threads will always eventually get
scheduled to run, once its effective priority becomes high enough to
exceed the base priority of threads "above" it.

The previous values for ThreadPriority (Low, Normal and High) are now
replaced as follows:

    Low -> 10
    Normal -> 30
    High -> 50

In other words, it will take 20 ticks for a "Low" priority thread to
get to "Normal" effective priority, and another 20 to reach "High".

This is not perfect, and I've used some quite naive data structures,
but I think the mechanism will allow us to build various new and
interesting optimizations, and we can figure out better data structures
later on. :^)
2019-12-30 18:46:17 +01:00
Andreas Kling
c74cde918a Kernel+SystemMonitor: Expose amount of per-process clean inode memory
This is memory that's loaded from an inode (file) but not modified in
memory, so still identical to what's on disk. This kind of memory can
be freed and reloaded transparently from disk if needed.
2019-12-29 12:45:58 +01:00
Andreas Kling
0d5e0e4cad Kernel+SystemMonitor: Expose amount of per-process dirty private memory
Dirty private memory is all memory in non-inode-backed mappings that's
process-private, meaning it's not shared with any other process.

This patch exposes that number via SystemMonitor, giving us an idea of
how much memory each process is responsible for all on its own.
2019-12-29 12:28:32 +01:00
Andreas Kling
dafd715743 ProcFS: Fix inconsistent numbers in /proc/memstat
We were listing the total number of user/super pages as the number of
"available" pages in the system. This was then misinterpreted in the
SystemMonitor program and displayed wrong in the GUI.
2019-12-26 11:43:42 +01:00
Shannon Booth
0e45b9423b Kernel: Implement recursion limit on path resolution
Cautiously use 5 as a limit for now so that we don't blow the stack.
This can be increased in the future if we are sure that we won't be
blowing the stack, or if the implementation is changed to not use
recursion :^)
2019-12-24 23:14:14 +01:00
Andreas Kling
c5c1cc817e Kernel: Expose region executable bit in /proc/PID/vm
Also show it in SystemMonitor's process memory map table (as 'X') :^)
2019-12-21 16:26:42 +01:00
Andreas Kling
b6ee8a2c8d Kernel: Rename vmo => vmobject everywhere 2019-12-19 19:15:27 +01:00
Andreas Kling
931e4b7f5e Kernel+SystemMonitor: Prevent userspace access to process ELF image
Every process keeps its own ELF executable mapped in memory in case we
need to do symbol lookup (for backtraces, etc.)

Until now, it was mapped in a way that made it accessible to the
program, despite the program not having mapped it itself.
I don't really see a need for userspace to have access to this right
now, so let's lock things down a little bit.

This patch makes it inaccessible to userspace and exposes that fact
through /proc/PID/vm (per-region "user_accessible" flag.)
2019-12-15 20:11:57 +01:00
Andreas Kling
5292f6e78f Kernel+FileManager: Disallow watch_file() in unsupported file systems
Currently only Ext2FS and TmpFS supports InodeWatchers. We now fail
with ENOTSUPP if watch_file() is called on e.g ProcFS.

This fixes an issue with FileManager chewing up all the CPU when /proc
was opened. Watchers don't keep the watched Inode open, and when they
close, the watcher FD will EOF.

Since nothing else kept /proc open in FileManager, the watchers created
for it would EOF immediately, causing a refresh over and over.

Fixes #879.
2019-12-15 19:33:39 +01:00
Andreas Kling
3fbc50a350 Kernel+SystemMonitor: Expose the number of set CoW bits in each Region
This number tells us how many more pages in a given region will trigger
a CoW fault if written to.
2019-12-15 16:53:00 +01:00
Andreas Kling
0f393148da Kernel: Separate out the symbol offsets in profile output
Instead of saying "main +39" and "main +57" etc, we now have a separate
field in /proc/profile for the offset-into-the-symbol.
2019-12-12 21:59:47 +01:00
Andreas Kling
bc919c8432 ProcFS: Rename "samples" in /proc/profile to "frames"
These are stack frames, not profile samples. :^)
2019-12-12 20:36:41 +01:00
Andreas Kling
b32e961a84 Kernel: Implement a simple process time profiler
The kernel now supports basic profiling of all the threads in a process
by calling profiling_enable(pid_t). You finish the profiling by calling
profiling_disable(pid_t).

This all works by recording thread stacks when the timer interrupt
fires and the current thread is in a process being profiled.
Note that symbolication is deferred until profiling_disable() to avoid
adding more noise than necessary to the profile.

A simple "/bin/profile" command is included here that can be used to
start/stop profiling like so:

    $ profile 10 on
    ... wait ...
    $ profile 10 off

After a profile has been recorded, it can be fetched in /proc/profile

There are various limits (or "bugs") on this mechanism at the moment:

- Only one process can be profiled at a time.
- We allocate 8MB for the samples, if you use more space, things will
  not work, and probably break a bit.
- Things will probably fall apart if the profiled process dies during
  profiling, or while extracing /proc/profile
2019-12-11 20:36:56 +01:00
Andreas Kling
e9dda8d592 Kernel: Give PTY's *actually* unique major ID's
Okay, one "dunce hat" point for me. The new PTY majors conflicted with
PATAChannel. Now they are 200 for master and 201 for slave, not used
by anything else.. I hope!
2019-12-09 21:03:39 +01:00
Andreas Kling
dbb644f20c Kernel: Start implementing purgeable memory support
It's now possible to get purgeable memory by using mmap(MAP_PURGEABLE).
Purgeable memory has a "volatile" flag that can be set using madvise():

- madvise(..., MADV_SET_VOLATILE)
- madvise(..., MADV_SET_NONVOLATILE)

When in the "volatile" state, the kernel may take away the underlying
physical memory pages at any time, without notifying the owner.
This gives you a guilt discount when caching very large things. :^)

Setting a purgeable region to non-volatile will return whether or not
the memory has been taken away by the kernel while being volatile.
Basically, if madvise(..., MADV_SET_NONVOLATILE) returns 1, that means
the memory was purged while volatile, and whatever was in that piece
of memory needs to be reconstructed before use.
2019-12-09 19:12:38 +01:00
Andreas Kling
6f4c380d95 AK: Use size_t for the length of strings
Using int was a mistake. This patch changes String, StringImpl,
StringView and StringBuilder to use size_t instead of int for lengths.
Obviously a lot of code needs to change as a result of this.
2019-12-09 17:51:21 +01:00
Andrew Kaster
9058962712 Kernel: Allow setting thread names
The main thread of each kernel/user process will take the name of
the process. Extra threads will get a fancy new name
"ProcessName[<tid>]".

Thread backtraces now list the thread name in addtion to tid.

Add the thread name to /proc/all (should it get its own proc
file?).

Add two new syscalls, set_thread_name and get_thread_name.
2019-12-08 14:09:29 +01:00
Andreas Kling
c5ff0da93c ProcFS: Fix typo in /proc/net/local 2019-12-07 09:40:30 +01:00
Andreas Kling
23e802518d Kernel: Add getsockopt(SO_PEERCRED) for local sockets
This sockopt gives you a struct with the PID, UID and GID of a socket's
peer process.
2019-12-06 18:38:36 +01:00
Andreas Kling
5a45376180 Kernel+SystemMonitor: Log amounts of I/O per thread
This patch adds these I/O counters to each thread:

- (Inode) file read bytes
- (Inode) file write bytes
- Unix socket read bytes
- Unix socket write bytes
- IPv4 socket read bytes
- IPv4 socket write bytes

These are then exposed in /proc/all and seen in SystemMonitor.
2019-12-01 17:40:27 +01:00
Andreas Kling
3ad0e6e198 Kernel: Show module memory size in /proc/modules
Note that this only shows the size of the loaded module sections,
and does not include any memory allocated *by* the module.
2019-11-29 21:18:37 +01:00
Andreas Kling
1f67894bdd Kernel: Add /proc/modules to enumerate the currently loaded modules 2019-11-28 21:12:02 +01:00
Andreas Kling
75ed262fe5 Kernel+ifconfig: Add an MTU value to NetworkAdapter
This defaults to 1500 for all adapters, but LoopbackAdapter increases
it to 65536 on construction.

If an IPv4 packet is larger than the MTU, we'll need to break it into
smaller fragments before transmitting it. This part is a FIXME. :^)
2019-11-28 14:14:26 +01:00