Commit Graph

276 Commits

Author SHA1 Message Date
Andreas Kling
d8ef13a426 ProcFS: Supervisor-only inodes should be owned by UID 0, GID 0 2019-12-31 13:22:43 +01:00
Andreas Kling
9af054af9e ProcFS: Reduce the amount of info accessible to non-superusers
This patch hardens /proc a bit by making many things only accessible
to UID 0, and also disallowing access to /proc/PID/ for anyone other
than the UID of that process (and superuser, obviously.)
2019-12-31 01:32:27 +01:00
Andreas Kling
54d182f553 Kernel: Remove some unnecessary leaking of kernel pointers into dmesg
There's a lot more of this and we need to stop printing kernel pointers
anywhere but the debug console.
2019-12-31 01:22:00 +01:00
Andreas Kling
50677bf806 Kernel: Refactor scheduler to use dynamic thread priorities
Threads now have numeric priorities with a base priority in the 1-99
range.

Whenever a runnable thread is *not* scheduled, its effective priority
is incremented by 1. This is tracked in Thread::m_extra_priority.
The effective priority of a thread is m_priority + m_extra_priority.

When a runnable thread *is* scheduled, its m_extra_priority is reset to
zero and the effective priority returns to base.

This means that lower-priority threads will always eventually get
scheduled to run, once its effective priority becomes high enough to
exceed the base priority of threads "above" it.

The previous values for ThreadPriority (Low, Normal and High) are now
replaced as follows:

    Low -> 10
    Normal -> 30
    High -> 50

In other words, it will take 20 ticks for a "Low" priority thread to
get to "Normal" effective priority, and another 20 to reach "High".

This is not perfect, and I've used some quite naive data structures,
but I think the mechanism will allow us to build various new and
interesting optimizations, and we can figure out better data structures
later on. :^)
2019-12-30 18:46:17 +01:00
Andreas Kling
c74cde918a Kernel+SystemMonitor: Expose amount of per-process clean inode memory
This is memory that's loaded from an inode (file) but not modified in
memory, so still identical to what's on disk. This kind of memory can
be freed and reloaded transparently from disk if needed.
2019-12-29 12:45:58 +01:00
Andreas Kling
0d5e0e4cad Kernel+SystemMonitor: Expose amount of per-process dirty private memory
Dirty private memory is all memory in non-inode-backed mappings that's
process-private, meaning it's not shared with any other process.

This patch exposes that number via SystemMonitor, giving us an idea of
how much memory each process is responsible for all on its own.
2019-12-29 12:28:32 +01:00
Andreas Kling
dafd715743 ProcFS: Fix inconsistent numbers in /proc/memstat
We were listing the total number of user/super pages as the number of
"available" pages in the system. This was then misinterpreted in the
SystemMonitor program and displayed wrong in the GUI.
2019-12-26 11:43:42 +01:00
Shannon Booth
0e45b9423b Kernel: Implement recursion limit on path resolution
Cautiously use 5 as a limit for now so that we don't blow the stack.
This can be increased in the future if we are sure that we won't be
blowing the stack, or if the implementation is changed to not use
recursion :^)
2019-12-24 23:14:14 +01:00
Andreas Kling
c5c1cc817e Kernel: Expose region executable bit in /proc/PID/vm
Also show it in SystemMonitor's process memory map table (as 'X') :^)
2019-12-21 16:26:42 +01:00
Andreas Kling
b6ee8a2c8d Kernel: Rename vmo => vmobject everywhere 2019-12-19 19:15:27 +01:00
Andreas Kling
931e4b7f5e Kernel+SystemMonitor: Prevent userspace access to process ELF image
Every process keeps its own ELF executable mapped in memory in case we
need to do symbol lookup (for backtraces, etc.)

Until now, it was mapped in a way that made it accessible to the
program, despite the program not having mapped it itself.
I don't really see a need for userspace to have access to this right
now, so let's lock things down a little bit.

This patch makes it inaccessible to userspace and exposes that fact
through /proc/PID/vm (per-region "user_accessible" flag.)
2019-12-15 20:11:57 +01:00
Andreas Kling
5292f6e78f Kernel+FileManager: Disallow watch_file() in unsupported file systems
Currently only Ext2FS and TmpFS supports InodeWatchers. We now fail
with ENOTSUPP if watch_file() is called on e.g ProcFS.

This fixes an issue with FileManager chewing up all the CPU when /proc
was opened. Watchers don't keep the watched Inode open, and when they
close, the watcher FD will EOF.

Since nothing else kept /proc open in FileManager, the watchers created
for it would EOF immediately, causing a refresh over and over.

Fixes #879.
2019-12-15 19:33:39 +01:00
Andreas Kling
3fbc50a350 Kernel+SystemMonitor: Expose the number of set CoW bits in each Region
This number tells us how many more pages in a given region will trigger
a CoW fault if written to.
2019-12-15 16:53:00 +01:00
Andreas Kling
0f393148da Kernel: Separate out the symbol offsets in profile output
Instead of saying "main +39" and "main +57" etc, we now have a separate
field in /proc/profile for the offset-into-the-symbol.
2019-12-12 21:59:47 +01:00
Andreas Kling
bc919c8432 ProcFS: Rename "samples" in /proc/profile to "frames"
These are stack frames, not profile samples. :^)
2019-12-12 20:36:41 +01:00
Andreas Kling
b32e961a84 Kernel: Implement a simple process time profiler
The kernel now supports basic profiling of all the threads in a process
by calling profiling_enable(pid_t). You finish the profiling by calling
profiling_disable(pid_t).

This all works by recording thread stacks when the timer interrupt
fires and the current thread is in a process being profiled.
Note that symbolication is deferred until profiling_disable() to avoid
adding more noise than necessary to the profile.

A simple "/bin/profile" command is included here that can be used to
start/stop profiling like so:

    $ profile 10 on
    ... wait ...
    $ profile 10 off

After a profile has been recorded, it can be fetched in /proc/profile

There are various limits (or "bugs") on this mechanism at the moment:

- Only one process can be profiled at a time.
- We allocate 8MB for the samples, if you use more space, things will
  not work, and probably break a bit.
- Things will probably fall apart if the profiled process dies during
  profiling, or while extracing /proc/profile
2019-12-11 20:36:56 +01:00
Andreas Kling
e9dda8d592 Kernel: Give PTY's *actually* unique major ID's
Okay, one "dunce hat" point for me. The new PTY majors conflicted with
PATAChannel. Now they are 200 for master and 201 for slave, not used
by anything else.. I hope!
2019-12-09 21:03:39 +01:00
Andreas Kling
dbb644f20c Kernel: Start implementing purgeable memory support
It's now possible to get purgeable memory by using mmap(MAP_PURGEABLE).
Purgeable memory has a "volatile" flag that can be set using madvise():

- madvise(..., MADV_SET_VOLATILE)
- madvise(..., MADV_SET_NONVOLATILE)

When in the "volatile" state, the kernel may take away the underlying
physical memory pages at any time, without notifying the owner.
This gives you a guilt discount when caching very large things. :^)

Setting a purgeable region to non-volatile will return whether or not
the memory has been taken away by the kernel while being volatile.
Basically, if madvise(..., MADV_SET_NONVOLATILE) returns 1, that means
the memory was purged while volatile, and whatever was in that piece
of memory needs to be reconstructed before use.
2019-12-09 19:12:38 +01:00
Andreas Kling
6f4c380d95 AK: Use size_t for the length of strings
Using int was a mistake. This patch changes String, StringImpl,
StringView and StringBuilder to use size_t instead of int for lengths.
Obviously a lot of code needs to change as a result of this.
2019-12-09 17:51:21 +01:00
Andrew Kaster
9058962712 Kernel: Allow setting thread names
The main thread of each kernel/user process will take the name of
the process. Extra threads will get a fancy new name
"ProcessName[<tid>]".

Thread backtraces now list the thread name in addtion to tid.

Add the thread name to /proc/all (should it get its own proc
file?).

Add two new syscalls, set_thread_name and get_thread_name.
2019-12-08 14:09:29 +01:00
Andreas Kling
c5ff0da93c ProcFS: Fix typo in /proc/net/local 2019-12-07 09:40:30 +01:00
Andreas Kling
23e802518d Kernel: Add getsockopt(SO_PEERCRED) for local sockets
This sockopt gives you a struct with the PID, UID and GID of a socket's
peer process.
2019-12-06 18:38:36 +01:00
Andreas Kling
5a45376180 Kernel+SystemMonitor: Log amounts of I/O per thread
This patch adds these I/O counters to each thread:

- (Inode) file read bytes
- (Inode) file write bytes
- Unix socket read bytes
- Unix socket write bytes
- IPv4 socket read bytes
- IPv4 socket write bytes

These are then exposed in /proc/all and seen in SystemMonitor.
2019-12-01 17:40:27 +01:00
Andreas Kling
3ad0e6e198 Kernel: Show module memory size in /proc/modules
Note that this only shows the size of the loaded module sections,
and does not include any memory allocated *by* the module.
2019-11-29 21:18:37 +01:00
Andreas Kling
1f67894bdd Kernel: Add /proc/modules to enumerate the currently loaded modules 2019-11-28 21:12:02 +01:00
Andreas Kling
75ed262fe5 Kernel+ifconfig: Add an MTU value to NetworkAdapter
This defaults to 1500 for all adapters, but LoopbackAdapter increases
it to 65536 on construction.

If an IPv4 packet is larger than the MTU, we'll need to break it into
smaller fragments before transmitting it. This part is a FIXME. :^)
2019-11-28 14:14:26 +01:00
Andreas Kling
0adbacf59e Kernel: Demangle userspace ELF symbols in backtraces
Turns out we can use abi::__cxa_demangle() for this, and all we need to
provide is sprintf(), realloc() and free(), so this patch exposes them.

We now have fully demangled C++ backtraces :^)
2019-11-27 14:06:24 +01:00
Andreas Kling
5b8cf2ee23 Kernel: Make syscall counters and page fault counters per-thread
Now that we show individual threads in SystemMonitor and "top",
it's also very nice to have individual counters for the threads. :^)
2019-11-26 21:37:38 +01:00
Andreas Kling
712ae73581 Kernel: Expose per-thread information in /proc/all
Previously it was not possible to see what each thread in a process was
up to, or how much CPU it was consuming. This patch fixes that.

SystemMonitor and "top" now show threads instead of just processes.
"ps" is gonna need some more fixing, but it at least builds for now.

Fixes #66.
2019-11-26 21:37:30 +01:00
Sergey Bugaev
8aef0a0755 Kernel: Handle fstat() on sockets 2019-11-26 19:58:25 +01:00
Drew Stratford
ee0eed26f4 Ext2FileSystem: set_metadata_dirty(true) during write_directory().
This adds a call to set_metadata_dirty(true) to
Ext2FS::write_directory(). This fixes a bug wherein InodeWatchers
weren't alerted on directory updates.
2019-11-21 13:04:38 +01:00
Andreas Kling
8ccbd7002b Ext2FS: Rename allocate_inode() => find_a_free_inode()
Since this function doesn't actually mark the inode as allocated,
let's tone down the name a little bit.
2019-11-17 19:19:02 +01:00
Andreas Kling
a712d4ac0c Ext2FS: Writing to a slow symlink should not treat it like a fast one
We would misinterpret short writes to the first 60 bytes of a slow
symlink as writes to a fast symlink.
2019-11-17 19:19:00 +01:00
Andreas Kling
5d08665d9a Ext2FS: Remove unnecessary extra cache lookup in get_inode() 2019-11-17 19:11:19 +01:00
Andreas Kling
ba997c0a72 Ext2FS: Add some FIXME's while browsing this code 2019-11-17 18:59:14 +01:00
Andreas Kling
02f89dc419 Kernel+SystemMonitor: Show VM region "shared" and "stack" bits in UI
Expose these two region bits through /proc/PID/vm and show them in the
SystemMonitor process memory map view.
2019-11-17 12:15:46 +01:00
Andreas Kling
8d4d63d9b6 Ext2FS: Minor cleanup, remove an unused function 2019-11-16 17:29:09 +01:00
Andreas Kling
06a80bcf69 Kernel+SystemMonitor: Publish can_read/write state for open files
The can_read() and can_write() states for file descriptions are now
published in /proc, allowing SystemMonitor to display it.
2019-11-09 22:42:19 +01:00
Andreas Kling
083c5f8b89 Kernel: Rework Process::Priority into ThreadPriority
Scheduling priority is now set at the thread level instead of at the
process level.

This is a step towards allowing processes to set different priorities
for threads. There's no userspace API for that yet, since only the main
thread's priority is affected by sched_setparam().
2019-11-06 16:30:06 +01:00
Andreas Kling
1c8f017730 Kernel: Remove unused SynthFS filesystem
This used to be the base class of ProcFS and DevPtsFS but not anymore.
2019-11-06 11:36:24 +01:00
Andreas Kling
59ed235c85 Kernel: Implement O_DIRECT open() flag to bypass disk caches
Files opened with O_DIRECT will now bypass the disk cache in read/write
operations (though metadata operations will still hit the disk cache.)

This will allow us to test actual disk performance instead of testing
disk *cache* performance, if that's what we want. :^)

There's room for improvment here, we're very aggressively flushing any
dirty cache entries for the specific block before reading/writing that
block. This is done by walking the entire cache, which may be slow.
2019-11-05 19:35:12 +01:00
Andreas Kling
f215f0c226 ProcFS: Fix Clang build (or really, Qt Creator syntax highlighting)
The Clang parser used by Qt Creator kept getting confused by this code.
2019-11-04 18:14:04 +01:00
Andreas Kling
721585473b Ext2FS: Don't uncache inodes while they are being watched
If an inode is observed by watch_file(), we won't uncache it.
This allows a program to watch a file without keeping it open.
2019-11-04 17:25:44 +01:00
Andreas Kling
1b2ef8582c Kernel: Make File's can_read/can_write take a const FileDescription&
Asking a File if we could possibly read or write it will never mutate
the asking FileDescription&, so it should be const.
2019-11-04 14:03:14 +01:00
Andreas Kling
e8fee92357 Kernel: Don't update fd offset on read/write error
If something goes wrong with a read or write operation, we don't want
to add the error number to the fd's offset. :^)
2019-11-04 13:58:28 +01:00
Andreas Kling
c538648465 Ext2FS: Uncache unused Inodes after flushing contents to disk
Don't keep Inodes around in memory forever after we've interacted with
them once. This is a slight performance pessimization when accessing
the same file repeatedly, but closing it for a while in between.

Longer term we should find a way to keep a limited number of unused
Inodes cached, whichever ones we think are likely to be used again.
2019-11-04 12:23:19 +01:00
Alexander
9e03f3ce20 ProcFS: Identify virtual filesystems' device in df (#728) 2019-11-03 20:58:10 +01:00
Andreas Kling
7e56cf1800 Ext2FS: Lock the filesystem during initialization and during sync
If we get preempted during initialization, we really don't want to try
doing a sync on a partially-initialized filesystem.
2019-11-03 13:56:55 +01:00
Andreas Kling
c1d3ac7108 Ext2FS: Fix unpopulated block list cache after mkdir()
When creating a new directory, we set the initial size to 1 block.
This meant that we were allocating a block up front, but the Inode's
internal block list cache was not populated with this block.

This broke write_bytes() on a new directory, since it assumed that
the block list cache would be up to date if the call to write_bytes()
would not change the directory's size.

This patch fixes the issue in two ways: First, we cache the initial
block list created for new directories.
Second, we now repopulate the block list cache in write_bytes() if it
is empty when we get there. This is basically just a safety fallback
to avoid having this kind of bug in the future.
2019-11-03 10:22:09 +01:00
Andreas Kling
dcf8d359f3 Kernel: Fick infinite recursion when filling up disk cache
We can't be calling the virtual FS::flush_writes() in order to flush
the disk cache from within the disk cache, since an FS subclass may
try to do cache stuff in its flush_writes() implementation.

Instead, separate out the implementation of DiskBackedFS's flushing
logic into a flush_writes_impl() and call that from the cache code.
2019-11-03 00:21:17 +01:00