Commit Graph

428 Commits

Author SHA1 Message Date
Sergey Bugaev
d6184afcae Kernel: Simplify VFS::resolve_path() further
It turns out we don't even need to store the whole custody chain, as we only
ever access its last element. So we can just store one custody. This also fixes
a performance FIXME :^)

Also, rename parent_custody to out_parent.
2020-01-17 21:49:58 +01:00
Andreas Kling
d4d17ce423 Kernel: Trying to sys$link() a directory should fail with EPERM 2020-01-15 22:11:44 +01:00
Andreas Kling
e91f03cb39 Ext2FS: Assert that inline symlink read/write always uses offset=0 2020-01-15 22:11:44 +01:00
Andreas Kling
5a13a5416e Kernel: Avoid an extra call to read_bytes() in Inode::read_entire()
If we slurp up the entire inode in a single read_bytes(), no need to
call read_bytes() again.
2020-01-15 22:11:44 +01:00
Andreas Kling
9e54c7c17f Ext2FS: Don't allow creating new files in removed directories
Also don't uncache inodes when they reach i_links_count==0 unless they
also have no ref counts other than the +1 from the inode cache.
This prevents the FS from deleting the on-disk inode too soon.
2020-01-15 22:11:44 +01:00
Andreas Kling
e23536d682 Kernel: Use Vector::unstable_remove() in a couple of places 2020-01-15 19:26:41 +01:00
Sergey Bugaev
b913e30011 Kernel: Refactor/rewrite VFS::resolve_path()
This makes the implementation easier to follow, but also fixes multiple issues
with the old implementation. In particular, it now deals properly with . and ..
in paths, including around mount points.

Hopefully there aren't many new bugs this introduces :^)
2020-01-14 12:24:19 +01:00
Andreas Kling
0c44a12247 Kernel: read() and write() should EOVERFLOW if (offset+size) overflows 2020-01-12 20:20:17 +01:00
Andreas Kling
14d4b1058e Kernel: Add a basic lock to FileDescription
Let's prevent two processes sharing a FileDescription from messing with
it at the same time for now.
2020-01-12 20:09:44 +01:00
Sergey Bugaev
33c0dc08a7 Kernel: Don't forget to copy & destroy root_directory_for_procfs
Also, rename it to root_directory_relative_to_global_root.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
fee6d0a3a6 Kernel+Base: Mount root as nodev,nosuid
Then bind-mount /dev and /bin while adding back the appropriate permissions :^)
2020-01-12 20:02:11 +01:00
Sergey Bugaev
93ff911473 Kernel: Properly propagate bind mount flags
Previously, when performing a bind mount flags other than MS_BIND were ignored.
Now, they're properly propagated the same way a for any other mount.
2020-01-12 20:02:11 +01:00
Sergey Bugaev
3393b78623 Kernel: Allow getting a Device from a FileDescription
Like we already do for other kinds of files.
2020-01-12 20:02:11 +01:00
Andreas Kling
cb59f9e0f2 Kernel: Put some VFS debug spam behind VFS_DEBUG 2020-01-12 10:01:22 +01:00
Andreas Kling
b36608f47c ProcFS: Expose process pledge promises in /proc/all 2020-01-11 21:33:12 +01:00
Sergey Bugaev
0cb0f54783 Kernel: Implement bind mounts
You can now bind-mount files and directories. This essentially exposes an
existing part of the file system in another place, and can be used as an
alternative to symlinks or hardlinks.

Here's an example of doing this:

    # mkdir /tmp/foo
    # mount /home/anon/myfile.txt /tmp/foo -o bind
    # cat /tmp/foo
    This is anon's file.
2020-01-11 18:57:53 +01:00
Sergey Bugaev
61c1106d9f Kernel+LibC: Implement a few mount flags
We now support these mount flags:
* MS_NODEV: disallow opening any devices from this file system
* MS_NOEXEC: disallow executing any executables from this file system
* MS_NOSUID: ignore set-user-id bits on executables from this file system

The fourth flag, MS_BIND, is defined, but currently ignored.
2020-01-11 18:57:53 +01:00
Sergey Bugaev
2fcbb846fb Kernel+LibC: Add O_EXEC, move exec permission checking to VFS::open()
O_EXEC is mentioned by POSIX, so let's have it. Currently, it is only used
inside the kernel to ensure the process has the right permissions when opening
an executable.
2020-01-11 18:57:53 +01:00
Sergey Bugaev
4566c2d811 Kernel+LibC: Add support for mount flags
At the moment, the actual flags are ignored, but we correctly propagate them all
the way from the original mount() syscall to each custody that resides on the
mounted FS.
2020-01-11 18:57:53 +01:00
Sergey Bugaev
1e6ab0ed22 Kernel: Simplify VFS::Mount handling
No need to pass around RefPtr<>s and NonnullRefPtr<>s and no need to
heap-allocate them.

Also remove VFS::mount(NonnullRefPtr<FS>&&, StringView path) - it has been
unused for a long time.
2020-01-11 18:57:53 +01:00
Andreas Kling
29b3d95004 Kernel: Expose a process's filesystem root as a /proc/PID/root symlink
In order to preserve the absolute path of the process root, we save the
custody used by chroot() before stripping it to become the new "/".
There's probably a better way to do this.
2020-01-10 23:48:44 +01:00
Andreas Kling
ddd0b19281 Kernel: Add a basic chroot() syscall :^)
The chroot() syscall now allows the superuser to isolate a process into
a specific subtree of the filesystem. This is not strictly permanent,
as it is also possible for a superuser to break *out* of a chroot, but
it is a useful mechanism for isolating unprivileged processes.

The VFS now uses the current process's root_directory() as the root for
path resolution purposes. The root directory is stored as an uncached
Custody in the Process object.
2020-01-10 23:14:04 +01:00
Andreas Kling
944fbf507a Kernel: Custody::absolute_path() should always return "/" for roots
A Custody with no parent is always *a* root (although not necessarily
the *real* root.)
2020-01-10 23:09:58 +01:00
Andreas Kling
b1ffde6199 Kernel: unlink() should not follow symlinks 2020-01-10 14:07:36 +01:00
Andreas Kling
7380c8ec6e TmpFS: Synthesize "." and ".." in traverse_as_directory()
As Sergey pointed out, it's silly to have proper entries for . and ..
in TmpFS when we can just synthesize them on the fly.

Note that we have to tolerate removal of . and .. via remove_child()
to keep VFS::rmdir() happy.
2020-01-10 13:16:55 +01:00
Andreas Kling
59bfbed2e2 ProcFS: Don't expose kernel-only regions to users via /proc/PID/vm
The superuser is still allowed to see them, but kernel-only VM regions
are now excluded from /proc/PID/vm.
2020-01-10 10:57:33 +01:00
Andreas Kling
d310cf3b49 Kernel: Opening a file with O_TRUNC should update mtime 2020-01-08 15:21:06 +01:00
Andreas Kling
e485667201 Kernel: ftruncate() should update mtime 2020-01-08 15:21:06 +01:00
Andreas Kling
fe1bf067b8 ProcFS: Reads past the end of a generated file should be zero-length 2020-01-08 12:59:06 +01:00
Andreas Kling
28ee5b0e98 TmpFS: Reads past the end of a file should be zero-length 2020-01-08 12:47:41 +01:00
Andreas Kling
faf32153f6 Kernel: Take const Process& in InodeMetadata::may_{read,write,execute} 2020-01-07 19:24:06 +01:00
Andreas Kling
5387a19268 Kernel: Make Process::file_description() vend a RefPtr<FileDescription>
This encourages callers to strongly reference file descriptions while
working with them.

This fixes a use-after-free issue where one thread would close() an
open fd while another thread was blocked on it becoming readable.

Test: Kernel/uaf-close-while-blocked-in-read.cpp
2020-01-07 15:53:42 +01:00
Andreas Kling
a49d9c774f TmpFS: Add ASSERT(offset >= 0) to read_bytes() and write_bytes() 2020-01-07 15:25:56 +01:00
Andreas Kling
bb9db9d430 TmpFS: Add "." and ".." entries to all directories
It was so weird not seeing them in "ls -la" output :^)
2020-01-07 14:48:43 +01:00
Andreas Kling
56a2c21e0c Kernel: Don't leak kmalloc pointers through FIFO absolute paths
Instead of using the FIFO's memory address as part of its absolute path
identity, just use an incrementing FIFO index instead.

Note that this is not used for anything other than debugging (it helps
you identify which file descriptors refer to the same FIFO by looking
at /proc/PID/fds
2020-01-07 10:29:47 +01:00
Andreas Kling
9eef39d68a Kernel: Start implementing x86 SMAP support
Supervisor Mode Access Prevention (SMAP) is an x86 CPU feature that
prevents the kernel from accessing userspace memory. With SMAP enabled,
trying to read/write a userspace memory address while in the kernel
will now generate a page fault.

Since it's sometimes necessary to read/write userspace memory, there
are two new instructions that quickly switch the protection on/off:
STAC (disables protection) and CLAC (enables protection.)
These are exposed in kernel code via the stac() and clac() helpers.

There's also a SmapDisabler RAII object that can be used to ensure
that you don't forget to re-enable protection before returning to
userspace code.

THis patch also adds copy_to_user(), copy_from_user() and memset_user()
which are the "correct" way of doing things. These functions allow us
to briefly disable protection for a specific purpose, and then turn it
back on immediately after it's done. Going forward all kernel code
should be moved to using these and all uses of SmapDisabler are to be
considered FIXME's.

Note that we're not realizing the full potential of this feature since
I've used SmapDisabler quite liberally in this initial bring-up patch.
2020-01-05 18:14:51 +01:00
Andreas Kling
12eb1f5d74 Kernel: Entries in /dev/pts should be accessible only to the owner
This fixes an issue where anyone could snoop on any pseudoterminal.
2020-01-04 12:46:48 +01:00
Andreas Kling
b5da0b78eb Kernel: File::open() should apply r/w mode from the provided options
This has been a FIXME for a long time. We now apply the provided
read/write permissions to the constructed FileDescription when opening
a File object via File::open().
2020-01-04 12:30:55 +01:00
Andreas Kling
e79c33eabb Kernel: The root inode of a TmpFS should have the sticky bit set
We were running without the sticky bit and mode 777, which meant that
the /tmp directory was world-writable *without* protection.

With this fixed, it's no longer possible for everyone to steal root's
files in /tmp.
2020-01-04 11:33:36 +01:00
Andreas Kling
d84299c7be Kernel: Allow fchmod() and fchown() on pre-bind() local sockets
In order to ensure a specific owner and mode when the local socket
filesystem endpoint is instantiated, we need to be able to call
fchmod() and fchown() on a socket fd between socket() and bind().

This is because until we call bind(), there is no filesystem inode
for the socket yet.
2020-01-03 20:14:56 +01:00
Andreas Kling
4abbedb6e4 Kernel: Allow passing initial UID and GID when creating new inodes
If we're creating something that should have a different owner than the
current process's UID/GID, we need to plumb that all the way through
VFS down to the FS functions.
2020-01-03 20:13:21 +01:00
Andreas Kling
82760998a9 Ext2FS: Take the inode lock in Ext2FSInode::metadata()
Remove an unnecessary InterruptDisabler to make this not assert. :^)
2020-01-03 17:48:02 +01:00
Andreas Kling
889ecd1375 Kernel: The superuser is allowed to utime() on any file
Before this patch, root was not able to "touch" someone else's file.
2020-01-03 04:14:41 +01:00
Andreas Kling
3f74e66e82 Kernel: rename() should fail with EXDEV for cross-device requests
POSIX does not support rename() from one file system to another.
2020-01-03 04:10:05 +01:00
Andreas Kling
3be1c7b514 Kernel: Fix awkward bug where "touch /foo/bar/baz" could create "/baz"
To accomodate file creation, path resolution optionally returns the
last valid parent directory seen while traversing the path.

Clients will then interpret "ENOENT, but I have a parent for you" as
meaning that the file doesn't exist, but its immediate parent directory
does. The client then goes ahead and creates a new file.

In the case of "/foo/bar/baz" where there is no "/foo", it would fail
with ENOENT and "/" as the last seen parent directory, causing e.g the
open() syscall to create "/baz".

Covered by test_io.
2020-01-03 03:57:10 +01:00
Andreas Kling
0a1865ebc6 Kernel: read() and write() should fail with EBADF for wrong mode fd's
It was previously possible to write to read-only file descriptors,
and read from write-only file descriptors.

All FileDescription objects now start out non-readable + non-writable,
and whoever is creating them has to "manually" enable reading/writing
by calling set_readable() and/or set_writable() on them.
2020-01-03 03:29:59 +01:00
Andreas Kling
064e46e581 Kernel: Don't allow open() with (O_CREAT | O_DIRECTORY) 2020-01-03 03:16:29 +01:00
Andreas Kling
15f3abc849 Kernel: Handle O_DIRECTORY in VFS::open() instead of in each syscall
Just taking care of some FIXMEs.
2020-01-03 03:16:29 +01:00
Andreas Kling
fdde5cdf26 Kernel: Don't include the process GID in the "extra GIDs" table
Process::m_extra_gids is for supplementary GIDs only.
2020-01-02 23:45:52 +01:00
Andreas Kling
32ec1e5aed Kernel: Mask kernel addresses in backtraces and profiles
Addresses outside the userspace virtual range will now show up as
0xdeadc0de in backtraces and profiles generated by unprivileged users.
2020-01-02 20:51:31 +01:00
Andreas Kling
7f04334664 Kernel: Remove broken implementation of Unix SHM
This code never worked, as was never used for anything. We can build
a much better SHM implementation on top of TmpFS or similar when we
get to the point when we need one.
2020-01-02 12:44:21 +01:00
Liav A
e5ffa960d7 Kernel: Create support for PCI ECAM
The new PCI subsystem is initialized during runtime.
PCI::Initializer is supposed to be called during early boot, to
perform a few tests, and initialize the proper configuration space
access mechanism. Kernel boot parameters can be specified by a user to
determine what tests will occur, to aid debugging on problematic
machines.
After that, PCI::Initializer should be dismissed.

PCI::IOAccess is a class that is derived from PCI::Access
class and implements PCI configuration space access mechanism via x86
IO ports.
PCI::MMIOAccess is a class that is derived from PCI::Access
and implements PCI configurtaion space access mechanism via memory
access.

The new PCI subsystem also supports determination of IO/MMIO space
needed by a device by checking a given BAR.
In addition, Every device or component that use the PCI subsystem has
changed to match the last changes.
2020-01-02 00:50:09 +01:00
Andreas Kling
d8ef13a426 ProcFS: Supervisor-only inodes should be owned by UID 0, GID 0 2019-12-31 13:22:43 +01:00
Andreas Kling
9af054af9e ProcFS: Reduce the amount of info accessible to non-superusers
This patch hardens /proc a bit by making many things only accessible
to UID 0, and also disallowing access to /proc/PID/ for anyone other
than the UID of that process (and superuser, obviously.)
2019-12-31 01:32:27 +01:00
Andreas Kling
54d182f553 Kernel: Remove some unnecessary leaking of kernel pointers into dmesg
There's a lot more of this and we need to stop printing kernel pointers
anywhere but the debug console.
2019-12-31 01:22:00 +01:00
Andreas Kling
50677bf806 Kernel: Refactor scheduler to use dynamic thread priorities
Threads now have numeric priorities with a base priority in the 1-99
range.

Whenever a runnable thread is *not* scheduled, its effective priority
is incremented by 1. This is tracked in Thread::m_extra_priority.
The effective priority of a thread is m_priority + m_extra_priority.

When a runnable thread *is* scheduled, its m_extra_priority is reset to
zero and the effective priority returns to base.

This means that lower-priority threads will always eventually get
scheduled to run, once its effective priority becomes high enough to
exceed the base priority of threads "above" it.

The previous values for ThreadPriority (Low, Normal and High) are now
replaced as follows:

    Low -> 10
    Normal -> 30
    High -> 50

In other words, it will take 20 ticks for a "Low" priority thread to
get to "Normal" effective priority, and another 20 to reach "High".

This is not perfect, and I've used some quite naive data structures,
but I think the mechanism will allow us to build various new and
interesting optimizations, and we can figure out better data structures
later on. :^)
2019-12-30 18:46:17 +01:00
Andreas Kling
c74cde918a Kernel+SystemMonitor: Expose amount of per-process clean inode memory
This is memory that's loaded from an inode (file) but not modified in
memory, so still identical to what's on disk. This kind of memory can
be freed and reloaded transparently from disk if needed.
2019-12-29 12:45:58 +01:00
Andreas Kling
0d5e0e4cad Kernel+SystemMonitor: Expose amount of per-process dirty private memory
Dirty private memory is all memory in non-inode-backed mappings that's
process-private, meaning it's not shared with any other process.

This patch exposes that number via SystemMonitor, giving us an idea of
how much memory each process is responsible for all on its own.
2019-12-29 12:28:32 +01:00
Andreas Kling
dafd715743 ProcFS: Fix inconsistent numbers in /proc/memstat
We were listing the total number of user/super pages as the number of
"available" pages in the system. This was then misinterpreted in the
SystemMonitor program and displayed wrong in the GUI.
2019-12-26 11:43:42 +01:00
Shannon Booth
0e45b9423b Kernel: Implement recursion limit on path resolution
Cautiously use 5 as a limit for now so that we don't blow the stack.
This can be increased in the future if we are sure that we won't be
blowing the stack, or if the implementation is changed to not use
recursion :^)
2019-12-24 23:14:14 +01:00
Andreas Kling
c5c1cc817e Kernel: Expose region executable bit in /proc/PID/vm
Also show it in SystemMonitor's process memory map table (as 'X') :^)
2019-12-21 16:26:42 +01:00
Andreas Kling
b6ee8a2c8d Kernel: Rename vmo => vmobject everywhere 2019-12-19 19:15:27 +01:00
Andreas Kling
931e4b7f5e Kernel+SystemMonitor: Prevent userspace access to process ELF image
Every process keeps its own ELF executable mapped in memory in case we
need to do symbol lookup (for backtraces, etc.)

Until now, it was mapped in a way that made it accessible to the
program, despite the program not having mapped it itself.
I don't really see a need for userspace to have access to this right
now, so let's lock things down a little bit.

This patch makes it inaccessible to userspace and exposes that fact
through /proc/PID/vm (per-region "user_accessible" flag.)
2019-12-15 20:11:57 +01:00
Andreas Kling
5292f6e78f Kernel+FileManager: Disallow watch_file() in unsupported file systems
Currently only Ext2FS and TmpFS supports InodeWatchers. We now fail
with ENOTSUPP if watch_file() is called on e.g ProcFS.

This fixes an issue with FileManager chewing up all the CPU when /proc
was opened. Watchers don't keep the watched Inode open, and when they
close, the watcher FD will EOF.

Since nothing else kept /proc open in FileManager, the watchers created
for it would EOF immediately, causing a refresh over and over.

Fixes #879.
2019-12-15 19:33:39 +01:00
Andreas Kling
3fbc50a350 Kernel+SystemMonitor: Expose the number of set CoW bits in each Region
This number tells us how many more pages in a given region will trigger
a CoW fault if written to.
2019-12-15 16:53:00 +01:00
Andreas Kling
0f393148da Kernel: Separate out the symbol offsets in profile output
Instead of saying "main +39" and "main +57" etc, we now have a separate
field in /proc/profile for the offset-into-the-symbol.
2019-12-12 21:59:47 +01:00
Andreas Kling
bc919c8432 ProcFS: Rename "samples" in /proc/profile to "frames"
These are stack frames, not profile samples. :^)
2019-12-12 20:36:41 +01:00
Andreas Kling
b32e961a84 Kernel: Implement a simple process time profiler
The kernel now supports basic profiling of all the threads in a process
by calling profiling_enable(pid_t). You finish the profiling by calling
profiling_disable(pid_t).

This all works by recording thread stacks when the timer interrupt
fires and the current thread is in a process being profiled.
Note that symbolication is deferred until profiling_disable() to avoid
adding more noise than necessary to the profile.

A simple "/bin/profile" command is included here that can be used to
start/stop profiling like so:

    $ profile 10 on
    ... wait ...
    $ profile 10 off

After a profile has been recorded, it can be fetched in /proc/profile

There are various limits (or "bugs") on this mechanism at the moment:

- Only one process can be profiled at a time.
- We allocate 8MB for the samples, if you use more space, things will
  not work, and probably break a bit.
- Things will probably fall apart if the profiled process dies during
  profiling, or while extracing /proc/profile
2019-12-11 20:36:56 +01:00
Andreas Kling
e9dda8d592 Kernel: Give PTY's *actually* unique major ID's
Okay, one "dunce hat" point for me. The new PTY majors conflicted with
PATAChannel. Now they are 200 for master and 201 for slave, not used
by anything else.. I hope!
2019-12-09 21:03:39 +01:00
Andreas Kling
dbb644f20c Kernel: Start implementing purgeable memory support
It's now possible to get purgeable memory by using mmap(MAP_PURGEABLE).
Purgeable memory has a "volatile" flag that can be set using madvise():

- madvise(..., MADV_SET_VOLATILE)
- madvise(..., MADV_SET_NONVOLATILE)

When in the "volatile" state, the kernel may take away the underlying
physical memory pages at any time, without notifying the owner.
This gives you a guilt discount when caching very large things. :^)

Setting a purgeable region to non-volatile will return whether or not
the memory has been taken away by the kernel while being volatile.
Basically, if madvise(..., MADV_SET_NONVOLATILE) returns 1, that means
the memory was purged while volatile, and whatever was in that piece
of memory needs to be reconstructed before use.
2019-12-09 19:12:38 +01:00
Andreas Kling
6f4c380d95 AK: Use size_t for the length of strings
Using int was a mistake. This patch changes String, StringImpl,
StringView and StringBuilder to use size_t instead of int for lengths.
Obviously a lot of code needs to change as a result of this.
2019-12-09 17:51:21 +01:00
Andrew Kaster
9058962712 Kernel: Allow setting thread names
The main thread of each kernel/user process will take the name of
the process. Extra threads will get a fancy new name
"ProcessName[<tid>]".

Thread backtraces now list the thread name in addtion to tid.

Add the thread name to /proc/all (should it get its own proc
file?).

Add two new syscalls, set_thread_name and get_thread_name.
2019-12-08 14:09:29 +01:00
Andreas Kling
c5ff0da93c ProcFS: Fix typo in /proc/net/local 2019-12-07 09:40:30 +01:00
Andreas Kling
23e802518d Kernel: Add getsockopt(SO_PEERCRED) for local sockets
This sockopt gives you a struct with the PID, UID and GID of a socket's
peer process.
2019-12-06 18:38:36 +01:00
Andreas Kling
5a45376180 Kernel+SystemMonitor: Log amounts of I/O per thread
This patch adds these I/O counters to each thread:

- (Inode) file read bytes
- (Inode) file write bytes
- Unix socket read bytes
- Unix socket write bytes
- IPv4 socket read bytes
- IPv4 socket write bytes

These are then exposed in /proc/all and seen in SystemMonitor.
2019-12-01 17:40:27 +01:00
Andreas Kling
3ad0e6e198 Kernel: Show module memory size in /proc/modules
Note that this only shows the size of the loaded module sections,
and does not include any memory allocated *by* the module.
2019-11-29 21:18:37 +01:00
Andreas Kling
1f67894bdd Kernel: Add /proc/modules to enumerate the currently loaded modules 2019-11-28 21:12:02 +01:00
Andreas Kling
75ed262fe5 Kernel+ifconfig: Add an MTU value to NetworkAdapter
This defaults to 1500 for all adapters, but LoopbackAdapter increases
it to 65536 on construction.

If an IPv4 packet is larger than the MTU, we'll need to break it into
smaller fragments before transmitting it. This part is a FIXME. :^)
2019-11-28 14:14:26 +01:00
Andreas Kling
0adbacf59e Kernel: Demangle userspace ELF symbols in backtraces
Turns out we can use abi::__cxa_demangle() for this, and all we need to
provide is sprintf(), realloc() and free(), so this patch exposes them.

We now have fully demangled C++ backtraces :^)
2019-11-27 14:06:24 +01:00
Andreas Kling
5b8cf2ee23 Kernel: Make syscall counters and page fault counters per-thread
Now that we show individual threads in SystemMonitor and "top",
it's also very nice to have individual counters for the threads. :^)
2019-11-26 21:37:38 +01:00
Andreas Kling
712ae73581 Kernel: Expose per-thread information in /proc/all
Previously it was not possible to see what each thread in a process was
up to, or how much CPU it was consuming. This patch fixes that.

SystemMonitor and "top" now show threads instead of just processes.
"ps" is gonna need some more fixing, but it at least builds for now.

Fixes #66.
2019-11-26 21:37:30 +01:00
Sergey Bugaev
8aef0a0755 Kernel: Handle fstat() on sockets 2019-11-26 19:58:25 +01:00
Drew Stratford
ee0eed26f4 Ext2FileSystem: set_metadata_dirty(true) during write_directory().
This adds a call to set_metadata_dirty(true) to
Ext2FS::write_directory(). This fixes a bug wherein InodeWatchers
weren't alerted on directory updates.
2019-11-21 13:04:38 +01:00
Andreas Kling
8ccbd7002b Ext2FS: Rename allocate_inode() => find_a_free_inode()
Since this function doesn't actually mark the inode as allocated,
let's tone down the name a little bit.
2019-11-17 19:19:02 +01:00
Andreas Kling
a712d4ac0c Ext2FS: Writing to a slow symlink should not treat it like a fast one
We would misinterpret short writes to the first 60 bytes of a slow
symlink as writes to a fast symlink.
2019-11-17 19:19:00 +01:00
Andreas Kling
5d08665d9a Ext2FS: Remove unnecessary extra cache lookup in get_inode() 2019-11-17 19:11:19 +01:00
Andreas Kling
ba997c0a72 Ext2FS: Add some FIXME's while browsing this code 2019-11-17 18:59:14 +01:00
Andreas Kling
02f89dc419 Kernel+SystemMonitor: Show VM region "shared" and "stack" bits in UI
Expose these two region bits through /proc/PID/vm and show them in the
SystemMonitor process memory map view.
2019-11-17 12:15:46 +01:00
Andreas Kling
8d4d63d9b6 Ext2FS: Minor cleanup, remove an unused function 2019-11-16 17:29:09 +01:00
Andreas Kling
06a80bcf69 Kernel+SystemMonitor: Publish can_read/write state for open files
The can_read() and can_write() states for file descriptions are now
published in /proc, allowing SystemMonitor to display it.
2019-11-09 22:42:19 +01:00
Andreas Kling
083c5f8b89 Kernel: Rework Process::Priority into ThreadPriority
Scheduling priority is now set at the thread level instead of at the
process level.

This is a step towards allowing processes to set different priorities
for threads. There's no userspace API for that yet, since only the main
thread's priority is affected by sched_setparam().
2019-11-06 16:30:06 +01:00
Andreas Kling
1c8f017730 Kernel: Remove unused SynthFS filesystem
This used to be the base class of ProcFS and DevPtsFS but not anymore.
2019-11-06 11:36:24 +01:00
Andreas Kling
59ed235c85 Kernel: Implement O_DIRECT open() flag to bypass disk caches
Files opened with O_DIRECT will now bypass the disk cache in read/write
operations (though metadata operations will still hit the disk cache.)

This will allow us to test actual disk performance instead of testing
disk *cache* performance, if that's what we want. :^)

There's room for improvment here, we're very aggressively flushing any
dirty cache entries for the specific block before reading/writing that
block. This is done by walking the entire cache, which may be slow.
2019-11-05 19:35:12 +01:00
Andreas Kling
f215f0c226 ProcFS: Fix Clang build (or really, Qt Creator syntax highlighting)
The Clang parser used by Qt Creator kept getting confused by this code.
2019-11-04 18:14:04 +01:00
Andreas Kling
721585473b Ext2FS: Don't uncache inodes while they are being watched
If an inode is observed by watch_file(), we won't uncache it.
This allows a program to watch a file without keeping it open.
2019-11-04 17:25:44 +01:00
Andreas Kling
1b2ef8582c Kernel: Make File's can_read/can_write take a const FileDescription&
Asking a File if we could possibly read or write it will never mutate
the asking FileDescription&, so it should be const.
2019-11-04 14:03:14 +01:00
Andreas Kling
e8fee92357 Kernel: Don't update fd offset on read/write error
If something goes wrong with a read or write operation, we don't want
to add the error number to the fd's offset. :^)
2019-11-04 13:58:28 +01:00
Andreas Kling
c538648465 Ext2FS: Uncache unused Inodes after flushing contents to disk
Don't keep Inodes around in memory forever after we've interacted with
them once. This is a slight performance pessimization when accessing
the same file repeatedly, but closing it for a while in between.

Longer term we should find a way to keep a limited number of unused
Inodes cached, whichever ones we think are likely to be used again.
2019-11-04 12:23:19 +01:00
Alexander
9e03f3ce20 ProcFS: Identify virtual filesystems' device in df (#728) 2019-11-03 20:58:10 +01:00
Andreas Kling
7e56cf1800 Ext2FS: Lock the filesystem during initialization and during sync
If we get preempted during initialization, we really don't want to try
doing a sync on a partially-initialized filesystem.
2019-11-03 13:56:55 +01:00
Andreas Kling
c1d3ac7108 Ext2FS: Fix unpopulated block list cache after mkdir()
When creating a new directory, we set the initial size to 1 block.
This meant that we were allocating a block up front, but the Inode's
internal block list cache was not populated with this block.

This broke write_bytes() on a new directory, since it assumed that
the block list cache would be up to date if the call to write_bytes()
would not change the directory's size.

This patch fixes the issue in two ways: First, we cache the initial
block list created for new directories.
Second, we now repopulate the block list cache in write_bytes() if it
is empty when we get there. This is basically just a safety fallback
to avoid having this kind of bug in the future.
2019-11-03 10:22:09 +01:00
Andreas Kling
dcf8d359f3 Kernel: Fick infinite recursion when filling up disk cache
We can't be calling the virtual FS::flush_writes() in order to flush
the disk cache from within the disk cache, since an FS subclass may
try to do cache stuff in its flush_writes() implementation.

Instead, separate out the implementation of DiskBackedFS's flushing
logic into a flush_writes_impl() and call that from the cache code.
2019-11-03 00:21:17 +01:00
Andreas Kling
1e36d899f1 Ext2FS: Use KBuffers for the cached bitmap blocks
Also cache the block group descriptor table in a KBuffer on file system
initialization, instead of on first access.

This reduces pressure on the kmalloc heap somewhat.
2019-11-03 00:10:24 +01:00
Andreas Kling
94a6b248ca Ext2FS: Resizing an Inode to its current size should do nothing
We were writing out the full block list whenever Ext2FSInode::resize()
was called, even if the old and new sizes were identical.

This patch makes it a no-op, which drastically improves "cp" speed
since we now take full advantage of the up-front call to ftruncate().
2019-11-02 16:22:29 +01:00
Andreas Kling
5835569527 Ext2FS: Inode resizing should fail with ENOSPC if we lack blocks
If there are not enough free blocks in the filesystem to accomodate
growing an Inode, we should fail with ENOSPC before even starting to
allocate blocks.
2019-11-02 12:53:31 +01:00
Andreas Kling
2ad2210eb4 Ext2FS: Use the bitmap block caching for Inode bitmaps as well
Nothing really shows up on disk_benchmark for this change, but it is
obviously sensible to use the same mechanism here.
2019-11-02 12:34:18 +01:00
Andreas Kling
e52b7eeccc Ext2FS: Rename get_block_bitmap() => get_bitmap_block()
This can be used for Inode bitmaps as well, so let's rename it.
2019-11-02 12:26:20 +01:00
Andreas Kling
1ae9d85de9 Ext2FS: Cache block bitmaps instead of always reading/writing disk
Add a simple cache to Ext2FS where we keep block bitmaps along with a
dirty bit. This allows us to coalesce bitmap flushes, giving us a nice
~3x improvement in disk_benchmark write speeds.
2019-11-02 12:18:43 +01:00
Andreas Kling
3a8b5b405c Ext2FS: Tidy up code related to the Ext2 super block a bit
Store the cached super block as an ext2_super_block member instead of
caching it in a ByteBuffer and using a casting helper everywhere.

This patch also combines reading/writing of the super block into a
single disk device operation (instead of two.)
2019-11-02 11:49:11 +01:00
Andreas Kling
e4b7786b66 Ext2FS: Flush the super block and block group descriptors lazily
Keep dirty bits for these and override flush_writes() so that we can
coalesce writes. Looks like a ~5x write performance bump on the
disk_benchmark.
2019-11-02 11:36:56 +01:00
Andreas Kling
014f8ca8c4 AK: Allow JsonValue to store 64-bit integers internally
Add dedicated internal types for Int64 and UnsignedInt64. This makes it
a bit more straightforward to work with 64-bit numbers (instead of just
implicitly storing them as doubles.)
2019-10-29 16:36:50 +01:00
Andreas Kling
558c63a6f9 Kernel: FileDescription::is_directory() should not assert !is_fifo()
I have no idea why this was here. It makes no sense. If you're trying
to find out if something is a directory, why wouldn't you be allowed to
ask that about a FIFO? :^)

Thanks to Brandon for spotting this!

Also, while we're here, cache the directory state in a bool member so
we don't have to keep fetching inode metadata when checking this
repeatedly. This is important since sys$read() now calls it.
2019-10-25 09:23:38 +02:00
Drew Stratford
3014fdf3bd ProcFS: make procfs$pid_fds always returns a valid JSON array.
Previously, procfs$pid_fds would return nothing when called
for a process that had either no open files or a non-existent
handle. This could cause problems when a userspace program
expected a valid Json response.

Procfs$pid_fs now returns an empty array in the aforementioned
cases.
2019-10-23 07:45:13 +02:00
Drew Stratford
d063734f69 ProcFS: Check for empty Optional in read_bytes()
In read_bytes() we now check that the Optional "data" actually
contains a value before use, avoiding a failing asserting in
kernel space.
2019-10-23 07:45:13 +02:00
Andreas Kling
0782c60fe5 Kernel: Update the mtime after a successful InodeFile::write()
Well this was pretty silly. We were not updating the modification time
of files.. after modifying them. :^)
2019-10-22 22:23:58 +02:00
Andreas Kling
ec65b8db2e Revert "Kernel: Make DoubleBuffer use a KBuffer instead of kmalloc()ing"
This reverts commit 1cca5142af.

This appears to be causing intermittent triple-faults and I don't know
why yet, so I'll just revert it to keep the tree in decent shape.
2019-10-18 15:58:06 +02:00
Andreas Kling
1cca5142af Kernel: Make DoubleBuffer use a KBuffer instead of kmalloc()ing
Background: DoubleBuffer is a handy buffer class in the kernel that
allows you to keep writing to it from the "outside" while the "inside"
reads from it. It's used for things like LocalSocket and PTY's.
Internally, it has a read buffer and a write buffer, but the two will
swap places when the read buffer is exhausted (by reading from it.)

Before this patch, it was internally implemented as two Vector<u8>
that we would swap between when the reader side had exhausted the data
in the read buffer. Now instead we preallocate a large KBuffer (64KB*2)
on DoubleBuffer construction and use that throughout its lifetime.

This removes all the kmalloc heap traffic caused by DoubleBuffers :^)
2019-10-18 14:55:04 +02:00
Jesse Buhagiar
06b6af61c6 Kernel: Made DiskCache entries a KBuffer
The Cache entries found in `DiskBackedFileSystem` are now stored in a
`KBuffer` object, instead of relying on `kmalloc_eternal`. The number
of entries was exceeding that of the number of bytes allocated to
`kmalloc_eternal`, which in turn caused `mount()` to fail epically
when called.
2019-10-08 11:10:30 +02:00
Andreas Kling
35138437ef Kernel+SystemMonitor: Add fault counters
This patch adds three separate per-process fault counters:

- Inode faults

    An inode fault happens when we've memory-mapped a file from disk
    and we end up having to load 1 page (4KB) of the file into memory.

- Zero faults

    Memory returned by mmap() is lazily zeroed out. Every time we have
    to zero out 1 page, we count a zero fault.

- CoW faults

    VM objects can be shared by multiple mappings that make their own
    unique copy iff they want to modify it. The typical reason here is
    memory shared between a parent and child process.
2019-10-02 14:13:49 +02:00
Andreas Kling
5ab044beae Ext2FS: Make Ext2FSInode::is_directory() fast
This patch overloads Inode::is_directory() with a faster version that
doesn't require instantiating the whole InodeMetadata.

If you have an Ext2FSInode&, calling is_directory() should be instant
since we can just look directly at the raw inode bits.
2019-10-02 13:47:44 +02:00
Andreas Kling
8e775d241e Kernel: Make DiskBackedFS flush writes if cache is completely dirty
If we want to make a new entry in the disk cache when it's completely
full of dirty blocks, we'll now synchronously flush the writes at that
point. Maybe it's not ideal, but at least we can keep going.
2019-09-30 11:46:56 +02:00
Andreas Kling
922fd703c9 Kernel: Convert the DiskBackedFS write API to take "const u8*"
This way clients are not required to have instantiated ByteBuffers
and can choose whatever memory scheme works best for them.

Also converted some of the Ext2FS code to use stack buffers instead.
2019-09-30 11:23:36 +02:00
Andreas Kling
1fc2612667 Kernel: Make DiskBackedFS::read_block() write to client-provided memory
Instead of having DiskBackedFS allocate a ByteBuffer, leave it to each
client to provide buffer space.

This is significantly faster in many cases where we can use a stack
buffer and avoid heap allocation entirely.
2019-09-30 11:04:30 +02:00
Andreas Kling
a61f6ccc27 Kernel: Implement a simpler, bigger cache for DiskBackedFS
The hashmap cache was ridiculously slow and hurt us more than it helped
us. This patch replaces it with a flat memory cache that keeps up to
10'000 blocks in cache with a simple dirty bit.

The syncd task will wake up periodically and call flush_writes() on all
file systems, which now causes us to traverse the cache and write all
dirty blocks to disk.

There's a ton of room for improvement here, but this itself is already
drastically better when doing repeated GCC invocations.
2019-09-30 10:34:50 +02:00
Andreas Kling
8f45a259fc ByteBuffer: Remove pointer() in favor of data()
We had two ways to get the data inside a ByteBuffer. That was silly.
2019-09-30 08:57:01 +02:00
Sergey Bugaev
9a41dda029 Kernel: Expose blocking and cloexec fd flags in ProcFS 2019-09-28 22:27:45 +02:00
Sergey Bugaev
3652bec746 Kernel: Make proper use of the new keep_empty argument 2019-09-28 18:29:42 +02:00
Conrad Pankoff
062218b4cf Kernel: Support writing doubly-indirect ext2 blocks 2019-09-28 09:29:10 +02:00
Andreas Kling
e364846622 Ext2FS: Don't allocate blocks until we're committed to a new inode
Otherwise we would leak the blocks in case we returned earlier than
expected. :^)
2019-09-22 18:50:24 +02:00
Andreas Kling
bd7c38e7b6 Ext2FS: Oops, fix wrong ENOSPC in create_inode() 2019-09-22 18:44:05 +02:00
Andreas Kling
bff59eff4b Ext2FS: Fix two bugs in block allocation:
1) Off-by-one in block allocation when block size != 1 KB

Due to a quirk in the on-disk layout of ext2, file systems with a block
size of 1 KB have block #1 as their first block, while all others start
on block #0.

2) We had no fallback mechanism when the preferred group was full

We now allocate blocks from the preferred block group as long as it's
possible, and fall back to a simple scan through all groups when the
preferred one is full.
2019-09-22 18:37:17 +02:00
Sergey Bugaev
e9dd94063f Kernel: Do not panic on fstat(fifo)
Instead, let's return an empty buffer with st_mode indicating it's a fifo.
2019-09-17 21:56:42 +02:00
Andreas Kling
5d491fa1cd Kernel: Add a simple slab allocator for small allocations
This is a freelist allocator with static size classes that works as a
complement to the generic kmalloc(). It's a lot faster than kmalloc()
since allocation just means popping from the freelist.

It's also significantly more compact when there are a lot of objects
smaller than the minimum kmalloc chunk size (32 bytes.)

This patch enables it for the Region and PhysicalPage classes.
In the PhysicalPage (8 bytes) case, it's a huge improvement since we
no longer waste 75% of the storage allocated.

There are also a number of ways this can be improved, so let's keep
working on it going forward.
2019-09-16 10:33:27 +02:00
Andreas Kling
1c692e87a6 Kernel: Move kmalloc() into a Kernel/Heap/ directory 2019-09-16 09:01:44 +02:00
Andreas Kling
b9be6b7bb4 Ext2FS: Trying to create a too-long directory entry should ENAMETOOLONG
Also added some assertions to DirectoryEntry in case someone tries to
instantiate them with names that would overflow the name buffer.

DirectoryEntry is a crappy data structure, and the name buffer is also
crappy. Added a FIXME about replacing it with something nicer.

Before this patch, the DirectoryEntry::name buffer would overflow if
you did "touch extremely-long-file-name". Duh.

Fixes #538.
2019-09-10 21:04:27 +02:00
Andreas Kling
33e6cb8b80 Kernel: Remove spammy logging about absolute_path() on non-custodies 2019-09-08 09:37:28 +02:00
Andreas Kling
73fdbba59c AK: Rename <AK/AKString.h> to <AK/String.h>
This was a workaround to be able to build on case-insensitive file
systems where it might get confused about <string.h> vs <String.h>.

Let's just not support building that way, so String.h can have an
objectively nicer name. :^)
2019-09-06 15:36:54 +02:00
Andreas Kling
e25ade7579 Kernel: Rename "vmo" to "vmobject" everywhere 2019-09-04 11:27:14 +02:00
Conrad Pankoff
41d113713d ProcFS: Expose ARP table 2019-09-02 09:42:27 +02:00
Conrad Pankoff
682fe48222 Kernel: Show netmask/gateway in ProcFS when available 2019-08-29 06:25:06 +02:00
Sergey Bugaev
8ea25ca3b0 ProcFS: Port JSON generation to streaming serializers
This way we don't allocate giant JSON objects on the kernel heap first.
Hopefully fixes https://github.com/SerenityOS/serenity/issues/484
2019-08-27 14:56:09 +02:00
Rok Povsic
eb9ccf1c0a FileSystem: Add FIXME about resolve_path bug 2019-08-25 19:47:37 +02:00
Andreas Kling
ac7a559d96 Ext2FS: Avoid a String allocation in lookup()
By using find() with a custom finder, we can avoid creating a temporary
key value that's only used for the hash lookup.
2019-08-25 06:45:37 +02:00
Andreas Kling
b020a5e7ce Kernel: Don't create a String every time we look up a Custody by name 2019-08-25 06:45:14 +02:00
Andreas Kling
952baf32cd TmpFS: Notify any associated InodeVMObject on inode changes 2019-08-24 19:59:01 +02:00
Andreas Kling
5978993f00 TmpFS: Fix two bugs that broke GCC inside Serenity
- TmpFSInode::write_bytes() needs to allow non-zero offsets
- TmpFSInode::read_bytes() wasn't respecting the offset

GCC puts the temporary files generated during compilation in /tmp,
so this exposed some bugs in TmpFS.
2019-08-24 19:58:09 +02:00
Conrad Pankoff
286bafbb19 Kernel: Implement link status in /proc/net/adapters 2019-08-21 17:10:34 +02:00
Sergey Bugaev
0f8b45c015 ProcFS: Expose info about devices in /proc/devices 2019-08-18 15:59:59 +02:00
Sergey Bugaev
acccf9ccda Kernel: Move device lookup to Device class itself
Previously, VFS stored a list of all devices, and devices had to
register and unregister themselves with it. This cleans up things
a bit.
2019-08-18 15:59:59 +02:00
Andreas Kling
5f6b6c1665 Kernel: Do the umount() by the guest's root inode identifier
It was previously possible to unmount a filesystem mounted on /mnt by
doing e.g "umount /mnt/some/path".
2019-08-17 14:28:13 +02:00