Commit Graph

113 Commits

Author SHA1 Message Date
Liav A
5efb91ec06 Kernel/VFS: Ensure working with mount entry per a custody is safe
Previously we could get a raw pointer to a Mount object which might be
invalid when actually dereferencing it.
To ensure this could not happen, we should just use a callback that will
be used immediately after finding the appropriate Mount entry, while
holding the mount table lock.
2023-08-05 18:41:01 +02:00
Liav A
d216f780a4 Kernel/VFS: Remove the find_mount_for_guest method
We don't really need this method anymore, because we could just try to
find the mount entry based on the given mount point host custody.

This also allows us to remove the is_vfs_root and root_inode_id methods
from the VirtualFileSystem class.
2023-08-05 18:41:01 +02:00
Liav A
e5c7662638 Kernel/VFS: Check matching absolute path when jump to mount guest inode
We could easily encounter a case where we do the following:

```
mkdir -p /tmp2
mount /dev/hda /tmp2
```

would produce a bug that doing `ls /tmp2/tmp2` will give the contents
on `/dev/hda` ext2 root directory and also on `/tmp2/tmp2/tmp2` and so
on.
To prevent this, we must compare the current custody against each mount
entry's custody to ensure their paths match.
2023-08-05 18:41:01 +02:00
Liav A
debbfe07fb Kernel/VFS: Ensure Custodies' absolute path don't match before mounting
This ensures that the host mount point custody path is not the same like
the new to-be-mounted custody.

A scenario that could happen before adding this check is:
```
mkdir -p /tmp2
mount /dev/hda /tmp2/
mount /dev/hda /tmp2/
mount /dev/hda /tmp2/ # this will fail here
```

and after adding this check, the following scenario is now this:
```
mkdir -p /tmp2
mount /dev/hda /tmp2/
mount /dev/hda /tmp2/ # this will fail here
mount /dev/hda /tmp2/ # this will fail here too
```
2023-08-05 18:41:01 +02:00
kleines Filmröllchen
8940552d1d Kernel/VirtualFileSystem: Allow unmounting via inode and mount path
This pair of information uniquely identifies any mount point, and it can
be used in situations where mount point custodies are not available.
2023-07-15 00:12:01 +02:00
Liav A
23a7ccf607 Kernel+LibCore+LibC: Split the mount syscall into multiple syscalls
This is a preparation before we can create a usable mechanism to use
filesystem-specific mount flags.
To keep some compatibility with userland code, LibC and LibCore mount
functions are kept being usable, but now instead of doing an "atomic"
syscall, they do multiple syscalls to perform the complete procedure of
mounting a filesystem.

The FileBackedFileSystem IntrusiveList in the VFS code is now changed to
be protected by a Mutex, because when we mount a new filesystem, we need
to check if a filesystem is already created for a given source_fd so we
do a scan for that OpenFileDescription in that list. If we fail to find
an already-created filesystem we create a new one and register it in the
list if we successfully mounted it. We use a Mutex because we might need
to initiate disk access during the filesystem creation, which will take
other mutexes in other parts of the kernel, therefore making it not
possible to take a spinlock while doing this.
2023-07-02 01:04:51 +02:00
Liav A
cbf78975f1 Kernel: Add the futimens syscall
We have a problem with the original utimensat syscall because when we
do call LibC futimens function, internally we provide an empty path,
and the Kernel get_syscall_path_argument method will detect this as an
invalid path.

This happens to spit an error for example in the touch utility, so if a
user is running "touch non_existing_file", it will create that file, but
the user will still see an error coming from LibC futimens function.

This new syscall gets an open file description and it provides the same
functionality as utimensat, on the specified open file description.
The new syscall will be used later by LibC to properly implement LibC
futimens function so the situation described with relation to the
"touch" utility could be fixed.
2023-04-10 10:21:28 +02:00
Andreas Kling
673592dea8 Kernel: Stop using *LockRefPtr for FileSystem pointers
There was only one permanent storage location for these: as a member
in the Mount class.

That member is never modified after Mount initialization, so we don't
need to worry about races there.
2023-04-04 10:33:42 +02:00
Andreas Kling
e6fc7b3ff7 Kernel: Switch LockRefPtr<Inode> to RefPtr<Inode>
The main place where this is a little iffy is in RAMFS where inodes
have a LockWeakPtr to their parent inode. I've left that as a
LockWeakPtr for now.
2023-03-09 21:54:59 +01:00
Andreas Kling
d1371d66f7 Kernel: Use non-locking {Nonnull,}RefPtr for OpenFileDescription
This patch switches away from {Nonnull,}LockRefPtr to the non-locking
smart pointers throughout the kernel.

I've looked at the handful of places where these were being persisted
and I don't see any race situations.

Note that the process file descriptor table (Process::m_fds) was already
guarded via MutexProtected.
2023-03-07 00:30:12 +01:00
Andreas Kling
21db2b7b90 Everywhere: Remove NonnullOwnPtr.h includes 2023-03-06 23:46:35 +01:00
Liav A
39de5b7f82 Kernel: Actually check Process unveil data when creating perfcore dump
Before of this patch, we looked at the unveil data of the FinalizerTask,
which naturally doesn't have any unveil restrictions, therefore allowing
an unveil bypass for a process that enabled performance coredumps.

To ensure we always check the dumped process unveil data, an option to
pass a Process& has been added to a couple of methods in the class of
VirtualFileSystem.
2023-03-05 15:15:55 +00:00
kleines Filmröllchen
a6a439243f Kernel: Turn lock ranks into template parameters
This step would ideally not have been necessary (increases amount of
refactoring and templates necessary, which in turn increases build
times), but it gives us a couple of nice properties:
- SpinlockProtected inside Singleton (a very common combination) can now
  obtain any lock rank just via the template parameter. It was not
  previously possible to do this with SingletonInstanceCreator magic.
- SpinlockProtected's lock rank is now mandatory; this is the majority
  of cases and allows us to see where we're still missing proper ranks.
- The type already informs us what lock rank a lock has, which aids code
  readability and (possibly, if gdb cooperates) lock mismatch debugging.
- The rank of a lock can no longer be dynamic, which is not something we
  wanted in the first place (or made use of). Locks randomly changing
  their rank sounds like a disaster waiting to happen.
- In some places, we might be able to statically check that locks are
  taken in the right order (with the right lock rank checking
  implementation) as rank information is fully statically known.

This refactoring even more exposes the fact that Mutex has no lock rank
capabilites, which is not fixed here.
2023-01-02 18:15:27 -05:00
sin-ack
2a502fe232 Kernel+LibC+LibCore+UserspaceEmulator: Implement faccessat(2)
Co-Authored-By: Daniel Bertalan <dani@danielbertalan.dev>
2022-12-11 19:55:37 -07:00
sin-ack
d5fbdf1866 Kernel+LibC+LibCore: Implement renameat(2)
Now with the ability to specify different bases for the old and new
paths.
2022-12-11 19:55:37 -07:00
Liav A
aa9fab9c3a Kernel/FileSystem: Convert the mount table from Vector to IntrusiveList
The fact that we used a Vector meant that even if creating a Mount
object succeeded, we were still at a risk that appending to the actual
mounts Vector could fail due to OOM condition. To guard against this,
the mount table is now an IntrusiveList, which always means that when
allocation of a Mount object succeeded, then inserting that object to
the list will succeed, which allows us to fail early in case of OOM
condition.
2022-12-09 23:29:33 -07:00
Liav A
fea3cb5ff9 Kernel/FileSystem: Discard safely filesystems when unmounted last time
This commit reached that goal of "safely discarding" a filesystem by
doing the following:
1. Stop using the s_file_system_map HashMap as it was an unsafe measure
to access pointers of FileSystems. Instead, make sure to register all
FileSystems at the VFS layer, with an IntrusiveList, to avoid problems
related to OOM conditions.
2. Make sure to cleanly remove the DiskCache object from a BlockBased
filesystem, so the destructor of such object will not need to do that in
the destruction point.
3. For ext2 filesystems, don't cache the root inode at m_inode_cache
HashMap. The reason for this is that when unmounting an ext2 filesystem,
we lookup at the cache to see if there's a reference to a cached inode
and if that's the case, we fail with EBUSY. If we keep the m_root_inode
also being referenced at the m_inode_cache map, we have 2 references to
that object, which will lead to fail with EBUSY. Also, it's much simpler
to always ask for a root inode and get it immediately from m_root_inode,
instead of looking up the cache for that inode.
2022-10-22 16:57:52 -04:00
Liav A
0fd7b688af Kernel: Introduce support for using FileSystem object in multiple mounts
The idea is to enable mounting FileSystem objects across multiple mounts
in contrast to what happened until now - each mount has its own unique
FileSystem object being attached to it.

Considering a situation of mounting a block device at 2 different mount
points at in system, there were a couple of critical flaws due to how
the previous "design" worked:
1. BlockBasedFileSystem(s) that pointed to the same actual device had a
separate DiskCache object being attached to them. Because both instances
were not synchronized by any means, corruption of the filesystem is most
likely achieveable by a simple cache flush of either of the instances.
2. For superblock-oriented filesystems (such as the ext2 filesystem),
lack of synchronization between both instances can lead to severe
corruption in the superblock, which could render the entire filesystem
unusable.
3. Flags of a specific filesystem implementation (for example, with xfs
on Linux, one can instruct to mount it with the discard option) must be
honored across multiple mounts, to ensure expected behavior against a
particular filesystem.

This patch put the foundations to start fix the issues mentioned above.
However, there are still major issues to solve, so this is only a start.
2022-10-22 16:57:52 -04:00
Andreas Kling
c3351d4b9f Kernel: Make VirtualFileSystem functions take credentials as input
Instead of getting credentials from Process::current(), we now require
that they be provided as input to the various VFS functions.

This ensures that an atomic set of credentials is used throughout an
entire VFS operation.
2022-08-21 16:02:24 +02:00
Andreas Kling
728c3fbd14 Kernel: Use RefPtr instead of LockRefPtr for Custody
By protecting all the RefPtr<Custody> objects that may be accessed from
multiple threads at the same time (with spinlocks), we remove the need
for using LockRefPtr<Custody> (which is basically a RefPtr with a
built-in spinlock.)
2022-08-21 12:25:14 +02:00
Andreas Kling
11eee67b85 Kernel: Make self-contained locking smart pointers their own classes
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:

- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable

This patch renames the Kernel classes so that they can coexist with
the original AK classes:

- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable

The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
2022-08-20 17:20:43 +02:00
kleines Filmröllchen
4314c25cf2 Kernel: Require lock rank for Spinlock construction
All users which relied on the default constructor use a None lock rank
for now. This will make it easier to in the future remove LockRank and
actually annotate the ranks by searching for None.
2022-08-19 20:26:47 -07:00
Kristiyan Stoimenov
9e1bea50ee Kernel/VFS: Check that mount-point is not in use
Before committing the mount, verify that this host i-node is not already
a mount-point.
2022-08-12 19:57:18 -07:00
Ariel Don
9a6bd85924 Kernel+LibC+VFS: Implement utimensat(3)
Create POSIX utimensat() library call and corresponding system call to
update file access and modification times.
2022-05-21 18:15:00 +02:00
Idan Horowitz
086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Idan Horowitz
feb00b7105 Everywhere: Make JSON serialization fallible
This allows us to eliminate a major source of infallible allocation in
the Kernel, as well as lay down the groundwork for OOM fallibility in
userland.
2022-02-27 20:37:57 +01:00
Andreas Kling
210689281f Kernel: Protect mounted filesystem list with spinlock instead of mutex 2022-02-03 16:11:26 +01:00
Daniel Bertalan
182016d7c0 Kernel+LibC+LibCore+UE: Implement fchmodat(2)
This function is an extended version of `chmod(2)` that lets one control
whether to dereference symlinks, and specify a file descriptor to a
directory that will be used as the base for relative paths.
2022-01-12 14:54:12 +01:00
circl
63760603f3 Kernel+LibC+LibCore: Add lchown and fchownat functions
This modifies sys$chown to allow specifying whether or not to follow
symlinks and in which directory.

This was then used to implement lchown and fchownat in LibC and LibCore.
2022-01-01 15:08:49 +01:00
Andreas Kling
ac7ce12123 Kernel: Remove the kmalloc_eternal heap :^)
This was a premature optimization from the early days of SerenityOS.
The eternal heap was a simple bump pointer allocator over a static
byte array. My original idea was to avoid heap fragmentation and improve
data locality, but both ideas were rooted in cargo culting, not data.

We would reserve 4 MiB at boot and only ended up using ~256 KiB, wasting
the rest.

This patch replaces all kmalloc_eternal() usage by regular kmalloc().
2021-12-28 21:02:38 +01:00
Martin Bříza
f75bab2a25 Kernel: Move symlink recursion limit to .h, increase it to 8
As pointed out by BertalanD on Discord, POSIX specifies that
_SC_SYMLOOP_MAX (implemented in the following commit) always needs to be
equal or more than _POSIX_SYMLOOP_MAX (8, defined in
LibC/bits/posix1_lim.h), hence I've increased it to that value to
comply with the standard.

The move to header is required for the following commit - to make this
constant accessible outside of the VFS class, namely in sysconf.
2021-12-21 12:54:11 -08:00
Hendiadyoin1
e34eb3e36d Kernel: Remove unused String.h includes
This makes searching for not yet OOM safe interfaces a bit easier.
2021-12-11 13:15:26 -08:00
Andreas Kling
5ce753b74d Kernel: Make Inode::traverse_as_directory() callback return ErrorOr
This allows us to propagate errors from inside the callback with TRY().
2021-11-10 21:58:58 +01:00
Andreas Kling
79fa9765ca Kernel: Replace KResult and KResultOr<T> with Error and ErrorOr<T>
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.

Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
2021-11-08 01:10:53 +01:00
Ben Wiederhake
88ca12f037 Kernel: Enable early-returns from VFS::for_each_mount 2021-10-31 12:06:28 +01:00
Andreas Kling
4a9c18afb9 Kernel: Rename FileDescription => OpenFileDescription
Dr. POSIX really calls these "open file description", not just
"file description", so let's call them exactly that. :^)
2021-09-07 13:53:14 +02:00
Andreas Kling
71187d865e Kernel: Tidy up VirtualFileSystem::mount_root() a little bit
- Return KResult instead of bool
- Use TRY()
2021-09-05 14:46:44 +02:00
sin-ack
566c5d1e99 AK+Kernel: Move KResult.h to Kernel/API for userspace access
This commit moves the KResult and KResultOr objects to Kernel/API to
signify that they may now be freely used by userspace code at points
where a syscall-related error result is to be expected. It also exposes
KResult and KResultOr to the global namespace to make it nicer to use
for userspace code.
2021-09-05 12:54:48 +02:00
Andreas Kling
ae197deb6b Kernel: Strongly typed user & group ID's
Prior to this change, both uid_t and gid_t were typedef'ed to `u32`.
This made it easy to use them interchangeably. Let's not allow that.

This patch adds UserID and GroupID using the AK::DistinctNumeric
mechanism we've already been employing for pid_t/ProcessID.
2021-08-29 01:09:19 +02:00
Andreas Kling
c2fc33becd Kernel: Rename ProtectedValue<T> => MutexProtected<T>
Let's make it obvious what we're protecting it with.
2021-08-22 03:34:09 +02:00
Andreas Kling
29a58459ab Kernel: Use ProtectedValue for VirtualFileSystem::m_mounts
This is what VirtualFileSystem::m_lock was actually guarding, and
wrapping it in a ProtectedValue makes it so much more obvious how it
all works. :^)
2021-08-16 01:41:26 +02:00
Andreas Kling
dcb015fa7e Kernel: Move VFS-internal O_FOO definitions to VirtualFileSystem.h 2021-08-14 19:58:11 +02:00
Andreas Kling
44da58c0b2 Kernel: Move UnveilNode.h into Kernel/FileSystem/ 2021-08-06 14:11:45 +02:00
Andreas Kling
cee9528168 Kernel: Rename Lock to Mutex
Let's be explicit about what kind of lock this is meant to be.
2021-07-17 21:10:32 +02:00
Andreas Kling
98080497d2 Kernel: Use Forward.h headers more 2021-07-11 14:14:51 +02:00
Andreas Kling
4238e2e9be Kernel: Only allow looking up Mounts by InodeIdentifier
Let's simplify the interface by not allowing lookup by Inode&.
2021-07-11 00:51:06 +02:00
Andreas Kling
6a27de2d94 Kernel: Make VirtualFileSystem::Mount a top-level class
And move it to its own compilation unit.
2021-07-11 00:51:06 +02:00
Andreas Kling
07c4c89297 Kernel: Make VirtualFileSystem::sync() static 2021-07-11 00:26:17 +02:00
Andreas Kling
0d39bd04d3 Kernel: Rename VFS => VirtualFileSystem 2021-07-11 00:25:24 +02:00
Andreas Kling
d53d9d3677 Kernel: Rename FS => FileSystem
This matches our common naming style better.
2021-07-11 00:20:38 +02:00