- Renamed try_create_absolute_path() => try_serialize_absolute_path()
- Use KResultOr and TRY() to propagate errors
- Don't call this when it's only for debug logging
- Use KResultOr and TRY() to propagate errors
- Check for OOM errors
- Move allocation out of constructors
There's still a lot more to do here, as SysFS is still quite brittle
in the face of memory pressure.
The default template argument is only used in one place, and it
looks like it was probably just an oversight. The rest of the Kernel
code all uses u8 as the type. So lets make that the default and remove
the unused template argument, as there doesn't seem to be a reason to
allow the size to be customizable.
This commit moves the KResult and KResultOr objects to Kernel/API to
signify that they may now be freely used by userspace code at points
where a syscall-related error result is to be expected. It also exposes
KResult and KResultOr to the global namespace to make it nicer to use
for userspace code.
Like with the ProcFS, description data can change at anytime, so it's
wise to ensure that when the userland reads from an Inode, data is
consistent unless the userland indicated it wants to refresh the data
(by seeking to offset 0, or re-attaching the Inode).
Otherwise, if the data changes in the middle of the reading, it can
cause silent corruption in output which can lead to random crashes.
Our existing implementation did not check the element type of the other
pointer in the constructors and move assignment operators. This meant
that some operations that would require explicit casting on raw pointers
were done implicitly, such as:
- downcasting a base class to a derived class (e.g. `Kernel::Inode` =>
`Kernel::ProcFSDirectoryInode` in Kernel/ProcFS.cpp),
- casting to an unrelated type (e.g. `Promise<bool>` => `Promise<Empty>`
in LibIMAP/Client.cpp)
This, of course, allows gross violations of the type system, and makes
the need to type-check less obvious before downcasting. Luckily, while
adding the `static_ptr_cast`s, only two truly incorrect usages were
found; in the other instances, our casts just needed to be made
explicit.
Prior to this change, both uid_t and gid_t were typedef'ed to `u32`.
This made it easy to use them interchangeably. Let's not allow that.
This patch adds UserID and GroupID using the AK::DistinctNumeric
mechanism we've already been employing for pid_t/ProcessID.
Instead of registering with blocker sets and whatnot in the various
Blocker subclass constructors, this patch moves such initialization
to a separate setup_blocker() virtual.
setup_blocker() returns false if there's no need to actually block
the thread. This allows us to bail earlier in Thread::block().
Namely, will_unblock_immediately_without_blocking(Reason).
This virtual function is called on a blocker *before any block occurs*,
if it turns out that we don't need to block the thread after all.
This can happens for one of two reasons:
- UnblockImmediatelyReason::UnblockConditionAlreadyMet
We don't need to block the thread because the condition for
unblocking it is already met.
- UnblockImmediatelyReason::TimeoutInThePast
We don't need to block the thread because a timeout was specified
and that timeout is already in the past.
This patch does not introduce any behavior changes, it's only meant to
clarify this part of the blocking logic.
Namely, unblock_all_blockers_whose_conditions_are_met().
The old name made it sound like things were getting unblocked no matter
what, but that's not actually the case.
What this actually does is iterate through the set of blockers,
unblocking those whose conditions are met. So give it a (very) verbose
name that errs on the side of descriptiveness.
This has several benefits:
1) We no longer just blindly derefence a null pointer in various places
2) We will get nicer runtime error messages if the current process does
turn out to be null in the call location
3) GCC no longer complains about possible nullptr dereferences when
compiling without KUBSAN
This is primarily to be able to remove the GenericLexer include out of
Format.h as well. A subsequent commit will add AK::Result to
GenericLexer, which will cause naming conflicts with other structures
named Result. This can be avoided (for now) by preventing nearly every
file in the system from implicitly including GenericLexer.
Other changes in this commit are to add the GenericLexer include to
files where it is missing.
We don't need to access the Custody cache in IRQs or anything like that,
so it should be fine to use a regular Mutex (via ProtectedValue.)
This allows threads to block while waiting for the custody cache.
Thanks to Sergey for pointing this out. :^)
Usage patterns mean we are more likely to need a Custody we just cached.
Because lookup walks the list from the beginning, prepending new items
instead of appending means they will be found quicker.
This reduces the number of items in the cache we need to walk by 50% for
boot and application startups.
Use a StringBuilder to generate a temporary string buffer with the
slave PTY names. (This works because StringBuilder has 128 bytes of
inline stack capacity before it does any heap allocations.)
This simplifies the DevPtsFS implementation somewhat, as it no longer
requires SlavePTY to register itself with it, since it can now simply
use the list of SlavePTY instances.
Make File inherit from RefCountedBase and provide a custom unref()
implementation. This will allow subclasses that participate in lists to
remove themselves in a safe way when being destroyed.
This patch adds a (spinlock-protected) custody cache. It's a simple
intrusive list containing all currently live custody objects.
This allows us to re-use existing custodies instead of creating them
again and again.
This gives a pretty decent performance improvement on "find /" :^)
We are not using this for anything and it's just been sitting there
gathering dust for well over a year, so let's stop carrying all this
complexity around for no good reason.
This allows us to 1) let go of the Process when an inode is ref'ing for
ProcFSExposedComponent related reasons, and 2) change our ref/unref
implementation.
This patch removes KResult::operator int() and deals with the fallout.
This forces a lot of code to be more explicit in its handling of errors,
greatly improving readability.
Calling error() on KResult is a mistake I made in 7ba991dc37, so
instead of doing that, which triggers an assertion if an error occured,
in Inode::read_entire method with VERIFY(nread <= sizeof(buffer)), we
really should just return the KResult and not to call error() on it.
Since the InodeIndex encapsulates a 64 bit value, it is correct to
ensure that the Kernel is exposing the entire value and the LibC is
aware of it.
This commit requires an entire re-compile because it's essentially a
change in the Kernel ABI, together with a corresponding change in LibC.
Previously, non-blocking read operations on pipes returned EOF instead
of EAGAIN and non-blocking write operations blocked. This fixes
pkgsrc's bmake "eof on job pipe!" error message when running in
non-compatibility mode.
This commit implements the ISO 9660 filesystem as specified in ECMA 119.
Currently, it only supports the base specification and Joliet or Rock
Ridge support is not present. The filesystem will normalize all
filenames to be lowercase (same as Linux).
The filesystem can be mounted directly from a file. Loop devices are
currently not supported by SerenityOS.
Special thanks to Lubrsi for testing on real hardware and providing
profiling help.
Co-Authored-By: Luke <luke.wilde@live.co.uk>
...and also RangeAllocator => VirtualRangeAllocator.
This clarifies that the ranges we're dealing with are *virtual* memory
ranges and not anything else.
Now that all KResult and KResultOr are used consistently throughout the
kernel, it's no longer necessary to return negative error codes.
However, we were still doing that in some places, so let's fix all those
(bugs) by removing the minuses. :^)
Create the disk cache up front, so we can verify it succeeds.
Make the KBuffer allocation fail-able, so we can properly handle
failure when the user asks up to mount a Ext2 filesystem under
OOM conditions.
It's easy to forget the responsibility of validating and safely copying
kernel parameters in code that is far away from syscalls. ioctl's are
one such example, and bugs there are just as dangerous as at the root
syscall level.
To avoid this case, utilize the AK::Userspace<T> template in the ioctl
kernel interface so that implementors have no choice but to properly
validate and copy ioctl pointer arguments.
This patch changes the semantics of purgeable memory.
- AnonymousVMObject now has a "purgeable" flag. It can only be set when
constructing the object. (Previously, all anonymous memory was
effectively purgeable.)
- AnonymousVMObject now has a "volatile" flag. It covers the entire
range of physical pages. (Previously, we tracked ranges of volatile
pages, effectively making it a page-level concept.)
- Non-volatile objects maintain a physical page reservation via the
committed pages mechanism, to ensure full coverage for page faults.
- When an object is made volatile, it relinquishes any unused committed
pages immediately. If later made non-volatile again, we then attempt
to make a new committed pages reservation. If this fails, we return
ENOMEM to userspace.
mmap() now creates purgeable objects if passed the MAP_PURGEABLE option
together with MAP_ANONYMOUS. anon_create() memory is always purgeable.
Advisory locks don't actually prevent other processes from writing to
the file, but they do prevent other processes looking to acquire and
advisory lock on the file.
This implementation currently only adds non-blocking locks, which are
all I need for now.
We often get queried for the root inode, and it will always be cached
in memory anyway, so let's make Ext2FS::root_inode() fast by keeping
the root inode in a dedicated member variable.
To count the remaining children, we simply need to traverse the
directory and increment a counter. No need for a custom virtual that
all file systems have to implement. :^)
Unless we're accessing mutex-guarded metadata, there's no need to
acquire the inode lock.
The file system ID or inode index of a constructed inode will never
change, for example.
Reimplement directory traversal in terms of read_bytes() instead of
doing direct block access. This lets us avoid taking the inode lock
while iterating over the directory contents.
Once we've finalized all the file system metadata in flush_writes(),
we no longer need to hold the file system lock during the call to
BlockBasedFileSystem::flush_writes().
Ext2FS::get_inode() will remember unknown inode indices that it has
been asked about and put them into the inode cache as null inodes.
flush_writes() was not null-checking these while iterating, which
was a bug I finally managed to hit.
Flushing also seemed like a good time to drop unknown inodes from
the cache, since there's no good reason to hold to them indefinitely.
The file system lock is meant to protect the file system metadata
(super blocks, bitmaps, etc.) Not protect processes from reading
independent parts of the disk at once.
This patch introduces a new lock to protect the *block cache* instead,
which is the real thing that needs synchronization.
Forcing users of a FileDescription to seek before they can read/write
makes it inherently racy. This patch adds variants of read/write that
simply ignore the "current offset" of the description in favor of a
caller-supplied offset.
In case we are about to delete the PID directory, we clear the Process
pointer. If someone still holds a reference to the PID directory (by
opening it), we still need to delete the process, but we can't delete
the directory, so we will keep it alive, but any operation on it will
fail by propogating the error to userspace about that the Process was
deleted and therefore there's no meaning to trying to do operations on
the directory.
Fixes#8576.
We need some overflow checks due to the implementation of TmpFS.
When size_t is 32 bits and off_t is 64 bits, we might overflow our
KBuffer max size and confuse the KBuffer set_size code, causing a VERIFY
failure. Make sure that resulting offset + size will fit in a size_t.
Another constraint, we make sure that the resulting offset + size will
be less than half of the maximum value of a size_t, because we double
the KBuffer size each time we resize it.
Previously, VirtualFileSystem::mkdir() would always return ENOENT if
no parent custody was returned by resolve_path(). This is incorrect when
e.g. the user has no search permission in a component of the path
prefix (=> EACCES), or if on component of the path prefix is a file (=>
ENOTDIR). This patch fixes that behavior.
This adds KLexicalPath::try_join(). As this cannot be done without
allocation, it uses KString and can fail. This patch also uses it at one
place. All the other cases of String::formatted("{}/{}", ...) currently
rely on the return value being a String, which means they cannot easily
be converted to use the new API.
This replaces all uses of LexicalPath in the Kernel with the functions
from KLexicalPath. This also allows the Kernel to stop including
AK::LexicalPath.
This change enforces that paths passed to
VFS::validate_path_against_process_veil are absolute and do not contain
any '..' or '.' parts. We should VERIFY here instead of returning EINVAL
since the code that calls this should resolve non-canonical paths before
calling this function.
Previously, Custody::absolute_path() was called for every call to
validate_path_against_process_veil(). For processes that don't have a
veil, the path is not used by the function. This means that it is
unnecessarily generated. This introduces an overload to
validate_path_against_process_veil(), which takes a Custody const& and
only generates the absolute path if it there is actually a veil and it
is thus needed.
This patch results in a speed up of Assistant's file system cache
building by around 16 percent.
These small changes fix the remaining warnings that come up during
kernel compilation with Clang. These specific fixes were for benign
things: unused lambda captures and braces around scalar initializers.
The `#pragma GCC diagnostic` part is needed because the class has
virtual methods with the same name but different arguments, and Clang
tries to warn us that we are not actually overriding anything with
these.
Weirdly enough, GCC does not seem to care.
Now we use WeakPtrs to break Ref-counting cycle. Also, we call the
prepare_for_deletion method to ensure deleted objects are ready for
deletion. This is necessary to ensure we don't keep dead processes,
which would become zombies.
In addition to that, add some debug prints to aid debug in the future.
This changes the m_parts, m_dirname, m_basename, m_title and m_extension
member variables to StringViews onto the m_string String. It also
removes the m_is_absolute member in favour of computing if a path is
absolute in the is_absolute() getter. Due to this, the canonicalize()
method has been completely rewritten.
The parts() getter still returns a Vector<String>, although it is no
longer a const reference as m_parts is no longer a Vector<String>.
Rather, it is constructed from the StringViews in m_parts upon request.
The parts_view() getter has been added, which returns Vector<StringView>
const&. Most previous users of parts() have been changed to use
parts_view(), except where Strings are required.
Due to this change, it's is now no longer allow to create temporary
LexicalPath objects to call the dirname, basename, title, or extension
getters on them because the returned StringViews will point to possible
freed memory.
The LexicalPath instance methods dirname(), basename(), title() and
extension() will be changed to return StringView const& in a further
commit. Due to this, users creating temporary LexicalPath objects just
to call one of those getters will recieve a StringView const& pointing
to a possible freed buffer.
To avoid this, static methods for those APIs have been added, which will
return a String by value to avoid those problems. All cases where
temporary LexicalPath objects have been used as described above haven
been changed to use the static APIs.
It didn't make any sense to hardcode the modified time of all created
inodes with "mepoch", so we should query the procfs "backend" to get
the modified time value.
Since ProcFS is dynamically changed all the time, the modified time
equals to the querying time.
This could be changed if desired, by making the modified_time()
method virtual and overriding it in different procfs-backed objects :)
The new ProcFS design consists of two main parts:
1. The representative ProcFS class, which is derived from the FS class.
The ProcFS and its inodes are much more lean - merely 3 classes to
represent the common type of inodes - regular files, symbolic links and
directories. They're backed by a ProcFSExposedComponent object, which
is responsible for the functional operation behind the scenes.
2. The backend of the ProcFS - the ProcFSComponentsRegistrar class
and all derived classes from the ProcFSExposedComponent class. These
together form the entire backend and handle all the functions you can
expect from the ProcFS.
The ProcFSExposedComponent derived classes split to 3 types in the
manner of lifetime in the kernel:
1. Persistent objects - this category includes all basic objects, like
the root folder, /proc/bus folder, main blob files in the root folders,
etc. These objects are persistent and cannot die ever.
2. Semi-persistent objects - this category includes all PID folders,
and subdirectories to the PID folders. It also includes exposed objects
like the unveil JSON'ed blob. These object are persistent as long as the
the responsible process they represent is still alive.
3. Dynamic objects - this category includes files in the subdirectories
of a PID folder, like /proc/PID/fd/* or /proc/PID/stacks/*. Essentially,
these objects are always created dynamically and when no longer in need
after being used, they're deallocated.
Nevertheless, the new allocated backend objects and inodes try to use
the same InodeIndex if possible - this might change only when a thread
dies and a new thread is born with a new thread stack, or when a file
descriptor is closed and a new one within the same file descriptor
number is opened. This is needed to actually be able to do something
useful with these objects.
The new design assures that many ProcFS instances can be used at once,
with one backend for usage for all instances.
The intention is to add dynamic mechanism for notifying the userspace
about hotplug events. Currently, the DMI (SMBIOS) blobs and ACPI tables
are exposed in the new filesystem.
This commit converts naked `new`s to `AK::try_make` and `AK::try_create`
wherever possible. If the called constructor is private, this can not be
done, so we instead now use the standard-defined and compiler-agnostic
`new (nothrow)`.