Note that Jakt only allows StringView creation from string literals, so
none of the invariants in the class are broken by this (if used only
from within Jakt).
This allows the user to transform the contents of the optional (if any
exists), without manually unwrapping and then rewrapping it.
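A minimal sketch of the idea, written against std::optional rather than
AK's Optional so that it stands alone. The free function map_optional()
is a hypothetical stand-in for the member function, not the actual API.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>

// Hypothetical helper: applies `mapper` to the contained value, if any,
// and wraps the result in a new optional. An empty input stays empty.
template<typename T, typename F>
auto map_optional(std::optional<T> const& value, F&& mapper)
    -> std::optional<std::invoke_result_t<F, T const&>>
{
    if (!value.has_value())
        return std::nullopt;
    return std::forward<F>(mapper)(*value);
}
```

The caller never touches has_value() or operator* directly; the empty
case falls through untouched.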
This is needed by the Jakt runtime.
This is used in Jakt, and providing that value from Jakt's side is more
trouble than doing this.
Considering this class is bound to go away, a little
backwards-compatible API change is just fine.
The previous moved-from state was the null string. This violates both
our invariant that String is never null and the C++ contract that the
moved-from state must be valid but unspecified. The empty short string
state is of course valid, so it satisfies both invariants. It also
allows us to remove any extra checks for the null state.
This change is made primarily because swap() requires moved-from
objects to be reassignable (C++ allows this). Because the
move assignment of String would not check the null state, it crashed
trying to increment the data reference count (nullptr signals a
non-short string). This meant that e.g. quick_sort'ing String would
crash immediately.
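A toy model of the requirement, not the real String layout: ToyString
is a hypothetical refcounted string in which a null data pointer means
"empty", so a moved-from object is a valid empty string that can be
assigned to again. The real class encodes its short-string state
differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

class ToyString {
public:
    ToyString() = default; // the empty state, no heap data

    explicit ToyString(std::string text)
        : m_data(new Data { 1, std::move(text) })
    {
    }

    ToyString(ToyString const& other)
        : m_data(other.m_data)
    {
        if (m_data)
            ++m_data->ref_count;
    }

    // Moving leaves `other` in the (valid) empty state.
    ToyString(ToyString&& other) noexcept
        : m_data(std::exchange(other.m_data, nullptr))
    {
    }

    // Copy-and-swap: handles copy and move assignment, and works
    // regardless of whether *this was previously moved from.
    ToyString& operator=(ToyString other) noexcept
    {
        std::swap(m_data, other.m_data);
        return *this;
    }

    ~ToyString()
    {
        if (m_data && --m_data->ref_count == 0)
            delete m_data;
    }

    bool is_empty() const { return !m_data || m_data->text.empty(); }
    std::string text() const { return m_data ? m_data->text : std::string {}; }

private:
    struct Data {
        std::size_t ref_count;
        std::string text;
    };
    Data* m_data { nullptr };
};
```

std::swap() move-constructs a temporary and then move-assigns into the
moved-from objects, which is exactly the sequence a sorting routine
exercises.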
Spaceship operator.
Comparing UTF-8 can be done by simple byte lexicographic comparison per
definition, so we just piggy-back on StringView's high-performance
comparator.
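To illustrate why this works, here is a standalone sketch that compares
two UTF-8 strings as unsigned bytes. compare_utf8_bytes() is a
hypothetical name and std::string_view stands in for StringView; the
real comparator is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns a negative, zero, or positive value. Bytes are compared as
// unsigned so that lead bytes (0xC2 and up) sort after ASCII.
int compare_utf8_bytes(std::string_view a, std::string_view b)
{
    std::size_t common = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < common; ++i) {
        auto lhs = static_cast<unsigned char>(a[i]);
        auto rhs = static_cast<unsigned char>(b[i]);
        if (lhs != rhs)
            return lhs < rhs ? -1 : 1;
    }
    if (a.size() == b.size())
        return 0;
    return a.size() < b.size() ? -1 : 1;
}
```

For example, "z" (U+007A) sorts before "é" (U+00E9, bytes C3 A9), which
sorts before "€" (U+20AC, bytes E2 82 AC), matching code point order.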
Similar to how LibJS and LibSQL used to behave, the boolean constructor
of JsonValue currently allows pointers to be used to construct a
boolean value. Explicitly disallow such construction.
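One way to get this effect, sketched with standard type traits on a
hypothetical ToyJsonValue class. The real JsonValue may constrain its
constructor differently; this only shows the technique.

```cpp
#include <cassert>
#include <type_traits>

class ToyJsonValue {
public:
    // Only an argument whose type is exactly bool matches. Pointers
    // and integers no longer convert implicitly to bool here.
    template<typename T, typename = std::enable_if_t<std::is_same_v<T, bool>>>
    ToyJsonValue(T value)
        : m_value(value)
    {
    }

    bool as_bool() const { return m_value; }

private:
    bool m_value { false };
};

static_assert(std::is_constructible_v<ToyJsonValue, bool>);
static_assert(!std::is_constructible_v<ToyJsonValue, char const*>);
static_assert(!std::is_constructible_v<ToyJsonValue, int*>);
```

With a plain `ToyJsonValue(bool)` constructor, passing a string literal
would compile and silently produce `true`.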
This allows us to pass the new String type to functions that take a
StringView directly. Having to call bytes_as_string_view() every time
gets old quickly.
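A sketch of the convenience this buys, using a hypothetical
ToyUtf8String and std::string_view in place of the AK types. Only the
name bytes_as_string_view() comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

class ToyUtf8String {
public:
    explicit ToyUtf8String(std::string bytes)
        : m_bytes(std::move(bytes))
    {
    }

    std::string_view bytes_as_string_view() const { return m_bytes; }

    // Implicit conversion, so callers can pass the string directly.
    operator std::string_view() const { return bytes_as_string_view(); }

private:
    std::string m_bytes;
};

// Any function taking a view now accepts ToyUtf8String as-is.
std::size_t count_spaces(std::string_view view)
{
    std::size_t count = 0;
    for (char c : view) {
        if (c == ' ')
            ++count;
    }
    return count;
}
```

The usual caveat for views applies: the resulting view must not outlive
the string it was converted from.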
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.
Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.
Problems in DeprecatedString:
- It assumes string allocation never fails. This makes it impossible
to use in allocation-sensitive contexts, and is the reason we had to
ban DeprecatedString from the kernel entirely.
- The awkward null state. DeprecatedString can be null. It's different
from the empty state, although null strings are considered empty.
All code is immediately nicer when using Optional<DeprecatedString>
but DeprecatedString came before Optional, which is how we ended up
like this.
- The encoding of the underlying data is ambiguous. For the most part,
we use it as if it's always UTF-8, but there have been cases where
we pass around strings in other encodings (e.g. ISO-8859-1).
- operator[] and length() are used to iterate over DeprecatedString one
byte at a time. This is done all over the codebase, and will *not*
give the right results unless the string is all ASCII.
How we solve these issues in the new String:
- Functions that may allocate now return ErrorOr<String> so that ENOMEM
errors can be passed to the caller.
- String has no null state. Use Optional<String> when needed.
- String is always UTF-8. This is validated when constructing a String.
We may need to add a bypass for this in the future, for cases where
you have a known-good string, but for now: validate all the things!
- There is no operator[] or length(). You can get the underlying data
with bytes(), but for iterating over code points, you should be using
a UTF-8 iterator.
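To make the byte versus code point distinction concrete, here is a
small standalone sketch. count_code_points() is a hypothetical helper
that assumes its input is already valid UTF-8; it is not part of the
String API.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// In valid UTF-8, every code point has exactly one byte that is not a
// continuation byte (continuation bytes look like 0b10xxxxxx).
std::size_t count_code_points(std::string_view utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For "héllo" encoded as UTF-8, the byte count is 6 but the code point
count is 5, which is why byte-indexed loops go wrong on non-ASCII text.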
Furthermore, it has two nifty new features:
- String implements a small string optimization (SSO) for strings that
can fit entirely within a pointer. This means up to 3 bytes on 32-bit
platforms, and 7 bytes on 64-bit platforms. Such small strings will
not be heap-allocated.
- String can create substrings without making a deep copy of the
substring. Instead, the superstring gets +1 refcount from the
substring, and it acts like a view into the superstring. To make
substrings like this, use the substring_with_shared_superstring() API.
One caveat:
- String does not guarantee that the underlying data is null-terminated
like DeprecatedString does today. While this was nifty in a handful of
places where we were calling C functions, it did stand in the way of
shared-superstring substrings.
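The shared-superstring idea and the null-termination caveat can be
sketched with std::shared_ptr. ToySharedString and
substring_sharing_storage() are hypothetical names; the real String
manages its own refcounted storage and may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

class ToySharedString {
public:
    explicit ToySharedString(std::string text)
        : m_storage(std::make_shared<std::string>(std::move(text)))
        , m_length(m_storage->size())
    {
    }

    // No bytes are copied: the result holds another reference to the
    // same storage and only narrows the visible window.
    ToySharedString substring_sharing_storage(std::size_t start, std::size_t length) const
    {
        assert(start <= m_length && length <= m_length - start);
        ToySharedString result(*this);
        result.m_offset = m_offset + start;
        result.m_length = length;
        return result;
    }

    std::string_view view() const
    {
        return std::string_view(*m_storage).substr(m_offset, m_length);
    }

    long use_count() const { return m_storage.use_count(); }

private:
    std::shared_ptr<std::string const> m_storage;
    std::size_t m_offset { 0 };
    std::size_t m_length { 0 };
};
```

A substring that ends in the middle of the superstring is followed by
the superstring's next byte rather than a terminator, which is why the
data cannot be promised to be null-terminated.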
Previously we allowed the end_offset to be larger than the chunk itself,
which made it so that certain input sizes would make the logic attempt
to delete a nonexistent object.
Fixes #16308.
This will make it easier to support both string types at the same time
while we convert code, and to track down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This patch adds support for 128-bit floating points in FloatExtractor.
This is required to build SerenityOS on macOS/aarch64. It might break
building for Raspberry Pi.
AK internals like to use concepts and details without a fully qualified
name, which usually works just fine because we make everything
AK-related available to the unqualified namespace.
However, this breaks as soon as we stop using `USING_AK_GLOBALLY`,
because those identifiers are no longer made available. Instead, we
just export those into the `AK` namespace.
This patch adds the `USING_AK_GLOBALLY` macro which is enabled by
default, but can be overridden by build flags.
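A sketch of how such a macro can gate the global exports. The function
toy_min() is a hypothetical placeholder for any AK identifier, and the
exact header layout is an assumption.

```cpp
#include <cassert>

// Enabled unless the build passes e.g. -DUSING_AK_GLOBALLY=0.
#ifndef USING_AK_GLOBALLY
#    define USING_AK_GLOBALLY 1
#endif

namespace AK {

// Hypothetical placeholder for something defined in AK.
inline int toy_min(int a, int b)
{
    return a < b ? a : b;
}

}

// Only pull the name into the global namespace when asked to.
#if USING_AK_GLOBALLY
using AK::toy_min;
#endif
```

With the macro disabled, callers must spell out AK::toy_min(), which is
what a project with its own conflicting names would want.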
This is a step towards integrating Jakt and AK types.
Unlike iterator_at_byte_offset(), this function assumes the provided
byte offset is a valid offset into the UTF-8 character stream.
This avoids walking the stream from the start.
There was a subtle mismatch between the obviously expected behavior
of BumpAllocator::for_each_chunk() and its actual implementation.
You'd think it would invoke the callback with the address of each chunk,
but actually it also took the liberty of adding sizeof(ChunkHeader) to
this address. UniformBumpAllocator::destroy_all() relied on this to
get the right address for objects to delete.
The bug happened in BumpAllocator::deallocate_all(), where we use
for_each_chunk() to walk the list of chunks and munmap() them.
To avoid memory mapping churn, we keep a global cache of 1 chunk around.
Since we were being called with the offset chunk address, it meant that
the cached chunk shifted 16 bytes away from its real address every time
we re-added it to the cache.
Eventually the cached chunk address would leave its memory region
entirely, and at that point, any attempt to allocate from it would yield
an address outside the region, causing memory corruption.
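The drift is easy to reproduce in a toy model that treats addresses as
plain integers. All names below are hypothetical, and the 16-byte
header size is taken from the description above; this does not show
the real allocator or the exact shape of the fix.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t chunk_header_size = 16;

// Models the surprising behavior: the callback receives the address
// just past the chunk header instead of the chunk itself.
template<typename Callback>
void for_each_chunk_shifted(std::uintptr_t chunk, Callback callback)
{
    callback(chunk + chunk_header_size);
}

// Models the expected behavior: the callback receives the chunk address.
template<typename Callback>
void for_each_chunk_unshifted(std::uintptr_t chunk, Callback callback)
{
    callback(chunk);
}

// Repeatedly takes the chunk out of a one-entry cache and puts back
// whatever address the walk reports. Returns how far the cached
// address has drifted from the chunk's real address.
std::uintptr_t drift_after_cycles(bool shifted, unsigned cycles)
{
    std::uintptr_t const real_chunk = 0x10000;
    std::uintptr_t cached = real_chunk;
    for (unsigned i = 0; i < cycles; ++i) {
        std::uintptr_t in_use = cached;
        auto recache = [&](std::uintptr_t address) { cached = address; };
        if (shifted)
            for_each_chunk_shifted(in_use, recache);
        else
            for_each_chunk_unshifted(in_use, recache);
    }
    return cached - real_chunk;
}
```

Three allocate/deallocate cycles with the shifted walk move the cached
address 48 bytes past the real chunk; with the unshifted walk it stays
put.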