This error was introduced by 9a7accdd and had a significant impact on
`BufferedFile` behavior. Hence, we started seeing crash in test262.
By itself, the issue was a wrong calculation of the internal reading
spans when using the `read` and `until` parameters. Which can lead to
at worse crash in VERIFY and at least weird behaviors as missed needles
or detections out of bounds.
It was also accompanied by an erroneous test.
This patch fixes the bug, the test and also provides more tests.
This parameter allows to start searching after an offset. For example,
to resume a search.
It is unfortunately a breaking change in API so this patch also modifies
one user and one test.
This implements a FlyString that will de-duplicate String instances. The
FlyString will store the raw encoded data of the String instance: If the
String is a short string, FlyString holds the String::ShortString bytes;
otherwise FlyString holds a pointer to the Detail::StringData.
FlyString itself does not know about String's storage or how to refcount
its Detail::StringData. It defers to String to implement these details.
Since AK can't refer to LibUnicode directly, the strategy here is that
if you need case transformations, you can link LibUnicode and receive
them. If you try to use either of these methods without linking it, then
you'll of course get a linker error (note we don't do any fallbacks to
e.g. ASCII case transformations). If you don't need these methods, you
don't have to link LibUnicode.
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so
let's rename it to A) match the name of DeprecatedString, B) write a new
FlyString class that is tied to String.
This allows us to make all comparision operators on the class constexpr
without pulling in a bunch of boilerplate. We don't use the `<compare>`
header because it doesn't compile in the main serenity cross-build due
to the include paths to LibC being incompatible with how libc++ expects
them to be for clang builds.
These instances were detected by searching for files that include
AK/StdLibExtras.h, but don't match the regex:
\\b(abs|AK_REPLACED_STD_NAMESPACE|array_size|ceil_div|clamp|exchange|for
ward|is_constant_evaluated|is_power_of_two|max|min|mix|move|_RawPtr|RawP
tr|round_up_to_power_of_two|swap|to_underlying)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any "extra stdlib" functions.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
These instances were detected by searching for files that include
AK/Format.h, but don't match the regex:
\\b(CheckedFormatString|critical_dmesgln|dbgln|dbgln_if|dmesgln|FormatBu
ilder|__FormatIfSupported|FormatIfSupported|FormatParser|FormatString|Fo
rmattable|Formatter|__format_value|HasFormatter|max_format_arguments|out
|outln|set_debug_enabled|StandardFormatter|TypeErasedFormatParams|TypeEr
asedParameter|VariadicFormatParams|v_critical_dmesgln|vdbgln|vdmesgln|vf
ormat|vout|warn|warnln|warnln_if)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any formatting functions.
Observe that this revealed that Userland/Libraries/LibC/signal.cpp is
missing an include.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
In practice, this function does not take any perceptible amount of time.
However, this benchmark demonstrates that for extreme values, the
internal for-loop does matter.
Using policy based design `SinglyLinkedList` and
`SinglyLinkedListWithCount` can be combined into one class which takes
a policy to determine how to keep track of the size of the list. The
default policy is to use list iteration to count the items in the list
each time. The `WithCount` form is a different policy which tracks the
size, but comes with the overhead of storing the count and
incrementing/decrementing on each modification.
This model is extensible to have other forms of counting by
implementing only a new policy instead of implementing a totally new
type.
In 7c5e30daaa, the focus was "only" on
Userland/Libraries/, whereas this commit cleans up the remaining
headers in the repo, and any new badly-formatted include.
The class is very similar to `CircularDuplexStream` in its behavior.
Main differences are that `CircularBuffer`:
- does not inherit from `AK::Stream`
- uses `ErrorOr` for its API
- is heap allocated (and OOM-Safe)
This patch also add some tests.
Previously any backslash and the character following it were ignored.
This commit adds a fall through to match the character following the
backslash without checking whether it is "special".
This allows callers to use the following semantics:
using MyVariant = Variant<Empty, int>;
template<typename T>
size_t size() { return TypeList<T>::size; }
auto s = size<MyVariant>();
This will be needed for an upcoming IPC change, which will result in us
knowing the Variant type, but not the underlying variadic types that the
Variant holds.
`OwnPtrWithCustomDeleter` was a decorator which provided the ability
to add a custom deleter to `OwnPtr` by wrapping and taking the deleter
as a run-time argument to the constructor. This solution means that no
additional space is needed for the `OwnPtr` because it doesn't need to
store a pointer to the deleter, but comes at the cost of having an
extra type that stores a pointer for every instance.
This logic is moved directly into `OwnPtr` by adding a template
argument that is defaulted to the default deleter for the type. This
means that the type itself stores the pointer to the deleter instead
of every instance and adds some type safety by encoding the deleter in
the type itself instead of taking a run-time argument.
Note that this still keeps the old behaviour of putting things in std by
default on serenity so the tools can be happy, but if USING_AK_GLOBALLY
is unset, AK behaves like a good citizen and doesn't try to put things
in the ::std namespace.
std::nothrow_t and its friends get to stay because I'm being told that
compilers assume things about them and I can't yeet them into a
different namespace...for now.
Implement insertion sort in AK. The cutoff value 7 is a magic number
here, values [5, 15] should work well. Main idea of the cutoff is to
reduce recursion performed by quicksort to speed up sorting
of small partitions.
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.
Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.
Problems in DeprecatedString:
- It assumes string allocation never fails. This makes it impossible
to use in allocation-sensitive contexts, and is the reason we had to
ban DeprecatedString from the kernel entirely.
- The awkward null state. DeprecatedString can be null. It's different
from the empty state, although null strings are considered empty.
All code is immediately nicer when using Optional<DeprecatedString>
but DeprecatedString came before Optional, which is how we ended up
like this.
- The encoding of the underlying data is ambiguous. For the most part,
we use it as if it's always UTF-8, but there have been cases where
we pass around strings in other encodings (e.g ISO8859-1)
- operator[] and length() are used to iterate over DeprecatedString one
byte at a time. This is done all over the codebase, and will *not*
give the right results unless the string is all ASCII.
How we solve these issues in the new String:
- Functions that may allocate now return ErrorOr<String> so that ENOMEM
errors can be passed to the caller.
- String has no null state. Use Optional<String> when needed.
- String is always UTF-8. This is validated when constructing a String.
We may need to add a bypass for this in the future, for cases where
you have a known-good string, but for now: validate all the things!
- There is no operator[] or length(). You can get the underlying data
with bytes(), but for iterating over code points, you should be using
an UTF-8 iterator.
Furthermore, it has two nifty new features:
- String implements a small string optimization (SSO) for strings that
can fit entirely within a pointer. This means up to 3 bytes on 32-bit
platforms, and 7 bytes on 64-bit platforms. Such small strings will
not be heap-allocated.
- String can create substrings without making a deep copy of the
substring. Instead, the superstring gets +1 refcount from the
substring, and it acts like a view into the superstring. To make
substrings like this, use the substring_with_shared_superstring() API.
One caveat:
- String does not guarantee that the underlying data is null-terminated
like DeprecatedString does today. While this was nifty in a handful of
places where we were calling C functions, it did stand in the way of
shared-superstring substrings.
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This means that rather than this:
```
AK_TYPEDEF_DISTINCT_NUMERIC_GENERAL(u64, true, true, false, false,
false, true, FunctionAddress);
```
We now have this:
```
AK_TYPEDEF_DISTINCT_NUMERIC_GENERAL(u64, FunctionAddress, Arithmetic,
Comparison, Increment);
```
Which is a lot more readable. :^)
Co-authored-by: Ali Mohammad Pur <mpfard@serenityos.org>
When calling clear_with_capacity on an empty HashTable/HashMap, a null
deref would occur when trying to memset() m_buckets. Checking that it
has capacity before clearing fixes the issue.