This parameter allows to start searching after an offset. For example,
to resume a search.
It is unfortunately a breaking change in API so this patch also modifies
one user and one test.
This implements a FlyString that will de-duplicate String instances. The
FlyString will store the raw encoded data of the String instance: If the
String is a short string, FlyString holds the String::ShortString bytes;
otherwise FlyString holds a pointer to the Detail::StringData.
FlyString itself does not know about String's storage or how to refcount
its Detail::StringData. It defers to String to implement these details.
Since AK can't refer to LibUnicode directly, the strategy here is that
if you need case transformations, you can link LibUnicode and receive
them. If you try to use either of these methods without linking it, then
you'll of course get a linker error (note we don't do any fallbacks to
e.g. ASCII case transformations). If you don't need these methods, you
don't have to link LibUnicode.
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so
let's rename it to A) match the name of DeprecatedString, B) write a new
FlyString class that is tied to String.
This allows us to make all comparision operators on the class constexpr
without pulling in a bunch of boilerplate. We don't use the `<compare>`
header because it doesn't compile in the main serenity cross-build due
to the include paths to LibC being incompatible with how libc++ expects
them to be for clang builds.
These instances were detected by searching for files that include
AK/StdLibExtras.h, but don't match the regex:
\\b(abs|AK_REPLACED_STD_NAMESPACE|array_size|ceil_div|clamp|exchange|for
ward|is_constant_evaluated|is_power_of_two|max|min|mix|move|_RawPtr|RawP
tr|round_up_to_power_of_two|swap|to_underlying)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any "extra stdlib" functions.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
These instances were detected by searching for files that include
AK/Format.h, but don't match the regex:
\\b(CheckedFormatString|critical_dmesgln|dbgln|dbgln_if|dmesgln|FormatBu
ilder|__FormatIfSupported|FormatIfSupported|FormatParser|FormatString|Fo
rmattable|Formatter|__format_value|HasFormatter|max_format_arguments|out
|outln|set_debug_enabled|StandardFormatter|TypeErasedFormatParams|TypeEr
asedParameter|VariadicFormatParams|v_critical_dmesgln|vdbgln|vdmesgln|vf
ormat|vout|warn|warnln|warnln_if)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any formatting functions.
Observe that this revealed that Userland/Libraries/LibC/signal.cpp is
missing an include.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
In practice, this function does not take any perceptible amount of time.
However, this benchmark demonstrates that for extreme values, the
internal for-loop does matter.
Using policy based design `SinglyLinkedList` and
`SinglyLinkedListWithCount` can be combined into one class which takes
a policy to determine how to keep track of the size of the list. The
default policy is to use list iteration to count the items in the list
each time. The `WithCount` form is a different policy which tracks the
size, but comes with the overhead of storing the count and
incrementing/decrementing on each modification.
This model is extensible to have other forms of counting by
implementing only a new policy instead of implementing a totally new
type.
In 7c5e30daaa, the focus was "only" on
Userland/Libraries/, whereas this commit cleans up the remaining
headers in the repo, and any new badly-formatted include.
The class is very similar to `CircularDuplexStream` in its behavior.
Main differences are that `CircularBuffer`:
- does not inherit from `AK::Stream`
- uses `ErrorOr` for its API
- is heap allocated (and OOM-Safe)
This patch also add some tests.
Previously any backslash and the character following it were ignored.
This commit adds a fall through to match the character following the
backslash without checking whether it is "special".
This allows callers to use the following semantics:
using MyVariant = Variant<Empty, int>;
template<typename T>
size_t size() { return TypeList<T>::size; }
auto s = size<MyVariant>();
This will be needed for an upcoming IPC change, which will result in us
knowing the Variant type, but not the underlying variadic types that the
Variant holds.
`OwnPtrWithCustomDeleter` was a decorator which provided the ability
to add a custom deleter to `OwnPtr` by wrapping and taking the deleter
as a run-time argument to the constructor. This solution means that no
additional space is needed for the `OwnPtr` because it doesn't need to
store a pointer to the deleter, but comes at the cost of having an
extra type that stores a pointer for every instance.
This logic is moved directly into `OwnPtr` by adding a template
argument that is defaulted to the default deleter for the type. This
means that the type itself stores the pointer to the deleter instead
of every instance and adds some type safety by encoding the deleter in
the type itself instead of taking a run-time argument.
Note that this still keeps the old behaviour of putting things in std by
default on serenity so the tools can be happy, but if USING_AK_GLOBALLY
is unset, AK behaves like a good citizen and doesn't try to put things
in the ::std namespace.
std::nothrow_t and its friends get to stay because I'm being told that
compilers assume things about them and I can't yeet them into a
different namespace...for now.
Implement insertion sort in AK. The cutoff value 7 is a magic number
here, values [5, 15] should work well. Main idea of the cutoff is to
reduce recursion performed by quicksort to speed up sorting
of small partitions.
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.
Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.
Problems in DeprecatedString:
- It assumes string allocation never fails. This makes it impossible
to use in allocation-sensitive contexts, and is the reason we had to
ban DeprecatedString from the kernel entirely.
- The awkward null state. DeprecatedString can be null. It's different
from the empty state, although null strings are considered empty.
All code is immediately nicer when using Optional<DeprecatedString>
but DeprecatedString came before Optional, which is how we ended up
like this.
- The encoding of the underlying data is ambiguous. For the most part,
we use it as if it's always UTF-8, but there have been cases where
we pass around strings in other encodings (e.g ISO8859-1)
- operator[] and length() are used to iterate over DeprecatedString one
byte at a time. This is done all over the codebase, and will *not*
give the right results unless the string is all ASCII.
How we solve these issues in the new String:
- Functions that may allocate now return ErrorOr<String> so that ENOMEM
errors can be passed to the caller.
- String has no null state. Use Optional<String> when needed.
- String is always UTF-8. This is validated when constructing a String.
We may need to add a bypass for this in the future, for cases where
you have a known-good string, but for now: validate all the things!
- There is no operator[] or length(). You can get the underlying data
with bytes(), but for iterating over code points, you should be using
an UTF-8 iterator.
Furthermore, it has two nifty new features:
- String implements a small string optimization (SSO) for strings that
can fit entirely within a pointer. This means up to 3 bytes on 32-bit
platforms, and 7 bytes on 64-bit platforms. Such small strings will
not be heap-allocated.
- String can create substrings without making a deep copy of the
substring. Instead, the superstring gets +1 refcount from the
substring, and it acts like a view into the superstring. To make
substrings like this, use the substring_with_shared_superstring() API.
One caveat:
- String does not guarantee that the underlying data is null-terminated
like DeprecatedString does today. While this was nifty in a handful of
places where we were calling C functions, it did stand in the way of
shared-superstring substrings.
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This means that rather than this:
```
AK_TYPEDEF_DISTINCT_NUMERIC_GENERAL(u64, true, true, false, false,
false, true, FunctionAddress);
```
We now have this:
```
AK_TYPEDEF_DISTINCT_NUMERIC_GENERAL(u64, FunctionAddress, Arithmetic,
Comparison, Increment);
```
Which is a lot more readable. :^)
Co-authored-by: Ali Mohammad Pur <mpfard@serenityos.org>
When calling clear_with_capacity on an empty HashTable/HashMap, a null
deref would occur when trying to memset() m_buckets. Checking that it
has capacity before clearing fixes the issue.
Currently, the floating point to string conversion is implemented
several times across the codebase. This commit provides a pretty
low-level function to unify all of such conversions. It converts the
given double to a fixed point decimal satisfying a few correctness
criteria.
Because we still support u64 and i64 (on top of i32 and u32) we do still
have to parse the number ourself first. Then if we determine that the
number is a floating point or is outside of the range of i64 and u64 we
fallback and parse it as a double.
Before JsonParser had ifdefs guarding the double computation, but it
just build when we error on ifdef KERNEL so JsonParser is no longer
usable in the Kernel. This can be remedied fairly easily but since
it is not needed we #error on that for now.
Similar to decimal floating point parsing the current strtod hex float
parsing gives a lot of incorrect results. We can use a similar technique
as with decimal parsing however hex floats are much simpler as we don't
need to scale with a power of 5.
For hex floats we just provide the parse_first_hexfloat API as there is
currently no need for a parse_hexfloat_completely API.
Again the accepted input for parse_first_hexfloat is very lenient and
any validation should be done before calling this method.
This is based on the paper by Daniel Lemire called
"Number parsing at a Gigabyte per second", currently available at
https://arxiv.org/abs/2101.11408
An implementation can be found at
https://github.com/fastfloat/fast_float
To support both strtod like methods and String::to_double we have two
different APIs. The parse_first_floating_point gives back both the
result, next character to read and the error/out of range status.
Out of range here means we rounded to infinity 0.
The other API, parse_floating_point_completely, will return a floating
point only if the given character range contains just the floating point
and nothing else. This can be much faster as we can skip actually
computing the value if we notice we did not parse the whole range.
Both of these APIs support a very lenient format to be usable in as many
places as possible. Also it does not check for "named" values like
"nan", "inf", "NAN" etc. Because this can be different for every usage.
For integers and small values this new method is not faster and often
even a tiny bit slower than the current strtod implementation. However
the strtod implementation is wrong for a lot of values and has a much
less predictable running time.
For correctness this method was tested against known string -> double
datasets from https://github.com/nigeltao/parse-number-fxx-test-data
This method gives 100% accuracy.
The old strtod gave an incorrect value in over 50% of the numbers
tested.
By appending individual bytes as code points, we were "breaking apart"
multi-byte UTF-8 code points. This now behaves the same way as the
invert_case() helper in StringUtils.
If the entire string you want to right-trim consists of characters you
want to remove, we previously would incorrectly leave the first
character there.
For example: `trim("aaaaa", "a")` would return "a" instead of "".
We can't use `i >= 0` in the loop since that would fail to detect
underflow, so instead we keep `i` in the range `size .. 1` and then
subtract 1 from it when reading the character.
Added some trim() tests while I was at it. (And to confirm that this was
the issue.)
Instead of doing anything reasonable, Utf8CodePointIterator returned
invalid code points, for example U+123456. However, many callers of this
iterator assume that a code point is always at most 0x10FFFF.
In fact, this is one of two reasons for the following OSS Fuzz issue:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=49184
This is probably a very old bug.
In the particular case of URLParser, AK::is_url_code_point got confused:
return /* ... */ || code_point >= 0xA0;
If code_point is a "code point" beyond 0x10FFFF, this violates the
condition given in the preceding comment, but satisfies the given
condition, which eventually causes URLParser to crash.
This commit fixes *only* the erroneous UTF-8 decoding, and does not
fully resolve OSS-Fuzz#49184.
Doesn't use them in libc headers so that those don't have to pull in
AK/Platform.h.
AK_COMPILER_GCC is set _only_ for gcc, not for clang too. (__GNUC__ is
defined in clang builds as well.) Using AK_COMPILER_GCC simplifies
things some.
AK_COMPILER_CLANG isn't as much of a win, other than that it's
consistent with AK_COMPILER_GCC.
We were dropping the base URL path components in the resulting URL due
to mistakenly determining the input URL to start with a Windows drive
letter. Fix this, add a spec link, and a test.
This caused the m_allocation_enabled_previously member to be technically
uninitialized when the compiler emits the implicit destructor call for
stack allocated classes.
This was pointed out by gcc on lagom builds, no clue how this was flying
under the radar for so long and is not triggering CI.
This is a set of functions that allow you to convert between arbitrary
IEEE 754 floating point types, as long as they can be represented
within 64 bits. Conversion methods between floats and doubles are
provided, as well as a generic `float_to_float()`.
Example usage:
#include <AK/FloatingPoint.h>
double val = 1.234;
auto weird_f16 =
convert_from_native_double<FloatingPointBits<0, 6, 10>>(val);
Signed and unsigned floats are supported, and both NaN and +/-Inf are
handled correctly. Values that do not fit in the target floating point
type are clamped.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
This commit moves the length calculations out to be directly on the
StringView users. This is an important step towards the goal of removing
StringView(char const*), as it moves the responsibility of calculating
the size of the string to the user of the StringView (which will prevent
naive uses causing OOB access).
Previously we would treat the empty string as `null`. This caused
JavaScript like this to fail:
```js
var object = {};
try {
object = JSON.parse("");
} catch {}
var array = object.array || [];
```
Since `JSON.parse("")` returned null instead of throwing, it would set
`object` to null and then try and use it instead of using the default
backup value.
This commit has no behavior changes.
In particular, this does not fix any of the wrong uses of the previous
default parameter (which used to be 'false', meaning "only replace the
first occurence in the string"). It simply replaces the default uses by
String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.
This test doesn't test AK::String, but LibC's sprintf instead, so it
does not belong in `Tests/AK`. This also means this test won't be ran on
Lagom using the host OS's printf implementation.
Fixes a deprecated declaration warning when compiling with macOS SDK 13.
Usually the values of the previous and next pointers of deleted buckets
are never used, as they're not part of the main ordered bucket chain,
but if an in-place rehashing is done, which results in the bucket being
turned into a free bucket, the stale pointers will remain, at which
point any item that is inserted into said free-bucket will have either
a stale previous pointer if the HashTable was empty on insertion, or a
stale next pointer, resulting in undefined behaviour.
This commit also includes a new HashMap test that reproduces this issue
On oss-fuzz, the LibJS REPL is provided a file encoded with Windows-1252
with the following contents:
/ô¡°½/
The REPL assumes the input file is UTF-8. So in Windows-1252, the above
is represented as [0x2f 0xf4 0xa1 0xb0 0xbd 0x2f]. The inner 4 bytes are
actually a valid UTF-8 encoding if we only look at the most significant
bits to parse leading/continuation bytes. However, it decodes to the
code point U+121c3d, which is not a valid code point.
This commit adds additional validation to ensure the decoded code point
itself is also valid.
This implements Optional<T&> as a T*, whose presence has been missing
since the early days of Optional.
As a lot of find_foo() APIs return an Optional<T> which imposes a
pointless copy on the underlying value, and can sometimes be very
misleading, with this change, those APIs can return Optional<T&>.
This caused a system-wide crash because of a previous bug relating to
non-trivial types in HashTable. Therefore, check that such types
actually work under various workloads.
Thrashing is what I call the situations where a table is mostly filled
with deleted markers, causing an increase in size (at least temporarily)
when a simple re-hash would be enough to get rid of those. This happens
when a hash table (especially with many elements) has a lot of deletes
and re-inserts done to it, which is what this benchmark does.
This is an enum-like type that works with arbitrary sized storage > u64,
which is the limit for a regular enum class - which limits it to 64
members when needing bit field behavior.
Co-authored-by: Ali Mohammad Pur <mpfard@serenityos.org>
Previously, case-insensitively searching the haystack "Go Go Back" for
the needle "Go Back" would return false:
1. Match the first three characters. "Go ".
2. Notice that 'G' and 'B' don't match.
3. Skip ahead 3 characters, plus 1 for the outer for-loop.
4. Now, the haystack is effectively "o Back", so the match fails.
Reducing the skip by 1 fixes this issue. I'm not 100% convinced this
fixes all cases, but I haven't been able to find any cases where it
doesn't work now. :^)
This is the IPv6 counter part to the IPv4Address class and implements
parsing strings into a in6_addr and formatting one as a string. It
supports the address compression scheme as well as IPv4 mapped
addresses.
Parse JSON floating point literals properly,
No longer throwing a SyntaxError when the decimal portion
of the number exceeds the capacity of u32.
Added tests to AK/TestJSON and LibJS/builtins/JSON/JSON.parse
Before this was incorrectly assuming that if the current node `n` was at
least the key and the left child of `n` was below the key that `n` was
always correct.
However, the right child(ren) of the left child of `n` could still be
at least the key.
Also added some tests which produced the wrong results before this.
Apologies for the enormous commit, but I don't see a way to split this
up nicely. In the vast majority of cases it's a simple change. A few
extra places can use TRY instead of manual error checking though. :^)
Rather than casting the FixedPoint to double, format the FixedPoint
directly. This avoids using floating point instruction, which in
turn enables this to be used even in the kernel.
This makes the following code behave as expected:
Variant<int, String> x { some_string() };
x.visit(
[](String const&) {}, // Expectation is for this to be called
[](auto&) {});
Except for tangential accessors such as data(), there is no more feature
of FixedArray that is untested after this large expansion of its test
cases. These tests, with the help of the new NoAllocationGuard, also
test the allocation contract that was fixated in the last commit.
Hopefully this builds confidence in future Kernel uses of FixedArray
as well as its establishment in the real-time parts of the audio
subsystem. I'm excited :^)
FixedArray always *almost* had the following allocation guarantees:
There is (possibly) one allocation in the constructor and one (or more)
deallocation(s) in the destructor. No other operation allocates or
deallocates. With this removal of the public clear() method, which
nobody except the test used anyways, those guarantees are now completely
true and furthermore fixated with an explanatory comment.
This mechanism was unsafe to use in any multithreaded context, since
the hook function was invoked on a raw pointer *after* decrementing
the local ref count.
Since we don't use it for anything anymore, let's just get rid of it.
Currently, we define a CaseInsensitiveStringTraits structure for String.
Using this structure for StringView involves allocating a String from
that view, and a second string to convert that intermediate string to
lowercase.
This defines CaseInsensitiveStringViewTraits (and the underlying helper
case_insensitive_string_hash) to avoid allocations.
FixedArray now doesn't expose any infallible constructors anymore.
Rather, it exposes fallible methods. Therefore, it can be used for
OOM-safe code.
This commit also converts the rest of the system to use the new API.
However, as an example, VMObject can't take advantage of this yet,
as we would have to endow VMObject with a fallible static
construction method, which would require a very fundamental change
to VMObject's whole inheritance hierarchy.
The previous implementation had some pretty short cycles and two fixed
points (1711463637 and 2389024350). If two keys hashed to one of these
values insertions and lookups would loop forever.
This version is based on a standard xorshift PRNG with period 2**32-1.
The all-zero state is usually forbidden, so we insert it into the cycle
at an arbitrary location.
As it was, negative predicate test for remove_all_matching was
run on empty hash map, and could not remove anything, so test always
returned true. By duplicating it in state where hash maps contains
elements, we make sure that negative predicate has something to
do nothing on.
This is a raffinement of 49cbd4dcca.
Previously, the container was scanned to compute the size in the unhappy
path. Now, using `all_of` happy and unhappy path should be fast.
The goal of this file is to enable C++ overloaded functions for
standard builtin functions that we use. It contains fallback
implementations for systems that do not have the builtins available.
This unbreaks the /var/run/utmp system which starts out as an empty
string, and is then turned into an object by the first update.
This isn't necessarily the best way for this to work, but it's how
it used to work, so this just fixes the regression for now.
This isn't a complete conversion to ErrorOr<void>, but a good chunk.
The end goal here is to propagate buffer allocation failures to the
caller, and allow the use of TRY() with formatting functions.
Also add slightly richer parse errors now that we can include a string
literal with returned errors.
This will allow us to use TRY() when working with JSON data.
When I added this code in 1472f6d, I forgot to add tests for it. That's
why I didn't realize that the values were appended to the wrong
FormatBuilder object, so an empty string was returned instead of the
expected "nan"/"inf". This made debugging some FPU issues with the
ScummVM port significantly more difficult.
In the long-term, we should probably have a way to signal decoding
failure. For now, it should suffice to at least not crash. This is
particularly relevant because apparently this can be triggered while
parsing a PEM certificate, which happens during every TLS connection.
Found by OSS Fuzz
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38979