Currently, the floating point to string conversion is implemented
several times across the codebase. This commit provides a pretty
low-level function to unify all of such conversions. It converts the
given double to a fixed point decimal satisfying a few correctness
criteria.
Because we still support u64 and i64 (on top of i32 and u32) we do still
have to parse the number ourself first. Then if we determine that the
number is a floating point or is outside of the range of i64 and u64 we
fallback and parse it as a double.
Before JsonParser had ifdefs guarding the double computation, but it
just build when we error on ifdef KERNEL so JsonParser is no longer
usable in the Kernel. This can be remedied fairly easily but since
it is not needed we #error on that for now.
Similar to decimal floating point parsing the current strtod hex float
parsing gives a lot of incorrect results. We can use a similar technique
as with decimal parsing however hex floats are much simpler as we don't
need to scale with a power of 5.
For hex floats we just provide the parse_first_hexfloat API as there is
currently no need for a parse_hexfloat_completely API.
Again the accepted input for parse_first_hexfloat is very lenient and
any validation should be done before calling this method.
This is based on the paper by Daniel Lemire called
"Number parsing at a Gigabyte per second", currently available at
https://arxiv.org/abs/2101.11408
An implementation can be found at
https://github.com/fastfloat/fast_float
To support both strtod like methods and String::to_double we have two
different APIs. The parse_first_floating_point gives back both the
result, next character to read and the error/out of range status.
Out of range here means we rounded to infinity 0.
The other API, parse_floating_point_completely, will return a floating
point only if the given character range contains just the floating point
and nothing else. This can be much faster as we can skip actually
computing the value if we notice we did not parse the whole range.
Both of these APIs support a very lenient format to be usable in as many
places as possible. Also it does not check for "named" values like
"nan", "inf", "NAN" etc. Because this can be different for every usage.
For integers and small values this new method is not faster and often
even a tiny bit slower than the current strtod implementation. However
the strtod implementation is wrong for a lot of values and has a much
less predictable running time.
For correctness this method was tested against known string -> double
datasets from https://github.com/nigeltao/parse-number-fxx-test-data
This method gives 100% accuracy.
The old strtod gave an incorrect value in over 50% of the numbers
tested.
By appending individual bytes as code points, we were "breaking apart"
multi-byte UTF-8 code points. This now behaves the same way as the
invert_case() helper in StringUtils.
If the entire string you want to right-trim consists of characters you
want to remove, we previously would incorrectly leave the first
character there.
For example: `trim("aaaaa", "a")` would return "a" instead of "".
We can't use `i >= 0` in the loop since that would fail to detect
underflow, so instead we keep `i` in the range `size .. 1` and then
subtract 1 from it when reading the character.
Added some trim() tests while I was at it. (And to confirm that this was
the issue.)
Instead of doing anything reasonable, Utf8CodePointIterator returned
invalid code points, for example U+123456. However, many callers of this
iterator assume that a code point is always at most 0x10FFFF.
In fact, this is one of two reasons for the following OSS Fuzz issue:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=49184
This is probably a very old bug.
In the particular case of URLParser, AK::is_url_code_point got confused:
return /* ... */ || code_point >= 0xA0;
If code_point is a "code point" beyond 0x10FFFF, this violates the
condition given in the preceding comment, but satisfies the given
condition, which eventually causes URLParser to crash.
This commit fixes *only* the erroneous UTF-8 decoding, and does not
fully resolve OSS-Fuzz#49184.
Doesn't use them in libc headers so that those don't have to pull in
AK/Platform.h.
AK_COMPILER_GCC is set _only_ for gcc, not for clang too. (__GNUC__ is
defined in clang builds as well.) Using AK_COMPILER_GCC simplifies
things some.
AK_COMPILER_CLANG isn't as much of a win, other than that it's
consistent with AK_COMPILER_GCC.
We were dropping the base URL path components in the resulting URL due
to mistakenly determining the input URL to start with a Windows drive
letter. Fix this, add a spec link, and a test.
This caused the m_allocation_enabled_previously member to be technically
uninitialized when the compiler emits the implicit destructor call for
stack allocated classes.
This was pointed out by gcc on lagom builds, no clue how this was flying
under the radar for so long and is not triggering CI.
This is a set of functions that allow you to convert between arbitrary
IEEE 754 floating point types, as long as they can be represented
within 64 bits. Conversion methods between floats and doubles are
provided, as well as a generic `float_to_float()`.
Example usage:
#include <AK/FloatingPoint.h>
double val = 1.234;
auto weird_f16 =
convert_from_native_double<FloatingPointBits<0, 6, 10>>(val);
Signed and unsigned floats are supported, and both NaN and +/-Inf are
handled correctly. Values that do not fit in the target floating point
type are clamped.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
This commit moves the length calculations out to be directly on the
StringView users. This is an important step towards the goal of removing
StringView(char const*), as it moves the responsibility of calculating
the size of the string to the user of the StringView (which will prevent
naive uses causing OOB access).
Previously we would treat the empty string as `null`. This caused
JavaScript like this to fail:
```js
var object = {};
try {
object = JSON.parse("");
} catch {}
var array = object.array || [];
```
Since `JSON.parse("")` returned null instead of throwing, it would set
`object` to null and then try and use it instead of using the default
backup value.
This commit has no behavior changes.
In particular, this does not fix any of the wrong uses of the previous
default parameter (which used to be 'false', meaning "only replace the
first occurence in the string"). It simply replaces the default uses by
String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.
This test doesn't test AK::String, but LibC's sprintf instead, so it
does not belong in `Tests/AK`. This also means this test won't be ran on
Lagom using the host OS's printf implementation.
Fixes a deprecated declaration warning when compiling with macOS SDK 13.
Usually the values of the previous and next pointers of deleted buckets
are never used, as they're not part of the main ordered bucket chain,
but if an in-place rehashing is done, which results in the bucket being
turned into a free bucket, the stale pointers will remain, at which
point any item that is inserted into said free-bucket will have either
a stale previous pointer if the HashTable was empty on insertion, or a
stale next pointer, resulting in undefined behaviour.
This commit also includes a new HashMap test that reproduces this issue
On oss-fuzz, the LibJS REPL is provided a file encoded with Windows-1252
with the following contents:
/ô¡°½/
The REPL assumes the input file is UTF-8. So in Windows-1252, the above
is represented as [0x2f 0xf4 0xa1 0xb0 0xbd 0x2f]. The inner 4 bytes are
actually a valid UTF-8 encoding if we only look at the most significant
bits to parse leading/continuation bytes. However, it decodes to the
code point U+121c3d, which is not a valid code point.
This commit adds additional validation to ensure the decoded code point
itself is also valid.
This implements Optional<T&> as a T*, whose presence has been missing
since the early days of Optional.
As a lot of find_foo() APIs return an Optional<T> which imposes a
pointless copy on the underlying value, and can sometimes be very
misleading, with this change, those APIs can return Optional<T&>.
This caused a system-wide crash because of a previous bug relating to
non-trivial types in HashTable. Therefore, check that such types
actually work under various workloads.
Thrashing is what I call the situations where a table is mostly filled
with deleted markers, causing an increase in size (at least temporarily)
when a simple re-hash would be enough to get rid of those. This happens
when a hash table (especially with many elements) has a lot of deletes
and re-inserts done to it, which is what this benchmark does.
This is an enum-like type that works with arbitrary sized storage > u64,
which is the limit for a regular enum class - which limits it to 64
members when needing bit field behavior.
Co-authored-by: Ali Mohammad Pur <mpfard@serenityos.org>
Previously, case-insensitively searching the haystack "Go Go Back" for
the needle "Go Back" would return false:
1. Match the first three characters. "Go ".
2. Notice that 'G' and 'B' don't match.
3. Skip ahead 3 characters, plus 1 for the outer for-loop.
4. Now, the haystack is effectively "o Back", so the match fails.
Reducing the skip by 1 fixes this issue. I'm not 100% convinced this
fixes all cases, but I haven't been able to find any cases where it
doesn't work now. :^)
This is the IPv6 counter part to the IPv4Address class and implements
parsing strings into a in6_addr and formatting one as a string. It
supports the address compression scheme as well as IPv4 mapped
addresses.
Parse JSON floating point literals properly,
No longer throwing a SyntaxError when the decimal portion
of the number exceeds the capacity of u32.
Added tests to AK/TestJSON and LibJS/builtins/JSON/JSON.parse
Before this was incorrectly assuming that if the current node `n` was at
least the key and the left child of `n` was below the key that `n` was
always correct.
However, the right child(ren) of the left child of `n` could still be
at least the key.
Also added some tests which produced the wrong results before this.
Apologies for the enormous commit, but I don't see a way to split this
up nicely. In the vast majority of cases it's a simple change. A few
extra places can use TRY instead of manual error checking though. :^)
Rather than casting the FixedPoint to double, format the FixedPoint
directly. This avoids using floating point instruction, which in
turn enables this to be used even in the kernel.
This makes the following code behave as expected:
Variant<int, String> x { some_string() };
x.visit(
[](String const&) {}, // Expectation is for this to be called
[](auto&) {});
Except for tangential accessors such as data(), there is no more feature
of FixedArray that is untested after this large expansion of its test
cases. These tests, with the help of the new NoAllocationGuard, also
test the allocation contract that was fixated in the last commit.
Hopefully this builds confidence in future Kernel uses of FixedArray
as well as its establishment in the real-time parts of the audio
subsystem. I'm excited :^)
FixedArray always *almost* had the following allocation guarantees:
There is (possibly) one allocation in the constructor and one (or more)
deallocation(s) in the destructor. No other operation allocates or
deallocates. With this removal of the public clear() method, which
nobody except the test used anyways, those guarantees are now completely
true and furthermore fixated with an explanatory comment.
This mechanism was unsafe to use in any multithreaded context, since
the hook function was invoked on a raw pointer *after* decrementing
the local ref count.
Since we don't use it for anything anymore, let's just get rid of it.
Currently, we define a CaseInsensitiveStringTraits structure for String.
Using this structure for StringView involves allocating a String from
that view, and a second string to convert that intermediate string to
lowercase.
This defines CaseInsensitiveStringViewTraits (and the underlying helper
case_insensitive_string_hash) to avoid allocations.
FixedArray now doesn't expose any infallible constructors anymore.
Rather, it exposes fallible methods. Therefore, it can be used for
OOM-safe code.
This commit also converts the rest of the system to use the new API.
However, as an example, VMObject can't take advantage of this yet,
as we would have to endow VMObject with a fallible static
construction method, which would require a very fundamental change
to VMObject's whole inheritance hierarchy.
The previous implementation had some pretty short cycles and two fixed
points (1711463637 and 2389024350). If two keys hashed to one of these
values insertions and lookups would loop forever.
This version is based on a standard xorshift PRNG with period 2**32-1.
The all-zero state is usually forbidden, so we insert it into the cycle
at an arbitrary location.
As it was, negative predicate test for remove_all_matching was
run on empty hash map, and could not remove anything, so test always
returned true. By duplicating it in state where hash maps contains
elements, we make sure that negative predicate has something to
do nothing on.
This is a raffinement of 49cbd4dcca.
Previously, the container was scanned to compute the size in the unhappy
path. Now, using `all_of` happy and unhappy path should be fast.
The goal of this file is to enable C++ overloaded functions for
standard builtin functions that we use. It contains fallback
implementations for systems that do not have the builtins available.
This unbreaks the /var/run/utmp system which starts out as an empty
string, and is then turned into an object by the first update.
This isn't necessarily the best way for this to work, but it's how
it used to work, so this just fixes the regression for now.
This isn't a complete conversion to ErrorOr<void>, but a good chunk.
The end goal here is to propagate buffer allocation failures to the
caller, and allow the use of TRY() with formatting functions.
Also add slightly richer parse errors now that we can include a string
literal with returned errors.
This will allow us to use TRY() when working with JSON data.
When I added this code in 1472f6d, I forgot to add tests for it. That's
why I didn't realize that the values were appended to the wrong
FormatBuilder object, so an empty string was returned instead of the
expected "nan"/"inf". This made debugging some FPU issues with the
ScummVM port significantly more difficult.
In the long-term, we should probably have a way to signal decoding
failure. For now, it should suffice to at least not crash. This is
particularly relevant because apparently this can be triggered while
parsing a PEM certificate, which happens during every TLS connection.
Found by OSS Fuzz
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38979
DisjointChunks<T> provides a nice interface over multiple sequential
Vector<T>'s, allowing the user to iterate over/index into/slice from
said buffers as if they were a single contiguous buffer.
To work with views on such objects, DisjointSpans<T> is provided, which
has the same behaviour but does not own the underlying objects.
This removes the awkward String::replace API which was the only String
API which mutated the String and replaces it with a new immutable
version that returns a new String with the replacements applied. This
also fixes a couple of UAFs that were caused by the use of this API.
As an optimization an equivalent StringView::replace API was also added
to remove an unnecessary String allocations in the format of:
`String { view }.replace(...);`
This avoids a value copy when calling value() or value_or() on a
temporary Optional. This is very common when using the HashMap::get()
API like this:
auto value = hash_map.get(key).value_or(fallback_value);
Using a file(GLOB) to find all the test files in a directory is an easy
hack to get things started, but has some drawbacks. Namely, if you add
a test, it won't be found again without re-running CMake. `ninja` seems
to do this automatically, but it would be nice to one day stop seeing it
rechecking our globbed directories.
When swapping the same object, we could end up with a double-free error.
This was found while quick-sorting a Vector of Variants holding complex
types, reproduced by the new swap_same_complex_object test case.
This parsing is already duplicated between LibJS and LibRegex, and will
shortly be needed in more places in those libraries. Move it to AK to
prevent further duplication.
This API will consume escaped Unicode code points of the form:
\\u{code point}
\\unnnn (where each n is a hexadecimal digit)
\\unnnn\\unnnn (where the two escaped values are a surrogate pair)
This helps us avoid weird truncation issues and fixes a bug on Clang
builds where truncation while reading caused the DIE offsets following
large LEB128 numbers to be incorrect. This removes the need for the
separate `LongUnsignedNumber` type.
Improve the parsing of data urls in URLParser to bring it more up-to-
spec. At the moment, we cannot parse the components of the MIME type
since it is represented as a string, but the spec requires it to be
parsed as a "MIME type record".
The following commit broke Tests/AK/TestJSON.cpp as it removed the
file that the test loaded from disk to validate JSON parsing.
commit ad141a2286
Author: Andreas Kling <kling@serenityos.org>
Date: Sat Jul 31 15:26:14 2021 +0200
Base: Remove "test.frm" from HackStudio test project
Instead of restoring the file, lets just embed a bit of JSON in the
test case to avoid using external resources, as they obviously are
surprising and make the test less portable across environments.
Previously there was no way to create a MACAddress by passing a direct
address as a string. This will allow programs like the arp utility to
create a MACAddress instance by user-passed addresses.
This is a generally nicer-to-use version of the existing {any,all}_of()
that doesn't require the user to explicitly provide two iterators.
As a bonus, it also allows arbitrary iterators (as opposed to the hard
requirement of providing SimpleIterators in the iterator version).
The state of the formatter for the previous element should be thrown
away for each iteration. This showed up when trying to format a
Vector<String>, since Formatter<StringView> was unhappy about some state
that gets set when it's called. Add a test for Formatter<Vector>.
Prior to this, it'd try to stuff them into an i64, which could fail and
give us nothing.
Even though this is an extension we've made to JSON, the parser should
be able to correctly round-trip from whatever our serialiser has
generated.
Since Clang enables a couple of warnings that we don't have in GCC,
these were not caught before. Included fixes:
- Use correct printf format string for `size_t`
- Don't compare Nonnull(Ref|Own)Ptr` to nullptr
- Fix unsigned int& => unsigned long& conversion
Let's bring this class back, but without the confusing resize() API.
A FixedArray<T> is simply a fixed-size array of T.
The size is provided at run-time, unlike Array<T> where the size is
provided at compile-time.
This adds a test case for String::find and String::find_all with empty
needles. The expected behavior is in line with what the C++ standard
library (and other languages standard libraries) expect.
This implements StringUtils::find_any_of() and uses it in
String::find_any_of() and StringView::find_any_of(). All uses of
find_{first,last}_of have been replaced with find_any_of(), find() or
find_last(). find_{first,last}_of have subsequently been removed.
This removes StringView::find_first_of(char) and find_last_of(char) and
replaces all its usages with find and find_last respectively. This is
because those two methods are functionally equivalent.
find_{first,last}_of should only be used if searching for multiple
different characters, which is never the case with the char argument.
This also adds the [[nodiscard]] to the remaining find_{first,last}_of
methods.
This replaces the current LexicalPath::append() API with a new method
that returns a new LexicalPath object and doesn't touch the this-object.
With this, LexicalPath is now immutable. It also adds a
LexicalPath::parent() method and the relevant test cases.
Since this is always set to true on the non-default constructor and
subsequently never modified, it is somewhat pointless. Furthermore,
there are arguably no invalid relative paths.
Also add some tests to ensure that they _remain_ constexpr.
In general, any runtime assertions, weirdo C casts, pointer aliasing,
and such shenanigans should be gated behind the (helpfully newly added)
AK::is_constant_evaluated() function when the intention is to write
constexpr-capable code.
a.k.a. deliver promises of constexpr-ness :P
This commit converts naked `new`s to `AK::try_make` and `AK::try_create`
wherever possible. If the called constructor is private, this can not be
done, so we instead now use the standard-defined and compiler-agnostic
`new (nothrow)`.
This declares all test cases which compare function outputs over the
entire Unicode range as `BENCHMARK_CASE`, to avoid them being run by CI.
This reduces runtime of TestCharacterTypes (without benchmarks) by about
one third.
The insert_before method on AK::InlineLinkedList is used, so in order to
achieve feature parity, we need to implement it for AK::IntrusiveList as
well.
Doing these as custom classes might be faster, especially when writing
them in SSE, but this would cause a lot of Code duplication and due to
the nature of constexprs and the intelligence of the compiler they might
be using SSE/MMX either way
This commit makes it possible to instantiate `Vector<T&>` and use it
to store references to `T` in a vector.
All non-pointer observers are made to return the reference, and the
pointer observers simply yield the underlying pointer.
Note that the 'find_*' methods act on the values and not the pointers
that are stored in the vector.
This commit also makes errors in various vector methods much more
readable by directly using requires-clauses on them.
And finally, it should be noted that Vector cannot hold temporaries :^)
Because non-ASCII code points have negative byte values, trimming away
control characters requires checking for negative bytes values.
This also adds a test case with a URL containing non-ASCII code points.
StringView::lines() supports line-separators “\n”, “\r”, and “\r\n”.
The method will drop an entire line if it is surrounded by “\r”
and “\n” separators on the left and right sides respectively.
The previous behavior was to always VERIFY that the UTF-8 bytes were
valid when iterating over the code points of an UTF8View. This change
makes it so we instead output the 0xFFFD 'REPLACEMENT CHARACTER'
code point when encountering invalid bytes, and keep iterating the
view after skipping one byte.
Leaving the decision to the consumer would break symmetry with the
UTF32View API, which would in turn require heavy refactoring and/or
code duplication in generic code such as the one found in
Gfx::Painter and the Shell.
To make it easier for the consumers to detect the original bytes, we
provide a new method on the iterator that returns a Span over the
data that has been decoded. This method is immediately used in the
TextNode::compute_text_for_rendering method, which previously did
this in a ad-hoc waay.
This also add tests for the new behavior in TestUtf8.cpp, as well
as reinforcements to the existing tests to check if the underlying
bytes match up with their expected values.
This patch introduces a new operator== to compare an Optional to its
contained type directly. If the Optional does not contain a value, the
comparison will always return false.
This also adds a test case for the new behavior as well as comparison
between Optional objects themselves.
This adds more tests for AK::URL. Furthermore, this also changes some
tests to conform to what the reworked URL class does (and the URL
specification mostly expects).
This adds a peek method for Utf8CodepointIterator, which enables it to
be used in some parsing cases where peeking is necessary.
peek(0) is equivalent to operator*, expect that peek() does not contain
any assertions and will just return an empty Optional<u32>.
This also implements a test case for iterating UTF-8.
Previously, we would go crazy and shift things way out of bounds.
Add tests to verify that the decoding algorithm is safe around the
limits of the result type.
We had two functions for doing mostly the same thing. Combine both
of them into String::find() and use that everywhere.
Also add some tests to cover basic behavior.
Problem:
- `static` variables consume memory and sometimes are less
optimizable.
- `static const` variables can be `constexpr`, usually.
- `static` function-local variables require an initialization check
every time the function is run.
Solution:
- If a global `static` variable is only used in a single function then
move it into the function and make it non-`static` and `constexpr`.
- Make all global `static` variables `constexpr` instead of `const`.
- Change function-local `static const[expr]` variables to be just
`constexpr`.
Previously <AK/Function.h> also included <AK/OwnPtr.h>. That's about to
change though. This patch fixes a few build problems that will occur
when that change happens.
This changes Variant::visit() to forward the value returned by the
selected visitor invocation. By perfectly forwarding the returned value,
this allows for the visitor to return by value or reference.
Note that all provided visitors must return the same type - the compiler
will otherwise fail with the message: "inconsistent deduction for auto
return type".
The current code is factored such that reads to the entirety of the last
byte should be dropped. This was relying on the fact that last would be
one past the end in that case. Instead of actually reading that byte
when it's completely out of bounds of the bitmask, just skip reads that
would be invalid. Add more tests to make sure that the behavior is
correct for byte aligned reads of byte aligned bitmaps.
The should_not_destroy test case intentionally performs an invalid stack
access on a NeverDestroyed to confirm that the destructor for the held
type was not called.
We can't unref an object to destruction while there's still a live
RefPtr to the object, otherwise the RefPtr destructor will try to
destroy it again, accessing the refcount of a destroyed object (before
realizing that oops! the object is already dead)
Unfortunately adopt_ref requires a reference, which obviously does not
work well with when attempting to harden against allocation failure.
The adopt_ref_if_nonnull() variant will allow you to avoid using bare
pointers, while still allowing you to handle allocation failure.
This allows the construction of `Variant<int, int, int>`.
While this might not seem useful, it is very useful for making variants
that contain a series of member function pointers, which I plan to use
in LibGL for glGenLists() and co.