Commit Graph

212 Commits

Author SHA1 Message Date
Poseydon42
bdd7531bf5 AK: Create relative path even if prefix is not an ancestor of the path 2022-12-14 15:11:03 +00:00
Ali Mohammad Pur
f96a3c002a Everywhere: Stop shoving things into ::std and mentioning them as such
Note that this still keeps the old behaviour of putting things in std by
default on serenity so the tools can be happy, but if USING_AK_GLOBALLY
is unset, AK behaves like a good citizen and doesn't try to put things
in the ::std namespace.

std::nothrow_t and its friends get to stay because I'm being told that
compilers assume things about them and I can't yeet them into a
different namespace...for now.
2022-12-14 11:44:32 +01:00
Marc Luqué
22f472249d AK: Introduce cutoff to insertion sort for Quicksort
Implement insertion sort in AK. The cutoff value 7 is a magic number
here, values [5, 15] should work well. Main idea of the cutoff is to
reduce recursion performed by quicksort to speed up sorting
of small partitions.
2022-12-12 15:03:57 +00:00
kleines Filmröllchen
16ca41ec10 AK: Add LexicalPath::is_child_of
This API checks whether this path is a child of (or the same as) another
path.
2022-12-11 16:05:23 +00:00
Maciej
58f5deba70 AK: Unref old m_data in String's move assignment
We were overridding the data pointer without unreffing it,
causing a memory leak when assigning a String.
2022-12-09 00:02:53 +01:00
Timothy Flynn
949f5460fb AK: Add formatters for Span<T> and Span<T const>
This generalizes the formatter currently used for Vector to be usable
for any Span.
2022-12-08 17:14:48 +01:00
Poseydon42
d2334957ba Tests: Add tests for Checked<> decrement operator 2022-12-08 07:20:14 -05:00
Andreas Kling
a3e82eaad3 AK: Introduce the new String, replacement for DeprecatedString
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.

Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.

Problems in DeprecatedString:

- It assumes string allocation never fails. This makes it impossible
  to use in allocation-sensitive contexts, and is the reason we had to
  ban DeprecatedString from the kernel entirely.

- The awkward null state. DeprecatedString can be null. It's different
  from the empty state, although null strings are considered empty.
  All code is immediately nicer when using Optional<DeprecatedString>
  but DeprecatedString came before Optional, which is how we ended up
  like this.

- The encoding of the underlying data is ambiguous. For the most part,
  we use it as if it's always UTF-8, but there have been cases where
  we pass around strings in other encodings (e.g ISO8859-1)

- operator[] and length() are used to iterate over DeprecatedString one
  byte at a time. This is done all over the codebase, and will *not*
  give the right results unless the string is all ASCII.

How we solve these issues in the new String:

- Functions that may allocate now return ErrorOr<String> so that ENOMEM
  errors can be passed to the caller.

- String has no null state. Use Optional<String> when needed.

- String is always UTF-8. This is validated when constructing a String.
  We may need to add a bypass for this in the future, for cases where
  you have a known-good string, but for now: validate all the things!

- There is no operator[] or length(). You can get the underlying data
  with bytes(), but for iterating over code points, you should be using
  an UTF-8 iterator.

Furthermore, it has two nifty new features:

- String implements a small string optimization (SSO) for strings that
  can fit entirely within a pointer. This means up to 3 bytes on 32-bit
  platforms, and 7 bytes on 64-bit platforms. Such small strings will
  not be heap-allocated.

- String can create substrings without making a deep copy of the
  substring. Instead, the superstring gets +1 refcount from the
  substring, and it acts like a view into the superstring. To make
  substrings like this, use the substring_with_shared_superstring() API.

One caveat:

- String does not guarantee that the underlying data is null-terminated
  like DeprecatedString does today. While this was nifty in a handful of
  places where we were calling C functions, it did stand in the way of
  shared-superstring substrings.
2022-12-06 15:21:26 +01:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Linus Groh
d26aabff04 Everywhere: Run clang-format 2022-12-03 23:52:23 +00:00
Timothy Flynn
13b18a182a AK: Add JSON object/array for-each methods for fallible callbacks
This allows the provided callback to return an ErrorOr-like type to
propagate errors back to the caller.
2022-11-18 12:21:57 +00:00
Daniel Bertalan
269a931414 Tests/AK: Re-enable HashTable<double> test
The incorrect UBSan alignment check that made this test fail has been
fixed in Clang 15.

Closes #13614
2022-11-15 10:45:41 +02:00
Sam Atkins
cf046dbfdb AK: Add optional explicit cast to underlying type to DistinctNumeric 2022-11-11 17:50:53 +03:30
Sam Atkins
c33eae24f9 AK+Everywhere: Replace DistinctNumeric bool parameters with named ones
This means that rather than this:

```
AK_TYPEDEF_DISTINCT_NUMERIC_GENERAL(u64, true, true, false, false,
    false, true, FunctionAddress);
```

We now have this:
```
AK_TYPEDEF_DISTINCT_NUMERIC_GENERAL(u64, FunctionAddress, Arithmetic,
    Comparison, Increment);
```

Which is a lot more readable. :^)

Co-authored-by: Ali Mohammad Pur <mpfard@serenityos.org>
2022-11-11 17:50:53 +03:30
Zaggy1024
a1300d3797 AK: Don't crash in HashTable::clear_with_capacity on an empty table
When calling clear_with_capacity on an empty HashTable/HashMap, a null
deref would occur when trying to memset() m_buckets. Checking that it
has capacity before clearing fixes the issue.
2022-11-11 00:44:04 -07:00
Ali Mohammad Pur
40b07901ac AK: Allow Variant::downcast<OtherVariantType>()
We usually give type aliases to variants, so their variant types are not
always available, so make it possible to downcast to another variant
type.
2022-11-10 16:02:42 +03:30
Dan Klishch
fdc53a5995 AK: Add framework for a unified floating point to string conversion
Currently, the floating point to string conversion is implemented
several times across the codebase. This commit provides a pretty
low-level function to unify all of such conversions. It converts the
given double to a fixed point decimal satisfying a few correctness
criteria.
2022-11-03 20:17:09 -06:00
demostanis
7c33f8f7df AK: Add SplitBehavior::KeepTrailingSeparator with tests 2022-10-24 23:29:18 +01:00
demostanis
3e8b5ac920 AK+Everywhere: Turn bool keep_empty to an enum in split* functions 2022-10-24 23:29:18 +01:00
davidot
c9aa664eb0 AK: Make the JsonParser use the new double parser for numbers
Because we still support u64 and i64 (on top of i32 and u32) we do still
have to parse the number ourself first. Then if we determine that the
number is a floating point or is outside of the range of i64 and u64 we
fallback and parse it as a double.

Before JsonParser had ifdefs guarding the double computation, but it
just build when we error on ifdef KERNEL so JsonParser is no longer
usable in the Kernel. This can be remedied fairly easily but since
it is not needed we #error on that for now.
2022-10-23 15:48:45 +02:00
davidot
2334cd85a2 AK: Add an exact and fast hex float parsing algorithm
Similar to decimal floating point parsing the current strtod hex float
parsing gives a lot of incorrect results. We can use a similar technique
as with decimal parsing however hex floats are much simpler as we don't
need to scale with a power of 5.

For hex floats we just provide the parse_first_hexfloat API as there is
currently no need for a parse_hexfloat_completely API.

Again the accepted input for parse_first_hexfloat is very lenient and
any validation should be done before calling this method.
2022-10-23 15:48:45 +02:00
davidot
53b7f5e6a1 AK: Add an exact and fast floating point parsing algorithm
This is based on the paper by Daniel Lemire called
"Number parsing at a Gigabyte per second", currently available at
https://arxiv.org/abs/2101.11408
An implementation can be found at
https://github.com/fastfloat/fast_float

To support both strtod like methods and String::to_double we have two
different APIs. The parse_first_floating_point gives back both the
result, next character to read and the error/out of range status.
Out of range here means we rounded to infinity 0.

The other API, parse_floating_point_completely, will return a floating
point only if the given character range contains just the floating point
and nothing else. This can be much faster as we can skip actually
computing the value if we notice we did not parse the whole range.

Both of these APIs support a very lenient format to be usable in as many
places as possible. Also it does not check for "named" values like
"nan", "inf", "NAN" etc. Because this can be different for every usage.

For integers and small values this new method is not faster and often
even a tiny bit slower than the current strtod implementation. However
the strtod implementation is wrong for a lot of values and has a much
less predictable running time.

For correctness this method was tested against known string -> double
datasets from https://github.com/nigeltao/parse-number-fxx-test-data
This method gives 100% accuracy.
The old strtod gave an incorrect value in over 50% of the numbers
tested.
2022-10-23 15:48:45 +02:00
davidot
bf6d4a5cbf AK: Make truncating UFixedBigInts constexpr
Also add some tests and shift tests while we're at it.
2022-10-23 15:48:45 +02:00
Timothy Flynn
b9dc0b7d1b AK: Do not append string bytes as code points when title-casing a string
By appending individual bytes as code points, we were "breaking apart"
multi-byte UTF-8 code points. This now behaves the same way as the
invert_case() helper in StringUtils.
2022-10-20 18:55:43 +02:00
Sam Atkins
a0d44026fc AK+Tests: Correct off-by-one error when right-trimming text
If the entire string you want to right-trim consists of characters you
want to remove, we previously would incorrectly leave the first
character there.

For example: `trim("aaaaa", "a")` would return "a" instead of "".

We can't use `i >= 0` in the loop since that would fail to detect
underflow, so instead we keep `i` in the range `size .. 1` and then
subtract 1 from it when reading the character.

Added some trim() tests while I was at it. (And to confirm that this was
the issue.)
2022-10-11 17:49:32 +02:00
Ben Wiederhake
ff8f3814cc AK+Tests: Avoid creating invalid code points from malformed UTF-8
Instead of doing anything reasonable, Utf8CodePointIterator returned
invalid code points, for example U+123456. However, many callers of this
iterator assume that a code point is always at most 0x10FFFF.

In fact, this is one of two reasons for the following OSS Fuzz issue:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=49184
This is probably a very old bug.

In the particular case of URLParser, AK::is_url_code_point got confused:
    return /* ... */ || code_point >= 0xA0;
If code_point is a "code point" beyond 0x10FFFF, this violates the
condition given in the preceding comment, but satisfies the given
condition, which eventually causes URLParser to crash.

This commit fixes *only* the erroneous UTF-8 decoding, and does not
fully resolve OSS-Fuzz#49184.
2022-10-09 10:37:20 -06:00
Nico Weber
2af028132a AK+Everywhere: Add AK_COMPILER_{GCC,CLANG} and use them most places
Doesn't use them in libc headers so that those don't have to pull in
AK/Platform.h.

AK_COMPILER_GCC is set _only_ for gcc, not for clang too. (__GNUC__ is
defined in clang builds as well.) Using AK_COMPILER_GCC simplifies
things some.

AK_COMPILER_CLANG isn't as much of a win, other than that it's
consistent with AK_COMPILER_GCC.
2022-10-04 23:35:07 +01:00
Andreas Kling
287a9b552a AK: Fix bad parsing of some file:/// URLs with base URL
We were dropping the base URL path components in the resulting URL due
to mistakenly determining the input URL to start with a Windows drive
letter. Fix this, add a spec link, and a test.
2022-09-20 15:38:53 +02:00
Hendiadyoin1
6b6510b577 AK+Tests: Don't double-destroy NoAllocationGuard in TestFixedArray
This caused the m_allocation_enabled_previously member to be technically
uninitialized when the compiler emits the implicit destructor call for
stack allocated classes.
This was pointed out by gcc on lagom builds, no clue how this was flying
under the radar for so long and is not triggering CI.
2022-09-15 23:04:46 +00:00
Brian Gianforcaro
d0a1775369 Everywhere: Fix a variety of typos
Spelling fixes found by `codespell`.
2022-09-14 04:46:49 +00:00
davidot
75ebcf6b4a AK: Allow exponents in JSON double values
This is required for ECMA-404 compliance, but probably not for serenity
itself.
2022-09-02 02:07:37 +01:00
Jelle Raaijmakers
8483064b59 AK: Add FloatingPoint.h
This is a set of functions that allow you to convert between arbitrary
IEEE 754 floating point types, as long as they can be represented
within 64 bits. Conversion methods between floats and doubles are
provided, as well as a generic `float_to_float()`.

Example usage:

  #include <AK/FloatingPoint.h>

  double val = 1.234;
  auto weird_f16 =
      convert_from_native_double<FloatingPointBits<0, 6, 10>>(val);

Signed and unsigned floats are supported, and both NaN and +/-Inf are
handled correctly. Values that do not fit in the target floating point
type are clamped.
2022-08-27 12:28:05 +02:00
Linus Groh
5a106b6401 Everywhere: Prefix 'TYPEDEF_DISTINCT_NUMERIC_GENERAL' with 'AK_' 2022-07-22 23:09:43 +01:00
Ali Mohammad Pur
0d6dc74951 AK: Use the correct data types in bitap_bitwise()
Otherwise the bit twiddling goes all wrong and breaks some boundary
cases.
Fixes `StringView::contains(31-chars)`.
2022-07-14 13:10:23 +02:00
sin-ack
d16544100f Tests: Remove StringView char const* initialization test
We now explicitly disallow this.
2022-07-12 23:11:35 +02:00
sin-ack
604aac531c AK+Userland+Tests: Remove URL(char const*) constructor
The StringView(char const*) constructor is being removed, and there was
only a few users of this left, which are also cleaned up in this commit.
2022-07-12 23:11:35 +02:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
sin-ack
c70f45ff44 Everywhere: Explicitly specify the size in StringView constructors
This commit moves the length calculations out to be directly on the
StringView users. This is an important step towards the goal of removing
StringView(char const*), as it moves the responsibility of calculating
the size of the string to the user of the StringView (which will prevent
naive uses causing OOB access).
2022-07-12 23:11:35 +02:00
sin-ack
f6b1db37fc Tests: Convert TestBase64 decode test to use StringViews directly
Previously it would rely on the implicit StringView conversions. Now the
decode_equal function will directly use StringViews.
2022-07-12 23:11:35 +02:00
sin-ack
3e1d0d9425 Tests: Make TestSourceLocation basic_scenario specify StringView length 2022-07-12 23:11:35 +02:00
Luke Wilde
da25ac0d48 AK: Treat empty string as invalid JSON
Previously we would treat the empty string as `null`. This caused
JavaScript like this to fail:
```js
var object = {};
try {
    object = JSON.parse("");
} catch {}
var array = object.array || [];
```
Since `JSON.parse("")` returned null instead of throwing, it would set
`object` to null and then try and use it instead of using the default
backup value.
2022-07-10 23:31:48 +02:00
Maciej
36676a1604 AK: Add IPv4Address::netmask_from_cidr 2022-07-09 09:22:25 +01:00
DexesTTP
7ceeb74535 AK: Use an enum instead of a bool for String::replace(all_occurences)
This commit has no behavior changes.

In particular, this does not fix any of the wrong uses of the previous
default parameter (which used to be 'false', meaning "only replace the
first occurence in the string"). It simply replaces the default uses by
String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.
2022-07-06 11:12:45 +02:00
Daniel Bertalan
e15d6125b2 Tests: Move sprintf test from AK/ to LibC/
This test doesn't test AK::String, but LibC's sprintf instead, so it
does not belong in `Tests/AK`. This also means this test won't be ran on
Lagom using the host OS's printf implementation.

Fixes a deprecated declaration warning when compiling with macOS SDK 13.
2022-07-04 21:46:02 +02:00
Hendiadyoin1
5bf84a5b0e AK: Zero previous pointer *after* fixing the insertion list in HashTable 2022-06-23 20:25:12 +03:00
Idan Horowitz
eb02425ef9 AK: Clear the previous and next pointers of deleted HashTable buckets
Usually the values of the previous and next pointers of deleted buckets
are never used, as they're not part of the main ordered bucket chain,
but if an in-place rehashing is done, which results in the bucket being
turned into a free bucket, the stale pointers will remain, at which
point any item that is inserted into said free-bucket will have either
a stale previous pointer if the HashTable was empty on insertion, or a
stale next pointer, resulting in undefined behaviour.

This commit also includes a new HashMap test that reproduces this issue
2022-06-22 21:53:13 +02:00
Andreas Kling
ede818cbf9 AK: Disable the HashTable<double> test until UB issue is fixed 2022-04-11 00:11:53 +02:00
Andreas Kling
ae6b09f4dc AK: Add hash traits for floating-point primitives
This allows us to use float and double as hash keys.
2022-04-10 12:39:44 +02:00
Timothy Flynn
9e5abec6f1 AK: Invalidate UTF-8 encoded code points larger than U+10ffff
On oss-fuzz, the LibJS REPL is provided a file encoded with Windows-1252
with the following contents:

    /ô¡°½/

The REPL assumes the input file is UTF-8. So in Windows-1252, the above
is represented as [0x2f 0xf4 0xa1 0xb0 0xbd 0x2f]. The inner 4 bytes are
actually a valid UTF-8 encoding if we only look at the most significant
bits to parse leading/continuation bytes. However, it decodes to the
code point U+121c3d, which is not a valid code point.

This commit adds additional validation to ensure the decoded code point
itself is also valid.
2022-04-05 00:14:29 +01:00