Commit Graph

3476 Commits

Author SHA1 Message Date
Andreas Kling
2b8a920a7c AK: Don't blindly use SipHash as default hash function
Although it has some interesting properties, SipHash is brutally slow
compared to our previous hash function. Since its introduction, it has
been highly visible in every profile of doing anything interesting with
LibJS or LibWeb.

By switching back, we gain a 10x speedup for 32-bit hashes, and "only"
a 3x speedup for 64-bit hashes.

This comes out to roughly 1.10x faster HashTable insertion, and roughly
2.25x faster HashTable lookup. Hashing is no longer at the top of
profiles and everything runs measurably faster.

For security-sensitive hash tables with user-controlled inputs, we can
opt into SipHash selectively on a case-by-case basis. The vast majority
of our uses don't fit that description though.
2024-03-25 12:39:23 +01:00
Timothy Flynn
7e38653492 AK: Reject invalid Base64 encoded string lengths 2024-03-25 08:13:27 +01:00
Timothy Flynn
4ecf4c7617 AK: Compute the exact size of decoded Base64 strings 2024-03-25 08:13:27 +01:00
Timothy Flynn
754ff41b9c AK: Remove whitespace skipping feature from AK's Base64 decoder
This was added in commit f2663f477f as a
partial implementation of what is now LibWeb's forgiving Base64 decoder.
All use cases within LibWeb that require whitespace skipping now use
that implementation instead.

Removing this feature from AK allows us to know the exact output size of
a decoded Base64 string. We can still trim whitespace at the start and
end of the input though; for example, this is useful when reading from a
file that may have a newline at the end of the file.
2024-03-25 08:13:27 +01:00
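As a rough illustration of why dropping in-stream whitespace skipping makes the output size knowable up front, here is a minimal sketch. It assumes fully padded input whose length is a multiple of 4; `decoded_base64_size` is a hypothetical name for this example, not the function AK uses, and AK's actual length rules may differ.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not AK's API): computes the decoded size of a padded
// Base64 string without decoding it. Returns an empty optional if the length
// cannot belong to padded Base64 input.
std::optional<std::size_t> decoded_base64_size(std::string_view input)
{
    if (input.size() % 4 != 0)
        return std::nullopt;
    if (input.empty())
        return std::size_t { 0 };

    // Each trailing '=' means one fewer output byte in the final group.
    std::size_t padding = 0;
    if (input.back() == '=') {
        ++padding;
        if (input[input.size() - 2] == '=')
            ++padding;
    }

    // Every 4 input characters carry 3 bytes.
    return input.size() / 4 * 3 - padding;
}
```

With the size known in advance, a decoder can allocate its output buffer once instead of growing it as bytes are written.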
Timothy Flynn
690db10463 AK: Convert Base64 template parameters to regular function parameters
The generated function name is otherwise very long, which makes stack
traces a bit more difficult to sift through.
2024-03-25 08:13:27 +01:00
Timothy Flynn
f292746134 AK: Convert some west-consts to east-const in Base64.cpp
Caught by clang-format-17. Note that clang-format-16 is fine with this
as well (it leaves the const placement alone); it just doesn't perform
the formatting to east-const itself.
2024-03-25 08:13:27 +01:00
Andreas Kling
3bdfca1119 AK: Make FlyString::from_utf8*() avoid allocation if possible
If we already have a FlyString instantiated for the given string,
look that up and return it instead of making a temporary String just to
use as a key into the FlyString table.
2024-03-24 13:28:24 +01:00
Andreas Kling
8d7a1e5654 LibWeb: Skip some redundant UTF-8 validation in CSS tokenizer
If we're just adding code points to a StringBuilder, there's no need to
revalidate the result.
2024-03-24 13:28:24 +01:00
Andreas Kling
a88799c032 AK: Remove excessive hashing caused by FlyString table
Before this change, the global FlyString table looked like this:

    HashMap<StringView, Detail::StringBase>

After this change, we have:

    HashTable<Detail::StringData const*, FlyStringTableHashTraits>

The custom hash traits are used to extract the stored hash from
StringData which avoids having to rehash the StringView repeatedly like
we did before.

This necessitated a handful of smaller changes to make it work.
2024-03-24 13:28:24 +01:00
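The idea of hash traits that reuse a stored hash can be sketched with standard containers. This is only an analogy under stated assumptions: `StringData`, `StoredHash`, `SameCharacters`, and `InternTable` below are simplified stand-ins written for this example, not AK's `Detail::StringData` or `FlyStringTableHashTraits`.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>

// Simplified stand-in for string data that caches its own hash.
struct StringData {
    std::string characters;
    std::size_t hash;

    explicit StringData(std::string s)
        : characters(std::move(s))
        , hash(std::hash<std::string> {}(characters)) // computed once, at construction
    {
    }
};

// The "traits": hashing a table entry just reads the cached value.
struct StoredHash {
    std::size_t operator()(StringData const* data) const { return data->hash; }
};

// Equality still compares contents, so equal strings collapse to one entry.
struct SameCharacters {
    bool operator()(StringData const* a, StringData const* b) const
    {
        return a->characters == b->characters;
    }
};

using InternTable = std::unordered_set<StringData const*, StoredHash, SameCharacters>;
```

Because the table stores pointers and the hasher only reads a field, rehashing on growth or lookup never walks the string bytes again.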
Andreas Kling
8bfad24708 AK: Move AK::Detail::StringData to its own header file
This will allow us to access it from FlyString.cpp
2024-03-24 13:28:24 +01:00
Dan Klishch
45a0ba2167 AK: Introduce AK::enumerate
Co-Authored-By: Tim Flynn <trflynn89@pm.me>
2024-03-23 09:02:58 -04:00
Stanisław Wiśniewski
994fe0b89f AK: Use else if constexpr in explode_byte() 2024-03-21 14:35:20 -06:00
Timothy Flynn
81ad6de41b AK: Avoid creating an intermediate buffer when decoding a Base64 string
There's no need to copy the result. We can also avoid increasing the
size of the output buffer by 1 for each written byte.

This reduces the runtime of `./bin/base64 -d enwik8.base64 >/dev/null`
from 0.917s to 0.632s.

(enwik8 is a 100MB test file from http://mattmahoney.net/dc/enwik8.zip)
2024-03-21 15:53:46 +01:00
Timothy Flynn
0fd7ad09a0 AK: Avoid StringBuilder when creating a Base64-encoded string
We don't really need the features provided by StringBuilder here, since
we know the exact size of the output. Avoiding StringBuilder avoids the
recurring capacity/size checks both within StringBuilder itself and its
internal ByteBuffer.

This reduces the runtime of `./bin/base64 enwik8 >/dev/null` from
0.976s to 0.428s.

(enwik8 is a 100MB test file from http://mattmahoney.net/dc/enwik8.zip)
2024-03-21 15:53:46 +01:00
Timothy Flynn
5f5b8ee9bb AK: Do not perform UTF-8 validation on Base64-encoded strings
We know we are only appending ASCII characters to the StringBuilder, so
do not bother validating the result.

This reduces the runtime of `./bin/base64 enwik8 >/dev/null` from
1.192s to 0.976s.

(enwik8 is a 100MB test file from http://mattmahoney.net/dc/enwik8.zip)
2024-03-21 15:53:46 +01:00
Andrew Kaster
e9b16970fe AK: Add base64url encoding and decoding methods
This encoding scheme comes from section 5 of RFC 4648, as an
alternative to the standard base64 encode/decode methods.

The only difference is that the last two characters are replaced
with '-' and '_', as '+' and '/' are not safe in URLs or filenames.
2024-03-20 12:18:57 -04:00
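To make the difference between the two alphabets concrete, here is a minimal encoder sketch parameterized on the alphabet. The names `standard_alphabet`, `url_alphabet`, and `encode_with_alphabet` are invented for this example and are not AK's API; the sketch also always emits `=` padding, which is an assumption of the example rather than a statement about AK's behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// RFC 4648 alphabets; only the last two characters differ.
constexpr std::string_view standard_alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
constexpr std::string_view url_alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Hypothetical helper: encodes input using the given 64-character alphabet.
std::string encode_with_alphabet(std::string_view input, std::string_view alphabet)
{
    std::string output;
    output.reserve(((input.size() + 2) / 3) * 4);

    for (std::size_t i = 0; i < input.size(); i += 3) {
        std::size_t remaining = input.size() - i;

        // Pack up to three bytes into a 24-bit chunk.
        std::uint32_t chunk = static_cast<std::uint8_t>(input[i]) << 16;
        if (remaining > 1)
            chunk |= static_cast<std::uint8_t>(input[i + 1]) << 8;
        if (remaining > 2)
            chunk |= static_cast<std::uint8_t>(input[i + 2]);

        // Emit four 6-bit indices, padding with '=' where no input byte exists.
        output += alphabet[(chunk >> 18) & 0x3f];
        output += alphabet[(chunk >> 12) & 0x3f];
        output += remaining > 1 ? alphabet[(chunk >> 6) & 0x3f] : '=';
        output += remaining > 2 ? alphabet[chunk & 0x3f] : '=';
    }
    return output;
}
```

Input bytes whose 6-bit groups land on index 62 or 63 are the only ones that encode differently between the two alphabets.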
Shannon Booth
e800605ad3 AK+LibURL: Move AK::URL into a new URL library
This URL library ends up being a relatively fundamental base library of
the system, as LibCore depends on LibURL.

This change has two main benefits:
 * Moving AK back more towards being an agnostic library that can
   be used between the kernel and userspace. URL has never really fit
   that description - and is not used in the kernel.
 * URL _should_ depend on LibUnicode, as it needs punycode support.
   However, it's not really possible to do this inside of AK as it can't
   depend on any external library. This change brings us a little closer
   to being able to do that, but unfortunately we aren't there quite
   yet, as the code generators depend on LibCore.
2024-03-18 14:06:28 -04:00
Andreas Kling
6724f840cd AK: Early return from empty hash table lookups to avoid hashing
When calling get() or find() on an empty HashTable or HashMap, we can
avoid hashing the sought-after key.
2024-03-16 14:27:59 +01:00
Timothy Flynn
e4213f5767 AK: Generalize Span::contains_slow to use the Traits infrastructure
This allows, for example, checking if a Span<String> contains a value
without having to allocate a String.
2024-03-16 08:42:33 +01:00
Timothy Flynn
faf4ba63c2 AK: Don't use east-constexpr in Span methods 2024-03-16 08:42:33 +01:00
Ali Mohammad Pur
d451f84f31 LibCrypto: Add a minimal DER encoder
Progress towards #23562.
2024-03-16 01:17:02 -06:00
Andreas Kling
d125a76f85 AK: Make FlyString-to-FlyString comparison inline & trivial
This should never boil down to more than a machine word comparison.
2024-03-14 12:42:08 +01:00
Ali Mohammad Pur
8003bde03d AK+LibRegex+LibWasm: Remove the non-const COWVector::operator[]
This was copying the vector behind our backs, let's remove it and make
the copying explicit by putting it behind COWVector::mutable_at().
This is a further 64% performance improvement on Wasm validation.
2024-03-12 17:10:47 +01:00
Ali Mohammad Pur
cefe177a56 AK+LibRegex: Move COWVector to AK
This is about to gain a new user, so move it to AK.
2024-03-12 17:10:47 +01:00
Timothy Flynn
e3b5e24ce0 AK: Iterate the bytes of a URL query with an unsigned type
Otherwise, we percent-encode negative signed chars incorrectly. For
example, https://www.strava.com/login contains the following hidden
<input> field:

    <input name="utf8" type="hidden" value="✓" />

On submitting the form, we would percent-encode that field as:

    utf8=%-1E%-64%-6D

Which would cause us to receive an HTTP 500 response. We now properly
percent-encode that field as:

    utf8=%E2%9C%93

And can login to Strava :^)
2024-03-10 15:17:31 +01:00
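The signedness issue can be sketched in a few lines. This is a simplified illustration, not the code from the commit: `percent_encode_all` is a hypothetical helper that encodes every byte, whereas a real URL encoder leaves unreserved characters untouched.

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper: percent-encodes every byte of the input.
std::string percent_encode_all(std::string_view input)
{
    std::string output;
    // Iterating as unsigned char keeps bytes >= 0x80 in the range 128..255.
    // With a plain (possibly signed) char they would become negative values
    // before formatting, which is the kind of bug described above.
    for (unsigned char byte : input) {
        char buffer[4];
        std::snprintf(buffer, sizeof(buffer), "%%%02X", static_cast<unsigned>(byte));
        output += buffer;
    }
    return output;
}
```

The check mark U+2713 is the three UTF-8 bytes E2 9C 93, all of which have the high bit set, so each one is affected by the choice of iteration type.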
Nico Weber
58838db445 LibGfx: Add the start of a JBIG2 loader
JBIG2 is infamous for two things:

1. It's used in Xerox scanners, where it falsifies scanned numbers:

https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning

2. It was allegedly used in an iOS zero day, in a very cool way:

https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html

Needless to say, we need support for it in Serenity. (...because it's
used in PDF files.)

This adds all the scaffolding, but no actual implementation yet.

It's enough for `file` to print the mime type of .jb2 files, but `image`
can't do anything with the files yet.
2024-03-09 16:01:22 +01:00
Timothy Flynn
82ea53cf10 AK: Add a StringView method to count the number of lines in a string
We already have a helper to split a StringView by line while considering
"\n", "\r", and "\r\n". Add an analogous method to just count the number
of lines in the same manner.
2024-03-08 14:43:33 -05:00
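Counting lines while treating "\n", "\r", and "\r\n" each as a single line break can be sketched as follows. `count_lines_sketch` is a hypothetical free function for illustration, not the StringView method itself; in particular, counting an empty string as one line and counting a trailing break as starting a new line are choices of this sketch, not claims about AK's behavior.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns 1 plus the number of line breaks, where
// "\r\n" counts as a single break.
std::size_t count_lines_sketch(std::string_view text)
{
    std::size_t lines = 1;
    bool last_was_carriage_return = false;

    for (char ch : text) {
        if (ch == '\r') {
            ++lines;
        } else if (ch == '\n') {
            // A '\n' directly after '\r' belongs to the same "\r\n" break.
            if (!last_was_carriage_return)
                ++lines;
        }
        last_was_carriage_return = (ch == '\r');
    }
    return lines;
}
```

Unlike splitting into lines, this makes a single pass and never materializes the individual line views.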
Timothy Flynn
07a27b2ec0 AK: Replace the boolean parameter of StringView::lines with a named enum 2024-03-08 14:43:33 -05:00
Matthew Olsson
a511f1ef85 AK: Add HashMap::ensure_capacity 2024-03-06 07:45:56 +01:00
Filiph Siitam Sandström
fd694e8672 AK+Lagom: Make it possible to build for iOS
This commit makes it possible to build AK and most of Lagom for iOS,
based on the work for the Ladybird build demoed on discord:
https://discord.com/channels/830522505605283862/830525031720943627/1211987732646068314
2024-03-03 13:13:42 -07:00
Hendiadyoin1
79fd8eb28d AK/HashMap: Use structured bindings when iterating over itself 2024-03-01 14:05:53 -07:00
Nico Weber
f8b8d1b3be AK: Add is_ascii_uppercase_hex_digit() 2024-03-01 14:17:42 +01:00
Timothy Flynn
d878975f95 AK+LibJS: Remove OFFSET_OF and its users
With the LibJS JIT removed, let's not expose pointers to internal
members.
2024-02-29 09:00:00 +01:00
Andrew Kaster
21ac431fac AK: Allow reading from EOF buffered streams better in read_line()
If the BufferedStream is able to fill its entire circular buffer in
populate_read_buffer() and is later asked to read a line or read until
a delimiter, it could erroneously return EMSGSIZE if the caller's buffer
was smaller than the internal buffer. In this case, all we really care
about is whether the caller's buffer is big enough for however much data
we're going to copy into it, which needs to take the candidate into
account.
2024-02-26 13:16:27 -07:00
Dan Klishch
ba24e86fdd AK: Introduce IntrusiveBinaryHeap and reimplement BinaryHeap using it
The main difference between them is that IntrusiveBinaryHeap can
optionally maintain an index inside every stored node that allows
arbitrary nodes to be deleted.
2024-02-25 17:24:36 -07:00
Hendiadyoin1
38cb5444d9 AK: Make StringView::for_each_split_view() aware of IterationDecision 2024-02-24 16:43:44 -07:00
Dan Klishch
8ac0e3f0e5 AK+LibJS: Remove null state from DeprecatedFlyString :^) 2024-02-24 15:06:52 -07:00
Dan Klishch
061f902f95 AK+Userland: Introduce ByteString::create_and_overwrite
And replace two users of raw StringImpl with it.
2024-02-24 15:06:52 -07:00
Ali Mohammad Pur
bc301b6f40 AK+LibXML+JSSpecCompiler: Move LineTrackingLexer to AK
This is a simple extension of GenericLexer, and is used in more than
just LibXML, so let's move it into AK.
The move also resolves a FIXME, which is removed in this commit.
2024-02-16 15:26:43 +01:00
Lucas CHOLLET
cbfea68ed8 AK: Add BigEndianInputBitStream::bits_until_next_byte_boundary() 2024-02-12 14:08:56 +01:00
Nico Weber
d84b69ace9 AK: Add to_array()
This is useful if you want an array with an explicit type but still
want its size to be inferred.
2024-02-11 18:53:00 +01:00
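The "explicit type, inferred size" pattern can be sketched with standard library pieces. This is an assumption-laden analogue, not AK's implementation: `to_array_sketch` and `to_array_impl` are hypothetical names, and the sketch uses `std::array` and `std::index_sequence` rather than AK's own types.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Expands the index pack to copy each element into the resulting array.
template<typename T, std::size_t N, std::size_t... Is>
constexpr std::array<T, N> to_array_impl(T const (&values)[N], std::index_sequence<Is...>)
{
    return { { values[Is]... } };
}

// Hypothetical helper: T may be given explicitly, N is deduced from the
// number of initializers.
template<typename T, std::size_t N>
constexpr std::array<T, N> to_array_sketch(T const (&values)[N])
{
    return to_array_impl(values, std::make_index_sequence<N> {});
}
```

For example, `to_array_sketch<unsigned char>({ 1, 2, 3 })` yields a three-element array of `unsigned char` without the caller spelling out the size.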
Nico Weber
10216e1743 AK: Remove a stray static
No behavior change.
2024-02-11 18:53:00 +01:00
Nico Weber
4409b33145 AK: Make IndexSequence use size_t
This makes it possible to use MakeIndexSequence in functions like:

    template<typename T, size_t N>
    constexpr auto foo(T (&a)[N])

This means AK/StdLibExtraDetails.h must now include AK/Types.h
for size_t, which means AK/Types.h can no longer include
AK/StdLibExtras.h (which arguably it shouldn't do anyways),
which requires rejiggering some things.

(IMHO Types.h shouldn't use AK::Details metaprogramming at all.
FlatPtr doesn't necessarily have to use Conditional<> and ssize_t could
maybe be in its own header or something. But since it's tangential to
this PR, going with the tried and true "lift things that cause the
cycle up to the top" approach.)
2024-02-11 18:53:00 +01:00
Tim Ledbetter
4a7236cabf Everywhere: Prefer _string when constructing strings from literals 2024-02-08 11:01:10 -05:00
Dan Klishch
88af15d513 AK: Store JsonValue's value in AK::Variant 2024-02-08 08:04:05 -07:00
Andrew Kaster
bc9c710904 LibWeb: Hide WebDriver::match_route debug behind its own flag
When enabling WEBDRIVER_DEBUG globally, this function's debug spam
overpowers the rest of the useful logs.
2024-02-08 15:53:46 +01:00
Dan Klishch
677bcea771 ntpquery: Use AK::convert_between_host_and_network_endian
Instead of polluting global namespace with definitions from
libkern/OSByteOrder.h and machine/endian.h on macOS, just use AK
functions for conversions.
2024-02-06 04:37:47 -07:00
vincent-rg
a9df60ff1c AK: Update OptionParser::m_arg_index by subtracting skipped args
On argument swapping to put positional ones toward the end,
m_arg_index was pointing at "last arg index" + "skipped args" +
"consumed args" and thus was pointing ahead of the skipped ones.

m_arg_index now points after the current parsed option arguments.
2024-02-06 00:08:30 +01:00
Dan Klishch
3e43d15440 Everywhere: Prefer VERIFY over assert() 2024-02-05 07:03:53 -05:00
Nico Weber
41f57a5477 AK: Remove the SIMD version of rsqrt() too, for good measure
No strong reason to remove this one, other than that it's also unused.
2024-01-30 10:02:33 +01:00