Now it actually only exposes methods to allocate uninitialized storage
and to create substring with a shared superstring. All the details of
the memory layout are fully encapsulated.
The idea is to eventually get rid of protected state in StringBase. To
do this, we first need to remove all references to m_data and
m_short_string from String.
This starts separating memory management of string data and string
utilities like `String::formatted`. This would also allow to reuse the
same storage in `DeprecatedString` in the future.
This method (unlike can_read_line) ensures that the delimiter is present
in the buffer, and doesn't return true after eof when the delimiter is
absent.
`JsonValue::to_byte_string` has peculiar type-erasure semantics which is
not usually intended. Unfortunately, it also has a very stereotypical
name which does not warn about unexpected behavior. So let's prefix it
with `deprecated_` to make new code use `as_string` if it just wants to
get string value or `serialized<StringBuilder>` if it needs to do proper
serialization.
A bunch of users used consume_specific with a constant ByteString
literal, which can be replaced by an allocation-free StringView literal.
The generic consume_while overload gains a requires clause so that
consume_specific("abc") causes a more understandable and actionable
error.
The current algorithm is currently O(N^2) because we forward-search an
ever-increasing substring of the haystack. This implementation reduces
the search time of a 500,000-length string (where the desired needle is
at index 0) from 72 seconds to 2-3 milliseconds.
Caught by clang-format-17. Note that clang-format-16 is fine with this
as well (it leaves the const placement alone), it just doesn't perform
the formatting to east-const itself.
Using a vector to represent a list of painting commands results in many
reallocations, especially on pages with a lot of content.
This change addresses it by introducing a SegmentedVector, which allows
fast appending by representing a list as a sequence of fixed-size
vectors. Currently, this new data structure supports only the
operations used in RecordingPainter, which are appending and iterating.
Much of the UTF-8 data that we'll iterate over will be ASCII only,
and we can get a significant speed-up by simply having a fast path
when the iterator points at a byte that is obviously an ASCII character
(<= 0x7F).
Since we're already building up a percent-encoded ASCII-only string
in the internal parser buffer, there's no need to do a second UTF-8
validation pass before assigning each part of the parsed URL.
This makes URL parsing signficantly faster.
Instead of do a wrappy MUST(try_append_code_point()), we now inline
the UTF-8 encoding logic. This allows us to grow the buffer by the
right increment up front, and also removes a bunch of ErrorOr ceremony
that we don't care about.
Once we know that a code point must be a valid ASCII character,
we now cast it to `char` and avoid the expensive generic
StringView::contains(u32 code_point) checks.
This dramatically speeds up URL parsing.
UTF8Decoder was already converting invalid data into replacement
characters while converting, so we know for sure we have valid UTF-8
by the time conversion is finished.
This patch adds a new StringBuilder::to_string_without_validation()
and uses it to make UTF8Decoder avoid half the work it was doing.
Instead of using a StringBuilder, add a String::repeated(String, N)
overload that takes advantage of knowing it's already all UTF-8.
This makes the following microbenchmark go 4x faster:
"foo".repeat(100_000_000)
And for single character strings, we can even go 10x faster:
"x".repeat(100_000_000)
In a bunch of cases, this actually ends up simplifying the code as
to_number will handle something such as:
```
Optional<I> opt;
if constexpr (IsSigned<I>)
opt = view.to_int<I>();
else
opt = view.to_uint<I>();
```
For us.
The main goal here however is to have a single generic number conversion
API between all of the String classes.