StringView::for_each_split_view allows you to process the splits in a
StringView without needing to allocate a Vector<StringView> to store
each of the parts.
Since we migrated the implementation from the normal split_view path, we
can also re-implement split_view in terms of for_each_split_view.
Currently, we define a CaseInsensitiveStringTraits structure for String.
Using this structure for StringView involves allocating a String from
that view, and a second string to convert that intermediate string to
lowercase.
This defines CaseInsensitiveStringViewTraits (and the underlying helper
case_insensitive_string_hash) to avoid allocations.
This removes the awkward String::replace API which was the only String
API which mutated the String and replaces it with a new immutable
version that returns a new String with the replacements applied. This
also fixes a couple of UAFs that were caused by the use of this API.
As an optimization an equivalent StringView::replace API was also added
to remove an unnecessary String allocations in the format of:
`String { view }.replace(...);`
This was needlessly copying StringView arguments, and was also using
strstr internally, which meant it was doing a bunch of unnecessary
strlen calls on it. This also moves the implementation to StringUtils
to allow API consistency between String and StringView.
This typo / bug in the Traits<T> implementation for StringView caused
AK::HashMap methods to return a `String` when looking up values out of
a hash map of type HashTable<StringView,StringView>.
This change fixes the typo, and fixes the only consumer, the kernel
Commandline class.
This implements StringUtils::find_any_of() and uses it in
String::find_any_of() and StringView::find_any_of(). All uses of
find_{first,last}_of have been replaced with find_any_of(), find() or
find_last(). find_{first,last}_of have subsequently been removed.
This implements the StringView::find_all() method by re-implemeting the
current method existing for String in StringUtils, and using that
implementation for both String and StringView.
The rewrite uses memmem() instead of strstr(), so the String::find_all()
argument type has been changed from String to StringView, as the null
byte is no longer required.
This removes StringView::find_first_of(char) and find_last_of(char) and
replaces all its usages with find and find_last respectively. This is
because those two methods are functionally equivalent.
find_{first,last}_of should only be used if searching for multiple
different characters, which is never the case with the char argument.
This also adds the [[nodiscard]] to the remaining find_{first,last}_of
methods.
This patch reimplements the StringView::find methods in StringUtils, so
they can also be used by String. The methods now also take an optional
start parameter, which moves their API in line with String's respective
methods.
This also implements a StringView::find_ast(char) method, which is
currently functionally equivalent to find_last_of(char). This is because
find_last_of(char) will be removed in a further commit.
This patch refactors StringImpl::to_{lower,upper}case to use the new
static methods StringImpl::create_{lower,upper}cased if they have to use
to create a new StringImpl. This allows implementing StringView's
to_{lower,upper}case_string using the same methods.
It also replaces the usage of hand-written to_ascii_lowercase() and
similar methods with those from CharacterTypes.h.
Also add some tests to ensure that they _remain_ constexpr.
In general, any runtime assertions, weirdo C casts, pointer aliasing,
and such shenanigans should be gated behind the (helpfully newly added)
AK::is_constant_evaluated() function when the intention is to write
constexpr-capable code.
a.k.a. deliver promises of constexpr-ness :P
Previously this was generating a crazy number of symbols, and it was
also pretty-damn-slow as it was defined recursively, which made the
compiler incapable of inlining it (due to the many many layers of
recursion before it terminated).
This commit replaces the recursion with a pack expansion and marks it
always-inline.
Problem:
- Much of the `GenericLexer` can be `constexpr`, but is not.
Solution:
- Make it `constexpr` and de-duplicate code.
- Extend some of `StringView` with `constexpr` to support.
- Add tests to ensure `constexpr` behavior.
Note:
- Construction of `StringView` from pointer and length is not
`constexpr`-compatible at the moment because the VERIFY cannot be,
yet.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
We had an unusual optimization in AK::StringView where constructing
a StringView from a String would cause it to remember the internal
StringImpl pointer of the String.
This was used to make constructing a String from a StringView fast
and copy-free.
I tried removing this optimization and indeed we started seeing a
ton of allocation traffic. However, all of it was due to a silly
pattern where functions would take a StringView and then go on
to create a String from it.
I've gone through most of the code and updated those functions to
simply take a String directly instead, which now makes this
optimization unnecessary, and indeed a source of bloat instead.
So, let's get rid of it and make StringView a little smaller. :^)
A new operator, operator""sv was added as of C++17 to support
string_view literals. This allows string_views to be constructed
from string literals and with no runtime cost to find the string
length.
See: https://en.cppreference.com/w/cpp/string/basic_string_view/operator%22%22sv
This change implements that functionality in AK::StringView.
We do have to suppress some warnings about implementing reserved
operators as we are essentially implementing STL functions in AK
as we have no STL :).
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)
Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.
We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
Don't compute the strlen() of the string we're comparing against first.
This can save a lot of time if we're comparing against something that
already fails to match in the first few characters.
Problem:
- Many constructors are defined as `{}` rather than using the ` =
default` compiler-provided constructor.
- Some types provide an implicit conversion operator from `nullptr_t`
instead of requiring the caller to default construct. This violates
the C++ Core Guidelines suggestion to declare single-argument
constructors explicit
(https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c46-by-default-declare-single-argument-constructors-explicit).
Solution:
- Change default constructors to use the compiler-provided default
constructor.
- Remove implicit conversion operators from `nullptr_t` and change
usage to enforce type consistency without conversion.
The symbol name insertion scheme is different from objdump -d's.
Compare the output on Build/Userland/id:
* disasm:
...
_start (08048305-0804836b):
08048305 push ebp
...
08048366 call 0x0000df56
0804836b o16 nop
0804836d o16 nop
0804836f nop
(deregister_tm_clones (08048370-08048370))
08048370 mov eax, 0x080643e0
...
_ZN2AK8Utf8ViewC1ERKNS_6StringE (0805d9b2-0805d9b7):
_ZN2AK8Utf8ViewC2ERKNS_6StringE (0805d9b2-0805d9b7):
0805d9b2 jmp 0x00014ff2
0805d9b7 nop
* objdump -d:
08048305 <_start>:
8048305: 55 push %ebp
...
8048366: e8 9b dc 00 00 call 8056006 <exit>
804836b: 66 90 xchg %ax,%ax
804836d: 66 90 xchg %ax,%ax
804836f: 90 nop
08048370 <deregister_tm_clones>:
8048370: b8 e0 43 06 08 mov $0x80643e0,%eax
...
0805d9b2 <_ZN2AK8Utf8ViewC1ERKNS_6StringE>:
805d9b2: e9 eb f6 ff ff jmp 805d0a2 <_ZN2AK10StringViewC1ERKNS_6StringE>
805d9b7: 90 nop
Differences:
1. disasm can show multiple symbols that cover the same instructions.
I've only seen this happen for C1/C2 (and D1/D2) ctor/dtor pairs,
but it could conceivably happen with ICF as well.
2. disasm separates instructions that do not belong to a symbol with
a newline, so that nop padding isn't shown as part of a function
when it technically isn't.
3. disasm shows symbols that are skipped (due to having size 0)
in parenthesis, separated from preceding and following instructions.