Commit Graph

71 Commits

Author SHA1 Message Date
Andreas Kling
3c039903fb LibTextCodec+AK: Don't validate UTF-8 strings twice
UTF8Decoder was already converting invalid data into replacement
characters while converting, so we know for sure we have valid UTF-8
by the time conversion is finished.

This patch adds a new StringBuilder::to_string_without_validation()
and uses it to make UTF8Decoder avoid half the work it was doing.
2023-12-30 13:49:50 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Liav A
b2fd51f561 AK: Implement string formatting for FixedStringBuffers
To ensure this happens without duplicating code, we allow forcing a
StringBuilder object to only use the inline buffer, so the code in the
AK/Format.cpp file doesn't need to deal with different underlying
storage types (expandable or inline-fixed) at all.
2023-08-12 11:48:48 -06:00
Linus Groh
e76394d96c AK: Remove infallible version of StringBuilder::to_byte_buffer
Also drop the try_ prefix from the fallible function, as it is no longer
needed to distinguish the two.
2023-03-09 15:51:00 +00:00
Karol Baraniecki
b4b283670d AK: Introduce a fallible version of StringBuilder::to_byte_buffer
Name it StringBuilder::try_to_byte_buffer accordingly :^)
2023-03-09 12:59:57 +00:00
Sam Atkins
1453ac79e7 AK: Add StringBuilder::to_fly_string() 2023-02-15 12:48:26 -05:00
Timothy Flynn
79aaa2fe0f AK: Allow the kernel to have access to StringBuilder::to_string
This is mostly to prevent String.h from acquiring ifdef-soup. In any
case, it's fine for the kernel to see this symbol as it is fallible.
2023-01-28 00:13:46 +00:00
Linus Groh
6e7459322d AK: Remove StringBuilder::build() in favor of to_deprecated_string()
Having an alias function that only wraps another one is silly, and
keeping the more obvious name should flush out more uses of deprecated
strings.
No behavior change.
2023-01-27 20:38:49 +00:00
MacDue
2366265c53 AK: Add StringBuilder::try_join()
This is a failable version of StringBuilder::join().
2023-01-14 12:37:00 +01:00
MacDue
9a120d7243 AK: Add support for "debug only" formatters
These are formatters that can only be used with debug print
functions, such as dbgln(). Currently this is limited to
Formatter<ErrorOr<T>>. With this you can still debug log ErrorOr
values (good for debugging), but trying to use them in any
String::formatted() call will fail (which prevents .to_string()
errors with the new failable strings being ignored).

You make a formatter debug only by adding a constexpr method like:
static constexpr bool is_debug_only() { return true; }
2023-01-13 21:09:26 +00:00
Ali Mohammad Pur
543890c5c9 AK: Add a fallible StringBuilder::create() factory function
This is nice, and is also used by the Jakt runtime.
2022-12-11 20:44:54 +03:30
Moustafa Raafat
b8f1e1bed2 Everywhere: Remove unnecessary AK and Detail namespace scoping 2022-12-09 11:25:30 +00:00
Andreas Kling
a3e82eaad3 AK: Introduce the new String, replacement for DeprecatedString
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.

Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.

Problems in DeprecatedString:

- It assumes string allocation never fails. This makes it impossible
  to use in allocation-sensitive contexts, and is the reason we had to
  ban DeprecatedString from the kernel entirely.

- The awkward null state. DeprecatedString can be null. It's different
  from the empty state, although null strings are considered empty.
  All code is immediately nicer when using Optional<DeprecatedString>
  but DeprecatedString came before Optional, which is how we ended up
  like this.

- The encoding of the underlying data is ambiguous. For the most part,
  we use it as if it's always UTF-8, but there have been cases where
  we pass around strings in other encodings (e.g ISO8859-1)

- operator[] and length() are used to iterate over DeprecatedString one
  byte at a time. This is done all over the codebase, and will *not*
  give the right results unless the string is all ASCII.

How we solve these issues in the new String:

- Functions that may allocate now return ErrorOr<String> so that ENOMEM
  errors can be passed to the caller.

- String has no null state. Use Optional<String> when needed.

- String is always UTF-8. This is validated when constructing a String.
  We may need to add a bypass for this in the future, for cases where
  you have a known-good string, but for now: validate all the things!

- There is no operator[] or length(). You can get the underlying data
  with bytes(), but for iterating over code points, you should be using
  an UTF-8 iterator.

Furthermore, it has two nifty new features:

- String implements a small string optimization (SSO) for strings that
  can fit entirely within a pointer. This means up to 3 bytes on 32-bit
  platforms, and 7 bytes on 64-bit platforms. Such small strings will
  not be heap-allocated.

- String can create substrings without making a deep copy of the
  substring. Instead, the superstring gets +1 refcount from the
  substring, and it acts like a view into the superstring. To make
  substrings like this, use the substring_with_shared_superstring() API.

One caveat:

- String does not guarantee that the underlying data is null-terminated
  like DeprecatedString does today. While this was nifty in a handful of
  places where we were calling C functions, it did stand in the way of
  shared-superstring substrings.
2022-12-06 15:21:26 +01:00
Linus Groh
57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Andreas Kling
ae3ffdd521 AK: Make it possible to not using AK classes into the global namespace
This patch adds the `USING_AK_GLOBALLY` macro which is enabled by
default, but can be overridden by build flags.

This is a step towards integrating Jakt and AK types.
2022-11-26 15:51:34 +01:00
Lucas CHOLLET
62b8ccaffc StringBuilder: Add try_append_repeated() and append_repeated()
This two methods add the character as many times as specified by the
second parameter.
2022-09-15 14:08:21 +01:00
Idan Horowitz
9da8c78133 AK: Add a try variant of StringBuilder::append_escaped_for_json
This will allow us to make a fallible version of the JSON serializers.
2022-02-27 20:37:57 +01:00
Linus Groh
b253bca807 AK: Add optional format string parameter to String{,Builder}::join()
Allow specifying a custom format string that's being used for each item
instead of hardcoding "{}".
2022-02-23 21:53:30 +00:00
Idan Horowitz
8f093e91e0 AK: Exclude StringBuilder String APIs from the Kernel
These APIs are only used by userland, and String is OOM-infallible,
so let's just ifdef it out of the Kernel.
2022-02-16 22:21:37 +01:00
Andreas Kling
8a51f64503 AK: Increase StringBuilder's inline buffer size from 128 to 256 bytes 2021-12-26 01:42:58 +01:00
Andreas Kling
216e21a1fa AK: Convert AK::Format formatting helpers to returning ErrorOr<void>
This isn't a complete conversion to ErrorOr<void>, but a good chunk.
The end goal here is to propagate buffer allocation failures to the
caller, and allow the use of TRY() with formatting functions.
2021-11-17 00:21:13 +01:00
Andreas Kling
008355c222 AK: Add failable try_* functions to StringBuilder
These will allow us to start using TRY() with StringBuilder operations.
2021-11-17 00:21:13 +01:00
Andreas Kling
8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Andreas Kling
a15ed8743d AK: Make ByteBuffer::try_* functions return ErrorOr<void>
Same as Vector, ByteBuffer now also signals allocation failure by
returning an ENOMEM Error instead of a bool, allowing us to use the
TRY() and MUST() patterns.
2021-11-10 21:58:58 +01:00
Ali Mohammad Pur
3a9f00c59b Everywhere: Use OOM-safe ByteBuffer APIs where possible
If we can easily communicate failure, let's avoid asserting and report
failure instead.
2021-09-06 01:53:26 +02:00
Brian Gianforcaro
f0b3aa0331 Everywhere: Pass AK::Format TypeErasedFormatParams by reference
This silences a overeager warning in sonar cloud, warning that
slicing could occur with `VariadicFormatParams` which derives from
`TypeErasedFormatParams`.

Reference:
https://sonarcloud.io/project/issues?id=SerenityOS_serenity&issues=AXuVPBO_k92xXUF3qWsm&open=AXuVPBO_k92xXUF3qWsm
2021-08-30 15:50:00 +04:30
Timothy Flynn
c16aca7abf AK+Kernel: Add StringBuilder::append overload for UTF-16 views
Currently, to append a UTF-16 view to a StringBuilder, callers must
first convert the view to UTF-8 and then append the copy. Add a UTF-16
overload so callers do not need to hold an entire copy in memory.
2021-08-10 23:07:50 +02:00
Timothy Flynn
5978caf96b AK: Convert StringBuilder to use east-const 2021-08-10 23:07:50 +02:00
Ali Mohammad Pur
3829bf115c AK: Make StringBuilder::join() use appendff() instead of append()
`append()` is almost never going to select the overload that is desired.
e.g. it will append chars when you pass it a Vector<size_t>, which is
definitely not the right overload :)
2021-08-06 01:14:03 +02:00
Gunnar Beutner
de0aa44bb6 AK: Remove the m_length member for StringBuilder
Instead we can just use ByteBuffer::size() which already keeps track
of the buffer's size.
2021-05-31 14:49:00 +04:30
Max Wipfli
f51b0729f5 AK: Implement StringBuilder::append_as_lowercase(char ch)
This patch adds a convenience method to AK::StringBuilder which converts
ASCII uppercase characters to lowercase before appending them.
2021-05-18 21:02:07 +02:00
Gunnar Beutner
fcaf98361f AK: Turn ByteBuffer into a value type
Previously ByteBuffer would internally hold a RefPtr to the byte
buffer and would behave like a reference type, i.e. copying a
ByteBuffer would not create a duplicate byte buffer, but rather
two objects which refer to the same internal buffer.

This also changes ByteBuffer so that it has some internal capacity
much like the Vector<T> type. Unlike Vector<T> however a byte
buffer's data may be uninitialized.

With this commit ByteBuffer makes use of the kmalloc_good_size()
API to pick an optimal allocation size for its internal buffer.
2021-05-16 17:49:42 +02:00
Andreas Kling
834b6508d7 AK: Remove StringBuilder::appendf()
All users have been converted to using AK::Format via appendff().
2021-05-07 21:12:09 +02:00
Brian Gianforcaro
1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Brian Gianforcaro
7db74a6b3e AK: Annotate StringBuilder functions as [[nodiscard]] 2021-04-11 12:50:33 +02:00
AnotherTest
7c2754c3a6 AK+Kernel+Userland: Enable some more compiletime format string checks
This enables format string checks for three more functions:
- String::formatted()
- Builder::appendff()
- KBufferBuilder::appendff()
2021-02-23 13:59:33 +01:00
Lenny Maiorani
e6f907a155 AK: Simplify constructors and conversions from nullptr_t
Problem:
- Many constructors are defined as `{}` rather than using the ` =
  default` compiler-provided constructor.
- Some types provide an implicit conversion operator from `nullptr_t`
  instead of requiring the caller to default construct. This violates
  the C++ Core Guidelines suggestion to declare single-argument
  constructors explicit
  (https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c46-by-default-declare-single-argument-constructors-explicit).

Solution:
- Change default constructors to use the compiler-provided default
  constructor.
- Remove implicit conversion operators from `nullptr_t` and change
  usage to enforce type consistency without conversion.
2021-01-12 09:11:45 +01:00
Sahan Fernando
ccf4368ca5 AK: Enable format string warnings for AK printf wrappers 2021-01-11 21:06:32 +01:00
Andreas Kling
54ade31d84 AK: Add some inline capacity to StringBuilder
This patch adds a 128-byte inline buffer that we use before switching
to using a dynamically growing ByteBuffer.

This allows us to avoid heap allocations in many cases, and totally
incidentally also speeds up @nico's favorite test, "disasm /bin/id"
more than 2x. :^)
2020-11-24 22:06:51 +01:00
Andreas Kling
5e164052f6 AK+Kernel: Escape JSON keys & values
Grab the escaping logic from JSON string value serialization and use
it for serializing all keys and values.

Fixes #3917.
2020-11-02 12:56:36 +01:00
asynts
6351a56d27 AK+Format: Do some housekeeping in the format implementation. 2020-10-02 20:48:19 +02:00
asynts
b7a4c4482f AK: Resolve format related circular dependencies properly.
With this commit, <AK/Format.h> has a more supportive role and isn't
used directly.

Essentially, there now is a public 'vformat' function ('v' for vector)
which takes already type erased parameters. The name is choosen to
indicate that this function behaves similar to C-style functions taking
a va_list equivalent.

The interface for frontend users are now 'String::formatted' and
'StringBuilder::appendff'.
2020-09-23 21:45:28 +02:00
asynts
e5497a326a AK: Add StringBuilder::appendff using the new format.
StringBuilder::appendf was already used, thus this name. If we some day
replace all usages of printf, we could rename this method.
2020-09-22 15:06:40 +02:00
Nico Weber
ce95628b7f Unicode: Try s/codepoint/code_point/g again
This time, without trailing 's'. Ran:

    git grep -l 'codepoint' | xargs sed -ie 's/codepoint/code_point/g
2020-08-05 22:33:42 +02:00
Nico Weber
19ac1f6368 Revert "Unicode: s/codepoint/code_point/g"
This reverts commit ea9ac3155d.
It replaced "codepoint" with "code_points", not "code_point".
2020-08-05 22:33:42 +02:00
Andreas Kling
ea9ac3155d Unicode: s/codepoint/code_point/g
Unicode calls them "code points" so let's follow their style.
2020-08-03 19:06:41 +02:00
Andreas Kling
d4bbc00901 AK: Add StringBuilder::append_codepoint(u32)
Also, implement append(Utf32View) using it.
2020-06-04 21:12:17 +02:00
Andreas Kling
86242f9c18 AK: Add StringBuilder::append(Utf32View)
This encodes the incoming UTF-32 sequence as UTF-8.
2020-05-17 22:35:25 +02:00
Andreas Kling
4eef3e5a09 AK: Add StringBuilder::join() for joining collections with a separator
This patch adds a generic StringBuilder::join(separator, collection):

    Vector<String> strings = { "well", "hello", "friends" };
    StringBuilder builder;
    builder.join("+ ", strings);
    builder.to_string(); // "well + hello + friends"
2020-03-20 14:41:23 +01:00