Now that ""_string is infallible, the only benefit of explicitly
constructing a short string is the ability to do it at compile-time. But
we never do that, so let's simplify the API and remove this
implementation detail from it.
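For illustration, assuming the explicit spelling in question is the
_short_string literal, the change at call sites boils down to:

    auto name = "abc"_short_string; // before: explicit short-string construction
    auto name = "abc"_string;       // after: the infallible literal covers it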
Parsing 'data:' URLs took its own route. It never set the standard URL
fields like path, query or fragment (other than the scheme) and instead
gave us separate methods called `data_payload()`, `data_mime_type()`,
and `data_payload_is_base64()`.
Because parsing 'data:' didn't use the standard fields, running the
following JS code:
new URL('#a', 'data:text/plain,hello').toString()
not only cleared the path (URLParser never consults data_payload(),
making the result 'data:#a'), but also crashed the program, because we
forbid an empty MIME type when serializing to a string.
With this change, 'data:' URLs are parsed like every other URL.
To decode the 'data:' URL contents, one needs to call process_data_url()
on a URL, which returns a struct containing the MIME type and the
already-decoded data! :^)
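A rough usage sketch (the struct and field names here are assumptions,
as is the ErrorOr return):

    URL url("data:text/plain;base64,aGVsbG8=");
    auto data_url = TRY(url.process_data_url());
    // data_url.mime_type -> "text/plain"
    // data_url.body      -> the already-decoded payload, here "hello"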
By not clearing the buffer, we were leaking the path part of a URL into
the query for URLs without an authority component (no '//host').
This could be seen most noticeably in mailto: URLs with header fields
set, as the query part of `mailto:user@example.com?subject=test` was
parsed to `user@example.comsubject=test`.
data: URLs didn't have this problem, because we have a special case for
parsing them.
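After the fix, the same URL splits as expected, roughly:

    URL url("mailto:user@example.com?subject=test");
    // path  -> "user@example.com"   (no more leaked text)
    // query -> "subject=test"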
In order to follow the spec text to achieve this, we need to change the
underlying representation of a host in AK::URL to a deserialized format.
Before this, we were parsing the host and then immediately serializing
it again.
Making that change resulted in a whole bunch of fallout.
After this change, callers can access the serialized data through
an implementation of the spec's host serializer. The functional end
result of this
change is that IPv6 hosts are now correctly serialized to be
surrounded with '[' and ']'.
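For example, something like:

    URL url("http://[::1]:8080/index.html");
    // Before this change the brackets were lost when serializing the host;
    // now url.serialize() round-trips as "http://[::1]:8080/index.html".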
I misunderstood the spec step for checking whether the host 'ends with a
number'. It is not a simple suffix check; the spec defines a dedicated
algorithm for it, which is required so that hosts that merely end with a
number are not mistaken for an IPv4 host.
Implement this missing step, and add a test to cover this.
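Roughly, the check has to look at the last non-empty dot-separated label
rather than the final character. A simplified sketch of the idea (not
the exact implementation in URLParser):

    static bool ends_in_a_number(StringView host)
    {
        auto parts = host.split_view('.', SplitBehavior::KeepEmpty);
        // A trailing dot leaves an empty label; ignore it.
        if (!parts.is_empty() && parts.last().is_empty())
            parts.take_last();
        if (parts.is_empty())
            return false;
        auto last = parts.last();
        if (last.is_empty())
            return false;
        // "foo.123" -> true, "foo.1bar" -> false
        if (all_of(last, [](char c) { return is_ascii_digit(c); }))
            return true;
        // The spec also accepts hex/octal/decimal IPv4 numbers here; this
        // sketch only hints at that part.
        return last.starts_with("0x"sv) || last.starts_with("0X"sv);
    }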
This is just a straight (and fairly inefficient) implementation of IPv6
parsing and serialization from the URL spec.
Note that we don't use AK::IPv6Address here because the URL spec
requires a specific serialization behavior.
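For reference, the behavior the spec requires is: print each 16-bit
piece as lowercase hex without leading zeros, and compress only the
first longest run of two or more zero pieces with '::'. A rough sketch
(not the actual code):

    static ErrorOr<String> serialize_ipv6_per_url_spec(Array<u16, 8> const& pieces)
    {
        // Find the first longest run of two or more zero pieces.
        Optional<size_t> compress_start;
        size_t compress_length = 0;
        for (size_t i = 0; i < pieces.size();) {
            size_t run = 0;
            while (i + run < pieces.size() && pieces[i + run] == 0)
                ++run;
            if (run > 1 && run > compress_length) {
                compress_start = i;
                compress_length = run;
            }
            i += max(run, static_cast<size_t>(1));
        }

        StringBuilder builder;
        for (size_t i = 0; i < pieces.size(); ++i) {
            if (compress_start.has_value() && *compress_start == i) {
                // "::" if the run starts the address; otherwise the first ':'
                // already came from the previous piece's separator.
                TRY(builder.try_append(i == 0 ? "::"sv : ":"sv));
                i += compress_length - 1;
                continue;
            }
            TRY(builder.try_appendff("{:x}", pieces[i]));
            if (i != pieces.size() - 1)
                TRY(builder.try_append(':'));
        }
        return builder.to_string();
    }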
This now searches the memory in blocks, which should be slightly more
efficient. However, it doesn't make much difference (e.g. ~1% in LZMA
compression) in most real-world applications, as the non-hint function
is more expensive by orders of magnitude.
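The rough idea, not the actual CircularBuffer code: compare candidate
matches block-wise with memcmp before falling back to byte-by-byte:

    static size_t match_length(ReadonlyBytes a, ReadonlyBytes b)
    {
        size_t length = min(a.size(), b.size());
        size_t matched = 0;
        constexpr size_t block_size = 16;
        // Whole blocks first...
        while (matched + block_size <= length
            && memcmp(a.data() + matched, b.data() + matched, block_size) == 0)
            matched += block_size;
        // ...then finish off the tail byte-by-byte.
        while (matched < length && a[matched] == b[matched])
            ++matched;
        return matched;
    }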
The "operation modes" of this function have very different focuses, and
trying to combine both in a way where we share the most amount of code
probably results in the worst performance.
Instead, split up the function into "existing distances" and "no
existing distances" so that we can optimize either case separately.
We will be adding extra logic to the CircularBuffer to optimize
searching, but this would negatively impact the performance of
CircularBuffer users that don't need that functionality.
I was debugging a different issue in Ladybird, and noticed that
completing relative file URLs with URL::complete_url didn't seem to work
right. This test case covers both the working https case, as well as the
file URL case fixed by the previous commit.
Change the name and return type of
`IPv6Address::to_deprecated_string()` to `IPv6Address::to_string()`
with return type `ErrorOr<String>`.
It will now propagate errors that occur when writing to the
StringBuilder.
There are two users of `to_deprecated_string()` that now use
`to_string()`:
1. `Formatter<IPv6Address>`: it now propagates errors.
2. `inet_ntop`: it now sets errno to ENOMEM and returns.
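Sketches of the two call sites (simplified):

    // 1. Formatter<IPv6Address>: propagate the error to the caller.
    auto string = TRY(address.to_string());

    // 2. inet_ntop: no ErrorOr available, so report the failure via errno.
    auto string_or_error = address.to_string();
    if (string_or_error.is_error()) {
        errno = ENOMEM;
        return nullptr;
    }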
That's what this class really is; in fact that's what the first line of
the comment says it is.
This commit does not rename the main files, since those will contain
other time-related classes in a little bit.
Note that in some cases (in particular SQL::Result and PDFErrorOr),
there is no Formatter defined for the error type, hence TRY_OR_FAIL
cannot work as-is. Furthermore, this commit leaves untouched the places
where MUST could be replaced by TRY_OR_FAIL.
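For comparison, in a test (the file path here is hypothetical):

    // Before: MUST aborts the whole process if the expression errors out.
    auto file = MUST(Core::File::open("/tmp/input"sv, Core::File::OpenMode::Read));

    // After: TRY_OR_FAIL only fails the current test case, provided the
    // error type has a Formatter.
    auto file2 = TRY_OR_FAIL(Core::File::open("/tmp/input"sv, Core::File::OpenMode::Read));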
Inspired by:
https://github.com/SerenityOS/serenity/pull/18710#discussion_r1186892445
"The official project language is American English […]."
5d2e915623/CONTRIBUTING.md (L30)
Here's a short statistic of the occurrences of the word "behavio(u)r":
$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
2 BEHAVIOR
24 Behaviour
32 behaviour
407 Behavior
992 behavior
Therefore, it is clear that "behaviour" (56 occurrences) should be
regarded as a typo, and "behavior" (1401 occurrences) should be
preferred.
Note that the occurrences in LibJS are intentionally NOT changed,
because they are taken verbatim from the specification. Hence:
$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
2 BEHAVIOR
10 behaviour
24 Behaviour
407 Behavior
1014 behavior
GCC 13 was released on 2023-04-26. This commit fixes Lagom build errors
when using an updated host toolchain:
- Adds a workaround for a bug in constraint handling, which made LibJS
fail to compile: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109683
- Silences the new `-Wdangling-reference` diagnostic globally. It
produces multiple false positives with no clear way to silence them
without `#pragmas`.
- Silences `-Wself-move` in `RefPtr` tests as GCC 13 adds this
previously Clang-exclusive warning.
We now null out smart pointers *before* calling unref on the pointee.
This ensures that the same smart pointer can't be used to acquire a new
reference to the pointee after its destruction has begun.
I ran into this when destroying a non-empty IntrusiveList of RefPtrs,
but the problem was more general so this fixes it for all of RefPtr,
NonnullRefPtr, OwnPtr and NonnullOwnPtr.
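The pattern, roughly (not the literal smart-pointer code):

    ~SomePtr()
    {
        // Clear the stored pointer *before* dropping the reference, so code
        // that runs during the pointee's destruction and can still see this
        // smart pointer only observes null instead of re-acquiring the dying
        // object.
        auto* pointee = exchange(m_ptr, nullptr);
        if (pointee)
            pointee->unref();
    }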
This allows accessing and looping over the path segments in a URL
without necessarily allocating a new vector if you want them percent
decoded too (which path_segment_at_index() has an option for).
This now defaults to serializing the path with percent decoded segments
(which is what all callers expect), but has an option not to. This fixes
`file://` URLs with spaces in their paths.
The name has been changed to serialize_path() to make it clearer
that this method will generate a new string on each call (except for
the cannot_be_a_base_url() case). A few callers have then been updated
to avoid repeatedly calling this function.
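A rough usage sketch (path_segment_count() and do_something() are made
up for the example; the commit itself only names serialize_path() and
path_segment_at_index()):

    URL url("file:///home/anon/my%20file.txt");
    // Default now percent-decodes the segments, so the space survives:
    auto path = url.serialize_path(); // "/home/anon/my file.txt"
    // Looping over segments without building a new Vector first:
    for (size_t i = 0; i < url.path_segment_count(); ++i)
        do_something(url.path_segment_at_index(i));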
Previously, if we copied the last byte for a length of 100, we'd
recalculate the read span 100 times and memmove one byte 100 times,
which resulted in a lot of overhead.
Now, if we know that we have two consecutive copies of the data, we just
extend the distance to cover both copies, which halves the number of
times that we recalculate the span and actually call memmove.
This takes the running time of the attached benchmark case from 150ms
down to 15ms.
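The gist, as illustrative pseudocode (copy_chunk_from_seekback() here is
a made-up helper standing in for the span recalculation plus memmove):

    size_t distance = 1;    // "copy the last byte..."
    size_t remaining = 100; // "...for a length of 100"
    while (remaining > 0) {
        auto to_copy = min(distance, remaining);
        copy_chunk_from_seekback(distance, to_copy); // one span + one memmove
        remaining -= to_copy;
        // The bytes we just wrote duplicate the source, so the repeated
        // region (and thus the next chunk we can copy) doubles in size.
        distance += to_copy;
    }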
As noted in several comments, doing this goes against the W3C spec,
and breaks parsing and then re-serializing URLs that contain
percent-encoded data that was not encoded using the same character set
as the serializer.
For example, previously if you had a URL like:
https://foo.com/what%2F%2F (the path is what + '//' percent encoded)
Creating URL("https://foo.com/what%2F%2F").serialize() would return:
https://foo.com/what//
Which is incorrect and not the same as the URL we passed. This is
because the re-serializing uses the PercentEncodeSet::Path which
does not include '/'.
Only doing the percent encoding in the setters fixes this, which
is required to navigate to Google Street View (which includes a
percent encoded URL in its URL).
Seems to fix #13477 too
`vformat()` can now accept format specifiers of the form
{:'[numeric-type]}. This will output a number with a comma separator
every 3 digits.
For example:
`dbgln("{:'d}", 9999999);` will output 9,999,999.
Binary, octal and hexadecimal numbers can also use this feature, for
example:
`dbgln("{:'x}", 0xffffffff);` will output ff,fff,fff.