The web specs do not expect decoding or decoding to happen when calling
these helpers. This allows us to remove the raw_fragment helper function
from the URL class.
This encoder can handle all integer formats and sample rates, though
only two channels well. It uses fixed LPC and performs a
close-to-optimal parameter search on the LPC order and residual Rice
parameter, leading to decent compression already.
Just like with input buffered streams, we don't currently have a use
case for output buffered streams which aren't seekable, since the main
application are files.
To ensure this happens without duplicating code, we allow forcing a
StringBuilder object to only use the inline buffer, so the code in the
AK/Format.cpp file doesn't need to deal with different underlying
storage types (expandable or inline-fixed) at all.
I couldn't run the parser in a debugger like I normally would, so I
added printouts to understand where the parser is failing.
More could be added, but these are enough to get a good idea of what
the parser is doing. It's very spammy, though, so enable it by flicking
the IMAP_PARSER_DEBUG switch :^)
Instead, use the FixedCharBuffer class to ensure we always use a static
buffer storage for these names. This ensures that if a Process or a
Thread were created, there's a guarantee that setting a new name will
never fail, as only copying of strings should be done to that static
storage.
The limits which are set are 32 characters for processes' names and 64
characters for thread names - this is because threads' names could be
more verbose than processes' names.
This class encapsulates a fixed Array with compile-time size definition
for storing ASCII characters.
There are also new Kernel StdLib functions to copy user data into such
objects so this class will be useful later on.
The spec indicates we should support serializing opaque hosts, but we
were assuming the host contained a String. Opaque hosts are represented
with Empty. Return an empty string here instead to prevent crashing on
an invalid variant access.
Now that ""_string is infallible, the only benefit of explicitly
constructing a short string is the ability to do it at compile-time. But
we never do that, so let's simplify the API and remove this
implementation detail from it.
This could happen if a sequence of '0' parts was followed by a longer
sequence of '0' parts at the end of the host. The first sequence was
being used for the compress, and not the second.
For example, [1:1:0:0:1:0:0:0] was being serialized as: [1:1::1:0:0:0]
instead of [1:1:0:0:1::].
Fix this by checking at the end of the loop if we are in the middle of a
sequence of '0' parts that is longer than the current longest.
Everywhere only ever expects percent encoding to occur, so let's just
remove this flag altogether. At the same time, replace some
DeprecatedString with StringView.
Parsing 'data:' URLs took it's own route. It never set standard URL
fields like path, query or fragment (except for scheme) and instead
gave us separate methods called `data_payload()`, `data_mime_type()`,
and `data_payload_is_base64()`.
Because parsing 'data:' didn't use standard fields, running the
following JS code:
new URL('#a', 'data:text/plain,hello').toString()
not only cleared the path as URLParser doesn't check for data from
data_payload() function (making the result be 'data:#a'), but it also
crashes the program because we forbid having an empty MIME type when we
serialize to string.
With this change, 'data:' URLs will be parsed like every other URLs.
To decode the 'data:' URL contents, one needs to call process_data_url()
on a URL, which will return a struct containing MIME type with already
decoded data! :^)
By not clearing the buffer, we were leaking the path part of a URL into
the query for URLs without an authority component (no '//host').
This could be seen most noticeably in mailto: URLs with header fields
set, as the query part of `mailto:user@example.com?subject=test` was
parsed to `user@example.comsubject=test`.
data: URLs didn't have this problem, because we have a special case for
parsing them.
This is defined in the spec, but was missing in our table. Fix this, and
add a spec comment for what is missing. Also begin a basic text based
test for URL, so we can get some coverage of LibWeb's usage of URL too.
This takes the previous alternation optimisation and applies it to all
the alternation blocks instead of just the few instructions at the
start.
By generating a trie of instructions, all logically equivalent
instructions will be consolidated into a single node, allowing the
engine to avoid checking the same thing multiple times.
For instance, given the pattern /abc|ac|ab/, this optimisation would
generate the following tree:
- a
| - b
| | - c
| | | - <accept>
| | - <accept>
| - c
| | - <accept>
which will attempt to match 'a' or 'b' only once, and would also limit
the number of backtrackings performed in case alternatives fails to
match.
This optimisation is currently gated behind a simple cost model that
estimates the number of instructions generated, which is pessimistic for
small patterns, though the change in performance in such patterns is not
particularly large.
Similar to floor and ceil, we were forcing the values to be doubles here
Also adds a big FIXME about my findings trying to add a general
implementation for these
Both GCC and Clang inline this function to use bit-wise logic and/or
appropriate instructions even on -O0 and allow their use in a constexpr
context, see
https://godbolt.org/z/de1393vha
In order to follow spec text to achieve this, we need to change the
underlying representation of a host in AK::URL to deserialized format.
Before this, we were parsing the host and then immediately serializing
it again.
Making that change resulted in a whole bunch of fallout.
After this change, callers can access the serialized data through
this concept-host-serializer. The functional end result of this
change is that IPv6 hosts are now correctly serialized to be
surrounded with '[' and ']'.
This implementation will allow us to fix serialization of IPv6
addresses not being surrounded by '[' and ']'.
Nothing is calling this function yet - this will come in the next
(larger) commit where the underlying host representation inside of
AK::URL is changed from DeprecatedString to URL::Host.
This doesn't seem trivial enough to be defining in the header like this,
and should not be a performance critical function anyhow.
Also add spec comments while we are at it, and a FIXME since we do not
seem to exactly align.
And use them where applicable. This will allow us to store the host in
the deserialized format as the spec specifies.
Ideally these typdefs would instead be the existing AK interfaces, but
in the meantime, we can just use this.
These methods are slightly more convenient than storing the Bytes
separately. However, it it feels unsanitary to reach in and access this
data directly. Both of the users of these already have the
[Readonly]Bytes available in their constructors, and can easily avoid
using these methods, so let's remove them entirely.