Also add some tests that ensure that the input and output streams match
each other, because I can't wrap my head around what the internal
representation looks like.
Since AK can't refer to LibUnicode directly, the strategy here is that
if you need case transformations, you can link LibUnicode and receive
them. If you try to use either of these methods without linking it, then
you'll of course get a linker error (note we don't do any fallbacks to
e.g. ASCII case transformations). If you don't need these methods, you
don't have to link LibUnicode.
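As a rough sketch of that arrangement (file and function names below are made up for illustration, they are not the real AK/LibUnicode symbols): the base library only declares the hook, and the Unicode library provides the one and only definition.
```cpp
// case_transform.h -- shipped with the base library (stands in for AK).
// Declaration only: there is deliberately no definition and no ASCII
// fallback here, so calling this without linking the Unicode library
// (stands in for LibUnicode) fails at link time.
#pragma once
#include <string>

std::string unicode_to_lowercase(std::string const& input);

// case_transform.cpp -- compiled only into the Unicode library.
#include "case_transform.h"

std::string unicode_to_lowercase(std::string const& input)
{
    // The full Unicode case mapping based on the UCD tables would live here.
    return input; // placeholder body for this sketch
}
```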
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so
let's rename it to A) match the name of DeprecatedString, and B) make
room for a new FlyString class that is tied to String.
This makes construction of Utf16String fallible in OOM conditions. The
immediate impact is that PrimitiveString must then be fallible as well,
as it may either transcode UTF-8 to UTF-16, or create a UTF-16 string
from ropes.
There are a couple of places where it is very non-trivial to propagate
the error further. A FIXME has been added to those locations.
This allows us to make all comparison operators on the class constexpr
without pulling in a bunch of boilerplate. We don't use the `<compare>`
header because it doesn't compile in the main serenity cross-build due
to the include paths to LibC being incompatible with how libc++ expects
them to be for clang builds.
Simplify a lot of uses of ElapsedTimer by converting callers from
elapsed() to elapsed_time(), as the AK::Time returned is better for unit
conversions and comparisons against constants.
Previously, if a pattern matched the empty string (e.g. ".*"), it would
match the string twice instead of once. Among other issues, this caused
a Regex replacement to duplicate its expected output, since it would
replace "both" empty matches.
These instances were detected by searching for files that include
stdlib.h, but don't match the regex:
\\b(_abort|abort|abs|aligned_alloc|arc4random|arc4random_buf|arc4random_
uniform|atexit|atof|atoi|atol|atoll|bsearch|calloc|clearenv|div|div_t|ex
it|_Exit|EXIT_FAILURE|EXIT_SUCCESS|free|getenv|getprogname|grantpt|labs|
ldiv|ldiv_t|llabs|lldiv|lldiv_t|malloc|malloc_good_size|malloc_size|mble
n|mbstowcs|mbtowc|mkdtemp|mkstemp|mkstemps|mktemp|posix_memalign|posix_o
penpt|ptsname|ptsname_r|putenv|qsort|qsort_r|rand|RAND_MAX|random|reallo
c|realpath|secure_getenv|serenity_dump_malloc_stats|serenity_setenv|sete
nv|setprogname|srand|srandom|strtod|strtof|strtol|strtold|strtoll|strtou
l|strtoull|system|unlockpt|unsetenv|wcstombs|wctomb)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use anything from the stdlib.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
These instances were detected by searching for files that include
AK/StdLibExtras.h, but don't match the regex:
\\b(abs|AK_REPLACED_STD_NAMESPACE|array_size|ceil_div|clamp|exchange|for
ward|is_constant_evaluated|is_power_of_two|max|min|mix|move|_RawPtr|RawP
tr|round_up_to_power_of_two|swap|to_underlying)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any "extra stdlib" functions.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
These instances were detected by searching for files that include
AK/Format.h, but don't match the regex:
\\b(CheckedFormatString|critical_dmesgln|dbgln|dbgln_if|dmesgln|FormatBu
ilder|__FormatIfSupported|FormatIfSupported|FormatParser|FormatString|Fo
rmattable|Formatter|__format_value|HasFormatter|max_format_arguments|out
|outln|set_debug_enabled|StandardFormatter|TypeErasedFormatParams|TypeEr
asedParameter|VariadicFormatParams|v_critical_dmesgln|vdbgln|vdmesgln|vf
ormat|vout|warn|warnln|warnln_if)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any formatting functions.
Observe that this revealed that Userland/Libraries/LibC/signal.cpp is
missing an include.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
In practice, this function does not take any perceptible amount of time.
However, this benchmark demonstrates that for extreme values, the
internal for-loop does matter.
Using policy-based design, `SinglyLinkedList` and
`SinglyLinkedListWithCount` can be combined into one class which takes
a policy to determine how to keep track of the size of the list. The
default policy is to use list iteration to count the items in the list
each time. The `WithCount` form is a different policy which tracks the
size, but comes with the overhead of storing the count and
incrementing/decrementing on each modification.
This model is extensible to have other forms of counting by
implementing only a new policy instead of implementing a totally new
type.
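A minimal sketch of the policy idea, using hypothetical names rather than the actual AK types:
```cpp
#include <cstddef>
#include <utility>

// Policy 1: no stored count; size() walks the list each time.
struct CountByIteration {
    void did_insert() { }
    void did_remove() { }
    template<typename List>
    size_t size(List const& list) const
    {
        size_t count = 0;
        for (auto* node = list.head(); node; node = node->next)
            ++count;
        return count;
    }
};

// Policy 2: pay for a stored count that is updated on every modification.
struct CountStored {
    void did_insert() { ++m_count; }
    void did_remove() { --m_count; }
    template<typename List>
    size_t size(List const&) const { return m_count; }
    size_t m_count { 0 };
};

template<typename T, typename SizePolicy = CountByIteration>
class SinglyLinkedListSketch {
public:
    struct Node {
        T value;
        Node* next { nullptr };
    };

    Node const* head() const { return m_head; }

    void prepend(T value)
    {
        m_head = new Node { std::move(value), m_head };
        m_policy.did_insert();
    }

    size_t size() const { return m_policy.size(*this); }

    // (Destructor, removal, iteration etc. omitted for brevity.)

private:
    Node* m_head { nullptr };
    SizePolicy m_policy;
};

// The "with count" form is just a different policy, not a whole new type.
template<typename T>
using SinglyLinkedListWithCountSketch = SinglyLinkedListSketch<T, CountStored>;
```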
In 7c5e30daaa, the focus was "only" on
Userland/Libraries/, whereas this commit cleans up the remaining
headers in the repo, and any new badly-formatted include.
This allows you to enter TRUE or FALSE in a SQL statement for BOOLEAN
types. Note that this differs from SQLite, which requires entering 1 or
0 for BOOLEANs; having explicit keywords feels a bit more natural.
We were already handling the rmdir("..") case by refusing to remove
directories that were not empty.
This patch removes a FIXME from January 2019 and adds a test. :^)
Dr. POSIX says that we should reject attempts to rmdir() the file named
"." so this patch does exactly that. We also add a test.
This solves a FIXME from January 2019. :^)
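A minimal sketch of such a test in plain POSIX terms (not the exact LibTest macros), relying on the EINVAL error that POSIX prescribes for rmdir("."):
```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

int main()
{
    // POSIX: rmdir() shall fail with EINVAL when the final path component is ".".
    errno = 0;
    int rc = rmdir(".");
    assert(rc == -1);
    assert(errno == EINVAL);
    return 0;
}
```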
The underlying reason is an unconditional call to consume(), even if
there is no reason to expect that the string continues.
This crash was discovered by OSS-Fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=42354
This bug has existed since the code was first written in April 2021:
13abbc5ea8
This old symbolication exploit proof of concept doesn't link on aarch64
with the Clang toolchain. We should revisit these as a whole to see if
they're worth keeping around in general.
The class is very similar to `CircularDuplexStream` in its behavior.
Main differences are that `CircularBuffer`:
- does not inherit from `AK::Stream`
- uses `ErrorOr` for its API
- is heap allocated (and OOM-Safe)
This patch also adds some tests.
Previously any backslash and the character following it were ignored.
This commit adds a fall through to match the character following the
backslash without checking whether it is "special".
This allows callers to use the following semantics:
using MyVariant = Variant<Empty, int>;
template<typename T>
size_t size() { return TypeList<T>::size; }
auto s = size<MyVariant>();
This will be needed for an upcoming IPC change, which will result in us
knowing the Variant type, but not the underlying variadic types that the
Variant holds.
If a negative value ends up in one of the arguments for an invoked
function, we don't want to cast it from a floating point type to an
unsigned type. This fixes a float-cast-overflow UBSAN error on macOS
with llvm 15.0.6.
Rather than trying to assume the only two C libraries on Linux are musl
and glibc, this solution fixes musl builds by explicitly checking for
the one C library function we are overwriting.
That being said, we should find another solution to retrieving this
error information from crashing tests. Possibly just overriding the
SIGABRT handler would work. The full solution might require checking
stderr as well as stdout in the test driver though.
This is a first step towards handling OOM errors instead of just
crashing the program.
Now UDPServer's `receive()` method returns memory allocation
errors explicitly with the help of ErrorOr.
This removes one FIXME and makes a bunch of new ones. :(
Adapt BMPImageDecoderPlugin to support BMP images included in ICOns.
ICOImageDecoderPlugin now uses BMPImageDecoderPlugin to decode all
BMP images instead of its own ad-hoc decoder which only supported
32 bpp BMPs.
`OwnPtrWithCustomDeleter` was a decorator which provided the ability
to add a custom deleter to `OwnPtr` by wrapping and taking the deleter
as a run-time argument to the constructor. This solution means that no
additional space is needed for the `OwnPtr` because it doesn't need to
store a pointer to the deleter, but comes at the cost of having an
extra type that stores a pointer for every instance.
This logic is moved directly into `OwnPtr` by adding a template
argument that is defaulted to the default deleter for the type. This
means that the type itself stores the pointer to the deleter instead
of every instance and adds some type safety by encoding the deleter in
the type itself instead of taking a run-time argument.
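A condensed sketch of the approach (nowhere near the real AK::OwnPtr, but it shows where the deleter now lives):
```cpp
#include <cstdio>
#include <utility>

template<typename T>
struct DefaultDelete {
    void operator()(T* ptr) const { delete ptr; }
};

// The deleter is part of the type, so no per-instance deleter pointer is stored.
template<typename T, typename Deleter = DefaultDelete<T>>
class OwnPtrSketch {
public:
    OwnPtrSketch() = default;
    explicit OwnPtrSketch(T* ptr)
        : m_ptr(ptr)
    {
    }
    ~OwnPtrSketch()
    {
        if (m_ptr)
            Deleter {}(m_ptr);
    }

    OwnPtrSketch(OwnPtrSketch const&) = delete;
    OwnPtrSketch& operator=(OwnPtrSketch const&) = delete;
    OwnPtrSketch(OwnPtrSketch&& other)
        : m_ptr(std::exchange(other.m_ptr, nullptr))
    {
    }

    T* ptr() const { return m_ptr; }

private:
    T* m_ptr { nullptr };
};

// A custom deleter for a C resource, selected purely through the type.
struct FileCloser {
    void operator()(FILE* file) const
    {
        if (file)
            fclose(file);
    }
};
using OwnedFile = OwnPtrSketch<FILE, FileCloser>;
```
Because the deleter is baked into the type, an OwnedFile and a plain OwnPtrSketch<FILE> are distinct types, which is exactly the added type safety described above.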
This syscall will be used later on to ensure we can declare virtual
memory mappings as immutable (which means that the underlying Region is
basically immutable for both future annotations and changes to its
protection bits).
This is to differentiate between the upcoming `AllocatingMemoryStream`,
which automatically allocates memory as needed instead of operating on a
static memory area.
Currently, integers are stored in LibSQL as 32-bit signed integers, even
if the provided type is unsigned. This resulted in a series of unchecked
unsigned-to-signed conversions, and prevented storing 64-bit values.
Further, mathematical operations were performed without similar checks,
and without checking for overflow.
This changes SQL::Value to behave like SQLite for INTEGER types. In
SQLite, the INTEGER type does not imply a size or signedness of the
underlying type. Instead, SQLite determines on-the-fly what type is
needed as values are created and updated.
To do so, the SQL::Value variant can now hold an i64 or u64 integer. If
a specific type is requested, invalid conversions are now explicitly an
error (e.g. converting a stored -1 to a u64 will fail). When binary
mathematical operations are performed, we now try to coerce the RHS
value to a type that works with the LHS value, failing the operation if
that isn't possible. Any overflow or invalid operation (e.g. bitshifting
a 64-bit value by more than 64 bits) is an error.
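A standalone sketch of the coercion-and-overflow rule for addition (deliberately simplified; the real SQL::Value supports more types and operations):
```cpp
#include <cstdint>
#include <optional>
#include <variant>

using SqlInt = std::variant<int64_t, uint64_t>;

// Coerce the RHS towards the signedness of the LHS, then add with overflow
// checking. Any impossible coercion or overflow fails the whole operation.
std::optional<SqlInt> checked_add(SqlInt lhs, SqlInt rhs)
{
    if (std::holds_alternative<int64_t>(lhs)) {
        auto a = std::get<int64_t>(lhs);
        int64_t b = 0;
        if (auto* signed_rhs = std::get_if<int64_t>(&rhs)) {
            b = *signed_rhs;
        } else {
            auto unsigned_rhs = std::get<uint64_t>(rhs);
            if (unsigned_rhs > static_cast<uint64_t>(INT64_MAX))
                return std::nullopt; // RHS cannot be represented next to a signed LHS.
            b = static_cast<int64_t>(unsigned_rhs);
        }
        int64_t result = 0;
        if (__builtin_add_overflow(a, b, &result))
            return std::nullopt; // Overflow is an error, not silent wrap-around.
        return SqlInt { result };
    }

    auto a = std::get<uint64_t>(lhs);
    if (auto* signed_rhs = std::get_if<int64_t>(&rhs); signed_rhs && *signed_rhs < 0)
        return std::nullopt; // A negative RHS does not coerce to u64 in this sketch.
    uint64_t b = std::holds_alternative<int64_t>(rhs)
        ? static_cast<uint64_t>(std::get<int64_t>(rhs))
        : std::get<uint64_t>(rhs);
    uint64_t result = 0;
    if (__builtin_add_overflow(a, b, &result))
        return std::nullopt;
    return SqlInt { result };
}
```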
In the long run, this is obviously a bad way to handle version changes
to the SQL database files. We will want to migrate old databases to new
formats. Until we figure out a good way to do that, wipe old databases
so that we don't crash trying to read incompatible data.
This constructor was easily confused with a copy constructor, and it was
possible to accidentally copy-construct Objects in at least one way that
we discovered (via generic ThrowCompletionOr construction).
This patch adds a mandatory ConstructWithPrototypeTag parameter to the
constructor to disambiguate it.
Note that this still keeps the old behaviour of putting things in std by
default on serenity so the tools can be happy, but if USING_AK_GLOBALLY
is unset, AK behaves like a good citizen and doesn't try to put things
in the ::std namespace.
std::nothrow_t and its friends get to stay because I'm being told that
compilers assume things about them and I can't yeet them into a
different namespace...for now.
Implement insertion sort in AK. The cutoff value 7 is a magic number
here, values [5, 15] should work well. Main idea of the cutoff is to
reduce recursion performed by quicksort to speed up sorting
of small partitions.
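A simplified sketch of the interaction (not AK's actual sorting code), with the cutoff applied to small partitions:
```cpp
#include <utility>
#include <vector>

constexpr int insertion_sort_cutoff = 7; // magic number; anything in [5, 15] works well

template<typename T>
void insertion_sort(std::vector<T>& values, int from, int to)
{
    for (int i = from + 1; i <= to; ++i) {
        T key = std::move(values[i]);
        int j = i - 1;
        while (j >= from && key < values[j]) {
            values[j + 1] = std::move(values[j]);
            --j;
        }
        values[j + 1] = std::move(key);
    }
}

template<typename T>
void quick_sort(std::vector<T>& values, int from, int to)
{
    // Small partition: insertion sort avoids any further recursion.
    if (to - from + 1 <= insertion_sort_cutoff) {
        insertion_sort(values, from, to);
        return;
    }
    T pivot = values[from + (to - from) / 2];
    int i = from;
    int j = to;
    while (i <= j) {
        while (values[i] < pivot)
            ++i;
        while (pivot < values[j])
            --j;
        if (i <= j)
            std::swap(values[i++], values[j--]);
    }
    if (from < j)
        quick_sort(values, from, j);
    if (i < to)
        quick_sort(values, i, to);
}

template<typename T>
void sort(std::vector<T>& values)
{
    if (!values.empty())
        quick_sort(values, 0, static_cast<int>(values.size()) - 1);
}
```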
This generally seems like a better name, especially if we somehow also
need a better name for "read the entire buffer, but not the entire file"
somewhere down the line.
Next to functions like `is_eof` these were really confusing to use, and
the `read`/`write` functions should fail anyway if a stream is not
readable/writable.
`Core::Stream::File` shouldn't hold any utility methods that are
unrelated to constructing a `Core::Stream`, so let's just replace the
existing `Core::File::exists` with the nicer looking implementation.
Three standalone Cell creation functions remain in the JS namespace:
- js_bigint()
- js_string()
- js_symbol()
All of them are leftovers from early iterations when LibJS still took
inspiration from JSC, which itself has jsString(). Nowadays, we pretty
much exclusively use static create() functions to construct types
allocated on the JS heap, and there's no reason to not do the same for
these.
Also change the return type from BigInt* to NonnullGCPtr<BigInt> while
we're here.
This is patch 1/3, replacement of js_string() and js_symbol() follow.
This partially implements SQLite's bind-parameter expression to support
indicating placeholder values in a SQL statement. For example:
INSERT INTO table VALUES (42, ?);
In the above statement, the '?' identifier is a placeholder. This will
allow clients to compile statements a single time while running those
statements any number of times with different placeholder values.
Further, this will help mitigate SQL injection attacks.
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.
Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.
Problems in DeprecatedString:
- It assumes string allocation never fails. This makes it impossible
to use in allocation-sensitive contexts, and is the reason we had to
ban DeprecatedString from the kernel entirely.
- The awkward null state. DeprecatedString can be null. It's different
from the empty state, although null strings are considered empty.
All code is immediately nicer when using Optional<DeprecatedString>
but DeprecatedString came before Optional, which is how we ended up
like this.
- The encoding of the underlying data is ambiguous. For the most part,
we use it as if it's always UTF-8, but there have been cases where
we pass around strings in other encodings (e.g. ISO8859-1)
- operator[] and length() are used to iterate over DeprecatedString one
byte at a time. This is done all over the codebase, and will *not*
give the right results unless the string is all ASCII.
How we solve these issues in the new String:
- Functions that may allocate now return ErrorOr<String> so that ENOMEM
errors can be passed to the caller.
- String has no null state. Use Optional<String> when needed.
- String is always UTF-8. This is validated when constructing a String.
We may need to add a bypass for this in the future, for cases where
you have a known-good string, but for now: validate all the things!
- There is no operator[] or length(). You can get the underlying data
with bytes(), but for iterating over code points, you should be using
a UTF-8 iterator.
Furthermore, it has two nifty new features:
- String implements a small string optimization (SSO) for strings that
can fit entirely within a pointer. This means up to 3 bytes on 32-bit
platforms, and 7 bytes on 64-bit platforms. Such small strings will
not be heap-allocated.
- String can create substrings without making a deep copy of the
substring. Instead, the superstring gets +1 refcount from the
substring, and it acts like a view into the superstring. To make
substrings like this, use the substring_with_shared_superstring() API.
One caveat:
- String does not guarantee that the underlying data is null-terminated
like DeprecatedString does today. While this was nifty in a handful of
places where we were calling C functions, it did stand in the way of
shared-superstring substrings.
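A short usage sketch of the API shape described above (using the names from this commit; exact signatures may differ slightly):
```cpp
#include <AK/Optional.h>
#include <AK/String.h>
#include <AK/Try.h>

ErrorOr<void> string_example()
{
    // Allocation can fail, so construction returns ErrorOr<String>.
    auto greeting = TRY(String::from_utf8("hello friends"sv));

    // No null state; use Optional<String> when "no value" is meaningful.
    Optional<String> maybe_nickname;

    // A substring can share the superstring's data instead of deep-copying it.
    auto word = TRY(greeting.substring_with_shared_superstring(0, 5));
    (void)word;

    // Iterate over code points rather than bytes.
    for (auto code_point : greeting.code_points())
        (void)code_point;

    return {};
}
```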
This will make it easier to support both string types at the same time
while we convert code, and to track down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This now prepares all the needed (fallible) components before actually
constructing a LoaderPlugin object, so we are no longer filling them in
at an arbitrary later point in time.
Database::get_table currently either returns a RefPtr to an existing
table, a nullptr if the table doesn't exist, or an Error if some
internal error occurred. Change this to return a NonnullRefPtr to an
existing table, or a SQL::Result with any error, including if the
table was not found. Callers can then handle that specific error code
if they want.
Returning a NonnullRefPtr will enable some further cleanup. This had
some fallout of needing to change some other methods' return types from
AK::ErrorOr to SQL::Result so that TRY may continue to be used.
Database::get_schema currently either returns a RefPtr to an existing
schema, a nullptr if the schema doesn't exist, or an Error if some
internal error occurred. Change this to return a NonnullRefPtr to an
existing schema, or a SQL::Result with any error, including if the
schema was not found. Callers can then handle that specific error code
if they want.
Returning a NonnullRefPtr will enable some further cleanup. This had
some fallout of needing to change some other methods' return types from
AK::ErrorOr to SQL::Result so that TRY may continue to be used.
After splitting a node, the new node was written to the same pointer as
the current node - probably a copy / paste error. This new code requires
a `.pointer() -> u32` to exist on the object to be serialized,
preventing this issue from happening again.
Fixes #15844.
The Demuxer class was changed to return errors for more functions so
that all of the underlying reading can be done lazily. Other than that,
the demuxer interface is unchanged, and only the underlying reader was
modified.
The MatroskaDocument class is no more, and MatroskaReader's getter
functions replace it. Every MatroskaReader getter beyond the Segment
element's position is parsed lazily from the file as needed. This means
that all getter functions can return DecoderErrors which must be
handled by callers.
Matroska::Reader functions now return DecoderErrorOr instead of values
being declared Optional. Useful errors can be handled by the users of
the parser, similarly to the VP9 decoder. Much of the error checking
in the reader is a lot cleaner thanks to this change, since all reads
can be range checked in Streamer::read_octet() now.
Most functions for the Streamer class are now also out-of-line in
Reader.cpp instead of residing in the header.
As new demuxers are added, this will get quite full of files, so it'll
be good to have a separate folder for these.
To avoid too many chained namespaces, the Containers subdirectory is
not also a namespace, but the Matroska folder is, for the sake of
keeping the multiple classes of parsed information separate from the
Video namespace.
These functions are now implemented in terms of getpwent_r() which
allows us to remove two FIXMEs about global variable shenanigans.
I'm also adding tests for both APIs. :^)
This means that rather than this:
```
AK_TYPEDEF_DISTINCT_NUMERIC_GENERAL(u64, true, true, false, false,
false, true, FunctionAddress);
```
We now have this:
```
AK_TYPEDEF_DISTINCT_NUMERIC_GENERAL(u64, FunctionAddress, Arithmetic,
Comparison, Increment);
```
Which is a lot more readable. :^)
Co-authored-by: Ali Mohammad Pur <mpfard@serenityos.org>
When calling clear_with_capacity on an empty HashTable/HashMap, a null
deref would occur when trying to memset() m_buckets. Checking that it
has capacity before clearing fixes the issue.
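In essence the fix is a guard before touching the bucket storage; a reduced sketch, not the actual AK::HashTable member:
```cpp
#include <cstddef>
#include <cstring>

struct BucketArraySketch {
    unsigned char* buckets { nullptr };
    size_t capacity_in_bytes { 0 };
    size_t size { 0 };

    void clear_with_capacity()
    {
        // A default-constructed table owns no buckets yet, so clearing it must be
        // a no-op instead of a memset() on a null pointer.
        if (capacity_in_bytes == 0)
            return;
        memset(buckets, 0, capacity_in_bytes);
        size = 0;
    }
};
```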
Currently, the floating point to string conversion is implemented
several times across the codebase. This commit provides a pretty
low-level function to unify all such conversions. It converts the
given double to a fixed point decimal satisfying a few correctness
criteria.
Otherwise, we end up propagating those dependencies into targets that
link against that library, which creates unnecessary link-time
dependencies.
Also included are changes to re-add now-missing dependencies to tools
that actually need them.
Even though the toolchain implicitly links against -lc, it does not know
where it should get LibC from except for the sysroot. In the case of
Clang this causes it to pick up the LibC stub instead, which might be
slightly outdated and missing some symbols.
This is currently not an issue that manifests because we pass through
the dependency on LibC and other libraries by accident, which causes
CMake to link against the LibC target (instead of just the library),
and thus points the linker at the build output directory.
Since we are looking to fix that in the upcoming commits, let's make
sure that everything will still be able to find the proper LibC first.
The class is virtual and has one subclass, SubsampledYUVFrame, which
is used by the VP9 decoder to return a single frame. The
output_to_bitmap(Bitmap&) function can be used to set pixels on an
existing bitmap of the correct size to the RGB values that
should be displayed. The to_bitmap() function will allocate a new bitmap
and fill it using output_to_bitmap.
This new class also implements bilinear scaling of the subsampled U and
V planes so that subsampled videos' colors will appear smoother.
Because we still support u64 and i64 (on top of i32 and u32), we do
still have to parse the number ourselves first. Then, if we determine
that the number is a floating point or is outside the range of i64 and
u64, we fall back and parse it as a double.
Previously, JsonParser had #ifdefs guarding the double computation; now
we #error on #ifdef KERNEL instead, so JsonParser is no longer usable
in the Kernel. This can be remedied fairly easily, but since it is not
needed, we #error on that for now.
Similar to decimal floating point parsing, the current strtod hex float
parsing gives a lot of incorrect results. We can use a similar technique
as with decimal parsing; however, hex floats are much simpler, as we
don't need to scale with a power of 5.
For hex floats we just provide the parse_first_hexfloat API as there is
currently no need for a parse_hexfloat_completely API.
Again the accepted input for parse_first_hexfloat is very lenient and
any validation should be done before calling this method.
This is based on the paper by Daniel Lemire called
"Number parsing at a Gigabyte per second", currently available at
https://arxiv.org/abs/2101.11408
An implementation can be found at
https://github.com/fastfloat/fast_float
To support both strtod-like methods and String::to_double, we have two
different APIs. parse_first_floating_point gives back the result, the
next character to read, and the error/out-of-range status. Out of range
here means we rounded to infinity or to 0.
The other API, parse_floating_point_completely, will return a floating
point only if the given character range contains just the floating point
and nothing else. This can be much faster as we can skip actually
computing the value if we notice we did not parse the whole range.
Both of these APIs support a very lenient format, to be usable in as
many places as possible. Also, they do not check for "named" values like
"nan", "inf", "NAN" etc., because this can be different for every usage.
For integers and small values this new method is not faster and often
even a tiny bit slower than the current strtod implementation. However
the strtod implementation is wrong for a lot of values and has a much
less predictable running time.
For correctness this method was tested against known string -> double
datasets from https://github.com/nigeltao/parse-number-fxx-test-data
This method gives 100% accuracy.
The old strtod gave an incorrect value in over 50% of the numbers
tested.
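A hedged usage sketch: parse_first_floating_point is the entry point named above, while the header path, the template parameter, and the result's field names are assumptions to check against the actual AK code.
```cpp
#include <AK/FloatingPointStringConversions.h> // assumed header name
#include <AK/StringView.h>

double leniently_parse_prefix(StringView input)
{
    auto const* start = input.characters_without_null_termination();
    auto result = parse_first_floating_point<double>(start, start + input.length());
    // `result` reports the parsed value, where parsing stopped, and whether the
    // value was out of range (i.e. rounded to infinity or to 0); any validation
    // of the input format has to happen before this call.
    return result.value; // field name is illustrative
}
```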
By appending individual bytes as code points, we were "breaking apart"
multi-byte UTF-8 code points. This now behaves the same way as the
invert_case() helper in StringUtils.
Commit c3fd455 changed LibTimeZone to fall back to the system time zone
when we fail to parse the TZ environment variable. This behavior differs
from both our LibC and glibc; they abort parsing and default to UTC.
This changes LibTimeZone to behave the same way to avoid a very awkward
situation where some parts of the codebase think the timezone is UTC,
and others think the timezone is whatever /etc/timezone indicates.
According to the spec, pointers to client data need to be dereferenced
immediately when adding calls such as `glDrawElements` or
`glArrayElement` to a display list. We were trying to support display
lists for these calls but since they only invoke _other_ calls that also
support display lists, we can simply defer the display list
functionality to them.
This fixes the rendering of the ClassiCube port by cflip.
Instead of just having a giant KBuffer that is not easily resizable, we
use multiple AnonymousVMObjects in one Vector to store the data.
The idea is to not have to do giant memcpy or memset each time we need
to allocate or de-allocate memory for TmpFS inodes, but instead, we can
allocate only the desired block range when trying to write to it.
Therefore, it is also possible to have data holes in the inode content
when writing skips an entire set of one or more data blocks, thus making
memory usage much more efficient.
To ensure we don't run out of virtual memory range, don't allocate a
Region in advance for each TmpFSInode, but instead try to allocate a
Region per IO operation, and then use that Region to map the VMObjects
in the IO loop.
Currently, the Value class is essentially a "pImpl" wrapper around the
ValueImpl hierarchy of classes. This is a bit difficult to follow and
reason about, as methods jump between the Value class and its impl
classes.
This changes the Variant held by Value to instead store the specified
types (String, int, etc.) directly. In doing so, the ValueImpl classes
are removed, and all methods are now just concise Variant visitors.
As part of this rewrite, support for the "array" type is dropped (or
rather, just not re-implemented) as it was unused. If it's needed in the
future, support can be re-added.
This does retain the ability for non-NULL types to store NULL values
(i.e. an empty Optional). I tried dropping this support as well, but it
is depended upon by the on-disk storage classes in non-trivial ways.
Even though this almost certainly wouldn't run properly even if we had
a working kernel for AARCH64, this at least lets us build all the
userland binaries.
The only complication here is that Core::Stream::File is not RefCounted,
meaning we have to use OwnPtr instead of RefPtr.
Unfortunately we cannot propagate errors as some errors must be caught
and dealt with as the runner can do anything (like stop at any moment
or close pipes).
Without this the runner is waiting for new tests which will never come
and test-test262 is waiting for output which never comes since the
runner is blocked.
Also finish off a comment, and make the variables follow serenity style.
Previously, some integer overflows and truncations were causing parsing
errors for 4K videos; with those fixed, it can fully decode 8K video.
This adds a test to ensure that 4K video will continue to be decoded.
Note: There seems to be unexpectedly high memory usage while decoding
them, causing 8K video to require more than a gigabyte of RAM. (!!!)
The relevant RFC section from
https://www.rfc-editor.org/rfc/rfc7932#section-9.2
MSKIPBYTES * 8 bits: MSKIPLEN - 1, where MSKIPLEN is
the number of metadata bytes; this field is
only present if MSKIPBYTES is positive;
otherwise, MSKIPLEN is 0 (if MSKIPBYTES is
greater than 1, and the last byte is all
zeros, then the stream should be rejected as
invalid)
So when skip_bytes is zero we need to break and
re-align bytes.
Added the relevant test case that demonstrates this from:
https://github.com/google/brotli/blob/master/tests/testdata/x.compressed
If the entire string you want to right-trim consists of characters you
want to remove, we previously would incorrectly leave the first
character there.
For example: `trim("aaaaa", "a")` would return "a" instead of "".
We can't use `i >= 0` in the loop since that would fail to detect
underflow, so instead we keep `i` in the range `size .. 1` and then
subtract 1 from it when reading the character.
Added some trim() tests while I was at it. (And to confirm that this was
the issue.)
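The index trick reads roughly like this standalone sketch (not the actual StringUtils code):
```cpp
#include <cstddef>
#include <string_view>

std::string_view trim_right(std::string_view string, std::string_view characters)
{
    // Keep `i` in the range size .. 1 and subtract 1 when indexing, so the loop
    // terminates cleanly (no unsigned underflow) even when everything is trimmed.
    for (size_t i = string.size(); i > 0; --i) {
        if (characters.find(string[i - 1]) == std::string_view::npos)
            return string.substr(0, i);
    }
    return string.substr(0, 0); // the whole string consisted of trimmed characters
}
```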
This test file had #ifdef macros at the top that caused none of the
content to be compiled unless a developer manually wanted to run the
specific benchmarks within. As such, it has become stale. Remove it for
now; if someone wants to restore it in an always-runnable state, we can
restore the specific tests it's trying to benchmark.
Instead of doing anything reasonable, Utf8CodePointIterator returned
invalid code points, for example U+123456. However, many callers of this
iterator assume that a code point is always at most 0x10FFFF.
In fact, this is one of two reasons for the following OSS Fuzz issue:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=49184
This is probably a very old bug.
In the particular case of URLParser, AK::is_url_code_point got confused:
return /* ... */ || code_point >= 0xA0;
If code_point is a "code point" beyond 0x10FFFF, this violates the
condition given in the preceding comment, but satisfies the given
condition, which eventually causes URLParser to crash.
This commit fixes *only* the erroneous UTF-8 decoding, and does not
fully resolve OSS-Fuzz#49184.
This fixes `combine_hangul_code_points` which would try to combine
a LVT syllable with a trailing consonant, resulting in a wrong
character.
Also added a test for this specific case.
`mkstemps` generates a unique temporary file name from a pattern like
`prefixXXXXXXsuffix` where `prefix` and `suffix` can be any string with
only characters that are valid in a filename. The second parameter is
the length of the suffix.
`mkstemp` is `mkstemps` with suffix length 0, so to avoid code
duplication it calls `mkstemps`. It is unlikely this has any
significant performance impact on SerenityOS.
`generate_unique_filename` now takes the suffix length as a `size_t`.
The original behavior of this function is preserved when specifying a
suffix length of 0. All original uses of this function have been
adapted.
`mkstemps()` was added because it is required by version 4.6.3 of the
ccache port.
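A small usage sketch of `mkstemps()`; the path template and suffix here are example values:
```cpp
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

int main()
{
    // "prefixXXXXXXsuffix": the prefix is "/tmp/upload-" and the suffix ".txt"
    // has length 4, which is what the second argument conveys.
    char path[] = "/tmp/upload-XXXXXX.txt";
    int fd = mkstemps(path, 4);
    if (fd < 0) {
        perror("mkstemps");
        return 1;
    }
    printf("created %s\n", path); // the X's were replaced to form a unique name
    close(fd);
    unlink(path);
    return 0;
}
```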
Doesn't use them in libc headers so that those don't have to pull in
AK/Platform.h.
AK_COMPILER_GCC is set _only_ for gcc, not for clang too. (__GNUC__ is
defined in clang builds as well.) Using AK_COMPILER_GCC simplifies
things some.
AK_COMPILER_CLANG isn't as much of a win, other than that it's
consistent with AK_COMPILER_GCC.
Some time zones, like "Asia/Shanghai", use a set of DST rules that end
before present day. In these cases, we should fall back to the last
possible RULE entry from the TZDB. The time zone compiler published by
IANA (zic)
performs the same fallback starting with version 2 of the time zone file
format.
JS::Value stores 48 bit pointers to separately allocated objects in its
payload. On x86-64, canonical addresses have their top 16 bits set to
the same value as bit 47, effectively meaning that the value has to be
sign-extended to get the pointer. AArch64, however, expects the topmost
bits to be all zeros.
This commit gates sign extension behind `#if ARCH(X86_64)`, and adds an
`#error` for unsupported architectures, so that we do not forget to
think about pointer handling when porting to a new architecture.
Fixes #15290. Fixes SerenityOS/ladybird#56.
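A simplified sketch of the gated pointer extraction described above (the real code lives in JS::Value and differs in detail):
```cpp
#include <AK/Platform.h>
#include <AK/Types.h>

// Recover a heap pointer from the low 48 bits of the boxed payload.
static void* extract_pointer(u64 encoded)
{
    u64 payload = encoded & 0xffff'ffff'ffffULL; // low 48 bits
#if ARCH(X86_64)
    // x86-64 canonical addresses sign-extend bit 47 into the top 16 bits.
    if (payload & (1ULL << 47))
        payload |= 0xffff'0000'0000'0000ULL;
#elif ARCH(AARCH64)
    // AArch64 user-space pointers have the top 16 bits zeroed; nothing to do.
#else
#    error "Unknown architecture: don't know how to recover pointers from a boxed value"
#endif
    return reinterpret_cast<void*>(payload);
}
```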
We were dropping the base URL path components in the resulting URL due
to mistakenly determining the input URL to start with a Windows drive
letter. Fix this, add a spec link, and a test.
The test-case is heavily inspired by:
https://github.com/google/brotli/blob/master/tests/testdata/x.compressed.01
Or in words: A metadata meta-block containing `Y` (which should be
ignored), and then the actual data (a single `Z`). The bug used to skip
one metadata byte too few, and thus read garbage.
Propagate errors in places that are already set up to handle them, like
WebGLRenderingContext and the Tubes demo, and convert other callers
to using MUST.
This caused the m_allocation_enabled_previously member to be technically
uninitialized when the compiler emits the implicit destructor call for
stack allocated classes.
This was pointed out by gcc on lagom builds; no clue how this was flying
under the radar for so long and why it is not triggering CI.
...`__attribute__((__noreturn__))`
This is more in line with the definition in glibc's version of the file,
and stops clang from complaining about it originally not being declared
as `[[noreturn]]`.
Each texture unit now has its own texture transformation matrix stack.
Introduce a new texture unit configuration that is synced when changed.
Because we're no longer passing a silly `Vector` when drawing each
primitive, this results in slightly improved frames per second :^)
This allows running test262(-like) tests with any runner, and thus
allows running the full test262 suite on Serenity itself.
The functionality of test-test262 is intentionally limited at first.
It does support:
- Progress updates including the special serenity terminal commands
- Outputting a per-file result, to compare against other runs
- Passing any number of parameters to the runner
- Setting the batch size of the amount of tests per runner process
- Outputting a summary of the test results
If a test is supposed to fail during parse or early phase we can stop
after parsing. Because phases in modules are not as clear we don't skip
the other parts for modules.
When running a larger set of tests in Serenity the runner would
otherwise trigger a lot of crash reporters. This would then in turn lead
to memory starvation, causing more crashes.
We also protect against recursive assert failures, for example due to
being out of memory.
With this change the runner now compiles and runs on Serenity :^).
Since setitimer is not implemented in Serenity we use alarm which
triggers SIGALRM after the timeout. We also don't use a signal handler
as we are doing things that serenity doesn't like/doesn't allow.
Linux tolerated allocating and writing in a signal handler, but doing
so is undefined, so instead we just let the process die from SIGALRM.
This means that, instead of reading the output, we can detect timeouts
by checking how the process died.
For now this is a lagom only application as it is not compatible with
serenity in its current state.
The only change is that it is released under a different license with
permission from all the authors.
We were consuming all whitespace from the format, but not the input
lexer - that was left to the actual format parsing code. It so happened
that we did not account for whitespace with the conversion specifier
'[', causing whitespace to end up in the output variables.
Fix this by always consuming all whitespace and removing the whitespace
logic from the conversion code.
Currently, LibUnicodeData contains the generated UCD and CLDR data. Move
the UCD data to the main LibUnicode library, and rename LibUnicodeData
to LibLocaleData. This is another prepatory change to migrate to
LibLocale.
The FLAC "spec tests", or rather the test suite by xiph that exercises
weird FLAC features and edge cases, can be found at
https://github.com/ietf-wg-cellar/flac-test-files and is a good
challenge for our FLAC decoder to become more spec compliant. Running
these tests is similar to LibWasm spec tests, you need to pass
INCLUDE_FLAC_SPEC_TESTS to CMake.
As of integrating these tests, 23 out of 63 fail. :yakplus:
Previously, for a regex such as /[a-sy-z]/i, we would incorrectly think
the character "u" fell into the range "a-s" because neither of the
conditions "u > s && U > s" or "u < a && U < a" would be true, resulting
in the lookup falling back to assuming the character is in the range.
Instead, first explicitly check if the character falls into the range,
rather than checking if it falls outside the range. If the explicit
checks fail, then we know the character is outside the range.
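A reduced sketch of the corrected check (the actual LibRegex code works on its compiled range representation, not on plain chars):
```cpp
#include <cctype>

// Returns true if `ch` falls into [from, to] case-insensitively.
static bool char_in_range_insensitive(char from, char to, char ch)
{
    auto lower = static_cast<char>(tolower(static_cast<unsigned char>(ch)));
    auto upper = static_cast<char>(toupper(static_cast<unsigned char>(ch)));
    // Explicitly check whether either casing is inside the range...
    if (lower >= from && lower <= to)
        return true;
    if (upper >= from && upper <= to)
        return true;
    // ...and only then conclude that the character is outside of it.
    return false;
}
```
With this, 'u' against the range a-s fails both explicit checks and is correctly reported as outside the range.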
Intrinsics, i.e. mostly constructor and prototype objects, but also
things like empty and new object shape now live on a new heap-allocated
JS::Intrinsics object, thus completing the long journey of taking all
the magic away from the global object.
This represents the Realm's [[Intrinsics]] slot in the spec and matches
its existing [[GlobalObject]] / [[GlobalEnv]] slots in terms of
architecture.
In the majority of cases it should now be possible to fully allocate a
regular object without the global object existing, and in fact that's
what we do now - the realm is allocated before the global object, and
the intrinsics between both :^)
In OpenGL this is called the (base) internal format which is an
expectation expressed by the client for the minimum supported texel
storage format in the GPU for textures.
Since we store everything as RGBA in a `FloatVector4`, the only thing
we do in this patch is remember the expected internal format, and when
we write new texels we fix the value of the alpha channel to 1 for
two formats that require it.
`PixelConverter` has learned how to transform pixels during transfer to
support this.
A GPU (driver) is now responsible for reading and writing pixels from
and to user data. The client (LibGL) is responsible for specifying how
the user data must be interpreted or written to.
This allows us to centralize all pixel format conversion in one class,
`LibSoftGPU::PixelConverter`. For both the input and output image, it
takes a specification containing the image dimensions, the pixel type
and the selection (basically a clipping rect), and converts the pixels
from the input image to the output image.
Effectively this means we now support almost all OpenGL 1.5 formats,
and all custom logic has disappeared from:
- `glDrawPixels`
- `glReadPixels`
- `glTexImage2D`
- `glTexSubImage2D`
The new logic is still unoptimized, but on my machine I experienced no
noticeable slowdown. :^)
This is a set of functions that allow you to convert between arbitrary
IEEE 754 floating point types, as long as they can be represented
within 64 bits. Conversion methods between floats and doubles are
provided, as well as a generic `float_to_float()`.
Example usage:
#include <AK/FloatingPoint.h>
double val = 1.234;
auto weird_f16 =
convert_from_native_double<FloatingPointBits<0, 6, 10>>(val);
Signed and unsigned floats are supported, and both NaN and +/-Inf are
handled correctly. Values that do not fit in the target floating point
type are clamped.
Instead we just use a specific constructor. With this set of
constructors, using curly braces for construction is highly recommended,
as that avoids implicit conversions which could lead to unexpected loss
of data or to calling the much slower double constructor.
Also to ensure we don't feed (Un)SignedBigInteger infinities we throw
RangeError earlier for Durations.
This means it can take any (un)signed word of size at most Word.
This means the constructor can be disambiguated if we were to add a
double constructor :^).
This requires a change in just one test.
This allows using different options for rounding, like IEEE
roundTiesToEven, which is the mode that JS requires.
Also fix that the last word read from the bigint for the mantissa could
be shifted incorrectly leading to incorrect results.
SignedBigInteger can immediately use this by just negating the double if
the sign bit is set.
For simple cases (below 2^53) we can just convert via an u64, however
above that we need to extract the top 53 bits and use those as the
mantissa.
This function currently does not behave exactly as the JS spec
specifies; however, it is much less naive than the previous
implementation.
The basic idea is that a global object cannot just come out of nowhere,
it must be associated to a realm - so get it from there, if needed.
This is to enforce the changes from all the previous commits by not
handing out global objects unless you actually have an initialized
realm (either stored somewhere, or the VM's current realm).
This is needed so that the allocated NativeFunction receives the correct
realm, usually forwarded from the Object's initialize() function, rather
than using the current realm.
This is a continuation of the previous six commits.
The global object is only needed to return it if the execution context
stack is empty, but that doesn't seem like a useful thing to allow in
the first place - if you're not currently executing JS, and the
execution context stack is empty, there is no this value to retrieve.
This is a continuation of the previous five commits.
A first big step into the direction of no longer having to pass a realm
(or currently, a global object) trough layers upon layers of AOs!
Unlike the create() APIs we can safely assume that this is only ever
called when a running execution context and therefore current realm
exists. If not, you can always manually allocate the Error and put it in
a Completion :^)
In the spec, throw exceptions implicitly use the current realm's
intrinsics as well: https://tc39.es/ecma262/#sec-throw-an-exception
This is a continuation of the previous three commits.
Now that create() receives the allocating realm, we can simply forward
that to allocate(), which accounts for the majority of these changes.
Additionally, we can get rid of the realm_from_global_object() in one
place, with one more remaining in VM::throw_completion().
This is a continuation of the previous two commits.
As allocating a JS cell already primarily involves a realm instead of a
global object, and we'll need to pass one to the allocate() function
itself eventually (it's bridged via the global object right now), the
create() functions need to receive a realm as well.
The plan is for this to be the highest-level function that actually
receives a realm and passes it around, AOs on an even higher level will
use the "current realm" concept via VM::current_realm() as that's what
the spec assumes; passing around realms (or global objects, for that
matter) on higher AO levels is pointless and unlike for allocating
individual objects, which may happen outside of regular JS execution, we
don't need control over the specific realm that is being used there.
This is a continuation of the previous commit.
Calling initialize() is the first thing that's done after allocating a
cell on the JS heap - and in the common case of allocating an object,
that's where properties are assigned and intrinsics occasionally
accessed.
Since those are supposed to live on the realm eventually, this is
another step into that direction.
We likely won't be able to test `pthread_cancel` itself, but this at
least makes sure that we use the correct values by default and that we
correctly reject invalid values.
This prevents us from needing a sv suffix, and potentially reduces the
need to run generic code for a single character (as contains,
starts_with, ends_with etc. for a char will be just a length and
equality check).
No functional changes.
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
This commit moves the length calculations out to be directly on the
StringView users. This is an important step towards the goal of removing
StringView(char const*), as it moves the responsibility of calculating
the size of the string to the user of the StringView (which will prevent
naive uses causing OOB access).
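For illustration, the two resulting shapes:
```cpp
#include <AK/StringView.h>
#include <cstring>

void string_view_examples(char const* runtime_cstring)
{
    // Literals: the sv suffix knows the length at compile time, so there is no
    // __builtin_strlen call through a StringView(char const*) constructor.
    StringView greeting = "hello"sv;

    // Non-literal C strings: the caller now spells out the length explicitly,
    // which keeps naive uses from reading out of bounds.
    StringView view { runtime_cstring, strlen(runtime_cstring) };

    (void)greeting;
    (void)view;
}
```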
Previously we would treat the empty string as `null`. This caused
JavaScript like this to fail:
```js
var object = {};
try {
object = JSON.parse("");
} catch {}
var array = object.array || [];
```
Since `JSON.parse("")` returned null instead of throwing, it would set
`object` to null and then try and use it instead of using the default
backup value.
[^XYZ] is not(X | Y | Z); we used to translate this to
not(X) | not(Y) | not(Z). This commit makes LibRegex interpret the
pattern as not(X) & not(Y) & not(Z).
The lowercase version of a range is not required to be a valid range,
instead of casefolding the range and making it invalid, check twice with
both cases of the input character (which are the same as the input if
not insensitive).
This time includes an actual test :^)
This commit has no behavior changes.
In particular, this does not fix any of the wrong uses of the previous
default parameter (which used to be 'false', meaning "only replace the
first occurrence in the string"). It simply replaces the default uses by
String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.