This parsing is already duplicated between LibJS and LibRegex, and will
shortly be needed in more places in those libraries. Move it to AK to
prevent further duplication.
This API will consume escaped Unicode code points of the form:
\u{code point}
\unnnn (where each n is a hexadecimal digit)
\unnnn\unnnn (where the two escaped values are a surrogate pair)
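To make the three accepted forms concrete, here is a minimal C++ sketch of such a parser. It is not the AK API: `parse_escaped_code_point()` and its helpers are hypothetical names, the input is assumed to be a `std::string_view` positioned at the backslash, and a high surrogate that is not followed by an escaped low surrogate is returned on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: value of one hexadecimal digit.
static std::optional<uint32_t> hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return std::nullopt;
}

// Parses "\unnnn" at the front of `input`; consumes it only on success.
static std::optional<uint32_t> parse_u_nnnn(std::string_view& input)
{
    if (input.size() < 6 || input.substr(0, 2) != "\\u")
        return std::nullopt;
    uint32_t value = 0;
    for (size_t i = 2; i < 6; ++i) {
        auto digit = hex_digit_value(input[i]);
        if (!digit) return std::nullopt;
        value = (value << 4) | *digit;
    }
    input.remove_prefix(6);
    return value;
}

// Consumes one escaped code point from the front of `input`.
// On failure, `input` is left untouched.
std::optional<char32_t> parse_escaped_code_point(std::string_view& input)
{
    // Form 1: \u{code point}, limited to U+10FFFF.
    if (input.substr(0, 3) == "\\u{") {
        auto close = input.find('}', 3);
        if (close == std::string_view::npos || close == 3) return std::nullopt;
        uint32_t value = 0;
        for (size_t i = 3; i < close; ++i) {
            auto digit = hex_digit_value(input[i]);
            if (!digit) return std::nullopt;
            value = (value << 4) | *digit;
            if (value > 0x10FFFF) return std::nullopt;
        }
        input.remove_prefix(close + 1);
        return static_cast<char32_t>(value);
    }

    // Form 2: \unnnn.
    auto first = parse_u_nnnn(input);
    if (!first) return std::nullopt;

    // Form 3: a high surrogate followed by an escaped low surrogate.
    if (*first >= 0xD800 && *first <= 0xDBFF) {
        auto rest = input;
        auto second = parse_u_nnnn(rest);
        if (second && *second >= 0xDC00 && *second <= 0xDFFF) {
            input = rest;
            return static_cast<char32_t>(0x10000 + ((*first - 0xD800) << 10) + (*second - 0xDC00));
        }
    }
    return static_cast<char32_t>(*first);
}
```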
Currently, when we need to repeat an instruction N times, we simply add
that instruction N times in a for-loop. This doesn't scale well with
extremely large values of N, and ECMA-262 allows up to N = 2^53 - 1.
Instead, add a new REPEAT bytecode operation to defer this loop from the
parser to the runtime executor. This allows the parser to complete sans
any loops (for this instruction), and allows the executor to bail early
if the repeated bytecode fails.
Note: The templated ByteCode methods are to allow the Posix parsers to
continue using u32 because they are limited to N = 2^20.
Combining these into one list helps reduce the size of MatchState, and
as a result, reduces the amount of memory consumed during execution of
very large regex matches.
Doing this also allows us to remove a few regex byte code instructions:
ClearNamedCaptureGroup, SaveLeftNamedCaptureGroup, and NamedReference.
Named groups now behave the same as unnamed groups for these operations.
Note that SaveRightNamedCaptureGroup still exists to cache the matched
group name.
This also removes the recursion level from the MatchState, as it can
exist as a local variable in Matcher::execute instead.
The grammar for the ECMA-262 CharacterEscape is:
CharacterEscape[U, N] ::
ControlEscape
c ControlLetter
0 [lookahead ∉ DecimalDigit]
HexEscapeSequence
RegExpUnicodeEscapeSequence[?U]
[~U]LegacyOctalEscapeSequence
IdentityEscape[?U, ?N]
It's important to parse the standalone "\0 [lookahead ∉ DecimalDigit]"
before parsing LegacyOctalEscapeSequence. Otherwise, all standalone "\0"
patterns are parsed as octal, which are disallowed in Unicode mode.
Further, LegacyOctalEscapeSequence should also be parsed while parsing
character classes.
A subsequent commit will add tests that require a string containing only
"\0". As a C-string, this will be interpreted as the null terminator. To
make the diff for that commit easier to grok, this commit converts all
tests to use StringView without any other functional changes.
* Only alphabetic (A-Z, a-z) characters may be escaped with \c. The loop
currently parsing \c includes code points between the upper/lower case
groups.
* In Unicode mode, all invalid identity escapes should cause a parser
error, even in browser-extended mode.
* Avoid an infinite loop when parsing the pattern "\c" on its own.
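A minimal sketch of the first rule, assuming a hypothetical helper `control_escape_value()`: only ASCII letters are accepted after `\c`, and, as in ECMA-262, the escape's character value is the letter's code modulo 32.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: value of "\cX", or nullopt if X is not an ASCII letter.
// Characters that sit between 'Z' and 'a' (such as '[' or '_') are rejected.
std::optional<char> control_escape_value(char letter)
{
    bool is_ascii_alpha = (letter >= 'A' && letter <= 'Z') || (letter >= 'a' && letter <= 'z');
    if (!is_ascii_alpha)
        return std::nullopt;
    return static_cast<char>(letter % 32);
}
```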
Similarly to the LibCpp parser regression tests, these tests run the
preprocessor on the .cpp test files under
Userland/LibCpp/Tests/preprocessor, and compare the output with existing
.txt ground truth files.
These script extensions have some peculiar behavior in the Unicode spec.
The UCD ScriptExtension file does not contain these scripts. Rather, it
is implied that the code points which have these scripts as an extension are
the code points that both:
1. Have Common or Inherited as their primary script value
2. Do not have any other script value in their script extension lists
Because these are not explicitly listed in the UCD, we must manually form
these script extensions.
Notice that unlike the note in populate_general_category_unions(),
script extensions do indeed have code point ranges which overlap. Thus,
this commit adds code to handle that, and hooks it into the GC unions.
Previously, each code point's General Category was part of the generated
UnicodeData structure. This ultimately presented two problems, one
functional and one performance related:
* Some General Categories are applied to unassigned code points, for
example the Unassigned (Cn) category. Unassigned code points are
strictly excluded from UnicodeData.txt, so by relying on that file,
the generator is unable to handle these categories.
* Lookups for General Categories are slower when searching through the
large UnicodeData hash map. Even though lookups are O(1), the hash
function turned out to be slower than binary searching through a
category-specific table.
So, now a table is generated for each General Category. When querying a
code point for a category, a binary search is done over the code point
ranges in that category's table to check if the code point has that category.
Further, General Categories are now parsed from the UCD file
DerivedGeneralCategory.txt. This file is a normal "prop list" file and
contains the categories for unassigned code points.
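A rough sketch of the per-category lookup, assuming each generated table is a sorted array of inclusive, non-overlapping ranges (the `CodePointRange` layout and the function name are hypothetical, not the generated LibUnicode code). The ranges in the usage example are the ASCII and Arabic-Indic digit blocks, both of which are in the Nd category.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical table layout: one sorted array of inclusive, non-overlapping
// code point ranges per General Category.
struct CodePointRange {
    uint32_t first;
    uint32_t last;
};

bool code_point_has_category(std::vector<CodePointRange> const& table, uint32_t code_point)
{
    // Binary search for the first range that does not end before code_point.
    auto it = std::lower_bound(table.begin(), table.end(), code_point,
        [](CodePointRange const& range, uint32_t value) { return range.last < value; });
    // The code point has the category only if that range starts at or before it.
    return it != table.end() && it->first <= code_point;
}
```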
There seem to be more incorrect assumptions about Clang-built
executables' memory layout than expected. These make the CI fail even
though the system is functional in all other aspects. While this is
being fixed, let's just disable tests for UserspaceEmulator.
This helps us avoid weird truncation issues and fixes a bug on Clang
builds where truncation while reading caused the DIE offsets following
large LEB128 numbers to be incorrect. This removes the need for the
separate `LongUnsignedNumber` type.
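For reference, decoding an unsigned LEB128 value into a 64-bit integer with explicit bounds checks might look like the sketch below. The function name and signature are assumptions; the point is that each 7-bit payload is checked before it is shifted, so nothing is silently truncated or shifted out of range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical decoder. Returns nullopt if the input ends before the final
// byte or if the value does not fit in 64 bits (this sketch also rejects
// encodings whose continuation bytes extend past bit 63).
std::optional<uint64_t> decode_uleb128(uint8_t const* data, size_t size, size_t& bytes_read)
{
    uint64_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < size; ++i) {
        uint8_t byte = data[i];
        uint64_t payload = byte & 0x7F;
        if (shift >= 64)
            return std::nullopt;
        // Reject payload bits that would be shifted past bit 63.
        if (shift > 0 && (payload >> (64 - shift)) != 0)
            return std::nullopt;
        result |= payload << shift;
        shift += 7;
        if ((byte & 0x80) == 0) {
            bytes_read = i + 1;
            return result;
        }
    }
    return std::nullopt; // no terminating byte
}
```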
We now call Preprocessor::process_and_lex() and pass the result to the
parser.
Doing the lexing in the preprocessor will allow us to maintain the
original position information of tokens after substituting definitions.
Improve the parsing of data URLs in URLParser to bring it closer to
spec. At the moment, we cannot parse the components of the MIME type
since it is represented as a string, but the spec requires it to be
parsed as a "MIME type record".
This is a regression test to validate the functionality that was
reported broken in #9071, where the kernel would spin attempting
to cancel a stale timer.
Before now, only binary properties could be parsed. Non-binary props are
of the form "Type=Value", where "Type" may be General_Category, Script,
or Script_Extensions (or their aliases). Of these, LibUnicode currently
supports General_Category, so LibRegex can parse only that type.
This changes LibRegex to parse the property escape as a Variant of
Unicode Property & General Category values. A byte code instruction is
added to perform matching based on General Category values.
The following commit broke Tests/AK/TestJSON.cpp as it removed the
file that the test loaded from disk to validate JSON parsing.
commit ad141a2286
Author: Andreas Kling <kling@serenityos.org>
Date: Sat Jul 31 15:26:14 2021 +0200
Base: Remove "test.frm" from HackStudio test project
Instead of restoring the file, let's just embed a bit of JSON in the
test case to avoid using external resources, as they obviously are
surprising and make the test less portable across environments.
This supports some binary property matching. It does not support any
properties not yet parsed by LibUnicode, nor does it support value
matching (such as Script_Extensions=Latin).
Previously, unmapping any offset starting at 0x0 would assert in the
kernel. Add a regression test to validate the fix.
Co-authored-by: Federico Guerinoni <guerinoni.federico@gmail.com>
Apparently, some code points fit both categories, for example U+0345
(COMBINING GREEK YPOGEGRAMMENI). Handle this fact when determining if
a code point is a final code point in a string.
This implements unconditional special case folding, and conditional
folding for non-locale cases. Worth noting that the only conditional,
non-locale special case is for converting an uppercase sigma to
lowercase.
The Unicode standard publishes the Unicode Character Database (UCD) with
information about every code point, such as each code point's upper case
mapping. LibUnicode exists to download and parse UCD files at build time
and to provide accessors to that data.
As a start, LibUnicode includes upper- and lower-case code point
converters.
Previously there was no way to create a MACAddress by passing a direct
address as a string. This will allow programs like the arp utility to
create a MACAddress instance from user-supplied addresses.
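As an illustration only, a parser for the common colon-separated notation could look like this. The function name and the `std::array` return type are assumptions, not the MACAddress API added by this commit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical parser: accepts exactly six colon-separated pairs of hex
// digits, e.g. "00:1a:2B:3c:4D:5e" (17 characters in total).
std::optional<std::array<uint8_t, 6>> parse_mac_address(std::string_view text)
{
    if (text.size() != 17)
        return std::nullopt;
    auto hex = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::array<uint8_t, 6> bytes {};
    for (size_t i = 0; i < 6; ++i) {
        size_t offset = i * 3;
        int high = hex(text[offset]);
        int low = hex(text[offset + 1]);
        if (high < 0 || low < 0)
            return std::nullopt;
        // Every pair except the last must be followed by a colon.
        if (i < 5 && text[offset + 2] != ':')
            return std::nullopt;
        bytes[i] = static_cast<uint8_t>((high << 4) | low);
    }
    return bytes;
}
```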
When the Unicode flag is set, regular expressions may escape code points
by surrounding the hexadecimal code point with curly braces, e.g. \u{41}
is the character "A".
When the Unicode flag is not set, this should be considered a repetition
symbol - \u{41} is the character "u" repeated 41 times. This is left as
a TODO for now.
When the Unicode option is not set, regular expressions should match
based on code units; when it is set, they should match based on code
points. To do so, the regex parser must combine surrogate pairs when
the Unicode option is set. Further, RegexStringView needs to know if
the flag is set in order to return code point vs. code unit based
string lengths and substrings.
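The difference between the two kinds of length can be sketched as follows: in UTF-16, a high surrogate followed by a low surrogate counts as two code units but one code point. The function name is hypothetical; it is not RegexStringView's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts code points in UTF-16 text. A valid surrogate
// pair counts once; any other code unit (including a lone surrogate) counts once.
size_t utf16_code_point_length(std::u16string_view text)
{
    size_t count = 0;
    for (size_t i = 0; i < text.size(); ++i) {
        char16_t unit = text[i];
        bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        if (is_high && i + 1 < text.size() && text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF)
            ++i; // skip the low surrogate of the pair
        ++count;
    }
    return count;
}
```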
This is a generally nicer-to-use version of the existing {any,all}_of()
that doesn't require the user to explicitly provide two iterators.
As a bonus, it also allows arbitrary iterators (as opposed to the hard
requirement of providing SimpleIterators in the iterator version).
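A minimal sketch of what container-based overloads can look like, assuming hypothetical `sketch::any_of` and `sketch::all_of` names. Because they use a range-based for loop, they work with whatever iterator type the container provides.

```cpp
#include <cassert>
#include <vector>

namespace sketch {

// Returns true if the predicate holds for at least one element.
template<typename Container, typename Predicate>
bool any_of(Container const& container, Predicate predicate)
{
    for (auto const& element : container) {
        if (predicate(element))
            return true;
    }
    return false;
}

// Returns true if the predicate holds for every element (true when empty).
template<typename Container, typename Predicate>
bool all_of(Container const& container, Predicate predicate)
{
    for (auto const& element : container) {
        if (!predicate(element))
            return false;
    }
    return true;
}

}
```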
The state of the formatter for the previous element should be thrown
away for each iteration. This showed up when trying to format a
Vector<String>, since Formatter<StringView> was unhappy about some state
that gets set when it's called. Add a test for Formatter<Vector>.
During a recent commit the 64-bit kernel was moved to a different
address, breaking this test (unnoticed). This fixes it, so we can make
x86_64 test failures break the CI again.
This commit makes LibRegex (mostly) capable of operating on any of
the three main string views:
- StringView for raw strings
- Utf8View for utf-8 encoded strings
- Utf32View for raw unicode strings
As a result, regexps with unicode strings should be able to properly
handle utf-8 and not stop in the middle of a code point.
A future commit will update LibJS to use the correct type of string
depending on the flags.
While trying to port to Clang we found that the functions as
implemented didn't actually work, and replacing them with a blatantly
broken function also did not break the tests on the GCC build. It
turns out many of our tests have been exercising GCC's builtins rather
than our own implementations. This removes the use of builtins for
LibM's tests (so we test the whole
function). It turns off the denormal test for scalbn (which was not
implemented) and comments out the tgamma(0.5) test which is too
inaccurate to be usable (and too complicated for me to fix). The gamma
function was made accurate for all other test cases, and asin received
two more layers of Taylor expansion to bring it within error margin
for the tests.
Previously, HTMLToken would expose the Vector<Attribute> directly to
its users. In preparation for a future change, all users now use
implementation-agnostic APIs which do not expose the Vector directly.
* wasm: Don't try to print the function results if it traps
* LibWasm: Inline some very hot functions
These are mostly pretty small functions too, and they were about 10%
of runtime.
* LibWasm+Everywhere: Make the instruction count limit configurable
...and enable it for LibWeb and test-wasm.
Note that `wasm` will not be limited by this.
* LibWasm: Remove a useless use of ScopeGuard
There are no multiple exit paths in that function, so we can just put
the ending logic right at the end of the function instead.
Prior to this, it'd try to stuff them into an i64, which could fail and
give us nothing.
Even though this is an extension we've made to JSON, the parser should
be able to correctly round-trip from whatever our serialiser has
generated.
This fixes parsing the following regular expression: /</g;
It also adds a simple script element to the HTMLTokenizer regression
test, which also contains that specific regex.
The test suite includes a few basic tests and a very crude regression
test, which just concatenates the to_string() of all tokens and checks
the String's hash to be equal. This relies on the format of
HTMLToken::to_string() to stay the same, which is not ideal.
Since Clang enables a couple of warnings that we don't have in GCC,
these were not caught before. Included fixes:
- Use correct printf format string for `size_t`
- Don't compare `Nonnull(Ref|Own)Ptr` to nullptr
- Fix unsigned int& => unsigned long& conversion
This test exposed a kernel panic in is_user_range calculations, so let's
convert it to be a LibTest test so we can prevent regressions in mmap,
the page allocator, and the memory manager.
Let's bring this class back, but without the confusing resize() API.
A FixedArray<T> is simply a fixed-size array of T.
The size is provided at run-time, unlike Array<T> where the size is
provided at compile-time.
This patch introduces the SQLServer system server. This service is
supposed to be the only process/application talking to database storage.
This makes things like locking and caching more reliable, easier to
implement, and more efficient.
In LibSQL we added a client component that does the ugly IPC nitty-
gritty for you. All that's needed is setting a number of event handler
lambdas and you can connect to databases and execute statements on them.
Applications that wish to use this SQLClient class obviously need to
link LibSQL and LibIPC.
The Order enum is used in the Meta component of LibSQL. Using this enum
meant having to include the monster AST/AST.h include file. Furthermore,
it is fairly basic and therefore can live in the general SQL
namespace. Moved to LibSQL/Type.h.
Also introduced a new class, SQLResult, which is needed in future
patches.
Now that the test is converted to be LibTest based, we can remove it
from the exclude list in /home/anon/.config/Tests.ini.
Prior to this it would crash and fail because it was signaled instead of
returning normally with exit code 0.
After the changes to LibGfx to make default font management handled in
WindowServer instead of each GUI application to allow for global font
broadcasts, the two LibGfx tests broke. The non-benchmark was fixed in
8f96d2, but the benchmark was left in the dust because nobody really
runs it manually :^(
These are usually incorrect, and people sometimes forget to add the
correct values as a result of them being optional, so they should just
be specified explicitly.
This adds a test case for String::find and String::find_all with empty
needles. The expected behavior is in line with what the C++ standard
library (and the standard libraries of other languages) do.
This implements StringUtils::find_any_of() and uses it in
String::find_any_of() and StringView::find_any_of(). All uses of
find_{first,last}_of have been replaced with find_any_of(), find() or
find_last(). find_{first,last}_of have subsequently been removed.
This removes StringView::find_first_of(char) and find_last_of(char) and
replaces all their usages with find and find_last, respectively, since
for a single char the two pairs of methods are functionally equivalent.
find_{first,last}_of should only be used if searching for multiple
different characters, which is never the case with the char argument.
This also adds the [[nodiscard]] to the remaining find_{first,last}_of
methods.
This replaces the current LexicalPath::append() API with a new method
that returns a new LexicalPath object and doesn't touch the this-object.
With this, LexicalPath is now immutable. It also adds a
LexicalPath::parent() method and the relevant test cases.
Since this is always set to true on the non-default constructor and
subsequently never modified, it is somewhat pointless. Furthermore,
there are arguably no invalid relative paths.
Split out the functionality to gather multiple tests from the filesystem
and run them in turn into Test::TestRunner, and leave the JavaScript
specific test harness logic in Test::JS::TestRunner and friends.
If someone runs the test with shell redirection going on, or in a way
that changes any of the standard file descriptors, this assumption will
not hold. When running normally from a terminal, however, it is true.
Instead, check that /proc/self/fd/[0,1,2] are symlinks that can be
stat-ed, by verifying that both stat and lstat succeed and give different
struct stat contents.
Also add some tests to ensure that they _remain_ constexpr.
In general, any runtime assertions, weirdo C casts, pointer aliasing,
and such shenanigans should be gated behind the (helpfully newly added)
AK::is_constant_evaluated() function when the intention is to write
constexpr-capable code.
a.k.a. deliver promises of constexpr-ness :P
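As an example of the gating described above, this sketch uses the GCC/Clang builtin `__builtin_is_constant_evaluated()` (which C++20's `std::is_constant_evaluated()` is usually built on) to call the non-constexpr `std::strlen` only at run time. The function itself is hypothetical, and the `static_assert` plays the role of a test that keeps it constexpr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Illustrative constexpr-capable function. std::strlen is not constexpr,
// so it is only called when the call is not being constant-evaluated.
constexpr size_t string_length(char const* string)
{
    if (!__builtin_is_constant_evaluated())
        return std::strlen(string); // run time: defer to libc

    size_t length = 0; // compile time: a plain loop the compiler can evaluate
    while (string[length] != '\0')
        ++length;
    return length;
}

// Compile-time test that keeps the function constexpr.
static_assert(string_length("constexpr") == 9);
```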
This commit converts naked `new`s to `AK::try_make` and `AK::try_create`
wherever possible. If the called constructor is private, this cannot be
done, so we instead now use the standard-defined and compiler-agnostic
`new (nothrow)`.
Scanning tables is a linear process using pointers in the table's
tuples, and does not involve more 'stochastic' code paths like index
traversals. Therefore the 1000 and 10000 row tests were basically
overkill and added nothing we can't find out with fewer rows.
This declares all test cases which compare function outputs over the
entire Unicode range as `BENCHMARK_CASE`, to avoid them being run by CI.
This reduces runtime of TestCharacterTypes (without benchmarks) by about
one third.
SQL was standardized before consensus on sane language syntax
constructs had evolved. The language is mostly case-insensitive, with
unquoted text converted to upper case. Identifiers can include lower
case characters and other 'special' characters by enclosing the
identifier with double quotes. A double quote is escaped by doubling it.
Likewise, a single quote in a literal string is escaped by doubling it.
All this means that the strategy used in the lexer, where a token's
value is a StringView 'window' on the source string, does not work,
because the value needs to be massaged before being handed to the
parser. Therefore a token now has a String containing its value. Given
the limited lifetime of a token, this is acceptable overhead.
Not doing this means that for example quote removal and double quote
escaping would need to be done in the parser or in AST node
construction, spreading lexing logic basically all over the place.
That would be suboptimal.
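To illustrate the kind of massaging the lexer now does, here is a hypothetical helper that strips the surrounding quotes from a quoted token and collapses doubled quotes. It is a sketch under the assumption that the whole token, including both quotes, is available as a `std::string_view`; it is not LibSQL's lexer code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: works for both "identifiers" and 'string literals'.
// Returns nullopt if the token is not properly quoted.
std::optional<std::string> unquote_sql_token(std::string_view token)
{
    if (token.size() < 2)
        return std::nullopt;
    char quote = token.front();
    if ((quote != '"' && quote != '\'') || token.back() != quote)
        return std::nullopt;

    std::string value;
    std::string_view body = token.substr(1, token.size() - 2);
    for (size_t i = 0; i < body.size(); ++i) {
        if (body[i] == quote) {
            // Inside the body, a quote character is only valid when doubled.
            if (i + 1 >= body.size() || body[i + 1] != quote)
                return std::nullopt;
            ++i; // keep one copy of the doubled quote
        }
        value += body[i];
    }
    return value;
}
```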
There was some impact on the sql utility and SyntaxHighlighter component
which was addressed by storing the token's end position together with
the start position in order to properly highlight it.
Finally, reviewing the tests for parsing numeric literals revealed an
inconsistency in which tokens we accept or reject: `1a` is accepted but
`1e` is rejected. Related to this is the fate of `0x`. Added a FIXME
reminding us to address this.
Unfortunately this patch is quite large.
The main functionality included are a BTree index implementation and
the Heap class which manages persistent storage.
Also included are a Key subclass of the Tuple class, which is a
specialization for index key tuples. This "dragged in" the Meta layer,
which has classes defining SQL objects like tables and indexes.
This patch adds the basic dynamic value classes used by the SQL Storage
layer. The most elementary class is Value, which holds a typed Value
which can be converted to standard C++ types. A Tuple is a collection
of Values described by a TupleDescriptor, which specifies the names,
types, and ordering of the elements in the Tuple.
Tuples and Values can be serialized and deserialized to and from
ByteBuffers. This is the mechanism used to save them to disk.
Tuples are used as keys in SQL indexes and rows in SQL tables.
Also included is a test file.
The insert_before method on AK::InlineLinkedList is used, so in order to
achieve feature parity, we need to implement it for AK::IntrusiveList as
well.
A POSIX-compatibility fix was introduced in 64740a0214 to make the
compilation of the `diffutils` port work, which expected a
`char* const* argv` signature.
And indeed, the POSIX spec does not mention permutation of `argv`:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html
However, most implementations do modify `argv` as evidenced by
documentation such as:
https://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/libutil-getopt-3.html
"The function prototype was aligned with POSIX 1003.1-2008 (ISO/IEC
9945-2009) despite the fact that it modifies argv, and the library
maintainers are unwilling to change this."
Change the behavior back to permuting `argv` to allow the following
command line argument order to work again:
unzip ./file.zip -o target-dir
Without this change, `./file.zip` in the example above would have been
ignored completely.
Doing these as custom classes might be faster, especially when writing
them in SSE, but this would cause a lot of code duplication. Due to the
nature of constexprs and the intelligence of the compiler, they might
be using SSE/MMX either way.
This commit makes it possible to instantiate `Vector<T&>` and use it
to store references to `T` in a vector.
All non-pointer observers are made to return the reference, and the
pointer observers simply yield the underlying pointer.
Note that the 'find_*' methods act on the values and not the pointers
that are stored in the vector.
This commit also makes errors in various vector methods much more
readable by directly using requires-clauses on them.
And finally, it should be noted that Vector cannot hold temporaries :^)
Previously, AK::Function would accept _any_ callable type, and try to
call it when called, first with the given set of arguments, then with
zero arguments, and if all of those failed, it would simply not call the
function and **return a value-constructed Out type**.
This led to many, many, many hard-to-debug situations when someone
forgot a `const` in their lambda argument types, and many cases of
people taking zero arguments in their lambdas to ignore them.
This commit reworks the Function interface to not include any such
surprising behaviour: if your function instance is not callable with
the declared argument set of the Function, it simply cannot be
assigned to that Function instance, end of story.
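The effect of this rule can be sketched with a tiny type-erased wrapper whose converting constructor is constrained with `std::is_invocable_r_v`. This is a simplified stand-in written for illustration, not AK::Function: it only shows that a callable with the wrong argument list is rejected at compile time instead of being accepted and silently ignored.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

template<typename>
class Function;

// Simplified stand-in for AK::Function (illustration only). The converting
// constructor is removed from overload resolution unless the callable can be
// invoked with exactly the declared argument types and its result converts to Out.
template<typename Out, typename... In>
class Function<Out(In...)> {
public:
    template<typename Callable,
        typename = std::enable_if_t<!std::is_same_v<std::decay_t<Callable>, Function>
            && std::is_invocable_r_v<Out, Callable&, In...>>>
    Function(Callable callable)
        : m_impl(std::make_unique<Model<Callable>>(std::move(callable)))
    {
    }

    Out operator()(In... args) const { return m_impl->call(std::forward<In>(args)...); }

private:
    struct CallableBase {
        virtual ~CallableBase() = default;
        virtual Out call(In... args) = 0;
    };

    template<typename Callable>
    struct Model final : CallableBase {
        explicit Model(Callable c)
            : callable(std::move(c))
        {
        }
        Out call(In... args) override { return callable(std::forward<In>(args)...); }
        Callable callable;
    };

    std::unique_ptr<CallableBase> m_impl;
};

struct TakesInt {
    int operator()(int x) const { return x * 2; }
};
struct TakesNothing {
    int operator()() const { return 0; }
};

// A callable with the declared signature is accepted...
static_assert(std::is_constructible_v<Function<int(int)>, TakesInt>);
// ...while one that ignores the argument can no longer become a Function<int(int)>.
static_assert(!std::is_constructible_v<Function<int(int)>, TakesNothing>);
```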
According to the definition at https://sqlite.org/lang_expr.html, SQL
expressions can be arbitrarily deep. For practicality, SQLite enforces
a maximum expression tree depth of 1000. Apply the same limit in
LibSQL to avoid stack overflow in the expression parser.
Fixes https://crbug.com/oss-fuzz/34859.
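A minimal sketch of such a depth limit in a recursive-descent parser, using a toy grammar of single digits, `+`, and parentheses. The class, the grammar, and the way the depth is threaded through are assumptions for illustration; only the limit of 1000 comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Toy grammar (illustration only):
//   expression := primary ('+' primary)*
//   primary    := digit | '(' expression ')'
class ExpressionParser {
public:
    // Same limit SQLite uses for its expression tree depth.
    static constexpr size_t max_depth = 1000;

    explicit ExpressionParser(std::string_view input)
        : m_input(input)
    {
    }

    // Returns the sum, or nullopt on a syntax error or excessive nesting.
    std::optional<long> parse()
    {
        auto value = parse_expression(0);
        if (!value || m_position != m_input.size())
            return std::nullopt;
        return value;
    }

private:
    std::optional<long> parse_expression(size_t depth)
    {
        // Fail with an error instead of recursing until the stack overflows.
        if (depth > max_depth)
            return std::nullopt;
        auto value = parse_primary(depth);
        while (value && peek() == '+') {
            ++m_position;
            auto rhs = parse_primary(depth);
            if (!rhs)
                return std::nullopt;
            *value += *rhs;
        }
        return value;
    }

    std::optional<long> parse_primary(size_t depth)
    {
        char c = peek();
        if (c >= '0' && c <= '9') {
            ++m_position;
            return c - '0';
        }
        if (c == '(') {
            ++m_position;
            auto value = parse_expression(depth + 1);
            if (!value || peek() != ')')
                return std::nullopt;
            ++m_position;
            return value;
        }
        return std::nullopt;
    }

    char peek() const { return m_position < m_input.size() ? m_input[m_position] : '\0'; }

    std::string_view m_input;
    size_t m_position { 0 };
};
```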
Because the UTF-8 bytes of non-ASCII code points have negative values
when stored in a signed char, trimming away control characters must
account for negative byte values.
This also adds a test case with a URL containing non-ASCII code points.
This commit is a fairly large refactor, mainly because it unified the
two different ways that existed to represent references.
Now Reference values are also a kind of value.
It also implements a printer for values/references instead of copying
the implementation everywhere.
StringView::lines() supports line-separators “\n”, “\r”, and “\r\n”.
The method will drop an entire line if it is surrounded by “\r”
and “\n” separators on the left and right sides respectively.
The previous behavior was to always VERIFY that the UTF-8 bytes were
valid when iterating over the code points of a Utf8View. This change
makes it so we instead output the 0xFFFD 'REPLACEMENT CHARACTER'
code point when encountering invalid bytes, and keep iterating the
view after skipping one byte.
Leaving the decision to the consumer would break symmetry with the
UTF32View API, which would in turn require heavy refactoring and/or
code duplication in generic code such as the one found in
Gfx::Painter and the Shell.
To make it easier for the consumers to detect the original bytes, we
provide a new method on the iterator that returns a Span over the
data that has been decoded. This method is immediately used in the
TextNode::compute_text_for_rendering method, which previously did
this in an ad-hoc way.
This also adds tests for the new behavior in TestUtf8.cpp, as well
as reinforcements to the existing tests to check if the underlying
bytes match up with their expected values.
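The decoding policy described above (emit U+FFFD for an invalid sequence, then resume one byte later) can be sketched as follows. This hypothetical function returns a vector instead of being an iterator like Utf8View, and it also treats overlong forms, surrogates, and values above U+10FFFF as invalid, which is an assumption on our part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical lossy decoder, not the Utf8View implementation.
std::vector<char32_t> decode_utf8_lossy(std::string_view bytes)
{
    static constexpr char32_t min_for_length[] = { 0, 0, 0x80, 0x800, 0x10000 };
    std::vector<char32_t> code_points;
    size_t i = 0;
    while (i < bytes.size()) {
        auto lead = static_cast<uint8_t>(bytes[i]);
        size_t length = 0;
        char32_t value = 0;
        if (lead < 0x80) { length = 1; value = lead; }
        else if ((lead & 0xE0) == 0xC0) { length = 2; value = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { length = 3; value = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { length = 4; value = lead & 0x07; }

        // Check that the sequence is complete and made of continuation bytes.
        bool valid = length != 0 && i + length <= bytes.size();
        for (size_t k = 1; valid && k < length; ++k) {
            auto continuation = static_cast<uint8_t>(bytes[i + k]);
            if ((continuation & 0xC0) != 0x80)
                valid = false;
            else
                value = (value << 6) | (continuation & 0x3F);
        }
        // Reject overlong encodings, surrogates, and values past U+10FFFF.
        if (valid && (value < min_for_length[length] || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF)))
            valid = false;

        if (valid) {
            code_points.push_back(value);
            i += length;
        } else {
            code_points.push_back(0xFFFD); // REPLACEMENT CHARACTER
            i += 1;                        // skip one byte and keep going
        }
    }
    return code_points;
}
```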
Rather than aborting when a LIMIT clause of the form 'LIMIT expr, expr'
is encountered, fail the parser with a syntax error. This will be nicer
for the user and fixes the following fuzzer bug:
https://crbug.com/oss-fuzz/34837
This patch introduces a new operator== to compare an Optional to its
contained type directly. If the Optional does not contain a value, the
comparison will always return false.
This also adds a test case for the new behavior as well as comparison
between Optional objects themselves.
This adds more tests for AK::URL. Furthermore, this also changes some
tests to conform to what the reworked URL class does (and the URL
specification mostly expects).
This adds a peek method for Utf8CodepointIterator, which enables it to
be used in some parsing cases where peeking is necessary.
peek(0) is equivalent to operator*, except that peek() does not contain
any assertions and instead returns an empty Optional<u32> when peeking
past the end of the view.
This also implements a test case for iterating UTF-8.
Previously, we would go crazy and shift things way out of bounds.
Add tests to verify that the decoding algorithm is safe around the
limits of the result type.
The round trip compress test wants the first half of the byte buffer to
be filled with random data, and the second half to be all zeroes. The
strategy of using memset on ByteBuffer::offset_pointer confuses
__builtin_memset_chk when building with -fsanitize=undefined. It thinks
that the buffer is using inline capacity when we can prove to ourselves
pretty easily that it's not. To avoid this, just create the buffer
zeroed to start, and then fill the first half with the random data.
This implements the XSI-compliant version of strerror_r() - as opposed
to the GNU-specific variant.
The function explicitly saves errno so as to not accidentally change it
with one of the calls to other functions.
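A rough sketch of the XSI semantics, assuming a hypothetical name `xsi_strerror_r`: it returns 0 on success or ERANGE when the buffer is too small, and it restores errno before returning. It borrows `std::strerror` for the message text for brevity, which is not guaranteed to be thread-safe, so a real implementation would read from its own message table; it also omits the EINVAL case for unknown error numbers.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>

// Hypothetical XSI-style strerror_r (not LibC's code). Returns 0 on success
// or ERANGE if the message had to be truncated; errno is left unchanged.
int xsi_strerror_r(int error_number, char* buffer, size_t buffer_length)
{
    int saved_errno = errno; // the calls below may clobber errno
    // std::strerror is used only for brevity; it is not required to be thread-safe.
    char const* message = std::strerror(error_number);
    size_t message_length = std::strlen(message);

    int result = 0;
    if (buffer_length == 0) {
        result = ERANGE;
    } else if (message_length >= buffer_length) {
        // Copy what fits and always NUL-terminate.
        std::memcpy(buffer, message, buffer_length - 1);
        buffer[buffer_length - 1] = '\0';
        result = ERANGE;
    } else {
        std::memcpy(buffer, message, message_length + 1);
    }

    errno = saved_errno;
    return result;
}
```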
We had two functions for doing mostly the same thing. Combine both
of them into String::find() and use that everywhere.
Also add some tests to cover basic behavior.
Managing the instantiated modules becomes a pain if they're on the
stack, since an instantiated module will eventually reference itself.
To make using this simpler, just avoid copying the instance.
This fixes a FIXME and will allow linking only select modules together,
instead of linking every instantiated module into a big mess of exported
entities :P
This also optionally generates a test suite from the WebAssembly
testsuite, which can be enabled via passing `INCLUDE_WASM_SPEC_TESTS`
to cmake, which will generate test-wasm-compatible tests and the
required fixtures.
The generated directories are excluded from git since there's no point
in committing them.
This only tests "can it be parsed", but the goal of this commit is to
provide a test framework that can be built upon :)
The conformance tests are downloaded, compiled* and installed only if
the INCLUDE_WASM_SPEC_TESTS cmake option is enabled.
(*) Since we do not yet have a wast parser, the compilation is delegated
to an external tool from binaryen, `wasm-as`, which is required for the
test suite download/install to succeed.
This *does* run the tests in CI, but it currently does not include the
spec conformance tests.
For each .cpp file in the test suite data, there is a .ast file that
represents the "known good" baseline of the parser result.
Each .cpp file goes through the parser, and the result of
invoking `ASTNode::dump()` on the root node is compared to the
baseline to find regressions.
We also check that there were no parser errors when parsing the .cpp
files.
Problem:
- `static` variables consume memory and sometimes are less
optimizable.
- `static const` variables can be `constexpr`, usually.
- `static` function-local variables require an initialization check
every time the function is run.
Solution:
- If a global `static` variable is only used in a single function then
move it into the function and make it non-`static` and `constexpr`.
- Make all global `static` variables `constexpr` instead of `const`.
- Change function-local `static const[expr]` variables to be just
`constexpr`.
Previously <AK/Function.h> also included <AK/OwnPtr.h>. That's about to
change though. This patch fixes a few build problems that will occur
when that change happens.
This changes Variant::visit() to forward the value returned by the
selected visitor invocation. By perfectly forwarding the returned value,
this allows for the visitor to return by value or reference.
Note that all provided visitors must return the same type - the compiler
will otherwise fail with the message: "inconsistent deduction for auto
return type".
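AK::Variant is not available outside SerenityOS, but `std::visit` from the standard library behaves similarly in this respect and makes a convenient illustration: the value returned by the selected visitor is returned from the visit, and every visitor must return the same type. The `Overloaded` helper and `describe()` are illustrative names.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Helper that merges several lambdas into one overloaded visitor.
template<typename... Ts>
struct Overloaded : Ts... {
    using Ts::operator()...;
};
template<typename... Ts>
Overloaded(Ts...) -> Overloaded<Ts...>;

// Both visitors return std::string, so the visit's result can be returned directly.
std::string describe(std::variant<int, std::string> const& value)
{
    return std::visit(Overloaded {
                          [](int number) { return "int " + std::to_string(number); },
                          [](std::string const& text) { return "string " + text; },
                      },
        value);
}
```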
The current code is factored such that reads to the entirety of the last
byte should be dropped. This was relying on the fact that last would be
one past the end in that case. Instead of actually reading that byte
when it's completely out of bounds of the bitmask, just skip reads that
would be invalid. Add more tests to make sure that the behavior is
correct for byte aligned reads of byte aligned bitmaps.
Previously ByteBuffer would internally hold a RefPtr to the byte
buffer and would behave like a reference type, i.e. copying a
ByteBuffer would not create a duplicate byte buffer, but rather
two objects which refer to the same internal buffer.
This also changes ByteBuffer so that it has some internal capacity
much like the Vector<T> type. Unlike Vector<T> however a byte
buffer's data may be uninitialized.
With this commit ByteBuffer makes use of the kmalloc_good_size()
API to pick an optimal allocation size for its internal buffer.
The TestRunner objects at the end of test-js are destroyed after the
if/else that chooses whether to run the 262 parser tests or the standard
tests. Accessing TestRunner::the() after the lifetime of the TestRunners
ends is UB, so return the Test::Counts from run() instead. Also, fix the
destructor of TestRunner to set s_the to nullptr so that if anyone tries
this type of shenanigans again, they'll get a crash :^).
The should_not_destroy test case intentionally performs an invalid stack
access on a NeverDestroyed to confirm that the destructor for the held
type was not called.
The C interface (posix interface?) for regexes has no "initialize"
function, only a free function. The comment in regcomp in
LibRegex/C/Regex.cpp notes that calling regcomp without a regfree is an
error, and will leak memory. Every single time regcomp is called on a
regex_t*, it will allocate new memory.
Make sure that all the regcomp calls are paired with a regfree in the
test program.
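The pairing can be sketched with the standard POSIX `<regex.h>` interface; the `matches()` helper is hypothetical. Because POSIX leaves the contents of a `regex_t` undefined after a failed `regcomp()`, this portable sketch calls `regfree()` only after a successful compile.

```cpp
#include <cassert>
#include <regex.h>

// Hypothetical helper: compiles, executes, and frees a POSIX regex.
bool matches(char const* pattern, char const* subject)
{
    regex_t regex;
    if (regcomp(&regex, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false; // regex_t contents are undefined after a failed compile
    bool matched = regexec(&regex, subject, 0, nullptr, 0) == 0;
    regfree(&regex); // every successful regcomp() is paired with one regfree()
    return matched;
}
```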
We can't unref an object to destruction while there's still a live
RefPtr to the object, otherwise the RefPtr destructor will try to
destroy it again, accessing the refcount of a destroyed object (before
realizing that oops! the object is already dead).
Unfortunately, adopt_ref requires a reference, which obviously does not
work well when attempting to harden against allocation failure.
The adopt_ref_if_nonnull() variant will allow you to avoid using bare
pointers, while still allowing you to handle allocation failure.
This patch adds some rudimentary tests for InodeWatcher. It tests the
basic functionality, but maybe there are corner cases I haven't caught.
Additionally, this is our first LibCore test. :^)
This allows the construction of `Variant<int, int, int>`.
While this might not seem useful, it is very useful for making variants
that contain a series of member function pointers, which I plan to use
in LibGL for glGenLists() and co.
With the goal of centralizing all tests in the system, this is a
first step to establish a Tests sub-tree. It will contain all of
the unit tests and test harnesses for the various components in the
system.
This commit adds an implementation of memmem, using the Bitap text
search algorithm for needles smaller than 32 bytes, and a naive loop
search for longer needles.
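A sketch of that strategy in C++: the shift-and variant of Bitap keeps one bit of match state per needle byte in a 32-bit word, which is why it is limited to needles shorter than 32 bytes, and longer needles fall back to a `memcmp` loop. The function is named `sketch_memmem` here to avoid clashing with libc's `memmem`; it is not LibC's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical memmem: Bitap (shift-and) for needles shorter than 32 bytes,
// a naive memcmp loop otherwise.
void const* sketch_memmem(void const* haystack, size_t haystack_length, void const* needle, size_t needle_length)
{
    auto const* h = static_cast<uint8_t const*>(haystack);
    auto const* n = static_cast<uint8_t const*>(needle);
    if (needle_length == 0)
        return haystack;
    if (needle_length > haystack_length)
        return nullptr;

    if (needle_length < 32) {
        // Bit i of masks[c] is set when needle[i] == c.
        uint32_t masks[256] = {};
        for (size_t i = 0; i < needle_length; ++i)
            masks[n[i]] |= 1u << i;
        // Bit i of state is set when needle[0..i] matches the text ending at h[j].
        uint32_t state = 0;
        uint32_t const match_bit = 1u << (needle_length - 1);
        for (size_t j = 0; j < haystack_length; ++j) {
            state = ((state << 1) | 1u) & masks[h[j]];
            if (state & match_bit)
                return h + j - (needle_length - 1);
        }
        return nullptr;
    }

    for (size_t j = 0; j + needle_length <= haystack_length; ++j) {
        if (std::memcmp(h + j, n, needle_length) == 0)
            return h + j;
    }
    return nullptr;
}
```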
Add a test case that the timeout argument to pthread_cond_timedwait
works in LibPthread. This change also validates the new support for
timeouts to the futex syscall, as that's how condition variables are
implemented.
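Roughly, such a test waits on a condition variable that nobody signals and expects ETIMEDOUT once the deadline passes. The sketch below uses only standard pthread calls. The helper name wait_with_timeout_ms is hypothetical, and this is not the actual test file. The deadline is absolute and, for a default-initialized condition variable, measured against CLOCK_REALTIME.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <time.h>

// Hypothetical helper: returns the final result of pthread_cond_timedwait().
int wait_with_timeout_ms(long timeout_ms)
{
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

    // Build an absolute deadline `timeout_ms` from now, normalizing tv_nsec.
    timespec deadline {};
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&mutex);
    int rc = 0;
    // Nothing signals `cond`, so a 0 result can only be a spurious wakeup.
    while (rc == 0)
        rc = pthread_cond_timedwait(&cond, &mutex, &deadline);
    pthread_mutex_unlock(&mutex);

    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mutex);
    return rc;
}
```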
This adds a test for the race condition in clock_nanosleep.
The crux is that clock_nanosleep verifies that the output buffer
is writable *before* sleeping, and writes to it *after* sleeping.
In the meantime, a concurrent thread can make the output buffer
unwritable, e.g. by deallocating it.
This testcase is needlessly complex because pthread_kill is
not implemented yet. I tried to keep it as simple as possible.
Here is the relevant part of dmesg:
[nanosleep-race-outbuf-munmap(22:22)]: Unblock nanosleep-race-outbuf-munmap(20:20) due to signal
nanosleep-race-outbuf-munmap(20:20) Unrecoverable page fault, write to address 0x02130016
CRASH: Page Fault. Process: nanosleep-race-outbuf-munmap(20)
[nanosleep-race-outbuf-munmap(20:20)]: 0xc01160ff memcpy +44
[nanosleep-race-outbuf-munmap(20:20)]: 0xc014de64 Kernel::Process::crash(int, unsigned int) +782
[nanosleep-race-outbuf-munmap(20:20)]: 0xc01191b5 illegal_instruction_handler +0
[nanosleep-race-outbuf-munmap(20:20)]: 0xc011965b page_fault_handler +649
[nanosleep-race-outbuf-munmap(20:20)]: 0xc0117233 page_fault_asm_entry +22
[nanosleep-race-outbuf-munmap(20:20)]: 0xc011616b copy_to_user +102
[nanosleep-race-outbuf-munmap(20:20)]: 0xc015911f Kernel::Process::sys(Kernel::Syscall::SC_clock_nanosleep_params const*) +457
[nanosleep-race-outbuf-munmap(20:20)]: 0xc015daad syscall_handler +1130
[nanosleep-race-outbuf-munmap(20:20)]: 0xc015d597 syscall_asm_entry +29
[nanosleep-race-outbuf-munmap(20:20)]: 0x08048437 main +146
[nanosleep-race-outbuf-munmap(20:20)]: 0x08048573 _start +94
Most importantly, note that it crashes *inside*
Kernel::Process::sys.
Instead, the correct behavior is to return -EFAULT.
We should only execute the filename verbatim if it contains a slash (/)
character somewhere. Otherwise, we need to look through the entries in
the PATH environment variable.
This fixes an issue where you could easily "override" system programs
by placing them in a directory you control, and then waiting for
someone to come there and run e.g. "ls" :^)
Test: LibC/exec-should-not-search-current-directory.cpp
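The lookup rule can be sketched as a pure function. resolve_program and its is_executable predicate are hypothetical and exist so the logic can be exercised without a filesystem. One deliberate choice is labeled in the code: POSIX treats a zero-length PATH prefix as the current directory, but this sketch skips empty entries so the current directory is never searched implicitly.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

std::optional<std::string> resolve_program(
    const std::string& name, const std::string& path_env,
    const std::function<bool(const std::string&)>& is_executable)
{
    // A name containing '/' is used verbatim.
    if (name.find('/') != std::string::npos)
        return name;

    // Otherwise, try each ':'-separated PATH entry in order.
    size_t start = 0;
    while (start <= path_env.size()) {
        size_t end = path_env.find(':', start);
        if (end == std::string::npos)
            end = path_env.size();
        std::string dir = path_env.substr(start, end - start);
        // Assumption: empty entries are skipped rather than meaning ".".
        if (!dir.empty()) {
            std::string candidate = dir + "/" + name;
            if (is_executable(candidate))
                return candidate;
        }
        start = end + 1;
    }
    return std::nullopt;
}
```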
Previously this API would return an InodeIdentifier, which meant that
there was a race in path resolution where an inode could be unlinked
in between finding the InodeIdentifier for a path component, and
actually resolving that to an Inode object.
Attaching a test that would quickly trip an assertion before.
Test: Kernel/path-resolution-race.cpp
Previously it was not possible for this function to fail. You could
exploit this by triggering the creation of a VMObject whose physical
memory range would wrap around the 32-bit limit.
It was quite easy to map kernel memory into userspace and read/write
whatever you wanted in it.
Test: Kernel/bxvga-mmap-kernel-into-userspace.cpp
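The core of such a fix is an overflow check on the range before creating anything. The sketch below is hypothetical (checked_range_end is not a kernel function). It relies on unsigned addition wrapping modulo 2^32, so a wrapped end address compares lower than the base. A range ending exactly at 4 GiB is also rejected here, because its exclusive end is not representable in 32 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: exclusive end of [base, base + size), or nothing if
// the range would wrap around the 32-bit address space.
std::optional<uint32_t> checked_range_end(uint32_t base, uint32_t size)
{
    uint32_t end = base + size; // Wraps modulo 2^32 on overflow.
    if (end < base)
        return std::nullopt; // Reject instead of mapping a wrapped range.
    return end;
}
```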
Right now, permission flags passed to VFS::open() are effectively ignored, but
that is going to change.
* O_RDONLY is 0, but it's still nicer to pass it explicitly
* POSIX says that binding a Unix socket to a symlink shall fail with EADDRINUSE
It's now an error to sys$mmap() a file as writable if it's currently
mapped executable by anyone else.
It's also an error to sys$execve() a file that's currently mapped
writable by anyone else.
This fixes a race condition vulnerability where one program could make
modifications to an executable while another process was in the kernel,
in the middle of exec'ing the same executable.
Test: Kernel/elf-execve-mmap-race.cpp
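The rule amounts to per-file bookkeeping that makes "mapped writable" and "mapped executable" mutually exclusive. Below is a hypothetical sketch. MappingState is not the kernel's actual data structure. It ignores which process owns each mapping (the commit's "anyone else"), and a real kernel would have to perform these checks under a lock so they are atomic with the mmap or exec itself.

```cpp
#include <cassert>

// Hypothetical per-file state: a file may be mapped writable or executable,
// but never both at the same time.
class MappingState {
public:
    bool try_map_writable()
    {
        if (m_executable_mappings > 0)
            return false; // Someone is running (or exec'ing) this file.
        ++m_writable_mappings;
        return true;
    }

    bool try_map_executable()
    {
        if (m_writable_mappings > 0)
            return false; // The code could be modified while it is loaded.
        ++m_executable_mappings;
        return true;
    }

    void unmap_writable() { --m_writable_mappings; }
    void unmap_executable() { --m_executable_mappings; }

private:
    int m_writable_mappings { 0 };
    int m_executable_mappings { 0 };
};
```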
It was possible to craft a custom ELF executable that when symbolicated
would cause the kernel to read from user-controlled addresses anywhere
in memory. You could then fetch this memory via /proc/PID/stack
We fix this by making ELFImage hand out StringView rather than raw
const char* for symbol names. In case a symbol offset is outside the
ELF image, you get a null StringView. :^)
Test: Kernel/elf-symbolication-kernel-read-exploit.cpp
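The idea can be sketched with std::string_view standing in for StringView. symbol_name_at is a hypothetical helper, not ELFImage's real interface. Both the offset and the string's NUL terminator must lie inside the table. Otherwise the result is a default-constructed (null) view, so nothing outside the image is ever read.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: the NUL-terminated name at `offset` in a string table
// of `table_size` bytes, or a null view if it does not fit inside the table.
std::string_view symbol_name_at(const char* table, size_t table_size, size_t offset)
{
    if (offset >= table_size)
        return {}; // Offset points outside the image.
    size_t end = offset;
    while (end < table_size && table[end] != '\0')
        ++end;
    if (end == table_size)
        return {}; // Unterminated: reading on would leave the table.
    return std::string_view(table + offset, end - offset);
}
```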
The join_thread() syscall is not supposed to be interruptible by
signals, but it was. And since the process death mechanism piggybacked
on signal interrupts, it was possible to interrupt a pthread_join() by
killing the process that was doing it, leading to confusion due to some
assumptions being made by Thread::finalize() for threads that have a
pending joiner.
This patch fixes the issue by making "interrupted by death" a distinct
block result separate from "interrupted by signal". Then we handle that
state in join_thread() and tidy things up so that thread finalization
doesn't get confused by the pending joiner being gone.
Test: Tests/Kernel/null-deref-crash-during-pthread_join.cpp
This fixes a null RefPtr deref (which asserts) in the scheduler if a
file descriptor being select()'ed is closed by a second thread while
blocked in select().
Test: Kernel/null-deref-close-during-select.cpp
This patch fixes some issues with the mmap() and mprotect() syscalls,
neither of which was checking the permission bits of the underlying
files when mapping an inode MAP_SHARED.
This made it possible to subvert execution of any running program
by simply memory-mapping its executable and replacing some of the code.
Test: Kernel/mmap-write-into-running-programs-executable-file.cpp
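One plausible shape of such a check follows the POSIX mmap() EACCES rules. The file must be open for reading, and a MAP_SHARED mapping with PROT_WRITE additionally requires the file to be open for writing. check_mmap_access is a hypothetical helper, and the commit's actual check may differ. The same validation would also have to run when mprotect() later adds PROT_WRITE to a shared mapping.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/mman.h>

// Hypothetical helper: returns 0 if the mapping is allowed, else an errno value.
int check_mmap_access(int open_flags, int prot, int flags)
{
    int access_mode = open_flags & O_ACCMODE;
    bool readable = access_mode == O_RDONLY || access_mode == O_RDWR;
    bool writable = access_mode == O_WRONLY || access_mode == O_RDWR;

    // Any file mapping needs read access to the underlying file.
    if (!readable)
        return EACCES;
    // A shared writable mapping writes through to the file itself.
    if ((flags & MAP_SHARED) && (prot & PROT_WRITE) && !writable)
        return EACCES;
    return 0;
}
```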
This encourages callers to strongly reference file descriptions while
working with them.
This fixes a use-after-free issue where one thread would close() an
open fd while another thread was blocked on it becoming readable.
Test: Kernel/uaf-close-while-blocked-in-read.cpp
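The fix can be illustrated by analogy, with std::shared_ptr in place of RefPtr. FdTable and FileDescription here are hypothetical stand-ins, not the kernel's classes. Because lookups hand out strong references, close() only drops the table's reference. A thread still blocked on the description keeps it alive until that thread lets go.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-in for a kernel file description.
struct FileDescription {
    int data { 0 };
};

// Hypothetical fd table owning its descriptions through shared_ptr.
class FdTable {
public:
    int open()
    {
        int fd = m_next_fd++;
        m_table[fd] = std::make_shared<FileDescription>();
        return fd;
    }

    // Returns a strong reference, or null if `fd` is not open.
    std::shared_ptr<FileDescription> get(int fd) const
    {
        auto it = m_table.find(fd);
        if (it == m_table.end())
            return nullptr;
        return it->second;
    }

    // Drops only the table's reference; other holders keep the object alive.
    void close(int fd) { m_table.erase(fd); }

private:
    std::map<int, std::shared_ptr<FileDescription>> m_table;
    int m_next_fd { 0 };
};
```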