Commit Graph

378 Commits

Author SHA1 Message Date
Tim Schumacher
5ac2e84264 LibC: Implement wcsstr 2021-10-03 05:28:51 +00:00
Tim Schumacher
1b078f87b7 LibC: Implement wcspbrk 2021-10-03 05:28:51 +00:00
Liav A
4974727dbb Kernel: Move x86 IO instructions code into the x86 specific folder 2021-10-01 12:27:20 +02:00
Nico Weber
841a5fe81b Tests: Fix typos 2021-10-01 01:33:43 +01:00
davidot
ce3f29a135 LibJS + test-js: Get results from the global object directly
This is as the spec would require you to do it and necessary for changes
to come in the following commits.
2021-09-30 08:16:32 +01:00
Rodrigo Tobar
a8fae3d2c8 LibPthread: Add first test cases for RWlock 2021-09-28 18:36:20 +03:30
Andreas Kling
f67648f872 LibWeb: Rename HTMLDocumentParser => HTMLParser 2021-09-25 23:36:43 +02:00
Ben Wiederhake
743470c8f2 AK: Introduce ability to default-initialize a Variant
I noticed that Variant<Empty, …> is a somewhat common pattern while
working on #10080, and this will simplify a few use-cases. :^)
2021-09-21 04:22:52 +04:30
Ali Mohammad Pur
436693c0c9 LibTLS: Use a setter for on_tls_ready_to_write with some more smarts
The callback should be called as soon as the connection is established,
and if we actually set the callback when it already is, we expect it to
be called immediately.
2021-09-19 21:10:23 +04:30
thankyouverycool
41ce6d0cd0 Tests: Conform font tests to new font format 2021-09-19 00:58:59 +02:00
Andreas Kling
1be4cbd639 AK: Make Utf8View constructors inline and remove C string constructor
Using StringView instead of C strings is basically always preferable.
The only reason to use a C string is because you are calling a C API.
2021-09-18 19:54:24 +02:00
Tim Schumacher
b8c756a53a LibC: Primitively implement wcscoll
At the moment, sorting like LC_COLLATE=C would do is better than
nothing.
2021-09-18 02:57:56 +00:00
Tim Schumacher
42031f026a LibC: Implement towctrans 2021-09-17 22:59:51 +00:00
Tim Schumacher
8c7b566629 LibC: Implement iswctype 2021-09-17 22:59:51 +00:00
Tim Schumacher
ff0ab8b9a9 LibC: Implement wctrans 2021-09-17 22:59:51 +00:00
Tim Schumacher
7b17230d7a LibC: Implement wctype 2021-09-17 22:59:51 +00:00
Ben Wiederhake
5f0f0ac413 crash: Don't test for qemu-unsupported feature
See #10042 for details. In short: qemu doesn't seem to implement that
feature, therefore the test correctly fails. However, that does not help
us, so we skip that test.
2021-09-16 20:51:24 +00:00
Ben Wiederhake
c680ef0a09 crash: Run automatically during CI 2021-09-16 20:51:24 +00:00
Andrew Kaster
6e7cc40b18 Meta: Add Meta/CMake to the CMAKE_MODULE_PATH for Serenity and Lagom
This makes it so we don't need to specify the full path to all the
helper scripts we include() from different places in the codebase and
feels a lot cleaner.
2021-09-15 19:04:52 +04:30
Ali Mohammad Pur
741886a4c4 LibRegex: Make the optimizer understand references and capture groups
Otherwise the fork in patterns like `(1+)\1` would be (incorrectly)
optimized away.
2021-09-15 15:52:28 +04:30
Andreas Kling
0a09eaf3a1 LibJS+LibTest: Use JS::Script and JS::SourceTextModule in test-js
Instead of creating a Parser and Lexer manually in test-js, we now
use either JS::Script::parse() or JS::SourceTextModule::parse()
to load tests.
2021-09-14 21:41:51 +02:00
Ali Mohammad Pur
ccb53c64e9 AK: Add an abstraction over multiple disjoint buffers
DisjointChunks<T> provides a nice interface over multiple sequential
Vector<T>'s, allowing the user to iterate over/index into/slice from
said buffers as if they were a single contiguous buffer.
To work with views on such objects, DisjointSpans<T> is provided, which
has the same behaviour but does not own the underlying objects.
2021-09-14 21:33:15 +04:30
Idan Horowitz
d6cfa34667 AK: Make URL::m_port an Optional<u16>, Expose raw port getter
Our current way of signalling a missing port with m_port == 0 was
lacking, as 0 is a valid port number in URLs.
2021-09-14 00:14:45 +02:00
Ali Mohammad Pur
246ab432ff LibRegex: Add a basic optimization pass
This currently tries to convert forking loops to atomic groups, and
unify the left side of alternations.
2021-09-13 14:38:53 +04:30
Timothy Flynn
470262c8ab LibJS: Use ErrorType::NotAnObjectOfType instead of NotA 2021-09-12 00:16:39 +02:00
Timothy Flynn
c59b97043e LibWeb: Use ErrorType::NotAnObjectOfType instead of NotA 2021-09-12 00:16:39 +02:00
Idan Horowitz
6704961c82 AK: Replace the mutable String::replace API with an immutable version
This removes the awkward String::replace API which was the only String
API which mutated the String and replaces it with a new immutable
version that returns a new String with the replacements applied. This
also fixes a couple of UAFs that were caused by the use of this API.

As an optimization an equivalent StringView::replace API was also added
to remove an unnecessary String allocations in the format of:
`String { view }.replace(...);`
2021-09-11 20:36:43 +03:00
Ben Wiederhake
2e4ec891da Everywhere: Fix format-vulnerabilities
Command used:
grep -Pirn '(out|warn)ln\((?!["\)]|format,|stderr,|stdout,|output, ")' \
     AK Kernel/ Tests/ Userland/
(Plus some manual reviewing.)

Let's pick ArgsParser as an example:
    outln(file, m_general_help);
This will fail at runtime if the general help happens to contain braces.

Even if this transformation turns out to be unnecessary in a place or
two, this way the code is "more obviously" correct.
2021-09-11 15:16:26 +01:00
Brian Gianforcaro
a4efaa7b47 Tests/Kernel: Fix test after off-by-one fix in Memory::is_user_range()
Commit 890c647e0f fixed an off-by-one bug, so the mapping of the page
at the very end of the user address space now works correctly.

This change adjusts the test so cover the corner cases the original
version was designed too.validate.
2021-09-11 04:15:16 +00:00
Ali Mohammad Pur
14c8373eb0 AK+Kernel: Reduce the number of template parameters of IntrusiveRBTree
This makes the user-facing type only take the node member pointer, and
lets the compiler figure out the other needed types from that.
2021-09-10 18:05:46 +03:00
Ali Mohammad Pur
5a0cdb15b0 AK+Everywhere: Reduce the number of template parameters of IntrusiveList
This makes the user-facing type only take the node member pointer, and
lets the compiler figure out the other needed types from that.
2021-09-10 18:05:46 +03:00
Timothy Flynn
4f2bcebe74 LibUnicode+LibJS: Store locale keyword values as a single string
Previously, LibUnicode would store the values of a keyword as a Vector.
For example, the locale "en-u-ca-abc-def" would have its keyword "ca"
stored as {"abc, "def"}. Then, canonicalization would occur on each of
the elements in that Vector.

This is incorrect because, for example, the keyword value "true" should
only be dropped if that is the entire value. That is, the canonical form
of "en-u-kb-true" is "en-u-kb", but "en-u-kb-abc-true" does not change
for canonicalization. However, we would canonicalize that locale as
"en-u-kb-abc".
2021-09-08 21:08:48 +01:00
Idan Horowitz
cb9720baab AK: Set IntrusiveRBTree Node key on insertion instead of construction
This makes the API look much nicer.
2021-09-08 19:17:07 +03:00
Idan Horowitz
1db9250766 AK: Make IntrusiveRedBlackTree capable of holding non-raw pointers
This is completely based on e4412f1f59
and will allow us to convert some AK::HashMap users in the kernel.
2021-09-08 19:17:07 +03:00
davidot
43b17f27a3 test-js: Add a mark_as_garbage method to force GC to collect that object
This should fix the flaky tests of test-js.
It also fixes the tests when running with the -g flag since the values
will not be garbage collected too soon.
2021-09-08 08:53:02 +01:00
Ali Mohammad Pur
7fefb8148b LibRegex: Use the correct capture group index in ERE bytecode generation
Otherwise the left and right capture instructions wouldn't point to the
same capture group if there was another nested group there.
2021-09-07 20:01:58 +02:00
Andreas Kling
6ad427993a Everywhere: Behaviour => Behavior 2021-09-07 13:53:14 +02:00
Timothy Flynn
50158abaf1 LibUnicode: Implement locale-aware BEFORE_DOT special casing
Note that the algorithm in the Unicode spec is for checking that a code
point precedes U+0307, but the special casing condition NotBeforeDot is
interested in the inverse of this rule.
2021-09-06 15:24:27 +01:00
Timothy Flynn
436faf9fd9 LibUnicode: Implement locale-aware MORE_ABOVE special casing 2021-09-06 15:24:27 +01:00
Timothy Flynn
1427ebc622 LibUnicode: Implement locale-aware AFTER_SOFT_DOTTED special casing 2021-09-06 15:24:27 +01:00
Timothy Flynn
0053d48c41 LibUnicode: Implement locale-aware AFTER_I special casing 2021-09-06 15:24:27 +01:00
Ali Mohammad Pur
63523d3836 Tests/LibRegex: Decrease the size of the fork chain test 2021-09-06 13:51:30 +04:30
Ali Mohammad Pur
abbe9da255 LibRegex: Make infinite repetitions short-circuit on empty matches
This makes (addmittedly weird) patterns like `(a*)*` work correctly
without going into an infinite fork loop.
2021-09-06 13:51:30 +04:30
Ali Mohammad Pur
97e97bccab Everywhere: Make ByteBuffer::{create_*,copy}() OOM-safe 2021-09-06 01:53:26 +02:00
Ali Mohammad Pur
3a9f00c59b Everywhere: Use OOM-safe ByteBuffer APIs where possible
If we can easily communicate failure, let's avoid asserting and report
failure instead.
2021-09-06 01:53:26 +02:00
Brian Gianforcaro
782e7834c3 Tests: Reduce the execution time of the LibC TestQSort test
Executing 100 times vs 10 times doesn't increase test coverage
substantial, this API is stable and more iterations is just a
waste of time.

Without KVM this test is a clear outlier in runtime during CI:

```
START  LibC/TestQsort (106/172)
PASS   LibC/TestQsort (41.233692s)
```
2021-09-05 22:17:28 +02:00
Andreas Kling
7dda773426 AK: Add rvalue-ref qualifiers for Optional's value() and value_or()
This avoids a value copy when calling value() or value_or() on a
temporary Optional. This is very common when using the HashMap::get()
API like this:

    auto value = hash_map.get(key).value_or(fallback_value);
2021-09-04 03:02:08 +02:00
Daniel Bertalan
d7b6cc6421 Everywhere: Prevent risky implicit casts of (Nonnull)RefPtr
Our existing implementation did not check the element type of the other
pointer in the constructors and move assignment operators. This meant
that some operations that would require explicit casting on raw pointers
were done implicitly, such as:
- downcasting a base class to a derived class (e.g. `Kernel::Inode` =>
  `Kernel::ProcFSDirectoryInode` in Kernel/ProcFS.cpp),
- casting to an unrelated type (e.g. `Promise<bool>` => `Promise<Empty>`
  in LibIMAP/Client.cpp)

This, of course, allows gross violations of the type system, and makes
the need to type-check less obvious before downcasting. Luckily, while
adding the `static_ptr_cast`s, only two truly incorrect usages were
found; in the other instances, our casts just needed to be made
explicit.
2021-09-03 23:20:23 +02:00
Timothy Flynn
a05419db55 LibUnicode: Add lexer to test if a string matches the "type" production 2021-09-02 17:56:42 +01:00
Andrew Kaster
58797a1289 Tests: Remove all file(GLOB) from CMakeLists in Tests
Using a file(GLOB) to find all the test files in a directory is an easy
hack to get things started, but has some drawbacks. Namely, if you add
a test, it won't be found again without re-running CMake. `ninja` seems
to do this automatically, but it would be nice to one day stop seeing it
rechecking our globbed directories.
2021-09-02 09:08:23 +02:00
sin-ack
b5a145b466 Tests: Add tests for Core::deferred_invoke 2021-09-02 03:47:47 +04:30
Timothy Flynn
fd0011989a LibUnicode: Resolve the most likely territory alias when there are many 2021-09-01 14:14:47 +01:00
Timothy Flynn
72f49e42b4 LibUnicode: Perform complex Unicode locale alias substitution 2021-09-01 14:14:47 +01:00
Timothy Flynn
da89cf9afb LibUnicode: Canonicalize calendar subtags
Calendar subtags are a bit of an odd-man-out in that we must match the
variants "ethiopic-amete-alem" in that order, without any other variant
in the locale. So a separate method is needed for this, and we now defer
sorting the variant list until after other canonicalization is done.
2021-09-01 14:14:47 +01:00
Timothy Flynn
8458f477a4 LibUnicode: Canonicalize timezone subtags 2021-09-01 14:14:47 +01:00
Timothy Flynn
335f985b31 LibUnicode: Canonicalize the subtag "imperial" to "uksystem" 2021-09-01 14:14:47 +01:00
Timothy Flynn
2d90144888 LibUnicode: Canonicalize the subtag "primary" and "tertiary" to "levelN" 2021-09-01 14:14:47 +01:00
Timothy Flynn
409f39b336 LibUnicode: Canonicalize the subtag "names" to "prprname" 2021-09-01 14:14:47 +01:00
Timothy Flynn
f907a7dc38 LibUnicode: Canonicalize the subtag "yes" to "true" 2021-09-01 14:14:47 +01:00
Timothy Flynn
556374a904 LibUnicode: Substitute Unicode locale aliases during canonicalization
Unicode TR35 defines how locale subtag aliases should be emplaced when
converting a locale to canonical form. For most subtags, it is a simple
substitution. Language subtags depend on context; for example, the
language "sh" should become "sr-Latn", but if the original locale has a
script subtag already ("sh-Cyrl"), then only the language subtag of the
alias should be taken ("sr-Latn").

To facilitate this, we now make two passes when canonicalizing a locale.
In the first pass, we convert the LocaleID structure to canonical syntax
(where the conversions all happen in-place). In the second pass, we form
the canonical string based on the canonical syntax.
2021-09-01 14:14:47 +01:00
Timothy Flynn
d13142f015 LibJS+LibUnicode: Store parsed Unicode locale data as full strings
Originally, it was convenient to store the parsed Unicode locale data as
views into the original string being parsed. But to implement locale
aliases will require mutating the data that was parsed. To prepare for
that, store the parsed data as proper strings.
2021-09-01 14:14:47 +01:00
Andrew Kaster
b3e3e4d45d Tests: Convert remaining LibC tests to LibTest
Convert them to using outln instead of printf at the same time.
2021-09-01 13:44:24 +02:00
Peter Elliott
33d7fdca28 Everywhere: Use my cool new @serenityos.org email address 2021-09-01 11:37:25 +04:30
Peter Elliott
8d2c04821f Tests: Test LibMarkdown against commonmark test suite
TestCommonmark runs the CommonMark test suite
(https://spec.commonmark.org/0.30/spec.json) against LibMarkdown.
Currently 44/652 tests pass.
2021-08-31 16:53:51 +02:00
Hediadyoin1
fdef6e5f76 AK: Add FixedPoint arithmetic helper
Co-authored-by: Hendiadyoin1 <leon2002.la@gmail.com>
Co-authored-by: kleines Filmröllchen <malu.bertsch@gmail.com>
2021-08-31 17:03:55 +04:30
Ali Mohammad Pur
30a1a25daa Tests/LibWasm: Handle all stream errors in parse_webassembly_module 2021-08-30 22:47:02 +02:00
Ali Mohammad Pur
99199b9bfd Tests/LibWasm: Add support for javascript bigint values
Some i64 values will not fit in normal doubles, and these values _are_
tested by the test suite, this makes the test runtime capable of
handling them correctly.
2021-08-30 22:47:02 +02:00
Timothy Flynn
f897c2edb3 LibUnicode: Canonicalize locale private use extensions 2021-08-30 19:42:40 +01:00
Timothy Flynn
6f0cb52dc4 LibUnicode: Canonicalize locale extensions 2021-08-30 19:42:40 +01:00
Timothy Flynn
30855e6663 LibUnicode: Parse locale private use extensions 2021-08-30 19:42:40 +01:00
Timothy Flynn
29f76ef7c8 LibUnicode: Parse locale extensions of the other extension form 2021-08-30 19:42:40 +01:00
Timothy Flynn
d2d304fcf8 LibUnicode: Parse locale extensions of the transformed extension form 2021-08-30 19:42:40 +01:00
Timothy Flynn
eda92d15e4 LibUnicode: Parse locale extensions of the Unicode locale extension form 2021-08-30 19:42:40 +01:00
Timothy Flynn
587d4663a3 AK: Return early from swap() when swapping the same object
When swapping the same object, we could end up with a double-free error.
This was found while quick-sorting a Vector of Variants holding complex
types, reproduced by the new swap_same_complex_object test case.
2021-08-30 19:42:40 +01:00
Ali Mohammad Pur
206bc01f81 LibRegex: Allow null bytes in pattern
That check was rather pointless as the input is a StringView which knows
its own bounds.
Fixes #9686.
2021-08-30 18:43:09 +02:00
Timothy Flynn
b7a95cba65 LibUnicode: Implement grammar validators for Unicode TR-35
ECMA-402 requires validating user input against the EBNF grammar for
Unicode locales described in TR-35: https://www.unicode.org/reports/tr35

This commit adds validators for that grammar, as well as other helper to
e.g. canonicalize a locale string.
2021-08-26 22:04:09 +01:00
Timothy Flynn
262e412634 AK: Implement method to convert a String/StringView to title case
This implementation preserves consecutive spaces in the orginal string.
2021-08-26 22:04:09 +01:00
Jean-Baptiste Boric
c97f7ea23b Tests: Test setjmp/sigsetjmp LibC functions
Since there are no real users of these functions in Serenity's
userland and this is my third attempt at this... This time, the great
LibTest test suite will make sure that I do it right!
2021-08-26 00:54:23 +02:00
Jan de Visser
85a84b0794 LibSQL: Introduce Serializer as a mediator between Heap and client code
Classes reading and writing to the data heap would communicate directly
with the Heap object, and transfer ByteBuffers back and forth with it.
This makes things like caching and locking hard. Therefore all data
persistence activity will be funneled through a Serializer object which
in turn submits it to the Heap.

Introducing this unfortunately resulted in a huge amount of churn, in
which a number of smaller refactorings got caught up as well.
2021-08-21 22:03:30 +02:00
Jan de Visser
d074a601df LibSQL+SQLServer: Bare bones INSERT and SELECT statements
This patch provides very basic, bare bones implementations of the
INSERT and SELECT statements. They are *very* limited:
- The only variant of the INSERT statement that currently works is
   SELECT INTO schema.table (column1, column2, ....) VALUES
      (value11, value21, ...), (value12, value22, ...), ...
   where the values are literals.
- The SELECT statement is even more limited, and is only provided to
  allow verification of the INSERT statement. The only form implemented
  is: SELECT * FROM schema.table

These statements required a bit of change in the Statement::execute
API. Originally execute only received a Database object as parameter.
This is not enough; we now pass an ExecutionContext object which
contains the Database, the current result set, and the last Tuple read
from the database. This object will undoubtedly evolve over time.

This API change dragged SQLServer::SQLStatement into the patch.

Another API addition is Expression::evaluate. This method is,
unsurprisingly, used to evaluate expressions, like the values in the
INSERT statement.

Finally, a new test file is added: TestSqlStatementExecution, which
tests the currently implemented statements. As the number and flavour of
implemented statements grows, this test file will probably have to be
restructured.
2021-08-21 22:03:30 +02:00
Jan de Visser
b74721e604 LibSQL: Redesign Value implementation and add new types
The implemtation of the Value class was based on lambda member variables
implementing type-dependent behaviour. This was done to ensure that
Values can be used as stack-only objects; the simplest alternative,
virtual methods, forces them onto the heap. The problem with the the
lambda approach is that it bloats the Values (which are supposed to be
lightweight objects) quite considerably, because every object contains
more than a dozen function pointers.

The solution to address both problems (we want Values to be able to live
on the stack and be as lightweight as possible) chosen here is to
encapsulate type-dependent behaviour and state in an implementation
class, and let the Value be an AK::Variant of those implementation
classes. All methods of Value are now basically straight delegates to
the implementation object using the Variant::visit method.

One issue complicating matters is the addition of two aggregate types,
Tuple and Array, which each contain a Vector of Values. At this point
Tuples and Arrays (and potential future aggregate types) can't contain
these aggregate types. This is limiting and needs to be addressed.

Another area that needs attention is the nomenclature of things; it's
a bit of a tangle of 'ValueBlahBlah' and 'ImplBlahBlah'. It makes sense
right now I think but admit we probably can do better.

Other things included here:
- Added the Boolean and Null types (and Tuple and Array, see above).
- to_string now always succeeds and returns a String instead of an
  Optional. This had some impact on other sources.
- Added a lot of tests.
- Started moving the serialization mechanism more towards where I want
  it to be, i.e. a 'DataSerializer' object which just takes
  serialization and deserialization requests and knows for example how
  to store long strings out-of-line.

One last remark: There is obviously a naming clash between the Tuple
class and the Tuple Value type. This is intentional; I plan to make the
Tuple class a subclass of Value (and hence Key and Row as well).
2021-08-21 22:03:30 +02:00
Jan de Visser
a5e28f2897 LibSQL: Make TupleDescriptor a shared pointer instead of a stack object
Tuple descriptors are basically the same for for example all rows in
a table. Makes sense to share them instead of copying them for every
single row.
2021-08-21 22:03:30 +02:00
Timothy Flynn
562d4e497b LibRegex: Treat pattern string characters as unsigned
For example, consider the following pattern:

    new RegExp('\ud834\udf06', 'u')

With this pattern, the regex parser should insert the UTF-8 encoded
bytes 0xf0, 0x9d, 0x8c, and 0x86. However, because these characters are
currently treated as normal char types, they have a negative value since
they are all > 0x7f. Then, due to sign extension, when these characters
are cast to u64, the sign bit is preserved. The result is that these
bytes are inserted as 0xfffffffffffffff0, 0xffffffffffffff9d, etc.

Fortunately, there are only a few places where we insert bytecode with
the raw characters. In these places, be sure to treat the bytes as u8
before they are cast to u64.
2021-08-20 19:16:33 +02:00
Andreas Kling
13f4890c38 LibCore: Make Core::File::open() return OSError in case of failure 2021-08-20 15:31:46 +02:00
Timothy Flynn
4f2cbe119b LibRegex: Allow Unicode escape sequences in capture group names
Unfortunately, this requires a slight divergence in the way the capture
group names are stored. Previously, the generated byte code would simply
store a view into the regex pattern string, so no string copying was
required.

Now, the escape sequences are decoded into a new string, and a vector
of all parsed capture group names are stored in a vector in the parser
result structure. The byte code then stores a view into the
corresponding string in that vector.
2021-08-19 23:49:25 +02:00
Timothy Flynn
fd8ccedf2b AK: Add GenericLexer API to consume an escaped Unicode code point
This parsing is already duplicated between LibJS and LibRegex, and will
shortly be needed in more places in those libraries. Move it to AK to
prevent further duplication.

This API will consume escaped Unicode code points of the form:
    \\u{code point}
    \\unnnn (where each n is a hexadecimal digit)
    \\unnnn\\unnnn (where the two escaped values are a surrogate pair)
2021-08-19 23:49:25 +02:00
Timothy Flynn
325eabc770 LibRegex: Ensure the GoBack operation decrements the code unit index
This was missed in commit 27d555bab0.
2021-08-18 09:47:09 +04:30
Timothy Flynn
a9716ad44e LibRegex: In non-Unicode mode, parse \u{4} as a repetition pattern 2021-08-18 09:47:09 +04:30
davidot
7613c22b06 LibJS: Add a mode to parse JS as a module
In a module strict mode should be enabled at the start of parsing and we
allow import and export statements.
2021-08-15 23:51:47 +01:00
Timothy Flynn
9509433e25 LibRegex: Implement and use a REPEAT operation for bytecode repetition
Currently, when we need to repeat an instruction N times, we simply add
that instruction N times in a for-loop. This doesn't scale well with
extremely large values of N, and ECMA-262 allows up to N = 2^53 - 1.

Instead, add a new REPEAT bytecode operation to defer this loop from the
parser to the runtime executor. This allows the parser to complete sans
any loops (for this instruction), and allows the executor to bail early
if the repeated bytecode fails.

Note: The templated ByteCode methods are to allow the Posix parsers to
continue using u32 because they are limited to N = 2^20.
2021-08-15 11:43:45 +01:00
Timothy Flynn
f1ce998d73 LibRegex+LibJS: Combine named and unnamed capture groups in MatchState
Combining these into one list helps reduce the size of MatchState, and
as a result, reduces the amount of memory consumed during execution of
very large regex matches.

Doing this also allows us to remove a few regex byte code instructions:
ClearNamedCaptureGroup, SaveLeftNamedCaptureGroup, and NamedReference.
Named groups now behave the same as unnamed groups for these operations.
Note that SaveRightNamedCaptureGroup still exists to cache the matched
group name.

This also removes the recursion level from the MatchState, as it can
exist as a local variable in Matcher::execute instead.
2021-08-15 11:43:45 +01:00
Timothy Flynn
1a173be29d LibRegex: Disallow unescaped quantifiers in Unicode mode 2021-08-15 11:43:45 +01:00
Timothy Flynn
c3e1f1f687 LibRegex: Use correct source characters for Unicode identity escapes 2021-08-15 11:43:45 +01:00
Timothy Flynn
6a485f612f LibRegex: Implement legacy octal escape parsing closer to the spec
The grammar for the ECMA-262 CharacterEscape is:

  CharacterEscape[U, N] ::
    ControlEscape
    c ControlLetter
    0 [lookahead ∉ DecimalDigit]
    HexEscapeSequence
    RegExpUnicodeEscapeSequence[?U]
    [~U]LegacyOctalEscapeSequence
    IdentityEscape[?U, ?N]

It's important to parse the standalone "\0 [lookahead ∉ DecimalDigit]"
before parsing LegacyOctalEscapeSequence. Otherwise, all standalone "\0"
patterns are parsed as octal, which are disallowed in Unicode mode.

Further, LegacyOctalEscapeSequence should also be parsed while parsing
character classes.
2021-08-15 11:43:45 +01:00
Timothy Flynn
83ca8c7e38 LibRegex: Convert LibRegex tests to use StringView in place of C-strings
A subsequent commit will add tests that require a string containing only
"\0". As a C-string, this will be interpreted as the null terminator. To
make the diff for that commit easier to grok, this commit converts all
tests to use StringView without any other functional changes.
2021-08-15 11:43:45 +01:00
Timothy Flynn
0c8f2f5aca LibRegex: Ensure escaped hexadecimals are exactly 2 digits in length 2021-08-15 11:43:45 +01:00
Timothy Flynn
2e4b6fd1ac LibRegex: Ensure escaped code points are exactly 4 digits in length 2021-08-15 11:43:45 +01:00
Timothy Flynn
e887314472 LibRegex: Fix ECMA-262 parsing of invalid identity escapes
* Only alphabetic (A-Z, a-z) characters may be escaped with \c. The loop
  currently parsing \c includes code points between the upper/lower case
  groups.
* In Unicode mode, all invalid identity escapes should cause a parser
  error, even in browser-extended mode.
* Avoid an infinite loop when parsing the pattern "\c" on its own.
2021-08-15 11:43:45 +01:00
Brian Gianforcaro
a2a5cb0f24 AK: Add Time::is_negative() to detect negative time values 2021-08-15 12:20:38 +02:00
Daniel Bertalan
0a36cea9dc Tests: Re-enable UserspaceEmulator tests on the Clang build
Now that problems that made UE crash have been fixed, this test should
now pass.
2021-08-14 18:42:14 +02:00