ladybird

mirror of https://github.com/LadybirdBrowser/ladybird.git synced 2025-01-06 02:55:49 +03:00

Author	SHA1	Message	Date
Timothy Flynn	c911781c21	Everywhere: Remove needless trailing semi-colons after functions. This is a new option in clang-format-16.	2023-07-08 10:32:56 +01:00
Sam Atkins	6c66fd5ffb	LibRegex: Remove declarations for non-existent methods	2023-01-27 20:33:18 +00:00
Timothy Flynn	f3db548a3d	AK+Everywhere: Rename FlyString to DeprecatedFlyString. DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so let's rename it to A) match the name of DeprecatedString, B) write a new FlyString class that is tied to String.	2023-01-09 23:00:24 +00:00
Linus Groh	6e19ab2bbc	AK+Everywhere: Rename String to DeprecatedString. We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)	2022-12-06 08:54:33 +01:00
Ali Mohammad Pur	598dc74a76	LibRegex: Partially implement the ECMAScript unicodeSets proposal. This skips the new string unicode properties additions, along with \q{}.	2022-07-20 21:25:59 +01:00
Ali Mohammad Pur	7734914909	LibRegex: Refactor parsing 'CharacterEscape' out of 'AtomEscape'. The ECMA262 spec has this as a separate production, and we need it to be split up for a future commit.	2022-07-20 21:25:59 +01:00
Ali Mohammad Pur	b908f9f6ef	LibRegex: Pass parse flags as a struct instead of multiple arguments	2022-07-20 21:25:59 +01:00
sin-ack	fbc771efe9	Everywhere: Use default StringView constructor over nullptr. While null StringViews are just as bad, these prevent the removal of StringView(char const*) as that constructor accepts a nullptr. No functional changes.	2022-07-12 23:11:35 +02:00
Ali Mohammad Pur	5fac41f733	LibRegex: Implement ECMA262 multiline matching without splitting lines. As ECMA262 regex allows `[^]` and literal newlines to match newlines in the input string, we shouldn't split the input string into lines, rather simply make boundaries and catchall patterns capable of checking for these conditions specifically.	2022-01-26 00:53:09 +03:30
Ali Mohammad Pur	c11be92e23	LibRegex: Implement an ECMA262 Regex quirk with negative lookarounds. This implements the quirk defined by "Note 3" in section "Canonicalize" (https://tc39.es/ecma262/#sec-runtime-semantics-canonicalize-ch). Crosses off another quirk from #6042.	2022-01-21 18:14:08 +03:30
Andreas Kling	6ad427993a	Everywhere: Behaviour => Behavior	2021-09-07 13:53:14 +02:00
Ali Mohammad Pur	05c65f9b5d	LibRegex: Limit the number of nested capture groups allowed in BRE. Found by OSS-Fuzz: https://oss-fuzz.com/testcase?key=4869334212673536	2021-08-31 16:37:49 +02:00
Timothy Flynn	562d4e497b	LibRegex: Treat pattern string characters as unsigned. For example, consider the following pattern: `new RegExp('\ud834\udf06', 'u')`. With this pattern, the regex parser should insert the UTF-8 encoded bytes 0xf0, 0x9d, 0x8c, and 0x86. However, because these characters are currently treated as normal char types, they have a negative value since they are all > 0x7f. Then, due to sign extension, when these characters are cast to u64, the sign bit is preserved. The result is that these bytes are inserted as 0xfffffffffffffff0, 0xffffffffffffff9d, etc. Fortunately, there are only a few places where we insert bytecode with the raw characters. In these places, be sure to treat the bytes as u8 before they are cast to u64.	2021-08-20 19:16:33 +02:00
Timothy Flynn	4f2cbe119b	LibRegex: Allow Unicode escape sequences in capture group names. Unfortunately, this requires a slight divergence in the way the capture group names are stored. Previously, the generated byte code would simply store a view into the regex pattern string, so no string copying was required. Now, the escape sequences are decoded into a new string, and a vector of all parsed capture group names are stored in a vector in the parser result structure. The byte code then stores a view into the corresponding string in that vector.	2021-08-19 23:49:25 +02:00
Timothy Flynn	6131c0485e	LibRegex: Use GenericLexer to consume escaped code points	2021-08-19 23:49:25 +02:00
Timothy Flynn	9509433e25	LibRegex: Implement and use a REPEAT operation for bytecode repetition. Currently, when we need to repeat an instruction N times, we simply add that instruction N times in a for-loop. This doesn't scale well with extremely large values of N, and ECMA-262 allows up to N = 2^53 - 1. Instead, add a new REPEAT bytecode operation to defer this loop from the parser to the runtime executor. This allows the parser to complete sans any loops (for this instruction), and allows the executor to bail early if the repeated bytecode fails. Note: The templated ByteCode methods are to allow the Posix parsers to continue using u32 because they are limited to N = 2^20.	2021-08-15 11:43:45 +01:00
Timothy Flynn	f1ce998d73	LibRegex+LibJS: Combine named and unnamed capture groups in MatchState. Combining these into one list helps reduce the size of MatchState, and as a result, reduces the amount of memory consumed during execution of very large regex matches. Doing this also allows us to remove a few regex byte code instructions: ClearNamedCaptureGroup, SaveLeftNamedCaptureGroup, and NamedReference. Named groups now behave the same as unnamed groups for these operations. Note that SaveRightNamedCaptureGroup still exists to cache the matched group name. This also removes the recursion level from the MatchState, as it can exist as a local variable in Matcher::execute instead.	2021-08-15 11:43:45 +01:00
Timothy Flynn	2e4b6fd1ac	LibRegex: Ensure escaped code points are exactly 4 digits in length	2021-08-15 11:43:45 +01:00
Ali Mohammad Pur	15f95220ae	AK+Everywhere: Delete Variant's default constructor. This was exposed to the user by mistake, and even accumulated a bunch of users that didn't blow up out of sheer luck.	2021-08-13 17:31:39 +04:30
Timothy Flynn	df14d11a11	LibRegex: Disallow invalid interval qualifiers in Unicode mode. Fixes all remaining 'built-ins/RegExp/property-escapes' test262 tests.	2021-08-11 13:11:01 +02:00
Timothy Flynn	484ccfadc3	LibRegex: Support property escapes of Unicode script extensions	2021-08-04 13:50:32 +01:00
Timothy Flynn	06088df729	LibRegex: Support property escapes of the Unicode script property. Note that unlike binary properties and general categories, scripts must be specified in the non-binary (Script=Value) form.	2021-08-04 13:50:32 +01:00
Timothy Flynn	1e10d6d7ce	LibRegex: Support property escapes of Unicode General Categories. This changes LibRegex to parse the property escape as a Variant of Unicode Property & General Category values. A byte code instruction is added to perform matching based on General Category values.	2021-08-02 21:02:09 +04:30
Timothy Flynn	d485cf29d7	LibRegex+LibUnicode: Begin implementing Unicode property escapes. This supports some binary property matching. It does not support any properties not yet parsed by LibUnicode, nor does it support value matching (such as Script_Extensions=Latin).	2021-07-30 21:26:31 +01:00
Ali Mohammad Pur	36bfc912fc	LibRegex: Switch to east-const style	2021-07-23 21:19:21 +04:30
Ali Mohammad Pur	c8b2199251	LibRegex: Clear previous capture group contents in ECMA262 mode. ECMA262 requires that the capture groups only contain the values from the last iteration, e.g. `((c)(a)?(b))` should _not_ contain 'a' in the second capture group when matching "cabcb".	2021-07-23 21:19:21 +04:30
Ali Mohammad Pur	11a8476cf4	LibRegex: Use the parser state capture group count in BRE. Otherwise the users won't know how many capture groups are in the parsed regular expression.	2021-07-10 23:14:08 +04:30
Ali Mohammad Pur	54d89609de	LibRegex: Add support for the Basic POSIX regular expressions. This implements the internal regex stuff for #8506.	2021-07-10 13:33:08 +02:00
Brian Gianforcaro	1682f0b760	Everything: Move to SPDX license identifiers in all files. SPDX License Identifiers are a more compact / standardized way of representing file license information. See: https://spdx.dev/resources/use/#identifiers This was done with the `ambr` search and replace tool: `ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *`	2021-04-22 11:22:27 +02:00
AnotherTest	c128b3fd91	LibRegex: Remove 'ReadDigitFollowPolicy' as it's no longer needed. Thanks to @GMTA: `1b071455b1 (r49343474)`	2021-04-10 12:10:45 +02:00
Jelle Raaijmakers	db321db5f4	LibRegex: Parse `\0` as a zero-byte instead of 0x30 ("0"). This was causing some regexes to trip up. Fixes #6202.	2021-04-09 21:53:14 +02:00
AnotherTest	6bbb26fdaf	LibRegex: Allow references to capture groups that aren't parsed yet. This only applies to the ECMA262 parser. This behaviour is an ECMA262-specific quirk, such references always generate zero-length matches (even on subsequent passes). Also adds a test in LibJS's test suite. Fixes #6039.	2021-04-01 21:55:47 +02:00
AnotherTest	f05e518cbc	LibRegex: Implement section B.1.4. of the ECMA262 spec. This allows the parser to deal with crazy patterns like the one in #5517.	2021-02-27 07:31:01 +01:00
Andreas Kling	13d7c09125	Libraries: Move to Userland/Libraries/	2021-01-12 12:17:46 +01:00

34 Commits
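
The sign-extension problem described in commit 562d4e497b can be illustrated with a small standalone sketch. This is not LibRegex code: the helper names `widen_as_signed` and `widen_as_unsigned` are hypothetical, and `signed char` stands in for a plain `char` on a platform where `char` is signed. It only shows why a byte above 0x7f has to pass through an unsigned 8-bit type before being widened to 64 bits.

```cpp
#include <cstdint>

// Hypothetical helper (not part of LibRegex): widens a signed byte directly.
// A negative value is sign-extended, so the upper 56 bits all become 1.
constexpr std::uint64_t widen_as_signed(signed char byte)
{
    return static_cast<std::uint64_t>(byte);
}

// Hypothetical helper (not part of LibRegex): reinterprets the byte as
// unsigned 8-bit first, so the widened value keeps only the low 8 bits.
constexpr std::uint64_t widen_as_unsigned(signed char byte)
{
    return static_cast<std::uint64_t>(static_cast<std::uint8_t>(byte));
}

// 0xf0 is the first UTF-8 byte mentioned in the commit message.
constexpr signed char first_byte = static_cast<signed char>(0xf0);

static_assert(widen_as_signed(first_byte) == 0xfffffffffffffff0ULL,
    "direct widening preserves the sign bit");
static_assert(widen_as_unsigned(first_byte) == 0xf0ULL,
    "going through an unsigned byte yields the intended value");
```

The two `static_assert`s check at compile time the values quoted in the commit message: 0xfffffffffffffff0 for the direct cast and 0xf0 once the byte is treated as unsigned.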