Vectors that stick around in the AST were wasting a fair bit of memory
due to the growth padding we keep by default. This patch goes after some
of these vectors with the shrink_to_fit() stick to reduce waste.
Since the AST can stay around for a long time, it is worth making an
effort to shrink it down when we have a chance.
Instead of CallExpression storing its arguments in a Vector<Argument>,
we now custom-allocate the memory slot for CallExpression (and its
subclass NewExpression) so that it fits both CallExpression and its list
of Arguments in one allocation.
This reduces memory usage on twitter.com/awesomekling by 8.8 MiB :^)
This template allows us to allocate an AST node and an array of some
arbitrary type T with one allocation instead of two. This can save
a lot of memory in some cases.
Thanks to Jonathan Müller for suggesting this technique! :^)
ASTNode inherits from RefCounted, which has a single 32-bit member.
This means that there's a 32-bit padding hole after RefCounted,
where we are free to put something (or it will go to waste!)
This patch moves ASTNode::m_start_offset into that padding hole,
and we now have a 32-bit padding hole at the end of ASTNode instead.
This will allow ASTNode subclasses to put things in the ASTNode hole
by moving them to the head of the member list.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
This was state only used by the parser to output an error with
appropriate location. This shrinks the size of ObjectExpression from
120 bytes down to just 56. This saves roughly 2.5 MiB when loading
twitter.
Using source offsets directly means we don't have to synthesize any
SourceRanges and circumvents that SourceRange returns invalid values for
the last range of a file.
This patch does two things:
- We now use u32 instead of size_t for the hops and index fields
in EnvironmentCoordinate. This means we're limited to an environment
nesting level and variable count of 4Gs respectively.
- Instead of wrapping it in an Optional, EnvironmentCoordinate now has
a custom valid/invalid state using a magic marker value.
These two changes reduce the size of Identifier by 16 bytes. :^)
Before this change, each AST node had a 64-byte SourceRange member.
This SourceRange had the following layout:
filename: StringView (16 bytes)
start: Position (24 bytes)
end: Position (24 bytes)
The Position structs have { line, column, offset }, all members size_t.
To reduce memory consumption, AST nodes now only store the following:
source_code: NonnullRefPtr<SourceCode> (8 bytes)
start_offset: u32 (4 bytes)
end_offset: u32 (4 bytes)
SourceCode is a new ref-counted data structure that keeps the filename
and original parsed source code in a single location, and all AST nodes
have a pointer to it.
The start_offset and end_offset can be turned into (line, column) when
necessary by calling SourceCode::range_from_offsets(). This will walk
the source code string and compute line/column numbers on the fly, so
it's not necessarily fast, but it should be rare since this information
is primarily used for diagnostics and exception stack traces.
With this, ASTNode shrinks from 80 bytes to 32 bytes. This gives us a
~23% reduction in memory usage when loading twitter.com/awesomekling
(330 MiB before, 253 MiB after!) :^)
This gives us better debug output when analysing calls to `undefined`
and also fixes multiple test-js cases expecting an
`(evaluated from $Expression)` in the error message.
This also refactors out the generation of that string, to avoid code
duplication with the AST interpreter.
This is an export which looks like `export {} from "module"`, and
although it doesn't have any real export entries it should still add
"module" to the required modules to load.
This is a continuation of the previous four commits.
Passing a global object here is largely redundant, we definitely need
the interpreter but can get the VM and (later) current active realm from
there - and also the global object while we still need it, although I'd
like to remove Interpreter::global_object() in the future.
This now matches the bytecode interpreter's execute_impl() functions.
This is a continuation of the previous two commits.
As allocating a JS cell already primarily involves a realm instead of a
global object, and we'll need to pass one to the allocate() function
itself eventually (it's bridged via the global object right now), the
create() functions need to receive a realm as well.
The plan is for this to be the highest-level function that actually
receives a realm and passes it around, AOs on an even higher level will
use the "current realm" concept via VM::current_realm() as that's what
the spec assumes; passing around realms (or global objects, for that
matter) on higher AO levels is pointless and unlike for allocating
individual objects, which may happen outside of regular JS execution, we
don't need control over the specific realm that is being used there.
We already did this but it called the @@iterator method of
%Array.prototype% visible to the user for example by overriding that
method. This should not be visible so we use a special version of
SuperCall now.
We cache on the AST node side as this is easier to track a position, we
just have to take care to wrap the values in a handle to make sure they
are not garbage collected.
This is done by keeping track of all the labels that apply to a given
break/continue scope alongside their bytecode target. When a
break/continue with a label is generated, we scan from the most inner
scope to the most outer scope looking for the label, performing any
necessary unwinds on the way. Once the label is found, it is then
jumped to.
This was defined twice, despite being the very same thing:
- ClassElement::ClassFieldDefinition
- ECMAScriptFunctionObject::InstanceField
Move the former to a new header and use it everywhere. Also update the
define_field() AO to take a single field instead of separate name and
initializer arguments.
Instead of crashing on the spot, return a descriptive error that will
eventually continue its days as a javascript "InternalError" exception.
This should make random crashes with BC less likely.
This removes a number of vm.exception() checks which are now caught
directly by TRY. Make use of these checks in
{Global, Eval}DeclarationInstantiation and while we're here add spec
comments.
It seems the stack search does not find all functions because they are
kept in variants and other structs. This meant some function could be
cleaned up while we were evaluating a class meaning it would fail/crash
when attempting to run the functions.
The hard part of parsing them in import statements and calls was already
done so this is just removing some check which threw before on
assertions. And filtering the assertions based on the result of a new
host hook.
Because we can have arbitrary in- and export names with strings we can
have '*' and '' which means using '*' as an indicating namespace imports
failed / behaved incorrectly for string imports '*'.
We now use more specific types to indicate these special states instead
of these 'magic' string values.
Do note that 'default' is not actually a magic string value but one
specified by the spec. And you can in fact export the default value by
doing: `export { 1 as default }`.
The big changes are:
- Allow strings as Module{Export, Import}Name
- Properly track declarations in default export statements
However, the spec is a little strange in that it allows function and
class declarations without a name in default export statements.
This is quite hard to fully implement without rewriting more of the
parser so for now this behavior is emulated by faking things with
function and class expressions. See the comments in
parse_export_statement for details on the hacks and where it goes wrong.