This PR ports urbit/urbit#6159, fixing a performance problem that
plagued previous porting attempts. Fixes #157; supersedes #210 and #413.
The poor performance observed in #210 and elsewhere was not due to any
issue matching or dispatching jets. It coincided with the switch from
hoon %140 to %139, but only incidentally. It was caused by a change to
the `+solid` pill generator, which inadvertently broke the structural
sharing in the lifecycle sequence (see
https://github.com/urbit/urbit/pull/5989/files#diff-2f8df9d079ccb58c0a9a9c46f2f7dbd943dabaa21ba658c839de757bbac999f1L108-L116).
The problem went unnoticed because, in normal (i.e., king/serf) boot and
replay, events are sent over IPC in batches, which had the side effect
of recovering the necessary structural sharing. This new replay
implementation does not involve IPC, but instead reads and computes
events synchronously, in a single process.
The issue did not arise until ships booted from pills created with the
updated generator were replayed using this new implementation, and that
happened to coincide with the release of hoon %139. The absence of
structural sharing led to jets being registered with one copy of the
kernel, but dispatched from a separate copy, resulting in absurdly
expensive equality comparisons. Since both copies were already allocated
on the home-road, unification could not be performed. And since the
problem manifested during the initial phase (lifecycle sequence) of the
boot process, `|meld` could not be used.
This PR includes a trivial hack to work around such event logs: the
lifecycle sequence is read in an inner road, jammed, and then cue'd,
thus recovering structural sharing before any nock computation, jet
registration, &c. The solid pill generator should also be fixed, but
workarounds will still be needed to account for existing piers.
Longer-term, home-road unification should clearly be explored to avoid
such fragility.
Fine requests entire 512-packet "pages" from arvo in each scry, and when
it does that it marks the page as "pending" internally, so that if
another request comes in for the same packet, it doesn't resend the
scry, because it knows the scry is going to come back eventually.
This PR fixes a bug where instead of checking whether the page is
pending, we checked whether the requested packet is pending. This means
that if requests for packets 1, 2, 3, ..., 512 all came in a row, we
would fire off 512 scries, instead of only one. Tlon has seen this issue
in the wild when starting a large number of ships, while they're
downloading their initial OTA from their sponsor.
This is a draft because it has not been tested yet. I believe the bug
can be reproduced by transferring a large file between two ships with
-v, but I have not verified this, and this shouldn't be merged until
we've proven it fixes a bug.