Before this, the %watch to eth-watcher was happening before the %poke,
and so eth-watcher was responding with its entire history immediately.
This is bad because it takes a lot of memory to process that many logs,
and also because those logs are stale.
Now, the %poke happens first, which clears the history.
%kick is supposed to start back from the snapshot and move forward.
Without this, we would only fetch logs that we hadn't already fetched.
Thus, if you were up-to-date when you kicked, you would miss anything
that happened between the time the snapshot was taken and the present,
though you would see things after the present.
Also reverted lull change to make this a safer upgrade.
Previously, when the larva got to processing enqueued events, it was
doing so without loading state into the adult beforehand, resulting in
incorrect processing of events.
Here, we make the larva call +molt more eagerly, ensuring that the adult
always has its state available when we use it.
Yes, there is a global timer for closing flows, but all that does is
enqueue a cork message. +on-stir needs to set _pump_ timers for all
flows that might still have messages to send, which includes closing
flows.
When ames notifies us that our subscription has been kicked, we enqueue
a cork to clean up the flow. Unlike the %leave case, however, we were
not registering the cork in the queue of outstanding comms. We would
eventually get an ack, but not know what for, and erroneously inject
%poke-acks and %watch-acks.
Here we simply add a %cork entry to the queue before sending it.
This is sufficient to bring the normal (non-prerelease-bugged) cases
into the new world.
For the prerelease ships that ran a buggier version of the new gall
subscription logic, we note that the conditional may trigger for the
nonce=1 case where it had already triggered for their
(shouldn't-be-possible) nonce=0 case. This results in a %leave on a wire
that wasn't in use. This no-ops on the publisher side though, and the
flow gets corked right away, so this is considered harmless.
In response to clog notification from remote ames, we were sending a
%cork to clean up the flow. However, the wire we were using had the /sys
prefix already stripped off. Here, we put it back in.
Start by killing subscription nonce 0, then work our way up instead of
down. We enhance the printf with a "total nonces" indicator so we can
still easily see the progress being made.
Previous +ap-doff kicked the agent repeatedly. We needed to kick
it only once. Now publisher agents clear their incoming subscription
state without the subscriber making lots of new subscriptions because
of repeated kicking.
+on-plea gets called in two very different ways:
1) handling request from local vane to send %plea to peer
2) handling %cork request from another ship, which our local ames has %pass'ed
to ourselves
In the second case, we shouldn't print misleadingly, or bind a duct in the ossuary.
+ap-nuke was not including the nonce, but should.
+ap-handle-peers was potentially including a zero nonce.
(The latter shouldn't have been possible, but there's a bug in +load
where sub-nonce.yoke gets initialized as 0 instead of 1.)
Gall tells ames to %cork flows for subscriptions it has closed.
Receiving a kick also closes a subscription, but gall wasn't issuing a
%cork in that case. We correct that here.
Inlines +mo-handle-ames-response's logic at its only callsite.
Without this, a ship would send a cork on a max of one flow per
recork timer, which could take years to clear for some ships.
This starts a hot loop of trying the next cork once one gets
positively acked.
The previous recork timer queued up %cork messages without sending them.
It also relied on making sure pump timers didn't get set for recork bones.
This was fragile.
The new design enqueues up to one new %cork message per ship during each
recork timer, based on the state of the flow. If the flow is closing but
there are no outstanding messages in it, then it needs to be recorked.
Flows will be recorked in ascending numerical order by bone.
The condition got butchered during refactor: instead of avoiding the creation
of pump timers during recork wake, it was setting them _exclusively_ during
recork wake.
This test started failing presumably somewhere during #5886. Testing
with a comet on the network, the test seems inaccurate: the comet can
communicate and be communicated to just fine.
Problem:
by-channel has its own copy of server-state from line 2182. discard-channel returns an altered state, with one channel removed from the state of by-channel.
but the state of by-channel isn't changing with each iteration, so |trim is only removing one channel per invocation.
Solution:
update by-channel on each iteration.