* origin/philip/gall-ack-fix:
gall: give both acks in case of unexpected ack
gall: make 2140e07 ota-able
gall: properly track remote acknowledgments
Signed-off-by: Jared Tobin <jared@tlon.io>
It's hard to say what's the safest thing to do when we get an ack we
weren't expecting due to losing outstanding.agents.state in +load
3-to-4, so this gives both a watch-ack and a poke-ack. This seems most
likely to succeed.
Does not change state type, but clears outstanding.agents.state since
it's full of garbage values. This introduces a possibility that we may
have been in the middle of something, so we handle that in a reasonably
sane way.
outstanding.agents.state is a queue of what sort of message we sent to a
foreign app. We use it so that when the acknowledgment comes back we
know whether to treat it as a watch-ack, poke-ack, or neither. We used
to put this info in the wire, but this gave us a different ames flow,
which meant %leave and %watch didn't get associated (causing #2079).
The error was that when when retrieving the item from the queue, we put
the new 1-item-shorter queue back in outstanding.agents.state at a
different wire than it came from, so the queues never actually got
shorter, and acknowledgments of the wrong sort were commonly produced.
This caused problems mainly in situations where we poke and peer on the
same wire, and possibly when a subscription was cancelled.
Possibly related to #2206 and #2176. I would expect this bug to cause
those issues, but I haven't verified the converse. Also possibly
related to #2153 and #2079.
"Replace" suggests this function either produces an updated set/map when done,
like +snap, or changes all values in-place, like +turn. In truth, it's more
similar to +roll, which does reduction/accumulation.
("Reduce" specifically was chosen because it maintains the mnemonic relation to
the arm name.)
Due to asynchronicity, Ford can receive responses from Clay to requests
that it has already attempted to cancel. This removes some overzealous
assertions that this wouldn't happen.
@ixv recently uncovered a bug (#2180) in Ford that caused certain
rebuilds to crash. @Fang- and I believe this change should fix the bug,
and we have confirmed that the reproduction that used to fail about two
thirds of the time now has not failed at all in the ten or so times
we've run it since then. @Fang- is still running more tests to confirm
the fix with more certainty.
It turned out the cause was that (depending on the rebuild order, which
is unspecified and should not need to be specified), Ford could enqueue
a provisional sub-build to be run but then, later in the same +gather
call, discover that the sub-build was in fact an orphan and delete it
from builds.state accordingly. Then when Ford tried to run the
sub-build, it would have already been deleted from the state, so Ford
would crash when trying to process its result in +reduce.
The fix was to make sure that when we discover a provisional sub-build
is orphaned, dequeue it from candidate-builds and next-builds to make
sure we don't try to run it. I'm about 95% sure this fix completely
solves the bug.
Uses Zuse's previously unused +harden helper function to streamline
+task unwrapping in vanes.
(Arguably, in landlocked vanes like Ford, we should crash if we get a
%soft task, since no events should be coming in directly from the
outside.)
There was a typo in the routing logic that was comparing equality
against a value where it should have been doing a pattern match. The
value compared against contained the literal * gate, which would never
match route.peer-state, so this condition was always true, meaning the
fix that had added this extra condition (5406f06) did not actually
change the behavior from what it been previously.
If we receive the naxplanation before the nack, the assertion in the gte
direction fails. The intent of the assertion is to make sure top of the
live queue never falls behind current.state, so it was simply in the
wrong direction.
Instead of providing a (unit path), allows for (list path), which better
supports the "update to path and subpath cases".
For example, if /things wants updates about everything, and
/things/specific wants updates about the specific thing, they'll both
need to receive a %fact when the specific thing changes.
Previously, these would have been two separate moves. Now, gall handles
the multi-targeting for you.
Previously, it would always produce ~, regardless of the path asked
about.
Now, it produces a loobean, based on whether or not a file exists at the
specified path.