%leave over the network didn't work because we included the message type
in the wire from gall, so the duct for the initial %watch and the %leave
were different. We need to know the message type so we can route the
acknowledgment as %poke-ack, %watch-ack, or no-op.
This moves this piece of information to a piece of state, where we queue
up the message types per [duct wire]. Ames guarantees that
acknowledgments will come in order.
This also includes an easy state adapter. The more interesting part of
the upgrade is that we likely have outstanding subscriptions with the
old wire format. The disadvantage of storing information in wires is
that it can't be upgraded in +load. So, here we listen for updates on
the old wire format, and when we get them we kill the old subscription,
so that it will be recreated with the new wire format.
As an aside, this is a good example of what we mean when we say
subscriptions may be killed at any time, so apps must handle this case.
Finally, this fixes the "attributing" ship to ~zod for agent requests.
This information was ignored for agent requests, but including it causes
spurious duct mismatches.
We've seen issues where the message-num of the head of live.state is
less than current.state. When this happens, we continually try to
resend message n-1, but we throw away any acknowledgment for n-1 because
current.state is already n. This halts progress on that flow.
We don't know what causes us to get in this bad state, so this adds an
assert to the packet pump that we're in a good state, run every time
the packet pump is run. When this crashes, we can turn on |ames-verb
and hopefully identify the cause.
This also adds logic to +on-wake in the packet pump to not try to resend
any messages that have already been acknowledged. This is just to
rescue ships that currently have these stuck flows.
(Incidentally, I'd love to have a rr-style debugger for stuff like this.
Just run a command that says "replay my event log watching for this
specific condition and then stop and let me poke around".)
This is why basically all packets are going through the galaxies right
now. Most of the time, the flow right now is:
* talking to ~dopzod but don't know where it is, so ask ~zod to forward,
which it does
* ~dopzod responds both directly (on the origin lane) and through ~zod
* (if NAT, the direct response doesn't get back, but the one through
~zod does. Then you respond directly to ~dopzod because their lane
piggybacked on the response. ~dopzod responds both directly and
through ~zod, and the story picks up the same as if you weren't behind a
NAT)
* now you have a direct lane to ~dopzod, so all is well.
* now the duplicate response from ~dopzod through ~zod comes in (takes a
little longer because it's bouncing off ~zod), resetting your lane to
"provisional"
* since your lane is provisional, you send your next packet both
directly and through ~zod
* GOTO 2
This change says "if I already have a direct lane, don't overwrite it
with a provisional one". This way, the only way the direct lane can be
overwritten is if they stop responding on it (cleared on "not
responding; still trying").
I also added |- to +send-blob to make |ames-verb %rot less confusing.
Compare +mute and +mule. Those pass through scry, which doesn't allow us to
catch crashes due to blocking scry. If you intercept scry, you can't preserve
the type polymorphically. By monomorphizing, we are able to do so safely.
Compare +mute and +mule. Those pass through scry, which doesn't allow us to
catch crashes due to blocking scry. If you intercept scry, you can't preserve
the type polymorphically. By monomorphizing, we are able to do so safely.