Stale lanes may cause forwarding loops. Imagine the following:
1) Planet A is live. Galaxy B, its indirect sponsor, learns of its route.
2) A goes offline. Another ship, C, is started in its place, at the same route.
3) B receives a packet for A, forwards it to the known route.
4) C receives the packet, forwards it to B.
5) Repeat from 3.
Here, we update the forward lane(s) scry used by the runtime to not produce a
peer's lane if they haven't communicated with us in the last hour. Everyone's
supposed to ping their sponsorship chain every 30 seconds. If those aren't
going through, you shouldn't expect to be reachable anyway.
We may or may not want to update +send-blob to match.
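As a rough illustration (in Python rather than Hoon; the names here are
invented for the example, and only the one-hour and 30-second figures
come from the text above):

  from datetime import datetime, timedelta
  from typing import Optional

  STALE_AFTER = timedelta(hours=1)       # stop vouching for lanes silent this long
  PING_INTERVAL = timedelta(seconds=30)  # how often peers ping their sponsorship chain

  class PeerRecord:
      def __init__(self, lane: str, last_contact: datetime):
          self.lane = lane                  # e.g. "ip:port" we last heard the peer on
          self.last_contact = last_contact  # when the peer last communicated with us

  def forward_lane_for(peer: Optional[PeerRecord], now: datetime) -> Optional[str]:
      # Mirror the rule above: if the peer has been silent for over an
      # hour, act as if we have no lane for it, so a stale route can
      # never bounce packets between B and C forever.
      if peer is None or now - peer.last_contact > STALE_AFTER:
          return None
      return peer.lane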
This accounts for a possible race condition where ames expects a
response but has regressed into the larval state. Upon receiving the
$sign on +take, we would previously remain stuck as a larva. Now we
check that we have enough information to re-evolve and then start a
/larval timer to begin draining the queue.
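The shape of the fix, sketched loosely in Python; the queue,
prerequisite flags, and timer below are stand-ins for the vane's larval
state, not the actual ames code:

  import threading
  from collections import deque

  class LarvalVane:
      def __init__(self):
          self.larval = True
          self.queue = deque()        # tasks buffered while still larval
          self.have_identity = False  # stand-ins for "enough information
          self.have_keys = False      # to re-evolve"

      def on_task(self, task):
          if self.larval:
              self.queue.append(task)   # can't handle it yet; hold on to it
          else:
              self.handle(task)

      def on_sign(self, have_identity=None, have_keys=None):
          # A sign may supply some of the missing information.
          if have_identity is not None:
              self.have_identity = have_identity
          if have_keys is not None:
              self.have_keys = have_keys
          # Instead of staying stuck, check whether we can evolve and,
          # if so, set a timer that will drain the queued tasks.
          if self.have_identity and self.have_keys:
              threading.Timer(0.0, self.drain).start()

      def drain(self):
          self.larval = False
          while self.queue:
              self.handle(self.queue.popleft())

      def handle(self, task):
          print("handling", task)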
Previously we were dropping events that used old wires that lacked a
rift in them. This seems like bad behavior, because we don't want to
destroy a flow that has not been processed by both ends.
Note: pending a fix to test-old-ames-wire
* jb/motion:
pill: solid
zuse: remove %crud from vane-task
arvo: full vane names in $sign
aqua: build again (still broken)
arvo: reform of the scry reform
We used to not accept new indirect lanes if we already had a direct
lane. This means that if Bob, who has a publicly accessible lane,
changes lanes (e.g. by restarting the process and getting a new port,
or by changing IP addresses) and then tries to talk to Alice, who is
behind a NAT, Bob will try directly but fail (because Alice is behind
a NAT), so he will
route the message through her galaxy. This is good -- the message gets
to Alice. However, Alice had a direct route to Bob's old lane, so she
will try to ack on that lane, which fails. She will not time out this
lane because she doesn't know that Bob isn't getting the acks (acks
don't have their own acks).
The solution is that if Alice receives an indirect lane for Bob when she
already has a direct lane, she shouldn't ignore it. If the lane is the
same as what she has, she shouldn't change anything (in particular, she
shouldn't mark it as indirect). But if it's a new lane, she should
discard her old direct lane and use the new indirect lane.
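A sketch of that decision rule, in Python with invented names
(`direct` marks a lane we heard from the peer itself):

  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class Lane:
      address: str   # e.g. "ip:port"
      direct: bool   # True if heard from the peer itself, False if via a relay

  def update_lane(current: Optional[Lane], heard: Lane) -> Lane:
      # No existing lane: take whatever we heard.
      if current is None:
          return heard
      # Same address as the direct lane we already have: keep it, and in
      # particular don't demote it to indirect.
      if current.direct and heard.address == current.address:
          return current
      # A genuinely new lane: the old direct lane is presumed dead, so
      # switch to it even though it's indirect.
      return heard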
We inspect the wire of our subscriber to see if we need to produce the
result as a %public-keys or a %boon. This is bad -- we should proxy the
subscription to avoid this need, but this commit doesn't make that change yet.
%pubs is an old name that doesn't exist anymore (last existed around
September 2019). The new version is /public-keys, but it's worked so
far because /public-keys has only one item in the path, so it missed the
conditional. This commit makes the intent clearer.
The [%a @ @ *] could be just [%a @ *], but I've left it as is to reduce
the chance of breaking stuff.
Somehow we ended up with flows that expected to awaken but did not wake
up. This was likely caused by the error in the r920j OTA, urbit-os-v1.0.18.
This adds a command which ensures that every flow has an active timer.
I expect this to be needed only once, but it's a pretty general tool, so
it's worth keeping.
I've included an unused @t parameter to make it easier to add simple
debug commands to ames without having to add a new task.
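Roughly what the timer-repair command does, sketched in Python; `peers`,
`flows`, `has_live_timer`, and `set_resend_timer` are invented stand-ins
for the real ames state and timer interface:

  def ensure_flow_timers(peers):
      # Walk every peer's flows and make sure each one that still has
      # unacknowledged messages also has a timer pending, so it gets
      # retried instead of sitting stuck forever.
      repaired = 0
      for peer in peers.values():
          for flow in peer.flows.values():
              if flow.unacked and not flow.has_live_timer():
                  flow.set_resend_timer()
                  repaired += 1
      return repaired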
At some point this should be styled more properly, like +by, +in, and
+to, but for now it reduces duplication and makes the ordered map
available to everyone.
Support /=peers= and /=peer=/~ship scries for getting at all peers and
a specific peer's connection state, respectively.
Moves some internal types into zuse for easier external use.
Trying to reduce the size of ames queues. This deduplicates incoming
message-blobs by comparing with existing message-blobs in other queues.
It also stops splitting into fragments in +feed-packets. Instead, it
splits into fragments at the last moment, in +encrypt. This means we
don't have to store a large number of packets in our home road.
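The deduplication amounts to interning equal blobs so queues share one
copy; a loose Python analogy (not the actual Hoon), where `other_queues`
stands in for the other peers' queues we compare against:

  def dedupe_blob(blob: bytes, other_queues: list[list[bytes]]) -> bytes:
      # If an equal blob already sits in some other queue, return that
      # copy so both queues share one object instead of storing
      # duplicates.
      for queue in other_queues:
          for existing in queue:
              if existing == blob:
                  return existing
      return blob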
Gives you a poor man's progress bar. For example, to determine how much
of an OTA you've downloaded from your sponsor, run:
|ames-sift (sein:title our now our)
|ames-verb %rcv
and then to turn it off:
|ames-verb
Uses Zuse's previously unused +harden helper function to streamline
+task unwrapping in vanes.
(Arguably, in landlocked vanes like Ford, we should crash if we get a
%soft task, since no events should be coming in directly from the
outside.)
There was a typo in the routing logic that compared for equality
against a value where it should have done a pattern match. The value
compared against contained the literal * gate, which would never match
route.peer-state, so this condition was always true, meaning the fix
that had added this extra condition (5406f06) did not actually change
the behavior from what it had been previously.
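The same mistake transposed into Python for illustration only -- nothing
below is the actual code; the point is equality against a pattern
literal versus checking the shape of the value:

  from dataclasses import dataclass

  @dataclass
  class Known:
      lane: str

  # Stand-in for the pattern literal the code compared against.
  KNOWN_PATTERN = Known(lane="*")

  def bad_check(route) -> bool:
      # Equality against the pattern literal: no real route is ever
      # equal to it, so this is effectively a constant and the
      # surrounding condition always takes the same branch.
      return route != KNOWN_PATTERN

  def good_check(route) -> bool:
      # A pattern match asks about the shape of the value instead.
      return not isinstance(route, Known)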
If we receive the naxplanation before the nack, the assertion in the gte
direction fails. The intent of the assertion is to make sure top of the
live queue never falls behind current.state, so it was simply in the
wrong direction.
Two bugs fixed here: first, if the %done reentrancy triggered another
%boon, that wasn't getting translated to a %lost, even though it could
have been the reason the event crashed in the first place.
Second, the %done reentrancy needs to happen after we emit our move, so
that we don't invert the order of the %boons we produce.
We've seen issues where the message-num of the head of live.state is
less than current.state. When this happens, we continually try to
resend message n-1, but we throw away any acknowledgment for n-1 because
current.state is already n. This halts progress on that flow.
We don't know what causes us to get in this bad state, so this adds an
assert to the packet pump that we're in a good state, run every time
the packet pump is run. When this crashes, we can turn on |ames-verb
and hopefully identify the cause.
This also adds logic to +on-wake in the packet pump to not try to resend
any messages that have already been acknowledged. This is just to
rescue ships that currently have these stuck flows.
(Incidentally, I'd love to have a rr-style debugger for stuff like this.
Just run a command that says "replay my event log watching for this
specific condition and then stop and let me poke around".)
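A loose Python sketch of the assert and the +on-wake change described
above, with assumed names: `live` is the map of sent-but-unacknowledged
messages keyed by message number, and `current` is taken from
current.state:

  def assert_pump_sane(live: dict[int, object], current: int) -> None:
      # The invariant described above: the oldest live (unacked)
      # message must never fall behind current.
      if live:
          assert min(live) >= current, "live queue fell behind current"

  def on_wake(live: dict[int, object], current: int, resend) -> None:
      # Resend timer fired. Anything numbered below current has already
      # been acknowledged, so skip it; this is what lets ships with the
      # stuck flows described above recover.
      for message_num in sorted(live):
          if message_num < current:
              continue
          resend(message_num)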
This is why basically all packets are going through the galaxies right
now. Most of the time, the flow is:
1) You're talking to ~dopzod but don't know where it is, so you ask
   ~zod to forward, which it does.
2) ~dopzod responds both directly (on the origin lane) and through ~zod.
3) (If you're behind a NAT, the direct response doesn't get back, but
   the one through ~zod does. Then you respond directly to ~dopzod
   because their lane piggybacked on the response. ~dopzod responds
   both directly and through ~zod, and the story picks up the same as
   if you weren't behind a NAT.)
4) Now you have a direct lane to ~dopzod, so all is well.
5) Now the duplicate response from ~dopzod through ~zod comes in (it
   takes a little longer because it's bouncing off ~zod), resetting
   your lane to "provisional".
6) Since your lane is provisional, you send your next packet both
   directly and through ~zod.
7) GOTO 2.
This change says "if I already have a direct lane, don't overwrite it
with a provisional one". This way, the only way the direct lane can be
overwritten is if they stop responding on it (cleared on "not
responding; still trying").
I also added |- to +send-blob to make |ames-verb %rot less confusing.
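The lane rule from the paragraph above, as a small Python sketch with
invented names (`provisional` marks a lane learned from a forwarded
packet rather than heard directly):

  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class LaneInfo:
      address: str        # e.g. "ip:port"
      provisional: bool   # True if it arrived via a forwarder such as ~zod

  def accept_lane(current: Optional[LaneInfo], heard: LaneInfo) -> LaneInfo:
      # An established direct lane is only given up when the peer stops
      # responding on it; a provisional lane must not clobber it, or we
      # flip back and forth as in the loop above.
      if current is not None and not current.provisional and heard.provisional:
          return current
      return heard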