252 Commits

Author SHA1 Message Date
Philip Monk
0e876b3cd4
ames: better printfs 2019-12-18 11:31:17 -03:30
Philip Monk
7ca3d9624e
ames: handle misordered crashing boons
Two bugs fixed here: first, if the %done reentrancy triggered another
%boon, that wasn't getting translated to a %lost, even though it could
have been the reason the event crashed in the first place.

Second, the %done reentrancy needs to happen after we emit our move, so
that we don't invert the order of the %boon's we produce.
2019-12-17 20:58:30 -08:00
Jared Tobin
9ba4505086
Merge branch 'ames-sift' (#2081)
* ames-sift:
  ames: refactor +load
  ames: +send-blob better ship printing
  hood: |ames-sift generator to trace by ship
  ames: add %sift  to trace by ship

Signed-off-by: Jared Tobin <jared@tlon.io>
2019-12-12 16:06:32 +08:00
Ted Blackman
35596ca7de
ames: refactor +load 2019-12-12 15:55:37 +08:00
Ted Blackman
d4574b5da4
ames: +send-blob better ship printing 2019-12-12 15:55:36 +08:00
Ted Blackman
d77fb0f685
ames: add %sift to trace by ship 2019-12-12 15:55:32 +08:00
Jared Tobin
85d447f173
Merge branch 'philip/gall-noop' (#2073)
* origin/philip/gall-noop:
  gall: no-op on duplicate watch-ack

Signed-off-by: Jared Tobin <jared@tlon.io>
2019-12-12 15:50:19 +08:00
Jared Tobin
2aa86e3121
Merge branch 'philip/stuck-flow' (#2071)
* origin/philip/stuck-flow:
  ames: recover from mismatched message nums

Signed-off-by: Jared Tobin <jared@tlon.io>
2019-12-12 15:49:53 +08:00
Philip Monk
3b41a8be15
gall: no-op on duplicate watch-ack
fixes #2070
2019-12-10 18:49:50 -08:00
Philip Monk
29f078bb14
ames: don't forward up the sponsorship chain
This is *actually* why the galaxies are under so much load.  They're in
a forwarding loop with their stars, and this breaks the loop.
2019-12-10 16:20:12 -08:00
Philip Monk
e7c8a44e11
ames: recover from mismatched message nums
We've seen issues where the message-num of the head of live.state is
less than current.state.  When this happens, we continually try to
resend message n-1, but we throw away any acknowledgment for n-1 because
current.state is already n.  This halts progress on that flow.

We don't know what causes us to get in this bad state, so this adds an
assert to the packet pump that we're in a good state, run every time
the packet pump is run.  When this crashes, we can turn on |ames-verb
and hopefully identify the cause.

This also adds logic to +on-wake in the packet pump to not try to resend
any messages that have already been acknowledged.  This is just to
rescue ships that currently have these stuck flows.

(Incidentally, I'd love to have a rr-style debugger for stuff like this.
Just run a command that says "replay my event log watching for this
specific condition and then stop and let me poke around".)
2019-12-09 23:31:18 -08:00
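The stuck-flow condition described in commit e7c8a44e11 above can be illustrated with a small Python sketch. This is not the Hoon code from the commit. The names `PumpState`, `check_invariant`, and `on_wake` are hypothetical, and the sketch assumes that any message number below `current` has already been acknowledged, which is one reading of "current.state is already n".

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class PumpState:
    """Hypothetical stand-in for the packet pump state in the commit message."""
    current: int  # assumed: lowest message number not yet fully acknowledged
    live: Deque[int] = field(default_factory=deque)  # message numbers in flight, in order


def check_invariant(state: PumpState) -> None:
    """Fail loudly if the head of `live` is older than `current`.

    This mirrors the assert the commit says it runs on every pump invocation.
    """
    if state.live and state.live[0] < state.current:
        raise AssertionError(
            f"head of live ({state.live[0]}) is behind current ({state.current})"
        )


def on_wake(state: PumpState) -> List[int]:
    """Return the message numbers still worth resending.

    Messages below `current` are dropped first, since (under the assumption
    above) their acknowledgments would be thrown away anyway.
    """
    while state.live and state.live[0] < state.current:
        state.live.popleft()
    return list(state.live)
```

With `current=5` and `live=[4, 5, 6]`, `check_invariant` raises, and `on_wake` discards message 4 and returns `[5, 6]`, after which the invariant holds again. The sketch separates detection (the assert) from rescue (the filter in `on_wake`), matching the two parts of the commit message.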
Philip Monk
abde1d8aa9
ames: reduce load by increasing timer delays 2019-12-06 12:11:06 -08:00
Ted Blackman
bee0b5803a
ames: don't crash on missing queued larval event 2019-12-05 17:04:24 +08:00
Jared Tobin
41b64feb16
Merge branch 'philip/p2p' (#2025)
* philip/p2p:
  ames: don't overwrite lane if already direct

Signed-off-by: Jared Tobin <jared@tlon.io>
2019-12-05 16:08:01 +08:00
Philip Monk
5406f06092
ames: don't overwrite lane if already direct
This is why basically all packets are going through the galaxies right
now.  Most of the time, the flow right now is:

* talking to ~dopzod but don't know where it is, so ask ~zod to forward,
  which it does

* ~dopzod responds both directly (on the origin lane) and through ~zod

* (if NAT, the direct response doesn't get back, but the one through
  ~zod does. Then you respond directly to ~dopzod because their lane
  piggybacked on the response. ~dopzod responds both directly and
  through ~zod, and the story picks up the same as if you weren't behind a
  NAT)

* now you have a direct lane to ~dopzod, so all is well.

* now the duplicate response from ~dopzod through ~zod comes in (takes a
  little longer because it's bouncing off ~zod), resetting your lane to
  "provisional"

* since your lane is provisional, you send your next packet both
  directly and through ~zod

* GOTO 2

This change says "if I already have a direct lane, don't overwrite it
with a provisional one". This way, the only way the direct lane can be
overwritten is if they stop responding on it (cleared on "not
responding; still trying").

I also added |- to +send-blob to make |ames-verb %rot less confusing.
2019-12-05 16:05:06 +08:00
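The lane rule from commit 5406f06092 above ("if I already have a direct lane, don't overwrite it with a provisional one") can be sketched in Python. This is an illustration only, not the Hoon change itself. `Route`, `update_route`, and `clear_route_on_timeout` are hypothetical names, and letting a newly heard direct lane replace an existing one is an assumption the commit message does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    """Hypothetical route record: where we currently think a peer lives."""
    direct: bool  # True = direct lane, False = provisional (learned via a forwarder)
    lane: str     # opaque address, e.g. "ip:port" or a galaxy name


def update_route(current: Optional[Route], heard: Route) -> Route:
    """Return the route to keep after a packet suggests `heard`."""
    # Rule from the commit message: a provisional lane never replaces a direct one.
    if current is not None and current.direct and not heard.direct:
        return current
    # Assumption: in every other case the newly heard lane wins.
    return heard


def clear_route_on_timeout(current: Optional[Route]) -> Optional[Route]:
    """Drop the route when the peer stops responding.

    Per the commit message, this is the only way a direct lane is lost
    (cleared on "not responding; still trying").
    """
    return None
```

Under this rule, the late duplicate response that arrives through ~zod no longer resets a direct lane to provisional, so the loop described in the commit message is not re-entered.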
Jared Tobin
75ca54ca24
Merge branch 'ames-sponsor-scry-2' (#2021)
* ames-sponsor-scry-2:
  ames: scry for sponsor and don't crash on jael response

Signed-off-by: Jared Tobin <jared@tlon.io>
2019-12-05 15:43:00 +08:00
Ted Blackman
a7e638ebab
ames: scry for sponsor and don't crash on jael response 2019-12-04 17:18:39 -05:00
Ted Blackman
b3f757d88b
ames: send larval crashes to dill 2019-12-05 02:23:13 +08:00
Ted Blackman
4c9cc1542a
ames: dequeue failed larval timer 2019-12-05 02:23:13 +08:00
Ted Blackman
c20f2391e1
ames: print and retry larval crashes 2019-12-05 02:22:27 +08:00
Philip Monk
9bc6ccb7fc
ames: don't say not responding if we haven't been talking 2019-12-03 20:21:43 -08:00
Philip Monk
702dd2c07a
verb: add +verb %bowl to print bowl on every event 2019-12-03 15:05:42 -08:00
Philip Monk
8c2c52c01c
ames: make life printf helpful 2019-12-03 13:06:04 -08:00
Philip Monk
5f1c4805fe
ames: printfs 2019-12-02 23:13:48 -08:00
Philip Monk
c90107659b
Merge remote-tracking branch 'origin/rc-ames-verb' into rc 2019-12-02 20:22:04 -08:00
Philip Monk
93d3edbf73
pill 2019-12-02 20:09:36 -08:00
Ted Blackman
d0d45ed8f2
ames: fix message pump to complete queueing fix 2019-12-02 20:09:35 -08:00
Ted Blackman
6dcb6622fa
ames: fix ack queueing 2019-12-02 20:09:35 -08:00
Ted Blackman
0cb6464e9d
ames: %spew to set verbosity 2019-12-02 18:46:40 -05:00
Philip Monk
e3005eaffa
ames: clear out-of-order messages from packet queue 2019-12-02 00:41:50 -08:00
Philip Monk
aaf7b3b42e
ames: don't crash in +on-take-wake 2019-12-01 16:00:32 -08:00
Ted Blackman
0a8b12c882
ames: state adapter 2019-12-01 02:49:46 -05:00
Ted Blackman
900d923ccc
ames: fix aggressive lane timeout (still needs migration) 2019-12-01 02:49:46 -05:00
Philip Monk
d6c1ff4e20
ames: add routing diagnostics 2019-11-30 14:44:57 -08:00
Ted Blackman
071b1a4bbe
ames: ~s30 max timeout instead of ~m2 2019-11-28 01:17:34 -05:00
Ted Blackman
93604c2f29
Merge branch 'rc' of github.com:urbit/urbit into rc 2019-11-27 23:06:53 -05:00
Ted Blackman
3779cca5a9
ames: try sponsors above .our 2019-11-27 23:06:39 -05:00
Philip Monk
23cc21c383
ames: remove printf 2019-11-27 19:39:12 -08:00
Ted Blackman
e9ba500ee4
Merge branch 'rc' of github.com:urbit/urbit into rc 2019-11-27 22:31:18 -05:00
Philip Monk
26c5be2948
ames: remove printf 2019-11-27 18:40:33 -08:00
Ted Blackman
9af7b3954a
ames: ignore encrypted packets from alien comets 2019-11-27 20:58:18 -05:00
Philip Monk
138cbb5d2e
ames: clean up printfs 2019-11-27 16:58:26 -08:00
Philip Monk
fdb1069b33
ames: printfs 2019-11-27 16:43:09 -08:00
Philip Monk
fc74ab2dbd
ames: count unsent messages for backpressure 2019-11-27 15:58:38 -08:00
Philip Monk
74b0f66850
ames: continue processing memos after %done 2019-11-27 15:13:17 -08:00
Philip Monk
f035955a36
ames: rename alef -> ames 2019-11-27 00:46:02 -08:00
Ted Blackman
043f508f27
deleted old ames 2019-07-22 19:15:16 -07:00
Joe Bryan
dc2483f1f8
fixes jet-registration hint in %ames +turf-scry (for profiling) 2019-07-11 23:41:54 -07:00
Fang
12b8134c33
Merge branch 'v0.8.0rc' into gut-by 2019-07-10 01:49:07 +02:00
Joe Bryan
47aaef7904
disables spurious condition in %ames ping flow 2019-07-02 18:08:06 -07:00
Fang
eb6c8a45ce
Replace (fall (~(get by calls with (~(gut by 2019-06-30 18:13:34 +02:00
Jared Tobin
b3901ab42f
Add 'pkg/arvo/' from commit 'c20e2a185f131ff3f5d3961829bd7a3fe0f227f8'
git-subtree-dir: pkg/arvo
git-subtree-mainline: 9c8f40bf6c
git-subtree-split: c20e2a185f
2019-06-28 12:48:05 +08:00