Commit Graph

424 Commits

Author SHA1 Message Date
Philip Monk
b14606660a
goad: recompile apps after changes to /sys
OTAs commonly end up in an inconsistent state if apps depend on changes
to /sys.  For example, the %sift changes break on OTA because %spider
needs to be reloaded so that it's aware of the new thread type.  This
adds a %goad app, which reloads all apps after every change to /sys.

Getting this to start OTA is nontrivial, but this pattern should work
for apps in the future.  The changes to clock shouldn't generally be
necessary; they are only necessary here because we can't rely on hood to
start goad, since hood fails to compile if it's run before zuse is
reloaded.  Once goad is active, this will cease to be a problem.
2019-12-13 17:14:51 -08:00
Jared Tobin
9ba4505086
Merge branch 'ames-sift' (#2081)
* ames-sift:
  ames: refactor +load
  ames: +send-blob better ship printing
  hood: |ames-sift generator to trace by ship
  ames: add %sift  to trace by ship

Signed-off-by: Jared Tobin <jared@tlon.io>
2019-12-12 16:06:32 +08:00
Ted Blackman
35596ca7de
ames: refactor +load 2019-12-12 15:55:37 +08:00
Ted Blackman
d4574b5da4
ames: +send-blob better ship printing 2019-12-12 15:55:36 +08:00
Ted Blackman
d77fb0f685
ames: add %sift to trace by ship 2019-12-12 15:55:32 +08:00
Jared Tobin
85d447f173
Merge branch 'philip/gall-noop' (#2073)
* origin/philip/gall-noop:
  gall: no-op on duplicate watch-ack

Signed-off-by: Jared Tobin <jared@tlon.io>
2019-12-12 15:50:19 +08:00
Jared Tobin
2aa86e3121
Merge branch 'philip/stuck-flow' (#2071)
* origin/philip/stuck-flow:
  ames: recover from mismatched message nums

Signed-off-by: Jared Tobin <jared@tlon.io>
2019-12-12 15:49:53 +08:00
Jared Tobin
e4a7dae888
Merge branch 'philip/login-instructions' (#2039)
* origin/philip/login-instructions:
  eyre: add instructions to login page

Signed-off-by: Jared Tobin <jared@tlon.io>
2019-12-12 15:46:36 +08:00
Philip Monk
3b41a8be15
gall: no-op on duplicate watch-ack
fixes #2070
2019-12-10 18:49:50 -08:00
Philip Monk
29f078bb14
ames: don't forward up the sponsorship chain
This is *actually* why the galaxies are under so much load.  They're in
a forwarding loop with their stars, and this breaks the loop.
2019-12-10 16:20:12 -08:00
Philip Monk
68279d91e4
gall: remove message type from wire
%leave over the network didn't work because we included the message type
in the wire from gall, so the duct for the initial %watch and the %leave
were different.  We need to know the message type so we can route the
acknowledgment as %poke-ack, %watch-ack, or no-op.

This moves this piece of information to a piece of state, where we queue
up the message types per [duct wire].  Ames guarantees that
acknowledgments will come in order.

This also includes an easy state adapter.  The more interesting part of
the upgrade is that we likely have outstanding subscriptions with the
old wire format.  The disadvantage of storing information in wires is
that it can't be upgraded in +load.  So, here we listen for updates on
the old wire format, and when we get them we kill the old subscription,
so that it will be recreated with the new wire format.

As an aside, this is a good example of what we mean when we say
subscriptions may be killed at any time, so apps must handle this case.

Finally, this fixes the "attributing" ship to ~zod for agent requests.
This information was ignored for agent requests, but including it causes
spurious duct mismatches.
2019-12-10 19:32:26 +08:00
Philip Monk
e7c8a44e11
ames: recover from mismatched message nums
We've seen issues where the message-num of the head of live.state is
less than current.state.  When this happens, we continually try to
resend message n-1, but we throw away any acknowledgment for n-1 because
current.state is already n.  This halts progress on that flow.

We don't know what causes us to get in this bad state, so this adds an
assert to the packet pump that we're in a good state, run every time
the packet pump is run.  When this crashes, we can turn on |ames-verb
and hopefully identify the cause.

This also adds logic to +on-wake in the packet pump to not try to resend
any messages that have already been acknowledged.  This is just to
rescue ships that currently have these stuck flows.

(Incidentally, I'd love to have a rr-style debugger for stuff like this.
Just run a command that says "replay my event log watching for this
specific condition and then stop and let me poke around".)
2019-12-09 23:31:18 -08:00
Philip Monk
abde1d8aa9
ames: reduce load by increasing timer delays 2019-12-06 12:11:06 -08:00
Philip Monk
956a3c7420
eyre: add instructions to login page 2019-12-05 12:31:42 -08:00
Ted Blackman
bee0b5803a
ames: don't crash on missing queued larval event 2019-12-05 17:04:24 +08:00
Jared Tobin
41b64feb16
Merge branch 'philip/p2p' (#2025)
* philip/p2p:
  ames: don't overwrite lane if already direct

Signed-off-by: Jared Tobin <jared@tlon.io>
2019-12-05 16:08:01 +08:00
Philip Monk
5406f06092
ames: don't overwrite lane if already direct
This is why basically all packets are going through the galaxies right
now.  Most of the time, the flow right now is:

* talking to ~dopzod but don't know where it is, so ask ~zod to forward,
  which it does

* ~dopzod responds both directly (on the origin lane) and through ~zod

* (if NAT, the direct response doesn't get back, but the one through
  ~zod does. Then you respond directly to ~dopzod because their lane
  piggybacked on the response. ~dopzod responds both directly and
  through ~zod, and the story picks up the same as if you weren't behind a
  NAT)

* now you have a direct lane to ~dopzod, so all is well.

* now the duplicate response from ~dopzod through ~zod comes in (takes a
  little longer because it's bouncing off ~zod), resetting your lane to
  "provisional"

* since your lane is provisional, you send your next packet both
  directly and through ~zod

* GOTO 2

This change says "if I already have a direct lane, don't overwrite it
with a provisional one". This way, the only way the direct lane can be
overwritten is if they stop responding on it (cleared on "not
responding; still trying").

I also added |- to +send-blob to make |ames-verb %rot less confusing.
2019-12-05 16:05:06 +08:00
Jared Tobin
75ca54ca24
Merge branch 'ames-sponsor-scry-2' (#2021)
* ames-sponsor-scry-2:
  ames: scry for sponsor and don't crash on jael response

Signed-off-by: Jared Tobin <jared@tlon.io>
2019-12-05 15:43:00 +08:00
Ted Blackman
a7e638ebab ames: scry for sponsor and don't crash on jael response 2019-12-04 17:18:39 -05:00
Ted Blackman
b3f757d88b
ames: send larval crashes to dill 2019-12-05 02:23:13 +08:00
Ted Blackman
4c9cc1542a
ames: dequeue failed larval timer 2019-12-05 02:23:13 +08:00
Ted Blackman
c20f2391e1
ames: print and retry larval crashes 2019-12-05 02:22:27 +08:00
Philip Monk
ebec1eb54f
ping: delay kick until after ames processes breach 2019-12-04 02:27:35 -08:00
Philip Monk
9bc6ccb7fc
ames: don't say not responding if we haven't been talking 2019-12-03 20:21:43 -08:00
Philip Monk
38197fc79d
gen: add comments on new generators 2019-12-03 16:41:29 -08:00
Philip Monk
98b886acf2
Merge remote-tracking branch 'jfranklin9000/master' into rc 2019-12-03 15:10:32 -08:00
Philip Monk
702dd2c07a
verb: add +verb %bowl to print bowl on every event 2019-12-03 15:05:42 -08:00
Philip Monk
8c2c52c01c
ames: make life printf helpful 2019-12-03 13:06:04 -08:00
Philip Monk
2954ed0b55
gall: correctly construct wire for ap-specific-take 2019-12-02 23:46:15 -08:00
Philip Monk
5f1c4805fe
ames: printfs 2019-12-02 23:13:48 -08:00
Philip Monk
c90107659b
Merge remote-tracking branch 'origin/rc-ames-verb' into rc 2019-12-02 20:22:04 -08:00
Philip Monk
93d3edbf73
pill 2019-12-02 20:09:36 -08:00
Philip Monk
9186f232f1
gall: kick after sending leave 2019-12-02 20:09:36 -08:00
Ted Blackman
d0d45ed8f2
ames: fix message pump to complete queueing fix 2019-12-02 20:09:35 -08:00
Ted Blackman
6dcb6622fa
ames: fix ack queueing 2019-12-02 20:09:35 -08:00
Ted Blackman
0cb6464e9d ames: %spew to set verbosity 2019-12-02 18:46:40 -05:00
Philip Monk
096273cf4a
gall: add state upgrade for %pack 2019-12-02 03:20:34 -08:00
Philip Monk
0431c3c073
Merge remote-tracking branch 'origin/jam-cue-rock' into rc 2019-12-02 02:08:37 -08:00
Philip Monk
e3005eaffa
ames: clear out-of-order messages from packet queue 2019-12-02 00:41:50 -08:00
Philip Monk
aaf7b3b42e
ames: don't crash in +on-take-wake 2019-12-01 16:00:32 -08:00
Ted Blackman
0a8b12c882 ames: state adapter 2019-12-01 02:49:46 -05:00
Ted Blackman
900d923ccc ames: fix aggressive lane timeout (still needs migration) 2019-12-01 02:49:46 -05:00
John Franklin
c36edc838e jael: add %lyfe and %ryft scrys (unitized %life and %rift) and use in +trouble 2019-11-30 19:04:18 -06:00
Philip Monk
d6c1ff4e20
ames: add routing diagnostics 2019-11-30 14:44:57 -08:00
Ted Blackman
071b1a4bbe ames: ~s30 max timeout instead of ~m2 2019-11-28 01:17:34 -05:00
Ted Blackman
93604c2f29 Merge branch 'rc' of github.com:urbit/urbit into rc 2019-11-27 23:06:53 -05:00
Ted Blackman
3779cca5a9 ames: try sponsors above .our 2019-11-27 23:06:39 -05:00
Philip Monk
23cc21c383
ames: remove printf 2019-11-27 19:39:12 -08:00
Ted Blackman
e9ba500ee4 Merge branch 'rc' of github.com:urbit/urbit into rc 2019-11-27 22:31:18 -05:00
Philip Monk
26c5be2948
ames: remove printf 2019-11-27 18:40:33 -08:00