Before this, the %watch to eth-watcher was happening before the %poke,
and so eth-watcher was responding with its entire history immediately.
This was bad both because processing that many logs takes a lot of
memory and because those logs are stale.
Now, the %poke happens first, which clears the history.
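Concretely, the fix comes down to the order of the cards the agent
emits. A minimal sketch, assuming a standard Gall agent; the wire,
mark, and poke data below are illustrative rather than eth-watcher's
actual interface:

```hoon
::  emit the clearing %poke ahead of the %watch, since cards are
::  handled in order; this way the initial subscription response no
::  longer replays the full backlog
:_  this
:~  [%pass /clear %agent [our.bowl %eth-watcher] %poke %eth-watcher-poke !>([%clear /logs])]
    [%pass /logs %agent [our.bowl %eth-watcher] %watch /logs]
==
```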
%kick is supposed to start back from the snapshot and move forward.
Without this, we would only fetch logs that we hadn't already fetched,
so if you were up to date when you kicked, you would miss anything
that happened between the time the snapshot was taken and the present,
though you would still see anything that happens from now on.
Also reverted lull change to make this a safer upgrade.
Previously, the initial Azimuth snapshot was stored in Clay and shipped
in the pill. This caused several problems:
- It bloated the pill.
- Updating the snapshot added large blobs to Clay's state. Even now
  that tombstoning is possible, you don't want to have to do that
  regularly.
- As a result, the snapshot was never updated.
- Even if you did tombstone those files, the snapshot could only be
  updated as often as the pill.
- Those updates would be sent over the network to people who didn't
  need them.
This moves the snapshot out of the pill and refactors Azimuth's
initialization process. On boot, when app/azimuth starts up, it first
downloads a snapshot from bootstrap.urbit.org and uses that to
initialize its state. As before, updates after this initial snapshot
come from an Ethereum node directly and are verified locally.
Relevant commands are:
- `-azimuth-snap-state %filename` creates a snapshot file
- `-azimuth-load "url"` downloads and inits from a snapshot, with url
defaulting to https://bootstrap.urbit.org/mainnet.azimuth-snapshot
- `:azimuth &azimuth-poke-data %load snap-state` takes a snap-state
  directly, however you obtained it
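For example, a minimal dojo session using the commands above; the
`%mainnet` filename is just a placeholder, and passing the URL
explicitly is equivalent to relying on the default:

```
> -azimuth-snap-state %mainnet
> -azimuth-load "https://bootstrap.urbit.org/mainnet.azimuth-snapshot"
```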
Note the snapshot is downloaded from the same place as the pill, so this
doesn't introduce additional trust beyond what was already required.
When remote scry is released, we should consider allowing the snapshot
to be downloaded that way.
The previous value, which was used for testing, didn't account for
block reorgs. If we zoom to the latest block and it has no
transactions, but it is later replaced by a one-block reorg that does
have a transaction, we will miss that transaction, leaving our Azimuth
state incomplete.
To fix it, we rewind the Azimuth state to the contents of the snapshot,
and then start retrieving logs from the latest one we have.
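Schematically, the resubscription path now looks something like the
sketch below; the arm, helper, and field names are made up for
illustration and don't match the real state shape:

```hoon
::  on losing the eth-watcher subscription, roll the azimuth state back
::  to exactly what the snapshot contains, then ask for logs starting
::  at the snapshot's block rather than at the newest log we had heard,
::  so a log introduced by a short reorg near the tip is not skipped
++  handle-kick
  ^-  (quip card _this)
  =.  state  (state-from-snap snap)
  :_  this
  [(watch-from-block block-number.snap) ~]
```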
Also kick the call to +mule out of the loop. By uncommenting the
diagnostics in u3m_fall, I measured that running through the 290k
events in the azimuth snapshot required this much memory:
- Head recursive, +mule in:  1.1 GB
- Head recursive, +mule out: 780 MB
- Tail recursive, +mule in:  700 MB
- Tail recursive, +mule out:  70 MB
So this commit chooses the last one. The most delicate part is making
sure the effects are in the right order; this uses the usual idiom.
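A minimal sketch of that shape, with illustrative names rather than
lib/naive's actual interface: a tail-recursive trap that accumulates
effects in reverse and flops them once at the end, with nothing inside
the loop wrapped in +mule:

```hoon
::  run every log through the state transition, keeping the effects in
::  reverse order until the very end
|=  [logs=(list event-log) state=naive-state]
^-  [(list effect) naive-state]
=|  effects=(list effect)
|-
?~  logs
  [(flop effects) state]
=^  new  state  (run-log i.logs state)
$(logs t.logs, effects (weld (flop new) effects))
```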
Kicking +mule out of the loop is okay because lib/naive should never
fail, and if it does, then azimuth shouldn't advance until an
out-of-band solution is decided on.
Addresses #5431
Jael needs to be reconfigured to listen to the new agent for azimuth
events, and the old app needs to be shut down. We do this in
/app/azimuth's +on-init.
Additionally, we make sure that jael doesn't crash when it (as expected)
loses its subscription to the old agent.
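As a rough sketch of that +on-init, assuming jael's %listen task is
the reconfiguration mechanism; the wire is illustrative and shutting
down the old agent is elided:

```hoon
::  point jael's azimuth source at this agent, for all ships
++  on-init
  ^-  (quip card _this)
  :_  this
  [%pass /jael %arvo %j %listen ~ [%| dap.bowl]]~
```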