This removes the logic from cleaning up stale subscriptions in %gall,
leaving +ap-rake as it was, and moves it to the +on-kroc arm in %ames.
Failed subscriptions from nacking a %watch plea that were
not properly corked (fixed in https://github.com/urbit/urbit/pull/6102)
are a subset of the more general "stale re-subscription" issue, so
we take care of all stale flows at the same time, by focusing on the
current subscription—leaving all others to be corked automatically—and
checking if it received a nack, to subsequently cork it.
This modifies the %rake task in %gall, to select what kind of
subscriptions we try to close:
=mode %o: kill old pre-nonce subscriptions
=mode %z: kill old pre-nonce subscriptions, including sub-nonce = 0
=mode %r: kills all stale resubscription flows
It also adds a dry-run option to both tasks (%kroc in ames, %rake in gall)
Address PR6136 comments to improve the interface to this scry.
Now it looks like .^((set ship) %cs /=landscape=/subs)
instead of .^((set ship) %cs %/subs/landscape)
Implements a /cx/[our]//[now]/cult/[desk] endpoint, for getting a set of
pending requests for any given desk. We don't give the $cult for the
desk as-is, but instead slim the $roves back down into $raves, remove
clay protocol version metadata, and make sure to put our @p in place of
empty "for" fields.
This flow is not supported, and it was causing issues releasing
416. This change just drops the responses to avoid crashing, but at
some point we should either support this flow or reject the request in
the first place.
As of version %5, dill uses a new wire format for its userspace
subscriptions. Its existing subscriptions (read: the one subscription
into %hood for the default session) was never updated to use this new
style.
We observed a bug on one ship, where it had both old-style and new-style
subscriptions into hood, resulting in output being rendered twice. How
exactly this happened remains as of yet unclear.
Here, we forcefully clean up the old-style subscription, and
(re)establish the equivalent new-style subscription. This will prevent
issues like this from reoccurring.
Adds .snub to ames-state, a global blocklist for ships. If a packet is
received from a ship that is in the .snub set, it is immediately
dropped. Adds %snub to ames' $task, to allow manipulating this list
Previously, fake breaches triggered by a %ruin task would only get sent to
subscribers watching for the affected ship specifically. Now, we send them to
both those subscribers, and the ones watching for pubkey changes on all ships.
Most of the memory stays in gall anyway, and this means you need to
recompile everything the next time anything changes, which could be
counterproductive. It's important that %trim not make things worse.
The functionality is moved to the debug %stir task.
These were originally added because they reduced memory usage, primarily
by clearing the memoization cache. Now that the memoization cache is
no longer used, we use less memory without them. On ~wicdev-wisryt with
~30 apps, updating Clay now takes ~320MB.
- fix `fragment-num` and `num-fragments` having duplicate faces
- fix faces being wrapped around wrong things in various places
- fix `bone` not being printed in "hear last in-progess" message
- make pretty tape interpolation style more uniform
The +on-cork handler asserts that the peer is known to us. This is the
incorrect behaviour, because it will crash when corking a flow to a peer
that is still an %alien. This can happen, for instance, when making a
gall subscription for the first time and then corking it before the
alien naturalises.
if a cert is configured and a secure port is live it will set the
redirect flag in http-config.state.
When it gets a ++request it will return a 301 redirect to
https://[host]/[path] if:
1. not already secure
2. redirect flag set
3. secure port live
4. is not requesting /.well-known/acme-challenge/...
5. the host is in domains.state
It will not happen if forwarded-secured, localhost, local loopback, ip
addresses or domains not in domains.state.
in ++load it checks the secure port is live and a cert is set and
enables it if so (for people who already use in-urbit letencrypt)
%rule %cert tasks also toggle it (only turning it on if secure port
live)
%live tasks also toggle it (only turning it on if cert set)
Have tested with a couple of ships and seems to work fine.
This is useful in combination with pyry's auto arvo.network dns config
system - can finally get rid of reverse proxies entirely.
Eyre always gets passed request headers in lowercase, so we should search for
the lowercased version of the header.
Arguably `+get-header` should lowercase keys before comparing them, but that's
a more serious behavioral change.
This allows you to pass a thread directly into khan, instead of passing
a filename. This has several implications:
- The friction for using threads from an app is significantly lower.
Consider:
=/ shed
=/ m (strand ,vase)
;< ~ bind:m (poke:strandio [our %hood] %helm-hi !>('hi'))
;< ~ bind:m (poke:strandio [our %hood] %helm-hi !>('there'))
(pure:m !>('product'))
[%pass /wire %arvo %k %lard %base shed]
- These threads close over their subject, so you don't need to parse
arguments out from a vase -- you can just refer to them. The produced
value must still be a vase.
++ hi-ship
|= [=ship msg1=@t msg2=@t]
=/ shed
=/ m (strand ,vase)
;< ~ bind:m (poke:strandio [ship %hood] %helm-hi !>(msg1))
;< ~ bind:m (poke:strandio [ship %hood] %helm-hi !>(msg2))
(pure:m !>('product'))
[%pass /wire %arvo %k %lard %base shed]
- Inline threads can be added to the dojo, though this PR does not add
any sugar for this.
=strandio -build-file %/lib/strandio/hoon
=sh |= message=@t
=/ m (strand:rand ,vase)
;< ~ bind:m (poke:strandio [our %hood] %helm-hi !>('hi'))
;< ~ bind:m (poke:strandio [our %hood] %helm-hi !>(message))
(pure:m !>('product'))
|pass [%k %lard %base (sh 'the message')]
Implementation notes:
- Review the commits separately: the first is small and implements the
real feature. The second moves the strand types into lull so khan can
refer to them.
- In lull, I wanted to put +rand inside +khan, but this fails to that
issue that puts the compiler in a loop. +rand depends on +gall, which
depends on +sign-arvo, which depends on +khan. If +rand is in +khan,
this spins the compiler. The usual solution is to either move
everything into the same battery (very ugly here) or break the
recursion (which we do here).
Whenever a session gets created or removed, send the set of valid auth
tokens to the runtime, so that it may use them in determining whether
incoming requests are authenticated or not.
Previously, when the larva got to processing enqueued events, it was
doing so without loading state into the adult beforehand, resulting in
incorrect processing of events.
Here, we make the larva call +molt more eagerly, ensuring that the adult
always has its state available when we use it.
Yes, there is a global timer for closing flows, but all that does is
enqueue a cork message. +on-stir needs to set _pump_ timers for all
flows that might still have messages to send, which includes closing
flows.
When ames notifies us that our subscription has been kicked, we enqueue
a cork to clean up the flow. Unlike the %leave case, however, we were
not registering the cork in the queue of outstanding comms. We would
eventually get an ack, but not know what for, and erroneously inject
%poke-acks and %watch-acks.
Here we simply add a %cork entry to the queue before sending it.
This is sufficient to bring the normal (non-prerelease-bugged) cases
into the new world.
For the prerelease ships that ran a buggier version of the new gall
subscription logic, we note that the conditional may trigger for the
nonce=1 case where it had already triggered for their
(shouldn't-be-possible) nonce=0 case. This results in a %leave on a wire
that wasn't in use. This no-ops on the publisher side though, and the
flow gets corked right away, so this is considered harmless.
In response to clog notification from remote ames, we were sending a
%cork to clean up the flow. However, the wire we were using had the /sys
prefix already stripped off. Here, we put it back in.
Start by killing subscription nonce 0, then work our way up instead of
down. We enhance the printf with a "total nonces" indicator so we can
still easily see the progress being made.
Previous +ap-doff kicked the agent repeatedly. We needed to kick
it only once. Now publisher agents clear their incoming subscription
state without the subscriber making lots of new subscriptions because
of repeated kicking.
+on-plea gets called in two very different ways:
1) handling request from local vane to send %plea to peer
2) handling %cork request from another ship, which our local ames has %pass'ed
to ourselves
In the second case, we shouldn't print misleadingly, or bind a duct in the ossuary.
+ap-nuke was not including the nonce, but should.
+ap-handle-peers was potentially including a zero nonce.
(The latter shouldn't have been possible, but there's a bug in +load
where sub-nonce.yoke gets initialized as 0 instead of 1.)
Gall tells ames to %cork flows for subscriptions it has closed.
Receiving a kick also closes a subscription, but gall wasn't issuing a
%cork in that case. We correct that here.
Inlines +mo-handle-ames-response's logic at its only callsite.
seems that this structure has been unused since
e75ab631a4 and confuses
newbies trying to figure out exactly what the commit
structure is (which is how I came across this)
Without this, a ship would send a cork on a max of one flow per
recork timer, which could take years to clear for some ships.
This starts a hot loop of trying the next cork once one gets
positively acked.
The previous recork timer queued up %cork messages without sending them.
It also relied on making sure pump timers didn't get set for recork bones.
This was fragile.
The new design enqueues up to one new %cork message per ship during each
recork timer, based on the state of the flow. If the flow is closing but
there are no outstanding messages in it, then it needs to be recorked.
Flows will be recorked in ascending numerical order by bone.
The condition got butchered during refactor: instead of avoiding the creation
of pump timers during recork wake, it was setting them _exclusively_ during
recork wake.
Problem:
by-channel has its own copy of server-state from line 2182. discard-channel returns an altered state, with one channel removed from the state of by-channel.
but the state of by-channel isn't changing with each iteration, so |trim is only removing one channel per invocation.
Solution:
update by-channel on each iteration.
This is a temporary fix, and first part of the gall-request-queue-fix
release in two stages. This gives a publisher ship the ability to
understand a %cork and handle it properly, but no subscriber will
be sending %corks at this stage when leaving a subscription.
We still add a nonce to all subscription wires but it doesn't
increment it when resubscribing, allowing flows to be reused.
Tested locally with toy pub/sub agents and Group join/leaving
Previously, the initial Azimuth snapshot was stored in Clay and shipped
in the pill. This causes several problems:
- It bloats the pill
- Updating the snapshot added large blobs to Clay's state. Even now
that tombstoning is possible, you don't want to have to do that
regularly.
- As a result, the snapshot was never updated.
- Even if you did tombstone those files, it could only be updated as
often as the pill
- And those updates would be sent over the network to people who didn't
need them
This moves the snapshot out of the pill and refactors Azimuth's
initialization process. On boot, when app/azimuth starts up, it first
downloads a snapshot from bootstrap.urbit.org and uses that to
initialize its state. As before, updates after this initial snapshot
come from an Ethereum node directly and are verified locally.
Relevant commands are:
- `-azimuth-snap-state %filename` creates a snapshot file
- `-azimuth-load "url"` downloads and inits from a snapshot, with url
defaulting to https://bootstrap.urbit.org/mainnet.azimuth-snapshot
- `:azimuth &azimuth-poke-data %load snap-state` takes a snap-state any
way you have it
Note the snapshot is downloaded from the same place as the pill, so this
doesn't introduce additional trust beyond what was already required.
When remote scry is released, we should consider allowing downloading
the snapshot in that way.
Includes patched versions of ames' and clay's +load arms.
In clay, we do a dumb ;; hack to get the state to adapt properly. This
shouldn't be needed ($case had an extra... case added to it, old ones
should still nest), and so we should revisit the logic there to make it
cleaner/better before release.
Because the publisher will send the cork plea back to the subscriber on
the next bone, we are not able to know the bone for the original cork.
To handle it, we add the cork bone to the plea path
still wip: it keeps resending the cork plea faster than its ~h1 timer
Removes the !: at the top of gall, so that it no longer gets included in traces about agent builds or crashes.
We also refine intentional crashes with ~_s, so that we still see a crash reason even if we don't get a full trace.
Lastly, flops the trace for +on-load crashes, which were getting printed bottom-first.
Regularized arm names to +etch-* and +sift-* to match Vere. Renamed
$packet to $shot. Used $meow, $purr, and $keen to match Vere's naming.
Reorganized packet decoding arms and moved some to Lull for later use in
Aqua.
Stale lanes may cause forwarding loops. Imagine the following:
1) Planet A is live. Galaxy B, its indirect sponsor, learns of its route.
2) A goes offline. Another ship, C, is started in its place, at the same route.
3) B receives a packet for A, forwards it to the known route.
4) C received the packet, forwards it to B.
5) Repeat from 3.
Here, we update the forward lane(s) scry used by the runtime to not produce a
peer's lane if they haven't communicated with us in the last hour. Everyone's
supposed to ping their sponsorship chain every 30 seconds. If those aren't
going through, you shouldn't expect to be reachable anyway.
We may or may not want to update +send-blob to match.
Previously, if the pointer for a syntax error pointed to the end of the file
(and the file ended in a newline) the code snippet rendering would try to
display a line _beyond_ the end of the file, causing a crash.
Here, we detect that case, and display `<<end of file>>` instead.
Instead of having a separate "busy" flag and pending scry request state,
we now have a single "busy" unit that, if there is a pending request,
contains details about the pending request. In the ames case, this is
simply %ames. But in the scry case, it contains all the details we need
to cancel the request, timer, or both when needed.
Additionally, we now make sure to always cancel the scry timeout timer
whenever we get a scry response.
Previously, when the scry timeout fired and we retried the request with
ames, we would always start over from the very first request (ie,
fetching the ""manifest"" containing the files we need to download).
This behavior is not correct in the case where we had already received
that initial response, and were now fetching individual files. So here,
if that's the case, we simply call into +work after marking this peer as
having known-broken scry, which will resume work as appropriate.
(The actual bug here that broke the fallback behavior entirely was the
busy flag not getting unset. We now clean it up properly.)
We also move setting of the sad timer into +retry-with-ames, instead of
doing it at each individual callsite. (In fact, one of the callsites was
missing this behavior.)
This accounts for a possible race condition where ames expects a
response, but regresses into the larval state. Upon receiving the
$sign on +take, we would remain stuck as a larva. Now we check
that we have enough information to re-evolve and then start a
/larval timer to begin draining the queue.
When +ke-abet-gone gets called, we're going to remove this keen
entirely. +ke-set-wake does a whole song-and-dance and may even set a
new timer. So instead, we simply call +ke-rest, if we have a known
outstanding timer.
Previously we stored the nonce in $boat, which changed the $bowl of each
agent. This compiles and all agents reload, but more testing is needed.
It also renames inbound/outbound watches to $bitt/$boat.
Similar to how %x on the empty desk was already treated specially. We
continue supporting %d on non-empty desks, but do add a print marking it
as deprecated.
The previous commit already started treating the 0v0 commit as
equivalent to the 0 aeon (that is, never-real root), so we should give
the 0v0 commit for all 0 aeons.
Instead of scrying for revisions of files at paths in desks, we now scry
for data corresponding to a given lobe. This removes concerns of aeons,
paths, and other such frivolities, and lets us ask for the specific data
we need regardless of where it may live.
How we're supposed to answer permissioning questions around data this
way remains to be seen.