In order to curb event queue growth when a client for whatever reason
isn't acking the events we send out, we implement a mechanism for
detecting such "clogging", and proactively kick subscriptions which are
adding too many events to the queue.
If the client hasn't sent an ack for ~s30, any subscription that accrues
more than 50 unacked %facts gets closed to prevent further buildup.
Upon reconnecting, the client will see %kick for the relevant
subscriptions and can open a new subscription as appropriate.
Includes a simple test for this behavior, and updates /app/dbug to be
able to display the newly tracked statistics.
By doing a %watch instead of %watch-as %json for channel subscriptions,
we can hopefully make better use of noun deduplication, when storing
events in a channel's event queue until they get acked.
Store the gall events from channel subscriptions as (vaseless) signs,
instead of serialized events. This should be smaller in memory, and
makes it more likely for noun deduplication to happen.
The cost is needing to reserialize upon channel reconnect, but this is
the less common case, and we don't expect it to be particularly slow.
In certain cases +find-merge-points was very slow. Specifically, the
`done` set was meant to avoid checking the same commit repeatedly, but
it didn't catch the case where a commit was added to the worklist that
was already in that worklist.
Secondly, the worklist was stored as a list but used as a queue, which
resulted in a lot of unnecessary welding. We change it to a qeu.
Fixes#3735
* release/next-vere: (1707 commits)
king: fix zig-zag in stderr logging
u3: refactors +murn/+turn, removing unused variable
u3: rewrites +skim jet with u3i_defcons()
u3: rewrites +skip jet with u3i_defcons()
u3: rewrites +skid jet with u3i_defcons()
vere: updates ames to only print network send failures once
u3: cleans up testing protocol, enables gc in mug tests
u3: refactors and enables gc in jam tests
u3: cleans up testing protocol, enables gc in hashtable tests
u3: enables gc in ames and newt tests
u3: initializes head/tail in u3i_defcons() (under U3_MEMORY_DEBUG)
king: actually try shutting down the piers
king: --serf="" is a host option, not a per ship option.
u3: optimizes +wyt:in jet, gated by compile-time assertion
u3: further optimizes +lent jet, gated by compile-time assertion
u3: refactors allocator constants, adds u3a_cells and u3a_maximum
u3: optimizes +lent jet, avoiding u3i_vint() while possible
u3: moves cell allocation counter into u3a_celloc()
u3: fixes memory leak introduced in +murn jet
u3: fixes mismatches in +div and +dvr jets
...
Instead of always providing a wildcard for the allowed methods and
headers, now echoes back the method and headers that the client asked
for, if any.
Fixes#3676.
Disallows registering bindings (through %connect and %serve) that would capture
traffic on paths starting with /~ (Eyre's) or /~_~ (runtime's, as of cc389c5).
Note that we don't touch +insert-binding, which is used by Eyre internally to
set up bindings in its own namespace.
Lets you check whether a specific Cookie header value string constitutes an
authenticated request.
/ex/=//=/authenticated/cookie/(scot %t 'cookie-string')
Intended for use in the runtime, for example with #3557.
Adds a cors-registry to Eyre's state that tracks allowed and rejected
origins for the purposes of CORS request handling.
For preflight requests, generates a response in-line.
For simple requests, adds CORS headers onto whatever response is given.
See also:
https://groups.google.com/a/urbit.org/g/dev/c/bb82dwEJGzM/m/q2JjNSx5BwAJ
This was originally introduced by me in #1814 to address #1811. Eyre was not
canceling heartbeat timers on all relevant events making it easy to end
up with an infinite behn loop. This check allowed ships that entered an infinite
loop to recover, as per my comment at
https://github.com/urbit/urbit/pull/1814#discussion_r333477482. Otherwise it's
not necessary.
@t further indicates to the caller (although loosely, because auras
are loosely enforced types) that the input should not contain embedded
NUL bytes, which are not valid @t.
Separately, a minor improvement has been made to the performance of
the raw hoon by pinning the gates used in the inner loop.
Prior to this commit, there was a jet mismatch in to-wain (formerly
called lore, and still jetted under that name). 0 bytes in the middle of
a cord caused the jet to crash, whereas the hoon simply treated them as
the end of cord and truncated the output. The history of this behavior
is fraught with controversy. This commit rectifies the current mess with
the following rationale: Null bytes are valid ASCII/UTF-8, and \n\n in
the input will cause null list items in the output, so nulls are (for
the purposes of to-wain) allowed in cords. Trailing nulls cannot be
represented because of the nature of atoms, but that is outside the
scope of to-wain's concern. Therefore to-wain should simply measure the
cord and split on newlines, and do nothing fancy at all with nulls.
In addition, the hoon for to-wain was written in an inefficient style
that produced a lot of intermediate garbage atoms via rsh and cat. This
commit's implementation measures once and cuts once, so to speak, and so
avoids the intermediate garbage. Quick benchmarks suggest it is about
20x faster than the old hoon, but still orders of magnitude slower than
the jetted code. to-wain is the workhorse for the txt mark, so we should
still prefer to have a jet.
The old jet is left wired up under %lore, and should be removed when
support for the old, unupgraded zuse is no longer necessary. A new jet
with matching null handling has been wired up under the name %leer.
Well-behaved secp jets can now plausibly exit if they are given inputs
of the wrong size/range. Previously this was either not checked or the inputs
were silently truncated.
The secp core had some flaws: in particular, the logic for signing/recovery
did not match libsecbp256k1 w.r.t. the enigmatic "recid" (v) value. The jet
hints were also subtly wrong, in that the curve parameters were in a sample
(not an arm) and thus not matched by the jet matching scheme. Consequently,
the jets would be used (but incorrect) for other curve parameters.
Tests were also added to exercise the recovery id cases thoroughly.
Depending on the additions to term.c made in 467d8d239 allows dill to
forget about ansi escape codes, and pass styled text nouns straight on
to vere.
Also removes a bit of logic from drum, which assumed things about the
rendering of escape codes to adjust cursor positioning. Now it simply
states the semantic cursor position, letting the runtime deal with the
potential influence of styling.
If both sides changed a file in the same way, %mate used the version in
the mergebase, which is incorrect. This changes it to use the version
in the destination desk.
An example of this issue:
> +cat %/test/hoon
/~zod/home/~2020.9.3..21.41.24..61ed/test/hoon
first
> |merge %scratch our %home
>=
merged with strategy %fine
+ /~zod/scratch/2/test/hoon
> +cat /=scratch=/test
/~zod/scratch/~2020.9.3..21.41.32..408c/test/hoon
first
> *%/test/hoon 'second'
: /~zod/home/3/test/hoon
> *%%%/scratch=/test/hoon 'second'
: /~zod/scratch/3/test/hoon
> |merge %scratch our %home
>=
%fine merge failed, trying %meet
%meet merge failed, trying %mate
merged with strategy %mate
: /~zod/scratch/4/test/hoon
> +cat /=scratch=/test
/~zod/scratch/~2020.9.3..21.42.25..9e8b/test/hoon
first
Ordinarily, eyre cleans up the relevant gall subscriptions whenever a
channel disappears. In yet unresolved erroneous behavior though, it may
leave a gall subscription open, despite wiping the channel from state.
Attempting to pass the response onto the deleted channel anyway results
in an %eyre-no-channel error later in the event. The volume of these
errors can degrade the user experience, as per #3196.
To resolve the annoyance (but not the underlying issue) we detect the
"subscription has no channel" case, and issue a %leave. Doing so
requires additional information in the wire, so we add that in,
refactoring the relevant wire building along the way.
Note that due to the wire requirements, this cannot resolve existing
cases. For that, we depend on bc929ba6d.
As part of the solution to #3196, we need to clean up any gall
subscriptions that eyre didn't properly clean up.
Since detecting that is hard, we opt to just wipe _all_ eyre-originating
subscriptions from gall. We inspect the duct, which isn't good, but it's
only just this once.
The main thing here is that we aggressively check whether we're in
ancestry of another mergebase candidate. This means we don't have to do
a 2nd pass to eliminate redundant candidates.
Change the definition of base-hash to be the mergebase of %home with the
OTA source. This means it's the most recent successfully-applied
update, which is usually the most important information.
Add sour-hash, which is the hash of the most recently *downloaded*
update, regardless of whether it applied successfuly (ie the old
base-hash).
Add a summary of the various hashes at the top of gen/trouble.
Only no-op if the incoming commit's parent is the old head of the desk.
Also move the printing near the end so we can know exactly if anything
changed.
Jael now stores a `step` that is combined with the original salt to
produce a new code. A `%step` card is used to increment that value,
and effectively resetting the keys. Because the first `step` is zero,
the first code is the same as before.
Eyre was changed to be notified with `%code-changed` so it can forget
old cookies, sessions and discard all the existing channels.
A new generator was added |code, that does both querying and
resetting the code
|code :: shows current code, step and help
|code %reset :: changes the code
The old +code generator still works correctly.
* master: (915 commits)
vere: bumps urbit version to v0.10.8
pill: updates all
king: fix ames tests
contact-store: restore /~/default contacts
contact-hook: resubscribe on correct paths
u3: note that u3a_rewrit* doesn't yet support south roads
king: it was too clever of me to use stateTVar; compiler can't help
king: fix comment about ames q behavior
king: ames bounded q, now with logging and fifo
serf: tweaks |pack and |mass printfs
u3: moves u3a_compact to u3m_pack, refactors internals
metadata: handle OTA correctly
u3: refactors u3m_reclaim() into noun modules, works on any road
release: urbit-os-v1.0.30
group-store: remove scries from OTA logic
release: urbit-os-v1.0.30
MAINTAINERS: amend for post-fusion
ames: add scry endpoint for forward lanes
ames: improve scry interface
chat, publish, contacts: fix OTA bugs
...
We used to not accept new indirect lanes if we already have a direct
lane. This means that if Bob, with a publicly-accessible lane, changes
lanes (eg by restarting the process and getting a new port or changing
ip addresses), tries to talk to Alice, who is behind a NAT, then Bob
will try directly but fail (because Alice is behind a NAT), so he will
route the message through her galaxy. This is good -- the message gets
to Alice. However, Alice had a direct route to Bob's old lane, so she
will try to ack on that lane, which fails. She will not time out this
lane because she doesn't know that Bob isn't getting the acks (acks
don't have their own acks).
The solution is that if Alice receives an indirect lane for Bob when she
already has a direct lane, she shouldn't ignore it. If the lane is the
same as what she has, she shouldn't change anything (in particular, she
shouldn't mark it as indirect). But if it's a new lane, she should
discard her old direct lane and use the new indirect lane.
RFC2396 defines[1] unreserved characters as alphanumerics and nine "mark"
characters. We were only parsing for four of those, leading to parsing failure
for valid URLs.
[1]: https://tools.ietf.org/html/rfc2396#appendix-A
In Ford Fusion, Clay builds generators but Dojo and Eyre run them. Dojo
is already virtualized with a scry function, so +mule is fine, but Eyre
is not, so Eyre needs to use +mock and explicitly supply the scry
function. This does that. Fortunately, the produced result is simple
and easily clammable.
Fixes#3089
No longer abuse the desk field, instead making use of the path. Reject
any scries outside of the local ship, empty desk and current time as
invalid.
Expose ducts only under a debug endpoint, nothing else should care about
being able to inspect them.
Add scry endpoints for the very next timer (if any), and all timers up
to and including a specified timestamp.
When merging, +reachable-takos is called roughly once per merge commit
in the ancestry of the new commit. +reachable-takos was exponential in
the number of merge commits in the ancestry of the commit it's looking
at, due to mishandling of the accumulator. This makes it linear.
Of course, linear x linear is still quadratic, which is not great. I
doubt +reachable-takos can be made asymptotically better, but
+reduce-merge-points/+find-merge-points probably can. 50 merge commits
already gives about 14.000 iterations through the loop in
+reachable-takos. Another option is to try to memoize this somehow, but
a simple ~+ is insufficient since `s` is usually different.
In local tests on macOS with a -L copy of ~wicdev-wisryt, this speeds up
OTAs significantly. The majority of time was spent on this.
* origin/jb/aes-siv-fix:
tests: updates aes-siv regression test comment
pill: updates solid
zuse: propagates fix to aes-128-siv and aes-192-siv as well
Revert "test: disable aes-siv jets to demonstrate test failure"
pill: updates solid
zuse: fixes bug in aes-256-siv iv calculation (+s2vc:aes:crypto)
test: disable aes-siv jets to demonstrate test failure
test: add test case for aes-256-siv jet mismatch, observed in the wild
Signed-off-by: Philip Monk <phil@pcmonk.me>
Adds +mure to run a trap in a separate road. This should eventually be
just a hint.
Vega was running inside a mule, but since +load was called within vega,
the new kernel was all run within the same mule, so it didn't actually
get to reclaim the space after hoon compiled.
We verified this with printfs in u3m_fall. On the test ship (from
mainnet) which had 800MB used, vega was taking interior free space from
950MB to 450 over the course of compiling hoon, then each vane would go
from about 450 to 350 and then back to 450 once it finished (which
proves they were correctly isolated). With this change, after hoon
compiles the free space goes back up to 950MB. This gives us a lot more
space to compile OTAs.
We had to slightly refactor the logic for doubly-recompiling hoon, since
+mure as written produces a ?(!! _trap), and you can't find faces in the
result of the trap. We could bake mure, but that's rather awkward. I
wonder if there's a way to fix this as a wet gate.
Attempt to convert the scry result to the mark that was asked for,
failing the scry (with ~) if the conversion fails.
Eyre's scry logic, then, can pass the requested mark directly into gall.
Exposes a scry endpoint. Any requests made to the /app/scry.mark url
under the endpoint will scry into %app using a %gx scry, at the
/scry/noun path, and attempt to convert the scry result into the %mark,
before converting that into the %mime mark, and sending that as an http
response.
In addition to producing the action bound for a given request, now also
produces the subset of the request url that comes _after_ the path at
which the binding has been established.
Will allow some bindings to more easily dispatch off the relevant part
of the url.
If we failed the password check, the login page served to us would never
include any redirect details, even if they were there in the original request.
Now we simply (attempt to) parse out the redirect field a little earlier.
Associates channels with the authentication sessions that opened them,
and deletes the channel when its associated session expires.
Also updates the debug dashboard to display channel counts per session.
Turns +on-channel-timeout into +discard-channel, which cleans up the
entirety of the channel, based on its current state. This allows us to
simplify the %delete channel request into a simple function call.
Changes the HTTP status code of the redirect that occurs upon a
successful login from 307 to 303. 307 preserves the method of the
original request, so the redirected request is a POST. With the new SPA,
this causes a 404 as app/file-server validates the method of the
request, something that did not happen in earlier versions of landscape.
303 instead changes the method to always produce a GET request.
Set up, by default, on /~/logout.
Sending a POST request to this expires the current session and redirects
to the login page. If the "all" key is set in the request body, expires
all open sessions.
We build a reef for each desk but use the compiler from our kernel. At
some point we should use the compiler from the desk, but then we need to
validate any results we get from it.
For request transparency, HTTP proxies may set the Forwarded header to
specify who the original requester is.
For requests from localhost only, we make Eyre respect the Forwarded
header, and adjust the handled ip address accordingly.
Note that we do not support X-Forwarded or other non-standard variants.
The header remains in the request, so server applications can handle
them as desired.
Fixes#2723.
When sending a response to an authenticated request, update the session
to last for +session-timeout again, and send an updated cookie to match.
Assuming the user makes an actual HTTP request at least once a week,
this will make sure they don't get logged out automatically. Simply
keeping a channel open, unfortunately, doesn't count.
Instead of setting a timer for every session, we set a single expiry
timer when the first session is created. On the subsequent wake event,
we clear all cookies that have expired at that time, then set a timer
for when the next session expires.
This approach gives us flexibility wrt sessions going forward, allowing
extending or early deleting of sessions without having to care about the
related timers.
Note that in +load, we clear all existing sessions. We would start the
expiry timer flow there, but can't. Forcing the user to login again
post-ota once isn't the end of the world.
We inspect the wire of our subscriber to see if we need to produce the
result as a %public-keys or a %boon. This is bad -- we should proxy the
subscription to avoid this need, but this doesn't make that change yet.
%pubs is an old name that doesn't exist anymore (last existed around
September 2019). The new version is /public-keys, but it's worked so
far because /public-keys has only one item in the path, so it missed the
conditional. This commit makes the intent more clear.
The [%a @ @ *] could be just [%a @ *], but I leave it to reduce the
chance of breaking stuff.
Somehow we ended up with flows which expected to awaken but did not wake
up. This was likely caused by the error in r920j OTA, urbit-os-v1.0.18.
This adds a command which ensures that every flow has an active timer.
I expect this to be needed only once, but it's a pretty general tool, so
it's worth keeping.
I've included an unused @t parameter to more easily add simple debug
commands to ames without having to add a new task
The subscription changes in drum broke existing subscriptions. This
worked alright (though loud) for dojo, but it left chat-cli "frozen"
unless you manually unlinked/relinked. This does that automatically.
It also includes a refactoring of +on-load in drum, to avoid vain
repetition.
We need to get updates directly into %home in case the marks depend on
changes to hoon.hoon. %base has no reason to exist.
Our ota strategy is now to merge from parent/kids to home, then
parent/kids to kids.
* origin/release/link-dojo:
chat-cli: allow sending • character
chat-cli: always talk to local ship only
chat-cli: single-target sole effects as needed
chat-cli: don't allow excessively small cli widths
chat-cli: pull in sole-sur namespace where relevant
chat-cli: remove unused entropy from state
chat-cli: print newlines correctly
chat-cli: support multiple sole connections
chat-cli: don't crash on %bad-text
dojo: rename remote access generators
gall: fix handling of empty path list
dojo: remove unused %json poke
dojo: add remote access controls
drum: switch to per-ship /sole/drum duct
Signed-off-by: Philip Monk <phil@pcmonk.me>
At some point this should be more properly styled similar to +by, +in,
and +to, but for now this reduces duplication and makes the ordered map
available to everyone.
a %hunk is the error-stack frame for a failed ([~ ~]) scry.
this changes the frame type from tank to *, avoiding
coordination overhead between +mink and the interpreter.
Allow one or more whitespace characters before and/or after the equals sign in
name attribute pairs, such as `<hello a = "yo" />` or `<hello a= "yo" />`.
Following the spec at https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Eq.
* origin/philip/ames-dedup:
clay: don't send peers to message pump
ames: only dedupe long messages
ames: don't split messages until ready to send
ames: dedup new messages and fragments
This will fix the issue described in #2867 for ducts that have already
triggered the bug. This will also send spurious acks for any messages
that are outstanding at the time of the upgrade, but I don't believe
this will cause a serious problem.
Support /=peers= and /=peer=/~ship scries for getting at all peers and
a specific peer's connection state, respectively.
Moves some internal types into zuse for easier external use.
I'm not going to be able to debug the memory leaks in the +ob core
jets, so remove those new jets and hooks to +fein and +fynd so they
can be called from the @p parsing jets. This moves +fein adn +fynd
to the toplevel because there appear to be issues with hooks
referencing things that aren't in the directly jetted core.
Trying to reduce the size of ames queues. This deduplicates incoming
message-blobs by comparing with existing message-blobs in other queues.
It also stops splitting into fragments in +feed-packets. Instead, it
splits into fragments at the last moment, in +encrypt. This means we
don't have to store a large number of packets in our home road.
`at` is for when you expect an array of a certain exact structure. If it
has extra elements, that indicates you were mistaken about the strucutre,
so it should fail to match.
RFC 2396 specifies that segments must be zero or more pchars.[1] We were
deviating from this by requiring at least one pchar per segment.
With this change, we support /some//path, and no longer lose the
trailing slash in /some/path/.
[1]: https://tools.ietf.org/html/rfc2396#section-3.3
-merge will replace |merge so that. Once they reach feature parity and
%info is rewritten to forward to -commit, we can rip out about half of
clay.hoon
Makes it so that |cancel %force skips the next thing in the queue if
you're not in the middle of something. If you are in the middle of
something, it skips the thing you're in the middle of (just like naked
|cancel).
This should resolve issues where |cancel doesn't drain the queue.
Considering some of the options here were atoms, not cells, $% wasn't
appropriate, and led to *etyp:abi:ethereum resulting in ford %ride execution
failure. Simply using $? instead would result in a fish-loop, so here we split
the atom cases from the tagged union ones with a $@.
%park is a plumbing commit task. It guarantees completion in a single
event, so you have to do much of the work before calling it. -commit
is an example of how to do this.
When a ship breaches, we remove all messages that have yet to be
delivered to an app (eg if it's not yet started). We also add
|gall-sear to do this manually, but this shouldn't be needed in normal
operation.
Finally, to unblock ~zod and ~bus on mainnet, we sear one particular
ship automatically on loading hood. It cannot be done manually because
no userpace changes can be made until it's unblocked.
We have three stacks: the hoon stack, bar stack, and duct stack. This
turns the bar stack to a list of ducts and adds it to the hoon stack.
This tells you the ducts of the moves that caused the move where you
crashed.
See:
recover: dig: intr
crud: %belt event failed
bail: intr
bar-stack
~[
~[/g/use/spider/~zod/build/~.dojo_0v5ogno.5anji.vn3f6.4gs7t.6r2ft /d //term/1]
~[/d //term/1]
~[/g/use/spider/~zod/find/~.dojo_0v5ogno.5anji.vn3f6.4gs7t.6r2ft /d //term/1]
~[/g/use/dojo/~zod/out/~zod/spider/drum/wool /d //term/1]
~[/d //term/1]
~[/g/use/dojo/~zod/drum/hand /d //term/1]
~[/g/use/hood/~zod/out/~zod/dojo/drum/phat/~zod/dojo /d //term/1]
~[/d //term/1]
~[//term/1]
]
call: failed
/~zod/home/~2020.3.17..23.14.11..50e0/sys/vane/ford:<[6.128 3].[6.220 5]>
/~zod/home/~2020.3.17..23.14.11..50e0/sys/vane/ford:<[6.129 3].[6.220 5]>
/~zod/home/~2020.3.17..23.14.11..50e0/sys/vane/ford:<[6.132 3].[6.220 5]>
...
Gives you a poor man's progress bar. For example, to determine how much
of an OTA you've downloaded from your sponsor, run:
|ames-sift (sein:title our now our)
|ames-verb %rcv
and then to turn it off:
|ames-verb
This reverts commit 046506f9d4, reversing
changes made to 6ef08962ef.
I'm reverting this as we're moving to a new branch/release model in
which breaching changes (as this one is) will live on a long-running
'next' branch, rather than alongside non-breaching changes in master.
This revert should itself be reverted on the 'next' branch.
* master: (484 commits)
king: Slight CLI cleanup and fix test build.
king: Add command-line flags to configure HTTP and HTTPS ports.
groups: reduce metadata updates, removal
chat: reducer handles metadata removal
groups: exclude group metadata from channels list
groups: set and surface group name metadata
groups: remove dummy 'share' flow, 'default' group
contacts: rename, migrate '~contacts' to '~groups'
sh/release: rename vere release tarballs
vere: patch version bump (v0.10.3 -> v0.10.4.rc1) [ci skip]
pills: updated brass and solid
chat: pull room contacts from associated group
chat: spell 'permanent' correctly
eyre: remove padding from 'access' input
chat: only delete metadata for a chat if you created it
chat: settings inputs add borders on focus
vere: disables gc on |mass in the daemon process
chat: remove console.log from metadataAction
chat: style fixes during review, use metadata-hook
chat: edit description, color settings
...
* origin/os1-rc: (439 commits)
pills: updated brass and solid
chat: pull room contacts from associated group
chat: spell 'permanent' correctly
eyre: remove padding from 'access' input
chat: only delete metadata for a chat if you created it
chat: settings inputs add borders on focus
chat: remove console.log from metadataAction
chat: style fixes during review, use metadata-hook
chat: edit description, color settings
chat: add update-metadata to metadata reducer
chat: revise api.js to match data structures
metadata-json: add json to action parsers
chat: construct settings page for metadata
chat: correct bottom border on join links
chat: copy shortcodes
chat: linkify unmanaged chats
metadata-hook: support group members other than host creating shared resources
contacts: add bg-gray0 to root page
chat + contact views: updated for style and to assert that group-path must be equal to app-path if there are ships in the members set
contacts: changed color + copy of "add to group" button
...
* origin/ted/ford-no-pit:
pills: update solid
http.c: revert timeout to original ~m10
tests: prime ford %reef cache
http.c: bump timeout from ~m20 to ~m30
http.c: bump timeout from ~m10 to ~m20
tests: fix ford tests for no %reef short-circuit
ford: remove pit short-circuit
Instead of trying to hint computations (buying us %memo, etc), we
simply pass through the nouns (with constant [1 noun] formulas)
to the underlying runtime. This avoids spuriously product-hinting
the +tone results of +mink as the previous version did.
* origin/philip/gall-ack-fix:
gall: give both acks in case of unexpected ack
gall: make 2140e07 ota-able
gall: properly track remote acknowledgments
Signed-off-by: Jared Tobin <jared@tlon.io>
It's hard to say what's the safest thing to do when we get an ack we
weren't expecting due to losing outstanding.agents.state in +load
3-to-4, so this gives both a watch-ack and a poke-ack. This seems most
likely to succeed.
Does not change state type, but clears outstanding.agents.state since
it's full of garbage values. This introduces a possibility that we may
have been in the middle of something, so we handle that in a reasonably
sane way.
outstanding.agents.state is a queue of what sort of message we sent to a
foreign app. We use it so that when the acknowledgment comes back we
know whether to treat it as a watch-ack, poke-ack, or neither. We used
to put this info in the wire, but this gave us a different ames flow,
which meant %leave and %watch didn't get associated (causing #2079).
The error was that when when retrieving the item from the queue, we put
the new 1-item-shorter queue back in outstanding.agents.state at a
different wire than it came from, so the queues never actually got
shorter, and acknowledgments of the wrong sort were commonly produced.
This caused problems mainly in situations where we poke and peer on the
same wire, and possibly when a subscription was cancelled.
Possibly related to #2206 and #2176. I would expect this bug to cause
those issues, but I haven't verified the converse. Also possibly
related to #2153 and #2079.
"Replace" suggests this function either produces an updated set/map when done,
like +snap, or changes all values in-place, like +turn. In truth, it's more
similar to +roll, which does reduction/accumulation.
("Reduce" specifically was chosen because it maintains the mnemonic relation to
the arm name.)
When molds were changed to crash on bad input, mook was not updated.
It relied on the old behavior of bunting on bad input. +moko
(the replacement +mook) simply doesn't include stack items that don't
have the proper type (in constrast to +mook, which currently crashes
and used to leave a "blank"/bunted stack item for improperly typed
values).
+mink, the current virtual nock interpreter, has a couple of problems.
1. it propagates blocks as a list of paths, which is inconsistent with
the way the jet behaves (only a single path is ever blocked on, with
exception semantics).
2. +mush was not updated after the change to molds to crash instead of
bunting. it crashes when not given the right kind of data, which is
inconsistent with the intended semantics of ++mink.
3. it "eats" hints, causing (for example) slogs to disappear when running
without a mink jet.
4. the naming/style was typically cryptic. since +mink will never really
be run, one could argue that its primary purpose is to be read.
+mino (which will be renamed to +mink after some staging) has had its
return type (+tono, to be renamed +tone) modified in the block case so
that it only blocks on one path, has a corrected +mush, carefully
"passes through" all hints to the underlying interpreter, and has more
meaningful names, with the intention of improving readability.
A generator (gen/mino.hoon) is also included in this commit; it contains
tests that were used during the development of +mino. It should be removed
before integration, and is included for posterity. The stack trace semantics
are expected to change in the near future (since they are dependent on jets
faithfully preserving the stack pushes of the pure nock, an onerous burden).
They are, however, tested in gen/mino.hoon, which makes it unsuitable as a
long-term test.
Adds a `[%spot *]` type to the `note` type annotation definition.
These are added when the %dbug hoon is encountered. Done to enable
jump to defintion in the language-server.
Due to asynchronicity, Ford can receive responses from Clay to requests
that it has already attempted to cancel. This removes some overzealous
assertions that this wouldn't happen.
@ixv recently uncovered a bug (#2180) in Ford that caused certain
rebuilds to crash. @Fang- and I believe this change should fix the bug,
and we have confirmed that the reproduction that used to fail about two
thirds of the time now has not failed at all in the ten or so times
we've run it since then. @Fang- is still running more tests to confirm
the fix with more certainty.
It turned out the cause was that (depending on the rebuild order, which
is unspecified and should not need to be specified), Ford could enqueue
a provisional sub-build to be run but then, later in the same +gather
call, discover that the sub-build was in fact an orphan and delete it
from builds.state accordingly. Then when Ford tried to run the
sub-build, it would have already been deleted from the state, so Ford
would crash when trying to process its result in +reduce.
The fix was to make sure that when we discover a provisional sub-build
is orphaned, dequeue it from candidate-builds and next-builds to make
sure we don't try to run it. I'm about 95% sure this fix completely
solves the bug.
Uses Zuse's previously unused +harden helper function to streamline
+task unwrapping in vanes.
(Arguably, in landlocked vanes like Ford, we should crash if we get a
%soft task, since no events should be coming in directly from the
outside.)
There was a typo in the routing logic that was comparing equality
against a value where it should have been doing a pattern match. The
value compared against contained the literal * gate, which would never
match route.peer-state, so this condition was always true, meaning the
fix that had added this extra condition (5406f06) did not actually
change the behavior from what it been previously.
If we receive the naxplanation before the nack, the assertion in the gte
direction fails. The intent of the assertion is to make sure top of the
live queue never falls behind current.state, so it was simply in the
wrong direction.
Instead of providing a (unit path), allows for (list path), which better
supports the "update to path and subpath cases".
For example, if /things wants updates about everything, and
/things/specific wants updates about the specific thing, they'll both
need to receive a %fact when the specific thing changes.
Previously, these would have been two separate moves. Now, gall handles
the multi-targeting for you.
Previously, it would always produce ~, regardless of the path asked
about.
Now, it produces a loobean, based on whether or not a file exists at the
specified path.
Two bugs fixed here: first, if the %done reentrancy triggered another
%boon, that wasn't getting translated to a %lost, even though it could
have been the reason the event crashed in the first place.
Second, the %done reentrancy needs to happen after we emit our move, so
that we don't invert the order of the %boon's we produce.
OTAs commonly end up in an inconsistent state if apps depend on changes
to /sys. For example, the %sift changes break on OTA because %spider
needs to be reloaded so that it's aware of the new thread type. This
adds a %goad app, which reloads all apps after every change to /sys.
Getting this to start OTA is nontrivial, but this pattern should work
for apps in the future. The changes to clock shouldn't generally be
necessary; they are only necessary here because we can't rely on hood to
start goad, since hood fails to compile if it's run before zuse is
reloaded. Once goad is active, this will cease to be a problem.
This fixes +put:in so that it works without the correct jet. There's a
mismatch where the hoon code is wrong and the jet is correct, so that
when we try to run this on alternate interpreters which may not have the
+in jets, things won't work.
%leave over the network didn't work because we included the message type
in the wire from gall, so the duct for the initial %watch and the %leave
were different. We need to know the message type so we can route the
acknowledgment as %poke-ack, %watch-ack, or no-op.
This moves this piece of information to a piece of state, where we queue
up the message types per [duct wire]. Ames guarantees that
acknowledgments will come in order.
This also includes an easy state adapter. The more interesting part of
the upgrade is that we likely have outstanding subscriptions with the
old wire format. The disadvantage of storing information in wires is
that it can't be upgraded in +load. So, here we listen for updates on
the old wire format, and when we get them we kill the old subscription,
so that it will be recreated with the new wire format.
As an aside, this is a good example of what we mean when we say
subscriptions may be killed at any time, so apps must handle this case.
Finally, this fixes the "attributing" ship to ~zod for agent requests.
This information was ignored for agent requests, but including it causes
spurious duct mismatches.
We've seen issues where the message-num of the head of live.state is
less than current.state. When this happens, we continually try to
resend message n-1, but we throw away any acknowledgment for n-1 because
current.state is already n. This halts progress on that flow.
We don't know what causes us to get in this bad state, so this adds an
assert to the packet pump that we're in a good state, run every time
the packet pump is run. When this crashes, we can turn on |ames-verb
and hopefully identify the cause.
This also adds logic to +on-wake in the packet pump to not try to resend
any messages that have already been acknowledged. This is just to
rescue ships that currently have these stuck flows.
(Incidentally, I'd love to have a rr-style debugger for stuff like this.
Just run a command that says "replay my event log watching for this
specific condition and then stop and let me poke around".)
This is why basically all packets are going through the galaxies right
now. Most of the time, the flow right now is:
* talking to ~dopzod but don't know where it is, so ask ~zod to forward,
which it does
* ~dopzod responds both directly (on the origin lane) and through ~zod
* (if NAT, the direct response doesn't get back, but the one through
~zod does. Then you respond directly to ~dopzod because their lane
piggybacked on the response. ~dopzod responds both directly and
through ~zod, and the story picks up the same as if you weren't behind a
NAT)
* now you have a direct lane to ~dopzod, so all is well.
* now the duplicate response from ~dopzod through ~zod comes in (takes a
little longer because it's bouncing off ~zod), resetting your lane to
"provisional"
* since your lane is provisional, you send your next packet both
directly and through ~zod
* GOTO 2
This change says "if I already have a direct lane, don't overwrite it
with a provisional one". This way, the only way the direct lane can be
overwritten is if they stop responding on it (cleared on "not
responding; still trying").
I also added |- to +send-blob to make |ames-verb %rot less confusing.