Distributed Garbage Collection
We use a weak B_map to track local boxes (entries are removed by virtue of being a weak map once they are no longer referenced in local heap / boxes):
B_map :: WeakMap BoxId (MVar Value)
and a weak C_set, tracking all remote boxes referenced by local heap / boxes:
type RemoteBox = (BoxId, Node)
C_set :: WeakSet RemoteBox
Each local box b has an associated value and an associated set of boxes referenced by its contents, b_subs.
let keepaliveDuration = 20.seconds -- or whatever
type Keepalive = Keepalive { b :: BoxId, visited :: Set RemoteBox }
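As a rough illustration of the weak-table behaviour described above, here is a minimal Python sketch. The `Box` and `RemoteBox` classes and the `demo` function are hypothetical stand-ins (Python's `weakref` tables play the role of the weak B_map and C_set), and the sketch assumes a remote box handle is only reachable through the values that mention it.

```python
import gc
import weakref
from dataclasses import dataclass


class Box:
    """Hypothetical local box; stands in for the MVar Value of the text."""

    def __init__(self, box_id, value=None):
        self.box_id = box_id
        self.value = value


@dataclass(frozen=True)
class RemoteBox:
    """Hypothetical handle for a box owned by another node: (BoxId, Node)."""

    box_id: str
    node: str


@dataclass(frozen=True)
class Keepalive:
    b: str              # BoxId being kept alive
    visited: frozenset  # (BoxId, Node) pairs already visited


B_map = weakref.WeakValueDictionary()  # BoxId -> Box, held weakly
C_set = weakref.WeakSet()              # RemoteBox handles, held weakly


def demo():
    q = Box("q")
    B_map["q"] = q
    s = RemoteBox("s", "y")
    C_set.add(s)
    q.value = s   # q's contents now reference the remote box
    del s         # the handle is still reachable through q.value
    before = ("q" in B_map, len(C_set))
    del q         # drop the last strong reference to q and its contents
    gc.collect()  # not required on CPython, harmless elsewhere
    after = ("q" in B_map, len(C_set))
    return before, after
```

On CPython, `demo()` returns `((True, 1), (False, 0))`: both tables empty themselves once nothing in the local heap references the box.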
Receiving Keepalives
When node n receives a keepalive message for BoxId b:
- If n doesn't own b, disregard (shouldn't occur)
- Else if (b,n) ∈ visited, disregard (normal occurrence)
- Else
  - Create a strong reference to b for a fixed period of time (keepaliveDuration)
  - Let b_subs be the set of all boxes (local and remote) referenced by b.
    - If b_subs is not cached, and no existing process is indexing b, start indexing b and cache the result when complete.
    - If indexing does not complete in time, do not interrupt indexing, but use C_set as an approximation of b_subs for the purposes of processing this particular keepalive message.
  - For each b_i ∈ b_subs,
    - If b_i is a remote box, send (Keepalive b_i (Set.insert (b,n) visited)) to the owner of b_i.
    - If b_i is a local box, process (Keepalive b_i (Set.insert (b,n) visited)) locally. Whether or not you hit the network is up to you, but in this scheme, we do need to recursively propagate keepalives through local boxes.
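The receive rule above can be sketched in Python as follows. This is an in-process simulation under stated assumptions, not the actual runtime: the `Node` class, the `network` dictionary, the `owned` table (which stands for an already cached b_subs) and the `pinned_until` deadlines (which stand for the temporary strong reference) are all hypothetical, messages are delivered synchronously, and the fallback to C_set for unfinished indexing is left out.

```python
import time
from dataclasses import dataclass

KEEPALIVE_DURATION = 20.0  # seconds; "or whatever", as in the text


@dataclass(frozen=True)
class Keepalive:
    b: str              # BoxId
    visited: frozenset  # (BoxId, Node) pairs already visited


class Node:
    def __init__(self, name, network):
        self.name = name
        self.network = network  # hypothetical: node name -> Node
        self.owned = {}         # BoxId -> list of (BoxId, owner): cached b_subs
        self.pinned_until = {}  # BoxId -> deadline of the strong reference
        self.received = 0       # message counter, for illustration only
        network[name] = self

    def receive(self, msg, now=None):
        now = time.monotonic() if now is None else now
        self.received += 1
        b = msg.b
        if b not in self.owned:
            return  # n doesn't own b: disregard
        if (b, self.name) in msg.visited:
            return  # already visited: disregard (normal occurrence)
        # Hold b strongly for a fixed period of time.
        self.pinned_until[b] = now + KEEPALIVE_DURATION
        visited = msg.visited | {(b, self.name)}
        for b_i, owner in self.owned[b]:
            nxt = Keepalive(b_i, visited)
            if owner == self.name:
                self.receive(nxt, now)                 # local box
            else:
                self.network[owner].receive(nxt, now)  # remote box
```

Feeding this the four-box cycle from the example reference graph (q and r on x, s and t on y) pins all four boxes and stops when the keepalive returns to q, because (q,x) is already in visited.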
To compute b_subs (set of boxes referenced by the value inside the b box):
- Keep mutable cache Optional [BoxId] for each runtime value, v, tracking boxes referenced transitively by v.
- Do a deep scan of the v inside the box to fully populate caches, recursively.
- Avoid revisiting subtrees that already have a computed cache.
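A minimal Python sketch of this cached deep scan follows. The `Val` and `BoxRef` types, the `scan_count` counter and the `boxes_referenced` function are hypothetical. The sketch assumes runtime values are immutable and acyclic (so a computed cache never goes stale) and that the scan stops at a box reference rather than looking inside the box, since the keepalive protocol handles that hop.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BoxRef:
    """Hypothetical leaf: a reference to a box, local or remote."""

    box_id: str


@dataclass
class Val:
    """Hypothetical runtime value with child values and box references."""

    children: list = field(default_factory=list)
    boxes: Optional[frozenset] = None  # per-value cache; None = not computed


scan_count = {"n": 0}  # counts values actually scanned, for illustration


def boxes_referenced(v):
    """Return the set of BoxIds referenced transitively by v."""
    if isinstance(v, BoxRef):
        return frozenset({v.box_id})
    if v.boxes is not None:
        return v.boxes  # subtree already indexed: do not revisit it
    scan_count["n"] += 1
    found = frozenset()
    for child in v.children:
        found |= boxes_referenced(child)
    v.boxes = found  # populate the cache on the way back up
    return found
```

A subtree shared between two parents is scanned once; the second parent reads its cache. Very deep values would need an explicit stack instead of recursion in Python.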
Receiving Continuations or Box Updates
When a continuation c is transferred from node x to node y, or when value c is Box.put from node x to node y, node y adds non-local boxes referenced by c to C_set. (This indexing may be done as part of the network deserialization.)
We must ensure that boxes referenced by c are not GCed before y can issue keepalives; this means that node x must send keepalives to any boxes referenced by c during the transfer (this should already happen without special care) and at least once more after the transfer has completed, to avoid a race condition while y takes over the keepalives. This may mean that both nodes x and y must also index c while it is being transferred.
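The ordering constraint in this handoff can be sketched as below. Everything here is a hypothetical simplification: `index_incoming` and `transfer` are illustrative names, C_set is a plain (strong) Python set of (BoxId, Node) pairs, and sending a keepalive is modelled as appending to an event log rather than touching a network.

```python
def index_incoming(node_name, c_set, referenced):
    """Receiver side (node y): record every non-local box referenced by c."""
    for box_id, owner in referenced:
        if owner != node_name:
            c_set.add((box_id, owner))


def transfer(referenced, x, y, y_c_set, log):
    """Move c (summarised by the boxes it references) from node x to node y."""
    # x keeps the referenced boxes alive while c is in flight.
    for ref in referenced:
        log.append(("keepalive", x, ref))
    # y indexes c, for example as part of deserialization.
    index_incoming(y, y_c_set, referenced)
    log.append(("transfer-complete", x, y))
    # x sends at least one more round, covering the window before y's
    # own keepalives start.
    for ref in referenced:
        log.append(("keepalive", x, ref))
```

The final round from x is what closes the race: without it, a box could expire between the end of the transfer and y's first keepalive.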
FAQ
Q: Will C_set contain all of the remote boxes referenced by local boxes?
A: Yes: to store a value into b, the value must be constructed within some continuation. Remote box references can only exist in a continuation transferred from a remote node, or a value Box.put from a remote node. In both of these cases, any remote boxes referenced in the transfer are indexed into C_set, per "Receiving Continuations or Box Updates" above.
Q: Can we say that durable values don't keep boxes alive? That a durable shouldn't expect any particular value to be preserved in a referenced box?
A: ...
Q: If a remote node has computed the Optional [BoxId] for a runtime value, should the remote node transfer that cache to me?
A: ...
Optimizations
- Avoid allocating boxes to B_map and C_set until first transfer. Until first transfer, boxes are just a regular MVar on the stack.
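One way to picture this optimization is lazy registration, sketched here in Python. The `Box` and `Registry` classes and the `ensure_registered` method are hypothetical, and assigning the BoxId at registration time is an assumption of this sketch, not something the text specifies.

```python
import weakref


class Box:
    """Hypothetical box; an ordinary object until it is first transferred."""

    def __init__(self, value=None):
        self.value = value
        self.box_id = None  # assigned on first transfer (assumption)


class Registry:
    def __init__(self):
        self.b_map = weakref.WeakValueDictionary()  # BoxId -> Box
        self._next = 0

    def ensure_registered(self, box):
        """Call just before a box crosses the network for the first time."""
        if box.box_id is None:
            box.box_id = f"box-{self._next}"
            self._next += 1
            self.b_map[box.box_id] = box
        return box.box_id
```

Boxes that never leave the node never touch the weak tables, so purely local code pays no bookkeeping cost.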
Example reference graph
type Foo = Ref (Box Foo) | No_Ref

do Remote
  Remote.transfer x
  q := Box.make
  r := Box.make
  Remote.transfer y
  s := Box.make
  t := Box.make
  Remote.fork <| do Remote
    sleep-random-duration
    Box.take t
  Box.put q (Ref s)
  Box.put s (Ref r)
  Box.put r (Ref t)
  Box.put t (Ref q)
  Box.put t No_Ref -- maintains cycle until Box.take t, then breaks cycle
   x     y
  ┌─┐   ┌─┐
┌>│q│──>│s│
│ ├─┤  /├─┤
│ │ │ / │ │
│ ├─┤└  ├─┤
│ │r│──>│t│
│ └─┘   └─┘
└────────┘