sapling/eden
Stanislau Hlebik f4a078e257 mononoke: make sure async megarepo requests are picked up by another worker if current worker dies
Summary:
High-level goal of this diff:
We have a problem in long_running_request_queue - if a tw job dies in the
middle of processing a request then this request will never be picked up by any
other job, and will never be completed.
The idea of the fix is fairly simple - while a job is executing a request it
needs to constantly update the inprogress_last_updated_at field with the current
timestamp. If a job dies, other jobs will notice that the timestamp
hasn't been updated for a while and mark the request as "new" again, so that
somebody else can pick it up.
Note that this obviously doesn't prevent all possible race conditions - the worker
might just be too slow and not update the inprogress timestamp in time. We'd
handle that race condition on other layers, i.e. our worker guarantees that
every request will be executed at least once, but it doesn't guarantee that it will
be executed exactly once.
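
The heartbeat idea described above can be sketched roughly as follows. This is an illustrative in-memory Python model, not the code from this diff: the Request class, the claim/heartbeat/is_abandoned helpers, the status strings and the 60-second threshold are all assumptions made for the example. Only the inprogress_last_updated_at field name comes from the summary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; the real value is not stated in the summary.
ABANDONED_AFTER_SECS = 60.0


@dataclass
class Request:
    # Hypothetical status values: "new" or "inprogress".
    status: str = "new"
    inprogress_last_updated_at: Optional[float] = None


def claim(req: Request, now: float) -> bool:
    """Claim a new request; the claim also counts as the first heartbeat."""
    if req.status != "new":
        return False
    req.status = "inprogress"
    req.inprogress_last_updated_at = now
    return True


def heartbeat(req: Request, now: float) -> None:
    """Called periodically by the worker while it executes the request."""
    if req.status == "inprogress":
        req.inprogress_last_updated_at = now


def is_abandoned(req: Request, now: float) -> bool:
    """True if the request is in progress but its heartbeat is too old."""
    return (
        req.status == "inprogress"
        and req.inprogress_last_updated_at is not None
        and now - req.inprogress_last_updated_at > ABANDONED_AFTER_SECS
    )


req = Request()
claim(req, now=0.0)
heartbeat(req, now=30.0)
still_alive = is_abandoned(req, now=80.0)    # 50s since the last heartbeat
looks_dead = is_abandoned(req, now=100.0)    # 70s since the last heartbeat
```

A slow but living worker is indistinguishable from a dead one in this model, which is why only at-least-once execution can be promised.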

Now a few notes about implementation:
1) I intentionally separated the methods for finding abandoned requests and for marking them new again. I did so to make it easier to log which requests were abandoned (logging will come in the next diffs).

2) My original idea (D29821091) had an additional field called execution_uuid, which would be changed each time a new worker claims a request. In the end I decided it's not worth it - while execution_uuid can reduce the likelihood of two workers running the same request at the same time, it doesn't eliminate it completely. So I decided that execution_uuid doesn't really give us much.

3) It's possible that two workers will be executing the same request and updating the same inprogress_last_updated_at field. As I mentioned above, this is expected, and the request implementation needs to handle it gracefully.
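
As a rough illustration of note 1, the two steps could look like the sketch below, with the caller free to log the ids between them. This is a hypothetical in-memory Python model, not the code from this diff: the function names, the row layout and the timeout value are assumptions. The re-check inside mark_abandoned_as_new is one possible way to avoid resetting a request whose worker sent a heartbeat in the meantime; the summary does not say whether the real code does this.

```python
from typing import Dict, List, Tuple

# Hypothetical row layout: request id -> (status, inprogress_last_updated_at)
Rows = Dict[int, Tuple[str, float]]


def find_abandoned(rows: Rows, now: float, timeout: float) -> List[int]:
    """Step 1: return ids of in-progress requests with a stale timestamp."""
    return sorted(
        req_id
        for req_id, (status, ts) in rows.items()
        if status == "inprogress" and now - ts > timeout
    )


def mark_abandoned_as_new(
    rows: Rows, ids: List[int], now: float, timeout: float
) -> int:
    """Step 2: reset the given requests to "new"; return how many were reset."""
    reset = 0
    for req_id in ids:
        status, ts = rows[req_id]
        # Re-check staleness: the worker may have updated the timestamp
        # after find_abandoned ran.
        if status == "inprogress" and now - ts > timeout:
            rows[req_id] = ("new", ts)
            reset += 1
    return reset


rows: Rows = {
    1: ("inprogress", 0.0),   # stale: last updated 100s ago
    2: ("inprogress", 95.0),  # fresh: last updated 5s ago
    3: ("new", 0.0),          # never claimed
}
abandoned = find_abandoned(rows, now=100.0, timeout=60.0)
print(f"abandoned requests: {abandoned}")  # the caller can log between steps
reset_count = mark_abandoned_as_new(rows, abandoned, now=100.0, timeout=60.0)
```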

Reviewed By: krallin

Differential Revision: D29845826

fbshipit-source-id: 9285805c163b57d22a1936f85783154f6f41df2f
2021-07-27 14:12:53 -07:00
..
fs                 utils: fix race in SpawnedProcess::threadedCommunicate  2021-07-27 07:32:42 -07:00
hg-server          Remove target-based type checking in eden  2021-07-23 12:34:20 -07:00
integration        test: verify that EdenFS can be started in the fsck tests  2021-07-26 20:07:19 -07:00
locale
mononoke           mononoke: make sure async megarepo requests are picked up by another worker if current worker dies  2021-07-27 14:12:53 -07:00
scm                amend: fix file mixup during "--to" rebase  2021-07-27 12:26:42 -07:00
test_support       fix systemd tests locally  2021-07-09 17:24:11 -07:00
test-data          fix fsck snapshot integration tests  2021-07-14 16:20:04 -07:00
.gitignore
Eden.project.toml