Summary:
The overall goal of this stack is to add WarmBookmarksCache support to
repo_client to make Mononoke more resilient to lands of very large
commits.
We'd like to use WarmBookmarksCache in repo_client, and to do that we need to be
able to tell Publishing and PullDefault bookmarks apart. Let's teach
WarmBookmarksCache about it.
Reviewed By: krallin
Differential Revision: D22812478
fbshipit-source-id: 2642be5c06155f0d896eeb47867534e600bbc535
Summary:
This method will be used in the next diff to add a test, but it might be more
useful later as well.
Note that the `update()` method in BookmarkTransaction already handles publishing bookmarks correctly.
Reviewed By: farnz
Differential Revision: D22817143
fbshipit-source-id: 11cd7ba993c83b3c8bca778560af4a360f892b03
Summary:
The overall goal of this stack is to add WarmBookmarksCache support to
repo_client to make Mononoke more resilient to lands of very large
commits.
The code for managing cached_publishing_bookmarks_maybe_stale was already a bit
tricky, and with WarmBookmarksCache introduction it would've gotten even worse.
Let's move this logic to a separate SessionBookmarkCache struct.
Reviewed By: krallin
Differential Revision: D22816708
fbshipit-source-id: 02a7e127ebc68504b8f1a7401beb063a031bc0f4
Summary:
Pull Request resolved: https://github.com/facebookexperimental/eden/pull/38
The tool is used in some integration tests; make it public so that those tests can pass.
Reviewed By: ikostia
Differential Revision: D22815283
fbshipit-source-id: 76da92afb8f26f61ea4f3fb949044620a57cf5ed
Summary:
The overall goal of this stack is to add WarmBookmarksCache support to
repo_client to make Mononoke more resilient to lands of very large
commits.
The problem with large changesets is deriving hg changesets for them. It might take
a significant amount of time, which leaves all clients stuck on a listkeys() or
heads() call waiting for derivation. WarmBookmarksCache can help here by returning
bookmarks for which hg changesets were already derived.
This is the second refactoring to introduce WarmBookmarksCache.
Now let's cache not only pull default, but also publishing bookmarks. There are two reasons to do it:
1) (Less important) It simplifies the code slightly
2) (More important) Without this change 'heads()' fetches all bookmarks directly from BlobRepo, thus
bypassing any caches that we might have. So in order to make WarmBookmarksCache useful we need to avoid
doing that.
Reviewed By: farnz
Differential Revision: D22816707
fbshipit-source-id: 9593426796b5263344bd29fe5a92451770dabdc6
Summary:
The overall goal of this stack is to add WarmBookmarksCache support to
repo_client to make Mononoke more resilient to lands of very large commits.
This diff just does a small refactoring that makes introducing
WarmBookmarksCache easier. In particular, I'd later like the
cached_pull_default_bookmarks_maybe_stale cache to store
not only PullDefault bookmarks, but also Publishing bookmarks, so that both the
listkeys() and heads() methods can be served from this cache. In order to do
that we need to store not only the bookmark name, but also the bookmark kind (i.e. is
it Publishing or PullDefault).
To do that let's store the actual Bookmarks and hg changeset objects instead of
raw bytes.
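A minimal sketch of the idea, using a hypothetical String-keyed cache rather than the real Mononoke types: once the kind is stored next to each bookmark, one cache can answer both calls.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real bookmark types.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum BookmarkKind {
    PullDefault,
    Publishing,
}

// The cache keeps the kind and the derived hg changeset id next to each
// bookmark name, instead of raw bytes.
struct BookmarkCache {
    bookmarks: HashMap<String, (BookmarkKind, String)>,
}

impl BookmarkCache {
    // listkeys() only wants PullDefault bookmarks.
    fn pull_default(&self) -> Vec<&str> {
        self.bookmarks
            .iter()
            .filter(|(_, (kind, _))| *kind == BookmarkKind::PullDefault)
            .map(|(name, _)| name.as_str())
            .collect()
    }

    // heads() wants every publishing bookmark; everything cached here is
    // publishing (PullDefault is a publishing kind too).
    fn heads(&self) -> Vec<&str> {
        self.bookmarks.keys().map(|s| s.as_str()).collect()
    }
}
```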
Reviewed By: farnz
Differential Revision: D22816710
fbshipit-source-id: 6ec3af8fe365d767689e8f6552f9af24cbcd0cb9
Summary:
Most of our APIs return an error when the path doesn't exist. I would like to
argue that's not the right choice for list_file_history.
Errors should only be returned in abnormal situations, and with the
`history_across_deletions` param there's no other easy way to check if the file
ever existed other than calling this API - so it's not abnormal to call
it with a path that doesn't exist in the repo.
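A sketch of the API shape being argued for, with a toy (path, commit) list standing in for the real repo: a path that never existed yields an empty history, while `Err` stays reserved for genuinely abnormal failures.

```rust
// Toy stand-in: `repo` is a list of (path, commit) pairs.
fn list_file_history(repo: &[(&str, &str)], path: &str) -> Result<Vec<String>, String> {
    // A real implementation could still fail for infrastructure reasons,
    // which is what the Err variant is for; a missing path is not one.
    let history = repo
        .iter()
        .filter(|(p, _)| *p == path)
        .map(|(_, c)| c.to_string())
        .collect();
    Ok(history)
}
```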
Reviewed By: StanislavGlebik
Differential Revision: D22820263
fbshipit-source-id: 002bda2ef5ee9d6632259a333b7f3652cfb7aa6b
Summary:
Added a new query function to get the largest log id from bookmarks_update_log.
In the repo_import tool, once we move a bookmark to reveal commits to users, we want to check that hg_sync has received the commits. To do this, we extract the largest log id from bookmarks_update_log and compare it with the mutable_counter value related to hg_sync. If the counter value is greater than or equal to the log id, we can move the bookmark to the next batch of commits.
Since this query wasn't implemented before, this diff adds this functionality.
Next step: add query for mutable_counter
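The comparison itself is simple enough to sketch; the `Option` for an empty log and the query shape (something like `SELECT MAX(id) FROM bookmarks_update_log`) are assumptions about the data, not the actual implementation:

```rust
// hg_sync has caught up once its mutable_counter value reaches the
// largest id in bookmarks_update_log.
fn hg_sync_caught_up(largest_log_id: Option<u64>, hg_sync_counter: u64) -> bool {
    match largest_log_id {
        // Empty log: there is nothing to wait for.
        None => true,
        Some(id) => hg_sync_counter >= id,
    }
}
```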
Reviewed By: krallin
Differential Revision: D22816538
fbshipit-source-id: daaa4e5159d561e698c6e1874dd8822546c699c7
Summary:
Pull Request resolved: https://github.com/facebookexperimental/eden/pull/37
mononoke_hg_sync_job is used in integration tests, make it public
Reviewed By: krallin
Differential Revision: D22795881
fbshipit-source-id: 7a32c8e8adf723a49922dbb9e7723ab01c011e60
Summary:
Pull Request resolved: https://github.com/facebookexperimental/eden/pull/36
This command is used in some integration tests, make it public.
Reviewed By: krallin
Differential Revision: D22792846
fbshipit-source-id: 39ac89b1a674ea63dc924cafa07107dbf8e5a098
Summary:
To gradually merge one repo into the other, we need to produce multiple slices of the working copy. The sum of these slices has to be equal to
the whole of the original repo's working copy. To create each of these slices, all files but the ones in the slice need to be deleted from the working copy.
Before this diff, megarepolib would do this in a single delete commit. This, however, may be impractical, as it will produce huge commits, which we'll be unable
to process adequately. So this diff essentially introduces gradual deletion for each slice, and calls each such sequence of delete commits a "deletion stack". This is how it looks (a copy from the docstring):
```
M1
. \
. D11 (ac5fca16ae)
. |
. D12 (4c57c974e3)
. |
M2 \
. \ |
. D21 (1135339320) |
. | |
. D22 (60419d261b) |
. | |
o \|
| |
o PM
^ ^
| \
main DAG merged repo's DAG
```
Where:
- `M1`, `M2` - merge commits, each of which merges only a chunk
of the merged repo's DAG
- `PM` is a pre-merge master of the merged repo's DAG
- `D11 (ac5fca16ae)`, `D12 (4c57c974e3)`, `D21 (1135339320)` and `D22 (60419d261b)` are commits, which delete
a chunk of the working copy each. Delete commits are organized
into delete stacks, so that `D11 (ac5fca16ae)` and `D12 (4c57c974e3)` progressively delete
more and more files.
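The chunking itself can be sketched like this (the function name and chunk size are illustrative, not megarepolib's API):

```rust
// Split the files that must disappear for one slice into several delete
// commits, so no single commit becomes too large to process.
fn deletion_stack(files_to_delete: &[String], files_per_commit: usize) -> Vec<Vec<String>> {
    files_to_delete
        .chunks(files_per_commit)
        .map(|chunk| chunk.to_vec())
        .collect()
}
```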
Reviewed By: StanislavGlebik
Differential Revision: D22778907
fbshipit-source-id: ad0bc31f5901727b6df32f7950053ecdde6f599c
Summary:
Once we start moving the bookmark across the imported commits (D22598159 (c5e880c239)), we need to check dependent systems to avoid overloading them when parsing the commits. In this diff we added the functionality to check Phabricator. We use an external service (jf graphql - find discussion here: https://fburl.com/nr1f19gs) to fetch commits from Phabricator. Each commit id starts with "r", followed by a call sign (e.g. FBS for fbsource) and the commit hash (https://fburl.com/qa/9pf0vtkk). If we try to fetch an invalid commit id (e.g. one not having a call sign), we should receive an error. Otherwise, we should receive a JSON response.
An imported commit should have the following query result: https://fburl.com/graphiql/20txxvsn - nodes has one result with the imported field true.
If the commit hasn't been recognised by Phabricator yet, the nodes array will be empty.
If the commit has been recognised, but not yet parsed, the imported field will be false.
If we haven't parsed the batch, we will try to check Phabricator again after sleeping for a couple of seconds.
If it has parsed the batch of commits, we move the bookmark to the next batch.
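The three outcomes above can be sketched as a small decode step; the `bool` slice is a stand-in for the `imported` fields of the graphql `nodes` array, not the real response type:

```rust
#[derive(Debug, PartialEq)]
enum CommitStatus {
    NotRecognised, // nodes array is empty
    NotParsed,     // imported field is false
    Parsed,        // imported field is true
}

fn commit_status(imported_fields: &[bool]) -> CommitStatus {
    match imported_fields.first().copied() {
        None => CommitStatus::NotRecognised,
        Some(false) => CommitStatus::NotParsed,
        Some(true) => CommitStatus::Parsed,
    }
}
```

Only `Parsed` lets the tool move the bookmark on; the other two states mean sleep and poll again.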
Reviewed By: krallin
Differential Revision: D22762800
fbshipit-source-id: 5c02262923524793f364743e3e1b3f46c921db8d
Summary: On MacOS, if you kill a process without waiting on it to be killed, you will receive a warning on the terminal saying that the process was killed. To suppress that output, which is messing with the integration tests, use a combination of kill and wait (the custom "killandwait" bash function). It will wait for the process to stop, which is probably what most integration tests would prefer to do.
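A minimal sketch of such a helper, assuming the process id is passed as the argument (the function used by the tests may differ):

```shell
killandwait() {
  # Kill the process, then wait on it so the shell reaps it before
  # printing anything; this suppresses the asynchronous "Killed" notice.
  kill "$1"
  wait "$1" 2>/dev/null
  # wait reports the death-by-signal status (e.g. 143); the kill itself
  # succeeded, so report success.
  return 0
}
```

Usage would look like `sleep 100 & killandwait $!`.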
Reviewed By: krallin
Differential Revision: D22790485
fbshipit-source-id: d2a08a5e617e692967f8bd566e48f5f9b50cb94d
Summary: Using "/usr/bin/date" rather than just "date" is very limiting, not all systems have common command line tools installed in the same place, just use "date".
Reviewed By: krallin
Differential Revision: D22762186
fbshipit-source-id: 747da5a388932fb5b9f4c068014c01ee90a91f9b
Summary: On MacOS the default localisation configuration (UTF-8) won't allow operations on arbitrary bytes of data via some commands, because not all sequences of bytes are valid utf-8 characters. That is why when handling arbitrary bytes it is better to use the "C" locale, which can be achieved by setting the LC_ALL env variable to "C".
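For example, stripping the non-ASCII bytes from a string is a pure byte operation; under the C locale it cannot trip over invalid UTF-8 (the `tr` invocation is just an illustration):

```shell
# 'caf\303\251' is "café"; \303\251 are the two UTF-8 bytes of é.
# With LC_ALL=C, tr treats them as plain bytes in the \200-\377 range.
printf 'caf\303\251' | LC_ALL=C tr -d '\200-\377'
# prints: caf
```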
Reviewed By: krallin
Differential Revision: D22762189
fbshipit-source-id: aa917886c79fba5ea61ff7168767fc4b052a35a1
Summary: Use brew on MacOS GitHub CI runs to update bash from 3.* to 5.*.
Reviewed By: krallin
Differential Revision: D22762195
fbshipit-source-id: b3a4c9df7f8ed667e88b28aacf7d87c6881eb775
Summary: MacOS uses FreeBSD version of command line tools. This diff uses brew to install the GNU tooling on GitHub CI and uses it to run the integration tests.
Reviewed By: krallin
Differential Revision: D22762198
fbshipit-source-id: 1f67674392bf6eceea9d2de02e929bb3f9f7cadd
Summary:
On unexpected errors like missing blobstore keys the walker will now log the preceding node (source) and an interesting step to this node (not necessarily the immediately preceding, e.g. the affected changeset).
Validate mode produces route information with interesting tracking enabled, scrub currently does not to save time+memory. Blobstore errors in scrub mode can be reproduced in validate mode when the extra context from the graph route is needed.
Reviewed By: farnz
Differential Revision: D22600962
fbshipit-source-id: 27d46303a2f2c07219950c20cc7f1f78773163e5
Summary:
Now that we can configure ACL checking on a per-repo basis, use the
`enforce_acl_check` config option as a killswitch to quickly disable ACL
enforcement, if required.
Further, remove the `acl_check` config flag that was always set to True.
As part of this change I've refactored the integration test a little and
replaced the phrase "ACL check" with "ACL enforcement", as we always check the
ACL inside of the LFS server.
Reviewed By: krallin
Differential Revision: D22764510
fbshipit-source-id: 8e09c743a9cd78d54b1423fd2a5cfc9bf7383d7a
Summary: Some versions of sqlite don't allow using LIKE operation on BLOB data, so first cast it to TEXT. This test was failing on Linux runs on GitHub.
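A sketch of the fix with assumed table and column names (the real test's schema differs):

```shell
# LIKE on a BLOB column is rejected by some sqlite builds; casting the
# column to TEXT first makes the comparison portable.
sqlite3 :memory: "
  CREATE TABLE bookmarks (name BLOB);
  INSERT INTO bookmarks VALUES (CAST('master' AS BLOB));
  SELECT count(*) FROM bookmarks WHERE CAST(name AS TEXT) LIKE '%master%';
"
```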
Reviewed By: krallin
Differential Revision: D22761041
fbshipit-source-id: 567d68050297c3a2ac781b252d3e9b21ea5b2201
Summary: Have a comprehensive list of OSS tests that do not pass yet.
Reviewed By: krallin
Differential Revision: D22762196
fbshipit-source-id: 19ab920c4c143179db65a6d8ee32974db16c5e3d
Summary:
Update the LFS server to use the `enforce_lfs_acl_check` to enforce
ACL checks for specific repos and also reject clients with missing idents.
In the next diff, I will use the existing LFS server config's
`enforce_acl_check` flag as a killswitch.
Reviewed By: krallin
Differential Revision: D22762451
fbshipit-source-id: 61d26944127711f3503e04154e8c079ae75dc815
Summary:
Let's not take a lease by default, so that derived_data_tailer can make progress even if all other services are failing to derive.
One note - we don't remove the lease completely, but rather we use another lease that's separate from the lease used by other Mononoke services. The motivation here is to make sure we don't derive unodes 4 times - blame, deleted_file_manifest and fastlog all want to derive unodes, and with no lease at all they would each derive the same data a few times. Deriving unodes a few times seems undesirable, so I suggest using an InProcessLease instead of no lease.
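A rough sketch of why an in-process lease is enough here: it cannot be blocked by another service (it lives inside this process), yet it still stops the co-located derivers from repeating the same work. Names are illustrative, not Mononoke's actual InProcessLease API.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

struct InProcessLease {
    held: Mutex<HashSet<String>>,
}

impl InProcessLease {
    fn new() -> Self {
        InProcessLease { held: Mutex::new(HashSet::new()) }
    }

    // Returns true if we won the lease and should run the derivation.
    fn try_acquire(&self, key: &str) -> bool {
        self.held.lock().unwrap().insert(key.to_string())
    }

    fn release(&self, key: &str) {
        self.held.lock().unwrap().remove(key);
    }
}
```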
Reviewed By: krallin
Differential Revision: D22761222
fbshipit-source-id: 9595705d955f3bb2fe7efd649814fc74f9f45d54
Summary:
Add log sequence numbers to the scuba sample builder. This provides an ordering
over the logs made by an individual instance of Mononoke, allowing them to be
sorted.
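The mechanism can be sketched in a few lines: a per-instance atomic counter stamped on every sample yields a total order over that instance's logs (names here are illustrative):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct ScubaSeq {
    next: AtomicU64,
}

impl ScubaSeq {
    fn new() -> Self {
        ScubaSeq { next: AtomicU64::new(0) }
    }

    // Each sample built by this instance gets the next sequence number.
    fn next_seq(&self) -> u64 {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}
```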
Reviewed By: krallin
Differential Revision: D22728880
fbshipit-source-id: 854bde51c7bfc469677ad08bb738e5097cb05ad5
Summary:
We have two deficiencies to correct in here; modernise the code without changing behaviour first to make it easier to later fix them.
Deficiency 1 is that we always call the `on_put` handler; we need a mode that doesn't do that unless a blobstore returns an error, for jobs not waiting on a human.
Deficiency 2 is that we accept a write after one blobstore accepts it; there's a risk of that being the only copy if a certain set of race conditions is met.
Reviewed By: StanislavGlebik
Differential Revision: D22701961
fbshipit-source-id: 0990d3229153cec403717fcd4383abcdf7a52e58
Summary:
As in title.
Since we haven't tested it much yet, I've added a note that this feature is
experimental.
Reviewed By: krallin
Differential Revision: D22760648
fbshipit-source-id: 33f858b0021939dabbe1894b08bd495464ad0f63
Summary:
Move changeset_fetcher building to a separate function, because
build_skiplist_index is already rather large and I'm going to make it larger in
the next diff
Reviewed By: krallin
Differential Revision: D22760556
fbshipit-source-id: 800baba052f46ed817f011f71dd28d40e98245fe
Summary:
Currently our skiplists store a skip edge for almost all public commits. This
is problematic for a few reasons:
1) It uses more memory
2) It increases the startup time
3) It makes startup flakier. We've noticed a few times that our backend storage
returns errors more often when we try to download large blobs.
Let's change the way we build skiplists. Let's not index every public changeset
we have, but rather index it smarter. See comments for more details.
Reviewed By: farnz
Differential Revision: D22500300
fbshipit-source-id: 7e9c887595ba11da80233767dad4ec177d933f72
Summary:
This adds `megarepolib` support for pre-merge "delete" commits creation.
Please see `create_sharded_delete_commits` docstring for explanation of what
these "delete" commits are.
This functionality is not used anywhere yet (I intend to use it from
`megarepotool`), so I've added what I think is a reasonable test coverage.
Reviewed By: StanislavGlebik
Differential Revision: D22724946
fbshipit-source-id: a8144c47b92cb209bb1d0799f8df93450c3ef29f
Summary:
Pull Request resolved: https://github.com/facebookexperimental/eden/pull/32
This parsing uses the standard "subject name" field of an x509 certificate to create a MononokeIdentity.
Reviewed By: farnz
Differential Revision: D22627150
fbshipit-source-id: 7f4bfc87dc2088bed44f95dd224ea8cdecc61886
Summary: If fsnodes point to non-existent content we should be able to detect that.
Reviewed By: farnz
Differential Revision: D22723866
fbshipit-source-id: 31510aada5e21109b498a26e28e0f6f3b7358ec4
Summary: A few projects are out of sync between TARGETS and Cargo.toml.
Reviewed By: dtolnay
Differential Revision: D22704460
fbshipit-source-id: 3d809292d50cc42cfbc4973f7b26af38d931121f
Summary: It's nice to have this flag available
Reviewed By: krallin
Differential Revision: D22693732
fbshipit-source-id: 9d0d8f44cb0f5f7263a33e86e9c5b8a9927c0c85
Summary: Now the terminator argument is unused - we can get rid of it.
Differential Revision: D22502594
fbshipit-source-id: e8ecec01002421baee38be0c7e048d08068f2d74
Summary:
`until_timestamp` will benefit from checking each node - this will allow for
less filtering on the caller side.
Differential Revision: D22502595
fbshipit-source-id: 23c574b0a4eeb700cf7ea8c1ea29e3a6520097a9
Summary:
This new trait is going to replace the `Terminator` argument to the fastlog
traversal function. Instead of deciding whether or not to fetch a given fastlog
batch, this trait allows us to make decisions based on each visited changeset.
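A sketch of the shape such a trait could take (the names and the bare timestamp argument are assumptions for illustration):

```rust
// Unlike a Terminator-style check over a whole fastlog batch, a visitor
// is consulted for every changeset reached by the traversal.
trait Visitor {
    // Return true to keep traversing past this changeset.
    fn visit(&self, changeset_timestamp: u64) -> bool;
}

// e.g. an `until_timestamp` filter can now stop on the exact node.
struct UntilTimestamp {
    cutoff: u64,
}

impl Visitor for UntilTimestamp {
    fn visit(&self, changeset_timestamp: u64) -> bool {
        changeset_timestamp >= self.cutoff
    }
}
```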
Differential Revision: D22502590
fbshipit-source-id: 19f9218958604b2bcb68203c9646b3c9b433541d
Summary:
The function for finding the commit where the file was deleted
in the fastlog module doesn't depend on fastlog at all.
It also seems generic enough to be a good public API for deleted files
manifests module.
Differential Revision: D22502596
fbshipit-source-id: 2e226bf14339da886668ee8e3402a08e8120266e
Summary:
Let's centralize the logic that adds new nodes to the BFS queue during fastlog
traversal; this will allow me to hook into it in the next diffs.
Differential Revision: D22502593
fbshipit-source-id: 63f4e7adb3a7e11b4a2b2dcc65cab3bb4bf6f015
Summary:
This new skiplist feature allows us to find merges between any two related
commits.
Reviewed By: StanislavGlebik
Differential Revision: D22457894
fbshipit-source-id: 203d43588040759b89a895395058a21c9b5ca43d
Summary: I'm planning to use it in my next diff to power `find_merges` functionality.
Reviewed By: StanislavGlebik
Differential Revision: D22457898
fbshipit-source-id: 76c3f107fd8b5bbef96e978037be31efca0f9841
Summary:
The `process_frontier` function is THE function that powers skiplist traversal
and it's quite complex. To make the core more readable I'm moving part of the
code into separate functions.
Differential Revision: D22457896
fbshipit-source-id: e3521855ae7ab889c21d7aff0204e27dc23cf906
Summary:
The `process_frontier` function is THE function that powers skiplist traversal
and it's quite complex. To make the core more readable I'm moving parts of the
code into separate functions.
I'm also planning to use the single step function to simplify lowest common
ancestor algorithm later.
Differential Revision: D22457895
fbshipit-source-id: 1234118705ca6b1b61e09fdd7867ce4366045a28