Summary:
We're currently not emitting any copy metadata in LFS in response to getfiles. As it happens, this appears to be fine (clients still get the right metadata), because gettreepack includes file copy data, and so the client isn't bothered by the fact that the LFS pointers we return don't include it.
That said, it's probably a good idea to clean this up and include copy data. After all, we have it, and we should return to the client whatever they sent us.
Reviewed By: farnz
Differential Revision: D16282793
fbshipit-source-id: 6671f87de2acae2c5da1c754510bbb61de5768d3
Summary: We are fetching envelopes twice in repo_commit. This updates the code to provide a shared abstraction over file and manifest envelopes in order to avoid the duplicate fetch.
Reviewed By: farnz
Differential Revision: D16282794
fbshipit-source-id: 53d53f28f051cbacbc1c931b2dbb98ac74a19c15
Summary:
In repo_commit, we already leverage a file or manifest's envelope to get its parents, but we were getting the copy info from the raw filenode bytes (through a call to `get_raw_content`, which for a filenode eventually calls into `fetch_raw_filenode_bytes`: that fetches the content from the blobstore, retrieves the metadata from the envelope, and then discards the contents, retaining only the copy info).
This updates this code to just get the metadata from the envelope in order to retrieve copy info, which is functionally equivalent, but avoids a fetch to the blobstore.
There's still some potential for improvement here, since we're duplicating the calls to get the envelope for a file. I'm fixing this up in a separate diff for clarity.
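To illustrate the idea with hypothetical types (the real envelope API differs), the copy info lives in the envelope's metadata, so it can be read without ever loading the content blob from the blobstore:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a file envelope: metadata plus a pointer to the
// content, with the content itself never loaded.
struct FileEnvelope {
    metadata: Vec<u8>,  // e.g. "copy: path/to/source\ncopyrev: abc123\n"
    content_id: String, // pointer into the blobstore; content is not fetched
}

impl FileEnvelope {
    // Parse copy info out of the metadata alone; no blobstore fetch needed.
    fn copy_info(&self) -> Option<(String, String)> {
        let text = String::from_utf8_lossy(&self.metadata);
        let mut fields = HashMap::new();
        for line in text.lines() {
            if let Some((k, v)) = line.split_once(": ") {
                fields.insert(k.to_string(), v.to_string());
            }
        }
        match (fields.remove("copy"), fields.remove("copyrev")) {
            (Some(path), Some(rev)) => Some((path, rev)),
            _ => None,
        }
    }
}
```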
Reviewed By: farnz
Differential Revision: D16282795
fbshipit-source-id: eb9068dcc0d36c4a9c0046e856db3fb2e9a3748f
Summary:
Instantiating a new DB connection may require remote calls to be made to e.g. Hipster to allocate a new certificate (this is only the case when connecting to MySQL).
Currently, our bindings to our underlying DB locator make a blocking call to pretend that this operation is synchronous: https://fburl.com/ytmljxkb
This isn't ideal, because this call might actually take time, and we might also occasionally want to retry it (we've had issues in our MySQL tests with acquiring certificates that retrying should resolve). Running this synchronously makes doing so inefficient.
This patch doesn't update that, but it fixes everything on the Rust side of things to stop expecting connections to return a `Result` (and to start expecting a Future instead).
In a follow up diff, I'll work on making the changes in common/rust/sql to start returning a Future here.
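A minimal sketch of the retry behavior this unlocks (synchronous here for brevity; names are illustrative, not the real common/rust/sql API):

```rust
use std::{thread, time::Duration};

// Hypothetical retry wrapper: once connection setup is no longer a blocking
// one-shot call, transient failures such as certificate-acquisition errors
// can be retried with a backoff instead of failing the whole operation.
fn connect_with_retry<T, E>(
    mut try_connect: impl FnMut() -> Result<T, E>,
    max_attempts: u32,
    backoff: Duration,
) -> Result<T, E> {
    let mut last_err = None;
    for _ in 0..max_attempts {
        match try_connect() {
            Ok(conn) => return Ok(conn),
            Err(e) => {
                last_err = Some(e);
                thread::sleep(backoff);
            }
        }
    }
    Err(last_err.expect("max_attempts must be at least 1"))
}
```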
Reviewed By: StanislavGlebik
Differential Revision: D16221857
fbshipit-source-id: 263f9237ff9394477c65e455de91b19a9de24a20
Summary:
This updates `find_file_changes` in bonsai generation to avoid content fetches. Until now, we were fetching content to see how big it is and compute its content ID, and sometimes we were also fetching the envelope to access copy info.
This updates the logic to instead unconditionally fetch the envelope, and not fetch the content at all.
Reviewed By: farnz
Differential Revision: D16262271
fbshipit-source-id: 5b299fe2769d3b8b584da2a7551434b7872396b2
Summary:
Before this diff, the `RepoBlobstore` type alias (and the newly-added
`RepoBlobstoreArgs` struct) lived in `blobrepo/blob_changeset` crate, which is
obviously the most correct place for them to be located. All would be fine, had
these things been used locally only. But `RepoBlobstore` is a reasonably
widely-used type alias across our codebase and importing it from
`blob_changeset` seems weird. Let's move it into a dedicated crate.
Reviewed By: StanislavGlebik
Differential Revision: D16174126
fbshipit-source-id: b83e345adcfe567e4a67c8a1621f3a789fab63c6
Summary:
This diff does two things:
- resolves a problem with dropping censorship information when calling
`in_memory_writes_READ_DOC_COMMENT`
- prevents someone from accidentally creating a `BlobRepo` where internal blobstore's prefix is different from the `repoid`. While prefix is conceptually unrelated to a blobstore, we do care that existing blobstores continue to work, so we need this safeguard.
Reviewed By: farnz
Differential Revision: D16163225
fbshipit-source-id: fc1c9d4dc32f6958b4b0e2e61026c1f3fe5f3b17
Summary: `MemoryManifest` is no longer used anywhere
Reviewed By: farnz
Differential Revision: D16148769
fbshipit-source-id: f4e63fed3d56ade122560d271bd5701110c5dab8
Summary:
This diff sets two Rust lints to warn in fbcode:
```
[rust]
warn_lints = bare_trait_objects, ellipsis_inclusive_range_patterns
```
and fixes occurrences of those warnings within common/rust, hg, and mononoke.
Both of these lints are set to warn by default starting with rustc 1.37. Enabling them early avoids writing even more new code that needs to be fixed when we pull in 1.37 in six weeks.
Upstream tracking issue: https://github.com/rust-lang/rust/issues/54910
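For reference, the fixes these two lints ask for look like this:

```rust
// bare_trait_objects: trait objects must be written with `dyn`.
trait Greet {
    fn greet(&self) -> String;
}

struct World;
impl Greet for World {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

// Was `Box<Greet>`; the lint asks for `Box<dyn Greet>`.
fn boxed() -> Box<dyn Greet> {
    Box::new(World)
}

// ellipsis_inclusive_range_patterns: `...` in patterns becomes `..=`.
fn classify(b: u8) -> &'static str {
    match b {
        // Was `0...9`.
        0..=9 => "digit-sized",
        _ => "large",
    }
}
```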
Reviewed By: Imxset21
Differential Revision: D16200291
fbshipit-source-id: aca11a7a944e9fa95f94e226b52f6f053b97ec74
Summary:
My D16132403 introduced an inefficiency: we always fetched the changeset from
the blobstore, even when just the id was needed.
This diff fixes it.
Reviewed By: krallin
Differential Revision: D16182140
fbshipit-source-id: 7dd4c871b8b1c19e50ba1e813d5fed42c699d064
Summary:
Report to Scuba whenever someone tries to access a blob which is blacklisted. Scuba reporting is done for any `get` or `put` method call.
Because of possible overload (given the high number of requests Mononoke receives, and that CensoredBlobstore performs this check before the blobstore caching layer), I opted to report at most one bad request per second. If multiple requests for blacklisted blobs are made in less than one second, only the first request is reported. Not reporting all of them isn't the ideal behavior, but performance-wise it is the best solution.
NOTE: I also wrote an implementation using `RwLock` (instead of the current `AtomicI64`), but atomic variables should be faster than locks, so I gave up on that idea.
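A minimal sketch of the atomic approach (illustrative names, not the exact implementation): a single atomic timestamp, updated with compare-and-swap, decides which caller gets to report in a given second, and no lock is taken on the hot path.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

struct RateLimitedReporter {
    last_report_secs: AtomicI64,
}

impl RateLimitedReporter {
    fn new() -> Self {
        Self { last_report_secs: AtomicI64::new(0) }
    }

    // Returns true if the caller won the right to report during this second;
    // at most one caller per wall-clock second succeeds.
    fn should_report(&self, now_secs: i64) -> bool {
        let last = self.last_report_secs.load(Ordering::Relaxed);
        now_secs > last
            && self
                .last_report_secs
                .compare_exchange(last, now_secs, Ordering::Relaxed, Ordering::Relaxed)
                .is_ok()
    }
}
```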
Reviewed By: ikostia, StanislavGlebik
Differential Revision: D16108456
fbshipit-source-id: 9e5338c50a1c7d15f823a2b8af177ffdb99e399f
Summary:
Seems cleaner this way. Also allows the `admin` tool to initialize
a censored blobstore.
Differential Revision: D16154919
fbshipit-source-id: f5edacc8b8332c67f1f5dfaf9bf49b4aeaecb33a
Summary:
This one is just an extraction, done to allow access to blobstore creation
without depending on the blobrepo crate (or any part thereof).
NB: the logic of censoring is still left in `blobrepo`, since that change is not a simple copy-paste.
Differential Revision: D16153024
fbshipit-source-id: b1def5f602d0c6423f28699c32f97ceae30b080c
Summary:
Because of the recursion, this function can cause a stack overflow if there are a
lot of commits that don't have an hg changeset generated.
Let's rewrite it using loop_fn.
Note that there's "panic!()" in fetch_hg_changeset_from_mapping. While having
panics in the code is not great, there doesn't seem to be a way around it. When
generating new hg changesets it expects that parents' changesets are already in
the repository. If they are not generated then it's an internal logic error,
and in that case it's probably better to panic.
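A synchronous sketch of the recursion-to-loop rewrite (commit ids simplified to integers; `loop_fn` in futures 0.1 plays the analogous role for async code, making each iteration its own future so the stack stays flat however long the chain is):

```rust
use std::collections::HashMap;

// Instead of recursing parent-by-parent (one stack frame per commit without a
// generated hg changeset), walk the parent chain with an explicit loop.
fn find_first_generated(
    mut commit: u64,
    parents: &HashMap<u64, u64>,
    generated: &HashMap<u64, String>,
) -> Option<String> {
    loop {
        if let Some(hg) = generated.get(&commit) {
            return Some(hg.clone());
        }
        match parents.get(&commit) {
            Some(&p) => commit = p, // keep walking; no stack growth
            None => return None,
        }
    }
}
```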
Reviewed By: krallin
Differential Revision: D16132403
fbshipit-source-id: 17e48ecce653f72e3f576f3698330d90a4dd7902
Summary:
`MultiplexedBlobstore` can hide errors up until we suddenly lose availability of the wrong blobstore. Introduce an opt-in `ScrubBlobstore`, which functions as a `MultiplexedBlobstore` but checks that the combination of blobstores and healer queue should result in no data loss.
Use this new blobstore in the blobrepo checker, so that we can be confident that data is safe.
Later, this blobstore should trigger the healer to fix "obvious" problems.
Reviewed By: krallin
Differential Revision: D15353422
fbshipit-source-id: 83bb73261f8ae291285890324473f5fc078a4a87
Summary:
Added an option to control for which repositories censoring is enabled or
disabled. The option is added in `server.toml` as `censoring` and is set to
true or false. If `censoring` is not specified, the default is true
(censoring is enabled).
Disabling `censoring` skips the check of whether a key is blacklisted,
therefore all files are fetchable.
Reviewed By: ikostia
Differential Revision: D16029509
fbshipit-source-id: e9822c917fbcec3b3683d0e3619d0ef340a44926
Summary:
CensoredBlob was placed between Blobstore and PrefixBlobstore. I moved CensoredBlob so that it is now a wrapper around PrefixBlobstore. This means the key is compared before the `repoid` is appended to it.
Moving CensoredBlob on top of PrefixBlobstore provides better isolation for the existing blobstores. This way CensoredBlob does not interact with the underlying layers, and future changes to those layers will most probably not impact CensoredBlob's implementation.
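A minimal sketch of the new layering (simplified trait and names; the real `Blobstore` API is async and richer). The censoring check sees the raw key, before the prefixing layer prepends the repoid:

```rust
use std::collections::{HashMap, HashSet};

trait Blobstore {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, String>;
}

// In-memory leaf store, standing in for the real backing blobstore.
struct MemBlob(HashMap<String, Vec<u8>>);
impl Blobstore for MemBlob {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, String> {
        Ok(self.0.get(key).cloned())
    }
}

// Prepends the repoid-derived prefix before hitting the inner store.
struct PrefixBlobstore<T> { prefix: String, inner: T }
impl<T: Blobstore> Blobstore for PrefixBlobstore<T> {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, String> {
        self.inner.get(&format!("{}{}", self.prefix, key))
    }
}

// Outermost layer: rejects blacklisted keys before any prefixing happens.
struct CensoredBlob<T> { censored: HashSet<String>, inner: T }
impl<T: Blobstore> Blobstore for CensoredBlob<T> {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, String> {
        if self.censored.contains(key) {
            return Err(format!("key {} is blacklisted", key));
        }
        self.inner.get(key)
    }
}
```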
Reviewed By: ikostia
Differential Revision: D15900610
fbshipit-source-id: 391594355d766f43638f3152b56d4e9acf49af32
Summary: After using it for some time, I found it more convenient to introduce an additional parameter `OutCtx`, which represents all the information about a node needed for the `fold` operation, instead of using `In`. Note this implementation is strictly more general, and isomorphic to the previous one if `OutCtx == In`.
Reviewed By: farnz
Differential Revision: D16029193
fbshipit-source-id: f562baa023d737ce1db2936987f6c59bcd0c3761
Summary: This diff moves content SHA1 computation to use the new `CacheManager` so it can benefit from caching stuff in memcache.
Reviewed By: krallin
Differential Revision: D15978549
fbshipit-source-id: f05f1a1382e291dc08989cf11be752720e158888
Summary:
Currently, we implicitly expect that caching is enabled if we're dealing with a remote repository, but that means cachelib must be enabled when running with a remote repository, and that is ... slow.
This can be problematic in two cases:
In tests. It makes MySQL tests unbearably slow, and a little more flaky because we end up using so much CPU. With this patch, MySQL tests remain slower than SQLite tests, but by a factor of < 2, which is a pretty substantial improvement.
Running trivial administrative commands (e.g. a `mononoke_admin`), notably using a dev build (which is right now unbearably slow). With this patch, such a trivial command is about 6x faster:
```
[torozco@devvm4998.lla1 ~/fbcode] time buck-out/gen/scm/mononoke/admin#binary/admin --repo-id 2102 --mononoke-config-path /home/torozco/local/.mononoke_exec/config/PROD --skip-caching bookmarks list --kind publishing
Jun 21 08:57:36.658 INFO using repo "instagram-server" repoid RepositoryId(2102)
master c96ac4654e4d2da45a9597af859adeac9dba3d7ca964cb42e5c96bc153f185e3 2c5713ad27262b91bf1dfaf21b3cf34fe3926c8d
real 0m5.299s
user 0m5.097s
sys 0m0.699s
[torozco@devvm4998.lla1 ~/fbcode] time buck-out/gen/scm/mononoke/admin#binary/admin --repo-id 2102 --mononoke-config-path /home/torozco/local/.mononoke_exec/config/PROD bookmarks list --kind publishing
I0621 08:57:59.299988 1181997 CacheAllocator-inl.h:3123] Started worker 'PoolRebalancer'
Jun 21 08:57:59.328 INFO using repo "instagram-server" repoid RepositoryId(2102)
master c96ac4654e4d2da45a9597af859adeac9dba3d7ca964cb42e5c96bc153f185e3 2c5713ad27262b91bf1dfaf21b3cf34fe3926c8d
real 0m28.620s
user 0m27.680s
sys 0m2.466s
```
This is also nice because it means the MySQL tests won't talk to Memcache anymore.
---
Note: in this refactor, I made `Caching` an enum so it can't accidentally be swapped with some other boolean.
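A sketch of that enum (illustrative usage, not the exact definition): unlike a bare `bool`, an enum makes call sites self-documenting and can't be silently swapped with some other boolean argument.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Caching {
    Enabled,
    Disabled,
}

// Hypothetical call site: the intent is readable at a glance, and passing a
// different boolean-like value here is a type error.
fn open_repo(caching: Caching) -> String {
    match caching {
        Caching::Enabled => "initializing cachelib + memcache".to_string(),
        Caching::Disabled => "skipping cache initialization".to_string(),
    }
}
```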
---
Finally, it also uses up quite a bit less RAM (we no longer need 2GB of RAM to output one line of bookmarks — although we're still using quite a bit!):
```
[torozco@devvm4998.lla1 ~/fbcode] env time buck-out/gen/scm/mononoke/admin#binary/admin --skip-caching --repo-id 2102 --mononoke-config-path /home/torozco/local/.mononoke_exec/config/PROD bookmarks list --kind publishing
Jun 21 09:18:36.074 INFO using repo "instagram-server" repoid RepositoryId(2102)
master abdd2f78dafeaa8d4b96897955a63844b31324f9d89176b3a62088d0e2ae2b22 1702392d14bf7a332bf081518cb1ea3c83a13c39
5.08user 0.68system 0:05.28elapsed 109%CPU (0avgtext+0avgdata 728024maxresident)k
6776inputs+0outputs (8major+115477minor)pagefaults 0swaps
[torozco@devvm4998.lla1 ~/fbcode] env time buck-out/gen/scm/mononoke/admin#binary/admin --repo-id 2102 --mononoke-config-path /home/torozco/local/.mononoke_exec/config/PROD bookmarks list --kind publishing
I0621 09:19:01.385933 1244489 CacheAllocator-inl.h:3123] Started worker 'PoolRebalancer'
Jun 21 09:19:01.412 INFO using repo "instagram-server" repoid RepositoryId(2102)
master abdd2f78dafeaa8d4b96897955a63844b31324f9d89176b3a62088d0e2ae2b22 1702392d14bf7a332bf081518cb1ea3c83a13c39
26.96user 2.27system 0:27.93elapsed 104%CPU (0avgtext+0avgdata 2317716maxresident)k
11416inputs+5384outputs (17major+605118minor)pagefaults 0swaps
```
Reviewed By: farnz
Differential Revision: D15941644
fbshipit-source-id: 0df4a74ccd0220a786ebf0e883e1a9b8aab0a647
Summary:
Quite surprisingly, this diff saves quite a lot of memory on large gettreepack
requests - >100 Mb per request! So in total it might save gigabytes if a few
big gettreepack requests were sent at once.
This result is surprising and I found it almost accidentally. It's hard to
clearly explain why it helps with memory usage, but my current thinking is that
previously it eagerly fetched a lot of manifests and kept all of them in
memory.
That's not a great explanation, but the diff should be fairly safe to land
anyway.
Reviewed By: aslpavel, farnz
Differential Revision: D15959914
fbshipit-source-id: aee534a981dd79a52111a5123a71f90a6d5af79b
Summary: Add type safety to `abomonation_future_cache` by requiring usage of `VolatileLruCachePool`, and make that change for all usages of `LruCachePool`.
Reviewed By: farnz
Differential Revision: D15882275
fbshipit-source-id: 3f192142af254d7b6b8ea7f9cc586c2034c97b93
Summary: We are copying `MPath` a lot; this change should reduce the number of allocations and improve memory reuse for `MPath`.
Reviewed By: farnz
Differential Revision: D15939495
fbshipit-source-id: 8da8f2c38f7b46f27d0df661210c9964aed52101
Summary: Since we own `In` elements, `unfold` can take a mutable reference instead of an immutable one. This can simplify some code, see `BlobRepo::find_entries_in_manifest` for example.
Reviewed By: farnz
Differential Revision: D15939878
fbshipit-source-id: 9c767240bec279f24f922e0771ac919072b3a56c
Summary:
If a file has two parent filenodes and one of them is an ancestor of the other, then
we want to keep only the descendant filenode as a parent.
Note that this diff doesn't fix all the corner cases. In the integration tests, a file that has
two parents is modified in a merge commit. If it weren't modified in the merge
commit, Mercurial would produce a hash different from Mononoke's. I'll
investigate why that happens.
Note that we had incorrect hashes in our test fixtures - this diff fixes them as well.
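The parent-selection rule can be sketched as follows (hypothetical `is_ancestor` helper; commit ids simplified to integers):

```rust
// If one parent is an ancestor of the other, keep only the descendant;
// otherwise keep both parents.
// is_ancestor(a, b) answers: is `a` an ancestor of `b`?
fn select_parents(
    p1: u64,
    p2: u64,
    is_ancestor: impl Fn(u64, u64) -> bool,
) -> Vec<u64> {
    if is_ancestor(p1, p2) {
        vec![p2] // p2 descends from p1: drop p1
    } else if is_ancestor(p2, p1) {
        vec![p1] // p1 descends from p2: drop p2
    } else {
        vec![p1, p2] // genuinely divergent parents: keep both
    }
}
```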
Reviewed By: farnz
Differential Revision: D15896735
fbshipit-source-id: ea31071bc69fab02935887c665f6d03b64d5c572
Summary:
In the next diffs it will be used inside blobrepo to correctly generate hg
filenodes. This diff just moves the functions to a common place.
There are a few cleanups that can be done - see next diff.
Reviewed By: aslpavel
Differential Revision: D15896733
fbshipit-source-id: 19b58e4eb73fd4897d7c8ab28c0dcd8786124f2a
Summary:
New name better reflects what this function does - it might return cached
version of filenodes that might be out of date.
Reviewed By: aslpavel
Differential Revision: D15896734
fbshipit-source-id: caf4f1d3a9a29889327c3373ac886687ec916903
Summary: It updates SqlConstructors to expose a `with_xdb` method that accepts an optional myrouter port.
Reviewed By: krallin
Differential Revision: D15897639
fbshipit-source-id: 25047c24ef28c76d2a27a8d26de8ecad521a1f82
Summary:
This change optimizes the performance of `store_file_changes` by fixing the following issues:
- do not traverse the manifest if the change is a **removal** of the file
- implement `find_files_in_manifest` using `bounded_traversal` to reduce the number of manifest traversals needed
- resolve `copy_from` during parent traversal
Reviewed By: StanislavGlebik
Differential Revision: D15840540
fbshipit-source-id: e4de61aaa6c6e56873b3938bdfabf7588fb82447
Summary: `HgEntryId` is much more useful in a typed form (an enum of `(FileType, HgFileNodeId)` and `HgManifestId`); most of the time we know which type an entry should contain, and this makes it harder to make an error. All other use cases which require just a hash should use `HgNodeHash` instead. This diff includes the minimal changes necessary to make it work. Some of the cases which do a sequence of `Entry::get_hash().into_nondehash() -> HgManifestId::new() | HgFileNodeId::new()` are left for future diffs.
Reviewed By: farnz
Differential Revision: D15866081
fbshipit-source-id: 5be9ecc30dbfd0a49ae6c5d084cdfe2dac351dac
Summary:
Apiserver was failing to start up after D15762598. In that diff we started to
read censored blobs from the xdb table. The problem was that in apiserver we
didn't wait for myrouter to start up before reading censored blobs from the
table.
This diff fixes it
Reviewed By: krallin
Differential Revision: D15873817
fbshipit-source-id: 98550289a3c135c5f4a472206c6654e818ed54f4
Summary: Use the SqlCensoredContentStore service to populate the hashmap of blacklisted blobs at startup. Every get and put call on a CensoredBlob instance checks the hash key against the existing keys in the hashmap. It displays a helpful error in case the hash key is censored; otherwise, it follows the normal behavior.
Reviewed By: ikostia
Differential Revision: D15762598
fbshipit-source-id: 03a2d19f06934e8207f5294140c38a80b67a3c3d
Summary:
Implemented a service which retrieves the censored blobs from the
database. I am working on writing the integration test for the service before using it in production (the next diff will contain the integration test).
In addition, I've changed the unit tests for get and put methods (they now verify the type of error returned, together with the task identifiers).
Reviewed By: krallin
Differential Revision: D15577325
fbshipit-source-id: 0ffe1b77f41aa9ca0efe686348ea885714a6cf25
Summary:
Mercurial has complicated logic for finding filenode parents, and this diff
implements one of the tricky cases.
If a file existed and it was copied over (with `hg cp --force`) then Mononoke
set both p1 and copy information. Mercurial though just sets copy information.
This diff implements the same logic in Mononoke (note that for now it's only
for non-merge cases).
Reviewed By: krallin
Differential Revision: D15737250
fbshipit-source-id: 0b2117097801a7137ccc7382477452a0b7bf4504
Summary:
Added a hashset of keys, used to check whether the current key
is blacklisted.
Reviewed By: StanislavGlebik
Differential Revision: D15516498
fbshipit-source-id: ed7bdc0b824810b0cbcfb9474ff7c3983d8c4759
Summary:
This adds a sanity check that limits the count of matches in `list_all_bookmarks_with_prefix`.
If we find more matches than the limit, then an error will be returned (right now, we don't have support for e.g. offsets in this functionality, so the only alternative approach is for the caller to retry with a more specific pattern).
The underlying goal is to ensure that we don't trivially expose Mononoke to accidental denial of service when a client lists `*` and we end up querying literally all bookmarks.
I picked a fairly conservative limit here (500,000), which is > 5 times the number of bookmarks we currently have (we can load what we have right now successfully... but it's pretty slow).
Note that listing pull default bookmarks is not affected by this limit: this limit is only used when our query includes scratch bookmarks.
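A minimal sketch of the sanity check (illustrative signature): take at most `limit + 1` matches, and surface an error rather than materializing an unbounded result set.

```rust
// Returns the matches, or an error if the pattern matched more than `limit`.
// Taking limit + 1 items lets us detect overflow without consuming everything.
fn list_with_limit<I: Iterator<Item = String>>(
    matches: I,
    limit: usize,
) -> Result<Vec<String>, String> {
    let out: Vec<String> = matches.take(limit + 1).collect();
    if out.len() > limit {
        return Err(format!("too many matches (limit: {})", limit));
    }
    Ok(out)
}
```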
Reviewed By: StanislavGlebik
Differential Revision: D15413620
fbshipit-source-id: 1030204010d78a53372049ff282470cdc8187820
Summary:
This adds support for recording server-side whether a given bookmark is publishing and / or pull-default. This is a change on the way towards supporting Infinite Push in Mononoke. This diff will require schema changes on `xdb.mononoke_production`.
There isn't a whole lot of new user-facing functionality in this particular diff. For starters, nobody can edit this flag on bookmarks, and pushes that create a new bookmark will always result in setting a bookmark as publishing AND pull_default.
What this change does however introduce is the notion of `BookmarkHgKind`, which represents the behavior of a bookmark as far as Mercurial operations are concerned.
There are 3 such kinds, which are relevant in different parts of the codebase:
- PullDefault - this is useful when we want to respond to listkeys queries.
- Publishing - this is useful when we want to identify commit Phases.
- All - this is useful when we want to respond to listkeyspatterns.
Note that only the first two groups are cached in CachedBookmarks.
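A simplified model of the three groups (names are illustrative; the real `BookmarkHgKind` differs): bookmarks carry publishing / pull-default flags, and each query group selects a superset of the previous one.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct BookmarkFlags {
    publishing: bool,
    pull_default: bool,
}

// The three query groups described above.
enum HgKindQuery {
    PullDefault, // for listkeys
    Publishing,  // for computing phases
    All,         // for listkeyspatterns; includes scratch bookmarks
}

// Does a bookmark with the given flags belong to the queried group?
fn in_group(flags: BookmarkFlags, query: HgKindQuery) -> bool {
    match query {
        HgKindQuery::All => true,
        HgKindQuery::Publishing => flags.publishing,
        HgKindQuery::PullDefault => flags.pull_default,
    }
}
```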
---
There are a few things going on in this diff (which logically have to happen together):
- First, I updated the `Bookmarks` trait and its various implementations to expose new methods to select Bookmarks based on their hg kind. There's one method per hg kind, and all the methods use a `Freshness` parameter to determine whether to hit cache or not.
- Second, I refactored the various bookmark-access methods in blobrepo. A few of them were duplicative of each other, and some were unused, which became a bigger problem now that we had more (for publishing, pull default, etc.). We are now down to just one method that doesn't hit cache (which is used only by the blobimport script and some tests — perhaps this could be changed?).
- Third, I updated the call sites for the methods that were updated in step 2 to use the proper method for their use case.
---
Here's a summary of where each method is used (I'm only listing stuff that isn't unit tests):
- `get_bonsai_heads_maybe_stale`:
- `SqlPhases`'s `get_public`
- `build_skiplist_index`
- `get_bonsai_publishing_bookmarks_most_recent`:
- Blobimport (perhaps we can update this to a `maybe_stale` read?)
- `get_pull_default_bookmarks_maybe_stale`
- `listkeys` implementations
- `get_publishing_bookmarks_maybe_stale`
- API Server's `get_branches`
- `get_bookmarks_by_prefix_maybe_stale`:
- `listkeyspatterns` implementation
---
As an aside, note that a lot of the code changes in this diff are actually in the CachedBookmarks tests — I had to update those to minimize sleeps: it was fine to sleep in the earlier tests, but I introduced new quickcheck-based tests, and sleeping in those isn't ideal.
Reviewed By: StanislavGlebik
Differential Revision: D15298598
fbshipit-source-id: 4f0fe80ea98dea8af0c8289db4f6f4e4070af622
Summary:
Wrapped Blobstore in a CensoredBlob, with the purpose of checking for
censored commits.
Reviewed By: StanislavGlebik
Differential Revision: D15432210
fbshipit-source-id: 7136196d6ac8e03faabe7c9568c47f77cfe6e801
Summary:
There seems to be a deadlock, see attached task for details.
Disabling shouldn't do a lot of harm because we still have memcache
Reviewed By: jsgf, farnz
Differential Revision: D15416236
fbshipit-source-id: 45fb966bb4644d3596e2ae1253a1670b4d3e305b
Summary:
As part of adding support for infinitepush in Mononoke, we'll include additional server-side metadata on Bookmarks (specifically, whether they are publishing and pull-default).
However, we currently use the name `Bookmark` to refer to just a bookmark name. This patch updates all references to `Bookmark` to `BookmarkName` in order to free up `Bookmark`.
Reviewed By: StanislavGlebik
Differential Revision: D15364674
fbshipit-source-id: 126142e24e4361c19d1a6e20daa28bc793fb8686
Summary:
`Phases` currently has a very ugly API, which is a constant source of confusion. I've made the following changes:
- only return/cache public phases
- do not require `public_heads` and always request them from `BlobRepo::get_bonsai_heads_maybe_stale`
- consolidated `HintPhases` and `SqlPhases`
- removed `Hint` from type names, since it does not carry any meaning
- fixed all affected callsites
Reviewed By: StanislavGlebik
Differential Revision: D15344092
fbshipit-source-id: 848245a58a4e34e481706dbcea23450f3c43810b
Summary:
We've recently hit a stack overflow issue while trying to push or blobimport a
lot of commits.
Part of the stacktrace:
{P64639044}
I matched the stacktrace to CreateChangeset::create() method. When we try to
import a lot of commits we get a future that represents a long stack of
commits. Polling this future results in deep recursion.
It's not clear why we didn't have these stack overflows before. Regardless of
that using `spawn_future()` helps prevent stackoverflow because it creates a
new task for each commit and uses oneshot::channel to connect them, which avoids
recursion.
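A sketch of the idea using threads and channels in place of `spawn_future` and `oneshot::channel`: each stage runs as its own task and hands its result to the next stage over a channel, so completing a long chain never builds a deep stack of nested calls.

```rust
use std::sync::mpsc;
use std::thread;

// One "commit creation" stage: waits for its parent's result, does its work
// (here, just incrementing a counter), and forwards the result downstream.
fn spawn_stage(input: mpsc::Receiver<u64>) -> mpsc::Receiver<u64> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        if let Ok(v) = input.recv() {
            let _ = tx.send(v + 1);
        }
    });
    rx
}

// Chain `len` stages; stack depth stays constant regardless of chain length.
fn run_chain(len: u64) -> u64 {
    let (tx, mut rx) = mpsc::channel();
    for _ in 0..len {
        rx = spawn_stage(rx);
    }
    tx.send(0).unwrap();
    rx.recv().unwrap()
}
```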
Reviewed By: aslpavel
Differential Revision: D15367915
fbshipit-source-id: cf4e7981ffd66a8a0c4531516e022eb215265bc7