Summary: Move `Filenodes` to `BlobRepo::attributes` as it is mercurial specific.
Reviewed By: ikostia
Differential Revision: D21662418
fbshipit-source-id: 87648a3e6fd7382437424df3ee60e1e582b6b958
Summary: This diff introduces the `BlobRepoHg` extension trait for the `BlobRepo` object, which contains mercurial specific methods that were previously part of `BlobRepo`. This diff also starts moving some of the methods from BlobRepo to BlobRepoHg.
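The extension-trait pattern used here can be sketched as follows (a std-only sketch; the type and method names are illustrative, not the real BlobRepo/BlobRepoHg API):

```rust
// Sketch of the extension-trait pattern: Mercurial-specific methods live in
// a separate trait rather than on the core type. Names are hypothetical.
struct BlobRepo {
    name: String,
}

trait BlobRepoHg {
    // A stand-in for a mercurial-specific method moved off BlobRepo.
    fn hg_description(&self) -> String;
}

impl BlobRepoHg for BlobRepo {
    fn hg_description(&self) -> String {
        format!("hg view of repo '{}'", self.name)
    }
}

fn main() {
    let repo = BlobRepo { name: "fbsource".to_string() };
    // Callers only see the hg methods if they import the BlobRepoHg trait.
    assert_eq!(repo.hg_description(), "hg view of repo 'fbsource'");
}
```

This keeps the core type free of Mercurial dependencies while callers opt in by importing the trait.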
Reviewed By: ikostia
Differential Revision: D21659867
fbshipit-source-id: 1af992915a776f6f6e49b03e4156151741b2fca2
Summary:
This diff adds an additional field `BlobRepo::attributes` which can store attributes of arbitrary type. This will help store opaque types inside blobrepo without creating a dependency on the crate that contains the type definition for the attribute. This diff also moves `BonsaiHgMapping` inside the attributes set.
- This work will allow moving mercurial changeset generation logic to the derived data infrastructure
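A minimal, hypothetical sketch of such a type-keyed attribute set (std only; the real attributes storage in BlobRepo may differ):

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

// Sketch: values of arbitrary types can be stored and later retrieved by
// their type, so the owner needs no compile-time dependency on the crates
// defining those types.
#[derive(Default)]
struct Attributes {
    map: HashMap<TypeId, Arc<dyn Any + Send + Sync>>,
}

impl Attributes {
    fn insert<T: Any + Send + Sync>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Arc::new(value));
    }

    fn get<T: Any + Send + Sync>(&self) -> Option<Arc<T>> {
        self.map
            .get(&TypeId::of::<T>())
            .cloned()
            .and_then(|v| v.downcast::<T>().ok())
    }
}

struct BonsaiHgMapping; // stand-in for the real mapping type

fn main() {
    let mut attrs = Attributes::default();
    attrs.insert(BonsaiHgMapping);
    assert!(attrs.get::<BonsaiHgMapping>().is_some());
    assert!(attrs.get::<u32>().is_none());
}
```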
Reviewed By: ikostia
Differential Revision: D21640438
fbshipit-source-id: 3abd912e7227738a73ea9b17aabdda72a33059aa
Summary:
Tooling can't handle named_deps yet, but it can warn about them
P133451794
Reviewed By: StanislavGlebik
Differential Revision: D22083499
fbshipit-source-id: 46de533c19b13b2469e912165c1577ddb63d15cd
Summary:
Remove unused dependencies for Rust targets.
This failed to remove the dependencies in eden/scm/edenscmnative/bindings
because of the extra macro layer.
Manual edits (named_deps) and misc output in P133451794
Reviewed By: dtolnay
Differential Revision: D22083498
fbshipit-source-id: 170bbaf3c6d767e52e86152d0f34bf6daa198283
Summary: This diff resolves a TODO to make a helper function crate-private, as the only dependency (APIServer) is now gone.
Reviewed By: StanislavGlebik
Differential Revision: D21954238
fbshipit-source-id: 7cfc49534b7ab8af6035f7a87fdc763127dce318
Summary:
batch_derive() is a dangerous function to use. I'd love to delete it but this
function is very useful for backfilling, so unfortunately I can't.
The problem arises when one tries to backfill blame and unodes simultaneously
(or just derive blame which in turn derives unodes). While batch_derive()
tries to be careful with inserting "outer" derived data's mappings (i.e. blame
mapping), it doesn't do it for inner derived data mappings (i.e. unodes). So we
might end up in the situation where we insert unodes mapping before we inserted
all the manifests for it. If this thing fails in the middle of derivation then
we have a corruption.
Let's not use it in blobimport. It will make derivation slower, but I'd
rather make it slower than incorrect.
Reviewed By: farnz
Differential Revision: D21905619
fbshipit-source-id: c0227df195a8cf4482b2452ca928acbc5750b3e5
Summary:
This diff migrates the add_filenodes method to return FilenodeResult.
That means that all filenodes methods now return FilenodeResult, and it's now
time to remove the TODOs from derived_data filenodes.
Note that I had to change the "derive_disabled_filenodes" test a bit.
Previously the FilenodesOnlyPublic::mapping::get() method immediately returned
FilenodesOnlyPublic::Disabled, while now it returns None if the hg changeset is not
derived. This is an expected change in behaviour, so I just updated the test to
try to derive FilenodesOnlyPublic first, which in turn triggers generation of the hg changeset.
Reviewed By: ahornby
Differential Revision: D21904401
fbshipit-source-id: f6f4cd14e6cdce5a4b95d8f3f9acff305ae6fa88
Summary:
Similar to get_all_filenodes_maybe_stale() make this method return
FilenodeResult if filenodes are disabled.
Note: this diff adds one TODO in fetch_root_filenode, which will be removed
together with other TODOs in the next diff.
Reviewed By: ahornby
Differential Revision: D21904399
fbshipit-source-id: 1569579699c02eb07021f8143aa652aa192d23bc
Summary:
I observed that for whatever reason our setting of `use_try_shorthand = true` in rustfmt.toml was causing entire functions to not get processed by rustfmt, even in files that contain neither `try` nor `?`. Remove it and reformat fbsource.
Documentation of that config:
- https://github.com/rust-lang/rustfmt/blob/master/Configurations.md#use_try_shorthand
We don't particularly care about the value anymore because nobody writes `r#try!(...)` in 2018 edition code.
Minimized:
```
fn f() {
g(
)
// ...
.h
}
```
This function gets formatted only if use_try_shorthand is not set.
The bug is fixed in the rustfmt 2.0 release candidate.
Reviewed By: jsgf
Differential Revision: D21878162
fbshipit-source-id: b028673c0eb703984d24bf0d2983453fc2a8c212
Summary:
In the next diffs we'll make it possible to disable filenodes in Mononoke. See
D21787848 and attached task for more details, but TL;DR is that if xdb is down
we still want to serve "hg update" traffic.
If filenodes are disabled we obviously can't generate filenodes for new
commits. So one option would be to just return an error from the
FilenodesOnlyPublic::derive(...) call. But that would mean that any attempt to
call derivation would fail, and e.g. Mononoke servers wouldn't be able to start up
(see https://fburl.com/diffusion/roau028d). We could change callers to always
process errors from FilenodesOnlyPublic, but I think it would be harder to
enforce and easier to forget.
So this diff changes FilenodesOnlyPublic to be an enum, and
FilenodesOnlyPublic::Disabled is returned immediately if filenodes are
disabled. For callers it means that they can't rely on filenodes being present
in the db even after FilenodesOnlyPublic was derived. That's what the rest of the
stack is about: the next diffs will update the callers to properly deal with
missing filenodes.
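The shape of the change can be sketched like this (hedged: variant and function names are illustrative, and the real derivation is async and fallible):

```rust
// Sketch of turning a unit derived-data marker into an enum so a disabled
// backend is representable without returning an error.
#[derive(Debug, PartialEq)]
enum FilenodesOnlyPublic {
    Present,
    Disabled,
}

fn derive(filenodes_enabled: bool) -> FilenodesOnlyPublic {
    if !filenodes_enabled {
        // Disabled is returned immediately: derivation "succeeds", so
        // servers can start up even with xdb down.
        return FilenodesOnlyPublic::Disabled;
    }
    // ... real derivation would generate and store filenodes here ...
    FilenodesOnlyPublic::Present
}

fn main() {
    assert_eq!(derive(false), FilenodesOnlyPublic::Disabled);
    assert_eq!(derive(true), FilenodesOnlyPublic::Present);
}
```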
One caveat is that when we re-enable filenodes back we might need to derive
them for a lot of commits.
I don't expect it to happen often (i.e. if xdb is down then we probably can't
commit anyway), but if somehow it happened, then we should be a bit more
careful with re-enabling them after the problem was fixed. For example, we can
first derive all the filenodes locally by e.g. running backfill_derived_data,
and only after that has finished successfully can we re-enable them.
Reviewed By: krallin
Differential Revision: D21840328
fbshipit-source-id: ce9594d4a21110a5cb392c3049ccaede064c1e66
Summary:
DefferedDerivedMapping was added so that we can make deriving a stack of commits faster - it does so by postponing updates to the
derived data mapping (e.g. writing to a blobstore) until the whole stack is derived.
While it probably makes derivation a bit faster, we now think it's better to remove it. A few reasons:
1) It's confusing to understand and it has already caused us UBNs before
2) It increases write amplification - because we release the lease before we write to the blobstore, writers will try to rederive the same commit a few times. That caused us a UBN today
Reviewed By: farnz
Differential Revision: D20113854
fbshipit-source-id: 169e05febcd382334bf4da209a20aace0b7c2333
Summary:
The motivation for the whole stack:
At the moment if mysql is down then Mononoke is down as well, both for writes
and for reads. However we can relatively easily improve the situation.
During hg update client sends getpack() requests to fetch files, and currently
for each file fetch we also fetch file's linknode. However hg client knows how
to deal with null linknodes [1], so when mysql is unavailable we can disable
filenode fetching completely and just return null linknodes. So the goal of this stack is to
add a knob (i.e. a tunable) that can turn filenode fetches on and off, and make
sure the rest of the code deals nicely with this situation.
Now, about this diff. In order to force callers to deal with the fact that
filenodes might be unavailable, I suggest adding a special type of result, which (in
later diffs) will be returned by every filenodes method.
This diff just introduces FilenodeResult and converts the BlobRepo filenode
methods to return it. The reason why I converted the BlobRepo methods first
is to limit the scope of changes, but at the same time show what the callers' code will look
like after FilenodeResult is introduced, and get people's thoughts on whether
it's reasonable or not.
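A sketch of what such a result type and a caller could look like (names and the null-linknode fallback are illustrative assumptions, not the real BlobRepo API):

```rust
// Sketch of FilenodeResult: a wrapper that forces callers to handle the
// "filenodes are disabled" case explicitly.
enum FilenodeResult<T> {
    Present(T),
    Disabled,
}

fn get_linknode(filenodes_enabled: bool) -> FilenodeResult<String> {
    if !filenodes_enabled {
        return FilenodeResult::Disabled;
    }
    FilenodeResult::Present("abc123".to_string())
}

fn main() {
    // A caller must decide what a missing linknode means; here we fall back
    // to a null linknode, as the hg client can cope with those.
    let linknode = match get_linknode(false) {
        FilenodeResult::Present(l) => l,
        FilenodeResult::Disabled => {
            "0000000000000000000000000000000000000000".to_string()
        }
    };
    println!("{}", linknode);
}
```

Unlike a plain `Result`, this type can't be bubbled up with `?`, so the disabled case must be handled at each call site.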
Another important change I'd like to introduce in the next diffs is modifying FilenodesOnlyPublic
derived data to return success if filenodes knob is off. If we don't do that
then any attempt to derive filenodes might fail which in turn would lead to the
same problem we have right now - people won't be able to do hg update/hg
pull/etc if mysql is down.
[1] null linknodes might make some client side operation slower (e.g. hg rebase/log/blame),
so we should use it only in sev-like situations
Reviewed By: krallin
Differential Revision: D21787848
fbshipit-source-id: ad48d5556e995af09295fa43118ff8e3c2b0e53e
Summary:
now fastlog will be able to show the history of a file like this:
```
Added
|
Deleted
|
Added
```
We do it by using the deleted file manifest to find where a file was deleted. We do
it when we find a unode which doesn't have a parent.
Differential Revision: D21664248
fbshipit-source-id: 8e80fccc87361e66e4e00ae2da3671ed8327f1f8
Summary:
... or rather make this special case a bit smaller.
In fastlog we have a `prefetch` node which was already returned to the caller,
but whose parents haven't been processed yet. Previously we had two separate
places where these parents were processed; now we have only one.
This will make it easier to add support for history across deletions in the
next diff.
Differential Revision: D21664246
fbshipit-source-id: a5a109fafaeec28a0e7a36ffa44545aebc032b00
Summary:
Previously we created `visited` and immediately dropped it.
Let's instead create it one level up. I don't think it matters in practice,
but it should be cleaner.
Reviewed By: HarveyHunt
Differential Revision: D21662456
fbshipit-source-id: 36823309ea09448c5be10c2655ea0f317649f290
Summary:
subcommand_single calls `derived_data_utils.regenerate(vec![cs_id])` with the
intention that derived data for this commit will be regenerated. However
previously it didn't work because DerivedDataUtils::derive() was ignoring the
regenerate parameter. This diff fixes it.
Reviewed By: krallin
Differential Revision: D21527344
fbshipit-source-id: 56d93135071a7f3789262b7a9d9ad84a0896c895
Summary:
This diff logs the delay in deriving data. In particular it logs how much time
has passed since an underived commit was created.
Note that this code makes an assumption about monotonic dates - for repos with pushrebase
that should be the case.
Reviewed By: krallin
Differential Revision: D21427265
fbshipit-source-id: bfddf594467dfd2424f711f895275fb54a4e1c60
Summary: Small refactoring that will make the next diffs easier
Differential Revision: D21426166
fbshipit-source-id: f3c3ae00794046828eaf3c0912dbabc233c97e77
Summary:
- Change get return value for `Blobstore` from `BlobstoreBytes` to `BlobstoreGetData` which include `ctime` metadata
- Update the call sites and tests broken due to this change
- Change `ScrubHandler::on_repair` to accept metadata and log ctime
- `Fileblob` and `Manifoldblob` attach the ctime metadata
- Tests for fileblob in `mononoke:blobstore-test` and integration test `test-walker-scrub-blobstore.t`
- Make cachelib based caching use `BlobstoreGetData`
Reviewed By: ahornby
Differential Revision: D21094023
fbshipit-source-id: dc597e888eac2098c0e50d06e80ee180b4f3e069
Summary:
We have a number of error enums that wrap existing errors, but fail to
register the underlying error as a `#[source]`. This results in truncated
context chains when we print the error. This fixes that. It also removes a
bunch of manual `From` implementations that can be provided by thiserror's
`#[from]`.
This also required updating the `Display` implementation for those errors. I've
opted for not printing the underlying error, since the context chain will
include it. This does mean that if we print one of those errors without the
context chain (i.e. `{}` as opposed to `{:#}` or `{:?}`), then we'll lose out a
bit of context. That said, this should be OK, as we really shouldn't ever be
doing this, because we'd be missing the rest of the chain anyway.
Reviewed By: StanislavGlebik
Differential Revision: D21399490
fbshipit-source-id: a970a7ef0a9404e51ea3b59d783ceb7bf33f7328
Summary:
This removes our own (Mononoke's) implementation of failure chains, and instead
replaces them with usage of Anyhow. This doesn't appear to be used anywhere
besides Mononoke.
The historical motivation for failure chains was to make context introspectable
back when we were using Failure. However, we're not using Failure anymore, and
Anyhow does that out of the box with its `context` method, which you can
downcast to the original error or any of the context instances:
https://docs.rs/anyhow/1.0.28/anyhow/trait.Context.html#effect-on-downcasting
Reviewed By: StanislavGlebik
Differential Revision: D21384015
fbshipit-source-id: 1dc08b4b38edf8f9a2c69a1e1572d385c7063dbe
Summary:
Merge react and userprofile_shell into master, rewrite profile to use react and revert changes to p page
Reviewed By: markbt
Differential Revision: D21285297
fbshipit-source-id: f60006738af068982a25fa368fd260ba5b7f3781
Summary:
Design doc for the Deleted Files Manifest derived data structure - https://fb.quip.com/Wk4XAi7dyXRr.
**This diff adds derivation logic for merge commits.**
For the linear case each deleted manifest node has an optional linknode that points to the changeset id where the path was deleted. If the linknode is None it means that the path is or was a directory and its subentries were deleted, but the path itself still exists.
The idea for merge commits is that paths deleted in merges will now have linknodes pointing to the merge commit. So if the path was deleted in both branches of the merge commit, then the deleted manifest of the merge commit will have a linknode for the path pointing to the merge commit itself. (For examples see the test cases below.)
**To generate file history for the deleted file:**
If there is no unode for the given commit and path:
* search for the path in the Deleted files manifest
* if there is no node, there was no such path in the repo
* from the manifest's linknode get the changeset id where the file was deleted
* get the changeset's parents
* for each parent try to find unode, if there is unode, then traverse history
* otherwise check deleted manifest again (going back to the beginning)
This logic is going to be implemented in another diff.
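The steps above can be sketched as a recursive lookup (hypothetical types: plain maps stand in for unode and deleted-manifest queries):

```rust
use std::collections::HashMap;

// Sketch of the history-start lookup described above. Returns the commit
// from which unode-based history traversal can begin, or None if the path
// never existed.
fn history_start(
    unodes: &HashMap<(&'static str, &'static str), &'static str>, // (cs, path) -> unode
    deleted: &HashMap<(&'static str, &'static str), &'static str>, // (cs, path) -> deleting cs
    parents: &HashMap<&'static str, Vec<&'static str>>,            // cs -> parents
    cs_id: &'static str,
    path: &'static str,
) -> Option<&'static str> {
    if unodes.contains_key(&(cs_id, path)) {
        // The file exists here: traverse its unode history from this commit.
        return Some(cs_id);
    }
    // No unode: consult the deleted files manifest for the deleting commit,
    // then retry from that commit's parents (going back to the beginning).
    let deleting_cs = deleted.get(&(cs_id, path))?;
    for &parent in parents.get(deleting_cs).into_iter().flatten() {
        if let Some(found) = history_start(unodes, deleted, parents, parent, path) {
            return Some(found);
        }
    }
    None
}

fn main() {
    let mut unodes = HashMap::new();
    unodes.insert(("A", "f"), "u1"); // "f" existed in commit A
    let mut deleted = HashMap::new();
    deleted.insert(("C", "f"), "B"); // "f" was deleted in commit B
    let mut parents = HashMap::new();
    parents.insert("B", vec!["A"]);
    // Querying at C finds the deletion in B and resumes history at A.
    assert_eq!(history_start(&unodes, &deleted, &parents, "C", "f"), Some("A"));
}
```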
Reviewed By: StanislavGlebik
Differential Revision: D21182002
fbshipit-source-id: 4dfa5ae0dab20a51d78e75c0fafb624f1a652fff
Summary: In this diff I moved some of the deleted files manifest derivation methods from the old futures to async/await. This allows me to introduce changes into the derivation logic using new standard futures.
Reviewed By: StanislavGlebik
Differential Revision: D21177622
fbshipit-source-id: fd5920cce11095885f167dad89ee974308501025
Summary:
When generating the deleted file manifest, there may be many duplicate entries, particularly for deleted leaf files, which will always have the same manifest and thus the same manifest ID.
To prevent overloading of the blobstore with too many writes for the same key, deduplicate the writes before sending them.
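The deduplication can be as simple as filtering the pending writes through a set of already-seen keys (a sketch with simplified key/value types):

```rust
use std::collections::HashSet;

// Sketch: drop repeated writes for the same blobstore key before issuing
// them; since identical manifests get identical IDs, one write suffices.
fn dedup_writes(writes: Vec<(String, Vec<u8>)>) -> Vec<(String, Vec<u8>)> {
    let mut seen = HashSet::new();
    writes
        .into_iter()
        .filter(|(key, _)| seen.insert(key.clone()))
        .collect()
}

fn main() {
    let writes = vec![
        ("deleted_manifest.leaf".to_string(), vec![1]),
        ("deleted_manifest.leaf".to_string(), vec![1]), // duplicate leaf node
        ("deleted_manifest.root".to_string(), vec![2]),
    ];
    assert_eq!(dedup_writes(writes).len(), 2);
}
```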
Differential Revision: D21257640
fbshipit-source-id: 8f367496109b3e2b0a802fed96747cc290b295ec
Summary:
This diff adds a special dry-run mode of backfilling (for now only fsnodes are
supported). It does so by keeping all derived data in memory (i.e. nothing is
written to the blobstore) and periodically cleaning entries that can no longer
be referenced.
This mode can be useful to e.g. estimate the size of derived data before actually
running the derivation.
Note that it requires --readonly-storage in order to make sure that we don't
accidentally write anything to e.g. mysql.
Reviewed By: ahornby
Differential Revision: D21088989
fbshipit-source-id: aeb299d5dd90a7da1e06a6be0b6d64b814bc7bde
Summary: Moved tests to the new mechanism of creating changesets, refactored and asyncified tests
Reviewed By: farnz
Differential Revision: D21050192
fbshipit-source-id: d97b1cf0ab92aecc2c35d95c8f9331cead906867
Summary:
This adds a special mode of deriving fsnodes that will be used in
backfill_derived_data. From my experiments it looks like it got 5-10 times
faster.
I tried to explain how it works in the comments - lmk if that's not enough.
Reviewed By: mitrandir77
Differential Revision: D21067817
fbshipit-source-id: ff72a079754a2c15f65852c28d80f723961b53c4
Summary:
This is needed because the tonic crate (see the diff stack) relies on tokio ^0.2.13
We can't go to a newer version because a bug that affects Mononoke was introduced in 0.2.14 (discussion started on T65261126). The issue was reported upstream: https://github.com/tokio-rs/tokio/issues/2390
This diff simply changed the version number in `fbsource/third-party/rust/Cargo.toml` and ran `fbsource/third-party/rust/reindeer/vendor`.
Also ran `buck run //common/rust/cargo_from_buck:cargo_from_buck` to fix the tokio version in the generated cargo files
Reviewed By: krallin
Differential Revision: D21043344
fbshipit-source-id: e61797317a581aa87a8a54e9e2ae22655f22fb97
Summary:
A new method on BonsaiDerived trait that derives data for a batch of commits.
Default implementation just derives them in parallel, so it's not particularly
useful. However it might be overridden if a particular derived data type has a more
efficient way of deriving a batch of commits.
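A simplified, synchronous sketch of a trait default that implementors can override (the real API is async and the default derives in parallel; names here are illustrative):

```rust
// Sketch of a derived-data trait with an overridable batch method.
trait BonsaiDerived {
    fn derive(cs_id: u64) -> Self;

    fn batch_derive(cs_ids: &[u64]) -> Vec<Self>
    where
        Self: Sized,
    {
        // Default: derive each commit independently (the real default would
        // run these derivations in parallel).
        cs_ids.iter().map(|&id| Self::derive(id)).collect()
    }
}

struct Fsnode(u64);

impl BonsaiDerived for Fsnode {
    fn derive(cs_id: u64) -> Self {
        Fsnode(cs_id)
    }
    // An implementor could override batch_derive here to share work across
    // the batch, e.g. deriving a stack of commits bottom-up.
}

fn main() {
    let batch = Fsnode::batch_derive(&[1, 2, 3]);
    let ids: Vec<u64> = batch.iter().map(|f| f.0).collect();
    assert_eq!(ids, vec![1, 2, 3]);
}
```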
Reviewed By: farnz
Differential Revision: D21039983
fbshipit-source-id: 3c6a7eaa682f5eaf6b8a768ca61d6f8a8f1258a7
Summary: It makes backfilling a great deal faster
Reviewed By: krallin
Differential Revision: D21040292
fbshipit-source-id: f6d06cbc76e710b4812f15e85eba73b24cdbbd3e
Summary:
Use the deleted manifest to search for deleted paths in repos with linear history. For merged history it returns an error, as if there was no such path.
The commit where the path was deleted is returned as the first commit in the history stream; the rest is the history before deletion.
Reviewed By: StanislavGlebik
Differential Revision: D20897083
fbshipit-source-id: e75e53f93f0ca27b51696f416b313466b9abcee8
Summary:
`log_v2` supports time filters, which means it needs to be able to drop the history stream once the commits get older than the given time frame (otherwise it just traverses the whole history...)
However, it cannot be done from the SCS commit_path API or from changeset_path, because they already receive history stream where commits are not ordered by creation time. And the naive solution "if next commit in the stream is older than `after_ts` then drop" won't work: there might be another branch (commit after the current one) which is still **in** the time frame.
I added a terminator-function to the `list_file_history` that is called on changeset id, for which a new fastlog batch is going to be fetched. If terminator returns true, then the fastlog is not fetched and the current history branch is dropped. All ready nodes are still streamed.
```
For example, if we have a history of the file changes like this:
A 03/03 ^|
| |
B 02/03 |
| | - one fastlog batch
C 01/03 |
| \ |
02/01 D E 10/02 _| - let's assume that fastlog batches for D and E ancestors need to be prefetched
| |
01/01 F G 05/02
```
# Example 1
We query "history from A after time 01/02"
The old version would fetch all the commits and then filter them in `commit_path`. We would fetch both fastlog batches for the D branch and E branch.
With the terminator, `list_file_history` will call terminator on commit D and get `true` in return and then will drop the D branch,
then it will call terminator on E and get `false` and proceed with fetching fastlog for the E branch.
# Example 2
We query "history from A after time 01/04"
The old version would fetch all the commits and then filter them in `commit_path`, despite the fact that
the very first commit is already older than needed.
With the terminator it will call terminator on A and get `true` and won't proceed any further.
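The terminator hook boils down to a caller-supplied predicate consulted before fetching a new fastlog batch; a sketch under assumed timestamp types (the real API passes a changeset id and is async):

```rust
// Sketch: before prefetching a fastlog batch for a changeset, ask the
// terminator whether this history branch is already out of the time frame.
fn should_fetch_batch<F>(cs_timestamp: i64, terminator: &F) -> bool
where
    F: Fn(i64) -> bool, // returns true => terminate (drop) this branch
{
    !terminator(cs_timestamp)
}

fn main() {
    let after_ts = 3; // "history after time T" from the time filter
    // Terminate a branch once its head commit is older than after_ts.
    let terminator = |ts: i64| ts < after_ts;
    assert!(!should_fetch_batch(1, &terminator)); // old branch: dropped
    assert!(should_fetch_batch(5, &terminator)); // recent branch: fetched
}
```

Already-ready nodes are still streamed; only the fetch of the next batch is skipped.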
Reviewed By: StanislavGlebik
Differential Revision: D20801029
fbshipit-source-id: e637dcfb6fddceb4a8cfc29d08b427413bf42e79
Summary: Asyncified the main functions of fastlog/ops, so it'd be easier to modify them and proceed with the new features.
Reviewed By: StanislavGlebik
Differential Revision: D20801028
fbshipit-source-id: 2a03eedca776c6e1048a72c7bd613a6ef38c5c17
Summary:
`list_file_history` implements BFS on the commit graph and returns a stream of changeset ids using `bounded_traversal_stream`.
The old version iterated over BFS "levels" and on each iteration streamed all nodes on the current level. For example, for the commit graph:
```
1 <- start # 1 level
|
2 # 2
| \
3 4 # 3
| |
```
there would be 3 iterations and on each nodes would be yielded: [1], [2], [3, 4]. If there was need to prefetch fastlog batch or batches, it prefetched in parallel batch/batches for changesets on the same level.
The implementation was a bit hacky and it is a bit unfortunate that we need to make 100 iterations to stream changesets that are ready and do not require fetching fastlog. I also needed some simplification so I could then add a terminator function (3rd diff in the stack) on the fastlog batch prefetching stage (and add Deleted Manifest intefration).
So now the `bounded_traversal_stream` keeps a BFS queue as its state and on each iteration streams all nodes in the queue until it hits a node for which it needs to prefetch a fastlog batch, then goes to the next iteration.
```
state - [queue, prefetch_cs_id]
* on each iteration:
1. If prefetch_cs_id.is_some() =>
- fetch fastlog batch for prefetch_cs_id
- fill the commit graph
- add parents of the prefetch_cs_id to the bfs queue
2. Get from the queue all nodes until we meet changeset without fetched parents.
Mark this node as `prefetch_cs_id` for the next iteration.
3. Stream ready nodes and go to the next iteration.
```
Thus
- we still fetch fastlog batches on demand and not before we really need them
- if we have 100 ready-to-be-yielded commits in the queue, we won't do 100 iterations and will stream them in one go
- now if we need to prefetch fastlog batches for 2 branches on the same "bfs level", we will do it one by one and not in parallel, but this situation is pretty uncommon
- the code is simpler and allows integrating Deleted Manifest and adding a terminator function.
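The per-iteration step described above can be sketched as draining the queue until a node needs a prefetch (simplified: integer changeset ids, a plain map as the fetched part of the commit graph):

```rust
use std::collections::{HashMap, VecDeque};

// Sketch of the reworked traversal state: drain ready nodes from the BFS
// queue until we hit one whose parents haven't been fetched yet, then yield
// the ready nodes and remember the prefetch candidate for the next iteration.
fn drain_ready(
    queue: &mut VecDeque<u64>,
    parents_known: &HashMap<u64, Vec<u64>>,
) -> (Vec<u64>, Option<u64>) {
    let mut ready = Vec::new();
    while let Some(cs_id) = queue.pop_front() {
        match parents_known.get(&cs_id) {
            Some(parents) => {
                // Parents already in the commit graph: node is ready.
                queue.extend(parents.iter().copied());
                ready.push(cs_id);
            }
            // This node needs a fastlog prefetch before we can continue.
            None => return (ready, Some(cs_id)),
        }
    }
    (ready, None)
}

fn main() {
    let mut queue: VecDeque<u64> = VecDeque::from(vec![1]);
    let mut graph = HashMap::new();
    graph.insert(1, vec![2]);
    graph.insert(2, vec![3]);
    // 1 and 2 are streamed in one iteration; 3 requires a prefetch.
    assert_eq!(drain_ready(&mut queue, &graph), (vec![1, 2], Some(3)));
}
```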
Reviewed By: StanislavGlebik
Differential Revision: D20768401
fbshipit-source-id: cdba40539a842b3628826f6c72a29514da0d539e
Summary:
This diff turns off the support_old_nightly feature of async-trait (https://github.com/dtolnay/async-trait/blob/0.1.24/Cargo.toml#L28-L32) everywhere in fbcode. I am getting ready to remove the feature upstream. It was an alternative implementation of async-trait that produces worse error messages but supports some older toolchains dating back to before stabilization of async/await that the default implementation does not support.
This diff includes updating async-trait from 0.1.24 to 0.1.29 to pull in fixes for some patterns that used to work in the support_old_nightly implementation but not the default implementation.
Differential Revision: D20805832
fbshipit-source-id: cd34ce55b419b5408f4f7efb4377c777209e4a6d
Summary:
The rust-crypto library appears to be unmaintained, switching
- `crypto::digest::Digest` to `digest::Digest`
- `crypto::sha1::Sha1` to `sha1::Sha1`
- `crypto::sha2::Sha256` to `sha2::Sha256`
Reviewed By: jsgf
Differential Revision: D20456962
fbshipit-source-id: 2e3406dedba05245265d96b480c35ba2421aa3fd