Summary: Small cleanup - will help remove a clone in a later diff
Reviewed By: farnz
Differential Revision: D20441266
fbshipit-source-id: ee15c7bdbfe2f42ccd1d10abbbe0d43e89547df5
Summary:
Let's return the generation numbers as part of commit info - they are cheap to
obtain.
Reviewed By: StanislavGlebik
Differential Revision: D20426744
fbshipit-source-id: 50c7017c55aeba04fb9059e2c1db19f2fb0a6e5e
Summary:
Let's return those to the caller. The changeset is already fetched at this
point so this should come at no extra cost.
Reviewed By: markbt
Differential Revision: D20426745
fbshipit-source-id: 7e28f5fd44efdb502b8e37a7afac3ea9113e2c5e
Summary: I'm about to introduce one more use case of it, so let's rename it first.
Reviewed By: farnz
Differential Revision: D20393776
fbshipit-source-id: d74146fa212cdc4989a18c2cbd28307f58994759
Summary:
Currently, the x-repo commit validator runs a full working copy comparison every time. This is slow, as it requires fetching full manifests for the commit versions in both the small and the large repo. For leaves that differ, filenodes are also fetched so that `ContentId`s can be compared. Because it's slow, we skip the majority of commits and validate only 1 in ~40 `bookmarks_update_log` entries. Obviously, this is not ideal.
To address this, this diff introduces a new way to run validation: based on comparing full manifest diffs. For each entry of the `bookmarks_update_log` of the large repo, we:
- unfold it into a list of commits it introduces into the repository
- fetch each commit
- see how it rewrites to small repos
- compute manifest differences between the commit and its parent in each involved repo
- compare those differences
- check that the topological order relationship remains sane (i.e. if `a` is a p1 of `b` in a small repo, then `a'` must be a p1 of `b'` in the large repo, where `a'` and `b'` are remappings of `a` and `b`)
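The diff-comparison step above can be sketched in a simplified form. This is a hypothetical illustration, not the actual validator code: the real Mononoke validator works with bonsai changesets and manifest entries, and path remapping is driven by commit sync config rather than a fixed prefix.

```rust
// Hypothetical, simplified sketch of the manifest-diff comparison step.
use std::collections::BTreeMap;

/// A manifest diff, as a map from path to new content id.
type ManifestDiff = BTreeMap<String, String>;

/// Rewrite small-repo paths into large-repo paths (here: a fixed prefix;
/// the real remapping comes from commit sync config).
fn remap_path(small_repo_path: &str, prefix: &str) -> String {
    format!("{}/{}", prefix, small_repo_path)
}

/// Check that the small-repo diff, once remapped, matches the large-repo diff
/// restricted to the small repo's prefix.
fn diffs_equivalent(small: &ManifestDiff, large: &ManifestDiff, prefix: &str) -> bool {
    let remapped: ManifestDiff = small
        .iter()
        .map(|(p, c)| (remap_path(p, prefix), c.clone()))
        .collect();
    let large_in_prefix: ManifestDiff = large
        .iter()
        .filter(|(p, _)| p.starts_with(&format!("{}/", prefix)))
        .map(|(p, c)| (p.clone(), c.clone()))
        .collect();
    remapped == large_in_prefix
}
```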
In addition, this diff adds some integration tests and gets rid of the skipping logic.
Reviewed By: StanislavGlebik
Differential Revision: D20007320
fbshipit-source-id: 6e4647e9945e1da40f54b7f5ed79651927b7b833
Summary:
Replace ownership moves with borrows here and there, add fetching of `FileType`
as well.
Reviewed By: krallin
Differential Revision: D20387564
fbshipit-source-id: 378511402277d744ddfbb68e01a3f5d1707f6d08
Summary:
Even the simplest test took ~30 seconds to finish. It turned out most of the time
is spent creating sqlite shards, and the reason we hadn't noticed it before is
that we hadn't used sqlblob in integration tests until recently (until
February 24 to be precise - D20001261).
We don't need 100 sqlite shards; let's decrease the count to 5.
Reviewed By: HarveyHunt
Differential Revision: D20438107
fbshipit-source-id: e71fd4cabf908e3d92b446fc518a0e5dd64a00bb
Summary:
The input for getcommitdata is a list of HgChangesetIds. For every entry, the
endpoint retrieves the commit, formats it as the revlog would, then returns it
to the client.
The format for the returned entries is:
```
Hash " " Length LF
RevlogContent LF
```
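As a rough sketch, one response entry in this framing could be serialized like so. The helper name and signature are illustrative, not the actual Mononoke implementation:

```rust
// Serialize one getcommitdata response entry in the
// `Hash " " Length LF RevlogContent LF` framing described above.
// Illustrative helper, not the real server code.
fn serialize_entry(hash_hex: &str, revlog_content: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(hash_hex.as_bytes());
    out.push(b' ');
    out.extend_from_slice(revlog_content.len().to_string().as_bytes());
    out.push(b'\n'); // LF terminating the `Hash " " Length` header line
    out.extend_from_slice(revlog_content);
    out.push(b'\n'); // LF after the revlog content
    out
}
```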
Looking for recommendations on how to structure the code better.
Looking for recommendations on implementation requirements:
metrics, throttling, verifying the hash for returned bytes.
Reviewed By: krallin
Differential Revision: D20376665
fbshipit-source-id: 5d9eb0d581fd2b352cf3ce44f4777ad45076c8f4
Summary:
The data portion of the changelog revlog from local checkouts is going away.
Instead of fetching data from disk, data will be fetched from the server.
For migration purposes we want the data from the server to match the data that
would have been available locally.
Note: I don't understand the difference between `RevlogChangeset` and
`HgChangesetContent`. I want to eventually call
`mercurial_revlog::changeset::serialize_cs` and this change reduces
the number of transformations required. The whole process is more direct and
thus easier to follow.
Reviewed By: StanislavGlebik
Differential Revision: D20373568
fbshipit-source-id: f2f4eccdf6a35b322e13c5a4b3fa18d5d289848e
Summary:
The diff only contains HgCommand signatures. No implementation yet.
The purpose of the getcommitdata command is to return the serialized contents
of a commit. Given a Mercurial Changelog Id, the endpoint returns the same
contents that the Revlog would return on a Mercurial server.
At this point I am looking for suggestions regarding the protocol and the
implementation. My assumption is that both request and response can be fully
kept in memory. We may later decide that the response should be streamed to
the client, so the initial protocol allows for that.
Requirements:
Input: HgChangelogId
Output: Changelog entry contents
Protocol Summary:
```
Request: "getcommitdata" LF "nodes " Length LF Length*(HEXDIG|" ")
Response: *(40HEXDIG Length LF Length*(%x00-79) LF)
```
A bit of a silly protocol. Let me know what recommendations you have.
The Request is modelled after the "known" command. This allows for efficient
batching compared to a batch command model. It's a bit awkward that we don't
pass in the number of HgChangelogId entries in the request, but that is the
existing protocol.
For every HgChangelogId in the request the response will first have a line
with the HgChangelogId that was requested and the length of the contents.
The next line will contain the contents followed by line feed.
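A minimal sketch of encoding the request side of this framing, assuming `Length` counts the bytes of the space-joined node list (as in the "known" command); the helper name is hypothetical:

```rust
// Encode a getcommitdata request following the
// `"getcommitdata" LF "nodes " Length LF Length*(HEXDIG|" ")` framing:
// the hex node ids are space-joined, and the byte length of that joined
// payload is sent up front. Illustrative, not the actual client code.
fn encode_getcommitdata_request(node_hex_ids: &[&str]) -> String {
    let payload = node_hex_ids.join(" ");
    format!("getcommitdata\nnodes {}\n{}", payload.len(), payload)
}
```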
Reviewed By: krallin
Differential Revision: D20345367
fbshipit-source-id: 50dffff4f6c60396f564f2f1f519744ce730bf96
Summary:
rust-crypto hasn't been updated since 2016, we are replacing it according to https://github.com/RustSec/advisory-db/blob/master/crates/rust-crypto/RUSTSEC-2016-0005.toml
Specifically, we replace crypto::Digest with digest::Digest, crypto::Sha1 with sha1::Sha1.
Reviewed By: krallin
Differential Revision: D20424524
fbshipit-source-id: aa5d268cb5d3c4b1b4e9f7c6f1f3c41d559f121a
Summary:
Avoid boilerplate in unit tests by consistently using `fbinit::compat_test`.
This is a very mechanical change; most of the modified lines are due to indentation changes.
Reviewed By: krallin
Differential Revision: D20414564
fbshipit-source-id: 11c6bd702de28c8a21978f0345d2b0ff8a574e72
Summary:
Let's use functionality added in D20389226 so that we can generate
diffs even for large files.
I've contemplated between two approaches: either "silently" generate
placeholder diff for files that are over the limit or add a new option
where client can request these placeholders. I choose the latter for a few
reasons:
1) To me option #1 might cause surprises, e.g. diffing a single large file
doesn't fail, but diffing two files whose size is (LIMIT / 2 + 1) will fail.
Option #2 lets the client be very explicit about what it needs - and it also
won't return a placeholder when the actual content is expected!
2) Option #2 makes the client think about what it wants returned, and it
seems in line with source control thrift API design in general, i.e.
commit_file_diffs is not trivial to use by design, because it forces the client
to think about edge cases. Forcing the client to think about an additional
important edge case (i.e. large files) makes sense.
3) The code change is simpler with option #2
I also thought about adding generate_placeholder_diff parameter to CommitFileDiffsParams,
but that makes integration with scs_helper.py harder.
Reviewed By: markbt
Differential Revision: D20417787
fbshipit-source-id: ab9b32fd7a4768043414ed7d8bf39e3c6f4eb53e
Summary:
This adds a test demonstrating that we can perform BFS fetching over SSH. The
test should demonstrate that the fetches are:
- Done in a BFS fashion (we don't fetch the entire trees before comparing them,
instead we do a fetch at each layer in the tree). In particular, note that
the cc tree, which is unchanged, doesn't get explored at all.
- Done using the right paths / filenodeids, and therefore the right linknodes
are located.
Reviewed By: farnz
Differential Revision: D20387124
fbshipit-source-id: b014812b0e6e85a5cdf6abefe3fe4f47b004461e
Summary:
This is convenient because it makes it possible to tell what is going on within
Mononoke from the outside (e.g. introspect perf counters).
I'll land this after D20387115.
Reviewed By: farnz
Differential Revision: D20387125
fbshipit-source-id: ee070c4d658a0ec8f232152fe8b34bd0b56e6888
Summary:
Similarly to D20418610, let's keep track of request failure rates in ODS
counters so they can be health-checked by SF.
Reviewed By: farnz
Differential Revision: D20418705
fbshipit-source-id: 1d281e06c59414a0610836106370592596b47e39
Summary:
Earlier, I added counters for successes and failures, but unfortunately
canaries do not allow for tracking formulas, so I also need the actual request
success rate logged.
More broadly speaking, it probably does make sense to have a very high level
"success rate" counter to query anyway. This adds it.
Reviewed By: farnz
Differential Revision: D20418610
fbshipit-source-id: 94f115f8201aef4f6498f4da98e170194186f0f8
Summary:
We need to differentiate an empty directory from no directories. Adding a
trailing comma after each directory, instead of separating them with commas, achieves that.
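A toy illustration of the ambiguity being fixed (not the actual wire format): with a plain separator join, a list containing one empty directory name encodes the same as an empty list, whereas a trailing comma after every element keeps the two distinct.

```rust
// Encode each directory followed by a trailing comma, rather than joining
// with commas. Toy model of the idea, not the real serialization code.
fn encode_with_trailing_comma(dirs: &[&str]) -> String {
    dirs.iter().map(|d| format!("{},", d)).collect()
}
```

With a plain join, both `[]` and `[""]` encode to `""`; with the trailing comma, they encode to `""` and `","` respectively.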
Reviewed By: StanislavGlebik
Differential Revision: D20309700
fbshipit-source-id: 387ec477560968392de0a9631d67ccb591bd3cab
Summary:
This will allow the hg client to do tree fetching like we do in the API Server,
but through the SSH protocol — i.e. by passing a series of manifest ids and
their paths, without recursion on the server side through gettreepack.
Reviewed By: StanislavGlebik
Differential Revision: D20307442
fbshipit-source-id: a6dca03622becdebf41b264381fdd5837a7d4292
Summary:
The goal of the stack is to support "rendering" diffs for large files in scs
server. Note that rendering is in quotes - we are fine with just showing a
placeholder like "Binary file ... differs". This is still better than the
current behaviour, which just returns an error.
In order to do that, I suggest tweaking the xdiff library to accept a FileContentType,
which can be either Normal(...), meaning that we have the file content available, or
Omitted, which usually means the file is large and we don't even want to fetch it - we
just want xdiff to generate a placeholder.
Reviewed By: markbt, krallin
Differential Revision: D20389226
fbshipit-source-id: 0b776d4f143e2ac657d664aa9911f6de8ccfea37
Summary:
Our post-request processing was still implemented using old futures. Let's get
rid of that, and make it all new futures. This will make it easier to maintain,
and means we can get rid of the tokio-old dependency.
Reviewed By: mitrandir77
Differential Revision: D20343188
fbshipit-source-id: 513e124805e2ca588a1e312d8f5ca697ed6030c8
Summary:
This updates the lfs server and eden api server to use a newer version of
Gotham, which comes along with an updated version of Bytes and Hyper.
A few things had to change for this:
- New bytes don't support concatenation, so we need to fold them ourselves,
except...
- ... new Hyper bodies don't tell you how big they are (either in requests or
responses), so we need to inspect headers to find the size instead (I added
this in `gotham_ext::body_ext::BodyExt`, although it arguably belongs more in
a `hyper_ext` crate, but creating a new crate for just this seems overkill).
- New Hyper requires its data stream to be `Sync` for reasons that have more to
do with developer experience than runtime
(https://github.com/hyperium/hyper/pull/1857). Unfortunately, our Filestore
streams aren't `Sync`, because our `BoxFuture` contains a `dyn Future` that
isn't explicitly `Sync` (which is how we pull things out of blobstores). Even
if `BoxFuture` contained a `Sync` future, that still wouldn't be enough
anyway, because `compat()` explicitly implements `!Sync` on the stream it
returns. I'll ask upstream in Hyper if this can possibly change in the
future, but for now we can work around it by wrapping the stream in a
channel. I'll keep an eye out for performance here.
- When I updated our "pre state data" tweaks on top of Gotham, I renamed those
to "socket data", since that's a better name for what they are (hence the
changes here).
- I updated the lfs_protocol to stop depending on Hyper and instead depend on
http, since that's all we need here.
As you review this, please pay close attention to the updated implementation of
`SignalStream`. Since this is a custom `Stream` in new futures, it requires a
bit of `unsafe { ... }`.
Note that, unfortunately, the diff includes both the Gotham update and the
server updates, since they have to happen together.
Reviewed By: kulshrax, dtolnay
Differential Revision: D20342689
fbshipit-source-id: a490db96ca7c4da8ff761cb80c1e7e3c836bad87
Summary:
This adds the ability to specify a priority in hgcli, and to pass it on to
Mononoke. This will be used to replay commit cloud traffic at a lower priority.
Reviewed By: farnz
Differential Revision: D20038573
fbshipit-source-id: 4055d28ee295e2b15c15945bd3741f6d739ead3a
Summary:
This adds a blobstore that can reach into a CoreContext in order to identify
the allowed level of concurrency for blobstore requests initiated by this
CoreContext. This will let us replay infinitepush bundles with limits on a
per-request basis.
Reviewed By: farnz
Differential Revision: D20038575
fbshipit-source-id: 07299701879b7ae65ad9b7ff6e991ceddf062b24
Summary:
Time filters for file history require fetching the changeset or changeset info to decide whether to include the commit in the response. To speed up the process, instead of sequential mapping, I buffer the map stream in batches of 100.
However, it is unfortunate to fetch an additional 100 history changesets if there are no time filters and only a few commits were requested, especially keeping in mind that most requests don't care about time. Avoiding that speeds up history generation.
I changed the changeset_path API so that it now just returns a stream of changeset contexts. The commit_path API for history collects everything into a vector if there are no time filters; otherwise it applies the filters and then collects into a vector. This vector is then converted into the response using FuturesOrdered.
Reviewed By: StanislavGlebik
Differential Revision: D20287497
fbshipit-source-id: 0c6b1315eccddb48f67bf5fa732bdf7c9a54a489
Summary:
A lot of callsites want to know the repo name. Currently they need to pass it from
the place where the repo was initialized; that's quite awkward, and in some
places even impossible (e.g. in derived data, where I want to log the reponame).
This diff adds the reponame to BlobRepo.
Reviewed By: krallin
Differential Revision: D20363065
fbshipit-source-id: 5e2eb611fb9d58f8f78638574fdcb32234e5ca0d
Summary:
We track this in Scuba right now (and alarm on it), but tracking it in ODS will
make it easier to incorporate in our canary and post-release health check
workflow.
Reviewed By: StanislavGlebik
Differential Revision: D20361803
fbshipit-source-id: 99fb514d41f9cda42c3c9a82f3b8d6681285430a
Summary:
Before, we assumed that if the rows_affected count doesn't match the number of
entries we were trying to insert, we have a conflict. Let's verify whether we
really have a conflict, or whether we're just trying to insert the same entry twice.
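A simplified in-memory model of that check might look like this; the names and types are illustrative, and the real code works with SQL rows_affected counts rather than a HashMap:

```rust
// Model: on an apparent insert conflict, re-read the existing row and only
// report a real conflict when it differs from what we tried to insert.
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum InsertResult {
    Inserted,
    DuplicateNoop, // same entry inserted twice: not a real conflict
    Conflict,      // same key exists with a different value: real conflict
}

fn insert_entry(table: &mut HashMap<String, String>, key: &str, value: &str) -> InsertResult {
    match table.get(key) {
        None => {
            table.insert(key.to_string(), value.to_string());
            InsertResult::Inserted
        }
        Some(existing) if existing == value => InsertResult::DuplicateNoop,
        Some(_) => InsertResult::Conflict,
    }
}
```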
Reviewed By: krallin
Differential Revision: D20343219
fbshipit-source-id: 19e032439fdd65f5fe1afe1a10b401bc2fe33462
Summary: Running blobimport twice on the same commit seems to cause problems.
Reviewed By: krallin
Differential Revision: D20343218
fbshipit-source-id: 4d572630e7c15c219bee8db15cc879b2cb8602fe
Summary: Add ability to walk all published bookmarks, as there may be multiple important bookmarks
Reviewed By: krallin
Differential Revision: D20249806
fbshipit-source-id: aff2ee1ec7d51a9e4fb6e1e803612abd207fd6cb
Summary:
The goal of the whole stack is quite simple (add reponame field to BlobRepo), but
this stack also tries to make it easier to initialize BlobRepo.
To do that, BlobrepoBuilder was added. It accepts a RepoConfig instead of 6
different fields from RepoConfig - that makes it easier to pass a field from
the config into BlobRepo. It also allows customizing BlobRepo. Currently it's
used just to add a redaction override, but later we can extend it to other use
cases as well, with the hope that we'll be able to remove a bunch of
repo-creation functions from cmdlib.
Because of BlobrepoBuilder we no longer need open_blobrepo function. Later we
might consider removing open_blobrepo_given_datasources as well.
Note that this diff *adds* a few new clones. I don't consider it being a big
problem, though I'm curious to hear your thoughts folks.
Note that another option for the implementation would be to take references to
objects instead of taking them by value. I briefly looked into how they are used,
and a lot of them are passed to objects that actually take ownership of what's
inside these config fields. I.e. Blobstore essentially takes ownership of
BlobstoreOptions, because it needs to store the manifold bucket name.
Same for scuba_censored_table, filestore_params, bookmarks_cache_ttl, etc. So
unless I'm missing anything, we can either pass them by reference and then have
to copy them, or we can just pass a value from BlobrepoBuilder directly.
Reviewed By: krallin
Differential Revision: D20312567
fbshipit-source-id: 14634f5e14f103b110482557254f084da1c725e1
Summary:
We've recently added a new scuba table for derived data
(https://fburl.com/scuba/mononoke_derived_data/e4sekisf), and it looks like our
previous format of logging is not very useful. It's better to have separate
fields for changeset id and derived data type, since it makes aggregation
easier.
Reviewed By: krallin
Differential Revision: D20309093
fbshipit-source-id: 48f5f04e0412002ef04028e34b12bf267a9b6834
Summary:
The mock crates contain a standard set of mock commit ids with all nybbles
set to a single hex value.
For the mutation tests I want to be able to generate them from a number and
have more than 15 changeset IDs. Add a new function that generates an hg
changeset ID from a number.
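Such a helper presumably just zero-pads the number into 40 hex digits; a sketch (the real function name and signature in the mock crates may differ):

```rust
// An hg changeset id is 40 hex digits, so a number can be zero-padded
// into one. Illustrative helper, not necessarily the real mock function.
fn hg_cs_id_from_number(n: u64) -> String {
    format!("{:040x}", n)
}
```

This yields far more than the 15 distinct ids available from single-nybble mocks.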
Reviewed By: krallin
Differential Revision: D20287382
fbshipit-source-id: 1f57de89f19e2e2eea8dbfea969a4d54510e23d8
Summary:
Note that, compared to many other asyncifying efforts, this one actually adds
one more clone instead of removing them. This is the clone of a logger field.
That shouldn't matter much because it can be cleaned up later and because this
function will be called once per repo.
Reviewed By: krallin
Differential Revision: D20311122
fbshipit-source-id: ace2a108790b1423f8525d08bdea9dc3a2e3c37c
Summary:
Instead of returning `anyhow::Error` wrapping an `ErrorKind` enum
from each Thrift client method, just return an error type specific
to that method. This will make error handling simpler and less
error-prone by removing the need to downcast the returned error.
This diff also removes the `ErrorKind` enums so that we can be sure
that there are no leftover places trying to downcast to them.
(Note: this ignores all push blocking failures!)
Reviewed By: dtolnay
Differential Revision: D20260398
fbshipit-source-id: f0dd96a7b83dd49f6b30948660456539012f82e6
Summary:
Previously we could have "Started ..." before "Starting ..."
This diff fixes it.
Reviewed By: krallin
Differential Revision: D20277406
fbshipit-source-id: 3c2f3fa1723c2e0852c6b114592ab7ad90be17ff
Summary:
This updates the store_bytes method to chunk incoming data instead of uploading
it as-is. This is unfortunately a bit hacky (but so was the previous
implementation), since it means we have to hash the data before it has gone
through the Filestore's preparation.
That said, one of the invariants of the filestore is that chunk size shouldn't
affect the Content ID (and there is fairly extensive test coverage for this),
so, notionally, this does work.
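The invariant can be illustrated with a toy model, where the content id is a hash of the full bytes and therefore independent of how they are chunked (std's DefaultHasher stands in for the real content hashing; the real Filestore does much more):

```rust
// Toy model: the content id is computed over the full bytes, so chunk size
// cannot affect it. DefaultHasher stands in for the real content hash.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn content_id(full_bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    full_bytes.hash(&mut h);
    h.finish()
}

/// Store data chunked, but compute the id over the whole content.
fn store_chunked(data: &[u8], chunk_size: usize) -> (u64, Vec<Vec<u8>>) {
    let chunks: Vec<Vec<u8>> = data.chunks(chunk_size).map(|c| c.to_vec()).collect();
    (content_id(data), chunks)
}
```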
Performance-wise, it does mean we are hashing the object twice. That actually
was the case before as well (since obtaining the ContentId for FileContents
would clone them, then hash them).
The upshot of this change is that large files uploaded through unbundle will
actually be chunked (whereas before, they wouldn't be).
Long-term, we should try and delete this method, as it is quite unsavory to
begin with. But, for now, we don't really have a choice since our content
upload path does rely on its existence.
Reviewed By: StanislavGlebik
Differential Revision: D20281937
fbshipit-source-id: 78d584b2f9eea6996dd1d4acbbadc10c9049a408
Summary:
This is a very small struct (2 u64s) that really doesn't need to be passed by
reference. Might as well just pass it by value.
Differential Revision: D20281936
fbshipit-source-id: 2cc64c8ab6e99ee50b2e493eff61ea34d6eb54c1
Summary: separate out the Facebook-specific pieces of the sql_ext crate
Reviewed By: ahornby
Differential Revision: D20218219
fbshipit-source-id: e933c7402b31fcd5c4af78d5e70adafd67e91ecd
Summary:
Context: https://fb.workplace.com/groups/rust.language/permalink/3338940432821215/
This codemod replaces all dependencies on `//common/rust/renamed:tokio-preview` with `fbsource//third-party/rust:tokio-preview` and their uses in Rust code from `tokio_preview::` to `tokio::`.
This does not introduce any collisions with `tokio::` meaning 0.1 tokio because D20235404 previously renamed all of those to `tokio_old::` in crates that depend on both 0.1 and 0.2 tokio.
This is the tokio version of what D20213432 did for futures.
Codemod performed by:
```
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| sed 's,TARGETS$,:,' \
| xargs \
-x \
buck query "labels(srcs, rdeps(%Ss, //common/rust/renamed:tokio-preview, 1))" \
| xargs sed -i 's,\btokio_preview::,tokio::,'
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| xargs sed -i 's,//common/rust/renamed:tokio-preview,fbsource//third-party/rust:tokio-preview,'
```
Reviewed By: k21
Differential Revision: D20236557
fbshipit-source-id: 15068b93a0a944d6249a1d9f63840a4c61c9c1ba
Summary:
This updates microwave to also support changesets, in addition to filenodes.
Those create a non-trivial amount of SQL load when we warm up the cache (due to
sequential reads), which we can eliminate by loading them through microwave.
They're also a bottleneck when manifests are loaded already.
Note: as part of this, I've updated the Microwave wrapper methods to panic if
we try to access a method that isn't instrumented. Since we'd be running
the Microwave builder in the background, this feels OK (because then we'd find
out if we call them during cache warmup unexpectedly).
Reviewed By: farnz
Differential Revision: D20221463
fbshipit-source-id: 317023677af4180007001fcaccc203681b7c95b7
Summary:
This incorporates microwave into the cache warmup process. See earlier in this
stack for a description of what this does, how it works, and why it's useful.
Reviewed By: ahornby
Differential Revision: D20219904
fbshipit-source-id: 52db74dc83635c5673ffe97cd5ff3e06faba7621