Commit Graph

6268 Commits

Author SHA1 Message Date
Jun Wu
120e6a03a0 templater: move builtin templates to a separate file
Summary:
We'd like to remove `{rev}` from templates.
This makes it easier to copy, edit, review, switch the built templates.

Reviewed By: DurhamG

Differential Revision: D22468670

fbshipit-source-id: 59580c413f97dfbee898b97ee64d6812f82d2014
2020-07-10 17:52:28 -07:00
Katie Mancini
a701242bf7 fetch aux data on corp
Summary:
This extends the metadata importer to work on corp.

Prefetching metadata for the entries in a tree when we fetch it saves us
an extra round trip to the server to fetch a blob when only the metadata
for that blob is fetched. This is particularly important on corp where the
latencies to the server are larger.

Requesting metadata with out the blob itself happens often in the parsing
phase of buck commands, thus this metadata prefetch should improve
the build times, and other work flows that rely heavily on buck queries.

Reviewed By: chadaustin

Differential Revision: D21698728

fbshipit-source-id: 4072be23f2fa7df33cf46879e8c1d8ddd6c316ba
2020-07-10 16:03:32 -07:00
Katie Mancini
614729cb49 Fetch aux data for blobs from scs
Summary:
Buck uses the content SHA-1 to identify each of the source files for a target.
During the parsing phase it needs these SHAs, though the content of the
files is not yet needed, and may never be needed if the file has already
been built and is in the buck cache.

Currently, if we do not already have metadata cached for a file when
requested we fetch the contents of the file, and compute the hash.

We want to avoid this.

Eventually this data will be available from the Mononoke EdenAPI server,
but for now we want a temporary solution to unblock the Buck team, and
ship benefits early.

Reviewed By: chadaustin

Differential Revision: D21820913

fbshipit-source-id: 56a7e32519f0fb04881518306d94aaed33527fd9
2020-07-10 16:03:32 -07:00
Katie Mancini
4962a7face tree metdata storage
Summary:
Prefetching metadata for the entries in a tree when we fetch it saves us
an extra round trip to the server to fetch a blob when only the metadata
for that blob is fetched. (This can happen often while parsing targets in
builds)

We will to prefetch the metadata for each of the entries in a tree when
we fetch the tree and store the metadata for each entry under that
entries id (to make looking up the entry metadata by its id quick)

However, we also don't want to unnecessarily fetch data from
the server if we already done so.

To accomplish this we will also store the metadata for each entry under the tree
id in the local store. This will: 1) allow us to check if we have already fetched
the metadata from the server when we are fetching a tree (we only have the
tree id easily available here to storing the metadata under the tree id makes
it much easier/less expensive to do this check). 2) allow us to refil the
metadata for each entry stored under that entries blob id if it has been cleared
from the local store (this may happen is the local store is gets large and gets
partially cleaned to reclaim space).

This implements the method to store tree metadata for all entries under the tree
id and under the blob id for each entry.

Reviewed By: chadaustin

Differential Revision: D22239173

fbshipit-source-id: d4e0ffd642ce0b4034188cfc4eeaf2ea05f54e77
2020-07-10 16:03:32 -07:00
Katie Mancini
d2b4855372 introduce scs metadata importer
Summary:
Prefetching metadata for the entries in a tree when we fetch it saves us
an extra round trip to the server to fetch a blob when only the metadata
for that blob is fetched. (This can happen often while parsing targets in
builds)

This implements a custom metadata fetcher to fetch this data when we
fetch a tree from the server.

Reviewed By: chadaustin

Differential Revision: D22086639

fbshipit-source-id: 5fe31d375bf6f7376eb67496d553d6b4540fc0c9
2020-07-10 16:03:32 -07:00
Katie Mancini
af70a36a41 introduce metadata importer
Summary:
This will allow adding custom MetadataImporters in different eden builds.

DefaultMetadataImporter provides a no-op version of the interface to be
used by default.

Reviewed By: chadaustin

Differential Revision: D21960834

fbshipit-source-id: aec8a3627ab1223f74466b92a0ebe3290b67b7ed
2020-07-10 16:03:32 -07:00
Katie Mancini
26a7f3ad25 Wrap backing stores copy of local store in shared ptr
Summary:
Previously the BackingStore kept a raw pointer to the LocalStore. To do this we relied on EdenServer ensuring the lifetime of the LocalStore exceeds that of the BackingStore.

This makes the LocalStore pointer a shared pointer to explictly make sure that the LocalStores lifetime matches the BackingStores lifetime.

Reviewed By: chadaustin

Differential Revision: D22394597

fbshipit-source-id: c81cb26c6fc8f834bc46d8576ced06ba6a96ac2c
2020-07-10 16:03:32 -07:00
Katie Mancini
550400364d introduce tree metadata storage in local store
Summary:
This introduces a class to manipulate the metadata for all the entries in a
tree. This adds serialization and deserialization to this class so that it can
be written to the local store.

Why do we need this? We need some way to easily check when we have already
fetched metadata for a tree and do not need to refetch this from the server to
avoid expensive network requests. Later diffs add functionally to store the metadata
for tree entries in the local store under the tree hash using this class.

Reviewed By: chadaustin

Differential Revision: D21959015

fbshipit-source-id: 0c0e8750737f3076c1f9604d0319cab7f2658656
2020-07-10 16:03:32 -07:00
Katie Mancini
264d749d67 record ScsProxyHash in LocalStore during import
Summary:
In following diffs we will use scs to prefetch meta-data for files, so that this data
will be available with out fetching the file content (which will improve build times
on eden).

This builds up the proxy hash index that serves as a conversion between eden
specific identifiers and commit and path which we will use to index into scs.

Reviewed By: chadaustin

Differential Revision: D21820909

fbshipit-source-id: 17891f6772f49c7c183061d7a4df2fe0a3be9d25
2020-07-10 16:03:32 -07:00
Katie Mancini
dc94bc8916 create scs proxy hash
Summary:
In following diffs we will use scs to prefetch meta-data for files, so that this data
will be available with out fetching the file content (which will improve build times
on eden).

SCS indexes trees by an scs specific hash (blake2 content hash) or by the commit
hash and path. Since this is different from the eden hashes and mercurial
hashes, we need another index to go between the current ids we have in eden
and identifiers for scs.

This introduces a proxy hash that serves as this conversion. Because we have
commit hashes around in eden right now, this is an easier route to indexing
into scs currently.

Reviewed By: chadaustin

Differential Revision: D21237648

fbshipit-source-id: 79115ac034a5f062ae879713cd2c1a17f348c725
2020-07-10 16:03:31 -07:00
Lukasz Piatkowski
75ae342768 edenscm/hg: add GitHub Actions with CI for HG plus add fixes for getdeps (#25)
Summary:
Fixes include:
1. Passing "GETDEPS_BUILD_DIR" and "GETDEPS_INSTALL_DIR" env variable and using them in eden/scm/Makefile rather than assuming the source code is always in the same place regardless getdeps arguments (it isn't).
2. Added "fbthrift-source" and "fb303-source" to avoid unnecessary compilation (at least of fb303) and to put fbthrift and fb303 source code in an easy to locate place inside getdeps' "installed" folder.

Pull Request resolved: https://github.com/facebookexperimental/eden/pull/25

Test Plan: sandcastle, check oss-eden_scm-darwin-getdeps

Reviewed By: farnz

Differential Revision: D22431872

Pulled By: lukaspiatkowski

fbshipit-source-id: 8ccbb090713ec085a5dd56df509eb58ab6fb9e34
2020-07-10 12:07:45 -07:00
Simon Farnsworth
78847ff88c Make BlobstoreSyncQueue use new futures
Summary: Stage 1 of a migration - next step is to make all users of this trait use new futures, and then I can come back, add lifetimes and references, and leave it modernised

Reviewed By: StanislavGlebik

Differential Revision: D22460164

fbshipit-source-id: 94591183912c0b006b7bcd7388a3d7c296e60577
2020-07-10 06:43:13 -07:00
Mark Thomas
a51d164892 admin: increase type_length_limit
Reviewed By: ikostia

Differential Revision: D22476055

fbshipit-source-id: 1df7556a5cf774744b26f09e3ed681cceb30c617
2020-07-10 05:55:06 -07:00
Mark Thomas
2180ac866d dbbookmarks: share SelectBookmark query
Summary: Use `pub(crate)` visibility to share the `SelectBookmark` query between modules.

Reviewed By: StanislavGlebik

Differential Revision: D22464059

fbshipit-source-id: 269561f5ab936b730ce2052e50173134ce241ff8
2020-07-10 04:50:25 -07:00
Mark Thomas
fb5fdb9c15 bookmarks: remove repo_id from Bookmarks methods
Summary:
Remove the `repo_id` parameter from the `Bookmarks` trait methods.

The `repo_id` parameters was intended to allow a single `Bookmarks` implementation
to serve multiple repos.  In practise, however, each repo has its own config, which
results in a separate `Bookmarks` instance for each repo.  The `repo_id` parameter
complicates the API and provides no benefit.

To make this work, we switch to the `Builder` pattern for `SqlBookmarks`, which
allows us to inject the `repo_id` at construction time.  In fact nothing here
prevents us from adding back-end sharing later on, as these `SqlBookmarks` objects
are free to share data in their implementation.

Reviewed By: StanislavGlebik

Differential Revision: D22437089

fbshipit-source-id: d20e08ce6313108b74912683c620d25d6bf7ca01
2020-07-10 04:50:25 -07:00
Mark Thomas
aed95ea96d dbbookmarks: split up into modules
Summary:
The dbbookmarks crate is getting too large for a single file.  Split it up into
a `store` module, which implements the bookmarks traits, and a `transaction`
module, which handles bookmark transactions.

Reviewed By: krallin

Differential Revision: D22437088

fbshipit-source-id: 629b62de151400cdbf56d502aef061df46c3da81
2020-07-10 04:50:25 -07:00
Mark Thomas
3afceb0e2c bookmarks: extract BundleReplayData from BookmarkUpdateReason
Summary:
Separate out the `BundleReplayData` from the `BookmarkUpdateReason` enum.  There's
no real need for this to be part of the reason, and removing it means we can
abstract away the remaining dependency on Mercurial changeset IDs from
the main bookmarks traits.

Reviewed By: mitrandir77, ikostia

Differential Revision: D22417659

fbshipit-source-id: c8e5af7ba57d10a90c86437b59c0d48e587e730e
2020-07-10 04:50:24 -07:00
Mark Thomas
fa4dce16f7 dbbookmarks: improve async transaction functions
Summary:
The dbbookmarks implementation still contains some remnants of old-style
futures combinators.  Remove these.

Reviewed By: krallin

Differential Revision: D22432175

fbshipit-source-id: 8d4419def4129112c2386b45e750970790020049
2020-07-10 04:50:24 -07:00
Simon Farnsworth
65e7404eba Command to manually scrub keys supplied on stdin
Summary: For populating the XDB blobstore, we'd like to copy data from Manifold - the easiest way to do that is to exploit MultiplexedBlobstore's scrub mode to copy data directly.

Reviewed By: krallin

Differential Revision: D22373838

fbshipit-source-id: 550a9c73e79059380337fa35ac94fe1134378196
2020-07-10 01:01:05 -07:00
Durham Goode
48a4372ae3 py3: fix morestatus printing
Summary: It now works in py3.

Reviewed By: quark-zju

Differential Revision: D22468693

fbshipit-source-id: 7ebe03560c54369f98477a64ce8074b55de2073e
2020-07-09 20:46:16 -07:00
Durham Goode
7c0dea5f3d py3: fix nullid file content from remotefilelog
Summary: All file content should be bytes.

Reviewed By: quark-zju

Differential Revision: D22468694

fbshipit-source-id: 9febc60d0fd1c49c2f5812f3ba9f10b4782ffd11
2020-07-09 20:46:16 -07:00
Arun Kulshreshtha
6b67d820bd auth: remove use of unwrap
Reviewed By: quark-zju

Differential Revision: D22467292

fbshipit-source-id: d645d437a3dc80b1a7f29841067aa05b0e48df17
2020-07-09 19:05:55 -07:00
Arun Kulshreshtha
14a7fe636f cpython-ext: Add ExtractInnerRef trait
Summary: Per comments on D22429347, add a new `ExtractInnerRef` trait that is similar to `ExtractInner`, but returns a reference to the underlying value. A default implementation is provided for types whose inner value is `Clone + 'static`, so in practice most types will only need to implement `ExtractInnerRef`, whereas the callsite may choose whether it needs a reference or an owned value.

Reviewed By: quark-zju

Differential Revision: D22464158

fbshipit-source-id: 7b97329aedcddb0e51fd242b519e79eba2eed350
2020-07-09 19:05:55 -07:00
Arun Kulshreshtha
a5ae136439 pyrevisionstore: add EdenAPI store bindings
Summary: Add add a `edenapistore` class to that wraps a `EdenApiHgIdRemoteStore`. This class is purely used as a means to set up the stores from Python code, and is only used as a way to get an `Arc<EdenApiHgIdRemoteStore>` to the Rust content store. It has no functionality of its own.

Reviewed By: quark-zju

Differential Revision: D22449702

fbshipit-source-id: ad2094c79da523071b6ed8344c8dde706e448c95
2020-07-09 19:05:55 -07:00
Arun Kulshreshtha
2f97de536f pyedenapi: rewrite bindings
Summary: This is effectively a complete rewrite of the EdenAPI Python bindings to use the new client.

Reviewed By: quark-zju

Differential Revision: D22442903

fbshipit-source-id: c3cf2b2b8291e24d6d4d3a3546ccc69472510567
2020-07-09 19:05:55 -07:00
Arun Kulshreshtha
5cb7bdd3c0 edenapi: use EdenApiError as error type for StatsFuture
Summary: Ensure that all of the components of an EdenAPI response use the same error type.

Reviewed By: quark-zju

Differential Revision: D22443029

fbshipit-source-id: 3e00a8b83677beb5ef2d90630fe9b85760874186
2020-07-09 19:05:55 -07:00
Arun Kulshreshtha
cb16831e6d revisionstore: add add_entry method to HgIdMutableDeltaStore
Summary: Add an `add_entry` convenience method to `HgIdMutableDeltaStore`, similar to the one present in `HgIdMutableHistoryStore`.

Reviewed By: quark-zju

Differential Revision: D22443031

fbshipit-source-id: 84fdaae9fbd51e6f2df466b0441ec5f7ce6715f7
2020-07-09 19:05:55 -07:00
Arun Kulshreshtha
7ae097e8da cpython-ext: add ExtractInner trait
Summary:
A common pattern in Mercurial's data storage layer Python bindings is to have a Python object that wraps a Rust object. These Python objects are often passed across the FFI boundary to Rust code, which then may need to access the underlying Rust value.

Previously, the objects that used this pattern did so in an ad-hoc manner, typically by providing an `into_inner` or `to_inner` inherent method. This diff introduces a new `ExtractInner` trait that standardizes this pattern into a single interface, which in turn allows this pattern to be used with generics.

Reviewed By: quark-zju

Differential Revision: D22429347

fbshipit-source-id: cab4c24b8b98c6ef8307f72a9b4726aabdc829cc
2020-07-09 19:05:55 -07:00
Durham Goode
fe102de31a bundle2: chunk large data blobs so they fit under 2GB
Summary:
Bundle2 chunks have to fit under 2GB, but we have code that simply
returns entire buffers as a chunk, which may be over 2GB. Let's split that up
into smaller chunks.

Reviewed By: quark-zju

Differential Revision: D21235286

fbshipit-source-id: f52366fb5ecebf4f9f00914e044c46e147873bec
2020-07-09 16:22:40 -07:00
Ailin Zhang
763f855e5a show fsck progress animation bars
Summary: This diff updated `eden start` to show animated fsck progress bars

Reviewed By: genevievehelsel

Differential Revision: D22357699

fbshipit-source-id: f399cc78fc25c85100e5931505ce787899c3fff4
2020-07-09 14:43:42 -07:00
Arun Kulshreshtha
a4140f2eca remotefilelog: remove old EdenAPI Python code
Summary: Remove all of the old EdenAPI Python code from Mercurial. For the new EdenAPI client, we intend to expose HTTP fetching through the Rust storage interfaces rather than putting conditionals throughout the Python code.

Reviewed By: quark-zju

Differential Revision: D22405579

fbshipit-source-id: d3c9ed02d9f624b9490e9280b8b0b4f8a127a9b5
2020-07-09 13:08:28 -07:00
Arun Kulshreshtha
9c9d27d95f edenapi: reuse HttpClient
Summary:
D22396026 made it so that `HttpClient::send_async` no longer consumes `self`. This means that instead of creating a new HTTP client for each request, we can reuse the same one.

This has the benefit of allowing for connection reuse (which was the point of D22396026), resulting in lower latency for serial fetches.

Reviewed By: quark-zju

Differential Revision: D22397768

fbshipit-source-id: 9d066c1ec64a6aa1b36ec674ef294030c1f90b41
2020-07-09 13:08:28 -07:00
Arun Kulshreshtha
ce69772ec7 edenapi: support sending serial requests from CLI
Summary: Allow passing multiple JSON requests to the EdenAPI CLI. The requests will be performed serially, which allows for testing the performance of serial EdenAPI calls.

Reviewed By: quark-zju

Differential Revision: D22397769

fbshipit-source-id: c59e5abf53eee9c2014010672183e202b6f180fc
2020-07-09 13:08:27 -07:00
Arun Kulshreshtha
0b05d4aefe http_client: add pool for Multi handles
Summary:
Add a pool of `Multi` handles that the client can reuse across requests.

Previously, `HttpClient`'s async functions had to consume the client in order to have a `'static` lifetime (since `Future`s generally cannot hold references to things outside of themselves). This meant that the each async operation would use its own `Multi` handle, preventing connection reuse across operations since the `Multi` handle maintains a connection cache internally.

With this change, the client can reuse the `Multi` session after an async operation, thereby benefitting from libcurl's caching. Note that the same `Multi` handle still cannot be used by concurrently running `Future`s (as this [would not be thread safe](https://curl.haxx.se/libcurl/c/threadsafe.html)), but once a `Future` has completed its `Multi` handle will return to the pool for use by subsequent requests.

 ---
(Somewhat tangential)

As is noted in the code comments, `libcurl`'s C API provides a way to share caches across multiple multi sessions: [the "share" interface](https://curl.haxx.se/libcurl/c/libcurl-share.html).

While using this would seems preferable to an ad-hoc solution like this diff, it turns out that the `curl` crate does not provide safe bindings to the share interface. This means that in order to use the share interface, we'd need to directly use the unsafe bindings from `curl-sys`.

In addition to the difficulty of working with unsafe FFI code, the API expects the application to handle synchronization by passing it function pointers to handle locking/unlocking shared resources.

Ultimately, I came to the conclusion that managing lifetimes and synchronization in unsafe code across an FFI boundary would be nontrivial, and ensuring correctness would require a lot of effort that could be avoided by implementing an ad-hoc solution on top of the safe API instead. However, it might make sense to change this to use the share interface in the future.

Reviewed By: quark-zju

Differential Revision: D22396026

fbshipit-source-id: 06eea2ffacdc791527eac9ce4becc457af5c0480
2020-07-09 13:08:27 -07:00
Arun Kulshreshtha
1323b17436 edenapi: delete old client
Summary: Delete the old EdenAPI client.

Reviewed By: quark-zju

Differential Revision: D22379475

fbshipit-source-id: 99f76ea170ec9db8d79727fbdfd441afd2de3899
2020-07-09 13:08:27 -07:00
Arun Kulshreshtha
14392eb035 pyedenapi: use new EdenAPI crate
Summary: Update the EdenAPI Python bindings to use the new client. This is mostly just a stopgap measure to allow us to delete the old client code; nothing in production actually uses these bindings anymore, and the new client will primarily be used from Rust.

Reviewed By: quark-zju

Differential Revision: D22379476

fbshipit-source-id: 953e0ffc2ce682869ee234d672a154046b373c1e
2020-07-09 13:08:27 -07:00
Arun Kulshreshtha
d1d3224ba1 revisionstore: use new EdenAPI crate
Summary: Update the `revisionstore` and `backingstore` crates to use the new EdenAPI crate.

Reviewed By: quark-zju

Differential Revision: D22378330

fbshipit-source-id: 989f34827b744ff4b4ac0aa10d004f03dbe9058f
2020-07-09 13:08:27 -07:00
Arun Kulshreshtha
41e68f46d3 edenapi: add blocking API
Summary: Add a new `EdenApiBlocking` trait that exposes blocking versions of the `EdenApi` trait's methods, for use in non-async code.

Reviewed By: quark-zju

Differential Revision: D22305396

fbshipit-source-id: d0f3a73cad1a23a4f0892a17f18267374e63108e
2020-07-09 13:08:27 -07:00
Arun Kulshreshtha
30a6cad591 edenapi: add EdenAPI testing CLI
Summary:
This diff adds an EdenAPI CLI program that allows manually sending requests to the server.

Requests are read from stdin in a JSON format (the same format used by the `make_req` tool and the EdenAPI server integration tests). This makes it easy to create and edit requests during debugging.

Responses are re-serialized as CBOR and written to stdout. (The program will refuse to write output if stdout is a TTY.) These responses can then be analyzed using the `read_res` tool (also used by the EdenAPI server integration tests).

The program prints real-time download statistics during data fetching, allow the user to debug performance in addition to correctness.

The program uses standard `hgrc` files to configure the EdenAPI client, which means that one can simulate production settings by specifying a production `hgrc`. By default, it will read from `~/.hgrc.edenapi` rather than `~/.hgrc` since the user will most likely want to configure this program independently of Mercurial.

Reviewed By: quark-zju

Differential Revision: D22370163

fbshipit-source-id: 5d9974bc05fa960d26cd2c87810f4646e2bc55b4
2020-07-09 13:08:27 -07:00
Zeyi (Rice) Fan
30ba5cf783 fix missed use of Path.resolve
Summary:
There was some missed usage of `Path.resolve`. This diff should cover it all.

```
cli $ rg -F ".resolve"
main.py
967:        uid = self.resolve_uid(args.uid)
968:        gid = self.resolve_gid(args.gid)

util.py
622:    `Path.resolve`. This is a helper method to work around that by using
628:    return path.resolve(strict=strict)
```

Reviewed By: chadaustin

Differential Revision: D22459188

fbshipit-source-id: c2a1b132f752cc399ebf34723f26123559939f2a
2020-07-09 13:03:58 -07:00
Durham Goode
7411a8e6fa py3: fix pushrebase and infinitepush tests using old mysql connector
Summary:
Apparently some of these tests still run in py2. Let's let it fallback
to the old mysql-connector connector.

Reviewed By: xavierd

Differential Revision: D22458822

fbshipit-source-id: add3da42cbd18e6cb5b34b3038d96cf52c7c6387
2020-07-09 12:17:17 -07:00
Chad Austin
1031c6a211 stop shipping hg_import_helper.py
Summary:
proxy_import_helper.py exists for compatibility with older EdenFS
builds. None of those builds are running anymore, so remove it.

Reviewed By: genevievehelsel

Differential Revision: D22451196

fbshipit-source-id: 4d258b3fafe13bb67bd11259f5d1193a7e5575e6
2020-07-09 11:28:07 -07:00
Ailin Zhang
a1163c34f2 using ProgressCallback type for simplification to print fsck messages
Summary: This diff defines `Overlaychecker::ProgressCallback` to replace repetitive function type declaration.

Reviewed By: genevievehelsel

Differential Revision: D22243160

fbshipit-source-id: ea05e451817a760b5266879b956eaea48dc8d85e
2020-07-09 11:13:18 -07:00
Stanislau Hlebik
361f4e98a7 mononoke: use batch_derive method in derived data utils
Summary:
Previously backfill_batch_dangerous method was calling internal derive_impl() method
directly. That wasn't great (after all, we are calling a function whose name suggests it should only be called from inside derive data crate) and this diff changes it so that we call batch_derive() method instead.

This gives a few benefits:
1) We no longer call internal derive_impl function
2) It allows different types of derived data to override batching behaviour.
For example, we've already overriden it for fsnodes and next diff will override
it for blame as well.

To make it compatible with derive_impl() batch_derive() now accepts derive data mode and mapping

Reviewed By: krallin

Differential Revision: D22435044

fbshipit-source-id: a4d911606284676566583a94199195860ffe2ecf
2020-07-09 10:45:19 -07:00
Katie Mancini
df2b9b9009 open all RocksDB column families for backwards compatabiliy
Summary:
From the Rocks DB documentation:

> When opening a DB in a read-write mode, you need to specify all Column
Families that currently exist in a DB. If that's not the case, DB::Open call
will return Status::InvalidArgument()

This can cause problems for us in a couple of situations:
- When we need to rollback from an eden version where we added a column to
our configuration for RocksDB
- When we delete a column from our configuration for RocksDB

To make sure we do not encounter this error we need to make sure that we still
open all the columns existing in the database, even if they are not in our
configured list of family columns.

Reviewed By: wez

Differential Revision: D22425310

fbshipit-source-id: 9822b22cfedf4633f65bbed96f95a546dd3614f4
2020-07-09 10:28:14 -07:00
Mark Thomas
5c95bf6be2 mutation_store: make separate requests by primordial and successor
Summary:
D22206317 (9a6ed4b6ca) added requesting of predecessor information for suspected primordials
by the successor ID.  This allows recovery of earlier predecessors when partial
data upload resulted in the history of a commit being extended backwards.

Unfortunately, while the individual requests are fast, the combined request
using `OR` in SQL ended up being very slow for some requests.

Separate out the requests at the application level, and aggregate the results
by concatenating them.  `collect_entries` already handles duplicates should any
arise.

Most of the time the successor query will very quickly return no rows, as
it only matters when history is extended backwards, which is expected to be
rare.

Reviewed By: ikostia

Differential Revision: D22456062

fbshipit-source-id: 1e6094b4ac1590a5824e9ae6ef48468766560188
2020-07-09 09:21:01 -07:00
Viet Hung Nguyen
b7817ffbd8 xdiff: renamed third-party xdiff functions
Summary:
Renamed xdiff functions to avoid linking issues when using both libgit2-sys and xdiff.

When using repo_import tool (https://fburl.com/diffusion/8p6fhjt2) we have libgit2-sys dependency for importing git repos. However, when we derive blame data types, we need to use xdiff functionalities (from_no_parents: https://fburl.com/diffusion/pitukmyo -> diff_hunks: https://fburl.com/diffusion/9f8caan9 -> xdl_diff: https://fburl.com/diffusion/260x66hf). Both libgit2 and eden/scm have vendored versions of xdiff library. Therefore, libgit2-sys and eden/scm share functions with the same signatures, but have different behaviours and when we tried to derive blame, it used libgit2-sys's xdl_diff instead of eden's. This resulted in getting segfaults (https://fburl.com/paste/04gwalpo).
Note: repo_import is the first tool that has tried to import both and the first to run into this issue.

Reviewed By: StanislavGlebik

Differential Revision: D22432330

fbshipit-source-id: f2b965f3926a2dc45de1bf20e41dad70ca09cdfd
2020-07-09 01:20:32 -07:00
Katie Mancini
81a21ac4fa Increase process cmdline buffer
Summary:
Currently when we are resolving the full command line for a client pid, we only
read the first 256 bytes of the command.

This means that some commands will be truncated, this has come up in some
of our recently added logs. This ups the buffer size so that we can
hopefully get the full command line.

The longer term solution would be to implement the something fancier mentioned
in the comment in the code copied below, but also has drawbacks as mentioned.

> // Could do something fancy if the entire buffer is filled, but it's better
// if this code does as few syscalls as possible, so just truncate the
// result

Reviewed By: wez

Differential Revision: D22436219

fbshipit-source-id: 80a9aecfe148aa3e333ca480c6a8cb8b9c5c86f2
2020-07-08 15:48:15 -07:00
Jun Wu
b80966f93c revlog: turn on head-based-commit-transaction for tests
Summary:
Bypass truncation-based transaction if narrow-heads is on.

The transaction abort still works logically because commit references stay
unchanged on abort.

Related EdenFS and Mononoke tests are updated. Mononoke tests probably
shouldn't rely on revlog / fncache implementation details in hg.

Reviewed By: DurhamG

Differential Revision: D22240186

fbshipit-source-id: f97efd60855467b52c9fb83e7c794ded269e9617
2020-07-08 14:33:58 -07:00
Jun Wu
3244065fb4 revlog: bypass some transaction logic when adding commits with modern setup
Summary:
With narrow-heads, visible heads are explicitly controlled by commit
references. Adding commits can be just writing them out directly.

This mainly removes the "buffered" writes of `00changelog.i`.
Instead of writing pending changes to `00changelog.i.a`, they
are directly written to `00changelog.i` (or buffered in memory
with future changes).

This does not bypass all transaction logic. Truncation can still
happen. Strip is also unaffected.

The change is incomplete. In the future, pending changes will
be written in-memory to the Rust HgCommits struct and we no
longer write directly to revlog.

Reviewed By: DurhamG

Differential Revision: D22240176

fbshipit-source-id: ac9d20ab95ff304fb285a503d2d3db815942d5b3
2020-07-08 14:33:58 -07:00
Jun Wu
b61236537c util: define istest
Summary: This makes pyre aware that `istest` exist on `util`.

Reviewed By: DurhamG

Differential Revision: D22421141

fbshipit-source-id: 50dd264988ffe0e93597df2d540f3de03e8aea4d
2020-07-08 14:29:31 -07:00
Jun Wu
4b09e88ceb amend: do not auto restack if there are no visible children
Summary:
With modern configs, repo is unfiltered and `ctx.children()` returns unfiltered
commits. Use the revset function `children` instead so invisible children won't
trigger auto restack.

Reviewed By: DurhamG

Differential Revision: D22421689

fbshipit-source-id: 3ec8f616c17254ee9ccfcad96673d209b9163da6
2020-07-08 14:10:37 -07:00
Jun Wu
3d82fd408a amend: add a new auto restack test case
Summary: The test demostrates an issue with the current auto restack logic.

Reviewed By: DurhamG

Differential Revision: D22421690

fbshipit-source-id: e035cd3212357f24322f8eb9ec5941767ad780d9
2020-07-08 14:10:37 -07:00
Arun Kulshreshtha
fdb8859422 edenapi: add new EdenAPI client
Summary:
This diff is a complete, ground-up rewrite of the EdenAPI client. Rather than attempting to use `libcurl` directly, it relies on the new `http_client` crate, which makes the code considerably simpler and allows for a proper async interface.

The most notable change is that `EdenApi` is now an async trait. A blocking API is added later in the stack for use in non-async contexts.

Reviewed By: quark-zju

Differential Revision: D22305397

fbshipit-source-id: 4c1e5d3091d6dd04cf13291e7b7a4217dfdd249f
2020-07-08 12:51:04 -07:00
Arun Kulshreshtha
014f1a5289 http_client: add buffering to CborStream
Summary:
As was pointed out in the review for D22280745 (d73c63d862), `CborStream` is inefficient in situations where the underlying stream produces chunks that are much smaller than the size of the serialized items. To avoid pathological behavior, make `CborStream` buffer the incoming data, and only attempt deserialization if enough data has accumulated.

For now, the buffer size is fixed (with a default of 1MB, chosen arbitrarily). In the future, it might make sense to have the stream adjust the buffer size based on the average size of observed deserialized values.

Reviewed By: quark-zju

Differential Revision: D22370164

fbshipit-source-id: ed940c56ca2cbbfc07f01d47becf6f1d71872872
2020-07-08 12:51:04 -07:00
Jun Wu
e440d3ce2b revlogindex: update nodemap even if it's non-symlink and mmaped on Windows
Summary: On Windows a mmap file cannot be replaced. Detect that and delete manually.

Reviewed By: farnz

Differential Revision: D22428731

fbshipit-source-id: 4d308a07aae02dcaf2aedb7b0267a535c2e09c92
2020-07-08 11:31:21 -07:00
Durham Goode
8902138221 mysql: set ssl_disabled for infinitepush and pushrebase
Summary:
Diff D22140187 (74da65a38f) upgraded mysql-connector-python, which enabled ssl by
default. Our db doesn't support this, so we disabled it for hgsql but forgot to
for infinitepush and pushrebase. Let's do it for them too.

Reviewed By: krallin

Differential Revision: D22416533

fbshipit-source-id: bc91ccd2ab4d9bc8ba423c8e60fc0191c7ff78c6
2020-07-08 09:15:29 -07:00
Thomas Orozco
ae917ba227 mononoke/virtually_sharded_blobstore: make sampling rate tunable
Summary: As it says in the title.

Reviewed By: farnz

Differential Revision: D22432526

fbshipit-source-id: 42726584689cbc2f5c9138b42b7bf77939921bdd
2020-07-08 09:07:19 -07:00
Kostia Balytskyi
75db021d70 live_commit_sync_config: make it into a trait
Summary:
The goal is to make it easier to implement unit tests, which depend on `LiveCommitSyncConfig`. Specifically, `scs` has a piece of code, which instantiates `mononoke_api::Repo` with a test version of `CommitSyncConfig`. To migrate it to `LiveCommitSyncConfig`, I need to be able to create a test version of that. It **is** possible now, but would require me to turn a supplied instance of `CommitSyncConfig` back into `json`, which is cumbersome. Using a `dyn LiveCommitSyncConfig` there, instead of a concrete struct seems like a good idea.

Note also that we are using this technique in many places: most (all?) of our DB tables are traits, which we then implement for SQL-specific structs.

Finally, this diff does not actually migrate all of the current users of `LiveCommitSyncConfig` (the struct) to be users of `LiveCommitSyncConfig` (the trait), and instead makes them use `CfgrLiveCommitSyncConfig` (the trait impl). The idea is that we can migrate bits to use traits when needed (for example, in an upcoming `scs` diff). When not needed, it's fine to use concrete structs. Again, this is already the case in a a few places: we sometimes use `SqlSyncedCommitMapping` struct directly, instead of `T: SyncedCommitMapping` or `dyn SyncedCommitMapping`.

Reviewed By: StanislavGlebik

Differential Revision: D22383859

fbshipit-source-id: 8657fa39b11101684c1baae9f26becad6f890302
2020-07-08 08:34:06 -07:00
Thomas Orozco
dd1aaf90fe mononoke/{hgproto,mercurial_bundles}: eliminate O(N^2) behavior in decoding
Summary:
This updates the AsyncRead implementations we use in hgproto and
mercurial_bundles to use a LimitedAsyncRead. The upshot of this change is that
we eliminate O(N^2) behavior when parsing the data we receive from clients.

See the earlier diff on this stack for more detail on where this happens, but
the bottom line is that Framed presents a full-size buffer that we zero out
every time we try to read data. With this change, the buffer we zero out is
comparable to the amount of data we are reading.

This matters in commit cloud because bundles might be really big, and a single
big bundle is enough to take an entire core for a spin or 20 minutes (and they
achieve nothing but time out in the end). That being said, it's also useful for
non-commit cloud bundles: we do occasionally receive big bundles (especially
for WWW codemods), and those will benefit from the exact same speedup.

One final thing I should mention: this is all in a busy CPU poll loop, and as I noted
in my earlier diff, the effect persists across our bundle receiving code. This means
it will sometimes result in not polling other futures we might have going.

Reviewed By: farnz

Differential Revision: D22432350

fbshipit-source-id: 33f1a035afb8cdae94c2ecb8e03204c394c67a55
2020-07-08 08:07:13 -07:00
Lukas Piatkowski
0eb9d79a46 rust/reindeer: update eventsource to 0.5
Summary: The 0.3 version (currently being used only in one crate eden/scm/lib/commitcloudsubscriber) is using an old openssl crate which doesn't work with openssl library installed on most machines (Both in FB and on GitHub Actions).

Reviewed By: mitrandir77

Differential Revision: D22430649

fbshipit-source-id: b8fa930841dbcdd4c085d8c9488d768b3526e1c4
2020-07-08 04:15:21 -07:00
Durham Goode
f8b098febb dirstate: prevent absolute paths in the dirstate
Summary:
The dirstate code did not prevent absolute paths from being added to
the structure, but they would cause problems later when those paths were passed
to Rust. We should move the dirstate to use the Rust path type, but for now
let's just block absolute paths.

Reviewed By: quark-zju, xavierd

Differential Revision: D22426592

fbshipit-source-id: 4ae9f004237e4c54336beb03aab29517254ae441
2020-07-07 20:32:05 -07:00
Xavier Deguillard
71a9ae11d9 pyrevisionstore: do not abort on partial fetches
Summary:
We've seen a handful of users complaining about clone failing and not being
able to recover from it. From looking at the various reports and the
stacktraces, I believe this is caused by a flaky connection on the user end
that causes the Python code to retry the getpack calls. Before retrying, the
code will figure out what still needs fetching and this is done via the
getmissing API. When LFS pointers were fetched, the LFS blobs aren't yet
present on disk, and thus the underlying ContentStore::get_missing will a set
of keys that contain some StoreKey::Content keys. The code would previously
fail at this point, but since the key also contains the original key, we can
simply return this, the pointers might be refetched but these are fairly small.

Taking a step back from this bug, the issue really is that the retry logic is
done in code that cannot understand content-keys, and moving it to a part of
the code that understands this would also resolve the issue.

I went with the simple approach for now, but since other remote stores
(EdenAPI, the LFS one, etc) would also benefit from the retry logic, we may
want to move the logic into Rust and remove the getmissing API from the Python
exposed ContentStore.

Reviewed By: DurhamG

Differential Revision: D22425600

fbshipit-source-id: 69c2898cc302d2170cd0f206c89189c341db5278
2020-07-07 19:44:01 -07:00
Jun Wu
8e199e9a44 debugcomplete: show aliases with -v
Summary:
Make zsh_completion complete standard aliases like `checkout`.
This restores the behavior before D18463299 (54451585ce) stack.

Reviewed By: farnz

Differential Revision: D22396737

fbshipit-source-id: 745761041d6d1dec6adba2efb102e2021a01b36b
2020-07-07 16:47:45 -07:00
Chad Austin
4c0996afa5 make systemd management functions explicitly async
Summary:
Rather than dynamically allocating an event loop in the systemd async
code, make all the corresponding functions async, so the caller is
responsible for threading an event loop down.

Reviewed By: genevievehelsel

Differential Revision: D21894106

fbshipit-source-id: 398c769c30c85a3bb210dbc209f34f9f7336996c
2020-07-07 11:31:33 -07:00
Genevieve Helsel
bda44c5201 move is_system_idle to procutils
Summary: I'd eventually like to use this in the edenfs_monitor, so moving it to `proc_utils` for sharibility.

Reviewed By: chadaustin

Differential Revision: D21998763

fbshipit-source-id: 052e78fb8e58515f98eb465b8041fd0e621fc9da
2020-07-07 11:22:33 -07:00
Genevieve Helsel
cd1196bb28 move eden idleness detection to EdenFSProcess
Summary: I'd eventually like to use this in the edenfs_monitor, so I'm adding this to `proc_utils` for future ease of use.

Reviewed By: chadaustin

Differential Revision: D21987390

fbshipit-source-id: 076672b44311c2a1e0cac934c0674a18a87649af
2020-07-07 11:22:33 -07:00
Genevieve Helsel
0f9cff3b88 Add tests for ignored file with carriage return in name
Summary: On macOS, "Icon?" (aka "Icon\r") is a sometimes added. This file is weird to ignore, and should be ignored using "Icon\r\r" or "Icon[/r]". This won't be hidden with "Icon\r" or "Icon\r"

Reviewed By: chadaustin

Differential Revision: D22050682

fbshipit-source-id: 51d7d4c2414a07b959120455ae991d2425c1ea4d
2020-07-07 11:13:12 -07:00
Thomas Orozco
8c994e7682 mononoke/fastreplay: log replay success & failure counts to ODS
Summary:
I want to update the health check to stop averaging averages (like in
D22394014). To do this, I need those counters.

Reviewed By: ahornby

Differential Revision: D22410196

fbshipit-source-id: aa5cbfe6607be3b953887f1639e1de54baac7389
2020-07-07 06:41:23 -07:00
Stanislau Hlebik
886e34d17b mononoke: log size of fetched undesired files
Summary:
Just knowing the number of fetched undesired files doesn't give the full
picture. e.g. fetching lots of small files is better than fetching single
multi-Gb file.
So knowing the size of files is helpful

Reviewed By: krallin

Differential Revision: D22408400

fbshipit-source-id: 7653c1cdceccf50aeda9ce8a4880ee5178d4b107
2020-07-07 06:23:01 -07:00
Thomas Orozco
78b9dbb470 changelog: fix undefined variable
Summary:
This is causing the Mononoke and hg tests to break on an undefined variable.
This looks like it might have been a refactoring accident when the code
defining this got extracted into D22240176 (but the code using it landed in
D22240177 (ec58e72903)).

This diff unbreaks the tests by defaulting the parameter to False (which
seems coherent with the idea of that latter diff), and puts it in a place
that'll trigger a merge conflict for quark-zju when he rebases D22240176.

Reviewed By: farnz

Differential Revision: D22408588

fbshipit-source-id: 496808742a13dfeb17989123742a0aa8bae17b38
2020-07-07 02:37:42 -07:00
Ratnadeep Joshi
ce4b8122ae Some minor modifications to verbiage in documentation
Summary: [edenfs] Some minor modifications to verbiage in documentation

Reviewed By: chadaustin

Differential Revision: D22394129

fbshipit-source-id: 2d662e56d621fd5e5d5ba6de284ca3d08f8bd4e5
2020-07-06 22:20:17 -07:00
Durham Goode
f09839fdf9 merge: don't do checkunknown for files that we know are not untracked
Summary:
checkunknown is quite expensive since it has to read the contents of
every untracked file, which can be 10's of thousands of non-parallel stats and
reads. For files that don't exist in the working copy, it's just wasted work to
stat for the files at all. Status can efficiently tell us what files are
unknown, so let's use that to triage most "unknown" files to normal writes
before we even get to checkunknown.

The downside of this approach is that it makes an additional call to status,
which is not cached (only non-unknown+non-ignore+non-clean status calls are
cached). We could add more caching if this is a problem.

This doesn't help the case where a user might have 10k+ untracked files due to a
ctrl+c'd checkout, but we'll improve that in a future diff.

Reviewed By: quark-zju

Differential Revision: D22366758

fbshipit-source-id: b54fec113dc162f97a35e705ed083ddd14babe55
2020-07-06 22:10:56 -07:00
Arun Kulshreshtha
5f0181f48c Regenerate all Cargo.tomls after upgrade to futures 0.3.5
Summary: D22381744 updated the version of `futures` in third-party/rust to 0.3.5, but did not regenerate the autocargo-managed Cargo.toml files in the repo. Although this is a semver-compatible change (and therefore should not break anything), it means that affected projects would see changes to all of their Cargo.toml files the next time they ran `cargo autocargo`.

Reviewed By: dtolnay

Differential Revision: D22403809

fbshipit-source-id: eb1fdbaf69c99549309da0f67c9bebcb69c1131b
2020-07-06 20:49:43 -07:00
generatedunixname89002005307016
09cd8a3058 suppress errors in eden
Summary:
Automatic run to suppress type errors.
 #pyreupgrade

Differential Revision: D22404242

fbshipit-source-id: a7eabe6c6eb0dc9f29b3cf01f780c34fff1c6810
2020-07-06 20:31:25 -07:00
Xavier Deguillard
48d97384d0 revisionstore: fix typo in memcache trace
Summary:
The key is "hit_count" not "hits". This typo caused the trace to always claim
that no data was fetched from memcache, which is obviously not true as the
getpack trace that follows listed significantly less requested keys.

Reviewed By: kulshrax

Differential Revision: D22401592

fbshipit-source-id: ab2ea3e7f8ff3a9c7322678afc8a174e09d6dc09
2020-07-06 17:57:16 -07:00
Jun Wu
ae2f6a939c pydag: ignore nullrev or nullid from Python
Summary:
The Mercurial's concept of `null` revision (hardcoded as 20 zeros) is a
headache to special case. See https://www.mercurial-scm.org/wiki/RevsetVirtualRevisionPlan.

The Rust DAG layer cannot handle it. Make pydag drop the nullid or nullrev when
crossing the Python -> Rust boundary.

A cleaner way to handle `null` might be:

- Create a new vertex in the DAG in memory that has empty content.
  Calculate its commit hash normally. The commit is isolated from other parts
  of the commit graph. It has no parents and no children.
  The vertex has an assigned Id, which is not zero if the repo is not empty.
- Assign the `null` special name (like how we do for `tip`) to the commit.
- Remove all hard-code special cases of the 20-zero `nullid`.

That would allow things like `hg up null`, `hg diff -r null -r X` to continue
work without special casing it in the commit graph layer.

Reviewed By: sfilipco

Differential Revision: D22240188

fbshipit-source-id: 707af47cbf36a7df60097a17d69094aae89d3250
2020-07-06 15:51:00 -07:00
Jun Wu
e89ef0df78 changelog: replace index2 with the new Rust structure
Summary:
The new Rust structure has enough features to replace index2.

This will eventually allow us to delete index2 related logic (namely,
pyrevlogindex).

Reviewed By: sfilipco

Differential Revision: D22240178

fbshipit-source-id: 1af9e6045c8d8d1a220a6abad6d33b129a3afa70
2020-07-06 15:51:00 -07:00
Jun Wu
b1163b2553 repair: invalidate changelog after strip
Summary:
The old revlog C code works with strip. The new Rust code is never designed to
work with strip (especially the segmented changelog does not support strip).

In the strip code path, just reload the changelog after strip to avoid issues.

Reviewed By: sfilipco

Differential Revision: D22323188

fbshipit-source-id: c4f790c66372c28a71173cf16910ad1d7cb89223
2020-07-06 15:51:00 -07:00
Jun Wu
2b67699ca1 treemanifest: do not prefetch anything for 'debugstrip'
Summary:
Strip is a special case. `tr.changes["revs"]` can contain invalidated
revision numbers. Do not use it.

Reviewed By: sfilipco

Differential Revision: D22240179

fbshipit-source-id: 6b9d29e099f821a7fc7aa6055dc8eccf4597ffd0
2020-07-06 15:51:00 -07:00
Jun Wu
ec58e72903 changelog: load Rust HgCommits structure
Summary:
This exposes the Rust HgCommits object. It will replace changelog operations
gradually.

This diff only makes changelog load the new Rust HgCommits structure, and
maintain it side-by-side with the original data structures when there are
changes. It does not replace real changelog operations yet.

Reviewed By: sfilipco

Differential Revision: D22240177

fbshipit-source-id: b585c1585defdc133d2b9ef2fda4aea8702152bf
2020-07-06 15:51:00 -07:00
Jun Wu
b1ae5b2874 revlogindex: skip flushing duplicated entries
Summary:
If the revlog on disk was changed to include new commits, read them and avoid
writing duplicated commits (which breaks nodemap building).

Reviewed By: sfilipco

Differential Revision: D22323187

fbshipit-source-id: cdd65f31e65865d9f3868e43416633297896c0f9
2020-07-06 15:51:00 -07:00
Jun Wu
0c91746cc7 pydag: use trait object abstractions
Summary:
Change pydag from using concreate `namedag` and `memnamedag` to trait objects:

- `commits`: High-level read-write commits storage, supports Rust `HgCommits`
  (segmented changelog), `MemHgCommits`, and `RevlogCommits`.
- `dagalgo`: maps to the `DagAlgorithm` Rust trait.
- `idmap`: maps to the `IdConvert + PrefixLookup` Rust traits.

The idea is that we move the revlog / segmented changelog difference from Python
to behind Rust trait objects so the Python code looks overall cleaner, the Rust
revset alternative gets exercised early, and switching from revlog to segmented
changelog becomes easier.

Reviewed By: sfilipco

Differential Revision: D21796242

fbshipit-source-id: 3a4a3ff3d9e7e46059d1ed3461a55003c352e82d
2020-07-06 15:51:00 -07:00
Jun Wu
e53be6d0fc hgcommits: make HgCommit serializable
Summary: This is used by the next diff.

Reviewed By: sfilipco

Differential Revision: D21944139

fbshipit-source-id: 184c4e97aaeca36c3608665defd1473c9300fb5b
2020-07-06 15:51:00 -07:00
Jun Wu
132f046f11 hgcommits: add revlog-based commits
Summary: Use `RevlogIndex` to implement the HgCommits interface.

Reviewed By: sfilipco

Differential Revision: D21854226

fbshipit-source-id: 05ff242858ac879d3b40b35ba8db5044135604be
2020-07-06 15:50:59 -07:00
Jun Wu
fa62891e30 hgcommits: add in-memory version of HgCommits
Summary: This will satisfy some use-cases.

Reviewed By: sfilipco

Differential Revision: D21854225

fbshipit-source-id: 76758716b35cfd31dc3843c118917c0fb7609027
2020-07-06 15:50:59 -07:00
Jun Wu
ea67a2168e hgcommits: new crate for hybrid commit data + dag structure
Summary: This will help move more Python logic to Rust.

Reviewed By: sfilipco

Differential Revision: D21854224

fbshipit-source-id: b03cbacedc11d77e8c56262437a8d10bd9a89e59
2020-07-06 15:50:59 -07:00
Jun Wu
a0c5b1b3a5 revlogindex: is_ancestor(x, x) should return true
Summary: This is discovered by using it in Python world.

Reviewed By: sfilipco

Differential Revision: D22323186

fbshipit-source-id: 295811e0950b94ad2ad73ad242228b6a3f9765d0
2020-07-06 15:50:59 -07:00
Jun Wu
52668752d8 revlogindex: de-duplicate insertions
Summary: Adding a same commit multiple times is a no-op.

Reviewed By: sfilipco

Differential Revision: D22323190

fbshipit-source-id: 61a06335581a9cad32dc7e929b841ec69b551a9c
2020-07-06 15:50:59 -07:00
Jun Wu
b86f3bd6e2 revlogindex: use tests from the dag crate
Summary: This adds some test coverage for the revlog DagAlgorithm implementation.

Reviewed By: sfilipco

Differential Revision: D22249157

fbshipit-source-id: a1d347b4d90d0e7f8fb229c317cc75c2b8e16242
2020-07-06 15:50:59 -07:00
Jun Wu
fcbe821dd1 revlogindex: impl DagAddHeads for RevlogIndex
Summary:
This makes RevlogIndex compatible with the generic DAG testing API from the dag
crate.

Reviewed By: sfilipco

Differential Revision: D22249156

fbshipit-source-id: 54a3c458e85804968964174eab674e494a6fa8a2
2020-07-06 15:50:59 -07:00
Jun Wu
cf1bc37007 dag: avoid using > 2 parents in generic DAG tests
Summary: Some DAG implementations does not support it.

Reviewed By: sfilipco

Differential Revision: D22249158

fbshipit-source-id: ebcdf164677ee647ef44aa1ee3cfd318bac658b0
2020-07-06 15:50:59 -07:00
Jun Wu
9a17be7ce0 dag: do not test the order of vertexes in generic tests
Summary:
Different implementation might return different orders. They should be
considered correct.

Reviewed By: sfilipco

Differential Revision: D22249159

fbshipit-source-id: 36e4cadf814366f7ee2ed8a778948ff810760550
2020-07-06 15:50:58 -07:00
Jun Wu
f24dc621cb dag: make part of the tests generic
Summary: This makes it possible to run tests for other DAGs, like the revlog.

Reviewed By: sfilipco

Differential Revision: D22249155

fbshipit-source-id: 205579eeaccd42a21297d965973957168bb8726e
2020-07-06 15:50:58 -07:00
Jun Wu
5d9baa2f07 revlogindex: implement fast path for only
Summary:
For revlog, calculating `only` can have some fast paths that do not scan the
entire changelog.

Reviewed By: sfilipco

Differential Revision: D21944136

fbshipit-source-id: 58391636350f8f19643d59c46d663f55861d6de3
2020-07-06 15:50:58 -07:00
Jun Wu
7d75f6046f revlogindex: implement fast path for only_both
Summary:
This will be used to maintain narrow-heads phase calculation and sunsetting the
revlog-specific changelog.index2.

Reviewed By: sfilipco

Differential Revision: D21944131

fbshipit-source-id: a8bbd1fd24546f4891ffa677476bff750c3faf5f
2020-07-06 15:50:58 -07:00
Jun Wu
40cd0f8f06 revlogindex: fix an offset-by-one error
Summary: The values of `pending_nodes_index` should start from 0 instead of 1.

Reviewed By: sfilipco

Differential Revision: D21944133

fbshipit-source-id: a2a332868f16b398037289c81bf8076d1400c0a7
2020-07-06 15:50:58 -07:00
Jun Wu
bd0a35f2a0 revlogindex: do not raise errors on ambiguous prefix
Summary: This matches the interface of segmented changelog.

Reviewed By: sfilipco

Differential Revision: D21944134

fbshipit-source-id: 75f68b2838de4abe95f13cb3c62dc68af132fff7
2020-07-06 15:50:58 -07:00
Jun Wu
4670200c21 revlogindex: maintain revlog.d file handler transparently
Summary:
This drops the `file` parameter from the `raw_data` API, making
RevlogIndex easier to use.

Reviewed By: sfilipco

Differential Revision: D21854228

fbshipit-source-id: 259726524d1cc6a1f9d00783e22f9502c7decdeb
2020-07-06 15:50:58 -07:00
Jun Wu
137fa3cd34 revlogindex: implement writing to revlog data
Summary: Extend RevlogIndex to support writing to revlog data.

Reviewed By: sfilipco

Differential Revision: D21854227

fbshipit-source-id: 11b6bf3b706b316f23c33ab07144530c9db92d58
2020-07-06 15:50:58 -07:00
Jun Wu
f005d92f07 revlogindex: implement reading from revlog data
Summary: Extend RevlogIndex to support reading from revlog data.

Reviewed By: sfilipco

Differential Revision: D21854229

fbshipit-source-id: 4cbc08762fd236a97370d5d55c59a222f935b262
2020-07-06 15:50:58 -07:00
Jun Wu
2bc4dd01ca dag: add a trait to convert IdSet to Set
Summary:
The reverse `to_id_set` exists.
It turns out that the Python land wants this in many places.

Reviewed By: sfilipco

Differential Revision: D22240175

fbshipit-source-id: b6a3a3a3869dc0c521a21b1d86394421b816632b
2020-07-06 15:50:58 -07:00
Jun Wu
07b3d60f80 dag: add "only(x, y)" to DagAlgorithm
Summary:
This provides a way for implementations to optimize the operation.

For segmented changelog, the default implementation is good enough.

For revlog, `only` can have a fast path that does not iterate through the
entire changelog.

A related API `only_both` is added. For revlog it has multiple use-cases,
including narrow-heads phase calculation and revlog.findcommonmissing used by
discovery.

Reviewed By: markbt

Differential Revision: D21944132

fbshipit-source-id: d11660dae85ea6158977eb00d1ceaceddf1d8234
2020-07-06 15:50:57 -07:00
Thomas Orozco
7e8c9174be mononoke/admin: add a filestore fetch subcommand
Summary:
Sometimes you want to fetch a file. Using curl and the LFS server works, but
this really should be part of Mononoke admin.

Reviewed By: ikostia

Differential Revision: D22397472

fbshipit-source-id: 17decf4aa2017a2c1be52605a254692f293d1bcd
2020-07-06 14:56:08 -07:00
Thomas Orozco
46def15c4f mononoke/admin: fix filestore store subcommand
Summary:
This got broken when we moved to Tokio 0.2. Let's fix it and add a test to make
sure it does not regress.

Reviewed By: ikostia

Differential Revision: D22396261

fbshipit-source-id: a8359aee33b4d6d840581f57f91af6c03125fd6a
2020-07-06 14:56:08 -07:00
Arun Kulshreshtha
7ae5344bb2 edenapi_types: improve DataEntry hash check API
Summary: Use `thiserror` to provide a more ergonomic API for `DataEntry` hash checking. The `.data()` method now simply returns a `Result` rather than a tuple with an ad-hoc enum.

Reviewed By: quark-zju

Differential Revision: D22376164

fbshipit-source-id: fc39cb212ec1ee5830292db4aa5eca18f2c16a2b
2020-07-06 14:47:48 -07:00
Durham Goode
f02c7e85f0 py3: fix streampager progress bar
Summary:
The streaming pager expects bytes as inputs, but we were sending
strings to the progress buffer. This fixes it.

Reviewed By: quark-zju

Differential Revision: D22394395

fbshipit-source-id: 4acbfc08ca624ca3c794e6e369df669e370e5b42
2020-07-06 14:07:59 -07:00
Durham Goode
dc09a79a6b py3: fix checkout with conflicts in eden
Summary: It now works

Reviewed By: quark-zju

Differential Revision: D22394103

fbshipit-source-id: e525192079bef66d36f731c754c4b73f5fc7cb11
2020-07-06 14:07:59 -07:00
Jun Wu
253584bf2e test-generaldelta: disable modern features
Summary:
The test relies on Python revlog implementation details which
do not exist in the Rust revlog implementation.

Reviewed By: DurhamG

Differential Revision: D22240183

fbshipit-source-id: b245b35e561c3364618a0e199244df030cc47942
2020-07-06 14:04:28 -07:00
Jun Wu
1a5de54ab0 test-bookmarks-strip: do not use debugstrip
Reviewed By: DurhamG

Differential Revision: D22240182

fbshipit-source-id: b0bf34e84f46a0593b6390c6c97a47110f8d94d2
2020-07-06 14:04:28 -07:00
Jun Wu
ddfb3c5dba test-backout: rewrite the test in a basic form
Summary:
The original test is unmaintainable. Rewrite it to test key features.

I dropped detailed tests about merge conflict / content handling.
In the future we probalby will have a clean Rust implementation of "applying
diff between X and Y to Z" which can replace various unmaintainable patch
application logic in Python. We can test that Rust library extensively and
commands will just use the clean library (ex. revert, backout)

Reviewed By: sfilipco

Differential Revision: D22240184

fbshipit-source-id: 4d6c65fe02ccc92e64c62a48f702187678973086
2020-07-06 14:04:28 -07:00
Jun Wu
24ab8f5b4b test-amend-split: do not use hg debugstrip
Summary:
`debugstrip` is an operation that depends on multiple legacy components (revlog
strip, truncate-based transaction).  They are incompatible with modern configs
(no truncation, heads-based visibility, metalog-based transaction).
Avoid using it in the test.

Reviewed By: DurhamG

Differential Revision: D22240187

fbshipit-source-id: ec215d75fb766957a3d6f58e491ef815f5bedbdc
2020-07-06 14:04:28 -07:00
Jun Wu
4da3addbee tests: migrate off rev numbers for more tests
Summary:
Change by `fix-revnum.py`. Part of the tests using `hg debugstrip`,
which I'm trying to avoid.

Reviewed By: DurhamG

Differential Revision: D22240181

fbshipit-source-id: a569b712fe4b985378e5c61c000deecccefbc488
2020-07-06 14:04:28 -07:00
Jun Wu
ac60550de6 commands: disable rollback unconditionally
Summary:
Since tests do not use it, and we have long disabled it in production.
Let's just disable the command unconditionally.

Reviewed By: DurhamG

Differential Revision: D22368834

fbshipit-source-id: 7ebc5b07c4044b6809defc06437cda7256cb2ebf
2020-07-06 14:04:27 -07:00
Jun Wu
24363ad52f tests: remove usage of hg rollback
Summary:
`hg rollback` was long disabled in production setup. It has weird behavior and
is likely incompatible with modern transaction frameworks. Remove its usage in
tests.

Reviewed By: DurhamG

Differential Revision: D22240180

fbshipit-source-id: 453684ebbc77132e09b1b717b6ad1e106dcad214
2020-07-06 14:04:27 -07:00
Jun Wu
dabce28285 repoview: further remove repoview references
Summary:
Since repoview is removed, those concetps are useless. Therefore remove them.

This includes:
- repo.unfiltered(), repo.filtered(), repo.filtername
- changelog.filteredrevs
- error.FilteredIndexError, error.FilteredLookupError,
  error.FilteredRepoLookupError
- repo.unfilteredpropertycache, repo.filteredpropertycache,
  repo.unfilteredmethod
- index.headsrevsfiltered

Reviewed By: DurhamG

Differential Revision: D22367600

fbshipit-source-id: d133b8aaa136176b4c9f7f4b0c52ee60ac888531
2020-07-06 14:04:27 -07:00
Jun Wu
021fa7eba5 repoview: remove repoview layer
Summary:
End-users have been using visibleheads + narrowheads for a while, and hgsql
does not require any filtering, and most tests are migrated to modern configs
(visibility + narrow heads). Now it's time to consider removing the repoview
layer.

This removes complexities around `changelog.filteredrevs` and various different
`repoview` objects with caching problems (ex. I have seen that `repo` and `unfi`
have inconsistent phasecache causing they calculate phases differently and it's
quite hard to reason about confidently).

This will also make it easier to migrate to segmented changelog.

Reviewed By: DurhamG

Differential Revision: D22201084

fbshipit-source-id: 3661c26dd72a64b5005d86e164af4da5a6895649
2020-07-06 14:04:27 -07:00
Kostia Balytskyi
6d5b3ac1f2 live_commit_sync_config: add versions accessors
Summary:
This diff adds two new bits of functionality to `LiveCommitSyncConfig`:
- getting all possible versions of `CommitSyncConfig` for a given repo
- getting `CommitSyncConfig` for a repo by version name

These bits are meant to be used in:
- `commit_validator` and `bookmarks_validator`, which would
  need to run validation against a specific config version
- `mononoke_admin`, which would need to be able to query all versions,
  display the version used to sync two commits and so on

Reviewed By: StanislavGlebik

Differential Revision: D22235381

fbshipit-source-id: 42326fe853b588849bce0185b456a5365f3d8dff
2020-07-06 14:00:36 -07:00
Jun Wu
ef913a9914 debugshell: ignore error setting up commitcloud service
Summary:
For various reasons (ex. wrong configs like investigating test repos) the
initialization can fail. Ignore them.

Reviewed By: DurhamG

Differential Revision: D22368942

fbshipit-source-id: ae01dcc499f63f373b0f7bec00554ea8074ae7cf
2020-07-06 12:54:52 -07:00
Thomas Orozco
ce0af2d591 mononoke/virtually_sharded_blobstore: deduplicate puts based on data being put
Summary:
This updates the virtually_sharded_blobstore to deduplicate puts only if the
data being put is actually the data we have put in the past. This is done by
keeping track of the hash of things we've put in the presence cache.

This has 2 benefits:

- This is safer. We only dedupe puts we 100% know succeeded (because this
  particular instance was the one to attempt the put).
- This is creates less surprises, notably it lets us overwrite data in the
  backing store (if we are writing something different).

Reviewed By: StanislavGlebik

Differential Revision: D22392809

fbshipit-source-id: d76a49baa9a5749b0fb4865ee1fc1aa5016791bc
2020-07-06 12:10:46 -07:00
Thomas Orozco
19b31ead9d mononoke/virtually_sharded_blobstore: make race tests a little more forgiving
Summary:
Running those on my devserver, I noticed they can be a bit flaky. They're are
racy on the purpose, but let's relax them a bit.

We have a lot of margin here — our blobstore is rate limited at once request
every 10ms, and we need to do 100 requests (the goal is to show that they don't
all wait), so 100ms is fine to prove that they're not rate limited when sharing
the same data.

Reviewed By: StanislavGlebik

Differential Revision: D22392810

fbshipit-source-id: 2e3c9cdf19b0e4ab979dfc000fbfa8da864c4fd6
2020-07-06 12:10:46 -07:00
Kostia Balytskyi
f223ca6e6e synced commit mapping: expose version in get query
Summary:
When we look up how a commit was synced, we frequently need to know which version of `CommitSyncConfig` was used to sync it. Specifically, this is useful for admin tooling and commit validator, which I am planning to migrate to use versioned `CommitSyncConfig` in the near future.

Later I will also include this information into `RewrittenAs` variant of `CommitSyncOutcome`, so that we expose it to real users. I did not do it in this diff to keep is small and easy to review. And because the other part is not ready :P

Reviewed By: StanislavGlebik

Differential Revision: D22255785

fbshipit-source-id: 4312e9b75e2c5f92ba018ff9ed9149efd3e7b7bc
2020-07-06 11:23:31 -07:00
Mateusz Kwapich
7b3aa42459 fix the problem with ordering in into_response
Summary: When I've implemented this method I didn't test it for preserving the order of the input changesets and I've noticed my mistake when I was testing the scmquery part.

Reviewed By: StanislavGlebik

Differential Revision: D22374981

fbshipit-source-id: 4529f01370798377b27e4b6a706fc192a1ea928e
2020-07-06 08:32:03 -07:00
Mark Thomas
bcaaba1e9c add list-bookmarks command
Summary:
Add the `scsc list-bookmarks` command, which lists bookmarks in a repository.

If a commit id is also provided, `list-bookmark` will be limited to bookmarks that
point to that commit of one of its descendants.

Reviewed By: mitrandir77

Differential Revision: D22361240

fbshipit-source-id: 17067ba47f9285b8137a567a70a87fadcaabec80
2020-07-06 07:01:24 -07:00
Thomas Orozco
07907b2b26 mononoke/virtually_sharded_blobstore: merge in the context_concurrency_blobstore
Summary:
There is inevitably interaction between caching, deduplication and rate
limiting:

- You don't want the rate limiting to be above caching (in the blobstore stack,
  that is), because you shouldn't rate limits cache hits (this is where we are
  today).
- You don't want the rate limiting to below deduplication, because then you get
  priority inversion where a low-priority rate-limited request might hold the
  semaphore while a higher-priority, non rate limited request wants to do the
  same fetch (we could have moved rate limiting here prior to introducing
  deduplication, but I didn't do it earlier because I wanted to eventually
  introduce deduplication).

So, now that we have caching and deduplication in the same blobstore, let's
also incorporate rate limiting there!.

Note that this also brings a potential motivation for moving Memcache into this
blobstore, in case we don't want rate limiting to apply to requests before they
go to the _actual_ blobstore (I did not do this in this diff).

The design here when accessing the blobstore is as follows:

- Get the semaphore
- Check if the data is in cache, if so release the semaphore and return the
  data.
- Otherwise, check if we are rater limited.

Then, if we are rate limited:

- Release the semaphore
- Wait for our turn
- Acquire the semaphore again
- Check the cache again (someone might have put the data we want while we were
  waiting).
    - If the data is there, then return our rate limit token.
    - If the data isn't there, then proceed to query the blobstore.

If we aren't rate limited, then we just proceed to query the blobstore.

There are a couple subtle aspects of this:

- If we have a "late" cache hit (i.e. after we waited for rate limiting), then
  we'll have waited but we won't need to query the blobstore.
    - This is important when a large number of requests from the same key
      arrive at the same time and get rate limited. If we don't do this second
      cache check or if we don't return the token, then we'll consume a rate
      limiting token for each request (instead of 1 for the first request).
- If a piece of data isn't cacheable, we should treat it like a cache hit with
  regard to semaphores (i.e. release early), but like a miss with regard to
  rate limits (i.e. wait).

Both of those are addressed captured in the code by returning the `Ticket` on a
cache hit. We can then choose to either return the ticket on a cache hit, or wait
for it on a cache miss.

(all of this logic is captured in unit tests, we can remove any of the blocks
there in `Shards::acquire` and a test will fail)

Reviewed By: farnz

Differential Revision: D22374606

fbshipit-source-id: c3a48805d3cdfed2a885bec8c47c173ee7ebfe2d
2020-07-06 04:38:31 -07:00
Thomas Orozco
6153dab328 mononoke/async_limiter: add support for cancelling access
Summary:
Sometimes we take a token then realize we don't want it. In this case, giving it back is convenient.

This adds this!

Reviewed By: farnz

Differential Revision: D22374607

fbshipit-source-id: ccf47e6c75c37d154704645c9e826f514d6f49f6
2020-07-06 04:38:31 -07:00
Kostia Balytskyi
b7cf1dcbdb x-repo sync job: use LiveCommitSyncConfig
Summary:
This is a mirror image of a diff, which made backsyncer use `LiveCommitSyncConfig`: we want to use configerator-based live configs, when we run in the continuous tailing mode.

As no-op iteration time used to be 10s and that's a bit wasteful for tests, this diff changes it to be configurable.

Finally, because of instantiating various additional `CommitSyncerArgs` structs, this diff globs out some of the `using repo` logs (which aren't very useful as test signals anyway, IMO).

Reviewed By: StanislavGlebik

Differential Revision: D22209205

fbshipit-source-id: fa46802418a431781593c41ee36f468dee9eefba
2020-07-03 13:36:18 -07:00
Arun Kulshreshtha
1b5283aa5a edenapi_types: improve comments
Summary: Tidy up some comments in this file.

Reviewed By: ikostia

Differential Revision: D22376165

fbshipit-source-id: ce4760776048aa8e72809b4f828d0ea426fcf878
2020-07-03 12:29:19 -07:00
Arun Kulshreshtha
dc98c085ad edenapi_types: split redaction tombstone string across multiple lines
Summary: Make this line less long.

Reviewed By: ikostia

Differential Revision: D22372492

fbshipit-source-id: cfc1ab6a296aa2056a908bf786e4f498f3a688b4
2020-07-03 12:15:01 -07:00
Stanislau Hlebik
2cfc23770c mononoke: use override_blame_filesize_limit option
Summary: This diff actually start to use the option

Reviewed By: krallin

Differential Revision: D22373943

fbshipit-source-id: fe23da9c3daa1f9f91a5ee5e368b33e0091aa9c1
2020-07-03 09:58:46 -07:00
Stanislau Hlebik
06f2a420d1 mononoke: correctly return BlameError::Rejected
Summary:
Previously if a blame request was rejected (e.g. because a file was too large)
then we returned BlameError::Error.

This doesn't look correct, because there's BlameError::Rejected. This diff
makes it so that fetch_blame function returns BlameError::Rejected

Reviewed By: aslpavel

Differential Revision: D22373948

fbshipit-source-id: 4859809dc315b8fd66f94016c6bd5156cffd7cc2
2020-07-03 09:58:46 -07:00
Stanislau Hlebik
2a732f2626 mononoke: pass BlobRepo in fetch_full_file_content
Summary:
In the next diffs we'll need to read override_blame_filesize_limit from derived
data config, and this config is stored in BlobRepo.

this diff makes a small refactoring to pass BlobRepo to fetch_full_file_content

Reviewed By: krallin

Differential Revision: D22373946

fbshipit-source-id: b209abce82c0279d41173b5b25f6761659a92f3d
2020-07-03 09:58:46 -07:00
Stanislau Hlebik
b703f11685 mononoke: asyncify fetch_full_file_content
Summary: This will make adding blame file size limit override the next diffs easier

Reviewed By: krallin

Differential Revision: D22373945

fbshipit-source-id: 4857e43c5d80596340878753ea90bf31d7bb3367
2020-07-03 09:58:46 -07:00
Mateusz Kwapich
f21b459c99 remove dependency on bounded_traversal
Summary:
We're always yielding zero or one child during traversal, bounded traversal is
unnecessary here

Differential Revision: D22242148

fbshipit-source-id: b4c8a1279ef7bd15e9d0b3b2063683f45e30a97a
2020-07-03 08:02:25 -07:00
Mateusz Kwapich
7ff7c931a8 add option for limiting the log to descendants of single node
Summary:
Let's use new option in CLI. Unfortunately we can't easily accept commit ids in
named params so it has to be a postional one.

Differential Revision: D22234412

fbshipit-source-id: a9c27422fa65ae1c42cb1c243c7694507a957437
2020-07-03 08:02:25 -07:00
Thomas Orozco
de731a89fc mononoke/virtually_sharded_blobstore: log deduplicated puts
Summary:
If anything were to go wrong, we'd be happy to know which puts we ignored. So,
let's log them.

Reviewed By: farnz

Differential Revision: D22356714

fbshipit-source-id: 5687bf0fc426421c5f28b99a9004d87c97106695
2020-07-03 05:53:11 -07:00
Thomas Orozco
be1bac6c06 mononoke/virtually_sharded_blobstore: expose this in cmdlib
Summary:
Eventually, I plan to make this the default, but for now I'd like to make it
something we can choose to turn on or off as a cmd argument (so we can start
with the experimental tier and Fastreplay).

Note that this mixes volatile vs. non-volatile pools when accessing the pools
for cacheblob. In practice, those pools are actually volatile, it's just that
things don't break if you access them as non-volatile.

Reviewed By: farnz

Differential Revision: D22356537

fbshipit-source-id: 53071b6b21ca5727d422e10f685061c709114ae7
2020-07-03 05:53:11 -07:00
Thomas Orozco
c68100f46e mononoke/virtually_sharded_blobstore: spawn before taking semaphores
Summary:
I canaried this on Fastreplay, but unfortunately that showed that sometimes we
just deadlock, or get so slow we might as well be deadlocked (and it happens
pretty quickly, after ~20 minutes). I tried spawning all the `get()` futures,
and that fixes the problem (but it makes gettreepack noticeably slower), so
that suggests something somewhere is creating futures, polling them a little
bit, then never driving them to completion.

For better or worse, I'd experienced the exact same problem with the
ContextConcurrencyBlobstore (my initial attempt at QOS, which also used a
semaphore), so I was kinda expecting this to happen.

In a sense, this nice because I we've suspected there were things like that in
the codebase for a while (e.g. with the occasional SQL timeout we see where it
looks like MySQL responds fast but we don't actually poll it until past the
timeout), and it gives us a somewhat convenient repro.

In another sense, it's annoying because it blocks this work :)

So, to work around the problem, for now, let's spawn futures to force the work
to complete when a semaphore is held. I originally had an unconditional spawn
here, but that is too expensive for the cache-hit code path and slows things
down (by about ~2x).

However, having it only if we'll query the blobstore isn't not as expensive,
and that seems to be fine (in fact it is a ~20% p99 perf improvement,
though the exact number depends on the number of shard we use for this, which I've had to tweak a bit).

https://pxl.cl/1c18H

I did find what I think is one potential instance of this problem in
`bounded_traversal_stream`, which is that we never try to poll `scheduled` to
completion. Instead, we just poll for the next ready future in our
FuturesUnordered, and if that turns out to be synchronous work then we'll just
re-enqueue more stuff (and sort of starve async work in this FuturesUnordered).

I tried updating bounded traversal to try a fairer implementation (which polls
everything), but that wasn't sufficient to make the problem go away, so I think
this is something we have to just accept for now (note that this actually has
some interesting perf impact in isolation: it's a free ~20% perf improvement on
p95+: https://pxl.cl/1c192

see 976b6b92293a0912147c09aa222b2957873ef0df if you're curious

Reviewed By: farnz

Differential Revision: D22332478

fbshipit-source-id: 885b84cda1abc15c51fbc5dd34473e49338e13f4
2020-07-03 05:53:11 -07:00
Thomas Orozco
2082621d51 mononoke/virtually_sharded_blobstore: add ODS metrics
Summary: Those are useful to track.

Reviewed By: farnz

Differential Revision: D22332480

fbshipit-source-id: 43f5cd7121c4aa497d961015e7c16973615798d1
2020-07-03 05:53:10 -07:00
Thomas Orozco
1db62473f2 mononoke/virtually_sharded_blobstore: track perf counters
Summary: Like it says in the title. Those are useful!

Reviewed By: farnz

Differential Revision: D22332479

fbshipit-source-id: f9bddad75fcbed2593c675f9ba45965bd87f1575
2020-07-03 05:53:10 -07:00
Thomas Orozco
c297024a52 mononoke/virtually_sharded_blobstore: do not delay reads for uncacheable data
Summary:
The goal of this blobstore is to dedupe reads by waiting for them to finish and
hit cache instead (and also to dedupe writes, but that's not relevant here).

However, this is not a desirable feature if a blob cannot be stored in cache,
because then we're serializing accesses for no good reason. So, when that
happens, we store "this cannot be stored in cache", and we release reads
immediately.

Reviewed By: farnz

Differential Revision: D22285269

fbshipit-source-id: be7f1c73dc36b6d58c5075172e5e3c5764eed894
2020-07-03 05:53:10 -07:00
Thomas Orozco
b9319a4d32 mononoke/virtually_sharded_blobstore: add a newtype for cache keys + a prefix
Summary:
I'm going to store things that aren't quite the exact blobs in here, so on the
off chance that we somehow have two caching blobstores (the old one and this
one) that use the same pools, we should avoid collisions by using a prefix.

And, since I'm going to use a prefix, I'm adding a newtype wrapper to not use
the prefixed key as the blobstore key by accident.

Differential Revision: D22285271

fbshipit-source-id: e352ba107f205958fa33af829c8a46896c24027e
2020-07-03 05:53:10 -07:00
Thomas Orozco
bf3c2e19f0 mononoke/virtually_sharded_blobstore: a caching blobstore that deduplicates
Summary:
This introduces a caching blobstore that deduplicates reads and writes. The
underlying motivation is to improve performance for processes that might find
themsleves inadvertently reading the same data concurrently from a bunch of
independent callsites (most of Mononoke), or writing the same bit of data over
and over again.

The latter is particularly useful for things like commit cloud backfilling in
WWW, where some logger commits include the same blob being written hundreds or
thousands of times, and cause us to overload the underlying Zippy shard in
Manifold. This is however a problem we've also encountered in the past in e.g.
the deleted files manifest and had to solve there. This blobstore is a little
different in the sense that it solves that problem for all writers.

This comes at the cost of writes being dropped if they're known to be
redundant, which prevents updates through this blobstore. This is desirable for
most of Mononoke, but not all (notably, for skiplist updates it's not great).

For now, I'm going to add this behind an opt-in flag, and later on I'm planning
to make it opt-out and turn it off there (I'm thinking to use the CoreContext
for this).

Reviewed By: farnz

Differential Revision: D22285270

fbshipit-source-id: 4e3502ab2da52a3a0e0e471cd9bc4c10b84a3cc5
2020-07-03 05:53:10 -07:00
Kostia Balytskyi
f210326656 blobstore_healer: log the speed with which queue rows are deleted
Summary: This allowed me to compare two alternative approaches to queue draining, and generally seems like a useful thing to do.

Reviewed By: krallin

Differential Revision: D22364733

fbshipit-source-id: b6c76295c85b4dec6f0bfd7107c30bb4e4a28942
2020-07-03 05:09:56 -07:00
Johan Schuijt-Li
2b69716461 push compat() down one level from main
Summary: Migrate to new-style futures

Reviewed By: ikostia

Differential Revision: D22365232

fbshipit-source-id: 08ddd50be1c34fe90a453f369cea2e45323b63db
2020-07-03 02:36:09 -07:00
Stanislau Hlebik
2d24ddf2e1 mononoke: add --all-types to backfill_derive_data single
Summary: It's useful to derive all enabled derived data at once

Reviewed By: krallin

Differential Revision: D22336338

fbshipit-source-id: 54bc27ab2c23c175913fc02e6bf05d18a54c249c
2020-07-03 00:20:58 -07:00
Stanislau Hlebik
2a54f281f2 mononoke: add an option to perform a stack move in megarepotool
Summary:
We've recently added an option to perform a stack move in megarepolib. A "stack
move" it's a stack of commits that move a files according to a mover. Now let's
expose it in the megarepotool

Reviewed By: ikostia

Differential Revision: D22312486

fbshipit-source-id: 878d4b2575ed2930bbbf0b9b35e51bb41393e622
2020-07-03 00:18:41 -07:00
Jun Wu
fe1aadc238 util: do not print large dict for smarttraceback
Summary:
Printing it can take too much time.

Use a larger threshold than `list` or `set`,  since `hg` commonly uses a dict
for command line options, and that can exceeds 8 items.

As we're here, also fix the traceback order so it's "most recent call last".

Reviewed By: kulshrax

Differential Revision: D22362003

fbshipit-source-id: 3d2f4664bec6b4cfaf42b8e5d2fc47b0f3d96411
2020-07-02 19:19:10 -07:00
Lukas Piatkowski
c763ab4b40 eden/scm: provide getdeps.py way of building eden/scm on GitHub
Summary:
In order to do what the title says, this diff does:
1. Add the `eden/oss/.../third-party/rust/.../Cargo.toml` files. As mentioned in the previous diff, those are required by GitHub so that the third party dependencies that are local in fbsource are properly defined with a "git" dependency in order for Cargo to "link" crates properly.
2. Changes to `eden/scm/Makefile` to add build/install commands for getdeps to invoke. Those command knowing that they are called from withing getdeps context they link the dependencies brought by getdeps into their proper places that match their folder layout in fbsource. Those Makefile commands also pass a GETDEPS_BUILD env to the setup.py invocations so that it knows it is being called withing a getdeps build.
3. Changes to `eden/scm/setup.py` that add "thriftasset" that makes use of the getdeps.py provided "thrift" binary to build .py files out of thrift files.
4. Changes to `distutils_rust` to use the vendored crates dir provided by getdeps.
5. Changes to `getdeps/builder.py` and `getdeps/manifest.py` that enable more fine-grained configuratior of how Makefile builds are invoked.
6. Changes to `getdeps/buildopts.py` and `getdeps/manifest.py` to disable overriding PATH and pkgconfig env, so that "eden/scm" builds in getdeps using system libraries rather than getdeps-provided ones (NOTE: I've tried to use getdeps provided libraries, but the trickiest bit was that Rust links with Python, which is currently not providable by getdeps, so if you try to build everything the system provided Python libraries will collide with getdeps provided ones)
7. Added `opensource/fbcode_builder/manifests/eden_scm` for the getdeps build.

Reviewed By: quark-zju

Differential Revision: D22336485

fbshipit-source-id: 244d10c9e06ee83de61e97e62a1f2a2184d2312f
2020-07-02 17:53:37 -07:00
Stefan Filip
232caa0cd6 tests: update test-*ruststores-repack for Mononoke
Summary:
The test case assumed that clone would return data in order from the server.
That is not a valid assumption and Mononoke doesn't return data in order.

Reviewed By: xavierd

Differential Revision: D22364636

fbshipit-source-id: abfcbe0074a08c9a76c42d351ce5c792eb65e24f
2020-07-02 14:37:41 -07:00