Summary: The backend is designed to be used by the "debugsegmentclone" command, which does not write revlog.
Reviewed By: sfilipco
Differential Revision: D25624786
fbshipit-source-id: e145128c7b41d78fed495f8da540169f741b674d
Summary: This makes it possible to add new commits in a repo without revlog.
Reviewed By: sfilipco
Differential Revision: D25602527
fbshipit-source-id: 56c27a5f00307bcf35efa4517c7664a865c47a43
Summary:
We want to start disallowing non-approved config files from being
loaded. To do that, let's update the config verifier to accept an optional list
of allowed locations. If it's provided, we delete any values that came from a
disallowed location.
This will enable us to prune our config sources down to rust configs,
configerator configs, .hg/hgrc, and ~/.hgrc.
Reviewed By: quark-zju
Differential Revision: D25539738
fbshipit-source-id: 0ece1c7038e4a563c92140832edfa726e879e498
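The pruning rule described in D25539738 can be sketched roughly as below. This is only an illustration: `ConfigValue`, its `source` field, and `prune_disallowed` are hypothetical names, not the real config verifier API.

```rust
/// Hypothetical model of one config value and the file it was loaded from.
#[derive(Debug, Clone, PartialEq)]
struct ConfigValue {
    value: String,
    source: String,
}

/// Drop every value that came from a location outside `allowed`.
/// The list is optional: when `allowed` is `None`, nothing is removed.
fn prune_disallowed(values: &mut Vec<ConfigValue>, allowed: Option<&[&str]>) {
    if let Some(allowed) = allowed {
        values.retain(|v| allowed.contains(&v.source.as_str()));
    }
}
```

Passing `None` keeps the old behaviour, so callers can opt in to pruning one at a time.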
Summary:
It turns out `Arc::ptr_eq` is becoming unreliable, which causes fast paths
to be skipped and extreme slowness in some cases (e.g. `public & nodes`
iterating everything in `public`).
This diff adds an API for an IdMap to tell us its identity. That identity is
then used to replace the unreliable `Arc::ptr_eq`.
For an in-memory map, we just assign a unique number (per process) for its
identity on initialization. For an on-disk map, we use the type + path to
represent it.
Note: strictly speaking, this could cause false positives about
"maps are compatible", because two maps initially cloned from each other
can be mutated differently and their map_id does not change. That will
be addressed in upcoming diffs introducing a more complex but precise way to
track compatibility.
Reviewed By: sfilipco
Differential Revision: D25598076
fbshipit-source-id: 98c58f367770adaa14edcad20eeeed37420fbbaa
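A minimal sketch of the identity scheme from D25598076, assuming a hypothetical `MapId` type (the real API may look different): in-memory maps take a process-wide counter value, on-disk maps are identified by type plus path.

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical identity type: equal ids mean "treat the maps as compatible".
#[derive(Debug, Clone, PartialEq, Eq)]
enum MapId {
    /// In-memory map: a number unique within this process.
    Memory(u64),
    /// On-disk map: the store type plus its path.
    Disk { kind: &'static str, path: PathBuf },
}

/// Process-wide counter used to hand out in-memory identities.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

fn new_memory_map_id() -> MapId {
    MapId::Memory(NEXT_ID.fetch_add(1, Ordering::Relaxed))
}

fn disk_map_id(kind: &'static str, path: &str) -> MapId {
    MapId::Disk { kind, path: PathBuf::from(path) }
}
```

Note that a cloned `MapId` still compares equal after either copy of the map is mutated, which is the false-positive case the summary mentions.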
Summary: The order was incorrect and led to a confusing message.
Reviewed By: krallin
Differential Revision: D25559963
fbshipit-source-id: 4fcb3e53cedcb08675b60b25cbb5da2ca52c08ed
Summary:
The config verifier would remove items from the values list if they
were disallowed. To do this, it iterated through the values list backwards,
removing bad items. In some cases it stored the index of a bad value for later
use, but because it was iterating backwards and removing things, the index it
stored might not be correct by the time the loop was done. To fix this, let's go
back to iterating forwards.
Reviewed By: quark-zju
Differential Revision: D25539737
fbshipit-source-id: 87663f3c162c690f3961b8075814f3467916cb4b
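The stale-index problem from D25539737 can be reproduced with a toy model. The functions below are hypothetical and are not the verifier's actual logic; they only show why an index remembered during a backward remove-in-place pass can end up pointing at the wrong slot, and why a forward pass avoids that.

```rust
/// Backward pass: remove bad items in place and remember the index of the
/// last good item. The remembered index goes stale as soon as a later
/// iteration removes an element at a lower position.
fn backward_pass(values: &mut Vec<i32>, is_bad: impl Fn(i32) -> bool) -> Option<usize> {
    let mut remembered = None;
    for i in (0..values.len()).rev() {
        if is_bad(values[i]) {
            values.remove(i);
        } else if remembered.is_none() {
            remembered = Some(i);
        }
    }
    remembered
}

/// Forward pass: build the kept list and record indexes into that list,
/// so a remembered index always refers to the final layout.
fn forward_pass(values: &mut Vec<i32>, is_bad: impl Fn(i32) -> bool) -> Option<usize> {
    let mut kept = Vec::with_capacity(values.len());
    let mut remembered = None;
    for &v in values.iter() {
        if is_bad(v) {
            continue;
        }
        kept.push(v);
        remembered = Some(kept.len() - 1);
    }
    *values = kept;
    remembered
}
```

With the input `[-1, 5, 7]` and negative numbers treated as bad, the backward pass remembers index 2 and then shrinks the list to two elements, while the forward pass remembers index 1, which is where `7` actually ends up.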
Summary:
Make `get_commit_raw_text` aware of hg's hardcoded commit hashes: NULL_ID and
WDIR_ID. Previously, only `stream_commit_raw_text` was aware of them.
This makes it a bit more compatible when used in more places.
Reviewed By: sfilipco
Differential Revision: D25515006
fbshipit-source-id: 08708734a28f43acf662494df69694988a5b9ca0
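As a rough sketch of the special-casing in D25515006: Mercurial's null id is 20 zero bytes and its working-directory id is 20 `0xff` bytes. The lookup function below is hypothetical, and returning empty text for the reserved ids is an assumption made for illustration, not a statement about what the real `get_commit_raw_text` returns.

```rust
use std::collections::HashMap;

/// Mercurial's reserved commit ids.
const NULL_ID: [u8; 20] = [0x00; 20];
const WDIR_ID: [u8; 20] = [0xff; 20];

fn is_reserved_id(id: &[u8; 20]) -> bool {
    *id == NULL_ID || *id == WDIR_ID
}

/// Hypothetical single-commit lookup backed by an in-memory map.
fn get_commit_raw_text(
    id: &[u8; 20],
    store: &HashMap<[u8; 20], Vec<u8>>,
) -> Option<Vec<u8>> {
    if is_reserved_id(id) {
        // Assumption: reserved ids resolve to empty text without a store lookup.
        return Some(Vec::new());
    }
    store.get(id).cloned()
}
```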
Summary:
Previously, only the batch and stream fetching APIs would actually fetch
commits remotely. The 1-commit fetching API did not have the network side
effect, in the hope that we could migrate all use cases to stream or batch
fetching.
Practically, it's quite difficult to migrate all use cases, and the Python
layer has to have a 1-by-1 fetching fallback. Now let's just move that
fallback to Rust to simplify the code. The fallback in the Rust code
is provided by the default impl of get_commit_raw_text.
Reviewed By: sfilipco
Differential Revision: D25513056
fbshipit-source-id: b3c615397d33b8d35876dc23ca7b95173783ef80
Summary: The API will be used in Python bindings to avoid running Python in background threads.
Reviewed By: sfilipco
Differential Revision: D25513055
fbshipit-source-id: a108b55115271a256c0d43e0ff7b82c0b209be81
Summary: Make the auth crate validate the user's certificate before returning it. This way we can catch invalid certs before trying to use them.
Reviewed By: sfilipco
Differential Revision: D25454687
fbshipit-source-id: ad253fb433310570c20f33dbd0d0bf11df21e966
Summary: Add a new module that can parse X.509 certificates and detect common issues (e.g., the certificate is missing, corrupt, or expired). This should allow us to provide better UX around certificate errors.
Reviewed By: sfilipco
Differential Revision: D25440548
fbshipit-source-id: b7785fd17fa85f812fd38de09e79420f4e256065
Summary: This makes it more flexible.
Reviewed By: kulshrax
Differential Revision: D24467604
fbshipit-source-id: 63023cf0dde2fb7eac592ac79008e4b7a62340c1
Summary:
When a blob is redacted server side, the http code 410 is returned.
Unfortunately, that HTTP code wasn't supported and caused Mercurial to crash.
To fix this, we just need to store a placeholder when receiving this HTTP code
and simply return it up the stack when reading it.
Reviewed By: DurhamG
Differential Revision: D25433001
fbshipit-source-id: 66ec365fa2643bcf29e38b114ae1fc92aeaf1a7b
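The placeholder idea from D25433001 might look roughly like this. `StoredBlob` and `blob_from_response` are hypothetical names; the only facts taken from the summary are that HTTP 410 means "redacted" and that a placeholder is stored instead of failing.

```rust
/// Hypothetical stored form of a blob: real content or a redaction marker.
#[derive(Debug, PartialEq)]
enum StoredBlob {
    Content(Vec<u8>),
    Redacted,
}

/// Map an HTTP response to what gets stored. A 410 is not an error:
/// it is recorded as a placeholder and returned to readers as-is.
fn blob_from_response(status: u16, body: Vec<u8>) -> Result<StoredBlob, String> {
    match status {
        200 => Ok(StoredBlob::Content(body)),
        410 => Ok(StoredBlob::Redacted),
        other => Err(format!("unexpected HTTP status {}", other)),
    }
}
```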
Summary:
There was a bug with local-data indexedlog storage where it
wasn't applying the appropriate suffix, so tree data was being stored in
.hg/store/indexedlogdatastore just like file data. Let's fix that and add a
test.
Reviewed By: quark-zju
Differential Revision: D25469917
fbshipit-source-id: 731252f924f9a8014867fc077a7ef10ac9870170
Summary: Make the parent function used by various graph building functions async.
Reviewed By: sfilipco
Differential Revision: D25353612
fbshipit-source-id: 31f173dc82f0cce6022cc2caae78369fdc821c8f
Summary:
It is no longer needed for building segments (replaced by "prepared flat
segments"). Remove it.
Reviewed By: sfilipco
Differential Revision: D25353613
fbshipit-source-id: aede9e33c3217a61b5b14aae5b128d8953bc578e
Summary: Make IdConvert async and migrate all its users.
Reviewed By: sfilipco
Differential Revision: D25350915
fbshipit-source-id: f05c89a43418f1180bf0ffa573ae2cdb87162c76
Summary: This will make it easier to make IdConvert async.
Reviewed By: sfilipco
Differential Revision: D25350912
fbshipit-source-id: fbaf638b16a9cf468b7530b19d699b7996ddc4f1
Summary: This will make async migrating easier.
Reviewed By: sfilipco
Differential Revision: D25350913
fbshipit-source-id: f33bdc0023ae0cc49601504b811991ea6813ff9e
Summary: This will make it easier to make IdConvert async.
Reviewed By: sfilipco
Differential Revision: D25350914
fbshipit-source-id: 9f2957731f13a28fdfab834de19763b8afcf8ffa
Summary: This will make it easier to make IdConvert async.
Reviewed By: sfilipco
Differential Revision: D25345239
fbshipit-source-id: 684a0843ae32270aa9b537ef9a2b17a28c027e51
Summary: This will make it easier to make IdConvert async.
Reviewed By: sfilipco
Differential Revision: D25345232
fbshipit-source-id: b8967ea51a6141a95070006a289dd724522f8e18
Summary:
Update DagAlgorithm and all its users to async. This makes it easier to make
IdConvert async.
Reviewed By: sfilipco
Differential Revision: D25345236
fbshipit-source-id: d6cf76723356bd0eb81822843b2e581de1e3290a
Summary:
Make it possible to use async functions in MetaSet functions.
It will be used when DagAlgorithm becomes async.
Reviewed By: sfilipco
Differential Revision: D25345229
fbshipit-source-id: 0469d572b56df21fbdbdfae4178377e572adbcda
Summary: This will make it easier if the `Set` passed in requires async.
Reviewed By: sfilipco
Differential Revision: D25345230
fbshipit-source-id: a327d4e5d425b7eb5296b2fbe25c446492aa9ea7
Summary: This makes it easier to make DagAlgorithm async.
Reviewed By: sfilipco
Differential Revision: D25345234
fbshipit-source-id: 5ca4bac38f5aac4c6611146a87f423a244f1f5a2
Summary: This will make it easier to use `.await` if part of `dag` becomes async.
Reviewed By: sfilipco
Differential Revision: D25345237
fbshipit-source-id: 7f07cdaa9c2e0468667638066611fabe3a3f7f28
Summary: `impl Trait` does not work with `async_trait`.
Reviewed By: sfilipco
Differential Revision: D25345238
fbshipit-source-id: e7890dbaeb162d44e072ea4428d045004608719b
Summary: This makes it easier to migrate to async.
Reviewed By: sfilipco
Differential Revision: D25345228
fbshipit-source-id: e819f0de5f805377a6977325216ef11b14d68c1d
Summary:
Marking IdConvert Sync makes it possible to be used as a trait object with async-trait.
See https://docs.rs/async-trait/0.1.41/async_trait/#dyn-traits
`dag` uses `dyn DagAlgorithm` a lot. In the future when async is used more, the
trait object will be required to be Send or Sync. Just require it on the trait
to make our life easier.
Marking `IdDagStore` as Send + Sync makes async migration easier.
Reviewed By: sfilipco
Differential Revision: D25345231
fbshipit-source-id: 45b96057907cbe2a1d38fd424e7d4c963dd1b245
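To illustrate why D25345231 puts the bounds on the trait itself: with `Send + Sync` as supertraits, every trait object is automatically usable across threads. `IdLookup` and `StaticMap` below are hypothetical stand-ins, not the real `IdConvert`.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a trait such as `IdConvert`. The supertraits
/// make every `dyn IdLookup` both `Send` and `Sync`.
trait IdLookup: Send + Sync {
    fn name_of(&self, id: u64) -> Option<String>;
}

struct StaticMap;

impl IdLookup for StaticMap {
    fn name_of(&self, id: u64) -> Option<String> {
        if id == 0 { Some("root".to_string()) } else { None }
    }
}

/// Moving the trait object into another thread compiles only because
/// `dyn IdLookup` is `Send + Sync`.
fn lookup_on_thread(map: Arc<dyn IdLookup>, id: u64) -> Option<String> {
    thread::spawn(move || map.name_of(id))
        .join()
        .expect("worker thread panicked")
}
```

Without the supertraits, each use site would have to spell out `dyn IdLookup + Send + Sync`.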
Summary: Use async function for the PrefixLookup trait.
Reviewed By: sfilipco
Differential Revision: D24840820
fbshipit-source-id: d22cac9f11b06e3127fa956e3f116cf232214125
Summary: This makes the trait objects slightly easier to use.
Reviewed By: sfilipco
Differential Revision: D24840821
fbshipit-source-id: 22fcdf13b62420302b562c309874e08360d02372
Summary: This makes `dyn IdConvert` include `PrefixLookup`.
Reviewed By: sfilipco
Differential Revision: D24840819
fbshipit-source-id: 8d4e25c534f6e4397ec6f643eb3aa116bff12a2c
Summary:
In the future, when async APIs are used, Python bindings will have lifetime
issues. Make it possible to clone the IdMap so the Python bindings can be made
to work.
Reviewed By: sfilipco
Differential Revision: D24840822
fbshipit-source-id: 6aa4e369c877c428ed39d2cbea79e6943836afa8
Summary: This makes NameSet more friendly for async use-cases, interface-wise.
Reviewed By: sfilipco
Differential Revision: D24806695
fbshipit-source-id: 6e640ba2666872a9128d6460e8b53d6a0e595e56
Summary:
Change the main API of NameSet to async. Use the `nonblocking` crate to bridge
the sync and async world for compatibility. Future changes will migrate
Iterator to async Stream.
Reviewed By: sfilipco
Differential Revision: D24806696
fbshipit-source-id: f72571407a5747a4eabe096dada288656c9d426e
Summary:
This will make it easier to incrementally migrate APIs to async, and make sync
programs able to use a sync interface if they'd like to. For example, the dag
crate is likely going to be async-ized. Its async APIs are only useful in
places where the dag is lazy. It's still possible to have a non-lazy dag
for which a sync interface is suitable.
Reviewed By: ikostia
Differential Revision: D24777747
fbshipit-source-id: 6d56149fbd1f9b29f5fc62387cbf194800755d12
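One way a sync caller can consume an async API whose futures never actually wait (for example, a non-lazy dag) is to poll the future exactly once. The sketch below uses only the standard library; `poll_once` and `parent_count` are hypothetical names, and this is not claimed to be how the crate in D24777747 is implemented.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing: the future is never polled a second time.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll `fut` exactly once. Returns `None` if it would have to wait.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}

/// Hypothetical async-style API that completes immediately, as a non-lazy
/// dag would. An `async fn` with no suspending `.await` behaves the same.
fn parent_count(parents: &[u64]) -> impl Future<Output = usize> {
    std::future::ready(parents.len())
}
```

A future that genuinely needs to wait yields `None` here, which a sync wrapper could turn into an error instead of blocking.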
Summary:
We already have a config to write shared history to an indexedlog. Let's
add a similar config for local history.
Reviewed By: quark-zju
Differential Revision: D23910457
fbshipit-source-id: eca21db0350895f573a60e4e1a6b05514e70f1c7
Summary:
In a future diff we'll use the indexedlog stores for local history. We
want those to exist forever, so let's move IndexedLogHgIdHistoryStore to use a
Store under the hood, and add an enum for distinguishing between the two types
at creation time.
Reviewed By: quark-zju
Differential Revision: D25429675
fbshipit-source-id: 5f2dc494e1175d4c1dc74992d3311d2e55d784ca
Summary: Previously, the EdenAPI client (and Mercurial's `http-client`) required both a client certificate and private key to be specified individually in order to use TLS mutual authentication. However, libcurl supports having both the certificate and key concatenated together into a single PEM file. This diff makes it possible to simply specify the combined file as the `cert`, leaving the `key` unset.
Reviewed By: quark-zju
Differential Revision: D25323786
fbshipit-source-id: 1800be3ef82ec4dfa89d725f5860190172994c89
Summary: Treat empty paths as `None`, which allows certificate and key paths to be unset via a `--config` flag (e.g. `hg debughttp --config auth.edenapi.key=`). This would normally require adding an `%unset` line to the appropriate hgrc, which adds friction to ad-hoc command line usage.
Reviewed By: quark-zju
Differential Revision: D25416531
fbshipit-source-id: 15aae78d9b3a82227278f33def45fa960aa65a98
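The "empty path means unset" rule from D25416531 is small enough to sketch directly. `path_from_config` is a hypothetical helper name.

```rust
use std::path::PathBuf;

/// Treat an unset or empty config value as "no path".
fn path_from_config(value: Option<&str>) -> Option<PathBuf> {
    value.filter(|s| !s.is_empty()).map(PathBuf::from)
}
```

With this rule, a `--config` override that sets the key to an empty string has the same effect as leaving the key out.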
Summary:
We already have a config to write shared data to an indexedlog. Let's
add a similar config for local data.
Reviewed By: xavierd
Differential Revision: D23909569
fbshipit-source-id: 87317554beb6bef8237e6a900403701662c3c0d0
Summary:
We temporarily dropped repair support when transitioning to using Store instead
of a raw RotateLog. Let's add that back now.
Reviewed By: xavierd
Differential Revision: D25371622
fbshipit-source-id: e28fc425a6ffb50c93540672b0df75a172ebbe9c
Summary:
In a future diff we'll use the indexedlog stores for local data. We
want those to exist forever, so let's move IndexedLogHgIdDataStore to use a
Store under the hood, and add an enum for distinguishing between the two types
at creation time.
Reviewed By: xavierd
Differential Revision: D23915622
fbshipit-source-id: 296cf6dfcd53e5cf1ae7624fdccedf0a60a77f22
Summary:
In a future diff we'll be moving the IndexedLogHgIdDataStore to use the Store
type (that hides the differences between IndexedLog and RotateLog). To do so, it
needs to support repairs. Let's do some minor refactoring to enable this.
Reviewed By: xavierd
Differential Revision: D25371623
fbshipit-source-id: 846cb5f8c21f1e6b550a45259cc8da24cc65b13b
Summary:
The end goal is to have clients using a sparse IdMap. There is still some work
to get there though. In the meantime we can test repositories that don't use
any revlogs. The current expectations for those repositories are that they have
a full idmap locally.
Reviewed By: quark-zju
Differential Revision: D25075341
fbshipit-source-id: 52ab881fc9c64d0d13944e9619c087e0d4fb547c
Summary:
This commit adds a new eden configuration option that
controls whether we try to load our edenfs.kext in preference to
an alternative fuse implementation on macOS.
The majority of this diff is plumbing to convey the configuration
value through to the privhelper, which is relatively restrictive
due to its root-ness.
I've also updated watchman and mercurial to be aware of the new
filesystem type that shows up in the mount table.
Reviewed By: genevievehelsel
Differential Revision: D25065462
fbshipit-source-id: 4f35b9440654298e2706a0d0613d97eb63451999
Summary:
Introduce `edenapithin` crate, which offers C bindings to EdenAPI Client.
There are two top-level `owned` types, `EdenApiClient` and `TreeEntryFetch`, which represent `Box`ed results from calls to EdenAPI's blocking client. The C / C++ code is responsible for calling the associated `free` functions for these types, as well as `OwnedString`, which is used to represent error variants of a `Result` as a string.
Most functionality is provided through functions which operate on and return references into these top-level owned types, providing access into Rust `Result`s and `Vec`s (manually-monomorphized), and EdenApi's `TreeEntry` and `TreeChildEntry`.
Additional non-pointer types are defined in the `types` module, which do not require manual memory management.
C++ bindings are not included currently, but will be introduced soon.
Reviewed By: fanzeyi
Differential Revision: D24866065
fbshipit-source-id: bb15127b84cdbc6487b2d0e1798f37ef62e5b32d
Summary:
Introduce a new API type, `TreeAttributes`, corresponding to the existing type `WireTreeAttributesRequest`, which exposes which optional attributes are available for fetching. An `Option<TreeAttributes>` parameter is added to the `trees` API, and if set to `None`, the client will make a request with TreeAttributes::default().
The Python bindings accept a dictionary for the attributes parameter, and any fields present will overwrite the default settings from TreeAttributes::default(). Unrecognized attributes will be silently ignored.
Reviewed By: kulshrax
Differential Revision: D25041255
fbshipit-source-id: 5c581c20aac06eeb0428fff42bfd93f6aecbb629
Summary: Add `full_idmap_clone` to edenapi and use it in `debugsegmentclone`.
Reviewed By: quark-zju
Differential Revision: D25139730
fbshipit-source-id: 682055f7c30a94a941acd16e2b8e61b9ea1d0aef
Summary:
This method reconstructs a dag from clone data.
At the moment we only have a clone data construction method in Mononoke. It's
the Dag's job to construct and import the clone_data. We'll consolidate that at
a later time.
Reviewed By: quark-zju
Differential Revision: D24954823
fbshipit-source-id: fe92179ec80f71234fc8f1cf7709f5104aabb4fb
Summary:
The server expects POST requests. At this time we don't want to cache this
data, so use POST.
Reviewed By: singhsrb
Differential Revision: D24954824
fbshipit-source-id: 433672189ad97100eee7679894a894ab1d8cff7b
Summary:
The config is something that makes sense for all commands to have access to.
Commands that don't use a repo don't have access to the config that is prepared
by the dispatcher. This change is a stop-gap to allow new commands that don't
require a repository to receive the config as an argument.
The construction of the config is something that we should iterate on. I see
the current implementation as a workaround.
Reviewed By: quark-zju
Differential Revision: D24954822
fbshipit-source-id: 42254bb201ba8838e7cc107394e8fab53a1a95c7
Summary:
While trying to repro a user report (https://fburl.com/jqvm320o), I ran into a
new hg error: P151186623, i.e.:
```
KeyError: 'Key not found HgId(Key { path: RepoPathBuf("fbcode/thrift/facebook/test/py/TARGETS"), hgid: HgId("55713728544d5955703d604299d77bb1ed50c62d") })'
```
After some investigation (and adding a lot of prints), I noticed that this
was trying to query the EdenAPI server for this filenode. That request should
succeed, given Mononoke knows about this filenode:
```
[torozco@devbig051]~/fbcode % mononoke_exec mononoke_admin fbsource --use-mysql-client filenodes by-id fbcode/thrift/facebook/test/py/TARGETS 55713728544d5955703d604299d77bb1ed50c62d
mononoke_exec: Using config stage prod (set MONONOKE_EXEC_STAGE= to override)
I1126 08:10:02.089614 3697890 [main] eden/mononoke/cmdlib/src/args/mod.rs:1097] using repo "fbsource" repoid RepositoryId(2100)
I1126 08:10:02.995172 3697890 [main] eden/mononoke/cmds/admin/filenodes.rs:137] Filenode: HgFileNodeId(HgNodeHash(Sha1(55713728544d5955703d604299d77bb1ed50c62d)))
I1126 08:10:02.995282 3697890 [main] eden/mononoke/cmds/admin/filenodes.rs:138] -- path: FilePath(MPath("fbcode/thrift/facebook/test/py/TARGETS"))
I1126 08:10:02.995341 3697890 [main] eden/mononoke/cmds/admin/filenodes.rs:139] -- p1: Some(HgFileNodeId(HgNodeHash(Sha1(ccb76adc7db0fc4a395be066fe5464873cdf57e7))))
I1126 08:10:02.995405 3697890 [main] eden/mononoke/cmds/admin/filenodes.rs:140] -- p2: None
I1126 08:10:02.995449 3697890 [main] eden/mononoke/cmds/admin/filenodes.rs:141] -- copyfrom: None
I1126 08:10:02.995486 3697890 [main] eden/mononoke/cmds/admin/filenodes.rs:142] -- linknode: HgChangesetId(HgNodeHash(Sha1(dfe46f7d6cd8bc9b03af8870aca521b1801126f0)))
```
Turns out, the success rate for this method is actually 0% — out of thousands of requests,
not a single one succeeded :(
https://fburl.com/scuba/edenapi_server/cma3c3j0
The root cause here is that the server side is not properly deserializing
requests (actually finding that was a problem of its own, I filed T80406893 for this).
If you manage to get it to print the errors, it says:
```
{"message":"Deserialization failed: missing field `path`","request_id":"f97e2c7c-a432-4696-9a4e-538ed0db0418"}
```
The reason for this is that the server side tries to deserialize the request
as if it was a `WireHistoryRequest`, but actually it's a `HistoryRequest`, so
all the fields have different names (we use numbers in `WireHistoryRequest`).
This diff fixes that. I also introduced a helper method to make this a little
less footgun-y and double-checked the other callsites. There is one callsite
right now that looks like it might be broken (the commit one), but I couldn't
find the client side interface for this (I'm guessing it's not implemented
yet), so for now I left it as-is.
Reviewed By: StanislavGlebik
Differential Revision: D25187639
fbshipit-source-id: fa993579666dda762c0d71ccb56a646c20aee606
Summary:
Right now we just get a "deadline exceeded" error, which isn't very amenable to
helping us understand why we timed out. Let's add more logging. Notably,
I'd like to understand what we've actually received at this point, if anything,
and how long we waited, as I'm starting to suspect this issue doesn't have much
to do with HTTP.
See https://fb.workplace.com/groups/scm/permalink/3361972140519049/ for
more context.
Reviewed By: quark-zju
Differential Revision: D25128159
fbshipit-source-id: b45d3415526fdf21aa80b7eeed98ee9206fbfd12
Summary:
We've seen some hangs with http 2 in lfs. Switching to http 1.1 seems
to fix it. Let's make this configurable so we can tweak this if we see it in
edenapi. For now we continue to default to http 2.
Reviewed By: krallin
Differential Revision: D24901201
fbshipit-source-id: 9806e2c37fa299e4bd381ebdcb17d00800408de3
Summary:
osxfuse is rebranding as macfuse in 4.x.
That has ripple effects through how the filesystem is mounted and shows up in
the system.
This commit adjusts for the new opaque and undocumented mount procedure and
speculatively updates a couple of other code locations that were sensitive to
looking for "osxfuse" when examining filesystems.
Reviewed By: genevievehelsel
Differential Revision: D24769826
fbshipit-source-id: dab81256a31702587b0683079806558e891bd1d2
Summary:
We've seen http 2 potentially causing hangs for users. Let's make this
configurable for lfs, so we can disable it and see if things get fixed.
Reviewed By: krallin
Differential Revision: D24898322
fbshipit-source-id: dc7842c0247dc6b9590a1f160076b17788aab1b9
Summary:
As discussed in a group thread (see link below), HTTP 2 may be causing
hangs for users. Let's start by making the http-client configurable. In
subsequent diffs we'll make edenapi and lfs configurable as well.
Reviewed By: krallin
Differential Revision: D24898323
fbshipit-source-id: f0035a1b8df3cee626ebe519e9e99358c1b3f043
Summary:
These snippets aren't code that compiles, but the convention in Rust is that
code blocks in doc comments are doctests unless annotated otherwise, so they
fail when tested with Cargo. This fixes that.
Reviewed By: farnz
Differential Revision: D24917364
fbshipit-source-id: 62fe11700ce561c13dc5498e01d15894b17b5b22
Summary:
Transfers iddag flat segments along with the head_id that should be used to
rebuild a full-fledged IdDag. It also transfers idmap details. In the current
version it only transfers universal commit mappings.
Reviewed By: krallin
Differential Revision: D24808329
fbshipit-source-id: 4de9edcab56b54b901df1ca4be5985af2539ae05
Summary:
BE: remove old subscriptions to save resources in IceBreaker. The client code will recreate a subscription anyway if it is missing, but cleaning up will help us reduce the number of unused subscriptions.
Classic example: a repo such as opsfiles or configerator may be needed once and then never used again.
Another example: switching workspaces failed, which can result in subscriptions not being cleaned up properly.
Reviewed By: markbt
Differential Revision: D24859931
fbshipit-source-id: 6df6c7e5f95859946726e04bce8bc8f3ac2d03df
Summary:
This function is useful in Mononoke to compute the universal commit idmap
that is required for clone.
Reviewed By: quark-zju
Differential Revision: D24808327
fbshipit-source-id: 0cccd59bd7982dd0bc024d5fc85fb5aa5eafb831
Summary:
`flat_segments` are going to be used to generate CloneData. These segments will
be sent to a client repository and are going to bootstrap the iddag.
Reviewed By: quark-zju
Differential Revision: D24808331
fbshipit-source-id: 00bf9723a43bb159cd98304c2c4c6583988d75aa
Summary: This is the object that will be used to bootstrap a Dag after a clone.
Reviewed By: quark-zju
Differential Revision: D24808328
fbshipit-source-id: 2c7e97c027c84a11e8716f2e288500474990169b
Summary:
The goal is to reuse the functionality provided by AssignHeadOutcome for clone
purposes.
Reviewed By: quark-zju
Differential Revision: D24717924
fbshipit-source-id: e88f21ee0d8210e805e9d6896bc8992009bd7975
Summary:
The goal of this code was to divide the cache limit by the number of
logs. Instead it divided the cache limit by the default per-log size (2GB). That
results in a very small max-bytes-per-log so data was being thrown out
constantly. This fixes it and updates tests to actually demonstrate the issue.
Reviewed By: kulshrax
Differential Revision: D24712842
fbshipit-source-id: 8062758b5bfa40493e2003d5a9028d601b1522b1
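The arithmetic behind the bug in D24712842, as a sketch. The function names and the 8 GiB / 4 logs figures used in the note are illustrative assumptions; only the 2GB default per-log size comes from the summary.

```rust
const GIB: u64 = 1 << 30;

/// Intended behaviour: split the total cache budget evenly across the logs.
fn max_bytes_per_log(cache_limit: u64, log_count: u64) -> u64 {
    cache_limit / log_count.max(1)
}

/// The buggy variant: the divisor is a byte size instead of a log count,
/// so the result is only a handful of bytes.
fn buggy_max_bytes_per_log(cache_limit: u64, default_log_size: u64) -> u64 {
    cache_limit / default_log_size.max(1)
}
```

For an assumed 8 GiB limit spread over 4 logs, the intended result is 2 GiB per log, while dividing by a 2 GiB default size yields 4 bytes, which explains why data was evicted constantly.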
Summary:
As part of getting buck build to work on OSX, we need procinfo to
include its OSX-specific library.
Reviewed By: sfilipco
Differential Revision: D24513234
fbshipit-source-id: 69d8dd546e28b4403718351ff7984ee6b2ed3d1d
Summary:
I initially saw the incremental build as something that would be run in places
that had IdMap and IdDag stored side by side in process. I am now considering
using incremental build in the tailing process to keep Segmented Changelog
artifacts up to date.
Since we update the IdMap before we update the IdDag, it is likely that we
will have runs that only update the IdMap and fail to update IdDags. This diff
adds a mechanism for the IdDag to catch up.
Reviewed By: krallin
Differential Revision: D24516440
fbshipit-source-id: 3a99248451d806ae20a0ba96199a34a8a35edaa4
Summary:
I am wondering whether we should customize the serialization format for the
InProcessStore. I want to have a basis for the comparison before I proceed.
Reviewed By: quark-zju
Differential Revision: D24580273
fbshipit-source-id: d3ddfdc029dbdd84f60acace06fddc80b4d005f4
Summary: This change adds wire types for the history API in the most straightforward way possible. In a future change, I'll move the `WireHistoryEntry` / `HistoryResponseChunk` conversion logic into the `ToApi` implementation. This implementation also doesn't add per-item errors, or standardize `history` to match the protocol evolution standards used by `trees`.
Reviewed By: kulshrax
Differential Revision: D24342046
fbshipit-source-id: d46403a823f2a1e89ad9d6d2074241d8bfe4810e
Summary: This change introduces custom `Serialize` and `Deserialize` implementations for wire types containing a fixed-length byte array. This change is more verbose than should be necessary. Because `TryFrom` is implemented to convert `&[T]` to `[T; N]` and `AsRef` to convert `[T; N]` to `&[T]`, we should be able to use a generic serde `with = ` attribute on the byte array fields of the appropriate structs. Unfortunately, I'm getting a lifetime issue that I haven't been able to resolve, and const generics aren't available on stable to implement the trait bounds without an explicit lifetime.
Reviewed By: kulshrax
Differential Revision: D24460527
fbshipit-source-id: d3a4179b833a523ce164d3df934e4cbdc2202546
Summary:
Removes top-level metadata from `TreeEntry` and adds a new, specialized type for carrying child entry metadata, `TreeChildEntry`. This change also fully removes the `revisionstore_types::Metadata` from EdenAPI trees, which is only used on files.
In a follow-up change I'll optimize the handling of paths / path segments in `TreeChildEntry` keys. Right now they need to be joined manually.
Reviewed By: kulshrax
Differential Revision: D24434716
fbshipit-source-id: d0739471b1f6cef58b435e10b5fb774bfb08f7f6
Summary: Rather than silently dropping entries which cannot be fetched, this change has the `WireTreeEntry` type carry optional error information, allowing it to be (de)serialized to / from `Result<TreeEntry, EdenApiServerError>` instead of a bare `TreeEntry`. Currently, handling of these failures is up to the individual application code, but it might be useful to introduce utility functions to drop failed entries and log errors.
Reviewed By: kulshrax
Differential Revision: D24315399
fbshipit-source-id: 94e4593b77cf2dc12d0dcc93d174c8a4eda95344
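The utility suggested at the end of D24315399 (drop failed entries, keep the errors for logging) could be sketched as follows. `Entry` and `FetchError` are hypothetical stand-ins for `TreeEntry` and `EdenApiServerError`.

```rust
/// Hypothetical stand-ins for `TreeEntry` and `EdenApiServerError`.
#[derive(Debug, PartialEq)]
struct Entry(String);

#[derive(Debug, PartialEq)]
struct FetchError(String);

/// Keep the successful entries and hand back the errors so the caller
/// can log them instead of silently dropping them.
fn split_results(results: Vec<Result<Entry, FetchError>>) -> (Vec<Entry>, Vec<FetchError>) {
    let mut ok = Vec::new();
    let mut failed = Vec::new();
    for result in results {
        match result {
            Ok(entry) => ok.push(entry),
            Err(err) => failed.push(err),
        }
    }
    (ok, failed)
}
```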
Summary:
indexedlogdatastore is supposed to use remotefilelog.cachelimit to set
the max size, but instead it was setting the max per-log size, which means the
max size was N times bigger. Let's fix that.
Reviewed By: xavierd
Differential Revision: D24483181
fbshipit-source-id: f33cedbfdbb318e9d5eb9fda497645050b93e9fe
Summary:
In the next diff in this stack, I'm changing the Rust thrift library as well as
the codegen, and this is causing quite a bit of pain with regard to this
codegen that is checked in to the hg build:
- Regenerating the codegen isn't super obvious.
- The Sandcastle build fails because it uses the current codegen with the old
library.
According to lukaspiatkowski, this has also been a problem in the past when Eden got
migrated to Tokio 0.2, so I'd like to save myself and others some pain by just
not generating the server side codegen and not checking it in, since it's
unused. This reduces the surface of stuff that might go out of sync.
#forcetdhashing
Reviewed By: markbt
Differential Revision: D23649604
fbshipit-source-id: d684bec427431a366de42c88e53072caa98d5b2f
Summary:
Similarly to the indexedlogdatastore, the LFS stores can become corrupted on
power loss. This is happening fairly frequently on Windows Sandcastle due to
the OS being virtualized and power being cut abruptly.
For now this only attempts to repair the shared stores. In theory we could also
try to repair the local stores, but we haven't looked into it.
Reviewed By: DurhamG
Differential Revision: D24449202
fbshipit-source-id: 605a7943a0850b625bf00c514879b3da1ab2b406
Summary:
The ContentStore/Metadatastore are made of several different stores. Attempting
to expose all of them to Python to drive the repair logic from there would leak
implementation details of how the stores are implemented.
Instead, let's simply expose a single `repair` function out of the
pyrevisionstore crate that takes care of repairing all of the underlying
stores. For now, this is just moving code around, but a future diff will
integrate the LFS stores.
Reviewed By: DurhamG
Differential Revision: D24449203
fbshipit-source-id: 1631ced9068716453cb404bf7e65cefbf2db5247
Summary:
This will be used to avoid 1-by-1 fetching for the changelog backend with
commit text stored remotely.
Reviewed By: sfilipco
Differential Revision: D24321293
fbshipit-source-id: 9695c72166cadc0b167e2ce7fde822cdf6b1cea8
Summary:
Turn on rust changelog (changelog2) for all hosts (except hgsql).
Turn on doublewrite backend for hg-dev hosts, triggered by pull.
Tests are mostly working, and I have been using it for weeks.
Reviewed By: singhsrb
Differential Revision: D24259759
fbshipit-source-id: b89a27f98a6d3d1e4ea187bf7b29f875d0e96e2e
Summary: Avoid relying on HashSet or HashMap iteration order, so that the insertion order of commits is preserved.
Reviewed By: DurhamG
Differential Revision: D24214460
fbshipit-source-id: 66df2e0aba1820e6585f8da66897078f38abf82f
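A common way to get the property described in D24214460 is to use the hash set only for membership checks and iterate over a `Vec`. The helper below is a hypothetical illustration, not the code from the diff.

```rust
use std::collections::HashSet;

/// Deduplicate while keeping first-insertion order. The `HashSet` is used
/// only for membership checks; the output order comes from the `Vec`.
fn dedup_in_order(items: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut ordered = Vec::new();
    for &item in items {
        if seen.insert(item) {
            ordered.push(item.to_string());
        }
    }
    ordered
}
```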
Summary:
The nullid and wdirid are special hg hashes that do not respect SHA1. They were
handled at the bindings layer. However, the bindings layer cannot handle them
in a stream. Therefore, move the handling into hgcommits.
Reviewed By: DurhamG
Differential Revision: D24365330
fbshipit-source-id: e8dc6205351ec1a2304252b9ec446dda010e6295
Summary:
If the server does not respect the input contract and misses some items
without returning errors, the current logic would retry forever. Change
it to detect the issue and raise an error.
Reviewed By: DurhamG
Differential Revision: D24293497
fbshipit-source-id: 09421c7743078a488a9c81ce66fd92c12b39543c
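The "detect instead of retrying forever" check from D24293497 can be sketched as a loop that stops as soon as a round makes no progress. `fetch_all` and its `fetch` callback are hypothetical; the real code talks to a server rather than a closure.

```rust
use std::collections::HashSet;

/// Fetch every key, retrying only while each round makes progress.
/// `fetch` is a hypothetical callback returning the keys the server answered.
fn fetch_all<F>(keys: &[u32], mut fetch: F) -> Result<HashSet<u32>, String>
where
    F: FnMut(&[u32]) -> Vec<u32>,
{
    let mut pending: Vec<u32> = keys.to_vec();
    let mut done = HashSet::new();
    while !pending.is_empty() {
        let before = pending.len();
        for key in fetch(&pending) {
            done.insert(key);
        }
        pending.retain(|key| !done.contains(key));
        if pending.len() == before {
            // The server skipped items without reporting an error.
            return Err(format!(
                "server returned no data for {} requested item(s)",
                pending.len()
            ));
        }
    }
    Ok(done)
}
```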
Summary: The store stores sorted(p1,p2)+text to match SHA1 hashes. It's not just `text`.
Reviewed By: DurhamG
Differential Revision: D24325554
fbshipit-source-id: 8a91970f60fb535ca1a5a2d30c7d27f2714f28de
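To make the layout from D24325554 concrete, the sketch below builds the byte string that gets hashed: the two parent ids in sorted order, followed by the text. `sha1_input` is a hypothetical helper, and the SHA-1 step itself is left out so the example stays within the standard library.

```rust
/// Build the bytes that are SHA-1 hashed: sorted(p1, p2) followed by `text`.
fn sha1_input(p1: &[u8; 20], p2: &[u8; 20], text: &[u8]) -> Vec<u8> {
    // Sorting the parents makes the result independent of parent order.
    let (low, high) = if p1 <= p2 { (p1, p2) } else { (p2, p1) };
    let mut buf = Vec::with_capacity(40 + text.len());
    buf.extend_from_slice(low);
    buf.extend_from_slice(high);
    buf.extend_from_slice(text);
    buf
}
```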
Summary: It's no longer useful as the new abstract interface does not need it.
Reviewed By: sfilipco
Differential Revision: D24399516
fbshipit-source-id: 2b6735d2a26706c6a3e6b592d2f3ecfc874c94cb
Summary:
This verifies the abstraction and simplifies the code.
The new code will use non-master segments for add_heads. Therefore the test
changes.
Reviewed By: sfilipco
Differential Revision: D24399496
fbshipit-source-id: 39067ad88ade79b4f7758bcdaafc03e5f34ced91
Summary: This makes the main namedag.rs cleaner. The next step is to move MemNameDag.
Reviewed By: sfilipco
Differential Revision: D24399495
fbshipit-source-id: c1e79a60edd8597fe7264f04548e5312414241a7
Summary: This is the last non-abstract interface of NameDag.
Reviewed By: sfilipco
Differential Revision: D24399514
fbshipit-source-id: f39bb84a1851a4fe4d1f29e6b0961e6a153c943d
Summary:
There is a need to open AbstractNameDag cleanly from a path.
Abstract that.
Reviewed By: sfilipco
Differential Revision: D24399498
fbshipit-source-id: ca242cd929e8f5580120c01eeaa928f630c21ed7
Summary:
I copied the code since it's hard to implement using the macros.
In the future I plan to merge MemNameDag into AbstractNameDag
and remove the macros.
Reviewed By: sfilipco
Differential Revision: D24399517
fbshipit-source-id: 326e76cd06a6e1ad26b39bcb51ba0ff24106c984
Summary: The `delegate!` is updated to support complex `impl`s.
Reviewed By: sfilipco
Differential Revision: D24399518
fbshipit-source-id: b9ba31174472cce4248e9644611cfc207abc3c1d
Summary: Will be used as bounds for abstraction.
Reviewed By: sfilipco
Differential Revision: D24399497
fbshipit-source-id: 343be12237d4850fbde9ebbe4034469527bd77fc
Summary: The `snapshot` field can be used instead.
Reviewed By: sfilipco
Differential Revision: D24399507
fbshipit-source-id: 67de20d897b8b763f724f3ccbd46618dec7911b9
Summary:
The trait requires an `IdMap` snapshot to be locally ready. That's not easy for
all possible implementations. Drop it to simplify things.
Reviewed By: sfilipco
Differential Revision: D24399501
fbshipit-source-id: 4d85f77c99208cda30b2a543a0bb5b295f49a65c
Summary: There were 2 prepare_filesystem_sync. Unify them into one implementation.
Reviewed By: sfilipco
Differential Revision: D24399513
fbshipit-source-id: 80d009c33b7f23dc2c4225da6fd0fb09589ba061
Summary: A more general-purpose type for Syncable{IdDag,IdMap}.
Reviewed By: sfilipco
Differential Revision: D24399502
fbshipit-source-id: 0599db6dd07fe3d430458f86a33a9144d850fca1
Summary: This makes it more generic.
Reviewed By: sfilipco
Differential Revision: D24399493
fbshipit-source-id: 8a1d0a13dd29989b17fe3ef1497b10b6fa0629d6
Summary: Similar to IdDag change, move impls to separate files.
Reviewed By: sfilipco
Differential Revision: D24399508
fbshipit-source-id: 575b6e7194677b67b6755b0a30ae7d014d498b10
Summary:
The lock, reload, mutate, persist pattern is general. It can be used for IdMap
too.
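The pattern can be sketched as a context manager. This is only an illustration in Python: the in-process lock stands in for a filesystem lock, and the JSON file and all names are assumptions rather than the real IdDag or IdMap storage.

```python
import json
import os
import tempfile
import threading
from contextlib import contextmanager

_LOCK = threading.Lock()  # stand-in for a real filesystem lock


@contextmanager
def locked_update(path):
    with _LOCK:                          # 1. lock
        state = {}
        if os.path.exists(path):         # 2. reload the latest on-disk state
            with open(path) as f:
                state = json.load(f)
        yield state                      # 3. caller mutates the state
        tmp = path + ".tmp"              # 4. persist via an atomic rename
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)
```

If the caller raises inside the `with` block, the persist step is skipped and the lock is still released.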
Reviewed By: sfilipco
Differential Revision: D24399512
fbshipit-source-id: d25e51ba735061ca101101d75aff95deb88b1d36
Summary:
Now `build_segments_persistent` and `build_segments_volatile` are the same.
Just keep one of them.
Reviewed By: sfilipco
Differential Revision: D24399511
fbshipit-source-id: a9f1ac920cdf5b448bd99bf9b6d4ca4160ba0304
Summary:
Previously, we kept the last high-level segment per level in memory, and
dropped it on disk. When we cross the memory / disk boundary, we had to
maintain such properties carefully. That was needed because some DAG
algorithms rely on complete high level segments.
Now that no DAG algorithms depend on such properties, let's just drop
the logic adding the last segment back to simplify the code.
This removes the need of building segments after open() and sync().
Reviewed By: sfilipco
Differential Revision: D24399515
fbshipit-source-id: 4c640d9aa03c050fcd97f70ee386e32d3a8ee26d
Summary:
This makes the algorithm a bit more robust. Now none of the DAG algorithms
depend on high-level segments being complete and covering all low-level segments.
This also removes constraints. For example, SyncableIdDag can now just
deref() to the normal IdDag for queries without worrying about correctness.
Reviewed By: sfilipco
Differential Revision: D24399503
fbshipit-source-id: e6a91010cff82264cf423e2f24dee1d372822ef6
Summary:
They depend on high-level segments covering low-level segments, which
adds extra complexities. Remove them to simplify logic.
Reviewed By: sfilipco
Differential Revision: D24399509
fbshipit-source-id: 56a8e06c263107d1da4d6754b884ce51e18e30bf
Summary: This preserves the `--noninteractive` flag used by some tools.
Reviewed By: DurhamG
Differential Revision: D24040789
fbshipit-source-id: 8d50f3f3ce6b2015f0ef6c3bd1b4fbb874d0ea7d
Summary:
This restores the compatibility of setting up merge tools using the `ui.merge`
config while still limiting the default `editmerge` tool to interactive
sessions.
Reviewed By: sfilipco
Differential Revision: D24377259
fbshipit-source-id: 3d2befba412b824fc985ddffa131e339644178c2
Summary: Make it testable by allowing specifying paths to load as user hgrc.
Reviewed By: sfilipco
Differential Revision: D24377258
fbshipit-source-id: 969028df64d55ad1f1304e35675d84595ed6a2bf
Summary:
Include a `User-Agent` header in EdenAPI requests from Mercurial. This will allow us to see the version in Scuba, and in the future, will allow us to distinguish between requests sent by Mercurial and those sent directly by EdenFS.
Keeping with the current output of `hg version`, the application is specified as "EdenSCM" rather than "Mercurial".
Reviewed By: singhsrb
Differential Revision: D24347021
fbshipit-source-id: e323cfc945c9d95d8b2a0490e22c2b2505a620dc
Summary:
Rust tests run in multiple threads. Setting environment variables affects other
tests running in other threads and causes random test failures.
Protect env vars using a lock.
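The same idea, sketched in Python rather than Rust: every test that touches the process environment takes one shared lock and restores the old value afterwards. The helper name is an assumption made for this example.

```python
import os
import threading
from contextlib import contextmanager

ENV_LOCK = threading.Lock()  # shared by every test that changes env vars


@contextmanager
def scoped_env(name, value):
    """Set an env var under the lock and restore the old value on exit."""
    with ENV_LOCK:
        old = os.environ.get(name)
        os.environ[name] = value
        try:
            yield
        finally:
            if old is None:
                del os.environ[name]
            else:
                os.environ[name] = old
```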
Reviewed By: DurhamG
Differential Revision: D24296639
fbshipit-source-id: db0bee85625a7b63e07b95ea76d96029487881d4
Summary:
The shell-script cargo tests seem very flaky. Use a dedicated Python script to
run the tests, with a more concise output that only includes failures, and run
tests in parallel.
Reviewed By: DurhamG
Differential Revision: D24296433
fbshipit-source-id: 1d63146c6c84f1035dded24fcd3d79f116c2e740
Summary:
When the server returns a 429, the intention is that the client will wait for a
little bit then try again later (there is no harm in that, as we haven't really
used many server resources for this). However, it turned out that right now we
just abort. Let's fix it!
Note that this changes the behavior a bit for the error cases, in the sense
that we no longer return `Ok(None)` but instead return an `Err`. Xavier noted
this should make sense here.
I've also had the client send its retry attempt via a header, because who
knows, that might be useful.
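A hedged sketch of the retry behavior, assuming a hypothetical `send(headers)` callable that returns `(status, body)`. The header name `X-Retry-Attempt`, the attempt limit, and the delay are illustrative assumptions; the commit does not state them.

```python
import time


def request_with_retry(send, max_attempts=3, delay=0.01):
    """Retry on HTTP 429 and raise (rather than return None) on failure."""
    for attempt in range(max_attempts):
        # Hypothetical header name carrying the retry attempt number.
        status, body = send({"X-Retry-Attempt": str(attempt)})
        if status == 429:
            time.sleep(delay)  # wait a little, then try again
            continue
        if status >= 400:
            raise RuntimeError("request failed with status %d" % status)
        return body
    raise RuntimeError("still throttled after %d attempts" % max_attempts)
```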
Reviewed By: kulshrax
Differential Revision: D24308127
fbshipit-source-id: 35639956f36342dfb0056b0d348dc4ad56bd576c
Summary: Introduces fetching of child entry IDs, and child file metadata for a specified tree manifest ID. The aux data lookup will only be performed if `with_file_metadata` is set, which is actually kind of wrong. Instead `with_children` from the wire type should be exposed in the API request type, and `with_*_metadata` should be hidden or used for data other than the child entry `Key`s.
Reviewed By: kulshrax
Differential Revision: D23886678
fbshipit-source-id: 0cba72cea7be47ae3348a406d407a19b60976c0c
Summary:
Include the client correlator string from the `clienttelemetry` extension in each EdenAPI HTTP request via the `X-Client-Correlator` header.
The `ClientIdentityMiddleware` in `gotham_ext` already understands this header (as it is already used by the LFS server), and `gotham_ext`'s `ScubaMiddleware` will automatically include the provided correlator in the server's Scuba samples.
Reviewed By: farnz
Differential Revision: D24282244
fbshipit-source-id: 13d04e706eda38893cff6e740bd1d7bf104e43dd
Summary:
This change introduces two new metadata types, Category and Transience, and a mechanism for Category to provide a default Fault and Transience, which can be overridden by the user.
Also introduces a mechanism for attempting to log exceptions which occur during exception logging, falling back to the previous behavior of just swallowing the exception on failure.
Reviewed By: DurhamG
Differential Revision: D22677565
fbshipit-source-id: 1cf75ca1e2a65964a0ede1f072439378a46bd391
Summary:
It only has benchmark code that led to the use of mincode. Now hgcommits is the
main crate for commit storage. `commitstore` without `hg` in its name was
initially planned to support other kinds of commits including git and bonsai.
However we don't have immediate goal for that at present. So let's just remove
the commitstore directory.
Reviewed By: singhsrb
Differential Revision: D24263618
fbshipit-source-id: 84b4861ae490817377e69d8c2006c63331e3db1f
Summary: Need to add new quickcheck tests and verify that removing `Serialize` from `TreeEntry` is okay.
Reviewed By: kulshrax
Differential Revision: D23457777
fbshipit-source-id: aa94ed7aa81b41924eba4a8bd1bdc2c737365b77
Summary:
Print a message for each EdenAPI method call to stderr if the user has `edenapi.debug` set.
These messages are already logged to `tracing`, but also printing them out when `edenapi.debug` is set makes the debug output more useful, since it provides context for the download stats. This is especially useful when reading through EdenFS logs.
Reviewed By: quark-zju
Differential Revision: D24204381
fbshipit-source-id: 37b47eed8b89438cdf510443e917a5c8660eb43b
Summary: Use a `HashMap` to store user-specified additional HTTP headers. This allows headers to be set in multiple places (whereas previously, setting new headers would replace all previously set headers).
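A small sketch of the difference, using a Python dict in place of the Rust `HashMap`. The class and method names are hypothetical.

```python
class RequestConfig:
    def __init__(self):
        self.headers = {}

    def set_headers(self, new_headers):
        # Merge into the existing map instead of replacing it wholesale,
        # so headers set in different places accumulate.
        self.headers.update(new_headers)
        return self
```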
Reviewed By: quark-zju
Differential Revision: D24200833
fbshipit-source-id: 93147cf334a849c4d2fc4f29849018a4c7565143
Summary: The panics can happen when the input sets are out of range.
Reviewed By: kulshrax
Differential Revision: D24191789
fbshipit-source-id: efbcbd7f6f69bd262aa979afa4f44acf9681d11e
Summary:
Some sort of serialization for the Dag is useful for saving the IdDag produced
by offline jobs and loading it when a Mononoke server starts.
Reviewed By: quark-zju
Differential Revision: D24096964
fbshipit-source-id: 5fac40f9c10a5815fbf5dc5e2d9855cd7ec88973
Summary: Add `--debug` flag to `read_res cat` command for debug printing entire entry rather than just the data blob.
Reviewed By: kulshrax
Differential Revision: D23999804
fbshipit-source-id: 6955854edab2643cffbe5fae484a398716b48055
Summary:
The hybrid backend is similar to the doublewrite backend, except that it does
not use revlog to read commit data, but uses EdenAPI instead.
Note:
- The non-stream API will not fetch commit data from EdenAPI.
- The commit hashes are not lazy yet.
Reviewed By: sfilipco
Differential Revision: D23924147
fbshipit-source-id: eb2cf8d3a7e1704b4efb13ad3ad86f8b6a1b31d0
Summary:
This makes it convertible to `PyObject` via `cpython_ext::convert::Serde`
without additional code or dependencies.
Reviewed By: sfilipco
Differential Revision: D23966993
fbshipit-source-id: 74d83524a7c0701cde7aa6d61bb930ff4a1c90f5
Summary:
This API allows us to stream the data. If callsites only use this API, we'll
be more confident that there are no 1-by-1 fetches.
Reviewed By: sfilipco
Differential Revision: D23911865
fbshipit-source-id: 4c7dd8c2b5be33be5a55822845d55345797bacdf
Summary:
The API is basically to resolve `input_stream` to `output_stream`, with a
stateful "resolver" that can resolve locally and remotely.
Reviewed By: sfilipco
Differential Revision: D23915775
fbshipit-source-id: 14a3a37fc897c8229514acac5c91c7e46b270896
Summary:
Introduce `FileMetadata` and `DirectoryMetadata` to `TreeEntry`, along with corresponding request API.
Move `metadata.flags` to `file_metadata.revisionstore_flags`, as it is never populated for trees. Do not use `metadata.size` on the wire, as it is never currently populated.
Leaving `DirectoryMetadata` commented out temporarily because serde round trips fail for unit struct. Re-introduced with fields in the next change in this stack.
Reviewed By: DurhamG
Differential Revision: D23455274
fbshipit-source-id: 57f440d5167f0b09eef2ea925484c84f739781e2
Summary:
EdenAPI always checks the integrity of filenode hashes before returning file data to the application. In the case of LFS files, this resulted in errors because the filenode hash is computed using the full file content, but the blob from the server only contains an LFS pointer.
Fix the bug by exempting LFS blobs from filenode integrity checks. (If integrity checks for LFS blobs are desired, the LFS code should be able to do this on its own since LFS blobs are content-addressed.)
Reviewed By: quark-zju
Differential Revision: D24145027
fbshipit-source-id: d7d86e2b912f267eba4120d1f5186908c3f4e9e3
Summary:
This can be used to automate Python/Rust conversions for complex structures
like `CommitRevlogData`.
Reviewed By: kulshrax
Differential Revision: D23966988
fbshipit-source-id: 17a19d38270e6ef0952c13a1cd778487e84a94ff
Summary:
The goal is to implement `FromPyObject` and `ToPyObject` more easily.
Today crates have to depend on `cpython` to implement `From/ToPyObject`,
which is somewhat unwanted for pure Rust crates.
The `ser` module used to ignore the `variant` field for non-unit enum variants.
They have been fixed so the serialized value can be deserialized correctly.
For example, `enum E { A, B(T) }` will be serialized to `"A"` for `E::A`, and
`{"B": T}` for `E::B`.
Reviewed By: kulshrax
Differential Revision: D23966994
fbshipit-source-id: c50d57bf313caeec65a604ed9b05a5729f3b3635
Summary:
Switch from the default tuple deserialization which only understands the tuple
format, to "bytes" deserialization, which understands not only the existing
"tuple" format (therefore compatible with old data), but also "bytes" and "hex"
formats (for CBOR).
This will unblock us from switching to bytes serialization in the future.
Note: This is a breaking change for mincode serialization. Mincode + HgId users
(zstore, metalog) have switched to explicit tuple serialization so they don't use
the default deserialization and remain unaffected.
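To illustrate what "understands tuple, bytes and hex" means, here is a sketch of a permissive parser in Python. It is not how the serde deserializer is written; the function name and the type-based dispatch are assumptions made for this example.

```python
def parse_hgid(value):
    """Accept the three encodings named above and return 20 raw bytes."""
    if isinstance(value, (list, tuple)):   # legacy "tuple" format: 20 small ints
        raw = bytes(value)
    elif isinstance(value, bytes):         # "bytes" format
        raw = value
    elif isinstance(value, str):           # "hex" format: 40 hex digits
        raw = bytes.fromhex(value)
    else:
        raise TypeError("unsupported HgId encoding: %r" % type(value))
    if len(raw) != 20:
        raise ValueError("HgId must be 20 bytes, got %d" % len(raw))
    return raw
```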
Reviewed By: kulshrax
Differential Revision: D23966995
fbshipit-source-id: 83dd53f57bd4e6098de054f46a1d47f8b48133d0
Summary: This will unblock us from switching HgId to bytes serialization by default.
Reviewed By: kulshrax
Differential Revision: D24009039
fbshipit-source-id: a277869ec24652af428cda581faffa62c25d32c4
Summary: Similar to D23966992 (2a2971a4c7), add support to serialize Key differently.
Reviewed By: DurhamG
Differential Revision: D24009041
fbshipit-source-id: 2ecf1610b989a04083196d180bc62307b5162c2f
Summary: Similar to D23966992 (2a2971a4c7), add support to serialize Sha256 differently.
Reviewed By: DurhamG
Differential Revision: D24009040
fbshipit-source-id: b77f6732802f95507e1540f0bbde4d5a92d13cac
Summary: Instead of returning an error upon receiving an empty request, just return a `Fetch` object that does nothing. This prevents Mercurial from crashing in situations where an empty request somehow makes it to the EdenAPI remote store.
Reviewed By: quark-zju
Differential Revision: D24119632
fbshipit-source-id: cf4ec707b4097656c76d7084a55b2d0b3150b679
Summary:
Previously, EdenAPI was using `remotefilelog.debug` to determine whether to print things like download stats. Let's give EdenAPI its own `debug` option that can be configured independently of remotefilelog.
One notable benefit of this change is that download stats will always be printed immediately after the HTTP request completes. This can help rule out network or server issues in situations where Mercurial appears to be hanging during data fetching. (e.g, if hg had downloaded all of the data but was taking a while to process it, the debug output would show this.)
Reviewed By: DurhamG
Differential Revision: D24097942
fbshipit-source-id: bf9b065e7b97fc7ffe50ab74b1b13e2fe364755c
Summary:
Previously phase calculation was done via a simple ancestor check. This
was very slow in cases that required going far back into the graph. Going a year
back could take a number of seconds.
To fix it, let's take the Rust phaseset logic and rework it to make only_both
produce an incremental public nodes set. In a later diff we can switch the
phaseset function to use this as well, but right now phaseset returns IdSet, and
that would need to be changed to Set, which may have consequences. So I'll do it
later.
Reviewed By: quark-zju
Differential Revision: D24096539
fbshipit-source-id: 5730ddd45b08cc985ecd9128c25021b6e7d7bc89
Summary: D24070707: `[Thrift] Provide sorted fields to read_field_begin` made a change to the generated rust thrift files, so the eden/scm thrift files have to be regenerated to fix the build.
Reviewed By: farnz
Differential Revision: D24109655
fbshipit-source-id: e8575a76642673a11514fdce8e30f13ca28151f0
Summary: This can be used by dag::Vertex and minibytes::Bytes.
Reviewed By: kulshrax
Differential Revision: D23966985
fbshipit-source-id: 3b4b29648e038ef49f26ce2b500119e148544d9e
Summary:
The py_stream_class causes the code to be more verbose. It basically enforces
the bindings crate to define new types wrapping pure Rust types, and then
define py_stream_class.
In a future diff, I'm adding FromPyObject/ToPyObject support for types that
implements serde Deserialize/Serialize. py_stream_class gets in the way,
because the blanket type from cpython-ext cannot be used in the py_stream_class
macro. cpython-ext is not the proper place to define business-related stream
types.
Therefore, define a type-erased Python class, and implement
FromPyObject/ToPyObject automatically for TStream<anyhow::Result<T>> where
T implements FromPyObject or ToPyObject.
The FromPyObject now converts a Python iterator back to a stream. It's
no longer zero-cost. However, I'd imagine such usecases can be short-cut
using pure Rust code.
Background: Initially, I added some FromPyObject/ToPyObject impls to pure
Rust crates gated by a "pytypes" feature. While that works fine with cargo
build, buck does not support dynamic features and the fact that we support
both py2 and py3 makes it extremely hard to support cleanly in buck build.
For example, if minibytes::Bytes defines ToPyObject for Bytes, then any
crate using minibytes would have 2 different versions: a py2 version, a
py3 version, and they both depend on python. That seems to be a bad approach.
Reviewed By: sfilipco
Differential Revision: D23966984
fbshipit-source-id: eafb31ad458dcbdd8e970d8e419a10fbbe30595f
Summary:
Per the feedback on D23920367 (318f5683a5), let's make the human-readable download stats shorter. Example:
```
Downloaded 10.59 MiB in 12.35s over 5 requests (7.19 Mb/s, latency: 123ms)
```
The amount downloaded is now reported in binary-prefixed bytes (so that it can be directly compared to file sizes) whereas the transfer rate is reported in decimal-prefixed bits per second (so that it can be directly compared to a user's measured network speed).
Additionally, we now use the default formatting available from `std::time::Duration`, which will automatically choose the appropriate display units.
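The unit choice in the example line can be checked with a short calculation. The sketch below uses fixed `%.2f` formatting rather than `Duration`'s default formatting, and the function name and byte count are assumptions chosen to reproduce the example.

```python
def format_stats(num_bytes, seconds, requests, latency_ms):
    mib = num_bytes / 2**20                  # binary-prefixed bytes (MiB)
    mbps = num_bytes * 8 / seconds / 10**6   # decimal-prefixed bits per second (Mb/s)
    return (
        "Downloaded %.2f MiB in %.2fs over %d requests "
        "(%.2f Mb/s, latency: %dms)" % (mib, seconds, requests, mbps, latency_ms)
    )
```

With 11104420 bytes in 12.35 seconds, this gives 10.59 MiB and 7.19 Mb/s, matching the sample output.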
Reviewed By: quark-zju
Differential Revision: D24096525
fbshipit-source-id: 39c49f1b08135bbae7a7544b1ffe2bdbfe1533a1
Summary: We now get progress bar output when fetching from memcache!
Reviewed By: kulshrax
Differential Revision: D24060663
fbshipit-source-id: ff5efa08bced2dac12f1e16c4a55fbc37fbc0837
Summary: These aren't included anywhere, we can remove them.
Reviewed By: DurhamG
Differential Revision: D24062627
fbshipit-source-id: 9ff101eb44965ac3502ada3265ffcc8acc09d2e5
Summary: These are unused, no need to keep the code around.
Reviewed By: DurhamG
Differential Revision: D24055085
fbshipit-source-id: 6246d746983a575c051ddcb51ae02582a764a814
Summary: This is unused, no need to keep it around.
Reviewed By: DurhamG
Differential Revision: D24054164
fbshipit-source-id: 161b294eb952c6b4584aa0d49d8ff46cd63ee30f
Summary: Disable CLANGTIDY checks in several places in the code.
Reviewed By: zertosh, benoitsteiner
Differential Revision: D24018176
fbshipit-source-id: b2d294f9efd64b2e2c72b11b18d8033f9928e826
Summary:
This would have been easier if we can upgrade tokio (D24011447).
For now, let's just solve it by using a channel so the mutex is not held for long.
The implementation has some side effects, though:
- panic message is not preserved.
- 'static lifetime is required on Future.
The `'static` lifetime is incompatible with some existing code. The old function
is preserved as `block_on_exclusive` and is used in places where a future does
not have `'static` lifetime.
Reviewed By: sfilipco
Differential Revision: D24033134
fbshipit-source-id: 7b35d1ff636d2a289db9b04e60419c31bdea9453
Summary:
Add a null progress bar implementation that just keeps track of state, similar to the `progress.nullbar` in hg's Python code.
A benefit of this is that code that optionally shows progress can unconditionally update the progress bar rather than wrapping it in an `Option` and checking for presence each time.
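A sketch of the null-object idea in Python. The class and method names are assumptions, not the Rust crate's API.

```python
class NullProgressBar:
    """Tracks state like a real bar but never draws anything."""

    def __init__(self, total=None):
        self.position = 0
        self.total = total

    def increment(self, delta=1):
        self.position += delta

    def set_total(self, total):
        self.total = total


def copy_chunks(chunks, bar=None):
    # Fall back to the null bar so the loop can update it unconditionally.
    bar = bar if bar is not None else NullProgressBar()
    out = bytearray()
    for chunk in chunks:
        out += chunk
        bar.increment(len(chunk))  # no "is a bar present?" check needed
    return bytes(out)
```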
Reviewed By: markbt
Differential Revision: D23982318
fbshipit-source-id: ffd762b59cc0c9bd2ad0c67c3ca785350db4850f
Summary:
This diff introduces a new `progress` crate that provides an abstract interface for progress bars in Rust code:
- The `ProgressFactory` trait can be used to create new progress bars.
- The `ProgressBar` trait allows Rust code to interact with the progress bar.
- The `ProgressSpinner` trait is similar, but for spinner-type progress indicators.
These traits are intended to be used as trait objects, allowing pure Rust code to accept an opaque `ProgressFactory` and use it to report progress. This kind of abstraction, while not common in idiomatic Rust code, allows the progress implementation to be completely decoupled from the pure Rust code, which is important given that Mercurial's progress bars are currently implemented in Python.
Part of the goal of this crate is to allow a smooth transition to pure Rust progress bars (once we eventually implement them). As long as the Rust progress bars implement the above traits, they can be used as drop-in replacements for Python progress bars everywhere.
Reviewed By: markbt
Differential Revision: D23982319
fbshipit-source-id: 9ccf167f18d9518bb0ed66e1606a5b8188d98428
Summary:
As EdenFS depends on a few bits of Mercurial code, these need to be able to
compile with Buck.
Reviewed By: chadaustin
Differential Revision: D24000881
fbshipit-source-id: 078a2a958039a63db1b716785f872b4bbde3bab6
Summary: Make `parents`, `data`, and `metadata` optional, and introduce `WireTreeAttributesRequest` for selecting which attributes to request on the wire.
Reviewed By: kulshrax
Differential Revision: D23406763
fbshipit-source-id: 5edd674d9ba5d37c23b12ab4d7b54bbf6c9ff990
Summary:
Adds a `WireTreeQuery` enum for query method, with a single `ByKeys(WireTreeKeyQuery)` available currently, to request a specific set of keys.
Leave the API struct alone for now.
Reviewed By: kulshrax
Differential Revision: D23402366
fbshipit-source-id: 19cd8066afd9f14c7e5f718f7583d1e2b9ffac02
Summary: The size can change with zstd upgrades. Do not test them.
Reviewed By: sfilipco
Differential Revision: D23976933
fbshipit-source-id: d560061b6e4fefc3bb89513bdb12c770ea0bd881
Summary:
This makes it so commit hashes are serialized to bytes instead of tuples in Python:
In [1]: s,f=api.commitdata(repo.name, list(repo.nodes('master')))
In [2]: list(s)
Out[2]: [{'hgid': '...', ...}]
Some `Vec<HgId>`s cannot be changed this way. It'd be nice if we could change
the default `HgId` serialization to bytes.
Reviewed By: kulshrax
Differential Revision: D23966989
fbshipit-source-id: 4d013525419741d3c5c23621be16e70441bab3c4
Summary:
`HgId` currently serializes into a tuple of 20 items. This is suboptimal in
CBOR, because the items are untyped. A byte might be serialized into one or two
bytes:
In [2]: cbor.dumps([1,1,1,1])
Out[2]: b'\x84\x01\x01\x01\x01'
In [3]: cbor.dumps([255,255,255,255])
Out[3]: b'\x84\x18\xff\x18\xff\x18\xff\x18\xff'
CBOR supports "bytes" type to efficiently encode a `[u8]`:
In [5]: cbor.dumps(b"\x01\x01\x01\x01")
Out[5]: b'D\x01\x01\x01\x01'
In [6]: cbor.dumps(b"\xff\xff\xff\xff")
Out[6]: b'D\xff\xff\xff\xff'
Add `serde_with` with 3 flavors: `bytes`, `tuple`, `hex` to satisfy different
needs. Check the added docstring for details.
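The size difference for a full 20-byte `HgId` can be reproduced with a tiny hand-written encoder. This is a demo that only covers the CBOR heads needed here (values and lengths below 256); the function names are made up for this example.

```python
def cbor_head(major, n):
    """Encode a CBOR head for a value or length below 256."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 256:
        return bytes([(major << 5) | 24, n])
    raise ValueError("demo encoder only handles values below 256")


def encode_as_tuple(data):
    # Major type 4 (array), then one unsigned int (major type 0) per byte.
    return cbor_head(4, len(data)) + b"".join(cbor_head(0, b) for b in data)


def encode_as_bytes(data):
    # Major type 2 (byte string): head followed by the raw payload.
    return cbor_head(2, len(data)) + data
```

For a worst-case ID of twenty `0xff` bytes, the tuple form takes 41 bytes and the bytes form takes 21.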
Reviewed By: kulshrax
Differential Revision: D23966992
fbshipit-source-id: 704132648f9e50b952ffde0e96ee2106f2f2fbcf
Summary:
Dynamicconfig can generate configs two ways, 1) via `hg
debugdynamicconfig` and 2) synchronously in-process in an hg command when it
detects that the dynamicconfig is completely missing or has the wrong version
number.
In the first case, dynamicconfig gets the repo name from the standard config
object loaded by the hg dispatch. In the second case, the standard config
object isn't even loaded yet, so dynamicconfig does a mini-load of the user and
repo hgrcs so it can get the repo name and user name (needed for dynamic
conditions).
Unfortunately the second code path computed the wrong path (it had two .hg/'s)
which meant the reponame and user name were always none. This meant that the
dynamicconfig on disk could randomly be either computed with or without a
reponame.
Let's fix the path computation, and add a test. We may want to make
dynamicconfig fail if no repo name is passed, but I'm not sure if we'll want to
support no-repo configuration at some point.
This didn't cause a problem for most people, since it would only happen during a
hg version number change, and 15 minutes later the background 'hg
debugdynamicconfig' process would fix it up. It did affect sandcastle though,
since it often creates new repositories and acts on them immediately.
Reviewed By: quark-zju
Differential Revision: D23955628
fbshipit-source-id: c922f4b523d19df9223aa28c97700b7011fc03eb
Summary:
The old code tried to express 4GB by using ^ to do an exponent. That
operator is actually the bitwise xor, so this was producing a limit closer to 4
bytes. It doesn't seem to have mattered much since a later diff overrode the
default via dynamicconfig. But let's fix this anyway.
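The pitfall is easy to reproduce, since `^` is also bitwise xor in Python. The original expression is not shown here, so the operands below are only an illustration.

```python
# `^` flips bits; it does not raise to a power.
XOR_RESULT = 2 ^ 32        # 0b000010 xor 0b100000 == 0b100010 == 34
POWER_RESULT = 2 ** 32     # 4294967296 bytes, i.e. 4 GiB

# One unambiguous way to spell a 4 GiB limit.
FOUR_GIB = 4 * 1024 ** 3
```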
Reviewed By: krallin
Differential Revision: D23955629
fbshipit-source-id: 6abebcb7e84b7a47f70ac501fa11b0dc60dfda7b
Summary: Now that the `async_runtime` crate exists, use Mercurial's global `tokio::Runtime` instead of creating one for each EdenAPI store.
Reviewed By: quark-zju
Differential Revision: D23945569
fbshipit-source-id: 7d7ef6efbb554ca80131daeeb2467e57bbda6e72
Summary: Now that the EdenAPI server is using the `LoadMiddleware` from `gotham_ext`, each response will contain an `X-Load` header that contains the number of active requests that the server is currently handling.
Reviewed By: quark-zju
Differential Revision: D23922809
fbshipit-source-id: 973143de5ddccf074d28aa3ef38d73f9fc1501b6
Summary:
Network speeds are usually reported in megabits per second (Mb/s), whereas file sizes are usually reported in [mebibytes](https://en.wikipedia.org/wiki/Binary_prefix) per second (MiB/s). Previously, the HTTP client reported neither of those and instead reported megabytes per second (MB/s).
This diff changes the reported unit from MB/s to Mb/s so that the numbers are more immediately useful. As a bonus, the speeds are now directly comparable to those reported by `hg debugnetwork`.
Reviewed By: quark-zju
Differential Revision: D23920367
fbshipit-source-id: 46500a42681ab83fc7c4ead82980e8ed620a4d5a
Summary: Now that stats are logged to `tracing` by the `HttpClient` directly, we no longer need to log them here. This commit backs out D23858077 (613fbc858f) which added the logging.
Reviewed By: quark-zju
Differential Revision: D23919308
fbshipit-source-id: 23d3a12c5307bc4b84dd9ffd25bd376718e3cc91
Summary:
Improve the log output of the HTTP client to avoid spewing redundant debug messages.
As part of this change, logging now uses the `tracing` crate instead of the `log` crate for better integration with the rest of Mercurial's logging infrastructure. Right now, `tracing` is just being used as a drop-in replacement for `log`, but now that it's in use we can start using its full capabilities (such as defining tracing spans) in later diffs.
Reviewed By: quark-zju
Differential Revision: D23919310
fbshipit-source-id: 95555ad083ead805ceece39c6e30aaf879bdf2bc
Summary:
We were using the timeout parameter on `Multi::wait` (equivalent to `curl_multi_wait` in C) incorrectly. Previously, we were passing in the timeout provided by `curl_multi_timeout`.
This is incorrect usage because the value returned by `curl_multi_timeout` is the current value of libcurl's internal timeout (based on the state of the transfers), which will always be respected. The actual intention of the timeout parameter is to allow the caller to specify a hard cap on curl's internal timeout, so we should just pass some reasonable default value here. ([See explanation here.](https://github.com/curl/curl/issues/2996))
The purpose of `curl_multi_timeout` is to allow libcurl to tell the application what its desired timeout is in situations where the application itself is waiting for socket activity (using something like `epoll`), which is not the case when using `curl_multi_wait`.
Reviewed By: DurhamG
Differential Revision: D23914093
fbshipit-source-id: 76a25d7c59a4b08437c8d7be3d24708fb37b9172
Summary: Use the functionality from D23910534 (721f5af278) to set a timeout for EdenAPI requests, configured via the `edenapi.timeout` option.
Reviewed By: DurhamG
Differential Revision: D23911552
fbshipit-source-id: 4a6e3de1094d0faa1daaf6fe4b9b7aafb37a25a8
Summary: Add the ability to set a timeout on HTTP requests. Equivalent to [`CURLOPT_TIMEOUT_MS`](https://curl.haxx.se/libcurl/c/CURLOPT_TIMEOUT_MS.html).
Reviewed By: DurhamG
Differential Revision: D23910534
fbshipit-source-id: a7aec792ec3c122a01aa44fcfe2e2df6e3a111fc
Summary:
There are several places in the HTTP client where we log and discard errors. (Typically, these are "this should never happen" type situations.)
Previously, these were logged at the `trace` log level, meaning that in practice no one would ever know if we did hit these errors.
Let's upgrade them to `error` so that they'll be printed out. (In theory, users should never see these error messages unless something has gone horribly wrong.)
Reviewed By: DurhamG
Differential Revision: D23888268
fbshipit-source-id: 9007205f946ebb0127238c76812cf62524878047
Summary:
Treemanifest needs to be able to write to the shared stores from paths
other than just prefetch (like when it receives certain trees via a standard
pull). To make this possible we need to expose the Rust shared mutable stores.
This will also make just general integration with Python cleaner.
In the future we can get rid of the non-prefetch download paths and remove this.
Reviewed By: quark-zju
Differential Revision: D23772385
fbshipit-source-id: c1e67e3d21b354b85895dba8d82a7a9f0ffc5d73
Summary:
Introduce separate wire types to allow protocol evolution and client API changes to happen independently.
* Duplicate `*Request`, `*Entry`, `Key`, `Parents`, `RepoPathBuf`, `HgId`, and `revisionstore_types::Metadata` types into the `wire` module. The versions in the `wire` module are required to have proper `serde` annotations, `Serialize` / `Deserialize` implementations, etc. These have been removed from the original structs.
* Introduce infallible conversions from "API types" to "wire types" with the `ToWire` trait and fallible conversions from "wire types" to "API types" with the `ToApi` trait. API -> wire conversions should never fail in a binary that builds successfully, but wire -> API conversions can fail in the case that the server and client are using different versions of the library. This will cause, for instance, a newly-introduced enum variant used by the client to be deserialized into the catch-all `Unknown` variant on the server, which won't generally have a corresponding representation in the API type.
* Cleanup: remove `*Response` types, which are no longer used anywhere.
* Introduce a `map` method on `Fetch` struct which allows a fallible conversion function to be used to convert a `Fetch<T>` to a `Fetch<U>`. This function is used in the edenapi client implementation to convert from wire types to API types.
* Modify `edenapi_server` to convert from API types to wire types.
* Modify `edenapi_cli` to convert back to wire types before serializing responses to disk.
* Modify `make_req` to use `ToWire` for converting API structs from the `json` module to wire structs.
* Modify `read_res` to use `ToApi` to convert deserialized wire types to API types with the necessary methods for investigating the contents (`.data()`, primarily). It will print an error message to stderr if it encounters a wire type which cannot be converted into the corresponding API type.
* Add some documentation about protocol conventions to the root of the `wire` module.
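To make the API/wire split above concrete, here is a minimal sketch of the two conversion directions. Only the trait names `ToWire` and `ToApi` and the catch-all `Unknown` variant come from this summary; the method names, associated types, the `FileType` enum, the error type, and the `map_entries` helper (a stand-in for the idea behind `Fetch::map`) are assumptions for illustration, and the `serde` annotations the real wire types carry are omitted to keep the sketch std-only.

```rust
/// API-side type: what client code works with. No catch-all variant.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FileType {
    Regular,
    Executable,
}

/// Wire-side type (hypothetical). `Unknown` stands in for any variant
/// that the receiving side's version of the library does not know about.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WireFileType {
    Regular,
    Executable,
    Unknown,
}

/// Error returned when a wire value has no API representation.
#[derive(Debug, PartialEq)]
pub struct WireToApiError(pub &'static str);

/// API -> wire: infallible.
pub trait ToWire {
    type Wire;
    fn to_wire(self) -> Self::Wire;
}

/// Wire -> API: fallible, because the peer may be newer than we are.
pub trait ToApi {
    type Api;
    fn to_api(self) -> Result<Self::Api, WireToApiError>;
}

impl ToWire for FileType {
    type Wire = WireFileType;
    fn to_wire(self) -> WireFileType {
        match self {
            FileType::Regular => WireFileType::Regular,
            FileType::Executable => WireFileType::Executable,
        }
    }
}

impl ToApi for WireFileType {
    type Api = FileType;
    fn to_api(self) -> Result<FileType, WireToApiError> {
        match self {
            WireFileType::Regular => Ok(FileType::Regular),
            WireFileType::Executable => Ok(FileType::Executable),
            WireFileType::Unknown => Err(WireToApiError("unknown file type")),
        }
    }
}

/// Apply a fallible conversion to every entry, keeping per-entry errors.
pub fn map_entries<T, U, E>(
    entries: Vec<T>,
    f: impl Fn(T) -> Result<U, E>,
) -> Vec<Result<U, E>> {
    entries.into_iter().map(f).collect()
}
```

Keeping the `Unknown` variant out of the API type means only the conversion boundary has to deal with version skew; everything past `to_api` sees fully known values.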
Reviewed By: kulshrax
Differential Revision: D23224705
fbshipit-source-id: 88f8addc403f3a8da3cde2aeee765899a826446d
Summary: Add log messages for debugging using the `tracing` crate, which allows them to be enabled via `env_logger`.
Reviewed By: quark-zju
Differential Revision: D23858076
fbshipit-source-id: a8ef1afac6c9ecbfb5d6d78232aa0d03a2fe2054
Summary: Log HTTP stats to stderr to assist with ad-hoc debugging. Will not be printed unless `RUST_LOG` is set appropriately.
Reviewed By: quark-zju
Differential Revision: D23858077
fbshipit-source-id: 39acf3de3fd0ca4403a986eb5373a6a79f1d004a
Summary:
Add a `PyFuture<F>` type that can be used as the return type in binding functions.
It converts a Rust Future into a Python object with an `await` method so Python
can access the value stored in the future.
Unlike `TStream`, it's currently only designed to support one-way Rust->Python
conversion, so it is simpler.
Reviewed By: kulshrax
Differential Revision: D23799644
fbshipit-source-id: da4a322527ad9bb4c2dbaa1c302147b784d1ee41
Summary:
The exposed type can be used as a Python iterator:

    for value in stream:
        ...

The Python type can be used as input and output parameters in binding functions:

    # Rust
    type S = TStream<anyhow::Result<X>>;
    def f1() -> PyResult<S> { ... }
    def f2(x: S) -> PyResult<S> { Ok(x.stream().map_ok(...).into()) }

    # Python
    stream1 = f1()
    stream2 = f2(stream1)
This crate is similar to `cpython-ext`: it does not define actual business
logic exposed by `bindings` module. So it's put in `lib`, not
`bindings/modules`.
Reviewed By: markbt
Differential Revision: D23799641
fbshipit-source-id: c13b0c788a6465679b562976728f0002fd872bee
Summary:
Move a bunch of code (scm daemon related options) into a separate file, out of
cloud sync.
Also introduce an additional check that the `hg cloud sync` command the scm
daemon runs is intended for the currently connected workspace.
In theory the SCM daemon gets notified when we switch a subscription, but races are possible, so it is better to have this additional check to make sure the SCM daemon triggers cloud sync where it is supposed to.
Reviewed By: markbt
Differential Revision: D23783616
fbshipit-source-id: b91a8b79189b7810538c15f8e61080b41abde386
Summary:
The Rust contentstore has no way to flush the shared stores, except
when the object is destructed. In treemanifest, the lifetime of the shared store
seems to differ from that of files, and we're not seeing the stores flushed
appropriately during certain commands. Let's make the flush api also flush the
shared stores.
Reviewed By: quark-zju
Differential Revision: D23662976
fbshipit-source-id: a542c3e45d5b489fcb5faf2726854cb49df16f4c
Summary: The old logic would just double pack some bits. Let's prevent that.
Reviewed By: xavierd
Differential Revision: D23661933
fbshipit-source-id: 155291fa08ec2c060619329bd1cb6040769feb63
Summary:
The rust pack stores currently have logic to refresh their list of
packs if there's a key miss and if it's been a while since we last loaded the
list of packs. In some cases we want to manually trigger this refresh, like if
we're in the middle of a histedit and it invokes an external command that
produces pack files that the histedit should later consume (like an external
amend, that histedit then needs to work on top of).
Python pack stores solve this by allowing callers to mark the store for a
refresh. Let's add the same logic for rust stores. Once pack files are gone we
can delete this.
This will be useful for the upcoming migration of treemanifest to Rust
contentstore. Filelog usage of the Rust contentstore avoided this issue by
recreating the entire contentstore object in certain situations, but refresh
seems useful and less expensive.
Reviewed By: quark-zju
Differential Revision: D23657036
fbshipit-source-id: 7c6438024c3d642bd22256a8e58961a6ee4bc867
Summary:
Instants do not represent actual time and can only be compared against
each other. When we subtract arbitrary Durations from them, we run the risk of
overflowing the underlying storage, since the Instant may be represented by a
low number (such as the age of the process).
This caused crashes in test_refresh (in the next diff) on Windows.
Let's instead represent the "must rescan" state as a None last_scanned time, and avoid any arbitrary subtraction. It's generally much cleaner too.
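As a rough illustration of the `None`-means-rescan idea, here is a minimal sketch. The `PackScanner` type, its fields other than `last_scanned`, and its method names are hypothetical and are not the actual revisionstore code; the point is only that no `Duration` is ever subtracted from an `Instant`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical tracker for when the list of packs was last scanned.
pub struct PackScanner {
    /// `None` means "must rescan on the next key miss".
    last_scanned: Option<Instant>,
    /// Minimum time between two automatic rescans.
    min_interval: Duration,
}

impl PackScanner {
    pub fn new(min_interval: Duration) -> Self {
        Self { last_scanned: None, min_interval }
    }

    /// Decide whether a rescan is due. Two Instants are compared with
    /// each other; nothing is subtracted from an Instant.
    pub fn should_rescan(&self, now: Instant) -> bool {
        match self.last_scanned {
            None => true,
            Some(t) => now.saturating_duration_since(t) >= self.min_interval,
        }
    }

    /// Record that a scan just happened.
    pub fn mark_scanned(&mut self, now: Instant) {
        self.last_scanned = Some(now);
    }

    /// Force the next check to rescan, without any Instant arithmetic.
    pub fn force_rescan(&mut self) {
        self.last_scanned = None;
    }
}
```

The older approach would have expressed "force a rescan" as something like `Instant::now() - interval`, which is exactly the subtraction that can overflow when the Instant is close to its platform-specific zero.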
Reviewed By: quark-zju
Differential Revision: D23752511
fbshipit-source-id: db89b14a701f238e1c549e497a5d751447115fb2
Summary:
Previously the MetadataStore would always construct a mutable pack, even
if the operation was readonly. This meant all read commands required write
access. It also means that random .tmp files get scattered all over the place
when the rust structures are not properly destructed (like if python doesn't
bother doing the final gc to call destructors for the Rust types).
Let's just only create mutable packs when we actually need them.
Reviewed By: quark-zju
Differential Revision: D23219961
fbshipit-source-id: a47f3d94f70adac1f2ee763f3170ed582ef01a14
Summary:
Previously the ContentStore would always construct a mutable pack, even
if the operation was readonly. This meant all read commands required write
access. It also means that random .tmp files get scattered all over the place
when the rust structures are not properly destructed (like if python doesn't
bother doing the final gc to call destructors for the Rust types).
Let's just only create mutable packs when we actually need them.
Reviewed By: quark-zju
Differential Revision: D23219962
fbshipit-source-id: 573844f81966d36ad324df03eecec3711c14eafe
Summary:
As it says in the title, this adds support for receiving compressed responses
in the revisionstore LFS client. This is controlled by a flag, which I'll
roll out through dynamicconfig.
The hope is that this should greatly improve our throughput to corp, where
our bandwidth is fairly scarce.
Reviewed By: StanislavGlebik
Differential Revision: D23652306
fbshipit-source-id: 53bf86d194657564bc3bd532e1a62208d39666df
Summary:
This imports the async-compression crate. We have an equivalent-ish in
common/rust, but it targets Tokio 0.1, whereas this community-supported crate
targets Tokio 0.2 (it offers a richer API, notably in the sense that we
can use it for Streams, whereas the async-compression crate we have is only for
AsyncWrite).
In the immediate term, I'd like to use this for transfer compression in
Mononoke's LFS Server. In the future, we might also use it in Mononoke where we
currently use our own async compression crate when all that stuff moves to
Tokio 0.2.
Finally, this also updates zstd: the version we link to from tp2 is actually
zstd 1.4.5, so it's a good idea to just get the same version of the zstd crate.
The zstd crate doesn't keep a great changelog, so it's hard to tell what has changed.
At a glance, it looks like the answer is not much, but I'm going to look to Sandcastle
to root out potential issues here.
Reviewed By: StanislavGlebik
Differential Revision: D23652335
fbshipit-source-id: e250cef7a52d640bbbcccd72448fd2d4f548a48a
Summary:
We've often had cases where we need to nuke people's caches for various
reasons. It's a huge pain since we don't have a way to communicate with all hg
clients. Now that we have configerator dynamicconfigs, we can use that to reach
all clients.
This diff adds support for configs like:
```
[hgcache-purge]
foo=2020-08-20
```
The key, 'foo' in this case, is an identifier used to only run this purge once.
The value is a date after which this purge will no longer run. This is useful
for bounding the damage from forgetting about a purge and having it delete caches
over and over in the future for new repos or repos where the run once marker
file is deleted for some reason.
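The run-once and expiry rules can be sketched as follows. This is a simplified model under stated assumptions: the function names are hypothetical, the set of already-run keys stands in for the run-once marker files, the expiry date is treated as inclusive, and dates are parsed naively as `YYYY-MM-DD` without range validation.

```rust
use std::collections::HashSet;

/// Calendar date as (year, month, day); tuples compare lexicographically,
/// which matches chronological order.
pub type Date = (u32, u32, u32);

/// Parse `YYYY-MM-DD`. Returns `None` on malformed input.
/// (No validation of month/day ranges; this is only a sketch.)
pub fn parse_date(s: &str) -> Option<Date> {
    let mut parts = s.split('-').map(|p| p.parse::<u32>().ok());
    let date = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None;
    }
    Some(date)
}

/// Given the `[hgcache-purge]` entries as (key, value) pairs, return the
/// keys whose purge should run now: not already run, and not yet expired.
pub fn purges_to_run<'a>(
    entries: &[(&'a str, &'a str)],
    today: Date,
    already_run: &HashSet<String>,
) -> Vec<&'a str> {
    entries
        .iter()
        .filter(|(key, value)| {
            !already_run.contains(*key)
                && parse_date(value).map_or(false, |expiry| today <= expiry)
        })
        .map(|(key, _)| *key)
        .collect()
}
```

Entries with an unparseable date are skipped rather than run, which errs on the side of not deleting caches.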
Reviewed By: quark-zju
Differential Revision: D23044205
fbshipit-source-id: 8394fcf9ba6df09f391b5317bad134f369e9b416
Summary:
For repositories that have the old-style LFS extension enabled, the pointers
are stored in packfiles/indexedlog along with a flag that signifies to the
upper layers that the blob is externally stored. With the new way of doing LFS,
pointers are stored separately.
When both are enabled, we are observing some interesting behavior where
different get and get_meta calls may return different blobs/metadata for the
same filenode. This may happen if a filenode is stored both in a packfile as an
LFS pointer, and in the LFS store. Guaranteeing that the revisionstore code is
deterministic in this situation is unfortunately way too costly (a get_meta
call would for instance have to fully validate the sha256 of the blob, and this
wouldn't guarantee that it wouldn't become corrupted on disk before calling
get).
The solution taken here is to simply ignore all the LFS pointers from
packfiles/indexedlog when remotefilelog.lfs is enabled. This way, there is no
risk of reading the metadata from the packfiles, and the blob from the
LFSStore. This however brings another complication for user-created blobs:
these are stored in packfiles and would thus become unreadable. The solution is
to simply perform a one-time full repack of the local store to make sure that
all the pointers are moved from the packfiles to the LFSStore.
In the code, the Python bindings use ExtStoredPolicy::Ignore directly, as
they are only used in the treemanifest code where no LFS pointers should be
present. The repack code uses ExtStoredPolicy::Use so that it can read the
pointers; it wouldn't be able to otherwise.
Reviewed By: DurhamG
Differential Revision: D22951598
fbshipit-source-id: 0e929708ba5a3bb2a02c0891fd62dae1ccf18204
Summary:
hg-http's built client should provide integration with Mercurial's stats
collection mechanisms.
Reviewed By: kulshrax
Differential Revision: D23577867
fbshipit-source-id: 93c777021bc347511322269d678d6879710eed3e
Summary:
Add `with_stats_reporting` to HttpClient. It takes a closure that will be
called with all `Stats` objects generated. We then use this function in
the hg-http crate to integrate with the metrics backend used in Mercurial.
Reviewed By: kulshrax
Differential Revision: D23577869
fbshipit-source-id: 5ac23f00183f3c3d956627a869393cd4b27610d4
Summary:
We start off simple here. Python only really has counters so we only implement
counters. There are a lot of options on how to improve this and things get
slightly complicated when we look at the whole ecosystem and fb303. Anyway,
simple start.
Reviewed By: quark-zju
Differential Revision: D23577874
fbshipit-source-id: d50f5b2ba302d900b254200308bff7446121ae1d
Summary: The Mercurial codebase uses hyphens in crate names rather than underscores. This is similar to the convention favored by the larger Rust community, though it is different from Mononoke, which uses underscores. While we'll probably need to eventually settle on a consistent convention for all of the projects in the Eden SCM repo, for now, `http_client` should be made consistent with the adjacent crates.
Reviewed By: sfilipco
Differential Revision: D23585721
fbshipit-source-id: d2e690d86815be02d7b8d645198bcd28e8cbd6e0
Summary: No more tokio-core! More `async/await`.
Reviewed By: kulshrax
Differential Revision: D23586509
fbshipit-source-id: b2e766ddb7575bc96963432f0c8582b4370b19aa
Summary:
This diff adds a `SocketTransport` implementation that no longer uses legacy `tokio-core` based futures but `tokio-tower` and `tower-service` for processing Thrift requests.
The old implementation is renamed to `SocketTransportLegacy` for better transitioning.
Reviewed By: dtolnay
Differential Revision: D20019196
fbshipit-source-id: 3bee684e9254bf1a81669ef0d2c2262a55e75daa
Summary:
In order to keep the hgcache size bounded we need to keep track of pack
file size even during normal operations and delete excess packs.
This has the negative side effect of deleting necessary data if the operation is
legitimately huge, but we'd rather have extra downloading time than fill up the
entire disk.
Reviewed By: quark-zju
Differential Revision: D23486922
fbshipit-source-id: d21be095a8671d2bfc794c85918f796358dc4834
Summary:
In a future diff we'll add logic to delete old pack files. We'll want
to use this pack iteration code, so let's move it to a function.
Reviewed By: quark-zju
Differential Revision: D23486920
fbshipit-source-id: 5f872e946ffe816289c925dd2e03c292e29da5af
Summary:
As the repository grows the opportunity for large downloads increases.
Today all writes to data packs get sent straight to disk, but we have no way to
prevent this from eating all the disk.
Let's automatically flush datapacks when they reach a certain size (default
4GB). In a future diff this will let us automatically garbage collect data packs
to bound the maximum size of packs.
Rotatelog already has this behavior.
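A minimal in-memory sketch of the size-triggered flush, to show the control flow only. `RotatingPack` and its methods are hypothetical names; the real code writes pack files to disk, whereas this model just records the size of each flushed pack.

```rust
/// Hypothetical model of a mutable data pack that flushes itself once it
/// has accumulated at least `max_bytes` of data.
#[derive(Debug)]
pub struct RotatingPack {
    max_bytes: u64,
    pending: Vec<Vec<u8>>,
    pending_bytes: u64,
    /// Size of each pack flushed so far (stand-in for files on disk).
    flushed_sizes: Vec<u64>,
}

impl RotatingPack {
    pub fn new(max_bytes: u64) -> Self {
        Self {
            max_bytes,
            pending: Vec::new(),
            pending_bytes: 0,
            flushed_sizes: Vec::new(),
        }
    }

    /// Add one entry; flush automatically when the threshold is reached.
    pub fn add(&mut self, data: &[u8]) {
        self.pending.push(data.to_vec());
        self.pending_bytes += data.len() as u64;
        if self.pending_bytes >= self.max_bytes {
            self.flush();
        }
    }

    /// Close the current pack and start a new, empty one.
    pub fn flush(&mut self) {
        if self.pending.is_empty() {
            return;
        }
        self.flushed_sizes.push(self.pending_bytes);
        self.pending.clear();
        self.pending_bytes = 0;
    }

    pub fn pending_bytes(&self) -> u64 {
        self.pending_bytes
    }

    pub fn flushed_sizes(&self) -> &[u64] {
        &self.flushed_sizes
    }
}
```

Because each flushed pack is bounded by roughly the threshold plus one entry, a later garbage collection pass can reason about total size by counting packs.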
Reviewed By: quark-zju
Differential Revision: D23478780
fbshipit-source-id: 14f9f707e8bffc59260c2d04c18b1e4f6bdb2f90
Summary:
See D23538897 for context. This adds a killswitch so we can roll out client
certs gradually through dynamicconfig.
Reviewed By: StanislavGlebik
Differential Revision: D23563905
fbshipit-source-id: 52141365d89c3892ad749800db36af08b79c3d0c
Summary:
Like it says in the title, this updates remotefilelog to present client
certificates when connecting to LFS (this was historically the case in the
previous LFS extension). This has a few upsides:
- It lets us understand who is connecting, which makes debugging easier.
- It lets us enforce ACLs.
- It lets us apply different rate limits to different use cases.
Config-wise, those certs were historically set up for Ovrsource, and the auth
mechanism will ignore them if not found, so this should be safe. That said, I'd
like to add a killswitch for this nonetheless. I'll reach out to Durham to see if I
can use dynamic config for that.
Also, while I was in there, I cleaned up a few functions that were taking
ownership of things but didn't need it.
Reviewed By: DurhamG
Differential Revision: D23538897
fbshipit-source-id: 5658e7ae9f74d385fb134b88d40add0531b6fd10
Summary:
Generated by formatting with rustfmt 2.0.0-rc.2 and then a second time with fbsource's current rustfmt (1.4.14).
This results in formatting for which rustfmt 1.4 is idempotent but is closer to the style of rustfmt 2.0, reducing the amount of code that will need to change atomically in that upgrade.
---
*Why now?* The 1.x branch is no longer being developed and fixes like https://github.com/rust-lang/rustfmt/issues/4159 (which we need in fbcode) only land to the 2.0 branch.
---
Reviewed By: zertosh
Differential Revision: D23568779
fbshipit-source-id: 477200f35b280a4f6471d8e574e37e5f57917baf
Summary:
Now that the Rust revisionstore records undesired filename fetches,
let's log those results to Scuba in Python.
Reviewed By: StanislavGlebik
Differential Revision: D23462572
fbshipit-source-id: b55f2290e30e3a5c3b67d9f612b24bc3aad403a8
Summary:
We want to be able to record when fetches to certain paths happen.
Let's add recording infrastructure to the new ReportingRemoteDataStore.
A future diff will make the seen paths accessible from Python for scuba logging.
Reviewed By: xavierd
Differential Revision: D23462574
fbshipit-source-id: 5d749f2429e26e8e7fe4fb5adc29140b4309eac9
Summary:
We want to monitor what paths are fetched from our remote servers.
Since all of our remote stores are hidden behind the RemoteDataStore interface,
let's create a wrapper around that. A future diff will insert the actual
monitoring and reporting.
Reviewed By: quark-zju
Differential Revision: D23462571
fbshipit-source-id: e6031f19db23f7d1b09767efb9613d7528fb457d
Summary:
Based on fbsource data, building level 5 proves not to be useful.
Skipping it would save 300ms in the write path.
Reviewed By: sfilipco
Differential Revision: D23494505
fbshipit-source-id: ca795b4900af40dbfdaa463d36f3169413bf6a62
Summary:
Previously the IdMap's "Name -> Id" index simply ignored the "reassign
non-master" request. It turns out stale entries in that index can cause
issues as demonstrated by the previous diff.
Update IdMap to actually remove both indexes of non-master group on
remove_non_master so it cannot have stale entries.
To optimize the index, the format of IdMap is changed from:
    [ 8 bytes Id (Big Endian) ] [ Name ]
to:
    [ 8 bytes Id (Big Endian) ] [ 1 byte Group ] [ Name ]
So the index can use reference to the slice, instead of embedding the bytes, to
reduce index size.
The filesystem directory name for IdMap used by NameDag is bumped to `idmap2`
so it won't read the incompatible old `idmap` data.
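For reference, the new entry layout can be sketched like this. The function names are hypothetical and this is not the actual IdMap code; it only encodes and decodes the `[ Id ] [ Group ] [ Name ]` byte layout described above. The `group_name_key` helper reflects one plausible reading of "reference to the slice": since the group byte and the name are contiguous, an index key can borrow that range of the entry instead of copying the bytes.

```rust
/// Encode one entry as: 8-byte big-endian Id, 1-byte Group, then Name.
pub fn encode_entry(id: u64, group: u8, name: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(9 + name.len());
    buf.extend_from_slice(&id.to_be_bytes());
    buf.push(group);
    buf.extend_from_slice(name);
    buf
}

/// Decode an entry into (id, group, name). The name borrows from `entry`.
/// Returns `None` if the entry is too short to hold the fixed-size header.
pub fn decode_entry(entry: &[u8]) -> Option<(u64, u8, &[u8])> {
    if entry.len() < 9 {
        return None;
    }
    let mut id_bytes = [0u8; 8];
    id_bytes.copy_from_slice(&entry[..8]);
    Some((u64::from_be_bytes(id_bytes), entry[8], &entry[9..]))
}

/// Borrow the contiguous `[ Group ] [ Name ]` range of an entry, usable
/// as an index key without copying (assumption, see the note above).
pub fn group_name_key(entry: &[u8]) -> Option<&[u8]> {
    if entry.len() < 9 {
        return None;
    }
    Some(&entry[8..])
}
```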
Reviewed By: sfilipco
Differential Revision: D23494508
fbshipit-source-id: 3cb7782577750ba5bd13515b370f787519ed3894
Summary: Some vertexes can disappear from the graph!
Reviewed By: sfilipco
Differential Revision: D23494506
fbshipit-source-id: ecbf2a4169e5fc82596e89a4bfe4c442a82e9cd2
Summary: The TestDag struct will be used to do some more complicated tests.
Reviewed By: sfilipco
Differential Revision: D23494507
fbshipit-source-id: 11350f9e448725ae49f50a7b6f19efc57ad84448
Summary:
Replacing places where the tokio runtime is instantiated inside the edenapi
client crate.
Reviewed By: quark-zju
Differential Revision: D23468596
fbshipit-source-id: ef68718c7d5b89b6477a2946daaa51618b53d06a
Summary:
At open time, it's pointless to attempt to create new levels. So let's just
read the existing max_level and not try to build max_level + 1.
This turns out to save 300ms in a profiling result.
Reviewed By: sfilipco
Differential Revision: D23494509
fbshipit-source-id: 4ea326a3cc21792790ea0b87e5bf608a94ae382b
Summary:
With MultiLog, per-log meta was previously entirely ignored. However, it can
be useful for updated indexes. For example, an application defines a new index,
and opens a Log via MultiLog. The application would expect the new index to be
built only once. Without MultiLog, per-log meta is updated in place at open
time. With MultiLog, the updated index meta is not written back to the
multimeta, so the new index would be rebuilt multiple times, which is undesirable.
Update MultiLog to reuse the per-log meta if it's compatible so it can pick up
new indexes.
Reviewed By: sfilipco
Differential Revision: D23488212
fbshipit-source-id: c8b3e6b5589dbda2e76a143d15085862a93dae22
Summary:
The poisoned meta makes investigation harder. ex. `debugdumpindexlog` won't
work on those logs.
Reviewed By: sfilipco
Differential Revision: D23488213
fbshipit-source-id: b33894d8c605694b6adf5afdaed45707fbd7357e
Summary:
Change dag_ops benchmarks to use different IdDagStores. An example run shows:
    benchmarking dag::iddagstore::indexedlog_store::IndexedLogStore
    building segments (old)     856.803 ms
    building segments (new)     127.831 ms
    ancestors                    54.288 ms
    children (spans)            619.966 ms
    children (1 id)              12.596 ms
    common_ancestors (spans)      3.050 s
    descendants (small subset)   35.652 ms
    gca_one (2 ids)             164.296 ms
    gca_one (spans)               3.132 s
    gca_all (2 ids)             270.542 ms
    gca_all (spans)               2.817 s
    heads                       247.504 ms
    heads_ancestors              40.106 ms
    is_ancestor                 108.719 ms
    parents                     243.317 ms
    parent_ids                   10.752 ms
    range (2 ids)                 7.370 ms
    range (spans)                23.933 ms
    roots                       620.150 ms

    benchmarking dag::iddagstore::in_process_store::InProcessStore
    building segments (old)     790.429 ms
    building segments (new)      55.007 ms
    ancestors                     8.618 ms
    children (spans)            196.562 ms
    children (1 id)               2.488 ms
    common_ancestors (spans)    545.344 ms
    descendants (small subset)    8.093 ms
    gca_one (2 ids)              24.569 ms
    gca_one (spans)             529.080 ms
    gca_all (2 ids)              38.462 ms
    gca_all (spans)             540.486 ms
    heads                       103.930 ms
    heads_ancestors               6.763 ms
    is_ancestor                  16.208 ms
    parents                     103.889 ms
    parent_ids                    0.822 ms
    range (2 ids)                 1.748 ms
    range (spans)                 6.157 ms
    roots                       197.924 ms

    benchmarking dag::iddagstore::bytes_store::BytesStore
    building segments (old)     724.467 ms
    building segments (new)      90.207 ms
    ancestors                    23.812 ms
    children (spans)            348.237 ms
    children (1 id)               4.609 ms
    common_ancestors (spans)      1.315 s
    descendants (small subset)   20.819 ms
    gca_one (2 ids)              72.423 ms
    gca_one (spans)               1.346 s
    gca_all (2 ids)             116.025 ms
    gca_all (spans)               1.470 s
    heads                       155.667 ms
    heads_ancestors              19.486 ms
    is_ancestor                  51.529 ms
    parents                     157.285 ms
    parent_ids                    5.427 ms
    range (2 ids)                 4.448 ms
    range (spans)                13.874 ms
    roots                       365.568 ms

Overall, InProcessStore > BytesStore > IndexedLogStore. The InProcessStore
uses `Vec<BTreeMap<Id, StoreId>>` for the level-head index, which is more
efficient on the "Level" lookup (Vec), and more cache efficient (BTree).
BytesStore outperforms IndexedLogStore because it does not need to verify
checksum on every read access - the checksum was verified at store creation
(IdDag::from_bytes).
Note: The `BytesStore` is something optimized for serialization, and hasn't been sent.
Reviewed By: sfilipco
Differential Revision: D23438174
fbshipit-source-id: 6e5f15188e3b935659ccde25fac573e9b963b78f
Summary: This allows them to use the SyncableIdDag APIs.
Reviewed By: sfilipco
Differential Revision: D23438170
fbshipit-source-id: 7ec7288cfb8186b88f85f0212a913cb0dffe7345
Summary: Other IdDagStores can also use the API. This will be used in benchmarks.
Reviewed By: sfilipco
Differential Revision: D23438180
fbshipit-source-id: 565552b66372dcfbb268c397883f627491d6e154
Summary:
Similar to `IdDagStore::sync` -> `GetLock::persist`, `reload` is more related
to filesystem/internal state exchange, and should be protected by a lock. So
let's move the API there, and require a lock.
Reviewed By: sfilipco
Differential Revision: D23438169
fbshipit-source-id: 4228106b7739a1a758677adfddd213ad54aa4b6a
Summary:
`NameDag::reload` is used in `flush` to get a "fresh" NameDag.
In a future diff the `IdDag::reload` API gets changed, so let's
remove NameDag's use of it.
Instead, let's just re-`open` the path again to get a fresh NameDag.
It's a bit more expensive but probably okay, and easier to understand.
`get_new_segment_size()` was added as an internal API to preserve tests.
This also solves an issue where `NameDag` could not recover properly if its
`flush` failed: the old `NameDag` state is no longer lost.
After removing `NameDag::reload`, `IdMap::reload` is no longer used publicly
and was made private.
Reviewed By: sfilipco
Differential Revision: D23438179
fbshipit-source-id: 0a32556a2cd786919c233d7efcae1cb9cbc5fb09
Summary:
The word "sync" is bi-directional: flush + reload. That was indexedlog::Log's
behavior. However, in the IdDag context "sync" is confusing - it is actually
only used to write data out, with protection from a lock. Rename to `persist`
to clarify it's memory -> disk. Also, require a reference to a lock object
as lightweight proof that some lock is held.
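The "reference to a lock object as proof" pattern looks roughly like the sketch below. All names here are hypothetical, and an in-process `Mutex` stands in for whatever lock the real store uses; the only point is that `persist` cannot be called without first obtaining a guard.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical lock. An in-process Mutex stands in for the real lock.
pub struct StoreLock {
    inner: Mutex<()>,
}

/// Holding a `LockGuard` is the proof that the lock is held.
pub struct LockGuard<'a> {
    _guard: MutexGuard<'a, ()>,
}

impl StoreLock {
    pub fn new() -> Self {
        Self { inner: Mutex::new(()) }
    }

    pub fn lock(&self) -> LockGuard<'_> {
        LockGuard { _guard: self.inner.lock().unwrap() }
    }
}

/// Hypothetical store with pending in-memory entries and persisted ones.
#[derive(Default)]
pub struct InMemoryStore {
    pending: Vec<String>,
    persisted: Vec<String>,
}

impl InMemoryStore {
    pub fn add(&mut self, entry: &str) {
        self.pending.push(entry.to_string());
    }

    /// Memory -> "disk". The `_lock` argument is never inspected; it only
    /// forces callers to hold a guard when they call this.
    pub fn persist(&mut self, _lock: &LockGuard<'_>) -> usize {
        let n = self.pending.len();
        self.persisted.append(&mut self.pending);
        n
    }

    pub fn persisted_len(&self) -> usize {
        self.persisted.len()
    }
}
```

This is a compile-time nudge rather than a guarantee: nothing ties the guard to this particular store, which matches the "some lock is held" wording above.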
Reviewed By: sfilipco
Differential Revision: D23438175
fbshipit-source-id: 3d9ccd7431691d1c4e2ee74f3c80d95f5e7243b5
Summary:
This removes the need of cloning `IdMap`.
SyncableIdMap is a bit tricky. I added some comments to clarify things.
Reviewed By: sfilipco
Differential Revision: D23438176
fbshipit-source-id: fe66071da07067ed6c53a6437790af1d81b28586
Summary:
Make the test cover IndexedLogIdDagStore. The only change is the parent index
returns children in a different order.
Reviewed By: sfilipco
Differential Revision: D23438173
fbshipit-source-id: bcfabcd329e45bbc5e7e773103fa42307c23c35d
Summary:
There aren't too many things that we can do with the responses that we get back
from the server. Things are somewhat application specific for this endpoint.
One option that is not available right now and might make sense to add is
limiting the number of entries that are printed for a given location.
Reviewed By: kulshrax
Differential Revision: D23456220
fbshipit-source-id: eb24602c3dea39b568859b82fc27b7f6acc77600
Summary:
To reduce the size over the wire on cases where we would be traversing the
changelog on the client, we want to allow the endpoint to return a whole parent
chain with their hashes.
Reviewed By: kulshrax
Differential Revision: D23456216
fbshipit-source-id: d048462fa8415d0466dd8e814144347df7a3452a
Summary:
Renaming all the LocationToHash related structures to CommitLocationToHash.
This is done for consistency. I realized the issue when the command for reading
the request from cbor was not what I was expecting it to be. The reason was that
the commit prefix was used inconsistently for LocationToHash.
Reviewed By: kulshrax
Differential Revision: D23456221
fbshipit-source-id: 0181dcaf81368b978902d8ca79c5405838e4b184