Summary: We are using the upper bound of `<[u8]>::split`'s iterator as the value for `with_capacity`; however, this is incredibly pessimistic, and as a result we overallocate. Let's just make an initial pass and count the actual number.
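A minimal sketch of the idea (hypothetical function name; the real change lives in the linked diff): `split`'s `size_hint` upper bound assumes every byte could be a delimiter, so pre-counting the delimiters gives an exact capacity.

```rust
// Sketch: pre-count delimiter-separated segments instead of trusting the
// split iterator's pessimistic size_hint upper bound (data.len() + 1).
fn split_to_vec(data: &[u8], delim: u8) -> Vec<&[u8]> {
    // An initial pass counts the actual number of segments.
    let count = data.iter().filter(|&&b| b == delim).count() + 1;
    let mut out = Vec::with_capacity(count);
    out.extend(data.split(|&b| b == delim));
    out
}
```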
Reviewed By: RajivTS
Differential Revision: D49951367
fbshipit-source-id: c3ce6fa0d4125f2cac26c9ea017a0992f9291397
Summary:
merge.update() had a lot of complicated arg checking to differentiate simple working copy updates from trickier three-way merges. In particular, the eden and nativeupdate code paths only support the simple update case.
To make things clearer, let's separate update() into goto() and merge(). goto() handles the simple case, and moving it to Rust will be the (achievable) goal of some upcoming work. merge() handles the full enchilada, and isn't ripe for Rustification at this point.
Reviewed By: zzl0
Differential Revision: D49058005
fbshipit-source-id: f8106c04c48bfba0e7fad244e1dd9933142674e1
Summary: I couldn't find any references to the eden.prefetchsparseprofiles and clone.nohttp config settings.
Reviewed By: sggutier
Differential Revision: D49058002
fbshipit-source-id: 833cff3b3da0531ca18cabb3ade8289bf13dfaca
Summary:
Been deprecated for a while in favor of "commands.update.check".
There is a terrible subtlety here: "commands" config settings are automatically filtered out when HGPLAIN=1, while "experimental" settings are not. So experimental.updatecheck=noconflict was active with HGPLAIN=1, but commands.update.check=noconflict would not be. That would be a change in behavior, so I set the configitems.py default value to "noconflict", which does take effect with HGPLAIN=1.
Still, there is a change in behavior in that a locally configured commands.update.check value will no longer take effect with HGPLAIN=1. It doesn't look to be configured anywhere outside hg, so I think we are okay.
We are planning to simplify and improve the update check with upcoming Rust work, so this config knob should go away, hopefully.
Reviewed By: sggutier
Differential Revision: D49058007
fbshipit-source-id: af2d01bf7bc6fac0bbbd913b745ce5da8353f13d
Summary: Implement a task-local ClientRequestInfo and propagate the thread-local value into the task-local one for EdenApi.
Reviewed By: quark-zju
Differential Revision: D49912043
fbshipit-source-id: 1327aecd2162a441737a0602c42899980f3c2e23
Summary: This diff updates the `DeltaInstructions` storage logic to use `flate2` Zlib encoding instead of the `async-compression` Zlib encoding to maintain consistency with how Git objects are encoded and decoded when creating packs and bundles. It also adds a unit test to validate that the round trip from storage to bytes works as expected for `DeltaInstructions`.
Reviewed By: markbt
Differential Revision: D49814036
fbshipit-source-id: 2c934f392a6be47a2d419a8ff84dbf1645dc6fae
Summary: Hopefully this can further reduce confusion.
Reviewed By: muirdm
Differential Revision: D49922798
fbshipit-source-id: 6da9f05eb9f3327976640bc67be6fed724611bd3
Summary:
Make `version` show "Sapling" even when running under hg identity.
This might reduce confusion internally.
Reviewed By: muirdm
Differential Revision: D49922799
fbshipit-source-id: 41cf17b461547ac4b2e16364667486216adeec27
Summary:
We hit a copytrace failure, caused by two combined factors:
1. `lib.rs` is a common file name.
2. Commit 843db9ece03c is a big commit that changed many `lib.rs` files.
Let's increase DEFAULT_MAX_RENAME_CANDIDATES, so we will check more
candidate files in this case.
Reviewed By: muirdm
Differential Revision: D49894288
fbshipit-source-id: 442f6901d8a4b7c3edfb9ccb59b7f5ef295d4885
Summary: Update the `lru` version we use to `0.11.1`.
Reviewed By: zertosh
Differential Revision: D49889174
fbshipit-source-id: 4b8103b9f6a4a066eded1667983bc45bd9991192
Summary:
See the previous change for context. This diff updates the rest of the internal
functions and modules to use the internalconfig name.
Reviewed By: muirdm
Differential Revision: D49881848
fbshipit-source-id: 439fc82f7c9345a79d7c6d32d0d0610556901505
Summary:
Per discussion on Aug 28, we'd like to rename dynamicconfig to internalconfig
to better reflect its actual features (Meta-internal static + dynamic + remote
configs). See also the `mode.rs` in D48042830.
The `dynamicconfig` command has the side effect of writing files. Rename it to
`refreshconfig` to emphasize that.
The `dumpdynamicconfig` command only writes to stdout. So keep its "dump"
name but rename "dynamic" to "internal".
Reviewed By: muirdm
Differential Revision: D49881549
fbshipit-source-id: 0c3bb9caf6ea91ae7fe624e199cd3355244489e0
Summary:
People with muscle memory might be surprised that certain commands no longer
work with `sl`. Moving them back from `legacyaliases` to `aliases` makes the
migration from `hg` to `sl` smoother.
Reviewed By: zzl0
Differential Revision: D49879280
fbshipit-source-id: 20c747d7c44c1c53a89fc425187eb2280134a62d
Summary: In previous diffs the amount of duplicative code grew. This diff refactors that duplicative code into a single method.
Reviewed By: kmancini
Differential Revision: D49753337
fbshipit-source-id: 34e5c5f132fcb4fa36754bad96b4277fcce9c08e
Summary:
As part of the project to remove HgImporter (aka `hg debugedenimporthelper`) it was discovered that we present duplicate keys to EdenAPI when making batched requests - https://fburl.com/scuba/edenfs_events/owb8b65e. This results in the requests being failed in the `backingstore` library (to avoid being failed by Mononoke). When these batched requests fail, they fall back to using HgImporter and fetch each object one at a time.
This diff addresses the batching of blob metadata. It creates a map of unique object ids to vectors of requests. In most cases the vector is size 1.
With the help of chadaustin, we were able to root cause why duplicate requests are submitted to EdenAPI. From Eden's POV, each individual file and directory are different and identified by a different hash. From Mononoke's POV, content (tree or blob) that contain the exact same bytes are the same regardless of where they are in a repo. For this reason, it is possible (somewhat common) to create a batch of requests that contain the same proxy hash to be submitted to EdenAPI and the solution put in place here is the correct one - identifying a set of duplicates, submitting only one hash for the set, and fulfilling all the set's promises when completed.
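The grouping described above can be sketched roughly like this (hypothetical, simplified types; the real code maps EdenFS object ids to vectors of pending promises, then fulfills every promise in a group from the one fetched result):

```rust
use std::collections::HashMap;

// Sketch: collapse a batch of (object id, request) pairs into one entry per
// unique id, so each id is fetched once and fanned back out to all waiters.
fn group_by_id<K, R>(requests: Vec<(K, R)>) -> HashMap<K, Vec<R>>
where
    K: std::hash::Hash + Eq,
{
    let mut by_id: HashMap<K, Vec<R>> = HashMap::new();
    for (id, req) in requests {
        // In most cases the vector ends up with a single element.
        by_id.entry(id).or_default().push(req);
    }
    by_id
}
```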
This diff and subsequent diffs also increase the amount of duplicate code between getBlobBatch, getBlobMetadataBatch and getTreeBatch. This will be addressed in subsequent diffs - see T165043385.
Reviewed By: kmancini
Differential Revision: D49717032
fbshipit-source-id: cb1d18dc038dd2c4672313eb1433dd94d5bdd506
Summary:
As part of the project to remove HgImporter (aka `hg debugedenimporthelper`) it was discovered that we present duplicate keys to EdenAPI when making batched requests - https://fburl.com/scuba/edenfs_events/owb8b65e. This results in the requests being failed in the `backingstore` library (to avoid being failed by Mononoke). When these batched requests fail, they fall back to using HgImporter and fetch each object one at a time.
This diff addresses the batching of trees. It creates a map of unique object ids to vectors of requests. In most cases the vector is size 1.
With the help of chadaustin, we were able to root cause why duplicate requests are submitted to EdenAPI. From Eden's POV, each individual file and directory are different and identified by a different hash. From Mononoke's POV, content (tree or blob) that contain the exact same bytes are the same regardless of where they are in a repo. For this reason, it is possible (somewhat common) to create a batch of requests that contain the same proxy hash to be submitted to EdenAPI and the solution put in place here is the correct one - identifying a set of duplicates, submitting only one hash for the set, and fulfilling all the set's promises when completed.
This diff and subsequent diffs also increase the amount of duplicate code between getBlobBatch, getBlobMetadataBatch and getTreeBatch. This will be addressed in subsequent diffs - see T165043385.
Reviewed By: kmancini
Differential Revision: D49707255
fbshipit-source-id: f176b785e74e71a2d13865397e9023ce538e928c
Summary:
As part of the project to remove HgImporter (aka `hg debugedenimporthelper`) it was discovered that we present duplicate keys to EdenAPI when making batched requests - https://fburl.com/scuba/edenfs_events/owb8b65e. This results in the requests being failed in the `backingstore` library (to avoid being failed by Mononoke). When these batched requests fail, they fall back to using HgImporter and fetch each object one at a time.
This diff addresses the batching of blobs. It creates a map of unique object ids to vectors of requests. In most cases the vector is size 1.
With the help of chadaustin, we were able to root cause why duplicate requests are submitted to EdenAPI. From Eden's POV, each individual file and directory are different and identified by a different hash. From Mononoke's POV, content (tree or blob) that contain the exact same bytes are the same regardless of where they are in a repo. For this reason, it is possible (somewhat common) to create a batch of requests that contain the same proxy hash to be submitted to EdenAPI and the solution put in place here is the correct one - identifying a set of duplicates, submitting only one hash for the set, and fulfilling all the set's promises when completed.
This diff and subsequent diffs also increase the amount of duplicate code between getBlobBatch, getBlobMetadataBatch and getTreeBatch. This will be addressed in subsequent diffs - see T165043385.
Reviewed By: kmancini
Differential Revision: D49663711
fbshipit-source-id: 8f08b1e577c791550f446f86ddf301721b9587bf
Summary: In order to pass the `requestInfo` map to deeper layers, thread the Thrift layer `ObjectFetchContext` to `HgDatapackStore`. This is not consumed anywhere yet, but will be passed to the `sapling_backing_store` in follow up diffs. This is part of a broader effort to thread a common client identifier from scm clients (`sapling` and `edenfs`) to Mononoke.
Reviewed By: jdelliot
Differential Revision: D49795940
fbshipit-source-id: 67b2f703caae403f9d93fcb010b810a61437017d
Summary: In order to pass the `requestInfo` map to deeper layers, thread the Thrift layer `ObjectFetchContext` through `HgBackingStore` to `importTreeManifestImpl()`. This is not consumed anywhere yet, but will be passed to the `sapling_backing_store` in follow up diffs. This is part of a broader effort to thread a common client identifier from scm clients (`sapling` and `edenfs`) to Mononoke.
Reviewed By: jdelliot
Differential Revision: D49718778
fbshipit-source-id: 7296319b220cf75cee5ca820c5cc4ce1efacfae2
Summary: In order to pass the `requestInfo` map to deeper layers, thread the Thrift layer `ObjectFetchContext` to `BackingStore::importManifestForRoot`. This is not consumed anywhere yet, but will be passed to the `sapling_backing_store` in follow up diffs. This is part of a broader effort to thread a common client identifier from scm clients (`sapling` and `edenfs`) to Mononoke.
Reviewed By: jdelliot
Differential Revision: D49718772
fbshipit-source-id: 0249afe827da68f2916976c180844e52a0d3df0b
Summary: In order to pass the `requestInfo` map to deeper layers, thread the Thrift layer `ObjectFetchContext` to `DiffContext` and `StatsFetchContext`. This is not consumed anywhere yet, but will be passed to the `sapling_backing_store` in follow up diffs. This is part of a broader effort to thread a common client identifier from scm clients (`sapling` and `edenfs`) to Mononoke.
Reviewed By: jdelliot
Differential Revision: D49716042
fbshipit-source-id: b5df9754058e1f1585954ff6c25bbafa9aa820c1
Summary: This was a remnant of the old HgProxyHash scheme.
Reviewed By: kmancini
Differential Revision: D49753911
fbshipit-source-id: 9883c2015f0c9d660122ff3bfdf2efff88aa00f3
Summary: In order to pass the `requestInfo` map to deeper layers, thread the Thrift layer `ObjectFetchContext` to `CheckoutContext`. This is not consumed anywhere yet, but will be passed to the `sapling_backing_store` in follow up diffs. This is part of a broader effort to thread a common client identifier from scm clients (`sapling` and `edenfs`) to Mononoke.
Reviewed By: jdelliot
Differential Revision: D49716030
fbshipit-source-id: 7b079e554711ec610dd919243c7e459b2737fa48
Summary: Rust is [opinionated](https://rust-lang.github.io/api-guidelines/naming.html) about the case of enum values. Change this type so that its values match.
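For illustration (a hypothetical type, not the one changed in this diff), the guideline means enum variants are written in UpperCamelCase rather than SCREAMING_SNAKE_CASE:

```rust
// Per the Rust API guidelines, variants use UpperCamelCase.
enum FetchMode {
    AllowRemote, // not ALLOW_REMOTE
    LocalOnly,   // not LOCAL_ONLY
}

fn describe(mode: &FetchMode) -> &'static str {
    match mode {
        FetchMode::AllowRemote => "may go to the server",
        FetchMode::LocalOnly => "local caches only",
    }
}
```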
Reviewed By: quark-zju
Differential Revision: D49782706
fbshipit-source-id: 4906729a7fe0fcd6e4618cd4b6c139a30ec2078a
Summary: Add logging of client request info objects to scuba in the derivation worker. This requires us to stop using a single context object and instead construct a new one for each request. For now we just create a new client info; we will accept one from the derived data service in a later diff.
Reviewed By: RajivTS
Differential Revision: D49743404
fbshipit-source-id: 68b16b265593a8a4f487976f96e4d3e62904b543
Summary:
The naming conventions for Rust are that `into_X` methods usually take `self` by value. A method that takes `&self` and produces something else is usually termed `to_X`. Make this change for `ClientInfo`.
Additionally add a `from_json` method so we can easily parse it, too.
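A minimal sketch of the naming convention, using a stand-in struct (the real `ClientInfo` has more fields and JSON support via `from_json`):

```rust
// Hypothetical illustration of the Rust convention: `into_*` consumes self,
// `to_*` borrows and allocates a new value.
struct ClientInfo {
    id: String,
}

impl ClientInfo {
    fn into_id(self) -> String {
        self.id // takes self by value; no clone needed
    }

    fn to_id(&self) -> String {
        self.id.clone() // takes &self; the original remains usable
    }
}
```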
Reviewed By: quark-zju
Differential Revision: D49743401
fbshipit-source-id: 1f40e123eb7c4945ab6cdd069c991f3b54aef130
Summary:
Now that we can pass additional information from the derived data service to the derivation workers, start by passing the enqueue time and logging the approximate time spent on the queue.
Note that this is approximate as we rely on the clocks on the derived data service and derivation workers being in sync.
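A rough sketch of the queue-time computation under the stated caveat (hypothetical function; clock skew between machines can make the difference negative, clamped to zero here):

```rust
use std::time::{Duration, SystemTime};

// Approximate time spent on the queue as (now - enqueue_time). Because the
// two timestamps come from different machines, a future enqueue_time is
// possible; treat that as zero rather than erroring.
fn time_on_queue(enqueue_time: SystemTime) -> Duration {
    SystemTime::now()
        .duration_since(enqueue_time)
        .unwrap_or(Duration::ZERO)
}
```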
Reviewed By: mitrandir77
Differential Revision: D49689667
fbshipit-source-id: 86c09bc8c8800f10cb25c73831619bb7f5ee3bfe
Summary:
'ipdb' is a nice-to-have feature; import-related errors should not
break the system.
Reviewed By: quark-zju
Differential Revision: D49868343
fbshipit-source-id: caa7767352f1412b7e0531416e0778dcea03f0bf
Summary:
One surprising behavior of the ObjectStore is that it leaves deduplication of
get operations to the BackingStore; the side effect is that all the work
performed between the ObjectStore and the deduplication logic might get
duplicated.
For instance, the LocalStoreCachedBackingStore will call into the underlying
BackingStore and then cache the result in the LocalStore. In the case where
concurrent operations are done for the same ObjectId, the result will be stored
several times in the LocalStore.
Taking an extreme example, calling the Thrift API `getSHA1` with the same path
10k times will force all the Trees along the way to the final path to be
written potentially 10k times to the LocalStore.
Several solutions could fix this. An obvious one would be to let the
underlying BackingStore bubble up whether the operation was deduplicated and
skip LocalStore caching in that case, but unfortunately this ignores the case
where data was available in local caches and thus not duplicated. Another
potential solution would be to do some deduplication at the `getSHA1` level,
but this would be a very ad-hoc solution that would need to be repeated for
getBlake3 and other Thrift calls.
The solution taken in this diff is to simply deduplicate the BackingStore
operations. The main downsides are the overhead of the deduplication map and
the inability to priority-boost fetches.
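A single-threaded toy analogue of the idea (hypothetical names; the real change deduplicates concurrent async fetches keyed by ObjectId): a second request for a key that has already been resolved reuses the stored result instead of redoing the fetch and the LocalStore write.

```rust
use std::collections::HashMap;

// Toy model: count how many times the "backing store" is actually hit.
// With deduplication, repeated gets for the same key pay the cost once.
struct DedupStore {
    cache: HashMap<String, String>,
    fetches: u32, // number of real backing-store fetches performed
}

impl DedupStore {
    fn new() -> Self {
        DedupStore { cache: HashMap::new(), fetches: 0 }
    }

    fn get(&mut self, key: &str) -> String {
        if let Some(v) = self.cache.get(key) {
            return v.clone(); // deduplicated: no second fetch, no second write
        }
        self.fetches += 1; // only the first request pays this cost
        let v = format!("blob:{}", key);
        self.cache.insert(key.to_string(), v.clone());
        v
    }
}
```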
Another benefit of this diff has to do with performance and memory. Let's
consider the case where `getSHA1` is called on 1M paths, all of which are under
a common directory P. The Tree for P is neither present in Mercurial's caches,
nor in the LocalStore. In that scenario, the `getSHA1` Thrift handler will
trigger the fetch of P, which will quickly complete and thus be present in
Mercurial's caches, but due to the use of `deferValue` in the
LocalStoreCachedBackingStore, the Tree for P will not be cached in the
LocalStore until the `getSHA1` Thrift call returns. At this point, all the
subsequent paths will hit the Mercurial cache, allocate a new Tree, and get
blocked in the deferValue mentioned above: the Tree for P may end up being
duplicated in memory many thousands of times, and depending on how big that
Tree is, EdenFS may need to allocate several GB of memory to hold them.
Switching the deferValue to use ImmediateFuture will be done in a subsequent
diff.
Reviewed By: jdelliot
Differential Revision: D48925204
fbshipit-source-id: ab2f6cce1e868b0d80b27328d99a62dae04ec3dd
Summary:
fix(build): correctly set target-specific variable values
Previously, the `HGNAME` and `OSS` variables were not being set correctly when
building with `make oss`. In my version of GNU Make on Ubuntu 22.04:
```
GNU Make 4.3
Built for x86_64-pc-linux-gnu
```
instead of the `OSS` variable being set to `true` and `HGNAME` being set to
`sl`, the `OSS` variable was being set to `true HGNAME=sl`. This was causing
the eden binary to be built with the default `HGNAME` value of `hg`.
Here is a minimal example of the issue:
```makefile
HG_NAME = hg
oss: OSS=true HG_NAME=sl
oss: local
local:
	@echo "OSS is $(OSS)"
	@echo "Building for $(HG_NAME)"
```
On my machine `make oss` prints
```
OSS is true HG_NAME=sl
Building for hg
```
However,
```makefile
HG_NAME = hg
oss: OSS=true
oss: HG_NAME=sl
oss: local
local:
	@echo "OSS is $(OSS)"
	@echo "Building for $(HG_NAME)"
```
prints
```
OSS is true
Building for sl
```
Pull Request resolved: https://github.com/facebook/sapling/pull/741
Test Plan:
1. On Ubuntu 22.04 and GNU Make 4.3, build sapling with `make oss`
2. You should see an error like
```bash
running build_mo
rm -f hg
cp build/scripts-3*/hg hg
cp: cannot stat 'build/scripts-3*/hg': No such file or directory
make: *** [Makefile:85: local] Error 1
```
3. apply this patch and repeat step 1
4. `sl` should build successfully
Reviewed By: quark-zju
Differential Revision: D49870741
Pulled By: zzl0
fbshipit-source-id: f31017fc647167a17871d46712f91355b4ce5a68
Summary: This diff allows the GC to log the total storage footprint for each run. This makes it easier to track how the storage footprint changes over time.
Reviewed By: YousefSalama
Differential Revision: D49864056
fbshipit-source-id: c5a9117f8883e0ac696d3bd9646dc3a0958799e2
Summary:
I noticed a lot of these logs when running gitexport for larger directories.
This can be printed `O(# of commits)` times, so I think it should probably be a `debug` instead of `info`.
Differential Revision: D49772392
fbshipit-source-id: e52c6e1be5fea0140e292656844a0e5b310c7291
Summary: This diff implements the necessary changes to ensure that when direct-`gitimport`ing a repo, the imported tags are recorded along with the Git object ID of the tag object. Note that this just applies to direct `gitimport`. Support for `remote-gitimport` will be added in a follow up diff.
Reviewed By: mzr
Differential Revision: D49805885
fbshipit-source-id: 18502476298acaf9f04ab76c9de9f395018b9247