Summary: Similarly to the changes made for `get`, the same can be applied to prefetch.
Reviewed By: DurhamG
Differential Revision: D22565609
fbshipit-source-id: 0fbc1a0086fa44593a6aaffb746ed36b3261040c
Summary:
When using LFS, it's possible that a pointer may be present in the local
LfsStore, but the blob would only be in the shared one. Such scenario can
happen after an upload, when the blob is moved to the shared store for
instance. In this case, during a `get` call, the local LFS store won't be able
to find the blob and thus would return Ok(None), the shared LFS store woud not
be able to find the pointer itself and would thus return Ok(None) too. If the
server is not aware of the file node itself, the `ContentStore::get` would also
return Ok(None), even though all the information is present locally.
The main reason why this is happening is due to the `get` call operating
primarily on file node based keys, and for content-based stores (like LFS),
this means that the translation layer needs to be present in the same store,
which in some case may not be the case. By allowing stores to return a
`StoreKey` when progress was made in finding the key we can effectively solve
the problem described above, the local store would translate the file node key
onto a content key, and the shared store would read the blob properly.
Reviewed By: DurhamG
Differential Revision: D22565607
fbshipit-source-id: 94dd74a462526778f7a7e232a97b21211f95239f
Summary: Add an optional `edenapi` argument to metadatastore that allows using EdenAPI in place of the SSH remote store.
Reviewed By: quark-zju
Differential Revision: D22492535
fbshipit-source-id: eba034c9ba86c79c9a9dee6bab3ff615d0575b6f
Summary: Reimplement `EdenApiHgIdRemoteStore` as `EdenApiRemoteStore<T>`, where `T` is a marker type indicating whether this store fetches files or trees. This allows working with the stores in a more strongly-typed way, and avoid having to check what kind of store this is at runtime when fetching data.
Reviewed By: quark-zju
Differential Revision: D22492160
fbshipit-source-id: e17556093fa9b81d2301f281da36d75a03e33c5e
Summary:
We're seeing cases were cloning can take 10's of GB of memory because
we pend all the history information in memory. Let's flush the history info
every 10 million adds to bound the memory usage.
10 million was chosen somewhat arbitrarily, but it results in pack files that
are 800MB, which corresponds roughly with 8GB of memory usage.
This requires updating repack to be aware that a single flush could produce
multiple packs. Note, since repack writes via this same path, it may result in
repack producing multiple pack files. In the degenerate case repack could
produce the same number (or more) of pack files than was inputted. If we set the
threshold high enough I think we'll be fine though. 800MB is probably
sufficient.
Reviewed By: xavierd
Differential Revision: D22438569
fbshipit-source-id: 425d5d3b7999b81e44d1dbe1f2a4ea453ab6ca4f
Summary: Per comments on D22429347, add a new `ExtractInnerRef` trait that is similar to `ExtractInner`, but returns a reference to the underlying value. A default implementation is provided for types whose inner value is `Clone + 'static`, so in practice most types will only need to implement `ExtractInnerRef`, whereas the callsite may choose whether it needs a reference or an owned value.
Reviewed By: quark-zju
Differential Revision: D22464158
fbshipit-source-id: 7b97329aedcddb0e51fd242b519e79eba2eed350
Summary: Add add a `edenapistore` class to that wraps a `EdenApiHgIdRemoteStore`. This class is purely used as a means to set up the stores from Python code, and is only used as a way to get an `Arc<EdenApiHgIdRemoteStore>` to the Rust content store. It has no functionality of its own.
Reviewed By: quark-zju
Differential Revision: D22449702
fbshipit-source-id: ad2094c79da523071b6ed8344c8dde706e448c95
Summary:
A common pattern in Mercurial's data storage layer Python bindings is to have a Python object that wraps a Rust object. These Python objects are often passed across the FFI boundary to Rust code, which then may need to access the underlying Rust value.
Previously, the objects that used this pattern did so in an ad-hoc manner, typically by providing an `into_inner` or `to_inner` inherent method. This diff introduces a new `ExtractInner` trait that standardizes this pattern into a single interface, which in turn allows this pattern to be used with generics.
Reviewed By: quark-zju
Differential Revision: D22429347
fbshipit-source-id: cab4c24b8b98c6ef8307f72a9b4726aabdc829cc
Summary:
We've seen a handful of users complaining about clone failing and not being
able to recover from it. From looking at the various reports and the
stacktraces, I believe this is caused by a flaky connection on the user end
that causes the Python code to retry the getpack calls. Before retrying, the
code will figure out what still needs fetching and this is done via the
getmissing API. When LFS pointers were fetched, the LFS blobs aren't yet
present on disk, and thus the underlying ContentStore::get_missing will a set
of keys that contain some StoreKey::Content keys. The code would previously
fail at this point, but since the key also contains the original key, we can
simply return this, the pointers might be refetched but these are fairly small.
Taking a step back from this bug, the issue really is that the retry logic is
done in code that cannot understand content-keys, and moving it to a part of
the code that understands this would also resolve the issue.
I went with the simple approach for now, but since other remote stores
(EdenAPI, the LFS one, etc) would also benefit from the retry logic, we may
want to move the logic into Rust and remove the getmissing API from the Python
exposed ContentStore.
Reviewed By: DurhamG
Differential Revision: D22425600
fbshipit-source-id: 69c2898cc302d2170cd0f206c89189c341db5278
Summary:
In the case where an expensive network operation is involved, holding the GIL
would mean that Mercurial cannot display progress bars, or simply cannot be
interrupted. This is a less than ideal user experience.
To fix this, let's release the GIL whenever we enter into the revisionstore
code.
Reviewed By: DurhamG
Differential Revision: D22140399
fbshipit-source-id: 131c8cf81e39128810e0f20d1922b5681a33d95a
Summary:
When we got rid of the delta logic, we also needed to get rid of some
unused functions.
Reviewed By: singhsrb
Differential Revision: D21725043
fbshipit-source-id: ac069e6b0468e2275f353a9970b8971b5a2cfa23
Summary:
Pass `configparser::config::ConfigSet` to `repack` in
`revisionstore/src/repack.rs` so that we can use various config values in `filter_incrementalpacks`.
* `repack.maxdatapacksize`, `repack.maxhistpacksize`
* The overall max pack size
* `repack.sizelimit`
* The size limit for any individual pack
* `repack.maxpacks`
* The maximum number of packs we want to have after repack (overrides sizelimit)
Reviewed By: xavierd
Differential Revision: D21484836
fbshipit-source-id: 0407d50dfd69f23694fb736e729819b7285f480f
Summary:
When Qing implemented all the get method, the translate_lfs_missing function
didn't exist, and I forgot to add them in the right places when landing the
diff that added it. Fix this.
Reviewed By: sfilipco
Differential Revision: D21418043
fbshipit-source-id: baf67b0fe60ed20aeb2c1acd50a209d04dc91c5e
Summary:
Remove HgIdDataStore::get_delta and all implementations. Remove HgIdDataStore::get_delta_chain from trait, remove all unnecessary implentations, remove all implementations from public Rust API. Leave Python API and introduce "delta-wrapping".
MutableDataPack::get_delta_chain must remain in some form, as it necessary to implement get using a sequence of Deltas. It has been moved to a private inherent impl.
DataPack::get_delta_chain must remain in some form for the same reasons, and in fact both implenetations can probably be merged, but it is also used in repack.rs for the free function repack_datapack. There are a few ways to address this without making DataPack::get_delta_chain part of the public API. I've currently chosen to make the method pub(crate), ie visible only within the revisionstore crate. Alternatively, we could move the repack_datapack function to a method on DataPack, or use a trait in a private module, or some other technique to restrict visibility to only where necessary.
UnionDataStore::get has been modified to call get on it's sub-stores and return the first which matches the given key.
MultiplexDeltaStore has been modified to implement get similarly to UnionDataStore.
Reviewed By: xavierd
Differential Revision: D21356420
fbshipit-source-id: d04e18a0781374a138395d1c21c3687897223d15
Summary:
While the change looks fairly mechanical and simple, the why is a bit tricky.
If we follow the calls of `ContentStore::get`, we can see that it first goes
through every on-disk stores, and then switches to the remote ones, thanks to
that, when we reach the remote stores there is no reason to believe that the
local store attached to them contains the data we're fetching. Thus the
code used to always prefetch the data, before reading from the store what was
just written.
While this is true for regular stores (packstore, indexedlog, etc), it starts
to break down for the LFS store. The reason being that the LFS store is
internally represented as 2 halves: a pointer store, and a blob store. It is
entirely possible that the LFS store contains a pointer, but not the actual
blob. In that case, the `get` executed on the LFS store will simply return
`Ok(None)` as the blob just isn't present, which will cause us to fallback to
the remote stores. Since we do have the pointer locally, we shouldn't try to
refetch it from the remote store, and thus why a `get_missing` needs to be run
before fetching from the remote store.
As I was writing this, I realized that all of this subtle behavior is basically
the same between all the stores, but unfortunately, doing a:
impl<T: RemoteDataStore + ?Sized> HgIdDataStore for T
Conflicts with the one for `Deref<Target=HgIdDataStore>`. Macros could be used
to avoid code duplication, but for now let's not stray into them.
Reviewed By: DurhamG
Differential Revision: D21132667
fbshipit-source-id: 67a2544c36c2979dbac70dac5c1d055845509746
Summary:
Ideally, either the ContentStore, or the upper layer should verify that we
haven't missed uploading a blob, which could lead to weird behavior down the
line. For now, all the stores will return the keys of the blobs that weren't
uploaded, which allows us to return these keys to Python.
Reviewed By: DurhamG
Differential Revision: D21103998
fbshipit-source-id: 5bab0bbec32244291c65a07aa2a13aec344e715e
Summary: This method will be used to upload local LFS blobs to the LFS server.
Reviewed By: DurhamG
Differential Revision: D20843137
fbshipit-source-id: 33a331c42687c47442189ee329da33cb5ce4d376
Summary:
All of the callers are already using an Arc, so instead of forcing the remote
store to be cloneable, and thus wrap an inner self with an Arc, let's just pass
self as an Arc.
Reviewed By: DurhamG
Differential Revision: D20715580
fbshipit-source-id: 1bef23ae7da7b314d99cb3436a94d04134f1c0e4
Summary:
When LFS will be enabled on fbsource, the enablement will rolled out server,
with the server serving pointers (or not). In the catastrophic scenario where
Mononoke has to be rolled out, the Mononoke LFS server will be unable to serve
blobs, but some clients may still have LFS pointers locally but not the
corresponding blob. For this, we need to be able to fallback to fetching the
blob via the getpackv2 protocol.
Reviewed By: DurhamG
Differential Revision: D20662667
fbshipit-source-id: 4ac45558f6d205cbd1db33c21c6fb137a81bdbd5
Summary:
One of the main drawback of the current version of repack is that it writes
back the data to a packfile, making it hard to change file format. Currently, 2
file format changes are ongoing: moving away from packfiles entirely, and
moving from having LFS pointers stored in the packfiles, to a separate storage.
While an ad-hoc solution could be designed for this purpose, repack can
fullfill this goal easily by simply writing to the ContentStore, the
configuration of the ContentStore will then decide where this data will
be written into.
The main drawback of this code is the unfortunate added duplication of code.
I'm sure there is a way to avoid it by having new traits, I decided against it
for now from a code readability point of view.
Reviewed By: DurhamG
Differential Revision: D20567118
fbshipit-source-id: d67282dae31db93739e50f8cc64f9ecce92d2d30
Summary: There is no need to pass the path twice, once is enough.
Reviewed By: DurhamG
Differential Revision: D20567117
fbshipit-source-id: 169a088aa7558b4c25f8fbdecfe59bdd0d7575ef
Summary:
This will enables the fast-path for comparing LFS blobs without reading the
entire blob.
Reviewed By: DurhamG
Differential Revision: D20504233
fbshipit-source-id: 446cec57fba77e02cc7070203bd759d341fc01ab
Summary:
This type can either be a Mercurial type key, or a content hash based key. Both
the prefetch and get_missing now can handle these properly. This is essential
for stores where data can either be fetched in both ways or when the data is
split in 2. For LFS for instance, it is possible to have the LFS pointer (via
getpackv2), but not the actual blob. In which case get_missing will simply
return the content hash version of the StoreKey, to signify what it actually
has missing.
Reviewed By: quark-zju
Differential Revision: D20445631
fbshipit-source-id: 06282f70214966cc96e805e9891f220b438c91a7
Summary:
Similarly to the DataStore trait, this makes it easier to understand that they
deal with a Mercurial type Key.
Reviewed By: quark-zju
Differential Revision: D20445621
fbshipit-source-id: a1143d5f5d6a2c8686d517a6ea3c25b07c0df072
Summary: This makes it clear that these traits are dealing with Mercurial Key.
Reviewed By: quark-zju
Differential Revision: D20445626
fbshipit-source-id: d5acbf442e9407b973e95e40af69b5a61bff0a4d
Summary:
The clients should use an Rc/Arc if they need the ability to clone it. This
makes it more obvious and reduces the number of pointer indirection.
Reviewed By: quark-zju
Differential Revision: D20376839
fbshipit-source-id: c56e7e8f89ab17727be621894c329e344a7f3adb
Summary:
During `hg update`, Mercurial forks multiple processes to write files on disk
concurrently, this is done as fetching blobs from the content store, and
writing them to disk is CPU bound. Usually, threads would be the preferred way
of speeding up such process, but unfortunately, Python has GIL that severely
limit the available concurrency. So, multiple processes were chosen.
Unfortunately, the multi-process solution also brings a lot of other issues,
more recently, we've had cases where the connections to the server and memcache
had to be dropped after the fork. In some other cases, this caused deadlocks.
And the solution is not effective on Windows.
Now that Mercurial is getting more and more Rust, we could instead go back to
the threads solution by using them in Rust, and have Python just push work to
them, this is exactly what this change does.
Things that are left to be done, but I wanted to get a diff out first:
- no file path audit
- no file backup
- no symlink creation
- probably other things I'm missing
Reviewed By: quark-zju
Differential Revision: D20102888
fbshipit-source-id: d47829fd7818b97710586b9851880f178048e27b
Summary: This makes it friendly to Python 2.
Reviewed By: sfilipco
Differential Revision: D20162233
fbshipit-source-id: 5beb7a0f52159afc454332ff6e37e13087177cc0
Summary:
With the Arc embedded into the store themselves, this forces a second
allocation in order to use them as trait objects. Since in most cases, we do
not want the stores themselves to be cloneable, we can move the Arc outside and
thus reduce the number of pointer indirection.
Reviewed By: DurhamG
Differential Revision: D19867568
fbshipit-source-id: 9cd126831fe2b9ee715472ac3299b7a09df95fce
Summary:
This allows the Python code to build a memcache client and build ContentStore
and MetadataStore with it.
Reviewed By: DurhamG
Differential Revision: D19518694
fbshipit-source-id: d932fd5223ccfdf37db69cbb54a11a6571312709
Summary: This was missed while converting pyrevisionstore to PyPathBuf.
Reviewed By: markbt
Differential Revision: D19672441
fbshipit-source-id: 466a66f2e00c7f73c11a8989c22508560f423e0e
Summary: `PyPath` is to `PyPathBuf` as `Path` is to `PathBuf` and `str` is to `String`.
Reviewed By: quark-zju
Differential Revision: D19647995
fbshipit-source-id: 841a5f6fea295bc72b00da028ae256ca38578504
Summary:
`PyPath` is the type that owns the data. Rename it to `PyPathBuf` for analogy
with `PathBuf` and `RepoPathBuf`, and to allow us to introduce a reference type
named `PyPath`.
Reviewed By: xavierd
Differential Revision: D19643797
fbshipit-source-id: 56d80fea5677f7223e967b0723039d1763b26f68
Summary:
The names are really RepoPath, and thus let's use PyPath as a way to pass these
in.
Reviewed By: quark-zju, sfilipco
Differential Revision: D19625372
fbshipit-source-id: 4802030b91e6a065d3cb1905c770cad8a86da510
Summary:
Set up the `cpython-ext` and `hgcommands` libraries so that they can compile
against py2 and py3 versions of rust-cpython. Make py2 the default so
that cargo test still works.
Reviewed By: singhsrb
Differential Revision: D19615656
fbshipit-source-id: 3403e7077deb3c0a9dfe0e3b7d4f4ad1da73bba3
Summary:
This makes `make hg3` work. It requires cleaning up the `build` directory when
switching between py2 and py3 build, which will be fixed later.
Reviewed By: DurhamG
Differential Revision: D19604824
fbshipit-source-id: 060ff313420126a5dba935c4451b45dc9af45f13
Summary: Update to the new version of rust-cpython. This supports `list.append`, so make use of it.
Reviewed By: xavierd
Differential Revision: D19590905
fbshipit-source-id: 03609d4f698ae8e4380e82b8144caaa205b4c2d4
Summary:
For Python3 compatibility, let's use PyPath, it hides the logic of encoding for
Python2
Reviewed By: DurhamG
Differential Revision: D19590024
fbshipit-source-id: 7bed134a500b266837f3cab9b10604e1f34cc4a0
Summary: This can prevent potential moves and clones on the caller of prefetch.
Reviewed By: quark-zju
Differential Revision: D19518697
fbshipit-source-id: 63839fc3f4bb9ca420e290eabaffb481a3584f7b
Summary:
There are few cases where we need to prefetch data that is already present in
the local store, but the server holds more up-to-date data. For instance, after
pushing a change to a pushrebase server, the linkrev for the just pushed commit
is no longer valid, and the history data is re-fetched from the server to fix
it.
For now, let's keep the fix for this contained in Python as we will move away
from linkrev altogether in the future, by then, this change will be unecessary
and can be reverted.
I'm overall not a big fan of the hacks needed here, suggestions are welcomed for
a cleaner way to achieve this :)
Reviewed By: DurhamG
Differential Revision: D19412620
fbshipit-source-id: 331f3e4dbb51bcd687149370a62c60c325534f60
Summary:
In order to avoid pathological linkrevfixup after pushing a change to a
pushrebase repo, we need to be able to fetch history data that is already
present locally from the server. Since the Rust stores will always check
whether the data is present locally before fetching it, we would not be
fetching anything, causing the pathological linkrevfixup to kick in.
By allowing the stores to be built without a local component, the prefetch
function will not find the local history, and will thus be able to fetch it
properly.
Reviewed By: DurhamG
Differential Revision: D19412619
fbshipit-source-id: 421c59c63634ead7f98e6ba89da0532067f7b412
Summary:
Python errors wrapped in a Rust Error would be incorrectly changed into a
RuntimeError instead of the original one. The map_pyerr method does the right
thing, and is also shorter, let's use it.
Reviewed By: quark-zju
Differential Revision: D19313883
fbshipit-source-id: 1ecb41a8eef9d41618905d6d00f199252d373f96
Summary:
We no longer use the `failure` crate. Rename the module.
`cpython-failure` is dropped since it was merged into `cpython-ext`.
Reviewed By: markbt
Differential Revision: D19186693
fbshipit-source-id: 410d1491bcadd8d3272c7e2df08ecbe66fc0fef2
Summary:
The `map_pyerr<PE>` API is kind of hard to use correctly since the
code that calls `map_pyerr` does not really know the best suitable error type.
The existing code also "encourage"s the use of stdlib error types like
RuntimeError, ValueError, etc since the code is shorter, which is not great.
Error types from `mercurial.error` are better choices.
With previous diffs in this stack, we now decide the Python error type based
on the Rust error type. It's no longer necessary to specify an error type
with `map_pyerr<PE>`. So let's just drop the type parameter.
A `RustError` was added for error types that `cpython-ext` does not know how
to convert. We should avoid leaking it to the Python land by just implementing
how to convert `anyhow::Error` to `PyErr` in `cpython-ext`.
Reviewed By: markbt
Differential Revision: D19169527
fbshipit-source-id: f4563c36174cd51201b526bbc92a3f1c8a3da864
Summary: This should allow us to show a spinning progress bar when repair() is working.
Reviewed By: xavierd
Differential Revision: D19056949
fbshipit-source-id: 8e058bad002bb2de29dbf8457bead0ebd1e99372
Summary: Expose the repair API to the Python world.
Reviewed By: xavierd
Differential Revision: D18737913
fbshipit-source-id: c31085727589b6938c2fafb28897925aea617bc4
Summary:
This simplifies the logic a bit by getting a free `repair` method from
indexedlog.
Reviewed By: xavierd
Differential Revision: D18737908
fbshipit-source-id: 4988c1a83b7709b751cd1899c5663acc0c42e313
Summary:
This diff replaces eden's dependencies on failure::Error with anyhow::Error.
Failure's error type requires all errors to have an implementation of failure's own failure::Fail trait in order for cause chains and backtraces to work. The necessary methods for this functionality have made their way into the standard library error trait, so modern error libraries build directly on std::error::Error rather than something like failure::Fail. Once we are no longer tied to failure 0.1's Fail trait, different parts of the codebase will be free to use any std::error::Error-based libraries they like while still working nicely together.
Reviewed By: xavierd
Differential Revision: D18576093
fbshipit-source-id: e2d862b659450f2969520d9b74877913fabb2e5d
Summary:
This diff is preparation for migrating off of failure::Fail / failure::Error for errors in favor of errors that implement std::error::Error. The Fallible terminology is unique to failure and in non-failure code we should be using Result<T>. To minimize the size of the eventual diff that removes failure, this codemod replaces all use of Fallible with Result by:
- In modules that do not use Result<T, E>, we import `failure::Fallible as Result`;
- In modules that use a mix of Result<T, E> and Fallible<T> (only 5) we define `type Result<T, E = failure::Error> = std::result::Result<T, E>` to allow both Result<T> and Result<T, E> to work simultaneously.
Reviewed By: Imxset21
Differential Revision: D18499758
fbshipit-source-id: 9f5a54c47f81fdeedbc6003cef42a1194eee55bf