34 Commits

Author SHA1 Message Date
Xavier Deguillard
34bce8690f revisionstore: silence compiler warning
Summary:
Don't restrict constructing a c_api datapack store to only Unix, we can
construct it on Windows too by assuming that their path will be valid UTF-8.

Reviewed By: quark-zju

Differential Revision: D20250718

fbshipit-source-id: 07234b6a71b50c803cfe3b962fa727f57037c919
2020-03-05 09:35:57 -08:00
Xavier Deguillard
6fac9ebad0 revisionstore: add a get_stripped method to ContentStore
Summary:
This new method returns the content of a blob without the copy-from metadata
header.
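
A minimal sketch of such stripping, assuming the usual Mercurial filelog layout in which the metadata header sits between two `\x01\n` markers at the start of the blob. `strip_metadata` is a hypothetical helper for illustration, not the actual `get_stripped` code.

```rust
/// Marker that opens and closes the metadata header of a file blob.
const META_MARKER: &[u8] = b"\x01\n";

/// Returns the blob content without its metadata header, if it has one.
/// Hypothetical helper; not the real `get_stripped` implementation.
pub fn strip_metadata(blob: &[u8]) -> &[u8] {
    // A blob that does not start with the marker carries no header.
    let rest = match blob.strip_prefix(META_MARKER) {
        Some(rest) => rest,
        None => return blob,
    };
    // The header ends at the next marker; the content follows it.
    match rest
        .windows(META_MARKER.len())
        .position(|window| window == META_MARKER)
    {
        Some(pos) => &rest[pos + META_MARKER.len()..],
        // No closing marker: return the blob untouched rather than guess.
        None => blob,
    }
}
```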

Reviewed By: DurhamG

Differential Revision: D20102889

fbshipit-source-id: e96f636b7d30460b59707a2cb700d667e616116a
2020-02-27 12:29:42 -08:00
Xavier Deguillard
934b64397b convert to bytes 0.5
Summary:
bytes 0.5 is a dependency of newer tokio; it is also newer, and thus better.
Staying on 0.4 means that copies between Bytes 0.4 and 0.5 need to be done;
this will be especially bad in the LFS code since 10+MB buffers will have to be
copied...

One main API change is for the configparser. The code used to take Into<Bytes>
for the keys, I switched it to AsRef<[u8]>.
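
As a rough illustration of that kind of signature, here is a sketch of a key lookup that accepts `AsRef<[u8]>`. The `Config` type and its methods are hypothetical and much simpler than the real configparser.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified config store; not the real configparser API.
#[derive(Default)]
pub struct Config {
    values: HashMap<Vec<u8>, Vec<u8>>,
}

impl Config {
    /// Stores a copy of the key and value bytes.
    pub fn set(&mut self, key: impl AsRef<[u8]>, value: impl AsRef<[u8]>) {
        self.values
            .insert(key.as_ref().to_vec(), value.as_ref().to_vec());
    }

    /// Lookups only borrow the key, so no buffer is built per call.
    pub fn get(&self, key: impl AsRef<[u8]>) -> Option<&[u8]> {
        self.values.get(key.as_ref()).map(|value| value.as_slice())
    }
}
```

With this bound, callers can pass `&str`, `String`, `Vec<u8>` or byte string literals without converting them first.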

For hg_memcache_client, an extra copy is performed to build a Delta. Since this
code uses an old tokio and is being replaced right now, the effort of
switching to a new tokio and new bytes was not deemed worth it; the copy will
do for now.

Reviewed By: dtolnay

Differential Revision: D20043137

fbshipit-source-id: 395bfc3749a3b1bdfea652262019ac6a086e61e0
2020-02-24 10:28:46 -08:00
Xavier Deguillard
44c4f2f5d9 revisionstore: add copyfrom information to the LFS pointer
Summary:
Mercurial filenode hash is computed by including the copy information in the
blob header. Before computing the blob content hash, or returning it to the
upper layers, we need to either strip or reconstruct this header appropriately.

Reviewed By: DurhamG

Differential Revision: D19975887

fbshipit-source-id: 7555e7219e50f4d18ec677fdecc216ee705d7af4
2020-02-20 14:28:52 -08:00
Xavier Deguillard
7fb75ce4f0 lfs: move contenthash computation to the enum impl
Summary: This will make it easier to support more hash schemes in the future.

Reviewed By: DurhamG

Differential Revision: D19975888

fbshipit-source-id: 8b8ce3b20d72199bac3cd20a48475b5ab56bfc52
2020-02-20 14:28:52 -08:00
Xavier Deguillard
cd56a8b39a revisionstore: move Arc outside of the stores
Summary:
With the Arc embedded into the stores themselves, this forces a second
allocation in order to use them as trait objects. Since in most cases, we do
not want the stores themselves to be cloneable, we can move the Arc outside and
thus reduce the number of pointer indirections.
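
A minimal sketch of the resulting shape, with a hypothetical `Store` trait and `MemStore` type standing in for the real stores: the store owns its state directly, and the `Arc` is added by whoever needs to share it.

```rust
use std::sync::Arc;

/// Hypothetical store trait, standing in for the real store traits.
pub trait Store {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// The store owns its state directly; no `Arc` is embedded in it.
pub struct MemStore {
    key: String,
    value: Vec<u8>,
}

impl MemStore {
    pub fn new(key: &str, value: &[u8]) -> Self {
        MemStore {
            key: key.to_string(),
            value: value.to_vec(),
        }
    }
}

impl Store for MemStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        if key == self.key {
            Some(self.value.clone())
        } else {
            None
        }
    }
}

/// One allocation holds the store; callers share it as a trait object.
pub fn shared(store: MemStore) -> Arc<dyn Store> {
    Arc::new(store)
}
```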

Reviewed By: DurhamG

Differential Revision: D19867568

fbshipit-source-id: 9cd126831fe2b9ee715472ac3299b7a09df95fce
2020-02-20 14:28:52 -08:00
Xavier Deguillard
7c1a623d8a revisionstore: add the LfsStore to the ContentStore
Summary:
The ContentStore can now read LFS blobs from both the shared cache and the
local store.

Reviewed By: DurhamG

Differential Revision: D19866249

fbshipit-source-id: a6fb3523495e9d3832613b56438f631cfa552b91
2020-02-20 14:28:51 -08:00
Xavier Deguillard
58d9d92e88 revisionstore: simplify ContentStore/MetadataStore initialization a bit
Summary:
With the LFS store being added, and the indexedlog being soon used for trees,
this simplification should help in formalizing the hierarchy of files/folders.

It will look like the following:
  <root dir>/lfs: for the lfs store
  <root dir>/indexedlog*: for the indexedlog
  <root dir>/foobar: for a hypothetical foobar store

For manifests, <root dir> will therefore be: <store dir>/manifests. The
unfortunate part is that the current tree data lives under
<store dir>/packs/manifests. As packfiles will be replaced, this small
discrepancy is acceptable.

Reviewed By: DurhamG

Differential Revision: D19866248

fbshipit-source-id: 7ef59ef7df19149b19a529b4f4a45a479cc9d23b
2020-02-20 14:28:51 -08:00
Xavier Deguillard
f512b5658d revisionstore: add an LfsStore
Summary:
This is the first step in having a stronger integration between LFS blobs and
the ContentStore abstraction. The 2 main differences between the Python based
LFS implementation and this one are:
 - pointers are not stored alongside plain data,
 - blobs are split between local and shared blobs

As of now, no reclamation is being performed for shared blobs, and blobs aren't
fetched or uploaded. This will come in future diffs.

Reviewed By: DurhamG

Differential Revision: D19859291

fbshipit-source-id: 45000fc574e6fbd6d3487f4966cad4f49dab731c
2020-02-20 14:28:51 -08:00
Xavier Deguillard
17cc9ab5ab revisionstore: add a wrapper around IndexedLog/RotateLog
Summary:
In some cases, higher level stores may want to store data in either a plain
IndexedLog, or in a RotateLog, for local and shared data. Due to slight
differences between the two, they can't easily be adapted into a common trait.

Instead let's just wrap both into an enum and implement the main functions that
the higher level stores need.

The first use of this will be the LfsStore, future use will include the
IndexedLogDataStore and the IndexedLogHistoryStores.
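
The enum approach can be sketched as follows. `PlainLog` and `RotatingLog` are hypothetical in-memory stand-ins (the rotating one simply drops its oldest entry once full); they are not the real IndexedLog and RotateLog types.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a plain log that keeps every entry.
#[derive(Default)]
pub struct PlainLog {
    entries: Vec<Vec<u8>>,
}

/// Hypothetical stand-in for a rotating log with a bounded size.
pub struct RotatingLog {
    entries: VecDeque<Vec<u8>>,
    capacity: usize,
}

impl RotatingLog {
    pub fn new(capacity: usize) -> Self {
        RotatingLog {
            entries: VecDeque::new(),
            // Keep room for at least one entry.
            capacity: capacity.max(1),
        }
    }
}

/// Wraps both kinds of log and exposes only what callers need.
pub enum Store {
    Local(PlainLog),
    Shared(RotatingLog),
}

impl Store {
    pub fn append(&mut self, data: &[u8]) {
        match self {
            Store::Local(log) => log.entries.push(data.to_vec()),
            Store::Shared(log) => {
                // Make room by dropping the oldest entry.
                if log.entries.len() == log.capacity {
                    log.entries.pop_front();
                }
                log.entries.push_back(data.to_vec());
            }
        }
    }

    pub fn count(&self) -> usize {
        match self {
            Store::Local(log) => log.entries.len(),
            Store::Shared(log) => log.entries.len(),
        }
    }
}
```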

Reviewed By: DurhamG

Differential Revision: D19859292

fbshipit-source-id: 920572e0cf5f69bda4901a727a6b0dc0f08fc8d0
2020-02-18 08:32:32 -08:00
Xavier Deguillard
7bb3e384d8 remotefilelog: append the repo name to memcache key
Summary:
Up to now, this has been done in chef, so repos that we do not list there
may share memcache keys, with potentially unintended consequences. Let's
always add the repo name to the key, so we can simplify the code in chef.

One small negative effect of this change is that while it is being rolled out,
the cache hit rate will be impacted. This should resolve itself quickly.
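
A sketch of what scoping a key by repository could look like. The `repo:path:node` layout and the `memcache_key` name are assumptions for illustration only; the actual key format is not described here.

```rust
/// Builds a cache key scoped to one repository.
/// The `repo:path:node` layout is an assumption, not the real format.
pub fn memcache_key(repo: &str, path: &str, hex_node: &str) -> String {
    format!("{}:{}:{}", repo, path, hex_node)
}
```

Two repos holding the same path and node then map to different keys, which is also why existing cache entries stop matching while the change rolls out.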

Reviewed By: DurhamG

Differential Revision: D19885775

fbshipit-source-id: 0b59ce9e378b0ab70f696a39d19d27cd89921098
2020-02-14 14:10:48 -08:00
Jun Wu
03baa31789 indexedlog: switch from bytes to minibytes
Summary:
This makes it possible to use `Bytes` for mmap buffers.

The changes are because `minibytes::Bytes` does not implement `From<&[u8]>`
with the intention of making slice copies explicit.

Reviewed By: xavierd

Differential Revision: D19818719

fbshipit-source-id: c34ee451bfd2dc7bcbbcebd52a76444b6c236849
2020-02-12 13:57:37 -08:00
Xavier Deguillard
a4b83e384a revisionstore: add tracing point for memcache
Summary:
These tracing points will help us understand the memcache hit rate as well as
the fetching speed.

Reviewed By: quark-zju

Differential Revision: D19836499

fbshipit-source-id: 1936c44efc3e7715069e6a959f5331139d591d5c
2020-02-12 10:38:59 -08:00
Xavier Deguillard
2c4a10bf4b revisionstore: move memcache set to a background thread
Summary:
Every time a cache miss is seen, the data fetched from the server will be sent
directly to memcache for future use. Unfortunately, doing so in a blocking
manner severely impacts the overall fetching speed from the server. Since
memcache is purely an optimization, we can afford to send data to it
asynchronously.

Let's move as much as possible of the code to a background thread to reduce the
overhead of memcache.
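
The general pattern can be sketched with a channel and a worker thread from the standard library. `BackgroundWriter` and the `write` callback are hypothetical; the callback stands in for the blocking memcache call, and this is not the code from this diff.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical helper that performs cache writes off the calling thread.
pub struct BackgroundWriter {
    sender: Sender<(String, Vec<u8>)>,
    worker: JoinHandle<usize>,
}

impl BackgroundWriter {
    /// `write` stands in for the blocking cache call.
    pub fn new<F>(mut write: F) -> Self
    where
        F: FnMut(String, Vec<u8>) + Send + 'static,
    {
        let (sender, receiver) = channel();
        let worker = thread::spawn(move || {
            let mut written = 0;
            // The loop ends once every sender has been dropped.
            for (key, value) in receiver {
                write(key, value);
                written += 1;
            }
            written
        });
        BackgroundWriter { sender, worker }
    }

    /// Queues an entry and returns immediately.
    pub fn set(&self, key: String, value: Vec<u8>) {
        // The cache is only an optimization, so a failed send is ignored.
        let _ = self.sender.send((key, value));
    }

    /// Closes the queue, waits for pending writes, returns how many ran.
    pub fn finish(self) -> usize {
        drop(self.sender);
        self.worker.join().unwrap_or(0)
    }
}
```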

Reviewed By: DurhamG

Differential Revision: D19836011

fbshipit-source-id: 68e506ef7464d6e99d98457d0d37178f514be1a9
2020-02-12 10:38:59 -08:00
Xavier Deguillard
dc7f7908ef revisionstore: prefetch data with get_iter
Summary:
Instead of fetching data one-by-one, let's prefetch data concurrently by using
the new get_iter function.

Reviewed By: DurhamG

Differential Revision: D19836009

fbshipit-source-id: 4a50328c0cbbba677c2de3777ebe4c34cb10c1e2
2020-02-12 10:38:58 -08:00
Xavier Deguillard
8b082a18f7 revisionstore: don't prefetch with an empty key set
Summary:
Even when memcache would be able to prefetch everything, this would always call
into the underlying remote store with an empty key set. For things like `hg
prefetch` and a large number of keys, the effect of doing that is minimal, but
for EdenFS or `hg log -p`, the roundtrip to the server for every file/revision
would add a significant amount of overhead. Let's simply stop iterating when we
no longer need to fetch anything.
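
A simplified sketch of that early exit, using a hypothetical `Store` trait whose `prefetch` returns the keys it could not find. The real traits and key types differ.

```rust
use std::collections::HashSet;

/// Hypothetical store trait for illustration.
pub trait Store {
    /// Fetches what it can and returns the keys it could not find.
    fn prefetch(&self, keys: &[String]) -> Vec<String>;
}

/// Hypothetical in-memory store that knows a fixed set of keys.
pub struct MemStore {
    known: HashSet<String>,
}

impl MemStore {
    pub fn new(keys: &[&str]) -> Self {
        MemStore {
            known: keys.iter().map(|key| key.to_string()).collect(),
        }
    }
}

impl Store for MemStore {
    fn prefetch(&self, keys: &[String]) -> Vec<String> {
        keys.iter()
            .filter(|key| !self.known.contains(*key))
            .cloned()
            .collect()
    }
}

/// Walks the stores in order and returns how many were actually queried.
pub fn prefetch_all(stores: &[&dyn Store], keys: &[String]) -> usize {
    let mut missing = keys.to_vec();
    let mut queried = 0;
    for store in stores {
        // Nothing left to fetch: skip the remaining (slower) stores.
        if missing.is_empty() {
            break;
        }
        missing = store.prefetch(&missing);
        queried += 1;
    }
    queried
}
```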

Reviewed By: DurhamG

Differential Revision: D19835797

fbshipit-source-id: 54ad704428c3b20d973cfa87f7171899ec44b3f9
2020-02-11 18:05:16 -08:00
Xavier Deguillard
6ea4bb998e revisionstore: move memcache initialization to a background thread
Summary:
As initializing the memcache client takes ~0.7s, let's move it to a background
thread so as not to impact Mercurial startup time. This diff uses ArcSwap in
order to reduce the overhead of the very common read paths as much as possible.
Using Mutex or RwLock instead would have caused unnecessary contention.

Reviewed By: DurhamG

Differential Revision: D19518693

fbshipit-source-id: 886e9b86813fda6ff005ccce99659890026f643a
2020-02-05 14:01:54 -08:00
Xavier Deguillard
b8947748b5 pyrevisionstore: expose the memcache client to python
Summary:
This allows the Python code to build a memcache client and build ContentStore
and MetadataStore with it.

Reviewed By: DurhamG

Differential Revision: D19518694

fbshipit-source-id: d932fd5223ccfdf37db69cbb54a11a6571312709
2020-02-05 14:01:54 -08:00
Xavier Deguillard
920ea27a17 revisionstore: add memcache client
Summary:
This enables an in-process memcache client for the Rust
ContentStore/MetadataStore. For now, this implementation is lacking several
necessary optimizations:
 - Start-up time is always slowed down by ~0.7s, the initialization will be
   moved to a background thread
 - Writing data to memcache is blocking and will be moved to a background
   thread too.
 - Prefetching data does a roundtrip to memcache for every key, batching
   memcache APIs will be added.

Compared to the existing hg_memcache_client, this implementation is both
significantly shorter and does not exhibit some of the pathological behavior of
having to flush the indexedlog for every fetched blob when used in Eden.

Reviewed By: DurhamG

Differential Revision: D19518696

fbshipit-source-id: 4725447d13e7eddd9586135c2511e13ddb921771
2020-02-05 14:01:53 -08:00
Xavier Deguillard
61aaf894c3 pyrevisionstore: use PyPath instead of PyBytes
Summary:
For Python3 compatibility, let's use PyPath; it hides the encoding logic for
Python2.

Reviewed By: DurhamG

Differential Revision: D19590024

fbshipit-source-id: 7bed134a500b266837f3cab9b10604e1f34cc4a0
2020-01-28 10:01:50 -08:00
Xavier Deguillard
b6589bde84 revisionstore: prefetch takes &[Key] instead of Vec<Key>
Summary: This can prevent potential moves and clones in the caller of prefetch.
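
A small sketch of the difference from the caller's side. `Key` is a simplified hypothetical type and `prefetch` only counts its input here.

```rust
/// Simplified, hypothetical key type.
#[derive(Clone, Debug, PartialEq)]
pub struct Key {
    pub path: String,
    pub node: u64,
}

/// Borrowing a slice lets the caller keep ownership of its keys.
/// This stand-in just reports how many keys it was asked for.
pub fn prefetch(keys: &[Key]) -> usize {
    keys.len()
}
```

A caller holding a `Vec<Key>` can pass `&keys` or a sub-slice such as `&keys[..1]` and still use the vector afterwards, where a `Vec<Key>` parameter would require giving it up or cloning it.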

Reviewed By: quark-zju

Differential Revision: D19518697

fbshipit-source-id: 63839fc3f4bb9ca420e290eabaffb481a3584f7b
2020-01-23 08:57:22 -08:00
Xavier Deguillard
524c85d711 revisionstore: limit delta chain to 1000 entries
Summary:
We've seen a case where a datapack contains a circular delta chain, causing
Mercurial to fall into an infinite loop when trying to read it. Let's fail when
the chain is over 1000 entries.
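
A minimal sketch of such a bound, assuming a simplified model where each entry maps to an optional delta base. The `Bases` map and `chain_len` function are hypothetical and unrelated to the real datapack format.

```rust
use std::collections::HashMap;

/// Longest delta chain accepted before giving up.
pub const MAX_CHAIN_LEN: usize = 1000;

/// Maps each entry to its delta base; `None` marks a full text.
pub type Bases = HashMap<u32, Option<u32>>;

/// Walks the chain from `start` and returns its length, or an error if
/// an entry is missing or the chain exceeds `MAX_CHAIN_LEN` entries.
pub fn chain_len(bases: &Bases, start: u32) -> Result<usize, String> {
    let mut current = start;
    let mut len = 1;
    loop {
        match bases.get(&current) {
            None => return Err(format!("missing entry {}", current)),
            // A full text terminates the chain.
            Some(None) => return Ok(len),
            Some(Some(base)) => {
                // A circular chain never ends, so it always trips this check.
                if len >= MAX_CHAIN_LEN {
                    return Err(format!(
                        "delta chain longer than {} entries",
                        MAX_CHAIN_LEN
                    ));
                }
                current = *base;
                len += 1;
            }
        }
    }
}
```

A fixed bound does not identify where the cycle is, but it turns an endless loop into an error without tracking visited entries.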

Reviewed By: quark-zju

Differential Revision: D19458453

fbshipit-source-id: bfa503f7807122eca72cf94418abda161dafa41c
2020-01-21 08:50:59 -08:00
Xavier Deguillard
1e809fc681 revisionstore: allow building a {Content,Metadata}Store without local stores
Summary:
In order to avoid pathological linkrevfixup after pushing a change to a
pushrebase repo, we need to be able to fetch history data that is already
present locally from the server. Since the Rust stores will always check
whether the data is present locally before fetching it, we would not be
fetching anything, causing the pathological linkrevfixup to kick in.

By allowing the stores to be built without a local component, the prefetch
function will not find the local history, and will thus be able to fetch it
properly.

Reviewed By: DurhamG

Differential Revision: D19412619

fbshipit-source-id: 421c59c63634ead7f98e6ba89da0532067f7b412
2020-01-16 09:41:40 -08:00
Xavier Deguillard
d89eab8078 pyconfigparser: use String as arguments instead of PyBytes
Summary:
In the cpython bindings, the Rust String can take both PyBytes and PyUnicode
strings, which is perfect for Python3 compatibility as string literals are
PyBytes in Python2 and PyUnicode in Python3. The return values are kept as
Bytes for now as changing this is a much larger change in itself.

Other approaches tried:
 - Using PyUnicode as input/output: an extremely large codemod had to be done,
   with very little benefit
 - Using String as output: since we do have some configs that are unicode, on
   the Python side, the output might either be bytestrings or unicode, leading
   to weird bugs.

Reviewed By: DurhamG

Differential Revision: D18650466

fbshipit-source-id: aebdf30590dcae40b7df2787e5ece88e2ec9395c
2020-01-16 09:31:45 -08:00
Xavier Deguillard
c92c9d5a03 revisionstore: add some comment explaining the order of local/shared store
Summary:
Mercurial assumes some specific ordering, let's comment about this to make sure
we don't re-order and introduce subtle bugs.

Reviewed By: quark-zju

Differential Revision: D19394352

fbshipit-source-id: 0f9e02d2c6addf040311a54b8161b06bbeaa6be9
2020-01-15 15:55:11 -08:00
Xavier Deguillard
3030ad82b0 remotefilelog: fix remotefilelog.cachepath difference for Rust ContentStore
Summary:
The error message and the exception type are slightly different between the
Rust and Python ContentStore. For now, let's just fix this up manually.

Reviewed By: quark-zju

Differential Revision: D19394350

fbshipit-source-id: e432094a9dfcf605568a1890c0303b733e98d203
2020-01-15 15:55:10 -08:00
Zeyi (Rice) Fan
2f15f80957 revisionstore: make EdenApiRemoteStore to be able to fetch trees from remote
Summary: This will allow us to use EdenApi in EdenFS to fetch trees and blobs.

Reviewed By: xavierd

Differential Revision: D18622844

fbshipit-source-id: 59a9091e9f2fdbcae078da2fb24ee9c0dd18505b
2019-12-10 13:40:55 -08:00
Jun Wu
7a16996ce0 revisionstore: implement indexedlog::DefaultOpenOptions
Summary:
This simplifies the logic a bit by getting a free `repair` method from
indexedlog.

Reviewed By: xavierd

Differential Revision: D18737908

fbshipit-source-id: 4988c1a83b7709b751cd1899c5663acc0c42e313
2019-12-06 19:35:04 -08:00
David Tolnay
d1d8fb939a Switch from failure::Fail trait to std::error::Error for errors
Summary:
This diff replaces eden's dependencies on failure::Error with anyhow::Error.

Failure's error type requires all errors to have an implementation of failure's own failure::Fail trait in order for cause chains and backtraces to work. The necessary methods for this functionality have made their way into the standard library error trait, so modern error libraries build directly on std::error::Error rather than something like failure::Fail. Once we are no longer tied to failure 0.1's Fail trait, different parts of the codebase will be free to use any std::error::Error-based libraries they like while still working nicely together.

Reviewed By: xavierd

Differential Revision: D18576093

fbshipit-source-id: e2d862b659450f2969520d9b74877913fabb2e5d
2019-11-22 08:53:31 -08:00
Stanislau Hlebik
bcfbc5c19e fix compilation warning
Summary: I got an unused warning every time I built Mononoke.

Reviewed By: singhsrb

Differential Revision: D18573764

fbshipit-source-id: 23921f581bdc74655041a2413f80cb159b4ba010
2019-11-19 00:43:36 -08:00
David Tolnay
9c6f253858 rust: Replace derive(Fail) with derive(Error)
Summary:
This diff replaces code of the form:

```
use failure::Fail;

#[derive(Fail, Debug)]
pub enum ErrorKind {
    #[fail(display = "something failed {} times", _0)]
    Failed(usize),
}
```

with:

```
use thiserror::Error;

#[derive(Error, Debug)]
pub enum ErrorKind {
    #[error("something failed {0} times")]
    Failed(usize),
}
```

The former emits an implementation of failure 0.1's `Fail` trait while the latter emits an impl of `std::error::Error`. Failure provides a blanket impl of `Fail` for any type that implements `Error`, so these `Error` impls are strictly more general. Each of these error types will continue to have exactly the same `Fail` impl that it did before this change, but now also has the appropriate `std::error::Error` impl which sets us up for dropping our various dependencies on `Fail` throughout the codebase.
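
For readers unfamiliar with the derive, the following hand-written code approximates what the `thiserror` version above provides, using only the standard library. It is a sketch of the equivalent impls, not the macro's literal expansion, and `as_std_error` is a hypothetical helper added for illustration.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
pub enum ErrorKind {
    Failed(usize),
}

// The `#[error("...")]` attribute corresponds to a `Display` impl.
impl fmt::Display for ErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            ErrorKind::Failed(count) => {
                write!(f, "something failed {} times", count)
            }
        }
    }
}

// `derive(Error)` corresponds to an impl of the standard error trait.
impl Error for ErrorKind {}

/// Hypothetical helper: the type can be used as a boxed std error.
pub fn as_std_error(count: usize) -> Box<dyn Error> {
    Box::new(ErrorKind::Failed(count))
}
```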

Reviewed By: Imxset21

Differential Revision: D18523700

fbshipit-source-id: 0e43b10d5dfa79820663212391ecbf4aeaac2d41
2019-11-14 22:04:38 -08:00
David Tolnay
b1793a4416 rust: Rename Fallible<T> to Result<T>
Summary:
This diff is preparation for migrating off of failure::Fail / failure::Error for errors in favor of errors that implement std::error::Error. The Fallible terminology is unique to failure and in non-failure code we should be using Result<T>. To minimize the size of the eventual diff that removes failure, this codemod replaces all uses of Fallible with Result by:

- In modules that do not use Result<T, E>, we import `failure::Fallible as Result`;
- In modules that use a mix of Result<T, E> and Fallible<T> (only 5) we define `type Result<T, E = failure::Error> = std::result::Result<T, E>` to allow both Result<T> and Result<T, E> to work simultaneously.
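
The second form relies on a default type parameter in the alias. A sketch of how it behaves, with `Box<dyn Error>` standing in for `failure::Error` so the example needs only the standard library; `parse_count` and `halve` are hypothetical functions.

```rust
use std::error::Error;

/// `Result<T>` uses the default error; `Result<T, E>` overrides it.
/// `Box<dyn Error>` stands in for `failure::Error` in this sketch.
pub type Result<T, E = Box<dyn Error>> = std::result::Result<T, E>;

/// Uses the one-parameter form with the default error type.
pub fn parse_count(text: &str) -> Result<usize> {
    Ok(text.trim().parse::<usize>()?)
}

/// Uses the two-parameter form with an explicit error type.
pub fn halve(value: usize) -> Result<usize, String> {
    if value % 2 == 0 {
        Ok(value / 2)
    } else {
        Err(format!("{} is odd", value))
    }
}
```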

Reviewed By: Imxset21

Differential Revision: D18499758

fbshipit-source-id: 9f5a54c47f81fdeedbc6003cef42a1194eee55bf
2019-11-14 14:11:01 -08:00
Adam Simpkins
46890ae1ec Merge fb-mercurial sources into the eden repository
Summary:
Merge the fb-mercurial code into the Eden repository, under the
`eden/scm` subdirectory.

Reviewed By: quark-zju

Differential Revision: D18445774

fbshipit-source-id: fc3307f9937e0c7e1c8f7d03c5102c4fe5dedb10
2019-11-13 20:20:32 -08:00
Adam Simpkins
ab3a7cb21f Move fb-mercurial sources into an eden/scm subdirectory.
Summary:
In preparation for merging fb-mercurial sources to the Eden repository,
move everything from the top-level directory into an `eden/scm`
subdirectory.
2019-11-13 16:04:48 -08:00