25 Commits

Author SHA1 Message Date
Xavier Deguillard
957cb59d60 pyrevisionstore: remove pyerr_to_error
Summary:
The function is now just wrapping a PyErr into a PythonError; let's inline this
bit into the code.

Reviewed By: quark-zju

Differential Revision: D17729379

fbshipit-source-id: ad569cee03497fab710f760d0fa09e0ea61fe208
2019-10-03 14:38:22 -07:00
Xavier Deguillard
d0b81200dd revisionstore: remove KeyError
Summary:
Instead of using KeyError to indicate that data isn't found, let's use an
Option. The Option type better encodes that data is missing, without having to do
a potentially error-prone downcast. This may also enable us to set
RUST_BACKTRACE=1 everywhere, as we won't expect errors to happen often anymore;
previously, Mercurial would slow to a crawl due to the many KeyErrors being
thrown around.

I initially wanted to keep the change small to ease review, but that didn't
really work out, as the dependencies on the `DataStore`/`HistoryStore` traits
are all over the place...
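
A minimal sketch of the Option-based lookup follows. It assumes a simplified string `Key` and a single `get` method. `MemoryStore`, `UnionStore`, and `StoreError` are hypothetical names used for illustration; they are not the revisionstore API. The sketch shows a miss becoming `Ok(None)` and falling through a chain of stores with no error downcast, while genuine failures still propagate via `?`.

```rust
use std::collections::HashMap;

/// Hypothetical key type; real stores key on more than a plain string.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Key(pub String);

/// Hypothetical error type, reserved for genuine failures (I/O, corruption).
#[derive(Debug)]
pub struct StoreError(pub String);

/// Sketch of a lookup interface where "not found" is `Ok(None)`, not an error.
pub trait DataStore {
    fn get(&self, key: &Key) -> Result<Option<Vec<u8>>, StoreError>;
}

/// Hypothetical in-memory store used for illustration.
pub struct MemoryStore {
    data: HashMap<Key, Vec<u8>>,
}

impl MemoryStore {
    pub fn new() -> Self {
        MemoryStore { data: HashMap::new() }
    }

    pub fn insert(&mut self, key: Key, value: Vec<u8>) {
        self.data.insert(key, value);
    }
}

impl DataStore for MemoryStore {
    fn get(&self, key: &Key) -> Result<Option<Vec<u8>>, StoreError> {
        // A missing key is an expected outcome, so it is `None`, not `Err`.
        Ok(self.data.get(key).cloned())
    }
}

/// Hypothetical union store: tries each store in order. A miss falls through
/// to the next store; only real errors short-circuit via `?`.
pub struct UnionStore<'a> {
    stores: Vec<&'a dyn DataStore>,
}

impl<'a> DataStore for UnionStore<'a> {
    fn get(&self, key: &Key) -> Result<Option<Vec<u8>>, StoreError> {
        for store in &self.stores {
            if let Some(data) = store.get(key)? {
                return Ok(Some(data));
            }
        }
        Ok(None)
    }
}
```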

Reviewed By: quark-zju

Differential Revision: D17728486

fbshipit-source-id: de89c4fc441fd12ff37cc248e2230e4a1403ce44
2019-10-03 14:38:22 -07:00
Arun Kulshreshtha
78fe67ae86 manifest: add BFS prefetch function
Summary:
Add a client-driven tree prefetching implementation to the Rust manifest code. Unlike the existing prefetch implementation in Python, this one does all computation of which nodes to fetch on the client side using the BFS logic from BfsDiff. The trees are then bulk fetched layer-by-layer using EdenAPI.

This initial version is fairly naive, and omits some obvious optimizations (such as performing fetches of multiple trees concurrently), but is sufficient to demonstrate HTTP tree prefetching in action.
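
A minimal sketch of layer-by-layer prefetching follows. It assumes trees are identified by plain integers. A hypothetical `FakeServer::fetch_batch` stands in for one bulk EdenAPI request. Each loop iteration issues exactly one request for the whole current layer and builds the next layer from the returned children. As a result, the number of round trips equals the tree depth rather than the number of trees.

```rust
use std::collections::HashMap;

/// Hypothetical tree identifier.
pub type TreeId = u32;

/// Hypothetical stand-in for the server: maps a tree to its child trees.
pub struct FakeServer {
    pub trees: HashMap<TreeId, Vec<TreeId>>,
    /// Records each batch request so the number of round trips is visible.
    pub requests: Vec<Vec<TreeId>>,
}

impl FakeServer {
    /// One bulk request for a whole layer; returns each tree with its children.
    pub fn fetch_batch(&mut self, ids: &[TreeId]) -> Vec<(TreeId, Vec<TreeId>)> {
        self.requests.push(ids.to_vec());
        ids.iter()
            .map(|id| (*id, self.trees.get(id).cloned().unwrap_or_default()))
            .collect()
    }
}

/// Breadth-first prefetch: fetch every tree at depth d in one request, then
/// use the results to compute the trees at depth d + 1.
pub fn bfs_prefetch(server: &mut FakeServer, root: TreeId) -> Vec<TreeId> {
    let mut fetched = Vec::new();
    let mut layer = vec![root];
    while !layer.is_empty() {
        let mut next = Vec::new();
        for (id, children) in server.fetch_batch(&layer) {
            fetched.push(id);
            next.extend(children);
        }
        layer = next;
    }
    fetched
}
```

A real implementation would also skip trees already present locally. It could also issue the per-layer fetches concurrently, which is one of the optimizations the summary mentions as omitted.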

Reviewed By: xavierd

Differential Revision: D17379178

fbshipit-source-id: f17fe99834ad4fec07b4a4ab196928cc4fe91142
2019-09-30 20:26:36 -07:00
Arun Kulshreshtha
c48652ad4b manifest: walk manifest in breadth-first order
Summary:
Change the `Files` iterator in the Rust manifest code to traverse the tree in BFS order, allowing for layer-by-layer prefetching similar to `Diff`. This can substantially speed up walks over the tree when the cache is cold.

As a side-effect, this changes the order in which paths are reported during a manifest walk. (In particular, they are now reported in breadth-first order rather than depth-first order.) This may break things that rely on the existing ordering; as such, we may need to add a sort somewhere if this turns out to be a problem.
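
The ordering change shows up on a hypothetical tree containing `a/x` and `b`. This sketch uses a simplified `Node` type rather than the real manifest types and walks the same tree depth-first and breadth-first. DFS reports `a/x` before `b`. BFS reports `b` first, because it sits one level higher. Sorting the BFS output gives a deterministic order, which for this tree matches the DFS order.

```rust
use std::collections::VecDeque;

/// Hypothetical manifest node: a file, or a directory of named children.
pub enum Node {
    File,
    Dir(Vec<(String, Node)>),
}

fn join(prefix: &str, name: &str) -> String {
    if prefix.is_empty() {
        name.to_string()
    } else {
        format!("{}/{}", prefix, name)
    }
}

/// Depth-first: a directory's contents are reported before its later siblings.
pub fn files_dfs(prefix: &str, node: &Node, out: &mut Vec<String>) {
    if let Node::Dir(children) = node {
        for (name, child) in children {
            let path = join(prefix, name);
            match child {
                Node::File => out.push(path),
                Node::Dir(_) => files_dfs(&path, child, out),
            }
        }
    }
}

/// Breadth-first: every file at depth d is reported before any at depth d + 1.
pub fn files_bfs(root: &Node) -> Vec<String> {
    let mut out = Vec::new();
    let mut queue: VecDeque<(String, &Node)> = VecDeque::new();
    queue.push_back((String::new(), root));
    while let Some((prefix, node)) = queue.pop_front() {
        if let Node::Dir(children) = node {
            for (name, child) in children {
                let path = join(&prefix, name);
                match child {
                    Node::File => out.push(path),
                    Node::Dir(_) => queue.push_back((path, child)),
                }
            }
        }
    }
    out
}
```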

Reviewed By: xavierd

Differential Revision: D17645389

fbshipit-source-id: 624e426094a93e206bde4523ea8bd034fe5aeb90
2019-09-30 17:04:43 -07:00
Arun Kulshreshtha
55d07eeb94 manifest: remove dfs diff
Summary: Remove the DFS diff implementation and replace it with the BFS implementation.

Reviewed By: xavierd

Differential Revision: D17618818

fbshipit-source-id: d486642caae924f866a200d3c82fa5a4cb7d5286
2019-09-27 11:07:23 -07:00
Carolyn Busch
6cb28d027c Workingcopy python binding
Summary: Python binding for the working copy Rust library. The Walker is initialized with the root of the repo and the Python matcher, and iteratively returns the matching files in the working copy. Walker and PythonMatcher were modified so that Walker implements the Send trait.

Reviewed By: xavierd

Differential Revision: D17403235

fbshipit-source-id: b8b84928aac7c79c4388a8ba8aa5475aac0c5219
2019-09-27 09:17:10 -07:00
Carolyn Busch
d90d66fc6e pypathmatcher: Remove python member
Summary:
Remove the `py` member from the Python matcher; because of Rc, it prevented the
Python matcher from being Send. The Python matcher needs to be stored by the Python
walker, and pyclasses can only store data that is Send + 'static. Python pathmatcher
methods should only be called from Python methods.
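
The sketch below shows why removing the member matters. It assumes, as the summary's mention of RC suggests, that the removed `py` member was `!Send` because it transitively contained an `Rc`. `NonSendHandle`, `MatcherBefore`, and `MatcherAfter` are hypothetical stand-ins. A struct is `Send` only if all of its fields are. Dropping the non-`Send` field is therefore enough to make the matcher storable where `Send + 'static` is required.

```rust
use std::marker::PhantomData;
use std::rc::Rc;

/// Compiles only if `T: Send`; a cheap compile-time check.
pub fn assert_send<T: Send>() {}

/// Hypothetical stand-in for the removed member: anything containing an `Rc`
/// is `!Send`, and so is every struct that holds it.
pub struct NonSendHandle(PhantomData<Rc<()>>);

/// Before (hypothetical): the matcher carries the non-Send handle.
#[allow(dead_code)]
pub struct MatcherBefore {
    py: NonSendHandle,
    patterns: Vec<String>,
}

/// After (hypothetical): without the handle, the struct is `Send`.
pub struct MatcherAfter {
    pub patterns: Vec<String>,
}

// assert_send::<MatcherBefore>(); // would not compile: `Rc<()>` cannot be sent between threads safely
```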

Reviewed By: xavierd

Differential Revision: D17511705

fbshipit-source-id: 00a4938fb00c30244ae04cb38362e8875c72fa47
2019-09-27 08:15:54 -07:00
Jun Wu
10aa5196c2 revlogindex: do not allocate parentrevs
Summary:
Getting "parents" from revlogindex happens in a very hot loop, so avoiding
allocation matters. With this patch, it's more than 5x faster, and matches the
C code doing a whole changelog scan.

Before:

  In [3]: %time cl.index2.headsancestors([1,len(cl)-1])
  CPU times: user 330 ms, sys: 68 µs, total: 330 ms
  Wall time: 330 ms
  Out[3]: [5584666]

After:

  In [3]: %time cl.index2.headsancestors([1,len(cl)-1])
  CPU times: user 52.9 ms, sys: 0 ns, total: 52.9 ms
  Wall time: 53 ms
  Out[3]: [5584665]

C code doing whole changelog scan:

  In [5]: %time cl.index.clearcaches(); cl.index.headrevs(); 1
  CPU times: user 54.2 ms, sys: 187 µs, total: 54.4 ms
  Wall time: 54.4 ms

`smallvec` was not used, as it has extra overhead tracking whether it's
stack or heap allocated, which makes it 2x slower than the approach in this diff.
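
A minimal sketch of allocation-free parent lookup follows. It assumes parents are stored as a fixed `[i32; 2]` pair per revision, with `-1` as the null revision. `ParentRevs`, `Index`, and `parent_revs` are hypothetical names. Returning a `Copy` array by value avoids a heap allocation per call, which a `Vec<i32>` would need. Unlike `smallvec`, there is also no inline-versus-heap discriminant to check.

```rust
/// Revlog convention for "no parent".
pub const NULL_REV: i32 = -1;

/// Hypothetical fixed-size parent pair: lives on the stack, never allocates.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ParentRevs([i32; 2]);

impl ParentRevs {
    /// Iterate over the non-null parents only.
    pub fn iter(&self) -> impl Iterator<Item = i32> + '_ {
        self.0.iter().copied().filter(|&r| r != NULL_REV)
    }
}

/// Hypothetical index holding (p1, p2) for every revision.
pub struct Index {
    pub parents: Vec<[i32; 2]>,
}

impl Index {
    /// Returns the parents by value, so a hot loop calling this never allocates.
    pub fn parent_revs(&self, rev: usize) -> ParentRevs {
        ParentRevs(self.parents[rev])
    }
}
```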

Reviewed By: sfilipco

Differential Revision: D17581248

fbshipit-source-id: cf6e36e0000759f41410f1e3a1d252920711fb79
2019-09-25 18:44:42 -07:00
Jun Wu
aa8e29b5a7 nodemap: add nodeset structure
Summary: This will be used to store "invisible heads".

Reviewed By: sfilipco

Differential Revision: D17264837

fbshipit-source-id: a450b5c10cc961d43ec8eb852cb2fb22849a8c00
2019-09-23 17:11:23 -07:00
Jun Wu
20dfef4801 bindings: implement lazy iteration on spans
Summary:
The SpanSet can include a large number of revs. Iterating through it by putting
everything in a PyList is suboptimal, so add a dedicated native iterator
for it. This speeds up iteration greatly, as can be verified via debugshell:

Before:

  In [1]: s=m.smartset.spansset(b.dag.spans(xrange(5000000)))
  In [2]: %time s.slice(0,10)
  CPU times: user 135 ms, sys: 42.9 ms, total: 178 ms
  Wall time: 180 ms

After:

  In [1]: s=m.smartset.spansset(b.dag.spans(xrange(5000000)))
  In [2]: %time s.slice(0,10)
  CPU times: user 49 µs, sys: 6 µs, total: 55 µs
  Wall time: 58.2 µs
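
A minimal sketch of lazy iteration over spans follows. It assumes the span set is stored as descending, non-overlapping inclusive `(low, high)` ranges. `SpanSet` and `SpanIter` are hypothetical names, not the bindings' types. The iterator keeps only a cursor and the remaining spans, so taking the first 10 items costs the same whether the set holds 10 or 5,000,000 revisions. That behavior is consistent with the timings above.

```rust
/// Hypothetical span set: inclusive `(low, high)` ranges, sorted descending
/// and non-overlapping, e.g. `[(10, 12), (0, 1)]`.
pub struct SpanSet {
    spans: Vec<(u64, u64)>,
}

/// Lazy iterator yielding revisions from highest to lowest. It holds a slice
/// of the remaining spans and a cursor, never the expanded list.
pub struct SpanIter<'a> {
    spans: &'a [(u64, u64)],
    cursor: Option<u64>,
}

impl SpanSet {
    pub fn new(spans: Vec<(u64, u64)>) -> Self {
        SpanSet { spans }
    }

    pub fn iter(&self) -> SpanIter<'_> {
        SpanIter {
            spans: &self.spans,
            cursor: self.spans.first().map(|&(_, high)| high),
        }
    }
}

impl<'a> Iterator for SpanIter<'a> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        let current = self.cursor?;
        let (low, _) = self.spans[0];
        if current > low {
            // Still inside the current span.
            self.cursor = Some(current - 1);
        } else {
            // Current span exhausted; continue from the top of the next one.
            self.spans = &self.spans[1..];
            self.cursor = self.spans.first().map(|&(_, high)| high);
        }
        Some(current)
    }
}
```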

Reviewed By: sfilipco

Differential Revision: D17305350

fbshipit-source-id: 0db00aa57fb6bf2141ccea94b2536da78f103cef
2019-09-23 17:11:21 -07:00
Jun Wu
484939a75d hgmain: make bindings a builtin module
Summary:
Global state (for example, the global blackbox instance, and potentially some
logging / tracing libraries) is separate in the Rust and Python worlds.

That is because related code gets compiled separately:

  bindings.so (top-level)
   \_ blackbox

  hgmain (top-level)
   \_ blackbox (has a different global instance than the above blackbox)

To address it, make `bindings` a builtin module in `hgmain`.

The builtin module was renamed from `edenscmnative.bindings` to `bindings` so
it does not require importing anything else (for example, `edenscmnative`).

Unfortunately, this makes `hg` over 100 MB. Fortunately, it compresses well
(gzip: 31 MB).

Reviewed By: singhsrb

Differential Revision: D17429688

fbshipit-source-id: bf16910d7a260ca58db0d272fc95d8071d47bbc6
2019-09-20 18:32:36 -07:00
Xavier Deguillard
77f25d449d revisionstore: move DataStore::prefetch onto RemoteDataStore
Summary:
The RemoteDataStore trait is expected to be implemented by the higher-level types,
while we don't want low-level ones to pretend to have a prefetch method.
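
The sketch below illustrates the trait split. Its signatures are simplified assumptions, not the real revisionstore API. Only a type that can reach the network implements `RemoteDataStore`. A low-level store implements `DataStore` alone, so it has no `prefetch` to misuse. `LocalStore` and `ContentStore` are hypothetical, and the "remote" is just an in-memory map.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical key type for this sketch.
pub type Key = String;

/// Every store can answer lookups.
pub trait DataStore {
    fn get(&self, key: &Key) -> Option<Vec<u8>>;
}

/// Only stores that can reach the network expose `prefetch`.
pub trait RemoteDataStore: DataStore {
    fn prefetch(&self, keys: &[Key]);
}

/// Hypothetical low-level local store: implements `DataStore` only.
pub struct LocalStore {
    data: RefCell<HashMap<Key, Vec<u8>>>,
}

impl DataStore for LocalStore {
    fn get(&self, key: &Key) -> Option<Vec<u8>> {
        self.data.borrow().get(key).cloned()
    }
}

/// Hypothetical higher-level store: reads from `local` and can prefetch into
/// it from `remote` (an in-memory map standing in for the network).
pub struct ContentStore {
    local: LocalStore,
    remote: HashMap<Key, Vec<u8>>,
}

impl DataStore for ContentStore {
    fn get(&self, key: &Key) -> Option<Vec<u8>> {
        self.local.get(key)
    }
}

impl RemoteDataStore for ContentStore {
    fn prefetch(&self, keys: &[Key]) {
        let mut local = self.local.data.borrow_mut();
        for key in keys {
            if let Some(value) = self.remote.get(key) {
                local.insert(key.clone(), value.clone());
            }
        }
    }
}
```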

Reviewed By: quark-zju

Differential Revision: D17437893

fbshipit-source-id: 52ec90a6edf9aa5dac852fb827275be7fd361080
2019-09-19 11:21:52 -07:00
Xavier Deguillard
5029409860 edenapi: remove dependency on revisionstore
Summary:
The revisionstore crate will soon need to build a `dyn EdenApi` object to fetch
data from the network. Since the edenapi crate depends on revisionstore, this
would create a circular dependency between the two. Several approaches were
considered, including moving either the EdenApi trait or the MutableDeltaStore trait
out of its respective crate. In the end, I decided to remove the
dependency altogether and let the caller decide what to do with the data.
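
The sketch below shows the dependency inversion. It assumes a hypothetical `EdenApiLike` trait and `DataEntry` type. The client only returns fetched entries, and the caller passes a closure that decides where they go. Because the client no longer names any store type, it no longer needs to depend on the store crate.

```rust
/// Hypothetical fetched entry.
#[derive(Clone, Debug, PartialEq)]
pub struct DataEntry {
    pub key: String,
    pub data: Vec<u8>,
}

/// Hypothetical client trait: it fetches and returns data, but knows nothing
/// about where the data will be stored.
pub trait EdenApiLike {
    fn get_files(&self, keys: &[String]) -> Vec<DataEntry>;
}

/// The caller decides what to do with each entry via `sink`, so a store crate
/// can depend on the client without the client depending on the store.
pub fn fetch_with<F: FnMut(DataEntry)>(client: &dyn EdenApiLike, keys: &[String], mut sink: F) {
    for entry in client.get_files(keys) {
        sink(entry);
    }
}

/// Hypothetical test client that "fetches" each key's own bytes.
pub struct EchoClient;

impl EdenApiLike for EchoClient {
    fn get_files(&self, keys: &[String]) -> Vec<DataEntry> {
        keys.iter()
            .map(|k| DataEntry { key: k.clone(), data: k.as_bytes().to_vec() })
            .collect()
    }
}
```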

Reviewed By: quark-zju

Differential Revision: D17437895

fbshipit-source-id: cc3ec830562c0616d40d7d5d36f69674934d87b9
2019-09-19 11:21:51 -07:00
Xavier Deguillard
367e2f1de3 edenapi: Mutable{Delta,History}Store have internal mutability
Summary: These no longer need to be passed in with &mut.
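
The sketch below illustrates internal mutability. It assumes a simplified `add(&self, key, data)` signature on `MutableDeltaStore`; the real trait's methods may differ. `MemDeltaStore` is hypothetical and uses `std::sync::Mutex` to stay dependency-free. The mutation happens behind the lock, so callers only need a shared reference, and the store can be shared across threads through an `Arc`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Sketch: `add` takes `&self`, not `&mut self` (hypothetical, simplified signature).
pub trait MutableDeltaStore {
    fn add(&self, key: &str, data: &[u8]);
}

/// Hypothetical in-memory store; the lock provides the internal mutability.
pub struct MemDeltaStore {
    inner: Mutex<HashMap<String, Vec<u8>>>,
}

impl MemDeltaStore {
    pub fn new() -> Self {
        MemDeltaStore { inner: Mutex::new(HashMap::new()) }
    }

    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().len()
    }
}

impl MutableDeltaStore for MemDeltaStore {
    fn add(&self, key: &str, data: &[u8]) {
        // Mutation happens behind the lock, so a shared reference is enough.
        self.inner.lock().unwrap().insert(key.to_string(), data.to_vec());
    }
}
```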

Reviewed By: quark-zju

Differential Revision: D17379054

fbshipit-source-id: b1d0591013d92aaa3cc60cc3b23f42a1f175d1cb
2019-09-19 11:21:51 -07:00
Jun Wu
0dda4d7bae bindings: add a way to calculate heads(ancestors(revs))
Summary:
Per the team meeting, we want to remove whole changelog scans that are incompatible
with the upcoming dag changes.

Heads calculation is one such "whole changelog scan".

The plan is to use visibility heads + remote names to answer `head()`. However,
remote names are not guaranteed to be heads. For example, `stable` might be
an ancestor of `master`. To get the right answer for `head()`, a
calculation like `heads(::(remotenames() + visible-heads()))` needs to be done.

Calculating `heads(ancestors(...))` in Python is quite slow. This diff provides
a native fast path for it. It still requires a partial changelog scan, but will be
compatible with the future dag-based commit graph.
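
A minimal sketch of computing `heads(ancestors(revs))` in one descending pass follows. It assumes revlog numbering, in which every parent has a smaller revision number than its child, and `-1` is the null parent. `heads_ancestors` is a hypothetical name, not the bindings API. Scanning from the highest input revision down to 0 guarantees that all children of a revision are visited before it. A revision is therefore a head exactly when no already-visited member of the set listed it as a parent. Revisions above the highest input are never touched, which is what makes this a partial scan.

```rust
/// Revlog convention for "no parent".
pub const NULL_REV: i32 = -1;

/// Computes heads(ancestors(revs)), assuming parents[r] = [p1, p2] and every
/// parent revision number is smaller than its child's.
pub fn heads_ancestors(parents: &[[i32; 2]], revs: &[usize]) -> Vec<usize> {
    const UNKNOWN: u8 = 0; // not (yet) known to be in the ancestor set
    const HEAD: u8 = 1; // in the set, no child in the set seen so far
    const NOT_HEAD: u8 = 2; // in the set, has a child in the set
    let max = match revs.iter().max() {
        Some(&m) => m,
        None => return Vec::new(),
    };
    let mut state = vec![UNKNOWN; max + 1];
    for &r in revs {
        if state[r] == UNKNOWN {
            state[r] = HEAD;
        }
    }
    let mut heads = Vec::new();
    // Children have larger numbers, so they were all visited before `r`.
    for r in (0..=max).rev() {
        match state[r] {
            UNKNOWN => continue,
            HEAD => heads.push(r),
            _ => {}
        }
        for &p in &parents[r] {
            if p != NULL_REV {
                state[p as usize] = NOT_HEAD;
            }
        }
    }
    heads.reverse();
    heads
}
```

In the test below, revision 1 plays the role of `stable` and revision 2 of `master`. Asking for `heads_ancestors(&parents, &[1, 2])` returns only `[2]`, matching the remote-name example in the summary.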

Reviewed By: sfilipco

Differential Revision: D17199841

fbshipit-source-id: 6ea4367b8877209899d56094f8d8ee1aff1ad6f3
2019-09-17 18:15:20 -07:00
Jun Wu
22526b0f3f bindings: add a way to calculate phases by heads
Summary:
Add a function to do "head-based phases" calculation on the revlog, so we can
experiment with the breaking change: phases are no longer root-based, and
are probably defined by remotenames and visibility heads.

The segmented changelog structure will drop support for root-based phases for
performance.
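
A sketch of head-based phase calculation follows. It assumes public commits are the ancestors of a set of public heads (for example, remote names). Draft commits are the remaining ancestors of the draft heads (for example, visibility heads). Parent numbering follows the revlog rule that parents precede children. `Phase`, `ancestors`, and `phases_from_heads` are hypothetical names. Revisions reachable from neither set are left as `None` in this sketch.

```rust
/// Revlog convention for "no parent".
pub const NULL_REV: i32 = -1;

/// Hypothetical phase type for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phase {
    Public,
    Draft,
}

/// Marks `heads` and all their ancestors. Because every parent number is
/// smaller than its child's, one descending pass suffices.
pub fn ancestors(parents: &[[i32; 2]], heads: &[usize]) -> Vec<bool> {
    let mut seen = vec![false; parents.len()];
    for &h in heads {
        seen[h] = true;
    }
    for r in (0..parents.len()).rev() {
        if seen[r] {
            for &p in &parents[r] {
                if p != NULL_REV {
                    seen[p as usize] = true;
                }
            }
        }
    }
    seen
}

/// Public = ancestors(public heads); Draft = ancestors(draft heads) - Public.
pub fn phases_from_heads(
    parents: &[[i32; 2]],
    public_heads: &[usize],
    draft_heads: &[usize],
) -> Vec<Option<Phase>> {
    let public = ancestors(parents, public_heads);
    let draft = ancestors(parents, draft_heads);
    (0..parents.len())
        .map(|r| {
            if public[r] {
                Some(Phase::Public)
            } else if draft[r] {
                Some(Phase::Draft)
            } else {
                None
            }
        })
        .collect()
}
```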

Reviewed By: sfilipco

Differential Revision: D17199844

fbshipit-source-id: 4a4dba183bb5f751b0cf454b9fc2b7e601e8c491
2019-09-17 18:15:19 -07:00
Jun Wu
d2fbb0164e bindings: add revlogindex
Summary:
This module is intended to provide native paths for some operations that need to
scan the whole changelog. It allows us to experiment with some breaking changes,
namely, head-based visibility without "filtered revs" and head-based phases on
the revlog format, before the more advanced structure takes over.

This diff adds a revlog index reader that can answer simple queries like
"length" and "parents".

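A minimal sketch of such a reader follows. It assumes the non-inline revlog v1 index layout: 64-byte big-endian entries, with p1 and p2 stored as 4-byte signed integers at byte offsets 24 and 28, and `-1` meaning no parent. `RevlogIndex` and `read_i32_be` are hypothetical names. "length" is simply the file size divided by the entry size. An inline revlog interleaves data chunks with the entries and would need extra handling.

```rust
/// Size of one revlog v1 index entry, in bytes.
pub const ENTRY_SIZE: usize = 64;

/// Hypothetical minimal reader over the bytes of a non-inline revlog v1 index.
/// Assumed per-entry layout (big-endian): offset+flags (8 bytes), compressed
/// length, uncompressed length, base rev, link rev, p1, p2 (4 bytes each),
/// then a 20-byte node padded to 32 bytes. In entry 0, the first 4 bytes also
/// hold the version header; that does not affect the parent fields.
pub struct RevlogIndex<'a> {
    data: &'a [u8],
}

impl<'a> RevlogIndex<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        RevlogIndex { data }
    }

    /// Number of revisions: one fixed-size entry per revision.
    pub fn len(&self) -> usize {
        self.data.len() / ENTRY_SIZE
    }

    /// Parent revisions of `rev`; -1 means "no parent".
    pub fn parents(&self, rev: usize) -> [i32; 2] {
        let entry = &self.data[rev * ENTRY_SIZE..(rev + 1) * ENTRY_SIZE];
        [read_i32_be(&entry[24..28]), read_i32_be(&entry[28..32])]
    }
}

fn read_i32_be(bytes: &[u8]) -> i32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes);
    i32::from_be_bytes(buf)
}
```
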
Reviewed By: sfilipco

Differential Revision: D17199837

fbshipit-source-id: 2574f64c980419fa966200fd52fa5ddf873baae4
2019-09-17 18:15:19 -07:00
Jun Wu
493a96a006 bindings: add more methods to dag.spans
Summary:
Expose more methods in Rust to Python.

While we're here, change `__contains__` to take a signed int so a `-1 in set` test
won't trigger an error.
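
The sketch below shows a membership test that takes a signed argument. It assumes spans are stored as inclusive `(low, high)` ranges; `spans_contains` is a hypothetical name. Accepting `i64` lets a negative value such as `-1` simply report `false`, instead of failing the conversion to an unsigned revision number.

```rust
/// Hypothetical membership test over inclusive `(low, high)` spans. The value
/// is signed so negative inputs return `false` rather than erroring.
pub fn spans_contains(spans: &[(u64, u64)], value: i64) -> bool {
    if value < 0 {
        // No revision is negative, so this cannot be a member.
        return false;
    }
    let v = value as u64;
    spans.iter().any(|&(low, high)| low <= v && v <= high)
}
```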

Reviewed By: sfilipco

Differential Revision: D17244562

fbshipit-source-id: 0b8b9069bd0a35615066d1328933ca50b09b4a25
2019-09-17 18:15:19 -07:00
Jun Wu
d10dab5342 dag: make sure Dag has complete high-level segments when SyncableDag gets dropped
Summary:
Previously, `SyncableDag` and `Dag` could co-exist. Dropping SyncableDag involves
error handling and is not panic-free. Making sure `Dag` has complete
high-level segments would have required implementing it in `SyncableDag::drop`,
making it more sensitive to panics.

Change the API so `SyncableDag` is independent from `Dag`: `Dag` always
has complete segments, and changes to `SyncableDag` are invisible to `Dag`,
so `SyncableDag` cannot mess up existing `Dag` structures.

Reviewed By: sfilipco

Differential Revision: D17000969

fbshipit-source-id: 1ceed4ea335d3d64848b7430d48076846b90695d
2019-09-17 12:36:44 -07:00
Jun Wu
257b0b053d bindings: add vlq.read API
Summary: This makes it possible to decode VLQ from a stream.
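
The sketch below decodes one unsigned VLQ integer from a stream. It assumes an LEB128-style encoding: 7 payload bits per byte, least significant group first, and the high bit set on every byte except the last. Whether this matches the vlqencoding crate byte-for-byte is an assumption, and `read_vlq` is a hypothetical name. Reading one byte at a time through `std::io::Read` lets a caller decode values directly from a stream without knowing their encoded length up front.

```rust
use std::io::{self, Read};

/// Hypothetical decoder for one unsigned LEB128-style VLQ value.
pub fn read_vlq<R: Read>(reader: &mut R) -> io::Result<u64> {
    let mut value: u64 = 0;
    let mut shift = 0u32;
    loop {
        let mut byte = [0u8; 1];
        // Errors (including UnexpectedEof on an empty stream) propagate.
        reader.read_exact(&mut byte)?;
        if shift >= 64 {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "VLQ overflow"));
        }
        let b = byte[0];
        // Low 7 bits are payload, placed at the current group position.
        value |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            // High bit clear: this was the last byte of the value.
            return Ok(value);
        }
        shift += 7;
    }
}
```

For example, the bytes `0xac 0x02` decode to 300 (`0x2c + (0x02 << 7)`).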

Reviewed By: alexeyqu

Differential Revision: D17404066

fbshipit-source-id: 4a3b0e5333664c3cfc0f76bbdc7db80c25a3a49c
2019-09-16 14:01:15 -07:00
Jun Wu
b3eb5bf97d dag: refactor segment building APIs
Summary:
Previously, the `Dag` had 2 low-level `build_segments` APIs:

- Dag::build_flat_segments(..., last_threshold)
- Dag::build_high_level_segments(..., drop_last)

They allowed customizing whether the segments are lagging or not.

However, certain algorithms (e.g. children and range) now require the high level
segments to cover everything covered by the flat segments. The above APIs
don't ensure that.

This diff refactors the segment building APIs so that:

- Make `build_flat_segments`, and `build_high_level_segments` private to
  prevent misuse.
- Ensure high level segments cover flat segments at `Dag::open` and
  `Dag::build_segments_volatile`, the only ways to change `Dag`.
- Provide different APIs suitable for different (one-time in-memory vs
  on-disk) use-cases. The on-disk `build_segments_persistent` API makes high
  level segments lagging to avoid fragmentation, while the in-memory
  `build_segments_volatile` does not.

To satisfy the needs of existing tests, a `set_segment_size` API was added to
override the default segment size.

Most callsites become simpler because they no longer need to figure out
details about segment size, level, and lagging.

Reviewed By: sfilipco

Differential Revision: D17000965

fbshipit-source-id: 78bb0c7674c99e91be6011bb7e623cd4f63b1521
2019-09-13 19:31:03 -07:00
Xavier Deguillard
3e1eccd586 replace std::sync{Mutex, RwLock, Condvar} with parking_lot
Summary:
The parking_lot crate is more convenient to use than std::sync, on top
of everything else listed at https://crates.io/crates/parking_lot. Let's
use it everywhere.

Reviewed By: quark-zju

Differential Revision: D17337444

fbshipit-source-id: b5489be0b7d2bd5f6a6edc5d1d6eea366a6c05b9
2019-09-13 15:16:57 -07:00
Arun Kulshreshtha
980bcdd0f8 manifest: add bfs diff to bindings crate
Summary: Add support for calling the new BFS diff implementation from Python. This diff adds the appropriate glue code to the bindings crate and adds a config option (`treemanifest.bfsdiff`) to enable the new functionality.

Reviewed By: xavierd

Differential Revision: D17334739

fbshipit-source-id: 24aac21910e74a42d625c93bed7fa3aa08e167c0
2019-09-12 18:28:43 -07:00
Jun Wu
07184f2bb8 bindings: split the crate into multiple crates
Summary:
Split the crate to improve build time.

Before this change, a naive change on any of the simple modules can still take
20+ seconds to compile, even with incremental compilation enabled.

This diff splits the crate into multiple smaller crates. A simple change to a
simple crate can take < 10 seconds to re-compile.

Unlike the pre-D13923866 state, there is still only a single Python
extension.

Reviewed By: xavierd

Differential Revision: D17345706

fbshipit-source-id: c7e2e6f0e1b86071c863cfb8989070a581825956
2019-09-12 10:51:07 -07:00
Jun Wu
a17a87bf4f setup: move native extensions to edenscmnative
Summary:
This just moves things around, so that native and pure Python modules are split
into different Python packages. This makes it possible to use the standard zip
importer without hacks (e.g. `hgdemandimport/embeddedimport`).

This diff is mostly about moving things. While `make local` still works,
it does break the nupkg build, which will be fixed in a later diff.

Reviewed By: kulshrax

Differential Revision: D15798642

fbshipit-source-id: 5d83f17099aa198df0acd5b7a99667e2f35fe7b4
2019-06-19 17:55:49 -07:00