Commit Graph

552 Commits

Author SHA1 Message Date
Durham Goode
694cc78523 historypack: implement FileIndexEntry read/write logic for historyindex
Summary:
To start the history pack implementation, let's start by implementing
reader/writers for the various parts. In this diff we do the FileIndexEntry

Reviewed By: markbt

Differential Revision: D9231395

fbshipit-source-id: d054959796ee4e3d51df8f3533712f8f959a04d2
2018-08-15 15:24:38 -07:00
Durham Goode
3b5966895f mutablehistorypack: implement get_ancestors
Summary: Implements the HistoryStore get_ancestors api.

Reviewed By: quark-zju

Differential Revision: D9136980

fbshipit-source-id: 59b7a1d51c4bf95edec452fcb912fb7647151d24
2018-08-15 15:24:38 -07:00
Durham Goode
c8d9db34df revisionstore: add AncestorIterator class
Summary:
This moves the ancestor iteration logic for cases where we iterate one
by one. This will be used by the HistoryPack code in upcoming diffs.
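
A minimal sketch of one-by-one ancestor iteration, assuming a caller-supplied lookup that returns the parents of a single node. The `u32` ids, the `get_parents` closure, and the field names are hypothetical stand-ins for illustration, not the types used by revisionstore.

```rust
use std::collections::{HashSet, VecDeque};

/// Walks ancestors breadth-first, asking for one node's parents at a time.
/// `u32` stands in for a real node hash (an assumption for brevity).
struct AncestorIterator<F: Fn(u32) -> Vec<u32>> {
    get_parents: F,
    queue: VecDeque<u32>,
    seen: HashSet<u32>,
}

impl<F: Fn(u32) -> Vec<u32>> AncestorIterator<F> {
    fn new(start: u32, get_parents: F) -> Self {
        let mut queue = VecDeque::new();
        queue.push_back(start);
        let mut seen = HashSet::new();
        seen.insert(start);
        AncestorIterator { get_parents, queue, seen }
    }
}

impl<F: Fn(u32) -> Vec<u32>> Iterator for AncestorIterator<F> {
    /// Each item is a node together with its parents.
    type Item = (u32, Vec<u32>);

    fn next(&mut self) -> Option<Self::Item> {
        let node = self.queue.pop_front()?;
        let parents = (self.get_parents)(node);
        for &parent in &parents {
            // Only enqueue parents that have not been visited yet.
            if self.seen.insert(parent) {
                self.queue.push_back(parent);
            }
        }
        Some((node, parents))
    }
}
```

Keeping the traversal state in the iterator lets any store reuse it by passing a different lookup closure.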

Reviewed By: quark-zju

Differential Revision: D9136978

fbshipit-source-id: e60b0a1e2ee5036938b51bbd910fbaf548d7aa75
2018-08-15 15:24:38 -07:00
Durham Goode
550d912ae0 revisionstore: add BatchedAncestorIterator class
Summary:
This moves the ancestor iteration logic into its own class, with
support for cases where we receive bulk sets of ancestors at once. A future diff
will add similar logic for ancestor traversals where we receive one hash at a
time.

Reviewed By: quark-zju

Differential Revision: D9136985

fbshipit-source-id: 7f918476f777020b3436f5104ad3bf4b00fe9827
2018-08-15 15:24:38 -07:00
Durham Goode
d45da0c3aa mutablehistorypack: implement get_node_info
Summary: Implements get_node_info in the HistoryStore trait.

Reviewed By: quark-zju

Differential Revision: D9137007

fbshipit-source-id: e98b5ed247b5756074902a155fd31eeff8e176d8
2018-08-15 15:24:38 -07:00
Durham Goode
8e6bf4f8f5 mutablehistorypack: implement get_missing()
Summary: Initial implementation for get_missing on MutableHistoryPack

Reviewed By: quark-zju

Differential Revision: D9136983

fbshipit-source-id: ea6c7a7a513d9ef8f2c06a1e6601109fc6e9ebce
2018-08-15 15:24:38 -07:00
Durham Goode
1ac2e4a88b mutablehistorypack: implement add()
Summary:
The initial code for implementing a mutable history pack. Future diffs
will add logic that serializes this to a pack file, an index file, and adds
a history pack reader class.

Reviewed By: quark-zju

Differential Revision: D9136997

fbshipit-source-id: 7e80613eb4cc0cb51a977a4a449d565ab1d0ce80
2018-08-15 15:24:38 -07:00
Durham Goode
01b81f1217 tests: remove cargo warnings
Summary: Fix up some simple cargo test warnings

Reviewed By: quark-zju

Differential Revision: D9136979

fbshipit-source-id: dd10ea6751eb68190190381dc69b3494160cf358
2018-08-15 15:24:38 -07:00
Jun Wu
f85675d80c configparser: record file content in ValueSource struct
Summary:
Embed a snapshot of the config file at the parsing time. So applications can
have access to them, and can do things like calculating line numbers, or editing
the config files.

This is a shallow copy, so it does not affect performance.

Reviewed By: DurhamG

Differential Revision: D8960872

fbshipit-source-id: e1905712dbec4b02d93a4fecc97064f0e00024c8
2018-08-09 21:21:49 -07:00
Jun Wu
134541a7ae configparser: make load_system and load_user return errors
Summary: Otherwise there is no way to get parse errors with the last change.

Reviewed By: DurhamG

Differential Revision: D8960867

fbshipit-source-id: 48ef748096a67baa155bddf202c8ebec7ed1eeb5
2018-08-09 21:21:49 -07:00
Jun Wu
cb58cd0d26 configparser: return errors instead of keeping them
Summary:
Change the API to return parse errors directly, instead of keeping them in
ConfigSet struct. This makes it easier to get errors related to one of the
"parse" calls.

Reviewed By: DurhamG

Differential Revision: D8960869

fbshipit-source-id: fbd571f264415e788c5ac44961149d1498826a6d
2018-08-09 21:21:49 -07:00
Jun Wu
d028811a8f configparser: strip leading space from multi-line value
Summary:
Multiline value like:

  [section]
  foo = a
    b

should be parsed as "a\nb", instead of "a \nb".

It does not affect configlist, but affects template definitions.

Unfortunately in this case we had to allocate a new buffer instead of using
`Bytes::slice`. Fortunately most configs are single-line, so the performance
impact is hardly visible practically.
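
As a rough illustration of the behavior described above, the sketch below trims spaces and tabs around every line of a raw multi-line value. `normalize_multiline` is a hypothetical helper, not the crate's API, and it works on `&str` rather than `Bytes`; note that it has to build a new `String`, which matches the remark about allocating a new buffer.

```rust
/// Hypothetical helper: normalize the raw text of a multi-line config value
/// by trimming spaces and tabs around each line, then rejoining with '\n'.
fn normalize_multiline(raw: &str) -> String {
    raw.lines()
        .map(|line| line.trim_matches(|c: char| c == ' ' || c == '\t'))
        .collect::<Vec<_>>()
        .join("\n")
}
```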

Reviewed By: DurhamG

Differential Revision: D8960866

fbshipit-source-id: 011e7f431d682236529ce176fe577aac6a010d91
2018-08-09 21:21:48 -07:00
Jun Wu
28b42961e8 configparser: use indexmap
Summary:
Switch to indexmap, which is more actively maintained than linked-hash-map.

There is no visible performance difference when parsing large config files.

Reviewed By: DurhamG

Differential Revision: D8960870

fbshipit-source-id: 8d6650e2d8b14989061dceb2081a3f93004cea76
2018-08-09 21:21:48 -07:00
Jun Wu
3bfce55697 configparser: use dirs crate
Summary:
`home_dir` in stdlib is going to be deprecated. Therefore switch to
external crate.

Reviewed By: DurhamG

Differential Revision: D8960874

fbshipit-source-id: e123debc5c58e6a632a801dedcd9fc6834cb1f65
2018-08-09 21:21:48 -07:00
Jun Wu
8aefad1c97 configparser: use shellexpand crate to expand paths
Summary: The crate also helps expanding environment variables.

Reviewed By: DurhamG

Differential Revision: D8960873

fbshipit-source-id: c83fc7256a8297752a14c1d86d1ddb3735f95682
2018-08-09 21:21:48 -07:00
Jun Wu
a4129f8d53 configparser: use pest to parse config files
Summary:
[pest](https://github.com/pest-parser/pest) is an elegant Rust library for
parsing text.

A naive benchmark on a 1MB config file shows pest is about 1.5 to 2x slower.
But the better error messages and cleaner code seem worth it.

Practically, in a VirtualBox VM, parsing a set of our config files takes 3-7ms.
The overhead seems to be opening too many files. Reducing it to one file makes
parsing complete in 2-4ms.

Unfortunately the buck build has issues with the elegant syntax
`#[grammar = "spec.pest"]`, because "spec.pest" cannot be located by pest_derive.
Therefore a workaround is used to generate the parser.

The motivation behind this is that I noticed a multi-line value cannot be
taken as a plain Bytes slice. For example:

  [section]
  foo = line1
    line2

"foo" should be "line1\nline2", instead of "line1\n  line2". It does not make a
difference on configlist. But it affects templates. Rather than making the
parser more complex, it seems better to just adopt a reasonably fast parsing
library.

Reviewed By: DurhamG

Differential Revision: D8960876

fbshipit-source-id: 2fa04e38b706f7126008512732c9efa168f84cc7
2018-08-08 17:20:00 -07:00
Jun Wu
0ea7f4aa94 configparser: skip space characters
Summary:
Previously, a line with all space characters was considered "illegal" and I
didn't handle it. It would actually be parsed as part of the config name.

Let's skip them. So config files with spaces would behave sanely.

Reviewed By: DurhamG

Differential Revision: D8887370

fbshipit-source-id: e55d221d281fc58b2d2efbcb9196e7f68a78d719
2018-08-08 17:20:00 -07:00
Jun Wu
4f6c9b1a5e configparser: add a way to clone configs
Summary: Mercurial's ui.py needs a way to copy configs.

Reviewed By: DurhamG

Differential Revision: D8886245

fbshipit-source-id: b936edf5e215ecae078d992a344bcecef7fcd7f3
2018-08-08 17:20:00 -07:00
Jun Wu
d1e4252154 configparser: add a way to mark configs as read-only
Summary:
Command-line flags override config files configs. However, config files load
after parsing command-line flags in Mercurial's current logic. Therefore, a way
to make sure config files do not override command line flags is needed.

`ui.py` uses two config objects `ui._ocfg`, `ui._ucfg` and calls
`_ucfg.update(_ocfg)` after loading a config file to solve the problem.
That adds overhead updating ucfg.

With configparser's "filter" API, instead of rewriting configs afterwards,
the configs can be stopped from loading via files in the first place. So
there is no overhead maintaining two config sets and updating them.

Reviewed By: DurhamG

Differential Revision: D8960877

fbshipit-source-id: cf7b9a820911638956e123c1c93d3febeabf53c2
2018-08-08 17:20:00 -07:00
Jun Wu
39a4651bc3 configparser: implement system and user config loading
Summary:
This is to locate system / user config files at some fixed places.
It will replace most of `mercurial.rcutil`, and be used in native clients.

Reviewed By: DurhamG

Differential Revision: D8895791

fbshipit-source-id: 47166b943a3bd90a8aff1c15674a3da4e14bf8d3
2018-08-08 17:20:00 -07:00
Jun Wu
5377ca5f97 configparser: implement HGPLAIN handling
Summary:
Implement HGPLAIN handling using the filter feature of `Options`.

This is hg-specific, therefore it's implemented in a separate `hg` module, as
an extension to `config::Options`.

The `hg` module could contain more hg related logic, like locating system
and user config files, to make it easier to use by Eden.

The plan is to have this as the single source of truth handling HGPLAIN
environment variables and migrate other places reading HGPLAIN to
use side effects caused by functions defined here. The side effects
are ideally just normal config options accessible via `ConfigSet::get` APIs,
instead of another special case (ex. `HgPlain::get(name) -> bool`).

Reviewed By: DurhamG

Differential Revision: D8895788

fbshipit-source-id: fa0ad7e7207513d947216292cbbd65530391cf11
2018-08-08 17:19:59 -07:00
Jun Wu
e33154698b Back out "Reuse pylz4 encoding between hg and Mononoke into a separate library"
Summary:
Backout D9124508.

This is actually more complex than it seems. It breaks non-buck build
everywhere:

- hgbuild on all platforms. POSIX platforms break because `hg archive` will
  miss `scm/common`. Windows build breaks because of symlink.
- `make local` on GitHub repo because `failure_ext` is not public. The `pylz4`
  Cargo.toml has missing dependencies.

Fixing them correctly seems non-trivial. Therefore let's backout the change to
unblock builds quickly.

The linter change is kept in case we'd like to try again in the future.

Reviewed By: simpkins

Differential Revision: D9225955

fbshipit-source-id: 4170a5f7664ac0f6aa78f3b32f61a09d65e19f63
2018-08-08 12:20:54 -07:00
Tuan Tran
f50d617d2d Reuse pylz4 encoding between hg and Mononoke into a separate library
Summary: Moved the lz4 compression code into a separate module in `scm/common/pylz4` and redirected code referencing the former two files to the new module

Reviewed By: quark-zju, mitrandir77

Differential Revision: D9124508

fbshipit-source-id: e4796cf36d16c3a8c60314c75f26ee942d2f9e65
2018-08-08 10:08:11 -07:00
Liubov Dmitrieva
d2be4f57d0 scm daemon: fix spelling
Reviewed By: markbt

Differential Revision: D9194901

fbshipit-source-id: 1f077a5778bee2d2b1f62b2d10beff3dd3365471
2018-08-07 06:39:45 -07:00
Jun Wu
7d346e6bc2 ignore: support global gitignore configs
Summary:
Change the Rust ignore matcher to accept an extra list of gitignore files.
Parse "git:" entries of "ui.ignore" to be git ignore files.

Reviewed By: DurhamG

Differential Revision: D8863905

fbshipit-source-id: 0cd5e29e01f01496ff61c81b89f7876202f18a98
2018-08-02 20:22:47 -07:00
Hugh Harris
99a1b993c3 Add secrets authentication for commitcloud in scm_daemon
Summary: If the daemon can't find the token file, it will try to read from secrets_tool on unix-like systems. Integrates well with people who have enabled the secrets_token option as their token file will have been deleted.

Reviewed By: liubov-dmitrieva

Differential Revision: D9029795

fbshipit-source-id: b364d9e8885ee0473b8d1effd6ee0b2e86a699f9
2018-08-02 12:06:47 -07:00
Jeremy Fitzhardinge
08f618a7f3 tp2/rust: rust-crates-io update
Summary: Update rust-crates-io. Small changes needed for failure 0.1.2 update.

Reviewed By: rahulg

Differential Revision: D9125235

fbshipit-source-id: fd98af065b54e207fcb2c3cfc9dd9a2d325cc6c8
2018-08-02 10:05:38 -07:00
Jun Wu
dafc189588 treestate: fix documentation about FilteredKeyCache
Summary: It's just a documentation fix.

Reviewed By: singhsrb

Differential Revision: D9110152

fbshipit-source-id: ce4065b7aad6fac05f4c27ef7d2569352cdd2633
2018-07-31 16:35:50 -07:00
Jun Wu
03a0270913 treestate: change getfiltered API to return all matched entries
Summary:
Case-folding could be more complex than what Mercurial currently handles.
Suppose the following paths are committed to a repo using a case-sensitive
filesystem:

  a/a/A
  a/A/a
  A/a/a

Then querying "a/a/a" with a "normpath" filter should ideally have access to
all the above paths.

Unfortunately, the API is changed to use copy instead of references, as it's
impossible to return multiple values borrowed from `&mut self`.

Changes are made on treestate Python land as well to use the new API.  This
solves issues about case-folding corner cases covered by test-eol.t and
test-casefolding.t.
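
The lookup described above can be sketched as follows, assuming a flat list of paths instead of the real tree structure. `get_filtered` and `normpath` here are illustrative names with simplified signatures; the function returns owned copies, in the spirit of the note about not returning multiple borrowed values.

```rust
/// Return every stored path whose filtered form equals the filtered query.
/// Results are owned `String`s (copies) rather than references.
fn get_filtered(paths: &[&str], query: &str, filter: fn(&str) -> String) -> Vec<String> {
    let wanted = filter(query);
    paths
        .iter()
        .filter(|&&p| filter(p) == wanted)
        .map(|&p| p.to_string())
        .collect()
}

/// A simple stand-in for a case-folding filter (an assumption for this sketch).
fn normpath(path: &str) -> String {
    path.to_lowercase()
}
```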

Reviewed By: DurhamG

Differential Revision: D9092405

fbshipit-source-id: 49eb4511ff3c9e5400a522b37126e112c917d2d7
2018-07-31 13:49:35 -07:00
Jun Wu
d662a1e82e configparser: implement section remapping
Summary:
The feature is required by Mercurial config layer. It's used by hgweb and
some templater configs.

Reviewed By: DurhamG

Differential Revision: D8886246

fbshipit-source-id: 836fc255b821e6b6c50cf2a435837e9051e90a7d
2018-07-27 18:49:49 -07:00
Jun Wu
fd7791958a configparser: implement section whitelist option
Summary:
The Mercurial API allows setting a section whitelist when parsing configs.
Let's add such feature to the Rust config parser.

Reviewed By: StanislavGlebik

Differential Revision: D8886247

fbshipit-source-id: 981026b98962e065b536077012d7d1042d2ada91
2018-07-27 18:49:49 -07:00
Jun Wu
4f5c7ccc14 configparser: allow defining filter functions to rename section or discard configs
Summary:
There are some advanced config related requirements in Mercurial:
- Drop certain configs if certain HGPLAIN features are set.
- But, do not drop HGPLAIN configs if the config is set via CLI flags.
- Remap section names.
- Whitelist sections.

This diff adds a filter function option aiming to support all of the above.
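
One way to picture such a filter option is a chain of functions that each receive a config item and either return a (possibly rewritten) item or drop it. The sketch below is an assumption about the general shape only: the `Item` and `Filter` types and the `whitelist`, `remap`, and `apply` helpers are hypothetical and use `String` rather than the crate's own types.

```rust
/// One config item as (section, name, value). Hypothetical shape.
type Item = (String, String, String);

/// A filter either passes an item on (possibly modified) or discards it.
type Filter = Box<dyn Fn(Item) -> Option<Item>>;

/// Keep only items whose section is in the allowed list.
fn whitelist(sections: Vec<&'static str>) -> Filter {
    Box::new(move |item: Item| {
        if sections.iter().any(|s| *s == item.0) {
            Some(item)
        } else {
            None
        }
    })
}

/// Rename section `from` to `to`, leaving other sections untouched.
fn remap(from: &'static str, to: &'static str) -> Filter {
    Box::new(move |(section, name, value): Item| {
        let section = if section == from { to.to_string() } else { section };
        Some((section, name, value))
    })
}

/// Run the filters in order; stop as soon as one discards the item.
fn apply(filters: &[Filter], mut item: Item) -> Option<Item> {
    for filter in filters {
        item = filter(item)?;
    }
    Some(item)
}
```

Because each filter sees the output of the previous one, the order matters: here a whitelist placed first is checked against the original section names.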

Reviewed By: StanislavGlebik

Differential Revision: D8895787

fbshipit-source-id: 1abd90974c4e4b3f7f2fb33173ad2af34e0a4a65
2018-07-27 18:49:49 -07:00
Jun Wu
47094efdde configparser: move "source" to a dedicated "Options" struct
Summary:
It turns out that "source" is not the only "option" that the caller needs to
set. From Mercurial's existing code, namely `ui.readconfig`, the API also
needs to support whitelisting config sections, and "remap" config sections.

Instead of adding more parameters to almost all functions, let's add an
`Options` struct that will hold those configs. For now, it only has
`source`. New fields will be added by upcoming changes.

To help existing code migrate smoothly, and satisfy the most common
use-cases where only "source" is set, a `From<impl Into<Bytes>>` trait is
implemented.
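
A simplified sketch of the migration pattern mentioned above, using `String` in place of `Bytes` so it stays dependency-free. The `describe` function is a hypothetical caller added only to show that both a bare source and a full `Options` value are accepted.

```rust
/// Simplified stand-in for the options struct; only `source` for now.
struct Options {
    source: String,
}

/// Anything convertible into a String can be used where Options is expected.
impl<S: Into<String>> From<S> for Options {
    fn from(source: S) -> Self {
        Options { source: source.into() }
    }
}

/// Hypothetical caller: accepts either a plain source or a full Options.
fn describe(opts: impl Into<Options>) -> String {
    let opts = opts.into();
    format!("loaded from {}", opts.source)
}
```

Existing call sites that pass a plain source keep compiling, while new call sites can build an `Options` explicitly once more fields exist.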

Reviewed By: StanislavGlebik

Differential Revision: D8886244

fbshipit-source-id: 90b49565de6fbbce3e8e48db8e6805154d156360
2018-07-27 18:49:49 -07:00
Durham Goode
2b272f7bbc revisionstore: use ok_or_else instead of ok_or
Summary:
When reading entries, we were using ok_or to read a slice and catch
errors, but this causes an unnecessary allocation for the error even if we don't
have an error. Let's use ok_or_else to avoid that.
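
To illustrate the difference, here is a small sketch with a hypothetical `read_slice` helper and a `String` error type (both assumptions, not the revisionstore code). With `ok_or(format!(...))` the message would be built on every call; with `ok_or_else` the closure only runs when the slice lookup fails.

```rust
/// Hypothetical helper: borrow `len` bytes starting at `offset`.
fn read_slice(buf: &[u8], offset: usize, len: usize) -> Result<&[u8], String> {
    let end = offset
        .checked_add(len)
        .ok_or_else(|| "offset + len overflows".to_string())?;
    // The error String is only allocated when `get` returns None.
    buf.get(offset..end).ok_or_else(|| {
        format!("range {}..{} is out of bounds (len {})", offset, end, buf.len())
    })
}
```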

Reviewed By: quark-zju

Differential Revision: D8897109

fbshipit-source-id: d308f64d54a58077d9ec2eb34dd1bef431ac1819
2018-07-26 12:17:20 -07:00
Durham Goode
6dc36e06d2 datapack: implement Repackable
Summary: Implements the Repackable interface.

Reviewed By: quark-zju

Differential Revision: D8895276

fbshipit-source-id: ba0c83894db283c5c1dddf68ec8fdbe64a17a801
2018-07-26 12:17:19 -07:00
Durham Goode
dfc30ad8e6 revisionstore: add Repackable trait
Summary:
Adds a trait that represents a store that is repackable. An implementor
only needs to be iterable and expose some basic type and identifier information;
the trait provides the actual repack logic.

Reviewed By: quark-zju

Differential Revision: D8894756

fbshipit-source-id: 13053f8c7b6dca8b80ea819ef18949f3862cf367
2018-07-26 12:17:19 -07:00
Durham Goode
c18a1d04f1 revisionstore: implement iter for DataStore trait
Summary:
We need the ability to iterate over a datastore so we can implement
repack and cleanup. In a later diff we'll use this trait to implement repack
functionality in a way that it can apply to any store that implements
IterableStore.

Reviewed By: quark-zju

Differential Revision: D8885094

fbshipit-source-id: 0a2b1ab8cf524392d890302c33e386f1cd218d24
2018-07-26 12:17:19 -07:00
Durham Goode
e1b153825b revisionstore: store paths on DataPack object
Summary:
The paths for each data pack are used in various situations (repack,
error reporting, etc) so let's store them and make them accessible via the
python api.

Reviewed By: quark-zju

Differential Revision: D8884773

fbshipit-source-id: 4108c98b4e303ba9bded1f264746fa4a84845c73
2018-07-26 12:17:17 -07:00
Jeremy Fitzhardinge
03640e680e tp2: update rust-crates-io
Summary:
Fix crate names for where the crate name doesn't match the package
name. This affected a few crates, but in practice only rust-crypto/crypto was
used.

Reviewed By: Imxset21

Differential Revision: D9002131

fbshipit-source-id: d9591e4b6da9a00029054785b319a6584958f043
2018-07-25 15:50:52 -07:00
Durham Goode
e64f1c7c7f datapack: return KeyError for missing key
Summary:
If the DataIndex didn't have a key, we were returning a DataIndexError
when we should've been returning a KeyError. This tells the higher level stores
to continue to the next store instead of raising the exception further.
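
The fall-through behavior can be sketched like this, assuming a simplified `Store` trait and error enum. All names below (`StoreError`, `VecStore`, `BrokenStore`, `get_from_stores`) are hypothetical; the point is only that a "key missing" error moves on to the next store while any other error is raised immediately.

```rust
#[derive(Debug, PartialEq)]
enum StoreError {
    /// The key is simply not in this store; try the next one.
    KeyMissing,
    /// Something is actually wrong; stop and report it.
    Corrupt(String),
}

trait Store {
    fn get(&self, key: &str) -> Result<Vec<u8>, StoreError>;
}

/// Toy store backed by a list of (key, value) pairs.
struct VecStore(Vec<(&'static str, Vec<u8>)>);

impl Store for VecStore {
    fn get(&self, key: &str) -> Result<Vec<u8>, StoreError> {
        self.0
            .iter()
            .find(|(k, _)| *k == key)
            .map(|(_, v)| v.clone())
            .ok_or(StoreError::KeyMissing)
    }
}

/// Toy store that always fails with a non-recoverable error.
struct BrokenStore;

impl Store for BrokenStore {
    fn get(&self, _key: &str) -> Result<Vec<u8>, StoreError> {
        Err(StoreError::Corrupt("bad index".to_string()))
    }
}

/// Query stores in order, skipping those that report a missing key.
fn get_from_stores(stores: &[&dyn Store], key: &str) -> Result<Vec<u8>, StoreError> {
    for store in stores {
        match store.get(key) {
            Err(StoreError::KeyMissing) => continue,
            other => return other,
        }
    }
    Err(StoreError::KeyMissing)
}
```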

Reviewed By: quark-zju

Differential Revision: D8806186

fbshipit-source-id: c40da96101494d5e3ea7910bf4b1a89674463a77
2018-07-25 11:07:33 -07:00
Durham Goode
1f93a51285 datapack: change version to be an enum
Summary:
The version only has a few valid values, so let's change it to be an
enum. This will be used in an upcoming diff to make the python tests pass.
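
A sketch of what a version enum might look like. The variant names and the values 0 and 1 are assumptions for illustration; the commit does not state which versions are valid.

```rust
/// Hypothetical version enum; the concrete values are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataPackVersion {
    Zero = 0,
    One = 1,
}

impl DataPackVersion {
    /// Reject any byte that is not a known version instead of carrying it around.
    fn from_u8(byte: u8) -> Result<Self, String> {
        match byte {
            0 => Ok(DataPackVersion::Zero),
            1 => Ok(DataPackVersion::One),
            other => Err(format!("unsupported datapack version {}", other)),
        }
    }
}
```

Validating once at parse time means later code can match on the enum exhaustively rather than re-checking a raw integer.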

Reviewed By: quark-zju

Differential Revision: D8775752

fbshipit-source-id: b1101c123b4802fbcb0f0a6fe5a45d741aec764f
2018-07-25 11:07:32 -07:00
Durham Goode
1f260293cc datapack: fix indicator for end of chain vs missing delta
Summary:
In the python code, end of a delta chain is marked with one value while
a missing delta is marked with another value. This isn't actually used anywhere,
but let's make the rust code mimic this for now.

Reviewed By: quark-zju

Differential Revision: D8775039

fbshipit-source-id: c9f81471bfd67e720938d6c5bbd10db029406686
2018-07-25 11:07:32 -07:00
Durham Goode
c1baef6ce3 dataindex: hide IndexEntry fields behind methods
Summary:
In a future diff we'll be changing the storage of IndexEntry to be
different from the API. So let's hide the actual format behind functions.

Reviewed By: quark-zju

Differential Revision: D8923005

fbshipit-source-id: 2f87b35315f8a7a5a8e67b6d0be2c73a1d9bccb4
2018-07-25 11:07:32 -07:00
Durham Goode
82a6a73b98 datastore: add get_delta function to DataStore trait
Summary:
This function is present on the python data store api, so let's
replicate it here. Later we should come back and refactor this to be a special
case of the get_delta_chain result, but for now we'll maintain the custom API so
we can start using this code from python.

Reviewed By: quark-zju

Differential Revision: D8774474

fbshipit-source-id: aabcff3a43ae68859a1bf3b23f433214571b1a9d
2018-07-25 11:07:32 -07:00
Durham Goode
047bf26495 buck: add buck target files
Summary: This will let us build with buck.

Reviewed By: quark-zju

Differential Revision: D8980839

fbshipit-source-id: ea64328d32bc2c88984d0c861acefcc55b84ce02
2018-07-24 16:05:26 -07:00
Jun Wu
7e31ecff45 configparser: silence a compiler warning
Summary:
`std::env::home_dir` got deprecated [1]. But the replacements are not in tp2
yet (meaning the buck build will fail). So let's silence the warning for now.

As we're here, also fix an incorrect comment.

[1]: https://internals.rust-lang.org/t/deprecate-or-break-fix-std-env-home-dir/7315

Reviewed By: mitrandir77

Differential Revision: D8886248

fbshipit-source-id: aca0334cbc8b710e42c5c86c952f58adcd10ba2c
2018-07-23 18:37:10 -07:00
Durham Goode
6c71f5a3c0 revisionstore: fix unused code warnings
Summary:
There were a bunch of unused code warnings because the mutabledatapack
module wasn't exposed as public. This then led to us ignoring other warnings.
Let's fix all of them.

Reviewed By: quark-zju

Differential Revision: D8895468

fbshipit-source-id: 914c81026469382fcf28015b4a6bce13bad746c2
2018-07-18 10:08:49 -07:00
Durham Goode
fb0e3537bf dataindex: fix dataindex to store index_start relative locations
Summary:
Previously the rust dataindex would store the delta base location as an
offset relative to the start of the file. The python implementation stores it
relative to the start of the index though. So let's update the rust
implementation.

Reviewed By: quark-zju

Differential Revision: D8774206

fbshipit-source-id: d4317a95df353a7b635f1827fcfad7f3fb171afd
2018-07-17 15:10:01 -07:00
Jun Wu
4089c7bd52 configparser: expose types
Summary: Exposes important types so they can be used in other crates.

Reviewed By: mitrandir77

Differential Revision: D8790923

fbshipit-source-id: 955249219ba5d963d0529ba35f79ed4a8120140a
2018-07-16 19:57:37 -07:00
Jun Wu
e94ffb1907 configparser: implement %unset and %include
Summary: Implement parsing those special macros.

Reviewed By: mitrandir77

Differential Revision: D8779053

fbshipit-source-id: 422cae90497b88b0ad930d3eeacfd94624586f67
2018-07-16 19:57:36 -07:00
Jun Wu
0b39ff42d9 configparser: implement basic parsing
Summary:
Handling sections and normal config items. `%` support will be added in an
upcoming patch.

Note: regex would make the code simpler - the expression
`^([^\s=]+)\s*=\s*(.*(?:\n[\t ].*)*)\s*` can extract both config name and
multi-line values. However a naive benchmark shows it is 20x slower parsing
larger files, and it has some initialization cost. Config parsing is at such
a low level and its performance is critical. So the code does its own
parsing instead of using regex.

Reviewed By: mitrandir77

Differential Revision: D8779051

fbshipit-source-id: a2de698f0676c886737c47891a0400f187bff822
2018-07-16 19:57:36 -07:00
Jun Wu
245d655673 configparser: implement loading a directory
Summary:
Add functions to load a path, where the path can either be a directory, or a
file. Implement the directory traversal. Loading a file is the most complex
part and will be implemented by an upcoming diff.

Reviewed By: lukaspiatkowski

Differential Revision: D8779052

fbshipit-source-id: f25265b4b7cc5df5cc3717643c3d0ee9cf6da8a4
2018-07-16 19:57:35 -07:00
Jun Wu
b499ec3daa configparser: add string handling utilities
Summary: They will be used by the actual parser.

Reviewed By: lukaspiatkowski

Differential Revision: D8777326

fbshipit-source-id: c6cda3168a060b1d36aaf3224a5e547d0aa45530
2018-07-11 17:36:06 -07:00
Jun Wu
3a93d55e44 configparser: implement set
Summary: This allows setting a config value.

Reviewed By: mitrandir77

Differential Revision: D8779050

fbshipit-source-id: 48544460060bcd383528461275462e63d4884f7f
2018-07-11 17:36:06 -07:00
Jun Wu
dc7ac5545a configparser: define basic interface for the config object
Summary:
Define internal objects and public API. `Bytes` is heavily used for cheaply
copying the values. Simple public APIs are implemented. Complex ones like
the actual parser will be implemented in upcoming changes.

Reviewed By: mitrandir77

Differential Revision: D8777329

fbshipit-source-id: d9de10274d7de6bcdd9af030d238b2b12594f085
2018-07-11 17:36:06 -07:00
Jun Wu
8660f02fcc configparser: define error types
Summary: Define error types to be used in upcoming changes.

Reviewed By: mitrandir77

Differential Revision: D8777328

fbshipit-source-id: 88a171c889798887e4f2436147427837b66573be
2018-07-11 17:36:06 -07:00
Jun Wu
9e08d19d8e configparser: add a new Rust library
Summary: This will be used to parse hgrc-like config files.

Reviewed By: mitrandir77

Differential Revision: D8777330

fbshipit-source-id: 73a114df36e23246a3fc1206be202fba8705453a
2018-07-11 17:36:06 -07:00
Durham Goode
51cca830f8 lz4-pyframe: fix compression of 0 length strings
Summary:
The python lz4 framing logic chooses to include no data when the input
string is 0 length. We need to match that logic in order to be compatible with
it.

See https://github.com/steeve/python-lz4/blob/master/src/python-lz4.c#L75

Reviewed By: quark-zju

Differential Revision: D8773951

fbshipit-source-id: 9bc60fc0779eb923f7c663d7e516b519963e8056
2018-07-09 18:02:58 -07:00
Durham Goode
a169a98521 loosefile: fix compilation errors in tests
Summary:
The Node::random() function changed while this was landing. So we need
to update the tests.

Reviewed By: quark-zju

Differential Revision: D8774074

fbshipit-source-id: 6f3bcdeac069ef5ffdb2deb1970a1655cabcedaf
2018-07-09 15:20:50 -07:00
Jun Wu
9e8f7613fb indexedlog: detect index corruption
Summary:
The primary log and indexes could be out of sync when mutating the indexes
errors out. In that case, mark the indexes as "corrupted" and refuse to
perform index read (lookup) operations, for correctness.

Reviewed By: DurhamG

Differential Revision: D8337689

fbshipit-source-id: 3db9006ea03cfcaba52391f189aa697944b616e5
2018-07-09 14:37:27 -07:00
Jun Wu
9714887f14 indexedlog: add a test about swapping indexes
Summary:
This demonstrates that index definitions can be in a different order: as long
as their names do not change, things still work.

Reviewed By: DurhamG

Differential Revision: D8337688

fbshipit-source-id: 2fbbdf711d8edc10fc6d3314532390ea712aca6c
2018-07-09 14:37:26 -07:00
Jun Wu
fdcf835ec4 indexedlog: log: add a test about index lookup
Summary: The test tries to cover interesting variants.

Reviewed By: DurhamG

Differential Revision: D8156520

fbshipit-source-id: b739d1dfcecf8bfa5b23671a83c7f314a021007b
2018-07-09 14:37:26 -07:00
Jun Wu
7a5291ee43 indexedlog: log: add LogLookupIter.into_vec
Summary: This is handy to use.

Reviewed By: DurhamG

Differential Revision: D8156517

fbshipit-source-id: 63aa836bf469de2ad55237dea02b9d0ca28fa3ce
2018-07-09 14:37:26 -07:00
Jun Wu
ee638e6de4 indexedlog: log: implement flush
Summary: Completes the interface.

Reviewed By: DurhamG

Differential Revision: D8156511

fbshipit-source-id: 0d4d05aa23c47117da70ec47cf9be3d4fe41df7b
2018-07-09 14:37:26 -07:00
Qingpeng Niu
7e0204ff39 loosefile class to read Mercurial loose file format data.
Summary: Create a simple rust reader for our loose file format.  One of Mercurial’s simplest file formats is the loose file format.  fbsource/fbcode/scm/hg/hgext/remotefilelog/remotefilelog.py:_createfileblob() is the python writing implementation.

Reviewed By: DurhamG

Differential Revision: D8731050

fbshipit-source-id: 80eb2abde2a2e5bb672d7e8ffa8ba58ed62184c1
2018-07-06 12:51:08 -07:00
Durham Goode
20c35ecbf3 revisionstore: use fixed random generator for tests
Summary:
Instead of using random nodes, let's use ones based off a seeded
generator.

Reviewed By: quark-zju

Differential Revision: D8741139

fbshipit-source-id: a90e6f092adac6aef35149ee6c4bf2b47c469602
2018-07-06 11:11:40 -07:00
Durham Goode
0e34d12531 mutabledatapack: implement get_delta_chain
Summary: Implements the get_delta_chain function of the DataStore trait.

Reviewed By: quark-zju

Differential Revision: D8598658

fbshipit-source-id: 708bca63e2da3aae6064ed18076a9a1f1282a756
2018-07-06 11:11:40 -07:00
Durham Goode
32ce9b99ab revisionstore: change delta base to be an Option<>
Summary:
Deltas may not have bases if they are a full text. Let's represent
that as an Option instead of as a magical null id value. This has the nice
effect of moving the decision to serialize a missing delta base down into the
serializer instead of up at the delta chain construction level.
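
A minimal sketch of the idea, assuming a 20-byte node id and an all-zero marker for "no base" on disk. The `Delta` layout, `NULL_ID`, and `serialize` are simplified, hypothetical stand-ins rather than the actual revisionstore definitions.

```rust
/// Hypothetical 20-byte node id, for illustration only.
type Node = [u8; 20];

/// Assumed on-disk marker for "no delta base".
const NULL_ID: Node = [0u8; 20];

struct Delta {
    /// `None` means `data` is a full text rather than a delta.
    base: Option<Node>,
    data: Vec<u8>,
}

/// Only the serializer knows how a missing base is encoded.
fn serialize(delta: &Delta) -> Vec<u8> {
    let base = delta.base.unwrap_or(NULL_ID);
    let mut out = Vec::with_capacity(base.len() + delta.data.len());
    out.extend_from_slice(&base);
    out.extend_from_slice(&delta.data);
    out
}
```

Code that builds delta chains can then work with `Option` and never needs to know about the null id.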

Reviewed By: quark-zju

Differential Revision: D8739231

fbshipit-source-id: b58bd40dae45cb85890812db21e7eeff46aa6b4e
2018-07-06 11:11:40 -07:00
Durham Goode
28e570113e lib: remove cbincode from cargo workspace
Summary: This doesn't exist.

Reviewed By: quark-zju

Differential Revision: D8743699

fbshipit-source-id: b12c2beb600b2918bee8ca579dbf96bc8ce5288c
2018-07-05 18:50:43 -07:00
Jun Wu
a487dacc4b codemod: reformat rest of the code
Summary:
Previous code format attempt (D8173629) didn't cover all files because `**/*.py`
was not expanded recursively by bash. That makes certain changes larger than
they should be (ex. D8675439). Now use zsh's `**/*.py` to format them.

Also fix Python syntax so black can run on more files, and all lint issues.

Reviewed By: phillco

Differential Revision: D8696912

fbshipit-source-id: 95f07aa0c5eb1b63947b0f77f534957f4ab65364
2018-07-05 17:52:43 -07:00
Jun Wu
d0c1b6d014 cargo: add a workspace
Summary:
Make `lib` a cargo workspace so building in subprojects would share a
`target` directory and `cargo doc` will build documentation for all
subprojects.

Reviewed By: DurhamG

Differential Revision: D8741175

fbshipit-source-id: 512325bcb23d51e866e764bdc76dddb22c59ef05
2018-07-05 16:06:35 -07:00
Durham Goode
a1b6fa3007 mutabledatapack: implement get_meta
Summary:
Implements the get_meta function of the DataStore trait. This caught a
bug in how we record lengths as well.

Reviewed By: quark-zju

Differential Revision: D8598661

fbshipit-source-id: 566dca1770d6666e4215fa1fd8f33babdede2f90
2018-07-05 14:53:19 -07:00
Durham Goode
b728154963 mutabledatapack: change error to contain String
Summary:
We want to be able to format error strings, so we can't return a static
str anymore.

Reviewed By: quark-zju

Differential Revision: D8598659

fbshipit-source-id: 44d7a73c06416efca51ca4d0f24a0c8911af8582
2018-07-05 14:53:19 -07:00
Durham Goode
587fc95964 mutabledatapack: begin implementing DataStore trait
Summary:
A mutabledatapack also needs to be readable as a normal store. Let's
start implementing the DataStore trait, starting with get_missing

Reviewed By: quark-zju

Differential Revision: D8598657

fbshipit-source-id: 1f8bc89fae2be73fe789bc0ef1cdd922222019a2
2018-07-05 14:53:18 -07:00
Durham Goode
875758fdbe datapack: implement getdeltachain
Summary: Implements the last of the DataStore api, getdeltachain.

Reviewed By: quark-zju

Differential Revision: D8557950

fbshipit-source-id: 7f6530fe2064f0d035414b7920a126c6aab41beb
2018-07-05 14:53:18 -07:00
Durham Goode
24a4751ff0 revisionstore: change Delta.data to Rc
Summary:
In a future diff we'll be returning data read from a pack file out as a
Delta. To avoid copies, we need to be able to return an Rc from DataPack. This
seems like it will be a common pattern, so let's go ahead and make Delta contain
its data as an Rc.

Reviewed By: quark-zju

Differential Revision: D8557949

fbshipit-source-id: 276005360bfa48e9154143dedce579a21129e976
2018-07-05 14:53:18 -07:00
Durham Goode
c6af00dbc9 datapack: implement getmetadata
Summary:
Introduces the DataEntry structure which is able to parse data entries
from pack files. Uses it to implement getmetadata

Reviewed By: quark-zju

Differential Revision: D8556610

fbshipit-source-id: c25427c3c247970a879ad7d409b821f3695b97d9
2018-07-05 14:53:17 -07:00
Durham Goode
79a9dd976f datapack: implement getmissing
Summary: Adds the DataStore trait and implements the getmissing function.

Reviewed By: quark-zju

Differential Revision: D8554391

fbshipit-source-id: 41c107c07de7d6945ca7370e264c6bc0bf154754
2018-07-05 14:53:17 -07:00
Durham Goode
ebc31e8daf datapack: add initial datapack structure
Summary:
This adds the initial struct and opener for a datapack. Future diffs
will add actual functionality and tests.

Reviewed By: quark-zju

Differential Revision: D8553436

fbshipit-source-id: 3b17f995632e859019205f242a4cce389ac77407
2018-07-05 14:53:17 -07:00
Durham Goode
029666cf27 mutabledatapack: write index to disk during serialization
Summary:
Actually write the index to disk when the mutabledatapack is
serializing.

Reviewed By: quark-zju

Differential Revision: D8552276

fbshipit-source-id: 354c7fdc3fe84b91d582f0e8cde8c6ae2494c559
2018-07-05 14:53:17 -07:00
Durham Goode
e68c0ec7e0 mutabledatapack: add logic for reading DataIndex
Summary: This adds the logic for reading a DataIndex from disk.

Reviewed By: quark-zju

Differential Revision: D8552278

fbshipit-source-id: 611ff09c27716b8d8ff7424c1a27287b9fc42b78
2018-07-05 14:53:16 -07:00
Durham Goode
d22f6ce58d mutabledatapack: add logic for writing DataIndex
Summary:
Soon we will be writing the index during pack file serialization, so
let's add the logic for serializing the index.

Reviewed By: quark-zju

Differential Revision: D8552277

fbshipit-source-id: 60829631eb060f62d266c16f6016f34080311f8e
2018-07-05 14:53:16 -07:00
Durham Goode
69a59f53d9 revisionstore: add Node::from_slice
Summary: A simple helper method for producing Nodes from slices in a safe way.

Reviewed By: quark-zju

Differential Revision: D8547679

fbshipit-source-id: 85ae8fcd7749c662b1459af1d84ccf9695dd5f0b
2018-07-05 14:53:16 -07:00
Durham Goode
1c3767bc11 mutabledatapack: implement data index header serialization
Summary:
We're beginning to implement the DataPack index file logic. Let's start
with header serialization/deserialization.

Reviewed By: quark-zju

Differential Revision: D8319727

fbshipit-source-id: 079aab06ececb1c5159aec2da3243268eea0cb61
2018-07-05 14:53:15 -07:00
Durham Goode
12e0e5bf16 mutabledatapack: build inmemory index as revisions are added
Summary:
Let's build an inmemory hash table of the revisions that were added. A
future diff will serialize this index into a dataidx file.

Reviewed By: quark-zju

Differential Revision: D8309730

fbshipit-source-id: 9efc7f0f34129a63c52309b4d70179f2c10840b3
2018-07-05 14:53:15 -07:00
Durham Goode
763d1e4bef datapack: implement a fanout table
Summary:
Implements a fanout table trait that history pack and data pack will
use. It basically consists of logic to build and read a quick lookup table that
uses the first few bytes of a key to determine the bounding range of a binary
search.
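
As an illustration of the idea (not the code from this diff), a fanout table can be sketched as follows. The sketch assumes a one-byte prefix and keys held in a sorted in-memory list; `Fanout`, `build`, and `find` are hypothetical names.

```rust
/// Hypothetical fanout table keyed on the first byte of each key.
/// `table[b]` is the half-open index range of the sorted keys whose
/// first byte equals `b`; an empty bucket is stored as `(0, 0)`.
pub struct Fanout {
    table: Vec<(usize, usize)>,
}

impl Fanout {
    /// Build the table from keys sorted in ascending order.
    /// Assumes every key is non-empty.
    pub fn build(sorted_keys: &[Vec<u8>]) -> Fanout {
        let mut table = vec![(0usize, 0usize); 256];
        let mut i = 0;
        while i < sorted_keys.len() {
            let b = sorted_keys[i][0] as usize;
            let start = i;
            // Advance past every key that shares this first byte.
            while i < sorted_keys.len() && sorted_keys[i][0] as usize == b {
                i += 1;
            }
            table[b] = (start, i);
        }
        Fanout { table }
    }

    /// Binary-search only the bucket selected by the key's first byte.
    pub fn find(&self, sorted_keys: &[Vec<u8>], key: &[u8]) -> Option<usize> {
        let (start, end) = self.table[*key.first()? as usize];
        sorted_keys[start..end]
            .binary_search_by(|k| k.as_slice().cmp(key))
            .ok()
            .map(|i| start + i)
    }
}
```

With 256 buckets and evenly distributed hashes, each lookup bisects roughly 1/256 of the keys instead of the whole list.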

Reviewed By: quark-zju

Differential Revision: D8309729

fbshipit-source-id: 71e398277dc8ae041447035f044e5d47ca41cf7e
2018-07-05 14:53:15 -07:00
Durham Goode
4588cc18c8 mutabledatapack: write version number header
Summary:
The mutabledatapack format has a one byte header containing the version
number.

Reviewed By: quark-zju

Differential Revision: D8305653

fbshipit-source-id: c4a96dc48e64acd2c5849034e5d90b87363fbc8d
2018-07-05 14:53:15 -07:00
Durham Goode
99a11bbb24 mutabledatapack: use hash of contents as name
Summary:
Implements the logic that builds a hash of the contents of the pack
file and uses it as the name.

Reviewed By: quark-zju

Differential Revision: D8305654

fbshipit-source-id: d1270e7519a7718aa5427f3be5cdc0cd0dee2fe2
2018-07-05 14:53:14 -07:00
Durham Goode
3f467bd21f mutabledatapack: implement add()
Summary:
This is the start of a rust mutable datapack implementation. The first
diff adds a simple add function. Later diffs will add the logic that builds the
index, serializes the index, and computes the final hash name.

Reviewed By: quark-zju

Differential Revision: D8304036

fbshipit-source-id: db05c2b845e51a3552c039b7fc0b8f4cc0ff0852
2018-07-05 14:53:14 -07:00
Durham Goode
8057817dc1 revisionstore: add read/write functions to Metadata
Summary:
In a future diff we'll be serializing and deserializing metadata in
datapacks. Let's add the reader and writer functions for Metadata and some unit
tests.

Reviewed By: quark-zju

Differential Revision: D8303603

fbshipit-source-id: 7e7a7aa218c05179b205abf8b151b1488be674b3
2018-07-05 14:53:14 -07:00
Liubov Dmitrieva
e22322c2e5 commit cloud subscriber: skip cloud sync if can't resolve interngraph host
Summary:
this will reduce cloud sync errors and unnecessary cloud sync calls

the daemon triggers cloud sync on service start/restart
the machine is not always online (and connected to the correct network) at that time, so we get cloud sync errors

Reviewed By: markbt

Differential Revision: D8692972

fbshipit-source-id: 59033fd4c3e7c30100d82b908442bbf1ebea9322
2018-06-29 12:20:11 -07:00
Jun Wu
4ba555977c vendoredcrates: upgrade zstd-sys to the latest
Summary:
zstd has dropped `ZSTD_TARGETLENGTH_MIN` [1]. Let's upgrade our code to be
compatible.

[1]: c2c47e24e0

Reviewed By: DurhamG

Differential Revision: D8683180

fbshipit-source-id: 66cbab1ddd254b1e0b91232565b4d512810ba03d
2018-06-28 15:08:01 -07:00
Saurabh Singh
fa3c7b34a3 add basic tests for unionhistorystore
Summary:
This commit adds very basic tests for the Union History Store. These
tests just test for expected output of operations on bad/empty stores.

Reviewed By: quark-zju

Differential Revision: D8553821

fbshipit-source-id: a0dfa47f10083c37901535e8a810a99693a28c82
2018-06-27 19:05:31 -07:00
Saurabh Singh
40e70758b7 introduce union history store
Summary: This commit just introduces the Union History Store.

Reviewed By: DurhamG

Differential Revision: D8553822

fbshipit-source-id: 6c7ee0b5d33dae6d51b4179616d206f42eb0cd50
2018-06-27 19:05:31 -07:00
Saurabh Singh
0326fe4584 introduce history store
Summary: This commit just introduces the history store.

Reviewed By: DurhamG

Differential Revision: D8553823

fbshipit-source-id: 93af6059296d11c4fcc0dd306b4472c4f2168fa7
2018-06-27 19:05:31 -07:00
Saurabh Singh
164bf3e85a fix error messaging
Summary: This commit just fixes the messaging for the errors.

Reviewed By: DurhamG

Differential Revision: D8553820

fbshipit-source-id: 73f2cd13e7538b6870b16a0e47e657a6d08af9e3
2018-06-26 11:36:09 -07:00
Jun Wu
e17f635422 treestate: fix perf regression on treedirstate
Summary: `calculate_aggregated_state_recursive` should be a no-op with treedirstate.

Reviewed By: DurhamG

Differential Revision: D8505551

fbshipit-source-id: 08b081944cccc0abc4f41ac2e75c8c4305bc9772
2018-06-19 00:48:56 -07:00
Liubov Dmitrieva
91144d493c cloudsyncsubscriber log pid of cloud sync process
Summary: log the pid of the spawned cloud sync process, it might help with debugging if something is broken

Reviewed By: markbt

Differential Revision: D8478566

fbshipit-source-id: fd9a9a228bc325056fb35d17ee93c865679e6e23
2018-06-18 08:05:14 -07:00
Liubov Dmitrieva
053b496956 improve robustness
Summary:
read the token only when it is needed to do so, not in the constructor
scm daemon can run for users who are not registered with Commit Cloud

Reviewed By: markbt

Differential Revision: D8445923

fbshipit-source-id: b0d8c86729721037a02f93bbf7fa1fc88d7d7979
2018-06-15 07:48:22 -07:00
Jun Wu
807e8af1e1 zstdelta: update to rand 0.5
Summary: Update rand to 0.5. Make it build with buck.

Reviewed By: phillco

Differential Revision: D8412349

fbshipit-source-id: 663b9ca7d3c2b08ade756b4cb3f135b3af2a3d20
2018-06-14 21:38:33 -07:00
Jun Wu
119b479c9e indexedlog: log: implement index updating logic
Reviewed By: DurhamG

Differential Revision: D8156519

fbshipit-source-id: eb82e7547d10c7b839e757fa787f91950dea181e
2018-06-11 19:36:16 -07:00
Jun Wu
365c728134 indexedlog: index: add metadata to the root node
Summary:
This allows us to store arbitrary metadata in the root node. It will be used
by the `Log` structure to store how many bytes the index covers.

Reviewed By: DurhamG

Differential Revision: D8337687

fbshipit-source-id: 159a89d66765fc251a486fd62c1ffd01f625b503
2018-06-11 19:36:16 -07:00
Jun Wu
0b92632004 indexedlog: log: implement log loading functions
Summary: Implement the dependencies of the "open" public API.

Reviewed By: DurhamG

Differential Revision: D8156518

fbshipit-source-id: 9fed441f520a3b74cbef5bfb815c82943c615fdf
2018-06-11 19:36:16 -07:00
Jun Wu
77d75acbdd indexedlog: log: implement the iterators
Summary: Implement `LogLookupIter`, and `LogIter` for fetching data.

Reviewed By: DurhamG

Differential Revision: D8156521

fbshipit-source-id: 5ef2b2e6475d41ae7468e79b4a1234619decf75f
2018-06-11 19:36:15 -07:00
Jun Wu
8c3a69a56e indexedlog: log: implement internal read_entry function
Summary:
The read_entry function takes care of reading an entry from a given offset,
and returning internal stats like the real data offset (skipping the length and
checksum metadata), and the next entry offset.

It performs an integrity check and handles offsets for both in-memory and on-disk
buffers. The offsets to in-memory entries are fairly simple - they start
from "meta.primary_len" instead of a fixed reserved value. This makes the
"next_offset" work seamlessly.

The public API won't have "offset" exposed, so the API is private.

Reviewed By: DurhamG

Differential Revision: D8156513

fbshipit-source-id: 8661f2f2757de6f3f94defc64f4a8dd5261973b2
2018-06-11 19:36:15 -07:00
Jun Wu
991a9343b9 indexedlog: log: partially implement main APIs
Summary:
Partially implement open, append, flush, lookup APIs. This shows how things
work in general, such as how locking works and what is in memory versus on
disk.

Reviewed By: DurhamG

Differential Revision: D8156514

fbshipit-source-id: 2de23dcde2f63895f3f3e4f67057aa9520fdfa34
2018-06-11 19:36:15 -07:00
Jun Wu
529c79bd33 indexedlog: log: implement serialization for the meta file
Summary: Implemented as the file format specification added by the previous diff.

Reviewed By: DurhamG

Differential Revision: D8156516

fbshipit-source-id: 7153932b9442b3ab5bdb81490f88c40346128afc
2018-06-11 19:36:15 -07:00
Jun Wu
97281caabf indexedlog: log: define public facing interface
Summary: The public interface and its dependencies.

Reviewed By: DurhamG

Differential Revision: D8156509

fbshipit-source-id: c6f3e4b88851683a5d8804b80f689282e3f582d4
2018-06-11 19:36:15 -07:00
Jun Wu
8ad9276975 indexedlog: log: add comments about the file format
Summary: Start implementing the "Log" object. Let's define the file formats first.

Reviewed By: DurhamG

Differential Revision: D8156515

fbshipit-source-id: 037f7454452959f82583a4d97d3f38dfa60aa741
2018-06-11 19:36:14 -07:00
Jun Wu
d7c4d3a249 treestate: optimize calculate_aggregated_state_recursive
Summary:
Follow-up of the previous diff. Change the file format so aggregated_state
could be loaded without loading all entries. This would make
`calculate_aggregated_state_recursive` (and `write_delta`) more efficient
in case the node is not modified.

Reviewed By: markbt

Differential Revision: D7909169

fbshipit-source-id: d70b662c7d8c544edf81fbc7da94da9ccbee6cf0
2018-06-11 14:32:42 -07:00
Jun Wu
4819f4203d treestate: calculate aggregated_state recursively during write_delta
Summary:
This avoids a possible Rust panic during `write_delta`, because entries
could have `id` set without `aggregated_state`, by `Node::open`. This diff
fixes that by calling `calculate_aggregated_state_recursive`. The function
has to be changed to static dispatch, since dynamic dispatch only supports
one trait. Practically, this would load one-level content unnecessarily,
which might be optimized by separating loading entries vs loading aggregated
state.

Reviewed By: markbt

Differential Revision: D7909168

fbshipit-source-id: 5effe9df59ce42829a077cab89525103e211bddf
2018-06-11 14:32:42 -07:00
Jun Wu
1f83a4dc00 treestate: require visitor to provide whether it modifies a file or not
Summary:
This is subtle. If visitor changes file state, `Node.id` should be set to
`None` to mark it as "changed".

In practice, treedirstate uses visitor to rewrite mtime to -1 if mtime is
"fsnow". Those rewritten mtimes all belong to "changed" nodes (because "fsnow"
can only increase, and on-disk entries cannot have "mtime == fsnow" because
they would be written to -1 during the previous write), so it's not a problem
yet.

It is safer to not depend on the fact that "visitor" can only change "changed"
nodes. On the other hand, detecting changes for all filestate fields could be
undesirably expensive. So let's make the visitor provide the "changed or not"
information. Surely the visitor knows what it does.
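
A minimal sketch of that contract, under simplifying assumptions: the hypothetical `Node` below holds only a list of mtimes, and the visitor is a closure that returns `true` when it modified the value it was given. None of these names are taken from the treestate code.

```rust
/// Hypothetical, simplified node: `id` is `Some` while the on-disk copy is
/// still valid and `None` once the node must be rewritten.
pub struct Node {
    pub id: Option<u64>,
    pub mtimes: Vec<i64>,
}

impl Node {
    /// Run `visitor` on every file. The visitor reports whether it changed
    /// the file, so no field-by-field comparison is needed here.
    pub fn visit<F>(&mut self, mut visitor: F)
    where
        F: FnMut(&mut i64) -> bool,
    {
        let mut changed = false;
        for mtime in self.mtimes.iter_mut() {
            changed |= visitor(mtime);
        }
        if changed {
            // Mark the node as "changed" so the next write includes it.
            self.id = None;
        }
    }
}
```

A visitor that rewrites a matching mtime to -1 would return `true` for that file and `false` for all others, which leaves untouched nodes with their `id` intact.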

Reviewed By: markbt

Differential Revision: D7909167

fbshipit-source-id: 21e71302cf1db86c1330b294baddd51cc8a96026
2018-06-11 14:32:42 -07:00
Jun Wu
98db645ecd treestate: drop Tree.get_mut API
Summary: It's not used, therefore removed.

Reviewed By: markbt

Differential Revision: D7909171

fbshipit-source-id: 587a1d844ece4f2cb0c2ccd9b2d978aed69a959f
2018-06-11 14:32:42 -07:00
Liubov Dmitrieva
a4d1fac35a commitcloud add '--use-bgssh' option for hg cloud sync
Summary:
this is needed because `hg cloud sync` can be triggered by external services like scm_daemon on behalf of the user,
so it should just fail rather than expect the user to type a password. We therefore change the ui ssh option to the bgssh (background ssh) that is defined in the infinitepush section

Reviewed By: markbt

Differential Revision: D8331723

fbshipit-source-id: 28f9d007702e4f6ed5216114921375b76def3f93
2018-06-08 10:32:34 -07:00
Jun Wu
c5b267584b treestate: migrate to rand 0.5 to fix cargo test without breaking buck
Summary:
The pull request [1] is still open, which means `quickcheck::rand` is still
private when building with `cargo`. It only works with a patched quickcheck.
We cannot revert D8234503 since that will break buck build. So there is no
choice but upgrade to rand 0.5.

[1]: https://github.com/BurntSushi/quickcheck/pull/204

Reviewed By: DurhamG

Differential Revision: D8297404

fbshipit-source-id: 19937c49ae96a39e326b1b54eb00e6e2944193c2
2018-06-06 12:54:37 -07:00
Phil Cohen
b54cbaa464 commitcloudsubscriber: use old import syntax
Summary: The Ubuntu and Windows builders have an older rustc that doesn't support this syntax.

Reviewed By: DurhamG

Differential Revision: D8301570

fbshipit-source-id: 56990a804053a4dc78e41789c7b577bcf82868d7
2018-06-06 12:06:18 -07:00
Wez Furlong
31bcfbe58e hg: disable check-code tests for C code
Summary:
They're actively fighting against the clang-format config
and don't have an auto-fix.

Reviewed By: quark-zju

Differential Revision: D8283622

fbshipit-source-id: 2de45f50e6370a5ed14915c6ff23dc843ff14e8a
2018-06-05 19:21:43 -07:00
Durham Goode
d34a99a394 commitcloud: avoid using nested includes
Summary:
The windows and ubuntu builds don't have a version of rust that
supports these features, so this breaks the build.

Reviewed By: phillco, quark-zju, singhsrb

Differential Revision: D8289651

fbshipit-source-id: d08b141b4d9996e3b899ac0604225ad34f863990
2018-06-05 16:06:56 -07:00
Liubov Dmitrieva
c80a2aafcb scm daemon: refactoring (remove unused crates)
Summary: just refactoring to improve the code quality

Reviewed By: markbt

Differential Revision: D8276584

fbshipit-source-id: bf0317e91f96d2f7fee24ea69c0f33a0aed54a98
2018-06-05 07:11:55 -07:00
Liubov Dmitrieva
ae7ece9cd5 scm daemon: refactoring
Summary:
just refactoring to improve the code quality

the main improvement is that I separated TcpReceiver to a different service,
any other services can register callbacks with TcpReceiver service.

For WorkspaceSubscriberService, callbacks are implemented using an mpsc channel to notify the main WorkspaceSubscriberService thread and a single atomic flag that allows running subscriptions to join.

Another improvement is that I added logic to run cloud sync on the first keep alive after connection errors

Reviewed By: markbt

Differential Revision: D8226109

fbshipit-source-id: 3fe513da9273b28b2262948ecdf620821e7ab313
2018-06-05 07:11:55 -07:00
Liubov Dmitrieva
80f63e9451 scm daemon: refactoring (improve messages correctness)
Summary: just refactoring to improve the code quality

Reviewed By: markbt

Differential Revision: D8276563

fbshipit-source-id: afca70b9b487450fbaab897dff5cd79d6c3a0108
2018-06-05 04:36:07 -07:00
Jun Wu
c65612acc9 indexedlog: index: stop iteration if an error is encountered
Summary:
Without this change, code doing `index.get(...).values().collect()` might
end up with an infinite loop.
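
The failure mode can be sketched with a hypothetical iterator over a linked chain of values. `ChainIter` and its `read` callback are assumptions made for illustration and are not the indexedlog types. The key point is that the iterator ends after yielding an error, so collecting it terminates.

```rust
/// Hypothetical iterator over a linked list of values stored at offsets.
/// `read(offset)` returns the value and the offset of the next link.
pub struct ChainIter<F> {
    read: F,
    next_offset: Option<u64>,
}

impl<F> ChainIter<F>
where
    F: FnMut(u64) -> Result<(u64, Option<u64>), String>,
{
    pub fn new(start: u64, read: F) -> Self {
        ChainIter { read, next_offset: Some(start) }
    }
}

impl<F> Iterator for ChainIter<F>
where
    F: FnMut(u64) -> Result<(u64, Option<u64>), String>,
{
    type Item = Result<u64, String>;

    fn next(&mut self) -> Option<Self::Item> {
        let offset = self.next_offset?;
        match (self.read)(offset) {
            Ok((value, next)) => {
                self.next_offset = next;
                Some(Ok(value))
            }
            Err(e) => {
                // Without this reset the same failing offset would be read
                // again on every call, and `collect()` would never finish.
                self.next_offset = None;
                Some(Err(e))
            }
        }
    }
}
```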

Reviewed By: DurhamG

Differential Revision: D8156510

fbshipit-source-id: 5497aa354de7d49cfc4308a025856608ce981a1e
2018-06-05 00:12:29 -07:00
Jun Wu
798e55d53d indexedlog: index: change APIs to take file lengths instead of root offsets
Summary:
Previously, the index API optionally takes a root offset. This is
inconvenient for the caller since they probably need to record both
valid file length and root offsets. Since root nodes are always at
the end of the index, let's just simplify the API to take a logical
file length instead of a root offset.

Reviewed By: DurhamG

Differential Revision: D8156512

fbshipit-source-id: 7029272a61c9990e6484bca7ebbff64e2233c6cd
2018-06-05 00:12:29 -07:00
Jun Wu
68660cc443 indexedlog: utils: make mmap_readonly optionally take file length
Summary:
Previously, `mmap_readonly` always reads file length, and uses that for mmap
length. In many cases we do know the desired file length and it's cleaner to
not `mmap` unused bytes. So let's add a parameter to do that.

Note: The `stat` call is still needed, since `mmap` wouldn't return an error
if the requested length is greater than the file length.
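
The length check described in the note might look like the sketch below. It covers only the validation step and leaves out the actual `mmap` call; `checked_map_len` is a hypothetical name, and the caller is assumed to obtain `file_len` from something like `file.metadata()?.len()`.

```rust
use std::io;

/// Decide how many bytes to map. `requested` is the optional caller-supplied
/// length; `file_len` is the real size reported by `stat`.
pub fn checked_map_len(file_len: u64, requested: Option<u64>) -> io::Result<u64> {
    match requested {
        // No explicit length: map the whole file, as before.
        None => Ok(file_len),
        // A shorter (or equal) length is fine and avoids mapping unused bytes.
        Some(len) if len <= file_len => Ok(len),
        // A longer length must be rejected here, because the mapping itself
        // would not report the problem.
        Some(len) => Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            format!("requested {} bytes but the file has only {}", len, file_len),
        )),
    }
}
```

Rejecting the oversized request up front turns a potential fault on first access into an ordinary `io::Error`.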

Reviewed By: DurhamG

Differential Revision: D8156523

fbshipit-source-id: 991aa28f3542eaff24387dcc6a7302122fb6962f
2018-06-05 00:12:29 -07:00
Jun Wu
c43312ad9c indexedlog: utils: move xxhash to utils
Summary: The function will be reused in another module.

Reviewed By: DurhamG

Differential Revision: D8156522

fbshipit-source-id: 2aff6f2e4b8fc9b5d2c000e12ac2d940f7fab407
2018-06-05 00:12:29 -07:00
Saurabh Singh
7c9227818a refactor rust datastore to a consistent naming scheme
Summary: This is just a refactor to address the naming scheme.

Reviewed By: quark-zju

Differential Revision: D8269217

fbshipit-source-id: 8c52d2c67837550e0b7dc1a45b3faf9a80319b61
2018-06-04 17:39:47 -07:00
Saurabh Singh
7067e4ca1f fix nit in implementation
Summary:
Based on review for D8214151 by quark-zju, addressing the nit here as
well.

Reviewed By: quark-zju

Differential Revision: D8267140

fbshipit-source-id: 12c3355852a49859c2b0a243fa8666105c914c73
2018-06-04 16:21:38 -07:00
Saurabh Singh
8ba5a79489 adding tests for bad data store
Summary:
Adding the tests for the case when the union store has only one data
store which always returns an `Err` as `Result`. This `Err` is not of the type
`KeyError` which the union store handles differently.
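
One plausible reading of "handles differently" is sketched below, with the assumption stated plainly: a missing-key error lets the union store fall through to the next store, while any other error is returned immediately. All types here (`StoreError`, `EmptyStore`, `BadStore`, `UnionStore`) are hypothetical stand-ins, not the revisionstore definitions.

```rust
#[derive(Debug, PartialEq)]
pub enum StoreError {
    /// The key is simply absent from this store.
    KeyError,
    /// Anything else, e.g. corruption or an I/O failure.
    Other(String),
}

pub trait DataStore {
    fn get(&self, key: &str) -> Result<Vec<u8>, StoreError>;
}

/// A store that contains nothing.
pub struct EmptyStore;
impl DataStore for EmptyStore {
    fn get(&self, _key: &str) -> Result<Vec<u8>, StoreError> {
        Err(StoreError::KeyError)
    }
}

/// A store that always fails with a non-key error.
pub struct BadStore;
impl DataStore for BadStore {
    fn get(&self, _key: &str) -> Result<Vec<u8>, StoreError> {
        Err(StoreError::Other("bad store".to_string()))
    }
}

pub struct UnionStore {
    stores: Vec<Box<dyn DataStore>>,
}

impl UnionStore {
    pub fn new(stores: Vec<Box<dyn DataStore>>) -> Self {
        UnionStore { stores }
    }

    pub fn get(&self, key: &str) -> Result<Vec<u8>, StoreError> {
        for store in &self.stores {
            match store.get(key) {
                // Missing here: keep looking in the remaining stores.
                Err(StoreError::KeyError) => continue,
                // Success or a real failure: stop and report it.
                other => return other,
            }
        }
        Err(StoreError::KeyError)
    }
}
```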

Reviewed By: quark-zju

Differential Revision: D8214156

fbshipit-source-id: bd077af343086c92f46ec6a6f1551d05dd9bda09
2018-06-04 16:21:38 -07:00
Saurabh Singh
242f2b904f add tests for empty data store
Summary:
Adding tests for the case when the union store only has a single data
store which is completely empty.

Reviewed By: quark-zju

Differential Revision: D8214151

fbshipit-source-id: 9d8f329548a1b7e105a5dc6219067a6e292fe97c
2018-06-04 16:21:37 -07:00
Saurabh Singh
34ca90a6f8 rename test methods to be more specific
Summary:
This commit just renames the methods to be more specific. This is
useful for later changes.

Reviewed By: quark-zju

Differential Revision: D8214153

fbshipit-source-id: e8db9148334f7cd539aca626e3798e256b9b022f
2018-06-04 16:21:37 -07:00
Liubov Dmitrieva
9ebe334049 commitcloudsubscriber: enable new options for cloud sync
Reviewed By: markbt

Differential Revision: D8187376

fbshipit-source-id: cfc9feedb8763e36f2af8eec61d73f2e943b19d7
2018-06-04 06:47:22 -07:00
Jeremy Fitzhardinge
98be816aba rust/tp2: update rust-crates-io
Summary:
The big change here is the update to rand 0.5. This is a significant API
change. quickcheck still uses rand 0.4, so for quickcheck users I changed it so
that quickcheck re-exports the rand it uses. This means that quickcheck users
are unchanged aside from using quickcheck::rand, whereas direct rand users have
been updated to use the new API.

Reviewed By: farnz

Differential Revision: D8234503

fbshipit-source-id: f9e620851b8dfcc33f22a0af26122adcd5fbde39
2018-06-01 09:32:56 -07:00
Adam Simpkins
641bac8427 improve cdatapack fanout table calculation
Summary:
Refactor the cdatapack logic that computes the fanout table.  This more
accurately computes the correct ranges to bisect for each fanout table entry.

This fixes an off-by-one error setting end_index in most buckets that caused
it to search a slightly larger bisection range than necessary.

This also fixes the code to accurately compute which buckets do not have any
nodes, and sets a (start, end) range of (1, 0) for these buckets, causing
find() to avoid having to search anything in these cases.

Reviewed By: quark-zju

Differential Revision: D8131019

fbshipit-source-id: 70d6d0f2e1d900a2df27b64f3a38f114d301be0d
2018-05-30 18:54:19 -07:00
Adam Simpkins
2ea8028d14 fix a bug in cdatapack fanout table calculation
Summary:
If all of the nodes in a datapack file start with the same byte value,
the cdatapack code computed the fanout table incorrectly.  If this byte value
was anything other than 0x00 it would only be able to find the last node in
the pack, and would not be able to find any other nodes.

This fixes the code to compute the fanout table correctly in this case.

Reviewed By: quark-zju

Differential Revision: D8131020

fbshipit-source-id: 84e49befc5776cff96831f6120194466d9c80b35
2018-05-30 13:47:50 -07:00
Liubov Dmitrieva
4727ecd46d commitcloudsubscriber: improve code
Summary: minor code improvements

Differential Revision: D8206294

fbshipit-source-id: 7ea46db7b7af200665b84d00f8912fa385ebc091
2018-05-30 13:47:50 -07:00
Liubov Dmitrieva
d865e06a0d commitcloudsubscriber: add log throttling rates and add tcp socket listener to receive simple commands
Summary:
Added logic to control the logging rate for empty messages that arrive to confirm the subscription is alive, for errors when we are offline, and for when we are running in standby with no active subscriptions.

Also, I made a simple cross-platform API so that hg can trigger a subscription restart in 2 lines of code. It is a simple request-response API over a TCP socket using JSON.

If a human runs `hg cloud join`, hg will add a subscriber file to the directory the scm daemon reads subscribers from and will send the restart command; the same applies to any `hg cloud leave` run.

Another advantage is that the client (hg) can very easily check whether the scm daemon is alive. (In 2 lines of code, cross-platform, without any pid logic or other platform-specific ifs.)

Another advantage is that we can use it to receive some stats from the scm daemon.

I decided not to go with any directory-watching logic, because changes are really rare events, and it is better if a client (hg) just notifies the service to restart subscriptions when needed.

Also, I verified that hg and SCM Daemon use the same config options and the same logic for detecting the home directory on different platforms and reading the token.

Reviewed By: markbt

Differential Revision: D8162237

fbshipit-source-id: 3cb48b90f5e065ce4dc7fdc7215c3ce6ad57fb9a
2018-05-30 08:15:17 -07:00
Lukasz Langa
dfda82e492 Upgrade to 18.5b1
Summary: Mostly empty lines removed and added.  A few bugfixes on excessive line splitting.

Reviewed By: quark-zju

Differential Revision: D8199128

fbshipit-source-id: 90c1616061bfd7cfbba0b75f03f89683340374d5
2018-05-30 02:23:58 -07:00
Liubov Dmitrieva
7eb66b2717 watchman rust client for hg: allow fallible deserialization for paths and add tests
Summary: watchman rust client: allow PathBuf fallible deserialization

Reviewed By: wez

Differential Revision: D7939382

fbshipit-source-id: f1d2a2f778ef9dc40ab325346c9428ca0b605750
2018-05-29 13:09:18 -07:00
Jun Wu
ce8e166ebe treestate: add API to get directory's aggregated states
Summary:
Add an internal `get_dir` API to return aggregated states. It is exposed via
`.get('dir/')` python interface.

This is useful for implementing `hastrackeddir` of the dirstatemap class.

Reviewed By: markbt

Differential Revision: D7909173

fbshipit-source-id: 100a8f36237a6b911a4bfb4afbb4c63b98611317
2018-05-26 14:05:18 -07:00
Jun Wu
bcbd121255 treestate: split Node.write_ext into two methods
Summary:
`Node.write_ext` currently contains the logic to calculate
`aggregated_state` and write it to disk. A future diff would need the
calculation without the writing part. So let's split the method.

Reviewed By: markbt

Differential Revision: D7909177

fbshipit-source-id: e83d622e6c1eb512c6a0c3ea8c7201055aa67a21
2018-05-26 14:05:18 -07:00
Jun Wu
46ab269f99 treestate: forbid addfile with path ending with slash
Summary:
Inserting file with names like `a/b/` should be forbidden to avoid errors
later.
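
A check of this kind could be as small as the following sketch. The function name and the error type are assumptions; the diff itself may report the problem differently.

```rust
/// Reject file paths that end with a slash, since such a name denotes a
/// directory rather than a file.
pub fn check_file_path(path: &[u8]) -> Result<(), String> {
    if path.ends_with(b"/") {
        Err(format!(
            "path {:?} ends with a slash",
            String::from_utf8_lossy(path)
        ))
    } else {
        Ok(())
    }
}
```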

Reviewed By: markbt

Differential Revision: D7886809

fbshipit-source-id: c78d357542af4fdc1cea70ad5751b356d3cb308d
2018-05-26 10:50:26 -07:00
Jun Wu
ebe5ab388f treestate: protect split_key from crashing on empty key
Summary:
Not sure how this actually happens. But it happened in a test instance.
Re-running the test locally on the same commit does not reproduce it, though.

Reviewed By: phillco

Differential Revision: D8169935

fbshipit-source-id: 7e71611915722d68a1bc633819b94836f63fbc3f
2018-05-26 10:50:24 -07:00
Adam Simpkins
490247285c update open_datapack() to reject empty pack files
Summary:
Update the open_datapack() function to fail with a DATAPACK_HANDLE_CORRUPT
error if the index file contains a valid header but no index entries.

Previously open_datapack() would succeed, but it would incorrectly read past
the end of the mmapped data when trying to find entries.

The python code never generates empty pack files, so this behavior change
should not affect any legitimate pack files.

Reviewed By: quark-zju

Differential Revision: D8131018

fbshipit-source-id: a07eac9048314c9974cae0f98efaf8a4a383d966
2018-05-25 18:03:27 -07:00
Jun Wu
6210435392 zstdelta: add a simple command-line utility for debugging
Summary:
The command-line utility takes 2 files and compresses/decompresses them.
It's not meant to be formally used like `zstd`, but it is handy for simple
tests without writing Rust code.

Reviewed By: DurhamG

Differential Revision: D7596169

fbshipit-source-id: 6c21a38e21a061fab7032ff823b907b0e586bd42
2018-05-25 16:28:59 -07:00
Jun Wu
7b9867ac12 crates: pin rand to 0.4 version
Summary:
`rand` 0.5 has so many breaking changes that the code is not ready to
migrate yet. So let's pin rand to 0.4. Ideally all dependencies in
Cargo.toml should avoid using "*". But for now `rand` is the only
troublemaker.

Note `rand 0.4` is a dependency of `quickcheck 0.6.2` so it's available.

Reviewed By: phillco, singhsrb

Differential Revision: D8158406

fbshipit-source-id: 417ae6807a2efc650acb8d82370964fab6531fdb
2018-05-25 09:51:19 -07:00
Jun Wu
7a348e55a9 zstdelta: rust library to use zstd to do delta-ing and compression
Summary:
Using zstd dictionary as the "delta base" can result in overall better and
faster compression (than things like mdiff + zstd, or fossil_delta + zstd).

This diff adds utility functions to do delta generation and application.
It tweaks compression parameters (wlog, hlog) to optimize the "delta-ing"
usecase. It currently hardcodes the "fast" strategy (level=1) to have reasonable
speed. We can add other compression levels later if needed.

Reviewed By: jsgf

Differential Revision: D7562908

fbshipit-source-id: 3334059b4abeb8923d603d055bde0bfdc854bc7b
2018-05-25 09:34:34 -07:00
Liubov Dmitrieva
12796cbac0 code fixes to support Rust < 1.26
Summary: These changes support Scm Daemon on dev machines

Reviewed By: farnz

Differential Revision: D8139892

fbshipit-source-id: b6df53d6ce6615d24822b739d4d1705e0f572660
2018-05-24 12:19:55 -07:00
Liubov Dmitrieva
39ccc28933 Scm Daemon initial implementation
Summary: Scm Daemon initial implementation that currently just listens to Commit Cloud Live Notifications and triggers `hg cloud sync` on notifications

Reviewed By: markbt

Differential Revision: D8119768

fbshipit-source-id: a0d86624fe4b81b3adc89990640916d3da279b8c
2018-05-24 12:19:55 -07:00
Adam Simpkins
7233f0dfb1 update cdatapack utilities to check open success
Summary:
Update cdatapack_dump and cdatapack_get to check the return value from
`open_datapack()` to confirm if they actually successfully opened the file.

Previously these programs would segfault if invoked with a non-existent path.

Reviewed By: quark-zju

Differential Revision: D8131017

fbshipit-source-id: 90800de57430efd176b8e71fa84161f7b288e375
2018-05-24 11:30:48 -07:00
Jun Wu
af62797cf3 treestate: improve aggregated_state correctness and usability
Summary:
Previously, `aggregated_state` is only a union of bit flags. It makes
querying files with rare bits fast. But it cannot help with common bits.

This diff adds an `intersection` field, so both "having rare bits", and
"not having common bits" can be queried efficiently.

As we're here, use an explicit `None` to represent "need re-calculation",
instead of using `is_empty()`. This makes the code easier to reason about.
It also solves an issue in `add` that is caught by the next test.
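
To make the role of the `intersection` field concrete, here is a small sketch that assumes flags are plain `u8` bit sets. `AggregatedState::of` and the two query helpers are hypothetical names, not the treestate API.

```rust
/// Summary of the flag bytes of all files below a directory.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct AggregatedState {
    /// Bits set on at least one file.
    pub union: u8,
    /// Bits set on every file.
    pub intersection: u8,
}

impl AggregatedState {
    pub fn of(flags: &[u8]) -> AggregatedState {
        // Identity elements: 0 for OR, all-ones for AND.
        let mut state = AggregatedState { union: 0, intersection: !0 };
        for &f in flags {
            state.union |= f;
            state.intersection &= f;
        }
        state
    }

    /// Could some file in this subtree have `bit` set?
    pub fn may_contain_set(&self, bit: u8) -> bool {
        (self.union & bit) != 0
    }

    /// Could some file in this subtree have `bit` unset?
    pub fn may_contain_unset(&self, bit: u8) -> bool {
        (self.intersection & bit) == 0
    }
}
```

A traversal can skip a whole subtree when the relevant helper returns `false`: the union answers "who has this rare bit", the intersection answers "who lacks this common bit".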

Reviewed By: markbt

Differential Revision: D7886281

fbshipit-source-id: 4ce395883ea26ea9b33794e03c792ea157dc21d0
2018-05-23 06:12:46 -07:00
Jun Wu
acc362ae38 treestate: rename watchman_clock to metadata
Summary:
The field is a string that contains information `TreeState` itself does not
care about. It's up to the upper layer to decide how to use it, and it does
not have to be watchman clock, and might contain other fields like hostname,
ignore matcher hash, etc. So let's rename it to clarify.

Differential Revision: D7886282

fbshipit-source-id: 739a85d7a710918e0b18a9b7fe0e31b366bab447
2018-05-23 06:12:45 -07:00
Jun Wu
8d965905d1 treestate: add TreeState.path_complete and get_filtered_key
Summary: Similar to `TreeDirstate`, those methods are required.

Reviewed By: markbt

Differential Revision: D7874126

fbshipit-source-id: 6dbd6c47c7ba2ded7ea7389dfa9de4cf43db8a01
2018-05-23 06:12:45 -07:00
Jun Wu
b27143828b treestate: change Key from Vec<u8> to Box<[u8]>
Summary: This saves one `usize` per `Key`.
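
The saving comes from dropping the capacity field: a `Vec<u8>` stores pointer, capacity, and length, while a `Box<[u8]>` stores only pointer and length. The helper names below are made up for this sketch.

```rust
use std::mem::size_of;

/// Convert an owned byte vector into the more compact boxed slice.
/// Any spare capacity is released by the conversion.
pub fn to_key(bytes: Vec<u8>) -> Box<[u8]> {
    bytes.into_boxed_slice()
}

/// Number of machine words saved per key by the representation change.
pub fn words_saved_per_key() -> usize {
    (size_of::<Vec<u8>>() - size_of::<Box<[u8]>>()) / size_of::<usize>()
}
```

The trade-off is that a boxed slice cannot grow in place, which is acceptable for keys that are not modified after creation.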

Differential Revision: D7861766

fbshipit-source-id: e44d6b98758966edd0f9823f2f50270ba5481b22
2018-05-23 06:12:45 -07:00
Jun Wu
395edeaea6 treestate: remove clear_filtered_keys
Summary:
The method looks like a foot-gun, and it's O(all entries) instead of
O(cached entries). Change `get_tracked_filtered_key` to take an identity of
the filter function explicitly to solve the problem.

Reviewed By: markbt

Differential Revision: D7861765

fbshipit-source-id: a57ca4a7597120a5b00c63f3f373a62e19e5a834
2018-05-23 06:12:45 -07:00
Jun Wu
e8a83ab74e treestate: use functions to filter out what to visit
Summary:
Follow up of the StateFlags change. Previously the code uses simple bitwise
operations to decide what to visit. That can not express complex conditions.
Let's use functions instead. Rustc should know how to inline those
functions.

Reviewed By: markbt

Differential Revision: D7860276

fbshipit-source-id: 71bf381e00adbb3259a1ae61dbd68fa67f02efdb
2018-05-23 06:12:45 -07:00
Jun Wu
ff2e0ccde1 treestate: add TreeState.visit
Summary: The visit function allows visiting files, filtered by StateFlags.

Reviewed By: markbt

Differential Revision: D7824117

fbshipit-source-id: 7625396cd942ca87056f322a7342979570853d37
2018-05-23 06:12:45 -07:00
Jun Wu
8ca928ced2 treestate: update aggregated_state inside Node.visit
Summary:
Node.visit might change internal file states. So aggregated_state might need
update.

Reviewed By: markbt

Differential Revision: D7824118

fbshipit-source-id: 4f935f427de4d1803524f5908466917e6163dd90
2018-05-23 06:12:45 -07:00
Jun Wu
ddc0a1bcfe treestate: use interior mutability for aggregated_state
Summary:
The upcoming diffs need to do something like:

  self.entries.as_mut().iter_mut() {
    self.aggregated_state = ...
  }

That cannot be done if accessing entries and aggregated_state both need
`&mut` borrow of `self`. So let's use interior mutability.
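
As a general illustration of interior mutability (not the code from this diff), the sketch below keeps a cached aggregate in a `Cell`, so it can be filled in through a shared `&self` reference while the entries are being read. The `Node` layout and the method names are assumptions.

```rust
use std::cell::Cell;

pub struct Node {
    /// Per-file flag bytes (hypothetical, simplified).
    entries: Vec<u8>,
    /// Cached OR of all entries; `None` means "needs recalculation".
    aggregated_state: Cell<Option<u8>>,
}

impl Node {
    pub fn new(entries: Vec<u8>) -> Node {
        Node { entries, aggregated_state: Cell::new(None) }
    }

    /// Takes `&self`, yet can still update the cache through the `Cell`.
    pub fn aggregated_state(&self) -> u8 {
        if let Some(state) = self.aggregated_state.get() {
            return state;
        }
        let state = self.entries.iter().fold(0u8, |acc, &e| acc | e);
        self.aggregated_state.set(Some(state));
        state
    }

    /// Whether the cache currently holds a value.
    pub fn is_cached(&self) -> bool {
        self.aggregated_state.get().is_some()
    }
}
```

`Cell` works here because the cached value is `Copy`; a non-`Copy` payload would typically call for `RefCell` instead.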

Reviewed By: markbt

Differential Revision: D7824116

fbshipit-source-id: 67dd317ebfbfc698e79ed6c0e96e69d3fe495a26
2018-05-23 06:12:45 -07:00
Jun Wu
225a0771b4 treestate: revise StateFlags bits
Summary:
The "added", "removed", "normal", "? (untracked)" 4 states could be simplified
to 2 bits: "EXIST_P1", "EXIST_NEXT". With "merge" considered, adding "EXIST_P2"
would be enough. This avoids some invalid states, making it easier to reason
about. It also makes Mercurial dirstate hacks like size = -1, size = -2 noting
"merge" and "otherparent" unnecessary.

With this change, the previous `state_required_all` and `state_required_any`
query parameters are no longer powerful enough. They will be changed to
functions in a later diff. There is a new need to select files by querying
"unset" bits.
That will be addressed by D7886281.
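
As an illustration of how the four classic states could fall out of two bits,
here is a small sketch. The bit values and the `describe` and `is_merge`
helpers are hypothetical, and the mapping is my reading of the description
above rather than code from the diff.

```rust
// Hypothetical bit assignments, for illustration only.
const EXIST_P1: u8 = 1; // file exists in the first parent
const EXIST_P2: u8 = 2; // file exists in the second parent (merge)
const EXIST_NEXT: u8 = 4; // file will exist in the next commit

/// Map the (EXIST_P1, EXIST_NEXT) pair back to the classic state names.
fn describe(flags: u8) -> &'static str {
    let in_p1 = (flags & EXIST_P1) != 0;
    let in_next = (flags & EXIST_NEXT) != 0;
    match (in_p1, in_next) {
        (true, true) => "normal",
        (false, true) => "added",
        (true, false) => "removed",
        (false, false) => "untracked",
    }
}

/// A merge is involved whenever the second parent has the file.
fn is_merge(flags: u8) -> bool {
    (flags & EXIST_P2) != 0
}
```

Every combination of the two bits names a valid state, which is what removes
the invalid states mentioned above.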

Reviewed By: markbt

Differential Revision: D7860277

fbshipit-source-id: 15d198fbd0ffa858c8ed751d42dff73e06114c12
2018-05-23 06:12:45 -07:00
Jun Wu
0b3737c904 treestate: avoid broad borrow on self in Node.visit
Summary:
Looking at the `path` variable, it is now:

  vec![KeyRef<'a>, KeyRef<'a>, KeyRef<'a>, ...]

where `'a` is bound to `self`. That's okay but makes it impossible to
modify `self` between `path.push` and the end of the `visit` function.

For `Node<FileStateV2>`, `self` needs to be modified, because `visitor`
might change `file`'s state, and `self.aggregated_state` needs to be updated
accordingly.

Basically, the elements of `Vec` should match the call stack, and have
different lifetimes. A nested `visit()` should have a nested lifetime on
`Vec` elements, instead of relying on borrowing `self`.

  vec![KeyRef<'a>, KeyRef<'b>, KeyRef<'c>, ...]
  #    visit(path) {
  #                let name: KeyRef<'b>; // instead of <'a>.
  #                path.push(name)
  #                visit(path) {
  #                            let name: KeyRef<'c>;
  #                            path.push(name)
  #                            visit(path) { ... }
  #                            }
  #                }

The previous diff introduces a `VecStack` struct to work in this case. Let's
use it. This also makes sure every `path.pop()` happens automatically, even if
a panic occurs.
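
The automatic pop can be sketched with a drop guard. This is not `VecStack`
itself: it does not solve the nested-lifetime problem described above, and the
`PopGuard` and `with_pushed` names are hypothetical. It only shows how a `Drop`
implementation makes the pop happen on every exit path, including unwinding.

```rust
/// Pops the last element of `vec` when dropped.
struct PopGuard<'v, T> {
    vec: &'v mut Vec<T>,
}

impl<'v, T> Drop for PopGuard<'v, T> {
    fn drop(&mut self) {
        self.vec.pop();
    }
}

/// Push `item`, run `f`, then pop again, even if `f` panics.
fn with_pushed<T, R, F>(vec: &mut Vec<T>, item: T, f: F) -> R
where
    F: FnOnce(&mut Vec<T>) -> R,
{
    vec.push(item);
    let mut guard = PopGuard { vec };
    let result = f(&mut *guard.vec);
    result
    // `guard` is dropped here, which pops the pushed item.
}
```

Nested calls mirror the call stack: each level pushes one element and removes
exactly that element when it returns.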

Reviewed By: markbt

Differential Revision: D7797457

fbshipit-source-id: e1723fac3dbd7d244b69f1fc46d90bef64a29a3f
2018-05-23 06:12:45 -07:00
Jun Wu
7fa9772b67 treestate: add VecStack to support nested lifetimes matching stack frames
Summary:
The motivation is to have something like:

  // in Node.visit
  let mut vec = Vec::new();
  {
    vec.push(self.foobar()); // borrows self
    ...
    vec.pop(); // vec no longer contains "&self". But rustc won't know.
  }
  // want to mutate "self" here.

Reviewed By: markbt

Differential Revision: D7803738

fbshipit-source-id: 1bf2fc5788e7963f3144db490d044fcdb5193fad
2018-05-23 06:12:45 -07:00
Jun Wu
5cd280bf4e treestate: add TreeState.has_dir
Summary: This is needed for certain code paths. Fix `Tree.has_dir("/")` special case.

Differential Revision: D7797455

fbshipit-source-id: 5855e7ad6ef73eb07d590dd5201367b5c7f86a96
2018-05-23 06:12:45 -07:00
Jun Wu
6111ea14fd treestate: add basic tests for TreeState
Summary: Basic tests about serialization and add/remove/modify operations.

Differential Revision: D7769657

fbshipit-source-id: 767ee8fb1d813de5bc99a5d6c81978c89c51f298
2018-05-23 06:12:45 -07:00
Liubov Dmitrieva
aa61690c9a watchman rust client for hg: add test for arrays of objects
Summary:
watchman rust client: add tests for arrays of objects

This contains a reproducer test for broken compact arrays in combination with untagged enums.

Reviewed By: sunshowers

Differential Revision: D7877495

fbshipit-source-id: 7fc3c05f1590e708a7645f0f9adbfd545e8bae42
2018-05-21 04:30:34 -07:00
Liubov Dmitrieva
44af40467e watchman rust client for hg: hg client
Summary:
Implement fsmonitor interfaces used by Mercurial fsmonitor extension, namely:

 - query_dirs
 - query_files
 - state_enter
 - state_leave

Reviewed By: quark-zju

Differential Revision: D7876428

fbshipit-source-id: 45c7a23bd0da4dcedcc473d75ac75fbd006a599a
2018-05-21 04:30:33 -07:00
Liubov Dmitrieva
7800d0ec9f watchman rust client for hg: initial implementation
Summary:
watchman rust client for mercurial needs

* describe responses and requests with strong typing
* supports bser and json protocols
* command line transport and unix socket transport
* socket discovery (env or query command line client)
* read timeouts

Reviewed By: wez

Differential Revision: D7764586

fbshipit-source-id: 1f5725f6ce615e3e6e30395d09b5b37e0c2229d4
2018-05-21 04:30:32 -07:00
Durham Goode
8a6a929876 lz4: add rust lz4 bindings
Summary:
The crates.io lz4 bindings only support the lz4 framed format, while
our python lz4 library produces custom framed compressed blobs. Let's add a new
wrapper around lz4-sys that handles our special framing. We can migrate to the
standard framing later.

Reviewed By: quark-zju

Differential Revision: D7855502

fbshipit-source-id: 04abb1bc784c6be7f22bcd80645d1b50debc93bd
2018-05-16 09:13:18 -07:00
Saurabh Singh
a4f5e8aefc add tests for empty union data store
Summary:
These tests just test the expected outcome when the union data store
is empty.

Reviewed By: quark-zju

Differential Revision: D8018975

fbshipit-source-id: a2cc4c87509b857dbf5f6af506f165ea62080db8
2018-05-15 18:37:47 -07:00
Saurabh Singh
cab75fbb90 add common traits for Key
Summary:
This commit derives the common traits for the Key type just as we did
for the Node type in D7872300.

Reviewed By: quark-zju

Differential Revision: D8018973

fbshipit-source-id: 566a69be16d74529c6eb5f157b84de25835f780f
2018-05-15 18:37:47 -07:00
Saurabh Singh
88a2d8ff6b implement quickcheck::Arbitrary for Key in revisionstore
Summary:
We need to implement `quickcheck::Arbitrary` for Key so that it can be
used for the quickcheck tests.

Reviewed By: quark-zju

Differential Revision: D8018977

fbshipit-source-id: dbdbb34fbd7eaeb18321eafec4604d752f496a4d
2018-05-15 18:37:47 -07:00
Saurabh Singh
0b5b994973 implement quickcheck::Arbitrary for Node type in revisionstore
Summary:
We need to implement `quickcheck::Arbitrary` for Node so that it can
be used for quickcheck tests.

Reviewed By: quark-zju

Differential Revision: D8018978

fbshipit-source-id: ceda99622370bee6e9d05b839f9856c0526f553c
2018-05-15 18:37:47 -07:00
Saurabh Singh
411736dc7c add 'quickcheck' crate to revisionstore
Summary:
I am planning to use the `quickcheck` crate for testing the union data
store. This commit just adds the crate to the revisionstore.

Reviewed By: quark-zju

Differential Revision: D8018974

fbshipit-source-id: d390deeb01aa7d1bf1e66bb5bc948d48bd3f269e
2018-05-15 18:37:47 -07:00
Saurabh Singh
8688e1cc5e union data store: union data store implementation in Rust
Summary:
This commit just introduces the `UnionDataStore` and implements the
`DataStore` trait for it.

Reviewed By: quark-zju

Differential Revision: D7801615

fbshipit-source-id: 14eabd2aa1b1e085de94aec126a7108231ec6e8d
2018-05-15 18:37:47 -07:00
Saurabh Singh
d2b9c6c6ac union store: introduce common type for the union store implementation
Summary:
We will be implementing multiple union stores and therefore, it makes
sense to encapsulate the common logic in its own type. This also abstracts the
usage of `RefCell` within the union store.
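
A rough sketch of what such a common type could look like, under the
assumption that each store is shared through `Rc<RefCell<_>>`. The simplified
`DataStore` trait (string keys, `Option` result) and the `MapStore` test
double are hypothetical and much smaller than the real revisionstore traits.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Simplified stand-in for the real trait.
trait DataStore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// Common container: callers add stores, the `RefCell` stays internal.
struct UnionStore<T> {
    stores: Vec<Rc<RefCell<T>>>,
}

impl<T> UnionStore<T> {
    fn new() -> Self {
        UnionStore { stores: Vec::new() }
    }

    fn add(&mut self, store: Rc<RefCell<T>>) {
        self.stores.push(store);
    }
}

/// The union answers from the first store that has the key.
impl<T: DataStore> DataStore for UnionStore<T> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.stores.iter().find_map(|store| store.borrow().get(key))
    }
}

// Hypothetical in-memory store used only to exercise the sketch.
struct MapStore(HashMap<String, Vec<u8>>);

impl DataStore for MapStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
}
```

An empty union simply returns `None` for every key, since there is no store
to consult.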

Reviewed By: jsgf

Differential Revision: D7884651

fbshipit-source-id: a74b6d9df5ee0d7d04359219e276fd5713b3a00b
2018-05-15 18:37:47 -07:00
Saurabh Singh
bcb7ac0b32 node: add common traits for the node type
Summary:
Based on the review comments for D7801615, this commit adds the common traits
for the `Node` type

Reviewed By: jsgf

Differential Revision: D7872300

fbshipit-source-id: 44dedfc3ec0e18ac0dee0dcfc5f7dfc4aff2511d
2018-05-15 18:37:47 -07:00
Durham Goode
8c6e5fd964 mpatch: add rust bindings
Summary:
Adds rust bindings around the existing mpatch c library.

Also fixes a bug in mpatch where it could reference uninitialized memory.

Reviewed By: quark-zju

Differential Revision: D7769299

fbshipit-source-id: bcc21df85c97ef6f5537ebff8fbf1b350ee64fc3
2018-05-14 16:06:32 -07:00
Durham Goode
18697e3fb3 hg: implement getmissing() for Rust hgstore
Summary:
Initial implementation of getmissing for a simple Rust pass through
data store. Future diffs will extend this to implement the union data store
completely in Rust.

Reviewed By: quark-zju

Differential Revision: D7632405

fbshipit-source-id: e660d33f8231410805cfaba6d77c56f27b002f8e
2018-05-14 12:05:13 -07:00
Durham Goode
f2b4d7f2e0 hg: implement getmeta() for Rust hgstore
Summary:
An initial implementation of getmeta for the Rust hg data store. Future
diffs will add more functionality.

Reviewed By: quark-zju

Differential Revision: D7632404

fbshipit-source-id: 53bd3b96b777bc3c5aef2b4d07ce1a9d9a5a52ed
2018-05-14 12:05:13 -07:00
Durham Goode
39dde8552d hg: implement getdeltachain() for Rust hgstore
Summary:
An initial implementation of getdeltachain for a simple pass through
data store. Future diffs will add additional functionality.

Reviewed By: quark-zju

Differential Revision: D7632407

fbshipit-source-id: 1a38089ba8ea70f8772af95afd871ee493082d80
2018-05-14 12:05:13 -07:00
Durham Goode
880ff5d0a9 hg: implement datastore.get() for Rust hgstore
Summary:
Implements the get function for a simple pass through rust data store
layer. Future diffs will implement more functions, and then later we will
implement the entire union data store in Rust.

Reviewed By: quark-zju

Differential Revision: D7632403

fbshipit-source-id: 3a1d0a8500e3110213d70dc1cff637cf8eadd809
2018-05-14 12:05:13 -07:00
Durham Goode
a97e97e413 hg: initial boiler plate for new hgstore crate
Summary:
This will contain all the Python centric hg store code that will let
Python call into the Rust storage layer.

Reviewed By: quark-zju

Differential Revision: D7632406

fbshipit-source-id: 6b7bcc8f47a23e9c0121e1f92de1137369bf584e
2018-05-14 12:05:12 -07:00
Jun Wu
a2e7d1cfe3 treestate: implement Rand for FileStateV2
Summary: This will be used in tests.

Differential Revision: D7769655

fbshipit-source-id: 27647685848c03f56740f49361dd286abdef8e33
2018-05-10 16:40:25 -07:00
Mateusz Kwapich
be9c5d754b fix pyflakes.t failure
Summary: Our CI didn't catch this when I landed the previous diff.

Reviewed By: singhsrb

Differential Revision: D7834066

fbshipit-source-id: a51c2a294ea550917836f8b1eede2570838b60b7
2018-05-01 13:44:40 -07:00
Mateusz Kwapich
626759e564 add short_list support to argparse
Summary:
We mark some hg commands as appearing in short help, let's add this
to our parser

Reviewed By: quark-zju

Differential Revision: D7779270

fbshipit-source-id: 0c2b790f1994205ae4dbf7cd12ac3ba7f5ef39ad
2018-05-01 04:26:26 -07:00
Mateusz Kwapich
cc3ed51d7f importer of command definitions from python hg to rust
Summary:
Let's check-in the definitions for now. In the future those should be generated
every build with all the extensions enabled - like in prod.

Reviewed By: quark-zju

Differential Revision: D7779273

fbshipit-source-id: f0d5c5260be74c5f64c0945004bf60399a6e8c4c
2018-05-01 04:26:26 -07:00
Jun Wu
8f8adff716 treestate: add getter and setter of watchman clock to TreeState
Summary: Previously they could not be changed.

Reviewed By: markbt

Differential Revision: D7769658

fbshipit-source-id: 4548eb90a82e9bd85fadf6a6f356cca7352fff0d
2018-04-30 19:10:45 -07:00
Jun Wu
224fe91344 treestate: add write methods to TreeState
Summary:
This allows writing `TreeState` state in two ways - save as a new file, or
incrementally update an existing file.

Reviewed By: markbt

Differential Revision: D7748822

fbshipit-source-id: 472b78af6cf7ea79968460a51ec824eaa96e4973
2018-04-30 19:10:45 -07:00
Jun Wu
a34b11ec8d treestate: make Tree write methods return BlockId
Summary: They are used by the next diff.

Reviewed By: markbt

Differential Revision: D7748834

fbshipit-source-id: 9562204975d83a8dce6eb80d2677387e24f8f0a0
2018-04-30 19:10:45 -07:00
Jun Wu
0923d108be treestate: add map-like operations to TreeState
Summary:
The method names are inspired by std HashMap. The types are slightly
different due to `Tree` implementation details.

Reviewed By: markbt

Differential Revision: D7748828

fbshipit-source-id: fc24481cdf0054c8e879d760082e192e52afc7f5
2018-04-30 19:10:45 -07:00
Jun Wu
b15a1b747f treestate: add Tree.get_mut
Summary:
The `Tree` object can return an `&mut` entry easily. Let's expose the
interface. This could be useful when the caller only wants to modify part of
the file state. For example, changing `copied` without touching anything
else.

Reviewed By: markbt

Differential Revision: D7748820

fbshipit-source-id: 430fa8ee310297c61866695a692134daf519e78d
2018-04-30 19:10:45 -07:00
Jun Wu
e07e4a99a1 treestate: add a TreeState struct
Summary:
Unlike TreeDirstate, this struct does not have two trees, and uses
FileStateV2.

Reviewed By: markbt

Differential Revision: D7748826

fbshipit-source-id: e637fad64e6b3e9b2a122e26a29fd04014181d6b
2018-04-30 19:10:45 -07:00
Jun Wu
9ce759a99a treestate: expose visit filtering via Tree.visit_advanced
Reviewed By: markbt

Differential Revision: D7748830

fbshipit-source-id: f3b41531e015fef90c01773ab65a4523ee72e7df
2018-04-25 17:38:20 -07:00
Jun Wu
53e7ab2a6c treestate: add file state filtering to Node.visit
Reviewed By: markbt

Differential Revision: D7748825

fbshipit-source-id: 2395ac8cc25fb4f4d3e6bdb5770616d859fcfab0
2018-04-25 17:38:20 -07:00
Jun Wu
23506bde19 treestate: merge Node.visit and Node.visit_changed
Summary:
They are similar. Merge into one single method. The `visit` method will be
extended to support other filtering features.

Reviewed By: markbt

Differential Revision: D7748829

fbshipit-source-id: 4388291945668a684808fe384341328ffd4ad2a8
2018-04-25 17:38:20 -07:00
Jun Wu
8cbe4d45c0 treestate: add serialization for Node<FileStateV2>
Reviewed By: markbt

Differential Revision: D7748832

fbshipit-source-id: bd7c6e8fce5b512068d86e16d441564e36565459
2018-04-25 17:38:20 -07:00
Jun Wu
b3da9a0262 treestate: add a compatibility layer for Node
Summary:
Allow `Node` type to work with both versions of file states. This is the
static dispatch approach that does not introduce runtime overhead.

Reviewed By: markbt

Differential Revision: D7748831

fbshipit-source-id: 4ac0386f9f93e55af1102b97a3510c8e872444a2
2018-04-25 17:38:20 -07:00
Jun Wu
cc0390192f treestate: add serialization for FileStateV2
Reviewed By: markbt

Differential Revision: D7748821

fbshipit-source-id: 56c7d7d81c86a8db05f6db2c8f6f02993cd07989
2018-04-25 17:38:20 -07:00
Jun Wu
75d50004f4 treestate: add serialization for StateFlags
Summary: It's just a thin wrapper around writing VLQ integers.
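
For readers unfamiliar with VLQ, a minimal sketch of a variable-length integer
codec follows. The byte order (least significant group first) and the
continuation-bit convention are assumptions in the style of LEB128; the
`vlqencoding` crate used here may differ, and `write_vlq` / `read_vlq` are
hypothetical names.

```rust
/// Append `value` to `out`, 7 bits per byte, high bit = "more bytes follow".
fn write_vlq(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80);
    }
}

/// Decode one integer; returns the value and the number of bytes consumed.
fn read_vlq(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate() {
        if i >= 10 {
            return None; // a u64 never needs more than 10 groups
        }
        value |= u64::from(byte & 0x7f) << (7 * i);
        if (byte & 0x80) == 0 {
            return Some((value, i + 1));
        }
    }
    None // ran out of input before the final byte
}
```

Small flag values fit in a single byte, which is why a thin wrapper over this
kind of encoding is enough for a bitflags field.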

Reviewed By: markbt

Differential Revision: D7748835

fbshipit-source-id: 53a302afe51d551e49ac341901e2767d1a044946
2018-04-25 17:38:20 -07:00
Jun Wu
b796901d0c treestate: add StateFlags to Node
Summary:
This field stores the pre-computed aggregated state that helps fast path
traversal - if a state does not match the aggregated state, we can now skip
an entire tree quickly.
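
A sketch of the fast path, under simplifying assumptions: the aggregated state
is the bitwise OR of everything below a node, and a query is a mask of wanted
bits. The `Entry` and `Node` types here are hypothetical and far simpler than
the real tree.

```rust
enum Entry {
    File(u8),
    Dir(Node),
}

struct Node {
    aggregated: u8, // OR of the flags of all files below this node
    entries: Vec<(String, Entry)>,
}

impl Node {
    fn new(entries: Vec<(String, Entry)>) -> Node {
        let mut aggregated = 0u8;
        for (_, entry) in &entries {
            aggregated |= match entry {
                Entry::File(flags) => *flags,
                Entry::Dir(node) => node.aggregated,
            };
        }
        Node { aggregated, entries }
    }

    /// Collect paths of files that have any of the `wanted` bits set.
    fn visit(&self, wanted: u8, prefix: &str, out: &mut Vec<String>) {
        if (self.aggregated & wanted) == 0 {
            return; // nothing below can match: skip the whole subtree
        }
        for (name, entry) in &self.entries {
            let path = format!("{}{}", prefix, name);
            match entry {
                Entry::File(flags) => {
                    if (*flags & wanted) != 0 {
                        out.push(path);
                    }
                }
                Entry::Dir(node) => node.visit(wanted, &format!("{}/", path), out),
            }
        }
    }
}
```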

Reviewed By: markbt

Differential Revision: D7748823

fbshipit-source-id: 4b81ef5b911b4a21fdd46f8845ec217a75f5af8c
2018-04-25 17:38:20 -07:00
Jun Wu
71b99ae067 treestate: define a new file state object
Summary: The new `FileState` has a state bitflags field and "copied" information.

Reviewed By: markbt

Differential Revision: D7748824

fbshipit-source-id: a68687764e1b0c13252cb914673f2b16fa22d4ef
2018-04-25 17:38:19 -07:00
Mateusz Kwapich
052d0e3708 Command trait in rust
Summary:
We (Aida and I) wanted to have something that could be used instead of the Command decorator that we have in Python,
and we came up with the following trait.

Differential Revision: D7754930

fbshipit-source-id: 15f412d07045e7d8b229801ec3094664f78f801b
2018-04-25 16:00:36 -07:00
Mateusz Kwapich
86d3de94e5 demo binary
Summary: Let's commit a stub of a demo binary depending on argparse that we'll work on this week

Differential Revision: D7752614

fbshipit-source-id: a811ea363d49e0fd56cc755b0abb74d89b4a3112
2018-04-25 16:00:36 -07:00
Mateusz Kwapich
21069062f4 argparse crate (derived from fbcode/scm/telemetry/telemetry/src)
Summary:
Let's turn wez's argparse library from telemetry into a separate
crate within hg/lib. We'll experiment on it and if things go well we'll make
`telemetry` depend on that.

Reviewed By: quark-zju

Differential Revision: D7752615

fbshipit-source-id: 0814d91d704abdb746894a0289bf082e8d799b73
2018-04-25 16:00:36 -07:00
Jun Wu
d15213f6f5 treedirstate: move non-Python part to a separate crate
Summary:
This makes it easier to modify and test the core logic without coupling with
the Python logic.

Reviewed By: markbt

Differential Revision: D7734012

fbshipit-source-id: 0d7b19198d85f6ca7314611256e9271be60070d1
2018-04-24 15:59:07 -07:00
Liubov Dmitrieva
c0ddf154fb watchman client: add missing crates
Summary: add missing vendor crates as initial step

Reviewed By: DurhamG

Differential Revision: D7738538

fbshipit-source-id: 9c1c55a62ce28da755f98cc5100e5958db064d77
2018-04-23 23:34:28 -07:00
Jun Wu
00a9659536 radixbuf: replace criterion with minibench
Summary: Similar to D7440249. See D7440254 for motivation.

Reviewed By: DurhamG

Differential Revision: D7562195

fbshipit-source-id: b11eb4f47375a2a2d70be96ebcfe2fefe1e0aaad
2018-04-17 18:54:39 -07:00
Jun Wu
3408c53051 vlqencoding: replace criterion with minibench
Summary: Similar to D7440249. See D7440254 for motivation.

Reviewed By: DurhamG

Differential Revision: D7562196

fbshipit-source-id: e90c623bd9576de49c3d4990ac93c105238d219c
2018-04-17 18:54:39 -07:00
Jun Wu
3ec8c3af00 minibench: simple test filtering support
Summary:
Now it's possible to filter tests like:

  cargo bench --bench index --verbose -- TEST_KEYWORD

Useful for profiling a specific test.

Reviewed By: DurhamG

Differential Revision: D7562174

fbshipit-source-id: 9c7fe13a0541bd3dda7a9c1acf95c91513b633f2
2018-04-17 18:54:39 -07:00
Jun Wu
40a88364be indexedlog: replace div with shr to make checksum faster
Summary:
Spot `div` slowness using Linux's `perf` tool.

        |    Disassembly of section .text:
        |
        |    0000000000018990 <indexedlog::checksum_table::ChecksumTable::check_range>:
        |    _ZN10indexedlog14checksum_table13ChecksumTable11check_range17h2303c96b1e035e20E():
   1.36 |      push   %rax
   0.18 |      mov    %rdx,%r8
   0.54 |      mov    $0x1,%cl
        |      test   %r8,%r8
        |      je     60
   0.54 |      add    %rsi,%r8
   0.72 |      cmp    0x30(%rdi),%r8
        |      ja     64
   0.27 |      mov    0x28(%rdi),%r9
   0.27 |      test   %r9,%r9
        |      je     6a
   0.36 |      add    $0xffffffffffffffff,%r8
   0.18 |      xor    %edx,%edx
   0.45 |      mov    %rsi,%rax
   0.36 |      div    %r9
  43.72 |      mov    %rax,%rsi
        |      xor    %edx,%edx
        |      mov    %r8,%rax
   0.18 |      div    %r9
  42.82 |      add    $0x1,%rax
   0.09 |      cmp    %rax,%rsi
        |      jae    60
   2.17 |      cmpq   $0x0,0x60(%rdi)
        |      je     78
        |      mov    0x50(%rdi),%rcx
        |      cmpb   $0x0,(%rcx)
   1.63 |      sete   %cl
   0.18 |      xchg   %ax,%ax
        |50:   test   $0x1,%cl
        |      je     64
   0.45 |      add    $0x1,%rsi
   0.81 |      mov    $0x1,%cl
   0.09 |      cmp    %rax,%rsi
        |      jb     50
        |60:   mov    %ecx,%eax
        |      pop    %rcx
   2.62 |      retq
        |64:   xor    %ecx,%ecx
        |      mov    %ecx,%eax
        |      pop    %rcx
        |      retq
        |6a:   lea    panic_loc.a.llvm.9800112514578621117,%rdi
        |      callq  core::panicking::panic
        |      ud2
        |78:   lea    panic_bounds_check_loc.7.llvm.9800112514578621117,%rdi
        |      xor    %esi,%esi
        |      xor    %edx,%edx
        |      callq  core::panicking::panic_bounds_check
        |      ud2

Change `chunk_size` to `chunk_size_log`. Replace `div` with `shr` to make it
significantly faster:

Before:

  index lookup (memory)           1.118 ms
  index lookup (disk, no verify)  2.078 ms
  index lookup (disk, verified)   7.687 ms

After:

  index lookup (memory)           1.066 ms
  index lookup (disk, no verify)  1.992 ms
  index lookup (disk, verified)   3.591 ms
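
The change can be sketched as follows. When the chunk size is a power of two,
dividing by it is the same as shifting right by its base-2 logarithm, and a
shift is far cheaper than a 64-bit `div`. The function names are hypothetical;
the arithmetic follows the disassembly above (divide `start`, divide
`end - 1`, add one), and both functions assume `end > 0`.

```rust
/// Chunk index range [first, last) covering bytes [start, end), using division.
fn chunk_range_div(start: u64, end: u64, chunk_size: u64) -> (u64, u64) {
    (start / chunk_size, (end - 1) / chunk_size + 1)
}

/// Same computation, with the chunk size given as log2(chunk_size).
fn chunk_range_shr(start: u64, end: u64, chunk_size_log: u32) -> (u64, u64) {
    (start >> chunk_size_log, ((end - 1) >> chunk_size_log) + 1)
}
```

The trade-off is that only power-of-two chunk sizes can be represented once
the logarithm is what gets stored.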

Reviewed By: DurhamG, markbt

Differential Revision: D7554992

fbshipit-source-id: c24189ced722d880af6ca0d64967eb762363d9e3
2018-04-17 18:54:39 -07:00
Jun Wu
f25c152d01 indexedlog: add a test about checksum
Summary:
Add a test that flips bits in the index content and makes sure that reading
the index triggers an error.

Due to run-time performance differences, the release version tests 2-byte keys
while the debug version only tests 1-byte keys.

The header byte was not verified. Now it is verified.

Reviewed By: DurhamG

Differential Revision: D7517134

fbshipit-source-id: b3d8665ff4ac08c1a70db8d21122ba241913a2ed
2018-04-17 18:54:39 -07:00
Jun Wu
9ce455769c indexedlog: avoid writing unused entries due to leaf split
Summary:
In "split_leaf" "Example 3" case, the old leaf entry (and its key) becomes
unused. Writing them to disk is unnecessary. This patch adds "unused" marker
so they could be marked and skipped inside flush().

No visible performance change:

  index insertion                 3.710 ms
  index flush                     3.717 ms
  index lookup (memory)           1.128 ms
  index lookup (disk, no verify)  1.993 ms
  index lookup (disk, verified)   7.866 ms

Reviewed By: DurhamG

Differential Revision: D7517139

fbshipit-source-id: 253c878bc4b3762382c424777dfa779b3868e851
2018-04-17 18:54:38 -07:00
Kostia Balytskyi
fa03821500 hg: fix test-checks broken in windows treemanifest stack
Reviewed By: quark-zju

Differential Revision: D7560133

fbshipit-source-id: 98b016d0911aaecc1058263c134a5e4ecd0be9e5
2018-04-13 21:51:50 -07:00
Kostia Balytskyi
171ca13ebe hg: add a dirent.h portability header
Summary:
This will either include system dirent on POSIX or a vendored dirent from
folly on Windows.

`/no-check-code` is here because it's everywhere across hg's .c codebase.

Differential Revision: D7555759

fbshipit-source-id: dc55926e83e17976930522277ed7fe6ce41f32f7
2018-04-13 21:51:50 -07:00
Kostia Balytskyi
6be12d9dae hg: install mman as a third-party dep before building hg
Summary: This is needed for `treemanifest`.

Differential Revision: D7555758

fbshipit-source-id: 24d7dac292a62b0f3cabed1cbc0cd39e0b19a470
2018-04-13 21:51:50 -07:00
Jun Wu
bdbf60f28d xdiff: backport upstream changes
Summary:
I made some extra xdiff changes upstream, namely:

  - Remove unused features
  - Replace "long" (32-bit in MSVC) with int64_t to support large files
  - Add comment on some key variables

This backports them. It also includes Matt's fixes about Windows compatibility.

Reviewed By: ryanmce

Differential Revision: D7223939

fbshipit-source-id: 9287d5be22dae4ab41b05b3a4c160d836b5714a6
2018-04-13 21:51:48 -07:00
Jun Wu
3ffa0f28e2 gitignore: avoid quadratic behavior
Summary:
The correct gitignore matcher needs O(N^2) time to check a path which is N
directories deep. For example, to check "a/b/c/d", it needs to check:

  - Whether .gitignore matches a/b/c/d
  - Whether a/.gitignore matches b/c/d
  - Whether a/b/.gitignore matches c/d
  - Whether a/b/c/.gitignore matches d

  - Whether .gitignore matches a/b/c
  - Whether a/.gitignore matches b/c
  - Whether a/b/.gitignore matches c

  - Whether .gitignore matches a/b
  - Whether a/.gitignore matches b

  - Whether .gitignore matches a

It might not look that bad because N=4 for the above example. But when N is
larger (ex. node_modules/../node_modules/../node_modules/..), things get much
worse.

This patch caches whether a directory is ignored or not. For example, if
"a/b/" is ignored, the new code skips checking its subdirectories (ex.
"a/b/c/"). The time complexity is now roughly O(N) gitignore tests instead of
O(N^2), since each parent directory of the path being tested is checked only
once and the result is then cached as a boolean value.

To be clear, the first check of a path which is not ignored still needs
O(N^2) time to initialize the trees. But once they are initialized, the next
check of a file in the same directory is O(N).
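
The caching idea can be sketched like this. It is a simplification, not the
matcher from this diff: the per-directory rule lookup is replaced by a single
hypothetical `matches` callback, and `IgnoreCache` with its `checks` counter
exists only to make the saved work visible.

```rust
use std::collections::HashMap;

struct IgnoreCache<F> {
    matches: F,                   // hypothetical (expensive) rule check
    cache: HashMap<String, bool>, // directory -> ignored?
    checks: usize,                // how many times `matches` was called
}

impl<F: Fn(&str) -> bool> IgnoreCache<F> {
    fn new(matches: F) -> Self {
        IgnoreCache { matches, cache: HashMap::new(), checks: 0 }
    }

    /// A directory is ignored if any parent is ignored or it matches itself.
    fn dir_ignored(&mut self, dir: &str) -> bool {
        if let Some(&hit) = self.cache.get(dir) {
            return hit;
        }
        let parent_ignored = match dir.rfind('/') {
            Some(pos) => self.dir_ignored(&dir[..pos]),
            None => false,
        };
        let ignored = if parent_ignored {
            true // no rule check needed below an ignored directory
        } else {
            self.checks += 1;
            (self.matches)(dir)
        };
        self.cache.insert(dir.to_string(), ignored);
        ignored
    }
}
```

With "a/b" ignored, checking "a/b/c/d" runs the rule check only for "a" and
"a/b"; everything deeper is answered from the parent's cached result.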

`LruCache` is replaced by `HashMap` since `LruCache` does not support `.get`,
which the code needs in order to work.

The perf issue was previously documented as a "PERF" comment.
This diff removes it.

Reviewed By: DurhamG

Differential Revision: D7496058

fbshipit-source-id: f10895b8f0d7dcdde6faf9daeec5cd78a1f15a2b
2018-04-13 21:51:48 -07:00
Jun Wu
ac52e4a6fb indexedlog: add a test against std hashmap for multi-values
Summary: Since we now have the ability to store multiple values, add a test.

Reviewed By: DurhamG

Differential Revision: D7472880

fbshipit-source-id: 85b1c69245ac7f0c4702daf22a02f5e5072f0924
2018-04-13 21:51:46 -07:00
Jun Wu
de74642bc7 indexedlog: implement value iterator
Summary:
The value type is a linked list of u64 integers. Add an API to expose that.

Using the iterator framework gives flexibility: the caller can easily take
the first value, convert the values to a vector, count them, etc.
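
A minimal sketch of such an iterator, assuming a simplified in-memory layout
where each link entry holds a value and the index of the next entry, with 0
meaning "end of list". `LinkEntry`, `ValueIter` and `values` are hypothetical
names; the real index reads entries from a buffer by offset.

```rust
struct LinkEntry {
    value: u64,
    next: usize, // index of the next entry; 0 terminates the list
}

struct ValueIter<'a> {
    entries: &'a [LinkEntry],
    offset: usize,
}

/// Start iterating at `head`; slot 0 is reserved as the null entry.
fn values(entries: &[LinkEntry], head: usize) -> ValueIter<'_> {
    ValueIter { entries, offset: head }
}

impl<'a> Iterator for ValueIter<'a> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.offset == 0 {
            return None;
        }
        let entry = self.entries.get(self.offset)?;
        self.offset = entry.next;
        Some(entry.value)
    }
}
```

Because it implements `Iterator`, adapters such as `next`, `collect` and
`count` come for free.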

Reviewed By: DurhamG

Differential Revision: D7472881

fbshipit-source-id: d31e81770e069734b54fa08729c0cd45a699aae2
2018-04-13 21:51:46 -07:00
Jun Wu
cc4193ba29 indexedlog: handle radix null child correctly
Summary:
This is caught by a later test. Looking up a nonexistent child (jumptable
value is 0) returned an InvalidData error, while it should return Offset(0).

The added if condition does not seem to have noticeable performance impact:

  index insertion                 3.840 ms
  index flush                     3.740 ms
  index lookup (memory)           1.085 ms
  index lookup (disk, no verify)  1.972 ms
  index lookup (disk, verified)   7.752 ms

Reviewed By: DurhamG

Differential Revision: D7472882

fbshipit-source-id: 1cc51e9afa248e123cca9c561d7bb2128fd898b1
2018-04-13 21:51:46 -07:00
Jun Wu
b82b0daab5 indexedlog: make LinkOffset also return next link offset
Summary:
Previously, the code focused on getting the hardest (index) part right, and
less on the value part. There is not yet a way to get all values in the
linked list, as designed. This diff starts the work.

Similar to `KeyOffset::key_and_link_offset`, change the internal API of
LinkOffset to return both value and the next link offset.

Reviewed By: DurhamG

Differential Revision: D7472879

fbshipit-source-id: 4a4512d7c63abbb667146de582e0f8cd04c9c04a
2018-04-13 21:51:46 -07:00
Jun Wu
b9b1f1e907 indexedlog: use OpenOptions
Summary:
`Index::open` now takes too many parameters, which is not very convenient to
use. Inspired by `fs::OpenOptions`, use a dedicated struct for specifying
open options.

Motivation: To test checksum ability more confidently, I'd like to write
something that randomly mutates 1 byte from a sane index. To make sure the
checksum coverage is "correct", checksum chunk size is another parameter.
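
A sketch of the builder shape that `fs::OpenOptions` suggests. The field and
method names below are hypothetical and only illustrate the pattern; the real
struct's options are not listed in this message, and the final `open` call is
left out.

```rust
// Hypothetical options; chosen only to illustrate the builder pattern.
#[derive(Clone, Debug, PartialEq)]
struct OpenOptions {
    checksum_chunk_size: u64, // 0 = checksums disabled in this sketch
    write: Option<bool>,      // None = decide from file permissions
}

impl OpenOptions {
    fn new() -> Self {
        OpenOptions { checksum_chunk_size: 0, write: None }
    }

    /// Each setter returns `&mut Self` so calls can be chained.
    fn checksum_chunk_size(&mut self, size: u64) -> &mut Self {
        self.checksum_chunk_size = size;
        self
    }

    fn write(&mut self, value: Option<bool>) -> &mut Self {
        self.write = value;
        self
    }
}
```

New parameters, such as the checksum chunk size, then become one more setter
instead of one more positional argument at every call site.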

Reviewed By: DurhamG

Differential Revision: D7464182

fbshipit-source-id: 469ce7d1cfa5de3946028418567a9f3e2bc303fa
2018-04-13 21:51:46 -07:00
Jun Wu
6cb2b1dd23 indexedlog: make OffsetMap::get have no assumption about offset
Summary:
Address DurhamG's review comment on D7422832.

Previously, `OffsetMap::get` expects a dirty offset. That's because it was
changed from `HashMap` and we don't control `HashMap::get`. It's cleaner to
let `OffsetMap` do the `is_dirty` check.

Reviewed By: DurhamG

Differential Revision: D7461707

fbshipit-source-id: 9f2abdf6c6f993d98d9443f16bafcc6154ee0dbb
2018-04-13 21:51:46 -07:00
Jun Wu
9787cfc15b indexedlog: add more tests about leaf split
Summary:
The new test covers the `else` branch inside `LeafOffset::set_link`
previously not covered.

Coverage was checked by the following script:

```
from __future__ import absolute_import

import glob
import os
import shutil

os.system('cargo rustc --lib --profile test -- -Ccodegen-units=1 -Clink-dead-code -Zno-landing-pads')
path = max((os.stat(path).st_mtime, path) for path in glob.glob('./target/debug/*-????????????????'))[1]
shutil.rmtree('target/kcov')
os.system('kcov --include-path $PWD/src --verify target/kcov %s' % path)
```

Reviewed By: DurhamG

Differential Revision: D7446902

fbshipit-source-id: 293da2ff53b83c8f11534f0f8e5e7fd102216a01
2018-04-13 21:51:46 -07:00
Jun Wu
5209e8360b indexedlog: support external keys
Summary:
Change `insert_advanced` to accept an enum that could be either a key, or an
(offset, len) that refers to the external key buffer.

Insertion becomes slower due to new flexibility overhead.  For some reason,
"index lookup (no verify)" becomes faster (restores pre-D7440248 performance):

  index insertion                 6.434 ms
  index flush                     3.757 ms
  index lookup (memory)           1.068 ms
  index lookup (disk, no verify)  1.969 ms
  index lookup (disk, verified)   7.805 ms

With 2M 20-byte keys, the non-external key version generates a 105MB index:

  seconds operation
  1.247   insert
  0.622   flush
  1.859   flush done
  0.702   lookup (without checksum)
  1.395   lookup (with checksum)

Using external keys, the index is 70MB, and the time for each operation is:

  seconds operation
  1.086   insert
  0.702   flush
  0.665   lookup (without checksums)
  1.602   lookup (with checksums)

External keys will save more space for longer keys, ex. file paths.

`Index` module was made public so `InsertKey` type is usable.
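
A sketch of what a key-or-reference enum could look like. The variant names
and the `resolve` helper are assumptions made for illustration; the summary
only says the enum is either a key or an (offset, len) pair pointing into the
external key buffer.

```rust
/// Either the key bytes themselves, or a slice of an external key buffer.
enum InsertKey<'a> {
    Embed(&'a [u8]),
    Reference { offset: usize, len: usize },
}

/// Resolve a key to its bytes; `None` if the reference is out of bounds.
fn resolve<'a>(key: &InsertKey<'a>, key_buf: &'a [u8]) -> Option<&'a [u8]> {
    match *key {
        InsertKey::Embed(bytes) => Some(bytes),
        InsertKey::Reference { offset, len } => {
            let end = offset.checked_add(len)?;
            key_buf.get(offset..end)
        }
    }
}
```

With a reference, the index stores only two integers per key, which is where
the space saving for long keys comes from.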

Reviewed By: DurhamG

Differential Revision: D7444907

fbshipit-source-id: b89d95246845799c2c55fb73ad203a7e6724b85e
2018-04-13 21:51:46 -07:00
Jun Wu
36dfda984c indexedlog: relax leaf entry's key offset type
Summary:
Previously, a leaf entry could only have a `KeyOffset`. This diff allows it
to have either a `KeyOffset` or an `ExtKeyOffset`. The API didn't change
much since `LeafOffset::key_and_link_offset` handles the difference
transparently.

Latest benchmark result:

  index insertion                 4.879 ms
  index flush                     3.620 ms
  index lookup (memory)           1.827 ms
  index lookup (disk, no verify)  3.508 ms
  index lookup (disk, verified)   7.861 ms

Reviewed By: DurhamG

Differential Revision: D7444909

fbshipit-source-id: 5441e1ae187d42931377d7213dcb77156b2af714
2018-04-13 21:51:46 -07:00
Jun Wu
44a0998bc6 indexedlog: let leaf entry return key content
Summary:
The leaf entry has a `key_and_link_offset` method. Previously it returned a
`KeyOffset`. Since we now have `ExtKeyOffset`, it's friendlier to handle the
key entry type difference at the leaf entry level, instead of requiring the
caller to handle it.

Reviewed By: DurhamG

Differential Revision: D7444905

fbshipit-source-id: 56d87641a2a5a50ddca8b1e4c74c9aaa3891b542
2018-04-13 21:51:46 -07:00
Jun Wu
1294c1b471 indexedlog: add an "external key" entry type
Summary:
Previously, I thought there would be only one index that uses "commit hash" as
keys, that is the nodemap, and other indexes (like childmap) would just use
shorter integer keys (ex. revision number, or offsets). So the space overhead
of storing full keys only applies to one index and seems acceptable.

But that implies strict topo order for the source of truth data (ex. to use
integers as keys in childmap, you have to know how to translate parent
revisions from hashes to integers at the time writing the revision).

Thinking about it again, it seems the topo-order requirement would make a lot
of things less flexible. It's much easier to just use hashes as keys in the
index. Then it's worthwhile to address the space efficiency problem by
introducing an "external key buffer" concept. That's actually what `radixbuf`
does.

This is the start. It adds the type to the struct. The feature is not complete
yet.

Reviewed By: DurhamG

Differential Revision: D7444904

fbshipit-source-id: 60a83c9e6e8b0734450f0c5827928a7c5bd111d5
2018-04-13 21:51:45 -07:00
Jun Wu
5e828307f4 indexedlog: verify checksum for all reads
Summary:
It further slows down lookups, even when checksum is disabled, since even an
`is_none()` check is not free:

  index insertion                 4.697 ms
  index flush                     3.764 ms
  index lookup (memory)           2.878 ms
  index lookup (disk, no verify)  3.564 ms
  index lookup (disk, verified)   7.788 ms

The "verified" version basically needs 2x time due to more memory lookups.

Unfortunately this means eventual lookup performance will be slower than
gdbm, but insertion is still much faster. And the index still has better
locking properties (lock-free read) that gdbm does not have.

With correct time complexity (no O(len(changelog)) index-only operations for
example), I'd expect it's rare for the overall performance to be bounded by
index performance. Data integrity is more important.

With a larger number of nodes, ex. 2M 20-byte strings: inserting to memory
takes 1.4 seconds, flushing to disk takes 0.9 seconds, looking up without
checksum takes 0.9 seconds, looking up with checksum takes 1.7 seconds.

Reviewed By: DurhamG

Differential Revision: D7440248

fbshipit-source-id: 020e5204606f9f0a4f68843a491009a6a6f75751
2018-04-13 21:51:42 -07:00
Jun Wu
ca8f60eb0a indexedlog: verify checksum for type bytes
Summary:
This is in the critical path for lookup, and has very visible performance
penalty:

  index insertion                 3.923 ms
  index flush                     3.921 ms
  index lookup (memory)           1.070 ms
  index lookup (disk, no verify)  1.980 ms
  index lookup (disk, verified)   5.206 ms

Reviewed By: DurhamG

Differential Revision: D7440252

fbshipit-source-id: 49540f974faff1cdd0603a72328f141ccd054ee2
2018-04-13 21:51:42 -07:00
Jun Wu
55fc90dfea indexedlog: verify checksum for Mem* structs
Summary:
Previously the checksum was verified only for `MemRoot`; now it is verified for all `Mem*` structs.
Since `Mem*` structs are not frequently used in the normal lookup code path,
there is no visible performance change.

Reviewed By: DurhamG

Differential Revision: D7440253

fbshipit-source-id: 945f5a8c38d228f59190a487b0cf6dbc5daac4f7
2018-04-13 21:51:42 -07:00
Jun Wu
a7e3e7884d indexedlog: add a type alias for Option<ChecksumTable>
Summary:
The type will be used all over the place and may make `rustfmt` wrap lines.
Use a shorter type to make it slightly cleaner.
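
A minimal sketch of what such an alias could look like. The alias name `Checksum`, the fields of `ChecksumTable`, and the `verify` helper are assumptions for illustration; FNV-1a is used here only to keep the example dependency-free, and `chunk_size` is assumed to be non-zero:

```rust
/// Hypothetical stand-in for `ChecksumTable`: one FNV-1a checksum per
/// fixed-size chunk of the buffer.
struct ChecksumTable {
    chunk_size: usize,
    sums: Vec<u64>,
}

/// Short alias so function signatures stay on one line.
type Checksum = Option<ChecksumTable>;

fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

impl ChecksumTable {
    fn build(buf: &[u8], chunk_size: usize) -> Self {
        let sums = buf.chunks(chunk_size).map(fnv1a).collect();
        ChecksumTable { chunk_size, sums }
    }

    /// Check every chunk overlapping `start..start + len`.
    fn check_range(&self, buf: &[u8], start: usize, len: usize) -> bool {
        if len == 0 {
            return true;
        }
        let end = match start.checked_add(len) {
            Some(e) if e <= buf.len() => e,
            _ => return false,
        };
        let first = start / self.chunk_size;
        let last = (end - 1) / self.chunk_size;
        (first..=last).all(|i| {
            let lo = i * self.chunk_size;
            let hi = (lo + self.chunk_size).min(buf.len());
            self.sums.get(i) == Some(&fnv1a(&buf[lo..hi]))
        })
    }
}

/// With checksums disabled (`None`) every read is accepted.
fn verify(checksum: &Checksum, buf: &[u8], start: usize, len: usize) -> bool {
    match checksum {
        Some(table) => table.check_range(buf, start, len),
        None => true,
    }
}
```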

Reviewed By: DurhamG

Differential Revision: D7436338

fbshipit-source-id: ecaada23916a22658f65669b748632a077e60df2
2018-04-13 21:51:42 -07:00
Jun Wu
bfd8e33370 indexedlog: verify checksum for root entry
Summary:
This only affects `Index::open` right now. So it's a one time check and does
not affect performance.

Reviewed By: DurhamG

Differential Revision: D7436341

fbshipit-source-id: 30313064bf2ea50320ac744fc18c03bff4b12c89
2018-04-13 21:51:42 -07:00
Jun Wu
a0cec9853c indexedlog: add checksum table to index struct
Summary:
Add `ChecksumTable` to the `Index` struct. But it's not functional yet.
The checksum will mainly affect "index lookup (disk)" case. Add another
benchmark for showing the difference with checksum on and off. They do not
have much difference right now:

  index insertion                 3.756 ms
  index flush                     3.469 ms
  index lookup (memory)           0.990 ms
  index lookup (disk, no verify)  1.768 ms
  index lookup (disk, verified)   1.766 ms

Reviewed By: DurhamG

Differential Revision: D7436339

fbshipit-source-id: 60a6554a2c96067a53ce9e1753cd51d0d61c0bea
2018-04-13 21:51:42 -07:00
Jun Wu
8d7d4de8ee indexedlog: separate benchmarks
Summary:
The minibench framework does not provide benchmark filtering. So let's
separate benchmarks using different entry points.

Reviewed By: DurhamG

Differential Revision: D7440250

fbshipit-source-id: 11e7790a5074ebf4c08e33c312a490a66a921926
2018-04-13 21:51:42 -07:00
Jun Wu
d86adc417e indexedlog: remove "index clone" benchmarks
Summary:
The "clone" benchmarks were added to be subtracted from "lookup" to
work around the test framework limitation.

The new minibench framework makes it easier to exclude preparation cost.
Therefore the clone benchmarks are no longer needed.

  index insertion                 3.881 ms
  index flush                     3.286 ms
  index lookup (memory)           0.928 ms
  index lookup (disk)             1.685 ms

"index lookup (memory)" is basically "index lookup (memory)" minus
"index clone (memory)" in previous benchmarks.

Reviewed By: DurhamG

Differential Revision: D7440251

fbshipit-source-id: 0e6a1fb7ee64f9a393ee9ada4db6e6eb052e20bf
2018-04-13 21:51:42 -07:00
Jun Wu
9b9dd289e4 indexedlog: use minibench to do benchmark
Summary:
See the previous minibench diff for the motivation.

"failure" was removed from build dependencies since it's not used yet.

Run benchmark a few times. It seems the first several items are less stable,
possibly due to warm-up issues. Otherwise the result looks good enough.
The test also compiles and runs much faster.

```
base16 iterating 1M bytes       0.921 ms
index insertion                 4.804 ms
index flush                     5.104 ms
index lookup (memory)           2.929 ms
index lookup (disk)             1.767 ms
index clone (memory)            2.036 ms
index clone (disk)              0.010 ms

base16 iterating 1M bytes       0.853 ms
index insertion                 4.512 ms
index flush                     4.717 ms
index lookup (memory)           2.907 ms
index lookup (disk)             1.755 ms
index clone (memory)            1.856 ms
index clone (disk)              0.010 ms

base16 iterating 1M bytes       1.525 ms
index insertion                 4.577 ms
index flush                     4.901 ms
index lookup (memory)           2.800 ms
index lookup (disk)             1.790 ms
index clone (memory)            1.794 ms
index clone (disk)              0.010 ms

base16 iterating 1M bytes       0.768 ms
index insertion                 4.486 ms
index flush                     4.918 ms
index lookup (memory)           2.658 ms
index lookup (disk)             1.721 ms
index clone (memory)            1.763 ms
index clone (disk)              0.010 ms

base16 iterating 1M bytes       0.732 ms
index insertion                 4.489 ms
index flush                     4.792 ms
index lookup (memory)           2.689 ms
index lookup (disk)             1.739 ms
index clone (memory)            1.850 ms
index clone (disk)              0.009 ms

base16 iterating 1M bytes       1.124 ms
index insertion                 7.188 ms
index flush                     4.888 ms
index lookup (memory)           2.829 ms
index lookup (disk)             1.609 ms
index clone (memory)            2.642 ms
index clone (disk)              0.010 ms

base16 iterating 1M bytes       1.055 ms
index insertion                 4.683 ms
index flush                     4.996 ms
index lookup (memory)           2.782 ms
index lookup (disk)             1.710 ms
index clone (memory)            1.802 ms
index clone (disk)              0.009 ms
```

Reviewed By: DurhamG

Differential Revision: D7440249

fbshipit-source-id: 0f946ab184455acd40c5a38cf46ff94d9e3755c8
2018-04-13 21:51:42 -07:00
Jun Wu
f9fb60337a minibench: add a simple library to do benchmark
Summary:
It's sad to find that existing Rust benchmark frameworks do not fit well in
our simple benchmark purpose. The benchmark library shipped with Rust [1] has
been in "nightly-only" for long. Third-party choices like "criterion.rs" do
too many things and miss certain small features. Namely, indexedlog wants:

  - More stable benchmark result. This means not picking the average time,
    but the "best" time among all runs, like what Mercurial does.
  - Do not measure setup cost from repetitive runs. As in D7404532, do not
    clone the index, and do not have separate "clone" benchmarks.
  - Faster benchmarks. This means getting rid of unused parts like calling
    gnuplot.
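
The first two points in the list above can be sketched roughly as follows. This is not the minibench API; `best_of` and its parameters are hypothetical names, and only `std` is used:

```rust
use std::time::{Duration, Instant};

/// Run `setup` before every round (untimed), time only `run`, and
/// return the best (smallest) elapsed time over `rounds` rounds.
/// Returns `None` when `rounds` is zero.
fn best_of<S, T, R>(rounds: usize, mut setup: S, mut run: R) -> Option<Duration>
where
    S: FnMut() -> T,
    R: FnMut(T),
{
    let mut best: Option<Duration> = None;
    for _ in 0..rounds {
        let input = setup(); // preparation cost is excluded from timing
        let start = Instant::now();
        run(input);
        let elapsed = start.elapsed();
        best = Some(match best {
            Some(b) if b <= elapsed => b,
            _ => elapsed,
        });
    }
    best
}
```

Picking the minimum instead of the mean makes the result less sensitive to occasional slow rounds caused by unrelated system activity.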

Besides, keeping the test framework lightweight also helps compilation
time. Looking at `indexedlog`'s dependencies (with unused "failure"
removed), 70% of them are from `criterion.rs`.

```
indexedlog v0.1.0 (lib/indexedlog)
[dependencies]
|-- atomicwrites v0.1.5
|   [dependencies]
|   |-- nix v0.9.0
|   |   [dependencies]
|   |   |-- bitflags v0.9.1
|   |   |-- cfg-if v0.1.2
|   |   |-- libc v0.2.39
|   |   `-- void v1.0.2
|   `-- tempdir v0.3.6
|       [dependencies]
|       |-- rand v0.4.2
|       |   [dependencies]
|       |   `-- libc v0.2.39 (*)
|       `-- remove_dir_all v0.3.0
|           [dependencies]
|           |-- kernel32-sys v0.2.2
|           |   [dependencies]
|           |   `-- winapi v0.2.8
|           |   [build-dependencies]
|           |   `-- winapi-build v0.1.1
|           `-- winapi v0.2.8 (*)
|-- byteorder v1.2.1
|-- fs2 v0.4.3
|   [dependencies]
|   `-- libc v0.2.39 (*)
|-- memmap v0.6.2
|   [dependencies]
|   `-- libc v0.2.39 (*)
|-- twox-hash v1.1.0
|   [dependencies]
|   `-- rand v0.3.22
|       [dependencies]
|       |-- libc v0.2.39 (*)
|       `-- rand v0.4.2 (*)
`-- vlqencoding v0.1.0 (lib/vlqencoding)
[dev-dependencies]
|-- criterion v0.2.1
|   [dependencies]
|   |-- atty v0.2.8
|   |   [dependencies]
|   |   `-- libc v0.2.39 (*)
|   |-- clap v2.31.1
|   |   [dependencies]
|   |   |-- ansi_term v0.11.0
|   |   |-- atty v0.2.8 (*)
|   |   |-- bitflags v1.0.1
|   |   |-- strsim v0.7.0
|   |   |-- textwrap v0.9.0
|   |   |   [dependencies]
|   |   |   `-- unicode-width v0.1.4
|   |   |-- unicode-width v0.1.4 (*)
|   |   `-- vec_map v0.8.0
|   |-- criterion-plot v0.2.1
|   |   [dependencies]
|   |   |-- byteorder v1.2.1 (*)
|   |   |-- cast v0.2.2
|   |   `-- itertools v0.7.7
|   |       [dependencies]
|   |       `-- either v1.4.0
|   |-- criterion-stats v0.2.1
|   |   [dependencies]
|   |   |-- cast v0.2.2 (*)
|   |   |-- num-traits v0.2.1
|   |   |-- num_cpus v1.8.0
|   |   |   [dependencies]
|   |   |   `-- libc v0.2.39 (*)
|   |   |-- rand v0.4.2 (*)
|   |   `-- thread-scoped v1.0.2
|   |-- failure v0.1.1
|   |   [dependencies]
|   |   |-- backtrace v0.3.5
|   |   |   [dependencies]
|   |   |   |-- backtrace-sys v0.1.16
|   |   |   |   [dependencies]
|   |   |   |   `-- libc v0.2.39 (*)
|   |   |   |   [build-dependencies]
|   |   |   |   `-- cc v1.0.8
|   |   |   |-- cfg-if v0.1.2 (*)
|   |   |   |-- libc v0.2.39 (*)
|   |   |   `-- rustc-demangle v0.1.7
|   |   `-- failure_derive v0.1.1
|   |       [dependencies]
|   |       |-- quote v0.3.15
|   |       |-- syn v0.11.11
|   |       |   [dependencies]
|   |       |   |-- quote v0.3.15 (*)
|   |       |   |-- synom v0.11.3
|   |       |   |   [dependencies]
|   |       |   |   `-- unicode-xid v0.0.4
|   |       |   `-- unicode-xid v0.0.4 (*)
|   |       `-- synstructure v0.6.1
|   |           [dependencies]
|   |           |-- quote v0.3.15 (*)
|   |           `-- syn v0.11.11 (*)
|   |-- failure_derive v0.1.1 (*)
|   |-- handlebars v0.31.0
|   |   [dependencies]
|   |   |-- lazy_static v1.0.0
|   |   |-- log v0.4.1
|   |   |   [dependencies]
|   |   |   `-- cfg-if v0.1.2 (*)
|   |   |-- pest v1.0.6
|   |   |-- pest_derive v1.0.6
|   |   |   [dependencies]
|   |   |   |-- pest v1.0.6 (*)
|   |   |   |-- quote v0.3.15 (*)
|   |   |   `-- syn v0.11.11 (*)
|   |   |-- quick-error v1.2.1
|   |   |-- regex v0.2.10
|   |   |   [dependencies]
|   |   |   |-- aho-corasick v0.6.4
|   |   |   |   [dependencies]
|   |   |   |   `-- memchr v2.0.1
|   |   |   |       [dependencies]
|   |   |   |       `-- libc v0.2.39 (*)
|   |   |   |-- memchr v2.0.1 (*)
|   |   |   |-- regex-syntax v0.5.3
|   |   |   |   [dependencies]
|   |   |   |   `-- ucd-util v0.1.1
|   |   |   |-- thread_local v0.3.5
|   |   |   |   [dependencies]
|   |   |   |   |-- lazy_static v1.0.0 (*)
|   |   |   |   `-- unreachable v1.0.0
|   |   |   |       [dependencies]
|   |   |   |       `-- void v1.0.2 (*)
|   |   |   `-- utf8-ranges v1.0.0
|   |   |-- serde v1.0.33
|   |   `-- serde_json v1.0.11
|   |       [dependencies]
|   |       |-- dtoa v0.4.2
|   |       |-- itoa v0.3.4
|   |       |-- num-traits v0.2.1 (*)
|   |       `-- serde v1.0.33 (*)
|   |-- itertools v0.7.7 (*)
|   |-- itertools-num v0.1.1
|   |   [dependencies]
|   |   `-- num-traits v0.1.43
|   |       [dependencies]
|   |       `-- num-traits v0.2.1 (*)
|   |-- log v0.4.1 (*)
|   |-- serde v1.0.33 (*)
|   |-- serde_derive v1.0.33
|   |   [dependencies]
|   |   |-- proc-macro2 v0.2.3
|   |   |   [dependencies]
|   |   |   `-- unicode-xid v0.1.0
|   |   |-- quote v0.4.2
|   |   |   [dependencies]
|   |   |   `-- proc-macro2 v0.2.3 (*)
|   |   |-- serde_derive_internals v0.21.0
|   |   |   [dependencies]
|   |   |   |-- proc-macro2 v0.2.3 (*)
|   |   |   `-- syn v0.12.14
|   |   |       [dependencies]
|   |   |       |-- proc-macro2 v0.2.3 (*)
|   |   |       |-- quote v0.4.2 (*)
|   |   |       `-- unicode-xid v0.1.0 (*)
|   |   `-- syn v0.12.14 (*)
|   |-- serde_json v1.0.11 (*)
|   `-- simplelog v0.5.0
|       [dependencies]
|       |-- chrono v0.4.0
|       |   [dependencies]
|       |   |-- num v0.1.42
|       |   |   [dependencies]
|       |   |   |-- num-integer v0.1.36
|       |   |   |   [dependencies]
|       |   |   |   `-- num-traits v0.2.1 (*)
|       |   |   |-- num-iter v0.1.35
|       |   |   |   [dependencies]
|       |   |   |   |-- num-integer v0.1.36 (*)
|       |   |   |   `-- num-traits v0.2.1 (*)
|       |   |   `-- num-traits v0.2.1 (*)
|       |   `-- time v0.1.39
|       |       [dependencies]
|       |       `-- libc v0.2.39 (*)
|       |       [dev-dependencies]
|       |       `-- winapi v0.3.4
|       |-- log v0.4.1 (*)
|       `-- term v0.4.6
|-- quickcheck v0.6.2
|   [dependencies]
|   |-- env_logger v0.5.6
|   |   [dependencies]
|   |   |-- atty v0.2.8 (*)
|   |   |-- humantime v1.1.1
|   |   |   [dependencies]
|   |   |   `-- quick-error v1.2.1 (*)
|   |   |-- log v0.4.1 (*)
|   |   |-- regex v0.2.10 (*)
|   |   `-- termcolor v0.3.5
|   |-- log v0.4.1 (*)
|   `-- rand v0.4.2 (*)
|-- rand v0.4.2 (*)
`-- tempdir v0.3.6 (*)
```

[1]: https://github.com/rust-lang/rust/issues/29553

Reviewed By: DurhamG

Differential Revision: D7440254

fbshipit-source-id: 53cdbd470945388db96702ab771a3f73b456da37
2018-04-13 21:51:42 -07:00
Jun Wu
8bcff92cab indexedlog: use a dedicated map type for offset translation
Summary:
The dirty -> non-dirty offset mapping can be optimized using a dedicated
"map" type that is backed by `vec`s, because dirty offsets are continuous
per type.
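
A sketch of the idea, assuming three hypothetical entry kinds and that each kind's dirty entries are numbered 0, 1, 2, and so on. The names `Kind` and `OffsetMap` are made up for illustration and are not the types used in the crate:

```rust
/// Hypothetical entry kinds; the real index has its own set of types.
#[derive(Clone, Copy)]
enum Kind {
    Radix = 0,
    Leaf = 1,
    Key = 2,
}

/// Maps (kind, dirty index) to an on-disk offset using one `Vec` per
/// kind instead of a general-purpose hash map.
struct OffsetMap {
    tables: [Vec<Option<u64>>; 3],
}

impl OffsetMap {
    /// `counts[k]` is the number of dirty entries of kind `k`.
    fn new(counts: [usize; 3]) -> Self {
        OffsetMap {
            tables: [
                vec![None; counts[0]],
                vec![None; counts[1]],
                vec![None; counts[2]],
            ],
        }
    }

    /// Record where a dirty entry was written. Panics if `index` is
    /// outside the count given to `new`.
    fn set(&mut self, kind: Kind, index: usize, offset: u64) {
        self.tables[kind as usize][index] = Some(offset);
    }

    /// Look up the on-disk offset, if the entry has been written.
    fn get(&self, kind: Kind, index: usize) -> Option<u64> {
        self.tables[kind as usize].get(index).copied().flatten()
    }
}
```

Because the indexes are dense, a lookup is a bounds check plus an array read, with no hashing involved.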

This makes "flush" significantly faster:

```
index flush             time:   [5.8808 ms 6.1800 ms 6.4813 ms]
                        change: [-62.250% -59.481% -56.325%] (p = 0.00 < 0.05)
                        Performance has improved.
```

Reviewed By: DurhamG

Differential Revision: D7422832

fbshipit-source-id: 9ab8a70d1663155941dae5b4f02f7452f5e3cadf
2018-04-13 21:51:42 -07:00
Jun Wu
00503a6d94 indexedlog: avoid a memory allocation
Summary:
It seems to improve the performance a bit:

```
index insertion         time:   [5.4643 ms 5.6818 ms 5.9188 ms]
                        change: [-24.526% -17.384% -10.315%] (p = 0.00 < 0.05)
                        Performance has improved.
```

Reviewed By: DurhamG

Differential Revision: D7422831

fbshipit-source-id: fc1c72f402258db7e189cd8724583757d48affb7
2018-04-13 21:51:42 -07:00
Jun Wu
4cb2cc1abb indexedlog: use Box<[u8]> instead of Vec<u8>
Summary:
For key entries, the key is immutable once stored. So just use `Box<[u8]>`.
It saves a `usize` per entry. On 64-bit platforms, that is 8 bytes per entry, which adds up.
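
The size difference can be checked with `std::mem::size_of`. The helper names `handle_sizes` and `freeze` below are only for illustration:

```rust
use std::mem::size_of;

/// Size in bytes of the handle itself (not the heap data) for
/// `Vec<u8>` and `Box<[u8]>`, in that order.
fn handle_sizes() -> (usize, usize) {
    (size_of::<Vec<u8>>(), size_of::<Box<[u8]>>())
}

/// Turn a growable buffer into a fixed-length boxed slice,
/// dropping the spare capacity field.
fn freeze(key: Vec<u8>) -> Box<[u8]> {
    key.into_boxed_slice()
}
```

A `Box<[u8]>` is a pointer plus a length, while a `Vec<u8>` also carries a capacity.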

Performance is slightly improved and it catches up with D7404532 before
typed offset refactoring now:

  index insertion         time:   [6.1852 ms 6.6598 ms 7.2433 ms]
  index flush             time:   [15.814 ms 16.538 ms 17.235 ms]
  index lookup (memory)   time:   [3.7636 ms 3.9403 ms 4.1424 ms]
  index lookup (disk)     time:   [1.9413 ms 2.0366 ms 2.1325 ms]
  index clone (memory)    time:   [2.6952 ms 2.9221 ms 3.0968 ms]
  index clone (disk)      time:   [5.0296 us 5.2862 us 5.5629 us]

Reviewed By: DurhamG

Differential Revision: D7422837

fbshipit-source-id: 4aabfdc028aefb8e796803e103f0b2e4965f84e6
2018-04-13 21:51:42 -07:00
Jun Wu
36793b7c14 indexedlog: simplify insert_advanced API
Summary:
Previously, both `value` and `link` are optional in `insert_advanced`.
This diff makes `value` required.

`maybe_create_link_entry` becomes unused and is removed.

No visible performance change.

Reviewed By: DurhamG

Differential Revision: D7422838

fbshipit-source-id: 8d7d3cc1cc325f6fea7e8ce996d0a43d3ee49839
2018-04-13 21:51:41 -07:00
Jun Wu
892fcd6dfd indexedlog: use typed offsets
Summary:
This is a large refactoring that replaces `u64` offsets with strong typed
ones.
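
A minimal sketch of the newtype approach, under the assumption that the entry kind is decided from a type byte. The type names, the `classify` function and the constant values are hypothetical and not taken from the crate:

```rust
/// Offset known to point at a radix entry.
#[derive(Clone, Copy, Debug, PartialEq)]
struct RadixOffset(u64);

/// Offset known to point at a leaf entry.
#[derive(Clone, Copy, Debug, PartialEq)]
struct LeafOffset(u64);

/// An offset whose kind has been checked once, so later code cannot
/// mix up radix and leaf offsets by accident.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TypedOffset {
    Radix(RadixOffset),
    Leaf(LeafOffset),
}

// Hypothetical type byte values, chosen only for this example.
const TYPE_RADIX: u8 = 2;
const TYPE_LEAF: u8 = 3;

/// Convert a raw `u64` plus the type byte found at that offset into a
/// typed offset. Unknown type bytes are rejected.
fn classify(raw: u64, type_byte: u8) -> Option<TypedOffset> {
    match type_byte {
        TYPE_RADIX => Some(TypedOffset::Radix(RadixOffset(raw))),
        TYPE_LEAF => Some(TypedOffset::Leaf(LeafOffset(raw))),
        _ => None,
    }
}
```

A function that takes `LeafOffset` can then no longer be called with a radix offset, which is also why tests that fabricate arbitrary `u64` offsets stop compiling.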

Tests about serialization are removed since they generate illegal data that
cannot pass type check.

It seems to slow down the code a bit, compared with D7404532. But there is
still room to improve.

  index insertion         time:   [6.9395 ms 7.3863 ms 7.7620 ms]
  index flush             time:   [15.949 ms 17.965 ms 20.246 ms]
  index lookup (memory)   time:   [3.6212 ms 3.8855 ms 4.1923 ms]
  index lookup (disk)     time:   [2.2496 ms 2.4649 ms 2.8090 ms]
  index clone (memory)    time:   [2.7292 ms 2.9399 ms 3.2055 ms]
  index clone (disk)      time:   [4.9239 us 5.5928 us 6.3167 us]

Reviewed By: DurhamG

Differential Revision: D7422833

fbshipit-source-id: 7357cb0f4f573f620e829c5e300cd423619dbd62
2018-04-13 21:51:41 -07:00
Jun Wu
283b8d130d pathmatcher: initial Rust matcher that handles gitignore lazily
Summary:
The "pathmatcher" crate is intended to eventually cover more "matcher"
abilities so all Python "matcher" related logic can be handled by Rust.
For now, it only contains a gitignore matcher.

The gitignore matcher is designed to work in a repo (no need to create
multiple gitignore matchers for a repo from a higher layer), and to be lazy,
i.e. tree-aware: it does not parse ".gitignore" unless necessary.

Worth mentioning that the gitignore logic provided by the "ignore" crate
seems decent in time complexity - it uses regular expressions, which use state
machines to achieve "testing against multiple patterns at once", instead of
testing patterns one-by-one like what git currently does.

Note: The "ignore" crate provides a nice "Walker" interface but that does
not fit very well with the required laziness here. So the walker interface
is not used.

Reviewed By: markbt

Differential Revision: D7319609

fbshipit-source-id: ebd131adf45a38f83acdf653f5e49d0624012152
2018-04-13 21:51:40 -07:00
Jun Wu
a87fea077c indexedlog: prefix in-memory entries with Mem
Summary: This makes it clear the code has different code paths for on-disk entries.

Reviewed By: DurhamG

Differential Revision: D7422836

fbshipit-source-id: 018fa0e2c20682d4e1beba99f3307550e1f40388
2018-04-13 21:51:40 -07:00
Jun Wu
3332522d43 indexedlog: add some benchmarks
Summary:
Add benchmarks inserting / looking up 20K entries.

Benchmark results on my laptop are:

  index insertion         time:   [6.5339 ms 6.8174 ms 7.1805 ms]
  index flush             time:   [15.651 ms 16.103 ms 16.537 ms]
  index lookup (memory)   time:   [3.6995 ms 4.0252 ms 4.3046 ms]
  index lookup (disk)     time:   [1.9986 ms 2.1224 ms 2.2464 ms]
  index clone (memory)    time:   [2.5943 ms 2.6866 ms 2.7749 ms]
  index clone (disk)      time:   [5.2302 us 5.5477 us 5.9518 us]

Comparing with highly optimized radixbuf:

  index insertion         time:   [991.89 us 1.1708 ms 1.3844 ms]
  index lookup            time:   [863.83 us 945.69 us 1.0304 ms]

Insertion takes 6x time. Lookup from memory takes 1.4x time, from disk takes
2.2x time. Flushing is the slowest - it needs 16x radixbuf insertion time.

Note: need to subtract "clone" time from "lookup" to get meaningful values
about "lookup". This cannot be done automatically due to the limitation of the
benchmark framework.

Although it's slower than radixbuf, the index is still faster than gdbm and
rocksdb. Note: the index does less than gdbm/rocksdb since it does not return
a `[u8]`-ish which requires extra lookups. So it's not a very fair comparison.

  gdbm insertion          time:   [69.607 ms 75.102 ms 79.334 ms]
  gdbm lookup             time:   [9.0855 ms 9.8480 ms 10.637 ms]
  gdbm prepare            time:   [110.35 us 120.40 us 135.63 us]
  rocksdb insertion       time:   [117.96 ms 123.42 ms 127.85 ms]
  rocksdb lookup          time:   [24.413 ms 26.147 ms 28.153 ms]
  rocksdb prepare         time:   [3.8316 ms 4.1776 ms 4.5039 ms]

Note: Subtract "prepare" from "insertion" to get meaningful values.

Code to benchmark rocksdb and gdbm:

```
#[macro_use] // needed for criterion_group! / criterion_main!
extern crate criterion;
extern crate gnudbm;
extern crate rand;
extern crate rocksdb;
extern crate tempdir;

use criterion::Criterion;
use gnudbm::GdbmOpener;
use rand::{ChaChaRng, Rng};
use rocksdb::DB;
use tempdir::TempDir;

const N: usize = 20480;

/// Generate random buffer
fn gen_buf(size: usize) -> Vec<u8> {
    let mut buf = vec![0u8; size];
    ChaChaRng::new_unseeded().fill_bytes(buf.as_mut());
    buf
}

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("rocksdb prepare", |b| {
        b.iter(move || {
            let dir = TempDir::new("index").expect("TempDir::new");
            let _db = DB::open_default(dir.path().join("a")).unwrap();
        });
    });

    c.bench_function("rocksdb insertion", |b| {
        let buf = gen_buf(N * 20);
        b.iter(move || {
            let dir = TempDir::new("index").expect("TempDir::new");
            let db = DB::open_default(dir.path().join("a")).unwrap();
            for i in 0..N {
                db.put(&&buf[20 * i..20 * (i + 1)], b"v").unwrap();
            }
        });
    });

    c.bench_function("rocksdb lookup", |b| {
        let dir = TempDir::new("index").expect("TempDir::new");
        let db = DB::open_default(dir.path().join("a")).unwrap();
        let buf = gen_buf(N * 20);
        for i in 0..N {
            db.put(&&buf[20 * i..20 * (i + 1)], b"v").unwrap();
        }
        b.iter(move || {
            for i in 0..N {
                db.get(&&buf[20 * i..20 * (i + 1)]).unwrap();
            }
        });
    });

    c.bench_function("gdbm prepare", |b| {
        let buf = gen_buf(N * 20);
        b.iter(move || {
            let dir = TempDir::new("index").expect("TempDir::new");
            let _db = GdbmOpener::new().create(true).readwrite(dir.path().join("a")).unwrap();
        });
    });

    c.bench_function("gdbm insertion", |b| {
        let buf = gen_buf(N * 20);
        b.iter(move || {
            let dir = TempDir::new("index").expect("TempDir::new");
            let mut db = GdbmOpener::new().create(true).readwrite(dir.path().join("a")).unwrap();
            for i in 0..N {
                db.store(&&buf[20 * i..20 * (i + 1)], b"v").unwrap();
            }
        });
    });

    c.bench_function("gdbm lookup", |b| {
        let dir = TempDir::new("index").expect("TempDir::new");
        let mut db = GdbmOpener::new().create(true).readwrite(dir.path().join("a")).unwrap();
        let buf = gen_buf(N * 20);
        for i in 0..N {
            db.store(&&buf[20 * i..20 * (i + 1)], b"v").unwrap();
        }
        b.iter(move || {
            for i in 0..N {
                db.fetch(&&buf[20 * i..20 * (i + 1)]).unwrap();
            }
        });
    });
}

criterion_group!{
    name=benches;
    config=Criterion::default().sample_size(20);
    targets=criterion_benchmark
}
criterion_main!(benches);
```

Reviewed By: DurhamG

Differential Revision: D7404532

fbshipit-source-id: ff39f520b78ad1b71eb36970506b313bb2ff426b
2018-04-13 21:51:40 -07:00
Jun Wu
5576402ea9 indexedlog: add ability to clone a Index object
Summary:
This will be useful for benchmarks - prepare an index as a template, and
clone it in the tests.

Reviewed By: DurhamG

Differential Revision: D7422835

fbshipit-source-id: 190bbdee7cb7c1526274b4d4dab07af4984b5df6
2018-04-13 21:51:40 -07:00
Jun Wu
2f30189748 indexedlog: reorder "use"s
Summary:
The latest rustfmt disagrees about the order of `std::io` imports. Move the
troublesome line to a separate group so both the old and new rustfmt agree
on the format.

Reviewed By: DurhamG

Differential Revision: D7422834

fbshipit-source-id: 9f5289ef2af1a691559fe691e121190f6d845162
2018-04-13 21:51:40 -07:00
Jun Wu
704eef1e4e radixbuf: use criterion for benchmark
Summary:
The old `rustc-test` crate no longer works. There is an upstream
bug report at https://github.com/servo/rustc-test/issues/7.

This change makes it possible to compare radixbuf performance
with the new index.

Reviewed By: DurhamG

Differential Revision: D7404531

fbshipit-source-id: 515e732a65388db4c865c7b139d0f57ead76f788
2018-04-13 21:51:40 -07:00
Jun Wu
9672c45582 indexedlog: add a test comparing with std HashMap
Reviewed By: DurhamG

Differential Revision: D7404529

fbshipit-source-id: a52da9aa9661b48eefc015ce351886677f842d66
2018-04-13 21:51:40 -07:00
Jun Wu
9077cbb5a7 indexedlog: reverse the writing order of radix entries
Summary:
Radix entries need to be written in reversed order given the order they
are added to the vector.

Reviewed By: DurhamG

Differential Revision: D7404530

fbshipit-source-id: 403189b5c0fa6f21183e62eea04ce4ce7c4e1129
2018-04-13 21:51:40 -07:00
Jun Wu
2075ad87c2 indexedlog: implement leaf splitting
Summary: Complete the insertion interface.

Reviewed By: DurhamG

Differential Revision: D7377210

fbshipit-source-id: 96645ac03a3fd65f22d9a9a54d8479715f49e67d
2018-04-13 21:51:39 -07:00
Jun Wu
a436d0554d indexedlog: add more helper methods
Summary: Those little read and write helpers are used in the next diff.

Reviewed By: DurhamG

Differential Revision: D7377214

fbshipit-source-id: c6e2d240334c11a0b08b15cd7d5c114b6f4d8ace
2018-04-13 21:51:39 -07:00
Jun Wu
61bf1f3854 indexedlog: add a helper function to get key content
Summary:
Add a helper function `peek_key_entry_content` that checks the key type and
returns the key content.

Reviewed By: DurhamG

Differential Revision: D7377211

fbshipit-source-id: 0ce509aba30309373a709cf5fbcb909dd80471dc
2018-04-13 21:51:39 -07:00
Jun Wu
bf55572f78 indexedlog: partially implement insertion
Summary:
Implement insertion when there is no need to split a leaf entry.

The API may be subject to change if we want other value types. For now, it's
better to get something working and can be benchmarked so we have data about
performance impact with new format changes.

Reviewed By: DurhamG

Differential Revision: D7343423

fbshipit-source-id: 9761f72168046dbafcb00883634aa7ad513a522b
2018-04-13 21:51:39 -07:00
Jun Wu
2389fd95c0 indexedlog: add helper methods about writing data
Summary:
Like the `peek_` family of helper methods. Those methods handle writing
data for both dirty (in-memory) and non-dirty (on-disk) cases. They will
be used in the next diff.

Reviewed By: DurhamG

Differential Revision: D7377208

fbshipit-source-id: f458a20da4bb7808f37daeed3077be2f7e90a9df
2018-04-13 21:51:39 -07:00
Jun Wu
cb58628046 indexedlog: add debug formatter
Summary:
Add code to print out Index's on-disk and in-memory entries in
human-friendly form. This is useful for explaining its internal state, so it
could be used in tests.

Reviewed By: DurhamG

Differential Revision: D7343427

fbshipit-source-id: 706a35404ea42c413657b389166729f8dd1315a3
2018-04-13 21:51:39 -07:00
Jun Wu
a3f7ec3f9b indexedlog: fix root entry serialization
Summary:
Offset stored in it needs to be translated, as done in other types of
entries.  I forgot it.

Reviewed By: DurhamG

Differential Revision: D7404528

fbshipit-source-id: fb09a9c3052ddfe8f8016440290062084d5d8b03
2018-04-13 21:51:39 -07:00
Jun Wu
fcc71af3ab indexedlog: add API to find link offset from a key
Summary:
This is a low-level API that follows the base16 sequence of a key, and
returns a potentially matching `LinkOffset`.
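
To show what "following the base16 sequence" means, here is a simplified in-memory trie with 16 children per node. It is not the on-disk format: `Node`, `insert` and `lookup` are hypothetical, a plain `u64` stands in for the link offset, and the lookup always consumes the whole key, so its result is exact rather than "potentially matched":

```rust
/// Split each key byte into two base16 digits, high nibble first.
fn base16(key: &[u8]) -> Vec<u8> {
    let mut digits = Vec::with_capacity(key.len() * 2);
    for &b in key {
        digits.push(b >> 4);
        digits.push(b & 0x0f);
    }
    digits
}

/// Hypothetical radix node: one child slot per base16 digit.
#[derive(Default)]
struct Node {
    children: [Option<Box<Node>>; 16],
    link: Option<u64>,
}

impl Node {
    /// Walk (and create) one child per digit, then store the link.
    fn insert(&mut self, key: &[u8], link: u64) {
        let mut node = self;
        for d in base16(key) {
            node = &mut **node.children[d as usize].get_or_insert_with(Default::default);
        }
        node.link = Some(link);
    }

    /// Walk one child per digit; stop early if a child is missing.
    fn lookup(&self, key: &[u8]) -> Option<u64> {
        let mut node = self;
        for d in base16(key) {
            node = node.children[d as usize].as_deref()?;
        }
        node.link
    }
}
```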

Reviewed By: DurhamG

Differential Revision: D7343424

fbshipit-source-id: 38f260064d1a23695a28dda6f7dc921f88c7fccc
2018-04-13 21:51:39 -07:00
Jun Wu
871ca6c96b indexedlog: add helper methods to read data
Summary:
Add a bunch of helper methods to "peek" data inside all kinds of entries.
They will be used in the next diff.

The benefit of those helper methods is they handle both dirty offsets and
non-dirty offsets transparently. Previously I have tried to always parse
on-disk entries into in-memory ones and stored them in a hashmap cache.
But that turned out to have too much overhead so always reading from disk is
more desirable. It seems to provide at least 2x perf improvement from my
previous quick test.

Reviewed By: DurhamG

Differential Revision: D7377207

fbshipit-source-id: 1b393f1fe64c1d54b986ba7c3b03c790adb694d4
2018-04-13 21:51:39 -07:00
Jun Wu
983d6920f5 indexedlog: add a non-dirty helper method
Summary:
The `non_dirty` helper method enforces the offset to be a non-dirty one.
It will be used frequently for checking offsets read from the disk, since
the on-disk offsets shouldn't have any reference to dirty (in-memory)
entries.

Reviewed By: DurhamG

Differential Revision: D7377209

fbshipit-source-id: c6c381c065d3ba8aaa65698224e4778b86edbc4a
2018-04-13 21:51:39 -07:00
Jun Wu
f0b5cd6eae indexedlog: add simple DirtyOffset abstraction
Summary: The `DirtyOffset` enum converts between array indexes and u64.
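
One possible shape for such a conversion, as a sketch only. The bit layout (top bit marks "dirty", lowest bit selects the array) and the two variants are assumptions made for this example, and indexes are assumed to fit in 62 bits:

```rust
/// Hypothetical marker: offsets with the top bit set refer to
/// in-memory (dirty) entries rather than positions in the file.
const DIRTY_BIT: u64 = 1 << 63;

/// Index into one of the in-memory entry arrays.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DirtyOffset {
    Radix(usize),
    Leaf(usize),
}

impl DirtyOffset {
    /// Pack the array tag and index into a `u64` with the dirty bit set.
    fn to_u64(self) -> u64 {
        let (tag, index) = match self {
            DirtyOffset::Radix(i) => (0u64, i as u64),
            DirtyOffset::Leaf(i) => (1u64, i as u64),
        };
        DIRTY_BIT | (index << 1) | tag
    }

    /// Unpack a `u64`. Returns `None` for non-dirty (on-disk) offsets.
    fn from_u64(raw: u64) -> Option<DirtyOffset> {
        if raw & DIRTY_BIT == 0 {
            return None;
        }
        let index = ((raw & !DIRTY_BIT) >> 1) as usize;
        Some(if raw & 1 == 0 {
            DirtyOffset::Radix(index)
        } else {
            DirtyOffset::Leaf(index)
        })
    }
}
```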

Reviewed By: DurhamG

Differential Revision: D7377215

fbshipit-source-id: 29d4f7d74f15523034c11abcc09329a1b21142b1
2018-04-13 21:51:39 -07:00
Jun Wu
3859d00394 indexedlog: implement flush for the main index
Summary:
The flush method will write buffered data to disk.

A mistake in Root entry serialization is fixed - it needs to translate dirty
offsets to non-dirty ones.

Reviewed By: DurhamG

Differential Revision: D7223729

fbshipit-source-id: baeaab27627d6cfb7c5798d3a39be4d2b8811e5f
2018-04-13 21:51:35 -07:00
Jun Wu
8f5c35c8d2 indexedlog: initial main index structure
Summary:
Add the main `Index` structure and its constructor.

The structure focuses on the index logic itself. It does not have the checksum
part yet.

Some notes about choices made:
- The use of mmap: mmap is good for random I/O, and has the benefit of
  sharing buffers between processes reading the same file. We may be able to
  do good user-space caching for the random I/O part. But it's harder to
  share the buffers between processes.
- The "read_only" auto decision. Common "open" pattern requires the caller
  to pass whether they want to read or write. The index makes the decision
  for the caller for convenience (ex. running "hg log" on somebody else's
  repo).
- The "load root entry from the end of the file" feature. It's just for
  convenience for users wanting to use the Index in a standalone way. We
  probably

Reviewed By: DurhamG

Differential Revision: D7208358

fbshipit-source-id: 14b74d7e32ef28bd5bc3483fd560c489d36bf8e5
2018-04-13 21:51:35 -07:00
Jun Wu
545f670504 pathencoding: utility for converting between bytes and paths
Summary:
A simple utility that does paths <-> local bytes conversion. It's needed
since Mercurial stores paths using local encoding in manifests.

For POSIX, the code is zero-cost - no real conversion or error can happen.
This is in theory cheaper than what treedirstate does.
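
On Unix-like systems the zero-cost conversion can be written with `std::os::unix::ffi::OsStrExt`. The function names below are hypothetical and may not match the crate's API:

```rust
use std::path::Path;

/// Reinterpret local-encoding bytes as a path without copying.
/// On Unix this cannot fail because paths are arbitrary byte strings.
#[cfg(unix)]
fn local_bytes_to_path(bytes: &[u8]) -> &Path {
    use std::ffi::OsStr;
    use std::os::unix::ffi::OsStrExt;
    Path::new(OsStr::from_bytes(bytes))
}

/// The reverse direction: borrow the raw bytes behind a path.
#[cfg(unix)]
fn path_to_local_bytes(path: &Path) -> &[u8] {
    use std::os::unix::ffi::OsStrExt;
    path.as_os_str().as_bytes()
}
```

Both functions only reinterpret a borrowed slice, so no allocation or validation happens.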

For Windows, the "local_encoding" crate is selected as Yuya suggested the
`MultiByteToWideChar` Win32 API [1] and "local_encoding" uses it. It does
the right thing given my experiment with GBK (Chinese, simplified) encoding.

```
  ....
  C:\Users\quark\enc>hg debugshell --config extensions.debugshell=
  >>> repo[0].manifest().text()
  '\xc4\xbf\xc2\xbc1/\xce\xc4\xbc\xfe1\x00b80de5d138758541c5f05265ad144ab9fa86d1db\n'
  >>> repo[0].files()
  ['\xc4\xbf\xc2\xbc1/\xce\xc4\xbc\xfe1']
  extern crate local_encoding;
  use std::path::PathBuf;
  use local_encoding::{Encoder, Encoding};
  const mpath: &[u8] = b"\xc4\xbf\xc2\xbc1/\xce\xc4\xbc\xfe1";
  fn main() {
      let p = PathBuf::from(Encoding::OEM.to_string(mpath).unwrap());
      println!("exists: {}", p.exists());
      println!("mpath len: {}, osstr len: {}", mpath.len(), p.as_path().as_os_str().len());
  }
  exists: true
  mpath len: 11, osstr len: 15
```

In the future, we might normalize the paths to UTF-8 before storing them in
manifest to avoid issues.

Differential Revision: D7319604

fbshipit-source-id: a7ed5284be116c4176598b4c742e8228abcc3b02
2018-04-13 21:51:35 -07:00
Jun Wu
78f4faea65 xdiff: add a preprocessing step that trims files
Summary:
xdiff has a `xdl_trim_ends` step that removes common lines, unmatchable
lines. That is in theory good, but happens too late - after splitting,
hashing, and adjusting the hash values so they are unique. Those splitting,
hashing and adjusting hash values steps could have noticeable overhead.

For not uncommon cases like diffing two large files with minor differences,
the raw performance of those preparation steps seriously matters. Even
allocating an O(N) array and storing line offsets to it is expensive.
Therefore my previous attempts [1] [2] cannot be good enough since they do
not remove the O(N) array assignment.

This patch adds a preprocessing step, `xdl_trim_files`, that runs before
other preprocessing steps. It counts the common prefix and suffix and the
lines in them (needed for displaying line numbers), without doing anything else.
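
The idea can be sketched in a few lines of Python. This is not the C code in `xdl_trim_files`; the function name `trim_common_ends` and its return shape are assumptions made for illustration, and the suffix handling is deliberately conservative:

```python
def trim_common_ends(a: bytes, b: bytes):
    """Return (prefix_len, prefix_lines, suffix_len) for two byte strings.

    Both trimmed regions consist of whole lines only.
    """
    limit = min(len(a), len(b))

    # Raw common prefix, then round down to the end of the last full line.
    p = 0
    while p < limit and a[p] == b[p]:
        p += 1
    p = a.rfind(b"\n", 0, p) + 1  # rfind gives -1 when no newline, so p becomes 0
    prefix_lines = a.count(b"\n", 0, p)

    # Raw common suffix that does not overlap the trimmed prefix.
    s = 0
    while s < limit - p and a[len(a) - 1 - s] == b[len(b) - 1 - s]:
        s += 1
    # Conservatively drop the first (possibly partial) line of the suffix.
    nl = a.find(b"\n", len(a) - s)
    s = 0 if nl < 0 else len(a) - (nl + 1)

    return p, prefix_lines, s
```

Only the middle parts, `a[p:len(a) - s]` and `b[p:len(b) - s]`, then need to go through splitting and hashing; `prefix_lines` is what lets the output keep correct line numbers.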

Testing with a crafted large (169MB) file, with minor change:

```
  open('a','w').write(''.join('%s\n' % (i % 100000) for i in xrange(30000000) if i != 6000000))
  open('b','w').write(''.join('%s\n' % (i % 100000) for i in xrange(30000000) if i != 6003000))
```

Running xdiff by a simple binary [3], this patch improves the xdiff perf by
more than 10x for the above case:

```
  # xdiff before this patch
  2.41s user 1.13s system 98% cpu 3.592 total
  # xdiff after this patch
  0.14s user 0.16s system 98% cpu 0.309 total
  # gnu diffutils
  0.12s user 0.15s system 98% cpu 0.272 total
  # (best of 20 runs)
```

It's still slightly slower than GNU diffutils. But it's pretty close now.

Testing with real repo data:

For the whole repo, this patch makes xdiff 25% faster:

```
  # hg perfbdiff --count 100 --alldata -c d334afc585e2 --blocks [--xdiff]
  # xdiff, after
  ! wall 0.058861 comb 0.050000 user 0.050000 sys 0.000000 (best of 100)
  # xdiff, before
  ! wall 0.077816 comb 0.080000 user 0.080000 sys 0.000000 (best of 91)
  # bdiff
  ! wall 0.117473 comb 0.120000 user 0.120000 sys 0.000000 (best of 67)
```

For files that are long (ex. commands.py), the speedup is more than 3x, very
significant:

```
  # hg perfbdiff --count 3000 --blocks commands.py.i 1 [--xdiff]
  # xdiff, after
  ! wall 0.690583 comb 0.690000 user 0.690000 sys 0.000000 (best of 12)
  # xdiff, before
  ! wall 2.240361 comb 2.210000 user 2.210000 sys 0.000000 (best of 4)
  # bdiff
  ! wall 2.469852 comb 2.440000 user 2.440000 sys 0.000000 (best of 4)
```

The improvement is also seen for the `json` test case mentioned in D7124455.
xdiff's time improves from 0.3s to 0.04s, similar to GNU diffutils.

This patch is also sent as https://phab.mercurial-scm.org/D2686.

[1]: https://phab.mercurial-scm.org/D2631
[2]: https://phab.mercurial-scm.org/D2634
[3]:

```
// Code to run xdiff from command line. No proper error handling.
mmfile_t readfile(const char *path) {
  struct stat st; int fd = open(path, O_RDONLY);
  fstat(fd, &st); mmfile_t f = { malloc(st.st_size), st.st_size };
  ensure(read(fd, f.ptr, st.st_size) == st.st_size); close(fd); return f; }
static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf) { int i;
  for (i = 0; i < nbuf; i++) { write(STDOUT_FILENO, mb[i].ptr, mb[i].size); }
  return 0; }
int main(int argc, char const *argv[]) {
  mmfile_t a = readfile(argv[1]), b = readfile(argv[2]);
  xpparam_t xpp = { XDF_INDENT_HEURISTIC, 0 };
  xdemitconf_t xecfg = { 3, 0 }; xdemitcb_t ecb = { 0, &xdiff_outf };
  xdl_diff(&a, &b, &xpp, &xecfg, &ecb); return 0; }
```

Reviewed By: ryanmce

Differential Revision: D7151582

fbshipit-source-id: 3f2dd43b74da118bd827af4fc5e1bf65be191ad2
2018-04-13 21:51:25 -07:00
Jun Wu
865700883d indexedlog: move mmap_readonly to utils
Summary:
`mmap_readonly` will be reused in `index.rs` so let's move it to a shared
utils module.

Reviewed By: DurhamG

Differential Revision: D7208359

fbshipit-source-id: d98779e4e21765ce0e185281c9560245b59b174c
2018-04-13 21:51:25 -07:00
Jun Wu
d3b0f0cdfb indexedlog: add RAII file lock
Summary:
Add ScopedFileLock. This is similar to Python's contextmanager.
It's easier to use than the fs2 raw API, since it guarantees the file is
unlocked.
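
For readers more familiar with the Python side of the comparison, the pattern looks roughly like this. It is a POSIX-only sketch built on `fcntl.flock`, not the Rust `ScopedFileLock`; the name `scoped_file_lock` and its parameters are assumptions:

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def scoped_file_lock(path, exclusive=True):
    """Hold an advisory lock on `path` for the duration of a `with` block."""
    with open(path, "a+b") as f:
        mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
        fcntl.flock(f.fileno(), mode)
        try:
            yield f
        finally:
            # Runs on normal exit and on exceptions, so the lock never leaks.
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```

The `finally` clause plays the role that `Drop` plays for an RAII guard: the unlock cannot be forgotten by the caller.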

Reviewed By: jsgf

Differential Revision: D7203684

fbshipit-source-id: 5d7beed99ff992466ab7bf1fbea0353de4dfe4f9
2018-04-13 21:51:25 -07:00
Jun Wu
605cd36716 indexedlog: add serialization for root entry
Reviewed By: DurhamG

Differential Revision: D7191653

fbshipit-source-id: 4c82a6b2a00d8e4cb3c67ecb382659ff8946bdad
2018-04-13 21:51:25 -07:00
Jun Wu
0f9d39cae8 indexedlog: add serialization for key entry
Reviewed By: DurhamG

Differential Revision: D7191651

fbshipit-source-id: 8eb8cbc00f0b15660e6d9e988ae41b761d854fa2
2018-04-13 21:51:25 -07:00
Jun Wu
ba05e88179 indexedlog: add serialization for leaf and link entry
Summary: They are simpler than radix entry and similar.

Reviewed By: DurhamG

Differential Revision: D7191652

fbshipit-source-id: b516663567267a2e354748396b44c2ac8ebb691f
2018-04-13 21:51:25 -07:00
Jun Wu
dab5948078 indexedlog: add serialization for radix entry
Summary: Start serialization implementation. First, add support for the radix entry.

Reviewed By: DurhamG

Differential Revision: D7191365

fbshipit-source-id: 54a5ba5c666ba4def1e80eaa2ff7d4d77ff53f8c
2018-04-13 21:51:25 -07:00
Jun Wu
599194b15d indexedlog: define basic structures
Summary: These are Rust structures that map to the file format.

Reviewed By: DurhamG

Differential Revision: D7191366

fbshipit-source-id: 23a4431383be9713e955b74306cd68108eb80536
2018-04-13 21:51:25 -07:00
Jun Wu
6542d0ebf4 indexedlog: add comment about index file format
Summary: Document the format. Actual implementation in later diffs.

Reviewed By: DurhamG

Differential Revision: D7190575

fbshipit-source-id: 243992fd052ca7a9688d54d20694e65daebb9660
2018-04-13 21:51:25 -07:00
Jun Wu
015a4ac5d6 indexedlog: port base16 iterator from radixbuf
Summary:
The append-only index is too different so it's cleaner to cherry-pick code
from radixbuf, instead of modifying radixbuf which would break code
depending on it.

Started by picking the base16 iterator part.

`rustc-test` does not work with buck, and seems to be in an unmaintained
state, so benchmark tests are migrated to criterion.

Reviewed By: DurhamG

Differential Revision: D7189143

fbshipit-source-id: 459a79b4cf16f35d2ff86f11a5980ba1fc627951
2018-04-13 21:51:25 -07:00
Jun Wu
d2c457a6e2 indexedlog: integrity check utility on an append-only file
Summary:
Filesystems are hard. Append-only sounds like a safe way to write files, but it
only really helps with process crashes. If the OS crashes, it's possible that
other parts of the file get corrupted. For source control, data integrity
checking is important. So bytes not logically touched by appending also need
to be checked.

Implement a `ChecksumTable` which adds integrity check ability to append-only
files. It's intended to be used by future append-only indexes.
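
A rough sketch of the idea, not the actual `ChecksumTable` format: the file is covered by fixed-size chunks, each with its own checksum, and a read is trusted only if every chunk it overlaps still matches. The class name, the chunk size, and the use of `zlib.crc32` are all assumptions made for illustration:

```python
import zlib


class ChunkChecksums:
    """Per-chunk checksums over the first `end` bytes of an append-only file."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.sums = []  # one checksum per chunk covered so far
        self.end = 0    # number of bytes covered

    def update(self, data):
        """Extend coverage to all of `data` after an append."""
        # The last recorded chunk may have been partial, so recompute from it.
        first = self.end // self.chunk_size
        del self.sums[first:]
        for off in range(first * self.chunk_size, len(data), self.chunk_size):
            self.sums.append(zlib.crc32(data[off:off + self.chunk_size]))
        self.end = len(data)

    def check_range(self, data, offset, length):
        """True if every chunk overlapping [offset, offset + length) is intact."""
        if offset + length > self.end:
            return False  # range is not covered by the table
        if length == 0:
            return True
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        for i in range(first, last + 1):
            start = i * self.chunk_size
            stop = min(start + self.chunk_size, self.end)
            if zlib.crc32(data[start:stop]) != self.sums[i]:
                return False
        return True
```

Checking per chunk keeps verification cost proportional to the bytes actually read, while still catching corruption in regions that no append ever touched.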

Reviewed By: DurhamG

Differential Revision: D7108433

fbshipit-source-id: 16daf6b8d04bba464f1ee9221716beba69c1d47b
2018-04-13 21:51:24 -07:00
Jun Wu
0518016553 indexedlog: initial boilerplate
Summary:
First step of a storage-related building block that is in Rust. The goal is
to use it to replace revlog, obsstore and packfiles.

Extern crates that are likely useful are added to reduce future churn.

Reviewed By: DurhamG

Differential Revision: D7108434

fbshipit-source-id: 97ebd9ba69547d876dcecc05e604acdf9088877e
2018-04-13 21:51:24 -07:00
Kostia Balytskyi
0ef59877cd hg: some portability fixes to py-cdatapack.h
Summary:
1. Variable Length Arrays are not supported by MSVC, but since this is C++ code, we can just use heap allocation
2. Replacing `inet` with a portability version

Depends on D7196403

Reviewed By: quark-zju

Differential Revision: D7196605

fbshipit-source-id: a0d88b6e06f255ef648c0b35a99b42ba3bee538a
2018-04-13 21:51:24 -07:00
Ryan Prince
573a8eb9cc fixing xdiff build on windows
Summary: fixing xdiff build on windows

Reviewed By: quark-zju

Differential Revision: D7189839

fbshipit-source-id: ef05219d911af44f3546bc51fb74539d06b443b5
2018-04-13 21:51:23 -07:00
Jun Wu
81e68a9a57 xdiff: decrease indent heuristic overhead
Summary:
Add a "boring" threshold to limit the search range of the indentation heuristic,
so the performance of the diff algorithm is mostly unaffected by turning on
the indentation heuristic.

Reviewed By: ryanmce

Differential Revision: D7145002

fbshipit-source-id: 024ec685f96aa617fb7da141f38fa4e12c4c0fc9
2018-04-13 21:51:21 -07:00
Jun Wu
511ec41260 xdiff: add a bdiff hunk mode
Summary:
xdiff generates hunks for the differences (e.g. the question marks in the
`@@ -?,?  +?,? @@` part of `diff --git` output). However, bdiff generates
matched hunks instead.

This patch adds a `XDL_EMIT_BDIFFHUNK` flag used by the output function
`xdl_call_hunk_func`.  Once set, xdiff will generate bdiff-like hunks
instead. That makes it easier to use xdiff as a drop-in replacement of bdiff.

Note that since `bdiff('', '')` returns `[(0, 0, 0, 0)]`, the shortcut path
`if (xscr)` is removed. I have checked functions called with `xscr` argument
(`xdl_mark_ignorable`, `xdl_call_hunk_func`, `xdl_emit_diff`,
`xdl_free_script`) work just fine with `xscr = NULL`.
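
The two hunk styles can be compared with a small Python sketch. It uses the standard library's `difflib` rather than xdiff or bdiff, and the `(a1, a2, b1, b2)` tuple layout is an assumption modeled on the `(0, 0, 0, 0)` result quoted above; the function names are hypothetical:

```python
from difflib import SequenceMatcher


def matched_blocks(a_lines, b_lines):
    """Matching ranges as (a1, a2, b1, b2), ending with a zero-length sentinel."""
    sm = SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    return [(m.a, m.a + m.size, m.b, m.b + m.size)
            for m in sm.get_matching_blocks()]


def changed_hunks(a_lines, b_lines):
    """Differing ranges as (a1, a2, b1, b2), the style diff output is built from."""
    sm = SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    return [(i1, i2, j1, j2)
            for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag != "equal"]
```

For two empty inputs the difference-style list is empty while the matched-style list still holds the sentinel, which illustrates why a shortcut that returns early when there are no differences does not fit the matched-hunk output.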

Reviewed By: ryanmce

Differential Revision: D7135207

fbshipit-source-id: cfb8c363e586841c06c94af283c7f014ba65fcc0
2018-04-13 21:51:21 -07:00
Jun Wu
56a738fce4 xdiff: remove patience and histogram diff algorithms
Summary:
Patience diff is the normal diff algorithm, plus some greediness that
unconditionally matches common unique lines.  That means it is easy to
construct cases to let it generate a suboptimal result, like:

```
open('a', 'w').write('\n'.join(list('a' + 'x' * 300 + 'u' + 'x' * 700 + 'a\n')))
open('b', 'w').write('\n'.join(list('b' + 'x' * 700 + 'u' + 'x' * 300 + 'b\n')))
```

Patience diff has been advertised as being able to generate better results for
some C code changes. However, the more scientific way to do that is the
indentation heuristic [1].

Since patience diff could generate a suboptimal result more easily and its
"better" diff feature could be replaced by the new indentation heuristic, let's
just remove it and its variant histogram diff to simplify the code.

[1]: 433860f3d0

Reviewed By: ryanmce

Differential Revision: D7124711

fbshipit-source-id: 127e8de6c75d0262687a1b60814813e660aae3da
2018-04-13 21:51:20 -07:00
Jun Wu
65d9160c6f xdiff: vendor xdiff library from git
Summary:
Vendor git's xdiff library from git commit
d7c6c2369ac21141b7c6cceaebc6414ec3da14ad using GPL2+ license.

There is another recent user report that hg diff generates suboptimal
result. It seems the fix to issue4074 isn't good enough. I crafted some
other interesting cases, and hg diff barely has any advantage compared with
gnu diffutils or git diff.

| testcase | gnu diffutils |      hg diff |   git diff |
|          |    lines time |   lines time | lines time |
| patience |        6 0.00 |     602 0.08 |     6 0.00 |
|   random |    91772 0.90 |  109462 0.70 | 91772 0.24 |
|     json |        2 0.03 | 1264814 1.81 |     2 0.29 |

"lines" means the size of the output, i.e. the count of "+/-" lines. "time"
means seconds needed to do the calculation. Both are the smaller the better.
"hg diff" counts Python startup overhead.

Git and GNU diffutils generate optimal results. For the "json" case, git can
have an optimization that does a scan for common prefix and suffix first,
and match them if the length is greater than half of the text. See
https://neil.fraser.name/news/2006/03/12/. That would make git the fastest
for all above cases.

About testcases:

patience:
Aims at the weakness of the greedy "patience diff" algorithm.  Using
git's patience diff option would also get a suboptimal result. Generated using
the Python script:

```
open('a', 'w').write('\n'.join(list('a' + 'x' * 300 + 'u' + 'x' * 700 + 'a\n')))
open('b', 'w').write('\n'.join(list('b' + 'x' * 700 + 'u' + 'x' * 300 + 'b\n')))
```

random:
Generated using the script in `test-issue4074.t`. It practically makes the
algorithm suffer. Impressively, git wins in both performance and diff
quality.

json:
The recent user reported case. It's a single line movement near the end of a
very large (800K lines) JSON file.

Reviewed By: ryanmce

Differential Revision: D7124455

fbshipit-source-id: 832651115da770f9d2ed5fdff2e200453c0013f8
2018-04-13 21:51:20 -07:00
Jun Wu
c114d2499b vlqencoding: add read_vlq_at API that works for AsRef<[u8]>
Summary:
This allows us to decode VLQ integers at a given offset, for anything that
implements `AsRef<[u8]>`, instead of having to couple with a `&mut Read`
interface. The main benefit is to get rid of `mut`. The old `VLQDecode`
interface has to use `&mut Read` since reading has a side effect of changing
the internal position counter.
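
The shape of such an API can be sketched in Python. This is not the Rust implementation; it assumes a LEB128-style layout (7 payload bits per byte, least significant group first, high bit set on all but the last byte) and a return value of `(value, bytes_consumed)`, both of which are assumptions here:

```python
def read_vlq_at(buf, offset):
    """Decode one unsigned VLQ integer from `buf` starting at `offset`.

    Returns (value, bytes_consumed). `buf` is never modified and no
    position counter is kept, so a shared read-only buffer is enough.
    """
    value = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(buf):
            raise ValueError("truncated VLQ integer")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:  # high bit clear marks the last byte
            return value, pos - offset
        shift += 7
```

Because the caller supplies the offset and receives the consumed size, consecutive integers are read by advancing a local variable rather than mutating the reader.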

Reviewed By: markbt

Differential Revision: D7093998

fbshipit-source-id: 20cb14e38c828462c34f32245d0f0f512028b647
2018-04-13 21:51:19 -07:00
Jun Wu
e266793816 vlqencoding: add a benchmark
Summary:
I'm going to add more ways to do VLQ parsing (ex. reading from a `&[u8]`
instead of a `Read` which has to be mutable). So let's add a benchmark to
compare the `&[u8]` version with the `Read` version.

Reviewed By: DurhamG

Differential Revision: D7092960

fbshipit-source-id: e1189de10396516c732dc73b45b7690a1718f1c0
2018-04-13 21:51:19 -07:00
Jun Wu
f547ef9ed0 rust: vendor more crates
Summary:
criterion provides useful utilities for writing benchmarks.
fs2 provides cross-platform file locking.
memmap provides cross-platform mmap.
atomicwrites provides cross-platform atomic file rewrite.
twox-hash provides xxHash fast hash algorithm for integrity check usecase.

Reviewed By: singhsrb

Differential Revision: D7092764

fbshipit-source-id: a3a2a31c198e73701708d7124574ba447ab99c45
2018-04-13 21:51:19 -07:00
Jun Wu
c1bebda5d6 radixbuf: avoid using unstable features in buck build
Summary:
`test::Bencher` is an unstable feature, which is enabled by 3rd-party crate
`rustc-test`. However, `rustc-test` does not work with buck build. So let's
work around that by allowing all usage of `test::Bencher` to be disabled by a
feature. And turn on that feature in buck build. Cargo build will remain
unchanged.

Reviewed By: singhsrb

Differential Revision: D7011703

fbshipit-source-id: e08ba9516bf7fadb6edb52ab107e0172df0aaf5b
2018-04-13 21:51:12 -07:00
Kostia Balytskyi
62ecc73818 hg: make sure platform_madvise_away returns -1 on Windows
Summary:
On the other two platforms we return the result of `madvise`, so let's return -1,
as this is the error return value of `madvise` on POSIX.

Reviewed By: quark-zju

Differential Revision: D6979093

fbshipit-source-id: 7c715eb459aaad6c21fae6e346e8650211649182
2018-04-13 21:51:11 -07:00
Kostia Balytskyi
c85791785b hg: build cdatapack on Windows
Summary: Seems to be working now.

Reviewed By: quark-zju

Differential Revision: D6970927

fbshipit-source-id: e67753d811819015282f47fcbdfbb263d85f054f
2018-04-13 21:51:10 -07:00
Kostia Balytskyi
5d1139f87d hg: move defines out of struct definition in cdatapack.c
Summary: The current location of these defines is really odd and does not work with the current version of `PACKEDSTRUCT` macro expansion (it expands everything in the same line, therefore `#defines` are inline, which fails to compile).

Reviewed By: quark-zju

Differential Revision: D6970926

fbshipit-source-id: ed01042760fa729004e159b492cf67a4afd25923
2018-04-13 21:51:10 -07:00
Kostia Balytskyi
7d4f6a9033 hg: start using imported mman-win32 in the portability headers
Summary:
Let's create a new portability header, which can be used on both Windows and
Posix.

Reviewed By: quark-zju

Differential Revision: D6970928

fbshipit-source-id: a3970c50260f52bfc0a9420a4ff11d93ace304b0
2018-04-13 21:51:10 -07:00
Kostia Balytskyi
67b2e1496a hg: vendor a third-party implementation of mman library for Windows
Summary: This is needed to make our C code compile on Windows.

Reviewed By: quark-zju

Differential Revision: D6970929

fbshipit-source-id: 2cfe46e0718fe75916912d0e59c5400038e03a12
2018-04-13 21:51:10 -07:00
Jun Wu
d942f5a88e hg: basic support for building hg using buck
Summary:
Adds some basic building blocks to build hg using buck.

Header files are cleaned up, so they are relative to the project root.

Some minor changes to C code are made to remove clang build
warnings.

Rust dependencies, fb-hgext C/Python dependencies (ex. cstore,
mysql-connector), and 3rd-party dependencies like python-lz4
are not built yet. But the built hg binary should be able to run
most tests just fine.

Reviewed By: wez

Differential Revision: D6814686

fbshipit-source-id: 59eefd5a3ad86db2ad1c821ed824c9f1878c93e4
2018-04-13 21:50:58 -07:00
Phil Cohen
c097dde0b9 READMEs: tweaks based on feedback
Summary: Based on feedback to D6687860.

Test Plan: n/a

Reviewers: durham, #mercurial

Reviewed By: durham

Differential Revision: https://phabricator.intern.facebook.com/D6714211

Signature: 6714211:1515788399:386b8f7330f343349234d1f317e5ac0a594142cf
2018-01-12 12:35:52 -08:00
Phil Cohen
bf8527e7a9 lib: add READMEs to lib, extlib, cext 2018-01-09 15:20:46 -08:00
Saurabh Singh
9da30944be cfastmanifest: move to hgext/extlib/
Summary:
Moves ctreemanifest into hgext/extlib/. D6679698 was committed to scratch branch
by mistake.

Test Plan: make local && cd tests && ./run-tests.py

Reviewers: durham, #mercurial, #sourcecontrol

Reviewed By: durham

Differential Revision: https://phabricator.intern.facebook.com/D6684623

Signature: 6684623:1515522634:9bec363d00990d9ff7d5f655e30ab8cae636155c
2018-01-09 10:36:54 -08:00
Durham Goode
228e6a901e cstore: move to hgext/extlib/
Summary: Moves cstore to hgext/extlib/ and makes it build.

Test Plan: make local && run-tests.py

Reviewers: #mercurial

Differential Revision: https://phabricator.intern.facebook.com/D6678852
2018-01-08 17:55:53 -08:00
Durham Goode
eb099b7fe1 cdatapack: move to lib/
Summary:
This moves the cdatapack code to the new lib/ directory and adds it to the main
setup.py.

Test Plan: hg purge --all && make local && cd tests && ./run-tests.py -S -j 48

Reviewers: #mercurial

Differential Revision: https://phabricator.intern.facebook.com/D6677491
2018-01-08 17:55:53 -08:00
Jun Wu
1a84c9d5db linelog: format the code using clang-format
Summary:
I didn't notice the test failure because clang-format was not installed.
Might be a good idea to make it a hard error.

Test Plan: Run test-check-clang-format.t

Reviewers: phillco, #mercurial

Reviewed By: phillco

Subscribers: mathieubaudet

Differential Revision: https://phabricator.intern.facebook.com/D6679576

Signature: 6679576:1515457526:6b1935858da284b896244b0d99e2fef03ead97b8
2018-01-08 16:22:30 -08:00
Jun Wu
1802036ff3 linelog: move to lib/ and mercurial/cyext
Summary:
The `lib/linelog` directory contains pure C code that is unrelated to
either Mercurial or Python. The `mercurial/cyext` contains Cython extension
code (although for linelog's case, the Cython extension is unrelated to
Mercurial).

Cython is now a hard dependency to simplify the code.

Test Plan: `make local` and check `from mercurial.cyext import linelog` works.

Reviewers: durham, #mercurial

Reviewed By: durham

Subscribers: durham, fried

Differential Revision: https://phabricator.intern.facebook.com/D6678541

Signature: 6678541:1515455512:967266dc69c702dbff95fdea05671e11c32ebf28
2018-01-08 14:35:01 -08:00
Mark Thomas
2e81565606 fb-hgext: integrate rust libraries and extensions with setup.py
Summary:
Move the rust libraries and extensions to their new locations, and integrate
them with the hg-crew setup.py.

Test Plan: Run `python setup.py build` and verify rust extensions are built.

Reviewers: durham, #mercurial

Reviewed By: durham

Subscribers: fried, jsgf, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D6677251

Tasks: T24908724

Signature: 6677251:1515450235:920faf40babbce9b09e3283ff9ca328d1c5c51e6
2018-01-08 15:26:24 -08:00