Commit Graph

572 Commits

Author SHA1 Message Date
Jun Wu
56a738fce4 xdiff: remove patience and histogram diff algorithms
Summary:
Patience diff is the normal diff algorithm, plus some greediness that
unconditionally matches common unique lines.  That means it is easy to
construct cases that make it generate suboptimal results, like:

```
open('a', 'w').write('\n'.join(list('a' + 'x' * 300 + 'u' + 'x' * 700 + 'a\n')))
open('b', 'w').write('\n'.join(list('b' + 'x' * 700 + 'u' + 'x' * 300 + 'b\n')))
```
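To see why the forced match hurts, here is a minimal sketch that compares the number of matched lines with and without anchoring on the single common unique line. It uses a scaled-down input (3 and 7 instead of 300 and 700, without the trailing newline element), and the helpers `lcs_len` and `common_unique` are hypothetical names written for this illustration; they are not part of xdiff, and real patience diff applies the anchoring step recursively.

```python
from collections import Counter


def lcs_len(a, b):
    """Length of the longest common subsequence, by plain dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def common_unique(a, b):
    """Lines that occur exactly once in a and exactly once in b."""
    ca, cb = Counter(a), Counter(b)
    return [x for x in a if ca[x] == 1 and cb[x] == 1]


# Scaled-down version of the test case (assumption: same shape, smaller counts).
a = list('a' + 'x' * 3 + 'u' + 'x' * 7 + 'a')
b = list('b' + 'x' * 7 + 'u' + 'x' * 3 + 'b')

anchors = common_unique(a, b)  # only 'u' is unique in both inputs
i, j = a.index('u'), b.index('u')

# Forcing 'u' to match splits the problem into a "before" and an "after" half.
anchored = lcs_len(a[:i], b[:j]) + 1 + lcs_len(a[i + 1:], b[j + 1:])

# An unconstrained LCS is free to leave 'u' unmatched and match every 'x'.
optimal = lcs_len(a, b)

print(anchors, anchored, optimal)
```

In this toy input the anchored match keeps fewer lines in common than the unconstrained one, so the resulting diff is larger.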

Patience diff has been advertised as being able to generate better results for
some C code changes. However, the more scientific way to do that is the
indentation heuristic [1].

Since patience diff generates suboptimal results more easily and its
"better" diff feature can be replaced by the new indentation heuristic, let's
just remove it and its variant, histogram diff, to simplify the code.

[1]: 433860f3d0

Reviewed By: ryanmce

Differential Revision: D7124711

fbshipit-source-id: 127e8de6c75d0262687a1b60814813e660aae3da
2018-04-13 21:51:20 -07:00
Jun Wu
65d9160c6f xdiff: vendor xdiff library from git
Summary:
Vendor git's xdiff library from git commit
d7c6c2369ac21141b7c6cceaebc6414ec3da14ad using GPL2+ license.

There is another recent user report that hg diff generates suboptimal
result. It seems the fix to issue4074 isn't good enough. I crafted some
other interesting cases, and hg diff barely has any advantage compared with
gnu diffutils or git diff.

| testcase | gnu diffutils |      hg diff |   git diff |
|          |    lines time |   lines time | lines time |
| patience |        6 0.00 |     602 0.08 |     6 0.00 |
|   random |    91772 0.90 |  109462 0.70 | 91772 0.24 |
|     json |        2 0.03 | 1264814 1.81 |     2 0.29 |

"lines" means the size of the output, i.e. the count of "+/-" lines. "time"
means seconds needed to do the calculation. Both are the smaller the better.
"hg diff" counts Python startup overhead.

Git and GNU diffutils generate optimal results. For the "json" case, git
could add an optimization that first scans for a common prefix and suffix,
and matches them if their length is greater than half of the text. See
https://neil.fraser.name/news/2006/03/12/. That would make git the fastest
for all of the above cases.
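The prefix/suffix scan can be sketched as follows. This is an illustration only, not git's or xdiff's code: `trim_common` and `should_pin` are hypothetical helpers, the "more than half of the text" threshold is interpreted here as the combined prefix and suffix length against the longer input, and the toy input merely mimics a single line moved near the end of a long file.

```python
def trim_common(a, b):
    """Return (prefix_len, suffix_len) of lines shared at both ends of a and b."""
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    # Do not let the suffix overlap the prefix that was already matched.
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return prefix, suffix


def should_pin(a, b, prefix, suffix):
    """Assumed threshold: pin the ends only if they cover more than half."""
    return 2 * (prefix + suffix) > max(len(a), len(b))


# Toy stand-in for the "json" case: one line moved near the end of the file.
old = ['line %d' % i for i in range(1000)]
new = old[:990] + [old[995]] + old[990:995] + old[996:]

prefix, suffix = trim_common(old, new)
pin = should_pin(old, new, prefix, suffix)

# Only the middle part still needs the full diff algorithm.
middle_old = old[prefix:len(old) - suffix]
middle_new = new[prefix:len(new) - suffix]
print(prefix, suffix, pin, len(middle_old), len(middle_new))
```

After trimming, the expensive diff only has to run on a handful of lines instead of the whole file.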

About testcases:

patience:
Aims at the weakness of the greedy "patience diff" algorithm.  Using
git's patience diff option would also produce a suboptimal result. Generated
using the Python script:

```
open('a', 'w').write('\n'.join(list('a' + 'x' * 300 + 'u' + 'x' * 700 + 'a\n')))
open('b', 'w').write('\n'.join(list('b' + 'x' * 700 + 'u' + 'x' * 300 + 'b\n')))
```

random:
Generated using the script in `test-issue4074.t`. It practically makes the
algorithm suffer. Impressively, git wins in both performance and diff
quality.

json:
The case recently reported by a user. It's a single line movement near the
end of a very large (800K lines) JSON file.

Reviewed By: ryanmce

Differential Revision: D7124455

fbshipit-source-id: 832651115da770f9d2ed5fdff2e200453c0013f8
2018-04-13 21:51:20 -07:00
Jun Wu
c114d2499b vlqencoding: add read_vlq_at API that works for AsRef<[u8]>
Summary:
This allows us to decode VLQ integers at a given offset, for anything that
implements `AsRef<[u8]>`, instead of having to couple with a `&mut Read`
interface. The main benefit is to get rid of `mut`. The old `VLQDecode`
interface has to use `&mut Read` since reading has the side effect of changing
the internal position counter.
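The idea of offset-based decoding can be illustrated with a short sketch. It is written in Python to match the other snippets on this page rather than in Rust, and it assumes a little-endian base-128 layout with the high bit as a continuation flag; the crate's actual byte layout and the exact signature of `read_vlq_at` are not stated here, so treat both as assumptions.

```python
def read_vlq_at(buf, offset):
    """Decode one VLQ integer from buf, starting at offset.

    Returns (value, bytes_consumed). The buffer is never modified and no
    position is stored, so the caller keeps track of offsets itself.
    Assumed layout: 7 payload bits per byte, least significant group first,
    high bit set on every byte except the last.
    """
    value = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(buf):
            raise ValueError('truncated VLQ integer')
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos - offset
        shift += 7


data = bytes([0x05, 0xAC, 0x02, 0x7F])
first = read_vlq_at(data, 0)   # one-byte value
second = read_vlq_at(data, 1)  # two-byte value, read without mutating anything
third = read_vlq_at(data, 3)
print(first, second, third)
```

Because the function takes the offset as an argument and returns the number of bytes consumed, the same immutable buffer can be decoded from several positions, which is the property the `&mut Read` interface cannot offer.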

Reviewed By: markbt

Differential Revision: D7093998

fbshipit-source-id: 20cb14e38c828462c34f32245d0f0f512028b647
2018-04-13 21:51:19 -07:00
Jun Wu
e266793816 vlqencoding: add a benchmark
Summary:
I'm going to add more ways to do VLQ parsing (ex. reading from a `&[u8]`
instead of a `Read` which has to be mutable). So let's add a benchmark to
compare the `&[u8]` version with the `Read` version.

Reviewed By: DurhamG

Differential Revision: D7092960

fbshipit-source-id: e1189de10396516c732dc73b45b7690a1718f1c0
2018-04-13 21:51:19 -07:00
Jun Wu
f547ef9ed0 rust: vendor more crates
Summary:
criterion provides useful utilities for writing benchmarks.
fs2 provides cross-platform file locking.
memmap provides cross-platform mmap.
atomicwrites provides cross-platform atomic file rewrite.
twox-hash provides xxHash fast hash algorithm for integrity check usecase.

Reviewed By: singhsrb

Differential Revision: D7092764

fbshipit-source-id: a3a2a31c198e73701708d7124574ba447ab99c45
2018-04-13 21:51:19 -07:00
Jun Wu
c1bebda5d6 radixbuf: avoid using unstable features in buck build
Summary:
`test::Bencher` is an unstable feature, which is enabled by the 3rd-party crate
`rustc-test`. However, `rustc-test` does not work with buck build. So let's
work around that by allowing all usage of `test::Bencher` to be disabled by a
feature, and turn on that feature in buck build. Cargo build will remain
unchanged.

Reviewed By: singhsrb

Differential Revision: D7011703

fbshipit-source-id: e08ba9516bf7fadb6edb52ab107e0172df0aaf5b
2018-04-13 21:51:12 -07:00
Kostia Balytskyi
62ecc73818 hg: make sure platform_madvise_away returns -1 on Windows
Summary:
On the other two platforms we return the result of `madvise`, so let's return -1,
as this is the error return value of `madvise` on POSIX.

Reviewed By: quark-zju

Differential Revision: D6979093

fbshipit-source-id: 7c715eb459aaad6c21fae6e346e8650211649182
2018-04-13 21:51:11 -07:00
Kostia Balytskyi
c85791785b hg: build cdatapack on Windows
Summary: Seems to be working now.

Reviewed By: quark-zju

Differential Revision: D6970927

fbshipit-source-id: e67753d811819015282f47fcbdfbb263d85f054f
2018-04-13 21:51:10 -07:00
Kostia Balytskyi
5d1139f87d hg: move defines out of struct definition in cdatapack.c
Summary: The current location of these defines is really odd and does not work with the current version of the `PACKEDSTRUCT` macro expansion (it expands everything on the same line, therefore the `#define`s end up inline, which fails to compile).

Reviewed By: quark-zju

Differential Revision: D6970926

fbshipit-source-id: ed01042760fa729004e159b492cf67a4afd25923
2018-04-13 21:51:10 -07:00
Kostia Balytskyi
7d4f6a9033 hg: start using imported mman-win32 in the portability headers
Summary:
Let's create a new portability header, which can be used on both Windows and
POSIX.

Reviewed By: quark-zju

Differential Revision: D6970928

fbshipit-source-id: a3970c50260f52bfc0a9420a4ff11d93ace304b0
2018-04-13 21:51:10 -07:00
Kostia Balytskyi
67b2e1496a hg: vendor a third-party implementation of mman library for Windows
Summary: This is needed to make our C code compile on Windows.

Reviewed By: quark-zju

Differential Revision: D6970929

fbshipit-source-id: 2cfe46e0718fe75916912d0e59c5400038e03a12
2018-04-13 21:51:10 -07:00
Jun Wu
d942f5a88e hg: basic support for building hg using buck
Summary:
Adds some basic building blocks to build hg using buck.

Header files are cleaned up, so they are relative to the project root.

Some minor changes to C code are made to remove clang build
warnings.

Rust dependencies, fb-hgext C/Python dependencies (ex. cstore,
mysql-connector), and 3rd-party dependencies like python-lz4
are not built yet. But the built hg binary should be able to run
most tests just fine.

Reviewed By: wez

Differential Revision: D6814686

fbshipit-source-id: 59eefd5a3ad86db2ad1c821ed824c9f1878c93e4
2018-04-13 21:50:58 -07:00
Phil Cohen
c097dde0b9 READMEs: tweaks based on feedback
Summary: Based on feedback to D6687860.

Test Plan: n/a

Reviewers: durham, #mercurial

Reviewed By: durham

Differential Revision: https://phabricator.intern.facebook.com/D6714211

Signature: 6714211:1515788399:386b8f7330f343349234d1f317e5ac0a594142cf
2018-01-12 12:35:52 -08:00
Phil Cohen
bf8527e7a9 lib: add READMEs to lib, extlib, cext
2018-01-09 15:20:46 -08:00
Saurabh Singh
9da30944be cfastmanifest: move to hgext/extlib/
Summary:
Moves ctreemanifest into hgext/extlib/. D6679698 was committed to scratch branch
by mistake.

Test Plan: make local && cd tests && ./run-tests.py

Reviewers: durham, #mercurial, #sourcecontrol

Reviewed By: durham

Differential Revision: https://phabricator.intern.facebook.com/D6684623

Signature: 6684623:1515522634:9bec363d00990d9ff7d5f655e30ab8cae636155c
2018-01-09 10:36:54 -08:00
Durham Goode
228e6a901e cstore: move to hgext/extlib/
Summary: Moves cstore to hgext/extlib/ and makes it build.

Test Plan: make local && run-tests.py

Reviewers: #mercurial

Differential Revision: https://phabricator.intern.facebook.com/D6678852
2018-01-08 17:55:53 -08:00
Durham Goode
eb099b7fe1 cdatapack: move to lib/
Summary:
This moves the cdatapack code to the new lib/ directory and adds it to the main
setup.py.

Test Plan: hg purge --all && make local && cd tests && ./run-tests.py -S -j 48

Reviewers: #mercurial

Differential Revision: https://phabricator.intern.facebook.com/D6677491
2018-01-08 17:55:53 -08:00
Jun Wu
1a84c9d5db linelog: format the code using clang-format
Summary:
I didn't notice the test failure because clang-format was not installed.
Might be a good idea to make it a hard error.

Test Plan: Run test-check-clang-format.t

Reviewers: phillco, #mercurial

Reviewed By: phillco

Subscribers: mathieubaudet

Differential Revision: https://phabricator.intern.facebook.com/D6679576

Signature: 6679576:1515457526:6b1935858da284b896244b0d99e2fef03ead97b8
2018-01-08 16:22:30 -08:00
Jun Wu
1802036ff3 linelog: move to lib/ and mercurial/cyext
Summary:
The `lib/linelog` directory contains pure C code that is unrelated to
either Mercurial or Python. The `mercurial/cyext` directory contains Cython
extension code (although in linelog's case, the Cython extension is unrelated
to Mercurial).

Cython is now a hard dependency to simplify the code.

Test Plan: `make local` and check `from mercurial.cyext import linelog` works.

Reviewers: durham, #mercurial

Reviewed By: durham

Subscribers: durham, fried

Differential Revision: https://phabricator.intern.facebook.com/D6678541

Signature: 6678541:1515455512:967266dc69c702dbff95fdea05671e11c32ebf28
2018-01-08 14:35:01 -08:00
Mark Thomas
2e81565606 fb-hgext: integrate rust libraries and extensions with setup.py
Summary:
Move the rust libraries and extensions to their new locations, and integrate
them with the hg-crew setup.py.

Test Plan: Run `python setup.py build` and verify rust extensions are built.

Reviewers: durham, #mercurial

Reviewed By: durham

Subscribers: fried, jsgf, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D6677251

Tasks: T24908724

Signature: 6677251:1515450235:920faf40babbce9b09e3283ff9ca328d1c5c51e6
2018-01-08 15:26:24 -08:00
Durham Goode
0938fe19a3 clib: move fb-hgext/clib/ to lib
Summary:
cdatapack depends on clib, so let's move it to lib/ outside of fb-hgext.

None of the consumers of these files were changed. They will be changed as they
are moved into the main part of the repo.

Test Plan: hg purge --all && make local && cd tests && ./run-tests.py -S -j 48

Reviewers: mitrandir, #mercurial

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D6677197

Signature: 6677197:1515447873:399fb3e7beb5cc1ad8db18f42b359ffbfbeb21f2
2018-01-08 15:08:18 -08:00
Durham Goode
1ab0bb112d sha1: add sha1detectcoll library to setup.py
Summary:
cdatapack depends on sha1detectcoll, so let's add the library to setup.py before
we add cdatapack.

Test Plan:
hg purge --all && make local && cd tests/ && ./run-tests.py -S -j 48

Verified sha1dc was in the build output and the tests passed.

Reviewers: quark, #mercurial

Reviewed By: quark

Differential Revision: https://phabricator.intern.facebook.com/D6676405

Signature: 6676405:1515444508:2da65c6c3a18267a1d3c151c8e9acf60b674ffc2
2018-01-08 12:54:57 -08:00