Summary:
Patience diff is the normal diff algorithm, plus some greediness that
unconditionally matches common unique lines. That makes it easy to
construct cases where it generates a suboptimal result, like:
```
open('a', 'w').write('\n'.join(list('a' + 'x' * 300 + 'u' + 'x' * 700 + 'a\n')))
open('b', 'w').write('\n'.join(list('b' + 'x' * 700 + 'u' + 'x' * 300 + 'b\n')))
```
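To illustrate why unconditional anchoring on unique lines hurts here, the
following is a minimal sketch, not the patience diff code being removed. It
assumes a simplified scheme (lines unique in both inputs become anchors, kept
in longest increasing order, with a plain LCS in the gaps) and a scaled-down
version of the inputs above. The helper names `lcs_len`,
`increasing_anchors` and `anchored_match_len` are hypothetical.

```python
from collections import Counter


def lcs_len(a, b):
    """Length of the longest common subsequence (classic O(n*m) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def increasing_anchors(pairs):
    """Longest subsequence of (i, j) pairs with increasing j (O(k^2) DP).

    `pairs` must already be sorted by i.
    """
    if not pairs:
        return []
    best = [1] * len(pairs)
    prev = [-1] * len(pairs)
    for k in range(len(pairs)):
        for m in range(k):
            if pairs[m][1] < pairs[k][1] and best[m] + 1 > best[k]:
                best[k] = best[m] + 1
                prev[k] = m
    k = max(range(len(pairs)), key=best.__getitem__)
    out = []
    while k != -1:
        out.append(pairs[k])
        k = prev[k]
    return out[::-1]


def anchored_match_len(a, b):
    """Matched line count when unique common lines are matched first."""
    count_a, count_b = Counter(a), Counter(b)
    pos_b = {x: j for j, x in enumerate(b)}
    pairs = [(i, pos_b[x]) for i, x in enumerate(a)
             if count_a[x] == 1 and count_b[x] == 1]
    total = 0
    start_a = start_b = 0
    # A sentinel pair closes the final gap after the last anchor.
    for i, j in increasing_anchors(pairs) + [(len(a), len(b))]:
        total += lcs_len(a[start_a:i], b[start_b:j])
        if i < len(a):
            total += 1  # the anchor line itself
        start_a, start_b = i + 1, j + 1
    return total


# Scaled-down inputs: 3 and 7 'x' lines instead of 300 and 700.
a = ['a'] + ['x'] * 3 + ['u'] + ['x'] * 7 + ['a']
b = ['b'] + ['x'] * 7 + ['u'] + ['x'] * 3 + ['b']
print(lcs_len(a, b))             # 10: every 'x' line is matched, 'u' is not
print(anchored_match_len(a, b))  # 7: anchoring on 'u' splits the 'x' runs
```

Forcing the match on `u` leaves only the shorter run of `x` lines matchable
on each side of it, so fewer lines are matched and the diff grows.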
Patience diff has been advertised as being able to generate better results for
some C code changes. However, the more scientific way to do that is the
indentation heuristic [1].
Since patience diff can generate suboptimal results more easily and its
"better" diff feature can be replaced by the new indentation heuristic, let's
just remove it and its variant histogram diff to simplify the code.
[1]: 433860f3d0
Reviewed By: ryanmce
Differential Revision: D7124711
fbshipit-source-id: 127e8de6c75d0262687a1b60814813e660aae3da
Summary:
Vendor git's xdiff library from git commit
d7c6c2369ac21141b7c6cceaebc6414ec3da14ad using GPL2+ license.
There is another recent user report that hg diff generates a suboptimal
result. It seems the fix to issue4074 isn't good enough. I crafted some
other interesting cases, and hg diff barely has any advantage compared with
GNU diffutils or git diff.
| testcase | gnu diffutils | hg diff       | git diff    |
|          | lines    time |   lines  time | lines  time |
| patience |     6    0.00 |     602  0.08 |     6  0.00 |
| random   | 91772    0.90 |  109462  0.70 | 91772  0.24 |
| json     |     2    0.03 | 1264814  1.81 |     2  0.29 |
"lines" means the size of the output, i.e. the count of "+/-" lines. "time"
means the seconds needed to do the calculation. Smaller is better for both.
"hg diff" time includes Python startup overhead.
Git and GNU diffutils generate optimal results. For the "json" case, git could
add an optimization that first scans for a common prefix and suffix, and
matches them if their length is greater than half of the text. See
https://neil.fraser.name/news/2006/03/12/. That would make git the fastest
in all of the above cases.
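The prefix/suffix scan can be sketched as follows. This is an illustration
only, not code from git or xdiff. The names `common_affixes` and
`worth_trimming` are hypothetical, and the threshold check is one possible
reading of "greater than half of the text".

```python
def common_affixes(a, b):
    """Return (prefix_len, suffix_len) of lines shared by a and b.

    The suffix is not allowed to overlap the prefix.
    """
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and a[len(a) - 1 - suffix] == b[len(b) - 1 - suffix]):
        suffix += 1
    return prefix, suffix


def worth_trimming(a, b, prefix, suffix):
    """Assumed rule: trim only if the affixes cover over half the longer side."""
    return (prefix + suffix) * 2 > max(len(a), len(b))


# A single line moved near the end, as in the "json" testcase (scaled down).
a = ["line%d" % i for i in range(10)]
b = a[:7] + [a[8], a[7], a[9]]
prefix, suffix = common_affixes(a, b)
print(prefix, suffix)  # 7 1
if worth_trimming(a, b, prefix, suffix):
    # Only these small middle slices need the expensive diff algorithm.
    print(a[prefix:len(a) - suffix], b[prefix:len(b) - suffix])
```

For a large file with one moved line, nearly all lines fall into the prefix
or suffix, so the remaining diff problem is tiny.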
About testcases:
patience:
Targets the weakness of the greedy "patience diff" algorithm. Git's patience
diff option also produces a suboptimal result on it. Generated using
the Python script:
```
open('a', 'w').write('\n'.join(list('a' + 'x' * 300 + 'u' + 'x' * 700 + 'a\n')))
open('b', 'w').write('\n'.join(list('b' + 'x' * 700 + 'u' + 'x' * 300 + 'b\n')))
```
random:
Generated using the script in `test-issue4074.t`. It practically makes the
algorithm suffer. Impressively, git wins in both performance and diff
quality.
json:
The recently user-reported case. It is a single-line movement near the end of a
very large (800K lines) JSON file.
Reviewed By: ryanmce
Differential Revision: D7124455
fbshipit-source-id: 832651115da770f9d2ed5fdff2e200453c0013f8
Summary:
This allows us to decode VLQ integers at a given offset, for anything that
implements `AsRef<[u8]>`, instead of having to couple with a `&mut Read`
interface. The main benefit is getting rid of `mut`. The old `VLQDecode`
interface has to use `&mut Read` since reading has the side effect of changing
the internal position counter.
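The idea of offset-based decoding can be sketched in Python, independent of
the Rust API in this change. The sketch assumes a little-endian base-128
encoding where the high bit of each byte marks continuation. That wire
format, the function name `decode_vlq_at` and its return shape are
assumptions for illustration.

```python
def decode_vlq_at(buf, offset):
    """Decode one VLQ integer from `buf` starting at `offset`.

    Returns (value, bytes_consumed). Nothing is mutated: the caller tracks
    the position itself instead of relying on a reader's internal cursor.
    """
    value = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(buf):
            raise ValueError("truncated VLQ integer")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:  # high bit clear: last byte of this integer
            return value, pos - offset
        shift += 7


buf = bytes([0x05, 0xAC, 0x02])
first, used = decode_vlq_at(buf, 0)
second, _ = decode_vlq_at(buf, used)
print(first, second)  # 5 300
```

Because the buffer is only read, the same data can be decoded at several
offsets without any shared mutable state.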
Reviewed By: markbt
Differential Revision: D7093998
fbshipit-source-id: 20cb14e38c828462c34f32245d0f0f512028b647
Summary:
I'm going to add more ways to do VLQ parsing (ex. reading from a `&[u8]`
instead of a `Read` which has to be mutable). So let's add a benchmark to
compare the `&[u8]` version with the `Read` version.
Reviewed By: DurhamG
Differential Revision: D7092960
fbshipit-source-id: e1189de10396516c732dc73b45b7690a1718f1c0
Summary:
`test::Bencher` is an unstable feature, which is enabled by the 3rd-party crate
`rustc-test`. However, `rustc-test` does not work with buck build. So let's
work around that by allowing all usage of `test::Bencher` to be disabled by a
feature, and turn on that feature in buck build. Cargo build will remain
unchanged.
Reviewed By: singhsrb
Differential Revision: D7011703
fbshipit-source-id: e08ba9516bf7fadb6edb52ab107e0172df0aaf5b
Summary:
On the other two platforms we return the result of `madvise`, so let's return -1,
as this is the error return value of `madvise` on POSIX.
Reviewed By: quark-zju
Differential Revision: D6979093
fbshipit-source-id: 7c715eb459aaad6c21fae6e346e8650211649182
Summary: The current location of these defines is really odd and does not work with the current version of the `PACKEDSTRUCT` macro expansion (it expands everything on the same line, so the `#define`s end up inline, which fails to compile).
Reviewed By: quark-zju
Differential Revision: D6970926
fbshipit-source-id: ed01042760fa729004e159b492cf67a4afd25923
Summary:
Let's create a new portability header, which can be used on both Windows and
POSIX.
Reviewed By: quark-zju
Differential Revision: D6970928
fbshipit-source-id: a3970c50260f52bfc0a9420a4ff11d93ace304b0
Summary: This is needed to make our C code compile on Windows.
Reviewed By: quark-zju
Differential Revision: D6970929
fbshipit-source-id: 2cfe46e0718fe75916912d0e59c5400038e03a12
Summary:
Adds some basic building blocks to build hg using buck.
Header files are cleaned up, so they are relative to the project root.
Some minor changes to C code are made to remove clang build
warnings.
Rust dependencies, fb-hgext C/Python dependencies (ex. cstore,
mysql-connector), and 3rd-party dependencies like python-lz4
are not built yet. But the built hg binary should be able to run
most tests just fine.
Reviewed By: wez
Differential Revision: D6814686
fbshipit-source-id: 59eefd5a3ad86db2ad1c821ed824c9f1878c93e4
Summary: Based on feedback to D6687860.
Test Plan: n/a
Reviewers: durham, #mercurial
Reviewed By: durham
Differential Revision: https://phabricator.intern.facebook.com/D6714211
Signature: 6714211:1515788399:386b8f7330f343349234d1f317e5ac0a594142cf
Summary:
Moves ctreemanifest into hgext/extlib/. D6679698 was committed to scratch branch
by mistake.
Test Plan: make local && cd tests && ./run-tests.py
Reviewers: durham, #mercurial, #sourcecontrol
Reviewed By: durham
Differential Revision: https://phabricator.intern.facebook.com/D6684623
Signature: 6684623:1515522634:9bec363d00990d9ff7d5f655e30ab8cae636155c
Summary:
This moves the cdatapack code to the new lib/ directory and adds it to the main
setup.py.
Test Plan: hg purge --all && make local && cd tests && ./run-tests.py -S -j 48
Reviewers: #mercurial
Differential Revision: https://phabricator.intern.facebook.com/D6677491
Summary:
I didn't notice the test failure because clang-format was not installed.
Might be a good idea to make it a hard error.
Test Plan: Run test-check-clang-format.t
Reviewers: phillco, #mercurial
Reviewed By: phillco
Subscribers: mathieubaudet
Differential Revision: https://phabricator.intern.facebook.com/D6679576
Signature: 6679576:1515457526:6b1935858da284b896244b0d99e2fef03ead97b8
Summary:
The `lib/linelog` directory contains pure C code that is unrelated to
either Mercurial or Python. The `mercurial/cyext` directory contains Cython
extension code (although in linelog's case, the Cython extension is unrelated
to Mercurial).
Cython is now a hard dependency to simplify the code.
Test Plan: `make local` and check `from mercurial.cyext import linelog` works.
Reviewers: durham, #mercurial
Reviewed By: durham
Subscribers: durham, fried
Differential Revision: https://phabricator.intern.facebook.com/D6678541
Signature: 6678541:1515455512:967266dc69c702dbff95fdea05671e11c32ebf28
Summary:
Move the rust libraries and extensions to their new locations, and integrate
them with the hg-crew setup.py.
Test Plan: Run `python setup.py build` and verify rust extensions are built.
Reviewers: durham, #mercurial
Reviewed By: durham
Subscribers: fried, jsgf, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D6677251
Tasks: T24908724
Signature: 6677251:1515450235:920faf40babbce9b09e3283ff9ca328d1c5c51e6
Summary:
cdatapack depends on clib, so let's move it to lib/ outside of fb-hgext.
None of the consumers of these files were changed. They will be changed as they
are moved into the main part of the repo.
Test Plan: hg purge --all && make local && cd tests && ./run-tests.py -S -j 48
Reviewers: mitrandir, #mercurial
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D6677197
Signature: 6677197:1515447873:399fb3e7beb5cc1ad8db18f42b359ffbfbeb21f2
Summary:
cdatapack depends on sha1detectcoll, so let's add the library to setup.py before
we add cdatapack.
Test Plan:
hg purge --all && make local && cd tests/ && ./run-tests.py -S -j 48
Verified sha1dc was in the build output and the tests passed.
Reviewers: quark, #mercurial
Reviewed By: quark
Differential Revision: https://phabricator.intern.facebook.com/D6676405
Signature: 6676405:1515444508:2da65c6c3a18267a1d3c151c8e9acf60b674ffc2