Commit Graph

20 Commits

Author SHA1 Message Date
Pulkit Goyal
6f863f0fc6 test-bdiff: move import inside the function to avoid test failure
test-check-module-imports.t fails on some systems where the home directory
path differs from sys.prefix and sys.exec_prefix.
Importing silenttestrunner inside the function avoids that failure.
2017-02-14 01:52:16 +05:30
Augie Fackler
74b72bf255 tests: fix test-bdiff to handle variance between pure and c bdiff code
Obviously we'd rather patch pure to have the same algorithmic win as
the C code, but this is a quick fix for the pure build since pure
isn't wrong, just not as fast as it could be.
2016-12-15 11:14:00 -05:00
Augie Fackler
7615fa0f90 tests: finish updating test-bdiff to unittest (part 4 of 4) 2016-12-15 11:04:09 -05:00
Augie Fackler
973a0b2065 tests: update more of test-bdiff.py to use unittest (part 3 of 4) 2016-12-15 10:56:26 -05:00
Augie Fackler
d751e0686d tests: update more of test-bdiff.py to use unittest (part 2 of 4) 2016-12-15 10:50:06 -05:00
Augie Fackler
133f8468d3 tests: migrate test-bdiff.py to use unittest (part 1 of 4)
This moves all the test() calls, which were easy and mechanical.
2016-12-15 10:10:15 -05:00
Mads Kiilerich
b3ae903762 bdiff: give slight preference to removing trailing lines
[This change could be folded into the previous changeset to minimize the repo
churn ...]

Similar to the previous change, introduce an exception to the general
preference for matches in the middle of bdiff ranges: If the best match on the
B side starts at the beginning of the bdiff range, don't aim for the
middle-most A side match but for the earliest.

New (later) matches on the A side will only be considered better if the
corresponding match on the B side is *not* at the beginning of the range.
Thus, if the best (middle-most) match on the B side turns out to be at the
beginning of the range, the earliest match on the A side will be used.

The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from
22807275 to 22808120 bytes - a 0.004% increase.
2016-11-15 21:56:49 +01:00
Mads Kiilerich
7eb5c806da bdiff: give slight preference to appending lines
[This change could be folded into the previous changeset to minimize the repo
churn ...]

The general preference for matches in the middle of bdiff ranges helps get
balanced recursion and efficient computation. But, as previous changes have
shown, it might also give diffs that seem "obviously wrong".

To mitigate that: If the best match on the A side starts at the beginning of
the bdiff range, don't aim for the middle-most B side match but for the
earliest.

This will make the matches balanced (by both sides being "early") even though
the bisection will be less balanced. Still, this case only applies if the *best*
and middle-most match was fully unbalanced on the A side. Each recursion will
thus even in this worst case reduce the problem significantly and we are not
re-introducing the problem that was fixed in d3deb406b55b.

The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from
22806817 to 22807275 bytes - a 0.002% increase.

This makes the recent test-bdiff.py changes give prettier output ... but
they no longer show that the recursion is around middle matches (because in
these cases it isn't).
2016-11-15 21:56:49 +01:00
Mads Kiilerich
b5feb5a49b bdiff: give slight preference to longest matches in the middle of the B side
We already have a slight preference for matches close to the middle on the A
side. Now, do the same on the B side.

j is iterating the b range backwards and we thus accept a new j if the previous
match was in the upper half.

This makes the test-bhalf diff "correct". It obviously also gives more
preference to balanced recursion than to appending to sequences. That is kind
of correct, but will also unfortunately make some bundles bigger. No doubt, we
can also create examples where it will make them smaller ...

The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from
22803824 to 22806817 bytes - a 0.01% increase.
2016-11-08 18:37:33 +01:00
Mads Kiilerich
3f0e3f18d8 bdiff: adjust criteria for getting optimal longest match in the A side middle
We prefer matches closer to the middle to balance recursion, as introduced in
d3deb406b55b.

For ranges with odd length, matches starting exactly in the middle should
have preference. That will be optimal for matches of length 1. We will thus
accept equality in the half check.

For ranges with even length, half was ceil'ed when calculated but we got the
preference for low matches from the 'less than half' check. To get the same
result as before when we also accept equality, floor it. Without that,
test-annotate.t would show some different (still correct but less optimal)
results.
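
The rounding argument above can be checked with a small sketch. This is an
illustration only, not the C code from the change: the helper names and the
half-open range [a1, a2) are assumptions made for the example.

```python
def half_ceil(a1, a2):
    # Assumed "before": midpoint of the index range a1..a2-1, rounded up.
    return (a1 + a2) // 2


def half_floor(a1, a2):
    # Assumed "after": midpoint of the same range, rounded down.
    return (a1 + a2 - 1) // 2


def preferred_before(i, a1, a2):
    # Strict "less than half" check against the ceil'ed midpoint.
    return i < half_ceil(a1, a2)


def preferred_after(i, a1, a2):
    # Check that accepts equality, against the floored midpoint.
    return i <= half_floor(a1, a2)


def preferred_positions(check, a1, a2):
    # All start positions in the range that the given check prefers.
    return [i for i in range(a1, a2) if check(i, a1, a2)]


# Even length (0..3): both variants prefer the same positions.
even_before = preferred_positions(preferred_before, 0, 4)
even_after = preferred_positions(preferred_after, 0, 4)

# Odd length (0..4): only the new check also prefers the exact middle, 2.
odd_before = preferred_positions(preferred_before, 0, 5)
odd_after = preferred_positions(preferred_after, 0, 5)
```

Under these assumptions, even-length ranges keep their old behaviour, while
odd-length ranges additionally prefer a match starting exactly in the middle.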

This will change the heuristics. Tests show a slightly different output - and
sometimes slightly smaller bundles.

The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from
22804885 to 22803824 bytes - a 0.005% reduction.
2016-11-08 18:37:33 +01:00
Mads Kiilerich
9d1edc2d4f tests: explore some bdiff cases 2016-11-08 18:37:33 +01:00
Mads Kiilerich
f19d3ccfaf tests: make test-bdiff.py easier to maintain
Add more stdout logging to help navigate the .out file.
2016-11-15 21:56:49 +01:00
Matt Mackall
f33142790b bdiff: deal better with duplicate lines
The longest_match code compares all the possible positions in two
files to find the best match. Given a pair of sequences, it
effectively searches a grid like this:

  a b b b c . d e . f
  0 1 2 3 4 5 6 7 8 9
a 1 - - - - - - - - -
b - 2 1 1 - - - - - -
b - 1 3 2 - - - - - -
b - 1 2 4 - - - - - -
. - - - - - 1 - - 1 -


Here, the 4 in the middle says "the first four lines of the
file match", which it can compute by comparing the fourth lines and
then adding one to the result found when comparing the third lines in
the entry to the upper left.

We generally avoid the quadratic worst case by only looking at lines
that match, which is precomputed. We also avoid quadratic storage by
only keeping a single column vector and then keeping track of the best
match.

Unfortunately, this can get us into trouble with the sequences above.
Because we want to reuse the '3' value when calculating the '4', we
need to be careful not to overwrite it with the '2' we calculate
immediately before. If we scan left to right, top to bottom, we're
going to have a problem: we'll overwrite our 3 before we use it and
calculate a suboptimal best match.

To address this, we can either keep two column vectors and swap
between them (which significantly complicates bookkeeping), or change
our scanning order. If we instead scan from left to right, bottom to
top, we'll avoid ever overwriting values we'll need in the future.

This unfortunately needs several changes to be made simultaneously:

- change the order we build the initial hash chains for the b sequence
- change the sentinel values from INT_MAX to -1
- change the visit order in the longest_match inner loop
- add a tie-breaker preference for earlier matches

This last is needed because we previously had an implicit tie-breaker
from our visitation order that our test suite relies on. Later matches
can also trigger a bug in the normalization code in diff().
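
The scanning-order problem described above can be reproduced with a simplified
Python sketch. It is not the C implementation: the function name, the
bottom_up flag, and the two parallel lists are assumptions made for
illustration. It keeps one entry per row of the second sequence, visits only
rows whose line matches, and uses -1 as the "never set" sentinel.

```python
def longest_match(a, b, bottom_up=True):
    """Return (start_a, start_b, length) of the longest common run.

    Simplified illustration: a single vector over the rows of b,
    columns of a scanned left to right.
    """
    # Precomputed index: for each line, the rows of b that contain it,
    # ordered bottom to top (or top to bottom to show the problem).
    rows = {}
    order = range(len(b) - 1, -1, -1) if bottom_up else range(len(b))
    for j in order:
        rows.setdefault(b[j], []).append(j)

    col = [-1] * len(b)    # column in which row j was last written (-1: never)
    length = [0] * len(b)  # run length recorded for row j in that column
    best_len, best_i, best_j = 0, 0, 0

    for i, line in enumerate(a):
        for j in rows.get(line, ()):
            k = 1
            # Extend the run from the upper-left cell, but only if that
            # entry still belongs to the previous column.
            if i > 0 and j > 0 and col[j - 1] == i - 1:
                k = length[j - 1] + 1
            col[j] = i
            length[j] = k
            # Tie-breaker: among equally long runs, keep the earlier one.
            if k > best_len or (k == best_len and i == best_i and j < best_j):
                best_len, best_i, best_j = k, i, j

    return (best_i - best_len + 1, best_j - best_len + 1, best_len)


a = list("abbbc.de.f")
b = list("abbb.")
print(longest_match(a, b))                   # (0, 0, 4)
print(longest_match(a, b, bottom_up=False))  # a shorter run is reported
```

With bottom_up=False, the entry for the row above has already been rewritten
for the current column by the time it is needed, so the run is restarted at 1
and the four-line match is missed.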
2016-04-21 21:05:26 -05:00
Robert Stanca
4687d363df py3: use print_function in test-bdiff.py 2016-04-03 06:16:17 +03:00
Robert Stanca
f0f5c61f68 py3: use absolute_import in test-bdiff.py 2016-04-03 06:12:18 +03:00
Patrick Mezard
c4cef76e25 mdiff: replace wscleanup() regexps with C loops
On my system it reduces:

  hg annotate -w mercurial/commands.py

from 36s to less than 8s, to be compared with 6.3s when run without whitespace
options.
2011-11-18 14:23:03 +01:00
Dan Villiom Podlaski Christiansen
f385faac7a *: kill all unnecessary shebangs. 2010-10-26 12:18:39 +02:00
Martin Geisler
58e8fb6277 removed unused imports 2009-05-30 23:20:30 +02:00
Martin Geisler
aedc0ac57f tests: removed unnecessary execute bit on Python tests 2009-05-17 01:42:21 +02:00
Martin Geisler
a645da9ed1 tests: renamed Python tests to .py 2009-05-17 01:39:31 +02:00