sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-09 16:31:02 +03:00

Author	SHA1	Message	Date
Pulkit Goyal	6f863f0fc6	test-bdiff: move import inside the function to avoid test failure test-check-module-imports.t fails on some systems where the path of home directories is different than sys.prefix and sys.exec_prefix. Importing silenttestrunner inside the function helps avoid that failure.	2017-02-14 01:52:16 +05:30
Augie Fackler	74b72bf255	tests: fix test-bdiff to handle variance between pure and c bdiff code Obviously we'd rather patch pure to have the same algorithmic win as the C code, but this is a quick fix for the pure build since pure isn't wrong, just not as fast as it could be.	2016-12-15 11:14:00 -05:00
Augie Fackler	7615fa0f90	tests: finish updating test-bdiff to unittest (part 4 of 4)	2016-12-15 11:04:09 -05:00
Augie Fackler	973a0b2065	tests: update more of test-bdiff.py to use unittest (part 3 of 4)	2016-12-15 10:56:26 -05:00
Augie Fackler	d751e0686d	tests: update more of test-bdiff.py to use unittest (part 2 of 4)	2016-12-15 10:50:06 -05:00
Augie Fackler	133f8468d3	tests: migrate test-bdiff.py to use unittest (part 1 of 4) This moves all the test() calls, which were easy and mechanical.	2016-12-15 10:10:15 -05:00
Mads Kiilerich	b3ae903762	bdiff: give slight preference to removing trailing lines [This change could be folded into the previous changeset to minimize the repo churn ...] Similar to the previous change, introduce an exception to the general preference for matches in the middle of bdiff ranges: If the best match on the B side starts at the beginning of the bdiff range, don't aim for the middle-most A side match but for the earliest. New (later) matches on the A side will only be considered better if the corresponding match on the B side is not at the beginning of the range. Thus, if the best (middle-most) match on the B side turns out to be at the beginning of the range, the earliest match on the A side will be used. The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22807275 to 22808120 bytes - a 0.004% increase.	2016-11-15 21:56:49 +01:00
Mads Kiilerich	7eb5c806da	bdiff: give slight preference to appending lines [This change could be folded into the previous changeset to minimize the repo churn ...] The general preference for matches in the middle of bdiff ranges helps get balanced recursion and efficient computation. But, as previous changes have shown, it might also give diffs that seem "obviously wrong". To mitigate that: If the best match on the A side starts at the beginning of the bdiff range, don't aim for the middle-most B side match but for the earliest. This will make the matches balanced (by both sides being "early") even though the bisection will be less balanced. Still, this case only applies if the best and middle-most match was fully unbalanced on the A side. Each recursion will thus, even in this worst case, reduce the problem significantly and we are not re-introducing the problem that was fixed in d3deb406b55b. The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22806817 to 22807275 bytes - a 0.002% increase. This makes the recent test-bdiff.py changes give prettier output ... but they no longer show that the recursion is around middle matches (because in these cases it isn't).	2016-11-15 21:56:49 +01:00
Mads Kiilerich	b5feb5a49b	bdiff: give slight preference to longest matches in the middle of the B side We already have a slight preference for matches close to the middle on the A side. Now, do the same on the B side. j is iterating the b range backwards and we thus accept a new j if the previous match was in the upper half. This makes the test-bhalf diff "correct". It obviously also gives more preference to balanced recursion than to appending to sequences. That is kind of correct, but will also unfortunately make some bundles bigger. No doubt, we can also create examples where it will make them smaller ... The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22803824 to 22806817 bytes - a 0.01% increase.	2016-11-08 18:37:33 +01:00
Mads Kiilerich	3f0e3f18d8	bdiff: adjust criteria for getting optimal longest match in the A side middle We prefer matches closer to the middle to balance recursion, as introduced in d3deb406b55b. For ranges with uneven length, matches starting exactly in the middle should have preference. That will be optimal for matches of length 1. We will thus accept equality in the half check. For ranges with even length, half was ceil'ed when calculated but we got the preference for low matches from the 'less than half' check. To get the same result as before when we also accept equality, floor it. Without that, test-annotate.t would show some different (still correct but less optimal) results. This will change the heuristics. Tests show slightly different output - and sometimes slightly smaller bundles. The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from 22804885 to 22803824 bytes - a 0.005% reduction.	2016-11-08 18:37:33 +01:00
Mads Kiilerich	9d1edc2d4f	tests: explore some bdiff cases	2016-11-08 18:37:33 +01:00
Mads Kiilerich	f19d3ccfaf	tests: make test-bdiff.py easier to maintain Add more stdout logging to help navigate the .out file.	2016-11-15 21:56:49 +01:00
Matt Mackall	f33142790b	bdiff: deal better with duplicate lines The longest_match code compares all the possible positions in two files to find the best match. Given a pair of sequences, it effectively searches a grid like this:
  a b b b c . d e . f
  0 1 2 3 4 5 6 7 8 9
a 1 - - - - - - - - -
b - 2 1 1 - - - - - -
b - 1 3 2 - - - - - -
b - 1 2 4 - - - - - -
. - - - - - 1 - - 1 -
Here, the 4 in the middle says "the first four lines of the file match", which it can compute by comparing the fourth lines and then adding one to the result found when comparing the third lines in the entry to the upper left. We generally avoid the quadratic worst case by only looking at lines that match, which is precomputed. We also avoid quadratic storage by only keeping a single column vector and then keeping track of the best match. Unfortunately, this can get us into trouble with the sequences above. Because we want to reuse the '3' value when calculating the '4', we need to be careful not to overwrite it with the '2' we calculate immediately before. If we scan left to right, top to bottom, we're going to have a problem: we'll overwrite our 3 before we use it and calculate a suboptimal best match. To address this, we can either keep two column vectors and swap between them (which significantly complicates bookkeeping), or change our scanning order. If we instead scan from left to right, bottom to top, we'll avoid ever overwriting values we'll need in the future. This unfortunately needs several changes to be made simultaneously:
- change the order we build the initial hash chains for the b sequence
- change the sentinel values from INT_MAX to -1
- change the visit order in the longest_match inner loop
- add a tie-breaker preference for earlier matches
This last is needed because we previously had an implicit tie-breaker from our visitation order that our test suite relies on. Later matches can also trigger a bug in the normalization code in diff().	2016-04-21 21:05:26 -05:00
Robert Stanca	4687d363df	py3: use print_function in test-bdiff.py	2016-04-03 06:16:17 +03:00
Robert Stanca	f0f5c61f68	py3: use absolute_import in test-bdiff.py	2016-04-03 06:12:18 +03:00
Patrick Mezard	c4cef76e25	mdiff: replace wscleanup() regexps with C loops On my system it reduces: hg annotate -w mercurial/commands.py from 36s to less than 8s, to be compared with 6.3s when run without whitespace options.	2011-11-18 14:23:03 +01:00
Dan Villiom Podlaski Christiansen	f385faac7a	*: kill all unnecessary shebangs.	2010-10-26 12:18:39 +02:00
Martin Geisler	58e8fb6277	removed unused imports	2009-05-30 23:20:30 +02:00
Martin Geisler	aedc0ac57f	tests: removed unnecessary execute bit on Python tests	2009-05-17 01:42:21 +02:00
Martin Geisler	a645da9ed1	tests: renamed Python tests to .py	2009-05-17 01:39:31 +02:00

20 Commits
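
The scanning-order problem described in commit f33142790b ("bdiff: deal better with duplicate lines") can be illustrated with a small sketch. This is not the C code in bdiff; it is a simplified Python model, assuming a plain dynamic-programming formulation with one in-place vector, and the names `longest_match`, `run` and `descending` are chosen here for illustration only. It visits every cell of the grid rather than only the precomputed matching lines, so it shows only the overwrite hazard, not the performance work.

```python
def longest_match(a, b, descending=True):
    """Return (length, end_a, end_b) of the longest common run of items.

    run[j] holds the length of the match ending at b[j - 1] for the
    previous item of a, and is updated in place for the current item.
    end_a and end_b are 1-based positions of the last matching item.
    """
    run = [0] * (len(b) + 1)  # run[0] stays 0 and acts as a sentinel
    best = (0, 0, 0)
    for i, item in enumerate(a, 1):
        if descending:
            # Visit b from its end: run[j - 1] still belongs to the
            # previous item of a when it is read.
            order = range(len(b), 0, -1)
        else:
            # Visit b from its start: run[j - 1] has already been
            # overwritten for the current item, which is the hazard.
            order = range(1, len(b) + 1)
        for j in order:
            if b[j - 1] == item:
                run[j] = run[j - 1] + 1
                if run[j] > best[0]:
                    best = (run[j], i, j)
            else:
                run[j] = 0
    return best


# The two sequences from the commit message.
a = list("abbbc.de.f")
b = list("abbb.")
print(longest_match(a, b))                    # (4, 4, 4)
print(longest_match(a, b, descending=False))  # length comes out as 3
```

In this model the ascending order misses the four-line match on the sequences above, and on other inputs with duplicate lines it can also over-count. The commit reports a suboptimal best match in the real code, where the bookkeeping differs; the shared point is that the value to the upper left must be read before it is overwritten.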