Commit Graph

33 Commits

Author SHA1 Message Date
Augie Fackler
e82e3401a1 tests: handle variation between pure and normal output in annotate --skip
I'm pretty sure that both results are valid, depending on how you
slice the edits.
2017-06-10 10:46:06 -04:00
Yuya Nishihara
f1b986c603 annotate: restructure formatter output to be nested list (BC)
Annotate data should be in [(file, [line...])...] form, but there was no
API to represent such data structure when I ported it to formatter. Now
we have fm.nested() and the -T option is still experimental, so we can fix
the data format.
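
As a rough illustration of the [(file, [line...])...] shape described above, the following sketch groups a flat list of per-line records into one entry per file. The field names ("file", "rev", "line") and the nest() helper are hypothetical, chosen only for this example; they are not taken from Mercurial's formatter.

```python
# Hypothetical illustration only: the field names and the nest() helper
# are invented for this sketch and are not Mercurial's formatter API.

# Flat form: every line of every file packed into a single list.
flat = [
    {"file": "a.txt", "rev": 0, "line": "first\n"},
    {"file": "a.txt", "rev": 2, "line": "second\n"},
    {"file": "b.txt", "rev": 1, "line": "only\n"},
]


def nest(flat_lines):
    """Group flat per-line records into [(file, [line...])...] form."""
    nested = []
    for rec in flat_lines:
        # Drop the file name from the per-line record; it moves to the outer tuple.
        line = {k: v for k, v in rec.items() if k != "file"}
        if nested and nested[-1][0] == rec["file"]:
            nested[-1][1].append(line)
        else:
            nested.append((rec["file"], [line]))
    return nested


nested = nest(flat)
```

In the nested form a consumer can iterate over files first and lines second, which the flat list cannot express without repeating the file name on every line.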
2017-06-03 00:25:24 +09:00
Siddharth Agarwal
51c7d417b5 annotate: add a new experimental --skip option to skip revs
This option is most useful for mechanical code modifications, especially ones
that retain the same number of lines.
2017-05-24 19:39:33 -07:00
FUJIWARA Katsunori
1ff2143781 revset: add i18n comments to error messages for followlines predicate
This patch also includes un-quoting "descend" keyword for similarity
to other error messages (this seems too trivial as a separated patch).
2017-05-01 05:52:36 +09:00
Denis Laxalde
7cc06d2fbf context: start walking from "introrev" in blockancestors()
Previously, calling blockancestors() with a fctx not touching file would
sometimes yield this filectx first, instead of the first "block ancestor",
because when compared to its parent it may have changes in specified line
range despite not touching the file at all.

Fixing this by starting the algorithm from the "base" filectx obtained using
fctx.introrev() (as done in annotate()).

In tests, add a changeset not touching file we want to follow lines of to
cover this case. Do this in test-annotate.t for followlines revset tests and
in test-hgweb-filelog.t for /log/<rev>/<file>?linerange=<from>:<to> tests.
2017-04-20 21:40:28 +02:00
Denis Laxalde
9e99218a46 revset: properly parse "descend" argument of followlines()
We parse "descend" symbol as a Boolean using getboolean (prior extraction by
getargsdict already checked that it is a symbol).

In tests, check for error cases and vary Boolean values here and there.
2017-04-15 11:29:42 +02:00
Denis Laxalde
631e6988ca context: possibly yield initial fctx in blockdescendants()
If initial 'fctx' has changes in line range with respect to its parents, we
yield it first. This makes 'followlines(..., descend=True)' consistent with
'descendants()' revset which yields the starting revision.

We reuse one iteration of blockancestors() which does exactly what we want.

In test-annotate.t, adjust 'startrev' in one case to cover the situation where
the starting revision does not touch specified line range.
2017-04-14 14:25:06 +02:00
Denis Laxalde
559326afdb context: add an assertion checking linerange consistency in blockdescendants()
If this assertion fails, this indicates a flaw in the algorithm. So fail fast
instead of possibly producing wrong results.

Also extend the target line range in test to catch a merge changeset with all
its parents.
2017-04-14 14:09:26 +02:00
Denis Laxalde
761577866a context: follow all branches in blockdescendants()
In the initial implementation of blockdescendants (and thus followlines(...,
descend=True) revset), only the first branch encountered in descending
direction was followed.

Update the algorithm so that all children of a revision ('x' in code) are
considered. Accordingly, we need to prevent a child revision from being yielded
multiple times when it gets visited through different paths, so we skip 'i'
when this occurs. Finally, since we now consider all parents of a possible
child touching a given line range, we take care of yielding the child if it
has a diff in the specified line range with at least one of its parents (same
logic as blockancestors()).
2017-04-14 08:55:18 +02:00
Denis Laxalde
779e08447b revset: add a 'descend' argument to followlines to return descendants
This is useful to follow changes in a block of lines forward in the history
(for instance, when one wants to find out how a function evolved from a point
in history).

We added a 'descend' parameter to followlines(), which defaults to False. If
True, followlines() returns descendants of startrev.

Because context.blockdescendants() does not follow renames, these are not
followed by the revset either, so history will end when a rename occurs (as
can be seen in tests).
2017-01-16 09:24:47 +01:00
Yuya Nishihara
5ade140d5c revset: abuse x:y syntax to specify line range of followlines()
This slightly complicates the parsing (see the previous patch), but the
overall result seems not bad.

I keep x:, :y and : for future extension.
2017-01-09 17:58:19 +09:00
Yuya Nishihara
a73b0aaf6b revset: rename rev argument of followlines() to startrev
The rev argument has the same meaning as startrev of follow(), and I think
startrev is more informative.

followlines() is a new function, so we can make a BC change now.
2017-01-09 16:16:26 +09:00
Yuya Nishihara
d04abe7517 revset: parse variable-length arguments of followlines() by getargsdict()
2017-01-09 16:02:56 +09:00
Denis Laxalde
20d1dad252 revset: add a followlines(file, fromline, toline[, rev]) revset
This revset returns the history of a range of lines (fromline, toline) of a
file starting from `rev` or the current working directory.

Added tests in test-annotate.t which already contains a reasonably complex
repository.
2017-01-04 16:47:49 +01:00
Mads Kiilerich
7eb5c806da bdiff: give slight preference to appending lines
[This change could be folded into the previous changeset to minimize the repo
churn ...]

The general preference to matches in the middle of bdiff ranges helps getting
balanced recursion and efficient computation. But, as previous changes have
shown, it might also give diffs that seem "obviously wrong".

To mitigate that: If the best match on the A side starts at the beginning of
the bdiff range, don't aim for the middle-most B side match but for the
earliest.

This will make the matches balanced (by both sides being "early") even though
the bisection will be less balanced. Still, this case only applies if the *best*
and middle-most match was fully unbalanced on the A side. Each recursion will
thus even in this worst case reduce the problem significantly and we are not
re-introducing the problem that was fixed in d3deb406b55b.

The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from
22806817 to 22807275 bytes - a 0.002% increase.

This makes the recent test-bdiff.py changes give prettier output ... but
they no longer show that the recursion is around middle matches (because it in
these cases isn't).
2016-11-15 21:56:49 +01:00
Mads Kiilerich
b5feb5a49b bdiff: give slight preference to longest matches in the middle of the B side
We already have a slight preference for matches close to the middle on the A
side. Now, do the same on the B side.

j is iterating the b range backwards and we thus accept a new j if the previous
match was in the upper half.

This makes the test-bhalf diff "correct". It obviously also gives more
preference to balanced recursion than to appending to sequences. That is kind
of correct, but will also unfortunately make some bundles bigger. No doubt, we
can also create examples where it will make them smaller ...

The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from
22803824 to 22806817 bytes - a 0.01% increase.
2016-11-08 18:37:33 +01:00
Jun Wu
a5ec43fe40 annotate: pre-calculate the "needed" dictionary (issue5360)
The "needed" dict is used as a reference counter to free items in the giant
"hist" dict. However, currently it is not very accurate and can lead to
dropping "hist" items unnecessarily, for example, with the following DAG,

       -3-
      /   \
  0--1--2--4--

The current algorithm will visit and calculate rev 1 twice, which is undesired. And
it tries to be smart by clearing rev 1's parents: "pcache[1] = []" at the
time hist[1] being accessed (note: hist[1] needs to be used twice, by rev 2
and rev 3). It can result in incorrect results if p1 of rev 4 deletes chunks
belonging to rev 0.

However, simply removing "needed" is not okay, because it will consume 10x
memory:

  # without any change
  % HGRCPATH= lrun ./hg annotate mercurial/commands.py -r d130a38 3>&2 [1]
  MEMORY   49074176
  CPUTIME  9.213
  REALTIME 9.270

  # with "needed" removed
  MEMORY   637673472
  CPUTIME  8.164
  REALTIME 8.249

This patch moves "needed" (and "pcache") calculation to a separate DFS to
address the issue. It improves perf and fixes issue5360 by correctly reusing
hist, while maintaining low memory usage. Some additional attempts have been
made to further reduce memory usage, like changing "pcache[f] = []" to "del
pcache[f]". Therefore the result is both faster and uses less memory:

  # with this patch applied
  MEMORY   47575040
  CPUTIME  7.870
  REALTIME 7.926
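
The two-pass idea described above can be sketched as follows. This is a toy model under stated assumptions, not the patch itself: revisions are plain integers, `parents` is a hypothetical dict describing the DAG drawn earlier in this message, and the per-revision "annotation" is replaced by a set of ancestor revisions so the example stays small.

```python
# Toy sketch, assuming integer revisions and a plain dict for the DAG.
# The real annotate computes line ownership; here each result is just
# the set of revisions reachable from that revision.

def annotate_like(parents, base):
    # Pass 1: DFS that records parents ("pcache") and counts how many
    # consumers each revision will have ("needed").
    needed = {base: 1}
    pcache = {}
    visit = [base]
    while visit:
        f = visit.pop()
        if f in pcache:
            continue
        pcache[f] = parents[f]
        for p in pcache[f]:
            needed[p] = needed.get(p, 0) + 1
            if p not in pcache:
                visit.append(p)

    # Pass 2: compute results parents-first, dropping a "hist" entry
    # as soon as its last consumer has used it.
    hist = {}
    computed = []  # order in which revisions were calculated
    visit = [base]
    while visit:
        f = visit[-1]
        if f in hist:
            visit.pop()
            continue
        pending = [p for p in pcache[f] if p not in hist]
        if pending:
            visit.extend(pending)
            continue
        visit.pop()
        result = {f}
        for p in pcache[f]:
            result |= hist[p]
            needed[p] -= 1
            if needed[p] == 0:
                del hist[p], needed[p]
        hist[f] = result
        computed.append(f)
        del pcache[f]
    return hist[base], computed


# The DAG from the message: 0--1--2--4 with a side branch 1--3--4.
parents = {0: [], 1: [0], 2: [1], 3: [1], 4: [2, 3]}
result, computed = annotate_like(parents, 4)
```

Because the reference counts are known before any result is computed, rev 1 is calculated once and kept until both rev 2 and rev 3 have consumed it.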

[1]: lrun is a lightweight sandbox built on Linux cgroup and namespace. It's
     used to measure CPU and memory usage here. Source code is available at
     github.com/quark-zju/lrun.
2016-09-02 15:20:59 +01:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error', not 'util'; however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Yuya Nishihara
5b51ae23c8 committablefilectx: propagate ancestry info to parent to fix annotation
Before this patch, annotating working directory could include wrong revisions
that were hidden or belonged to different branches. This fixes wfctx.parents()
to set _descendantrev so that all ancestors can take advantage of the linkrev
adjustment introduced at a5aaaeedd6cb. _adjustlinkrev() can handle 'None'
revision thanks to bb19d597bbcd.
2015-04-18 14:10:55 +09:00
Matt Harbison
e0846fc2d8 test-annotate: conditionalize error output for Windows
It seems better to leave the actual output in place instead of globbing
everything but 'abort:', in case it starts aborting for other reasons.

The purpose of reversing the file name position isn't clear, but it
originates in windows.posixfile.
2015-03-29 00:20:56 -04:00
Yuya Nishihara
676ee8092d annotate: add option to annotate working-directory files
Working revision or node is displayed with "+" suffix in plain output, but
null/None in machine-readable format.
2014-08-16 17:50:55 +09:00
Pierre-Yves David
451115c9e1 linkrev: also adjust linkrev when bootstrapping annotate (issue4305)
The annotate logic now uses the new 'introrev' method to bootstrap its traversal.
This catches issues from linkrev-shadowing of the changeset introducing the
version of a file in source changeset.

More tests have been added to display pathological cases.
2014-12-24 03:26:48 -08:00
Pierre-Yves David
3c79d53ced filectx.parents: enforce changeid of parent to be in own changectx ancestors
Because of the way filenodes are computed, you can have multiple changesets
"introducing" the same file revision. For example, in the changeset graph
below, changeset 2 and 3 both change a file -to- and -from- the same content.

  o 3: content = new
  |
  | o 2: content = new
  |/
  o 1: content = old

In such cases, the file revision is created once, when 2 is added, and just reused
for 3. So the file change in '3' (from "old" to "new") has no linkrev pointing
to it. We'll call this situation "linkrev-shadowing". As the linkrev is used for
optimization purposes when walking a file history, the linkrev-shadowing
results in an unexpected jump to another branch during such a walk. This leads to
multiple bugs with log, annotate and rename detection.

One element to fix such bugs is to ensure that walking the file history sticks on
the same topology as the changeset's history. For this purpose, we extend the
logic in 'basefilectx.parents' so that it always defines the proper changeset
to associate the parent file revision with. This "proper" changeset has to be an
ancestor of the changeset associated with the child file revision.

This logic is performed in the '_adjustlinkrev' function. This function is
given the starting changeset and all the information regarding the parent file
revision. If the linkrev for the file revision is an ancestor of the starting
changeset, the linkrev is valid and will be used. If it is not, we have detected a
topological jump caused by linkrev shadowing, and we walk the
ancestors of the starting changeset until we find one setting the file to the
revision we are trying to create.
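
The check described above can be sketched on the three-changeset graph shown earlier in this message. This is a simplified model, not the '_adjustlinkrev' code: `parents`, `touched`, and the adjust_linkrev() helper are hypothetical names, file revisions are represented by their content strings, and walking ancestors from the highest revision number down is an assumption of the sketch.

```python
# Simplified model, assuming integer changeset numbers and plain dicts.
# `touched[rev]` maps each file changed in `rev` to the file revision
# (here: its content) that the changeset sets it to.

def ancestors(parents, rev):
    """Return the set containing rev and all of its ancestors."""
    seen = set()
    todo = [rev]
    while todo:
        r = todo.pop()
        if r in seen:
            continue
        seen.add(r)
        todo.extend(parents[r])
    return seen


def adjust_linkrev(parents, touched, start, path, filenode, linkrev):
    """Pick a changeset for `filenode` that is an ancestor of `start`."""
    anc = ancestors(parents, start)
    if linkrev in anc:
        # The stored linkrev is on our own history: it is valid.
        return linkrev
    # Topological jump: look for an ancestor setting `path` to `filenode`.
    for rev in sorted(anc, reverse=True):
        if touched[rev].get(path) == filenode:
            return rev
    # Nothing found in this toy model: fall back to the stored linkrev.
    return linkrev


# Graph from the message: 2 and 3 are both children of 1.
parents = {1: [], 2: [1], 3: [1]}
touched = {1: {"f": "old"}, 2: {"f": "new"}, 3: {"f": "new"}}

# The file revision "new" was created when 2 was added, so linkrev == 2.
from_3 = adjust_linkrev(parents, touched, 3, "f", "new", 2)
from_2 = adjust_linkrev(parents, touched, 2, "f", "new", 2)
```

Starting from changeset 3, the stored linkrev 2 is not an ancestor, so the walk settles on 3 itself; starting from 2, the stored linkrev is kept.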

The performance impact appears acceptable:

- We are walking the changelog once for each filelog traversal (as there should
  be no overlap between searches),

- changelog traversal itself is fairly cheap, compared to what is likely going
  to be performed on the result of the filelog traversal,

- We only touch the manifest for ancestors touching the file, and such
  changesets are likely to be the ones introducing the file (except in
  pathological cases involving merges),

- We use manifest diff instead of full manifest unpacking to check manifest
  content, so it does not involve applying multiple diffs in most cases,

- linkrev shadowing is not the common case.

Tests for fixed issues in log, annotate and rename detection have been
added.

But this changeset does not solve all problems. It fixes -ancestry-
computation, but if the linkrev-shadowed changeset is the starting one, we'll
still get things wrong. We'll have to fix the bootstrapping of such operations
in a later changeset. Also, the usage of `hg log FILE` without --follow still
has issues with linkrevs pointing to hidden changesets, because it relies on the
`filelog` revset, which implements its own traversal logic that is still to be
fixed.

Thanks go to:
- Matt Mackall: for nudging me in the right direction
- Julien Cristau and Rémi Cardona: for repeatedly telling me that linkrev bugs
  were an evolution show stopper for 3 years.
- Durham Goode: for finding a new linkrev issue every few weeks
- Mads Kiilerich: for that last rename bug, which raised this topic over my
  annoyance limit.
2014-12-23 15:30:38 -08:00
Yuya Nishihara
6ecd008e93 annotate: port to generic templater enabled by hidden -T option
If the selected formatter is other than plainformatter, raw data are passed
to the formatter.  In this case, it isn't necessary (and not possible) to
calculate column widths.

Field names are substituted to be the same as "log" command.

There are a few limitations:

 - "binary file" message is not included in formatted output.
 - no data structure for multiple files. all lines are packed to single list.
2014-09-17 23:21:20 +09:00
FUJIWARA Katsunori
3a71637352 annotate: increase refcount of each revisions correctly (issue3841)
Before this patch, refcount (managed in "needed") of parents of each
revisions in "visit" is increased, only when parent is not annotated
yet (examined by "p not in hist").

But this causes less refcount of the revision like "A" in the tree
below ("A" is assumed as the second parent of "C"):

    A --- B --- C
      \       /
       \-----/

Steps of annotation for "C" in this case are shown below:

  1. for "C"
    1.1 increase refcount of "B"
    1.2 increase refcount of "A" (=> 1)
    1.3 defer annotation for "C"

  2. for "A"
    2.1 annotate for "A" (=> put result into "hist[A]")
    2.2 clear "pcache[A]" ("pcache[A] = []")

  3. for "B"
    3.1 not increase refcount of "A", because "A not in hist" is False
    3.2 annotate for "B"
    3.3 decrease refcount of "A" (=> 0)
    3.4 delete "hist[A]", even though "A" is still needed by "C"
    3.5 clear "pcache[B]"

  4. for "C", again
    4.1 not increase refcount of "B", because "B not in hist" is False
    4.2 increase refcount of "A" (=> 1)
    4.3 defer annotation for "C"

  5. for "A", again
    5.1 annotate for "A" (=> put result into "hist[A]", again)
    5.2 clear "pcache[A]"

  6. for "C", once again
    6.1 not increase refcount of "B", because "B not in hist" is False
    6.2 not increase refcount of "A", because "A not in hist" is False
    6.3 annotate for "C"
    6.4 decrease refcount of "A", and delete "hist[A]"
    6.5 decrease refcount of "B", and delete "hist[B]"
    6.6 clear "pcache[C]"

At step (5.1), annotation for "A" mis-recognizes that all lines are
created at "A", because "pcache[A]" already cleared at step (2.2)
prevents from scanning ancestors of "A".

So, annotation for "C" or its descendants loses information about "A"
or its ancestors.

The root cause of this problem is that refcount of "A" is decreased at
step (3.3), even though it isn't increased at step (3.1).

To increase the refcount correctly, this patch increases the refcount of each
parent of each revision:

  - regardless of whether "p not in hist" holds, and
  - only once for each revision in "visit" (by "not pcached")

In fact, this problem should occur only on legacy repositories in
which a filelog includes the merging between the revision and its
ancestor (as the second parent), because:

  - tree is scanned in depth-first

    without such merging, revisions in "visit" refer different
    revisions as parent each other

  - recent Mercurial doesn't allow such merging

    changelog and manifest can include such merging someway, but
    filelogs can't, because "localrepository._filecommit()" converts
    such merging request to linear history.

This patch tests merging cases below: these cases are from filelog of
"mercurial/commands.py" in the repository of Mercurial itself.

  - both parents are same

        10 --- 11 --- 12
                  \_/

        filelogrev: changesetid:
          10          526aca6bcb38
          11          05098100ff44
          12          2d4f4cfa81d6

  - the second parent is also ancestor of the first one

        37 --- 38 --- 39 --- 40
                  \________/

        filelogrev: changesetid:
          37          033dc4170fe6
          38          5ff1a23ce38c
          39          661a47367859
          40          a2ba99fd026f
2013-03-29 22:57:16 +09:00
Mads Kiilerich
59664fd950 check-code: fix check for trailing whitespace on continued lines too
The tests in test-annotate.t and test-import-git.t that relied on trailing
space in a file created by a here string now mask it with a literal 'EOL'
string that is removed.
2012-08-08 18:10:37 +02:00
Mads Kiilerich
fa1c4e5ebe tests: add missing trailing 'cd ..'
Many tests didn't change back from subdirectories at the end of the tests ...
and they don't have to. The missing 'cd ..' could always be added when another
test case is added to the test file.

This change makes tests (99.5%) consistently end up in $TESTDIR where they
started, thus making it simpler to extend them or move them around.
2012-06-11 01:40:51 +02:00
Ion Savin
029e0ada33 annotate: append newline after non newline-terminated file listings
The last line of a non newline-terminated file would mix with the first line of
the next file in multiple-file listings before this patch.

Possible compatibility issue: no longer possible to tell from the annotate
output if the file is terminated by new line or not.
2012-01-10 10:18:19 +02:00
Patrick Mezard
cc3315778f annotate: support diff whitespace filtering flags (issue3030)
splitblock() was added to handle blocks returned by bdiff.blocks() which differ
only by blank lines but are not made only of blank lines. I do not know exactly
how it could happen but mdiff.blocks() threshold behaviour makes me think it
can if those blocks are made of very popular lines mixed with popular blank
lines. If it is proven to be wrong, the function can be dropped.

The first implementation made annotate share diff configuration entries. But it
looks like users will use -w/b for annotate but not for diff, on both the
command line and hgweb. Since the latter cannot use command line entries, we
introduce a new [annotate] section duplicating the diff whitespace options.
2011-11-18 12:04:31 +01:00
Thomas Arendsen Hein
5921e1efcc annotate: fix alignment of columns in front of line numbers (issue2807)
2011-05-18 15:41:03 +02:00
Matt Mackall
ca9fdc1884 annotate: catch nonexistent files using match.bad callback (issue1590)
2011-03-19 01:34:49 -05:00
Martin Geisler
3d112b3042 tests: added a short description to issue numbers
Many tests already had a short line to describe what IssueXXX is
about. I find that quite useful when reading a test.
2010-09-24 10:13:49 +02:00
Martin Geisler
f90f39674a tests: unify test-annotate
2010-08-14 02:18:17 +02:00