Commit Graph

32 Commits

Author SHA1 Message Date
Jun Wu
f183d30cdf fastannotate: move path-related methods to a class
Summary:
Previously, the linelog and revmap paths were private information inside
`annotatecontext`. But we still want to get those paths in other places,
like the perfhack path check, or the quick up-to-date check (linelog vs
filelog) server-side. Let's expose the information by adding a helper
class.

Test Plan: Run existing tests

Reviewers: #mercurial, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4062835

Signature: t1:4062835:1477552929:7ece4f492e6b1e6f22cc626044819d5e3e2a66aa
2016-10-22 01:37:08 +01:00
Jun Wu
79b9423cee fastannotate: improve compatibility with older hg
Summary:
Before ba91edc29ce5, `diffopts` does not have `__dict__`. Let's fix that by
querying keys from `diffopts.default`, instead of using `__dict__`.

Test Plan: `arc unit`

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4086921

Signature: t1:4086921:1477554146:e48647c59769c2951518b091350289aa966aab7b
2016-10-27 00:19:05 +01:00
Jun Wu
3339323a30 fastannotate: use fastannotate to run test-annotate.t
Summary:
There are some minor differences between the two annotate algorithms.
Both are reasonable in theory. The differences are explained in the test.

The `badfn` change is needed to make the exit code correct.

Test Plan: Run the modified test

Reviewers: #mercurial, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4060263

Signature: t1:4060263:1477401942:64a08d2fa28ea0fa9b54fa0f0ea48ca489c308df
2016-10-21 21:06:20 +01:00
Jun Wu
87a71c3a2f fastannotate: support -r wdir()
Summary:
Although `wdir()` is an undocumented feature, it exists in upstream annotate
tests. Let's support it so it is more similar to the vanilla annotate.

Test Plan: Added a new test

Reviewers: #mercurial, stash

Reviewed By: stash

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4060169

Signature: t1:4060169:1477402083:021cae092e695a72fff65cd07640c9f7984b2a60
2016-10-21 20:50:37 +01:00
Jun Wu
a91f5d53ac fastannotate: add a JSON formatter
Summary: The vanilla annotate command supports `-Tjson`. Let's implement it as well.

Test Plan: Added a new test

Reviewers: #mercurial, simonfar

Reviewed By: simonfar

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4059577

Signature: t1:4059577:1477074802:09b0b3ea0769d480eb3a2e42308636ff2b8d40d2
2016-10-21 19:28:30 +01:00
Jun Wu
a812dc99fb fastannotate: fix imports and pyflakes check
Summary:
This diff tries to fix `test-check-*` issues for fastannotate.

One issue cannot be fixed:

  fastannotate/context.py:31: imports not lexically sorted: linelog < os

If these two imports are swapped, the checker will report:

  stdlib import "os" follows local import: linelog

instead. So leave it as-is for now until we get the checker fixed upstream.

Test Plan: `arc unit`

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4044157

Signature: t1:4044157:1476949486:13052173bc694edd99d21affa336d252bc2aae88
2016-10-19 16:05:22 +01:00
Jun Wu
393ae99254 test-check: backport test-check-config from core hg
Summary:
Hopefully this will prevent new undocumented config options.

Issues about fastannotate are fixed. We still have many other issues, which
are ignored by this diff for now.

Test Plan: `arc unit`

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4037173

Signature: t1:4037173:1476876491:280e839a8e3d4f4dfbdf6b6953ab860bb7cd3ed4
2016-10-18 17:48:11 +01:00
Jun Wu
dc07b62633 fastannotate: handle "draft" paths, and renames correctly
Summary:
Previously we could only answer the "path" information when the revision was
in the linelog revmap, and the code would crash if a revision was in a side
branch and the user requested path information. This diff fixes it.

Besides, this diff improves rename handling. For example, given the following
chart:

```
  o---o  -o   file name: a
        /
  o---o-      file name: b
  ^   ^   ^
  1   2   3   revisions
```

Depending on the position of the `main branch` reference, fastannotate may
or may not use linelog:

- main branch is at rev 2, annotate a -r 3 will not take advantage of linelog
  (fallback to slow annotate)
- main branch is at rev 3, annotate a -r 2 will not take advantage of linelog

This is not ideal, but seems to be the best we can do for now.

Test Plan:
Added a new test, updated existing relevant tests. Some debug messages are
changed to reflect internals more precisely.

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4010964

Signature: t1:4010964:1476458201:79875d96399d023d0000d0c4bb8b8d40ea43eef0
2016-10-09 19:47:01 +01:00
Jun Wu
d79ce74ce1 fastannotate: document defaultformat
Summary: This config option exists but is not documented.

Test Plan: Run existing tests

Reviewers: #sourcecontrol, eql, rmcelroy

Reviewed By: rmcelroy

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4011108

Signature: t1:4011108:1476307414:96191408f56b42a7949c2804114c7449ec8b5b27
2016-10-12 22:05:03 +01:00
Jun Wu
fc1e2d9c80 fastannotate: add hgweb support
Summary:
This adds fastannotate support for hgweb. There are some issues with "path"
handling, which will be addressed in follow-up patches.

A minor change has been made in this patch to support revision numbers
(previously only global changeset hashes were supported).

Test Plan:
Run `hg serve --config fastannotate.hgweb=1` on the `hg-committed` repo, open
the following URL and confirm it's using fastannotate:
http://localhost:8000/annotate/9af6f8434430/mercurial/commands.py

Reviewers: #sourcecontrol, simonfar

Reviewed By: simonfar

Subscribers: simonfar, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3993636

Signature: t1:3993636:1476026586:cb0628aa7107bdbfde852a6a1471f70dcb21a5ef
2016-10-09 15:19:18 +01:00
Jun Wu
de2e6fd858 fastannotate: add diffopts
Summary:
The vanilla annotate command takes diffopts, let's add it to fastannotate.
This is also useful to support hgweb correctly.

Test Plan: Added a new test

Reviewers: #sourcecontrol, simonfar

Reviewed By: simonfar

Subscribers: simonfar, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3992672

Signature: t1:3992672:1475944204:2fbdf6a90d8c965775e3b7d5bc55fdb37c8909e7
2016-10-08 15:58:21 +01:00
Jun Wu
4d1e57d182 fastannotate: fix getalllines
Summary:
`getalllines` needs to call annotate before continuing to build the "forked"
linelog in memory.

Test Plan:
Set mainbranch to @, run `hg fa --deleted` on a file that has been changed
in the draft branch and it does not crash.

Pinged: rmcelroy.
2016-10-08 01:39:17 +01:00
Jun Wu
38100f3be4 fastannotate: handle empty file correctly
Summary:
`_decorate` backported from upstream does not handle empty file correctly.
It would raise an assertion error when annotating an empty file:

  File "fastannotate/commands.py", line 138, in fastannotate
    not showdeleted))
  File "fastannotate/context.py", line 331, in annotate
    return self._refineannotateresult(result, revfctx, showpath, showlines)
  File "fastannotate/context.py", line 503, in _refineannotateresult
    if len(lines) != len(result):
  AssertionError

This patch fixes it.

Test Plan: Run the modified test.

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3944132

Signature: t1:3944132:1475166894:e2610c6364806b77c8533315a1a0a08b6c158fe5
2016-09-29 16:48:40 +01:00
Jun Wu
56075d02ef fastannotate: encode directory names
Summary:
We need to reserve ".l", ".m" and ".lock". So encode directory names to
avoid collision.
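
To illustrate the collision: if the linelog for file `a` lives at `a.l`, then a directory that is itself named `a.l` would need the same path. A minimal sketch of one possible escaping scheme follows. The `.esc` marker and the `encodedir` name are assumptions made for this example only; they are not taken from this diff.

```python
# Hypothetical sketch: escape directory components that end with a reserved
# suffix so they can never share a name with a linelog/revmap/lock file.
RESERVED_SUFFIXES = ('.l', '.m', '.lock')
ESCAPE = '.esc'  # assumed marker, chosen only for this example


def encodedir(path):
    """Escape each directory component of a '/'-separated path.

    Components that already end with the escape marker are escaped again,
    so the mapping stays reversible. The final component (the file name)
    is left untouched.
    """
    parts = path.split('/')
    dirs, name = parts[:-1], parts[-1]
    needs_escape = RESERVED_SUFFIXES + (ESCAPE,)
    encoded = [d + ESCAPE if d.endswith(needs_escape) else d for d in dirs]
    return '/'.join(encoded + [name])
```

With this scheme a file `a.l/b` would be stored under `a.l.esc/b`, which no longer clashes with the `a.l` linelog of file `a`.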

Test Plan: Run the modified test

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3944069

Signature: t1:3944069:1475166648:055811239514cb699a0ebe1cfab809b661c6cfd2
2016-09-29 16:33:46 +01:00
Jun Wu
fa32e904bb fastannotate: try to avoid using matcher if perfhack is enabled
Summary:
The mercurial matcher will read the manifest - unacceptably slow for our
usecase. Instead, if perfhack is enabled and matched linelog files exist,
treat patterns as valid file names directly.

Test Plan: perfhack tests will be added in the next patch

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3937103

Signature: t1:3937103:1475167055:dc27b65fe10f7a3a561afdb31292531f38e90dfb
2016-09-28 13:12:46 +01:00
Jun Wu
76bc8110f2 fastannotate: add a perfhack to use linkrev as introrev
Summary:
`introrev` can be slow because `_adjustlinkrev` can be very expensive.
Usually `introrev` is just `linkrev`. So let's use the latter as an
approximation, if `perfhack` is enabled.

Note that we only use `linkrev` when *checking* if the hg hash is in linelog
revmap or not - for the read-only operation. If we need to update linelog,
we fall back to the slow `introrev` implementation.

Test Plan: perfhack related tests will be added later

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3937044

Signature: t1:3937044:1475139823:450c6bb68b33db671f6b95cde4abfd24b7f2af4c
2016-09-28 15:56:47 +01:00
Jun Wu
bd3b5783c6 fastannotate: use a more efficient way to get filectx
Summary:
Usually we simply use `repo[node][path]` to get a `filectx` object.
But that can be painfully slow - it's reading the whole manifest by default.

The flat manifest and tree manifest both provide a method called `find` to
get a `filenode` without reading the entire manifest - which is our use-case
here since we don't need to resolve all files but just need to peek several
individual files.

As we are touching this area, also deal with revset string resolution so
we can simplify the logic in more places - now master can be a revset string
and no longer needs to be lazily resolved. Besides, the `_getbase` logic
is also moved here and made more efficient - the old `_getbase` still needed
to read the manifest.

This patch replaces all code using the `repo[node][path]` pattern with the
more efficient implementation.

Test Plan: Run existing tests

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3936992

Signature: t1:3936992:1475138449:2ec6c71a592a256fa9487a5f7b41241bf7440f89
2016-09-28 15:56:31 +01:00
Jun Wu
89c9da2b06 fastannotate: add a --deleted option
Summary:
This feature uses the linelog to show all lines that ever existed (even
deleted ones) in a file. Helpful to see the history all the way back to the
beginning.

Sadly it has to be inefficient currently as we have chosen to not store line
content (but only numbers) in linelog. Calculating the revisions and line
numbers is very fast because of linelog but resolving the line contents is
painfully slow. We may want a key-value database in the future, answering the
query:

  (path, node, linenum) -> content

How slow is it? With the linelog pre-built, generating the output for
`mercurial/commands.py` requires resolving 400+ revisions and takes about
10+ seconds.

Test Plan: Run the changed `test-fastannotate.t`

Reviewers: #sourcecontrol, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3849503

Signature: t1:3849503:1475086235:83077c571746a7515b5ba75c4df37a1a400d9232
2016-09-11 22:52:29 +01:00
Jun Wu
58826b52c2 fastannotate: add a config option to replace the default command
Summary:
This diff adds a config option to replace the default annotate command using
fastannotate.

Test Plan: Run the modified test

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3837499

Signature: t1:3837499:1475076620:65dc656c74e9c8a969f68cc4a2480f5dcbeb6361
2016-09-08 21:52:01 +01:00
Jun Wu
96e56210ff fastannotate: implement the command
Summary:
This diff adds the `fastannotate` command, which puts all components
(formatter, annotatecontext - linelog, revmap) together.

Note that `fastannotate.mainbranch` needs to be carefully chosen to avoid
rebuilds. If `mainbranch` moves backwards, the `fastannotate` command
can fail because it could not move linelog backwards and thus cannot
update linelog incrementally.

Test Plan: Code Review. The `.t` file is coming in the next diff.

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3836670

Signature: t1:3836670:1474016576:b323b62db3bfd69bff88f6911abbe26075734737
2016-09-08 17:59:01 +01:00
Jun Wu
bb07a56a18 fastannotate: add logic to format annotate output
Summary:
The annotate output is not trivial. It supports a lot of flags, and will do
padding by default. This diff implements most flags and the padding logic.
It tries to imitate what the original annotate does by default.

Note that `-T` (notably `-Tjson`) is not supported currently.

Test Plan: Code Review. A `.t` test will be added later.

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3836445

Signature: t1:3836445:1473868393:91f6cede059e319eca7015d0d18727afa7ded665
2016-09-08 16:57:50 +01:00
Jun Wu
60eb8a2c22 fastannotate: implement the annotate algorithm
Summary:
This diff implements the `annotate` algorithm. Unlike the vanilla one, the
annotate method takes 2 revisions: the revision specified for annotating,
and the head of the main branch. The algorithm will do a "hybrid" annotate:
incrementally update the linelog (the cache) so it can answer queries of
any revision in the main branch. And use the traditional algorithm to deal
with revisions not in the main branch: like a side branch of a merge commit,
or the revision the user specified not in the main branch.

The main branch is supposed to be something like `master` or `@`, and their
p1s.

Building up linelog with merge handled reasonably for the main branch, and
the non-linelog part that produces the final result, share a lot of internal
state and logic, so they are deeply coupled. Splitting them would probably
reduce performance, or make it difficult (no clean way) to share internal
state. If the caller only wants to build linelog without annotating anything,
just pass `rev = master`.

While some attempts are made to support "merge" changesets, the result can
still be different from the vanilla one sometimes. In those cases, both
results make sense. It's really hard, if not impossible, to make the new
implementation 100% identical to the vanilla one because of the linear history
restriction of linelog, so I guess currently it's good enough. The differences
will be covered by a `.t` test later.


Test Plan: Code Review. A `.t` file will be added.

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3836438

Signature: t1:3836438:1473778829:27978479a01920833fa146f427178292ea1f5306
2016-09-08 16:54:21 +01:00
Jun Wu
943e37217d fastannotate: add an empty annotatecontext
Summary:
The annotatecontext object contains all the states to perform a fast
annotate operation. This diff adds a skeleton class and a helper method
that handles locking properly.

Test Plan: Code Review

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3836380

Signature: t1:3836380:1473433244:7759527278200dabdfc4c4869c1df28fc156d0dd
2016-09-08 16:46:27 +01:00
Jun Wu
9d919136d2 fastannotate: add annotateopts object
Summary:
The `annotateopts` object is like `diffopts`, maintaining a set of related
annotate options. Currently different options will result in different
(incompatible) linelogs being built. Therefore the `annotateopts` object
also has a method to generate a unique string describing the options, so
we can use the string as part of the path where we write the linelog file.
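
A rough sketch of the idea, not the code in this diff: the option names (`followrename`, `followmerge`), the `shortstr` property and the directory layout below are all assumptions picked for illustration.

```python
class annotateopts:
    """Hypothetical stand-in: a bundle of boolean annotate options."""

    # Assumed option names and defaults, for illustration only.
    defaults = {'followrename': True, 'followmerge': True}

    def __init__(self, **opts):
        for key, default in self.defaults.items():
            setattr(self, key, opts.get(key, default))

    @property
    def shortstr(self):
        """Return a string that is distinct for each combination of options.

        Only options that differ from their default contribute, so the
        common case maps to the short name 'default'.
        """
        changed = [('' if getattr(self, key) else 'no') + key
                   for key in sorted(self.defaults)
                   if getattr(self, key) != self.defaults[key]]
        return '-'.join(changed) or 'default'


def linelogpath(opts, path):
    # Assumed layout: one directory per option set, '.l' suffix for linelogs.
    return '/'.join(['fastannotate', opts.shortstr, path + '.l'])
```

Because the string is part of the path, linelogs built with different options end up in different directories and cannot be mixed up.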

Test Plan: Code Review

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3836366

Signature: t1:3836366:1473413541:0e88ca2159dab38a5f4067a67d8a86c74d072cae
2016-09-08 16:09:48 +01:00
Jun Wu
84384559c3 fastannotate: cherry pick some functions from the vanilla annotate
Summary:
These are some small functions defined in the core hg `context.annotate`
method and we want to use them later. The ported code was licensed under
GPLv2.

Test Plan: A `.t` test will be added when the fastannotate command is functional.

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3836335

Signature: t1:3836335:1473413278:de2510a6e5041455c430199717a1ef5217de5348
2016-09-08 16:08:12 +01:00
Jun Wu
13126d61f2 fastannotate: appending revmap instead of rewriting whenever possible
Summary:
Previously in D3837023, we added logic to prevent flushing the entire revmap
if nothing changes. It could be smarter - only append newly added revisions.
So the revmap becomes append-only most of the time.
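
The append-only case can be sketched as follows. This is a simplified illustration under assumptions: the class name, the fixed 20-byte rows and the file layout are hypothetical, and the real revmap also has to handle the cases where a full rewrite is still required.

```python
import os


class appendonlymap:
    """Hypothetical stand-in that remembers how many rows are already on disk."""

    ROWSIZE = 20  # assumed: one 20-byte hash per row, no header

    def __init__(self, path):
        self.path = path
        self.rows = []
        self._flushed = 0  # number of rows known to be on disk
        if os.path.exists(path):
            with open(path, 'rb') as f:
                data = f.read()
            self.rows = [data[i:i + self.ROWSIZE]
                         for i in range(0, len(data), self.ROWSIZE)]
            self._flushed = len(self.rows)

    def append(self, node):
        assert len(node) == self.ROWSIZE
        self.rows.append(node)

    def flush(self):
        """Append rows added since the last flush; return bytes written."""
        new = self.rows[self._flushed:]
        if not new:
            return 0  # nothing changed, so do not touch the file at all
        with open(self.path, 'ab') as f:
            f.write(b''.join(new))
        self._flushed = len(self.rows)
        return self.ROWSIZE * len(new)
```

A second `flush()` with no new rows writes nothing, and a flush after one more `append` writes a single row instead of the whole map.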

Test Plan: Run test-fastannotate-revmap.py

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3856272

Signature: t1:3856272:1473778447:9f30c315beead549f883b4ff3e19b2fd750a66ab
2016-09-12 21:01:50 +01:00
Jun Wu
b0a021e58a fastannotate: add a "copyfrom" method to the revmap
Summary:
Like the linelog, the revmap can be either backed by the file system, or by
the memory. Sometimes we need to "fork" an on-disk linelog and a revmap to
be able to change them without writing the changes back to disk. This diff
implements a "copyfrom" method that works in that case. A typical usage
will be:

  m1 = revmap(path) # load revmap from path on disk
  m2 = revmap()     # create an in-memory-only revmap
  m2.copyfrom(m1)   # copy content from disk to memory
  # do anything to m2 without affecting m1

Test Plan: A revmap Python test will be added later and cover this feature

Reviewers: #sourcecontrol, zamsden

Reviewed By: zamsden

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3849345

Signature: t1:3849345:1473713912:0bd7bd8784ffde2ae7f4f3a29f902cc25cece8b8
2016-09-11 16:44:57 +01:00
Jun Wu
0c4dd52818 fastannotate: record file names in the revmap
Summary:
To support `--file` and to be able to get the filectx from a `repo` object
correctly, we need to store path information in the revmap, and this diff
does that.

Considering that renames happen rarely and path queries are also uncommon,
we only store paths for revisions that do real renames, both on-disk and
in-memory. A bisect is used to calculate the path for a given revision.
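
The lookup described above can be sketched with the standard `bisect` module. The class and method names here are hypothetical, and the sketch assumes revisions are recorded in increasing order.

```python
import bisect


class renamelog:
    """Hypothetical stand-in: stores a path only where it changes."""

    def __init__(self):
        self._revs = []   # sorted revisions at which the path changed
        self._paths = []  # path in effect from the matching revision onwards

    def record(self, rev, path):
        """Remember `path` for `rev`, skipping revisions that keep the path."""
        if self._paths and self._paths[-1] == path:
            return
        self._revs.append(rev)
        self._paths.append(path)

    def rev2path(self, rev):
        """Return the path in effect at `rev`, or None before the first one."""
        i = bisect.bisect_right(self._revs, rev) - 1
        return self._paths[i] if i >= 0 else None
```

For a file that is called `b` in revisions 1 and 2 and `a` from revision 3 on, only two entries are stored, and `rev2path(2)` still resolves to `b`.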

Test Plan: A revmap Python test will be added later and cover this feature

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, zamsden, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3849334

Signature: t1:3849334:1473872469:ed1e51d470377d3941cf3f9b82971f0e626d38a0
2016-09-19 10:51:54 +01:00
Jun Wu
12805254e6 fastannotate: skip flushing revmap if nothing changed
Summary:
Writing the revmap back to disk can be a bit expensive if the revmap is not
small. This diff optimizes `revmap.flush` so it does not write anything if
nothing changed.

Test Plan: Profiling data shows no more substantial `write` syscalls.

Reviewers: #sourcecontrol, zamsden

Reviewed By: zamsden

Subscribers: zamsden, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3837023

Signature: t1:3837023:1473954024:6af83074096517ab1d3d05cb90d091af6e0c5d02
2016-09-08 21:01:47 +01:00
Jun Wu
833188e453 fastannotate: move exception classes to error.py
Summary: The newly added error.py will be used in other files.

Test Plan: A related test will be added later

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: ttung, stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3836304

Signature: t1:3836304:1473411906:ca110c8214d2b1a0f9ae29111d75baa54c072c19
2016-09-08 14:47:49 +01:00
Jun Wu
190eb53c0b fastannotate: implements the "contains" operations for the revmap
Summary:
The revmap stores 2 kinds of revisions:
- Real revisions in the main branch that can be annotated directly
- Revisions in a merge changeset's p2 branch. They are not in the linelog
  linear history but may appear in lines introduced by the merge changeset.
  We call those revisions in a "side branch", outside the "main branch".

This diff implements `__contains__` to test if a revision is in the main
branch or not. It will be frequently used in upcoming code, like calculating
the annotate information for a side branch, or finding a joint point when
doing a hybrid annotate.
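
A minimal sketch of the membership test, assuming a single flag bit marks side-branch revisions. The `SIDEBRANCH` value and the class below are illustrative, not the actual implementation.

```python
SIDEBRANCH = 1  # assumed flag bit: revision belongs to a merge's p2 branch


class flaggedrevs:
    """Hypothetical stand-in that keeps one flag per known hash."""

    def __init__(self):
        self._flags = {}  # hash -> flag bits

    def add(self, hsh, sidebranch=False):
        self._flags[hsh] = SIDEBRANCH if sidebranch else 0

    def __contains__(self, hsh):
        """True only for known hashes that are in the main branch."""
        flag = self._flags.get(hsh)
        return flag is not None and not flag & SIDEBRANCH
```

With this, `hsh in revmap` is false both for unknown hashes and for hashes that are stored but flagged as side-branch revisions.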

Test Plan: A revmap Python test will be added later and cover this feature

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3827995

Signature: t1:3827995:1473324222:4f0d94a2d21c033ab78b3a5dd31254692d36a852
2016-09-07 15:05:55 +01:00
Jun Wu
f1c3dc9f91 fastannotate: implement a simple revision map
Summary:
To use linelog, we need to be able to translate between hg commit hashes and
linelog revision numbers. This diff implements such a revmap using the most
direct way.

The revmap also contains an extra "flag" for each revision, which will be used
to mark if the revision is in the main branch or not, to handle merge commits.
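
The two-way translation can be pictured with a list and a dict, as in the sketch below. It is only an in-memory illustration: the method names other than `append(..., flag=...)` are assumptions, and starting revision numbers at 1 is also an assumption.

```python
class simplerevmap:
    """Hypothetical in-memory stand-in for a hash <-> revision number map."""

    def __init__(self):
        # Index 0 is unused so that revision numbers start at 1 (assumption).
        self._rev2hsh = [None]
        self._rev2flag = [None]
        self._hsh2rev = {}

    def append(self, hsh, flag=0):
        """Add a hash with its flag and return the new revision number."""
        rev = len(self._rev2hsh)
        self._rev2hsh.append(hsh)
        self._rev2flag.append(flag)
        self._hsh2rev[hsh] = rev
        return rev

    def rev2hsh(self, rev):
        """Return the hash for a revision number, or None if unknown."""
        return self._rev2hsh[rev] if 0 < rev < len(self._rev2hsh) else None

    def hsh2rev(self, hsh):
        """Return the revision number for a hash, or None if unknown."""
        return self._hsh2rev.get(hsh)

    def rev2flag(self, rev):
        """Return the flag for a revision number, or None if unknown."""
        return self._rev2flag[rev] if 0 < rev < len(self._rev2flag) else None
```

Lookups are constant-time in both directions, at the cost of keeping every hash in memory twice.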

Test Plan:
`import revmap` from IPython and test its interface manually. Also have a
simple script to get some idea about its perf with 10000 revisions:

```
import contextlib, time, os, random, revmap, sys

@contextlib.contextmanager
def benchmark(msg):
    sys.stderr.write('%s: ' % msg)
    t1 = time.time()
    yield
    t2 = time.time()
    sys.stderr.write('%f seconds\n' % (t2 - t1))

def randomid():
    return ''.join([chr(random.randint(0,255)) for _ in xrange(0, 20)])

rm = revmap.revmap('revmap1')

with benchmark('insert 10000 random revisions'): # ~0.3 seconds
    for i in xrange(0, 10000):
        rm.append(randomid(), flag=1, flush=False)

with benchmark('writing to disk'): # 0.02 seconds
    rm.flush()

os.rename('revmap1', 'revmap2')
with benchmark('loading'): # ~0.015 seconds
    rm = revmap.revmap('revmap2')
```

Reviewers: ttung, #sourcecontrol, ikostia

Reviewed By: ikostia

Subscribers: ikostia, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3709706

Signature: t1:3709706:1471936489:0bbe35ed39a2af3f06e1000c4f9674149ad43995
2016-08-23 16:24:13 +01:00