Commit Graph

62 Commits

Author SHA1 Message Date
Jun Wu
bb07a56a18 fastannotate: add logic to format annotate output
Summary:
The annotate output is not trivial. It supports a lot of flags, and will do
padding by default. This diff implements most flags and the padding logic.
It tries to imitate what the original annotate does by default.
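
The padding idea can be illustrated with a minimal sketch. The function
name `format_annotate` and the row layout are assumptions made for this
example, not the code in this diff: each column is right-aligned to its
widest value before being joined with the line content.

```python
def format_annotate(rows):
    """Right-align every column to its widest value, then append the line.

    `rows` is a list of (columns, line) pairs, where `columns` is a list of
    strings such as a user name or a revision number (hypothetical layout).
    """
    if not rows:
        return []
    ncols = len(rows[0][0])
    # Width of each column is the length of its longest value.
    widths = [max(len(cols[i]) for cols, _ in rows) for i in range(ncols)]
    out = []
    for cols, line in rows:
        prefix = " ".join(c.rjust(w) for c, w in zip(cols, widths))
        out.append("%s: %s" % (prefix, line))
    return out
```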

Note that `-T` (in particular `-Tjson`) is not currently supported.

Test Plan: Code Review. A `.t` test will be added later.

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3836445

Signature: t1:3836445:1473868393:91f6cede059e319eca7015d0d18727afa7ded665
2016-09-08 16:57:50 +01:00
Jun Wu
60eb8a2c22 fastannotate: implement the annotate algorithm
Summary:
This diff implements the `annotate` algorithm. Unlike the vanilla one, the
annotate method takes 2 revisions: the revision specified for annotating,
and the head of the main branch. The algorithm does a "hybrid" annotate:
it incrementally updates the linelog (the cache) so it can answer queries
for any revision in the main branch, and uses the traditional algorithm for
revisions outside the main branch, such as a side branch of a merge commit
or a user-specified revision that is not in the main branch.
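
One way to picture the split between the two halves is the sketch below.
It is only an illustration under stated assumptions: `split_for_hybrid`,
the `parents` mapping and the `mainbranch` set are hypothetical names, and
the real implementation shares far more state than this. The walk collects
revisions that need the traditional algorithm and stops at the first
main-branch revisions it meets, where the linelog can take over.

```python
def split_for_hybrid(rev, parents, mainbranch):
    """Return (side, joints) for annotating `rev`.

    side:   revisions outside the main branch (traditional algorithm).
    joints: main-branch revisions where the cached linelog can answer.
    `parents` maps a revision to its parent revisions (hypothetical input).
    """
    side, joints = [], []
    seen = set()
    stack = [rev]
    while stack:
        r = stack.pop()
        if r in seen:
            continue
        seen.add(r)
        if r in mainbranch:
            joints.append(r)   # do not walk past the main branch
        else:
            side.append(r)
            stack.extend(parents.get(r, ()))
    return side, joints
```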

The main branch is expected to be something like `master` or `@`, plus the
revisions reached by following p1s.

Building the linelog (with merges handled reasonably) for the main branch
and the non-linelog part that produces the final result share a lot of
internal state and logic, so they are deeply coupled. Splitting them would
probably reduce performance, and there is no clean way to share the internal
state. If the caller only wants to build the linelog without annotating
anything, it can pass `rev = master`.

While some attempts are made to support merge changesets, the result can
still sometimes differ from the vanilla one. In those cases, both results
make sense. It is really hard, if not impossible, to make the new
implementation 100% identical to the vanilla one because of linelog's linear
history restriction, so it is probably good enough for now. The differences
will be covered by a `.t` test later.


Test Plan: Code Review. A `.t` file will be added.

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3836438

Signature: t1:3836438:1473778829:27978479a01920833fa146f427178292ea1f5306
2016-09-08 16:54:21 +01:00
Jun Wu
943e37217d fastannotate: add an empty annotatecontext
Summary:
The annotatecontext object contains all the states to perform a fast
annotate operation. This diff adds a skeleton class and a helper method
that handles locking properly.

Test Plan: Code Review

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3836380

Signature: t1:3836380:1473433244:7759527278200dabdfc4c4869c1df28fc156d0dd
2016-09-08 16:46:27 +01:00
Jun Wu
9d919136d2 fastannotate: add annotateopts object
Summary:
The `annotateopts` object is like `diffopts`, maintaining a set of related
annotate options. Currently, different options result in different
(incompatible) linelogs being built. Therefore the `annotateopts` object
also has a method to generate a unique string describing the options, so
we can use the string as part of the path where we write the linelog file.
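
As a rough sketch of the idea, the class below derives a deterministic
string from its options. The option names `followrename` and
`followmerge` and the `shortstr` property are assumptions for this
example only.

```python
class annotateopts(object):
    """Illustrative option holder; the option names are hypothetical."""

    defaults = {"followrename": True, "followmerge": True}

    def __init__(self, **opts):
        for name, default in self.defaults.items():
            setattr(self, name, opts.get(name, default))

    @property
    def shortstr(self):
        """Deterministic string: options in sorted order, booleans as 0/1."""
        return "-".join(
            "%s=%d" % (name, getattr(self, name))
            for name in sorted(self.defaults)
        )
```

Two option sets that would build incompatible linelogs then map to
different strings, so each can be given its own directory on disk.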

Test Plan: Code Review

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3836366

Signature: t1:3836366:1473413541:0e88ca2159dab38a5f4067a67d8a86c74d072cae
2016-09-08 16:09:48 +01:00
Jun Wu
84384559c3 fastannotate: cherry pick some functions from the vanilla annotate
Summary:
These are some small functions defined in the core hg `context.annotate`
method and we want to use them later. The ported code was licensed under
GPLv2.

Test Plan: A `.t` test will be added when the fastannotate command is functional.

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3836335

Signature: t1:3836335:1473413278:de2510a6e5041455c430199717a1ef5217de5348
2016-09-08 16:08:12 +01:00
Jun Wu
13126d61f2 fastannotate: appending revmap instead of rewriting whenever possible
Summary:
Previously, in D3837023, we added logic to avoid flushing the entire revmap
if nothing changed. It can be smarter: only append newly added revisions,
so the revmap is append-only most of the time.
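
The append-only behavior can be sketched as follows. `appendlog` and its
fixed 20-byte record format are assumptions for illustration, not the
revmap's real on-disk layout: the object remembers how many records are
already on disk and writes only the ones added since.

```python
import os


class appendlog(object):
    """Hypothetical append-mostly store of fixed-size 20-byte records."""

    RECORDSIZE = 20

    def __init__(self, path):
        self.path = path
        self.records = []
        if os.path.exists(path):
            with open(path, "rb") as f:
                data = f.read()
            size = self.RECORDSIZE
            self.records = [data[i:i + size] for i in range(0, len(data), size)]
        self._flushed = len(self.records)  # number of records already on disk

    def append(self, record):
        assert len(record) == self.RECORDSIZE
        self.records.append(record)

    def flush(self):
        """Append only the new records; return how many were written."""
        pending = self.records[self._flushed:]
        if not pending:
            return 0  # nothing changed, so no write at all
        with open(self.path, "ab") as f:
            f.write(b"".join(pending))
        self._flushed = len(self.records)
        return len(pending)
```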

Test Plan: Run test-fastannotate-revmap.py

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3856272

Signature: t1:3856272:1473778447:9f30c315beead549f883b4ff3e19b2fd750a66ab
2016-09-12 21:01:50 +01:00
Jun Wu
b0a021e58a fastannotate: add a "copyfrom" method to the revmap
Summary:
Like the linelog, the revmap can be either backed by the file system, or by
the memory. Sometimes we need to "fork" an on-disk linelog and a revmap to
be able to change them without writing the changes back to disk. This diff
implements a "copyfrom" method that works in that case. Typical usage:

  m1 = revmap(path) # load revmap from path on disk
  m2 = revmap()     # create an in-memory-only revmap
  m2.copyfrom(m1)   # copy content from disk to memory
  # do anything to m2 without affecting m1

Test Plan: A revmap Python test will be added later to cover this feature

Reviewers: #sourcecontrol, zamsden

Reviewed By: zamsden

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3849345

Signature: t1:3849345:1473713912:0bd7bd8784ffde2ae7f4f3a29f902cc25cece8b8
2016-09-11 16:44:57 +01:00
Jun Wu
0c4dd52818 fastannotate: record file names in the revmap
Summary:
To support `--file` and to be able to get the filectx from a `repo` object
correctly, we need to store path information in the revmap. This diff
does that.

Since renames are rare and path queries are infrequent, we only store paths
for revisions that actually rename the file, both on disk and in memory.
A bisect is used to calculate the path for a given revision.
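
The bisect lookup can be sketched like this. `renamelog` and its method
names are hypothetical, and revisions are assumed to be recorded in
increasing order: only the revisions where the path changes are stored,
and a binary search finds the path in effect at any revision.

```python
import bisect


class renamelog(object):
    """Store a path only at revisions where it changes (illustrative)."""

    def __init__(self):
        self._revs = []   # sorted revision numbers where the path changed
        self._paths = []  # path in effect starting at the matching revision

    def record(self, rev, path):
        """Remember `path` for `rev`; a no-op unless the path changed."""
        if not self._paths or self._paths[-1] != path:
            self._revs.append(rev)
            self._paths.append(path)

    def path(self, rev):
        """Return the path in effect at `rev`, or None if before any record."""
        i = bisect.bisect_right(self._revs, rev) - 1
        if i < 0:
            return None
        return self._paths[i]
```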

Test Plan: A revmap Python test will be added later to cover this feature

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, zamsden, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3849334

Signature: t1:3849334:1473872469:ed1e51d470377d3941cf3f9b82971f0e626d38a0
2016-09-19 10:51:54 +01:00
Jun Wu
12805254e6 fastannotate: skip flushing revmap if nothing changed
Summary:
Writing the revmap back to disk can be a bit expensive if the revmap is not
small. This diff optimizes `revmap.flush` so it does not write anything if
nothing changed.

Test Plan: Profiling data shows no more substantial `write` syscalls.

Reviewers: #sourcecontrol, zamsden

Reviewed By: zamsden

Subscribers: zamsden, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3837023

Signature: t1:3837023:1473954024:6af83074096517ab1d3d05cb90d091af6e0c5d02
2016-09-08 21:01:47 +01:00
Jun Wu
833188e453 fastannotate: move exception classes to error.py
Summary: The newly added error.py will be used in other files.

Test Plan: A related test will be added later

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: ttung, stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3836304

Signature: t1:3836304:1473411906:ca110c8214d2b1a0f9ae29111d75baa54c072c19
2016-09-08 14:47:49 +01:00
Jun Wu
190eb53c0b fastannotate: implements the "contains" operations for the revmap
Summary:
The revmap stores 2 kinds of revisions:
- Real revisions in the main branch that can be annotated directly
- Revisions in a merge changeset's p2 branch. They are not in the linelog
  linear history but may appear in lines introduced by the merge changeset.
  We call those revisions in a "side branch", outside the "main branch".

This diff implements `__contains__` to test whether a revision is in the main
branch. It will be used frequently in upcoming code, such as calculating
the annotate information for a side branch, or finding a joint point when
doing a hybrid annotate.
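
A minimal sketch of the membership test, assuming a per-revision flag with
a hypothetical `SIDEBRANCH` bit (the class and names are illustrative, not
the revmap's actual interface): a hash is "in" the map only if it is known
and not marked as a side-branch revision.

```python
SIDEBRANCH = 1  # hypothetical flag bit marking side-branch revisions


class branchmap(object):
    """Illustrative map from commit hash to a flag."""

    def __init__(self):
        self._flags = {}  # hash -> flag

    def add(self, hsh, sidebranch=False):
        self._flags[hsh] = SIDEBRANCH if sidebranch else 0

    def __contains__(self, hsh):
        """True only for known revisions that are in the main branch."""
        flag = self._flags.get(hsh)
        return flag is not None and not (flag & SIDEBRANCH)
```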

Test Plan: A revmap Python test will be added later to cover this feature

Reviewers: #sourcecontrol, stash

Reviewed By: stash

Subscribers: stash, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3827995

Signature: t1:3827995:1473324222:4f0d94a2d21c033ab78b3a5dd31254692d36a852
2016-09-07 15:05:55 +01:00
Jun Wu
f1c3dc9f91 fastannotate: implement a simple revision map
Summary:
To use linelog, we need to be able to translate between hg commit hashes and
linelog revision numbers. This diff implements such a revmap in the most
direct way.

The revmap also contains an extra "flag" for each revision, which will be used
to mark if the revision is in the main branch or not, to handle merge commits.
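
The two-way translation can be sketched with a list and a dictionary. The
class `simplerevmap`, its method names, and the zero-based numbering are
assumptions for this example rather than the interface added by this
diff.

```python
class simplerevmap(object):
    """Illustrative two-way map between commit hashes and revision numbers."""

    def __init__(self):
        self._rev2hsh = []   # revision number -> hash
        self._rev2flag = []  # revision number -> flag
        self._hsh2rev = {}   # hash -> revision number

    def append(self, hsh, flag=0):
        """Assign the next revision number to `hsh` and return it."""
        rev = len(self._rev2hsh)
        self._rev2hsh.append(hsh)
        self._rev2flag.append(flag)
        self._hsh2rev[hsh] = rev
        return rev

    def rev2hsh(self, rev):
        return self._rev2hsh[rev]

    def rev2flag(self, rev):
        return self._rev2flag[rev]

    def hsh2rev(self, hsh):
        """Return the revision number for `hsh`, or None if unknown."""
        return self._hsh2rev.get(hsh)
```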

Test Plan:
`import revmap` from IPython and test its interface manually. There is also
a simple script to get a rough idea of its performance with 10000 revisions:

```
import contextlib, time, os, random, revmap, sys

@contextlib.contextmanager
def benchmark(msg):
    sys.stderr.write('%s: ' % msg)
    t1 = time.time()
    yield
    t2 = time.time()
    sys.stderr.write('%f seconds\n' % (t2 - t1))

def randomid():
    return ''.join([chr(random.randint(0,255)) for _ in xrange(0, 20)])

rm = revmap.revmap('revmap1')

with benchmark('insert 10000 random revisions'): # ~0.3 seconds
    for i in xrange(0, 10000):
        rm.append(randomid(), flag=1, flush=False)

with benchmark('writing to disk'): # 0.02 seconds
    rm.flush()

os.rename('revmap1', 'revmap2')
with benchmark('loading'): # ~0.015 seconds
    rm = revmap.revmap('revmap2')
```

Reviewers: ttung, #sourcecontrol, ikostia

Reviewed By: ikostia

Subscribers: ikostia, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3709706

Signature: t1:3709706:1471936489:0bbe35ed39a2af3f06e1000c4f9674149ad43995
2016-08-23 16:24:13 +01:00