Commit Graph

621 Commits

Author SHA1 Message Date
Jun Wu
f1c575a099 flake8: enable F821 check
Summary:
This check is useful and detects real errors (ex. fbconduit).  Unfortunately
`arc lint` will run it with both py2 and py3 so a lot of py2 builtins will
still be warned.

I didn't find a clean way to disable py3 check. So this diff tries to fix them.
For `xrange`, the change was done by a script:

```
import sys
import redbaron

headertypes = {'comment', 'endl', 'from_import', 'import', 'string',
               'assignment', 'atomtrailers'}

xrangefix = '''try:
    xrange(0)
except NameError:
    xrange = range

'''

def isxrange(x):
    try:
        return x[0].value == 'xrange'
    except Exception:
        return False

def main(argv):
    for i, path in enumerate(argv):
        print('(%d/%d) scanning %s' % (i + 1, len(argv), path))
        content = open(path).read()
        try:
            red = redbaron.RedBaron(content)
        except Exception:
            print('  warning: failed to parse')
            continue
        hasxrange = red.find('atomtrailersnode', value=isxrange)
        hasxrangefix = 'xrange = range' in content
        if hasxrangefix or not hasxrange:
            print('  no need to change')
            continue

        # find a place to insert the compatibility  statement
        changed = False
        for node in red:
            if node.type in headertypes:
                continue
            # node.insert_before is an easier API, but it has bugs changing
            # other "finally" and "except" positions. So do the insert
            # manually.
            # # node.insert_before(xrangefix)
            line = node.absolute_bounding_box.top_left.line - 1
            lines = content.splitlines(1)
            content = ''.join(lines[:line]) + xrangefix + ''.join(lines[line:])
            changed = True
            break

        if changed:
            # "content" is faster than "red.dumps()"
            open(path, 'w').write(content)
            print('  updated')

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

For other py2 builtins that do not have a py3 equivalent, some `# noqa`
were added as a workaround for now.

Reviewed By: DurhamG

Differential Revision: D6934535

fbshipit-source-id: 546b62830af144bc8b46788d2e0fd00496838939
2018-04-13 21:51:09 -07:00
Jun Wu
2946a1c198 codemod: use single blank line
Summary: This makes test-check-code cleaner.

Reviewed By: ryanmce

Differential Revision: D6937934

fbshipit-source-id: 8f92bc32f75b9792ac67db77bb3a8756b37fa941
2018-04-13 21:51:08 -07:00
Jun Wu
4c44fcc611 revlog: calculate rawsize correctly when applying deltas
Summary: This fixes the `hg verify` issue.

Reviewed By: DurhamG

Differential Revision: D6920596

fbshipit-source-id: 4a777f182b58acf29a62d50aca65d668b307e5b3
2018-04-13 21:51:06 -07:00
Jun Wu
a226ef4969 revlog: forbid revdiff revisions with non-zero flags
Summary:
Calling revdiff with non-zero flags is a sign of a hard-to-debug
error. Raise ProgrammingError in this case.

The change is straightforward. Apply it to both shallow and full
repos.

Reviewed By: DurhamG

Differential Revision: D6910080

fbshipit-source-id: cbcf1a444de90e104867cc9f1525629b7edda851
2018-04-13 21:51:05 -07:00
Jun Wu
d76d41a0b2 revlog: resolve lfs rawtext to vanilla rawtext before applying delta
Summary:
This happens when the client with LFS revisions applies a bundle
with a delta base pointing to an LFS revision stored in the repo.

Reviewed By: DurhamG

Differential Revision: D6906210

fbshipit-source-id: 8b47f8304f8ef5ae4b02d7239b680f70106a4d83
2018-04-13 21:51:05 -07:00
Jun Wu
453b043043 revlog: do not use delta for lfs revisions
Summary:
This is similar to what we have done for changegroups, but for non-changegroup
(addrawrevision) case. This is needed to make sure the delta application code
path can assume deltas are always against vanilla (non-LFS) rawtext so the next
fix becomes possible.

Reviewed By: DurhamG

Differential Revision: D6906202

fbshipit-source-id: a7d62dfed4206d45b42299f1dabf013620ae52b3
2018-04-13 21:51:05 -07:00
Jun Wu
07042a2157 changegroup: do not delta lfs revisions
Summary:
There is no way to distinguish whether a delta base is LFS or non-LFS.

If the delta is against LFS rawtext, and the client trying to apply it has the
base revision stored as fulltext, the delta (aka. bundle) will fail to apply.

This patch forbids using delta on LFS revisions.

Note: this does not solve the problem entirely. Since the problem could also be
a client with base file revision being LFS tries to apply a non-LFS delta
(bundle).

Reviewed By: DurhamG

Differential Revision: D6878326

fbshipit-source-id: 9c3951e4673b8de61aae73a51e1bfff422f38d0f
2018-04-13 21:51:05 -07:00
Jun Wu
97dfe79221 perftweaks: fold isgooddelta tweak into core
Summary:
The isgooddelta tweak was introduced in D2693043 (perftweaks: change revlog
delta heuristic, 2015-11-24).  Comparing with the existing version, the only
change is that we removed `dist > maxdist` check.

Note that the upstream commit 895ecec31 (revlog: add an experimental option
to mitigated delta issues (issue5480), 2017-06-23) also introduces a config
option to override `maxdist` to make the condition fail, which basically does
a same thing.

Instead of introducing new config options or adding more "if"s to the
codebase to make it more obscure, let's just simplify it by disabling the
check entirely, and removing the `dist` concept, removing two config
options: `experimental.maxdeltachainspan` and `perftweaks.preferdeltas`.

The `chainlen > self._maxchainlen` check should be enough for keeping
delta chain length bounded.

Reviewed By: DurhamG

Differential Revision: D6752529

fbshipit-source-id: e8fd8ec39240191db5fb274190fc661e97087a78
2018-04-13 21:50:53 -07:00
Boris Feld
d8aa8e36d8 upgrade: add a 'redeltafullall' mode
We add a new mode for delta recomputation, when selected, each full text will
go through the full "addrevision" mechanism again. This is slower than
"redeltaall" but this gives the opportunity for extensions to trigger special
logic. For example, the lfs extensions can decide to promote some revision to
lfs storage during the upgrade.
2017-12-07 20:27:03 +01:00
Paul Morelle
80d72dfb5f sparse-read: ignore trailing empty revs in each read chunk
An empty entry in the revlog may happen for two reasons:
- when the file is empty, and the revlog stores a snapshot;
- when there is a merge and both parents were identical.

`hg debugindex -m | awk '$3=="0"{print}' | wc -l` gives 1917 of such entries
in my clone of pypy, and 113 on my clone of mercurial.

These empty revision may be located at the end of a sparse chain, and in some
special cases may lead to read relatively large amounts of data for nothing.
2017-10-18 15:28:19 +02:00
Paul Morelle
942b4f2e1a sparse-read: skip gaps too small to be worth splitting
Splitting at too small gaps might not be worthwhile. With this changeset,
we stop considering splitting on too small gaps. The threshold is configurable.
We arbitrarily pick 256K as a default value because it seems "okay".
Further testing on various repositories and setups will be needed to tune it.

The option name is 'experimental.sparse-read.min-gap-size`, and replaces
`experimental.sparse-read.min-block-size` which is not used any more.
2017-10-18 09:07:48 +02:00
Boris Feld
b5ee1a86a9 sparse-read: move from a recursive-based approach to a heap-based one
The previous recursive approach was trying to optimise each read slice to have
a good density. It had the tendency to over-optimize smaller slices while
leaving larger hole in others.

The new approach focuses on improving the combined density of all the reads,
instead of the individual slices. It slices at the largest gaps first, as they
reduce the total amount of read data the most efficiently.

Another benefit of this approach is that we iterate over the delta chain only
once, reducing the overhead of slicing long delta chains.

On the repository we use for tests, the new approach shows similar or faster
performance than the current default linear full read.

The repository contains about 450,000 revisions with many concurrent
topological branches. Tests have been run on two versions of the repository:
one built with the current delta constraint, and the other with an unlimited
delta span (using 'experimental.maxdeltachainspan=0')

Below are timings for building 1% of all the revision in the manifest log using
'hg perfrevlogrevisions -m'. Times are given in seconds. They include the new
couple of follow-up changeset in this series.

delta-span  standard    unlimited
linear-read     922s         632s
sparse-read     814s         566s
2017-10-18 12:53:00 +02:00
Paul Morelle
549788470a revlog-sparse-read: add a lower-threshold for read block size
The option experimental.sparse-read.min-block-size specifies the minimal size
of a deltachain span, under which it won't be split by _slicechunk.
2017-10-14 17:05:41 +02:00
Paul Morelle
d1288f1f44 revlog: introduce an experimental flag to slice chunks reads when too sparse
Delta chains can become quite sparse if there is a lot of unrelated data between
relevant pieces. Right now, revlog always reads all the necessary data for the
delta chain in one single read. This can lead to a lot of unrelated data to be
read (see issue5482 for more details).
One can use the `experimental.maxdeltachainspan` option with a large value
(or -1) to easily produce a very sparse delta chain.

This change introduces the ability to slice the chunks retrieval into  multiple
reads, skipping large sections of unrelated data. Preliminary testing shows
interesting results. For example the peak memory consumption to read a manifest
on a large repository is reduced from 600MB to 250MB (200MB without
maxdeltachainspan). However, the slicing itself and the multiple reads can have
an negative impact on performance. This is why the new feature is hidden behind
an experimental flag.

Future changesets will add various parameters to control the slicing heuristics.
We hope to experiment a wide variety of repositories during 4.4 and hopefully
turn the feature on by default in 4.5.
As a first try, the algorithm itself is prone to deep changes. However, we wish
to define APIs and have a baseline to work on.
2017-10-10 17:50:27 +02:00
Paul Morelle
6d907667b2 revlog: ignore empty trailing chunks when reading segments
When a merge commit creates an empty diff in the revlog, its offset may still
be quite far from the end of the previous chunk.
Skipping these empty chunks may reduce read size significantly.
In most cases, there is no gain, and in some cases, little gain.
On my clone of pypy, `hg manifest` reads 65% less bytes (96140 i/o 275943) for
revision 4260 by ignoring the only empty trailing diff.
For revision 2229, 35% (34557 i/o 53435)
Sadly, this is difficult to reproduce, as hg clone can make its own different
structure every time.
2017-10-09 15:13:41 +02:00
Mark Thomas
073ae56963 revlog: add option to mmap revlog index
Following on from Jun Wu's patch last October[1], we have found that using mmap
for the revlog index in repos with large revlogs gives a noticable performance
improvment (~110ms on each hg invocation), particularly for commands that don't
touch the index very much.

This changeset adds this as an option, activated by a new experimental config
option so that it can be enabled on a per-repo basis. The configuration option
specifies an index size threshold at which Mercurial will switch to using mmap
to access the index.

If the configuration option is not specified, the default remains to load the
full file, which seems to be the best option for smaller repos.

Some initial performance numbers for average of 5 invocations of `hg log -l 5`
for different cache states:

| Repo: | HG | FB |
|---|---|---|
| Index size: | 2.3MB | much bigger |
| read (warm): | 237ms | 432ms |
| mmap (warm): | 227ms | 321ms |
|   | (-3%) | (-26%) |
| read (cold): | 397ms | 696ms |
| mmap (cold): | 410ms | 888ms |
|   | (+3%) | (+28%) |

[1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-October/088737.html

Test Plan:
`hg log --config experimental.mmapindex=true`

Differential Revision: https://phab.mercurial-scm.org/D477
2017-09-13 17:26:26 +00:00
Durham Goode
c00411b064 revlog: add revmap back to revlog.addgroup
The recent e85296920485 patch removed the linkmapper argument from addgroup, as
part of trying to make addgroup more agnostic from the changegroup format. It
turns out that the changegroup can't resolve linkrevs while iterating over the
deltas, because applying the deltas might affect the linkrev resolution. For
example, when applying a series of changelog entries, the linkmapper just
returns len(cl). If we're iterating over the deltas without applying them to the
changelog, this results in incorrect linkrevs. This was caught by the hgsql
extension, which reads the revisions before applying them.

The fix is to return linknodes as part of the delta iterator, and let the
consumer choose what to do.

Differential Revision: https://phab.mercurial-scm.org/D730
2017-09-20 09:22:22 -07:00
Martin von Zweigbergk
cb781eb5fa templater: extract shortest() logic from template function
It can be useful for extensions to be able to produce the shortest
unambiguous hash (including the in-tree "show" extension). That logic
is currently inside the shortest() template function. Let's move it
out of the templater. I've put it on revlog since it's closely related
to revlog._partialmatch. We may also want a convenience method on
context, but I'll leave that for a later patch.

Differential Revision: https://phab.mercurial-scm.org/D724
2017-09-15 00:01:57 -07:00
Durham Goode
c94bd0e75c changegroup: remove changegroup dependency from revlog.addgroup
Previously revlog.addgroup would accept a changegroup and a linkmapper and use
it to iterate of the deltas. As part of untangling the revlog-changegroup
interdependency, let's move the changegroup delta iteration logic to it's own
function and pass the simple iterator to the revlog instead.

This will make it easier to introduce non-revlogs stores in the future, without
reinventing any changegroup specific logic.

Differential Revision: https://phab.mercurial-scm.org/D688
2017-09-13 10:43:44 -07:00
Durham Goode
d27ceccf8a revlog: refactor chain variable
Previously the addgroup loop would set chain to be the result of
self._addrevision(node,...). Since _addrevision now always returns the passed in
node, we can drop that behavior and just always set chain = node in the loop.

This will be useful in a future patch where we refactor the cg.deltachunk logic
to another function and therefore chain disappears entirely from this function.

Differential Revision: https://phab.mercurial-scm.org/D699
2017-09-13 10:43:16 -07:00
Martin von Zweigbergk
54312c2822 revlog: move check for wdir from changelog to revlog
Yuya said he preferred this (to keep them in one place, I think).

Differential Revision: https://phab.mercurial-scm.org/D569
2017-08-30 09:21:31 -07:00
Augie Fackler
58925444d0 revlog: use pycompat.bytestr() to reliably have a %s-able value 2017-08-22 21:21:43 -04:00
Martin von Zweigbergk
9255a8ef24 revlog: abort on attempt to write null revision
My repo got corrupted yesterday by something that ended up writing the
null revision to the revlog (nullid hash, not nullrev index, of
course).  We use many extensions internally (narrowhg, remotefilelog,
evolve, internal extensions) and treemanifests are on. The null
revision was written to the changelog, the root manifest log, and one
subdirectory manifest log. I have no idea exactly why the null
revision was written, but it seems cheap enough to check that we
should fail instead of corrupting the repo.

Differential Revision: https://phab.mercurial-scm.org/D522
2017-08-25 15:50:07 -07:00
Alex Gaynor
119e84a2a0 revlog: use struct.Struct instances for slight performance wins
Differential Revision: https://phab.mercurial-scm.org/D32
2017-07-10 16:41:13 -04:00
Alex Gaynor
c6f81cfa87 revlog: micro-optimize the computation of hashes
Differential Revision: https://phab.mercurial-scm.org/D31
2017-07-10 16:39:28 -04:00
Pierre-Yves David
7c5463c25b revlog: add an experimental option to mitigated delta issues (issue5480)
The general delta heuristic to select a delta do not scale with the number of
branch. The delta base is frequently too far away to be able to reuse a chain
according to the "distance" criteria. This leads to insertion of larger delta (or
even full text) that themselves push the bases for the next delta further away
leading to more large deltas and full texts. This full text and frequent
recomputation throw Mercurial performance in disarray.

For example of a slightly large repository

  280 000 files (2 150 000 versions)
  430 000 changesets (10 000 topological heads)

Number below compares repository with and without the distance criteria:

manifest size:
    with:    21.4 GB
    without:  0.3 GB

store size:
    with:    28.7 GB
    without   7.4 GB

bundle last 15 00 revisions:
    with:    800 seconds
             971 MB
    without:  50 seconds
              73 MB

unbundle time (of the last 15K revisions):
    with:    1150 seconds (~19 minutes)
    without:   35 seconds

Similar issues has been observed in other repositories.


Adding a new option or "feature" on stable is uncommon. However, given that this
issues is making Mercurial practically unusable, I'm exceptionally targeting
this patch for stable.

What is actually needed is a full rework of the delta building and reading
logic. However, that will be a longer process and churn not suitable for stable.

In the meantime, we introduces a quick and dirty mitigation of this in the
'experimental' config space. The new option introduces a way to set the maximum
amount of memory usable to store a diff in memory. This extend the ability for
Mercurial to create chains without removing all safe guard regarding memory
access. The option should be phased out when core has a more proper solution
available.

Setting the limit to '0' remove all limits, setting it to '-1' use the default
limit (textsize x 4).
2017-06-23 13:49:34 +02:00
Gregory Szorc
c45a2cb5ab revlog: C implementation of delta chain resolution
I've seen revlog._deltachain() appear in a number of performance
profiles. I suspect there are 2 reasons for this:

1. Delta chain resolution performs many index lookups, thus triggering
   population of index tuples. Creating possibly tens of thousands of
   PyObject will have overhead.
2. Delta chain resolution is a tight loop.

By moving delta chain resolution to C, we can defer instantiation
of full index entry tuples and make the loop faster courtesy of
not running in Python.

We can measure the impact to delta chain resolution via
`hg perflogrevision` using the mozilla-central repo with a recent
manifest having delta chain length of 33726:

$ hg perfrevlogrevision -m 364895
! full
! wall 0.367585 comb 0.370000 user 0.340000 sys 0.030000 (best of 27)
! wall 0.357581 comb 0.360000 user 0.350000 sys 0.010000 (best of 28)
! deltachain
! wall 0.010644 comb 0.010000 user 0.010000 sys 0.000000 (best of 270)
! wall 0.000292 comb 0.000000 user 0.000000 sys 0.000000 (best of 8729)

$ hg perfrevlogrevision --cache -m 364895
! deltachain
! wall 0.003904 comb 0.000000 user 0.000000 sys 0.000000 (best of 712)
! wall 0.000284 comb 0.000000 user 0.000000 sys 0.000000 (best of 9926)

The first test measures savings from both not instantiating index
entries and moving to C. The second test (which doesn't clear the
index caches) essentially isolates the benefits of moving from Python
to C. It still shows a 13.7x speedup (versus 36.4x). And there are
multiple milliseconds of savings within the critical path for resolving
revision data. I think that justifies the existence of C code.

A more striking example of the benefits of this change can be
demonstrated by timing `hg debugdeltachain -m` for the mozilla-central
repo:

$ time hg debugdeltachain -m > /dev/null
before:        1057.4s
after:          503.3s
PyPy2.7 5.8.0:  220.0s

It's worth noting that the C code isn't as optimal as it could be.
We're still instantiating a new PyObject for every revision. A future
optimization would be to reuse the PyObject on the cached index tuple.
We could potentially also get wins by using a memory array of raw
integers. There is also room for a delta chain cache on revlog
instances. Of course, the best optimization is to implement revlog
reading outside of Python so Python doesn't need to be concerned
about the relatively expensive index entries and operations on them.
2017-06-25 12:41:34 -07:00
Pulkit Goyal
dc7fe61263 py3: catch binascii.Error raised from binascii.unhexlify
Before Python 3, binsacii.unhexlify used to raise TypeError, now it raises
binascii.Error.
2017-06-20 22:11:46 +05:30
Martin von Zweigbergk
daced3540c revlog: rename list of nodes from "content" to "nodes"
It seems like the reason for "content" is that the variable contains
the nodes that the changegroup "contains", see c2901cddb53f (revlog:
make addgroup returns a list of node contained in the added source,
2012-01-13), but "nodes" seems much clearer.
2017-06-15 13:42:35 -07:00
Martin von Zweigbergk
12dbf027df revlog: delete obsolete comment
The comment seems to refer to code that was deleted in a46638c640d8
(revlog.addgroup(): always use _addrevision() to add new revlog
entries, 2010-10-08).
2017-06-15 13:25:41 -07:00
Martin von Zweigbergk
f0c7377259 revlog: delete dead assignment in addgroup() 2017-06-15 13:23:21 -07:00
Gregory Szorc
efbb740737 revlog: skeleton support for version 2 revlogs
There are a number of improvements we want to make to revlogs
that will require a new version - version 2. It is unclear what the
full set of improvements will be or when we'll be done with them.
What I do know is that the process will likely take longer than a
single release, will require input from various stakeholders to
evaluate changes, and will have many contentious debates and
bikeshedding.

It is unrealistic to develop revlog version 2 up front: there
are just too many uncertainties that we won't know until things
are implemented and experiments are run. Some changes will also
be invasive and prone to bit rot, so sitting on dozens of patches
is not practical.

This commit introduces skeleton support for version 2 revlogs in
a way that is flexible and not bound by backwards compatibility
concerns.

An experimental repo requirement for denoting revlog v2 has been
added. The requirement string has a sub-version component to it.
This will allow us to declare multiple requirements in the course
of developing revlog v2. Whenever we change the in-development
revlog v2 format, we can tweak the string, creating a new
requirement and locking out old clients. This will allow us to
make as many backwards incompatible changes and experiments to
revlog v2 as we want. In other words, we can land code and make
meaningful progress towards revlog v2 while still maintaining
extreme format flexibility up until the point we freeze the
format and remove the experimental labels.

To enable the new repo requirement, you must supply an experimental
and undocumented config option. But not just any boolean flag
will do: you need to explicitly use a value that no sane person
should ever type. This is an additional guard against enabling
revlog v2 on an installation it shouldn't be enabled on. The
specific scenario I'm trying to prevent is say a user with a
4.4 client with a frozen format enabling the option but then
downgrading to 4.3 and accidentally creating repos with an
outdated and unsupported repo format. Requiring a "challenge"
string should prevent this.

Because the format is not yet finalized and I don't want to take
any chances, revlog v2's version is currently 0xDEAD. I figure
squatting on a value we're likely never to use as an actual revlog
version to mean "internal testing only" is acceptable. And
"dead" is easily recognized as something meaningful.

There is a bunch of cleanup that is needed before work on revlog
v2 begins in earnest. I plan on doing that work once this patch
is accepted and we're comfortable with the idea of starting down
this path.
2017-05-19 20:29:11 -07:00
Yuya Nishihara
685172007c revlog: add support for partial matching of wdir node id
The idea is simple. If the given node id prefix is 'ff...f', add +1 to the
number of matches (e.g. ambiguous if partial + maybewdir > 1).

This patch also fixes id() revset and shortest() template since _partialmatch()
can raise WdirUnsupported exception.
2016-08-19 18:26:04 +09:00
Yuya Nishihara
b42457ae0a revlog: map rev(wdirid) to WdirUnsupported exception
This will allow us to map repo["ff..."] to workingctx. _partialmatch() will
be updated later. I tried "return wdirrev" in place of raising the exception,
but earlier exception seemed better.
2016-08-20 22:37:58 +09:00
Pulkit Goyal
dfd06a1929 revlog: raise error.WdirUnsupported from revlog.node() if wdirrev is passed
When we try to run, 'hg debugrevspec 'branch(wdir())'', it throws an index error
and blows up. Lets raise the WdirUnsupported if wdir() is passed so that we can
catch that later.
2017-05-23 01:30:36 +05:30
Pulkit Goyal
26a5b62b59 revlog: raise WdirUnsupported when wdirrev is passed
revlog.parentrevs() is called while evaluating ^ operator in revsets. When wdir
is passed, it raises IndexError. This patch raises WdirUnsupported if wdir is
passed in the function. The error will be caugth in future patches.
2017-05-19 19:12:06 +05:30
Gregory Szorc
7d51a8278d revlog: remove some revlogNG terminology
RevlogNG is not such a good name when it is no longer the
newest revlog version. Since we'll soon have revlog version 2,
let's remove some references to it.
2017-05-19 20:14:31 -07:00
Gregory Szorc
5bcef1853c revlog: tweak wording and logic for flags validation
First, the logic around the if..elif..elif was subtly wrong
and sub-optimal because all branches would be tested as long as
the revlog was valid. This patch changes things so it behaves like
a switch statement over the revlog version.

While I was here, I also tweaked error strings to make them
consistent and to read better.
2017-05-19 20:10:50 -07:00
Yuya Nishihara
4563e16232 parsers: switch to policy importer
# no-check-commit
2016-08-13 12:23:56 +09:00
Gregory Szorc
8af088ee65 revlog: rename constants (API)
Feature flag constants don't need "NG" in the name because they will
presumably apply to non-"NG" version revlogs.

All feature flag constants should also share a similar naming
convention to identify them as such.

And, "RevlogNG" isn't a great internal name since it isn't obvious it
maps to version 1 revlogs. Plus, "NG" (next generation) is only a good
name as long as it is the latest version. Since we're talking about
version 2, now is as good a time as any to move on from that naming.
2017-05-17 19:52:18 -07:00
Jun Wu
718861a5f7 changelog: make sure datafile is 00changelog.d (API)
0ad0d26ff7 makes it possible for changelog datafile to be "00changelog.i.d",
which is wrong. This patch adds an explicit datafile parameter to fix it.
2017-05-17 20:14:27 -07:00
Martin von Zweigbergk
c3406ac3db cleanup: use set literals
We no longer support Python 2.6, so we can now use set literals.
2017-02-10 16:56:29 -08:00
Jun Wu
4656f56bb3 flagprocessor: add a fast path when flags is 0
When flags is 0, _processflags could be a no-op instead of iterating through
the flag bits.
2017-05-10 16:17:58 -07:00
Jun Wu
2c11c92a85 revlog: move part of "addrevision" to "addrawrevision"
"addrawrevision" will be the public API to reuse revision rawdata elsewhere.
It will be used by a future patch.
2017-05-09 21:27:06 -07:00
Gregory Szorc
5d6e940365 revlog: rename _chunkraw to _getsegmentforrevs()
This completes our rename of internal revlog methods to
distinguish between low-level raw revlog data "segments" and
higher-level, per-revision "chunks."

perf.py has been updated to consult both names so it will work
against older Mercurial versions.
2017-05-06 12:12:53 -07:00
Gregory Szorc
46413ff643 revlog: rename internal functions containing "chunk" to use "segment"
Currently, "chunk" is overloaded in revlog terminology to mean
multiple things. One of them refers to a segment of raw data from
the revlog. This commit renames various methods only used within
revlog.py to have "segment" in their name instead of "chunk."

While I was here, I also made the names more descriptive. e.g.
"_loadchunk()" becomes "_readsegment()" because it actually does
I/O.
2017-05-06 12:02:12 -07:00
Jun Wu
0606028aff revlog: make "size" diverge from "rawsize"
Previously, revlog.size equals to revlog.rawsize. However, the flag
processor framework could make a difference - "size" could mean the length
of len(revision(raw=False)), while "rawsize" means len(revision(raw=True)).
This patch makes it so.

This corrects "hg status" output when flag processor is involved. The call
stack looks like:

  basectx.status -> workingctx._buildstatus -> workingctx._dirstatestatus
  -> workingctx._checklookup -> filectx.cmp -> filelog.cmp -> filelog.size
  -> revlog.size
2017-04-09 12:53:31 -07:00
Jun Wu
e557e14680 revlog: avoid applying delta chain on cache hit
Previously, revlog.revision(raw=False) may try to apply the delta chain
on _cache hit. That happens if flags are non-empty. This patch makes rawtext
reused so delta chain application is avoided.

"_cache" and "rev" are moved a bit to avoid unnecessary assignments.
2017-04-02 18:40:13 -07:00
Jun Wu
5f26616d71 revlog: indent block to make review easier 2017-04-02 18:29:24 -07:00
Jun Wu
2ab18ee566 revlog: avoid calculating "flags" twice in revision()
This is more consistent with other code in "revision()" - prefer performance
to code length.
2017-04-02 18:25:12 -07:00