145 Commits

Author SHA1 Message Date
Gregory Szorc
8af088ee65 revlog: rename constants (API)
Feature flag constants don't need "NG" in the name because they will
presumably apply to non-"NG" version revlogs.

All feature flag constants should also share a similar naming
convention to identify them as such.

And, "RevlogNG" isn't a great internal name since it isn't obvious it
maps to version 1 revlogs. Plus, "NG" (next generation) is only a good
name as long as it is the latest version. Since we're talking about
version 2, now is as good a time as any to move on from that naming.
2017-05-17 19:52:18 -07:00
Jun Wu
718861a5f7 changelog: make sure datafile is 00changelog.d (API)
0ad0d26ff7 makes it possible for changelog datafile to be "00changelog.i.d",
which is wrong. This patch adds an explicit datafile parameter to fix it.
2017-05-17 20:14:27 -07:00
Gregory Szorc
ae8cb885e7 changelog: load pending file directly
When changelogs are written, a copy of the index (or inline revlog)
may be written to an 00changelog.i.a file to facilitate hooks and
other processes having access to the pending data before it is
finalized.

The way it works today, the localrepo class loads the changelog
like normal. Then, if it detects a pending transaction, it asks
the changelog class to load a pending changelog. The changelog
class looks for a 00changelog.i.a file. If it exists, it is
loaded and internal data structures on the new revlog class are
copied to the original instance.

The existing mechanism is inefficient because it loads 2 revlog
files. The index, node map, and chunk cache for 00changelog.i
are thrown away and replaced by those for 00changelog.i.a.

The existing mechanism is also brittle because it is a layering
violation to access the data structures being accessed. For example,
the code copies the "chunk cache" because for inline revlogs
this cache contains the raw revision chunks and allows the original
changelog/revlog instance to access revision data for these pending
revisions. This whole behavior of course relies on the revlog
constructor reading the entirety of an inline revlog into memory
and caching it. That's why it is brittle. (I discovered all this
as part of modifying behavior of the chunk cache.)

This patch streamlines the loading of a pending 00changelog.i.a
revlog by doing it directly in the changelog constructor if told
to do so. When this code path is active, we no longer load the
00changelog.i file at all.

The only negative outcome I see from this change is if loading
00changelog.i was somehow serving a purpose. But I can't imagine
what that would be because we throw away its data (the index data
structures are replaced and inline revision data is replaced via
the chunk cache) and since 00changelog.i.a is a copy of
00changelog.i, file content should be identical, so there should
be no meaningful file integrity checking at play. I think this was
all just sub-optimal code.
2017-05-13 16:26:43 -07:00
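A minimal sketch of the idea in ae8cb885e7: decide up front which index file to open instead of loading 00changelog.i and then overwriting its data. The function name and the `try_pending` parameter are hypothetical and are not Mercurial's actual API; only the two file names come from the commit message.

```python
import os


def changelog_index_name(store_dir, try_pending):
    """Return the name of the index file a changelog reader should open.

    Hypothetical helper: if a pending transaction is signalled and the
    pending copy exists, open it directly and never touch 00changelog.i.
    """
    pending = os.path.join(store_dir, "00changelog.i.a")
    if try_pending and os.path.exists(pending):
        return "00changelog.i.a"
    return "00changelog.i"
```

Falling back to 00changelog.i when the pending file is absent mirrors the case where a transaction contains no new changesets.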
Pierre-Yves David
a1a70e3fbc transaction: track newly introduced revisions
Tracking revisions is not the data that will unlock the most new capabilities.
However, they are the simplest thing to track and still unlock some nice
improvements with regard to caching.

We plug ourselves in at the changelog level to make sure we do not miss any
revision additions.

The 'revs' set is configured at the repository level because the transaction
itself does not need to know that much about the business logic.
2017-05-02 18:45:51 +02:00
Pulkit Goyal
7b5053db1f py3: slice over bytes to prevent getting ascii values 2017-05-05 01:26:13 +05:30
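For context on 7b5053db1f, this is the Python 3 behaviour the commit title refers to: indexing a bytes object yields an integer, while slicing yields a bytes object of length one. The variable names are illustrative only.

```python
data = b"abc"

# Indexing bytes on Python 3 returns the byte's integer value.
first_as_int = data[0]

# Slicing keeps the result as bytes, which is what code written for
# Python 2 byte strings usually expects.
first_as_bytes = data[0:1]
```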
Yuya Nishihara
cb9a6405bb py3: use bytes() to cast to immutable bytes in changelog.appender.write() 2017-03-26 16:31:01 +09:00
Yuya Nishihara
dc88179a4e util: wrap s.decode('string_escape') calls for future py3 compatibility 2017-03-17 23:42:46 +09:00
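A sketch of what a wrapper like the one in dc88179a4e might look like on Python 3, where bytes has no 'string_escape' codec. The function name is hypothetical. `codecs.escape_decode` exists in CPython but is not part of the documented API, so treat this as an assumption about one possible approach rather than the code the commit added.

```python
import codecs


def unescape_bytes(s):
    """Interpret backslash escapes in a bytes string (Python 3 sketch)."""
    # escape_decode returns a (decoded_bytes, consumed_length) pair.
    return codecs.escape_decode(s)[0]
```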
Durham Goode
323f27948d changelog: keep track of file end in appender (issue5444)
Previously, changelog.appender.end() would compute the end of the file by
joining all the current appended data and checking the length. This is an O(n)
operation.  449b4adb7d39 introduced a seek call before every revlog write, which
means we are hitting this O(n) behavior n times, which causes changelog writes
during a pull to be n^2.

In our large repo, this caused pulling 100k commits to go from 17s to 130s. With
this fix, it's back to 17s.
2016-12-15 11:00:18 -08:00
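The fix in 323f27948d can be illustrated with two toy classes: one recomputes the end offset by joining all buffered data, the other keeps a running total. These classes are a simplified sketch with hypothetical names and do not reproduce changelog.appender.

```python
class NaiveAppender:
    """Recomputes the logical end of file on every call: O(n) per end()."""

    def __init__(self, disk_size):
        self.disk_size = disk_size
        self.data = []

    def write(self, chunk):
        self.data.append(bytes(chunk))

    def end(self):
        return self.disk_size + len(b"".join(self.data))


class TrackingAppender(NaiveAppender):
    """Keeps a running end offset so end() is O(1)."""

    def __init__(self, disk_size):
        super().__init__(disk_size)
        self._end = disk_size

    def write(self, chunk):
        super().write(chunk)
        self._end += len(chunk)

    def end(self):
        return self._end
```

Calling end() before each of n writes costs O(n^2) in total with the first class and O(n) with the second, which is the difference the commit message describes.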
Pierre-Yves David
b03bd97b6a revlog: make 'storedeltachains' a "public" attribute
The next changeset will make that attribute read by the changegroup packer. We
make it "public" beforehand.
2016-10-14 02:25:08 +02:00
Gregory Szorc
0ee2ea3be0 changelog: disable delta chains
This patch disables delta chains on changelogs. After this patch, new
entries on changelogs - including existing changelogs - will be stored
as the fulltext of that data (likely compressed). No delta computation
will be performed.

An overview of delta chains and data justifying this change follows.

Revlogs try to store entries as a delta against a previous entry (either
a parent revision in the case of generaldelta or the previous physical
revision when not using generaldelta). Most of the time this is the
correct thing to do: it frequently results in less CPU usage and smaller
storage.

Delta chains are most effective when the base revision being deltad
against is similar to the current data. This tends to occur naturally
for manifests and file data, since only small parts of each tend to
change with each revision. Changelogs, however, are a different story.

Changelog entries represent changesets/commits. And unless commits in a
repository are homogeneous (same author, changing same files, similar
commit messages, etc), a delta from one entry to the next tends to be
relatively large compared to the size of the entry. This means that
delta chains tend to be short. How short? Here is the full vs delta
revision breakdown on some real world repos:

Repo             % Full    % Delta   Max Length
hg                45.8       54.2        6
mozilla-central   42.4       57.6        8
mozilla-unified   42.5       57.5       17
pypy              46.1       53.9        6
python-zstandard  46.1       53.9        3

(I threw in python-zstandard as an example of a repo that is homogeneous.
It contains a small Python project with changes all from the same
author.)

Contrast this with the manifest revlog for these repos, where 99+% of
revisions are deltas and delta chains run into the thousands.

So delta chains aren't as useful on changelogs. But even a short delta
chain may provide benefits. Let's measure that.

Delta chains may require less CPU to read revisions if the CPU time
spent reading smaller deltas is less than the CPU time used to
decompress larger individual entries. We can measure this via
`hg perfrevlog -c -d 1` to iterate a revlog to resolve each revision's
fulltext. Here are the results of that command on a repo using delta
chains in its changelog and on a repo without delta chains:

hg (forward)
! wall 0.407008 comb 0.410000 user 0.410000 sys 0.000000 (best of 25)
! wall 0.390061 comb 0.390000 user 0.390000 sys 0.000000 (best of 26)

hg (reverse)
! wall 0.515221 comb 0.520000 user 0.520000 sys 0.000000 (best of 19)
! wall 0.400018 comb 0.400000 user 0.390000 sys 0.010000 (best of 25)

mozilla-central (forward)
! wall 4.508296 comb 4.490000 user 4.490000 sys 0.000000 (best of 3)
! wall 4.370222 comb 4.370000 user 4.350000 sys 0.020000 (best of 3)

mozilla-central (reverse)
! wall 5.758995 comb 5.760000 user 5.720000 sys 0.040000 (best of 3)
! wall 4.346503 comb 4.340000 user 4.320000 sys 0.020000 (best of 3)

mozilla-unified (forward)
! wall 4.957088 comb 4.950000 user 4.940000 sys 0.010000 (best of 3)
! wall 4.660528 comb 4.650000 user 4.630000 sys 0.020000 (best of 3)

mozilla-unified (reverse)
! wall 6.119827 comb 6.110000 user 6.090000 sys 0.020000 (best of 3)
! wall 4.675136 comb 4.670000 user 4.670000 sys 0.000000 (best of 3)

pypy (forward)
! wall 1.231122 comb 1.240000 user 1.230000 sys 0.010000 (best of 8)
! wall 1.164896 comb 1.160000 user 1.160000 sys 0.000000 (best of 9)

pypy (reverse)
! wall 1.467049 comb 1.460000 user 1.460000 sys 0.000000 (best of 7)
! wall 1.160200 comb 1.170000 user 1.160000 sys 0.010000 (best of 9)

The data clearly shows that it takes less wall and CPU time to resolve
revisions when there are no delta chains in the changelogs, regardless
of the direction of traversal. Furthermore, not using a delta chain
means that fulltext resolution in reverse is as fast as iterating
forward. So not using delta chains on the changelog is a clear CPU win
for reading operations.

An example of a user-visible operation showing this speed-up is revset
evaluation. Here are results for
`hg perfrevset 'author(gps) or author(mpm)'`:

hg
! wall 1.655506 comb 1.660000 user 1.650000 sys 0.010000 (best of 6)
! wall 1.612723 comb 1.610000 user 1.600000 sys 0.010000 (best of 7)

mozilla-central
! wall 17.629826 comb 17.640000 user 17.600000 sys 0.040000 (best of 3)
! wall 17.311033 comb 17.300000 user 17.260000 sys 0.040000 (best of 3)

What about 00changelog.i size?

Repo                Delta Chains     No Delta Chains
hg                    7,033,250         6,976,771
mozilla-central      82,978,748        81,574,623
mozilla-unified      88,112,349        86,702,162
pypy                 20,740,699        20,659,741

The data shows that removing delta chains from the changelog makes the
changelog smaller.

Delta chains are also used during changegroup generation. This
operation essentially converts a series of revisions to one large
delta chain. And changegroup generation is smart: if the delta in
the revlog matches what the changegroup is emitting, it will reuse
the delta instead of recalculating it. We can measure the impact
removing changelog delta chains has on changegroup generation via
`hg perfchangegroupchangelog`:

hg
! wall 1.589245 comb 1.590000 user 1.590000 sys 0.000000 (best of 7)
! wall 1.788060 comb 1.790000 user 1.790000 sys 0.000000 (best of 6)

mozilla-central
! wall 17.382585 comb 17.380000 user 17.340000 sys 0.040000 (best of 3)
! wall 20.161357 comb 20.160000 user 20.120000 sys 0.040000 (best of 3)

mozilla-unified
! wall 18.722839 comb 18.720000 user 18.680000 sys 0.040000 (best of 3)
! wall 21.168075 comb 21.170000 user 21.130000 sys 0.040000 (best of 3)

pypy
! wall 4.828317 comb 4.830000 user 4.820000 sys 0.010000 (best of 3)
! wall 5.415455 comb 5.420000 user 5.410000 sys 0.010000 (best of 3)

The data shows eliminating delta chains makes the changelog part of
changegroup generation slower. This is expected since we now have to
compute deltas for revisions where we could recycle the delta before.

It is worth putting this regression into context of overall changegroup
times. Here is the rough total CPU time spent in changegroup generation
for various repos while using delta chains on the changelog:

Repo              CPU Time (s)    CPU Time w/ compression
hg                  4.50              7.05
mozilla-central   111.1             222.0
pypy               28.68             75.5

Before compression, removing delta chains from the changelog adds
~4.4% overhead to hg changegroup generation, 1.3% to mozilla-central,
and 2.0% to pypy. When you factor in zlib compression, these percentages
are roughly divided by 2.

While the increased CPU usage for changegroup generation is unfortunate,
I think it is acceptable because the percentage is small, server
operators (those likely impacted most by this) have other mechanisms
to mitigate CPU consumption (namely reducing zlib compression level and
pre-generated clone bundles), and because there is room to optimize this
in the future. For example, we could use the nullid as the base revision,
effectively encoding the full revision for each entry in the changegroup.
When doing this, `hg perfchangegroupchangelog` nearly halves:

mozilla-unified
! wall 21.168075 comb 21.170000 user 21.130000 sys 0.040000 (best of 3)
! wall 11.196461 comb 11.200000 user 11.190000 sys 0.010000 (best of 3)

This looks very promising as a future optimization opportunity.

It's worth noting the changes in test-acl.t to the changegroup part size.
This is because revision 6 in the changegroup had a delta chain of
length 2 before and after this patch the base revision is nullrev.
When the base revision is nullrev, cg2packer.deltaparent() hardcodes
the *previous* revision from the changegroup as the delta parent.
This caused the delta in the changegroup to switch base revisions,
the delta to change, and the size to change accordingly. While the
size increased in this case, I think sizes will remain the same
on average, as the delta base for changelog revisions doesn't matter
too much (as this patch shows). So, I don't consider this a regression.
2016-10-13 12:50:27 +02:00
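To make the "% Full / % Delta / Max Length" breakdown in 0ee2ea3be0 concrete, here is a small sketch that derives such numbers from a list of delta bases. It assumes a simplified model in which `bases[rev] == rev` marks a full snapshot and any other value names an earlier revision used as the delta base. The chain length convention used here (number of deltas to apply) is an assumption and may not match the convention behind the table in the commit message.

```python
def chain_lengths(bases):
    """Return, for each revision, how many deltas must be applied to read it."""
    lengths = []
    for rev, base in enumerate(bases):
        if base == rev:
            lengths.append(0)  # full snapshot, nothing to apply
        else:
            lengths.append(lengths[base] + 1)  # base is always an earlier rev
    return lengths


def summarize(bases):
    """Summarize a non-empty list of delta bases."""
    lengths = chain_lengths(bases)
    total = len(lengths)
    full = sum(1 for n in lengths if n == 0)
    return {
        "pct_full": 100.0 * full / total,
        "pct_delta": 100.0 * (total - full) / total,
        "max_length": max(lengths),
    }
```

With delta chains disabled, every entry would be its own base, so `pct_full` would be 100 and `max_length` would be 0 under this model.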
FUJIWARA Katsunori
ec6b9160d3 changelog: specify checkambig=True to revlog.__init__, to avoid ambiguity
If the steps below occur at "the same time in sec", all of mtime, ctime
and size are the same between (1) and (3).

  1. append data to 00changelog.i (and close transaction)
  2. discard appended data by truncation (strip or rollback)
  3. append same size but different data to 00changelog.i again

Therefore, cache validation doesn't work after (3) as expected.

To avoid such file stat ambiguity around truncation, this patch
specifies checkambig=True to revlog.__init__(). This makes revlog
write changes out with checkambig=True.

Even though changes of 00changelog.i themselves are written out at
changelog._finalize(), this checkambig=True is needed, because
revlog.checkinlinesize(), which is invoked at the end of
changelog._finalize(), might replace already changed 00changelog.i by
converted one.

Even after this patch, avoiding file stat ambiguity of 00changelog.i
around truncation is not yet complete, because the truncation side isn't
aware of this issue.

This is a part of ExactCacheValidationPlan.

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
2016-09-22 21:51:59 +09:00
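The file stat ambiguity described in the two checkambig commits can be sketched without touching the filesystem. The `FileStat` tuple and both helpers are hypothetical; bumping mtime by one second when ctime is unchanged is one way to break the tie and is an assumption here, not a statement of what checkambig=True does internally.

```python
from collections import namedtuple

FileStat = namedtuple("FileStat", "size ctime mtime")


def looks_unchanged(cached, current):
    """A stat-based cache considers the file unchanged if all fields match."""
    return cached == current


def avoid_ambiguity(old, new):
    """Return a stat that cannot be confused with `old`.

    If the change landed in the same second as the previous one, advance
    mtime so that stat comparison can tell the two states apart.
    """
    if new.ctime == old.ctime:
        return new._replace(mtime=(old.mtime + 1) & 0x7FFFFFFF)
    return new
```

In the three-step scenario above, the stat after step (1) and the stat after step (3) are identical, so `looks_unchanged` wrongly returns True until the mtime is advanced.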
FUJIWARA Katsunori
09ab104d31 changelog: specify checkambig=True to avoid ambiguity around truncation
If the steps below occur at "the same time in sec", all of mtime, ctime
and size are the same between (1) and (3).

  1. append data to 00changelog.i (and close transaction)
  2. discard appended data by truncation (strip or rollback)
  3. append same size but different data to 00changelog.i again

Therefore, cache validation doesn't work after (3) as expected.

To avoid such file stat ambiguity around truncation, this patch
specifies checkambig=True for renaming or opening to write changes out
at finalization.

Even after this patch, avoiding file stat ambiguity of 00changelog.i
around truncation is not yet complete, because the truncation side isn't
aware of this issue.

This is a part of ExactCacheValidationPlan.

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
2016-09-22 21:51:59 +09:00
FUJIWARA Katsunori
2215e1134b revlog: specify checkambig at writing to avoid file stat ambiguity
This allows revlog-style files to be written out with checkambig=True
easily.

Because avoiding file stat ambiguity is needed only for filecache-ed
manifest and changelog, this patch does:

  - use False for default value of checkambig
  - focus only on writing changes of index file out

This patch also adds optional argument checkambig to _divert/_delay
for changelog, to safely accept checkambig specified in revlog
layer. But this argument can be fully ignored, because:

  - changes are written into other than index file, if name != target
  - changes are never written into index file, otherwise
    (into pending file by _divert, or into in-memory buffer by _delay)

This is a part of ExactCacheValidationPlan.

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
2016-09-22 21:51:58 +09:00
Pulkit Goyal
5ea942af2d py3: use unicode literals in changelog.py
collections.namedtuple type and field names must be str in Python 3.
Our custom module importer was rewriting them to bytes literals,
making this fail.

In addition, __slots__ values must also be unicode.
2016-08-04 00:15:39 +05:30
Gregory Szorc
d6a37b1c7d changelog: avoid slicing raw data until needed
Before, we were slicing the original raw text and storing individual
variables with values corresponding to each field. This is avoidable
overhead.

With this patch, we store the offsets of the fields at construction
time and perform the slice when a property is accessed.

This appears to show a very marginal performance win on its own and
the gains are so small as to not be worth reporting. However, this
patch marks the end of our parsing refactor, so it is worth reporting
the gains from the entire series:

author(mpm)
0.896565
0.795987  89%

desc(bug)
0.887169
0.803438  90%

date(2015)
0.878797
0.773961  88%

extra(rebase_source)
0.865446
0.761603  88%

author(mpm) or author(greg)
1.801832
1.576025  87%

author(mpm) or desc(bug)
1.812438
1.593335  88%

date(2015) or branch(default)
0.968276
0.875270  90%

author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.183104  87%

Pretty consistent speed-up across the board for any revset accessing
changelog revision data. Not bad!

It's also worth noting that PyPy appears to experience a similar to
marginally greater speed-up as well!

According to statprof, revsets accessing changelog revision data
are now clearly dominated by zlib decompression (16-17% of execution
time). Surprisingly, it appears the most expensive part of revision
parsing is the various text.index() calls to search for newlines!
These appear to cumulatively add up to 5+% of execution time. I
reckon implementing the parsing in C would make things marginally
faster.

If accessing larger strings (such as the commit message),
encoding.tolocal() is the most expensive procedure outside of
decompression.
2016-03-06 15:40:20 -08:00
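The offset-based approach described in d6a37b1c7d can be sketched as follows. The entry layout assumed here is manifest hex, user, a date line, zero or more file names, a blank line, then the description. The class and attribute names are hypothetical and the real changelogrevision class differs in detail (for example, extras and encoding handling are omitted).

```python
import binascii


class LazyEntry:
    """Stores newline offsets at construction and slices fields on access."""

    __slots__ = ("_text", "_offsets")

    def __init__(self, text):
        nl1 = text.index(b"\n")
        nl2 = text.index(b"\n", nl1 + 1)
        nl3 = text.index(b"\n", nl2 + 1)
        # With no files listed, the blank line follows the date line directly.
        if text[nl3 + 1:nl3 + 2] == b"\n":
            doublenl = nl3
        else:
            doublenl = text.index(b"\n\n", nl3 + 1)
        self._text = text
        self._offsets = (nl1, nl2, nl3, doublenl)

    @property
    def manifest(self):
        return binascii.unhexlify(self._text[:self._offsets[0]])

    @property
    def user(self):
        nl1, nl2 = self._offsets[0], self._offsets[1]
        return self._text[nl1 + 1:nl2]

    @property
    def date(self):
        nl2, nl3 = self._offsets[1], self._offsets[2]
        parts = self._text[nl2 + 1:nl3].split(b" ", 2)
        return float(parts[0].decode("ascii")), int(parts[1].decode("ascii"))

    @property
    def files(self):
        nl3, doublenl = self._offsets[2], self._offsets[3]
        if doublenl == nl3:
            return []
        return self._text[nl3 + 1:doublenl].split(b"\n")

    @property
    def description(self):
        return self._text[self._offsets[3] + 2:]
```

A revset such as author(mpm) would then only pay for the `user` slice, which is the effect the benchmark numbers above are measuring.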
Gregory Szorc
9f5cd6d5ba changelog: parse description last
Before, we first searched for the double newline as the first step in
the parse then moved to the front of the string and worked our way
to the back again. This made sense when we were splitting the raw
text on the double newline. But in our new newline scanning based
approach, this feels awkward.

This patch updates the parsing logic to parse the text linearly and
deal with the description field last.

Because we're avoiding an extra string scan, revsets appear to
demonstrate a very slight performance win. But the percentage
change is marginal, so the numbers aren't worth reporting.
2016-03-06 13:13:54 -08:00
Gregory Szorc
b586216842 changelog: lazily parse files
More of the same.

Again, modest revset performance wins:

author(mpm)
0.896565
0.822961
0.805156

desc(bug)
0.887169
0.847054
0.798101

date(2015)
0.878797
0.811613
0.786689

extra(rebase_source)
0.865446
0.797756
0.777408

author(mpm) or author(greg)
1.801832
1.668172
1.626547

author(mpm) or desc(bug)
1.812438
1.677608
1.613941

date(2015) or branch(default)
0.968276
0.896032
0.869017
2016-03-06 14:31:06 -08:00
Gregory Szorc
eee1336322 changelog: lazily parse date/extra field
This is probably the most complicated patch in the parsing
refactor.

Because the date and extras are encoded in the same field, we
stuff the entire field into a dedicated variable and add a
property for accessing the sub-components of each. There is
some duplicated code here. But the code is relatively simple,
so it shouldn't be a big deal.

We see revset performance wins across the board:

author(mpm)
0.896565
0.876713
0.822961

desc(bug)
0.887169
0.895514
0.847054

date(2015)
0.878797
0.820987
0.811613

extra(rebase_source)
0.865446
0.823811
0.797756

author(mpm) or author(greg)
1.801832
1.784160
1.668172

author(mpm) or desc(bug)
1.812438
1.822756
1.677608

date(2015) or branch(default)
0.968276
0.910981
0.896032

author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.516788
3.265024

We see a speed-up on revsets accessing date and extras because the new
parsing code only parses what you access. Even though they are stored
in the same text field, we avoid parsing dates when accessing extras and
vice-versa.

But strangely revsets accessing both date and extras appeared to speed
up as well! I'm not sure if this is due to refactoring the parsing
code or due to an optimization in revsets. You can't argue with the
results!
2016-03-06 14:30:25 -08:00
Gregory Szorc
e043adff86 changelog: lazily parse user
Same strategy as before.

Revsets not accessing the user demonstrate a slight performance win:

desc(bug)
0.887169
0.910400
0.895514

date(2015)
0.878797
0.870697
0.820987

extra(rebase_source)
0.865446
0.841644
0.823811

date(2015) or branch(default)
0.968276
0.945792
0.910981
2016-03-06 14:29:46 -08:00
Gregory Szorc
bb5baaea05 changelog: lazily parse manifest node
Like the description, we store the raw bytes and convert from
hex on access.

This patch also marks the beginning of our new parsing method,
which is based on newline offsets and doesn't rely on
str.split().

Many revsets showed a performance improvement:

author(mpm)
0.896565
0.869085
0.868598

desc(bug)
0.887169
0.928164
0.910400

extra(rebase_source)
0.865446
0.871500
0.841644

author(mpm) or author(greg)
1.801832
1.791589
1.731503

author(mpm) or desc(bug)
1.812438
1.851003
1.798764

date(2015) or branch(default)
0.968276
0.974027
0.945792
2016-03-06 14:29:13 -08:00
Gregory Szorc
59c7349712 changelog: lazily parse description
Before, the description field was converted to a localstr at parse
time. With this patch, we store the raw description and convert to
a localstr when it is first accessed.

We see a revset speedup for revsets that don't access the description:

author(mpm)
0.896565
0.914234
0.869085

date(2015)
0.878797
0.891980
0.862525

extra(rebase_source)
0.865446
0.912514
0.871500

author(mpm) or author(greg)
1.801832
1.860402
1.791589

date(2015) or branch(default)
0.968276
0.994673
0.974027

author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.721032
3.643593

As you can see, most of these revsets are already faster than from
before this refactoring: we have already offset the performance
loss from the introduction of the new class representing parsed
changelog entries!
2016-03-06 14:28:46 -08:00
Gregory Szorc
a18df3d4e5 changelog: add class to represent parsed changelog revisions
Currently, changelog entries are parsed into their respective
components at read time. Many operations are only interested
in a subset of fields of a changelog entry. The parsing and
storing of all the fields adds avoidable overhead.

This patch introduces the "changelogrevision" class. It takes
changelog raw text and exposes the parsed results as attributes.
The code for parsing changelog entries has been moved into its
construction function. changelog.read() has been modified to use
the new class internally while maintaining its existing API.
Future patches will make revision parsing lazy.

We implement the construction function of the new class with
__new__ instead of __init__ so we can use a named tuple to
represent the empty revision. This saves overhead and complexity
of coercing later versions of this class to represent an empty
instance.

While we are here, we add a method on changelog to obtain an
instance of the new type.

The overhead of constructing the new class regresses performance
of revsets accessing this data:

author(mpm)
0.896565
0.929984

desc(bug)
0.887169
0.935642 105%

date(2015)
0.878797
0.908094

extra(rebase_source)
0.865446
0.922624 106%

author(mpm) or author(greg)
1.801832
1.902112 105%

author(mpm) or desc(bug)
1.812438
1.860977

date(2015) or branch(default)
0.968276
1.005824

author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.743381

Once lazy parsing is implemented, these revsets will all be faster
than before. There is no performance change on revsets that do not
access this data. There /could/ be a performance regression on
operations that perform several changelog reads. However, I can't
think of anything outside of revsets and `hg log` (basically the
same as a revset) that would be impacted.
2016-03-06 14:28:02 -08:00
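The `__new__` technique mentioned in a18df3d4e5 can be shown with a toy example: the constructor returns a shared named tuple for empty input and a regular instance otherwise. The toy format (user, blank line, description) and all names below are hypothetical simplifications.

```python
import collections

_EmptyRevision = collections.namedtuple("_EmptyRevision", ("user", "description"))
EMPTY = _EmptyRevision(user=b"", description=b"")


class ParsedRevision:
    """Toy parsed revision whose constructor may return a named tuple."""

    def __new__(cls, text):
        if not text:
            # Not an instance of cls, so Python skips __init__ entirely.
            return EMPTY
        self = super().__new__(cls)
        user, _, description = text.partition(b"\n\n")
        self.user = user
        self.description = description
        return self
```

Callers can read `.user` and `.description` on either result without caring which type they received.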
Matt Mackall
75ceec976e changelog: backed out changeset 4ef1c9b76e22 2016-03-02 16:05:30 -06:00
Matt Mackall
b4737e5b27 changelog: backed out changeset 9f92d143bdd2
We want to avoid leaking UTF-8 to main body of code wherever possible.
2016-03-02 12:46:54 -06:00
Gregory Szorc
ab7f7f6a19 changelog: lazy decode user (API)
This appears to show a similar speedup as the previous patch.
2016-02-27 22:34:18 -08:00
Gregory Szorc
6f651cb96b changelog: lazy decode description (API)
Currently, changelog reading decodes read values. This is wasteful
because a lot of times consumers aren't interested in some of these
values.

This patch changes description decoding to occur in changectx as
needed.

revsets reading changelog entries appear to speed up slightly:

revset #7: author(lmoscovicz)
   plain
0) 0.906329
1) 0.872653

revset #8: author(mpm)
   plain
0) 0.903478
1) 0.878037

revset #9: author(lmoscovicz) or author(mpm)
   plain
0) 1.817855
1) 1.778680

revset #10: author(mpm) or author(lmoscovicz)
   plain
0) 1.837052
1) 1.764568
2016-02-27 22:25:14 -08:00
Gregory Szorc
c5114195d9 changelog: remove redundant parentheses
You don't need to surround returned tuples with parens.
2016-02-27 22:25:47 -08:00
Laurent Charignon
be5dd01ee1 changelog: add a new method to get files modified by a changeset
This patch adds a new method "readfiles" to get the files modified by a
changeset. It extracts some logic from "read" to only return the files modified
by a changeset as efficiently as possible. This is used in the next patch to
speed up hg log <file|folder>
2015-12-18 13:45:55 -08:00
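A sketch of what a files-only reader like the one added in be5dd01ee1 could look like, assuming the entry layout of three header lines (manifest, user, date) followed by file names and a blank line before the description. The function name and signature are hypothetical and operate on raw text rather than a node.

```python
def read_files(text):
    """Return the list of file names recorded in a raw changelog entry."""
    if not text:
        return []
    header_end = text.index(b"\n\n")
    lines = text[:header_end].split(b"\n")
    # Skip manifest, user and date lines; the rest are file names.
    return lines[3:]
```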
Yuya Nishihara
d0b6532f54 reachableroots: construct and sort baseset in revset module
This removes the dependency from changelog to revset, which seems a bit awkward
to me.
2015-08-28 11:14:24 +09:00
Pierre-Yves David
25876521c9 reachableroots: use baseset lazy sorting
smartset sorting is lazy (so faster in some cases) and better (it records that
the set is sorted, allowing some optimisations). So we rely on it directly.

Some test outputs are updated because we now have more information (ordering).
2015-08-20 17:23:21 -07:00
Yuya Nishihara
7f0aba37f0 reachableroots: use internal "revstates" array to test if rev is a root
The main goal of this patch series is to reduce the use of PyXxx() functions
that are likely to require ugly error handling and inc/decref. Plus, this is
faster than using PySet_Contains().

  revset #0: 0::tip
  0) 0.004168
  1) 0.003678  88%

This patch ignores out-of-range roots, as the pure implementation does.
Because reachable sets are calculated from heads, and out-of-range heads raise
IndexError, we can just treat out-of-range roots as unreachable. Otherwise,
the test of "hg log -Gr '. + wdir()'" would fail.

The "heads" argument is changed to a list. Should we rename the C function
since its signature has changed?
2015-08-14 15:43:29 +09:00
Augie Fackler
3abaf93572 changelog: trust C implementation of reachableroots more
There are no remaining codepaths in reachableroots where it will
return None, so just trust it completely and simplify this method.

Result by revset
================

Revision:
0) Revision 462169b7f799: style: adjust whitespaces in webutil.py
1) Revision d1d91b8090c6: changelog: trust C implementation of reachableroots more

revset #0: 0::tip
   plain
0) 0.067684
1) 0.006622   9%

revset #1: 0::@
   plain
0) 0.068249
1) 0.009394  13%

IOW this is a 10x speedup in my repo for hg itself for 0::tip and
similar revsets now that the C code is correctly wired up.
2015-08-11 15:06:02 -04:00
Laurent Charignon
1268f45eae changelog: add way to call the reachableroots C implementation
This patch is part of a series of patches to speed up the computation of
revset.reachableroots by introducing a C implementation. The main motivation is
to speed up smartlog on big repositories. At the end of the series, on our big
repositories the computation of reachableroots is 10-50x faster and smartlog
is 2x-5x faster.

This patch allows us to call the new C implementation of reachableroots from
python by creating an entry point in the changelog class.
2015-08-06 22:10:31 -07:00
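For readers unfamiliar with reachableroots, here is a pure-Python sketch of the computation that the C implementation in this series speeds up: walk from the heads towards the parents and collect the roots that are reached. The signature (a `parentrevs` callable, `minroot`, `roots`, `heads`, `includepath`) is an assumption modelled loosely on the commit messages, not a copy of Mercurial's code.

```python
def reachableroots(parentrevs, minroot, roots, heads, includepath=False):
    """Return the sorted roots that are ancestors of (or equal to) any head.

    With includepath=True, also return every revision on a path between
    a reached root and a head.
    """
    if not roots:
        return []
    roots = set(roots)
    visit = list(heads)
    reachable = set()
    seen = {}
    while visit:
        rev = visit.pop()
        if rev in roots:
            reachable.add(rev)
            if not includepath:
                continue
        parents = parentrevs(rev)
        seen[rev] = parents
        for parent in parents:
            # Revisions below the smallest root cannot lead to a root.
            if parent >= minroot and parent not in seen:
                visit.append(parent)
    if includepath:
        # Children have higher numbers than parents, so one ascending
        # pass is enough to propagate reachability upwards.
        for rev in sorted(seen):
            if any(p in reachable for p in seen[rev]):
                reachable.add(rev)
    return sorted(reachable)
```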
Gregory Szorc
68dd6a9cac changelog: use absolute_import 2015-08-08 00:26:49 -07:00
Pierre-Yves David
b06e4ae06e changelog: update read pending documentation
The pending index contains a full copy of the index + in-transaction data. We
replace "extend" with "overwrite" to make this clearer.
2015-07-17 15:53:56 +02:00
Pierre-Yves David
7538ebe084 changelog: document the 'readpending' method
I happen to have spent some time understanding this logic, so I'm leaving
documentation for the next poor fellow.
2014-09-28 20:18:43 -07:00
Yuya Nishihara
8776396de3 changelog: drop unnecessary override of "hasnode"
revlog.hasnode() calls self.rev(node) that takes filtering into account.
2015-05-10 11:39:01 -05:00
Pierre-Yves David
bf28a4a61f changelog: fix readpending if no pending data exist (issue4609)
Since transactions are used for more than just changesets, it is possible
to have a transaction without new changesets at all. In this case no
'00changelog.i.a' file is written. In all cases the 'changelog.readpending'
method is called if the repository has any pending data. The 'revlog' logic
provides empty content if the file is missing, so the whole operation
resulted in an empty changelog.

We now skip reading the pending file if it is missing.
2015-04-20 17:16:22 +02:00
Yuya Nishihara
7941359b9d changelog: inline revlog.__contains__ in case it is used in hot loop
Currently __contains__ is called only by "rev()" revset, but "x in cl" is a
function that is likely to be used in hot loop. revlog.__contains__ is simple
enough to duplicate to changelog, so just inline it.
2015-04-04 22:30:59 +09:00
Yuya Nishihara
237120f282 revlog: add __contains__ for fast membership test
Because revlog implements __iter__, "rev in revlog" works but does silly O(n)
lookup unexpectedly. So it seems good to add fast version of __contains__.

This allows "rev in repo.changelog" in the next patch.
2015-02-04 21:25:57 +09:00
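The problem described in 237120f282 is generic Python behaviour: a class that defines `__iter__` but not `__contains__` answers `x in obj` by iterating. The toy classes below are hypothetical stand-ins for revlog and count iterations to make the difference visible.

```python
class TinyRevlog:
    """Only defines __iter__, so membership tests scan every revision."""

    def __init__(self, nrevs):
        self._nrevs = nrevs
        self.iterations = 0

    def __len__(self):
        return self._nrevs

    def __iter__(self):
        for rev in range(self._nrevs):
            self.iterations += 1
            yield rev


class FastRevlog(TinyRevlog):
    """Adds a constant-time membership test based on the revision range."""

    def __contains__(self, rev):
        return 0 <= rev < len(self)
```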
Pierre-Yves David
500808d844 changelog: register changelog.i.a as a temporary file
The file is registered to make sure the transaction is cleaned up in all cases.
2014-11-08 17:08:09 +00:00
Pierre-Yves David
29f854f61a transaction: pass the transaction to 'finalize' callback
The callback will likely need to perform some operation related to the
transaction (e.g. registering a file update), so we pass the current
transaction as the callback argument. Otherwise, a callback that needs it has to
rely on a horrible weak reference trick.

This already allows us to slay a wild weak reference usage.
2014-11-08 16:31:38 +00:00
Pierre-Yves David
92bf4dcbdc transaction: pass the transaction to 'pending' callback
The callback will likely need to perform some operation related to the
transaction (e.g. backing files up), so we pass the current transaction as
the callback argument. Otherwise, a callback that needs it has to rely on a
horrible weak reference trick.

The first foreseen user of this is changelog._writepending. We would like it to
register the temporary file it creates for cleanup purposes.
2014-11-08 16:27:50 +00:00
Pierre-Yves David
8803fc197d changelog: rely on transaction for finalization
Instead of calling 'cl.finalize()' by hand (possibly at a bogus time) we
register it in the transaction during 'delayupdate' and rely on 'tr.close()' to
call it at the right time.
2014-10-18 01:09:41 -07:00
Pierre-Yves David
d6b8860637 changelog: handle writepending in the transaction
The 'delayupdate' method now takes a transaction object and registers its
'_writepending' method for execution in 'transaction.writepending()'. The hook can then
use 'transaction.writepending()' directly.

At some point this will allow the addition of other file creation
during writepending.
2014-10-17 21:55:31 -07:00
Pierre-Yves David
71f171494e changelog: rework the delayupdate mechanism
The current way we use the 'delayupdate' mechanism is wrong. We call
'delayupdate' right after the transaction retrieval, then we call 'finalize'
right before calling 'tr.close()'. The 'finalize' call will -always- result in a
flush to disk, making the data available to all readers. But the 'tr.close()' may
be a no-op if the transaction is nested. This would result in data being:

1) exposed to readers too early,
2) rolled back by another part of the transaction after such exposure.

So we need to end up in a situation where we call 'finalize' a single time when
the transaction actually closes. For this purpose we need to be able to call
'delayupdate' and '_writepending' multiple times and 'finalize' once. This was
not possible with the previous state of the code.

This changeset refactors the code to make this possible. We buffer data in memory
as much as possible and fall back to writing to a ".a" file after the first call
to '_writepending'.
2014-10-18 01:12:18 -07:00
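The goal stated in 71f171494e, calling 'delayupdate' many times but 'finalize' exactly once when the outermost transaction closes, can be sketched with two toy classes. Everything here is hypothetical: the nesting counter, the `addfinalize` registry and the in-memory "pending" list only illustrate the control flow and ignore the ".a" file entirely.

```python
class Transaction:
    """Toy nested transaction: only the outermost close() runs finalizers."""

    def __init__(self):
        self._depth = 1
        self._finalizers = {}

    def nest(self):
        self._depth += 1
        return self

    def addfinalize(self, name, callback):
        # Registering under the same name twice keeps a single callback.
        self._finalizers[name] = callback

    def close(self):
        self._depth -= 1
        if self._depth:
            return  # nested close is a no-op
        for name in sorted(self._finalizers):
            self._finalizers[name](self)


class DelayedLog:
    """Buffers entries until the transaction finalizes them."""

    def __init__(self):
        self.visible = []
        self._pending = []

    def delayupdate(self, tr):
        tr.addfinalize("changelog", self._finalize)

    def add(self, entry):
        self._pending.append(entry)

    def _finalize(self, tr):
        self.visible.extend(self._pending)
        self._pending = []
```

Because finalization is driven by the transaction, closing a nested scope no longer exposes data that an outer scope might still roll back.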
Mads Kiilerich
9a3561b211 changelog: use headrevsfiltered
5d1adb6683fa introduced use of the new filtering headrevs C implementation. It
caught TypeError to detect when to fall back to the implementation that was
compatible with old extensions. That method was however not reliable.

Instead, use the new headrevsfiltered function when passing a filter. It will
reliably fail with AttributeError when an old extension that predates
headrevsfiltered is used.
2014-10-26 12:14:12 +01:00
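The detection strategy in 9a3561b211 relies on a missing method raising AttributeError, which is unambiguous, rather than on catching TypeError from a call with an unexpected argument. The index classes and the `slow_path` callable below are hypothetical stubs that show the pattern only.

```python
class NewIndex:
    """Stub for a C index that supports filtering."""

    def headrevs(self):
        return [2, 3]

    def headrevsfiltered(self, filteredrevs):
        return [r for r in (2, 3) if r not in filteredrevs]


class OldIndex:
    """Stub for an older extension without headrevsfiltered."""

    def headrevs(self):
        return [2, 3]


def headrevs(index, filteredrevs, slow_path):
    """Use the filtered fast path when available, else fall back."""
    if filteredrevs:
        try:
            return index.headrevsfiltered(filteredrevs)
        except AttributeError:
            # Old extension: the method does not exist at all.
            return slow_path(filteredrevs)
    return index.headrevs()
```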
Pierre-Yves David
37d7d2958f repoview: add a FilteredLookupError class
This exception is a more precise LookupError that will allow us to
issue a special message when we end up accessing a filtered revision.
2014-10-16 02:05:06 -07:00
Pierre-Yves David
ea3e835124 repoview: add a FilteredIndexError class
This exception is a more precise IndexError that will allow us to
issue a special message when we end up accessing a filtered revision.
2014-10-15 17:02:44 -07:00
Durham Goode
1b59b1e4c0 obsolete: use C code for headrevs calculation
Previously, if there were filtered revs the repository could not use the C fast
path for computing the head revs in the changelog. This slowed down many
operations in large repositories.

This adds the ability to filter revs to the C fast path. This speeds up histedit
on repositories with filtered revs by 30% (13s to 9s). This could be improved
further by sorting the filtered revs and walking the sorted list while we walk
the changelog, but even this initial version that just calls __contains__ is
still massively faster.

The new C api is compatible for both new and old python clients, and the new
python client can call both new and old C apis.
2014-09-16 16:03:21 -07:00
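A pure-Python sketch of head computation with filtering, in the spirit of 1b59b1e4c0: a revision is a head unless some unfiltered revision names it as a parent, and the filter is consulted through a plain membership test. The function signature is hypothetical, and how filtered revisions affect their parents is an assumption of this sketch, not a description of the C code.

```python
def headrevs(parentrevs, nrevs, filteredrevs=frozenset()):
    """Return the head revisions among 0..nrevs-1, skipping filtered ones.

    parentrevs(rev) returns the parents of rev, using -1 for "no parent".
    """
    # One extra slot at the end absorbs writes for the null parent (-1).
    ishead = [False] * (nrevs + 1)
    for rev in range(nrevs):
        if rev in filteredrevs:  # plain __contains__ check
            continue
        ishead[rev] = True
        for parent in parentrevs(rev):
            ishead[parent] = False
    return [rev for rev in range(nrevs) if ishead[rev]]
```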