Commit Graph

95 Commits

Author SHA1 Message Date
Durham Goode
2e0dd44c19 branchmap: handle nullrev in setcachedata
906be86990 recently switched from:

  self._rbcrevs[rbcrevidx:rbcrevidx + _rbcrecsize] = rec

to

  pack_into(_rbcrecfmt, self._rbcrevs, rbcrevidx, node, branchidx)

This causes an exception if rbcrevidx is -1 (i.e. the nullrev). The old code
handled this because Python handles out-of-bounds sets to arrays gracefully. The
new code throws because the self._rbcrevs buffer isn't long enough to write 8
bytes to. Normally it would've been resized by the immediately preceding line,
but because the buffer length (0) is greater than the idx (-1) times the record
size, no resize happens.

Setting the branch for the nullrev doesn't make sense anyway, so let's skip it.
This was caught by external tests in the Facebook extensions repo, but I've
added a test here that catches the issue.
2017-03-15 15:48:57 -07:00
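A minimal Python 3 sketch of the failure mode described in the commit above, using a plain bytearray in place of self._rbcrevs. The '>4sI' record format is the one quoted in the d6292de3bd message; setcachedata here is a hypothetical stand-in written for illustration, not the Mercurial function itself.

```python
import struct

RBCRECFMT = '>4sI'                       # 4-byte node prefix + 32-bit branch index
RBCRECSIZE = struct.calcsize(RBCRECFMT)  # 8 bytes
NULLREV = -1

def setcachedata(rbcrevs, rev, nodeprefix, branchidx):
    """Hypothetical stand-in: write one record, skipping the null revision."""
    if rev == NULLREV:
        return False                     # nothing sensible to cache for nullrev
    offset = rev * RBCRECSIZE
    needed = offset + RBCRECSIZE
    if len(rbcrevs) < needed:
        # Grow the buffer so that pack_into has room to write.
        rbcrevs.extend(b'\0' * (needed - len(rbcrevs)))
    struct.pack_into(RBCRECFMT, rbcrevs, offset, nodeprefix, branchidx)
    return True

# Without the guard, a negative offset into an empty buffer raises struct.error.
try:
    struct.pack_into(RBCRECFMT, bytearray(), NULLREV * RBCRECSIZE, b'abcd', 0)
    raised = False
except struct.error:
    raised = True
```

The guard returns early instead of trying to make the write succeed, which matches the reasoning in the message that a branch for the nullrev is meaningless.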
Augie Fackler
38e6574e36 branchmap: fix python 2.6 by using util.buffer() instead of passing bytearray 2017-03-12 19:47:51 -04:00
Mads Kiilerich
d97e14e32b rbc: empty (and invalid) rbc-names file should give an empty name list
An empty file (if it somehow should exist) used to give a list with an empty
name. That didn't do any harm, but it was "wrong". Fix that.
2017-03-12 12:17:30 -07:00
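The behaviour fixed in the commit above follows from how splitting works in Python: splitting an empty string on a separator yields one empty element rather than an empty list. A small sketch, assuming NUL-separated names and a hypothetical parsenames helper (the real parsing code is not shown in this log):

```python
def parsenames(data):
    """Hypothetical helper: split NUL-separated branch names."""
    if not data:
        return []            # empty file: no names at all
    return data.split(b'\0')

# The pitfall: an empty input still produces a one-element list.
naive = b''.split(b'\0')
fixed = parsenames(b'')
several = parsenames(b'default\0stable')
```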
Mads Kiilerich
d6292de3bd rbc: use struct unpack_from and pack_into instead of unpack and pack
These functions were introduced in Python 2.5 and are faster and simpler than
the old ones ...  mainly because we can avoid intermediate buffers:

  $ python -m timeit -s "_rbcrecfmt='>4sI'" -s 's = "x"*10000' -s 'from struct import unpack' 'unpack(_rbcrecfmt, buffer(s, 16, 8))'
  1000000 loops, best of 3: 0.543 usec per loop
  $ python -m timeit -s "_rbcrecfmt='>4sI'" -s 's = "x"*10000' -s 'from struct import unpack_from' 'unpack_from(_rbcrecfmt, s, 16)'
  1000000 loops, best of 3: 0.323 usec per loop

  $ python -m timeit -s "from array import array" -s "_rbcrecfmt='>4sI'" -s "s = array('c')" -s 's.fromstring("x"*10000)' -s 'from struct import pack' -s "rec = array('c')" 'rec.fromstring(pack(_rbcrecfmt, "asdf", 7))'
  1000000 loops, best of 3: 0.364 usec per loop
  $ python -m timeit -s "from array import array" -s "_rbcrecfmt='>4sI'" -s "s = array('c')" -s 's.fromstring("x"*10000)' -s 'from struct import pack_into' -s "rec = array('c')" -s 'rec.fromstring("x"*100)' 'pack_into(_rbcrecfmt, rec, 0, "asdf", 7)'
  1000000 loops, best of 3: 0.229 usec per loop
2016-10-19 02:46:35 +02:00
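For readers on Python 3, where buffer() and array('c') no longer exist, the same contrast can be sketched with a bytearray. This is an illustration of the two styles only, not a benchmark and not the Mercurial code:

```python
import struct

RBCRECFMT = '>4sI'
data = bytearray(b'x' * 32)

# Old style: copy the 8-byte slice out, then unpack the copy.
old = struct.unpack(RBCRECFMT, bytes(data[16:24]))
# New style: read directly at an offset, with no intermediate object.
new = struct.unpack_from(RBCRECFMT, data, 16)

# Old style: build a packed bytes object, then splice it into the buffer.
data[0:8] = struct.pack(RBCRECFMT, b'asdf', 7)
# New style: write in place at an offset.
struct.pack_into(RBCRECFMT, data, 8, b'asdf', 7)
```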
Yuya Nishihara
bec7ade60c py3: drop unused aliases to array.array which are replaced with bytearray 2017-03-12 11:47:02 -07:00
Augie Fackler
9c70a09b17 branchmap: stringify int in a portable way
We actually need a bytes in Python 3, and thanks to our nasty source
loader this will portably do the right thing.
2017-03-12 00:42:46 -05:00
Augie Fackler
0c31289213 branchmap: don't use buffer() on Python 3
This is certainly slower than the Python 2 code, but it works, and we
can revisit it later if it's a problem.
2017-03-12 00:49:19 -05:00
Augie Fackler
9a15a28705 py3: use bytearray() instead of array('c', ...) constructions
Portable from 2.6-3.6.
2017-03-12 03:32:21 -04:00
Simon Farnsworth
e0b70e4f7f mercurial: switch to util.timer for all interval timings
util.timer is now the best available interval timer, at the expense of not
having a known epoch. Let's use it whenever the epoch is irrelevant.
2017-02-15 13:17:39 -08:00
Pierre-Yves David
d7ae979f6e branchmap: remove extra indent
This cleans up the rest of the previous changeset.
2016-08-05 15:01:16 +02:00
Pierre-Yves David
35ede6eab9 branchmap: simplify error handling when writing rev branch cache
Now that we have a general try except, we can move the error handling from the
individual writes into it.
The code will be reindented in the next changeset to keep this one readable.
2016-08-05 15:00:53 +02:00
Pierre-Yves David
bcc53027a6 branchmap: acquire lock before writing the rev branch cache
We now attempt to acquire a lock and write the branch cache within that lock.
This would prevent cache corruption when multiple processes try to write the cache
at the same time.
2016-08-05 14:57:16 +02:00
Pierre-Yves David
b236d4e99c branchmap: preparatory indent of the branch rev writing code
The rev branch cache is written without a lock. We are going to fix this, but we
indent the code beforehand to make the next changeset clearer.
2016-08-05 14:54:46 +02:00
Mads Kiilerich
ffd590bdea rbc: fix superfluous rebuilding from scratch - don't abuse self._rbcnamescount
The code used self._rbcnamescount as if it was the length of self._names ...
but actually it is just the number of good entries on disk. This caused the
cache to be populated inefficiently. In some cases very inefficiently.

Instead of checking the length before lookup, just try a lookup in self._names
- that is also in most cases faster.

Comments and debug messages are tweaked to help understanding the issue
and the fix.
2016-07-18 22:25:09 +02:00
Mads Kiilerich
9d89999d53 rbc: fix invalid rbc-revs entries caused by missing cache growth
It was in some cases possible to end up writing to the cache file without
growing it first. The range assignment in _setcachedata would append instead of
writing at the requested position and thus write the new record in the wrong
place.

To fix this, we avoid looking up in too small caches, and when growing the
cache, do it right before writing the new record to it so we know it has been
done correctly.
2016-07-18 22:22:38 +02:00
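The "append instead of writing at the requested position" behaviour in the commit above is how Python slice assignment treats a range that starts past the end of a sequence. A sketch using a bytearray (the code at the time used array('c'), which behaves the same way here); writerecord is a hypothetical helper, not the actual _setcachedata:

```python
import struct

RBCRECFMT = '>4sI'
RBCRECSIZE = struct.calcsize(RBCRECFMT)  # 8 bytes

def writerecord(cache, rev, nodeprefix, branchidx):
    """Hypothetical helper: grow right before writing, then write in place."""
    offset = rev * RBCRECSIZE
    end = offset + RBCRECSIZE
    if len(cache) < end:
        cache.extend(b'\0' * (end - len(cache)))
    cache[offset:end] = struct.pack(RBCRECFMT, nodeprefix, branchidx)

rec = struct.pack(RBCRECFMT, b'abcd', 1)

# The pitfall: the cache only has room for rev 0, and the record is meant for rev 2.
short = bytearray(RBCRECSIZE)
short[2 * RBCRECSIZE:3 * RBCRECSIZE] = rec
# The slice is clamped to the end, so the record lands in rev 1's slot.

# With growth done first, the record ends up where it belongs.
grown = bytearray(RBCRECSIZE)
writerecord(grown, 2, b'abcd', 1)
```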
Gregory Szorc
c98d59c36a branchmap: remove unused exception variable 2016-03-12 16:08:19 -08:00
Mads Kiilerich
a4030fe6b2 cache: rebuild branch cache from scratch when inconsistencies are detected
This should recover automatically from some corruptions that for unknown
reasons are seen in the wild.
2016-03-13 02:06:23 +01:00
Mads Kiilerich
3f4cf91261 cache: safer handling of failing seek when writing revision branch cache
If the seek for some reason fails (perhaps because the file is too short to
search to the requested position), make sure we seek to the start and rewrite
everything.

It is unknown if this fixes a real problem that ever happened.
2016-03-13 02:06:22 +01:00
Mads Kiilerich
de2745150d cache: remove branch revision file before rewriting the branch name file
New branch names are usually appended to the branch name file. If that fails or
the file has been modified by another process, it is rewritten. That left a
small opportunity that there could be references to non-existent entries in the
file while it was rewritten.

To avoid that, remove the revision branch cache file with the references to the
branch name file before rewriting the branch name file. Worst case, when
interrupted at the wrong time, the cache will be lost and rebuilt next time.

It is unknown if this fixes a real problem that ever happened.
2016-03-13 02:06:21 +01:00
Durham Goode
ccda03e5b2 branchmap: check node against changelog instead of repo
Testing 'node in repo' requires constructing a changectx, which is a little
expensive.  Testing 'repo.changelog.hasnode(node)' is notably faster. This
saves 10-20ms off of every command, when testing a few thousand nodes from the
branch cache.

I considered changing the implementation of localrepository.__contains__ so
every place would benefit from the change, but since
localrepository.__contains__ uses changectx to check if the commit exists, it
means it supports a wider range of possible inputs (like revs, hashes, '.',
etc), so it seemed unnecessarily risky.
2016-03-07 17:26:47 -08:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error', not 'util'. However, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Gregory Szorc
eeba469be5 branchmap: move branch cache code out of streamclone.py
This is low-level branch map and cache manipulation code. It deserves to
live next to similar code in branchmap.py. Moving it also paves the road
for multiple consumers, such as a bundle2 part handler that receives
branch mappings from a remote.

This is largely a mechanical move, with only variable names and
indentation being changed.
2015-10-03 09:53:56 -07:00
Gregory Szorc
093007295f branchmap: use absolute_import 2015-08-07 19:51:55 -07:00
Gregory Szorc
5380dea2a7 global: mass rewrite to use modern exception syntax
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".

This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.

This patch was produced by running `2to3 -f except -w -n .`.
2015-06-23 22:20:08 -07:00
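For reference, the modern form that the rewrite above produces looks like the following. The parseport function is a made-up example, not code from the repository; the older "except ValueError, exc" spelling is a syntax error on Python 3, so it appears only in the comment.

```python
def parseport(text):
    """Made-up example: report a parse failure instead of raising."""
    try:
        return int(text), None
    except ValueError as exc:  # was: "except ValueError, exc" before Python 2.6
        return None, str(exc)
```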
Yuya Nishihara
3ca8d79bef revbranchcache: return uncached branchinfo for nullrev (issue4683)
This fixes the crash caused by "branch(null)" revset. No cache should be
necessary for nullrev because changelog.branchinfo(nullrev) does not involve
IO operation.

Note that the problem of "branch(wdir())" isn't addressed by this patch.
"wdir()" will raise TypeError in many places because of None. This is the
reason why "wdir()" is still experimental.
2015-05-23 11:14:00 +09:00
Mads Kiilerich
f11e1d55b9 branchcache: stay silent if failing to read cache files
The warning has in some cases incorrectly attributed unrelated problems to rbc.

Instead, just do like the branch head cache does and stay quiet when reading
fails. The cache will be missing the first time a repo is used. It is a normal
situation and there is no reason to make a note of that.
2015-01-14 01:15:26 +01:00
Gregory Szorc
5b06fcaa29 repoview: move function for computing filtered hash
An upcoming patch will establish per-filter tags caches. We'll want
to use the same cache validation logic as the branch cache. Prepare
for that by moving the logic for computing a filtered view hash
to somewhere central.
2015-04-01 18:43:29 -07:00
Durham Goode
e0b3f09a3b revbranchcache: write cache even during read operations
Previously we would only actually write the revbranchcache to disk if we were in
the middle of a write operation (like commit). Now we will also write it during
any read operation. The cache knows how to invalidate itself, so it shouldn't
become corrupt if multiple writers try at once (and the write-on-read
behavior/risk is the same as all our other caches).
2015-02-24 18:43:31 -08:00
Durham Goode
9d9fb7f2ee revbranchcache: move cache writing to the transaction finalizer
Instead of writing the revbranchcache during updatecache (which often happens
too early, before the cache is even populated), let's run it as part of the
transaction finalizer. It still won't be written for read-only operations, but
that's no worse than it is today.

A future commit will remove the actual write that happens in updatecache().

This is also good prep for when all caches get moved into the transaction.
2015-02-10 20:06:12 -08:00
Durham Goode
f333b64f2f revbranchcache: populate cache incrementally
Previously the cache would populate completely the first time it was accessed.
This could take over a minute on larger repos. This patch changes it to update
incrementally.  Only values that are read will be written, and it will only
rewrite as much of the file as strictly necessary.

This adds a magic value of '\0\0\0\0' to represent an empty cache entry. The
probability of this matching an actual commit hash prefix is tiny, so it's ok if
that's always considered a cache miss. This is also BC safe since any existing
entries with '\0\0\0\0' will just be considered misses.

Perf numbers:

Mozilla-central: hg --time log -r 'branch(mobile)' -T.
Cold Cache: 14.7s -> 15.1s (3% worse)
Warm Cache: 1.6s -> 2.1s (30% worse)

Mozilla-central: hg perfbranchmap
2s -> 2.4s (20% worse)

hg: hg log -r 'branch(stable) & branch(default)'
Cold Cache: 3.1s -> 1.9s (40% better - because the old code missed the cache on
both branch() revset iterations, so it did twice the work)
Warm Cache: 0.2s -> 0.26s (30% worse)

internal huge repo: hg --time log -r 'tip & branch(default)'
Cold Cache: 65.4s -> 0.2s (327x better)

While this change introduces minor regressions when iterating over every commit
in a branch, it massively improves the cold cache time for operations which
touch a single commit. I feel the better O() is worth it in this case.
2015-02-10 20:04:47 -08:00
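The incremental scheme in the commit above can be sketched roughly as follows. This is an illustration under assumptions: the record layout is the '>4sI' format quoted elsewhere in this log, and lookup and its compute callback are hypothetical names standing in for the cache read path and the slow changelog lookup.

```python
import struct

RBCRECFMT = '>4sI'
RBCRECSIZE = struct.calcsize(RBCRECFMT)  # 8 bytes
EMPTY = b'\0\0\0\0'                      # magic value marking an unfilled entry

def lookup(cache, rev, node, compute):
    """Return (branchidx, hit). Only entries that are read get filled in."""
    offset = rev * RBCRECSIZE
    end = offset + RBCRECSIZE
    if len(cache) < end:
        # Pad with empty entries rather than computing them eagerly.
        cache.extend(b'\0' * (end - len(cache)))
    prefix, branchidx = struct.unpack_from(RBCRECFMT, cache, offset)
    if prefix != EMPTY and prefix == node[:4]:
        return branchidx, True
    # Miss (or a node whose prefix happens to be all zeros): take the slow path.
    branchidx = compute(rev)
    struct.pack_into(RBCRECFMT, cache, offset, node[:4], branchidx)
    return branchidx, False
```

A node whose first four bytes are all zero is never reported as a hit in this sketch, which is the trade-off the message accepts.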
Durham Goode
81ca4a2723 revbranchcache: move entry writing to a separate function
This moves the actual writing of entries to the cache to a separate function.
This will allow us to use it in multiple places. Ex: in one place we will write
dummy entries, and in another place we will write real data.
2015-02-10 20:01:08 -08:00
Durham Goode
23a18a419d revbranchcache: store repo on the object
Previously we would instantiate the revbranchcache with a repo object, use it
briefly, then require it be passed in every time we wanted to fetch any
information. This seems unnecessary since it's obviously specific to that repo
(since it was constructed with it).

This patch stores the repo on the revbranchcache object, and removes the repo
parameter from the various functions on that class. This has the other nice
benefit of removing the double-revbranchcache-read that existed before (it was
read once for the branch revset, and once for the repo.revbranchcache).
2015-02-10 19:57:51 -08:00
Durham Goode
9ac6d81ba3 revbranchcache: move out of branchmap onto localrepo
Previously the revbranchcache was a field inside the branchmap. This is bad for
a couple reasons:

1) There can be multiple branchmaps per repo (one for each filter level). There
can only be one revbranchcache per repo. In fact, a revbranchcache could only
exist on a branchmap that was for the unfiltered view, so you could have
branchmaps exist for which you couldn't have a revbranchcache. It was funky.
2) The write lifecycle for the revbranchcache is going to be different from
the branchmap (branchmap is greedily written early on, revbranchcache
should be lazily computed and written).

This patch moves the revbranchcache to live as a field on the localrepo
(alongside self._branchmap). This will allow us to handle its lifecycle
differently, which will let us move it to be lazily computed in future patches.
2015-02-10 19:53:48 -08:00
Matt Mackall
b907416f7b merge with stable 2015-03-02 01:20:14 -06:00
Mads Kiilerich
56207b4242 revisionbranchcache: fall back to slow path if starting readonly (issue4531)
Transitioning to Mercurial versions with revision branch cache could be slow as
long as all operations were readonly (revset queries) and the cache would be
populated but not written back.

Instead, fall back to using the consistently slow path when readonly and the
cache doesn't exist yet. That avoids the overhead of populating the cache
without writing it back.

If not readonly, it will still populate all missing entries initially. That
avoids repeated writing of the cache file with small updates, and it also makes
sure a fully populated cache is available for the readonly operations.
2015-02-06 02:52:10 +01:00
Angel Ezquerra
88cbab7845 localrepo: remove all external users of localrepo.opener
This change touches every module in which repository.opener was being used, and
changes it for the equivalent repository.vfs. This is meant to make it easier
to split the repository.vfs into several separate vfs.

It should now be possible to remove localrepo.opener.
2015-01-15 23:17:12 +01:00
Mads Kiilerich
bc7c34a53f branchcache: make _rbcrevslen handling more safe
self._rbcrevslen is used to keep track of the number of good records on disk.
It should thus not be updated before the records actually have been written to
disk.
2015-01-14 01:15:26 +01:00
Mads Kiilerich
49a09d3e6c branchcache: add debug output whenever cache files use truncate
The cache files are usually append only but will automatically be truncated and
recover in exceptional situations. Add a debug notice when such exceptional
situations are encountered.
2015-01-14 01:15:26 +01:00
Matt Harbison
9825da1159 branchmap: add seek() to end of file before calling tell() on append open()
This is similar to 5274228efcdc, which was subsequently modified in dd809b0d9714
for 2.4.  Unexpected test changes on Windows occurred without this.
2015-01-10 12:00:03 -05:00
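The defensive pattern named in the commit above can be sketched like this. The appendposition helper is hypothetical, and on current Python 3 tell() usually already reports the end of the file after opening in append mode, so the explicit seek is a portability safeguard rather than a requirement:

```python
import os
import tempfile

def appendposition(path):
    """Open for append and report the offset where new data will land."""
    with open(path, 'ab') as f:
        f.seek(0, os.SEEK_END)  # make the position explicit before tell()
        return f.tell()

# Usage: write ten bytes to a scratch file, then ask where an append would start.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b'0123456789')
    os.close(fd)
    position = appendposition(path)
finally:
    os.remove(path)
```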
Mads Kiilerich
835157e77d branchmap: use revbranchcache when updating branch map
The revbranchcache is read on demand before it will be used for updating the
branch map. It is written back when the branchmap is written and it will thus
use the same locking as branchmap. The revbranchcache instance is short-lived;
it is only stored in the branchmap from when .update() is invoked until .write()
is invoked. Branchmap already assumes that the repo is locked in that case.

The use of revbranchcache for branch map updates will make sure that the
revbranchcache "always" is kept up-to-date.

The perfbranchmap benchmark is somewhat bogus, especially when we can see that
the caching makes a significant difference between the realistic case of a
first run and the rare case of rerunning it with a full cache. Here are some
'base' numbers on mozilla-central:
Before:
! wall 6.912745 comb 6.910000 user 6.840000 sys 0.070000 (best of 3)
After - initial, cache is empty:
! wall 7.792569 comb 7.790000 user 7.720000 sys 0.070000 (best of 3)
After - cache is full:
! wall 0.879688 comb 0.880000 user 0.870000 sys 0.010000 (best of 4)

The overhead when running with empty cache comes from checking, missing and
updating it every time.

Most of the performance improvement comes from not having to extract the branch
info from the changelog. The last doubling of performance comes from no longer
having to convert all branch names to local encoding but reuse the few already
converted branch names.

On the hg repo:
Before:
! wall 0.715703 comb 0.710000 user 0.710000 sys 0.000000 (best of 14)
After:
! wall 0.105489 comb 0.110000 user 0.110000 sys 0.000000 (best of 87)
2015-01-08 00:01:03 +01:00
Mads Kiilerich
1b3892318f branchcache: introduce revbranchcache for caching of revision branch names
It is expensive to retrieve the branch name of a revision. Very expensive when
creating a changectx and calling .branch() every time - slightly less when
using changelog.branchinfo().

Now, to speed things up, provide a way to cache the results on disk in an
efficient format. Each branchname is assigned a number, and for each revision
we store the number of the corresponding branch name. The branch names are
stored in a dedicated file which is strictly append only.

Branch names are usually reused across several revisions, and the total list of
branch names will thus be so small that it is feasible to read the whole set of
names before using the cache. Reading the names does, however, mean that it
might be more efficient to use the changelog for retrieving the branch info for
a single revision.

The revision entries are stored in another file. This file is usually append
only, but if the repository has been modified, the file will be truncated and
the relevant parts rewritten on demand.

The entries for each revision are 8 bytes each, and the whole revision file
will thus be 1/8 of 00changelog.i.

Each revision entry contains the first 4 bytes of the corresponding node hash.
This is used as a check sum that always is verified before the entry is used.
That check is relatively expensive but it makes sure history modification is
detected and handled correctly. It will also detect and handle most revision
file corruptions.

This is just a cache. A new format can always be introduced if other
requirements or ideas make that seem like a good idea. Rebuilding the cache is
not really more expensive than it was to run for example 'hg log -b branchname'
before this cache was introduced.

This new method is still unused but promises to make some operations several
times faster once it actually is used.

Abandoning Python 2.4 would make it possible to implement this more efficiently
by using struct classes and pack_into. The Python code could probably also be
micro optimized or it could be implemented very efficiently in C where it would
be easy to control the data access.
2015-01-08 00:01:03 +01:00
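The on-disk idea described in the commit above (a list of branch names, plus one fixed-size record per revision holding a node hash prefix and a name index) can be sketched in memory as follows. ToyRevBranchCache is an illustrative toy, assuming the '>4sI' layout quoted later in this log; it leaves out file handling, truncation, and everything else the real class does.

```python
import struct

RBCRECFMT = '>4sI'                       # node hash prefix, branch name index
RBCRECSIZE = struct.calcsize(RBCRECFMT)  # 8 bytes per revision

class ToyRevBranchCache:
    def __init__(self):
        self.names = []          # branch names, append only
        self.nameidx = {}        # branch name -> index into self.names
        self.revs = bytearray()  # one 8-byte record per revision

    def set(self, rev, node, branch):
        if branch not in self.nameidx:
            self.nameidx[branch] = len(self.names)
            self.names.append(branch)
        end = (rev + 1) * RBCRECSIZE
        if len(self.revs) < end:
            self.revs.extend(b'\0' * (end - len(self.revs)))
        struct.pack_into(RBCRECFMT, self.revs, rev * RBCRECSIZE,
                         node[:4], self.nameidx[branch])

    def get(self, rev, node):
        """Return the cached branch name, or None on a miss or mismatch."""
        offset = rev * RBCRECSIZE
        if offset + RBCRECSIZE > len(self.revs):
            return None
        prefix, idx = struct.unpack_from(RBCRECFMT, self.revs, offset)
        # The 4-byte prefix acts as a checksum: a different node at this
        # revision number (rewritten history) is treated as a miss.
        if prefix != node[:4] or idx >= len(self.names):
            return None
        return self.names[idx]
```

Because names are stored once and referenced by index, many revisions on the same branch cost 8 bytes each regardless of the name's length.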
Matt Harbison
3b17299e61 branchmap: backout 03f077311ea1
This is no longer needed now that posixfile handles seeking to EOF when it opens
a file in append mode.
2015-01-31 12:42:05 -05:00
Pierre-Yves David
5b2a89c72f branchmap: pre-filter topological heads before ancestors based filtering
We know that topological heads will not be ancestors of anything, so we filter
them out to potentially reduce the range of the ancestors computation.

On a strongly headed repo this gives a humble speedup:

    from 0.1984 to 0.1629
2014-08-30 12:33:12 +02:00
Pierre-Yves David
fa154e4139 branchmap: issue a single call to ancestors for all heads
There is no reason to make multiple calls. This provides a massive speedup for
repos with a lot of heads.

On a strongly headed repo this gives a humble speedup in the simple case:

    from 8.1097 to 5.1051

And a massive speedup in the other case:

    from 7.8787 to 0.1984
2014-08-30 12:20:50 +02:00
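The idea in the commit above, one ancestors traversal for all candidate heads instead of one per head, can be illustrated on a toy graph. Everything here is a stand-in: parents is a plain dict, and ancestors is a simple helper written for this sketch, not Mercurial's changelog API.

```python
def ancestors(parents, starts):
    """Strict ancestors of the given revisions (parents: rev -> tuple of revs)."""
    seen = set()
    stack = list(starts)
    while stack:
        rev = stack.pop()
        for p in parents[rev]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def prune_per_head(parents, candidates):
    """One traversal per candidate; shared history is walked repeatedly."""
    remaining = set(candidates)
    for head in candidates:
        remaining -= ancestors(parents, [head])
    return sorted(remaining)

def prune_single_call(parents, candidates):
    """One traversal for all candidates; each revision is visited once."""
    return sorted(set(candidates) - ancestors(parents, candidates))

# Toy history: 0 <- 1 <- 2 and 1 <- 3 <- 4, so 1 is an ancestor of 2 and 4.
parents = {0: (), 1: (0,), 2: (1,), 3: (1,), 4: (3,)}
candidates = [1, 2, 4]
```

Both functions give the same answer; the single-call version avoids re-walking history that several heads have in common, which is where the saving grows with the number of heads.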
Matt Mackall
7cba48bf37 whitespace: nuke triple blank lines in **.py 2014-08-07 14:58:12 -05:00
Matt Mackall
9e74ea490c branchmap: don't use ui.warn for debug message 2014-06-23 13:50:44 -05:00
Matt Mackall
561e39d121 branch: add debug message for branch cache write failure 2014-06-23 13:46:42 -05:00
Gregory Szorc
d620dd5f0f branchmap: log events related to branch cache
The blackbox log will now contain log events when the branch caches are
updated and written.
2014-03-22 17:14:37 -07:00
Pierre-Yves David
f0e0234ea1 branchmap: use set for update code
We are doing membership tests and subtraction. The new code is marginally faster.
2014-01-06 15:19:31 -08:00
Pierre-Yves David
fc97641ca7 branchmap: simplify update code
We drop iterrevs, which is not needed anymore. The known heads are never
descendants of the updated set. It was possible with the old strip code. This
simplification makes the code easier to read and update.
2014-01-06 14:26:49 -08:00