Commit Graph

68 Commits

Author SHA1 Message Date
Durham Goode
e0b3f09a3b revbranchcache: write cache even during read operations
Previously we would only actually write the revbranchcache to disk if we were in
the middle of a write operation (like commit). Now we will also write it during
any read operation. The cache knows how to invalidate itself, so it shouldn't
become corrupt if multiple writers try at once (and the write-on-read
behavior/risk is the same as all our other caches).
2015-02-24 18:43:31 -08:00
Durham Goode
9d9fb7f2ee revbranchcache: move cache writing to the transaction finalizer
Instead of writing the revbranchcache during updatecache (which often happens
too early, before the cache is even populated), let's run it as part of the
transaction finalizer. It still won't be written for read-only operations, but
that's no worse than it is today.

A future commit will remove the actual write that happens in updatecache().

This is also good prep for when all caches get moved into the transaction.
2015-02-10 20:06:12 -08:00
Durham Goode
f333b64f2f revbranchcache: populate cache incrementally
Previously the cache would populate completely the first time it was accessed.
This could take over a minute on larger repos. This patch changes it to update
incrementally.  Only values that are read will be written, and it will only
rewrite as much of the file as strictly necessary.

This adds a magic value of '\0\0\0\0' to represent an empty cache entry. The
probability of this matching an actual commit hash prefix is tiny, so it's ok if
that's always considered a cache miss. This is also BC safe since any existing
entries with '\0\0\0\0' will just be considered misses.
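
The incremental scheme described above can be sketched roughly as follows. This is a toy illustration, not the actual revbranchcache code: the class name `LazyRevBranchCache`, the `slow_lookup` callback, and the in-memory `bytearray` standing in for the on-disk file are all assumptions made for the example.

```python
import struct

# Assumed record layout for illustration: 4-byte node prefix + 4-byte branch index.
RECORD = struct.Struct(">4sI")
EMPTY = b"\0\0\0\0"  # sentinel prefix marking an unpopulated entry


class LazyRevBranchCache:
    """Toy cache: an entry is filled only when its revision is first read."""

    def __init__(self, nodes, slow_lookup):
        self._nodes = nodes              # rev -> 20-byte node hash
        self._slow_lookup = slow_lookup  # rev -> branch index (expensive path)
        # A zero-filled buffer means every entry starts out as "empty".
        self._records = bytearray(RECORD.size * len(nodes))
        self.misses = 0

    def branch_index(self, rev):
        offset = rev * RECORD.size
        prefix, index = RECORD.unpack_from(self._records, offset)
        expected = self._nodes[rev][:4]
        if prefix != EMPTY and prefix == expected:
            return index  # cache hit
        # Empty or stale entry: compute and store just this one record.
        self.misses += 1
        index = self._slow_lookup(rev)
        RECORD.pack_into(self._records, offset, expected, index)
        return index
```

A node whose hash really starts with four zero bytes is looked up on the slow path every time, which is the "always considered a cache miss" trade-off the message accepts.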

Perf numbers:

Mozilla-central: hg --time log -r 'branch(mobile)' -T.
Cold Cache: 14.7s -> 15.1s (3% worse)
Warm Cache: 1.6s -> 2.1s (30% worse)

Mozilla-central: hg perfbranchmap
2s -> 2.4s (20% worse)

hg: hg log -r 'branch(stable) & branch(default)'
Cold Cache: 3.1s -> 1.9s (40% better - because the old code missed the cache on
both branch() revset iterations, so it did twice the work)
Warm Cache: 0.2s -> 0.26s (30% worse)

internal huge repo: hg --time log -r 'tip & branch(default)'
Cold Cache: 65.4s -> 0.2s (327x better)

While this change introduces minor regressions when iterating over every commit
in a branch, it massively improves the cold cache time for operations which
touch a single commit. I feel the better O() is worth it in this case.
2015-02-10 20:04:47 -08:00
Durham Goode
81ca4a2723 revbranchcache: move entry writing to a separate function
This moves the actual writing of entries to the cache to a separate function.
This will allow us to use it in multiple places. Ex: in one place we will write
dummy entries, and in another place we will write real data.
2015-02-10 20:01:08 -08:00
Durham Goode
23a18a419d revbranchcache: store repo on the object
Previously we would instantiate the revbranchcache with a repo object, use it
briefly, then require it be passed in every time we wanted to fetch any
information. This seems unnecessary since it's obviously specific to that repo
(since it was constructed with it).

This patch stores the repo on the revbranchcache object, and removes the repo
parameter from the various functions on that class. This has the other nice
benefit of removing the double-revbranchcache-read that existed before (it was
read once for the branch revset, and once for the repo.revbranchcache).
2015-02-10 19:57:51 -08:00
Durham Goode
9ac6d81ba3 revbranchcache: move out of branchmap onto localrepo
Previously the revbranchcache was a field inside the branchmap. This is bad for
a couple reasons:

1) There can be multiple branchmaps per repo (one for each filter level). There
can only be one revbranchcache per repo. In fact, a revbranchcache could only
exist on a branchmap that was for the unfiltered view, so you could have
branchmaps exist for which you couldn't have a revbranchcache. It was funky.
2) The write lifecycle for the revbranchcache is going to be different from
the branchmap (branchmap is greedily written early on, revbranchcache
should be lazily computed and written).

This patch moves the revbranchcache to live as a field on the localrepo
(alongside self._branchmap). This will allow us to handle its lifecycle
differently, which will let us move it to be lazily computed in future patches.
2015-02-10 19:53:48 -08:00
Matt Mackall
b907416f7b merge with stable 2015-03-02 01:20:14 -06:00
Mads Kiilerich
56207b4242 revisionbranchcache: fall back to slow path if starting readonly (issue4531)
Transitioning to Mercurial versions with revision branch cache could be slow as
long as all operations were readonly (revset queries) and the cache would be
populated but not written back.

Instead, fall back to using the consistently slow path when readonly and the
cache doesn't exist yet. That avoids the overhead of populating the cache
without writing it back.

If not readonly, it will still populate all missing entries initially. That
avoids repeated writing of the cache file with small updates, and it also makes
sure a fully populated cache is available for the readonly operations.
2015-02-06 02:52:10 +01:00
Angel Ezquerra
88cbab7845 localrepo: remove all external users of localrepo.opener
This change touches every module in which repository.opener was being used, and
changes it for the equivalent repository.vfs. This is meant to make it easier
to split the repository.vfs into several separate vfs.

It should now be possible to remove localrepo.opener.
2015-01-15 23:17:12 +01:00
Mads Kiilerich
bc7c34a53f branchcache: make _rbcrevslen handling more safe
self._rbcrevslen is used to keep track of the number of good records on disk.
It should thus not be updated before the records actually have been written to
disk.
2015-01-14 01:15:26 +01:00
Mads Kiilerich
49a09d3e6c branchcache: add debug output whenever cache files use truncate
The cache files are usually append only but will automatically be truncated and
recovered in exceptional situations. Add a debug notice when such exceptional
situations are encountered.
2015-01-14 01:15:26 +01:00
Matt Harbison
9825da1159 branchmap: add seek() to end of file before calling tell() on append open()
This is similar to 5274228efcdc, which was subsequently modified in dd809b0d9714
for 2.4.  Unexpected test changes on Windows occurred without this.
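
A minimal sketch of the pattern, assuming a plain Python file object rather than Mercurial's own opener (the helper name `append_position` is made up for this example): the position reported by tell() right after opening in append mode is not something to rely on across platforms, so the code seeks to the end explicitly first.

```python
import os
import tempfile


def append_position(path):
    """Open path for appending and return (file object, offset of end of file)."""
    f = open(path, "ab")
    # Make the file position explicit before asking for it.
    f.seek(0, os.SEEK_END)
    return f, f.tell()
```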
2015-01-10 12:00:03 -05:00
Mads Kiilerich
835157e77d branchmap: use revbranchcache when updating branch map
The revbranchcache is read on demand before it will be used for updating the
branch map. It is written back when the branchmap is written and it will thus
use the same locking as branchmap. The revbranchcache instance is short-lived;
it is only stored in the branchmap from when .update() is invoked until .write()
is invoked. Branchmap already assumes that the repo is locked in that case.

The use of revbranchcache for branch map updates will make sure that the
revbranchcache "always" is kept up-to-date.

The perfbranchmap benchmark is somewhat bogus, especially when we can see that
the caching makes a significant difference between the realistic case of a
first run and the rare case of rerunning it with a full cache. Here are some
'base' numbers on mozilla-central:
Before:
! wall 6.912745 comb 6.910000 user 6.840000 sys 0.070000 (best of 3)
After - initial, cache is empty:
! wall 7.792569 comb 7.790000 user 7.720000 sys 0.070000 (best of 3)
After - cache is full:
! wall 0.879688 comb 0.880000 user 0.870000 sys 0.010000 (best of 4)

The overhead when running with empty cache comes from checking, missing and
updating it every time.

Most of the performance improvement comes from not having to extract the branch
info from the changelog. The last doubling of performance comes from no longer
having to convert all branch names to local encoding but reuse the few already
converted branch names.

On the hg repo:
Before:
! wall 0.715703 comb 0.710000 user 0.710000 sys 0.000000 (best of 14)
After:
! wall 0.105489 comb 0.110000 user 0.110000 sys 0.000000 (best of 87)
2015-01-08 00:01:03 +01:00
Mads Kiilerich
1b3892318f branchcache: introduce revbranchcache for caching of revision branch names
It is expensive to retrieve the branch name of a revision. Very expensive when
creating a changectx and calling .branch() every time - slightly less when
using changelog.branchinfo().

Now, to speed things up, provide a way to cache the results on disk in an
efficient format. Each branchname is assigned a number, and for each revision
we store the number of the corresponding branch name. The branch names are
stored in a dedicated file which is strictly append only.

Branch names are usually reused across several revisions, and the total list of
branch names will thus be so small that it is feasible to read the whole set of
names before using the cache. This does however mean that it might be more
efficient to use the changelog for retrieving the branch info for a single
revision.

The revision entries are stored in another file. This file is usually append
only, but if the repository has been modified, the file will be truncated and
the relevant parts rewritten on demand.

The entries for each revision are 8 bytes each, and the whole revision file
will thus be 1/8 of 00changelog.i.

Each revision entry contains the first 4 bytes of the corresponding node hash.
This is used as a check sum that always is verified before the entry is used.
That check is relatively expensive but it makes sure history modification is
detected and handled correctly. It will also detect and handle most revision
file corruptions.
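
As a rough illustration of the format described above, the following sketch encodes and verifies one revision record. The `>4sI` layout, the helper names `encode` and `decode`, and the in-memory `names` list standing in for the names file are assumptions for the example, not a description of the real implementation.

```python
import struct

# Assumed layout: first 4 bytes of the node hash, then the branch name number.
RECORD = struct.Struct(">4sI")


def encode(names, node, branch):
    """Return the 8-byte record for (node, branch), appending unseen names."""
    if branch not in names:
        names.append(branch)  # the names list only ever grows
    return RECORD.pack(node[:4], names.index(branch))


def decode(names, record, node):
    """Return the cached branch name, or None if the check fails."""
    prefix, index = RECORD.unpack(record)
    if prefix != node[:4] or index >= len(names):
        return None  # stale or corrupt entry: fall back to the changelog
    return names[index]
```

With 8-byte records and 64-byte changelog index entries, the revision file comes out at 1/8 the size of 00changelog.i, as stated above.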

This is just a cache. A new format can always be introduced if other
requirements or ideas make that seem like a good idea. Rebuilding the cache is
not really more expensive than it was to run for example 'hg log -b branchname'
before this cache was introduced.

This new method is still unused but promises to make some operations several
times faster once it actually is used.

Abandoning Python 2.4 would make it possible to implement this more efficiently
by using struct classes and pack_into. The Python code could probably also be
micro optimized or it could be implemented very efficiently in C where it would
be easy to control the data access.
2015-01-08 00:01:03 +01:00
Matt Harbison
3b17299e61 branchmap: backout 03f077311ea1
This is no longer needed now that posixfile handles seeking to EOF when it opens
a file in append mode.
2015-01-31 12:42:05 -05:00
Pierre-Yves David
5b2a89c72f branchmap: pre-filter topological heads before ancestors based filtering
We know that topological heads will not be ancestors of anything, so we filter
them out to potentially reduce the range of the ancestors computation.

On a strongly headed repo this gives a humble speedup:

    from 0.1984 to 0.1629
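
The idea can be sketched as below. This is a simplified model, not the branchmap code: `parents` is a hypothetical dict mapping each revision number to its parent revisions, and revision numbers are assumed to be topologically ordered (parents always have smaller numbers than children).

```python
def ancestors(parents, revs, floor):
    """Strict ancestors of revs whose revision number is >= floor."""
    seen = set()
    stack = list(revs)
    while stack:
        for p in parents[stack.pop()]:
            if p >= floor and p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def branch_heads(parents, candidates):
    """Drop candidates that are ancestors of another candidate."""
    has_child = {p for ps in parents.values() for p in ps}
    topo_heads = {r for r in parents if r not in has_child}
    # Topological heads cannot be ancestors of anything, so only the
    # remaining candidates can possibly be filtered out.
    uncertain = candidates - topo_heads
    if not uncertain:
        return set(candidates)  # no ancestors walk needed at all
    floor = min(uncertain)      # one walk for all heads, bounded from below
    return candidates - ancestors(parents, candidates, floor)
```

Raising the floor to the lowest non-topological candidate is what shortens the ancestors walk; when every candidate is a topological head the walk is skipped entirely.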
2014-08-30 12:33:12 +02:00
Pierre-Yves David
fa154e4139 branchmap: issue a single call to ancestors for all heads
There is no reason to make multiple calls. This provides a massive speedup for
repos with a lot of heads.

On a strongly headed repo this gives a humble speedup in the simple case:

    from 8.1097 to 5.1051

And a massive speedup in other cases:

    from 7.8787 to 0.1984
2014-08-30 12:20:50 +02:00
Matt Mackall
7cba48bf37 whitespace: nuke triple blank lines in **.py 2014-08-07 14:58:12 -05:00
Matt Mackall
9e74ea490c branchmap: don't use ui.warn for debug message 2014-06-23 13:50:44 -05:00
Matt Mackall
561e39d121 branch: add debug message for branch cache write failure 2014-06-23 13:46:42 -05:00
Gregory Szorc
d620dd5f0f branchmap: log events related to branch cache
The blackbox log will now contain log events when the branch caches are
updated and written.
2014-03-22 17:14:37 -07:00
Pierre-Yves David
f0e0234ea1 branchmap: use set for update code
We are doing membership tests and subtraction. The new code is marginally faster.
2014-01-06 15:19:31 -08:00
Pierre-Yves David
fc97641ca7 branchmap: simplify update code
We drop iterrevs, which is not needed anymore. The known heads are never
descendants of the updated set. It was possible with the old strip code. This
simplification makes the code easier to read and update.
2014-01-06 14:26:49 -08:00
Pierre-Yves David
7641136888 branchmap: stop useless rev -> node -> rev round trip
We never use the node of new revisions unless in the very specific case of
closed heads. So we can just use the revision number.

This gives another handful of percent speedup.
2014-01-03 16:44:23 -08:00
Pierre-Yves David
4a3c0bd301 branchmap: stop membership test in update logic
Now that no user tries to update the cache on a truncated repo we can drop the
extra lookup. This gives a handful of percent speedup on big branchmap updates.
2013-01-15 20:04:12 +01:00
Pierre-Yves David
f4aa184d64 branchmap: remove silly line break
The line fits in the 80 character limit without it. It is even shorter without it.
2014-01-03 17:06:07 -08:00
Mads Kiilerich
1428b73c06 help: branch names primarily denote the tipmost unclosed branch head
Was the behavior correct and the description wrong so it should be updated as
in this patch? Or should the code work as the documentation says?

Both ways could make some sense ... but none of them are obvious in all cases.

One place where it currently causes problems is when the current revision has
another branch head that is closer to tip but closed. 'hg rebase' refuses to
rebase to that as it only sees the tip-most unclosed branch head which is the
current revision.

/me kind of likes named branches, but not so much how branch closing works ...
2013-11-21 15:17:18 -05:00
Brodie Rao
34af0d72ea branchmap: introduce iterbranches() method 2013-09-16 01:08:29 -07:00
Brodie Rao
f0a5d60210 branchmap: introduce branchheads() method 2013-09-16 01:08:29 -07:00
Brodie Rao
b2b08444eb branchmap: introduce branchtip() method 2013-09-16 01:08:29 -07:00
Brodie Rao
a446720e09 branchmap: cache open/closed branch head information
This lets us determine the open/closed state of a branch without
reading from the changelog (which can be costly over NFS and/or with
many branches).
2013-09-16 01:08:29 -07:00
Brodie Rao
b8ea796521 branchmap: add documentation on the branchcache on-disk format 2013-11-15 23:18:08 -05:00
Augie Fackler
2859ed15ec subsettable: move from repoview to branchmap, the only place it's used
This is a step towards breaking an import cycle between revset and
repoview. Import cycles happened to work in Python 2 with implicit
relative imports, but break on Python 3 when we start using explicit
relative imports via 2to3 rewrite rules.
2013-11-06 14:38:34 -05:00
Pierre-Yves David
b5bc8d504a branchmap: stop looking for stripped branch
Since repoview in 2.5 we do not make a special call to `branchmap` when stripping.
We just recompute the branchmap from a lower subset that still has a valid
branchmap. So I'm dropping this dead code.
2013-09-30 17:42:38 +02:00
Pierre-Yves David
64005d41eb branchmap: remove the droppednodes logic
It was unused. Note how it is only extended if the list is empty, so it is always
empty at the end.

We could try to fix that; however, this part of the code is to be removed
in the next changeset as we do not run `branchmap` on truncated repos since
`repoview` in 2.5.
2013-09-30 17:31:39 +02:00
Pierre-Yves David
a58c8b0406 branchmap: fix blank line position
The blank line was after the `if` condition instead of before.
2013-09-30 15:52:37 +02:00
Mads Kiilerich
5787baee50 spelling: fix some minor issues found by spell checker 2013-02-10 18:24:29 +01:00
Pierre-Yves David
e0122c56e3 branchmap: display filtername when updatebranch fails to do its jobs
We have a very handy assert at the end of `branchmap.updatecache` that checks
that the resulting branchmap is actually valid.

I know we do not like asserts in Mercurial but this one is very handy for
debugging. There is really no reason for `branchmap.updatecache` to have this
kind of issue, but it happened a handful of times during the development of
this or the introduction of other related features. I advise keeping it around
until we are a bit more confident with the new code.
2013-01-19 02:29:56 +01:00
Mads Kiilerich
641f4b8a6c localrepo: store branchheads sorted 2013-01-15 02:59:12 +01:00
Pierre-Yves David
77db5734c1 branchmap: Save changectx creation during update
The newly introduced `branchmap` function allows us to skip the
creation of changectx objects. This speeds up the construction of
the branchmap.

On the mozilla repository (117293 changesets, 15490 mutable)

Before:
  ! impactable 19.9
  ! mutable 0.576
  ! unserved 3.16

After:
  ! impactable 7.03 (2.8x faster)
  ! mutable 0.352 (1.6x)
  ! unserved 1.15 (2.7x)

On the cpython repository (81418 changesets, 6418 mutable)

Before:
  ! impactable 15.9
  ! mutable 0.451
  ! unserved 0.861

After:
  ! impactable 6.55 (2.4x faster)
  ! mutable 0.170 (2.6x faster)
  ! unserved 0.289 (2.9x faster)

On the pypy repository (58852 changesets)

Before:
  ! impactable 13.6

After:
  ! impactable 6.17 (2.2x faster)

On my Mercurial repository (18295 changesets, 2210 mutable)

Before:
  ! impactable 23.9
  ! mutable 0.368
  ! unserved 0.057

After:
  ! impactable 1.31 (18x faster)
  ! mutable 0.042 (8.7x)
  ! unserved 0.025 (2.2x)
2013-01-11 18:47:42 +01:00
Pierre-Yves David
4bd2fce08b branchmap: pass revision instead of changectx to the update function
Creation of changectx objects is very slow, and they are not very
useful. We are going to drop them. The first step is to change the
function argument type.
2013-01-08 01:28:39 +01:00
Pierre-Yves David
01b68ae973 branchmap: allow to use cache of subset
Filtered repositories are *subsets* of the unfiltered repository. This means that a
filtered branchmap could be used to compute the unfiltered version.

And filtered versions happen to be subsets of each other:
- "all() - unserved()" is a subset of "all() - hidden()"
- "all() - hidden()" is a subset of "all()"

This means that a branchmap with the "unserved" filter can be used as a base for
the "hidden" branchmap, which itself could be used as a base for the unfiltered
branchmap.

   unserved < hidden < None

This changeset implements this mechanism. If the on disk branchcache is not valid
we use the branchcache of the nearest subset as base instead of computing it from
scratch. Such fallback can be cascaded multiple times if necessary.
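
A minimal sketch of the cascading fallback, under stated assumptions: the `SUBSETTABLE` mapping, the `disk` dict standing in for the on-disk caches, and the `update` callback are all hypothetical names for this example, and a branchmap is modelled as a plain dict.

```python
# Hypothetical mapping: filter name -> nearest smaller subset.
# None is the name of the unfiltered view.
SUBSETTABLE = {None: "hidden", "hidden": "unserved"}


def get_branchmap(filtername, disk, update):
    """Return a branchmap for filtername, reusing the nearest valid subset."""
    cached = disk.get(filtername)  # None means missing or invalid on disk
    if cached is not None:
        return cached
    # Test membership rather than .get(), because None is itself a filter name.
    if filtername in SUBSETTABLE:
        subset = SUBSETTABLE[filtername]
        # Copy the subset's branchmap so updating it has no side effect on it.
        base = dict(get_branchmap(subset, disk, update))
    else:
        base = {}  # smallest subset: compute from scratch
    return update(base, filtername)
```

Here a request for the unfiltered view with only an "unserved" cache on disk falls back twice: "unserved" is copied and updated to "hidden", which is in turn copied and updated to the unfiltered branchmap.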

Note that both the "hidden" and "unserved" sets are a bit volatile. We will add more
stable filtering in the next changesets.

This changeset enables collaboration between no filtering and "unserved"
filtering, fixing the performance regression introduced by 7bff5f37cb97.
2013-01-07 17:23:25 +01:00
Pierre-Yves David
7b8d884b29 branchmap: add a copy method
If we want branchcaches of different filters to collaborate, they need a simple
way to copy each other. This will ensure that each filter has no side effect
on the cache of another filter level.
2013-01-02 01:40:42 +01:00
Pierre-Yves David
ea8f599221 branchmap: drop _cacheabletip usage in updatecache
Nobody overwrites `_cacheabletip` any more. We always update the cache for
the whole repo and write it to disk (or at least try to). The `updatecache` code
is simplified to remove the double phase logic associated with _cacheabletip.
2013-01-04 01:25:55 +01:00
Pierre-Yves David
e5d81232c2 branchmap: ignore Abort error while writing cache
A read-only vfs can now raise an Abort exception. Note that encoding.local is
also a possible raiser.
2013-01-04 04:52:57 +01:00
Pierre-Yves David
f01949c09e branchmap: read returns None in case of failure
This makes a clear distinction between having read a valid cache on disk or not.
This will help caches of various filtering level to collaborate.
2012-12-22 19:41:11 +01:00
Pierre-Yves David
0cd9115520 branchmap: enable caching for filtered version too
The `_branchcache` attribute is turned into a dictionary. Keys are filter names and
values are `branchcache` objects. The unfiltered version is cached under the `None` filter.

The attribute is renamed to `_branchcaches` to avoid confusion with the previous
one. Both old and new contents are dictionaries even if their contents are
different. I prefer possible extension code to crash right away instead of just
messing with the wrong dictionary.

As all the different caches work in isolation from each other, this code keeps the
previous behavior of using the unfiltered cache when nothing is filtered. This
is a cheap way to have caches collaborate and nullify potential impact in the
default case.
2012-12-24 03:21:15 +01:00
Pierre-Yves David
daf9851247 branchmap: report filtername when read fails
Now that we can have multiple ones, we need to know which filecache failed to be
read from disk.
2013-01-01 21:27:13 +01:00
Pierre-Yves David
256c2dfbf0 branchmap: use a different file name for filtered view of repo 2012-12-24 03:06:03 +01:00
Pierre-Yves David
4ad32b10f3 branchmap: move the cache file name into a dedicated function
Filtered views of the repo will want to write their cache to a different file.
2012-12-24 03:04:12 +01:00