253 Commits

Author SHA1 Message Date
Martin von Zweigbergk
126f9b1a2d treemanifest: fix bad argument order to treemanifestctx
Found by running tests with _treeinmem (both of them) modified to be
True.
2016-10-17 16:12:12 -07:00
Maciej Fijalkowski
6efacdd6af lazymanifest: write a more efficient, pypy friendly version of lazymanifest 2016-09-12 13:37:14 +02:00
FUJIWARA Katsunori
d542d9c7d2 manifest: specify checkambig=True to revlog.__init__, to avoid ambiguity
If the steps below occur at "the same time in sec", all of mtime, ctime
and size are the same between (1) and (3).

  1. append data to 00manifest.i (and close transaction)
  2. discard appended data by truncation (strip or rollback)
  3. append same size but different data to 00manifest.i again

Therefore, cache validation doesn't work after (3) as expected.

To avoid such file stat ambiguity around truncation, this patch
specifies checkambig=True to revlog.__init__(). This makes revlog
write changes out with checkambig=True.

Even after this patch, avoiding file stat ambiguity of 00manifest.i
around truncation isn't yet completed, because truncation side isn't
aware of this issue.

This is a part of ExactCacheValidationPlan.

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
2016-09-22 21:51:58 +09:00
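The ambiguity described in d542d9c7d2 can be reproduced with a small sketch. This is not Mercurial's code: statkey() and avoid_ambiguity() are hypothetical helpers, os.utime() is used only to simulate "everything happened in the same second", ctime is left out because it cannot be pinned portably, and advancing mtime by one second is assumed here as one possible way to disambiguate.

```python
import os
import tempfile


def statkey(path):
    """Cache-validation key built from size and whole-second mtime."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))


def avoid_ambiguity(path, oldkey):
    """If the new key equals the old one, advance mtime by one second."""
    if statkey(path) == oldkey:
        st = os.stat(path)
        os.utime(path, (st.st_atime, int(st.st_mtime) + 1))


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"aaaa")                # step (1): append data
        os.utime(path, (1000, 1000))        # pin the timestamp ("same second")
        key1 = statkey(path)
        with open(path, "wb") as f:
            f.write(b"bbbb")                # steps (2)+(3): truncate, same-size data
        os.utime(path, (1000, 1000))
        ambiguous = statkey(path) == key1   # the cache cannot see the change
        avoid_ambiguity(path, key1)
        resolved = statkey(path) != key1    # the key now differs
        return ambiguous, resolved
    finally:
        os.unlink(path)
```

Calling demo() returns a pair of booleans: whether the two states were indistinguishable by stat, and whether the mtime bump separated them.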
Durham Goode
847e1dc1fe manifest: add manifestlog.add
This adds a simple add() function to manifestlog. This lets us convert more
uses of repo.manifest to use repo.manifestlog, so we can further break our
dependency on the manifest class.
2016-09-20 12:24:01 -07:00
Durham Goode
48c110bc3e manifest: move manifest.add onto manifestrevlog
This moves add and _addtree onto manifestrevlog. manifestrevlog is responsible
for all serialization decisions, so therefore the add function should live on
it. This will allow us to call add() from manifestlog, which lets us further
break our dependency on manifest.
2016-09-20 12:24:01 -07:00
Durham Goode
e3b4c3f5e1 manifest: remove dependency on treeinmem from manifest.add
Currently manifest.add uses the treeinmem option to know if it can call
fastdelta on the given manifest instance. In a future patch we will be moving
add() to be on the manifestrevlog, so it won't have access to the treeinmem
option anymore. Instead, let's have it actually check if the given manifest
instance supports the fastdelta operation.

This also means that if treemanifest or any implementation eventually implements
fastdelta(), it will automatically benefit from this code path.
2016-09-20 12:24:01 -07:00
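The capability check described in e3b4c3f5e1 amounts to duck typing. A minimal sketch, assuming hypothetical FlatManifest and TreeManifest classes and a hypothetical choose_delta_path() helper (none of these are Mercurial's real classes):

```python
class FlatManifest:
    """Toy manifest that offers a fastdelta() operation."""

    def fastdelta(self, base, changes):
        return "delta"


class TreeManifest:
    """Toy manifest without fastdelta()."""


def choose_delta_path(m):
    # Ask the instance what it supports instead of consulting a
    # configuration flag such as treeinmem.
    if callable(getattr(m, "fastdelta", None)):
        return "fastdelta"
    return "fulltext"
```

Any implementation that later grows a fastdelta() method is picked up by the same check without further changes.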
Durham Goode
c18c20b7de manifest: move treeinmem onto manifestlog
A previous patch moved all the serialization related options onto
manifestrevlog (since it is responsible for serialization). Let's move the
treeinmem option onto manifestlog, since it is responsible for materialization
decisions. This reduces the number of dependencies manifestlog has on the old
manifest type as well, so we can eventually make them completely independent of
each other.
2016-09-20 12:24:01 -07:00
Durham Goode
58c889e31a manifest: move dirlog up to manifestrevlog
This removes dirlog and its associated cache from manifest and puts it in
manifestrevlog. The notion of there being sub-logs is specific to the revlog
implementation, and therefore belongs on the revlog class.

This patch will enable future patches to move the serialization logic for
manifests onto manifestrevlog, which will allow us to move manifest.add onto
manifestlog in a way that it just calls out to manifestrevlog for the
serialization.
2016-09-13 16:00:41 -07:00
Durham Goode
e5533574ca manifest: move revlog specific options from manifest to manifestrevlog
The manifestv2 and treeondisk options are specific to how we serialize the
manifest into revlogs, so let's move them onto the manifestrevlog class. This
will allow us to add a manifestlog.add() function in a future diff that will
rely on manifestrevlog to make decisions about how to serialize the given
manifest to disk.

We have to move a little bit of extra logic about the 'dir' as well, since it is
used in conjunction with the treeondisk option to decide the revlog file name.
It's probably good to move this down to the manifestrevlog class anyway, since
it's specific to the revlog.
2016-09-13 16:00:41 -07:00
Durham Goode
8365e36e59 manifest: adds manifestctx.readfast
This adds a copy of manifest.readfast to manifestctx.readfast and adds a
consumer of it. It currently looks like duplicate code, but a future patch
causes these functions to diverge as tree concepts are added to the tree
version.
2016-09-13 16:26:30 -07:00
Durham Goode
23d229132c manifest: add manifestctx.readdelta()
This adds an implementation of readdelta to the new manifestctx class and adds a
couple consumers of it. This currently appears to have some duplicate code, but
future patches cause this function to diverge when things like "shallow" are
introduced.
2016-09-13 16:25:21 -07:00
Durham Goode
8f48c965d8 manifest: change manifestctx to not inherit from manifestdict
If manifestctx inherits from manifestdict, it requires some weird logic to
lazily load the dict if a piece of information is asked for. This ended up being
complicated and unintuitive to use.

Let's move the dict creation to .read(). This will make even more sense once we
start adding readdelta() and other similar methods to manifestctx.
2016-09-12 10:55:43 -07:00
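The design in 8f48c965d8 (a context object that builds its dict only in .read()) can be sketched as follows. The class name, the loader callback and the caching attribute are assumptions made for illustration, not the actual manifestctx implementation:

```python
class ManifestCtx:
    """Toy context: identifies a manifest by node, materializes on read()."""

    def __init__(self, node, loader):
        self.node = node
        self._loader = loader   # hypothetical callable: node -> mapping
        self._data = None

    def read(self):
        # The dict is created here, on demand, rather than by inheriting
        # from a dict type and loading lazily behind every accessor.
        if self._data is None:
            self._data = dict(self._loader(self.node))
        return self._data
```

Because the context is no longer a dict itself, callers state explicitly when they need the full contents, which leaves room for sibling methods that read less.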
Pierre-Yves David
cb4c54634b manifest: backed out changeset 3e5e08efafc9
There is some suspicious failure in evolution tests. This changeset was supposed
to be dropped until we investigate.
2016-09-10 01:42:05 +02:00
Pierre-Yves David
eb569a3c73 manifest: backed out changeset ec5be4246a05
There is some suspicious failure in evolution tests. This changeset was supposed
to be dropped until we investigate.
2016-09-10 01:41:38 +02:00
Durham Goode
ab661bf355 manifest: change manifestctx to not inherit from manifestdict
If manifestctx inherits from manifestdict, it requires some weird logic to
lazily load the dict if a piece of information is asked for. This ended up being
complicated and unintuitive to use.

Let's move the dict creation to .read(). This will make even more sense once we
start adding readdelta() and other similar methods to manifestctx.
2016-08-31 12:46:53 -07:00
Durham Goode
e8a39ee6a7 manifest: make uses of _mancache aware of contexts
In a future patch we will change manifestctx and treemanifestctx to no longer
derive from manifestdict and treemanifest, respectively. This means that
consumers of the _mancache will now need to be aware of the difference between
the two, until we get rid of the manifest entirely and the _mancache becomes
only filled with ctxs.
2016-08-29 18:02:09 -07:00
Durham Goode
fd7e94b89b manifest: add treemanifestctx class
Before we start using repo.manifestlog in the rest of the code base, we need to
make sure that it's capable of returning treemanifests. As we add new
functionality to manifestctx, we'll add it to treemanifestctx at the same time.

We also comment out the manifestctx p1, p2, and linkrev fields for now, since
we're not implementing them on treemanifest yet.
2016-08-31 13:29:49 -07:00
Durham Goode
133c1fe33e manifest: call m1.load and m2.load before writing a subtree
As part of refactoring the manifest, certain test cases started failing because
writesubtrees was called with p1 and p2 manifests that had not been loaded (so
accessing m1._dirs resulted in an empty set). Let's call _load on these before
attempting to access _dirs.

This was caught by tests when future patches were applied.
2016-08-29 17:48:14 -07:00
Durham Goode
f38741166f manifest: use property instead of field for manifest revlog storage
The file caches we're using to avoid reloading the manifest from disk every time
have an annoying bug that causes the in memory structure to not be reloaded if
the mtime and the size haven't changed. This causes a breakage in the tests
because the manifestlog is not being reloaded after a commit+strip operation in
mq (the mtime is the same because it all happens in the same second, and the
resulting size is the same because we add 1 and remove 1). The only reason this
doesn't affect the manifest itself is because we touch it so often that we
had already reloaded it after the commit, but before the strip.

Once the entire manifest has migrated to manifestlog, we can get rid of these
properties, since then the manifestlog will be touched after the commit, but
before the strip, as well.
2016-08-17 13:25:13 -07:00
Durham Goode
4c0439aa0a manifest: introduce manifestlog and manifestctx classes
This is the start of a large refactoring of the manifest class. It introduces
the new manifestlog and manifestctx classes which will represent the collection
of all manifests and individual instances, respectively.

Future patches will begin to convert usages of repo.manifest to
repo.manifestlog, adding the necessary functionality to manifestlog and instance
as they are needed.
2016-08-17 13:25:13 -07:00
Durham Goode
948314a949 manifest: make manifest derive from manifestrevlog
As part of our refactoring to split the manifest concept from its storage, we
need to start moving the revlog specific parts of the manifest implementation to
a new class. This patch creates manifestrevlog and moves the fulltextcache onto
the base class.
2016-08-17 13:25:13 -07:00
Durham Goode
9dfdbc1f92 manifest: break mancache into two caches
The old manifest cache cached both the inmemory representation and the raw text.
As part of the manifest refactor we want to separate the storage format from the
in memory representation, so let's split this cache into two caches.

This will let other manifest implementations participate in the in memory cache,
while allowing the revlog based implementations to still depend on the full text
caching where necessary.
2016-08-17 13:25:13 -07:00
Augie Fackler
ba4d11b62e bundlerepo: add support for treemanifests in cg3 bundles
This is a little messier than I'd like, and I'll probably come back
and do some more refactoring later, but as it is this unblocks
narrowhg. An alternative approach (which I may do as part of the
mentioned refactoring) would be to construct *all* dirlog instances up
front, so that we don't have to keep track of the linkmapper
method. This would avoid a reference cycle between the bundlemanifest
and the bundlerepository, but I was hesitant to do all the work up
front like that.

With this change, it's possible to do 'hg incoming' and 'hg pull' from
bundles in .hg/strip-backup in a treemanifest repository. Sadly, this
doesn't make it possible to 'hg clone' one of those (if you do 'hg
strip 0'), because the cg3 in the bundle gets written without a
treemanifest flag. Since that's going to be an involved refactor in a
different part of the code (which I *suspect* won't touch any of the
code I've just written here), let's leave it as an idea for Later.
2016-08-05 13:08:11 -04:00
liscju
c7ec9d159e i18n: translate abort messages
I found a few places where message given to abort is
not translated, I don't find any reason to not translate
them.
2016-06-14 11:53:55 +02:00
Tony Tung
9f3c4b8958 manifest: improve filesnotin performance by using lazymanifest diff
lazymanifests can compute diffs significantly faster than taking the set
of two manifests and calculating the delta.

when running hg diff --git -c . on Facebook's big repo, this reduces the
run time from 2.1s to 1.5s.
2016-05-02 15:22:16 -07:00
Martin von Zweigbergk
58c3ff9aaf changegroup: fix treemanifests on merges
The current code for generating treemanifest revisions takes the list
of files in the changeset and finds the directories from them. This
does not work for merges, since a merge may pick file A from one side
and file B from another and neither of them would appear in the
changeset's "files" list, but the manifest would still change.

Fix this by instead walking the root manifest log for all needed
revisions, storing all needed file and subdirectory revisions, then
recursively visiting the subdirectories. This also turns out to be
faster: cloning a version of hg core converted to treemanifests went
from ~28s to ~19s (timing somewhat unfair: before this patch, timed
until crash; after this patch, timed until manifests complete).

The new algorithm is used only on treemanifest repos. Although it
works equally well on flat manifests, we leave the iteration over
files in the changeset for flat manifests for now.
2016-02-12 23:09:09 -08:00
Martin von Zweigbergk
a05047be24 treemanifest: allow setting flag to 't'
When using treemanifests, an on-disk manifest entry with the 't' flag
set means that that entry is a directory and not a file. When read
into memory, these become instances of the treemanifest class. The 't'
flag should therefore never be visible outside of manifest.py, so
setflag() checks that it is not called with the 't' flag. However, it
turns out that it will be useful for the narrowhg extension to expose
the 't' flag to the user (see below), so let's drop the assertion.

The narrowhg extension allows cloning only a given set of files and
directories. Filelogs and dirlogs that don't match that set will not
be included in the clone. The extension currently doesn't work with
treemanifests. I plan on changing it so directories outside the narrow
clone appear in the manifest. For example, if a directory 'outside/'
is not part of the narrow clone, it will look like a file 'outside'
with the 't' flag set. That will make e.g. manifestmerge() just work
in most cases (and make it well prepared to handle the other
cases).
2016-02-09 20:22:33 -08:00
Martin von Zweigbergk
9cf0539032 treemanifest: rewrite text() using iterentries()
This simplifies a bit. Note that the function is only used when
manually testing with _treeinmem=True.
2016-02-20 23:57:21 -08:00
Martin von Zweigbergk
8f025f0656 treemanifest: implement iterentries()
To make tests pass with _treeinmem manually set to True, we need to
implement the recently added iterentries() on the treemanifest class
too.
2016-02-07 21:14:01 -08:00
Martin von Zweigbergk
b2b4f9e694 verify: check directory manifests
In repos with treemanifests, there is no specific verification of
directory manifest revlogs. It simply collects all file nodes by
reading each manifest delta. With treemanifests, that means calling
the manifest._slowreaddelta(). If there are missing revlog entries in
a subdirectory revlog, 'hg verify' will simply report the exception
that occurred while trying to read the root manifest:


  manifest@0: reading delta 1700e2e92882: meta/b/00manifest.i@67688a370455: no node

This patch changes the verify code to load only the root manifest at
first and verify all revisions of it, then verify all revisions of
each direct subdirectory, and so on, recursively. The above message
becomes

  b/@0: parent-directory manifest refers to unknown revision 67688a370455

Since the new algorithm reads a single revlog at a time and in order,
'hg verify' on a treemanifest version of the hg core repo goes from
~50s to ~14s. As expected, there is no significant difference on a
repo with flat manifests.
2016-02-07 21:13:24 -08:00
Gregory Szorc
d6f69e17c6 manifest: use absolute_import 2015-12-21 21:35:46 -08:00
Gregory Szorc
ad1f138bcd manifest: implement clearcaches()
The manifest implements its own caches in addition to revlog's. Extend
the base clearcaches() to wipe these as well.
2015-12-20 19:31:46 -08:00
Martin von Zweigbergk
50de24bc06 treemanifest: don't iterate entire matching submanifests on match()
Before a4236180df5e (match: remove unnecessary optimization where
visitdir() returns 'all', 2015-05-06), match.visitdir() used to return
the special value 'all' to indicate that it was known that all
subdirectories would also be included in the match. The purpose for
that value was to avoid calling the matcher on all the paths. It
turned out that calling the matcher was not a problem, so the special
return value was removed and the code was simplified. However, if we
use the same special value for not just avoiding calling the matcher
on each file, but to avoid iterating over each file, it's a much
bigger win. On commands like

  hg st --rev .^ --rev . dom/

we run the matcher (dom/) on the two manifests, then diff the narrowed
manifest. If the size of the match is much larger than the size of the
diff, this is wasteful. In the above case, we would end up iterating
over the 15k-or-so files in dom/ for each of the manifests, only to
later discover that they are mostly the same. This means that running
the command above is usually slower than getting the status for the
entire repo, because that code avoids calling treemanifest.match() and
only calls treemanifest.diff(), which loads only what's needed for the
diff.

Let's fix this by reintroducing the 'all' value in match.visitdir()
and making treemanifest.match() return a lazy copy of the manifest
from dom/ and down (in the above case). This speeds up the above
command on the Firefox repo from 0.357s to 0.137s (best of 5). The
wider the match, the bigger the speedup.
2015-12-12 09:57:05 -08:00
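The idea in 50de24bc06 can be sketched with a toy tree and a toy prefix matcher. All names below are hypothetical; in particular, the sketch shares the matched subtree object directly where the commit speaks of a lazy copy:

```python
class PrefixMatcher:
    """Toy matcher for a single directory prefix such as 'dom/'."""

    def __init__(self, prefix):
        self.prefix = prefix

    def visitdir(self, dirpath):
        # dirpath is '' for the root, otherwise it ends with '/'.
        if dirpath.startswith(self.prefix):
            return "all"                        # everything below matches
        return self.prefix.startswith(dirpath)  # True only on the way down

    def __call__(self, path):
        return path.startswith(self.prefix)


class Tree:
    def __init__(self, path=""):
        self.path = path
        self.files = {}
        self.dirs = {}


def match_tree(tree, matcher, visited):
    verdict = matcher.visitdir(tree.path)
    if verdict == "all":
        return tree              # reuse the subtree without iterating its files
    result = Tree(tree.path)
    if not verdict:
        return result            # nothing below can match
    visited.append(tree.path)    # record directories whose entries we scanned
    for name, node in tree.files.items():
        if matcher(tree.path + name):
            result.files[name] = node
    for name, sub in tree.dirs.items():
        m = match_tree(sub, matcher, visited)
        if m.files or m.dirs:
            result.dirs[name] = m
    return result


def demo():
    dom = Tree("dom/")
    dom.files["a.js"] = "n1"
    js = Tree("js/")
    js.files["b.js"] = "n2"
    root = Tree("")
    root.files["README"] = "n0"
    root.dirs["dom/"] = dom
    root.dirs["js/"] = js
    visited = []
    return match_tree(root, PrefixMatcher("dom/"), visited), dom, visited
```

In demo(), only the root's own entries are scanned; the dom/ subtree is handed back untouched, so a later diff decides how much of it needs to be read.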
Martin von Zweigbergk
8efd14d515 manifest: use 't' for tree manifest flag
We currently use 'd' to indicate that a manifest entry is a
directory. Let's switch to 't', since that's not a valid hex digit and
therefore easier to spot in the raw manifest data.

This will break any existing repos with tree manifests, but it's still
an experimental feature and there are probably only a few test repos
in existence with 'd' flags.
2015-12-04 14:24:45 -08:00
Durham Goode
9980cf1bdd manifest: skip fastdelta if the change is large
In large repos, the existing manifest fastdelta computation (which performs a
bisect on the raw manifest for every file that is changing), is excessively
slow. This patch makes fastdelta fallback to the normal string delta algorithm
if the number of changes is large.

On a large repo with a commit of 8000 files, this reduces the commit time by 7
seconds (fastdelta goes from 8 seconds to 1).

I tested this change by modifying the function to compare the old and the new
values and running the test suite. The only difference is that the pure
text-diff algorithm sometimes produces smaller (but functionally identical)
deltatexts than the bisect algorithm.
2015-11-05 18:56:40 -08:00
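The trade-off in 9980cf1bdd can be illustrated on parsed entries rather than raw manifest text. This is only a sketch under stated assumptions: the helper names and the cut-off of 1000 are hypothetical, and the real code computes text deltas, which this toy does not.

```python
import bisect


def apply_by_bisect(entries, changes):
    """Apply each change with its own binary search (cheap for few changes)."""
    out = list(entries)                 # sorted list of (path, node)
    paths = [p for p, _ in out]
    for path, node in sorted(changes.items()):
        i = bisect.bisect_left(paths, path)
        present = i < len(paths) and paths[i] == path
        if node is None:                # None means "remove this path"
            if present:
                del paths[i]
                del out[i]
        elif present:
            out[i] = (path, node)
        else:
            paths.insert(i, path)
            out.insert(i, (path, node))
    return out


def apply_by_rebuild(entries, changes):
    """Rebuild everything in one pass (cheaper once changes are numerous)."""
    merged = dict(entries)
    for path, node in changes.items():
        if node is None:
            merged.pop(path, None)
        else:
            merged[path] = node
    return sorted(merged.items())


def apply_changes(entries, changes, limit=1000):
    # Hypothetical threshold: fall back to the bulk path for large changes.
    if len(changes) < limit:
        return apply_by_bisect(entries, changes)
    return apply_by_rebuild(entries, changes)
```

Both paths produce the same entries; only the cost profile differs, which is why a size-based switch is safe.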
Augie Fackler
4f77804eb0 treemanifest: rework lazy-copying code (issue4840)
The old lazy-copy code formed a chain of copied manifests with each
copy. Under typical operation, the stack never got more than a couple
of manifests deep and was fine. Under conditions like hgsubversion or
convert, the stack could get hundreds of manifests deep, and
eventually overflow the recursion limit for Python. I was able to
consistently reproduce this by converting an hgsubversion clone of
svn's history to treemanifests.

This may result in fewer manifests staying in memory during operations
like convert when treemanifests are in use, and should make those
operations faster since there will be significantly fewer noop
function calls going on.

A previous attempt (never mailed) of mine to fix this problem tried to
simply have all treemanifests only have a loadfunc - that caused
somewhat weird problems because the gettext() callable passed into
read() wasn't idempotent, so the easy solution is to have a loadfunc
and a copyfunc.
2015-09-25 22:54:46 -04:00
Augie Fackler
70820b78e7 manifest: rename treemanifest load functions to ease debugging
I'm hunting an infinite recursion bug at the moment, and having both
of these methods named just _load is muddying the waters slightly.
2015-09-25 17:18:28 -04:00
Augie Fackler
83cf95501a manifest: add id(self) to treemanifest __repr__
Also rename __str__ to __repr__ since that's what we really want for
pdb.
2015-09-25 17:17:36 -04:00
timeless@mozdev.org
4af6115a32 manifest: switch add() to heapq.merge (available in Py2.6+) 2015-09-04 05:57:58 -04:00
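For 4af6115a32, a minimal sketch of how heapq.merge can combine two sorted change lists into one ordered work list. The (path, is_removal) tuple shape and the function name are assumptions for illustration:

```python
import heapq


def merge_changes(added, removed):
    """Interleave added and removed paths in sorted order.

    Each item is (path, is_removal). heapq.merge expects its inputs to
    be sorted already, so both lists are sorted first.
    """
    adds = [(path, False) for path in sorted(added)]
    rems = [(path, True) for path in sorted(removed)]
    return list(heapq.merge(adds, rems))
```

heapq.merge is lazy and never materializes a combined list to sort, which is the appeal over concatenating and calling sorted().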
Martin von Zweigbergk
93d5f56103 manifest: use match.prefix() instead of 'not match.anypats()'
It seems clearer to check for what it is than what it isn't.
2015-05-19 11:16:20 -07:00
Martin von Zweigbergk
e9f7136157 treemanifest: lazily load manifests
Most operations on treemanifests already visit only relevant
submanifests. Notable examples include __getitem__, __contains__,
walk/matches with matcher, diff. By making submanifests lazily loaded,
we speed up all these operations.

The lazy loading is achieved by adding a _load() method that gets
defined where we currently eagerly parse the manifest. We make sure to
call it before any access to _dirs, _files or _flags.

Some timings on the Mozilla repo (with flat manifest timings for
reference):

hg cat -r . README.txt: 1.644s -> 0.096s (0.255s)
hg diff -r .^ -r .    : 1.746s -> 0.137s (0.431s)
hg files -r . python  : 1.508s -> 0.146s (0.335s)
hg files -r .         : 2.125s -> 2.203s (0.712s)
2015-04-09 17:14:35 -07:00
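The lazy loading described in e9f7136157 can be sketched with a toy tree whose nodes parse their contents only on first access. The store layout, the class name and the loads list (used here only to observe what was loaded) are hypothetical:

```python
class LazyTree:
    """Toy tree manifest; each directory is parsed on first access."""

    def __init__(self, store, path="", loads=None):
        self._store = store     # dirpath -> (files dict, list of subdir names)
        self._path = path
        self._loads = loads if loads is not None else []
        self._loaded = False
        self._files = {}
        self._dirs = {}

    def _load(self):
        # Called before any access to _files or _dirs.
        if self._loaded:
            return
        self._loaded = True
        self._loads.append(self._path)
        files, subdirs = self._store[self._path]
        self._files = dict(files)
        self._dirs = {
            d: LazyTree(self._store, self._path + d, self._loads)
            for d in subdirs
        }

    def __contains__(self, path):
        self._load()
        head, sep, rest = path.partition("/")
        if sep:
            sub = self._dirs.get(head + "/")
            return sub is not None and rest in sub
        return path in self._files
```

A lookup such as "dom/a.js" in root loads only the root and dom/; sibling directories stay unparsed, which matches the large gains for narrow operations and the lack of gain for a full listing in the timings above.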
Martin von Zweigbergk
61642a4536 treemanifest: speed up commit using dirty flag
We currently avoid saving a treemanifest revision if it's the same as
one of its parents. This is checked by comparing the generated text
for all three versions. Let's avoid that when possible by comparing
the nodeids for clean (not dirty) nodes.

On the Mozilla repo, this speeds up commit from 2.836s to 2.343s.
2015-05-18 21:31:40 -07:00
Martin von Zweigbergk
176b5e14d6 treemanifest: speed up diff by keeping track of dirty nodes
Since tree manifests have a nodeid per directory, we can avoid diffing
entire directories if they have the same nodeid. The comparison is
only valid for unmodified treemanifest instances, of course, so we
need to keep track of which have been modified. Therefore, let's add a
dirty flag to treemanifest indicating whether its nodeid can be
trusted. We set it when _files or _dirs is modified, and make diff(),
and its cousin filesnotin(), not descend into subdirectories that are
the same on both sides.

On the Mozilla repo, this speeds up 'hg diff -r .^ -r .' from 1.990s
to 1.762s. The improvement will be much larger when we start lazily
loading subdirectory manifests.
2015-02-26 08:16:13 -08:00
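The dirty-flag shortcut from 176b5e14d6 can be sketched as below. Node, setfile() and the visited list are hypothetical names; the point is only that a subtree is skipped when both sides are clean and carry the same nodeid:

```python
class Node:
    """Toy tree manifest node with a per-directory nodeid."""

    def __init__(self, nodeid, files=None, dirs=None):
        self.nodeid = nodeid
        self.files = dict(files or {})
        self.dirs = dict(dirs or {})
        self.dirty = False

    def setfile(self, name, filenode):
        self.files[name] = filenode
        self.dirty = True       # nodeid can no longer be trusted


EMPTY = Node(None)


def diff(t1, t2, visited, prefix=""):
    # Clean nodes with equal nodeids have identical contents: skip them.
    if not t1.dirty and not t2.dirty and t1.nodeid == t2.nodeid:
        return {}
    visited.append(prefix)
    result = {}
    for name in sorted(set(t1.files) | set(t2.files)):
        a, b = t1.files.get(name), t2.files.get(name)
        if a != b:
            result[prefix + name] = (a, b)
    for name in sorted(set(t1.dirs) | set(t2.dirs)):
        result.update(diff(t1.dirs.get(name, EMPTY),
                           t2.dirs.get(name, EMPTY),
                           visited, prefix + name))
    return result


def demo():
    old = Node("R1", dirs={
        "lib/": Node("L1", files={"util.c": "u1"}),
        "src/": Node("S1", files={"main.c": "f1"}),
    })
    new = Node("R2", dirs={
        "lib/": Node("L1", files={"util.c": "u1"}),
        "src/": Node("S2", files={"main.c": "f2"}),
    })
    visited = []
    return diff(old, new, visited), visited
```

In demo(), lib/ is never descended into because both sides report the same nodeid and neither has been modified in memory.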
Drew Gottlieb
ca0e804650 match: remove unnecessary optimization where visitdir() returns 'all'
Match's visitdir() was prematurely optimized to return 'all' in some cases, so
that the caller would not have to call it for directories within the current
directory. This change makes the visitdir system less flexible for future
changes, such as making visitdir consider the match's include and exclude
patterns.

As a demonstration of this optimization not actually improving performance,
I ran 'hg files -r . media' on the Mozilla repository, stored as treemanifest
revlogs.

With best of ten tries, the command took 1.07s both with and without the
optimization, even though the optimization reduced the calls to visitdir()
from 987 to 51.
2015-05-06 15:59:35 -07:00
Martin von Zweigbergk
f569f9222c treemanifest: cache directory logs and manifests
Since manifest instances are cached on the manifest log instance, we
can cache directory manifests by caching the directory manifest
logs. The directory manifest log cache is a plain dict, so it never
expires; we assume that we can keep all the directories in memory.

The cache is kept on the root manifestlog, so access to directory
manifest logs now has to go through the root manifest log.

The caching will soon not be only an optimization. When we start
lazily loading directory manifests, we need to make sure we don't
create multiple instances of the log objects. The caching takes care
of that problem.
2015-04-10 23:12:33 -07:00
Augie Fackler
9c2e980a64 cleanup: use __builtins__.all instead of util.all 2015-05-16 14:34:19 -04:00
Martin von Zweigbergk
decbcc4c31 treemanifest: add --dir option to debug{revlog,data,index}
It should be possible to debug the submanifest revlogs without having
to know where they are stored (in .hg/store/meta/), so let's add a
--dir option for this purpose.
2015-04-12 23:51:06 -07:00
Martin von Zweigbergk
1acf6c029c treemanifest: store submanifest revlog per directory
With this change, when tree manifests are enabled (in .hg/requires),
commits will be written with one manifest revlog per directory. The
manifest revlogs are stored in
.hg/store/meta/$dir/00manifest.[id].

Flat manifests can still be read and interacted with as usual (they
are also read into treemanifest instances). The functionality for
writing treemanifest as a flat manifest to disk is still left in the
code; tests still pass with '_treeinmem=True' hardcoded.

Exchange is not yet implemented.
2015-04-13 23:21:02 -07:00
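The storage layout named in 1acf6c029c (.hg/store/meta/$dir/00manifest.[id]) can be expressed as a small path helper. The function name is hypothetical and the paths are relative to .hg/store; this is a sketch of the naming scheme only, not of Mercurial's store encoding:

```python
def manifest_revlog_paths(directory=""):
    """Return (index, data) revlog paths for a directory's manifest.

    The root manifest keeps its usual name; subdirectory manifests
    live under meta/<dir>/.
    """
    if directory and not directory.endswith("/"):
        directory += "/"
    prefix = "meta/" + directory if directory else ""
    base = prefix + "00manifest"
    return base + ".i", base + ".d"
```

For example, manifest_revlog_paths("b") yields an index path of meta/b/00manifest.i, the same form that appears in the 'hg verify' message quoted in b2b4f9e694.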
Martin von Zweigbergk
7d233de844 treemanifest: set requires at repo creation time, ignore config after
The very next changeset will start writing one revlog per directory
when tree manifests are enabled. That is backwards incompatible, so it
requires .hg/requires to be updated. Just like with generaldelta, we
want to update .hg/requires only when the repo is created. Updating
.hg/requires is bad for repos on shared disk. Instead, those who do
want to upgrade a repo to using treemanifest (or manifestv2, etc) can
run

  hg clone --config experimental.treemanifest repo clone


which will create a new repo with the requirement set. Unlike the case
of e.g. generaldelta, it will not rewrite the changesets, since tree
manifests hash differently.
2015-05-05 08:40:59 -07:00
Augie Fackler
504ab1d1d6 manifest: document return type of readfast()
I keep having to ponder out what readfast() means, and it always
surprises me. Document the return type in the docstring so that future
readers won't have to puzzle this out again.
2015-04-28 12:31:30 -04:00
Martin von Zweigbergk
35368e1596 treemanifest: extract parse method from constructor
When we start to lazily load submanifests, it will be useful to be
able to create a treemanifest instance before manifest data gets
parsed into it. To prepare for this, extract the parsing code from
treemanifest's constructor to a separate method.
2015-04-12 23:01:18 -07:00
Martin von Zweigbergk
c1ccc70121 manifest: duplicate call to addrevision()
When we start writing submanifests to their own revlogs, we will not
want to write a new revision for a directory if there were no changes
to it. To prepare for this, duplicate the call to addrevision() and
move them earlier where they can more easily be avoided.
2015-04-12 14:37:55 -07:00
Martin von Zweigbergk
63f47478d7 treemanifest: separate flags for trees in memory and trees on disk
When we start writing tree manifests with one manifest revlog per
directory, it will still be nice to be able to run tests using tree
manifests in memory but writing a flat manifest to a single
revlog. Let's break the current '_usetreemanifest' flag on the revlog
into '_treeinmem' and '_treeondisk'. Both are populated from the same
config, but after this change, one can temporarily hard-code
_treeinmem=True to see that tests still pass.
2015-04-10 18:54:33 -07:00
Martin von Zweigbergk
320b8b5298 manifestdict: drop empty-string argument when creating empty manifest
manifestdict() creates an empty manifestdict, so let's consistently
use that instead of explicitly parsing an empty string (which does
result in an empty manifest).
2015-04-10 18:13:01 -07:00
Martin von Zweigbergk
4c187a8462 manifestdict: extract condition for _intersectfiles() and use for walk()
The condition on which manifestdict.matches() and manifestdict.walk()
take the fast path of iterating over files instead of the manifest, is
slightly different. Specifically, walk() does not take the fast path
for exact matchers and it does not avoid taking the fast path when
there are more than 100 files. Let's extract the condition so we don't
have to maintain it in two places and so walk() can gain these two
missing pieces of the condition (although there seems to be no current
caller of walk() with an exact matcher).
2015-04-08 09:38:09 -07:00
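The shared condition described in 4c187a8462 might look roughly like the sketch below. ToyMatcher is a hypothetical stand-in, and the exact shape of the predicate (including the membership test on the listed files) is an assumption; only the exact-matcher case and the limit of 100 files come from the commit message.

```python
FASTPATH_LIMIT = 100    # "more than 100 files" disables the fast path


class ToyMatcher:
    """Hypothetical matcher exposing only the methods used below."""

    def __init__(self, files, exact=False, anypats=False):
        self._files = list(files)
        self._exact = exact
        self._anypats = anypats

    def files(self):
        return self._files

    def isexact(self):
        return self._exact

    def anypats(self):
        return self._anypats


def files_fastpath(manifest, match):
    """Decide whether to iterate over match.files() instead of the manifest."""
    files = match.files()
    if len(files) > FASTPATH_LIMIT:
        return False            # too many files: walking the manifest wins
    if match.isexact():
        return True
    # Assumed: without patterns, every listed name must be a known file.
    return not match.anypats() and all(f in manifest for f in files)
```

Keeping the predicate in one function means matches() and walk() cannot drift apart again.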
Martin von Zweigbergk
2592408744 manifestdict.walk: remove now-redundant check for match.files()
When checking whether we can take the fast path of iterating over
matcher files instead of manifest files, we check whether
match.files() is non-empty. However, now that we return early for
match.always(), it can only be empty when there are only
include/exclude patterns, but in that case anypats() will be True, so
it's already covered. This makes manifestdict.walk() more similar to
manifestdict.matches().
2015-04-07 22:40:25 -07:00
Martin von Zweigbergk
67897f5b0b manifest.walk: special-case match.always() for speed
This cuts down the run time of

  hg files -r . > /dev/null

from ~0.850s to ~0.780s on the Firefox repo. Note that
manifest.matches() already has the corresponding optimization.
2015-04-07 21:08:23 -07:00
Martin von Zweigbergk
fc5772e190 manifest.walk: use return instead of StopIteration in generator
Using "return" within a generator is supposedly more Pythonic than
raising StopIteration.
2015-04-07 22:36:17 -07:00
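The style point in fc5772e190 has since become a correctness point: under PEP 479 (the default behaviour from Python 3.7), a StopIteration raised inside a generator body is turned into a RuntimeError. A minimal sketch with two hypothetical generators that stop at a None sentinel:

```python
def walk_return(files):
    """Stop the generator with a plain return."""
    for f in files:
        if f is None:
            return              # ends iteration cleanly
        yield f


def walk_raise(files):
    """Stop the generator by raising StopIteration (discouraged)."""
    for f in files:
        if f is None:
            raise StopIteration  # becomes RuntimeError on Python 3.7+
        yield f
```

Consuming walk_return() simply ends at the sentinel, whereas consuming walk_raise() on a modern interpreter fails once the sentinel is reached.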
Drew Gottlieb
6d2651f8ba treemanifest: optimize treemanifest._walk() to skip directories
This makes treemanifest.walk() not visit submanifests that are known not to
have any matching files. It does this by calling match.visitdir() on
submanifests as it walks.

This change also updates largefiles to be able to work with this new behavior
in treemanifests. It overrides match.visitdir(), the function that dictates
how walk() and matches() skip over directories.

The greatest speed improvements are seen with narrower scopes. For example,
this commit speeds up the following command on the Mozilla repo from 1.14s
to 1.02s:
  hg files -r . dom/apps/

Whereas with a wider scope, dom/, the speed only improves from 1.21s to 1.13s.

As with a similar optimization to treemanifest.matches(), this change
will bring out even bigger performance improvements once treemanifests are
loaded lazily. Once that happens, we won't just skip over looking at
submanifests, but we'll skip even loading them.
2015-04-07 15:18:52 -07:00
Martin von Zweigbergk
89a5bacd48 manifest.walk: join nested if-conditions
This makes it more closely match the similar condition in
manifestdict.matches().
2015-04-07 22:35:44 -07:00
Martin von Zweigbergk
eff6f72dc8 manifestdict: inline _intersectfiles()
The _intersectfiles() method is only called from one place, it's
pretty short, and its caller has to be aware when it's appropriate to
call it (when the number of files in the matcher is not too large), so
let's inline it.
2015-04-08 10:01:31 -07:00
Martin von Zweigbergk
1430a21750 manifestdict._intersectfiles: avoid one level of property indirection
We have already bothered to extract "lm = self._lm", so let's use "lm"
where possible.
2015-04-08 10:03:59 -07:00
Martin von Zweigbergk
0d47282240 manifestdict.matches: avoid name 'lm' for a not-lazymanifest 2015-04-08 10:06:05 -07:00
Drew Gottlieb
d67091a36f treemanifest: refactor treemanifest.walk()
This refactor is a preparation for an optimization in the next commit. This
introduces a recursive element that recurses into each submanifest. By using a
recursive function, the next commit can avoid walking over some subdirectories
altogether.
2015-04-07 15:18:52 -07:00
Drew Gottlieb
ee2eebcb93 manifest: move changectx.walk() to manifests
The logic of walking a manifest to yield files matching a match object is
currently being done by context, not the manifest itself. This moves the walk()
function to both manifestdict and treemanifest. This separate implementation
will also permit differing, optimized implementations for each manifest.
2015-04-07 15:18:52 -07:00
Drew Gottlieb
d01f641c75 treemanifest: further optimize treemanifest.matches()
The matches function was previously traversing all submanifests to look for
matching files, even though it was possible to know if a submanifest won't
contain any matches.

This change adds a visitdir function on the match object to decide quickly if
a directory should be visited when traversing. The function also decides if
_all_ subdirectories should be traversed.

Adding this logic as methods on the match object also makes the logic
modifiable by extensions, such as largefiles.

An example of a command this speeds up is running
  hg status --rev .^ python/
on the Mozilla repo with the treemanifest experiment enabled.
It goes from 2.03s to 1.85s.

More improvements to speed from this change will happen when treemanifests are
lazily loaded. Because a flat manifest is still loaded and then converted
into treemanifests, speed improvements are limited.

This change has no negative effect on speed. For a worst-case example, this
command is not negatively impacted:
  hg status --rev .^ 'relglob:*.js'
on the Mozilla repo. It goes from 2.83s to 2.82s.
2015-04-06 10:51:53 -07:00
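The pruning idea in d01f641c75 can be sketched as follows. This is a minimal illustration, not Mercurial's code: the nested-dict tree, the `PrefixMatcher` class and the `matching_paths` helper are all hypothetical, and the "all" return value is an assumption about how a matcher might signal that a whole subtree matches.

```python
class PrefixMatcher:
    """Hypothetical matcher that accepts every file under one directory."""

    def __init__(self, prefix):
        self.prefix = prefix.rstrip("/") + "/"

    def __call__(self, path):
        return path.startswith(self.prefix)

    def visitdir(self, dirpath):
        # dirpath is "" for the root, otherwise it ends with "/".
        if dirpath.startswith(self.prefix):
            return "all"  # every file below this directory matches
        # Otherwise only ancestors of the prefix are worth visiting.
        return self.prefix.startswith(dirpath)


def matching_paths(tree, matcher, dirpath="", visited=None):
    """Collect matching files from a nested dict, skipping pruned subtrees."""
    verdict = matcher.visitdir(dirpath)
    if not verdict:
        return []  # the whole subdirectory is skipped without being walked
    if visited is not None:
        visited.append(dirpath)
    found = []
    for name in sorted(tree):
        entry = tree[name]
        path = dirpath + name
        if isinstance(entry, dict):
            found.extend(matching_paths(entry, matcher, path + "/", visited))
        elif verdict == "all" or matcher(path):
            found.append(path)
    return found
```

With a matcher for "python", a sibling directory such as "js/" is rejected by visitdir() before any of its files are examined.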
Drew Gottlieb
901ac5e726 util: move dirs() and finddirs() from scmutil to util
An upcoming commit requires that match.py be able to call scmutil.dirs(), but
when match.py imports scmutil, a dependency cycle is created. This commit
avoids the cycle by moving dirs() and its related finddirs() function from
scmutil to util, which match.py already depends on.
2015-04-06 14:36:08 -07:00
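For readers unfamiliar with the two helpers moved in 901ac5e726, here is a rough sketch of what they do, assuming finddirs() yields every ancestor directory of a path and dirs() keeps a reference count per directory. The `Dirs` class below is a simplified stand-in written for illustration, not the actual implementation.

```python
from collections import Counter


def finddirs(path):
    """Yield each ancestor directory of path, deepest first."""
    pos = path.rfind("/")
    while pos != -1:
        yield path[:pos]
        pos = path.rfind("/", 0, pos)


class Dirs:
    """Simplified multiset of the directories implied by a set of file paths."""

    def __init__(self, paths=()):
        self._counts = Counter()
        for path in paths:
            self.addpath(path)

    def addpath(self, path):
        for d in finddirs(path):
            self._counts[d] += 1

    def delpath(self, path):
        for d in finddirs(path):
            self._counts[d] -= 1
            if self._counts[d] <= 0:
                del self._counts[d]  # last file under d is gone

    def __contains__(self, d):
        return d in self._counts
```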
Martin von Zweigbergk
eeace59f46 treemanifest: disable readdelta optimization
When tree manifests are stored with one revlog per directory and
loaded lazily, it's unclear how much readdelta will help. If only a
few files change, then only a small part of the full manifest will be
loaded, and the delta chains should also be shorter for tree
manifests. Therefore, let's disable readdelta for tree manifests for
now.
2015-03-10 09:57:42 -07:00
Martin von Zweigbergk
ebd2a39ab3 manifestv2: add support for writing new manifest format
If .hg/requires has 'manifestv2', the manifest will be written using
the new format.
2015-03-31 14:01:33 -07:00
Martin von Zweigbergk
c5433d6da0 manifestv2: add support for reading new manifest format
The new manifest format is designed to be smaller, in particular to
produce smaller deltas. It stores hashes in binary and puts the hash
on a new line (for smaller deltas). It also uses stem compression to
save space for long paths. The format has room for metadata, but
that's there only for future-proofing. The parser thus accepts any
metadata and throws it away. For more information, see
http://mercurial.selenic.com/wiki/ManifestV2Plan.

The current manifest format doesn't allow an empty filename, so we use
an empty filename on the first line to tell a manifest of the new
format from the old. Since we still never write manifests in the new
format, the added code is unused, but it is tested by
test-manifest.py.
2015-03-27 22:26:41 -07:00
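The stem compression mentioned in c5433d6da0 can be illustrated generically. The sketch below is an assumption-laden toy, not the manifest v2 encoding: it stores each path as the length of the prefix it shares with the previous path plus the remaining suffix. The function names and the tuple representation are hypothetical.

```python
def stem_compress(paths):
    """Encode sorted paths as (shared_prefix_length, suffix) pairs."""
    out = []
    prev = ""
    for path in paths:
        n = 0
        limit = min(len(prev), len(path))
        while n < limit and prev[n] == path[n]:
            n += 1
        out.append((n, path[n:]))
        prev = path
    return out


def stem_expand(pairs):
    """Invert stem_compress()."""
    paths = []
    prev = ""
    for n, suffix in pairs:
        prev = prev[:n] + suffix
        paths.append(prev)
    return paths
```

Because manifest paths are sorted, neighbouring entries tend to share long directory prefixes, which is why this kind of encoding saves space for long paths.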
Martin von Zweigbergk
e931247479 manifestv2: set requires at repo creation time
While it should be safe to switch to the new manifest format on an
existing repo, let's keep it simple for now and make the configuration
have any effect only at repo creation time. If the configuration is
enabled then (at repo creation), we add an entry to requires and read
that instead of the configuration from then on.
2015-03-31 22:45:45 -07:00
Drew Gottlieb
f02ce7c1fd treemanifest: make treemanifest.matches() faster
By converting treemanifest.matches() into a recursively additive operation,
it becomes O(n).

The old matches function made a copy of the entire manifest and deleted
files that didn't match. With tree manifests, this was an O(n log n) operation
because del() was O(log n).

This change speeds up the command
  hg status --rev .^ 'relglob:*.js'
on the Mozilla repo, now taking 2.53s, down from 3.51s.
2015-03-30 18:10:59 -07:00
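The "recursively additive" approach from f02ce7c1fd can be sketched like this, assuming a manifest modelled as a nested dict. `filter_tree` is a hypothetical name, and this is an illustration of the idea rather than the treemanifest code: a new tree is built by adding matching files, instead of copying everything and deleting the non-matches.

```python
def filter_tree(tree, match):
    """Return a new nested dict holding only files for which match(path) is true."""

    def build(node, prefix):
        out = {}
        for name, entry in node.items():
            path = prefix + name
            if isinstance(entry, dict):
                sub = build(entry, path + "/")
                if sub:  # leave empty submanifests out of the result
                    out[name] = sub
            elif match(path):
                out[name] = entry
        return out

    return build(tree, "")
```

Each entry is visited once and inserted at most once, so the work is linear in the number of entries, and the input tree is never modified.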
Drew Gottlieb
5843babb9b treemanifest: add treemanifest._isempty()
During operations that involve building up a new manifest tree, it will be
useful to be able to quickly check if a submanifest is empty, and if so, to
avoid including it in the final tree. Doing this check lets us avoid creating
treemanifest structures that contain any empty submanifests.
2015-03-30 17:21:49 -07:00
Drew Gottlieb
84f08f1d56 treemanifest: remove treemanifest._intersectfiles()
In preparation for the optimization in the following commit, this commit
removes treemanifest.matches()'s call to _intersectfiles(), and removes
_intersectfiles() itself since it's unused at this point.
2015-03-27 13:16:13 -07:00
Martin von Zweigbergk
7dfcb254e9 manifestv2: implement slow readdelta() without revdiff
For manifest v2, revlog.revdiff() usually does not provide enough
information to produce a manifest. As a simple workaround, implement
readdelta() by reading both the old and the new manifest and use
manifest.diff() to find the difference. This is several times slower
than the current readdelta() for v1 manifests, but there seems to be
no other simple option, and this is still much faster than returning
the full manifest (at least for verify).
2015-03-27 20:41:30 -07:00
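The workaround in 7dfcb254e9 amounts to reading two full manifests and comparing them. A minimal sketch, assuming manifests are plain dicts and `read` is any callable that returns the manifest for a node (both are stand-ins for illustration). As a simplifying assumption, only added or changed entries are reported.

```python
def slow_readdelta(read, node, parent):
    """Return entries of `node`'s manifest that are new or changed versus `parent`."""
    old = read(parent)
    new = read(node)
    return {path: entry for path, entry in new.items() if old.get(path) != entry}
```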
Martin von Zweigbergk
b7bfa722d1 manifestv2: disable fastdelta optimization
We may add support for the fastdelta optimization for manifest v2 at a
later point, but let's disable it for now, so we don't have to
implement it right away.
2015-03-27 17:07:24 -07:00
Martin von Zweigbergk
16d87fc88d manifestv2: add (unused) config option
With tree manifests, hashes will change anyway, so now is a good time
to also take up the old plans of a new manifest format. While there
should be little or no reason to use tree manifests with the current
manifest format (v1) once the new format (v2) is supported, we'll try
to keep the two dimensions (flat/tree and v1/v2) separate.

In preparation for adding the new format, let's add configuration
for it and propagate that configuration to the manifest revlog
subclass. The new configuration ("experimental.manifestv2") says in
what format to write the manifest data. We may later add other
configuration to choose how to hash it, either keeping the v1 hash for
BC or hashing the v2 content.

See http://mercurial.selenic.com/wiki/ManifestV2Plan for more details.
2015-03-27 16:19:44 -07:00
Martin von Zweigbergk
a7479ae566 manifest: extract method for creating manifest text
Similar to the previous change, this one extracts a method for
producing a manifest text from an iterator over (path, node, flags)
tuples.
2015-03-27 15:37:46 -07:00
Martin von Zweigbergk
a16ddaed87 manifest: extract method for parsing manifest
By extracting a method that generates (path, node, flags) tuples, we
can reuse the code for parsing a manifest without doing it via a
_lazymanifest like treemanifest currently does. It also prepares for
parsing the new manifest format.

Note that this makes parsing into treemanifest slower, since the
parsing is now always done in pure Python. Since treemanifests will be
expected (or even forced) to be used only with the new manifest
format, parsing via _lazymanifest was not an option anyway.
2015-03-27 15:02:43 -07:00
Martin von Zweigbergk
35cf546efe _lazymanifest: drop unnecessary call to sorted()
The entries returned from _lazymanifest.iterentries() are already
sorted.
2015-03-27 20:55:54 -07:00
Drew Gottlieb
d2ab66f723 manifest: make manifest.intersectfiles() internal
manifest.intersectfiles() is just a utility used by manifest.matches(), and
a future commit removes intersectfiles for treemanifest for optimization
purposes.

This commit makes the intersectfiles methods on manifestdict and treemanifest
internal, and converts its test to a more generic testMatches(), which has the
exact same coverage.
2015-03-30 10:43:52 -07:00
Martin von Zweigbergk
c7787f3a4e treemanifest: drop 22nd byte for consistency with manifestdict
When assigning a 22-byte hash to a nodeid in a manifest, manifestdict
drops the 22nd byte, while treemanifest keeps it. Let's make
treemanifest drop the 22nd byte as well.
2015-03-26 09:42:21 -07:00
Martin von Zweigbergk
8874cd66a5 match: add isexact() method to hide internals
Comparing a function reference seems bad.
2014-10-29 08:43:39 -07:00
Martin von Zweigbergk
db97ff212f treemanifest: make hasdir() faster
Same rationale as the previous change.
2015-03-16 16:01:16 -07:00
Martin von Zweigbergk
6aeabac9d6 treemanifest: make filesnotin() faster
Same rationale as the previous change.
2015-03-03 13:50:06 -08:00
Martin von Zweigbergk
a417c46247 treemanifest: make diff() faster
Containment checking is slower in treemanifest than it is in
manifestdict, making the current diff algorithm O(n log n). By
traversing both treemanifests in parallel, we can make it O(n). More
importantly, once we start lazily loading submanifests, we will be
able to easily skip entire submanifests if they have the same nodeid.
2015-02-19 17:13:35 -08:00
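A rough sketch of the parallel traversal described in a417c46247, assuming manifests modelled as nested dicts that map names to either a nodeid or a sub-dict. `diff_trees` and the (old, new) result format are hypothetical simplifications; flags are omitted.

```python
def diff_trees(t1, t2, prefix=""):
    """Return {path: (old, new)} for files that differ; None marks absence."""
    result = {}
    for name in set(t1) | set(t2):
        e1, e2 = t1.get(name), t2.get(name)
        # Split each side into its directory part and its file part.
        d1 = e1 if isinstance(e1, dict) else {}
        d2 = e2 if isinstance(e2, dict) else {}
        if d1 or d2:
            # Descend into both subtrees together instead of testing
            # containment file by file.
            result.update(diff_trees(d1, d2, prefix + name + "/"))
        f1 = None if isinstance(e1, dict) else e1
        f2 = None if isinstance(e2, dict) else e2
        if f1 != f2:
            result[prefix + name] = (f1, f2)
    return result
```

If each subtree also carried its own nodeid, the recursive call could be skipped whenever the two nodeids are equal, which is the later optimization the commit message anticipates.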
Martin von Zweigbergk
4c03dc48c7 treemanifest: store directory path in treemanifest nodes
This leads to less concatenation while iterating, and it's useful for
debugging.
2015-02-23 10:57:57 -08:00
Martin von Zweigbergk
8790c2008e treemanifest: add configuration for using treemanifest type
This change adds boolean configuration option
experimental.treemanifest. When the option is enabled, manifests are
parsed into the new treemanifest type.

Tests can be now run using treemanifest by switching the config option
default in localrepo._applyrequirements(). Tests pass even when made
to randomly choose between manifestdict and treemanifest, suggesting
that the two types produce identical manifests (so e.g. a manifest
revlog entry written from a treemanifest can be parsed by the
manifestdict code).
2015-03-19 11:07:57 -07:00
Martin von Zweigbergk
4ab8e2d4fe treemanifest: create treemanifest class
There are a number of problems with large and flat manifests. Copying
from http://mercurial.selenic.com/wiki/ManifestShardingPlan:

 * manifest too large for RAM

 * manifest resolution too much CPU (long delta chains)

 * committing is slow because entire manifest has to be hashed

 * impossible for narrow clone to leave out part of manifest as all is
   needed to calculate new hash

 * diffing two revisions involves traversing entire subdirectories
   even if identical

This is a first step in a series introducing a manifest revlog per
directory.

This change adds a new manifest class: treemanifest, which is a tree
where each node has a dict of files (nodeids), a dict of flags, and a
dict of subdirectories (treemanifests). So far, it behaves just like
manifestdict, but it will later help us write one manifest revlog per
directory. The new class is still unused; it will be used after the
next change.

The code is not yet optimized. Running with it (see below) makes most
or all operations slower. Once we start storing manifest revlogs for
every directory, it should be possible to make many of these
operations much faster. The fastdelta() optimization has been
intentionally not implemented for the treemanifests. We can implement
it later if necessary.

All tests pass when run with the following patch (and without, of
course):

  --- a/mercurial/manifest.py     Thu Mar 19 11:08:42 2015 -0700
  +++ b/mercurial/manifest.py     Thu Mar 19 11:15:50 2015 -0700
  @@ -596,7 +596,7 @@ class manifest(revlog.revlog):
               return None, None

       def add(self, m, transaction, link, p1, p2, added, removed):
  -        if p1 in self._mancache:
  +        if False and p1 in self._mancache:
               # If our first parent is in the manifest cache, we can
               # compute a delta here using properties we know about the
               # manifest up-front, which may save time later for the
  @@ -626,3 +626,5 @@ class manifest(revlog.revlog):
           self._mancache[n] = (m, arraytext)

           return n
  +
  +manifestdict = treemanifest
2015-03-19 11:08:42 -07:00
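The data structure described in 4ab8e2d4fe (a dict of files, a dict of flags and a dict of subdirectories per node) can be sketched as below. `TinyTreeManifest` is a hypothetical toy written for illustration; it is far smaller than the real class and makes no claim to match its API.

```python
class TinyTreeManifest:
    """Toy tree manifest: each node holds files, flags and subdirectories."""

    def __init__(self):
        self._files = {}  # filename -> nodeid
        self._flags = {}  # filename -> flag string
        self._dirs = {}   # dirname -> TinyTreeManifest

    def __setitem__(self, path, node):
        dirname, sep, rest = path.partition("/")
        if sep:
            # Delegate to the submanifest, creating it on demand.
            self._dirs.setdefault(dirname, TinyTreeManifest())[rest] = node
        else:
            self._files[path] = node

    def __getitem__(self, path):
        dirname, sep, rest = path.partition("/")
        if sep:
            return self._dirs[dirname][rest]
        return self._files[path]

    def __contains__(self, path):
        try:
            self[path]
        except KeyError:
            return False
        return True

    def __iter__(self):
        # Directories sort as "name/" so output matches full-path order.
        names = list(self._files) + [d + "/" for d in self._dirs]
        for name in sorted(names):
            if name.endswith("/"):
                for sub in self._dirs[name[:-1]]:
                    yield name + sub
            else:
                yield name
```

From the outside it behaves like a flat mapping from full paths to nodeids, which is what allows it to stand in for manifestdict.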
Durham Goode
e4183e1549 manifest: avoid intersectfiles for matches > 100 files
Previously we tried to avoid manifest.intersectfiles for exact matches
with less than 100 files. However, when the left side of the "or" is false,
the right side gets evaluated, of course, and the evaluation of "util.all(fn
in self for fn in files)" is both costly in itself, and likely to be true,
causing intersectfiles() to be called after all. Fix this by moving the
check for less than 100 files outside of the "or" expression, thereby also
making it apply for a non-exact matcher, should one be passed in.
2015-03-18 15:59:45 -07:00
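The short-circuit problem fixed in e4183e1549 can be shown with two hypothetical predicates. These are not the original conditions from manifest.py, only a sketch of the logic the message describes: in the first form the size check guards only the left operand of the "or", so the expensive right operand still runs for large file lists.

```python
def should_intersect_before(is_exact, files, manifest, threshold=100):
    # The size check guards only the left operand. When it is false, the
    # costly all(...) scan still runs and usually returns True.
    return (is_exact and len(files) < threshold) or all(f in manifest for f in files)


def should_intersect_after(is_exact, files, manifest, threshold=100):
    # The size check is hoisted out of the "or", so large file lists skip
    # the scan entirely and the check also applies to non-exact matchers.
    return len(files) < threshold and (is_exact or all(f in manifest for f in files))
```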
Matt Mackall
ce4c2d6512 manifest: speed up matches for large sets of files
If the number of files being matched is large, the bisection overhead
can dominate, which caused a performance regression for revert --all
and histedit. This introduces a (fairly arbitrary) cross-over from
using bisections to bulk search.
2015-03-18 13:37:18 -05:00
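The cross-over described in ce4c2d6512 can be sketched as follows, assuming the manifest is available as a sorted list of filenames. `matching_files` and its `crossover` parameter are hypothetical, and the default of 100 is simply an illustrative value: small queries are answered by bisecting once per file, large ones by a single pass over the manifest.

```python
import bisect


def matching_files(sorted_names, wanted, crossover=100):
    """Return the wanted names that exist in sorted_names, in sorted order."""
    if len(wanted) < crossover:
        # Few files: one O(log n) bisection per wanted name.
        found = []
        for name in sorted(set(wanted)):
            i = bisect.bisect_left(sorted_names, name)
            if i < len(sorted_names) and sorted_names[i] == name:
                found.append(name)
        return found
    # Many files: one linear pass with constant-time set lookups.
    wanted_set = set(wanted)
    return [name for name in sorted_names if name in wanted_set]
```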
Drew Gottlieb
31ae70b088 manifest: add manifestdict.hasdir() method
Allows for alternative implementations of manifestdict to decide if a directory
exists in whatever way is most optimal.
2015-03-13 15:25:01 -07:00
Drew Gottlieb
30b3b3df39 manifest: add dirs() to manifestdict
A manifest should have a method for accessing its own dirs, not just the
context that references the manifest. This makes it easier for other
optimized versions of manifests to compute their own dirs in the most efficient
way.
2015-03-13 15:19:54 -07:00
Martin von Zweigbergk
ce0723ee16 lazymanifest: make __iter__ generate filenames, not 3-tuples
The _lazymanifest type(s) behave very much like a sorted dict with
filenames as keys and (nodeid, flags) as values. It therefore seems
surprising that its __iter__ generates 3-tuples of (path, nodeid,
flags). Let's make it match dict's behavior of generating the keys
instead, and add a new iterentries method for the 3-tuples. With this
change, the "x" in "if x in lm" and "for x in lm" now have the same
type (a filename string).
2015-03-12 18:18:29 -07:00
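The iteration contract described in ce0723ee16 can be sketched with a small stand-in class. `SortedManifest` is hypothetical and only mirrors the behaviour the message describes: iteration yields filenames, like a dict, while iterentries() yields the (path, nodeid, flags) 3-tuples.

```python
class SortedManifest:
    """Toy sorted mapping of filename -> (nodeid, flags)."""

    def __init__(self, entries):
        self._data = dict(entries)

    def __contains__(self, path):
        return path in self._data

    def __iter__(self):
        # Like dict, iteration yields keys (filenames), here in sorted order.
        return iter(sorted(self._data))

    def iterentries(self):
        # The 3-tuples previously produced by __iter__ live here instead.
        for path in sorted(self._data):
            node, flags = self._data[path]
            yield path, node, flags
```

With this split, "x in lm" and "for x in lm" both deal in filename strings.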
Martin von Zweigbergk
e77f997074 lazymanifest: fix pure hg iterkeys()
I broke pure hg when I just added iterkeys() to the native version in
461b7a7f595b. I forgot to make the pure version sorted. Fix it.
2015-03-12 18:53:44 -07:00
Martin von Zweigbergk
2421174b13 lazymanifest: add iterkeys() method
So we don't have to iterate over (path, node, flags) tuples only to
throw away the node and flags.
2015-03-11 13:46:15 -07:00
Martin von Zweigbergk
8c90477dfa manifest: rewrite find(node, f) in terms of read(node)
Since find() now always works with a full manifest, we can simplify by
calling read() to give us that manifest. That way, we also populate
the manifest cache. However, now that we no longer parse the manifest
text into a Python type (thanks, lazymanifest/Augie), the cost of
parsing (scanning for newlines, really) is small enough that it seems
generally drowned by revlog reading.
2015-03-11 08:28:56 -07:00
Martin von Zweigbergk
a3602d8ea4 manifest: don't let find() look inside manifestdict
The find() method is currently implemented by looking inside the _lm
field of the manifest dict. Future manifest types (tree manifests)
may not have such a field, so add a method for getting to the data
instead.
2015-03-10 16:26:13 -07:00
Augie Fackler
f060f96a67 manifest: use custom C implementation of lazymanifest
This version is actually lazy, unlike the pure-python version. The
latter could stand to be optimized if anyone actually wants to use it
seriously. I put no work into it.

Before any of my related changes on mozilla-central:

perfmanifest tip
! wall 0.268805 comb 0.260000 user 0.260000 sys 0.000000 (best of 37)
perftags
! result: 162
! wall 0.007099 comb 0.000000 user 0.000000 sys 0.000000 (best of 401)
perfstatus
! wall 0.415680 comb 0.420000 user 0.260000 sys 0.160000 (best of 24)
hgperf export tip
! wall 0.142118 comb 0.140000 user 0.140000 sys 0.000000 (best of 67)

after all of my changes on mozilla-central:

./hg:
perfmanifest tip
! wall 0.232640 comb 0.230000 user 0.220000 sys 0.010000 (best of 43)
perftags
! result: 162
! wall 0.007057 comb 0.010000 user 0.000000 sys 0.010000 (best of 395)
perfstatus
! wall 0.415503 comb 0.420000 user 0.280000 sys 0.140000 (best of 24)
hgperf export tip
! wall 0.025096 comb 0.030000 user 0.030000 sys 0.000000 (best of 102)

so it's no real change in performance on perf{manifest,tags,status},
but is a huge win on 'hgperf export tip'.

There's a little performance work that could still be done here:
fastdelta() could be done significantly more intelligently by using
the internal state of the lazymanifest type in C, but that seems like
good future work.
2015-03-06 21:29:47 -05:00
Augie Fackler
d1ec34adfe manifest: split manifestdict into high-level and low-level logic
The low-level logic type (_lazymanifest) matches the behavior of the C
implementation introduced in a5f1bccd. A future patch will use that
when available.
2015-03-07 12:04:39 -05:00