I'm not entirely happy with using a trailing / on a "file" entry for
transferring a treemanifest. We've discussed putting some flags on
each file header[0], but I'm unconvinced that's actually any better:
if we were going to add another feature to the cg format we'd still be
doing a version bump anyway to cg4, so I'm inclined to not spend time
coming up with a more sophisticated format until we actually know what
the next feature we want to stuff in a changegroup will be.
Test changes outside test-treemanifest.t are only due to the new CG3
bundlecap showing up in the wire protocol.
Many thanks to adgar@google.com and martinvonz@google.com for helping
me with various odd corners of the changegroup and treemanifest API.
0: It's not a hard refactoring, nor is it a lot of work. I'm just
disinclined to do speculative work when it's not clear what the
customer would actually be.
We currently use 'd' to indicate that a manifest entry is a
directory. Let's switch to 't', since that's not a valid hex digit and
therefore easier to spot in the raw manifest data.
This will break any existing repos with tree manifests, but it's still
an experimental feature and there are probably only a few test repos
in existence with 'd' flags.
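A minimal sketch of why 't' stands out in raw manifest data. It assumes the flat manifest line layout of path, NUL byte, 40 hex digits of node, then optional flags; parse_manifest_line is a hypothetical helper for illustration, not Mercurial's own parser.

```python
import string

# The characters that can appear in a hex node: '0'-'9' and 'a'-'f'.
HEXDIGITS = set(string.hexdigits.lower())


def parse_manifest_line(line):
    """Split one raw manifest line into (path, hexnode, flags).

    Hypothetical helper. Assumes the layout
    b'<path>\\0<40 hex digits><flags>\\n'.
    """
    path, rest = line.rstrip(b"\n").split(b"\0", 1)
    return path, rest[:40], rest[40:]


# 'd' is a valid hex digit, so it blends into the node; 't' is not.
assert "d" in HEXDIGITS
assert "t" not in HEXDIGITS

path, node, flags = parse_manifest_line(b"dir\0" + b"ab" * 20 + b"t\n")
```

With a 't' flag, the byte right after the node can never be mistaken for part of the hash when reading the data by eye.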
In the most complex case, we try using the incoming delta base, then
we try both parents, and then we try the previous revlog entry. If
none of these result in a good delta, we naturally use the null
revision as base. However, we sometimes consider the nullrev before we
have exhausted our other options. Specifically, when both parents are
null, we use the nullrev as delta base if it produces a good delta
(according to _isgooddelta()), and we fail to try the previous revlog
entry as delta base. After e60126c6093d (addrevision: use general
delta when the incoming base delta is bad, 2015-12-01), it can also
happen for non-merge commits when the incoming delta is not good.
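The intended order can be sketched as follows. This is a simplified illustration, not the revlog code: choose_delta_base is a hypothetical name, and is_good_delta is a stand-in callable for the _isgooddelta() check with a simplified signature.

```python
NULLREV = -1


def choose_delta_base(incoming_base, p1, p2, prev, is_good_delta):
    """Return the first candidate base that yields a good delta.

    Hypothetical sketch: candidates are tried in the order incoming
    base, parents, previous revlog entry. The null revision is used
    only once every other option has been exhausted.
    """
    seen = set()
    for rev in (incoming_base, p1, p2, prev):
        # Skip missing candidates, null parents and duplicates.
        if rev is None or rev == NULLREV or rev in seen:
            continue
        seen.add(rev)
        if is_good_delta(rev):
            return rev
    return NULLREV


# Both parents are null, but the previous entry (rev 5) gives a good delta.
base = choose_delta_base(None, NULLREV, NULLREV, 5, lambda rev: rev == 5)
```

The point of the sketch is that null parents are skipped rather than accepted early, so the previous revlog entry still gets a chance.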
The Firefox repo (from many months back) shrinks a tiny bit with this
patch: from 1.855GB to 1.830GB (1.4%). The hg repo itself shrinks even
less: by less than 0.1%. There may be repos that get larger instead.
This undoes the unexplained test change in e60126c6093d.
For globs like 'foo/ba?', match._roots() will return 'foo'. Since
visitdir() excludes directories in the excluded roots, it would skip
the entire foo directory. This is incorrect, since 'foo/ba?' doesn't
mean that everything in foo/ should be excluded. Note that visitdir()
is called only from the treemanifest class, so this only affects tree
manifests. Fix by adding roots to the set of excluded roots only if
there are no excluded patterns.
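A rough sketch of the problem and the fix, using only the standard library. glob_root and excluded_roots are hypothetical helpers, not the match module's functions, and "has no excluded patterns" is approximated here as "no exclude contains a glob character".

```python
import fnmatch

GLOB_CHARS = set("*?[{")


def glob_root(pattern):
    """Longest leading path of a glob that contains no glob characters."""
    parts = []
    for part in pattern.split("/"):
        if GLOB_CHARS & set(part):
            break
        parts.append(part)
    return "/".join(parts)


def excluded_roots(patterns):
    """Treat roots as fully excluded only if no exclude is a real glob."""
    if any(GLOB_CHARS & set(p) for p in patterns):
        return set()
    return {glob_root(p) for p in patterns}


# The root of 'foo/ba?' is 'foo', yet only some entries in foo/ match.
root = glob_root("foo/ba?")
matches_bar = fnmatch.fnmatchcase("foo/bar", "foo/ba?")
matches_qux = fnmatch.fnmatchcase("foo/qux", "foo/ba?")
```

Here 'foo/qux' does not match the exclude, so skipping all of foo/ based on the root alone would wrongly hide it.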
Since 'glob' is the default pattern type for -I/-X patterns, we also
need to update some -X patterns in the tests to be of 'path' type to
take advantage of the visitdir tricks. For consistency, also update the
-I patterns.
It seems a little unfortunate that 'foo' in 'hg files -X foo' is
considered a pattern because of the implied 'glob' type, but improving
that is left for another day.
match.visitdir() used to look only at the match's primary pattern roots to
decide if a treemanifest traverser should descend into a particular directory.
This change makes visitdir() also consider the match's include and exclude
pattern roots (if applicable) when making this decision.
This is especially important for situations like using narrowhg with multiple
treemanifest revlogs.
Most operations on treemanifests already visit only relevant
submanifests. Notable examples include __getitem__, __contains__,
walk/matches with a matcher, and diff. By making submanifests lazily
loaded, we speed up all these operations.
The lazy loading is achieved by adding a _load() method that gets
defined where we currently eagerly parse the manifest. We make sure to
call it before any access to _dirs, _files or _flags.
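The pattern looks roughly like the sketch below. LazyManifest and the parse callable are hypothetical stand-ins; the real class also guards _dirs and _flags and parses actual manifest text.

```python
class LazyManifest:
    """Toy manifest that defers parsing until the first access."""

    def __init__(self, text, parse):
        self._text = text
        self._parse = parse
        self._loaded = False
        self._files = {}

    def _load(self):
        # Parse at most once, and only when some accessor needs the data.
        if not self._loaded:
            self._loaded = True
            self._files = self._parse(self._text)

    def __contains__(self, path):
        self._load()
        return path in self._files

    def __getitem__(self, path):
        self._load()
        return self._files[path]


calls = []


def parse(text):
    """Hypothetical parser for 'name=value' lines; records each call."""
    calls.append(text)
    return dict(line.split("=") for line in text.splitlines())


m = LazyManifest("a=1\nb=2", parse)
parsed_before_access = len(calls)  # nothing parsed yet
has_a = "a" in m                   # first access triggers _load()
```

A submanifest that is never touched is never parsed, which is where the speedup comes from.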
Some timings on the Mozilla repo (with flat manifest timings for
reference):
  hg cat -r . README.txt: 1.644s -> 0.096s (0.255s)
  hg diff -r .^ -r .    : 1.746s -> 0.137s (0.431s)
  hg files -r . python  : 1.508s -> 0.146s (0.335s)
  hg files -r .         : 2.125s -> 2.203s (0.712s)
It should be possible to debug the submanifest revlogs without having
to know where they are stored (in .hg/store/meta/), so let's add a
--dir option for this purpose.
With this change, when tree manifests are enabled (in .hg/requires),
commits will be written with one manifest revlog per directory. The
manifest revlogs are stored in
.hg/store/meta/$dir/00manifest.[id].
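As a rough illustration of that layout, the sketch below expands the [id] suffix into the index and data file names for a directory. manifest_revlog_paths is a hypothetical helper, and it deliberately ignores any filename encoding the store applies to directory names.

```python
import posixpath


def manifest_revlog_paths(directory):
    """Return the (.i, .d) paths for a directory's manifest revlog.

    Hypothetical helper: assumes the plain layout
    .hg/store/meta/$dir/00manifest.[id] with no store path encoding.
    """
    base = posixpath.join(".hg/store/meta", directory, "00manifest")
    return base + ".i", base + ".d"


index_path, data_path = manifest_revlog_paths("foo/bar")
```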
Flat manifests can still be read and interacted with as usual (they
are also read into treemanifest instances). The functionality for
writing treemanifest as a flat manifest to disk is still left in the
code; tests still pass with '_treeinmem=True' hardcoded.
Exchange is not yet implemented.