Some of the localrepo property caches must be computed unfiltered and
stored globally. Some others must see the filtered version and store data
relative to the current filtering.
This changeset introduces two classes `unfilteredpropertycache`
and `filteredpropertycache` for this purpose. A new function
`hasunfilteredcache` is introduced for unambiguous checking for cached
values on unfiltered repos.
A few tweaks are made to the property cache class to allow overriding
the way the computed value is stored on the object.
Some logic relative to _tagcaches is cleaned up in the process.
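
As a rough illustration of the storage hook and the unfiltered cache
described above, here is a simplified sketch. The class bodies, the
`cachevalue` hook name, and the toy `Repo` and `FilteredView` classes are
assumptions made for this example; it is not the code from this changeset,
and the filtered variant is left out.

```python
class propertycache(object):
    """Compute an attribute once, then store it on the instance."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, type=None):
        if obj is None:
            return self
        value = self.func(obj)
        self.cachevalue(obj, value)
        return value

    def cachevalue(self, obj, value):
        # Hook (assumed name): subclasses may change where the value lives.
        obj.__dict__[self.name] = value


class unfilteredpropertycache(propertycache):
    """Always compute and store the value on the unfiltered object."""

    def __get__(self, obj, type=None):
        if obj is None:
            return self
        unfi = obj.unfiltered()
        if unfi is obj:
            return super().__get__(unfi, type)
        # Delegate to the unfiltered object, which owns the cached value.
        return getattr(unfi, self.name)


def hasunfilteredcache(repo, name):
    """Check for a cached value without triggering its computation."""
    return name in vars(repo.unfiltered())


class Repo(object):
    """Toy stand-in for an unfiltered repository."""

    def __init__(self):
        self.computations = 0

    def unfiltered(self):
        return self

    @unfilteredpropertycache
    def tags(self):
        self.computations += 1
        return {"tip": 3}


class FilteredView(Repo):
    """Toy stand-in for a filtered view over a Repo."""

    def __init__(self, repo):
        self._repo = repo

    def unfiltered(self):
        return self._repo
```

In this sketch, reading `tags` through a `FilteredView` computes the value
once and stores it only on the underlying `Repo`.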
The logic recently added to `bookmark.validdest` uses data about obsolete
changesets to see if a bookmark destination is valid. Obsolete changesets
are likely to be filtered, so we need to work on an unfiltered repository.
Computation of common changesets during push needs to be done on the
widest set possible. An unfiltered version of the repo is kept for
discovery and various revset calls. The discovery code itself enforces
the filtering of unserved outgoing changesets.
During changectx __init__ the dirstate's parents MAY be checked. If
the repo is filtered, this check will complain "working directory has
unknown parents" even if the parents are perfectly known.
This may happen when the repo is used for serving and the dirstate has
parents that are secret, as those secret changesets will be filtered.
Strip is a "write" operation that needs to be aware of the whole repo's
content before destroying changesets.
Only the low level function is altered. The top level command will still
process its argument filtered (if any filtering is in place).
All obsolescence related sets need to be computed on the full unfiltered
version of the repository, in particular because several of them
(obsolete, extinct) are used to compute the hidden revisions.
On a filtered repo, revset predicates related to these sets will be
properly filtered because of revset's own pre-filtering.
The current branchcache construction is not aware of filtering. We keep
the status quo, ensuring that the branch cache logic is computed as
before: without any filtering.
This decorator ensures the method is run on an unfiltered version of the
repository. See follow-up commit for details.
This decorator is not named `unfiltered` because it would clash with the
`unfiltered` method on `localrepo` itself.
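
A decorator of this kind could look roughly like the sketch below. The
wrapper body and the toy repository classes are assumptions for
illustration; only the names `unfilteredmethod` and `unfiltered` come from
the text.

```python
import functools


def unfilteredmethod(orig):
    """Run the decorated method against repo.unfiltered()."""

    @functools.wraps(orig)
    def wrapper(repo, *args, **kwargs):
        return orig(repo.unfiltered(), *args, **kwargs)

    return wrapper


class ToyRepo(object):
    """Hypothetical repository holding a plain list of revisions."""

    def __init__(self, revs):
        self.revs = list(revs)

    def unfiltered(self):
        # For the unfiltered repo this is the object itself.
        return self

    def filtered(self, hidden):
        return ToyFilteredRepo(self, hidden)

    @unfilteredmethod
    def count(self):
        return len(self.revs)


class ToyFilteredRepo(ToyRepo):
    """Hypothetical filtered view that hides some revisions."""

    def __init__(self, base, hidden):
        self._base = base
        self.revs = [r for r in base.revs if r not in hidden]

    def unfiltered(self):
        return self._base
```

Here `count()` reports all revisions even when called on a filtered view,
because the wrapper swaps in the unfiltered object first.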
This commit is part of the changelog level filtering effort. It returns
the main "unfiltered" version of a repo-like object. For localrepo this
means the same localrepo object. But this method will be overwritten
by the filtered versions of a repository to return the core unfiltered
version of the repo.
Introducing this simple method first allows later commits to prepare
for the use of a filtered version of a repository.
A new repo method is added because a lot of users may call it. At the
end of this series of commits, about 40 calls exist in core and hgext.
For changelog level filtering to take effect it needs to be used for any
iteration.
This changeset removes usage of `range` and `xrange` that survived the first
pass.
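
To show why index-based loops bypass filtering, here is a toy sketch. The
`ToyChangelog` class and its `filteredrevs` attribute are stand-ins written
for this example, not the real changelog.

```python
class ToyChangelog(object):
    """Hypothetical changelog whose iteration honors a filtered set."""

    def __init__(self, length, filteredrevs=frozenset()):
        self._length = length
        self.filteredrevs = frozenset(filteredrevs)

    def __len__(self):
        return self._length

    def __iter__(self):
        for rev in range(self._length):
            if rev not in self.filteredrevs:
                yield rev


cl = ToyChangelog(5, filteredrevs={1, 3})
visible = list(cl)                # goes through __iter__, so filtering applies
by_index = list(range(len(cl)))   # bypasses __iter__, so nothing is filtered
```

Only the first form skips the filtered revisions, which is why the remaining
`range` and `xrange` loops had to go.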
During merge of branches, it is useful to compare merge results against
the two parents. This change adds this support to hgweb. To specify
which parent to compare to, use rev/12300:12345 where 12300 is a
parent changeset number. Two links are added to changeset web page so
that one can choose which parent to compare to.
scmutil.checknewlabel takes a repo object as its first argument.
When the call to this function was added in 4d438984605c, the
first argument was mistakenly set to 'None'.
Starting with 049792af94d6, users are no longer able to update a
working copy to a branch named with a "bad" character (such as ':').
Prior to v2.4, it was possible to create branch names using "bad"
characters, so this breaks backwards compatibility.
Mercurial must allow users to update to existing branches with bad
names. However, it should continue to prevent the creation of new
branches with bad names.
A test was added to confirm that 'hg update' works as expected. The
test uses a bundled repo that was created with an earlier version of
Mercurial.
Looks like there are instances where sys.stdout/stderr contain file
handles that are invalid. We should be tolerant of this for hook I/O
redirection, as our primary concern is not garbling our own output stream.
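
One way to be tolerant of such handles is sketched below. The helper name
`usable_fileno` and the set of exceptions caught are assumptions for this
example, not the change made in this commit.

```python
def usable_fileno(stream):
    """Return the stream's file descriptor, or None if it cannot be used."""
    try:
        fd = stream.fileno()
    except (AttributeError, OSError, ValueError):
        # Missing stream, stream without a descriptor, or closed file.
        return None
    return fd if fd >= 0 else None
```

A caller would skip the redirection entirely when this returns None for
either stream.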
The old str-based += collector performed very nicely on Linux, but
turns out to be quadratically expensive on Windows, causing
chunkbuffer to dominate in profiles.
This list-based version has been measured to significantly improve
performance with large chunks on Windows, with negligible overall
overhead on Linux (though microbenchmarks show it to be about 50% slower).
This may increase memory overhead where += didn't behave quadratically. If we
want to gather up 1G of data to join, we temporarily have 1G in our
list and 1G in our string.
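
The list-based idea can be sketched as follows. The function name `collect`
is hypothetical, and unlike a real chunk buffer this sketch discards the
unused tail of the last chunk instead of keeping it for the next read.

```python
def collect(chunks, size):
    """Gather up to `size` bytes from an iterable of bytes chunks."""
    if size <= 0:
        return b""
    pieces = []
    left = size
    for chunk in chunks:
        piece = chunk[:left]
        pieces.append(piece)  # appending avoids re-copying earlier data
        left -= len(piece)
        if left == 0:
            break
    # A single join copies every byte once, at the cost of briefly holding
    # both the list of pieces and the joined result.
    return b"".join(pieces)
```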
When committing to a repo with lots of files (>170000),
manifest.py:addlistdelta takes some time because it's editing a large
array many times. Changing it to build a new array instead of editing
the old one saves around 0.04 seconds on a 1.64 second commit. A 2.5%
gain.
The gain here is pretty minor, but it was blatantly at the top of the
profiler report and the fix is straightforward.
I tested it by comparing the arrays produced by the new and old logic
while running all of the tests.
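
The difference between the two approaches, and the way they can be compared,
is sketched below. The delta format (sorted, non-overlapping
`(start, end, content)` tuples with offsets into the original data) and both
function names are assumptions for illustration, not the manifest.py code.

```python
def apply_inplace(data, deltas):
    """Old style: splice every delta into one mutable buffer."""
    buf = bytearray(data)
    # Work backwards so earlier offsets stay valid after each splice.
    for start, end, content in reversed(deltas):
        buf[start:end] = content
    return bytes(buf)


def apply_rebuild(data, deltas):
    """New style: copy unchanged spans and replacements into a new buffer."""
    pieces = []
    pos = 0
    for start, end, content in deltas:
        pieces.append(data[pos:start])
        pieces.append(content)
        pos = end
    pieces.append(data[pos:])
    return b"".join(pieces)


data = b"a\nb\nc\nd\n"
deltas = [(0, 2, b"A\n"), (4, 6, b""), (8, 8, b"e\n")]
old = apply_inplace(data, deltas)
new = apply_rebuild(data, deltas)
```

Comparing `old` and `new` for equality mirrors the check described above.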
If a template iterator is implemented with a generator, the iterator is
exhausted after we use it. This leads to undesired behavior in templates.
This change converts a generator-based iterator to a list-based iterator
when the template engine first detects a generator-based iterator. All
future usages of the iterator will use the list instead.
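
The underlying problem and the conversion can be shown in a few lines. The
helper name `reusable` is hypothetical and stands in for whatever the
template engine does internally.

```python
import types


def reusable(value):
    """Turn a generator into a list so it can be iterated repeatedly."""
    if isinstance(value, types.GeneratorType):
        return list(value)
    return value


gen = (n * 2 for n in range(3))
first_pass = list(gen)
second_pass = list(gen)   # the generator is already exhausted here

items = reusable(n * 2 for n in range(3))
first_list = list(items)
second_list = list(items)  # a list can be walked again
```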
When committing to a repo with lots of history (>400000 changesets)
checking the results of revset.py:descendants against the subset takes
some time. Since the subset equals the entire changelog, the check
isn't necessary. Avoiding it in that case saves 0.1 seconds off of
a 1.78 second commit. A 6% gain.
We use the length of the subset to determine if it is the entire repo.
There is precedent for this in revset.py:stringset.
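
The shortcut amounts to something like the sketch below. The function name
and signature are hypothetical; the length comparison is the only part taken
from the description.

```python
def descendants_in_subset(descendants, subset, repolen):
    """Keep only descendants found in subset, skipping the test if trivial."""
    if len(subset) == repolen:
        # Assumption: a subset as long as the repo contains every revision,
        # so each membership test would succeed anyway.
        return list(descendants)
    members = set(subset)
    return [r for r in descendants if r in members]
```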
When committing to a repo with lots of history (>400000 changesets)
the filteredrevs check (added with 373606589de5) in changelog.py
takes a bit of time even if the filteredrevs set is empty. Skipping
the check in that case shaves 0.36 seconds off a 2.14 second commit.
A 17% gain.
'*' causes the resulting RE to match 0 or more repetitions of the preceding RE:
>>> bool(re.search('.*', ''))
True
This causes an infinite loop because currently we're only checking if there was
a match without looking at where we are in the searched string.
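
A search loop that also looks at the match position could be sketched like
this. The function `match_spans` is hypothetical, and stepping one character
past an empty match is only one possible way to guarantee progress; it is
not necessarily what this change does.

```python
import re


def match_spans(pattern, text):
    """Collect (start, end) spans of successive matches without looping."""
    regexp = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(text):
        m = regexp.search(text, pos)
        if m is None:
            break
        spans.append(m.span())
        if m.end() == m.start():
            # Empty match: restarting at m.end() would find it again forever.
            pos = m.end() + 1
        else:
            pos = m.end()
    return spans
```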
We often need to perform rev iteration in reverse order. This
changeset makes it possible to do so, in order to avoid costly reverse
or reversed() calls later.
This also speeds up other commands that use findmissing, like
incoming and merge --preview. With a large linear repository (>400000
commits) and with one incoming changeset, incoming is sped up from
around 4-4.5 seconds to under 3.
One of the major reasons rebase is slow in large repositories is
the computation of the detach set: the set of ancestors of the
changesets to rebase not in the destination parent. This is currently
done via a revset that does two walks all the way to the root of
the DAG. Instead of doing that, to find ancestors of a set <revs>
not in another set <common> we walk up the tree in reverse revision
number order, maintaining sets of nodes visited from <revs>, <common>
or both.
For the common case where the sets are close both topologically and
in revision number (relative to repository size), this has been
found to speed up rebase by around 15-20%. When the nodes are farther
apart and the DAG is highly branching, it is harder to say which
would win.
Here's how long computing the detach set takes in a linear repository
with over 400000 changesets, rebasing near tip:
Rebasing across 4 changesets
Revset method: 2.2s
New algorithm: 0.00015s
Rebasing across 250 changesets
Revset method: 2.2s
New algorithm: 0.00069s
Rebasing across 10000 changesets
Revset method: 2.4s
New algorithm: 0.019s
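
The walk described above can be sketched as follows. This is a simplified
reconstruction from the description, assuming revisions are non-negative
integers whose parents always have smaller numbers, and a `parents` callable
that returns only real parents. The function and variable names are chosen
for this example.

```python
def missingancestors(revs, common, parents):
    """Return ancestors of revs (inclusive) that are not ancestors of common."""
    revsvisit = set(revs)
    commonvisit = set(common)
    if not revsvisit:
        return []
    # Nodes known to be reachable from both sides are tracked separately.
    bothvisit = revsvisit & commonvisit
    revsvisit -= bothvisit
    commonvisit -= bothvisit

    missing = []
    start = max(revsvisit | commonvisit | bothvisit)
    for curr in range(start, -1, -1):
        if not revsvisit:
            break  # nothing left that is reachable only from revs
        if curr in bothvisit:
            bothvisit.remove(curr)
            for p in parents(curr):
                revsvisit.discard(p)
                commonvisit.discard(p)
                bothvisit.add(p)
            continue
        if curr in revsvisit:
            missing.append(curr)
            thisvisit, othervisit = revsvisit, commonvisit
        elif curr in commonvisit:
            thisvisit, othervisit = commonvisit, revsvisit
        else:
            continue  # not an ancestor of either set
        thisvisit.remove(curr)
        for p in parents(curr):
            if p in othervisit or p in bothvisit:
                # Reached from both sides: move p to the shared set.
                revsvisit.discard(p)
                commonvisit.discard(p)
                bothvisit.add(p)
            else:
                thisvisit.add(p)
    missing.reverse()
    return missing
```

Because the loop stops as soon as no node is reachable only from `<revs>`,
the cost depends on the distance between the two sets rather than on the
depth of the whole DAG, which matches the timings above.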
The bisect command does not have an option to limit itself only to
subdirectories, but it's possible to use revsets for the --skip option
for the same effect. Given the relative obscurity of revsets, it helps
to have this as another example for bisect.
Files in a subrepo were overwritten on update. But this should only happen on a
clean update (example: -C is specified).
Use the overwrite parameter introduced for svn subrepos in e3640daa4703 to
decide whether to merge changes (as update) or remove them (as clean).
The new function hg.updaterepo is introduced to keep all update calls in hg.
test-subrepo.t is extended to test if an untracked file is overwritten
(issue3276). (Update -C is already tested in many places.)
The first two chunks are debugging output which has changed (because overwrite
is not always true anymore for subrepos).
All other tests still pass without any change.
Before this patch, case-folding collisions were checked simply by
comparing the manifests of the merged revisions.
As a result, two files could be reported as colliding even though one
of them had already been deleted on one of the merged branches. In that
case the merge deletes the file, so no case-folding collision occurs.
This patch checks whether both colliding files still remain after the
merge, and ignores the collision if at least one of them is deleted by
the merge.
When one of the colliding files is deleted on one merged branch and
changed on the other, the file is considered to remain after the merge,
even though it may end up deleted if "deleting" it is chosen in
"manifestmerge()".
This avoids a merge failing on a case-folding collision after the user
has already chosen between "changing" and "deleting" files.
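
The idea of checking only the files that survive the merge can be sketched
as below. Both function names are hypothetical, `str.lower()` is used as a
stand-in for real case folding, and the "deleted on one side, changed on the
other" case is not modeled.

```python
def remaining_after_merge(local, remote, removed):
    """Files present after the merge: both manifests minus deletions."""
    return (set(local) | set(remote)) - set(removed)


def case_collisions(remaining):
    """Group remaining paths that differ only by case."""
    folded = {}
    for path in remaining:
        folded.setdefault(path.lower(), []).append(path)
    return sorted(
        tuple(sorted(paths)) for paths in folded.values() if len(paths) > 1
    )
```

With this sketch, a pair such as `README` and `readme` is reported only
while both names remain after the merge.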
This patch only adds tests for the "removed remotely" code paths in
"_remains()", because the other ones are covered by existing tests in
"test-casecollision-merge.t".