When commit is followed by strip (qrefresh), phasecache contains nodes that were
removed from the changelog. Since phasecache is filecached with .hg/store/phaseroots
which doesn't change as a result of stripping, we have to filter it manually.
If we don't write it immediately, the next time it is read from disk the nodes
will be filtered again. That's what happened before, but there's no reason not
to write it immediately.
The change in test-keyword.t is caused by the above.
Nobody overwrites `_cacheabletip` any more. We always update the cache for
the whole repo and write it to disk (or at least try to). The `updatecache` code
is simplified to remove the double phase logic associated with `_cacheabletip`.
We do not want anything computed with the bundle overlay to be written back to
the repo. Such a write would likely contain invalid data.
The short-term goal of this change is to drop the use of `_cacheabletip` in
bundle repo.
On `filectx`, linkrev may point to any revision in the repository. When the
repository is filtered this may lead to `filectx` trying to build a `changectx`
for a filtered revision. In such a case we fall back to creating the `changectx`
on the unfiltered version of the repository. This fallback should not be an
issue because `changectx` objects obtained from `filectx` are not used in
complex operations that care about filtering. It is complicated to work around
the issue in a clearer way, as code creating such `filectx` rarely has access
to the repository directly.
Linkrevs create a lot of issues with filtering. A linkrev is stored in the
revlog entry at creation time and never changed. Nothing prevents the changeset
revision it points to from becoming filtered. Several bogus behaviors emerge
from such a situation. Those bugs are complex to solve and are not part of the
current effort to install filtering. This changeset is a simple hack that
prevents a plain crash in favor of minor misbehavior without visible effect.
This "hack" is documented at length in the code itself to help people who
look at it in the future.
The phase command has some logic to report the changes made. We ensure this
logic runs unfiltered.
With --force the command can change the phase of a changeset from public to
draft. Such a change can cause an obsolescence marker to apply again and the
changeset to become "hidden". If we did not run the logic unfiltered it could
fail to fetch the phase of a newly filtered changeset.
Problem:
getremotechanges would return the 'other' repo if nothing was incoming and
there thus wasn't any bundle to base the repo on. The 'other' could be an http
peer which only implements the functionality available over the http protocol.
Transplant could thus fail with
TypeError: argument of type 'httppeer' is not iterable
Solution:
Return the local repo instead of the remote peer if there is no reason to place
a bundlerepo on top of the local repo.
A type mismatch caused the search for the other head to fail. The code is
fragile, and instead it ended up using the 'first' bookmark head, but the
ordering is undefined and it could thus randomly use the wrong bookmark head
and fail with:
$ hg up -q -C e@diverged
$ hg merge
abort: merging with a working directory ancestor has no effect
This is similar to the subscribe links that already exist in other templates.
Rather than the usual RSS and Atom links a single feed icon linking to the
atom-log is shown.
The `_branchcache` attribute is turned into a dictionary. Keys are filter names
and values are `branchcache` objects. The unfiltered version is cached under
the `None` filter.
The attribute is renamed to `_branchcaches` to avoid confusion with the previous
one. Both old and new contents are dictionaries, even though their contents
differ. I prefer possible extension code to crash right away instead of
silently messing with the wrong dictionary.
As all the different caches work in isolation from each other, this code keeps
the previous behavior of using the unfiltered cache when nothing is filtered.
This is a cheap way to have caches collaborate and nullify potential impact in
the default case.
Now that we have a third part for the cache key we need to write it to and read
it from disk. It is only written when there are filtered revisions. This keeps
the format compatible with older versions.
Note that, at this stage, filtered repositories do not use any disk caches yet.
cmdutil.getgraphlogrevs does a ton of work trying to build a graphlog lazily,
and then cmdutil.graphlog comes along and destroys all of that.
graphmod.dagwalker requires that it be given the full list of revs upfront so
that it can perform filtering and tests against known revs.
For a repository with over 400,000 changesets, this speeds up graphlog by
around 0.02 seconds (~20% with a small limit).
Tracking tipnode and tiprev is not enough to ensure validity of the cache, as
they do not help distinguish a cache that ignored various revisions below
tiprev.
To detect such difference, we build a hash of all ignored revisions. This hash
is then used when checking the validity of a cache for a repo.
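
A minimal sketch of such a three-part key, assuming SHA-1 over the sorted
ignored revision numbers. The names `filteredhash` and `validfor` and the
serialization are illustrative assumptions, not the format Mercurial writes.

```python
import hashlib


def filteredhash(filteredrevs, tiprev):
    """Return a digest of the ignored revisions at or below tiprev.

    Hypothetical helper: returns None when nothing is ignored, so a cache
    computed on an unfiltered repo keeps a key without a hash part.
    """
    ignored = sorted(r for r in filteredrevs if r <= tiprev)
    if not ignored:
        return None
    digest = hashlib.sha1()
    for rev in ignored:
        digest.update(("%d;" % rev).encode("ascii"))
    return digest.hexdigest()


def validfor(cachekey, tipnode, tiprev, filteredrevs):
    """Check a (tipnode, tiprev, filteredhash) key against the repo state."""
    return cachekey == (tipnode, tiprev, filteredhash(filteredrevs, tiprev))
```

Two caches with the same tipnode and tiprev but different ignored revisions
below tiprev now get different keys.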
With revision filtering the effective revision number of "tip" may be lower than:
len(changelog) - 1
We now use a more correct value, which prevents useless writes to disk in some
cases.
Obsolescence markers can represent this situation just fine. The old
version is marked as a precursor of the new changeset. All its
descendants become "unstable".
If obsolescence is not enabled we keep the current behavior of
aborting. This new behavior only applies when obsolete is
enabled and is subject to future discussion and changes.
At this moment, the cache is invalid, and will be thrown away.
Later the strip function will call the `localrepo.destroyed` method
that will update the branchmap cache.
The inverse of a rename is a rename, but the inverse of a copy is not a copy.
Presenting it as such -- in particular, stuffing it into the same dict as real
copies -- causes bugs because other code starts believing the inverse copies
are real.
The only test whose output changes is test-mv-cp-st-diff.t. When a backwards
status -C command is run where a copy is involved, the inverse copy (which was
hitherto presented as a real copy) is no longer displayed.
Keeping track of inverse copies is useful in some situations -- composability
of diffs, for example, since adding "a" followed by an inverse copy "b" to "a"
is equivalent to a rename "b" to "a". However, representing them would require
a more complex data structure than the same dict in which real copies are also
stored.
The -> in debug messages is currently overloaded to mean both source to dest
and dest to source. To fix this, we add explicit labels and make the arrow
direction consistent.
Currently the "copy" dict contains both explicit copies/moves made by a
context and pending moves that need to happen because the other context moved
the directory the file was in. For explicit copies, the dict stores a
destination to source map, while for pending moves via directory renames, it
stores a source to destination map. The merge code uses this fact in a non-
obvious way to differentiate between these two cases.
We make this explicit by storing these pending moves in a separate dict. The
dict still has a source to destination map, but that is called out in the
docstring.
In several places, we check whether a branchcache is still valid with regard to
the current state of the repository. This changeset puts this logic in a method
of the object that can be reused when necessary.
A branch map is considered valid whenever it is up to date or a strict subset of
the repository state.
The change will help make branchcache aware of filtered revisions.
The change in keyword is expected: the branch cache is actually invalid after
the amend. The previous check did not detect it.
The update function has all the data necessary to keep the branchcache key
up to date with its value.
This saves the assignment to the cache key that each caller of update had to do
by hand.
The strip case is a bit more complicated to handle from inside the function, but
I do not expect any impact.
The value and key of branchcache would benefit from being held by the same
object. Moreover, some logic (update, write, validation) could be moved onto
such an object. The creation of this object is the first step toward this move.
The result will clarify branchcache-related code and hide most of the details
in the class itself. This encapsulation will greatly help the implementation of
branchcache for filtered views of the repo.
The previous code was writing it to a nonexistent `branchcache` attribute. We
now write it to the proper `_branchcache` attribute and initialize
`_branchcachetip` at the same time.
We keep writing it to disk; the previous code had this part right.
The `_updatebranchmap` method on repo does not need to be filtered as all
callers already handle filtering themselves.
The fact that it is filtered may even have led to buggy behavior, but by chance
the method makes very sparse use of the repo object.
This is the foundation stone for extracting the branch map logic from the local
repository class. Most of the branch map logic has very few callers and
therefore does not fit the current criteria for code held by the localrepo
class. Important changes will be made to this code in relation to revision
filtering, so we extract it into a dedicated module before adding additional
complexity.
Follow-up commits do the actual code movement.
Instead of preventing any cache write we allow writing cache for all content of
the original repo.
The motivation for this change is to drop the custom _writebranchcache of
bundlerepo to help extraction of the branchmap logic out of localrepo.
The current _branchtags method is a remnant of an older, much larger function.
Its only remaining role is to help MQ to alter the version of branchcache we
store on disk. As MQ mutates the repository it ensures persistent cache does not
contain anything it is likely to alter.
This changeset makes explicit the stable vs volatile part of repository and
reduces the MQ specific code to the computation of this limit. The main
_branchtags code now handles this possible limit in all cases.
This will help to extract the branchmap logic from the repository.
The new code of _branchtags is a bit duplicated, but as I expect a major
refactoring of this section I'm not keen to set up a factoring function here.
This solved an obscure bug for me. In upgrading Distribute on the server, a
patch was added that has a "try: import a / except ImportError" construct whose
import is only supposed to work with Python 3.3. I'm using 2.7. My hook failed
with an
ImportError because of this. It seems kind of sensible to turn off
demandimport before importing the hook, since the except ImportError pattern
is used quite a bit in Python code (including in other Distribute code).
This change appends the subrepo path to subrepo errors. That is, when there
is an error performing an operation on a subrepo, rather than displaying a
message such as:
pushing subrepo MYSUBREPO to PATH
searching for changes
abort: push creates new remote head HEADHASH!
hint: did you forget to merge? use push -f to force
Mercurial will show:
pushing subrepo MYSUBREPO to PATH
searching for changes
abort: push creates new remote head HEADHASH! (in subrepo MYSUBREPO)
hint: did you forget to merge? use push -f to force
The rationale for this change is that the current error messages make it hard
for TortoiseHg (and similar tools) to tell the user which subrepo caused the
push failure.
The "(in subrepo MYSUBREPO)" message has been added to those subrepo methods
where it made sense (by using a decorator). We avoid appending "(in subrepo XXX)"
multiple times when subrepos are nested by throwing a "SubrepoAbort" exception
after the extra message is appended. The decorator will then "ignore" (i.e. just
re-raise) the exception and never add the message again.
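
The decorator approach can be sketched roughly as follows. This is an
illustration only: `Abort` stands in for Mercurial's abort exception and
`FakeSubrepo` is a hypothetical class with a `path` attribute, not the real
subrepo API.

```python
import functools


class Abort(Exception):
    """Stand-in for Mercurial's abort exception (assumed here)."""


class SubrepoAbort(Abort):
    """Raised once the subrepo path has been appended to the message."""


def annotatesubrepoerror(func):
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except SubrepoAbort:
            # Already annotated by an inner subrepo: re-raise untouched.
            raise
        except Abort as exc:
            raise SubrepoAbort("%s (in subrepo %s)" % (exc, self.path))
    return wrapper


class FakeSubrepo:
    """Hypothetical subrepo with an optional nested child."""

    def __init__(self, path, child=None):
        self.path = path
        self.child = child

    @annotatesubrepoerror
    def push(self):
        if self.child is not None:
            return self.child.push()
        raise Abort("push creates new remote head HEADHASH!")
```

With a nested pair such as `FakeSubrepo("outer", FakeSubrepo("inner"))`, only
the innermost path ends up in the message.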
A small drawback of this method is that part of the exception trace is lost when
the exception is caught and re-raised by the annotatesubrepoerror decorator.
Also, because the state() function already printed the subrepo path when it
threw an error, that error has been changed to avoid duplicating the subrepo
path in the error message.
Note that I have also updated several subrepo related tests to reflect these
changes.
The `hiddenrevs` cache is volatile too (it uses content from `obscache`). We
ensure it is invalidated when necessary. In the near future, the cache will
probably be moved to `revsfiltercache`.
This is the second real use of changelog filtering. The change is very small to
allow testing the new filter with a setup close to the original one.
We replace the custom post-processing in the `heads` function with a call to
the standard code on a filtered repo.
Later changesets will make wider use of filtering, which will make the
dedicated function useless.
Here is the first real use of changelog filtering. The change is very small to
allow testing the new filter with a setup close to the original one.
We replace the custom post-processing in the branchmap function with a call to
the standard code on a filtered repo.
Later changesets will make wider use of filtering, which will make the
dedicated function useless.
Recomputing the filtered revisions at every access to the changelog is far too
expensive. This changeset introduces a cache for this information. This cache
is held by the repository (the unfiltered repository) and invalidated when
necessary.
This cache is not a protected attribute (leading _) because some of the logic
that invalidates it is not held by the local repo itself.
We add a `filtered` method on repo. This method returns an instance of
`repoview` that behaves exactly like the original repository but with a
filtered changelog attribute. Filters are identified by a "name". Planned
filters are `unserved`, `hidden` and `mutable`. Filtering the repository in
place was out of the question as it would not allow multiple threads to share
the same repo. It would also make controlling the filtering scope harder. See
the `repoview` docstring for details.
A mechanism to compute filtered revisions is also installed. Some caches will
be installed in later commits.
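
The proxy idea can be illustrated with a toy model. The classes below are
simplified stand-ins written for this example (a changelog as a plain list, a
dict of filter functions); they are not Mercurial's classes, and the filtered
set is recomputed on every access because no cache exists yet.

```python
class changelog:
    """Tiny stand-in for a changelog: a list of nodes indexed by revision."""

    def __init__(self, nodes, filteredrevs=frozenset()):
        self._nodes = nodes
        self.filteredrevs = filteredrevs

    def __iter__(self):
        return (r for r in range(len(self._nodes))
                if r not in self.filteredrevs)

    def node(self, rev):
        if rev in self.filteredrevs:
            raise IndexError(rev)
        return self._nodes[rev]


class repo:
    def __init__(self, nodes, filters):
        self._nodes = nodes
        # Maps a filter name to a function computing the revisions to hide.
        self._filters = filters
        self.changelog = changelog(nodes)

    def filtered(self, name):
        return repoview(self, name)


class repoview:
    """Proxy that behaves like the repo except for its changelog."""

    def __init__(self, unfiltered, filtername):
        self._unfiltered = unfiltered
        self.filtername = filtername

    @property
    def changelog(self):
        compute = self._unfiltered._filters[self.filtername]
        return changelog(self._unfiltered._nodes,
                         frozenset(compute(self._unfiltered)))

    def __getattr__(self, name):
        # Everything else is delegated to the original repository.
        return getattr(self._unfiltered, name)
```

Because the original repository is never modified, several views with
different filters can coexist over the same object.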
We also turn the unix domain socket into a class, so that we have a
sensible place to hang its logically related attributes and behaviour.
We'll shortly want to reuse this in other code.
For a repository with over 400,000 commits, rebasing one revision near tip,
this avoids two treks up the DAG, speeding the operation up by around 1.6
seconds.
This also makes the perfancestorset command use lazy membership testing. In a
linear repository with over 400,000 commits, without this patch, hg
perfancestorset takes 0.80 seconds no matter how far behind we're looking.
With this patch, hg perfancestorset -- X takes:
  Rev X     Time
  -1        0.00s
  -4000     0.01s
  -20000    0.04s
  -80000    0.17s
  -200000   0.43s
  -300000   0.69s
  0         0.88s
Thus, for revisions close to tip, we're up to several orders of magnitude
faster. At 0 we're around 10% slower.
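
Lazy membership testing can be sketched as below. This is an illustration
under assumptions, not the code from this patch: the class name and the
`parentrevs` callback (returning parent revision numbers, -1 for the null
revision) are made up for the example.

```python
import heapq


class lazyancestors:
    """Membership-only view of some revisions and their ancestors.

    Relies on parents always having lower revision numbers than children.
    """

    def __init__(self, parentrevs, revs):
        self._parentrevs = parentrevs
        self._seen = set(revs)
        self._heap = [-r for r in self._seen]
        heapq.heapify(self._heap)

    def __contains__(self, target):
        if target in self._seen:
            return True
        # Walk down from the highest unexplored revision and stop as soon
        # as everything left to explore is at or below the target.
        while self._heap and -self._heap[0] > target:
            rev = -heapq.heappop(self._heap)
            for parent in self._parentrevs(rev):
                if parent != -1 and parent not in self._seen:
                    self._seen.add(parent)
                    heapq.heappush(self._heap, -parent)
        return target in self._seen
```

A query for a revision close to the starting points only explores the few
revisions above it, which is where the gain near tip comes from. The heap
bookkeeping is a fixed extra cost when the walk has to go all the way down.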
With the current implementation, `changelog.nodemap` is not filtered, so some
filtered changesets in common are not excluded by `n in nodemap`. This leads to
a crash lower in the stack when bundle generation tries to access those nodes
on a filtered changelog.
Currently the code path of `changectx(filteredrepo, rev)` calls
`filteredrepo.changelog.node(rev)`. When `rev` is filtered this raises an
unhandled `IndexError`. This case now raises a `RepoLookupError`, as other
error cases do.
This is in preparation for an upcoming refactoring. This also fixes a bug in
incancestors, where if an element of revs was an ancestor of another it would
be generated twice.
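
A sketch of a generator that yields each revision exactly once, assuming a
`parentrevs` callback that returns parent revision numbers with -1 for the null
revision. The heap-based traversal is an illustrative choice, not necessarily
what the refactored code does.

```python
import heapq


def incancestors(parentrevs, revs):
    """Yield revs and all their ancestors, highest revision first, once each."""
    seen = set()
    heap = []
    for rev in revs:
        if rev not in seen:
            seen.add(rev)
            heapq.heappush(heap, -rev)
    while heap:
        rev = -heapq.heappop(heap)
        yield rev
        for parent in parentrevs(rev):
            # The seen set is what prevents an element of revs that is also
            # an ancestor of another element from being generated twice.
            if parent != -1 and parent not in seen:
                seen.add(parent)
                heapq.heappush(heap, -parent)
```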
Before this patch, enabling strict command processing (ui.strict=True)
meant that 'hg bookmark NAME', as referenced in several places in the
documentation, would not work. This adds 'bookmark' as an explicit alias
to 'bookmarks'.
The color extension achieves colorization by overriding the class of the
"ui" object just before command execution.
Before this patch, "diff()" of abstractsubrepo and the classes
derived from it had no "ui" argument, so "diff()" of hgsubrepo
used "self._repo.ui" to invoke "cmdutil.diffordiffstat()".
To separate configuration between repositories, revision
1498948ee815 changed the initialization source of "self._repo.ui"
from "ui" (overridden) to "baseui" (plain) of the parent repository,
and this broke colorization.
This patch adds a "ui" argument to "diff()" of abstractsubrepo and the
classes derived from it, so that the caller's "ui" object is passed in.
Starting with 049792af94d6, users are no longer able to update a
working copy to a branch named with a "bad" character (such as ':').
Prior to v2.4, it was possible to create branch names using "bad"
characters, so this breaks backwards compatibility.
Mercurial must allow users to update to existing branches with bad
names. However, it should continue to prevent the creation of new
branches with bad names.
A test was added to confirm that 'hg update' works as expected. The
test uses a bundled repo that was created with an earlier version of
Mercurial.
Looks like there are instances where sys.stdout/stderr contain file
handles that are invalid. We should be tolerant of this for hook I/O
redirection, as our primary concern is not garbling our own output stream.
The old str-based += collector performed very nicely on Linux, but
turns out to be quadratically expensive on Windows, causing
chunkbuffer to dominate in profiles.
This list-based version has been measured to significantly improve
performance with large chunks on Windows, with negligible overall
overhead on Linux (though microbenchmarks show it to be about 50% slower).
This may increase memory overhead where += didn't behave quadratically. If we
want to gather up 1G of data to join, we temporarily have 1G in our
list and 1G in our string.
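
A rough sketch of the list-based approach, written for illustration and not
taken from Mercurial's source. Pieces are collected in a list and joined once,
instead of growing a string with `+=`.

```python
class chunkbuffer:
    """Read fixed-size pieces from an iterator of arbitrarily sized chunks."""

    def __init__(self, chunks):
        self._iter = iter(chunks)
        self._queue = []

    def read(self, size):
        parts = []
        left = size
        while left > 0:
            if not self._queue:
                chunk = next(self._iter, None)
                if chunk is None:
                    break
                self._queue.append(chunk)
            chunk = self._queue.pop(0)
            if len(chunk) > left:
                # Keep the unread tail for the next call.
                self._queue.insert(0, chunk[left:])
                chunk = chunk[:left]
            parts.append(chunk)
            left -= len(chunk)
        # A single join replaces the repeated `buf += chunk` copies.
        return b"".join(parts)
```

While `join` runs, both the list of parts and the joined result are alive,
which is the temporary doubling of memory described above.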
When committing to a repo with lots of history (>400000 changesets)
checking the results of revset.py:descendants against the subset takes
some time. Since the subset equals the entire changelog, the check
isn't necessary. Avoiding it in that case saves 0.1 seconds off of
a 1.78 second commit. A 6% gain.
We use the length of the subset to determine if it is the entire repo.
There is precedent for this in revset.py:stringset.
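
The shortcut amounts to something like the following sketch. The function name
and arguments are hypothetical, and it assumes the subset holds valid revision
numbers without duplicates, so that equal length implies the whole repo.

```python
def descendants_in_subset(repolen, descendants, subset):
    """Filter descendants by subset, skipping the test when it cannot fail."""
    if len(subset) == repolen:
        # subset covers the whole changelog, so every revision is in it.
        return list(descendants)
    wanted = set(subset)
    return [r for r in descendants if r in wanted]
```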
When committing to a repo with lots of history (>400000 changesets)
the filteredrevs check (added with 373606589de5) in changelog.py
takes a bit of time even if the filteredrevs set is empty. Skipping
the check in that case shaves 0.36 seconds off a 2.14 second commit.
A 17% gain.
'*' causes the resulting RE to match 0 or more repetitions of the preceding RE:
>>> bool(re.search('.*', ''))
True
This causes an infinite loop because currently we're only checking if there was
a match without looking at where we are in the searched string.
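
One way to make such a loop terminate is to track the position and step past
zero-length matches by hand. The helper below is a hypothetical sketch of that
idea, not the code touched by this patch.

```python
import re


def findall_spans(pattern, text):
    """Collect (start, end) spans of successive matches, always terminating."""
    regex = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(text):
        match = regex.search(text, pos)
        if match is None:
            break
        spans.append(match.span())
        if match.end() > match.start():
            pos = match.end()
        else:
            # Zero-length match: move one character past it, otherwise the
            # same empty match would be found again forever.
            pos = match.end() + 1
    return spans
```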
The same way we have `unstable` and `bumped`. Convenience methods to access
trouble information in general may land later.
This gets actual use and testing in the next changesets.
This changeset adds a new `divergent()` revset similar to the `unstable()` and
`bumped()` ones. Introducing this revset allows actual testing of divergence
detection.
Divergent changesets are final (non-obsolete) successors of a changeset that
compete with another set of final successors of this same changeset.
For example, if you have two obsolescence markers A -> B and A -> C, B and C are
both "divergent" because they compete to be the one true successor of A.
Public revisions can't be divergent.
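
The A -> B, A -> C example can be modelled with a toy sketch. It assumes
markers are plain (precursor, successor) pairs and ignores splits, pruning and
phases, so it is much simpler than real divergence detection. Both function
names are made up for the example.

```python
def finalsuccessors(markers, node):
    """Return the non-obsolete successors reachable from node."""
    successors = {}
    for prec, succ in markers:
        successors.setdefault(prec, set()).add(succ)
    final, seen, stack = set(), set(), [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in successors:
            stack.extend(successors[current])
        else:
            final.add(current)
    return final


def divergent(markers):
    """Return nodes competing with another final successor of a precursor."""
    result = set()
    for prec in {p for p, _ in markers}:
        final = finalsuccessors(markers, prec)
        if len(final) > 1:
            result |= final
    return result
```

In this model, two rewrites of A that are later both rewritten into the same D
are not divergent, because A ends up with a single final successor.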
This function is used and tested in the next changeset.
Successors sets are an important part of obsolescence. They are necessary to
detect and solve divergence situations. This changeset adds a core function to
compute them, a debug command to audit them, and solid tests of the concept.
Check the function docstring for details about the concept.
This replaces unnecessary parentrevs() calls with calculating min(parentset).
Even though the min operation is O(size of parentset), since parentrevs is
relatively expensive, this tradeoff almost always works in our favour. In a
repository with over 400,000 changesets, hg perfrevset "children(X)" takes:
  Set X            Before   After
  -1               0.51s    0.06s
  -1000:           0.55s    0.08s
  -10000:          0.56s    0.10s
  -100000:         0.60s    0.25s
  -100000:-99000   0.55s    0.19s
  0:100000         0.60s    0.61s
  all()            0.72s    0.74s
The relative performance is similar for Mercurial's own repository -- several
times faster in most cases, slightly slower for revisions close to 0 and
all().
The check pattern only checked for whitespace between keyword and operator.
Now it also warns:
> x = f(),7
missing whitespace after ,
> x = f()+7
missing whitespace in expression
Up until now the templates that show RSS and Atom feeds on the "repository
lists" (i.e. gitweb and monoblue) showed them for all entries, including regular
folders. Clicking on those "folder RSS" links would result in an error page
being shown.
This patch hides those links for regular folders.
There was neither an RSS nor an Atom feed for the branches page. Different
hgweb templates linked to different feeds on their branches page (some linked
to the tags feed, some to the log feed and some to the nonexistent branches
feed).
The current query to get the new bookmark target for stripped revisions
involves multiple walks up the DAG, and is really expensive, taking over 2.5
seconds on a repository with over 400,000 changesets even if just one
changeset is being stripped.
A slightly simplified version of the current query is
max(heads(::<tostrip> - <tostrip>))
We make two observations here.
1. For any set s, max(heads(s)) == max(s). That is because revision numbers
define a topological order, so that the element with the highest revision
number in s will not have any children in s.
2. For any set s, max(::s - s) == max(parents(s) - s). In other words, the
ancestor of s with the highest revision number not in s is a parent of one
of the revs in s. Why? Because if it were an ancestor but not a parent of s,
it would have a descendant that would be a parent of s. This descendant
would have a higher revision number, leading to a contradiction.
Combining these two observations, we rewrite the revset query as
max(parents(<tostrip>) - <tostrip>)
The time complexity is now linear in the number of changesets being stripped.
For the above repository, the query now takes 0.1 seconds when one changeset
is stripped. This speeds up operations that use repair.strip, like the rebase
and strip commands.
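
The equivalence of the two queries can be checked on a small DAG with a sketch
like this one. The DAG is modelled as a dict from revision number to a tuple of
parent revisions, and all function names are illustrative. It assumes the
stripped set does not contain a root, so that a target exists.

```python
def ancestors(parents, revs):
    """Return revs plus all their ancestors (the `::revs` revset)."""
    seen = set()
    stack = list(revs)
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(parents[rev])
    return seen


def heads(parents, revs):
    """Return the members of revs that have no child inside revs."""
    haschild = {p for r in revs for p in parents[r]}
    return {r for r in revs if r not in haschild}


def slowtarget(parents, tostrip):
    """max(heads(::<tostrip> - <tostrip>))"""
    return max(heads(parents, ancestors(parents, tostrip) - tostrip))


def fasttarget(parents, tostrip):
    """max(parents(<tostrip>) - <tostrip>)"""
    direct = {p for r in tostrip for p in parents[r]}
    return max(direct - tostrip)
```

`fasttarget` only looks at the parents of the stripped revisions, while
`slowtarget` has to visit every ancestor.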
This changes graft to explicitly track the progression of commits it
makes, and updates its idea of the current node based on its last
commit, rather than from the working copy parent. This should have no
effect on the value of current since we were reading the working copy
parent immediately after committing to it.
The motivation for this change is that a subsequent patch will break
the current node and working copy relationship. Splitting this out
into a separate patch will make that one more readable.
This moves the logic for generating the commit metadata ahead of the
merge operation. The only purpose of this patch is to make
subsequent patches easier to read, and there should be no behavior
changes.
This pulls the code used to calculate the changes that need to happen
during merge.update() into a separate function. This is not useful on
its own, but is instead preparatory to performing grafts in memory
when there are no potential conflicts.
hg perfstatus -u on a working directory with 170,000 files, without this
change:
! wall 1.839561 comb 1.830000 user 1.120000 sys 0.710000 (best of 6)
With this change:
! wall 1.804222 comb 1.790000 user 1.140000 sys 0.650000 (best of 6)
hg perfstatus on the same directory, without this change:
! wall 1.016609 comb 1.020000 user 0.670000 sys 0.350000 (best of 10)
With this change:
! wall 0.985573 comb 0.980000 user 0.650000 sys 0.330000 (best of 10)
hg perfstatus -u on a working directory with 170,000 files, without this
change:
! wall 1.869404 comb 1.850000 user 1.170000 sys 0.680000 (best of 6)
With this change:
! wall 1.839561 comb 1.830000 user 1.120000 sys 0.710000 (best of 6)
This makes a big difference to performance.
In a clean working directory containing 170,000 files, performance of
"hg --time diff" improves from 2.38 seconds to 1.69.
In a clean working directory containing 170,000 tracked files, this
improves performance of "hg --time diff" from 1.69 seconds to 1.43.
This idea is due to Siddharth Agarwal.
Files in a subrepo were overwritten on update. But this should only happen on a
clean update (example: -C is specified).
Use the overwrite parameter introduced for svn subrepos in e3640daa4703 to
decide whether to merge changes (as update) or remove them (as clean).
The new function hg.updaterepo is introduced to keep all update calls in hg.
test-subrepo.t is extended to test if an untracked file is overwritten
(issue3276). (Update -C is already tested in many places.)
The first two chunks are debugging output which has changed. (Because overwrite
is not always true anymore for subrepos)
All other tests still pass without any change.
Before this patch, case-folding collisions were checked simply between
the manifests of the merged revisions.
So files could be considered as colliding with each other even though one
of them was already deleted on one of the merged branches: in such a case,
the merge deletes it, so no case-folding collision occurs.
This patch checks whether both colliding files still
remain after the merge, and ignores the collision if at least one of
them is deleted by the merge.
In the case that one of the colliding files is deleted on one of the merged
branches and changed on the other, the file is considered to still remain
after the merge, even though it may be deleted by the merge if "deleting"
it is chosen in "manifestmerge()".
This avoids failing to merge because of case-folding collisions after
choosing between "changing" and "deleting" files.
This patch adds tests only for the "removed remotely" code paths in
"_remains()", because the other ones are covered by existing tests in
"test-casecollision-merge.t".