If the input path is already longer than _maxstorepathlen, then we can skip
doing the basic encoding (encodedir, _encodefname and _auxencode) and directly
proceed to the hashed encoding. Those encodings, if at all, will make the path
only longer.
This changeset allows changelog object to be "filtered". You can assign a set of
revision numbers to the `changelog.filteredrevs` attributes. The changelog will
then pretends these revision does not exists in this repo.
A few methods need to be altered to achieve this behavior:
- tip
- __iter_
- irevs
- hasnode
- headrevs
For consistency and to help debugging, the following methods are altered too.
Tests tend to show it's not necessary to alter them but have them raise proper
exception helps to detect bad acces to filtered revisions.
- rev
- node
- linkrev
- parentrevs
- flags
The following methods would also need alteration for consistency purpose but
this is non-trivial and not done yet.
- nodemap
- strip
The C version of headrevs is not run if there is any revision to filter. It'll
need a proper rewrite later to restore performance.
Make the pure python implementation of headrevs available to derived classes. It
is important because filtering logic applied by `revlog` derived class won't
have effect on `index`. We want to be able to bypass this C call to implement
our own.
This prepares changelog level filtering. We can't assume that any revision can
be heads because filtered revisions need to be excluded.
New algorithm:
- All revisions now start as "non heads",
- every revision we iterate over is made candidate head,
- parents of iterated revisions are definitely not head.
Filtered revisions are never iterated over and never considered as candidate
head.
This prepares changelog level filtering. We need the algorithms used in revlog to
work on a subset of revisions. To achieve this, the use of explicit range of
revision is banned. `range` and `xrange` calls are replaced by a `revlog.irevs`
method. Filtered super class can then overwrite the `irevs` method to filter out
revision.
We can only use copy clone if the cloned repo do not have any secret changeset.
The current method for that is to run the "secret()" revset on the remote repo.
But with proper filtering of hidden or unserved revision by the remote this
revset won't return any revision even if some exist remotely. This changeset
adds an explicit function to know if a repo have any secret revision or not.
The other option would be to disable filtering for the query but I prefer the
approach above, lighter both regarding code and performance.
The forced recomputation of the branch cache was introduced by `b4909adfc093`.
Back there, `addchangegroup` did not handle any lock logic.
Later `6042c410e045` introduced lock logic to `addchangegroup`. Its description
does not explain why the `updatebranchcache` call is made outside locking. I
believe that the lock was released there because it fit well with the transaction
release already in the code.
Finally `1eda82d76f0c` moved all "unlocked" code of `addchangegroup` to an
`repo._afterlock` callback.
I do not think that the call to `updatebranchcache()` requires to be done
outside locking. That may even be a bad idea to do so. Bringing this call back
in the `addchangegroup` function makes the flow simpler and eases the following
up changelog level filtering business.
At the moment the resolve command doesn't save progress during the resolve process. In example if you try to resolve 100 conflicting files and interrupt the process (e.g., you close the external merge tool) after resolving 50 files you'll end up with 100 unresolved conflicts. Saving the progress helps a lot with long going merges. It's easy to achieve same behavior with simple script that calls resolve command for each unresolved file but it makes sense to make such behavior a default
Before this patch, the argument bound to the source repository of
incoming bookmarks for "bookmarks.diff()" is named as "remote".
But in "hg outgoing" case, this argument is bound to local repository
object.
In addition to it, "local"/"remote" seem to mean not the direction of
propagation of bookmarks, but just the location of cooperative
repositories.
To indicate the direction of propagation of bookmarks clearly on the
source code, this patch uses "d(st)" and "s(rc)" combination instead
of "l(ocal)" and "r(emote)" one.
- "repo" and "remote" arguments are renamed to "dst" and "src"
- "lmarks" and "rmarks" variables are renamed to "dmarsk" and "smarks"
This patch also changes initialization order of "*opener" and "*vfs"
fields: first, "*vfs" fields are initialized , and then, "*opener"
ones are initialized.
For backwards compatibility, aliases for the old names are added,
except for "abstractopener", "statichttpopener" and "_fncacheopener",
because these are not used in Mercurial core implementation after this
patch.
"_fncacheopener" was only referred in "fncachestore" constructor, so
this patch also renames from "_fncacheopener" to "_fncachevfs" there.
This alternate syntax was proposed by Bryan O'Sullivan in a review of
441ebe37ceb5. I haven't been able to measure any particular performance
difference, but the new syntax is more concise and easier to read.
This patch adds "descendant()", which uses "revlog.descendant()" for
descendant examination, to changectx.
This implementation is more efficient than "new in old.descendants()"
expression, because:
- "changectx.descendants()" creates temporary "changectx" objects,
but "revlog.descendant()" doesn't
"revlog.descendant()" checks only revision numbers of descendants.
- "revlog.descendant()" stops scanning, when scanning of all
revisions less than one of examination target is finished
this can avoid useless scanning in "not descendant" case.
(This is not yet enabled; it will be turned on in a followup patch.)
The path encoding performed by fncache is complex and (perhaps
surprisingly) slow enough to negatively affect the overall performance
of Mercurial.
For a short path (< 120 bytes), the Python code can be reduced to a fairly
tractable state machine that either determines that nothing needs to be
done in a single pass, or performs the encoding in a second pass.
For longer paths, we avoid the more complicated hashed encoding scheme
for now, and fall back to Python.
Raw performance: I measured in a repo containing 150,000 files in its tip
manifest, with a median path name length of 57 bytes, and 95th percentile
of 96 bytes.
In this repo, the Python code takes 3.1 seconds to encode all path
names, while the hybrid C-and-Python code (called from Python) takes
0.21 seconds, for a speedup of about 14.
Across several other large repositories, I've measured the speedup from
the C code at between 26x and 40x.
For path names above 120 bytes where we must fall back to Python for
hashed encoding, the speedup is about 1.7x. Thus absolute performance
will depend strongly on the characteristics of a particular repository.
For a netbeans clone on Windows 7 x64:
Before:
$ hg perffncacheencode
! wall 3.516000 comb 3.525623 user 3.525623 sys 0.000000 (best of 3)
After:
$ hg perffncacheencode
! wall 3.443000 comb 3.447622 user 3.447622 sys 0.000000 (best of 3)
Before this patch, zip archives created by "hg archive" are extracted
with unexpected timestamp, if TZ is not configured as GMT.
This patch adds "extended-timestamp" extra block to zip archives, and
unzip will extract such archives with timestamp specified in added
extra block, even though TZ is not configured as GMT.
Please see documents below for detail about specification of zip file
format and "extended-timestamp" extra block:
http://www.pkware.com/documents/casestudies/APPNOTE.TXThttp://www.opensource.apple.com/source/zip/zip-6/unzip/unzip/proginfo/extra.fld
Original implementation of this patch was suggested by "Jun Omae
<jun66j5@gmail.com>".
Not yet used (will be enabled in a later patch).
This patch is a stripped down version of patches originally created by
Bryan O'Sullivan <bryano@fb.com>
For a netbeans clone on Windows 7 x64:
Before:
$ hg perffncacheload
! wall 0.124000 comb 0.124801 user 0.124801 sys 0.000000 (best of 76)
After:
$ hg perffncacheload
! wall 0.096000 comb 0.093601 user 0.078001 sys 0.015600 (best of 97)
For a netbeans clone on Windows 7 x64:
Before:
$ hg perffncachewrite
! wall 0.210000 comb 0.218401 user 0.202801 sys 0.015600 (best of 47)
After:
$ hg perffncachewrite
! wall 0.104000 comb 0.109201 user 0.078000 sys 0.031200 (best of 95)
I don't think we will ever have anything in the store that resides inside a
directory that ends in .i or .d under store/ that we wouldn't want to have
direncoded. The files not under data/ surely don't need direncoding, but it
doesn't harm to let these few run through it. It hurts more to check whether the
thousands of other files start with 'data/'. They do anyway.
See also 67e6074ba430 (fixed with 0c522fe42894), which moved the direncoding
from filelog into store
hgweb has an incorrect padding calculation, causing the text to move further
away from the graph the more branches there are (issue3626). This patch fixes
all existing templates (gitweb, monoblue, paper and spartan).
Tests updated by Patrick Mezard <patrick@mezard.eu>
by refactoring
for i, n in enumerate(res):
if n:
<main code block>
to
for i, n in enumerate(res):
if not n:
continue
<main code block>
(no functional change)
Merely creating and using a generator has a measurable impact,
particularly since the common case for stream_out is generators that
yield just once. Avoiding generators improves stream_out performance
by about 7%.
Auditing at this stage is both pointless (paths are already trusted by
the local repo) and expensive. Skipping the audits improves stream_out
performance by about 15%.
New commit from the amend process were created without any phase contraint. If
the amended changeset had a different phase from it's parent, the phases data
were lost.
The changeset ensure the new commit are created in the same phase than the
original changeset.
Subversion 1.7 changes its XML output to include an explicit encoding tag:
<?xml version="1.0" encoding="UTF-8"?>
This triggers xml.dom.minidom to always return unicode strings, causing
other parts of the code to explode.
We unconditionally encode path names before handing them back, which
works with both str (actually a no-op) and unicode values.
JavaScript .replace always magically processed $$ $& $' $` in replacement
strings and thus displayed subject lines incorrectly in the graph view.
Instead of regexps and .replace we now just create the strings the right way in
the first place.
When we rewrite a bookmarked changeset, we want to update the
bookmark on its successors. But the successors are not descendants
of its precursor (by definition). This changeset alters the bookmarks
logic to update bookmark location if the newer location is a successor
of the old one[1].
note: valid destinations are in fact any kind of successors of any kind
of descendants (recursively.)
This changeset requires the enabling of the obsolete feature in
some bookmark tests.
We usually update bookmarks only if the new location is descendant of the old
bookmarks location. We extract this logic into a function. This is the first
step to allow more complex logic using obsolescence in this validation of the
bookmark movement.
A relevant obsolete marker may have been added -after- we previously
exchanged the changeset. We have to search for remote heads that
disappear by the sole fact of pushing obsolescence.
This case will also happen when remote got the new version from a
repository that does not propagate obsolescence markers.
Checkheads was more permissive than expected. When the remote heads
are public we don't need to search for successors. None will make a
public head disappear.
If all heads are bookmarks, merge fails to find what node to merge
with (throws an IndexError while indexing into the non-bookmark heads
list) as of 208ca72b9343. This catches that case and prints an error
to specify a rev explicitly.
When running:
$ hg debugfileset 'binary() and ignored()'
getfileset() was correctly retrieving ignored files but
matchctx.existing() was not taking them in account. Just add them along
with unknown files.
By default, unknown files are ignored. If the 'unknown()' predicate
appears in the syntax tree, then they are taken in account.
Unfortunately, matchctx.existing() was filtering against non-deleted
context files, which does not include unknown files. So:
$ hg debugfileset 'binary() and unknown()'
would not return existing binary unknown files.
Running:
$ hg debugfileset 'binary()'
would traceback if there were one deleted file in the working directory.
It happened because matchctx.existing() was filtering files against the
ctx.__contains__() but deleted files are still considered part of
workingctx.
The `repair` code builds a giant revset query instead of using the "%lr" idiom.
It is inefficient and crash when the number of stripped changeset is too big.
This changeset replaces the bad code by a better revset usage.
_partialmatch() does prefix matching against nodes. String passed
to _partialmetch() actualy may be any string, not prefix only.
For example,
"63af8381691a9e5c52ee57c4e965eb306f86826e or 300" is a good
argument for _partialmatch().
When _partialmatch() searches using radix tree, index_partialmatch()
C function shouldn't try to match too long strings.
This function is designed to be used by all code that creates new
obsolete markers in the local repository.
It is not used by debugobsolete because debugobsolete allows the
use of an unknown hash as argument.
Before this changeset, the extra commit created during amend had
the same description as the final commit. This was a bit confusing
when trying to understand what that extra commit was about.
This changeset changes the description of such commit to:
temporary amend commit for <ammend-commit-hash>
The old behaviour was not a big deal, but would become more confusing
once we use obsolescence marker instead of stripping the precursors.
This also helps if the user restores a strip backup.
This allows proper recovery of an interrupted amend process.
No changes are made to the logic besides:
- indent operations into a single try-except clause,
- some comment and code wrapping to 80 chars,
- strip logic should not be contained in the transaction and is extracted from
the main code.
This changeset introduces caches on the `obsstore` that keeps track of sets of
revisions meaningful for obsolescence related logics. For now they are:
- obsolete: changesets used as precursors (and not public),
- extinct: obsolete changesets with osbolete descendants only,
- unstable: non obsolete changesets with obsolete ancestors.
The cache is accessed using the `getobscache(repo, '<set-name>')` function which
builds the cache on demand. The `clearobscaches(repo)` function takes care of
clearing the caches if any.
Caches are cleared when one of these events happens:
- a new marker is added,
- a new changeset is added,
- some changesets are made public,
- some public changesets are demoted to draft or secret.
Declaration of more sets is made easy because we will have to handle at least
two other "troubles" (latecomer and conflicting).
Caches are now used by revset and changectx. It is usually not much more
expensive to compute the whole set than to check the property of a few elements.
The performance boost is welcome in case we apply obsolescence logic on a lot of
revisions. This makes the feature usable!
The export command didn't output the diffs in color, even when color support
was enabled. This patch fixes that by making the export command use the default
ui.write method, instead of directly manipulating the ui.fout file object.
Also added a test case to verify color output to test-export.t.
It's preferable to report "ssl required" as an error, so that the client
can detect error and exit with 255. Currently hg exits with 1, which is
"nothing to push."