For changelog level filtering to take effect it needs to be used for any
iteration. Some remaining uses of `xrange` in revset code are replaced by proper
use of `changelog.revs` or direct iteration over changelog.
The branchpoint() function returns changesets with more than one child.
Eventually I would like to be able to see only branch points and merge
points in a graphical log to see the topology of the repository.
This patch invokes "osutil.listdir()" via vfs object.
The function newly added to "abstractvfs" is named "readdir()" rather
than "listdir()", because:
- "os.listdir()" seems to be more familiar as "listdir()" than
"osutil.listdir()"
- "osutil.listdir()" also returns the type of each file, like the
"readdir()" POSIX API, even though the "d_type" field of the "dirent"
structure is defined mainly only on BSD/Linux
This patch invokes "osutil.listdir()" via "rawvfs" object to avoid
filename encoding, because the path passed to "osutil.listdir()"
shouldn't be encoded.
This patch also omits importing "osutil" module, because it is no
longer used.
'hg log' on untracked files tends to be fairly slow. The root cause is that we end up using the 'slowpath' when we can't find a revlog for the files listed. This could happen if the file in question is an untracked file, or it is a directory.
This diff tries to speed up 'hg log' (by avoiding the slowpath) for files if we can determine if that file is not (and was never) a directory. We use the previously added store.__contains__ methods to test if the directory exists (or existed) in the store.
To avoid changing any existing semantics, this 'optimization' kicks in only when none of the files listed as arguments to the hg log command exist in the store.
Adds a __contains__ method to fncachestore to check for file/dir existence (using fncache.__contains__).
Also extends fncache.__contains__ to check for directories (by prefix matching)
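The prefix matching idea can be illustrated with a toy container. This is
only a sketch: "ToyFnCache" and its entries are hypothetical and are not
the real fncache class.

```python
class ToyFnCache:
    """Toy stand-in for a store file-name cache (hypothetical)."""

    def __init__(self, entries):
        self.entries = set(entries)

    def __contains__(self, path):
        # Exact match: the path is a known store file.
        if path in self.entries:
            return True
        # Directory match: at least one entry lives below "path/".
        prefix = path.rstrip('/') + '/'
        return any(e.startswith(prefix) for e in self.entries)


cache = ToyFnCache(['data/src/main.c.i', 'data/README.i'])
```

With this, "'data/src' in cache" is true because an entry lives below that
directory, while "'data/sr' in cache" is false because the trailing slash
keeps partial directory names from matching.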
Recomputing branch cache on clone may be expensive,
therefore if possible we fetch it along with the data.
- If the clone is performed by copying, we just copy the branchcache file.
- If we use localrepo.clone with streaming, we follow this procedure:
1. Fetch branchmap from the remote
2. Fetch the actual data.
3. Find the latest rev within branch heads (tip at the time of
branchmap fetch)
4. Update the cache for the revs in [remotetip+1, tip]
This way we ensure that the branchcache is correct even in case
of races with commits.
Dates and times that are outside the 31-bit signed range are now
compared modulo 2^31. This should prevent it from behaving badly with
very large files or corrupt dates while still having a high
probability of detecting changes.
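A minimal sketch of such a comparison, assuming the masking is done by
keeping the low 31 bits of each value. The names "_RANGEMASK" and
"same_field" are made up for this illustration.

```python
_RANGEMASK = 0x7fffffff  # keep only the low 31 bits


def same_field(stored, observed):
    """Compare two integer fields (size or mtime) modulo 2**31."""
    return (stored & _RANGEMASK) == (observed & _RANGEMASK)
```

Two values that differ by a multiple of 2^31 compare equal, which is the
(small) price paid for never overflowing the 31-bit range.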
a0d2da57b726 added this help text to hg bookmark:
If no NAME is given, the current active bookmark will be marked inactive.
But that was never actually the case.
Originally spotted by Idan Kamara <idankk86@gmail.com>.
For example LC_ALL=de_DE.utf-8 would cause the version check to fail,
because "svn, Version 1.6.12 (r955767)" with a capital "V" will be printed.
Using "svn --version --quiet" would only print the version number, but then
matching other messages, e.g. "Committed revision" would fail.
Suppose the following scenario:
1. Process A takes the lock (e.g. on commit).
2. Process B wants to grab the lock. Since the lock file exists,
an exception is raised. In the catch block the testlock
function is called.
3. Process A releases the lock.
4. Process B tries to read the lock file as a part of testlock
function. This results in OSError (ENOENT) and since we're
not inside the exception handler function this is propagated
and aborts the whole operation.
To fix this we now check in testlock function whether lock file
actually exists and if not (i.e. if readlock fails) we just return.
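The fix can be sketched as follows. This is an illustration only: it
assumes the lock is a regular file, and "read_lock_holder" and
"probe_lock" are hypothetical helpers, not the real lock module API.

```python
import errno
import os
import tempfile


def read_lock_holder(path):
    """Return the lock file content, or None if the lock has vanished."""
    try:
        with open(path) as f:
            return f.read()
    except OSError as err:
        if err.errno == errno.ENOENT:
            # The holder released the lock between our failed attempt
            # to take it and this read: treat it as "no lock".
            return None
        raise


def probe_lock(path):
    """Return the holder of the lock, or None if there is nothing to test."""
    holder = read_lock_holder(path)
    if holder is None:
        return None  # lock is gone, the caller can simply retry
    return holder


lockpath = os.path.join(tempfile.mkdtemp(), 'lock')
missing = probe_lock(lockpath)      # step 3 already happened: no file
with open(lockpath, 'w') as f:
    f.write('host:1234')
held = probe_lock(lockpath)         # lock still held by someone
```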
We give up using CPython's PythonXX.lib import libraries (and Python.h), and
now "manually" call the LoadLibrary() / GetProcAddress() Windows API's instead.
If there is a "hg-python" subdirectory (the canonical directory name for
HackableMercurial's private Python copy) next to the hg.exe, we load the
pythonXX.dll from there (feeding an absolute path to LoadLibrary) and we set
Py_SetPythonHome() to that directory, so that the Python libraries are used
from there as well.
If there is no "hg-python" subdir found next to the hg.exe, we do not feed an
absolute path to LoadLibrary. A globally installed Python DLL can thus still be
found, as before this change - that is, without having to edit, delete, rename,
or configure anything.
Note that the hg.exe built is still bound to a *specific* major version of the
pythonXX.dll (e.g. python27.dll). What version it is, is inferred from the
version of the python interpreter that was used when calling setup.py. For
example
C:\python27_x86\python.exe setup.py build_hgexe -i --compiler=mingw32
builds a hg.exe (using the mingw32 tool chain) bound to (x86) Python 2.7. And
C:\python27_x86\python.exe setup.py build_hgexe -i
builds the same using the Microsoft C compiler/linker. (Note that the Microsoft
toolchain combined with x64 CPython can be used to build an x64 hg.exe.)
setup.py is changed to write the name of the pythonlib into the generated header
file "mercurial/hgpythonlib.h", which is #included by exewrapper.c. For a Python
2.7 build, it for example contains:
#define HGPYTHONLIB "python27"
exewrapper.c then uses HGPYTHONLIB for the name of the Python dll to load.
We don't want to track mercurial/hgpythonlib.h, so we add it to .hgignore.
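As a rough sketch of the header generation (not the actual setup.py code),
the library name can be derived from the version of the running
interpreter. The function names and the comment line in the header are
assumptions made for this example.

```python
import sys


def pythonlib_name(version_info=sys.version_info):
    """Build the DLL base name, e.g. (2, 7) gives 'python27'."""
    return 'python%d%d' % (version_info[0], version_info[1])


def header_text(libname):
    """Return the content of the generated C header."""
    return ('/* this file is generated, do not edit */\n'
            '#define HGPYTHONLIB "%s"\n' % libname)
```

Calling header_text(pythonlib_name((2, 7))) yields a header containing the
#define line shown above.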
This patch invokes "os.path.isdir()" via "rawvfs" object to avoid
filename encoding, because the path passed to "os.path.isdir()"
shouldn't be encoded.
This patch newly adds "self.rawvfs" field only to "basicstore" and
"encodedstore", because "fncachestore" has "self.rawvfs" already.
This patch replaces invocation of "getsize()", which calls "os.stat()"
internally, by "vfs.stat()".
The object referred by "self.rawvfs" is used internally by
"_fncachevfs" and doesn't encode filename for each file API invocation.
This patch invokes "os.stat()" via "self.rawvfs" to avoid redundant
filename encoding: invocation of "os.stat()" via "self.vfs" hides
filename encoding and the encoding result from the caller, so it is not
appropriate when both encoded and non-encoded filenames should be
yielded.
Even though changeset 89eacc6262af improved stream_out performance by
"self.pathsep + path", this patch replaces it by
"os.path.join(self.base, path)" of vfs. So, this may increase cost to
join path components.
But this shouldn't have large impact, because:
- such cost is much less than cost of "os.stat()" which causes
system call invocation
- "datafiles()" of store object is invoked only for "hg manifest
--all" or "hg verify" which are both heavy functions
This just replaces "os.stat()" invocation: refactoring around
"self.createmode" and "vfs.createmode" initialization is omitted.
This patch also newly adds "stat()" function to "abstractvfs".
This patch defines "join()" in each class derived from "abstractvfs"
except "vfs", which already defines it.
This allows all vfs instances to be used for indirect file API
invocation.
This patch initializes the "vfs" field in the constructor of each store
class to use it for initialization of others.
In this patch, "self.vfs.base" is used to initialize "self.path",
because redoing the join of path components for "self.path" is redundant.
Defining functions for vfs support in dictionary order increases
readability/maintainability, because there are functions which invoke
file API:
- with same name: "os.listdir" and "osutil.listdir", for example
- with ambiguous names: "os.mkdir" and "util.makedirs", for example
Before this patch, there are two ambiguous variables: "havemf" and
"hasmanifest".
"havemf" means whether there are any "manifest" entries.
"hasmanifest" means whether there are any "changelog" entries
referring to "manifest" entry.
This patch renames "hasmanifest" to "refersmf" to make the
difference from "havemf" clear.
Before this patch, the "checkentry()" internal function uses both the
"node" (its own argument) and "n" (defined in the enclosing scope) variables.
Because all callers of "checkentry()" use "n" to refer to the object
which is passed to "checkentry()" as "node", both refer to the same
object in "checkentry()". So, "checkentry()" works correctly.
But such usage is not good for the independence of "checkentry()".
This patch replaces "n" in "checkentry()" with "node".
Before this patch, verify module shows verification error message
below:
unknown parent 2 <HASH_OF_P2> of <HASH_OF_P1>
even though it should show:
unknown parent 2 <HASH_OF_P2> of <HASH_OF_TARGET>
This patch uses appropriate node information.
Before this patch, there is no information about what users should (or
can) do for recovery from corruption of repositories.
This patch adds URL of the Mercurial Wiki page explaining about
recovery from corruption.
Tags initially prevented revisions from being hidden. It seemed a bad idea to have
tags refer to revisions that one can't see. But proper filtering of hidden
revisions excludes them from tag computation. Coming changelog filtering will do
that. Anyway, tags that really matter will likely be public and therefore not
hidden.
The current working directory parent and bookmarked revision are still not
hidden. Bookmarks were likely automatically moved at rewrite time; bookmarks
that remain on obsolete revisions were probably moved there on purpose.
If there are filtered changesets the cache is not valid. We'll have to cache
branchmap for filtered state too, but for now recomputing the branchmap is
enough.
svn --version --quiet is implemented since svn 0.14.1 (August 2002)
and prints just the version number, not the long output (21 lines)
of "svn --version".
Additionally I expect this output format to be more stable, at least
it is not changed with different translations.
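To illustrate why the quiet output is easier to match, here is a small
sketch. The two regular expressions are assumptions written for this
example and are not taken from the actual version check.

```python
import re

# Pattern tied to the English long output,
# e.g. "svn, version 1.6.12 (r955767)".
LONG_RE = re.compile(r'^svn,\s+version\s+(\d+)\.(\d+)')
# Pattern for "svn --version --quiet", which prints only the number.
QUIET_RE = re.compile(r'^(\d+)\.(\d+)')


def parse_version(output, pattern):
    """Return (major, minor), or None if the output does not match."""
    m = pattern.search(output)
    if m is None:
        return None
    return (int(m.group(1)), int(m.group(2)))
```

The long pattern stops matching as soon as a translation changes the
wording (for instance a capital "V"), while the quiet pattern only depends
on the version number itself.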
If the input path is already longer than _maxstorepathlen, then we can skip
doing the basic encoding (encodedir, _encodefname and _auxencode) and directly
proceed to the hashed encoding. Those encodings, if at all, will make the path
only longer.
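The shortcut can be sketched like this. Everything here is a simplified
stand-in: "basic_encode" and "hashed_encode" are placeholders for the real
encoders, and the limit of 120 is an assumed value for _maxstorepathlen.

```python
import hashlib

MAXSTOREPATHLEN = 120  # assumed value of _maxstorepathlen


def basic_encode(path):
    # Placeholder for encodedir/_encodefname/_auxencode: escape uppercase
    # letters. Like the real encodings, it never makes the path shorter.
    return ''.join('_' + c.lower() if c.isupper() else c for c in path)


def hashed_encode(path):
    # Placeholder for the hashed encoding: a fixed-length name.
    return 'dh/' + hashlib.sha1(path.encode('utf-8')).hexdigest()


def hybrid_encode(path):
    # Shortcut: the basic encoding can only lengthen the path, so an
    # input that is already too long goes straight to the hashed form.
    if len(path) > MAXSTOREPATHLEN:
        return hashed_encode(path)
    encoded = basic_encode(path)
    if len(encoded) > MAXSTOREPATHLEN:
        return hashed_encode(path)
    return encoded
```

The result is the same with or without the first test; it only avoids
computing a basic encoding that would be thrown away.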
This changeset allows the changelog object to be "filtered". You can assign a set of
revision numbers to the `changelog.filteredrevs` attribute. The changelog will
then pretend these revisions do not exist in this repo.
A few methods need to be altered to achieve this behavior:
- tip
- __iter__
- irevs
- hasnode
- headrevs
For consistency and to help debugging, the following methods are altered too.
Tests tend to show it's not necessary to alter them, but having them raise a proper
exception helps to detect bad access to filtered revisions.
- rev
- node
- linkrev
- parentrevs
- flags
The following methods would also need alteration for consistency purpose but
this is non-trivial and not done yet.
- nodemap
- strip
The C version of headrevs is not run if there is any revision to filter. It'll
need a proper rewrite later to restore performance.
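A toy model of the behavior described above, for illustration only.
"ToyChangelog" is hypothetical, its linear history is an assumption, and
the choice of IndexError as the "proper exception" is an assumption too.

```python
class ToyChangelog:
    """Linear toy history of `nrevs` revisions (hypothetical)."""

    def __init__(self, nrevs):
        self._nrevs = nrevs
        self.filteredrevs = frozenset()

    def __iter__(self):
        # Filtered revisions are never yielded.
        for rev in range(self._nrevs):
            if rev not in self.filteredrevs:
                yield rev

    def tiprev(self):
        # The tip is the highest revision that is not filtered.
        for rev in range(self._nrevs - 1, -1, -1):
            if rev not in self.filteredrevs:
                return rev
        return -1

    def parentrevs(self, rev):
        # Raising makes bad access to a filtered revision visible.
        if rev in self.filteredrevs:
            raise IndexError(rev)
        return (rev - 1, -1)


cl = ToyChangelog(5)
cl.filteredrevs = frozenset([3, 4])
visible = list(cl)
```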
Make the pure python implementation of headrevs available to derived classes. It
is important because filtering logic applied by `revlog` derived class won't
have effect on `index`. We want to be able to bypass this C call to implement
our own.
This prepares changelog level filtering. We can't assume that any revision can
be heads because filtered revisions need to be excluded.
New algorithm:
- All revisions now start as "non heads",
- every revision we iterate over is made candidate head,
- parents of iterated revisions are definitely not head.
Filtered revisions are never iterated over and never considered as candidate
head.
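The algorithm above can be sketched on a plain list of parents. This is a
simplified illustration, not the revlog code: "parents" and "filteredrevs"
are plain Python containers chosen for the example.

```python
def headrevs(parents, filteredrevs=frozenset()):
    """Return head revisions; parents[rev] is (p1, p2), -1 means none."""
    count = len(parents)
    # All revisions start as "non heads". The extra slot at the end
    # absorbs writes for the null revision (index -1).
    ishead = [False] * (count + 1)
    for rev in range(count):
        if rev in filteredrevs:
            continue  # never iterated over, never a candidate head
        ishead[rev] = True  # every iterated revision is a candidate
        p1, p2 = parents[rev]
        ishead[p1] = ishead[p2] = False  # parents are definitely not heads
    return [rev for rev in range(count) if ishead[rev]]


# 0 <- 1 <- 2 and 1 <- 3, then 4 merges 2 and 3
graph = [(-1, -1), (0, -1), (1, -1), (1, -1), (2, 3)]
```

With no filtering the only head is the merge. Once the merge is filtered,
its two parents become heads again.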
This prepares changelog level filtering. We need the algorithms used in revlog to
work on a subset of revisions. To achieve this, the use of explicit range of
revision is banned. `range` and `xrange` calls are replaced by a `revlog.irevs`
method. A filtering subclass can then override the `irevs` method to filter out
revisions.
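A minimal sketch of the pattern, assuming a hypothetical "irevs(start,
stop)" signature. "ToyRevlog", "FilteredRevlog" and "count_revs" are
invented for the example.

```python
class ToyRevlog:
    def __init__(self, nrevs):
        self._nrevs = nrevs

    def __len__(self):
        return self._nrevs

    def irevs(self, start=0, stop=None):
        """Iterate over revision numbers in [start, stop)."""
        if stop is None:
            stop = len(self)
        return iter(range(start, stop))


class FilteredRevlog(ToyRevlog):
    def __init__(self, nrevs, filteredrevs):
        super().__init__(nrevs)
        self.filteredrevs = frozenset(filteredrevs)

    def irevs(self, start=0, stop=None):
        # Same iteration as the base class, minus the filtered revisions.
        for rev in super().irevs(start, stop):
            if rev not in self.filteredrevs:
                yield rev


def count_revs(rl):
    # An algorithm written against irevs() works on both classes.
    return sum(1 for _ in rl.irevs())
```

Code that loops over "range(len(rl))" would still see filtered revisions,
which is why explicit ranges have to go.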
We can only use copy clone if the cloned repo does not have any secret changeset.
The current method for that is to run the "secret()" revset on the remote repo.
But with proper filtering of hidden or unserved revisions by the remote, this
revset won't return any revision even if some exist remotely. This changeset
adds an explicit function to know whether a repo has any secret revision or not.
The other option would be to disable filtering for the query but I prefer the
approach above, lighter both regarding code and performance.
The forced recomputation of the branch cache was introduced by `b4909adfc093`.
Back there, `addchangegroup` did not handle any lock logic.
Later `6042c410e045` introduced lock logic to `addchangegroup`. Its description
does not explain why the `updatebranchcache` call is made outside locking. I
believe that the lock was released there because it fit well with the transaction
release already in the code.
Finally `1eda82d76f0c` moved all "unlocked" code of `addchangegroup` to an
`repo._afterlock` callback.
I do not think that the call to `updatebranchcache()` needs to be done
outside locking. It may even be a bad idea to do so. Bringing this call back
into the `addchangegroup` function makes the flow simpler and eases the upcoming
changelog level filtering work.
At the moment the resolve command doesn't save progress during the resolve process. For example, if you try to resolve 100 conflicting files and interrupt the process (e.g., you close the external merge tool) after resolving 50 files, you'll end up with 100 unresolved conflicts. Saving the progress helps a lot with long-running merges. It's easy to achieve the same behavior with a simple script that calls the resolve command for each unresolved file, but it makes sense to make such behavior the default.
Before this patch, the argument bound to the source repository of
incoming bookmarks for "bookmarks.diff()" is named as "remote".
But in "hg outgoing" case, this argument is bound to local repository
object.
In addition to it, "local"/"remote" seem to mean not the direction of
propagation of bookmarks, but just the location of cooperative
repositories.
To indicate the direction of propagation of bookmarks clearly on the
source code, this patch uses "d(st)" and "s(rc)" combination instead
of "l(ocal)" and "r(emote)" one.
- "repo" and "remote" arguments are renamed to "dst" and "src"
- "lmarks" and "rmarks" variables are renamed to "dmarks" and "smarks"
This patch also changes initialization order of "*opener" and "*vfs"
fields: first, "*vfs" fields are initialized, and then, "*opener"
ones are initialized.
For backwards compatibility, aliases for the old names are added,
except for "abstractopener", "statichttpopener" and "_fncacheopener",
because these are not used in Mercurial core implementation after this
patch.
"_fncacheopener" was only referred in "fncachestore" constructor, so
this patch also renames from "_fncacheopener" to "_fncachevfs" there.
This alternate syntax was proposed by Bryan O'Sullivan in a review of
441ebe37ceb5. I haven't been able to measure any particular performance
difference, but the new syntax is more concise and easier to read.
This patch adds "descendant()", which uses "revlog.descendant()" for
descendant examination, to changectx.
This implementation is more efficient than "new in old.descendants()"
expression, because:
- "changectx.descendants()" creates temporary "changectx" objects,
but "revlog.descendant()" doesn't
"revlog.descendant()" checks only revision numbers of descendants.
- "revlog.descendant()" stops scanning, when scanning of all
revisions less than one of examination target is finished
this can avoid useless scanning in "not descendant" case.
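The early stop can be sketched on a plain list of parents. This is a toy
version of the idea, not the revlog implementation; "descendants" and
"is_descendant" work on revision numbers only.

```python
def descendants(parents, start):
    """Yield descendants of `start` in increasing revision order."""
    seen = {start}
    for rev in range(start + 1, len(parents)):
        if any(p in seen for p in parents[rev]):
            seen.add(rev)
            yield rev


def is_descendant(parents, start, end):
    """True if revision `end` is a descendant of revision `start`."""
    for rev in descendants(parents, start):
        if rev == end:
            return True
        if rev > end:
            # Descendants come in increasing order: `end` was passed
            # without being found, so scanning further is useless.
            break
    return False


# 0 <- 1 and 0 <- 2 <- 3
graph = [(-1, -1), (0, -1), (0, -1), (2, -1)]
```

Only integers are handled here; no context object is created while
scanning.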
(This is not yet enabled; it will be turned on in a followup patch.)
The path encoding performed by fncache is complex and (perhaps
surprisingly) slow enough to negatively affect the overall performance
of Mercurial.
For a short path (< 120 bytes), the Python code can be reduced to a fairly
tractable state machine that either determines that nothing needs to be
done in a single pass, or performs the encoding in a second pass.
For longer paths, we avoid the more complicated hashed encoding scheme
for now, and fall back to Python.
Raw performance: I measured in a repo containing 150,000 files in its tip
manifest, with a median path name length of 57 bytes, and 95th percentile
of 96 bytes.
In this repo, the Python code takes 3.1 seconds to encode all path
names, while the hybrid C-and-Python code (called from Python) takes
0.21 seconds, for a speedup of about 14.
Across several other large repositories, I've measured the speedup from
the C code at between 26x and 40x.
For path names above 120 bytes where we must fall back to Python for
hashed encoding, the speedup is about 1.7x. Thus absolute performance
will depend strongly on the characteristics of a particular repository.
For a netbeans clone on Windows 7 x64:
Before:
$ hg perffncacheencode
! wall 3.516000 comb 3.525623 user 3.525623 sys 0.000000 (best of 3)
After:
$ hg perffncacheencode
! wall 3.443000 comb 3.447622 user 3.447622 sys 0.000000 (best of 3)
Before this patch, zip archives created by "hg archive" are extracted
with unexpected timestamp, if TZ is not configured as GMT.
This patch adds "extended-timestamp" extra block to zip archives, and
unzip will extract such archives with timestamp specified in added
extra block, even though TZ is not configured as GMT.
Please see documents below for detail about specification of zip file
format and "extended-timestamp" extra block:
http://www.pkware.com/documents/casestudies/APPNOTE.TXT
http://www.opensource.apple.com/source/zip/zip-6/unzip/unzip/proginfo/extra.fld
Original implementation of this patch was suggested by "Jun Omae
<jun66j5@gmail.com>".
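For illustration, such an extra block can be attached with the standard
"zipfile" module as sketched below. This is not the archival code itself:
"ut_extra" and "add_file" are hypothetical helpers, and only the
modification time is stored.

```python
import io
import struct
import time
import zipfile

UT_EXTRA_ID = 0x5455  # "UT": extended-timestamp extra block


def ut_extra(mtime):
    # header ID, data size (5 bytes), flags (bit 0: mtime present), mtime
    return struct.pack('<HHBl', UT_EXTRA_ID, 5, 1, int(mtime))


def add_file(zf, name, data, mtime):
    # The classic DOS timestamp has no timezone; the extra block
    # carries the time as seconds since the epoch (UTC).
    info = zipfile.ZipInfo(name, time.gmtime(mtime)[:6])
    info.extra = ut_extra(mtime)
    zf.writestr(info, data)


buf = io.BytesIO()
mtime = 1350000000
with zipfile.ZipFile(buf, 'w') as zf:
    add_file(zf, 'hello.txt', b'hello\n', mtime)
with zipfile.ZipFile(buf) as zf:
    stored = zf.infolist()[0].extra
```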
Not yet used (will be enabled in a later patch).
This patch is a stripped down version of patches originally created by
Bryan O'Sullivan <bryano@fb.com>
For a netbeans clone on Windows 7 x64:
Before:
$ hg perffncacheload
! wall 0.124000 comb 0.124801 user 0.124801 sys 0.000000 (best of 76)
After:
$ hg perffncacheload
! wall 0.096000 comb 0.093601 user 0.078001 sys 0.015600 (best of 97)
For a netbeans clone on Windows 7 x64:
Before:
$ hg perffncachewrite
! wall 0.210000 comb 0.218401 user 0.202801 sys 0.015600 (best of 47)
After:
$ hg perffncachewrite
! wall 0.104000 comb 0.109201 user 0.078000 sys 0.031200 (best of 95)
I don't think we will ever have anything in the store that resides inside a
directory that ends in .i or .d under store/ that we wouldn't want to have
direncoded. The files not under data/ surely don't need direncoding, but it
doesn't harm to let these few run through it. It hurts more to check whether the
thousands of other files start with 'data/'. They do anyway.
See also 67e6074ba430 (fixed with 0c522fe42894), which moved the direncoding
from filelog into store
hgweb has an incorrect padding calculation, causing the text to move further
away from the graph the more branches there are (issue3626). This patch fixes
all existing templates (gitweb, monoblue, paper and spartan).
Tests updated by Patrick Mezard <patrick@mezard.eu>
by refactoring
    for i, n in enumerate(res):
        if n:
            <main code block>
to
    for i, n in enumerate(res):
        if not n:
            continue
        <main code block>
(no functional change)
Merely creating and using a generator has a measurable impact,
particularly since the common case for stream_out is generators that
yield just once. Avoiding generators improves stream_out performance
by about 7%.
Auditing at this stage is both pointless (paths are already trusted by
the local repo) and expensive. Skipping the audits improves stream_out
performance by about 15%.
New commits from the amend process were created without any phase constraint. If
the amended changeset had a different phase from its parent, the phase data
was lost.
This changeset ensures the new commits are created in the same phase as the
original changeset.
Subversion 1.7 changes its XML output to include an explicit encoding tag:
<?xml version="1.0" encoding="UTF-8"?>
This triggers xml.dom.minidom to always return unicode strings, causing
other parts of the code to explode.
We unconditionally encode path names before handing them back, which
works with both str (actually a no-op) and unicode values.
JavaScript .replace always magically processed $$ $& $' $` in replacement
strings and thus displayed subject lines incorrectly in the graph view.
Instead of regexps and .replace we now just create the strings the right way in
the first place.
When we rewrite a bookmarked changeset, we want to update the
bookmark on its successors. But the successors are not descendants
of its precursor (by definition). This changeset alters the bookmarks
logic to update bookmark location if the newer location is a successor
of the old one[1].
note: valid destinations are in fact any kind of successors of any kind
of descendants (recursively.)
This changeset requires the enabling of the obsolete feature in
some bookmark tests.
We usually update bookmarks only if the new location is descendant of the old
bookmarks location. We extract this logic into a function. This is the first
step to allow more complex logic using obsolescence in this validation of the
bookmark movement.
A relevant obsolete marker may have been added -after- we previously
exchanged the changeset. We have to search for remote heads that
disappear by the sole fact of pushing obsolescence.
This case will also happen when remote got the new version from a
repository that does not propagate obsolescence markers.
Checkheads was more permissive than expected. When the remote heads
are public we don't need to search for successors. None will make a
public head disappear.
If all heads are bookmarks, merge fails to find what node to merge
with (throws an IndexError while indexing into the non-bookmark heads
list) as of 208ca72b9343. This catches that case and prints an error
to specify a rev explicitly.
When running:
$ hg debugfileset 'binary() and ignored()'
getfileset() was correctly retrieving ignored files but
matchctx.existing() was not taking them into account. Just add them along
with unknown files.
By default, unknown files are ignored. If the 'unknown()' predicate
appears in the syntax tree, then they are taken into account.
Unfortunately, matchctx.existing() was filtering against non-deleted
context files, which does not include unknown files. So:
$ hg debugfileset 'binary() and unknown()'
would not return existing binary unknown files.
Running:
$ hg debugfileset 'binary()'
would traceback if there were one deleted file in the working directory.
It happened because matchctx.existing() was filtering files against the
ctx.__contains__() but deleted files are still considered part of
workingctx.
The `repair` code builds a giant revset query instead of using the "%lr" idiom.
It is inefficient and crashes when the number of stripped changesets is too big.
This changeset replaces the bad code by a better revset usage.
_partialmatch() does prefix matching against nodes. The string passed
to _partialmatch() may actually be any string, not only a prefix.
For example,
"63af8381691a9e5c52ee57c4e965eb306f86826e or 300" is a good
argument for _partialmatch().
When _partialmatch() searches using radix tree, index_partialmatch()
C function shouldn't try to match too long strings.
This function is designed to be used by all code that creates new
obsolete markers in the local repository.
It is not used by debugobsolete because debugobsolete allows the
use of an unknown hash as argument.
Before this changeset, the extra commit created during amend had
the same description as the final commit. This was a bit confusing
when trying to understand what that extra commit was about.
This changeset changes the description of such commit to:
temporary amend commit for <amend-commit-hash>
The old behaviour was not a big deal, but would become more confusing
once we use obsolescence marker instead of stripping the precursors.
This also helps if the user restores a strip backup.
This allows proper recovery of an interrupted amend process.
No changes are made to the logic besides:
- indent operations into a single try-except clause,
- some comment and code wrapping to 80 chars,
- strip logic should not be contained in the transaction and is extracted from
the main code.
This changeset introduces caches on the `obsstore` that keeps track of sets of
revisions meaningful for obsolescence related logics. For now they are:
- obsolete: changesets used as precursors (and not public),
- extinct: obsolete changesets with obsolete descendants only,
- unstable: non obsolete changesets with obsolete ancestors.
The cache is accessed using the `getobscache(repo, '<set-name>')` function which
builds the cache on demand. The `clearobscaches(repo)` function takes care of
clearing the caches if any.
Caches are cleared when one of these events happens:
- a new marker is added,
- a new changeset is added,
- some changesets are made public,
- some public changesets are demoted to draft or secret.
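The on-demand caching can be sketched as follows. This is a simplified
model: "ToyObsStore" and the "cachefor" registry are assumptions made for
the example, the functions take the toy store instead of a repo, and phases
are ignored when computing the "obsolete" set.

```python
class ToyObsStore:
    """Toy store: markers are (precursor, successor) pairs."""

    def __init__(self):
        self.markers = []
        self.caches = {}

    def add(self, marker):
        self.markers.append(marker)
        self.caches.clear()  # a new marker invalidates every set


cachefuncs = {}


def cachefor(name):
    """Register the function computing the set called `name`."""
    def decorator(func):
        cachefuncs[name] = func
        return func
    return decorator


@cachefor('obsolete')
def _computeobsolete(store):
    # Simplification: every precursor counts, public or not.
    return frozenset(prec for prec, succ in store.markers)


def getobscache(store, name):
    # Build the set the first time it is requested, then reuse it.
    if name not in store.caches:
        store.caches[name] = cachefuncs[name](store)
    return store.caches[name]


def clearobscaches(store):
    store.caches.clear()


store = ToyObsStore()
store.add(('a', 'b'))
first = getobscache(store, 'obsolete')
second = getobscache(store, 'obsolete')   # served from the cache
store.add(('c', 'd'))                     # clears the caches
third = getobscache(store, 'obsolete')
```

Declaring another set then only requires one more decorated function.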
Declaration of more sets is made easy because we will have to handle at least
two other "troubles" (latecomer and conflicting).
Caches are now used by revset and changectx. It is usually not much more
expensive to compute the whole set than to check the property of a few elements.
The performance boost is welcome in case we apply obsolescence logic on a lot of
revisions. This makes the feature usable!
The export command didn't output the diffs in color, even when color support
was enabled. This patch fixes that by making the export command use the default
ui.write method, instead of directly manipulating the ui.fout file object.
Also added a test case to verify color output to test-export.t.
It's preferable to report "ssl required" as an error, so that the client
can detect the error and exit with 255. Currently hg exits with 1, which is
"nothing to push."