Most local change that occurs during a pull are doing within a `transaction`.
Currently this mean (1) adding new changeset (2) adding obsolescence markers. We
want the two operations to be done in the same transaction. However we do not
want to create a transaction if nothing is added to the repo. Creating an empty
transaction would drop the previous transaction data and confuse tool and people
who are still using rollback.
So the current pull code has some logic to create and handle this transaction on
demand. We are moving this logic in to the `pulloperation` object itself to
simplify this lazy creation logic through all different par of the push.
Note that, in the future, other part of pull (phases, bookmark) will probably
want to be part of the transaction too.
The obsolescence marker exchange code was already extracted during a previous
cycle. We are moving the extracted functio in this module. This function will
read and write data in the `pulloperation` object and I prefer to have all core
function collaborating through this object in the same place.
This changeset is pure code movement only. Code change for direct consumption of
the `pulloperation` object will come later.
This object will hold all data and state gathered through the pull. This will
allow us to split the long function into multiple small one. Smaller function
will be easier to maintains and wrap. The idea is to blindly store all
information related to the pull in this object so that each step and extension
can use them if necessary.
We start by putting the `repo` variable in the object. More migration in other
function.
The localrepo class if far too big. Push and pull logic are extracted and
reworked to better fit with the fact we exchange more than bundle now.
This changeset extract the pulh code. later changeset will slowly slice it into
smaller brick.
The localrepo.pull method is kept for now to limit impact on user code. But it
will be ultimately removed, now that the public API is hold by peer.
Now that every necessary information is held in the `pushoperation` object, we
can extract the new common set computation to it's own function.
This changeset is pure code movement only.
The phase synchronisation start by computing the new set of common head between
local and remote and then do the phase synchronisation on this set. This new
common set logic will eventually be used by the obsolescence markers exchange.
So we are going to split the long phase synchronisation in two.
Now that every necessary information is held in the `pushoperation` object, we
can extract the discovery logic to it's own function.
This changeset is pure code movement only.
Now that every necessary information is held in the `pushoperation` object, we
can extract the part responsible of aborting the push to it's own function.
This changeset is mostly pure code movement. the exception is the fact this
function returns a value to decide if changeset bundle should be pushed.
The fact that there is some unknown changes on remote one of the result of
discovery. It is then used by some push validation logic.
We move it in the object to be able to extract the said logic.
Now that every necessary information is held in the `pushoperation` object, we
can extract the logic pushing changeset to it's own function.
This changeset is pure code movement only.
The heads of the remote repository are used to detect race when pushing
changeset. We now store this information in `pushoperation` object to allow
extraction of the changeset pushing part.
Performance Benchmarking:
$ time hg log -qr "first(matching(0))"
0:3a6a38229d41
real 0m2.213s
user 0m2.149s
sys 0m0.055s
$ time ./hg log -qr "first(matching(0))"
0:3a6a38229d41
real 0m0.177s
user 0m0.137s
sys 0m0.038s
Performance Benchmarking:
$ time hg log -qr "first(file(README))"
0:3a6a38229d41
real 0m2.234s
user 0m2.180s
sys 0m0.044s
$ time ./hg log -qr "first(file(README))"
0:3a6a38229d41
real 0m0.172s
user 0m0.129s
sys 0m0.042s
This improves the performance of the revsets 'adds' 'modifies' and 'removes'
Performance benchmarking:
$ time hg log -qr "first(adds(README))"
0:3a6a38229d41
real 0m2.279s
user 0m2.222s
sys 0m0.053s
$ time ./hg log -qr "first(adds(README))"
0:3a6a38229d41
real 0m0.172s
user 0m0.131s
sys 0m0.041s
$ time hg log -qr "first(modifies(README))"
1:692bee203f23
real 0m2.292s
user 0m2.227s
sys 0m0.061s
$ time ./hg log -qr "first(modifies(README))"
1:692bee203f23
real 0m0.178s
user 0m0.130s
sys 0m0.038s
$ time hg log -qr "first(removes(README))"
2379:f24de2acd560
real 0m2.297s
user 0m2.235s
sys 0m0.058s
$ time ./hg log -qr "first(removes(README))"
2379:f24de2acd560
real 0m0.975s
user 0m0.797s
sys 0m0.056s
Performance Benchmarking:
$ time hg log -qr "first(public())"
...
real 0m1.184s
user 0m1.051s
sys 0m0.130s
$ time ./hg log -qr "first(public())"
...
real 0m0.548s
user 0m0.427s
sys 0m0.118s
Performance benchmarking:
$ time hg log -qr "first(merge())"
102:1634643e0cd8
real 0m0.276s
user 0m0.208s
sys 0m0.047s
$ time ./hg log -qr "first(merge())"
102:1634643e0cd8
real 0m0.192s
user 0m0.154s
sys 0m0.027s
Performance benchmarking:
$ time hg log -qr "first(grep(hg))"
0:3a6a38229d41
real 0m2.214s
user 0m2.163s
sys 0m0.045s
$ time ./hg log -qr "first(grep(hg))"
0:3a6a38229d41
real 0m0.211s
user 0m0.146s
sys 0m0.035s
Performance benchmarking:
$ time hg log -qr "first(desc(hg))"
changeset: 0:3a6a38229d41
real 0m2.210s
user 0m2.158s
sys 0m0.049s
$ time ./hg log -qr "first(desc(hg))"
changeset: 0:3a6a38229d41
real 0m0.171s
user 0m0.131s
sys 0m0.035s
Performance Benchmarking:
$ time hg log -qr "first(date(05/03/2005))"
0:3a6a38229d41
real 0m3.157s
user 0m2.994s
sys 0m0.087s
$ time ./hg log -qr "first(date(05/03/2005))"
0:3a6a38229d41
real 0m0.509s
user 0m0.289s
sys 0m0.070s
Performance benchmarking:
$ time hg log -qr "first(author(mpm))"
0:3a6a38229d41
real 0m3.486s
user 0m3.317s
sys 0m0.077s
$ time ./hg log -qr "first(author(mpm))"
0:3a6a38229d41
real 0m0.551s
user 0m0.295s
sys 0m0.072s
Performance benchmarking:
$ time hg log -qr "first(keyword(changeset))"
0:3a6a38229d41
real 0m3.466s
user 0m3.345s
sys 0m0.072s
$ time ./hg log -qr "first(keyword(changeset))"
0:3a6a38229d41
real 0m0.365s
user 0m0.199s
sys 0m0.083s
Performance benchmarking:
$ time hg log -qr "first(branch(default))"
0:3a6a38229d41
real 0m3.130s
user 0m3.025s
sys 0m0.074s
$ time ./hg log -qr "first(branch(default))"
0:3a6a38229d41
real 0m0.300s
user 0m0.198s
sys 0m0.069s
Performance Benchmarking:
$ time hg log -l1 -qr "branch(default)"
0:3a6a38229d41
real 0m3.366s
user 0m3.217s
sys 0m0.095s
$ time ./hg log -l1 -qr "branch(default)"
0:3a6a38229d41
real 0m0.389s
user 0m0.199s
sys 0m0.061s
Now that every necessary information is held in the `pushoperation` object, we
can finally extract the phase synchronisation phase to it's own function. This
is the first concrete block of code we extract from the huge push function.
Hooray!
This changeset is pure code movement only.
The set of outgoing and common changeset are used by phases to compute the new
common set between local and remote. So we need to move it into the object to
extract the phase sync from the god function.
Note that this information will be used by obsolescence markers too.
The return code convey information about the success of changeset push. This is
used by phases to compute the new common set between local and remote. So we
need to move it into the object to extract the phase sync from the god
function.
Note that this information will be used by obsolescence markers too.
Now that pushop is holding all the push related data, we do not really need a
closure anymore. So we start feeding the object to `localphasemove` and will
make it a normal function in the next commit.
During push, we try to lock the local repo to move local phase according to what
we saw/pushed on the remote repo. Locking the repo may fail, in that case we let
the push proceed without local phase movement (printing warning).
This mean we have code in phase synchronisation that will check if the local repo is
locked or not. we need to move this information in the push object to be able to
extract the phase synchronisation in its own function. This is done as a boolean
because putting reference to the actual lock outside of the main function
sounded a bad idea.
The obsolescence marker exchange code was already extracted during a previous
cycle. We are moving the extracted functio in this module. This function will
read and write data in the `pushoperation` object and I prefer to have all core
function collaborating through this object in the same place.
This changeset is pure code movement only. Code change for direct consumption of
the `pushoperation` object will come later.
This class allows us to return values from large revsets as soon as they are
computed instead of having to wait for the entire revset to be calculated.
win32.spawndetached starts the detached process by `cmd.exe` (or COMSPEC). The
pid it returned was the one of cmd.exe and not the one of the detached process.
When this pid is used to kill the process, the detached process is not killed,
but only cmd.exe.
With this patch the pid of the detached process is written to the pid file.
Killing the process works as expected.
The pid is only evaluated on writing the pid file. It is unnecessary to search
the pid when it is not needed. And more important, it probably does not yet
exist right after the cmd.exe process was started. When the pid is written to
the file, waiting for the start of the detached process has already happened.
Use this functionality instead of writing a 2nd wait function.
Many tests on windows will not fail anymore, all those with the first failing
line "abort: child process failed to start". (The processes still hanging
around from previous test runs have to be killed first. They still block a
tcp port.)
A good test for the functionality of this patch is test-treediscovery.t,
because it starts and kills `hg serve -d` several times.
changegroupsubset will take the parents of the roots to find the bases.
Other parts of Mercurial do not expect that, and a result of that is that some
bundles contain more changesets than necessary.
No real changes here - just renaming a parameter to document what it is.
This note (which actually is a warning) frequently caused confusion.
"unsynced" is not a well established user-facing concept in Mercurial and the
message was not very specific or helpful.
Instead, show a messages like:
remote has heads on branch 'default' that are not known locally: 6c0482d977a3
and show it before aborting on "push creates new remote head". This will also
give more of a hint in the case where the branch has been closed remotely and
'hg heads' thus not would show any new heads after pulling.
A similar (but actually very different) message was addressed in cbd5a12a601a.
Missing templates where not reported as a problem, only an empty bracket
were shown as indication of no found template directory:
$ hg debuginstall
*...some lines*
checking templates ()...
*...some lines*
no problems detected
Now the problem is reported and extended with some information. The style
of the messages is adapted to the other messages of debuginstall.
When no templates directories exist, it writes:
$ hg debuginstall
*...some lines*
checking templates ()...
no template directories found
(templates seem to have been installed incorrectly)
*...some lines*
1 problems detected, please check your install!
When the template map is not found, it writes:
$ hg debuginstall
*...some lines*
checking templates (/path/to/mercurial/templates)...
template 'default' not found
(templates seem to have been installed incorrectly)
*...some lines*
1 problems detected, please check your install!
When the template map is buggy the message is the same as before. The error
message is shown before the line "(templates seem ...)".
No test is added because testing this failure is complicated. It would
require to modify the templates directory of the mercurial installation,
or to monkey patch a function (os.listdir or any from mercurial.templater)
by a test extension.
A message like this was sometimes shown when pushing:
remote: waiting for lock on repository foo held by 'mercurial:20858'
That could scare users, making them wonder whether the push actually succeeded.
To mitigate that fear, issue an additional "warning" such as:
got lock after 2 seconds
The return value from lock.lock.lock() was unused - instead we return the
delay.
This also adds the first test coverage for waiting for locks.
Pure mercurial (i.e. without c extensions) does not support partialmatch() on
the revlog index, so we must fall back to use revlog._partialmatch() to handle
that case for us. The tests caught this.
We don't use revlog._partialmatch() for the normal case because it performs a
very expensive index iteration when the string being tested fails to find a
unique result via index.partialmatch(). It does this in order to filter
out hidden revs in hopes of the string being unique amongst non-hidden revs.
For the shortest(node) case, we'd prefer performance over worrying about
hidden revs.
Adds a pad template function with the following signature:
pad(text, width, fillchar=' ', right=False)
This uses the standard python ljust and rjust functions to produce a string
that is at least a certain width. This is useful for aligning variable length
strings in log output (like user names or shortest(node) output).
Adds a '{shortest(node)}' template function that results in the shortest hex node
that uniquely identifies the changeset at that time. The minimum length can be
specified as an optional second argument and defaults to 4.
This is useful for producing prettier log output, like so:
@ durham shortestnode
| 77cf template: add pad function for padding output
|
o durham
| b183 template: add shortestnode keyword
|
o pierre-yves @
| 6545 backout: add a message after backout that need manual commit
|
| o durham manifestcache
|/ 93f0 manifest cache
|
| o durham catperf
| | c765 cat: increase perf when catting single files
| |
| o durham
|/ 9c53 changectx: increase perf of walk function
|
Added set caching to the baseset class. It lazily builds the set whenever it's
needed and keeps a reference which is returned when the set is requested
instead of being built again.
The bookmark exchange code was already extracted during a previous cycle. This
changesets moves the extracted function in this module. This function will read
and write data in the `pushoperation` object and It is preferable to have all
core function collaborating through this object in the same place.
This changeset is pure code movement only. Code change for direct consumption of
the `pushoperation` object will come later.
This object will hold all data and state gathered through the push. This will
allow us to split the long function into multiple small one. Smaller function
will be easier to maintains and wrap. The idea is to blindly store all
information related to the push in this object so that each step and extension
can use them if necessary.
We start by putting the `repo` variable in the object. More migration in other
changeset.
The localrepo class if far too big. Push and pull logic will be extracted and
reworked to better fit with the fact they now exchange more than plain changeset
bundle.
This changeset extract the push code. later changeset will slowly slice this
over 200 hundred lines and 8 indentation level function into smaller saner
brick.
The localrepo.push method is kept for now to limit impact on user code. But it
will be ultimately removed, now that the public supposed API is hold by peer.
Now that discovery is working on unfiltered changeset, I had a good occasion to
look at that bug again. This let me realise that a trivial node vs rev
comparision was the cause of this two years old bugs…
Happy second birthday phases!
Running `hg log --style compact` (or any other style) raised a traceback when
no template directory was there. Now there is a message:
Abort: style 'compact' not found
(available styles: no templates found, try `hg debuginstall` for more info)
There is no test because this would require to rename the template directory.
But this would influence other tests running in parallel. And when the test
would be aborted the wrong named directory would remain, especially a problem
when running with -l.
For technical reason (discovery, obsolescence marker) the hash of secret
changeset are communicated outside of your repo. We clarifie that in the help so
that people does not used hash of secret changeset as nuclear launch code.
This removes an optimization that was introduced in 5a644704d5eb but was too
aggressive - as indicated by how it changed test-mq-merge.t .
We are walking filelogs to find copy sources and we can thus not be sure to hit
the base revision and find the renamed file there - it could also be in the
first ancestor of the base ... in the filelog.
We are walking the filelog and can thus not easily know when we hit the first
ancestor of the base revision and which filename to look for there. Instead, we
use _findlimit like mergecopies do: The lower bound for how far we have to go
is found from the lowest changelog revision that is an ancestor of only one of
the compared revisions. Any filelog ancestor with a revision number lower than
that revision will be the ancestor of both compared revisions, and there is
thus no reason to go further back than that.
Special case the single file case in hg cat. This allows us to avoid
parsing the manifest, which shaves 15% off hg cat perf. This is worth
it, since automation often uses hg cat for retrieving single files.
When running 'hg cat -r . <file>' it was doing an expensive ctx.walk(m) which
applied the regex to every file in the manifest.
This changes changectx.walk to iterate over just the files in the regex, if no
other patterns are specified. This cuts hg cat time by 50% in our repo and
probably benefits a few other commands as well.
When users are using a revset they can get multiple password prompts.
This prompts have no extra information about which password is being requested
so I added the authuri to the prompt to make it recognizable.
As in:
$ hg log -r "outgoing('https://bitbucket.org/mg/test') -
outgoing('https://bitbucket.org/nesneros/test')"
http authorization required
realm: Bitbucket.org HTTP
user: interrupted!
I changed it to describe the url when prompting for password.
As in:
$ hg log -r "outgoing('https://bitbucket.org/mg/test') -
outgoing('https://bitbucket.org/nesneros/test')"
http authorization required for https://bitbucket.org/mg/test
realm: Bitbucket.org HTTP
user: interrupted!
Before this patch, there is no explicit description about pattern
matching against directories, even though users may understand it from
"plain examples" in "hg help patterns".
This patch adds description about pattern matching against
directories.
Before this patch, online help of "adds()", "contains()", "filelog()",
"file()", "modifies()" and "removes()" predicates doesn't explain
about how the pattern without explicit kind like "glob:" is treated,
even though each predicates treat it differently:
- as "relpath:" by "adds()", "modifies()" and "removes()"
- as "glob:" by "file()"
- as special by "contains()" and "filelog()"
- be relative to cwd, and
- match against a file exactly
("relpath:" matches also against a directory)
This may confuse users.
This patch adds explanation about the pattern without explicit kind
to these predicates.
Before this patch, revset predicate "filelog()" uses "match.files()"
to get filename also for the pattern without explicit kind.
But in such case, only canonicalization of relative path is required,
and other initializations of "match" object including regexp
compilation are meaningless.
This patch uses "pathutil.canonpath()" directly for "filelog()"
pattern without explicit kind like "glob:", for efficiency.
This patch also does below as a part of introducing "canonpath()":
- move location of "matchmod.match()" invocation, because "m" is no
more used in "if not matchmod.patkind(pat)" code path
- omit passing "default" argument to "matchmod.match()", because
"pat" should have explicit kind of pattern in this code path
This patch avoids the loop for "match.files()" having always one
element in revset predicate "filelog()" for efficiency: "match" object
"m" is constructed with "[pat]" as "patterns" argument.
Before this patch, default kind of pattern for revset predicate
"contains()" is treated as the exact file path rooted at the root of
the repository. This decreases usability, because:
- all other predicates taking pattern argument (also "filelog()")
treat such pattern as the path rooted at the current working
directory
- "contains()" doesn't describe this difference in its help
- this difference may confuse users
for example, this prevents revset aliases from sharing same
argument between "contains()" and other predicates
This patch makes default kind of pattern for revset predicate
"contains()" be rooted at the current working directory.
This patch uses "pathutil.canonpath()" instead of creating "match"
object for efficiency.
This patch narrows scope of the variable "m" in the function for
revset predicate "contains()", because it is referred only in "else"
code path of "if not matchmod.patkind(pat)" examination.
Previously, when an obsolete changeset was bookmarked, successor changesets were not considered
when moving the bookmark forward. Now that a bare update will move to the tip most of the
successor changesets, we also update the bookmark logic to allow the bookmark to move with this
update.
Tests have been updated and keep issue4015 covered as well.
Previously, a bare update would ignore any successor changesets thus
potentially leaving you on an obsolete head. This happens commonly when there
is an old bookmark that hasn't been moved forward which is the motivating
reason for this patch series.
Now, we will check for successor changesets if two conditions hold: 1) we are
doing a bare update 2) *and* we are currently on an obsolete head.
If we are in this situation, then we calculate the branchtip of the successor
set and update to that changeset.
Tests coverage has been added.
In some case Backout silently succeeded to back out but left all the change
uncommitted. This may be confusing for user so this changeset add a note
reminding to commit. Other backout case already actively informs the user about
created commit.
Before the changeset the backout process was:
1) go to <target>
2) revert to <target> parent
3) update back to changeset we came from
The two update steps can takes a very long time to move back and forth unrelated
file change between <target> and current working directory.
The new process is just merging current working directory with the parent of
<target> using <target> as ancestor. This give the very same result but skip
the two updates. On big repo with a lot of files and changes that save a lots of
time (x20 for one week window).
The "merge" version (hg backout --merge) is still done with upgrades. We could
imagine using in memory commit to speed it up but this is another fish.
We drop iterrevs which are not needed anymore. The know head are never a
descendant of the updated set. It was possible with the old strip code. This
simplification make the code easier to read an update.
We never use the node of new revisions unless in the very specific case of
closed heads. So we can just use the revision number.
So give another handfull of percent speedup.
Was the behavior correct and the description wrong so it should be updated as
in this patch? Or should the code work as the documentation says?
Both ways could make some sense ... but none of them are obvious in all cases.
One place where it currently cause problems is when the current revision has
another branch head that is closer to tip but closed. 'hg rebase' refuses to
rebase to that as it only see the tip-most unclosed branch head which is the
current revision.
/me kind of likes named branches, but no so much how branch closing works ...
This is often very handy when hacking/debugging.
Calling util.debugstacktrace('hey') from a place in hg will give something like:
hey at:
./hg:38 in <module>
/home/user/hgsrc/mercurial/dispatch.py:28 in run
/home/user/hgsrc/mercurial/dispatch.py:65 in dispatch
/home/user/hgsrc/mercurial/dispatch.py:88 in _runcatch
/home/user/hgsrc/mercurial/dispatch.py:740 in _dispatch
/home/user/hgsrc/mercurial/dispatch.py:514 in runcommand
/home/user/hgsrc/mercurial/dispatch.py:830 in _runcommand
/home/user/hgsrc/mercurial/dispatch.py:801 in checkargs
/home/user/hgsrc/mercurial/dispatch.py:737 in <lambda>
/home/user/hgsrc/mercurial/util.py:472 in check
...
b33db384a66e not only introduced the 'bisect(current)' revset predicate, it
also changed how the 'current' revision is used in combination with --command.
The new behaviour might be ok for --noupdate where the working directory and
its revision shouldn't be used, but it also did that when --command is used to
run a command on the currently checked out revision then it could register the
test result on the wrong revision.
An example:
Before, bisect with --command could use the wrong revision when recording the
test result:
$ hg up -qr 0
$ hg bisect --command "python \"$TESTTMP/script.py\" and some parameters"
changeset 31:58c80a7c8a40: bad
abort: inconsistent state, 31:58c80a7c8a40 is good and bad
Now it works as before and as expected and uses the working directory revision
for the --command result:
$ hg up -qr 0
$ hg bisect --command "python \"$TESTTMP/script.py\" and some parameters"
changeset 0:b99c7b9c8e11: bad
...
The 'copies' method has no test coverage and calls copies.pathcopies with an
incorrect number of parameters and is thus (fortunately) not used. Kill it.
Any invocations of bookmarks other than a plain 'hg bookmarks' will likely
cause a write to the bookmark store. These should be guarded by the wlock.
The repo._bookmarks read should be similarly guarded by the wlock if we're
going to be subsequently writing to it.
Upcoming patches will acquire the wlock for write operations, such as make
inactive, but not read-only ones, such as list bookmarks. Separate out the
status messages so that the code paths can be separated.
Nodemap is not aware of filtering so we need to ask the changelog itself if a
node is known. This is probably a bit slower but such check does not dominated
discovery time. This is necessary if we want to run discovery on filtered repo.
The discovery is not yet ready for filtered repo. Pull was using filtered for
its discovery which is wrong. It worked by dumb luck because discovery mainly
use funtion that does not respect the filtering.
Trying to makes discovery work on filtered repo revealed this bug.
When all changesets in the local repo are either being pushed or remotly known,
we can take a fast path when bundling changeset because we are certain all local
deltas are computed againts base known remotely.
So we have a check to detect this situation, when we did a bare push and nothing
was excluded.
In a coming refactoring, the discovery will run on filtered view and the content
of `outgoing.excluded` will just include unserved (secret) changeset not filtered by the
repoview used to call push (usually "visible"). So we need to check if there is
both no excluded changeset and nothing filtered by the current repoview.
Before that changes, pulled revision that happend to be already known locally
(so, not actually added) was not taken into account when computing the new
common set between local and remote.
It appears that we already know the heads of the pulled set. It is in the
`rheads` variable, so we are just using it and everything is works fine.
We are dropping the, now useless, computation of `added` set in the process.
Moves the code that actually writes to a file to a separate function in
revlog.py. This allows extensions to intercept and use the data being written to
disk. For example, an extension might want to replicate these writes elsewhere.
When cloning the Mercurial repo on /dev/shm with --pull, I see about a 0.3% perf change.
It goes from 28.2 to 28.3 seconds.
The double-for form of list comprehensions gets particularly unreadable
when you throw in an 'if' condition. This expands the only remaining
instance of the double-for syntax in our codebase into a loop.
Reminder: a changeset is said "bumped" if it tries to obsolete a immutable
changeset.
The previous algorithm for computing bumped changeset was:
1) Get all public changesets
2) Find all they successors
3) Search for stuff that are eligible for being "bumped"
(mutable and non obsolete)
The entry size of this algorithm is `O(len(public))` which is mostly the same as
`O(len(repo))`. Even this this approach mean fewer obsolescence marker are
traveled, this is not very scalable.
The new algorithm is:
1) For each potential bumped changesets (non obsolete mutable)
2) iterate over precursors
3) if a precursors is public. changeset is bumped
We travel more obsolescence marker, but the entry size is much smaller since
the amount of potential bumped should remains mostly stable with time `O(1)`.
On some confidential gigantic repo this move bumped computation from 15.19s to
0.46s (×33 speedup…). On "smaller" repo (mercurial, cubicweb's review) no
significant gain were seen. The additional traversal of obsolescence marker is
probably probably counter balance the advantage of it.
Other optimisation could be done in the future (eg: sharing precursors cache
for divergence detection)
Detection of bumped changeset should use `allprecursors(<mutable>)` instead or
`allsuccessors(<immutable>)` so we need the all precursors function to exists.
Changeset aad678a92970 moved `subsettable` from `mercurial/repoview.py` to
`mercurial/branchmap.py`. This mean that `filtertable` and `subsettable` are no
longer next to each other. So we add a comment to remind people to update both.
parse() cannot be called at the same time because a parser object keeps its
states. This is no problem for command-line hg client, but it would cause
strange errors in multi-threaded hgweb.
Creating parser object is not too expensive.
original:
% python -m timeit -s 'from mercurial import revset' 'revset.parse("0::tip")'
100000 loops, best of 3: 11.3 usec per loop
thread-safe:
% python -m timeit -s 'from mercurial import revset' 'revset.parse("0::tip")'
100000 loops, best of 3: 13.1 usec per loop
Passing a non-string to parsers.parse_index2() causes Mercurial to crash
instead of raising a TypeError (found on Mac OS X 10.8.5, Python 2.7.6):
import mercurial.parsers as parsers
parsers.parse_index2(0, 0)
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 parsers.so 0x000000010e071c59 _index_clearcaches + 73 (parsers.c:644)
1 parsers.so 0x000000010e06f2d5 index_dealloc + 21 (parsers.c:1767)
2 parsers.so 0x000000010e074e3b parse_index2 + 347 (parsers.c:1891)
3 org.python.python 0x000000010dda8b17 PyEval_EvalFrameEx + 9911
This happens because when arguments of the wrong type are passed to
parsers.parse_index2(), indexType's initialization function index_init() in
parsers.c leaves the indexObject instance in a state that indexType's
destructor function index_dealloc() cannot handle.
This patch moves enough of the indexObject initialization code inside
index_init() from after the argument validation code to before it.
This way, when bad arguments are passed to index_init(), the destructor
doesn't crash and the existing code to raise a TypeError works. This
patch also adds a test to check that a TypeError is raised.
Previously, this required -f because we didn't consider obsolete changesets
(and their children ... or successors of those children, etc.). We now use
obsolete.foreground to calculate acceptable changesets when advancing the
bookmark.
Test coverage has been added.
Backslashes (\) in paths were encoded to %C5 when converting from url to
string. This does not look nice for windows paths. And it introduces many
problems when running tests on windows.
When trying to turn a draft changeset into a secret changeset, I was
told:
% hg phase -s .
cannot move 1 changesets to a more permissive phase, use --force
no phases changed
That message struck me as being backwards -- the secret phase feels
less permissive to me since it restricts the changesets from being
pushed.
We don't use the word "permissive" elsewhere, 'hg help phase' talks
about "lower phases" and "higher phases". I therefore reformulated the
error message to be
cannot move 1 changesets to a higher phase, use --force
That is not perfect either, but more in line with the help text. An
alternative could be
cannot move phase backwards for 1 changesets, use --force
which fits better with the help text for --force.
Before the templater got extended for nested expressions, it made
sense to decode string escapes across the whole string. Now we do it
on a piece by piece basis.
The search mode description can't be translated by itself, since
it's displayed as part of a template phrase (the "Assuming ..."
/ "Use ... instead" bits). Just drop the translation markers for
now, since the templates themselves currently do not support
translations.
Paths ending with \ will fail the verification introduced in 0bc0c17d663e when
checking out on Windows ... and if it didn't fail it would probably not do what
the user expected.
Lines with only a directive are not deleted anymore because they are detected
before comments are deleted by prunecomments().
addmargins() will be adapted later.
Forgets need to be in the beginning of the action list, same as removes. This
lets us avoid clashes in the dirstate where a directory is forgotten and a
file with the same name is added, or vice versa.
When aborting a rebase where tip-1 is public, rebase would fail to undo the merge
state. This caused unexpected dirstate parents and also caused unshelve to
become unabortable (since it uses rebase under the hood).
The problem was that rebase uses -2 as a marker rev, and when it checked for
immutableness during the abort, -2 got resolved to the second to last entry in
the phase cache.
Adds a test for the fix. Add exception to phase code to prevent this in the
future.
This is arguably a workaround, a better fix may be in the repo to
ensure that it won't list a file 'modified' unless there is a file
context for the previous version.
On Windows, only double quotation mark can quote command line
arguments.
So, this patch uses double quotation mark to quote command line
arguments in all examples of online help document.
This patch revises hint message from "for detail about" introduced by
changeset 49ed20ea8da0 to "for details about", to unify it with the
hint message introduced by proceeding patch.
This patch adds more detailed explanation about "--force" to online
help document of "hg push" to prevent novice users to execute "push
--force" easily without understanding about problems of multiple
branch heads in the repository.
"use push -f to force" in the hint at abortion of "hg push" may cause
novice users to execute "push -f" easily without understanding about
problems of multiple branch heads in the repository.
This patch hides description about "-f" in the hint, and leads into
seeing "hg help push" for details about pushing new heads.
Before this patch, python modules of each extensions can't import
another one in own extension by absolute name, because root modules of
each extensions are loaded with "hgext_" prefix.
For example, "import extroot.bar" in "extroot/foo.py" of "extroot"
extension fails, even though "import bar" in it succeeds.
Installing extensions into site-packages of python library path can
avoid this problem, but this solution is not reasonable in some cases:
using binary package of Mercurial on Windows, for example.
This patch retries to import with "hgext_" prefix after ImportError,
if the module in the extension may try to import another one in own
extension.
This patch doesn't change some "_import()"/"_origimport()" invocations
below, because ordinary extensions shouldn't cause such invocations.
- invocation of "_import()" when root module imports sub-module by
absolute path without "fromlist"
for example, "import a.b" in "a.__init__.py".
extensions are loaded with "hgext_" prefix, and this causes
execution of another (= fixed by this patch) code path.
- invocation of "_origimport()" when "level != -1" with "fromlist"
for example, importing after "from __future__ import
absolute_import" (level == 0), or "from . import b" or "from .a
import b" (0 < level),
for portability between python versions and environments,
extensions shouldn't cause "level != -1".
Before this patch, demandimport of Mercurial may fail to load external
libraries using "from __future__ import absolute_import": for example,
importing "foo" in "bar.baz" module will load "bar.foo" if it exists,
even though "absolute_import" is enabled in "bar.baz" module.
So, extensions for Mercurial can't use such external libraries.
This patch saves "level" of import request for on-demand module
loading in the future: default value of level is -1, and level is 0
when "absolute_import" is enabled.
"level" value is passed to built-in import function in
"_demandmod._load()" and it should load target module correctly.
This patch changes only one "_demandmod" construction case other than
cases below:
- construction in "_demandmod._load()"
this code path should be used only in relative sub-module
loading case
- constructions other than patched one in"_demandimport()"
these code paths shouldn't be used in "level != -1" case
hg update . (or equivalents) are effectively no-ops in just about all
circumstances. These sorts of updates can be especially common in a
bookmark-oriented workflow. This saves us a status check and a manifest
decompression, which means that on a repo with over 210,000 files, this brings
hg update . down from 2.5 seconds to 0.15.
There is one change in behavior: a file that was added, not committed, and then
deleted but not removed used to be removed from the dirstate. With this patch
it isn't. This is what causes the change in test-mq-qpush-exact.t. This seems
like it's enough of an edge case to not be worth handling.
The output of test-empty.t changes because those files are not yet created.
Before this patch, each feature setup functions for localrepository
class should examine whether corresponding extension is enabled or not
by themselves.
This patch invokes only feature setup functions defined in module of
enabled extensions, and it makes implementation of feature setup
functions easier and simpler.
Push is currently allowed to create a new head if there is a remote
bookmark that will be updated to point to the new head. If the
bookmark is not known remotely then push aborts, even if a -B argument
is about to push the bookmark. This change allows push to continue in
this case. This does not require a wireproto force.
This modifies slightly the behavior introduced in fcc482469a3c to allow
showextras to return a hybrid, rather than showlist. The example in the
template help file now executes and returns meaningful results.
Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on
Mac OS X with an SSD, before this change:
$ hg --config format.chunkcachesize=1024 perfmoonwalk
! wall 2.022021 comb 2.030000 user 1.970000 sys 0.060000 (best of 5)
(16,154 cache hits, 3,840 misses.)
$ hg --config format.chunkcachesize=4096 perfmoonwalk
! wall 1.901006 comb 1.900000 user 1.880000 sys 0.020000 (best of 6)
(19,003 hits, 991 misses.)
$ hg --config format.chunkcachesize=16384 perfmoonwalk
! wall 1.802775 comb 1.800000 user 1.800000 sys 0.000000 (best of 6)
(19,746 hits, 248 misses.)
$ hg --config format.chunkcachesize=32768 perfmoonwalk
! wall 1.818545 comb 1.810000 user 1.810000 sys 0.000000 (best of 6)
(19,870 hits, 124 misses.)
$ hg --config format.chunkcachesize=65536 perfmoonwalk
! wall 1.801350 comb 1.810000 user 1.800000 sys 0.010000 (best of 6)
(19,932 hits, 62 misses.)
$ hg --config format.chunkcachesize=131072 perfmoonwalk
! wall 1.805879 comb 1.820000 user 1.810000 sys 0.010000 (best of 6)
(19,963 hits, 31 misses.)
We may want to change the default size in the future based on testing and
user feedback.
When reading a revlog chunk, instead of reading up to 64 KB ahead of the
request offset and caching that, this change caches a fixed window before
and after the requested data that falls on 64 KB boundaries. This increases
cache hits when reading revlogs backwards.
Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on
Mac OS X with an SSD, before this change:
$ hg perfmoonwalk
! wall 2.307994 comb 2.310000 user 2.120000 sys 0.190000 (best of 5)
(Each run has 10,668 cache hits and 9,304 misses.)
After this change:
$ hg perfmoonwalk
! wall 1.814117 comb 1.810000 user 1.810000 sys 0.000000 (best of 6)
(19,931 cache hits, 62 misses.)
On a busy NFS share, before this change:
$ hg perfmoonwalk
! wall 17.000034 comb 4.100000 user 3.270000 sys 0.830000 (best of 3)
After:
$ hg perfmoonwalk
! wall 1.746115 comb 1.670000 user 1.660000 sys 0.010000 (best of 5)
Before this patch, phase of newly created commit is determined by
"phases.new-commit" configuration regardless of phase of state in each
subrepositories.
For example, this may cause the "public" revision in the parent
repository referring the "secret" one in subrepository.
This patch checks phase of state in each subrepositories before
committing in the parent, and aborts or changes phase of newly created
commit if subrepositories have more restricted phase than the parent.
This patch uses "follow" as default value of "phases.checksubrepos"
configuration, because it can keep consistency between phases of the
parent and subrepositories without breaking existing tool chains.
This change causes an informative ImportError to be raised when importing
the extension module parsers if the minor version of the currently-running
Python interpreter doesn't match that of the Python that was used when
compiling the extension module. Here is an example of what the new error
looks like:
Traceback (most recent call last):
File "test.py", line 1, in <module>
import mercurial.parsers
ImportError: Python minor version mismatch: The Mercurial extension
modules were compiled with Python 2.7.6, but Mercurial is currently using
Python with sys.hexversion=33883888: Python 2.5.6
(r256:88840, Nov 18 2012, 05:37:10)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))]
at: /opt/local/Library/Frameworks/Python.framework/Versions/2.5/Resources/
Python.app/Contents/MacOS/Python
The reason for raising an error in this scenario is that Python's C API
is known not to be compatible from minor version to minor version, even
if sys.api_version is the same. See for example this Python bug report
about incompatibilities between 2.5 and 2.6+:
http://bugs.python.org/issue8118
These incompatibilities can cause Mercurial to break in mysterious,
unforeseen ways. For example, when Mercurial compiled with Python 2.7 was
run with 2.5, the following crash occurred when running "hg status":
http://bz.selenic.com/show_bug.cgi?id=4110
After this crash was fixed, running with Python 2.5 no longer crashes, but
the following puzzling behavior still occurs:
$ hg status
...
File ".../mercurial/changelog.py", line 123, in __init__
revlog.revlog.__init__(self, opener, "00changelog.i")
File ".../mercurial/revlog.py", line 251, in __init__
d = self._io.parseindex(i, self._inline)
File ".../mercurial/revlog.py", line 158, in parseindex
index, cache = parsers.parse_index2(data, inline)
TypeError: data is not a string
which can be reproduced more simply with:
import mercurial.parsers as parsers
parsers.parse_index2("", True)
Both the crash and the TypeError occurred because the Python C API's
PyString_Check returns the wrong value when the C header files from
Python 2.7 are run with Python 2.5. This is an example of an
incompatibility of the sort mentioned in the Python bug report above.
Failing fast with an informative error message will result in a better
user experience in cases like the above. The information in the ImportError
will also simplify troubleshooting for those on Mercurial mailing lists,
the bug tracker, etc.
This patch only adds the version check to parsers.c, which is sufficient
to affect command-line commands like "hg status" and "hg summary".
An idea for a future improvement is to move the version-checking C code
to a more central location, and have it run when importing all
Mercurial extension modules and not just parsers.c.
When creating patches modifying binary files using "git format-patch",
git creates 'literal' and 'delta' hunks. Mercurial currently supports
'literal' hunks only, which makes it impossible to import patches with
'delta' hunks.
This changeset adds support for 'delta' hunks. It is a reimplementation
of patch-delta.c from git :
http://git.kernel.org/cgit/git/git.git/tree/patch-delta.c
Previously, directories were added with the trailing slash and, if there was
only one completion, then another ambiguous entry was created using '.', as
follows:
$ hg rm mer<TAB>
mercurial/./ mercurial//
This was added in bc559aff745c (though, some logic existed before that) to work
around bash completion adding a space after the sole entry because we treated
directories and files the same. We no longer do that now so we remove this
unneeded code.
Tests have been updated to match this new behavior.
Some debuggers, such as ipdb, load escape codes and color codes even when later
turned off. This will affect scripts that do simple parsing and can't handle
escape codes. Therefore, we only load a custom debugger if ui.plain() is false.
When try to compile on x64 OS X, I get this warning:
mercurial/parsers.c:931:27: warning: implicit conversion loses integer precision
: 'long' to 'int' [-Wshorten-64-to-32]
? 4 : self->raw_length / 2;
The patch verifies if value of self->raw_length falls bellow INT_MAX; if not,
it raises the ValueError exception.
If value of self->raw_length is greater than 4, it's casted to int type, to
eliminate the warning.
There are currently two different ways we can have no active bookmark:
.hg/bookmarks.current being missing and it being an empty file. This patch and
upcoming ones make an empty file the only way to represent no active bookmarks.
This is the right choice because it matches the state that a new repository
without bookmarks will be in.
This patch makes "lock.lock.__init__()" take both vfs and lock file
path relative to vfs, instead of absolute path to lock file.
This allows lock file to be accessed via vfs.
This patch rewrites "copystore()" with vfs, because succeeding patch
requires "lock.lock()" invocation with vfs.
This patch uses 'dstbase + "/lock"' instead of "join()" with both
elements even on Windows environment but it should be reasonable,
because target files given from "store.copyfiles()" already uses "/"
as path separator.
"util.copyfiles()" between two vfs-s may have to be rewritten in the
future.
Before this patch, "localrepo.py" has many methods defining local
variable "lock", even though it imports "lock" module as "lock". This
ambiguity decreases readability.
Before this patch, unlink target file is once opened before unlinking,
because "opener" before vfs migration doesn't have "unlink()"
function.
This patch uses "vfs.unlink()" instead of "open()" and "fp.name".
Copying a repos local configuration to another repo is a bad idea because the
2nd repo gets the configuration of the first. Prevent this by really calling
repo.baseui.copy when repo.ui.copy is called.
This requires some changes in commandserver which needs to clone repo.ui for
rejecting temporary changes.
This patch has its roots back in the topic "repo isolation" around 68ae3063a47d
and was suggested by mpm.
The {branches} keyword dates to pre-1.0 Mercurial's tag-like branch
scheme which allowed changesets to be on multiple branches. This is
the last visible vestige of that scheme, users should instead be using
{branch}, possibly with if().
During a commit amend there are 4 manifests being handled:
- original commit
- temporary commit
- amended commit
- merge base
This causes a manifest cache miss which hurts perf on large repos. On a large
repo, this fix causes amend to go from 6 seconds to 5.5 seconds.
The previous revlog strip computation would walk every rev in the revlog, from
the bottom to the top. Since we're usually stripping only the top few revs of
the revlog, this was needlessly expensive on large repos.
The new algorithm walks the exact number of revs that will be stripped, thus
making the operation not dependent on the number of revs in the repo.
This makes amend on a large repo go from 8.7 seconds to 6 seconds.
When computing the commonmissing, it greedily computes the entire set
immediately. On a large repo where the majority of history is irrelevant, this
causes a significant slow down.
Replacing it with a lazy set makes amend go from 11 seconds to 8.7 seconds.
Upcoming patches will allow the file cache to watch over multiple files, and
call the decorated function again if any of the files change.
The particular use case for this is the bookmark store, which needs to be
invalidated if either .hg/bookmarks or .hg/bookmarks.current changes. (This
doesn't currently happen, which is a bug. This bug will also be fixed in
upcoming patches.)
This is a step towards breaking an import cycle between revset and
repoview. Import cycles happened to work in Python 2 with implicit
relative imports, but breaks on Python 3 when we start using explicit
relative imports via 2to3 rewrite rules.
Before this patch, duplicated obsolescence markers could slip into an
obstore if the bookmark was unknown locally and duplicated in the
incoming obsolescence stream.
Existing duplicate markers will not be automatically removed but
they'll stop propagating. Having a few duplicated markers is harmless
and people have been warned evolution is <blink>experimental</blink>
anyway.
This patch adds "updateremote()", which uses "compare()" to compare
bookmarks between the local and the remote repositories, to replace
pushing local bookmarks in "localrepository.push()".
This patch adds "pushtoremote()", which uses "compare()" to compare
bookmarks between the local and the remote repositories, to replace
pushing local bookmarks in "commands.push()".
To update entries in bmstore "localmarks", this patch uses
"bin(changesetid)" instead of "repo[changesetid].node()" used in
original "updatefromremote()" implementation, because the former is
cheaper than the latter.
This is the same thing which was done for changelog earlier, and it doesn't
affect performance at all. This change will make it possible to get the first
entry of the next page easily without computing the list twice.
Before this patch, almost all of check code in
"casecollisionauditor.__call__()" is executed, even if specified
filename is already checked, because "f in self._newfiles" is examined
lastly.
In addition to it, adding "fl" to "self._loweredfiles" and "f" to
"self._newfiles" are also redundant in such case.
This patch checks "f in self._newfiles" first, and returns immediately
to avoid execution of check code for efficiency.
dirstate.walk will return unknown files that were explicitly requested, even
if listunknown is false. There's no point in dropping these files on the
floor in dirstate.status.
This has no effect on any current callers, because all of them assume the
unknown list is empty and ignore it. Future callers may find it useful,
though.
When a user reaches the end of repo history with graph view, it (unsuccessfully)
tried to load more entries. This patch disables further loading attempts.
This patch gets stat object of target path by "vfs.lstat()", and
examines stat object to know the type of it. This follows the way in
"workingctx.add()".
This should be cheaper than original implementation invoking
"lexists()", "isfile()" and "islink()".
This patch also changes paths added to "rejected" list from full path
(referred by "p") to relative one (referred by "f"), when type of
target path is neither file nor symlink.
This change should be reasonable, because the path added to "rejected"
list is relative one, when "OSError" is raised at "lstat()".
Just invoking "os.fstat()" with "file.fileno()" doesn't require non
ANSI file API, because filename is not used for invocation of
"os.fstat()".
But "util.fstat()" should invoke "os.stat()" with "fp.name", if file
object doesn't have "fileno()" method for portability, and "fp.name"
may cause invocation of non ANSI file API.
So, this patch makes the constructor of appender class invoke
"util.fstat()" via vfs, to encapsulate filename handling.
The changegroup hook runs when the repo lock is released, but it's possible that
multiple transactions have happened during that single lock and therefore the
commits the hook was for may be gone. This is the case in the shelve extension
where it adds a commit and strips it in the same lock but different
transactions (which results in warning messages during unshelve on hgsubversion
repos).
A real fix would be to attach the hook to the transaction instead, but that
might have unknown consequences. Since we're this close to code-freeze, this fix
just prevents the hook from running if the commit disappeared.