The goal of 'hg revert --interactive' is usually to keep some change in the
revert file, so the files -must-not- be marked as clean. We want the status
logic to do its usual job here.
For unclear reasons (probably timing related), I was unable to build an automated
test that reproduced issue4592 but manual testing shows this is fixed.
__setitem__ on the lazymanifest C type wasn't checking to see if a
line had previously been malloced before replacing it, leading to
leaks if files got updated multiple times in the course of a task.
I was able to reproduce the leak with this change to test-manifest.py:
diff --git a/tests/test-manifest.py b/tests/test-manifest.py
--- a/tests/test-manifest.py
+++ b/tests/test-manifest.py
@@ -456,6 +456,16 @@ class basemanifesttests(object):
['a/b/c/bar.txt', 'a/b/c/foo.txt', 'a/b/d/ten.txt'],
m2.keys())
+ def testManifestSetItem(self):
+ m = self.parsemanifest('')
+ for x in range(3):
+ m['file%d' % x] = BIN_HASH_1
+ for x in range(3):
+ m['file%d' % x] = BIN_HASH_2
+ import time
+ time.sleep(4)
+
+
along with the commands:
$ make local
$ PYTHONPATH=. SILENT_BE_NOISY=1 python tests/test-manifest.py testmanifestdict.testManifestSetItem &
$ sleep 4
$ leaks $(jobs -p | tee /dev/stderr | awk '{print $3}')
$ wait
in an interactive shell on OS X. As far as I can tell, it had to be an
interactive shell so that I could get the pid of the test run using
the jobs builtin. Prior to this change, I was leaking several strings,
and after this change leaks reports no leaks.
I thought there was a bug filed for this in bugzilla, but I can't find
it either in bugzilla or by searching my email.
I came across a case where an internal command was using a revset that I didn't
immediately pass in and it was difficult to debug what was going wrong with the
revset. This prints out the revset and informs the user that the error is with
a rebset so it should be more obvious what and where the error is.
In repos with many changesets, the computation of allfuturecommon
can take a significant amount of time. Since it's only used if
there's an obsstore, don't compute it otherwise.
When we start writing tree manifests with one manifest revlog per
directory, it will still be nice to be able to run tests using tree
manifests in memory but writing to a flat manifest to a single
revlog. Let's break the current '_usetreemanifest' flag on the revlog
into '_treeinmem' and '_treeondisk'. Both are populated from the same
config, but after this change, one can temporarily hard-code
_treeinmem=True to see that tests still pass.
manifestdict() creates an empty manifestdict, so let's consistently
use that instead of explicitly parsing an empty string (which does
result in an empty manifest).
The '--all' option have been introduced in 0a81b7721d8f (August 2006), most
probably to prevent user shooting themselves in the foot. As the record process
will let you, view and select the set of files and change you want to revert, I
feel like the '--all' flag is superfluous in the '--interactive' case.
The series at a17556fc1521::77b112363d48 introduced generic transaction level
hooking. This makes the experimental bundle2 specific hooks redundant, we drop
them.
That way, any new server will be ready to accept bundle2 payload. The decision
for the client to use it is still off by default so this is not turning bundle2
everywhere.
We introduce a new kill switch for this in case stuff goes wrong.
According to 6b1369445b7b introducing "windows._removedirs()":
If a hg repository including working directory is a reparse point
(directory symlinked or a junction point), then using
os.removedirs will remove the reparse point erroneously.
"windows._removedirs()" should be used instead of "os.removedirs()" on
Windows.
This patch adds "removedirs" as platform depending function to replace
"os.removedirs()" invocations for portability and safety
This duplicates "onerror()" function from "svnsubrepo.remove()" for
equivalence of replacing in subsequent patch.
This "onerror()" function for "shutil.rmtree()" was introduced by
094a056562e7, which avoids failure of removing svn repository on
Windows.
Revision 7fbf0ef28408 ('graft: record intermediate grafts in extras') introduced
'intermediate-source' extra which refers to the closest graft source.
As 'intermediate-source' extra provides more detailed information about the source
changeset than 'source' one, it is better to prefer the first one to use as a
value of HGREVISION environment variable for an editor.
It is finally time to freeze the bundle2 format! To do so we:
- rename HG2Y to HG20,
- drop "b2x:" prefix from all part names,
- rename capability to "bundle2-exp" to "bundle2"
- rename the hook flag from 'bundle2-exp' to 'bundle2'
The condition on which manifestdict.matches() and manifestdict.walk()
take the fast path of iterating over files instead of the manifest, is
slightly different. Specifically, walk() does not take the fast path
for exact matchers and it does not avoid taking the fast path when
there are more than 100 files. Let's extract the condition so we don't
have to maintain it in two places and so walk() can gain these two
missing pieces of the condition (although there seems to be no current
caller of walk() with an exact matcher).
When checking whether we can take the fast path of iterating over
matcher files instead of manifest files, we check whether
match.files() is non-empty. However, now that return early for
match.always(), it can only be empty when there are only
include/exclude patterns, but in that case anypats() will be True, so
it's already covered. This makes manifestdict.walk() more similar to
manifestdict.matches().
This cuts down the run time of
hg files -r . > /dev/null
from ~0.850s to ~0.780s on the Firefox repo. Note that
manifest.matches() already has the corresponding optimization.
This patch adds propertycache-ed "_relpath" field to
"abstractsubrepo", to centralize "subrelpath" logic into it.
Now, "subrelpath()" can always return "_relpath" field of the
specified subrepo object, because it is ensured that subrepo object
has it. To reduce changes in this patch, "subrelpath()" itself is
still kept, even though it seems to be redundant.
This is also a part of eliminating "os.path.*" API invocations for
"Windows UTF-8 Plan".
This patch doesn't create vfs object in "abstractsubrepo.__init__()"
but adds propertycache-ed "wvfs" field, because the latter can:
- delay vfs instantiation until it is actually needed
- allow to use "hgsubrepo._repo.wvfs" as "wvfs"
"hgsubrepo._repo" is initialized after
"abstractsubrepo.__init__()" invocation, and passing
"hgsubrepo._repo.wvfs" to "abstractsubrepo.__init__()" is
difficult.
This patch passes "ctx" and "path" instead of "ui" to
"abstractsubrepo.__init__()" and stores them as "_ctx" and "_path" to
use them in subsequent patches.
This also removes redundant field initializations in the constructor
of classes derived from "abstractsubrepo".
This makes treemanifest.walk() not visit submanifests that are known not to
have any matching files. It does this by calling match.visitdir() on
submanifests as it walks.
This change also updates largefiles to be able to work with this new behavior
in treemanifests. It overrides match.visitdir(), the function that dictates
how walk() and matches() skip over directories.
The greatest speed improvements are seen with narrower scopes. For example,
this commit speeds up the following command on the Mozilla repo from 1.14s
to 1.02s:
hg files -r . dom/apps/
Whereas with a wider scope, dom/, the speed only improves from 1.21s to 1.13s.
As with similar a similar optimization to treemanifest.matches(), this change
will bring out even bigger performance improvements once treemanifests are
loaded lazily. Once that happens, we won't just skip over looking at
submanifests, but we'll skip even loading them.
The _intersectfiles() method is only called from one place, it's
pretty short, and its caller has to be aware when it's appropriate to
call it (when the number of files in the matcher is not too large), so
let's inline it.
Currently __contains__ is called only by "rev()" revset, but "x in cl" is a
function that is likely to be used in hot loop. revlog.__contains__ is simple
enough to duplicate to changelog, so just inline it.
Before this patch, "hg outgoing -B" shows only difference of bookmarks
between two repositories, and it isn't user friendly.
This patch shows detailed status about outgoing bookmarks at "hg
outgoing -B".
To avoid breaking backward compatibility with other tool chains, this
patch shows status, only if --verbose is specified,
Before this patch, "hg incoming -B" shows only difference of bookmarks
between two repositories, and it isn't user friendly.
This patch shows detailed status about incoming bookmarks at "hg
incoming -B".
To avoid breaking backward compatibility with other tool chains, this
patch shows status, only if --verbose is specified,
Before this patch, "hg outgoing -B" shows only bookmarks added
locally. Then, users can't know about bookmarks below before "hg push"
execution.
- deleted locally (even though it may be added remotely from "hg pull" view)
- advanced locally
- diverged
- changed (= remote revision is unknown for local)
This patch shows such bookmarks, too.
Before this patch, "hg incoming -B" shows only bookmarks added
remotely. Then, users can't know about bookmarks below before "hg
pull" execution.
- advanced remotely
- diverged
- changed (remote revision is unknown for local)
This patch shows such bookmarks, too.
It appears that the read() in readpipe() never actually ran before (in
test-ssh.t anyway). A print of the size returned from os.fstat() is 0 for every
single print output in test-ssh.t, so the data in the pipe ends up being read
later instead of when it is available. This is the same problem as Linux, as
mentioned in e20a5309b88d.
There are several places in the Windows SSH tests where the order of local
output vs remote output differ from the other platforms. This only fixes one of
those cases (and interstingly, not the one added in order to test e20a5309b88d),
so there is more investigation needed. However, without this patch, test-ssh.t
also has this diff:
--- c:/Users/Matt/Projects/hg/tests/test-ssh.t
+++ c:/Users/Matt/Projects/hg/tests/test-ssh.t.err
@@ -397,11 +397,11 @@
$ hg push --ssh "sh ../ssh.sh"
pushing to ssh://user@dummy/*/remote (glob)
searching for changes
- remote: Permission denied
- remote: abort: prechangegroup.hg-ssh hook failed
- remote: Permission denied
- remote: pushkey-abort: prepushkey.hg-ssh hook failed
updating 6c0482d977a3 to public failed!
+ remote: Permission denied
+ remote: abort: prechangegroup.hg-ssh hook failed
+ remote: Permission denied
+ remote: pushkey-abort: prepushkey.hg-ssh hook failed
[1]
$ cd ..
Output with this change was stable over 600+ runs of test-ssh.t. I initially
tried a background thread to read the pipe[1], but this was simpler and the test
results were exactly the same. I also tried SetNamedPipeHandleState(), but the
PIPE_NOWAIT is for compatibility with LANMAN 2.0, not for async I/O (the results
were identical though).
[1] http://eyalarubas.com/python-subproc-nonblock.html
This will be used in the next patch to do nonblocking reads from the child
process, like on posix platforms. See that for why os.fstat() is insufficient.
I changed this to an explicit Py_DECREF + set to null in 4f6b7d37d06a. This was
a silly misunderstanding on my part -- for some reason I thought Py_CLEAR set
its argument to null only if its refcount reached 0. Turns out that's not
actually the case -- Py_CLEAR is just Py_DECREF + set to null with some
additional precautions around destructors that aren't relevant here.
The real bug that 4f6b7d37d06a fixed was the fact that we were mutating the
string after setting it in the Python dictionary.
This function refactors the logic that decides to use 'bundle2' during an
exchange (pull/push). This will help being consistent while transitioning from
the experimental protocol to the final frozen version.
I do not expect this function to survive on the long run when using 'bundle2'
will become a simple capability check.
This is also necessary to allow HG2Y support in an extension to ease transition
of companies using the experimental protocol in production (yeah...). Such
extension will be able to wrap this function to use the experimental protocol in
some case.
To support more bundle2 formats, we need a wider detection of bundle2-family
streams. The various places what were explicitly detecting the full magic string
are now matching on the first three characters of it.
We now take full advantage of the 'getunbundler' function by using a
'{version -> unbundler-class}' mapping. This map currently contains a single
entry but will make it easy to support more versions from an extension/the
future.
At some point, this map will probably contain bundler-class information too,
in the same fashion the packer map does. However, this is not critically required
right now so it will happen by itself when needed.
The main target is to allow HG2Y support in an extension to ease transition of
companies using the experimental protocol in production (yeah...) But I've no
doubt this will be useful when playing with a future HG21.
This refactor is a preparation for an optimization in the next commit. This
introduces a recursive element that recurses each submanifest. By using a
recursive function, the next commit can avoid walking over some subdirectories
altogether.
The logic of walking a manifest to yield files matching a match object is
currently being done by context, not the manifest itself. This moves the walk()
function to both manifestdict and treemanifest. This separate implementation
will also permit differing, optimized implementations for each manifest.
It isn't obvious which file is the problem with deep subrepos, so provide the
path. Since the parsing is done with a ctx and not a subrepo object, it isn't
possible to display a path from the root subrepo. Therefore, the path shown is
relative to cwd.
There's no test coverage for the first abort, and I couldn't figure out how to
trigger it, but it is changed for consistency.
Previously the extra field for a graft only contained the original commit hash.
This made it impossible to use graft to copy a commit more than once, because
the extras fields did not change after the second graft.
The fix is to add an extra.intermediate-source field that records the immediate
predecessor to graft. This changes hashes for commits that have been grafted
twice, which is why the test was affected.
Previously it was impossible to graft a commit onto it's own parent (i.e. create
a copy of the commit). This is useful when wanting to create a backup of the
commit before continuing to amend it. This patch enables that behavior.
The change to the histedit test is because histedit uses graft to apply commits.
The test in question moves a commit backwards onto an ancestor. Since the graft
logic now more explicitly supports this, it knows to simply accept the incoming
changes (since they are more recent), instead of prompting.
To support multiple bundle2 formats, we will need a function returning
the proper unbundler according to the header. We introduce such aa
function and change the usage in the code base. The function will get
smarter in later changesets.
This is somewhat similar to the dispatching we do for 'HG10' and 'HG11'.
The main target is to allow HG2Y support in an extension to ease transition of
companies using the experimental protocol in production (yeah...) But I've no
doubt this will be useful when playing with a future HG21.
This makes it easy to create a new bundler class that inherits from
the core one. This matches the way 'changegroup' packers work.
The main target is to allow HG2Y support in an extension to ease transition of
companies using the experimental protocol in production (yeah...) But I've no
doubt this will be useful when playing with a future HG21.
The 'format' argument was not used even when it was added in
05af7fe09faf (getbundle: pass arbitrary arguments all along the call
chain, 2014-04-17).
Note that by removing the argument, if any caller did pass a named
'format' argument, we will now pass that along to exchange.getbundle()
via the kwargs. If the idea was to remove such a key, that should have
been done explicitly.
When the 'kwargs' variable was added in 038ae1f12393 (bundle2: allow
pulling changegroups using bundle2, 2014-04-01), it could contain only
'bundlecaps', 'common' and 'heads', so the check for 'format' would
always be false. Since then, _pullbundle2extraprepare() has been added
for hooks, but it seems unlikely that they would a 'format' key.
The matches function was previously traversing all submanifests to look for
matching files, even though it was possible to know if a submanifest won't
contain any matches.
This change adds a visitdir function on the match object to decide quickly if
a directory should be visited when traversing. The function also decides if
_all_ subdirectories should be traversed.
Adding this logic as methods on the match object also makes the logic
modifiable by extensions, such as largefiles.
An example of a command this speeds up is running
hg status --rev .^ python/
on the Mozilla repo with the treemanifest experiment enabled.
It goes from 2.03s to 1.85s.
More improvements to speed from this change will happen when treemanifests are
lazily loaded. Because a flat manifest is still loaded and then converted
into treemanifests, speed improvements are limited.
This change has no negative effect on speed. For a worst-case example, this
command is not negatively impacted:
hg status --rev .^ 'relglob:*.js'
on the Mozilla repo. It goes from 2.83s to 2.82s.
An upcoming commit requires that match.py be able to call scmutil.dirs(), but
when match.py imports scmutil, a dependency cycle is created. This commit
avoids the cycle by moving dirs() and its related finddirs() function from
scmutil to util, which match.py already depends on.
Parsers.py had a reference to util.sha1 which was unused. This commit removes
this reference as well as the unused import of util to simplify the dependency
graph. This is important for the next commit which actually relocates part
of a module to eliminate a cycle.
The _computenonoverlap function takes two manifests to allow extensions to hook
in and read the manifest nodes produced by the function. The remotefilelog
extension actually needs the entire changectx instead (which includes the
manifest) so it can prefetch the subset of files necessary for a sparse checkout
(and the sparse checkout depends on which commit is being accessed, hence the
need for the changectx).
I have tests in the remotefilelog extension that cover this.
One of the rules of Python strings is that they're immutable. dirs._addpath
breaks this assumption for performance, which is fine as long as it is done
safely -- once a string is no longer internal-only it shouldn't be mutated.
Unfortunately, we weren't being safe here -- we were mutating 'key' even after
adding it to a dictionary.
This only really affects other C code that reads strings, so it's somewhat hard
to write a test for this without poking into the internal representation of the
string via ctypes or similar. There is currently no C code that reads the
output of the string, but there will likely be some soon as the bug indicates.
There's no significant difference in performance.
Since all children are processed before their parents, we can apply the following algorithm:
For each rev (descending order):
* If I'm still hidden, no children will block me,
* If I'm not hidden, I must remove my parent from the hidden set,
This allows us to dynamically change the set of 'hidden' revisions, dropping the
need for the 'actuallyhidden' dictionary and the 'blocked' boolean in the queue.
As before, we start iterating from all heads and stop at the first public
changesets. This ensures the hidden computation is 'O(not public())' instead of
'O(len(min(not public()):))'.
Since we want to process all non-public changesets from top to bottom, a heap
seems more appropriate. This will ensure any revision is processed after all
its children, opening the way to code simplification.
test-https.t was broken at d133034be253 if /usr/bin/pythonX.Y is used on
Mac OS X.
If python executable is not named as "python", run-tests.py creates a symlink
and hghave uses it. On the other hand, the installed hg executable knows the
real path to the system Python. Therefore, there was an inconsistency that
hghave said it was not an Apple python but hg knew it was.
This is a significant performance win on large repositories. perffilefoldmap:
On Linux/gcc, on a test repo with over 500,000 files:
before: wall 0.605021 comb 0.600000 user 0.560000 sys 0.040000 (best of 17)
after: wall 0.280530 comb 0.280000 user 0.250000 sys 0.030000 (best of 35)
On Mac OS X/clang, on a real-world repo with over 200,000 files:
before: wall 0.281103 comb 0.280000 user 0.260000 sys 0.020000 (best of 34)
after: wall 0.133622 comb 0.140000 user 0.120000 sys 0.020000 (best of 65)
This visibly impacts status times on case-insensitive file systems. On the Mac
OS X repo, status goes from 3.64 seconds to 3.50.
With the third-party hgwatchman extension [1], 'hg status' on the same repo
goes from 0.80 seconds to 0.65.
[1] https://bitbucket.org/facebook/hgwatchman
This is a hot path on case-insensitive filesystems -- it's guaranteed to be
called every time 'hg status' is run.
This is significantly faster than the equivalent Python code: see the following
patch for numbers.
Unlike changeset_printer, it does not hide the manifest field because JSON
output will be parsed by machine where explicit "null" will be more useful
than nothing.
When tree manifests are stored with one revlog per directory and
loaded lazily, it's unclear how much readdelta will help. If only a
few files change, then only a small part of the full manifest will be
loaded, and the delta chains should also be shorter for tree
manifests. Therefore, let's disable readdelta for tree manifests for
now.
These will be used in upcoming patches to efficiently create a dirstate
foldmap.
The Cygwin normcase behavior is more complicated than just a simple lowercasing
or uppercasing. That's why we specify 'other'.
For C code we don't want to pay the cost of calling into a Python function for
the common case of ASCII filenames. However, while on most POSIX platforms we
normalize filenames by lowercasing them, on Windows we uppercase them. We
define an enum here indicating the direction that filenames should be
normalized as. Some platforms (notably Cygwin) have more complicated
normalization behavior -- we add a case for that too.
In upcoming patches we'll also define a fallback function that is called if the
string has non-ASCII bytes.
This enum will be replicated in the C code to make foldmaps. There's
unfortunately no nice way to avoid that -- we can't have encoding import
parsers because of import cycles. One way might be to have parsers import
encoding, but accessing Python modules from C code is just awkward.
The name 'normcasespecs' was chosen to indicate that this is merely an integer
that specifies a behavior, not a function. The name was pluralized since in
upcoming patches we'll introduce 'normcasespec' which will be one of these
values.
We should consider add HTML rendering of the RST into the response as a
follow-up. I attempted to do this, but there was an empty array
returned by the rstdoc() template function. Not sure what's going on.
Will deal with it later.
These are the same dispatch function under the hood. The only difference
is the default number of entries to render and the template to use. So
it makes sense to use a shared template.
Format for {changelistentry} is similar to {changeset}. However, there
are differences to argument names and their values preventing us from
(easily) using the same template. (Perhaps there is room to consolidate
the templates as a follow-up.)
We're currently not recording some data in {changelistentry} that exists
in {changeset}. This includes the branch name. This should be added in
a follow-up. For now, something is better than nothing.
The content of "hg help templating" is largely derived from docstrings
on functions providing functionality. Template functions are the long
holdout.
Prepare for generating them dynamically by defining docstrings for all
template functions.
There are numerous ways these docs could be improved. Right now, the
help output simply shows function names and arguments. So literally
any accurate data is better than what is there now.
It's unfortunate that workingctx revision is None, which doesn't work well in
arithmetic operation or comparison. This function is trivial but will be used
in several places.
We're going to reuse this in upcoming patches.
The change to Py_ssize_t is necessary because parsers.c doesn't define
PY_SSIZE_T_CLEAN. That macro changes the behavior of PyArg_ParseTuple but not
PyBytes_GET_SIZE.
The new manifest format is designed to be smaller, in particular to
produce smaller deltas. It stores hashes in binary and puts the hash
on a new line (for smaller deltas). It also uses stem compression to
save space for long paths. The format has room for metadata, but
that's there only for future-proofing. The parser thus accepts any
metadata and throws it away. For more information, see
http://mercurial.selenic.com/wiki/ManifestV2Plan.
The current manifest format doesn't allow an empty filename, so we use
an empty filename on the first line to tell a manifest of the new
format from the old. Since we still never write manifests in the new
format, the added code is unused, but it is tested by
test-manifest.py.
While it should be safe to switch to the new manifest format on an
existing repo, let's keep it simple for now and make the configuration
have any effect only at repo creation time. If the configuration is
enabled then (at repo creation), we add an entry to requires and read
that instead of the configuration from then on.
Previously we would compute the repoview's static blockers by finding all the
children of hidden commits that were not hidden. This was O(number of commits
since first hidden change) since 'children' requires walking every commit from
tip until the first hidden change.
The new algorithm walks all heads down until it sees a public commit. This makes
the computation O(number of draft) commits, which is much faster in large
repositories with a large number of commits and a low number of drafts.
On a large repo with 1000+ obsolete markers and the earliest draft commit around
tip~200000, this improves computehidden perf by 200x (2s to 0.01s).
It's pretty surprising phase wasn't part of this template call already.
We now expose {phase} to the {changeset} template and we expose this
data to JSON.
This brings JSON output in line with the output from `hg log -Tjson`.
The lone exception is hweb doesn't print the numeric rev. As has been
stated previously, I don't believe hgweb should be exposing these
unstable identifiers. (We can add them later if we really want them.)
There is still work to bring hgweb in parity with --verbose and
--debug output from the CLI.
Constructing the dirfoldmap is expensive, so if there's a hit in the
filefoldmap, don't construct the directory foldmap.
This helps with cases like 'hg add foo' where foo is already tracked: for a
large repository, the operation goes from 1.5 seconds to 1.2 (which is still
way too much, but that's a matter for another day.)
Rev cce24e8019c8 changed the semantics of the work list to store (normalized,
non-normalized) pairs. All the tuple creation and destruction hurts perf: on a
large repo on OS X, 'hg status' went from 3.62 seconds to 3.78.
It also is unnecessary in most cases:
- it is clearly unnecessary on case-sensitive filesystems.
- it is also unnecessary when filenames have been read off of disk rather than
being supplied by the user.
The only case where the non-normalized case is required at all is when the file
is unknown.
To eliminate most of the perf cost, keep trace of whether the directory needs
to be normalized at all with a boolean called 'alreadynormed'. Pay the cost of
directory normalization only when necessary.
For the above large repo, 'hg status' goes to 3.63 seconds.
By converting treemanifest.matches() into a recursively additivie operation,
it becomes O(n).
The old matches function made a copy of the entire manifest and deleted
files that didn't match. With tree manifests, this was an O(n log n) operation
because del() was O(log n).
This change speeds up the command
"hg status --rev .^ 'relglob:*.js'
on the Mozilla repo, now taking 2.53s, down from 3.51s.
During operations that involve building up a new manifest tree, it will be
useful to be able to quickly check if a submanifest is empty, and if so, to
avoid including it in the final tree. Doing this check lets us avoid creating
treemanifest structures that contain any empty submanifests.
In preparation for the optimization in the following commit, this commit
removes treemanifest.matches()'s call to _intersectfiles(), and removes
_intersectfiles() itself since it's unused at this point.
Tags is pretty easy to implement. Let's start there.
The output is slightly different from `hg tags -Tjson`. For reference,
the CLI has the following output:
[
{
"node": "e2049974f9a23176c2addb61d8f5b86e0d620490",
"rev": 29880,
"tag": "tip",
"type": ""
},
...
]
Our output has the format:
{
"node": "0aeb19ea57a6d223bacddda3871cb78f24b06510",
"tags": [
{
"node": "e2049974f9a23176c2addb61d8f5b86e0d620490",
"tag": "tag1",
"date": [1427775457.0, 25200]
},
...
]
}
"rev" is omitted because it isn't a reliable identifier. We shouldn't
be exposing them in web APIs and giving the impression it remotely
resembles a stable identifier. Perhaps we could one day hide this behind
a config option (it might be useful to expose when running servers
locally).
The "type" of the tag isn't defined because this information isn't yet
exposed to the hgweb templater (it could be in a follow-up) and because
it is questionable whether different types should be exposed at all.
(Should the web interface really be exposing "local" tags?)
We use an object for the outer type instead of Array for a few reasons.
First, it is extensible. If we ever need to throw more global properties
into the output, we can do that without breaking backwards compatibility
(property additions should be backwards compatible). Second, uniformity
in web APIs is nice. Having everything return objects seems much saner than
a mix of array and object. Third, there are security issues with arrays
in older browsers. The JSON web services world almost never uses arrays
as the main type for this reason.
Another possibly controversial part about this patch is how dates are
defined. While JSON has a Date type, it is based on the JavaScript Date
type, which is widely considered a pile of garbage. It is a non-starter
for this reason.
Many of Mercurial's built-in date filters drop seconds resolution. So
that's a non-starter as well, since we want the API to be lossless where
possible. rfc3339date, rfc822date, isodatesec, and date are all lossless.
However, they each require the client to perform string parsing on top of
JSON decoding. While date parsing libraries are pretty ubiquitous, some
languages don't have them out of the box. However, pretty much every
programming language can deal with UNIX timestamps (which are just
integers or floats). So, we choose to use Mercurial's internal date
representation, which in JSON is modeled as float seconds since UNIX
epoch and an integer timezone offset from UTC (keep in mind
JavaScript/JSON models all "Numbers" as double prevision floating point
numbers, so there isn't a difference between ints and floats in JSON).
Many have long wanted hgweb to emit a common machine readable output.
We start the process by defining a stub json template.
Right now, each endpoint returns a stub "not yet implemented" string.
Individual templates will be implemented in subsequent patches.
Basic tests for templates have been included. Coverage isn't perfect,
but it is better than nothing.
Computing the set of directories in the dirstate is expensive. It turns out
that it isn't necessary for operations like 'hg status' at all.
Why? Consider the file 'foo/bar' on disk, which is represented in the dirstate
as 'FOO/BAR'.
On 'hg status', we'd walk down the directory tree, coming across 'foo' first.
Before: we'd normalize 'foo' to 'FOO', then add 'FOO' to our visited stack.
We'd then visit 'FOO', finding the file 'bar'. We'd normalize 'FOO/bar' to
'FOO/BAR', then add it to the results dict.
After: we wouldn't normalize 'foo' at all. We'd add it to our visited stack,
then visit 'foo', finding the file 'bar'. We'd normalize 'foo/bar' to
'FOO/BAR', then add it to the results dict.
So whether we normalize intermediate directories or not actually makes no
difference in most cases.
The only case where normalization matters at all is if a file is replaced with
a directory with the same case-folded name. In that case we can do a relatively
cheap file normalization instead and still get away with not computing the set
of directories.
This is a nice boost in status performance. On OS X with case-insensitive HFS+,
for a large repo with over 200,000 files, this brings down 'hg status' from
4.00 seconds to 3.62.
Computing the set of directories in the dirstate can be pretty expensive. For
'hg status' without arguments, it turns out we actually never need to figure
out the right case for directories in the foldmap. (An upcoming patch explains
why.)
This patch splits up the directory and file maps into separate ones, allowing
for the subsequent optimization in status.
Caches should be transparent. If a cache is damaged, it should
silently be rebuilt, much like if it were invalid. No one seems to
have ever hit this in the wild.
For manifest v2, revlog.revdiff() usually does not provide enough
information to produce a manifest. As a simple workaround, implement
readdelta() by reading both the old and the new manifest and use
manifest.diff() to find the difference. This is several times slower
than the current readdelta() for v1 manifests, but there seems to be
no other simple option, and this is still much faster than returning
the full manifest (at least for verify).
We may add support for the fastdelta optimization for manifest v2 at a
later point, but let's disable it for now, so we don't have to
implement it right away.
With tree manifests, hashes will change anyway, so now is a good time
to also take up the old plans of a new manifest format. While there
should be little or no reason to use tree manifests with the current
manifest format (v1) once the new format (v2) is supported, we'll try
to keep the two dimensions (flat/tree and v1/v2) separate.
In preparation for adding a the new format, let's add configuration
for it and propagate that configuration to the manifest revlog
subclass. The new configuration ("experimental.manifestv2") says in
what format to write the manifest data. We may later add other
configuration to choose how to hash it, either keeping the v1 hash for
BC or hashing the v2 content.
See http://mercurial.selenic.com/wiki/ManifestV2Plan for more details.
By extracting a method that generates (path, node, flags) tuples, we
can reuse the code for parsing a manifest without doing it via a
_lazymanifest like treemanifest currently does. It also prepares for
parsing the new manifest format.
Note that this makes parsing into treemanifest slower, since the
parsing is now always done in pure Python. Since treemanifests will be
expected (or even forced) to be used only with the new manifest
format, parsing via _lazymanifest was not an option anyway.
The overwhelmingly common case is running commands like 'hg diff' with no
arguments. Therefore the only file that'll be listed is the root directory.
Normalizing that's just a waste of time.
This means that for a plain 'hg diff' we'll never need to construct the
foldmap, saving us a significant chunk of time.
On case-insensitive HFS+ on OS X, for a large repository with over 200,000
files, this brings down 'hg diff' from 2.97 seconds to 2.36.
This will be useful to execute actions after the tree is parsed and
before the revset returns a match. Finding symbols in the parse tree
will later allow hashes of hidden revisions to work on the command
line without the --hidden flag.
manifest.intersectfiles() is just a utility used by manifest.matches(), and
a future commit removes intersectfiles for treemanifest for optimization
purposes.
This commit makes the intersectfiles methods on manifestdict and treemanifest
internal, and converts its test to a more generic testMatches(), which has the
exact same coverage.
With record's curses interface toggling and untoggling a newly added
file would lead to a confusing UI (the header was marked as partial
and the hunks as unselected). Tested additionally using the curses
interface with newly added, removed and modified files in a test repo.
This improves performance because it uses strchr rather than a loop.
For LLVM/clang version "Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM
3.5svn)" on OS X, for a repo with over 200,000 files, this improves perfdirs
from 0.248 seconds to 0.230 (7.3%)
For gcc 4.4.6 on Linux, for a test repo with over 500,000 files, this improves
perfdirs from 0.704 seconds to 0.658 (6.5%).
In the very early days of hg, it was possible to commit /dev/null because our
patch importer was too simple. Repos from this era may still
exist, add a note about why we ignore this name.
This patch makes wrapping "commands.update()" by largefiles extension
useless, because "cmdutil.bailifchanged()" can detect changes of
largefiles in the working directory.
This patch also changes test-update-branches.t, because
"cmdutil.bailifchanged()" shows more detailed information about
dirty-ness of the working directory than "workingctx.dirty()".
In "commands.update()", "cmdutil.bailifchanged()" isn't used for
"abort if the working directory is dirty", because it forcibly
examines about merging in progress.
"workingctx.dirty()" used in "commands.update()" can't detect changes
of largefiles in the working directory without "repo.lfstatus = True"
wrapping. This is only reason of "commands.update()" wrapping by
largefiles extension.
On the other hand, "cmdutil.bailifchanged()" already wrapped by
largefiles extension can detect changes of largefiles.
This patch is a preparations for replacing "workingctx.dirty()" and
raising Abort in "commands.update()" by "cmdutil.bailifchanged()". It
can remove redundant "commands.update()" wrapping.
This patch newly adds "dirtyreason()" to centralize composing dirty
reason message like "uncommitted changes in subrepository 'xxxx'".
There are 3 similar messages below, and this patch is a part of
preparations for unifying them into (1), too.
1. uncommitted changes in subrepository 'XXXX'
2. uncommitted changes in subrepository XXXX
3. uncommitted changes in subrepo XXXX
This patch chooses adding new method "dirtyreason()" instead of making
"dirty()" return "reason string", because:
- some of existing "dirty()" implementation is too complicated to do
so simply, and
- ill-mannered 3rd party subrepo classes, of which "dirty()" doesn't
return "reason string", cause meaningless message (even though it
is rare case)
When assigning a 22-byte hash to a nodeid in a manifest, manifestdict
drops the 22nd byte, while treemanifest keeps it. Let's make
treemanifest drop the 22nd byte as well.
Reverting to a revision where the subrepo didn't exist will now abort, and
matching subrepos against the working directory is consistent with how filesets
are evaluated since dd1c701aad4d.
The list of subrepos to revert is currently based on the given --rev, so there
is currently no way for this to fail. Using the --rev context is wrong though,
because if the subrepo doesn't exist in --rev, it is skipped, so it won't be
changed. This change makes it so that the revert aborts, which is what happens
if a plain file is reverted to -1. Finding matches based on --rev is also
inconsistent with evaluating files against the working directory (dd1c701aad4d).
This change is made now, so as to not cause breakage when the context is
switched in an upcoming patch.
This is a significant win for large repositories on OS X, especially with a
cold cache. Unfortunately we need to keep the lstat-based implementation around
for two reasons:
- Not all filesystems support this call.
- There's an edge case in which it's best to fall back to avoid a retry loop.
More about this in the comments.
The below tests are all performed on a Mac with an SSD running OS X 10.9, on a
repository with over 200k files. The results are best of 5 with simulated
best-effort conditions.
The gains with a hot cache are pretty impressive: 'hg status' goes from 5.18
seconds to 3.79 seconds.
However, a repository that large will probably already be using something like
hgwatchman [1], which helps much more (for this repo, 'hg status' with
hgwatchman is approximately 1 second). Where this really helps is when the
cache is cold [2]: hg status goes from 31.0 seconds to 9.66.
See http://lists.apple.com/archives/filesystem-dev/2014/Dec/msg00002.html for
some more discussion about this function.
This is based on a patch by Sean Farley <sean@farley.io>.
[1] https://bitbucket.org/facebook/hgwatchman
[2] There appears to be no easy way to clear the file cache (aka "vnodes") on
OS X short of rebooting. purge(8) purportedly does that but in my testing had
little effect. The workaround I came up with was to assume that vnode eviction
was LRU, make sure the kern.maxvnodes sysctl is smaller than the size of the
repository, then make sure we'd always miss the cache by running 'hg status' in
another clone of the repository before running it in the test repository.
In upcoming patches we'll add another implementation of listdir on OS X. That
implementation will have to fall back to this one under some circumstances,
though. We'll make _listdir be able to detect those circumstances and use the
right function as appropriate.
If self is a smartset and other is a fullreposet, nothing should be necessary.
A small win for trivial query in mozilla-central repo:
revset #0: (0:100000)
0) wall 0.017211 comb 0.020000 user 0.020000 sys 0.000000 (best of 163)
1) wall 0.001324 comb 0.000000 user 0.000000 sys 0.000000 (best of 2160)
Previously, it was difficult to find out how to display the status of files
relative to your current working directory. This patch adds that knowledge to
the help text.
The checkinlinesize function, which converts inline revlogs to non-inline,
uses the current transaction's "data" field to determine how to update the
transaction after the conversion.
This change works around the missing data field, which is not in the
transaction after a strip.
This speeds up 'hg revert -r .^ --all --dry-run' on the Mozilla repo
from 4.081s to 0.826s. Note that 'hg revert -r .^ .' does not get any
faster, since '.' does not make match.always() True.
I can't think of a reason it would break anything, and if it does,
it's clearly not covered by tests.
This can simplify the conversion from numeric revision to string. Without it,
we have to handle -1 specially because repo['-1'] != repo[-1].
The -1 revision is not officially documented, but this change makes sense
assuming that "rev(%d)" exists for scripting or third-party tools.
This was introduced way back in 2006 (rev e8e3582d6f80) as sys.exit(0) if
loading an extension failed when --traceback was on, then at some point morphed
into a 'return 1' in a function that otherwise returns nothing.
At this point, if ui.traceback is enabled and if loading an extension fails for
whatever reason, including one as innocent as it not being present, we leave
any extensions loaded so far in a bogus half-initialized state. That doesn't
really make any sense.
When running "hg log 'set:added()'", we create two matchers: one used
for producing the revset and one used for finding files to match. In
185b6b930e8c (graphlog: evaluate FILE/-I/-X filesets on the working
dir, 2012-02-26), we started passing a revision argument along from
what's currently in cmdutil._makelogrevset() to
revset._matchfiles(). When the revision was an empty string, it
referred to the working copy. This was subtly done with "repo[rev or
None]". Then, in 5ff5c5c9e69f (revset: avoid recalculating filesets,
2014-10-22), that conversion from empty string to None was lost. Note
that repo[''] is equivalent to repo['.'], not repo[None].
The consequence of this, to the user, is that when running "hg log
'set:added()'", the file matcher matches files added in the working
copy, while the revset matcher matches revisions that touch files
added in the parent of the working copy. As a result, only revisions
that touch any files added in the parent of the working copy will be
considered, but they will only be included if they also touch files
added in the working copy.
Fix the bug by converting '' to None again, but make it a little more
explicit this time (plus, we now have tests for it).
On IRC, rom1dep reported a traceback[1] from setting
experimental.strip-bundle2-version to True. This diff catches
unexpected values and falls back to the non-experimental bundle1
implementation after issuing a warning.
[1] http://gist.tamytro.org/_admin/gists/qXcdQLwtApgy6e3NwWgl
Previously, git subrepo diffs did not have a newline at the end.
This caused multiple subrepo diffs to be joined on the same line.
Additionally, the command prompt after the diff still contained
a part of the diff.
Previously, we had weird, nonsensical behavior when committing a file move
with a missing source. This removes that weird logic and tests that the bug
this strange behavior caused is fixed. Also adds a longish comment to prevent
some poor soul from accidentally re-implementing the bug in the future.
When a binary source has been copied or renamed into a non-binary
file, we don't check whether the copy source was binary. There is a
code comment explaining that a git mode will be forced anyway in order
to capture the copy record (i.e. losedatafn() will be called). This is
true, but forcing git mode is not the only effect binary files have:
when git mode was already requested, we use the binary-ness to tell us
whether to use a regular unified diff or a git binary diff. The user
sees this as a "Binary file $file has changed" instead of the binary
Once the transaction is closed, we now write transaction related data for
possible future undo. For now, we only do it for full file "backup" because
their were not handle at all in that case. In the future, we could move all the
current logic to set undo up (that currently exists in localrepository) inside
transaction itself, but it is not strictly requires to solve the current
situation.
It is time for the transaction to be responsible for setting up the
undo data. It is necessary to move this logic into the transaction
because many more files are handled now, and the transaction is the
object tracking them all.
The value can be set to None if no undo should be set.
The argument is a string containing the journal name (used as prefix for all
other transaction file). This is not the transaction file itself. So we clarify
this.
Using 'copyfile' (single file) instead of 'copyfiles' (tree) will ensures
destination file will be overwritten. This will prevent some abort if backup
file are left in place for random reason.
It also seems more correct.
Some code paths use 'copyfiles' (full tree) for a single file to take advantage
of the best-effort-hard-linking parameter. We add similar parameter and logic
to 'copyfile' (single file) for this purpose.
The single file version have the advantage to overwrite the destination file if
it exists.
This adds an experimental option 'strip-bundle2-version' which causes backup
bundles to use bundle2 formatting. Especially for generaldelta repositories,
this should provide significant performance gains for any operation that needs
to write a backup.
This diff adds support to writebundle to generate a bundle2 wrapper; upcoming
diffs will add an option to write a v2 changegroup part instead of v1 in these
bundles.
The next diff will add support for writing bundle2 files to writebundle, but
the bundle2 generator wants access to a ui object. This changes the signature
and callsites to pass one in.
The largefiles extensions needs to be able to pass --large, --normal and
--lfsize to subrepos via cmdutil.add() and hgsubrepo.add(). Rather than add
additional special purpose arguments, stop extracting the existing args from the
**opts passed to commands.add() and just pass them along.
0e5f75a52c9d introduced a way to share bookmarks. When a repository share that
shares bookmarks was created, a .hg/bookmarks.shared file was created to mark
the repository share as one that shares its bookmarks.
We have plans to introduce other levels of sharing, including a "full share"
mode. Rather than creating a new ".shared" file for each new thing that we may
want to share It seems better to create a single "shared" file that will list
what is shared for a given shared repository. This should make it much easier
to get a list of everything that is shared by a given shared repository.
The shared file contains a list of shared "items" (such as bookmarks). Each
shared "item" is added as a new line in the file. For now the only possible
entry in the file is "bookmarks".
Now that the matcher supports 'include:' rules, let's change the dirstate.ignore
creation to just create a matcher with a bunch of includes. This allows us to
completely delete ignore.py.
I moved some of the syntax documentation over to readpatternfile in match.py so
we don't lose it.
This allows the matcher to understand 'include:path/to/file' style rules. The
files support the standard hgignore syntax and any rules read from the file are
included in the matcher without regard to the files location in the repository
(i.e. if the included file is in somedir/otherdir, all of it's rules will still
apply to the entire repository).
Occasionally the matcher will want to print warning messages instead of throwing
exceptions (like if it encounters a bad syntax parameter when parsing files).
Let's add an optional warn argument that can provide this. The next patch will
actually use this argument.
Future patches will be adding the ability to recursively include pattern files
in a match rule expression. Part of that behavior will require tracking which
file each pattern came from so we can report errors correctly.
Let's add a 'source' arg to the kindpats list to track this. Initially it will
only be populated by listfile rules.
At least as far back as Python 2.6 the .code attribute is always
defined, and to the best of my detective skills a .getcode() method
has never been a thing.
This patvh speeds up the computation of the not public() changeset
and incidentally speed up the computation of divergents() changeset on our big
repo by 100x from 50% to 0.5% of the time spent in smartlog with evolve.
In this patch we optimize not public() to _notpublic() (new revset) and use
the work on phaseset (from the previous commit) to be able to compute
_notpublic() quickly.
We use a non-lazy approach making the assumption the number of notpublic
change will not be in the order of magnitude of the repo size. Adopting a
lazy approach gives a speedup of 5x (vs 100x) only due to the overhead of the
code for lazy generation.
To speed up the computation of draft(), secret(), divergent(), obsolete() and
unstable() we need to have a fast way of getting the list of revisions that
are in draft(), secret() or the union of both: not public().
This patch extends the work on phase computation in C and make the phase
computation code also return a list of set for each non public phase.
Using these sets we can quickly obtain all the revisions of a given phase.
We do not return a set for the public phase as we expect it to be roughly the
size of the repo. Also, it can be computed easily by substracting the entries in the
non public phases from all the revs in the repo.
Match's visitdir() was prematurely optimized to return 'all' in some cases, so
that the caller would not have to call it for directories within the current
directory. This change makes the visitdir system less flexible for future
changes, such as making visitdir consider the match's include and exclude
patterns.
As a demonstration of this optimization not actually improving performance,
I ran 'hg files -r . media' on the Mozilla repository, stored as treemanifest
revlogs.
With best of ten tries, the command took 1.07s both with and without the
optimization, even though the optimization reduced the calls from visitdir()
from 987 to 51.
Since manifests instances are cached on the manifest log instance, we
can cache directory manifests by caching the directory manifest
logs. The directory manifest log cache is a plain dict, so it never
expires; we assume that we can keep all the directories in memory.
The cache is kept on the root manifestlog, so access to directory
manifest logs now has to go through the root manifest log.
The caching will soon not be only an optimization. When we start
lazily loading directory manifests, we need to make sure we don't
create multiple instances of the log objects. The caching takes care
of that problem.
This get rid of another StopIteration abomination. The change in self.current
value is supposed to not matter as nobody should be calling '_advance' after
that (as per Matt wisdom).
A future commit will move the readignorefile logic into match.py so it can be
used from general match rules. Let's rename the function to represent this new
behavior.
_ignorefile did nothing except open the file. Let's combine it with
readignorefile for simplicity. This will make it easier to rename and move to
match.py in upcoming patches.
In preparation for moving readignorefile to match.py to make it more generally
usable, let's move the bad ignore file handling up to the ignore specific logic.
Previously we would always pass the root .hgignore path to the ignore parser.
The parser then had to be aware that the first path was special, and not warn if
it didn't exist.
In preparation for making the ignore file parser more generically usable, let's
make the parse logic not aware of this special case, and instead just not pass
the root .hgignore in if it doesn't exist.