This improves the poor "time to first changeset" compared to the
original log command. When running:
$ hg log -u user
log will enumerate the changelog and display matching revisions when
they are found. But:
$ hg log -G -u user
will first find all revisions matching the user then start to display
them.
Initially, I considered turning revset.match() into a generator. This is
doable but requires a fair amount of work. Instead,
cmdutil.increasingwindows() is reused to call the revset matcher
repeatedly. This has several nice properties:
- It lets us reorder the windows after filtering, which is necessary because
the matcher can reorder inputs, although that is an internal detail, not a
feature.
- It lets us feed the matcher with windows in changelog order, which is good
for performance.
- It gives us a generator designed for log-like commands, returning small
windows at first then batching larger ones.
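The windowing approach described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual cmdutil.increasingwindows() code: windows(), matchinwindows() and evenmatcher() are hypothetical names, and the toy matcher deliberately returns its results reversed to mimic a matcher that reorders its input.

```python
def windows(revs, size=8, limit=512):
    """Yield slices of revs, doubling the window size up to limit."""
    start = 0
    while start < len(revs):
        yield revs[start:start + size]
        start += size
        if size < limit:
            size *= 2


def matchinwindows(matcher, revs):
    """Lazily yield matching revisions, preserving the order of revs."""
    for window in windows(revs):
        matched = set(matcher(window))
        # The matcher may return revisions in any order, so restore the
        # window order instead of relying on its output order.
        for rev in window:
            if rev in matched:
                yield rev


def evenmatcher(window):
    """Toy matcher (assumption): keeps even revisions, reversed."""
    return [rev for rev in reversed(window) if rev % 2 == 0]
```

Because matchinwindows() is a generator, the first matching revision is available as soon as the first small window has been filtered, which is where the better "time to first changeset" comes from.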
I feel that calling the matcher multiple times is correct, at least with
the revsets involved in getlogrevs() because they are:
- stateless (no limit())
- respecting f(a|b) = f(a) | f(b), though I have no solid argument to back
that up.
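The two conditions can be illustrated with toy predicates. This is only a sketch: userfilter() and limit() are hypothetical stand-ins working on plain lists, not the real revset functions.

```python
def userfilter(revs, users, wanted):
    """Stateless predicate: each revision is judged on its own."""
    return [r for r in revs if users[r] == wanted]


def limit(revs, n):
    """Stateful predicate: the result depends on the whole input."""
    return revs[:n]


users = {0: "alice", 1: "bob", 2: "alice", 3: "bob", 4: "alice"}
a, b = [0, 1, 2], [3, 4]

# Filtering window by window gives the same result as filtering at once.
whole = userfilter(a + b, users, "alice")
split = userfilter(a, users, "alice") + userfilter(b, users, "alice")

# limit() does not distribute: applying it per window keeps too much.
limitwhole = limit(a + b, 2)
limitsplit = limit(a, 2) + limit(b, 2)
```

Here whole and split are equal, while limitwhole and limitsplit differ, which is why a limit() in the revset would make repeated matcher calls incorrect.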
Known issues compared to log code:
- Calling the revset matcher multiple times can be slow when revset
functions have to create expensive data structures for filtering. This
will be addressed in a followup.
- Predicate combinations like "--user foo --user bar" or "--user foo and
--branch bar" are inherently slower because all input revisions are
checked against the first condition, then against the second, and so
forth. log would enumerate the input revisions once and check each of
them once against all conditions, which is faster. There are solutions
but nothing cheap to implement.
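The difference between the two evaluation strategies can be sketched like this for the "--user foo --user bar" case. The function names and the pass counter are hypothetical and only serve the illustration; neither function is taken from the log or revset code.

```python
def revsetstyle(revs, predicates):
    """One full pass over the input per predicate, then a union."""
    matched = set()
    passes = 0
    for pred in predicates:
        passes += 1
        matched.update(r for r in revs if pred(r))
    return [r for r in revs if r in matched], passes


def logstyle(revs, predicates):
    """A single pass, testing each revision against all predicates."""
    return [r for r in revs if any(pred(r) for pred in predicates)], 1


users = {0: "foo", 1: "bar", 2: "baz", 3: "foo"}
predicates = [
    lambda rev: users[rev] == "foo",
    lambda rev: users[rev] == "bar",
]
revs = [3, 2, 1, 0]
```

Both functions select the same revisions, but the first one walks the input once per predicate.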
Some numbers against the mozilla repository:
                             first line      total
* hg log -u rnewman
  /Users/pmezard/bin/hg-2.2      0.148s     7.293s
  /Users/pmezard/bin/hgdev       0.132s     5.747s
* hg log -u rnewman -u girard
  /Users/pmezard/bin/hg-2.2      0.146s     7.323s
  /Users/pmezard/bin/hgdev       0.136s    11.096s
* hg log -l 10
  /Users/pmezard/bin/hg-2.2      0.137s     0.153s
  /Users/pmezard/bin/hgdev       0.128s     0.144s
* hg log -l 10 -u rnewman
  /Users/pmezard/bin/hg-2.2      0.146s     0.265s
  /Users/pmezard/bin/hgdev       0.133s     0.236s
* hg log -b GECKO193a2_20100228_RELBRANCH
  /Users/pmezard/bin/hg-2.2      2.332s     6.618s
  /Users/pmezard/bin/hgdev       1.972s     5.543s
* hg log xulrunner
  /Users/pmezard/bin/hg-2.2      5.829s     5.958s
  /Users/pmezard/bin/hgdev       0.194s     6.017s
* hg log --follow xulrunner/build.mk
  /Users/pmezard/bin/hg-2.2      0.353s     0.438s
  /Users/pmezard/bin/hgdev       0.394s     0.580s
* hg log -u girard tools
  /Users/pmezard/bin/hg-2.2      5.853s     6.012s
  /Users/pmezard/bin/hgdev       0.195s     6.030s
* hg log -b COMM2000_20110314_RELBRANCH --copies
  /Users/pmezard/bin/hg-2.2      2.231s     6.653s
  /Users/pmezard/bin/hgdev       1.897s     5.585s
* hg log --follow
  /Users/pmezard/bin/hg-2.2      0.137s    14.140s
  /Users/pmezard/bin/hgdev       0.381s    44.246s
* hg log --follow -r 80000:90000
  /Users/pmezard/bin/hg-2.2      0.127s     1.611s
  /Users/pmezard/bin/hgdev       0.147s     1.847s
* hg log --follow -r 90000:80000
  /Users/pmezard/bin/hg-2.2      0.130s     1.702s
  /Users/pmezard/bin/hgdev       0.368s     6.106s
* hg log --follow -r 80000:90000 js/src/jsproxy.cpp
  /Users/pmezard/bin/hg-2.2      0.343s     0.388s
  /Users/pmezard/bin/hgdev       0.437s     0.631s
* hg log --follow -r 90000:80000 js/src/jsproxy.cpp
  /Users/pmezard/bin/hg-2.2      0.342s     0.389s
  /Users/pmezard/bin/hgdev       0.442s     0.628s
The situation is complicated because filelog() revset uses a match object in
relpath mode while follow() revset interprets the filename as a manifest entry.
This solves a problem similar to the one addressed by the previous
--follow/--rev patch. This time we need changelog.ancestors()/descendants()
filtering on first parent.
Duplicating the code looked better than introducing keyword arguments. Besides,
the ancestors() version was already implemented in follow() revset.
The previous behaviour of --follow was only a subset of what actually
happens in the log command:
- If --rev is not passed, default to '.:0'
- Resolve --rev into a revision list "revs"
- Set the starting revision to revs[0]
- If revs[1] > revs[0] keep descendants(revs[0]) in revs, otherwise keep
ancestors.
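The steps above can be sketched on a toy DAG. This is an illustration only: the graph is a plain dict mapping each revision to its parents, followrevs() is a hypothetical name, --rev is assumed to be already resolved into the list "revs", and both helpers are assumed to include the starting revision itself.

```python
def ancestors(parents, rev):
    """Return rev and all its ancestors in the toy DAG."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents[r])
    return seen


def descendants(parents, rev):
    """Return rev and all its descendants in the toy DAG."""
    return {r for r in parents if rev in ancestors(parents, r)}


def followrevs(parents, revs):
    """Filter the resolved revision list following the steps above."""
    if not revs:
        return []
    start = revs[0]
    if len(revs) > 1 and revs[1] > start:
        keep = descendants(parents, start)
    else:
        keep = ancestors(parents, start)
    return [r for r in revs if r in keep]


# Two branches growing from revision 0: 0-1-3 and 0-2-4.
parents = {0: [], 1: [0], 2: [0], 3: [1], 4: [2]}
```

With this graph, a descending sequence starting at 4 keeps the ancestors of 4, while an ascending sequence starting at 1 keeps the descendants of 1.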
Running:
$ time hg debugrevspec 'user(mpm)' | wc
on the Mercurial repository takes 1.0s with a regular version and 1.8s if
commands.debugrevspec() is patched to pass revisions to revset.match() from tip
to 0.
Depending on what we expect from the revset API and caller wisdom, we might
want to push this change in revset.match() later.
When --follow and --rev are passed, the actual behaviour of --follow depends on the
input revision sequence defined by --rev. If --rev is not passed, the default
revision sequence depends on the presence of --follow. It means the revision
sequence generation is part of log logic and must be wrapped. The issue
described above is fixed in following patches.
--rev options cannot be merged into a single revset because we do not know if
they are valid revsets or old-style revision specifications, like 'foo-bar'
tags. Instead, a base revision set is generated with scmutil.revrange() then
filtered with the revset built from log options. It also fixes the handling of
incorrect or hostile expressions passed in --rev.
The previous code was correct for command line as opts always contains the
default empty lists for --branch and --only-branch options. But calling
graphlog.revset() directly with only --only-branch set would leave it
unprocessed.
When passing --patch/--stat, file filters have to be applied to generate the
correct diff or stat output:
- Without --follow, the static match object can be reused
- With --follow, the files displayed at revision X are the ancestors of
selected files at parent revision. To do this, we reproduce the ancestry
calculations done by --follow, lazily.
test-glog.t changes show that --patch output is not satisfactory because renames
are reported as copies. This can probably be fixed by:
- Without --follow: compute files to display, look for rename sources and
extend the matcher to include them.
- With --follow: detect .path() transitions between parent/child filectx,
filter them using the linked changectx .removed() field and extend fcache
with them.
"hg log --removed FILE" does not return changesets where FILE was removed, but
ones where FILE was changed and possibly removed. The flag is really here to
disable walkchangerevs() fast path, which cannot see file removals by scanning
filelogs.
This subtlety is not documented yet but:
- pats/--include/--exclude filesets are evaluated against the working directory
- --rev filesets are reevaluated against every revision
log --graph --follow-first FILE cannot be compared with the regular version
because it never worked: --follow-first is not taken into account in the
walkchangerevs() fast path and is explicitly bypassed in the FILE case in
the walkchangerevs() nested iterate() function.
On platforms not supporting shell expansion, scmutil.match() performs glob
expansion on 'pats' arguments. But _matchfiles() revset calls match.match()
directly, bypassing this behaviour. To avoid duplicating scmutil.match(), a
secondary scmutil.matchandpats() is introduced returning both the match object
and the expanded inputs. Note the expanded pats are also needed in the fast
path, and will be used by --follow code path.
The filtering logic of match objects cannot be reproduced with the existing
revsets as it operates at changeset files level. A changeset touching "a" and
"b" is matched by "-I a -X b" but not by "file(a) and not file(b)".
To solve this, a new internal "_matchfiles(...)" revset is introduced. It works
like "file(x)" but accepts more than one argument and its arguments are
prefixed with "p:", "i:" and "x:" to be used as patterns, include patterns or
exclude patterns respectively.
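Both points can be illustrated with a toy model. This is a sketch under loud assumptions: splitargs(), matchfiles() and touches() are hypothetical helpers, patterns are compared by exact file name instead of going through real match objects, and a changeset is reduced to the list of files it touches.

```python
def splitargs(args):
    """Sort 'p:', 'i:' and 'x:' prefixed arguments into three lists."""
    kinds = {"p": [], "i": [], "x": []}
    for arg in args:
        prefix, sep, value = arg.partition(":")
        if not sep or prefix not in kinds:
            raise ValueError("invalid argument: %r" % arg)
        kinds[prefix].append(value)
    return kinds["p"], kinds["i"], kinds["x"]


def matchfiles(files, *args):
    """Toy _matchfiles(): true if any single file passes all filters."""
    pats, include, exclude = splitargs(args)

    def matchone(path):
        # Exact-name comparison stands in for real pattern matching.
        if pats and path not in pats:
            return False
        if include and path not in include:
            return False
        return path not in exclude

    return any(matchone(f) for f in files)


def touches(files, path):
    """Toy file(path): true if the changeset touches path."""
    return path in files


changeset = ["a", "b"]
bymatcher = matchfiles(changeset, "i:a", "x:b")
byrevsets = touches(changeset, "a") and not touches(changeset, "b")
```

The changeset is selected by the matcher-based version because file "a" alone passes the filters, but rejected by the combination of file() revsets because the changeset also touches "b".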
The _matchfiles revset is kept private for now:
- There are probably smarter ways to pass the arguments in a user-friendly way
- A "rev:" argument is likely to appear at some point to emulate log command
behaviour with regard to filesets: they are evaluated for the parent revision
and applied everywhere instead of being reevaluated for each revision.
This will let us override the "join" value (and/or) depending on the option
considered. The option revset arity is now deduced from the revset and the
option value type, to simplify opt2revset definition.
This bug may be caused by file subgraphs having more than two parents
per node. I have no idea if this fix is correct as the graphlog code
is mysterious, but it seems to be fine on the available test case.
The grapher cannot really handle revisions if they are not emitted in
topological order. The previous 'reverse()' revset was not enough to achieve
that and was replaced by an explicit sort call for simplicity. The --limit
option is now also handled as usual with cmdutil.loglimit() instead of a
'limit' revset.
While nodes with more than 2 parents do not exist in revision graphs, they do
appear when you transform them by removing subgraphs while trying to preserve
ancestry links.
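A toy example of how this happens, assuming a graph stored as a dict of parent lists. reducedparents() is a hypothetical helper written for this illustration, not the grapher code, and it does not try to prune links that are redundant with other kept ancestors.

```python
def reducedparents(parents, keep):
    """Map each kept revision to its nearest kept ancestors."""
    result = {}
    for rev in sorted(keep):
        found, seen = [], set()
        stack = list(parents[rev])
        while stack:
            p = stack.pop()
            if p in seen:
                continue
            seen.add(p)
            if p in keep:
                # Stop at the first kept ancestor on this path.
                found.append(p)
            else:
                # Skip the removed revision and look at its parents.
                stack.extend(parents[p])
        result[rev] = sorted(found)
    return result


# Every revision has at most two parents in the original graph.
parents = {0: [], 1: [0], 2: [0], 3: [0], 4: [1, 2], 5: [4, 3]}
# Dropping the merge 4 (and the root 0) leaves 5 with three parents.
reduced = reducedparents(parents, {1, 2, 3, 5})
```

Revision 5 originally has parents 4 and 3. Once 4 is removed, its own parents 1 and 2 are linked to 5 directly, giving three parents in the transformed graph.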
This code was borrowed from the pbranch extension by Peter Arrenbrecht
<peter.arrenbrecht@gmail.com>.
follow() revset really means '::.' while we want something based on the passed
argument. Also, ancestors() revset does not include the parent revisions.
Thanks to Klaus Koch for the idea and most of the implementation.
Backs revisions() and filerevs() with a DAG walker which can iterate through
an arbitrary list of revisions instead of strict one-by-one iteration from
start to stop. When a gap occurs in the revisions (i.e. in a file log), the
next topological parent within the revset is searched for and the connection
to it is printed in the ASCII graph.
The file graph can sometimes draw more connections than the previous version,
because the graph is produced according to the revset, not according to a
file's filelog.
In case the graph contains several branches where the left parent is null, the
graphs for each are printed sequentially, not in parallel as was the case
earlier (see for example the graph for README in hg-dev).