Before this patch, the "lines" function inside "annotate" returned 1 for
empty text (''). This patch makes it return 0, because the function should
match mdiff.splitnewlines (used by mdiff.allblocks) and s.splitlines (used at
the end of the "annotate" method). Both len(mdiff.splitnewlines('')) and
len(''.splitlines(True)) are 0.
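The mismatch for empty text can be reproduced with str.splitlines alone. This is a minimal sketch, not the code in "annotate": the names lines_before and lines_after are hypothetical, and the bodies are only an assumption about what such a line counter looks like.

```python
def lines_before(text):
    # Counts a trailing partial line, but also reports 1 for ''.
    if text.endswith("\n"):
        return text.count("\n")
    return text.count("\n") + 1


def lines_after(text):
    # Only count a trailing partial line when there is any text at all.
    if text.endswith("\n"):
        return text.count("\n")
    return text.count("\n") + int(bool(text))


samples = ["", "a", "a\n", "a\nb", "a\nb\n"]
# Inputs where the old counter disagrees with str.splitlines(True).
mismatches = [s for s in samples if lines_before(s) != len(s.splitlines(True))]
```

Among these samples, only the empty string makes the old counter disagree with splitlines.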
This issue was discovered while testing fastannotate [1].
I could not find a test case that reveals this issue. However, in theory this
could reduce memory usage slightly, and it avoids surprises when people
touch this area in the future.
[1]: https://bitbucket.org/facebook/hg-experimental/commits/525b3b98e93a
Before, "#foo" paths made hg crash. We moved the #fragment parsing in
246862840084, but we shouldn't set path to None too early. This patch just
removes the "if not path:" block, since that condition is checked a few lines
later.
Otherwise tolocal() and fromlocal() wouldn't work on Python 3. tolocal() still
can't make a valid localstr object because localstr inherits from str, but it
can return some object without raising exceptions.
Since Py3 bytes() behaves much more like bytearray() than like Py2 str(), we
can't simply do s/str/bytes/g. I have no good idea for handling the str/bytes
divergence.
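The divergence mentioned above is easy to see on Python 3. The snippet below is only an illustration of standard bytes behavior, not code from this patch:

```python
data = b"abc"

first = data[0]          # an int on Python 3 (a 1-character str on Python 2)
first_slice = data[0:1]  # slicing still yields bytes
as_list = list(data)     # iterating yields ints, as with bytearray
```

Code that indexes or iterates over a Py2 str therefore changes meaning when the type is mechanically replaced by bytes.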
This will be used to convert encoding.encoding to a str acceptable by
Python 3 functions.
The source encoding is changed to "latin-1" because encoding.encoding can
have arbitrary bytes. Since valid names should consist of ASCII characters,
we don't care about the mapping of non-ASCII characters so long as invalid
names are distinct from valid names.
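A minimal Python 3 sketch of why "latin-1" works here: it maps every byte value to exactly one code point, so decoding can never fail and distinct byte strings stay distinct. The helper name to_sysstr is hypothetical and is not the function added by this patch.

```python
def to_sysstr(name):
    """Hypothetical helper: turn a bytes name into a native str."""
    if isinstance(name, str):
        return name
    # latin-1 decodes any byte sequence, one byte per character.
    return name.decode("latin-1")


ascii_name = to_sysstr(b"utf-8")
odd_name = to_sysstr(b"\xff\xfe-bogus")
```

The ASCII name comes out unchanged, and the arbitrary bytes round-trip through encode("latin-1") without loss.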
The "needed" dict is used as a reference counter to free items in the giant
"hist" dict. However, it is currently not very accurate and can lead to
dropping "hist" items unnecessarily. For example, with the following DAG:
       -3-
      /   \
  0--1--2--4--
The current algorithm will visit and calculate rev 1 twice, which is
undesired. It also tries to be smart by clearing rev 1's parents
("pcache[1] = []") at the time hist[1] is accessed (note: hist[1] needs to be
used twice, by rev 2 and rev 3). This can produce incorrect results if p1 of
rev 4 deletes chunks belonging to rev 0.
However, simply removing "needed" is not okay, because memory usage grows by
more than 10x:
# without any change
% HGRCPATH= lrun ./hg annotate mercurial/commands.py -r d130a38 3>&2 [1]
MEMORY 49074176
CPUTIME 9.213
REALTIME 9.270
# with "needed" removed
MEMORY 637673472
CPUTIME 8.164
REALTIME 8.249
This patch moves the "needed" (and "pcache") calculation to a separate DFS to
address the issue. It improves performance and fixes issue5360 by correctly
reusing hist, while keeping memory usage low. Some additional attempts have
been made to further reduce memory usage, like changing "pcache[f] = []" to
"del pcache[f]". As a result, annotate is both faster and uses less memory:
# with this patch applied
MEMORY 47575040
CPUTIME 7.870
REALTIME 7.926
[1]: lrun is a lightweight sandbox built on Linux cgroups and namespaces. It's
used to measure CPU and memory usage here. Source code is available at
github.com/quark-zju/lrun.
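The two-pass idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the annotate code: the function name, the "parents" dict and the "compute" callback are hypothetical stand-ins for the filectx graph and the per-revision annotate step.

```python
def walk_with_refcounts(root, parents, compute):
    """Compute a value for root, freeing intermediate results early.

    parents: dict mapping node -> list of parent nodes (hypothetical input)
    compute: function(node, [parent values]) -> value
    """
    # Pass 1: a separate DFS that counts how often each result is needed.
    needed = {root: 1}
    pcache = {}
    visit = [root]
    while visit:
        f = visit.pop()
        if f in pcache:
            continue
        pcache[f] = parents.get(f, [])
        for p in pcache[f]:
            needed[p] = needed.get(p, 0) + 1
            if p not in pcache:
                visit.append(p)

    # Pass 2: compute each node once, after all of its parents.
    hist = {}
    order = []
    visit = [root]
    while visit:
        f = visit[-1]
        if f in hist:
            visit.pop()
            continue
        ready = True
        for p in pcache[f]:
            if p not in hist:
                ready = False
                visit.append(p)
        if not ready:
            continue
        visit.pop()
        hist[f] = compute(f, [hist[p] for p in pcache[f]])
        order.append(f)
        for p in pcache[f]:
            # Drop a parent's result once its last consumer is done.
            needed[p] -= 1
            if needed[p] == 0:
                del hist[p]
        del pcache[f]
    return hist[root], order


# The DAG from the description: 1 is a parent of both 2 and 3.
dag = {4: [2, 3], 3: [1], 2: [1], 1: [0], 0: []}
result, order = walk_with_refcounts(4, dag, lambda f, pv: sum(pv) + f)
```

With the reference counts computed up front, rev 1 is calculated once and its entry is kept until both rev 2 and rev 3 have used it.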
Updating dirstate by simply adding and dropping files from self._map doesn't
keep the other maps updated (think: _dirs, _copymap, _foldmap, _nonormalset),
thus introducing cache inconsistency.
This also affects the debugstate tests, since we no longer even try to set
the correct mode and mtime for the files: they are marked dirty anyway and
will be checked during the next status call.
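The kind of inconsistency described above can be shown with a toy model. The class below is hypothetical and much simpler than the real dirstate; it only has a primary map and one derived cache.

```python
class ToyDirstate:
    """Toy model with a primary map and one derived cache."""

    def __init__(self):
        self._map = {}
        self._dirs = {}  # directory -> number of tracked files inside it

    def add(self, path):
        if path in self._map:
            return
        self._map[path] = "a"
        d = path.rpartition("/")[0]
        self._dirs[d] = self._dirs.get(d, 0) + 1

    def drop(self, path):
        if self._map.pop(path, None) is None:
            return
        d = path.rpartition("/")[0]
        self._dirs[d] -= 1
        if not self._dirs[d]:
            del self._dirs[d]

    def hasdir(self, d):
        return d in self._dirs


ds = ToyDirstate()
ds.add("src/a.py")
del ds._map["src/a.py"]   # bypasses the API: the derived cache goes stale
stale = ds.hasdir("src")

ds2 = ToyDirstate()
ds2.add("src/a.py")
ds2.drop("src/a.py")      # goes through the API: both maps stay in sync
fresh = ds2.hasdir("src")
```

After the direct deletion the toy object still claims that "src" contains tracked files, while going through drop() gives the expected answer.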
That is, help gets tweaked thus:
global options ([+] can be repeated):
-v --[no-]verbose enable additional output
Other proposals have included:
global options ([+] can be repeated, options marked [?] are boolean flags):
-v --verbose[?] enable additional output
and
global options ([+] can be repeated, options marked [^] are boolean flags):
-v --verbose[^] enable additional output
which avoid the unfortunate visual noise in this patch. In this
version's favor, it's consistent with what I'm used to seeing in man
pages and similar documentation venues.
This command can be used for testing the performance of producing the
changelog portion of a changegroup.
We could use additional perf* commands for testing other parts of
changegroup. Those can be written another time, when they are needed.
(And those may want to refactor the changegroup generation API so code
can be reused.) Speaking of code reuse, yes, this command does reinvent
a small wheel. I didn't want to bloat the scope by changing the changegroup
API because that would invite bikeshedding.
We can't use fctx.linkrev() because the follow() revset tries hard to simulate
the traversal of the changelog DAG, not the filelog DAG. This patch fixes
_makefollowlogfilematcher() to walk file ancestors in the same way as
revset._follow().
I'll factor out a common function in future patches.
groupchunks() is a generic "turn a file object into a generator"
function. It isn't limited to changegroups. Rename the argument
and update the docstring to reflect this.
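A generic "file object to generator" helper can look roughly like the sketch below. The name filechunks and the chunksize parameter are hypothetical; this is not the body of groupchunks().

```python
import io


def filechunks(fh, chunksize=4096):
    """Yield successive chunks read from a file object until EOF."""
    while True:
        chunk = fh.read(chunksize)
        if not chunk:
            break
        yield chunk


chunks = list(filechunks(io.BytesIO(b"abcdefghij"), chunksize=4))
```

Nothing in such a helper depends on the data being a changegroup, which is the point of the rename.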
Before this patch, bundle2 application attempted to consume remaining
bundle2 part data when the process was interrupted (SIGINT) or when
sys.exit was called (translated into a SystemExit exception). This
meant that if one of these occurred while applying, say, a 1 GB
changegroup bundle2 part being downloaded over a network, it could take
Mercurial *several minutes* to terminate after a SIGINT because the
process was waiting on the network to stream megabytes of data. This is
not a great user experience and a regression from bundle1. Furthermore,
many process supervisors tend to only give processes a finite amount of
time to exit after delivering SIGINT: if processes take too long to
self-terminate, a SIGKILL is issued and Mercurial has no opportunity to
clean up. This would mean orphaned locks and transactions. Not good.
This patch changes the bundle2 application behavior to fail faster
when an interrupt or system exit is requested. It does so by not
catching BaseException (which includes KeyboardInterrupt and
SystemExit) and by explicitly checking for these conditions in
yet another handler which would also seek to the end of the current
bundle2 part on failure.
The end result of this patch is that SIGINT is now reacted to
significantly faster: the active transaction is rolled back
immediately without waiting for incoming bundle2 data to be consumed.
This restores the pre-bundle2 behavior and makes Mercurial treat
signals with the urgency they deserve.
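The distinction between Exception and BaseException that this relies on can be sketched as below. ToyPart and apply_part are hypothetical stand-ins, not the bundle2 API, and the real handlers are structured differently.

```python
import io


class ToyPart:
    """Hypothetical stand-in for a streamed part."""

    def __init__(self, data):
        self._fh = io.BytesIO(data)
        self.drained = False

    def read(self, size=-1):
        return self._fh.read(size)

    def seek_to_end(self):
        # Consume whatever is left so the next part starts at the right place.
        self._fh.read()
        self.drained = True


def apply_part(part, handler):
    try:
        handler(part)
    except Exception:
        # Ordinary failures: drain the part, then re-raise.
        part.seek_to_end()
        raise
    # KeyboardInterrupt and SystemExit derive from BaseException only, so
    # they are not caught above and propagate without draining.


def failing(part):
    raise ValueError("bad payload")


def interrupted(part):
    raise KeyboardInterrupt


p1, p2 = ToyPart(b"x" * 10), ToyPart(b"x" * 10)
for part, handler in ((p1, failing), (p2, interrupted)):
    try:
        apply_part(part, handler)
    except (ValueError, KeyboardInterrupt):
        pass
```

In this sketch the part hit by an ordinary error is drained, while the interrupted one is left alone so the caller can abort right away.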
We already support multiple primitives for listing files that were
affected by the current changeset.
This patch adds files() which returns files of the current changeset
matching a given pattern or fileset query via the "set:" prefix.
There are two reasons that rebase should be done this way:
1. This would make rebasing faster because it would minimize the total
number of files to be checked out in the process, as it doesn't need
to switch back and forth between branches.
2. It makes resolving conflicts easier, as the user has better context.
This commit changes the behavior in "Test multiple root handling" of
test-rebase-obsolete.t. It is an expected change which reflects the new
behavior that commits in a branch are grouped together when rebased.
Add an if True: placeholder for a profiling context manager that
will be added in the next commit, for the purpose of reducing size
of the diff due to trivial indentation changes.
This change should be a no-op.
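The trick can be illustrated with a small sketch. The profile() context manager and the two functions are hypothetical; they only show why the follow-up diff becomes a one-line change.

```python
import contextlib

events = []


@contextlib.contextmanager
def profile():
    """Hypothetical profiling context manager."""
    events.append("start")
    try:
        yield
    finally:
        events.append("stop")


def step_one():
    # This commit: the body is re-indented under a no-op block.
    if True:
        events.append("work")


def step_two():
    # Next commit: only the "if True:" line changes.
    with profile():
        events.append("work")


step_one()
step_two()
```

Both versions run the same body; only the second one wraps it in the context manager.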
Before this patch, if the steps below occur within the same second,
mtime, ctime and size are all the same between (1) and (3).
1. append data to a revlog-style file (and close the transaction)
2. discard the appended data by truncation during rollback
3. append different data of the same size to the revlog-style file again
Therefore, cache validation doesn't work as expected after (3).
To avoid file stat ambiguity around truncation, this patch opens a
file with checkambig=True.
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
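The ambiguity itself can be sketched with plain tuples. Stat and is_ambiguous are hypothetical names, and comparing all three fields at one-second granularity is an assumption taken from the description above, not Mercurial's implementation.

```python
from collections import namedtuple

Stat = namedtuple("Stat", "size ctime mtime")


def is_ambiguous(old, new):
    """True if size, ctime and mtime (in whole seconds) cannot tell them apart."""
    return (
        old.size == new.size
        and int(old.ctime) == int(new.ctime)
        and int(old.mtime) == int(new.mtime)
    )


after_step1 = Stat(size=128, ctime=1000.10, mtime=1000.10)
after_step3 = Stat(size=128, ctime=1000.90, mtime=1000.90)  # same second
next_second = Stat(size=128, ctime=1001.05, mtime=1001.05)
```

A cache keyed on these fields treats after_step1 and after_step3 as the same file state, although the content differs.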
Before this patch, if the steps below occur within the same second,
mtime, ctime and size are all the same between (1) and (3).
1. append data to a revlog-style file (and close the transaction)
2. discard the appended data by truncation during strip
3. append different data of the same size to the revlog-style file again
Therefore, cache validation doesn't work as expected after (3).
To avoid such file stat ambiguity around truncation, this patch opens
a file with checkambig=True.
This patch also uses a "with" statement to ensure that close() is invoked
immediately after truncation, because closing the file is the only
trigger to check (and get rid of) file stat ambiguity.
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
If the steps below occur within the same second, mtime, ctime and size
are all the same between (1) and (3).
1. append data to 00changelog.i (and close the transaction)
2. discard the appended data by truncation (strip or rollback)
3. append different data of the same size to 00changelog.i again
Therefore, cache validation doesn't work as expected after (3).
To avoid such file stat ambiguity around truncation, this patch
specifies checkambig=True to revlog.__init__(). This makes revlog
write changes out with checkambig=True.
Even though changes to 00changelog.i themselves are written out at
changelog._finalize(), this checkambig=True is needed, because
revlog.checkinlinesize(), which is invoked at the end of
changelog._finalize(), might replace the already changed 00changelog.i
with a converted one.
Even after this patch, avoiding file stat ambiguity of 00changelog.i
around truncation is not yet complete, because the truncation side isn't
aware of this issue.
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
If the steps below occur within the same second, mtime, ctime and size
are all the same between (1) and (3).
1. append data to 00changelog.i (and close the transaction)
2. discard the appended data by truncation (strip or rollback)
3. append different data of the same size to 00changelog.i again
Therefore, cache validation doesn't work as expected after (3).
To avoid such file stat ambiguity around truncation, this patch
specifies checkambig=True for renaming or opening to write changes out
at finalization.
Even after this patch, avoiding file stat ambiguity of 00changelog.i
around truncation is not yet complete, because the truncation side isn't
aware of this issue.
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
If the steps below occur within the same second, mtime, ctime and size
are all the same between (1) and (3).
1. append data to 00manifest.i (and close the transaction)
2. discard the appended data by truncation (strip or rollback)
3. append different data of the same size to 00manifest.i again
Therefore, cache validation doesn't work as expected after (3).
To avoid such file stat ambiguity around truncation, this patch
specifies checkambig=True to revlog.__init__(). This makes revlog
write changes out with checkambig=True.
Even after this patch, avoiding file stat ambiguity of 00manifest.i
around truncation is not yet complete, because the truncation side isn't
aware of this issue.
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
This allows revlog-style files to be written out with checkambig=True
easily.
Because avoiding file stat ambiguity is needed only for the filecache-ed
manifest and changelog, this patch:
- uses False as the default value of checkambig
- focuses only on writing out changes to the index file
This patch also adds an optional checkambig argument to _divert/_delay
for changelog, to safely accept the checkambig specified in the revlog
layer. But this argument can be fully ignored, because:
- if name != target, changes are written into a file other than the index file
- otherwise, changes are never written into the index file
(they go into a pending file via _divert, or into an in-memory buffer via _delay)
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
In the Mercurial source tree, the cases below open a file in "a"/"a+" mode
without specifying atomictemp=True for vfs, which bypasses the file stat
ambiguity check done by atomictempfile.
- writing changes out in revlog layer uses "a+" mode
- truncation in repair.strip() uses "a" mode
- truncation in transaction._playback() uses "a" mode
If the steps below occur within the same second, mtime, ctime and size
are all the same between (1) and (3).
1. append data to a revlog-style file (and close the transaction)
2. discard the appended data by truncation (strip or rollback)
3. append different data of the same size to the revlog-style file again
Therefore, cache validation doesn't work as expected after (3).
This patch uses checkambigatclosing in the checkambig=True but
atomictemp=False case, to check (and get rid of) file stat ambiguity
at closing.
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
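Checking at close time can be sketched with a small wrapper. AmbigCheckingFile and _key are hypothetical names, and advancing mtime by one second is an assumption about how the ambiguity is resolved; this is not the checkambigatclosing implementation.

```python
import os
import tempfile


def _key(path):
    st = os.stat(path)
    return (st.st_size, int(st.st_ctime), int(st.st_mtime))


class AmbigCheckingFile:
    """Hypothetical wrapper that checks stat ambiguity when closed."""

    def __init__(self, path, mode):
        self._path = path
        self._old = _key(path) if os.path.exists(path) else None
        self._fh = open(path, mode)

    def write(self, data):
        return self._fh.write(data)

    def close(self):
        self._fh.close()
        if self._old is not None and _key(self._path) == self._old:
            # Indistinguishable from the previous state: advance mtime by 1s.
            advanced = (self._old[2] + 1) & 0x7FFFFFFF
            os.utime(self._path, (advanced, advanced))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "00changelog.i")
    with open(path, "wb") as fh:
        fh.write(b"aaaa")
    before = _key(path)
    with AmbigCheckingFile(path, "r+b") as fh:
        fh.write(b"bbbb")  # same size, different data
    after = _key(path)
```

Because the check runs in close(), using the wrapper in a "with" block guarantees it happens as soon as the write is finished.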