Commit Graph

30292 Commits

Author SHA1 Message Date
Mads Kiilerich
3f0e3f18d8 bdiff: adjust criteria for getting optimal longest match in the A side middle
We prefer matches closer to the middle to balance recursion, as introduced in
d3deb406b55b.

For ranges with uneven length, matches starting exactly in the middle should
have preference. That will be optimal for matches of length 1. We will thus
accept equality in the half check.

For ranges with even length, half was ceil'ed when calculated but we got the
preference for low matches from the 'less than half' check. To get the same
result as before when we also accept equality, floor it. Without that,
test-annotate.t would show some different (still correct but less optimal)
results.

This will change the heuristics. Tests shows a slightly different output - and
sometimes slightly smaller bundles.

The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from
22804885 to 22803824 bytes - an 0.005% reduction.
2016-11-08 18:37:33 +01:00
Mads Kiilerich
9d1edc2d4f tests: explore some bdiff cases 2016-11-08 18:37:33 +01:00
Mads Kiilerich
f19d3ccfaf tests: make test-bdiff.py easier to maintain
Add more stdout logging to help navigate the .out file.
2016-11-15 21:56:49 +01:00
Gregory Szorc
1c6b8908b4 perf: unbust perfbdiff --alldata
This broke in c7236da49964 due to a refactored manifest API.

The fix is a bit hacky - perfbdiff doesn't yet support tree manifests
for example. But it gets the job done.

A test has been added for --alldata so this doesn't happen again.
2016-11-17 08:52:52 -08:00
Yuya Nishihara
71e47c621a worker: discard waited pid by anyone who noticed it first
This makes sure all waited pids are removed before calling killworkers()
even if waitpid()-pids.discard() sequence is interrupted by another SIGCHLD.
2016-11-17 20:57:09 +09:00
Yuya Nishihara
bb91ef72fa worker: kill workers after all zombie processes are reaped
Since we now wait child processes in non-blocking way (changed by 6c7588a50638
and 13c3aefdee29), we don't have to kill them in the middle of the waitpid()
loop. This change will help solving a possible race of waitpid()-pids.discard()
sequence and another SIGCHLD.

waitforworkers() is called by cleanup(), in which case we do killworkers()
beforehand so we can remove killworkers() from waitforworkers().
2016-11-17 21:08:58 +09:00
Yuya Nishihara
524706bec6 worker: make sure killworkers() never be interrupted by another SIGCHLD
killworkers() iterates over pids, which can be updated by SIGCHLD handler.
So we should either copy pids or prevent killworkers() from being interrupted
by SIGCHLD. I chose the latter as it is simpler and can make pids handling
more consistent.

This fixes a possible "set changed size during iteration" error at
killworkers() before cleanup().
2016-11-17 20:44:05 +09:00
Yuya Nishihara
4d599ed630 worker: fix missed break on successful waitpid()
Follow-up for 5414fcc0ba19.
2016-11-17 21:43:01 +09:00
Augie Fackler
0058294764 filterpyflakes: dramatically simplify the entire thing by blacklisting
We've only got one kind of pyflakes failure left in our codebase, so
it's time to switch over to a blacklist-based checking scheme. I've
left in the filtering of two undefined names for now out of paranoia,
but those can probably go too.
2016-11-10 16:49:42 -05:00
Augie Fackler
edf8ac03be run-tests: forward Python USER_BASE from site (issue5425)
We do this so that any linters installed via pip install --user don't
break. See https://docs.python.org/2/library/site.html#site.USER_BASE
for a description of what this nonsense is all about.

An alternative would be to not set HOME, but that'll cause other
problems (see issue2707), or to forward every single path entry from
sys.path in PYTHONPATH (which seems sketchy in its own way).
2016-11-10 16:07:24 -05:00
Jun Wu
96be8ae269 util: improve iterfile so it chooses code path wisely
We have performance concerns on "iterfile" as it is 4X slower on normal
files. While modern systems have the nice property that reading a "fast"
(on-disk) file cannot be interrupted and should be made use of.

This patch dumps the related knowledge in comments. And "iterfile" chooses
code paths wisely:

  1. If it's CPython 3, or PyPY, use the fast path.
  2. If fp is a normal file, use the fast path.
  3. If fp is not a normal file and CPython version >= 2.7.4, use the same
     workaround (4x slower) as before.
  4. If fp is not a normal file and CPython version < 2.7.4, use another
     workaround (2x slower but may block longer then necessary) which
     basically re-invents the buffer + readline logic in Python.

This will give us good confidence on both correctness and performance
dealing with EINTR in iterfile(fp) for all known supported Python versions.
2016-11-15 20:25:51 +00:00
Augie Fackler
73f7abdd33 merge with stable 2016-11-16 23:29:28 -05:00
FUJIWARA Katsunori
367ebf8ba3 scmutil: ignore EPERM at os.utime, which avoids ambiguity at closing
According to POSIX specification, just having group write access to a
file causes EPERM at invocation of os.utime() with an explicit time
information (e.g. working on the repository shared by group access
permission).

To ignore EPERM at closing file object in such case, this patch makes
checkambigatclosing._checkambig() use filestat.avoidambig() introduced
by previous patch.

Some functions below imply this code path at truncation of an existing
(= might be owned by another user) file.

  - strip() in repair.py, introduced by 4d0a08431b6f
  - _playback() in transaction.py, introduced by 48fe04792102

This is a variant of issue5418.
2016-11-13 06:12:22 +09:00
FUJIWARA Katsunori
11742ce806 vfs: ignore EPERM at os.utime, which avoids ambiguity at renaming (issue5418)
According to POSIX specification, just having group write access to a
file causes EPERM at invocation of os.utime() with an explicit time
information (e.g. working on the repository shared by group access
permission).

To ignore EPERM at renaming in such case, this patch makes
vfs.rename() use filestat.avoidambig() introduced by previous patch.
2016-11-13 06:11:56 +09:00
FUJIWARA Katsunori
64644e300c util: add utility function to skip avoiding file stat ambiguity if EPERM
Now, advancing stat.st_mtime by os.utime() is used to avoid file stat
ambiguity. But according to POSIX specification, utime(2) with an
explicit time information is permitted only for a process with:

  - the effective user ID equal to the user ID of the file, or
  - appropriate privileges

  http://pubs.opengroup.org/onlinepubs/9699919799/functions/utime.html

Therefore, just having group write access to a file causes EPERM at
applying os.utime() on it (e.g. working on the repository shared by
group access permission).

This patch adds class filestat utility function avoidamgig() to avoid
file stat ambiguity but skip it if EPERM.

It is reasonable to always ignore EPERM, because utime(2) causes EPERM
only in the case described above (EACCES is used only for utime(2)
with NULL).
2016-11-13 06:06:23 +09:00
Jun Wu
ea73f2efd0 worker: stop using a separate thread waiting for children
Now that we have a SIGCHLD hander, and it could get executed when waiting
for I/O. It's no longer necessary to have a separated waitpid thread. So
just remove it.
2016-11-12 03:06:07 +00:00
Jun Wu
483697646a worker: add a SIGCHLD handler to collect worker immediately
As planned by previous patches, add a SIGCHLD handler to get notifications
about worker exits, and deals with worker failure immediately.

Note that the SIGCHLD handler gets unregistered before killworkers(), so
SIGCHLD won't interrupt "killworkers" - making it harder to send kill
signals to waited processes.
2016-11-12 03:07:22 +00:00
Jun Wu
5b9ad89016 worker: make waitforworkers reentrant
We are going to use it in the SIGCHLD handler. The handler will be executed
in the main thread with the non-blocking version of waitpid, while the
waitforworkers thread runs the blocking version. It's possible that one of
them collects a worker and makes the other error out (no child to wait).
This patch handles these errors: ECHILD is ignored. EINTR needs a retry.

The "pids" set is designed to be only modifiable by "waitforworkers". And we
only remove items after a successful waitpid. Since a child process can only
be "waitpid"-ed once. It's guaranteed that "pids.remove(p)" won't be called
with duplicated "p"s. And once a "p" is removed from "pids", that "p" does
not need to be killed or waited any more.
2016-11-15 02:12:16 +00:00
Jun Wu
c6f4ebbf7e worker: change "pids" to a set
There is no need to keep any order of the "pids" array. A set is more
efficient for the "remove" operation. And the following patch will use that.
2016-11-15 02:10:40 +00:00
Gregory Szorc
085fa86140 hgweb: cache fctx.parents() in annotate command (issue5414)
43e3fb1c484e introduced a call to fctx.parents() for each line in
annotate output. This function call isn't cheap, as it requires
linkrev adjustment.

Since multiple lines in annotate output tend to belong to the same
file revision, a cache of fctx.parents() lookups for each input
should be effective in the common case. So we implement one.

Since the cache has to precompute parents so an aborted generator
doesn't leave an incomplete cache, we could just return a list.
However, we preserve the generator for backwards compatibility.

The effect of this change when requesting /annotate/96ca0ecdcfa/
browser/locales/en-US/chrome/browser/downloads/downloads.dtd on
the mozilla-aurora repo is significant:

p1(43e3fb1c484e)  5.5s
43e3fb1c484e:    66.3s
this patch:      10.8s

We're still slower than before. But only by ~2x instead of ~12x.

On the tip revisions of layout/base/nsCSSFrameConstructor.cpp file in
the mozilla-unified repo, time went from 12.5s to 14.5s and back to
12.5s. I'm not sure why the mozilla-aurora repo is so slow.

Looking at the code of basefilectx.parents(), there is room for
further improvements. Notably, we still perform redundant calls to
filelog.renamed() and basefilectx._parentfilectx(). And
basefilectx.annotate() also makes similar calls, so there is potential
for object reuse. However, introducing caches here are not appropriate
for the stable branch.
2016-11-05 09:38:07 -07:00
Augie Fackler
95a87ffb9b Added signature for changeset 9506ee30a64d 2016-11-01 14:12:39 -04:00
Kevin Bullock
e51456912a merge with i18n 2016-11-01 13:03:42 -05:00
Nathan Goldbaum
cd41ee4190 tag: clarify warning about making a tag on a branch head
Currently the warning is ambiguous about whether the new tag (possibly specified
via --rev) is being added on a branch head or whether the working directory is
based on a branch head. Clarify the error message to eliminate this ambiguity.
2016-10-31 17:12:32 -05:00
FUJIWARA Katsunori
33379270bf contrib: check reference to old selenic.com domain
Now, all URL in Mercurial source tree should refer mercurial-scm.org
domain instead of selenic.com.

*.po files are ignored in this patch, because they might contain
msgid/msgstr coming from old source files.

This ignorance seems safe enough, because such msgstr should be
ignored at runtime, because:

  - msgid corresponded to it should be invalid, or
  - msgstr itself should be marked as fuzzy at synchronized to recent hg.pot

If any additional examination for *.po files is needed in the future,
let i18n/check-translation.py achieve such examination.

BTW, some binary files (e.g. *.png) are meaningless for checking
reference to old domain in this patch, but aren't ignored like as *.po
files, because excluding multiple suffixes is difficult for regexp
matching.
2016-11-01 20:39:37 +09:00
FUJIWARA Katsunori
9616956afd check-code: discard filtering result of previous check for independence
Before this patch, check-code.py applies filtering on the file
content, to which filtering of previous check is already applied.

This might hide issues, which should be detected by a subsequent check
in "checks" list.

Fortunately, this problem hasn't appeared, because there is no
overlapping of filename matching (examined in the order below).

  1. *.py or *.cgi
  2. test-* (not *.t suffix)
  3. *.c or *.h
  4. *.t
  5. *.txt
  6. *.tmpl

For example, adding a test, which wants to examine raw comment text in
*.py files, at the end of current "checks" list doesn't work as
expected, because a filter for *.py files normalizes comment text in
them.

Putting such test at the beginning of "checks" list also resolves this
problem, but such dependence on the order decreases maintainability of
check-code.py itself.

This patch discards filtering result of previous check at the
beginning of each checks, for independence of each checks.
2016-11-01 20:39:36 +09:00
FUJIWARA Katsunori
38ad72f729 help: replace selenic.com by mercurial-scm.org in man pages
Source code repository and mailing list services have been already
migrated to mercurial-scm.org domain.
2016-11-01 20:39:36 +09:00
FUJIWARA Katsunori
15640c5749 help: replace selenic.com by mercurial-scm.org in command examples
Source code repository service of Mercurial itself has been already
migrated to mercurial-scm.org domain.
2016-11-01 20:39:35 +09:00
Yuya Nishihara
26d053eede hghave: fix 'rmcwd' to ensure temporary directory is removed
On platforms where cwd can't be removed, it should try rmdir() after chdir
to the original cwd.
2016-11-01 21:14:33 +09:00
FUJIWARA Katsunori
01dbfe3b9d i18n-ja: synchronized with 7b9e11755707 2016-11-01 04:27:41 +09:00
Mads Kiilerich
40ab99f130 httppeer: make __del__ access to self.urlopener more safe
Some errors could in some cases show unfortunate scary and confusing warnings
from the httppeer delstructors:

  abort: nodename nor servname provided, or not known
  Exception AttributeError: "'httpspeer' object has no attribute 'urlopener'" in <bound method httpspeer.__del__ of <mercurial.httppeer.httpspeer object at 0x106e1f5d0>> ignored```

To mute that, take 8bdb0bb8e209 to the next level and use getattr in __del__.
2016-10-31 13:43:48 +01:00
FUJIWARA Katsunori
f6a54b0fb3 tests: test preserving execbit changes at amending only on execbit platform 2016-10-30 06:15:09 +09:00
FUJIWARA Katsunori
2e91a346cc tests: put temporary file outside the working directory for test portability
test-largefiles-update.t creates temporary file exec-bit.patch inside
the working directory for no-execbit platform specific test, but
subsequent tests aren't aware of it.

On execbit platform, subsequent tests can run successfully, because
exec-bit.patch isn't created.

But on no-execbit platform, this temporary file makes subsequent tests
show "? exec-bit.patch" at each "hg status".
2016-10-30 06:15:09 +09:00
FUJIWARA Katsunori
395fcead6f tests: avoid quoting of commit messages for test portability
journal extension uses util.shellquote() to record command line, but
result of it depends on runtime platform: double quotation is used on
Windows and OpenVMS, but single quotation is used otherwise.

test-journal-share.t sometimes specifies commit messages including
white space on command line. It makes journal output depend on runtime
platform, but commit message itself isn't important in this test case.
2016-10-30 06:15:09 +09:00
FUJIWARA Katsunori
8b10b9dd7e tests: use basic format code "%Y" instead of "%s" for test portability
On Windows, strftime() doesn't support format code "%s", and it causes
"invalid format string" error.

    https://msdn.microsoft.com/en-us/library/fe06s4ak.aspx

test-command-template.t examines not seconds value in UTC, but
arithmetic calculation. Therefore, using format code "%Y" instead of
"%s" should be reasonable.

FYI:

- Python standard library reference doesn't list "%s" up in format
  code list required for "C standard (1989 version)", even though it
  also mentions that additional format codes are required for "C
  standard (1999 version)"

  https://docs.python.org/2.7/library/datetime.html#strftime-and-strptime-behavior

- The Open Group Base Specifications Issue 7 (IEEE Std 1003.1-2008,
  2016 Edition) doesn't require strftime to support format code "%s"

  http://pubs.opengroup.org/onlinepubs/9699919799/functions/strftime.html

- "man strftime" of (Open/Oracle) Solaris and Mac OS X (= UNIX
  certified OSs) describes about format code "%s"
2016-10-30 06:15:07 +09:00
FUJIWARA Katsunori
feea36218b tests: add test-commit-interactive-curses.t "require tic" for test portability
Standard library of Python on Windows doesn't have curses module.
2016-10-29 03:08:08 +09:00
FUJIWARA Katsunori
85658908df tests: use "?" to glob both ":" and ";" in output for test portability
If environment variable looks like PATH or so (e.g. any of components
joined by ":" contains "/"), ":" in it is replaced with ";" by MinGW
at spawning Windows native process, to follow path concatenation style
of Windows.

Therefore, "bundle:../full.hg" is converted into "bundle;..\full.hg"
on MinGW.

Difference between "/" and "\" is automatically ignored by "(glob)",
but difference between ":" and ";" should be globed explicitly.
2016-10-29 03:04:54 +09:00
FUJIWARA Katsunori
2568974470 tests: invoke printenv.py via sh -c for test portability
On Windows platform, invoking printenv.py directly via hook is
problematic, because:

  - unless binding between *.py suffix and python runtime, application
    selector dialog is displayed, and running test is blocked at each
    printenv.py invocations

  - it isn't safe to assume binding between *.py suffix and python
    runtime, because application binding is easily broken

    For example, installing IDE (VisualStudio with Python Tools, or
    so) often requires binding between source files and IDE itself.

This patch invokes printenv.py via sh -c for test portability. This is
a kind of follow up for 9e4331825bea, which eliminated explicit
"python" for printenv.py. There are already other 'sh -c "printenv.py"'
in *.t files, and this fix should be reasonable.

This changes were confirmed in cases below:

  - without any application binding for *.py suffix
  - with binding between *.py suffix and VisualStudio

This patch also replaces "echo + redirection" style with "heredoc"
style, because:

  - hook command line is parsed by cmd.exe as shell at first, and
  - single quotation can't quote arguments on cmd.exe, therefore,
  - "printenv.py foobar" should be quoted by double quotation, but
  - nested quoting (or tricky escaping) isn't readable
2016-10-29 02:44:45 +09:00
Mads Kiilerich
4409f61ab2 largefiles: handle that a found standin file doesn't exist when removing it
I somehow ended up in a situation where hg crashed on an unlink I introduced in
8fd3fc1ef4c6.

I don't know how it happened and can't reproduce it. It seems like it only can
happen when the file is removed between the time of check in a working
directory context walk that finds a standin file, and the time of use when we
try to remove it because the corresponding largefile doesn't exist.

But better safe than sorry: replace the plain unlink with unlinkpath with
ignoremissing=True. That will also remove remaining empty directories, which
arguably is more correct.
2016-10-27 20:06:33 +02:00
Gábor Stefanik
5533b05a12 merge: avoid superfluous filemerges when grafting through renames (issue5407)
This is a fix for a regression introduced by the patches for issue4028.

The test changes are due to us doing fewer _checkcopies searches now, which
makes some test outputs revert to the pre-issue4028 behavior. That issue itself
remains fixed, we only skip copy tracing for files where it isn't relevant.
As a nice side effect, this makes copy detection much faster when tracing
backwards through lots of renames.
2016-10-25 21:01:53 +02:00
Yuya Nishihara
01ff276025 templater: use unfiltered changelog to calculate shortest() at constant time
cl._partialmatch() can be pretty slow if hidden revisions are involved. This
patch cancels the slowdown introduced by the previous patch by using an
unfiltered changelog, which means shortest(node) isn't always the shortest.

The result isn't perfect, but seems okay as long as shortest(node) is short
enough to type and can be used as an identifier.

  (with hidden revisions)
  % hg log -R hg-committed -r0:20000 -T '{node|shortest}\n' --time > /dev/null
  (.^^) time: real 1.530 secs (user 1.480+0.000 sys 0.040+0.000)
  (.^)  time: real 43.080 secs (user 43.060+0.000 sys 0.030+0.000)
  (.)   time: real 1.680 secs (user 1.650+0.000 sys 0.020+0.000)
2016-10-25 21:49:30 +09:00
Yuya Nishihara
35fcce9afc templater: do not use index.partialmatch() directly to calculate shortest()
cl.index.partialmatch() isn't a drop-in replacement for cl._partialmatch().
It has no knowledge about hidden revisions, and it raises ValueError if a node
shorter than 4 chars is given. Instead, use index.partialmatch() through
cl._partialmatch(), which has no such problems and gives the identical result
with/without --pure.

The test output was sampled with --pure without this patch, which shows the
most correct result. However, we'll need to switch to using an unfiltered
changelog because _partialmatch() of a filtered changelog can be an order of
magnitude slower.

  (with hidden revisions)
  % hg log -R hg-committed -r0:20000 -T '{node|shortest}\n' --time > /dev/null
  (.^)  time: real 1.530 secs (user 1.480+0.000 sys 0.040+0.000)
  (.)   time: real 43.080 secs (user 43.060+0.000 sys 0.030+0.000)
2016-10-23 14:05:23 +09:00
Yuya Nishihara
85c5af29fa tests: run "cwd was removed" test only if cwd can actually be removed
On some platforms, cwd can't be removed. In which case, util.unlinkpath()
continues with no error since the failure of directory removal isn't critical.
So it doesn't make sense to run the test added by 6395630fdfdc on those
platforms. OTOH, we need to run the test in test-rebase-scenario-global.t
since the repository is referenced after that.
2016-10-26 22:50:06 +09:00
Gábor Stefanik
e9b2eb13b5 sslutil: guard against broken certifi installations (issue5406)
Certifi is currently incompatible with py2exe; the Python code for certifi gets
included in library.zip, but not the cacert.pem file - and even if it were
included, SSLContext can't load a cacert.pem file from library.zip.
This currently makes it impossible to build a standalone Windows version of
Mercurial.

Guard against this, and possibly other situations where a module with the name
"certifi" exists, but is not usable.
2016-10-19 18:06:14 +02:00
Mads Kiilerich
b4b748a9ed revset: don't cache abstractsmartset min/max invocations infinitely
There was a "leak", apparently introduced in b37a67b41690. When running:

    hg = hglib.open('repo')
    while True:
        hg.log("max(branch('default'))")

all filteredset instances from branch() would be cached indefinitely by the
@util.cachefunc annotation on the max() implementation.

util.cachefunc seems dangerous as method decorator and is barely used elsewhere
in the code base. Instead, just open code caching by having the min/max
methods replace themselves with a plain lambda returning the result.
2016-10-25 18:56:27 +02:00
Kevin Bullock
77b67e498a merge with i18n 2016-10-24 09:14:34 -05:00
Wagner Bruna
9782d338af i18n-pt_BR: synchronized with e12e65f5ea1f 2016-10-22 23:18:43 -02:00
Simon Farnsworth
06082f0f98 tests: fix test-casefolding.t
The message had changed, but the test was not updated. This test does not
run on Linux, but failed on my Mac.
2016-10-21 16:31:16 +01:00
Gregory Szorc
3f32afbd84 commands: print security protocol support in debuginstall
Over the past week I've had to instruct multiple people to run
Python code to query the ssl module to see what TLS protocol support
is present. I think it would be useful for `hg debuginstall` to print
this info to make it easier to access and debug why Mercurial is
complaining about using an insecure TLS 1.0 protocol.

Ideally we'd also print the path to the CA cert bundle. But the APIs
for querying that in sslutil can emit warnings, making it slightly
more difficult to integrate into `hg debuginstall`. That work will
have to wait for another day.
2016-10-19 15:07:11 -07:00
Durham Goode
9fcac302ea manifest: make treemanifestctx store the repo
Same as in the last commit, the old treemanifestctx stored a reference to the
revlog.  If the inmemory revlog became invalid, the ctx now held an old copy and
would be incorrect. To fix this, we need the ctx to go through the manifestlog
for each access.

This is the same pattern that changectx already uses (it stores the repo, and
accesses commit data through self._repo.changelog).
2016-10-18 17:44:42 -07:00
Durham Goode
46fbc1bfc1 manifest: make manifestctx store the repo
The old manifestctx stored a reference to the revlog. If the inmemory revlog
became invalid, the ctx now held an old copy and would be incorrect. To fix
this, we need the ctx to go through the manifestlog for each access.

This is the same pattern that changectx already uses (it stores the repo, and
accesses commit data through self._repo.changelog).
2016-10-18 17:44:26 -07:00