Summary:
Update the chg code to correctly honor the `pager.stderr` setting, and avoid
piping stderr to the pager when it is disabled.
Since only the server-side code knows the hg config values, the server-side
chgserver.py code passes this config setting back to the client when sending
the pager request.
Reviewed By: quark-zju
Differential Revision: D17109106
fbshipit-source-id: 6b69b1a7de9f61db51af7b0ba00d65fa5053a795
Summary:
This refactors the code that sends system and pager requests from the
chg server to the client.
Previously these were both sent using the `S` channel code. The `S` channel
code was slightly unusual and had special handling: in general the code
assumed that upper-case channel codes did not include any request body data,
but this wasn't true for the `S` channel data. I changed this to lower-case
in order to eliminate this special case handling, and I also split up the
system and pager data into different channel codes, since they have fairly
different behavior. System requests are now sent with an `s` channel code,
and pager requests are sent with a `p` channel code.
I also changed the code to require that the server always adds a terminating
`\0` byte after each environment variable value. Previously the client code
was responsible for adding a nul terminator on the last string, which could
potentially require the client to copy the data into a larger buffer in order
to do so.
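A minimal sketch of the terminator convention described above, in Python for brevity. The function names and the `NAME=value` entry layout are assumptions made for illustration; the real wire layout in chgserver.py may differ.

```python
def encode_environ(env):
    """Server side (hypothetical helper): terminate EVERY entry with NUL."""
    return b"".join(k + b"=" + v + b"\x00" for k, v in sorted(env.items()))


def decode_environ(payload):
    """Client side (hypothetical helper): split on NUL.

    Because the last entry is already terminated, the client never has to
    copy the payload into a larger buffer just to append a terminator.
    """
    if payload and not payload.endswith(b"\x00"):
        raise ValueError("payload is not NUL-terminated")
    # The final element after the last NUL is always empty; drop it.
    return payload.split(b"\x00")[:-1]
```

With this convention an empty payload decodes to an empty list, and a payload missing the final terminator is rejected rather than silently patched up.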
I also made a minor change to the client-side `readchannel()` code so that it
can read the channel type and body data length with a single system call
instead of making 2 separate `recv()` calls.
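The single-read idea can be sketched as follows, in Python rather than the client's C. It assumes the command server header layout of a 1-byte channel code followed by a 4-byte big-endian length; `read_channel` is a hypothetical name, and the rule that only lower-case channels carry a body follows the description above.

```python
import socket
import struct

# 1-byte channel code + 4-byte big-endian length = 5 bytes, no padding.
HEADER = struct.Struct(">cI")


def read_channel(sock):
    """Read one message: the header in a single recv(), then the body."""
    header = sock.recv(HEADER.size, socket.MSG_WAITALL)
    if len(header) != HEADER.size:
        raise ConnectionError("failed to read channel")
    channel, length = HEADER.unpack(header)
    # Upper-case channels carry no body; the length is only a size hint.
    if channel.isupper() or length == 0:
        return channel, length, b""
    body = sock.recv(length, socket.MSG_WAITALL)
    if len(body) != length:
        raise ConnectionError("failed to read channel")
    return channel, length, body
```

Reading the fixed-size header in one call saves a system call per message compared with reading the channel byte and the length separately.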
The main benefit of these changes is that they will let the server pass some
additional configuration information with pager requests. The change to make
use of this new field will come in a subsequent diff.
Reviewed By: quark-zju
Differential Revision: D17216291
fbshipit-source-id: c3044cf3d5f5e103f0b62d083e4ef3764160f20e
Summary:
After the D7840236 stack, it's possible to have a single chg server that handles
different [extensions] configurations. The 'validate' step and 'hashstate' were
mainly designed to detect changes to [extensions] and to the source code of the
extensions. That becomes unnecessary with the latest design.
Remove them to simplify the logic.
chg no longer creates symlink `server2 -> server2-hash`. Bump the name
to `server3` to explicitly break compatibility.
Reviewed By: xavierd
Differential Revision: D16866463
fbshipit-source-id: 5e1d00e6f895d9b8ead0bcabefcea11756f57c94
Summary:
Usually the handshake process is pretty quick (<0.01 seconds):
chg: debug: 0.000148 try connect to ...
chg: debug: 0.000338 connected to server
chg: debug: 0.000359 initialize context buffer with size 4096
chg: debug: 0.008225 hello received: ...
chg: debug: 0.008269 capflags=0x7b03, pid=31941
chg: debug: 0.008282 request setprocname, block size 17
chg: debug: 0.008316 request attachio
chg: debug: 0.008978 response read from channel r, size 4
chg: debug: 0.009045 request chdir, block size 45
chg: debug: 0.009092 version matched (6119653365548183087)
However, we have seen some OSX cases where the handshake and basically
everything takes much longer:
chg: debug: 0.000139 try connect to ...
chg: debug: 0.000297 connected to server
chg: debug: 0.000321 initialize context buffer with size 4096
chg: debug: 0.192316 hello received: ...
chg: debug: 0.192362 capflags=0x7b03, pid=55634
chg: debug: 0.192373 request setprocname, block size 17
chg: debug: 0.192420 request attachio
chg: debug: 0.229009 response read from channel r, size 4
chg: debug: 0.229072 request chdir, block size 34
chg: debug: 0.229111 version matched (6119653365548183087)
(See P59677258 for the full paste)
If the chg server is restarted, the problem goes away and commands are fast
again.
Unfortunately I'm not sure about the root cause of the problem. Maybe it's
Python's GC doing something very expensive? Maybe it's OSX thinking the server
process is "inactive" and putting it into some state that's very slow to recover
from? Or maybe it's some weird 3rdparty service?
For now, what we do know is:
- The slowness *sometimes* reproduces with chg.
- The slowness goes away if the chg server is restarted.
As a last resort, detect the slowness by measuring the handshake time, then
restart the server accordingly. To avoid an infinite restart loop on slow machines,
the restart can only happen once.
The threshold is set to 0.05 seconds, which is roughly 5x the normal value. The
check can be disabled with `CHGSTARTTIMECHECK=0`.
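The restart-once logic can be sketched like this, in Python for brevity. The `connect` and `restart` callables and the function name are hypothetical stand-ins for the real chg client code; only the 0.05 second threshold and the `CHGSTARTTIMECHECK=0` switch come from the description above.

```python
import os
import time

THRESHOLD = 0.05  # seconds; roughly 5x the normal handshake time


def connect_with_time_check(connect, restart, environ=os.environ,
                            clock=time.monotonic):
    """Connect, restarting the server at most once if the handshake is slow."""
    check = environ.get("CHGSTARTTIMECHECK") != "0"
    restarted = False
    while True:
        start = clock()
        conn = connect()  # performs the handshake (hypothetical callable)
        elapsed = clock() - start
        if check and not restarted and elapsed > THRESHOLD:
            # A slow handshake suggests a degraded server. Restart it once;
            # a second slow handshake is accepted to avoid a restart loop.
            restart()
            restarted = True
            continue
        return conn
```

Tracking `restarted` is what keeps a genuinely slow machine from restarting its server forever.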
Reviewed By: phillco
Differential Revision: D8294468
fbshipit-source-id: 75246ea4d872045664e7feadb0acc47dfa1d8eae
Summary:
They're actively fighting against the clang-format config
and don't have an auto-fix.
Reviewed By: quark-zju
Differential Revision: D8283622
fbshipit-source-id: 2de45f50e6370a5ed14915c6ff23dc843ff14e8a
Summary:
Generate a `u64` integer about the "version" at build time, and make chg
client check the version before connecting to the server.
This would ensure a chg client would only connect to a matched version of
the server.
- In setup.py, compute the "versionhash", write it as
`mercurial.__version__.versionhash`.
- In dispatch.py, `mercurial.__version__` needs to be explicitly loaded
before forking.
- In commandserver.py, send the versionhash to the client with the "hello"
message.
- In chg.c, verify the versionhash. If it does not match, unlink the socket
path and reconnect.
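The steps above can be sketched as follows. How setup.py actually derives the `u64` is not stated here, so the SHA-1 truncation below is purely an assumption, and `version_hash` and `check_server` are hypothetical names used for illustration.

```python
import hashlib
import os
import struct


def version_hash(version):
    """Map a version string to a u64 (ASSUMED scheme: first 8 SHA-1 bytes)."""
    digest = hashlib.sha1(version.encode("utf-8")).digest()
    return struct.unpack(">Q", digest[:8])[0]


def check_server(client_hash, hello_hash, sockpath, unlink=os.unlink):
    """Return True if the server matches; otherwise unlink its socket.

    On a mismatch the caller is expected to reconnect, which starts a
    fresh server at the now-vacant socket path.
    """
    if hello_hash == client_hash:
        return True
    try:
        unlink(sockpath)
    except FileNotFoundError:
        pass  # another client may already have removed it
    return False
```

Any stable 64-bit digest of the build's version information would serve the same purpose; the point is that the client and server compare a value fixed at build time.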
Reviewed By: farnz
Differential Revision: D7978131
fbshipit-source-id: 50acc923e72e40a4f66a96f01a194cf1a57fe832
Summary:
This solves issues when the binary is linked with another C library that defines
those functions.
Reviewed By: DurhamG
Differential Revision: D6888242
fbshipit-source-id: d714c7eb18bc4c281912df50567e7f176d64a669
This patch uses the newly introduced "setprocname" interface to update the
process title server-side, to make it easier to tell what a worker is actually
doing.
The new title is "chg[worker/$PID]", where PID is the process ID of the
connected client. It can be directly observed using "ps -AF" under Linux, or
"ps -A" under FreeBSD.
We have enough bits to switch to the new chg pager code path in runcommand.
So just remove the legacy getpager support.
This is a red-only patch, and will break chg's pager support temporarily.
This patch implements the simple S-channel pager handling at chg
client-side.
Note: It does not deal with environ and cwd currently for simplicity, which
will be fixed later.
Previously the S channel was only used to send system commands. It will also be
used to send pager commands, so add a type parameter.
This breaks older chg clients. But chg and hg should always come from a
single commit and be packed into a single package. Supporting running
inconsistent versions of chg and hg seems to be unnecessarily complicated
with little benefit. So just make the change and assume people won't use
inconsistent chg with hg.
"sizeof(sun_path)" is too small. Use the chdir trick to support long socket
path, like "mercurial.util.bindunixsocket".
It's useful for cases where TMPDIR is long. Modern OS X rewrites TMPDIR to a
long value. And we probably want to use XDG_RUNTIME_DIR [1] for Linux.
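A minimal sketch of the chdir trick, in Python and modeled on the approach the text attributes to "mercurial.util.bindunixsocket". It shows the bind side; the same idea applies to connecting. The function name is hypothetical and the details are an illustration, not the actual chg code.

```python
import os
import socket


def bind_unix_socket(sock, path):
    """Bind to a path that may exceed sizeof(sun_path).

    Only the basename is passed to bind(), after temporarily changing into
    the socket's directory, so the path length limit applies to the
    basename alone.
    """
    dirname, basename = os.path.split(path)
    saved = None
    if dirname:
        saved = os.open(".", os.O_RDONLY)  # remember the current directory
    try:
        if dirname:
            os.chdir(dirname)
        sock.bind(basename)
    finally:
        if saved is not None:
            os.fchdir(saved)  # always return to where we started
            os.close(saved)
```

Note that changing the working directory affects the whole process, so this is only safe where no other thread depends on the current directory at the same time.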
The approach is a bit different from the previous plan, where we would have
hgc_openat and pass cmdserveropts.sockdirfd to it. That's because the
current change is easier: chg has to pass a full path to "hg" as the
"--address" parameter. There are no "--address-basename" or "--address-dirfd"
flags. The next patch will remove "sockdirfd".
Note: It'd be nice if we could use a native "connectat" implementation.
However, that's not available everywhere. Some platforms (namely FreeBSD)
do support it, but the implementation has bugs so it cannot be used [2].
[1]: https://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
[2]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-April/082892.html
We recently discovered a case in production where chg uses 100% CPU and
tries to read data forever:
recvfrom(4, "", 1814012019, 0, NULL, NULL) = 0
Using gdb, apparently readchannel() got wrong data. It was reading in an
infinite loop because rsize == 0 does not exit the loop, while the server
process had ended.
(gdb) bt
#0 ... in recv () at /lib64/libc.so.6
#1 ... in readchannel (...) at /usr/include/bits/socket2.h:45
#2 ... in readchannel (hgc=...) at hgclient.c:129
#3 ... in handleresponse (hgc=...) at hgclient.c:255
#4 ... in hgc_runcommand (hgc=..., args=<optimized>, argsize=<optimized>)
#5 ... in main (argc=...486922636, argv=..., envp=...) at chg.c:661
(gdb) frame 2
(gdb) p *hgc
$1 = {sockfd = 4, pid = 381152, ctx = {ch = 108 'l',
data = 0x7fb05164f010 "st):\nTraceback (most recent call last):\n"
"Traceback (most recent call last):\ne", maxdatasize = 1814065152,
datasize = 1814064225}, capflags = 16131}
This patch addresses the infinite loop issue by detecting continuously empty
responses and aborting in that case.
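The failure mode and the fix can be sketched as a read loop in Python. This is a simplification under a stated assumption: it aborts on the first empty read, since an empty result from a stream socket means the peer has closed, whereas the patch itself speaks of detecting continuously empty responses. `read_exact` is a hypothetical name.

```python
import socket


def read_exact(sock, size):
    """Read exactly `size` bytes, aborting if the peer has gone away."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # Without this check the loop would spin forever at 100% CPU,
            # because recv() keeps returning an empty result after EOF.
            raise ConnectionError("failed to read channel")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The check matters most when the requested size is garbage, as in the trace above: a bogus length of about 1.8 GB can never be satisfied once the server process has ended.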
Note that datasize can be translated to ['l', ' ', 'l', 'a']. Concatenating
datasize and data forms part of "Traceback (most recent call last):".
This may indicate a server-side channeledoutput issue. If it is a race
condition, we may want to use flock to protect the channels.
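The translation of datasize can be checked with a few lines of Python, assuming the 4-byte length field is big-endian as in the command server header. The helper name is hypothetical; the numeric values are taken from the gdb output above.

```python
import struct


def header_bytes(ch, datasize):
    """Rebuild the 5 raw header bytes from the parsed channel and length."""
    return bytes([ch]) + struct.pack(">I", datasize)


# Values from the gdb dump: ch = 108 'l', datasize = 1814064225
raw = header_bytes(108, 1814064225)
print(raw)  # b'll la'
```

Placed between "ca" and the start of data ("st):"), these five bytes spell out "call last):", which supports the reading that traceback text was parsed as a channel header.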
Before this patch, the "connect to" debug message was printed repeatedly because
a previous patch changed how the chg client decides the server is ready to be
connected.
This patch revises the places we print connect debug messages so they are less
repetitive without losing useful information.
If the server has an uncaught exception, it will exit without being able to
write the channel information. In this case, the client is likely to complain
about "failed to read channel", which looks inconsistent with original hg.
This patch silences the error message and makes uncaught exception behavior
more like original hg. It will help chg to pass test-fileset.t.
In some rare cases (next patch), we may want validate to do "unlink" without
forcing the client to reconnect. This patch adds a new "reconnect" instruction
and makes "unlink" not reconnect by default.
See the previous patch for details. Since the socket will be closed by the
server, handleresponse() will never return:
Traceback (most recent call last):
...
chg: abort: failed to read channel