Commit Graph

98 Commits

Author SHA1 Message Date
Jun Wu
3d025d497a chg: do not start or use a chg server if the process was running with low priority
Summary:
If a chg server was started by a low priority process, it will inherit the low
priority and can be painfully slow.

On the other hand, if the chg client was low priority, the user or the automation
wants the process to be low priority intentionally. Therefore connecting to an
existing chg server might bypass intention.

Therefore, detect the low priority case, and refuse to start or connect to a
chg server.

Previous attempts to workaround slow chg server were removed as they are
probably no longer necessary.

Reviewed By: singhsrb

Differential Revision: D15223518

fbshipit-source-id: 718e807820d481eac204f2293f949c3b315f923d
2019-05-06 15:50:43 -07:00
Jun Wu
d24d57930d setup: rebuild libchg.a if versionhash has changed
Summary:
Otherwise `hg.rust` might use a stale libchg.a and cannot pass the
client/server version check when installed.

Reviewed By: sfilipco

Differential Revision: D14531275

fbshipit-source-id: 6d8f04a75d91ad2c25ef2002ee3c3f9848094578
2019-03-21 11:23:56 -07:00
Kostia Balytskyi
b2284a0ad1 chg: remove the ability to produce an actual binary
Summary: We don't run this binary anymore, no reason to build and ship it.

Reviewed By: quark-zju

Differential Revision: D14437317

fbshipit-source-id: dd6da521783f18a2a518a7aa042be98950894e89
2019-03-14 06:35:45 -07:00
Jun Wu
b74ee0564d chg: make it incompatible with upstream chg server
Summary:
It would be massy if there are 2 chg servers running: one for fb hg, one for
upstream hg, and they share a same socket path.

Change socket path and the commandserver name so fb-hg chg can only talk to
fb-hg chg servers.

Reviewed By: markbt

Differential Revision: D13869319

fbshipit-source-id: f9d42af9bdfc542207f23c536b478fd5ef8d02a0
2019-02-08 16:12:53 -08:00
Kostia Balytskyi
9bb35f3233 chg: tell original cmd to not restart chg
Summary:
When `chg` decides, that the command is unsupported, it executes the real hg
instead. Now that the real `hg` and the `chg` can be the same binary, we need
to make sure that `chg` is definitely not re-executed. Passing `CHGDISABLE`
achieves that.

Reviewed By: markbt

Differential Revision: D13118766

fbshipit-source-id: e41a2c230d17c4c26e590d648e4d0b722cf41ee5
2018-11-19 06:15:32 -08:00
Kostia Balytskyi
c5aa7a01b3 chg: learn to build a static lib
Summary:
In order to be able to run `chg` from the main `hg.rust` binary, we decided
that we will turn it into a static lib. This diff teaches our current build
scripts to do this.

Reviewed By: quark-zju

Differential Revision: D10853906

fbshipit-source-id: 8e0f37aa7e52d4a0610f36d7903eb0a318c193ad
2018-11-05 10:08:29 -08:00
Jun Wu
993e900c79 chg: check validate time instead of hello time for detecting slow servers
Summary:
There is an issue on OSX where the chg worker runs much slower. Its root cause
is still yet to be figured out (does OSX compresses background process memory
to make). It seems we hit it again according to https://fburl.com/co1eagws.

The previous workaround was to check "hello" time, and has a threshold to
restart the server. "hello" turned out to be a cheap operation and could be
inaccurate. From P59864987 it seems much more accurate to check the "validate"
operation (0.56s vs 0.026s). So this patch does that instead.

Note: we might need to revisit this if the Rust config parser lands, as that
will speed up the "validate" command.

Reviewed By: singhsrb

Differential Revision: D9019429

fbshipit-source-id: b2c02d08d20cc075d55e2d745d9394c90e543a1b
2018-07-26 15:21:35 -07:00
Jun Wu
108668759f chg: cleanup the new entry point
Summary:
The new entry point was added by D7840237.

With it, it is now pointless to pass `--config` arguments to the chg server.
So let's just remove the related logic, to avoid accidental profiling
related configs to the server (although the code path should ignore the
config flags).

Since we no longer use the old `hg serve` command code path, raise a
ProgrammingError explicitly to avoid surprises.

Reviewed By: singhsrb

Differential Revision: D8370965

fbshipit-source-id: 6a54cd54b41dc66c10f87e821ceb8e79adef09c7
2018-06-14 18:36:46 -07:00
Jun Wu
5d9fba271f chg: restart server automatically if handshake takes too long
Summary:
Usually the handshake process is pretty quick (<0.01 seconds):

  chg: debug: 0.000148 try connect to ...
  chg: debug: 0.000338 connected to server
  chg: debug: 0.000359 initialize context buffer with size 4096
  chg: debug: 0.008225 hello received: ...
  chg: debug: 0.008269 capflags=0x7b03, pid=31941
  chg: debug: 0.008282 request setprocname, block size 17
  chg: debug: 0.008316 request attachio
  chg: debug: 0.008978 response read from channel r, size 4
  chg: debug: 0.009045 request chdir, block size 45
  chg: debug: 0.009092 version matched (6119653365548183087)

However, we have seen some OSX cases where the handshake and basically
everything takes much longer:

  chg: debug: 0.000139 try connect to ...
  chg: debug: 0.000297 connected to server
  chg: debug: 0.000321 initialize context buffer with size 4096
  chg: debug: 0.192316 hello received: ...
  chg: debug: 0.192362 capflags=0x7b03, pid=55634
  chg: debug: 0.192373 request setprocname, block size 17
  chg: debug: 0.192420 request attachio
  chg: debug: 0.229009 response read from channel r, size 4
  chg: debug: 0.229072 request chdir, block size 34
  chg: debug: 0.229111 version matched (6119653365548183087)

(See P59677258 for the full paste)

If restart the chg server, the problem goes away and commands will be fast
again.

Unfortunately I'm not sure about the root cause of the problem. Maybe it's
Python's GC doing something very expensive? Maybe it's OSX thinking the server
process is "inactive" and put it to some state that's very slow to recover? Or maybe
it's some weird 3rdparty service?

For now, what we do know are:
- The slowness *sometimes* reproduces with chg.
- The slowness goes away if chg server is restarted.

As a last resort, detect the slowness by measuring the handshake time, then
restart the server accordingly. To avoid an infinite restart loop on slow machines,
the restart can only happen once.

The threshold is set to 0.05 seconds, which is roughly 5x the normal value, and
can be disabled by `CHGSTARTTIMECHECK=0`.

Reviewed By: phillco

Differential Revision: D8294468

fbshipit-source-id: 75246ea4d872045664e7feadb0acc47dfa1d8eae
2018-06-05 22:02:50 -07:00
Wez Furlong
31bcfbe58e hg: disable check-code tests for C code
Summary:
They're actively fighting against the clang-format config
and don't have an auto-fix.

Reviewed By: quark-zju

Differential Revision: D8283622

fbshipit-source-id: 2de45f50e6370a5ed14915c6ff23dc843ff14e8a
2018-06-05 19:21:43 -07:00
Wez Furlong
a09e44a469 pass down CHG path from above
Summary:
I've been troubleshooting eden integration test failures on my
devserver and traced it to some slightly off behavior in the telemetry
wrapper.

The wrapper was setting `CHGHG` to `hg.real` rather than the computed
path to the `hg.real` executable.  In the eden integration tests this
path is the buck generated `hg.par`.  The problem this caused was running
the installed hg.real rather than the one from the test environment
and this caused resolution of the eden extension to fail.

Once I fixed that up I found that chg had detected a problem with the
paths to the hg executable that were being used; we were picking up `chg`
from the system path and had a similar issue to above.

I introduced an environmental variable `CHG_BIN` to hold the desired path
and set it to the buck built `chg` binary.

In the process of this I found that `chg` was triggering a UBSAN issue
by passing a nullptr as the second argument to `memcpy`.  I've included
the trivial fix for that in this diff also.

Reviewed By: quark-zju

Differential Revision: D8274636

fbshipit-source-id: 7ee0740cbfb447ab41b9e08308767d42790ba296
2018-06-05 13:54:26 -07:00
Jun Wu
a6fcceea31 chg: make sure client and server have a matched version
Summary:
Generate a `u64` integer about the "version" at build time, and make chg
client check the version before connecting to the server.

This would ensure a chg client would only connect to a matched version of
the server.

- In setup.py, compute the "versionhash", write it as
  `mercurial.__version__.versionhash`.
- In dispatch.py, `mercurial.__version__` needs to be explicitly loaded
  before forking.
- In commandserver.py, send the versionhash to the client with the "hello"
  message.
- In chg.c, verify the versionhash. If it does not match, unlink the socket
  path and reconnect.

Reviewed By: farnz

Differential Revision: D7978131

fbshipit-source-id: 50acc923e72e40a4f66a96f01a194cf1a57fe832
2018-05-24 09:12:00 -07:00
Jun Wu
9532336199 chg: do not start servers with --traceback
Summary:
In some tests using chg, `--traceback` got accidentally enabled for
commands without `--traceback`. That is a side effect starting
the server using `--traceback`. Let's avoid passing `--traceback`
when starting the server.

Differential Revision: D6942974

fbshipit-source-id: d40697c8a9487ae53ffb5ae43da4f4582ca86545
2018-04-13 21:51:09 -07:00
Jun Wu
79e07a5585 chg: prefix (re|m)allocx with chg_
Summary:
This solves issues when the binary is linked with another C library that define
those functions.

Reviewed By: DurhamG

Differential Revision: D6888242

fbshipit-source-id: d714c7eb18bc4c281912df50567e7f176d64a669
2018-04-13 21:51:03 -07:00
Yuya Nishihara
9618d42102 chg: remove outdated rule to start test server
This rule is no longer useful because chg daemon may be killed and respawned
per config/environment hash. We can't reliably run a daemon in foreground.
2017-10-12 22:21:14 +09:00
muxator
292c1fdcc4 build: chg build was failing when the base directory contained spaces 2017-10-11 02:06:12 +02:00
Yuya Nishihara
bac3fece2c chg: just forward --time to command server
Since we've removed the use of atexit in ee4f321cd621, --time just works.
2017-10-07 22:07:10 +09:00
Jun Wu
a05655c958 chg: show timestamp with debug messages
Like `strace -tr`, this helps finding performance bottlenecks.

Differential Revision: https://phab.mercurial-scm.org/D807
2017-09-23 14:58:40 -07:00
Mathias De Maré
cafe7e372c chg: define _GNU_SOURCE to allow CentOS 5 compilation
Without this flag, compilation fails with:
 hgclient.c: In function 'hgc_open':
 hgclient.c:466: error: 'O_DIRECTORY' undeclared (first use in this function)
 hgclient.c:466: error: (Each undeclared identifier is reported only once
 hgclient.c:466: error: for each function it appears in.)

Differential Revision: https://phab.mercurial-scm.org/D260
2017-08-07 13:40:36 +02:00
Jun Wu
488ba17f87 chg: respect environment variables for pager
Previously chg runs the pager command without respecting its environment
variables being told to use. This patch makes it so.
2017-04-12 16:50:23 -07:00
Jun Wu
c81e982932 chg: always wait for pager
Previously, when runcommand raises, chg aborts with, and does not wait for
pager. The call stack is like:

  hgc_runcommand -> handleresponse -> readchannel -> debugmsg("failed to
  read channel") -> exit(255)

That means, chg returns to the shell, then both the pager and the shell will
read from the terminal at the same time, causing problems.

This patch fixes that by using "atexit" to register the pager cleanup
function so chg will always wait for pager even if runcommand raises.
2017-04-11 18:31:40 -07:00
Jun Wu
dc918444b5 chg: forward user-defined signals
SIGUSR1 and SIGUSR2 are reserved for user-defined behaviors. They may be
redefined by an hg extension [1], but cannot be easily redefined for chg.
Since the default behavior (kill) is not that useful for chg, let's forward
them to hg, hoping it got redefined there and could be more useful.

[1] https://bitbucket.org/facebook/hg-experimental/commits/e7c883a465
2017-03-08 13:46:26 -08:00
Jun Wu
ddc5fd5cc8 chg: document why we send SIGHUP and SIGINT to process group
This makes the code more consistent - other signals are documented.
2017-03-08 13:34:25 -08:00
Jun Wu
bb3aea397e chg: verify XDG_RUNTIME_DIR
According to the specification [1], $XDG_RUNTIME_DIR should be ignored
unless:

  The directory MUST be owned by the user, and he MUST be the only one
  having read and write access to it. Its Unix access mode MUST be 0700.

This patch adds a check and ignores it if it does not meet part of the
criteria.

[1]: https://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
2017-02-06 17:01:06 -08:00
Jun Wu
402691b892 chg: check snprintf result strictly
This makes the program more robust when somebody changes hgclient's
maxdatasize in the future.
2017-01-11 23:39:24 +08:00
Jun Wu
bb277b3ff3 chg: change server's process title
This patch uses the newly introduced "setprocname" interface to update the
process title server-side, to make it easier to tell what a worker is actually
doing.

The new title is "chg[worker/$PID]", where PID is the process ID of the
connected client. It can be directly observed using "ps -AF" under Linux, or
"ps -A" under FreeBSD.
2017-01-11 07:40:52 +08:00
Jun Wu
b61b02a865 chg: remove getpager support
We have enough bits to switch to the new chg pager code path in runcommand.
So just remove the legacy getpager support.

This is a red-only patch, and will break chg's pager support temporarily.
2017-01-10 06:59:39 +08:00
Jun Wu
5ae59a4110 chg: handle pager request client-side
This patch implements the simple S-channel pager handling at chg
client-side.

Note: It does not deal with environ and cwd currently for simplicity, which
will be fixed later.
2017-01-10 06:59:03 +08:00
Jun Wu
923ee6957d chg: check type read from S channel
The previous patch added the check server-side. This patch added it
client-side.
2017-01-06 16:14:52 +00:00
Jun Wu
734e02b02d chg: send type information via S channel (BC)
Previously S channel is only used to send system commands. It will also be
used to send pager commands. So add a type parameter.

This breaks older chg clients. But chg and hg should always come from a
single commit and be packed into a single package. Supporting running
inconsistent versions of chg and hg seems to be unnecessarily complicated
with little benefit. So just make the change and assume people won't use
inconsistent chg with hg.
2017-01-06 16:11:03 +00:00
Jun Wu
dfb8c98044 chg: add procutil.h
This patch adds a formal header procutil.h for procutil.c, and changes
Makefile to build procutil.c independently.
2017-01-02 14:57:14 +00:00
Jun Wu
8e6abc8b48 chg: let procutil maintain its own pagerpid
Previously, chg.c maintains the pagerpid. Let's move it to procutil.c.

Note: chg.c still have a pagerpid to decide whether to call attachio or not.
In the future, attachio may be moved from hgc_open to hgc_runcommand, and
hgc_runcommand handles both pager and attachio so we don't need to run
attachio twice. And chg.c will be free of pagerpid.
2017-01-02 14:43:37 +00:00
Jun Wu
4321520c94 chg: decouple hgclient from setuppager
procutil should not depend on hgclient. This patch makes the pager handling
part independent from hgclient.
2017-01-02 14:10:32 +00:00
Jun Wu
efbbb12a58 chg: decouple hgclient from setupsignalhandler
procutil should not depend on hgclient. This patch makes the signal handling
part independent from hgclient.
2017-01-02 14:04:35 +00:00
Jun Wu
fc44f74541 chg: move signal and pager handling to a separate file
In the future hgclient will deal with pager directly inside runcommand, so
related signal handling stuff needs to be decoupled from chg.c.

The signal handling and pager logic are coupled because we need to forward
SIGPIPE when pager exits. So they are moved together, otherwise a global
variable (pagerpid) is inevitable.

This patch moves related functions from chg.c to procutil.c, which was
marked as copied to maintain annotate history.

The move is done without code modification for easy review, therefore
`#include "procutil.c"` was introduced temporarily.
2017-01-02 14:02:47 +00:00
Jun Wu
2419c6679a chg: respect XDG_RUNTIME_DIR
$XDG_RUNTIME_DIR [1] is a better place for user daemons. Let's use it and
fallback to $TMPDIR.

After this patch, chg will try socket paths in the following order:

  1. $CHGSOCKNAME
  2. $XDG_RUNTIME_DIR/chg/server
  3. ${TMPDIR:-tmp}/chg$UID/server

[1]: https://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
2016-12-26 00:02:42 +00:00
Jun Wu
4548a63b44 chg: make "get default sockdir" a separate method
The logic to get a default socket directory will become longer in the next
patch. So let's move it out.
2016-12-25 23:49:54 +00:00
Jun Wu
f437cb2c85 chg: handle connect failure before errno gets overridden
This patch moves the error handling logic up so that errno after connect
won't be overridden.
2016-12-25 23:32:11 +00:00
Jun Wu
d23c117db4 chg: support long socket path
This patch replaces UNIX_PATH_MAX (108) with PATH_MAX (4096) so we can have
long unix path.
2016-12-23 16:26:40 +00:00
Jun Wu
c5374c4a97 chg: remove sockdirfd
See the previous patch for the reason.
2016-12-23 16:16:44 +00:00
Jun Wu
8499ade54b chg: let hgc_open support long path
"sizeof(sun_path)" is too small. Use the chdir trick to support long socket
path, like "mercurial.util.bindunixsocket".

It's useful for cases where TMPDIR is long. Modern OS X rewrites TMPDIR to a
long value. And we probably want to use XDG_RUNTIME_DIR [2] for Linux.

The approach is a bit different from the previous plan, where we will have
hgc_openat and pass cmdserveropts.sockdirfd to it. That's because the
current change is easier: chg has to pass a full path to "hg" as the
"--address" parameter. There is no "--address-basename" or "--address-dirfd"
flags. The next patch will remove "sockdirfd".

Note: It'd be nice if we can use a native "connectat" implementation.
However, that's not available everywhere. Some platform (namely FreeBSD)
does support it, but the implementation has bugs so it cannot be used [2].

[1]: https://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
[2]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-April/082892.html
2016-12-23 16:37:00 +00:00
Jun Wu
e912a360fc chg: remove locks
See the previous two patches for the reason. The advantage is a simplified
code base and better throughput when starting multiple servers with multiple
confighashes. The disadvantage is starting multiple servers in parallel with
a single confighash will waste some CPU time, which is probably fine in
common use-cases.

This makes it easier to switch to relative paths to support long unix domain
socket paths.
2016-12-19 22:15:00 +00:00
Jun Wu
24cb48448f chg: start server at a unique address
See the previous patch for motivation. Previously, the server is started at
a globally shared address. This patch appends pid to the address so it
becomes unique.

Note: with Linux pid namespace, the address may be non-unique, but it does
not affect correctness of chg - chg client will receive an redirection and
that's it.
2016-12-19 22:09:49 +00:00
Yuya Nishihara
30fe4722fb chgserver: make it a core module and drop extension flags
It was an extension just because there were several dependency cycles I
needed to address.

I don't add 'chgserver' to extensions._builtin since chgserver is considered
an internal extension so nobody should enable it by their config.
2016-10-15 14:30:16 +09:00
Yuya Nishihara
6c0a0f348a chg: just take it as EOF if recv() returns 0
hgc->sockfd is a blocking stream socket. recv() should never return 0 other
than EOF.

See 4ff8628991ea for the original problem.
2016-08-05 21:21:33 +09:00
Jun Wu
997125affe chg: forward SIGINT, SIGHUP to process group
These signals are meant to send to a process group, instead of a single
process: SIGINT is usually emitted by the terminal and sent to the process
group. SIGHUP usually happens to a process group if termination of a process
causes that process group to become orphaned.

Before this patch, chg will only forward these signals to the single server
process. This patch changes it to the server process group.

This will allow us to properly kill processes started by the forked server
process, like a ssh process. The behavior difference can be observed by
setting SSH_ASKPASS to a dummy script doing "sleep 100" and then run
"chg push ssh://dest-need-password-auth". Before this patch, the first Ctrl+C
will kill the hg process while ssh-askpass and ssh will remain alive. This
patch will make sure they are killed properly.
2016-07-17 22:55:47 +01:00
Jun Wu
ecf32d3bf2 chg: handle EOF reading data block
We recently discovered a case in production that chg uses 100% CPU and is
trying to read data forever:

  recvfrom(4, "", 1814012019, 0, NULL, NULL) = 0

Using gdb, apparently readchannel() got wrong data. It was reading in an
infinite loop because rsize == 0 does not exit the loop, while the server
process had ended.

  (gdb) bt
  #0 ... in recv () at /lib64/libc.so.6
  #1 ... in readchannel (...) at /usr/include/bits/socket2.h:45
  #2 ... in readchannel (hgc=...) at hgclient.c:129
  #3 ... in handleresponse (hgc=...) at hgclient.c:255
  #4 ... in hgc_runcommand (hgc=..., args=<optimized>, argsize=<optimized>)
  #5 ... in main (argc=...486922636, argv=..., envp=...) at chg.c:661
  (gdb) frame 2
  (gdb) p *hgc
  $1 = {sockfd = 4, pid = 381152, ctx = {ch = 108 'l',
        data = 0x7fb05164f010 "st):\nTraceback (most recent call last):\n"
        "Traceback (most recent call last):\ne", maxdatasize = 1814065152,"
        " datasize = 1814064225}, capflags = 16131}

This patch addresses the infinite loop issue by detecting continuously empty
responses and abort in that case.

Note that datasize can be translated to ['l', ' ', 'l', 'a']. Concatenate
datasize and data, it forms part of "Traceback (most recent call last):".

This may indicate a server-side channeledoutput issue. If it is a race
condition, we may want to use flock to protect the channels.
2016-07-18 18:55:06 +01:00
Jun Wu
2106a129bc chg: add pgid to hgclient struct
The previous patch makes the server tell the client its pgid. This patch
stores it in hgclient_t and adds a function to get it.
2016-07-17 23:05:59 +01:00
Yuya Nishihara
5bae87a105 chg: silence warning of unused parameter 'sig' 2016-06-28 22:39:06 +09:00
Jun Wu
dfebfdbf56 chg: send SIGPIPE to server immediately when pager exits (issue5278)
If the user press 'q' to leave the 'less' pager, it is expected to end the
hg process immediately. We currently rely on SIGPIPE for this behavior. But
SIGPIPE won't arrive if we don't write anything (like doing heavy
computation, reading from network etc). If that happens, the user will feel
that the hg process just hangs.

The patch address the issue by adding a SIGCHLD signal handler and sends
SIGPIPE to the server as soon as the pager exits.

This is also an issue with hg's pager implementation.
2016-06-24 15:21:10 +01:00