sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-09 08:18:15 +03:00

Author	SHA1	Message	Date
David Soria Parra	3e94bf58b6	worker: flush ui buffers before running the worker a91c6275 introduces flushing ui buffers after a worker finished. If the ui was not flushed before the worker was started, fork will copy the existing buffers to the worker. This causes messages issued before the worker started to be written to the terminal for each worker. We are now flushing the ui before we start a worker and add an appropriate test which will fail before this patch.	2017-03-28 10:21:38 -07:00
Martin von Zweigbergk	ef59d7b1fd	merge with stable	2017-02-28 11:13:25 -08:00
Yuya Nishihara	ee998576d8	worker: flush messages written by child processes before exit I found some child outputs were lost while testing the previous patch. Since os._exit() does nothing special, we need to do that explicitly.	2017-02-25 12:48:50 +09:00
FUJIWARA Katsunori	47ba9fae77	worker: ignore meaningless exit status indication returned by os.waitpid() Before this patch, worker implementation assumes that os.waitpid() with os.WNOHANG returns '(0, 0)' for still running child process. This is explicitly specified as below in Python API document. os.WNOHANG The option for waitpid() to return immediately if no child process status is available immediately. The function returns (0, 0) in this case. On the other hand, POSIX specification doesn't define the "stat_loc" value returned by waitpid() with WNOHANG for such child process. http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitpid.html CPython implementation for os.waitpid() on POSIX doesn't take any care of this gap, and this may cause unexpected "exit status indication" even on POSIX conformance platform. For example, os.waitpid() with os.WNOHANG returns non-zero "exit status indication" on FreeBSD. This implies os.kill() with own pid or sys.exit() with non-zero exit code, even if no child process fails. To ignore meaningless exit status indication returned by os.waitpid(), this patch skips subsequent steps forcibly, if os.waitpid() returns 0 as pid. This patch also arranges examination of 'p' value for readability. FYI, there are some issues below about this behavior reported for CPython. https://bugs.python.org/issue21791 https://bugs.python.org/issue27808	2017-02-25 01:07:52 +09:00
Pulkit Goyal	07314d0686	py3: convert the mode argument of os.fdopen to unicodes (1 of 2) os.fdopen() does not accept bytes as its second argument, which represents the mode in which the file is to be opened. This patch makes sure unicodes are passed in py3 by using pycompat.sysstr().	2017-02-13 20:06:38 +05:30
Pulkit Goyal	4780c32e4c	py3: replace os.name with pycompat.osname (part 1 of 2) os.name returns unicodes on py3 and we have pycompat.osname which returns bytes. This series of 2 patches will change every occurrence of os.name with pycompat.osname.	2016-12-19 00:16:52 +05:30
Pulkit Goyal	1a4248666b	py3: replace os.environ with encoding.environ (part 2 of 5)	2016-12-18 01:46:39 +05:30
Jun Wu	cdd5ade3da	worker: use os._exit for posix worker in all cases Like commandserver, the worker should never run other resource cleanup logic. Previously this was not true for workers if they had exceptions other than KeyboardInterrupt. This actually caused a real-world deadlock with remotefilelog: 1. remotefilelog/fileserverclient creates a sshpeer. pipei/o/e get created. 2. worker inherits that sshpeer's pipei/o/e. 3. worker runs sshpeer.cleanup (only happens without os._exit) 4. worker closes pipeo/i, which will normally make the sshpeer read EOF from its stdin and exit. But the master process still has pipeo, so no EOF. 5. worker reads pipee (stderr of sshpeer), which never completes because the ssh process does not exit, does not close its stderr. 6. master waits for all workers, which never completes because they never complete sshpeer.cleanup. This could also be addressed by closing these fds after fork, which is not easy because Python 2.x does not have an official "afterfork" hook. Hacking os.fork is also ugly. Besides, sshpeer is probably not the only troublemaker. The patch changes _posixworker so all its code paths will use os._exit to avoid running unwanted resource clean-ups.	2016-11-24 01:15:34 +00:00
Yuya Nishihara	71e47c621a	worker: discard waited pid by anyone who noticed it first This makes sure all waited pids are removed before calling killworkers() even if waitpid()-pids.discard() sequence is interrupted by another SIGCHLD.	2016-11-17 20:57:09 +09:00
Yuya Nishihara	bb91ef72fa	worker: kill workers after all zombie processes are reaped Since we now wait child processes in non-blocking way (changed by 6c7588a50638 and 13c3aefdee29), we don't have to kill them in the middle of the waitpid() loop. This change will help solving a possible race of waitpid()-pids.discard() sequence and another SIGCHLD. waitforworkers() is called by cleanup(), in which case we do killworkers() beforehand so we can remove killworkers() from waitforworkers().	2016-11-17 21:08:58 +09:00
Yuya Nishihara	524706bec6	worker: make sure killworkers() never be interrupted by another SIGCHLD killworkers() iterates over pids, which can be updated by SIGCHLD handler. So we should either copy pids or prevent killworkers() from being interrupted by SIGCHLD. I chose the latter as it is simpler and can make pids handling more consistent. This fixes a possible "set changed size during iteration" error at killworkers() before cleanup().	2016-11-17 20:44:05 +09:00
Yuya Nishihara	4d599ed630	worker: fix missed break on successful waitpid() Follow-up for 5414fcc0ba19.	2016-11-17 21:43:01 +09:00
Jun Wu	ea73f2efd0	worker: stop using a separate thread waiting for children Now that we have a SIGCHLD handler, which can get executed while waiting for I/O, it's no longer necessary to have a separate waitpid thread. So just remove it.	2016-11-12 03:06:07 +00:00
Jun Wu	483697646a	worker: add a SIGCHLD handler to collect worker immediately As planned by previous patches, add a SIGCHLD handler to get notifications about worker exits, and deal with worker failure immediately. Note that the SIGCHLD handler gets unregistered before killworkers(), so SIGCHLD won't interrupt "killworkers" - making it harder to send kill signals to waited processes.	2016-11-12 03:07:22 +00:00
Jun Wu	5b9ad89016	worker: make waitforworkers reentrant We are going to use it in the SIGCHLD handler. The handler will be executed in the main thread with the non-blocking version of waitpid, while the waitforworkers thread runs the blocking version. It's possible that one of them collects a worker and makes the other error out (no child to wait). This patch handles these errors: ECHILD is ignored. EINTR needs a retry. The "pids" set is designed to be only modifiable by "waitforworkers". And we only remove items after a successful waitpid. Since a child process can only be "waitpid"-ed once, it's guaranteed that "pids.remove(p)" won't be called with duplicated "p"s. And once a "p" is removed from "pids", that "p" does not need to be killed or waited any more.	2016-11-15 02:12:16 +00:00
Jun Wu	c6f4ebbf7e	worker: change "pids" to a set There is no need to keep any order of the "pids" array. A set is more efficient for the "remove" operation. And the following patch will use that.	2016-11-15 02:10:40 +00:00
Jun Wu	30695d34e0	worker: allow waitforworkers to be non-blocking This patch adds a boolean flag to waitforworkers and makes it non-blocking if set to True. This makes it possible to reap our workers while keeping other unrelated children untouched, after receiving SIGCHLD.	2016-07-28 20:57:07 +01:00
Jun Wu	300350ed69	worker: wait worker pid explicitly Before this patch, waitforworkers uses os.wait() to collect child workers, and only wait len(pids) processes. This can have serious issues if other code spawns new processes and does not reap them: 1. worker.py may get wrong exit code and kill innocent workers. 2. worker.py may continue without waiting for all workers to complete. This patch fixes the issue by using waitpid to wait worker pid explicitly. However, this patch introduces a new issue: worker failure may not be handled immediately. The issue will be addressed in next patches.	2016-07-28 20:51:20 +01:00
Jun Wu	d66ec54835	worker: move killworkers and waitforworkers up We need to use them in the SIGCHLD handler and SIGCHLD handler should be installed before fork.	2016-07-28 20:49:57 +01:00
Jun Wu	2893e63c0a	worker: migrate to util.iterfile	2016-11-14 23:12:11 +00:00
Gregory Szorc	bfe71a12f3	worker: document poor partitioning scheme impact mpm isn't a fan of the existing or previous partitioning scheme. He provided a fantastic justification for why on the mailing list. This patch adds his words to the code so they aren't forgotten.	2016-02-27 21:43:17 -08:00
Gregory Szorc	4a0cbf9fe4	worker: change partition strategy to every Nth element The only consumer of the worker pool code today is `hg update`. Previously, the algorithm to partition work to each worker process preserved input list ordering. We'd take the first N elements, then the next N elements, etc. Measurements on mozilla-central demonstrate this isn't an optimal partitioning strategy. I added debug code to print when workers were exiting. When performing a working copy update on a previously empty working copy of mozilla-central, I noticed that process lifetimes were all over the map. One worker would complete after 7s. Many would complete after 12s. And another worker would often take >16s. This behavior occurred for many worker process counts and was more pronounced on some than others. What I suspect is happening is some workers end up with lots of small files and others with large files. This is because the update code passes in actions according to sorted filenames. And directories tend to accumulate similar files. For example, test directories often consist of many small test files and media directories contain binary (often larger) media files. This patch changes the partitioning algorithm to select every Nth element from the input list. Each worker thus has a similar composition of files to operate on. The result of this change is that worker processes now all tend to exit around the same time. The possibility of a long pole due to being unlucky and receiving all the large files has been mitigated. Overall execution time seems to drop, but not by a statistically significant amount on mozilla-central. However, repositories with directories containing many large files will likely show a drop. There shouldn't be any regressions due to partial manifest decoding because the update code already iterates the manifest to determine what files to operate on, so the manifest should already be decoded.	2016-02-20 15:56:44 -08:00
Pierre-Yves David	30913031d4	error: get Abort from 'error' instead of 'util' The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be confused about that and gives all the credit to 'util' instead of the hardworking 'error'. In a spirit of equity, we break the cycle of injustice and give back to 'error' the respect it deserves. And screw that 'util' poser. For great justice.	2015-10-08 12:55:45 -07:00
Gregory Szorc	5880e9b671	worker: restore old countcpus code (issue4869) This is a backout of 0f1a7b0ccb69. The stdlib implementation of multiprocessing.cpu_count() attempts to invoke a process on BSD and Darwin platforms (at least on 2.7). Under certain conditions (such as cwd being removed) this could raise. Our old code was silently catching the exception. The old code was more robust, so restore it.	2015-10-08 10:57:03 -07:00
Gregory Szorc	8c2d6fb389	worker: use multiprocessing to find cpu count The multiprocessing package was added in Python 2.6. The implementation of worker.countcpus() is very similar to multiprocessing.cpu_count(). Ditch our one-off code. multiprocessing does result in a number of imports. However, the lazy importer ensures that we don't import anything until cpu_count() is called. Furthermore, if we are doing something with multiple cores, chances are the time of that operation will dwarf the import time, so module bloat isn't a concern here.	2015-05-25 13:10:38 -07:00
Gregory Szorc	999425dca1	worker: use absolute_import	2015-08-08 18:44:41 -07:00
Gregory Szorc	5380dea2a7	global: mass rewrite to use modern exception syntax Python 2.6 introduced the "except type as instance" syntax, replacing the "except type, instance" syntax that came before. Python 3 dropped support for the latter syntax. Since we no longer support Python 2.4 or 2.5, we have no need to continue supporting the "except type, instance". This patch mass rewrites the exception syntax to be Python 2.6+ and Python 3 compatible. This patch was produced by running `2to3 -f except -w -n .`.	2015-06-23 22:20:08 -07:00
Mads Kiilerich	23da6c1d98	cleanup: avoid _ for local unused tmp variables - that is reserved for i18n _ is usually used for i18n markup but we also used it for I-don't-care variables. Instead, name don't-care variables in a slightly descriptive way but use the _ prefix to designate unused variable. This will mute some pyflakes "import '_' ... shadowed by loop variable" warnings.	2014-08-15 16:20:47 +02:00
Augie Fackler	9f876f6c89	cleanup: move stdlib imports to their own import statement There are a few warnings still produced by my import checker, but those are false positives produced by modules that share a name with stdlib modules.	2013-11-06 16:48:06 -05:00
Matt Mackall	75eb7e7c7e	worker: properly report errors from worker processes (issue3982)	2013-07-16 15:18:12 -05:00
Matt Mackall	e4e0d4a087	worker: check problem state correctly (issue3982) If a large update triggered an abort, it was possible for the main thread to still update the dirstate. This fix is incomplete, as the failing worker now doesn't generate a proper error message. This is difficult in the fork-based framework, which relies on exceptions propagating to the top of the dispatcher for formatting.	2013-07-16 11:53:53 -05:00
Bryan O'Sullivan	7fb4e0bf12	worker: add missing import of errno Found using Cython.	2013-04-12 17:16:37 -07:00
Bryan O'Sullivan	c538c00399	worker: catch all exceptions, try to exit usefully/safely	2013-04-11 13:30:31 -07:00
Bryan O'Sullivan	0aa6f05307	worker: handle worker failures more aggressively We now wait for worker processes in a separate thread, so that we can spot failures in a timely way, without waiting for the progress pipe to drain. If a worker fails, we recover the pre-parallel-update behaviour of failing early by killing its peers before propagating the failure.	2013-02-20 11:31:34 -08:00
Bryan O'Sullivan	0da2636c99	worker: fix a race in SIGINT handling This is almost impossible to trigger due to the tiny time window involved.	2013-02-20 11:31:31 -08:00
Bryan O'Sullivan	9f53401ffd	worker: on error, exit similarly to the first failing worker Previously, if a worker failed, we exited with status 1. We now exit with the correct exit code (killing ourselves if necessary).	2013-02-20 11:31:27 -08:00
Bryan O'Sullivan	5d849a878a	worker: allow a function to be run in multiple worker processes If we estimate that it will be worth the cost, we run the function in multiple processes. Otherwise, we run it in-process. Children report progress to the parent through a pipe. Not yet implemented on Windows.	2013-02-09 15:51:32 -08:00
Bryan O'Sullivan	8ef1da44b3	worker: partition a list (of tasks) into equal-sized chunks	2013-02-09 15:51:32 -08:00
Bryan O'Sullivan	998baaf4d8	worker: estimate whether it's worth running a task in parallel Not implemented for Windows yet.	2013-02-09 15:51:26 -08:00
Bryan O'Sullivan	46bf2f5a6f	worker: count the number of CPUs This works on the major platforms, and falls back to a safe guess of 1 elsewhere.	2013-02-09 15:22:12 -08:00

40 Commits
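
The partitioning change described in 4a0cbf9fe4 (contiguous runs replaced by every Nth element) can be illustrated with a small sketch. This is not the code from worker.py: the function names `partition_contiguous` and `partition_strided` and the sample file list are hypothetical, chosen only to show why a sorted input spreads more evenly under the strided scheme.

```python
def partition_contiguous(items, nslices):
    """Old scheme as described in the log: consecutive runs of the input."""
    if not items:
        return
    size = -(-len(items) // nslices)  # ceiling division
    for start in range(0, len(items), size):
        yield items[start:start + size]


def partition_strided(items, nslices):
    """New scheme as described in the log: slice i takes every Nth element."""
    for i in range(nslices):
        yield items[i::nslices]


# Hypothetical sorted file list: large media files cluster together.
files = ["media/a.bin", "media/b.bin", "media/c.bin",
         "tests/t1.py", "tests/t2.py", "tests/t3.py"]

for chunk in partition_contiguous(files, 3):
    print("contiguous:", chunk)
for chunk in partition_strided(files, 3):
    print("strided:   ", chunk)
```

With the contiguous scheme, one hypothetical worker receives two of the three media files; with the strided scheme, each worker receives one media file and one test file, which matches the commit's observation that workers then tend to finish at about the same time.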