Commit Graph

20 Commits

Author SHA1 Message Date
Gregory Szorc
bfe71a12f3 worker: document poor partitioning scheme impact
mpm isn't a fan of the existing or previous partitioning scheme. He
provided a fantastic justification for why on the mailing list.

This patch adds his words to the code so they aren't forgotten.
2016-02-27 21:43:17 -08:00
Gregory Szorc
4a0cbf9fe4 worker: change partition strategy to every Nth element
The only consumer of the worker pool code today is `hg update`.

Previously, the algorithm to partition work to each worker process
preserved input list ordering. We'd take the first N elements, then
the next N elements, etc. Measurements on mozilla-central demonstrate
this isn't an optimal partitioning strategy.

I added debug code to print when workers were exiting. When performing
a working copy update on a previously empty working copy of
mozilla-central, I noticed that process lifetimes were all over the
map. One worker would complete after 7s. Many would complete after
12s. And another worker would often take >16s. This behavior occurred
for many worker process counts and was more pronounced on some than
others.

What I suspect is happening is some workers end up with lots of
small files and others with large files. This is because the update
code passes in actions according to sorted filenames. And directories
tend to accumulate similar files. For example, test directories
often consist of many small test files and media directories contain
binary (often larger) media files.

This patch changes the partitioning algorithm to select every Nth
element from the input list. Each worker thus has a similar composition
of files to operate on.
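
As a rough illustration of the two strategies described above, the sketch below compares a contiguous split with an every-Nth-element split. The function names and the sample file list are assumptions made for this example and are not taken from the patch itself.

```python
def partition_contiguous(items, nslices):
    """Split items into nslices consecutive runs of near-equal length."""
    chunk, extra = divmod(len(items), nslices)
    start = 0
    for i in range(nslices):
        # The first `extra` slices each take one leftover element.
        end = start + chunk + (1 if i < extra else 0)
        yield items[start:end]
        start = end


def partition_strided(items, nslices):
    """Give slice i every nslices-th element, starting at index i."""
    for i in range(nslices):
        yield items[i::nslices]


# Hypothetical sorted file list: large media files sort before small tests.
files = [
    "media/a.png", "media/b.png",
    "tests/t1.js", "tests/t2.js", "tests/t3.js", "tests/t4.js",
]

contiguous = list(partition_contiguous(files, 2))
strided = list(partition_strided(files, 2))
```

With the contiguous split, the first worker receives both media files. With the strided split, each worker receives one media file and two test files.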

The result of this change is that worker processes now all tend to exit
around the same time. The possibility of a long pole due to being
unlucky and receiving all the large files has been mitigated. Overall
execution time seems to drop, but not by a statistically significant
amount on mozilla-central. However, repositories with directories
containing many large files will likely show a drop.

There shouldn't be any regressions due to partial manifest decoding
because the update code already iterates the manifest to determine
what files to operate on, so the manifest should already be decoded.
2016-02-20 15:56:44 -08:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error', not 'util'. However, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Gregory Szorc
5880e9b671 worker: restore old countcpus code (issue4869)
This is a backout of 0f1a7b0ccb69. The stdlib implementation of
multiprocessing.cpu_count() attempts to invoke a process on BSD and
Darwin platforms (at least on 2.7). Under certain conditions (such as
cwd being removed) this could raise. Our old code was silently catching
the exception.

The old code was more robust, so restore it.
2015-10-08 10:57:03 -07:00
Gregory Szorc
8c2d6fb389 worker: use multiprocessing to find cpu count
The multiprocessing package was added in Python 2.6.
The implementation of worker.countcpus() is very similar to
multiprocessing.cpu_count(). Ditch our one-off code.

multiprocessing does result in a number of imports. However,
the lazy importer ensures that we don't import anything until
cpu_count() is called. Furthermore, if we are doing something
with multiple cores, chances are the time of that operation
will dwarf the import time, so module bloat isn't a concern
here.
2015-05-25 13:10:38 -07:00
Gregory Szorc
999425dca1 worker: use absolute_import
2015-08-08 18:44:41 -07:00
Gregory Szorc
5380dea2a7 global: mass rewrite to use modern exception syntax
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to keep supporting the "except type, instance" syntax.

This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
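
A small sketch of the syntax in question. The function name is made up for illustration. The old form appears only in a comment because it is a syntax error on Python 3.

```python
def parse_port(text):
    """Return (port, error_message); exactly one of the two is None."""
    try:
        return int(text), None
    # Old form, accepted only by Python 2:
    #     except ValueError, err:
    # Form accepted by Python 2.6+ and Python 3:
    except ValueError as err:
        return None, str(err)


port, error = parse_port("8080")
bad_port, bad_error = parse_port("not-a-port")
```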

This patch was produced by running `2to3 -f except -w -n .`.
2015-06-23 22:20:08 -07:00
Mads Kiilerich
23da6c1d98 cleanup: avoid _ for local unused tmp variables - that is reserved for i18n
_ is usually used for i18n markup but we also used it for I-don't-care
variables.

Instead, name don't-care variables in a slightly descriptive way but use the _
prefix to mark them as unused.

This will mute some pyflakes "import '_' ... shadowed by loop variable"
warnings.
2014-08-15 16:20:47 +02:00
Augie Fackler
9f876f6c89 cleanup: move stdlib imports to their own import statement
There are a few warnings still produced by my import checker, but
those are false positives produced by modules that share a name with
stdlib modules.
2013-11-06 16:48:06 -05:00
Matt Mackall
75eb7e7c7e worker: properly report errors from worker processes (issue3982)
2013-07-16 15:18:12 -05:00
Matt Mackall
e4e0d4a087 worker: check problem state correctly (issue3982)
If a large update triggered an abort, it was possible for the main
thread to still update the dirstate.

This fix is incomplete, as the failing worker now doesn't generate a
proper error message. This is difficult in the fork-based framework,
which relies on exceptions propagating to the top of the dispatcher
for formatting.
2013-07-16 11:53:53 -05:00
Bryan O'Sullivan
7fb4e0bf12 worker: add missing import of errno
Found using Cython.
2013-04-12 17:16:37 -07:00
Bryan O'Sullivan
c538c00399 worker: catch all exceptions, try to exit usefully/safely
2013-04-11 13:30:31 -07:00
Bryan O'Sullivan
0aa6f05307 worker: handle worker failures more aggressively
We now wait for worker processes in a separate thread, so that we can
spot failures in a timely way, without waiting for the progress pipe
to drain.

If a worker fails, we recover the pre-parallel-update behaviour of
failing early by killing its peers before propagating the failure.
2013-02-20 11:31:34 -08:00
Bryan O'Sullivan
0da2636c99 worker: fix a race in SIGINT handling
This is almost impossible to trigger due to the tiny time window involved.
2013-02-20 11:31:31 -08:00
Bryan O'Sullivan
9f53401ffd worker: on error, exit similarly to the first failing worker
Previously, if a worker failed, we exited with status 1. We now exit
with the correct exit code (killing ourselves if necessary).
2013-02-20 11:31:27 -08:00
Bryan O'Sullivan
5d849a878a worker: allow a function to be run in multiple worker processes
If we estimate that it will be worth the cost, we run the function in
multiple processes. Otherwise, we run it in-process.

Children report progress to the parent through a pipe.
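
The fork-and-pipe arrangement might look roughly like the sketch below. It is POSIX-only and is not the code from this patch. The name `run_in_workers`, the one-line-per-item progress format, and the strided split of the work are all assumptions made for brevity.

```python
import os


def run_in_workers(func, items, nworkers):
    """Run func over items in forked children; return the progress count."""
    rfd, wfd = os.pipe()
    pids = []
    for i in range(nworkers):
        chunk = items[i::nworkers]
        pid = os.fork()
        if pid == 0:
            # Child: do the work, write one short line per finished item.
            status = 1
            try:
                os.close(rfd)
                for item in chunk:
                    func(item)
                    os.write(wfd, b"1\n")
                status = 0
            finally:
                # Leave without running the parent's cleanup handlers.
                os._exit(status)
        pids.append(pid)

    # Parent: close its write end so reading stops once all children exit.
    os.close(wfd)
    done = 0
    with os.fdopen(rfd, "rb") as progress:
        for line in progress:
            done += int(line)
    for pid in pids:
        os.waitpid(pid, 0)
    return done


completed = run_in_workers(lambda item: None, list(range(10)), 3)
```

Each progress write is far smaller than the pipe buffer, so lines from different children do not interleave.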

Not yet implemented on Windows.
2013-02-09 15:51:32 -08:00
Bryan O'Sullivan
8ef1da44b3 worker: partition a list (of tasks) into equal-sized chunks
2013-02-09 15:51:32 -08:00
Bryan O'Sullivan
998baaf4d8 worker: estimate whether it's worth running a task in parallel
Not implemented for Windows yet.
2013-02-09 15:51:26 -08:00
Bryan O'Sullivan
46bf2f5a6f worker: count the number of CPUs
This works on the major platforms, and falls back to a safe guess of
1 elsewhere.
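
One way to get this behaviour is sketched below. It is an approximation under stated assumptions, not necessarily the code in this patch: it assumes the `SC_NPROCESSORS_ONLN` sysconf name on POSIX systems and the `NUMBER_OF_PROCESSORS` environment variable on Windows.

```python
import os


def countcpus():
    """Best-effort CPU count; returns 1 when nothing reliable is found."""
    # POSIX: os.sysconf is missing on some platforms, and the name may be
    # unknown, so every failure falls through to the next probe.
    try:
        n = int(os.sysconf("SC_NPROCESSORS_ONLN"))
        if n > 0:
            return n
    except (AttributeError, ValueError, OSError):
        pass

    # Windows: the count is normally exposed through the environment.
    try:
        n = int(os.environ["NUMBER_OF_PROCESSORS"])
        if n > 0:
            return n
    except (KeyError, ValueError):
        pass

    # Safe guess when every probe fails.
    return 1


cpus = countcpus()
```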
2013-02-09 15:22:12 -08:00