The existing scheme using util.find_exe and subprocess.call meant we
couldn't use simple shell commands in tests. Fix that.
Also, when reporting results it mistakenly used the status returned by
the system() call rather than the good value returned by the bisect call.
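
A minimal sketch of the idea, not the actual change: running the test
command through the shell lets plain shell snippets work, and the final
report is built from the search result rather than from the last exit
status. The helper names (run_test_command, classify, report) and the
exit-status convention (0 = good, 125 = skip, anything else = bad) are
assumptions made for illustration.

```python
import subprocess

# Assumed convention for this sketch: exit status 0 means good,
# 125 means skip, and any other status means bad.
SKIP_STATUS = 125


def run_test_command(command):
    """Run a test command through the shell and return its exit status."""
    # shell=True hands the string to the shell, so snippets such as
    # "test -f foo && make check" work without locating an executable.
    return subprocess.call(command, shell=True)


def classify(status):
    """Map an exit status to a verdict (hypothetical helper)."""
    if status == 0:
        return "good"
    if status == SKIP_STATUS:
        return "skip"
    return "bad"


def report(rev, searching_for_good):
    """Build the final message from the search result (hypothetical helper).

    searching_for_good stands in for the good value returned by the
    bisect call; the last command's exit status plays no part here.
    """
    kind = "good" if searching_for_good else "bad"
    return "The first %s revision is %s" % (kind, rev)
```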
As reported in issue 1445:
A valid candidate revision for a bisect test is not considered for testing
due to its skipped ancestor. If this revision is the only untested one
left, an assertion error occurs.
Automatically detect whether we're looking for a bad-to-good
transition rather than the usual good-to-bad transition by detecting
when badrev is inside the good set and, in that case, flipping good/bad.
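
A rough sketch of the detection, under stated assumptions: the "good
set" is taken to mean the good revisions plus all their ancestors,
badrev is taken to be the lowest bad revision, and the names ancestors
and orient are hypothetical. It is an illustration of the flip, not the
code in this change.

```python
def ancestors(parents, revs):
    """Return revs together with all of their ancestors.

    parents maps a revision to the list of its parent revisions.
    """
    seen = set()
    stack = list(revs)
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.add(rev)
        stack.extend(parents.get(rev, ()))
    return seen


def orient(parents, good, bad):
    """Return (good, bad, flipped) for the search direction.

    If the lowest bad revision lies inside the good set, the history
    goes from bad to good, so the two lists are swapped.
    """
    badrev = min(bad)
    if badrev in ancestors(parents, good):
        return bad, good, True
    return good, bad, False


# Linear history 0 <- 1 <- 2 <- 3 (each revision lists its parent).
history_parents = {0: [], 1: [0], 2: [1], 3: [2]}
```

For example, orient(history_parents, [3], [1]) swaps the lists because
revision 1 is an ancestor of the good revision 3, while
orient(history_parents, [0], [3]) leaves them unchanged.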