Most test scripts use "hg" to interact with a temporary test repository.
However a few tests also want to run hg commands to interact with the local
repository containing the mercurial source code. Notably, many of the
test-check-* tests want to check local files and commit messages.
These tests were previously using the version of hg being tested to query the
source repository. However, this will fail if the source repository requires
extensions or other settings not supported by the version of mercurial being
tested. The source repository was typically initially cloned using the system
hg installation, so we should use the system hg installation to query it.
There was already a helpers-testrepo.sh script designed to help cope with
different requirements for the source repository versus the test repositories.
However, it only handled the evolve extension. This new behavior works with
any extensions that are different between the system installation and the test
installation.
We didn't have explicit microbenchmark coverage for loading revlog
indexes. That seems like a useful thing to have, so let's add it.
We currently measure the low-level nodemap APIs. There is room to
hook in at the actual revlog layer. This could be done as a follow-up.
The hackiest thing about this patch is specifying revlog paths.
Other commands have arguments that allow resolution of changelog,
manifest, and filelog. I needed to hook in at a lower level of
the revlog API than what the existing helper functions to resolve
revlogs allowed. I was too lazy to write some new APIs. This could
be done as a follow-up easily enough.
Example output for `hg perfrevlogindex 00changelog.i` on my
Firefox repo (404418 revisions):
! revlog constructor
! wall 0.003106 comb 0.000000 user 0.000000 sys 0.000000 (best of 912)
! read
! wall 0.003077 comb 0.000000 user 0.000000 sys 0.000000 (best of 924)
! create index object
! wall 0.000000 comb 0.000000 user 0.000000 sys 0.000000 (best of 1803994)
! retrieve index entry for rev 0
! wall 0.000193 comb 0.000000 user 0.000000 sys 0.000000 (best of 14037)
! look up missing node
! wall 0.003313 comb 0.000000 user 0.000000 sys 0.000000 (best of 865)
! look up node at rev 0
! wall 0.003295 comb 0.010000 user 0.010000 sys 0.000000 (best of 858)
! look up node at 1/4 len
! wall 0.002598 comb 0.010000 user 0.010000 sys 0.000000 (best of 1103)
! look up node at 1/2 len
! wall 0.001909 comb 0.000000 user 0.000000 sys 0.000000 (best of 1507)
! look up node at 3/4 len
! wall 0.001213 comb 0.000000 user 0.000000 sys 0.000000 (best of 2275)
! look up node at tip
! wall 0.000453 comb 0.000000 user 0.000000 sys 0.000000 (best of 5697)
! look up all nodes (forward)
! wall 0.094615 comb 0.100000 user 0.100000 sys 0.000000 (best of 100)
! look up all nodes (reverse)
! wall 0.045889 comb 0.050000 user 0.050000 sys 0.000000 (best of 100)
! retrieve all index entries (forward)
! wall 0.078398 comb 0.080000 user 0.060000 sys 0.020000 (best of 100)
! retrieve all index entries (reverse)
! wall 0.079376 comb 0.080000 user 0.070000 sys 0.010000 (best of 100)
We have a couple of commands beginning with "perfrevlog." The
actual "perfrevlog" command actually measures resolving the fulltext
of multiple revisions. So let's rename it to reflect what it actually
does.
revlog.revision takes either node or rev, but taking a rev is more
efficient, because converting rev to node is just a seek and read.
That's cheaper than converting node to rev, which may require O(n) walk in
revlog index for the first times, and then triggering building the radix
tree index. Even with the radix tree built, rev -> node is still faster than
node -> rev because the radix tree requires more jumps in memory.
So r.revision(r.node(rev)) should be changed to r.revision(rev). This patch
adds a check-code rule to detect that.
I'm adding some performance logging to ui.write - this benchmark lets us
confirm that the cost of that logging is acceptably low.
At this point, the microbenchmark on Linux over SSH shows:
! wall 3.213560 comb 0.410000 user 0.350000 sys 0.060000 (best of 4)
while on the Mac locally, it shows:
! wall 0.342325 comb 0.180000 user 0.110000 sys 0.070000 (best of 20)
Upcoming commits will teach revlogs to leverage the new compression
engine API so that new compression formats can more easily be
leveraged in revlogs. We want to be sure this refactoring doesn't
regress performance. So this commit introduces "perfrevchunks" to
explicitly test performance of reading, decompressing, and
recompressing revlog chunks.
Here is output when run on the mozilla-unified repo:
$ hg perfrevlogchunks -c
! read
! wall 0.346603 comb 0.350000 user 0.340000 sys 0.010000 (best of 28)
! read w/ reused fd
! wall 0.337707 comb 0.340000 user 0.320000 sys 0.020000 (best of 30)
! read batch
! wall 0.013206 comb 0.020000 user 0.000000 sys 0.020000 (best of 221)
! read batch w/ reused fd
! wall 0.013259 comb 0.030000 user 0.010000 sys 0.020000 (best of 222)
! chunk
! wall 1.909939 comb 1.910000 user 1.900000 sys 0.010000 (best of 6)
! chunk batch
! wall 1.750677 comb 1.760000 user 1.740000 sys 0.020000 (best of 6)
! compress
! wall 5.668004 comb 5.670000 user 5.670000 sys 0.000000 (best of 3)
$ hg perfrevlogchunks -m
! read
! wall 0.365834 comb 0.370000 user 0.350000 sys 0.020000 (best of 26)
! read w/ reused fd
! wall 0.350160 comb 0.350000 user 0.320000 sys 0.030000 (best of 28)
! read batch
! wall 0.024777 comb 0.020000 user 0.000000 sys 0.020000 (best of 119)
! read batch w/ reused fd
! wall 0.024895 comb 0.030000 user 0.000000 sys 0.030000 (best of 118)
! chunk
! wall 2.514061 comb 2.520000 user 2.480000 sys 0.040000 (best of 4)
! chunk batch
! wall 2.380788 comb 2.380000 user 2.360000 sys 0.020000 (best of 5)
! compress
! wall 9.815297 comb 9.820000 user 9.820000 sys 0.000000 (best of 3)
We already see some interesting data, such as how much slower
non-batched chunk reading is and that zlib compression appears to be
>2x slower than decompression.
I didn't have the data when I wrote this commit message, but I ran this
on Mozilla's NFS-based Mercurial server and the time for reading with a
reused file descriptor was faster. So I think it is worth testing both
with and without file descriptor reuse so we can make informed
decisions about recycling file descriptors.
This broke in c7236da49964 due to a refactored manifest API.
The fix is a bit hacky - perfbdiff doesn't yet support tree manifests
for example. But it gets the job done.
A test has been added for --alldata so this doesn't happen again.
This command can be used for testing the performance of producing the
changelog portion of a changegroup.
We could use additional perf* commands for testing other parts of
changegroup. Those can be written another time, when they are needed.
(And those may want to refactor the changegroup generation API so code
can be reused.) Speaking of code reuse, yes, this command does reinvent
a small wheel. I didn't want to scope bloat to change the changegroup
API because that will invite bikeshedding.
To check importing modules in perf.py for historical portability, this
patch lists up files by "hg files" both for "1.2" and tip, and builds
up "module whitelist" check from those files.
This patch uses "1.2" as earlier side version of "module whitelist",
because "mercurial.error" module is a blocker for loading perf.py with
Mercurial earlier than 1.2, and just importing "mercurial.error"
separately isn't enough.
This patch introduces tests/check-perf-code.py as a preparation for
adding extra checks on contrib/perf.py in subsequent patches (mainly,
for historical portability).
At this change, check-perf-code.py doesn't add any extra check, and is
equal to check-code.py. This makes subsequent patch focus only on
adding an extra check on perf.py check-perf-code.py.
check-perf-code.py adds extra checks on perf.py by wrapping
contrib/check-code.py, because "filtering" by check-code.py (e.g.
normalize characters in string literal or comment line) is useful to
simplify regexp for check, and avoid false positive matching.
We have a convention of using -c|-m|FILE elsewhere for reading from
revlogs. Use it for `hg perfrevlog`.
While I was here, I also added a docstring to document what this
command does, as "perfrevlog" is ambiguous.
As part of investigating performance improvements to revlog reading,
I needed a mechanism to measure every part of revlog reading so I knew
where time was spent and how effective optimizations were.
This patch implements a perf command for benchmarking the various
stages of reading a single revlog revision.
When executed against a manifest revision at the end of a 30,000+
long delta chain in mozilla-central, the command demonstrates that
~80% of time is spent in zlib decompression.