sapling/mercurial
Gregory Szorc 2f77487f6f bundle2: don't use seekable bundle2 parts by default (issue5691)
The last commit removed the last use of the bundle2 part seek() API
in the generic bundle2 part iteration code. This means we can now
switch to using unseekable bundle2 parts by default and have the
special consumers that actually need the behavior request it.

This commit changes unbundle20.iterparts() to expose non-seekable
unbundlepart instances by default. If seekable parts are needed,
callers can pass seekable=True. The bundlerepo class needs
seekable parts, so it does this.

The interrupt handler is also changed to use a regular unbundlepart.
So, by default, all consumers except bundlerepo will see unseekable
parts.

Because the behavior of the iterparts() benchmark changed, we add
a variation to test seekable parts vs unseekable parts. And because
parts no longer have seek() unless seekable=True is passed, we update
the "part seek" benchmark.

Speaking of benchmarks, this change has the following impact on
`hg perfbundleread` on an uncompressed bundle of the Firefox repo
(6,070,036,163 bytes). Where a benchmark lists two result lines, the
first is before this change and the second is after:

! read(8k)
! wall 0.722709 comb 0.720000 user 0.150000 sys 0.570000 (best of 14)
! read(16k)
! wall 0.602208 comb 0.590000 user 0.080000 sys 0.510000 (best of 17)
! read(32k)
! wall 0.554018 comb 0.560000 user 0.050000 sys 0.510000 (best of 18)
! read(128k)
! wall 0.520086 comb 0.530000 user 0.020000 sys 0.510000 (best of 20)
! bundle2 forwardchunks()
! wall 2.996329 comb 3.000000 user 2.300000 sys 0.700000 (best of 4)
! bundle2 iterparts()
! wall 8.070791 comb 8.060000 user 7.180000 sys 0.880000 (best of 3)
! wall 6.983756 comb 6.980000 user 6.220000 sys 0.760000 (best of 3)
! bundle2 iterparts() seekable
! wall 8.132131 comb 8.110000 user 7.160000 sys 0.950000 (best of 3)
! bundle2 part seek()
! wall 10.370142 comb 10.350000 user 7.430000 sys 2.920000 (best of 3)
! wall 10.860942 comb 10.840000 user 7.790000 sys 3.050000 (best of 3)
! bundle2 part read(8k)
! wall 8.599892 comb 8.580000 user 7.720000 sys 0.860000 (best of 3)
! wall 7.258035 comb 7.260000 user 6.470000 sys 0.790000 (best of 3)
! bundle2 part read(16k)
! wall 8.265361 comb 8.250000 user 7.360000 sys 0.890000 (best of 3)
! wall 7.099891 comb 7.080000 user 6.310000 sys 0.770000 (best of 3)
! bundle2 part read(32k)
! wall 8.290308 comb 8.280000 user 7.330000 sys 0.950000 (best of 3)
! wall 6.964685 comb 6.950000 user 6.130000 sys 0.820000 (best of 3)
! bundle2 part read(128k)
! wall 8.204900 comb 8.150000 user 7.210000 sys 0.940000 (best of 3)
! wall 6.852867 comb 6.850000 user 6.060000 sys 0.790000 (best of 3)

The significant speedup is due to not incurring the overhead to track
payload offset data. Of course, this overhead is proportional to
bundle2 part size. So a multiple gigabyte changegroup part is on the
extreme side of the spectrum for real-world impact.
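
To make the cost concrete, here is a minimal toy sketch of the idea, not
Mercurial's actual implementation. The names ToyPart, ToySeekablePart,
chunkindex and this standalone iterparts() are hypothetical and exist only
for illustration. A seekable part has to remember where every payload chunk
starts so it can seek back later, so its index grows with the payload size.
A forward-only part records nothing.

```python
class ToyPart:
    """Forward-only reader over a chunked payload (hypothetical class)."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self.pos = 0  # number of payload bytes consumed so far

    def _recordchunk(self, start, chunk):
        # Forward-only parts keep no per-chunk bookkeeping.
        pass

    def read(self):
        out = []
        for chunk in self._chunks:
            self._recordchunk(self.pos, chunk)
            self.pos += len(chunk)
            out.append(chunk)
        return b"".join(out)


class ToySeekablePart(ToyPart):
    """Same reader, but it also records one index entry per chunk."""

    def __init__(self, chunks):
        super().__init__(chunks)
        self.chunkindex = []  # (payload offset, chunk length) per chunk

    def _recordchunk(self, start, chunk):
        self.chunkindex.append((start, len(chunk)))


def iterparts(payloads, seekable=False):
    """Yield cheap forward-only parts unless the caller opts in."""
    cls = ToySeekablePart if seekable else ToyPart
    for chunks in payloads:
        yield cls(chunks)


payload = [b"x" * 4096] * 1000  # 1000 chunks of 4 KiB each

plain = next(iterparts([payload]))
indexed = next(iterparts([payload], seekable=True))
plain.read()
indexed.read()

print(hasattr(plain, "chunkindex"))  # False: nothing was tracked
print(len(indexed.chunkindex))       # 1000: one entry per chunk read
```

In this toy model the index gains one entry per chunk, so a payload a
thousand times larger carries a thousand times more bookkeeping. That is
the same proportionality described above.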

In addition to the CPU efficiency wins, not tracking offset data
also means not using memory to hold that data. Using a bundle based on
the example BSD repository in issue 5691, this change has a drastic
impact on memory usage during `hg unbundle` (`hg clone` would behave
similarly). Before, memory usage incrementally increased for the
duration of bundle processing. In other words, as we advanced through
the changegroup and bundle2 part, we kept allocating more memory to
hold offset data. After this change, we still increase memory during
changegroup application. But the rate of increase is significantly
slower. (The bulk of the remaining gradual increase appears to be the
storing of revlog sizes in the transaction object to facilitate
rollback.)

The RSS at the end of filelog application is as follows:

Before: ~752 MB
After:  ~567 MB

So, we were storing ~185 MB of offset data that we never even used.
Talk about wasteful!

.. api::

   bundle2 parts are no longer seekable by default.

.. perf::

   bundle2 read I/O throughput significantly increased.

.. perf::

   Significant memory use reductions when reading from bundle2 bundles.

   On the BSD repository, peak RSS during changegroup application
   decreased by ~185 MB from ~752 MB to ~567 MB.

Differential Revision: https://phab.mercurial-scm.org/D1390
2017-11-13 21:10:37 -08:00
cext parsers: allow clang-format here 2017-10-16 14:53:57 -04:00
cffi codemod: use pycompat.isdarwin 2017-10-12 23:34:34 -07:00
default.d mergetools.rc: find OSX FileMerge in the new location inside Xcode 4.3 2015-10-16 11:37:34 +02:00
help sshpeer: add a configurable hint for the ssh error message 2017-11-20 01:40:26 -08:00
hgweb hgweb: use webutil.commonentry() for nodes (but not for jsdata yet) in /graph 2017-11-20 21:59:00 +08:00
httpclient httpclient: update to 54868ef054d2 of httpplus 2016-06-27 11:53:50 -04:00
pure codemod: use pycompat.iswindows 2017-10-12 23:30:46 -07:00
templates hgweb: use webutil.commonentry() for nodes (but not for jsdata yet) in /graph 2017-11-20 21:59:00 +08:00
thirdparty thirdparty: vendor attrs 2017-10-01 04:14:16 -07:00
__init__.py python3: don't byte mangle third-party packages 2017-10-01 04:04:18 -07:00
ancestor.py py3: add __bool__ to every class defining __nonzero__ 2017-03-13 12:40:14 -07:00
archival.py archive: add an experimental config to control the metadata file template 2017-07-17 00:49:29 -04:00
bdiff.c bdiff: remove trailing newlines 2017-10-04 10:51:39 -04:00
bdiff.h bdiff: include compat.h in header to define ssize_t 2017-10-13 22:38:24 +09:00
bitmanipulation.h bitmanipulation: reformat with clang-format 2017-10-04 10:52:50 -04:00
bookmarks.py bookmark: add a dedicated txnclose-bookmark hook 2017-10-10 17:53:42 +02:00
branchmap.py branchmap: remove superfluous pass statements 2017-09-30 07:42:59 -04:00
bundle2.py bundle2: don't use seekable bundle2 parts by default (issue5691) 2017-11-13 21:10:37 -08:00
bundlerepo.py bundle2: don't use seekable bundle2 parts by default (issue5691) 2017-11-13 21:10:37 -08:00
byterange.py cleanup: use urllibcompat for renamed methods on urllib request objects 2017-10-01 12:14:21 -04:00
changegroup.py changegroup: use any node, not min(), in treemanifest's generatemanifests 2017-11-08 18:24:43 -08:00
changelog.py changelog: use a Factory for default value for files 2017-10-02 11:03:53 +01:00
chgserver.py chgserver: do not treat HG as sensitive environ when CHGHG is set 2017-10-18 14:55:39 -07:00
cmdutil.py docs: add args/returns docs for some cmdutil, context, and registrar functions 2017-11-16 15:01:21 -08:00
color.py codemod: use pycompat.iswindows 2017-10-12 23:30:46 -07:00
commands.py commands: add value for cmdtype argument for read only commands 2017-11-21 04:37:51 +05:30
commandserver.py style: never use a space before a colon or comma 2017-09-29 15:48:34 +00:00
compat.h encoding: add function to test if a str consists of ASCII characters 2017-04-23 12:59:42 +09:00
config.py config: allow remapping the default section 2017-10-14 17:41:41 +09:00
configitems.py sshpeer: add a configurable hint for the ssh error message 2017-11-20 01:40:26 -08:00
context.py docs: add args/returns docs for some cmdutil, context, and registrar functions 2017-11-16 15:01:21 -08:00
copies.py copies: add a config to limit the number of candidates to check in heuristics 2017-10-10 02:25:03 +05:30
crecord.py crecord: fix revert -ir '.^' crash caused by 3649c3f2cd 2017-11-13 18:22:25 -08:00
dagop.py revset: optimize "draft() & ::x" pattern 2017-08-28 14:49:00 -07:00
dagparser.py py3: iterate bytes as a byte string in dagparser.py 2017-09-03 15:32:45 +09:00
dagutil.py dagutil: use a listcomp instead of a map() 2017-10-15 00:37:24 -04:00
debugcommands.py debugdeltachain: output information about sparse read if enabled 2017-10-26 09:27:09 +02:00
destutil.py show: implement "stack" view 2017-07-01 22:38:42 -07:00
dirstate.py dirstate: make map implementation overridable 2017-11-15 01:07:42 -08:00
dirstateguard.py dirstate: update backup functions to take full backup filename 2017-07-12 15:24:07 -07:00
discovery.py discovery: prevent crash caused by prune marker having no parent data 2017-04-19 23:10:05 +09:00
dispatch.py dispatch: when --pager=no is passed, also disable pager on req.repo.ui 2017-10-09 12:42:28 -07:00
dummycert.pem ssl: on OS X, use a dummy cert to trick Python/OpenSSL to use system CA certs 2014-09-26 02:19:48 +02:00
encoding.py py3: use 'surrogatepass' error handler to process U+DCxx transparently 2017-09-16 22:55:48 +09:00
error.py error: add InMemoryMergeConflictsError 2017-11-15 21:07:30 -08:00
exchange.py exchange: drop unused '_getbookmarks' function 2017-10-17 15:55:40 +02:00
exewrapper.c exewrapper: format with clang-format 2017-10-04 11:04:18 -04:00
extensions.py extensions: always include traceback when extension setup fails 2017-10-17 10:31:44 -07:00
fancyopts.py py3: slice over bytes to prevent getting it's ascii value 2017-06-25 08:36:51 +05:30
filelog.py python3: replace sorted(<dict>.iterkeys()) with sorted(<dict>) 2017-08-22 20:06:58 -04:00
filemerge.py filemerge: pass a default value to _toolstr (issue5718) 2017-10-26 11:07:06 -07:00
fileset.py help: clarify quotes are needed for filesets.size expressions 2016-09-21 16:33:37 +00:00
formatter.py templater: load aliases from [templatealias] section in map file 2017-10-14 18:06:42 +09:00
graphmod.py log: add a "graphwidth" template variable 2017-08-15 10:15:31 -07:00
hbisect.py bisect: move check_state into the bisect module 2016-08-24 04:25:20 +02:00
help.py help: adding a topic on flags 2017-10-30 20:35:30 -07:00
hg.py share: handle --relative shares to a different drive letter gracefully 2017-11-02 23:55:09 -04:00
hook.py hook: add a 'hashook' function to test for hook existence 2017-10-08 13:08:31 +02:00
httpconnection.py cleanup: use urllibcompat for renamed methods on urllib request objects 2017-10-01 12:14:21 -04:00
httppeer.py httppeer: always produce native str header keys and values 2017-10-15 00:03:31 -04:00
i18n.py i18n: cache translated messages per encoding 2017-10-13 21:36:10 +09:00
keepalive.py cleanup: use urllibcompat for renamed methods on urllib request objects 2017-10-01 12:14:21 -04:00
localrepo.py sparse-read: skip gaps too small to be worth splitting 2017-10-18 09:07:48 +02:00
lock.py lock: avoid unintentional lock acquisition at failure of readlock 2017-05-01 19:59:13 +09:00
lsprof.py lsprof: use print function 2016-01-02 11:40:53 -08:00
lsprofcalltree.py lsprofcalltree: use print function 2016-01-02 11:45:29 -08:00
mail.py codemod: register core configitems using a script 2017-07-14 14:22:40 -07:00
manifest.py py3: return False early while checking whether None is a key in lazymanifest 2017-09-30 05:22:22 +05:30
match.py match: remove superfluous pass statements 2017-09-30 07:44:45 -04:00
mdiff.py py3: use '%d' for integers instead of '%s' 2017-10-02 04:48:06 +05:30
merge.py merge: add a config option to disable path conflict checking 2017-10-24 11:14:38 -07:00
mergeutil.py checkunresolved: move to new package to help avoid import cycles 2016-11-21 21:31:45 -05:00
minirst.py python3: use our bytes-only version of cgi.escape everywhere 2017-10-05 14:16:20 -04:00
mpatch.c mpatch: switch alignment of wrapped line from tab to spaces with clang-format 2017-10-04 11:00:04 -04:00
mpatch.h mpatch: reformat function prototypes with clang-format 2017-10-04 10:56:33 -04:00
namespaces.py namespaces: record and expose whether namespace is built-in 2017-06-24 14:52:15 -07:00
node.py revlog: add support for partial matching of wdir node id 2016-08-19 18:26:04 +09:00
obsolete.py config: also gather effect-flags on experimental.evolution 2017-10-19 17:50:20 +02:00
obsutil.py obsfate: makes successorsetverb takes the markers as argument 2017-10-19 12:35:47 +02:00
parser.py doctest: use print_function and convert bytes to unicode where needed 2017-09-03 14:56:31 +09:00
patch.py patch: accept prefix argument to changedfiles() helper 2017-11-14 10:26:36 -08:00
pathutil.py pathutil: add doctests for canonpath() 2017-11-03 22:22:50 -04:00
peer.py peer: ensure command names are always ascii bytestrs 2017-10-15 00:05:00 -04:00
phases.py phases: pass phase names to hooks instead of internal values 2017-10-18 12:19:53 -05:00
policy.py encoding: add fast path of jsonescape() (issue5533) 2017-04-23 14:47:52 +09:00
posix.py codemod: use pycompat.isdarwin 2017-10-12 23:34:34 -07:00
profiling.py configitems: register the 'profiling.type' config 2017-06-30 03:44:00 +02:00
progress.py configitems: register the 'progress.format' config 2017-10-11 22:53:17 +02:00
pushkey.py pushkey: use absolute_import 2015-08-08 19:57:27 -07:00
pvec.py pvec: use absolute_import 2015-12-21 21:32:58 -08:00
pycompat.py pycompat: define operating system constants 2017-10-12 19:20:04 -07:00
rcutil.py codemod: use pycompat.iswindows 2017-10-12 23:30:46 -07:00
registrar.py docs: add args/returns docs for some cmdutil, context, and registrar functions 2017-11-16 15:01:21 -08:00
repair.py repair: preserve phase also when not using generaldelta (issue5678) 2017-09-14 11:16:57 -07:00
repository.py repository: formalize wire protocol interface 2017-08-13 11:04:42 -07:00
repoview.py repoview: remove incorrect documentation of the function 2017-10-10 23:19:35 +05:30
revlog.py sparse-read: ignore trailing empty revs in each read chunk 2017-10-18 15:28:19 +02:00
revset.py revset: extract a parsefollowlinespattern helper function 2017-10-04 15:27:43 +02:00
revsetlang.py revset: move weight information to predicate 2017-09-01 19:42:09 -07:00
scmposix.py codemod: use pycompat.isdarwin 2017-10-12 23:34:34 -07:00
scmutil.py scmutil: don't try to delete origbackup symlinks to directories (issue5731) 2017-11-03 09:27:36 -07:00
scmwindows.py pager: use less as a fallback on Unix 2017-04-28 20:51:14 +09:00
selectors2.py selectors2: do not use platform.system() 2017-10-11 17:27:21 -07:00
server.py server: drop executable bit from daemon log file 2017-10-25 21:20:01 +09:00
setdiscovery.py setdiscovery: use iterbatch interface instead of batch 2016-03-01 17:44:41 -05:00
similar.py similar: remove caching from the module level 2017-01-13 11:42:36 -08:00
simplemerge.py simplemerge: remove unused repo parameter 2017-09-01 10:35:43 -07:00
smartset.py py3: fix type of attribute name in smartset.py 2017-09-03 17:14:53 +09:00
sparse.py merge: add merge action 'pr' to rename files during update 2017-10-02 14:05:30 -07:00
sshpeer.py sshpeer: add a configurable hint for the ssh error message 2017-11-20 01:40:26 -08:00
sshserver.py style: never put multiple statements on one line 2017-09-29 15:49:20 +00:00
sslutil.py configitems: register the 'hostsecurity.*:fingerprints' config 2017-10-14 00:29:31 +02:00
statichttprepo.py statichttprepo: do not use platform path separator to build a URL 2017-10-28 17:23:52 +09:00
statprof.py statprof: require input file 2017-01-18 22:45:07 -08:00
store.py py3: iterate bytes as a byte string in store.lowerencode() 2017-09-03 17:28:47 +09:00
streamclone.py codemod: register core configitems using a script 2017-07-14 14:22:40 -07:00
subrepo.py subrepo: use per-type config options to enable subrepos 2017-11-06 22:32:41 -08:00
tagmerge.py tagmerge: use workingfilectx to write merged tags 2017-07-11 16:48:15 -07:00
tags.py cachevfs: migration the tags fnode cache to 'cachevfs' 2017-07-15 23:30:25 +02:00
templatefilters.py templatefilters: defend against evil unicode strs in json filter 2017-10-16 22:44:43 -04:00
templatekw.py templatekw: add verbosity keyword to select template by -q/-v/--debug flag 2017-10-21 17:46:41 +09:00
templater.py obsfate: makes successorsetverb takes the markers as argument 2017-10-19 12:35:47 +02:00
transaction.py util: add base class for transactional context managers 2017-07-28 22:42:10 -07:00
treediscovery.py error: get Abort from 'error' instead of 'util' 2015-10-08 12:55:45 -07:00
txnutil.py txnutil: factor out the logic to read file in according to HG_PENDING 2017-02-21 01:20:59 +09:00
ui.py tweakdefaults: turn on ui.statuscopies 2017-11-16 17:11:14 -08:00
unionrepo.py revlog: update signature of dummy addgroup() in bundlerepo and unionrepo 2017-09-15 23:58:45 +09:00
upgrade.py codemod: simplify nested withs 2017-07-13 18:31:35 -07:00
url.py url: add cgi.escape equivalent for bytestrings 2017-10-14 02:57:26 -04:00
urllibcompat.py urllibcompat: move some adapters from pycompat to urllibcompat 2017-10-04 11:58:00 -04:00
util.py util: add util.clearcachedproperty 2017-11-08 09:18:18 -08:00
verify.py codemod: register core configitems using a script 2017-07-14 14:22:40 -07:00
vfs.py codemod: use pycompat.iswindows 2017-10-12 23:30:46 -07:00
win32.py win32: work around a WinError problem handling HRESULT types 2017-03-30 00:33:00 -04:00
windows.py ssh: quote parameters using shellquote (SEC) 2017-08-04 23:54:12 -07:00
wireproto.py wireproto: more strkwargs cleanup 2017-10-15 00:39:53 -04:00
worker.py codemod: use pycompat.iswindows 2017-10-12 23:30:46 -07:00