Cloning can be an expensive operation for servers because the server
generates a bundle from existing repository data at request time. For
a large repository like mozilla-central, this consumes 4+ minutes
of CPU time on the server. It also results in significant network
utilization. Multiplied across hundreds or even thousands of clients,
the ensuing load can make the Mercurial server difficult to scale.
Although bundle generation is deterministic until the next changeset is
added, the bundles generated to service clone requests are not cached.
Each clone thus performs redundant work. This is wasteful.
This patch introduces the "clonebundles" extension and related
client-side functionality to help alleviate this deficiency. The
client-side feature is behind an experimental flag and is not enabled by
default.
It works as follows:
1) Server operator generates a bundle and makes it available on a
server (likely HTTP).
2) Server operator lists the URL of that bundle file in a
.hg/clonebundles.manifest file (see the sample manifest below).
3) Client `hg clone`ing sees the server is advertising bundle URLs.
4) Client fetches and applies the advertised bundle.
5) Client performs equivalent of `hg pull` to fetch changes made since
the bundle was created.
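For illustration, a minimal .hg/clonebundles.manifest could contain a
single line pointing at a pre-generated bundle (the URL here is
hypothetical):

    https://hg.example.com/bundles/mozilla-central.hg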
Essentially, the server performs the expensive work of generating a
bundle once and all subsequent clones fetch a static file from
somewhere. Scaling static file serving is a much more manageable
problem than scaling a Python application like Mercurial. Assuming
bundles are regenerated daily and the repository grows less than 1% per
day, 99+% of the CPU and network load from clones is eliminated (only
the small incremental pull is computed on the fly), allowing Mercurial servers
to scale more easily. Serving static files also means data can be
transferred to clients as fast as they can consume it, rather than as
fast as servers can generate it. This makes clones faster.
Mozilla has implemented functionality similar to this patch on
hg.mozilla.org using a custom extension. We are hosting bundle files in
Amazon S3, fronted by CloudFront (a CDN), and have successfully offloaded
>1 TB/day of data transfer from hg.mozilla.org, freeing up significant
bandwidth and CPU resources. The positive impact has been stellar, and
I believe the feature has proven its value and belongs in Mercurial
core. I feel it is important for the client-side support to eventually
be enabled by default in core, because that means clients will get
faster, more reliable clones and server operators will be able to reduce
load without requiring any client-side configuration changes (assuming
clients are up to date, of course).
The scope of this feature is narrowly and specifically tailored to
cloning, despite "serve pulls from pre-generated bundles" being a valid
and useful feature. I would eventually like for Mercurial servers to
support transferring *all* repository data via statically hosted files.
You could imagine a server that siphons all pushed data to bundle files
and instructs clients to apply a stream of bundles to reconstruct all
repository data. That feature, while useful and powerful, is
significantly more work to implement because it requires the server
component to have awareness of discovery and a mapping of which
changesets are in which files. Full clone bundles, by contrast, are much
simpler.
The wire protocol command is named "clonebundles" instead of something
more generic like "staticbundles" to leave the door open for a new, more
powerful and more generic server-side component with minimal backwards
compatibility implications. The name "bundleclone" was avoided because
it is already used by Mozilla's extension, which differs from this
feature in subtle ways; reusing the name would cause problems.
Mozilla's experience with this idea has taught us that some form of
"content negotiation" is required. Not all clients will support all
bundle formats or even URLs (advanced TLS requirements, etc). To ensure
the highest uptake possible, a server needs to advertise multiple
versions of bundles, and clients need to be able to choose the most
appropriate one from that list. The "attributes" in each
server-advertised entry facilitate this filtering and sorting. Their
use will become apparent in subsequent patches.
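For example, a server could advertise multiple entries, each annotated
with attributes that clients filter and sort on (the attribute names
here are illustrative; the real set is defined in subsequent patches):

    https://cdn.example.com/full.gzip.hg BUNDLESPEC=gzip-v1
    https://cdn.example.com/full.bzip2.hg BUNDLESPEC=bzip2-v1 REQUIRESNI=true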
Initial inspiration and credit for the idea of cloning from static files
belongs to Augie Fackler and his "lookaside clone" extension proof of
concept.
The home of 'Abort' is 'error', not 'util'; however, a lot of code seems
to be confused about that and gives all the credit to 'util' instead of
the hardworking 'error'. In a spirit of equity, we break the cycle of
injustice and give back to 'error' the respect it deserves. And screw
that 'util' poser.
For great justice.
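Concretely, the mass rewrite amounts to the following (a representative
example; '_' is Mercurial's i18n wrapper):

    # before: all the credit goes to the 'util' poser
    raise util.Abort(_('no changes found'))

    # after: the hardworking 'error' gets its due
    raise error.Abort(_('no changes found'))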
In the external pushrebase extension, it is valuable to be able to do some work
without taking the lock (like running expensive hooks). This enables
significantly higher commit throughput.
This patch adds an option to lazily acquire the lock. It means that,
when in this mode, all bundle2 part handlers that need to write to the
repo must first call op.gettransaction().
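A minimal sketch of what this looks like in a part handler (the part
name is hypothetical):

    @bundle2.parthandler('myext:changes')
    def handlechanges(op, part):
        # in lazy-locking mode, the lock and transaction are only
        # acquired here, at the first write, instead of up front
        tr = op.gettransaction()
        # ... write to op.repo under the transaction ...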
This is the beginning of client-side support for performing a stream
clone using bundle2. The main bundle2 pull function checks whether to
perform a streaming clone and outputs a message if so.
While this leaves us with a duplicate message, it seems easier to have
all the bundle2 console writing in one location, inside an easy-to-read
conditional block.
This adds a cache and makes accessing the capabilities slightly simpler,
as you don't need to directly go through the bundle2 module. This will
also help prevent a function-level import in streamclone.py.
This patch arguably isn't necessary. But I think it makes things
slightly nicer.
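A sketch of the cached accessor, assuming the attribute name used in
this series:

    @util.propertycache
    def remotebundle2caps(self):
        # parse and cache the remote's bundle2 capabilities once
        return bundle2.bundle2caps(self.remote)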
Upcoming patches will introduce bundle2 based streaming clones. Add
"legacy" to the function name and add a docstring clarifying the intent of
the function.
Just like all the other pull steps. Consistency is good.
This seems a little excessive right now since maybeperformstreamclone is
such a short function. This will be addressed in a subsequent patch.
Stream clones are a special case of clones. Clones are a special case of
pull. Most of the logic for deciding what to do at pull time is in
exchange.py. It makes sense for the stream clone determination to live
there as well.
This patch moves the calling of the stream clone code into pull(). The
checks in streamclone.canperformstreamclone() ensure that we don't
perform a stream clone unless it is possible.
A future patch will convert maybeperformstreamclone() to accept a
pullop to make it consistent with everything else in pull(). It will
also grow some functionality (in case you doubted the necessity of a
4-line function).
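A simplified sketch of the call site (argument names and order are
illustrative; a later patch replaces the arguments with the pullop
itself):

    def pull(repo, remote, heads=None, force=False,
             streamclonerequested=None):
        pullop = pulloperation(repo, remote, heads, force,
                               streamclonerequested=streamclonerequested)
        # a no-op unless streamclone.canperformstreamclone() says a
        # stream clone is both requested and possible
        streamclone.maybeperformstreamclone(repo, remote, heads,
                                            streamclonerequested)
        # ... regular discovery and changegroup pull steps follow ...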
An upcoming patch will move the invocation of stream cloning logic to
the normal pull code path (from localrepository.clone). In preparation
for this, we teach pull() and pulloperation about whether a streaming
clone is requested.
The return logic in localrepository.clone() has been reformatted
slightly because of line length issues.
We bulk move the functions related to streaming clones out of
exchange.py. Function names were changed slightly to drop a component
redundant with
the module name. Docstrings and comments referencing old names and
locations were updated accordingly.
The common ancestor set implementation was made lazy a couple years ago, but
this piece of code still required processing the entire repo by putting set()
around the lazy set. The code was introduced in 984b6b21bf13, a year before the
lazy ancestor set was added.
Dropping the set() shaves 3.5 seconds off of 'push -r' in repos with hundreds of
thousands of commits.
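Illustratively (names are representative, not the exact code from
984b6b21bf13):

    # before: set() forces the lazy ancestor set to be fully
    # evaluated, walking all ancestors in the repository
    common = set(cl.ancestors(revs))

    # after: the lazy set answers membership tests incrementally
    common = cl.ancestors(revs)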
The assignment of the value from bundle2.processbundle() to 'r' is
unused. It is currently the same as its third argument (if given), and
since that argument may eventually go away (according to the method's
docstring), let's reassign the return value to 'op' instead to better
prepare for that.
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance"
syntax.
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
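For example (using placeholder helpers):

    # before: Python 2 only
    try:
        readfile(path)
    except IOError, inst:
        handle(inst)

    # after: Python 2.6+ and Python 3 compatible
    try:
        readfile(path)
    except IOError as inst:
        handle(inst)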
The 'getchangegroupraw' function is very simple (two lines), so we
inline it into its only caller. This exposes the 'outgoing' object to
the part generator function, allowing us to add information about the
number of changesets contained in the part in a later changeset. Such
information is useful for the progress bar.
When using bundle2, the phase pushkey parts are now made mandatory. As a
result, failure to update phases server side will result in the
transaction being aborted.
When using bundle2, the bookmark pushkey parts are now made mandatory.
As a result, failure to update the bookmark server side will result in
the transaction
being aborted.
We add a way to register a "pushkey failure callback" that will be used
if the push is aborted by a pushkey failure. A part generator adding
mandatory pushkey
parts should register a failure callback for all of them. The callback will be
in charge of generating a meaningful abort if this part fails.
If no callback is registered, the error is propagated.
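A sketch of the intended usage by a part generator (the 'pkfailcb'
mapping follows this series; 'book' and the error message are
illustrative):

    part = bundler.newpart('pushkey')
    part.addparam('namespace', 'bookmarks')
    # register a callback converting a failure of this specific part
    # into a meaningful abort
    def failcb(exc):
        raise error.Abort(_('updating bookmark %s failed!') % book)
    pushop.pkfailcb[part.id] = failcb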
Catch PushkeyFailed error in exchange.
The current behavior (with bundle1) is to let the rest of the push succeed if
the pushkey call (phases, bookmarks) failed (this comes from the fact that each
item is sent in its own command).
We kept this behavior with bundle2, which is highly debatable, but let
us keep things as they are for now as a start. We are about to enforce
that 'mandatory' pushkey parts must succeed, so we need to mark the
parts as advisory to preserve the current (debatable) behavior.
All known server implementations have listkeys support with bundle2, but people
in the process of implementing new servers may not. Let's be nice to
them.
We are already fetching remote bookmarks to honor the -B option, we
now pass that data to the pull process so it can reuse it. This
prevents a race condition between the initial lookup and the actual
pulling of changesets and bookmarks. Tests are updated to account for
this.
We have been feeling the need for this in extensions for quite some time. This
will be used to pass remote bookmark information around in the next changesets.
For efficiency and consistency purposes, remote bookmarks retrieved
while the pull command code is doing its lookup will be reused during
the core pull operation.
A second step toward this is to avoid requesting bookmark information in
the bundle2 if we already have it locally.
For efficiency and consistency purposes, remote bookmarks retrieved
while the pull command code is doing its lookup will be reused during
the core pull operation.
A first step toward this is to set up the logic that avoids pulling the
data again during the discovery phase if some has already been provided.
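A sketch of the skip logic in the discovery step (assuming the
'remotebookmarks' attribute introduced earlier in this series):

    @pulldiscovery('bookmarks')
    def _pulldiscoverybookmarks(pullop):
        # skip the listkeys call when bookmarks were provided up front
        if pullop.remotebookmarks is None:
            pullop.remotebookmarks = pullop.remote.listkeys('bookmarks')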
All the test changes have been isolated and validated. We are now free
to turn on bundle2 as the default exchange protocol.
"To reach a port we must set sail –
Sail, not tie at anchor
Sail, not drift."
On Mozilla's mozilla-beta repository .hgtags fnodes resolution takes
~18s from a clean cache on my machine. This means that the first time
a user runs `hg tags`, `hg log`, or any other command that displays or
accesses tags data, a ~18s pause will occur. There is no output during
this pause. This results in a poor user experience and perception
that Mercurial is slow.
The .hgtags changeset to filenode mapping is deterministic. This
patch takes advantage of that property by implementing support
for transferring .hgtags filenodes mappings in a dedicated bundle2
part. When a client advertising support for the "hgtagsfnodes"
capability requests a bundle, a mapping of changesets to .hgtags
filenodes will be sent to the client.
Only mappings for head changesets included in the bundle will be sent.
The transfer of this mapping effectively eliminates the one-time tags
cache population pause after an initial clone.
The mappings are sent as binary data. So, 40 bytes per pair of
SHA-1s. On the aforementioned mozilla-beta repository,
659 * 40 = 26,360 raw bytes of mappings are sent over the wire
(in addition to the bundle part headers). Assuming 18s to populate
the cache, we only need to transfer this extra data faster than
1.5 KB/s for overall clone + tags cache population time to be shorter.
Put into perspective, the mozilla-beta repository is ~1 GB in size.
So, this additional data constitutes <0.01% of the cloned data.
In my opinion, this marginal overhead justifies on-by-default behavior,
given the multi-second performance win on clones.
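A sketch of how the part data could be assembled server-side
(simplified; 'outgoingheads' is illustrative, and the real code must
also handle missing cache entries):

    chunks = []
    for node in outgoingheads:
        fnode = cache.getfnode(node)
        if fnode is not None:
            # one record: 20 raw bytes of changeset SHA-1 followed
            # by 20 raw bytes of .hgtags filenode SHA-1
            chunks.append(node + fnode)
    bundler.newpart('hgtagsfnodes', data=''.join(chunks))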
All bundle2 servers now support the 'listkeys' part(1), so we'll
always be able to fetch bookmarks data at the same time as the
changesets. This should be enough to avoid the one race condition that
this bookmark prefetching is trying to work around. It even allows
future servers to make sure everything is generated from the same
"transaction" if they become capable of such. The current code was
already overwriting the prefetched value with the one from bundle2
anyway. Note that this does not prevent all race conditions related to
bookmarks in 'hg pull'; it makes nothing better and nothing worse.
Reducing the number of listkeys calls will reduce the latency on pull.
The pre-fetch is also moved into a discovery step because it seems to belong
there.
(1) Because all servers not speaking 'pushkey' parts are compatible with the
'HG2X' protocol only.
We are doing some strange special casing of phase push when:
- the source is a subrepo
- the destination is publishing
- some changesets are still draft on the destination
In that case we do not push phase information (to publish the draft
changesets) because it could break the simple 'clone/pull/push' cycle of
subrepos. We have to detect this case earlier to have bundle2 respect
it.
We change the test to check the behavior for both bundle1 and bundle2.
For reasons outlined in the previous commit, we want to make the code
for consuming "stream bundles" reusable. This patch extracts the code
into a standalone function.
Streaming clones are fast because they are essentially tar files.
On mozilla-central, a streaming clone only consumes ~55s CPU time
on clients, as opposed to ~340s CPU time for a regular clone or the
application of a gzip bundle.
Mozilla is deploying static file "lookaside" support to our Mercurial
server. Static bundles are pre-generated and uploaded to S3. When a
clone is performed, the static file is fetched, applied, and then an
incremental pull is performed. Unfortunately, on an ideal network
connection this still takes as much wall and CPU time as a regular
clone (although it does save significant server resources).
We like the client-side wall time wins of streaming clones. But we want
to leverage S3-based pre-generated files for serving the bulk of clone
data.
This patch moves the code for producing a "stream bundle" into its
own standalone function, away from the wire protocol. This will enable
stream bundle files to be produced outside the context of the wire
protocol.
A bikeshed on whether exchange is the best module for this function
might be warranted. I selected exchange instead of changegroup because
"stream bundles" aren't changegroups (yet).
I just discovered that we are not displaying ssh server output in real time
anymore. So we can just fall back to the bundle2 output capture for now.
This fixes the race condition issue we were seeing in tests.
Re-instating real-time output for ssh would fix the issue too, but let's
get the tests to pass first.
The current bundle2 processing was capturing all output. This is nice as
it provides better metadata about what produced which output, but it was
changing two things:
1) adding a "remote: " prefix to "other" output during local push
(issue4613)
2) local and ssh pushes no longer providing real-time output (issue4615)
As we are unsure about what form should be used in (1) and how to solve
(2), we disable output capture in these two cases. Output capture can be
forced using an experimental option.
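That forcing knob is a config option along these lines (option name per
this patch; treat it as illustrative):

    [experimental]
    bundle2-output-capture = True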
Because bundle2 allows a more precise exchange of obsmarkers during
pull, it sends them in a different order (previously unstable because of
the use of sets). As a result, they are added to the repository in a
different order. To stabilize the order and ensure tests are unchanged
when moving from bundle1 to bundle2, we sort markers when exchanging
them.
In the long run, the obsstore will probably not use a linear storage.
Until this changeset, we were only able to save output if an error happened
during the 'transaction.close()' phase. If the 'processbundle' call raised an
exception, the 'bundleoperation' object was never returned, so the reply bundle
was never accessible and no output could be salvaged. We introduce a quick (but
not very elegant) fix to gain access to any reply created during the processing.
This concludes this output-related series. We should hopefully be able,
client-side, to see the whole server output in the proper order.
The code is now complex enough that refactoring it would make sense on
the default branch.
We were capturing all output issued during bundle2 processing, and all
output issued during transaction rollback in case of failure. However,
the output issued during transaction commit was still roaming the land
freely. It is now put back in line.
This lets the user see output from 'pretxnclose' and 'txnclose' hooks
(and related) in the right order.
External hooks used to write directly to stdout and stderr. As a result,
their output was not captured by the bundle2 processing. This resulted
in confusing, out-of-order output on the client side. We now capture
hook output in this context.
The output from the transaction rollback was not included in the reply
bundle. It was eventually caught by the usual 'unbundle' output capture
and sent to the client, but the result was out of order on the client
side. We now capture the output from the transaction release and
transmit it the same way as all other output.
We should probably rethink the whole output capture thing, but that
would not be appropriate for stable.
There are still multiple cases where output fails to be properly
captured; they will be fixed in later changesets.