sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-10 00:45:18 +03:00

Author	SHA1	Message	Date
Gregory Szorc	1538b87cfc	wireproto: compress data from a generator Currently, the "getbundle" wire protocol command obtains a generator of data, converts it to a util.chunkbuffer, then converts it back to a generator via the protocol's groupchunks() implementation. For the SSH protocol, groupchunks() simply reads 4kb chunks then write()s the data to a file descriptor. For the HTTP protocol, groupchunks() reads 32kb chunks, feeds those into a zlib compressor, emits compressed data as it is available, and that is sent to the WSGI layer, where it is likely turned into HTTP chunked transfer chunks as is or further buffered and turned into a larger chunk. For both the SSH and HTTP protocols, there is inefficiency from using util.chunkbuffer. For SSH, emitting consistent 4kb chunks sounds nice. However, the file descriptor it is writing to is almost certainly buffered. That means that a Python .write() probably doesn't translate into exactly what is written to the I/O layer. For HTTP, we're going through an intermediate layer to zlib compress data. So all util.chunkbuffer is doing is ensuring that the chunks we feed into the zlib compressor are of uniform size. This means more CPU time in Python buffering and emitting chunks in util.chunkbuffer but fewer function calls to zlib. This patch introduces and implements a new wire protocol abstract method: compresschunks(). It is like groupchunks() except it operates on a generator instead of something with a .read(). The SSH implementation simply proxies chunks. The HTTP implementation uses zlib compression. To avoid duplicate code, the HTTP groupchunks() has been reimplemented in terms of compresschunks(). To prove this all works, the "getbundle" wire protocol command has been switched to compresschunks(). This removes the util.chunkbuffer from that command. Now, data essentially streams straight from the changegroup emitter to the wire, possibly through a zlib compressor. Generators all the way, baby. There were slim to no performance changes on the server as measured with the mozilla-central repository. This is likely because CPU time is dominated by reading revlogs, producing the changegroup, and zlib compressing the output stream. Still, this brings us a little closer to our ideal of using generators everywhere.	2016-10-16 11:10:21 -07:00
Gregory Szorc	26f6f03d4c	exchange: refactor APIs to obtain bundle data (API) Currently, exchange.getbundle() returns either a cg1unpacker or a util.chunkbuffer (in the case of bundle2). This is kinda OK, as both expose a .read() to consumers. However, localpeer.getbundle() has code inferring what the response type is based on arguments and converts the util.chunkbuffer returned in the bundle2 case to a bundle2.unbundle20 instance. This is a sign that the API for exchange.getbundle() is not ideal because it doesn't consistently return an "unbundler" instance. In addition, unbundlers mask the fact that there is an underlying generator of changegroup data. In both cg1 and bundle2, this generator is being fed into a util.chunkbuffer so it can be re-exposed as a file object. util.chunkbuffer is a nice abstraction. However, it should only be used "at the edges." This is because keeping data as a generator is more efficient than converting it to a chunkbuffer, especially if we convert that chunkbuffer back to a generator (as is the case in some code paths currently). This patch refactors exchange.getbundle() into exchange.getbundlechunks(). The new API returns an iterator of chunks instead of a file-like object. Callers of exchange.getbundle() have been updated to use the new API. There is a minor change of behavior in test-getbundle.t. This is because `hg debuggetbundle` isn't defining bundlecaps. As a result, a cg1 data stream and unpacker is being produced. This is getting fed into a new bundle20 instance via bundle2.writebundle(), which uses a backchannel mechanism between changegroup generation to add the "nbchanges" part parameter. I never liked this backchannel mechanism and I plan to remove it someday. `hg bundle` still produces the "nbchanges" part parameter, so there should be no user-visible change of behavior. I consider this "regression" a bug in `hg debuggetbundle`. And that bug is captured by an existing "TODO" in the code to use bundle2 capabilities.	2016-10-16 10:38:52 -07:00
Gregory Szorc	36f039b85b	wireproto: rename argument to groupchunks() groupchunks() is a generic "turn a file object into a generator" function. It isn't limited to changegroups. Rename the argument and update the docstring to reflect this.	2016-09-25 12:20:31 -07:00
Gregory Szorc	6fcb5a026b	wireproto: remove gboptslist (API) This variable has been unused since 4e11f86d8ce7, which was over 2 years ago. gboptsmap should be used instead. Marking as API because this could break extensions.	2016-08-06 15:00:34 -07:00
Gregory Szorc	b3c092eb61	wireproto: unescape argument names in batch command (BC) Clients escape both argument names and values when using the batch command. Yet the server was only unescaping argument values. Fortunately we don't have any argument names that need escaped. But that isn't an excuse to lack symmetry in the code. Since the server wasn't properly unescaping argument names, this means we can never introduce an argument to a batchable command that needs escaped because an old server wouldn't properly decode its name. So we've introduced an assertion to detect the accidental introduction of this in the future. Of course, we could introduce a server capability that says the server knows how to decode argument names and allow special argument names to go through. But until there is a need for it (which I doubt there will be), we shouldn't bother with adding an unused capability.	2016-08-06 13:55:21 -07:00
Gregory Szorc	7315f10dde	wireproto: consolidate code for obtaining "cmds" argument value Both wireproto.py and sshpeer.py had code for producing the value to the "cmds" argument used by the "batch" command. This patch extracts that code to a standalone function and uses it.	2016-08-06 13:46:28 -07:00
Augie Fackler	d19a4d923c	wirepeer: rename confusing `source` parameter It's named "url" everyplace else this method is defined, so let's be consistent.	2016-08-05 16:34:30 -04:00
Gregory Szorc	589c58d0cd	wireproto: extract repo filtering to standalone function As part of teaching Mozilla's replication extension to better handle repositories with obsolescence data, I encountered a few scenarios where I wanted built-in wire protocol commands from replication clients to operate on unfiltered repositories so they could have access to obsolete changesets. While the undocumented "web.view" config option provides a mechanism to choose what filter/view hgweb operates on, this doesn't apply to wire protocol commands because wireproto.dispatch() is always operating on the "served" repo. This patch extracts the line for obtaining the repo that wireproto commands operate on to its own function so extensions can monkeypatch it to e.g. return an unfiltered repo. I stopped short of exposing a config option because I view the use case for changing this as a niche feature, best left to the domain of extensions.	2016-07-15 13:41:34 -07:00
Augie Fackler	ad67b99d20	cleanup: replace uses of util.(md5\|sha1\|sha256\|sha512) with hashlib.\1 All versions of Python we support or hope to support make the hash functions available in the same way under the same name, so we may as well drop the util forwards.	2016-06-10 00:12:33 -04:00
timeless	a1cb3173a2	py3: convert to next() function next(..) was introduced in py2.6 and .next() is not available in py3 https://docs.python.org/2/library/functions.html#next	2016-05-16 21:30:53 +00:00
Augie Fackler	6a7aaa4bf4	wireproto: optimize handling of large batch responses Now that batch can be used by remotefilelog, the quadratic string copying this was doing was actually disastrous. In my local testing, fetching a 56 meg file used to take 3 minutes, and now takes only a few seconds.	2016-05-12 09:39:14 -04:00
timeless	109fcbc79e	pycompat: switch to util.urlreq/util.urlerr for py3 compat	2016-04-06 23:22:12 +00:00
Martin von Zweigbergk	4cc86f7b27	bundle: move writebundle() from changegroup.py to bundle2.py (API) writebundle() writes a bundle2 bundle or a plain changegroup1. Imagine away the "2" in "bundle2.py" for a moment and this change should makes sense. The bundle wraps the changegroup, so it makes sense that it knows about it. Another sign that this is correct is that the delayed import of bundle2 in changegroup goes away. I'll leave it for another time to remove the "2" in "bundle2.py" (alternatively, extract a new bundle.py from it).	2016-03-28 14:41:29 -07:00
Augie Fackler	b3f8347d29	http: support sending hgargs via POST body instead of in GET or headers narrowhg (for its narrow spec) and remotefilelog (for its large batch requests) would like to be able to make requests with argument sets so absurdly large that they blow out total request size limit on some http servers. As a workaround, support stuffing args at the start of the POST body. We will probably want to leave this behavior off by default in servers forever, because it makes the old "POSTs are only for writes" assumption wrong, which might break some of the simpler authentication configurations.	2016-03-11 11:37:00 -05:00
Augie Fackler	af1601947d	wireproto: make iterbatcher behave streamily over http(s) Unfortunately, the ssh and http implementations are slightly different due to differences in their _callstream implementations, which prevents ssh from behaving streamily. We should probably introduce a new batch command that can stream results over ssh at some point in the near future. The streamy behavior of batch over http(s) is an enormous win for remotefilelog over http: in my testing, it's saving about 40% on file fetches with a cold cache against a server on localhost.	2016-03-01 18:41:43 -05:00
Augie Fackler	19d5a9a428	peer: add an iterbatcher interface This is very much like ordinary batch(), but it will let me add a mode for batch where we have pathologically large requests which are then handled streamily. This will be a significant improvement for things like remotefilelog, which may want to request thousands of entities at once.	2016-03-01 18:39:25 -05:00
Augie Fackler	44da57a9e4	wireproto: document quirk of _callstream between http and ssh This tripped me up when trying to use it, so it feels like we should document this to avoid future pain.	2016-03-02 14:18:43 -05:00
Gregory Szorc	b5e53cc1d6	wireproto: support disabling bundle1 only if repo is generaldelta I recently implemented the server.bundle1* options to control whether bundle1 exchange is allowed. After thinking about Mozilla's strategy for handling generaldelta rollout a bit more, I think server operators need an additional lever: disable bundle1 if and only if the repo is generaldelta. bundle1 exchange for non-generaldelta repos will not have the potential for CPU explosion that generaldelta repos do. Therefore, it makes sense for server operators to continue to allow bundle1 exchange for non-generaldelta repos without having to set a per-repo hgrc option to change the policy depending on whether the repo is generaldelta. This patch introduces a new set of options to control bundle1 behavior for generaldelta repos. These options enable server operators to limit bundle1 restrictions to the class of repos that can be performance issues. It also allows server operators to tie bundle1 access to store format. In many server environments (including Mozilla's), legacy repos will not be generaldelta and new repos will or might be. New repos often aren't bound by legacy access requirements, so setting a global policy that disallows access to new/generaldelta repos via bundle1 could be a reasonable policy in many server environments. This patch makes this policy very easy to implement (modify global hgrc, add options to existing generaldelta repos to grandfather them in).	2015-12-20 11:56:24 -08:00
Gregory Szorc	9af952ad6e	wireproto: config options to disable bundle1 bundle2 is the new and preferred wire protocol format. For various reasons, server operators may wish to force clients to use it. One reason is performance. If a repository is stored in generaldelta, the server must recompute deltas in order to produce the bundle1 changegroup. This can be extremely expensive. For mozilla-central, bundle generation typically takes a few minutes. However, generating a non-gd bundle from a generaldelta encoded mozilla-central requires over 30 minutes of CPU! If a large repository like mozilla-central were encoded in generaldelta and non-gd clients connected, they could easily flood a server by cloning. This patch gives server operators config knobs to control whether bundle1 is allowed for push and pull operations. The default is to support legacy bundle1 clients, making this patch backwards compatible.	2015-12-04 15:12:11 -08:00
Gregory Szorc	49ce77c867	wireproto: add docstring for wirepeer	2015-12-04 13:15:14 -08:00
Pierre-Yves David	2e9e629297	stream: sort stream capability before serialisation We want that capability to be stable in our testing. This is currently not an issue because the set is size 1, but this will be once generaldelta related data gets in there.	2015-10-20 12:28:42 +02:00
Gregory Szorc	61b9ffeec8	wireproto: move clonebundles command from extension (issue4931) The SSH peer class accesses wireproto.commands[cmd] as part of encoding command arguments. Previously, the wire protocol command was defined in the clonebundles extension. If the client didn't have this extension enabled (which it likely doesn't since it is meant as a server-side extension), then clients attempting to clone via ssh:// would get a crash due to a KeyError accessing wireproto.commands['clonebundles'] when cloning from a server that is advertising clone bundles. Moving the definition of the wire protocol command to wireproto.py makes this problem go away. A side effect of this code move is servers will always respond to "clonebundles" wire protocol command requests. This should be fine: the server will return an empty response unless a clone bundles manifest file is present and clients shouldn't call the command unless the server is advertising the capability, which only happens if the clonebundles extension is enabled and the manifest file exists.	2015-11-03 12:31:33 -08:00
Gregory Szorc	9f94bd29c6	exchange: advertise if a clone bundle was attempted The client now sends a "cbattempted" boolean flag to the "getbundle" wire protocol command to tell the server whether a clone bundle was attempted. The presence of this flag will enable the server to conditionally emit a bundle2 "output" part advertising the availability of clone bundles to compatible clients that don't have it enabled.	2015-10-14 10:36:20 -07:00
Gregory Szorc	b794e37a76	wireproto: properly parse false boolean args (BC) The client represents boolean arguments as '0' and '1'. bool('0') == bool('1') == True, so a simple bool(val) isn't sufficient for converting the argument back to a bool type. Currently, "obsmarkers" is the only boolean argument to getbundle. I /think/ the only place where we currently set the "obsmarkers" argument is during bundle2 pulls. As a result of this bug, the server /might/ be sending obsolete markers bundle2 part(s) to clients that don't request them. That is why I marked this BC. Surprisingly there was no test fall out from this change. I suspect a lapse in test coverage.	2015-10-14 10:58:35 -07:00
Pierre-Yves David	30913031d4	error: get Abort from 'error' instead of 'util' The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be confused about that and gives all the credit to 'util' instead of the hardworking 'error'. In a spirit of equity, we break the cycle of injustice and give back to 'error' the respect it deserves. And screw that 'util' poser. For great justice.	2015-10-08 12:55:45 -07:00
Gregory Szorc	9250e99393	streamclone: move payload header generation into own function The stream clone data over the wire protocol contains a header line indicating total file count and data size. In bundle2, this metadata can be captured by a part parameter and doesn't need to be in the body. In preparation for bundle2, have generatev1() return the raw metadata and move the header generation to its own function.	2015-10-04 19:06:06 -07:00
Gregory Szorc	40483f6f59	streamclone: move _allowstream() from wireproto While we're moving things into streamclone.py...	2015-10-02 16:24:56 -07:00
Gregory Szorc	d8e74180f0	streamclone: move code out of exchange.py We bulk move functions from exchange.py related to streaming clones. Function names were renamed slightly to drop a component redundant with the module name. Docstrings and comments referencing old names and locations were updated accordingly.	2015-10-02 16:05:52 -07:00
Gregory Szorc	f01d323a28	wireproto: use absolute_import	2015-08-08 18:53:17 -07:00
Augie Fackler	c93fd0260e	wireproto: make wirepeer look-before-you-leap on batching This means that users of request batching don't need to worry themselves with capability checking. Instead, they can just use batching, and if the remote server doesn't support batching for some reason the wirepeer code will transparently un-batch the requests. This will allow for some slight simplification in a handful of places. Prior to this change, largefiles would have been silently broken against a server which did not support batching.	2015-08-05 14:15:17 -04:00
Augie Fackler	6a6926de03	batching: migrate basic noop batching into peer.peer "Real" batching only makes sense for wirepeers, but it greatly simplifies the clients of peer instances if they can be ignorant to actual batching capabilities of that peer. By moving the not-really-batched batching code into peer.peer, all peer instances now work with the batching API, thus simplifying users. This leaves a couple of name forwards in wirepeer.py. Originally I had planned to clean those up, but it kind of unclarifies other bits of code that want to use batching, so I think it makes sense for the names to stay exposed by wireproto. Specifically, almost nothing is currently aware of peer (see largefiles.proto for an example), so making them be aware of the peer module and the wireproto module seems like some abstraction leakage. I think the right long-term fix would actually be to make wireproto an implementation detail that clients wouldn't need to know about, but I don't really know what that would entail at the moment. As far as I'm aware, no clients of batching in third-party extensions will need updating, which is nice icing.	2015-08-05 14:51:34 -04:00
Pierre-Yves David	3f4dfaf501	wireproto: remove a debug print This looks like someone forgot something here.	2015-06-30 22:02:40 -07:00
Augie Fackler	8642271531	wireproto: correctly escape batched args and responses (issue4739) This issue appears to be as old as wireproto batching itself: I can reproduce the failure as far back as 6afda0a50a20 trivially by rebasing the test changes in this patch, which was back in the 1.9 era. I didn't test before that change, because prior to that the testfile has a different name and I'm lazy. Note that the test thought it was checking this case, but it actually wasn't: it put a literal ; in the arg and response for its greet command, but the mangle/unmangle step defined in the test meant that instead of "Fo, =;o" going over the wire, "Gp-!><p" went instead, which doesn't contain any special characters (those being [.=;]) and thus not exercising the escaping. The test has been updated to use pre-unmangled special characters, so the request is now "Fo+<:o", which mangles to "Gp,=;p". I have confirmed that the test fails without the adjustment to the escaping rules in wireproto.py. No existing clients of RPC batching were depending on the old behavior in any way. The only actual users of batchable RPCs in core were: 1) largefiles, wherein it batches up many statlfile calls. It sends hexlified hashes over the wire and gets a 0, 1, or 2 back as a response. No risk of special characters. 2) setdiscovery, which was using heads() and known(), both of which communicate via hexlified nodes. Again, no risk of special characters. Since the escaping functionality has been completely broken since it was introduced, we know that it has no users. As such, we can change the escaping mechanism without having to worry about backwards compatibility issues. For the curious, this was detected by chance: it happens that the lz4-compressed text of a test file for remotefilelog compressed to something containing a ;, which then caused the failure when I moved remotefilelog to using batching for file content fetching.	2015-06-30 19:19:17 -04:00
Mike Edgar	2dcf6e652f	wireproto: add config knob for http header length limit Well-behaved Mercurial clients will respect the httpheader capability by not sending http headers longer than the given limit in bytes. The limit is currently hard-coded at 1024 bytes, a safe value for any web server. Since parsing headers is a notable factor in web server performance, tuning header size can nontrivially improve performance for request-heavy operations (eg. obsolete marker negotiation). Exposing the maximum header length limit as a configuration setting is a simple way to enable such tuning.	2015-06-29 12:35:31 -04:00
Gregory Szorc	5380dea2a7	global: mass rewrite to use modern exception syntax Python 2.6 introduced the "except type as instance" syntax, replacing the "except type, instance" syntax that came before. Python 3 dropped support for the latter syntax. Since we no longer support Python 2.4 or 2.5, we have no need to continue supporting the "except type, instance". This patch mass rewrites the exception syntax to be Python 2.6+ and Python 3 compatible. This patch was produced by running `2to3 -f except -w -n .`.	2015-06-23 22:20:08 -07:00
Pierre-Yves David	e567efc154	bundle2: convey PushkeyFailed error over the wire We add a way to convey the precise exception. This will allow better error message on the server.	2015-06-10 13:10:53 -04:00
Pierre-Yves David	cf1fa4bc36	wireprotocol: distinguish list and set in getbundle argument The 'bundlecaps' argument is expected to be a set, but 'listkeys' is expected to be a list where ordering matters. We introduce a new 'scsv' argument type for the 'set' version and move 'csv' to the 'list' version. 'test-ssh.t' is changed because this introduced an instability in the order we were producing listkeys parts.	2015-06-01 10:28:40 -07:00
Pierre-Yves David	c7946f172c	listkey: display the size of the listkey payload in a debug message This is a useful information to have in general and we already have debug output related to listkeys. I'm planning to play around with massive amount of phases roots and bookmarks so having this data in debug will be very useful. This already got me to spot that one of the Logilab's review repo is exchanging 65KB of phases data during each exchanges.	2015-05-28 23:49:19 -07:00
Martin von Zweigbergk	1ed9409235	wireproto: remove unused 'store' import	2015-05-21 11:34:40 -07:00
Gregory Szorc	ce88e05117	exchange: move code for generating a streaming clone into exchange Streaming clones are fast because they are essentially tar files. On mozilla-central, a streaming clone only consumes ~55s CPU time on clients as opposed to ~340s CPU time for a regular clone or gzip bundle unbundle. Mozilla is deploying static file "lookaside" support to our Mercurial server. Static bundles are pre-generated and uploaded to S3. When a clone is performed, the static file is fetched, applied, and then an incremental pull is performed. Unfortunately, on an ideal network connection this still takes as much wall and CPU time as a regular clone (although it does save significant server resources). We like the client-side wall time wins of streaming clones. But we want to leverage S3-based pre-generated files for serving the bulk of clone data. This patch moves the code for producing a "stream bundle" into its own standalone function, away from the wire protocol. This will enable stream bundle files to be produced outside the context of the wire protocol. A bikeshed on whether exchange is the best module for this function might be warranted. I selected exchange instead of changegroup because "stream bundles" aren't changegroups (yet).	2015-05-21 10:27:22 -07:00
Pierre-Yves David	47040c45dd	wireproto: turn an 'except' into a 'finally' as suggest by the comment Look! More hidden footprints!	2015-05-18 13:25:07 -05:00
Augie Fackler	a5b17bd9d1	cleanup: use __builtins__.any instead of util.any any() is available in all Python versions we support now.	2015-05-16 14:30:07 -04:00
Pierre-Yves David	ea218938aa	getbundle: sort bundlecaps before exchanging then over the wire The 'bundlecaps' argument is built as a set, we need to stabilise the order before exchanging them. Otherwise, in the test, http logs are unstable when the 'bundlecaps' contains something (eg: using bundle2).	2015-05-10 05:11:13 -07:00
Pierre-Yves David	0bbcf25859	bundle2-wireproto: properly propagate the server output on error (issue4594) In case of errors, output parts salvaged from the reply bundle are re-injected into the bundle carrying the exception. We still need to fix the situation for non-wireprotocol push.	2015-04-16 03:17:37 -04:00
Pierre-Yves David	e97e706e0f	bundle2: refactor error bundle creation for the wireprotocol We want to add output information to the error bundle. Before doing this, we rework the code to have a single bundler creation and return statement. This will make the update with the output simpler as only one place will have to be touched.	2015-04-16 03:56:50 -04:00
Pierre-Yves David	393230ee25	bundle2: advertise bundle2 by default That way, any new server will be ready to accept bundle2 payload. The decision for the client to use it is still off by default so this is not turning bundle2 everywhere. We introduce a new kill switch for this in case stuff goes wrong.	2015-04-10 15:41:33 -04:00
Pierre-Yves David	af7d20b000	bundle2: rename format, parts and config to final names It is finally time to freeze the bundle2 format! To do so we: - rename HG2Y to HG20, - drop "b2x:" prefix from all part names, - rename capability to "bundle2-exp" to "bundle2" - rename the hook flag from 'bundle2-exp' to 'bundle2'	2015-04-09 16:25:48 -04:00
Pierre-Yves David	151c000fca	bundle2: detect bundle2 stream/request on /HG2./ instead of /HG2Y/ To support more bundle2 formats, we need a wider detection of bundle2-family streams. The various places what were explicitly detecting the full magic string are now matching on the first three characters of it.	2015-04-07 16:01:32 -07:00
Pierre-Yves David	9dd801717e	unbundle20: retrieve unbundler instances through a factory function To support multiple bundle2 formats, we will need a function returning the proper unbundler according to the header. We introduce such aa function and change the usage in the code base. The function will get smarter in later changesets. This is somewhat similar to the dispatching we do for 'HG10' and 'HG11'. The main target is to allow HG2Y support in an extension to ease transition of companies using the experimental protocol in production (yeah...) But I've no doubt this will be useful when playing with a future HG21.	2015-04-06 16:04:33 -07:00
Angel Ezquerra	6e49f7def8	localrepo: remove all external users of localrepo.sopener This change touches every module in which repository.sopener was being used, and changes it for the equivalent repository.svfs. It should now be possible to remove localrepo.sopener.	2015-01-11 00:25:54 +01:00

1 2 3 4

199 Commits