Commit Graph

77 Commits

Gregory Szorc
f71c86b7e9 protocol: send application/mercurial-0.2 responses to capable clients
With this commit, the HTTP transport now parses the X-HgProto-<N>
header to determine what media type and compression engine to use for
responses. So far, we only compress responses that are already being
compressed with zlib today (stream response types to specific
commands). We can expand things to cover additional response types
later.
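
As a hedged sketch (the header format, option names, and preference
order below are assumptions rather than the protocol spec), the
server-side negotiation might look roughly like this:

    def negotiate(headers, serverengines=('zstd', 'zlib', 'none')):
        # Reassemble the value spanned across X-HgProto-1, X-HgProto-2, ...
        parts = []
        i = 1
        while ('X-HgProto-%d' % i) in headers:
            parts.append(headers['X-HgProto-%d' % i])
            i += 1
        params = ' '.join(parts).split()

        mediatype = 'application/mercurial-0.1'
        clientcomps = []
        for param in params:
            if param == '0.2':
                mediatype = 'application/mercurial-0.2'
            elif param.startswith('comp='):
                clientcomps = param[len('comp='):].split(',')

        # The first server-preferred engine the client also advertises
        # wins; 0.1 clients keep getting zlib.
        if mediatype == 'application/mercurial-0.2':
            for engine in serverengines:
                if engine in clientcomps:
                    return mediatype, engine
        return 'application/mercurial-0.1', 'zlib'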

The practical side-effect of this commit is that non-zlib compression
engines will be used if both ends support them. This means if both
ends have zstd support, zstd - not zlib - will be used to compress
data!

When cloning the mozilla-unified repository between a local HTTP
server and client, the benefits of non-zlib compression are quite
noticeable:

  engine     server CPU (s)   client CPU (s)    bundle size
zlib (l=6)      174.1            283.2         1,148,547,026
zstd (l=1)       99.2            267.3         1,127,513,841
zstd (l=3)      103.1            266.9         1,018,861,363
zstd (l=7)      128.3            269.7           919,190,278
zstd (l=10)     162.0               -            894,547,179
none             95.3            277.2         4,097,566,064

The default zstd compression level is 3. So if you deploy zstd-capable
Mercurial to your clients and servers and CPU time on
your server is dominated by "getbundle" requests (clients cloning
and pulling) - and my experience at Mozilla tells me this is often
the case - this commit could drastically reduce your server-side
CPU usage *and* save on bandwidth costs!

Another benefit of this change is that server operators can install
*any* compression engine. While it isn't enabled by default, the
"none" compression engine can now be used to disable wire protocol
compression completely. Previously, commands like "getbundle" always
zlib-compressed their output, adding considerable overhead to generating
responses. If you are on a high speed network and your server is under
high load, it might be advantageous to trade bandwidth for CPU.
That said, zstd at level 1 doesn't use that much CPU, so I'm not
convinced that disabling compression wholesale is worthwhile. And my
data seems to indicate a slowdown on the client without compression.
I suspect this is due to a lack of buffering resulting in an increase
in socket read() calls and/or the fact we're transferring an extra 3 GB
of data (parsing HTTP chunked transfer and processing extra TCP packets
can add up). This is definitely worth investigating and optimizing. But
since the "none" compressor isn't enabled by default, I'm inclined to
punt on this issue.

This commit introduces tons of tests. Some of these should arguably
have been implemented in previous commits. But it was difficult to
test without the server functionality in place.
2016-12-24 15:29:32 -07:00
Gregory Szorc
52ab84abd8 httppeer: extract code for HTTP header spanning
A second consumer of HTTP header spanning will soon be introduced.
Factor out the code to do this so it can be reused.
2016-12-24 14:46:02 -07:00
Gregory Szorc
2220c845b2 protocol: declare transport protocol name
We add an attribute to the HTTP and SSH protocol implementations
identifying the transport so future patches can conditionally
expose capabilities on a per-transport basis.
2016-11-28 20:46:59 -08:00
Gregory Szorc
2112fb0fd2 wireproto: perform chunking and compression at protocol layer (API)
Currently, the "streamres" response type is populated with a generator
of chunks with compression possibly already applied. This puts the onus
on commands to perform chunking and compression. Architecturally, I
think this is the wrong place to perform this work. I think commands
should say "here is the data" and the protocol layer should take care
of encoding the final bytes to put on the wire.

Additionally, upcoming commits will improve wire protocol support for
compression. Having a central place for performing compression in the
protocol transport layer will be easier than having to deal with
compression at the commands layer.

This commit refactors the "streamres" response type to accept either
a generator or an object with a read() method. Additionally, the type now
accepts a flag indicating whether the response is a "version 1
compressible" response. This basically identifies all commands
currently performing compression. I could have used a special type
for this, but a flag works just as well. The argument name
foreshadows the introduction of wire protocol changes, hence the "v1."
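
A minimal sketch of the refactored response type (attribute names are
illustrative, not guaranteed to match the real class):

    class streamres(object):
        """Wire protocol response that is streamed to the client.

        Accepts either a generator of byte chunks or an object with a
        read() method. v1compressible marks responses the transport
        should compress, mirroring what version 1 commands already did.
        """
        def __init__(self, gen=None, reader=None, v1compressible=False):
            assert gen is None or reader is None
            self.gen = gen
            self.reader = reader
            self.v1compressible = v1compressible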

The code for chunking and compressing has been moved to the output
generation function for each protocol transport. Some code has been
inlined, resulting in the deletion of now unused methods.
2016-11-20 13:50:45 -08:00
Augie Fackler
5229cbbf19 protocol: drop unused import of zlib
Something weird is happening that breaks pyflakes installed via 'pip
install --user'. I haven't had a chance to finish debugging this, but
this at least fixes the build.
2016-11-10 15:14:05 -05:00
Gregory Szorc
f8bef20b48 hgweb: use compression engine API for zlib compression
More low-level compression code elimination because we now have nice
APIs.

This patch also demonstrates why we needed and implemented the
"level" option on the "compressstream" API.
2016-11-07 18:54:35 -08:00
Gregory Szorc
1538b87cfc wireproto: compress data from a generator
Currently, the "getbundle" wire protocol command obtains a generator of
data, converts it to a util.chunkbuffer, then converts it back to a
generator via the protocol's groupchunks() implementation. For the SSH
protocol, groupchunks() simply reads 4kb chunks then write()s the
data to a file descriptor. For the HTTP protocol, groupchunks() reads
32kb chunks, feeds those into a zlib compressor, emits compressed data
as it is available, and that is sent to the WSGI layer, where it is
likely turned into HTTP chunked transfer chunks as is or further
buffered and turned into a larger chunk.

For both the SSH and HTTP protocols, there is inefficiency from using
util.chunkbuffer.

For SSH, emitting consistent 4kb chunks sounds nice. However, the file
descriptor it is writing to is almost certainly buffered. That means
that a Python .write() probably doesn't translate into exactly what is
written to the I/O layer.

For HTTP, we're going through an intermediate layer to zlib compress
data. So all util.chunkbuffer is doing is ensuring that the chunks we
feed into the zlib compressor are of uniform size. This means more CPU
time in Python buffering and emitting chunks in util.chunkbuffer but
fewer function calls to zlib.

This patch introduces and implements a new wire protocol abstract
method: compresschunks(). It is like groupchunks() except it operates
on a generator instead of something with a .read(). The SSH
implementation simply proxies chunks. The HTTP implementation uses
zlib compression.
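
A hedged sketch of the shape this takes (names are illustrative; the
HTTP implementation compresses the chunks with zlib instead of
proxying them):

    class baseprotocolhandler(object):
        def compresschunks(self, chunks):
            """Encode an iterator of byte chunks for sending on the wire."""
            raise NotImplementedError()

        def groupchunks(self, fh):
            """Encode data read from a file object for the wire."""
            def chunks():
                while True:
                    data = fh.read(32768)
                    if not data:
                        break
                    yield data
            return self.compresschunks(chunks())

    class sshhandler(baseprotocolhandler):
        def compresschunks(self, chunks):
            # SSH responses are not compressed here; simply proxy them.
            for chunk in chunks:
                yield chunk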

To avoid duplicate code, the HTTP groupchunks() has been reimplemented
in terms of compresschunks().

To prove this all works, the "getbundle" wire protocol command has been
switched to compresschunks(). This removes the util.chunkbuffer from
that command. Now, data essentially streams straight from the
changegroup emitter to the wire, possibly through a zlib compressor.
Generators all the way, baby.

There were slim to no performance changes on the server as measured
with the mozilla-central repository. This is likely because CPU
time is dominated by reading revlogs, producing the changegroup, and
zlib compressing the output stream. Still, this brings us a little
closer to our ideal of using generators everywhere.
2016-10-16 11:10:21 -07:00
Gregory Szorc
36f039b85b wireproto: rename argument to groupchunks()
groupchunks() is a generic "turn a file object into a generator"
function. It isn't limited to changegroups. Rename the argument
and update the docstring to reflect this.
2016-09-25 12:20:31 -07:00
Gregory Szorc
312f42b6e4 hgweb: tweak zlib chunking behavior
When doing streaming compression with zlib, zlib appears to emit data
only after ~20-30kb of input, on average, has accumulated. In other
words, most calls to compress() return an empty string. On the
mozilla-unified repo, only 48,433 of 921,167 (5.26%) calls to
compress() returned data. That means we were sending hundreds of
thousands of empty chunks via a generator where they touched who knows
how many frames (my guess is millions). Filtering out the empty chunks
from the generator cuts down on overhead.

In addition, we were previously feeding 8kb chunks into zlib
compression. Since this function tends to emit *compressed* data after
20-30kb is available, it would take several calls before data was
produced. We increase the amount of data fed in at a time to 32kb.
This reduces the number of calls to compress() from 921,167 to
115,146. It also reduces the number of output chunks from 48,433 to
31,377. This does increase the average output chunk size by a little.
But I don't think this will matter in most scenarios.
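
An illustrative sketch of the two tweaks together: feed 32kb per
iteration instead of 8kb, and drop the empty strings that most
compress() calls return so they never travel through the generator
chain:

    import zlib

    def zlibchunks(fh, chunksize=32768):
        z = zlib.compressobj()
        while True:
            chunk = fh.read(chunksize)
            if not chunk:
                break
            data = z.compress(chunk)
            if data:  # skip the ~95% of calls that produce nothing
                yield data
        yield z.flush()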

The combination of these 2 changes appears to shave ~6s CPU time
or ~3% from a server serving the mozilla-unified repo.
2016-08-14 21:29:46 -07:00
Gregory Szorc
118980f02b hgweb: document why we don't allow untrusted settings to control zlib
Added comment per discussion on mercurial-devel.
2016-08-15 20:39:33 -07:00
Gregory Szorc
8720bcb69f hgweb: config option to control zlib compression level
Before this patch, the HTTP transport protocol would always zlib
compress certain responses (notably "getbundle" wire protocol commands)
at zlib compression level 6.

zlib can be a massive CPU resource sink for servers. Some server
operators may wish to reduce server-side CPU requirements while
requiring more bandwidth. This is common on corporate intranets, for
example. Others may wish to use more CPU but reduce bandwidth.

This patch introduces a config option to allow server operators
to control the zlib compression level.

On the "mozilla-unified" generaldelta repository, setting this
value to "0" (disable compression) results in server-side CPU
utilization for a `hg clone` going from ~180s to ~124s CPU time on
my i7-6700K.  A level of "1" (which increases the transfer size from
~1,074 MB at level 6 to ~1,222 MB) utilizes ~132s CPU time.
2016-08-07 18:09:58 -07:00
timeless
109fcbc79e pycompat: switch to util.urlreq/util.urlerr for py3 compat 2016-04-06 23:22:12 +00:00
timeless
f77cdcd3b1 pycompat: switch to util.stringio for py3 compat 2016-04-10 20:55:37 +00:00
Augie Fackler
b3f8347d29 http: support sending hgargs via POST body instead of in GET or headers
narrowhg (for its narrow spec) and remotefilelog (for its large batch
requests) would like to be able to make requests with argument sets so
absurdly large that they blow out the total request size limit on some
http servers. As a workaround, support stuffing args at the start
of the POST body.
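
A hypothetical client-side sketch of that workaround (the header name
and the way the capability is negotiated are assumptions, not the
exact protocol):

    from urllib.parse import urlencode

    def buildrequest(cmd, args, body=b'', postargs=False):
        headers = {'Content-Type': 'application/mercurial-0.1'}
        url = '?cmd=%s' % cmd
        if postargs:
            # Stuff the encoded arguments at the start of the POST body
            # and tell the server how many bytes of the body they occupy.
            argsdata = urlencode(sorted(args.items())).encode('ascii')
            headers['X-HgArgs-Post'] = '%d' % len(argsdata)
            body = argsdata + body
        else:
            url += '&' + urlencode(sorted(args.items()))
        return url, headers, body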

We will probably want to leave this behavior off by default in servers
forever, because it makes the old "POSTs are only for writes"
assumption wrong, which might break some of the simpler authentication
configurations.
2016-03-11 11:37:00 -05:00
Yuya Nishihara
47690f822c hgweb: use absolute_import 2015-10-31 22:07:40 +09:00
Pierre-Yves David
5d414d928b wireproto: introduce an abstractserverproto class
sshserver and webproto now inherit from an abstractserverproto class. This class
is introduced for documentation purposes.
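
A hedged sketch of what such a documentation-only base class looks
like (the methods shown are a representative subset, not the full or
exact interface):

    class abstractserverproto(object):
        """Documents the interface that sshserver and webproto implement."""

        def getargs(self, args):
            """return the value of the arguments requested by a command"""
            raise NotImplementedError()

        def redirect(self):
            """may the server redirect command output to a temporary buffer?"""
            raise NotImplementedError()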
2014-03-28 11:10:33 -07:00
Mads Kiilerich
202753eeb5 hgweb: pass the actual response body to request.response, not just the length
This makes it less likely to send a response that doesn't match Content-Length.
2013-01-15 01:07:03 +01:00
Mads Kiilerich
87b39b3461 hgweb: use Content-Length for pushres
This prevents some unnecessary http connection closes.
2013-01-15 01:05:12 +01:00
Andrew Pritchard
2d8acb3e0b wireproto: add out-of-band error class to allow remote repo to report errors
Older clients will still print the provided error message and not much else:
over ssh, this will be each line prefixed with 'remote: ' in addition to an
"abort: unexpected response: '\n'"; over http, this will be the '---%<---'
banners in addition to the 'does not appear to be a repository' message.

Currently, clients with this patch will display 'abort: remote error:\n' and
the provided error text, but it is trivial to style the error text however is
deemed appropriate.
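
A minimal sketch of the mechanism, assuming an exception type plus a
response wrapper (the names are illustrative rather than Mercurial's
exact ones):

    class OutOfBandError(Exception):
        """Error whose message is reported verbatim to the remote client."""

    class ooberror(object):
        """Wire protocol response carrying an out-of-band error message."""
        def __init__(self, message):
            self.message = message

    def runcommand(repo, proto, func, *args):
        try:
            return func(repo, proto, *args)
        except OutOfBandError as exc:
            # New clients show "abort: remote error:" plus this text;
            # old clients just print it as remote output.
            return ooberror(str(exc) + '\n')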
2011-08-02 15:21:10 -04:00
Idan Kamara
325f77da0a ui: use I/O descriptors internally
and as a result:
- fix webproto to redirect the ui descriptors instead of sys.stdout/err
- fix sshserver to use the ui descriptors
2011-06-08 01:39:20 +03:00
Martin Geisler
af8a35e078 check-code: flag 0/1 used as constant Boolean expression 2011-06-01 12:38:46 +02:00
Matt Mackall
89ec131e91 http: minor tweaks to long arg handling
x-arg -> x-hgarg
replace itertools.count(1)
2011-05-01 03:51:04 -05:00
Steven Brown
c1075f3880 httprepo: long arguments support (issue2126)
Send the command arguments in the HTTP headers. The command is still part
of the URL. If the server does not have the 'httpheader' capability, the
client will send the command arguments in the URL as it did previously.

Web servers typically allow more data to be placed within the headers than
in the URL, so this approach will:
- Avoid HTTP errors due to using a URL that is too large.
- Allow Mercurial to implement a more efficient wire protocol.

An alternate approach is to send the arguments as part of the request body.
This approach has been rejected because it requires the use of POST
requests, so it would break any existing configuration that relies on the
request type for authentication or caching.

Extensibility:
- The header size is provided by the server, which makes it possible to
  introduce an hgrc setting for it.
- The client ignores the capability value after the first comma, which
  allows more information to be included in the future.
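
A hedged client-side sketch of the header spanning (the header names
and size accounting are illustrative):

    from urllib.parse import urlencode

    def argheaders(args, headersize):
        encoded = urlencode(sorted(args.items()))
        # Leave room for a header name like "X-HgArg-999: ".
        valuelen = max(1, headersize - len('X-HgArg-999: '))
        headers = {}
        for i, start in enumerate(range(0, len(encoded), valuelen), 1):
            headers['X-HgArg-%d' % i] = encoded[start:start + valuelen]
        return headers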
2011-05-01 01:04:37 +08:00
Peter Arrenbrecht
5925b26799 wireproto: fix handling of '*' args for HTTP and SSH 2011-03-22 07:38:32 +01:00
Benoit Boissinot
9bd037b429 wireproto/http: drain the incoming bundle in case of errors 2010-10-11 12:47:11 -05:00
Benoit Boissinot
16e11c728a wireproto: introduce pusherr() to deal with "unsynced changes" error
The behaviour between http and ssh still differs:
- the "unsynced changes" error is seen as remote output in the http case
- but it is correctly seen as a push error for ssh
2010-10-11 12:45:36 -05:00
Dirkjan Ochtman
5772f0c9ef protocol: use generators instead of req.write() for hgweb stream responses 2010-07-20 09:56:37 +02:00
Dirkjan Ochtman
bfec66d497 protocol: wrap non-string protocol responses in classes 2010-07-20 20:53:33 +02:00
Dirkjan Ochtman
d80015833d protocol: extract compression from streaming mechanics 2010-07-16 22:20:10 +02:00
Dirkjan Ochtman
3e08fe969b protocol: rename send methods to get grouping by prefix 2010-07-16 18:18:35 +02:00
Dirkjan Ochtman
4d89fb24c9 protocol: shuffle server methods to group send methods 2010-07-16 18:16:15 +02:00
Dirkjan Ochtman
1a4105fea1 protocol: command must be checked before passing in 2010-07-16 19:01:34 +02:00
Matt Mackall
20113ea93e protocol: move hgweb protocol support back into protocol.py
- introduce iscmd
- simplify error handling
- remove unneeded imports
2010-07-15 15:05:04 -05:00
Matt Mackall
0cc5d56580 protocol: unify server-side capabilities functions 2010-07-15 13:56:52 -05:00
Matt Mackall
a6024ca63a protocol: unify unbundle on the server side 2010-07-15 11:24:42 -05:00
Matt Mackall
47c6d08427 protocol: unify stream_out command 2010-07-14 16:19:27 -05:00
Matt Mackall
050367f581 protocol: unify changegroup commands
- add sendchangegroup protocol helpers
- handle commands with None results
- move changegroup commands into wireproto.py
2010-07-14 15:43:20 -05:00
Matt Mackall
24246d7bcf protocol: use new wireproto infrastructure in ssh
- add protocol helper
- insert wireproto into dispatcher
- drop duplicate functions from hgweb implementation
2010-07-14 15:25:15 -05:00
Matt Mackall
e4cf775b71 addchangegroup: pass in lock to release it before changegroup hook is called
Currently, callers of addchangegroup first acquire the repository
lock, usually to check that an unbundle request isn't racing. This
means that changegroup hook actions that might write to a repo get
stuck waiting for a lock. Here, we add a new optional lock parameter
and update all the callers. Post-1.6 we may make it non-optional.
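
A simplified sketch of the new calling convention (everything except
the hook name is illustrative):

    def addchangegroup(repo, source, url, lock=None):
        # ... apply the incoming changegroup to the repository here ...
        if lock is not None:
            # Release before running hooks so a hook that writes to the
            # repository does not block waiting on the lock we still hold.
            lock.release()
            lock = None
        repo.hook('changegroup', source=url)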
2010-06-25 13:47:28 -05:00
Matt Mackall
d8e0a2188b pushkey: add http support
pushkey requires the same permissions as push
listitems requires the same permissions as pull
2010-06-16 16:05:19 -05:00
Mark Determann
7b2e3d8cf6 hgweb: fix attribute error in error response (issue2060) 2010-04-01 22:04:30 +01:00
Sune Foldager
272ea1e4f1 hgweb: use string join instead of slower cStringIO 2010-02-23 11:37:40 -05:00
Sune Foldager
b4bd699e4d hgweb: fix handling of arguments in the between command
The 'pairs' argument was coded to be optional, but the code would
crash if it was not provided.
2010-02-23 11:34:08 -05:00
Matt Mackall
b7afbe529a streamclone: allow uncompressed clones by default 2010-02-07 15:31:53 +01:00
Matt Mackall
595d66f424 Update license to GPLv2+ 2010-01-19 22:20:08 -06:00
Dirkjan Ochtman
d1740999b1 hgweb/sshserver: extract capabilities for easier modification 2009-11-05 11:07:01 +01:00
Sune Foldager
ee001cdc90 hgweb: send proper error messages to the client
Fixes a bug in protocol which caused an exception during exception handling in
some cases on Windows. Also makes sure the server error message is correctly
propagated to the client, instead of being thrown away.
2009-11-02 10:20:04 +01:00
Martin Geisler
ae0794fd45 coding style: use a space after comma
I left cases like 'lambda x,y:' alone -- the lack of a space does
not bother me as much when the variables are single letters.
2009-07-22 23:12:54 +02:00
Henrik Stuart
b843569ec5 acl: support for getting authenticated user from web server (issue298)
Previously, the acl extension just read the current system user, which
is fine for direct file system access and SSH, but will not work for
HTTP(S), as that would return the web server process's user identity
rather than the authenticated user. An empty user is returned if the
user is not authenticated.
2009-06-07 20:31:38 +02:00
Henrik Stuart
45e3728174 hgweb: escape REMOTE_HOST when passing url for addchangegroup
If DNS lookups are turned off on the web server, REMOTE_HOST may be
populated with REMOTE_ADDR, which, if the remote is an IPv6 host, will
contain colons, thus interfering with the separator character. This is
solved by URL quoting the REMOTE_HOST string.
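
An illustrative sketch of the fix (the exact url layout is beside the
point):

    from urllib.parse import quote

    def changegroupurl(environ):
        scheme = environ.get('wsgi.url_scheme', 'http')
        host = environ.get('REMOTE_HOST') or environ.get('REMOTE_ADDR', '')
        # Quoting turns the colons of an IPv6 address into %3A so they
        # cannot be mistaken for the url's separator character.
        return 'remote:%s:%s' % (scheme, quote(host))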
2009-06-07 20:15:37 +02:00