Commit Graph

77 Commits

Gregory Szorc
f71c86b7e9 protocol: send application/mercurial-0.2 responses to capable clients
With this commit, the HTTP transport now parses the X-HgProto-<N>
header to determine what media type and compression engine to use for
responses. So far, we only compress responses that are already being
compressed with zlib today (stream response types to specific
commands). We can expand things to cover additional response types
later.
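
As a hedged sketch (the header format, option names, and preference
order below are assumptions rather than the protocol spec), the
server-side negotiation might look roughly like this:

    def negotiate(headers, serverengines=('zstd', 'zlib', 'none')):
        # Reassemble the value spanned across X-HgProto-1, X-HgProto-2, ...
        parts = []
        i = 1
        while ('X-HgProto-%d' % i) in headers:
            parts.append(headers['X-HgProto-%d' % i])
            i += 1
        params = ' '.join(parts).split()

        mediatype = 'application/mercurial-0.1'
        clientcomps = []
        for param in params:
            if param == '0.2':
                mediatype = 'application/mercurial-0.2'
            elif param.startswith('comp='):
                clientcomps = param[len('comp='):].split(',')

        # The first server-preferred engine the client also advertises
        # wins; 0.1 clients keep getting zlib.
        if mediatype == 'application/mercurial-0.2':
            for engine in serverengines:
                if engine in clientcomps:
                    return mediatype, engine
        return 'application/mercurial-0.1', 'zlib'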

The practical side-effect of this commit is that non-zlib compression
engines will be used if both ends support them. This means if both
ends have zstd support, zstd - not zlib - will be used to compress
data!

When cloning the mozilla-unified repository between a local HTTP
server and client, the benefits of non-zlib compression are quite
noticeable:

  engine     server CPU (s)   client CPU (s)    bundle size
zlib (l=6)      174.1            283.2         1,148,547,026
zstd (l=1)       99.2            267.3         1,127,513,841
zstd (l=3)      103.1            266.9         1,018,861,363
zstd (l=7)      128.3            269.7           919,190,278
zstd (l=10)     162.0               -            894,547,179
none             95.3            277.2         4,097,566,064

The default zstd compression level is 3. So if you deploy zstd-capable
Mercurial to your clients and servers and CPU time on
your server is dominated by "getbundle" requests (clients cloning
and pulling) - and my experience at Mozilla tells me this is often
the case - this commit could drastically reduce your server-side
CPU usage *and* save on bandwidth costs!

Another benefit of this change is that server operators can install
*any* compression engine. While it isn't enabled by default, the
"none" compression engine can now be used to disable wire protocol
compression completely. Previously, commands like "getbundle" always
zlib-compressed their output, adding considerable overhead to generating
responses. If you are on a high speed network and your server is under
high load, it might be advantageous to trade bandwidth for CPU.
That said, zstd at level 1 doesn't use that much CPU, so I'm not
convinced that disabling compression wholesale is worthwhile. And my
data seems to indicate a slowdown on the client without compression.
I suspect this is due to a lack of buffering resulting in an increase
in socket read() calls and/or the fact we're transferring an extra 3 GB
of data (parsing HTTP chunked transfer and processing extra TCP packets
can add up). This is definitely worth investigating and optimizing. But
since the "none" compressor isn't enabled by default, I'm inclined to
punt on this issue.

This commit introduces tons of tests. Some of these should arguably
have been implemented in previous commits. But it was difficult to
test without the server functionality in place.
2016-12-24 15:29:32 -07:00
Gregory Szorc
52ab84abd8 httppeer: extract code for HTTP header spanning
A second consumer of HTTP header spanning will soon be introduced.
Factor out the code to do this so it can be reused.
2016-12-24 14:46:02 -07:00
Gregory Szorc
2220c845b2 protocol: declare transport protocol name
We add an attribute to the HTTP and SSH protocol implementations
identifying the transport so future patches can conditionally
expose capabilities on a per-transport basis.
2016-11-28 20:46:59 -08:00
Gregory Szorc
2112fb0fd2 wireproto: perform chunking and compression at protocol layer (API)
Currently, the "streamres" response type is populated with a generator
of chunks with compression possibly already applied. This puts the onus
on commands to perform chunking and compression. Architecturally, I
think this is the wrong place to perform this work. I think commands
should say "here is the data" and the protocol layer should take care
of encoding the final bytes to put on the wire.

Additionally, upcoming commits will improve wire protocol support for
compression. Having a central place for performing compression in the
protocol transport layer will be easier than having to deal with
compression at the commands layer.

This commit refactors the "streamres" response type to accept either
a generator or an object with a read() method. Additionally, the type now
accepts a flag indicating whether the response is a "version 1
compressible" response. This basically identifies all commands
currently performing compression. I could have used a special type
for this, but a flag works just as well. The argument name
foreshadows the introduction of wire protocol changes, hence the "v1."
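
A minimal sketch of the refactored response type (attribute names are
illustrative, not guaranteed to match the real class):

    class streamres(object):
        """Wire protocol response that is streamed to the client.

        Accepts either a generator of byte chunks or an object with a
        read() method. v1compressible marks responses the transport
        should compress, mirroring what version 1 commands already did.
        """
        def __init__(self, gen=None, reader=None, v1compressible=False):
            assert gen is None or reader is None
            self.gen = gen
            self.reader = reader
            self.v1compressible = v1compressible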

The code for chunking and compressing has been moved to the output
generation function for each protocol transport. Some code has been
inlined, resulting in the deletion of now unused methods.
2016-11-20 13:50:45 -08:00
Augie Fackler
5229cbbf19 protocol: drop unused import of zlib
Something weird is happening that breaks pyflakes installed via 'pip
install --user'. I haven't had a chance to finish debugging this, but
this at least fixes the build.
2016-11-10 15:14:05 -05:00
Gregory Szorc
f8bef20b48 hgweb: use compression engine API for zlib compression
More low-level compression code elimination because we now have nice
APIs.

This patch also demonstrates why we needed and implemented the
"level" option on the "compressstream" API.
2016-11-07 18:54:35 -08:00
Gregory Szorc
1538b87cfc wireproto: compress data from a generator
Currently, the "getbundle" wire protocol command obtains a generator of
data, converts it to a util.chunkbuffer, then converts it back to a
generator via the protocol's groupchunks() implementation. For the SSH
protocol, groupchunks() simply reads 4kb chunks then write()s the
data to a file descriptor. For the HTTP protocol, groupchunks() reads
32kb chunks, feeds those into a zlib compressor, emits compressed data
as it is available, and that is sent to the WSGI layer, where it is
likely turned into HTTP chunked transfer chunks as is or further
buffered and turned into a larger chunk.

For both the SSH and HTTP protocols, there is inefficiency from using
util.chunkbuffer.

For SSH, emitting consistent 4kb chunks sounds nice. However, the file
descriptor it is writing to is almost certainly buffered. That means
that a Python .write() probably doesn't translate into exactly what is
written to the I/O layer.

For HTTP, we're going through an intermediate layer to zlib compress
data. So all util.chunkbuffer is doing is ensuring that the chunks we
feed into the zlib compressor are of uniform size. This means more CPU
time in Python buffering and emitting chunks in util.chunkbuffer but
fewer function calls to zlib.

This patch introduces and implements a new wire protocol abstract
method: compresschunks(). It is like groupchunks() except it operates
on a generator instead of something with a .read(). The SSH
implementation simply proxies chunks. The HTTP implementation uses
zlib compression.
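
A hedged sketch of the shape this takes (names are illustrative; the
HTTP implementation compresses the chunks with zlib instead of
proxying them):

    class baseprotocolhandler(object):
        def compresschunks(self, chunks):
            """Encode an iterator of byte chunks for sending on the wire."""
            raise NotImplementedError()

        def groupchunks(self, fh):
            """Encode data read from a file object for the wire."""
            def chunks():
                while True:
                    data = fh.read(32768)
                    if not data:
                        break
                    yield data
            return self.compresschunks(chunks())

    class sshhandler(baseprotocolhandler):
        def compresschunks(self, chunks):
            # SSH responses are not compressed here; simply proxy them.
            for chunk in chunks:
                yield chunk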

To avoid duplicate code, the HTTP groupchunks() has been reimplemented
in terms of compresschunks().

To prove this all works, the "getbundle" wire protocol command has been
switched to compresschunks(). This removes the util.chunkbuffer from
that command. Now, data essentially streams straight from the
changegroup emitter to the wire, possibly through a zlib compressor.
Generators all the way, baby.

There were slim to no performance changes on the server as measured
with the mozilla-central repository. This is likely because CPU
time is dominated by reading revlogs, producing the changegroup, and
zlib compressing the output stream. Still, this brings us a little
closer to our ideal of using generators everywhere.
2016-10-16 11:10:21 -07:00
Gregory Szorc
36f039b85b wireproto: rename argument to groupchunks()
groupchunks() is a generic "turn a file object into a generator"
function. It isn't limited to changegroups. Rename the argument
and update the docstring to reflect this.
2016-09-25 12:20:31 -07:00
Gregory Szorc
312f42b6e4 hgweb: tweak zlib chunking behavior
When doing streaming compression with zlib, zlib appears to emit data
only after ~20-30kb of input, on average, has accumulated. In other
words, most calls to compress() return an empty string. On the
mozilla-unified repo, only 48,433 of 921,167 (5.26%) calls to
compress() returned data. That means we were sending hundreds of
thousands of empty chunks via a generator where they touched who knows
how many frames (my guess is millions). Filtering out the empty chunks
from the generator cuts down on overhead.

In addition, we were previously feeding 8kb chunks into zlib
compression. Since this function tends to emit *compressed* data after
20-30kb is available, it would take several calls before data was
produced. We increase the amount of data fed in at a time to 32kb.
This reduces the number of calls to compress() from 921,167 to
115,146. It also reduces the number of output chunks from 48,433 to
31,377. This does increase the average output chunk size by a little.
But I don't think this will matter in most scenarios.
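
An illustrative sketch of the two tweaks together: feed 32kb per
iteration instead of 8kb, and drop the empty strings that most
compress() calls return so they never travel through the generator
chain:

    import zlib

    def zlibchunks(fh, chunksize=32768):
        z = zlib.compressobj()
        while True:
            chunk = fh.read(chunksize)
            if not chunk:
                break
            data = z.compress(chunk)
            if data:  # skip the ~95% of calls that produce nothing
                yield data
        yield z.flush()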

The combination of these 2 changes appears to shave ~6s CPU time
or ~3% from a server serving the mozilla-unified repo.
2016-08-14 21:29:46 -07:00
Gregory Szorc
118980f02b hgweb: document why we don't allow untrusted settings to control zlib
Added comment per discussion on mercurial-devel.
2016-08-15 20:39:33 -07:00
Gregory Szorc
8720bcb69f hgweb: config option to control zlib compression level
Before this patch, the HTTP transport protocol would always zlib
compress certain responses (notably "getbundle" wire protocol commands)
at zlib compression level 6.

zlib can be a massive CPU resource sink for servers. Some server
operators may wish to reduce server-side CPU requirements while
requiring more bandwidth. This is common on corporate intranets, for
example. Others may wish to use more CPU but reduce bandwidth.

This patch introduces a config option to allow server operators
to control the zlib compression level.

On the "mozilla-unified" generaldelta repository, setting this
value to "0" (disable compression) results in server-side CPU
utilization for a `hg clone` going from ~180s to ~124s CPU time on
my i7-6700K.  A level of "1" (which increases the transfer size from
~1,074 MB at level 6 to ~1,222 MB) utilizes ~132s CPU time.
2016-08-07 18:09:58 -07:00
timeless
109fcbc79e pycompat: switch to util.urlreq/util.urlerr for py3 compat 2016-04-06 23:22:12 +00:00
timeless
f77cdcd3b1 pycompat: switch to util.stringio for py3 compat 2016-04-10 20:55:37 +00:00
Augie Fackler
b3f8347d29 http: support sending hgargs via POST body instead of in GET or headers
narrowhg (for its narrow spec) and remotefilelog (for its large batch
requests) would like to be able to make requests with argument sets so
absurdly large that they blow out the total request size limit on some
http servers. As a workaround, support stuffing args at the start
of the POST body.
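
A hypothetical client-side sketch of that workaround (the header name
and the way the capability is negotiated are assumptions, not the
exact protocol):

    from urllib.parse import urlencode

    def buildrequest(cmd, args, body=b'', postargs=False):
        headers = {'Content-Type': 'application/mercurial-0.1'}
        url = '?cmd=%s' % cmd
        if postargs:
            # Stuff the encoded arguments at the start of the POST body
            # and tell the server how many bytes of the body they occupy.
            argsdata = urlencode(sorted(args.items())).encode('ascii')
            headers['X-HgArgs-Post'] = '%d' % len(argsdata)
            body = argsdata + body
        else:
            url += '&' + urlencode(sorted(args.items()))
        return url, headers, body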

We will probably want to leave this behavior off by default in servers
forever, because it makes the old "POSTs are only for writes"
assumption wrong, which might break some of the simpler authentication
configurations.
2016-03-11 11:37:00 -05:00
Yuya Nishihara
47690f822c hgweb: use absolute_import 2015-10-31 22:07:40 +09:00
Pierre-Yves David
5d414d928b wireproto: introduce an abstractserverproto class
sshserver and webproto now inherit from an abstractserverproto class. This class
is introduced for documentation purposes.
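
A hedged sketch of what such a documentation-only base class looks
like (the methods shown are a representative subset, not the full or
exact interface):

    class abstractserverproto(object):
        """Documents the interface that sshserver and webproto implement."""

        def getargs(self, args):
            """return the value of the arguments requested by a command"""
            raise NotImplementedError()

        def redirect(self):
            """may the server redirect command output to a temporary buffer?"""
            raise NotImplementedError()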
2014-03-28 11:10:33 -07:00
Mads Kiilerich
202753eeb5 hgweb: pass the actual response body to request.response, not just the length
This makes it less likely to send a response that doesn't match Content-Length.
2013-01-15 01:07:03 +01:00
Mads Kiilerich
87b39b3461 hgweb: use Content-Length for pushres
This prevents some unnecessary http connection closes.
2013-01-15 01:05:12 +01:00
Andrew Pritchard
2d8acb3e0b wireproto: add out-of-band error class to allow remote repo to report errors
Older clients will still print the provided error message and not much else:
over ssh, this will be each line prefixed with 'remote: ' in addition to an
"abort: unexpected response: '\n'"; over http, this will be the '---%<---'
banners in addition to the 'does not appear to be a repository' message.

Currently, clients with this patch will display 'abort: remote error:\n' and
the provided error text, but it is trivial to style the error text however is
deemed appropriate.
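
A minimal sketch of the mechanism, assuming an exception type plus a
response wrapper (the names are illustrative rather than Mercurial's
exact ones):

    class OutOfBandError(Exception):
        """Error whose message is reported verbatim to the remote client."""

    class ooberror(object):
        """Wire protocol response carrying an out-of-band error message."""
        def __init__(self, message):
            self.message = message

    def runcommand(repo, proto, func, *args):
        try:
            return func(repo, proto, *args)
        except OutOfBandError as exc:
            # New clients show "abort: remote error:" plus this text;
            # old clients just print it as remote output.
            return ooberror(str(exc) + '\n')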
2011-08-02 15:21:10 -04:00
Idan Kamara
325f77da0a ui: use I/O descriptors internally
and as a result:
- fix webproto to redirect the ui descriptors instead of sys.stdout/err
- fix sshserver to use the ui descriptors
2011-06-08 01:39:20 +03:00
Martin Geisler
af8a35e078 check-code: flag 0/1 used as constant Boolean expression 2011-06-01 12:38:46 +02:00
Matt Mackall
89ec131e91 http: minor tweaks to long arg handling
x-arg -> x-hgarg
replace itertools.count(1)
2011-05-01 03:51:04 -05:00
Steven Brown
c1075f3880 httprepo: long arguments support (issue2126)
Send the command arguments in the HTTP headers. The command is still part
of the URL. If the server does not have the 'httpheader' capability, the
client will send the command arguments in the URL as it did previously.

Web servers typically allow more data to be placed within the headers than
in the URL, so this approach will:
- Avoid HTTP errors due to using a URL that is too large.
- Allow Mercurial to implement a more efficient wire protocol.

An alternate approach is to send the arguments as part of the request body.
This approach has been rejected because it requires the use of POST
requests, so it would break any existing configuration that relies on the
request type for authentication or caching.

Extensibility:
- The header size is provided by the server, which makes it possible to
  introduce an hgrc setting for it.
- The client ignores the capability value after the first comma, which
  allows more information to be included in the future.
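
A hedged client-side sketch of the header spanning (the header names
and size accounting are illustrative):

    from urllib.parse import urlencode

    def argheaders(args, headersize):
        encoded = urlencode(sorted(args.items()))
        # Leave room for a header name like "X-HgArg-999: ".
        valuelen = max(1, headersize - len('X-HgArg-999: '))
        headers = {}
        for i, start in enumerate(range(0, len(encoded), valuelen), 1):
            headers['X-HgArg-%d' % i] = encoded[start:start + valuelen]
        return headers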
2011-05-01 01:04:37 +08:00
Peter Arrenbrecht
5925b26799 wireproto: fix handling of '*' args for HTTP and SSH 2011-03-22 07:38:32 +01:00
Benoit Boissinot
9bd037b429 wireproto/http: drain the incoming bundle in case of errors 2010-10-11 12:47:11 -05:00
Benoit Boissinot
16e11c728a wireproto: introduce pusherr() to deal with "unsynced changes" error
The behaviour between http and ssh still differs:
- the "unsynced changes" error is seen as remote output in the http case
- but it is correctly seen as a push error for ssh
2010-10-11 12:45:36 -05:00
Dirkjan Ochtman
5772f0c9ef protocol: use generators instead of req.write() for hgweb stream responses 2010-07-20 09:56:37 +02:00
Dirkjan Ochtman
bfec66d497 protocol: wrap non-string protocol responses in classes 2010-07-20 20:53:33 +02:00
Dirkjan Ochtman
d80015833d protocol: extract compression from streaming mechanics 2010-07-16 22:20:10 +02:00
Dirkjan Ochtman
3e08fe969b protocol: rename send methods to get grouping by prefix 2010-07-16 18:18:35 +02:00
Dirkjan Ochtman
4d89fb24c9 protocol: shuffle server methods to group send methods 2010-07-16 18:16:15 +02:00
Dirkjan Ochtman
1a4105fea1 protocol: command must be checked before passing in 2010-07-16 19:01:34 +02:00
Matt Mackall
20113ea93e protocol: move hgweb protocol support back into protocol.py
- introduce iscmd
- simplify error handling
- remove unneeded imports
2010-07-15 15:05:04 -05:00
Matt Mackall
0cc5d56580 protocol: unify server-side capabilities functions 2010-07-15 13:56:52 -05:00
Matt Mackall
a6024ca63a protocol: unify unbundle on the server side 2010-07-15 11:24:42 -05:00
Matt Mackall
47c6d08427 protocol: unify stream_out command 2010-07-14 16:19:27 -05:00
Matt Mackall
050367f581 protocol: unify changegroup commands
- add sendchangegroup protocol helpers
- handle commands with None results
- move changegroup commands into wireproto.py
2010-07-14 15:43:20 -05:00
Matt Mackall
24246d7bcf protocol: use new wireproto infrastructure in ssh
- add protocol helper
- insert wireproto into dispatcher
- drop duplicate functions from hgweb implementation
2010-07-14 15:25:15 -05:00
Matt Mackall
e4cf775b71 addchangegroup: pass in lock to release it before changegroup hook is called
Currently, callers of addchangegroup first acquire the repository
lock, usually to check that an unbundle request isn't racing. This
means that changegroup hook actions that might write to a repo get
stuck waiting for a lock. Here, we add a new optional lock parameter
and update all the callers. Post-1.6 we may make it non-optional.
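
A simplified sketch of the new calling convention (everything except
the hook name is illustrative):

    def addchangegroup(repo, source, url, lock=None):
        # ... apply the incoming changegroup to the repository here ...
        if lock is not None:
            # Release before running hooks so a hook that writes to the
            # repository does not block waiting on the lock we still hold.
            lock.release()
            lock = None
        repo.hook('changegroup', source=url)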
2010-06-25 13:47:28 -05:00
Matt Mackall
d8e0a2188b pushkey: add http support
pushkey requires the same permissions as push
listitems requires the same permissions as pull
2010-06-16 16:05:19 -05:00
Mark Determann
7b2e3d8cf6 hgweb: fix attribute error in error response (issue2060) 2010-04-01 22:04:30 +01:00
Sune Foldager
272ea1e4f1 hgweb: use string join instead of slower cStringIO 2010-02-23 11:37:40 -05:00
Sune Foldager
b4bd699e4d hgweb: fix handling of arguments in the between command
The 'pairs' argument was coded to be optional, but the code would
crash if it was not provided.
2010-02-23 11:34:08 -05:00
Matt Mackall
b7afbe529a streamclone: allow uncompressed clones by default 2010-02-07 15:31:53 +01:00
Matt Mackall
595d66f424 Update license to GPLv2+ 2010-01-19 22:20:08 -06:00
Dirkjan Ochtman
d1740999b1 hgweb/sshserver: extract capabilities for easier modification 2009-11-05 11:07:01 +01:00
Sune Foldager
ee001cdc90 hgweb: send proper error messages to the client
Fixes a bug in protocol which caused an exception during exception handling in
some cases on Windows. Also makes sure the server error message is correctly
propagated to the client, instead of being thrown away.
2009-11-02 10:20:04 +01:00
Martin Geisler
ae0794fd45 coding style: use a space after comma
I left cases like 'lambda x,y:' alone -- the lack of a space does
not bother me as much when the variables are single letters.
2009-07-22 23:12:54 +02:00
Henrik Stuart
b843569ec5 acl: support for getting authenticated user from web server (issue298)
Previously, the acl extension just read the current system user, which
is fine for direct file system access and SSH, but will not work for
HTTP(S), as that would return the web server process's user identity
rather than the authenticated user. An empty user is returned if the
user is not authenticated.
2009-06-07 20:31:38 +02:00
Henrik Stuart
45e3728174 hgweb: escape REMOTE_HOST when passing url for addchangegroup
If DNS lookups are turned off on the web server, REMOTE_HOST may be
populated with REMOTE_ADDR, which, if the remote is an IPv6 host, will
contain colons, thus interfering with the separator character. This is
solved by URL quoting the REMOTE_HOST string.
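
An illustrative sketch of the fix (the exact url layout is beside the
point):

    from urllib.parse import quote

    def changegroupurl(environ):
        scheme = environ.get('wsgi.url_scheme', 'http')
        host = environ.get('REMOTE_HOST') or environ.get('REMOTE_ADDR', '')
        # Quoting turns the colons of an IPv6 address into %3A so they
        # cannot be mistaken for the url's separator character.
        return 'remote:%s:%s' % (scheme, quote(host))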
2009-06-07 20:15:37 +02:00