Commit Graph

240 Commits

Author SHA1 Message Date
Jun Wu
f1c575a099 flake8: enable F821 check
Summary:
This check is useful and detects real errors (ex. fbconduit).  Unfortunately
`arc lint` will run it with both py2 and py3 so a lot of py2 builtins will
still be warned.

I didn't find a clean way to disable py3 check. So this diff tries to fix them.
For `xrange`, the change was done by a script:

```
import sys
import redbaron

headertypes = {'comment', 'endl', 'from_import', 'import', 'string',
               'assignment', 'atomtrailers'}

xrangefix = '''try:
    xrange(0)
except NameError:
    xrange = range

'''

def isxrange(x):
    try:
        return x[0].value == 'xrange'
    except Exception:
        return False

def main(argv):
    for i, path in enumerate(argv):
        print('(%d/%d) scanning %s' % (i + 1, len(argv), path))
        content = open(path).read()
        try:
            red = redbaron.RedBaron(content)
        except Exception:
            print('  warning: failed to parse')
            continue
        hasxrange = red.find('atomtrailersnode', value=isxrange)
        hasxrangefix = 'xrange = range' in content
        if hasxrangefix or not hasxrange:
            print('  no need to change')
            continue

        # find a place to insert the compatibility  statement
        changed = False
        for node in red:
            if node.type in headertypes:
                continue
            # node.insert_before is an easier API, but it has bugs changing
            # other "finally" and "except" positions. So do the insert
            # manually.
            # # node.insert_before(xrangefix)
            line = node.absolute_bounding_box.top_left.line - 1
            lines = content.splitlines(1)
            content = ''.join(lines[:line]) + xrangefix + ''.join(lines[line:])
            changed = True
            break

        if changed:
            # "content" is faster than "red.dumps()"
            open(path, 'w').write(content)
            print('  updated')

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

For other py2 builtins that do not have a py3 equivalent, some `# noqa`
were added as a workaround for now.

Reviewed By: DurhamG

Differential Revision: D6934535

fbshipit-source-id: 546b62830af144bc8b46788d2e0fd00496838939
2018-04-13 21:51:09 -07:00
Pulkit Goyal
86a5e03ac9 py3: handle keyword arguments correctly in wireproto.py
Differential Revision: https://phab.mercurial-scm.org/D1647
2017-12-10 04:50:16 +05:30
Boris Feld
96e2aa3564 getbundle: add support for 'bookmarks' boolean argument
This new argument requests a 'bookmarks' part from the server. It is meant to
be used instead of the "listkeys" request.
2017-10-17 15:27:17 +02:00
Augie Fackler
4b3002a912 wireproto: more strkwargs cleanup
Differential Revision: https://phab.mercurial-scm.org/D1109
2017-10-15 00:39:53 -04:00
Augie Fackler
4fa42c83ce wireproto: bounce kwargs to/from bytes/str as needed
Differential Revision: https://phab.mercurial-scm.org/D1107
2017-10-15 00:05:43 -04:00
Augie Fackler
f4a2677719 wireproto: use %d to encode int, not %s
Differential Revision: https://phab.mercurial-scm.org/D1102
2017-10-15 00:40:07 -04:00
Augie Fackler
29944e3fdf wireproto: use a proper exception instead of assert False
Differential Revision: https://phab.mercurial-scm.org/D1101
2017-10-15 00:06:06 -04:00
Augie Fackler
7914273147 wireproto: use listcomp instead of map()
The latter returns a generator object on Python 3, which breaks
various parts of hg that expected a list.

Differential Revision: https://phab.mercurial-scm.org/D1100
2017-10-15 00:39:29 -04:00
Augie Fackler
6509db2306 python3: replace im_{self,func} with __{self,func}__ globally
These new names are portable back to Python 2.6.

Differential Revision: https://phab.mercurial-scm.org/D1091
2017-10-14 12:02:15 -04:00
Boris Feld
bea8fba132 configitems: register the 'server.bundle*' family of config
All these config use the same function specifying a default value. We need to
register them all at the same time.
2017-10-11 17:51:40 +02:00
Boris Feld
bbf23f4d9a pull: use 'phase-heads' to retrieve phase information
A new bundle2 capability 'phases' has been added. If 'heads' is part of the
supported value for 'phases', the server supports reading and sending 'phase-
heads' bundle2 part.

Server is now able to process a 'phases' boolean parameter to 'getbundle'. If
'True', a 'phase-heads' bundle2 part will be included in the bundle with phase
information relevant to the whole pulled set. If this method is available the
phases listkey namespace will no longer be listed.

Beside the more efficient encoding of the data, this new method will greatly
improve the phase exchange efficiency for repositories with non-served
changesets (obsolete, secret) since we'll no longer send data about the
filtered heads.

Add a new 'devel.legacy.exchange' config item to allow fallback to the old
'listkey in bundle2' method.

Reminder: the pulled set is not just the changesets bundled by the pull. It
also contains changeset selected by the "pull specification" on the client
side (eg: everything for bare pull). One of the reason why the 'pulled set' is
important is to make sure we can move -common- nodes to public.
2017-09-24 21:27:18 +02:00
Durham Goode
738a32ad20 changegroup: replace changegroup with makechangegroup
As part of reducing the number of changegroup creation APIs, let's replace the
changegroup function with makechangegroup. This pushes the responsibility of
creating the outgoing set to the caller, but that seems like a simple and
reasonable concept for the caller to be aware of.

Differential Revision: https://phab.mercurial-scm.org/D668
2017-09-10 18:48:42 -07:00
Durham Goode
394bc74c72 changegroup: replace changegroupsubset with makechangegroup
As part of getting rid of all the permutations of changegroup creation, let's
remove changegroupsubset and call makechangegroup instead. This moves the
responsibility of creating the outgoing set to the caller, but that seems like a
relatively reasonable unit of functionality for the caller to have to care about
(i.e. what commits should be bundled).

Differential Revision: https://phab.mercurial-scm.org/D665
2017-09-10 18:43:59 -07:00
Kyle Lippincott
8cf1a22c40 wireproto: do not abort after successful lookup
As far as I can tell, this interface originally used 'return' here, so the
"fallthrough" to self._abort made sense. When it was switched to 'yield' this
didn't make sense, but doesn't impact most uses because the 'plain' wrapper in
peer.py's 'batchable' decorator only attempts to yield two items (args and
value).

When using iterbatch, however, it attempts to verify that the @batchable
generators only emit 2 results, by expecting a StopIteration when attempting to
access a third.

Differential Revision: https://phab.mercurial-scm.org/D608
2017-09-01 14:00:13 -07:00
Gregory Szorc
d5338b2208 wireproto: use new peer interface
The wirepeer class provides concrete implementations of peer interface
methods for calling wire protocol commands. It makes sense for this
class to inherit from the peer abstract base class. So we change
that.

Since httppeer and sshpeer have already been converted to the new
interface, peerrepository is no longer adding any value. So it has
been removed. httppeer and sshpeer have been updated to reflect the
loss of peerrepository and the inheritance of the abstract base
class in wirepeer.

The code changes in wirepeer are reordering of methods to group
by interface.

Some Python code in tests was updated to reflect changed APIs.

.. api::

   peer.peerrepository has been removed. Use repository.peer abstract
   base class to represent a peer repository.

Differential Revision: https://phab.mercurial-scm.org/D338
2017-08-10 20:58:28 -07:00
Gregory Szorc
b5802cf167 peer: remove non iterating batcher (API)
The last use of this API was removed in 3bcb9f9a4a63 in 2016. While
not formally deprecated, as of the last commit the code is no longer
explicitly tested. I think the new API has existed long enough for
people to transition to it.

I also have plans to more formalize the peer API and removing batch()
makes that work easier.

I'm not convinced the current client-side API around batching is
great. But it's the best we have at the moment.

.. api:: remove peer.batch()

   Replace with peer.iterbatch().

Differential Revision: https://phab.mercurial-scm.org/D320
2017-08-09 23:35:20 -07:00
Gregory Szorc
34a7acd6d7 wireproto: overhaul iterating batcher code (API)
The remote batching code is difficult to read. Let's improve it.

As part of the refactor, the future returned by method calls on
batchiter() instances is now populated. However, you still need to
consume the results() generator for the future to be set.  But at
least now we can stuff the future somewhere and not have to worry
about aligning method call order with result order since you can
use a future to hold the result.

Also as part of the change, we now verify that @batchable generators
yield exactly 2 values. In other words, we enforce their API.

The non-iter batcher has been unused since 3bcb9f9a4a63. And to my
surprise we had no explicit unit test coverage of it! test-batching.py
has been overhauled to use the iterating batcher.

Since the iterating batcher doesn't allow non-batchable method
calls nor local calls, tests have been updated to reflect reality.
The iterating batcher has been used for multiple releases apparently
without major issue. So this shouldn't cause alarm.

.. api::

   @peer.batchable functions must now yield exactly 2 values

Differential Revision: https://phab.mercurial-scm.org/D319
2017-08-09 23:29:30 -07:00
Gregory Szorc
014510187c wireproto: remove support for local results in @batchable (API)
@peer.batchable decorated generator functions have two forms:

    yield value, None

and

    yield args, future
    yield value

These forms have been present since the decorator was introduced.

There are currently no in-repo consumers of the first form. So this
commit removes support for it.

Note that remoteiterbatcher.submit() asserts the 2nd form. And
3bcb9f9a4a63 removed the last user of remotebatcher, forcing everyone
to remoteiterbatcher. So anything relying on this in the wild would
have been broken since 3bcb9f9a4a63.

.. api::

   @peer.batchable can no longer emit local values

Differential Revision: https://phab.mercurial-scm.org/D318
2017-08-09 22:52:05 -07:00
Gregory Szorc
c3d3dc57f1 wireproto: properly implement batchable checking
remoteiterbatcher (unlike remotebatcher) only supports batchable
commands. This claim can be validated by comparing their
implementations of submit() and noting how remoteiterbatcher assumes
the invoked method has a "batchable" attribute, which is set by
@peer.batchable.

remoteiterbatcher has a custom __getitem__ that was trying to
validate that only batchable methods are called. However, it was only
validating that the called method exists, not that it is batchable.

This wasn't a big deal since remoteiterbatcher.submit() would raise
an AttributeError attempting to `mtd.batchable(...)`.

Let's fix the check and convert it to ProgrammingError, which may
not have been around when this was originally implemented.

Differential Revision: https://phab.mercurial-scm.org/D317
2017-08-09 21:51:45 -07:00
Jun Wu
e47f7dc2fa codemod: register core configitems using a script
This is done by a script [2] using RedBaron [1], a tool designed for doing
code refactoring. All "default" values are decided by the script and are
strongly consistent with the existing code.

There are 2 changes done manually to fix tests:

  [warn] mercurial/exchange.py: experimental.bundle2-output-capture: default needs manual removal
  [warn] mercurial/localrepo.py: experimental.hook-track-tags: default needs manual removal

Since RedBaron is not confident about how to indent things [2].

[1]: https://github.com/PyCQA/redbaron
[2]: https://github.com/PyCQA/redbaron/issues/100
[3]:

#!/usr/bin/env python
# codemod_configitems.py - codemod tool to fill configitems
#
# Copyright 2017 Facebook, Inc.
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.
from __future__ import absolute_import, print_function

import os
import sys

import redbaron

def readpath(path):
    with open(path) as f:
        return f.read()

def writepath(path, content):
    with open(path, 'w') as f:
        f.write(content)

_configmethods = {'config', 'configbool', 'configint', 'configbytes',
                  'configlist', 'configdate'}

def extractstring(rnode):
    """get the string from a RedBaron string or call_argument node"""
    while rnode.type != 'string':
        rnode = rnode.value
    return rnode.value[1:-1]  # unquote, "'str'" -> "str"

def uiconfigitems(red):
    """match *.ui.config* pattern, yield (node, method, args, section, name)"""
    for node in red.find_all('atomtrailers'):
        entry = None
        try:
            obj = node[-3].value
            method = node[-2].value
            args = node[-1]
            section = args[0].value
            name = args[1].value
            if (obj in ('ui', 'self') and method in _configmethods
                and section.type == 'string' and name.type == 'string'):
                entry = (node, method, args, extractstring(section),
                         extractstring(name))
        except Exception:
            pass
        else:
            if entry:
                yield entry

def coreconfigitems(red):
    """match coreconfigitem(...) pattern, yield (node, args, section, name)"""
    for node in red.find_all('atomtrailers'):
        entry = None
        try:
            args = node[1]
            section = args[0].value
            name = args[1].value
            if (node[0].value == 'coreconfigitem' and section.type == 'string'
                and name.type == 'string'):
                entry = (node, args, extractstring(section),
                         extractstring(name))
        except Exception:
            pass
        else:
            if entry:
                yield entry

def registercoreconfig(cfgred, section, name, defaultrepr):
    """insert coreconfigitem to cfgred AST

    section and name are plain string, defaultrepr is a string
    """
    # find a place to insert the "coreconfigitem" item
    entries = list(coreconfigitems(cfgred))
    for node, args, nodesection, nodename in reversed(entries):
        if (nodesection, nodename) < (section, name):
            # insert after this entry
            node.insert_after(
                'coreconfigitem(%r, %r,\n'
                '    default=%s,\n'
                ')' % (section, name, defaultrepr))
            return

def main(argv):
    if not argv:
        print('Usage: codemod_configitems.py FILES\n'
              'For example, FILES could be "{hgext,mercurial}/*/**.py"')
    dirname = os.path.dirname
    reporoot = dirname(dirname(dirname(os.path.abspath(__file__))))

    # register configitems to this destination
    cfgpath = os.path.join(reporoot, 'mercurial', 'configitems.py')
    cfgred = redbaron.RedBaron(readpath(cfgpath))

    # state about what to do
    registered = set((s, n) for n, a, s, n in coreconfigitems(cfgred))
    toregister = {} # {(section, name): defaultrepr}
    coreconfigs = set() # {(section, name)}, whether it's used in core

    # first loop: scan all files before taking any action
    for i, path in enumerate(argv):
        print('(%d/%d) scanning %s' % (i + 1, len(argv), path))
        iscore = ('mercurial' in path) and ('hgext' not in path)
        red = redbaron.RedBaron(readpath(path))
        # find all repo.ui.config* and ui.config* calls, and collect their
        # section, name and default value information.
        for node, method, args, section, name in uiconfigitems(red):
            if section == 'web':
                # [web] section has some weirdness, ignore them for now
                continue
            defaultrepr = None
            key = (section, name)
            if len(args) == 2:
                if key in registered:
                    continue
                if method == 'configlist':
                    defaultrepr = 'list'
                elif method == 'configbool':
                    defaultrepr = 'False'
                else:
                    defaultrepr = 'None'
            elif len(args) >= 3 and (args[2].target is None or
                                     args[2].target.value == 'default'):
                # try to understand the "default" value
                dnode = args[2].value
                if dnode.type == 'name':
                    if dnode.value in {'None', 'True', 'False'}:
                        defaultrepr = dnode.value
                elif dnode.type == 'string':
                    defaultrepr = repr(dnode.value[1:-1])
                elif dnode.type in ('int', 'float'):
                    defaultrepr = dnode.value
            # inconsistent default
            if key in toregister and toregister[key] != defaultrepr:
                defaultrepr = None
            # interesting to rewrite
            if key not in registered:
                if defaultrepr is None:
                    print('[note] %s: %s.%s: unsupported default'
                          % (path, section, name))
                    registered.add(key) # skip checking it again
                else:
                    toregister[key] = defaultrepr
                    if iscore:
                        coreconfigs.add(key)

    # second loop: rewrite files given "toregister" result
    for path in argv:
        # reconstruct redbaron - trade CPU for memory
        red = redbaron.RedBaron(readpath(path))
        changed = False
        for node, method, args, section, name in uiconfigitems(red):
            key = (section, name)
            defaultrepr = toregister.get(key)
            if defaultrepr is None or key not in coreconfigs:
                continue
            if len(args) >= 3 and (args[2].target is None or
                                   args[2].target.value == 'default'):
                try:
                    del args[2]
                    changed = True
                except Exception:
                    # redbaron fails to do the rewrite due to indentation
                    # see https://github.com/PyCQA/redbaron/issues/100
                    print('[warn] %s: %s.%s: default needs manual removal'
                          % (path, section, name))
            if key not in registered:
                print('registering %s.%s' % (section, name))
                registercoreconfig(cfgred, section, name, defaultrepr)
                registered.add(key)
        if changed:
            print('updating %s' % path)
            writepath(path, red.dumps())

    if toregister:
        print('updating configitems.py')
        writepath(cfgpath, cfgred.dumps())

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
2017-07-14 14:22:40 -07:00
Pierre-Yves David
a13e2abdc0 configitems: register the 'server.preferuncompressed' config 2017-06-30 03:44:12 +02:00
Pierre-Yves David
7b7818c09a configitems: register the 'server.maxhttpheaderlen' config 2017-06-30 03:44:11 +02:00
Pierre-Yves David
3369f427a0 configitems: register the 'server.disablefullbundle' config 2017-06-30 03:44:10 +02:00
Pierre-Yves David
834e358808 configitems: register the 'server.bundle1gd' config 2017-06-30 03:44:07 +02:00
Pierre-Yves David
548593d9ae configitems: register the 'server.bundle1' config 2017-06-30 03:44:06 +02:00
Martin von Zweigbergk
dc0d2c3515 wireproto: update reference to deleted addchangegroup()
Thanks to Yuya for catching this.
2017-06-16 09:37:22 -07:00
Gregory Szorc
bc8582fc01 streamclone: consider secret changesets (BC) (issue5589)
Previously, a repo containing secret changesets would be served via
stream clone, transferring those secret changesets. While secret
changesets aren't meant to imply strong security (if you really
want to keep them secret, others shouldn't have read access to the
repo), we should at least make an effort to protect secret changesets
when possible.

After this commit, we no longer serve stream clones for repos
containing secret changesets by default. This is backwards
incompatible behavior. In case anyone is relying on the behavior,
we provide a config option to opt into the old behavior.

Note that this defense is only beneficial for remote repos
accessed via the wire protocol: if a client has access to the
files backing a repo, they can get to the raw data and see secret
revisions.
2017-06-09 10:41:13 -07:00
Martin von Zweigbergk
c3406ac3db cleanup: use set literals
We no longer support Python 2.6, so we can now use set literals.
2017-02-10 16:56:29 -08:00
Siddharth Agarwal
0c7305054f clone: add a server-side option to disable full getbundles (pull-based clones)
For large enough repositories, pull-based clones take too long, and an attempt
to use them indicates some sort of configuration or other issue or maybe an
outdated Mercurial. Add a config option to disable them.
2017-05-11 10:50:05 -07:00
Yuya Nishihara
02022fc3c5 util: wrap s.encode('string_escape') call for future py3 compatibility 2017-03-15 23:06:50 +09:00
Pierre-Yves David
1be9d3fb99 clonebundle: use 'repo.vfs' instead of 'repo.opener'
The later is a 5 years old form.
2017-03-02 03:23:18 +01:00
Jun Wu
22df174ea7 wireproto: remove unused code
Removed an unused line introduced by 251ec05dc288.
2017-02-22 10:14:18 -08:00
Pulkit Goyal
07314d0686 py3: convert the mode argument of os.fdopen to unicodes (1 of 2)
os.fdopen() does not accepts bytes as its second argument which represent the
mode in which the file is to be opened. This patch makes sure unicodes are
passed in py3 by using pycompat.sysstr().
2017-02-13 20:06:38 +05:30
Pierre-Yves David
43b1ef004c wireproto: properly report server Abort during 'getbundle'
Previously Abort raised during 'getbundle' call poorly reported (HTTP-500 for
http, some scary messages for ssh). Abort error have been properly reported for
"push" for a long time, there is not reason to be different for 'getbundle'. We
properly catch such error and report them back the best way available. For
bundle, we issue a valid bundle2 reply (as expected by the client) with an
'error:abort' part. With bundle1 we do as best as we can depending of http or
ssh.
2017-02-10 18:20:58 +01:00
Pierre-Yves David
d00dbd00d9 bundle1: fix bundle1-denied reporting for pull over ssh
Changeset a0966f529e1b introduced a config option to have the server deny pull
using bundle1. The original protocol has not really been design to allow that
kind of error reporting so some hack was used. It turned the hack only works on
HTTP and that ssh server hangs forever when this is used. After further
digging, there is no way to report the error in a unified way. Using `ooberror`
freeze ssh and raising 'Abort' makes HTTP return a HTTP-500 without further
details. So with sadness we implement a version that dispatch according to the
protocol used.

Now the error is properly reported, but we still have ungraceful abort after
that. The protocol do not allow anything better to happen using bundle1.
2017-02-10 18:06:08 +01:00
Pierre-Yves David
5b07cfa3b3 bundle1: display server abort hint during unbundle
The code was printing the abort message but not the hint. This is now fixed.
2017-02-10 17:56:52 +01:00
Pierre-Yves David
64f57e513b bundle1: fix bundle1-denied reporting for push over ssh
Changeset a0966f529e1b introduced a config option to have the server deny push
using bundle1. The original protocol has not really be design to allow such kind
of error reporting so some hack was used. It turned the hack only works on HTTP
and that ssh wire peer hangs forever when the same hack is used. After further
digging, there is no way to report the error in a unified way. Using 'ooberror'
freeze ssh and raising 'Abort' makes HTTP return a HTTP500 without further
details. So with sadness we implement a version that dispatch according to the
protocol used.

We also add a test for pushing over ssh to make sure we won't regress in the
future. That test show that the hint is missing, this is another bug fixed in
the next changeset.
2017-02-10 17:56:59 +01:00
Gregory Szorc
a95fb0b61b wireproto: advertise supported media types and compression formats
This commit introduces support for advertising a server's support for
media types and compression formats in accordance with the spec defined
in internals.wireproto.

The bulk of the new code is a helper function in wireproto.py to
obtain a prioritized list of compression engines available to the
wire protocol. While not utilized yet, we implement support
for obtaining the list of compression engines advertised by the
client.

The upcoming HTTP protocol enhancements are a bit lower-level than
existing tests (most existing tests are command centric). So,
this commit establishes a new test file that will be appropriate
for holding tests around the functionality of the HTTP protocol
itself.

Rounding out this change, `hg debuginstall` now prints compression
engines available to the server.
2016-12-24 15:21:46 -07:00
Gregory Szorc
6a4fd5ab05 wireproto: only advertise HTTP-specific capabilities to HTTP peers (BC)
Previously, the capabilities list was protocol agnostic and we
advertised the same capabilities list to all clients, regardless of
transport protocol.

A few capabilities are specific to HTTP. I see no good reason why we
should advertise them to SSH clients. So this patch limits their
advertisement to HTTP clients.

This patch is BC, but SSH clients shouldn't be using the removed
capabilities so there should be no impact.
2016-11-28 20:46:42 -08:00
Yuya Nishihara
47f9c8b52e py3: bulk replace sys.stdin/out/err by util's
Almost all sys.stdin/out/err in hgext/ and mercurial/ are replaced by util's.
There are a few exceptions:

 - lsprof.py and statprof.py are untouched since they are a kind of vendor
   code and they never import mercurial modules right now.
 - ui._readline() needs to replace sys.stdin and stdout to pass them to
   raw_input(). We'll need another workaround here.
2016-10-20 23:53:36 +09:00
Gregory Szorc
2112fb0fd2 wireproto: perform chunking and compression at protocol layer (API)
Currently, the "streamres" response type is populated with a generator
of chunks with compression possibly already applied. This puts the onus
on commands to perform chunking and compression. Architecturally, I
think this is the wrong place to perform this work. I think commands
should say "here is the data" and the protocol layer should take care
of encoding the final bytes to put on the wire.

Additionally, upcoming commits will improve wire protocol support for
compression. Having a central place for performing compression in the
protocol transport layer will be easier than having to deal with
compression at the commands layer.

This commit refactors the "streamres" response type to accept either
a generator or an object with "read." Additionally, the type now
accepts a flag indicating whether the response is a "version 1
compressible" response. This basically identifies all commands
currently performing compression. I could have used a special type
for this, but a flag works just as well. The argument name
foreshadows the introduction of wire protocol changes, hence the "v1."

The code for chunking and compressing has been moved to the output
generation function for each protocol transport. Some code has been
inlined, resulting in the deletion of now unused methods.
2016-11-20 13:50:45 -08:00
Gregory Szorc
1538b87cfc wireproto: compress data from a generator
Currently, the "getbundle" wire protocol command obtains a generator of
data, converts it to a util.chunkbuffer, then converts it back to a
generator via the protocol's groupchunks() implementation. For the SSH
protocol, groupchunks() simply reads 4kb chunks then write()s the
data to a file descriptor. For the HTTP protocol, groupchunks() reads
32kb chunks, feeds those into a zlib compressor, emits compressed data
as it is available, and that is sent to the WSGI layer, where it is
likely turned into HTTP chunked transfer chunks as is or further
buffered and turned into a larger chunk.

For both the SSH and HTTP protocols, there is inefficiency from using
util.chunkbuffer.

For SSH, emitting consistent 4kb chunks sounds nice. However, the file
descriptor it is writing to is almost certainly buffered. That means
that a Python .write() probably doesn't translate into exactly what is
written to the I/O layer.

For HTTP, we're going through an intermediate layer to zlib compress
data. So all util.chunkbuffer is doing is ensuring that the chunks we
feed into the zlib compressor are of uniform size. This means more CPU
time in Python buffering and emitting chunks in util.chunkbuffer but
fewer function calls to zlib.

This patch introduces and implements a new wire protocol abstract
method: compresschunks(). It is like groupchunks() except it operates
on a generator instead of something with a .read(). The SSH
implementation simply proxies chunks. The HTTP implementation uses
zlib compression.

To avoid duplicate code, the HTTP groupchunks() has been reimplemented
in terms of compresschunks().

To prove this all works, the "getbundle" wire protocol command has been
switched to compresschunks(). This removes the util.chunkbuffer from
that command. Now, data essentially streams straight from the
changegroup emitter to the wire, possibly through a zlib compressor.
Generators all the way, baby.

There were slim to no performance changes on the server as measured
with the mozilla-central repository. This is likely because CPU
time is dominated by reading revlogs, producing the changegroup, and
zlib compressing the output stream. Still, this brings us a little
closer to our ideal of using generators everywhere.
2016-10-16 11:10:21 -07:00
Gregory Szorc
26f6f03d4c exchange: refactor APIs to obtain bundle data (API)
Currently, exchange.getbundle() returns either a cg1unpacker or a
util.chunkbuffer (in the case of bundle2). This is kinda OK, as
both expose a .read() to consumers. However, localpeer.getbundle()
has code inferring what the response type is based on arguments and
converts the util.chunkbuffer returned in the bundle2 case to a
bundle2.unbundle20 instance. This is a sign that the API for
exchange.getbundle() is not ideal because it doesn't consistently
return an "unbundler" instance.

In addition, unbundlers mask the fact that there is an underlying
generator of changegroup data. In both cg1 and bundle2, this generator
is being fed into a util.chunkbuffer so it can be re-exposed as a
file object.

util.chunkbuffer is a nice abstraction. However, it should only be
used "at the edges." This is because keeping data as a generator is
more efficient than converting it to a chunkbuffer, especially if we
convert that chunkbuffer back to a generator (as is the case in some
code paths currently).

This patch refactors exchange.getbundle() into
exchange.getbundlechunks(). The new API returns an iterator of chunks
instead of a file-like object.

Callers of exchange.getbundle() have been updated to use the new API.

There is a minor change of behavior in test-getbundle.t. This is
because `hg debuggetbundle` isn't defining bundlecaps. As a result,
a cg1 data stream and unpacker is being produced. This is getting fed
into a new bundle20 instance via bundle2.writebundle(), which uses
a backchannel mechanism between changegroup generation to add the
"nbchanges" part parameter. I never liked this backchannel mechanism
and I plan to remove it someday. `hg bundle` still produces the
"nbchanges" part parameter, so there should be no user-visible
change of behavior. I consider this "regression" a bug in
`hg debuggetbundle`. And that bug is captured by an existing
"TODO" in the code to use bundle2 capabilities.
2016-10-16 10:38:52 -07:00
Gregory Szorc
36f039b85b wireproto: rename argument to groupchunks()
groupchunks() is a generic "turn a file object into a generator"
function. It isn't limited to changegroups. Rename the argument
and update the docstring to reflect this.
2016-09-25 12:20:31 -07:00
Gregory Szorc
6fcb5a026b wireproto: remove gboptslist (API)
This variable has been unused since 4e11f86d8ce7, which was over
2 years ago. gboptsmap should be used instead.

Marking as API because this could break extensions.
2016-08-06 15:00:34 -07:00
Gregory Szorc
b3c092eb61 wireproto: unescape argument names in batch command (BC)
Clients escape both argument names and values when using the
batch command. Yet the server was only unescaping argument values.

Fortunately we don't have any argument names that need escaped. But
that isn't an excuse to lack symmetry in the code.

Since the server wasn't properly unescaping argument names, this
means we can never introduce an argument to a batchable command that
needs escaped because an old server wouldn't properly decode its name.
So we've introduced an assertion to detect the accidental introduction
of this in the future. Of course, we could introduce a server
capability that says the server knows how to decode argument names and
allow special argument names to go through. But until there is a need
for it (which I doubt there will be), we shouldn't bother with adding
an unused capability.
2016-08-06 13:55:21 -07:00
Gregory Szorc
7315f10dde wireproto: consolidate code for obtaining "cmds" argument value
Both wireproto.py and sshpeer.py had code for producing the value to
the "cmds" argument used by the "batch" command. This patch extracts
that code to a standalone function and uses it.
2016-08-06 13:46:28 -07:00
Augie Fackler
d19a4d923c wirepeer: rename confusing source parameter
It's named "url" everyplace else this method is defined, so let's be
consistent.
2016-08-05 16:34:30 -04:00
Gregory Szorc
589c58d0cd wireproto: extract repo filtering to standalone function
As part of teaching Mozilla's replication extension to better handle
repositories with obsolescence data, I encountered a few scenarios
where I wanted built-in wire protocol commands from replication clients
to operate on unfiltered repositories so they could have access to
obsolete changesets.

While the undocumented "web.view" config option provides a mechanism
to choose what filter/view hgweb operates on, this doesn't apply
to wire protocol commands because wireproto.dispatch() is always
operating on the "served" repo.

This patch extracts the line for obtaining the repo that
wireproto commands operate on to its own function so extensions
can monkeypatch it to e.g. return an unfiltered repo.

I stopped short of exposing a config option because I view the use
case for changing this as a niche feature, best left to the domain
of extensions.
2016-07-15 13:41:34 -07:00
Augie Fackler
ad67b99d20 cleanup: replace uses of util.(md5|sha1|sha256|sha512) with hashlib.\1
All versions of Python we support or hope to support make the hash
functions available in the same way under the same name, so we may as
well drop the util forwards.
2016-06-10 00:12:33 -04:00