Commit Graph

1681 Commits

Author SHA1 Message Date
Augie Fackler
bd4e5e07d7 localrepo: remove superfluous pass statements 2017-09-30 07:44:34 -04:00
Mark Thomas
073ae56963 revlog: add option to mmap revlog index
Following on from Jun Wu's patch last October[1], we have found that using mmap
for the revlog index in repos with large revlogs gives a noticable performance
improvment (~110ms on each hg invocation), particularly for commands that don't
touch the index very much.

This changeset adds this as an option, activated by a new experimental config
option so that it can be enabled on a per-repo basis. The configuration option
specifies an index size threshold at which Mercurial will switch to using mmap
to access the index.

If the configuration option is not specified, the default remains to load the
full file, which seems to be the best option for smaller repos.

Some initial performance numbers for average of 5 invocations of `hg log -l 5`
for different cache states:

| Repo: | HG | FB |
|---|---|---|
| Index size: | 2.3MB | much bigger |
| read (warm): | 237ms | 432ms |
| mmap (warm): | 227ms | 321ms |
|   | (-3%) | (-26%) |
| read (cold): | 397ms | 696ms |
| mmap (cold): | 410ms | 888ms |
|   | (+3%) | (+28%) |

[1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-October/088737.html

Test Plan:
`hg log --config experimental.mmapindex=true`

Differential Revision: https://phab.mercurial-scm.org/D477
2017-09-13 17:26:26 +00:00
Durham Goode
738a32ad20 changegroup: replace changegroup with makechangegroup
As part of reducing the number of changegroup creation APIs, let's replace the
changegroup function with makechangegroup. This pushes the responsibility of
creating the outgoing set to the caller, but that seems like a simple and
reasonable concept for the caller to be aware of.

Differential Revision: https://phab.mercurial-scm.org/D668
2017-09-10 18:48:42 -07:00
Durham Goode
394bc74c72 changegroup: replace changegroupsubset with makechangegroup
As part of getting rid of all the permutations of changegroup creation, let's
remove changegroupsubset and call makechangegroup instead. This moves the
responsibility of creating the outgoing set to the caller, but that seems like a
relatively reasonable unit of functionality for the caller to have to care about
(i.e. what commits should be bundled).

Differential Revision: https://phab.mercurial-scm.org/D665
2017-09-10 18:43:59 -07:00
Gregory Szorc
cf126063f8 localrepo: use peer interfaces
We now have a formal abstract base class for peers. Let's
transition the peer classes in localrepo to it.

As part of the transition, we reorder methods so they are grouped
by interface and match the order they are defined in the interface.
We also had to change self.ui from an instance attribute to a
property to satisfy the @abstractproperty requirement.

As part of this change, we uncover the first "bug" as part of
enforcing interfaces: stream_out() wasn't implemented on localpeer!
This isn't technically a bug since the repo isn't advertising the
stream capability, so clients shouldn't be attempting to call it.
But I don't think there's a good reason why this is the case.
We implement a dummy method to satisfy the interface requriements.
We can make localpeer instances streamable as a future enhancement.

# no-check-commit

Differential Revision: https://phab.mercurial-scm.org/D335
2017-08-09 23:52:25 -07:00
Augie Fackler
ef945af30b merge with stable 2017-08-10 18:55:33 -04:00
Yuya Nishihara
ba69ca47d4 pathauditor: disable cache of audited paths by default (issue5628)
The initial attempt was to discard cache when appropriate, but it appears
to be error prone. We had to carefully inspect all places where audit() is
called e.g. without actually updating filesystem, before removing files and
directories, etc.

So, this patch disables the cache of audited paths by default, and enables
it only for the following cases:

 - short-lived auditor objects
 - repo.vfs, repo.svfs, and repo.cachevfs, which are managed directories
   and considered sort of append-only (a file/directory would never be
   replaced with a symlink)

There would be more cacheable vfs objects (e.g. mq.queue.opener), but I
decided not to inspect all of them in this patch. We can make them cached
later.

Benchmark result:

- using old clone of http://selenic.com/repo/linux-2.6/ (38319 files)
- on tmpfs
- run HGRCPATH=/dev/null hg up -q --time tip && hg up -q null
- try 4 times and take the last three results

original:
real 7.480 secs (user 1.140+22.760 sys 0.150+1.690)
real 8.010 secs (user 1.070+22.280 sys 0.170+2.120)
real 7.470 secs (user 1.120+22.390 sys 0.120+1.910)

clearcache (the other series):
real 7.680 secs (user 1.120+23.420 sys 0.140+1.970)
real 7.670 secs (user 1.110+23.620 sys 0.130+1.810)
real 7.740 secs (user 1.090+23.510 sys 0.160+1.940)

enable cache only for vfs and svfs (this series):
real 8.730 secs (user 1.500+25.190 sys 0.260+2.260)
real 8.750 secs (user 1.490+25.170 sys 0.250+2.340)
real 9.010 secs (user 1.680+25.340 sys 0.280+2.540)

remove cache function at all (for reference):
real 9.620 secs (user 1.440+27.120 sys 0.250+2.980)
real 9.420 secs (user 1.400+26.940 sys 0.320+3.130)
real 9.760 secs (user 1.530+27.270 sys 0.250+2.970)
2017-07-26 22:10:15 +09:00
Gregory Szorc
8509056f34 sparse: add a requirement when a repository uses sparse (BC)
The presence of a sparse checkout can confuse legacy clients or
clients without sparse enabled for reasons that should be obvious.

This commit introduces a new repository requirement that tracks
whether sparse is enabled. The requirement is added when a sparse
config is activated and removed when the sparse config is reset.

The localrepository constructor has been taught to not open repos
with this requirement unless the sparse feature is enabled. It yields
a more actionable error message than what you would get if the
lockout were handled strictly at the requirements verification phase.
Old clients that aren't sparse aware will see the generic
"repository requires features unknown to this Mercurial" error,
however.

The new requirement has "exp" in its name to reflect the
experimental nature of sparse. There's a chance that the eventual
non-experimental feature won't change significantly and we could
have squatted on the "sparse" requirement without ill effect. If
that happens, we can teach new clients to still recognize the old
name. But I suspect we'll sneak in some BC and we'll want a new
requirement to convey new meaning.

Differential Revision: https://phab.mercurial-scm.org/D110
2017-07-17 11:45:38 -07:00
Boris Feld
68ddce738e transaction-summary: display the summary for all transactions
Now that we records "all" changes happening in a transaction (in tr.changes)
we will be able to provide better report on various changes (phases turned
public, changeset obsoleted, branch merged or created, etc..)

This is far too late in the cycle to play with this, but having this existing
method called more widely will help extensions to play around with various
options during the 4.4 cycle.

Instead of calling registersummarycallback only for transactions we want, we
always call it and use the transaction name to decide when to report (eg: we
do not want `hg amend` to report new obsoleted changesets). Filtering on
transaction name does not seems great, but seems good enough for the moment.
We can change the API during the next cycle.

The previous manual call during unbundling of the bundle2 "obsmarkers" part is
no longer necessary and has been dropped.
2017-07-16 02:20:06 +02:00
Boris Feld
838490c3d7 share: share 'cachevfs' with the source clone (issue5108)
Share extension now also share caches reads and writes. Not sharing caches
results in costly caches recomputations which can takes up to minutes when
using shares on large repositories.

There are a couple of file in the '.hg/cache/' that depends of the current
visibility. Visibility can be affected by the working copy location, something
which is specific to each share. We ignores them for this series because they:

* are the minority,
* already have a good fallback to other precomputed caches,
* are only affected when people use the experimental evolution feature.
2017-07-15 23:49:22 +02:00
Boris Feld
7c41aa9dd4 cachevfs: add a devel warning for cache access though 'vfs'
This will help third party extensions to migrate to the new 'cachevfs'.
2017-07-15 23:05:15 +02:00
Boris Feld
7f091f15a2 cachevfs: add a vfs dedicated to cache
Most of the cache content lives in '.hg/cache/'. Moreover they are computed
exclusively from data in the '.hg/store' directory. This creates issues with
the share extension as the '.hg/store' directory is shared but the '.hg/cache'
is not. On large repositories, this makes this prevent some usage of the share
extension inefficient as some caches can take minutes to be recomputed.

To improve the situation, we introduce a new 'cachevfs' that will be dedicated
to cache reading and writing. In the next patches of this series, we'll
migrate the 4 existing caches to it and update the share extension.
2017-07-15 23:05:04 +02:00
Boris Feld
4d593bc1e9 vfsward: register 'write with no lock' warnings as 'check-locks' config
Update 'write with no lock' warnings in order to be better controlled by the
config.  We reuse the option used for lock order for these other lock related
message.

The message can now be disabled using 'devel.check-locks = no' (in addition to
the usual 'devel.all-warnings = no').
2017-07-15 22:40:51 +02:00
Boris Feld
54f5d4d490 bookmark: track bookmark changes at the transaction level
The transaction has now a 'bookmarks' dictionary in tr.changes. The structure
of the dictionary is {BOOKMARK_NAME: (OLD_NODE, NEW_NODE)}. If a bookmark is
deleted NEW_NODE will be None. If a bookmark is created OLD_NODE will be None.

If the bookmark is updated multiple time, the initial value is preserved.
2017-07-10 20:26:53 +02:00
Martin von Zweigbergk
61c94e060a repo: skip invalidation of changelog if it has 'delayed' changes (API)
The changelog object can store recently added revisions in memory
until the transaction is committed. We don't want to lose those
changes even if repo.invalidate(clearfilecache=True), so let's skip
the changelog when it has such 'delayed' changes.

Differential Revision: https://phab.mercurial-scm.org/D152
2017-07-19 13:34:06 -07:00
Gregory Szorc
a5fcc9d7f8 localrepo: remove unused requirements attributes on localpeer (API)
The previous changeset removed the last consumer of requirements. I'm
not sure when supportedformats became unused. But I couldn't find
any obvious instances where it is being used. It likely stems from
peers being derived from repository instances several years ago and
is a holdover from that day.

Differential Revision: https://phab.mercurial-scm.org/D266
2017-08-07 20:17:02 -07:00
Gregory Szorc
59e773f0f6 exchange: drop support for lock-based unbundling (BC)
Locking over the wire protocol and the "addchangegroup" wire
protocol command has been deprecated since f8e443eb02c9, which was
first part of Mercurial 0.9.1.

Support for handling these commands from sshserver was dropped in
93297d5f4df2 in 2015, effectively locking out pre 0.9.1 clients
from new servers.

However, client-side code for calling lock and addchangegroup is
still present in exchange.py and the various peer classes to
facilitate pushing to pre 0.9.1 servers.

The lock-based pushing mechanism is extremely brittle. 0.9.1 was
released in July 2006 and I highly doubt anyone is still running
such an ancient version of Mercurial on a server. I'm about to
refactor the peer API and I don't think it is worth keeping
support for this ancient protocol feature. So, this commit removes
client support for the lock-based pushing mechanism. This means
modern clients will no longer be able to push to pre 0.9.1 servers.

Differential Revision: https://phab.mercurial-scm.org/D264
2017-08-06 17:44:56 -07:00
Jun Wu
e47f7dc2fa codemod: register core configitems using a script
This is done by a script [2] using RedBaron [1], a tool designed for doing
code refactoring. All "default" values are decided by the script and are
strongly consistent with the existing code.

There are 2 changes done manually to fix tests:

  [warn] mercurial/exchange.py: experimental.bundle2-output-capture: default needs manual removal
  [warn] mercurial/localrepo.py: experimental.hook-track-tags: default needs manual removal

Since RedBaron is not confident about how to indent things [2].

[1]: https://github.com/PyCQA/redbaron
[2]: https://github.com/PyCQA/redbaron/issues/100
[3]:

#!/usr/bin/env python
# codemod_configitems.py - codemod tool to fill configitems
#
# Copyright 2017 Facebook, Inc.
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.
from __future__ import absolute_import, print_function

import os
import sys

import redbaron

def readpath(path):
    with open(path) as f:
        return f.read()

def writepath(path, content):
    with open(path, 'w') as f:
        f.write(content)

_configmethods = {'config', 'configbool', 'configint', 'configbytes',
                  'configlist', 'configdate'}

def extractstring(rnode):
    """get the string from a RedBaron string or call_argument node"""
    while rnode.type != 'string':
        rnode = rnode.value
    return rnode.value[1:-1]  # unquote, "'str'" -> "str"

def uiconfigitems(red):
    """match *.ui.config* pattern, yield (node, method, args, section, name)"""
    for node in red.find_all('atomtrailers'):
        entry = None
        try:
            obj = node[-3].value
            method = node[-2].value
            args = node[-1]
            section = args[0].value
            name = args[1].value
            if (obj in ('ui', 'self') and method in _configmethods
                and section.type == 'string' and name.type == 'string'):
                entry = (node, method, args, extractstring(section),
                         extractstring(name))
        except Exception:
            pass
        else:
            if entry:
                yield entry

def coreconfigitems(red):
    """match coreconfigitem(...) pattern, yield (node, args, section, name)"""
    for node in red.find_all('atomtrailers'):
        entry = None
        try:
            args = node[1]
            section = args[0].value
            name = args[1].value
            if (node[0].value == 'coreconfigitem' and section.type == 'string'
                and name.type == 'string'):
                entry = (node, args, extractstring(section),
                         extractstring(name))
        except Exception:
            pass
        else:
            if entry:
                yield entry

def registercoreconfig(cfgred, section, name, defaultrepr):
    """insert coreconfigitem to cfgred AST

    section and name are plain string, defaultrepr is a string
    """
    # find a place to insert the "coreconfigitem" item
    entries = list(coreconfigitems(cfgred))
    for node, args, nodesection, nodename in reversed(entries):
        if (nodesection, nodename) < (section, name):
            # insert after this entry
            node.insert_after(
                'coreconfigitem(%r, %r,\n'
                '    default=%s,\n'
                ')' % (section, name, defaultrepr))
            return

def main(argv):
    if not argv:
        print('Usage: codemod_configitems.py FILES\n'
              'For example, FILES could be "{hgext,mercurial}/*/**.py"')
    dirname = os.path.dirname
    reporoot = dirname(dirname(dirname(os.path.abspath(__file__))))

    # register configitems to this destination
    cfgpath = os.path.join(reporoot, 'mercurial', 'configitems.py')
    cfgred = redbaron.RedBaron(readpath(cfgpath))

    # state about what to do
    registered = set((s, n) for n, a, s, n in coreconfigitems(cfgred))
    toregister = {} # {(section, name): defaultrepr}
    coreconfigs = set() # {(section, name)}, whether it's used in core

    # first loop: scan all files before taking any action
    for i, path in enumerate(argv):
        print('(%d/%d) scanning %s' % (i + 1, len(argv), path))
        iscore = ('mercurial' in path) and ('hgext' not in path)
        red = redbaron.RedBaron(readpath(path))
        # find all repo.ui.config* and ui.config* calls, and collect their
        # section, name and default value information.
        for node, method, args, section, name in uiconfigitems(red):
            if section == 'web':
                # [web] section has some weirdness, ignore them for now
                continue
            defaultrepr = None
            key = (section, name)
            if len(args) == 2:
                if key in registered:
                    continue
                if method == 'configlist':
                    defaultrepr = 'list'
                elif method == 'configbool':
                    defaultrepr = 'False'
                else:
                    defaultrepr = 'None'
            elif len(args) >= 3 and (args[2].target is None or
                                     args[2].target.value == 'default'):
                # try to understand the "default" value
                dnode = args[2].value
                if dnode.type == 'name':
                    if dnode.value in {'None', 'True', 'False'}:
                        defaultrepr = dnode.value
                elif dnode.type == 'string':
                    defaultrepr = repr(dnode.value[1:-1])
                elif dnode.type in ('int', 'float'):
                    defaultrepr = dnode.value
            # inconsistent default
            if key in toregister and toregister[key] != defaultrepr:
                defaultrepr = None
            # interesting to rewrite
            if key not in registered:
                if defaultrepr is None:
                    print('[note] %s: %s.%s: unsupported default'
                          % (path, section, name))
                    registered.add(key) # skip checking it again
                else:
                    toregister[key] = defaultrepr
                    if iscore:
                        coreconfigs.add(key)

    # second loop: rewrite files given "toregister" result
    for path in argv:
        # reconstruct redbaron - trade CPU for memory
        red = redbaron.RedBaron(readpath(path))
        changed = False
        for node, method, args, section, name in uiconfigitems(red):
            key = (section, name)
            defaultrepr = toregister.get(key)
            if defaultrepr is None or key not in coreconfigs:
                continue
            if len(args) >= 3 and (args[2].target is None or
                                   args[2].target.value == 'default'):
                try:
                    del args[2]
                    changed = True
                except Exception:
                    # redbaron fails to do the rewrite due to indentation
                    # see https://github.com/PyCQA/redbaron/issues/100
                    print('[warn] %s: %s.%s: default needs manual removal'
                          % (path, section, name))
            if key not in registered:
                print('registering %s.%s' % (section, name))
                registercoreconfig(cfgred, section, name, defaultrepr)
                registered.add(key)
        if changed:
            print('updating %s' % path)
            writepath(path, red.dumps())

    if toregister:
        print('updating configitems.py')
        writepath(cfgpath, cfgred.dumps())

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
2017-07-14 14:22:40 -07:00
Boris Feld
9d590cdd8f localrepo: use the 'registernew' function to set the phase of new commit 2017-07-11 01:05:27 +02:00
Boris Feld
794ac05dda phases: track phase movements in 'advanceboundary'
Makes advanceboundary record the phase movement of affected revisions in
tr.changes['phases'].

The tracking is not usable yet because the 'retractboundary' function can also
affect phases.

We'll improve that in the coming changesets.
2017-07-11 02:39:52 +02:00
Adam Simpkins
2a467faccd dirstate: update backup functions to take full backup filename
Update the dirstate functions so that the caller supplies the full backup
filename rather than just a prefix and suffix.

The localrepo code was already hard-coding the fact that the backup name must
be (exactly prefix + "dirstate" + suffix): it relied on this in _journalfiles()
and undofiles().  Making the caller responsible for specifying the full backup
name removes the need for the localrepo code to assume that dirstate._filename
is always "dirstate".

Differential Revision: https://phab.mercurial-scm.org/D68
2017-07-12 15:24:07 -07:00
Boris Feld
ca27090453 reposvfs: add a ward to check if locks are properly taken
we wrap 'repo.svfs.audit' to check for the store lock when accessing file in
'.hg/store' for writing. This caught a couple of instance where the transaction
was released after the lock, we should probably have a dedicated checker for
that case.
2016-08-08 18:14:42 +02:00
Boris Feld
c9fe43d98b repovfs: add a ward to check if locks are properly taken
When the appropriate developer warnings are enabled, We wrap 'repo.vfs.audit' to
check for locks when accessing file in '.hg' for writing. Another changeset will
add a 'ward' for the store vfs (svfs).

This check system has caught a handful of locking issues that have been fixed
in previous series (mostly in 4.0). I expect another batch to be caught in third
party extensions.

We introduce two real exceptions from extensions 'blackbox.log' (because a lot of
read-only operations add entry to it), and 'last-email.txt' (because 'hg email'
is currently a read only operation and there is value to keep it this way).

In addition we are currently allowing bisect to operate outside of the lock
because the current code is a bit hard to get properly locked for now. Multiple
clean up have been made but there is still a couple of them to do and the freeze
is coming.
2017-07-11 12:38:17 +02:00
Martin von Zweigbergk
bf2a3c6ad8 py3: make localrepo filtered repo cache work on py3
I don't know if this is the right fix, but it makes
test-py3-commands.t pass again.

Differential Revision: https://phab.mercurial-scm.org/D56
2017-07-11 11:21:04 -07:00
Gregory Szorc
c0447df5a2 localrepo: cache types for filtered repos (issue5043)
Python introduces a reference cycle on dynamically created types
via __mro__, making them very easy to leak. See
https://bugs.python.org/issue17950.

Previously, repo.filtered() created a type on every invocation.
Long-running processes (like `hg convert`) could call this
function thousands of times, leading to a steady memory leak.

Since we're Unable to stop the leak because this is a bug in
Python, the next best thing is to contain it.

This patch adds a cache of of the dynamically generated repoview/filter
types on the localrepo object. Since we only generate each type
once, we cap the amount of memory that can leak to something
reasonable.

After this change, `hg convert` no longer leaks memory on every
revision. The process will likely grow memory usage over time due
to e.g. larger manifests. But there are no leaks.
2017-07-01 20:51:19 -07:00
FUJIWARA Katsunori
63fd3449f5 localrepo: add isfilecached to check filecache-ed property is already cached
isfilecached() encapsulates internal implementation of filecache-ed
property.

"name in repo.unfiltered().__dict__" or so can't be used for this
purpose, because corresponded entry in __dict__ might be discarded by
repo.invalidate(), repo.invalidatedirstate() or so (fsmonitor does so,
for example).

This patch makes isfilecached() return not only whether filecache-ed
property is already cached, but also already cached value (or None),
in order to avoid subsequent access to cached object via "repo.NAME",
which prevents main Mercurial procedure after reposetup() from
validating cache.
2017-07-10 23:09:51 +09:00
Gregory Szorc
a7c49e2ec2 dirstate: expose a sparse matcher on dirstate (API)
The sparse extension performs a lot of monkeypatching of dirstate
to make it sparse aware. Essentially, various operations need to
take the active sparse config into account. They do this by obtaining
a matcher representing the sparse config and filtering paths through
it.

The monkeypatching is done by stuffing a reference to a repo on
dirstate and calling sparse.matcher() (which takes a repo instance)
during each function call. The reason this function takes a repo
instance is because resolving the sparse config may require resolving
file contents from filelogs, and that requires a repo. (If the
current sparse config references "profile" files, the contents of
those files from the dirstate's parent revisions is resolved.)

I seem to recall people having strong opinions that the dirstate
object not have a reference to a repo. So copying what the sparse
extension does probably won't fly in core. Plus, the dirstate
modifications shouldn't require a full repo: they only need a matcher.
So there's no good reason to stuff a reference to the repo in
dirstate.

This commit exposes a sparse matcher to dirstate via a property that
when looked up will call a function that eventually calls
sparse.matcher(). The repo instance is bound in a closure, so it
isn't exposed to dirstate.

This approach is functionally similar to what the sparse extension does
today, except it hides the repo instance from dirstate. The approach
is not optimal because we have to call a proxy function and
sparse.matcher() on every property lookup. There is room to cache
the matcher instance in dirstate. After all, the matcher only changes
if the dirstate's parents change or if the sparse config changes. It
feels like we should be able to detect both events and update the
matcher when this occurs. But for now we preserve the existing
semantics so we can move the dirstate sparseness bits into core. Once
in core, refactoring becomes a bit easier since it will be clearer how
all these components interact.

The sparse extension has been updated to use the new property.
Because all references to the repo on dirstate have been removed,
the code for setting it has been removed.
2017-07-08 16:18:04 -07:00
Jun Wu
f50841989e revset: make repo.anyrevs accept customized alias override (API)
Previously repo.anyrevs only expand aliases in [revsetalias] config. This
patch makes it more flexible to accept a customized dict defining aliases
without having to couple with ui.

revsetlang.expandaliases now has the signature (tree, aliases, warn=None)
which is more consistent with templater.expandaliases. revsetlang.py is now
free from "ui", which seems to be a good thing.
2017-06-24 15:29:42 -07:00
Gregory Szorc
0cd417305b localrepo: add sparse caches
The sparse extension maintains caches for the sparse files
to a signature and a signature to a matcher. This allows the
sparse matchers to be resolved quickly, which is apparently
something that can occur in loops.

This patch ports the sparse caches to the localrepo class
pretty much as-is. There is potentially room to improve the
caching mechanism. But that can be done as a follow-up.

The default invalidatecaches() now clears the relevant sparse
cache. invalidatesignaturecache() has been moved to sparse.py.
2017-07-06 12:20:53 -07:00
FUJIWARA Katsunori
ba85a5641c transaction: avoid file stat ambiguity only for files in blacklist
Advancing mtime by os.utime() fails for EPERM, if the target file is
owned by another. 0d920bcb0fd1 and related changes made some code
paths give advancing mtime up in such case, to fix issue5418.

This causes file stat ambiguity (again), if it is owned by another.

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan

To avoid file stat ambiguity in such case, especially for
.hg/dirstate, c75c7b3e3284 made vfs.rename() copy the target file, and
advance mtime of renamed one again, if EPERM (see issue5584 for detail).

But straightforward "copy if EPERM" isn't reasonable for truncation of
append-only files at rollbacking, because rollbacking might cost much
for truncation of many filelogs, even though filelogs aren't
filecache-ed.

Therefore, this patch introduces blacklist "checkambigfiles", and
avoids file stat ambiguity only for files specified in this blacklist.

This patch consists of two parts below, which should be applied at
once in order to avoid regression.

  - specify 'checkambig=True' at vfs.open(mode='a') in _playback()
    according to checkambigfiles

  - invoke _playback() with checkambigfiles
    - add transaction.__init__() checkambigfiles argument, for _abort()
      - make localrepo instantiate transaction with _cachedfiles
    - add rollback() checkambigfiles argument, for "hg rollback/recover"
      - make localrepo invoke rollback() with _cachedfiles

After this patch, straightforward "copy if EPERM" will be reasonable
at closing the file opened with checkambig=True, because this policy
is applied only on files, which are listed in blacklist
"checkambigfiles".
2017-07-04 23:13:46 +09:00
FUJIWARA Katsunori
749a6e710f localrepo: store path and vfs location of cached properties
This information is used to make transaction handle these files
specially, in order to avoid file stat ambiguity of them.

Gathering information about cached files via annotation classes can
avoid overlooking properties newly introduced in the future.
2017-07-04 23:13:46 +09:00
Pierre-Yves David
c13c22104d auditor: add simple comment about repo.auditor and al
Every once in a while, I get confused by what these are. Let us add a comment.
2017-07-02 02:19:05 +02:00
Pierre-Yves David
8093995fed transaction: track new obsmarkers in the 'changes' mapping
The obsstore collaborate with transaction to make sure we track all the
obsmarkers added during a transaction. This will be useful for various usages:
hooks, caches, better output, etc.

This is the seconds kind of data added to tr.changes (first one was added revisions)
2017-06-27 02:45:09 +02:00
Pierre-Yves David
d708a01e9b configitems: register the 'format.usestore' config 2017-06-30 03:42:30 +02:00
Pierre-Yves David
ef58ebbe2d configitems: register the 'format.usefncache' config 2017-06-30 03:42:28 +02:00
Pierre-Yves David
66973f9ece configitems: register the 'format.dotencode' config 2017-06-30 03:42:22 +02:00
Pierre-Yves David
6f55ce8db4 configitems: register the 'format.aggressivemergedeltas' config 2017-06-30 03:42:20 +02:00
Pierre-Yves David
7c5463c25b revlog: add an experimental option to mitigated delta issues (issue5480)
The general delta heuristic to select a delta do not scale with the number of
branch. The delta base is frequently too far away to be able to reuse a chain
according to the "distance" criteria. This leads to insertion of larger delta (or
even full text) that themselves push the bases for the next delta further away
leading to more large deltas and full texts. This full text and frequent
recomputation throw Mercurial performance in disarray.

For example of a slightly large repository

  280 000 files (2 150 000 versions)
  430 000 changesets (10 000 topological heads)

Number below compares repository with and without the distance criteria:

manifest size:
    with:    21.4 GB
    without:  0.3 GB

store size:
    with:    28.7 GB
    without   7.4 GB

bundle last 15 00 revisions:
    with:    800 seconds
             971 MB
    without:  50 seconds
              73 MB

unbundle time (of the last 15K revisions):
    with:    1150 seconds (~19 minutes)
    without:   35 seconds

Similar issues has been observed in other repositories.


Adding a new option or "feature" on stable is uncommon. However, given that this
issues is making Mercurial practically unusable, I'm exceptionally targeting
this patch for stable.

What is actually needed is a full rework of the delta building and reading
logic. However, that will be a longer process and churn not suitable for stable.

In the meantime, we introduces a quick and dirty mitigation of this in the
'experimental' config space. The new option introduces a way to set the maximum
amount of memory usable to store a diff in memory. This extend the ability for
Mercurial to create chains without removing all safe guard regarding memory
access. The option should be phased out when core has a more proper solution
available.

Setting the limit to '0' remove all limits, setting it to '-1' use the default
limit (textsize x 4).
2017-06-23 13:49:34 +02:00
FUJIWARA Katsunori
0f2623ec07 localrepo: factor out base of filecache annotation class
It isn't needed that storecache is derived from repofilecache.

Changes in this patch allow repofilecache and storecache to do in own
__init__() differently from each other.
2017-06-30 01:47:49 +09:00
Pulkit Goyal
d1e9e38065 py3: use '%d' instead of '%s' for integers
Python 3 does not let you use '%s' for integers.
2017-06-17 14:53:25 +05:30
Martin von Zweigbergk
d75cb87451 localrepo: remove unused addchangegroup() (API)
This completes the cleanup started in f1b3c9ce0ce7 (localrepo: move
the addchangegroup method in changegroup module, 2014-04-01).
2017-06-15 15:13:18 -07:00
Siddharth Agarwal
f23bf55820 workingctx: add a way for extensions to run code at status fixup time
Some extensions like fsmonitor need to run code after dirstate.status is
called, but while the wlock is held. The extensions could grab the wlock again,
but that has its own peculiar race issues. For example, fsmonitor would not
like its state to be written out if the dirstate has changed underneath (see
issue5581 for what can go wrong in that sort of case).

To protect against these sorts of issues, allow extensions to declare that they
would like to run some code to run at fixup time.

fsmonitor will switch to using this in the next patch in the series.
2017-06-12 13:56:50 -07:00
Gregory Szorc
2c7b5a6b43 localrepo: move filtername to __init__
This is obviously an instance attribute, not a type attribute. The
modern Python style is to use __init__ for defining these.

This exposes statichttprepo as inheriting from localrepository
without calling its __init__. As a result, its __init__ defines
a lot of variables that methods on localrepository's methods need.
But factoring the common bits into a separate class is for another
day.
2017-06-08 23:23:37 -07:00
Gregory Szorc
943d55015e obsolete: move obsstore creation logic from localrepo
This code has more to do with obsolete.py than localrepo.py. Let's
move it there.
2017-06-08 21:54:30 -07:00
Gregory Szorc
efbb740737 revlog: skeleton support for version 2 revlogs
There are a number of improvements we want to make to revlogs
that will require a new version - version 2. It is unclear what the
full set of improvements will be or when we'll be done with them.
What I do know is that the process will likely take longer than a
single release, will require input from various stakeholders to
evaluate changes, and will have many contentious debates and
bikeshedding.

It is unrealistic to develop revlog version 2 up front: there
are just too many uncertainties that we won't know until things
are implemented and experiments are run. Some changes will also
be invasive and prone to bit rot, so sitting on dozens of patches
is not practical.

This commit introduces skeleton support for version 2 revlogs in
a way that is flexible and not bound by backwards compatibility
concerns.

An experimental repo requirement for denoting revlog v2 has been
added. The requirement string has a sub-version component to it.
This will allow us to declare multiple requirements in the course
of developing revlog v2. Whenever we change the in-development
revlog v2 format, we can tweak the string, creating a new
requirement and locking out old clients. This will allow us to
make as many backwards incompatible changes and experiments to
revlog v2 as we want. In other words, we can land code and make
meaningful progress towards revlog v2 while still maintaining
extreme format flexibility up until the point we freeze the
format and remove the experimental labels.

To enable the new repo requirement, you must supply an experimental
and undocumented config option. But not just any boolean flag
will do: you need to explicitly use a value that no sane person
should ever type. This is an additional guard against enabling
revlog v2 on an installation it shouldn't be enabled on. The
specific scenario I'm trying to prevent is say a user with a
4.4 client with a frozen format enabling the option but then
downgrading to 4.3 and accidentally creating repos with an
outdated and unsupported repo format. Requiring a "challenge"
string should prevent this.

Because the format is not yet finalized and I don't want to take
any chances, revlog v2's version is currently 0xDEAD. I figure
squatting on a value we're likely never to use as an actual revlog
version to mean "internal testing only" is acceptable. And
"dead" is easily recognized as something meaningful.

There is a bunch of cleanup that is needed before work on revlog
v2 begins in earnest. I plan on doing that work once this patch
is accepted and we're comfortable with the idea of starting down
this path.
2017-05-19 20:29:11 -07:00
Yuya Nishihara
e6297851af localrepo: map integer and hex wdir identifiers to workingctx
changectx.__init__() is slightly modified to take str(wdirrev) as a valid
integer revision (and raise WdirUnsupported exception.)

Test will be added by the next patch.
2016-08-19 18:40:35 +09:00
Yuya Nishihara
75533e6603 localrepo: document that __contains__() may raise LookupError 2017-05-25 23:18:02 +09:00
Pierre-Yves David
c1ca9ad6ee transaction: run _writejournal unfiltered
The function use the length of the repository, something affected by filtering.
It seems better to use the unfiltered length here.

Credit for finding this goes to Durham Goode.
2017-05-25 01:45:52 +02:00
Augie Fackler
39eba5889f localrepo: extract bookmarkheads method to bookmarks.py
This method is only used internally by destutil, and it's obscure
enough I'm willing to just move it without a deprecation warning,
especially since the new method has more constrained functionality.

Design-wise I'd also like to get active bookmark handling folded into
the bookmark store, so that we don't squirrel away an extra attribute
for the active bookmark on the repository object.
2017-05-18 16:43:56 -04:00
Augie Fackler
33cedfa925 localrepo: mark walk convenience method as deprecated (API) 2017-05-18 18:01:48 -04:00