Commit Graph

122 Commits

Author SHA1 Message Date
Durham Goode
26f64a8aa2 hg: get rid of len(revlog) requirement in verify
Summary:
Verify had some logic that checked the length of the changelog and
manifest to decide if either existed. This allowed for simplifying certain error
messages (like not reporting all the broken changelog manifest pointers if the
manifest was simply gone, and just reporting the manifest was gone).
Unfortunately, in future changelog and manifest implementations len() will be an
expensive function, so let's just get rid of that optimization.

This fixes hg verify in a treeonly repository.

Reviewed By: quark-zju

Differential Revision: D7127168

fbshipit-source-id: 8ddc3dfe3c3c913efd4b7af5fc9715a3e48b60a1
2018-04-13 21:51:22 -07:00
Kaley Huang
6425129a45 verify: add --rev support
Summary:
Currently, commits adding LFS files are not verified server-side since the LFS
extension explicitly skips hash checking during unbundle, to avoid overhead
downloading/reading LFS objects.

We'd like those commit hashes to be verified.  However, the existing verify
command does not scale with a huge repo.  So let's add `verify -r REV` support
to be able to incrementally verify a repo.

Reviewed By: quark-zju

Differential Revision: D6947227

fbshipit-source-id: 6ffda3a814822942630339fc7de62c3fcb284fda
2018-04-13 21:51:12 -07:00
Phil Cohen
0584f5d23f hg: fastverify: unify and fold into core
Summary: `fastverifier` was sometimes being overriden by `shallowverifier` when remotefilelog was enabled. Since the latter is a subset of the former, let's just fold both into the core verifier code backed by a config, `verify.skipmanifests`, that we can default to true.

Reviewed By: DurhamG

Differential Revision: D6882222

fbshipit-source-id: 9f337ca031a070425ccdc9ee02f6765e68436da9
2018-04-13 21:51:03 -07:00
Zhihui Huang
5518fa1a0b Backed out changeset 4dd462c5c1c3 2018-01-03 05:35:56 -08:00
Zhihui Huang
274bc5fb01 verify: add --rev support
Differential Revision: https://phabricator.intern.facebook.com/D6558110
2018-01-03 05:35:56 -08:00
Jun Wu
e47f7dc2fa codemod: register core configitems using a script
This is done by a script [2] using RedBaron [1], a tool designed for doing
code refactoring. All "default" values are decided by the script and are
strongly consistent with the existing code.

There are 2 changes done manually to fix tests:

  [warn] mercurial/exchange.py: experimental.bundle2-output-capture: default needs manual removal
  [warn] mercurial/localrepo.py: experimental.hook-track-tags: default needs manual removal

Since RedBaron is not confident about how to indent things [2].

[1]: https://github.com/PyCQA/redbaron
[2]: https://github.com/PyCQA/redbaron/issues/100
[3]:

#!/usr/bin/env python
# codemod_configitems.py - codemod tool to fill configitems
#
# Copyright 2017 Facebook, Inc.
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.
from __future__ import absolute_import, print_function

import os
import sys

import redbaron

def readpath(path):
    with open(path) as f:
        return f.read()

def writepath(path, content):
    with open(path, 'w') as f:
        f.write(content)

_configmethods = {'config', 'configbool', 'configint', 'configbytes',
                  'configlist', 'configdate'}

def extractstring(rnode):
    """get the string from a RedBaron string or call_argument node"""
    while rnode.type != 'string':
        rnode = rnode.value
    return rnode.value[1:-1]  # unquote, "'str'" -> "str"

def uiconfigitems(red):
    """match *.ui.config* pattern, yield (node, method, args, section, name)"""
    for node in red.find_all('atomtrailers'):
        entry = None
        try:
            obj = node[-3].value
            method = node[-2].value
            args = node[-1]
            section = args[0].value
            name = args[1].value
            if (obj in ('ui', 'self') and method in _configmethods
                and section.type == 'string' and name.type == 'string'):
                entry = (node, method, args, extractstring(section),
                         extractstring(name))
        except Exception:
            pass
        else:
            if entry:
                yield entry

def coreconfigitems(red):
    """match coreconfigitem(...) pattern, yield (node, args, section, name)"""
    for node in red.find_all('atomtrailers'):
        entry = None
        try:
            args = node[1]
            section = args[0].value
            name = args[1].value
            if (node[0].value == 'coreconfigitem' and section.type == 'string'
                and name.type == 'string'):
                entry = (node, args, extractstring(section),
                         extractstring(name))
        except Exception:
            pass
        else:
            if entry:
                yield entry

def registercoreconfig(cfgred, section, name, defaultrepr):
    """insert coreconfigitem to cfgred AST

    section and name are plain string, defaultrepr is a string
    """
    # find a place to insert the "coreconfigitem" item
    entries = list(coreconfigitems(cfgred))
    for node, args, nodesection, nodename in reversed(entries):
        if (nodesection, nodename) < (section, name):
            # insert after this entry
            node.insert_after(
                'coreconfigitem(%r, %r,\n'
                '    default=%s,\n'
                ')' % (section, name, defaultrepr))
            return

def main(argv):
    if not argv:
        print('Usage: codemod_configitems.py FILES\n'
              'For example, FILES could be "{hgext,mercurial}/*/**.py"')
    dirname = os.path.dirname
    reporoot = dirname(dirname(dirname(os.path.abspath(__file__))))

    # register configitems to this destination
    cfgpath = os.path.join(reporoot, 'mercurial', 'configitems.py')
    cfgred = redbaron.RedBaron(readpath(cfgpath))

    # state about what to do
    registered = set((s, n) for n, a, s, n in coreconfigitems(cfgred))
    toregister = {} # {(section, name): defaultrepr}
    coreconfigs = set() # {(section, name)}, whether it's used in core

    # first loop: scan all files before taking any action
    for i, path in enumerate(argv):
        print('(%d/%d) scanning %s' % (i + 1, len(argv), path))
        iscore = ('mercurial' in path) and ('hgext' not in path)
        red = redbaron.RedBaron(readpath(path))
        # find all repo.ui.config* and ui.config* calls, and collect their
        # section, name and default value information.
        for node, method, args, section, name in uiconfigitems(red):
            if section == 'web':
                # [web] section has some weirdness, ignore them for now
                continue
            defaultrepr = None
            key = (section, name)
            if len(args) == 2:
                if key in registered:
                    continue
                if method == 'configlist':
                    defaultrepr = 'list'
                elif method == 'configbool':
                    defaultrepr = 'False'
                else:
                    defaultrepr = 'None'
            elif len(args) >= 3 and (args[2].target is None or
                                     args[2].target.value == 'default'):
                # try to understand the "default" value
                dnode = args[2].value
                if dnode.type == 'name':
                    if dnode.value in {'None', 'True', 'False'}:
                        defaultrepr = dnode.value
                elif dnode.type == 'string':
                    defaultrepr = repr(dnode.value[1:-1])
                elif dnode.type in ('int', 'float'):
                    defaultrepr = dnode.value
            # inconsistent default
            if key in toregister and toregister[key] != defaultrepr:
                defaultrepr = None
            # interesting to rewrite
            if key not in registered:
                if defaultrepr is None:
                    print('[note] %s: %s.%s: unsupported default'
                          % (path, section, name))
                    registered.add(key) # skip checking it again
                else:
                    toregister[key] = defaultrepr
                    if iscore:
                        coreconfigs.add(key)

    # second loop: rewrite files given "toregister" result
    for path in argv:
        # reconstruct redbaron - trade CPU for memory
        red = redbaron.RedBaron(readpath(path))
        changed = False
        for node, method, args, section, name in uiconfigitems(red):
            key = (section, name)
            defaultrepr = toregister.get(key)
            if defaultrepr is None or key not in coreconfigs:
                continue
            if len(args) >= 3 and (args[2].target is None or
                                   args[2].target.value == 'default'):
                try:
                    del args[2]
                    changed = True
                except Exception:
                    # redbaron fails to do the rewrite due to indentation
                    # see https://github.com/PyCQA/redbaron/issues/100
                    print('[warn] %s: %s.%s: default needs manual removal'
                          % (path, section, name))
            if key not in registered:
                print('registering %s.%s' % (section, name))
                registercoreconfig(cfgred, section, name, defaultrepr)
                registered.add(key)
        if changed:
            print('updating %s' % path)
            writepath(path, red.dumps())

    if toregister:
        print('updating configitems.py')
        writepath(cfgpath, cfgred.dumps())

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
2017-07-14 14:22:40 -07:00
Jun Wu
ac08ec9cb1 verify: add a config option to skip certain flag processors
Previously, "hg verify" verifies everything, which could be undesirable when
there are expensive flag processor contents.

This patch adds a "verify.skipflags" developer config. A flag processor will
be skipped if (flag & verify.skipflags) == 0.

In the LFS usecase, that means "hg verify --config verify.skipflags=8192"
will not download all LFS blobs, which could be too large to be stored
locally.

Note: "renamed" is also skipped since its default implementation may call
filelog.data() which will trigger the flag processor.
2017-05-14 09:38:06 -07:00
Jun Wu
319c497b3a verify: always check rawsize
Previously, verify only checks "rawsize == len(rawtext)", if
"len(fl.read()) != fl.size()".

With flag processor, "len(fl.read()) != fl.size()" does not necessarily mean
"rawsize == len(rawtext)". So we may miss a useful check.

This patch removes the "if len(fl.read()) != fl.size()" condition so the
rawsize check is always performed.

With the condition removed, "fl.read(n)" looks unnecessary so a comment was
added to explain the side effect is wanted.
2017-05-11 14:52:02 -07:00
Jun Wu
7d2129ac1d verify: fix length check
According to the document added above, we should check L1 == L2, and the
only way to get L1 in all cases is to call "rawsize()", and the only way to
get L2 is to call "revision(raw=True)". Therefore the fix.

Meanwhile there are still a lot of things about flagprocessor broken in
revlog.py. Tests will be added after revlog.py gets fixed.
2017-03-29 14:49:14 -07:00
Jun Wu
5de9c76dc1 verify: document corner cases
It seems a good idea to list all kinds of "surprises" and expected behavior
to make the upcoming changes easier to understand.

Note: the comment added does not reflect the actual behavior of the current
code.
2017-03-29 14:45:01 -07:00
Martin von Zweigbergk
f225c55b2a verify: replace _validpath() by matcher
The verifier calls out to _validpath() to check if it should verify
that path and the narrowhg extension overrides _validpath() to tell
the verifier to skip that path. In treemanifest repos, the verifier
calls the same method to check if it should visit a
directory. However, the decision to visit a directory is different
from the condition that it's a matching path, and narrowhg was working
around it by returning True from its _validpath() override if *either*
was true.

Similar to how one can do "hg files -I foo/bar/ -X foo/" (making the
include pointless), narrowhg can be configured to track the same
paths. In that case match("foo/bar/baz") would be false, but
match.visitdir("foo/bar/baz") turns out to be true, causing verify to
fail. This may seem like a bug in visitdir(), but it's explicitly
documented to be undefined for subdirectories of excluded
directories. When using treemanifests, the walk would not descend into
foo/, so verification would pass. However, when using flat manifests,
there is no recursive directory walk and the file path "foo/bar/baz"
would be passed to _validpath() without "foo/" (actually without the
slash) being passed first. As explained above, _validpath() would
return true for the file path and "hg verify" would fail.

Replacing the _validpath() method by a matcher seems like the obvious
fix. Narrowhg can then pass in its own matcher and not have to
conflate the two matching functions (for dirs and files). I think it
also makes the code clearer.
2017-01-23 10:48:55 -08:00
Augie Fackler
1a551b5433 verify: avoid shadowing two variables with a list comprehension
The variable names are clearly worse now, but since we're really just
transposing key and value I'm not too worried about the clarity loss.
2016-11-10 16:35:54 -05:00
Durham Goode
52b8095f37 manifest: remove last uses of repo.manifest
Now that all the functionality has been moved to manifestlog/manifestrevlog/etc,
we can finally change all the uses of repo.manifest to use the new versions. A
future diff will then delete repo.manifest.

One additional change in this commit is to change repo.manifestlog to be a
@storecache property instead of @property. This is required by some uses of
repo.manifest require that it be settable (contrib/perf.py and the static http
server). We can't do this in a prior change because we can't use @storecache on
this until repo.manifest is no longer used anywhere.
2016-11-10 02:13:19 -08:00
Durham Goode
d793e01462 manifest: remove manifest.readshallowdelta
This removes manifest.readshallowdelta and converts its one consumer to use
manifestlog instead.
2016-11-02 17:10:47 -07:00
Anton Shestakov
edb97f0e4a verify: specify unit for ui.progress when checking files 2016-03-11 20:18:41 +08:00
Martin von Zweigbergk
766e80bab5 verify: show progress while verifying dirlogs
In repos with treemanifests, the non-root-directory dirlogs often have
many more total revisions than the root manifest log has. This change
adds progress out to that part of 'hg verify'. Since the verification
is recursive along the directory tree, we don't know how many total
revisions there are at the beginning of the command, so instead we
report progress in units of directories, much like we report progress
for verification of files today.

I'm not very happy with passing both 'storefiles' and 'progress' into
the recursive calls. I tried passing in just a 'visitdir(dir)'
callback, but the results did not seem better overall. I'm happy to
update if anyone has better ideas.
2016-02-11 15:38:56 -08:00
Martin von Zweigbergk
45e493c761 verify: check for orphaned dirlogs
We already report orphaned filelogs, i.e. revlogs for files that are
not mentioned in any manifest. This change adds checking for orphaned
dirlogs, i.e. revlogs that are not mentioned in any parent-directory
dirlog.

Note that, for fncachestore, only files mentioned in the fncache are
considered, there's not check for files in .hg/store/meta that are not
mentioned in the fncache. This is no different from the current
situation for filelogs.
2016-02-03 15:35:15 -08:00
Martin von Zweigbergk
b2b4f9e694 verify: check directory manifests
In repos with treemanifests, there is no specific verification of
directory manifest revlogs. It simply collects all file nodes by
reading each manifest delta. With treemanifests, that's means calling
the manifest._slowreaddelta(). If there are missing revlog entries in
a subdirectory revlog, 'hg verify' will simply report the exception
that occurred while trying to read the root manifest:


  manifest@0: reading delta 1700e2e92882: meta/b/00manifest.i@67688a370455: no node

This patch changes the verify code to load only the root manifest at
first and verify all revisions of it, then verify all revisions of
each direct subdirectory, and so on, recursively. The above message
becomes

  b/@0: parent-directory manifest refers to unknown revision 67688a370455

Since the new algorithm reads a single revlog at a time and in order,
'hg verify' on a treemanifest version of the hg core repo goes from
~50s to ~14s. As expected, there is no significant difference on a
repo with flat manifests.
2016-02-07 21:13:24 -08:00
Martin von Zweigbergk
73fc56cffe verify: extract "manifest" constant into variable
The "manifest" label that's used in error messages will instead be the
directory path for subdirectory manifests (not the root manifest), so
let's extract the constant to a variable already to make future
patches simpler.
2016-02-03 15:53:48 -08:00
Martin von Zweigbergk
8868dab63d verify: use similar language for missing manifest and file revisions
When a changeset refers to a manifest revision that's not found in the
manifest log, we say "changeset refers to missing revision X", but
when a manifest refers to file revision that's not found in the
filelog, we say "X in manifests not found". The language used for
missing manifest revisions seems clearer, so let's use that for
missing filelog revisions too.
2016-02-07 22:46:20 -08:00
Martin von Zweigbergk
0e98867ba2 verify: include "manifest" prefix in a few more places
We include the "manifest" prefix on most other errors, so it seems
consistent to add them to the remaining messages too. Also, having the
"manifest" prefix will be more consistent with having the directory
prefix there when we add support for treemanifests. With the
"manifest" at the beginning, let's remove the now-redundant
"manifest" in the message itself.
2016-02-02 10:42:28 -08:00
Martin von Zweigbergk
bb06be61b6 verify: drop unnecessary check for nullid
In 1f64dad11884 (verify: filter messages about missing null manifests
(issue2900), 2011-07-13), we started ignoring nullid in the list of
manifest nodeids to check. Then, in 954b09a82a61 (verify: do not choke
on valid changelog without manifest, 2012-08-21), we stopped adding
nullid to the list to start with. So let's drop the left-over check
now.
2016-02-02 09:46:14 -08:00
Martin von Zweigbergk
d138e53179 verify: move cross-checking of changeset/manifest out of _crosscheckfiles()
Reasons:

 * _crosscheckfiles(), as the name suggests, is about checking that
   the set of files files mentioned in changesets match the set of
   files mentioned in the manifests.

 * The "checking" in _crosscheckfiles() looked rather strange, as it
   just emitted an error for *every* entry in mflinkrevs. The reason
   was that these were the entries remaining after the call to
   _verifymanifest(). Moving all the processing of mflinkrevs into
   _verifymanifest() makes it much clearer that it's the remaining
   entries that are a problem.

Functional change: progress is no longer reported for "crosschecking"
of missing manifest entries. Since the crosschecking phase takes a
tiny fraction of the verification, I don't think this is a
problem. Also, any reports of "changeset refers to unknown manifest"
will now come before "crosschecking files in changesets and
manifests".
2016-01-31 00:10:56 -08:00
Martin von Zweigbergk
e50c296659 treemanifests: fix streaming clone
Similar to the previous patch, the .hg/store/meta/ directory does not
get copied when when using "hg clone --uncompressed". Fix by including
"meta/" in store.datafiles(). This seems safe to do, as there are only
a few users of this method. "hg manifest" already filters the paths by
"data/" prefix. The calls from largefiles also seem safe. The use in
verify needs updating to prevent it from mistaking dirlogs for
orphaned filelogs. That change is included in this patch.

Since the dirlogs will now be in the fncache when using fncachestore,
let's also update debugrebuildfncache(). That will also allow any
existing treemanifest repos to get their dirlogs into the fncache.

Also update test-treemanifest.t to use an a directory name that
requires dot-encoding and uppercase-encoding so we test that the path
encoding works.
2016-02-04 08:34:07 -08:00
Martin von Zweigbergk
80df9c6505 verify: recover lost freeing of memory
In 0413f674179e (verify: move file cross checking to its own function,
2016-01-05), "mflinkrevs = None" was moved into function, so the
reference was cleared there, but the calling function now held on to
the variable. The point of clearing it was presumably to free up
memory, so let's move the clearing to the calling function where it
makes a difference. Also change "mflinkrevs = None" to "del
mflinkrevs", since the comment about scope now surely is obsolete.
2016-01-31 00:31:55 -08:00
Bryan O'Sullivan
1d0c7077f2 with: use context manager in verify 2016-01-15 13:14:49 -08:00
Martin von Zweigbergk
61149c5496 verify: replace "output parameters" by return values
_verifychangelog() and _verifymanifest() accept dictionaries that they
populate. We pass in empty dictionaries, so it's clearer to create
them in the functions and return them.
2016-01-05 21:25:51 -08:00
Durham Goode
20229ff6fc verify: get rid of some unnecessary local variables
Now that all the major functionality has been refactored out, we can delete some
unused local variables.
2016-01-05 17:08:14 -08:00
Durham Goode
b0d3f36941 verify: move changelog verificaiton to its own function
This makes verify more modular so extensions can hook into it.
2016-01-05 17:08:14 -08:00
Durham Goode
797ef3c770 verify: move manifest verification to its own function
This makes verify more modular, making it easier for extensions to extend.
2016-01-05 18:34:39 -08:00
Durham Goode
828f841525 verify: move file cross checking to its own function
This is part of making verify more modular so extensions can hook into it.
2016-01-05 18:31:51 -08:00
Durham Goode
04c50e28c4 verify: move filelog verification to its own function
This makes verify more modular so extensions can hook in more easily.
2016-01-05 18:28:46 -08:00
Durham Goode
6ce82430dc verify: move checkentry() to be a class function
This is part of making verify more modular so extensions can hook into it.
2016-01-05 17:08:14 -08:00
Durham Goode
14ee73b340 verify: move checklog() onto class
This is part of an effort to make verify more modular so extensions can hook
into it.
2016-01-05 17:08:14 -08:00
Matt Mackall
13d86f9294 verify: clean up weird error/warning lists
Nested functions in Python are not able to assign to variables in the
outer scope without something like the list trick because assignments
refer to the inner scope. So, we formerly used a list to give an
object to assign into.

Now that error and warning are object members, the [0] hack is no
longer needed.
2015-12-20 16:33:44 -06:00
Yuya Nishihara
ca75b0a3eb verify: remove unreachable code to reraise KeyboardInterrupt
KeyboardInterrupt should never be caught as it doesn't inherit Exception in
Python 2.5 or later. And if it was, "interrupted" would be printed twice.

https://docs.python.org/2.7/library/exceptions.html#exception-hierarchy
2015-12-20 18:38:21 +09:00
Durham Goode
1f8eab7615 verify: move exc() function onto class
This is part of an effort to make verify more modular so extensions can hook
into it.
2015-12-18 16:42:39 -08:00
Durham Goode
43f6f64559 verify: move err() to be a class function
This is part of an effort to make it easier for extensions to hook into verify.
2015-12-18 16:42:39 -08:00
Durham Goode
fc349ab32f verify: move warn() to a class level function
This is part of the effort to make verify more modular so extensions can hook
into it more easily.
2015-12-18 16:42:39 -08:00
Durham Goode
37c5c70102 verify: move fncachewarned up to a class variable
This is part of making verify more modular so hooks can extend it.
2015-12-18 16:42:39 -08:00
Durham Goode
81530280b2 verify: move widely used variables into class members
This will allow us to start moving some of the nested functions inside verify()
out onto the class.

This will allow extensions to hook into verify more easily.
2015-12-18 16:42:39 -08:00
Durham Goode
21a5133eda verify: move verify logic into a class
In order to allow extensions to hook into the verification logic more easily, we
need to refactor it into multiple functions. The first step is to move it to a
class so the shared state can be more easily accessed.
2015-12-18 16:42:39 -08:00
Augie Fackler
578db94dcb verify: add a hook that can let extensions manipulate file lists
Without a hook of this nature, narrowhg[0] clones always result in 'hg
verify' reporting terrible damage to the entire repository
history. With this hook, we can ignore files that aren't supposed to
be in the clone, and then get an accurate report of any damage present
(or not) in the repo.

0: https://bitbucket.org/Google/narrowhg
2015-11-04 12:14:18 -05:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Gregory Szorc
16bb1ed421 verify: use absolute_import 2015-08-08 18:48:10 -07:00
Matt Mackall
89c8dd4cd3 censor: mark experimental option 2015-06-25 17:56:26 -05:00
Gregory Szorc
5380dea2a7 global: mass rewrite to use modern exception syntax
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".

This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.

This patch was produced by running `2to3 -f except -w -n .`.
2015-06-23 22:20:08 -07:00
Gregory Szorc
8e1cdd21f4 verify: print hint to run debugrebuildfncache
Corrupt fncache is now a recoverable operation. Inform the user how to
recover from this warning.
2015-06-20 20:11:53 -07:00
Matt Mackall
fa83df39d1 verify: clarify misleading fncache message
This is a message about cache corruption, not repository corruption or
actually missing files. Fix message and reduce to a warning.
2015-06-19 12:00:06 -05:00
Matt Mackall
9fdd0e8abe verify: add a note about a paleo-bug
In the very early days of hg, it was possible to commit /dev/null because our
patch importer was too simple. Repos from this era may still
exist, add a note about why we ignore this name.
2015-03-27 15:13:21 -05:00