The usage of getattr was unsafe. Use hgutil.safehasattr instead.
util.safehasattr has been around since Mercurial 2.0.
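The idea behind safehasattr can be sketched as a getattr with a private sentinel default. The sentinel separates "attribute missing" from "attribute present but None or falsy", and getattr with a default only swallows AttributeError. This is an illustration of the technique, not a copy of Mercurial's source.

```python
_notset = object()  # private sentinel; no real attribute value can be this object


def safehasattr(thing, attr):
    """Return True if `thing` has attribute `attr`, even when its value is None or falsy."""
    return getattr(thing, attr, _notset) is not _notset


class Example(object):
    present_but_none = None


print(safehasattr(Example(), "present_but_none"))  # True
print(safehasattr(Example(), "missing"))           # False
```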
This also fixes the formerly disabled test in test-pull.t.
Previously we'd attempt to import every single reachable commit in the Git
object store.
The test adds another branch to the Git repo and doesn't import it until much
later. Previously we'd import it when we ran `hg -R hgrepo pull -r beta`. Now
we won't.
The return value as implemented in git_handler.fetch was pretty bogus. It used
to return the number of values that changed in the 'refs/heads/' namespace,
regardless of whether multiple values there pointed to the same Mercurial
commit, or whether particular heads were even imported. Fix all of that by
using the actual heads in the changelog, just like vanilla Mercurial.
The test output changes demonstrate examples where the code was buggy.
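Here is a sketch of a head-based return value. It assumes the convention vanilla Mercurial uses when applying a changegroup: 0 when nothing changed, 1 when the head count stays the same, 1 plus the number of added heads, or -1 minus the number of removed heads. The function and its arguments are hypothetical.

```python
def fetch_result(old_heads, new_heads, added_commits):
    """Summarize a pull from changelog heads, following vanilla Mercurial's convention.

    old_heads/new_heads: changelog head nodes before and after the pull.
    added_commits: number of changesets actually added to the changelog.
    """
    if not added_commits:
        return 0  # nothing changed
    delta = len(new_heads) - len(old_heads)
    # Never return 0 once something has changed.
    return delta - 1 if delta < 0 else delta + 1
```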
Since Mercurial is commit-oriented, the 'no changes found' message really
should rely on what new commits are in the repo, not on new heads. This also
makes an upcoming patch much simpler.
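A commit-based check for "no changes found" might look like the sketch below. It assumes we can list changelog nodes before and after the import. The helper names and the non-empty message text are made up for illustration. A pull can add commits without changing the set of heads in a way that reveals it, so counting new commits is the more direct signal.

```python
def new_commits(before, after):
    """Return changesets present after the pull but not before, preserving order."""
    seen = set(before)
    return [node for node in after if node not in seen]


def pull_message(before, after):
    added = new_commits(before, after)
    if not added:
        return "no changes found"
    return "%d new changesets" % len(added)  # illustrative wording only
```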
Since everything around this code is completely broken anyway, writing a test
for this that doesn't trigger other bugs is close to impossible. An upcoming
patch will include tests.
The test output change is for an empty clone -- the output is precisely how
vanilla Mercurial treats an empty clone.
The theme of this and upcoming patches is that relying on self.git.object_store
to figure out which commits/tags/bookmarks to import is not great. This breaks
if the git repo is manually put in place (as might be done in a server-based
replication scenario), or if a partial fetch pulled too many commits in for
whatever reason. Indeed we were just about always pulling an entire pack in,
because listkeys for bookmarks currently calls fetch_pack without any
filtering. (This is probably a bug and should be fixed, but this series doesn't
do that.)
Instead, rely on whether we actually imported the commit into Mercurial to
determine whether to import the tag. This is clean, straightforward, and
clearly correct.
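The rule "import a tag only if its commit was imported" can be sketched with plain dicts. This assumes tag targets are already peeled to commit SHAs and that a Git-to-Mercurial map contains only imported commits. Both dicts and the function are hypothetical stand-ins for hg-git's internal maps.

```python
def tags_to_import(git_tags, git_to_hg):
    """Pick the Git tags whose target commits were actually imported.

    git_tags: dict of tag name -> Git commit SHA the tag points at.
    git_to_hg: dict of Git commit SHA -> Mercurial node, for imported commits only.
    """
    result = {}
    for name, git_sha in sorted(git_tags.items()):
        hg_node = git_to_hg.get(git_sha)
        if hg_node is None:
            # The commit may sit in the object store but was never imported; skip it.
            continue
        result[name] = hg_node
    return result
```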
There is a whole series of bugs in this code that any test case for this would
hit -- an upcoming patch will include a test for all these bugs at once.
object_store.add_object doesn't check whether the object is already in a
pack, so it is still written out in that case. Do the check ourselves before
calling add_object.
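The check-before-add pattern can be sketched against any store that supports `sha in store` and `add_object(obj)`, as Dulwich object stores do. So the example runs without a repository, it uses a hypothetical in-memory stand-in whose add_object never consults packs, matching the behaviour described above.

```python
class FakeObjectStore(object):
    """Hypothetical stand-in: add_object always writes, without consulting packs."""

    def __init__(self, packed=()):
        self.packed = set(packed)  # SHAs already stored in a pack
        self.loose = {}            # SHA -> object written out as a loose object

    def __contains__(self, sha):
        return sha in self.packed or sha in self.loose

    def add_object(self, obj):
        self.loose[obj.id] = obj


class Blob(object):
    """Hypothetical minimal Git object carrying only its SHA."""

    def __init__(self, sha):
        self.id = sha


def add_object_if_missing(store, obj):
    """Write obj only if the store doesn't already hold it, packed or loose."""
    if obj.id in store:
        return False
    store.add_object(obj)
    return True
```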
Since the Git to Mercurial conversion process is incremental, it's at risk of
missing files, or recording files the wrong way, or recording the wrong commit
metadata. Add a command called 'gverify' that can verify the contents of a
particular Mercurial rev against the corresponding Git commit.
Currently, this is limited to checking file names, flags and contents, but this
can be made as robust as desired. Further additions will probably require
refactoring git_handler.py a bit though.
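The comparison gverify performs (names, flags, contents) can be sketched over two snapshots. This assumes each side is flattened into a hypothetical dict of path -> (flags, data), using Mercurial's manifest flags ('' for regular files, 'x' for executable, 'l' for symlinks). The real command walks the Mercurial manifest and the Git tree instead.

```python
def verify_trees(hg_files, git_files):
    """Compare two tree snapshots; return a list of problems ordered by path.

    Each argument maps path -> (flags, data). An empty list means they match.
    """
    problems = []
    for path in sorted(set(hg_files) | set(git_files)):
        if path not in hg_files:
            problems.append("missing in hg: %s" % path)
        elif path not in git_files:
            problems.append("extra in hg: %s" % path)
        else:
            hg_flags, hg_data = hg_files[path]
            git_flags, git_data = git_files[path]
            if hg_flags != git_flags:
                problems.append("flag mismatch: %s" % path)
            if hg_data != git_data:
                problems.append("content mismatch: %s" % path)
    return problems
```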
This function is pretty fast: on a Linux machine with a warm cache, verifying a
repository with around 50,000 files takes just 20 seconds. There is scope for
further improvement through parallelization, but conducting tree walks in
parallel is non-trivial with the current worker infrastructure in Mercurial.
This allows other functions to use the `git` property without needing to care
about initializing it.
An upcoming patch will remove the `init_if_missing` function.
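A lazily initialized `git` property can be sketched as follows. The class name and the `opener` callable, which stands in for "open the Git repo, creating it if missing", are hypothetical. Mercurial's util.propertycache decorator is another way to get the same open-once behaviour.

```python
class LazyGitHandler(object):
    """Hypothetical handler that opens its Git repository on first use."""

    def __init__(self, path, opener):
        self.path = path
        self._opener = opener  # callable(path) -> repo object
        self._git = None

    @property
    def git(self):
        # Initialize on first access so callers never need to set it up themselves.
        if self._git is None:
            self._git = self._opener(self.path)
        return self._git
```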
Previously, we'd try to access commit.parents[0], which fails for a root commit
with no parents. Now, check for commit.parents being empty and return what
Mercurial thinks is a repository root in that case.
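In Mercurial a root changeset has both parents set to the null node (20 zero bytes). Below is a sketch of mapping a possibly empty parent list onto Mercurial's fixed (p1, p2) pair. It assumes the parents have already been converted to Mercurial nodes, and it ignores octopus merges. The function name is hypothetical.

```python
NULLID = b"\0" * 20  # Mercurial's null node


def hg_parents(parent_nodes):
    """Map a list of 0-2 parent nodes onto Mercurial's (p1, p2) pair."""
    p1 = parent_nodes[0] if parent_nodes else NULLID
    p2 = parent_nodes[1] if len(parent_nodes) > 1 else NULLID
    return p1, p2
```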
Previously we'd just test whether gitrev was falsy, which it is when the
returned rev is 0, even though 0 is a valid revision. With this patch, test
against None explicitly.
This unmasks another bug: see next patch for a fix and a test.
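The pitfall is easy to reproduce with a hypothetical Git SHA -> revision map. Revision 0 is falsy, so a truthiness test treats it as "not found", while an explicit None check does not.

```python
def find_rev(mapping, gitsha):
    """Return the local revision number for gitsha, or None if it wasn't imported."""
    return mapping.get(gitsha)


revs = {"c0ffee": 0, "beef": 1}  # hypothetical Git SHA -> revision number map

rev = find_rev(revs, "c0ffee")
buggy_missing = not rev      # True: rev 0 is falsy, so it looks "missing"
fixed_missing = rev is None  # False: rev 0 really exists
```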
Previously we'd recompute the repo tags each time we considered importing a
Git tag. This is O(n^2) in the number of tags and caused noticeable slowdowns
in repos with large numbers of tags.
To fix this, compute the tags just once. This is correct because the only case
where a tag list computed up front could cause problems is if multiple new Git
tags with the same name were introduced, which can't happen because Git tags
cannot share names.
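Hoisting the expensive tag computation out of the loop can be sketched like this. `compute_hg_tags` is a hypothetical callable standing in for the repo's tag lookup. Calling it once instead of once per Git tag avoids the O(n^2) total work, and unique Git tag names (dict keys here) mean the precomputed set never goes stale within the loop.

```python
def new_git_tags(git_tags, compute_hg_tags):
    """Return the Git tags not yet known to Mercurial, computing repo tags once.

    git_tags: dict of tag name -> target.
    compute_hg_tags: expensive callable returning the repo's current tag names.
    """
    existing = compute_hg_tags()  # hoisted out of the loop below
    new = {}
    for name, target in git_tags.items():
        if name not in existing:
            new[name] = target
    return new
```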
For a repository with over 200 tags, this speeds up a no-op hg pull by around
0.5 seconds.
A new property called _tagscache was introduced in Mercurial 2.0, so the cache
wasn't actually working.
The contract for tags() also changed at some point -- it stopped returning
nodes that weren't in the repo. This will need to be accounted for if we
start using the tags cache again. However, it isn't very clear whether the
Mercurial tags cache is actually worth doing, since we already have a
separate in-memory cache for Git tags in the handler.
Previously, the correctness of _handle_subrepos was based on the order the
files were processed in. For example, consider the case where a subrepo at
location 'loc' is replaced with a file at 'loc', while another subrepo exists.
This would cause .hgsubstate and .hgsub to be modified and the file added.
If .hgsubstate was seen _before_ 'loc' in the modified/added loop, then
_handle_subrepos would run and remove 'loc' correctly, before 'loc' was added
back later. If, however, .hgsubstate was seen _after_ 'loc', then
_handle_subrepos would run after 'loc' was added and would remove 'loc'.
With this patch, _handle_subrepos merely computes the changes that need to be
applied. The changes are then applied, making sure removed files and subrepos
are processed before added ones.
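The "compute first, then apply removals before additions" approach can be sketched over a hypothetical dict model of the working snapshot. Because all removals happen before any additions, replacing the subrepo at 'loc' with a file at 'loc' ends with the file present, whatever order the inputs were discovered in.

```python
def apply_changes(files, removed, added):
    """Apply a precomputed change set to a snapshot, processing removals first.

    files: dict path -> content (mutated in place).
    removed: iterable of paths to delete (e.g. a subrepo being replaced).
    added: dict path -> content to add or modify.
    """
    for path in removed:
        files.pop(path, None)
    for path, content in added.items():
        files[path] = content
    return files
```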
This was detected by setting a random PYTHONHASHSEED (in this case, 3910358828)
and running the test suite against it. An upcoming patch will randomize the
PYTHONHASHSEED in run-tests.py, just as Mercurial does.
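Randomizing the hash seed for a child process can be sketched as below. This is a hypothetical helper, not the code of any run-tests.py. PYTHONHASHSEED accepts integers from 0 to 4294967295, and the chosen seed is printed so that a failing run can be repeated exactly.

```python
import os
import random
import subprocess
import sys


def run_with_random_hashseed(args):
    """Run a command with a random PYTHONHASHSEED; return (seed, exit code)."""
    seed = random.randint(0, 4294967295)  # valid PYTHONHASHSEED range
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    # Report the seed so a failure can be reproduced with the same value.
    sys.stderr.write("running with PYTHONHASHSEED=%d\n" % seed)
    returncode = subprocess.call(args, env=env)
    return seed, returncode
```

To reproduce a specific failure, rerun the tests with the reported seed fixed in the environment, e.g. PYTHONHASHSEED=3910358828.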
Since a fresh GitHandler is no longer created for every commit, this speeds up
the {gitnode} template massively.
For a repo with over 50,000 commits, the command

  hg log -l 10 --template '{gitnode}\n'

speeds up from 2.4 seconds to 0.3 seconds.
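Reusing one handler per repository, rather than building a fresh one per commit, can be sketched by caching it on the repo object. `Repo`, `Handler` and the `_githandler` attribute are all hypothetical. `Handler` stands in for an object that is expensive to build, for example because it loads the Git/hg map.

```python
class Repo(object):
    """Hypothetical stand-in for a Mercurial repository object."""


class Handler(object):
    """Hypothetical expensive-to-build handler."""

    instances = 0

    def __init__(self, repo):
        Handler.instances += 1
        self.repo = repo


def get_handler(repo):
    """Return the handler cached on repo, building it only on first request."""
    handler = getattr(repo, "_githandler", None)
    if handler is None:
        handler = Handler(repo)
        repo._githandler = handler
    return handler
```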
Previously we'd load the git and hg maps twice on separate git handler objects.
This avoids that.
For a repo with over 50,000 commits, this brings a no-op hg pull down from 2.45
seconds to 2.37.
Currently we call hgrepo.tags() separately for each tag. (This should be fixed
at some point.) This avoids initializing a separate git handler for each tag.
For a repository with over 150 tags, this speeds up a no-op hg pull by 0.05
seconds.