This fix caches the hashes() results for the revision map. For big
repos the revision map can be huge (>30MB), and this fix saves us
some time (~0.7s on every subsequent call) by avoiding reversing the rev
dictionary multiple times.
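
The caching idea can be sketched roughly as follows. This is a minimal illustration, not the hgsubversion code: the RevMap class here is a toy dict subclass, and the _hashes attribute name is an assumption.

```python
class RevMap(dict):
    """Toy stand-in for a revision map: (revnum, branch) -> hash."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._hashes = None  # cached reverse mapping, built lazily

    def hashes(self):
        # Reverse the dictionary once and reuse the result afterwards.
        if self._hashes is None:
            self._hashes = {v: k for k, v in self.items()}
        return self._hashes

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # Keep an existing cache in sync instead of rebuilding it.
        # (Other mutators such as update() or del are not covered here.)
        if self._hashes is not None:
            self._hashes[value] = key
```

Only the first hashes() call pays for the reversal; later calls return the same dictionary object.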
A push operation for n commits regenerated the SVNMeta class 2*n+1 times
(once at the beginning, n times in the push() loop, and once for each of the
n pulls). This operation is very costly when the revision map is big.
This commit reuses the metadata whenever no rebase happens between svn
commits, which leads to 1 metadata rebuild in the optimistic case and n+1
metadata rebuilds in the pessimistic case (a rebase after every commit).
To achieve this I added an extra parameter to the pull command to pass the
metadata to it.
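
A rough sketch of the rebuild counting, under stated assumptions: Meta is a hypothetical stand-in for SVNMeta, and push(), pull() and the rebased callback are simplified placeholders rather than the real command signatures.

```python
class Meta(object):
    """Hypothetical stand-in for SVNMeta; counts how often it is built."""
    builds = 0

    def __init__(self):
        Meta.builds += 1


def pull(meta=None):
    # Reuse the caller's metadata when given; otherwise rebuild it.
    if meta is None:
        meta = Meta()
    return meta


def push(commits, rebased):
    """Push each commit; rebased(commit) says whether a rebase followed it."""
    meta = Meta()  # the one unavoidable build
    for commit in commits:
        # ... send the commit to svn here ...
        if rebased(commit):
            meta = None  # cached metadata is stale after a rebase
        meta = pull(meta)
    return meta
```

With no rebases the loop builds the metadata once; with a rebase after every one of n commits it builds it n+1 times.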
All unit tests are passing for this change.
We no longer need to pass the meta path since we have an internal reference to
the meta object, so we remove the parameter from the taglocations method.
Now that we have the machinery, we use the generator to define this
property. As a bonus, we no longer have to import util, which saves us from
having to import hgext_util.
Mercurial rev 1660b80d8083 introduced a transaction manager upstream. This
means that the closetransaction and releasetransaction methods on the pull
operation have gone away.
Code mostly based on Siddharth Agarwal's work on hg-git.
The revmap load process creates lots of tiny objects.
With just the bare minimum Mercurial runtime, loading a million-file revmap
goes from 6.83 seconds to 6.28. For longer running processes (e.g. hg push a
series of changes) the difference will probably be dramatic.
Python's GC can cause serious performance regressions if lots of small objects
are created within a function. We've empirically found that this happens while
loading the revmap.
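
One common way to do this is a decorator that turns the collector off for the duration of a call. The sketch below is an illustration of that technique only: the names nogc and load_revmap and the three-field line format are assumptions, not the actual hgsubversion or hg-git code.

```python
import functools
import gc


def nogc(func):
    """Run func with the cyclic garbage collector disabled."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        was_enabled = gc.isenabled()
        gc.disable()
        try:
            return func(*args, **kwargs)
        finally:
            # Only re-enable the collector if it was on before the call.
            if was_enabled:
                gc.enable()
    return wrapper


@nogc
def load_revmap(lines):
    # Assumed line format for this sketch: "<revnum> <hash> <branch>".
    revmap = {}
    for line in lines:
        revnum, node, branch = line.split(' ', 2)
        revmap[(int(revnum), branch)] = node
    return revmap
```

Reference counting still frees objects while the collector is off; only the cycle detection passes are skipped.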
Mercurial rev 20bb6e6b4dc5 removed localrepository.pull. We don't do it the
other way round (wrap pull if exchange.pull is available) because that's been
available with a different signature since Mercurial 3.0.
Mercurial rev 88d9d4ec499e removed localrepository.push. We don't do it the
other way round (wrap push if exchange.push is available) because that's been
available with a different signature since Mercurial 3.0.
This seems to be required on my Linux machine, but not on my Mac. I'm
not motivated enough right now to try and figure out what's going on
here, so I'm just adding it (it can't hurt, after all) and moving on
so that hgsubversion works again with hg 3.2.
We previously updated to the repository tip after pushing a revision,
presumably on the assumption that tip would be the last revision we
just pushed. This assumption is flawed for high traffic repositories.
In particular, you previously would sometimes end up on a completely
unrelated commit if someone else committed to a different branch in
between the time we push a revision and pull it back from the server.
This change instead updates to the tip of the branch we were
on at the beginning of the push. This should be either the revision
we just pushed or a linear descendant of the revision we just pushed,
with a fair degree of reliability.
We strip the 'link ' prefix from symlinks when we store them in
Mercurial. We reapply it when we start editing the file via
open_file, but not via add_file. This means that modified symlinks
would replay correctly, but not copied and modified symlinks. This
corrects that omission.
We always fail editing for symlinks, since we strip the leading 'link '
that Subversion includes when storing in Mercurial, and then let svn
attempt to apply deltas against the stripped version. This
unsurprisingly fails, and we write the resulting empty string to the
FileStore for the current revision, and add the symlink in question to
the missing list to handle stupidly later.
Unfortunately, this would break down because editing adds files to the
store using their absolute path whereas missing files are added
relative to our subdir. The absolute path file appears to win, which
results in us getting a symlink whose target is the empty string.
This fixes the problem by adding missing files to the FileStore using
their absolute path.
There are a few cases where we will set a single file into the
editor's FileStore object more than once. Notably, for copied and
then modified files, we will set it at least twice. Three times if
editing fails (which it can for symlinks).
If we pass the in-memory storage limit in between the first (or second
if editing fails) time we set the file and the last time we set the
file, we will write the data to the in-memory store the first time and
the file store the last time. We didn't remove it from the in-memory
store though, and we always prefer reading from the in-memory store.
This means we can sometimes end up with the wrong version of a file.
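
A minimal sketch of the corrected behavior, assuming a simplified store: this FileStore is a toy with a dict standing in for the on-disk store, and the setfile/getfile names and the size accounting are assumptions for illustration.

```python
class FileStore(object):
    """Toy two-tier store: small data in memory, overflow 'on disk'."""

    def __init__(self, maxsize):
        self._maxsize = maxsize
        self._size = 0
        self._mem = {}
        self._disk = {}  # stand-in for the real file-backed store

    def setfile(self, path, data):
        # Drop any earlier copy first so a stale in-memory version
        # cannot shadow data that ends up in the disk tier.
        old = self._mem.pop(path, None)
        if old is not None:
            self._size -= len(old)
        self._disk.pop(path, None)
        if self._size + len(data) <= self._maxsize:
            self._mem[path] = data
            self._size += len(data)
        else:
            self._disk[path] = data

    def getfile(self, path):
        # Reads prefer the in-memory tier, as described above.
        if path in self._mem:
            return self._mem[path]
        return self._disk[path]
```

Without the pop in setfile, an empty string written first would stay in memory and win over the later, larger write.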
This is fairly unlikely to happen in normal use since you need to hit
the memory limit between two writes to the store for the same file.
We only write a file multiple times if a) the file (and not one of
its parent directories) is copied and then modified or b) editing
fails. From what I can tell, it's only common for editing to fail for
symlinks, and they tend to be relatively small data that is unlikely to
push over the limit. Finally, the default limit is 100MB which I
would expect to be most often either well over (source code) or well
under (binaries or automated changes) the size of the changed files in
a single commit.
The easiest way to reproduce this is to set the in-memory cache size
to 0 and then commit a copied and modified symlink. The empty-string
version from the failed editing will be the one that persists. I
happened to stumble upon this while trying (and failing) to test a
bug-fix for a related bug with identical symptoms (empty symlink). I
have seen this in the wild, once, but couldn't reproduce it at the
time. The repo in question is quite large and quite active, so I am
quite confident in my estimation that this is a real, but very rare,
problem.
The test changes attached to this were meant to test a related bug,
but turned out not to actually cover the bug in question. They did
trigger this bug though, and are worthwhile to test, so I kept them.
This change is another step forward to centralize subversion configuration
options and help refactor.
I couldn't find an easier way to split up this change since there are many
interdependent function calls. There is no functionality change in this patch,
only renaming ui -> meta.
This is a temporary move to make refactoring easier. Later patches will
use util.dump for writing layout_from_config, so we need to use
util.load first (since it provides a safe fallback to reading plain
text).
In later patches, all 'from hgext_hgsubversion' statements will be
removed.
http://svn.apache.org/viewvc?view=revision&revision=1223036 exposes
what is arguably a bug in hgsubversion push code. Specifically, when
we are receiving text from the server in an editor, we prepend a "link
" to the text of symlinks when opening a file and strip it when
closing a file. We don't, however, prepend "link " to the base we use
when sending text changes to the server.
This was working before because prior to that revision, the first
thing subversion did was to check whether the entirety of the before
text or the entirety of the after text was less than 64 bytes. In
that case, it just sent the entirety of the after text as a single
insert operation. I'd expect most, but not all symlinks to fit under
the 64 byte limit, including the leading "link " text on the
subversion end.
After the change, the first thing subversion does is check for a
leading match that is more than 4 bytes long, or that is the full
length of the after text. In this case, it sends a copy operation for
the leading match, and then falls back to the "if fewer than 64 bytes
remain, send the whole thing" behavior. It also looks for trailing
matches of more than 4 bytes even in the <64 byte case, but that's not
what breaks the tests.
Incidentally, changing the destination of long symlinks was broken
even before this subversion change. This diff includes test additions
that cover that breakage.
Commit 87cd351dd15c in upstream Mercurial changed the checkpush function
signature. So we need to update hgsubversion accordingly.
Ran the tests against the tip of the hg repo, against a version of hg from
January before the exchange module, and against a version of hg after
pushoperations was added but before checkpush used it, and the tests passed in
all cases.
This was causing subtle failures during pull. I believe the line where
we manually "set phase to public" isn't required any more, but more
work is required to verify that behavior on all versions of hg, so
we'll do that as a followup on default if needed.
Functionally, this is the same as before but consolidates the logic into its own
object so we can later refactor all the map objects to inherit from a common base
object.
There was a slight error in the way we were checking the value of 'c' via 'if
c'. If 'c' is a bool, this check can incorrectly fail when the value is False.
Instead, we check for None explicitly. Nothing in production should have been
affected by this yet since this was only a problem with patches not yet released.
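
The difference between the two checks can be shown in a few lines. The function names and the default argument are hypothetical, chosen only for this illustration.

```python
def pick_truthy(c, default):
    # Buggy variant: a legitimate False is treated like "not set".
    return c if c else default


def pick_not_none(c, default):
    # Fixed variant: only None means "not set".
    return c if c is not None else default
```

For c = False and default = True, the first variant returns True while the second returns False.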
This refactoring will help us break AuthorMap's access of global options via
ui.config, allowing us to use svnmeta as the central store for reading and
writing configuration options.
Functionally, this is the same as before but consolidates the logic into its own
object so we can later refactor all the map objects to inherit from a common base
object.
Using our new generator, we factor out revmap.youngest and rename it to
'lastpulled', because that is the name of the config file and is arguably
less confusing to read.
This will allow us to unify the reading and writing of configuration options
into a central object, simplifying their use, which is currently sprinkled
throughout the codebase.
The idea is that after this patch, we will move each option to the svnmeta
class thereby allowing us to remove lots of I/O cruft. Once the cruft is gone,
we'll refactor objects where necessary. After refactoring, we'll have a
framework for easily adding new configuration options.
Commit 87cd351dd15c in upstream Mercurial changed the checkpush function
signature. So we need to update hgsubversion accordingly.
Ran the tests against the tip of the hg repo, against a version of hg from
January before the exchange module, and against a version of hg after
pushoperations was added but before checkpush used it, and the tests passed in
all cases.
This has no effect currently but will be used in a future patch to make it
possible to create a SVNMeta object without having to load the tags file (for
use in rebuilding metadata).
Tests have been updated.
This has no effect currently but will be used in a future patch to make it
possible to create a SVNMeta object without having to load the revmap (for use
in rebuilding metadata).
We need to change both svnmeta and svncommands at the same time since they are
heavily tied together.
The reason for this change is to remove the duplicate code for reading and
writing subdir present in svncommands.py. We will now use the standard
util.dump and util.load for writing and reading.
Due to the way json reads a string, the old format is still valid for use and
will be read correctly.
We need to change both svnmeta and svncommands at the same time since they are
heavily tied together.
The reason for this change is to remove the duplicate code for reading and
writing uuid present in svncommands.py. We will now use the standard util.dump
and util.load for writing and reading.
This presents a slight change in file format. Previously, the uuid file had the
format:
d073be05-634f-4543-b044-5fe20cf6d1d6[no newline]
and after this change, it is:
"d073be05-634f-4543-b044-5fe20cf6d1d6"[newline optional]
Due to the way json reads a string, the old format is still valid for use and
will be read correctly.
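
A sketch of how a loader might accept both formats, assuming a fall back to the raw text when JSON parsing fails. The name load_string is hypothetical and this is not the actual util.load implementation.

```python
import json


def load_string(raw):
    """Read a value stored either as a JSON string or as plain text."""
    try:
        value = json.loads(raw)
    except ValueError:
        # Old format: bare text such as an unquoted uuid.
        return raw
    # A bare number would parse as JSON; keep the raw text in that case.
    return value if isinstance(value, str) else raw
```

Both the unquoted uuid without a newline and the quoted uuid with an optional trailing newline come back as the same string.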
This parameter is needed as a stopgap so that tests can use the common
util.load method without having to change the format of the file. It isn't used
now but will be in upcoming patches.
Currently, this will do nothing since no part of hgsubversion writes json, but
that will happen in a future patch. The goal of this is to move away from
pickle completely but fall back to reading pickle if json fails.
These functions are for future patches that will add safer serialization via
json. '_convert' is a visitor pattern that will be used for lists,
dictionaries, and strings for helping convert None to the empty string since
json forbids 'null' as a key for a dictionary.
None -> '' is a safe mapping because this is for the 'branch_info' variable
which already maps the empty string to None.
Note, also, that json is chosen instead of, say, csv because json has a concept
of 'null' and will better handle utf8 strings (which subversion supports).
Important: this changes the requirement of hgsubversion to python 2.6+.
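
The visitor idea can be sketched as a small recursive function. This is an illustration only: the name convert and its mapping argument are assumptions and do not reproduce the real '_convert' signature.

```python
import json


def convert(value, mapping):
    """Walk dicts and lists, replacing leaves according to mapping."""
    if isinstance(value, dict):
        return dict((convert(k, mapping), convert(v, mapping))
                    for k, v in value.items())
    if isinstance(value, (list, tuple)):
        # Tuples come back as lists, which is what json produces anyway.
        return [convert(v, mapping) for v in value]
    return mapping.get(value, value)


# Example round trip: None -> '' before dumping, '' -> None after loading.
info = {None: [None, 1], 'branch': ['trunk', 2]}
encoded = json.dumps(convert(info, {None: ''}))
decoded = convert(json.loads(encoded), {'': None})
```

The round trip relies on the empty string never being a meaningful value of its own, which is the property described above for 'branch_info'.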
This change was introduced in 82dea72986d7 and fixed importing issues for
mercurial < 2.8. Unfortunately, this broke imports for newer versions of
mercurial that have hgsubversion installed in sys.path.
We now wrap the import in a try-block to catch this ImportError.
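
The try-block approach can be sketched with a generic helper. The name import_first is hypothetical, and the order in which the two module names are tried is an assumption.

```python
import importlib


def import_first(*names):
    """Return the first module in names that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError('none of %s could be imported' % ', '.join(names))
```

In this context the call would look something like import_first('hgext_hgsubversion', 'hgsubversion'), so that whichever name is available gets used.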
Mercurial extensions are a bit weird: they aren't normally in
sys.path, so you can't assume that "import hgsubversion" works.
Luckily, Mercurial sneaks a little treat into sys.modules so that
"import hgext_hgsubversion" does work. In fact, to get things working
*as a Mercurial extension*, all that's needed is that trivial change
to two import lines, in layouts/detect.py and layouts/standard.py.
Unfortunately, hgsubversion is also imported as a Python module, in
its own test suite. In that context, there is no "hgext_" trick --
unless we do it ourselves, which I've done in TestBase.setUp().
That would work fine ... except that test_util imports from
hgsubversion, which ends up importing hgsubversion.layouts.{detect,standard},
which want the "hgext_" trick to work. But it hasn't been done yet
when we're still importing; it doesn't happen until setUp() runs.
So make those two imports happen late, in the functions that need them.
Incidentally, this is only necessary to support Mercurial <= 2.7.
Mercurial got a bit smarter in 2.8:
http://selenic.com/repo/hg/rev/284a000c67bf
Prior to this diff, we would either crash, or continue past the
replacement without actually recording the change. This could lead to
a later verify failure if the state before and after weren't identical.