This is similar to "revlog: use raw revisions in revdiff". revdiff()
generates raw text used in revlog directly.
This makes test-flagprocessor.t happy.
Similar to fixes in revlog.py, this patch uses "rawtext" to explicitly label
contents expected to be raw, and makes sure content stored in _cache is raw
text.
Now test-flagprocessor.t points us to another issue.
"baserevision" returns the text that will be used to apply deltas. Since
deltas are against raw texts, "baserevision" should return raw text.
Now test-flagprocessor.t points us to a new error.
Now that the 'vfs' classes moved in their own module, lets use the new module
directly. We update code iteratively to help with possible bisect needs in the
future.
os.fdopen() does not accepts bytes as its second argument which represent the
mode in which the file is to be opened. This patch makes sure unicodes are
passed in py3 by using pycompat.sysstr().
Add the ability for revlog objects to process revision flags and apply
registered transforms on read/write operations.
This patch introduces:
- the 'revlog._processflags()' method that looks at revision flags and applies
flag processors registered on them. Due to the need to handle non-commutative
operations, flag transforms are applied in stable order but the order in which
the transforms are applied is reversed between read and write operations.
- the 'addflagprocessor()' method allowing to register processors on flags.
Flag processors are defined as a 3-tuple of (read, write, raw) functions to be
applied depending on the operation being performed.
- an update on 'revlog.addrevision()' behavior. The current flagprocessor design
relies on extensions to wrap around 'addrevision()' to set flags on revision
data, and on the flagprocessor to perform the actual transformation of its
contents. In the lfs case, this means we need to process flags before we meet
the 2GB size check, leading to performing some operations before it happens:
- if flags are set on the revision data, we assume some extensions might be
modifying the contents using the flag processor next, and we compute the
node for the original revision data (still allowing extension to override
the node by wrapping around 'addrevision()').
- we then invoke the flag processor to apply registered transforms (in lfs's
case, drastically reducing the size of large blobs).
- finally, we proceed with the 2GB size check.
Note: In the case a cachedelta is passed to 'addrevision()' and we detect the
flag processor modified the revision data, we chose to trust the flag processor
and drop the cachedelta.
This patch introduces a new 'raw' argument (defaults to False) to revlog's
revision() and _addrevision() methods.
When the 'raw' argument is set to True, it indicates the revision data should be
handled as raw data by the flagprocessor.
Note: Given revlog.addgroup() calls are restricted to changegroup generation, we
can always set raw to True when calling revlog._addrevision() from
revlog.addgroup().
Now that all the functionality has been moved to manifestlog/manifestrevlog/etc,
we can finally change all the uses of repo.manifest to use the new versions. A
future diff will then delete repo.manifest.
One additional change in this commit is to change repo.manifestlog to be a
@storecache property instead of @property. This is required by some uses of
repo.manifest require that it be settable (contrib/perf.py and the static http
server). We can't do this in a prior change because we can't use @storecache on
this until repo.manifest is no longer used anywhere.
A future patch will be moving manifest creation to be inside manifestlog as part
of improving our cache guarantees. bundlerepo and unionrepo currently rely on
being able to hook into manifest creation, so let's temporarily move the actual
manifest creation to a helper function for them to intercept.
In the future manifest.manifest() will disappear entirely and this can
disappear.
In a future patch we will change manifestctx and treemanifestctx to no longer
derive from manifestdict and treemanifest, respectively. This means that
consumers of the _mancache will now need to be aware of the different between
the two, until we get rid of the manifest entirely and the _mancache becomes
only filled with ctxs.
This fixes one case of it that can be fixed by using the other cache. Future
patches will address the others uses using the upcoming manifestctx.read()
function.
In a future patch we will change manifestctx and treemanifestctx to no longer
derive from manifestdict and treemanifest, respectively. This means that
consumers of the _mancache will now need to be aware of the different between
the two, until we get rid of the manifest entirely and the _mancache becomes
only filled with ctxs.
The old manifest cache cached both the inmemory representation and the raw text.
As part of the manifest refactor we want to separate the storage format from the
in memory representation, so let's split this cache into two caches.
This will let other manifest implementations participate in the in memory cache,
while allowing the revlog based implementations to still depend on the full text
caching where necessary.
This is a little messier than I'd like, and I'll probably come back
and do some more refactoring later, but as it is this unblocks
narrowhg. An alternative approach (which I may do as part of the
mentioned refactoring) would be to construct *all* dirlog instances up
front, so that we don't have to keep track of the linkmapper
method. This would avoid a reference cycle between the bundlemanifest
and the bundlerepository, but I was hesitant to do all the work up
front like that.
With this change, it's possible to do 'hg incoming' and 'hg pull' from
bundles in .hg/strip-backup in a treemanifest repository. Sadly, this
doesn't make it possible to 'hg clone' one of those (if you do 'hg
strip 0'), because the cg3 in the bundle gets written without a
treemanifest flag. Since that's going to be an involved refactor in a
different part of the code (which I *suspect* won't touch any of the
code I've just written here), let's leave it as an idea for Later.
This moves us to the modern iter() technique instead of the `while
True` pattern since it's easy. Factored out as a function because I'm
about to need this in a second place.
The iter() builtin has a neat pattern where you give it a callable of
no arguments and a sentinel value, and you can then loop over the
function calls like a normal iterator. This cleans up the code a
little.
The iter() builtin has a neat pattern where you give it a callable of
no arguments and a sentinel value, and you can then loop over the
function calls like a normal iterator. This cleans up the code a
little.
All users are migrated to 'devel.legacy.exchange', we can clean up the
experimental namespace.
Marking as (BC) because I know some large installation have bundle2 off and I
want to make sure they notice the change.
Now its done silently, so unless user really knows what he is doing
will be suprised to find that after update 'hg status' doesn't work.
This commit makes also merge operation warns about missing parent when
revision to merge exists only in the bundle.
writebundle() writes a bundle2 bundle or a plain changegroup1. Imagine
away the "2" in "bundle2.py" for a moment and this change should makes
sense. The bundle wraps the changegroup, so it makes sense that it
knows about it. Another sign that this is correct is that the delayed
import of bundle2 in changegroup goes away.
I'll leave it for another time to remove the "2" in "bundle2.py"
(alternatively, extract a new bundle.py from it).
The bundlerepository have to do some special magic to handle linkrev of the
bundled manifest. That logic was done from a repoview and obsolescence marker
affecting bundled changeset could lead to a crash. We now ensure we operate on
unfiltered repository.
The bundlerepository have to do some special magic to handle linkrev of the
bundlerepo filerev. That logic was done from a repoview and obsolescence marker
affecting bundled changeset could lead to a crash. We now ensure we operate on
unfiltered repository.
In b89de5ee5b31 (changegroup: don't support versions 01 and 02 with
treemanifests, 2016-01-19), I stopped supporting use of cg1 and cg2
with treemanifest repos. What I had not considered was that it's
perfectly safe to pull *to* a treemanifest repo using any changegroup
version. As reported in issue5066, I therefore broke pull from old
repos into a treemanifest repo. It was not covered by the test case,
because that pulled from a local repo while enabling treemanifests,
which enabled treemanifests on the source repo as well. After
switching to pulling via HTTP, it breaks.
Fix by splitting up changegroup.supportedversions() into
supportedincomingversions() and supportedoutgoingversions().
Remotefilelog overrides changegroup._addchangegroupfiles(), assuming
it is about files, which seems like a natural assumption. However, in
changegroup3, directory manifests are sent in the files section of the
changegroup. These naturally make remotefilelog unhappy.
The fact that the directories are not separated from the files
(although they do come before the files) also makes server.validate
harder to implement. Since we read one chunk at a time from the steam,
once we have found a file (non-directory) entry in the stream, we
would have to push the read data back into the stream, or otherwise
refactor the code. It will be easier if we add an empty chunk after
all directory manifests.
This change adds that empty chunk, although we don't yet take
advantage of it on the reading side. We will soon move the tree
manifest stuff out of _addchangegroupfiles() and into
_unpackmanifests().
Before this bundle repository were unable to work with compressed
bundle2. We use the same approach as with bundle1, we extract the
changegroup in uncompressed form into a temporary file.
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
Incoming was using bundle1 in all cases, as bundle1 is restricted to
changegroup1 and does not support general delta, this can lead to significant
CPU overhead if the server is using general delta storage. We now properly
request and store a bundle2 to disk.
If the server include any output or error in the bundle, they will be stored on
disk and replayed when the bundle is read. As 'hg incoming' is going to read the
bundle right away, we call that 'good' enough and go back to the bigger plan of
having general delta on by default.
This was tracked as 4864
When looking up a base revision, we were ignoring the contents that were already
available in the manifest's _mancache. This patch allows us to use that data
instead of reading from the revlog.
This is useful in our pushrebase extension (which allows rebasing on the server
side during a push) because it allows us to prefetch the bundle base manifest
before aquiring the repo lock (1 second saving), which means doing less work inside the lock,
which means a 20% higher commit rate.
Revision d6a8b4e28635 (filelog: add file function to open other
filelogs, 2011-05-10) added a _file() method to revlog, which also
required a 'repo' parameter to be added to bundlefilelog's
constructor. The _file() method was then removed in 55fa3c487b0f
(filelog: remove unused _file method, 2015-01-22), which made the
constructor parameter unused, so let's remove that too.
This avoids the following error that happened if base revision of bundle file
was hidden. bundlerevlog needs it to construct revision texts from bundle
content as revlog.revision() does.
File "mercurial/context.py", line 485, in _changeset
return self._repo.changelog.read(self.rev())
File "mercurial/changelog.py", line 319, in read
text = self.revision(node)
File "mercurial/bundlerepo.py", line 124, in revision
text = self.baserevision(iterrev)
File "mercurial/bundlerepo.py", line 160, in baserevision
return changelog.changelog.revision(self, nodeorrev)
File "mercurial/revlog.py", line 1041, in revision
node = self.node(rev)
File "mercurial/changelog.py", line 211, in node
raise error.FilteredIndexError(rev)
mercurial.error.FilteredIndexError: 1
Since Python 2.7.9, "os.path.join(path, '')" doesn't add "os.sep" at
the end of UNC path (see issue4557 for detail).
This makes bundlerepo incorrectly work, if:
1. cwd is the root of UNC share (e.g. "\host\share"), and
2. mainreporoot is near cwd (e.g. "\host\sharefoo\repo")
- host of UNC path is same as one of cwd
- share of UNC path starts with one of cwd
3. "repopath" isn't specified in bundle URI
(e.g. "bundle:bundlefile" or just "bundlefile")
For example:
$ hg --cwd \host\share -R \host\sharefoo\repo incoming bundle
In this case:
- os.path.join(r"\host\share", "") returns r"\host\share",
- r"\host\sharefoo\repo".startswith(r"\host\share") returns True, then
- r"foo\repo" is treated as repopath of bundlerepo instead of
r"\host\sharefoo\repo"
This causes failure of combining "\host\sharefoo\repo" and bundle
file: in addition to it, "\host\share\foo\repo" may be combined with
bundle file, if it accidentally exists.
This patch uses "pathutil.normasprefix()" to ensure "os.sep" at the
end of cwd safely, even with some problematic encodings, which use
0x5c (= "os.sep" on Windows) as the tail byte of some multi-byte
characters.
BTW, normalization before "pathutil.normasprefix()" isn't needed in
this case, because "os.getcwd()" always returns normalized one.
It is finally time to freeze the bundle2 format! To do so we:
- rename HG2Y to HG20,
- drop "b2x:" prefix from all part names,
- rename capability to "bundle2-exp" to "bundle2"
- rename the hook flag from 'bundle2-exp' to 'bundle2'
Although Python supports `X = Y if COND else Z`, this was only
introduced in Python 2.5. Since we have to support Python 2.4, it was
a very common thing to write instead `X = COND and Y or Z`, which is a
bit obscure at a glance. It requires some intricate knowledge of
Python to understand how to parse these one-liners.
We change instead all of these one-liners to 4-liners. This was
executed with the following perlism:
find -name "*.py" -exec perl -pi -e 's,(\s*)([\.\w]+) = \(?(\S+)\s+and\s+(\S*)\)?\s+or\s+(\S*)$,$1if $3:\n$1 $2 = $4\n$1else:\n$1 $2 = $5,' {} \;
I tweaked the following cases from the automatic Perl output:
prev = (parents and parents[0]) or nullid
port = (use_ssl and 443 or 80)
cwd = (pats and repo.getcwd()) or ''
rename = fctx and webutil.renamelink(fctx) or []
ctx = fctx and fctx or ctx
self.base = (mapfile and os.path.dirname(mapfile)) or ''
I also added some newlines wherever they seemd appropriate for readability
There are probably a few ersatz ternary operators still in the code
somewhere, lurking away from the power of a simple regex.
For bundlerepo to work with bundle2 files, we need to find the part that
contains the bundle's changegroup data and work with that instead of the
entire bundle. Future work can add separate processing for other bundle2
parts.