Commit Graph

253 Commits

Author SHA1 Message Date
Martin von Zweigbergk
86800baf0f changegroup: don't fail on empty changegroup (API)
I don't know why applying an empty changegroup should be an error. It
seems harmless. I suspect the check was there to find code that
creates empty changegroups just because that would be wasteful. Let's
use develwarn() for that instead, so we catch any such cases that run
with our test runner, but we still allow others to generate empty
changegroups if they want to.

We have run into this check at Google once or twice and had to work
around it, but I'm changing this not so much because of that, but
because it seems like it shouldn't be an error.

I also changed the message slightly to be more modern ("changelog
group" -> "changegroup") and more generic ("received" -> "applied").
2017-06-30 23:58:59 -07:00
Martin von Zweigbergk
ed2bd4f3f3 changegroup: remove option to allow empty changegroup (API)
No caller sets the "emptyok" option, so let's remove it.
2017-07-01 00:00:09 -07:00
Pierre-Yves David
8524b4275f configitems: register the 'server.validate' config 2017-06-30 03:44:15 +02:00
Pierre-Yves David
1084baf8c5 configitems: register the 'bundle.reorder' config 2017-06-30 03:31:35 +02:00
Martin von Zweigbergk
80bc6d2e40 bundle: move combineresults() from changegroup to bundle2
The results only need to be combined if they come from a bundle2. More
importantly, we'll change its argument to a bundleoperation soon, and
then it definitely will no longer belong in changegroup.py.
2017-06-22 13:58:20 -07:00
Martin von Zweigbergk
3015c0a9a8 bundle2: record changegroup data in 'op.records' (API)
When adding support for bundling and unbundling phases, it will be
useful to have the list of added changesets. To do that, we return the
list from changegroup.apply().
2017-06-16 16:56:16 -07:00
Martin von Zweigbergk
16b7a41d93 changegroup: delete "if True" and reflow 2017-06-15 23:23:47 -07:00
Martin von Zweigbergk
560e5ce4f1 changegroup: let callers pass in transaction to apply() (API)
I think passing in the transaction makes it a little clearer and more
consistent with bundle2.
2017-06-15 22:46:38 -07:00
Martin von Zweigbergk
ed4a2d83be changegroup: inline 'publishing' variable in apply() 2017-06-19 00:06:23 -07:00
Martin von Zweigbergk
02b210363d changegroup: rename "dh" to the clearer "deltaheads"
We have a lot of frequently used abbreviations, but this is not one of
them.
2017-06-15 13:47:54 -07:00
Martin von Zweigbergk
f2408d8030 changegroup: rename "srccontent" to "cgnodes"
It's the list of nodes in the incoming changegroup, so "cgnodes" made
more sense to me.
2017-06-15 13:42:41 -07:00
Durham Goode
a73dbb6c8d changegroup: add bundlecaps back
Commit 9233182ea547d0aa removed the unused bundlecaps argument from the
changegroup code. While it is unused in core Mercurial, it was an important
feature for the remotefilelog extension because it allowed the exchange layer to
communicate to the changegroup packer that this was a shallow repo and that
filelogs should not be included. Without bundlecaps, there is currently no other
way to pass that information along without a more extensive refactor of
exchange, bundle, and changegroup code.

This patch backs out the original removal, and merges it with some recent
changes to changegroup apis.
2017-05-15 09:35:27 -07:00
Pierre-Yves David
dc1e05ed68 caches: stop warming the cache after changegroup application
Now that we garantee that branchmap cache is updated at the end of the
transaction we can drop this update. This removes a problematic case with
nested transaction where the new cache could be written on disk before the
transaction is finished (and even roll-backed)

Such premature cache write was visible in the following test:

* tests/test-acl.t
* tests/test-rebase-conflicts.t

In addition, running the cache update later means having more date about the
state of the repository (in particular: phases). So we can generate caches with
more information. This creates harmless changes to the following tests:

* tests/test-hardlinks-whitelisted.t
* tests/test-hardlinks.t
* tests/test-phases.t
* tests/test-tags.t
* tests/test-inherit-mode.t
2017-05-02 18:57:52 +02:00
Pierre-Yves David
9c635f53f5 caches: move the 'updating the branch cache' message in 'updatecaches'
We are about to remove the branchmap cache update in changegroup application.
There is a debug message alongside this update that we do not want to loose. We
move the message beforehand to simplify the test update in the next changeset.
The message move is quite noisy and isolating that noise is useful.

Most tests update are just line reordering since the message is issued at a
later point during the transaction.

After this changes, the message is displayed in more case since local commit
creation also issue it.
2017-05-02 22:27:44 +02:00
Pierre-Yves David
5149c68aae changegroup: deprecate 'getlocalchangroup' (API)
We have 'getchangegroup' with a shorter name for the exactly same feature. Now
that all users are gone we can formally deprecate it.
2017-05-04 12:43:41 +02:00
Pierre-Yves David
cc5d603ef2 changegroup: deduplicate 'getlocalchangegroup'
The two functions 'getlocalchangegroup' and 'getchangegroup' have been strictly
identical for multiple years ('getchangegroup' had a deprecated docstring)

We'll drop one of them (getlocalchangegroup, since it has the longest name).
However, we needs to migrate all users of the dropped one to the new one before
we can deprecate it. In the mean time we drop one of the duplicated definition
and the outdated docstring.
2017-05-04 12:36:45 +02:00
Martin von Zweigbergk
feebdd58e5 changegroup: delete unused 'bundlecaps' argument (API) 2017-05-02 23:47:10 -07:00
Martin von Zweigbergk
2e8637ffda merge with stable 2017-03-24 08:37:26 -07:00
Gregory Szorc
6fb64a9cca changegroup: store old heads as a set
Previously, the "oldheads" variable was a list. On a repository at
Mozilla with 46,492 heads, profiling revealed that list membership
testing was dominating execution time of applying small changegroups.

This patch converts the list of old heads to a set. This makes
membership testing significantly faster. On the aforementioned
repository with 46,492 heads:

$ hg unbundle <file with 1 changeset>
before: 18.535s wall
after:   1.303s

Consumers of this variable only check for truthiness (`if oldheads`),
length (`len(oldheads)`), and (most importantly) item membership
(`h not in oldheads` - which occurs twice). So, the change to a set
should be safe and suitable for stable.

The practical effect of this change is that changegroup application
and related operations (like `hg push`) no longer exhibit an O(n^2)
CPU explosion as the number of heads grows.
2017-03-23 19:54:59 -07:00
Remi Chaintron
6d11b9177b revlog: add 'raw' argument to revision and _addrevision
This patch introduces a new 'raw' argument (defaults to False) to revlog's
revision() and _addrevision() methods.
When the 'raw' argument is set to True, it indicates the revision data should be
handled as raw data by the flagprocessor.

Note: Given revlog.addgroup() calls are restricted to changegroup generation, we
can always set raw to True when calling revlog._addrevision() from
revlog.addgroup().
2017-01-05 17:16:07 +00:00
Pierre-Yves David
b77eb86598 changegroup: simplify logic around enabling changegroup 03
There was multiple spot that took care of adding '03' as supported changegroup
version for different condition. We gather them all in one location for
simplicity.

The 'supportedincomingversions' function is now doing nothing, but I kept it
around because it looks like a great hooking point for extension.

(Note that we should probably just get changegroup3 out of experimental now, But
that would be a patch with a much wider scope).
2016-12-19 04:25:18 +01:00
Pierre-Yves David
281f6d3210 changegroup: pass 'repo' to allsupportedversions
In the next changesets, we will introduce more logic directly related to the
repository to decide what version have to be supported. So we now directly pass
the repo object instead of just ui.
2016-12-19 04:29:33 +01:00
Pierre-Yves David
c066ecb20b changegroup: simplify 'allsupportedversions' logic
Discarding '03' to add it back is a bit strange. Instead we only discard it when
needed.
2016-12-19 04:31:13 +01:00
Stanislau Hlebik
da605718f2 cg1packer: fix compressed method
`cg1packer.compressed()` returns True even if `self._type` is 'UN'. This patch
fixes it.
2016-12-14 09:53:56 -08:00
Gregory Szorc
ce177e2b4d changegroup: use compression engines API
The new API doesn't have the equivalence for None and 'UN' so we
introduce code to use 'UN' explicitly.
2016-11-07 18:38:13 -08:00
Durham Goode
1a774a75e3 changegroup: remove remaining uses of repo.manifest
The remaining uses of repo.manifest in the changegroup module are treating the
manifest exclusively as a revlog, so let's replace them with instances of the
revlog directly.

This is part of dropping all dependencies on repo.manifest in favor of
repo.manifestlog.
2016-11-08 08:03:43 -08:00
Durham Goode
f952eca1af manifest: get rid of manifest.readshallowfast
This removes manifest.readshallowfast and converts it's one user to use
manifestlog instead.
2016-11-02 17:10:47 -07:00
Gregory Szorc
f6f301e95d changegroup: use changelogrevision()
Using offsets for accessing changelog entries isn't very readable.
As a bonus, changelog.changelogrevision() also accepts a revision,
so we don't need to perform the inline node resolution either.
2016-11-01 18:29:09 -07:00
Gregory Szorc
881f361857 changegroup: cache changelog and manifestlog outside of loop
History has taught us that repo.changelog can add significant overhead
to loops. So cache the changelog instance outside of the loop to
avoid the lookup. While we're here, do the same for manifestlog,
since each loop would otherwise initialize a new manifestlog instance.
2016-11-01 18:28:03 -07:00
Pulkit Goyal
56031921a5 py3: convert the mode argument of os.fdopen to unicodes (2 of 2) 2017-02-13 22:15:28 +05:30
Gregory Szorc
722900ff91 changegroup: increase write buffer size to 128k
By default, Python defers to the operating system for choosing the
default buffer size on opened files. On my Linux machine, the default
is 4k, which is really small for 2016.

This patch bumps the write buffer size when writing
changegroups/bundles to 128k. This matches the 128k read buffer
we already use on revlogs.

It's worth noting that this only impacts when writing to an explicit
file (such as during `hg bundle`). Buffers when writing to bundle
files via the repo vfs or to a temporary file are not impacted.

When producing a none-v2 bundle file of the mozilla-unified repository,
this change caused the number of write() system calls to drop from
952,449 to 29,788. After this change, the most frequent system
calls are fstat(), read(), lseek(), and open(). There were
2,523,672 system calls after this patch (so a net decrease of
~950k is statistically significant).

This change shows no performance change on my system. But I have a
high-end system with a fast SSD. It is quite possible this change
will have a significant impact on network file systems, where
extra network round trips due to excessive I/O system calls could
introduce significant latency.
2016-10-16 13:35:23 -07:00
Pierre-Yves David
667d10975b changegroup: skip delta when the underlying revlog do not use them
Revlog can now be configured to store full snapshot only. This is used on the
changelog. However, the changegroup packing was still recomputing deltas to be
sent over the wire.

We now just reuse the full snapshot directly in this case, skipping delta
computation. This provides use with a large speed up(-30%):

# perfchangegroupchangelog on mercurial
! wall 2.010326 comb 2.020000 user 2.000000 sys 0.020000 (best of 5)
! wall 1.382039 comb 1.380000 user 1.370000 sys 0.010000 (best of 8)

# perfchangegroupchangelog on pypy
! wall 5.792589 comb 5.780000 user 5.780000 sys 0.000000 (best of 3)
! wall 3.911158 comb 3.920000 user 3.900000 sys 0.020000 (best of 3)

# perfchangegroupchangelog on mozilla central
! wall 20.683727 comb 20.680000 user 20.630000 sys 0.050000 (best of 3)
! wall 14.190204 comb 14.190000 user 14.150000 sys 0.040000 (best of 3)

Many tests have to be updated because of the change in bundle content. All
theses update have been verified.  Because diffing changelog was not very
valuable, the resulting bundle have similar size (often a bit smaller):

# full bundle of mozilla central
with delta:    1142740533B
without delta: 1142173300B

So this is a win all over the board.
2016-10-14 01:31:11 +02:00
Gregory Szorc
eb9f859c39 changegroup: document deltaparent's choice of previous revision
As part of debugging low-level changegroup generation, I came across
what I initially thought was a weird behavior: changegroup v2 is
choosing the previous revision in the changegroup as a delta base
instead of p1. I was tempted to rewrite this to use p1, as p1
will delta better than prev in the common case. However, I realized
that taking p1 as the base would potentially require resolving a
revision fulltext and thus require more CPU for e.g. server-side
processing of getbundle requests.

This patch tweaks the code comment to note the choice of behavior.
It also notes there is room for a flag or config option to tweak
this behavior later: using p1 as the delta base would likely make
changegroups smaller at the expense of more CPU, which could be
beneficial for things like clone bundles.
2016-10-13 12:49:47 +02:00
Durham Goode
23d229132c manifest: add manifestctx.readdelta()
This adds an implementation of readdelta to the new manifestctx class and adds a
couple consumers of it. This currently appears to have some duplicate code, but
future patches cause this function to diverge when things like "shallow" are
introduced.
2016-09-13 16:25:21 -07:00
Pierre-Yves David
7775b9bfe7 computeoutgoing: move the function from 'changegroup' to 'exchange'
Now that all users are in exchange, we can safely move the code in the
'exchange' module. This function is really about processing the argument of a
'getbundle' call, so it even makes senses to do so.
2016-08-09 17:06:35 +02:00
Pierre-Yves David
1b40b7e1c5 getchangegroup: take an 'outgoing' object as argument (API)
There is various version of this function that differ mostly by the way they
define the bundled set. The flexibility is now available in the outgoing object
itself so we move the complexity into the caller themself. This will allow use
to remove a good share of the similar function to obtains a changegroup in the
'changegroup.py' module.

An important side effect is that we stop calling 'computeoutgoing' in
'getchangegroup'. This is fine as code that needs such argument processing
is actually going through the 'exchange' module which already all this function
itself.
2016-08-09 17:00:38 +02:00
Pierre-Yves David
cca37f1814 outgoing: add a 'missingroots' argument
This argument can be used instead of 'commonheads' to determine the 'outgoing'
set. We remove the outgoingbetween function as its role can now be handled by
'outgoing' itself.

I've thought of using an external function instead of making the constructor
more complicated. However, there is low hanging fruit to improve the current
code flow by storing some side products of the processing of 'missingroots'. So
in my opinion it make senses to add all this to the class.
2016-08-09 22:31:38 +02:00
Pierre-Yves David
f460bb8823 outgoing: pass a repo object to the constructor
We are to introduce more code constructing such object in the code base. It will
be more convenient to pass a repository object, all current users already
operate at the repository level anyway. More changes to the contructor argument
are coming in later changeset.
2016-08-09 15:26:53 +02:00
Gregory Szorc
aa618850dc changegroup: move branch cache debug message to proper location
Before, we logged about performing a branch cache update when we
weren't actually doing it. Fix that.
2016-08-08 22:06:07 -07:00
Augie Fackler
715ea0d1cc changegroup: use iter(callable, sentinel) instead of while True
This is functionally equivalent, but is a little more concise.
2016-08-05 13:59:58 -04:00
Gregory Szorc
9f5f743c8a discovery: move code to create outgoing from roots and heads
changegroup.changegroupsubset() contained somewhat low-level code for
constructing an "outgoing" instance from a list of roots and heads
nodes. It feels like discovery.py is a more appropriate location for
this code.

This code can definitely be optimized, as outgoing.missing will
recompute the set of changesets we've already discovered from
cl.between(). But code shouldn't be refactored during a move, so
I've simply inserted a TODO calling attention to that.
2016-08-03 22:07:52 -07:00
Gregory Szorc
4ad5f2e492 bundle2: store changeset count when creating file bundles
The bundle2 changegroup part has an advisory param saying how many
changesets are in the part. Before this patch, we were setting
this part when generating bundle2 parts via the wire protocol but
not when generating local bundle2 files.

A side effect of not setting the changeset count part is that progress
bars don't work when applying changesets. As the tests show, this
impacted clone bundles, shelve, backup bundles, `hg unbundle`, and
anything touching bundle2 files.

This patch adds a backdoor to allow us to pass state from
changegroup generation into the unbundler. We store the number
of changesets in the changegroup in this state and use it to
populate the aforementioned advisory part parameter when generating
the bundle2 bundle.

I concede that I'm not thrilled by how state is being passed in
changegroup.py (it feels a bit hacky). I would love to overhaul the
rather confusing set of functions in changegroup.py with something that
passes rich objects around instead of e.g. low-level generators.
However, given the code freeze for 3.9 is imminent, I'd rather not
undertake this endeavor right now. This feels like the easiest way
to get the parameter added to the changegroup part.
2016-07-17 15:13:51 -07:00
Martin von Zweigbergk
6612ed3d4a changegroup: don't send empty subdirectory manifest groups
When grafting/rebasing, it is common for multiple changesets to make
the same change to a subdirectory. When writing the revlog for the
directory, the revlog code already takes care of not writing the entry
again. In 3eb9fa4180d3 (changegroup: prune subdirectory dirlogs too,
2016-02-12), I added the corresponding code in changegroup (not
sending entries the client already has), but I forgot to avoid sending
the entire changegroup if no nodes remained in the pruned
set. Although that's harmless besides the wasted network traffic, the
receiving side was checking for it (copied from the changegroup code
for handling files). This resulted in the client crashing with:

  abort: received dir revlog group is empty

Fix by simply not emitting a changegroup for the directory if there
were no changes is it. This matches how files are handled.
2016-06-16 15:15:33 -07:00
Augie Fackler
8945f7f25d changegroup: extract method that sorts nodes to send
The current implementation of narrowhg needs to influence the order in
which nodes are sent to the client. adgar@ and I think this is
fixable, but it's going to require pretty substantial time investment,
so in the interim we'd like to extract this method.

I think it makes the group() code a little more obvious, as it took us
a couple of tries to isolate the exact behavior we were observing.
2016-05-12 22:29:05 -04:00
Martin von Zweigbergk
4cc86f7b27 bundle: move writebundle() from changegroup.py to bundle2.py (API)
writebundle() writes a bundle2 bundle or a plain changegroup1. Imagine
away the "2" in "bundle2.py" for a moment and this change should makes
sense. The bundle wraps the changegroup, so it makes sense that it
knows about it. Another sign that this is correct is that the delayed
import of bundle2 in changegroup goes away.

I'll leave it for another time to remove the "2" in "bundle2.py"
(alternatively, extract a new bundle.py from it).
2016-03-28 14:41:29 -07:00
Martin von Zweigbergk
daa3928fb2 changegroup: clear progress callback after changelog processing
The progress callback is replaced by one for manifests after changelog
processing is done, but let's not depend on manifests replacing the
value and instead explicitly clear it.
2016-02-29 09:26:43 -08:00
Martin von Zweigbergk
e8ad80f690 changegroup: progress for added files is not measured in "chunks"
The "prog" class cg1unpacker.apply() has the unit set to
"chunks". This is not correct for files, where the file itself is the
unit. The unit is not usually printed, which is probably why this has
not been fixed yet. It can be show with e.g. "--config
progress.format='topic number unit'".
2016-02-28 22:51:07 -08:00
Martin von Zweigbergk
8d4ca9dc03 changegroup: exclude submanifests from manifest progress
The progress callback for manifests is cleared outside of
_unpackmanifests(), which means it will remain in effect while pulling
subdirectory manifests when using treemanifests. Since the total
number of revisions used for the progress is the number of changesets,
the total number of treemanifest revisions is usually larger than
that. One effect of this is that the ETA is negative. It's hard to
estimate the number of subdirectory revisions, so let's just exclude
them from progress for now.
2016-02-28 21:15:06 -08:00
Gregory Szorc
3774b02abb changegroup: use changelog.readfiles
We have a dedicated function to get just the list of files in
a changelog entry. Use it.

This will presumably speed up changegroup application since we're
no longer decoding the entire changelog entry. But I didn't measure
the impact.
2016-02-27 23:06:05 -08:00
Martin von Zweigbergk
225ad0fad5 changegroup: drop special-casing of flat manifests
Since 37e42a0009a4 (changegroup: avoid iterating the whole manifest,
2015-12-04), the manifest linkrev callback iterates over only the
files that were touched according the the changeset. Before that
change, we iterated over all files returned in
manifest.readfast(). That method returns the files in the delta, if
the delta parent is a parent, otherwise it returns the full
manifest. Most manifest revisions end up using one of the parents as
its delta parent, so most of the time, the method returns a short
manifest. It seems that that happens often enough that it doesn't
really matter; I could not reproduce the timings reported in that
change.

Since the treemanifest code now works quite differently, and since
that code also works correctly for flat manifests, let's drop the
special-casing of flat manifests.
2016-02-22 14:43:14 -08:00