sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-10 00:45:18 +03:00

Author	SHA1	Message	Date
Pierre-Yves David	7775b9bfe7	computeoutgoing: move the function from 'changegroup' to 'exchange' Now that all users are in exchange, we can safely move the code into the 'exchange' module. This function is really about processing the arguments of a 'getbundle' call, so it even makes sense to do so.	2016-08-09 17:06:35 +02:00
Pierre-Yves David	1b40b7e1c5	getchangegroup: take an 'outgoing' object as argument (API) There are various versions of this function that differ mostly in the way they define the bundled set. The flexibility is now available in the outgoing object itself, so we move the complexity into the callers themselves. This will allow us to remove a good share of the similar functions for obtaining a changegroup in the 'changegroup.py' module. An important side effect is that we stop calling 'computeoutgoing' in 'getchangegroup'. This is fine, as code that needs such argument processing actually goes through the 'exchange' module, which already calls this function itself.	2016-08-09 17:00:38 +02:00
Pierre-Yves David	cca37f1814	outgoing: add a 'missingroots' argument This argument can be used instead of 'commonheads' to determine the 'outgoing' set. We remove the outgoingbetween function as its role can now be handled by 'outgoing' itself. I've thought of using an external function instead of making the constructor more complicated. However, there is low-hanging fruit to improve the current code flow by storing some side products of the processing of 'missingroots'. So in my opinion it makes sense to add all this to the class.	2016-08-09 22:31:38 +02:00
Pierre-Yves David	f460bb8823	outgoing: pass a repo object to the constructor We are about to introduce more code constructing such objects in the code base. It will be more convenient to pass a repository object; all current users already operate at the repository level anyway. More changes to the constructor arguments are coming in later changesets.	2016-08-09 15:26:53 +02:00
Gregory Szorc	aa618850dc	changegroup: move branch cache debug message to proper location Before, we logged about performing a branch cache update when we weren't actually doing it. Fix that.	2016-08-08 22:06:07 -07:00
Augie Fackler	715ea0d1cc	changegroup: use `iter(callable, sentinel)` instead of while True This is functionally equivalent, but is a little more concise.	2016-08-05 13:59:58 -04:00
Gregory Szorc	9f5f743c8a	discovery: move code to create outgoing from roots and heads changegroup.changegroupsubset() contained somewhat low-level code for constructing an "outgoing" instance from a list of roots and heads nodes. It feels like discovery.py is a more appropriate location for this code. This code can definitely be optimized, as outgoing.missing will recompute the set of changesets we've already discovered from cl.between(). But code shouldn't be refactored during a move, so I've simply inserted a TODO calling attention to that.	2016-08-03 22:07:52 -07:00
Gregory Szorc	4ad5f2e492	bundle2: store changeset count when creating file bundles The bundle2 changegroup part has an advisory param saying how many changesets are in the part. Before this patch, we were setting this part when generating bundle2 parts via the wire protocol but not when generating local bundle2 files. A side effect of not setting the changeset count part is that progress bars don't work when applying changesets. As the tests show, this impacted clone bundles, shelve, backup bundles, `hg unbundle`, and anything touching bundle2 files. This patch adds a backdoor to allow us to pass state from changegroup generation into the unbundler. We store the number of changesets in the changegroup in this state and use it to populate the aforementioned advisory part parameter when generating the bundle2 bundle. I concede that I'm not thrilled by how state is being passed in changegroup.py (it feels a bit hacky). I would love to overhaul the rather confusing set of functions in changegroup.py with something that passes rich objects around instead of e.g. low-level generators. However, given the code freeze for 3.9 is imminent, I'd rather not undertake this endeavor right now. This feels like the easiest way to get the parameter added to the changegroup part.	2016-07-17 15:13:51 -07:00
Martin von Zweigbergk	6612ed3d4a	changegroup: don't send empty subdirectory manifest groups When grafting/rebasing, it is common for multiple changesets to make the same change to a subdirectory. When writing the revlog for the directory, the revlog code already takes care of not writing the entry again. In 3eb9fa4180d3 (changegroup: prune subdirectory dirlogs too, 2016-02-12), I added the corresponding code in changegroup (not sending entries the client already has), but I forgot to avoid sending the entire changegroup if no nodes remained in the pruned set. Although that's harmless besides the wasted network traffic, the receiving side was checking for it (copied from the changegroup code for handling files). This resulted in the client crashing with: abort: received dir revlog group is empty Fix by simply not emitting a changegroup for the directory if there were no changes in it. This matches how files are handled.	2016-06-16 15:15:33 -07:00
Augie Fackler	8945f7f25d	changegroup: extract method that sorts nodes to send The current implementation of narrowhg needs to influence the order in which nodes are sent to the client. adgar@ and I think this is fixable, but it's going to require pretty substantial time investment, so in the interim we'd like to extract this method. I think it makes the group() code a little more obvious, as it took us a couple of tries to isolate the exact behavior we were observing.	2016-05-12 22:29:05 -04:00
Martin von Zweigbergk	4cc86f7b27	bundle: move writebundle() from changegroup.py to bundle2.py (API) writebundle() writes a bundle2 bundle or a plain changegroup1. Imagine away the "2" in "bundle2.py" for a moment and this change should make sense. The bundle wraps the changegroup, so it makes sense that it knows about it. Another sign that this is correct is that the delayed import of bundle2 in changegroup goes away. I'll leave it for another time to remove the "2" in "bundle2.py" (alternatively, extract a new bundle.py from it).	2016-03-28 14:41:29 -07:00
Martin von Zweigbergk	daa3928fb2	changegroup: clear progress callback after changelog processing The progress callback is replaced by one for manifests after changelog processing is done, but let's not depend on manifests replacing the value and instead explicitly clear it.	2016-02-29 09:26:43 -08:00
Martin von Zweigbergk	e8ad80f690	changegroup: progress for added files is not measured in "chunks" The "prog" class in cg1unpacker.apply() has the unit set to "chunks". This is not correct for files, where the file itself is the unit. The unit is not usually printed, which is probably why this has not been fixed yet. It can be shown with e.g. "--config progress.format='topic number unit'".	2016-02-28 22:51:07 -08:00
Martin von Zweigbergk	8d4ca9dc03	changegroup: exclude submanifests from manifest progress The progress callback for manifests is cleared outside of _unpackmanifests(), which means it will remain in effect while pulling subdirectory manifests when using treemanifests. Since the total number of revisions used for the progress is the number of changesets, the total number of treemanifest revisions is usually larger than that. One effect of this is that the ETA is negative. It's hard to estimate the number of subdirectory revisions, so let's just exclude them from progress for now.	2016-02-28 21:15:06 -08:00
Gregory Szorc	3774b02abb	changegroup: use changelog.readfiles We have a dedicated function to get just the list of files in a changelog entry. Use it. This will presumably speed up changegroup application since we're no longer decoding the entire changelog entry. But I didn't measure the impact.	2016-02-27 23:06:05 -08:00
Martin von Zweigbergk	225ad0fad5	changegroup: drop special-casing of flat manifests Since 37e42a0009a4 (changegroup: avoid iterating the whole manifest, 2015-12-04), the manifest linkrev callback iterates over only the files that were touched according to the changeset. Before that change, we iterated over all files returned in manifest.readfast(). That method returns the files in the delta, if the delta parent is a parent, otherwise it returns the full manifest. Most manifest revisions end up using one of the parents as its delta parent, so most of the time, the method returns a short manifest. It seems that that happens often enough that it doesn't really matter; I could not reproduce the timings reported in that change. Since the treemanifest code now works quite differently, and since that code also works correctly for flat manifests, let's drop the special-casing of flat manifests.	2016-02-22 14:43:14 -08:00
Martin von Zweigbergk	58c3ff9aaf	changegroup: fix treemanifests on merges The current code for generating treemanifest revisions takes the list of files in the changeset and finds the directories from them. This does not work for merges, since a merge may pick file A from one side and file B from another and neither of them would appear in the changeset's "files" list, but the manifest would still change. Fix this by instead walking the root manifest log for all needed revisions, storing all needed file and subdirectory revisions, then recursively visiting the subdirectories. This also turns out to be faster: cloning a version of hg core converted to treemanifests went from ~28s to ~19s (timing somewhat unfair: before this patch, timed until crash; after this patch, timed until manifests complete). The new algorithm is used only on treemanifest repos. Although it works equally well on flat manifests, we leave the iteration over files in the changeset for flat manifests for now.	2016-02-12 23:09:09 -08:00
Martin von Zweigbergk	c1d77f8a77	changegroup: write root manifests and subdir manifests in a single loop This is another step towards making the manifest generation recurse along the directory trees. The loop over 'tmfnodes' now takes the form of a queue. At this point, we only add to the queue twice: we add the root manifests, and, while visiting the root manifest revisions, we add all subdirectory revisions (for treemanifest repos). Thus, any iterations over 'tmfnodes' after the first will not add any items and the "queue" will just keep shrinking.	2016-02-12 23:30:18 -08:00
Martin von Zweigbergk	58d674fbd1	changegroup: introduce makelookupmflinknode(dir) This is another step towards making the manifest generation recurse along the directory trees. It makes the two calls to _packmanifests() more similar.	2016-02-12 23:26:15 -08:00
Martin von Zweigbergk	9381017496	changegroup: prune subdirectory dirlogs too We already prune changesets, root manifests and files whose linkrev is in the set of common revisions. We should do the same for dirlogs.	2016-02-12 21:21:28 -08:00
Martin von Zweigbergk	3761b3f9e7	changegroup: include subdirectory manifests in verbose size When verbose logging is on, we report the size in bytes of the manifest data in the changegroup. For files, we report the size per file, but I'm not sure we need that level of detail (i.e. size per directory manifest). Instead, report a single figure for the size of root manifest plus submanifests.	2016-02-12 15:42:16 -08:00
Martin von Zweigbergk	cd0a0297ee	changegroup: make _packmanifests() dumber The next few patches will rewrite the manifest generation code to work with merges. We will then walk dirlogs recursively. This prepares for that by moving much of the treemanifest code out of _packmanifests() and into generatemanifests(). For this to work, it also adds a _manifestsdone() method that returns the "end of manifests" close chunk for cg3 and an empty string for cg1 and cg2.	2016-02-12 15:18:56 -08:00
Martin von Zweigbergk	fb3a96fcf4	changegroup: extract generatemanifests() The changegroup.generate() function is pretty long, so let's extract the manifest generation part of it.	2016-02-11 20:19:48 -08:00
Martin von Zweigbergk	86ca76bafe	changegroup: fix pulling to treemanifest repo from flat repo (issue5066) In b89de5ee5b31 (changegroup: don't support versions 01 and 02 with treemanifests, 2016-01-19), I stopped supporting use of cg1 and cg2 with treemanifest repos. What I had not considered was that it's perfectly safe to pull to a treemanifest repo using any changegroup version. As reported in issue5066, I therefore broke pull from old repos into a treemanifest repo. It was not covered by the test case, because that pulled from a local repo while enabling treemanifests, which enabled treemanifests on the source repo as well. After switching to pulling via HTTP, it breaks. Fix by splitting up changegroup.supportedversions() into supportedincomingversions() and supportedoutgoingversions().	2016-01-27 09:07:28 -08:00
Augie Fackler	db82034373	changegroup: fix treemanifest exchange code (issue5061) There were two mistakes: one was accidental reuse of the fclnode variable from the loop gathering file nodes, and the other (masked by that bug) was not correctly handling deleted directories. Both cases are now fixed and the test passes.	2016-01-27 10:24:25 -05:00
Martin von Zweigbergk	c28812c552	shelve: use cg3 for treemanifests Similar to previous change, this teaches shelve to pick the right changegroup version for repos that use treemanifests.	2016-01-19 15:37:07 -08:00
Martin von Zweigbergk	4208c8682a	changegroup: introduce safeversion() In a few places (at least repair.py and shelve.py), we want to find the best changegroup version that we can assume users of the repo will understand. For example, we choose version 01 by default, but if it's a generaldelta repo, we expect clients to support version 02 anyway, so we choose that for new bundles (for e.g. "hg strip"). Let's create a helper for this functionality in changegroup, so we can reuse it elsewhere later.	2016-01-19 15:32:32 -08:00
Martin von Zweigbergk	fb1b7626e4	changegroup: don't support versions 01 and 02 with treemanifests Since it would be terribly expensive to convert between flat manifests and treemanifests, we have decided to simply not support changegroup version 01 and 02 with treemanifests. Therefore, let's stop announcing that we support these versions on treemanifest repos. Note that this means that older clients that try to clone from a treemanifest repo will fail. What happens is that the server, after this patch, finds that there are no common versions and raises "ValueError: no common changegroup version". This results in "abort: HTTP Error 500: Internal Server Error" on the client. Before this patch, it was no better: The server would instead find that there were directory manifest nodes to put in the changegroup 01 or 02 and raise an AssertionError on changegroup.py#668 (assert not tmfnodes), which would also appear as a 500 to the client.	2016-01-19 14:27:18 -08:00
Martin von Zweigbergk	2e9366a5ee	changegroup: cg3 has two empty groups after manifests changegroup.getchunks() determines the end of the stream by looking for an empty chunk group (two consecutive empty chunks). It ignores empty groups in the first two groups. Changegroup 3 introduced an empty chunk between the manifests and the files, which confuses getchunks(). Since it comes after the first two, getchunks() will stop there. Fix by rewriting getchunks so it first counts two groups (empty or not) and then starts counting empty groups. With this counting, changegroup 1 and 2 have exactly one empty group after the first two groups, while changegroup 3 has two (one for directories and one for files). It's a little hard to test this at this point, but I have verified that this patch fixes narrowhg (which was broken before this patch). Also, future patches will fix "hg strip" with treemanifests, and once that's done, getchunks() will be tested through tests of "hg strip".	2016-01-19 17:44:25 -08:00
Bryan O'Sullivan	337c3199e2	with: use context manager for transaction in changegroup apply (This needs some line wrapping due to the additional indent level. -mpm)	2016-01-15 13:14:50 -08:00
Martin von Zweigbergk	d9bf44d310	changegroup3: move treemanifest support into _unpackmanifests() By putting the treemanifest code in _unpackmanifests(), _addchangegroupfiles() will only be about files again, and we get a nice symmetry between _packmanifests() and _unpackmanifests(). The immediate benefit to me is that remotefilelog should not need to be updated to work with treemanifests. It should also make server.validate and progress output easier to get right. Probably bundlerepo too.	2016-01-08 16:12:58 -08:00
Martin von Zweigbergk	87d65b1188	changegroup3: add empty chunk separating directories and files Remotefilelog overrides changegroup._addchangegroupfiles(), assuming it is about files, which seems like a natural assumption. However, in changegroup3, directory manifests are sent in the files section of the changegroup. These naturally make remotefilelog unhappy. The fact that the directories are not separated from the files (although they do come before the files) also makes server.validate harder to implement. Since we read one chunk at a time from the stream, once we have found a file (non-directory) entry in the stream, we would have to push the read data back into the stream, or otherwise refactor the code. It will be easier if we add an empty chunk after all directory manifests. This change adds that empty chunk, although we don't yet take advantage of it on the reading side. We will soon move the tree manifest stuff out of _addchangegroupfiles() and into _unpackmanifests().	2016-01-11 15:10:31 -08:00
Martin von Zweigbergk	63c15f247e	changegroup3: introduce experimental.changegroup3 boolean config In order to give us the freedom to change the changegroup3 format, let's hide it behind an experimental config. Since it is required by treemanifests, that will override the cg3 config.	2016-01-12 21:23:45 -08:00
Martin von Zweigbergk	e5bd6473b3	changegroup: hide packermap behind methods This is to prepare for hiding changegroup3 behind a config option.	2016-01-12 21:01:06 -08:00
Mateusz Kwapich	6688b1c845	hooks: add HG_NODE_LAST to txnclose and changegroup hook environments Sometimes a txnclose or changegroup hook wants to iterate through all the changesets in a transaction; in that situation the revset `$HG_NODE:` is usually used to select the revisions. Unfortunately, this revset may sometimes contain too many changesets: because we don't hold the write lock while the hook runs, newer changes may be added to the repository in the meantime. That's why there is a need for an extra variable carrying the information about the last change in the transaction.	2016-01-05 17:37:59 -08:00
Martin von Zweigbergk	fafdf90374	changegroup: remove now-unused 'wasempty' variable and parameter	2016-01-08 21:14:08 -08:00
Martin von Zweigbergk	417363259e	treemanifests: set bundle2 part parameter indicating treemanifest By adding a mandatory 'treemanifest' parameter in the bundle2 part, we make it possible for the recipient to set repo requirements before the manifest revlog is accessed.	2016-01-08 21:13:06 -08:00
Martin von Zweigbergk	88327fd798	changegroup: don't add a second trailing '/' in dir name The paths given from treemanifest.dir() already contains the trailing slash.	2016-01-08 14:47:02 -08:00
Martin von Zweigbergk	ed1140692c	changegroup: remove left-over debugging help	2016-01-08 14:33:13 -08:00
Mike Edgar	44af48ee4a	changegroup: add flags field to cg3 delta header This lets revlog flags be transmitted over the wire. Right now this is useful for censored nodes and for narrowhg's ellipsis nodes.	2015-12-14 15:55:12 -05:00
Augie Fackler	d33d6a0cb5	changegroup: introduce cg3, which has support for exchanging treemanifests I'm not entirely happy with using a trailing / on a "file" entry for transferring a treemanifest. We've discussed putting some flags on each file header[0], but I'm unconvinced that's actually any better: if we were going to add another feature to the cg format we'd still be doing a version bump anyway to cg4, so I'm inclined to not spend time coming up with a more sophisticated format until we actually know what the next feature we want to stuff in a changegroup will be. Test changes outside test-treemanifest.t are only due to the new CG3 bundlecap showing up in the wire protocol. Many thanks to adgar@google.com and martinvonz@google.com for helping me with various odd corners of the changegroup and treemanifest API. 0: It's not hard refactoring, nor is it a lot of work. I'm just disinclined to do speculative work when it's not clear what the customer would actually be.	2015-12-11 11:23:49 -05:00
Augie Fackler	f675aea41b	changegroup: restate file linknode callback using generator expressions I think this is slightly clearer, and it nicely avoids an extra nested function.	2015-12-04 11:39:03 -05:00
Augie Fackler	7ee8b9a4d3	changegroup: clean up file lookup function One case is basically degenerate, so just extract it and make the function clearer.	2015-12-04 11:38:02 -05:00
Augie Fackler	53ca8538c0	changegroup: remove one special case from lookupmflinknode In the fastpathlinkrev case, lookupmflinknode was a very complicated way of saying mfs.__getitem__, so let's just get that case out of our way so it's easier to understand what's going on.	2015-12-04 10:55:46 -05:00
Augie Fackler	4e80790b8d	changegroup: drop 'if True' that made the previous change clearer	2015-12-04 10:35:45 -05:00
Augie Fackler	c3a36c8116	changegroup: avoid iterating the whole manifest The old code gathered the list of all files that changed anywhere in history and then gathered changed file nodes by walking the entirety of each manifest to be sent in order to gather changed file nodes. That's going to be unfortunate for narrowhg, and it's already inefficient for medium-to-large repositories. Timings for bundle --all on my hg repo, tested with hgperf: Before: ! wall 23.442445 comb 23.440000 user 23.250000 sys 0.190000 (best of 3) After: ! wall 20.272187 comb 20.270000 user 20.190000 sys 0.080000 (best of 3)	2015-12-04 10:34:58 -05:00
Augie Fackler	514dae67c6	changegroup: document manifest linkrev callback some more Martin and I just got super-confused reading some code here, so I think it's time for some more documentation.	2015-12-03 10:56:05 -05:00
Augie Fackler	aa07f6f058	changegroup: note during bundle apply if the repo was empty An upcoming change for exchanging treemanifest data will need to update the repository capabilities, which we should only do if the repository was empty before we started applying this changegroup. In the future we will probably need a strategy for upgrading to treemanifest in requires during a pull (I'm assuming at some point we'll make it possible to have a flag day to enable treemanifests on an existing history.)	2015-12-02 14:32:17 -05:00
Pierre-Yves David	f89772113f	changegroup: back code change of b5988e1d3dcb out The previous changeset is a simpler way of fixing issue4934 without changing the spirit of the code. We can remove the dual call to 'delayupdate' but we keep the tests to show that the issue is still fixed.	2015-11-06 13:01:15 -05:00
Pierre-Yves David	dfd6e44ebe	changegroup: call 'prechangegroup' hook before setting up write delay The 'prechangegroup' hook interferes with the 'delayupdate' logic because it triggers the one-time call of 'changelog._writepending' (see issue4934). There is no reason not to call that hook before setting up 'delayupdate', so we move the call a bit earlier to avoid interference.	2015-11-06 12:59:09 -05:00
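
Commit 715ea0d1cc above replaces a `while True` loop with `iter(callable, sentinel)`. The sketch below shows that idiom in isolation, assuming a stream read in fixed-size pieces until an empty read. The function names and the `io.BytesIO` stream are illustrative only and are not taken from the Mercurial code.

```python
import io


def read_chunks_loop(stream, size):
    """Collect fixed-size reads with an explicit 'while True' loop."""
    chunks = []
    while True:
        chunk = stream.read(size)
        if chunk == b"":  # an empty read marks the end of the stream
            break
        chunks.append(chunk)
    return chunks


def read_chunks_iter(stream, size):
    """Same behavior using the two-argument form of iter().

    iter(callable, sentinel) calls 'callable' repeatedly and stops as soon
    as it returns a value equal to 'sentinel' (here, the empty bytes object).
    """
    return list(iter(lambda: stream.read(size), b""))


# Hypothetical input data, used only to compare the two forms.
data = b"abcdefghij"
loop_result = read_chunks_loop(io.BytesIO(data), 4)
iter_result = read_chunks_iter(io.BytesIO(data), 4)
```

Both functions return the same list of chunks; the second form simply drops the explicit loop and `break`.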

219 Commits