sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-08 15:57:43 +03:00

Author	SHA1	Message	Date
Martin von Zweigbergk	f569f9222c	treemanifest: cache directory logs and manifests Since manifest instances are cached on the manifest log instance, we can cache directory manifests by caching the directory manifest logs. The directory manifest log cache is a plain dict, so it never expires; we assume that we can keep all the directories in memory. The cache is kept on the root manifestlog, so access to directory manifest logs now has to go through the root manifest log. The caching will soon not be only an optimization. When we start lazily loading directory manifests, we need to make sure we don't create multiple instances of the log objects. The caching takes care of that problem.	2015-04-10 23:12:33 -07:00
Augie Fackler	9c2e980a64	cleanup: use __builtins__.all instead of util.all	2015-05-16 14:34:19 -04:00
Martin von Zweigbergk	decbcc4c31	treemanifest: add --dir option to debug{revlog,data,index} It should be possible to debug the submanifest revlogs without having to know where they are stored (in .hg/store/meta/), so let's add a --dir option for this purpose.	2015-04-12 23:51:06 -07:00
Martin von Zweigbergk	1acf6c029c	treemanifest: store submanifest revlog per directory With this change, when tree manifests are enabled (in .hg/requires), commits will be written with one manifest revlog per directory. The manifest revlogs are stored in .hg/store/meta/$dir/00manifest.[id]. Flat manifests can still be read and interacted with as usual (they are also read into treemanifest instances). The functionality for writing treemanifest as a flat manifest to disk is still left in the code; tests still pass with '_treeinmem=True' hardcoded. Exchange is not yet implemented.	2015-04-13 23:21:02 -07:00
Martin von Zweigbergk	7d233de844	treemanifest: set requires at repo creation time, ignore config after The very next changeset will start writing one revlog per directory when tree manifests are enabled. That is backwards incompatible, so it requires .hg/requires to be updated. Just like with generaldelta, we want to update .hg/requires only when the repo is created. Updating .hg/requires is bad for repos on shared disk. Instead, those who do want to upgrade a repo to using treemanifest (or manifestv2, etc) can run hg clone --config experimental.treemanifest repo clone which will create a new repo with the requirement set. Unlike the case of e.g. generaldelta, it will not rewrite the changesets, since tree manifests hash differently.	2015-05-05 08:40:59 -07:00
Augie Fackler	504ab1d1d6	manifest: document return type of readfast() I keep having to ponder out what readfast() means, and it always surprises me. Document the return type in the docstring so that future readers won't have to puzzle this out again.	2015-04-28 12:31:30 -04:00
Martin von Zweigbergk	35368e1596	treemanifest: extract parse method from constructor When we start to lazily load submanifests, it will be useful to be able to create a treemanifest instance before manifest data gets parsed into it. To prepare for this, extract the parsing code from treemanifest's constructor to a separate method.	2015-04-12 23:01:18 -07:00
Martin von Zweigbergk	c1ccc70121	manifest: duplicate call to addrevision() When we start writing submanifests to their own revlogs, we will not want to write a new revision for a directory if there were no changes to it. To prepare for this, duplicate the call to addrevision() and move them earlier where they can more easily be avoided.	2015-04-12 14:37:55 -07:00
Martin von Zweigbergk	63f47478d7	treemanifest: separate flags for trees in memory and trees on disk When we start writing tree manifests with one manifest revlog per directory, it will still be nice to be able to run tests using tree manifests in memory but writing to a flat manifest to a single revlog. Let's break the current '_usetreemanifest' flag on the revlog into '_treeinmem' and '_treeondisk'. Both are populated from the same config, but after this change, one can temporarily hard-code _treeinmem=True to see that tests still pass.	2015-04-10 18:54:33 -07:00
Martin von Zweigbergk	320b8b5298	manifestdict: drop empty-string argument when creating empty manifest manifestdict() creates an empty manifestdict, so let's consistently use that instead of explicitly parsing an empty string (which does result in an empty manifest).	2015-04-10 18:13:01 -07:00
Martin von Zweigbergk	4c187a8462	manifestdict: extract condition for _intersectfiles() and use for walk() The condition on which manifestdict.matches() and manifestdict.walk() take the fast path of iterating over files instead of the manifest, is slightly different. Specifically, walk() does not take the fast path for exact matchers and it does not avoid taking the fast path when there are more than 100 files. Let's extract the condition so we don't have to maintain it in two places and so walk() can gain these two missing pieces of the condition (although there seems to be no current caller of walk() with an exact matcher).	2015-04-08 09:38:09 -07:00
Martin von Zweigbergk	2592408744	manifestdict.walk: remove now-redundant check for match.files() When checking whether we can take the fast path of iterating over matcher files instead of manifest files, we check whether match.files() is non-empty. However, now that we return early for match.always(), it can only be empty when there are only include/exclude patterns, but in that case anypats() will be True, so it's already covered. This makes manifestdict.walk() more similar to manifestdict.matches().	2015-04-07 22:40:25 -07:00
Martin von Zweigbergk	67897f5b0b	manifest.walk: special-case match.always() for speed This cuts down the run time of hg files -r . > /dev/null from ~0.850s to ~0.780s on the Firefox repo. Note that manifest.matches() already has the corresponding optimization.	2015-04-07 21:08:23 -07:00
Martin von Zweigbergk	fc5772e190	manifest.walk: use return instead of StopIteration in generator Using "return" within a generator is supposedly more Pythonic than raising StopIteration.	2015-04-07 22:36:17 -07:00
Drew Gottlieb	6d2651f8ba	treemanifest: optimize treemanifest._walk() to skip directories This makes treemanifest.walk() not visit submanifests that are known not to have any matching files. It does this by calling match.visitdir() on submanifests as it walks. This change also updates largefiles to be able to work with this new behavior in treemanifests. It overrides match.visitdir(), the function that dictates how walk() and matches() skip over directories. The greatest speed improvements are seen with narrower scopes. For example, this commit speeds up the following command on the Mozilla repo from 1.14s to 1.02s: hg files -r . dom/apps/ Whereas with a wider scope, dom/, the speed only improves from 1.21s to 1.13s. As with a similar optimization to treemanifest.matches(), this change will bring out even bigger performance improvements once treemanifests are loaded lazily. Once that happens, we won't just skip over looking at submanifests, but we'll skip even loading them.	2015-04-07 15:18:52 -07:00
Martin von Zweigbergk	89a5bacd48	manifest.walk: join nested if-conditions This makes it more closely match the similar condition in manifestdict.matches().	2015-04-07 22:35:44 -07:00
Martin von Zweigbergk	eff6f72dc8	manifestdict: inline _intersectfiles() The _intersectfiles() method is only called from one place, it's pretty short, and its caller has to be aware when it's appropriate to call it (when the number of files in the matcher is not too large), so let's inline it.	2015-04-08 10:01:31 -07:00
Martin von Zweigbergk	1430a21750	manifestdict._intersectfiles: avoid one level of property indirection We have already bothered to extract "lm = self._lm", so let's use "lm" where possible.	2015-04-08 10:03:59 -07:00
Martin von Zweigbergk	0d47282240	manifestdict.matches: avoid name 'lm' for a not-lazymanifest	2015-04-08 10:06:05 -07:00
Drew Gottlieb	d67091a36f	treemanifest: refactor treemanifest.walk() This refactor is a preparation for an optimization in the next commit. This introduces a recursive element that recurses each submanifest. By using a recursive function, the next commit can avoid walking over some subdirectories altogether.	2015-04-07 15:18:52 -07:00
Drew Gottlieb	ee2eebcb93	manifest: move changectx.walk() to manifests The logic of walking a manifest to yield files matching a match object is currently being done by context, not the manifest itself. This moves the walk() function to both manifestdict and treemanifest. This separate implementation will also permit differing, optimized implementations for each manifest.	2015-04-07 15:18:52 -07:00
Drew Gottlieb	d01f641c75	treemanifest: further optimize treemanifest.matches() The matches function was previously traversing all submanifests to look for matching files, even though it was possible to know if a submanifest won't contain any matches. This change adds a visitdir function on the match object to decide quickly if a directory should be visited when traversing. The function also decides if _all_ subdirectories should be traversed. Adding this logic as methods on the match object also makes the logic modifiable by extensions, such as largefiles. An example of a command this speeds up is running hg status --rev .^ python/ on the Mozilla repo with the treemanifest experiment enabled. It goes from 2.03s to 1.85s. More improvements to speed from this change will happen when treemanifests are lazily loaded. Because a flat manifest is still loaded and then converted into treemanifests, speed improvements are limited. This change has no negative effect on speed. For a worst-case example, this command is not negatively impacted: hg status --rev .^ 'relglob:*.js' on the Mozilla repo. It goes from 2.83s to 2.82s.	2015-04-06 10:51:53 -07:00
Drew Gottlieb	901ac5e726	util: move dirs() and finddirs() from scmutil to util An upcoming commit requires that match.py be able to call scmutil.dirs(), but when match.py imports scmutil, a dependency cycle is created. This commit avoids the cycle by moving dirs() and its related finddirs() function from scmutil to util, which match.py already depends on.	2015-04-06 14:36:08 -07:00
Martin von Zweigbergk	eeace59f46	treemanifest: disable readdelta optimization When tree manifests are stored with one revlog per directory and loaded lazily, it's unclear how much readdelta will help. If only a few files change, then only a small part of the full manifest will be loaded, and the delta chains should also be shorter for tree manifests. Therefore, let's disable readdelta for tree manifests for now.	2015-03-10 09:57:42 -07:00
Martin von Zweigbergk	ebd2a39ab3	manifestv2: add support for writing new manifest format If .hg/requires has 'manifestv2', the manifest will be written using the new format.	2015-03-31 14:01:33 -07:00
Martin von Zweigbergk	c5433d6da0	manifestv2: add support for reading new manifest format The new manifest format is designed to be smaller, in particular to produce smaller deltas. It stores hashes in binary and puts the hash on a new line (for smaller deltas). It also uses stem compression to save space for long paths. The format has room for metadata, but that's there only for future-proofing. The parser thus accepts any metadata and throws it away. For more information, see http://mercurial.selenic.com/wiki/ManifestV2Plan. The current manifest format doesn't allow an empty filename, so we use an empty filename on the first line to tell a manifest of the new format from the old. Since we still never write manifests in the new format, the added code is unused, but it is tested by test-manifest.py.	2015-03-27 22:26:41 -07:00
Martin von Zweigbergk	e931247479	manifestv2: set requires at repo creation time While it should be safe to switch to the new manifest format on an existing repo, let's keep it simple for now and make the configuration have any effect only at repo creation time. If the configuration is enabled then (at repo creation), we add an entry to requires and read that instead of the configuration from then on.	2015-03-31 22:45:45 -07:00
Drew Gottlieb	f02ce7c1fd	treemanifest: make treemanifest.matches() faster By converting treemanifest.matches() into a recursively additive operation, it becomes O(n). The old matches function made a copy of the entire manifest and deleted files that didn't match. With tree manifests, this was an O(n log n) operation because del() was O(log n). This change speeds up the command "hg status --rev .^ 'relglob:*.js'" on the Mozilla repo, now taking 2.53s, down from 3.51s.	2015-03-30 18:10:59 -07:00
Drew Gottlieb	5843babb9b	treemanifest: add treemanifest._isempty() During operations that involve building up a new manifest tree, it will be useful to be able to quickly check if a submanifest is empty, and if so, to avoid including it in the final tree. Doing this check lets us avoid creating treemanifest structures that contain any empty submanifests.	2015-03-30 17:21:49 -07:00
Drew Gottlieb	84f08f1d56	treemanifest: remove treemanifest._intersectfiles() In preparation for the optimization in the following commit, this commit removes treemanifest.matches()'s call to _intersectfiles(), and removes _intersectfiles() itself since it's unused at this point.	2015-03-27 13:16:13 -07:00
Martin von Zweigbergk	7dfcb254e9	manifestv2: implement slow readdelta() without revdiff For manifest v2, revlog.revdiff() usually does not provide enough information to produce a manifest. As a simple workaround, implement readdelta() by reading both the old and the new manifest and use manifest.diff() to find the difference. This is several times slower than the current readdelta() for v1 manifests, but there seems to be no other simple option, and this is still much faster than returning the full manifest (at least for verify).	2015-03-27 20:41:30 -07:00
Martin von Zweigbergk	b7bfa722d1	manifestv2: disable fastdelta optimization We may add support for the fastdelta optimization for manifest v2 at a later point, but let's disable it for now, so we don't have to implement it right away.	2015-03-27 17:07:24 -07:00
Martin von Zweigbergk	16d87fc88d	manifestv2: add (unused) config option With tree manifests, hashes will change anyway, so now is a good time to also take up the old plans of a new manifest format. While there should be little or no reason to use tree manifests with the current manifest format (v1) once the new format (v2) is supported, we'll try to keep the two dimensions (flat/tree and v1/v2) separate. In preparation for adding the new format, let's add configuration for it and propagate that configuration to the manifest revlog subclass. The new configuration ("experimental.manifestv2") says in what format to write the manifest data. We may later add other configuration to choose how to hash it, either keeping the v1 hash for BC or hashing the v2 content. See http://mercurial.selenic.com/wiki/ManifestV2Plan for more details.	2015-03-27 16:19:44 -07:00
Martin von Zweigbergk	a7479ae566	manifest: extract method for creating manifest text Similar to the previous change, this one extracts a method for producing a manifest text from an iterator over (path, node, flags) tuples.	2015-03-27 15:37:46 -07:00
Martin von Zweigbergk	a16ddaed87	manifest: extract method for parsing manifest By extracting a method that generates (path, node, flags) tuples, we can reuse the code for parsing a manifest without doing it via a _lazymanifest like treemanifest currently does. It also prepares for parsing the new manifest format. Note that this makes parsing into treemanifest slower, since the parsing is now always done in pure Python. Since treemanifests will be expected (or even forced) to be used only with the new manifest format, parsing via _lazymanifest was not an option anyway.	2015-03-27 15:02:43 -07:00
Martin von Zweigbergk	35cf546efe	_lazymanifest: drop unnecessary call to sorted() The entries returned from _lazymanifest.iterentries() are already sorted.	2015-03-27 20:55:54 -07:00
Drew Gottlieb	d2ab66f723	manifest: make manifest.intersectfiles() internal manifest.intersectfiles() is just a utility used by manifest.matches(), and a future commit removes intersectfiles for treemanifest for optimization purposes. This commit makes the intersectfiles methods on manifestdict and treemanifest internal, and converts its test to a more generic testMatches(), which has the exact same coverage.	2015-03-30 10:43:52 -07:00
Martin von Zweigbergk	c7787f3a4e	treemanifest: drop 22nd byte for consistency with manifestdict When assigning a 22-byte hash to a nodeid in a manifest, manifestdict drops the 22nd byte, while treemanifest keeps it. Let's make treemanifest drop the 22nd byte as well.	2015-03-26 09:42:21 -07:00
Martin von Zweigbergk	8874cd66a5	match: add isexact() method to hide internals Comparing a function reference seems bad.	2014-10-29 08:43:39 -07:00
Martin von Zweigbergk	db97ff212f	treemanifest: make hasdir() faster Same rationale as the previous change.	2015-03-16 16:01:16 -07:00
Martin von Zweigbergk	6aeabac9d6	treemanifest: make filesnotin() faster Same rationale as the previous change.	2015-03-03 13:50:06 -08:00
Martin von Zweigbergk	a417c46247	treemanifest: make diff() faster Containment checking is slower in treemanifest than it is in manifestdict, making the current diff algorithm O(n log n). By traversing both treemanifests in parallel, we can make it O(n). More importantly, once we start lazily loading submanifests, we will be able to easily skip entire submanifests if they have the same nodeid.	2015-02-19 17:13:35 -08:00
Martin von Zweigbergk	4c03dc48c7	treemanifest: store directory path in treemanifest nodes This leads to less concatenation while iterating, and it's useful for debugging.	2015-02-23 10:57:57 -08:00
Martin von Zweigbergk	8790c2008e	treemanifest: add configuration for using treemanifest type This change adds boolean configuration option experimental.treemanifest. When the option is enabled, manifests are parsed into the new treemanifest type. Tests can be now run using treemanifest by switching the config option default in localrepo._applyrequirements(). Tests pass even when made to randomly choose between manifestdict and treemanifest, suggesting that the two types produce identical manifests (so e.g. a manifest revlog entry written from a treemanifest can be parsed by the manifestdict code).	2015-03-19 11:07:57 -07:00
Martin von Zweigbergk	4ab8e2d4fe	treemanifest: create treemanifest class There are a number of problems with large and flat manifests. Copying from http://mercurial.selenic.com/wiki/ManifestShardingPlan: * manifest too large for RAM * manifest resolution too much CPU (long delta chains) * committing is slow because entire manifest has to be hashed * impossible for narrow clone to leave out part of manifest as all is needed to calculate new hash * diffing two revisions involves traversing entire subdirectories even if identical This is a first step in a series introducing a manifest revlog per directory. This change adds a new manifest class: treemanifest, which is a tree where each node has a dict of files (nodeids), a dict of flags, and a dict of subdirectories (treemanifests). So far, it behaves just like manifestdict, but it will later help us write one manifest revlog per directory. The new class is still unused; it will be used after the next change. The code is not yet optimized. Running with it (see below) makes most or all operations slower. Once we start storing manifest revlogs for every directory, it should be possible to make many of these operations much faster. The fastdelta() optimization has been intentionally not implemented for the treemanifests. We can implement it later if necessary. 
All tests pass when run with the following patch (and without, of course): --- a/mercurial/manifest.py Thu Mar 19 11:08:42 2015 -0700 +++ b/mercurial/manifest.py Thu Mar 19 11:15:50 2015 -0700 @@ -596,7 +596,7 @@ class manifest(revlog.revlog): return None, None def add(self, m, transaction, link, p1, p2, added, removed): - if p1 in self._mancache: + if False and p1 in self._mancache: # If our first parent is in the manifest cache, we can # compute a delta here using properties we know about the # manifest up-front, which may save time later for the @@ -626,3 +626,5 @@ class manifest(revlog.revlog): self._mancache[n] = (m, arraytext) return n + +manifestdict = treemanifest	2015-03-19 11:08:42 -07:00
Durham Goode	e4183e1549	manifest: avoid intersectfiles for matches > 100 files Previously we tried to avoid manifest.intersectfiles for exact matches with less than 100 files. However, when the left side of the "or" is false, the right side gets evaluated, of course, and the evaluation of "util.all(fn in self for fn in files)" is both costly in itself, and likely to be true, causing intersectfiles() to be called after all. Fix this by moving the check for less than 100 files outside of the "or" expression, thereby also making it apply for a non-exact matcher, should one be passed in.	2015-03-18 15:59:45 -07:00
Matt Mackall	ce4c2d6512	manifest: speed up matches for large sets of files If the number of files being matched is large, the bisection overhead can dominate, which caused a performance regression for revert --all and histedit. This introduces a (fairly arbitrary) cross-over from using bisections to bulk search.	2015-03-18 13:37:18 -05:00
Drew Gottlieb	31ae70b088	manifest: add manifestdict.hasdir() method Allows for alternative implementations of manifestdict to decide if a directory exists in whatever way is most optimal.	2015-03-13 15:25:01 -07:00
Drew Gottlieb	30b3b3df39	manifest: add dirs() to manifestdict Manifests should have a method of accessing their own dirs, not just the context that references the manifest. This makes it easier for other optimized versions of manifests to compute their own dirs in the most efficient way.	2015-03-13 15:19:54 -07:00
Martin von Zweigbergk	ce0723ee16	lazymanifest: make __iter__ generate filenames, not 3-tuples The _lazymanifest type(s) behave very much like a sorted dict with filenames as keys and (nodeid, flags) as values. It therefore seems surprising that its __iter__ generates 3-tuples of (path, nodeid, flags). Let's make it match dict's behavior of generating the keys instead, and add a new iterentries method for the 3-tuples. With this change, the "x" in "if x in lm" and "for x in lm" now have the same type (a filename string).	2015-03-12 18:18:29 -07:00
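
Several of the messages above describe treemanifest as a tree in which each node holds a dict of files, a dict of flags, and a dict of subdirectories, and describe walk() skipping submanifests by asking the matcher's visitdir(). The following is a minimal sketch of that idea only. It is not Mercurial's code: TreeNode, PrefixMatcher, set() and the way paths are stored are hypothetical names and choices made for illustration, and only the visitdir() concept comes from the log.

```python
# Illustrative sketch; NOT Mercurial's treemanifest implementation.
# TreeNode and PrefixMatcher are hypothetical names used only here.


class TreeNode:
    """One directory: a dict of files, a dict of flags, a dict of subdirs."""

    def __init__(self, path=""):
        self.path = path  # "" for the root, otherwise ends with "/"
        self.files = {}   # filename -> nodeid
        self.flags = {}   # filename -> flags
        self.dirs = {}    # "name/" -> TreeNode

    def set(self, filepath, nodeid, flags=""):
        """Store a file, creating intermediate directory nodes as needed."""
        head, sep, rest = filepath.partition("/")
        if sep:
            key = head + "/"
            if key not in self.dirs:
                self.dirs[key] = TreeNode(self.path + key)
            self.dirs[key].set(rest, nodeid, flags)
        else:
            self.files[head] = nodeid
            if flags:
                self.flags[head] = flags

    def walk(self, matcher):
        """Yield matching full paths, skipping subtrees the matcher rejects."""
        for name in sorted(list(self.files) + list(self.dirs)):
            if name in self.files:
                full = self.path + name
                if matcher.matches(full):
                    yield full
            else:
                sub = self.dirs[name]
                # Prune: do not descend when nothing below can match.
                if matcher.visitdir(sub.path):
                    yield from sub.walk(matcher)


class PrefixMatcher:
    """Hypothetical matcher that accepts every file under one directory."""

    def __init__(self, prefix):
        self.prefix = prefix  # e.g. "dom/apps/"
        self.asked = []       # directories visitdir() was consulted about

    def matches(self, path):
        return path.startswith(self.prefix)

    def visitdir(self, dirpath):
        self.asked.append(dirpath)
        # Visit directories inside the prefix and ancestors of the prefix.
        return dirpath.startswith(self.prefix) or self.prefix.startswith(dirpath)


root = TreeNode()
for p in ["README", "dom/apps/a.js", "dom/base/b.js", "python/c.py"]:
    root.set(p, nodeid=b"\0" * 20)

matcher = PrefixMatcher("dom/apps/")
print(list(root.walk(matcher)))
print(matcher.asked)
```

In this sketch the walk never descends into dom/base/ or python/; the matcher is consulted once per directory and rejects them. That is the kind of pruning the log says should pay off further once submanifests are loaded lazily.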

159 Commits