sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-10 00:45:18 +03:00

Author	SHA1	Message	Date
Durham Goode	8f48c965d8	manifest: change manifestctx to not inherit from manifestdict If manifestctx inherits from manifestdict, it requires some weird logic to lazily load the dict if a piece of information is asked for. This ended up being complicated and unintuitive to use. Let's move the dict creation to .read(). This will make even more sense once we start adding readdelta() and other similar methods to manifestctx.	2016-09-12 10:55:43 -07:00
Pierre-Yves David	cb4c54634b	manifest: backed out changeset 3e5e08efafc9 There is some suspicious failure in evolution tests. This changeset was supposed to be dropped until we investigate.	2016-09-10 01:42:05 +02:00
Pierre-Yves David	eb569a3c73	manifest: backed out changeset ec5be4246a05 There is some suspicious failure in evolution tests. This changeset was supposed to be dropped until we investigate.	2016-09-10 01:41:38 +02:00
Durham Goode	ab661bf355	manifest: change manifestctx to not inherit from manifestdict If manifestctx inherits from manifestdict, it requires some weird logic to lazily load the dict if a piece of information is asked for. This ended up being complicated and unintuitive to use. Let's move the dict creation to .read(). This will make even more sense once we start adding readdelta() and other similar methods to manifestctx.	2016-08-31 12:46:53 -07:00
Durham Goode	e8a39ee6a7	manifest: make uses of _mancache aware of contexts In a future patch we will change manifestctx and treemanifestctx to no longer derive from manifestdict and treemanifest, respectively. This means that consumers of the _mancache will now need to be aware of the difference between the two, until we get rid of the manifest entirely and the _mancache becomes only filled with ctxs.	2016-08-29 18:02:09 -07:00
Durham Goode	fd7e94b89b	manifest: add treemanifestctx class Before we start using repo.manifestlog in the rest of the code base, we need to make sure that it's capable of returning treemanifests. As we add new functionality to manifestctx, we'll add it to treemanifestctx at the same time. We also comment out the manifestctx p1, p2, and linkrev fields for now, since we're not implementing them on treemanifest yet.	2016-08-31 13:29:49 -07:00
Durham Goode	133c1fe33e	manifest: call m1.load and m2.load before writing a subtree As part of refactoring the manifest, certain test cases started failing because writesubtrees was called with p1 and p2 manifests that had not been loaded (so accessing m1._dirs resulted in an empty set). Let's call _load on these before attempting to access _dirs. This was caught by tests when future patches were applied.	2016-08-29 17:48:14 -07:00
Durham Goode	f38741166f	manifest: use property instead of field for manifest revlog storage The file caches we're using to avoid reloading the manifest from disk every time have an annoying bug that causes the in memory structure to not be reloaded if the mtime and the size haven't changed. This causes a breakage in the tests because the manifestlog is not being reloaded after a commit+strip operation in mq (the mtime is the same because it all happens in the same second, and the resulting size is the same because we add 1 and remove 1). The only reason this doesn't affect the manifest itself is because we touch it so often that we had already reloaded it after the commit, but before the strip. Once the entire manifest has migrated to manifestlog, we can get rid of these properties, since then the manifestlog will be touched after the commit, but before the strip, as well.	2016-08-17 13:25:13 -07:00
Durham Goode	4c0439aa0a	manifest: introduce manifestlog and manifestctx classes This is the start of a large refactoring of the manifest class. It introduces the new manifestlog and manifestctx classes which will represent the collection of all manifests and individual instances, respectively. Future patches will begin to convert usages of repo.manifest to repo.manifestlog, adding the necessary functionality to manifestlog and instance as they are needed.	2016-08-17 13:25:13 -07:00
Durham Goode	948314a949	manifest: make manifest derive from manifestrevlog As part of our refactoring to split the manifest concept from its storage, we need to start moving the revlog specific parts of the manifest implementation to a new class. This patch creates manifestrevlog and moves the fulltextcache onto the base class.	2016-08-17 13:25:13 -07:00
Durham Goode	9dfdbc1f92	manifest: break mancache into two caches The old manifest cache cached both the inmemory representation and the raw text. As part of the manifest refactor we want to separate the storage format from the in memory representation, so let's split this cache into two caches. This will let other manifest implementations participate in the in memory cache, while allowing the revlog based implementations to still depend on the full text caching where necessary.	2016-08-17 13:25:13 -07:00
Augie Fackler	ba4d11b62e	bundlerepo: add support for treemanifests in cg3 bundles This is a little messier than I'd like, and I'll probably come back and do some more refactoring later, but as it is this unblocks narrowhg. An alternative approach (which I may do as part of the mentioned refactoring) would be to construct all dirlog instances up front, so that we don't have to keep track of the linkmapper method. This would avoid a reference cycle between the bundlemanifest and the bundlerepository, but I was hesitant to do all the work up front like that. With this change, it's possible to do 'hg incoming' and 'hg pull' from bundles in .hg/strip-backup in a treemanifest repository. Sadly, this doesn't make it possible to 'hg clone' one of those (if you do 'hg strip 0'), because the cg3 in the bundle gets written without a treemanifest flag. Since that's going to be an involved refactor in a different part of the code (which I suspect won't touch any of the code I've just written here), let's leave it as an idea for Later.	2016-08-05 13:08:11 -04:00
liscju	c7ec9d159e	i18n: translate abort messages I found a few places where message given to abort is not translated, I don't find any reason to not translate them.	2016-06-14 11:53:55 +02:00
Tony Tung	9f3c4b8958	manifest: improve filesnotin performance by using lazymanifest diff lazymanifests can compute diffs significantly faster than taking the set of two manifests and calculating the delta. when running hg diff --git -c . on Facebook's big repo, this reduces the run time from 2.1s to 1.5s.	2016-05-02 15:22:16 -07:00
Martin von Zweigbergk	58c3ff9aaf	changegroup: fix treemanifests on merges The current code for generating treemanifest revisions takes the list of files in the changeset and finds the directories from them. This does not work for merges, since a merge may pick file A from one side and file B from another and neither of them would appear in the changeset's "files" list, but the manifest would still change. Fix this by instead walking the root manifest log for all needed revisions, storing all needed file and subdirectory revisions, then recursively visiting the subdirectories. This also turns out to be faster: cloning a version of hg core converted to treemanifests went from ~28s to ~19s (timing somewhat unfair: before this patch, timed until crash; after this patch, timed until manifests complete). The new algorithm is used only on treemanifest repos. Although it works equally well on flat manifests, we leave the iteration over files in the changeset for flat manifests for now.	2016-02-12 23:09:09 -08:00
Martin von Zweigbergk	a05047be24	treemanifest: allow setting flag to 't' When using treemanifests, an on-disk manifest entry with the 't' flag set means that that entry is a directory and not a file. When read into memory, these become instances of the treemanifest class. The 't' flag should therefore never be visible to outside of manifest.py, so setflag() checks that it is not called with the 't' flag. However, it turns out that it will be useful for the narrowhg extension to expose the 't' flag to the user (see below), so let's drop the assertion. The narrowhg extension allows cloning only a given set of files and directories. Filelogs and dirlogs that don't match that set will not be included in the clone. The extension currently doesn't work with treemanifests. I plan on changing it so directories outside the narrow clone appear in the manifest. For example, if a directory 'outside/' is not part of the narrow clone, it will look like a file 'outside' with the 't' flag set. That will make e.g. manifestmerge() just work in most cases (and make it well prepared to handle the other cases).	2016-02-09 20:22:33 -08:00
Martin von Zweigbergk	9cf0539032	treemanifest: rewrite text() using iterentries() This simplifies a bit. Note that the function is only used when manually testing with _treeinmem=True.	2016-02-20 23:57:21 -08:00
Martin von Zweigbergk	8f025f0656	treemanifest: implement iterentries() To make tests pass with _treeinmem manually set to True, we need to implement the recently added iterentries() on the treemanifest class too.	2016-02-07 21:14:01 -08:00
Martin von Zweigbergk	b2b4f9e694	verify: check directory manifests In repos with treemanifests, there is no specific verification of directory manifest revlogs. It simply collects all file nodes by reading each manifest delta. With treemanifests, that means calling the manifest._slowreaddelta(). If there are missing revlog entries in a subdirectory revlog, 'hg verify' will simply report the exception that occurred while trying to read the root manifest: manifest@0: reading delta 1700e2e92882: meta/b/00manifest.i@67688a370455: no node This patch changes the verify code to load only the root manifest at first and verify all revisions of it, then verify all revisions of each direct subdirectory, and so on, recursively. The above message becomes b/@0: parent-directory manifest refers to unknown revision 67688a370455 Since the new algorithm reads a single revlog at a time and in order, 'hg verify' on a treemanifest version of the hg core repo goes from ~50s to ~14s. As expected, there is no significant difference on a repo with flat manifests.	2016-02-07 21:13:24 -08:00
Gregory Szorc	d6f69e17c6	manifest: use absolute_import	2015-12-21 21:35:46 -08:00
Gregory Szorc	ad1f138bcd	manifest: implement clearcaches() The manifest implements its own caches in addition to revlog's. Extend the base clearcaches() to wipe these as well.	2015-12-20 19:31:46 -08:00
Martin von Zweigbergk	50de24bc06	treemanifest: don't iterate entire matching submanifests on match() Before a4236180df5e (match: remove unnecessary optimization where visitdir() returns 'all', 2015-05-06), match.visitdir() used to return the special value 'all' to indicate that it was known that all subdirectories would also be included in the match. The purpose for that value was to avoid calling the matcher on all the paths. It turned out that calling the matcher was not a problem, so the special return value was removed and the code was simplified. However, if we use the same special value for not just avoiding calling the matcher on each file, but to avoid iterating over each file, it's a much bigger win. On commands like hg st --rev .^ --rev . dom/ we run the matcher (dom/) on the two manifests, then diff the narrowed manifest. If the size of the match is much larger than the size of the diff, this is wasteful. In the above case, we would end up iterating over the 15k-or-so files in dom/ for each of the manifests, only to later discover that they are mostly the same. This means that running the command above is usually slower than getting the status for the entire repo, because that code avoids calling treemanifest.match() and only calls treemanifest.diff(), which loads only what's needed for the diff. Let's fix this by reintroducing the 'all' value in match.visitdir() and making treemanifest.match() return a lazy copy of the manifest from dom/ and down (in the above case). This speeds up the above command on the Firefox repo from 0.357s to 0.137s (best of 5). The wider the match, the bigger the speedup.	2015-12-12 09:57:05 -08:00
Martin von Zweigbergk	8efd14d515	manifest: use 't' for tree manifest flag We currently use 'd' to indicate that a manifest entry is a directory. Let's switch to 't', since that's not a valid hex digit and therefore easier to spot in the raw manifest data. This will break any existing repos with tree manifests, but it's still an experimental feature and there are probably only a few test repos in existence with 'd' flags.	2015-12-04 14:24:45 -08:00
Durham Goode	9980cf1bdd	manifest: skip fastdelta if the change is large In large repos, the existing manifest fastdelta computation (which performs a bisect on the raw manifest for every file that is changing), is excessively slow. This patch makes fastdelta fall back to the normal string delta algorithm if the number of changes is large. On a large repo with a commit of 8000 files, this reduces the commit time by 7 seconds (fastdelta goes from 8 seconds to 1). I tested this change by modifying the function to compare the old and the new values and running the test suite. The only difference is that the pure text-diff algorithm sometimes produces smaller (but functionally identical) deltatexts than the bisect algorithm.	2015-11-05 18:56:40 -08:00
Augie Fackler	4f77804eb0	treemanifest: rework lazy-copying code (issue4840) The old lazy-copy code formed a chain of copied manifests with each copy. Under typical operation, the stack never got more than a couple of manifests deep and was fine. Under conditions like hgsubversion or convert, the stack could get hundreds of manifests deep, and eventually overflow the recursion limit for Python. I was able to consistently reproduce this by converting an hgsubversion clone of svn's history to treemanifests. This may result in fewer manifests staying in memory during operations like convert when treemanifests are in use, and should make those operations faster since there will be significantly fewer noop function calls going on. A previous attempt (never mailed) of mine to fix this problem tried to simply have all treemanifests only have a loadfunc - that caused somewhat weird problems because the gettext() callable passed into read() wasn't idempotent, so the easy solution is to have a loadfunc and a copyfunc.	2015-09-25 22:54:46 -04:00
Augie Fackler	70820b78e7	manifest: rename treemanifest load functions to ease debugging I'm hunting an infinite recursion bug at the moment, and having both of these methods named just _load is muddying the waters slightly.	2015-09-25 17:18:28 -04:00
Augie Fackler	83cf95501a	manifest: add id(self) to treemanifest __repr__ Also rename __str__ to __repr__ since that's what we really want for pdb.	2015-09-25 17:17:36 -04:00
timeless@mozdev.org	4af6115a32	manifest: switch add() to heapq.merge (available in Py2.6+)	2015-09-04 05:57:58 -04:00
Martin von Zweigbergk	93d5f56103	manifest: use match.prefix() instead of 'not match.anypats()' It seems clearer to check for what it is than what it isn't.	2015-05-19 11:16:20 -07:00
Martin von Zweigbergk	e9f7136157	treemanifest: lazily load manifests Most operations on treemanifests already visit only relevant submanifests. Notable examples include __getitem__, __contains__, walk/matches with matcher, diff. By making submanifests lazily loaded, we speed up all these operations. The lazy loading is achieved by adding a _load() method that gets defined where we currently eagerly parse the manifest. We make sure to call it before any access to _dirs, _files or _flags. Some timings on the Mozilla repo (with flat manifest timings for reference): hg cat -r . README.txt: 1.644s -> 0.096s (0.255s) hg diff -r .^ -r . : 1.746s -> 0.137s (0.431s) hg files -r . python : 1.508s -> 0.146s (0.335s) hg files -r . : 2.125s -> 2.203s (0.712s)	2015-04-09 17:14:35 -07:00
Martin von Zweigbergk	61642a4536	treemanifest: speed up commit using dirty flag We currently avoid saving a treemanifest revision if it's the same as one of its parents. This is checked by comparing the generated text for all three versions. Let's avoid that when possible by comparing the nodeids for clean (not dirty) nodes. On the Mozilla repo, this speeds up commit from 2.836s to 2.343s.	2015-05-18 21:31:40 -07:00
Martin von Zweigbergk	176b5e14d6	treemanifest: speed up diff by keeping track of dirty nodes Since tree manifests have a nodeid per directory, we can avoid diffing entire directories if they have the same nodeid. The comparison is only valid for unmodified treemanifest instances, of course, so we need to keep track of which have been modified. Therefore, let's add a dirty flag to treemanifest indicating whether its nodeid can be trusted. We set it when _files or _dirs is modified, and make diff(), and its cousin filesnotin(), not descend into subdirectories that are the same on both sides. On the Mozilla repo, this speeds up 'hg diff -r .^ -r .' from 1.990s to 1.762s. The improvement will be much larger when we start lazily loading subdirectory manifests.	2015-02-26 08:16:13 -08:00
Drew Gottlieb	ca0e804650	match: remove unnecessary optimization where visitdir() returns 'all' Match's visitdir() was prematurely optimized to return 'all' in some cases, so that the caller would not have to call it for directories within the current directory. This change makes the visitdir system less flexible for future changes, such as making visitdir consider the match's include and exclude patterns. As a demonstration of this optimization not actually improving performance, I ran 'hg files -r . media' on the Mozilla repository, stored as treemanifest revlogs. With best of ten tries, the command took 1.07s both with and without the optimization, even though the optimization reduced the calls from visitdir() from 987 to 51.	2015-05-06 15:59:35 -07:00
Martin von Zweigbergk	f569f9222c	treemanifest: cache directory logs and manifests Since manifest instances are cached on the manifest log instance, we can cache directory manifests by caching the directory manifest logs. The directory manifest log cache is a plain dict, so it never expires; we assume that we can keep all the directories in memory. The cache is kept on the root manifestlog, so access to directory manifest logs now has to go through the root manifest log. The caching will soon not be only an optimization. When we start lazily loading directory manifests, we need to make sure we don't create multiple instances of the log objects. The caching takes care of that problem.	2015-04-10 23:12:33 -07:00
Augie Fackler	9c2e980a64	cleanup: use __builtins__.all instead of util.all	2015-05-16 14:34:19 -04:00
Martin von Zweigbergk	decbcc4c31	treemanifest: add --dir option to debug{revlog,data,index} It should be possible to debug the submanifest revlogs without having to know where they are stored (in .hg/store/meta/), so let's add a --dir option for this purpose.	2015-04-12 23:51:06 -07:00
Martin von Zweigbergk	1acf6c029c	treemanifest: store submanifest revlog per directory With this change, when tree manifests are enabled (in .hg/requires), commits will be written with one manifest revlog per directory. The manifest revlogs are stored in .hg/store/meta/$dir/00manifest.[id]. Flat manifests can still be read and interacted with as usual (they are also read into treemanifest instances). The functionality for writing treemanifest as a flat manifest to disk is still left in the code; tests still pass with '_treeinmem=True' hardcoded. Exchange is not yet implemented.	2015-04-13 23:21:02 -07:00
Martin von Zweigbergk	7d233de844	treemanifest: set requires at repo creation time, ignore config after The very next changeset will start writing one revlog per directory when tree manifests are enabled. That is backwards incompatible, so it requires .hg/requires to be updated. Just like with generaldelta, we want to update .hg/requires only when the repo is created. Updating .hg/requires is bad for repos on shared disk. Instead, those who do want to upgrade a repo to using treemanifest (or manifestv2, etc) can run hg clone --config experimental.treemanifest repo clone which will create a new repo with the requirement set. Unlike the case of e.g. generaldelta, it will not rewrite the changesets, since tree manifests hash differently.	2015-05-05 08:40:59 -07:00
Augie Fackler	504ab1d1d6	manifest: document return type of readfast() I keep having to ponder out what readfast() means, and it always surprises me. Document the return type in the docstring so that future readers won't have to puzzle this out again.	2015-04-28 12:31:30 -04:00
Martin von Zweigbergk	35368e1596	treemanifest: extract parse method from constructor When we start to lazily load submanifests, it will be useful to be able to create a treemanifest instance before manifest data gets parsed into it. To prepare for this, extract the parsing code from treemanifest's constructor to a separate method.	2015-04-12 23:01:18 -07:00
Martin von Zweigbergk	c1ccc70121	manifest: duplicate call to addrevision() When we start writing submanifests to their own revlogs, we will not want to write a new revision for a directory if there were no changes to it. To prepare for this, duplicate the call to addrevision() and move them earlier where they can more easily be avoided.	2015-04-12 14:37:55 -07:00
Martin von Zweigbergk	63f47478d7	treemanifest: separate flags for trees in memory and trees on disk When we start writing tree manifests with one manifest revlog per directory, it will still be nice to be able to run tests using tree manifests in memory but writing a flat manifest to a single revlog. Let's break the current '_usetreemanifest' flag on the revlog into '_treeinmem' and '_treeondisk'. Both are populated from the same config, but after this change, one can temporarily hard-code _treeinmem=True to see that tests still pass.	2015-04-10 18:54:33 -07:00
Martin von Zweigbergk	320b8b5298	manifestdict: drop empty-string argument when creating empty manifest manifestdict() creates an empty manifestdict, so let's consistently use that instead of explicitly parsing an empty string (which does result in an empty manifest).	2015-04-10 18:13:01 -07:00
Martin von Zweigbergk	4c187a8462	manifestdict: extract condition for _intersectfiles() and use for walk() The condition on which manifestdict.matches() and manifestdict.walk() take the fast path of iterating over files instead of the manifest, is slightly different. Specifically, walk() does not take the fast path for exact matchers and it does not avoid taking the fast path when there are more than 100 files. Let's extract the condition so we don't have to maintain it in two places and so walk() can gain these two missing pieces of the condition (although there seems to be no current caller of walk() with an exact matcher).	2015-04-08 09:38:09 -07:00
Martin von Zweigbergk	2592408744	manifestdict.walk: remove now-redundant check for match.files() When checking whether we can take the fast path of iterating over matcher files instead of manifest files, we check whether match.files() is non-empty. However, now that we return early for match.always(), it can only be empty when there are only include/exclude patterns, but in that case anypats() will be True, so it's already covered. This makes manifestdict.walk() more similar to manifestdict.matches().	2015-04-07 22:40:25 -07:00
Martin von Zweigbergk	67897f5b0b	manifest.walk: special-case match.always() for speed This cuts down the run time of hg files -r . > /dev/null from ~0.850s to ~0.780s on the Firefox repo. Note that manifest.matches() already has the corresponding optimization.	2015-04-07 21:08:23 -07:00
Martin von Zweigbergk	fc5772e190	manifest.walk: use return instead of StopIteration in generator Using "return" within a generator is supposedly more Pythonic than raising StopIteration.	2015-04-07 22:36:17 -07:00
Drew Gottlieb	6d2651f8ba	treemanifest: optimize treemanifest._walk() to skip directories This makes treemanifest.walk() not visit submanifests that are known not to have any matching files. It does this by calling match.visitdir() on submanifests as it walks. This change also updates largefiles to be able to work with this new behavior in treemanifests. It overrides match.visitdir(), the function that dictates how walk() and matches() skip over directories. The greatest speed improvements are seen with narrower scopes. For example, this commit speeds up the following command on the Mozilla repo from 1.14s to 1.02s: hg files -r . dom/apps/ Whereas with a wider scope, dom/, the speed only improves from 1.21s to 1.13s. As with a similar optimization to treemanifest.matches(), this change will bring out even bigger performance improvements once treemanifests are loaded lazily. Once that happens, we won't just skip over looking at submanifests, but we'll skip even loading them.	2015-04-07 15:18:52 -07:00
Martin von Zweigbergk	89a5bacd48	manifest.walk: join nested if-conditions This makes it more closely match the similar condition in manifestdict.matches().	2015-04-07 22:35:44 -07:00
Martin von Zweigbergk	eff6f72dc8	manifestdict: inline _intersectfiles() The _intersectfiles() method is only called from one place, it's pretty short, and its caller has to be aware when it's appropriate to call it (when the number of files in the matcher is not too large), so let's inline it.	2015-04-08 10:01:31 -07:00


192 Commits