sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-07 23:38:50 +03:00

Author	SHA1	Message	Date
Durham Goode	e0b3f09a3b	revbranchcache: write cache even during read operations Previously we would only actually write the revbranchcache to disk if we were in the middle of a write operation (like commit). Now we will also write it during any read operation. The cache knows how to invalidate itself, so it shouldn't become corrupt if multiple writers try at once (and the write-on-read behavior/risk is the same as all our other caches).	2015-02-24 18:43:31 -08:00
Durham Goode	9d9fb7f2ee	revbranchcache: move cache writing to the transaction finalizer Instead of writing the revbranchcache during updatecache (which often happens too early, before the cache is even populated), let's run it as part of the transaction finalizer. It still won't be written for read-only operations, but that's no worse than it is today. A future commit will remove the actual write that happens in updatecache(). This is also good prep for when all caches get moved into the transaction.	2015-02-10 20:06:12 -08:00
Durham Goode	f333b64f2f	revbranchcache: populate cache incrementally Previously the cache would populate completely the first time it was accessed. This could take over a minute on larger repos. This patch changes it to update incrementally. Only values that are read will be written, and it will only rewrite as much of the file as strictly necessary. This adds a magic value of '\0\0\0\0' to represent an empty cache entry. The probability of this matching an actual commit hash prefix is tiny, so it's ok if that's always considered a cache miss. This is also BC safe since any existing entries with '\0\0\0\0' will just be considered misses. Perf numbers: Mozilla-central: hg --time log -r 'branch(mobile)' -T. Cold Cache: 14.7s -> 15.1s (3% worse) Warm Cache: 1.6s -> 2.1s (30% worse) Mozilla-central: hg perfbranchmap 2s -> 2.4s (20% worse) hg: hg log -r 'branch(stable) & branch(default)' Cold Cache: 3.1s -> 1.9s (40% better - because the old code missed the cache on both branch() revset iterations, so it did twice the work) Warm Cache: 0.2 -> 0.26 (30% worse) internal huge repo: hg --time log -r 'tip & branch(default)' Cold Cache: 65.4s -> 0.2s (327x better) While this change introduces minor regressions when iterating over every commit in a branch, it massively improves the cold cache time for operations which touch a single commit. I feel the better O() is worth it in this case.	2015-02-10 20:04:47 -08:00
Durham Goode	81ca4a2723	revbranchcache: move entry writing to a separate function This moves the actual writing of entries to the cache to a separate function. This will allow us to use it in multiple places. Ex: in one place we will write dummy entries, and in another place we will write real data.	2015-02-10 20:01:08 -08:00
Durham Goode	23a18a419d	revbranchcache: store repo on the object Previously we would instantiate the revbranchcache with a repo object, use it briefly, then require it be passed in every time we wanted to fetch any information. This seems unnecessary since it's obviously specific to that repo (since it was constructed with it). This patch stores the repo on the revbranchcache object, and removes the repo parameter from the various functions on that class. This has the other nice benefit of removing the double-revbranchcache-read that existed before (it was read once for the branch revset, and once for the repo.revbranchcache).	2015-02-10 19:57:51 -08:00
Durham Goode	9ac6d81ba3	revbranchcache: move out of branchmap onto localrepo Previously the revbranchcache was a field inside the branchmap. This is bad for a couple of reasons: 1) There can be multiple branchmaps per repo (one for each filter level). There can only be one revbranchcache per repo. In fact, a revbranchcache could only exist on a branchmap that was for the unfiltered view, so you could have branchmaps exist for which you couldn't have a revbranchcache. It was funky. 2) The write lifecycle for the revbranchcache is going to be different from the branchmap (branchmap is greedily written early on, revbranchcache should be lazily computed and written). This patch moves the revbranchcache to live as a field on the localrepo (alongside self._branchmap). This will allow us to handle its lifecycle differently, which will let us move it to be lazily computed in future patches.	2015-02-10 19:53:48 -08:00
Matt Mackall	b907416f7b	merge with stable	2015-03-02 01:20:14 -06:00
Mads Kiilerich	56207b4242	revisionbranchcache: fall back to slow path if starting readonly (issue4531) Transitioning to Mercurial versions with revision branch cache could be slow as long as all operations were readonly (revset queries) and the cache would be populated but not written back. Instead, fall back to using the consistently slow path when readonly and the cache doesn't exist yet. That avoids the overhead of populating the cache without writing it back. If not readonly, it will still populate all missing entries initially. That avoids repeated writing of the cache file with small updates, and it also makes sure a fully populated cache is available for the readonly operations.	2015-02-06 02:52:10 +01:00
Angel Ezquerra	88cbab7845	localrepo: remove all external users of localrepo.opener This change touches every module in which repository.opener was being used, and changes it for the equivalent repository.vfs. This is meant to make it easier to split the repository.vfs into several separate vfs. It should now be possible to remove localrepo.opener.	2015-01-15 23:17:12 +01:00
Mads Kiilerich	bc7c34a53f	branchcache: make _rbcrevslen handling more safe self._rbcrevslen is used to keep track of the number of good records on disk. It should thus not be updated before the records actually have been written to disk.	2015-01-14 01:15:26 +01:00
Mads Kiilerich	49a09d3e6c	branchcache: add debug output whenever cache files use truncate The cache files are usually append only but will automatically be truncated and recovered in exceptional situations. Add a debug notice when such exceptional situations are encountered.	2015-01-14 01:15:26 +01:00
Matt Harbison	9825da1159	branchmap: add seek() to end of file before calling tell() on append open() This is similar to 5274228efcdc, which was subsequently modified in dd809b0d9714 for 2.4. Unexpected test changes on Windows occurred without this.	2015-01-10 12:00:03 -05:00
Mads Kiilerich	835157e77d	branchmap: use revbranchcache when updating branch map The revbranchcache is read on demand before it will be used for updating the branch map. It is written back when the branchmap is written and it will thus use the same locking as branchmap. The revbranchcache instance is short-lived; it is only stored in the branchmap from the time .update() is invoked until .write() is invoked. Branchmap already assumes that the repo is locked in that case. The use of revbranchcache for branch map updates will make sure that the revbranchcache "always" is kept up-to-date. The perfbranchmap benchmark is somewhat bogus, especially when we can see that the caching makes a significant difference between the realistic case of a first run and the rare case of rerunning it with a full cache. Here are some 'base' numbers on mozilla-central: Before: ! wall 6.912745 comb 6.910000 user 6.840000 sys 0.070000 (best of 3) After - initial, cache is empty: ! wall 7.792569 comb 7.790000 user 7.720000 sys 0.070000 (best of 3) After - cache is full: ! wall 0.879688 comb 0.880000 user 0.870000 sys 0.010000 (best of 4) The overhead when running with empty cache comes from checking, missing and updating it every time. Most of the performance improvement comes from not having to extract the branch info from the changelog. The last doubling of performance comes from no longer having to convert all branch names to local encoding but reuse the few already converted branch names. On the hg repo: Before: ! wall 0.715703 comb 0.710000 user 0.710000 sys 0.000000 (best of 14) After: ! wall 0.105489 comb 0.110000 user 0.110000 sys 0.000000 (best of 87)	2015-01-08 00:01:03 +01:00
Mads Kiilerich	1b3892318f	branchcache: introduce revbranchcache for caching of revision branch names It is expensive to retrieve the branch name of a revision. Very expensive when creating a changectx and calling .branch() every time - slightly less when using changelog.branchinfo(). Now, to speed things up, provide a way to cache the results on disk in an efficient format. Each branchname is assigned a number, and for each revision we store the number of the corresponding branch name. The branch names are stored in a dedicated file which is strictly append only. Branch names are usually reused across several revisions, and the total list of branch names will thus be so small that it is feasible to read the whole set of names before using the cache. It does, however, mean that it might be more efficient to use the changelog for retrieving the branch info for a single revision. The revision entries are stored in another file. This file is usually append only, but if the repository has been modified, the file will be truncated and the relevant parts rewritten on demand. The entries for each revision are 8 bytes each, and the whole revision file will thus be 1/8 of 00changelog.i. Each revision entry contains the first 4 bytes of the corresponding node hash. This is used as a check sum that always is verified before the entry is used. That check is relatively expensive but it makes sure history modification is detected and handled correctly. It will also detect and handle most revision file corruptions. This is just a cache. A new format can always be introduced if other requirements or ideas make that seem like a good idea. Rebuilding the cache is not really more expensive than it was to run for example 'hg log -b branchname' before this cache was introduced. This new method is still unused but promises to make some operations several times faster once it actually is used. 
Abandoning Python 2.4 would make it possible to implement this more efficiently by using struct classes and pack_into. The Python code could probably also be micro optimized or it could be implemented very efficiently in C where it would be easy to control the data access.	2015-01-08 00:01:03 +01:00
Matt Harbison	3b17299e61	branchmap: backout 03f077311ea1 This is no longer needed now that posixfile handles seeking to EOF when it opens a file in append mode.	2015-01-31 12:42:05 -05:00
Pierre-Yves David	5b2a89c72f	branchmap: pre-filter topological heads before ancestors based filtering We know that topological heads will not be ancestors of anything, so we filter them out to potentially reduce the range of the ancestors computation. On a strongly headed repo this gives a humble speedup: from 0.1984 to 0.1629	2014-08-30 12:33:12 +02:00
Pierre-Yves David	fa154e4139	branchmap: issue a single call to `ancestors` for all heads There is no reason to make multiple calls. This provides a massive speedup for repos with a lot of heads. On a strongly headed repo this gives a humble speedup in the simple case: from 8.1097 to 5.1051 And a massive speedup in the other case: from 7.8787 to 0.1984	2014-08-30 12:20:50 +02:00
Matt Mackall	7cba48bf37	whitespace: nuke triple blank lines in **.py	2014-08-07 14:58:12 -05:00
Matt Mackall	9e74ea490c	branchmap: don't use ui.warn for debug message	2014-06-23 13:50:44 -05:00
Matt Mackall	561e39d121	branch: add debug message for branch cache write failure	2014-06-23 13:46:42 -05:00
Gregory Szorc	d620dd5f0f	branchmap: log events related to branch cache The blackbox log will now contain log events when the branch caches are updated and written.	2014-03-22 17:14:37 -07:00
Pierre-Yves David	f0e0234ea1	branchmap: use set for update code We are doing membership tests and subtraction. The new code is marginally faster.	2014-01-06 15:19:31 -08:00
Pierre-Yves David	fc97641ca7	branchmap: simplify update code We drop iterrevs which are not needed anymore. The known heads are never a descendant of the updated set. It was possible with the old strip code. This simplification makes the code easier to read and update.	2014-01-06 14:26:49 -08:00
Pierre-Yves David	7641136888	branchmap: stop useless rev -> node -> rev round trip We never use the node of new revisions except in the very specific case of closed heads. So we can just use the revision number. This gives another handful of percent speedup.	2014-01-03 16:44:23 -08:00
Pierre-Yves David	4a3c0bd301	branchmap: stop membership test in update logic Now that no user tries to update the cache on a truncated repo we can drop the extra lookup. This gives a handful of percent speedup on big branchmap updates.	2013-01-15 20:04:12 +01:00
Pierre-Yves David	f4aa184d64	branchmap: remove silly line break The line fits in the 80-character limit without it. It is even shorter without it.	2014-01-03 17:06:07 -08:00
Mads Kiilerich	1428b73c06	help: branch names primarily denote the tipmost unclosed branch head Was the behavior correct and the description wrong so it should be updated as in this patch? Or should the code work as the documentation says? Both ways could make some sense ... but neither of them is obvious in all cases. One place where it currently causes problems is when the current revision has another branch head that is closer to tip but closed. 'hg rebase' refuses to rebase to that as it only sees the tip-most unclosed branch head which is the current revision. /me kind of likes named branches, but not so much how branch closing works ...	2013-11-21 15:17:18 -05:00
Brodie Rao	34af0d72ea	branchmap: introduce iterbranches() method	2013-09-16 01:08:29 -07:00
Brodie Rao	f0a5d60210	branchmap: introduce branchheads() method	2013-09-16 01:08:29 -07:00
Brodie Rao	b2b08444eb	branchmap: introduce branchtip() method	2013-09-16 01:08:29 -07:00
Brodie Rao	a446720e09	branchmap: cache open/closed branch head information This lets us determine the open/closed state of a branch without reading from the changelog (which can be costly over NFS and/or with many branches).	2013-09-16 01:08:29 -07:00
Brodie Rao	b8ea796521	branchmap: add documentation on the branchcache on-disk format	2013-11-15 23:18:08 -05:00
Augie Fackler	2859ed15ec	subsettable: move from repoview to branchmap, the only place it's used This is a step towards breaking an import cycle between revset and repoview. Import cycles happened to work in Python 2 with implicit relative imports, but break on Python 3 when we start using explicit relative imports via 2to3 rewrite rules.	2013-11-06 14:38:34 -05:00
Pierre-Yves David	b5bc8d504a	branchmap: stop looking for stripped branch Since repoview in 2.5 we do not make a special call to `branchmap` when stripping. We just recompute the branchmap from a lower subset that still has a valid branchmap. So I'm dropping this dead code.	2013-09-30 17:42:38 +02:00
Pierre-Yves David	64005d41eb	branchmap: remove the droppednodes logic It was unused. Note how it is only extended if the list is empty. So it's always empty at the end. We could try to fix that, however this part of the code is to be removed in the next changeset as we do not run `branchmap` on truncated repos since `repoview` in 2.5.	2013-09-30 17:31:39 +02:00
Pierre-Yves David	a58c8b0406	branchmap: fix blank line position The blank line was after the `if` condition instead of before.	2013-09-30 15:52:37 +02:00
Mads Kiilerich	5787baee50	spelling: fix some minor issues found by spell checker	2013-02-10 18:24:29 +01:00
Pierre-Yves David	e0122c56e3	branchmap: display filtername when `updatebranch` fails to do its job We have a very handy assert at the end of `branchmap.updatecache` that checks the resulting branchmap is actually valid. I know we do not like asserts in mercurial but this one is very handy for debugging. There is really no reason for `branchmap.updatecache` to have this kind of issue but this happened a handful of times during the development of this or the introduction of other related features. I advise keeping it around until we are a bit more confident with the new code.	2013-01-19 02:29:56 +01:00
Mads Kiilerich	641f4b8a6c	localrepo: store branchheads sorted	2013-01-15 02:59:12 +01:00
Pierre-Yves David	77db5734c1	branchmap: Save changectx creation during update The newly introduced `branchmap` function allows us to skip the creation of changectx objects. This speeds up the construction of the branchmap. On the mozilla repository (117293 changesets, 15490 mutable) Before: ! impactable 19.9 ! mutable 0.576 ! unserved 3.16 After: ! impactable 7.03 (2.8x faster) ! mutable 0.352 (1.6x) ! unserved 1.15 (2.7x) On the cpython repository (81418 changesets, 6418 mutable) Before: ! impactable 15.9 ! mutable 0.451 ! unserved 0.861 After: ! impactable 6.55 (2.4x faster) ! mutable 0.170 (2.6x faster) ! unserved 0.289 (2.9x faster) On the pypy repository (58852 changesets) Before: ! impactable 13.6 After: ! impactable 6.17 (2.2x faster) On my Mercurial repository (18295 changesets, 2210 mutable) Before: ! impactable 23.9 ! mutable 0.368 ! unserved 0.057 After: ! impactable 1.31 (18x faster) ! mutable 0.042 (8.7x) ! unserved 0.025 (2.2x)	2013-01-11 18:47:42 +01:00
Pierre-Yves David	4bd2fce08b	branchmap: pass revision instead of changectx to the update function Creation of changectx objects is very slow, and they are not very useful. We are going to drop them. The first step is to change the function argument type.	2013-01-08 01:28:39 +01:00
Pierre-Yves David	01b68ae973	branchmap: allow to use cache of subset Filtered repositories are subsets of the unfiltered repository. This means that a filtered branchmap could be used to compute the unfiltered version. And filtered versions happen to be subsets of each other: - "all() - unserved()" is a subset of "all() - hidden()" - "all() - hidden()" is a subset of "all()" This means that a branchmap with the "unserved" filter can be used as a base for the "hidden" branchmap, which itself could be used as a base for the unfiltered branchmap. unserved < hidden < None This changeset implements this mechanism. If the on disk branchcache is not valid we use the branchcache of the nearest subset as base instead of computing it from scratch. Such fallback can be cascaded multiple times if necessary. Note that both the "hidden" and "unserved" sets are a bit volatile. We will add more stable filtering in the next changesets. This changeset enables collaboration between no filtering and "unserved" filtering. Fixing performance regression introduced by 7bff5f37cb97	2013-01-07 17:23:25 +01:00
Pierre-Yves David	7b8d884b29	branchmap: add a copy method If we want branchcaches of different filters to collaborate, they need a simple way to copy each other. This will ensure that each filter level has no side effect on the caches of other filter levels.	2013-01-02 01:40:42 +01:00
Pierre-Yves David	ea8f599221	branchmap: drop `_cacheabletip` usage in `updatecache` Nobody overwrites `_cacheabletip` any more. We always update the cache for the whole repo and write it to disk (or at least try to). The `updatecache` code is simplified to remove the double phase logic associated with _cacheabletip.	2013-01-04 01:25:55 +01:00
Pierre-Yves David	e5d81232c2	branchmap: ignore Abort error while writing cache Read only vfs can now raise Abort exception. Note that encoding.local are also a possible raiser.	2013-01-04 04:52:57 +01:00
Pierre-Yves David	f01949c09e	branchmap: read return None in case of failure This makes a clear distinction between having read a valid cache on disk or not. This will help caches of various filtering level to collaborate.	2012-12-22 19:41:11 +01:00
Pierre-Yves David	0cd9115520	branchmap: enable caching for filtered version too The `_branchcache` attribute is turned into a dictionary. Keys are filter names and values are `branchcache` objects. The unfiltered version is cached under the `None` filter. The attribute is renamed to `_branchcaches` to avoid confusion with the previous one. Both old and new contents are dictionaries even if their contents are different. I prefer possible extension code to crash right away instead of just messing with the wrong dictionary. As all the different caches work in isolation from each other, this code keeps the previous behavior of using the unfiltered cache when nothing is filtered. This is a cheap way to have caches collaborate and nullify potential impact in the default case.	2012-12-24 03:21:15 +01:00
Pierre-Yves David	daf9851247	branchmap: report filtername when read fails Now that we can have multiple ones, we need to know which filecache failed to be read from disk.	2013-01-01 21:27:13 +01:00
Pierre-Yves David	256c2dfbf0	branchmap: use a different file name for filtered view of repo	2012-12-24 03:06:03 +01:00
Pierre-Yves David	4ad32b10f3	branchmap: move the cache file name into a dedicated function Filtered views of the repo will want to write their cache under a different file name.	2012-12-24 03:04:12 +01:00


68 Commits
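
The record layout described in 1b3892318f and the empty-entry marker from f333b64f2f can be illustrated with a minimal Python sketch. It is not the Mercurial implementation. The commit messages give only the 8-byte record size, the 4-byte node-hash prefix used as a checksum, the per-revision branch number, and the '\0\0\0\0' marker. The field order, the big-endian 4-byte branch index, and the names write_entry and read_entry are assumptions made for illustration.

```python
import struct

# Assumed layout (not confirmed by the log): 4-byte node-hash prefix
# followed by a 4-byte big-endian branch-name index, 8 bytes in total.
RECORD = struct.Struct(">4sI")
EMPTY_PREFIX = b"\0\0\0\0"  # unpopulated entry, always treated as a miss


def write_entry(revs, rev, node, branch_idx):
    """Store the record for `rev`, zero-padding any gap before it."""
    needed = (rev + 1) * RECORD.size
    if len(revs) < needed:
        revs.extend(b"\0" * (needed - len(revs)))
    RECORD.pack_into(revs, rev * RECORD.size, node[:4], branch_idx)


def read_entry(revs, rev, node, names):
    """Return the cached branch name for `rev`, or None on a cache miss."""
    offset = rev * RECORD.size
    if offset + RECORD.size > len(revs):
        return None  # revision is beyond the end of the cache
    prefix, idx = RECORD.unpack_from(revs, offset)
    if prefix == EMPTY_PREFIX or prefix != node[:4]:
        return None  # never populated, or history was rewritten
    if idx >= len(names):
        return None  # index points outside the branch-name list
    return names[idx]


names = ["default", "stable"]           # stands in for the names file
revs = bytearray()                      # stands in for the revision file
node = b"\x12\x34\x56\x78" + b"\0" * 16  # made-up 20-byte node hash
write_entry(revs, 2, node, 1)
hit = read_entry(revs, 2, node, names)
gap = read_entry(revs, 0, node, names)
stale = read_entry(revs, 2, b"\xff" * 20, names)
```

Writing only revision 2 leaves revisions 0 and 1 as zero-filled records, so they read back as misses, which mirrors the incremental population described in f333b64f2f. A lookup with a different node hash for the same revision also misses, which is how the prefix check detects history modification.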