sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-09 08:18:15 +03:00

Author	SHA1	Message	Date
Durham Goode	2e0dd44c19	branchmap: handle nullrev in setcachedata 906be86990 recently changed to switch from: self._rbcrevs[rbcrevidx:rbcrevidx + _rbcrecsize] = rec to pack_into(_rbcrecfmt, self._rbcrevs, rbcrevidx, node, branchidx) This causes an exception if rbcrevidx is -1 (i.e. the nullrev). The old code handled this because python handles out of bound sets to arrays gracefully. The new code throws because the self._rbcrevs buffer isn't long enough to write 8 bytes to. Normally it would've been resized by the immediately preceding line, but because the 0 length buffer is greater than the idx (-1) times the size, no resize happens. Setting the branch for the nullrev doesn't make sense anyway, so let's skip it. This was caught by external tests in the Facebook extensions repo, but I've added a test here that catches the issue.	2017-03-15 15:48:57 -07:00
Augie Fackler	38e6574e36	branchmap: fix python 2.6 by using util.buffer() instead of passing bytearray	2017-03-12 19:47:51 -04:00
Mads Kiilerich	d97e14e32b	rbc: empty (and invalid) rbc-names file should give an empty name list An empty file (if it somehow should exist) used to give a list with an empty name. That didn't do any harm, but it was "wrong". Fix that.	2017-03-12 12:17:30 -07:00
Mads Kiilerich	d6292de3bd	rbc: use struct unpack_from and pack_into instead of unpack and pack These functions were introduced in Python 2.5 and are faster and simpler than the old ones ... mainly because we can avoid intermediate buffers: $ python -m timeit -s "_rbcrecfmt='>4sI'" -s 's = "x"*10000' -s 'from struct import unpack' 'unpack(_rbcrecfmt, buffer(s, 16, 8))' 1000000 loops, best of 3: 0.543 usec per loop $ python -m timeit -s "_rbcrecfmt='>4sI'" -s 's = "x"*10000' -s 'from struct import unpack_from' 'unpack_from(_rbcrecfmt, s, 16)' 1000000 loops, best of 3: 0.323 usec per loop $ python -m timeit -s "from array import array" -s "_rbcrecfmt='>4sI'" -s "s = array('c')" -s 's.fromstring("x"*10000)' -s 'from struct import pack' -s "rec = array('c')" 'rec.fromstring(pack(_rbcrecfmt, "asdf", 7))' 1000000 loops, best of 3: 0.364 usec per loop $ python -m timeit -s "from array import array" -s "_rbcrecfmt='>4sI'" -s "s = array('c')" -s 's.fromstring("x"*10000)' -s 'from struct import pack_into' -s "rec = array('c')" -s 'rec.fromstring("x"*100)' 'pack_into(_rbcrecfmt, rec, 0, "asdf", 7)' 1000000 loops, best of 3: 0.229 usec per loop	2016-10-19 02:46:35 +02:00
Yuya Nishihara	bec7ade60c	py3: drop unused aliases to array.array which are replaced with bytearray	2017-03-12 11:47:02 -07:00
Augie Fackler	9c70a09b17	branchmap: stringify int in a portable way We actually need a bytes in Python 3, and thanks to our nasty source loader this will portably do the right thing.	2017-03-12 00:42:46 -05:00
Augie Fackler	0c31289213	branchmap: don't use buffer() on Python 3 This is certainly slower than the Python 2 code, but it works, and we can revisit it later if it's a problem.	2017-03-12 00:49:19 -05:00
Augie Fackler	9a15a28705	py3: use bytearray() instead of array('c', ...) constructions Portable from 2.6-3.6.	2017-03-12 03:32:21 -04:00
Simon Farnsworth	e0b70e4f7f	mercurial: switch to util.timer for all interval timings util.timer is now the best available interval timer, at the expense of not having a known epoch. Let's use it whenever the epoch is irrelevant.	2017-02-15 13:17:39 -08:00
Pierre-Yves David	d7ae979f6e	branchmap: remove extra indent This cleans up the rest of the previous changeset.	2016-08-05 15:01:16 +02:00
Pierre-Yves David	35ede6eab9	branchmap: simplify error handling when writing rev branch cache Now that we have a general try except, we can move the error handling from the individual writes into it. Code will be reindented in the next changeset to help with readability.	2016-08-05 15:00:53 +02:00
Pierre-Yves David	bcc53027a6	branchmap: acquire lock before writing the rev branch cache We now attempt to acquire a lock and write the branch cache within that lock. This would prevent cache corruption when multiple processes try to write the cache at the same time.	2016-08-05 14:57:16 +02:00
Pierre-Yves David	b236d4e99c	branchmap: preparatory indent of the branch rev writing code The rev branch cache is written without a lock; we are going to fix this, but we indent the code beforehand to make the next changeset clearer.	2016-08-05 14:54:46 +02:00
Mads Kiilerich	ffd590bdea	rbc: fix superfluous rebuilding from scratch - don't abuse self._rbcnamescount The code used self._rbcnamescount as if it was the length of self._names ... but actually it is just the number of good entries on disk. This caused the cache to be populated inefficiently. In some cases very inefficiently. Instead of checking the length before lookup, just try a lookup in self._names - that is also in most cases faster. Comments and debug messages are tweaked to help understanding the issue and the fix.	2016-07-18 22:25:09 +02:00
Mads Kiilerich	9d89999d53	rbc: fix invalid rbc-revs entries caused by missing cache growth It was in some cases possible to end up writing to the cache file without growing it first. The range assignment in _setcachedata would append instead of writing at the requested position and thus write the new record in the wrong place. To fix this, we avoid looking up in too small caches, and when growing the cache, do it right before writing the new record to it so we know it has been done correctly.	2016-07-18 22:22:38 +02:00
Gregory Szorc	c98d59c36a	branchmap: remove unused exception variable	2016-03-12 16:08:19 -08:00
Mads Kiilerich	a4030fe6b2	cache: rebuild branch cache from scratch when inconsistencies are detected This should recover automatically from some corruptions that for unknown reasons are seen in the wild.	2016-03-13 02:06:23 +01:00
Mads Kiilerich	3f4cf91261	cache: safer handling of failing seek when writing revision branch cache If the seek for some reason fails (perhaps because the file is too short to search to the requested position), make sure we seek to the start and rewrite everything. It is unknown if this fixes a real problem that ever happened.	2016-03-13 02:06:22 +01:00
Mads Kiilerich	de2745150d	cache: remove branch revision file before rewriting the branch name file New branch names are usually appended to the branch name file. If that fails or the file has been modified by another process, it is rewritten. That left a small opportunity that there could be references to non-existent entries in the file while it was rewritten. To avoid that, remove the revision branch cache file with the references to the branch name file before rewriting the branch name file. Worst case, when interrupted at the wrong time, the cache will be lost and rebuilt next time. It is unknown if this fixes a real problem that ever happened.	2016-03-13 02:06:21 +01:00
Durham Goode	ccda03e5b2	branchmap: check node against changelog instead of repo Testing 'node in repo' requires constructing a changectx, which is a little expensive. Testing 'repo.changelog.hasnode(node)' is notably faster. This saves 10-20ms off of every command, when testing a few thousand nodes from the branch cache. I considered changing the implementation of localrepository.__contains__ so every place would benefit from the change, but since localrepository.__contains__ uses changectx to check if the commit exists, it means it supports a wider range of possible inputs (like revs, hashes, '.', etc), so it seemed unnecessarily risky.	2016-03-07 17:26:47 -08:00
Pierre-Yves David	30913031d4	error: get Abort from 'error' instead of 'util' The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be confused about that and gives all the credit to 'util' instead of the hardworking 'error'. In a spirit of equity, we break the cycle of injustice and give back to 'error' the respect it deserves. And screw that 'util' poser. For great justice.	2015-10-08 12:55:45 -07:00
Gregory Szorc	eeba469be5	branchmap: move branch cache code out of streamclone.py This is low-level branch map and cache manipulation code. It deserves to live next to similar code in branchmap.py. Moving it also paves the road for multiple consumers, such as a bundle2 part handler that receives branch mappings from a remote. This is largely a mechanical move, with only variable names and indentation being changed.	2015-10-03 09:53:56 -07:00
Gregory Szorc	093007295f	branchmap: use absolute_import	2015-08-07 19:51:55 -07:00
Gregory Szorc	5380dea2a7	global: mass rewrite to use modern exception syntax Python 2.6 introduced the "except type as instance" syntax, replacing the "except type, instance" syntax that came before. Python 3 dropped support for the latter syntax. Since we no longer support Python 2.4 or 2.5, we have no need to continue supporting the "except type, instance". This patch mass rewrites the exception syntax to be Python 2.6+ and Python 3 compatible. This patch was produced by running `2to3 -f except -w -n .`.	2015-06-23 22:20:08 -07:00
Yuya Nishihara	3ca8d79bef	revbranchcache: return uncached branchinfo for nullrev (issue4683) This fixes the crash caused by "branch(null)" revset. No cache should be necessary for nullrev because changelog.branchinfo(nullrev) does not involve IO operation. Note that the problem of "branch(wdir())" isn't addressed by this patch. "wdir()" will raise TypeError in many places because of None. This is the reason why "wdir()" is still experimental.	2015-05-23 11:14:00 +09:00
Mads Kiilerich	f11e1d55b9	branchcache: stay silent if failing to read cache files The warning has in some cases incorrectly attributed unrelated problems to rbc. Instead, just do like the branch head cache does and stay quiet when reading fails. The cache will be missing the first time a repo is used. It is a normal situation and there is no reason to make a note of that.	2015-01-14 01:15:26 +01:00
Gregory Szorc	5b06fcaa29	repoview: move function for computing filtered hash An upcoming patch will establish per-filter tags caches. We'll want to use the same cache validation logic as the branch cache. Prepare for that by moving the logic for computing a filtered view hash to somewhere central.	2015-04-01 18:43:29 -07:00
Durham Goode	e0b3f09a3b	revbranchcache: write cache even during read operations Previously we would only actually write the revbranchcache to disk if we were in the middle of a write operation (like commit). Now we will also write it during any read operation. The cache knows how to invalidate itself, so it shouldn't become corrupt if multiple writers try at once (and the write-on-read behavior/risk is the same as all our other caches).	2015-02-24 18:43:31 -08:00
Durham Goode	9d9fb7f2ee	revbranchcache: move cache writing to the transaction finalizer Instead of writing the revbranchcache during updatecache (which often happens too early, before the cache is even populated), let's run it as part of the transaction finalizer. It still won't be written for read-only operations, but that's no worse than it is today. A future commit will remove the actual write that happens in updatecache(). This is also good prep for when all caches get moved into the transaction.	2015-02-10 20:06:12 -08:00
Durham Goode	f333b64f2f	revbranchcache: populate cache incrementally Previously the cache would populate completely the first time it was accessed. This could take over a minute on larger repos. This patch changes it to update incrementally. Only values that are read will be written, and it will only rewrite as much of the file as strictly necessary. This adds a magic value of '\0\0\0\0' to represent an empty cache entry. The probability of this matching an actual commit hash prefix is tiny, so it's ok if that's always considered a cache miss. This is also BC safe since any existing entries with '\0\0\0\0' will just be considered misses. Perf numbers: Mozilla-central: hg --time log -r 'branch(mobile)' -T. Cold Cache: 14.7s -> 15.1s (3% worse) Warm Cache: 1.6s -> 2.1s (30% worse) Mozilla-central: hg perfbranchmap 2s -> 2.4s (20% worse) hg: hg log -r 'branch(stable) & branch(default)' Cold Cache: 3.1s -> 1.9s (40% better - because the old code missed the cache on both branch() revset iterations, so it did twice the work) Warm Cache: 0.2 -> 0.26 (30% worse) internal huge repo: hg --time log -r 'tip & branch(default)' Cold Cache: 65.4s -> 0.2s (327x better) While this change introduces minor regressions when iterating over every commit in a branch, it massively improves the cold cache time for operations which touch a single commit. I feel the better O() is worth it in this case.	2015-02-10 20:04:47 -08:00
Durham Goode	81ca4a2723	revbranchcache: move entry writing to a separate function This moves the actual writing of entries to the cache to a separate function. This will allow us to use it in multiple places. Ex: in one place we will write dummy entries, and in another place we will write real data.	2015-02-10 20:01:08 -08:00
Durham Goode	23a18a419d	revbranchcache: store repo on the object Previously we would instantiate the revbranchcache with a repo object, use it briefly, then require it be passed in every time we wanted to fetch any information. This seems unnecessary since it's obviously specific to that repo (since it was constructed with it). This patch stores the repo on the revbranchcache object, and removes the repo parameter from the various functions on that class. This has the other nice benefit of removing the double-revbranchcache-read that existed before (it was read once for the branch revset, and once for the repo.revbranchcache).	2015-02-10 19:57:51 -08:00
Durham Goode	9ac6d81ba3	revbranchcache: move out of branchmap onto localrepo Previously the revbranchcache was a field inside the branchmap. This is bad for a couple of reasons: 1) There can be multiple branchmaps per repo (one for each filter level). There can only be one revbranchcache per repo. In fact, a revbranchcache could only exist on a branchmap that was for the unfiltered view, so you could have branchmaps exist for which you couldn't have a revbranchcache. It was funky. 2) The write lifecycle for the revbranchcache is going to be different from the branchmap (branchmap is greedily written early on, revbranchcache should be lazily computed and written). This patch moves the revbranchcache to live as a field on the localrepo (alongside self._branchmap). This will allow us to handle its lifecycle differently, which will let us move it to be lazily computed in future patches.	2015-02-10 19:53:48 -08:00
Matt Mackall	b907416f7b	merge with stable	2015-03-02 01:20:14 -06:00
Mads Kiilerich	56207b4242	revisionbranchcache: fall back to slow path if starting readonly (issue4531) Transitioning to Mercurial versions with revision branch cache could be slow as long as all operations were readonly (revset queries) and the cache would be populated but not written back. Instead, fall back to using the consistently slow path when readonly and the cache doesn't exist yet. That avoids the overhead of populating the cache without writing it back. If not readonly, it will still populate all missing entries initially. That avoids repeated writing of the cache file with small updates, and it also makes sure a fully populated cache is available for the readonly operations.	2015-02-06 02:52:10 +01:00
Angel Ezquerra	88cbab7845	localrepo: remove all external users of localrepo.opener This change touches every module in which repository.opener was being used, and changes it for the equivalent repository.vfs. This is meant to make it easier to split the repository.vfs into several separate vfs. It should now be possible to remove localrepo.opener.	2015-01-15 23:17:12 +01:00
Mads Kiilerich	bc7c34a53f	branchcache: make _rbcrevslen handling more safe self._rbcrevslen is used to keep track of the number of good records on disk. It should thus not be updated before the records actually have been written to disk.	2015-01-14 01:15:26 +01:00
Mads Kiilerich	49a09d3e6c	branchcache: add debug output whenever cache files use truncate The cache files are usually append only but will automatically be truncated and recovered in exceptional situations. Add a debug notice when such exceptional situations are encountered.	2015-01-14 01:15:26 +01:00
Matt Harbison	9825da1159	branchmap: add seek() to end of file before calling tell() on append open() This is similar to 5274228efcdc, which was subsequently modified in dd809b0d9714 for 2.4. Unexpected test changes on Windows occurred without this.	2015-01-10 12:00:03 -05:00
Mads Kiilerich	835157e77d	branchmap: use revbranchcache when updating branch map The revbranchcache is read on demand before it will be used for updating the branch map. It is written back when the branchmap is written and it will thus use the same locking as branchmap. The revbranchcache instance is short-lived; it is only stored in the branchmap from the time .update() is invoked until .write() is invoked. Branchmap already assumes that the repo is locked in that case. The use of revbranchcache for branch map updates will make sure that the revbranchcache is "always" kept up-to-date. The perfbranchmap benchmark is somewhat bogus, especially when we can see that the caching makes a significant difference between the realistic case of a first run and the rare case of rerunning it with a full cache. Here are some 'base' numbers on mozilla-central: Before: ! wall 6.912745 comb 6.910000 user 6.840000 sys 0.070000 (best of 3) After - initial, cache is empty: ! wall 7.792569 comb 7.790000 user 7.720000 sys 0.070000 (best of 3) After - cache is full: ! wall 0.879688 comb 0.880000 user 0.870000 sys 0.010000 (best of 4) The overhead when running with an empty cache comes from checking, missing and updating it every time. Most of the performance improvement comes from not having to extract the branch info from the changelog. The last doubling of performance comes from no longer having to convert all branch names to local encoding but reusing the few already converted branch names. On the hg repo: Before: ! wall 0.715703 comb 0.710000 user 0.710000 sys 0.000000 (best of 14) After: ! wall 0.105489 comb 0.110000 user 0.110000 sys 0.000000 (best of 87)	2015-01-08 00:01:03 +01:00
Mads Kiilerich	1b3892318f	branchcache: introduce revbranchcache for caching of revision branch names It is expensive to retrieve the branch name of a revision. Very expensive when creating a changectx and calling .branch() every time - slightly less when using changelog.branchinfo(). Now, to speed things up, provide a way to cache the results on disk in an efficient format. Each branchname is assigned a number, and for each revision we store the number of the corresponding branch name. The branch names are stored in a dedicated file which is strictly append only. Branch names are usually reused across several revisions, and the total list of branch names will thus be so small that it is feasible to read the whole set of names before using the cache. It does, however, mean that it might be more efficient to use the changelog for retrieving the branch info for a single revision. The revision entries are stored in another file. This file is usually append only, but if the repository has been modified, the file will be truncated and the relevant parts rewritten on demand. Each revision entry is 8 bytes, so the whole revision file will be 1/8 the size of 00changelog.i. Each revision entry contains the first 4 bytes of the corresponding node hash. This is used as a checksum that is always verified before the entry is used. That check is relatively expensive but it makes sure history modification is detected and handled correctly. It will also detect and handle most revision file corruptions. This is just a cache. A new format can always be introduced if other requirements or ideas make that seem like a good idea. Rebuilding the cache is not really more expensive than it was to run for example 'hg log -b branchname' before this cache was introduced. This new method is still unused but promises to make some operations several times faster once it actually is used. Abandoning Python 2.4 would make it possible to implement this more efficiently by using struct classes and pack_into. The Python code could probably also be micro optimized or it could be implemented very efficiently in C where it would be easy to control the data access.	2015-01-08 00:01:03 +01:00
Matt Harbison	3b17299e61	branchmap: backout 03f077311ea1 This is no longer needed now that posixfile handles seeking to EOF when it opens a file in append mode.	2015-01-31 12:42:05 -05:00
Pierre-Yves David	5b2a89c72f	branchmap: pre-filter topological heads before ancestors based filtering We know that topological heads will not be ancestors of anything, so we filter them out to potentially reduce the range of the ancestors computation. On a strongly headed repo this gives a humble speedup: from 0.1984 to 0.1629	2014-08-30 12:33:12 +02:00
Pierre-Yves David	fa154e4139	branchmap: issue a single call to `ancestors` for all heads There is no reason to make multiple calls. This provides a massive speedup for repos with a lot of heads. On a strongly headed repo this gives a humble speedup in the simple case: from 8.1097 to 5.1051 And a massive speedup in the other case: from 7.8787 to 0.1984	2014-08-30 12:20:50 +02:00
Matt Mackall	7cba48bf37	whitespace: nuke triple blank lines in **.py	2014-08-07 14:58:12 -05:00
Matt Mackall	9e74ea490c	branchmap: don't use ui.warn for debug message	2014-06-23 13:50:44 -05:00
Matt Mackall	561e39d121	branch: add debug message for branch cache write failure	2014-06-23 13:46:42 -05:00
Gregory Szorc	d620dd5f0f	branchmap: log events related to branch cache The blackbox log will now contain log events when the branch caches are updated and written.	2014-03-22 17:14:37 -07:00
Pierre-Yves David	f0e0234ea1	branchmap: use set for update code We are doing membership tests and subtraction. The new code is marginally faster.	2014-01-06 15:19:31 -08:00
Pierre-Yves David	fc97641ca7	branchmap: simplify update code We drop iterrevs which are not needed anymore. The known heads are never descendants of the updated set. It was possible with the old strip code. This simplification makes the code easier to read and update.	2014-01-06 14:26:49 -08:00

95 Commits
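The two 2014-08-30 branchmap commits (fa154e4139 and 5b2a89c72f) replace per-head ancestor queries with a single walk and bound that walk by skipping topological heads. A minimal sketch of the idea over a toy parent map follows. `ancestors`, `topological_heads` and `prune_branch_heads` are hypothetical stand-ins, not Mercurial's API. The sketch assumes revision numbers are topologically ordered (parents numbered below children), and, as fc97641ca7 states, that previously known heads are never descendants of the newly added revisions.

```python
def ancestors(parents, revs, stoprev=0):
    """Strict ancestors of ``revs`` whose revision number is >= stoprev.

    Assumes parents always have smaller numbers than their children, so once
    a parent falls below stoprev none of its own ancestors can qualify.
    """
    seen = set()
    stack = [p for r in revs for p in parents[r] if p >= stoprev]
    while stack:
        r = stack.pop()
        if r in seen:
            continue
        seen.add(r)
        stack.extend(p for p in parents[r] if p >= stoprev)
    return seen


def topological_heads(parents):
    """Revisions that are nobody's parent (-1 stands for the null parent)."""
    allparents = {p for ps in parents.values() for p in ps}
    return set(parents) - allparents


def prune_branch_heads(parents, oldheads, newrevs, topoheads):
    """Hypothetical helper: recompute one branch's heads after adding newrevs."""
    bheadset = set(oldheads) | set(newrevs)
    # Topological heads cannot be ancestors of anything, so only the other
    # candidates need checking, and the smallest of them bounds the walk.
    uncertain = bheadset - topoheads
    if uncertain:
        floorrev = min(uncertain)
        # One ancestor walk from all new revisions instead of one per head.
        bheadset -= ancestors(parents, newrevs, floorrev)
    return sorted(bheadset)


# Toy history: 0 <- 1 <- 2, plus a second child 3 of revision 1.
parents = {0: [-1], 1: [0], 2: [1], 3: [1]}
print(prune_branch_heads(parents, [1], [2, 3], topological_heads(parents)))
```

Here revision 1 is superseded by both new revisions, so the script prints `[2, 3]`. When all uncertain candidates are recent, `floorrev` is high and the walk never descends into old history, which is where the pre-filter pays off.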