sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-11 09:17:30 +03:00

Author	SHA1	Message	Date
Pierre-Yves David	c05e3eea5d	setdiscovery: drop the 'always' argument to '_updatesample' This argument exists because of the complex code flow in '_takequicksample'. It first gets the list of heads and then calls '_updatesample' on an empty initial sample and a size limit matching the differences between the number of heads and the target sample size. Finally the heads and the sample from '_updatesample' were added. To ensure this addition result had the exact target length, the code had to ensure no elements from the heads were added to the '_updatesample' content and therefore was passing this "always included set of heads". Instead we can just update the initial heads sample directly and use the final target size as target size for the update. This removes the need for this 'always' parameter to the '_updatesample' function. The tests are affected because different set building order results in different random sampling.	2015-01-07 10:32:17 -08:00
Pierre-Yves David	6ff053fa11	setdiscovery: always add exponential sample to the heads As explained in a previous changeset, prioritizing heads too much behaves pathologically when there are more heads than the sample size. To counter this, we always inject exponential samples before reducing to the sample size limit. This already shows some benefit in the tests themselves, but on a real-world example this moves my discovery for push to a pathologically headed repo from 45 rounds to 17 of them. We should maybe ensure that at least 25% of the result sample is heads, but I think the random sampling will be fine in practice.	2015-01-07 17:28:51 -08:00
Pierre-Yves David	60a9cd0334	setdiscovery: directly run '_updatesample' The heads and exponential sample are going to end up in the same set before any extra processing happens. We simplify the code by directly updating a set with heads. Changes in the order the set is built lead to small changes in the random sampling output. But after double checking, I can confirm the input data to the random sampling is consistent.	2015-01-07 17:23:21 -08:00
Pierre-Yves David	252ba1a3c3	setdiscovery: stop using '_setupsample' in '_takefullsample' Very few of the return values of '_setupsample' remain in use, so we directly retrieve the value we care about and drop the '_setupsample' call.	2015-01-07 17:17:56 -08:00
Pierre-Yves David	e3605ecf1f	setdiscovery: randomly pick between heads and sample when taking full sample Before this changeset, the discovery protocol was too heads-centric. Heads of the undiscovered set were always sent for discovery and any room remaining in the sample was filled with exponential samples (and random ones if any room remained). This behaved extremely poorly when the number of heads exceeded the sample size, because we keep just asking about the existence of heads, then their direct parent and so on. As a result, the 'O(log(len(repo)))' discovery turns into a 'O(len(repo))' one. As a solution we take a random sample of the heads plus exponential samples. This way we ensure some exponential sampling is achieved, bringing back some logarithmic convergence of the discovery again. This patch only applies this principle in one place. More places will be updated in future patches. One test is impacted because the random sample happens to be different. By chance, it helps a bit in this case.	2015-01-07 12:09:51 -08:00
Pierre-Yves David	6141054495	setdiscovery: document the '_updatesample' function This function is central in the sample building process, having it documented helps code readability a lot.	2015-01-06 17:02:32 -08:00
Pierre-Yves David	2fde36047b	setdiscovery: avoid calling any sample building if the undecided set is small If the length of undecided is smaller than the sample size, we can just request information for all of them. This conditional was previously handled by '_setupsample'. But '_setupsample' is in my opinion a problematic function with blurry semantics. Having this conditional explicitly earlier makes the code more explicit and moves us closer to removing this '_setupsample' function.	2015-01-06 16:40:33 -08:00
Pierre-Yves David	ef881538c4	setdiscovery: delay sample building calls to gather them in a single place Some of the logic around sample building is duplicated in the sample builders, it would clean things up to extract it in the top function, but this requires all the code to be in the same place. This changeset mostly exists to make the next one more clear.	2015-01-07 09:30:06 -08:00
Pierre-Yves David	ebee9c1c62	setdiscovery: drop unused 'initial' argument for '_takequicksample' There is a single call site, and it is always using 'initial=True'. So we just drop the argument and the associated condition.	2015-01-06 16:32:23 -08:00
Matt Mackall	efd707b6d7	readmarkers: add a SHA256 fixme note	2015-01-11 16:46:13 -06:00
Matt Mackall	1b1c572cac	readmarkers: fast-path single successors and parents This gives about a 5% performance bump.	2015-01-11 16:37:57 -06:00
Matt Mackall	aaefce821d	readmarkers: promote global constants to locals for performance	2015-01-11 15:35:09 -06:00
Matt Mackall	5a2d46743d	readmarkers: drop a temporary	2015-01-11 14:52:57 -06:00
Matt Mackall	f70b8899f1	readmarkers: read node reading into node length conditional This removes some conditional assignments	2015-01-11 14:51:49 -06:00
Matt Mackall	fd24385ccc	readmarkers: drop a temporary Two other temporaries are renamed to fit line-length.	2015-01-11 14:46:55 -06:00
Matt Mackall	54920da7a8	readmarkers: hoist subtraction out of loop comparison	2015-01-11 14:44:57 -06:00
Matt Mackall	ccb9e25cca	readmarkers: streamline offset tracking This minimizes the number of assignments and operations needed to use offsets.	2015-01-11 14:43:31 -06:00
Matt Mackall	4cb9887cb8	readmarkers: use unpacker for fixed header	2015-01-11 14:37:50 -06:00
Matt Mackall	66bb59fd8e	readmarkers: drop metadata temporary	2015-01-11 14:35:03 -06:00
Matt Mackall	eeecc7d717	readmarkers: drop date temporary	2015-01-11 14:33:49 -06:00
Matt Mackall	46c0e8872c	readmarkers: drop another conditional	2015-01-11 14:32:56 -06:00
Matt Mackall	b42ed84f7a	readmarkers: drop a conditional	2015-01-10 21:28:15 -06:00
Matt Mackall	aebd9ab379	readmarkers: add some whitespace	2015-01-10 21:27:29 -06:00
Matt Mackall	00030ac687	readmarkers: combine parent conditionals	2015-01-10 21:25:07 -06:00
Matt Mackall	57b821b575	readmarkers: drop temporary substring assignments Assignments are expensive in inner loops	2015-01-10 21:24:45 -06:00
Matt Mackall	2537cb8fbf	util: introduce unpacker This allows taking advantage of Python 2.5+'s struct.Struct, which provides a slightly faster unpack due to reusing formats. Sadly, .unpack_from is significantly slower.	2015-01-10 21:18:31 -06:00
Mads Kiilerich	61a36ea4fe	revset: use localrepo revbranchcache for branch name filtering Branch name filtering in revsets was expensive. For every rev it created a changectx and called .branch() which retrieved the branch name from the changelog. Instead, use the revbranchcache. The revbranchcache is used read-only. The revset implementation with generators and callbacks makes it hard to figure out when we are done using/updating the cache and could write it back. It would also be 'tricky' to lock the repo for writing from within a revset execution. Finally, the branchmap update will usually make sure that the cache is updated before any revset can be run. The revbranchcache is used without any locking but is short-lived and used in a tight loop where we can assume that the changelog doesn't change ... or where it is not relevant to us if it does. perfrevset 'branch(mobile)' on mozilla-central. Before: ! wall 10.989637 comb 10.970000 user 10.940000 sys 0.030000 (best of 3) After, no cache: ! wall 7.368656 comb 7.370000 user 7.360000 sys 0.010000 (best of 3) After, with cache: ! wall 0.528098 comb 0.530000 user 0.530000 sys 0.000000 (best of 18) The performance improvement even without cache comes from being based on branchinfo on the changelog instead of using ctx.branch(). Some tests are added to verify that the revbranchcache works and keep an eye on when the cache files actually are updated.	2015-01-08 00:01:03 +01:00
Mads Kiilerich	835157e77d	branchmap: use revbranchcache when updating branch map The revbranchcache is read on demand before it will be used for updating the branch map. It is written back when the branchmap is written and it will thus use the same locking as branchmap. The revbranchcache instance is short-lived; it is only stored in the branchmap from when .update() is invoked until .write() is invoked. Branchmap already assumes that the repo is locked in that case. The use of revbranchcache for branch map updates will make sure that the revbranchcache "always" is kept up-to-date. The perfbranchmap benchmark is somewhat bogus, especially when we can see that the caching makes a significant difference between the realistic case of a first run and the rare case of rerunning it with a full cache. Here are some 'base' numbers on mozilla-central: Before: ! wall 6.912745 comb 6.910000 user 6.840000 sys 0.070000 (best of 3) After - initial, cache is empty: ! wall 7.792569 comb 7.790000 user 7.720000 sys 0.070000 (best of 3) After - cache is full: ! wall 0.879688 comb 0.880000 user 0.870000 sys 0.010000 (best of 4) The overhead when running with empty cache comes from checking, missing and updating it every time. Most of the performance improvement comes from not having to extract the branch info from the changelog. The last doubling of performance comes from no longer having to convert all branch names to local encoding but reuse the few already converted branch names. On the hg repo: Before: ! wall 0.715703 comb 0.710000 user 0.710000 sys 0.000000 (best of 14) After: ! wall 0.105489 comb 0.110000 user 0.110000 sys 0.000000 (best of 87)	2015-01-08 00:01:03 +01:00
Mads Kiilerich	1b3892318f	branchcache: introduce revbranchcache for caching of revision branch names It is expensive to retrieve the branch name of a revision. Very expensive when creating a changectx and calling .branch() every time - slightly less when using changelog.branchinfo(). Now, to speed things up, provide a way to cache the results on disk in an efficient format. Each branchname is assigned a number, and for each revision we store the number of the corresponding branch name. The branch names are stored in a dedicated file which is strictly append only. Branch names are usually reused across several revisions, and the total list of branch names will thus be so small that it is feasible to read the whole set of names before using the cache. This does however mean that it might be more efficient to use the changelog for retrieving the branch info for a single revision. The revision entries are stored in another file. This file is usually append only, but if the repository has been modified, the file will be truncated and the relevant parts rewritten on demand. The entries for each revision are 8 bytes each, and the whole revision file will thus be 1/8 of 00changelog.i. Each revision entry contains the first 4 bytes of the corresponding node hash. This is used as a check sum that always is verified before the entry is used. That check is relatively expensive but it makes sure history modification is detected and handled correctly. It will also detect and handle most revision file corruptions. This is just a cache. A new format can always be introduced if other requirements or ideas make that seem like a good idea. Rebuilding the cache is not really more expensive than it was to run for example 'hg log -b branchname' before this cache was introduced. This new method is still unused but promises to make some operations several times faster once it actually is used.
Abandoning Python 2.4 would make it possible to implement this more efficiently by using struct classes and pack_into. The Python code could probably also be micro optimized or it could be implemented very efficiently in C where it would be easy to control the data access.	2015-01-08 00:01:03 +01:00
Anton Shestakov	6fe7d43de3	hgweb: move archive entries outside of <li> in monoblue style archiveentry already includes surrounding <li></li>, so putting archive entries inside <li> element produced incorrect markup.	2015-01-09 22:53:38 +08:00
Anton Shestakov	8bdbccf3bf	hgweb: add searchhint to templates/coal/map coal style uses every template (except header.tmpl) directly from paper style, but doesn't use paper/map file. Elements defined in such map files are used in templates as you would expect. For example, paper/search.tmpl contains '{searchhint}' and template engine replaces that with the actual hint. But when coal style reuses paper/search.tmpl, it needs to define searchhint in its map file as well, or template engine will not find it. So let's copy it from paper/map to coal/map. Before this change, if the coal style was selected, the hint for the search field in page header was present, but it was completely empty, although the absence of searchhint in coal/map produced no error.	2015-01-09 15:24:55 +08:00
Martin von Zweigbergk	01d503fc7e	status: don't override _buildstatus() in workingcommitctx Now that the caching into _status is done in workingctx._dirstatestatus(), which workingcommitctx._dirstatestatus() does not call, there is no caching to prevent in _buildstatus(), so stop overriding it.	2015-01-08 13:29:06 -08:00
Martin von Zweigbergk	370c0e4b47	status: cache dirstate status in _dirstatestatus() Since it's only the dirstate status we cache, it makes more sense to cache it in the _dirstatestatus() method. Note that this change means the dirstate status will also be cached when status is requested between the working copy and some other revision, while we currently only cache the result if exactly the status between the working copy and its parent is requested.	2015-01-08 13:12:44 -08:00
Sean Farley	28f3ddd179	localrepo: add ignoremissing parameter to branchtip Previously, in the namespaces api, the only caller of branchtip was singlenode which happened to raise the same exception that branchtip raised: KeyError. This is a minor change but will allow upcoming patches to use repo.branchtip to not raise an exception if a branch doesn't exist. After that, it will be possible for extensions to use the namespace api in a stable way.	2014-10-16 21:49:28 -07:00
Sean Farley	6bc9243ecb	namespaces: add method to return a list of nodes for a given name This is a helpful method that some extensions can make use of (e.g. for custom revsets); currently not used in core.	2014-12-15 14:46:04 -08:00
Sean Farley	5c1d1bb100	log: use new namespaces api to display names The only caveat here is that branches must be displayed first due to backwards compatibility. The order of namespaces is defined to be the 'update' order which, unfortunately, is not the same as log output order. It's worth mentioning that the log output is still translated the same as before since we are formatting our strings the same way: # i18n: column positioning for "hg log" _("bookmark: %s\n") % bookmark becomes tname = _(("%s:" % ns.templatename).ljust(13) + "%s\n") % name when name == 'bookmark'. The ljust(13) keeps the strings and whitespace equal. Adding a new namespace is even easier now because the log output code doesn't need to change. A future programmer would just need to add the string to the corresponding .po file (which is the same as they would have had to do previously).	2014-10-17 09:26:37 -07:00
Durham Goode	d73818aad4	filectx: fix annotate to not directly instantiate filectx b04f57726c73 changed basefilectx.annotate() to directly instantiate new filectx's instead of going through self.filectx(), this breaks extensions that replace the filectx class, and would also break future uses that would need memfilectx's.	2015-01-09 11:21:29 -08:00
Sean Farley	2a4b30c27c	revset: use '%' as an operator for 'only' With this patch, we can make it much easier to specify 'only(A,B)' -> A%B. Similarly, 'only(A)' -> A%. On Windows, '%' is a semi-reserved symbol in the following way: using non-bash shells (e.g. cmd.exe but NOT PowerShell, ConEmu, and cmder), %var% is only expanded when 'var' exists and is surrounded by '%'. That only leaves batch scripts which could prove to be problematic. I posit that this isn't a big issue because any developer of batch scripts already knows that to use '%' one needs to escape it by using a double '%%'. Alternatives to '%' could be '=' but that might be limiting our future if we ever decide to use temporary assignments in a revset.	2014-11-06 14:55:18 -08:00
Gregory Szorc	433ea5a1b2	transaction: support for callbacks during abort Previous transaction work added callbacks to be called during regular transaction commit/close. As part of refactoring Mozilla's pushlog extension (an extension that opens a SQLite database and tries to tie its transaction semantics to Mercurial's transaction), I discovered that the new transaction APIs were insufficient to avoid monkeypatching transaction instance internals. Adding a callback that is called during transaction abort removes the necessity for monkeypatching and completes the API.	2015-01-06 21:56:33 -08:00
Sean Farley	6b1d107b6c	debugnamecomplete: use new name api Instead of hardcoding a list of places to check, we use the new repo.names api to get a list of potential names to complete.	2014-12-15 14:11:19 -08:00
Sean Farley	dc331facee	debugnamecomplete: rename from debuglabelcomplete Now that we have decided on the use of 'name' instead of 'label' we rename this function accordingly. The old method 'debuglabelcomplete' has been left as a deprecated command so that current scripts don't break.	2014-10-17 13:41:29 -07:00
Sean Farley	0891487d00	namespaces: add __iter__ and iteritems methods Iterating over all the namespaces is a common operation, naturally, so we add those methods now. Since we are using a sorted dictionary, this method just calls the underlying __iter__ or iteritems method.	2014-12-22 09:07:37 -08:00
Sean Farley	c40032685e	namespaces: add 'listnames' property Currently, we have no way to list all the names in a given namespace. This is needed for things such as tab completion. Future patches will use this patch for exactly that purpose.	2014-12-15 14:09:00 -08:00
Augie Fackler	264b6aaf72	manifest: drop withflags() method, which is now unused	2015-01-07 15:55:02 -05:00
Augie Fackler	b539edc70e	context: use new manifest.diff(clean=True) support This further simplifies the status code. This simplification comes at a slight performance cost for `hg export`. Before, on mozilla-central: perfmanifest tip ! wall 0.265977 comb 0.260000 user 0.240000 sys 0.020000 (best of 38) perftags ! result: 162 ! wall 0.007172 comb 0.010000 user 0.000000 sys 0.010000 (best of 403) perfstatus ! wall 0.422302 comb 0.420000 user 0.260000 sys 0.160000 (best of 24) hgperf export tip ! wall 0.148706 comb 0.150000 user 0.150000 sys 0.000000 (best of 65) after, same repo: perfmanifest tip ! wall 0.267143 comb 0.270000 user 0.250000 sys 0.020000 (best of 37) perftags ! result: 162 ! wall 0.006943 comb 0.010000 user 0.000000 sys 0.010000 (best of 397) perfstatus ! wall 0.411198 comb 0.410000 user 0.260000 sys 0.150000 (best of 24) hgperf export tip ! wall 0.173229 comb 0.170000 user 0.170000 sys 0.000000 (best of 55) The next set of patches introduces a new manifest type implemented almost entirely in C, and more than makes up for the performance hit incurred in this change.	2014-12-15 16:06:04 -05:00
Augie Fackler	78c54eb4c7	manifest: add optional recording of clean entries to diff This makes manifest slightly easier to use for status code.	2014-12-15 16:04:28 -05:00
Augie Fackler	509875a2fe	context: use manifest.diff() to compute most of status We can do a little tiny bit better by enhancing manifest.diff to optionally include files that are in both sides. This will be done in a followup patch.	2014-12-15 15:33:55 -05:00
Martin von Zweigbergk	f60106670d	trydiff: replace dodiff=True/'binary' by binarydiff=False/True	2015-01-07 11:02:10 -08:00
Martin von Zweigbergk	4a8198a6db	trydiff: replace 'dodiff = False' by 'continue' The 'dodiff' variable is initialized to True and may later be set to either False or "binary". When it's set to False, we skip everything after that point, so we can simplify by instead continue-ing (the loop). We can then also drop the 'if dodiff', since it will always be true.	2015-01-07 10:59:40 -08:00
Martin von Zweigbergk	86561cd4f0	trydiff: make addindexmeta() unconditionally add index meta The conditional-ness is not clear from the name and there is only one caller, so it's clearer to check on the call site. Moving it also makes addindexmeta() no longer close on the 'opts' variable.	2015-01-07 08:54:26 -08:00
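
The on-disk record described in 1b3892318f (8 bytes per revision, starting with the first 4 bytes of the node hash as a checksum) can be illustrated with a minimal sketch. The message does not spell out the remaining 4 bytes, so storing the branch number there as a big-endian integer is an assumption, and the names `pack_entry`, `read_entry`, and `branch_names` are hypothetical, not taken from the Mercurial source.

```python
import struct

# Assumed layout: 4-byte node-hash prefix + 4-byte big-endian branch number.
_RECORD = struct.Struct(">4sI")
assert _RECORD.size == 8  # matches the 8 bytes per revision stated in the commit


def pack_entry(node, branch_index):
    """Build the 8-byte cache record for one revision (hypothetical helper)."""
    return _RECORD.pack(node[:4], branch_index)


def read_entry(record, node, branch_names):
    """Return the cached branch name, or None when the hash prefix mismatches."""
    prefix, branch_index = _RECORD.unpack(record)
    if prefix != node[:4]:
        # History was modified or the file is corrupt: treat it as a cache miss.
        return None
    return branch_names[branch_index]


branch_names = ["default", "stable"]  # stands in for the append-only names file
node = bytes(range(20))               # made-up 20-byte node hash
record = pack_entry(node, 1)
print(read_entry(record, node, branch_names))
```

Verifying the prefix on every read is what lets a stale entry be detected after history modification, at the cost of needing the node hash for each lookup.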


12933 Commits
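
The idea behind 2537cb8fbf (reuse a compiled `struct.Struct` so the format string is parsed only once) can be sketched as follows. This is an illustration rather than the code from that commit; the header format `">BI"` and the sample data are made up for the example.

```python
import struct


def unpacker(fmt):
    """Compile the format once and return the bound unpack method."""
    return struct.Struct(fmt).unpack


# Hypothetical fixed header: 1-byte version, then a 4-byte big-endian count.
HEADER_FORMAT = ">BI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
unpack_header = unpacker(HEADER_FORMAT)

data = struct.pack(HEADER_FORMAT, 1, 3) + b"payload"
# The bound unpack requires exactly HEADER_SIZE bytes, hence the slice.
version, count = unpack_header(data[:HEADER_SIZE])
print(version, count)
```

Binding `unpack` to a module-level name keeps the format parsing out of the inner loop, which is the benefit the commit message attributes to `struct.Struct`.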