sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-09 00:14:35 +03:00

Author	SHA1	Message	Date
Gregory Szorc	d6a37b1c7d	changelog: avoid slicing raw data until needed Before, we were slicing the original raw text and storing individual variables with values corresponding to each field. This is avoidable overhead. With this patch, we store the offsets of the fields at construction time and perform the slice when a property is accessed. This appears to show a very marginal performance win on its own, and the gains are so small as to not be worth reporting. However, this patch marks the end of our parsing refactor, so it is worth reporting the gains from the entire series (revset: before the series, after the series, relative time): author(mpm) 0.896565, 0.795987, 89%; desc(bug) 0.887169, 0.803438, 90%; date(2015) 0.878797, 0.773961, 88%; extra(rebase_source) 0.865446, 0.761603, 88%; author(mpm) or author(greg) 1.801832, 1.576025, 87%; author(mpm) or desc(bug) 1.812438, 1.593335, 88%; date(2015) or branch(default) 0.968276, 0.875270, 90%; author(mpm) or desc(bug) or date(2015) or extra(rebase_source) 3.656193, 3.183104, 87%. Pretty consistent speed-up across the board for any revset accessing changelog revision data. Not bad! It's also worth noting that PyPy appears to experience a similar to marginally greater speed-up as well! According to statprof, revsets accessing changelog revision data are now clearly dominated by zlib decompression (16-17% of execution time). Surprisingly, it appears the most expensive part of revision parsing is the various text.index() calls to search for newlines! These appear to cumulatively add up to 5+% of execution time. I reckon implementing the parsing in C would make things marginally faster. When accessing larger strings (such as the commit message), encoding.tolocal() is the most expensive procedure outside of decompression.	2016-03-06 15:40:20 -08:00
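The Python sketch below illustrates the offset-based approach described above, not Mercurial's actual code. Newline offsets are recorded once, at construction time, and each field is sliced out only when its property is read.

It assumes a simplified entry layout:

1. the manifest hex
2. the user
3. the date line
4. zero or more file lines
5. a blank line, followed by the description

It also uses `str` rather than bytes. The class name `LazyChangelogRevision` is hypothetical.

```python
class LazyChangelogRevision:
    """Record newline offsets up front; slice individual fields on access.

    Assumed (simplified) entry layout:
        <manifest hex>\n<user>\n<date line>\n[<file>\n ...]\n<description>
    """

    __slots__ = ("_text", "_offsets")

    def __init__(self, text: str) -> None:
        nl1 = text.index("\n")            # end of the manifest line
        nl2 = text.index("\n", nl1 + 1)   # end of the user line
        nl3 = text.index("\n", nl2 + 1)   # end of the date/extra line
        # With an empty file list, nl3 is already the first newline of the
        # blank line that separates the header from the description.
        if text[nl3 + 1:nl3 + 2] == "\n":
            doublenl = nl3
        else:
            doublenl = text.index("\n\n", nl3 + 1)
        self._text = text
        self._offsets = (nl1, nl2, nl3, doublenl)

    @property
    def manifest(self) -> bytes:
        return bytes.fromhex(self._text[: self._offsets[0]])

    @property
    def user(self) -> str:
        nl1, nl2 = self._offsets[0], self._offsets[1]
        return self._text[nl1 + 1 : nl2]

    @property
    def rawdate(self) -> str:
        nl2, nl3 = self._offsets[1], self._offsets[2]
        return self._text[nl2 + 1 : nl3]

    @property
    def files(self) -> list:
        nl3, doublenl = self._offsets[2], self._offsets[3]
        if nl3 == doublenl:
            return []
        return self._text[nl3 + 1 : doublenl].split("\n")

    @property
    def description(self) -> str:
        return self._text[self._offsets[3] + 2 :]
```

Construction only runs a few `str.index()` calls. The statprof note above identifies those calls as the remaining parsing hotspot.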
Gregory Szorc	9f5cd6d5ba	changelog: parse description last Before, we first searched for the double newline as the first step in the parse then moved to the front of the string and worked our way to the back again. This made sense when we were splitting the raw text on the double newline. But in our new newline scanning based approach, this feels awkward. This patch updates the parsing logic to parse the text linearly and deal with the description field last. Because we're avoiding an extra string scan, revsets appear to demonstrate a very slight performance win. But the percentage change is marginal, so the numbers aren't worth reporting.	2016-03-06 13:13:54 -08:00
Gregory Szorc	b586216842	changelog: lazily parse files More of the same. Again, modest revset performance wins (revset: original baseline, before this patch, after this patch): author(mpm) 0.896565, 0.822961, 0.805156; desc(bug) 0.887169, 0.847054, 0.798101; date(2015) 0.878797, 0.811613, 0.786689; extra(rebase_source) 0.865446, 0.797756, 0.777408; author(mpm) or author(greg) 1.801832, 1.668172, 1.626547; author(mpm) or desc(bug) 1.812438, 1.677608, 1.613941; date(2015) or branch(default) 0.968276, 0.896032, 0.869017	2016-03-06 14:31:06 -08:00
Gregory Szorc	eee1336322	changelog: lazily parse date/extra field This is probably the most complicated patch in the parsing refactor. Because the date and extras are encoded in the same field, we stuff the entire field into a dedicated variable and add a property for accessing each sub-component. There is some duplicated code here, but the code is relatively simple, so it shouldn't be a big deal. We see revset performance wins across the board (revset: original baseline, before this patch, after this patch): author(mpm) 0.896565, 0.876713, 0.822961; desc(bug) 0.887169, 0.895514, 0.847054; date(2015) 0.878797, 0.820987, 0.811613; extra(rebase_source) 0.865446, 0.823811, 0.797756; author(mpm) or author(greg) 1.801832, 1.784160, 1.668172; author(mpm) or desc(bug) 1.812438, 1.822756, 1.677608; date(2015) or branch(default) 0.968276, 0.910981, 0.896032; author(mpm) or desc(bug) or date(2015) or extra(rebase_source) 3.656193, 3.516788, 3.265024. We see a speed-up on revsets accessing date and extras because the new parsing code only parses what you access. Even though they are stored in the same text field, we avoid parsing dates when accessing extras and vice-versa. But strangely, revsets accessing both date and extras appeared to speed up as well! I'm not sure if this is due to refactoring the parsing code or due to an optimization in revsets. You can't argue with the results!	2016-03-06 14:30:25 -08:00
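The sketch below keeps the combined date/extra field raw and splits it only when `date` or `extra` is read. That way, reading one never pays for parsing the other.

The class name `DateExtraField` is hypothetical. The extras encoding is also simplified, as an assumption: plain NUL-separated `key:value` pairs with no escaping, and a missing branch defaults to `default`.

```python
class DateExtraField:
    """Hold the raw '<time> <tz> [<extra>]' field; split it lazily.

    Simplification (assumption): extras are 'key:value' pairs joined by NUL
    characters with no escaping, unlike Mercurial's real encoding.
    """

    __slots__ = ("_raw",)

    def __init__(self, raw: str) -> None:
        self._raw = raw

    @property
    def date(self) -> tuple:
        # Only the first two space-separated tokens are needed for the date.
        parts = self._raw.split(" ", 2)
        try:
            tz = int(parts[1])
        except (IndexError, ValueError):
            tz = 0
        return float(parts[0]), tz

    @property
    def extra(self) -> dict:
        parts = self._raw.split(" ", 2)
        extra = {}
        if len(parts) == 3:
            for item in parts[2].split("\0"):
                if item:
                    key, value = item.split(":", 1)
                    extra[key] = value
        # Changesets without an explicit branch entry belong to "default".
        extra.setdefault("branch", "default")
        return extra
```

As the message notes, the two properties duplicate the `split(" ", 2)` call. That duplication is the trade-off for never computing both when only one is needed.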
Gregory Szorc	e043adff86	changelog: lazily parse user Same strategy as before. Revsets not accessing the user demonstrate a slight performance win (revset: original baseline, before this patch, after this patch): desc(bug) 0.887169, 0.910400, 0.895514; date(2015) 0.878797, 0.870697, 0.820987; extra(rebase_source) 0.865446, 0.841644, 0.823811; date(2015) or branch(default) 0.968276, 0.945792, 0.910981	2016-03-06 14:29:46 -08:00
Gregory Szorc	bb5baaea05	changelog: lazily parse manifest node Like the description, we store the raw bytes and convert from hex on access. This patch also marks the beginning of our new parsing method, which is based on newline offsets and doesn't rely on str.split(). Many revsets showed a performance improvement (revset: original baseline, before this patch, after this patch): author(mpm) 0.896565, 0.869085, 0.868598; desc(bug) 0.887169, 0.928164, 0.910400; extra(rebase_source) 0.865446, 0.871500, 0.841644; author(mpm) or author(greg) 1.801832, 1.791589, 1.731503; author(mpm) or desc(bug) 1.812438, 1.851003, 1.798764; date(2015) or branch(default) 0.968276, 0.974027, 0.945792	2016-03-06 14:29:13 -08:00
Gregory Szorc	59c7349712	changelog: lazily parse description Before, the description field was converted to a localstr at parse time. With this patch, we store the raw description and convert it to a localstr when it is first accessed. We see a revset speedup for revsets that don't access the description (revset: original baseline, before this patch, after this patch): author(mpm) 0.896565, 0.914234, 0.869085; date(2015) 0.878797, 0.891980, 0.862525; extra(rebase_source) 0.865446, 0.912514, 0.871500; author(mpm) or author(greg) 1.801832, 1.860402, 1.791589; date(2015) or branch(default) 0.968276, 0.994673, 0.974027; author(mpm) or desc(bug) or date(2015) or extra(rebase_source) 3.656193, 3.721032, 3.643593. As you can see, most of these revsets are already faster than before this refactoring: we have already offset the performance loss from the introduction of the new class representing parsed changelog entries!	2016-03-06 14:28:46 -08:00
Gregory Szorc	a18df3d4e5	changelog: add class to represent parsed changelog revisions Currently, changelog entries are parsed into their respective components at read time. Many operations are only interested in a subset of fields of a changelog entry. The parsing and storing of all the fields adds avoidable overhead. This patch introduces the "changelogrevision" class. It takes changelog raw text and exposes the parsed results as attributes. The code for parsing changelog entries has been moved into its construction function. changelog.read() has been modified to use the new class internally while maintaining its existing API. Future patches will make revision parsing lazy. We implement the construction function of the new class with __new__ instead of __init__ so we can use a named tuple to represent the empty revision. This saves overhead and complexity of coercing later versions of this class to represent an empty instance. While we are here, we add a method on changelog to obtain an instance of the new type. The overhead of constructing the new class regresses performance of revsets accessing this data (revset: before, after, relative time where reported): author(mpm) 0.896565, 0.929984; desc(bug) 0.887169, 0.935642, 105%; date(2015) 0.878797, 0.908094; extra(rebase_source) 0.865446, 0.922624, 106%; author(mpm) or author(greg) 1.801832, 1.902112, 105%; author(mpm) or desc(bug) 1.812438, 1.860977; date(2015) or branch(default) 0.968276, 1.005824; author(mpm) or desc(bug) or date(2015) or extra(rebase_source) 3.656193, 3.743381. Once lazy parsing is implemented, these revsets will all be faster than before. There is no performance change on revsets that do not access this data. There /could/ be a performance regression on operations that perform several changelog reads. However, I can't think of anything outside of revsets and `hg log` (basically the same as a revset) that would be impacted.	2016-03-06 14:28:02 -08:00
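Returning a different object from `__new__` is what makes the named-tuple trick for the empty revision possible. For empty input, `__new__` returns an object that is not an instance of the class, so Python never calls `__init__` on it.

This is a sketch under stated assumptions:

- `ParsedRevision`, `_EmptyRevision` and `NULL_NODE` are hypothetical names.
- The null node is assumed to be 20 zero bytes.
- Only a `user` property is shown.

```python
import collections

# Hypothetical lightweight stand-in for the empty (null) revision.
_EmptyRevision = collections.namedtuple(
    "_EmptyRevision", ["manifest", "user", "date", "files", "description"]
)

NULL_NODE = b"\0" * 20  # assumption: the null node id is 20 zero bytes


class ParsedRevision:
    """__new__ may hand back a cheaper object for the empty case."""

    __slots__ = ("_text",)

    def __new__(cls, text: str):
        if not text:
            # The returned object is not an instance of cls, so Python
            # skips __init__ and no ParsedRevision is ever allocated.
            return _EmptyRevision(
                manifest=NULL_NODE, user="", date=(0, 0), files=[], description=""
            )
        self = super().__new__(cls)
        self._text = text
        return self

    @property
    def user(self) -> str:
        # The second line of the raw entry holds the user.
        return self._text.split("\n", 2)[1]
```

Callers can read `.user` and the other fields on either object without checking for the empty case.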
Matt Mackall	75ceec976e	changelog: backed out changeset 4ef1c9b76e22	2016-03-02 16:05:30 -06:00
Matt Mackall	b4737e5b27	changelog: backed out changeset 9f92d143bdd2 We want to avoid leaking UTF-8 to main body of code wherever possible.	2016-03-02 12:46:54 -06:00
Gregory Szorc	ab7f7f6a19	changelog: lazy decode user (API) This appears to show a similar speedup to the previous patch.	2016-02-27 22:34:18 -08:00
Gregory Szorc	6f651cb96b	changelog: lazy decode description (API) Currently, changelog reading decodes read values. This is wasteful because a lot of times consumers aren't interested in some of these values. This patch changes description decoding to occur in changectx as needed. Revsets reading changelog entries appear to speed up slightly: revset #7: author(lmoscovicz) plain 0) 0.906329 1) 0.872653; revset #8: author(mpm) plain 0) 0.903478 1) 0.878037; revset #9: author(lmoscovicz) or author(mpm) plain 0) 1.817855 1) 1.778680; revset #10: author(mpm) or author(lmoscovicz) plain 0) 1.837052 1) 1.764568	2016-02-27 22:25:14 -08:00
Gregory Szorc	c5114195d9	changelog: remove redundant parentheses You don't need to surround returned tuples with parens.	2016-02-27 22:25:47 -08:00
Laurent Charignon	be5dd01ee1	changelog: add a new method to get files modified by a changeset This patch adds a new method "readfiles" to get the files modified by a changeset. It extracts some logic from "read" to only return the files modified by a changeset as efficiently as possible. This is used in the next patch to speed up hg log <file|folder>	2015-12-18 13:45:55 -08:00
Yuya Nishihara	d0b6532f54	reachableroots: construct and sort baseset in revset module This can remove the dependency from changelog to revset, which seems a bit awkward to me.	2015-08-28 11:14:24 +09:00
Pierre-Yves David	25876521c9	reachableroots: use baseset lazy sorting smartset sorting is lazy (so faster in some cases) and better (it records that the set is sorted, allowing some optimisations). So we rely on it directly. Some test outputs are updated because we now have more information (ordering).	2015-08-20 17:23:21 -07:00
Yuya Nishihara	7f0aba37f0	reachableroots: use internal "revstates" array to test if rev is a root The main goal of this patch series is to reduce the use of PyXxx() functions that are likely to require ugly error handling and inc/decref. Plus, this is faster than using PySet_Contains(). revset #0: 0::tip 0) 0.004168 1) 0.003678 88%. This patch ignores out-of-range roots, as the pure implementation does. Because reachable sets are calculated from heads, and out-of-range heads raise IndexError, we can just treat out-of-range roots as unreachable. Otherwise, the test of "hg log -Gr '. + wdir()'" would fail. The "heads" argument is changed to a list. Should we rename the C function since its signature has changed?	2015-08-14 15:43:29 +09:00
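The pure-Python sketch below computes `roots::heads` with a per-revision flag array instead of set lookups, in the spirit of the "revstates" change above. It is not the C implementation.

Assumptions:

- The parameter names only loosely follow `reachableroots`.
- `parents[rev]` is a `(p1, p2)` tuple, with `-1` for a missing parent.
- The flag constants are hypothetical.

Out-of-range roots are ignored, while a head past the end raises `IndexError`, matching the behaviour described in the message.

```python
ROOT = 1        # rev is one of the requested roots
SEEN = 2        # rev was visited while walking down from the heads
REACHABLE = 4   # rev is a root, or (with includepath) descends from one


def reachable_roots(parents, minroot, roots, heads, includepath):
    """Return sorted revs in roots::heads (or only reachable roots)."""
    n = len(parents)
    revstates = [0] * n
    for r in roots:
        if 0 <= r < n:          # out-of-range roots are never reachable
            revstates[r] |= ROOT

    visit = list(heads)         # a head >= len(parents) raises IndexError
    while visit:
        rev = visit.pop()
        if revstates[rev] & SEEN:
            continue
        revstates[rev] |= SEEN
        if revstates[rev] & ROOT:
            revstates[rev] |= REACHABLE
            if not includepath:
                continue        # roots only: no need to look below a root
        for p in parents[rev]:
            if p >= minroot and not revstates[p] & SEEN:
                visit.append(p)

    if includepath:
        # A parent always has a smaller number than its child, so one
        # ascending pass carries reachability from the roots up to the heads.
        for rev in range(max(minroot, 0), n):
            state = revstates[rev]
            if state & SEEN and not state & REACHABLE:
                if any(p >= 0 and revstates[p] & REACHABLE for p in parents[rev]):
                    revstates[rev] = state | REACHABLE

    return [rev for rev in range(n) if revstates[rev] & REACHABLE]
```

Each membership test becomes a bit check on a list slot. That is the same trade the message describes when it replaces PySet_Contains() in C.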
Augie Fackler	3abaf93572	changelog: trust C implementation of reachableroots more There are no remaining codepaths in reachableroots where it will return None, so just trust it completely and simplify this method. Result by revset, comparing 0) revision 462169b7f799 ("style: adjust whitespaces in webutil.py") and 1) revision d1d91b8090c6 ("changelog: trust C implementation of reachableroots more"): revset #0: 0::tip plain 0) 0.067684 1) 0.006622 9%; revset #1: 0::@ plain 0) 0.068249 1) 0.009394 13%. IOW, this is a 10x speedup in my repo for hg itself for 0::tip and similar revsets, now that the C code is correctly wired up.	2015-08-11 15:06:02 -04:00
Laurent Charignon	1268f45eae	changelog: add way to call the reachableroots C implementation This patch is part of a series of patches to speed up the computation of revset.reachableroots by introducing a C implementation. The main motivation is to speed up smartlog on big repositories. At the end of the series, on our big repositories, the computation of reachableroots is 10-50x faster and smartlog is 2x-5x faster. This patch allows us to call the new C implementation of reachableroots from Python by creating an entry point in the changelog class.	2015-08-06 22:10:31 -07:00
Gregory Szorc	68dd6a9cac	changelog: use absolute_import	2015-08-08 00:26:49 -07:00
Pierre-Yves David	b06e4ae06e	changelog: update read pending documentation The pending index contains a full copy of the index + in-transaction data. We replace "extend" with "overwrite" to make this clearer.	2015-07-17 15:53:56 +02:00
Pierre-Yves David	7538ebe084	changelog: document the 'readpending' method I happen to have spent some time understanding this logic, so I'm leaving documentation for the next poor fellow.	2014-09-28 20:18:43 -07:00
Yuya Nishihara	8776396de3	changelog: drop unnecessary override of "hasnode" revlog.hasnode() calls self.rev(node) that takes filtering into account.	2015-05-10 11:39:01 -05:00
Pierre-Yves David	bf28a4a61f	changelog: fix readpending if no pending data exist (issue4609) Since transactions are used for more than just changesets, it is possible to have a transaction without new changesets at all. In this case, no '00changelog.i.a' file is written. In all cases, the 'changelog.readpending' method is called if the repository has any pending data. The 'revlog' logic provides empty content if the file is missing, so the whole operation resulted in an empty changelog. We now skip reading the pending file if it is missing.	2015-04-20 17:16:22 +02:00
Yuya Nishihara	7941359b9d	changelog: inline revlog.__contains__ in case it is used in hot loop Currently __contains__ is called only by "rev()" revset, but "x in cl" is a function that is likely to be used in hot loop. revlog.__contains__ is simple enough to duplicate to changelog, so just inline it.	2015-04-04 22:30:59 +09:00
Yuya Nishihara	237120f282	revlog: add __contains__ for fast membership test Because revlog implements __iter__, "rev in revlog" works but unexpectedly does a silly O(n) lookup. So it seems good to add a fast version of __contains__. This allows "rev in repo.changelog" in the next patch.	2015-02-04 21:25:57 +09:00
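The toy example below shows why `__contains__` matters. With only `__iter__` defined, `x in obj` falls back to a linear scan. Revision numbers are dense integers, though, so a range check is enough.

`ToyRevlog` and `ToyChangelog` are hypothetical stand-ins. The changelog override inlines the range check rather than calling the parent method, and treats filtered revisions as absent.

```python
class ToyRevlog:
    """Hypothetical stand-in for a revlog holding `count` revisions."""

    def __init__(self, count: int) -> None:
        self._count = count

    def __len__(self) -> int:
        return self._count

    def __iter__(self):
        # Without __contains__ below, "x in log" would walk this iterator.
        return iter(range(self._count))

    def __contains__(self, rev: int) -> bool:
        # O(1): revision numbers are exactly 0 .. len(self) - 1.
        return 0 <= rev < len(self)


class ToyChangelog(ToyRevlog):
    def __init__(self, count: int, filteredrevs=frozenset()) -> None:
        super().__init__(count)
        self.filteredrevs = filteredrevs

    def __contains__(self, rev: int) -> bool:
        # Inlined instead of calling super().__contains__, saving a call in
        # hot loops; filtered (hidden) revisions count as absent.
        return 0 <= rev < len(self) and rev not in self.filteredrevs
```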
Pierre-Yves David	500808d844	changelog: register changelog.i.a as a temporary file The file is registered to make sure the transaction is cleaned up in all cases.	2014-11-08 17:08:09 +00:00
Pierre-Yves David	29f854f61a	transaction: pass the transaction to 'finalize' callback The callback will likely need to perform some operation related to the transaction (eg: registering file updates). So we had better pass the current transaction as the callback argument. Otherwise, callbacks that need it have to rely on a horrible weak reference trick. This already allows us to slay a wild weak reference usage.	2014-11-08 16:31:38 +00:00
Pierre-Yves David	92bf4dcbdc	transaction: pass the transaction to 'pending' callback The callback will likely need to perform some operation related to the transaction (eg: backing files up). So we had better pass the current transaction as the callback argument. Otherwise, callbacks that need it have to rely on a horrible weak reference trick. The first foreseen user of this is changelog._writepending. We would like it to register the temporary file it creates for cleanup purposes.	2014-11-08 16:27:50 +00:00
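The sketch below shows a transaction that hands itself to its callbacks. A pending-writer can then register its temporary file without keeping a weak reference back to the transaction.

`ToyTransaction` is hypothetical. Its method names loosely echo Mercurial's transaction API (`addpending`, `addfinalize`, `writepending`, `registertmp`), but the signatures here are simplified assumptions. Nested transactions run finalize callbacks only on the outermost `close()`.

```python
class ToyTransaction:
    """Hypothetical transaction that passes itself to every callback."""

    def __init__(self) -> None:
        self._count = 1          # nesting depth
        self._pending = {}       # category -> callback(tr) returning bool
        self._finalize = {}      # category -> callback(tr)
        self.tempfiles = []      # files registered for cleanup

    def nest(self) -> "ToyTransaction":
        self._count += 1
        return self

    def addpending(self, category: str, callback) -> None:
        self._pending[category] = callback

    def addfinalize(self, category: str, callback) -> None:
        self._finalize[category] = callback

    def registertmp(self, name: str) -> None:
        self.tempfiles.append(name)

    def writepending(self) -> bool:
        # Each callback receives the transaction itself, so it can call
        # tr.registertmp() without holding a weak reference to it.
        any_pending = False
        for category in sorted(self._pending):
            any_pending |= bool(self._pending[category](self))
        return any_pending

    def close(self) -> None:
        self._count -= 1
        if self._count:          # inner close: nothing is finalized yet
            return
        for category in sorted(self._finalize):
            self._finalize[category](self)
```

Keying callbacks by category means registering the same category twice replaces the earlier callback rather than running it twice.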
Pierre-Yves David	8803fc197d	changelog: rely on transaction for finalization Instead of calling 'cl.finalize()' by hand (possibly at a bogus time) we register it in the transaction during 'delayupdate' and rely on 'tr.close()' to call it at the right time.	2014-10-18 01:09:41 -07:00
Pierre-Yves David	d6b8860637	changelog: handle writepending in the transaction The 'delayupdate' method now takes a transaction object and registers its '_writepending' method for execution in 'transaction.writepending()'. The hook can then use 'transaction.writepending()' directly. At some point this will allow the addition of other file creation during writepending.	2014-10-17 21:55:31 -07:00
Pierre-Yves David	71f171494e	changelog: rework the delayupdate mechanism The current way we use the 'delayupdate' mechanism is wrong. We call 'delayupdate' right after the transaction retrieval, then we call 'finalize' right before calling 'tr.close()'. The 'finalize' call will -always- result in a flush to disk, making the data available to all readers. But the 'tr.close()' may be a no-op if the transaction is nested. This would result in data 1) being exposed to readers too early and 2) being rolled back by other parts of the transaction after such exposure. So we need to end up in a situation where we call 'finalize' a single time when the transaction actually closes. For this purpose, we need to be able to call 'delayupdate' and '_writepending' multiple times and 'finalize' once. This was not possible with the previous state of the code. This changeset refactors the code to make this possible. We buffer data in memory as much as possible and fall back to writing to a ".a" file after the first call to '_writepending'.	2014-10-18 01:12:18 -07:00
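The sketch below models the buffering scheme described above. A plain dict serves as a hypothetical in-memory stand-in for the file system. The flow is:

1. While updates are delayed, appends are buffered in memory.
2. The first `writepending()` writes a full copy (existing data plus the buffer) to a `.a` file and diverts later appends there.
3. `finalize()` publishes everything once.

`DelayedLog` is illustrative only; its methods simplify rather than reproduce Mercurial's changelog API.

```python
class DelayedLog:
    """Delay visibility of appended data until finalize()."""

    def __init__(self, store: dict, name: str) -> None:
        self.store = store       # hypothetical stand-in for a vfs
        self.name = name
        self._delaybuf = None    # buffered chunks while updates are delayed
        self._divert = False     # True once appends go to "<name>.a"

    def delayupdate(self) -> None:
        # Safe to call repeatedly: an existing buffer or diversion is kept.
        if self._delaybuf is None and not self._divert:
            self._delaybuf = []

    def append(self, data: bytes) -> None:
        if self._divert:
            self.store[self.name + ".a"] += data
        elif self._delaybuf is not None:
            self._delaybuf.append(data)
        else:
            self.store[self.name] = self.store.get(self.name, b"") + data

    def writepending(self) -> bool:
        """Expose in-transaction data via "<name>.a"; may run many times."""
        if self._delaybuf:
            # The pending file is a full copy: existing data + buffered data.
            existing = self.store.get(self.name, b"")
            self.store[self.name + ".a"] = existing + b"".join(self._delaybuf)
            self._delaybuf = None
            self._divert = True
        return self._divert

    def finalize(self) -> None:
        """Publish everything to the real file; meant to run exactly once."""
        if self._divert:
            self.store[self.name] = self.store.pop(self.name + ".a")
        elif self._delaybuf:
            existing = self.store.get(self.name, b"")
            self.store[self.name] = existing + b"".join(self._delaybuf)
        self._delaybuf = None
        self._divert = False
```

If nothing was buffered, `writepending()` creates no `.a` file and returns False.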
Mads Kiilerich	9a3561b211	changelog: use headrevsfiltered 5d1adb6683fa introduced use of the new filtering headrevs C implementation. It caught TypeError to detect when to fall back to the implementation that was compatible with old extensions. That method was however not reliable. Instead, use the new headrevsfiltered function when passing a filter. It will reliably fail with AttributeError when an old extension that predates headrevsfiltered is used.	2014-10-26 12:14:12 +01:00
Pierre-Yves David	37d7d2958f	repoview: add a FilteredLookupError class This exception is a more precise LookupError that will allow us to issue a special message when we end up accessing a filtered revision.	2014-10-16 02:05:06 -07:00
Pierre-Yves David	ea3e835124	repoview: add a FilteredIndexError class This exception is a more precise IndexError that will allow us to issue a special message when we end up accessing a filtered revision.	2014-10-15 17:02:44 -07:00
Durham Goode	1b59b1e4c0	obsolete: use C code for headrevs calculation Previously, if there were filtered revs the repository could not use the C fast path for computing the head revs in the changelog. This slowed down many operations in large repositories. This adds the ability to filter revs to the C fast path. This speeds up histedit on repositories with filtered revs by 30% (13s to 9s). This could be improved further by sorting the filtered revs and walking the sorted list while we walk the changelog, but even this initial version that just calls __contains__ is still massively faster. The new C api is compatible for both new and old python clients, and the new python client can call both new and old C apis.	2014-09-16 16:03:21 -07:00
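The sketch below computes head revisions while respecting a set of filtered revisions. It uses the straightforward form the message mentions: one membership test per revision rather than a walk over a sorted filtered list.

`head_revs` is a hypothetical name, and `parents[rev]` is assumed to be a `(p1, p2)` tuple with `-1` for no parent.

```python
def head_revs(parents, filteredrevs=frozenset()):
    """Visible revisions that no visible revision lists as a parent."""
    n = len(parents)
    is_head = [rev not in filteredrevs for rev in range(n)]
    for rev in range(n):
        if rev in filteredrevs:
            continue  # a hidden child must not stop its parent being a head
        for p in parents[rev]:
            if p >= 0:
                is_head[p] = False
    return [rev for rev in range(n) if is_head[rev]]
```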
Pierre-Yves David	0203860e23	changelog: ensure changelog._delaybuf is initialized The ``localrepo.writepending`` method uses the ``changelog._delaybuf`` attribute to know if it has anything to do. However, ``changelog._delaybuf`` is never initialised at ``__init__`` time. This can lead to a crash when using bundle2 for parts that never touch the changelog. We simply initialize it to its base value. This is scheduled for stable as it is both trivial and blocking for experimenting with bundle2.	2014-05-20 13:55:08 -07:00
Brodie Rao	a446720e09	branchmap: cache open/closed branch head information This lets us determine the open/closed state of a branch without reading from the changelog (which can be costly over NFS and/or with many branches).	2013-09-16 01:08:29 -07:00
FUJIWARA Katsunori	61a538d89f	changelog: use "vfs.fstat()" instead of "util.fstat()" Just invoking "os.fstat()" with "file.fileno()" doesn't require the non-ANSI file API, because the filename is not used when invoking "os.fstat()". But, for portability, "util.fstat()" has to invoke "os.stat()" with "fp.name" if the file object doesn't have a "fileno()" method, and "fp.name" may cause invocation of the non-ANSI file API. So this patch makes the constructor of the appender class invoke "util.fstat()" via vfs, to encapsulate filename handling.	2013-10-15 00:51:04 +09:00
FUJIWARA Katsunori	6d7cc2f5e0	changelog: use "vfs.rename()" instead of "util.rename()"	2013-10-15 00:51:04 +09:00
Augie Fackler	f03df466e2	changelog: hexlify node when throwing a LookupError on a filtered node The non-hexlified node was leaking all the way out to the web interface, and wasn't consistent with the behavior for nonexistent nodes.	2013-02-09 06:07:32 -06:00
Mads Kiilerich	c506eac122	tests: fix doctest stability over Python versions pprint ain't pretty in Python 2.4: Changed in version 2.5: Dictionaries are sorted by key before the display is computed; before 2.5, a dictionary was sorted only if its display required more than one line, although that wasn’t documented. Fixes issue introduced in 06396987f8e8.	2013-01-15 18:42:04 +01:00
Mads Kiilerich	403c97887d	tests: stabilize doctest output Avoid dependencies to dict iteration order.	2013-01-15 02:59:14 +01:00
Mads Kiilerich	d8671137d4	changelog: please check-code and remove tabs Tabs were introduced in 972c32f23691.	2013-01-12 16:04:29 +01:00
Pierre-Yves David	a99d835522	changelog: add a `branch` method, bypassing changectx The only way to access the branch of a changeset is currently to create a changectx object and access its `branch()` method. Creating a new Python object is costly and has a huge impact on code doing heavy access to `branch()` (like branchmap). This change introduces a new method on changelog that allows direct access to the branch of a revision. See the next changeset for impact.	2013-01-10 00:41:40 +01:00
Pierre-Yves David	842d7e9f52	clfilter: use empty frozenset intead of empty tuple This will allow the set operations needed for cache collaboration.	2013-01-02 01:40:06 +01:00
Durham Goode	cce0517fb6	commit: increase perf by avoiding unnecessary filteredrevs check When committing to a repo with lots of history (>400000 changesets), the filteredrevs check (added with 373606589de5) in changelog.py takes a bit of time even if the filteredrevs set is empty. Skipping the check in that case shaves 0.36 seconds off a 2.14 second commit. A 17% gain.	2012-11-16 15:39:12 -08:00
Pierre-Yves David	83fc452dc5	changelog: extract description cleaning logic in a dedicated function The amend logic has a use for it.	2012-10-18 22:04:49 +02:00
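The sketch below is a guess at what such a description-cleaning helper could look like. The name `strip_description` and the exact rules are assumptions, not taken from the commit: strip trailing whitespace on every line, then drop leading and trailing blank lines.

```python
def strip_description(desc: str) -> str:
    """Drop trailing whitespace per line, then outer blank lines."""
    return "\n".join(line.rstrip() for line in desc.splitlines()).strip("\n")
```

Keeping this as a standalone function, rather than inline in commit code, is what lets another caller such as amend reuse the same normalization.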
Pierre-Yves David	7fb7b9e016	clfilter: introduce `filteredrevs` attribute on changelog This changeset allows changelog objects to be "filtered". You can assign a set of revision numbers to the `changelog.filteredrevs` attribute. The changelog will then pretend these revisions do not exist in this repo. A few methods need to be altered to achieve this behavior: tip, __iter__, irevs, hasnode, headrevs. For consistency and to help debugging, the following methods are altered too: rev, node, linkrev, parentrevs, flags. Tests tend to show it's not necessary to alter them, but having them raise proper exceptions helps to detect bad access to filtered revisions. The following methods would also need alteration for consistency purposes, but this is non-trivial and not done yet: nodemap, strip. The C version of headrevs is not run if there is any revision to filter. It'll need a proper rewrite later to restore performance.	2012-09-20 19:02:47 +02:00
Mads Kiilerich	2f4504e446	fix trivial spelling errors	2012-08-15 22:38:42 +02:00


131 Commits