Commit Graph

12933 Commits

Author SHA1 Message Date
Pierre-Yves David
c05e3eea5d setdiscovery: drop the 'always' argument to '_updatesample'
This argument exists because of the complex code flow in '_takequicksample'. It
first gets the list of heads and then calls '_updatesample' on an empty initial
sample and a size limit matching the differences between the number of heads and
the target sample size. Finally the heads and the sample from '_updatesample'
were added. To ensure this addition result had the exact target length, the code
had to ensure no elements from the heads were added to the '_updatesample'
content and therefore was passing this "always included set of heads".

Instead we can just update the initial heads sample directly and use the final
target size as target size for the update.

This removes the need for this 'always' parameter to the '_updatesample' function

The test are affected because different set building order results in different
random sampling.
2015-01-07 10:32:17 -08:00
Pierre-Yves David
6ff053fa11 setdiscovery: always add exponential sample to the heads
As explained in a previous changeset, prioritizing heads too much behaves
pathologically when there are more heads than the sample size. To counter this,
we always inject exponential samples before reducing to the sample size limit.

This already show some benefit in the test themselves, but on a real-world example
this moves my discovery for push to pathologically headed repo from 45 rounds to
17 of them.

We should maybe ensure that at least 25% of the result sample is heads, but I
think the random sampling will be fine in practice.
2015-01-07 17:28:51 -08:00
Pierre-Yves David
60a9cd0334 setdiscovery: directly run '_updatesample'
The heads and exponential sample are going to end up in the same set
before any extra processing happens. We simplify the code by directly
updating a set with heads.

Changes in the order the set is built lead to small changes in the random
sampling output. But after double checking, I can confirm the input data to
the random sampling is consistent.
2015-01-07 17:23:21 -08:00
Pierre-Yves David
252ba1a3c3 setdiscovery: stop using '_setupsample' in '_takefullsample'
Very few of the return values of '_setupsample' remain in use, so we
directly retrieve the value we care about and drop the '_setupsample'
call.
2015-01-07 17:17:56 -08:00
Pierre-Yves David
e3605ecf1f setdiscovery: randomly pick between heads and sample when taking full sample
Before this changeset, the discovery protocol was too heads-centric. Heads of the
undiscovered set were always sent for discovery and any room remaining in the
sample were filled with exponential samples (and random ones if any room
remained).

This behaved extremely poorly when the number of heads exceeded the sample size,
because we keep just asking about the existence of heads, then their direct parent
and so on. As a result, the 'O(log(len(repo)))' discovery turns into a
'O(len(repo))' one. As a solution we take a random sample of the heads plus
exponential samples. This way we ensure some exponential sampling is achieved,
bringing back some logarithmic convergence of the discovery again.

This patch only applies this principle in one place. More places will be updated
in future patches.

One test is impacted because the random sample happen to be different. By
chance, it helps a bit in this case.
2015-01-07 12:09:51 -08:00
Pierre-Yves David
6141054495 setdiscovery: document the '_updatesample' function
This function is central in the sample building process, having it documented
help code readability a lot.
2015-01-06 17:02:32 -08:00
Pierre-Yves David
2fde36047b setdiscovery: avoid calling any sample building if the undecided set is small
If the length of undecided is smaller than the sample size, we can just request
information for all of them.

This conditional was previously handled by '_setupsample'. But '_setupsample' is
in my opinion a problematic function with blurry semantics. Having this
conditional explicitly earlier makes the code more explicit and moves us closer
to removing this '_setupsample' function.
2015-01-06 16:40:33 -08:00
Pierre-Yves David
ef881538c4 setdiscovery: delay sample building calls to gather them in a single place
Some of the logic around sample building is duplicated in the sample builders,
it would clean up thing to extract it in the top function, but this requires
all codes to be in the same place.

This changeset mostly exists to make the next one more clear.
2015-01-07 09:30:06 -08:00
Pierre-Yves David
ebee9c1c62 setdiscovery: drop unused 'initial' argument for '_takequicksample'
There is a single call site, and it is always using 'initial=True'. So we just drop
the argument and the associated condition.
2015-01-06 16:32:23 -08:00
Matt Mackall
efd707b6d7 readmarkers: add a SHA256 fixme note 2015-01-11 16:46:13 -06:00
Matt Mackall
1b1c572cac readmarkers: fast-path single successors and parents
This gives about a 5% performance bump.
2015-01-11 16:37:57 -06:00
Matt Mackall
aaefce821d readmarkers: promote global constants to locals for performance 2015-01-11 15:35:09 -06:00
Matt Mackall
5a2d46743d readmarkers: drop a temporary 2015-01-11 14:52:57 -06:00
Matt Mackall
f70b8899f1 readmarkers: read node reading into node length conditional
This removes some conditional assignments
2015-01-11 14:51:49 -06:00
Matt Mackall
fd24385ccc readmarkers: drop a temporary
Two other temporaries are renamed to fit line-length.
2015-01-11 14:46:55 -06:00
Matt Mackall
54920da7a8 readmarkers: hoist subtraction out of loop comparison 2015-01-11 14:44:57 -06:00
Matt Mackall
ccb9e25cca readmarkers: streamline offset tracking
This minimizes the number of assignments and operations needed to use offsets.
2015-01-11 14:43:31 -06:00
Matt Mackall
4cb9887cb8 readmarkers: use unpacker for fixed header 2015-01-11 14:37:50 -06:00
Matt Mackall
66bb59fd8e readmarkers: drop metadata temporary 2015-01-11 14:35:03 -06:00
Matt Mackall
eeecc7d717 readmarkers: drop date temporary 2015-01-11 14:33:49 -06:00
Matt Mackall
46c0e8872c readmarkers: drop another conditional 2015-01-11 14:32:56 -06:00
Matt Mackall
b42ed84f7a readmarkers: drop a conditional 2015-01-10 21:28:15 -06:00
Matt Mackall
aebd9ab379 readmarkers: add some whitespace 2015-01-10 21:27:29 -06:00
Matt Mackall
00030ac687 readmarkers: combine parent conditionals 2015-01-10 21:25:07 -06:00
Matt Mackall
57b821b575 readmarkers: drop temporary substring assignments
Assignments are expensive in inner loops
2015-01-10 21:24:45 -06:00
Matt Mackall
2537cb8fbf util: introduce unpacker
This allows taking advantage of Python 2.5+'s struct.Struct, which
provides a slightly faster unpack due to reusing formats. Sadly,
.unpack_from is significantly slower.
2015-01-10 21:18:31 -06:00
Mads Kiilerich
61a36ea4fe revset: use localrepo revbranchcache for branch name filtering
Branch name filtering in revsets was expensive. For every rev it created a
changectx and called .branch() which retrieved the branch name from the
changelog.

Instead, use the revbranchcache.

The revbranchcache is used read-only. The revset implementation with generators
and callbacks makes it hard to figure out when we are done using/updating the
cache and could write it back. It would also be 'tricky' to lock the repo for
writing from within a revset execution. Finally, the branchmap update will
usually make sure that the cache is updated before any revset can be run.
The revbranchcache is used without any locking but is short-lived and used in a
tight loop where we can assume that the changelog doesn't change ... or where
it not is relevant to us if it does.

perfrevset 'branch(mobile)' on mozilla-central.
Before:
! wall 10.989637 comb 10.970000 user 10.940000 sys 0.030000 (best of 3)
After, no cache:
! wall 7.368656 comb 7.370000 user 7.360000 sys 0.010000 (best of 3)
After, with cache:
! wall 0.528098 comb 0.530000 user 0.530000 sys 0.000000 (best of 18)

The performance improvement even without cache come from being based on
branchinfo on the changelog instead of using ctx.branch().

Some tests are added to verify that the revbranchcache works and keep an eye on
when the cache files actually are updated.
2015-01-08 00:01:03 +01:00
Mads Kiilerich
835157e77d branchmap: use revbranchcache when updating branch map
The revbranchcache is read on demand before it will be used for updating the
branch map. It is written back when the branchmap is written and it will thus
use the same locking as branchmap. The revbranchcache instance is short-lived;
it is only stored in the branchmap from .update() is invoked and until .write()
is invoked. Branchmap already assume that the repo is locked in that case.

The use of revbranchcache for branch map updates will make sure that the
revbranchcache "always" is kept up-to-date.

The perfbranchmap benchmark is somewhat bogus, especially when we can see that
the caching makes a significant difference between the realistic case of a
first run and the rare case of rerunning it with a full cache. Here are some
'base' numbers on mozilla-central:
Before:
! wall 6.912745 comb 6.910000 user 6.840000 sys 0.070000 (best of 3)
After - initial, cache is empty:
! wall 7.792569 comb 7.790000 user 7.720000 sys 0.070000 (best of 3)
After - cache is full:
! wall 0.879688 comb 0.880000 user 0.870000 sys 0.010000 (best of 4)

The overhead when running with empty cache comes from checking, missing and
updating it every time.

Most of the performance improvement comes from not having to extract the branch
info from the changelog. The last doubling of performance comes from no longer
having to convert all branch names to local encoding but reuse the few already
converted branch names.

On the hg repo:
Before:
! wall 0.715703 comb 0.710000 user 0.710000 sys 0.000000 (best of 14)
After:
! wall 0.105489 comb 0.110000 user 0.110000 sys 0.000000 (best of 87)
2015-01-08 00:01:03 +01:00
Mads Kiilerich
1b3892318f branchcache: introduce revbranchcache for caching of revision branch names
It is expensive to retrieve the branch name of a revision. Very expensive when
creating a changectx and calling .branch() every time - slightly less when
using changelog.branchinfo().

Now, to speed things up, provide a way to cache the results on disk in an
efficient format. Each branchname is assigned a number, and for each revision
we store the number of the corresponding branch name. The branch names are
stored in a dedicated file which is strictly append only.

Branch names are usually reused across several revisions, and the total list of
branch names will thus be so small that it is feasible to read the whole set of
names before using the cache. It will however do that it might be more
efficient to use the changelog for retrieving the branch info for a single
revision.

The revision entries are stored in another file. This file is usually append
only, but if the repository has been modified, the file will be truncated and
the relevant parts rewritten on demand.

The entries for each revision are 8 bytes each, and the whole revision file
will thus be 1/8 of 00changelog.i.

Each revision entry contains the first 4 bytes of the corresponding node hash.
This is used as a check sum that always is verified before the entry is used.
That check is relatively expensive but it makes sure history modification is
detected and handled correctly. It will also detect and handle most revision
file corruptions.

This is just a cache. A new format can always be introduced if other
requirements or ideas make that seem like a good idea. Rebuilding the cache is
not really more expensive than it was to run for example 'hg log -b branchname'
before this cache was introduced.

This new method is still unused but promise to make some operations several
times faster once it actually is used.

Abandoning Python 2.4 would make it possible to implement this more efficiently
by using struct classes and pack_into. The Python code could probably also be
micro optimized or it could be implemented very efficiently in C where it would
be easy to control the data access.
2015-01-08 00:01:03 +01:00
Anton Shestakov
6fe7d43de3 hgweb: move archive entries outside of <li> in monoblue style
archiveentry already includes surrounding <li></li>, so putting archive entries
inside <li> element produced incorrect markup.
2015-01-09 22:53:38 +08:00
Anton Shestakov
8bdbccf3bf hgweb: add searchhint to templates/coal/map
coal style uses every template (except header.tmpl) directly from paper style,
but doesn't use paper/map file. Elements defined in such map files are used in
templates as you would expect. For example, paper/search.tmpl contains
'{searchhint}' and template engine replaces that with the actual hint. But when
coal style reuses paper/search.tmpl, it needs to define searchhint in its map
file as well, or template engine will not find it. So let's copy it from
paper/map to coal/map.

Before this change, if the coal style was selected, the hint for the search
field in page header was present, but it was completely empty. Although the
absence of searchhint in coal/map produced no error.
2015-01-09 15:24:55 +08:00
Martin von Zweigbergk
01d503fc7e status: don't override _buildstatus() in workingcommitctx
Now that the caching into _status is done in
workingctx._dirstatestatus(), which workingcommitctx._dirstatestatus()
does not call, there is no caching to prevent in _buildstatus(), so
stop overriding it.
2015-01-08 13:29:06 -08:00
Martin von Zweigbergk
370c0e4b47 status: cache dirstate status in _dirstatestatus()
Since it's only the dirstate status we cache, it makes more sense to
cache it in the _dirstatestatus() method. Note that this change means
the dirstate status will also be cached when status is requested
between the working copy and some other revision, while we currently
only cache the result if exactly the status between the working copy
and its parent is requested.
2015-01-08 13:12:44 -08:00
Sean Farley
28f3ddd179 localrepo: add ignoremissing parameter to branchtip
Previously, in the namespaces api, the only caller of branchtip was singlenode
which happened to raise the same exception that branchtip raised: KeyError.

This is a minor change but will allow upcoming patches to use repo.branchtip to
not raise an exception if a branch doesn't exist. After that, it will be
possible for extensions to use the namespace api in a stable way.
2014-10-16 21:49:28 -07:00
Sean Farley
6bc9243ecb namespaces: add method to return a list of nodes for a given name
This is a helpful method that some extensions can make use of (e.g. for custom
revsets); currently not used in core.
2014-12-15 14:46:04 -08:00
Sean Farley
5c1d1bb100 log: use new namespaces api to display names
The only caveat here is that branches must be displayed first due to backwards
compatibility. The order of namespaces is defined to be the 'update' order
which, unfortunately, is not the same as log output order.

It's worth mentioning that the log output is still translated the same as
before since we are formating our strings the same way:

  # i18n: column positioning for "hg log"
  _("bookmark:    %s\n") % bookmark

becomes

  tname = _(("%s:" % ns.templatename).ljust(13) + "%s\n") % name

when name == 'bookmark'. The ljust(13) keeps the strings and whitespace equal.
Adding a new namespace is even easier now because the log output code doesn't
need to change. A future programmer would just need to add the string to the
corresponding .po file (which is the same as they would have had to do
previously).
2014-10-17 09:26:37 -07:00
Durham Goode
d73818aad4 filectx: fix annotate to not directly instantiate filectx
b04f57726c73 changed basefilectx.annotate() to directly instantiate new
filectx's instead of going through self.filectx(), this breaks extensions that
replace the filectx class, and would also break future uses that would need
memfilectx's.
2015-01-09 11:21:29 -08:00
Sean Farley
2a4b30c27c revset: use '%' as an operator for 'only'
With this patch, we can make it much easier to specify 'only(A,B)' ->
A%B. Similarly, 'only(A)' -> A%.

On Windows, '%' is a semi-reserved symbol in the following way: using non-bash
shells (e.g. cmd.exe but NOT PowerShell, ConEmu, and cmder), %var% is only
expanded when 'var' exists and is surrounded by '%'.

That only leaves batch scripts which could prove to be problematic. I posit
that this isn't a big issue because any developer of batch scripts already
knows that to use '%' one needs to escape it by using a double '%%'.

Alternatives to '%' could be '=' but that might be limiting our future if we
ever decide to use temporary assignments in a revset.
2014-11-06 14:55:18 -08:00
Gregory Szorc
433ea5a1b2 transaction: support for callbacks during abort
Previous transaction work added callbacks to be called during regular
transaction commit/close. As part of refactoring Mozilla's pushlog
extension (an extension that opens a SQLite database and tries to tie
its transaction semantics to Mercurial's transaction), I discovered that
the new transaction APIs were insufficient to avoid monkeypatching
transaction instance internals. Adding a callback that is called during
transaction abort removes the necessity for monkeypatching and completes
the API.
2015-01-06 21:56:33 -08:00
Sean Farley
6b1d107b6c debugnamecomplete: use new name api
Instead of hardcoding a list of places to check, we use the new repo.names api
to get a list of potential names to complete.
2014-12-15 14:11:19 -08:00
Sean Farley
dc331facee debugnamecomplete: rename from debuglabelcomplete
Now that we have decided on the use of 'name' instead of 'label' we rename this
function accordingly.

The old method 'debuglabelcomplete' has been left as a deprecated command so
that current scripts don't break.
2014-10-17 13:41:29 -07:00
Sean Farley
0891487d00 namespaces: add __iter__ and iteritems methods
Iterating over all the namespaces is a common operation, naturally, so we add
those methods now. Since we are using a sorted dictionary, this method just
calls the underlying __iter__ or iteritems method.
2014-12-22 09:07:37 -08:00
Sean Farley
c40032685e namespaces: add 'listnames' property
Currently, we have no way to list all the names in a given namespace. This is
needed for things such as tab completion. Future patches will use this patch
for exactly that purpose.
2014-12-15 14:09:00 -08:00
Augie Fackler
264b6aaf72 manifest: drop withflags() method, which is now unused 2015-01-07 15:55:02 -05:00
Augie Fackler
b539edc70e context: use new manifest.diff(clean=True) support
This further simplifies the status code.

This simplification comes at a slight performance cost for `hg
export`. Before, on mozilla-central:

perfmanifest tip
! wall 0.265977 comb 0.260000 user 0.240000 sys 0.020000 (best of 38)
perftags
! result: 162
! wall 0.007172 comb 0.010000 user 0.000000 sys 0.010000 (best of 403)
perfstatus
! wall 0.422302 comb 0.420000 user 0.260000 sys 0.160000 (best of 24)
hgperf export tip
! wall 0.148706 comb 0.150000 user 0.150000 sys 0.000000 (best of 65)

after, same repo:
perfmanifest tip
! wall 0.267143 comb 0.270000 user 0.250000 sys 0.020000 (best of 37)
perftags
! result: 162
! wall 0.006943 comb 0.010000 user 0.000000 sys 0.010000 (best of 397)
perfstatus
! wall 0.411198 comb 0.410000 user 0.260000 sys 0.150000 (best of 24)
hgperf export tip
! wall 0.173229 comb 0.170000 user 0.170000 sys 0.000000 (best of 55)

The next set of patches introduces a new manifest type implemented
almost entirely in C, and more than makes up for the performance hit
incurred in this change.
2014-12-15 16:06:04 -05:00
Augie Fackler
78c54eb4c7 manifest: add optional recording of clean entries to diff
This makes manifest slightly easier to use for status code.
2014-12-15 16:04:28 -05:00
Augie Fackler
509875a2fe context: use manifest.diff() to compute most of status
We can do a little tiny bit better by enhancing manifest.diff to
optionally include files that are in both sides. This will be done in
a followup patch.
2014-12-15 15:33:55 -05:00
Martin von Zweigbergk
f60106670d trydiff: replace dodiff=True/'binary' by binarydiff=False/True 2015-01-07 11:02:10 -08:00
Martin von Zweigbergk
4a8198a6db trydiff: replace 'dodiff = False' by 'continue'
The 'dodiff' variable is initialized to True and may later be set to
either False or "binary". When it's set to False, we skip everything
after that point, so we can simplify by instead continue-ing (the
loop). We can then also drop the 'if dodiff', since it will always be
true.
2015-01-07 10:59:40 -08:00
Martin von Zweigbergk
86561cd4f0 trydiff: make addindexmeta() unconditionally add index meta
The conditional-ness is not clear from the name and there is only one
caller, so it's clearer to check on the call site. Moving it also
makes addindexmeta() no longer close on the 'opts' variable.
2015-01-07 08:54:26 -08:00