Commit Graph

729 Commits

Author SHA1 Message Date
Denis Laxalde
761577866a context: follow all branches in blockdescendants()
In the initial implementation of blockdescendants (and thus followlines(...,
descend=True) revset), only the first branch encountered in descending
direction was followed.

Update the algorithm so that all children of a revision ('x' in code) are
considered. Accordingly, we need to prevent a child revision from being yielded
multiple times when it is visited through different paths, so we skip 'i'
when this occurs. Finally, since we now consider all parents of a possible
child touching a given line range, we take care of yielding the child if it
has a diff in the specified line range with at least one of its parents (same
logic as blockancestors()).
2017-04-14 08:55:18 +02:00
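The "follow all branches, but yield each child once" idea from 761577866a can be sketched on a plain graph. This is only an illustration under stated assumptions: `descendants_all_branches` and the `children` mapping are hypothetical names, and the line-range check that blockdescendants() performs is left out.

```python
from collections import deque


def descendants_all_branches(start, children):
    """Yield every descendant of `start` exactly once.

    `children` maps a revision to the list of its child revisions
    (a hypothetical stand-in for the real revision graph).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        rev = queue.popleft()
        for child in children.get(rev, []):
            if child in seen:
                # Already reached through another path: skip it.
                continue
            seen.add(child)
            queue.append(child)
            yield child


# Two branches (1 and 2) fork from 0 and are merged again in 3.
graph = {0: [1, 2], 1: [3], 2: [3]}
result = list(descendants_all_branches(0, graph))
```

Revision 3 is reachable through both branches, yet it appears only once in `result`.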
Denis Laxalde
d7409a0458 context: add a blockdescendants function
This is symmetrical with blockancestors() and yields descendants of a filectx
with changes in the given line range. The noticeable difference is that the
algorithm does not follow renames (probably because filelog.descendants() does
not), so we are missing branches with renames.
2017-04-10 15:11:36 +02:00
Jun Wu
e06554212d metadataonlyctx: replace "changeset()[0]" to "manifestnode()"
As Yuya pointed out [1], "changeset()[0]" could be simplified to
"manifestnode()". I didn't notice that method earlier. It should definitely
be used - it's easier to read, and faster.

[1]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095716.html
2017-04-07 11:02:43 -07:00
Jun Wu
faefa92b13 metadataonlyctx: speed up sanity check
Previously the sanity check would construct manifestctx for both p1 and p2.
But it only needs the "manifest node" information, which could be read from
changelog directly.
2017-03-26 12:26:35 -07:00
FUJIWARA Katsunori
aaa8db9cef misc: update descriptions about removed file for filectxfn
Since 2eef89bfd70d, filectxfn for memctx should return None for
removed file instead of raising IOError.
2017-03-24 22:13:23 +09:00
Gregory Szorc
5ca0f908bf py3: add __bool__ to every class defining __nonzero__
__nonzero__ was renamed to __bool__ in Python 3. This patch simply
aliases __bool__ to __nonzero__ for every class implementing
__nonzero__.
2017-03-13 12:40:14 -07:00
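The aliasing described in 5ca0f908bf amounts to one extra line per class. A minimal sketch, assuming a hypothetical `RevisionSet` class that is not part of the code base:

```python
class RevisionSet(object):
    """Hypothetical container used only for illustration."""

    def __init__(self, revs):
        self._revs = list(revs)

    def __nonzero__(self):
        # Python 2 truth-value hook.
        return len(self._revs) > 0

    # Python 3 looks up __bool__ instead, so alias it to the same method.
    __bool__ = __nonzero__


empty = RevisionSet([])
full = RevisionSet([1, 2])
```

With the alias in place, `if empty:` behaves the same way on both Python versions.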
Pierre-Yves David
8561a8e8ff context: simplify call to icase matcher in 'match()'
The two functions take the very same arguments. We make this clearer and less
error prone by dispatching on the function only and having a single call point
in the code.
2017-03-15 15:38:02 -07:00
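The "dispatch on the function only" pattern from 8561a8e8ff can be sketched as follows. The matcher factories here are hypothetical stand-ins that merely share a signature; they are not the real matcher API.

```python
def plain_matcher(root, patterns):
    """Hypothetical case-sensitive matcher factory."""
    return lambda path: path in patterns


def icase_matcher(root, patterns):
    """Hypothetical case-insensitive matcher factory."""
    lowered = {p.lower() for p in patterns}
    return lambda path: path.lower() in lowered


def make_matcher(root, patterns, icase=False):
    # Choose the callable first...
    factory = icase_matcher if icase else plain_matcher
    # ...then call it from a single place, so the argument list is written once.
    return factory(root, patterns)
```

Because the arguments appear in only one call, adding or reordering a parameter cannot leave the two branches out of sync.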
Pierre-Yves David
b9d210d680 context: explicitly tests for None
Changeset c832083e5671 removed the mutable default value, but did not explicitly
test for None. Such implicit testing can introduce semantic and performance
issues. We move to an explicit test for None as recommended by PEP8:

https://www.python.org/dev/peps/pep-0008/#programming-recommendations
2017-03-15 15:33:24 -07:00
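The semantic difference that b9d210d680 guards against shows up when a caller passes an empty container. A small sketch with hypothetical function names:

```python
def collect_implicit(items=None):
    # Truthiness test: an empty list passed by the caller is also replaced.
    if not items:
        items = []
    items.append("x")
    return items


def collect_explicit(items=None):
    # Identity test: only a missing argument triggers the default.
    if items is None:
        items = []
    items.append("x")
    return items


shared_implicit = []
collect_implicit(shared_implicit)

shared_explicit = []
collect_explicit(shared_explicit)
```

After these calls `shared_implicit` is still empty because the function silently worked on a fresh list, while `shared_explicit` received the new element.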
Gregory Szorc
b1f8fb51cd context: don't use mutable default argument value
Mutable default argument values are a Python gotcha and can
represent subtle, hard-to-find bugs. Let's rid our code base
of them.
2017-03-12 21:50:42 -07:00
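The gotcha named in b1f8fb51cd is that a default value is evaluated once, when the function is defined. A short sketch with hypothetical function names:

```python
def append_buggy(value, bucket=[]):
    # The same list object is reused by every call that omits `bucket`.
    bucket.append(value)
    return bucket


def append_fixed(value, bucket=None):
    # A new list is created on each call that omits `bucket`.
    if bucket is None:
        bucket = []
    bucket.append(value)
    return bucket


first = append_buggy(1)
second = append_buggy(2)
```

Here `first` and `second` are the same list, so the second call sees the value left behind by the first.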
Augie Fackler
d214daa434 context: use portable construction to verify int parsing 2017-03-12 00:43:47 -05:00
Augie Fackler
d44c41fe19 context: implement both __bytes__ and __str__ for Python 3
They're very similar, for obvious reasons.
2017-03-11 20:57:40 -05:00
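Why the two methods in d44c41fe19 end up "very similar" can be seen in a sketch. The `ShortNode` class is hypothetical and the ASCII decoding is an assumption made for the example:

```python
class ShortNode(object):
    """Hypothetical object holding a short hexadecimal identifier."""

    def __init__(self, hexid):
        self._hexid = hexid  # stored as bytes

    def __bytes__(self):
        # Used by bytes(obj) on Python 3.
        return self._hexid

    def __str__(self):
        # Python 3 expects text here, so decode the same value.
        return self._hexid.decode("ascii")


node = ShortNode(b"d44c41fe19")
```

Both methods expose the same underlying value, once as bytes and once as text.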
Augie Fackler
89600a72c4 context: work around long not existing on Python 3
I can't figure out what this branch is even trying to accomplish, and
it was introduced in 387a3aa50d61 which doesn't really shed any
light on why longs are treated differently from ints.
2017-03-11 20:57:04 -05:00
Mads Kiilerich
a936a7f3a7 vfs: use repo.wvfs.unlinkpath 2015-01-14 01:15:26 +01:00
Durham Goode
2d56462702 context: remove uses of manifest.matches
This removes the uses of manifest.matches in context.py in favor of the new
manifest.diff(match) api. This is part of removing manifest.matches since it is
O(manifest).

To drop the dependency on ctx._manifestmatches(s) we transfer responsibility for
creating status oriented manifests over to ctx._buildstatusmanifest(s). This
already existed for workingctx, we just need to implement a simple version for
basectx. The old _manifestmatches functionality is basically identical to the
_buildstatusmanifest functionality (minus the matching part), so no behavior
should be lost.
2017-03-07 17:52:45 -08:00
Durham Goode
16ae1e05bb context: remove assumptions about manifest creation during _buildstatus
Previously we called self.manifest() in some cases to preload the
first manifest. This relied on the assumption that the later
_manifestmatches() call did not duplicate any work the original
self.manifest() call did. Let's remove that assumption, since it bit
me during my refactors of this area and is easy to remove.
2017-03-07 17:49:50 -08:00
Durham Goode
d0909f9a09 context: move _manifest from committablectx to workingctx
committablectx had a _manifest implementation that was only used by the derived
workingctx class. The other derived versions, like memctx and metadataonlyctx,
define their own _manifest functions.

Let's move the function down to workingctx, and let's break it into two parts,
the _manifest part that reads from self._status, and the part that actually
builds the new manifest. This separation will let us reuse the builder code in a
future patch to answer _buildstatus with varying status inputs, since workingctx
has special behavior for _buildstatus that the other ctx's don't have.
2017-03-07 17:56:30 -08:00
Durham Goode
31c97f4a1c status: handle more node indicators in buildstatus
There are several different node markers that indicate different working copy
states. The context._buildstatus function was only handling one of them, and
this patch makes it handle all of them (falling back to file content comparisons
when in one of these states).

This affects a future patch where we get rid of context._manifestmatches as part
of getting rid of manifest.matches(). context._manifestmatches is currently
hacky in that it uses the newnodeid for all added and modified files, which is
why the current newnodeid check is sufficient. When we get rid of this function
and use the normal manifest.diff function, we start to see the other indicators
in the nodes, so they need to be handled or else the tests fail.
2017-03-07 09:56:11 -08:00
Denis Laxalde
ca5e4eec65 context: also return ancestor's line range in blockancestors 2017-01-16 17:14:36 +01:00
Denis Laxalde
d3bcba7d25 context: add a followfirst flag to blockancestors 2017-01-16 17:08:25 +01:00
Denis Laxalde
098c0d5368 context: extract _changesinrange() out of blockancestors()
We'll need it to write a blockdescendants function in next changeset.
2017-01-16 09:22:32 +01:00
Remi Chaintron
6d11b9177b revlog: add 'raw' argument to revision and _addrevision
This patch introduces a new 'raw' argument (defaults to False) to revlog's
revision() and _addrevision() methods.
When the 'raw' argument is set to True, it indicates the revision data should be
handled as raw data by the flagprocessor.

Note: Given revlog.addgroup() calls are restricted to changegroup generation, we
can always set raw to True when calling revlog._addrevision() from
revlog.addgroup().
2017-01-05 17:16:07 +00:00
Denis Laxalde
7092fa95d8 context: add a blockancestors(fctx, fromline, toline) function
This yields ancestors of `fctx` by only keeping changesets touching the file
within specified linerange = (fromline, toline).

Matching revisions are found by inspecting the result of `mdiff.allblocks()`,
filtered by `mdiff.blocksinrange()`, to find out if there are blocks of type
"!" within specified line range.

If, at some iteration, an ancestor with an empty line range is encountered,
the algorithm stops as it means that the considered block of lines actually
has been introduced in the revision of this iteration. Otherwise, we finally
yield the initial revision of the file as the block originates from it.

When a merge changeset is encountered during ancestors lookup, we consider
there's a diff in the current line range as long as there is a diff between
the merge changeset and at least one of its parents (in the current line
range).
2016-12-28 23:03:37 +01:00
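The core test in 7092fa95d8, "does this revision change anything inside the line range", can be approximated with the standard library. This sketch uses `difflib` rather than `mdiff.allblocks()` and `mdiff.blocksinrange()`, and `changes_in_range` is a hypothetical helper. It does not track how the range shifts from one ancestor to the next.

```python
import difflib


def changes_in_range(old_lines, new_lines, fromline, toline):
    """Return True if the diff from old to new touches new_lines[fromline:toline]."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    for tag, _a1, _a2, b1, b2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # A pure deletion has b1 == b2; treat it as touching position b1.
        if b1 < toline and max(b2, b1 + 1) > fromline:
            return True
    return False


old = ["a\n", "b\n", "c\n", "d\n"]
new = ["a\n", "B\n", "c\n", "d\n"]
```

Only the second line differs between `old` and `new`, so a range covering it reports a change while ranges on either side do not.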
Jun Wu
76875fcc1b context: correct metadataonlyctx's parameter
It's "originalctx", not "path" as Yuya pointed out in [1].

[1]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-December/091508.html
2016-12-16 21:02:39 +00:00
Mateusz Kwapich
3de25e93ce memctx: allow the metadataonlyctx thats reusing the manifest node
When we have a lot of files, writing a new manifest revision can be expensive.
This commit adds a possibility for memctx to reuse a manifest from a different
commit. This can be beneficial for commands that are creating metadata changes
without any actual files changed like "hg metaedit" in evolve extension.

I will send the change for evolve that leverages this once this is accepted.
2016-11-21 08:09:41 -08:00
Durham Goode
fb55c2fbf3 dirstate: change added/modified placeholder hash length to 20 bytes
Previously the added/modified placeholder hash for manifests generated from the
dirstate was a 21-byte string consisting of the p1 file hash plus a single
character to indicate an add or a modify. Normal hashes are only 20 bytes long.
This makes it complicated to implement more efficient manifest implementations
which rely on the hashes being fixed length.

Let's change this hash to just be 20 bytes long, and rely on the astronomical
improbability of an actual hash being these 20 bytes (just like we rely on no
hash ever being the nullid).

This changes the possible behavior slightly in that the hash for all
added/modified entries in the dirstate manifest will now be the same (so simple
node comparisons would say they are equal), but we should never be doing simple
node comparisons on these nodes even with the old hashes, because they did not
accurately represent the content (i.e. two files based off the same p1 file
node, with different working copy contents would have the same hash (even with
the appended character) in the old scheme too, so we couldn't depend on the
hashes period).
2016-11-10 02:19:16 -08:00
Durham Goode
03d313b5fd dirstate: change placeholder hash length to 20 bytes
Previously the new-node placeholder hash for manifests generated from the
dirstate was a 21-byte string of "!" characters. Normal hashes are only 20
bytes long.  This makes it complicated to implement more efficient manifest
implementations which rely on the hashes being fixed length.

Let's change this hash to just be 20 bytes long, and rely on the astronomical
improbability of an actual hash being 20 "!" bytes in a row (just like we rely
on no hash ever being the nullid).

A future diff will do this for added and modified dirstate markers as well, so
we're putting the new newnodeid in node.py so there's a common place for these
placeholders.
2016-11-10 02:17:22 -08:00
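The fixed-length property that 03d313b5fd restores can be sketched in a few lines. The values follow the description above (20 "!" bytes for the placeholder, 20 zero bytes for the nullid). `NODE_LENGTH` and `check_fixed_length` are hypothetical names introduced for the example.

```python
NODE_LENGTH = 20

nullid = b"\0" * NODE_LENGTH    # the null revision
newnodeid = b"!" * NODE_LENGTH  # placeholder for a file with no committed hash yet


def check_fixed_length(nodes):
    """Return True when every node has the standard hash length."""
    return all(len(node) == NODE_LENGTH for node in nodes)


old_placeholder = b"!" * 21     # the previous, over-long placeholder
```

A manifest implementation that stores hashes in fixed-width slots could hold `newnodeid` but not `old_placeholder`.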
Durham Goode
82152f0a38 context: add manifestctx property on changectx
This allows us to access the manifestctx for a given commit. This will be used
in a later patch to be able to copy the manifestctx when we want to make a new
commit.
2016-11-08 08:03:43 -08:00
Durham Goode
59f421f2ce manifest: remove manifest.find
As part of removing dependencies on manifest, this drops the find function and
fixes up the two existing callers to use the equivalent apis on manifestctx.
2016-11-08 08:03:43 -08:00
Jun Wu
05c17d65e5 adjustlinkrev: remove unnecessary parameters
Since adjustlinkrev has "self", and is a method of a filectx object, it does
not need path, filelog, filenode. They can be fetched from the "self"
easily.
2016-11-01 08:22:50 +00:00
Mads Kiilerich
5d3bf8a78a context: make sure __str__ works, also when there is no _changectx
Before, it could crash when trying to print the wrong kind of object at the
wrong time.
2015-03-19 22:22:50 +01:00
Jun Wu
5ca5bb4a7c annotate: calculate line count correctly
Before this patch, the "lines" function inside "annotate" returned 1 for
empty text (''). This patch makes it return 0, because the function should
match mdiff.splitnewlines (used by mdiff.allblocks), or s.splitlines (used at
the end of the "annotate" method). Both len(mdiff.splitnewlines('')) and
len(''.splitlines(True)) are 0.

This issue was discovered while testing fastannotate [1].

I could not find a test case to reveal this issue. However, in theory this
could reduce memory usage a little bit, and avoid surprises when people
are touching this area in the future.

[1]: https://bitbucket.org/facebook/hg-experimental/commits/525b3b98e93a
2016-10-01 14:18:58 +01:00
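A line counter with the behaviour described in 5ca5bb4a7c can be sketched as below. It is an illustration rather than a copy of the patched code, and it assumes text that uses only "\n" as a line separator.

```python
def lines(text):
    """Count lines by counting newline characters instead of splitting."""
    if text.endswith("\n"):
        return text.count("\n")
    # No trailing newline: the last line is unterminated, unless text is empty.
    return text.count("\n") + int(bool(text))


samples = ["", "a", "a\n", "a\nb", "a\nb\n"]
counts = [lines(sample) for sample in samples]
```

For each sample the result agrees with `len(sample.splitlines(True))`, including 0 for the empty string.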
Philippe Pepiot
f0ea56f87f mdiff: remove unused parameter 'refine' from allblocks() 2016-09-27 14:46:34 +02:00
Durham Goode
8365e36e59 manifest: adds manifestctx.readfast
This adds a copy of manifest.readfast to manifestctx.readfast and adds a
consumer of it. It currently looks like duplicate code, but a future patch
causes these functions to diverge as tree concepts are added to the tree
version.
2016-09-13 16:26:30 -07:00
Durham Goode
23d229132c manifest: add manifestctx.readdelta()
This adds an implementation of readdelta to the new manifestctx class and adds a
couple consumers of it. This currently appears to have some duplicate code, but
future patches cause this function to diverge when things like "shallow" are
introduced.
2016-09-13 16:25:21 -07:00
Pierre-Yves David
5bcff70a60 merge with stable 2016-09-14 17:12:39 +02:00
Jun Wu
a5ec43fe40 annotate: pre-calculate the "needed" dictionary (issue5360)
The "needed" dict is used as a reference counter to free items in the giant
"hist" dict. However, currently it is not very accurate and can lead to
dropping "hist" items unnecessarily, for example, with the following DAG,

       -3-
      /   \
  0--1--2--4--

The current algorithm visits and calculates rev 1 twice, which is undesirable.
It also tries to be smart by clearing rev 1's parents: "pcache[1] = []" at the
time hist[1] is accessed (note: hist[1] needs to be used twice, by rev 2
and rev 3). It can result in incorrect results if p1 of rev 4 deletes chunks
belonging to rev 0.

However, simply removing "needed" is not okay, because it will consume 10x
memory:

  # without any change
  % HGRCPATH= lrun ./hg annotate mercurial/commands.py -r d130a38 3>&2 [1]
  MEMORY   49074176
  CPUTIME  9.213
  REALTIME 9.270

  # with "needed" removed
  MEMORY   637673472
  CPUTIME  8.164
  REALTIME 8.249

This patch moves "needed" (and "pcache") calculation to a separate DFS to
address the issue. It improves perf and fixes issue5360 by correctly reusing
hist, while maintaining low memory usage. Some additional attempts have been
made to further reduce memory usage, like changing "pcache[f] = []" to "del
pcache[f]". As a result, the patched code is both faster and uses less memory:

  # with this patch applied
  MEMORY   47575040
  CPUTIME  7.870
  REALTIME 7.926

[1]: lrun is a lightweight sandbox built on Linux cgroup and namespace. It's
     used to measure CPU and memory usage here. Source code is available at
     github.com/quark-zju/lrun.
2016-09-02 15:20:59 +01:00
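The reference-counting scheme from a5ec43fe40 can be sketched on the DAG shown in that message. This is a simplified model under stated assumptions: `count_needed` and `process` are hypothetical names, the `parents` mapping stands in for the file history, and the "result" stored per revision is just its set of ancestors rather than annotate data.

```python
def count_needed(start, parents):
    """First pass: count how many times each revision's result is consumed."""
    needed = {start: 1}
    seen = {start}
    visit = [start]
    while visit:
        rev = visit.pop()
        for p in parents[rev]:
            needed[p] = needed.get(p, 0) + 1
            if p not in seen:
                seen.add(p)
                visit.append(p)
    return needed


def process(start, parents):
    """Second pass: compute each revision once, freeing results when unused."""
    needed = count_needed(start, parents)
    hist = {}
    computed = []  # order in which revisions were computed
    visit = [start]
    while visit:
        rev = visit[-1]
        if rev in hist:
            visit.pop()
            continue
        pending = [p for p in parents[rev] if p not in hist]
        if pending:
            visit.extend(pending)
            continue
        visit.pop()
        result = {rev}
        for p in parents[rev]:
            result |= hist[p]
            needed[p] -= 1
            if needed[p] == 0:
                del hist[p]  # no remaining consumer: release the memory
        hist[rev] = result
        computed.append(rev)
    return hist[start], computed


# The DAG from the message: 0--1--2--4 with a side branch 1--3--4.
dag = {0: [], 1: [0], 2: [1], 3: [1], 4: [2, 3]}
ancestors, order = process(4, dag)
```

Because rev 1 starts with a count of 2, its entry stays in `hist` until both rev 2 and rev 3 have used it, and it is computed only once.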
Denis Laxalde
873ac38beb context: eliminate handling of linenumber being None in annotate
I could not find any use of this parameter value. And it arguably makes
understanding of the function more difficult. Setting the parameter default
value to False.
2016-07-11 14:44:19 +02:00
Matt Mackall
6c6e5bfc8d annotate: optimize line counting
We used len(text.splitlines()) to count lines. This allocates, copies, and
deallocates an object for every line in a file. Instead, we use
count("\n") to count newlines and adjust based on whether there's a
trailing newline.

This improves the speed of annotating localrepo.py from 4.2 to 4.0
seconds.
2016-05-18 16:37:32 -05:00
Matt Harbison
3036220cae verify: don't init subrepo when missing one is referenced (issue5128) (API)
Initializing a subrepo when one doesn't exist is the right thing to do when the
parent is being updated, but in few other cases.  Unfortunately, there isn't
enough context in the subrepo module to distinguish this case.  This same issue
can be caused with other subrepo aware commands, so there is a general issue
here beyond the scope of this fix.

A simpler attempt I tried was to add an '_updating' boolean to localrepo, and
set/clear it around the call to mergemod.update() in hg.updaterepo().  That
mostly worked, but doesn't handle the case where archive will clone the subrepo
if it is missing.  (I vaguely recall that there may be other commands that will
clone if needed like this, but certainly not all do.  It seems both handy, and a
bit surprising for what should be a read only operation.  It might be nice if
all commands did this consistently, but we probably need Angel's subrepo caching
first, to not make a mess of the working directory.)

I originally handled 'Exception' in order to pick up the Aborts raised in
subrepo.state(), but this turns out to be unnecessary because that is called
once and cached by ctx.sub() when iterating the subrepos.

It was suggested in the bug discussion to skip looking at the subrepo links
unless -S is specified.  I don't really like that idea because missing a subrepo
or (less likely, but worse) a corrupt .hgsubstate is a problem of the parent
repo when checking out a revision.  The -S option seems like a better fit for
functionality that would recurse into each subrepo and do a full verification.

Ultimately, the default value for 'allowcreate' should probably be flipped, but
since the default behavior was to allow creation, this is less risky for now.
2016-04-27 22:45:52 -04:00
Durham Goode
8f48c965d8 manifest: change manifestctx to not inherit from manifestdict
If manifestctx inherits from manifestdict, it requires some weird logic to
lazily load the dict if a piece of information is asked for. This ended up being
complicated and unintuitive to use.

Let's move the dict creation to .read(). This will make even more sense once we
start adding readdelta() and other similar methods to manifestctx.
2016-09-12 10:55:43 -07:00
Pierre-Yves David
eb569a3c73 manifest: backed out changeset ec5be4246a05
There is some suspicious failure in evolution tests. This changeset was supposed
to be dropped until we investigate.
2016-09-10 01:41:38 +02:00
Durham Goode
ab661bf355 manifest: change manifestctx to not inherit from manifestdict
If manifestctx inherits from manifestdict, it requires some weird logic to
lazily load the dict if a piece of information is asked for. This ended up being
complicated and unintuitive to use.

Let's move the dict creation to .read(). This will make even more sense once we
start adding readdelta() and other similar methods to manifestctx.
2016-08-31 12:46:53 -07:00
Martin von Zweigbergk
2fbf01764c util: rename checkcase() to fscasesensitive() (API)
I always read the name "checkcase(path)" as "do we need to check for
case folding at this path", but it's actually (I think) meant to be
read "check if the file system cares about case at this path". I'm
clearly not the only one confused by this as the dirstate has this
property:

  def _checkcase(self):
      return not util.checkcase(self._join('.hg'))

Maybe we should even invert the function and call it fscasefolding()
since that's what all callers care about?
2016-08-30 09:22:53 -07:00
Durham Goode
3f8e8b7f90 manifest: change changectx to access manifest via manifestlog
This is the first place where we'll start using manifestctx instances instead of
manifestdict. This will facilitate using different manifestctx implementations
in the future.
2016-08-17 13:25:13 -07:00
Gregory Szorc
91a364e88d context: use changelogrevision
Upcoming patches will make the changelogrevision object perform
lazy parsing. Let's switch to it.

Because we're switching from a tuple to an object, everything that
accesses the internal cached attribute needs to be updated to access
via attributes. A nice side-effect is this makes the code easier to
read!

Surprisingly, this appears to make revsets accessing this data
slightly faster (values are before series, p1, this patch):

author(mpm)
0.896565
0.929984
0.914234

desc(bug)
0.887169
0.935642
0.921073

date(2015)
0.878797
0.908094
0.891980

extra(rebase_source)
0.865446
0.922624
0.912514

author(mpm) or author(greg)
1.801832
1.902112
1.860402

author(mpm) or desc(bug)
1.812438
1.860977
1.844850

date(2015) or branch(default)
0.968276
1.005824
0.994673

author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.743381
3.721032
2016-03-06 13:26:37 -08:00
Matt Mackall
75ceec976e changelog: backed out changeset 4ef1c9b76e22 2016-03-02 16:05:30 -06:00
Matt Mackall
b4737e5b27 changelog: backed out changeset 9f92d143bdd2
We want to avoid leaking UTF-8 to main body of code wherever possible.
2016-03-02 12:46:54 -06:00
Gregory Szorc
ab7f7f6a19 changelog: lazy decode user (API)
This appears to show a similar speedup as the previous patch.
2016-02-27 22:34:18 -08:00
Gregory Szorc
6f651cb96b changelog: lazy decode description (API)
Currently, changelog reading decodes read values. This is wasteful
because a lot of times consumers aren't interested in some of these
values.

This patch changes description decoding to occur in changectx as
needed.

revsets reading changelog entries appear to speed up slightly:

revset #7: author(lmoscovicz)
   plain
0) 0.906329
1) 0.872653

revset #8: author(mpm)
   plain
0) 0.903478
1) 0.878037

revset #9: author(lmoscovicz) or author(mpm)
   plain
0) 1.817855
1) 1.778680

revset #10: author(mpm) or author(lmoscovicz)
   plain
0) 1.837052
1) 1.764568
2016-02-27 22:25:14 -08:00
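The idea behind 6f651cb96b, decoding a value only when a consumer asks for it, can be sketched with a cached property. The `LazyEntry` class is hypothetical, and the UTF-8 decode is a stand-in for whatever conversion the real code performs.

```python
class LazyEntry(object):
    """Hypothetical changelog-like entry that decodes its description lazily."""

    def __init__(self, rawdescription):
        self._raw = rawdescription  # bytes exactly as stored
        self._decoded = None
        self.decodecount = 0        # instrumentation for this example only

    @property
    def description(self):
        if self._decoded is None:
            # Pay the decoding cost on first access, then reuse the result.
            self.decodecount += 1
            self._decoded = self._raw.decode("utf-8")
        return self._decoded


entry = LazyEntry(b"fix bug")
untouched = entry.decodecount
text = entry.description
again = entry.description
```

A consumer that only filters by author never touches `description`, so no decoding work is done for it.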
Durham Goode
a6b3658fd5 filectx: replace use of _filerev with _filenode
_filerev depends on the filelog implementation using revlogs and linkrevs.
Alternative implementations, like remotefilelog, do not have rev numbers, so
this call fails. Replacing it with _filenode means it doesn't rely on rev
numbers, and doesn't cost anything extra, since _filerev is using _filenode
under the hood anyway.
2016-02-08 14:17:11 -08:00