All versions of Python we support or hope to support make the hash
functions available in the same way under the same name, so we may as
well drop the util forwards.
The atomictemp.close() file attempts to do a rename, which can fail.
Moving the close inside the exception handler fixes it.
This doesn't fit well with the with: pattern, as it's the finalizer
that's failing.
Before this patch, revisions rollbacked at failure of previous
transaction might be visible at subsequent operations unintentionally,
if repoview object is reused even after failure of transaction:
e.g. command server and HTTP server are typical cases.
'repoview' uses the tuple of values below of unfiltered changelog as
"the key" to examine validity of filtered changelog cache.
- length
- tip node
- filtered revisions (as hashed value)
- '_delayed' field
'repoview' compares between "the key" of unfiltered changelog at
previous caching and now, and reuses filtered changelog cache if no
change is detected.
But this comparison indicates only that there is no change between
unfiltered 'repo.changelog' at last caching and now, but not that
filtered changelog cache is valid for current unfiltered one.
'repoview' uses "shallow copy" of unfiltered changelog to create
filtered changelog cache. In this case, 'index' buffer of unfiltered
changelog is also referred by filtered changelog.
At failure of transaction, unfiltered changelog itself is invalidated
(= un-referred) on the 'repo' side (see b7829fc79508 also). But
'index' of it still contains revisions to be rollbacked at this
failure, and is referred by filtered changelog.
Therefore, even if there is no change between unfiltered
'repo.changelog' at last caching and now, steps below makes rollbacked
revisions visible via filtered changelog unintentionally.
1. instantiate unfiltered changelog as 'repo.changelog'
(call it CL1)
2. make filtered (= shallow copy of) CL1
(call it FCL1)
3. cache FCL1 with "the key" of CL1
4. revisions are appended to 'index', which is shared by CL1 and FCL1
5. invalidate 'repo.changelog' (= CL1) at failure of transaction
6. instantiate 'repo.changelog' again at next operation
(call it CL2)
CL2 doesn't have revisions added at (4), because it is
instantiated from '00changelog.i', which isn't changed while
failed transaction.
7. compare between "the key" of CL1 and CL2
8. FCL1 cached at (3) is reused, because comparison at (7) doesn't
detect change between CL1 at (1) and CL2
9. revisions rollbacked at (5) are visible via FCL1 unintentionally,
because FCL1 still refers 'index' changed at (4)
The root cause of this issue is that there is no examination about
validity of filtered changelog cache against current unfiltered one.
This patch discards filtered changelog cache, if its 'index' object
isn't shared with unfiltered one.
BTW, at the time of this patch, redundant truncation of
'00changelog.i' at failure of transaction (see b7829fc79508 for
detail) often prevents "hg serve" from making already rollbacked
revisions visible, because updating timestamps of '00changelog.i' by
truncation makes "hg serve" discard old repoview object with invalid
filtered changelog cache.
This is reason why this issue is overlooked before this patch, even
though test-bundle2-exchange.t has tests in similar situation: failure
of "hg push" via HTTP by pretxnclose hook on server side doesn't
prevent subsequent commands from looking up outgoing revisions
correctly.
But timestamp on the filesystem doesn't have enough resolution for
recent computation power, and it can't be assumed that this avoidance
always works as expected.
Therefore, without this patch, this issue might appear occasionally.
Before this patch if the hiddencache existed but was empty, it would crash
mercurial. This patch adds exception handling when reading the hiddencache to
avoid the issue.
When encountering a corrupted cache file we print a devel warning. There would
be no point in issuing a normal warning as the user wouldn't be able to do
anything about the situation.
The warning looks like:
devel-warn: corrupted hidden cache, removing it at: /path/to/repoview.py
Getting the data necessary for the cache key using the changelog/revlog method
adds a significant overhead. Given how simple the underlying implementation is
and often this code is ran, it makes sense to violate layering and directly
compute the data.
Testing `hg log` on Mozilla-central, this reduce the time spent on changelog
cache validation by an extra half:
before: 12.2s of 69s
after: 6.1s of 62s
Total speed up from this patch and it's parent is 3x
(With stupid python profiler overhead)
The global speedup without profiler overhead is still there,
Before: 51s
After: 39s (-23%)
As explained in the comment, we were computing the key of the cache value every
time because of some obscure MQ test failure. I've dropped that code and ran the
test again that failure is gone. I assume some transaction cleanup got rid of
it.
So we are dropping that code. This provide a significant speedup.
Testing `hg log` on Mozilla-central this reduce the time spent on changelog
cache validation by a third:
before: 19.5s of 80s
after: 12.2s of 69s
(With stupid python profiler overhead)
Since all children are processed before their parents, we can apply the following algorithm:
For each rev (descending order):
* If I'm still hidden, no children will block me,
* If I'm not hidden, I must remove my parent from the hidden set,
This allows us to dynamically change the set of 'hidden' revisions, dropping the
need for the 'actuallyhidden' dictionary and the 'blocked' boolean in the queue.
As before, we start iterating from all heads and stop at the first public
changesets. This ensures the hidden computation is 'O(not public())' instead of
'O(len(min(not public()):))'.
Since we want to process all non-public changesets from top to bottom, a heap
seems more appropriate. This will ensure any revision is processed after all
its children, opening the way to code simplification.
Previously we would compute the repoview's static blockers by finding all the
children of hidden commits that were not hidden. This was O(number of commits
since first hidden change) since 'children' requires walking every commit from
tip until the first hidden change.
The new algorithm walks all heads down until it sees a public commit. This makes
the computation O(number of draft) commits, which is much faster in large
repositories with a large number of commits and a low number of drafts.
On a large repo with 1000+ obsolete markers and the earliest draft commit around
tip~200000, this improves computehidden perf by 200x (2s to 0.01s).
Starting with b1e85ff3a7fc, when a clone reached the checkout stage,
the cached changelog in the filtered view was still seeing the
_delayed flag, even though the changelog had already been finalized.
Monkey patching repoview does not really work and making it really work will
be really hard. So we better have it broken without complexity than broken with
extra complexity.
It doesn't seem to be a common idiom for repo instances, but the status() method
is replaced in largefiles' purge() override. Since __setattr__ is implemented
in repoview to setattr() on the unfiltered repo, the replacement method wouldn't
get called unless it was invoked with the unfiltered repo, because the filtered
repo remains unchanged.
Since this doesn't seem to be commonly used, I didn't bother to filter out
methods that perhaps shouldn't be replaced, such as changelog().
We have been running hiddencache verification since 3.1.1 and so
far not received a bug report concerning it. Therefore we remove
the verification code and make the hiddencache authoritive. That
way we get the intended speedup.
Incidentally, this avoids the changelog cache being invalidated each time
it's accessed on a repoview.
On a filtering experiment on a repository the size of mozilla-central,
this makes a significant difference:
Before, running hg log -l 10 --time with about 8k changesets filtered out:
time: real 1.490 secs (user 1.450+0.000 sys 0.040+0.000)
After:
time: real 0.540 secs (user 0.530+0.000 sys 0.010+0.000)
Use the introduced caching infrastructure to cache hidden
changesets. We crosscheck if the content of the cache unless
experimental.verifyhiddencache is set to False. This will be removed
in the future. Without crosschecking the caches speed ups hg status and
other commands:
without caching:
$ time hg status
hg status 0.72s user 0.20s system 100% cpu 0.917 total
with caching
$ time hg status
hg status 0.49s user 0.15s system 100% cpu 0.645 total
Add a caching infrastructure to cache hidden changesets. The cache tries to read
the cache lazily and falls back to recomputing if no wlock can be obtain.
To validate the cache we store a sha of the obstore content and repo heads in
the beginning of the cache which we check every request.
Split up _gethiddenblockers into two categories: (1) "static' blockers
that solely rely on the contents of obstore and are visible children of
hidden changsets. (2) "dynamic" blockers, appearing by having wd parents,
bookmarks or tags pointing to hidden changesets.
We assume that (1) doesn't change often and can be easily cached with a good
invalidation strategy. (2) change often, but barely produce blockers, so we
can recompute them if necessary.
No functionality has changed, since we've only extracted the code into its own
function. Now extensions can wrap _gethiddenblockers to provide their own
blocker without polluting bookmarks or local tags.
For repos with a large number of heads (including hidden heads), a stale tag
cache would cause computehidden to be drastically slower because of a the call
to repo.tags() (which would build the tag cache).
We actually don't need the tag cache for computehidden because we filter out
global tags. This patch replaces the call to repo.tags with readlocaltags so
as to avoid the tag cache.
Previously, only bookmarks would be considered for blocking a changeset from
being hidden. Now, we also consider non-global tags. This is helpful if we have
local tags that might be hard to find once they are hidden, or tag that are
added by extensions (e.g. hggit or remotebranches).
Changeset aad678a92970 moved `subsettable` from `mercurial/repoview.py` to
`mercurial/branchmap.py`. This mean that `filtertable` and `subsettable` are no
longer next to each other. So we add a comment to remind people to update both.
This is a step towards breaking an import cycle between revset and
repoview. Import cycles happened to work in Python 2 with implicit
relative imports, but breaks on Python 3 when we start using explicit
relative imports via 2to3 rewrite rules.