Having a dedicated function will allow us to experiment with other exchange
strategies in an extension. As we have no solid clues about how to do it right,
being able to experiment is vital.
Some transaction tricks are necessary for pull. But nothing too scary.
Having a dedicated function will allows us to experiment with other exchange
strategies in an extension. As we have no solid clues about how to do it right,
being able to experiment is vital.
I intended a more ambitious extraction of push logic, but we are far too
advanced in the release cycle for it.
This patch makes "_journalfiles()" return a list of pairs of journal
file and corresponded vfs instance instead of a list of journal files
in full path, to use "vfs.rename()" instead of "util.rename()" in
"aftertrans()".
"undofiles()" still returns a list of undo files in full path, because
"repair.strip()" expects such list. It'll be also made to return a
list of pairs of undo file and corresponded vfs at vfs migration for
"repair.strip()" in the near future.
In the point of view of efficiency, "vfs" instance created in this
patch should be passed to and reuse in "store.store()" invocation just
after patched code block, because "store" object is initialized by vfs
created with "self.sharedpath".
eBut to focus just on migration from direct file I/O API accessing to
vfs, this patch uses created vfs as temporary one. Refactoring around
"store.store()" invocation will be done in the future.
Before this patch, vfs constructor applies both "util.expandpath()"
and "os.path.realpath()" on "base" path, if "expand" is True.
This patch splits it into "realpath" and "expandpath", to apply each
functions separately: this splitting can allow to use vfs also where
one of each is not needed.
When the strip command is run, it calls repo.destroyed, which in turn checks if
we read _phasecache, and if we did calls filterunknown on it and flushes the
changes immediately. But in some cases, nothing causes _phasecache to be read,
so we miss out on this and the file remains the same on-disk.
Then a call to invalidate comes, which should refresh _phasecache if it
changed, but it didn't, so it keeps using the old one with the stripped
revision which causes an IndexError.
Test written by Yuya Nishihara.
Publishing server may contains draft changeset when they are created locally. As
publishing is the default, it is actually fairly common. Because of this
"inconsistency" phases synchronization may be done even to publishing server.
This may cause severe issues for subrepo. It is possible to reference read-only
repository as subrepo. Push in a super repo recursively push subrepo. Those
pushes to potential read only repo are not optional, they are "suffered" not
"choosed". This does not break because as the repo is untouched the push is
supposed to be empty. If the reference repo locally contains draft changesets, a
courtesy push is triggered to turn them public. As the repo is read only, the
push fails (after possible prompt asking for credential). Failure of the
sub-push aborts the whole subrepo push. This force the user to define a custom
default-push for such subrepo.
This changeset introduce a prevention of this error client side by skipping the
courtesy phase synchronisation in problematic situation. The phases
synchronisation is skipped when four conditions are gathered:
- this is a subrepo push, (normal push to read-only repo)
- and remote support phase
- and remote is publishing
- and no changesets was pushed (if we pushed changesets, repo is not read only)
The internal config option used in this version is not definitive. It is here to
demonstrate a working fix to the issue.
In the future we probably wants to track subrepo changes and avoid pushing to
untouched one. That will prevent any attempt to push to read-only or unreachable
subrepo.
Another fix to prevent courtesy push from older clients to push to newer server
is also still needed.
When you have obsolescence marker that apply to a pulled changesets, the added
changeset is immediately filtered. Then the list of added changeset needs to be
build against and unfiltered repo.
This saves us a couple of dict lookups in the common case, and improves
the performance of the status method by 5% (measured with util.timed)
in a repo with a large manifest.
User 'timeless' in irc mentioned that having the blackbox be
translated would result in logs that:
- may be mixed language, if multiple users use the same repo
- are not google searchable (since searching for english gives more
results)
- might not be readable by an admin if the employee is using hg in
his native language
And therefore we should log everything in english.
The blackbox was logging every head after every incoming group.
Now we only log the heads that have changed.
Added a test. Moved the hooks test to the bottom of the file since
the hooks interfer with the tests after it.
Logs incoming changes to a repo to ui.log(). Includes the number of changes
and the hashes of the heads after the new changes.
Example log line:
2013/02/09 08:35:19 durham> 1 incoming changes - new heads: cb9a9f314b8b
Currently the blackbox logs the unix user that is performing the push/pull.
It would be nice to log the http authorized user as well so it works with
hgweb, but that's outside the scope of this commit.
This pulls some of the logic for the cleanup that needs to happen
after a commit has been made otu of localrepo.commit and into
workingctx. This is part of a larger refactoring effort that will
eventually allow us to perform some types of merges in-memory.
This changes localrepo.commit to use the workingctx it creates form
the munged output of localrepo.status while running some precommit
validation. Specifically, it uses functions that were already present
on the workingctx. I believe this is a net readabilty improvement,
and that this makes these lines consistent with the refactoring in a
subsequent patch that pulls some of the validation logic into
workingctx so that it can be reused elsewhere.
localrepo.commit creates a workingctx, calls self.status, does some
munging on the changes status returns, does some validation on those
changes, and then creates a new workingctx from the changes. This
moves the creation of the new workginctx ahead of some validation,
with the intention of refactoring some of that validation logic into
the workingctx, so that it can be reused elsewhere.
Accepting those will lead to "mild corruption", correctly reported as
an error by hg verify, but often not a problem in practice.
Enabled when server.validate is switched on.
Writing cache for unfiltered repo only is barely useful, Most repo user are now
at least use the `hidden` filter. This changeset now assigns the remote cache
for a filter as low as possible for a wider reuse as possible.
Before this changesets the `destroyed` function updated the branchcache for
unfiltered repository. As seen in a previous changeset, Read only repo does
not cares about the unfiltered repo. We now update it for `unserved`.
The strip code used a trick to lower the cost of branchcache update after a
strip. However is less necessary since we have branchcache collaboration.
Invalid branchcache are likely to be cheaply rebuilt again a near subset of the
repo.
Moreover, this trick would need update to be relevant in the now filtered
repository world. It currently update the unfiltered branchcache that few people
cares about. Make it smarter on that aspect would need complexes update of the
calling logic
So this mechanism is:
- Arguably needed,
- Currently irrelevant,
- Hard to update
and I'm dropping it.
We now update the branchcache in all case by courtesy of the read only reader.
This changeset have a few expected impact on the testsuite are different cache
are updated.
The `commitctx` and `addchangegroup` methods of repo upgrade branchcache after
completion. This behavior aims to keep the branchcache in sync for read only
process as hgweb. See b4909adfc093 for details.
Since changelog filtering is used, those calls only update the cache for unfiltered repo.
One of no interest for typical read only process like hgweb.
Note: By chance in basic case, `repo.unfiltered() == repo.filtered('unserved')`
This changesets have the "unserved" cache updated instead. I think this is the
only cache that matter for hgweb.
We could imagine updating all possible branchcaches instead but:
- I'm not sure it would have any benefit impact. It may even increase the odd of
all cache being invalidated.
- This is more complicated change.
So I'm going for updating a single cache only which is already better that
updating a cache nobody cares about.
This changeset have a few expected impact on the testsuite are different cache
are updated.
Now that changelog filtering is in place, it's become evident that
naming the filters according to the set of revs _not_ included in the
filtered changelog is confusing. This is especially evident in the
collaborative branch cache scheme.
This changes the names of the filters to reflect the revs that _are_
included:
hidden -> visible
unserved -> served
mutable -> immutable
impactable -> base
repoview.filteredrevs is renamed to filterrevs, so that callers read a
bit more sensibly, e.g.:
filterrevs('visible') # filter revs according to what's visible
Add sorted() in places found by testing with PYTHONHASHSEED=random and code
inspection.
An alternative to sprinkling sorted() all over would be to change substate to a
custom dict with sorted iterators...
We need to make sure that if X is in the filecache then it's also in the
filecache owner's __dict__, otherwise it will go out of sync:
repo.X # first access to X, records stat info in
# filecache and updates __dict__
repo._filecache.clear() # removes X from _filecache but it's still in __dict__
repo.invalidate() # iterates over _filecache and removes entries
# from __dict__, but X isn't in _filecache, so
# it's kept in __dict__
repo.X # X is fetched from __dict__, bypassing the filecache
We call invalidate to remove properties from __dict__ because they're
possibly outdated and we'd like to check for a new version. Next time
the property is accessed the filecache mechanism checks the current stat
info with the one recorded at the last time the property was read, if
they're different it recreates the property.
Previously we refreshed the stat info on all properties in the filecache
when the lock is released, including properties that are missing from
__dict__. This is a problem because:
l = repo.lock()
repo.P # stat info S for P is recorded in _filecache
<changes are made to repo.P indirectly, e.g. underlying file is replaced>
# P's new stat info = S'
l.release() # filecache refreshes, records S' as P's stat info
At this point our filecache contains P with stat info S', but P's
version is from S, which is outdated.
The above happens during _rollback and strip. Currently we're wiping the
filecache and forcing everything to reload from scratch which works but
isn't the right solution.
Creation of changectx objects is very slow, and they are not very
useful. We are going to drop them. The first step is to change the
function argument type.
This changeset installs a broad filter on most repos used for
serving. This removes the need to use the `visiblehead`/`visiblebranchmap`
functions, and ensures that changesets we should not serve are in
fact never served.
We do not use filtering on hgweb yet, as there is still a number
of issues to solve there.
The `filteredrevs` function already have a cache mechanism. And this cache in
invalidated at the same time than the current property cache. So we drop the
cache on the property.
The property itself is going to be dropped soon.
There is not good reason for this computation to be handle in a different way
from the other. We are moving the computation of hidden revs in the filter
function. In later changesets, code that access to `repo.hiddenrevs` will be
updated and the property dropped.
Disables this simple optimisation to allow coming more powerfull approach: cache
collaboration.
Our goal is to have branchcache collaborate. This means that unfiltered
branchcache will fallback to some filtered branchcache if invalid. We can't have
the filtered branchcache to use the unfiltered one. That would loop.
When commit is followed by strip (qrefresh), phasecache contains nodes that were
removed from the changelog. Since phasecache is filecached with .hg/store/phaseroots
which doesn't change as a result of stripping, we have to filter it manually.
If we don't write it immediately, the next time it is read from disk the nodes
will be filtered again. That's what happened before, but there's no reason not
to write it immediately.
The change in test-keyword.t is caused by the above.
The `_branchcache` attribute is turned into a dictionary. Key are filter name and
value is a `branchcache` object. Unfiltered version is cached as `None` filter.
The attribute is renamed to `_branchcaches` to avoid confusion with the previous
one. Both old and new contents are dictionary even if their contents are
different. I prefer possible extension code to crash right away instead of just
messing the wrong dictionary.
As all different caches work isolated to each other, this code keeps the
previous behavior of using the unfiltered cache we nothing is filtered. This
is a cheap way to have cache collaborate and nullify potential impact in the
default case.