When changelogs are written, a copy of the index (or inline revlog)
may be written to an 00changelog.i.a file to facilitate hooks and
other processes having access to the pending data before it is
finalized.
The way it works today, the localrepo class loads the changelog
like normal. Then, if it detects a pending transaction, it asks
the changelog class to load a pending changelog. The changelog
class looks for a 00changelog.i.a file. If it exists, it is
loaded and internal data structures on the new revlog class are
copied to the original instance.
The existing mechanism is inefficient because it loads 2 revlog
files. The index, node map, and chunk cache for 00changelog.i
are thrown away and replaced by those for 00changelog.i.a.
The existing mechanism is also brittle because it is a layering
violation to access the data structures being accessed. For example,
the code copies the "chunk cache" because for inline revlogs
this cache contains the raw revision chunks and allows the original
changelog/revlog instance to access revision data for these pending
revisions. This whole behavior of course relies on the revlog
constructor reading the entirety of an inline revlog into memory
and caching it. That's why it is brittle. (I discovered all this
as part of modifying behavior of the chunk cache.)
This patch streamlines the loading of a pending 00changelog.i.a
revlog by doing it directly in the changelog constructor if told
to do so. When this code path is active, we no longer load the
00changelog.i file at all.
The only negative outcome I see from this change is if loading
00changelog.i was somehow facilitating a role. But I can't imagine
what that would be because we throw away its data (the index data
structures are replaced and inline revision data is replaced via
the chunk cache) and since 00changelog.i.a is a copy of
00changelog.i, file content should be identical, so there should
be no meaninful file integrity checking at play. I think this was
all just sub-optimal code.
We are about to remove the branchmap cache update in changegroup application.
There is a debug message alongside this update that we do not want to loose. We
move the message beforehand to simplify the test update in the next changeset.
The message move is quite noisy and isolating that noise is useful.
Most tests update are just line reordering since the message is issued at a
later point during the transaction.
After this changes, the message is displayed in more case since local commit
creation also issue it.
Now that we garantee that branchmap cache are updated at the end of the
transaction we can drop that one. This removes a problematic case with nested
transaction where the new cache could be written on disk before the transaction
is finished.
The test change is harmless, since we update the cache at a later point, the
dirstate have been updated in between.
Regenerating the cache after a 'strip' or a 'rollback' is useful. So we call the
generic cache warming function as other caches than just branchmap will be
updated there in the future.
To do so, we have to make 'repo.updatecache()' able to take no arguments. In
such cases, we reload all caches.
We have multiple caches that gain from being kept up to date. For example in a
server setup, we want to make sure the branchcache cache is hot for other
read-only clients.
Right now each cache tries to update themself in place where new data have been
added. However the approach is error prone (we might miss some spot) and
fragile. When nested transaction are involved, such cache updates might happen
before a top level transaction is committed. Writing caches for uncommitted
data on disk.
Having a single entry point, run at the end of each successful transaction,
helps to ensure the cache is up to date and refreshed at the right time.
We start with updating the branchmap cache but other will come.
Tracking revisions is not the data that will unlock the most new capability.
However, they are the simplest thing to track and still unlock some nice
improvements in regard with caching.
We plug ourself at the changelog level to make sure we do not miss any revision
additions.
The 'revs' set is configured at the repository level because the transaction
itself does not needs to know that much about the business logic.
It seems like localrepo.getbundle() is trying to do the same thing, so
let's just call the method. That way we get the same condition as
there (matching any "HG2" prefix, not only "HG20").
The tag changes information we compute is now written to disk. This gives
hooks full access to that data.
The format picked for that file uses a 2 characters prefix for the action:
-R: tag removed
+A: tag added
-M: tag moved (old value)
+M: tag moved (new value)
This format allows hooks to easily select the line that matters to them without
having to post process the file too much. Here is a couple of examples:
* to select all newly tagged changeset, match "^+",
* to detect tag move, match "^.M",
* to detect tag deletion, match "-R".
Once again we rely on the fact the tag tests run through all possible
situations to test this change.
We now compute the proper actuall differences between tags before and after the
transaction. This catch a couple of false positives in the tests.
The compute the full difference since we are about to make this data available
to hooks in the next changeset.
This changeset introduces detection of tags changes during transaction. When
this happens a 'tag_moved=1' argument is set for hooks, similar to what we do
for bookmarks and phases.
This code is disabled by default as there are still various performance
concerns. Some require a smarter use of our existing tag caches and some other
require rework around the transaction logic to skip execution when unneeded.
These performance improvements have been delayed, I would like to be able to
experiment and stabilize the feature behavior first.
Later changesets will push the concept further and provide a way for hooks to
know what are the actual changes introduced by the transaction. Similar work
is needed for the other families of changes (bookmark, phase, obsolescence,
etc). Upgrade of the transaction logic will likely be performed at the same
time.
The current code can report some false positive when .hgtags file changes but
resulting tags are unchanged. This will be fixed in the next changeset.
For testing, we simply globally enable a hook in the tag test as all the
possible tag update cases should exist there. A couple of them show the false
positive mentioned above.
See in code documentation for more details.
This is minor update along the way. We simplify the 'findglobaltags' function to
only return the tags. Since no existing data is reused, we know that all tags
returned are global and we can let the caller get that information if it cares
about it.
As far as I understand, that function do not needs to be on the local repository
class, so we extract it in the 'tags' module were it will be nice and
comfortable. We keep the '_' in the name since its only user will follow in the
next changeset.
At the beginning of March, I promised Yuya that I would follow up a comment I
made on a patch with improved documention for these vfs objects. Also hat tip
to Pierre-Yves for adding the documentation here in the first place.
On Python 3, keys() is more like iterkeys(), so we got in trouble for
mutating the dict while we're iterating here. Since the list of caches
should be relatively small, work around this difference by just
forcing a copy of the key list.
The patch lingered a bit too long in my local clone and I messed up when I
updated the version number. Since nobody caught it, I'm fixing the version after
the fact.
localrepo have an insane amount of method. Accessing the feature through the
vfs is not really harder and allow us to schedule that method for removal.
We are about to turn the 'join' method of the base class Abstract, so we need
on to be defined in the localrepo. The ultimate goal here is to be able to stop
relying for the 'localrepo' class to have a 'join' methods (there is above one
hundred methods on 'localrepo'. This change make te 'repo' file cache have its
own code so that we can prepare this change to the repostory class.
explicite join
When debugging in a Python shell, the type of "repo" is "proxycls", which
could confuse new people.
In [1]: repo
Out[1]: <mercurial.localrepo.proxycls at 0x7f65d4b976d0>
Let's rename it to "filteredrepo" to make it clearer.
Now that the 'vfs' classes moved in their own module, lets use the new module
directly. We update code iteratively to help with possible bisect needs in the
future.
In "aftertrans", we rename "journal.*" to "undo.*". We expect "journal.*"
files to disappear after renaming.
However, if "journal.foo" and "undo.foo" refer to a same file (hardlink),
rename may be a no-op, leaving both files on disk, according to Linux
manpage [1]:
If oldpath and newpath are existing hard links referring to the same
file, then rename() does nothing, and returns a suc‐ cess status.
The POSIX specification [2] is not very clear about what to do.
To be safe, remove "undo.*" before the rename so "journal.*" cannot be left
on disk.
[1]: http://man7.org/linux/man-pages/man2/rename.2.html
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/
Storing a relative path the source repository is useful when exporting
repositories over the network or when they're located on external
drives where the mountpoint isn't always fixed.
Currently, Mercurial interprets paths in `.hg/shared` relative to
$PWD. I suspect this is very much unintentional, and you have to
manually edit `.hg/shared` in order to trigger this behaviour.
However, on the off chance that someone might rely on it, I added a
new capability called 'relshared'. In addition, this makes earlier
versions of Mercurial fail with a graceful error.
I should note that I haven't tested this patch on Windows.
The 'ui' object dedicated to a 'localrepo' is independent from the one available
in dispatch (and 'uisetup'). In addition, it is created from the 'baseui'
(apparently for good reason). As a result, we need to run the color setup on
it after the local repository config is read.
This was overlooked when the rest of the initialization changed but did not
had impact yet because all setup is still global. We fix it before it is too
late.
Before this patch, checking HG_PENDING for changelog in localrepo.py
might cause unintentional reading unrelated '00changelog.i.a' in,
because HG_PENDING is checked by str.startswith().
An external hook spawned by inner repository in nested ones satisfies
this condition.
This patch uses txnutil.mayhavepending() to check HG_PENDING strictly.
BTW, this patch may cause failure of bisect in the repository of
Mercurial itself, if examination at bisecting assumes that an external
hook can see all pending changes while nested transactions across
repositories.
This invisibility issue will be fixed by subsequent patch, which
allows HG_PENDING to refer multiple repositories.