Adds a new function to treemanifest that allows walking over the directories in
the tree. Currently it only accepts a matcher to prune the walk, but in the
future it will also accept a list of trees and will only walk over subtrees that
differ from the versions in the list. This will be useful for identifying what
parts of the tree are new to this revision, which is useful when deciding the
minimal set of trees to send to a client given that they have a certain tree
already.
Since this is intended for an extension to use, the only current consumer is a
test. In the future this function may be useful for implementing other
algorithms like diff and changegroup generation.
This doesn't look nice, but a straightforward way to support Python 3.
bytes(m[start:end]) is needed because a memoryview doesn't support ordering
operations. On Python 2, m[start:end] returns a bytes object even if m is
a buffer, so calling bytes() should involve no additional copy.
I'm tired of trying cleaner alternatives, including:
a. extend memoryview to be compatible with buffer type
=> memoryview is not an acceptable base type
b. wrap memoryview by buffer-like class
=> zlib complains it isn't bytes-like
In a flat manifest, a node with the same content but different parents is still
considered a new node. In the current tree manifests however, if the content is
the same, we ignore the parents entirely and just reuse the existing node.
In our external treemanifest extension, we want to allow having one treemanifest
for every flat manifests, as a way of easeing the migration to treemanifests. To
make this possible, let's change the root node treemanifest behavior to match
the behavior for flat manifests, so we can have a 1:1 relationship.
While this sounds like a BC breakage, it's not actually a state users can
normally get in because: A) you can't make empty commits, and B) even if you try
to make an empty commit (by making a commit then amending it's changes away),
the higher level commit logic in localrepo.commitctx() forces the commit to use
the original p1 manifest node if no files were changed. So this would only
affect extensions and automation that reached passed the normal
localrepo.commit() logic straight into the manifest logic.
As part of removing manifest.matches (since it is O(manifest)), let's start by
adding match arguments to diff and filesnotin. As we'll see in later patches,
these are the only flows that actually use matchers, so by moving the matching
into the actual functions, other manifest implementations can make more efficient
algorithsm.
For instance, this will allow treemanifest diff's to only iterate over the files
that are different AND meet the match criteria.
No consumers are changed in this patches, but the code is fairly easy to verify
visually. Future patches will convert consumers to use it.
One test was affected because it did not use the kwargs version of the clean
parameter.
We were storing the repo on the manifestctx objects so that they could access
the manifestlog via repo.manifestlog, which would refresh the structure if it
became out of date. This caused probems however when we want to have multiple
manifest logs in memory at once, like when transitioning to tree manifest from
flat manifests, since a tree manifest would try to access sub-trees via
repo.manifestlog[node], which was the flat manifest.
The solution is to just not store the repo, and instead store the manifestlog
that created this context. This removes the invalidation when the in memory
manifestlog becomes out of date, but people should probably not be keeping ctx's
around that long anyway.
Previously we had hardcoded the manifest filename to be 00manifest.i. In our
external treemanifest extension, we want to allow writing a treemanifest side by
side with a flat manifest, so we need to be able to store the root revisions at
a different location (in our extension we use 00manifesttree.i).
This patches moves the revlog name to a parameter so we can adjust it.
The old code here would end up executing __len__ on a tree manifest to determine
if 'not _data' was true or not. This was very expensive on large repos. Since
this function just cares about memoization, we can just check 'if _data is None'
instead and save a bunch of time.
Most manifestctx creation already happened in manifestlog.get(), but there was
one spot in the manifestctx class itself that created an instance manually. This
patch makes that one instance go through the manifestlog. This means extensions
can just wrap manifestlog.get() and it will cover all manifestctx creations. It
also means this code path now hits the manifestlog cache.
Previously, treemanifestctx would directly construct its subtrees. By making it
get the subtrees through manifestlog.get() we consolidate all treemanifestctx
creation into manifestlog.get() and therefore extensions that need to wrap
manifestctx creation (like narrow-hg) can intercept manifestctxs at that single
place.
This also means fetching subtrees will take advantage of the manifestlog ctx
cache now, which it did not before.
This patches adds an parameter to manifestlog.get() to disable hash checking.
This will be used in an upcoming patch to support treemanifestctx reading
sub-trees without loading them from the revlog. (This is already supported but
does not go through the manifestlog.get() code path)
As we start to make manifestlog the primary manifest source, the dependency on
manifest.manifest will cause circular dependency problems. Let's break this
dependency by making manifestlog use it's own cache. In a near future patch we
will remove the previous manifest cache so we're not duplicating it.
As part of migrating all manifest functionality out of manifest.manifest, let's
migrate a couple spots off of manifest.dirlog() to use the revlog specific
accessor. Then we can delete manifest.dirlog() and other unused functions.
A future patch will be removing the read() function from the manifest class.
Since manifestrevlog currently depends on the read function that manifest
implements (as a derived class), we need to break the dependency from
manifestrevlog to read(). We do this by adding an argument to
manifestrevlog.write() which provides it with the ability to read a manifest.
This is good in general because it further separates revlog as the storage
format from the actual inmemory data structure implementation.
This removes one more dependency on the manifest class by moving the write
functionality onto the memmanifestctx classes and changing the one consumer to
use the new API.
By moving the write path to a manifestctx, we now give the individual manifests
control over how they're read and serialized. This will be useful in developing
new manifest formats and storage systems.
This adds copy functionality to the manifestctx classes. This will be used in an
upcoming diff to copy a manifestctx during commit so we can modify the manifest
before committing.
This introduces two new classes to represent in-memory manifest instances.
Similar to memchangectx, this lets us prepare a manifest in memory, then in a
future patch we will add the apis that can commit this in memory structure.
The `self._repo.manifestlog._revlog` code is getting copy and pasted a lot in
manifestctx. Let's make it a function so it can be reused. This will make future
patches cleaner too.
As part of removing dependencies on manifest, this drops the find function and
fixes up the two existing callers to use the equivalent apis on manifestctx.
7089745181c5 (manifest: make treemanifestctx store the repo,
2016-10-18) broke most tests when run with treeinmem=True. The
treeinmem mode can not be enabled by the user, so this did not break
anything in practice, but it's useful to have it working for testing
the treemanifest code.
This adds a __nonzero__ method to manifestdict. This isn't strictly necessary in
the vanilla Mercurial implementation, since Python will handle nonzero checks by
using __len__, but having it implemented here makes it easier for alternative
implementations to implement __nonzero__ and have them be plug-n-play with the
normal implementation.
The old manifest had different functions for performing shallow reads, shallow
readdeltas, and shallow readfasts. Since a lot of the code is duplicate (and
since those functions don't make sense on a normal manifestctx), let's unify
them into flags on the existing readdelta and readfast functions.
A future diff will change consumers of these functions to use the manifestctx
versions and will delete the old apis.
In the last patch we added a get() function that allows fetching directory level
treemanifestctxs. It didn't handle caching at directory level though, so we need to
change our mancache to support multiple directories.
Previously manifestlog only allowed obtaining root level manifests. Future
patches will need direct access to subdirectory manifests as part of changegroup
creation, so let's add a get() function that knows how to deal with
subdirectories.
When accessing a manifest via manifestlog[node], let's verify that the node
actually exists and throw a LookupError if it doesn't. This matches the old read
behavior, so we don't accidentally return invalid manifestctxs.
We do this in manifestlog instead of in the manifestctx/treemanifestctx
constructors because the treemanifest code currently relies on the fact that
certain code paths can produce treemanifests without touching the revlogs (and
it has tests that verify things work if certain revlogs are missing entirely, so
they break if we add validation that tries to read them).
Same as in the last commit, the old treemanifestctx stored a reference to the
revlog. If the inmemory revlog became invalid, the ctx now held an old copy and
would be incorrect. To fix this, we need the ctx to go through the manifestlog
for each access.
This is the same pattern that changectx already uses (it stores the repo, and
accesses commit data through self._repo.changelog).
The old manifestctx stored a reference to the revlog. If the inmemory revlog
became invalid, the ctx now held an old copy and would be incorrect. To fix
this, we need the ctx to go through the manifestlog for each access.
This is the same pattern that changectx already uses (it stores the repo, and
accesses commit data through self._repo.changelog).
The old @property on manifestlog was broken. It meant that we would always
recreate the manifestlog instance, which meant the cache was never hit. Since
we'll eventually remove repo.manifest and make manifestlog the only property,
let's go ahead and make manifestlog the @storecache property, have manifestlog
own the manifest instance, and have repo.manifest refer to it via manifestlog.
This means all accesses go through repo.manifestlog, which is now invalidated
correctly.
If steps below occurs at "the same time in sec", all of mtime, ctime
and size are same between (1) and (3).
1. append data to 00manifest.i (and close transaction)
2. discard appended data by truncation (strip or rollback)
3. append same size but different data to 00manifest.i again
Therefore, cache validation doesn't work after (3) as expected.
To avoid such file stat ambiguity around truncation, this patch
specifies checkambig=True to revlog.__init__(). This makes revlog
write changes out with checkambig=True.
Even after this patch, avoiding file stat ambiguity of 00manifest.i
around truncation isn't yet completed, because truncation side isn't
aware of this issue.
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
This adds a simple add() function to manifestlog. This lets us convert more
uses of repo.manifest to use repo.manifestlog, so we can further break our
dependency on the manifest class.
This moves add and _addtree onto manifestrevlog. manifestrevlog is responsible
for all serialization decisions, so therefore the add function should live on
it. This will allow us to call add() from manifestlog, which lets us further
break our dependency on manifest.
Currently manifest.add uses the treeinmem option to know if it can call
fastdelta on the given manifest instance. In a future patch we will be moving
add() to be on the manifestrevlog, so it won't have access to the treeinmem
option anymore. Instead, let's have it actually check if the given manifest
instance supports the fastdelta operation.
This also means that if treemanifest or any implementation eventually implements
fastdelta(), it will automatically benefit from this code path.