To support "status" operations against working directories that are
the children of censored revisions, filelog must define "cmp" and "size"
for censored content.
With this change, when a revlog revision hash does not match its content, and
the content is empty with a special metadata key, the integrity failure is
assumed to be intentionally caused to remove sensitive content from repository
history.
To allow different Mercurial functionality to handle this scenario differently
a more specific exception is raised than "ordinary" hash failures.
Alternatives to this approach include, but are not limited to:
- Calling a hook when hashes mismatch to allow arbitrary tombstone validation.
Cons: Irresponsibly easy to disable integrity checking altogether.
- Returning empty revision data eagerly instead of raising, masking the error.
Cons: Push/pull won't roundtrip the tombstone, so client repos are unusable.
- Doing nothing differently at this layer. Callers must do their own detection
of tombstoned data if they want to handle some hash checks and not others.
- Impacts dozens of callsites, many of which don't have the revision data
- Would probably be missing one or two callsites at any given time
- Currently we throw a RevlogError, as do 12 other places in revlog.py.
Callers would need to parse the exception message and/or ensure
RevlogError is not thrown from any other part of their call tree.
_parsemeta returns the dictionary and a list of keys in the order they appear
in metadata. This can be used to repack the dictionary in the same order.
_packmeta creates metadata from a dictionary and an optional key-order list.
In _parsemeta, we use slices and re.search indead of str.index so we can accept
both buffers and strings.
filelog.renamed() is an expensive call as it reads the filelog if p1 == nullid.
It's more efficient to first compute the hash, and to bail early if
the computed hash is the same as the stored nodeid.
'samehashes' variable is not strictly necessary, but helps for comprehension.
Because "\1\n" is a separator for metadata, data starting with "\1\n" is
handled specifically. It was not tested.
size() call return incorrect data if original data had been "\1\n-escaped".
There's no obvious way to fix it for now, just flag the error in the code
and add an "expected failure" kind of test.
This is similar to the __builtin__.cmp behaviour, but still not
straightforward, as the dailylife meaning of a comparison usually is
"find out if they are different".
the escaping of directories ending with .i or .d doesn't
really belong to filelog.
we put the encoding/decoding in store instead, for backwards
compat, streamclone and the fncache file format still uses the
partially encoded filenames.
This should be faster and more future-proof. Calls where the result is to be
sorted using util.sort() have been left unchanged. Calls to .items() on
configparser objects have been left as-is, too.
str.find return -1 when the substring is not found, -1 evaluate
to True and is a valid index, which can lead to bugs.
Using alternatives when possible makes the code clearer and less
prone to bugs. (and __contains__ is faster in microbenchmarks)
This changes revlog specify revlogng by default. Inlined
data files are also used unless a flags option is found in the .hgrc.
Some example hgrc files:
[revlog]
# use the original revlog format
format=0
[revlog]
# use revlogng. Because no flags are included, inlined data files
# also be selected
format=1
[revlog]
# use revlogng but do not inline the data files with the index
flags=
[revlog]
# the new default
format=1
flags=inline
revlogng results in smaller indexes, can address larger data files, and
supports flags and version numbers.
By default the original revlog format is used. To use the new format,
use the following .hgrc field:
[revlog]
# format choices are 0 (classic revlog format) and 1 revlogng
format=1
- delete copy information when we update dirstate
hg was keeping the copy state and marking things as copied on
multiple commits
- files that are renamed should have no parents
if you do a rename/copy to an existing file, it should not be marked
as descending from its previous revisions.
- remove spurious print from filelog.renamed
- add some more copy tests