Commit Graph

55 Commits

Author SHA1 Message Date
Gregory Szorc
d51dd423fa filelog: use absolute_import 2015-08-08 19:11:42 -07:00
Mike Edgar
93a70657fb revlog: addgroup checks if incoming deltas add censored revs, sets flag bit
A censored revision stored in a revlog should have the censored revlog index
flag bit set. This implies we must know if a revision is censored before we
add it to the revlog. When adding revisions from exchanged deltas, we would
prefer to determine this flag without decoding every single full text.

This change introduces a heuristic based on assumptions around the Mercurial
delta format and filelog metadata. Since deltas which produce a censored
revision must be full-replacement deltas, we can read the delta's first bytes
to check the filelog metadata. Since "censored" is the alphabetically first
filelog metadata key, censored filelog revisions have a well-known prefix we
can look for.

For more on the design and background of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
2015-01-14 15:16:08 -05:00
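The prefix check described in this message can be sketched roughly as follows. This is a minimal illustration, assuming a delta is a 12-byte big-endian header (start, end, length) followed by the replacement data; the function name `delta_looks_censored` and its parameters are hypothetical and are not the revlog API.

```python
import struct

# Assumed delta header layout: three big-endian 32-bit ints (start, end, length).
HEADER = struct.Struct(">lll")
# Filelog metadata opens with "\1\n"; "censored" sorts first among metadata keys.
CENSORED_PREFIX = b"\x01\ncensored:"


def delta_looks_censored(delta, base_len):
    """Return True if delta is one full-replacement patch whose text looks censored."""
    if len(delta) < HEADER.size:
        return False
    start, end, new_len = HEADER.unpack_from(delta)
    payload = delta[HEADER.size:]
    # A full replacement covers the whole base text with a single operation.
    if (start, end) != (0, base_len) or new_len != len(payload):
        return False
    # Only the first bytes are inspected; the full text is never decoded.
    return payload.startswith(CENSORED_PREFIX)


# Hypothetical sample data for illustration only.
tombstone = b"\x01\ncensored: removed\n\x01\n"
full_delta = HEADER.pack(0, 10, len(tombstone)) + tombstone
partial_delta = HEADER.pack(2, 4, 3) + b"abc"
```

Here `delta_looks_censored(full_delta, 10)` is true, while `partial_delta` is rejected because it does not replace the entire base.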
Mike Edgar
b4a5dfbe4d changegroup: emit full-replacement deltas if either revision is censored
To ensure that exchanged deltas in the presence of censored revisions can
always be applied to the recipient repository, the deltas must replace the
entire base text. To make this restriction reasonably enforceable, the delta
must do so with a single patch operation.

For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
2015-01-21 22:09:32 -05:00
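As an illustration of why a single full-replacement operation is always applicable, here is a small sketch. It assumes the same 12-byte (start, end, length) patch header as Mercurial's binary delta format; `full_replacement_delta` and `apply_delta` are hypothetical helper names, not functions from the changegroup or mdiff modules.

```python
import struct

HEADER = struct.Struct(">lll")  # assumed layout: start, end, length


def full_replacement_delta(base_len, new_text):
    """Build a one-operation delta replacing bytes [0, base_len) with new_text."""
    return HEADER.pack(0, base_len, len(new_text)) + new_text


def apply_delta(base, delta):
    """Apply a sequence of patch operations to base and return the new text."""
    out, pos, last = [], 0, 0
    while pos < len(delta):
        start, end, length = HEADER.unpack_from(delta, pos)
        pos += HEADER.size
        out.append(base[last:start])         # untouched bytes before the hunk
        out.append(delta[pos:pos + length])  # replacement bytes
        pos += length
        last = end
    out.append(base[last:])
    return b"".join(out)


original = b"top secret"
censored = b"\0" * len(original)  # stand-in for a recipient's censored base
delta = full_replacement_delta(len(original), b"new text")
```

Because the single operation discards the whole base, `apply_delta` yields `b"new text"` whether the recipient holds `original` or the same-length `censored` stand-in.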
Mike Edgar
c736894d9c revlog: add "iscensored()" to revlog public API
The iscensored method will be used by the exchange layer to reject
nonconforming deltas involving censored revisions (and to produce
conforming deltas).

For background and broader design of the censorship feature, see:
http://mercurial.selenic.com/wiki/CensorPlan
2015-01-23 17:01:39 -05:00
Mike Edgar
fcc0e5645e filelog: allow censored files to contain padding data
To ensure delta compatibility, when a revision is censored, it is
padded to match the original data in size. The previous check does
not allow for padding because it was added before padding was found
to be a requirement.

For more background and design of the censorship feature, see:
mercurial.selenic.com/wiki/CensorPlan
2015-02-06 01:44:24 +00:00
Mike Edgar
bd60d594d0 filelog: remove unused _file method 2015-01-22 11:09:34 -05:00
Mike Edgar
3136b352b8 filelog: use censored revlog flag bit to quickly check if a node is censored 2015-01-12 15:29:36 -05:00
Mike Edgar
ab4953ab9b filelog: censored files compare against empty data, have 0 size
To support "status" operations against working directories that are
the children of censored revisions, filelog must define "cmp" and "size"
for censored content.
2014-09-14 20:32:34 -04:00
Mike Edgar
04a8ee7081 filelog: raise CensoredNodeError when hash checks fail with censor metadata
With this change, when a revlog revision hash does not match its content, and
the content is empty with a special metadata key, the integrity failure is
assumed to be intentionally caused to remove sensitive content from repository
history.

To allow different Mercurial functionality to handle this scenario differently,
a more specific exception is raised than for "ordinary" hash failures.

Alternatives to this approach include, but are not limited to:

- Calling a hook when hashes mismatch to allow arbitrary tombstone validation.
  Cons: Irresponsibly easy to disable integrity checking altogether.
- Returning empty revision data eagerly instead of raising, masking the error.
  Cons: Push/pull won't roundtrip the tombstone, so client repos are unusable.
- Doing nothing differently at this layer. Callers must do their own detection
  of tombstoned data if they want to handle some hash checks and not others.
  - Impacts dozens of callsites, many of which don't have the revision data
  - Would probably be missing one or two callsites at any given time
  - Currently we throw a RevlogError, as do 12 other places in revlog.py.
    Callers would need to parse the exception message and/or ensure
    RevlogError is not thrown from any other part of their call tree.
2014-09-03 22:14:20 -04:00
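The approach chosen in this message, raising a more specific exception when a hash mismatch coincides with censor metadata and empty content, might look roughly like the sketch below. The classes here are local stand-ins and `check_hash`, `is_tombstone`, and `compute_node` are hypothetical names; the real check lives in filelog and revlog and may differ in detail.

```python
import hashlib

META_MARKER = b"\x01\n"


class RevlogError(Exception):
    """Stand-in for the generic revlog integrity error."""


class CensoredNodeError(RevlogError):
    """Stand-in: the hash mismatch is explained by censor metadata."""


def is_tombstone(text):
    """True if text is metadata with a 'censored' key followed by empty content."""
    if not text.startswith(META_MARKER):
        return False
    end = text.find(META_MARKER, 2)
    if end < 0:
        return False
    lines = text[2:end].splitlines()
    has_key = any(line.startswith(b"censored:") for line in lines)
    return has_key and text[end + 2:] == b""


def check_hash(text, expected_node, compute_node):
    """Raise CensoredNodeError for tombstones, RevlogError for other mismatches."""
    if compute_node(text) == expected_node:
        return
    if is_tombstone(text):
        raise CensoredNodeError("censored node")
    raise RevlogError("integrity check failed")


def sha1_node(text):
    # Simplified hash for the sketch; real nodes also hash the parent ids.
    return hashlib.sha1(text).digest()
```

Since the specific exception subclasses the generic one in this sketch, callers that already catch `RevlogError` keep working, and callers that care about tombstones can catch `CensoredNodeError` first.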
Mike Edgar
f67e6d6c04 filelog: parsemeta stops returning unused key list
Currently, only the returned meta dictionary is used. An upcoming change will
use the returned text offset.
2014-09-02 14:42:30 -04:00
Mike Edgar
8a41e32180 filelog: make parsemeta a public module function, to be used by censor module 2014-09-10 00:18:15 -04:00
Mike Edgar
bfb97f3a41 filelog: make packmeta a public module function, to be used by censor 2014-09-10 00:17:17 -04:00
Durham Goode
a1ef848104 filelog: use super() for calling base functions
filelog had some hardcoded revlog.revlog.foo() calls. This changes it to
use super() instead so that extensions can replace the filelog base class.
2013-05-01 10:39:37 -07:00
Sune Foldager
3a06c3752e filelog: add file function to open other filelogs 2011-05-10 17:38:58 +02:00
Sune Foldager
f18c48ae86 filelog: extract metadata parsing and packing
_parsemeta returns the dictionary and a list of keys in the order they appear
in metadata. This can be used to repack the dictionary in the same order.

_packmeta creates metadata from a dictionary and an optional key-order list.

In _parsemeta, we use slices and re.search instead of str.index so we can accept
both buffers and strings.
2011-04-30 16:32:50 +02:00
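A rough sketch of the parse/pack pair described in this message, written for Python 3 bytes. It assumes the metadata block is delimited by "\1\n" markers and holds "key: value" lines; `parse_meta` and `pack_meta` are hypothetical names, and the return shape is an illustration, not the exact signature of `_parsemeta` or `_packmeta`.

```python
import re

META_MARKER = b"\x01\n"
_marker_re = re.compile(re.escape(META_MARKER))


def parse_meta(text):
    """Return (meta dict, keys in order of appearance, offset of the file data)."""
    if text[:2] != META_MARKER:
        return {}, [], 0
    # Assumes well-formed input: a closing marker must exist after the opener.
    end = _marker_re.search(text, 2).start()
    meta, keys = {}, []
    for line in bytes(text[2:end]).splitlines():
        key, value = line.split(b": ", 1)
        meta[key] = value
        keys.append(key)
    return meta, keys, end + 2


def pack_meta(meta, text, keys=None):
    """Serialize meta in the given key order (sorted if no order is supplied)."""
    if keys is None:
        keys = sorted(meta)
    body = b"".join(b"%s: %s\n" % (key, meta[key]) for key in keys)
    return META_MARKER + body + META_MARKER + text


packed = pack_meta({b"copyrev": b"abc123", b"copy": b"old/name.txt"}, b"file data")
meta, keys, offset = parse_meta(packed)
```

Passing the `keys` list returned by `parse_meta` back into `pack_meta` reproduces the original byte order of the metadata, which is the repacking use the message describes.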
Matt Mackall
06f318150a filelog: move metadata parsing to a helper function 2011-01-06 17:04:47 -06:00
Nicolas Dumazet
70fb2c5315 filelog: cmp: don't read data if hashes are identical (issue2273)
filelog.renamed() is an expensive call as it reads the filelog if p1 == nullid.
It's more efficient to first compute the hash, and to bail early if
the computed hash is the same as the stored nodeid.

'samehashes' variable is not strictly necessary, but helps for comprehension.
2010-07-05 19:49:54 +09:00
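The early exit described here can be sketched as below. This assumes the node id is the SHA-1 of the two sorted parent ids followed by the text; `differs`, `node_hash`, and the `read_stored_text` callback are hypothetical names standing in for the expensive filelog read.

```python
import hashlib

NULLID = b"\0" * 20


def node_hash(text, p1, p2):
    """Hash the sorted parent ids followed by the text (assumed node id scheme)."""
    first, second = sorted((p1, p2))
    digest = hashlib.sha1(first + second)
    digest.update(text)
    return digest.digest()


def differs(stored_node, p1, p2, text, read_stored_text):
    """Return True if text differs from the stored revision."""
    same_hashes = node_hash(text, p1, p2) == stored_node
    if same_hashes:
        return False  # cheap path: nothing is read from the filelog
    # Slow path: hashes can differ even for equal data (for example when
    # metadata is stored alongside it), so compare the actual contents.
    return read_stored_text() != text
```

The expensive callback runs only when the hashes disagree, which mirrors the ordering argued for in the message.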
Nicolas Dumazet
e3057a06d5 filelog: test behaviour for data starting with "\1\n"
Because "\1\n" is a separator for metadata, data starting with "\1\n" is
handled specifically. It was not tested.

size() returns incorrect data if the original data had been "\1\n"-escaped.
There's no obvious way to fix it for now, just flag the error in the code
and add an "expected failure" kind of test.
2010-07-05 18:43:46 +09:00
Nicolas Dumazet
6e75efdbcb cmp: document the fact that we return True if content is different
This is similar to the __builtin__.cmp behaviour, but still not
straightforward, as the everyday meaning of a comparison usually is
"find out if they are different".
2010-07-09 11:02:39 +09:00
Benoit Boissinot
aa7653dcd6 merge with stable 2010-03-16 01:16:19 +01:00
Benoit Boissinot
67b2142e44 filelog: no need to optimize an uncommon case, assume meta = {} 2010-03-16 01:16:04 +01:00
Benoit Boissinot
3c5670f245 filelog: text is stored modified when it starts with '\1\n' 2010-03-16 01:12:46 +01:00
Ronny Pfannschmidt
d170b686d2 filelog: sort meta entries, ensure deterministic order 2010-02-16 21:04:04 +01:00
Matt Mackall
8d99be19f0 many, many trivial check-code fixups 2010-01-25 00:05:27 -06:00
Matt Mackall
595d66f424 Update license to GPLv2+ 2010-01-19 22:20:08 -06:00
Benoit Boissinot
5b1d4a56ec filelog encoding: move the encoding/decoding into store
the escaping of directories ending with .i or .d doesn't
really belong to filelog.

we put the encoding/decoding in store instead, for backwards
compat, streamclone and the fncache file format still uses the
partially encoded filenames.
2009-05-20 18:35:47 +02:00
Martin Geisler
750183bdad updated license to be explicit about GPL version 2 2009-04-26 01:08:54 +02:00
Matt Mackall
b28ccc9a94 revlog: kill from-style imports
They're slow.
2009-01-11 22:55:36 -06:00
Dirkjan Ochtman
574603a8c0 use dict.iteritems() rather than dict.items()
This should be faster and more future-proof. Calls where the result is to be
sorted using util.sort() have been left unchanged. Calls to .items() on
configparser objects have been left as-is, too.
2009-01-12 09:16:03 +01:00
Joel Rosdahl
4f8012378a Remove unused imports 2008-03-06 22:23:41 +01:00
Joel Rosdahl
5dae3059a0 Expand import * to allow Pyflakes to find problems 2008-03-06 22:23:26 +01:00
Christian Ebert
5c18a69d2e Prefer i in d over d.has_key(i) 2008-01-20 14:39:25 +01:00
Thomas Arendsen Hein
4d29c6dc8e Updated copyright notices and add "and others" to "hg version" 2007-06-19 08:51:34 +02:00
Matt Mackall
04561e556e revlog: simplify revlog version handling
- pass the default version as an attribute on the opener
- eliminate config option mess
2007-03-22 19:52:38 -05:00
Matt Mackall
b4f6965b1d revlog: don't pass datafile as an argument 2007-03-22 19:12:03 -05:00
Matt Mackall
f17a4e1934 Replace demandload with new demandimport 2006-12-13 13:27:09 -06:00
Benoit Boissinot
0fba88fde3 use forward "/" for internal path and static http, fix issue437 2006-12-05 16:33:40 +01:00
Brendan Cully
ff2b0fa1de filelog.annotate is now obsolete 2006-09-30 20:56:26 -07:00
Matt Mackall
50e42b8388 filelog: make metadata method private 2006-09-17 22:38:06 -05:00
Brendan Cully
627744e332 Teach annotate to follow copies. 2006-08-18 14:59:18 -07:00
Matt Mackall
11dee4259d merge: use file size stored in revlog index
Add size method to filelog to handle nodes with renames
2006-08-15 22:46:35 -05:00
Matt Mackall
a04e906b9c filelog.cmp: return 0 for equality
spotted by Alexis Carvalho
2006-08-15 16:28:00 -05:00
Matt Mackall
e3e04b8f17 Move cmp bits from filelog to revlog 2006-08-15 14:18:13 -05:00
Matt Mackall
b5a0f2743c filelog: add hash-based comparisons
For status, rather than reconstruct full file versions from revlog for
comparison, compare hashes.
2006-08-14 15:07:00 -05:00
Vadim Gelfer
dc377b58c1 update copyrights. 2006-08-12 12:30:02 -07:00
Benoit Boissinot
7dd019b60b use __contains__, index or split instead of str.find
str.find returns -1 when the substring is not found; -1 evaluates
to True and is a valid index, which can lead to bugs.
Using alternatives when possible makes the code clearer and less
prone to bugs. (and __contains__ is faster in microbenchmarks)
2006-07-09 01:30:30 +02:00
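A short illustration of the pitfalls this message refers to. The sample string is made up for the example.

```python
path = "data/foo.i"  # hypothetical sample value

# Pitfall 1: find() returns -1 for "not found", and -1 is truthy.
truthy_when_absent = bool(path.find(".d"))

# Pitfall 2: a match at index 0 is falsy.
falsy_when_present = bool(path.find("data"))

# Pitfall 3: -1 is a valid index, so a failed search silently picks the last char.
wrong_char = path[path.find(".d")]

# Clearer alternatives: a membership test, or index(), which raises on a miss.
present = ".d" in path
try:
    position = path.index(".d")
except ValueError:
    position = None
```

With this input `truthy_when_absent` is `True`, `falsy_when_present` is `False`, and `wrong_char` is `"i"`, while the membership test and `index()` report the miss unambiguously.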
Vadim Gelfer
9a0c813fdc use demandload more. 2006-06-20 23:58:21 -07:00
mason@suse.com
58d4ef2538 Use revlogng and inlined data files by default
This changes revlog to specify revlogng by default.  Inlined
data files are also used unless a flags option is found in the .hgrc.
Some example hgrc files:

[revlog]
# use the original revlog format
format=0

[revlog]
# use revlogng.  Because no flags are included, inlined data files
# will also be selected
format=1

[revlog]
# use revlogng but do not inline the data files with the index
flags=

[revlog]
# the new default
format=1
flags=inline
2006-05-08 14:26:18 -05:00
mason@suse.com
ed26ff0cae Implement revlogng.
revlogng results in smaller indexes, can address larger data files, and
supports flags and version numbers.

By default the original revlog format is used.  To use the new format,
use the following .hgrc field:

[revlog]
# format choices are 0 (classic revlog format) and 1 (revlogng)
format=1
2006-04-04 16:38:43 -04:00
Matt Mackall
91766807e2 Re-enable the renamed check fastpath 2005-12-22 13:18:44 -06:00