Commit Graph

82 Commits

Author SHA1 Message Date
Yuya Nishihara
027352b7a8 streamclone: comment why path auditing is disabled in generatev1()
Copied from 8809f5acb29a. I wasn't sure whether it's for optimization or for
suppressing an unwanted error.
2017-07-07 23:19:31 +09:00
Yuya Nishihara
954fcd3b6c streamclone: close large revlog files explicitly in generatev1() 2017-07-07 23:25:16 +09:00
Pierre-Yves David
a9e2461dd2 streamclone: stop using 'vfs.mustaudit = False'
Now that each call disables the auditing on its own, we can safely drop the
mustaudit usage. No other code is modified.
2017-07-02 04:26:34 +02:00
Pierre-Yves David
6da87d469e vfs: simplify path audit disabling in stream clone
The whole 'mustaudit' API is quite complex compared to its actual usage by its
unique user in stream clone.

Instead we add an "auditpath" parameter to 'vfs.__call__'. The stream clone code
then explicitly opens files with path auditing disabled.

The 'mustaudit' API will be cleaned up in the next changeset.
2017-07-02 02:28:04 +02:00
Pierre-Yves David
54d53f6ed6 configitems: register the 'server.uncompressedallowsecret' config 2017-06-30 03:44:14 +02:00
Gregory Szorc
bc8582fc01 streamclone: consider secret changesets (BC) (issue5589)
Previously, a repo containing secret changesets would be served via
stream clone, transferring those secret changesets. While secret
changesets aren't meant to imply strong security (if you really
want to keep them secret, others shouldn't have read access to the
repo), we should at least make an effort to protect secret changesets
when possible.

After this commit, we no longer serve stream clones for repos
containing secret changesets by default. This is backwards
incompatible behavior. In case anyone is relying on the behavior,
we provide a config option to opt into the old behavior.

Note that this defense is only beneficial for remote repos
accessed via the wire protocol: if a client has access to the
files backing a repo, they can get to the raw data and see secret
revisions.
2017-06-09 10:41:13 -07:00
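A minimal sketch of the decision described in bc8582fc01, not Mercurial's actual code. The helper name and its arguments are hypothetical; only the config name 'server.uncompressedallowsecret' comes from this log (54d53f6ed6).

```python
def allow_stream_clone(has_secret_changesets, uncompressed_allow_secret=False):
    """Hypothetical helper: decide whether a server offers a stream clone.

    uncompressed_allow_secret stands in for the value of the
    'server.uncompressedallowsecret' config option.
    """
    if not has_secret_changesets:
        return True
    # Repos containing secret changesets are refused unless the admin opted in.
    return bool(uncompressed_allow_secret)
```

With the default of False, a repo holding secret changesets is not served via stream clone, which matches the backwards-incompatible default the commit describes.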
Siddharth Agarwal
3bf516869d clone: warn when streaming was requested but couldn't be performed
This helps both users and the people who support them figure out why
a stream clone couldn't be performed.

In an upcoming patch we're going to add a way for servers to hard
abort on a full getbundle. In those cases servers might expect
clients to perform a stream clone, so it's important to communicate
why one couldn't be done.
2017-05-08 20:01:06 -07:00
Simon Farnsworth
e0b70e4f7f mercurial: switch to util.timer for all interval timings
util.timer is now the best available interval timer, at the expense of not
having a known epoch. Let's use it whenever the epoch is irrelevant.
2017-02-15 13:17:39 -08:00
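To illustrate the "no known epoch" trade-off in e0b70e4f7f, here is a small sketch using Python 3's time.perf_counter, whose reference point is undefined so only differences between readings are meaningful. Treating it as a stand-in for util.timer is an assumption; the helper name timed() is hypothetical.

```python
import time

def timed(func, *args):
    """Return (result, elapsed_seconds) measured with an interval timer."""
    start = time.perf_counter()  # absolute value is meaningless; only deltas matter
    result = func(*args)
    return result, time.perf_counter() - start

result, elapsed = timed(sum, range(1000))
```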
Mads Kiilerich
38cb771268 spelling: fixes of non-dictionary words 2016-10-17 23:16:55 +02:00
FUJIWARA Katsunori
ff0a456116 streamclone: clear caches after writing changes into files for visibility
Before this patch, streamclone-ed changes are invisible via @filecache
properties to in-process procedures before closing the transaction
(e.g. pretxnclose python hook), if the corresponding property is cached
before consumev1(). Strictly speaking, caching should occur inside
the (store) lock for the transaction.

repo.invalidate() after closing transaction is too late to force
@filecache properties to be reloaded from changed files at next
access.

For visibility of streamclone-ed changes to in-process procedures
before closing transaction, this patch clears caches just after
writing changes into files.

BTW, regardless of the change in this patch, clearing cached properties
in consumev1() causes inconsistency, if (1) a transaction is started and
(2) any @filecache property is changed before consumev1().

This patch also adds the comment to fix this (potential) inconsistency
in the future.
2016-09-12 03:06:29 +09:00
FUJIWARA Katsunori
e23dee13b3 streamclone: force @filecache properties to be reloaded from file
Before this patch, consumev1() invokes repo.invalidate() after closing
transaction, to force @filecache properties to be reloaded from files
at next access, because streamclone writes data into files directly.

But this doesn't work as expected in the case below:

  1. at closing transaction, repo._refreshfilecachestats() refreshes
     the file stat of each @filecache property with streamclone-ed files

     This means that in-memory properties are treated as valid.

  2. but streamclone doesn't change in-memory properties

     This means that in-memory properties are actually invalid.

  3. repo.invalidate() just forces the file stat of @filecache
     properties to be examined at the first access after it

     Such examination concludes that reloading from file isn't
     needed, because file stat was already refreshed at (1).

     Therefore, invalid in-memory cached properties (2) are
     unintentionally treated as valid (1).

This patch invokes repo.invalidate() with clearfilecache=True, to
force @filecache properties to be reloaded from file at next access.

BTW, it is accidental that repo.invalidate() without
clearfilecache=True in streamclone case seems to work as expected
before this patch.

If a transaction is started via a "filtered repo" object,
repo._refreshfilecachestats() tries to refresh the file stat of each
@filecache property on the "filtered repo" object, even though all of
them are stored in the "unfiltered repo" object.

In this case, repo._refreshfilecachestats() does nothing
unintentionally, but this unexpected behavior causes reloading
@filecache properties after repo.invalidate().

This is the reason why this patch should be applied before making
_refreshfilecachestats() correctly refresh file stat of @filecache
properties.
2016-09-12 03:06:28 +09:00
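The three steps in e23dee13b3 can be reproduced with a toy stat-validated cache. This is only a sketch: StatCachedFile and its methods are hypothetical stand-ins for @filecache, _refreshfilecachestats() and repo.invalidate(), not Mercurial's implementation.

```python
import os
import tempfile

class StatCachedFile:
    """Toy stand-in for a @filecache property: a value plus the stat it was read at."""

    def __init__(self, path):
        self.path = path
        self.value = None
        self.stat = None
        self.must_check = True

    def _stat(self):
        st = os.stat(self.path)
        return (st.st_size, st.st_mtime_ns)

    def get(self):
        # Reload when nothing is cached, or when a check is due and the stat differs.
        if self.value is None or (self.must_check and self._stat() != self.stat):
            with open(self.path) as fh:
                self.value = fh.read()
            self.stat = self._stat()
        self.must_check = False
        return self.value

    def refresh_stat(self):
        # Analogue of step 1: record the current stat without reloading the value.
        self.stat = self._stat()

    def invalidate(self, clear=False):
        # Without clear: only re-examine the stat at next access (step 3).
        # With clear: drop the in-memory value so it must be reloaded.
        self.must_check = True
        if clear:
            self.value = None

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data")
    with open(path, "w") as fh:
        fh.write("old")
    cached = StatCachedFile(path)
    first = cached.get()
    with open(path, "w") as fh:
        fh.write("new contents")   # direct write that bypasses the cache (step 2)
    cached.refresh_stat()          # stat refreshed at transaction close (step 1)
    cached.invalidate()
    stale = cached.get()           # stat matches, so the old value is kept
    cached.invalidate(clear=True)
    fresh = cached.get()           # value dropped, so the file is re-read
```

Here stale still holds the old contents while fresh holds the new ones, which is the gap that clearfilecache=True closes in the patch.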
Matt Mackall
97d8dbf685 merge with stable 2016-03-15 14:10:46 -07:00
Mads Kiilerich
39acd325e0 streamclone: fix error when store files grow while stream cloning
Effectively a backout of d573a437d564, but updated to use 'with'.
2016-03-13 02:29:11 +01:00
Anton Shestakov
245ded8e7d streamclone: specify unit for ui.progress when handling data 2016-03-11 22:28:27 +08:00
Gregory Szorc
a05892eae0 streamclone: use backgroundfilecloser (issue4889)
Closing files that have been appended to is slow on Windows/NTFS.
CloseHandle() calls on this platform often take 1-10ms - and that's
on my i7-6700K Skylake processor with a modern and fast SSD. Contrast
with other I/O operations, such as writing data, which take <100us.

This means that creating/appending thousands of files can add
significant overhead. For example, cloning mozilla-central creates
~232,000 revlog files. Assuming 1ms per CloseHandle(), that yields
232s (3:52) of wall time waiting for file closes!

The impact of this overhead can be measured most directly when applying
stream clone bundles. Applying these files is effectively uncompressing
a tar archive (read: it's very fast).

Using a RAM disk (read: no I/O wait), the difference in wall time for a
`hg debugapplystreamclonebundle` for a ~1731 MB mozilla-central bundle
between Windows and Linux from the same machine is drastic:

Linux:    ~12.8s (128MB/s)
Windows: ~352.0s (4.7MB/s)

Windows is ~27.5x slower. Yikes!

After this patch:

Linux:    ~12.8s (128MB/s)
Windows: ~102.1s (16.1MB/s)

Windows is now ~3.4x faster. Unfortunately, it is still ~8x slower than
Linux. Profiling reveals a few hot code paths that could likely be
improved. But those are for other patches.

This patch introduces test-clone-uncompressed.t because existing tests
of `clone --uncompressed` are scattered about and adding a variation for
background thread closing to e.g. test-http.t doesn't feel correct.
2016-01-14 13:44:01 -08:00
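A rough sketch of the idea behind a05892eae0: hand file closes to worker threads so the writer does not wait on each slow close. BackgroundCloser is a hypothetical class written for illustration, not Mercurial's backgroundfilecloser, and it omits error handling.

```python
import os
import queue
import tempfile
import threading

class BackgroundCloser:
    """Context manager that closes file handles on worker threads."""

    def __init__(self, workers=4):
        self._queue = queue.Queue()
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(workers)
        ]
        for thread in self._threads:
            thread.start()

    def _worker(self):
        while True:
            fh = self._queue.get()
            if fh is None:   # sentinel: no more handles to close
                return
            fh.close()

    def close(self, fh):
        # Queue the handle; the caller continues without waiting for the close.
        self._queue.put(fh)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # One sentinel per worker, then wait until every queued close has run.
        for _ in self._threads:
            self._queue.put(None)
        for thread in self._threads:
            thread.join()

with tempfile.TemporaryDirectory() as tmp:
    handles = []
    with BackgroundCloser(workers=2) as closer:
        for i in range(20):
            fh = open(os.path.join(tmp, "f%d" % i), "wb")
            fh.write(b"data")
            closer.close(fh)
            handles.append(fh)
    all_closed = all(fh.closed for fh in handles)
```

Leaving the with block joins the workers, so every handle is closed before the caller proceeds.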
Gregory Szorc
ba2d05e908 streamclone: indent code
This will make the subsequent patch easier to read.
2016-01-02 16:11:36 -08:00
Gregory Szorc
9128d3d945 streamclone: extract code for reading header fields
So it can be called from another consumer in a future patch.
2016-01-14 22:48:54 -08:00
Bryan O'Sullivan
13360de2f3 with: use context manager for transaction in consumev1 2016-01-15 13:14:49 -08:00
Bryan O'Sullivan
cde011507a with: use context manager in streamclone consumev1 2016-01-15 13:14:49 -08:00
Bryan O'Sullivan
9b486e52cc with: use context manager in maybeperformlegacystreamclone 2016-01-15 13:14:49 -08:00
Bryan O'Sullivan
721f51151e with: use context manager in streamclone generatev1 2016-01-15 13:14:50 -08:00
Bryan O'Sullivan
47caeb4184 i18n: don't translate a transaction name 2016-01-15 13:14:49 -08:00
Gregory Szorc
054fcca201 streamclone: use context manager for writing files
These are the file writes that have the most to gain from background
I/O. Plug in a context manager so I can design the background I/O
mechanism with context managers in mind.
2016-01-02 15:09:58 -08:00
Gregory Szorc
34d6b4f44c streamclone: use read()
We have a convenience API for reading the full contents of a file.
Use it.
2016-01-02 15:14:55 -08:00
Gregory Szorc
9f922bdde8 streamclone: support for producing and consuming stream clone bundles
Up to this point, stream clones only existed as a dynamically generated
data format produced and consumed during streaming clones. In order to
support this efficient cloning format with the clone bundles feature, we
need a more formal, on disk representation of the streaming clone data.

This patch introduces a new "bundle" type for streaming clones. Unlike
existing bundles, it does not contain changegroup data. It does,
however, share the same concepts like the 4 byte header which identifies
the type of data that follows and the 2 byte abbreviation for
compression types (of which only "UN" is currently supported).

The new bundle format is essentially the existing stream clone version 1
data format with some headers at the beginning.

Content negotiation at stream clone request time checked for repository
format/requirements compatibility before initiating a stream clone. We
can't do active content negotiation when using clone bundles. So, we put
this set of requirements inside the payload so consumers have a built-in
mechanism for checking compatibility before reading and applying lots of
data. Of course, we will also advertise this requirements set in clone
bundles. But that's for another patch.

We currently don't have a mechanism to produce and consume this new
bundle format. This will be implemented in upcoming patches.

It's worth noting that if a legacy client attempts to `hg unbundle` a
stream clone bundle (with the "HGS1" header), it will abort with:
"unknown bundle version S1," which seems appropriate.
2015-10-17 11:14:52 -07:00
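As a sketch of the framing described in 9f922bdde8, the following reads the 4 byte "HGS1" header and the 2 byte compression abbreviation. The function name is hypothetical, and placing the compression code directly after the header is an assumption. The fields that follow are not described in this log, so the sketch stops there.

```python
import io

def read_stream_bundle_prefix(fh):
    """Read the 4-byte type header and 2-byte compression code from a bundle."""
    magic = fh.read(4)
    if magic != b"HGS1":
        raise ValueError("not a stream clone bundle: %r" % magic)
    compression = fh.read(2)
    if compression != b"UN":   # "UN" is the only type the commit says is supported
        raise ValueError("unsupported compression: %r" % compression)
    return magic, compression

prefix = read_stream_bundle_prefix(io.BytesIO(b"HGS1UN" + b"rest of payload"))
```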
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error', not 'util'. However, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Gregory Szorc
c70ae254f7 streamclone: move "streaming all changes" message location
Previously, the message was printed after we requested and started
processing the remote stream. This seems like something that we should
do before calling out to the remote. Moving it also makes it easier to
deal with the bundle2 implementation.
2015-10-04 12:07:01 -07:00
Gregory Szorc
9250e99393 streamclone: move payload header generation into own function
The stream clone data over the wire protocol contains a header line
indicating total file count and data size. In bundle2, this metadata can
be captured by a part parameter and doesn't need to be in the body.

In preparation for bundle2, have generatev1() return the raw metadata
and move the header generation to its own function.
2015-10-04 19:06:06 -07:00
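A sketch of the split described in 9250e99393: the generator returns raw metadata and a separate function formats the header line. The function names and the exact "<filecount> <bytecount>" layout are assumptions made for illustration.

```python
def generate_header_line(filecount, bytecount):
    """Format the raw metadata as a single header line (assumed layout)."""
    return b"%d %d\n" % (filecount, bytecount)

def parse_header_line(line):
    """Inverse of generate_header_line(): return (filecount, bytecount)."""
    filecount, bytecount = line.strip().split()
    return int(filecount), int(bytecount)
```

Keeping the metadata as plain integers until the last step is what would let a bundle2 part carry it in parameters instead of in the body.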
Gregory Szorc
f101d4721b streamclone: move payload header line consumption
bundle2 parts have parameters. These are a logical place for "header"
data such as the file count and payload size of stream clone data. In
preparation for supporting stream clones with bundle2, move the
consumption of the header line from the payload into
maybeperformlegacystreamclone().

Note: the header line is still being emitted by generatev1(). This will
be addressed in a subsequent patch.
2015-10-04 18:44:46 -07:00
Gregory Szorc
6868c29dd6 streamclone: teach canperformstreamclone to be bundle2 aware
We add an argument to canperformstreamclone() to return False if a
bundle2 stream clone is available. This will enable the legacy stream
clone step to no-op when a bundle2 stream clone is supported.

The commented code will be made active when bundle2 supports streaming
clone.

This patch does foreshadow the introduction of the "stream" bundle2
capability and its "v1" sub-capability. The bundle2 capability mirrors
the existing "stream" capability and is needed so clients know whether a
server explicitly supports streaming clones over bundle2 (servers up to
this point support bundle2 without streaming clone support).

The sub-capability will denote which data formats and variations are
supported. Currently, the value "v1" denotes the existing streaming
clone data format, which I intend to reuse inside a bundle2 part. My
intent is to eventually introduce alternate data formats that can be
produced and consumed more efficiently. Having a sub-capability means
we don't need to introduce a new top-level bundle2 capability when new
formats are introduced. This doesn't really have any implications
beyond making the capabilities namespace more organized.
2015-10-04 18:35:19 -07:00
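The capability / sub-capability layout described in 6868c29dd6 can be sketched as a lookup. The helper name and the dict-of-lists representation of bundle2 capabilities are assumptions made for illustration.

```python
def supports_bundle2_stream(bundle2_caps, data_format="v1"):
    """Hypothetical check: capabilities as a dict of name -> supported values."""
    return data_format in bundle2_caps.get("stream", ())
```

A new data format would then only add a value under "stream" rather than a new top-level capability.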
Gregory Szorc
77cf036fc0 streamclone: refactor canperformstreamclone to accept a pullop
This isn't strictly necessary. But a lot of pull functionality accepts a
pulloperation so extra state can be added easily. It also enables
extensions to perform more powerful things.
2015-10-04 11:50:42 -07:00
Gregory Szorc
f283944ccb streamclone: rename and document maybeperformstreamclone()
Upcoming patches will introduce bundle2 based streaming clones. Add
"legacy" to the function name and add a docstring clarifying the intent of
the function.
2015-10-04 11:34:28 -07:00
Gregory Szorc
fc2b0ba2f0 streamclone: move applyremotedata() into maybeperformstreamclone()
Future work around stream cloning will be implemented in a bundle2
world. This code will only be used in the legacy code path and
doesn't need to be abstracted or extensible.
2015-10-04 11:27:10 -07:00
Gregory Szorc
eeba469be5 branchmap: move branch cache code out of streamclone.py
This is low-level branch map and cache manipulation code. It deserves to
live next to similar code in branchmap.py. Moving it also paves the road
for multiple consumers, such as a bundle2 part handler that receives
branch mappings from a remote.

This is largely a mechanical move, with only variable names and
indentation being changed.
2015-10-03 09:53:56 -07:00
Gregory Szorc
b191f901f5 streamclone: move streamin() into maybeperformstreamclone()
streamin() only had a single consumer. And it always only ever will
because it is strongly coupled with the current,
soon-to-be-superseded-by-bundle2 functionality.

The return value has been dropped because nobody was using it.
2015-10-02 23:08:15 -07:00
Gregory Szorc
8ac7d32ad1 streamclone: refactor maybeperformstreamclone to take a pullop
Just like all the other pull steps. Consistency is good.

This seems a little excessive right now since maybeperformstreamclone is
such a short function. This will be addressed in a subsequent patch.
2015-10-04 11:20:52 -07:00
Gregory Szorc
37cd17cd86 streamclone: add explicit check for empty local repo
Stream clone doesn't work with non-empty local repositories. In upcoming
patches, we'll move stream cloning to the regular pull code path. Add an
explicit check on the repository being empty to prevent streaming clones
to non-empty repos.
2015-10-02 21:53:25 -07:00
Gregory Szorc
21ac8b474d streamclone: refactor code for deciding to stream clone
Having this in a standalone function will eventually enable bundle2 to
share code with the bundle1 code path.

While I was here, I also added some comments to add clarity.
2015-10-02 22:22:11 -07:00
Gregory Szorc
db39558e0c streamclone: move streaming clone logic from localrepo
These are the last remnants of streaming clone code in localrepo.py.

This is a mostly mechanical transplant of code to a new file. Only a
rewrite of "self" to "repo" was performed. The code will be
significantly refactored in upcoming patches. So don't scrutinize it too
closely.
2015-10-02 21:39:04 -07:00
Gregory Szorc
40483f6f59 streamclone: move _allowstream() from wireproto
While we're moving things into streamclone.py...
2015-10-02 16:24:56 -07:00
Gregory Szorc
d8e74180f0 streamclone: move code out of exchange.py
We bulk move functions from exchange.py related to streaming clones.

Function names were renamed slightly to drop a component redundant with
the module name. Docstrings and comments referencing old names and
locations were updated accordingly.
2015-10-02 16:05:52 -07:00
Gregory Szorc
e5b1fcee2d streamclone: move stream_in() from localrepo
Another basic content move. The underscore from the function name was
removed to comply with naming standards.
2015-10-02 15:58:24 -07:00
Gregory Szorc
afd8c0b560 streamclone: move applystreamclone() from localrepo.py
Upcoming patches will modernize the streaming clone code. Streaming
clone data and code kind of lives in its own world. exchange.py is
arguably the most appropriate existing location for it. However, over
a dozen patches from now it became apparent that there was a lot of code
related to streaming clones and that having it contained within its own
module would make it easier to comprehend. So, we establish
streamclone.py.

It's worth noting that streamclone.py existed a long time ago, last seen
in the 1.6 release. It was removed in 2cd3dd86758c.

The function was renamed as part of the move because its old name was
redundant with the new module name. The only other content change was
"self" was renamed to "repo" and minor grammar in the docstring was
updated.
2015-10-02 15:51:32 -07:00
Dirkjan Ochtman
2c0f8ea6a7 protocol: move the streamclone implementation into wireproto 2010-07-20 20:52:23 +02:00
Dirkjan Ochtman
ce4ed80c7a protocol: convert StreamException to generated error code
This makes it much easier to handle these errors at the transport level.
2010-07-16 22:20:19 +02:00
Nicolas Dumazet
7f1a963829 pylint, pyflakes: remove unused or duplicate imports 2010-04-14 17:58:10 +09:00
Matt Mackall
b5b825953f streaming: actually change default 2010-02-09 14:12:34 -06:00
Matt Mackall
b7afbe529a streamclone: allow uncompressed clones by default 2010-02-07 15:31:53 +01:00
Matt Mackall
595d66f424 Update license to GPLv2+ 2010-01-19 22:20:08 -06:00
Matt Mackall
3e6199cea0 Merge with -stable 2009-09-30 21:42:51 -05:00