We've gotten reports of users receiving corrupt file blobs directly from the
server. The corruption doesn't enter the cache pool, and we don't get any
further reports of it, so I think it's a transient issue caused by certain
readers reading the file before the writer has finished writing it.
Let's use atomic renames so this can't happen.
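A minimal sketch of the write-to-temp-then-rename idea, assuming a plain on-disk cache path (the helper name is hypothetical, not the function this patch adds):

```python
import os
import tempfile

def atomicwrite(destpath, data):
    # Write to a temporary file in the same directory, then rename it into
    # place. rename() is atomic on POSIX, so readers see either the old
    # complete file or the new complete file, never a partial write.
    dirname = os.path.dirname(destpath)
    fd, temppath = tempfile.mkstemp(dir=dirname)
    try:
        f = os.fdopen(fd, 'wb')
        try:
            f.write(data)
        finally:
            f.close()
        os.rename(temppath, destpath)
    except Exception:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(temppath):
            os.unlink(temppath)
        raise
```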
I saw some crazy-looking stack traces like this while testing an
improved implementation of our internal cacheprocess binary:
```
  File "/usr/lib/python2.6/site-packages/remotefilelog/remotefilelog.py", line 78, in read
    raw = self._read(hex(node))
  File "/usr/lib/python2.6/site-packages/remotefilelog/remotefilelog.py", line 283, in _read
    fileservice.prefetch([(self.filename, id)])
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 357, in prefetch
    missingids = self.request(missingids)
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 196, in request
    missingid = cache.receiveline()
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 105, in receiveline
    self.close()
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 76, in close
    self.pipei.write("exit\n")
ValueError: I/O operation on closed file
```
It looks like we are somehow re-entrant (maybe referenced from multiple generators?) and get tripped
up if we're not careful about checking for or catching issues during the close() method call.
So let's be a little more careful :-)
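A sketch of the kind of defensive close() this implies; the attribute names mirror the traceback above, but the guard logic is an assumption about the fix, not the patch itself:

```python
def close(self):
    # close() can be reached more than once (e.g. re-entrantly via
    # receiveline()), so tolerate an already-closed pipe instead of
    # raising "I/O operation on closed file".
    pipe = getattr(self, 'pipei', None)
    if pipe is None or pipe.closed:
        return
    try:
        pipe.write("exit\n")
        pipe.close()
    except (IOError, ValueError):
        # The cache process may already be gone, or the pipe already closed.
        pass
    finally:
        self.pipei = None
```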
Summary:
_walkstreamfiles() uses mercurial.store.decodedir(), so
mercurial.store needs to be imported.
Test Plan:
Confirmed that _walkstreamfiles() no longer throws an exception when cloning a
remote shallow repository.
Reviewers: durham, pyd, rmcelroy
Reviewed By: rmcelroy
Subscribers: net-systems-diffs@, exa, yogeshwer
Differential Revision: https://phabricator.fb.com/D2409648
Signature: t1:2409648:1441245825:00a758f6f0884b77572078589f18592ca6cb6fa4
Streaming clones were taking a while because apparently self.datafiles()
actually stats each .i file instead of just returning the list straight from
fncache. To fix this, let's not call datafiles() when we know the matcher is
going to reject everything anyway.
This significantly speeds up streaming clones.
Previously we'd just send one enormous batch for everything to the
server. This led to prolonged periods of no progress output for the
user. Now we send batches in smaller chunks (default is 100) which
gives the user some idea that things are working.
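Roughly the batching pattern described above, as a sketch; the config section/name and the callable for issuing one batch are assumptions:

```python
def prefetchinchunks(ui, requests, sendbatch, chunksize=None):
    # sendbatch: callable that issues one batch of requests to the server
    # (hypothetical; stands in for whatever the client uses to send a batch).
    if chunksize is None:
        # Note configint(), not config(): the value must be an integer.
        chunksize = ui.configint('remotefilelog', 'prefetchchunksize', 100)
    total = len(requests)
    for start in range(0, total, chunksize):
        sendbatch(requests[start:start + chunksize])
        # Report progress between chunks instead of going silent for the
        # duration of one enormous batch.
        ui.progress('prefetching', min(start + chunksize, total), total=total)
    ui.progress('prefetching', None)  # mark the progress topic finished
```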
Includes a trivial test, which doesn't really verify that the batching
logic is used as described, but at least prevents the boneheaded error
I had in an earlier (unmailed) version of this patch which forgot to
use configint() when loading the config setting.
Without this, the only way to report a failure of a file load in a
batched set of getfile requests is to fail the entire batch, which is
potentially painful. Instead, add our own error reporting in-band
which the client can then detect and raise.
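One way to picture the in-band error reporting; the status codes and the NUL-separated framing here are illustrative assumptions, not the actual wire format:

```python
from mercurial import error

def encodereply(code, data):
    # Server side: prefix each per-file reply with a status code so one
    # failed file doesn't have to fail the whole batch.
    # code 0 = success, nonzero = error; data is file contents or a message.
    return '%d\0%s' % (code, data)

def decodereply(reply):
    # Client side: detect the error code and raise for that file only.
    code, data = reply.split('\0', 1)
    if int(code) != 0:
        raise error.Abort('getfile failed: %s' % data)
    return data
```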
I'm not completely happy with the somewhat ad hoc error reporting here,
but we expect our server to have at least one additional error ("not
allowed to see file contents") which will require some special
handling on our end, so we need some level of flexibility in the error
reporting protocol so we can extend it later. Sigh.
Open question: should we reserve some range of error codes so that
it's easy for strange custom servers to have related monkeypatches to
client code for custom handling of unforeseen-by-remotefilelog
conditions?
I couldn't figure out how to actually get the client to try loading
file contents over http in the test, but the get-with-headers test at
least proves that the server responses look the way I expect.
We were not prefetching the potential dependent files for the filelog revisions
we received over the wire. This resulted in a lot of non-batched downloads,
which was super slow. This fixes it by batch downloading the parents and delta
parents of the incoming filelog revisions and adds a test.
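A sketch of collecting the parents and delta parents so they can be fetched in one batch; the shape of the incoming tuples and the prefetch call are assumptions based on the fileservice API seen elsewhere in these messages:

```python
from mercurial.node import nullid

def prefetchdependencies(fileservice, receivedrevs):
    # receivedrevs: iterable of (filename, node, p1, p2, deltabase) tuples
    # describing the filelog revisions that just arrived over the wire.
    wanted = set()
    for filename, node, p1, p2, deltabase in receivedrevs:
        for dep in (p1, p2, deltabase):
            if dep != nullid:
                wanted.add((filename, dep))
    # One batched download instead of one fetch per missing parent.
    fileservice.prefetch(sorted(wanted))
```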
This lets clients send many getfile requests in a single transaction.
Note that this requires 76fcf62accb0 be applied to your Mercurial, or
you'll be bitten by a bug[0] in Mercurial's wireproto batching. As a
result of this change, remotefilelog now effectively requires the
upcoming Mercurial 3.5 if you want to use a specific release.
0: http://bz.selenic.com/show_bug.cgi?id=4739
Right now, this is a naive fetch-one-file method. The next change will
mark the method as batchable and use a batch in the client so that
many files can be requested in a single RPC.
The way the protocol is defined for getfiles interleaves reading
filenames and sending file contents, which works fine over ssh but is
incompatible with http.
This change is probably not necessary now that remotefilelog
correctly checks for its own capability first, but it helped me debug
so I left it in for completeness.
If we instead wrap wireproto.capabilities, then our capabilities don't
get transmitted via the hello command, so not all clients will notice
the new capability unless we do the wrapping here.
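Roughly what that wrapping looks like, as a sketch assuming the usual extensions.wrapfunction pattern; the capability string is illustrative:

```python
from mercurial import extensions, wireproto

def _capabilities(orig, repo, proto):
    # Wrap _capabilities (not capabilities) so the extra capability shows
    # up both in the 'capabilities' command and in the ssh 'hello' reply.
    caps = orig(repo, proto)
    caps.append('remotefilelog')  # illustrative capability name
    return caps

def extsetup(ui):
    extensions.wrapfunction(wireproto, '_capabilities', _capabilities)
```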
Test output is in the test that previously demonstrated the
defect. Note that there's still a defect: we're advertising the
capability over http even though we have no hope of the getfiles
method working over http.
The magic string 'internal' causes Mercurial to never blame
remotefilelog for being broken. I had suspected that remotefilelog
might work with 3.4, but the tests fail against 3.4.1, so I'm just
making testedwith empty.
The rev graph building code was flawed because it didn't track second parents
correctly. This was caught when someone was developing an extension and
attempted to create a merge commit.
repo.sopener has been deprecated since hg 2.3, and repo.svfs replaces
it. Since it's been dead for so long, let's just use svfs and call it
good enough.
Summary:
The incominghook was meant to pregenerate any remotefilelog blobs that were
likely to be needed shortly. Unfortunately it actually just slows down pushes,
since in large repos the hook sometimes takes longer than the push itself.
So let's just remove it.
Test Plan: Apparently there were no tests for this :p
Reviewers: sid0, lcharignon, mitrandir, ericsumner, rmcelroy
Reviewed By: rmcelroy
Differential Revision: https://phabricator.fb.com/D2185894
Signature: t1:2185894:1435126819:e1e1125520411356eccff4baee31ab2938ebc0fe
Summary: I really don't think it should be in this list.
Test Plan: `hg`
Reviewers: durham, #sourcecontrol, rmcelroy
Reviewed By: durham, #sourcecontrol, rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.fb.com/D1997655
Signature: t1:1997655:1429189594:aa8f355a6fc61e300f824be6b2fbd64a42dde2b5
Summary:
When adjustlinkrevs got moved to the filectx upstream, we incorrectly
moved it to the remotefilectx inside remotefilelog. We don't actually use
remotefilectx on the server, so wrapping it did nothing.
The fix is to move the wrapping to be in remotefilelogserver.py so it is
executed on the server side.
Test Plan:
Did a checkout with my shallow client pointed at a full repo with no
blob cache. Verified it went quickly (minutes, instead of hours).
Reviewers: pyd
Differential Revision: https://phabricator.fb.com/D2097851
Summary:
Since we only prefetch things that are in the sparse checkout, copy tracing
(which touches everything in the manifest diff) would do individual file
downloads for every file. Let's just remove those files from the copy tracing
check entirely since the user probably doesn't care if they're outside the
sparse checkout.
Test Plan: Added a test
Reviewers: sid0, rmcelroy, lcharignon, pyd
Differential Revision: https://phabricator.fb.com/D2083768
Summary:
Match with the latest version of core to pass the test.
There were a couple of changes in core that broke the extension; I matched
those changes to make the test pass.
Test Plan: The tests are all passing
Reviewers: durham
Differential Revision: https://phabricator.fb.com/D2053958
Upstream now has a matcher on _computeforwardmissing which will allow us to only
prefetch the necessary parts of a sparse checkout.
Since we're now handed an iterator, we need to convert it to a list
because we both iterate over it and return it.
Summary:
Previously remotefilelog would prefetch every file in a commit. With the sparse
checkout extension we want to only prefetch things in the sparse checkout.
This commit makes remotefilelog aware of the possible existence of a sparse
matcher.
Test Plan: Added tests
Reviewers: sid0, rmcelroy, pyd, lcharignon
Subscribers: kang
Differential Revision: https://phabricator.fb.com/D1967207
Summary:
Per @pyd's review of D1933267, we need to check for the linknode in cl.nodemap,
not in cl (whose __contains__ method only looks for revs and doesn't even check
for visibility... lolz).
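The distinction in code, as a tiny sketch (the helper name is hypothetical):

```python
def linknodeexists(repo, linknode):
    # Wrong: 'linknode in repo.changelog' goes through __contains__, which
    # expects a rev number, so it asks the wrong question entirely.
    # Right: membership of a binary node is a nodemap lookup.
    return linknode in repo.changelog.nodemap
```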
Test Plan: ran tests
Reviewers: durham, sid0, pyd, ericsumner, lcharignon, davidsp, mitrandir
Reviewed By: mitrandir
Subscribers: akushner, daviser, pyd
Differential Revision: https://phabricator.fb.com/D1934941
Tasks: 6573011
Signature: t1:1934941:1427130649:b084635db9bfcd28c4d4a1bcf12a7500c06b323c
Summary:
The new version of adjust linknodes wasn't accounting for the fact that some
ancestries contained nodes that no longer exist. Check for that before looking
for common ancestors.
The old version of this code survived by luck. We were catching KeyErrors as one
base case, and it just happens that LookupError from the changelog is also a
KeyError, so it was getting caught and eaten.
Test Plan:
We should probably add a test, but I have to leave shortly and this is pretty
broken, so we'll have to take a rain check.
Reviewers: rmcelroy, pyd, sid0
Differential Revision: https://phabricator.fb.com/D1933267
Summary:
The new fixmappinglinknodes function was using recursion to traverse the file
history, but this would break for files with history that was extremely long
(stack overflow). Switch to using a manual stack approach.
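The general shape of the recursion-to-explicit-stack conversion; this is an illustrative traversal, not the actual fixmappinglinknodes logic:

```python
from mercurial.node import nullid

def walkfilehistory(filelog, startnode, visit):
    # Depth-first walk over file history using an explicit stack, so very
    # long histories can't blow Python's recursion limit.
    stack = [startnode]
    seen = set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        visit(node)
        for parent in filelog.parents(node):
            if parent != nullid:
                stack.append(parent)
```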
Test Plan: Ran the tests (I'd added a test to cover this logic before).
Reviewers: sid0, davidsp, mitrandir, lcharignon, pyd, rmcelroy
Reviewed By: rmcelroy
Subscribers: michaelbarton
Differential Revision: https://phabricator.fb.com/D1931944
Signature: t1:1931944:1426884986:3a0ef144fb55b8c0533e5c5de90699a1823b891f
Summary: I'm going to add a new parameter upstream. Make this more generic so that we don't have to try to support both the old and the new versions.
Test Plan: Ran tests with both old and new hg.
Reviewers: davidsp, rmcelroy, akushner, pyd, daviser, mitrandir, ericsumner, durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D1920172
Signature: t1:1920172:1426615175:d90bda3b3cc30f6e5f3149af82ae9e43dee39455
Summary:
Previously remotefilelog did not produce all the necessary local data blobs
when doing a peer push/pull if the incoming changegroup had two manifests
that referred to the same file revision. We would only create a file blob
containing the history for the first occurrence, then if the user tried to
access the file history for other occurrences they got an exception.
The fix is to add linkrev fixup logic, similar to the adjustlinkrev() method
from core Mercurial's filectx. Now, if no valid local file blob can be found, we
will compute a valid history by reading the changelog.
We might be able to write this data to disk in the future as well to prevent
having to repeatedly compute this.
Test Plan: Added a test
Reviewers: sid0, rmcelroy, pyd, mitrandir, lcharignon
Differential Revision: https://phabricator.fb.com/D1904453
Summary:
For hg-git conversions we're going to create commits without actually updating to the base. Currently, this causes lots of individual fetches.
The test demonstrates the issue -- without this patch it fetches the 2 files over 2 fetches, but with it, it fetches them in 1 fetch.
Test Plan: Ran the tests.
Reviewers: davidsp, rmcelroy, akushner, pyd, daviser, mitrandir, ericsumner, durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D1893721
Tasks: 6390769
Signature: t1:1893721:1425624679:5651f71d5023919e9321646275b681b573847c44
Upstream has refactored the copy logic to compute the file lists in separate
functions, so we no longer need to compute the file lists ourselves.
Update the README's Mercurial min-version since this change depends on new APIs
inside Mercurial.
Summary:
Upstream has moved _adjustlinkrev from being a global function to one
on the filectx. Let's do the same.
Test Plan: Ran the tests
Reviewers: mitrandir
Differential Revision: https://phabricator.fb.com/D1825043
Summary:
adjustlinkrev makes ancestor reading orders of magnitude slower,
so we need to avoid using it. Since adjustlinkrev already returns the linkrev in
certain cases, let's just force it to always return that during file blob
creation.
Test Plan:
Generated a few thousand blobs for www and fbcode using the old and new
methods and verified that they were byte-for-byte identical.
Reviewers: sid0, pyd, mpm, rmcelroy
Differential Revision: https://phabricator.fb.com/D1782400
Summary:
If the remotefilelog server was not specified in the hgrc, or if the project
hgrc wasn't trusted, it would throw an obtuse error about a NoneType string.
This fixes it to give a more informative error explaining the problem.
Test Plan: Added a test
Reviewers: sid0, pyd, mitrandir, ericsumner, rmcelroy
Reviewed By: rmcelroy
Differential Revision: https://phabricator.fb.com/D1774743
Signature: t1:1774743:1420830544:5122a8e11f668ee8c35996e0f4395883a31ce8b0
Summary:
There are reports of the local cache becoming invalid when stored on disk. This
adds an option that will do some basic validation and remediation for those
entries, and log some data to disk.
This is optional, since it incurs some performance overhead. We just want to use
it long enough to track down the issue.
Test Plan: Added a test
Reviewers: sid0, pyd, ericsumner, rmcelroy, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.fb.com/D1774724
Signature: t1:1774724:1420827432:06ace9d1dc078f469e0f61ebd7f604fc3b606f6d
Summary:
We've gotten reports of corrupt cache files, and the error message is pretty
obtuse (ValueError for converting a string to an int). This refactors the size
check into a function and provides a better error message.
Test Plan: Added a test
Reviewers: sid0, pyd, mitrandir, ericsumner, rmcelroy
Reviewed By: rmcelroy
Differential Revision: https://phabricator.fb.com/D1774721
Signature: t1:1774721:1420830671:afd54dde8fdc00e08ed1c6cb73bf9fdc7fac2327
Summary: We were forgetting to pass these arguments on to the child function.
Test Plan: Visual inspection.
Reviewers: durham, davidsp, rmcelroy, akushner, pyd, daviser, mitrandir, ericsumner
Reviewed By: ericsumner
Differential Revision: https://phabricator.fb.com/D1773782
Signature: t1:1773782:1420765574:d73be08ab25265e4769d8bf70671f2ea1c13f8dd
Mercurial upstream does some fancy stuff inside introrev now to provide the
correct introrev. It relies on having the filelog though, so we need to avoid
it. Remotefilelog has perfect history knowledge, so we can just return the
correct linkrev.
Summary:
We're seeing some weird cache corruption errors when writing the cache to disk.
My best guess is that multiple writes are colliding and causing bad data, so let's
do atomic renames.
Test Plan: Ran the test suite
Reviewers: sid0, pyd, davidsp, rmcelroy
Reviewed By: rmcelroy
Subscribers: ericsumner, mitrandir
Differential Revision: https://phabricator.fb.com/D1747190
Signature: t1:1747190:1418865586:0a07e5243dfe9c1d5ea24f81874910d1080f24e2
It is part of the revlog API and some extensions like tortoisehg rely on it. The
default implementation is the same as size, so we can safely mimic that here.
A recent fix to make ancestor maps work with changeset evolution actually caused
a pretty serious regression. The ancestormap validation code was returning
ancestormaps with hidden ancestors if the first commit in the history was a
hidden node. This resulted in lots of invalid ancestries being returned.
Instead we only want to allow hidden ancestors in the map if the relativeto
commit has been explicitly set to a hidden node.
Summary: Last bits needed to get remotefilelog over bundle2 working. Includes tests.
Test Plan: Ran tests, including with `--extra-config-opt experimental.bundle2-exp=True`
Reviewers: davidsp, akushner, pyd, rmcelroy, daviser, durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D1671738
Tasks: 5568731
Signature: t1:1671738:1415676482:b9e7a1f308919526b0c41fee54d89da876518ec7
Certain filectx constructions used a rev number for self._changeid. We
need to convert that to a node before using it. This was breaking blame. I've
now added a blame test too.
Bundlerepos work by providing a fake revlog layer above an existing revlog.
Since remotefilelog doesn't use revlogs for filelogs, bundlerepos did not work.
This commit fixes it such that you can now hg pull from a bundle, as long as
that bundle is shallow (i.e. contains no file contents). This will work for the
common use case of trying to recover data from .hg/strip-backups.
For reference, shallow bundles don't contain any file data because we never
delete any file data from .hg/store/data when using remotefilelog, even after
the commits have been stripped.
Upstream Mercurial has moved localrepo.pull into exchange.pull. This moves our
wrapping of that command out of shallowrepo and into __init__. Exchange is
becoming an increasingly important class, so we may want to think about moving
all exchange wrapper logic out to a separate module in remotefilelog.
repo.revs() no longer returns an object that can be indexed, so we can't use []
on it anymore; let's call list() on it first.
The bookmark output from upstream Mercurial has also changed, so we need to
update the tests.
Summary:
When doing 'hg uncommit foo.txt' with Changeset Evolution enabled, uncommit will
first prune the commit, then try to read the filelog history to determine if any
renames need to be undone. Since the commit is now pruned, remotefilelog fails
to find any valid histories.
This fixes it to allow hidden histories if the filectx commit is hidden. It
also tweaks remotefilectx to produce commit-relative histories when possible,
which will result in more accurate histories.
Test Plan:
Ran hg uncommit in the evolve repo that had problems before. Verified
it now worked.
Reviewers: pyd, sid0
Differential Revision: https://phabricator.fb.com/D1587306
Summary:
Pull-prefetch would not download file versions from the server if the file
version already existed in the local cache or the local store data.
Unfortunately, if someone landed their commit, then later stripped their local
version, the local store data file version might become invalid and no local
cache version would exist, meaning things like 'commit' might fail when offline.
This changes prefetch to always fetch from the server when dealing with files it
knows are from revs on the server.
Test Plan:
Added a test that makes local commits that already exist on the
server, and verifies that a pull-prefetch fetches the server file version,
despite that same version existing locally.
Reviewers: sid0, pyd, davidsp
Subscribers: orip
Differential Revision: https://phabricator.fb.com/D1607260
Summary:
The ResponseError exception expects a second argument. Otherwise the code
handling it crashes.
Test Plan: The handling of the response error stops crashing.
Reviewers: durham
Differential Revision: https://phabricator.fb.com/D1581574
Summary:
If the orig function crashes before the fileservice is installed, the finally
clause explodes, shadowing the original error. This fixes that.
Test Plan:
Crashes are no longer shadowed by a crash in the finally clause.
Reviewers: durham
Differential Revision: https://phabricator.fb.com/D1581562
Summary: API change
Test Plan: @durham ran an amend.
Reviewers: durham
Reviewed By: durham
Subscribers: durham
Differential Revision: https://phabricator.fb.com/D1569510
Summary:
Upstream Mercurial changed the way merging works and added
revlog.commonancestorsheads. This changes remotefilelog to implement the same
API.
Previously we were able to use ancestors.genericancestors to do the graph
traversal. Upstream Mercurial has deleted that function though (since it is now
unused), so remotefilelog must now build a temporary rev graph in order to use
the ancestors.* apis.
Test Plan: Added a test. It failed without the fix, it passes with the fix.
Reviewers: sid0, davidsp, pyd
Differential Revision: https://phabricator.fb.com/D1566787
Summary: This was broken by recent changes.
Test Plan: Ran test suite.
Reviewers: durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D1558890
Tasks: 5170539
Summary:
With recent versions of Mercurial (>= 3.2, 4dfcf21a6aa7), revert uses status
information to determine the files that need to be touched. It then offers a
simple handle for extensions that need to prefetch.
Test Plan:
Ran the tests. Certain tests depended on the old revert behavior (of
prefetching everything), so they required slight changes.
Reviewers: pyd, sid0, davidsp
Differential Revision: https://phabricator.fb.com/D1551059
Summary:
Changegroups have been refactored upstream and we need to update our
remotefilelog monkey patching accordingly.
Also fix an issue with the tests where 'function foo()' was not considered valid
on certain systems.
Test Plan: Ran the tests
Reviewers: pyd, sid0, davidsp
Differential Revision: https://phabricator.fb.com/D1551019
Summary:
Previously, if pullprefetch was set, we'd perform a prefetch of the
entire manifest of the specified revs (usually the public bookmarks). This
involved stat-ing all the relevant files in the cache to see if they already
existed, which added an extra 6 seconds or so to every pull.
Now we only prefetch the files that are different from our working copy. We
assume we already have all the files that are in our working copy. This reduces
the pullprefetch overhead significantly.
Test Plan:
Did a pull on my laptop. Verified it didn't hang for 6 seconds at the
prefetch stage. Also updated a test
Reviewers: davidsp, pyd, sid0
Reviewed By: sid0
Differential Revision: https://phabricator.fb.com/D1505841
Tasks: 4608894
Summary:
Previously, pullprefetch was executed during the repo.pull stage. This happens
before the bookmarks have been moved, so revsets like 'bookmark()' would
prefetch the wrong commits.
This change moves the pullprefetch logic to after the pull command is completely
finished. Updated a test to make sure this is caught.
Also fixes a bug where we were using linkrevs to read a manifest rev entry. We
should be using the manifest rev instead.
Test Plan: Added a test. Ran it.
Reviewers: sid0, pyd, davidsp
Differential Revision: https://phabricator.fb.com/D1483345
Summary: These commands (well, not the debug one) were visible in the shortlist that showed up when you type `hg`. They're not basic commands.
Test Plan: Ran `hg` with the extension enabled, didn't see those commands.
Reviewers: durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D1454931
Summary:
Due to a change in upstream mercurial, hg log with patterns was no longer
working. This fixes it by forcing hg log to take the slow path when using
patterns.
It also updates the warning messages to work when running hg log <file> from
within a subdirectory.
Test Plan: Ran the new tests
Reviewers: sid0
Differential Revision: https://phabricator.fb.com/D1450193
Summary:
Adds a remotefilelog.pullprefetch config option that accepts a revset. Whenever
a pull is run, the revs matched by that revset will be prefetched. The most
common value for this will be '(bookmark() + heads(all())) & public()', since it will download
almost everything necessary to work offline.
Test Plan: Added a test. Ran it.
Reviewers: davidsp, pyd, sid0
Reviewed By: sid0
Differential Revision: https://phabricator.fb.com/D1419420
Summary:
Expands environment variables in the cacheprocess and cachepath config options,
so users can specify something like remotefilelog.cachepath=$HOME/.hgcache
Test Plan:
Set my cachepath to $HOME/.hgcache on my laptop and manually
performed a shallow clone. Verified data was put in ~/.hgcache
Reviewers: sid0
Differential Revision: https://phabricator.fb.com/D1342174
Summary: Pulling from a local non-remotefilelog repo to a remotefilelog repo was broken. This fixes it.
Test Plan: `hg pull` from a local non-remotefilelog repo to a remotefilelog repo.
Reviewers: durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D1341059
Summary:
Recent changes to upstream Mercurial have moved localrepo.getbundle and
localrepo.addchangegroupfiles to changegroup.py. remotefilelog wraps these
functions, and thus needs to be updated.
Applyupdate also had a function signature change, which is fixed here.
Minor fix to a test as well, which had a hard coded time instead of a glob.
Test Plan: ./run-tests.py --with-hg=/data/users/durham/hg/hg
Reviewers: sid0, davidsp, pyd, dschleimer
Differential Revision: https://phabricator.fb.com/D1260737
Previously shallow clones only worked using the streaming clone protocol. With
this change they work for the standard getbundle protocol as well. This is what
the majority of Mercurial users use, so we need to support that.
The current local cache is just files on disk, and this implementation detail
was spread across the extension. This change refactors it to hide the
implementation inside a class so that we can replace it with other
implementations (such as a sqlite local cache) later.
Previously the file service client was a global object that all repos could
share. This was a bit hacky and is no longer needed. Now the file service
client exists per repo instance.
This is part of a series of changes to abstract the local caching and remote
file service in such a way that we can plug and play implementations.
If the memcache process exited early, remotefilelog was throwing an exception
instead of falling back to the server. This change makes it fall back to the
server, and also print a warning that the cache connection closed early.
Summary:
hg bundle was producing shallow bundles. This change makes it produce full
sized bundles so they can be used in other repos.
Test Plan: Added a test
Reviewers: sid0
Reviewed By: sid0
CC: keegancsmith
Differential Revision: https://phabricator.fb.com/D1167462
Summary:
Previously requesting remotefilelog file blobs from the server required write
access in order to write the blob to the cache. This changes it to not abort
entirely if the user doesn't have write access.
Test Plan:
cd tests
./run-tests.py --with-hg=/data/users/durham/hg/hg test-permissions.t
Also ran the test without the fix and verified it fails.
Reviewers: sid0, davidsp, pyd, dschleimer
Reviewed By: dschleimer
Differential Revision: https://phabricator.fb.com/D1145976
Task ID: 3601184
Summary:
Adds a 'hg prefetch' command to remotefilelog for prepopulating the
local cache. Supports specifying revsets and file patterns to limit what is
downloaded.
Test Plan: ./run-tests.py test-prefetch.t --with-hg=/data/users/durham/hg/hg
Reviewers: dschleimer, sid0, davidsp, pyd, mpm
CC: kunalb, minyoung
Differential Revision: https://phabricator.fb.com/D1129942
The alternate lookup code was mistakenly looking for only the last digit
instead of looking at the entire prefix. This meant files with more than 10
alternates would start failing to find histories, which breaks rebase.
When falling back to the master server for cache misses, we only kept two
requests in flight at any time. Over high latency connections (like across
oceans) this resulted in very slow downloads.
This change increases the request size to 10,000 keys at once. This will keep
the size of the request lower than the tcp buffer size, while allowing us to
maximize our throughput.
Previously we sent the entire list of files to the fallback repo in a single ssh
write/flush. If the size of this write exceeded the tcp buffer on the receiving
end, the call would hang until the buffer had room. The problem is that the
receiving end (the server) is hung trying to send data back to the
client. Therefore it deadlocked.
The fix is to send and receive requests one at a time. We always have the next
request in flight while receiving so we shouldn't be waiting on requests too
often.
Enables specifying a name for a repo that is used in the cache key.
This allows multiple repos on a machine to share a cache without the
risk of keys overlapping.
Previously we used a global variable to track if the incoming connection was
from a shallow remote (based on whether the network command was a *_shallow command).
This is hacky and overall a bad idea. The new implementation stores the shallow
flag as a bundlecapability passed to the getbundle command.
A side effect of this is remotefilelog won't work with versions of mercurial
that don't use the getbundle command.
The previous algorithm thought that if the system cache had the file rev, it was
guaranteed to be valid. This isn't true in the case of a machine in which
multiple people share the cache (one person may have pulled a rev but the other
hasn't).
The new algorithm is more explicit. It checks, in order (see the sketch after this list):
- system cache
- local cache
- local cache fallbacks
- remote cache
- master server
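A sketch of that lookup order; the per-source get() API is a hypothetical stand-in for however each store is actually queried:

```python
def lookupblob(sources, filename, node):
    # sources: (systemcache, localcache, localcachefallbacks,
    #           remotecache, masterserver), in that order.
    for source in sources:
        data = source.get(filename, node)  # hypothetical lookup API
        if data is not None:
            return data
    raise KeyError('%s@%s not found in any store' % (filename, node))
```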
Adds a cache client implementation using the opensource python-memcached
library. It's more of an educational example than a production-ready one since
it doesn't perform the requests asynchronously. It does however split up large
files into smaller chunks for you.
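A sketch of the chunking idea using python-memcached; the chunk size and key scheme are assumptions:

```python
import memcache

CHUNKSIZE = 900 * 1024  # stay under memcached's default 1MB value limit

def setblob(client, key, data):
    # client: e.g. memcache.Client(['127.0.0.1:11211'])
    # Split large blobs into numbered chunks, plus a count entry so the
    # reader knows how many pieces to reassemble.
    chunks = [data[i:i + CHUNKSIZE] for i in range(0, len(data), CHUNKSIZE)]
    client.set(key + ':count', len(chunks))
    for i, chunk in enumerate(chunks):
        client.set('%s:%d' % (key, i), chunk)

def getblob(client, key):
    count = client.get(key + ':count')
    if count is None:
        return None
    parts = [client.get('%s:%d' % (key, i)) for i in range(count)]
    if any(p is None for p in parts):
        return None  # treat a missing chunk as a cache miss
    return ''.join(parts)
```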
The remotefilelog extension currently doesn't work with tags. Adding include and
exclude patterns allows users to specify which files they want to treat as
shallow and which they want to download the entire history for. By excluding
.hgtags from being shallow, this enables tags to work in a mostly shallow repo.
This also enables largefile like scenarios where most files are full and only a
few large ones are kept remote.
A rare bug can occur where the local file blob might not exist, but a valid old
version of that blob does exist. This refactors the linknode logic in ancestormap
to check the old versions if the server fetch fails to find the blob.
It still prints an ugly warning message from the server, but this whole issue is
quite rare anyway.
When the cache is stored on a filesystem, excessive stat calls can slow
mercurial updates down dramatically. This reduces it to a single open call for
the cache location and if that fails, a single open call for the local location.
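The open-instead-of-stat pattern, sketched with illustrative paths and a hypothetical helper name:

```python
def readblob(cachepath, localpath):
    # Skip the exists()/stat() dance: just try to open the cache copy and
    # fall back to the local copy on failure. One open call per location, max.
    for path in (cachepath, localpath):
        try:
            f = open(path, 'rb')
        except (IOError, OSError):
            continue
        try:
            return f.read()
        finally:
            f.close()
    return None
```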
A workingctx produces manifest entries with nullid+'a' or nullid+'m'
for any added or modified files. The extension was trying to prefetch
these but they didn't exist and caused an error. Luckily they are length
42 so we can check for them and not prefetch them.
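A sketch of that filter; the 42-character check comes straight from the text above, while the helper and the shape of the input pairs are hypothetical:

```python
def filterprefetchable(fileids):
    # A workingctx manifest reports added/modified files with an 'a' or 'm'
    # marker appended to a null node. Per the note above, those show up here
    # as 42-character ids that don't correspond to real file revisions, so
    # don't try to prefetch them.
    return [(filename, fileid) for filename, fileid in fileids
            if len(fileid) != 42]
```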