Summary:
This seems similar to t9539553; the patch just updates the test output. It
looks like some bundle-related code changed in core.
Test Plan: test pass
Reviewers: ericsumner, durham
Differential Revision: https://phabricator.fb.com/D2794159
Summary:
In 4fb35d8c2105 in core, @durham removed _verify and replaced it with
verify. This patch makes remotefilelog compatible with those changes.
Test Plan: The tests were failing before but don't fail anymore with this patch
Reviewers: ericsumner
Subscribers: durham
Differential Revision: https://phabricator.fb.com/D2791847
Summary:
Historically we would move the old backup data blob to <name>+<int> so we had a
record of all the old data blobs we could search through for good commit
histories.
Since we no longer require that the data blobs have perfect commit histories,
these extra blobs just take up space.
This change makes us only store one old version (for debugging and recovery
purposes), which should save space on clients.
Also switched to atomic rename writes while we're at it.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D2770675
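The keep-only-one-old-version rotation described above could be sketched like
this (a hypothetical helper, not the actual remotefilelog code):

```python
import os

def rotatebackup(path):
    # Keep exactly one previous version: move the current blob to a
    # single fixed backup name, overwriting any older backup instead of
    # accumulating <name>+<int> copies forever.
    if os.path.exists(path):
        os.rename(path, path + ".old")
```

A caller would rotate before writing the new blob, so at most one stale copy
ever exists on disk.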
The newly added checkunknown prefetching apparently gets handed the full list of
files that are not present on disk right now, which includes all the files
outside of the sparse checkout. So we need to filter those out here.
Summary:
When running addremove, it needs to see the contents of the removed files so it
can determine if they are a rename. So we need to add bulk prefetching in this
situation.
Test Plan: Added a test
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: dcapra
Differential Revision: https://phabricator.fb.com/D2756979
Signature: t1:2756979:1450132279:668b8b160d792cad1ac37e2069716e20ea304f57
Summary:
During hg status, Mercurial sometimes needs to look at the size or contents of
a file and compare it to what's in history, which requires the file blob.
This patch causes those files to be batch downloaded before they are compared.
There was a previous attempt at this (see the deleted code), but it only wrapped
the dirstate once at the beginning, so it was lost if the dirstate object was
replaced at any point.
Test Plan: Added a test to verify unknown files require only one fetch.
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Subscribers: dcapra
Differential Revision: https://phabricator.fb.com/D2756768
Signature: t1:2756768:1450130997:7c7101efe66c998e3182dfbd848aa6b1a57d509f
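The batching pattern used in this diff and its neighbors can be sketched
abstractly; `fetchmany` and `compare` below are hypothetical placeholders, not
the real remotefilelog API:

```python
def batchedcompare(files, fetchmany, compare):
    """Batch-download all needed blobs in one request up front, then run
    the per-file comparison against the now-warm local cache instead of
    letting each comparison trigger its own one-file fetch."""
    fetchmany(files)  # one round trip instead of one per file
    return [f for f in files if compare(f)]
```

The win is entirely in the single round trip: with a 1+ second connection
overhead per fetch, comparing n files drops from n round trips to one.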
Summary:
When doing an update, Mercurial checks if unknown files on disk match
what's in memory, otherwise it stops the checkout so it doesn't cause data loss.
We need to batch fetch the necessary files from the remotefilelog server for
this operation.
Test Plan: Added a test
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: dcapra
Differential Revision: https://phabricator.fb.com/D2756837
Signature: t1:2756837:1450132288:bc0530a07ea40aaeb2af1a93e4da82778cc11369
Summary:
Previously we recreated the ssh connection for each prefetch. In cases where
we fetched files one by one (like when we forgot to batch-request files),
this resulted in 1+ second of overhead per fetch.
This change makes us hold onto the ssh connection and simply issue new requests
over the same connection.
Test Plan:
Some of the tests execute this code path (I know because I saw them
fail when I had bugs)
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D2744688
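A minimal sketch of the connection-reuse idea, with `connect` standing in for
establishing the ssh peer (all names here are illustrative, not the actual
fileserverclient code):

```python
class fileclient(object):
    """Open the (expensive) connection once and issue every request over
    it, instead of reconnecting for each prefetch."""

    def __init__(self, connect):
        self._connect = connect  # callable that establishes the ssh peer
        self._conn = None

    def _getconnection(self):
        if self._conn is None:
            # The 1+ second connection cost is paid once, not per request.
            self._conn = self._connect()
        return self._conn

    def prefetch(self, fileids):
        conn = self._getconnection()
        return [conn.request(f) for f in fileids]
```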
localrepo.clone() was removed in hg revision 9996a5eb7344 (localrepo:
remove clone method by hoisting into hg.py, 2015-11-11).
Instead of localrepo.clone(), we now use exchange.pull(). However,
that method was already overridden in onetimeclientsetup(), which is
called from our new overriding of exchange.pull(). Since it should be
done first, we move that overriding from onetimeclientsetup() to
uisetup().
Summary:
There was a race condition where there could be an exception when trying to
create directories that already exist.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D2736268
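The usual fix for this kind of mkdir race is to tolerate EEXIST; a minimal
sketch (the helper name is hypothetical):

```python
import errno
import os

def makedirs_safe(path):
    """Create path, tolerating another process creating it first."""
    try:
        os.makedirs(path)
    except OSError as ex:
        # Another process may create the directory between our check and
        # the makedirs call; that race is harmless, so swallow EEXIST.
        if ex.errno != errno.EEXIST:
            raise
```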
Summary:
Previously hg gc would try to keep all files relevant to all heads in the repo.
If the repo has a lot of heads, reading the manifest for all of them and
building a massive set of all the files can be extremely slow.
Let's just keep files related to the most recent public heads.
Test Plan: Ran the tests. This improves 'hg gc' time on some repos from 2 hours to 10 minutes.
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D2733157
Signature: t1:2733157:1449558332:14bbea343600959155f5927913552304ab8f94a7
Before this patch, if the repofile was empty or contained bad entries, we would
just crash. This patch prevents the crash by catching the error and displays
useful information for debugging such issues.
If another process deletes files managed by localcache, the gc step would
fail. This patch prevents the failure and adds useful information for debugging
the problem.
Summary:
Attempting to maintain perfect history in the file blobs has become the most
complex, bug prone, and performance hurting aspect of remotefilelog. Let's just
drop this requirement and rely on upstream Mercurial's ability to fixup linkrevs
in the face of imperfect data.
The real solution for this class of problems is to make it so that the filelog
hashes are unique with respect to the commit that introduces them, but that's a
much harder problem.
Test Plan:
Ran the tests.
Made a commit with 1000 file changes: hg commit went from 15s to 7.5s. The difference will be even more dramatic in certain situations that have been known to cause problems in the past.
Reviewers: #sourcecontrol, pyd
Subscribers: rmcelroy, pyd
Differential Revision: https://phabricator.fb.com/D2686318
Summary:
Previously we would keep all server cache files for any head in the repo, even
if that head was really old. This resulted in unnecessarily large server caches.
The new strategy is to keep the files necessary for any commit within the past
25,000 revs or so. Even on repos with high commit rates this equates to
multiple weeks of time.
Test Plan: Ran the tests
Reviewers: #sourcecontrol
Differential Revision: https://phabricator.fb.com/D2652542
Summary:
Previously, hg log -fr master file/ was very slow with remotefilelog because
Mercurial decides whether to take the slowpath (i.e. walk the changelog) or the
filelog path based on if the filelog exists in the repo. remotefilelog has no
way to know if the filelog exists (since there's not a full list of filelogs),
so it fakes it by returning 'True' any time Mercurial asks; then, when the
filelog is needed, remotefilelog walks the entire changelog to build a
fake-looking filelog. As a result, Mercurial took the filelog path, and
remotefilelog did a very slow walk.
The fix is to force mercurial to take the slowpath when it sees 'hg log -fr
revset file'. Technically we could take the fast path by inspecting all the
results of the revset and seeing if the file/pattern exists as a file in any of
those. But that could be expensive and complicated, so this naive fix will
suffice for now.
Test Plan: Added a test. Previously it resulted in no output
Reviewers: cdelahousse, rmcelroy, #sourcecontrol
Differential Revision: https://phabricator.fb.com/D2634918
Summary: thg and 'hg serve' produce a stack trace when trying to view a file.
The correct fix is to walk back the changelog and look to see which commit was
the first one to touch the specific file. In the meantime, this makes the
graphical UIs usable.
Test Plan: ran tests
Reviewers: durham, rmcelroy
Reviewed by: rmcelroy
Summary:
It is possible to mark the cache connection as closed but never close
the pipes, which leads to an error the next time the connection is opened for
use. Make sure we actually close and terminate everything when close is called.
Test Plan: ran the tests
Reviewers: #sourcecontrol, durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D2540680
Tasks: 8712950
Signature: t1:2540680:1444841805:e9fd8f21ab370a599138bd8b0c3241543418521a
Summary:
We've received reports of non-batch fetches that do a ton of individual file
downloads. This patch adds logging to the blackbox for that.
Test Plan:
manually changed the code to trigger the logging and verified it came
out in the blackbox and had a warning message.
Reviewers: #sourcecontrol
Differential Revision: https://phabricator.fb.com/D2488803
We've gotten reports of users receiving corrupt file blobs directly from the
server. The corruption doesn't enter the cache pool, and we don't get any
further reports of it, so I think it's a transient issue caused by certain
readers reading the file before the writer has finished writing it.
Let's use atomic rename writes to make this not happen.
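Atomic rename writes can be sketched as write-to-temp-then-rename within the
same directory (an illustrative helper, not the actual cache code):

```python
import os
import tempfile

def atomicwrite(destpath, data):
    """Write data to destpath so readers never observe a partial file."""
    dirname = os.path.dirname(destpath) or "."
    # Write to a temp file in the same directory (hence same filesystem),
    # then rename it into place; rename is atomic on POSIX, so readers
    # see either the old complete file or the new complete file.
    fd, temppath = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.rename(temppath, destpath)
    except Exception:
        os.unlink(temppath)
        raise
```

The temp file must live in the destination directory: a rename across
filesystems degrades to copy-plus-delete and loses atomicity.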
I saw some crazy looking stack traces like this while testing an
improved implementation of our internal cacheprocess binary:
```
fileservice.prefetch([(self.filename, id)])
File "/usr/lib/python2.6/site-packages/remotefilelog/remotefilelog.py", line 78, in read
File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 357, in prefetch
raw = self._read(hex(node))
File "/usr/lib/python2.6/site-packages/remotefilelog/remotefilelog.py", line 283, in _read
missingids = self.request(missingids)
File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 196, in request
fileservice.prefetch([(self.filename, id)])
File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 357, in prefetch
missingid = cache.receiveline()
File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 105, in receiveline
self.close()
missingids = self.request(missingids)
File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 76, in close
File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 196, in request
self.pipei.write("exit\n")
missingid = cache.receiveline()
File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 105, in receiveline
ValueError: I/O operation on closed file self.close()
File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 76, in close
self.pipei.write("exit\n")
ValueError: I/O operation on closed file
```
it looks like we are somehow re-entrant (maybe referenced from multiple generators?) and get tripped
up if we're not careful about checking for or catching issues during the close() method call.
So let's be a little more careful :-)
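Being careful here amounts to making close() idempotent and tolerant of
already-closed pipes; a hedged sketch of that shape (this mirrors, but is not,
the real cacheconnection code):

```python
class cacheconnection(object):
    """Connection wrapper whose close() may be re-entered, e.g. from an
    error path triggered while close itself is running."""

    def __init__(self, pipei):
        self.pipei = pipei

    def close(self):
        if self.pipei is None:
            return  # already closed; re-entrant call is a no-op
        try:
            if not self.pipei.closed:
                self.pipei.write(b"exit\n")
                self.pipei.close()
        except (IOError, ValueError):
            # The pipe may already be broken or closed underneath us;
            # either way the connection is done, so swallow the error
            # instead of raising "I/O operation on closed file".
            pass
        finally:
            self.pipei = None
```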
Doing this rather than depending on Mercurial will allow developers of
Mercurial to have this package installed without having to do
something awkward to not also get Mercurial from their distro.
Summary:
_walkstreamfiles() uses mercurial.store.decodedir(), so
mercurial.store needs to be imported.
Test Plan:
Confirmed that _walkstreamfiles() no longer throws an exception when cloning a
remote shallow repository.
Reviewers: durham, pyd, rmcelroy
Reviewed By: rmcelroy
Subscribers: net-systems-diffs@, exa, yogeshwer
Differential Revision: https://phabricator.fb.com/D2409648
Signature: t1:2409648:1441245825:00a758f6f0884b77572078589f18592ca6cb6fa4
Streaming clones were taking a while because apparently self.datafiles()
actually stats each .i file instead of just returning the list straight from
fncache. To fix this, let's not call datafiles() when we know the matcher is
going to reject everything anyway.
This significantly speeds up streaming clones.
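The idea can be sketched with a toy store whose datafiles() stands in for the
expensive per-file stat() walk (all names here are hypothetical, not the
actual Mercurial store API):

```python
class lazystore(object):
    """Toy store: datafiles() models the call that stats every .i file."""

    def __init__(self, files):
        self.files = files
        self.datafilecalls = 0

    def datafiles(self):
        self.datafilecalls += 1  # expensive in the real store
        return list(self.files)

def streamfiles(store, matchall):
    # matchall=False models a matcher known to reject every file (the
    # shallow-clone case); skip the expensive datafiles() walk entirely.
    if not matchall:
        return []
    return store.datafiles()
```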