Commit Graph

227 Commits

Author SHA1 Message Date
Durham Goode
20102e4f2b Reuse ssh connection across miss fetches
Summary:
Previously we recreated the ssh connection for each prefetch. When we were
fetching files one by one (like when we forgot to batch request files), this
added a second or more of overhead to each fetch.

This change makes us hold onto the ssh connection and simply issue new requests
over the same connection.

Test Plan:
Some of the tests execute this code path (I know because I saw them
fail when I had bugs)

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D2744688
2015-12-11 11:18:51 -08:00
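A minimal sketch of the reuse pattern described above, assuming a hypothetical `connect` factory that opens the ssh peer; it is not remotefilelog's actual fileserverclient code. The connection is opened lazily on the first fetch and handed back for every later request.

```python
class ConnectionCache(object):
    """Keep one connection open and reuse it for every fetch.

    `connect` is a hypothetical zero-argument factory standing in for
    whatever opens the ssh peer.
    """

    def __init__(self, connect):
        self._connect = connect
        self._conn = None

    def get(self):
        # Pay the connection cost only on the first request.
        if self._conn is None:
            self._conn = self._connect()
        return self._conn

    def close(self):
        # Drop the cached connection; the next get() reconnects.
        if self._conn is not None:
            self._conn.close()
            self._conn = None


# Usage: five fetches share a single (fake) connection.
opened = []


class FakeConnection(object):
    def close(self):
        pass


def fake_connect():
    opened.append(1)
    return FakeConnection()


cache = ConnectionCache(fake_connect)
for _ in range(5):
    cache.get()
```

Keeping an explicit close() matters: something still has to tear the connection down when the repo object goes away.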
Martin von Zweigbergk
1c64f784ed make changegroup.addchangegroupfiles() overriding more flexible
The method gained a parameter in hg revision 43d86cd9dae2
(changegroup: note during bundle apply if the repo was empty,
2015-12-02).
2015-12-10 17:25:14 -08:00
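One common way to make such an override tolerant of signature changes is to forward `*args` and `**kwargs` instead of spelling out every parameter. The sketch below uses a hypothetical `upstream_addfiles` in place of the real Mercurial function, whose exact signature is not given here.

```python
import functools


def upstream_addfiles(repo, source, revmap, wasempty=False):
    # Hypothetical stand-in for an upstream function that later grew
    # the `wasempty` keyword.
    return (repo, source, revmap, wasempty)


def make_override(orig):
    @functools.wraps(orig)
    def override(repo, source, *args, **kwargs):
        # Extension-specific work on `repo`/`source` would go here; every
        # other argument is forwarded untouched, so new upstream
        # parameters do not break the override.
        return orig(repo, source, *args, **kwargs)
    return override


addfiles = make_override(upstream_addfiles)
```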
Martin von Zweigbergk
7251d9b51b repo: replace repo.parents() by repo[None].parents()
repo.parents() was removed in hg revision d5d613de0f44 (commands:
inline definition of localrepo.parents() and drop the method (API),
2015-11-11).
2015-12-10 17:25:14 -08:00
Martin von Zweigbergk
8f7ee3c1b1 replace localrepo.clone() by exchange.pull()
localrepo.clone() was removed in hg revision 9996a5eb7344 (localrepo:
remove clone method by hoisting into hg.py, 2015-11-11).

Instead of localrepo.clone(), we now use exchange.pull(). However,
that method was already overridden in onetimeclientsetup(), which is
called from our new overriding of exchange.pull(). Since it should be
done first, we move that overriding from onetimeclientsetup() to
uisetup().
2015-12-10 17:25:14 -08:00
Martin von Zweigbergk
0ac1cffabd drop unnecessary "format.generaldelta=True" now that it's the default 2015-12-10 17:25:14 -08:00
Durham Goode
2b30eeb96b Fix exception when making a directory that already exists
Summary:
There was a race condition that could raise an exception when trying to
create directories that already exist.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D2736268
2015-12-10 10:11:27 -08:00
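The usual fix for this kind of race is to attempt the creation and ignore only the "already exists" error, rather than checking first. A minimal sketch using plain `os.makedirs`; the helper name is hypothetical and this is not necessarily how the commit implements it.

```python
import errno
import os
import tempfile


def makedirs_racefree(path):
    """Create `path` and its parents, tolerating a concurrent creator."""
    try:
        os.makedirs(path)
    except OSError as e:
        # Another process may have created the directory first; only
        # re-raise if the error is something else or the path is not
        # actually a directory.
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise


# Usage: the second call is a no-op instead of an exception.
target = os.path.join(tempfile.mkdtemp(), "a", "b")
makedirs_racefree(target)
makedirs_racefree(target)
```

On Python 3 the same effect is available as `os.makedirs(path, exist_ok=True)`.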
Laurent Charignon
7431e97098 remotefilelog: fix tests so that they work on devservers too 2015-12-09 16:55:05 -08:00
Eric Sumner
64aa941995 generaldelta config name change 2015-12-08 09:55:15 -08:00
Eric Sumner
9d009eb59f generaldelta now on by default 2015-12-08 09:55:10 -08:00
Durham Goode
f75037000f Make gc only inspect the last week of changes
Summary:
Previously hg gc would try to keep all files relevant to all heads in the repo.
If the repo has a lot of heads, reading the manifest for all of them and
building a massive set of all the files can be extremely slow.

Let's just keep files related to the most recent public heads.

Test Plan: Ran the tests. This improves 'hg gc' time on some repos from 2 hours to 10 minutes.

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D2733157

Signature: t1:2733157:1449558332:14bbea343600959155f5927913552304ab8f94a7
2015-12-08 09:53:33 -08:00
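The selection step can be sketched as a date filter over heads. This assumes heads are available as hypothetical (node, commit_time, is_public) tuples and uses the one-week window from the summary line; only the manifests of the surviving heads would then need to be read.

```python
import time

ONE_WEEK = 7 * 24 * 60 * 60  # seconds


def recent_public_heads(heads, now=None, window=ONE_WEEK):
    """Return the nodes of public heads committed within `window` seconds.

    `heads` is a hypothetical list of (node, commit_time, is_public)
    tuples; a real implementation would read them from the repo.
    """
    if now is None:
        now = time.time()
    cutoff = now - window
    return [node for node, ctime, public in heads
            if public and ctime >= cutoff]


# Usage: only "a" is both public and less than a week old.
NOW = 1449558332
heads = [
    ("a", NOW - 60, True),
    ("b", NOW - ONE_WEEK - 1, True),
    ("c", NOW - 60, False),
]
kept = recent_public_heads(heads, now=NOW)
```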
Laurent Charignon
c89f602b7d gcclient: guard against malformed repo paths
Before this patch, gc would stop on a malformed repo path. When this happens,
we want to know what happened and get useful debugging information.
2015-12-02 10:40:49 -08:00
Laurent Charignon
34e5ad607d gcclient: guard against corrupted or empty repofile
Before this patch, if the repofile was empty or contained bad entries, we just
crashed. This patch prevents the crash by catching the error and displaying
some useful information to debug issues.
2015-12-02 10:40:49 -08:00
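The two gcclient fixes above amount to reading the repofile defensively. A sketch under stated assumptions: the repofile is taken to hold one repo path per line, and the helper name and message wording are hypothetical rather than copied from gcclient.

```python
import os
import tempfile


def read_repo_paths(repofile, warn):
    """Return usable repo paths from `repofile`, one path per line.

    `warn` is a callback (ui.warn in Mercurial, a plain function here);
    bad entries are reported and skipped instead of aborting gc.
    """
    try:
        with open(repofile) as f:
            lines = f.read().splitlines()
    except (IOError, OSError) as e:
        warn("unable to read repofile %s: %s\n" % (repofile, e))
        return []
    paths = []
    for line in lines:
        path = line.strip()
        if not path:
            continue  # blank lines, including an entirely empty file
        if not os.path.isdir(path):
            warn("skipping malformed repo path %r\n" % path)
            continue
        paths.append(path)
    return paths


# Usage: one good path, one blank line, one bogus entry.
repo_dir = tempfile.mkdtemp()
repofile = os.path.join(tempfile.mkdtemp(), "repos")
with open(repofile, "w") as f:
    f.write("%s\n\n/no/such/repo\n" % repo_dir)
warnings = []
paths = read_repo_paths(repofile, warnings.append)
```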
Laurent Charignon
e388dd5709 localcache: don't fail on file removal if the file is not there
If another process deleted files managed by localcache, the gc step would
fail. This patch prevents the failure and adds information to help debug
the problem.
2015-12-02 10:40:49 -08:00
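The fix described above follows the familiar pattern of ignoring only ENOENT on removal. A minimal sketch; `remove_if_present` and its `warn` callback are hypothetical names, not localcache's actual API.

```python
import errno
import os
import tempfile


def remove_if_present(path, warn):
    """Unlink `path`; treat "already gone" as success and report it."""
    try:
        os.unlink(path)
        return True
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise
        # Another process deleted it first; keep going but leave a trace.
        warn("%s was already removed\n" % path)
        return False


# Usage: the second removal only warns.
fd, victim = tempfile.mkstemp()
os.close(fd)
notes = []
first = remove_if_present(victim, notes.append)
second = remove_if_present(victim, notes.append)
```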
Durham Goode
9947ff9cc6 Allow file blobs to have imperfect history
Summary:
Attempting to maintain perfect history in the file blobs has become the most
complex, bug prone, and performance hurting aspect of remotefilelog. Let's just
drop this requirement and rely on upstream Mercurial's ability to fixup linkrevs
in the face of imperfect data.

The real solution for this class of problems is to make it so that the filelog
hashes are unique with respect to the commit that introduces them, but that's a
much harder problem.

Test Plan:
Ran the tests.

Made a commit with 1000 file changes. hg commit went from 15s to 7.5s. The difference will be even more dramatic in certain situations that are known to have caused problems in the past.

Reviewers: #sourcecontrol, pyd

Subscribers: rmcelroy, pyd

Differential Revision: https://phabricator.fb.com/D2686318
2015-12-01 23:49:48 -08:00
Durham Goode
5c49e2b7e4 Change server cache collection strategy
Summary:
Previously we would keep all server cache files for any head in the repo, even
if that head was really old. This resulted in unnecessarily large server caches.

The new strategy is to keep the files necessary for any commit within the past
25,000 revs or so. Even on repos with high commit rates this equates to
multiple weeks of time.

Test Plan: Ran the tests

Reviewers: #sourcecontrol

Differential Revision: https://phabricator.fb.com/D2652542
2015-11-13 09:56:52 -08:00
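The new retention rule can be expressed as a revision cutoff. This sketch assumes revisions are numbered 0..repolen-1 as in a revlog and that the files each commit needs are available through a hypothetical `files_by_rev` mapping; anything outside the returned set would be eligible for deletion.

```python
KEEP_REVS = 25000  # the "25,000 revs or so" from the summary above


def cache_cutoff(repolen, keep=KEEP_REVS):
    """Lowest revision number that is still within the last `keep` revs."""
    return max(0, repolen - keep)


def files_to_keep(files_by_rev, repolen, keep=KEEP_REVS):
    """Union of cache keys needed by any commit in the recent window.

    `files_by_rev` is a hypothetical mapping from revision number to the
    set of cache keys that commit needs.
    """
    keepset = set()
    for rev in range(cache_cutoff(repolen, keep), repolen):
        keepset.update(files_by_rev.get(rev, ()))
    return keepset
```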
Durham Goode
eb4f7f166c Speed up log -fr master file/
Summary:
Previously, hg log -fr master file/ was very slow with remotefilelog because
Mercurial decides whether to take the slowpath (i.e. walk the changelog) or the
filelog path based on whether the filelog exists in the repo.  remotefilelog has no
way to know if the filelog exists (since there's not a full list of filelogs),
so it fakes it by returning 'True' any time mercurial asks, then when the
filelog is needed, remotefilelog walks the entire changelog to build a fake
looking filelog. Therefore mercurial attempted to take the filelog path, and
remotefilelog did a very slow walk.

The fix is to force mercurial to take the slowpath when it sees 'hg log -fr
revset file'. Technically we could take the fast path by inspecting all the
results of the revset and seeing if the file/pattern exists as a file in any of
those. But that could be expensive and complicated, so this naive fix will
suffice for now.

Test Plan: Added a test. Previously it resulted in no output

Reviewers: cdelahousse, rmcelroy, #sourcecontrol

Differential Revision: https://phabricator.fb.com/D2634918
2015-11-09 16:54:14 -08:00
Eric Sumner
14962f53b8 test: URL parameters could come in either order
This is being generated by iterating over a dict, which isn't stable.
2015-11-04 12:56:01 -08:00
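When output cannot rely on dict iteration order, comparing the parsed query instead of the raw string makes a check order-insensitive. A sketch in Python 3 using the standard `urllib.parse` module; it illustrates the idea and is not the change made to the test.

```python
from urllib.parse import parse_qs, urlsplit


def same_request(url_a, url_b):
    """True if two URLs are equal apart from query-parameter order."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return ((a.scheme, a.netloc, a.path) == (b.scheme, b.netloc, b.path)
            and parse_qs(a.query) == parse_qs(b.query))
```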
Aaron Kushner
fe561e382a Don't stack trace when getting children from thg and hg serve
Summary: thg and 'hg serve' stack trace when trying to view a file. The
correct fix is to walk back the changelog and look to see which was the
first one to touch the specific file. In the meantime, this makes the
graphical UIs usable.

Test Plan: ran tests

Reviewers: durham, rmcelroy

Reviewed by: rmcelroy
2015-10-25 14:05:14 +00:00
Aaron Kushner
11c9fd8e04 Remove what looks to be dead code
Summary: changectx is set but doesn't seem to be used.

Test Plan: ran tests

Reviewers: rmcelroy, durham
2015-10-25 15:32:58 +00:00
Augie Fackler
0b81082c8a remotefilelog: cope with rename of addchangegroupfiles to _addchangegroupfiles
This prevents remotefilelog from breaking with Mercurial 3.6.
2015-10-15 10:12:54 -04:00
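A version-tolerant lookup is one way to cope with the rename. Sketch only: the helper and the stand-in classes are hypothetical, and in a real extension the returned name would be handed to Mercurial's wrapping machinery (such as `extensions.wrapfunction`) rather than called directly.

```python
def find_addchangegroupfiles(module):
    """Return (name, function) for whichever spelling `module` provides.

    The rename to _addchangegroupfiles broke remotefilelog with
    Mercurial 3.6, so try the new name first and fall back to the old one.
    """
    for name in ("_addchangegroupfiles", "addchangegroupfiles"):
        fn = getattr(module, name, None)
        if fn is not None:
            return name, fn
    raise AttributeError("%r has no addchangegroupfiles" % (module,))


# Usage with stand-in modules (hypothetical, for illustration only).
class OldChangegroup(object):
    @staticmethod
    def addchangegroupfiles():
        return "old"


class NewChangegroup(object):
    @staticmethod
    def _addchangegroupfiles():
        return "new"
```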
Ryan McElroy
9943d04f51 make fileserverclient.close fully robust
Test Plan: ran tests

Reviewers: durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D2544243

Signature: t1:2544243:1444883084:a7b9cc9167a7671e34813826ba9fcd289919afd1
2015-10-14 18:49:55 -07:00
Ryan McElroy
f8360b4766 remotecache: unconditionally close process and pipes
Summary:
It is possible to mark the cache connection as closed but never close
the pipes, which leads to an error the next time the connection is opened for
use. Make sure we actually close and terminate everything when close is called.

Test Plan: ran the tests

Reviewers: #sourcecontrol, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D2540680

Tasks: 8712950

Signature: t1:2540680:1444841805:e9fd8f21ab370a599138bd8b0c3241543418521a
2015-10-14 08:12:48 -07:00
Durham Goode
bb8c595d67 Update to work with latest Mercurial
Upstream Mercurial has made a lot of changes around streaming clones, so we need
to update remotefilelog to handle these new changes.
2015-10-13 14:17:02 -07:00
Mathias De Maré
2ddceef9c7 cacheclient: don't forget to specify the port of the memcached server 2015-09-29 07:48:58 +02:00
Durham Goode
ca8028eb16 Add kwargs to repo.sparsematch 2015-10-06 10:07:01 -07:00
Durham Goode
4eec2c3535 Add excessive fetch logging
Summary:
We've received reports of non-batch fetches that do a ton of individual file
downloads. This patch adds logging to the blackbox for that.

Test Plan:
manually changed the code to trigger the logging and verified it came
out in the blackbox and had a warning message.

Reviewers: #sourcecontrol

Differential Revision: https://phabricator.fb.com/D2488803
2015-09-28 22:16:12 -07:00
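The logging can be sketched as a counter that emits a single warning once one-at-a-time fetches pass a threshold. The `log` callback stands in for a blackbox-style logger, and both the threshold and the message text are hypothetical, not taken from remotefilelog.

```python
class FetchCounter(object):
    """Count one-at-a-time fetches and log once a threshold is crossed.

    `log` stands in for a blackbox-style logger; the default threshold
    is a hypothetical value, not one taken from remotefilelog.
    """

    def __init__(self, log, threshold=100):
        self.log = log
        self.threshold = threshold
        self.count = 0
        self.warned = False

    def record(self, filename):
        self.count += 1
        # Log a single warning the first time the count exceeds the limit.
        if self.count > self.threshold and not self.warned:
            self.warned = True
            self.log("excessive individual fetches: %d so far, latest %s"
                     % (self.count, filename))


# Usage: ten single-file fetches against a threshold of three.
messages = []
counter = FetchCounter(messages.append, threshold=3)
for i in range(10):
    counter.record("file%d" % i)
```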
Durham Goode
e9a9bad998 Use atomic file writes for server side cache
We've gotten reports of users receiving corrupt file blobs directly from the
server. The corruption doesn't enter the cache pool, and we don't get any
further reports of it, so I think it's a transient issue caused by readers
reading the file before the writer has finished writing it.

Let's write to a temporary file and atomically rename it into place so this
can't happen.
2015-09-28 10:31:38 -07:00
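The technique is to write to a temporary file next to the target and rename it into place once complete. Mercurial ships its own helper for this (`util.atomictempfile`); the sketch below shows the same idea with only the Python 3 standard library and is not the code from this commit.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write `data` (bytes) so readers see the old file or the full new one.

    The temporary file is created in the destination directory so that
    os.replace() is a same-filesystem rename, which is atomic on POSIX.
    """
    dirname = os.path.dirname(path) or "."
    fd, tmppath = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmppath, path)
    except BaseException:
        # Never leave a half-written temporary file behind.
        if os.path.exists(tmppath):
            os.unlink(tmppath)
        raise


# Usage: overwrite a cache entry; only the final file remains.
cachedir = tempfile.mkdtemp()
entry = os.path.join(cachedir, "blob")
atomic_write(entry, b"v1")
atomic_write(entry, b"v2")
```

Note that `mkstemp` creates the file with mode 0600, so a cache shared between users may need an explicit chmod before the rename.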
Wez Furlong
6e7195b8ef Be more careful during close
I saw some crazy looking stack traces like this while testing an
improved implementation of our internal cacheprocess binary:

```
fileservice.prefetch([(self.filename, id)])
  File "/usr/lib/python2.6/site-packages/remotefilelog/remotefilelog.py", line 78, in read
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 357, in prefetch
    raw = self._read(hex(node))
  File "/usr/lib/python2.6/site-packages/remotefilelog/remotefilelog.py", line 283, in _read
    missingids = self.request(missingids)
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 196, in request
    fileservice.prefetch([(self.filename, id)])
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 357, in prefetch
    missingid = cache.receiveline()
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 105, in receiveline
    self.close()
    missingids = self.request(missingids)
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 76, in close
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 196, in request
    self.pipei.write("exit\n")
    missingid = cache.receiveline()
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 105, in receiveline
ValueError: I/O operation on closed file    self.close()

  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 76, in close
    self.pipei.write("exit\n")
ValueError: I/O operation on closed file
```

It looks like we are somehow re-entrant (maybe referenced from multiple generators?) and get tripped
up if we're not careful about checking for or catching errors during the close() method call.

So let's be a little more careful :-)
2015-09-15 07:48:14 -07:00
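The close() fixes in this log (closing pipes unconditionally, and tolerating re-entrant or repeated calls) can be combined in one pattern. A sketch, assuming the cache process is driven over stdin/stdout pipes; the `PipeClient` class is hypothetical and not remotefilelog's actual cache connection code.

```python
import subprocess
import sys


class PipeClient(object):
    """Sketch of a pipe-based cache client whose close() is idempotent.

    `proc` is assumed to be a subprocess.Popen with stdin/stdout pipes;
    the b"exit\n" command mirrors the traceback above.
    """

    def __init__(self, proc):
        self.proc = proc
        self.closed = False

    def close(self):
        # Flip the flag first so a re-entrant call returns immediately.
        if self.closed:
            return
        self.closed = True
        try:
            self.proc.stdin.write(b"exit\n")
            self.proc.stdin.flush()
        except (IOError, ValueError):
            # The pipe may already be closed or broken; keep tearing down.
            pass
        for pipe in (self.proc.stdin, self.proc.stdout):
            try:
                pipe.close()
            except (IOError, ValueError):
                pass
        self.proc.wait()


# Usage: a child that reads until EOF, then exits; close() twice is safe.
child = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdin.read()"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE)
client = PipeClient(child)
client.close()
client.close()
```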
Augie Fackler
ee90e218a4 debian: mark us as breaking hg older than 3.5, and enhancing mercurial
Doing this rather than depending on Mercurial will allow developers of
Mercurial to have this package installed without having to do
something awkward to not also get Mercurial from their distro.
2015-09-14 12:37:07 -04:00
Augie Fackler
26c3dd960f builddeb: use a + before latesttagdistance instead of -
This matches the behavior of Mercurial itself.
2015-09-14 12:34:45 -04:00
Augie Fackler
22754f44c5 debian: mark architecture as "all"
We don't have any architecture-specific files, so we can use one deb
for all platforms.
2015-09-14 12:30:35 -04:00
Augie Fackler
9a73b2d621 Makefile: add make deb rule for convenience
This matches what Mercurial has, so I figure we may as well be
consistent.
2015-09-10 11:31:42 -04:00
Augie Fackler
5495152039 contrib: new rules for building a debian package of remotefilelog
To build it, simply do `bash contrib/builddeb`, which will build a
debian package for the current host system.
2015-09-10 11:01:36 -04:00
Augie Fackler
3adaef8565 hgignore: also ignore built packages 2015-09-10 11:28:38 -04:00
Augie Fackler
07fd87a3f1 setup: correctly state license 2015-09-10 11:09:33 -04:00
Augie Fackler
83c4f04860 setup: properly list lz4 as an install_requires 2015-09-10 11:01:03 -04:00
Adam Simpkins
a93ebb8b1e remotefilelogserver: fix missing import
Summary:
_walkstreamfiles() uses mercurial.store.decodedir(), so
mercurial.store needs to be imported.

Test Plan:
Confirmed that _walkstreamfiles() no longer throws an exception when cloning a
remote shallow repository.

Reviewers: durham, pyd, rmcelroy

Reviewed By: rmcelroy

Subscribers: net-systems-diffs@, exa, yogeshwer

Differential Revision: https://phabricator.fb.com/D2409648

Signature: t1:2409648:1441245825:00a758f6f0884b77572078589f18592ca6cb6fa4
2015-09-02 19:04:33 -07:00
Durham Goode
fb7827372b Don't check datafiles if the matcher says everything is remote
Streaming clones were taking a while because apparently self.datafiles()
actually stats each .i file instead of just returning the list straight from
fncache. To fix this, let's not call datafiles() when we know the matcher is
going to reject everything anyway.

This significantly speeds up streaming clones.
2015-09-05 12:24:04 -07:00
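The shortcut is to test the cheap condition before doing the expensive work. A minimal sketch; `matches_nothing` and `local_datafiles` are hypothetical names, and a real implementation would derive the flag from the repo's shallow-file matcher.

```python
def local_datafiles(matches_nothing, datafiles):
    """Return local data files, skipping the expensive scan if possible.

    `matches_nothing` is a hypothetical flag meaning the shallow matcher
    sends every file to the server; `datafiles` is a callable that stats
    each revlog, so it is only invoked when its result could be used.
    """
    if matches_nothing:
        return []
    return list(datafiles())


# Usage: count how often the expensive scan actually runs.
scans = []


def expensive_datafiles():
    scans.append(1)
    return ["data/a.i"]


skipped = local_datafiles(True, expensive_datafiles)
scanned = local_datafiles(False, expensive_datafiles)
```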
Mathias De Maré
8ab8d2601b fileserverclient: clear error message if cachepath is not configured 2015-08-29 08:20:54 +02:00
Augie Fackler
4451faca7e test-http: save access log and make sure we actually use request batching 2015-08-21 12:51:40 -04:00
Augie Fackler
226a6f1027 fileserverclient: add config knob to control batch size
Previously we'd just send one enormous batch for everything to the
server. This led to prolonged periods of no progress output for the
user. Now we send batches in smaller chunks (default is 100) which
gives the user some idea that things are working.

Includes a trivial test, which doesn't really verify that the batching
logic is used as described, but at least prevents the boneheaded error
I had in an earlier (unmailed) version of this patch which forgot to
use configint() when loading the config setting.
2015-08-18 15:14:01 -04:00
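The chunking itself is simple; the sketch below splits the keys into batches of the default size of 100 and reports progress after each request. The commit reads the size with `configint()`, but the section and option names are not given here, so the sketch takes it as a plain parameter; `fetch_batch` and `progress` are hypothetical callbacks.

```python
def chunked(items, batchsize=100):
    """Yield successive lists of at most `batchsize` items."""
    if batchsize < 1:
        raise ValueError("batchsize must be at least 1")
    for start in range(0, len(items), batchsize):
        yield items[start:start + batchsize]


def fetch_in_batches(keys, fetch_batch, progress, batchsize=100):
    """Send one request per chunk and report progress after each one.

    `fetch_batch` and `progress` are hypothetical callbacks standing in
    for the wire request and the ui progress bar.
    """
    done = 0
    for batch in chunked(keys, batchsize):
        fetch_batch(batch)
        done += len(batch)
        progress(done, len(keys))


# Usage: 250 keys go out as three requests, with progress after each.
sent, ticks = [], []
fetch_in_batches(list(range(250)), sent.append,
                 lambda done, total: ticks.append((done, total)))
```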
Augie Fackler
06c09f03ab fileserverclient: correctly use exception constructor
We were passing one argument instead of 3.
2015-08-18 15:35:21 -04:00
Augie Fackler
51f7cac5a7 getfile: add error reporting to getfile method
Without this, the only way to report a failure of a file load in a
batched set of getfile requests is to fail the entire batch, which is
potentially painful. Instead, add our own error reporting in-band
which the client can then detect and raise.

I'm not completely happy with the somewhat ad hoc error reporting here,
but we expect our server to have at least one additional error ("not
allowed to see file contents") which will require some special
handling on our end, so we need some level of flexibility in the error
reporting protocol so we can extend it later. Sigh.

Open question: should we reserve some range of error codes so that
it's easy for strange custom servers to have related monkeypatches to
client code for custom handling of unforeseen-by-remotefilelog
conditions?

I couldn't figure out how to actually get the client to try loading
file contents over http in the test, but the get-with-headers test at
least proves that the server responses look the way I expect.
2015-08-04 14:59:53 -04:00
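An in-band scheme like the one described can be as small as a status code in front of each response. The framing here (a decimal code, a NUL byte, then the payload, with 0 meaning success) is an assumption for illustration, not remotefilelog's actual wire format.

```python
class FileFetchError(Exception):
    pass


def encode_response(code, payload):
    """Server side: prefix the payload with a status code (0 = success)."""
    return b"%d\0%s" % (code, payload)


def decode_response(raw, filename):
    """Client side: return the payload or raise for a non-zero code."""
    code_field, _, payload = raw.partition(b"\0")
    code = int(code_field)
    if code != 0:
        raise FileFetchError("error fetching %s: code %d: %s"
                             % (filename, code,
                                payload.decode("utf-8", "replace")))
    return payload


# Usage: one success and one in-band failure from the same batch.
ok = decode_response(encode_response(0, b"file contents"), "a.txt")
try:
    decode_response(encode_response(1, b"not found"), "b.txt")
    failure = None
except FileFetchError as e:
    failure = str(e)
```

Because only the first NUL is treated as the separator, binary payloads may contain NUL bytes, and new error codes can be added later without changing the framing.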
Durham Goode
5bb4351364 prefetch: add prefetching to bundle receiving
We were not prefetching the potential dependent files for the filelog revisions
we received over the wire. This resulted in a lot of non-batched downloads,
which was super slow. This fixes it by batch downloading the parents and delta
parents of the incoming filelog revisions and adds a test.
2015-07-21 18:32:33 -07:00
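Collecting every dependency before issuing a single batched request is the core of this change. A sketch under stated assumptions: the tuple layout and the `have`/`fetch_batch` callbacks are hypothetical, and rename handling is omitted.

```python
NULLID = b"\0" * 20  # Mercurial's null node id


def prefetch_for_incoming(revisions, have, fetch_batch):
    """Batch-download the parents and delta bases of incoming revisions.

    `revisions` is a hypothetical list of
    (filename, node, p1, p2, deltabase) tuples decoded from the bundle;
    `have(filename, node)` reports whether a node is already local, and
    `fetch_batch(keys)` downloads many (filename, node) keys at once.
    Copies/renames, whose parents live under another filename, are ignored.
    """
    incoming = set((fname, node) for fname, node, _, _, _ in revisions)
    missing = set()
    for fname, node, p1, p2, deltabase in revisions:
        for dep in (p1, p2, deltabase):
            key = (fname, dep)
            # Skip null parents, nodes arriving in this same bundle, and
            # nodes we already have.
            if dep != NULLID and key not in incoming and not have(fname, dep):
                missing.add(key)
    if missing:
        fetch_batch(sorted(missing))
    return missing


# Usage: X depends on a missing parent P and on Y from the same bundle.
P, X, Y = b"\1" * 20, b"\2" * 20, b"\3" * 20
revs = [("f", X, P, NULLID, Y), ("f", Y, P, NULLID, P)]
requests = []
fetched = prefetch_for_incoming(revs, lambda f, n: False, requests.append)
```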
Durham Goode
9152c8be08 fileserverclient: fix progress bar
A previous commit changed count to be a list, but missed the use of it when
being passed to progress. This fixes it.
2015-07-21 18:31:01 -07:00
Augie Fackler
5e7fb3d85f test-gc: filter wc output as suggested by check-code
wc's output is not wholly portable: BSD wc likes to prepend some whitespace.
2015-07-16 12:07:17 -04:00
Augie Fackler
ea78b4307a test-gc: work around lack of -d on BSD touch
touch -t is portable, but requires some computation to get a date
value that's a week ago. A Python one-liner is a little goofy, but
seemed like a straightforward enough answer that I chose that.
2015-07-16 12:06:37 -04:00
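The date computation behind such a one-liner can be written as follows. The function name is hypothetical; the output uses the CCYYMMDDhhmm form that POSIX `touch -t` accepts and could be inlined in a test with `python -c`.

```python
import datetime


def touch_timestamp(days_ago=7, now=None):
    """Format the time `days_ago` days back for `touch -t` (CCYYMMDDhhmm)."""
    if now is None:
        now = datetime.datetime.now()
    return (now - datetime.timedelta(days=days_ago)).strftime("%Y%m%d%H%M")
```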
Augie Fackler
30c29e5afc test-push-pull: glob out total line from ls since it's platform-specific 2015-07-16 11:57:20 -04:00
Augie Fackler
26ab790f75 fileserverclient: mark getfile as batchable
This lets clients send many getfile requests in a single transaction.

Note that this requires 76fcf62accb0 be applied to your Mercurial, or
you'll be bitten by a bug[0] in Mercurial's wireproto batching. As a
result of this change, remotefilelog now effectively requires the
upcoming Mercurial 3.5 if you want to use a specific release.

0: http://bz.selenic.com/show_bug.cgi?id=4739
2015-06-30 17:34:01 -04:00
Augie Fackler
16310f95f3 remotefilelog: introduce new getfile method
Right now, this is a naive fetch-one-file method. The next change will
mark the method as batchable and use a batch in the client so that
many files can be requested in a single RPC.
2015-06-30 17:32:31 -04:00