Commit Graph

91 Commits

Author SHA1 Message Date
Durham Goode
2b154613e2 pack: switch readers to read from file handle
Summary:
A future patch will move the pack wireprotocol to use bundle2. In this new world
we'll be given a file handle instead of a remote peer, so let's switch the
utility methods to work on a file handle instead.

Test Plan: Ran the tests

Reviewers: #mercurial, quark

Reviewed By: quark

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4924553

Signature: t1:4924553:1493050882:7a9ee8b282bf47ef393362dd0114d801dc2a68d5
2017-04-27 10:44:33 -07:00
Durham Goode
902af15d1e remotefilelog: refactor pack wireprotocol to separate file
Summary:
The pack wireprotocol will be useful for exchanging treemanifests, so let's
refactor it out to it's own file. There's a slight protocol change here, where
we terminate the response with 10 null bytes (2 for 0 length filename, 4 for 0
length data, 4 for 0 length history) instead of the original 2 null bytes (for
0 length filename) which didn't let us handle entries with '' as the name.

This pack exchange code isn't even used in production, since most remotefilelog
downloads are done via the lose file (getfile/getfiles) format.

Test Plan:
Ran the tests. Even though this code isn't used in production, the
prefetch and repack tests still cover it.

Reviewers: #mercurial, rmcelroy

Reviewed By: rmcelroy

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4860556

Signature: t1:4860556:1491853521:c3810a4a681606571354b270b957e8df0962c86a
2017-04-10 17:56:01 -07:00
Kostia Balytskyi
0c2d706810 remotefilelog: wrap lz4 imports for compatibility
Differential Revision: https://phabricator.intern.facebook.com/D4707115
2017-03-17 14:02:26 -07:00
Durham Goode
5903428735 remotefilelog: add metric logging for prefetch
Summary:
This adds ui.log() output for prefetch statistics. Extensions who hook into
ui.log() can now log this data to external metrics systems.

Test Plan:
Ran a hg prefetch with the config flags enabled, while ptailing the dev command
timer. Verified the result contained remotefilelogfetches*

```
CHGDISABLE=1 FB_HG_DIAGS=1 hg --config
extensions.remotefilelog=../fb-hgext/remotefilelog/ --config
sampling.key.remotefilelog.prefetch=perfpipe_dev_command_timers prefetch -r .~9
ptail -f perfpipe_dev_command_timers | grep durham
```

Reviewers: #mercurial, simonfar

Reviewed By: simonfar

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4711096

Signature: t1:4711096:1489591144:1c91a4fbd118a3c10c2a2c68391c9f5b0dbcedf3
2017-03-15 20:57:32 -07:00
Kostia Balytskyi
8458ee33af remotefilelog: make _getfiles step size configurable
Summary:
This makes the `_getfiles` batch size configurable. Let me know if some other config name will serve this purpose better.

**Reason for this fix**
Currently, `_getfile` will write 10000 lines of text into a pipe and only upon the success of this operation, will read file blobs from another pipe. Serving process will start writing file blobs into a second pipe as soon as it sees something in the first pipe. Second pipe's buffer will fill up as it is not read from by the client until client writes 10K file requests. 10K file requests fill the buffer of the first pipe and we have a deadlock.
Ideally, we should make client check whether it can write to the first pipe and if not, go and read from the second pipe, but that is a bigger fix.

Test Plan:
- run local tests, see them all passing
- except for `test-cstore.t`, but it fails for me without my changes as well
- this generally makes sense

Reviewers: #sourcecontrol, durham

Reviewed By: durham

Subscribers: durham, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4620152

Signature: t1:4620152:1488221258:04555177926d129c6ba41bc982ad4e913cb31b20
2017-02-27 14:45:51 -08:00
Durham Goode
4eb48da7a3 remotefilelog: back out threading change
This threading changed introduced a deadlock. I'm not sure what caused it, but
let's back it out for now.
2017-02-01 07:00:31 -08:00
Olivier Trempe
7a3f21ad7d remotefilelog: deadlock due to OS buffer filled up on large batch requests
Summary:
On Windows, the default SSH client shipped with TortoiseHg is TortoisePlink.
It is very slow and for some reason, we did not experiment deadlocks using this
client. (My unverified hypothesis is that it may have an internal buffer to
manage large requests).

Using another SSH client such as MSYS ssh.exe (shipped with git Windows
installer) dramatically speeds up tranfers (~8 times). However, using this client
yields the deadlock problem.

NOTE: I don't know why it's not a problem on linux boxes.

The fix is to issue requests from a separate thread, so we don't enter a
deadlock on the client.
2017-01-26 09:13:20 -08:00
Durham Goode
d3c6def7b8 remotefilelog: move pack file permissions to be in mutablebasepack
Summary:
Treemanifest had a bug where the pack files it created were 400 instead of 444.
This meant people sharing a cache on the same machine couldn't access them. In
most of the remotefilelog code we had set opener.createmode before creating the
pack but we forgot to for treemanifest.

This patch moves the opener creation and createmode setting into the mutable
pack class so we can't screw this up again.

Test Plan:
Tests that wrote to the packs before, now failed and had to be
updated to chmod the packs before writing to them.

Reviewers: #mercurial, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4411580

Tasks: 15469140

Signature: t1:4411580:1484321293:9aa78254677548a6dc2270c58cee0ec6f57dd089
2017-01-13 09:42:25 -08:00
Jun Wu
d3424dd296 fastannotate: work better with remotefilelog
Summary:
fastannotate will tell remotefilelog what nodes of a file is already known in
linelog + revmap, so remotefilelog will not prefetch those files. Previously,
fastannotate either prevents all prefetching or allows all prefetching, which
is sub-optimal.

fastannotate could now donate its sshpeer to remotefilelog, so remotefilelog
won't need to start another one (assuming they can share a same sshpeer,
could be turned off via config options). This should reduce run time if SSH
handshake is expensive.

fastannotate could now also steal sshpeer from remotefilelog, so fastannotate
won't need to start another one. Combined with the above change, there would
always be only one sshpeer shared by fastannotate and remotefilelog.

Test Plan: Modified existing tests

Reviewers: #sourcecontrol, durham

Reviewed By: durham

Subscribers: ikostia, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4325382

Signature: t1:4325382:1481933531:39d6344b2c076fbbff1f07997cd268e7cee25e80
2016-12-19 12:13:51 -08:00
Jun Wu
d633064ec0 remotefilelog: correct ResponseError usage
Summary:
This fixed a user report (P56867550) where the error message does not get
shown correctly, because ResponseError takes two parameters.

Also did a minor change for the first ResponseError to comply the format.

Test Plan: `arc diff`

Reviewers: durham, #sourcecontrol, simonfar

Reviewed By: simonfar

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4280775

Signature: t1:4280775:1481016196:4762161be699e5d215ec86b2d1385493d977a2b6
2016-12-05 23:02:22 +00:00
Augie Fackler
857f31c0d7 fileserverclient: avoid ever requesting nullid nodes
I keep tripping over bugs where this angers some part of our
stack. It's dumb to even be fetching these, but it's harmless to skip
them, so issue a developer warning when we encounter one and refuse to
fetch it.
2016-10-25 12:29:44 -07:00
Durham Goode
4235752853 remotefilelog: rename getpackpath to getcachepackpath
Summary:
In a future diff we will be introducing packs into .hg/store, so we need to
differentiate between cache packs and local packs. This patch renames
getpackpath to getcachepackpath. A future diff will add getlocalpackpath.

This exposed a pyflakes error, so we fix that too.

Test Plan: Ran the tests

Reviewers: #mercurial, quark

Reviewed By: quark

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4055815

Signature: t1:4055815:1477059415:e0221557bbeec6701820c826f00390d3a71cd2d3
2016-10-21 11:02:20 -07:00
Durham Goode
f43ba75915 remotefilelog: fix pyflakes and module import errors
Summary:
This fixes all the pyflaks and module errors for the main remotefilelog
code base.

Test Plan: ./run-tests.py test-check* test-remotefilelog*

Reviewers: #mercurial, quark

Reviewed By: quark

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4055537

Signature: t1:4055537:1477049663:ee904d311d17d3659e055e2c109c68c9023cfd1f
2016-10-21 11:02:09 -07:00
Durham Goode
df86b3486d treemanifest: auto create manifest pack directory
The pack/manifests directory wasn't being automatically created, so let's make
it so.
2016-09-21 13:51:39 -07:00
Durham Goode
3b7beae747 ctree: add ctreemanifest hg extension
Summary:
Adds the initial extension that sets up the ctreemanifest. It currently relies
on the fastmanifest extension to hook into all the manifest APIs to construct
ctreemanifests.

Test Plan:
In a future patch, I was able to run 'hg manifests' on a commit and
have it return the manifest contents by reading the treemanifest.

Reviewers: #fastmanifest, ttung

Reviewed By: ttung

Subscribers: ttung

Differential Revision: https://phabricator.intern.facebook.com/D3755327

Signature: t1:3755327:1472114482:0c5862cba68ed4db643d28c2fae01f33f5352970
2016-08-29 16:19:52 -07:00
Adam Simpkins
e25889f60f fix crash in "hg pull" when remotefilelog not enabled
Summary:
764cd9916c94 recently introduced code that was unconditionally checking the
repo.includepattern and repo.excludepattern attributes on a local repository
without first checking if this is a shallow repository.  These attributes only
exist on shallow repositories, causing "hg pull" to crash on non-shallow
repositories.  This crash wouldn't happen in simple circumstances, since the
remotefilelog extension only gets fully set up once a shallow repository object
has been created, however when using chg you can end up with scenarios where a
non-shallow repository is used in the same hg process after a shallow one.

This refactors the code to now store the local repository object on the remote
peer rather than trying to store the individual shallow, includepattern, and
excludepattern attributes.

Overall this code does still feel a bit janky to me -- the rest of the peer API
is independent of the local repository, but the _callstream() wrapper cares
about the local repository being referenced.  It seems like we should ideally
redesign the APIs so that _callstream() receives the local repository data as
an argument (or we should make the peer <--> local repository assocation more
formal and explicit if think it's better to force an association here).

Test Plan: Added a new test which triggered the crash, but passes with these changes.

Reviewers: ttung, mitrandir, durham

Reviewed By: durham

Subscribers: net-systems-diffs@, yogeshwer

Differential Revision: https://phabricator.intern.facebook.com/D3756493

Tasks: 12823586

Signature: t1:3756493:1471971600:9666e9c31bf59070c3ace0821d47d322671eb5b1
2016-08-23 14:14:42 -07:00
Jun Wu
372dc6028a fileserverclient: remotefilepeer could have not implemented "cleanup"
Summary:
We actually have no idea what "hg.peer" will return and should check
if it has a "cleanup" method or not before wrapping it.

Test Plan: Code Review

Reviewers: ttung, #mercurial, rmcelroy

Reviewed By: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3703201

Signature: t1:3703201:1470924859:852eaf275c89ceced285a4b74d09938e489d9ee0

Blame Revision: D3685587
2016-08-11 15:14:54 +01:00
Jun Wu
037146cbd9 Workaround an issue where sshpeer + workers lead to deadlocks
Summary:
The deadlock happens in sshpeer.cleanup:

```
  # sshpeer.cleanup
  def cleanup(self):
      if self.pipeo is None:
          return
      self.pipeo.close() # [1]
      self.pipei.close()
      try:
          # read the error descriptor until EOF
          for l in self.pipee: # [2]
              self.ui.status(_("remote: "), l)
      except (IOError, ValueError):
          pass
      self.pipee.close()
```

Notes:

  1. Normally, this closes "stdin" of the server-side ssh process so
     the "ssh" process running server-side will detect EOF and end.
     However, with workers calling "os.fork", there could be other
     processes keeping "pipeo" open thus "ssh" won't receive EOF.
  2. Deadlock happens here, if the ssh process cannot get EOF and
     does not end itself, thus does not close its stderr.

The dead lock happens with these steps:

  1. "hg update ..." starts. Let's call it "the master process".
  2. Memcache miss happens, and a sshpeer is started by fileserverclient
     pipei, pipeo, pipee get created.
  3. Workers start. They inherit pipei, pipeo, pipee.
  4. The master process wait for the workers without closing pipeo.
  5. The workers are at the "cleanup" method.
  6. The server-side "ssh" reading its stdin hangs because the master
     process hasn't close pipeo, thus no EOF to its stdin.
  7. The server-side "ssh" process never ends. It does not close its
     stderr.
  8. The workers reading from pipee never end. They never get EOF
     because the server-side "ssh" won't close its stderr.
  9. The master process waiting for workers never never completes.
     Because workers won't exit before they read all pipee.

The patch closes pipee for forked processes to address the issue.
Ideally, we want this in sshpeer.py because it could in theory
affect other sshpeer use-cases. But let's do it for remotefilelog
for now since remotefilelog is currently the only victim of this
deadlock pattern.

Test Plan:
Add some extra debugging logs. Check the wrapped `_cleanup` gets called
and things work normally.

Reviewers: #mercurial, ttung, durham

Reviewed By: durham

Differential Revision: https://phabricator.intern.facebook.com/D3685587

Tasks: 12563156

Signature: t1:3685587:1470688591:0f4f97508699b273e17df867898d65205ee52434
2016-08-08 20:13:12 +01:00
Olivier Trempe
af90628992 flogheads: return an empty list when requesting heads of a non-existing filelog 2016-08-02 10:41:41 -07:00
Olivier Trempe
8005fc4b2f fileserverclient: add wireproto command for requesting a filelog's heads
Allowing discovery of all the heads of a filelog allows supporting some existing
Mercurial use cases, like viewing all the versions of a file in a UI.
2016-08-02 10:40:42 -07:00
Olivier Trempe
a2ce732706 fileserverclient: fixed lingering ssh connection due to reference cycle on pull operations
Calling wrapfunction on the remotefilepeer(sshpeer) object in exchangepull
function introduces a reference cycle. Hence, this object will not be deleted
until the process dies. This is not a big issue for processes having a short
lifetime(e.g. lauched by command line.)
However, for persistent processes (e.g. TortoiseHg), this can lead to multiple
lingering ssh connections to the server(actually one by pull operation).

The fix is to not wrap the remotefilepeer._callstream. This method is defined
right into the remotefilepeer object. The required repo data is made available
in the remotefilepeer object by monkeypatching this object in the exchangepull
function.
2016-07-22 13:47:02 -07:00
Jeroen Vaelen
07efaadb9d [remotefilelog] use hashlib to compute sha1 hashes
Summary:
hg-crew's c27dc3c3122 and c27dc3c3122^ were breaking our extensions:

```
$ hg log -r c27dc3c3122^
changeset:   9010734b79911d2d2e7405d91a4df479b35b3841
user:        Augie Fackler <raf@durin42.com>
date:        Thu, 09 Jun 2016 21:12:33 -0700
s.ummary:     cleanup: replace uses of util.(md5|sha1|sha256|sha512) with hashlib.\1
```

```
$ hg log -r c27dc3c3122
changeset:   0d55a7b8d07bf948c935822e6eea85b044383f00
user:        Augie Fackler <raf@durin42.com>
date:        Thu, 09 Jun 2016 21:13:23 -0700
s.ummary:     util: drop local aliases for md5, sha1, sha256, and sha512
```

I did a grep over facebook-hg-rpms to see what was affected:
```
$ grep "util\.\(md5\|sha1\|sha256\|sha512\)" -r ~/facebook-hg-rpms
/home/jeroenv/facebook-hg-rpms/remotefilelog/remotefilelog/basestore.py:            sha = util.sha1(filename).digest()
/home/jeroenv/facebook-hg-rpms/remotefilelog/remotefilelog/basestore.py:                sha = util.sha1(filename).digest()
/home/jeroenv/facebook-hg-rpms/remotefilelog/remotefilelog/shallowutil.py:    pathhash = util.sha1(file).hexdigest()
/home/jeroenv/facebook-hg-rpms/remotefilelog/remotefilelog/shallowutil.py:    pathhash = util.sha1(file).hexdigest()
/home/jeroenv/facebook-hg-rpms/remotefilelog/remotefilelog/debugcommands.py:    filekey = util.sha1(file).hexdigest()
/home/jeroenv/facebook-hg-rpms/remotefilelog/remotefilelog/historypack.py:        namehash = util.sha1(name).digest()
/home/jeroenv/facebook-hg-rpms/remotefilelog/remotefilelog/historypack.py:        node = util.sha1(filename).digest()
/home/jeroenv/facebook-hg-rpms/remotefilelog/remotefilelog/historypack.py:        files = ((util.sha1(filename).digest(), offset, size)
/home/jeroenv/facebook-hg-rpms/remotefilelog/remotefilelog/fileserverclient.py:    pathhash = util.sha1(file).hexdigest()
/home/jeroenv/facebook-hg-rpms/remotefilelog/remotefilelog/fileserverclient.py:    pathhash = util.sha1(file).hexdigest()
/home/jeroenv/facebook-hg-rpms/remotefilelog/remotefilelog/basepack.py:        self.sha = util.sha1()
/home/jeroenv/facebook-hg-rpms/remotefilelog/tests/test-datapack.py:        return util.sha1(content).digest()
/home/jeroenv/facebook-hg-rpms/remotefilelog/tests/test-histpack.py:        return util.sha1(content).digest()
Binary file /home/jeroenv/facebook-hg-rpms/hg-crew/.hg/store/data/mercurial/revlog.py.i matches
/home/jeroenv/facebook-hg-rpms/fb-hgext/sparse.py:            return util.sha1(fh.read()).hexdigest()
/home/jeroenv/facebook-hg-rpms/fb-hgext/sparse.py:        sha1 = util.sha1()
/home/jeroenv/facebook-hg-rpms/fb-hgext/sparse.py:        sha1 = util.sha1()
/home/jeroenv/facebook-hg-rpms/fb-hgext/sparse.py:        sha1 = util.sha1()
/home/jeroenv/facebook-hg-rpms/fb-hgext/sparse.py:    sha1 = util.sha1()
/home/jeroenv/facebook-hg-rpms/mutable-history/hgext/simple4server.py:        sha = util.sha1()
/home/jeroenv/facebook-hg-rpms/mutable-history/hgext/evolve.py:        sha = util.sha1()
```
This diff is part of the fix.

Test Plan:
Ran the tests.
```
$MERCURIALRUNTEST -S -j 48 --with-hg ~/local/facebook-hg-rpms/hg-crew/hg
```

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.intern.facebook.com/D3440041

Tasks: 11762191
2016-06-15 15:48:16 -07:00
Durham Goode
d860baf210 cachegroup: fix pack path use of cachegroup
Summary:
The pack path logic did not use the correct unix group when
remotefilelog.cachegroup was specified. This fixes that.

Test Plan:
I manually tested it by deleting a pack dir and running repack. This
is hard to create an automated test for since the feature isn't really cross
platform, and we don't have a way to know what groups they have on their
machine.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3400756

Tasks: 11584114

Signature: t1:3400756:1465342537:ed023f6dc830117df5e85e294a41486f072714c9
2016-06-08 09:09:06 -07:00
Martin von Zweigbergk
14962d6e74 fileserverclient: make iterbatch() case work with new store
The iterbatch() handling added in f93aa99d4f1e (fileserverclient: use
new iterbatch() method, 2016-03-22) was broken by 31e88bf6faf0 (store:
change fileserviceclient to write via new store, 2016-04-04). Fix it
by copying the pattern introduced elsewhere in that change.
2016-05-18 22:39:20 -07:00
Olivier Trempe
abf6481106 windows: grp module not supported
On windows the grp module is not present, so we need to avoid importing it. This
means the shared group feature of remotefilelog is not supported on windows.
2016-05-20 08:08:57 -07:00
Durham Goode
a5ed23b7ca fileserverclient: separate data and history in prefetching
Summary:
Previously the fileserverclient logic only checked the data store when
determining whether to prefetch a given key or not. This meant that if the file
was in the data store but not the history store, it could result in a history
lookup failing to prefetch.

The fix is to separate the notions of data and history in the prefetch logic. By
default we just check the data store like before, but an optional argument
allows us to specify checking the history store as well (and we change the
remotemetadatastore to pass that argument).

Test Plan:
Ran the tests. Also ran the repack scenario in my large repo that
reproed the issue. In another diff, I'm going to come back and add a suite of
tests around the various repack permutations that I've seen cause issues.

Reviewers: #mercurial, ttung, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3277239

Signature: t1:3277239:1463085690:49ad478048cd13836b60f7ac9190e2294f5e9c64
2016-05-16 10:59:09 -07:00
Durham Goode
5bb799ed73 prefetch: add progress to pack prefetch
Summary:
Add a simple progress bar on the client when receiving a pack from the
server.

Test Plan: Ran it

Reviewers: #mercurial, ttung, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3277265

Signature: t1:3277265:1463085643:aa0960092958c4f56a6d1c3a3901348dba48aa91
2016-05-16 10:59:09 -07:00
Durham Goode
05ceb8b419 store: basic wire protocol for bundle delivery
Summary:
This adds a new wire protocol command to allow clients to request a set
of file contents and histories from the server and receive them in pack format.
It's pretty simple and always returns all the history for every node requested
(which is a bit overkill), but it's labeled v1 and we can iterate on it.

Test Plan: Added a test

Reviewers: #mercurial, ttung, mitrandir

Reviewed By: mitrandir

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3277212

Signature: t1:3277212:1463421279:459cc84265502175b47df293647aab7e7a830185
2016-05-16 10:59:09 -07:00
Durham Goode
7e1047d11f store: change union stores to accept a list of stores
Summary:
Instead of hard coding the list of stores in each union store, let's make it a
list and just test each store in order. This will allow easily adding new stores
and reordering the priority of the existing ones.

Also fix the remote store's contains function. 'contains' is the old name, and
it now needs to be getmissing in order to fit the store contract.

Test Plan: ran the tests

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Differential Revision: https://phabricator.fb.com/D3205314

Signature: t1:3205314:1461606028:3a513ac82c5de668a7e40bbf7cc88d8754e2f0bb
2016-04-26 15:10:38 -07:00
Durham Goode
9cfbf5a59e store: keep track of the writable store instead of hard coding it
Summary:
A future patch is going to change the union store to just contain an ordered
list of stores. Therefore we need a special spot to record which store is the
one that should receive writes.

Test Plan: ran the tests

Reviewers: #sourcecontrol

Differential Revision: https://phabricator.fb.com/D3205307
2016-04-26 15:10:38 -07:00
Durham Goode
859510b65e checkcode: fix fileserverclient.py
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221369

Signature: t1:3221369:1461648197:185cbbba61a9d1a7a1beacd64153185d0d0826ed
2016-04-26 13:00:31 -07:00
Durham Goode
2e93ca187a Add byte count checking when receiving from the server
Summary:
We've received a few complaints that receivemissing is throwing corrupt data
exceptions. My best guess is that we're not receiving all of the data for some
reason. Let's add an assertion to ensure all the data is present, so we can
narrow it down to a connection issue instead of actual corrupt data.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D3136203
2016-04-05 09:50:12 -07:00
Durham Goode
24323a759c store: address code review feedback
This was meant to be part of the previous stack of commits, but I pushed the
wrong stack. This patch addresses a number of code review feedback points, the
most visible being to remain 'contains' to something else (in this case
'getmissing').
2016-04-04 16:48:55 -07:00
Durham Goode
8ca8f7f6ca stores: remove fetch logic and replace with a remote store fallthrough
The old way of fetching from the server required the base store api expose a way
for outside callers to add fetch handlers to the store. This exposed some of the
underlying details of how data is fetched in an unnecessary way and added an
awkward subscription api.

Let's just treat our remote caches as another store we can fetch from, and
require that the over arching configure logic (in shallowrepo.py) can connect
all our stores together in a union store.
2016-04-04 16:26:12 -07:00
Durham Goode
29ea8ada1e store: delete the localcache class
Now that all functionality has been moved to the new store, we no longer need
the localcache class. So let's delete it.
2016-04-04 16:26:12 -07:00
Durham Goode
d70897e18c store: implement markrepo on the new store
Now that most of our storage has been moved behind the new store, let's also
move the ability to mark the repo to behind that storage abstraction.
2016-04-04 16:26:12 -07:00
Durham Goode
8ad3ce6f41 store: change fileserviceclient to write via new store
Now that we have the new store abstraction, and now that remotefilelog.py writes
via it, let's also make fileserverclient write to the store via that API.

This required some refactoring of how receive missing worked, so we could pass
the filename down, as that is required for writing to the store.
2016-04-04 16:26:12 -07:00
Durham Goode
1d97924c54 store: construct store during repo creation
We are refactoring the storage to be behind more abstract APIs. This patch
creates the new store objects on the repo and passes them to the
fileserverclient so it can add itself as a file provider, in the case of misses.
2016-04-04 16:26:12 -07:00
Kostia Balytskyi
4e61e19a3d remotefilelog: do ui.log of remotecache hit rate
Summary: We would like to utilize Martijn's logtoprocess extension to log cache hit rate.

Test Plan: None so far, will update the diff later.

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D3094765
2016-04-01 03:16:00 -07:00
Augie Fackler
9eb0009839 fileserverclient: use new iterbatch() method
This allows the client to send a single batch request for all file contents
and then handle the responses as they stream back to the client, which should
improve both running time and the user experience as far as it goes with
progress.
2016-03-22 10:06:24 -07:00
Wez Furlong
2ec314e26a remotefilelog: add separate option to validate localcache files
Summary:
We've recently had to dig into two different issues that resulted in broken
files landing in the localcache; one was due to a problem with the data source
for our cacheprocess becoming corrupt and the other was due to a failed write
(ENOSPC) causing a truncated file to be left in the local cache.

It is desirable to perform some lightweight consistency checks before we return
data up to the caller of localcache, but prior to this diff the validation
functionality was coupled to configuring a log file.

Due to the shared nature of the localcache it's not always clear cut where we
want to log localcache consistency issues, so it feels more flexible to
decouple logging from enabling checks.

This diff introduces `remotefilelog.validatecache` as a separate option that
can have three values:

* `off` - no checks are performed
* `on` - checks are performed during read and write
* `strict` - checks are performed during __contains__, read and write

The default is now `on`.

Test Plan: `./run-tests.py --with-hg=../../hg-crew/hg`

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D2941067

Tasks: 10044183, 9987694
2016-02-18 08:34:33 -08:00
Wez Furlong
fd584f7e56 remotefilelog: more graceful handling of write errors for localcache
Summary:
I debugged an issue this past week where a set of machines had exhausted the
disk space available on the partition where the local cache was situated.  This
particular tier didn't use cacheprocess, only the local cache.  There were some
number of truncated files in the local cache.

Inspecting the code here, it looks like we're using atomictempfile incorrectly.
atomictempfile.close() will unconditionally rename the temp file into place,
and we were calling this from a finally handler.

It seems safest to remove the try/finally from around this section of code and
just let the destructor trigger to clean up the temporary file in the error
path, and if we make it through writing the data, then call close and have it
move the file in to place.

Test Plan:
ran the tests.  They don't cover this case, but at least I didn't
obviously break anything:

```
 $ ./run-tests.py --with-hg=../../hg-crew/hg
...................
# Ran 19 tests, 0 skipped, 0 warned, 0 failed.
```

Reviewers: #sourcecontrol, ttung, mitrandir

Reviewed By: mitrandir

Subscribers: scyost

Differential Revision: https://phabricator.fb.com/D2940861

Tasks: 10044183

Signature: t1:2940861:1455673078:a7593d70c32151e13c8ccc31f92387e9c8cb23a0
2016-02-17 08:03:38 -08:00
Augie Fackler
afca077cf9 fileserverclient: add option to provide file path to cacheprocess
For our uses of remotefilelog, life is significantly easier if we also
have the file path rather than just a hash of the file path. Hide this
behind a config knob so users can enable it or not as makes sense.
2016-01-27 13:22:22 -08:00
Durham Goode
b1c0840594 Remove unnecessary fallbackpath arg from getfiles
This wasn't used so we can clean it up.
2015-12-11 11:20:24 -08:00
Durham Goode
20102e4f2b Reuse ssh connection across miss fetches
Summary:
Previously we recreated the ssh connection for each prefetch. In the case where
we were fetching files one by one (like when we forgot to batch request files),
it results in a 1+ second overhead for each fetch.

This changes makes us hold onto the ssh connection and simply issue new requests
along the same connection.

Test Plan:
Some of the tests execute this code path (I know because I saw them
fail when I had bugs)

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D2744688
2015-12-11 11:18:51 -08:00
Durham Goode
2b30eeb96b Fix exception when making a directory that already exists
Summary:
There was a race condition where there could be an exception when trying to
create directories that already exist.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D2736268
2015-12-10 10:11:27 -08:00
Laurent Charignon
e388dd5709 localcache: don't fail on file removal if the file is not there
If another process deletes files managed by localcache, then, the gc step would
fail. This patch prevents the failure and add interesting information to debug
the problem.
2015-12-02 10:40:49 -08:00
Ryan McElroy
9943d04f51 make fileserverclient.close fully robust
Test Plan: ran tests

Reviewers: durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D2544243

Signature: t1:2544243:1444883084:a7b9cc9167a7671e34813826ba9fcd289919afd1
2015-10-14 18:49:55 -07:00
Ryan McElroy
f8360b4766 remotecache: unconditionally close process and pipes
Summary:
It is possible to mark the cache connection as closed but never close
the pipes, which leads to an error the next time the connection is opened for
use. Make sure we actually close and terminate everything when close is called.

Test Plan: ran the tests

Reviewers: #sourcecontrol, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D2540680

Tasks: 8712950

Signature: t1:2540680:1444841805:e9fd8f21ab370a599138bd8b0c3241543418521a
2015-10-14 08:12:48 -07:00
Durham Goode
4eec2c3535 Add excessive fetch logging
Summary:
We've received reports of non-batch fetches that do a ton of invididual file
downloads. This patch adds logging to the blackbox for that.

Test Plan:
manually changed the code to trigger the logging and verified it came
out in the blackbox and had a warning message.

Reviewers: #sourcecontrol

Differential Revision: https://phabricator.fb.com/D2488803
2015-09-28 22:16:12 -07:00