Commit Graph

598 Commits

Author SHA1 Message Date
Durham Goode
c429030ca4 store: add class definitions and stub for repack
Summary:
This introduces the high level classes that will implement the generic repack
logic.

Test Plan: Ran the repack in conjunction with later commits that use these apis.

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249577

Signature: t1:3249577:1462225435:000f9cc29ae2a3d7fdbedf546c8936ef45d1e4cf
2016-05-03 12:32:35 -07:00
Durham Goode
b049a0910a store: datapack fix perf issue
Summary:
Using range() allocates a full list, which is 2**16 entries in the fanout case.
Let's use xrange instead. This is a notable performance win when checking many
keys.

Also removed an unused variable and use index instead of self._index since this
is a hotpath.
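
The allocation cost described above can be illustrated with a small sketch. It assumes Python 3, where `range()` is lazy in the way Python 2's `xrange()` was, so `list(range(...))` stands in for the eager Python 2 `range()`. The name `FANOUT_ENTRIES` is illustrative and not taken from the datapack code.

```python
import sys

# Size of the fanout table mentioned in the summary above.
FANOUT_ENTRIES = 2 ** 16

# Lazy sequence: behaves like Python 2's xrange(), no per-entry allocation.
lazy = range(FANOUT_ENTRIES)

# Eager list: reproduces what Python 2's range() allocated on every call.
eager = list(range(FANOUT_ENTRIES))

lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)
```

Comparing `lazy_size` with `eager_size` shows why building the list on every key check is wasteful in a hot path.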

Test Plan: Ran hg repack

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249563

Signature: t1:3249563:1462240834:c19d6cbf0b6237f15ca8d81e8da856752df0ec59
2016-05-03 12:30:44 -07:00
Durham Goode
1704e5c8fb store: add tests for historypack
Summary:
This adds a basic test suite for the historypack class, and fixes some issues it
found.

Test Plan: ./run-tests.py test-historypack.py

Reviewers: mitrandir, rmcelroy, ttung, lcharignon

Reviewed By: lcharignon

Differential Revision: https://phabricator.intern.facebook.com/D3237858

Signature: t1:3237858:1461884966:c0ec90a2735255e5ef70eade09915066a7b71ee5
2016-04-28 17:37:03 -07:00
Durham Goode
8f4d83edeb shallowbundle: fix broken fallback orig call
This was caught by tests running in an unusual configuration
2016-04-28 17:34:08 -07:00
Durham Goode
22948ce7e1 checkcode: add check code test
Summary: Adds the same check code test that upstream Mercurial uses.

Test Plan:
Ran it, and fixed all the failures. I won't land this commit until
all the failure fixes are landed.

Reviewers: #sourcecontrol, ttung, rmcelroy, wez

Reviewed By: wez

Subscribers: quark, rmcelroy, wez

Differential Revision: https://phabricator.intern.facebook.com/D3221380

Signature: t1:3221380:1461802769:19f5bdc209c05edb442faa70ae572ce31e2fbc95
2016-04-28 10:18:47 -07:00
Durham Goode
29d3dda67e checkcode: fix various store files
Summary: Fix check code for various store related files

Test Plan: Ran the tests

Reviewers: #sourcecontrol, mitrandir, ttung

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3222465

Signature: t1:3222465:1461701300:34560288be4dc921f0252d4ad8fdc9c8d9357e23
2016-04-27 16:49:33 -07:00
Durham Goode
98fd33f8cb store: add missing imports
Summary: These were missing, and only needed in exception cases.

Test Plan: nope

Reviewers: #sourcecontrol, rmcelroy, ttung

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3219749

Signature: t1:3219749:1461608742:91e3a721e78188c52431b6c5d1b3ad091e249c3a
2016-04-27 16:49:30 -07:00
Durham Goode
f92668636b store: add historypack store that reads histpack files from .hg/store/packs
Summary:
Now that we can read and write histpack files, let's add a store implementation that
can serve packed content.

My next set of commits (which haven't been written yet) will:
- add tests for all of this

Test Plan:
Ran the tests. Also repacked a repo, deleted the old cache files,
ran hg log FILE, and verified it produced results without hitting the network.

Reviewers: #sourcecontrol, ttung, mitrandir, rmcelroy

Reviewed By: mitrandir, rmcelroy

Subscribers: rmcelroy, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3219765

Signature: t1:3219765:1461717992:9b2e8646c0555472fa00ee7059c0f283fd4c2c65
2016-04-27 16:49:27 -07:00
Durham Goode
18cde8ba89 store: add a historypack class that can read histpacks
Summary:
The previous patch added logic to repack store history and write it to
a histpack file. This patch adds a pack reader implementation that knows how to
read histpacks.

Test Plan:
Ran the tests.  Also tested this in conjunction with the next patch
which actually reads from the data structure.

Reviewers: #sourcecontrol, ttung, mitrandir, rmcelroy

Reviewed By: mitrandir, rmcelroy

Subscribers: rmcelroy, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3219764

Signature: t1:3219764:1461718081:9d812b6aea87fe9eb48fdac9dbef282e4775c3c9
2016-04-27 16:49:24 -07:00
Durham Goode
f22bae206b store: add a historypack format and a repacker for it
Summary:
This is an initial implementation of a history pack file creator and a repacker
class that can produce it. A history pack is a pack file that contains no file
content, just history information (parents and linknodes).

A histpack is two files:

- a .histpack file consisting of a series of file sections, each of which
  contains a series of revision entries (node, p1, p2, linknode)
- a .histidx file containing a filename based index to the various file sections
  in the histpack.
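
The two-file idea can be sketched roughly as follows. This is a toy layout chosen for illustration only; the real on-disk format is documented in the code, and the field widths, the in-memory `index` dict, and the function names here are all assumptions.

```python
import struct

NODE_LEN = 20  # length of a SHA-1 node in bytes

def pack_section(filename, entries):
    """Serialize one file section: name length, name, entry count, entries."""
    name = filename.encode("utf-8")
    out = [struct.pack("!H", len(name)), name, struct.pack("!I", len(entries))]
    for node, p1, p2, linknode in entries:
        out.append(node + p1 + p2 + linknode)
    return b"".join(out)

def build_histpack(history):
    """Return (pack_bytes, index); index maps filename -> section offset."""
    pack = bytearray()
    index = {}
    for filename in sorted(history):
        index[filename] = len(pack)
        pack += pack_section(filename, history[filename])
    return bytes(pack), index

def read_section(pack, offset):
    """Read back the file section that starts at the given offset."""
    (namelen,) = struct.unpack_from("!H", pack, offset)
    offset += 2
    filename = pack[offset:offset + namelen].decode("utf-8")
    offset += namelen
    (count,) = struct.unpack_from("!I", pack, offset)
    offset += 4
    entries = []
    for _ in range(count):
        raw = pack[offset:offset + 4 * NODE_LEN]
        # Split the fixed-width record into (node, p1, p2, linknode).
        entries.append(tuple(raw[i:i + NODE_LEN]
                             for i in range(0, 4 * NODE_LEN, NODE_LEN)))
        offset += 4 * NODE_LEN
    return filename, entries
```

A reader looks up the filename in the index, seeks to the offset, and parses only that section, without scanning the rest of the pack.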

See the code for documentation of the exact format.

Test Plan:
ran the tests.  A future diff will add unit tests for all the new pack
structures.

Ran `hg repack` on a large repo. Verified pack files were produced in
.hg/store/packs. In a future diff, I verified that the data could be read
correctly.

Reviewers: #sourcecontrol, mitrandir, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: mitrandir, rmcelroy, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3219762

Signature: t1:3219762:1461751982:e7bbc65e8f01c812fc1eb566d2d48208b0913766
2016-04-27 16:49:21 -07:00
Durham Goode
f17f6cc093 store: add revisions to datapack in alphabetical order
Summary:
This forces the revisions in the datapack to be added in alphabetical order.
This makes the algorithm more deterministic, but otherwise has little effect.

Test Plan: Ran the tests, ran repack

Reviewers: #sourcecontrol, rmcelroy, ttung

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3219760

Signature: t1:3219760:1461687720:7be5fdc1419f8214c8c83074494b33214b3684ae
2016-04-27 16:49:18 -07:00
Durham Goode
43ed70b6f1 store: add datapack store that reads pack files from .hg/store/packs
Summary:
Now that we can read and write datapack files, let's add a store implementation that
can serve packed content. With this patch, it's technically possible for someone
to prefetch and repack large portions of history for long term storage with
remotefilelog.

My next set of commits (which haven't been written yet) will:
- add tests for all of this
- add an indexpack format for packing ancestor metadata (the datapack only packs
  revision content)

Test Plan:
Ran the tests. Also repacked a repo, deleted the old cache files, ran
hg up null && hg up master, and verified it checked out master with the
right files and without fetching blobs from the server.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205351

Signature: t1:3205351:1461751649:45a56b57d962a282aeef9478500a3b23495a0eb7
2016-04-27 16:49:15 -07:00
Durham Goode
56c83ea072 store: add a datapack class that can read datapacks
Summary:
The previous patch added logic to repack store contents and write it to a
datapack file. This patch adds a new store implementation that knows how to read
datapacks.

It's just a simple implementation without any parallelism. So there's room for
improvement.

Test Plan:
Ran the tests.  Also tested this in conjunction with the next patch
which actually reads from the data structure.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205342

Signature: t1:3205342:1461750967:84377517cb1f285d37694a3f503d60ae85bacb66
2016-04-27 16:49:12 -07:00
Durham Goode
510ac021f3 store: add a basic repack and datapack format
Summary:
This is an initial implementation of a repack algorithm that can read data from
an arbitrary store (in this case the remotefilelog content store), and repack it
into a datapack.

A datapack is two files:

- a .datapack file consisting of a series of deltas (a delta may be a full text if the delta base is the nullid)
- a .dataidx file consisting of delta information and an index into the deltas

See the code for documentation of the exact format.

Test Plan:
ran the tests

Ran `hg repack` in a large repo. Verified that a datapack and a dataidx file
were created in .hg/store/packs. The datapack used 148MB instead of the 439MB the
old remotefilelog storage used.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205334

Signature: t1:3205334:1461751366:ee4bf6a580ffb667071a8046fda6f0858b7f25ae
2016-04-27 16:49:09 -07:00
Durham Goode
f362c9a3a8 store: add getfiles() api to store
Summary:
This adds an API to the store contract that allows the store to return a list of
the name/node pairs that it contains. This will be used to allow a repack
algorithm to list the contents of the store so it can repack it into another
store. The old remotefilelog blob store used namehash+node keys, which is
different from the new store API's name+node keys, so the getfiles()
implementation here has to perform a reverse namehash->name lookup so it can
satisfy the store API contract.

In the remotefilelog basestore implementation, it reads the file names from the
local data directory and the shared cache directory, and reverse resolves the
file name hashes into filenames to produce the list.
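
A minimal sketch of the reverse lookup, assuming the name hash is a SHA-1 hex digest of the filename and that a list of candidate filenames is available from somewhere else. Both are assumptions made for illustration; the commit does not say how the real implementation resolves hashes.

```python
import hashlib

def namehash(name):
    """Hash a filename (assumed here to be SHA-1 over the UTF-8 name)."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()

def getfiles(stored_keys, known_names):
    """Turn (namehash, node) keys into (name, node) pairs.

    stored_keys: iterable of (namehash, node) as found on disk.
    known_names: candidate filenames used to build the reverse map.
    Keys whose hash matches no candidate are skipped.
    """
    reverse = {namehash(name): name for name in known_names}
    return [(reverse[h], node) for h, node in stored_keys if h in reverse]
```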

Test Plan: ran the tests

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205321

Signature: t1:3205321:1461751437:a7c44c2bbe153122a3b85b8d82907a112cf77b1a
2016-04-27 16:49:06 -07:00
Durham Goode
438db1be81 store: allow union metadatastore to combine ancestors from many stores
Summary:
The old store api required that each store be able to return the complete
ancestor history for a given name/node pair. This patch allows a store to return
only the parts of history it knows about, and the union store will combine that
history with the history from other stores to produce the full result. This is
useful for stores like bundle files, where they contain only a partial history
that needs to be annotated by the real store.
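
The combining step might look roughly like this. Each store is modelled as a plain dict from `(name, node)` to a partial ancestor map, which is an assumption made for the sketch and not the real store interface; rename tracking is left out.

```python
NULLID = b"\0" * 20

def union_ancestors(stores, name, node):
    """Combine partial ancestor maps until every reachable parent is resolved.

    Each store is a dict: (name, node) -> {node: (p1, p2, linknode)}.
    """
    ancestors = {}
    pending = [node]
    while pending:
        current = pending.pop()
        if current == NULLID or current in ancestors:
            continue
        # Ask each store in order for the history it knows about.
        for store in stores:
            partial = store.get((name, current), {})
            if current in partial:
                break
        else:
            raise KeyError((name, current))
        ancestors.update(partial)
        # Queue any parents that this partial history did not cover.
        for p1, p2, _linknode in partial.values():
            pending.extend(p for p in (p1, p2) if p not in ancestors)
    return ancestors
```

With a bundle-like store that knows only the newest node and a second store that knows the older ones, the result covers the whole chain.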

Test Plan: ran the tests

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205319

Signature: t1:3205319:1461751511:210740b82cc6767b2f0c393715ac93d8f1b96bc7
2016-04-27 16:49:04 -07:00
Durham Goode
cce75d4663 store: add concept of delta chain to content store
Summary:
The old store contracts required that every store be able to produce the full
text for a revision. This patch modifies the contract so that a store (like a
bundle file store) can serve a delta chain and the union store can combine delta
chains from multiple stores together to create the final full text.
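
A sketch of how a union store could assemble a full text from delta chains held in different stores. The delta encoding here, a list of `(start, end, replacement)` edits, is a simplified stand-in for Mercurial's binary deltas, and the dict-based stores are hypothetical.

```python
NULLID = b"\0" * 20

def apply_delta(base, delta):
    """Apply (start, end, replacement) edits, given in ascending order."""
    out = []
    pos = 0
    for start, end, data in delta:
        out.append(base[pos:start])
        out.append(data)
        pos = end
    out.append(base[pos:])
    return b"".join(out)

def get_fulltext(stores, name, node):
    """Follow delta bases across stores, then apply the chain oldest-first.

    Each store is a dict: (name, node) -> (delta, deltabase_node).
    A full text is stored as a delta against NULLID (an empty base).
    """
    chain = []
    while node != NULLID:
        for store in stores:
            if (name, node) in store:
                delta, base = store[(name, node)]
                break
        else:
            raise KeyError((name, node))
        chain.append(delta)
        node = base
    text = b""
    for delta in reversed(chain):
        text = apply_delta(text, delta)
    return text
```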

Test Plan: ran the tests

Reviewers: #sourcecontrol, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.fb.com/D3205315

Signature: t1:3205315:1461669845:3eb8968566285f6221c7c44435b855cc65da33f4
2016-04-26 15:10:38 -07:00
Durham Goode
7e1047d11f store: change union stores to accept a list of stores
Summary:
Instead of hard coding the list of stores in each union store, let's make it a
list and just test each store in order. This will allow easily adding new stores
and reordering the priority of the existing ones.

Also fix the remote store's contains function. 'contains' is the old name, and
it now needs to be getmissing in order to fit the store contract.
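
The ordered-list design can be sketched as below. `DictStore` and `UnionStore` are illustrative classes written for this example; only the `getmissing` name comes from the commit message, and its exact signature in the real code is an assumption.

```python
class DictStore:
    """Toy store backed by a dict of (name, node) -> data."""

    def __init__(self, data):
        self._data = dict(data)

    def getmissing(self, keys):
        """Return the subset of keys this store does not have."""
        return [k for k in keys if k not in self._data]

    def get(self, name, node):
        return self._data[(name, node)]


class UnionStore:
    """Consults each store in priority order."""

    def __init__(self, *stores):
        self.stores = list(stores)

    def getmissing(self, keys):
        # Each store narrows the list; stop early once nothing is missing.
        missing = list(keys)
        for store in self.stores:
            if not missing:
                break
            missing = store.getmissing(missing)
        return missing

    def get(self, name, node):
        for store in self.stores:
            try:
                return store.get(name, node)
            except KeyError:
                continue
        raise KeyError((name, node))
```

Reordering priority or adding a new store is then just a change to the list passed to the constructor.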

Test Plan: ran the tests

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Differential Revision: https://phabricator.fb.com/D3205314

Signature: t1:3205314:1461606028:3a513ac82c5de668a7e40bbf7cc88d8754e2f0bb
2016-04-26 15:10:38 -07:00
Durham Goode
9cfbf5a59e store: keep track of the writable store instead of hard coding it
Summary:
A future patch is going to change the union store to just contain an ordered
list of stores. Therefore we need a special spot to record which store is the
one that should receive writes.

Test Plan: ran the tests

Reviewers: #sourcecontrol

Differential Revision: https://phabricator.fb.com/D3205307
2016-04-26 15:10:38 -07:00
Durham Goode
c3c047f0b7 Move sortnodes into shallowutil
Summary:
This is a generic topological sort and will be useful in the upcoming repacking
code.
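
For readers unfamiliar with the term, a generic topological sort over nodes and their parents can be written as follows. This is Kahn's algorithm with a hypothetical `toposort(nodes, parentfunc)` signature, not the actual sortnodes implementation, and it assumes the input nodes are unique.

```python
from collections import deque

def toposort(nodes, parentfunc):
    """Order nodes so that every parent comes before its children.

    parentfunc(node) returns the node's parents; parents outside
    the input set are ignored.
    """
    nodes = list(nodes)
    nodeset = set(nodes)
    children = {n: [] for n in nodes}
    indegree = {}
    for n in nodes:
        parents = {p for p in parentfunc(n) if p in nodeset}
        indegree[n] = len(parents)
        for p in parents:
            children[p].append(n)

    # Start from nodes with no in-set parents and peel off layers.
    queue = deque(n for n in nodes if indegree[n] == 0)
    ordered = []
    while queue:
        n = queue.popleft()
        ordered.append(n)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if len(ordered) != len(nodes):
        raise ValueError("cycle detected in parent graph")
    return ordered
```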

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3204124

Signature: t1:3204124:1461260520:e1cb5c9d496f11e5f44e0cdbc5ba851b1573d2e1
2016-04-26 15:10:38 -07:00
Durham Goode
84bc49f25d checkcode: fix shallowrepo, shallowutil, and setup.py
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221375

Signature: t1:3221375:1461648312:7dbdd59e6370cb32b90d864a623d8066028741e7
2016-04-26 13:00:31 -07:00
Durham Goode
3817826242 checkcode: fix remotefilelogserver and shallowbundle
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221373

Signature: t1:3221373:1461648284:23203c17f4a87e33ff4e9be17a8b99bddbcdff05
2016-04-26 13:00:31 -07:00
Durham Goode
39d350996f checkcode: fix remotefilectx and remotefilelog
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221371

Signature: t1:3221371:1461648217:e9702d761ab8fd6f85dee60a4c192cf25e784f11
2016-04-26 13:00:31 -07:00
Durham Goode
859510b65e checkcode: fix fileserverclient.py
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221369

Signature: t1:3221369:1461648197:185cbbba61a9d1a7a1beacd64153185d0d0826ed
2016-04-26 13:00:31 -07:00
Durham Goode
71bd8c2561 checkcode: fix errors in cacheclient and debugcommands
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221366

Signature: t1:3221366:1461648117:088f3a5837393499e1a383af860bd1a935e0cba7
2016-04-26 13:00:31 -07:00
Durham Goode
495a853d78 checkcode: fix __init__.py
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221365

Signature: t1:3221365:1461646159:efeb0478c66cbd49d4a0a6c02a79d530b42f8248
2016-04-26 13:00:31 -07:00
Jun Wu
ead8969797 Fix missing errno import
Summary: Apparently we need to `import errno` in `shallowutil.py`

Test Plan: Code Review

Reviewers: #sourcecontrol, ttung, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D3195117

Signature: t1:3195117:1461031210:424912a96448a2a8cb37197f006cfa95d4ab1cb1
2016-04-18 19:04:58 -07:00
Durham Goode
2d1dcb4b97 Fix missing 'grp' import
2016-04-18 11:46:06 -07:00
Durham Goode
5b2914142a Fix status returning invalid results
The recent refactor caused remotefilelog.size() to include rename metadata in
the size count, which meant the size didn't match what the rest of Mercurial
expected. This caused clean files to show up as dirty in hg status if they had a
'lookup' dirstate state and were renames.
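
The size mismatch can be illustrated with a sketch that assumes the stored blob uses Mercurial's filelog metadata envelope, where rename metadata sits between two `\x01\n` markers ahead of the content. The helper name `content_size` is hypothetical.

```python
META_MARKER = b"\x01\n"

def content_size(raw):
    """Size of the file content, excluding a leading metadata header."""
    if raw.startswith(META_MARKER):
        # Find the closing marker, searching past the opening one.
        end = raw.index(META_MARKER, 2)
        return len(raw) - (end + len(META_MARKER))
    return len(raw)
```

For a renamed file, `len(raw)` is larger than the size of the content on disk, which is the kind of mismatch that makes a clean file look dirty.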
2016-04-10 09:46:24 -07:00
Durham Goode
2e93ca187a Add byte count checking when receiving from the server
Summary:
We've received a few complaints that receivemissing is throwing corrupt data
exceptions. My best guess is that we're not receiving all of the data for some
reason. Let's add an assertion to ensure all the data is present, so we can
narrow it down to a connection issue instead of actual corrupt data.
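
A byte-count check of this kind might look like the following sketch. The function name `readexactly` and the use of `EOFError` are choices made for the example; the commit itself describes an assertion.

```python
import io

def readexactly(stream, size):
    """Read exactly size bytes, or raise if the stream ends early."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break  # stream ended before all bytes arrived
        chunks.append(chunk)
        remaining -= len(chunk)
    data = b"".join(chunks)
    if len(data) != size:
        raise EOFError("expected %d bytes, received %d" % (size, len(data)))
    return data
```

A short read then surfaces as a clear connection-level error and is not mistaken for corrupt data further down the line.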

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D3136203
2016-04-05 09:50:12 -07:00
Durham Goode
24323a759c store: address code review feedback
This was meant to be part of the previous stack of commits, but I pushed the
wrong stack. This patch addresses a number of code review feedback points, the
most visible being to remain 'contains' to something else (in this case
'getmissing').
2016-04-04 16:48:55 -07:00
Durham Goode
8ca8f7f6ca stores: remove fetch logic and replace with a remote store fallthrough
The old way of fetching from the server required the base store api expose a way
for outside callers to add fetch handlers to the store. This exposed some of the
underlying details of how data is fetched in an unnecessary way and added an
awkward subscription api.

Let's just treat our remote caches as another store we can fetch from, and
require that the overarching configure logic (in shallowrepo.py) can connect
all our stores together in a union store.
2016-04-04 16:26:12 -07:00
Durham Goode
ece19111e0 ioutil: rename ioutil to shallowutil
The old name was not very descriptive. There's already a shallowutil, so let's
just use that.
2016-04-04 16:26:12 -07:00
Durham Goode
29ea8ada1e store: delete the localcache class
Now that all functionality has been moved to the new store, we no longer need
the localcache class. So let's delete it.
2016-04-04 16:26:12 -07:00
Durham Goode
ecf4378d18 store: implement gc in the new store
The last major piece of functionality that needs to be moved into the new store
is the gc algorithm. This is just a copy paste of the one that exists in
localcache.
2016-04-04 16:26:12 -07:00
Durham Goode
d70897e18c store: implement markrepo on the new store
Now that most of our storage has been moved behind the new store, let's also
move the ability to mark the repo to behind that storage abstraction.
2016-04-04 16:26:12 -07:00
Durham Goode
0dd4247520 store: make remotefilelog.ancestormap use the new store
Now that we have a metadatastore, let's use it to implement
remotefilelog.ancestormap. This gets rid of a bunch of ugly code.
2016-04-04 16:26:12 -07:00
Durham Goode
ad473d5a6b store: make remotefilelog.linknode use the new store
Now that we have the new metadatastore, let's use it to fetch the linknode
instead of parsing the data ourselves.
2016-04-04 16:26:12 -07:00
Durham Goode
82bc4468ed store: make remotefilelog.renamed use the store
Now that we have a metadata store, let's switch remotefilelog.renamed to consult
it, instead of parsing the data itself.
2016-04-04 16:26:12 -07:00
Durham Goode
aba161c424 store: implement metadatastore functions
This implements the metadatastore APIs that were previously just stubs.
2016-04-04 16:26:12 -07:00
Durham Goode
8ad3ce6f41 store: change fileserverclient to write via new store
Now that we have the new store abstraction, and now that remotefilelog.py writes
via it, let's also make fileserverclient write to the store via that API.

This required some refactoring of how receive missing worked, so we could pass
the filename down, as that is required for writing to the store.
2016-04-04 16:26:12 -07:00
Durham Goode
721f54d0df store: move remotefilelog content writing to be done via basestore
Now that we have the new store abstraction, let's route writes through it as
well.
2016-04-04 16:26:12 -07:00
Durham Goode
ffb239bdcb store: switch remotefilelog.size to use new store
Now that we can read data via the new store, let's switch remotefilelog to use
that instead of talking to the filesystem directly.
2016-04-04 16:26:12 -07:00
Durham Goode
69aff18063 store: switch remotefilelog.read to use self.revision
Now that remotefilelog.revision is implemented using the new contentstore, let's
switch remotefilelog.read to use that instead. This logic is almost identical to
what's in filelog.read
2016-04-04 16:26:12 -07:00
Durham Goode
50df2e518f store: switch remotefilelog.revision to use new store
Now that the new contentstore has get(), let's switch remotefilelog.revision to
use it instead.
2016-04-04 16:26:12 -07:00
Durham Goode
cb48cd034a store: add store data validation
The old store logic has validation for checking whether the data it's reading is
corrupt. Let's copy and paste that over to the new store.
2016-04-04 16:26:12 -07:00
Durham Goode
dfb49ad597 store: implement contentstore.get
This implements the basic function for fetching content data from the
remotefilelog store.
2016-04-04 16:26:12 -07:00
Durham Goode
647684cca8 store: implement basestore.contains
This implements the basic contains function that checks if the given (filename,
node) pairs are in the store.
2016-04-04 16:26:12 -07:00
Durham Goode
1d97924c54 store: construct store during repo creation
We are refactoring the storage to be behind more abstract APIs. This patch
creates the new store objects on the repo and passes them to the
fileserverclient so it can add itself as a file provider, in the case of misses.
2016-04-04 16:26:12 -07:00
Durham Goode
9c88142860 store: add union stores
Future patches will refactor the storage logic into a more abstract API. This
patch adds a union store, which will allow us to check both local client storage and
shared cache storage, without exposing the difference at higher levels.
2016-04-04 16:26:12 -07:00
Durham Goode
b62ef50278 store: add stubs for storage classes
Future patches will refactor the storage into a more abstract API. This is the
initial stubs for that API.
2016-04-04 16:26:12 -07:00
Durham Goode
492b9af06e ioutil: move helper functions to ioutil
Future patches will refactor the storage into more abstract APIs. Let's move
these utility functions out to be on their own.
2016-04-04 16:26:12 -07:00
Jun Wu
b7e6384e9c Allow repo = None in runcommand
Summary:
When running inside chg, `reposetup` will be called once since `serve` is not
a `norepo` command. Then if the user runs a `norepo` command like `help`,
`runcommand` will receive `repo = None` and error out. Fix it by checking
`repo` explicitly.

Test Plan: Run `chg help` and no exception is thrown.

Reviewers: #sourcecontrol, ttung, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D3136328

Signature: t1:3136328:1459811387:3b86df9765aa5e20677031d6e9fc4bc3d524efa6
2016-04-04 16:22:16 -07:00
Durham Goode
f774b1b204 adjustlinknode: remove unnecessary ancestor walk
Summary:
Since we added the C code ancestor walk to this function, this python ancestor
walk is completely unnecessary, and can cause significant slow downs if none of
the ancestors are known linknodes (it walks the entire history).

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D3136150
2016-04-04 15:30:47 -07:00
Jun Wu
2ec49732fd Add shallowrepo check in wrapped log function
Summary:
Discovered by `hg log filename` in the hg-committed repo. It seems we missed
a check here.

Test Plan:
Run `hg log filename` in a non-remotefilelog repo with remotefilelog enabled
and make sure "warning: file log can be slow on large repos" is not printed.

Reviewers: #sourcecontrol, ttung, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D3132523

Signature: t1:3132523:1459801676:bcba3bbcaf1c358ad11e8ad25c0a1d3cc2637a76
2016-04-04 13:33:28 -07:00
Kostia Balytskyi
4e61e19a3d remotefilelog: do ui.log of remotecache hit rate
Summary: We would like to utilize Martijn's logtoprocess extension to log cache hit rate.

Test Plan: None so far, will update the diff later.

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D3094765
2016-04-01 03:16:00 -07:00
Mateusz Kwapich
cc54a98956 addchangegroup: adjust for new upstream API
Summary:
addchangegroupfiles doesn't take the pr function as a parameter anymore.
The upstream change is https://selenic.com/hg/rev/982e3ef7f5bf

Test Plan: tests are passing now on the release branch

Reviewers: #sourcecontrol, ttung, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D3107217

Signature: t1:3107217:1459211189:4ece7531aff6043fc3acbfe43e2f471781c25c9d
2016-03-30 14:17:49 -07:00
Augie Fackler
9eb0009839 fileserverclient: use new iterbatch() method
This allows the client to send a single batch request for all file contents
and then handle the responses as they stream back to the client, which should
improve both running time and the user experience as far as it goes with
progress.
2016-03-22 10:06:24 -07:00
Augie Fackler
86ea8ed060 commands: norepo was removed in e1563031f528
Use the decorator form instead, introduced in hg 3.1.
2016-03-03 13:40:31 -05:00
Wez Furlong
2ec314e26a remotefilelog: add separate option to validate localcache files
Summary:
We've recently had to dig into two different issues that resulted in broken
files landing in the localcache; one was due to a problem with the data source
for our cacheprocess becoming corrupt and the other was due to a failed write
(ENOSPC) causing a truncated file to be left in the local cache.

It is desirable to perform some lightweight consistency checks before we return
data up to the caller of localcache, but prior to this diff the validation
functionality was coupled to configuring a log file.

Due to the shared nature of the localcache it's not always clear cut where we
want to log localcache consistency issues, so it feels more flexible to
decouple logging from enabling checks.

This diff introduces `remotefilelog.validatecache` as a separate option that
can have three values:

* `off` - no checks are performed
* `on` - checks are performed during read and write
* `strict` - checks are performed during __contains__, read and write

The default is now `on`.
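
The three modes can be summarized in a small sketch. The table and the `should_validate` helper are illustrative and not part of the extension; `contains` stands for the `__contains__` check named above.

```python
# Which cache operations run consistency checks in each mode.
VALIDATE_OPERATIONS = {
    "off": frozenset(),
    "on": frozenset({"read", "write"}),
    "strict": frozenset({"contains", "read", "write"}),
}

def should_validate(mode, operation):
    """Return True if the given operation is checked under this mode."""
    try:
        return operation in VALIDATE_OPERATIONS[mode]
    except KeyError:
        raise ValueError("unknown validatecache mode: %r" % (mode,))
```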

Test Plan: `./run-tests.py --with-hg=../../hg-crew/hg`

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D2941067

Tasks: 10044183, 9987694
2016-02-18 08:34:33 -08:00
Durham Goode
a7a78cda1e More robust adjustlinknode code for None srcrev's
Summary:
The srcrev passed to adjustlinknode can sometimes be None, which causes an
exception. The code that throws the exception was introduced recently as part of
taking advantage of a C fast path.

The fix is to move the srcrev check to be after the None handling.

Test Plan:
I'm not sure how to repro this naturally actually.  I tried writing
tests that did rebases of renames, but it didn't trigger.  I manually verified
it by using the debugger to insert a None for the srcrev at the beginning of
adjustlinknode

Reviewers: lcharignon, #sourcecontrol, ttung, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.fb.com/D2944899

Tasks: 10066192

Signature: t1:2944899:1455735567:c8eea240885847061239bf3df0ea59dbbd0e4858
2016-02-17 11:01:45 -08:00
Wez Furlong
fd584f7e56 remotefilelog: more graceful handling of write errors for localcache
Summary:
I debugged an issue this past week where a set of machines had exhausted the
disk space available on the partition where the local cache was situated.  This
particular tier didn't use cacheprocess, only the local cache.  There were some
number of truncated files in the local cache.

Inspecting the code here, it looks like we're using atomictempfile incorrectly.
atomictempfile.close() will unconditionally rename the temp file into place,
and we were calling this from a finally handler.

It seems safest to remove the try/finally from around this section of code and
just let the destructor trigger to clean up the temporary file in the error
path, and if we make it through writing the data, then call close and have it
move the file in to place.
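
The intended pattern, renaming into place only after a fully successful write, can be sketched with the standard library. This uses `tempfile` and `os.replace` as stand-ins and is not Mercurial's atomictempfile; the function name `atomic_write` is hypothetical.

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data to path so that a failed write never leaves a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmppath = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
    except BaseException:
        # Error path: discard the temporary file, never rename it.
        os.unlink(tmppath)
        raise
    # Success path only: move the complete file into place.
    os.replace(tmppath, path)
```

Calling the rename from a `finally` block would run it on the error path too, which is how a truncated file could end up in the cache.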

Test Plan:
ran the tests.  They don't cover this case, but at least I didn't
obviously break anything:

```
 $ ./run-tests.py --with-hg=../../hg-crew/hg
...................
# Ran 19 tests, 0 skipped, 0 warned, 0 failed.
```

Reviewers: #sourcecontrol, ttung, mitrandir

Reviewed By: mitrandir

Subscribers: scyost

Differential Revision: https://phabricator.fb.com/D2940861

Tasks: 10044183

Signature: t1:2940861:1455673078:a7593d70c32151e13c8ccc31f92387e9c8cb23a0
2016-02-17 08:03:38 -08:00
Durham Goode
2cce4008b6 adjustlinknode: use C fastpath
Summary:
The adjustlinknode logic was pretty slow, since it did all the ancestry
traversal in python. This patch makes it first use the C fastpath to check if
the provided linknode is correct (which it usually is), before proceeding to the
slow path.

The fastpath can process about 300,000 commits per second, versus the 9,000
commits per second by the slow path.

This cuts 'hg log <file>' down from 5s to 2.5s in situations where the log spans
several hundred thousand commits.

Test Plan:
Ran the tests, and ran hg log <file> on a file with a lot of history
and verified the time gain.

Reviewers: pyd, #sourcecontrol, ttung, quark

Reviewed By: quark

Subscribers: quark

Differential Revision: https://phabricator.fb.com/D2908532

Signature: t1:2908532:1454718666:c4e63d73057572f035082943ef2e6fe0a49238c1
2016-02-08 14:40:07 -08:00
Simon Farnsworth
6cdf20e7ad remotefilelog: Make TortoiseHG work with remotefilelog
2016-02-05 14:53:45 +00:00
Durham Goode
16d12ec27c Remove limit on adjust linknode lookup
Previously we limited the changelog scan for old commits to the most recent
100,000, under the assumption that most changes would be within that time frame.
This turned out to not be a good assumption, so let's remove the limitation.
2016-01-27 15:56:36 -08:00
Augie Fackler
afca077cf9 fileserverclient: add option to provide file path to cacheprocess
For our uses of remotefilelog, life is significantly easier if we also
have the file path rather than just a hash of the file path. Hide this
behind a config knob so users can enable it or not as makes sense.
2016-01-27 13:22:22 -08:00
Durham Goode
4ee8e7278d changegroup: support new _packermap name
Upstream changed changegroup.packermap to be changegroup._packermap. So we need
to update accordingly.
2016-01-19 16:34:53 -08:00
Durham Goode
13c2a7823f Add alternative linkrev lookup logic
Summary:
The old linkrev lookup logic depended on the repo containing the latest commit
to have contained that particular version of the file. If the latest version had
been stripped however (like what happens in rebase --abort currently), the
linkrev function would attempt to scan history from the current rev,
trying to find the linkrev node.

If the filectx was not provided with a 'current node', the linkrev function
would return None. This caused certain places to break, like the Mercurial
merge conflict resolution logic (which constructs a filectx using only a
fileid, and no changeid, for the merge ancestor).

The fix is to allow scanning all the latest commits in the repo, looking for the
appropriate linkrev. This is pretty slow (1 second for every 14,000 commits
inspected), but is better than just returning None and crashing.

Test Plan:
Manually repro'd the issue by making a commit, amending it, stripping the
amended version and going back to the original, making two sibling commits on
top of the original, then rebasing sibling 1 onto sibling 2 (so that the
original commit that had the bad linknode data was the ancestor during the
merge). Previously this failed, now it passes. I'd write a test, but it's 11pm
and I'm tired and I need this in by early tomorrow morning to make the cut.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: trunkagent, rmcelroy

Differential Revision: https://phabricator.fb.com/D2826850

Signature: t1:2826850:1452680293:cb8c1f8c20ce13ad632925137dbdce6e994ab360
2016-01-13 11:25:26 -08:00
Laurent Charignon
707f243248 remotefilelog: make the wrapping of dispatch.run safer
Summary:
I somehow got a stacktrace with IPython on a non-remotefilelog repo that ran
this code and complained that fileservice didn't exist. I am not sure how it
happened but let's make the call safer to match the pattern used elsewhere in
the file.

Test Plan: No stacktrace seen after that, one line change

Reviewers: durham

Differential Revision: https://phabricator.fb.com/D2819402
2016-01-11 10:48:51 -08:00
Kostia Balytskyi
9500813607 remotefilelog: removing filelog check from verification process
Differential Revision: https://phabricator.fb.com/D2812664
2016-01-07 16:57:39 -08:00
Stanislau Hlebik
33b7e1013a remotefilelog: make .hg/store/data blobs read only
Summary:
Today, people running codemods or search/replace on their repos often accidentally corrupt their repos, and everyone ends up sad.
It's better to make them read-only.

Test Plan: python run-tests.py

Reviewers: rmcelroy, #sourcecontrol, durham, ttung

Reviewed By: durham

Subscribers: mitrandir, quark, durham

Differential Revision: https://phabricator.fb.com/D2807369

Tasks: 9431187

Signature: t1:2807369:1452192329:b5ed6606cb66b1c830fc3d3fb5a81e6120387b38
2016-01-07 13:37:36 -08:00
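The read-only idea in 33b7e1013a can be sketched roughly as follows. This is an illustration in Python 3, not the extension's code; `make_readonly` is a hypothetical helper name, and it assumes clearing the write bits is all that is needed to stop search/replace tools from rewriting a blob.

```python
import os
import stat


def make_readonly(path):
    """Clear every write bit on path, leaving read/execute bits alone."""
    mode = os.stat(path).st_mode
    writable = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    os.chmod(path, mode & ~writable)
```

A writer that later needs to replace such a blob would have to write a new file and rename it over the old one rather than open it in place.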
Laurent Charignon
af9917b578 remotefilelog: fix compat with core on builddeltaheader
2015-12-30 13:33:47 -08:00
Laurent Charignon
963dc28d83 compat: fix _verify wrapper
Summary:
In 4fb35d8c2105 in core @durham removed _verify and replaced it with
verify, this patch makes remotefilelog compatible with those changes.

Test Plan: The tests are failing after but don't fail on this anymore

Reviewers: ericsumner

Subscribers: durham

Differential Revision: https://phabricator.fb.com/D2791847
2015-12-28 14:58:21 -08:00
Durham Goode
cb448f683b Stop writing backup local data blobs
Summary:
Historically we would move the old backup data blob to <name>+<int> so we had a
record of all the old data blobs we could search through for good commit
histories.

Since we no longer require that the data blobs have perfect commit histories,
these extra blobs just take up space.

This change makes us only store one old version (for debugging and recovery
purposes), which should save space on clients.

Also switched to atomic rename writes while we're at it.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D2770675
2015-12-17 13:02:29 -08:00
Durham Goode
c59623483f Limit checkunknown fetching to just what's in the sparse checkout
The newly added checkunknown prefetching apparently gets handed the full list of
files that are not present on disk right now, which includes all the files
outside of the sparse checkout. So we need to filter those out here.
2015-12-16 12:59:44 -08:00
Durham Goode
b3b4ddc20b Prefetch before addremove check
Summary:
When running addremove, it needs to see the contents of the removed files so it
can determine if they are a rename. So we need to add bulk prefetching in this
situation.

Test Plan: Added a test

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: dcapra

Differential Revision: https://phabricator.fb.com/D2756979

Signature: t1:2756979:1450132279:668b8b160d792cad1ac37e2069716e20ea304f57
2015-12-14 14:44:11 -08:00
Durham Goode
faccfe65d4 Add prefetching to checklookup
Summary:
During hg status Mercurial sometimes needs to look at the size of contents of
the file and compare it to what's in history, which requires the file blob.

This patch causes those files to be batch downloaded before they are compared.

There was a previous attempt at this (see the deleted code), but it only wrapped
the dirstate once at the beginning, so it was lost if the dirstate object was
replaced at any point.

Test Plan: Added a test to verify unknown files require only one fetch.

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Subscribers: dcapra

Differential Revision: https://phabricator.fb.com/D2756768

Signature: t1:2756768:1450130997:7c7101efe66c998e3182dfbd848aa6b1a57d509f
2015-12-14 14:44:08 -08:00
Durham Goode
4a5ae177bb Add prefetching for checkunknownfiles
Summary:
When doing an update, Mercurial checks if unknown files on disk match
what's in memory, otherwise it stops the checkout so it doesn't cause data loss.

We need to batch fetch the necessary files from the remotefilelog server for
this operation.

Test Plan: Added a test

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: dcapra

Differential Revision: https://phabricator.fb.com/D2756837

Signature: t1:2756837:1450132288:bc0530a07ea40aaeb2af1a93e4da82778cc11369
2015-12-14 14:49:34 -08:00
Durham Goode
b1c0840594 Remove unnecessary fallbackpath arg from getfiles
This wasn't used so we can clean it up.
2015-12-11 11:20:24 -08:00
Durham Goode
20102e4f2b Reuse ssh connection across miss fetches
Summary:
Previously we recreated the ssh connection for each prefetch. In the case where
we were fetching files one by one (like when we forgot to batch request files),
it results in a 1+ second overhead for each fetch.

This change makes us hold onto the ssh connection and simply issue new requests
along the same connection.

Test Plan:
Some of the tests execute this code path (I know because I saw them
fail when I had bugs)

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D2744688
2015-12-11 11:18:51 -08:00
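A minimal sketch of the connection reuse described in 20102e4f2b, in Python 3. `ReusableConnection` and the `connect` callable are hypothetical names used only for illustration; the real change holds onto an ssh peer inside fileserverclient.

```python
class ReusableConnection:
    """Open a connection lazily and hand the same one back on later requests."""

    def __init__(self, connect):
        self._connect = connect  # callable that opens a new connection
        self._conn = None

    def get(self):
        # Only pay the (slow) connection setup cost on the first request.
        if self._conn is None:
            self._conn = self._connect()
        return self._conn

    def close(self):
        # Drop the cached connection so the next get() reconnects.
        conn, self._conn = self._conn, None
        if conn is not None and hasattr(conn, "close"):
            conn.close()
```

With this shape, a run of single-file fetches pays the setup overhead once instead of once per fetch.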
Martin von Zweigbergk
1c64f784ed make changegroup.addchangegroupfiles() overriding more flexible
The method gained a parameter in hg revision 43d86cd9dae2
(changegroup: note during bundle apply if the repo was empty,
2015-12-02).
2015-12-10 17:25:14 -08:00
Martin von Zweigbergk
7251d9b51b repo: replace repo.parents() by repo[None].parents()
repo.parents() was removed in hg revision d5d613de0f44 (commands:
inline definition of localrepo.parents() and drop the method (API),
2015-11-11).
2015-12-10 17:25:14 -08:00
Martin von Zweigbergk
8f7ee3c1b1 replace localrepo.clone() by exchange.pull()
localrepo.clone() was removed in hg revision 9996a5eb7344 (localrepo:
remove clone method by hoisting into hg.py, 2015-11-11).

Instead of localrepo.clone(), we now use exchange.pull(). However,
that method was already overridden in onetimeclientsetup(), which is
called from our new overriding of exchange.pull(). Since it should be
done first, we move that overriding from onetimeclientsetup() to
uisetup().
2015-12-10 17:25:14 -08:00
Durham Goode
2b30eeb96b Fix exception when making a directory that already exists
Summary:
There was a race condition where there could be an exception when trying to
create directories that already exist.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D2736268
2015-12-10 10:11:27 -08:00
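One common way to tolerate the race fixed in 2b30eeb96b is to attempt the creation and ignore only the "already exists" error. The sketch below is Python 3 and `ensure_dir` is a hypothetical name; it is not necessarily how the commit itself is written.

```python
import errno
import os


def ensure_dir(path):
    """Create path (and parents), tolerating another process creating it first."""
    try:
        os.makedirs(path)
    except OSError as e:
        # Swallow "already exists" only when the path really is a directory;
        # anything else (permissions, a file in the way) is still an error.
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

Checking for the directory before calling `os.makedirs` would not help, because another process can create it between the check and the call.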
Durham Goode
f75037000f Make gc only inspect the last week of changes
Summary:
Previously hg gc would try to keep all files relevant to all heads in the repo.
If the repo has a lot of heads, reading the manifest for all of them and
building a massive set of all the files can be extremely slow.

Let's just keep files related to the most recent public heads.

Test Plan: Ran the tests. This improves 'hg gc' time on some repos from 2 hours to 10 minutes.

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D2733157

Signature: t1:2733157:1449558332:14bbea343600959155f5927913552304ab8f94a7
2015-12-08 09:53:33 -08:00
Laurent Charignon
c89f602b7d gcclient: guard against malformed repo paths
Before this patch, gc would stop on a malformed repo path. When this happens
we want to know what happened and get useful debugging information.
2015-12-02 10:40:49 -08:00
Laurent Charignon
34e5ad607d gcclient: guard against corrupted or empty repofile
Before this patch, if the repofile was empty or containing bad entries we were
just crashing. This patch prevents the crash by catching the error and displays
some interesting information to debug issues.
2015-12-02 10:40:49 -08:00
Laurent Charignon
e388dd5709 localcache: don't fail on file removal if the file is not there
If another process deletes files managed by localcache, then, the gc step would
fail. This patch prevents the failure and adds interesting information to debug
the problem.
2015-12-02 10:40:49 -08:00
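The behaviour described in e388dd5709 can be sketched as a removal helper that treats "file not found" as a non-fatal outcome. This is a Python 3 illustration with a hypothetical `remove_if_present` name; the debugging output the commit mentions is reduced here to a return value the caller could log.

```python
import errno
import os


def remove_if_present(path):
    """Delete path; return False instead of raising if it is already gone."""
    try:
        os.unlink(path)
        return True
    except OSError as e:
        # Another process may have deleted the file first. Only that case
        # is tolerated; other errors still propagate.
        if e.errno != errno.ENOENT:
            raise
        return False
```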
Durham Goode
9947ff9cc6 Allow file blobs to have imperfect history
Summary:
Attempting to maintain perfect history in the file blobs has become the most
complex, bug prone, and performance hurting aspect of remotefilelog. Let's just
drop this requirement and rely on upstream Mercurial's ability to fixup linkrevs
in the face of imperfect data.

The real solution for this class of problems is to make it so that the filelog
hashes are unique with respect to the commit that introduces them, but that's a
much harder problem.

Test Plan:
Ran the tests.

Made a commit with 1000 file changes. hg commit went from 15s to 7.5s. The difference will be even more dramatic for certain situations that are known to have caused problems in the past.

Reviewers: #sourcecontrol, pyd

Subscribers: rmcelroy, pyd

Differential Revision: https://phabricator.fb.com/D2686318
2015-12-01 23:49:48 -08:00
Durham Goode
5c49e2b7e4 Change server cache collection strategy
Summary:
Previously we would keep all server cache files for any head in the repo, even
if that head was really old. This resulted in unnecessarily large server caches.

The new strategy is to keep the files necessary for any commit within the past
25,000 revs or so. Even on repos with large commit rates this equates to
multiple weeks of time.

Test Plan: Ran the tests

Reviewers: #sourcecontrol

Differential Revision: https://phabricator.fb.com/D2652542
2015-11-13 09:56:52 -08:00
Durham Goode
eb4f7f166c Speed up log -fr master file/
Summary:
Previously, hg log -fr master file/ was very slow with remotefilelog because
Mercurial decides whether to take the slowpath (i.e. walk the changelog) or the
filelog path based on if the filelog exists in the repo.  remotefilelog has no
way to know if the filelog exists (since there's not a full list of filelogs),
so it fakes it by returning 'True' any time mercurial asks, then when the
filelog is needed, remotefilelog walks the entire changelog to build a fake
looking filelog. Therefore mercurial attempted to take the filelog path, and
remotefilelog did a very slow walk.

The fix is to force mercurial to take the slowpath when it sees 'hg log -fr
revset file'. Technically we could take the fast path by inspecting all the
results of the revset and seeing if the file/pattern exists as a file in any of
those. But that could be expensive and complicated, so this naive fix will
suffice for now.

Test Plan: Added a test. Previously it resulted in no output

Reviewers: cdelahousse, rmcelroy, #sourcecontrol

Differential Revision: https://phabricator.fb.com/D2634918
2015-11-09 16:54:14 -08:00
Aaron Kushner
fe561e382a Don't stack trace when getting children from thg and hg serve
Summary: thg and 'hg serve' stack trace when trying to view a file. The
correct fix is to walk back the changelog and look to see which was the
first one to touch the specific file. In the meantime, this makes the
graphic UIs usable.

Test Plan: ran tests

Reviewers: durham, rmcelroy

Reviewed by: rmcelroy
2015-10-25 14:05:14 +00:00
Aaron Kushner
11c9fd8e04 Remove what looks to be dead code
Summary: changectx set, but doesn't seem to be used.

Test Plan: ran tests

Reviewers: rmcelroy, durham
2015-10-25 15:32:58 +00:00
Augie Fackler
0b81082c8a remotefilelog: cope with rename of addchangegroupfiles to _addchangegroupfiles
This prevents remotefilelog from breaking with Mercurial 3.6.
2015-10-15 10:12:54 -04:00
Ryan McElroy
9943d04f51 make fileserverclient.close fully robust
Test Plan: ran tests

Reviewers: durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D2544243

Signature: t1:2544243:1444883084:a7b9cc9167a7671e34813826ba9fcd289919afd1
2015-10-14 18:49:55 -07:00
Ryan McElroy
f8360b4766 remotecache: unconditionally close process and pipes
Summary:
It is possible to mark the cache connection as closed but never close
the pipes, which leads to an error the next time the connection is opened for
use. Make sure we actually close and terminate everything when close is called.

Test Plan: ran the tests

Reviewers: #sourcecontrol, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D2540680

Tasks: 8712950

Signature: t1:2540680:1444841805:e9fd8f21ab370a599138bd8b0c3241543418521a
2015-10-14 08:12:48 -07:00
Durham Goode
bb8c595d67 Update to work with latest Mercurial
Upstream Mercurial has made a lot of changes around streaming clones, so we need
to update remotefilelog to handle these new changes.
2015-10-13 14:17:02 -07:00
Mathias De Maré
2ddceef9c7 cacheclient: don't forget to specify the port of the memcached server
2015-09-29 07:48:58 +02:00
Durham Goode
ca8028eb16 Add kwargs to repo.sparsematch
2015-10-06 10:07:01 -07:00
Durham Goode
4eec2c3535 Add excessive fetch logging
Summary:
We've received reports of non-batch fetches that do a ton of individual file
downloads. This patch adds logging to the blackbox for that.

Test Plan:
manually changed the code to trigger the logging and verified it came
out in the blackbox and had a warning message.

Reviewers: #sourcecontrol

Differential Revision: https://phabricator.fb.com/D2488803
2015-09-28 22:16:12 -07:00
Durham Goode
e9a9bad998 Use atomic file writes for server side cache
We've gotten reports of users receiving corrupt file blobs directly from the
server. The corruption doesn't enter the cache pool, and we don't get any
further reports of it, so I think it's a transient issue caused by certain readers
reading the file before the writer has finished writing it.

Let's use atomic rename files to make this not happen.
2015-09-28 10:31:38 -07:00
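The atomic rename approach mentioned in e9a9bad998 (and in cb448f683b above) generally looks like the sketch below: write to a temporary file in the same directory, then rename it over the destination. This is Python 3 (`os.replace`), and `atomic_write` is a hypothetical helper, not the function the extension uses.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes so readers see the old file or the new one, never a partial one."""
    directory = os.path.dirname(path) or "."
    # The temp file lives in the destination directory so the final rename
    # stays on one filesystem.
    fd, tmppath = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmppath, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmppath):
            os.unlink(tmppath)
        raise
```

A reader that opens `path` while a write is in progress gets the previous complete contents, which is the property the commit is after.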
Wez Furlong
6e7195b8ef Be more careful during close
I saw some crazy looking stack traces like this while testing an
improved implementation of our internal cacheprocess binary:

```
fileservice.prefetch([(self.filename, id)])
  File "/usr/lib/python2.6/site-packages/remotefilelog/remotefilelog.py", line 78, in read
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 357, in prefetch
    raw = self._read(hex(node))
  File "/usr/lib/python2.6/site-packages/remotefilelog/remotefilelog.py", line 283, in _read
    missingids = self.request(missingids)
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 196, in request
    fileservice.prefetch([(self.filename, id)])
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 357, in prefetch
    missingid = cache.receiveline()
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 105, in receiveline
    self.close()
    missingids = self.request(missingids)
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 76, in close
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 196, in request
    self.pipei.write("exit\n")
    missingid = cache.receiveline()
  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 105, in receiveline
ValueError: I/O operation on closed file    self.close()

  File "/usr/lib/python2.6/site-packages/remotefilelog/fileserverclient.py", line 76, in close
    self.pipei.write("exit\n")
ValueError: I/O operation on closed file
```

it looks like we are somehow re-entrant (maybe referenced from multiple generators?) and get tripped
up if we're not careful about checking for or catching issues during the close() method call.

So let's be a little more careful :-)
2015-09-15 07:48:14 -07:00
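A hedged sketch of what "being more careful" in close() can look like, in Python 3. The `pipei.write("exit\n")` call comes from the stack trace above; the class name, the `connected` flag, and the exact exceptions caught are assumptions made for illustration.

```python
class CacheConnection:
    """Toy stand-in for a cache process connection with a safe close()."""

    def __init__(self, pipei):
        self.pipei = pipei
        self.connected = True

    def close(self):
        # Flip the flag first so a re-entrant or repeated call is a no-op.
        if not self.connected:
            return
        self.connected = False
        try:
            self.pipei.write("exit\n")
            self.pipei.close()
        except (ValueError, OSError):
            # The pipe may already be closed by another caller.
            pass
```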
Adam Simpkins
a93ebb8b1e remotefilelogserver: fix missing import
Summary:
_walkstreamfiles() uses mercurial.store.decodedir(), so
mercurial.store needs to be imported.

Test Plan:
Confirmed that _walkstreamfiles() no longer throws an exception when cloning a
remote shallow repository.

Reviewers: durham, pyd, rmcelroy

Reviewed By: rmcelroy

Subscribers: net-systems-diffs@, exa, yogeshwer

Differential Revision: https://phabricator.fb.com/D2409648

Signature: t1:2409648:1441245825:00a758f6f0884b77572078589f18592ca6cb6fa4
2015-09-02 19:04:33 -07:00
Durham Goode
fb7827372b Don't check datafiles if the matcher says everything is remote
Streaming clones were taking a while because apparently self.datafiles()
actually stats each .i file instead of just returning the list straight from
fncache. To fix this, let's not call datafiles() when we know the matcher is
going to reject everything anyways.

This significantly speeds up streaming clones.
2015-09-05 12:24:04 -07:00
Mathias De Maré
8ab8d2601b fileserverclient: clear error message if cachepath is not configured
2015-08-29 08:20:54 +02:00
Augie Fackler
226a6f1027 fileserverclient: add config knob to control batch size
Previously we'd just send one enormous batch for everything to the
server. This led to prolonged periods of no progress output for the
user. Now we send batches in smaller chunks (default is 100) which
gives the user some idea that things are working.

Includes a trivial test, which doesn't really verify that the batching
logic is used as described, but at least prevents the boneheaded error
I had in an earlier (unmailed) version of this patch which forgot to
use configint() when loading the config setting.
2015-08-18 15:14:01 -04:00
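The chunking described in 226a6f1027 can be sketched as below. The default of 100 is taken from the commit message; the `batches` helper name is hypothetical, and the config lookup is omitted because the knob's name is not given here.

```python
def batches(items, batchsize=100):
    """Yield consecutive slices of items, each at most batchsize long."""
    # The commit notes the setting must be read as an integer (configint());
    # a string batch size would fail in range() below.
    for start in range(0, len(items), batchsize):
        yield items[start:start + batchsize]
```

Sending each slice as its own request lets progress output advance between slices instead of only at the end.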
Augie Fackler
06c09f03ab fileserverclient: correctly use exception constructor
We were passing one argument instead of 3.
2015-08-18 15:35:21 -04:00
Augie Fackler
51f7cac5a7 getfile: add error reporting to getfile method
Without this, the only way to report a failure of a file load in a
batched set of getfile requests is to fail the entire batch, which is
potentially painful. Instead, add our own error reporting in-band
which the client can then detect and raise.

I'm not completely happy with the somewhat adhoc error reporting here,
but we expect our server to have at least one additional error ("not
allowed to see file contents") which will require some special
handling on our end, so we need some level of flexibility in the error
reporting protocol so we can extend it later. Sigh.

Open question: should we reserve some range of error codes so that
it's easy for strange custom servers to have related monkeypatches to
client code for custom handling of unforeseen-by-remotefilelog
conditions?

I couldn't figure out how to actually get the client to try loading
file contents over http in the test, but the get-with-headers test at
least proves that the server responses look the way I expect.
2015-08-04 14:59:53 -04:00
Durham Goode
5bb4351364 prefetch: add prefetching to bundle receiving
We were not prefetching the potential dependent files for the filelog revisions
we received over the wire. This resulted in a lot of non-batched downloads,
which was super slow. This fixes it by batch downloading the parents and delta
parents of the incoming filelog revisions and adds a test.
2015-07-21 18:32:33 -07:00
Durham Goode
9152c8be08 fileserverclient: fix progress bar
A previous commit changed count to be a list, but missed the use of it when
being passed to progress. This fixes it.
2015-07-21 18:31:01 -07:00
Augie Fackler
26ab790f75 fileserverclient: mark getfile as batchable
This lets clients send many getfile requests in a single transaction.

Note that this requires 76fcf62accb0 be applied to your Mercurial, or
you'll be bitten by a bug[0] in Mercurial's wireproto batching. As a
result of this change, remotefilelog now effectively requires the
upcoming Mercurial 3.5 if you want to use a specific release.

0: http://bz.selenic.com/show_bug.cgi?id=4739
2015-06-30 17:34:01 -04:00
Augie Fackler
16310f95f3 remotefilelog: introduce new getfile method
Right now, this is a naive fetch-one-file method. The next change will
mark the method as batchable and use a batch in the client so that
many files can be requested in a single RPC.
2015-06-30 17:32:31 -04:00
Augie Fackler
adef2bd2d0 remotefilelogserver: move umask twiddling for cache into _loadfileblob
This narrows the interval during which we've modified umask, which
seems nice. Done as a separate change for clarity.
2015-06-30 16:58:15 -04:00
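Narrowing the window during which the umask is modified, as adef2bd2d0 describes, is often done with a small context manager. The sketch below is Python 3; `temporary_umask` is a hypothetical name and the mask value a caller passes (for example `0o002` for group-writable files) is an assumption, not something this commit states.

```python
import contextlib
import os


@contextlib.contextmanager
def temporary_umask(mask):
    """Set the process umask for the duration of a with-block only."""
    old = os.umask(mask)
    try:
        yield
    finally:
        # Always restore the previous umask, even if the body raised.
        os.umask(old)
```

Only the file creation itself needs to sit inside the with-block, which keeps the modified umask from leaking into unrelated code.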
Augie Fackler
d2f7930f70 fileserverclient: tease out a _getfiles method
This will make it easier to detect servers that support _getfiles2 and
prefer that method when available.
2015-06-30 16:43:18 -04:00
Augie Fackler
5966446c14 remotefilelogserver: tease out a _loadfileblob method for future use
We're about to introduce a new getfiles method, so let's take this
opportunity to split out the file loading code so it'll be used in
only one place.
2015-06-30 15:02:07 -04:00
Augie Fackler
882ca8e705 remotefilelogserver: prevent getfiles from being called over http at all
This means that even old clients that fail to sniff for capabilities
before trying getfiles will get a sensible error message back from the
server.
2015-06-30 11:04:47 -04:00
Augie Fackler
4e4a3a3a7b remotefilelogserver: disable remotefilelog serving over non-ssh protocols
2015-06-29 16:34:31 -04:00
Augie Fackler
e2d021637c fileserverclient: refuse to operate on a non-sshpeer
The way the protocol is defined for getfiles interleaves reading
filenames and sending file contents, which works fine over ssh but is
incompatible with http.

This change is probably not necessary now that remotefilelog
correctly checks for its own capability first, but it helped me debug
so I left it in for completeness.
2015-06-29 16:25:44 -04:00
Augie Fackler
dd2e200ad1 fileserverclient: sniff for remotefilelog capability before using it
This prevents clients from causing a server problem on an http server.
2015-06-29 17:33:56 -04:00
Augie Fackler
32cb84c8b7 remotefilelogserver: restrict remotefilelog capability to ssh
This only works over ssh, so let's not pretend otherwise.

A future change will ensure the capability is still advertised via ssh.
2015-06-29 17:36:25 -04:00
Augie Fackler
5a72282b12 remotefilelogserver: wrap wireproto._capabilities
If we instead wrap wireproto.capabilities, then our capabilities don't
get transmitted via the hello command, so not all clients will notice
the new capability unless we do the wrapping here.

Test output is in the test that previously demonstrated the
defect. Note that there's still a defect: we're advertising the
capability over http even though we have no hope of the getfiles
method working over http.
2015-06-29 17:35:32 -04:00
Augie Fackler
2c11d5bbf8 remotefilelog: stop declaring remotefilelog to be an hg-internal extension
The magic string 'internal' causes Mercurial to never blame
remotefilelog for being broken. I had suspected that remotefilelog
might work with 3.4, but the tests fail against 3.4.1, so I'm just
making testedwith empty.
2015-07-01 15:58:44 -04:00
Durham Goode
87ac4a0c9e Fix building revgraph across merge commits
The rev graph building code was flawed because it didn't track second parents
correctly. This was caught when someone was developing an extension and
attempted to commit a merge commit in some way.
2015-06-30 16:43:01 -07:00
Augie Fackler
5eecca9702 remotefilelog: handle the death of repo.sopener (hg change 0bbe3294361a)
repo.sopener has been deprecated since hg 2.3, and repo.svfs replaces
it. Since it's been dead for so long, let's just use svfs and call it
good enough.
2015-06-30 10:12:38 -04:00
Durham Goode
047afeff5f hooks: remove incominghook
Summary:
The incominghook was meant to pregenerate any remotefilelog blobs that were
likely to be needed shortly. Unfortunately it actually just slows down pushes,
since in large repos the hook takes longer than the push does sometimes.

So let's just remove it.

Test Plan: Apparently there were no tests for this :p

Reviewers: sid0, lcharignon, mitrandir, ericsumner, rmcelroy

Reviewed By: rmcelroy

Differential Revision: https://phabricator.fb.com/D2185894

Signature: t1:2185894:1435126819:e1e1125520411356eccff4baee31ab2938ebc0fe
2015-06-23 20:03:57 -07:00
Siddharth Agarwal
c45c59236b remove prefetch from the short help list
Summary: I really don't think it should be in this list.

Test Plan: `hg`

Reviewers: durham, #sourcecontrol, rmcelroy

Reviewed By: durham, #sourcecontrol, rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.fb.com/D1997655

Signature: t1:1997655:1429189594:aa8f355a6fc61e300f824be6b2fbd64a42dde2b5
2015-04-16 00:38:43 -07:00
Durham Goode
93e4a455ff clone: fix streaming clones
Upstream refactored the streaming clone api, so we need to adjust accordingly.
2015-05-27 17:29:34 -07:00
Durham Goode
acea316460 Fix blob generation with adjustlinkrevs
Summary:
When adjustlinkrevs got moved to the filectx upstream, we incorrectly
moved it to the remotefilectx inside remotefilelog. We don't actually use
remotefilectx on the server, so wrapping it did nothing.

The fix is to move the wrapping to be in remotefilelogserver.py so it is
executed on the server side.

Test Plan:
Did a checkout with my shallow client pointed at a full repo with no
blob cache. Verified it went quickly (minutes, instead of hours).

Reviewers: pyd

Differential Revision: https://phabricator.fb.com/D2097851
2015-05-22 21:32:12 -07:00
Durham Goode
95e9918016 sparse: remove sparse-filtered results from copy tracing
Summary:
Since we only prefetch things that are in the sparse checkout, copy tracing
(which touches everything in the manifest diff) would do individual file
downloads for every file.  Let's just remove those files from the copy tracing
check entirely since the user probably doesn't care if they're outside the
sparse checkout.

Test Plan: Added a test

Reviewers: sid0, rmcelroy, lcharignon, pyd

Differential Revision: https://phabricator.fb.com/D2083768
2015-05-18 16:08:49 -07:00
Laurent Charignon
5652bd276a Match with latest version of core to pass the test
Summary:
Match with latest version of core to pass the test.
There were a couple of changes in core that broke the extension, I matched
those changes to make the test pass.

Test Plan: The tests are all passing

Reviewers: durham

Differential Revision: https://phabricator.fb.com/D2053958
2015-05-07 12:50:51 -07:00
Durham Goode
b29a6b04dd Add match arg to computeforwardmissing wrapper
Upstream now has a matcher on _computeforwardmissing which will allow us to only
prefetch the necessary parts of a sparse checkout.

Since we're now being returned an iterator, we need to convert it to a list
since we iterate over it and return it.
2015-04-22 16:39:16 -07:00
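The iterator-to-list conversion in b29a6b04dd matters because a Python iterator can be consumed only once. The sketch below uses hypothetical stand-ins (`compute_missing`, `wrapped_compute_missing`, `prefetch`) rather than the real Mercurial functions.

```python
def compute_missing():
    # Stand-in for the wrapped upstream function, which yields lazily.
    yield "a.txt"
    yield "b.txt"


def wrapped_compute_missing(orig, prefetch):
    # Materialize first: iterating a generator for prefetching would
    # otherwise leave nothing for the caller to read.
    missing = list(orig())
    prefetch(missing)
    return missing
```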
Durham Goode
8bf6e4f004 sparse: make remotefilelog aware of sparse checkouts
Summary:
Previously remotefilelog would prefetch every file in a commit. With the sparse
checkout extension we want to only prefetch things in the sparse checkout.

This commit makes remotefilelog aware of the possible existence of a sparse
matcher.

Test Plan: Added tests

Reviewers: sid0, rmcelroy, pyd, lcharignon

Subscribers: kang

Differential Revision: https://phabricator.fb.com/D1967207
2015-04-02 09:58:46 -07:00
Ryan McElroy
5a769d1fd6 adjustlinknodes: check for node in nodemap
Summary:
Per @pyd's review of D1933267, we need to check for the linknode in cl.nodemap,
not in cl (whose __contains__ method only looks for revs and doesn't even check
for visibility... lolz).

Test Plan: ran tests

Reviewers: durham, sid0, pyd, ericsumner, lcharignon, davidsp, mitrandir

Reviewed By: mitrandir

Subscribers: akushner, daviser, pyd

Differential Revision: https://phabricator.fb.com/D1934941

Tasks: 6573011

Signature: t1:1934941:1427130649:b084635db9bfcd28c4d4a1bcf12a7500c06b323c
2015-03-23 09:55:23 -07:00
Durham Goode
25efa4b886 Fix adjust linknodes for ancestries with old nodes
Summary:
The new version of adjust linknodes wasn't accounting for the fact that some
ancestries contained nodes that no longer exist. Check for that before looking
for common ancestors.

The old version of this code survived by luck. We were catching KeyErrors as one
base case, and it just happens that LookupError from the changelog is also a
KeyError, so it was getting caught and eaten.

Test Plan:
We should probably add a test, but I have to leave shortly and this is pretty
broken, so we'll have to take a rain check.

Reviewers: rmcelroy, pyd, sid0

Differential Revision: https://phabricator.fb.com/D1933267
2015-03-20 18:39:38 -07:00
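The "survived by luck" behaviour in 25efa4b886 comes from exception inheritance: an `except KeyError` clause also catches any subclass of `KeyError`. The sketch below shows this with a hypothetical `ChangelogLookupError` class standing in for the changelog's lookup error; it does not import Mercurial.

```python
class ChangelogLookupError(KeyError):
    """Hypothetical stand-in for a lookup error that derives from KeyError."""


def linknode_or_none(lookup, node):
    try:
        return lookup(node)
    except KeyError:
        # Intended for a plain missing-key base case, but this also
        # swallows every exception type derived from KeyError.
        return None
```

Code that relies on this accidentally breaks as soon as the handler is narrowed or removed, which is why an explicit existence check is the more robust choice.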
Ryan McElroy
1184ee707c Fix stack overflow when dealing with long file histories
Summary:
The new fixmappinglinknodes function was using recursion to traverse the file
history, but this would break for files with history that was extremely long
(stack overflow). Switch to using a manual stack approach.

Test Plan: Ran the tests (I'd added a test to cover this logic before).

Reviewers: sid0, davidsp, mitrandir, lcharignon, pyd, rmcelroy

Reviewed By: rmcelroy

Subscribers: michaelbarton

Differential Revision: https://phabricator.fb.com/D1931944

Signature: t1:1931944:1426884986:3a0ef144fb55b8c0533e5c5de90699a1823b891f
2015-03-20 14:04:40 -07:00
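The switch from recursion to a manual stack in 1184ee707c can be illustrated with a generic ancestor walk. The `parents` mapping and the `ancestors` function are hypothetical; the real code walks file history, but the stack technique is the same.

```python
def ancestors(parents, start):
    """Return every node reachable from start via parents, without recursion."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Push parents instead of recursing, so depth is bounded by memory
        # rather than by the interpreter's recursion limit.
        stack.extend(parents.get(node, ()))
    return seen


# A linear history far deeper than Python's default recursion limit.
chain = {i: [i - 1] for i in range(1, 100000)}
```

Calling `ancestors(chain, 99999)` visits the whole chain, whereas a naive recursive walk of the same data would hit the recursion limit.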
Siddharth Agarwal
604cebd541 make patch.trydiff wrapper more generic
Summary: I'm going to add a new parameter upstream. Make this more generic so that we don't have to try and support both the old and the new versions.

Test Plan: Ran tests with both old and new hg.

Reviewers: davidsp, rmcelroy, akushner, pyd, daviser, mitrandir, ericsumner, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D1920172

Signature: t1:1920172:1426615175:d90bda3b3cc30f6e5f3149af82ae9e43dee39455
2015-03-17 10:56:59 -07:00
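A generic wrapper of the kind 604cebd541 describes forwards whatever arguments it receives, so it keeps working when upstream adds a parameter. In the sketch below the two `trydiff_*` functions are hypothetical stand-ins for an old and a new upstream signature; receiving the original function as the first argument mirrors how Mercurial's `extensions.wrapfunction` calls wrappers.

```python
def trydiff_v1(repo, revs):
    # Stand-in for the older upstream signature.
    return ("v1", repo, revs)


def trydiff_v2(repo, revs, extra=None):
    # Stand-in for a newer upstream signature with one more parameter.
    return ("v2", repo, revs, extra)


def generic_wrapper(orig, *args, **kwargs):
    # Extension-specific work (for example prefetching) would go here.
    # Forward everything untouched so the wrapper never needs to know
    # the exact upstream signature.
    return orig(*args, **kwargs)
```

The trade-off is that the wrapper can no longer name individual arguments directly; if it needs one, it has to pick it out of `args` or `kwargs`.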
Durham Goode
c599c6ae79 Extra changes related to the previous commit
2015-03-12 15:58:46 -07:00
Durham Goode
1d96446f97 push: fix pushing multiple manifests with the same file node
Summary:
Previously remotefilelog did not produce all the necessary local data blobs
when doing a peer push/pull if the incoming changegroup had two manifests
that referred to the same file revision.  We would only create a file blob
containing the history for the first occurrence, then if the user tried to
access the file history for other occurrences they got an exception.

The fix is to add linkrev fixup logic, similar to the adjustlinkrev() method
from core Mercurial's filectx. Now, if no valid local file blob can be found, we
will compute a valid history by reading the changelog.

We might be able to write this data to disk in the future as well to prevent
having to repeatedly compute this.

Test Plan: Added a test

Reviewers: sid0, rmcelroy, pyd, mitrandir, lcharignon

Differential Revision: https://phabricator.fb.com/D1904453
2015-03-10 20:02:14 -07:00
Durham Goode
8bc01a01bc prefetching: fix computenonoverlap wrapper
The computenonoverlap function has changed upstream. Update ourselves to match
it.
2015-03-10 19:59:43 -07:00
Siddharth Agarwal
43e26aff3b shallowrepo: prefetch files before a commitctx
Summary:
For hg-git conversions we're going to cause commits without actually updating to the base. Currently, this will cause lots of individual fetches.

The test demonstrates the issue -- without this patch it'll fetch the 2 files over 2 fetches, but with it it'll fetch the files over 1 fetch.

Test Plan: Ran the tests.

Reviewers: davidsp, rmcelroy, akushner, pyd, daviser, mitrandir, ericsumner, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D1893721

Tasks: 6390769

Signature: t1:1893721:1425624679:5651f71d5023919e9321646275b681b573847c44
2015-03-05 16:06:12 -08:00
Durham Goode
8203e771e3 Fix store/data permissions to have g+w
Previously we only set the umask for shared caches. Let's set it for
.hg/store/data as well so shallow repos can be used for shared repositories.
2015-02-25 17:13:49 -08:00
Durham Goode
74c8469821 Update copy wrapping to use new upstream functions
Upstream has refactored the copy logic to compute the file lists in separate
functions, so we no longer need to compute the file lists ourselves.

Update the README's Mercurial min-version since this change depends on new APIs
inside Mercurial.
2015-01-27 19:20:47 -08:00
Durham Goode
f84dcdee5d Move _adjustlinkrev onto remotefilectx
Summary:
Upstream has moved _adjustlinkrev from being a global function to one
on the filectx. Let's do the same.

Test Plan: Ran the tests

Reviewers: mitrandir

Differential Revision: https://phabricator.fb.com/D1825043
2015-02-03 18:59:00 -08:00
Durham Goode
07359d1038 Change server blob creation to not use adjustlinkrev
Summary:
adjustlinkrev makes ancestor reading orders of magnitude slower,
so we need to avoid using it. Since adjustlinkrev already returns the linkrev in
certain cases, let's just force it to always return that during file blob
creation.

Test Plan:
Generated a few thousand blobs for www and fbcode using the old and new
methods and verified that they were byte-for-byte identical.

Reviewers: sid0, pyd, mpm, rmcelroy

Differential Revision: https://phabricator.fb.com/D1782400
2015-01-14 13:14:35 -08:00
Durham Goode
d56fa342f0 Improve error message when fallback server isn't configured
Summary:
If the remotefilelog server was not specified in the hgrc, or if the project
hgrc wasn't trusted, it would throw an obtuse error about a NoneType string.
This fixes it to give a more informative error explaining the problem.

Test Plan: Added a test

Reviewers: sid0, pyd, mitrandir, ericsumner, rmcelroy

Reviewed By: rmcelroy

Differential Revision: https://phabricator.fb.com/D1774743

Signature: t1:1774743:1420830544:5122a8e11f668ee8c35996e0f4395883a31ce8b0
2015-01-09 09:43:14 -08:00
Durham Goode
4d92ad3ed7 Add optional cache validation
Summary:
There are reports of the local cache becoming invalid when stored on disk. This
adds an option that will do some basic validation and remediation for those
entries, and log some data to disk.

This is optional, since it incurs some performance overhead. We just want to use
it long enough to track down the issue.

Test Plan: Added a test

Reviewers: sid0, pyd, ericsumner, rmcelroy, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.fb.com/D1774724

Signature: t1:1774724:1420827432:06ace9d1dc078f469e0f61ebd7f604fc3b606f6d
2015-01-08 18:59:04 -08:00
Durham Goode
5f69d8dd0b Improve error message for corrupt cache files
Summary:
We've gotten reports of corrupt cache files, and the error message is pretty
obtuse (ValueError for converting a string to an int). This refactors the size
check into a function and provides a better error message.

Test Plan: Added a test

Reviewers: sid0, pyd, mitrandir, ericsumner, rmcelroy

Reviewed By: rmcelroy

Differential Revision: https://phabricator.fb.com/D1774721

Signature: t1:1774721:1420830671:afd54dde8fdc00e08ed1c6cb73bf9fdc7fac2327
2015-01-09 09:11:06 -08:00
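A hedged Python 3 sketch of the kind of size check described in 5f69d8dd0b. The on-disk layout (a decimal size, then a NUL byte, then the data), the function name `parse_size`, and the error text are all assumptions made for illustration.

```python
def parse_size(raw):
    """Return (header_end_index, size) parsed from a raw cache entry.

    Assumed layout (hypothetical): b"<decimal size>\\0<data>".
    """
    try:
        index = raw.index(b"\0")   # ValueError if the separator is missing
        size = int(raw[:index])    # ValueError if the header is not a number
    except ValueError:
        # Replace the low-level conversion error with a descriptive one.
        raise RuntimeError(
            "corrupt cache data: could not read the size header"
        ) from None
    return index, size
```

The point is only that the bare ValueError is translated into a message that names the real problem.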
Durham Goode
f0548ee974 Update remotefilectx.filectx to match upstream
Upstream has changed the filectx function slightly, so we need to match it.
2015-01-09 11:56:42 -08:00
Siddharth Agarwal
8b622893dc [shallowbundle] don't drop units and reorder on the floor
Summary: We were forgetting to pass these arguments on to the child function.

Test Plan: Visual inspection.

Reviewers: durham, davidsp, rmcelroy, akushner, pyd, daviser, mitrandir, ericsumner

Reviewed By: ericsumner

Differential Revision: https://phabricator.fb.com/D1773782

Signature: t1:1773782:1420765574:d73be08ab25265e4769d8bf70671f2ea1c13f8dd
2015-01-08 17:02:37 -08:00
Durham Goode
6687d78fc7 Add introrev to remotefilectx
Mercurial upstream does some fancy stuff inside introrev now to provide the
correct introrev. It relies on having the filelog though, so we need to avoid
it. Remotefilelog has perfect history knowledge, so we can just return the
correct linkrev.
2015-01-06 09:28:16 -08:00
Durham Goode
d985df868c Atomically write local cache files
Summary:
We're seeing some weird cache corruption errors when writing the cache to disk.
My best bet is there's multiple writes colliding and causing bad data, so let's
do atomic renames.

Test Plan: Ran the test suite

Reviewers: sid0, pyd, davidsp, rmcelroy

Reviewed By: rmcelroy

Subscribers: ericsumner, mitrandir

Differential Revision: https://phabricator.fb.com/D1747190

Signature: t1:1747190:1418865586:0a07e5243dfe9c1d5ea24f81874910d1080f24e2
2014-12-17 16:36:40 -08:00
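The atomic-rename idea in d985df868c can be sketched as follows. This is a modern Python 3 illustration, not the extension's code; `write_atomically` is a hypothetical name.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write data to path so readers never observe a partial file."""
    directory = os.path.dirname(path) or "."
    # Create the temp file in the destination directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # os.replace swaps the finished file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Two concurrent writers each produce a complete temp file, so whichever rename lands last wins and the destination is never a mix of both.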
Pierre-Yves David
ee7bdd47d8 remotefilelog: "implement" rawsize too
It is part of the revlog API and some extension like tortoisehg rely on it. The
default implementation is the same as size so we can safely mimic this here.
2014-11-29 05:20:28 -08:00
Durham Goode
97d36d285b Fix rebase with changeset evolution
A recent fix to make ancestor maps work with changeset evolution actually caused
a pretty serious regression. The ancestormap validation code was returning
ancestormaps with hidden ancestors if the first commit in the history was a
hidden node. This resulted in lots of invalid ancestries being returned.

Instead we only want to allow hidden ancestors in the map if the relativeto
commit has been explicitly set to a hidden node.
2014-11-24 22:42:34 -08:00
Siddharth Agarwal
d731468f70 [bundle2] insert ourselves into the cg1packer class hierarchy and fix up the packermap
Summary: Last bits needed to get remotefilelog over bundle2 working. Includes tests.

Test Plan: Ran tests, including with `--extra-config-opt experimental.bundle2-exp=True`

Reviewers: davidsp, akushner, pyd, rmcelroy, daviser, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D1671738

Tasks: 5568731

Signature: t1:1671738:1415676482:b9e7a1f308919526b0c41fee54d89da876518ec7
2014-11-07 18:35:52 -08:00
Siddharth Agarwal
ca3f7a704e [bundle2] rename shallowbundle to shallowcg1packer
Summary: Preparation for bundle2 support

Test Plan: Ran tests

Reviewers: pyd, akushner, davidsp, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D1668145

Tasks: 5568731

Signature: t1:1668145:1415643197:05ea239c2eb713f82bed6ad67bcd02fad7073a1f
2014-11-07 15:39:20 -08:00
Siddharth Agarwal
74584bb934 [bundle2] support arbitrary kwargs in getlocalbundle
Summary: bundle2 adds arbitrary kwargs like `listkeys`.

Test Plan: Got further in a remotefilelog pull with bundle2.

Reviewers: pyd, davidsp, akushner, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D1668121

Tasks: 5568731

Signature: t1:1668121:1415643137:8f85d1c32ffc00f3c7d8bf3c3179626268814a17
2014-11-07 18:31:48 -08:00
Durham Goode
3889ee7b5d Fix relative ancestor traversals for hg blame
Certain filectx constructions used the rev number of the self._changeid.  We
need to convert that to a node before using it. This was breaking blame.  I've
now added a blame test too.
2014-10-23 17:16:07 -07:00
Durham Goode
dc5a3bf415 Allow pulling from shallow bundlerepos
Bundlerepos work by providing a fake revlog layer above an existing revlog.
Since remotefilelog doesn't use revlogs for filelogs, bundlerepos did not work.
This commit fixes it such that you can now hg pull from a bundle, as long as
that bundle is shallow (i.e. contains no file contents). This will work for the
common use case of trying to recover data from .hg/strip-backups.

For reference, shallow bundles don't contain any file data because we never
delete any file data from .hg/store/data when using remotefilelog, even after
the commits have been stripped.
2014-10-23 00:01:21 -07:00
Durham Goode
f9730cd521 Fix dirstate wrapping to match upstream
Upstream Mercurial commit f447144c8ada changed the dirstate.status output. This
updates remotefilelog to match that new output.
2014-10-22 12:36:53 -07:00
Durham Goode
37798a0827 Fix pull wrapping to match upstream
Upstream Mercurial has moved localrepo.pull into exchange.pull. This moves our
wrapping of that command out of shallowrepo and into __init__. Exchange is
becoming an increasingly important class, so we may want to think about moving
all exchange wrapper logic out to a separate module in remotefilelog.
2014-10-14 15:50:04 -07:00
Durham Goode
65503211ed Fix revset indexing bug and update test output
repo.revs() no longer returns an object that can be indexed, so we can't use []
on it anymore. So let's use list() on it first.

The bookmark output from upstream Mercurial has also changed, so we need to
update the tests.
2014-10-14 15:30:38 -07:00
Durham Goode
3ecee80a81 Allow ancestormap to contain hidden commits (sometimes)
Summary:
When doing 'hg uncommit foo.txt' with Changeset Evolution enabled, uncommit will
first prune the commit, then try to read the filelog history to determine if any
renames need to be undone. Since the commit is now pruned, remotefilelog fails
to find any valid histories.

This fixes it to allow hidden histories if the filectx commit is hidden. It
also tweaks remotefilectx to produce commit-relative histories when possible,
which will result in more accurate histories.

Test Plan:
Ran hg uncommit in the evolve repo that had problems before. Verified
it now worked.

Reviewers: pyd, sid0

Differential Revision: https://phabricator.fb.com/D1587306
2014-09-30 14:40:09 -07:00
Durham Goode
8a5a5330c1 Fix pullprefetch for recently landed commits
Summary:
Pull-prefetch would not download file versions from the server if the file
version already existed in the local cache or the local store data.
Unfortunately, if someone landed their commit, then later stripped their local
version, the local store data file version might become invalid and no local
cache version would exist, meaning things like 'commit' might fail when offline.

This changes prefetch to always fetch from the server when dealing with files it
knows are from revs on the server.

Test Plan:
Added a test that makes local commits that already exist on the
server, and verifies that a pull-prefetch fetches the server file version,
despite that same version existing locally.

Reviewers: sid0, pyd, davidsp

Subscribers: orip

Differential Revision: https://phabricator.fb.com/D1607260
2014-10-09 15:20:54 -07:00
Pierre-Yves David
548b8af8b5 client: add a second argument to ResponseError
Summary:
The ResponseError exception expects a second argument. Otherwise the code
handling it crashes.

Test Plan: Handling the response error no longer crashes.

Reviewers: durham

Differential Revision: https://phabricator.fb.com/D1581574
2014-09-11 20:30:16 +02:00
Pierre-Yves David
c72eed0894 clone: have a more robust finally clause
Summary:
If the orig function crashes before the fileservice is installed, the finally
clause explodes, shadowing the original error. This fixes that.

Test Plan:
  The original crash is no longer shadowed by a crash in the finally clause.

Reviewers: durham

Differential Revision: https://phabricator.fb.com/D1581562
2014-09-11 20:08:42 +02:00
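A small Python sketch of the guarded-finally pattern that c72eed0894 describes. The `Repo` class, the `fileservice` attribute shape, and `guarded_call` are hypothetical stand-ins, not the extension's real objects.

```python
class Repo:
    """Hypothetical stand-in for a repository object."""
    fileservice = None


def guarded_call(orig, repo):
    """Run orig(repo) and close repo.fileservice only if it was installed."""
    try:
        return orig(repo)
    finally:
        # orig may have raised before installing the service; touching it
        # unconditionally would raise here and hide the original error.
        service = getattr(repo, "fileservice", None)
        if service is not None:
            service.close()
```

With the guard in place, an exception raised inside `orig` propagates unchanged instead of being replaced by an error from the cleanup code.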
Siddharth Agarwal
5faaeedd84 [remotefilelog] fix packmeta call
Summary: API change

Test Plan: @durham ran an amend.

Reviewers: durham

Reviewed By: durham

Subscribers: durham

Differential Revision: https://phabricator.fb.com/D1569510
2014-09-22 11:38:04 -07:00
Durham Goode
c7f1c0b383 Fix committing merges
Summary:
Upstream Mercurial changed the way merging works and added
revlog.commonancestorsheads. This changes remotefilelog to implement the same
API.

Previously we were able to use ancestors.genericancestors to do the graph
traversal. Upstream Mercurial has deleted that function though (since it is now
unused), so remotefilelog must now build a temporary rev graph in order to use
the ancestors.* apis.

Test Plan: Added a test. It failed without the fix, it passes with the fix.

Reviewers: sid0, davidsp, pyd

Differential Revision: https://phabricator.fb.com/D1566787
2014-09-19 12:21:30 -07:00
Siddharth Agarwal
8d48e1e5ee fix for parsemeta API change
Summary: This was broken by recent changes.

Test Plan: Ran test suite.

Reviewers: durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D1558890

Tasks: 5170539
2014-09-16 13:28:03 -07:00
Pierre-Yves David
2c956d95e2 revert: only pre-fetch files that need to be touched
Summary:
With recent versions of Mercurial (>= 3.2, 4dfcf21a6aa7), revert uses status
information to determine the files that need to be touched. It then offers a
simple hook for extensions that need to prefetch.

Test Plan:
Ran the tests. Certain tests depended on the old revert behavior (of
prefetching everything), so they required slight changes.

Reviewers: pyd, sid0, davidsp

Differential Revision: https://phabricator.fb.com/D1551059
2014-09-08 15:20:59 +02:00
Durham Goode
580f3eaeb3 Update to match Mercurial version b8c8cacd4482
Summary:
Changegroups have been refactored upstream and we need to update our
remotefilelog monkey patching accordingly.

Also fix an issue with the tests where 'function foo()' was not considered valid
on certain systems.

Test Plan: Ran the tests

Reviewers: pyd, sid0, davidsp

Differential Revision: https://phabricator.fb.com/D1551019
2014-09-11 14:39:14 -07:00
Durham Goode
17c16cf610 Optimize pullprefetch to limit number of stats
Summary:
Previously, if pullprefetch was set, we'd perform a prefetch of the
entire manifest of the specified revs (usually the public bookmarks). This
involved stat-ing all the relevant files in the cache to see if they already
existed, which added an extra 6 seconds or so to every pull.

Now we only prefetch the files that are different from our working copy. We
assume we already have all the files that are in our working copy. This reduces
the pullprefetch overhead significantly.

Test Plan:
Did a pull on my laptop. Verified it didn't hang for 6 seconds at the
prefetch stage. Also updated a test

Reviewers: davidsp, pyd, sid0

Reviewed By: sid0

Differential Revision: https://phabricator.fb.com/D1505841

Tasks: 4608894
2014-08-19 09:33:31 -07:00
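The optimization in 17c16cf610 amounts to comparing two manifests and fetching only what differs. A rough Python sketch, modelling a manifest as a plain dict from path to file node (an assumption; `files_to_prefetch` is a hypothetical name):

```python
def files_to_prefetch(target_manifest, working_manifest):
    """Return sorted (path, node) pairs present in the target manifest
    whose node differs from (or is absent in) the working copy."""
    return sorted(
        (path, node)
        for path, node in target_manifest.items()
        # Anything identical in the working copy is assumed to be cached.
        if working_manifest.get(path) != node
    )
```

Only the returned pairs would then need a cache lookup, instead of one stat per file in the whole manifest.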
Durham Goode
e46cd0e8e0 Merge heads 2014-08-07 10:23:18 -07:00
Durham Goode
e5228d9989 Fix pullprefetch that uses bookmarks
Summary:
Previously, pullprefetch was executed during the repo.pull stage. This happens
before the bookmarks have been moved, so revsets like 'bookmark()' would
prefetch the wrong commits.

This change moves the pullprefetch logic to after the pull command is completely
finished.  Updated a test to make sure this is caught.

Also fixes a bug where we were using linkrevs to read a manifest rev entry. We
should be using the manifest rev instead.

Test Plan: Added a test. Ran it.

Reviewers: sid0, pyd, davidsp

Differential Revision: https://phabricator.fb.com/D1483345
2014-08-06 18:50:57 -07:00
Siddharth Agarwal
07a515c430 don't show remotefilelog commands in the shortlist
Summary: These commands (well, not the debug one) were visible in the shortlist that showed up when you type `hg`. They're not basic commands.

Test Plan: Ran `hg` with the extension enabled, didn't see those commands.

Reviewers: durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D1454931
2014-07-23 20:37:48 -07:00
Durham Goode
c44433c62c Fix hg log on patterns
Summary:
Due to a change in upstream mercurial, hg log with patterns was no longer
working. This fixes it by forcing hg log to take the slow path when using
patterns.

It also updates the warning messages to work when running hg log <file> from
within a subdirectory.

Test Plan: Ran the new tests

Reviewers: sid0

Differential Revision: https://phabricator.fb.com/D1450193
2014-07-22 12:55:29 -07:00
Durham Goode
13058fb30c Allow auto-prefetching during pulls
Summary:
Adds a remotefilelog.pullprefetch config option that accepts a revset. Whenever
a pull is run, the revs matched by that revset will be prefetched. The most
common value for this will be '(bookmark() + heads(all())) & public()', since it will download
almost everything necessary to work offline.

Test Plan: Added a test. Ran it.

Reviewers: davidsp, pyd, sid0

Reviewed By: sid0

Differential Revision: https://phabricator.fb.com/D1419420
2014-07-03 13:05:11 -07:00
Siddharth Agarwal
f662120645 merge 2014-06-21 16:06:06 -07:00
Siddharth Agarwal
0d248aa73f applyupdates: update for Mercurial changes
Summary: Update for Mercurial commits 1b6040917a6c and 9b42f49d06aa.

Test Plan: Ran the tests

Reviewers: durham, dschleimer, pyd, akushner, davidsp

Reviewed By: davidsp

Differential Revision: https://phabricator.fb.com/D1388563

Tasks: 4533623
2014-06-17 15:47:12 -07:00
Durham Goode
e6bee07496 Expand environment variables in cacheprocess and cachepath
Summary:
Expands environment variables in the cacheprocess and cachepath config options,
so users can specify something like remotefilelog.cachepath=$HOME/.hgcache

Test Plan:
Set my cachepath to $HOME/.hgcache on my laptop and manually
performed a shallow clone.  Verified data was put in ~/.hgcache

Reviewers: sid0

Differential Revision: https://phabricator.fb.com/D1342174
2014-05-21 12:28:03 -07:00
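The expansion described in e6bee07496 can be illustrated with the Python standard library. This is a sketch rather than the extension's code, and `expand_config_path` is a hypothetical helper name.

```python
import os


def expand_config_path(value):
    """Expand $VAR and ${VAR} references in a config value."""
    # os.path.expandvars leaves unknown variables untouched.
    return os.path.expandvars(value)
```

A value such as `$HOME/.hgcache` would resolve against the environment of the process running hg.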
Siddharth Agarwal
ef8674624a Fix shallowbundle.getbundle for local non-remotefilelog repositories
Summary: Pulling from a local non-remotefilelog repo to a remotefilelog repo was broken. This fixes it.

Test Plan: `hg pull` from a local non-remotefilelog repo to a remotefilelog repo.

Reviewers: durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D1341059
2014-05-20 20:56:19 -07:00
Durham Goode
c5b2f574a0 Fix changegroup wrapping with new upstream Mercurial
Summary:
Recent changes to upstream Mercurial have moved localrepo.getbundle and
localrepo.addchangegroupfiles to changegroup.py.  remotefilelog wraps these
functions, and thus needs to be updated.

Applyupdate also had a function signature change, which is fixed here.

Minor fix to a test as well, which had a hard coded time instead of a glob.

Test Plan: ./run-tests.py --with-hg=/data/users/durham/hg/hg

Reviewers: sid0, davidsp, pyd, dschleimer

Differential Revision: https://phabricator.fb.com/D1260737
2014-04-04 15:55:06 -07:00
Durham Goode
0237412d94 Fix shallow clones using getbundle protocol
Previously shallow clones only worked using the streaming clone protocol. With
this change they work for the standard getbundle protocol as well. This is what
the majority of Mercurial users use, so we need to support that.
2014-02-24 22:19:15 -08:00
Durham Goode
0301f9f129 Move local cache logic into its own class
The current local cache is just files on disk, and this implementation detail
was spread across the extension. This change refactors it to hide the
implementation inside a class so that we can replace it with other
implementations (such as a sqlite local cache) later.
2014-02-11 16:25:55 -08:00
Durham Goode
bdea38dd56 Move fileservice to be per repo instead of global
Previously the file service client was a global object that all repos could
share. This was a bit hacky and is no longer needed. Now the file service
client exists per repo instance.

This is part of a series of changes to abstract the local caching and remote
file service in such a way that we can plug and play implementations.
2014-02-11 14:41:56 -08:00
Durham Goode
9eda4f7a0f Fix fallback when memcache process exits unexpectedly
If the memcache process exited early, remotefilelog was throwing an exception
instead of falling back to the server. This change makes it fall back to the
server, and also print a warning that the cache connection closed early.
2014-01-09 11:41:12 -08:00
Durham Goode
fc3a887712 hg bundle produces full sized bundles
Summary:
hg bundle was producing shallow bundles. This change makes it produce full
sized bundles so they can be used in other repos.

Test Plan: Added a test

Reviewers: sid0

Reviewed By: sid0

CC: keegancsmith

Differential Revision: https://phabricator.fb.com/D1167462
2014-02-10 16:13:41 -08:00
Durham Goode
92d01b616c Allow readonly access to remotefilelog cache
Summary:
Previously requesting remotefilelog file blobs from the server required write
access in order to write the blob to the cache. This changes it to not abort
entirely if the user doesn't have write access.

Test Plan:
cd tests
./run-tests.py --with-hg=/data/users/durham/hg/hg test-permissions.t
Also ran the test without the fix and verified it fails.

Reviewers: sid0, davidsp, pyd, dschleimer

Reviewed By: dschleimer

Differential Revision: https://phabricator.fb.com/D1145976

Task ID: 3601184
2014-01-27 17:09:48 -08:00
Durham Goode
106035959b Add prefetch command to remotefilelog
Summary:
Adds a 'hg prefetch' command to remotefilelog for prepopulating the
local cache.  Supports specifying revsets and file patterns to limit what is
downloaded.

Test Plan: ./run-tests.py test-prefetch.t --with-hg=/data/users/durham/hg/hg

Reviewers: dschleimer, sid0, davidsp, pyd, mpm

CC: kunalb, minyoung

Differential Revision: https://phabricator.fb.com/D1129942
2014-01-15 13:41:29 -08:00
Durham Goode
f76b0f894c Fix looking up double digit alternates
The alternate lookup code was mistakenly looking for only the last digit
instead of looking at the entire prefix. This meant files with more than 10
alternates would start failing to find histories, which breaks rebase.
2014-01-09 11:40:39 -08:00
Durham Goode
16a7f940d5 Increase batch request size
When falling back to the master server for cache misses, we only kept two
requests in flight at any time. Over high latency connections (like across
oceans) this resulted in very slow downloads.

This change increases the request size to 10,000 keys at once. This will keep
the size of the request lower than the tcp buffer size, while allowing us to
maximize our throughput.
2013-12-17 14:31:21 -08:00
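Splitting a key list into requests of at most 10,000 keys, as 16a7f940d5 describes, might look like the Python sketch below. The function name `batches` is hypothetical; only the batch size comes from the commit message.

```python
def batches(keys, size=10000):
    """Yield consecutive slices of keys, each at most `size` long."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]
```

For 25,000 keys this yields three requests of 10,000, 10,000 and 5,000 keys.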
Durham Goode
285ad01336 Handle the case where the alternates directory doesn't exist yet 2013-12-13 17:14:55 -08:00
Durham Goode
688d0f9594 Fix debugremotefilelog command 2013-12-13 11:42:50 -08:00
Durham Goode
4e8c3b941d Fix broken alternates lookup 2013-12-13 11:21:51 -08:00
Durham Goode
17f5a0d712 Fix issues with hg pulling from svn 2013-12-12 12:34:39 -08:00
Durham Goode
4d6f31837e Fix hang when manifest size is greater than tcp buffer
Previously we sent the entire list of files to the fallback repo in a single ssh
write/flush.  If the size of this write exceeded the tcp buffer on the receiving
end, the call would hang until the buffer had room.  The problem is that the
receiving end (the server) is hung trying to send data back to the
client. Therefore it deadlocked.

The fix is to send and receive requests one at a time. We always have the next
request in flight while receiving so we shouldn't be waiting on requests too
often.
2013-12-11 13:39:53 -08:00
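The "one at a time, with the next request already in flight" scheme from 4d6f31837e can be sketched in Python. The `send` and `receive` callables are hypothetical stand-ins for the ssh pipe, and `pipelined` is an illustrative name.

```python
def pipelined(keys, send, receive):
    """Request each key while keeping at most two requests outstanding."""
    results = []
    pending = 0
    for key in keys:
        send(key)
        pending += 1
        # Once a second request is in flight, drain the older response so
        # neither side's buffer has to hold the whole exchange.
        if pending > 1:
            results.append(receive())
            pending -= 1
    # Collect whatever is still outstanding after the last send.
    while pending:
        results.append(receive())
        pending -= 1
    return results
```

Because a response is read after every send past the first, the amount of unread data on the wire stays bounded regardless of how many files the manifest lists.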
Durham Goode
393958c76b Allow naming repos
Enables specifying a name for a repo that is used in the cache key.
This allows multiple repos on a machine to share a cache without the
risk of keys overlapping.
2013-08-15 11:00:51 -07:00
Durham Goode
85e48b58fd Move server and debug logic into their own files
__init__.py was getting quite large. This change moves the server and debug
logic into their own files.  Client-side logic remains in __init__.py
2013-11-25 16:36:44 -08:00
Durham Goode
d9d4477013 Remove global variable for tracking shallow remotes
Previously we used a global variable to track if the incoming connection was
from a shallow remote (based on if the network command was a *_shallow command).
This is hacky and overall a bad idea. The new implementation stores the shallow
flag as a bundlecapability passed to the getbundle command.

A side effect of this is remotefilelog won't work with versions of mercurial
that don't use the getbundle command.
2013-11-25 14:22:56 -08:00
Durham Goode
b88d1b44d4 Replace linknode fallback algorithm
The previous algorithm thought that if the system cache had the file rev, it was
guaranteed to be valid. This isn't true in the case of a machine in which
multiple people share the cache (one person may have pulled a rev but the other
hasn't).

The new algorithm is more explicit. It checks:

- system cache
- local cache
- local cache fallbacks
- remote cache
- master server
2013-11-22 13:41:54 -08:00
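The explicit lookup order listed in b88d1b44d4 can be modelled as an ordered chain of stores. In this Python sketch the stores are plain dicts and `is_valid` is a caller-supplied check; both, along with the name `find_blob`, are assumptions for illustration.

```python
def find_blob(key, stores, is_valid):
    """Return (store_name, blob) from the first store holding a valid blob.

    `stores` is an ordered list of (name, mapping) pairs, for example:
    system cache, local cache, local cache fallbacks, remote cache,
    master server.
    """
    for name, store in stores:
        blob = store.get(key)
        # A hit is accepted only if it validates; a shared cache may hold
        # a revision this particular repo has not pulled yet.
        if blob is not None and is_valid(blob):
            return name, blob
    raise KeyError(key)
```

An invalid hit in an earlier store simply falls through to the next one rather than being trusted.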
Durham Goode
e5f5e3244b Add more comments explaining various complexities 2013-11-05 17:19:59 -08:00
Durham Goode
24ce0242d7 Add example cache client implementation
Adds a cache client implementation using the opensource python-memcached
library. It's more of an educational example than a production ready one since
it doesn't perform the requests asynchronously.  It does however split up large
files into smaller chunks for you.
2013-10-17 14:18:23 -07:00
Durham Goode
18baf608df Remove unused time and traceback imports 2013-10-16 13:40:25 -07:00
Durham Goode
d122f76e5b Add readme and GPL info 2013-10-15 17:20:12 -07:00
Durham Goode
1275d15990 Add include and exclude configuration settings
The remotefilelog extension currently doesn't work with tags. Adding include and
exclude patterns allows users to specify which files they want to treat as
shallow and which they want to download the entire history for. By excluding
.hgtags from being shallow, this enables tags to work in a mostly shallow repo.

This also enables largefile like scenarios where most files are full and only a
few large ones are kept remote.
2013-09-26 10:46:06 -07:00
Durham Goode
5a628dc440 Fix linknode test failure 2013-10-09 10:20:47 -07:00
Durham Goode
3c6137f555 Fix revert prefetch causing excess output 2013-10-07 17:13:00 -07:00
Durham Goode
b47e016320 Replace linknode recovery tests with a real world test 2013-10-04 14:40:47 -07:00
Durham Goode
7268e5b709 Refactor ancestormap linknode logic to handle a bug
A rare bug can occur where the local file blob might not exist, but a valid old
version of that blob does exist. This refactors the linknode logic in ancestormap
to check the old versions if the server fetch fails to find the blob.

It still prints an ugly warning message from the server, but this whole issue is
quite rare anyway.
2013-10-03 15:15:15 -07:00
Durham Goode
ab72a92e85 GC server cache and add GC tests 2013-10-02 16:21:48 -07:00
Durham Goode
be29ee042a Fix reverting from non-root directories 2013-10-02 09:45:52 -07:00
Durham Goode
335e1d1bfc Prefetch before revert 2013-09-17 11:24:31 -07:00
Durham Goode
2ccd88dfcd Support new mercurial _basesupported 2013-10-01 15:11:57 -07:00
Durham Goode
efdfcc1502 Send all available data during a pull 2013-09-19 16:22:14 -07:00
Durham Goode
3667c253fd Refresh changelog during getfiles loop 2013-09-19 15:56:26 -07:00
Durham Goode
cf9d751d8a Add remotefilelog debug commands 2013-09-17 20:15:08 -07:00
Durham Goode
6a8a2f0e58 Fix rare issue with broken linknodes in the ancestormap 2013-09-16 18:46:24 -07:00
Durham Goode
f480c7deef Remove remotefilectx.__str__
Recent changes to Mercurial mean this is implemented by a base class.
2013-09-11 12:29:01 -07:00
Durham Goode
4ce55b8a0f Add log file warning 2013-09-11 10:27:56 -07:00
Durham Goode
6781d80d25 Fix local pulls to send file data 2013-09-09 11:44:08 -07:00
Durham Goode
6acb5968a1 Clean up empty cache files if we encounter them 2013-09-09 11:23:03 -07:00
Durham Goode
3619a1911d Cut down number of sys calls during filelog reads
When the cache is stored on a filesystem, excessive stat calls can slow
mercurial updates down dramatically. This reduces it to a single open call for
the cache location and if that fails, a single open call for the local location.
2013-09-09 10:23:29 -07:00
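The idea in 3619a1911d is to attempt the open directly instead of checking for existence first. A Python 3 sketch, with hypothetical path arguments and function name:

```python
def read_blob(cache_path, local_path):
    """Read a blob with one open() per location and no prior stat()."""
    for path in (cache_path, local_path):
        try:
            with open(path, "rb") as f:
                return f.read()
        except FileNotFoundError:
            # A miss costs exactly one failed open; try the next location.
            continue
    raise KeyError("blob not found in cache or local store")
```

A cache hit costs a single open call, and a miss costs one failed open plus one more for the local location.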
Durham Goode
c17ec690c9 Change cache key to use a two character prefix for directories.
Some file systems can't handle having a ton of files/directories inside a
directory, so this splits up all our files amongst directories.
2013-09-06 13:28:15 -07:00
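A two-character directory prefix, as in c17ec690c9, spreads entries across up to 256 subdirectories. The Python sketch below assumes the key is derived from a SHA-1 of the file name plus the node in hex; that layout and the name `cache_path` are illustrative guesses, not the extension's documented format.

```python
import hashlib
import os


def cache_path(cache_root, filename, node_hex):
    """Build a cache path whose first directory level is a 2-char prefix."""
    # Hypothetical key scheme: hash the file name, then split the digest.
    name_hash = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    return os.path.join(cache_root, name_hash[:2], name_hash[2:], node_hex)
```

Each top-level directory then holds only the entries whose hash starts with that prefix.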
Durham Goode
4d70ed4fce Fix a bug with status prefetching in merge scenarios 2013-09-04 19:07:01 -07:00
Durham Goode
29ba0e9bc1 If cacheprocess is not set, always use the fallback
This allows tests to run without a memcache process.
2013-09-03 20:03:24 -07:00
Durham Goode
4a5c8d437d Fix hg diff when fnode is None 2013-09-03 11:39:16 -07:00
Durham Goode
5ec22c7093 Prevent verify from checking filelogs 2013-08-30 15:43:22 -07:00
Durham Goode
b685d98f57 Prefetch revisions before a diff 2013-08-30 11:27:09 -07:00
Durham Goode
4edeed8417 Prefetch lookup set during hg status 2013-08-30 11:09:19 -07:00
Durham Goode
3c879ed1a8 Enable efficient pulling between shallow repos 2013-08-28 18:51:01 -07:00
Durham Goode
d0738cc010 Make cache files owned by uid/svnuser 2013-08-20 12:59:33 -07:00
Durham Goode
96bbab8f7a Fix shared cache permissions to be g+w 2013-08-15 10:59:11 -07:00
Durham Goode
f68d704603 Enable hg gc from outside a repo 2013-08-15 10:56:25 -07:00
Durham Goode
bf7491936d Fix hg diff with added or moved files.
A workingctx produces manifest entries with nullid+'a' or nullid+'m'
for any added or modified files. The extension was trying to prefetch
these but they didn't exist and caused an error. Luckily they are length
42 so we can check for them and not prefetch them.
2013-07-24 22:16:50 -07:00
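The length check from bf7491936d can be shown with hex-encoded nodes: a normal 20-byte node is 40 hex characters, while nullid plus a one-byte flag is 42. This Python 3 sketch uses a hypothetical helper name and is not the extension's code.

```python
from binascii import hexlify

NULLID = b"\0" * 20  # 20-byte null node


def prefetchable(hex_nodes):
    """Drop working-copy placeholder entries (42 hex chars) from a list."""
    return [node for node in hex_nodes if len(node) != 42]
```

Placeholders such as `hexlify(NULLID + b"a")` are filtered out, so no fetch is attempted for file revisions that do not exist yet.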
Durham Goode
a5828ce7a3 Add newline to end of debug output 2013-07-24 18:49:14 -07:00
Durham Goode
9df6e83354 Prevent 'running ssh...' in stdout when run with -v 2013-07-24 13:20:13 -07:00
Durham Goode
3cbc732b42 Fix fallbackrepo not being present during the clone after update.
Make debug message get sent to stderr instead of stdout.
2013-07-23 19:06:40 -07:00
Durham Goode
77d31b12e4 Add hit/miss ratio to debug output 2013-07-01 17:37:55 -07:00
Durham Goode
9642a8a2d6 Add remotefilelog.fallbackrepo config 2013-07-01 16:28:34 -07:00
Durham Goode
58ff8f91f6 Prefetch before copy tracing 2013-07-01 15:35:08 -07:00
Durham Goode
027a1d4ab8 Set umask before writing files to shared cache 2013-06-28 17:12:20 -07:00
Durham Goode
8e037436cb Add gc command for cleaning up the cache 2013-06-28 15:57:15 -07:00
Durham Goode
6e3494bf98 Add incoming hook for producing file blobs 2013-06-27 15:14:22 -07:00
Durham Goode
1ac9b8cbc1 Move requirement string to a variable 2013-06-26 14:37:59 -07:00
Durham Goode
3e6b7810df Override bundle10.generatefiles instead of prune 2013-06-25 13:26:24 -07:00
Durham Goode
bb32b111bf Change contract between extension and memcache process to allow arbitrary key lengths and customizable cache paths 2013-06-25 11:38:48 -07:00
Durham Goode
6536d87bc0 Prevent pull from sending files to shallow clones 2013-06-23 13:50:22 -07:00
Durham Goode
84b481de56 Add option for server cache location.
Change _callstream wrapper to only run on client.
2013-06-21 13:22:18 -07:00
Durham Goode
f16a3a4134 Rename to remotefilelog since shallowrepo is already taken 2013-06-21 10:14:29 -07:00