318 Commits

Author SHA1 Message Date
Durham Goode
c429030ca4 store: add class definitions and stub for repack
Summary:
This introduces the high level classes that will implement the generic repack
logic.

Test Plan: Ran the repack in conjunction with later commits that use these apis.

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249577

Signature: t1:3249577:1462225435:000f9cc29ae2a3d7fdbedf546c8936ef45d1e4cf
2016-05-03 12:32:35 -07:00
Durham Goode
6d68bef157 store: add perf test
Summary:
Adds an optional test for performance profiling. Using this test I was
able to see the difference between bisect, interpolation search, and the fanout
table on the index lookup times, and determined the fanout table was about 10x
faster on packs with 1 million objects.

It also taught me that the 2^16 fanout table is very inefficient for small packs
(cpu wise), so we should allow it to be configurable.

Test Plan: Ran it

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249574

Signature: t1:3249574:1462240719:b8895f79895286613261cd2d4314c214688558eb
2016-05-03 12:32:16 -07:00
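
The fanout-table lookup that commit 6d68bef157 compares against bisect and interpolation search can be sketched roughly as follows. This is a minimal illustration, not the pack index code itself: the function names, the in-memory list of nodes, and the `prefix_bytes` parameter are all assumptions made for the example. It does show why a configurable prefix width matters: a 2-byte prefix needs a 2^16-entry table even for a tiny pack, while a 1-byte prefix needs only 2^8 entries.

```python
import bisect
import hashlib


def prefix_of(node, prefix_bytes):
    # Interpret the leading bytes of the node as a big-endian integer.
    return int.from_bytes(node[:prefix_bytes], "big")


def build_fanout(sorted_nodes, prefix_bytes=2):
    """Return a table where fanout[p] is the number of nodes with prefix < p.

    The table has 2**(8 * prefix_bytes) + 1 entries, so the nodes sharing
    prefix p occupy sorted_nodes[fanout[p]:fanout[p + 1]].
    """
    size = 1 << (8 * prefix_bytes)
    counts = [0] * size
    for node in sorted_nodes:
        counts[prefix_of(node, prefix_bytes)] += 1
    fanout = [0] * (size + 1)
    for p in range(size):
        fanout[p + 1] = fanout[p] + counts[p]
    return fanout


def lookup(sorted_nodes, fanout, node, prefix_bytes=2):
    """Return the position of node in sorted_nodes, or None if absent."""
    p = prefix_of(node, prefix_bytes)
    lo, hi = fanout[p], fanout[p + 1]
    # Only the small slice sharing the prefix needs a binary search.
    i = bisect.bisect_left(sorted_nodes, node, lo, hi)
    if i < hi and sorted_nodes[i] == node:
        return i
    return None


if __name__ == "__main__":
    # Hypothetical data: 1000 fake 20-byte nodes.
    nodes = sorted(hashlib.sha1(str(i).encode()).digest() for i in range(1000))
    fanout = build_fanout(nodes, prefix_bytes=1)
    print(lookup(nodes, fanout, nodes[42], prefix_bytes=1))
```
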
Durham Goode
b049a0910a store: datapack fix perf issue
Summary:
Using range() allocates a full list, which is 2**16 entries in the fanout case.
Let's use xrange instead. This is a notable performance win when checking many
keys.

Also removed an unused variable and switched to a local index variable instead of
self._index, since this is a hot path.

Test Plan: Ran hg repack

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249563

Signature: t1:3249563:1462240834:c19d6cbf0b6237f15ca8d81e8da856752df0ec59
2016-05-03 12:30:44 -07:00
Durham Goode
1704e5c8fb store: add tests for historypack
Summary:
This adds a basic test suite for the historypack class, and fixes some issues it
found.

Test Plan: ./run-tests.py test-historypack.py

Reviewers: mitrandir, rmcelroy, ttung, lcharignon

Reviewed By: lcharignon

Differential Revision: https://phabricator.intern.facebook.com/D3237858

Signature: t1:3237858:1461884966:c0ec90a2735255e5ef70eade09915066a7b71ee5
2016-04-28 17:37:03 -07:00
Durham Goode
8f4d83edeb shallowbundle: fix broken fallback orig call
This was caught by tests running in an unusual configuration
2016-04-28 17:34:08 -07:00
Durham Goode
0244f5460a store: add unit tests for datapack classes
Summary: This adds some basic unit tests for creating and reading from datapack files.

Test Plan: ./run-tests.py test-datapack.py

Reviewers: mitrandir, rmcelroy, lcharignon

Differential Revision: https://phabricator.intern.facebook.com/D3233181
2016-04-28 15:00:34 -07:00
Durham Goode
22948ce7e1 checkcode: add check code test
Summary: Adds the same check code test that upstream Mercurial uses.

Test Plan:
Ran it, and fixed all the failures. I won't land this commit until
all the failure fixes are landed.

Reviewers: #sourcecontrol, ttung, rmcelroy, wez

Reviewed By: wez

Subscribers: quark, rmcelroy, wez

Differential Revision: https://phabricator.intern.facebook.com/D3221380

Signature: t1:3221380:1461802769:19f5bdc209c05edb442faa70ae572ce31e2fbc95
2016-04-28 10:18:47 -07:00
Durham Goode
29d3dda67e checkcode: fix various store files
Summary: Fix check code for various store related files

Test Plan: Ran the tests

Reviewers: #sourcecontrol, mitrandir, ttung

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3222465

Signature: t1:3222465:1461701300:34560288be4dc921f0252d4ad8fdc9c8d9357e23
2016-04-27 16:49:33 -07:00
Durham Goode
98fd33f8cb store: add missing imports
Summary: These were missing, and only needed in exception cases.

Test Plan: nope

Reviewers: #sourcecontrol, rmcelroy, ttung

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3219749

Signature: t1:3219749:1461608742:91e3a721e78188c52431b6c5d1b3ad091e249c3a
2016-04-27 16:49:30 -07:00
Durham Goode
f92668636b store: add historypack store that reads histpack files from .hg/store/packs
Summary:
Now that we can read and write histpack files, let's add a store implementation that
can serve packed content.

My next set of commits (which haven't been written yet) will:
- add tests for all of this

Test Plan:
Ran the tests. Also repacked a repo, deleted the old cache files,
ran hg log FILE, and verified it produced results without hitting the network.

Reviewers: #sourcecontrol, ttung, mitrandir, rmcelroy

Reviewed By: mitrandir, rmcelroy

Subscribers: rmcelroy, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3219765

Signature: t1:3219765:1461717992:9b2e8646c0555472fa00ee7059c0f283fd4c2c65
2016-04-27 16:49:27 -07:00
Durham Goode
18cde8ba89 store: add a historypack class that can read histpacks
Summary:
The previous patch added logic to repack store history and write it to
a histpack file. This patch adds a pack reader implementation that knows how to
read histpacks.

Test Plan:
Ran the tests. Also tested this in conjunction with the next patch
which actually reads from the data structure.

Reviewers: #sourcecontrol, ttung, mitrandir, rmcelroy

Reviewed By: mitrandir, rmcelroy

Subscribers: rmcelroy, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3219764

Signature: t1:3219764:1461718081:9d812b6aea87fe9eb48fdac9dbef282e4775c3c9
2016-04-27 16:49:24 -07:00
Durham Goode
f22bae206b store: add a historypack format and a repacker for it
Summary:
This is an initial implementation of a history pack file creator and a repacker
class that can produce it. A history pack is a pack file that contains no file
content, just history information (parents and linknodes).

A histpack is two files:

- a .histpack file consisting of a series of file sections, each of which
  contains a series of revision entries (node, p1, p2, linknode)
- a .histidx file containing a filename based index to the various file sections
  in the histpack.

See the code for documentation of the exact format.

Test Plan:
Ran the tests. A future diff will add unit tests for all the new pack
structures.

Ran `hg repack` on a large repo. Verified pack files were produced in
.hg/store/packs. In a future diff, I verified that the data could be read
correctly.

Reviewers: #sourcecontrol, mitrandir, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: mitrandir, rmcelroy, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3219762

Signature: t1:3219762:1461751982:e7bbc65e8f01c812fc1eb566d2d48208b0913766
2016-04-27 16:49:21 -07:00
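
The two-part layout described in commit f22bae206b (file sections of revision entries, plus a filename-based index into those sections) can be modelled in memory as below. This is only a sketch of the shape of the data: the real .histpack and .histidx files are binary formats documented in the code, and the names `HistEntry`, `build_histpack`, and `lookup_history` are hypothetical.

```python
from collections import namedtuple

# One revision entry, as listed in the commit message.
HistEntry = namedtuple("HistEntry", "node p1 p2 linknode")


def build_histpack(history):
    """Build an in-memory stand-in for a histpack.

    history maps filename -> list of HistEntry.
    Returns (sections, index), where sections plays the role of the
    .histpack file and index plays the role of the .histidx file.
    """
    sections = []
    index = {}
    for filename in sorted(history):
        index[filename] = len(sections)  # position of this file's section
        sections.append((filename, list(history[filename])))
    return sections, index


def lookup_history(sections, index, filename, node):
    """Return the HistEntry for (filename, node), or None if not packed."""
    pos = index.get(filename)
    if pos is None:
        return None
    _name, entries = sections[pos]
    for entry in entries:
        if entry.node == node:
            return entry
    return None
```
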
Durham Goode
f17f6cc093 store: add revisions to datapack in alphabetical order
Summary:
This forces the revisions in the datapack to be added in alphabetical order.
This makes the algorithm more deterministic, but otherwise has little effect.

Test Plan: Ran the tests, ran repack

Reviewers: #sourcecontrol, rmcelroy, ttung

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3219760

Signature: t1:3219760:1461687720:7be5fdc1419f8214c8c83074494b33214b3684ae
2016-04-27 16:49:18 -07:00
Durham Goode
43ed70b6f1 store: add datapack store that reads pack files from .hg/store/packs
Summary:
Now that we can read and write datapack files, let's add a store implementation that
can serve packed content. With this patch, it's technically possible for someone
to prefetch and repack large portions of history for long term storage with
remotefilelog.

My next set of commits (which haven't been written yet) will:
- add tests for all of this
- add an indexpack format for packing ancestor metadata (the datapack only packs
  revision content)

Test Plan:
Ran the tests. Also repacked a repo, deleted the old cache files, ran
hg up null && hg up master, and verified it checked out master with the
right files and without fetching blobs from the server.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205351

Signature: t1:3205351:1461751649:45a56b57d962a282aeef9478500a3b23495a0eb7
2016-04-27 16:49:15 -07:00
Durham Goode
56c83ea072 store: add a datapack class that can read datapacks
Summary:
The previous patch added logic to repack store contents and write it to a
datapack file. This patch adds a new store implementation that knows how to read
datapacks.

It's just a simple implementation without any parallelism, so there's room for
improvement.

Test Plan:
Ran the tests. Also tested this in conjunction with the next patch
which actually reads from the data structure.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205342

Signature: t1:3205342:1461750967:84377517cb1f285d37694a3f503d60ae85bacb66
2016-04-27 16:49:12 -07:00
Durham Goode
510ac021f3 store: add a basic repack and datapack format
Summary:
This is an initial implementation of a repack algorithm that can read data from
an arbitrary store (in this case the remotefilelog content store), and repack it
into a datapack.

A datapack is two files:

- a .datapack file consisting of a series of deltas (a delta may be a full text if the delta base is the nullid)
- a .dataidx file consisting of delta information and an index into the deltas

See the code for documentation of the exact format.

Test Plan:
ran the tests

Ran `hg repack` in a large repo. Verified that a datapack and a dataidx file
were created in .hg/store/packs. The datapack used 148MB instead of the 439MB the
old remotefilelog storage used.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205334

Signature: t1:3205334:1461751366:ee4bf6a580ffb667071a8046fda6f0858b7f25ae
2016-04-27 16:49:09 -07:00
Durham Goode
f362c9a3a8 store: add getfiles() api to store
Summary:
This adds an API to the store contract that allows the store to return a list of
the name/node pairs that it contains. This will be used to allow a repack
algorithm to list the contents of the store so it can repack it into another
store. The old remotefilelog blob store used namehash+node keys, which is
different from the new store API's name+node keys, so the getfiles()
implementation here has to perform a reverse namehash->name lookup so it can
satisfy the store API contract.

In the remotefilelog basestore implementation, it reads the file names from the
local data directory and the shared cache directory, and reverse resolves the
file name hashes into filenames to produce the list.

Test Plan: ran the tests

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205321

Signature: t1:3205321:1461751437:a7c44c2bbe153122a3b85b8d82907a112cf77b1a
2016-04-27 16:49:06 -07:00
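
The reverse namehash->name lookup mentioned in commit f362c9a3a8 could look something like the sketch below. Several things here are assumptions for illustration: that the name hash is a SHA-1 hex digest of the filename, that a list of candidate filenames is available from somewhere, and the function names themselves. The commit message does not say how the real implementation obtains the candidate names.

```python
import hashlib


def namehash(name):
    # Assumed hashing scheme: SHA-1 hex digest of the UTF-8 filename.
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


def getfiles(stored_keys, candidate_names):
    """Translate (namehash, node) keys into (name, node) pairs.

    stored_keys: iterable of (namehash, node) found in the old blob store.
    candidate_names: filenames that might appear in the store (hypothetical
    input). Keys whose hash matches no candidate are skipped.
    """
    reverse = {namehash(name): name for name in candidate_names}
    result = []
    for hashed, node in stored_keys:
        name = reverse.get(hashed)
        if name is not None:
            result.append((name, node))
    return result
```
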
Durham Goode
438db1be81 store: allow union metadatastore to combine ancestors from many stores
Summary:
The old store api required that each store be able to return the complete
ancestor history for a given name/node pair. This patch allows a store to return
only the parts of history it knows about, and the union store will combine that
history with the history from other stores to produce the full result. This is
useful for stores like bundle files, where they contain only a partial history
that needs to be annotated by the real store.

Test Plan: ran the tests

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205319

Signature: t1:3205319:1461751511:210740b82cc6767b2f0c393715ac93d8f1b96bc7
2016-04-27 16:49:04 -07:00
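
A rough sketch of how a union store might stitch together partial ancestor histories, as commit 438db1be81 describes. The class names, the `getancestors` signature, and the `(p1, p2, linknode)` tuple layout are assumptions for this example rather than the actual store API.

```python
NULLID = b"\0" * 20


class DictMetadataStore:
    """Toy store that knows only part of a file's history."""

    def __init__(self, ancestors):
        self._ancestors = ancestors  # {(name, node): (p1, p2, linknode)}

    def getancestors(self, name, node):
        # Return every ancestor reachable without leaving this store.
        result = {}
        pending = [node]
        while pending:
            n = pending.pop()
            if n == NULLID or n in result:
                continue
            entry = self._ancestors.get((name, n))
            if entry is None:
                continue  # unknown here; another store may know it
            result[n] = entry
            pending.extend(entry[:2])
        if not result:
            raise KeyError((name, node))
        return result


class UnionMetadataStore:
    def __init__(self, stores):
        self.stores = list(stores)

    def getancestors(self, name, node):
        combined = {}
        pending = [node]
        while pending:
            n = pending.pop()
            if n == NULLID or n in combined:
                continue
            for store in self.stores:
                try:
                    partial = store.getancestors(name, n)
                except KeyError:
                    continue
                combined.update(partial)
                break
            else:
                raise KeyError((name, n))
            # Queue parents that the answering store could not resolve.
            for p1, p2, _linknode in partial.values():
                pending.extend(p for p in (p1, p2) if p not in combined)
        return combined
```

With a bundle-like store holding only the newest revision and a second store holding the rest, the union returns the complete ancestor map even though neither store can produce it alone.
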
Durham Goode
cce75d4663 store: add concept of delta chain to content store
Summary:
The old store contracts required that every store be able to produce the full
text for a revision. This patch modifies the contract so that a store (like a
bundle file store) can serve a delta chain and the union store can combine delta
chains from multiple stores together to create the final full text.

Test Plan: ran the tests

Reviewers: #sourcecontrol, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.fb.com/D3205315

Signature: t1:3205315:1461669845:3eb8968566285f6221c7c44435b855cc65da33f4
2016-04-26 15:10:38 -07:00
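
The delta chain idea from commit cce75d4663 can be illustrated with the sketch below, where one store serves a delta and another serves the full text it is based on. The delta encoding (a list of `(start, end, replacement)` edits), the class names, and the `getdeltachain` signature are all invented for this example; the real stores use Mercurial's own delta format.

```python
NULLID = b"\0" * 20


def apply_delta(base, delta):
    """Apply a toy delta: sorted, non-overlapping (start, end, data) edits."""
    out = []
    pos = 0
    for start, end, data in delta:
        out.append(base[pos:start])
        out.append(data)
        pos = end
    out.append(base[pos:])
    return b"".join(out)


class DictContentStore:
    def __init__(self, deltas):
        self._deltas = deltas  # {(name, node): (deltabase, delta)}

    def getdeltachain(self, name, node):
        # Follow delta bases for as long as this store knows them.
        chain = []
        while node != NULLID and (name, node) in self._deltas:
            base, delta = self._deltas[(name, node)]
            chain.append((node, base, delta))
            node = base
        if not chain:
            raise KeyError((name, node))
        return chain


class UnionContentStore:
    def __init__(self, stores):
        self.stores = list(stores)

    def get(self, name, node):
        chain = []
        current = node
        while current != NULLID:
            for store in self.stores:
                try:
                    part = store.getdeltachain(name, current)
                except KeyError:
                    continue
                chain.extend(part)
                current = part[-1][1]  # continue from the last delta base
                break
            else:
                raise KeyError((name, current))
        # A full text is just a delta against the empty base (nullid).
        text = b""
        for _node, _base, delta in reversed(chain):
            text = apply_delta(text, delta)
        return text
```
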
Durham Goode
7e1047d11f store: change union stores to accept a list of stores
Summary:
Instead of hard coding the list of stores in each union store, let's make it a
list and just test each store in order. This will allow easily adding new stores
and reordering the priority of the existing ones.

Also fix the remote store's contains function. 'contains' is the old name, and
it now needs to be getmissing in order to fit the store contract.

Test Plan: ran the tests

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Differential Revision: https://phabricator.fb.com/D3205314

Signature: t1:3205314:1461606028:3a513ac82c5de668a7e40bbf7cc88d8754e2f0bb
2016-04-26 15:10:38 -07:00
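
As a minimal sketch of the "test each store in order" behaviour from commit 7e1047d11f, assuming a `getmissing(keys)` method that returns the keys a store does not have (the class names and key shape are hypothetical):

```python
class DictStore:
    """Toy store backed by a set of (name, node) keys."""

    def __init__(self, keys):
        self._keys = set(keys)

    def getmissing(self, keys):
        return [key for key in keys if key not in self._keys]


class UnionStore:
    def __init__(self, stores):
        # Order matters: earlier stores are consulted first.
        self.stores = list(stores)

    def getmissing(self, keys):
        missing = list(keys)
        for store in self.stores:
            if not missing:
                break
            # Each store only needs to check what is still unaccounted for.
            missing = store.getmissing(missing)
        return missing
```

Adding a store or changing its priority is then just a matter of editing the list passed to the constructor.
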
Durham Goode
9cfbf5a59e store: keep track of the writable store instead of hard coding it
Summary:
A future patch is going to change the union store to just contain an ordered
list of stores. Therefore we need a special spot to record which store is the
one that should receive writes.

Test Plan: ran the tests

Reviewers: #sourcecontrol

Differential Revision: https://phabricator.fb.com/D3205307
2016-04-26 15:10:38 -07:00
Durham Goode
c3c047f0b7 Move sortnodes into shallowutil
Summary:
This is a generic topological sort and will be useful in the upcoming repacking
code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3204124

Signature: t1:3204124:1461260520:e1cb5c9d496f11e5f44e0cdbc5ba851b1573d2e1
2016-04-26 15:10:38 -07:00
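
For readers unfamiliar with the term, a generic topological sort of the kind commit c3c047f0b7 refers to might look like the sketch below. The signature `sortnodes(nodes, parentfunc)` and the tie-breaking by sorted order are assumptions for this example and may not match the function in shallowutil.

```python
from collections import deque


def sortnodes(nodes, parentfunc):
    """Return nodes ordered so that every parent precedes its children.

    parentfunc(node) returns the node's parents; parents that are not in
    the input set are ignored.
    """
    nodes = set(nodes)
    children = {}
    pending_parents = {}
    for node in nodes:
        parents = {p for p in parentfunc(node) if p in nodes}
        pending_parents[node] = len(parents)
        for parent in parents:
            children.setdefault(parent, []).append(node)

    # Start from nodes with no parents inside the set; sort for determinism.
    queue = deque(sorted(n for n in nodes if pending_parents[n] == 0))
    result = []
    while queue:
        node = queue.popleft()
        result.append(node)
        for child in sorted(children.get(node, ())):
            pending_parents[child] -= 1
            if pending_parents[child] == 0:
                queue.append(child)
    return result
```
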
Durham Goode
9440072474 checkcode: fix tests
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Subscribers: ps

Differential Revision: https://phabricator.fb.com/D3221377
2016-04-26 13:00:31 -07:00
Durham Goode
84bc49f25d checkcode: fix shallowrepo, shallowutil, and setup.py
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221375

Signature: t1:3221375:1461648312:7dbdd59e6370cb32b90d864a623d8066028741e7
2016-04-26 13:00:31 -07:00
Durham Goode
3817826242 checkcode: fix remotefilelogserver and shallowbundle
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221373

Signature: t1:3221373:1461648284:23203c17f4a87e33ff4e9be17a8b99bddbcdff05
2016-04-26 13:00:31 -07:00
Durham Goode
39d350996f checkcode: fix remotefilectx and remotefilelog
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221371

Signature: t1:3221371:1461648217:e9702d761ab8fd6f85dee60a4c192cf25e784f11
2016-04-26 13:00:31 -07:00
Durham Goode
859510b65e checkcode: fix fileserverclient.py
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221369

Signature: t1:3221369:1461648197:185cbbba61a9d1a7a1beacd64153185d0d0826ed
2016-04-26 13:00:31 -07:00
Durham Goode
71bd8c2561 checkcode: fix errors in cacheclient and debugcommands
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221366

Signature: t1:3221366:1461648117:088f3a5837393499e1a383af860bd1a935e0cba7
2016-04-26 13:00:31 -07:00
Durham Goode
495a853d78 checkcode: fix __init__.py
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221365

Signature: t1:3221365:1461646159:efeb0478c66cbd49d4a0a6c02a79d530b42f8248
2016-04-26 13:00:31 -07:00
Jun Wu
ead8969797 Fix missing errno import
Summary: Apparently we need to `import errno` in `shallowutil.py`

Test Plan: Code Review

Reviewers: #sourcecontrol, ttung, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D3195117

Signature: t1:3195117:1461031210:424912a96448a2a8cb37197f006cfa95d4ab1cb1
2016-04-18 19:04:58 -07:00
Durham Goode
2d1dcb4b97 Fix missing 'grp' import
2016-04-18 11:46:06 -07:00
Durham Goode
5b2914142a Fix status returning invalid results
The recent refactor caused remotefilelog.size() to include rename metadata in
the size count, which meant the size didn't match what the rest of Mercurial
expected. This caused clean files to show up as dirty in hg status if they had a
'lookup' dirstate state and were renames.
2016-04-10 09:46:24 -07:00
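
The size mismatch fixed in commit 5b2914142a can be illustrated with the sketch below. It assumes rename metadata is stored as a header framed by `\1\n` markers at the start of the revision text, as Mercurial filelogs do. The helper names are hypothetical and this is not the remotefilelog code.

```python
META_MARKER = b"\x01\n"


def splitmeta(text):
    """Split stored text into (metadata, content)."""
    if not text.startswith(META_MARKER):
        return b"", text
    end = text.index(META_MARKER, len(META_MARKER))
    return text[len(META_MARKER):end], text[end + len(META_MARKER):]


def contentsize(text):
    # The size the rest of Mercurial expects excludes the rename header.
    return len(splitmeta(text)[1])
```

For a renamed file, `len(text)` is larger than `contentsize(text)`, which is the kind of difference that made clean files look dirty.
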
Durham Goode
2e93ca187a Add byte count checking when receiving from the server
Summary:
We've received a few complaints that receivemissing is throwing corrupt data
exceptions. My best guess is that we're not receiving all of the data for some
reason. Let's add an assertion to ensure all the data is present, so we can
narrow it down to a connection issue instead of actual corrupt data.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D3136203
2016-04-05 09:50:12 -07:00
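
The kind of byte count check commit 2e93ca187a adds can be sketched as follows. The wire format used here (an 8-byte big-endian length followed by the payload) and the function names are assumptions for illustration, not the actual remotefilelog protocol. The point is only that a short read is reported as a truncated transfer instead of surfacing later as corrupt data.

```python
import io
import struct


def readexactly(stream, size):
    """Read size bytes from a file-like stream or raise if it ends early."""
    data = stream.read(size)
    if len(data) != size:
        raise IOError(
            "expected %d bytes but received %d" % (size, len(data))
        )
    return data


def receiveblob(stream):
    # Assumed framing: 8-byte big-endian length, then the payload.
    (size,) = struct.unpack(">Q", readexactly(stream, 8))
    return readexactly(stream, size)
```
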
Durham Goode
24323a759c store: address code review feedback
This was meant to be part of the previous stack of commits, but I pushed the
wrong stack. This patch addresses a number of code review feedback points, the
most visible being to rename 'contains' to something else (in this case
'getmissing').
2016-04-04 16:48:55 -07:00
Durham Goode
8ca8f7f6ca stores: remove fetch logic and replace with a remote store fallthrough
The old way of fetching from the server required the base store api expose a way
for outside callers to add fetch handlers to the store. This exposed some of the
underlying details of how data is fetched in an unnecessary way and added an
awkward subscription api.

Let's just treat our remote caches as another store we can fetch from, and
require that the overarching configure logic (in shallowrepo.py) can connect
all our stores together in a union store.
2016-04-04 16:26:12 -07:00
Durham Goode
ece19111e0 ioutil: rename ioutil to shallowutil
The old name was not very descriptive. There's already a shallowutil, so let's
just use that.
2016-04-04 16:26:12 -07:00
Durham Goode
29ea8ada1e store: delete the localcache class
Now that all functionality has been moved to the new store, we no longer need
the localcache class. So let's delete it.
2016-04-04 16:26:12 -07:00
Durham Goode
ecf4378d18 store: implement gc in the new store
The last major piece of functionality that needs to be moved into the new store
is the gc algorithm. This is just a copy paste of the one that exists in
localcache.
2016-04-04 16:26:12 -07:00
Durham Goode
d70897e18c store: implement markrepo on the new store
Now that most of our storage has been moved behind the new store, let's also
move the ability to mark the repo behind that storage abstraction.
2016-04-04 16:26:12 -07:00
Durham Goode
0dd4247520 store: make remotefilelog.ancestormap use the new store
Now that we have a metadatastore, let's use it to implement
remotefilelog.ancestormap. This gets rid of a bunch of ugly code.
2016-04-04 16:26:12 -07:00
Durham Goode
ad473d5a6b store: make remotefilelog.linknode use the new store
Now that we have the new metadatastore, let's use it to fetch the linknode
instead of parsing the data ourselves.
2016-04-04 16:26:12 -07:00
Durham Goode
82bc4468ed store: make remotefilelog.renamed use the store
Now that we have a metadata store, let's switch remotefilelog.renamed to consult
it, instead of parsing the data itself.
2016-04-04 16:26:12 -07:00
Durham Goode
aba161c424 store: implement metadatastore functions
This implements the metadatastore APIs that were previously just stubs.
2016-04-04 16:26:12 -07:00
Durham Goode
8ad3ce6f41 store: change fileserverclient to write via new store
Now that we have the new store abstraction, and now that remotefilelog.py writes
via it, let's also make fileserverclient write to the store via that API.

This required some refactoring of how receivemissing worked, so we could pass
the filename down, as that is required for writing to the store.
2016-04-04 16:26:12 -07:00
Durham Goode
721f54d0df store: move remotefilelog content writing to be done via basestore
Now that we have the new store abstraction, let's route writes through it as
well.
2016-04-04 16:26:12 -07:00
Durham Goode
ffb239bdcb store: switch remotefilelog.size to use new store
Now that we can read data via the new store, let's switch remotefilelog to use
that instead of talking to the filesystem directly.
2016-04-04 16:26:12 -07:00
Durham Goode
69aff18063 store: switch remotefilelog.read to use self.revision
Now that remotefilelog.revision is implemented using the new contentstore, let's
switch remotefilelog.read to use that instead. This logic is almost identical to
what's in filelog.read
2016-04-04 16:26:12 -07:00
Durham Goode
50df2e518f store: switch remotefilelog.revision to use new store
Now that the new contentstore has get(), let's switch remotefilelog.revision to
use it instead.
2016-04-04 16:26:12 -07:00
Durham Goode
cb48cd034a store: add store data validation
The old store logic has validation that checks whether the data it's reading is
corrupt. Let's copy and paste that over to the new store.
2016-04-04 16:26:12 -07:00
Durham Goode
dfb49ad597 store: implement contentstore.get
This implements the basic function for fetching content data from the
remotefilelog store.
2016-04-04 16:26:12 -07:00