Summary:
Implementing these two functions will allow datapacks to be repacked (either
into other formats, or by combining multiple packs into one).
A future patch will add a test.
Test Plan: Added a test in a future patch
Reviewers: lcharignon, ttung, rmcelroy, mitrandir, quark
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3251539
Signature: t1:3251539:1462393256:7caa09677fbcaaf57a47d7a833684883483c5b3a
Summary:
Previously, given a historypack file, we had no way of reading the contents,
since we had no way to know when to stop reading the revision entries for a
given file section.
This patch changes the format to have a revision count value after the filename
and before the revisions. The documentation already documented the format like
this, and therefore doesn't need updating.
A future patch will use this information to iterate over all the revisions in
the pack.
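
A minimal sketch of why the count makes a file section readable. The layout
here is an assumption for illustration only (2-byte filename length, filename,
4-byte revision count, then fixed-size entries of four 20-byte hashes); the
real layout is the one documented in the code, and write_section and
read_sections are hypothetical names.

```python
import struct

# Assumed entry layout: node, p1, p2, linknode (20 bytes each).
ENTRY = struct.Struct("!20s20s20s20s")


def write_section(filename, entries):
    """Serialize one file section: name, revision count, then the entries."""
    name = filename.encode("utf-8")
    out = [struct.pack("!H", len(name)), name, struct.pack("!I", len(entries))]
    out.extend(ENTRY.pack(*entry) for entry in entries)
    return b"".join(out)


def read_sections(data):
    """Yield (filename, entries) pairs; the count says where a section ends."""
    offset = 0
    while offset < len(data):
        (namelen,) = struct.unpack_from("!H", data, offset)
        offset += 2
        name = data[offset:offset + namelen].decode("utf-8")
        offset += namelen
        (count,) = struct.unpack_from("!I", data, offset)
        offset += 4
        entries = []
        for _ in range(count):
            entries.append(ENTRY.unpack_from(data, offset))
            offset += ENTRY.size
        yield name, entries
```

Without the count, the reader above would have no way to tell where the
entries for one filename stop and the next filename header begins.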
Test Plan: Added a test in a future patch
Reviewers: lcharignon, ttung, rmcelroy, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3251538
Signature: t1:3251538:1462393282:f46b50e79237bfa8a25ff1957344588622b2699a
Summary:
In a later patch we will need to add the count of revisions in a given file
section to the on-disk format. To make that easier, let's make the file section
serialization lazy, so that we will have the full list when it comes time to
count the entries.
Test Plan: added a test in a future patch
Reviewers: lcharignon, ttung, rmcelroy, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3251537
Signature: t1:3251537:1462393274:60b72a47de45f5a94f4f5a8d34b3942db0aa3fda
Summary:
This adds a simple test for repack that ensures the old blobs are cleaned up and
the new data is still accessible.
Test Plan: Ran the tests
Reviewers: lcharignon, ttung, rmcelroy, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3250673
Signature: t1:3250673:1462235487:bcb5015b9a3665c5bb54776de6d110b1b8a64078
Summary:
Previously, hg repack would repack all the objects in all the stores and dump the
new packs in .hg/store/packs. Initially we only want to repack the shared cache
though, so let's change repack to only operate on shared stores, and to write
out the new packs to the hgcache, under the appropriate repo name.
In a future patch I'm going to go through all this store stuff and replace all
uses of os.path and direct file reads/writes with a mercurial vfs.
Test Plan:
Ran repack in a large repo and verified packs were produced in
$HGCACHE/$REPONAME/packs
Ran hg cat on a file to verify that it read the data from the pack and did not do any remotefilelog network calls.
Reviewers: lcharignon, rmcelroy, ttung, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3250213
Signature: t1:3250213:1462315927:694661795141e2c869ba661a54cea8f4b90823df
Summary:
Previously, if a repack failed, it would leave temporary pack files lying
around. By adding enter/exit functions to mutable packs, we can guarantee
cleanup happens.
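
A minimal sketch of the enter/exit pattern, assuming a hypothetical
MutablePack class that writes to a temporary file and renames it on success.
The class name, the .tmp/.pack suffixes, and the add/close/abort methods are
illustrative assumptions, not the actual mutable pack API.

```python
import os
import tempfile


class MutablePack(object):
    """Hypothetical stand-in for a mutable pack writer (not the real class)."""

    def __init__(self, packdir):
        fd, self._tmppath = tempfile.mkstemp(suffix=".tmp", dir=packdir)
        self._fp = os.fdopen(fd, "wb")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Finalize on success, delete the temporary file on failure.
        if exc_type is None:
            self.close()
        else:
            self.abort()
        return False  # never swallow the exception

    def add(self, data):
        self._fp.write(data)

    def close(self):
        self._fp.close()
        os.rename(self._tmppath, self._tmppath[:-len(".tmp")] + ".pack")

    def abort(self):
        self._fp.close()
        os.unlink(self._tmppath)
```

Used as `with MutablePack(packdir) as pack: ...`, any exception raised inside
the block removes the temporary file before the exception propagates.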
Test Plan: Ran repack, verified that a failure did not leave tmp files
Reviewers: rmcelroy, quark, ttung, lcharignon, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3250201
Signature: t1:3250201:1462234552:7f20260a193ed1dd858bf6e9f489ac902d859218
Summary:
Now that all the repack logic is in place, let's switch the repack
command to use the new version. This also means the repack command will now
clean up the old remotefilelog blobs once it's finished.
Test Plan:
Ran hg repack in a large repo. Verified it deleted the old
remotefilelog blobs, and verified that I could still update around the
repository without making any remotefilelog network requests.
A future diff will add standard .t mercurial tests for the repack command.
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249601
Signature: t1:3249601:1462235506:03c0d95f6a82cfc04b340b139f39c02853941a17
Summary:
We had a naive repack implementation in historypack.py. Let's move it to the
repack module and make the minor adjustments to use the new repackerledger apis.
Test Plan:
Ran hg repack in conjunction with future diffs that make use of this
api
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249587
Signature: t1:3249587:1462232544:591cd8bec09f781370896470746eae5a4489531f
Summary:
We had a naive repack implementation in datapack.py. Let's move it to the repack
module and make the minor adjustments to use the new repackerledger apis.
Test Plan: Ran it in conjunction with future diffs that make use of this api.
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249585
Signature: t1:3249585:1462232504:a00aa65afca9562a2c1456cc4ab48c50d1ba5b68
Summary:
This implements the new markledger and cleanup apis on the existing
remotefilelog stores. These apis are used to tell the repacker what each store
has, and allow each store to clean up if its data has been repacked.
Test Plan:
Ran repack in conjunction with the future diffs that make use of
these apis.
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249584
Signature: t1:3249584:1462226133:1e8faffc9f6bf8f7c94e6e79aee8865e3c41648c
Summary:
This introduces the high level classes that will implement the generic repack
logic.
Test Plan: Ran the repack in conjunction with later commits that use these apis.
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249577
Signature: t1:3249577:1462225435:000f9cc29ae2a3d7fdbedf546c8936ef45d1e4cf
Summary:
Adds an optional test for performance profiling. Using this test I was
able to see the difference between bisect, interpolation search, and the fanout
table on the index lookup times, and determined the fanout table was about 10x
faster on packs with 1 million objects.
It also taught me that the 2^16 fanout table is very inefficient for small packs
(cpu wise), so we should allow it to be configurable.
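
A rough sketch of a fanout-table lookup with a configurable prefix width,
assuming the index is a sorted list of 20-byte nodes held in memory. The
function names and the in-memory representation are assumptions for
illustration; the on-disk index is not reproduced here.

```python
import bisect


def prefix(node, prefixbytes):
    """Interpret the first prefixbytes of a node as a big-endian integer."""
    return int.from_bytes(node[:prefixbytes], "big")


def build_fanout(sortednodes, prefixbytes=2):
    """Return a table where fanout[p] is the first index with prefix >= p."""
    size = 256 ** prefixbytes
    fanout = [0] * (size + 1)
    for node in sortednodes:
        fanout[prefix(node, prefixbytes) + 1] += 1
    # Turn per-prefix counts into cumulative start offsets.
    for i in range(size):
        fanout[i + 1] += fanout[i]
    return fanout


def lookup(sortednodes, fanout, node, prefixbytes=2):
    """Return the index of node, or None, searching only its fanout bucket."""
    p = prefix(node, prefixbytes)
    lo, hi = fanout[p], fanout[p + 1]
    i = bisect.bisect_left(sortednodes, node, lo, hi)
    if i < hi and sortednodes[i] == node:
        return i
    return None
```

With prefixbytes=2 the table has 2^16 slots regardless of pack size, which is
the fixed cost that hurts small packs; prefixbytes=1 shrinks it to 2^8 slots
at the price of larger buckets to bisect.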
Test Plan: Ran it
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249574
Signature: t1:3249574:1462240719:b8895f79895286613261cd2d4314c214688558eb
Summary:
Using range() allocates a full list, which is 2**16 entries in the fanout case.
Let's use xrange instead. This is a notable performance win when checking many
keys.
Also removed an unused variable and switched to using index instead of
self._index, since this is a hot path.
Test Plan: Ran hg repack
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249563
Signature: t1:3249563:1462240834:c19d6cbf0b6237f15ca8d81e8da856752df0ec59
Summary:
This adds a basic test suite for the historypack class, and fixes some issues it
found.
Test Plan: ./run-tests.py test-historypack.py
Reviewers: mitrandir, rmcelroy, ttung, lcharignon
Reviewed By: lcharignon
Differential Revision: https://phabricator.intern.facebook.com/D3237858
Signature: t1:3237858:1461884966:c0ec90a2735255e5ef70eade09915066a7b71ee5
Summary: This adds some basic unit tests for creating and reading from datapack files.
Test Plan: ./run-tests.py test-datapack.py
Reviewers: mitrandir, rmcelroy, lcharignon
Differential Revision: https://phabricator.intern.facebook.com/D3233181
Summary: Adds the same check code test that upstream Mercurial uses.
Test Plan:
Ran it, and fixed all the failures. I won't land this commit until
all the failure fixes are landed.
Reviewers: #sourcecontrol, ttung, rmcelroy, wez
Reviewed By: wez
Subscribers: quark, rmcelroy, wez
Differential Revision: https://phabricator.intern.facebook.com/D3221380
Signature: t1:3221380:1461802769:19f5bdc209c05edb442faa70ae572ce31e2fbc95
Summary: Fix check code for various store related files
Test Plan: Ran the tests
Reviewers: #sourcecontrol, mitrandir, ttung
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3222465
Signature: t1:3222465:1461701300:34560288be4dc921f0252d4ad8fdc9c8d9357e23
Summary: These were missing, and only needed in exception cases.
Test Plan: nope
Reviewers: #sourcecontrol, rmcelroy, ttung
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3219749
Signature: t1:3219749:1461608742:91e3a721e78188c52431b6c5d1b3ad091e249c3a
Summary:
Now that we can read and write histpack files, let's add a store implementation that
can serve packed content.
My next set of commits (which haven't been written yet) will:
- add tests for all of this
Test Plan:
Ran the tests. Also repacked a repo, deleted the old cache files,
ran hg log FILE, and verified it produced results without hitting the network.
Reviewers: #sourcecontrol, ttung, mitrandir, rmcelroy
Reviewed By: mitrandir, rmcelroy
Subscribers: rmcelroy, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3219765
Signature: t1:3219765:1461717992:9b2e8646c0555472fa00ee7059c0f283fd4c2c65
Summary:
The previous patch added logic to repack store history and write it to
a histpack file. This patch adds a pack reader implementation that knows how to
read histpacks.
Test Plan:
Ran the tests. Also tested this in conjunction with the next patch
which actually reads from the data structure.
Reviewers: #sourcecontrol, ttung, mitrandir, rmcelroy
Reviewed By: mitrandir, rmcelroy
Subscribers: rmcelroy, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3219764
Signature: t1:3219764:1461718081:9d812b6aea87fe9eb48fdac9dbef282e4775c3c9
Summary:
This is an initial implementation of a history pack file creator and a repacker
class that can produce it. A history pack is a pack file that contains no file
content, just history information (parents and linknodes).
A histpack is two files:
- a .histpack file consisting of a series of file sections, each of which
contains a series of revision entries (node, p1, p2, linknode)
- a .histidx file containing a filename based index to the various file sections
in the histpack.
See the code for documentation of the exact format.
Test Plan:
ran the tests. A future diff will add unit tests for all the new pack
structures.
Ran `hg repack` on a large repo. Verified pack files were produced in
.hg/store/packs. In a future diff, I verified that the data could be read
correctly.
Reviewers: #sourcecontrol, mitrandir, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: mitrandir, rmcelroy, mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D3219762
Signature: t1:3219762:1461751982:e7bbc65e8f01c812fc1eb566d2d48208b0913766
Summary:
This forces the revisions in the datapack to be added in alphabetical order.
This makes the algorithm more deterministic, but otherwise has little effect.
Test Plan: Ran the tests, ran repack
Reviewers: #sourcecontrol, rmcelroy, ttung
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3219760
Signature: t1:3219760:1461687720:7be5fdc1419f8214c8c83074494b33214b3684ae
Summary:
Now that we can read and write datapack files, let's add a store implementation that
can serve packed content. With this patch, it's technically possible for someone
to prefetch and repack large portions of history for long term storage with
remotefilelog.
My next set of commits (which haven't been written yet) will:
- add tests for all of this
- add an indexpack format for packing ancestor metadata (the datapack only packs
revision content)
Test Plan:
Ran the tests. Also repacked a repo, deleted the old cache files, ran
hg up null && hg up master, and verified it checked out master with the
right files and without fetching blobs from the server.
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205351
Signature: t1:3205351:1461751649:45a56b57d962a282aeef9478500a3b23495a0eb7
Summary:
The previous patch added logic to repack store contents and write it to a
datapack file. This patch adds a new store implementation that knows how to read
datapacks.
It's just a simple implementation without any parallelism, so there's room for
improvement.
Test Plan:
Ran the tests. Also tested this in conjunction with the next patch
which actually reads from the data structure.
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205342
Signature: t1:3205342:1461750967:84377517cb1f285d37694a3f503d60ae85bacb66
Summary:
This is an initial implementation of a repack algorithm that can read data from
an arbitrary store (in this case the remotefilelog content store), and repack it
into a datapack.
A datapack is two files:
- a .datapack file consisting of a series of deltas (a delta may be a full text if the delta base is the nullid)
- a .dataidx file consisting of delta information and an index into the deltas
See the code for documentation of the exact format.
Test Plan:
ran the tests
Ran `hg repack` in a large repo. Verified that a datapack and a dataidx file
were created in .hg/store/packs. The datapack used 148MB instead of the 439MB the
old remotefilelog storage used.
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205334
Signature: t1:3205334:1461751366:ee4bf6a580ffb667071a8046fda6f0858b7f25ae
Summary:
This adds an api to the store contract that allows the store to return a list of
the name/node pairs that it contains. This will be used to allow a repack
algorithm to list the contents of the store so it can repack it into another
store. The old remotefilelog blob store used namehash+node keys, which is
different from the new store API's name+node keys, so the getfiles()
implementation here has to perform a reverse namehash->name lookup so it can
satisfy the store API contract.
In the remotefilelog basestore implementation, it reads the file names from the
local data directory and the shared cache directory, and reverse resolves the
file name hashes into filenames to produce the list.
Test Plan: ran the tests
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205321
Signature: t1:3205321:1461751437:a7c44c2bbe153122a3b85b8d82907a112cf77b1a
Summary:
The old store api required that each store be able to return the complete
ancestor history for a given name/node pair. This patch allows a store to return
only the parts of history it knows about, and the union store will combine that
history with the history from other stores to produce the full result. This is
useful for stores like bundle files, where they contain only a partial history
that needs to be annotated by the real store.
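
A minimal sketch of how a union store might stitch partial histories
together. The class names, the getancestors signature, and the
(p1, p2, linknode) entry shape are assumptions for illustration, not the
actual store contract.

```python
NULLID = b"\0" * 20


class DictHistoryStore(object):
    """Hypothetical store holding {(name, node): (p1, p2, linknode)}."""

    def __init__(self, entries):
        self._entries = entries

    def getancestors(self, name, node):
        """Return the ancestors this store knows about; KeyError if none."""
        if (name, node) not in self._entries:
            raise KeyError((name, node))
        result = {}
        queue = [node]
        while queue:
            n = queue.pop()
            if n in result or (name, n) not in self._entries:
                continue  # already seen, or history continues elsewhere
            entry = self._entries[(name, n)]
            result[n] = entry
            queue.extend(p for p in entry[:2] if p != NULLID)
        return result


class UnionHistoryStore(object):
    def __init__(self, stores):
        self.stores = stores

    def getancestors(self, name, node):
        """Combine partial results until every parent is accounted for."""
        ancestors = {}
        missing = [node]
        while missing:
            n = missing.pop()
            if n in ancestors:
                continue
            partial = self._getpartial(name, n)
            ancestors.update(partial)
            for p1, p2, _linknode in partial.values():
                for p in (p1, p2):
                    if p != NULLID and p not in ancestors:
                        missing.append(p)
        return ancestors

    def _getpartial(self, name, node):
        for store in self.stores:
            try:
                return store.getancestors(name, node)
            except KeyError:
                continue
        raise KeyError((name, node))
```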
Test Plan: ran the tests
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205319
Signature: t1:3205319:1461751511:210740b82cc6767b2f0c393715ac93d8f1b96bc7
Summary:
The old store contracts required that every store be able to produce the full
text for a revision. This patch modifies the contract so that a store (like a
bundle file store) can serve a delta chain and the union store can combine delta
chains from multiple stores together to create the final full text.
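
A rough sketch of combining delta chains across stores. The delta format
here is a toy (start, end, replacement) triple rather than Mercurial's real
binary delta format, and the class and method names are assumptions for
illustration only.

```python
NULLID = b"\0" * 20


def applydelta(base, delta):
    """Toy delta: replace base[start:end] with replacement."""
    start, end, replacement = delta
    return base[:start] + replacement + base[end:]


class DictDeltaStore(object):
    """Hypothetical store holding {node: (deltabase, delta)}."""

    def __init__(self, entries):
        self._entries = entries

    def getdeltachain(self, node):
        """Return as much of the chain as this store has, newest first."""
        if node not in self._entries:
            raise KeyError(node)
        chain = []
        while node != NULLID and node in self._entries:
            base, delta = self._entries[node]
            chain.append((node, base, delta))
            node = base
        return chain


class UnionContentStore(object):
    def __init__(self, stores):
        self.stores = stores

    def get(self, node):
        """Follow delta bases across stores, then apply oldest to newest."""
        chain = []
        nextnode = node
        while nextnode != NULLID:
            part = self._getpartialchain(nextnode)
            chain.extend(part)
            nextnode = part[-1][1]  # delta base of the oldest link so far
        text = b""
        for _node, _base, delta in reversed(chain):
            text = applydelta(text, delta)
        return text

    def _getpartialchain(self, node):
        for store in self.stores:
            try:
                return store.getdeltachain(node)
            except KeyError:
                continue
        raise KeyError(node)
```

In this sketch a full text is simply a delta against the nullid that inserts
the whole content into an empty base.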
Test Plan: ran the tests
Reviewers: #sourcecontrol, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.fb.com/D3205315
Signature: t1:3205315:1461669845:3eb8968566285f6221c7c44435b855cc65da33f4
Summary:
Instead of hard coding the list of stores in each union store, let's make it a
list and just test each store in order. This will allow easily adding new stores
and reordering the priority of the existing ones.
Also fix the remote store's contains function. 'contains' is the old name, and
it now needs to be getmissing in order to fit the store contract.
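
A minimal sketch of a union store backed by an ordered list, assuming each
store exposes get(name, node) and getmissing(keys). DictStore and the exact
signatures are hypothetical, used only to show the ordering behaviour.

```python
class DictStore(object):
    """Hypothetical store holding {(name, node): text}."""

    def __init__(self, data):
        self._data = data

    def get(self, name, node):
        return self._data[(name, node)]

    def getmissing(self, keys):
        """Return the subset of keys this store does not contain."""
        return [key for key in keys if key not in self._data]


class UnionStore(object):
    def __init__(self, *stores):
        # Order is priority: earlier stores are consulted first.
        self.stores = list(stores)

    def get(self, name, node):
        for store in self.stores:
            try:
                return store.get(name, node)
            except KeyError:
                continue
        raise KeyError((name, node))

    def getmissing(self, keys):
        """Narrow the missing set by asking each store in turn."""
        missing = list(keys)
        for store in self.stores:
            if not missing:
                break
            missing = store.getmissing(missing)
        return missing
```

Adding a new store or changing priority then only means editing the list
passed to UnionStore.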
Test Plan: ran the tests
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Differential Revision: https://phabricator.fb.com/D3205314
Signature: t1:3205314:1461606028:3a513ac82c5de668a7e40bbf7cc88d8754e2f0bb
Summary:
A future patch is going to change the union store to just contain an ordered
list of stores. Therefore we need a special spot to record which store is the
one that should receive writes.
Test Plan: ran the tests
Reviewers: #sourcecontrol
Differential Revision: https://phabricator.fb.com/D3205307
Summary:
This is a generic topological sort and will be useful in the upcoming repacking
code.
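
A minimal sketch of a generic topological sort, assuming the caller supplies
a parentfunc that returns a node's parents and that the graph is acyclic.
The name sortnodes and this particular iterative approach are assumptions
for illustration, not necessarily what the patch implements.

```python
def sortnodes(nodes, parentfunc):
    """Return nodes ordered so that every parent precedes its children."""
    nodes = list(nodes)
    nodeset = set(nodes)
    results = []
    visited = set()
    for root in nodes:
        # Iterative depth-first walk; a node is emitted after its parents.
        stack = [(root, False)]
        while stack:
            node, expanded = stack.pop()
            if expanded:
                results.append(node)
                continue
            if node in visited:
                continue
            visited.add(node)
            stack.append((node, True))
            for parent in parentfunc(node):
                # Parents outside the input set are ignored.
                if parent in nodeset and parent not in visited:
                    stack.append((parent, False))
    return results
```

Emitting parents first is what a repacker would want if deltas must be
written after the revisions they are based on.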
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3204124
Signature: t1:3204124:1461260520:e1cb5c9d496f11e5f44e0cdbc5ba851b1573d2e1
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Subscribers: ps
Differential Revision: https://phabricator.fb.com/D3221377
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221375
Signature: t1:3221375:1461648312:7dbdd59e6370cb32b90d864a623d8066028741e7
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221373
Signature: t1:3221373:1461648284:23203c17f4a87e33ff4e9be17a8b99bddbcdff05
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221371
Signature: t1:3221371:1461648217:e9702d761ab8fd6f85dee60a4c192cf25e784f11
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221369
Signature: t1:3221369:1461648197:185cbbba61a9d1a7a1beacd64153185d0d0826ed
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221366
Signature: t1:3221366:1461648117:088f3a5837393499e1a383af860bd1a935e0cba7
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221365
Signature: t1:3221365:1461646159:efeb0478c66cbd49d4a0a6c02a79d530b42f8248
Summary: Apparently we need to `import errno` in `shallowutil.py`
Test Plan: Code Review
Reviewers: #sourcecontrol, ttung, durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D3195117
Signature: t1:3195117:1461031210:424912a96448a2a8cb37197f006cfa95d4ab1cb1
The recent refactor caused remotefilelog.size() to include rename metadata in
the size count, which meant the size didn't match what the rest of Mercurial
expected. This caused clean files to show up as dirty in hg status if they had a
'lookup' dirstate state and were renames.
Summary:
We've received a few complaints that receivemissing is throwing corrupt data
exceptions. My best guess is that we're not receiving all of the data for some
reason. Let's add an assertion to ensure all the data is present, so we can
narrow it down to a connection issue instead of actual corrupt data.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D3136203
This was meant to be part of the previous stack of commits, but I pushed the
wrong stack. This patch addresses a number of code review feedback points, the
most visible being to rename 'contains' to something else (in this case
'getmissing').
The old way of fetching from the server required that the base store api expose a way
for outside callers to add fetch handlers to the store. This exposed some of the
underlying details of how data is fetched in an unnecessary way and added an
awkward subscription api.
Let's just treat our remote caches as another store we can fetch from, and
require that the overarching configure logic (in shallowrepo.py) can connect
all our stores together in a union store.
The last major piece of functionality that needs to be moved into the new store
is the gc algorithm. This is just a copy paste of the one that exists in
localcache.