Summary:
Now that we can read and write histpack files, let's add a store implementation that
can serve packed content.
My next set of commits (which haven't been written yet) will:
- add tests for all of this
Test Plan:
Ran the tests. Also repacked a repo, deleted the old cache files,
ran hg log FILE, and verified it produced results without hitting the network.
Reviewers: #sourcecontrol, ttung, mitrandir, rmcelroy
Reviewed By: mitrandir, rmcelroy
Subscribers: rmcelroy, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3219765
Signature: t1:3219765:1461717992:9b2e8646c0555472fa00ee7059c0f283fd4c2c65
Summary:
The previous patch added logic to repack store history and write it to
a histpack file. This patch adds a pack reader implementation that knows how to
read histpacks.
Test Plan:
Ran the tests. Also tested this in conjunction with the next patch
which actually reads from the data structure.
Reviewers: #sourcecontrol, ttung, mitrandir, rmcelroy
Reviewed By: mitrandir, rmcelroy
Subscribers: rmcelroy, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3219764
Signature: t1:3219764:1461718081:9d812b6aea87fe9eb48fdac9dbef282e4775c3c9
Summary:
This is an initial implementation of a history pack file creator and a repacker
class that can produce it. A history pack is a pack file that contains no file
content, just history information (parents and linknodes).
A histpack is two files:
- a .histpack file consisting of a series of file sections, each of which
contains a series of revision entries (node, p1, p2, linknode)
- a .histidx file containing a filename based index to the various file sections
in the histpack.
See the code for documentation of the exact format.
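As a rough mental model only (this is not the on-disk format, which is documented in the code), the two files can be pictured as a list of per-file sections plus a filename index into that list. The function names `buildhistpack` and `gethistory` below are hypothetical and exist only for this illustration.

```python
def buildhistpack(entries):
    """Group history entries into per-file sections and index them by name.

    `entries` is an iterable of (filename, node, p1, p2, linknode) tuples.
    Returns (sections, index): `sections` stands in for the .histpack file,
    `index` stands in for the .histidx file.
    """
    byfile = {}
    for filename, node, p1, p2, linknode in entries:
        byfile.setdefault(filename, []).append((node, p1, p2, linknode))

    sections = []
    index = {}
    for filename in sorted(byfile):
        # The index records where each file section lives in the pack.
        index[filename] = len(sections)
        sections.append((filename, byfile[filename]))
    return sections, index


def gethistory(sections, index, filename):
    """Look up the revision entries for one file via the index."""
    return sections[index[filename]][1]
```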
Test Plan:
Ran the tests. A future diff will add unit tests for all the new pack
structures.
Ran `hg repack` on a large repo. Verified pack files were produced in
.hg/store/packs. A later diff verifies that the data can be read
correctly.
Reviewers: #sourcecontrol, mitrandir, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: mitrandir, rmcelroy, mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D3219762
Signature: t1:3219762:1461751982:e7bbc65e8f01c812fc1eb566d2d48208b0913766
Summary:
This forces the revisions in the datapack to be added in alphabetical order.
This makes the algorithm more deterministic, but otherwise has little effect.
Test Plan: Ran the tests, ran repack
Reviewers: #sourcecontrol, rmcelroy, ttung
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3219760
Signature: t1:3219760:1461687720:7be5fdc1419f8214c8c83074494b33214b3684ae
Summary:
Now that we can read and write datapack files, let's add a store implementation that
can serve packed content. With this patch, it's technically possible for someone
to prefetch and repack large portions of history for long term storage with
remotefilelog.
My next set of commits (which haven't been written yet) will:
- add tests for all of this
- add an indexpack format for packing ancestor metadata (the datapack only packs
revision content)
Test Plan:
Ran the tests. Also repacked a repo, deleted the old cache files, ran
hg up null && hg up master, and verified it checked out master with the
right files and without fetching blobs from the server.
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205351
Signature: t1:3205351:1461751649:45a56b57d962a282aeef9478500a3b23495a0eb7
Summary:
The previous patch added logic to repack store contents and write it to a
datapack file. This patch adds a new store implementation that knows how to read
datapacks.
This is a simple implementation without any parallelism, so there is room for
improvement.
Test Plan:
Ran the tests. Also tested this in conjunction with the next patch
which actually reads from the data structure.
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205342
Signature: t1:3205342:1461750967:84377517cb1f285d37694a3f503d60ae85bacb66
Summary:
This is an initial implementation of a repack algorithm that can read data from
an arbitrary store (in this case the remotefilelog content store), and repack it
into a datapack.
A datapack is two files:
- a .datapack file consisting of a series of deltas (a delta may be a full text if the delta base is the nullid)
- a .dataidx file consisting of delta information and an index into the deltas
See the code for documentation of the exact format.
Test Plan:
Ran the tests.
Ran `hg repack` in a large repo. Verified that a datapack and a dataidx file
were created in .hg/store/packs. The datapack used 148MB instead of the 439MB the
old remotefilelog storage used.
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205334
Signature: t1:3205334:1461751366:ee4bf6a580ffb667071a8046fda6f0858b7f25ae
Summary:
This adds an API to the store contract that allows the store to return a list of
the name/node pairs that it contains. This will be used to allow a repack
algorithm to list the contents of the store so it can repack it into another
store. The old remotefilelog blob store used namehash+node keys, which is
different from the new store API's name+node keys, so the getfiles()
implementation here has to perform a reverse namehash->name lookup so it can
satisfy the store API contract.
In the remotefilelog basestore implementation, it reads the file names from the
local data directory and the shared cache directory, and reverse resolves the
file name hashes into filenames to produce the list.
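A minimal sketch of the reverse lookup, assuming the namehash is the SHA-1 hex digest of the file name and that a list of candidate file names is available from somewhere. Both are assumptions made for this illustration, and this `getfiles` takes explicit arguments that the real method may not.

```python
import hashlib


def namehash(name):
    """Hash a file name (bytes) the way this sketch assumes the old store did."""
    return hashlib.sha1(name).hexdigest()


def getfiles(storedkeys, candidatenames):
    """Yield (name, node) pairs for a store keyed by (namehash, node).

    Keys whose hash cannot be resolved back to a name are skipped.
    """
    reverse = {namehash(name): name for name in candidatenames}
    for nhash, node in storedkeys:
        name = reverse.get(nhash)
        if name is not None:
            yield name, node
```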
Test Plan: ran the tests
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205321
Signature: t1:3205321:1461751437:a7c44c2bbe153122a3b85b8d82907a112cf77b1a
Summary:
The old store api required that each store be able to return the complete
ancestor history for a given name/node pair. This patch allows a store to return
only the parts of history it knows about, and the union store will combine that
history with the history from other stores to produce the full result. This is
useful for stores like bundle files, where they contain only a partial history
that needs to be annotated by the real store.
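The combining step can be sketched as follows. This is an illustration under simplifying assumptions, not the patch itself: each store is modeled as a plain dict of node -> (p1, p2, linknode), and the names `partialancestors` and `unionancestors` are hypothetical.

```python
NULLID = b"\0" * 20


def partialancestors(entries, node):
    """Return the ancestors of `node` that a single store has entries for."""
    found = {}
    pending = [node]
    while pending:
        current = pending.pop()
        if current in found or current not in entries:
            continue
        found[current] = entries[current]
        p1, p2, _linknode = entries[current]
        pending.extend(p for p in (p1, p2) if p != NULLID)
    return found


def unionancestors(stores, node):
    """Combine partial histories from several stores into the full ancestry."""
    ancestors = {}
    pending = [node]
    while pending:
        current = pending.pop()
        if current == NULLID or current in ancestors:
            continue
        for entries in stores:
            part = partialancestors(entries, current)
            if part:
                ancestors.update(part)
                # Parents this store did not know about get resolved
                # by asking the stores again on a later iteration.
                for p1, p2, _linknode in part.values():
                    pending.extend((p1, p2))
                break
        else:
            raise KeyError(current)
    return ancestors
```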
Test Plan: ran the tests
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205319
Signature: t1:3205319:1461751511:210740b82cc6767b2f0c393715ac93d8f1b96bc7
Summary:
The old store contracts required that every store be able to produce the full
text for a revision. This patch modifies the contract so that a store (like a
bundle file store) can serve a delta chain and the union store can combine delta
chains from multiple stores together to create the final full text.
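A sketch of how chains from several stores might be stitched together. The delta encoding here (a single `(start, end, replacement)` splice) is a toy stand-in, and `ChainStore`, `getdeltachain`, and `fulltext` are names assumed for the illustration rather than taken from the patch.

```python
NULLID = b"\0" * 20


class ChainStore(object):
    """Toy store holding {node: (deltabase, delta)} entries."""

    def __init__(self, entries):
        self._entries = entries

    def getdeltachain(self, node):
        """Return the chain from `node` back as far as this store knows."""
        chain = []
        while node != NULLID and node in self._entries:
            base, delta = self._entries[node]
            chain.append((node, base, delta))
            node = base
        return chain


def applydelta(text, delta):
    # Toy delta: replace text[start:end] with `replacement`.
    start, end, replacement = delta
    return text[:start] + replacement + text[end:]


def fulltext(stores, node):
    """Follow delta bases across stores until the nullid, then apply them."""
    chain = []
    while node != NULLID:
        for store in stores:
            part = store.getdeltachain(node)
            if part:
                chain.extend(part)
                node = part[-1][1]  # continue from the last delta base
                break
        else:
            raise KeyError(node)
    text = b""
    for _node, _base, delta in reversed(chain):
        text = applydelta(text, delta)
    return text
```

In this model a full text is simply a delta against the nullid that splices the whole content into an empty string.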
Test Plan: ran the tests
Reviewers: #sourcecontrol, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.fb.com/D3205315
Signature: t1:3205315:1461669845:3eb8968566285f6221c7c44435b855cc65da33f4
Summary:
Instead of hard coding the list of stores in each union store, let's make it a
list and just test each store in order. This will allow easily adding new stores
and reordering the priority of the existing ones.
Also fix the remote store's contains function. 'contains' is the old name, and
it now needs to be getmissing in order to fit the store contract.
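A minimal sketch of the ordered-list idea, assuming each store exposes a `get` that raises `KeyError` on a miss and a `getmissing` that returns the keys it lacks. Only `getmissing` is named in this patch; the classes and the `get` signature are illustrative.

```python
class DictStore(object):
    """Toy store backed by a {(name, node): content} dict."""

    def __init__(self, data):
        self._data = data

    def get(self, name, node):
        return self._data[(name, node)]  # raises KeyError on a miss

    def getmissing(self, keys):
        return [key for key in keys if key not in self._data]


class UnionStore(object):
    """Tries each store in order; earlier stores have higher priority."""

    def __init__(self, *stores):
        self.stores = list(stores)

    def get(self, name, node):
        for store in self.stores:
            try:
                return store.get(name, node)
            except KeyError:
                continue
        raise KeyError((name, node))

    def getmissing(self, keys):
        # Each store narrows the list down to what is still missing.
        missing = list(keys)
        for store in self.stores:
            if not missing:
                break
            missing = store.getmissing(missing)
        return missing
```

Reordering priorities or adding a store then only means changing the arguments passed to `UnionStore`.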
Test Plan: ran the tests
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Differential Revision: https://phabricator.fb.com/D3205314
Signature: t1:3205314:1461606028:3a513ac82c5de668a7e40bbf7cc88d8754e2f0bb
Summary:
A future patch is going to change the union store to just contain an ordered
list of stores. Therefore we need a special spot to record which store is the
one that should receive writes.
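One way to picture this, as a sketch only: the union store takes an optional designated store and forwards writes to it. The `writestore` keyword and the `add` method are names assumed for the illustration.

```python
class MemoryStore(object):
    """Toy writable store."""

    def __init__(self):
        self.data = {}

    def add(self, name, node, content):
        self.data[(name, node)] = content


class WritableUnionStore(object):
    """Union store that records which member should receive writes."""

    def __init__(self, *stores, **kwargs):
        self.stores = list(stores)
        # The designated write target; None means the union is read-only.
        self.writestore = kwargs.get("writestore")

    def add(self, name, node, content):
        if self.writestore is None:
            raise RuntimeError("no writable store configured")
        self.writestore.add(name, node, content)
```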
Test Plan: ran the tests
Reviewers: #sourcecontrol
Differential Revision: https://phabricator.fb.com/D3205307
Summary:
This is a generic topological sort and will be useful in the upcoming repacking
code.
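For reference, a generic topological sort over a parent mapping can look like the sketch below. It is not the code from this patch; `toposort` and the shape of `parents` are illustrative, and the input is assumed to be acyclic.

```python
def toposort(parents):
    """Return nodes ordered so that every node appears after its parents.

    `parents` maps each node to an iterable of its parent nodes. Parents
    that are not keys of the mapping are treated as roots.
    """
    order = []
    seen = set()
    for start in sorted(parents):
        if start in seen:
            continue
        # Iterative depth-first walk; the flag marks "parents already queued".
        stack = [(start, False)]
        while stack:
            node, expanded = stack.pop()
            if expanded:
                order.append(node)
                continue
            if node in seen:
                continue
            seen.add(node)
            stack.append((node, True))
            for parent in parents.get(node, ()):
                if parent not in seen:
                    stack.append((parent, False))
    return order
```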
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3204124
Signature: t1:3204124:1461260520:e1cb5c9d496f11e5f44e0cdbc5ba851b1573d2e1
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Subscribers: ps
Differential Revision: https://phabricator.fb.com/D3221377
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221375
Signature: t1:3221375:1461648312:7dbdd59e6370cb32b90d864a623d8066028741e7
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221373
Signature: t1:3221373:1461648284:23203c17f4a87e33ff4e9be17a8b99bddbcdff05
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221371
Signature: t1:3221371:1461648217:e9702d761ab8fd6f85dee60a4c192cf25e784f11
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221369
Signature: t1:3221369:1461648197:185cbbba61a9d1a7a1beacd64153185d0d0826ed
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221366
Signature: t1:3221366:1461648117:088f3a5837393499e1a383af860bd1a935e0cba7
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221365
Signature: t1:3221365:1461646159:efeb0478c66cbd49d4a0a6c02a79d530b42f8248
Summary: Apparently we need to `import errno` in `shallowutil.py`
Test Plan: Code Review
Reviewers: #sourcecontrol, ttung, durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D3195117
Signature: t1:3195117:1461031210:424912a96448a2a8cb37197f006cfa95d4ab1cb1
The recent refactor caused remotefilelog.size() to include rename metadata in
the size count, which meant the size didn't match what the rest of Mercurial
expected. This caused clean files to show up as dirty in hg status if they had a
'lookup' dirstate state and were renames.
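To illustrate the mismatch: in Mercurial's filelog framing, rename metadata sits at the front of the stored text between two `\1\n` markers, and the size the rest of Mercurial expects excludes it. The helper below is a sketch of that calculation on raw bytes; `contentsize` is a hypothetical name, not the function changed here.

```python
METAMARKER = b"\x01\n"


def contentsize(rawtext):
    """Return the length of the file content, excluding any metadata header."""
    if not rawtext.startswith(METAMARKER):
        return len(rawtext)
    # The header ends at the second marker; content starts right after it.
    end = rawtext.index(METAMARKER, len(METAMARKER))
    return len(rawtext) - (end + len(METAMARKER))
```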
Summary:
We've received a few complaints that receivemissing is throwing corrupt data
exceptions. My best guess is that we're not receiving all of the data for some
reason. Let's add an assertion to ensure all the data is present, so we can
narrow it down to a connection issue instead of actual corrupt data.
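The kind of check meant here can be sketched as a read helper that fails loudly on a short read. `readexactly` is a hypothetical name, and raising `IOError` instead of using a bare assertion is a choice made for this example.

```python
def readexactly(stream, size):
    """Read `size` bytes from `stream` or fail if fewer arrive."""
    data = stream.read(size)
    if len(data) != size:
        # A short read points at a truncated transfer, not corrupt content.
        raise IOError("expected %d bytes, got %d" % (size, len(data)))
    return data
```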
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D3136203
This was meant to be part of the previous stack of commits, but I pushed the
wrong stack. This patch addresses a number of code review feedback points, the
most visible being to rename 'contains' to something else (in this case
'getmissing').
The old way of fetching from the server required the base store api expose a way
for outside callers to add fetch handlers to the store. This exposed some of the
underlying details of how data is fetched in an unnecessary way and added an
awkward subscription api.
Let's just treat our remote caches as another store we can fetch from, and
require that the overarching configuration logic (in shallowrepo.py) can connect
all our stores together in a union store.
The last major piece of functionality that needs to be moved into the new store
is the gc algorithm. This is just a copy-paste of the one that exists in
localcache.
Now that we have the new store abstraction, and now that remotefilelog.py writes
via it, let's also make fileserverclient write to the store via that API.
This required some refactoring of how receivemissing worked, so we could pass
the filename down, as that is required for writing to the store.
Now that remotefilelog.revision is implemented using the new contentstore, let's
switch remotefilelog.read to use that instead. This logic is almost identical to
what's in filelog.read.
We are refactoring the storage to be behind more abstract APIs. This patch
creates the new store objects on the repo and passes them to the
fileserverclient so it can add itself as a file provider, in the case of misses.
Future patches will refactor the storage logic into a more abstract API. This
patch adds a union store, which will allow us to check both local client storage and
shared cache storage, without exposing the difference at higher levels.
Summary:
When running inside chg, `reposetup` will be called once since `serve` is not
a `norepo` command. Then if the user runs a `norepo` command like `help`,
`runcommand` will receive `repo = None` and error out. Fix it by checking
`repo` explicitly.
Test Plan: Run `chg help` and verify that no exception is thrown.
Reviewers: #sourcecontrol, ttung, durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D3136328
Signature: t1:3136328:1459811387:3b86df9765aa5e20677031d6e9fc4bc3d524efa6
Summary:
Since we added the C code ancestor walk to this function, this python ancestor
walk is completely unnecessary, and can cause significant slow downs if none of
the ancestors are known linknodes (it walks the entire history).
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D3136150
Summary:
Discovered by `hg log filename` in the hg-committed repo. It seems we missed
a check here.
Test Plan:
Run `hg log filename` in a non-remotefilelog repo with remotefilelog enabled
and make sure "warning: file log can be slow on large repos" is not printed.
Reviewers: #sourcecontrol, ttung, durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D3132523
Signature: t1:3132523:1459801676:bcba3bbcaf1c358ad11e8ad25c0a1d3cc2637a76
Summary: We would like to utilize Martijn's logtoprocess extension to log cache hit rate.
Test Plan: None so far, will update the diff later.
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D3094765