Commit Graph

315 Commits

Author SHA1 Message Date
Durham Goode
99a426f3eb store: fix fanout logic for history pack
This fix was already applied to datapack a while ago. Basically, if the pack
doesn't have a lot of revisions, it's possible that the fanout table is sparsely
populated. Therefore we need to scan forward when looking for the end bounds of
our fanout table.
2016-05-03 15:41:57 -07:00
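Note on 99a426f3eb: the forward scan over a sparse fanout table can be sketched as below. This is a minimal illustration, not the historypack code: the one-byte prefix, the EMPTY marker, and the names build_fanout and search_bounds are assumptions made for the example.

```python
EMPTY = -1  # assumed marker for a fanout slot that has no entries


def build_fanout(sorted_keys, prefix_bits=8):
    """Map each one-byte key prefix to the position of the first key carrying it."""
    fanout = [EMPTY] * (1 << prefix_bits)
    for position, key in enumerate(sorted_keys):
        prefix = key[0]  # indexing bytes yields an int in Python 3
        if fanout[prefix] == EMPTY:
            fanout[prefix] = position
    return fanout


def search_bounds(fanout, key, entry_count):
    """Return (start, end) positions to search for key, or None if the slot is empty."""
    prefix = key[0]
    start = fanout[prefix]
    if start == EMPTY:
        return None
    # The slot right after ours may be empty when the pack holds few revisions,
    # so scan forward until a populated slot (or the end of the index) is found.
    end = entry_count
    for slot in range(prefix + 1, len(fanout)):
        if fanout[slot] != EMPTY:
            end = fanout[slot]
            break
    return start, end
```

Taking the end bound from the immediately following slot would return the empty marker whenever that slot is unpopulated, which is the failure mode the commit message describes.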
Durham Goode
7da17af64f store: record what files were created during a repack
Summary:
Previously, if you ran repack twice in a row, it would actually delete your
packs, because the repack produced files with the same name as before, and the
cleanup then deleted them.

The fix is to have the stores record what files they produced in the ledger,
then future clean up work can avoid deleting anything that was created during
the repack.

Test Plan: Added a test

Reviewers: #mercurial, ttung, mitrandir

Reviewed By: mitrandir

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3255819

Signature: t1:3255819:1462393814:d32155b12535990f72fbe48de045eddbb6f7fab6
2016-05-04 14:53:10 -07:00
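Note on 7da17af64f: a rough sketch of the idea, assuming a ledger object that tracks created paths. The Ledger class, its addcreated method, and the cleanup function are hypothetical names for illustration and may not match the real repack module.

```python
import os


class Ledger:
    """Hypothetical ledger remembering which files the current repack produced."""

    def __init__(self):
        self.created = set()

    def addcreated(self, path):
        self.created.add(path)


def cleanup(candidate_paths, ledger):
    """Delete superseded files, skipping anything written during this repack."""
    removed = []
    for path in candidate_paths:
        if path in ledger.created:
            # Same name as a file just produced: deleting it would lose the new pack.
            continue
        os.unlink(path)
        removed.append(path)
    return removed
```

Running repack twice produces a pack with the same name as the existing one, so the second run's cleanup only leaves it alone if the ledger recorded that name.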
Durham Goode
30cc85653c store: make pack files read-only
Summary:
Since pack files should never change after they are created, let's create them
with read-only permissions. It turns out that the Mercurial vfs doesn't apply
the correct permissions to files created by mkstemp (and we have to use mkstemp
since we don't know the name of the file until after we've written all the data
to it), so we have to manually call the permission fixing code.

We also need to fix our mmap calls to be readonly now, otherwise we get a
runtime permission denied exception.

Test Plan: Added a test

Reviewers: #mercurial, ttung, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3255816

Signature: t1:3255816:1462321201:dff4fb4c9301d67a77043ecc1d96262bb5d6a54a
2016-05-04 14:53:07 -07:00
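Note on 30cc85653c: the pattern can be illustrated with the standard library alone. This sketch does not use the Mercurial vfs; the function names, the 0o444 mode, and the content-hash file name are assumptions chosen to mirror the description (the final name is only known after all data is written).

```python
import hashlib
import mmap
import os
import tempfile


def write_readonly_pack(directory, payload):
    """Write payload to a temp file, make it read-only, then move it into place."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    # mkstemp creates the file as 0o600, so the permissions are fixed by hand.
    os.chmod(tmp_path, 0o444)
    # Assumption for the example: the name is derived from the written content.
    final_name = hashlib.sha1(payload).hexdigest() + ".datapack"
    final_path = os.path.join(directory, final_name)
    os.replace(tmp_path, final_path)
    return final_path


def map_pack(path):
    """Memory-map a pack for reading only; a writable mapping would be refused."""
    with open(path, "rb") as handle:
        return mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
```

Requesting a writable mapping of a file opened read-only raises a permission error, which matches the runtime exception mentioned in the summary.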
Durham Goode
1d4b4dbb36 store: switch mutable packs to use openers
Summary:
Instead of passing in a path and performing joins ourselves, let's use an
opener. This will help handle all the file permission edge cases.

Test Plan: Ran the tests

Reviewers: #mercurial, ttung, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3255165

Signature: t1:3255165:1462393836:38a28c850a0dc06838d9c17672d3dffd9903bbd7
2016-05-04 14:53:04 -07:00
Durham Goode
0671f3d76a store: add version header to index
Summary: Pretty straightforward

Test Plan: Ran the tests

Reviewers: lcharignon, rmcelroy, ttung, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3254892

Signature: t1:3254892:1462315520:888bb27ef121c08d463f9fd4cf9eeb3c42383a96
2016-05-04 14:53:01 -07:00
Durham Goode
d71eace818 store: refactor version number and size to constants
Summary:
A future patch will add a version number to the index file. Let's move the
version size, fanout start, and index start to constants so we can more easily
change them without changing the code.

Test Plan: Ran the tests

Reviewers: lcharignon, rmcelroy, ttung, mitrandir, quark

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3254876

Signature: t1:3254876:1462315858:63fe56e8cfcdbb0209861898ce0c45c7d7b33e35
2016-05-04 14:52:58 -07:00
Durham Goode
4aa798d76e store: don't allow gc to delete pack files
Summary:
`hg gc` is very aggressive in that it deletes any files in the cache that it
determines aren't a needed key, including files it doesn't recognize. Let's
teach it to not delete pack files.

In the future we'll need to make `hg gc` able to garbage collect the contents of
pack files as well.

Test Plan: It's probably fine!

Reviewers: lcharignon, rmcelroy, ttung, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3254744

Signature: t1:3254744:1462308257:c1f932b88abf3337370f16c05c789422ea51b0e1
2016-05-04 14:52:55 -07:00
Durham Goode
312acdb24e store: implement markledger and cleanup for historypack
Summary:
Implementing these two functions allows historypacks to be repacked, either into
a new format, or by combining multiple packs into a single new one.

Test Plan: Added a test in my next patch

Reviewers: lcharignon, ttung, rmcelroy, mitrandir, quark

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3251542

Signature: t1:3251542:1462392294:f95f7666a3a5df675f1351a19af7532c4742af2b
2016-05-04 14:52:49 -07:00
Durham Goode
d86ae1e525 store: implement markledger and cleanup for datapack
Summary:
Implementing these two functions will allow datapacks to be repacked (either
into other formats, or by combining multiple packs into one).

A future patch will add a test.

Test Plan: Added a test in a future patch

Reviewers: lcharignon, ttung, rmcelroy, mitrandir, quark

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3251539

Signature: t1:3251539:1462393256:7caa09677fbcaaf57a47d7a833684883483c5b3a
2016-05-04 14:52:46 -07:00
Durham Goode
6b25f32192 store: add revision count to historypack filesection
Summary:
Previously, given a historypack file, we had no way of reading the contents,
since we had no way to know when to stop reading the revision entries for a
given file section.

This patch changes the format to have a revision count value after the filename
and before the revisions. The documentation already documented the format like
this, and therefore doesn't need updating.

A future patch will use this information to iterate over all the revisions in
the pack.

Test Plan: Added a test in a future patch

Reviewers: lcharignon, ttung, rmcelroy, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3251538

Signature: t1:3251538:1462393282:f46b50e79237bfa8a25ff1957344588622b2699a
2016-05-04 14:52:43 -07:00
Durham Goode
548ccdeae1 store: make historypack file section writing lazier
Summary:
In a later patch we will need to add the count of revisions in a given file
section to the on-disk format. To make that easier, let's make the file section
serialization lazy, so that we will have the full list when it comes time to
count the entries.

Test Plan: added a test in a future patch

Reviewers: lcharignon, ttung, rmcelroy, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3251537

Signature: t1:3251537:1462393274:60b72a47de45f5a94f4f5a8d34b3942db0aa3fda
2016-05-04 14:52:40 -07:00
Durham Goode
a54d3f1257 store: make repack only repack the shared cache
Summary:
Previously, hg repack would repack all the objects in all the stores and dump the
new packs in .hg/store/packs. Initially we only want to repack the shared cache
though, so let's change repack to only operate on shared stores, and to write
out the new packs to the hgcache, under the appropriate repo name.

In a future patch I'm going to go through all this store stuff and replace all
uses of os.path and direct file reads/writes with a mercurial vfs.

Test Plan:
Ran repack in a large repo and verified packs were produced in
$HGCACHE/$REPONAME/packs

Ran hg cat on a file to verify that it read the data from the pack and did not do any remotefilelog network calls.

Reviewers: lcharignon, rmcelroy, ttung, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3250213

Signature: t1:3250213:1462315927:694661795141e2c869ba661a54cea8f4b90823df
2016-05-04 14:52:33 -07:00
Durham Goode
cfd60406ad store: add context manager to mutable pack classes
Summary:
Previously, if a repack failed, it would leave temporary pack files lying
around. By adding enter/exit functions to mutable packs, we can guarantee
cleanup happens.

Test Plan: Ran repack, verified that a failure did not leave tmp files

Reviewers: rmcelroy, quark, ttung, lcharignon, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3250201

Signature: t1:3250201:1462234552:7f20260a193ed1dd858bf6e9f489ac902d859218
2016-05-03 12:34:45 -07:00
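Note on cfd60406ad: a minimal sketch of the enter/exit approach, assuming a mutable pack that writes to a temporary file and only gets its final name on close. The MutablePack class shown here and its add, close, and abort methods are illustrative, not the actual remotefilelog classes.

```python
import os
import tempfile


class MutablePack:
    """Illustrative mutable pack that cleans up its temp file on failure."""

    def __init__(self, directory):
        self._directory = directory
        fd, self._tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        self._handle = os.fdopen(fd, "wb")
        self._finished = False

    def add(self, data):
        self._handle.write(data)

    def close(self, final_name):
        """Finish the pack and move it to its final name."""
        self._handle.close()
        final_path = os.path.join(self._directory, final_name)
        os.replace(self._tmp_path, final_path)
        self._finished = True
        return final_path

    def abort(self):
        """Discard the temporary file."""
        self._handle.close()
        if os.path.exists(self._tmp_path):
            os.unlink(self._tmp_path)
        self._finished = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on both success and failure; only unfinished packs are discarded.
        if not self._finished:
            self.abort()
        return False  # never swallow the caller's exception
```

Because __exit__ runs even when the body raises, a failed repack cannot leave the temporary file behind.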
Durham Goode
d2e7ae7519 store: make repack command use new repacker
Summary:
Now that all the repack logic is in place, let's switch the repack
command to use the new version. This also means the repack command will now
clean up the old remotefilelog blobs once it's finished.

Test Plan:
Ran hg repack in a large repo. Verified it deleted the old
remotefilelog blobs, and verified that I could still update around the
repository without making any remotefilelog network requests.

A future diff will add standard .t mercurial tests for the repack command.

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249601

Signature: t1:3249601:1462235506:03c0d95f6a82cfc04b340b139f39c02853941a17
2016-05-03 12:34:09 -07:00
Durham Goode
4ecb47b021 store: move history repack logic to repacker
Summary:
We had a naive repack implementation in historypack.py. Let's move it to the
repack module and do the minor adjustments to use the new repackerledger apis.

Test Plan:
Ran hg repack in conjunction with future diffs that make use of this
api

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249587

Signature: t1:3249587:1462232544:591cd8bec09f781370896470746eae5a4489531f
2016-05-03 12:33:54 -07:00
Durham Goode
735aa964d5 store: move data repack logic to repacker
Summary:
We had a naive repack implementation in datapack.py. Let's move it to the repack
module and do the minor adjustments to use the new repackerledger apis.

Test Plan: Ran it in conjunction with future diffs that make use of this api.

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249585

Signature: t1:3249585:1462232504:a00aa65afca9562a2c1456cc4ab48c50d1ba5b68
2016-05-03 12:33:36 -07:00
Durham Goode
c902797dc9 store: implement markledger and cleanup on stores
Summary:
This implements the new markledger and cleanup apis on the existing
remotefilelog stores. These apis are used to tell the repacker what each store
has, and allows each store to cleanup if its data has been repacked.

Test Plan:
Ran repack in conjunction with the future diffs that make use of
these apis.

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249584

Signature: t1:3249584:1462226133:1e8faffc9f6bf8f7c94e6e79aee8865e3c41648c
2016-05-03 12:33:00 -07:00
Durham Goode
c429030ca4 store: add class definitions and stub for repack
Summary:
This introduces the high level classes that will implement the generic repack
logic.

Test Plan: Ran the repack in conjunction with later commits that use these apis.

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249577

Signature: t1:3249577:1462225435:000f9cc29ae2a3d7fdbedf546c8936ef45d1e4cf
2016-05-03 12:32:35 -07:00
Durham Goode
b049a0910a store: datapack fix perf issue
Summary:
Using range() allocates a full list, which is 2**16 entries in the fanout case.
Let's use xrange instead. This is a notable performance win when checking many
keys.

Also removed an unused variable and used index instead of self._index since this
is a hot path.

Test Plan: Ran hg repack

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249563

Signature: t1:3249563:1462240834:c19d6cbf0b6237f15ca8d81e8da856752df0ec59
2016-05-03 12:30:44 -07:00
Durham Goode
1704e5c8fb store: add tests for historypack
Summary:
This adds a basic test suite for the historypack class, and fixes some issues it
found.

Test Plan: ./run-tests.py test-historypack.py

Reviewers: mitrandir, rmcelroy, ttung, lcharignon

Reviewed By: lcharignon

Differential Revision: https://phabricator.intern.facebook.com/D3237858

Signature: t1:3237858:1461884966:c0ec90a2735255e5ef70eade09915066a7b71ee5
2016-04-28 17:37:03 -07:00
Durham Goode
8f4d83edeb shallowbundle: fix broken fallback orig call
This was caught by tests running in an unusual configuration
2016-04-28 17:34:08 -07:00
Durham Goode
22948ce7e1 checkcode: add check code test
Summary: Adds the same check code test that upstream Mercurial uses.

Test Plan:
Ran it, and fixed all the failures. I won't land this commit until
all the failure fixes are landed.

Reviewers: #sourcecontrol, ttung, rmcelroy, wez

Reviewed By: wez

Subscribers: quark, rmcelroy, wez

Differential Revision: https://phabricator.intern.facebook.com/D3221380

Signature: t1:3221380:1461802769:19f5bdc209c05edb442faa70ae572ce31e2fbc95
2016-04-28 10:18:47 -07:00
Durham Goode
29d3dda67e checkcode: fix various store files
Summary: Fix check code for various store related files

Test Plan: Ran the tests

Reviewers: #sourcecontrol, mitrandir, ttung

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3222465

Signature: t1:3222465:1461701300:34560288be4dc921f0252d4ad8fdc9c8d9357e23
2016-04-27 16:49:33 -07:00
Durham Goode
98fd33f8cb store: add missing imports
Summary: These were missing, and only needed in exception cases.

Test Plan: nope

Reviewers: #sourcecontrol, rmcelroy, ttung

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3219749

Signature: t1:3219749:1461608742:91e3a721e78188c52431b6c5d1b3ad091e249c3a
2016-04-27 16:49:30 -07:00
Durham Goode
f92668636b store: add historypack store that reads histpack files from .hg/store/packs
Summary:
Now that we can read and write histpack files, let's add a store implementation that
can serve packed content.

My next set of commits (which haven't been written yet) will:
- add tests for all of this

Test Plan:
Ran the tests. Also repacked a repo, deleted the old cache files,
ran hg log FILE, and verified it produced results without hitting the network.

Reviewers: #sourcecontrol, ttung, mitrandir, rmcelroy

Reviewed By: mitrandir, rmcelroy

Subscribers: rmcelroy, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3219765

Signature: t1:3219765:1461717992:9b2e8646c0555472fa00ee7059c0f283fd4c2c65
2016-04-27 16:49:27 -07:00
Durham Goode
18cde8ba89 store: add a historypack class that can read histpacks
Summary:
The previous patch added logic to repack store history and write it to
a histpack file. This patch adds a pack reader implementation that knows how to
read histpacks.

Test Plan:
Ran the tests.  Also tested this in conjunction with the next patch
which actually reads from the data structure.

Reviewers: #sourcecontrol, ttung, mitrandir, rmcelroy

Reviewed By: mitrandir, rmcelroy

Subscribers: rmcelroy, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3219764

Signature: t1:3219764:1461718081:9d812b6aea87fe9eb48fdac9dbef282e4775c3c9
2016-04-27 16:49:24 -07:00
Durham Goode
f22bae206b store: add a historypack format and a repacker for it
Summary:
This is an initial implementation of a history pack file creator and a repacker
class that can produce it. A history pack is a pack file that contains no file
content, just history information (parents and linknodes).

A histpack is two files:

- a .histpack file consisting of a series of file sections, each of which
  contains a series of revision entries (node, p1, p2, linknode)
- a .histidx file containing a filename based index to the various file sections
  in the histpack.

See the code for documentation of the exact format.

Test Plan:
ran the tests.  A future diff will add unit tests for all the new pack
structures.

Ran `hg repack` on a large repo. Verified pack files were produced in
.hg/store/packs. In a future diff, I verified that the data could be read
correctly.

Reviewers: #sourcecontrol, mitrandir, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: mitrandir, rmcelroy, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3219762

Signature: t1:3219762:1461751982:e7bbc65e8f01c812fc1eb566d2d48208b0913766
2016-04-27 16:49:21 -07:00
Durham Goode
f17f6cc093 store: add revisions to datapack in alphabetical order
Summary:
This forces the revisions in the datapack to be added in alphabetical order.
This makes the algorithm more deterministic, but otherwise has little effect.

Test Plan: Ran the tests, ran repack

Reviewers: #sourcecontrol, rmcelroy, ttung

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3219760

Signature: t1:3219760:1461687720:7be5fdc1419f8214c8c83074494b33214b3684ae
2016-04-27 16:49:18 -07:00
Durham Goode
43ed70b6f1 store: add datapack store that reads pack files from .hg/store/packs
Summary:
Now that we can read and write datapack files, let's add a store implementation that
can serve packed content. With this patch, it's technically possible for someone
to prefetch and repack large portions of history for long term storage with
remotefilelog.

My next set of commits (which haven't been written yet) will:
- add tests for all of this
- add an indexpack format for packing ancestor metadata (the datapack only packs
  revision content)

Test Plan:
Ran the tests. Also repacked a repo, deleted the old cache files, ran
hg up null && hg up master, and verified it checked out master with the
right files and without fetching blobs from the server.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205351

Signature: t1:3205351:1461751649:45a56b57d962a282aeef9478500a3b23495a0eb7
2016-04-27 16:49:15 -07:00
Durham Goode
56c83ea072 store: add a datapack class that can read datapacks
Summary:
The previous patch added logic to repack store contents and write it to a
datapack file. This patch adds a new store implementation that knows how to read
datapacks.

It's just a simple implementation without any parallelism. So there's room for
improvement.

Test Plan:
Ran the tests.  Also tested this in conjunction with the next patch
which actually reads from the data structure.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205342

Signature: t1:3205342:1461750967:84377517cb1f285d37694a3f503d60ae85bacb66
2016-04-27 16:49:12 -07:00
Durham Goode
510ac021f3 store: add a basic repack and datapack format
Summary:
This is an initial implementation of a repack algorithm that can read data from
an arbitrary store (in this case the remotefilelog content store), and repack it
into a datapack.

A datapack is two files:

- a .datapack file consisting of a series of deltas (a delta may be a full text if the delta base is the nullid)
- a .dataidx file consisting of delta information and an index into the deltas

See the code for documentation of the exact format.

Test Plan:
ran the tests

Ran `hg repack` in a large repo. Verified that a datapack and a dataidx file
were created in .hg/store/packs. The datapack used 148MB instead of the 439MB the
old remotefilelog storage used.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205334

Signature: t1:3205334:1461751366:ee4bf6a580ffb667071a8046fda6f0858b7f25ae
2016-04-27 16:49:09 -07:00
Durham Goode
f362c9a3a8 store: add getfiles() api to store
Summary:
This adds an API to the store contract that allows the store to return a list of
the name/node pairs that it contains. This will be used to allow a repack
algorithm to list the contents of the store so it can repack it into another
store. The old remotefilelog blob store used namehash+node keys, which is
different from the new store API's name+node keys, so the getfiles()
implementation here has to perform a reverse namehash->name lookup so it can
satisfy the store API contract.

In the remotefilelog basestore implementation, it reads the file names from the
local data directory and the shared cache directory, and reverse resolves the
file name hashes into filenames to produce the list.

Test Plan: ran the tests

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205321

Signature: t1:3205321:1461751437:a7c44c2bbe153122a3b85b8d82907a112cf77b1a
2016-04-27 16:49:06 -07:00
Durham Goode
438db1be81 store: allow union metadatastore to combine ancestors from many stores
Summary:
The old store api required that each store be able to return the complete
ancestor history for a given name/node pair. This patch allows a store to return
only the parts of history it knows about, and the union store will combine that
history with the history from other stores to produce the full result. This is
useful for stores like bundle files, where they contain only a partial history
that needs to be annotated by the real store.

Test Plan: ran the tests

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205319

Signature: t1:3205319:1461751511:210740b82cc6767b2f0c393715ac93d8f1b96bc7
2016-04-27 16:49:04 -07:00
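Note on 438db1be81: one way to picture the combining step is shown below. The class names, the getancestors signature, and the (p1, p2, linknode) entry shape are assumptions for this sketch; copy metadata is left out for brevity.

```python
NULLID = b"\0" * 20


class PartialStore:
    """Hypothetical store that only knows part of a file's history."""

    def __init__(self, history):
        self._history = history  # {(name, node): (p1, p2, linknode)}

    def getancestors(self, name, node):
        """Return the ancestors of node that this store knows about."""
        known = {}
        pending = [node]
        while pending:
            current = pending.pop()
            if current in known or (name, current) not in self._history:
                continue
            entry = self._history[(name, current)]
            known[current] = entry
            pending.extend(entry[:2])
        return known


class UnionMetadataStore:
    """Combine partial histories from several stores into one ancestor map."""

    def __init__(self, stores):
        self._stores = list(stores)

    def getancestors(self, name, node):
        ancestors = {}
        pending = [node]
        while pending:
            current = pending.pop()
            if current == NULLID or current in ancestors:
                continue
            for store in self._stores:
                partial = store.getancestors(name, current)
                if current in partial:
                    break
            else:
                raise KeyError((name, current))
            for known, entry in partial.items():
                ancestors.setdefault(known, entry)
                pending.extend(entry[:2])  # keep walking through both parents
        return ancestors
```

In this sketch a bundle-like store can answer for its own nodes only, and the union keeps asking the remaining stores for whichever parents are still unresolved.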
Durham Goode
cce75d4663 store: add concept of delta chain to content store
Summary:
The old store contracts required that every store be able to produce the full
text for a revision. This patch modifies the contract so that a store (like a
bundle file store) can serve a delta chain and the union store can combine delta
chains from multiple stores together to create the final full text.

Test Plan: ran the tests

Reviewers: #sourcecontrol, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.fb.com/D3205315

Signature: t1:3205315:1461669845:3eb8968566285f6221c7c44435b855cc65da33f4
2016-04-26 15:10:38 -07:00
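Note on cce75d4663: the contract change can be sketched as follows. Everything here is illustrative: the getdeltachain name, the (node, deltabase, delta) chain entries, and the toy (start, end, replacement) delta stand in for the real store API and Mercurial's binary delta format.

```python
NULLID = b"\0" * 20


def apply_delta(text, delta):
    """Apply a toy delta: replace text[start:end] with replacement."""
    start, end, replacement = delta
    return text[:start] + replacement + text[end:]


class DeltaStore:
    """Hypothetical store mapping (name, node) to (deltabase, delta)."""

    def __init__(self, entries):
        self._entries = entries

    def getdeltachain(self, name, node):
        """Return the part of the chain this store holds, newest entry first."""
        chain = []
        while node != NULLID and (name, node) in self._entries:
            base, delta = self._entries[(name, node)]
            chain.append((node, base, delta))
            node = base
        if not chain:
            raise KeyError((name, node))
        return chain


class UnionContentStore:
    """Stitch delta chains from several stores into a full text."""

    def __init__(self, stores):
        self._stores = list(stores)

    def get(self, name, node):
        chain = []
        current = node
        while current != NULLID:  # a nullid base means the chain is complete
            for store in self._stores:
                try:
                    part = store.getdeltachain(name, current)
                except KeyError:
                    continue
                chain.extend(part)
                current = part[-1][1]  # continue from the last delta's base
                break
            else:
                raise KeyError((name, current))
        text = b""
        for _node, _base, delta in reversed(chain):
            text = apply_delta(text, delta)
        return text
```

A full text is represented here as a delta against the empty string whose base is the nullid, so a store holding only a delta can still contribute to the final result.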
Durham Goode
7e1047d11f store: change union stores to accept a list of stores
Summary:
Instead of hard coding the list of stores in each union store, let's make it a
list and just test each store in order. This will allow easily adding new stores
and reordering the priority of the existing ones.

Also fix the remote store's contains function. 'contains' is the old name, and
it now needs to be getmissing in order to fit the store contract.

Test Plan: ran the tests

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Differential Revision: https://phabricator.fb.com/D3205314

Signature: t1:3205314:1461606028:3a513ac82c5de668a7e40bbf7cc88d8754e2f0bb
2016-04-26 15:10:38 -07:00
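Note on 7e1047d11f: a small sketch of an ordered store list with a getmissing method. The SetStore helper and the exact getmissing signature (a list of keys in, the still-missing keys out) are assumptions for the example.

```python
class SetStore:
    """Hypothetical store backed by a set of (name, node) keys."""

    def __init__(self, keys):
        self._keys = set(keys)

    def getmissing(self, keys):
        """Return the subset of keys this store does not have."""
        return [key for key in keys if key not in self._keys]


class UnionStore:
    """Ask each store in priority order, narrowing the missing list as we go."""

    def __init__(self, *stores):
        self.stores = list(stores)

    def getmissing(self, keys):
        missing = list(keys)
        for store in self.stores:
            if not missing:
                break
            missing = store.getmissing(missing)
        return missing
```

With the stores held in a plain list, adding a new store or changing priority is just a matter of where it sits in that list.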
Durham Goode
9cfbf5a59e store: keep track of the writable store instead of hard coding it
Summary:
A future patch is going to change the union store to just contain an ordered
list of stores. Therefore we need a special spot to record which store is the
one that should receive writes.

Test Plan: ran the tests

Reviewers: #sourcecontrol

Differential Revision: https://phabricator.fb.com/D3205307
2016-04-26 15:10:38 -07:00
Durham Goode
c3c047f0b7 Move sortnodes into shallowutil
Summary:
This is a generic topological sort and will be useful in the upcoming repacking
code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3204124

Signature: t1:3204124:1461260520:e1cb5c9d496f11e5f44e0cdbc5ba851b1573d2e1
2016-04-26 15:10:38 -07:00
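Note on c3c047f0b7: for readers unfamiliar with the technique, a generic topological sort that emits parents before children might look like this. It uses Kahn's algorithm; the signature and behavior are a sketch and may differ from the actual sortnodes in shallowutil.

```python
def sortnodes(nodes, parentfunc):
    """Return nodes ordered so that every parent precedes its children.

    parentfunc(node) returns the node's parents; parents outside the
    given node set are ignored.
    """
    nodes = list(dict.fromkeys(nodes))  # drop duplicates, keep input order
    nodeset = set(nodes)
    children = {node: [] for node in nodes}
    pending = {}
    for node in nodes:
        parents = {p for p in parentfunc(node) if p in nodeset}
        pending[node] = len(parents)
        for parent in parents:
            children[parent].append(node)

    ready = [node for node in nodes if pending[node] == 0]
    ordered = []
    while ready:
        node = ready.pop()
        ordered.append(node)
        for child in children[node]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(ordered) != len(nodes):
        raise ValueError("cycle detected among nodes")
    return ordered
```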
Durham Goode
84bc49f25d checkcode: fix shallowrepo, shallowutil, and setup.py
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221375

Signature: t1:3221375:1461648312:7dbdd59e6370cb32b90d864a623d8066028741e7
2016-04-26 13:00:31 -07:00
Durham Goode
3817826242 checkcode: fix remotefilelogserver and shallowbundle
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221373

Signature: t1:3221373:1461648284:23203c17f4a87e33ff4e9be17a8b99bddbcdff05
2016-04-26 13:00:31 -07:00
Durham Goode
39d350996f checkcode: fix remotefilectx and remotefilelog
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221371

Signature: t1:3221371:1461648217:e9702d761ab8fd6f85dee60a4c192cf25e784f11
2016-04-26 13:00:31 -07:00
Durham Goode
859510b65e checkcode: fix fileserverclient.py
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221369

Signature: t1:3221369:1461648197:185cbbba61a9d1a7a1beacd64153185d0d0826ed
2016-04-26 13:00:31 -07:00
Durham Goode
71bd8c2561 checkcode: fix errors in cacheclient and debugcommands
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221366

Signature: t1:3221366:1461648117:088f3a5837393499e1a383af860bd1a935e0cba7
2016-04-26 13:00:31 -07:00
Durham Goode
495a853d78 checkcode: fix __init__.py
Summary: Fix failures found by check-code.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Differential Revision: https://phabricator.fb.com/D3221365

Signature: t1:3221365:1461646159:efeb0478c66cbd49d4a0a6c02a79d530b42f8248
2016-04-26 13:00:31 -07:00
Jun Wu
ead8969797 Fix missing errno import
Summary: Apparently we need to `import errno` in `shallowutil.py`

Test Plan: Code Review

Reviewers: #sourcecontrol, ttung, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D3195117

Signature: t1:3195117:1461031210:424912a96448a2a8cb37197f006cfa95d4ab1cb1
2016-04-18 19:04:58 -07:00
Durham Goode
2d1dcb4b97 Fix missing 'grp' import
2016-04-18 11:46:06 -07:00
Durham Goode
5b2914142a Fix status returning invalid results
The recent refactor caused remotefilelog.size() to include rename metadata in
the size count, which meant the size didn't match what the rest of Mercurial
expected. This caused clean files to show up as dirty in hg status if they had a
'lookup' dirstate state and were renames.
2016-04-10 09:46:24 -07:00
Durham Goode
2e93ca187a Add byte count checking when receiving from the server
Summary:
We've received a few complaints that receivemissing is throwing corrupt data
exceptions. My best guess is that we're not receiving all of the data for some
reason. Let's add an assertion to ensure all the data is present, so we can
narrow it down to a connection issue instead of actual corrupt data.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D3136203
2016-04-05 09:50:12 -07:00
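Note on 2e93ca187a: the kind of check described can be sketched with a plain file-like stream. The read_exact helper and its error message are hypothetical; the real receivemissing code reads from the server connection and may frame the data differently.

```python
def read_exact(stream, expected_size):
    """Read exactly expected_size bytes or raise if the stream ends early."""
    chunks = []
    remaining = expected_size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break  # the stream ended before all the data arrived
        chunks.append(chunk)
        remaining -= len(chunk)
    data = b"".join(chunks)
    if len(data) != expected_size:
        raise IOError(
            "only received %d of %d bytes" % (len(data), expected_size)
        )
    return data
```

Failing at this point separates a short read (a connection problem) from data that arrived in full but does not parse.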
Durham Goode
24323a759c store: address code review feedback
This was meant to be part of the previous stack of commits, but I pushed the
wrong stack. This patch addresses a number of code review feedback points, the
most visible being to rename 'contains' to something else (in this case
'getmissing').
2016-04-04 16:48:55 -07:00
Durham Goode
8ca8f7f6ca stores: remove fetch logic and replace with a remote store fallthrough
The old way of fetching from the server required the base store api expose a way
for outside callers to add fetch handlers to the store. This exposed some of the
underlying details of how data is fetched in an unnecessary way and added an
awkward subscription api.

Let's just treat our remote caches as another store we can fetch from, and
require that the overarching configure logic (in shallowrepo.py) can connect
all our stores together in a union store.
2016-04-04 16:26:12 -07:00
Durham Goode
ece19111e0 ioutil: rename ioutil to shallowutil
The old name was not very descriptive. There's already a shallowutil, so let's
just use that.
2016-04-04 16:26:12 -07:00
Durham Goode
29ea8ada1e store: delete the localcache class
Now that all functionality has been moved to the new store, we no longer need
the localcache class. So let's delete it.
2016-04-04 16:26:12 -07:00
Durham Goode
ecf4378d18 store: implement gc in the new store
The last major piece of functionality that needs to be moved into the new store
is the gc algorithm. This is just a copy paste of the one that exists in
localcache.
2016-04-04 16:26:12 -07:00
Durham Goode
d70897e18c store: implement markrepo on the new store
Now that most of our storage has been moved behind the new store, let's also
move the ability to mark the repo to behind that storage abstraction.
2016-04-04 16:26:12 -07:00
Durham Goode
0dd4247520 store: make remotefilelog.ancestormap use the new store
Now that we have a metadatastore, let's use it to implement
remotefilelog.ancestormap. This gets rid of a bunch of ugly code.
2016-04-04 16:26:12 -07:00
Durham Goode
ad473d5a6b store: make remotefilelog.linknode use the new store
Now that we have the new metadatastore, let's use it to fetch the linknode
instead of parsing the data ourselves.
2016-04-04 16:26:12 -07:00
Durham Goode
82bc4468ed store: make remotefilelog.renamed use the store
Now that we have a metadata store, let's switch remotefilelog.renamed to consult
it, instead of parsing the data itself.
2016-04-04 16:26:12 -07:00
Durham Goode
aba161c424 store: implement metadatastore functions
This implements the metadatastore APIs that were previously just stubs.
2016-04-04 16:26:12 -07:00
Durham Goode
8ad3ce6f41 store: change fileserverclient to write via new store
Now that we have the new store abstraction, and now that remotefilelog.py writes
via it, let's also make fileserverclient write to the store via that API.

This required some refactoring of how receive missing worked, so we could pass
the filename down, as that is required for writing to the store.
2016-04-04 16:26:12 -07:00
Durham Goode
721f54d0df store: move remotefilelog content writing to be done via basestore
Now that we have the new store abstraction, let's route writes through it as
well.
2016-04-04 16:26:12 -07:00
Durham Goode
ffb239bdcb store: switch remotefilelog.size to use new store
Now that we can read data via the new store, let's switch remotefilelog to use
that instead of talking to the filesystem directly.
2016-04-04 16:26:12 -07:00
Durham Goode
69aff18063 store: switch remotefilelog.read to use self.revision
Now that remotefilelog.revision is implemented using the new contentstore, let's
switch remotefilelog.read to use that instead. This logic is almost identical to
what's in filelog.read
2016-04-04 16:26:12 -07:00
Durham Goode
50df2e518f store: switch remotefilelog.revision to use new store
Now that the new contentstore has get(), let's switch remotefilelog.revision to
use it instead.
2016-04-04 16:26:12 -07:00
Durham Goode
cb48cd034a store: add store data validation
The old store logic has validation for checking whether the data it's reading
is corrupt. Let's copy and paste that over to the new store.
2016-04-04 16:26:12 -07:00
Durham Goode
dfb49ad597 store: implement contentstore.get
This implements the basic function for fetching content data from the
remotefilelog store.
2016-04-04 16:26:12 -07:00
Durham Goode
647684cca8 store: implement basestore.contains
This implements the basic contains function that checks if the given (filename,
node) pairs are in the store.
2016-04-04 16:26:12 -07:00
Durham Goode
1d97924c54 store: construct store during repo creation
We are refactoring the storage to be behind more abstract APIs. This patch
creates the new store objects on the repo and passes them to the
fileserverclient so it can add itself as a file provider, in the case of misses.
2016-04-04 16:26:12 -07:00
Durham Goode
9c88142860 store: add union stores
Future patches will refactor the storage logic into a more abstract API. This
patch adds a union store, which will allow us to check both local client storage and
shared cache storage, without exposing the difference at higher levels.
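
A minimal sketch of the union idea, assuming each underlying store exposes a
`get(name, node)` that raises `KeyError` on a miss. The names `DictStore`,
`UnionStore` and `get` are illustrative assumptions, not taken from the patch.

```python
class DictStore(object):
    """Hypothetical in-memory store keyed by (filename, node)."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def get(self, name, node):
        # Raises KeyError on a miss.
        return self._data[(name, node)]


class UnionStore(object):
    """Hypothetical union: tries each store in order, returns the first hit."""

    def __init__(self, *stores):
        self.stores = stores

    def get(self, name, node):
        for store in self.stores:
            try:
                return store.get(name, node)
            except KeyError:
                continue
        # No store had the entry.
        raise KeyError((name, node))


# Stand-ins for local client storage and shared cache storage.
local = DictStore({("a.txt", "n1"): b"local data"})
shared = DictStore({("b.txt", "n2"): b"shared data"})
union = UnionStore(local, shared)
```

Callers only talk to `union`, so they never need to know which store served the
data.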
2016-04-04 16:26:12 -07:00
Durham Goode
b62ef50278 store: add stubs for storage classes
Future patches will refactor the storage into a more abstract API. These are the
initial stubs for that API.
2016-04-04 16:26:12 -07:00
Durham Goode
492b9af06e ioutil: move helper functions to ioutil
Future patches will refactor the storage into more abstract APIs. Let's move
these utility functions out to be on their own.
2016-04-04 16:26:12 -07:00
Jun Wu
b7e6384e9c Allow repo = None in runcommand
Summary:
When running inside chg, `reposetup` will be called once since `serve` is not
a `norepo` command. Then if the user runs a `norepo` command like `help`,
`runcommand` will receive `repo = None` and error out. Fix it by checking
`repo` explicitly.

Test Plan: Run `chg help` and no exception is thrown.

Reviewers: #sourcecontrol, ttung, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D3136328

Signature: t1:3136328:1459811387:3b86df9765aa5e20677031d6e9fc4bc3d524efa6
2016-04-04 16:22:16 -07:00
Durham Goode
f774b1b204 adjustlinknode: remove unnecessary ancestor walk
Summary:
Since we added the C code ancestor walk to this function, this python ancestor
walk is completely unnecessary, and can cause significant slow downs if none of
the ancestors are known linknodes (it walks the entire history).

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D3136150
2016-04-04 15:30:47 -07:00
Jun Wu
2ec49732fd Add shallowrepo check in wrapped log function
Summary:
Discovered by `hg log filename` in the hg-committed repo. It seems we missed
a check here.

Test Plan:
Run `hg log filename` in a non-remotefilelog repo with remotefilelog enabled
and make sure "warning: file log can be slow on large repos" is not printed.

Reviewers: #sourcecontrol, ttung, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D3132523

Signature: t1:3132523:1459801676:bcba3bbcaf1c358ad11e8ad25c0a1d3cc2637a76
2016-04-04 13:33:28 -07:00
Kostia Balytskyi
4e61e19a3d remotefilelog: do ui.log of remotecache hit rate
Summary: We would like to utilize Martijn's logtoprocess extension to log cache hit rate.

Test Plan: None so far, will update the diff later.

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D3094765
2016-04-01 03:16:00 -07:00
Mateusz Kwapich
cc54a98956 addchangegroup: adjust for new upstream API
Summary:
addchangegroupfiles doesn't take the pr function as a parameter anymore.
The upstream change is https://selenic.com/hg/rev/982e3ef7f5bf

Test Plan: tests are passing now on the release branch

Reviewers: #sourcecontrol, ttung, durham

Reviewed By: durham

Differential Revision: https://phabricator.fb.com/D3107217

Signature: t1:3107217:1459211189:4ece7531aff6043fc3acbfe43e2f471781c25c9d
2016-03-30 14:17:49 -07:00
Augie Fackler
9eb0009839 fileserverclient: use new iterbatch() method
This allows the client to send a single batch request for all file contents
and then handle the responses as they stream back to the client, which should
improve both running time and the user experience as far as it goes with
progress.
2016-03-22 10:06:24 -07:00
Augie Fackler
86ea8ed060 commands: norepo was removed in e1563031f528
Use the decorator form instead, introduced in hg 3.1.
2016-03-03 13:40:31 -05:00
Wez Furlong
2ec314e26a remotefilelog: add separate option to validate localcache files
Summary:
We've recently had to dig into two different issues that resulted in broken
files landing in the localcache; one was due to a problem with the data source
for our cacheprocess becoming corrupt and the other was due to a failed write
(ENOSPC) causing a truncated file to be left in the local cache.

It is desirable to perform some lightweight consistency checks before we return
data up to the caller of localcache, but prior to this diff the validation
functionality was coupled to configuring a log file.

Due to the shared nature of the localcache it's not always clear cut where we
want to log localcache consistency issues, so it feels more flexible to
decouple logging from enabling checks.

This diff introduces `remotefilelog.validatecache` as a separate option that
can have three values:

* `off` - no checks are performed
* `on` - checks are performed during read and write
* `strict` - checks are performed during __contains__, read and write

The default is now `on`.
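
A rough sketch of how the three values could gate the checks, based only on the
list above. The table `CHECKED_OPERATIONS` and the function `should_validate`
are hypothetical names for illustration and are not the extension's real code.

```python
# Operations that get checked in each mode, as described in the list above.
CHECKED_OPERATIONS = {
    "off": frozenset(),
    "on": frozenset(["read", "write"]),
    "strict": frozenset(["contains", "read", "write"]),
}


def should_validate(mode, operation):
    """Return True if `operation` should be validated under `mode`."""
    if mode not in CHECKED_OPERATIONS:
        raise ValueError("unknown validatecache mode: %r" % (mode,))
    return operation in CHECKED_OPERATIONS[mode]
```

For example, `should_validate("on", "contains")` is false, while
`should_validate("strict", "contains")` is true.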

Test Plan: `./run-tests.py --with-hg=../../hg-crew/hg`

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D2941067

Tasks: 10044183, 9987694
2016-02-18 08:34:33 -08:00
Durham Goode
a7a78cda1e More robust adjustlinknode code for None srcrev's
Summary:
The srcrev passed to adjustlinknode can sometimes be None, which causes an
exception. The code that throws the exception was introduced recently as part of
taking advantage of a C fast path.

The fix is to move the srcrev check to be after the None handling.

Test Plan:
I'm not sure how to repro this naturally actually.  I tried writing
tests that did rebases of renames, but it didn't trigger.  I manually verified
it by using the debugger to insert a None for the srcrev at the beginning of
adjustlinknode

Reviewers: lcharignon, #sourcecontrol, ttung, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.fb.com/D2944899

Tasks: 10066192

Signature: t1:2944899:1455735567:c8eea240885847061239bf3df0ea59dbbd0e4858
2016-02-17 11:01:45 -08:00
Wez Furlong
fd584f7e56 remotefilelog: more graceful handling of write errors for localcache
Summary:
I debugged an issue this past week where a set of machines had exhausted the
disk space available on the partition where the local cache was situated.  This
particular tier didn't use cacheprocess, only the local cache.  There were some
number of truncated files in the local cache.

Inspecting the code here, it looks like we're using atomictempfile incorrectly.
atomictempfile.close() will unconditionally rename the temp file into place,
and we were calling this from a finally handler.

It seems safest to remove the try/finally from around this section of code and
just let the destructor trigger to clean up the temporary file in the error
path, and if we make it through writing the data, then call close and have it
move the file in to place.
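
The general pattern, sketched with the standard library only. This is not
Mercurial's atomictempfile, and unlike the diff it cleans up explicitly rather
than relying on a destructor; `write_atomically` is a hypothetical helper name.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write data to a temp file and rename it into place only on success."""
    dirname = os.path.dirname(path) or "."
    fd, tmppath = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
    except Exception:
        # Error path: discard the partial temp file and never rename it.
        os.unlink(tmppath)
        raise
    # Success path only: move the complete file into place.
    os.replace(tmppath, path)
```

The key point is that the rename sits outside any finally block, so a failed
write cannot leave a truncated file at the final path.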

Test Plan:
ran the tests.  They don't cover this case, but at least I didn't
obviously break anything:

```
 $ ./run-tests.py --with-hg=../../hg-crew/hg
...................
# Ran 19 tests, 0 skipped, 0 warned, 0 failed.
```

Reviewers: #sourcecontrol, ttung, mitrandir

Reviewed By: mitrandir

Subscribers: scyost

Differential Revision: https://phabricator.fb.com/D2940861

Tasks: 10044183

Signature: t1:2940861:1455673078:a7593d70c32151e13c8ccc31f92387e9c8cb23a0
2016-02-17 08:03:38 -08:00
Durham Goode
2cce4008b6 adjustlinknode: use C fastpath
Summary:
The adjustlinknode logic was pretty slow, since it did all the ancestry
traversal in python. This patch makes it first use the C fastpath to check if
the provided linknode is correct (which it usually is), before proceeding to the
slow path.
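
The control flow amounts to the sketch below. Both callables are hypothetical
stand-ins (`is_ancestor_fast` for the C check, `find_linknode_slow` for the
python traversal); the real function signatures are not shown in this message.

```python
def adjust_linknode(linknode, srcnode, is_ancestor_fast, find_linknode_slow):
    """Return a linknode that is an ancestor of srcnode.

    Both callables are hypothetical stand-ins: is_ancestor_fast for the
    cheap check, find_linknode_slow for the full ancestry traversal.
    """
    # Fast path: the stored linknode is usually already correct.
    if is_ancestor_fast(linknode, srcnode):
        return linknode
    # Slow path: only taken when the stored linknode is not an ancestor.
    return find_linknode_slow(srcnode)
```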

The fastpath can process about 300,000 commits per second, versus the 9,000
commits per second by the slow path.

This cuts 'hg log <file>' down from 5s to 2.5s in situations where the log spans
several hundred thousand commits.

Test Plan:
Ran the tests, and ran hg log <file> on a file with a lot of history
and verified the time gain.

Reviewers: pyd, #sourcecontrol, ttung, quark

Reviewed By: quark

Subscribers: quark

Differential Revision: https://phabricator.fb.com/D2908532

Signature: t1:2908532:1454718666:c4e63d73057572f035082943ef2e6fe0a49238c1
2016-02-08 14:40:07 -08:00
Simon Farnsworth
6cdf20e7ad remotefilelog: Make TortoiseHG work with remotefilelog
2016-02-05 14:53:45 +00:00
Durham Goode
16d12ec27c Remove limit on adjust linknode lookup
Previously we limited the changelog scan for old commits to the most recent
100,000, under the assumption that most changes would be within that time frame.
This turned out to not be a good assumption, so let's remove the limitation.
2016-01-27 15:56:36 -08:00
Augie Fackler
afca077cf9 fileserverclient: add option to provide file path to cacheprocess
For our uses of remotefilelog, life is significantly easier if we also
have the file path rather than just a hash of the file path. Hide this
behind a config knob so users can enable it or not as makes sense.
2016-01-27 13:22:22 -08:00
Durham Goode
4ee8e7278d changegroup: support new _packermap name
Upstream changed changegroup.packermap to be changegroup._packermap. So we need
to update accordingly.
2016-01-19 16:34:53 -08:00
Durham Goode
13c2a7823f Add alternative linkrev lookup logic
Summary:
The old linkrev lookup logic depended on the repo containing the latest commit
to have contained that particular version of the file. If the latest version had
been stripped however (like what happens in rebase --abort currently), the
linkrev function would attempt to scan history from the current rev,
trying to find the linkrev node.

If the filectx was not provided with a 'current node', the linkrev function
would return None. This caused certain places to break, like the Mercurial
merge conflict resolution logic (which constructs a filectx using only a
fileid, and no changeid, for the merge ancestor).

The fix is to allow scanning all the latest commits in the repo, looking for the
appropriate linkrev. This is pretty slow (1 second for every 14,000 commits
inspected), but is better than just returning None and crashing.

Test Plan:
Manually repro'd the issue by making a commit, amending it, stripping the
amended version and going back to the original, making two sibling commits on
top of the original, then rebasing sibling 1 onto sibling 2 (so that the
original commit that had the bad linknode data was the ancestor during the
merge). Previously this failed, now it passes. I'd write a test, but it's 11pm
and I'm tired and I need this in by early tomorrow morning to make the cut.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: trunkagent, rmcelroy

Differential Revision: https://phabricator.fb.com/D2826850

Signature: t1:2826850:1452680293:cb8c1f8c20ce13ad632925137dbdce6e994ab360
2016-01-13 11:25:26 -08:00
Laurent Charignon
707f243248 remotefilelog: make the wrapping of dispatch.run safer
Summary:
I somehow got a stacktrace with IPython on a non-remotefilelog repo that ran
this code and complained that fileservice didn't exist. I am not sure how it
happened but let's make the call safer to match the pattern used elsewhere in
the file.

Test Plan: No stacktrace seen after that, one line change

Reviewers: durham

Differential Revision: https://phabricator.fb.com/D2819402
2016-01-11 10:48:51 -08:00
Kostia Balytskyi
9500813607 remotefilelog: removing filelog check from verification process
Differential Revision: https://phabricator.fb.com/D2812664
2016-01-07 16:57:39 -08:00
Stanislau Hlebik
33b7e1013a remotefilelog: make .hg/store/data blobs read only
Summary:
Today, people running codemods or search/replace on their repos often accidentally corrupt their repos, and everyone ends up sad.
It's better to make the blobs read-only.

Test Plan: python run-tests.py

Reviewers: rmcelroy, #sourcecontrol, durham, ttung

Reviewed By: durham

Subscribers: mitrandir, quark, durham

Differential Revision: https://phabricator.fb.com/D2807369

Tasks: 9431187

Signature: t1:2807369:1452192329:b5ed6606cb66b1c830fc3d3fb5a81e6120387b38
2016-01-07 13:37:36 -08:00
Laurent Charignon
af9917b578 remotefilelog: fix compat with core on builddeltaheader
2015-12-30 13:33:47 -08:00
Laurent Charignon
963dc28d83 compat: fix _verify wrapper
Summary:
In 4fb35d8c2105 in core @durham removed _verify and replaced it with
verify, this patch makes remotefilelog compatible with those changes.

Test Plan: The tests are still failing afterwards, but they don't fail on this anymore

Reviewers: ericsumner

Subscribers: durham

Differential Revision: https://phabricator.fb.com/D2791847
2015-12-28 14:58:21 -08:00
Durham Goode
cb448f683b Stop writing backup local data blobs
Summary:
Historically we would move the old backup data blob to <name>+<int> so we had a
record of all the old data blobs we could search through for good commit
histories.

Since we no longer require that the data blobs have perfect commit histories,
these extra blobs just take up space.

This change makes us only store one old version (for debugging and recovery
purposes), which should save space on clients.

Also switched to atomic rename writes while we're at it.

Test Plan: Ran the tests

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D2770675
2015-12-17 13:02:29 -08:00
Durham Goode
c59623483f Limit checkunknown fetching to just what's in the sparse checkout
The newly added checkunknown prefetching apparently gets handed the full list of
files that are not present on disk right now, which includes all the files
outside of the sparse checkout. So we need to filter those out here.
2015-12-16 12:59:44 -08:00
Durham Goode
b3b4ddc20b Prefetch before addremove check
Summary:
When running addremove, it needs to see the contents of the removed files so it
can determine if they are a rename. So we need to add bulk prefetching in this
situation.

Test Plan: Added a test

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: dcapra

Differential Revision: https://phabricator.fb.com/D2756979

Signature: t1:2756979:1450132279:668b8b160d792cad1ac37e2069716e20ea304f57
2015-12-14 14:44:11 -08:00
Durham Goode
faccfe65d4 Add prefetching to checklookup
Summary:
During hg status Mercurial sometimes needs to look at the size of contents of
the file and compare it to what's in history, which requires the file blob.

This patch causes those files to be batch downloaded before they are compared.

There was a previous attempt at this (see the deleted code), but it only wrapped
the dirstate once at the beginning, so it was lost if the dirstate object was
replaced at any point.

Test Plan: Added a test to verify unknown files require only one fetch.

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Subscribers: dcapra

Differential Revision: https://phabricator.fb.com/D2756768

Signature: t1:2756768:1450130997:7c7101efe66c998e3182dfbd848aa6b1a57d509f
2015-12-14 14:44:08 -08:00
Durham Goode
4a5ae177bb Add prefetching for checkunknownfiles
Summary:
When doing an update, Mercurial checks if unknown files on disk match
what's in memory, otherwise it stops the checkout so it doesn't cause data loss.

We need to batch fetch the necessary files from the remotefilelog server for
this operation.

Test Plan: Added a test

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: dcapra

Differential Revision: https://phabricator.fb.com/D2756837

Signature: t1:2756837:1450132288:bc0530a07ea40aaeb2af1a93e4da82778cc11369
2015-12-14 14:49:34 -08:00
Durham Goode
b1c0840594 Remove unnecessary fallbackpath arg from getfiles
This wasn't used so we can clean it up.
2015-12-11 11:20:24 -08:00
Durham Goode
20102e4f2b Reuse ssh connection across miss fetches
Summary:
Previously we recreated the ssh connection for each prefetch. In the case where
we were fetching files one by one (like when we forgot to batch request files),
it results in a 1+ second overhead for each fetch.

This change makes us hold onto the ssh connection and simply issue new requests
along the same connection.
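
A sketch of the reuse pattern, assuming only that a connection object has a
`close()` method. `ConnectionCache` and its `connect` callable are hypothetical
and do not reflect the actual fileserverclient code.

```python
class ConnectionCache(object):
    """Hypothetical helper that opens a connection lazily and reuses it."""

    def __init__(self, connect):
        self._connect = connect  # callable that opens a new connection
        self._conn = None

    def get(self):
        # Open the connection only on first use; later calls reuse it.
        if self._conn is None:
            self._conn = self._connect()
        return self._conn

    def close(self):
        # Drop the cached connection so the next get() reconnects.
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

With this shape, fetching files one by one pays the connection setup cost once
instead of once per fetch.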

Test Plan:
Some of the tests execute this code path (I know because I saw them
fail when I had bugs)

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.fb.com/D2744688
2015-12-11 11:18:51 -08:00
Martin von Zweigbergk
1c64f784ed make changegroup.addchangegroupfiles() overriding more flexible
The method gained a parameter in hg revision 43d86cd9dae2
(changegroup: note during bundle apply if the repo was empty,
2015-12-02).
2015-12-10 17:25:14 -08:00
Martin von Zweigbergk
7251d9b51b repo: replace repo.parents() by repo[None].parents()
repo.parents() was removed in hg revision d5d613de0f44 (commands:
inline definition of localrepo.parents() and drop the method (API),
2015-11-11).
2015-12-10 17:25:14 -08:00
Martin von Zweigbergk
8f7ee3c1b1 replace localrepo.clone() by exchange.pull()
localrepo.clone() was removed in hg revision 9996a5eb7344 (localrepo:
remove clone method by hoisting into hg.py, 2015-11-11).

Instead of localrepo.clone(), we now use exchange.pull(). However,
that method was already overridden in onetimeclientsetup(), which is
called from our new overriding of exchange.pull(). Since it should be
done first, we move that overriding from onetimeclientsetup() to
uisetup().
2015-12-10 17:25:14 -08:00