Commit Graph

43 Commits

Author SHA1 Message Date
Mihails Smolins
bbd44be472 remotefilelog: garbage collection on repack
Summary:
Implemented initial version of a garbage collector for repack. Garbage
collection is only performed if gcrepack flag is set in config. Currently
data that is old and not in keepset is garabage collected. The age of data at
which point it will be garbage collected can be specified in config as well.

Test Plan: Added a test case

Reviewers: durham

Reviewed By: durham

Subscribers: rmcelroy, medson, mjpieters, #mercurial

Differential Revision: https://phabricator.intern.facebook.com/D5548194

Tasks: 19727521

Signature: t1:5548194:1501882489:cc775ec95aa6fafc462a46f99d790f231139ce27
2017-08-07 11:30:53 -07:00
Durham Goode
96ea7cd405 histpack: add per-node index
Summary:
Previously looking up a particular node in a histpack required a bisect to find
the file section, then a linear scan to find the particular node.  If you needed
to look up the latest 3000 nodes one by one, that involved 3000 linear scans,
many of which traversed the same nodes over and over.

This patch adds additional index at the end of the current histidx file. In a
future patch, we will change getnodeinfo() to use this index instead of the
linear scan logic.

Test Plan:
Ran the tests. I haven't actually verified that the data in these
indexes is correct. My next patch will add logic that reads these indexes and
will add tests around it. I won't land this until I've confirmed it's correct.

Reviewers: quark, #mercurial, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4983690

Signature: t1:4983690:1493798877:ae0802b4896b54bf066df9f684d94554855fd35a
2017-05-03 10:19:45 -07:00
Durham Goode
d1a927d335 packs: add entry count to pack index
Summary:
Previously, we used the length of the index file to determine the upper bounds
of the bisect. In a future patch we'll want to add more data to the end of the
index file, so we need to record how long the index portion of the index is.
This patch adds that information.

Test Plan: Ran the tests.

Reviewers: #mercurial, quark

Reviewed By: quark

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4983682

Signature: t1:4983682:1493693255:57ab9af2030847fedff05b6755113ba8ce0c933b
2017-05-03 10:19:45 -07:00
Jun Wu
491a1ce44c datapack: fix a freememory issue
Summary:
The issue was introduced by the getmeta refactoring. It didn't get caught by
tests because tests didn't trigger freememory.

Test Plan: The fix was verified working on fbsource

Reviewers: #mercurial, durham

Reviewed By: durham

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4978835

Signature: t1:4978835:1493689136:296d425cf5d08b807b898c0e8cd881c9207c6359
2017-05-01 19:09:06 -07:00
Jun Wu
4a613bcaef datapack: update the format about metadata
Summary:
This diff makes 2 changes to v1 packfile metadata:

1. Move `key` in a metadata entry to before `size`.

```
old: [entry-size: 2 byte] [key: 1 byte] [data: var length]
new: [key: 1 byte] [data-size: 2 byte] [data: var length]
```

Previously `entry-size == 0` does not make sense.

2. Use binary to represent sizes, instead of ASCII.

Related utility methods are cleaned up a bit so it's harder to make mistakes.

Test Plan: Updated existing tests

Reviewers: #mercurial, durham

Reviewed By: durham

Subscribers: durham, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4983189

Signature: t1:4983189:1493689852:22d544d73ed63fac83f849786de035af304161ce
2017-05-01 19:03:25 -07:00
Jun Wu
9a8545ebc1 cdatapack: implement getmeta for C module
Summary:
This diff implements getmeta in C and enables related tests.

Now all content stores support `getmeta`.

Test Plan: Run existing tests.

Reviewers: #mercurial, durham

Reviewed By: durham

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4960926

Signature: t1:4960926:1493611048:55095c32927fac74e698f21d47173cb8a7523fb6
2017-05-01 13:29:19 -07:00
Jun Wu
4240bd017e remotefilelog: let content stores support metadata
Summary:
This diffs add a `getmeta` method to all content stores. The cdatapack code is
modified to pass the tests, it needs further change to support `getmeta`.

The datapack format is bumped to v1 from v0. For v1, we append a `metadata`
dict at the end of each revision. The dict is currently used to store revlog
flags and rawsize of raw revlog fulltext. In the future we can put more data
like a second hash etc, without changing API or format again.

This diff focuses on correctness. A datapack caching layer to speed up
`getmeta` will be added later.

Tests are updated since we write new v1 packfile now and the format change
leads to different content and packfile names.

`Makefile`, `ls-l.py` are added to make tests easier to maintain.

Test Plan: Updated existing tests.

Reviewers: #mercurial, rmcelroy, durham

Reviewed By: durham

Subscribers: rmcelroy, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4903917

Signature: t1:4903917:1493255844:7ef5d487096cd2f78f2aaae672a68d49f33632ee
2017-04-26 19:50:36 -07:00
Jun Wu
8fcd86af16 remotefilelog: move constants to class to prepare index format change
Summary:
To be able to bump version and change formats, the related constants need to
be moved to individual classes. So a class (ex. datapack) can be subclassed to
handle different formats.

Test Plan: `arc unit`

Reviewers: #mercurial, durham

Reviewed By: durham

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4927284

Signature: t1:4927284:1493152641:e3274dd735d50baf193b7615dd314f4e6cf161f0
2017-04-26 13:34:15 -07:00
Kostia Balytskyi
0c2d706810 remotefilelog: wrap lz4 imports for compatibility
Differential Revision: https://phabricator.intern.facebook.com/D4707115
2017-03-17 14:02:26 -07:00
Augie Fackler
fde49a1a56 datapack: don't depend on demandimport when cstore isn't available
We've got a goofy test binary that doesn't use demandimport, and this
was tripping it up.
2017-03-14 12:44:43 -07:00
Durham Goode
e1b1c470e6 cstore: move py-cdatapack to be part of py-cstore
Summary:
As part of unifying our storage layer into a single library, let's move
py-cdatapack into the new cstore directory. Future patches will move
ctreemanifest and the upcoming datapackstore into here as well.

py-cdatapack.h required some reordering since it seems forward declarations work
a little differently between C and C++. There were no code changes though,
except one int->size_t fix.

Test Plan: Ran the tests

Reviewers: #mercurial, simonfar

Reviewed By: simonfar

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4581320

Signature: t1:4581320:1487788968:e8a34c7a03a16db282214c7dd476b749b92a1bfa
2017-02-23 14:03:02 -08:00
Adam Simpkins
ab81106b08 [remotefilelog] don't crash on invalid pack files
Summary:
We have run into some cases where users ended up with empty pack file in their
packs directory.  Just log a warning in this case and skip this pack file,
rather than letting the exception propagate up and crashing the command.

Test Plan:
Created empty 0000000000000000000000000000000000000000.histpack and
0000000000000000000000000000000000000000.histidx files in my repository's
hgcache directory, and confirmed that "hg log" now simply warns about them
instead of crashing.

I didn't really test the perftest.py or treemanifest_correctness.py extensions
much.  They seem to throw exceptions, and look like they have maybe gotten a
bit stale.  I fixed one minor typo, but didn't dig into the other exceptions
too much.

Reviewers: durham

Reviewed By: durham

Subscribers: net-systems-diffs@, yogeshwer, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4402516

Tasks: 15428659

Signature: t1:4402516:1484155254:96d2980efcec2d735257b08910e1ca437ef1dad6
2017-01-12 09:47:29 -08:00
Durham Goode
f43ba75915 remotefilelog: fix pyflakes and module import errors
Summary:
This fixes all the pyflaks and module errors for the main remotefilelog
code base.

Test Plan: ./run-tests.py test-check* test-remotefilelog*

Reviewers: #mercurial, quark

Reviewed By: quark

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4055537

Signature: t1:4055537:1477049663:ee904d311d17d3659e055e2c109c68c9023cfd1f
2016-10-21 11:02:09 -07:00
Tony Tung
f642d24f1c [cdatapack] create a fastdatapack class
Summary:
fastdatapack is the same as datapack.  add selector in datapackstore to determine which datapack to create.

test-datapack-fast.t is the same as tset-datapack.t, except it enables fastdatapack

Test Plan: pass test-datapack.t test-datapack-fast.t

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3666932

Signature: t1:3666932:1470426499:45292064e2868caab152d9a5b788840c5f63e4e4
2016-08-05 14:35:29 -07:00
Tony Tung
641c5f4f01 [debugcommands] return the uncompressed delta length when iterating over a datapack
Summary:
When retrieving a delta chain, datapack.py uncompresses the delta chain data.  However, when iterating over the datapack, we get the compressed length.  THis is not desirable as the output is no longer consistent.  This diff peeks into the lz4 header to get the uncompressed length when iterating.

Depends on D3627119

Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:e2ef218> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b --node f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e

source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node                                      Delta Base                                Delta SHA1                                Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e  0000000000000000000000000000000000000000  f32b366a6c44430df6526133f82f9638426ba9c5  37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:e2ef218> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b | tail -n 4

source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node          Delta Base    Delta Length
f2e53f83c5dc  000000000000  37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:e2ef218>
```

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3636945

Signature: t1:3636945:1469811243:b21d90d9599244ed4600c5336818b9a18eacf3ff
2016-08-01 14:16:29 -07:00
Tony Tung
3da845968f [datapack] fix computation of the paged-in size
Summary:
It should include the filelen and the deltalen fields, which are
2 and 8 bytes.

Test Plan: visual.

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3627108

Signature: t1:3627108:1469742083:ffb59768906d9e5463065eec92e1c80cc8482884
2016-07-28 17:06:50 -07:00
Durham Goode
073f8e3d22 repack: unmap memory occasionally to reclaim space
Summary:
When running large repack operations, the resident size of the process
could become quite large, since we're scanning in entire pack files. Linux/OSX
have api calls for telling the kernel it's ok to release some of that memory,
but those apis are not exposed to python.

So instead, let's unmap and remap the mmap's once a certain amount of data has
been read. I also tried changing the mmap accessors to use the file oriented api
(mmap.read(), mmap.seek(), etc) so we could switch to actual file handles during
repack, but it had a drastic affect on normal performance (repack took 1 hour
instead of a few minutes).

Long term we should move all of this logic to c++ so we can use the more
powerful APIs.

Test Plan:
Did a full repack on a laptop and verified memory capped out at 2GB
instead of exceeding 5GB.

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.intern.facebook.com/D3545171
2016-07-12 11:46:48 -07:00
Durham Goode
77192943d4 repack: handle race condition with background repacks
Summary:
There was a race condition where if a repack is running and another hg process
launches, the new process will only see the original packs, and not any of the
new packs (even though the source blobs are being deleted from disk by the
repack).

The fix is to allow our pack store to refresh it's list of packs every so often.
In this particular implementation we do it at most every 100ms. A more robust
strategy would be to group key misses and only check for new packs at the end
once we have a list of all the misses, but this would require significant
refactoring to make everything grouped. This case should only ever happen during
repacks, so it should almost never occur more than once during a command, so the
100ms version is probably good enough.

Test Plan:
Ran `hg up && hg pull && sleep 0.2 && hg up master` in a loop with a
break point in the refresh code and caught it executing in a situation where the
background repack had removed the original sources and put them in a new pack.
Verified that it loaded the data from the new pack correctly.

Reviewers: #mercurial, ttung, lcharignon

Reviewed By: lcharignon

Subscribers: lcharignon

Differential Revision: https://phabricator.intern.facebook.com/D3524314

Signature: t1:3524314:1467907680:85be07ad953811000c468852eb0626f4d8b53a13
2016-07-07 15:59:06 -07:00
Durham Goode
a2646d8da9 packs: change LookupError to KeyError
We've unified on KeyError being the error thrown when the pack is missing the
desired filename+filehash, but there were a few old places still using
LookupError. This patch changes them to also be KeyError.

This fixes an issue where a repack could throw a LookupError when it only had a
partial history of a file. Now that we throw a KeyError, the exception is caught
and handled appropriately.
2016-05-26 02:07:11 -07:00
Durham Goode
304c6f5bd0 pack: move common pack logic into basepack
Summary:
This moves the common logic from datapack and historypack into a common
basepack. At the moment the only common logic is the constructor, which handles
version checking, fanout initialization, and mmap stuff.

Test Plan: Ran the tests

Reviewers: mitrandir, #mercurial, ttung, mjpieters

Reviewed By: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3306558

Signature: t1:3306558:1463474571:35d3d2e71849b8111e5455da2dd4810725a35523
2016-05-24 02:15:58 -07:00
Durham Goode
bb3a62a266 pack: move common pack store logic to basepackstore
Summary:
This takes the duplicate logic from datapackstore and historypackstore and moves
it into a common subclass.

Test Plan: Ran the tests

Reviewers: ttung, mitrandir, #mercurial, mjpieters

Reviewed By: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3306551

Signature: t1:3306551:1463474958:ddd36a43c2a3cbb34254c2b0153c0f3c5d431edb
2016-05-24 02:15:58 -07:00
Durham Goode
7cce219abb pack: move common logic out of mutabledatapack into base class
Summary:
mutabledatapack and mutablehistorypack share a lot of common code, especially
around the index and fanout table. Let's move much of the code to a common
mutablebasepack class and out of mutabledatapack. In the next patch I will make
mutablehistorypack a subclass of mutablebasepack and delete all the duplicate
logic.

Test Plan: Ran the tests

Reviewers: mitrandir, #mercurial, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: quark, rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3306542

Signature: t1:3306542:1463611860:16bc68416c9bbed87748a50f55a3bac7c618fdf1
2016-05-20 09:31:37 -07:00
Durham Goode
39fb6f6f14 packs: prevent creation of empty packs
Summary:
Previously creating a mutable pack then not adding anything to it, would result
in a 1 byte pack file. This patch fixes it so it doesn't produce any pack file.
A future patch found this bug and includes a test that executes this code path.

Test Plan: Future patch adds a test

Reviewers: mitrandir, #mercurial, ttung, rmcelroy

Reviewed By: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3306520

Signature: t1:3306520:1463602354:f47e9478ea686b0b86525b5979dec9ce56301b2a
2016-05-20 09:31:19 -07:00
Durham Goode
56411349f7 datapack: make fanout size dynamic
Summary:
Previously all fanout tables were 2^16 in length. For small packs, this resulted
in very sparse fanout tables that had to be linearly scanned to find the end of
the search bounds, which was slow.

With this patch, the data index now dynamically chooses what fanout table size
to use.  If the pack has over 2^16 / 10 entries, we use 2^16, otherwise we use
2^8. The reasoning is in the code.

The patch is a bit large because we had to take a bunch of constants and
duplicate them, and change their accessors to access them through member
variables.

Test Plan: Added a test

Reviewers: #mercurial, ttung, mitrandir

Reviewed By: mitrandir

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3279000

Signature: t1:3279000:1463170357:9746d19dde14743bac9a8c40cafbc618504c420f
2016-05-16 10:59:09 -07:00
Durham Goode
ad5352432e packs: change iterkeys() to __iter__
Summary:
In a future patch we'll add debug commmands to view the contents of packs. Let's
repurpose iterkeys to just be __iter__ and have it return most of the data about
each entry.

Test Plan: Ran the tests

Reviewers: #mercurial, ttung, mitrandir

Reviewed By: mitrandir

Subscribers: mitrandir, quark

Differential Revision: https://phabricator.intern.facebook.com/D3277225

Signature: t1:3277225:1463170015:c1acbc7f435f1cf8d07fea4f32bf742a22de5716
2016-05-16 10:59:09 -07:00
Durham Goode
e617c24532 datapack: add index marker for no delta base
Summary:
Previously, we assumed every delta in a pack had a complete chain in that pack,
so the index had no way to indicate a deltabase offset that didn't exist in the
pack.

Let's add a new marker to indicate that the delta base doesn't exist in this
pack.

Test Plan: Tested in my large repo repack scenario.  Added a test

Reviewers: #mercurial, ttung, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3277222

Signature: t1:3277222:1463084346:1687cfa174e98c2cf3022de9e9c3808881f689cd
2016-05-16 10:59:09 -07:00
Durham Goode
c2cdcde2fb store: read recent packs first
Summary:
Since recent packs are likely to contain more recent data, let's put them at the
front of the pack list so they are checked first.

In a future diff I'll come back and refact the common code between datapack and
historypack into one base class.

Test Plan:
Ran the tests. Used the debugger to verify that the sort order was
correct.

Reviewers: #mercurial, ttung, mjpieters

Reviewed By: mjpieters

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3259823

Signature: t1:3259823:1462388395:8ee48a7b02c6abc079878e53c5b336675249cb91
2016-05-04 14:53:16 -07:00
Durham Goode
7da17af64f store: record what files were created during a repack
Summary:
Previously, if you ran repack twice in a row, it would actually delete your
packs, because the repack produced files with the same name as before, and the
cleanup then deleted them.

The fix is to have the stores record what files they produced in the ledger,
then future clean up work can avoid deleting anything that was created during
the repack.

Test Plan: Added a test

Reviewers: #mercurial, ttung, mitrandir

Reviewed By: mitrandir

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3255819

Signature: t1:3255819:1462393814:d32155b12535990f72fbe48de045eddbb6f7fab6
2016-05-04 14:53:10 -07:00
Durham Goode
30cc85653c store: make pack files read-only
Summary:
Since pack files should never change after they are created, let's create them
with read-only permissions. It turns out that the Mercurial vfs doesn't apply
the correct permissions to files created by mkstemp (and we have to use mkstemp
since we don't know the name of the file until after we've written all the data
to it), so we have to manually call the permission fixing code.

We also need to fix our mmap calls to be readonly now, otherwise we get a
runtime permission denied exception.

Test Plan: Added a test

Reviewers: #mercurial, ttung, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3255816

Signature: t1:3255816:1462321201:dff4fb4c9301d67a77043ecc1d96262bb5d6a54a
2016-05-04 14:53:07 -07:00
Durham Goode
1d4b4dbb36 store: switch mutable packs to use openers
Summary:
Instead of passing in a path and performing joins ourselves, let's use an
opener. This will help handle all the file permission edge cases.

Test Plan: Ran the tests

Reviewers: #mercurial, ttung, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3255165

Signature: t1:3255165:1462393836:38a28c850a0dc06838d9c17672d3dffd9903bbd7
2016-05-04 14:53:04 -07:00
Durham Goode
0671f3d76a store: add version header to index
Summary: Pretty straight forward

Test Plan: Ran the tests

Reviewers: lcharignon, rmcelroy, ttung, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3254892

Signature: t1:3254892:1462315520:888bb27ef121c08d463f9fd4cf9eeb3c42383a96
2016-05-04 14:53:01 -07:00
Durham Goode
d71eace818 store: refactor version number and size to constants
Summary:
A future patch will add a version number to the index file. Let's move the
version size, fanout start, and index start to constants so we can more easily
change them without changing the code.

Test Plan: Ran the tests

Reviewers: lcharignon, rmcelroy, ttung, mitrandir, quark

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3254876

Signature: t1:3254876:1462315858:63fe56e8cfcdbb0209861898ce0c45c7d7b33e35
2016-05-04 14:52:58 -07:00
Durham Goode
d86ae1e525 store: implement markledger and cleanup for datapack
Summary:
Implementing these two functions will allow datapack's to be repacked (either
into other formats, or by combining multiple packs into one).

A future patch will add a test.

Test Plan: Added a test in a future patch

Reviewers: lcharignon, ttung, rmcelroy, mitrandir, quark

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3251539

Signature: t1:3251539:1462393256:7caa09677fbcaaf57a47d7a833684883483c5b3a
2016-05-04 14:52:46 -07:00
Durham Goode
cfd60406ad store: add context manager to mutable pack classes
Summary:
Previously, if a repack failed, it would leave temporary pack files laying
around. By adding enter/exit functions to mutable packs, we can guarantee
cleanup happens.

Test Plan: Ran repack, verified that a failure did not leave tmp files

Reviewers: rmcelroy, quark, ttung, lcharignon, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3250201

Signature: t1:3250201:1462234552:7f20260a193ed1dd858bf6e9f489ac902d859218
2016-05-03 12:34:45 -07:00
Durham Goode
735aa964d5 store: move data repack logic to repacker
Summary:
We had a naive repack implementation in datapack.py. Let's move it to the repack
module and do the minor adjustments to use the new repackerledger apis.

Test Plan: Ran it in conjunction with future diffs that make use of this api.

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249585

Signature: t1:3249585:1462232504:a00aa65afca9562a2c1456cc4ab48c50d1ba5b68
2016-05-03 12:33:36 -07:00
Durham Goode
c902797dc9 store: implement markledger and cleanup on stores
Summary:
This implements the new markledger and cleanup apis on the existing
remotefilelog stores. These apis are used to tell the repacker what each store
has, and allows each store to cleanup if its data has been repacked.

Test Plan:
Ran repack in conjunction with the future diffs that make use of
these apis.

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249584

Signature: t1:3249584:1462226133:1e8faffc9f6bf8f7c94e6e79aee8865e3c41648c
2016-05-03 12:33:00 -07:00
Durham Goode
b049a0910a store: datapack fix perf issue
Summary:
Using range() allocates a full list, which is 2**16 entries in the fanout case.
Let's use xrange instead. This is a notable performance win when checking many
keys.

Also removed an unused variable and use index instead of self._index since this
is a hotpath.

Test Plan: Ran hg repack

Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3249563

Signature: t1:3249563:1462240834:c19d6cbf0b6237f15ca8d81e8da856752df0ec59
2016-05-03 12:30:44 -07:00
Durham Goode
22948ce7e1 checkcode: add check code test
Summary: Adds the same check code test that upstream Mercurial uses.

Test Plan:
Ran it, and fixed all the failures. I won't land this commit until
all the failure fixes are landed.

Reviewers: #sourcecontrol, ttung, rmcelroy, wez

Reviewed By: wez

Subscribers: quark, rmcelroy, wez

Differential Revision: https://phabricator.intern.facebook.com/D3221380

Signature: t1:3221380:1461802769:19f5bdc209c05edb442faa70ae572ce31e2fbc95
2016-04-28 10:18:47 -07:00
Durham Goode
29d3dda67e checkcode: fix various store files
Summary: Fix check code for various store related files

Test Plan: Ran the tests

Reviewers: #sourcecontrol, mitrandir, ttung

Reviewed By: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3222465

Signature: t1:3222465:1461701300:34560288be4dc921f0252d4ad8fdc9c8d9357e23
2016-04-27 16:49:33 -07:00
Durham Goode
f17f6cc093 store: add revisions to datapack in alphabetical order
Summary:
This forces the revisions in the datapack to be added in alphabetical order.
This makes the algorithm more deterministic, but otherwise has little effect.

Test Plan: Ran the tests, ran repack

Reviewers: #sourcecontrol, rmcelroy, ttung

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3219760

Signature: t1:3219760:1461687720:7be5fdc1419f8214c8c83074494b33214b3684ae
2016-04-27 16:49:18 -07:00
Durham Goode
43ed70b6f1 store: add datapack store that reads pack files from .hg/store/packs
Summary:
Now that we can read and write datapack files, let's add a store implementation that
can serve packed content. With this patch, it's technically possible for someone
to prefetch and repack large portions of history for long term storage with
remotefilelog.

My next set of commits (which haven't been written yet) will:
- add tests for all of this
- add an indexpack format for packing ancestor metadata (the datapack only packs
  revision content)

Test Plan:
Ran the tests. Also repacked a repo, deleted the old cache files, ran
hg up null && hg up master, and verified it checked out master with the
right files and without fetching blobs from the server.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205351

Signature: t1:3205351:1461751649:45a56b57d962a282aeef9478500a3b23495a0eb7
2016-04-27 16:49:15 -07:00
Durham Goode
56c83ea072 store: add a datapack class that can read datapacks
Summary:
The previous patch added logic to repack store contents and write it to a
datapack file. This patch adds a new store implementation that knows how to read
datapacks.

It's just a simple implementation without any parallelism. So there's room for
improvement.

Test Plan:
Ran the tests.  Also tested this in conjunction with the next patch
which actually reads from the data structure.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205342

Signature: t1:3205342:1461750967:84377517cb1f285d37694a3f503d60ae85bacb66
2016-04-27 16:49:12 -07:00
Durham Goode
510ac021f3 store: add a basic repack and datapack format
Summary:
This is an initial implementation of a repack algorithm that can read data from
an arbitrary store (in this case the remotefilelog content store), and repack it
into a datapack.

A datapack is two files:

- a .datapack file consisting of a series of deltas (a delta may be a full text if the delta base is the nullid)
- a .dataidx file consisting of delta information and an index into the deltas

See the code for documentation of the exact format.

Test Plan:
ran the tests

Ran `hg repack` in a large repo. Verified that a datapack and a dataidx file
were created in .hg/store/packs. The datapack used 148MB instead of the 439MB the
old remotefilelog storage used.

Reviewers: #sourcecontrol, ttung, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3205334

Signature: t1:3205334:1461751366:ee4bf6a580ffb667071a8046fda6f0858b7f25ae
2016-04-27 16:49:09 -07:00