28 Commits

Author SHA1 Message Date
Durham Goode
70ce116529 treemanifest: add history data to tree repacks
Summary:
Previously, tree repacks did not take tree history into account. They would just
look at the delta base and, if the base existed, reuse the delta.
This would A) result in very long chains, and B) result in chains where the full
text was the oldest version, instead of the newest (recent full texts means
faster access to recent versions).

This patch threads tree history into the repacker, which already knows how to
use history for repacks.
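
As an illustration only (not the repacker's actual code), the sketch below shows
the chain shape this aims for: the newest version is the full text and each
older version is a delta against the next-newer one. The function names and the
list-based history input are hypothetical.

```python
def plan_reverse_deltas(history):
    """Plan a reverse delta chain for one tree.

    history: list of node ids ordered oldest -> newest (hypothetical input).
    Returns (node, deltabase) pairs; deltabase is None for a full text.
    """
    plan = []
    newer = None
    for node in reversed(history):
        # The newest node has no newer neighbour, so it becomes the full text.
        plan.append((node, newer))
        newer = node
    return plan


def chain_length(plan, node):
    """Count how many deltas must be applied to reconstruct `node`."""
    bases = dict(plan)
    length = 0
    while bases[node] is not None:
        node = bases[node]
        length += 1
    return length
```

With this layout the most recent version needs no delta application at all, and
the cost grows only as you read further back in history.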

Test Plan:
Updated the tests, and inspected the new test results to ensure tree
entries that were not deltas before the repack became reverse deltas during the
repack.

Reviewers: #mercurial, simonfar

Reviewed By: simonfar

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4647359

Signature: t1:4647359:1488882710:dba72cf488766ce827b7641735164fa0efc9a303
2017-03-07 11:15:26 -08:00
Durham Goode
3f65bd2532 treemanifest: don't build pack files that depend on other pack files
Summary:
Previously, when treemanifest would create packs of trees during pull, we
allowed trees to be delta'd against trees in other packs. This resulted in
smaller packs, but if the other pack disappeared for some reason (since it's a
cache), the chain broke.

This patch ensures that the first version of every tree added to a pack is a
full text.

This temporarily makes repacks worse, since the repacker doesn't know about
history to produce deltas when combining packs. The next patch adds history
awareness which improves the repack deltafication.
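
A minimal sketch of the rule, assuming a single tree whose versions arrive
parents first. The helper names are hypothetical and this is not the extension's
real code: a version is stored as a delta only when its p1 is already in the
pack being built.

```python
def choose_deltabase(p1, nodes_in_pack):
    """Return p1 if it is already in this pack, else None (store a full text)."""
    if p1 is not None and p1 in nodes_in_pack:
        return p1
    return None


def build_pack(entries):
    """entries: iterable of (node, p1) pairs in topological order.

    Returns (node, deltabase) pairs for the new pack. No deltabase ever
    points outside the pack, so the pack never depends on another pack.
    """
    in_pack = set()
    result = []
    for node, p1 in entries:
        result.append((node, choose_deltabase(p1, in_pack)))
        in_pack.add(node)
    return result
```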

Test Plan:
Updated the tests, and inspected the new test results to ensure that
all packs only had deltas within the pack.

Reviewers: #mercurial, simonfar

Reviewed By: simonfar

Subscribers: simonfar, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4647348

Signature: t1:4647348:1488882214:e850622a853a534fc60caeef604c88c30740c60d
2017-03-07 11:15:25 -08:00
Durham Goode
5bc368c71d treemanifest: move tree delta logic up to python layer
Summary:
Previously the treemanifest code itself would create the text deltas when
writing a tree out. This meant we couldn't make the delta decision based on
other data, like if the p1 commit was in the same pack file.

This patch removes treemanifest.write() and moves all calls over to
treemanifest.finalize() which gives the python/pack layer control over delta
choices. A future patch will use this to ensure tree packs always contain
complete delta chains.

Test Plan: All tests pass

Reviewers: #mercurial, simonfar

Reviewed By: simonfar

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4645942

Signature: t1:4645942:1488880851:d0c8c902e7e849072a53344630a9184b6d8e1e7f
2017-03-07 11:15:25 -08:00
Durham Goode
943afede25 treemanifest: add support for creating history packs
Summary:
Previously the treemanifest auto-tree-creation logic only produced data packs
containing the actual contents of the tree blobs. This lost history information
which is important for our ability to efficiently repack the data files.

This patch creates history packs during pull as well. A future patch will also
create history packs for the local tree blob store.

Test Plan: Updated the tests to cover this

Reviewers: #mercurial, simonfar

Reviewed By: simonfar

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4638865

Signature: t1:4638865:1488449992:48b60961b50b90b6d0e75a64af1f36fb29944e7a
2017-03-07 11:15:25 -08:00
Durham Goode
940796814d treemanifest: add option for using native store
Summary:
Adds a treemanifest.usecunionstore config flag for enabling and disabling use of
the native code uniondatapackstore.

Since we haven't implemented the repack APIs on the native datapack stores, we
currently have to force repack to use the old python implementations. Instead of
trying to expose just the appropriate APIs through the python interface, I think
we'll rewrite all of repack to be in C++ at a future time, since we can take
advantage of parallelism, etc.

Test Plan:
Updated test-treemanifest.t to use the c datapackstore. Also run all
the tests with --extra-config-opt=treemanifest.usecunionstore=True.

These tests caught a missing null check in the C++ code as well.

Reviewers: #mercurial, simonfar

Reviewed By: simonfar

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4609795

Signature: t1:4609795:1488365341:203362db5f470b613c4d6484686cd32c3fa8458f
2017-03-01 16:55:19 -08:00
Durham Goode
2cd1eeb08e ctreemanifest: move treemanifest into cstore
Summary:
As part of unifying our native store data structures into a single library,
let's move the treemanifest (including the python extension) into py-cstore.

Test Plan:
Built and ran the tests. Verified there was no ctreemanifest.so
dependency in the built cstore.so by using 'ldd cstore.so' on Linux and 'otool
-L cstore.so' on OSX.

Reviewers: #mercurial, simonfar

Reviewed By: simonfar

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4602484

Signature: t1:4602484:1487842683:964cbb43b7cb20d0db699ef691fe7fcf6bccf2e8
2017-02-23 14:03:03 -08:00
Durham Goode
760ab3473c treemanifest: error out if invalid configuration
Summary:
treemanifest requires fastmanifest, and fastmanifest.usetree requires
treemanifest. Let's make these dependencies explicit in the code and error out
if they are incorrect.
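
The check could look roughly like the sketch below. It works on a plain set of
enabled extension names and a boolean rather than Mercurial's real ui/config
objects, and ConfigError and check_config are hypothetical names.

```python
class ConfigError(Exception):
    """Raised when the extension configuration is inconsistent (hypothetical)."""


def check_config(enabled_extensions, usetree):
    """Validate the two dependencies described in the summary.

    enabled_extensions: set of extension names that are turned on.
    usetree: value of the fastmanifest.usetree setting.
    """
    if "treemanifest" in enabled_extensions and "fastmanifest" not in enabled_extensions:
        raise ConfigError("treemanifest requires the fastmanifest extension")
    if usetree and "treemanifest" not in enabled_extensions:
        raise ConfigError("fastmanifest.usetree requires the treemanifest extension")
```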

Test Plan: Added a test

Reviewers: #mercurial, wez

Reviewed By: wez

Subscribers: wez, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4417512

Signature: t1:4417512:1484347447:7e18340813fac0b298aa51a7cc2f89fc6953680f
2017-01-13 14:58:20 -08:00
Durham Goode
d3c6def7b8 remotefilelog: move pack file permissions to be in mutablebasepack
Summary:
Treemanifest had a bug where the pack files it created were 400 instead of 444.
This meant people sharing a cache on the same machine couldn't access them. In
most of the remotefilelog code we had set opener.createmode before creating the
pack but we forgot to for treemanifest.

This patch moves the opener creation and createmode setting into the mutable
pack class so we can't screw this up again.
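
To illustrate the difference between the two modes, here is a small standalone
sketch using only the Python standard library. It does not use Mercurial's
opener, and write_pack and file_mode are hypothetical helpers: the point is that
the class writing the pack sets the mode itself, so callers cannot forget it.

```python
import os
import stat
import tempfile


def write_pack(directory, name, data, mode=0o444):
    """Write a pack file and make it read-only for owner, group and others.

    0o400 would be readable by the owner only, which is what broke users
    sharing a cache on the same machine.
    """
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    os.chmod(path, mode)
    return path


def file_mode(path):
    """Return just the permission bits of a file."""
    return stat.S_IMODE(os.stat(path).st_mode)
```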

Test Plan:
Tests that previously wrote to the packs now failed and had to be
updated to chmod the packs before writing to them.

Reviewers: #mercurial, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4411580

Tasks: 15469140

Signature: t1:4411580:1484321293:9aa78254677548a6dc2270c58cee0ec6f57dd089
2017-01-13 09:42:25 -08:00
Adam Simpkins
ab81106b08 [remotefilelog] don't crash on invalid pack files
Summary:
We have run into some cases where users ended up with empty pack files in their
packs directory.  Just log a warning in this case and skip this pack file,
rather than letting the exception propagate up and crashing the command.
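
The behaviour amounts to something like the following sketch. The loader and
the warning callback are passed in, and InvalidPackError and load_packs are
hypothetical names rather than the real remotefilelog API.

```python
class InvalidPackError(Exception):
    """Stand-in for whatever error an unreadable pack raises (hypothetical)."""


def load_packs(paths, open_pack, warn):
    """Open every pack that can be opened and warn about the rest.

    paths: iterable of pack file paths.
    open_pack: callable that returns a pack object or raises InvalidPackError.
    warn: callable that takes a message string.
    """
    packs = []
    for path in paths:
        try:
            packs.append(open_pack(path))
        except InvalidPackError as ex:
            # Skip the bad pack instead of aborting the whole command.
            warn("unable to load pack %s: %s" % (path, ex))
    return packs
```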

Test Plan:
Created empty 0000000000000000000000000000000000000000.histpack and
0000000000000000000000000000000000000000.histidx files in my repository's
hgcache directory, and confirmed that "hg log" now simply warns about them
instead of crashing.

I didn't really test the perftest.py or treemanifest_correctness.py extensions
much.  They seem to throw exceptions, and look like they have maybe gotten a
bit stale.  I fixed one minor typo, but didn't dig into the other exceptions
too much.

Reviewers: durham

Reviewed By: durham

Subscribers: net-systems-diffs@, yogeshwer, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4402516

Tasks: 15428659

Signature: t1:4402516:1484155254:96d2980efcec2d735257b08910e1ca437ef1dad6
2017-01-12 09:47:29 -08:00
Stanislau Hlebik
e03071ee64 treemanifest: fix unit tests
Summary:
Commit 676596f945ea2166820ef92e692ef7fe6a6247f0 added comments with lines longer
than 80 characters, and commit aec81a9a80d22989bbdc8c74c1dfec9dcbbe6866 changed a
default config value.

Test Plan: arc unit

Reviewers: #sourcecontrol

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4377188
2017-01-03 05:09:06 -08:00
Durham Goode
8041fb2a9c treemanifest: debug verification refactor
2016-12-31 18:22:38 -08:00
Durham Goode
5f0176753a treemanifest: fix auto-tree creation for merges
Previously we were relying on mfrevlog.revdiff() to produce the delta for us.
This only showed us what files were added/modified, as compared to p1, and we
had to use a heuristic to know what files were deleted (by looking at the list
of files in the commit metadata). Merge commits have a different criteria for
what is in the commit metadata (it only contains entries where the file is
different from both parents), so we can't use it for the same heuristic. So
let's fall back to a normal manifest diff for merge commits, since they are
rare.
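
For illustration, a normal manifest diff can be sketched with plain dicts
mapping path to file node. This is a simplification, not Mercurial's manifest
API, and manifest_diff is a hypothetical name. Deletions fall out of the
comparison directly, with no need for the commit metadata heuristic.

```python
def manifest_diff(parent, child):
    """Compare two flat manifests given as {path: filenode} dicts.

    Returns (changed, deleted): `changed` maps added or modified paths to
    their new node, `deleted` is a sorted list of paths only in the parent.
    """
    changed = {path: node for path, node in child.items()
               if parent.get(path) != node}
    deleted = sorted(path for path in parent if path not in child)
    return changed, deleted
```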

Adds a test for verifying that conversion of merge commits into a tree works.
2016-12-31 18:22:38 -08:00
Durham Goode
bab8d2e0a5 treemanifest: remove allfiles optimization
Commit 24515b72d5 attempted to optimize writes by checking if the file in the
delta was also in the list of files in the commit metadata. This doesn't work for merge
commits since the only files in the commit metadata are the ones that differ from
both parents (so set(changes in metadata) != set(changes in node-diff-against-p1)). This
caused the verify code to catch the issue. The fix is to just remove the
allfiles optimization.
2016-12-12 18:59:45 -08:00
Durham Goode
51ae957ffc treemanifest: add simple test for tree repack
Summary:
This adds a simple test that verifies hg repack will pack two tree manifest
packs into one.

It caught a bug where creating a treemanifest for a commit with a null parent
produced incorrect output because it constructed an empty tree and tried to use
its node as the parent of the delta, when there should not have been any delta
in the first place. This is fixed by this diff as well.

Test Plan: Ran the new test

Reviewers: #mercurial, dsyang, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4261591

Signature: t1:4261591:1480705822:ef21fb8cebd8b89f92f58f11bb1dab59bf97664d
2016-12-02 14:37:55 -08:00
Durham Goode
fe149d0caa treemanifest: set repo.name
Summary:
Since treemanifest uses the same storage locations as remotefilelog, it needs
access to the repo name as well. If a given repo has treemanifest enabled but is
not a remotefilelog repo, it won't have the repo.name member already. So let's
add it ourselves in treemanifest.

Eventually we should probably refactor this out to be a more global concept of
repo name.

Test Plan: A future patch adds a test that caught this

Reviewers: #mercurial, dsyang, rmcelroy

Reviewed By: rmcelroy

Subscribers: rmcelroy, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4261581

Signature: t1:4261581:1480705457:8fb4b86ce8abeed62cc7c8f787868359c3cf8abc
2016-12-02 14:37:51 -08:00
Durham Goode
046fb2ca08 repack: enable repacking of local manifest stores
Summary:
Previously, hg repack would only repack the shared manifest cache store. This patch
makes it repack the local manifest store as well.

Test Plan:
Made a local commit in my test repo with treemanifests enabled.
Rebased the commit to master so now there were two local tree packs. Ran hg
repack and verified they were combined into one pack in
.hg/store/packs/manifests

I'll add a test later today

Reviewers: #mercurial, mjpieters

Reviewed By: mjpieters

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4260580

Signature: t1:4260580:1480696151:8f989e299dda50281ca63489e870202eb195d714
2016-12-02 14:37:45 -08:00
Durham Goode
a0dc16174d treemanifest: write deltas for trees
Summary:
Previously, when we wrote each tree entry into a pack file, it wasn't delta'd in
any way. This patch makes it store the delta against p1 in the pack file.

Testing in a large repo shows this reduces tree pack size by about 22x.

Test Plan:
Ran the tests. Did a pull in a large repo and saw the pack file was
22x smaller than before (and still usable).

Reviewers: #mercurial

Differential Revision: https://phabricator.intern.facebook.com/D4202088
2016-11-29 15:37:58 -08:00
Durham Goode
3ad41b8578 treemanifest: limit autocreatetrees to commits on master
Summary:
Creating a tree for a commit whose parent is not already a tree is expensive.
Let's optimize the autocreatetrees option (which converts manifests to trees
during pull) to only create trees if A) the parent is already a tree, or B) the
parent is master. This way we only pay the expensive part once. It also means
that as new branches fork off master, they will be trees too, since all commits
in the new branch will meet criteria (A).
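
The two criteria can be sketched as follows, assuming commits arrive parents
first as (node, p1) pairs. The function name and input shape are hypothetical
and this is not the extension's real code.

```python
def select_commits_for_trees(commits, master, existing_trees):
    """Pick the commits that should get a tree during pull.

    commits: list of (node, p1) pairs in topological order.
    master: node id of the master commit.
    existing_trees: nodes that already have a tree.
    """
    trees = set(existing_trees)
    created = []
    for node, p1 in commits:
        # (A) the parent is already a tree, or (B) the parent is master.
        if p1 in trees or p1 == master:
            trees.add(node)
            created.append(node)
    return created
```

Once the first commit on top of master is converted, every descendant satisfies
(A), so only that first conversion pays the expensive cost.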

Test Plan:
Ran hg pull in a large repo over a large pull with 40 different
branches and verified it only paused for the initial master commit.

Reviewers: #mercurial, simonfar

Reviewed By: simonfar

Subscribers: simonfar, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4129010

Signature: t1:4129010:1478260239:1698ada4e3c6a38ab77a94317e75daee4812276f
2016-11-16 13:51:48 -08:00
Durham Goode
b1109deca8 treemanifest: update to work with manifestlog
Summary:
Upstream has refactored the manifest class into several classes, so we need to
update treemanifest to work with the new structure. Notably, the factory add
function previously relied on the ability for the revlog class to create a new
manifestdict (via manifest.manifest.read()). Since this isn't possible anymore,
we have to construct the hybridmanifest ourselves and provide an appropriate
loadflat function to get the flat manifest if necessary.

Test Plan: Ran the tests

Reviewers: #mercurial, rmcelroy

Reviewed By: rmcelroy

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4180891

Signature: t1:4180891:1479285991:82bc546a1eb682d3cfd8b4724bda575410405d0f
2016-11-16 12:11:15 -08:00
Durham Goode
e63b8ce924 treemanifest: write trees during local commits
Summary:
This makes local commits get written to trees (as well as flat manifest) when a
commit happens where the parent commit has a tree manifest already. During a
transaction where multiple trees are written (like when rebasing multiple
nodes), we reuse the same pack file for all the trees produced by tying into
the transaction abort and close hooks.

Test Plan:
Ran the tests. Ran hg commit with the extension enabled. A future
patch will add an integration test for the treemanifest extension.

Reviewers: #mercurial, quark

Reviewed By: quark

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4055851

Signature: t1:4055851:1477059659:91b1c2f93ef986e910cea752ebf2466cb20ac921
2016-10-21 11:02:26 -07:00
Durham Goode
3707e186b6 treemanifest: add local pack store to unionstore
Summary:
In a future patch we will start writing user local tree data into a local
directory. This patch adds the local store to the union store so the contents
will be accessible once we start writing it.

It also renames manifest to manifests for consistency with the other store names
(like 'packs').

Test Plan: Ran the tests

Reviewers: #mercurial, quark

Reviewed By: quark

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4055827

Signature: t1:4055827:1477059457:0d5b0d999b47d88c73f5ab2721d8d27deacc01bc
2016-10-21 11:02:22 -07:00
Durham Goode
4235752853 remotefilelog: rename getpackpath to getcachepackpath
Summary:
In a future diff we will be introducing packs into .hg/store, so we need to
differentiate between cache packs and local packs. This patch renames
getpackpath to getcachepackpath. A future diff will add getlocalpackpath.

This exposed a pyflakes error, so we fix that too.

Test Plan: Ran the tests

Reviewers: #mercurial, quark

Reviewed By: quark

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D4055815

Signature: t1:4055815:1477059415:e0221557bbeec6701820c826f00390d3a71cd2d3
2016-10-21 11:02:20 -07:00
Durham Goode
4f326d2c38 treemanifest: don't call set for files that didn't actually change
We compute which files have changed by looking at the manifest deltas. It's
possible that a manifest delta may contain an entry that deletes a file, then
replaces that file with the exact same content. This results in an unnecessary
set operation. Let's catch that earlier and avoid the set.
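
A rough sketch of the check, using a dict for the current tree contents and a
list of (path, value) operations where None means deletion. These shapes and the
name effective_changes are assumptions for illustration, not the real delta
format.

```python
def effective_changes(current, ops):
    """Collapse delta operations and drop those that change nothing.

    current: {path: filenode} for the tree before the delta.
    ops: list of (path, filenode or None) applied in order; None deletes.
    Returns {path: filenode or None} containing only real changes.
    """
    final = {}
    for path, value in ops:
        # Later operations on the same path override earlier ones.
        final[path] = value
    return {path: value for path, value in final.items()
            if current.get(path) != value}
```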
2016-10-14 16:01:12 -07:00
Durham Goode
e2c52435b0 treemanifest: optimize which trees we keep in memory during auto create
Previously, autocreate would keep every tree we created in memory. This resulted
in a memory explosion such that the process was eating tens of GB of memory.
Let's optimize this to keep track of how many times each tree will be needed,
then throw the tree away once it is no longer needed.
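
The bookkeeping can be sketched as below, assuming commits are (node, p1) pairs
in topological order and build_tree is a caller-supplied function. Both are
hypothetical, and the function returns the peak number of live trees purely so
the effect is observable.

```python
from collections import Counter


def convert(commits, build_tree):
    """Build a tree per commit while keeping only trees still needed.

    commits: list of (node, p1) pairs, parents before children.
    build_tree: callable (node, parent_tree_or_None) -> tree.
    Returns the largest number of trees held in memory at once.
    """
    # How many later commits will need each node's tree as their parent.
    remaining = Counter(p1 for _node, p1 in commits)
    live = {}
    peak = 0
    for node, p1 in commits:
        tree = build_tree(node, live.get(p1))
        remaining[p1] -= 1
        if remaining[p1] == 0:
            live.pop(p1, None)  # nobody else needs the parent tree
        if remaining[node] > 0:
            live[node] = tree   # keep only if a child will use it
        peak = max(peak, len(live))
    return peak
```

For a linear history this keeps a single tree alive at a time, regardless of
how many commits are converted.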

Testing this via 'hg pull' showed that memory stayed constant even when pulling
and converting thousands of commits.
2016-09-21 13:51:39 -07:00
Durham Goode
599de92eab treemanifest: move verification behind an option
This moves our auto create tree verification to be behind an option. It defaults
to True for now.
2016-09-21 13:57:06 -07:00
Durham Goode
df86b3486d treemanifest: auto create manifest pack directory
The pack/manifests directory wasn't being automatically created, so let's make
it so.
2016-09-21 13:51:39 -07:00
Durham Goode
4da14809c2 tree: add option to automatically create trees during hg pull
Summary:
This adds a treemanifest.autocreatetrees config option. When it is set, hg pull
will automatically create a pack file that contains tree contents.

We'll need to wait until the setitem and dirtybit logic is landed before we land
this, since that's required for us to test the full iteration logic here.

Test Plan:
Ran hg pull and verified a datapack was produced with the correct
manifest contents. The contents currently only contained the root manifest,
since we don't have the setitem + dirtybit logic necessary to actually modify
tree yet.

Reviewers: #fastmanifest

Differential Revision: https://phabricator.intern.facebook.com/D3838836
2016-09-19 16:30:17 -07:00
Durham Goode
3b7beae747 ctree: add ctreemanifest hg extension
Summary:
Adds the initial extension that sets up the ctreemanifest. It currently relies
on the fastmanifest extension to hook into all the manifest APIs to construct
ctreemanifests.

Test Plan:
In a future patch, I was able to run 'hg manifest' on a commit and
have it return the manifest contents by reading the treemanifest.

Reviewers: #fastmanifest, ttung

Reviewed By: ttung

Subscribers: ttung

Differential Revision: https://phabricator.intern.facebook.com/D3755327

Signature: t1:3755327:1472114482:0c5862cba68ed4db643d28c2fae01f33f5352970
2016-08-29 16:19:52 -07:00