Summary:
In a future diff we will be introducing packs into .hg/store, so we need to
differentiate between cache packs and local packs. This patch renames
getpackpath to getcachepackpath. A future diff will add getlocalpackpath.
This exposed a pyflakes error, so we fix that too.
Test Plan: Ran the tests
Reviewers: #mercurial, quark
Reviewed By: quark
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4055815
Signature: t1:4055815:1477059415:e0221557bbeec6701820c826f00390d3a71cd2d3
Summary:
This fixes all the pyflaks and module errors for the main remotefilelog
code base.
Test Plan: ./run-tests.py test-check* test-remotefilelog*
Reviewers: #mercurial, quark
Reviewed By: quark
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4055537
Signature: t1:4055537:1477049663:ee904d311d17d3659e055e2c109c68c9023cfd1f
Summary:
Adds the initial extension that sets up the ctreemanifest. It currently relies
on the fastmanifest extension to hook into all the manifest APIs to construct
ctreemanifests.
Test Plan:
In a future patch, I was able to run 'hg manifests' on a commit and
have it return the manifest contents by reading the treemanifest.
Reviewers: #fastmanifest, ttung
Reviewed By: ttung
Subscribers: ttung
Differential Revision: https://phabricator.intern.facebook.com/D3755327
Signature: t1:3755327:1472114482:0c5862cba68ed4db643d28c2fae01f33f5352970
Summary:
Previously, the history repack logic would stop traversing history for a given
filename once it encountered a rename. This isn't quite right, since the history
could eventually be traversed back to the original file, where we'd need to
continue processing. So now we check for when the copyfrom becomes the filename.
Also, if the copy source file and the copy target file have two nodes with the
same value, we would not process the one in the copy target (since it was marked
do not process). We fix this by explicitly checking if the node is one of the
known entries in the file being processed.
Test Plan: Added a test
Reviewers: #mercurial, ttung, mitrandir, rmcelroy
Reviewed By: mitrandir, rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3523215
Signature: t1:3523215:1467828169:bd487c8f296352c1a1b9355cb55f9001bd5e19a9
Summary:
The pack path logic did not use the correct unix group when
remotefilelog.cachegroup was specified. This fixes that.
Test Plan:
I manually tested it by deleting a pack dir and running repack. This
is hard to create an automated test for since the feature isn't really cross
platform, and we don't have a way to know what groups they have on their
machine.
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3400756
Tasks: 11584114
Signature: t1:3400756:1465342537:ed023f6dc830117df5e85e294a41486f072714c9
Summary:
Previously a bunch of different places accessed the cachepath through ui.config
directly. This is a problem because we need to resolve any environment variables
in the path, and some spots didn't do this. So let's unify all accesses through
a helper function that takes care of the environment variables.
Test Plan: Added a test
Reviewers: mitrandir, lcharignon, #sourcecontrol, ttung, simonfar
Reviewed By: simonfar
Subscribers: simonfar
Differential Revision: https://phabricator.intern.facebook.com/D3385583
Signature: t1:3385583:1464971813:5b9ee5ed3d6ff9f1a78cb9e0269e433844758c9d
Previously, background repacks would only repack pack files, which meant there
was no automated way to repack loose remotefilelog files without manually
running 'hg repack'. This allows incremental repacks to also pack the loose
files.
It also changes the config knob for background repacks, so we can enable pack
file usage without the server having to support it just yet.
Previousy, a union store required that it be able to compute the entire history
of the revision. This caused problems in repack, since it may only have a
partial history. Instead of throwing a KeyError and giving the repack algorithm
no history information at all, we add a config knob to let the repack logic
specify that it's ok with partial histories.
The next patch will do the same for contentstore.
Summary:
Previously we only had incremental repacking for data packs. This patch adds it
for history packs as well. The algorithm here is simpler, since the amount of
history data is generally much smaller than the amount of delta data.
The algorithm is basically: if there are 2 things bigger than 100MB, repack
them; else repack up to 100MB of smaller things.
The datapack hashes changed because having the history available during a repack
allows us to make different decisions about delta ordering, etc.
Test Plan: Updated the tsets
Reviewers: mitrandir, #mercurial, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3306535
Signature: t1:3306535:1463696613:f40ed10c9dfed40d7bc455582592a7aed108ec3a
Summary:
This runs the incremental background repacking logic after hg pull.
As part of adding tests, I also added a 'hg debugwaitonrepack' function that
will wait until any pending repack is done before returning, so the tests can
wait on repacks without so many sleeps.
Test Plan: Adds a test
Reviewers: mitrandir, #mercurial, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3306526
Signature: t1:3306526:1463696933:9e27daf0c08076468e8f365a3c372fa7d4f56bde
Summary:
This adds a --incremental flag to the hg repack command. This flag causes repack
to look at the distribution of pack files in the repo and performs the most
minimal repack to keep the repo in good shape. Currently it's only implemented
for datapacks.
The new remotefilelog.datagenerations config contains a list of the sizes for
the different generations of pack files. For instance:
[remotefilelog]
datagenerations=1GB
100MB
1MB
Designates 4 generations - packs over 1GB, packs over 100MB, packs over 1MB, and
implicitly packs undex 1MB. The incremental algorithm will try to keep each
generation to less than 3 pack files (prioritizing the larger generations
first). When performing a repack it will grab at least 2 packs, and will grab
more if the total pack size is less than 100MB (since repacking at that level is
pretty cheap).
I have no idea if this is a good algorithm. We'll how to see and iterate.
Test Plan: Adds a test
Reviewers: mitrandir, #mercurial, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3306523
Signature: t1:3306523:1463697129:c87f4a397ef357b5ca4a80d01e9a6ca4d61f9d3d
Summary:
A future patch will be adding incremental repack, so let's move our repack logic
to the repack module so it's easier to refactor and extend.
Also adds a message for when a background repack kicks off (since we'll be
calling that from other places eventually).
Test Plan: Adds a test
Reviewers: mitrandir, #mercurial, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3306521
Signature: t1:3306521:1463602886:cece3d517f0672b829702866482c902812f9ae27
Summary:
Previously, when repacking deltas we would require a full history of the node so
we could order the hashes optimally. In some situations though, we don't have
the full history available (like if we're only repacking a subset of packs), so
we need to be able to repack even without full history.
This patch handles the case where a given delta doesn't have history
information. We just store it as a full text.
This becomes useful in an upcoming series that will introduce incremental
packing that only packs a subset of the packs.
Test Plan: Added a test
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3278346
Signature: t1:3278346:1463086170:54c0fbefe78f9cafa7efc4b6f037887a924ab4a5
Summary:
In an old version of the code, we would walk the entire history of a node during
data repack, which meant we had to keep track of when we saw a rename, and stop
walking there. Since then, we've changed the code to no longer walk the entire
history, and instead walk just the parts it was told to repack for this
particular file. This means we no longer ever walk across a copy, and therefore
don't need this copy detection logic.
Test Plan: Ran the tests
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3278443
Signature: t1:3278443:1463086137:c6d9eb6637bf3b8636a3df7e531f265d51cab0de
Summary:
Previously we were throwing away copy information when we repacked things into
pack files. The hope was that we could store copy information somewhere else,
and keep the history pack using fixed length entries. Since storing copy
information elsewhere is a long ways off, let's just go ahead and put copy info
in the pack file.
This makes the entries non-fixed length, which means any iteration over them has
to read the length of each entry. This also affects the historypack filename
hashes since they are content based, so the tests had to change.
This matches the old remotefilelog behavior more closely (which is why no code
had to change outside the pack logic).
Test Plan: Added a test
Reviewers: #mercurial, mitrandir, ttung
Reviewed By: mitrandir
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3262185
Signature: t1:3262185:1462562602:935683692276c7fa569d381b18aa3b18656793b1
Summary:
This adds a lock that limits us to running only one repack at a time. We also
add a simple prerepack hook to allow the tests to insert a sleep to test this
functionality.
Test Plan: Added a test
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3260428
Signature: t1:3260428:1462393311:3e1bf5dd047e7f3521679ca7640b448f5e784913
Summary:
Some miscellaneous perf improvements:
- Get rid of unnecessary sorting
- Bail early from remotefilelog reverse filename lookup if there are no keys
- Avoid unnecessary class/enum lookups in ancestor hotpath
- Fix n^2 behavior in history repacking
- Remove unnecessary set() and tuple instantiation in hotpath
- Use __slots__ on repackentry to improve access times
Test Plan:
Ran the tests. Tested the actual perf improvements via 'hg repack' in
a large repo. The n^2 fix caused a massive perf when, but the others shaved off
a number of seconds as well.
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3257595
Signature: t1:3257595:1462392755:80209fe66c2632d2c16a2840500b0655926e7513
Summary:
Previously, if you ran repack twice in a row, it would actually delete your
packs, because the repack produced files with the same name as before, and the
cleanup then deleted them.
The fix is to have the stores record what files they produced in the ledger,
then future clean up work can avoid deleting anything that was created during
the repack.
Test Plan: Added a test
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3255819
Signature: t1:3255819:1462393814:d32155b12535990f72fbe48de045eddbb6f7fab6
Summary:
We had a naive repack implementation in historypack.py. Let's move it to the
repack module and do the minor adjustments to use the new repackerledger apis.
Test Plan:
Ran hg repack in conjunction with future diffs that make use of this
api
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249587
Signature: t1:3249587:1462232544:591cd8bec09f781370896470746eae5a4489531f
Summary:
We had a naive repack implementation in datapack.py. Let's move it to the repack
module and do the minor adjustments to use the new repackerledger apis.
Test Plan: Ran it in conjunction with future diffs that make use of this api.
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249585
Signature: t1:3249585:1462232504:a00aa65afca9562a2c1456cc4ab48c50d1ba5b68
Summary:
This introduces the high level classes that will implement the generic repack
logic.
Test Plan: Ran the repack in conjunction with later commits that use these apis.
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249577
Signature: t1:3249577:1462225435:000f9cc29ae2a3d7fdbedf546c8936ef45d1e4cf