Summary: The strings are not necessarily null-terminated and length needs to be considered.
Test Plan: used in later diff to find a path.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3770358
Signature: t1:3770358:1472173786:9a2f681c90476aafd481c301ff65ac8b199214ec
Summary:
appendbinfromhex appends the binary representation of a 40-character hex string onto a std::string. This allows us to reuse a std::string rather than allocating a new one every time.
Also:
1. converted binfromhex to use this method.
2. updated the docblocks to actually reflect reality.
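The reuse pattern can be sketched as follows; this is an illustrative reimplementation, not the extension's actual code:

```cpp
#include <string>

// Illustrative sketch: convert one hex digit to its 4-bit value.
static int hexdigit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return c - 'A' + 10;
}

// Append the 20-byte binary form of a 40-character hex node onto an
// existing std::string, so callers can reuse one buffer across calls.
void appendbinfromhex(const char *node, std::string &output) {
  for (int i = 0; i < 40; i += 2) {
    output.push_back((char)((hexdigit(node[i]) << 4) | hexdigit(node[i + 1])));
  }
}

// binfromhex then becomes a thin wrapper that allocates its own string.
std::string binfromhex(const char *node) {
  std::string result;
  result.reserve(20);
  appendbinfromhex(node, result);
  return result;
}
```

With this shape, callers that build many binary nodes can clear and refill one std::string instead of paying an allocation per conversion.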
Test Plan: `make local && cd ~/work/fbsource && PYTHONPATH=~/work/mercurial/facebook-hg-rpms/remotefilelog:~/work/mercurial/facebook-hg-rpms/fb-hgext/:~/work/mercurial/facebook-hg-rpms/remotenames/:~/work/mercurial/facebook-hg-rpms/lz4revlog/ /opt/local/bin/python2.7 ~/work/mercurial/facebook-hg-rpms/hg-crew/hg --config extensions.perftest=~/work/mercurial/facebook-hg-rpms/remotefilelog/tests/perftest.py testtree --kind flat,ctree,fast --test fulliter,diff,find --build "master~5::master"`
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3770048
Signature: t1:3770048:1472105520:eac79a42360ebfa258519346b68fc4541c2dbb7c
Summary: In most cases, this is pretty straightforward. The only unusual case is `_treemanifest_find`, which does the actual resolution of the root manifest (unlike diff and iter).
Test Plan: make local
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3763324
Signature: t1:3763324:1472173572:3dcda9b318ad818d2f51e8c3472c7770739faafe
Summary: This more accurately describes its purpose.
Test Plan: simple refactor, so just make local
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3763323
Signature: t1:3763323:1472104318:b48246777db0527c7022066438c211f54c88703e
Summary: It makes a destructor possible.
Test Plan: make local
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3763322
Signature: t1:3763322:1472104242:e76ef1943c2082ddecff872c6472de4706505922
Summary:
764cd9916c94 recently introduced code that unconditionally checked the
repo.includepattern and repo.excludepattern attributes on a local repository
without first checking whether it is a shallow repository. These attributes
only exist on shallow repositories, causing "hg pull" to crash on non-shallow
repositories. This crash wouldn't happen in simple circumstances, since the
remotefilelog extension only gets fully set up once a shallow repository object
has been created; however, when using chg you can end up with scenarios where a
non-shallow repository is used in the same hg process after a shallow one.
This refactors the code to now store the local repository object on the remote
peer rather than trying to store the individual shallow, includepattern, and
excludepattern attributes.
Overall this code does still feel a bit janky to me -- the rest of the peer API
is independent of the local repository, but the _callstream() wrapper cares
about the local repository being referenced. It seems like we should ideally
redesign the APIs so that _callstream() receives the local repository data as
an argument (or we should make the peer <--> local repository association more
formal and explicit if we think it's better to force an association here).
Test Plan: Added a new test which triggered the crash, but passes with these changes.
Reviewers: ttung, mitrandir, durham
Reviewed By: durham
Subscribers: net-systems-diffs@, yogeshwer
Differential Revision: https://phabricator.intern.facebook.com/D3756493
Tasks: 12823586
Signature: t1:3756493:1471971600:9666e9c31bf59070c3ace0821d47d322671eb5b1
Summary: It's a fixed reference that we use, so no need for a pointer.
Test Plan: `PYTHONPATH=~/work/mercurial/facebook-hg-rpms/remotefilelog:~/work/mercurial/facebook-hg-rpms/fb-hgext/:~/work/mercurial/facebook-hg-rpms/remotenames/:~/work/mercurial/facebook-hg-rpms/lz4revlog/ /opt/local/bin/python2.7 ~/work/mercurial/facebook-hg-rpms/hg-crew/hg --config extensions.perftest=~/work/mercurial/facebook-hg-rpms/remotefilelog/tests/perftest.py testtree --kind flat,ctree,fast --test fulliter,diff,find --build "master~5::master"`
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3753463
Signature: t1:3753463:1471905506:802315db9d2aa34363c7b0bccaefcbcda21b1a1e
Summary: Since we're no longer returning a struct, we no longer need to return the value through a pointer argument.
Test Plan: `PYTHONPATH=~/work/mercurial/facebook-hg-rpms/remotefilelog:~/work/mercurial/facebook-hg-rpms/fb-hgext/:~/work/mercurial/facebook-hg-rpms/remotenames/:~/work/mercurial/facebook-hg-rpms/lz4revlog/ /opt/local/bin/python2.7 ~/work/mercurial/facebook-hg-rpms/hg-crew/hg --config extensions.perftest=~/work/mercurial/facebook-hg-rpms/remotefilelog/tests/perftest.py testtree --kind flat,ctree,fast --test fulliter,diff,find --build "master~5::master"`
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3752376
Tasks: 12818084
Signature: t1:3752376:1471891162:e7cc159d2077d77d234902e4598b9661f15840c0
Summary: This allows us to modify the ManifestEntry stored in memory. This also requires us to remove the const qualifier in a number of places.
Test Plan: `PYTHONPATH=~/work/mercurial/facebook-hg-rpms/remotefilelog:~/work/mercurial/facebook-hg-rpms/fb-hgext/:~/work/mercurial/facebook-hg-rpms/remotenames/:~/work/mercurial/facebook-hg-rpms/lz4revlog/ /opt/local/bin/python2.7 ~/work/mercurial/facebook-hg-rpms/hg-crew/hg --config extensions.perftest=~/work/mercurial/facebook-hg-rpms/remotefilelog/tests/perftest.py testtree --kind flat,ctree,fast --test fulliter,diff,find --build "master~5::master"`
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3752366
Tasks: 12818084
Signature: t1:3752366:1471891101:1d42a04d85b7e8db34644dc8fbf1bb3481fbb7bc
Summary: py-treemanifest.cpp will be mostly just python binding stuff.
Test Plan: make local
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3733832
Signature: t1:3733832:1471542792:d6f1130b5c16487f4d0173cedd91857fd3711c1d
Summary: Directly pass in the path + len and the node. Note that the path is now a char* + len, because this allows us to use the path in treemanifest_find directly, rather than constructing a new path.
Test Plan: ran the existing perftest without a crash.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3738280
Signature: t1:3738280:1471890923:c13283f1c61dc020ba1918ee9b25c24dfd2fc19b
Test Plan: used in later diff.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3732102
Signature: t1:3732102:1471905481:000ff5d976c348bac9f993f9c0b56f4b9c8b84f0
Summary:
I like many small files.
There is one place where I'm making a functional change (convert.h) to satisfy angry compilers.
Test Plan: make local.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3732584
Signature: t1:3732584:1471542758:d0b7804753ea4fd39a507090338ae3c5104dc7fa
Summary: For adding new children to Manifests, we need to be able to create new ManifestEntries. Since these ManifestEntries will not be backed by datapack data structures, we need ManifestEntries that have their own memory allocation.
Test Plan: used in later diff.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3732101
Signature: t1:3732101:1471541200:17b60af0977109757610168637a276f5dd999f8b
Summary: D3730823 removes the need for it.
Test Plan: compiles
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mathieubaudet, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3730880
Signature: t1:3730880:1471467917:2b202c5cab1a10fcfe5899670b11bc44740983f7
Summary:
If a manifest has already been loaded for a ManifestEntry, we should cache that entry and reuse it. Two reasons:
1) if someone makes a modification to a tree, we need to persist that.
2) better performance.
Missing in this diff: memory cleanup
Test Plan: compiles
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3725842
Signature: t1:3725842:1471401136:cbf4a987c35ea19ca432059cc15e299f0aa5568b
Summary: This allows us to use ManifestFetcher inside the ManifestEntry class.
Test Plan: compiles
Reviewers: #fastmanifest, durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3725838
Summary: This will allow us to replace portions of manifests with in-memory representations.
Test Plan: existing script doesn't crash.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3725831
Signature: t1:3725831:1471400740:4d9891c01f8567f4ceab76d6bd36e7dc595de4a6
Summary: `parseptr` refers to where the parsing is going next. This simplifies the code and makes ManifestEntry less tied to the original memory allocation.
Test Plan: the existing test I've been running doesn't crash.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3717940
Signature: t1:3717940:1471383988:9b718e137b8ffaf8aad09f78f15791eec57fce6e
Summary: Parse manifests when we construct the Manifest object. This allows us to do manipulations to the Manifest object and not have to deal with a parallel set of data structures.
Test Plan:
`PYTHONPATH=~/work/mercurial/facebook-hg-rpms/remotefilelog:~/work/mercurial/facebook-hg-rpms/fb-hgext/:~/work/mercurial/facebook-hg-rpms/remotenames/ valgrind ~/work/mercurial/facebook-hg-rpms/hg-crew/hg --config extensions.perftest=~/work/mercurial/facebook-hg-rpms/remotefilelog/tests/perftest.py testtree --kind flat,ctree,fast --test fulliter,diff,find --build "@~2::@"`
note that I'm running this with Valgrind.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3717936
Signature: t1:3717936:1471383856:012634c1e59f1da9fc1e5a918e7f7d99d30d6992
Summary: It's not necessary for the direction we're going in.
Test Plan: compiles
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3717930
Signature: t1:3717930:1471383901:d8bfddb7ea7fb08d26315cbfce7af65d14662bc8
Summary:
We need manifest objects to be able to stick around in memory, because now they have overrides and all that other good stuff.
This probably introduces a metric ton of memory leaks, but we'll slowly whittle them down.
Test Plan: same script.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3717304
Signature: t1:3717304:1471385439:9269ab248d233a970c6725dbb4ca3eb661f6a96e
Summary: xxx is the C++ class, py_xxx is the python wrapper for it.
Test Plan: make local
Reviewers: #fastmanifest, akushner
Reviewed By: akushner
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3701536
Signature: t1:3701536:1471150787:5f3dfc3a65360233f9ee2ea1757ed4f368f70d62
Summary: py_treemanifest is initialized to 0s. We initialize the `tm` field by explicitly calling the constructor in `treemanifest_init` and we destroy everything by explicitly calling the destructor in `treemanifest_dealloc`
Test Plan: `PYTHONPATH=~/work/mercurial/facebook-hg-rpms/remotefilelog:~/work/mercurial/facebook-hg-rpms/fb-hgext/:~/work/mercurial/facebook-hg-rpms/remotenames/:~/work/mercurial/facebook-hg-rpms/lz4revlog/ /opt/local/bin/python2.7 ~/work/mercurial/facebook-hg-rpms/hg-crew/hg --config extensions.perftest=~/work/mercurial/facebook-hg-rpms/remotefilelog/tests/perftest.py testtree --kind flat,ctree,fast --test fulliter,diff,find --build "master~5::master"` in fbsource
Reviewers: #fastmanifest
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3730823
Summary: If the flag is not present, then `flag` field is set to NULL. In that case, the current code will segfault. Now we will assign `\0` to `*resultflag`.
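The shape of the fix can be sketched with hypothetical names; only the null check mirrors the diff:

```cpp
#include <cstddef>

// Hypothetical sketch of the fix: copy a manifest entry's flag into the
// caller's output byte, treating a missing flag (NULL pointer) as '\0'
// instead of dereferencing it and segfaulting.
void copyflag(const char *flag, char *resultflag) {
  if (flag == NULL) {
    *resultflag = '\0';  // entry has no flag recorded
  } else {
    *resultflag = *flag;
  }
}
```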
Test Plan:
run `PYTHONPATH=~/work/mercurial/facebook-hg-rpms/remotefilelog:~/work/mercurial/facebook-hg-rpms/fb-hgext/:~/work/mercurial/facebook-hg-rpms/remotenames/ ~/work/mercurial/facebook-hg-rpms/hg-crew/hg --config extensions.perftest=~/work/mercurial/facebook-hg-rpms/remotefilelog/tests/perftest.py testtree --kind flat,ctree,fast --test fulliter,diff,find --build "@~2::@"`
without crashing.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3700352
Summary: This is the first step in disentangling the C++ code from the python interface.
Test Plan: `PYTHONPATH=~/remotefilelog/build/lib.linux-x86_64-2.6/ python ~/hg/hg --config extensions.remotefilelog=~/remotefilelog/remotefilelog --config extensions.perftest=~/remotefilelog/tests/perftest.py testtree --kind flat,ctree --test fulliter,diff,find --build "master~5000::master"`
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3700627
Signature: t1:3700627:1471382403:d8dc6dc5c295ec55878ca020b91fc0b30d930ce8
Summary: Useful if you run CLion.
Test Plan: built everything.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3700542
Signature: t1:3700542:1471382263:260eb0a1588480f1d4c798df60d559c63ed19c8a
Summary: If someone removes the pack file, getpack will throw an IOError. Catch it and don't bother trying to add the pack to the list of available packs.
Test Plan: pdb.set_trace() in this method, then remove a file while the debugger is halted; execution continues without a traceback.
Reviewers: #mercurial, quark
Reviewed By: quark
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3712310
Tasks: 12660285
Signature: t1:3712310:1471080740:85f2e3f49b09f348ec654362eadf5f1e689b8a19
Summary: Once we're done reading the delta data, we madvise it away.
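The technique can be sketched as follows; the function name and signature are assumptions, not the extension's actual code:

```cpp
#include <sys/mman.h>
#include <cstddef>

// Assumed shape of the technique: once a page-aligned region of the
// mmap'd pack file has been consumed, MADV_DONTNEED tells the kernel it
// may reclaim the physical pages while keeping the mapping itself valid,
// bounding the process's memory footprint.
bool release_consumed_pages(void *start, size_t length) {
  return madvise(start, length, MADV_DONTNEED) == 0;
}
```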
Test Plan:
Dump all the hashes from a datapack into a separate file, then run a script to fetch all the delta chains. Observed that the memory footprint did not increase significantly.
```
#!/usr/bin/env python
import binascii
import cdatapack
dp = cdatapack.datapack('/dev/shm/hgcache/fbsource/packs/8b5d28f7a5bd7391a0b060c88af8cca3af357c24')
for ix, line in enumerate(open('/tmp/hashes', 'r')):
line = line.strip()
dp.getdeltachain(binascii.unhexlify(line))
```
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3686830
Signature: t1:3686830:1470716333:e8fc8e3e3fa29c1931f69222c17e10f519a4a8c2
Summary:
We actually have no idea what "hg.peer" will return, and should check
whether it has a "cleanup" method before wrapping it.
Test Plan: Code Review
Reviewers: ttung, #mercurial, rmcelroy
Reviewed By: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3703201
Signature: t1:3703201:1470924859:852eaf275c89ceced285a4b74d09938e489d9ee0
Blame Revision: D3685587
Summary:
clang has a bug where the fully qualified class name cannot be used to invoke the destructor https://llvm.org/bugs/show_bug.cgi?id=12350
This is one of the suggested workarounds.
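A minimal sketch of the workaround on std::string (illustrative, not the actual diff):

```cpp
#include <new>
#include <string>

// Sketch of the workaround: clang rejects an explicit destructor call
// spelled through the fully qualified class name, so the destructor is
// invoked through a local type alias instead.
typedef std::string string_t;

void destroy_string_in_place(string_t *s) {
  // s->std::string::~string();  // trips the clang bug referenced above
  s->~string_t();                // accepted: destructor named via typedef
}
```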
Test Plan: compiles on darwin
Reviewers: #fastmanifest, lcharignon
Reviewed By: lcharignon
Subscribers: mathieubaudet, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3692812
Signature: t1:3692812:1470852812:09ef7de2a322034a01f5569b574c47fcc6f0c8d7
Summary: Since we allocate the memory for the uncompressed data on the heap, we need to free it.
Test Plan: compiles.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3686334
Signature: t1:3686334:1470716123:10fae5169e73ab581b282f771d188f804198d169
Summary:
The deadlock happens in sshpeer.cleanup:
```
# sshpeer.cleanup
def cleanup(self):
    if self.pipeo is None:
        return
    self.pipeo.close() # [1]
    self.pipei.close()
    try:
        # read the error descriptor until EOF
        for l in self.pipee: # [2]
            self.ui.status(_("remote: "), l)
    except (IOError, ValueError):
        pass
    self.pipee.close()
```
Notes:
1. Normally, this closes "stdin" of the server-side ssh process, so
the "ssh" process running server-side will detect EOF and end.
However, with workers calling "os.fork", there could be other
processes keeping "pipeo" open, so "ssh" won't receive EOF.
2. The deadlock happens here, if the ssh process cannot get EOF and
therefore does not end itself or close its stderr.
The deadlock happens with these steps:
1. "hg update ..." starts. Let's call it "the master process".
2. A memcache miss happens, and an sshpeer is started by fileserverclient;
pipei, pipeo, pipee get created.
3. Workers start. They inherit pipei, pipeo, pipee.
4. The master process waits for the workers without closing pipeo.
5. The workers are at the "cleanup" method.
6. The server-side "ssh" reading its stdin hangs because the master
process hasn't closed pipeo, so there is no EOF on its stdin.
7. The server-side "ssh" process never ends. It does not close its
stderr.
8. The workers reading from pipee never end. They never get EOF
because the server-side "ssh" won't close its stderr.
9. The master process waiting for workers never completes, because the
workers won't exit before they have read all of pipee.
The patch closes pipee for forked processes to address the issue.
Ideally, we want this in sshpeer.py because it could in theory
affect other sshpeer use-cases. But let's do it for remotefilelog
for now since remotefilelog is currently the only victim of this
deadlock pattern.
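The EOF behavior behind the deadlock can be demonstrated in miniature; this is a standalone sketch, not the patch itself:

```cpp
#include <fcntl.h>
#include <unistd.h>

// A pipe reader only sees EOF once *every* descriptor for the write end
// is closed. Here dup() stands in for the copy of pipeo/pipee that a
// forked worker inherits from the master process.
bool eof_requires_all_writers_closed() {
  int fds[2];
  if (pipe(fds) != 0) return false;
  fcntl(fds[0], F_SETFL, O_NONBLOCK);  // so the demo never blocks
  int inherited = dup(fds[1]);         // the worker's inherited copy
  close(fds[1]);                       // "master" closes its write end...
  char buf[1];
  bool no_eof_yet = (read(fds[0], buf, 1) == -1);  // ...but no EOF yet
  close(inherited);                    // worker closes its copy too
  bool eof_now = (read(fds[0], buf, 1) == 0);      // now read() sees EOF
  close(fds[0]);
  return no_eof_yet && eof_now;
}
```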
Test Plan:
Add some extra debugging logs. Check the wrapped `_cleanup` gets called
and things work normally.
Reviewers: #mercurial, ttung, durham
Reviewed By: durham
Differential Revision: https://phabricator.intern.facebook.com/D3685587
Tasks: 12563156
Signature: t1:3685587:1470688591:0f4f97508699b273e17df867898d65205ee52434
This is a large refactor of the original code base. I apologize for it not being
broken up into multiple commits.
This adds the Manifest class (that represents a single Manifest directory
entry), the ManifestFetcher class (which allows fetching children manifests),
and the InMemoryManifest class (that represents an uncommitted Manifest).
The combination of these classes does a few things:
1. It removes most of the references to PythonObj from the primary algorithms,
so in the future this code could be used from other C++ code without relying on
Python.
2. It refactors manifest access so that manifests can be stored either in
python strings or in memory. This opens the door to implementing manifest
editing APIs.
Summary:
Now that we have a wrapper around python objects, let's add a getattr() function
to easily retrieve attributes off the object.
Test Plan: Ran the perf test
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3674366
Summary:
This patch introduces a PythonObj class which implements the copy-constructor,
destructor, and assignment operator in a way that will manage the ref count
automatically. If we move to C++11 we could also implement the move constructor
and move assignment operator to make this even more efficient.
The current implementation allows implicitly converting to and from PyObject*,
which may be questionable design-wise, but makes switching to and using this
class much cleaner, since we can do things like `PythonObj foo = PyObject_New()`
and `PyObject_DoStuff(myPythonObj)`.
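The pattern can be illustrated without Python.h by using a stand-in refcounted type; this is a sketch of the idea, not the actual PythonObj code (Py_INCREF/Py_DECREF on PyObject* play the role of the refcount field here):

```cpp
// Stand-in for PyObject: an object carrying an intrusive reference count.
struct Counted {
  int refcount;
  Counted() : refcount(1) {}
};

// RAII handle: incref on copy, decref on destruction/assignment.
class Handle {
  Counted *obj;

  void decref() {
    if (--obj->refcount == 0) {
      delete obj;
    }
  }

 public:
  explicit Handle(Counted *o) : obj(o) {}  // takes over one reference
  Handle(const Handle &other) : obj(other.obj) { ++obj->refcount; }
  Handle &operator=(const Handle &other) {
    ++other.obj->refcount;  // incref first so self-assignment is safe
    decref();
    obj = other.obj;
    return *this;
  }
  ~Handle() { decref(); }
  operator Counted *() const { return obj; }  // implicit conversion, as the
                                              // description notes for PyObject*
};
```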
Test Plan: Ran the perf test. It succeeded, and I saw no effect on perf.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3674311
Summary:
Instead of returning NULL and propagating it up the call stack, let's throw
pyexception (which assumes the python error string has already been set) and the
top of the stack can just return NULL.
Test Plan: Ran my perf test suite
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3673804
Summary:
This implements treemanifest.diff(). It takes two manifests and iterates over
them to produce a python dictionary containing the differences.
I'm not proud of this. Just putting it up for review for completeness since it
completes the find, diff, iter trifecta. I need to refactor it to remove some of
the duplication before it gets accepted.
Test Plan:
Ran it as part of a perf test suite using diffs across various
distances. It takes 250ms to diff across 5000 commits, and 900ms to diff across
50,000 commits.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3646003
Summary:
By making this a class we can encapsulate common operations like parsing and
directory checking. The later diff that implements treemanifest_diff uses this
a lot.
Test Plan: Ran my perf tests for find and iter (and for the later patch diff).
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3673228
Summary:
The varargs-style Py_BuildValue functions have no idea what type the incoming
arguments are, so the hard-coded ints were passed as 32-bit values instead of
64-bit. Let's explicitly cast every number passed to that function to
Py_ssize_t instead.
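The hazard and the fix can be illustrated with a toy varargs reader (ptrdiff_t stands in for Py_ssize_t; this is a sketch, not CPython code):

```cpp
#include <cstdarg>
#include <cstddef>

// A varargs callee like Py_BuildValue cannot see the caller's argument
// types. If it reads a pointer-sized integer while the caller passed a
// plain 32-bit int literal, it reads garbage on 64-bit builds; an
// explicit cast at the call site keeps both sides in agreement.
long long sum_pointer_sized(int count, ...) {
  va_list args;
  va_start(args, count);
  long long total = 0;
  for (int i = 0; i < count; i++) {
    total += va_arg(args, ptrdiff_t);  // callee assumes pointer-sized values
  }
  va_end(args);
  return total;
}
```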
Test Plan: We were seeing segfaults without this change. Now we don't.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3673217
Summary:
This implements treemanifest.find(), which takes a filename and returns the node
and flag for it, if it exists.
This isn't the prettiest function I've ever written. I need to think about how
to refactor this to unify the various traversal algorithms that are used in
treemanifest.
Test Plan:
Ran it as part of a perf test suite. It's basically 0 milliseconds in
every case.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3645967
Summary:
This implements the basic __iter__ logic. It returns an iterator that returns
every file path in the manifest.
Test Plan:
Ran a separate perf test suite to verify the performance of this. It
can iterate over 1 million files in about 550 milliseconds, assuming a fast
store.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3645890
Summary:
This defines the basic type for representing an iteration over all the keys in
the treemanifest. The next patch will add the logic that constructs and mutates
this type as it iterates.
Test Plan: I ran a perf test in a future patch that executes all of this code.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3645768
Summary:
This adds getdata and binfromhex functions. These are common functions that will
be used throughout the implementation of the ctreemanifest.
Test Plan: Ran a perf test suite on the code in a later diff.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3644960
Summary:
This is the basic definition of the c treemanifest type. Future patches will add
functions to this type.
Test Plan: Tested by running a perf suite on a later version of this series
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3644935
Summary: Just a simple module declaration with no logic yet.
Test Plan: Ran perf tests against a later implementation of this
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3644720
Summary:
fastdatapack is the same as datapack. Add a selector in datapackstore to determine which datapack to create.
test-datapack-fast.t is the same as test-datapack.t, except that it enables fastdatapack.
Test Plan: pass test-datapack.t test-datapack-fast.t
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3666932
Signature: t1:3666932:1470426499:45292064e2868caab152d9a5b788840c5f63e4e4
Summary: Call the getdeltachain in C, format the results for python.
Test Plan: pass test-repack.t
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3671744
Signature: t1:3671744:1470382442:efae340c8cf5b173407c909c0816bf26704c7bf5
Summary: Do the divides when we load up the index table. Lookups are now cheaper.
Test Plan: pass test-repack.t
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3671719
Signature: t1:3671719:1470382246:937fd19f60ad71304e6a75f348fa92f067aba895
Previously we were kicking off background repacks even for non remotefilelog
repos. Moving the repack to be inside the remotefilelog requirement check will
prevent this.
Summary: This is to match the Python API.
Test Plan: used in later diff to pass test-datapack.t
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3668366
Signature: t1:3668366:1470341366:7c77684e646fe31a742e76158133597b46307797
Summary: The raw index is a byte offset, not an entry number. I hope the compiler is smart enough to optimize out the divide and multiply. :)
Test Plan: cdatapack_get on a delta chain that has a deltabasenode does not crash!
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3667033
Signature: t1:3667033:1470341429:b37da6c9ea6e37fe79b48ec6766c857b5e56c36a
Summary:
Replace some TODOs with actual error handling code.
Also lumped in typo fixes and style changes. Sorry.
Test Plan: Used in a later diffs to pass test-datapack.t
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3666875
Signature: t1:3666875:1470341306:67cd439341ad30fcd690ff2e399e8beacea1c0bb
Summary:
1. When bisecting, we don't want to wrap around. If middle == 0 and we're less than that, we should just fail.
2. The large-fanout check should be `header->config & LARGE_FANOUT`; using `|` means it always evaluates as a large fanout.
3. The format of the fanout table on disk makes it impossible to differentiate between an empty fanout entry and the first fanout entry, as they are both '0'. Therefore, any entry before the *second* fanout entry must implicitly search the 0th element.
4. Fixed a bug in the calculation of the last index entry.
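Fix #1 can be sketched with a half-open bisect (illustrative only; the real index entries carry more than the node):

```cpp
#include <cstring>
#include <cstddef>

// Half-open bisect over sorted 20-byte nodes. With unsigned indices,
// computing "middle - 1" when middle == 0 wraps around, so the search
// narrows [lo, hi) and fails cleanly instead.
int find_entry(const unsigned char (*entries)[20], size_t count,
               const unsigned char *node) {
  size_t lo = 0, hi = count;
  while (lo < hi) {
    size_t middle = lo + (hi - lo) / 2;
    int cmp = memcmp(node, entries[middle], 20);
    if (cmp == 0) {
      return (int)middle;
    } else if (cmp < 0) {
      hi = middle;   // never middle - 1: no wraparound at 0
    } else {
      lo = middle + 1;
    }
  }
  return -1;         // a probe smaller than entry 0 simply fails
}
```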
Test Plan: passed test-datapack.t with other fixes applied.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3666770
Signature: t1:3666770:1470341277:3f4f63a365e8bb0f4da6e574fc7f15228877c682
Summary: We don't ever need to modify the node sha data, so make it const.
Test Plan: compiles
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3666714
Signature: t1:3666714:1470340104:080ffa290a49388e797dcefc66976f6341932b76
Summary: Needed if we want to do a hybrid implementation of cdatapack
Test Plan: used in following diff.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3660087
Signature: t1:3660087:1470339373:4e8b548f1509af7f34d0a4bf8bd85723f38d238d
Summary: `iteritems()` differs from `__iter__()` slightly in that it yields the delta base and delta.
Test Plan:
run this toy program
```
#!/usr/bin/env python
import cdatapack
a = cdatapack.datapack('d864669a5651d04505ec6e5e9dba1319cde71f7b')
for x in a.iterentries():
print x[0], repr(x[1]), repr(x[2]), len(x[3])
for x in a:
print x[0], repr(x[1])
```
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3659133
Signature: t1:3659133:1470339839:dbdce5990a30ffe019ccc44fce97925b64524acd
Summary: cdatapack now has a getiter function, and it returns a cdatapack_iterator.
Test Plan:
using this toy program, dumped a pack file.
```
#!/usr/bin/env python
import cdatapack
a = cdatapack.datapack('d864669a5651d04505ec6e5e9dba1319cde71f7b')
for x in a:
print x
```
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3659005
Signature: t1:3659005:1470339657:aa39cc57a669b9bc4604933ce35ed20b3f81b468
Summary:
1. Get ntohl from arpa/inet.h as per the POSIX spec
2. Get ntohll from endian.h's be64toh
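What the conversion has to do can be written out by hand, independent of any platform header; on glibc, endian.h's be64toh performs the same big-endian-to-host conversion, as the summary notes (this is a sketch, not the header's actual code):

```cpp
#include <cstdint>

// Read 8 bytes in network (big-endian) order into a host uint64_t,
// most significant byte first.
uint64_t ntohll_portable(const unsigned char *bytes) {
  uint64_t value = 0;
  for (int i = 0; i < 8; i++) {
    value = (value << 8) | bytes[i];
  }
  return value;
}
```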
Test Plan: make local
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3671211
Signature: t1:3671211:1470341382:e6b0fe12094246aeb6be09252122bde9680e4599
Summary: Just a simple module declaration with no logic yet.
Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:2445a3a> make local
<output snipped>
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:2445a3a> python
Python 2.7.11 (default, Mar 1 2016, 18:40:10)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cdatapack
>>>
```
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3654786
Signature: t1:3654786:1470175354:c7e8847dcc74c83483d21888ad30cd9242fb461c
Summary:
Given a node sha, find it in the index file and retrieve the deltas. Checksum the data and dump it.
Depends on D3637000, D3636945
Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351> /Users/tonytung/Library/Caches/CLion2016.2/cmake/generated/cdatapack-64b7828e/64b7828e/Debug0/cdatapack_get d864669a5651d04505ec6e5e9dba1319cde71f7b f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e
source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node Delta Base Delta SHA1 Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e 0000000000000000000000000000000000000000 f32b366a6c44430df6526133f82f9638426ba9c5 37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b --node f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e
source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node Delta Base Delta SHA1 Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e 0000000000000000000000000000000000000000 f32b366a6c44430df6526133f82f9638426ba9c5 37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351>
```
Reviewers: durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3637416
Signature: t1:3637416:1470094723:bce7e903cd0b80c293e16b7532c49e552d3039ef
Summary:
1. Offsets are absolute byte offsets; convert them to entry offsets to make the bisect code a lot simpler.
2. When writing entries to a pack chain, we need to advance the pointer.
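Point 1's simplification can be sketched as follows; the entry size, field layout, and function name here are hypothetical, chosen only to show why bisecting over entry offsets is simpler than bisecting over raw byte offsets:

```python
ENTRYSIZE = 44  # hypothetical: a 20-byte node plus offset/size fields

def findentry(indexdata, node, indexstart=0):
    """Bisect an index by entry number rather than absolute byte offset.

    With entry offsets, the midpoint is just (lo + hi) // 2; the byte
    position is recomputed once per iteration, with no alignment
    arithmetic on raw byte offsets inside the loop.
    """
    entrycount = (len(indexdata) - indexstart) // ENTRYSIZE
    lo, hi = 0, entrycount
    while lo < hi:
        mid = (lo + hi) // 2
        start = indexstart + mid * ENTRYSIZE
        if indexdata[start:start + 20] < node:
            lo = mid + 1
        else:
            hi = mid
    start = indexstart + lo * ENTRYSIZE
    if lo < entrycount and indexdata[start:start + 20] == node:
        return lo
    return None
```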
Depends on D3627122
Test Plan: used in later diff.
Reviewers: durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3637000
Signature: t1:3637000:1469741885:c2416a3b30e5bb2b64e6bb7062f4c02098be91eb
Summary:
When retrieving a delta chain, datapack.py uncompresses the delta chain data. However, when iterating over the datapack, we get the compressed length. This is not desirable, as the output is no longer consistent. This diff peeks into the lz4 header to get the uncompressed length when iterating.
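A sketch of the header peek, assuming the legacy python-lz4 block format in which the uncompressed length is prepended as a little-endian 32-bit integer (the helper name is illustrative):

```python
import struct

def lz4uncompressedlength(blob):
    """Peek at the first 4 bytes of an lz4 blob to get its uncompressed
    length without actually decompressing it (format assumption above)."""
    if len(blob) < 4:
        raise ValueError("blob too short to contain an lz4 length header")
    return struct.unpack("<I", blob[:4])[0]

# Synthetic header claiming 37769 uncompressed bytes, matching the
# Delta Length shown in the test plan below:
header = struct.pack("<I", 37769)
print(lz4uncompressedlength(header + b"\x00" * 8))  # -> 37769
```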
Depends on D3627119
Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:e2ef218> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b --node f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e
source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node Delta Base Delta SHA1 Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e 0000000000000000000000000000000000000000 f32b366a6c44430df6526133f82f9638426ba9c5 37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:e2ef218> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b | tail -n 4
source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node Delta Base Delta Length
f2e53f83c5dc 000000000000 37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:e2ef218>
```
Reviewers: durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3636945
Signature: t1:3636945:1469811243:b21d90d9599244ed4600c5336818b9a18eacf3ff
Summary:
`->index_table` is not heap-allocated; however, `->fanout_table` is, and should be released.
Also added a call to `close_datapack()` at the end of `cdatapack_dump.c`.
Depends on D3627122
Test Plan: valgrind is much happier now.
Reviewers: durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3631368
Signature: t1:3631368:1469741779:e0c4e5d59c7e73c8aa3507901df3005383f0d3f5
Summary: This is not yet complete, but seems to be able to parse a data file.
Test Plan:
`/Users/tonytung/Library/Caches/CLion2016.2/cmake/generated/cdatapack-64b7828e/64b7828e/Debug/cdatapack_dump d864669a5651d04505ec6e5e9dba1319cde71f7b > /tmp/2`
compare it with the output of `hg debugdatapack --long d864669a5651d04505ec6e5e9dba1319cde71f7b > /tmp/1`
and it exactly matches.
Reviewers: durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3627122
Signature: t1:3627122:1470085301:c9b9e8b2fa57bb7a09dd56d3c811ff8eadbb85ba
Summary:
It should include the filelen and the deltalen fields, which are
2 and 8 bytes.
Test Plan: visual.
Reviewers: durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3627108
Signature: t1:3627108:1469742083:ffb59768906d9e5463065eec92e1c80cc8482884
Calling wrapfunction on the remotefilepeer (sshpeer) object in the exchangepull
function introduces a reference cycle, so this object will not be deleted
until the process dies. This is not a big issue for processes with a short
lifetime (e.g. those launched from the command line).
However, for persistent processes (e.g. TortoiseHg), this can lead to multiple
lingering ssh connections to the server (one per pull operation).
The fix is to not wrap remotefilepeer._callstream. That method is defined
directly on the remotefilepeer object. The required repo data is made available
on the remotefilepeer object by monkeypatching it in the exchangepull
function.
In some situations the remotefilelog setup logic could be called, wrapping
certain functions; later, a call to a repo that wasn't remotefilelog would run
remotefilelog code because of the wrapping. Normally we take care of this by
checking for the remotefilelog requirement, but we missed it in this one spot.
In two places, we were checking whether a revlog was an instance of
revlog.revlog and, I think, treating it as a
remotefilelog.remotefilelog otherwise. I noticed this when I created
another non-revlog.revlog revlog in narrowhg and remotefilelog thought
it was a remotefilelog.remotefilelog. Let's specifically check whether it's
a remotefilelog.remotefilelog instead.
Summary:
When running large repack operations, the resident size of the process
could become quite large, since we're scanning entire pack files. Linux/OSX
have API calls for telling the kernel it's ok to release some of that memory,
but those APIs are not exposed to Python.
So instead, let's unmap and remap the mmaps once a certain amount of data has
been read. I also tried changing the mmap accessors to use the file-oriented API
(mmap.read(), mmap.seek(), etc.) so we could switch to actual file handles during
repack, but it had a drastic effect on normal performance (repack took 1 hour
instead of a few minutes).
Long term we should move all of this logic to C++ so we can use the more
powerful APIs.
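The unmap-and-remap idea can be sketched like this (the class name and threshold are illustrative, not the real implementation):

```python
import mmap

REMAPTHRESHOLD = 100 * 1024 * 1024  # hypothetical: remap after ~100MB read

class RemappingPackReader:
    """Drop and re-create the mmap periodically so the kernel is free to
    reclaim the pages we've already touched (a sketch)."""

    def __init__(self, path):
        self.path = path
        self._bytesread = 0
        self._open()

    def _open(self):
        self._f = open(self.path, "rb")
        self._map = mmap.mmap(self._f.fileno(), 0, access=mmap.ACCESS_READ)

    def read(self, offset, size):
        if self._bytesread > REMAPTHRESHOLD:
            # Throw away the old mapping; a fresh one starts with no
            # resident pages.
            self._map.close()
            self._f.close()
            self._open()
            self._bytesread = 0
        self._bytesread += size
        return self._map[offset:offset + size]
```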
Test Plan:
Did a full repack on a laptop and verified memory capped out at 2GB
instead of exceeding 5GB.
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.intern.facebook.com/D3545171
Summary:
There was a race condition where if a repack is running and another hg process
launches, the new process will only see the original packs, and not any of the
new packs (even though the source blobs are being deleted from disk by the
repack).
The fix is to allow our pack store to refresh its list of packs every so often;
in this particular implementation we do it at most every 100ms. A more robust
strategy would be to group key misses and only check for new packs at the end,
once we have a list of all the misses, but that would require significant
refactoring to make everything grouped. This case should only ever happen during
repacks, so it should almost never occur more than once during a command, and the
100ms version is probably good enough.
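A minimal sketch of the throttled refresh (names and structure are illustrative):

```python
import time

REFRESHINTERVAL = 0.1  # at most every 100ms, per the summary

class RefreshingPackStore:
    """On a key miss, re-scan for new packs, but throttled to once per
    REFRESHINTERVAL (a sketch of the idea, not the real store)."""

    def __init__(self, listpacks):
        self._listpacks = listpacks  # callable returning current pack keys
        self._packs = set(listpacks())
        self._lastrefresh = time.time()

    def haskey(self, key):
        if key not in self._packs:
            now = time.time()
            if now - self._lastrefresh >= REFRESHINTERVAL:
                # A background repack may have replaced packs since we
                # last looked.
                self._packs = set(self._listpacks())
                self._lastrefresh = now
        return key in self._packs
```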
Test Plan:
Ran `hg up && hg pull && sleep 0.2 && hg up master` in a loop with a
break point in the refresh code and caught it executing in a situation where the
background repack had removed the original sources and put them in a new pack.
Verified that it loaded the data from the new pack correctly.
Reviewers: #mercurial, ttung, lcharignon
Reviewed By: lcharignon
Subscribers: lcharignon
Differential Revision: https://phabricator.intern.facebook.com/D3524314
Signature: t1:3524314:1467907680:85be07ad953811000c468852eb0626f4d8b53a13
Summary:
The shared cache needs to be entirely g+ws so that all members of the group
can write to each directory in it. The old code only applied g+ws to the leaf
directories, so other users weren't able to write to the non-leaf directories
(e.g. for hgcache/7a/83beca8.../, others couldn't write to 7a/).
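The intermediate-directory fix can be sketched as follows, assuming plain os calls rather than Mercurial's vfs layer (the helper name is illustrative):

```python
import os
import stat

def makedirsgroupwritable(path):
    """Apply g+ws to every directory component we create, not just the
    leaf (a sketch; the real code goes through Mercurial's vfs)."""
    missing = []
    p = os.path.abspath(path)
    while p and not os.path.isdir(p):
        missing.append(p)
        p = os.path.dirname(p)
    if not missing:
        return
    os.makedirs(path)
    for d in missing:
        st = os.stat(d)
        # g+w lets group members create entries; setgid makes new files
        # inherit the directory's group.
        os.chmod(d, st.st_mode | stat.S_IWGRP | stat.S_ISGID)
```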
Test Plan:
Updated a test to view group permissions for the intermediate
directories
Reviewers: #mercurial, ttung, simpkins
Reviewed By: simpkins
Subscribers: lcharignon, net-systems-diffs@, simpkins, mbolin
Differential Revision: https://phabricator.intern.facebook.com/D3523918
Signature: t1:3523918:1467930221:452b11b56a2e69896bf8d2cd0acd7131b41f90d8
Summary:
Previously, the history repack logic would stop traversing history for a given
filename once it encountered a rename. This isn't quite right, since the history
could eventually be traversed back to the original file, where we'd need to
continue processing. So now we check for when the copyfrom becomes the filename.
Also, if the copy source file and the copy target file have two nodes with the
same value, we would not process the one in the copy target (since it was marked
do not process). We fix this by explicitly checking if the node is one of the
known entries in the file being processed.
Test Plan: Added a test
Reviewers: #mercurial, ttung, mitrandir, rmcelroy
Reviewed By: mitrandir, rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3523215
Signature: t1:3523215:1467828169:bd487c8f296352c1a1b9355cb55f9001bd5e19a9
Before this patch, debugremotefilelog and verifyremotefilelog would
crash if not given a path. Also, many commands would accept arguments
they then ignored.
Summary:
Before this patch, we were not importing mercurial.error, this was
causing a crash when calling error.Abort. This patch adds the missing import.
Test Plan: Tests pass, and add a new test
Reviewers: durham
Differential Revision: https://phabricator.intern.facebook.com/D3457086
In the old days we would check the cache first, then the local store. This was
important because the cache is more likely to contain correct data (since it
comes from the final pushed version of commits), versus local data which may
contain information about stripped commits.
As part of the big store refactor, this order got switched unintentionally. So
let's switch it back.
Summary:
The pack path logic did not use the correct unix group when
remotefilelog.cachegroup was specified. This fixes that.
Test Plan:
I manually tested it by deleting a pack dir and running repack. This
is hard to create an automated test for since the feature isn't really cross
platform, and we don't have a way to know what groups they have on their
machine.
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3400756
Tasks: 11584114
Signature: t1:3400756:1465342537:ed023f6dc830117df5e85e294a41486f072714c9
The new pack stores return None for the copyfrom field, instead of the expected
''. We need the local file blob generator to handle this case, instead of just
putting None in the copyfrom field.
Summary:
Now that repack can clean up old remotefilelog blobs, let's have it also delete
any empty directories that get left behind.
Test Plan: Updated an existing test to cover it
Reviewers: mitrandir, lcharignon, #sourcecontrol, ttung, simonfar
Reviewed By: simonfar
Subscribers: simonfar
Differential Revision: https://phabricator.intern.facebook.com/D3385546
Signature: t1:3385546:1464972782:5ca63cf0a5589bb8a537957f50b4bc5ec4e0f0f5
Summary:
Previously a bunch of different places accessed the cachepath through ui.config
directly. This is a problem because we need to resolve any environment variables
in the path, and some spots didn't do this. So let's unify all accesses through
a helper function that takes care of the environment variables.
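The helper's job can be sketched like this (the function name, signature, and environment variable are assumptions; the real helper reads ui.config internally):

```python
import os

def getcachepath(configured):
    """Single point that resolves env vars (and ~) in the configured
    cache path, so no caller sees an unexpanded path (a sketch)."""
    if configured is None:
        return None
    return os.path.expanduser(os.path.expandvars(configured))

os.environ["CACHE_ROOT"] = "/var/cache"        # illustrative value
print(getcachepath("$CACHE_ROOT/hgcache"))     # -> /var/cache/hgcache
```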
Test Plan: Added a test
Reviewers: mitrandir, lcharignon, #sourcecontrol, ttung, simonfar
Reviewed By: simonfar
Subscribers: simonfar
Differential Revision: https://phabricator.intern.facebook.com/D3385583
Signature: t1:3385583:1464971813:5b9ee5ed3d6ff9f1a78cb9e0269e433844758c9d
Previously, background repacks would only repack pack files, which meant there
was no automated way to repack loose remotefilelog files without manually
running 'hg repack'. This allows incremental repacks to also pack the loose
files.
It also changes the config knob for background repacks, so we can enable pack
file usage without the server having to support it just yet.
A previous patch allowed the unionmetadatastore to return partial histories if a
certain config is provided. This allowed repack to get partial history information.
This patch does the same for deltachains. This isn't currently used, but will be
used in the future to allow repacking packs with partial delta chains by just
lifting them out of one pack and putting them directly in another.
Previously, a union store required that it be able to compute the entire history
of the revision. This caused problems in repack, since it may only have a
partial history. Instead of throwing a KeyError and giving the repack algorithm
no history information at all, we add a config knob to let the repack logic
specify that it's ok with partial histories.
The next patch will do the same for contentstore.
We've unified on KeyError being the error thrown when the pack is missing the
desired filename+filehash, but there were a few old places still using
LookupError. This patch changes them to also be KeyError.
This fixes an issue where a repack could throw a LookupError when it only had a
partial history of a file. Now that we throw a KeyError, the exception is caught
and handled appropriately.
Summary:
This moves the common logic from datapack and historypack into a common
basepack. At the moment the only common logic is the constructor, which handles
version checking, fanout initialization, and mmap stuff.
Test Plan: Ran the tests
Reviewers: mitrandir, #mercurial, ttung, mjpieters
Reviewed By: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D3306558
Signature: t1:3306558:1463474571:35d3d2e71849b8111e5455da2dd4810725a35523
Summary:
This takes the duplicate logic from datapackstore and historypackstore and moves
it into a common base class.
Test Plan: Ran the tests
Reviewers: ttung, mitrandir, #mercurial, mjpieters
Reviewed By: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D3306551
Signature: t1:3306551:1463474958:ddd36a43c2a3cbb34254c2b0153c0f3c5d431edb
Summary:
Now that we have a mutablebasepack base class, we can get rid of all the
redundant logic in mutablehistorypack. This also has the side effect of making
the historypack's fanout table dynamically sized, just like the datapack's
already is. That required a few changes to the historypack reader class as well.
Test Plan: Ran the tests
Reviewers: mitrandir, #mercurial, ttung
Differential Revision: https://phabricator.intern.facebook.com/D3306546
The iterbatch() handling added in f93aa99d4f1e (fileserverclient: use
new iterbatch() method, 2016-03-22) was broken by 31e88bf6faf0 (store:
change fileserviceclient to write via new store, 2016-04-04). Fix it
by copying the pattern introduced elsewhere in that change.
Summary:
mutabledatapack and mutablehistorypack share a lot of common code, especially
around the index and fanout table. Let's move much of the code to a common
mutablebasepack class and out of mutabledatapack. In the next patch I will make
mutablehistorypack a subclass of mutablebasepack and delete all the duplicate
logic.
Test Plan: Ran the tests
Reviewers: mitrandir, #mercurial, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: quark, rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3306542
Signature: t1:3306542:1463611860:16bc68416c9bbed87748a50f55a3bac7c618fdf1
Summary:
In normal Mercurial, the filelog entry's contents contains extra metadata that
stores the copy source. In the new pack format, that information is stored in
the history store, not in the data store. Therefore we need to change the server
side logic that responds to requests for packs to move that information over to
the history side before it sends the data.
Test Plan: Added a test
Reviewers: ttung, mitrandir, #mercurial, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3306539
Signature: t1:3306539:1463609462:0c1e33e0892f96effcc96c8f78401cf0d8ab5cbd
Summary:
Previously we only had incremental repacking for data packs. This patch adds it
for history packs as well. The algorithm here is simpler, since the amount of
history data is generally much smaller than the amount of delta data.
The algorithm is basically: if there are 2 things bigger than 100MB, repack
them; else repack up to 100MB of smaller things.
The datapack hashes changed because having the history available during a repack
allows us to make different decisions about delta ordering, etc.
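The selection rule described above can be sketched as follows (operating on sizes for illustration; the real code works on pack objects):

```python
LARGEPACKSIZE = 100 * 1024 * 1024  # the 100MB threshold from the summary

def choosehistorypacks(packsizes):
    """If at least 2 packs exceed 100MB, repack those; otherwise repack
    up to 100MB worth of the smaller packs (a sketch of the rule)."""
    big = [s for s in packsizes if s > LARGEPACKSIZE]
    if len(big) >= 2:
        return big
    chosen, total = [], 0
    for size in sorted(s for s in packsizes if s <= LARGEPACKSIZE):
        if total + size > LARGEPACKSIZE:
            break
        chosen.append(size)
        total += size
    return chosen
```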
Test Plan: Updated the tests
Reviewers: mitrandir, #mercurial, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3306535
Signature: t1:3306535:1463696613:f40ed10c9dfed40d7bc455582592a7aed108ec3a
Summary:
This runs the incremental background repacking logic after hg pull.
As part of adding tests, I also added a 'hg debugwaitonrepack' function that
will wait until any pending repack is done before returning, so the tests can
wait on repacks without so many sleeps.
Test Plan: Adds a test
Reviewers: mitrandir, #mercurial, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3306526
Signature: t1:3306526:1463696933:9e27daf0c08076468e8f365a3c372fa7d4f56bde
Summary:
This adds a --incremental flag to the hg repack command. This flag causes repack
to look at the distribution of pack files in the repo and performs the most
minimal repack to keep the repo in good shape. Currently it's only implemented
for datapacks.
The new remotefilelog.datagenerations config contains a list of the sizes for
the different generations of pack files. For instance:
[remotefilelog]
datagenerations=1GB
100MB
1MB
This designates 4 generations: packs over 1GB, packs over 100MB, packs over 1MB, and,
implicitly, packs under 1MB. The incremental algorithm will try to keep each
generation to fewer than 3 pack files (prioritizing the larger generations
first). When performing a repack it will grab at least 2 packs, and will grab
more if the total pack size is less than 100MB (since repacking at that level is
pretty cheap).
I have no idea if this is a good algorithm. We'll have to see and iterate.
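Mapping a pack size to its generation, per the example config above, can be sketched as:

```python
# Sizes mirroring the datagenerations example: 1GB, 100MB, 1MB.
GENERATIONS = [1 << 30, 100 * (1 << 20), 1 << 20]

def generation(size):
    """Return the generation index for a pack of the given size:
    0 for packs over 1GB, ..., len(GENERATIONS) for the implicit
    smallest bucket (a sketch of the bucketing)."""
    for i, limit in enumerate(GENERATIONS):
        if size > limit:
            return i
    return len(GENERATIONS)

print(generation(2 << 30))  # 2GB pack -> 0
print(generation(500))      # tiny pack -> 3
```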
Test Plan: Adds a test
Reviewers: mitrandir, #mercurial, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3306523
Signature: t1:3306523:1463697129:c87f4a397ef357b5ca4a80d01e9a6ca4d61f9d3d
Summary:
A future patch will be adding incremental repack, so let's move our repack logic
to the repack module so it's easier to refactor and extend.
Also adds a message for when a background repack kicks off (since we'll be
calling that from other places eventually).
Test Plan: Adds a test
Reviewers: mitrandir, #mercurial, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3306521
Signature: t1:3306521:1463602886:cece3d517f0672b829702866482c902812f9ae27
Summary:
Previously, creating a mutable pack and then not adding anything to it would result
in a 1-byte pack file. This patch fixes it so that no pack file is produced.
A future patch found this bug and includes a test that executes this code path.
Test Plan: Future patch adds a test
Reviewers: mitrandir, #mercurial, ttung, rmcelroy
Reviewed By: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3306520
Signature: t1:3306520:1463602354:f47e9478ea686b0b86525b5979dec9ce56301b2a
On windows, you cannot delete a file if it is readonly. Since remotefilelog
makes heavy use of readonly files, we need to chmod them to be writable before
deleting them.
We only do this on windows, since on unix based systems if the current user is
not the owner of the file (and is just accessing the file via a shared group),
they cannot chmod the file.
Renames some variables to disambiguate the stat module as well.
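The delete-readonly workaround can be sketched like this (the helper name is illustrative):

```python
import errno
import os
import stat

def unlinkreadonly(path):
    """Delete a file that may be read-only. On Windows, unlink fails
    with EACCES for read-only files, so make the file writable first.
    On POSIX we skip the chmod, since a user with only group access
    doesn't own the file and can't chmod it (per the summary)."""
    try:
        os.unlink(path)
    except OSError as e:
        if os.name == "nt" and e.errno == errno.EACCES:
            os.chmod(path, stat.S_IWRITE | stat.S_IREAD)
            os.unlink(path)
        else:
            raise
```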
On windows the grp module is not present, so we need to avoid importing it. This
means the shared group feature of remotefilelog is not supported on windows.
On windows, there is text mode and binary mode for reading and writing files.
Since we're dealing with files as just blob data, we always want binary mode.
'debugindex --dir' is used for looking up the revlog of a directory
manifest, so it should use the normal (local file system) handling,
not remotefilelog.
Summary:
Previously all fanout tables were 2^16 in length. For small packs, this resulted
in very sparse fanout tables that had to be linearly scanned to find the end of
the search bounds, which was slow.
With this patch, the data index now dynamically chooses what fanout table size
to use. If the pack has over 2^16 / 10 entries, we use 2^16, otherwise we use
2^8. The reasoning is in the code.
The patch is a bit large because we had to take a bunch of constants and
duplicate them, and change their accessors to access them through member
variables.
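The size decision can be sketched as follows (the constant name is an assumption mirroring the 2^16 / 10 cutoff quoted above):

```python
SMALLFANOUTCUTOFF = 2 ** 16 // 10  # threshold from the summary

def fanoutsize(entrycount):
    """Choose the fanout table size: 2^16 buckets (a two-byte node
    prefix) for big packs, 2^8 (a one-byte prefix) for small ones,
    so small packs don't get sparse, slow-to-scan tables."""
    if entrycount > SMALLFANOUTCUTOFF:
        return 2 ** 16
    return 2 ** 8
```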
Test Plan: Added a test
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3279000
Signature: t1:3279000:1463170357:9746d19dde14743bac9a8c40cafbc618504c420f
Summary:
Previously, when repacking deltas we would require a full history of the node so
we could order the hashes optimally. In some situations though, we don't have
the full history available (like if we're only repacking a subset of packs), so
we need to be able to repack even without full history.
This patch handles the case where a given delta doesn't have history
information. We just store it as a full text.
This becomes useful in an upcoming series that will introduce incremental
packing that only packs a subset of the packs.
Test Plan: Added a test
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3278346
Signature: t1:3278346:1463086170:54c0fbefe78f9cafa7efc4b6f037887a924ab4a5
Summary:
In an old version of the code, we would walk the entire history of a node during
data repack, which meant we had to keep track of when we saw a rename, and stop
walking there. Since then, we've changed the code to no longer walk the entire
history, and instead walk just the parts it was told to repack for this
particular file. This means we no longer ever walk across a copy, and therefore
don't need this copy detection logic.
Test Plan: Ran the tests
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3278443
Signature: t1:3278443:1463086137:c6d9eb6637bf3b8636a3df7e531f265d51cab0de
Summary:
In a remotecontent/metadatastore a `get` request first runs prefetch, then reads
the resulting data from the shared cache store. Before this patch, the prefetch
would not download a value if it existed in the local data store, which means
nothing would be added to the shared cache store, and the `get` would fail. This
patch changes the remote stores to always prefetch based only on the contents of
the shared cache, so data will always be written.
This issue showed up when attempting to repack pack files that contained
references to nodes that were in the local store (which it didn't have access
to) but not the shared cache.
Test Plan:
Manually verified my issue disappeared. This isn't actually an issue
anymore, since future patches refactor the way repack works to not rely on the
remote stores, so this shouldn't be hit again. But it's a safe change
regardless.
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3278362
Signature: t1:3278362:1463086099:987d2fdd1c75e518f815c3159473e8cb22a15ba0
Summary:
In the old days _findsection would return None if the section wasn't found. We
have since changed it to throw a KeyError like all the other operations in the
packs. We need to update getmissing to eat that error, as usual.
Test Plan: ran the repack that was failing, it succeeded. Added a unit test
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3277245
Signature: t1:3277245:1463085831:0b32f852f49bd45ef2dfd3298313b9c2b87f75b6
Summary:
Previously the fileserverclient logic only checked the data store when
determining whether to prefetch a given key or not. This meant that if the file
was in the data store but not the history store, it could result in a history
lookup failing to prefetch.
The fix is to separate the notions of data and history in the prefetch logic. By
default we just check the data store like before, but an optional argument
allows us to specify checking the history store as well (and we change the
remotemetadatastore to pass that argument).
Test Plan:
Ran the tests. Also ran the repack scenario in my large repo that
reproed the issue. In another diff, I'm going to come back and add a suite of
tests around the various repack permutations that I've seen cause issues.
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3277239
Signature: t1:3277239:1463085690:49ad478048cd13836b60f7ac9190e2294f5e9c64
Summary:
Add a simple progress bar on the client when receiving a pack from the
server.
Test Plan: Ran it
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3277265
Signature: t1:3277265:1463085643:aa0960092958c4f56a6d1c3a3901348dba48aa91
Summary: Some simple debug commands to print the contents of each pack.
Test Plan: Ran it manually, and added a simple test
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3277233
Signature: t1:3277233:1463085196:c54fc875d536a96150bb1461b77247a5d7a9402c
Summary:
In a future patch we'll add debug commands to view the contents of packs. Let's
repurpose iterkeys to just be __iter__ and have it return most of the data about
each entry.
Test Plan: Ran the tests
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Subscribers: mitrandir, quark
Differential Revision: https://phabricator.intern.facebook.com/D3277225
Signature: t1:3277225:1463170015:c1acbc7f435f1cf8d07fea4f32bf742a22de5716
Summary:
Previously, we assumed every delta in a pack had a complete chain in that pack,
so the index had no way to indicate a deltabase offset that didn't exist in the
pack.
Let's add a new marker to indicate that the delta base doesn't exist in this
pack.
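The marker idea can be sketched with sentinel values in the index's deltabase field (the sentinel names and values here are illustrative):

```python
FULLTEXTINDEXMARK = -1  # entry is stored as a fulltext, no delta base
NOBASEINDEXMARK = -2    # delta base exists but is not in this pack

def resolvedeltabase(baseoffset):
    """Interpret the deltabase field of an index entry (a sketch).
    Non-negative values are offsets into this pack; the sentinels say
    the entry is a fulltext or the base lives in some other pack."""
    if baseoffset == FULLTEXTINDEXMARK:
        return ("fulltext", None)
    if baseoffset == NOBASEINDEXMARK:
        return ("missing-base", None)
    return ("in-pack", baseoffset)
```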
Test Plan: Tested in my large repo repack scenario. Added a test
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3277222
Signature: t1:3277222:1463084346:1687cfa174e98c2cf3022de9e9c3808881f689cd
Summary:
This adds a new wire protocol command to allow clients to request a set
of file contents and histories from the server and receive them in pack format.
It's pretty simple and always returns all the history for every node requested
(which is a bit overkill), but it's labeled v1 and we can iterate on it.
Test Plan: Added a test
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3277212
Signature: t1:3277212:1463421279:459cc84265502175b47df293647aab7e7a830185
Summary:
Previously we were throwing away copy information when we repacked things into
pack files. The hope was that we could store copy information somewhere else,
and keep the history pack using fixed length entries. Since storing copy
information elsewhere is a long ways off, let's just go ahead and put copy info
in the pack file.
This makes the entries non-fixed length, which means any iteration over them has
to read the length of each entry. This also affects the historypack filename
hashes since they are content based, so the tests had to change.
This matches the old remotefilelog behavior more closely (which is why no code
had to change outside the pack logic).
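A sketch of what a variable-length entry with copyfrom might look like; the exact field layout here (four 20-byte hashes, then a 2-byte copyfrom length, then the path) is an assumption for illustration:

```python
import struct

def packhistoryentry(node, p1, p2, linknode, copyfrom):
    """Serialize one variable-length history entry (layout assumed)."""
    assert all(len(h) == 20 for h in (node, p1, p2, linknode))
    return (node + p1 + p2 + linknode
            + struct.pack("!H", len(copyfrom)) + copyfrom)

def unpackhistoryentry(data, offset=0):
    """Read one entry and return (fields, next_offset). Because entries
    are no longer fixed length, iteration must consume the copyfrom
    length field to find where the next entry starts."""
    fixed = data[offset:offset + 80]
    (cflen,) = struct.unpack_from("!H", data, offset + 80)
    copyfrom = data[offset + 82:offset + 82 + cflen]
    fields = (fixed[:20], fixed[20:40], fixed[40:60], fixed[60:80], copyfrom)
    return fields, offset + 82 + cflen
```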
Test Plan: Added a test
Reviewers: #mercurial, mitrandir, ttung
Reviewed By: mitrandir
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3262185
Signature: t1:3262185:1462562602:935683692276c7fa569d381b18aa3b18656793b1
Summary:
In a future diff we will be changing the format of history pack entries
to have a variable length file path in them. Before doing that, let's remove the
places that depend on each entry being exactly 80 characters.
Test Plan: Ran the tests
Reviewers: #mercurial, mitrandir, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3262154
Signature: t1:3262154:1462820474:b7f5666001b3b7f8863b4de4826266204f3e87aa
Summary:
These APIs weren't actually used, and the questions can be answered via
the existing getancestors() api anyway.
They were originally put in place because they are the type of question that
doesn't require the full ancestor tree, so we could answer them without doing a
traversal. In an upcoming patch we add the concept of copyfrom back into the
historypack, and getparents becomes confusing since it doesn't expose knowledge
of copy information. So I just decided to delete it all until we need it.
In the future we may want a 'gethistoryinfo(filename, node)' api that just
returns (p1, p2, linknode, copyfrom), to fulfill that original need of history
information without a full ancestor traversal.
Test Plan: Ran the tests
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3261734
Signature: t1:3261734:1462413665:987c4703e53468a75346aa323188107a5c070fde
Summary:
The mercurial atomic temp logic makes some assumptions about modes and copies
the mode bits from the existing file. If the existing file is readonly, that
means the atomic temp fails to open the file for writing after it's initially
created.
Since we only actually need about 4 lines from the atomic temp code, I just
implemented it on our end, with custom logic for mode handling.
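Those few lines amount to roughly this (a sketch of an assumed helper, not Mercurial's atomictemp):

```python
import os
import tempfile

def atomicwrite(path, data, mode=0o444):
    """Write to a temp file in the same directory, set the mode
    explicitly instead of copying it from a possibly read-only existing
    file, then rename into place (a minimal atomic-replace sketch)."""
    dirpath = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirpath)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.chmod(tmp, mode)
        os.rename(tmp, path)  # atomic on POSIX
    except Exception:
        os.unlink(tmp)
        raise
```

Because the mode is set on the temp file before the rename, replacing an existing read-only file works, which is exactly where copying the old file's mode bits went wrong.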
Test Plan: Ran the tests
Reviewers: #mercurial, ttung, quark
Reviewed By: quark
Differential Revision: https://phabricator.intern.facebook.com/D3288589
Signature: t1:3288589:1462993609:653a0a63266b9129b0e18807858e5de02310ecf9
Summary:
This allows triggering a repack that can be run in the background. In the future
we will trigger this automatically under certain circumstances (like too many
pack files).
Test Plan: Added a test
Reviewers: #mercurial, ttung, quark
Reviewed By: quark
Subscribers: quark
Differential Revision: https://phabricator.intern.facebook.com/D3261161
Signature: t1:3261161:1462398568:5ae25f3e5a9acd0f4b34490b34a62be33cc69e3c
Summary:
This adds a lock that limits us to running only one repack at a time. We also
add a simple prerepack hook to allow the tests to insert a sleep to test this
functionality.
Test Plan: Added a test
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3260428
Signature: t1:3260428:1462393311:3e1bf5dd047e7f3521679ca7640b448f5e784913
Summary:
Since recent packs are likely to contain newer data, let's put them at the
front of the pack list so they are checked first.
In a future diff I'll come back and refactor the common code between datapack and
historypack into one base class.
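The ordering rule can be sketched as (the helper name is illustrative):

```python
import os

def sortpacksnewestfirst(paths):
    """Order pack files newest-first by mtime, so stores consult the
    packs most likely to hold recent data before the older ones."""
    return sorted(paths, key=os.path.getmtime, reverse=True)
```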
Test Plan:
Ran the tests. Used the debugger to verify that the sort order was
correct.
Reviewers: #mercurial, ttung, mjpieters
Reviewed By: mjpieters
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D3259823
Signature: t1:3259823:1462388395:8ee48a7b02c6abc079878e53c5b336675249cb91
Summary:
Some miscellaneous perf improvements:
- Get rid of unnecessary sorting
- Bail early from remotefilelog reverse filename lookup if there are no keys
- Avoid unnecessary class/enum lookups in ancestor hotpath
- Fix n^2 behavior in history repacking
- Remove unnecessary set() and tuple instantiation in hotpath
- Use __slots__ on repackentry to improve access times
Test Plan:
Ran the tests. Tested the actual perf improvements via 'hg repack' in
a large repo. The n^2 fix caused a massive perf win, but the others shaved off
a number of seconds as well.
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3257595
Signature: t1:3257595:1462392755:80209fe66c2632d2c16a2840500b0655926e7513
This fix was already applied to datapack a while ago. Basically, if the pack
doesn't have a lot of revisions, it's possible that the fanout table is sparsely
populated. Therefore we need to scan forward when looking for the end bounds of
our fanout table.
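The scan-forward fix amounts to this sketch; the real fanout is a packed binary table keyed by the first bytes of a node, but modeling it as a Python list with a sentinel for empty slots shows the idea:

```python
EMPTYFANOUT = -1  # sentinel for an unpopulated fanout slot

def fanoutbounds(fanout, prefix):
    """Return (start, end) index offsets for nodes under `prefix`.

    With few revisions the fanout table is sparse, so the slot right
    after `prefix` may be empty; scan forward to the next populated
    slot to find the end bound. end=None means scan to index end.
    """
    start = fanout[prefix]
    if start == EMPTYFANOUT:
        return None
    end = None
    for i in range(prefix + 1, len(fanout)):
        if fanout[i] != EMPTYFANOUT:
            end = fanout[i]
            break
    return (start, end)
```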
Summary:
Previously, if you ran repack twice in a row, it would actually delete your
packs, because the repack produced files with the same name as before, and the
cleanup then deleted them.
The fix is to have the stores record what files they produced in the ledger,
then future clean up work can avoid deleting anything that was created during
the repack.
Test Plan: Added a test
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3255819
Signature: t1:3255819:1462393814:d32155b12535990f72fbe48de045eddbb6f7fab6
Summary:
Since pack files should never change after they are created, let's create them
with read-only permissions. It turns out that the Mercurial vfs doesn't apply
the correct permissions to files created by mkstemp (and we have to use mkstemp
since we don't know the name of the file until after we've written all the data
to it), so we have to manually call the permission fixing code.
We also need to fix our mmap calls to be readonly now, otherwise we get a
runtime permission denied exception.
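The mkstemp-then-fix-permissions dance plus the read-only mmap can be sketched as follows (hypothetical helper names; the real code goes through Mercurial's vfs permission fixup rather than a bare `os.chmod`):

```python
import mmap
import os
import stat
import tempfile

def writepack(directory, name, data):
    """Write pack data via mkstemp, then drop write permission.

    mkstemp creates files mode 0600 regardless of umask, so the
    read-only mode has to be applied by hand before the rename.
    """
    fd, tmppath = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    os.chmod(tmppath, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    path = os.path.join(directory, name)
    os.rename(tmppath, path)
    return path

def readpack(path):
    # The file is read-only now, so the mmap must be read-only too,
    # otherwise mapping it writable fails with a permission error.
    with open(path, 'rb') as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```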
Test Plan: Added a test
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3255816
Signature: t1:3255816:1462321201:dff4fb4c9301d67a77043ecc1d96262bb5d6a54a
Summary:
Instead of passing in a path and performing joins ourselves, let's use an
opener. This will help handle all the file permission edge cases.
Test Plan: Ran the tests
Reviewers: #mercurial, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3255165
Signature: t1:3255165:1462393836:38a28c850a0dc06838d9c17672d3dffd9903bbd7
Summary:
A future patch will add a version number to the index file. Let's move the
version size, fanout start, and index start to constants so we can more easily
change them without changing the code.
Test Plan: Ran the tests
Reviewers: lcharignon, rmcelroy, ttung, mitrandir, quark
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3254876
Signature: t1:3254876:1462315858:63fe56e8cfcdbb0209861898ce0c45c7d7b33e35
Summary:
`hg gc` is very aggressive in that it deletes any files in the cache that it
determines aren't a needed key, including files it doesn't recognize. Let's
teach it to not delete pack files.
In the future we'll need to make `hg gc` able to garbage collect the contents of
pack files as well.
Test Plan: It's probably fine!
Reviewers: lcharignon, rmcelroy, ttung, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3254744
Signature: t1:3254744:1462308257:c1f932b88abf3337370f16c05c789422ea51b0e1
Summary:
Implementing these two functions allows historypacks to be repacked, either into
a new format, or by combining multiple packs into a single new one.
Test Plan: Added a test in my next patch
Reviewers: lcharignon, ttung, rmcelroy, mitrandir, quark
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3251542
Signature: t1:3251542:1462392294:f95f7666a3a5df675f1351a19af7532c4742af2b
Summary:
Implementing these two functions will allow datapacks to be repacked (either
into other formats, or by combining multiple packs into one).
A future patch will add a test.
Test Plan: Added a test in a future patch
Reviewers: lcharignon, ttung, rmcelroy, mitrandir, quark
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3251539
Signature: t1:3251539:1462393256:7caa09677fbcaaf57a47d7a833684883483c5b3a
Summary:
Previously, given a historypack file, we had no way of reading the contents,
since we had no way to know when to stop reading the revision entries for a
given file section.
This patch changes the format to have a revision count value after the filename
and before the revisions. The documentation already described the format this
way, so it doesn't need updating.
A future patch will use this information to iterate over all the revisions in
the pack.
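A simplified serialization of one file section, assuming a length-prefixed filename, a 4-byte revision count, and fixed-size (node, p1, p2, linknode) entries (the real histpack layout has more detail; this is a sketch of why the count makes the section readable):

```python
import struct

def writesection(filename, entries):
    """Serialize one file section: filename, revision count, entries."""
    raw = struct.pack('!H', len(filename)) + filename
    raw += struct.pack('!I', len(entries))
    for node, p1, p2, linknode in entries:
        raw += node + p1 + p2 + linknode
    return raw

def readsection(data, offset=0):
    """Read one file section back; the count tells us where to stop."""
    namelen = struct.unpack_from('!H', data, offset)[0]
    offset += 2
    filename = data[offset:offset + namelen]
    offset += namelen
    count = struct.unpack_from('!I', data, offset)[0]
    offset += 4
    entries = []
    for _ in range(count):
        node = data[offset:offset + 20]; offset += 20
        p1 = data[offset:offset + 20]; offset += 20
        p2 = data[offset:offset + 20]; offset += 20
        linknode = data[offset:offset + 20]; offset += 20
        entries.append((node, p1, p2, linknode))
    return filename, entries, offset
```

Without the count, a reader has no way to tell where one file's revisions end and the next filename begins.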
Test Plan: Added a test in a future patch
Reviewers: lcharignon, ttung, rmcelroy, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3251538
Signature: t1:3251538:1462393282:f46b50e79237bfa8a25ff1957344588622b2699a
Summary:
In a later patch we will need to add the count of revisions in a given file
section to the on-disk format. To make that easier, let's make the file section
serialization lazy, so that we will have the full list when it comes time to
count the entries.
Test Plan: added a test in a future patch
Reviewers: lcharignon, ttung, rmcelroy, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3251537
Signature: t1:3251537:1462393274:60b72a47de45f5a94f4f5a8d34b3942db0aa3fda
Summary:
Previously, hg repack would repack all the objects in all the stores and dump
the new packs in .hg/store/packs. Initially we only want to repack the shared cache
though, so let's change repack to only operate on shared stores, and to write
out the new packs to the hgcache, under the appropriate repo name.
In a future patch I'm going to go through all this store stuff and replace all
uses of os.path and direct file reads/writes with a mercurial vfs.
Test Plan:
Ran repack in a large repo and verified packs were produced in
$HGCACHE/$REPONAME/packs
Ran hg cat on a file to verify that it read the data from the pack and did not do any remotefilelog network calls.
Reviewers: lcharignon, rmcelroy, ttung, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3250213
Signature: t1:3250213:1462315927:694661795141e2c869ba661a54cea8f4b90823df
Summary:
Previously, if a repack failed, it would leave temporary pack files lying
around. By adding enter/exit functions to mutable packs, we can guarantee
cleanup happens.
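The enter/exit guarantee can be sketched with a context manager like this (class and method names are illustrative): if the body raises before `close()` renames the temp file into place, `__exit__` deletes it.

```python
import os
import tempfile

class mutablepack(object):
    """A pack under construction that cleans up after itself."""

    def __init__(self, directory):
        fd, self.tmppath = tempfile.mkstemp(dir=directory)
        self.fh = os.fdopen(fd, 'wb')
        self.finalpath = None

    def add(self, data):
        self.fh.write(data)

    def close(self, finalpath):
        """Finish the pack: flush and atomically rename into place."""
        self.fh.close()
        os.rename(self.tmppath, finalpath)
        self.finalpath = finalpath

    def __enter__(self):
        return self

    def __exit__(self, exctype, exc, tb):
        if self.finalpath is None:
            # close() never ran, so the repack failed: remove the
            # temporary file instead of leaving it behind.
            if not self.fh.closed:
                self.fh.close()
            os.unlink(self.tmppath)
```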
Test Plan: Ran repack, verified that a failure did not leave tmp files
Reviewers: rmcelroy, quark, ttung, lcharignon, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3250201
Signature: t1:3250201:1462234552:7f20260a193ed1dd858bf6e9f489ac902d859218
Summary:
Now that all the repack logic is in place, let's switch the repack
command to use the new version. This also means the repack command will now
clean up the old remotefilelog blobs once it's finished.
Test Plan:
Ran hg repack in a large repo. Verified it deleted the old
remotefilelog blobs, and verified that I could still update around the
repository without making any remotefilelog network requests.
A future diff will add standard .t mercurial tests for the repack command.
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249601
Signature: t1:3249601:1462235506:03c0d95f6a82cfc04b340b139f39c02853941a17
Summary:
We had a naive repack implementation in historypack.py. Let's move it to the
repack module and do the minor adjustments to use the new repackerledger apis.
Test Plan:
Ran hg repack in conjunction with future diffs that make use of this
api
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249587
Signature: t1:3249587:1462232544:591cd8bec09f781370896470746eae5a4489531f
Summary:
We had a naive repack implementation in datapack.py. Let's move it to the repack
module and do the minor adjustments to use the new repackerledger apis.
Test Plan: Ran it in conjunction with future diffs that make use of this api.
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249585
Signature: t1:3249585:1462232504:a00aa65afca9562a2c1456cc4ab48c50d1ba5b68
Summary:
This implements the new markledger and cleanup apis on the existing
remotefilelog stores. These apis are used to tell the repacker what each store
has, and allows each store to cleanup if its data has been repacked.
Test Plan:
Ran repack in conjunction with the future diffs that make use of
these apis.
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249584
Signature: t1:3249584:1462226133:1e8faffc9f6bf8f7c94e6e79aee8865e3c41648c
Summary:
This introduces the high level classes that will implement the generic repack
logic.
Test Plan: Ran the repack in conjunction with later commits that use these apis.
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249577
Signature: t1:3249577:1462225435:000f9cc29ae2a3d7fdbedf546c8936ef45d1e4cf
Summary:
Using range() allocates a full list, which is 2**16 entries in the fanout case.
Let's use xrange instead. This is a notable performance win when checking many
keys.
Also removed an unused variable, and used a local `index` instead of
`self._index` since this is a hotpath.
Test Plan: Ran hg repack
Reviewers: rmcelroy, ttung, lcharignon, quark, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3249563
Signature: t1:3249563:1462240834:c19d6cbf0b6237f15ca8d81e8da856752df0ec59
Summary:
This adds a basic test suite for the historypack class, and fixes some issues it
found.
Test Plan: ./run-tests.py test-historypack.py
Reviewers: mitrandir, rmcelroy, ttung, lcharignon
Reviewed By: lcharignon
Differential Revision: https://phabricator.intern.facebook.com/D3237858
Signature: t1:3237858:1461884966:c0ec90a2735255e5ef70eade09915066a7b71ee5
Summary: Adds the same check code test that upstream Mercurial uses.
Test Plan:
Ran it, and fixed all the failures. I won't land this commit until
all the failure fixes are landed.
Reviewers: #sourcecontrol, ttung, rmcelroy, wez
Reviewed By: wez
Subscribers: quark, rmcelroy, wez
Differential Revision: https://phabricator.intern.facebook.com/D3221380
Signature: t1:3221380:1461802769:19f5bdc209c05edb442faa70ae572ce31e2fbc95
Summary: Fix check code for various store related files
Test Plan: Ran the tests
Reviewers: #sourcecontrol, mitrandir, ttung
Reviewed By: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3222465
Signature: t1:3222465:1461701300:34560288be4dc921f0252d4ad8fdc9c8d9357e23
Summary: These were missing, and only needed in exception cases.
Test Plan: nope
Reviewers: #sourcecontrol, rmcelroy, ttung
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3219749
Signature: t1:3219749:1461608742:91e3a721e78188c52431b6c5d1b3ad091e249c3a
Summary:
Now that we can read and write histpack files, let's add a store implementation that
can serve packed content.
My next set of commits (which haven't been written yet) will:
- add tests for all of this
Test Plan:
Ran the tests. Also repacked a repo, deleted the old cache files,
ran hg log FILE, and verified it produced results without hitting the network.
Reviewers: #sourcecontrol, ttung, mitrandir, rmcelroy
Reviewed By: mitrandir, rmcelroy
Subscribers: rmcelroy, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3219765
Signature: t1:3219765:1461717992:9b2e8646c0555472fa00ee7059c0f283fd4c2c65
Summary:
The previous patch added logic to repack store history and write it to
a histpack file. This patch adds a pack reader implementation that knows how to
read histpacks.
Test Plan:
Ran the tests. Also tested this in conjunction with the next patch
which actually reads from the data structure.
Reviewers: #sourcecontrol, ttung, mitrandir, rmcelroy
Reviewed By: mitrandir, rmcelroy
Subscribers: rmcelroy, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3219764
Signature: t1:3219764:1461718081:9d812b6aea87fe9eb48fdac9dbef282e4775c3c9
Summary:
This is an initial implementation of a history pack file creator and a repacker
class that can produce it. A history pack is a pack file that contains no file
content, just history information (parents and linknodes).
A histpack is two files:
- a .histpack file consisting of a series of file sections, each of which
contains a series of revision entries (node, p1, p2, linknode)
- a .histidx file containing a filename based index to the various file sections
in the histpack.
See the code for documentation of the exact format.
Test Plan:
ran the tests. A future diff will add unit tests for all the new pack
structures.
Ran `hg repack` on a large repo. Verified pack files were produced in
.hg/store/packs. In a future diff, I verified that the data could be read
correctly.
Reviewers: #sourcecontrol, mitrandir, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: mitrandir, rmcelroy, mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D3219762
Signature: t1:3219762:1461751982:e7bbc65e8f01c812fc1eb566d2d48208b0913766
Summary:
This forces the revisions in the datapack to be added in alphabetical order.
This makes the algorithm more deterministic, but otherwise has little effect.
Test Plan: Ran the tests, ran repack
Reviewers: #sourcecontrol, rmcelroy, ttung
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3219760
Signature: t1:3219760:1461687720:7be5fdc1419f8214c8c83074494b33214b3684ae
Summary:
Now that we can read and write datapack files, let's add a store implementation that
can serve packed content. With this patch, it's technically possible for someone
to prefetch and repack large portions of history for long term storage with
remotefilelog.
My next set of commits (which haven't been written yet) will:
- add tests for all of this
- add an indexpack format for packing ancestor metadata (the datapack only packs
revision content)
Test Plan:
Ran the tests. Also repacked a repo, deleted the old cache files, ran
hg up null && hg up master, and verified it checked out master with the
right files and without fetching blobs from the server.
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205351
Signature: t1:3205351:1461751649:45a56b57d962a282aeef9478500a3b23495a0eb7
Summary:
The previous patch added logic to repack store contents and write it to a
datapack file. This patch adds a new store implementation that knows how to read
datapacks.
It's just a simple implementation without any parallelism, so there's room for
improvement.
Test Plan:
Ran the tests. Also tested this in conjunction with the next patch
which actually reads from the data structure.
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205342
Signature: t1:3205342:1461750967:84377517cb1f285d37694a3f503d60ae85bacb66
Summary:
This is an initial implementation of a repack algorithm that can read data from
an arbitrary store (in this case the remotefilelog content store), and repack it
into a datapack.
A datapack is two files:
- a .datapack file consisting of a series of deltas (a delta may be a full text if the delta base is the nullid)
- a .dataidx file consisting of delta information and an index into the deltas
See the code for documentation of the exact format.
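A simplified sketch of one .datapack entry, assuming a length-prefixed filename, 20-byte node and delta base, and a length-prefixed delta (a full text is just a delta whose base is the nullid); the exact on-disk layout lives in the code:

```python
import struct

def packdataentry(filename, node, deltabase, delta):
    """Serialize one datapack entry."""
    return (struct.pack('!H', len(filename)) + filename
            + node + deltabase
            + struct.pack('!Q', len(delta)) + delta)

def unpackdataentry(data):
    """Parse one datapack entry back into its fields."""
    namelen = struct.unpack_from('!H', data, 0)[0]
    offset = 2
    filename = data[offset:offset + namelen]
    offset += namelen
    node = data[offset:offset + 20]
    offset += 20
    deltabase = data[offset:offset + 20]
    offset += 20
    deltalen = struct.unpack_from('!Q', data, offset)[0]
    offset += 8
    delta = data[offset:offset + deltalen]
    return filename, node, deltabase, delta
```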
Test Plan:
ran the tests
Ran `hg repack` in a large repo. Verified that a datapack and a dataidx file
were created in .hg/store/packs. The datapack used 148MB instead of the 439MB the
old remotefilelog storage used.
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205334
Signature: t1:3205334:1461751366:ee4bf6a580ffb667071a8046fda6f0858b7f25ae
Summary:
This adds an api to the store contract that allows the store to return a list of
the name/node pairs that it contains. This will be used to allow a repack
algorithm to list the contents of the store so it can repack it into another
store. The old remotefilelog blob store used namehash+node keys, which is
different from the new store API's name+node keys, so the getfiles()
implementation here has to perform a reverse namehash->name lookup so it can
satisfy the store API contract.
In the remotefilelog basestore implementation, it reads the file names from the
local data directory and the shared cache directory, and reverse resolves the
file name hashes into filenames to produce the list.
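The reverse resolution boils down to hashing every candidate filename and comparing, as in this sketch (`reverselookup` is a hypothetical name; the real code derives the candidate list from the store directories):

```python
import hashlib

def reverselookup(filenames, namehash):
    """Resolve a sha1(filename) hex digest back to a filename by
    hashing each known filename. Linear scan; fine for a sketch."""
    for name in filenames:
        if hashlib.sha1(name).hexdigest() == namehash:
            return name
    return None
```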
Test Plan: ran the tests
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205321
Signature: t1:3205321:1461751437:a7c44c2bbe153122a3b85b8d82907a112cf77b1a
Summary:
The old store api required that each store be able to return the complete
ancestor history for a given name/node pair. This patch allows a store to return
only the parts of history it knows about, and the union store will combine that
history with the history from other stores to produce the full result. This is
useful for stores like bundle files, where they contain only a partial history
that needs to be annotated by the real store.
Test Plan: ran the tests
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3205319
Signature: t1:3205319:1461751511:210740b82cc6767b2f0c393715ac93d8f1b96bc7
Summary:
The old store contracts required that every store be able to produce the full
text for a revision. This patch modifies the contract so that a store (like a
bundle file store) can serve a delta chain and the union store can combine delta
chains from multiple stores together to create the final full text.
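The combining logic might look like this sketch, assuming each store exposes a `getdeltachain(name, node)` that returns a partial chain of `(node, deltabase, delta)` tuples and raises `KeyError` when it has nothing: the union store keeps querying from where the last partial chain left off until it reaches the nullid base, i.e. a full text.

```python
NULLID = b'\x00' * 20

def getdeltachain(stores, name, node):
    """Assemble a full delta chain by stitching together the partial
    chains held by each store."""
    chain = []
    while node != NULLID:
        for store in stores:
            try:
                partial = store.getdeltachain(name, node)
                break
            except KeyError:
                continue
        else:
            raise KeyError((name, node))
        chain.extend(partial)
        # Continue the walk from the last delta's base.
        node = partial[-1][1]
    return chain
```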
Test Plan: ran the tests
Reviewers: #sourcecontrol, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy
Differential Revision: https://phabricator.fb.com/D3205315
Signature: t1:3205315:1461669845:3eb8968566285f6221c7c44435b855cc65da33f4
Summary:
Instead of hard coding the list of stores in each union store, let's make it a
list and just test each store in order. This will allow easily adding new stores
and reordering the priority of the existing ones.
Also fix the remote store's contains function. 'contains' is the old name, and
it now needs to be getmissing in order to fit the store contract.
Test Plan: ran the tests
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Differential Revision: https://phabricator.fb.com/D3205314
Signature: t1:3205314:1461606028:3a513ac82c5de668a7e40bbf7cc88d8754e2f0bb
Summary:
A future patch is going to change the union store to just contain an ordered
list of stores. Therefore we need a special spot to record which store is the
one that should receive writes.
Test Plan: ran the tests
Reviewers: #sourcecontrol
Differential Revision: https://phabricator.fb.com/D3205307
Summary:
This is a generic topological sort and will be useful in the upcoming repacking
code.
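A minimal depth-first version of such a topological sort, yielding every node after all of its in-set parents (a sketch; the real helper's name and signature may differ, and it would avoid recursion for deep graphs):

```python
def toposort(nodes, parentfunc):
    """Return the nodes ordered so each appears after its parents.

    `parentfunc(node)` returns that node's parents; parents outside
    `nodes` are ignored.
    """
    result = []
    seen = set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in parentfunc(node):
            if parent in nodes:
                visit(parent)
        result.append(node)

    for node in nodes:
        visit(node)
    return result
```

In repacking this lets revisions be written parents-first, which keeps delta bases available before the revisions that build on them.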
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3204124
Signature: t1:3204124:1461260520:e1cb5c9d496f11e5f44e0cdbc5ba851b1573d2e1
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221375
Signature: t1:3221375:1461648312:7dbdd59e6370cb32b90d864a623d8066028741e7
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221373
Signature: t1:3221373:1461648284:23203c17f4a87e33ff4e9be17a8b99bddbcdff05
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221371
Signature: t1:3221371:1461648217:e9702d761ab8fd6f85dee60a4c192cf25e784f11
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221369
Signature: t1:3221369:1461648197:185cbbba61a9d1a7a1beacd64153185d0d0826ed
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221366
Signature: t1:3221366:1461648117:088f3a5837393499e1a383af860bd1a935e0cba7
Summary: Fix failures found by check-code.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D3221365
Signature: t1:3221365:1461646159:efeb0478c66cbd49d4a0a6c02a79d530b42f8248
Summary: Apparently we need to `import errno` in `shallowutil.py`
Test Plan: Code Review
Reviewers: #sourcecontrol, ttung, durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D3195117
Signature: t1:3195117:1461031210:424912a96448a2a8cb37197f006cfa95d4ab1cb1
The recent refactor caused remotefilelog.size() to include rename metadata in
the size count, which meant the size didn't match what the rest of Mercurial
expected. This caused clean files to show up as dirty in hg status if they had a
'lookup' dirstate state and were renames.
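The fix amounts to stripping the copy/rename metadata header before counting, following Mercurial's filelog convention that metadata is delimited by `\x01\n` markers at the start of the text (a sketch of the idea, not the actual patched method):

```python
def filesize(text):
    """Size as the rest of Mercurial expects: exclude the copy/rename
    metadata block, which is wrapped in \\x01\\n markers, from the
    byte count."""
    if text.startswith(b'\x01\n'):
        end = text.index(b'\x01\n', 2)
        text = text[end + 2:]
    return len(text)
```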
Summary:
We've received a few complaints that receivemissing is throwing corrupt data
exceptions. My best guess is that we're not receiving all of the data for some
reason. Let's add an assertion to ensure all the data is present, so we can
narrow it down to a connection issue instead of actual corrupt data.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D3136203
This was meant to be part of the previous stack of commits, but I pushed the
wrong stack. This patch addresses a number of code review feedback points, the
most visible being to rename 'contains' to something else (in this case
'getmissing').
The old way of fetching from the server required the base store api expose a way
for outside callers to add fetch handlers to the store. This exposed some of the
underlying details of how data is fetched in an unnecessary way and added an
awkward subscription api.
Let's just treat our remote caches as another store we can fetch from, and
require that the overarching configuration logic (in shallowrepo.py) connect
all our stores together in a union store.
The last major piece of functionality that needs to be moved into the new store
is the gc algorithm. This is just a copy paste of the one that exists in
localcache.