Summary: Directly pass in the path + len and the node. Note that the path is now a char* + len, because this allows us to use the path in treemanifest_find directly, rather than to construct a new path.
Test Plan: run existing perftest without crash.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3738280
Signature: t1:3738280:1471890923:c13283f1c61dc020ba1918ee9b25c24dfd2fc19b
Test Plan: used in later diff.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3732102
Signature: t1:3732102:1471905481:000ff5d976c348bac9f993f9c0b56f4b9c8b84f0
Summary:
I like many small files.
There is one place where I'm making a functional change (convert.h) to satisfy angry compilers.
Test Plan: make local.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3732584
Signature: t1:3732584:1471542758:d0b7804753ea4fd39a507090338ae3c5104dc7fa
Summary: For adding new children to Manifests, we need to be able to create new ManifestEntries. Since these ManifestEntries will not be backed by datapack data structures, we need ManifestEntries that have its own memory allocation.
Test Plan: used in later diff.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3732101
Signature: t1:3732101:1471541200:17b60af0977109757610168637a276f5dd999f8b
Summary: D3730823 removes the need for it.
Test Plan: compiles
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mathieubaudet, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3730880
Signature: t1:3730880:1471467917:2b202c5cab1a10fcfe5899670b11bc44740983f7
Summary:
If a manifest has already been loaded for a ManifestEntry, we should cache that entry and reuse it. Two reasons:
1) if someone makes a modification to a tree, we need to persist that.
2) better performance.
Missing in this diff: memory cleanup
Test Plan: compiles
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3725842
Signature: t1:3725842:1471401136:cbf4a987c35ea19ca432059cc15e299f0aa5568b
Summary: This allows us to use ManifestFetcher inside the ManifestEntry class.
Test Plan: compiles
Reviewers: #fastmanifest, durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3725838
Summary: This will allow us to replace portions of manifests with in-memory representations.
Test Plan: existing script doesn't crash.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3725831
Signature: t1:3725831:1471400740:4d9891c01f8567f4ceab76d6bd36e7dc595de4a6
Summary: `parseptr` refers to where the parsing is going next. This simplifies the code and makes ManifestEntry less tied to the original memory allocation.
Test Plan: the existing test i've been running doesn't crash.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3717940
Signature: t1:3717940:1471383988:9b718e137b8ffaf8aad09f78f15791eec57fce6e
Summary: Parse manifests when we construct the Manifest object. This allows us to do manipulations to the Manifest object and not have to deal with a parallel set of data structures.
Test Plan:
`PYTHONPATH=~/work/mercurial/facebook-hg-rpms/remotefilelog:~/work/mercurial/facebook-hg-rpms/fb-hgext/:~/work/mercurial/facebook-hg-rpms/remotenames/ valgrind ~/work/mercurial/facebook-hg-rpms/hg-crew/hg --config extensions.perftest=~/work/mercurial/facebook-hg-rpms/remotefilelog/tests/perftest.py testtree --kind flat,ctree,fast --test fulliter,diff,find --build "@~2::@"`
note that I'm running this with Valgrind.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3717936
Signature: t1:3717936:1471383856:012634c1e59f1da9fc1e5a918e7f7d99d30d6992
Summary: It's not necessary for the direction we're going in.
Test Plan: compiles
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3717930
Signature: t1:3717930:1471383901:d8bfddb7ea7fb08d26315cbfce7af65d14662bc8
Summary:
We need manifest objects to be able to stick around in memory, because now they have overrides and all that other good stuff.
This probably introduces a metric ton of memory leaks, but we'll slowly whittle them down.
Test Plan: same script.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3717304
Signature: t1:3717304:1471385439:9269ab248d233a970c6725dbb4ca3eb661f6a96e
Summary: xxx is the C++ class, py_xxx is the python wrapper for it.
Test Plan: make local
Reviewers: #fastmanifest, akushner
Reviewed By: akushner
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3701536
Signature: t1:3701536:1471150787:5f3dfc3a65360233f9ee2ea1757ed4f368f70d62
Summary: py_treemanifest is initialized to 0s. We initialize the `tm` field by explicitly calling the constructor in `treemanifest_init` and we destroy everything by explicitly calling the destructor in `treemanifest_dealloc`
Test Plan: `PYTHONPATH=~/work/mercurial/facebook-hg-rpms/remotefilelog:~/work/mercurial/facebook-hg-rpms/fb-hgext/:~/work/mercurial/facebook-hg-rpms/remotenames/:~/work/mercurial/facebook-hg-rpms/lz4revlog/ /opt/local/bin/python2.7 ~/work/mercurial/facebook-hg-rpms/hg-crew/hg --config extensions.perftest=~/work/mercurial/facebook-hg-rpms/remotefilelog/tests/perftest.py testtree --kind flat,ctree,fast --test fulliter,diff,find --build "master~5::master"` in fbsource
Reviewers: #fastmanifest
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3730823
Summary: If the flag is not present, then `flag` field is set to NULL. In that case, the current code will segfault. Now we will assign `\0` to `*resultflag`.
Test Plan:
run `PYTHONPATH=~/work/mercurial/facebook-hg-rpms/remotefilelog:~/work/mercurial/facebook-hg-rpms/fb-hgext/:~/work/mercurial/facebook-hg-rpms/remotenames/ ~/work/mercurial/facebook-hg-rpms/hg-crew/hg --config extens
.perftest=~/work/mercurial/facebook-hg-rpms/remotefilelog/tests/perftest.py testtree --kind flat,ctree,fast --test fulliter,diff,find --build "@~2::@"`
without crashing.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3700352
Summary: This is the first step in disentangling the C++ code from the python interface.
Test Plan: `PYTHONPATH=~/remotefilelog/build/lib.linux-x86_64-2.6/ python ~/hg/hg --config extensions.remotefilelog=~/remotefilelog/remotefilelog --config extensions.perftest=~/remotefilelog/tests/perftest.py testtree --kind flat,ctree --test fulliter,diff,find --build "master~5000::master"`
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3700627
Signature: t1:3700627:1471382403:d8dc6dc5c295ec55878ca020b91fc0b30d930ce8
Summary: Useful if you run CLion.
Test Plan: built everything.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3700542
Signature: t1:3700542:1471382263:260eb0a1588480f1d4c798df60d559c63ed19c8a
Summary: .idea is the CLion project directory. .testtimes is dropped by the linter.
Test Plan: minimal
Reviewers: #fastmanifest, akushner
Reviewed By: akushner
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3700563
Signature: t1:3700563:1471150982:79e96a7f8930c6417fa528375884d288ab661a6a
Summary: If someone removes the pack file, getpack will throw an IOError. Catch it and don't bother trying to add the pack to the list of available packs.
Test Plan: pdb.set_trace() in this method, then remove a file while the debugger is halted. continue without a traceback.
Reviewers: #mercurial, quark
Reviewed By: quark
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3712310
Tasks: 12660285
Signature: t1:3712310:1471080740:85f2e3f49b09f348ec654362eadf5f1e689b8a19
Summary: Once we're done reading the delta data, we madvise it away.
Test Plan:
dump all the hashes from a datapack into a separate file. then run a script to fetch all the delta chains. observed that the memory footprint did not increase significantly.
```
#!/usr/bin/env python
import binascii
import cdatapack
dp = cdatapack.datapack('/dev/shm/hgcache/fbsource/packs/8b5d28f7a5bd7391a0b060c88af8cca3af357c24')
for ix, line in enumerate(open('/tmp/hashes', 'r')):
line = line.strip()
dp.getdeltachain(binascii.unhexlify(line))
```
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3686830
Signature: t1:3686830:1470716333:e8fc8e3e3fa29c1931f69222c17e10f519a4a8c2
Summary:
We actually have no idea what "hg.peer" will return and should check
if it has a "cleanup" method or not before wrapping it.
Test Plan: Code Review
Reviewers: ttung, #mercurial, rmcelroy
Reviewed By: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3703201
Signature: t1:3703201:1470924859:852eaf275c89ceced285a4b74d09938e489d9ee0
Blame Revision: D3685587
Summary:
clang has a bug where the fully qualified class name cannot be used to invoke the destructor https://llvm.org/bugs/show_bug.cgi?id=12350
This is one of the suggested workarounds.
Test Plan: compiles on darwin
Reviewers: #fastmanifest, lcharignon
Reviewed By: lcharignon
Subscribers: mathieubaudet, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3692812
Signature: t1:3692812:1470852812:09ef7de2a322034a01f5569b574c47fcc6f0c8d7
Summary: Since we allocate the memory for the uncompressed data on the heap, we need to free it.
Test Plan: compiles.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3686334
Signature: t1:3686334:1470716123:10fae5169e73ab581b282f771d188f804198d169
Summary:
The deadlock happens in sshpeer.cleanup:
```
# sshpeer.cleanup
def cleanup(self):
if self.pipeo is None:
return
self.pipeo.close() # [1]
self.pipei.close()
try:
# read the error descriptor until EOF
for l in self.pipee: # [2]
self.ui.status(_("remote: "), l)
except (IOError, ValueError):
pass
self.pipee.close()
```
Notes:
1. Normally, this closes "stdin" of the server-side ssh process so
the "ssh" process running server-side will detect EOF and end.
However, with workers calling "os.fork", there could be other
processes keeping "pipeo" open thus "ssh" won't receive EOF.
2. Deadlock happens here, if the ssh process cannot get EOF and
does not end itself, thus does not close its stderr.
The dead lock happens with these steps:
1. "hg update ..." starts. Let's call it "the master process".
2. Memcache miss happens, and a sshpeer is started by fileserverclient
pipei, pipeo, pipee get created.
3. Workers start. They inherit pipei, pipeo, pipee.
4. The master process wait for the workers without closing pipeo.
5. The workers are at the "cleanup" method.
6. The server-side "ssh" reading its stdin hangs because the master
process hasn't close pipeo, thus no EOF to its stdin.
7. The server-side "ssh" process never ends. It does not close its
stderr.
8. The workers reading from pipee never end. They never get EOF
because the server-side "ssh" won't close its stderr.
9. The master process waiting for workers never never completes.
Because workers won't exit before they read all pipee.
The patch closes pipee for forked processes to address the issue.
Ideally, we want this in sshpeer.py because it could in theory
affect other sshpeer use-cases. But let's do it for remotefilelog
for now since remotefilelog is currently the only victim of this
deadlock pattern.
Test Plan:
Add some extra debugging logs. Check the wrapped `_cleanup` gets called
and things work normally.
Reviewers: #mercurial, ttung, durham
Reviewed By: durham
Differential Revision: https://phabricator.intern.facebook.com/D3685587
Tasks: 12563156
Signature: t1:3685587:1470688591:0f4f97508699b273e17df867898d65205ee52434
This is a large refactor of the original code base. I apologize for it not being
broken up into multiple commits.
This adds the Manifest class (that represents a single Manifest directory
entry), the ManifestFetcher class (which allows fetching children manifests),
and the InMemoryManifest class (that represents an uncommitted Manifest).
The combination of these classes does a few things:
1. It removes most of the references to PythonObj from the primary algorithms.
So in the future we could use this code from other C++ code without relying on
Python.
2. It refactors the manifest access in such a way that we allow manifests to be
stored in either python strings or in memory. This opens the doors to
implementing manifest editting apis.
Summary:
Now that we have a wrapper around python objects, let's add a getattr() function
to easily retrieve attributes off the object.
Test Plan: Ran the perf test
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3674366
Summary:
This patch introduces a PythonObj class which implements the copy-constructor,
destructor, and assignment operator in a way that will manage the ref count
automatically. If we move to C++ 11 we could also implement the move constructor
and move assignment operators to make this even more efficient.
The current implementation allows implicitly converting to and from PyObject*,
which may be questionable design wise, but makes switching to and using this
class much cleaner since we can do things like `PythonObj foo = PyObject_New()`
and `PyObject_DoStuff(myPythonObj)`.
Test Plan: Ran the perf test. It succeeded, and I saw no effect on perf.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3674311
Summary:
Instead of returning NULL and propagating it up the call stack, let's throw
pyexception (which assumes the python error string has already been set) and the
top of the stack can just return NULL.
Test Plan: Ran my perf test suite
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3673804
Summary:
This implements treemanifest.diff(). It takes two manifests and iterates over
them to produce a python dictionary containing the differences.
I'm not proud of this. Just putting it up for review for completeness since it
completes the find, diff, iter trifecta. I need to refactor it to remove some of
the duplication before it gets accepted.
Test Plan:
Ran it as part of a perf test suite using diffs across various
distances. It takes 250ms to diff across 5000 commits, and 900ms to diff across
50,000 commits.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3646003
Summary:
By making this a class we can encapsulate common operations like parsing and
directory checking. The later diff that implements treemanifest_diff uses this
a lot.
Test Plan: Ran my perf tests for find and iter (and for the later patch diff).
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3673228
Summary:
The varargs style Py_BuildValue functions have no idea what type the incoming
arguments are, so it passed the hard coded ints as 32 bit instead of 64bit.
Let's explicitly cast every number being passed to that function as Py_ssize_t
instead.
Test Plan: We were seeing segfaults without this change. Now we don't.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3673217
Summary:
This implements treemanifest.find(), which takes a filename and returns the node
and flag for it, if it exists.
This isn't the prettiest function I've ever written. I need to think about how
to refactor this to unify the various traversal algorithms that are used in
treemanifest.
Test Plan:
Ran it as part of a perf test suite. It's basically 0 milliseconds in
every case.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3645967
Summary:
This implements the basic __iter__ logic. It returns an iterator that returns
every file path in the manifest.
Test Plan:
Ran a separate perf test suite to verify the performance of this. It
can iterate over 1 million files in about 550 milliseconds, assuming a fast
store.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3645890
Summary:
This defines the basic type for representing an iteration over all the keys in
the treemanifest. The next patch fill add the logic that constructs and mutates
this type as it iterates.
Test Plan: I ran a perf test in a future patch that executes all of this code.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3645768
Summary:
This adds getdata and binfromhex functions. These are common functions that will
be used throughout the implementation of the ctreemanifest.
Test Plan: Ran a perf test suit on the code in a later diff.
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3644960
Summary:
This is the basic definition of the c treemanifest type. Future patches will add
functions to this type.
Test Plan: Tested by running a perf suite on a later version of this series
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3644935
Summary: Just a simple module declaration with no logic yet.
Test Plan: Ran perf tests against a later implementation of this
Reviewers: #fastmanifest
Differential Revision: https://phabricator.intern.facebook.com/D3644720
Summary:
fastdatapack is the same as datapack. add selector in datapackstore to determine which datapack to create.
test-datapack-fast.t is the same as tset-datapack.t, except it enables fastdatapack
Test Plan: pass test-datapack.t test-datapack-fast.t
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3666932
Signature: t1:3666932:1470426499:45292064e2868caab152d9a5b788840c5f63e4e4
Summary: Call the getdeltachain in C, format the results for python.
Test Plan: pass test-repack.t
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3671744
Signature: t1:3671744:1470382442:efae340c8cf5b173407c909c0816bf26704c7bf5
Summary: Do the divides when we load up the index table. Lookups are now cheaper.
Test Plan: pass test-repack.t
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3671719
Signature: t1:3671719:1470382246:937fd19f60ad71304e6a75f348fa92f067aba895
Previously we were kicking off background repacks even for non remotefilelog
repos. Moving the repack to be inside the remotefilelog requirement check will
prevent this.