Commit Graph

370 Commits

Author SHA1 Message Date
Jun Wu
372dc6028a fileserverclient: remotefilepeer could have not implemented "cleanup"
Summary:
We actually have no idea what "hg.peer" will return and should check
if it has a "cleanup" method or not before wrapping it.

Test Plan: Code Review

Reviewers: ttung, #mercurial, rmcelroy

Reviewed By: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D3703201

Signature: t1:3703201:1470924859:852eaf275c89ceced285a4b74d09938e489d9ee0

Blame Revision: D3685587
2016-08-11 15:14:54 +01:00
Tony Tung
aa27058507 [ctree] fix build on clang
Summary:
clang has a bug where the fully qualified class name cannot be used to invoke the destructor https://llvm.org/bugs/show_bug.cgi?id=12350

This is one of the suggested workarounds.

Test Plan: compiles on darwin

Reviewers: #fastmanifest, lcharignon

Reviewed By: lcharignon

Subscribers: mathieubaudet, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3692812

Signature: t1:3692812:1470852812:09ef7de2a322034a01f5569b574c47fcc6f0c8d7
2016-08-10 13:04:38 -07:00
Tony Tung
a401e1db1c [cdatapack] free the data segments allocated for delta chains
Summary: Since we allocate the memory for the uncompressed data on the heap, we need to free it.

Test Plan: compiles.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3686334

Signature: t1:3686334:1470716123:10fae5169e73ab581b282f771d188f804198d169
2016-08-09 13:48:32 -07:00
Jun Wu
037146cbd9 Workaround an issue where sshpeer + workers lead to deadlocks
Summary:
The deadlock happens in sshpeer.cleanup:

```
  # sshpeer.cleanup
  def cleanup(self):
      if self.pipeo is None:
          return
      self.pipeo.close() # [1]
      self.pipei.close()
      try:
          # read the error descriptor until EOF
          for l in self.pipee: # [2]
              self.ui.status(_("remote: "), l)
      except (IOError, ValueError):
          pass
      self.pipee.close()
```

Notes:

  1. Normally, this closes "stdin" of the server-side ssh process so
     the "ssh" process running server-side will detect EOF and end.
     However, with workers calling "os.fork", there could be other
     processes keeping "pipeo" open thus "ssh" won't receive EOF.
  2. Deadlock happens here, if the ssh process cannot get EOF and
     does not end itself, thus does not close its stderr.

The dead lock happens with these steps:

  1. "hg update ..." starts. Let's call it "the master process".
  2. Memcache miss happens, and a sshpeer is started by fileserverclient
     pipei, pipeo, pipee get created.
  3. Workers start. They inherit pipei, pipeo, pipee.
  4. The master process wait for the workers without closing pipeo.
  5. The workers are at the "cleanup" method.
  6. The server-side "ssh" reading its stdin hangs because the master
     process hasn't close pipeo, thus no EOF to its stdin.
  7. The server-side "ssh" process never ends. It does not close its
     stderr.
  8. The workers reading from pipee never end. They never get EOF
     because the server-side "ssh" won't close its stderr.
  9. The master process waiting for workers never never completes.
     Because workers won't exit before they read all pipee.

The patch closes pipee for forked processes to address the issue.
Ideally, we want this in sshpeer.py because it could in theory
affect other sshpeer use-cases. But let's do it for remotefilelog
for now since remotefilelog is currently the only victim of this
deadlock pattern.

Test Plan:
Add some extra debugging logs. Check the wrapped `_cleanup` gets called
and things work normally.

Reviewers: #mercurial, ttung, durham

Reviewed By: durham

Differential Revision: https://phabricator.intern.facebook.com/D3685587

Tasks: 12563156

Signature: t1:3685587:1470688591:0f4f97508699b273e17df867898d65205ee52434
2016-08-08 20:13:12 +01:00
Durham Goode
f63c7b38a0 ctree: large refactor to introduce Manifest, ManifestFetcher, and InMemoryManifest
This is a large refactor of the original code base. I apologize for it not being
broken up into multiple commits.

This adds the Manifest class (that represents a single Manifest directory
entry), the ManifestFetcher class (which allows fetching children manifests),
and the InMemoryManifest class (that represents an uncommitted Manifest).

The combination of these classes does a few things:
1. It removes most of the references to PythonObj from the primary algorithms.
So in the future we could use this code from other C++ code without relying on
Python.
2. It refactors the manifest access in such a way that we allow manifests to be
stored in either python strings or in memory. This opens the doors to
implementing manifest editting apis.
2016-08-08 12:20:44 -07:00
Durham Goode
c8ca415b0e ctree: add getattr to PythonObj
Summary:
Now that we have a wrapper around python objects, let's add a getattr() function
to easily retrieve attributes off the object.

Test Plan: Ran the perf test

Reviewers: #fastmanifest

Differential Revision: https://phabricator.intern.facebook.com/D3674366
2016-08-08 12:16:25 -07:00
Durham Goode
18a54f3d14 ctree: switch all PyObjects to use new wrapper class
Summary:
This patch introduces a PythonObj class which implements the copy-constructor,
destructor, and assignment operator in a way that will manage the ref count
automatically. If we move to C++ 11 we could also implement the move constructor
and move assignment operators to make this even more efficient.

The current implementation allows implicitly converting to and from PyObject*,
which may be questionable design wise, but makes switching to and using this
class much cleaner since we can do things like `PythonObj foo = PyObject_New()`
and `PyObject_DoStuff(myPythonObj)`.

Test Plan: Ran the perf test. It succeeded, and I saw no effect on perf.

Reviewers: #fastmanifest

Differential Revision: https://phabricator.intern.facebook.com/D3674311
2016-08-08 12:16:25 -07:00
Durham Goode
81e68944ae ctree: use new pyexception type to propagate python exceptions
Summary:
Instead of returning NULL and propagating it up the call stack, let's throw
pyexception (which assumes the python error string has already been set) and the
top of the stack can just return NULL.

Test Plan: Ran my perf test suite

Reviewers: #fastmanifest

Differential Revision: https://phabricator.intern.facebook.com/D3673804
2016-08-08 12:16:25 -07:00
Durham Goode
f9a0e52da4 ctree: implement diff
Summary:
This implements treemanifest.diff(). It takes two manifests and iterates over
them to produce a python dictionary containing the differences.

I'm not proud of this.  Just putting it up for review for completeness since it
completes the find, diff, iter trifecta. I need to refactor it to remove some of
the duplication before it gets accepted.

Test Plan:
Ran it as part of a perf test suite using diffs across various
distances.  It takes 250ms to diff across 5000 commits, and 900ms to diff across
50,000 commits.

Reviewers: #fastmanifest

Differential Revision: https://phabricator.intern.facebook.com/D3646003
2016-08-08 12:16:25 -07:00
Durham Goode
031783c47f ctree: convert manifesttree struct to class
Summary:
By making this a class we can encapsulate common operations like parsing and
directory checking. The later diff that implements treemanifest_diff uses this
a lot.

Test Plan: Ran my perf tests for find and iter (and for the later patch diff).

Reviewers: #fastmanifest

Differential Revision: https://phabricator.intern.facebook.com/D3673228
2016-08-08 12:16:25 -07:00
Durham Goode
c43476ae8a ctree: explicitly cast to Py_ssize_t
Summary:
The varargs style Py_BuildValue functions have no idea what type the incoming
arguments are, so it passed the hard coded ints as 32 bit instead of 64bit.
Let's explicitly cast every number being passed to that function as Py_ssize_t
instead.

Test Plan: We were seeing segfaults without this change. Now we don't.

Reviewers: #fastmanifest

Differential Revision: https://phabricator.intern.facebook.com/D3673217
2016-08-08 12:16:25 -07:00
Durham Goode
bced440ce6 ctree: implement find
Summary:
This implements treemanifest.find(), which takes a filename and returns the node
and flag for it, if it exists.

This isn't the prettiest function I've ever written.  I need to think about how
to refactor this to unify the various traversal algorithms that are used in
treemanifest.

Test Plan:
Ran it as part of a perf test suite. It's basically 0 milliseconds in
every case.

Reviewers: #fastmanifest

Differential Revision: https://phabricator.intern.facebook.com/D3645967
2016-08-08 12:16:25 -07:00
Durham Goode
d5ae2f2fdf ctree: implement __iter__
Summary:
This implements the basic __iter__ logic. It returns an iterator that returns
every file path in the manifest.

Test Plan:
Ran a separate perf test suite to verify the performance of this. It
can iterate over 1 million files in about 550 milliseconds, assuming a fast
store.

Reviewers: #fastmanifest

Differential Revision: https://phabricator.intern.facebook.com/D3645890
2016-08-08 12:16:25 -07:00
Durham Goode
f3263cc7ab ctree: define basic fileiter type
Summary:
This defines the basic type for representing an iteration over all the keys in
the treemanifest. The next patch fill add the logic that constructs and mutates
this type as it iterates.

Test Plan: I ran a perf test in a future patch that executes all of this code.

Reviewers: #fastmanifest

Differential Revision: https://phabricator.intern.facebook.com/D3645768
2016-08-08 12:16:25 -07:00
Durham Goode
28b736554e ctree: add common helper methods
Summary:
This adds getdata and binfromhex functions. These are common functions that will
be used throughout the implementation of the ctreemanifest.

Test Plan: Ran a perf test suit on the code in a later diff.

Reviewers: #fastmanifest

Differential Revision: https://phabricator.intern.facebook.com/D3644960
2016-08-08 12:16:25 -07:00
Durham Goode
d55020433d ctree: add an initial c type definition for treemanifest
Summary:
This is the basic definition of the c treemanifest type. Future patches will add
functions to this type.

Test Plan: Tested by running a perf suite on a later version of this series

Reviewers: #fastmanifest

Differential Revision: https://phabricator.intern.facebook.com/D3644935
2016-08-08 12:16:25 -07:00
Durham Goode
aa0ad156a2 ctree: adds an initial cpython file for the ctreemanifest implementation
Summary: Just a simple module declaration with no logic yet.

Test Plan: Ran perf tests against a later implementation of this

Reviewers: #fastmanifest

Differential Revision: https://phabricator.intern.facebook.com/D3644720
2016-08-08 12:16:25 -07:00
Tony Tung
f642d24f1c [cdatapack] create a fastdatapack class
Summary:
fastdatapack is the same as datapack.  add selector in datapackstore to determine which datapack to create.

test-datapack-fast.t is the same as tset-datapack.t, except it enables fastdatapack

Test Plan: pass test-datapack.t test-datapack-fast.t

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3666932

Signature: t1:3666932:1470426499:45292064e2868caab152d9a5b788840c5f63e4e4
2016-08-05 14:35:29 -07:00
Tony Tung
bbeadc1631 [cdatapack] wire up getdeltachain
Summary: Call the getdeltachain in C, format the results for python.

Test Plan: pass test-repack.t

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: durham, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3671744

Signature: t1:3671744:1470382442:efae340c8cf5b173407c909c0816bf26704c7bf5
2016-08-05 11:44:19 -07:00
Tony Tung
4a9136f325 [cdatapack] optimize fanout table lookups by doing the calculations at load time
Summary: Do the divides when we load up the index table.  Lookups are now cheaper.

Test Plan: pass test-repack.t

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3671719

Signature: t1:3671719:1470382246:937fd19f60ad71304e6a75f348fa92f067aba895
2016-08-05 11:43:39 -07:00
Durham Goode
baaff31090 repack: move background repack into requirement check
Previously we were kicking off background repacks even for non remotefilelog
repos. Moving the repack to be inside the remotefilelog requirement check will
prevent this.
2016-08-05 10:00:22 -07:00
Tony Tung
e45027053c [cdatapack] iterator should return deltalen and not delta
Summary: This is to match the Python API.

Test Plan: used in later diff to pass test-datapack.t

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3668366

Signature: t1:3668366:1470341366:7c77684e646fe31a742e76158133597b46307797
2016-08-04 13:52:55 -07:00
Tony Tung
437072e8ec [cdatapack] fix bug in following deltabasenode pointers
Summary: The raw index is a byte offset, not an entry number.  I hope the compiler is smart enough to optimize out the divide and multiply. :)

Test Plan: cdatapack_get on a delta chain that has a deltabasenode does not crash!

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3667033

Signature: t1:3667033:1470341429:b37da6c9ea6e37fe79b48ec6766c857b5e56c36a
2016-08-04 13:52:41 -07:00
Tony Tung
4fc0694b79 [cdatapack] use int for string length
Summary: Unless `PY_SSIZE_T_CLEAN` is defined, the length is int.

Test Plan: compiles.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3666998

Signature: t1:3666998:1470341329:3ddd1f7e36389aff3e18482db4d0b28ab8d7c12f
2016-08-04 13:52:24 -07:00
Tony Tung
66ffd84e1d [cdatapack] add some error handling
Summary:
Replace some TODOs with actual error handling code.

Also lumped in typo fixes and style changes.  Sorry.

Test Plan: Used in a later diffs to pass test-datapack.t

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3666875

Signature: t1:3666875:1470341306:67cd439341ad30fcd690ff2e399e8beacea1c0bb
2016-08-04 13:51:45 -07:00
Tony Tung
66368c07a1 [cdatapack] fix bugs in the fanout table
Summary:
1. When bisecting, we don't want to wrap around.  If middle == 0 and we're lesser than that, we should just fail.
2. large fanout should be header->config & LARGE_FANOUT.  | means it's always a large fanout.
3. the format of the fanout table on disk makes it impossible to differentiate between an empty fanout entry and the first fanout entry, as they are both '0'.  Therefore, any entry before the *second* fanout entry must implicitly search the 0th element.
4. fixed a bug in the calculation of the last index entry.

Test Plan: passed test-datapack.t with other fixes applied.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: durham, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3666770

Signature: t1:3666770:1470341277:3f4f63a365e8bb0f4da6e574fc7f15228877c682
2016-08-04 13:51:07 -07:00
Tony Tung
8237209f77 [cdatapack] stricter const
Summary: We don't ever need to modify the node sha data, so make it const.

Test Plan: compiles

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3666714

Signature: t1:3666714:1470340104:080ffa290a49388e797dcefc66976f6341932b76
2016-08-04 13:50:00 -07:00
Tony Tung
c299d186f4 [cdatapack] implement _find()
Summary: Depends on D3660087, D3654810

Test Plan:
```
#!/usr/bin/env python

import binascii

import cdatapack

a = cdatapack.datapack('d864669a5651d04505ec6e5e9dba1319cde71f7b')

bin = binascii.unhexlify('f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e')
print a._find(bin)
```

yields:

```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:eaf5d75> python foo.py
('\xf2\xe5?\x83\xc5\xdc\x80j\xa2\xed\xa8{\xb1_\xe06{\xaf:~', 4294967295, 285122348L, 8374L)
```

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3660091

Signature: t1:3660091:1470339492:21a7f3067bda7822c8f396120f99f1dc6e4e26b5
2016-08-04 13:49:19 -07:00
Tony Tung
9b036e561e [cdatapack] expose the find interface
Summary: Needed if we want to do a hybrid implementation of cdatapack

Test Plan: used in following diff.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3660087

Signature: t1:3660087:1470339373:4e8b548f1509af7f34d0a4bf8bd85723f38d238d
2016-08-04 13:48:32 -07:00
Tony Tung
113f23b65e [cdatapack] implement iteritems()
Summary: `iteritems()` differs from `__iter__()` slightly in that it yields the delta base and delta.

Test Plan:
run this toy program

```
#!/usr/bin/env python

import cdatapack

a = cdatapack.datapack('d864669a5651d04505ec6e5e9dba1319cde71f7b')
for x in a.iterentries():
    print x[0], repr(x[1]), repr(x[2]), len(x[3])
for x in a:
    print x[0], repr(x[1])
```

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3659133

Signature: t1:3659133:1470339839:dbdce5990a30ffe019ccc44fce97925b64524acd
2016-08-04 13:48:21 -07:00
Tony Tung
a28137668e [cdatapack] skeleton for iterator type
Summary: cdatapack now has a getiter function, and it returns a cdatapack_iterator.

Test Plan:
using this toy program, dumped a pack file.

```
#!/usr/bin/env python

import cdatapack

a = cdatapack.datapack('d864669a5651d04505ec6e5e9dba1319cde71f7b')
for x in a:
    print x
```

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: durham, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3659005

Signature: t1:3659005:1470339657:aa39cc57a669b9bc4604933ce35ed20b3f81b468
2016-08-04 13:48:04 -07:00
Tony Tung
e8da5b62df [cdatapack] fix build on linux hosts
Summary:
1. Get ntohl from arpa/inet.h as per the posix spec
2. Get ntohll from endian.h's be64toh

Test Plan: make local

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3671211

Signature: t1:3671211:1470341382:e6b0fe12094246aeb6be09252122bde9680e4599
2016-08-04 13:23:11 -07:00
Tony Tung
7b70d8c572 [cdatapack] skeleton for the python type
Summary:
This is the skeleton for the python type.  Only initialization and the destructor are filled in.

Depends on D3654786.

Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:0478c29> ls -l d864669a5651d04505ec6e5e9dba1319cde71f7b*
-r--r--r--  1 tonytung  staff     947666 Jul 26 14:08 d864669a5651d04505ec6e5e9dba1319cde71f7b.dataidx
-r--r--r--  1 tonytung  staff  285130722 Jul 26 14:08 d864669a5651d04505ec6e5e9dba1319cde71f7b.datapack
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:0478c29> cat foo.py
#!/usr/bin/env python

import cdatapack

a = cdatapack.datapack('d864669a5651d04505ec6e5e9dba1319cde71f7b')
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:0478c29> python foo.py
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:0478c29>
```

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3654810

Signature: t1:3654810:1470175451:c2d4e4acc138685c1030ed98f0afd9379f9fa0c4
2016-08-03 15:29:44 -07:00
Tony Tung
76f5986ab9 [cdatapack] adds an initial cpython file for the cdatapack implementation
Summary: Just a simple module declaration with no logic yet.

Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:2445a3a> make local

<output snipped>

[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:2445a3a> python
Python 2.7.11 (default, Mar  1 2016, 18:40:10)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cdatapack
>>>
```

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3654786

Signature: t1:3654786:1470175354:c7e8847dcc74c83483d21888ad30cd9242fb461c
2016-08-03 15:29:29 -07:00
Tony Tung
58e395d2e6 [cdatapack] utility to retrieve and checksum the delta chain
Summary:
Given a node sha, find it in the index file and retrieve the deltas.  Checksum the data and dump it.

Depends on D3637000, D3636945

Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351> /Users/tonytung/Library/Caches/CLion2016.2/cmake/generated/cdatapack-64b7828e/64b7828e/Debug0/cdatapack_get  d864669a5651d04505ec6e5e9dba1319cde71f7b  f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e

source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node                                      Delta Base                                Delta SHA1                                Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e  0000000000000000000000000000000000000000  f32b366a6c44430df6526133f82f9638426ba9c5  37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b --node f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e

source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node                                      Delta Base                                Delta SHA1                                Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e  0000000000000000000000000000000000000000  f32b366a6c44430df6526133f82f9638426ba9c5  37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351>
```

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3637416

Signature: t1:3637416:1470094723:bce7e903cd0b80c293e16b7532c49e552d3039ef
2016-08-03 15:29:01 -07:00
Olivier Trempe
af90628992 flogheads: return an empty list when requesting heads of a non-existing filelog 2016-08-02 10:41:41 -07:00
Olivier Trempe
8005fc4b2f fileserverclient: add wireproto command for requesting a filelog's heads
Allowing discovery of all the heads of a filelog allows supporting some existing
Mercurial use cases, like viewing all the versions of a file in a UI.
2016-08-02 10:40:42 -07:00
Olivier Trempe
51e02acf2c filelogrevset: Return revset.baseset instead of plain list. Add test for kind in path. 2016-08-02 09:40:50 -07:00
Tony Tung
974339f97e [cdatapack] fix index retrieval bugs
Summary:
1. offsets are absolute byte offsets.  convert them to entry offsets to make the bisect code a lot simpler.
2. when writing entries to pack chain, we need to advance the pointer.

Depends on D3627122

Test Plan: used in later diff.

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3637000

Signature: t1:3637000:1469741885:c2416a3b30e5bb2b64e6bb7062f4c02098be91eb
2016-08-01 14:18:35 -07:00
Tony Tung
641c5f4f01 [debugcommands] return the uncompressed delta length when iterating over a datapack
Summary:
When retrieving a delta chain, datapack.py uncompresses the delta chain data.  However, when iterating over the datapack, we get the compressed length.  THis is not desirable as the output is no longer consistent.  This diff peeks into the lz4 header to get the uncompressed length when iterating.

Depends on D3627119

Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:e2ef218> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b --node f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e

source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node                                      Delta Base                                Delta SHA1                                Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e  0000000000000000000000000000000000000000  f32b366a6c44430df6526133f82f9638426ba9c5  37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:e2ef218> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b | tail -n 4

source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node          Delta Base    Delta Length
f2e53f83c5dc  000000000000  37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:e2ef218>
```

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3636945

Signature: t1:3636945:1469811243:b21d90d9599244ed4600c5336818b9a18eacf3ff
2016-08-01 14:16:29 -07:00
Tony Tung
20126b3bd8 [cdatapack] fix memory handling for cdatapack
Summary:
`->index_table` is not heap-alloacted.  however, `->fanout_table` is and should be released.

Also added call to `close_datapack()` at the end of `cdatapack_dump.c`.

Depends on D3627122

Test Plan: valgrind is much happier now.

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3631368

Signature: t1:3631368:1469741779:e0c4e5d59c7e73c8aa3507901df3005383f0d3f5
2016-08-01 14:11:16 -07:00
Tony Tung
705c0731b6 [remotefilelog] initial checkin of a c datapack parser
Summary: This is not yet complete, but seems to be able to parse a data file.

Test Plan:
`/Users/tonytung/Library/Caches/CLion2016.2/cmake/generated/cdatapack-64b7828e/64b7828e/Debug/cdatapack_dump d864669a5651d04505ec6e5e9dba1319cde71f7b > /tmp/2`

compare it with the output of `hg debugdatapack --long d864669a5651d04505ec6e5e9dba1319cde71f7b > /tmp/1`

and it exactly matches.

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3627122

Signature: t1:3627122:1470085301:c9b9e8b2fa57bb7a09dd56d3c811ff8eadbb85ba
2016-08-01 14:05:37 -07:00
Tony Tung
9e557758b0 [datapack] add --node as a parameter to dump extra data about a node
Summary:
It obtains the deltachain and dumps the chain to the console.

Depends on D3627117.

Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:3266095> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b --node ba5fbf1aba48f25d46228626917b2705adc9e7c8

arcanist/__phutil_library_map__.php
Node                                      Delta Base                                Delta SHA1                                Delta Length
ba5fbf1aba48f25d46228626917b2705adc9e7c8  0000000000000000000000000000000000000000  df442a6f976b946c266f76b0f63a198e8aabf809  3993
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:3266095>
```

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3627119

Signature: t1:3627119:1469738313:d61726585a020ed4cbabbb1f623eb202ccd51b9f
2016-07-28 17:15:21 -07:00
Tony Tung
7111de0994 [datapack] allow for long hashes to be printed
Summary:
This will help verify the C datapack reader.

Depends on D3627112.

Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:4063a65> hg debugdatapack --long  d864669a5651d04505ec6e5e9dba1319cde71f7b  | head -n 15

arcanist/__phutil_library_map__.php
Node                                      Delta Base                                Delta Length
ba5fbf1aba48f25d46228626917b2705adc9e7c8  0000000000000000000000000000000000000000  1265

arcanist/canary/FacebookConfigeratorArcanistCanaryWorkflow.php
Node                                      Delta Base                                Delta Length
142f9991fca1a16c6544cb6e5a0071296e712268  0000000000000000000000000000000000000000  6546

arcanist/lint/FacebookConfigeratorLintEngine.php
Node                                      Delta Base                                Delta Length
c8630501c45f1bc1dc47df2ee2ad354993438cdb  0000000000000000000000000000000000000000  2811

arcanist/lint/linter/FbcodePyFlake8Linter.php
Node                                      Delta Base                                Delta Length
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:4063a65> hg debugdatapack   d864669a5651d04505ec6e5e9dba1319cde71f7b  | head -n 15

arcanist/__phutil_library_map__.php
Node          Delta Base    Delta Length
ba5fbf1aba48  000000000000  1265

arcanist/canary/FacebookConfigeratorArcanistCanaryWorkflow.php
Node          Delta Base    Delta Length
142f9991fca1  000000000000  6546

arcanist/lint/FacebookConfigeratorLintEngine.php
Node          Delta Base    Delta Length
c8630501c45f  000000000000  2811

arcanist/lint/linter/FbcodePyFlake8Linter.php
Node          Delta Base    Delta Length
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:4063a65>
```

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3627117

Signature: t1:3627117:1469735318:103e9a21be082749332572c9c4f9942ea9c1c248
2016-07-28 17:07:04 -07:00
Tony Tung
3da845968f [datapack] fix computation of the paged-in size
Summary:
It should include the filelen and the deltalen fields, which are
2 and 8 bytes.

Test Plan: visual.

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3627108

Signature: t1:3627108:1469742083:ffb59768906d9e5463065eec92e1c80cc8482884
2016-07-28 17:06:50 -07:00
Olivier Trempe
a2ce732706 fileserverclient: fixed lingering ssh connection due to reference cycle on pull operations
Calling wrapfunction on the remotefilepeer(sshpeer) object in exchangepull
function introduces a reference cycle. Hence, this object will not be deleted
until the process dies. This is not a big issue for processes having a short
lifetime(e.g. lauched by command line.)
However, for persistent processes (e.g. TortoiseHg), this can lead to multiple
lingering ssh connections to the server(actually one by pull operation).

The fix is to not wrap the remotefilepeer._callstream. This method is defined
right into the remotefilepeer object. The required repo data is made available
in the remotefilepeer object by monkeypatching this object in the exchangepull
function.
2016-07-22 13:47:02 -07:00
Olivier Trempe
0368ca40fc Fix filelogrevset not properly handling "kind" in path 2016-07-22 13:09:48 -07:00
Durham Goode
df65096278 pull: add more requirement checking
In some situations the remotefilelog setup logic could be called, which will
wrap certain functions, and then later a call will happen to a repo that wasn't
remotefilelog which will run some remotefilelog code because of the wrapping.

Normally we take care of this by checking for the remotefilelog requirement. We
missed it in this one spot though.
2016-07-22 12:33:56 -07:00
Martin von Zweigbergk
0df928d828 shallowbundle: specifically compare instance to remotefilelog.remotefilelog
In two place, we were checking if a revlog was an instance of
revlog.revlog and, I think, treating it as a
remotefilelog.remotefilelog otherwise. I noticed this when I created
another non-revlog.revlog revlog in narrowhg and remotefilelog thought
it was a remotefilelog.remotefilelog. Let's specifically check if it's
a remotefilelog.remotefilelog instead.
2016-07-15 23:53:09 -07:00
Durham Goode
073f8e3d22 repack: unmap memory occasionally to reclaim space
Summary:
When running large repack operations, the resident size of the process
could become quite large, since we're scanning in entire pack files. Linux/OSX
have api calls for telling the kernel it's ok to release some of that memory,
but those apis are not exposed to python.

So instead, let's unmap and remap the mmap's once a certain amount of data has
been read. I also tried changing the mmap accessors to use the file oriented api
(mmap.read(), mmap.seek(), etc) so we could switch to actual file handles during
repack, but it had a drastic affect on normal performance (repack took 1 hour
instead of a few minutes).

Long term we should move all of this logic to c++ so we can use the more
powerful APIs.

Test Plan:
Did a full repack on a laptop and verified memory capped out at 2GB
instead of exceeding 5GB.

Reviewers: #sourcecontrol, ttung

Differential Revision: https://phabricator.intern.facebook.com/D3545171
2016-07-12 11:46:48 -07:00