Commit Graph

21 Commits

Author SHA1 Message Date
Tony Tung
30ba9cdd24 [cdatapack] madvise the memory away
Summary: Once we're done reading the delta data, we madvise it away.

Test Plan:
dump all the hashes from a datapack into a separate file.  then run a script to fetch all the delta chains.  observed that the memory footprint did not increase significantly.

```
#!/usr/bin/env python

import binascii

import cdatapack

dp = cdatapack.datapack('/dev/shm/hgcache/fbsource/packs/8b5d28f7a5bd7391a0b060c88af8cca3af357c24')

for ix, line in enumerate(open('/tmp/hashes', 'r')):
    line = line.strip()
    dp.getdeltachain(binascii.unhexlify(line))
```

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: durham, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3686830

Signature: t1:3686830:1470716333:e8fc8e3e3fa29c1931f69222c17e10f519a4a8c2
2016-08-15 11:39:14 -07:00
Tony Tung
a401e1db1c [cdatapack] free the data segments allocated for delta chains
Summary: Since we allocate the memory for the uncompressed data on the heap, we need to free it.

Test Plan: compiles.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3686334

Signature: t1:3686334:1470716123:10fae5169e73ab581b282f771d188f804198d169
2016-08-09 13:48:32 -07:00
Tony Tung
bbeadc1631 [cdatapack] wire up getdeltachain
Summary: Call the getdeltachain in C, format the results for python.

Test Plan: pass test-repack.t

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: durham, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3671744

Signature: t1:3671744:1470382442:efae340c8cf5b173407c909c0816bf26704c7bf5
2016-08-05 11:44:19 -07:00
Tony Tung
4a9136f325 [cdatapack] optimize fanout table lookups by doing the calculations at load time
Summary: Do the divides when we load up the index table.  Lookups are now cheaper.

Test Plan: pass test-repack.t

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3671719

Signature: t1:3671719:1470382246:937fd19f60ad71304e6a75f348fa92f067aba895
2016-08-05 11:43:39 -07:00
Tony Tung
e45027053c [cdatapack] iterator should return deltalen and not delta
Summary: This is to match the Python API.

Test Plan: used in later diff to pass test-datapack.t

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3668366

Signature: t1:3668366:1470341366:7c77684e646fe31a742e76158133597b46307797
2016-08-04 13:52:55 -07:00
Tony Tung
437072e8ec [cdatapack] fix bug in following deltabasenode pointers
Summary: The raw index is a byte offset, not an entry number.  I hope the compiler is smart enough to optimize out the divide and multiply. :)

Test Plan: cdatapack_get on a delta chain that has a deltabasenode does not crash!

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3667033

Signature: t1:3667033:1470341429:b37da6c9ea6e37fe79b48ec6766c857b5e56c36a
2016-08-04 13:52:41 -07:00
Tony Tung
4fc0694b79 [cdatapack] use int for string length
Summary: Unless `PY_SSIZE_T_CLEAN` is defined, the length is int.

Test Plan: compiles.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3666998

Signature: t1:3666998:1470341329:3ddd1f7e36389aff3e18482db4d0b28ab8d7c12f
2016-08-04 13:52:24 -07:00
Tony Tung
66ffd84e1d [cdatapack] add some error handling
Summary:
Replace some TODOs with actual error handling code.

Also lumped in typo fixes and style changes.  Sorry.

Test Plan: Used in a later diffs to pass test-datapack.t

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3666875

Signature: t1:3666875:1470341306:67cd439341ad30fcd690ff2e399e8beacea1c0bb
2016-08-04 13:51:45 -07:00
Tony Tung
66368c07a1 [cdatapack] fix bugs in the fanout table
Summary:
1. When bisecting, we don't want to wrap around.  If middle == 0 and we're lesser than that, we should just fail.
2. large fanout should be header->config & LARGE_FANOUT.  | means it's always a large fanout.
3. the format of the fanout table on disk makes it impossible to differentiate between an empty fanout entry and the first fanout entry, as they are both '0'.  Therefore, any entry before the *second* fanout entry must implicitly search the 0th element.
4. fixed a bug in the calculation of the last index entry.

Test Plan: passed test-datapack.t with other fixes applied.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: durham, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3666770

Signature: t1:3666770:1470341277:3f4f63a365e8bb0f4da6e574fc7f15228877c682
2016-08-04 13:51:07 -07:00
Tony Tung
8237209f77 [cdatapack] stricter const
Summary: We don't ever need to modify the node sha data, so make it const.

Test Plan: compiles

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3666714

Signature: t1:3666714:1470340104:080ffa290a49388e797dcefc66976f6341932b76
2016-08-04 13:50:00 -07:00
Tony Tung
c299d186f4 [cdatapack] implement _find()
Summary: Depends on D3660087, D3654810

Test Plan:
```
#!/usr/bin/env python

import binascii

import cdatapack

a = cdatapack.datapack('d864669a5651d04505ec6e5e9dba1319cde71f7b')

bin = binascii.unhexlify('f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e')
print a._find(bin)
```

yields:

```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:eaf5d75> python foo.py
('\xf2\xe5?\x83\xc5\xdc\x80j\xa2\xed\xa8{\xb1_\xe06{\xaf:~', 4294967295, 285122348L, 8374L)
```

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3660091

Signature: t1:3660091:1470339492:21a7f3067bda7822c8f396120f99f1dc6e4e26b5
2016-08-04 13:49:19 -07:00
Tony Tung
9b036e561e [cdatapack] expose the find interface
Summary: Needed if we want to do a hybrid implementation of cdatapack

Test Plan: used in following diff.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3660087

Signature: t1:3660087:1470339373:4e8b548f1509af7f34d0a4bf8bd85723f38d238d
2016-08-04 13:48:32 -07:00
Tony Tung
113f23b65e [cdatapack] implement iteritems()
Summary: `iteritems()` differs from `__iter__()` slightly in that it yields the delta base and delta.

Test Plan:
run this toy program

```
#!/usr/bin/env python

import cdatapack

a = cdatapack.datapack('d864669a5651d04505ec6e5e9dba1319cde71f7b')
for x in a.iterentries():
    print x[0], repr(x[1]), repr(x[2]), len(x[3])
for x in a:
    print x[0], repr(x[1])
```

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3659133

Signature: t1:3659133:1470339839:dbdce5990a30ffe019ccc44fce97925b64524acd
2016-08-04 13:48:21 -07:00
Tony Tung
a28137668e [cdatapack] skeleton for iterator type
Summary: cdatapack now has a getiter function, and it returns a cdatapack_iterator.

Test Plan:
using this toy program, dumped a pack file.

```
#!/usr/bin/env python

import cdatapack

a = cdatapack.datapack('d864669a5651d04505ec6e5e9dba1319cde71f7b')
for x in a:
    print x
```

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: durham, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3659005

Signature: t1:3659005:1470339657:aa39cc57a669b9bc4604933ce35ed20b3f81b468
2016-08-04 13:48:04 -07:00
Tony Tung
e8da5b62df [cdatapack] fix build on linux hosts
Summary:
1. Get ntohl from arpa/inet.h as per the posix spec
2. Get ntohll from endian.h's be64toh

Test Plan: make local

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3671211

Signature: t1:3671211:1470341382:e6b0fe12094246aeb6be09252122bde9680e4599
2016-08-04 13:23:11 -07:00
Tony Tung
7b70d8c572 [cdatapack] skeleton for the python type
Summary:
This is the skeleton for the python type.  Only initialization and the destructor are filled in.

Depends on D3654786.

Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:0478c29> ls -l d864669a5651d04505ec6e5e9dba1319cde71f7b*
-r--r--r--  1 tonytung  staff     947666 Jul 26 14:08 d864669a5651d04505ec6e5e9dba1319cde71f7b.dataidx
-r--r--r--  1 tonytung  staff  285130722 Jul 26 14:08 d864669a5651d04505ec6e5e9dba1319cde71f7b.datapack
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:0478c29> cat foo.py
#!/usr/bin/env python

import cdatapack

a = cdatapack.datapack('d864669a5651d04505ec6e5e9dba1319cde71f7b')
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:0478c29> python foo.py
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:0478c29>
```

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3654810

Signature: t1:3654810:1470175451:c2d4e4acc138685c1030ed98f0afd9379f9fa0c4
2016-08-03 15:29:44 -07:00
Tony Tung
76f5986ab9 [cdatapack] adds an initial cpython file for the cdatapack implementation
Summary: Just a simple module declaration with no logic yet.

Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:2445a3a> make local

<output snipped>

[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:2445a3a> python
Python 2.7.11 (default, Mar  1 2016, 18:40:10)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cdatapack
>>>
```

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3654786

Signature: t1:3654786:1470175354:c7e8847dcc74c83483d21888ad30cd9242fb461c
2016-08-03 15:29:29 -07:00
Tony Tung
58e395d2e6 [cdatapack] utility to retrieve and checksum the delta chain
Summary:
Given a node sha, find it in the index file and retrieve the deltas.  Checksum the data and dump it.

Depends on D3637000, D3636945

Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351> /Users/tonytung/Library/Caches/CLion2016.2/cmake/generated/cdatapack-64b7828e/64b7828e/Debug0/cdatapack_get  d864669a5651d04505ec6e5e9dba1319cde71f7b  f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e

source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node                                      Delta Base                                Delta SHA1                                Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e  0000000000000000000000000000000000000000  f32b366a6c44430df6526133f82f9638426ba9c5  37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b --node f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e

source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node                                      Delta Base                                Delta SHA1                                Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e  0000000000000000000000000000000000000000  f32b366a6c44430df6526133f82f9638426ba9c5  37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351>
```

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3637416

Signature: t1:3637416:1470094723:bce7e903cd0b80c293e16b7532c49e552d3039ef
2016-08-03 15:29:01 -07:00
Tony Tung
974339f97e [cdatapack] fix index retrieval bugs
Summary:
1. offsets are absolute byte offsets.  convert them to entry offsets to make the bisect code a lot simpler.
2. when writing entries to pack chain, we need to advance the pointer.

Depends on D3627122

Test Plan: used in later diff.

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3637000

Signature: t1:3637000:1469741885:c2416a3b30e5bb2b64e6bb7062f4c02098be91eb
2016-08-01 14:18:35 -07:00
Tony Tung
20126b3bd8 [cdatapack] fix memory handling for cdatapack
Summary:
`->index_table` is not heap-alloacted.  however, `->fanout_table` is and should be released.

Also added call to `close_datapack()` at the end of `cdatapack_dump.c`.

Depends on D3627122

Test Plan: valgrind is much happier now.

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3631368

Signature: t1:3631368:1469741779:e0c4e5d59c7e73c8aa3507901df3005383f0d3f5
2016-08-01 14:11:16 -07:00
Tony Tung
705c0731b6 [remotefilelog] initial checkin of a c datapack parser
Summary: This is not yet complete, but seems to be able to parse a data file.

Test Plan:
`/Users/tonytung/Library/Caches/CLion2016.2/cmake/generated/cdatapack-64b7828e/64b7828e/Debug/cdatapack_dump d864669a5651d04505ec6e5e9dba1319cde71f7b > /tmp/2`

compare it with the output of `hg debugdatapack --long d864669a5651d04505ec6e5e9dba1319cde71f7b > /tmp/1`

and it exactly matches.

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3627122

Signature: t1:3627122:1470085301:c9b9e8b2fa57bb7a09dd56d3c811ff8eadbb85ba
2016-08-01 14:05:37 -07:00