Commit Graph

27 Commits

Author SHA1 Message Date
Jun Wu
f68fc35c58 cdatapack: get be64toh defined correctly
Summary:
This is to fix the below compiling error:

```
remotefilelog/cdatapack/cdatapack.c:102:28: error: implicit declaration of function ‘be64toh’ [-Werror=implicit-function-declaration]
   packindex->data_offset = ntoh_data_offset(
                            ^^^^^^^^^^^^^^^^
```

Feature test macros must be defined before libc `#include`s.

Test Plan: `make local` and the above error disappers.

Reviewers: #sourcecontrol, ttung

Reviewed By: ttung

Subscribers: mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3797669

Signature: t1:3797669:1472671003:9f8674a14c39042d24c673e41321683807a60fab
2016-08-31 15:29:10 +01:00
Tony Tung
3bc89da6fd [cdatapack] set the return code when everything is fine.
Summary: D3786546 was broken in that it used an uninitialized return value.

Test Plan: `valgrind  /Users/tonytung/Library/Caches/CLion2016.2/cmake/generated/fb-hgext-c4fc6e6f/c4fc6e6f/RelWithDebInfo1/cdatapack_get ~/.hgcache/fbsource/packs/e70f2fc9ec08516a196a6f5f3a475b21b6cee3b2 749a00bf1a195a711de2818b23da78e7f1ee034a` is 100% clean.

Reviewers: #fastmanifest, quark

Reviewed By: quark

Subscribers: mitrandir, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3799213

Signature: t1:3799213:1472671038:51522987c9b3bebc281d17f72a422af542d3d815
2016-08-31 12:21:58 -07:00
Tony Tung
8f4429ebd2 [cdatapack] track how much memory we've paged in for reading the datapack, and madvise away if over a limit
Summary:
Things I considered:
1. doing this after a fixed number of delta chain reads (this would likely be just fine, actually).  there is the risk of a few very long/large chains blowing out the memory footprint.
2. tracking the actual memory addresses and not madvising away everything away. (unlikely to help, since it's probably the syscall + page table updates that is costly, not the address range)
3. letting FreeBSD / MACOS go nuts with MADV_FREE while limiting only linux (which uses the more costly MADV_DONTNEED)

In the end, I settled on this because we're already using this for the python version and it hasn't killed anyone.

Test Plan:
run:

```
#!/usr/bin/env python2.7

import binascii

import cdatapack

dp = cdatapack.datapack('/Users/tonytung/.hgcache/fbsource/packs/e70f2fc9ec08516a196a6f5f3a475b21b6cee3b2')

for ix, line in enumerate(open('/tmp/hashes', 'r')):
    line = line.strip()
    try:
        dp.getdeltachain(binascii.unhexlify(line))
    except:
        print "failed when retrieving " + line
        raise
```

and observed that the memory footprint stayed quite low.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3794420

Tasks: 12829688

Signature: t1:3794420:1472597468:6642f1e6cd85859dbf7cce3e4ce35d5375601be0
2016-08-30 16:21:24 -07:00
Tony Tung
12f10ef039 [cdatapack] handle quirk in lz4's treatment of 0-length inputs
Summary: When lz4 gets a 0-length input to compress, it outputs a standard header (0 byte uncompressed output, followed by 12 mystery bytes.)  When we uncompress this, `LZ4_decompress_fast` returns -1 (!).  So we treat this as an oddball case and don't declare it corrupt.

Test Plan: dump all the entries in my fbsource datapack without any corruption reports

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3794368

Signature: t1:3794368:1472597620:a1368e55b6f74882c778036a5459e92d9c026b6a
2016-08-30 16:21:11 -07:00
Tony Tung
b868c924b9 [cdatapack] remove useless pointer assignment
Summary: We're not traversing the delta chain links sequentially, so we don't need this.  Just discard and move on.

Test Plan: make local, run test from D3792849

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3794320

Signature: t1:3794320:1472597597:0a3087317c6d34f51888b69590db52d4f51bd27e
2016-08-30 16:20:53 -07:00
Tony Tung
17d236dec2 [cdatapack] error checking when we retrieve the individual delta chain links
Summary: Check for OOM and corruption (when decompressing doesn't work as expected).

Test Plan: pass hacked test-remotefilelog-datapack.py

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: durham, mitrandir, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3787665

Tasks: 12932864

Signature: t1:3787665:1472575627:1fb75fd3a2524fe1457721b0667903163acf6062
2016-08-30 11:33:28 -07:00
Tony Tung
5adef73d26 [cdatapack] don't define _BSD_SOURCE if it's already defined
Summary: On some platforms, it's defined by errno.h.  WTF.

Test Plan: make local

Reviewers: #fastmanifest, zamsden, durham

Reviewed By: durham

Subscribers: mitrandir, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3789066

Signature: t1:3789066:1472521513:1b53feab3f46b55b2b0f4d20f82b829f304f5daf
2016-08-29 21:56:17 -07:00
Tony Tung
36681987eb [cdatapack] struct type for return code for getdeltachainlink
Summary: Noop as far as the actual result codes are thus far.

Test Plan: compiles, passes rigged test-remotefilelog-datapack.py

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3787517

Tasks: 12932864

Signature: t1:3787517:1472517234:f1a9bac77588e9c3d1d64e5923efc23e5e2d0066
2016-08-29 21:51:22 -07:00
Tony Tung
e337be9c29 [cdatapack] check the version code in the datapack file
Summary:
We were only checking the version code in the index file.

Depends on D3786770

Test Plan: now we can pass a hacked up version of test-remotefilelog-datapack.py

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3786805

Tasks: 12932864

Signature: t1:3786805:1472515742:5c411b158f39c6153a3746f734037e17ba274e11
2016-08-29 21:49:52 -07:00
Tony Tung
bcc09dcfb2 [cdatapack] error handling for setting up cdatapack
Summary:
1. handle has a status field which indicates how successful it was in opening.
2. in the python abstraction layer, we inspect the status field.  we check the error code before we free the block and return control.

Test Plan: a version of test-remotefilelog-datapack.py hacked up to use cdatapack passes (except for version code mismatches in the data file; addressing that in separate diff)

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3786770

Tasks: 12932864

Signature: t1:3786770:1472514907:e907f2c8d75f1ce0a65aff2b2c97541bb5e27801
2016-08-29 21:49:26 -07:00
Tony Tung
a916da7f6d [cdatapack] remove hilariously useless noop
Summary: Not sure how this snuck in.

Test Plan: make local

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3786739

Signature: t1:3786739:1472514343:a2df360e3aa2c7cb38573031a193db16d30dc747
2016-08-29 21:48:08 -07:00
Tony Tung
bae493d432 [cdatapack] fix error handling for build_pack_chain
Summary:
1. return `pack_chain_code_t` structs since they're small.  add an error code.
2. set the code for returning the data.
3. `get_delta_chain` examines the return code and passes it back up.

Depends on D3786447

Test Plan:
make local.

tried cdatapack_dump and cdatapack_get without anything blowing up.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3786546

Tasks: 12932864

Signature: t1:3786546:1472515566:94d4df0d4a2f99d990bab5134df9b8ea934e0d7f
2016-08-29 21:47:37 -07:00
Tony Tung
adfafb792b [cdatapack] fix error handling for getdeltachain
Summary:
1. `delta_chain_t` is a pretty small structure, so I'm modifying the code to return this struct on the stack.  This removes the allocation and the error checking needs for it.
2. add a `code` field to indicate the status.

Test Plan:
make local.

unfortunately, we don't have any good tooling to introduce memory allocation failures, so this is mostly visual.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: durham, mitrandir, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3786447

Tasks: 12932864

Signature: t1:3786447:1472514697:c09956480b8a5456ffc3f2f955d6a2d2e816769f
2016-08-29 21:46:43 -07:00
Tony Tung
5a3720dbae [cdatapack] fix build issues on other platforms
Summary: Ubuntu 14 apparently requires `_BSD_SOURCE` to be defined for nteoh64 to exist.  Also, MAP_FILE is not part of POSIX mmap spec, so is not always available.  Fortunately, on the platforms we support (linux and osx), it appears to be implied by default.

Test Plan: make local

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir, mjpieters

Differential Revision: https://phabricator.intern.facebook.com/D3780057

Signature: t1:3780057:1472246090:100c38f613e67dc1c56ab6ef55a3cc7c4f703497
2016-08-26 14:17:38 -07:00
Tony Tung
30ba9cdd24 [cdatapack] madvise the memory away
Summary: Once we're done reading the delta data, we madvise it away.

Test Plan:
dump all the hashes from a datapack into a separate file.  then run a script to fetch all the delta chains.  observed that the memory footprint did not increase significantly.

```
#!/usr/bin/env python

import binascii

import cdatapack

dp = cdatapack.datapack('/dev/shm/hgcache/fbsource/packs/8b5d28f7a5bd7391a0b060c88af8cca3af357c24')

for ix, line in enumerate(open('/tmp/hashes', 'r')):
    line = line.strip()
    dp.getdeltachain(binascii.unhexlify(line))
```

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: durham, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3686830

Signature: t1:3686830:1470716333:e8fc8e3e3fa29c1931f69222c17e10f519a4a8c2
2016-08-15 11:39:14 -07:00
Tony Tung
a401e1db1c [cdatapack] free the data segments allocated for delta chains
Summary: Since we allocate the memory for the uncompressed data on the heap, we need to free it.

Test Plan: compiles.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3686334

Signature: t1:3686334:1470716123:10fae5169e73ab581b282f771d188f804198d169
2016-08-09 13:48:32 -07:00
Tony Tung
4a9136f325 [cdatapack] optimize fanout table lookups by doing the calculations at load time
Summary: Do the divides when we load up the index table.  Lookups are now cheaper.

Test Plan: pass test-repack.t

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3671719

Signature: t1:3671719:1470382246:937fd19f60ad71304e6a75f348fa92f067aba895
2016-08-05 11:43:39 -07:00
Tony Tung
437072e8ec [cdatapack] fix bug in following deltabasenode pointers
Summary: The raw index is a byte offset, not an entry number.  I hope the compiler is smart enough to optimize out the divide and multiply. :)

Test Plan: cdatapack_get on a delta chain that has a deltabasenode does not crash!

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3667033

Signature: t1:3667033:1470341429:b37da6c9ea6e37fe79b48ec6766c857b5e56c36a
2016-08-04 13:52:41 -07:00
Tony Tung
66ffd84e1d [cdatapack] add some error handling
Summary:
Replace some TODOs with actual error handling code.

Also lumped in typo fixes and style changes.  Sorry.

Test Plan: Used in a later diffs to pass test-datapack.t

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3666875

Signature: t1:3666875:1470341306:67cd439341ad30fcd690ff2e399e8beacea1c0bb
2016-08-04 13:51:45 -07:00
Tony Tung
66368c07a1 [cdatapack] fix bugs in the fanout table
Summary:
1. When bisecting, we don't want to wrap around.  If middle == 0 and we're lesser than that, we should just fail.
2. large fanout should be header->config & LARGE_FANOUT.  | means it's always a large fanout.
3. the format of the fanout table on disk makes it impossible to differentiate between an empty fanout entry and the first fanout entry, as they are both '0'.  Therefore, any entry before the *second* fanout entry must implicitly search the 0th element.
4. fixed a bug in the calculation of the last index entry.

Test Plan: passed test-datapack.t with other fixes applied.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: durham, mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3666770

Signature: t1:3666770:1470341277:3f4f63a365e8bb0f4da6e574fc7f15228877c682
2016-08-04 13:51:07 -07:00
Tony Tung
8237209f77 [cdatapack] stricter const
Summary: We don't ever need to modify the node sha data, so make it const.

Test Plan: compiles

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3666714

Signature: t1:3666714:1470340104:080ffa290a49388e797dcefc66976f6341932b76
2016-08-04 13:50:00 -07:00
Tony Tung
9b036e561e [cdatapack] expose the find interface
Summary: Needed if we want to do a hybrid implementation of cdatapack

Test Plan: used in following diff.

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3660087

Signature: t1:3660087:1470339373:4e8b548f1509af7f34d0a4bf8bd85723f38d238d
2016-08-04 13:48:32 -07:00
Tony Tung
e8da5b62df [cdatapack] fix build on linux hosts
Summary:
1. Get ntohl from arpa/inet.h as per the posix spec
2. Get ntohll from endian.h's be64toh

Test Plan: make local

Reviewers: #fastmanifest, durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3671211

Signature: t1:3671211:1470341382:e6b0fe12094246aeb6be09252122bde9680e4599
2016-08-04 13:23:11 -07:00
Tony Tung
58e395d2e6 [cdatapack] utility to retrieve and checksum the delta chain
Summary:
Given a node sha, find it in the index file and retrieve the deltas.  Checksum the data and dump it.

Depends on D3637000, D3636945

Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351> /Users/tonytung/Library/Caches/CLion2016.2/cmake/generated/cdatapack-64b7828e/64b7828e/Debug0/cdatapack_get  d864669a5651d04505ec6e5e9dba1319cde71f7b  f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e

source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node                                      Delta Base                                Delta SHA1                                Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e  0000000000000000000000000000000000000000  f32b366a6c44430df6526133f82f9638426ba9c5  37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b --node f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e

source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node                                      Delta Base                                Delta SHA1                                Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e  0000000000000000000000000000000000000000  f32b366a6c44430df6526133f82f9638426ba9c5  37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351>
```

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3637416

Signature: t1:3637416:1470094723:bce7e903cd0b80c293e16b7532c49e552d3039ef
2016-08-03 15:29:01 -07:00
Tony Tung
974339f97e [cdatapack] fix index retrieval bugs
Summary:
1. offsets are absolute byte offsets.  convert them to entry offsets to make the bisect code a lot simpler.
2. when writing entries to pack chain, we need to advance the pointer.

Depends on D3627122

Test Plan: used in later diff.

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3637000

Signature: t1:3637000:1469741885:c2416a3b30e5bb2b64e6bb7062f4c02098be91eb
2016-08-01 14:18:35 -07:00
Tony Tung
20126b3bd8 [cdatapack] fix memory handling for cdatapack
Summary:
`->index_table` is not heap-alloacted.  however, `->fanout_table` is and should be released.

Also added call to `close_datapack()` at the end of `cdatapack_dump.c`.

Depends on D3627122

Test Plan: valgrind is much happier now.

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3631368

Signature: t1:3631368:1469741779:e0c4e5d59c7e73c8aa3507901df3005383f0d3f5
2016-08-01 14:11:16 -07:00
Tony Tung
705c0731b6 [remotefilelog] initial checkin of a c datapack parser
Summary: This is not yet complete, but seems to be able to parse a data file.

Test Plan:
`/Users/tonytung/Library/Caches/CLion2016.2/cmake/generated/cdatapack-64b7828e/64b7828e/Debug/cdatapack_dump d864669a5651d04505ec6e5e9dba1319cde71f7b > /tmp/2`

compare it with the output of `hg debugdatapack --long d864669a5651d04505ec6e5e9dba1319cde71f7b > /tmp/1`

and it exactly matches.

Reviewers: durham

Reviewed By: durham

Subscribers: mitrandir

Differential Revision: https://phabricator.intern.facebook.com/D3627122

Signature: t1:3627122:1470085301:c9b9e8b2fa57bb7a09dd56d3c811ff8eadbb85ba
2016-08-01 14:05:37 -07:00