Summary: The raw index is a byte offset, not an entry number. I hope the compiler is smart enough to optimize out the divide and multiply. :)
Test Plan: cdatapack_get on a delta chain that has a deltabasenode does not crash!
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3667033
Signature: t1:3667033:1470341429:b37da6c9ea6e37fe79b48ec6766c857b5e56c36a
Summary:
Replace some TODOs with actual error handling code.
Also lumped in typo fixes and style changes. Sorry.
Test Plan: Used in later diffs to pass test-datapack.t
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3666875
Signature: t1:3666875:1470341306:67cd439341ad30fcd690ff2e399e8beacea1c0bb
Summary:
1. When bisecting, we don't want to wrap around. If middle == 0 and the target is less than that entry, we should just fail.
2. The large fanout check should be `header->config & LARGE_FANOUT`. Using `|` means it's always a large fanout.
3. The on-disk format of the fanout table makes it impossible to differentiate between an empty fanout entry and the first fanout entry, as they are both '0'. Therefore, any entry before the *second* fanout entry must implicitly search the 0th element.
4. Fixed a bug in the calculation of the last index entry.
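A minimal Python sketch of points 1 and 2, assuming the index can be treated as a sorted list of 20-byte nodes. cdatapack itself is C; `bisect_nodes`, `is_large_fanout` and the `LARGE_FANOUT` value below are hypothetical, and the fanout-table quirk from point 3 is not modeled.

```python
# Hypothetical constant: the real LARGE_FANOUT value lives in the C header.
LARGE_FANOUT = 0x80


def is_large_fanout(config):
    # `&` tests whether the flag bit is set. `config | LARGE_FANOUT` is always
    # non-zero, which is why the old check always chose the large fanout.
    return bool(config & LARGE_FANOUT)


def bisect_nodes(nodes, target, lo, hi):
    """Return the index of `target` in the sorted slice nodes[lo:hi], or None.

    The half-open [lo, hi) interval means a target that sorts before every
    entry ends the loop with lo == hi, instead of stepping below index 0.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid] < target:
            lo = mid + 1
        elif nodes[mid] > target:
            hi = mid
        else:
            return mid
    return None
```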
Test Plan: passed test-datapack.t with other fixes applied.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3666770
Signature: t1:3666770:1470341277:3f4f63a365e8bb0f4da6e574fc7f15228877c682
Summary: We don't ever need to modify the node sha data, so make it const.
Test Plan: compiles
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3666714
Signature: t1:3666714:1470340104:080ffa290a49388e797dcefc66976f6341932b76
Summary: Needed if we want to do a hybrid implementation of cdatapack
Test Plan: used in following diff.
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3660087
Signature: t1:3660087:1470339373:4e8b548f1509af7f34d0a4bf8bd85723f38d238d
Summary: `iterentries()` differs from `__iter__()` slightly in that it also yields the delta base and delta.
Test Plan:
run this toy program
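```
#!/usr/bin/env python
from __future__ import print_function
import cdatapack

a = cdatapack.datapack('d864669a5651d04505ec6e5e9dba1319cde71f7b')
# iterentries() also yields the delta base and delta; print the delta's length.
for x in a.iterentries():
    print(x[0], repr(x[1]), repr(x[2]), len(x[3]))
# Plain iteration yields only the first two fields.
for x in a:
    print(x[0], repr(x[1]))
```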
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3659133
Signature: t1:3659133:1470339839:dbdce5990a30ffe019ccc44fce97925b64524acd
Summary: cdatapack now has a getiter function, and it returns a cdatapack_iterator.
Test Plan:
using this toy program, dumped a pack file.
```
#!/usr/bin/env python
from __future__ import print_function
import cdatapack

a = cdatapack.datapack('d864669a5651d04505ec6e5e9dba1319cde71f7b')
for x in a:
    print(x)
```
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: durham, mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3659005
Signature: t1:3659005:1470339657:aa39cc57a669b9bc4604933ce35ed20b3f81b468
Summary:
1. Get ntohl from arpa/inet.h, as per the POSIX spec.
2. Define ntohll in terms of endian.h's be64toh.
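For comparison, a Python sketch of the same two conversions using the standard `struct` module, where `!` selects network (big-endian) byte order. Unlike the C functions, these decode directly from a byte buffer; the helper names are hypothetical.

```python
import struct


def ntohl_bytes(buf, offset=0):
    """Decode an unsigned 32-bit integer stored in network byte order."""
    return struct.unpack_from("!I", buf, offset)[0]


def ntohll_bytes(buf, offset=0):
    """Decode an unsigned 64-bit big-endian integer (what be64toh does in C)."""
    return struct.unpack_from("!Q", buf, offset)[0]
```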
Test Plan: make local
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3671211
Signature: t1:3671211:1470341382:e6b0fe12094246aeb6be09252122bde9680e4599
Summary: Just a simple module declaration with no logic yet.
Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:2445a3a> make local
<output snipped>
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:2445a3a> python
Python 2.7.11 (default, Mar 1 2016, 18:40:10)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cdatapack
>>>
```
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3654786
Signature: t1:3654786:1470175354:c7e8847dcc74c83483d21888ad30cd9242fb461c
Summary:
Given a node sha, find it in the index file and retrieve the deltas. Checksum the data and dump it.
Depends on D3637000, D3636945
Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351> /Users/tonytung/Library/Caches/CLion2016.2/cmake/generated/cdatapack-64b7828e/64b7828e/Debug0/cdatapack_get d864669a5651d04505ec6e5e9dba1319cde71f7b f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e
source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node Delta Base Delta SHA1 Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e 0000000000000000000000000000000000000000 f32b366a6c44430df6526133f82f9638426ba9c5 37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b --node f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e
source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node Delta Base Delta SHA1 Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e 0000000000000000000000000000000000000000 f32b366a6c44430df6526133f82f9638426ba9c5 37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:68cd351>
```
Reviewers: durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3637416
Signature: t1:3637416:1470094723:bce7e903cd0b80c293e16b7532c49e552d3039ef
Summary: Now that we report uncompressed lengths, the test output needs to be updated.
Test Plan: pass `PYTHONPATH=~/work/mercurial/facebook-hg-rpms/fb-hgext/:~/work/mercurial/facebook-hg-rpms/lz4revlog/:~/work/mercurial/facebook-hg-rpms/remotefilelog/ python ~/work/mercurial/facebook-hg-rpms/hg-crew/tests/run-tests.py -j32 test-repack.t`
Reviewers: #fastmanifest, durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3656796
Signature: t1:3656796:1470175464:56bf12516710cef8e8aaa7e7b3e0dbdfa220d797
Summary:
1. Offsets are absolute byte offsets. Convert them to entry offsets to make the bisect code a lot simpler.
2. When writing entries to the pack chain, we need to advance the pointer.
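A sketch of the conversion in point 1, in Python for illustration. It assumes a fixed 40-byte index entry (20-byte node, 4-byte delta base offset, 8-byte pack offset, 8-byte pack entry length) and that `index_start` is the byte offset of the first entry; both the layout and the helper names are assumptions, not taken from cdatapack.

```python
# Assumed index entry layout: 20-byte node, 4-byte delta base offset,
# 8-byte pack offset, 8-byte pack entry length.
INDEX_ENTRY_LENGTH = 20 + 4 + 8 + 8


def byte_offset_to_entry(offset, index_start, entry_length=INDEX_ENTRY_LENGTH):
    """Convert an absolute byte offset into the index into an entry number."""
    rel = offset - index_start
    if rel < 0 or rel % entry_length:
        raise ValueError("offset %d is not on an entry boundary" % offset)
    return rel // entry_length


def entry_to_byte_offset(entry, index_start, entry_length=INDEX_ENTRY_LENGTH):
    """Inverse of byte_offset_to_entry."""
    return index_start + entry * entry_length
```

Working in entry numbers lets the bisect use plain integer midpoints; the divide and multiply only happen at the boundaries.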
Depends on D3627122
Test Plan: used in later diff.
Reviewers: durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3637000
Signature: t1:3637000:1469741885:c2416a3b30e5bb2b64e6bb7062f4c02098be91eb
Summary:
When retrieving a delta chain, datapack.py uncompresses the delta chain data. However, when iterating over the datapack, we get the compressed length. This is not desirable, as the two outputs are no longer consistent. This diff peeks into the lz4 header to get the uncompressed length when iterating.
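A minimal sketch of the header peek, assuming the compressed data is a size-prefixed LZ4 block whose first 4 bytes hold the uncompressed length as a little-endian integer (python-lz4's block compressor prepends such a header by default). `uncompressed_length` is a hypothetical name.

```python
import struct


def uncompressed_length(blob):
    """Read the uncompressed size from a size-prefixed LZ4 block.

    Only the 4-byte little-endian header is parsed; the payload is never
    decompressed, which keeps iteration cheap.
    """
    if len(blob) < 4:
        raise ValueError("blob too short to contain an LZ4 size header")
    return struct.unpack_from("<I", blob, 0)[0]
```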
Depends on D3627119
Test Plan:
```
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:e2ef218> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b --node f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e
source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node Delta Base Delta SHA1 Delta Length
f2e53f83c5dc806aa2eda87bb15fe0367baf3a7e 0000000000000000000000000000000000000000 f32b366a6c44430df6526133f82f9638426ba9c5 37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:e2ef218> hg debugdatapack d864669a5651d04505ec6e5e9dba1319cde71f7b | tail -n 4
source/zippydb/tier_spec/tier_settings/zippydb.wildcard.tmpfs.zippydb_settings.cconf
Node Delta Base Delta Length
f2e53f83c5dc 000000000000 37769
[andromeda]:~/work/mercurial/facebook-hg-rpms/remotefilelog:e2ef218>
```
Reviewers: durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3636945
Signature: t1:3636945:1469811243:b21d90d9599244ed4600c5336818b9a18eacf3ff
Summary:
`->index_table` is not heap-allocated. However, `->fanout_table` is, and should be released.
Also added call to `close_datapack()` at the end of `cdatapack_dump.c`.
Depends on D3627122
Test Plan: valgrind is much happier now.
Reviewers: durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3631368
Signature: t1:3631368:1469741779:e0c4e5d59c7e73c8aa3507901df3005383f0d3f5
Summary: This is not yet complete, but seems to be able to parse a data file.
Test Plan:
Ran `/Users/tonytung/Library/Caches/CLion2016.2/cmake/generated/cdatapack-64b7828e/64b7828e/Debug/cdatapack_dump d864669a5651d04505ec6e5e9dba1319cde71f7b > /tmp/2`,
compared it with the output of `hg debugdatapack --long d864669a5651d04505ec6e5e9dba1319cde71f7b > /tmp/1`,
and it matches exactly.
Reviewers: durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3627122
Signature: t1:3627122:1470085301:c9b9e8b2fa57bb7a09dd56d3c811ff8eadbb85ba
Summary:
It should include the filelen and the deltalen fields, which are
2 and 8 bytes respectively.
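To make the size arithmetic concrete, a sketch assuming a pack entry laid out as: 2-byte filename length, filename, 20-byte node, 20-byte delta base node, 8-byte delta length, delta. The layout and helper names are assumptions for illustration.

```python
import struct

NODE_LENGTH = 20


def entry_size(filename, delta):
    """On-disk size of one pack entry under the assumed layout."""
    fname = filename.encode("utf-8")
    # 2-byte filelen + filename + node + delta base + 8-byte deltalen + delta
    return 2 + len(fname) + NODE_LENGTH + NODE_LENGTH + 8 + len(delta)


def pack_entry(filename, node, deltabase, delta):
    """Serialize one entry with big-endian length fields."""
    fname = filename.encode("utf-8")
    return (struct.pack("!H", len(fname)) + fname + node + deltabase
            + struct.pack("!Q", len(delta)) + delta)
```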
Test Plan: visual.
Reviewers: durham
Reviewed By: durham
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3627108
Signature: t1:3627108:1469742083:ffb59768906d9e5463065eec92e1c80cc8482884
Calling wrapfunction on the remotefilepeer (sshpeer) object in the exchangepull
function introduces a reference cycle. Hence, this object will not be deleted
until the process dies. This is not a big issue for processes with a short
lifetime (e.g. those launched from the command line).
However, for persistent processes (e.g. TortoiseHg), this can lead to multiple
lingering ssh connections to the server (one per pull operation).
The fix is to not wrap remotefilepeer._callstream. Instead, this method is defined
directly on the remotefilepeer object. The required repo data is made available
in the remotefilepeer object by monkeypatching this object in the exchangepull
function.
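A minimal Python 3 sketch of why per-instance wrapping keeps the peer alive. `Peer`, `wrap_on_instance` and `FixedPeer` are hypothetical stand-ins, not the extension's code. The check disables the cycle collector: reference counting alone cannot free such a cycle, and Python 2 does not collect cycles that contain objects defining `__del__`.

```python
import gc
import weakref


class Peer(object):
    """Hypothetical stand-in for remotefilepeer."""

    def _callstream(self, command):
        return "sent " + command


def wrap_on_instance(obj, name, wrapper):
    """Per-instance wrapping, similar in spirit to wrapfunction(peer, ...)."""
    orig = getattr(obj, name)  # a bound method holds a strong reference to obj

    def wrapped(*args, **kwargs):
        return wrapper(orig, *args, **kwargs)

    # obj.__dict__ -> wrapped -> closure cell -> orig -> obj: a reference cycle.
    setattr(obj, name, wrapped)


class FixedPeer(Peer):
    """Method defined on the class; repo data lives in a plain attribute,
    so nothing stored on the instance points back at the instance."""

    def _callstream(self, command):
        return "sent %s for %s" % (command, self._repopath)


def survives_del(make_peer):
    """Return True if the peer is still alive after its last name is deleted."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cycle collector out of the way for this check
    try:
        peer = make_peer()
        ref = weakref.ref(peer)
        del peer
        return ref() is not None
    finally:
        if was_enabled:
            gc.enable()


def make_wrapped():
    peer = Peer()
    wrap_on_instance(peer, "_callstream", lambda orig, cmd: orig(cmd))
    return peer


def make_fixed():
    peer = FixedPeer()
    peer._repopath = "/path/to/repo"  # hypothetical repo data
    return peer
```

`survives_del(make_wrapped)` reports the wrapped peer as still alive, while the class-level version is freed as soon as its last reference goes away.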
In some situations the remotefilelog setup logic could be called, which will
wrap certain functions, and then later a call will happen to a repo that wasn't
remotefilelog which will run some remotefilelog code because of the wrapping.
Normally we take care of this by checking for the remotefilelog requirement. We
missed it in this one spot though.
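The usual guard can be sketched as a wrapper that falls through to the original function for repos without the requirement. `onlyforshallow` and the `REQUIREMENT` constant are hypothetical; only the idea of checking the requirement comes from the text above.

```python
REQUIREMENT = "remotefilelog"  # assumed requirement string


def onlyforshallow(extensionfn):
    """Wrap an override so it only runs for remotefilelog repos.

    `extensionfn(orig, repo, ...)` is the remotefilelog-specific override;
    any other repo goes straight to `orig`.
    """

    def wrapper(orig, repo, *args, **kwargs):
        if REQUIREMENT not in repo.requirements:
            return orig(repo, *args, **kwargs)
        return extensionfn(orig, repo, *args, **kwargs)

    return wrapper
```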
In two places, we were checking if a revlog was an instance of
revlog.revlog and, I think, treating it as a
remotefilelog.remotefilelog otherwise. I noticed this when I created
another non-revlog.revlog revlog in narrowhg and remotefilelog thought
it was a remotefilelog.remotefilelog. Let's specifically check if it's
a remotefilelog.remotefilelog instead.
Summary:
When running large repack operations, the resident size of the process
could become quite large, since we're scanning in entire pack files. Linux/OSX
have API calls for telling the kernel it's ok to release some of that memory,
but those APIs are not exposed to Python.
So instead, let's unmap and remap the mmaps once a certain amount of data has
been read. I also tried changing the mmap accessors to use the file-oriented API
(mmap.read(), mmap.seek(), etc.) so we could switch to actual file handles during
repack, but it had a drastic effect on normal performance (repack took 1 hour
instead of a few minutes).
Long term we should move all of this logic to c++ so we can use the more
powerful APIs.
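A rough Python sketch of the unmap/remap idea: a reader that drops and recreates its mapping after every `remap_threshold` bytes, so pages it has already touched stop counting toward the process's resident size. `RemappingReader` and the 64MB default are hypothetical.

```python
import mmap
import os


class RemappingReader(object):
    """Read a file through mmap, remapping every `remap_threshold` bytes."""

    def __init__(self, path, remap_threshold=64 * 1024 * 1024):
        self._fd = os.open(path, os.O_RDONLY)
        self._threshold = remap_threshold
        self._since_remap = 0
        self.remaps = 0
        self._map = self._mapfile()

    def _mapfile(self):
        # Length 0 maps the whole file (the file must be non-empty).
        return mmap.mmap(self._fd, 0, access=mmap.ACCESS_READ)

    def read(self, offset, length):
        data = self._map[offset:offset + length]
        self._since_remap += len(data)
        if self._since_remap >= self._threshold:
            self._map.close()  # unmapping drops the pages from our resident set
            self._map = self._mapfile()
            self._since_remap = 0
            self.remaps += 1
        return data

    def close(self):
        self._map.close()
        os.close(self._fd)
```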
Test Plan:
Did a full repack on a laptop and verified memory capped out at 2GB
instead of exceeding 5GB.
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.intern.facebook.com/D3545171
Summary:
There was a race condition where if a repack is running and another hg process
launches, the new process will only see the original packs, and not any of the
new packs (even though the source blobs are being deleted from disk by the
repack).
The fix is to allow our pack store to refresh its list of packs every so often.
In this particular implementation we do it at most every 100ms. A more robust
strategy would be to group key misses and only check for new packs at the end
once we have a list of all the misses, but this would require significant
refactoring to make everything grouped. This case should only ever happen during
repacks, so it should almost never occur more than once during a command, so the
100ms version is probably good enough.
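A sketch of the at-most-every-100ms refresh, simplified so that "packs" is just a dict returned by a `listpacks` callable; `PackStore` and its parameters are hypothetical. The clock is injectable so the rate limit can be tested without sleeping.

```python
import time


class PackStore(object):
    """Rescan for new packs on a miss, but at most once per interval."""

    def __init__(self, listpacks, refresh_interval=0.1, clock=time.monotonic):
        self._listpacks = listpacks  # callable returning {key: data}
        self._interval = refresh_interval
        self._clock = clock
        self._packs = listpacks()
        self._last_refresh = clock()

    def get(self, key):
        if key in self._packs:
            return self._packs[key]
        now = self._clock()
        if now - self._last_refresh >= self._interval:
            # Pick up packs written by a concurrent repack.
            self._packs = self._listpacks()
            self._last_refresh = now
            if key in self._packs:
                return self._packs[key]
        raise KeyError(key)
```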
Test Plan:
Ran `hg up && hg pull && sleep 0.2 && hg up master` in a loop with a
breakpoint in the refresh code and caught it executing in a situation where the
background repack had removed the original sources and put them in a new pack.
Verified that it loaded the data from the new pack correctly.
Reviewers: #mercurial, ttung, lcharignon
Reviewed By: lcharignon
Subscribers: lcharignon
Differential Revision: https://phabricator.intern.facebook.com/D3524314
Signature: t1:3524314:1467907680:85be07ad953811000c468852eb0626f4d8b53a13
Summary:
Every directory in the shared cache needs to be g+ws so that all members of the
group can write to it. The old code only applied g+ws to the leaf
directories, so other users weren't able to write to non-leaf directories (e.g. for
hgcache/7a/83beca8.../, others couldn't write to 7a/).
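A sketch of applying g+ws to every directory created along the path rather than only the leaf. `makedirs_groupwritable` is hypothetical and assumes a POSIX system; it leaves pre-existing directories alone, since another user may own them and chmod would fail.

```python
import os
import stat


def makedirs_groupwritable(root, relpath):
    """Create root/relpath, making each newly created directory g+ws."""
    path = root
    for part in os.path.normpath(relpath).split(os.sep):
        path = os.path.join(path, part)
        try:
            os.mkdir(path)
        except FileExistsError:
            continue  # may belong to another user; don't try to chmod it
        mode = stat.S_IMODE(os.stat(path).st_mode)
        # g+w lets group members add entries; g+s (setgid) makes new
        # entries inherit this directory's group.
        os.chmod(path, mode | stat.S_IWGRP | stat.S_ISGID)
    return path
```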
Test Plan:
Updated a test to view group permissions for the intermediate
directories
Reviewers: #mercurial, ttung, simpkins
Reviewed By: simpkins
Subscribers: lcharignon, net-systems-diffs@, simpkins, mbolin
Differential Revision: https://phabricator.intern.facebook.com/D3523918
Signature: t1:3523918:1467930221:452b11b56a2e69896bf8d2cd0acd7131b41f90d8
Summary:
Previously, the history repack logic would stop traversing history for a given
filename once it encountered a rename. This isn't quite right, since the history
could eventually be traversed back to the original file, where we'd need to
continue processing. So now we check for when the copyfrom becomes the filename.
Also, if the copy source file and the copy target file have two nodes with the
same value, we would not process the one in the copy target (since it was marked
do not process). We fix this by explicitly checking if the node is one of the
known entries in the file being processed.
Test Plan: Added a test
Reviewers: #mercurial, ttung, mitrandir, rmcelroy
Reviewed By: mitrandir, rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3523215
Signature: t1:3523215:1467828169:bd487c8f296352c1a1b9355cb55f9001bd5e19a9
Before this patch, debugremotefilelog and verifyremotefilelog would
crash if not given a path. Also, many commands would accept arguments
they then ignored.
Summary:
Before this patch, we were not importing mercurial.error, which
caused a crash when calling error.Abort. This patch adds the missing import.
Test Plan: Tests pass; added a new test.
Reviewers: durham
Differential Revision: https://phabricator.intern.facebook.com/D3457086
In this race condition test, occasionally the second invocation would actually
obtain the lock before the first. This meant that the first repack would fail
with an error message while the second would exit with 0, resulting in the test
output changing slightly. Let's introduce a slight delay before the second
invocation to prevent this from happening.
In the old days we would check the cache first, then the local store. This was
important because the cache is more likely to contain correct data (since it
comes from the final pushed version of commits), versus local data which may
contain information about stripped commits.
As part of the big store refactor, this order got switched unintentionally. So
let's switch it back.
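The intended lookup order amounts to a union store that tries the shared cache before the local store. This `UnionStore` is a hypothetical, dict-backed sketch, not the extension's store class.

```python
class UnionStore(object):
    """Try each store in order and return the first hit."""

    def __init__(self, *stores):
        self.stores = stores

    def get(self, key):
        for store in self.stores:
            if key in store:
                return store[key]
        raise KeyError(key)
```

Constructing it as `UnionStore(cache, local)` encodes the order: data that came from the final pushed commits wins over local data that may describe stripped commits.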
Summary:
The pack path logic did not use the correct unix group when
remotefilelog.cachegroup was specified. This fixes that.
Test Plan:
I manually tested it by deleting a pack dir and running repack. It
is hard to write an automated test for this, since the feature isn't really cross-
platform, and we don't have a way to know what groups users have on their
machines.
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3400756
Tasks: 11584114
Signature: t1:3400756:1465342537:ed023f6dc830117df5e85e294a41486f072714c9
The previous commit fixed a bug where copyfrom data was represented incorrectly
in the local .hg/store/data remotefilelog blobs when the ancestor data was read
from a pack file. This commit adds a test for that situation.
The new pack stores return None for the copyfrom field, instead of the expected
''. We need the local file blob generator to handle this case, instead of just
putting None in the copyfrom field.
Summary:
Now that repack can clean up old remotefilelog blobs, let's have it also delete
any empty directories that get left behind.
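A sketch of the cleanup step: after deleting a blob, walk up its parent directories and remove each one that is now empty, stopping at the cache root. `prune_empty_parents` is hypothetical.

```python
import os


def prune_empty_parents(path, stop):
    """Remove now-empty parents of `path`, never removing `stop` itself."""
    stop = os.path.abspath(stop)
    parent = os.path.dirname(os.path.abspath(path))
    while parent != stop and parent.startswith(stop + os.sep):
        try:
            os.rmdir(parent)  # only succeeds if the directory is empty
        except OSError:
            break  # not empty (or not removable): stop climbing
        parent = os.path.dirname(parent)
```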
Test Plan: Updated an existing test to cover it
Reviewers: mitrandir, lcharignon, #sourcecontrol, ttung, simonfar
Reviewed By: simonfar
Subscribers: simonfar
Differential Revision: https://phabricator.intern.facebook.com/D3385546
Signature: t1:3385546:1464972782:5ca63cf0a5589bb8a537957f50b4bc5ec4e0f0f5
Summary:
Previously a bunch of different places accessed the cachepath through ui.config
directly. This is a problem because we need to resolve any environment variables
in the path, and some spots didn't do this. So let's unify all accesses through
a helper function that takes care of the environment variables.
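A minimal sketch of such a helper, with a plain dict standing in for `ui.config`. The name `getcachepath` and the `(section, name)` key are used here for illustration; the expansion itself is the standard `os.path.expandvars`/`os.path.expanduser`.

```python
import os


def getcachepath(config):
    """Return remotefilelog.cachepath with env vars and ~ expanded."""
    path = config.get(("remotefilelog", "cachepath"))
    if not path:
        raise ValueError("remotefilelog.cachepath is not set")
    return os.path.expanduser(os.path.expandvars(path))
```

Routing every access through one function means a path like `$HOME/hgcache` resolves the same way everywhere.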
Test Plan: Added a test
Reviewers: mitrandir, lcharignon, #sourcecontrol, ttung, simonfar
Reviewed By: simonfar
Subscribers: simonfar
Differential Revision: https://phabricator.intern.facebook.com/D3385583
Signature: t1:3385583:1464971813:5b9ee5ed3d6ff9f1a78cb9e0269e433844758c9d