Summary:
After D1417, `hg prefetch` takes care of downloading both files and
trees. However, when the command is run without any options, it attempts
to prefetch trees for draft commits, which results in an error. We
should not even attempt to prefetch trees for draft commits.
Test Plan: Added a test to detect this case and ran all the tests.
Reviewers: #fbhgext, durham
Reviewed By: #fbhgext, durham
Subscribers: durham
Differential Revision: https://phab.mercurial-scm.org/D1446
This extension is a thin wrapper around the native `changelog.index` object
that allows us to incrementally replace its methods. Since some index
methods (ex. `nodemap.__getitem__`) are called very frequently, Cython
features are used intentionally to avoid overhead. This also makes it easier
to integrate logic with C interface in the future.
As a side effect, this patch enforces `index` to be conceptually separate
from `nodemap`. So `changelog.index[node]` could be made illegal in the
future, which seems to be a good thing.
Test Plan:
Run `hg sl` with and without the extension in a large repo. Check traceprof
outputs. The performance difference around index methods is roughly
10%, which seems acceptable:
Without the extension:
25 \ node (4823 times) changelog.py:361
18 | node (4931 times) revlog.py:631
With the extension:
27 \ node (4823 times) changelog.py:361
19 | node (4931 times) revlog.py:631
Also run `rt --extra-config-opt=extensions.clindex=` from core hg and make
sure changes are all caused by having an extra extension enabled.
Differential Revision: https://phab.mercurial-scm.org/D1353
The number 200 was used before D1435. It caused trouble on systems with a low
`ulimit -n` when using the Python datapack code path, because Python's mmap
implementation keeps an internal fd for every mmap object and there is no
way to close those fds via the pure Python API. But there is no such limit for
cdatapack after D1185. So let's change the cdatapack test to use 200 packs.
Test Plan:
`ulimit -n 50` and `./scripts/unit.py`
Differential Revision: https://phab.mercurial-scm.org/D1442
Summary:
Currently,
- `hg prefetch` prefetches files.
- `hg prefetchtrees` prefetches trees.
This commit removes `prefetchtrees` and makes `prefetch` responsible for
everything, i.e. `prefetch` will prefetch whatever it can, be it files,
trees, or both.
Test Plan: Ran all the tests.
Reviewers: #fbhgext, durham
Reviewed By: #fbhgext, durham
Subscribers: quark, durham
Differential Revision: https://phab.mercurial-scm.org/D1417
Summary:
The `prefetch` command has an option to repack the prefetched files.
Eventually, we plan to merge `prefetch` and `prefetchtrees` into a single
command and therefore, this commit takes a step towards making the interface to
these commands exactly the same.
Test Plan: Ran all the tests.
Reviewers: #fbhgext, durham
Reviewed By: #fbhgext, durham
Subscribers: durham
Differential Revision: https://phab.mercurial-scm.org/D1416
Summary:
The `prefetch` command only allows for one revision to be specified as
the base revision. Eventually, we plan to merge `prefetch` and `prefetchtrees`
into a single command and therefore, this commit takes a step towards making
the interface to these commands exactly the same.
Test Plan: Ran all the tests.
Reviewers: #fbhgext, durham
Reviewed By: #fbhgext, durham
Differential Revision: https://phab.mercurial-scm.org/D1415
Summary:
The `prefetch` command performs a preprocessing of the options before
doing the actual work. This commit just separates out that logic.
Test Plan: Ran all the tests.
Reviewers: #fbhgext, durham
Reviewed By: #fbhgext, durham
Subscribers: durham
Differential Revision: https://phab.mercurial-scm.org/D1414
Summary:
Adding the option to specify the base revision in the `prefetch`
command. This can be useful to limit the prefetched data, and it also makes
the interface of `prefetch` consistent with `prefetchtrees`. Soon, we will
merge `prefetch` and `prefetchtrees` into a single command, and both commands
having a similar interface will help with the merge.
Test Plan: Ran all the tests.
Reviewers: #fbhgext, durham
Reviewed By: #fbhgext, durham
Subscribers: durham
Differential Revision: https://phab.mercurial-scm.org/D1368
Summary: Let the `prefetch` command be responsible for the repacking.
Test Plan: Ran all the tests.
Reviewers: #fbhgext, durham
Reviewed By: #fbhgext, durham
Differential Revision: https://phab.mercurial-scm.org/D1367
The exec bit got lost after rebase.
The rebase bug was filed as https://bz.mercurial-scm.org/5743.
The file content change is to workaround a potential pushrebase bug that
does not allow mode-only change.
Previously the test sets up `LD_LIBRARY_PATH` and `PYTHONPATH`, then runs
Python tests.
Within Python code, setting `sys.path` achieves the same effect as
setting `PYTHONPATH`. `LD_LIBRARY_PATH` is only necessary for C libraries,
but the only C library that cstore depends on is `lz4`, which is supposed to
use the system version. There is no C library provided by this repo -
features like sha1 are compiled into `cstore.so`.
Therefore it's unnecessary to have a separate `.t` file wrapping `.py`
tests. Let's just use `.py` tests directly.
Test Plan:
`./script/unit.py`
Make a temporary change to `cdatapack.c` so it fails unconditionally in
open_datapack. Build the repo in different ways: `make local` and
`python2 setup.py build_clib build_ext`. Then run the test by using
`$HG_CREW/tests/run-tests.py -l test-remotefilelog-datapack.py` without the
`hg-dev` environment and make sure it fails with the expected exception.
Differential Revision: https://phab.mercurial-scm.org/D1429
If the OS limits a given process to fewer than 200 open files, this test
would fail. Let's make the cache size smaller to avoid this.
Also, it turns out the cache size and the number of packs created don't
actually seem to affect this test. I changed the numbers in a few ways and the
test never failed.
Differential Revision: https://phab.mercurial-scm.org/D1435
Added workers in lfs.
I had to remove the fine-grained progress tracking because coordinating 1MB-granularity progress between processes on *nix and threads on Windows (diffs will appear soon) is quite tricky.
With our network, tracking progress per file is more than enough to see things moving.
This change gives close to a 50% speedup on `hg sparse --enable-profile` when prefetch is run. My current understanding is that prefetch runs when a profile is enabled for the first time.
Test Plan:
Enable profile:
time hg sparse --enable-profile SparseProfiles/TestProfile.sparse
The profile contains 42k files including 9GB of lfs files
On my machine the time improves by 47% while still being dominated by lfs
download time
Differential Revision: https://phab.mercurial-scm.org/D1424
Add `repack.chainorphansbysize` (default True).
When enabled, we take all orphaned nodes (nodes that are not part of a chain),
and put them into a new chain at the end, so we can get some minimal
compression out of them. Right now, they default to each being stored as
fulltexts, which is wasteful.
We sort the orphan chain by size, descending, to make the largest version
quickest to access, on the assumption that it is probably the newest. (This is
what Git does for its packed data, and it is a decent fallback if ancestry is
not available)
Example chain output, before:
```
A->B C D->E->F G H
```
After:
```
A->B D->E->F G->C->H
(assuming len(G) >= len(C) >= len(H))
```
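The rechaining step can be sketched in a few lines of Python (names are illustrative, not the extension's actual API): collect the nodes that belong to no ancestry chain, sort them by fulltext size descending, and append them as one extra chain.

```python
def chain_orphans_by_size(sizes, chains):
    """sizes: node -> fulltext size; chains: lists of nodes already
    chained by ancestry. Orphans are sorted largest-first and appended
    as a single extra chain, so each entry can delta against a larger
    predecessor instead of being stored as a fulltext."""
    chained = {node for chain in chains for node in chain}
    orphans = sorted((node for node in sizes if node not in chained),
                     key=lambda node: sizes[node], reverse=True)
    return chains + [orphans] if orphans else chains
```

With the example above (sizes such that G > C > H), the orphans C, G, H come out as the single chain G->C->H.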
(I'm still adding a test case, but the code itself could be reviewed.)
Differential Revision: https://phab.mercurial-scm.org/D1272
We got this exception:
    unable to load pack ...: [Errno 24] Too many open files
on an OS X machine where we think `ulimit -n` is big enough.
Let's add some debugging output so we can have more clues about it.
Note: the Python implementation of `mmap.mmap` actually keeps a fd open [1].
So the fix (65c38ccb9835) only reduces fd count from 2 * N to N, but does
not really solve the issue.
We might want to enforce the native code path to work around Python mmap
implementation.
[1]: # Modules/mmapmodule.c
```
m_obj->fd = dup(fd);
if (m_obj->fd == -1) {
    Py_DECREF(m_obj);
    PyErr_SetFromErrno(mmap_module_error);
    return NULL;
}
```
Differential Revision: https://phab.mercurial-scm.org/D1420
A very minor change, but we should probably explain that local rebasing is
needed. (You might be forgiven for thinking that `pushrebase` would have
done that for you.)
Differential Revision: https://phab.mercurial-scm.org/D1352
Print which bad characters were found on what line, so that users can
fix the problem just from the hook message.
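A rough sketch of what such a report can look like (the character set and message format here are invented for illustration, not the real hook's):

```python
def report_bad_chars(text, badchars=frozenset("\t\r")):
    """Return one message per offending character, naming the line
    and column so the user can fix it from the hook output alone."""
    messages = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for col, ch in enumerate(line, 1):
            if ch in badchars:
                messages.append("line %d, column %d: bad character %r"
                                % (lineno, col, ch))
    return messages
```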
Differential Revision: https://phab.mercurial-scm.org/D1419
When we changed the treemanifest {manifest} template output it broke the ability
to specifically ask for the node. This is important for tools migrating between
the old and new format. Let's add that back in.
Let's also make tweakdefaults change the '{manifest}' default template for all
repos, not just tree repos.
Differential Revision: https://phab.mercurial-scm.org/D1418
Summary: The TODO has been addressed and this test should be able to run now.
Test Plan:
- Checked that test host is capable of running the test now.
- Ran all the tests.
Reviewers: #fbhgext, mitrandir
Reviewed By: #fbhgext, mitrandir
Differential Revision: https://phab.mercurial-scm.org/D1369
Treemanifest now has a unified spot to check if it can send trees. Infinitepush
needs to respect that, otherwise we're uploading trees to infinitepush that
might not be readable on other systems.
This is a common technique to store variable-length integers efficiently.
It's compatible with both Thrift and Protobuf [1].
It's intended to be used in:
- On-disk file format to make the file compact and avoid issues like
https://bz.mercurial-scm.org/5681 (Obsolete markers code crashes with
metadata keys/values longer than 255 bytes).
- Thrift layer.
[1]: https://developers.google.com/protocol-buffers/docs/encoding#varints
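For reference, a small Python sketch of the encoding (LEB128-style, as described at the Protobuf link): seven payload bits per byte, least-significant group first, with the high bit set while more bytes follow. Function names are illustrative, not this crate's API.

```python
def encode_varint(value):
    """Protobuf-style varint: 7 bits per byte, least-significant
    group first; the high bit marks that more bytes follow."""
    assert value >= 0
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)

def decode_varint(data, offset=0):
    """Return (value, next_offset) for the varint at `offset`."""
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

Small values cost a single byte, which is what makes this attractive for on-disk metadata lengths.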
Test Plan:
```
cargo test
cargo clippy
```
Also ran a kcov coverage check and it says 100%.
```
cargo rustc --lib --profile test -- -Ccodegen-units=1 -Clink-dead-code -Zno-landing-pads
kcov --include-path $PWD/src --verify target/kcov ./target/debug/*-????????????????
```
Differential Revision: https://phab.mercurial-scm.org/D929
This lets you list your currently active profiles, as well as let you discover
new profiles, provided sparse.profile_directory is set.
Includes JSON output. Future revisions can build on this to provide richer
metadata (parsed from the profile files).
Differential Revision: https://phab.mercurial-scm.org/D1250
Occasionally, callers to `hg repack` prefer to skip loose objects and only
repack packfiles. This adds an option to do so.
Differential Revision: https://phab.mercurial-scm.org/D1228
Currently an incremental repack on the server will repack the entire pack files,
and the new parts of the revlogs. The pack files can be very large and can take
a long, long time to run. So let's use the normal incremental pack heuristics to
minimize how often we have to do full repacks.
Differential Revision: https://phab.mercurial-scm.org/D1350
This fixes blackbox.log to not have two messages on the same line. This might be
undesirable if there's some other system using ui.log and this was *expected* to
be creating a single line. In that case, this might instead be a feature request
for blackbox to not insert time/user/node/etc. if it's a consecutive log from
the same 'service'. Currently, the docstring for ui.log says "*msg should be a
newline-terminated format string to log", so this is bringing these uses in
line with that.
Sample blackbox.log without this fix:
2017/11/06 14:41:23 spectral @a659d684cdf40d442d38f1ea65ee618f8b21d4b6 (25545)> remote cache hit rate is 0 of 9 2017/11/06 14:41:23 spectral @a659d684cdf40d442d38f1ea65ee618f8b21d4b6 (25545)> Success2017/11/06 14:45:24 spectral @dcbd198c160cfc8fc6d4a877aa5ed9296f98ee3c (25545)> pythonhook-update: remotefilelog.wcpprefetch finished in 0.00 seconds
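The convention from the docstring can be sketched as a tiny normalization step (a hypothetical helper, not actual hg API):

```python
def newline_terminated(msg):
    """ui.log expects newline-terminated format strings; append the
    missing newline so consecutive entries land on separate lines."""
    return msg if msg.endswith("\n") else msg + "\n"
```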
We need to pass the metadata store in, but we were passing the content store.
This only worked because currently we never use the metadata store on the
fileserver client for writing.
Summary:
It's failing on our test Macs because they can't have that many files open at
the same time.
Test Plan: tested on my laptop, fingers crossed
Reviewers: #mercurial, ikostia
Reviewed By: ikostia
Subscribers: mjpieters, medson
Differential Revision: https://phabricator.intern.facebook.com/D6285344
Tasks: T23454758
Signature: 6285344:1510247301:f295431e05836921288c313034864c3ec616b8af
Summary:
To speed up pack lookups (especially when there are lots of packs), we
should maintain an LRU ordering of the packs and perform searches in that
order, since the next entry we search for is likely to be in the same pack
file as the last entry we searched for. This commit implements that.
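A minimal sketch of the move-to-front idea (illustrative only; the real store's interface differs):

```python
class LruPackList:
    """Search packs most-recently-hit first: a hit moves that pack to
    the front, since the next lookup likely targets the same pack."""
    def __init__(self, packs):
        self.packs = list(packs)

    def get(self, node):
        for i, pack in enumerate(self.packs):
            if node in pack:
                if i:  # promote the pack that answered the lookup
                    self.packs.insert(0, self.packs.pop(i))
                return pack[node]
        raise KeyError(node)
```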
Test Plan:
- Ran all the tests.
- Created ~2k pack files in a large repo.
- Time taken without the cache:
- `hg update b` while at a: ~18 minutes.
- `hg update a` while at b: ~23 seconds.
- Time taken with the cache:
- `hg update b` while at a: ~14 seconds.
- `hg update a` while at b: ~9 seconds.
Reviewers: #fbhgext, durham
Reviewed By: #fbhgext, durham
Subscribers: durham
Differential Revision: https://phab.mercurial-scm.org/D1208
Summary:
createPack had no option to specify the pack directory, so it could only
create one pack per directory. This restriction was in place because we only
test the datapack, and not the datapackstore, during these tests.
This commit makes the method more generic by adding an option to specify
the directory in which to create the packs. This allows the datapackstore to
be tested while reusing most of the current logic.
Test Plan: Ran all the tests.
Reviewers: #fbhgext, durham
Reviewed By: #fbhgext, durham
Subscribers: durham
Differential Revision: https://phab.mercurial-scm.org/D1325
Advise using `hg uncommit` when a pruning command (like `hg strip`) is run with the `--keep` option (i.e. keeping the changes).
Test Plan:
Run `hg strip -k/--keep`; the message "'hg uncommit' provides a better UI for undoing commits while keeping the changes" should show up.
Run `hg strip` without `--keep`; "'hg hide' provides a better UI for hiding commits" should be shown.
Differential Revision: https://phab.mercurial-scm.org/D1335
When the dirstate got refactored, we lost the check that only logged the
dirstate size if the dictionary was already populated. This caused a regression
in hg bookmark times (since it normally doesn't populate the dirstate map).
Summary:
Add support to `hg book -d` for deleting scratch infinitepush bookmarks.
This uses functions from remotenames to rewrite the remotenames cache,
omitting the specified scratch bookmarks.
Test Plan:
cd ~/facebook-hg-rpms/fb-hgext/tests
source ../../hg-dev
rt test-infinitepush-*.t --extra-config-opt=devel.all-warnings=False
Reviewers: #mercurial, cdown, stash, durham
Differential Revision: https://phabricator.intern.facebook.com/D6221853
Tasks: T22615396
An invalid entry is any entry with a base not in the pack, or whose deltabases
form a cycle.
If there are any entries like that, the output will look like this:
```
(Root):
Node Delta Base Delta Length Blob Size
665a7e7913af e66038a2894e 61 2142
52bd634be310 000000000000 2142 2142
8b5847087ce0 000000000000 2142 2142
960f5acb3e99 edf2ffd7daab 162 2142
b7d7e5aa692e 8b5847087ce0 162 2142
cdcc4d74d667 960f5acb3e99 324 2142
Total: 14652 48920 (70.0% smaller)
Bad entry: 960f5acb3e99 has an unknown deltabase (edf2ffd7daab)
Bad entry: b7d7e5aa692e has an unknown deltabase (edf2ffd7daab)
2 invalid entries
```
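The validation can be sketched in Python (identifiers invented for illustration): an entry is bad if its deltabase is neither the null id nor present in the pack, or if following deltabases loops back on itself.

```python
NULLID = "000000000000"

def find_bad_entries(deltabases):
    """deltabases maps node -> deltabase. Return node -> reason for
    every invalid entry: an unknown base, or a deltabase cycle."""
    bad = {}
    for node, base in deltabases.items():
        if base != NULLID and base not in deltabases:
            bad[node] = "unknown deltabase (%s)" % base
    for node in deltabases:
        seen = set()
        cur = node
        # Walk the delta chain; revisiting a node means it cycles.
        while cur != NULLID and cur in deltabases:
            if cur in seen:
                bad.setdefault(node, "deltabase cycle")
                break
            seen.add(cur)
            cur = deltabases[cur]
    return bad
```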
Differential Revision: https://phab.mercurial-scm.org/D1271
There is a bug in the upstream bundlerepo implementation that causes it to
infinite loop if the manifestlog isn't backed by a revlog. I've sent a fix
upstream, and this adds a test to cover that case.
The core Mercurial manifest template prints the rev number and the short hash.
Since treemanifest doesn't have rev numbers, this has to change. Let's just have
it print the whole hash, since manifest hashes are usually only ever used by
automation which probably wants the whole hash anyway.
Differential Revision: https://phab.mercurial-scm.org/D1305
If a hybrid repo pulls in a treeonly commit from a treeonly client, it
previously couldn't commit on top of it because it tried to read the flat
manifest. This patch makes it possible for the hybrid repo to make a treeonly
commit if it is committing on top of a treeonly commit (i.e. where the manifest
only exists in the tree store, not in the flat manifest revlog).
This makes it easier for multiple types of repositories to interact, and to flip
back and forth between treeonly and non-treeonly as we migrate.
Differential Revision: https://phab.mercurial-scm.org/D1304
When repacking data, we sort data nodes topologically by ancestry in order to
ensure the best (smallest) delta chains. Unfortunately, the history we use to
do this will be whatever history packs the same repack job chose for its
history repacking portion, which might be comically small and/or irrelevant.
To fix this, select all history packfiles, and pass them to the data packer as
`fullhistory`. Print a debug warning whenever any nodes are missing ancestry.
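The ordering step can be sketched as an ancestors-first traversal over the (full) history, a simplification of what the packer does:

```python
def toposort_by_ancestry(nodes, parents):
    """Order nodes ancestors-first so each entry can delta against an
    ancestor already written to the pack. Nodes whose ancestry is
    missing from `parents` are skipped here (the real code warns)."""
    order = []
    visited = set()

    def visit(node):
        if node in visited or node not in parents:
            return
        visited.add(node)
        for p in parents[node]:  # place all ancestors first
            visit(p)
        order.append(node)

    for node in nodes:
        visit(node)
    return order
```

Passing `fullhistory` rather than the job's own (possibly tiny) history packs is what gives this traversal complete ancestry to work with.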
Differential Revision: https://phab.mercurial-scm.org/D1227
Also print the name of the packfile being inspected, and tweak newlines a bit.
This simplifies debugging several packfiles in tests.
Differential Revision: https://phab.mercurial-scm.org/D1326
The previous solution was incomplete. This solution logs once per run, with two
separate metrics (filestore_ and treestore_), each logging the number of packs
and bytes. I also did some refactoring.
Differential Revision: https://phab.mercurial-scm.org/D1309
When trees are fetched from the server as packs, metadata isn't included, as
it's not supported in the protocol. Since fast size information is useful and
we have access to the fulltext during a repack, add the metadata then.
This will be needed for size-based sorting of manifest entries.
Differential Revision: https://phab.mercurial-scm.org/D1255