Summary:
Previously looking up a particular node in a histpack required a bisect to find
the file section, then a linear scan to find the particular node. If you needed
to look up the latest 3000 nodes one by one, that involved 3000 linear scans,
many of which traversed the same nodes over and over.
This patch adds additional index at the end of the current histidx file. In a
future patch, we will change getnodeinfo() to use this index instead of the
linear scan logic.
Test Plan:
Ran the tests. I haven't actually verified that the data in these
indexes is correct. My next patch will add logic that reads these indexes and
will add tests around it. I won't land this until I've confirmed it's correct.
Reviewers: quark, #mercurial, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy, mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4983690
Signature: t1:4983690:1493798877:ae0802b4896b54bf066df9f684d94554855fd35a
Summary:
Previously, we used the length of the index file to determine the upper bounds
of the bisect. In a future patch we'll want to add more data to the end of the
index file, so we need to record how long the index portion of the index is.
This patch adds that information.
Test Plan: Ran the tests.
Reviewers: #mercurial, quark
Reviewed By: quark
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4983682
Signature: t1:4983682:1493693255:57ab9af2030847fedff05b6755113ba8ce0c933b
Summary:
The issue was introduced by the getmeta refactoring. It didn't get caught by
tests because tests didn't trigger freememory.
Test Plan: The fix was verified working on fbsource
Reviewers: #mercurial, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4978835
Signature: t1:4978835:1493689136:296d425cf5d08b807b898c0e8cd881c9207c6359
Summary:
This diffs add a `getmeta` method to all content stores. The cdatapack code is
modified to pass the tests, it needs further change to support `getmeta`.
The datapack format is bumped to v1 from v0. For v1, we append a `metadata`
dict at the end of each revision. The dict is currently used to store revlog
flags and rawsize of raw revlog fulltext. In the future we can put more data
like a second hash etc, without changing API or format again.
This diff focuses on correctness. A datapack caching layer to speed up
`getmeta` will be added later.
Tests are updated since we write new v1 packfile now and the format change
leads to different content and packfile names.
`Makefile`, `ls-l.py` are added to make tests easier to maintain.
Test Plan: Updated existing tests.
Reviewers: #mercurial, rmcelroy, durham
Reviewed By: durham
Subscribers: rmcelroy, mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4903917
Signature: t1:4903917:1493255844:7ef5d487096cd2f78f2aaae672a68d49f33632ee
Summary:
To be able to bump version and change formats, the related constants need to
be moved to individual classes. So a class (ex. datapack) can be subclassed to
handle different formats.
Test Plan: `arc unit`
Reviewers: #mercurial, durham
Reviewed By: durham
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4927284
Signature: t1:4927284:1493152641:e3274dd735d50baf193b7615dd314f4e6cf161f0
Summary: Many operations that did not require access to histpacks were constructing them regardless. By postponing histpack initialization until we require access to it, we save up to 55ms on such operations.
Test Plan:
1. Stat profiler no longer shows ~6.5% time spent in basepack constructor during `hg book`. Also no longer shows up in the `hg show` profile.
2. Timed how much time was spent in basepack constructor, this was up to 55ms (mostly around 35ms).
3. Ran tests, they all pass.
Reviewers: #sourcecontrol, durham, quark, simonfar
Reviewed By: simonfar
Subscribers: simonfar, quark, mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4500999
Signature: t1:4500999:1486502571:cda6cfb7be54eacb341b58fbff209113ea9cf670
Summary:
Treemanifest had a bug where the pack files it created were 400 instead of 444.
This meant people sharing a cache on the same machine couldn't access them. In
most of the remotefilelog code we had set opener.createmode before creating the
pack but we forgot to for treemanifest.
This patch moves the opener creation and createmode setting into the mutable
pack class so we can't screw this up again.
Test Plan:
Tests that wrote to the packs before, now failed and had to be
updated to chmod the packs before writing to them.
Reviewers: #mercurial, rmcelroy
Reviewed By: rmcelroy
Subscribers: rmcelroy, mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4411580
Tasks: 15469140
Signature: t1:4411580:1484321293:9aa78254677548a6dc2270c58cee0ec6f57dd089
Summary:
We have run into some cases where users ended up with empty pack file in their
packs directory. Just log a warning in this case and skip this pack file,
rather than letting the exception propagate up and crashing the command.
Test Plan:
Created empty 0000000000000000000000000000000000000000.histpack and
0000000000000000000000000000000000000000.histidx files in my repository's
hgcache directory, and confirmed that "hg log" now simply warns about them
instead of crashing.
I didn't really test the perftest.py or treemanifest_correctness.py extensions
much. They seem to throw exceptions, and look like they have maybe gotten a
bit stale. I fixed one minor typo, but didn't dig into the other exceptions
too much.
Reviewers: durham
Reviewed By: durham
Subscribers: net-systems-diffs@, yogeshwer, mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4402516
Tasks: 15428659
Signature: t1:4402516:1484155254:96d2980efcec2d735257b08910e1ca437ef1dad6
Summary:
This makes mutablepacks close and abort function accessible from outside the
__exit__ logic. This will be useful later when we tie the lifetime of a
mutablepack to a transaction.
Test Plan: Ran the tests
Reviewers: #mercurial, quark
Reviewed By: quark
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4055730
Signature: t1:4055730:1477059602:ffdfd66e65279ddf3ff43d7c2ee65b00f1fd2600
Summary:
This fixes all the pyflaks and module errors for the main remotefilelog
code base.
Test Plan: ./run-tests.py test-check* test-remotefilelog*
Reviewers: #mercurial, quark
Reviewed By: quark
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D4055537
Signature: t1:4055537:1477049663:ee904d311d17d3659e055e2c109c68c9023cfd1f
Summary:
This api will be used to tell the store that new data has been added and the
store should check disk for new data (like new pack files).
Test Plan:
Used as part of a future diff that generates new tree entries during
hg pull time.
Reviewers: #fastmanifest, ttung
Reviewed By: ttung
Subscribers: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D3837694
Signature: t1:3837694:1473454036:0f1f37064cadc0fe3dabc6ecc6e3f7b3319e3d56
Summary: If someone removes the pack file, getpack will throw an IOError. Catch it and don't bother trying to add the pack to the list of available packs.
Test Plan: pdb.set_trace() in this method, then remove a file while the debugger is halted. continue without a traceback.
Reviewers: #mercurial, quark
Reviewed By: quark
Subscribers: mitrandir
Differential Revision: https://phabricator.intern.facebook.com/D3712310
Tasks: 12660285
Signature: t1:3712310:1471080740:85f2e3f49b09f348ec654362eadf5f1e689b8a19
Summary:
When running large repack operations, the resident size of the process
could become quite large, since we're scanning in entire pack files. Linux/OSX
have api calls for telling the kernel it's ok to release some of that memory,
but those apis are not exposed to python.
So instead, let's unmap and remap the mmap's once a certain amount of data has
been read. I also tried changing the mmap accessors to use the file oriented api
(mmap.read(), mmap.seek(), etc) so we could switch to actual file handles during
repack, but it had a drastic affect on normal performance (repack took 1 hour
instead of a few minutes).
Long term we should move all of this logic to c++ so we can use the more
powerful APIs.
Test Plan:
Did a full repack on a laptop and verified memory capped out at 2GB
instead of exceeding 5GB.
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.intern.facebook.com/D3545171
Summary:
There was a race condition where if a repack is running and another hg process
launches, the new process will only see the original packs, and not any of the
new packs (even though the source blobs are being deleted from disk by the
repack).
The fix is to allow our pack store to refresh it's list of packs every so often.
In this particular implementation we do it at most every 100ms. A more robust
strategy would be to group key misses and only check for new packs at the end
once we have a list of all the misses, but this would require significant
refactoring to make everything grouped. This case should only ever happen during
repacks, so it should almost never occur more than once during a command, so the
100ms version is probably good enough.
Test Plan:
Ran `hg up && hg pull && sleep 0.2 && hg up master` in a loop with a
break point in the refresh code and caught it executing in a situation where the
background repack had removed the original sources and put them in a new pack.
Verified that it loaded the data from the new pack correctly.
Reviewers: #mercurial, ttung, lcharignon
Reviewed By: lcharignon
Subscribers: lcharignon
Differential Revision: https://phabricator.intern.facebook.com/D3524314
Signature: t1:3524314:1467907680:85be07ad953811000c468852eb0626f4d8b53a13
Summary:
This moves the common logic from datapack and historypack into a common
basepack. At the moment the only common logic is the constructor, which handles
version checking, fanout initialization, and mmap stuff.
Test Plan: Ran the tests
Reviewers: mitrandir, #mercurial, ttung, mjpieters
Reviewed By: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D3306558
Signature: t1:3306558:1463474571:35d3d2e71849b8111e5455da2dd4810725a35523
Summary:
This takes the duplicate logic from datapackstore and historypackstore and moves
it into a common subclass.
Test Plan: Ran the tests
Reviewers: ttung, mitrandir, #mercurial, mjpieters
Reviewed By: mjpieters
Differential Revision: https://phabricator.intern.facebook.com/D3306551
Signature: t1:3306551:1463474958:ddd36a43c2a3cbb34254c2b0153c0f3c5d431edb
Summary:
mutabledatapack and mutablehistorypack share a lot of common code, especially
around the index and fanout table. Let's move much of the code to a common
mutablebasepack class and out of mutabledatapack. In the next patch I will make
mutablehistorypack a subclass of mutablebasepack and delete all the duplicate
logic.
Test Plan: Ran the tests
Reviewers: mitrandir, #mercurial, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: quark, rmcelroy
Differential Revision: https://phabricator.intern.facebook.com/D3306542
Signature: t1:3306542:1463611860:16bc68416c9bbed87748a50f55a3bac7c618fdf1