Now that remotefilelog.revision is implemented using the new contentstore, let's
switch remotefilelog.read to use that instead. This logic is almost identical to
what's in filelog.read
We are refactoring the storage to be behind more abstract APIs. This patch
creates the new store objects on the repo and passes them to the
fileserverclient so it can add itself as a file provider, in the case of misses.
Future patches will refactor the storage logic into a more abstract API. This
patch adds a union store, which will allow us to check both local client storage and
shared cache storage, without exposing the difference at higher levels.
Summary:
When running inside chg, `reposetup` will be called once since `serve` is not
a `norepo` command. Then if the user runs a `norepo` command like `help`,
`runcommand` will receive `repo = None` and error out. Fix it by checking
`repo` explicitly.
Test Plan: Run `chg help` and no exception is thrown.
Reviewers: #sourcecontrol, ttung, durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D3136328
Signature: t1:3136328:1459811387:3b86df9765aa5e20677031d6e9fc4bc3d524efa6
Summary:
Since we added the C code ancestor walk to this function, this python ancestor
walk is completely unnecessary, and can cause significant slow downs if none of
the ancestors are known linknodes (it walks the entire history).
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D3136150
Summary:
Discovered by `hg log filename` in the hg-committed repo. It seems we missed
a check here.
Test Plan:
Run `hg log filename` in a non-remotefilelog repo with remotefilelog enabled
and make sure "warning: file log can be slow on large repos" is not printed.
Reviewers: #sourcecontrol, ttung, durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D3132523
Signature: t1:3132523:1459801676:bcba3bbcaf1c358ad11e8ad25c0a1d3cc2637a76
Summary: We would like to utilize Martijn's logtoprocess extension to log cache hit rate.
Test Plan: None so far, will update the diff later.
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D3094765
Summary:
addchangegropfiles doesn't take the pr function as a parameter anymore.
The upstream change https://selenic.com/hg/rev/982e3ef7f5bf
Test Plan: tests are passing now on the release branch
Reviewers: #sourcecontrol, ttung, durham
Reviewed By: durham
Differential Revision: https://phabricator.fb.com/D3107217
Signature: t1:3107217:1459211189:4ece7531aff6043fc3acbfe43e2f471781c25c9d
This allows the client to send a single batch request for all file contents
and then handle the responses as they stream back to the client, which should
improve both running time and the user experience as far as it goes with
progress.
Summary:
We've recently had to dig into two different issues that resulted in broken
files landing in the localcache; one was due to a problem with the data source
for our cacheprocess becoming corrupt and the other was due to a failed write
(ENOSPC) causing a truncated file to be left in the local cache.
It is desirable to perform some lightweight consistency checks before we return
data up to the caller of localcache, but prior to this diff the validation
functionality was coupled to configuring a log file.
Due to the shared nature of the localcache it's not always clear cut where we
want to log localcache consistency issues, so it feels more flexible to
decouple logging from enabling checks.
This diff introduces `remotefilelog.validatecache` as a separate option that
can have three values:
* `off` - no checks are performed
* `on` - checks are performed during read and write
* `strict` - checks are performed during __contains__, read and write
The default is now `on`.
Test Plan: `./run-tests.py --with-hg=../../hg-crew/hg`
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D2941067
Tasks: 10044183, 9987694
Summary:
The srcrev passed to adjustlinknode can sometimes be None, which causes an
exception. The code that throws the exception was introduced recently as part of
taking advantage of a C fast path.
The fix is to move the srcrev check to be after the None handling.
Test Plan:
I'm not sure how to repro this naturally actually. I tried writing
tests that did rebases of renames, but it didn't trigger. I manually verified
it by using the debugger to insert a None for the srcrev at the beginning of
adjustlinknode
Reviewers: lcharignon, #sourcecontrol, ttung, mitrandir
Reviewed By: mitrandir
Differential Revision: https://phabricator.fb.com/D2944899
Tasks: 10066192
Signature: t1:2944899:1455735567:c8eea240885847061239bf3df0ea59dbbd0e4858
Summary:
I debugged an issue this past week where a set of machines had exhausted the
disk space available on the partition where the local cache was situated. This
particular tier didn't use cacheprocess, only the local cache. There were some
number of truncated files in the local cache.
Inspecting the code here, it looks like we're using atomictempfile incorrectly.
atomictempfile.close() will unconditionally rename the temp file into place,
and we were calling this from a finally handler.
It seems safest to remove the try/finally from around this section of code and
just let the destructor trigger to clean up the temporary file in the error
path, and if we make it through writing the data, then call close and have it
move the file in to place.
Test Plan:
ran the tests. They don't cover this case, but at least I didn't
obviously break anything:
```
$ ./run-tests.py --with-hg=../../hg-crew/hg
...................
# Ran 19 tests, 0 skipped, 0 warned, 0 failed.
```
Reviewers: #sourcecontrol, ttung, mitrandir
Reviewed By: mitrandir
Subscribers: scyost
Differential Revision: https://phabricator.fb.com/D2940861
Tasks: 10044183
Signature: t1:2940861:1455673078:a7593d70c32151e13c8ccc31f92387e9c8cb23a0
Summary:
The adjustlinknode logic was pretty slow, since it did all the ancestry
traversal in python. This patch makes it first use the C fastpath to check if
the provide linknode is correct (which it usually is), before proceeding to the
slow path.
The fastpath can process about 300,000 commits per second, versus the 9,000
commits per second by the slow path.
This cuts 'hg log <file>' down from 5s to 2.5s in situations where the log spans
several hundred thousand commits.
Test Plan:
Ran the tests, and ran hg log <file> on a file with a lot of history
and verified the time gain.
Reviewers: pyd, #sourcecontrol, ttung, quark
Reviewed By: quark
Subscribers: quark
Differential Revision: https://phabricator.fb.com/D2908532
Signature: t1:2908532:1454718666:c4e63d73057572f035082943ef2e6fe0a49238c1
Previously we limited the changelog scan for old commits to the most recent
100,000, under the assumption that most changes would be within that time frame.
This turned out to not be a good assumption, so let's remove the limitation.
For our uses of remotefilelog, life is significantly easier if we also
have the file path rather than just a hash of the file path. Hide this
behind a config knob so users can enable it or not as makes sense.
Summary:
The old linkrev lookup logic depended on the repo containing the latest commit
to have contained that particular version of the file. If the latest version had
been stripped however (like what happens in rebase --abort currently), the
linkrev function would attempt to scan history from the current rev,
trying to find the linkrev node.
If the filectx was not provided with a 'current node', the linkrev function
would return None. This caused certain places to break, like the Mercurial
merge conflict resolution logic (which constructs a filectx using only a
fileid, and no changeid, for the merge ancestor).
The fix is to allow scanning all the latest commits in the repo, looking for the
appropriate linkrev. This is pretty slow (1 second for every 14,000 commits
inspected), but is better than just returning None and crashing.
Test Plan:
Manually repro'd the issue by making a commit, amending it, stripping the
amended version and going back to the original, making two sibling commits on
top of the original, then rebasing sibling 1 onto sibling 2 (so that the
original commit that had the bad linknode data was the ancestor during the
merge). Previously this failed, now it passes. I'd write a test, but it's 11pm
and I'm tired and I need this in by early tomorrow morning to make the cut.
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: trunkagent, rmcelroy
Differential Revision: https://phabricator.fb.com/D2826850
Signature: t1:2826850:1452680293:cb8c1f8c20ce13ad632925137dbdce6e994ab360
Summary:
I somehow got a stacktrace with IPython on a non-remotefilelog repo that ran
this code and complained that fileservice didn't exit. I am not sure how it
happened but let's make the call safer to match the pattern used elsewhere in
the file.
Test Plan: No stacktrace seen after that, one line change
Reviewers: durham
Differential Revision: https://phabricator.fb.com/D2819402
Summary:
Today, people running codemods or search/replace on their repos often accidentally corrupt their repos, and everyone ends up sad.
It's better to make them read-only
Test Plan: python run-tests.py
Reviewers: rmcelroy, #sourcecontrol, durham, ttung
Reviewed By: durham
Subscribers: mitrandir, quark, durham
Differential Revision: https://phabricator.fb.com/D2807369
Tasks: 9431187
Signature: t1:2807369:1452192329:b5ed6606cb66b1c830fc3d3fb5a81e6120387b38
Summary:
In 4fb35d8c2105 in core @durham removed _verify and replaced it with
verify, this patch makes remotefilelog compatible with those changes.
Test Plan: The tests are failing after but don't fail on this anyore
Reviewers: ericsumner
Subscribers: durham
Differential Revision: https://phabricator.fb.com/D2791847
Summary:
Historicaly we would move the old backup data blob to <name>+<int> so we had a
record of all the old data blobs we could search though for good commit
histories.
Since we no longer require that the data blobs have perfect commit histories,
these extra blobs just take up space.
This changes makes us only store one old version (for debugging and recovery
purposes), which should save space on clients.
Also switched to atomic rename writes while we're at it.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D2770675
The newly added checkunknown prefetching apparently gets handed the full list of
files that are not present on disk right now, which includes all the files
outside of the sparse checkout. So we need to filter those out here.
Summary:
When running addremove, it needs to see the contents of the removed files so it
can determine if they are a remain. So we need to add bulk prefetching in this
situation.
Test Plan: Added a test
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: dcapra
Differential Revision: https://phabricator.fb.com/D2756979
Signature: t1:2756979:1450132279:668b8b160d792cad1ac37e2069716e20ea304f57
Summary:
During hg status Mercurial sometimes needs to look at the size of contents of
the file and compare it to what's in history, which requires the file blob.
This patch causes those files to be batch downloaded before they are compared.
There was a previous attempt at this (see the deleted code), but it only wrapped
the dirstate once at the beginning, so it was lost if the dirstate object was
replaced at any point.
Test Plan: Added a test to verify unknown files require only one fetch.
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Subscribers: dcapra
Differential Revision: https://phabricator.fb.com/D2756768
Signature: t1:2756768:1450130997:7c7101efe66c998e3182dfbd848aa6b1a57d509f
Summary:
When doing an update, Mercurial checks if unknown files on disk match
what's in memory, otherwise it stops the checkout so it doesn't cause data loss.
We need to batch fetch the necessary files from the remotefilelog server for
this operation.
Test Plan: Added a test
Reviewers: #sourcecontrol, ttung, rmcelroy
Reviewed By: rmcelroy
Subscribers: dcapra
Differential Revision: https://phabricator.fb.com/D2756837
Signature: t1:2756837:1450132288:bc0530a07ea40aaeb2af1a93e4da82778cc11369
Summary:
Previously we recreated the ssh connection for each prefetch. In the case where
we were fetching files one by one (like when we forgot to batch request files),
it results in a 1+ second overhead for each fetch.
This changes makes us hold onto the ssh connection and simply issue new requests
along the same connection.
Test Plan:
Some of the tests execute this code path (I know because I saw them
fail when I had bugs)
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D2744688
localrepo.clone() was removed in hg revision 9996a5eb7344 (localrepo:
remove clone method by hoisting into hg.py, 2015-11-11).
Instead of localrepo.clone(), we now use exchange.pull(). However,
that method was already overridden in onetimeclientsetup(), which is
called from our new overriding of exchange.pull(). Since it should be
done first, we move that overriding from onetimeclientsetup() to
uisetup().
Summary:
There was a race condition where there could be an exception when trying to
create directories that already exist.
Test Plan: Ran the tests
Reviewers: #sourcecontrol, ttung
Differential Revision: https://phabricator.fb.com/D2736268
Summary:
Previously hg gc would try to keep all files relevant to all heads in the repo.
If the repo has a lot of heads, reading the manifest for all of them and
building a massive set of all the files can be extremely slow.
Let's just keep files related to the most recent public heads.
Test Plan: Ran the tests. This improves 'hg gc' time on some repos from 2 hours to 10 minutes.
Reviewers: #sourcecontrol, ttung
Reviewed By: ttung
Differential Revision: https://phabricator.fb.com/D2733157
Signature: t1:2733157:1449558332:14bbea343600959155f5927913552304ab8f94a7
Before this patch, if the repofile was empty or containing bad entries we were
just crashing. This patch prevents the crash by catching the error and displays
some interesting information to debug issues.
If another process deletes files managed by localcache, then, the gc step would
fail. This patch prevents the failure and add interesting information to debug
the problem.
Summary:
Attempting to maintain perfect history in the file blobs has become the most
complex, bug prone, and performance hurting aspect of remotefilelog. Let's just
drop this requirement and rely on upstream Mercurial's ability to fixup linkrevs
in the face of imperfect data.
The real solution for this class of problems is to make it so that the filelog
hashes are unique with respect to the commit that introduces them, but that's a
much harder problem.
Test Plan:
Ran the tests.
Made a commit with 1000 files changes. hg commit went from 15s to 7.5s. The difference will be even more dramatic for certain situations that have known to have caused problems in the past.
Reviewers: #sourcecontrol, pyd
Subscribers: rmcelroy, pyd
Differential Revision: https://phabricator.fb.com/D2686318
Summary:
Previously we would keep all server cache files for any head in the repo, even
if that head was really old. This resulted in unnecessarily large serve caches.
The new strategy is to keep the files necessary for any commit within the past
25,000 revs or so. Even on repo's with large commit rates this equates to
multiple weeks of time.
Test Plan: Ran the tests
Reviewers: #sourcecontrol
Differential Revision: https://phabricator.fb.com/D2652542
Summary:
Previously, hg log -fr master file/ was very slow with remotefilelog because
Mercurial decides whether to take the slowpath (i.e. walk the changelog) or the
filelog path based on if the filelog exists in the repo. remotefilelog has no
way to know if the filelog exists (since there's not a full list of filelogs),
so it fakes it by returning 'True' any time mercurial asks, then when the
filelog is needed, remotefilelog walks the entire changelog to build a fake
looking filelog. Therefore mercurial attempted to take the filelog path, and
remotefilelog did a very slow walk.
The fix is to force mercurial to take the slowpath when it sees 'hg log -fr
revset file'. Technically we could take the fast path by inspecting all the
results of the revset and seeing if the file/pattern exists as a file in any of
those. But that could be expensive and complicated, so this naive fix will
suffice for now.
Test Plan: Added a test. Previously it resulted in no output
Reviewers: cdelahousse, rmcelroy, #sourcecontrol
Differential Revision: https://phabricator.fb.com/D2634918