Summary:
For now, when the indexedlog is enabled, mirror all the writes done by memcache
to the indexedlog. This should give us more testing of both code path.
This will be submitted once a new hg_memcache_client build is published.
Reviewed By: quark-zju
Differential Revision: D15247640
fbshipit-source-id: 74e21731486b991dd9dcf2ce323d34e18cdeb0af
Summary:
The Hyper-based EdenAPI client proved to be problematic in practice due to various difficult to debug issues with Tokio, Hyper, TLS, and h2. We have kept it around for the time being while building out the Curl based client in case we wanted to revert back to a pure Rust solution.
Today, the Curl client works well, and future work on the EdenAPI will involve adding more functionality to it. Given that both the Hyper and Curl clients implement the EdenAPI trait, modifications to the trait involve updating both clients. So far this has been acceptable because the updates have been minor, but we would now like to add substantial new functionality to the trait (namely tree fetching), and adding this functionality to the Hyper client would take nontrivial effort.
Given that we aren't using this client at all, let's just delete it. We can always bring it back from the repo's history if we need it in the future.
Reviewed By: quark-zju
Differential Revision: D15289196
fbshipit-source-id: d9c0c3cfc5087c3e080a9919dd9e627b9657676c
Summary: The edenapi is now independant of the storage type for history data.
Reviewed By: kulshrax
Differential Revision: D15284355
fbshipit-source-id: 72a5db42bb0fb19ee03155b13914202581ab5966
Summary:
The type of store where data is stored is now fully abstracted to the python
bindings. For now, edenapi will write to the pending mutabledatapack, but we
can now switch it easily to any other store implementing MutableDeltaStore,
including an IndexedLogDataStore.
Reviewed By: kulshrax
Differential Revision: D15266191
fbshipit-source-id: 638cf90a567ef170e0302376312c4b82e6d6b6da
Summary:
While closing the repo, hg_memcache_client is terminated, and later, the
pending packfiles are commited. Unfortunately, the commit phase is also when
the content of the pending packfiles will be sent to memcache, and since the
connection is at this point closed, nothing is sent then.
Reviewed By: quark-zju
Differential Revision: D15247639
fbshipit-source-id: 8719cb8bbc3c9571b9cfe250ba4be2b745e4ba15
Summary:
With the mutablestores, this code was starting to be too messy and hard to
follow. Let's simplify it a bit.
Reviewed By: quark-zju
Differential Revision: D15200748
fbshipit-source-id: 2c0cc5a4ff5057f7a9590fcc602afc2b1cc72dcd
Summary: This will let us get a feel for download speeds.
Reviewed By: kulshrax
Differential Revision: D15168153
fbshipit-source-id: 86291b8eb3393d54271d574250291ed691a64a86
Summary: Records how many memcache misses we hit.
Reviewed By: kulshrax
Differential Revision: D15168154
fbshipit-source-id: 53e105999d0af566666fcfaecc5bcaf414c5e804
Summary:
The prefetch code was not refreshing the pack stores after writing
stuff into packs. This meant that in some cases we would execute a prefetch,
then when we went to read an entry from the store it wouldn't be found. That
would trigger a one-by-one download, which would happen to trigger a pack store
refresh (since remotecontentstore had a refresh) which would cause the rest of
the prefetched data to then become visible.
This fixes it to make prefetch responsible for refreshing the pack data.
This also caused issues where prefetching lfs files requires the hg fetching to
get the metadata, then it reads that metadata and does an lfs fetch. Since the
original metadata wasn't visible (since the packs had not refreshed), this
resulted in infinite recursion until the default 100ms refresh time passed.
Differential Revision: D15171701
fbshipit-source-id: 190b1da542675265efaad75a2a6826cbe3c9631c
Summary:
On a memcache miss, let's put the server data into a pending packfile. This
will greatly reduce the number of packfiles being created, and thus reduce the
need to run repack.
Reviewed By: DurhamG
Differential Revision: D15154509
fbshipit-source-id: 661efd7fde4fc4f3f6441eebeaf7d3ff4c43871a
Summary:
The code to handle pending mutable packs was effectively duplicated in 3
places. The new class allows refactoring of all of it. The use of several
lambdas is unfortunate, but required as the repo and the cache path are dynamic
and can't be obtained when constructing the pendingmutablepack object.
Reviewed By: DurhamG
Differential Revision: D15023059
fbshipit-source-id: 1eae68fe66ce741eb36baa0c8db318ba32957b41
Summary:
This diff adds a new progress reporting framework to the Eden API crate and uses it to power progress bars for HTTP file downloads in Mercurial.
The new `ProgressManager` type is designed to aggregate progress values across multiple concurrent HTTP transfers. The API is currently designed to integrate well with libcurl's progress callback API, allowing all of the curl handles within a curl multi session to concurrently report their progress.
This progress can then be reported (in aggregate) to a user-provided callback. In most cases, this callback will be a Rust wrapper around a callback provided by the Python code. The `EdenAPI` trait and FFI bindings have been updated accordingly to allow optionally passing in a callback for long-running operations.
Lastly, in `remotefilelog`'s Python code, the callback is specified as a Python closure that simply updates the progress bar.
Reviewed By: quark-zju
Differential Revision: D15179983
fbshipit-source-id: ee677b71beff730f91aafe0364124f7ea0671387
Summary: Per title, `hg debughttp` now prints out the hostname that the API server reports rather than the hostname in the URL we used to connect to it. The reason for this is that if the API server is behind a VIP, we get the actual hostname rather than just the VIP URL.
Differential Revision: D15170618
fbshipit-source-id: 9af5480f9987d8ea9c914baf3b62a00ad88d1b32
Summary: The packfile code already does the sort, no need to do it twice.
Reviewed By: DurhamG
Differential Revision: D15154517
fbshipit-source-id: 6ecfc65315a7e443e0c385a5a011f1b491b3f6ac
Summary:
The caller specifies whether data or history or both is needed. Let's only
download what is requested to avoid the network transfer and the disk usage.
Reviewed By: DurhamG
Differential Revision: D15151457
fbshipit-source-id: 5f8600e2ef6c45a98e05b17448fcee8d6574fa36
Summary:
After trying to enable `remotefilelog.fetchpacks` on windows, I realized that
the inline repack would always fail to remove the old pack files, leaving
temporary files around. The reason for this is that the packfiles are simply
not being garbage collected, even though `self.packs` is properly being
reinitialized. For some reason, that isn't enough to drop the refcount on the
old `self.packs` and therefore the packfiles aren't being released.
Instead of re-initializing `self.packs`, we can simply clear them, which will
release every one properly. In some cases though, it looks like one of the
packfile still isn't being released properly, I haven't figured out why, but at
least only one temporary file will be left around (until a repack 24h later
removes it).
In doing all of this, I had to duplicate some of the code in refresh to avoid
holding packfiles references in `oldpacks`.
Reviewed By: DurhamG
Differential Revision: D15076018
fbshipit-source-id: f05852c3dddc6a25e1eb13bfd633c1fcc9466bb1
Summary:
I'm getting reports on Eden failed to import data from Mercurial. One of the error message I have seen is `local variable 'rcvd' referenced before assignment` (unfortunately we don't have the backtrace to locate the exact problem), and this is the only place in our codebase has a variable named `rcvd`.
The log contains this error: P62827869$210-217
So if there is any exception raised in `self._connect`, `rcvd` and `total` will not be initialized, and these variables are being used to log the error message in the `except` block:
diffusion/FBS/browse/master/fbcode/scm/hg/edenscm/hgext/remotefilelog/fileserverclient.py;18aa2052b752f1255dd53474d541c4a4177bfef5$732-737
Reviewed By: quark-zju
Differential Revision: D15069162
fbshipit-source-id: 16ec56820107fbbb24d426ce309b38a88d7eae5b
Summary: Release the GIL during data fetching to allow for progress bars to update properly. The data fetching code is pure Rust and does not interact with the Python interpreter at all, so releasing the GIL here is safe.
Differential Revision: D15051852
fbshipit-source-id: 144da953720951f9a30aadfc2b7fc8c8bc6b14aa
Summary:
Once `remotefilelog.fetchpacks` is enabled, `hg gc` will no longer be able to
limit the size of the hgcache. This will be particularly challenging for
Sandcastle/Quicksand as they already see hgcache over 100GB.
The long-term plan is switching to IndexedLog based stores with a log rotate
functionality to control the cache size. In the meantime, we can implement
a basic logic to enforce the size of the hgcache that simply remove packfiles
once the cache is over the configured size.
One complication of this method is that several concurrent Mercurial processes
could be running and accessing the packfiles being removed. In this case, we
can split the packfiles in 2 categories: ones created a while back, and new
ones. Removing packfiles from the first case, lookups will simply raise a
KeyError and data will be re-fetched from Memcache/Mononoke, ie: failure is
acceptable. The second category belongs to ones that were just created by
downloading them from Memcache/Mononoke, and the code strongly assume that they
will stick around. A failure at this point will not be recovered.
One way of fixing this would be to handle these failures properly and simply
retry, the other is to not remove new packfiles. A time of 10 minutes was chosen
to categorize the packfiles.
Reviewed By: quark-zju
Differential Revision: D15014076
fbshipit-source-id: 014eea0251ea3a630aaaa75759cd492271a5c5cd
Summary:
Clean up some of the calls to `ui.log` and how they appear in blackbox logging.
* Make the names of the events consistently use `snake_case`.
* For watchman, only log once for each watchman command. Include whether or not it failed.
* Unify `fsmonitor` logging under the `fsmonitor` event.
* Omit the second argument when it is empty - it is optional and does nothing when empty.
* Increase the number of blackbox lines included in rage to 100.
Reviewed By: quark-zju
Differential Revision: D14949868
fbshipit-source-id: a9aa8251e71ae7ca556c08116f8f7c61ff472218
Summary: Per the title, if we attempt to fetch file data and history over HTTP and the fetch fails, fall back to SSH rather than crashing.
Differential Revision: D15035947
fbshipit-source-id: 2d00a49a51a0c8809daf1d28a6e3ab7f571415b0
Summary: Add a debug option for HTTP data fetching. The intended usage is for it to gate verbose debug messages; the option can be set by Chef for users in the hg_dev tier.
Differential Revision: D15040988
fbshipit-source-id: b7eaa3bab4200e083cffc5822fb9873611725e6b
Summary: Add a new config option to toggle file validation.
Differential Revision: D15034687
fbshipit-source-id: 3783ea1dacad9d1e494a5de1388f703db0ed1129
Summary: We have to pass a lot of config options across the FFI boundary; these are currently passed as arguments to the Eden API client constructor. Let's use argument unpacking to avoid repeating a bunch of argument names in the call to the constructor.
Differential Revision: D15034480
fbshipit-source-id: 74d0830c686c8863fcede6e57404aec3f0a58ea1
Summary:
When eden request a tree, it manually commit the pending mutable pack files. In
the unlikely case where the temporary files are removed from the disk, the
pack.close() operation will fail, since the pending packs aren't reset, the
next commit that happens while the repo object is closed will try again. This
time, it may try to close an already closed packfile, leading to P62634761.
Reviewed By: quark-zju
Differential Revision: D15015632
fbshipit-source-id: 016617334498c0161feed9dcec5ce24df931ad9c
Summary:
The warning isn't that useful, and can actually cause more harm than good, as running `hg prefetch -r .`
can download gigabytes of unnecessary data to the hgcache.
Reviewed By: quark-zju
Differential Revision: D14999458
fbshipit-source-id: b0ff2c2ad0e441622066fac10a5efafe8de588db
Summary:
When the option is set, the gets will be stored in the indexed log instead of
in a packfile. With the Python code now able to read from it, this will allow
us to get more real world testing.
Reviewed By: quark-zju
Differential Revision: D14895509
fbshipit-source-id: d73d49a028f7af199b7a0873551d7b18b047e50c
Summary:
Looking at the Hgerrors scuba table, I noticed that a lot of the sandcastle
machines had repack failures due to "No such file or directory". I'm suspecting
that's due to not having a local store to repack, and therefore listing of
files to repack would fail. Let's verify that the directory is present before
repacking to avoid this issue.
Reviewed By: quark-zju
Differential Revision: D14906503
fbshipit-source-id: 98fbe57310511df4fc9856bf71f836adefb3d855
Summary:
While the Rust code can read/write content out of an indexedlog, the Python
code cannot. For now, all the writes will be done in Rust, and the Python code
will only be able to read from it.
Reviewed By: quark-zju
Differential Revision: D14894330
fbshipit-source-id: 5c1698d31412bc93e93dabb93be106a2ef17d184
Summary:
While not a correctness issue, taking and releasing the repack lock multiple
times per repack makes `hg debugwaitonrepack` unreliable, which in turns makes
test flakey.
Reviewed By: quark-zju
Differential Revision: D14877896
fbshipit-source-id: 682b649f388d19fd51bcf8dd205ac96944039e86
Summary: Make the batch size of data and history requests independently configurable, since data responses are typically much larger than history responses (since the former contains actual file data whereas the latter is only metadata).
Differential Revision: D14859686
fbshipit-source-id: c87c31f3e6611a55ae712e7f0ed9bb392d31a579
Summary: Use the curl multi interface to fetch multiple batches of files or history entries concurrently.
Differential Revision: D14718547
fbshipit-source-id: c5a740c7e9106b719e825540f8182be31a72bae7
Summary:
Currently we skip metadata when constructing data wirepacks.
We need copy metadata in the data packflies, because the client expects the base text to contain copy metadata since the delta base on the server also contains copy metadata.
It is also needed for future file nodes validation.
Differential Revision: D14851678
fbshipit-source-id: 1a3f79dc2565cdb864bee2400d331ae3a7c3751b
Summary: Allow users to configure which HTTP client backend to use for the Eden API via the `edenapi.backend` config option. Valid options are `curl` and `hyper`, with `curl` being the default.
Reviewed By: quark-zju
Differential Revision: D14657871
fbshipit-source-id: 7a9972d2380fbbd5ed62d1accae764dc03ca4c29
Summary:
In most cases, we won't have more packfiles than what is held in the LRU cache,
and therefore, building a set is unecessary. Testing for the length is a O(1)
operation and is therefore a quick test to verify if more packs needs to be
iterated on.
Reviewed By: quark-zju
Differential Revision: D14802940
fbshipit-source-id: 8b39befb27368d474ab71f3eeac2340d8183e70e
Summary:
When many packfiles are present, and searching for missing data, we will
iterate on the packs twice. We can avoid the second iteration by testing if new
packfiles were detected.
Reviewed By: quark-zju
Differential Revision: D14802942
fbshipit-source-id: 624d9a86e65395af3141ea10d42bcfb8ee18db83
Summary:
Building a set is a O(n) operation, and in the cases where refresh is called
often, it starts showing up in profiles. Instead of rebuilding it everytime, we
can simply update it when adding/removing packs.
Reviewed By: quark-zju
Differential Revision: D14802945
fbshipit-source-id: f8967d09d2d7d0cc0d7400b047a587d536315002
Summary:
The repo.ui.configbool was showing in profiles when running `hg log -p`. Since
its value won't change for the duration of the operation, let's cache it.
Reviewed By: quark-zju
Differential Revision: D14802941
fbshipit-source-id: c1a2a764cb4d4a1c0d45d5118bb1e892c10798a6
Summary:
One of the drawback of only using packfiles is the cost of the pathological
cases when one pack file is being downloaded per file. This can happen during a
`hg log -p`, or on Eden when running `cat` on lots of not materialized files.
In these cases, mercurial will become slower and slower as the command is
running.
This is due to mercurial having to search each individual packfile to find
whether some data exist locally, or it needs to be fetched from the network.
This search will iterate over each packfile linearly, and on a miss, a
directory scan will be performed to see if a new packfile is present.
Previous patches attempted to solve this by running a repack when the command
finishes, but in the case of `hg log -p`, or when Eden is requesting the
content of some file, the command could be running for a long time.
As a solution, let's repack while the command is running if we detect that too
many pack files are present.
Reviewed By: quark-zju
Differential Revision: D14740566
fbshipit-source-id: 95ba381bd99e5404e352f799e9053a6375abac0a
Summary:
Loosefiles are quite slow to repack, and this is significantly affecting
laptops as the repack can potentially run for hours when a large amount of
loosefiles are present.
One of the reason for that is that an incremental repack really isn't
incremental for loosefiles as all of them are repacked at the same time.
Instead let's repack only one of the top-level directory at a time.
Reviewed By: quark-zju
Differential Revision: D14690348
fbshipit-source-id: ffba49840302ae0d99e32db410647e83e213fe64
Summary: Previously the debug commands for the Eden API did not check whether HTTP data fetching was enabled before trying to access the client singleton, which would result in a crash if HTTP was disabled. This diff adds explicit checks so that the commands will abort instead of crashing.
Reviewed By: quark-zju
Differential Revision: D14692893
fbshipit-source-id: d36e241d8460dadb555a15c92aca9334ba00f34c
Summary:
With `remotefilelog.fetchpacks`, some pathological cases would end up creating
tons of packfiles and we would thus spend a significant amount of time in
refresh, scanning for new packfiles. Jeremy noted that for every refresh,
stat(2) would be ran on every file, even though only a handful of packfile
would be added.
Differential Revision: D14674093
fbshipit-source-id: df198d208e3f4e1d667e7bdd069a793984683282
Summary:
Now that repacks runs more often, it's easier to trigger a race between repack
deleting packfiles, and another mercurial process listing the packfiles and
trying to open them. If the packfiles are deleted after the directory listing,
all the packfiles will fail to be opened and were mis-reported as corrupted.
Reviewed By: quark-zju
Differential Revision: D14648308
fbshipit-source-id: c3b852f669e28db6f622bde217f339533e094223
Summary:
The extdiff extension uses archive on the workingctx to make snapshots
of certain files. Prefetching fails on this because it sees placeholder file
nodes. Let's just not prefetch in this case at all.
Differential Revision: D14609691
fbshipit-source-id: 2d06bf015a8b10afa16b3b0bcb266afb102d63b9
Summary:
Now that we have perftracing infra, let's trace a bunch of likely
problem spots.
Reviewed By: sfilipco
Differential Revision: D14426367
fbshipit-source-id: 354a241aa9ac5d75d34062a9838d581b4f46746f
Summary:
Most of the repacks are background repacks, and most of the complaints are
coming from laptops users. Thanks to the rust repack, most of time during
repack is now spent in repacking loosefiles. While repacking them is expensive,
testing whether data is in a loosefile and obtaining it is actually pretty
fast. Packfiles have the opposite issue, repacking them is fast, but finding
data in them may be expensive if a lot of them are present.
Based on this, it makes sense to repack packfiles more frequently than
loosefiles. At first, the newly added option will be used to turn-off loosefile
repack for laptop users. A less frequent loosefile repack will be implemented
in a later patch.
Reviewed By: DurhamG
Differential Revision: D14586986
fbshipit-source-id: 5bc5c839cf8d2d78bcc4ffa016bbe3cf1b2ef3f7
Summary: Added new metrics to log loose files size and number during repack. We need it to understand how much better the pack files work in terms of disc and memory usage.
Reviewed By: markbt
Differential Revision: D14544811
fbshipit-source-id: 5a4d894bd5a3358c7e0f93ecc9db5e9f2c2f2372
Summary: Log data about round-trip count, and object count for files, trees, and SSH calls.
Differential Revision: D14515776
fbshipit-source-id: cce416fd7dccdad3c73a9f1751a04ddac0d2c507
Summary:
Not every command requires a valid repo, but when one is used, it is always
properly closed. Hence, let's simply wrap the repo.close method instead of
wrapping around the runcommand function.
Reviewed By: kulshrax
Differential Revision: D14531515
fbshipit-source-id: bcdbe6530c94041c1185b18570846ba609b5f605
Summary:
Reading a line over a pipe involves reading every character of the line
individually. This is extremely inefficient and slow, which cause prefetch to
be overly slow when most of the data isn't in memcache.
Using buffered reads tries to read 4096 bytes at a time, drastically reducing
the cost of reading a missing path/node pair from memcache.
Reviewed By: ikostia
Differential Revision: D14507063
fbshipit-source-id: e0910d7a303e15fe2d3c61fe2739e6c13370058f