Summary:
Previously, a "Started ..." message could be logged before the corresponding "Starting ..." message.
This diff fixes the ordering.
Reviewed By: krallin
Differential Revision: D20277406
fbshipit-source-id: 3c2f3fa1723c2e0852c6b114592ab7ad90be17ff
Summary:
We saw >10 timeouts on the Mononoke side from people who fetch a lot of files over the corp network (https://fburl.com/scuba/mononoke_test_perf/qd15c5f1). They all tried to download a lot of files and hit the 90-minute timeout limit.
Let's try to download these files in fairly large chunks (e.g. 200000 files). That should help with resumability, prevent timeout errors, and make retries smaller.
This adds the overhead of establishing a new connection after every 200000 files. That shouldn't be a big problem, and even if it is, we can just increase remotefilelog.prefetchchunksize.
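A minimal sketch of the chunking idea (function names and the callback shape are hypothetical, not the actual remotefilelog code): split the requested keys into fixed-size chunks and issue one request per chunk, so a timeout only forces a retry of that chunk rather than the whole prefetch.

```rust
/// Fetch `keys` in chunks of at most `chunk_size`, invoking
/// `fetch_chunk` once per chunk. Each chunk is an independent
/// request: a failure only requires retrying that chunk.
fn prefetch(keys: &[u32], chunk_size: usize, fetch_chunk: &mut dyn FnMut(&[u32])) {
    for chunk in keys.chunks(chunk_size) {
        fetch_chunk(chunk);
    }
}
```

With a chunk size of 200000, a 1M-file prefetch becomes five smaller requests instead of one 90-minute-risk request.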
Reviewed By: xavierd
Differential Revision: D20286499
fbshipit-source-id: 316c2e07a0856731629447627ae65c094c6eecc5
Summary: expose the counters for the number of pending imports (blobs, trees, prefetches) to allow their use in tooling
Reviewed By: chadaustin
Differential Revision: D20269853
fbshipit-source-id: d2b7e2110520290751699c4a891d41ebd5b374cf
Summary:
On a bad network link (such as over VPN), the connection to Mercurial might be
fairly flaky. Instead of failing the fetch outright, let's retry a bit first,
in the hope that the connection will be back by then.
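A sketch of the retry idea, assuming a simple fixed-delay policy (the attempt count, delay, and names are illustrative, not the actual Mercurial implementation):

```rust
use std::{thread, time::Duration};

/// Retry a fallible fetch up to `max_attempts` times, sleeping
/// between attempts in the hope a flaky link recovers.
fn fetch_with_retries<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut fetch: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match fetch() {
            Ok(v) => return Ok(v),
            Err(e) => {
                attempt += 1;
                if attempt >= max_attempts {
                    return Err(e); // out of retries: surface the error
                }
                thread::sleep(delay); // give the link time to recover
            }
        }
    }
}
```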
Reviewed By: quark-zju
Differential Revision: D20295255
fbshipit-source-id: d3038f5e4718b521ae4c3f2844f869a04ebb25e3
Summary:
The old code does "read, lock, write", which is unsound because after "lock"
the data just read can be outdated and needs a reload.
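The fixed pattern, sketched with a plain `Mutex` (not the actual code): take the lock before reading, so the value cannot go stale between the read and the write.

```rust
use std::sync::Mutex;

/// Lock first, then read and write under the same lock. The unsound
/// "read, lock, write" pattern read the value before locking, so the
/// value written back could be based on outdated data.
fn increment(counter: &Mutex<u64>) -> u64 {
    let mut guard = counter.lock().unwrap(); // lock before reading
    *guard += 1;
    *guard
}
```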
Reviewed By: xavierd
Differential Revision: D20306137
fbshipit-source-id: a1c29d5078b2d47ee95cf00db8c1fcbe3447cccf
Summary:
This updates the store_bytes method to chunk incoming data instead of uploading
it as-is. This is unfortunately a bit hacky (but so was the previous
implementation), since it means we have to hash the data before it has gone
through the Filestore's preparation.
That said, one of the invariants of the filestore is that chunk size shouldn't
affect the Content ID (and there is fairly extensive test coverage for this),
so, notionally, this does work.
Performance-wise, it does mean we are hashing the object twice. That was
actually the case before as well (since obtaining the ContentId for FileContents
would clone them, then hash them).
The upshot of this change is that large files uploaded through unbundle will
actually be chunked (whereas before, they wouldn't be).
Long-term, we should try and delete this method, as it is quite unsavory to
begin with. But, for now, we don't really have a choice since our content
upload path does rely on its existence.
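The invariant relied on above can be illustrated with a stand-in hash (this is not the Filestore API; `DefaultHasher` stands in for the real content hash): because the content id is a hash over the full byte stream, cutting the stream into chunks of any size yields the same id.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hash `data` while feeding it through in chunks of `chunk_size`
/// (as if each chunk were stored separately). Since the hasher is a
/// byte-stream hasher, the result is independent of the chunk size.
fn content_id(data: &[u8], chunk_size: usize) -> u64 {
    let mut hasher = DefaultHasher::new();
    for chunk in data.chunks(chunk_size) {
        hasher.write(chunk);
    }
    hasher.finish()
}
```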
Reviewed By: StanislavGlebik
Differential Revision: D20281937
fbshipit-source-id: 78d584b2f9eea6996dd1d4acbbadc10c9049a408
Summary:
This is a very small struct (2 u64s) that really doesn't need to be passed by
reference. Might as well just pass it by value.
Differential Revision: D20281936
fbshipit-source-id: 2cc64c8ab6e99ee50b2e493eff61ea34d6eb54c1
Summary: separate out the Facebook-specific pieces of the sql_ext crate
Reviewed By: ahornby
Differential Revision: D20218219
fbshipit-source-id: e933c7402b31fcd5c4af78d5e70adafd67e91ecd
Summary:
We've had cases where a git commit goes in that shouldn't be translated
to Mercurial. Let's add an option to skip the commit. Instead of skipping it
entirely (which would require complicated logic to then parent the following
commit on the last converted commit), let's just convert the skipped commit as
an empty commit. This should cover the cases we've encountered so far.
Reviewed By: krallin
Differential Revision: D20261743
fbshipit-source-id: da401863b09c2ac727aae1ceef10a0e8d8f98a7e
Summary:
Context: https://fb.workplace.com/groups/rust.language/permalink/3338940432821215/
This codemod replaces all dependencies on `//common/rust/renamed:tokio-preview` with `fbsource//third-party/rust:tokio-preview` and their uses in Rust code from `tokio_preview::` to `tokio::`.
This does not introduce any collisions with `tokio::` meaning 0.1 tokio because D20235404 previously renamed all of those to `tokio_old::` in crates that depend on both 0.1 and 0.2 tokio.
This is the tokio version of what D20213432 did for futures.
Codemod performed by:
```
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| sed 's,TARGETS$,:,' \
| xargs \
-x \
buck query "labels(srcs, rdeps(%Ss, //common/rust/renamed:tokio-preview, 1))" \
| xargs sed -i 's,\btokio_preview::,tokio::,'
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| xargs sed -i 's,//common/rust/renamed:tokio-preview,fbsource//third-party/rust:tokio-preview,'
```
Reviewed By: k21
Differential Revision: D20236557
fbshipit-source-id: 15068b93a0a944d6249a1d9f63840a4c61c9c1ba
Summary:
I thought the index function could be the bottleneck. However, the Log reading
(xxhash, decoding vlqs) can be much slower for very long entries. Therefore
using bytes as the lag threshold is better. It does leak the Log
implementation details (how it encodes an entry) to some extent, though.
Reverts D20042045 and D20043116 logically. The lagging calculation is using
the new Index::get_original_meta API, which is easier to verify correctness
(In fact, it seems the old code is wrong - it might skip Index flushes if
sync() is called multiple times without flushing).
This should mitigate an issue where a huge entry (generated by `hg trace`) in
blackbox does not get indexed in time and cause performance regressions.
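A sketch of the byte-based lag check (field names are hypothetical): measure lag as the bytes of Log data not yet covered by the index, rather than an entry count, since one huge entry can cost far more to read than many small ones.

```rust
/// Hypothetical view of the Log/index relationship.
struct LogState {
    log_len: u64,           // total bytes written to the Log
    index_covered_len: u64, // bytes already covered by the index
}

/// Flush (rebuild) the index once the uncovered byte count exceeds
/// the threshold, regardless of how many entries that represents.
fn needs_index_flush(state: &LogState, lag_threshold_bytes: u64) -> bool {
    state.log_len.saturating_sub(state.index_covered_len) >= lag_threshold_bytes
}
```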
Reviewed By: DurhamG
Differential Revision: D20286508
fbshipit-source-id: 7cd694b58b95537490047fb1834c16b30d102f18
Summary: This will be used to more reliably detect index lags.
Reviewed By: DurhamG
Differential Revision: D20286518
fbshipit-source-id: c553b6587363a55603b75df12580588e3100e35f
Summary:
This ensures indexes are complete even if index format or definition has been
changed.
Reviewed By: DurhamG
Differential Revision: D20286509
fbshipit-source-id: fcc4ebc616a4501e4b6fd2f1a9826f54f40b99b8
Summary:
This avoids loading all blackbox logs when `init()` gets called multiple times
(for example, once in Rust and once in Python).
Reviewed By: DurhamG
Differential Revision: D20286511
fbshipit-source-id: ef985e454782b787feac90a6249651a882b6552e
Summary: This API has the benefit that it does not trigger loading older logs.
Reviewed By: DurhamG
Differential Revision: D20286512
fbshipit-source-id: 426421691ad1130cdbb2305612d76f18c9f8798c
Summary:
This updates microwave to also support changesets, in addition to filenodes.
Those create a non-trivial amount of SQL load when we warm up the cache (due to
sequential reads), which we can eliminate by loading them through microwave.
They're also a bottleneck when manifests are loaded already.
Note: as part of this, I've updated the Microwave wrapper methods to panic if
we try to access a method that isn't instrumented. Since we'd be running
the Microwave builder in the background, this feels OK (because then we'd find
out if we call them during cache warmup unexpectedly).
Reviewed By: farnz
Differential Revision: D20221463
fbshipit-source-id: 317023677af4180007001fcaccc203681b7c95b7
Summary:
This incorporates microwave into the cache warmup process. See earlier in this
stack for a description of what this does, how it works, and why it's useful.
Reviewed By: ahornby
Differential Revision: D20219904
fbshipit-source-id: 52db74dc83635c5673ffe97cd5ff3e06faba7621
Summary:
With the new crate-public interfaces and Debug implementations it's possible to
write tests for DagSet. So let's do it.
Reviewed By: sfilipco
Differential Revision: D20242561
fbshipit-source-id: 180e04d9535f79471c79c4307f6ab6e8e8815067
Summary: The compiler was complaining about these on Windows.
Reviewed By: quark-zju
Differential Revision: D20250719
fbshipit-source-id: 89405e155875a4a549b243e93ce63cf3f53b1fab
Summary:
Don't restrict constructing a c_api datapack store to Unix only; we can
construct it on Windows too by assuming that its path will be valid UTF-8.
Reviewed By: quark-zju
Differential Revision: D20250718
fbshipit-source-id: 07234b6a71b50c803cfe3b962fa727f57037c919
Summary: This returns the ancestors in the reverse order compared to the parents method.
Reviewed By: sfilipco
Differential Revision: D20265277
fbshipit-source-id: 83277cee3d8e9070fc56d20d4c1877e6782c22f7
Summary:
Adds counters to track queued imports, to enable more statistics exposure.
- Add counters to track the number of pending blob, tree, and prefetch imports
- Increment the counters (in the constructor of a wrapper struct) when the import is about to be queued
- Decrement the counters once the load has completed (in the destructor of the wrapper struct)
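The constructor/destructor scheme above is an RAII guard. A sketch in Rust (the real code is C++ in EdenFS; names here are illustrative):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Increments a pending-imports counter on construction and
/// decrements it on drop, so the counter stays accurate even if the
/// import fails or the owning future is dropped early.
struct PendingImportGuard<'a> {
    counter: &'a AtomicU64,
}

impl<'a> PendingImportGuard<'a> {
    fn new(counter: &'a AtomicU64) -> Self {
        counter.fetch_add(1, Ordering::Relaxed); // import queued
        PendingImportGuard { counter }
    }
}

impl Drop for PendingImportGuard<'_> {
    fn drop(&mut self) {
        self.counter.fetch_sub(1, Ordering::Relaxed); // import finished
    }
}
```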
Reviewed By: chadaustin
Differential Revision: D20256410
fbshipit-source-id: 5536b46307b30fc19dc5747414727a86961c78e1
Summary: Improvements aim to minimize the number of DB queries
Differential Revision: D20280711
fbshipit-source-id: 6cc06f1ac4ed8db9978e0eee956550fcd16bbe8a
Summary:
Implementation of derivation logic for the changeset info.
BonsaiDerived is implemented for ChangesetInfo: `derive_from_parents` derives the info, and BonsaiDerivedMapping then puts it into the blobstore.
```
ChangesetInfo::derive(..) -> ChangesetInfo
```
Reviewed By: krallin
Differential Revision: D20185954
fbshipit-source-id: afe609d1b2711aed7f2740714df6b9417c6fe716
Summary:
Introducing data structures for derived Bonsai changeset info, which is meant to store all commit metadata except for the file changes.
A Bonsai changeset consists of the commit metadata and the set of all file changes associated with the commit.
Some changesets, usually merge commits, include thousands of file changes. That is not a problem by itself; however, when we need to know something about a commit other than its hash, we have to fetch the whole changeset, which can take up to 15-20 seconds.
Changeset info as a separate data structure is needed to speed up changeset fetching when we need the commit metadata but not the file changes.
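A hypothetical sketch of the split (field names are illustrative, not the actual Mononoke types): the small metadata struct can be fetched on its own, while the possibly huge file-change list stays in the full changeset.

```rust
/// Small, cheap-to-fetch commit metadata.
struct ChangesetInfo {
    author: String,
    message: String,
    parents: Vec<String>, // parent changeset ids
}

/// Full changeset: metadata plus file changes, which for merge
/// commits can number in the thousands.
struct BonsaiChangeset {
    info: ChangesetInfo,
    file_changes: Vec<(String, Vec<u8>)>, // path -> change payload
}
```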
Reviewed By: markbt
Differential Revision: D20139434
fbshipit-source-id: 4faab267304d987b44d56994af9e36b6efabe02a
Summary:
The new API is required for migrating Commit Cloud off hg servers and the infinitepush database.
This can also fix phases issues with `hg cloud sl`.
Reviewed By: markbt
Differential Revision: D20221913
fbshipit-source-id: 67ddceb273b8c6156c67ce5bc7e71d679e8999b6
Summary:
Fix the tail interval delay; it wasn't triggering.
Took the opportunity to restructure the code as a loop as well, which simplified it a bit.
Reviewed By: markbt
Differential Revision: D20247077
fbshipit-source-id: 1786ef1528a4b0493f5e454d28450d7198af8ad4
Summary:
Remove a failing integration test that was testing behavior we don't really
care about.
My changes in D20210708 made this test start failing. This integration test
was initially added to exercise the code I reverted in D20210708.
This test fails when EdenFS is invoked in the foreground and under sudo. If
you send SIGSTOP to the EdenFS process sudo happens to notice this and send
the same signal to itself too. This results in a state where the `sudo`
command is stopped and is never resumed so it never wakes up to reap its child
EdenFS process when EdenFS exits. The behavior I reverted in D20210708 caused
the edenfsctl CLI code to simply ignore the fact that EdenFS was stuck in a
zombie state, and proceed anyway. This allowed EdenFS to at least restart,
but it left old zombies stuck forever on the system.
This problem is arguably an issue with how sudo operates, and it's sort of
hard for us to work around. To solve the problem you need to send SIGCONT to
the sudo process, but since it is running with root privileges you don't
normally have permission to send a signal to it. It is understandable why
sudo behaves this way, since normally it is desirable for sudo to background
itself when the child is stopped.
In practice this isn't really ever a situation that we care much about
handling. Normal users shouldn't ever get into this situation (they don't run
EdenFS in the foreground, and they generally don't run it under sudo either).
Reviewed By: genevievehelsel
Differential Revision: D20268924
fbshipit-source-id: d61d0a10ee1e132f00dbd2e4dc135808b7c79345
Summary:
D18538145 introduced a transaction that spans the entire infinitepush
pull. This has a couple of unfortunate consequences:
1. hg pull --rebase now aborts the entire pull if the rebase hits a conflict,
since it's unable to commit the transaction.
2. If tree prefetching fails, it aborts the entire pull as well.
Tests seem to work fine if we scope down this lock.
Reviewed By: xavierd
Differential Revision: D20260480
fbshipit-source-id: d84228ababdb5572401645f74e78df035bf1461b
Summary: Those will be reused by nameset::DagSet.
Reviewed By: sfilipco
Differential Revision: D20242563
fbshipit-source-id: 944e9a04aeb15439256ecea64355b67e326e5c89
Summary:
This is useful for `assert_eq!(format!("{:?}", set), "...")` tests.
It will be eventually exposed to Python as `__repr__`, similar to Python's
smartsets.
Reviewed By: sfilipco
Differential Revision: D20242562
fbshipit-source-id: 5373bb180db7cafebf273ace7cf2cb80fbfb8038
Summary:
In the Python world all smartsets have some kind of "debug" information. Let's
do something similar in Rust.
Related code is updated so the test is more readable.
Reviewed By: sfilipco
Differential Revision: D20242564
fbshipit-source-id: 7439c93d82d5d037c7167818f4e1125c5a1e513e
Summary:
Replace the methods to get CPU and memory usage statistics:
- For the memory: use `VmRSS` from `/proc/[pid]/status`: http://man7.org/linux/man-pages/man5/proc.5.html
- For the CPU%: calculate what percentage of the CPU time the process occupied, using `getrusage()`: http://man7.org/linux/man-pages/man2/getrusage.2.html
- Implemented similarly to sigar: https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/third-party/sigar/src/sigar.c?commit=4f945812675131ea64cb3d143350b1414f34a351&lines=111-169
- Formula:
- CPU% = `process used time` during the period / `time period` * 100
- `time period` = current query timestamp - last query timestamp
- `process used time` = current `process total time` - last query `process total time`
    - `process total time` = CPU time used in user mode + CPU time used in system mode (obtained from `ru_utime` and `ru_stime`)
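The formula above can be sketched as follows (inputs would come from two `getrusage()` samples, `ru_utime + ru_stime`, plus the sample timestamps; the real implementation is C++):

```rust
/// CPU% = (process time used during the period) / (wall-clock period) * 100.
/// `prev_proc_secs`/`cur_proc_secs` are the process total times
/// (user + system) at the previous and current sample points.
fn cpu_percent(
    prev_proc_secs: f64,
    cur_proc_secs: f64,
    prev_ts_secs: f64,
    cur_ts_secs: f64,
) -> f64 {
    let period = cur_ts_secs - prev_ts_secs;
    let used = cur_proc_secs - prev_proc_secs;
    if period <= 0.0 {
        return 0.0; // no elapsed time yet: avoid division by zero
    }
    used / period * 100.0
}
```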
Remove the `fbzmq::ResourceMonitor` and `sigar`:
- Change and rename the UT
- `ResourceMonitorTest.cpp` -> `SystemMetricsTest.cpp`
- `ResourceMonitor` -> `SystemMetricsTest` in `openr/tests/OpenrSystemTest.cpp`
- Remove `ResourceMonitor` code and dependency for `Watchdog` and `ZmqMonitor`
- Remove `sigar` dependency used in building
Reviewed By: saifhhasan
Differential Revision: D20049944
fbshipit-source-id: 00b90c8558dc5f0fb18cc31a09b9666a47b096fe
Summary:
Previously, `flush()` would skip writing the file if there were only metadata
changes. Fix it by detecting metadata changes.
This can potentially fix an issue that certain blackbox indexes are empty,
lagging and require scanning the whole log again and again. In that case,
the index itself is not changed (the root radix entry is not changed), but
only the metadata tracking how many bytes in Log the index covered
changed.
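A sketch of the dirtiness check being fixed (field names are hypothetical): the write decision must consider the coverage metadata as well, not just the index root.

```rust
/// Hypothetical view of on-disk vs in-memory index state.
struct IndexState {
    root_offset: u64,      // location of the root radix entry
    meta_covered_len: u64, // bytes of the Log covered, per metadata
}

/// Previously only `root_offset` was compared, so metadata-only
/// changes were skipped and the index appeared permanently lagging.
fn needs_write(on_disk: &IndexState, in_memory: &IndexState) -> bool {
    on_disk.root_offset != in_memory.root_offset
        || on_disk.meta_covered_len != in_memory.meta_covered_len
}
```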
Reviewed By: sfilipco
Differential Revision: D20264627
fbshipit-source-id: 7ee48454a92b5786b847d8b1d738cc38183f7a32
Summary:
On filesystems without symlinks, the test fails because ln prints errors.
Fix the test by using `#if symlink`.
Reviewed By: DurhamG
Differential Revision: D20260904
fbshipit-source-id: 1d0ffcc7e95d2718087fb01297369ca276b59013