Summary: Add ability to track route to node, so that one could report the node from which failing step started from.
Reviewed By: ikostia
Differential Revision: D20097615
fbshipit-source-id: 4f2c000f54bd212225533e7f3570178020f34a9d
Summary:
In case this starts to cause problems, let's have a way to correlate those
problems with some exported metrics.
Reviewed By: StanislavGlebik
Differential Revision: D20158822
fbshipit-source-id: 6ac9e25861dbedaecdf04fd92bda835ae66535eb
Summary:
## Wider goal
See D20068839
## This diff
This diff actually implements the conditional hydration of `getbundle`
responses, as described in the D20068839.
Note that as well as implementing support for hydrated `getbyndle` responses, this diff also implements support for changegroup v3 and lfs in such responses, which is needed if we are to do this kind of stuff in LFS-enabled repository.
Reviewed By: StanislavGlebik
Differential Revision: D20068838
fbshipit-source-id: fbdd3f8f5fb7cd2cb60473a94094553a1d4b4d2f
Summary:
Extend the session id logging to the validate command by adding ability to set
the progress reporters scuba builder.
Reviewed By: ikostia
Differential Revision: D20074153
fbshipit-source-id: ceaeebdb7eb976080061ad3b76b22d7a0f7bd891
Summary: I canaried with this but I forgot to fold it in -_-
Reviewed By: HarveyHunt
Differential Revision: D20158157
fbshipit-source-id: 4a570bbca421d8c3e1e66605f164f2b8e2a433f6
Summary:
Most binaries don't need hooks. Let's not require them. This might not be very
long lived since Simon is working on removing lua hooks, but this was a trivial
fix.
Reviewed By: johansglock
Differential Revision: D20140026
fbshipit-source-id: cc74b37459f63c5dd550c5779b72aa1d6531202c
Summary:
(this doesn't remove ad-hoc leases, like derived data)
Let's see if this has any impact on performance. We no longer fail Manifold
writes on conflicts, and
Reviewed By: StanislavGlebik
Differential Revision: D20038572
fbshipit-source-id: 4a972ff09ceb65e69a1d22a643a8f2d9b2ab1b17
Summary: The Thrift generated code depends only on futures 0.3, not 0.1. Thus it isn't necessary to depend on renamed:futures-preview and we can depend on futures-preview directly, which is exposed to Rust code as `futures::`.
Reviewed By: jsgf
Differential Revision: D20145921
fbshipit-source-id: 5cae94ec6747a374c2bf05f124ab237c798de005
Summary:
The last uses of futures 0.1 were removed in D18411564 and D18392252.
A later diff will switch thrift from using renamed:futures-preview to plain futures-preview to prepare for eliminating the -preview suffix.
Reviewed By: jsgf
Differential Revision: D20143832
fbshipit-source-id: b7fd79f18368ade59eeba6ed0ac09613000c046b
Summary:
Add a `TelemetrySample.fail()` method to report error information in a sample,
even when it is used in a `with` context that completes without an exception.
Also add a `TestTelemetryLogger` to help check telemetry logging behavior in
unit tests.
Reviewed By: genevievehelsel
Differential Revision: D20136170
fbshipit-source-id: ad94d044c7ae0835e3fe17aaa74eb92dfd41bf8e
Summary: It was a list. Make it possible to use it as a string.
Reviewed By: xavierd
Differential Revision: D20144811
fbshipit-source-id: b280c0344215a4c23ab9c63d89f47adf34fb06f3
Summary: This should help reduce test flakiness.
Reviewed By: xavierd
Differential Revision: D19872952
fbshipit-source-id: d66f6c404534b3f47903b478e3cdfdda5ed46284
Summary:
The state entry of a dirstate tuple is a single character. In python 3
it's a unicode string. To parse it, previously we used 'C' which takes a single
character unicode string and (little did I know) returns an int. We were storing
this in a char, which causes corruption.
Let's switch to reading the string, and just grabbing the first byte.
Reviewed By: xavierd
Differential Revision: D20143094
fbshipit-source-id: d9946c0cefdafe0941f4bdac070659fac27f30e3
Summary: Continue to push `compat()` deeper into subcommands. This enables us to refactor each file one at a time and ultimately remove the old futures from our code base.
Reviewed By: farnz
Differential Revision: D20132126
fbshipit-source-id: cc10dde6eda7ddcbf911dbe8d3ebe1713f8ec2ab
Summary: Makes the code a little nicer to work with.
Reviewed By: HarveyHunt
Differential Revision: D20138720
fbshipit-source-id: 19f228782ab3582739e35fddcb2b0bf952110641
Summary:
Paths are in a different replica, so they can be missing even if copy info is
present. Let's fallback to master in this case.
Differential Revision: D20098902
fbshipit-source-id: 838ab1c70a74420c431a2f442f1504c8edd29a2e
Summary:
Locking by physical shard worked earlier in this stack as indicated in the
benchmarks, but after Ondemand restored their fetching for www, it proved
insufficient in terms of parallelism, and resulted in substantially slower
gettreepacks.
Besides, with the "physical sharding" approach, we found ourselves between a rock and a hard place in terms of what to do with paths:
- We could keep holding the semaphore for a filenode while fetching paths. This is undesirable because it further limits our level our concurrency (because fetching a filenode + paths is going to be at least 2x as slow as fetching a filenode).
- We could fetch them without holding a lease at all. This is even more undesirable, because it means that when we release the semaphore for a given shard, we haven't filled the cache yet. This means that if we have a queue of 2 requests for the same bit of data, we're going to fetch twice (task A acquires the lock, goes to MySQL for the filenode, releases the lock and starts going to paths, at which point task B acquires the lock and goes to MySQL again since the filenode hasn't been filled yet).
To fix this, I had to add a dedicated cache for paths, and put it behind semaphores as well. In the example above, this would ensure task B finds a "partial filenode" in the cache and doesn't go to MySQL (instead, it goes straight up to queuing for access to paths, where it will wait behind task A and also won't hit MySQL).
There are a few problems with this:
- It's a lot of extra complexity (because we need to handle half misses where we have the filenode but not the path).
- It ties together our level of concurrency a second time to that of the underlying number of physical shards, which is kinda meaningless when some of this data can be provided by Memcache to begin with.
This diff fixes both problems.
The root cause of our problem that is that we're tying our level of concurrency to physical
MySQL shards, whereas what we actually want is a tunable level of concurrency
that matches our work load, yet effectively deduplicates queries.
In this diff, I'm updating our exclusive locking to be purely virtual. This
means that we're still not over-fetching, but we are no longer constrained by
the parallelism of the underlying DB (this does mean we might queue up requests
there, but they won't be duplicate requests).
This also results in simpler code, and opens up the way for further
improvements in the future, such as using Memcache lease-get operations to
further deduplicate calls, if we'd like.
As part of that, I've also updated our remote_cache to use the same CacheKey
entity as the local cache, to avoid spending time producing new keys when we
have perfectly good ones available.
Reviewed By: StanislavGlebik
Differential Revision: D20097821
fbshipit-source-id: 03d7be9082982fc1c6ef365d541c1ed8ae3e6e8d
Summary:
This adds a test for our cache fill behavior, which is to fill the remote cache
if we miss in local cache. I hadn't added this later and it's a little easier
to add now that the refactor for FilenodeInfo is through.
Reviewed By: ahornby
Differential Revision: D19905396
fbshipit-source-id: 88b5fd83f5d2213e91efc3c5dfb91dfe4e395136
Summary:
This updates our filenodes implementation to use different types for writing
(`PreparedFilenode`) and reading `(FilenodeInfo`).
The bottom line is that this avoids a bunch of cloning of paths on the read
path, which doesn't need to return the path to the caller, since the caller
already knows it! We can also take it out of Memcache, since we don't need
Memcache to tell us the path for a blob we could only possibly have found by
having the path to begin with.
This does update our filenodes serialization format. I bumped MC_CODEVER
accordingly.
Reviewed By: StanislavGlebik
Differential Revision: D19905400
fbshipit-source-id: 6037802c1773de564cade8e264d36087382ee15a
Summary:
This removes the old sqlfilenodes implementation, since we're now using the new
one. There's also a bit of cruft here and there we can get rid of.
Reviewed By: StanislavGlebik
Differential Revision: D19905395
fbshipit-source-id: 2526b6d65eeb981f5aedda9951b44b389ecec29d
Summary:
The former implementation would eagerly query Memcache when fetching history
(due to how old futures work) for files in getpack, but the new one does not.
This means the new one loses out on a lot of buffering, which the old one used
to do.
This diff emulates the old behavior by eagerly querying filenodes in getpack,
which improves performance on a very big getpack (32K files) by about 3x, and
makes it 30% faster than the old code, instead of > 2x slower.
Note that I'm not certain we really want to do this kind of aggressive
buffering in getpack long term, but for now, I'd like to keep this unchanged.
Reviewed By: StanislavGlebik
Differential Revision: D19905398
fbshipit-source-id: 49f9a2cd505a98123fd1dabb835e8e378d45c930
Summary:
This updates Mononoke to use the new filenodes implementation introduced
earlier in this stack.
See the test plan for detailed performance results supporting why I'm making
this change.
Reviewed By: StanislavGlebik
Differential Revision: D19905394
fbshipit-source-id: 8370fd30c9cfd075c3527b9220e4cf4f604705ae
Summary:
Since we have one connection per shard, it's a good idea to make sure we don't
keep those locked for too long. This diffs adds generous timeouts to protect
against this, as well as ODS reporting to track errors.
Reviewed By: StanislavGlebik
Differential Revision: D19905393
fbshipit-source-id: ee4f4d3e33cf48a9002b016e31d37a401c6578f2
Summary:
This introduces caching of filenodes to Memcache as in the old filenodes
implementation. The code is mostly was ported over from the existing filenodes
implementation, and converted to async / await. However, one key difference is
that the lookups happen once we hold the semaphore to talk to the underlying
MySQL shard.
The reason for this is:
- Reads to Memcache are really fast. They're often under 1ms. If you're going
to miss in Memcache and have to go to SQL, it won't make you much slower.
- Reads to Memcache are kinda expensive CPU-wise. Data in Memcache is
compressed, and we often see a lot of our CPU cycles spent talking to Memache
when we're under load.
- Memcache isn't an infinite resource. If we're reading the exact same
key a hundred times, that's going to hit the same Memcache box. A bit of
deduplication on our end is a nice thing to strive for. Besides, our own
thread pool we use to talk to Memcache is limited in size.
From a performance perspective, this doesn't make things any slower, but
reduces CPU usage when we'd otherwise have a lot of duplicate fetching.
Finally, note that this update also includes support for dirty-tracking in our
local cache. We use this to know if we should fill the remote cache (if we 100%
hit in local cache, we don't fill the remote cache).
Reviewed By: StanislavGlebik
Differential Revision: D19905390
fbshipit-source-id: 363f638bb24cf488c7cd3a8ecea43e93f8391d3f
Summary:
This is the meat of the change I'm trying to make here. This updates
newfilenodes to check their cache before dispatching queries to MySQL once they
acquire the connection.
Since we only get one connection per shard, this ensures that we don't query
several times for the same piece of data.
Note that the caching structure is a little different from the old one, which
cached entire filenode info. Instead, this now caches the exact data we'd get
out of MySQL, since we want to map MySQL queries 1-1 to cache lookups.
With this change, we also now have a local cache for file history queries.
Historically, we hadn't cached those at all, but with this change, we can get a
lot of value of caching them even for small period of time in order to
de-amplify reads to MySQL and Memcache.
However, they are in separate cache pools to make sure they don't evict point
filenodes, which we use for gettreepack (and have a good hit rate, unlike
history blocks, which have a pretty poor hit rate).
Note that having those semaphored connections might feel a little scary, but
it's worth noting that the exact same bottleneck is implicitly present in the
existing filenodes implementation, since we can only have one active query to
any given shard a given time. That said, this approach also gives us a little
more future flexibility, if we'd like, since we could map multiple semaphores to
"sub shards" that map N-to-1 to real, physical shards.
Reviewed By: HarveyHunt
Differential Revision: D19905391
fbshipit-source-id: 02b5efaa44789e6afcccdeb9ee2b4791f7c3c824
Summary:
This introduces a new implementation of filenodes that maintains its own
queuing on top of the queuing enforced by the SQL crate.
Later in this stack, the goal is for this implementation to avoid dispatching
duplicate queries when there is a lot of contention talking to MySQL, which
happens when large changes land and suddenly everyone wants the updated code.
The underlying goal is to avoid dispatching a lot of duplicate queries when
there is contention. Indeed, if there is contention, then the latency between
query and response increases. As a result, without visibility in the queue, the
following can happen:
- Task 1 looks for A in the cache. It misses
- Task 1 dispatches a SQL query
- Task 2 looks for A in the cache. It misses
- Task 2 dispatches a SQL query
- Task 3 looks for A in the cache. It misses
- Task 3 dispatches a SQL query
- ...
- Task 1's SQL query finally executes and fills the cache.
- All other queries execute anyway.
The longer the dispatch queue, the longer it takes to run those queries.
Looking at Mononoke's stats in prod, this happens pretty often:
https://pxl.cl/10xxmo (the spike at 3pm was a 10K-files change in fbsource, for
example).
The goal of this stack is to avoid this effect, by checking the cache only once
we know we're ready to go to SQL.
In this particular diff, what's added is:
- The SQL read and write implementation. This is all implemented using new
futures, but the logic should be largely unchanged from before (i.e. we store
filenodes and their associated copy info in shards by the filenode's path —
not the source path if there is copy info —, and paths in their own shard).
The queries themselves largely unchanged from the existing filenodes, with
only a few tweaks:
- Filenodes and copy info are now selected in one go.
- There are types to distinguish path hashes and paths.
- The structs to support this implementation.
Reviewed By: StanislavGlebik
Differential Revision: D19905397
fbshipit-source-id: bec981e7bfb396d62eb06e5ce249c21555afc64b
Summary:
The API expects a stream of filenodes to insert, but we actually never used
that ability. Instead, every single callsites has a `Vec`, which it converts to
a stream and passes that in.
I'd like to change this for two reasons:
- It's un-necessary
- It makes the code more complex on the Filenodes implementation side, and less
efficient, since we need to `chunk()` there in small chunks, which might not
all be in the same shard. If we get the entire `Vec` at once, we can chunk on a
per-shard basis (this happens later in this stack).
Besides, if we end up having a stream and wanting the old behavior, we can
always call `chunk()` the stream and call `add_filenodes` on each batch (which
is actually nicer because if you have a futures 0.2 stream that isn't static,
you can do this, but you can't turn it into a `BoxStream`!).
Reviewed By: StanislavGlebik
Differential Revision: D19902537
fbshipit-source-id: a4c030c4a51afbb6e9db133b32464009eed197af
Summary:
This new method returns the content of a blob without the copy-from metadata
header.
Reviewed By: DurhamG
Differential Revision: D20102889
fbshipit-source-id: e96f636b7d30460b59707a2cb700d667e616116a
Summary:
Python json produces unicode strings in the parsed results. This breaks
when passed to parts of the code that now assert that byte strings are required
(like the wire protocol). Let's switch phabricator stuff to use Mercurial json,
which produces bytes in Python 2 and unicode in Python 3.
Reviewed By: ikostia
Differential Revision: D20123140
fbshipit-source-id: d1b11426736a0f43ff7e74acf709ab1fd70d5bfe
Summary:
Log a per-run session id to distinguish runs more easily.
This diff adds the session for scrub logging , following one extends this to validate/progress logging.
So that each tail has a separate session logged, setup is delayed until the start of each tail by passing it in as a function.
Differential Revision: D19907398
fbshipit-source-id: 8e5470918112321866c67c9f94e703fd46e6a16b
Summary:
The Bytes 0.5 update left us in a somewhat undesirable position where every
access to our blobstore incurs an extra copy whenever we fetch data out of our
cache (by turning it from Bytes 0.5 into Bytes 0.4) — we also have quite a few
place where we convert in one direction then immediately into the other.
Internally, we can start using Bytes 0.5 now. For example, this is useful when
pulling data out of our blobstore and deserializing as Thrift (or conversely,
when serializing and putting it into our blobstore).
However, when we interface with Tokio (i.e. decoders & encoders), we still have
to use Bytes 0.4. So, when needed, we convert our Bytes 0.5 to 0.4 there.
The tradeoff idea is that we deal with more bytes internally than we end up
sending to clients, so doing the Bytes conversion closer to the point of
sending data to clients means less copies.
We can also start removing those once we migrate to Tokio 0.2 (and newer
versions of Hyper for HTTP services).
Changes that were required:
- You can't extend new bytes (because that implicitly copies). You need to use
BytesMut instead, which I did where that was necessary (I also added calls in
the Filestore to do that efficiently).
- You can't create bytes from a `&'a [u8]`, unless `'a` is `'static`. You need
to use `copy_from_slice` instead.
- `slice_to` and `slice_from` have been replaced by a `slice()` function that
takes ranges.
Reviewed By: StanislavGlebik
Differential Revision: D20121350
fbshipit-source-id: eb31af2051fd8c9d31c69b502e2f6f1ce2190cb1
Summary:
Nearly all of the Mononoke SQL stores are instantiated once per repo but they don't store the `RepositoryId` anywhere so every method takes it as argument. And because providing the repo_id on every call is not ergonomical we tend to add methods to blob_repo that just call the right method with the right repo_id in on of the underlying stores (see `get_bonsai_from_globalrev` on blobrepo for example).
Because my reviewers [pushed back](https://our.intern.facebook.com/intern/diff/D19972871/?transaction_id=196961774880671&dest_fbid=1282141621983439) when I've tried to do the same for bonsai_git_mapping I've decided to make it right by adding the repo_id to the BonsaiGitMapping.
Reviewed By: krallin
Differential Revision: D20029485
fbshipit-source-id: 7585c3bf9cc8fa3cbe59ab1e87938f567c09278a
Summary:
The NameSet is something similar to SpanSet and Mercurial's smartset but speaks
VertexNames instead of Ids. The idea is, NameSet will be part of NameDag APIs,
and potentially replace Mercurial's smartset layer (just smartset the container
types, not the revset language), in a way that revision numbers are completely
hidden behind the scenes.
This diff adds some basic abstraction around iteration-related operations.
Other operations will be added later.
Reviewed By: sfilipco
Differential Revision: D19912109
fbshipit-source-id: 504a26c074282ec51f260535ca63e943124f688e
Summary: EdenFS is planning on throwing an error if a user requests a checkout while a checkout is already in progress. Often, this is already disallowed by a mercurial repository lock, but there are instances where these calls can still get through. We would like to disallow these calls to queue, so we will throw an `EdenError` instead. Without this handling, a full stack trace prints, so this just makes it a bit prettier for the user.
Reviewed By: simpkins
Differential Revision: D20106480
fbshipit-source-id: e33df3d0b7aa42867ee752e4c1f3a47b31ade76b
Summary:
The ssh output order issue is a large contributor to test flakiness.
Example test failures are:
```
--- test-unbundlereplay.t
+++ test-unbundlereplay.t.respondfully.err
@@ -154,9 +154,9 @@
remote: [ReplayVerification] Expected: (master_bookmark, c2e526aacb5100b7c1ddb9b711d2e012e6c
69cda). Actual: (master_bookmark, 893d83f11bf81ce2b895a93d51638d4049d56ce2)
remote: pushkey-abort: prepushkey hook exited with status 1
remote: transaction abort!
+ replay failed: error:pushkey
+ unbundle replay batch item #0 failed
remote: rollback completed
- replay failed: error:pushkey
- unbundle replay batch item #0 failed
[1]
$ cat $TESTTMP/reports.txt
unbundle replay batch item #0 failed
--- test-commitcloud-backup-all.t
+++ test-commitcloud-backup-all.t.err
@@ -59,9 +59,9 @@
remote: pushing 1 commit:
remote: eccc11f58a56 D3
backing up stack rooted at 42952ab62cec
+ backing up stack rooted at 4903fdffd9c6
remote: pushing 1 commit:
remote: 42952ab62cec E1
- backing up stack rooted at 4903fdffd9c6
remote: pushing 1 commit:
remote: 4903fdffd9c6 E2
commitcloud: backed up 8 commits
test-fb-hgext-lfspushrebase-verify-blobs.t
--- test-fb-hgext-treemanifest-pushrebase.t
+++ test-fb-hgext-treemanifest-pushrebase.t.err
@@ -127,9 +127,9 @@
$ hg push --to master -B master --config treemanifest.sendtrees=True
pushing to ssh://user@dummy/master
searching for changes
- remote: baz
remote: prepushrebase.cat hook exited with status 1
abort: push failed on remote
+ remote: baz
[255]
- Disable the hook
```
The order is nondeterministic because the stderr reading thread can read the
content before or after ui.write or ui.write_err in the main thread.
This diff introduces an optional feature in dummyssh that buffers all stderr
output and only write them after the wrapped hg serve process has exited, at
which time the hg client should also have completed its operations and has no
reason to ui.write or ui.write_err anything nondeterministically. Then the
dummyssh wrapper writes out the buffered stderr so the output order becomes
well defined.
Reviewed By: xavierd
Differential Revision: D19872612
fbshipit-source-id: 84710f98a8e6b4a1c283ffecf008585cca12be0a
Summary: This makes the next change easier to see.
Reviewed By: xavierd
Differential Revision: D19872609
fbshipit-source-id: 9263a246258ffd18d8d883da7ced435a91fb5ced
Summary:
## Wider goal
See D20068839
## This diff
Asyncifying only singatures allows us to independently work on function bodies, without touching the callsites later in the diff.
Reviewed By: StanislavGlebik
Differential Revision: D20097804
fbshipit-source-id: f1391a055947c7802f719bc99b9eae71a4ac39cd
Summary:
## Wider goal
See D20068839
## This diff
This file contains a mix of old and new-style futures. It even has futures,
which have items composed of futures. To be able to convert on one of the
levels and not the other, we need to deal with the confusion.
Let's have old things have `Old` in the name.
Reviewed By: StanislavGlebik
Differential Revision: D20097803
fbshipit-source-id: fedb3669ef34a8328ec389a30ff2c512ab363818
Summary:
## Wider goal
We want the flexibility to return hydrated responses for `getbundle` wireproto
requests for draft commits. This means that the responses will contain not
only the commit data (as they do now), but also trees and files.
For context, when an "unhydrated" response is returned for the `getbundle`
request for a draft commit, we expect one of two things to happen later
in the e2e scenario:
- either `hg` client would immediately make another wireproto request
(`gettreepack`, `getpackv1`) within the same client `hg` command execution
- or a subsequent `hg update` call will cause another wireproto request
In any case, another request is needed before the pulled commit can be used.
This request can hit a different server, sometimes it can even be Mercurial
instead of Mononoke. Specifically, it can Mercurial instead of Mononoke if the
`fallback` path markers are configured incorrectly. In that case we have a
problem, as Mercurial is incapable of serving `gettreepack` or `getpackv1` for
infinitepush commits.
One way to deal with this is to always have correct path markers, which is
prone to human mistakes. Another way is to guarantee that Mononoke returns
everything in the original `getbundle` request. We don't want to do this for
public commits, as `pull`s of public commits typically fetch thousands of those
commits and never care about tree or file data for all but one of them. Draft
commits are different however, as they are usually exactly what the client
intends to use, so hydrating those is fine. Still, we want this behavior to
be gated behind a config flag.
## This diff
A lot of the needed code is already implemented in the hg-sync job, bundle
generating variant. So prior to implementing the actual behavior described
above, let's move the relevant bits to `getbundle_response`. Later we can comb
them up a bit (asyncify) and use to implement the needed behavior.
Reviewed By: StanislavGlebik
Differential Revision: D20068839
fbshipit-source-id: 0ab63d57b2d167401b7ee8864fe7760f5f65f8ec
Summary:
This is the moral equivalent of D20115877 in fbcode. See that diff for
motivation.
Reviewed By: StanislavGlebik
Differential Revision: D20118575
fbshipit-source-id: 8f77f572068e611003b1344be3434f2d04ec56ca
Summary:
Previously it was hard to tell whether the process were actually responsible
for generating derived data or it was just waiting for it to be generated.
Let's make this distinction clearer.
Reviewed By: johansglock
Differential Revision: D20138284
fbshipit-source-id: 52ae12679db2f61869f048baf2a603b456710a71
Summary:
Add `__enter__()` and `__exit__()` methods to `TelemetrySample` so it can be
used in `with` statements. It will automatically track the runtime for the
body of the `with` context, and will record this in the `duration` field of
the sample. It will also set the `success` field to True if the context exis
normally and False if it exits due to an exception. On an exception the
`error` field will also be populated with the exception message.
Reviewed By: genevievehelsel
Differential Revision: D20112723
fbshipit-source-id: d55ac3f1b53c23dc001f92a4f8eae431db8954e1
Summary:
Add a TelemetryLogger class that logs directly to scuba, and use that if we
are building in a Facebook environment.
Reviewed By: genevievehelsel
Differential Revision: D20112727
fbshipit-source-id: 284ca45d1902d51b753ff9a90debf3dfa8282f82
Summary:
Add a `TelemetryLogger` class that abstracts the mechanism we use to log
telemetry samples. This makes it possible to plug in alternative
implementations.
This includes 3 initial implementations of this class:
* `ExternalTelemetryLogger` logs samples by calling an external command
* `LocalTelemetryLogger` logs JSON samples to a local file
* `NullTelemetryLogger` simply discards all samples
This also moves some of the helper code for constructing telemetry samples
from the `EdenInstance` class and into `TelemetryLogger`.
Reviewed By: genevievehelsel
Differential Revision: D20112725
fbshipit-source-id: dbe24952a92fe548631fc169f146cc14008a7bb6
Summary:
Update the thrift `getDaemonInfo()` call to also return the fb303 status.
This allows the CLI to make a single thrift call instead of 2 when checking if
the EdenFS daemon is healthy.
Reviewed By: genevievehelsel
Differential Revision: D20130406
fbshipit-source-id: 9d25341e1d5f82fb1a921e1d7b1ebd34bcf19dc8
Summary:
Fix the `check_health()` function to always set a timeout when querying for
EdenFS's health. Originally we used to always set a default timeout of 60
seconds when creating thrift connections to EdenFS, but this was removed in
D5942205. In practice we ideally really want a handful of specific thrift
calls (e.g., 'checkOutRevision()`, `getScmStatusV2()`) to have extremely high
timeouts, but most other calls should have fairly short timeouts.
For now this ensures that we apply a 3 second timeout by default when checking
for EdenFS health. The `edenfsctl status` call did explicitly set a 15 second
timeout, but other commands like `edenfsctl clone` and `edenfsctl restart`
would also check for health and were not applying their own timeout.
Also add thrift timeout for the `initiateShutdown()` call when doing a full
restart in `edenfsctl restart`
Reviewed By: chadaustin
Differential Revision: D20130405
fbshipit-source-id: c59118dbcafc2ed0d29206e33891f1a58da8c05f
Summary:
Right now, all of our manifest parsing and evaluation is in the repo() class, but this is a design mistake. Over a repo's convert lifetime, a single repo will have many different manifests, based on branch, and location in the commit history. What's worse is that the current design makes it hard to build unit tests and new features like include evaluation.
This commit creates a whole new class called repomanifest, that represents a specific manifest (and its included files). It also has unit tests to test the various operations that the manifest performs, such as path and revision mapping. This commit does not modify the existing converter code outside of the class to use this new implementation.
Reviewed By: tchebb
Differential Revision: D19402995
fbshipit-source-id: b97dadcc595c6332f4495460618317194873a780
Summary:
In the past I saw test breakages where the stderr from the remote ssh process
becomes incomplete. It's hard to reproduce by running the tests directly.
But inserting a sleep in the background stderr thread exposes it trivially:
```
# sshpeer.py:class threadedstderr
def run(self):
# type: () -> None
while not self._stop:
buf = self._stderr.readline()
+ import time
+ time.sleep(5)
if len(buf) == 0:
break
```
Example test breakage:
```
--- a/test-commitcloud-sync.t
+++ b/test-commitcloud-sync.t.err
@@ -167,8 +167,7 @@ Make a commit in the first client, and sync it
$ hg cloud sync
commitcloud: synchronizing 'server' with 'user/test/default'
backing up stack rooted at fa5d62c46fd7
remote: pushing 1 commit:
- remote: fa5d62c46fd7 commit1
commitcloud: commits synchronized
finished in * (glob)
....
```
Upon investigation it's caused by 2 factors:
- The connection pool calls pipee.close() before pipeo.close(), to workaround
an issue that I suspect solved by D19794281.
- The new threaded stderr (pipee)'s close() method does not actually closes the
pipe immediately. Instead, it limits the text to read to one more line at
most, which causes those incomplete messages.
This diff made the following changes:
- Remove the `pipee.close` workaround in connectionpool.
- Remove `pipee.close`. Embed it in `pipee.join` to prevent misuses.
- Add detailed comments in sshpeer.py for the subtle behaviors.
Reviewed By: xavierd
Differential Revision: D19872610
fbshipit-source-id: 4b61ef8f9db81c6c347ac4a634e41dec544c05d0
Summary:
This makes `peer.close()` actually close the ssh connection if it's an
sshpeer. This affects the `clone` path to actually clean up the ssh connection
so we don't depend on (fragile) `__del__`.
I traced the code back to peerrepository.close in 2011 [1]. At that time it
seems the codebase depends on `__del__`. Nowadays the codebase calls `close()`
properly so I think it's reasonable to make the change.
[1]: https://www.mercurial-scm.org/repo/hg/rev/d747774ca9da.
Reviewed By: ikostia
Differential Revision: D19911393
fbshipit-source-id: ea640d1cd82ffcb786e22f47da8116c7f50a4690
Summary:
The added function can be used by extensions to run extra logic before the
"clone" function closes the repos or peers.
This is needed to make the next diff work. Otherwise extensions like remotenames will try to write to a closed sshpeer and cause errors.
Reviewed By: DurhamG
Differential Revision: D19911390
fbshipit-source-id: ca1364e808cebb632e051fbbdcfe4bf0dca721bc
Summary:
These comments end up being a source of churn as we roll out D20125635, and anyway are not particularly meaningful after the transformations performed by autocargo. For example:
```
bytes = { version = "0.4", features = ["serde"] } # todo: remove
```
^ This doesn't mean the generated Cargo.toml intends to drop its bytes dependency altogether, but just that will be migrated to a different version that is present in the third-party/rust/Cargo.toml but not visible in the generated Cargo.toml.
Reviewed By: jsgf
Differential Revision: D20128612
fbshipit-source-id: a9e7b29ddc4b26bc47a626dd73bdaa4771ee7b18
Summary:
Since Mononoke's filenodes were migrated to derived data framework
hg_linknode_populated alarm has been firing. The main reason was that there's
now a delay between hg changeset being generated and filenodes being generated.
This diff fixes it by making sure walker won't visit hg changesets without
generated filenodes (note that walker will visit these changesets later after filenodes will be
generated).
Reviewed By: ahornby
Differential Revision: D20067615
fbshipit-source-id: 285e9a3d8c89b85441491c889a8458c86ca0e3a8
Summary:
Update the `print_status()` function to take a `clidispatch::io::IO` object as
a parameter, instead of a simple output object. This will allow us to also
print error messages from this function in a future diff.
Reviewed By: quark-zju
Differential Revision: D19958504
fbshipit-source-id: bf482fdc4420e1350363a730c6a539cd760aef25
Summary: Updates the C code to support unicode filenames and states.
Reviewed By: simpkins
Differential Revision: D19786275
fbshipit-source-id: e7aeb029b792818b1b1a9c5d3028640b56522235
Summary: There is no need to open a transaction otherwise.
Reviewed By: DurhamG
Differential Revision: D20109840
fbshipit-source-id: e47adaaeea2d7565f3629701d8de4a67d4b55182
Summary:
Verifying the changelog is quite slow and we've had more users needing
to run hg recover these days. Let's finally get rid of the verify step.
Reviewed By: simpkins
Differential Revision: D20109706
fbshipit-source-id: a512d9e11716514bce986b0e3a26347fe6afd955
Summary: Most of the fixes related to encoding in `patch.py`
Reviewed By: DurhamG
Differential Revision: D19713378
fbshipit-source-id: 66ccbd0fc7826ab2d4c05173c7e9edb96700d106
Summary:
There is no need to generate expensive file history stream if only one node is requested.
I refactored code that generated stream of history commits, so it'd first yield the nodes and only then prefetch their parents. That will help to solve latency problem for the history request for only a single commit.
I removed BFS queue and added two state variables: ready nodes and already processed:
* The last are the nodes that were return as a part of a history stream on the last iteration and now can be used to construct next BFS layer: prefetch fastlog batches, fill the commit graph, take parents in BFS order to form new bunch of nodes.
* First are used if it's the first iteration - there is no processed nodes yet but there are some that are ready to be returned.
I believe removing the queue I simplified the code and logic a little bit.
Reviewed By: StanislavGlebik
Differential Revision: D19818100
fbshipit-source-id: c30d28c623464ba3552a00e8542552f7655076ef
Summary:
During our tests we noticed that we can send too many blobstore read requests to the
mapping. Let's add exponential backoff to prevent that
Reviewed By: ikostia
Differential Revision: D20116043
fbshipit-source-id: 6fecbda4c36a5065b77ba9df561c6d9c6a969089
Summary:
Memcache doesn't care (because both old and new Bytes to `Into<IOBuf>`), but
Thrift is Bytes 0.5. We have our caching ext layer in the middle, which wants
Bytes 0.4. This means we end up copying things we don't need to copy.
Let's update to fewer copies. I didn't update apiserver, because a) it's going
away, and b) those bytes go into Actix, and Actix isn't upgrading to Bytes 0.5
any time soon! Besides, this doesn't actually need updating besides tests anyway.
Reviewed By: dtolnay
Differential Revision: D20006062
fbshipit-source-id: 42766363a0ff8494f18349bcc822b5238e1ec0cd
Summary:
Add methods to `version.py` to get the version of the current running Eden CLI
code, rather than looking for the current installed RPM version. This means
that we no longer have to execute a separate subprocess that examines the RPM
database. This also makes sure we log the correct version information in
cases where developers are testing local development code even though they
have a different RPM version currently installed.
Reviewed By: genevievehelsel
Differential Revision: D20102259
fbshipit-source-id: ba9eb0c563c7f7c929170b130566946a67f679a5
Summary:
Update `get_installed_eden_rpm_version_parts()` to simplify the return type
from `Tuple[Optional[str], Optional[str]]` to `Optional[Tuple[str, str]]`
This also improves the output of `get_installed_eden_rpm_version()` when the
RPM is not installed so that it returns `<Not Installed>` rather than
`<Not Installed>-` with a trailing dash.
Additionally this updates the telemetry logging to include the full
version+release string. With our current version number scheme there can be
multiple packages with the same version but different release numbers if we
release multiple packages within a single day.
Reviewed By: genevievehelsel
Differential Revision: D20102263
fbshipit-source-id: 24d2df4cdca6ac576267be66b85422c3e50f1229
Summary:
Move the `get_running_eden_version()` functions from the `version.py` module
into the `EdenInstance` class in `config.py`. This helps eliminate some
circular dependency cycles in the code, so I can start breaking a few modules
out of the main CLI `lib` library.
I also changed the return type of `get_running_version_parts()` from
`Tuple[Optional[str], Optional[str]]` to just `Tuple[str, str]`. A dev build
of EdenFS already returns empty strings (rather than `None`) for the version
and release fields). There shouldn't really be any cases where `None` is
returned here, and even if there were I don't think we would ever care to
distinguish this from the empty string case.
Reviewed By: genevievehelsel
Differential Revision: D20102262
fbshipit-source-id: 564ec5ee820026a0c86c70ad0d7cfd3750ad94f5
Summary: Log when a user runs a normal (full) restart, including success or not. Success is determined by the return code of `start_daemon()` (which calls `subprocess.call()`), similar to the success critera for graceful restart logging
Reviewed By: fanzeyi
Differential Revision: D20098949
fbshipit-source-id: 0c6f4927571f686ed6b678d5c814f76c78322274
Summary: log when a user runs eden doctor, and log how many errors they encounter
Reviewed By: fanzeyi
Differential Revision: D20084617
fbshipit-source-id: 122a062c538931eb906cbfcd515ec1e8093efc38
Summary: This is required for eden doctor cli tests when adding logging to the eden doctor code path. This can just be a stub since we don't consume these scuba log statements during testing
Reviewed By: fanzeyi
Differential Revision: D20087861
fbshipit-source-id: 6805ae8d9c51e33a118cbda76461483962e876f3
Summary: the TypeCheck test cases were yelling at me because of this annotation missing when running locally, so adding it to fix those tests.
Reviewed By: fanzeyi
Differential Revision: D20098619
fbshipit-source-id: 630e7bca2b63033b34d72d1c739184819d3d86a3
Summary: Moving `compat` one level down to the call sites of subcommand functions.
Reviewed By: farnz
Differential Revision: D20085398
fbshipit-source-id: 461e147d2ae6e560b3a75fb92fa6b23f9f54d13e
Summary:
The problem is that the datapack files are not flushed to disk when it is prefetched. By having a pair of brackets around the `HgBackingStore`, it will ensure the `HgImporter` is closed by the time when we verify the prefetch with `hg cat` since it will terminate the `debugedenimporthelper` process in its destructor, which flushes the datapack files.
The real cause of the test failure is still unclear but I believe this is the correct way of doing this test.
Reviewed By: xavierd
Differential Revision: D20090249
fbshipit-source-id: 8e3966936a402c92311919433282027846d065e8
Summary: Windows SDK doesn't define dirent. Defining it here for adding Inodes support on Edenfs on Windows.
Reviewed By: simpkins
Differential Revision: D19956272
fbshipit-source-id: 1bdf9a7563c194fe38008741b09668242ffa64ee
Summary:
Logging on Windows doesn't work when the async is set. We haven't debugged it yet. Removing the async mode flag until we fix that.
Also bumping up the log level to 4. This would help to get more info while we are running in beta.
Reviewed By: simpkins
Differential Revision: D19776609
fbshipit-source-id: ccd6a6ed4d81f4a2edd550c6bb7195ac8b8b4d16
Summary:
During S196197 lease expired and we were rederiving the same derived data over and over again for a big commit.
this diff adds lease renewal that should help with this problem.
Reviewed By: HarveyHunt
Differential Revision: D20093323
fbshipit-source-id: d139abf6659722f47ea40d9b2f279daa03623ff4
Summary: log when a user runs eden rage
Reviewed By: simpkins
Differential Revision: D20084529
fbshipit-source-id: a92c5472554cd541c9a7d340edcf6845c1c9c0c0
Summary:
fetch_root_filenode is called by FilenodesOnlyPublicMapping to figure out if
filenodes were already derived. Previously it first derived hg changeset and
then fetched looked up root manifest in db. However if hg changeset is not
derived then filenodes couldn't possible be derived either and we can return an
answer faster.
This is useful in the next diff where I change walker
Reviewed By: ahornby
Differential Revision: D20068819
fbshipit-source-id: 17f066c437e0b1f7bbeb8f6e247eadc9afe94f90
Summary:
The blobstore_healer has never waited for MyRouter before querying for slave
status, but it ended up implicitly working because creating a blobstore
required a SQL factory, and creating a SQL factory would result in waiting for
MyRouter.
Now that creating a blobstore doesn't require SQL factory unless you're going
to actually use it (which the healer isn't: it doesn't use a multiplexblob, it
uses the underlying blobstores instead), we no longer wait properly for
MyRouter, so if MyRouter isn't there when we boot, we crash.
This fixes that.
Reviewed By: ahornby
Differential Revision: D20094829
fbshipit-source-id: 82b7e8d893a01049d1f434ee8dff36a877a0d2f4
Summary:
Add support for loading by GitSha1 Aliases. This relies on the change to
Alias::GitSha1 earlier in stack.
Reviewed By: ikostia
Differential Revision: D19903577
fbshipit-source-id: 73cdccc04af61fa524c3683851d8af9ae90d31dc
Summary:
D17135557 added a bunch of `pyre-fixme` comments to the EdenFS integration
tests for cases where Pyre cannot detect that some attributes are initialized
by the test case `setUp()` method.
It looks like Pyre's handling of `setUp()` is somewhat incorrect: it looks
like if a class has a `setUp()` method this currently suppresses all
uninitialized attribute errors (even if some attributes really are never
initialized). However, Pyre does not detect `setUp()` methods inherited from
parent classes, and always warns about uninitialized attributes in this case
even they are initialized.
Lets change these comments from `pyre-fixme` to `pyre-ignore` since this
appears to be an issue with Pyre rather than with this code. T62487924 is
open to track adding support for annotating custom constructor methods, which
might help here. I've also posted in Pyre Q&A about incorrect handling of
`setUp()` in derived classes.
Reviewed By: grievejia
Differential Revision: D19963118
fbshipit-source-id: 9fd13fc8665367e0780f871a5a0d9a8fe50cc687
Summary: As I work, it's getting harder and harder to keep my multiple changes from introducing merge conflicts between different branches. We need to break out the repo_source's implementation in to a bunch of different files to make it easier to keep things separate.
Reviewed By: zhonglowu, tchebb
Differential Revision: D20015946
fbshipit-source-id: bf954ac581e5ca9e43c091b6b1b4c539c14471f2
Summary:
Fix the PathRelativizer APIs to accept `Path` and even `str` arguments instead
of just `PathBuf`. The old code required a `PathBuf`, which often forced
callers to make a copy of the path data.
Reviewed By: quark-zju
Differential Revision: D19958505
fbshipit-source-id: 6fa40dd4b75df4e3faf9ad2ae4f0e4e6595669f6
Summary:
This should give us a slightly better idea of what hosts are doing to
troubleshoot duplicate derivation.
Also, let's make the logging a bit less confusing.
Reviewed By: StanislavGlebik
Differential Revision: D20070619
fbshipit-source-id: 91cc264b7043b8fc8c21c007832fba328ef0017d
Summary:
This updates our multiplexed blobstore configuration to carry its own DB
config. The upshot of this change is that we can move the blobstore sync queue
(a fairly unruly table) to its own DB.
Another nice side effect of this is that it cleans up a bunch of other code, by
finally decoupling the blobstore config from the DB config. For examples,
places that need to instantiate a blobstore can now to do even without a DB
config (such as wireproto logging).
Obviously, this cannot land until we update the configs to include this. I'll
do so in Configerator prior to landing the diff.
Reviewed By: HarveyHunt
Differential Revision: D19973905
fbshipit-source-id: 79e4ff92cdb989aab4532decd3fe4fd6c55e2bb2
Summary:
I'd like to refactor our multiplex blob to store its DB using a different
shard. In preparation of doing so, let's:
- Extract parsing DB configs from storage configs
- Tidy up some related places that take a reference when they actually need
ownership (which is sort of wasteful).
Reviewed By: StanislavGlebik
Differential Revision: D19973906
fbshipit-source-id: 82baceb892e9e24e5fd0349ffa5503884c177a7a
Summary:
Most of EdenFS's main logging is done through folly::logging, however a number
of libraries that we use do logging through glog. Previously we set glog's
`--minloglevel` setting to `0`, and we use the default `--v=0` setting.
This enabled glog `VLOG` messages, only for at VLOG level `0` messages.
Now that the Rust backing store code can fetch directly from memcache this now
links in some additional memcache library code that has some `VLOG(0)`
messages that are logged fairly frequently. These aren't useful for us to
have in our logs, so reduce the `minloglevel` to `1` for now, which disables
all `VLOG` messages.
Reviewed By: genevievehelsel
Differential Revision: D20050589
fbshipit-source-id: 167e301d61e46ae3c19975e0c9233eda371495c0