Summary: We no longer have any middleware that needs to capture variables in the "inbound" phase (i.e. prior to handling the request) and carry them to the "outbound" phase (i.e. once the response is ready). Instead, we're passing everything through the State. So, let's get rid of the dynamism we don't need.
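A rough sketch of the before/after idea, using plain std types rather than the real Gotham ones (`State`, `Middleware`, and `RequestIdMiddleware` here are illustrative stand-ins, not the actual API): anything the outbound phase needs is written into the shared State on the way in and read back on the way out, so no captured environment is required.

```rust
use std::collections::HashMap;

// Simplified stand-in for Gotham's State: a typed bag of per-request data.
type State = HashMap<&'static str, String>;

trait Middleware {
    fn inbound(&self, state: &mut State);
    fn outbound(&self, state: &mut State) -> Option<String>;
}

struct RequestIdMiddleware;

impl Middleware for RequestIdMiddleware {
    fn inbound(&self, state: &mut State) {
        // Everything the outbound phase will need is stored in State...
        state.insert("request_id", "req-123".to_string());
    }

    fn outbound(&self, state: &mut State) -> Option<String> {
        // ...and read back out here: no closure capture across phases.
        state.remove("request_id")
    }
}
```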
Reviewed By: HarveyHunt
Differential Revision: D17503373
fbshipit-source-id: 569d180250821aa3707245133a223b1f4efba3b6
Summary:
Middleware executes when a response is ready, but since responses can contain a stream, that might not be the full story if the client is e.g. downloading a big file.
This diff updates our middleware to introduce a post-send context that lets us conditionally dispatch those actions after we've finished sending the response to the client.
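A plausible shape for the hook (types simplified; `PostSendContext` and `StatsMiddleware` are made up for illustration): instead of firing its follow-up actions when the response headers are ready, the middleware returns a callback that the server invokes once the full body, streamed or not, has been sent.

```rust
// Context available only after the response body has been fully sent.
struct PostSendContext {
    bytes_sent: u64,
}

trait Middleware {
    // Called when the response is ready; may return a deferred action that
    // runs after the (possibly streamed) body has gone out to the client.
    fn handle(&self) -> Option<Box<dyn FnOnce(&PostSendContext) -> u64>>;
}

struct StatsMiddleware;

impl Middleware for StatsMiddleware {
    fn handle(&self) -> Option<Box<dyn FnOnce(&PostSendContext) -> u64>> {
        // Defer the "request completed" accounting until the byte count of
        // the streamed body is actually known.
        Some(Box::new(|ctx| ctx.bytes_sent))
    }
}
```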
Reviewed By: HarveyHunt
Differential Revision: D17503374
fbshipit-source-id: 4dc97c0057d6e1705d116cbc1d283fc73de213ef
Summary:
This reworks our middleware to not use the Gotham router for middleware, and
instead creates our own service to wrap the one provided by the Gotham Router.
The upshot of this is that we can then run our own middleware even if the
Gotham Router finds a 404 or 405 (in which case it doesn't normally run any
middleware).
This also simplifies our middleware a little bit, since our middleware doesn't
modify responses at all, whereas Gotham middleware allows that (we can still
run response transformations in Gotham).
As part of this change, I've also cleaned up our LoggingContext (which didn't
really belong in `lfs_server_context`), and moved our router implementation in
its own module.
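A rough sketch of the wrapping arrangement, with plain std types standing in for the real hyper/Gotham service traits (`GothamRouter`, `WrappedService`, and the `middleware_ran` flag are all illustrative): the outer service runs our middleware unconditionally, even when the inner router answers 404/405 without running any of its own middleware.

```rust
struct Response {
    status: u16,
    middleware_ran: bool,
}

trait Service {
    fn call(&self, path: &str) -> Response;
}

struct GothamRouter;

impl Service for GothamRouter {
    fn call(&self, path: &str) -> Response {
        let status = if path == "/known" { 200 } else { 404 };
        // On a 404/405, the router skips its middleware pipeline entirely.
        Response { status, middleware_ran: false }
    }
}

struct WrappedService<S> {
    inner: S,
}

impl<S: Service> Service for WrappedService<S> {
    fn call(&self, path: &str) -> Response {
        // Inbound middleware would run here.
        let mut response = self.inner.call(path);
        // Outbound middleware runs unconditionally, 404s included.
        response.middleware_ran = true;
        response
    }
}
```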
Reviewed By: HarveyHunt
Differential Revision: D17500363
fbshipit-source-id: 9b0d7449f707f158d9f5433e2953d270b3446c8f
Summary:
This adds a few more headers to make it easier for clients to identify which
server they are talking to.
Reviewed By: HarveyHunt
Differential Revision: D17498981
fbshipit-source-id: 758562f0ed631dc6f54eb567200e3bf1af04b078
Summary:
This function finds entries that were "introduced" by this manifest, i.e.
entries that are present in this manifest but not present in any of
`diff_against` manifests.
This function was used in fastlog, and I'm planning to use it in the new sync
job, so move it to ManifestOps crate (and rename along the way).
One small change in behaviour is that we removed a single traced...() call. I
talked with aslpavel, who introduced it, and he's fine with removing it -
unfortunately our traces are not in great shape yet, so this trace call is not
used at all.
Reviewed By: aslpavel
Differential Revision: D17477055
fbshipit-source-id: cba1a55a64299857efe4f4be7a67b6faf31e7019
Summary: It will be used in the next diffs
Reviewed By: krallin
Differential Revision: D17498798
fbshipit-source-id: c71319a21aa586208871555f2055c81afb021b52
Summary: This is convenient to get the content ID for a file at a given path.
Reviewed By: HarveyHunt
Differential Revision: D17498393
fbshipit-source-id: f40706b9289ca77e99cb3d7070b396ee134f79c3
Summary:
tl;dr Got things working again by ignoring the `--config-path` from rls.
Despite my efforts in D17089540, `rustfmt` appeared to have stopped
working again in VS Code. I thought maybe something changed in our move
to 1.38.0-beta.3. I saw something about a `VersionMismatch` in the
console (which was the issue last time), but unfortunately the error
did not include what the expected and observed versions were, so I
built `rustfmt` from source locally and modified it to include this
information (which I should try to upstream).
Unfortunately, our VS Code/rls setup is not great when it comes to
logging errors to a file, so once again I ran:
```
sudo execsnoop.py -n rust
```
and observed the external `rustfmt` command that `rls` ultimately
constructed and executed so I would have a simpler repro case.
Incidentally, after rls had a `rustfmt` error of some sort, it seemed
to end up in a broken state, so debugging the issue outside of rls
was a lot simpler. Anyway, here was an example of such a command I saw
from `execsnoop`:
```
/data/users/mbolin/eden-fbsource/fbcode/common/rust/tools/common/rustfmt_wrapper.sh --unstable-features --skip-children --emit stdout --quiet --config-path /dev/shm/tmp/i0gVWxrcpp
```
As explained in D17089540, `/dev/shm/tmp/i0gVWxrcpp` ends up being this
fully serialized `rustfmt` configuration. I took this command and repro'd
by making a local formatting error and piping the file in:
```
/data/users/mbolin/eden-fbsource/fbcode/common/rust/tools/common/rustfmt_wrapper.sh --unstable-features --skip-children --emit stdout --quiet --config-path /dev/shm/tmp/i0gVWxrcpp < scm/dotslash/src/main.rs | head
```
and got:
```
Warning for license template file "": No such file or directory (os error 2)
1 282 282
// Copyright 2004-present Facebook. All Rights Reserved.
mod backend;
mod config;
#[cfg(not(build = "instrumentation"))]
mod curl;
mod curl_args;
mod decompress;
mod dotslash_cache;
```
What was that garbage at the top about a license template file? I ended up
looking at `/dev/shm/tmp/i0gVWxrcpp` and found this line:
```
license_template_path = ""
```
Apparently the default value for this configuration option is `""`
and `rustfmt` tries to open the path without bothering to check whether
it is the empty string:
4449250539/src/config/license.rs (L215-L221)
Great. (I filed an issue for this at https://github.com/rust-lang/rustfmt/issues/3802.)
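A minimal sketch of the fix the filed issue asks for (`load_license_template` is a hypothetical helper, not rustfmt's actual code): treat an empty `license_template_path` as "no template configured" instead of trying to open `""` and emitting the confusing warning.

```rust
// Hypothetical guard: an empty configured path means "no license template",
// so don't attempt to open it at all.
fn load_license_template(path: &str) -> Option<Result<String, std::io::Error>> {
    if path.is_empty() {
        // The default config value is "": nothing to open.
        return None;
    }
    Some(std::fs::read_to_string(path))
}
```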
It finally dawned on me that between `required_version` and `license_template_path`,
this generated config file from rls was not doing us any favors. Since we are already
using `rustfmt_wrapper.sh` to rewrite some options that we care about, why not rewrite
`--config-path` to use the `.rustfmt.toml` config in fbcode?
So that's what I did and now everything seems to work again.
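The rewrite the wrapper now performs can be illustrated like this (`rewrite_config_path` and the `fbcode/.rustfmt.toml` path are stand-ins; the real logic lives in `rustfmt_wrapper.sh`): drop whichever `--config-path` rls generated and substitute the checked-in config.

```rust
// Replace the rls-generated --config-path argument pair with the repo's own
// rustfmt config (illustrative of what the shell wrapper does).
fn rewrite_config_path(args: &[String], repo_config: &str) -> Vec<String> {
    let mut out = Vec::with_capacity(args.len());
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if arg.as_str() == "--config-path" {
            iter.next(); // drop the serialized temp-file path from rls
            out.push("--config-path".to_string());
            out.push(repo_config.to_string());
        } else {
            out.push(arg.clone());
        }
    }
    out
}
```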
Also, I appear to have figured out why we need a wrapper script in the first place!
rls tries to set the `emit_mode` to `ModifiedLines`, which would generate the output
from rustfmt in the way that rls wants. It looks like someone then made reading from
stdin incompatible with `emit_mode=ModifiedLines`. I filed this as:
https://github.com/rust-lang/rls/issues/1555
Reviewed By: dtolnay
Differential Revision: D17492254
fbshipit-source-id: 3415bdab3c1030d3082ae2b8fab0c2e6b312534a
Summary:
Add a string representing the method to LoggingContext,
which can then be used by the ODS and Scuba middleware.
NOTE: This changes the batch endpoint key from:
mononoke.lfs.request.<repo>.objects
to:
mononoke.lfs.request.<repo>.batch
Reviewed By: krallin
Differential Revision: D17475551
fbshipit-source-id: 8692f165719c9f69bf0d783845ed9d87b8baf86f
Summary:
Rather than using hardcoded path maps and ad-hoc movers, let's use the
logic which creates movers from config.
Reviewed By: farnz
Differential Revision: D17424678
fbshipit-source-id: 64a0a0b1c7332661408444a6d81f5931ed680c3c
Summary:
This diff achieves two things:
- configs are verified in this respect when they are read, not just when they
  are compiled
- we have a way to get config by repoid if we have `RepoConfigs`
Reviewed By: farnz
Differential Revision: D17460214
fbshipit-source-id: cedce280d6f8209c2f4e1a4ff0d53780242bac30
Summary:
I want to use TreepackPartInput in the next diffs, and it's easier to use if
we are passing fullpath here.
Reviewed By: krallin
Differential Revision: D17445971
fbshipit-source-id: 936fb544df3d57b5565f900402fc61dceb422bc6
Summary:
Store a wrapped instance of ScubaSampleBuilder in gotham's State, so that it can be
accessed from outside of the middleware. Further, refactor the Scuba middleware
to use a separate function and replace the and_then() combinator with then() so that all internal
errors are handled by the scuba logger.
Additionally, use the scuba logger to log the number of objects in a batch as well
as the size of uploads.
Reviewed By: krallin
Differential Revision: D17425371
fbshipit-source-id: bb86995c2e561062c1b1951fcea98fa25300103c
Summary: This got amended into the wrong commit - land it so that we've got it
Reviewed By: krallin
Differential Revision: D17426107
fbshipit-source-id: 4bb2fcae1d7ea2db5378f39ec0f22c2abf225d74
Summary:
This updates multiplexblob to avoid waiting for the SQL queue if writes have
succeeded in all the blobstores we are writing to.
It's worth noting that this might mean adding a new blobstore will require a
little more effort, since we no longer guarantee that the blobstore queue will
contain all writes that we made. That said, we weren't really in a position to
100% rely on this anyway (writes can in fact succeed even if the SQL queue
doesn't get written to), so in that sense it doesn't make a huge difference.
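The new completion rule can be summarised in one predicate (heavily simplified; the real multiplexed blobstore tracks per-store futures, and `put_complete` is a made-up name): a put completes as soon as every underlying blobstore write succeeds, without waiting on the SQL queue; otherwise we still need the queue entry plus at least one successful store write.

```rust
// Simplified completion rule for a multiplexed put: all stores acked, or
// the SQL queue entry landed alongside at least one successful store write.
fn put_complete(acked_stores: usize, total_stores: usize, queue_acked: bool) -> bool {
    acked_stores == total_stores || (queue_acked && acked_stores > 0)
}
```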
Reviewed By: farnz
Differential Revision: D17421208
fbshipit-source-id: 0f2ecbf22ba51531a5917baf912183d5309e1b63
Summary:
Allow the --exclude flag to be passed to the hook tailer multiple
times, rather than accepting a single comma separated list.
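Hand-rolled illustration of the flag change (the hook tailer's real argument parsing is not shown; `parse_excludes` is hypothetical): collect every occurrence of `--exclude VALUE` rather than splitting one comma-separated list.

```rust
// Gather every repeated --exclude VALUE pair from the argument list.
fn parse_excludes(args: &[&str]) -> Vec<String> {
    let mut excludes = Vec::new();
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if *arg == "--exclude" {
            if let Some(value) = iter.next() {
                excludes.push(value.to_string());
            }
        }
    }
    excludes
}
```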
Reviewed By: farnz
Differential Revision: D17422578
fbshipit-source-id: ba83e535d2613fd9fb81a5f75c7e640b193b7a9e
Summary: Mercurial and Mononoke disagree on the meaning of a forced pushrebase - Mercurial does a partial rewrite, Mononoke does not. Handle one of the differences by rejecting a commit that Mercurial would rewrite. The user can get identical behaviour by rewriting their commit, anyway.
Reviewed By: StanislavGlebik
Differential Revision: D17420573
fbshipit-source-id: 8332c8f86e56bbcfb00e0bf126b78e11b63d83c3
Summary: To make merges into our final megarepo world Just Work, we want to sustain an interim period where the legacy repo is still present and up to date. Provide a way to do this by having Mononoke internally remap and pushrebase commits so that we can have two repos kept in sync.
Reviewed By: ikostia, StanislavGlebik
Differential Revision: D16918280
fbshipit-source-id: c7a8d1bb1b1972cb5609490b8fd5b759cb5f9cdd
Summary:
The hg-bonsai mapping that we receive from the blob repo does not preserve the order of the given changeset ids.
Because of that, we returned history commits in random order.
This diff fixes it.
I also added special processing for bonsai ids for which no hg changeset id pair was found.
This is because `get_hg_bonsai_mapping` returns a mapping between bonsai and hg changesets that have already been derived. If the mapping doesn't have hg changeset ids for some of the given bonsais, then I need to derive them: here goes `get_hg_from_bonsai_changeset`.
The whole thing is quite confusing; I created a task for that in blobrepo (T54103243) to probably make the mapping getter private and provide a better API function for a proper mapping.
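The ordering part of the fix boils down to re-sorting the returned pairs by the order of the requested ids (ids are illustrative strings here rather than real changeset ids, and `order_like_input` is a made-up name):

```rust
use std::collections::HashMap;

// Re-order arbitrarily-ordered (bonsai, hg) pairs to match the input order.
// Ids with no derived pair are simply skipped here; the diff derives those
// separately via get_hg_from_bonsai_changeset.
fn order_like_input(input: &[&str], mapping: Vec<(String, String)>) -> Vec<(String, String)> {
    let by_bonsai: HashMap<String, (String, String)> = mapping
        .into_iter()
        .map(|pair| (pair.0.clone(), pair))
        .collect();
    input
        .iter()
        .filter_map(|id| by_bonsai.get(*id).cloned())
        .collect()
}
```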
Reviewed By: krallin
Differential Revision: D17397353
fbshipit-source-id: 45a96f820523d3d2d60fe5c77b45bf97f3616ba1
Summary:
`movers` are functions that we use to shift file paths when syncing commits.
These functions should be automatically buildable from repo-sync configs
for both small-to-large and large-to-small sync directions.
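A plausible shape for a mover (the real type lives in the sync-config code; `Mover` and `prefix_mover` here are illustrative): a function from a source path to an optional destination path, where `None` means the path is not synced in that direction.

```rust
// Illustrative mover type: maps a source path to a destination path, or None
// if the path should not be synced in this direction.
type Mover = Box<dyn Fn(&str) -> Option<String>>;

// A small-to-large mover built from a prefix, as a config might specify.
fn prefix_mover(prefix: &'static str) -> Mover {
    Box::new(move |path| Some(format!("{}/{}", prefix, path)))
}
```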
Reviewed By: farnz
Differential Revision: D17395844
fbshipit-source-id: 25ec9b06ba5908d8c125702a712b3cf782ccffca
Summary:
The extra_context field is populated with certain performance counters.
However, if there is nothing to log, we currently log a serialised empty HashMap. In
such a case, don't log anything at all.
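The guard amounts to this (a sketch with a hand-rolled serialisation; `extra_context_log` is a hypothetical helper, not the real logging code): only produce a serialised extra_context when there is at least one counter to report.

```rust
use std::collections::HashMap;

// Serialise extra_context only when there is something to log; previously
// an empty map was logged as "{}".
fn extra_context_log(counters: &HashMap<String, u64>) -> Option<String> {
    if counters.is_empty() {
        return None;
    }
    let mut entries: Vec<String> = counters
        .iter()
        .map(|(k, v)| format!("\"{}\":{}", k, v))
        .collect();
    entries.sort(); // deterministic output for the example
    Some(format!("{{{}}}", entries.join(",")))
}
```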
Reviewed By: krallin
Differential Revision: D17419339
fbshipit-source-id: 9ede283b3ee20c59c09765412fa9c932a12cd913
Summary: Modified the code to calculate a new statistic: the total number of lines in source files.
Reviewed By: krallin
Differential Revision: D17395757
fbshipit-source-id: f19ee888b9836b42e755acee2d4c76e1fef6265e
Summary: Move upload logic for mercurial data out of blobrepo crate
Reviewed By: StanislavGlebik
Differential Revision: D17366158
fbshipit-source-id: 9f1cdf4fbe67552b12fd1ef94f9c7de1be632988
Summary:
Some parts of mononoke don't yet work inside the new tokio runtime, so switch
back to the old runtime. This means we can't use `#[tokio::test]`, and
instead must construct our own runtime during the test.
Reviewed By: StanislavGlebik
Differential Revision: D17398589
fbshipit-source-id: 8acc2af92851fb50b89a8fe4087ff3cd3d5707ef
Summary: Long chains of field accessors are awkward to use. Add convenience methods to let us go straight to the context field we are interested in.
Reviewed By: StanislavGlebik
Differential Revision: D17398588
fbshipit-source-id: 9ad789e93a713ab6cded9c2b88e11552f69ee8f8
Summary:
Collect together all derived data in the `derived_data` directory. This moves
the `unodes` derivation crate.
Reviewed By: StanislavGlebik
Differential Revision: D17314208
fbshipit-source-id: f1b575446192cb9799e443efb1cb863b80ef72d6
Summary:
Implement `derive_fsnode`, which uses `derive_manifest` to derive fsnode data
for a bonsai changeset.
Reviewed By: krallin
Differential Revision: D17286038
fbshipit-source-id: c5b1fa99278f2a4c7290848584ed8e6746c2518c
Summary:
Fsnodes are manifests, and should implement the `Manifest` trait. This will
allow us to derive them using `derive_manifest`.
Fsnodes only exist for directories: there is no fsnode data for the leaves, so the
leaf type and leaf id type are both `(ContentId, FileType)`.
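A simplified picture of that shape (the real `Manifest` trait lives in the manifest crate; `Fsnode`, `ContentId`, and `Entry` here are cut-down stand-ins): because fsnodes only describe directories, their leaves carry just the file's content id and type rather than a separate leaf manifest.

```rust
// Cut-down stand-ins for the real Mononoke types.
struct ContentId(u64);
enum FileType { Regular, Executable, Symlink }

// A manifest entry is either a subtree or a leaf; for fsnodes the leaf
// payload is just (ContentId, FileType).
enum Entry<TreeId, LeafId> {
    Tree(TreeId),
    Leaf(LeafId),
}

struct Fsnode {
    entries: Vec<(String, Entry<u64, (ContentId, FileType)>)>,
}

impl Fsnode {
    fn lookup(&self, name: &str) -> Option<&Entry<u64, (ContentId, FileType)>> {
        self.entries
            .iter()
            .find(|(n, _)| n == name)
            .map(|(_, e)| e)
    }
}
```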
Reviewed By: StanislavGlebik
Differential Revision: D17286037
fbshipit-source-id: 4ffc26f702cd68b938c52471242eac40ac9cec78
Summary:
Introduce the concept of fsnodes. These are summary information for each
directory in the repository, and, unlike unodes, do not include history
information.
Define the types that will be used for fsnodes. This includes both the main
fsnode types, and their thrift serialization counterparts.
Reviewed By: StanislavGlebik
Differential Revision: D17286039
fbshipit-source-id: b6f3ddd94b72dd7e4e19f8cdc88fd75c88cc1e59
Summary:
Add statistics to the LFS server that report useful information, such as
the number of requests to an endpoint or a histogram of file sizes.
Reviewed By: krallin
Differential Revision: D17367739
fbshipit-source-id: bca99c059c61f11e7f78319ebccd22ebb31c4ae0
Summary: This change extends `derive_manifest` so that it passes an additional optional context variable associated with subnodes. This is useful when we want to track some additional information associated with a subtree, without either storing it inside the entry or re-fetching it from the blobstore.
Reviewed By: markbt
Differential Revision: D17397494
fbshipit-source-id: bab87a70cc680fa539c8a26a71032ce776d82127
Summary: `async-unit` no longer has the old problems around deadlock - instead, it runs the tests in a Tokio runtime and threads the failure state (if any) back to the test runner. Use it.
Reviewed By: StanislavGlebik
Differential Revision: D16937389
fbshipit-source-id: 81b87a32b9c8ffdfd4a2033d10809cdcdb39898b
Summary:
Change the multiplexed blobstore logging to log the blobstore key.
Also, fix the get() method to log errors to scuba (which it wasn't doing before!).
Further, refactor the code to remove env parsing for tupperware information and also switch
to using ScubaSampleBuilder. This means that the common server information can
be stored once and then cloned each time we log, rather than iterating through
a vec of server information each time we want to log.
Additionally, using ScubaSampleBuilder means that we don't need to pass in an Option<ScubaClient>,
cleaning up the code a little bit.
Reviewed By: StanislavGlebik
Differential Revision: D17368779
fbshipit-source-id: 0896962cdbd37912fc6f23a5e541e10cea90fa0e
Summary:
This diff moves initFacebook calls that used to happen just before FFI calls to instead happen at the beginning of main.
The basic assumption of initFacebook is that it happens at the beginning of main before there are additional threads. It must be allowed to modify process-global state like env vars or gflags without the possibility of a data race from other code concurrently reading those things. As such, the previous approach of calling initFacebook through `*fbinit::FACEBOOK` near FFI calls was prone to race conditions.
The new approach is based on attribute macros added in D17245802.
---
The primary remaining situations that still require `*fbinit::FACEBOOK` are when we don't directly control the function arguments surrounding the call to C++, such as in lazy_static:
```
lazy_static! {
    static ref S: Ty = {
        let _ = *fbinit::FACEBOOK;
        /* call C++ */
    };
}
```
and quickcheck:
```
quickcheck! {
    fn f(/* args that impl Arbitrary */) {
        let _ = *fbinit::FACEBOOK;
        /* call C++ */
    }
}
```
I will revisit these in a separate diff. They are a small fraction of total uses of fbinit.
Reviewed By: Imxset21
Differential Revision: D17328504
fbshipit-source-id: f80edb763e7f42b3216552dd32f1ea0e6cc8fd12
Summary: `single` subcommand which regenerates specified derived data for provided changeset. This is useful for debugging purposes
Reviewed By: StanislavGlebik
Differential Revision: D17318968
fbshipit-source-id: 3d3a551991b0628a05335addedd7d5b315fd45d2
Summary: I modified the statistics_collector tool so that it now calculates both the number of files and the total file size in the repo.
Reviewed By: krallin
Differential Revision: D17342512
fbshipit-source-id: 94217d8b61c2a7350f1793a2ef33f84d600bbb54
Summary:
Generating a lot of fastlog batches for a single commit turned out to be a
cpu-intensive operation. To make sure more cpu threads are used, let's use the
spawn_future() function, which puts each future in a separate Tokio task that
can in turn be scheduled on a different cpu.
There are some concerns that it might cause problems when we derive data in
production, i.e. the whole host will be unavailable for a few seconds because
it uses all cpus. My hope is that it will affect only very few hosts because of
the memcache leases that we use in the derived data implementation.
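The effect can be modelled with plain threads (the real code uses spawn_future() to put each future on its own Tokio task; `generate_batches_in_parallel` and the `n * 2` workload are stand-ins): independent tasks let the scheduler spread CPU-bound batch generation across cores instead of serialising it on one.

```rust
use std::thread;

// Spawn one worker per unit of work so CPU-bound batches run in parallel;
// `n * 2` stands in for generating a fastlog batch.
fn generate_batches_in_parallel(inputs: Vec<u64>) -> Vec<u64> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|n| thread::spawn(move || n * 2))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```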
Reviewed By: krallin
Differential Revision: D17364581
fbshipit-source-id: 0281de9c42f93d793f6caccbca7e8809056782ec
Summary:
Timing a request is going to be important for multiple logging
middlewares, so let's move it into its own middleware and place it
in the pipeline.
Further, create a new middleware/ directory and move middlewares into there.
Reviewed By: krallin
Differential Revision: D17346276
fbshipit-source-id: f84c6c06d76e95c11aab18c3a24200a67429bebf
Summary:
As the filestore can cheaply calculate file size, update the Scuba logging to
also log that. Further, set the content length header when responding to download requests.
Reviewed By: krallin
Differential Revision: D17319666
fbshipit-source-id: 858372316930c384f19b89e2b69b08faaf656237
Summary: Add lease_type to memcache lease stats so that it's clearer what types of leases are driving lease waits.
Reviewed By: HarveyHunt
Differential Revision: D17344425
fbshipit-source-id: ea1ae44319428bc1705f502ad9fa1b2b0c44bf96
Summary:
It's quite useful for testing to be able to keep cachelib caching, but disable
memcache caching. This diff adds it
Note that in this diff it uses cachelib caching for the blobstore. We don't use any caching for filenodes, changesets, etc.
Reviewed By: HarveyHunt
Differential Revision: D17342171
fbshipit-source-id: 65458170560ea6913b3249a4118404dcc47e507d
Summary:
Previously, rebuilding the skiplist required fetching all the commits from
mysql and then rebuilding the index from scratch. That was quite slow (> 16
mins to finish). Instead, let's try to read the skiplist key to check if it
already has the data, and prepopulate the skiplist with it.
Reviewed By: krallin
Differential Revision: D17343950
fbshipit-source-id: e8a446b94af61dbbd224d853f7dd8dd41510549d
Summary:
This wires up the stdlog crate with our slog output. The upshot is that we can
now run binaries with `RUST_LOG` set and expect it to work.
This is nice because many crates use stdlog (e.g. Tokio, Hyper), so this is
convenient to get access to their logging. For example, if you run with
`RUST_LOG=gotham=info,hyper=debug`, then you get debug logs from Hyper and info
logs from Gotham.
The way this works is by registering a stdlog logger that uses the env_logger's
filter (the one that "invented" `RUST_LOG`) to filter logs, and routes them to
slog if they pass. Note that the slog Logger used there doesn't do any
filtering, since we already do it before sending logs there.
One thing to keep in mind is that we should only register the stdlog global
logger once. I've renamed `get_logger` to `init_logging` to make this clearer.
This behavior is similar to what we do with `init_cachelib`. I've updated
callsites accordingly.
Note that we explicitly tell the stdlog framework to ignore anything that we
won't consider for logging. If you don't set `RUST_LOG`, then the default
logging level is `Error`, which means that anything below error that is sent to
stdlog won't even get to our filtering logic (the stdlog macros for logging
check for the global level before actually logging), so this is cheap unless
you do set `RUST_LOG`.
As part of this, I've also updated all our binaries (and therefore, tests) to
use glog for logging. We had been meaning to do this, and it was convenient to
do it here because the other logger factory we were using didn't make it easy
to get a Drain without putting it in a Logger.
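The routing rule can be reduced to a small predicate (a minimal model; the real code wires the `log`, `env_logger`, and `slog` crates together, and `forward_to_slog` is a made-up name): records are filtered by the `RUST_LOG`-derived level, defaulting to `Error` when it's unset, and anything that passes goes to slog unfiltered.

```rust
// Severity increases left to right in derive order, so Error < Debug.
#[derive(Clone, Copy, PartialEq, PartialOrd)]
enum Level { Error, Warn, Info, Debug }

// A record is forwarded to slog iff it passes the RUST_LOG filter; with
// RUST_LOG unset the default maximum level is Error, so everything below
// Error is rejected cheaply.
fn forward_to_slog(rust_log_level: Option<Level>, record_level: Level) -> bool {
    let max = rust_log_level.unwrap_or(Level::Error);
    record_level <= max
}
```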
Reviewed By: ahornby
Differential Revision: D17314200
fbshipit-source-id: 19b5e8edc3bbe7ba02ccec4f1852dc3587373fff
Summary:
This updates the LFS server to route hg client correlators to Scuba. This will
help in troubleshooting issues should any arise.
Reviewed By: HarveyHunt
Differential Revision: D17319280
fbshipit-source-id: d4323925a425203f53aba184d5854dd674462da6