Commit Graph

539 Commits

Stanislau Hlebik
3874bb6de0 mononoke: support stacks in sync job bundle generator
Summary:
Previously it was possible to sync just a single commit - this diff adds
support for generating bundles with stacks of commits.

It uses DifferenceOfUnionsOfAncestorsNodeStream which is exactly the same
stream we use in Mononoke to serve `hg pull` requests.

Reviewed By: krallin

Differential Revision: D17498873

fbshipit-source-id: d94066ad0f0cbdd44fd958c57739de1e070eec9f
2019-09-24 05:29:42 -07:00
Stanislau Hlebik
d471529aec mononoke: first diff for the sync job that generates bundles (only single commit)
Summary:
This is the very first and very limited implementation of a new mode of the
sync job, in which bundles are generated by the sync job instead of reusing
the bundles pushed by the client.

At the moment it syncs only a single commit; more functionality has to be added
later.

A few notes:
1) We need to manually generate the replycaps part. Mononoke essentially ignores it
(with a few small exceptions), but Mercurial actually pays attention to it and
generates a different response.
At the moment reply capabilities are hard-coded in the sync job, which might be
problematic if the Mercurial client changes its replycaps but the sync job hasn't
been updated.
There are a few protections against this: we can make replycaps dynamic (for
example, specify them in the tw spec), and we can also rely on integration tests
to catch regressions.

2) create_filenode_entries_stream has complicated logic for renames - please
have a look at it to double-check my understanding!

Reviewed By: krallin

Differential Revision: D17477983

fbshipit-source-id: fdb4584e768bdc5637868468a035c81f1584f7fe
2019-09-24 05:29:42 -07:00
Kostia Balytskyi
7afc953929 mononoke: allow tests to work with multiple mononoke repos
Reviewed By: farnz

Differential Revision: D17499277

fbshipit-source-id: 99c3e624cea855a6984b0a3c1d991d16e13c64a3
2019-09-23 07:16:15 -07:00
Kostia Balytskyi
d7640811c6 mononoke: make all binaries in library.sh use --repo-id
Reviewed By: StanislavGlebik

Differential Revision: D17500362

fbshipit-source-id: c7e167e99313547e98a54256a28f33a184d0702d
2019-09-20 08:26:22 -07:00
Kostia Balytskyi
d1bb74ea06 mononoke: make megarepotool capable of pushing to bookmarks
Reviewed By: krallin

Differential Revision: D17448128

fbshipit-source-id: ca5bd6f90ce370806ab23120d3a34567bedfae47
2019-09-19 03:33:54 -07:00
Kostia Balytskyi
13cde788a4 mononoke: refactor megarepotool
Reviewed By: farnz

Differential Revision: D17448129

fbshipit-source-id: ec8231f6174c3db811668bac6a22007d6ad567ff
2019-09-19 03:33:53 -07:00
Kostia Balytskyi
126fe17a2f megarepotool: consume bookmarks as source arguments for move
Reviewed By: farnz

Differential Revision: D17448127

fbshipit-source-id: 7ff57fd3fff510bd293b818bfa027e0e37f7abf2
2019-09-19 03:33:53 -07:00
Kostia Balytskyi
e972a82545 mononoke: migrate megarepotool move subcommand to core mononoke movers
Summary:
Rather than using hardcoded path maps and ad-hoc movers, let's use the
logic that creates movers from the config.

Reviewed By: farnz

Differential Revision: D17424678

fbshipit-source-id: 64a0a0b1c7332661408444a6d81f5931ed680c3c
2019-09-19 03:33:53 -07:00
Simon Farnsworth
e20c051ac5 Forbid forced pushrebase where the commit contains mutation markers
Summary: Mercurial and Mononoke disagree on the meaning of a forced pushrebase - Mercurial does a partial rewrite, Mononoke does not. Handle one of the differences by rejecting a commit that Mercurial would rewrite. The user can get identical behaviour by rewriting their commit, anyway.

Reviewed By: StanislavGlebik

Differential Revision: D17420573

fbshipit-source-id: 8332c8f86e56bbcfb00e0bf126b78e11b63d83c3
2019-09-17 09:09:20 -07:00
David Tolnay
713973975d Replace *fbinit::FACEBOOK with #[fbinit::main]
Summary:
This diff moves initFacebook calls that used to happen just before FFI calls to instead happen at the beginning of main.

The basic assumption of initFacebook is that it happens at the beginning of main before there are additional threads. It must be allowed to modify process-global state like env vars or gflags without the possibility of a data race from other code concurrently reading those things. As such, the previous approach of calling initFacebook through `*fbinit::FACEBOOK` near FFI calls was prone to race conditions.

The new approach is based on attribute macros added in D17245802.
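
For reference, a minimal sketch of the attribute-macro entry point (just the basic shape; the exact signature options are defined by the macro):

    #[fbinit::main]
    fn main(fb: fbinit::FacebookInit) {
        // By the time main runs, initFacebook has already been called on the
        // main thread; `fb` is the token proving that, and is passed down to
        // any code that needs to call into C++.
        let _ = fb;
    }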

 ---

The primary remaining situations that still require `*fbinit::FACEBOOK` are when we don't directly control the function arguments surrounding the call to C++, such as in lazy_static:

    lazy_static! {
        static ref S: Ty = {
            let _ = *fbinit::FACEBOOK;
            /* call C++ */
        };
    }

and quickcheck:

    quickcheck! {
        fn f(/* args that impl Arbitrary */) {
            let _ = *fbinit::FACEBOOK;
            /* call C++ */
        }
    }

I will revisit these in a separate diff. They are a small fraction of total uses of fbinit.

Reviewed By: Imxset21

Differential Revision: D17328504

fbshipit-source-id: f80edb763e7f42b3216552dd32f1ea0e6cc8fd12
2019-09-13 20:17:29 -07:00
Thomas Orozco
53bf2886cc mononoke: connect stdlog and slog
Summary:
This wires up the stdlog crate with our slog output. The upshot is that we can
now run binaries with `RUST_LOG` set and expect it to work.

This is nice because many crates use stdlog (e.g. Tokio, Hyper), so this is a
convenient way to get access to their logging. For example, if you run with
`RUST_LOG=gotham=info,hyper=debug`, then you get debug logs from Hyper and info
logs from Gotham.

The way this works is by registering a stdlog logger that uses env_logger's
filter (the crate that "invented" `RUST_LOG`) to filter logs, and routes them to
slog if they pass. Note that the slog Logger used there doesn't do any
filtering, since we already do it before sending logs there.

One thing to keep in mind is that we should only register the stdlog global
logger once. I've renamed `get_logger` to `init_logging` to make this clearer.
This behavior is similar to what we do with `init_cachelib`. I've updated
callsites accordingly.

Note that we explicitly tell the stdlog framework to ignore anything that we
won't consider for logging. If you don't set `RUST_LOG`, then the default
logging level is `Error`, which means that anything below error that is sent to
stdlog won't even reach our filtering logic (the stdlog macros for logging
check the global level before actually logging), so this is cheap unless
you do set `RUST_LOG`.
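
A minimal sketch of the bridge described above (type and function names are illustrative, and the RUST_LOG parsing via env_logger's filter is elided for brevity):

    use log::{Level, Metadata, Record};
    use slog::Logger;

    // Forwards records from the stdlog facade into a slog Logger.
    struct StdlogBridge {
        logger: Logger,
    }

    impl log::Log for StdlogBridge {
        fn enabled(&self, metadata: &Metadata) -> bool {
            // The real code asks the env_logger filter built from RUST_LOG;
            // accept Info and above here for brevity.
            metadata.level() <= Level::Info
        }

        fn log(&self, record: &Record) {
            if self.enabled(record.metadata()) {
                match record.level() {
                    Level::Error => slog::error!(self.logger, "{}", record.args()),
                    Level::Warn => slog::warn!(self.logger, "{}", record.args()),
                    Level::Info => slog::info!(self.logger, "{}", record.args()),
                    Level::Debug => slog::debug!(self.logger, "{}", record.args()),
                    Level::Trace => slog::trace!(self.logger, "{}", record.args()),
                }
            }
        }

        fn flush(&self) {}
    }

    fn init_stdlog_bridge(root: Logger, max: log::LevelFilter) {
        // Register once; set_boxed_logger errors out if a global logger is
        // already installed, which is why this lives in init_logging.
        let _ = log::set_boxed_logger(Box::new(StdlogBridge { logger: root }));
        // Anything above `max` is skipped cheaply inside the stdlog macros,
        // before it ever reaches the filter above.
        log::set_max_level(max);
    }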

As part of this, I've also updated all our binaries (and therefore, tests) to
use glog for logging. We had been meaning to do this, and it was convenient to
do it here because the other logger factory we were using didn't make it easy
to get a Drain without putting it in a Logger.

Reviewed By: ahornby

Differential Revision: D17314200

fbshipit-source-id: 19b5e8edc3bbe7ba02ccec4f1852dc3587373fff
2019-09-12 04:13:11 -07:00
Thomas Orozco
8b4012de75 mononoke: corrupt packs, not loose files
Summary:
Our tests were dead when D17244837 landed, but that diff removed loose files and
broke our tests because they were trying to corrupt them. Let's fix this by
corrupting packs instead.

Reviewed By: StanislavGlebik

Differential Revision: D17314430

fbshipit-source-id: 708f24b902ba1f94836f48989a32b696d2f6e52a
2019-09-11 05:40:56 -07:00
Kostia Balytskyi
a878f7ed13 mononoke: make blobimport capable of prefixing bookmarks
Summary:
We need this for `megarepo_test` blobimport, which will prefix `fbsource` and
`ovrsource` bookmarks.

Reviewed By: krallin

Differential Revision: D17286275

fbshipit-source-id: 40bb3f97b0a06dcd636f891d9b32c7ef9b55a0fc
2019-09-11 03:19:38 -07:00
Thomas Orozco
15b6f902e3 mononoke/dbbookmarks: finally use >list for allowed bookmark updates
Summary:
Now that I've updated the `queries!` macro to allow a `>list` along with other
arguments in a query, we can use it in the dbbookmarks update query that was
checking for valid `UPDATE`s.

Reviewed By: farnz

Differential Revision: D17288575

fbshipit-source-id: 051e03bd711bf26e43ec79509051ffff68316db7
2019-09-11 01:40:11 -07:00
Kostia Balytskyi
800824a2bd mononoke: add config reading logic for commit sync configs
Summary:
Commit sync will operate based on the following idea:
- there's one "large" and potentially many "small" repos
- there are two possible sync directions: large-to-small and small-to-large
- when syncing a small repo into a large repo, it is allowed to change paths
  of each individual file, but not to drop files
- the large repo prepends a predefined prefix to every bookmark name from the small repo, except for the bookmarks in the `common_pushrebase_bookmarks` list, which can be advanced by any small repo (a rough sketch of the resulting config shape follows)
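
The sketch below is illustrative only - field and type names are hypothetical, not the actual Mononoke config structs:

    use std::collections::HashMap;

    // Hypothetical shapes, for illustration only.
    struct SmallRepoCommitSyncConfig {
        // Prefix prepended to small-repo paths when syncing into the large
        // repo (paths may change, files are never dropped).
        default_path_prefix: String,
        // Prefix prepended to small-repo bookmark names in the large repo.
        bookmark_prefix: String,
    }

    struct CommitSyncConfig {
        large_repo_id: i32,
        // Bookmarks that any small repo may advance without prefixing.
        common_pushrebase_bookmarks: Vec<String>,
        // Per-small-repo settings, keyed by repo id.
        small_repos: HashMap<i32, SmallRepoCommitSyncConfig>,
    }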

Reviewed By: krallin

Differential Revision: D17258451

fbshipit-source-id: 6cdaccd0374250f6bbdcbc9a280da89ccd7dff97
2019-09-10 07:23:15 -07:00
Thomas Orozco
9e458f7acf mononoke/apiserver: remove LFS implementation
Summary: This removes the LFS implementation from the apiserver. We haven't used it, and we weren't going to.

Reviewed By: StanislavGlebik

Differential Revision: D17263036

fbshipit-source-id: cad3e8a7d54abef06ff1bc1cb5e73c55342743b6
2019-09-10 06:39:54 -07:00
Thomas Orozco
11f0cc609b mononoke: update LFS tests to use new LFS server
Summary:
This updates all the existing integration tests that talked to the API Server to talk to the LFS Server instead.

Note that some tests have changed a little bit. That's because unlike the old implementation, the new one doesn't ask clients to re-upload the things it already has.

Reviewed By: HarveyHunt

Differential Revision: D17263038

fbshipit-source-id: b3ef9af6f416458354d49b0ab9e99e588f8edd26
2019-09-10 06:39:53 -07:00
Thomas Orozco
cff0b359b3 mononoke/lfs_server: run without upstream + integration tests
Summary:
This adds support in the LFS Server for running without an upstream. This is
useful for running tests, because it lets us chain LFS Servers and have one acting
as a proxy and one acting as a backend.

This also highlighted a bug in my implementation (we were expecting a download
action from upstream if the data didn't need to be uploaded there), which I
fixed here.

For the time being, we won't use this in production.

Reviewed By: HarveyHunt

Differential Revision: D17263039

fbshipit-source-id: 7cba550054e5f052a4b8953ebe0195907919aade
2019-09-10 06:39:53 -07:00
Thomas Orozco
73f713ca70 mononoke: remove manifold / MySQL flags + safe_writes
Summary: None of these do anything.

Reviewed By: StanislavGlebik

Differential Revision: D16963565

fbshipit-source-id: bb0dd344daf151dd1d5c45392c43b014d2b17a07
2019-09-06 07:17:45 -07:00
Kostia Balytskyi
1ece1d8688 mononoke: add a tool for megarepo manipulations
Reviewed By: StanislavGlebik

Differential Revision: D17181600

fbshipit-source-id: 87565ac13f8bca36a65a09c0a6169c3dcf692ff9
2019-09-06 05:28:18 -07:00
Mark Thomas
3a124c9d82 debugmutation: improve format and render successors
Summary:
Update the debugmutation format to collapse long chains.  Add
`hg debugmutation -s` to follow the successor relationship (printing what the
commit became) instead of the predecessor relationship.

Reviewed By: mitrandir77

Differential Revision: D17156463

fbshipit-source-id: 44a68692e3bdea6a2fe2ef9f53b533199136eab1
2019-09-06 00:42:03 -07:00
Daniel Grzegorzewski
94f41ba5aa Added option to use reponame instead of repo-id
Summary: Currently in the mononoke_admin tool we have to use repo-id to identify a repository. Sometimes that can be inconvenient. Changed it so we can use either repo-id or reponame.
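
A hypothetical sketch of the argument wiring (clap 2.x style; names are illustrative rather than the real mononoke_admin definitions):

    extern crate clap; // clap = "2"
    use clap::{App, Arg, ArgGroup};

    fn main() {
        // Hypothetical wiring: accept either --repo-id or --reponame,
        // exactly one of which must be given.
        let matches = App::new("mononoke_admin")
            .arg(Arg::with_name("repo-id").long("repo-id").takes_value(true))
            .arg(Arg::with_name("reponame").long("reponame").takes_value(true))
            .group(
                ArgGroup::with_name("repo")
                    .args(&["repo-id", "reponame"])
                    .required(true),
            )
            .get_matches();
        if let Some(id) = matches.value_of("repo-id") {
            println!("using repo id {}", id);
        } else if let Some(name) = matches.value_of("reponame") {
            println!("resolving repo name {} to an id", name);
        }
    }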

Reviewed By: StanislavGlebik

Differential Revision: D17202962

fbshipit-source-id: d33ad55f53c839afc70e42c26501ecd4421e32c0
2019-09-05 09:36:27 -07:00
Stanislau Hlebik
f50e19a70e mononoke: compress file history
Summary:
This diff adds handling of "overflows", i.e. making sure that each FastlogBatch
doesn't store more than 60 entries.
The idea is that FastlogBatch has "latest" entries and "previous_batches". A new
entry is prepended to "latest", but if "latest" has more than 10 entries then a
new FastlogBatch is uploaded and prepended to "previous_batches". If
"previous_batches" has more than 5 entries then the oldest entry is removed
from the batch.
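
An illustrative sketch of that overflow rule (type names and the upload step are stand-ins, not the real FastlogBatch code):

    const MAX_LATEST: usize = 10;
    const MAX_PREVIOUS_BATCHES: usize = 5;

    #[derive(Clone)]
    struct FastlogBatch<Entry, BatchId> {
        latest: Vec<Entry>,             // newest first
        previous_batches: Vec<BatchId>, // newest first
    }

    impl<Entry: Clone, BatchId: Clone> FastlogBatch<Entry, BatchId> {
        // Prepend a new entry; when `latest` is full, spill it into its own
        // uploaded batch and keep at most MAX_PREVIOUS_BATCHES batch ids,
        // dropping the oldest. Total stays at <= 10 + 5 * 10 = 60 entries.
        fn prepend(&self, entry: Entry, upload: impl FnOnce(Vec<Entry>) -> BatchId) -> Self {
            let mut new = self.clone();
            if new.latest.len() >= MAX_LATEST {
                let spilled = std::mem::take(&mut new.latest);
                new.previous_batches.insert(0, upload(spilled));
                new.previous_batches.truncate(MAX_PREVIOUS_BATCHES);
            }
            new.latest.insert(0, entry);
            new
        }
    }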

Note that merges are still not handled in this diff; this functionality will be added in the next
diffs.

Reviewed By: krallin

Differential Revision: D17160506

fbshipit-source-id: 956bc5320667c6c5511d2a567740a4f6ebd8ac1b
2019-09-03 07:30:55 -07:00
Stanislau Hlebik
90a4d40551 mononoke: do not generate hgchangeset unnecessarily
Summary:
We were generating hgchangesets in the list_directory_unodes() method; however, this
is not always necessary - if a client sends a bookmark, we can fetch bonsai
changesets and generate unodes directly, bypassing the hg changeset generation stage.
The same goes for the is_ancestor method.

Reviewed By: krallin

Differential Revision: D17146609

fbshipit-source-id: 9613e28f8bacbce5d8de94a6ab88b152d19b0a08
2019-09-02 04:18:16 -07:00
Pavel Aslanov
f43b19c431 remove not needed BlobRepo::*{store|fetch} methods
Summary: `BlobRepo::*{store|fetch}` methods are subsumed by `Storable|Loadable`

Reviewed By: krallin

Differential Revision: D17133306

fbshipit-source-id: 6379aea6335d57f0d90a9669ba5ef0300f82a399
2019-08-30 13:34:29 -07:00
Harvey Hunt
a494c558c4 mononoke: Prevent humans from moving specific bookmarks in fbsource
Summary:
There is a collection of bookmarks that should only be moved by automation. If they're
moved by users, SEVs can happen (S184864).

Configure the restrict_users hook to disallow moves of the following bookmarks:

    fbandroid/stable
    fbcode/stable
    fbcode/warm
    fbobjc/stable
    fbsource/stable

I collected this list by running the following on a hg server:

    [svcscm@hg009.cln2 /data/scm/fbsource]$ hg config | grep automation
    hooks.automationbookmarks=fbandroid/stable\nfbcode/stable\nfbcode/warm\nfbobjc/stable\nfbsource/stable

The usernames came from the disable-nonff.py hook on hg servers.

Reviewed By: suitingtseng, StanislavGlebik

Differential Revision: D16960784

fbshipit-source-id: 9cf63b40492431bf34fb0574738794b0190703c6
2019-08-22 07:10:37 -07:00
Thomas Orozco
ce005af688 mononoke: add lfs_import command
Summary: This adds a command that allows importing a set of LFS blobs into Mononoke by streaming them out of another storage location (e.g. Dewey).

Reviewed By: HarveyHunt

Differential Revision: D16917806

fbshipit-source-id: 4917d56e11a187c89e00c23a32c6e791b351f8ef
2019-08-21 02:33:31 -07:00
Kostia Balytskyi
5757d6196a redaction: rename relevant constants into their desired names
Summary:
We do not want to use censorship/blacklisting, as those names are ambiguous
and redaction has exactly the meaning we need.

Reviewed By: quark-zju

Differential Revision: D16890423

fbshipit-source-id: 1a6dcc61ca3fe77b7a938a37fb1ec9250ae9e15a
2019-08-20 04:01:00 -07:00
Kostia Balytskyi
cff091b4c6 mononoke: rename censorship/blacklisting into a redaction
Summary:
This is a mechanical part of rename, does not change any commit messages in
tests, does not change the scuba table name/config setting. Those are more
complex.

Reviewed By: krallin

Differential Revision: D16890120

fbshipit-source-id: 966c0066f5e959631995a1abcc7123549f7495b6
2019-08-20 04:01:00 -07:00
Harvey Hunt
b778fd2482 mononoke: Add a rechunker binary
Summary:
Create a new binary that can be used to rechunk file content using the filestore.
The binary accepts multiple filenodes, which it will then rechunk using the filestore
config provided to it.

Reviewed By: krallin

Differential Revision: D16802701

fbshipit-source-id: d7c05729f5072ff2925bbc90cdd89fcfed56bba2
2019-08-16 09:08:45 -07:00
Kostia Balytskyi
ee03dfa250 mononoke: introduce blacklist list admin subcommand
Summary:
We may need to inspect the list of blacklisted files for a given commit.
This command allows us to do so.

Reviewed By: krallin

Differential Revision: D16643467

fbshipit-source-id: ad389c4b9262ffa9ee6ea9b792b4866a23bebaa5
2019-08-16 02:32:45 -07:00
Thomas Orozco
895aa3c27a mononoke: getpackv2 LFS support (+ getfiles / getpack / eden_get_data refactor)
Summary:
This updates Mononoke to support LFS metadata when serving data over getpackv2.

However, in doing so, I've also refactored the various ways in which we currently access file data to serve it to clients or to process client uploads (when we need to compute deltas). The motivation to do that is that we've had several issues recently where some protocols knew about some functionality, and others didn't. Notably, redaction and LFS were supported in getfiles, but neither of them were supported in getpack or eden_get_data.

This patch refactors all those callsites away from blobrepo and instead through repo_client/remotefilelog, which provides an internal common method to fetch a filenode and return its metadata and bytes (prepare_blob), and separate protocol specific implementations for getpackv1 (includes metadata + file content -- this is basically the existing fetch_raw_filenode_bytes function), getpackv2 (includes metadata + file contents + getpackv2 metadata), getfiles (includes just file content, and ties file history into its response) and eden_get_data (which uses getpackv1).

Here are a few notable changes here that are worth noting as you review this:

- The getfiles method used to get its filenode from get_maybe_draft_filenode, but all it needed was the copy info. However, the updated method gets its filenode from the envelope (which also has this data). This should be equivalent.
- I haven't been able to remove fetch_raw_filenode_bytes yet because there's a callsite that still uses it and it's not entirely clear to me whether this is used and why. I'll look into it, but for now I left it unchanged.
- I've used the Mercurial implementation of getpack metadata here. This feels like the better approach so we can reuse some of the code, but historically I don't think we've depended on many Mercurial crates. Let me know if there's a reason not to do that.

Finally, there are a couple things to be aware of as you review:

- I removed some more `Arc<BlobRepo>` in places where it made it more difficult to call the new remotefilelog methods.
- I updated the implementation to get copy metadata out of a file envelope to not require copying the metadata into a mercurial::file::File only to immediately discard it.
- I cleaned up an LFS integration test a little bit. There are few functional changes there, but it makes tests a little easier to work with.

Reviewed By: farnz

Differential Revision: D16784413

fbshipit-source-id: 5c045d001472fb338a009044ede1e22ccd34dc55
2019-08-14 08:48:35 -07:00
Kostia Balytskyi
da7be55f9f mononoke: support tombstone approach to redaction in the apiserver
Summary:
This is needed so that we do not break existing clients when they update
to commits with redacted content.

Reviewed By: farnz

Differential Revision: D16752748

fbshipit-source-id: fdd4e2931fd362d2bcb5eb590d56ac71b74a8b03
2019-08-12 08:03:10 -07:00
Kostia Balytskyi
b68d21b5b3 mononoke: support blacklisting magic string in getpack wireproto requests
Summary: We need this to not break clients with `remotefilelog.fetchpacks=True`.

Reviewed By: krallin

Differential Revision: D16735900

fbshipit-source-id: a993610c2e95cfde95a8be1e31eaad825f95dc15
2019-08-12 08:03:10 -07:00
Kostia Balytskyi
533ca41d85 mononoke: add admin command to remove blacklisting
Summary:
This resolves the list of files at a given commit to a list of `ContentId`s
and unblacklists all of those `ContentId`s.

Note 1: This means that these particular versions of files will be
unblacklisted in all commits which reference them.

Note 2: There's no auditing of this admin command invocation; if the
need for one arises, we should add it externally.

Reviewed By: krallin

Differential Revision: D16643436

fbshipit-source-id: 3af488e3e4d748027605587ab1e08c0abd0da109
2019-08-12 04:19:20 -07:00
Kostia Balytskyi
e332a64e32 mononoke: introduce cli structure for blacklisting subcommands
Summary: In the following diffs, I will implement the actual logic.

Reviewed By: krallin

Differential Revision: D16639720

fbshipit-source-id: f42b91476035343a6f00b11e4921e26836b1a574
2019-08-12 04:19:19 -07:00
Thomas Orozco
98f94de92a mononoke/blobimport: lfs retries
Summary: This updates blobimport to retry on failures to fetch through its LFS helper.

Reviewed By: HarveyHunt

Differential Revision: D16667042

fbshipit-source-id: dfaec8fc029030a7d90806018301d22525d094df
2019-08-07 08:33:43 -07:00
Thomas Orozco
97b0763798 mononoke/blobimport: check for reusable LFS blobs
Summary: Updates blobimport to check for existing LFS blobs. This is perfectly safe since at worst it'll fail closed (i.e. blobs will be missing and blobimport will hang).

Reviewed By: HarveyHunt

Differential Revision: D16648560

fbshipit-source-id: ed9da7f2fa69c28451ac058a4e1adc937d111b31
2019-08-07 08:33:42 -07:00
Thomas Orozco
823e7a14ee mononoke/blobimport: log LFS helper context
Summary:
If you give blobimport a bad path for the LFS helper, it just tells you the
file doesn't exist, which is not very useful if you don't know what file we're
talking about. This fixes that.
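
A tiny sketch of the kind of context this adds (using anyhow here purely for illustration; the real code uses Mononoke's own error types):

    use anyhow::{Context, Result};
    use std::fs::File;

    fn open_lfs_helper(path: &str) -> Result<File> {
        // Instead of a bare "No such file or directory", the error now says
        // which path was being treated as the LFS helper.
        File::open(path).with_context(|| format!("while opening LFS helper: {}", path))
    }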

Reviewed By: HarveyHunt

Differential Revision: D16621624

fbshipit-source-id: ff7fb5c8800604f8799c682e5957c888d2b01647
2019-08-07 08:33:41 -07:00
Thomas Orozco
07f3fa1a88 mononoke/integration: un-blackhole apiserver tests
Summary:
Johan fixed retry logic in Mercurial, so those tests can now succeed even if
the blackhole is enabled (though we haven't fully understood why the blackhole
was breaking them in the first place).

Differential Revision: D16646032

fbshipit-source-id: 8b7ff2d8d284e003e49681e737367e9942370fa1
2019-08-05 05:21:14 -07:00
Thomas Orozco
68569e5d0c mononoke/{types,filestore}: use a separate type for File Chunks
Summary:
NOTE: This changes our file storage format. It's fine to do it now since we haven't started enabling chunking yet (see: D16560844).

This updates the Filestore's chunking to store chunks as their own entity in Thrift, as opposed to having them be just FileContents.

The upside of this approach is that we can't have an entity that's both a File and a Chunk, which means:

- We don't have to deal with recursive chunks (since, unlike Files, Chunks can't contain pointers to other chunks).
- We don't have to produce metadata (forward mappings and backmappings) for chunks (the reason we had to produce it was to make sure we wouldn't accidentally produce inconsistent data if the upload for one of our chunks happened to have been tried as a file earlier and failed).

Note that this also updates the return value from the Filestore to `ContentMetadata`. We were using `Chunk` before there because it was sufficient and convenient, but now that `Chunk` no longer contains a `ContentId`, it no longer is convenient, so it's worth changing :)
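
Roughly, the distinction looks like this (illustrative Rust shapes, not the actual Thrift definitions):

    // A file is either inline bytes or a list of pointers to chunks.
    enum FileContents {
        Bytes(Vec<u8>),
        Chunked(Vec<ContentChunkId>),
    }

    // A chunk is a separate entity holding raw bytes only, so it can never
    // point at further chunks and never needs its own alias metadata.
    struct ContentChunkId(String);
    struct ContentChunk(Vec<u8>);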

Reviewed By: HarveyHunt

Differential Revision: D16598906

fbshipit-source-id: f6bec75d921f1dea3a9ea3441f57213f13aeb395
2019-08-02 03:43:16 -07:00
Thomas Orozco
cfa4c8332f mononoke/integration: disable blackhole for apiserver tests
Summary:
The network blackhole is causing the API server to occasionally hang while serving requests, which has broken some LFS tests. This appears to have happened in the last month or so, but unfortunately, I haven't been able to root-cause why it is happening.

From what I can tell, we have an hg client that tries an upload to the API Server, and uploads everything... and then the API server just hangs. If I kill the hg client, then the API server responds with a 400 (so it's not completely stuck), but otherwise it seems like the API server is waiting for something to happen on the client-side, but the client isn't sending that.

As far as I can tell, the API Server isn't actually trying to make outbound requests (strace does report that it has a Scribe client that's trying to connect, but Scuba logging isn't enabled, and this is just trying to connect but not send anything), but something with the blackhole is causing this hg - API server interaction to fail.

In the meantime, this diff disables the blackhole for those tests that definitely don't work when it's enabled ...

Reviewed By: HarveyHunt

Differential Revision: D16599929

fbshipit-source-id: c6d77c5428e206cd41d5466e20405264622158ab
2019-08-01 07:36:02 -07:00
Thomas Orozco
4e30164506 mononoke/blobimport: support LFS filenodes
Summary:
This adds support for LFS filenodes in blobimport. This works by passing a `--lfs-helper` argument, which should be an executable that can provide a LFS blob's contents on stdout when called with the OID and size of a LFS blob. My thinking is to `curl` directly from Dewey when running this in prod.
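
A minimal sketch of how blobimport could drive such a helper (argument order and the exact contract are assumptions here, shown for illustration only):

    use std::io::Read;
    use std::process::{Command, Stdio};

    fn fetch_lfs_blob(helper: &str, oid: &str, size: u64) -> std::io::Result<Vec<u8>> {
        // Run the helper with the blob's OID and size; it writes the blob
        // contents to stdout (e.g. a script that curls Dewey).
        let mut child = Command::new(helper)
            .arg(oid)
            .arg(size.to_string())
            .stdout(Stdio::piped())
            .spawn()?;
        let mut bytes = Vec::with_capacity(size as usize);
        child
            .stdout
            .take()
            .expect("stdout was piped")
            .read_to_end(&mut bytes)?;
        child.wait()?;
        Ok(bytes)
    }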

Note that, as of this change, we can blobimport LFS files, but doing so might be somewhat inefficient, since we'll roundtrip the blobs to our filestore, then generate filenodes. For now, however, I'd like to start with this so we can get a sense of whether this is acceptable performance-wise.

Reviewed By: farnz

Differential Revision: D16494241

fbshipit-source-id: 2ae032feb1530c558edf2cfbe967444a9a7c0d0f
2019-07-31 05:19:41 -07:00
Thomas Orozco
f9360cab9d mononoke/filestore: incorporate in Mononoke
Summary:
NOTE: This isn't 100% complete yet. I have a little more work to do around the aliasverify binary, but I think it'll make sense to rework this a little bit with the Filestore anyway.

This patch incorporates the Filestore throughout Mononoke. At this time, what this means is:

- Blobrepo methods return streams of `FileBytes`.
- Various callsites that need access to `FileBytes` call `concat2` on those streams (a toy sketch of this pattern follows the list).
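
The sketch below is a toy, self-contained stand-in using futures 0.1; plain byte chunks take the place of the stream of `FileBytes` that BlobRepo actually returns:

    extern crate futures; // futures = "0.1"
    use futures::{stream, Future, Stream};

    fn main() {
        // Stand-in for the stream of FileBytes returned by BlobRepo.
        let chunks = stream::iter_ok::<_, ()>(vec![b"hello ".to_vec(), b"world".to_vec()]);
        // Callsites that need the whole file concatenate the stream.
        let whole_file = chunks.concat2().wait().unwrap();
        assert_eq!(whole_file, b"hello world".to_vec());
    }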

This also eliminates the Sha256 aliasing code that we had written for LFS and replaces it with a Filestore-based implementation.

However, note that this does _not_ change how files submitted through `unbundle` are written to blobstores right now. Indeed, those contents are passed into the Filestore through `store_bytes`, which doesn't do chunking. This is intentional since it lets us use LFS uploads as a testbed for chunked storage before turning it on for everything else (also, chunking those requires further refactoring of content uploads, since right now they don't expect the `ContentId` to come back through a Future).

The goal of doing it this way is to make the transition simpler. In other words, this diff doesn't change anything functionally - it just updates the underlying API we use to access files. This is also important to get a smooth release: if we had new servers that started chunking things while old servers tried to read them, things would be bad. Doing it this way ensures that doesn't happen.

This means that streaming is there, but it's not being leveraged just yet. I'm planning to do so in a separate diff, starting with the LFS read and write endpoints in

Reviewed By: farnz

Differential Revision: D16440671

fbshipit-source-id: 02ae23783f38da895ee3052252fa6023b4a51979
2019-07-31 05:19:40 -07:00
Mateusz Kwapich
6c5b692bb3 autoformat all of fbcode/scm with black
Summary: blackpylogo

Reviewed By: krallin

Differential Revision: D16543521

fbshipit-source-id: 75977d00329e1068a92a344c99e0ad0392601c54
2019-07-30 08:19:24 -07:00
Simon Farnsworth
ba39d7e2e6 Commits we are going to pushrebase should be treated as drafts
Summary:
When a commit is pushrebased, we generate a new Bonsai commit to represent the result of the commit, which is the commit we actually make public.

Therefore, the commit that will be rebased should be uploaded as a draft. The rebase result will then provide filenodes, and thus we will generate good linknodes for pushrebased commits.

Reviewed By: krallin

Differential Revision: D16550469

fbshipit-source-id: 1f2f00f8cd9ad0f75441aca0eb8daae62ae299e0
2019-07-30 06:40:21 -07:00
George-Catalin Tintareanu
6976315469 Introduce quoted arguments
Summary: Introduce quoted arguments to prevent unwanted behavior

Reviewed By: ikostia

Differential Revision: D16458255

fbshipit-source-id: 842ba54782b5baf3dea7a22b220a595b11f274af
2019-07-25 03:19:13 -07:00
Stanislau Hlebik
099856d71c mononoke: avoid combinatoric explosion in derived_data framework
Summary:
In very mergy repos we can hit a combinatoric explosion by visiting the same
node over and over again. The derived data framework has the same problem, and this diff
fixes it.

I had a few attempts at implementing it:

**1** Use `bounded_traversal`, but change unfold to filter out parents that were already visited.
That wasn't correct because fold will then be called only with
the "unvisited" parents. For example, in a case like

```
  D
 /  \
C    B
 \ /
  A
```

fold for C or B will be called with empty parents, and that's incorrect.

**2** Use `bounded_traversal`, change unfold to filter out visited parents but
also remember real parents.

That won't work either. The reason is that fold might be called before unfold
for the parents has finished, so in a case like

```
  D
 /  \
C    B
 \ /
  A
  |
 ...
thousands of commits
```

If C reaches A first, then B won't visit any other node, and it will try to
derive data for B. However derived data for A might not be ready yet, so
deriving data for B might fail.

**3** Change bounded_traversal to support DAGs, not just trees.

From the two points above it's clear that bounded_traversal should be called
bounded_tree_traversal, because on any other DAG it might hit a combinatoric
explosion. I looked into changing bounded_traversal to support DAGs, and that
was possible but not easy. Specifically, we need to make sure that all
unfold operations are called after fold operations, stop using integers for
nodes, etc. It might also have a perf hit for the tree case, but it's not clear how
big that is.
While I think supporting DAGs in bounded_traversal makes sense, I don't want to
block the derived data implementation on that. I'll create a separate task for it.

---------------------------------------------------------------------------------

The approach I took in the end was to use a bounded_stream_traversal that doesn't
visit the same node twice. Doing this will find all commits that need to be
regenerated, but it might return them in an arbitrary order. After that we need
to topo_sort the commits (note that I introduced the bug for hg changeset
generation in D16132403, so this diff fixes it as well).

This is not the most optimal implementation because it will generate the nodes
sequentially even if they could be generated in parallel (e.g. if the nodes are
in different branches). I don't think that's a huge concern, so I think it's worth
waiting for the bounded_dag_traversal implementation (see point 3 above).
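
A simplified, synchronous stand-in for that approach (the real code is a stream-based traversal over changesets; every name here is illustrative):

    use std::collections::{HashMap, HashSet, VecDeque};

    type Cs = u64; // stand-in for a changeset id

    // Walk ancestors without revisiting nodes, collect everything underived,
    // then topo-sort so parents are derived before children.
    fn underived_in_order(
        start: Cs,
        parents: &HashMap<Cs, Vec<Cs>>,
        is_derived: impl Fn(Cs) -> bool,
    ) -> Vec<Cs> {
        let mut visited = HashSet::new();
        let mut queue = VecDeque::new();
        queue.push_back(start);
        let mut underived = Vec::new();
        while let Some(cs) = queue.pop_front() {
            if !visited.insert(cs) || is_derived(cs) {
                continue; // each node is expanded at most once
            }
            underived.push(cs);
            for &p in parents.get(&cs).into_iter().flatten() {
                queue.push_back(p);
            }
        }
        // The traversal order is arbitrary, so sort parents-first before
        // actually deriving (Kahn-style, restricted to the collected nodes).
        let in_set: HashSet<Cs> = underived.iter().copied().collect();
        let mut done = HashSet::new();
        let mut out = Vec::new();
        while !underived.is_empty() {
            underived.retain(|&cs| {
                let ready = parents
                    .get(&cs)
                    .into_iter()
                    .flatten()
                    .all(|p| !in_set.contains(p) || done.contains(p));
                if ready {
                    done.insert(cs);
                    out.push(cs);
                }
                !ready
            });
        }
        out
    }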

---------------------------------------------------------------------------------

Finally, there were concerns about memory usage from the additional hashset that
keeps visited nodes. I think these concerns are unfounded for a few reasons:

1) We have to keep the nodes we visited *anyway*, because we need to generate
derived data from parents to children. In fact, bounded_traversal keeps them in
a map as well.
It's true that bounded_traversal can do it a bit more efficiently in cases where
we have two different branches that do not intersect. I'd argue that's a rare
case and happens only on repo merges with two independent, equally sized
branches. But even for that case it's not a huge problem (see below).

2) The hashset just keeps commit ids, which are 32 bytes long. So even if we have 1M
commits to generate, that would take 32Mb + hashset overhead. And cases like
that should never happen in the first place - we do not expect to generate
derived data for 1M commits except for the initial huge repo imports (and
for those cases we can afford a ~30Mb memory hit). If we're in a state where we
need to generate too many commits, we should just return an error to the user,
and we'll add that in later diffs.

Reviewed By: krallin

Differential Revision: D16438342

fbshipit-source-id: 4d82ea6111ac882dd5856319a16dda8392dfae81
2019-07-24 11:55:34 -07:00
George-Catalin Tintareanu
25e53f3dd0 Merged integration tests for censorship config
Summary: `test-censoring-enabled` and `test-censoring-disabled` test whether the config option to disable/enable censorship verification works. This diff merges these two tests, since they test the same thing and the configuration is similar in both cases.

Reviewed By: krallin

Differential Revision: D16441380

fbshipit-source-id: 97b3cfc6969be32d661d4676d6cdc7cfcd4eed04
2019-07-24 09:49:57 -07:00
George-Catalin Tintareanu
bfe906c02a censored magic string response
Summary:
Any get request for a blacklisted file returns a magic string - not an error anymore - which replaces the actual content of the file. The magic string represents a sequence of 256 random characters.
Put still returns an error if there is an attempt to change a blacklisted file, but that's the normal behavior for this type of request.

Reviewed By: ikostia

Differential Revision: D16333820

fbshipit-source-id: 54e0189ebe2a71ea365979950d44dbb3fd1e5dc7
2019-07-24 03:42:27 -07:00