Commit Graph

1335 Commits

Author SHA1 Message Date
Lukas Piatkowski
12c684afcd mononoke/hooks: make deny_files public
Reviewed By: aslpavel

Differential Revision: D23537799

fbshipit-source-id: 58c9568e30982f682b00faae42bc3a3f3595890f
2020-09-04 12:23:35 -07:00
Thomas Orozco
3ba2c2b429 mononoke/hg_sync: make it work on Mercurial Python 3
Summary:
A few things here:

- The heads must be bytes.
- The arguments to wireproto must be strings (we used to encode / decode them,
  but we shouldn't).
- The bookmark must be a string (otherwise it gets serialized as `"b\"foo\""`
  and then it deserializes to that instead of `foo`).

Reviewed By: StanislavGlebik

Differential Revision: D23499846

fbshipit-source-id: c8a657f24c161080c2d829eb214d17bc1c3d13ef
2020-09-04 11:56:44 -07:00
Thomas Orozco
747b355236 mononoke: make mononoke_hg_sync_job sendunbundlereplaybatch more debuggable
Summary:
Right now we get very little logging out of errors in here, which is making it
difficult to fix it on Py3 (where it currently is broken).

This diff doesn't fix anything, but at the very least, let's make the errors
better so we can make this easier to start debugging.

Reviewed By: ahornby

Differential Revision: D23499369

fbshipit-source-id: 7ee60b3f2a3be13f73b1f72dee062ca80cb8d8d9
2020-09-04 11:56:44 -07:00
Thomas Orozco
c8dd8ae4e3 mononoke: run tests using hg Python 3 as well
Summary:
The motivation for this is to surface potential regressions in hg Python 3 by
testing code paths that are exercised in Mononoke. The primary drivers for this
were the regressions in the LFS extension that broke uploads, and for which we
have test coverage here in Mononoke.

To do this, I extracted the manifest generation (the manifest is the list of
binaries that the tests know about, which is passed to the hg test runner), and
moved it into its own function, then added a new target for the py3 tests.

Unfortunately, a number of tests are broken in Python 3 currently. We should
fix those. It looks like there are some errors in Mercurial when walking a
manifest with non-UTF-8 files, and the other problem is that the hg sync job is
in fact broken: https://fburl.com/testinfra/545af3p8.

Reviewed By: ahornby

Differential Revision: D23499370

fbshipit-source-id: 762764147f3b57b2493d017fb7e9d562a58d67ba
2020-09-04 11:56:44 -07:00
Stanislau Hlebik
7b323a4fd9 mononoke: add log-only mode in redaction
Summary:
Before redacting something it would be good to check that this file is not
accessed by anything. Having log-only mode would help with that.
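
A minimal sketch of what a log-only mode could look like, assuming a per-key mode flag. `RedactionMode`, `RedactedKeys` and `check_access` are hypothetical names for illustration and are not taken from the Mononoke code:

```rust
use std::collections::HashMap;

/// How a redacted key is treated on access (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RedactionMode {
    /// Accesses are logged and refused.
    Enforce,
    /// Accesses are logged but still allowed.
    LogOnly,
}

pub struct RedactedKeys {
    keys: HashMap<String, RedactionMode>,
    /// Keys whose access was logged, in order of access.
    pub access_log: Vec<String>,
}

impl RedactedKeys {
    pub fn new(keys: HashMap<String, RedactionMode>) -> Self {
        RedactedKeys { keys, access_log: Vec::new() }
    }

    /// Returns `true` if the caller may read `key`.
    pub fn check_access(&mut self, key: &str) -> bool {
        match self.keys.get(key).copied() {
            // Not redacted at all: allow without logging.
            None => true,
            Some(mode) => {
                // Every access to a listed key is recorded, whatever the mode.
                self.access_log.push(key.to_string());
                mode == RedactionMode::LogOnly
            }
        }
    }
}
```

Running a key in `LogOnly` for a while and then inspecting `access_log` is one way to confirm nothing reads it before switching to `Enforce`.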

Reviewed By: ikostia

Differential Revision: D23503666

fbshipit-source-id: ae492d4e0e6f2da792d36ee42a73f591e632dfa4
2020-09-04 07:37:15 -07:00
Stanislau Hlebik
0740f99f13 mononoke: allow logging censored scuba accesses to file
Summary:
In the next diff I'm going to add log-only mode to redaction, and it would be
good to have a way of testing it (i.e. testing that it actually logs accesses
to bad keys).

In this diff let's use a config option that allows logging censored scuba
accesses to a file, and let's update the redaction integration test to use it.

Reviewed By: ikostia

Differential Revision: D23537797

fbshipit-source-id: 69af2f05b86bdc0ff6145979f211ddd4f43142d2
2020-09-04 07:37:14 -07:00
Thomas Orozco
f1e4f62e2d mononoke/fsnodes: expose FsnodeFile as the LeafId
Summary:
Fsnodes have a lot of data about files, but right now we can't access it
through a Fsnode lookup or a manifest walk, because the LeafId for a Fsnode is
just the content id and the file type.

This is a bit sad, because it means we e.g. cannot dump a manifest with file
sizes (D23471561 (179e4eb80e)).

Just changing the LeafId is easy, but that brings a new problem with Fsnode
derivation.

Indeed, deriving manifests normally expects us to have the "derive leaf"
function produce a LeafId (so we'd want to produce a `FsnodeFile`), but in
Fsnodes, this currently happens in deriving trees instead.

Unfortunately, we cannot easily just move the code that produces `FsnodeFile`
from the tree derivation to the leaf derivation, that is, do:

```
fn check_fsnode_leaf(
    leaf_info: LeafInfo<FsnodeFile, (ContentId, FileType)>,
) -> impl Future<Item = (Option<FsnodeSummary>, FsnodeFile), Error = Error>
```

Indeed, the performance of Fsnode derivation relies on all the leaves for a
given tree being derived together with the tree and its parents in context.

So, we'd need the ability for deriving a new leaf to return something different
from the actual leaf id. This means we want to return a `(ContentId,
FileType)`, even though our `LeafId` is a `FsnodeFile`.

To do this, this diff introduces a new `IntermediateLeafId` type in the
derivation. This represents the type of the leaf that is passed from deriving a
leaf to deriving a tree. We need to be able to turn a real `LeafId` into it,
because sometimes we don't re-derive leaves.
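
The relationship between the two leaf types can be sketched as follows. This is a simplified illustration with stand-in definitions: `ContentId`, `FileType` and `FsnodeFile` here are toy types (the `size` field is an assumption), not the real Mononoke ones.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FileType {
    Regular,
    Executable,
    Symlink,
}

/// Stand-in for the real content id type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ContentId(pub u64);

/// Stand-in for the full leaf stored in the manifest (the `LeafId`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FsnodeFile {
    pub content_id: ContentId,
    pub file_type: FileType,
    pub size: u64,
}

/// What leaf derivation hands over to tree derivation.
pub type IntermediateLeafId = (ContentId, FileType);

/// A real leaf that was not re-derived can still be turned into the
/// intermediate form by dropping the extra metadata.
impl From<FsnodeFile> for IntermediateLeafId {
    fn from(file: FsnodeFile) -> Self {
        (file.content_id, file.file_type)
    }
}
```

The conversion only goes one way: the tree derivation step is what turns the intermediate form back into a full `FsnodeFile`.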

I think we could also refactor some of the code that passes a context here to
just do this through the `IntermediateLeafId`, but I didn't look into this too
much.

So, this diff does that, and uses it in Mononoke Admin so we can print file
sizes.

Reviewed By: StanislavGlebik

Differential Revision: D23497754

fbshipit-source-id: 2fc480be0b1e4d3d261da1d4d3dcd9c7b8501b9b
2020-09-04 06:30:18 -07:00
Mateusz Kwapich
f7be2eef14 tunable scuba sampling
Summary:
This allows us to sample the most popular method logs (`repo_list_hg_manifest` calls make up 90% of the samples in our scuba table) while still having full logging for other queries and errors.

The sampling can be easily disabled via a tunable. In case we get a lot of errors, we can also start sampling error requests with a simple configerator change.
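
A rough sketch of per-method sampling, assuming a "log 1 in N" rate per method and a separate rate for errors. `SamplingConfig`, its fields, and the use of a request counter instead of a random draw are all assumptions made for illustration:

```rust
use std::collections::HashMap;

/// Hypothetical sampling configuration: log 1 in `rate` requests for the
/// listed methods, and log everything else in full.
pub struct SamplingConfig {
    /// Per-method sampling rate; methods that are absent are always logged.
    pub method_rates: HashMap<String, u64>,
    /// Rate applied to error responses; 1 means "log every error".
    pub error_rate: u64,
}

impl SamplingConfig {
    /// Decide whether the `seq`-th request should be logged.
    pub fn should_log(&self, method: &str, is_error: bool, seq: u64) -> bool {
        let rate = if is_error {
            self.error_rate
        } else {
            *self.method_rates.get(method).unwrap_or(&1)
        };
        // A rate of 0 or 1 disables sampling for this kind of request.
        rate <= 1 || seq % rate == 0
    }
}
```

Setting every rate to 1 corresponds to turning sampling off, and raising `error_rate` corresponds to starting to sample errors as well.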

Reviewed By: krallin

Differential Revision: D23507333

fbshipit-source-id: c7e34467d99410ec3de08cce2db275a55394effd
2020-09-04 06:26:35 -07:00
Viet Hung Nguyen
437a0e905b mononoke/repo_import: add deriving data types for multiple repos
Summary: Previously, we only supported deriving data types for the repo we import into. This diff expands on this and now we can do that for multiple repos (e.g. small repos we backsync commits to from large repo we import to).

Reviewed By: StanislavGlebik

Differential Revision: D23499953

fbshipit-source-id: 223209a6a2739eae93082cae4f04e53e0cba0c58
2020-09-04 05:39:21 -07:00
Stanislau Hlebik
11a45b6b60 mononoke: do not pass tasks to find_files_with_given_content_id_blobstore_keys
Summary:
In the next diff I'm going to add log_only mode for redaction.
And in this diff I make a small refactoring that makes next diff simpler.
find_files_with_given_content_id_blobstore_keys doesn't accept tasks anymore,
just content keys.

Reviewed By: aslpavel

Differential Revision: D23535829

fbshipit-source-id: 1dac37f5ea7038fc779ad51192a290fcc23e6556
2020-09-04 05:22:03 -07:00
Lukas Piatkowski
67a71d1f98 mononoke/hooks: make limit_commitsize and limit_filesize public
Reviewed By: aslpavel

Differential Revision: D23502908

fbshipit-source-id: 8b9070cfaa28af7b808d02548c0fb7c5d344550d
2020-09-04 04:23:05 -07:00
Lukas Piatkowski
462cb96cc2 mononoke/hooks: make no_questionable_filenames public
Reviewed By: aslpavel

Differential Revision: D23478259

fbshipit-source-id: 642948c2685690298a71fbe7177c4bd6a6e43f85
2020-09-04 04:23:05 -07:00
Lukas Piatkowski
eebdc0b896 mononoke/metaconfig: sync thrift changes from configerator for HookConfig
Summary: Use the new fields from RawHookConfig in HookConfig

Reviewed By: StanislavGlebik

Differential Revision: D23499766

fbshipit-source-id: 43e9d2dfdcfb0fa0dd4de6310ea0013db1b69474
2020-09-04 02:02:06 -07:00
Stefan Filip
3f0b08e46f segmented_changelog: add version field to IdMap
Summary:
The version is going to be used to seamlessly upgrade the IdMap. We can
generate the IdMap in a variety of ways. Naturally, algorithms for generating
the IdMap may change, so we want a mechanism for updating the shared IdMap.

A generated IdDag is going to require a specific IdMap version. To be more
precise, the IdDag is going to specify which version of IdMap it has to be
interpreted with.
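
As an illustration of the version pairing, here is a minimal sketch assuming the version is a plain integer tag. `IdMapVersion`, the struct layouts and `check_compatible` are hypothetical and only show the idea of an IdDag naming the IdMap version it must be read with:

```rust
/// Hypothetical version tag for a generated IdMap.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct IdMapVersion(pub u32);

pub struct IdMap {
    pub version: IdMapVersion,
}

pub struct IdDag {
    /// The IdMap version this IdDag has to be interpreted with.
    pub idmap_version: IdMapVersion,
}

/// Refuse to combine an IdDag with an IdMap of a different version.
pub fn check_compatible(dag: &IdDag, map: &IdMap) -> Result<(), String> {
    if dag.idmap_version == map.version {
        Ok(())
    } else {
        Err(format!(
            "IdDag needs IdMap version {}, found {}",
            dag.idmap_version.0, map.version.0
        ))
    }
}
```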

Reviewed By: quark-zju

Differential Revision: D23501158

fbshipit-source-id: 370e6d9f87c433645d2a6b3336b139bea456c1a0
2020-09-03 16:33:20 -07:00
Stefan Filip
58a4821fe3 segmented_changelog: add IdMap trait with SqlIdMap implementation
Summary:
Separate the operational bits of the IdMap from the core SegmentedChangelog
requirements.

I debated whether it makes sense to add repo_id to SqlIdMap. Given the current
architecture I don't see a reason not to do it. On the contrary, separating
the two objects felt convoluted.

Reviewed By: quark-zju

Differential Revision: D23501160

fbshipit-source-id: dab076ab65286d625d2b33476569da99c7b733d9
2020-09-03 16:33:20 -07:00
Stefan Filip
f3c353edbc segmented_changelog: change idmap module from file to directory
Summary:
Planning to add a trait for core idmap functionality (that's just translating
cs_id to vertex and back). The current IdMap will then be an implementation of
that trait.

Reviewed By: quark-zju

Differential Revision: D23501159

fbshipit-source-id: 34e3b26744e4b5465cd108cca362c38070317920
2020-09-03 16:33:20 -07:00
Stanislau Hlebik
4947e07cb7 mononoke: asyncify one function in redaction admin subcommand
Summary:
I'm going to change this function soon, so it's nice to asyncify it to make
next diffs simpler and also remove duplicated logic.

Also remove unnecessary `logger` parameter - we can always get logger from CoreContext

Reviewed By: krallin

Differential Revision: D23501634

fbshipit-source-id: 7ad2fc17167e4107481ceb230e0b7cb3e7f2549a
2020-09-03 12:22:24 -07:00
Mateusz Kwapich
20d096f5d5 add thrift metadata support
Summary: This closely replicates EscapeZero work in D23328638 and will allow us to issue requests to SCS using Thrift Fiddle (https://www.internalfb.com/thrift_fiddle).

Reviewed By: EscapeZero

Differential Revision: D23475864

fbshipit-source-id: fb286e3fcd6ea79704fa2e7e1ed9ab5595ff7b81
2020-09-03 12:18:18 -07:00
Arun Kulshreshtha
858a080502 gotham_ext: make StreamBody automatically delay post-request callbacks
Summary: Now that post-request callbacks are available in `gotham_ext`, we can make `StreamBody` use them directly instead of using an LFS-specific wrapper (previously required to access the LFS server's `RequestContext`). This also means that the EdenAPI server will get this behavior for free.

Reviewed By: krallin

Differential Revision: D23402969

fbshipit-source-id: 56ab710473f13e8983b136664af364af6884bd3f
2020-09-03 11:59:32 -07:00
Arun Kulshreshtha
5556a447d1 edenapi_server: use LogMiddleware
Summary: Add `LogMiddleware` to the EdenAPI server, which will print a log message whenever a request is received or has completed.

Reviewed By: DurhamG

Differential Revision: D23299902

fbshipit-source-id: f44ef1b01692f0e4f9b109917fcee89a84ca4208
2020-09-03 11:59:32 -07:00
Arun Kulshreshtha
96a6a3fcfb edenapi_server: use LoadMiddleware
Summary: Use `LoadMiddleware` to track the number of outstanding requests in the server.

Reviewed By: DurhamG

Differential Revision: D23298415

fbshipit-source-id: bdcdb0f657d8deac593d356c87ac0d8d3f39e322
2020-09-03 11:59:32 -07:00
Arun Kulshreshtha
7144363d2c gotham_ext: move LogMiddleware to gotham_ext
Summary: Now that `LogMiddleware` no longer depends on `RequestContext`, it can be moved into `gotham_ext`.

Reviewed By: DurhamG

Differential Revision: D23298412

fbshipit-source-id: d5288decba98c3dd4605b9a44e41eba0f47fee37
2020-09-03 11:59:31 -07:00
Arun Kulshreshtha
35d292e513 gotham_ext: move LoadMiddleware to gotham_ext
Summary: Now that `LoadMiddleware` no longer depends on `RequestContext`, it can be moved into `gotham_ext`.

Reviewed By: DurhamG

Differential Revision: D23298416

fbshipit-source-id: 5d29da492e39beb5621daf0570d9b3e657cbfc04
2020-09-03 11:59:31 -07:00
Arun Kulshreshtha
82c451fb9f lfs_server: use PostRequestMiddleware
Summary: This diff removes the post-request callback functionality from the LFS server's `RequestContext` and replaces it with the new `PostRequestMiddleware`. The middleware is directly based on `RequestContext`, so the underlying behavior is essentially the same as before.

Reviewed By: krallin

Differential Revision: D23298413

fbshipit-source-id: 1e58a40f6ce6d526456dbd9ae3a8efc85768bf04
2020-09-03 11:59:31 -07:00
Arun Kulshreshtha
3ad7fa8b6f gotham_ext: allow applications to dynamically configure PostRequestMiddleware
Summary: Make `PostRequestMiddleware` generic over a user-provided config struct which can be used to dynamically configure the behavior of post-request callback dispatching. Right now this is only used to support disabling hostname logging, but could be easily extended to cover more uses in the future.

Reviewed By: krallin

Differential Revision: D23495005

fbshipit-source-id: 3d59a8346f449775ec76d03c260d973d04fb90a9
2020-09-03 11:59:31 -07:00
Arun Kulshreshtha
cc0f2e4c40 gotham_ext: add PostRequestMiddleware
Summary: Add new middleware that allows HTTP handlers and other middleware to register callbacks that will be run once the current request completes. This is heavily based on the post-request callback functionality from the LFS server's `RequestContext`. The intention here is to expose this functionality in a manner that's independent of other, application-specific logic.
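
The callback-registration idea can be sketched in isolation as below. This is not the `gotham_ext` API: `RequestInfo`, `PostRequestCallbacks`, `add` and `run` are hypothetical names, and the real middleware works with Gotham's request state rather than a standalone struct.

```rust
/// Hypothetical summary of a finished request, passed to each callback.
pub struct RequestInfo {
    pub status: u16,
    pub duration_ms: u64,
}

/// Collects callbacks while a request is handled and runs them afterwards.
#[derive(Default)]
pub struct PostRequestCallbacks {
    callbacks: Vec<Box<dyn FnOnce(&RequestInfo)>>,
}

impl PostRequestCallbacks {
    /// Register a callback; handlers and other middleware can both call this.
    pub fn add<F>(&mut self, callback: F)
    where
        F: FnOnce(&RequestInfo) + 'static,
    {
        self.callbacks.push(Box::new(callback));
    }

    /// Run every registered callback once, in registration order.
    /// Consumes `self` so callbacks cannot fire twice.
    pub fn run(self, info: &RequestInfo) {
        for callback in self.callbacks {
            callback(info);
        }
    }
}
```

Nothing runs at registration time; the callbacks only fire when `run` is called after the request has completed.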

Reviewed By: krallin

Differential Revision: D23298419

fbshipit-source-id: e4b1534b02c35f685ce544de13e331947e187818
2020-09-03 11:59:31 -07:00
Thomas Orozco
d77cf89ead mononoke/admin: clean up unodes subcommand a bit
Summary:
I pattern matched off of this for the previous diff in this stack, and spotted
a bit of clean up that might make sense here:

- Using `.help()` for a subcommand overrides the whole help text. We meant to
  use `.about()` here. I fixed this in some copy-pasted code as well.
- Printing debug output alongside real output makes it harder to select the
  real output. I fixed this by logging debug output to stderr instead.

Reviewed By: StanislavGlebik

Differential Revision: D23471560

fbshipit-source-id: 7900cfe65613c48abd77faad6d6a45a7aa523b36
2020-09-03 09:32:06 -07:00
Thomas Orozco
179e4eb80e mononoke/admin: add a subcommand for dumping paths
Summary:
This adds a subcommand for dumping all the paths in a repository. This is
helpful when you have a Content ID, limited imagination and time on your hands,
and you'd like to turn those into a file path where that Content ID lives.

This uses fsnodes for the traversal because that's O(# directories) as opposed
to O(# files). I had an earlier implementation that used unodes, but that was
really slow.

Reviewed By: StanislavGlebik

Differential Revision: D23471561

fbshipit-source-id: 948bfd20939adf4de0fb1e4b2852ad4d12182f16
2020-09-03 09:32:06 -07:00
Viet Hung Nguyen
7c34b39ec8 mononoke/repo_import: add backsyncing to rewrite file paths, remove backup file
Summary:
add backsyncing to rewrite file paths:
After setting the variables for large repo (D23294833 (d6895d837d)), we try to import the git commits into large repo and rewrite the file paths.
Following this, repo import tool should back-sync the commits into small_repo.

next step: derive all the data types for both small and large repos. Currently, we only derive it for the large repo.

==============
remove backup file:
The backup file was a last-minute addition when trying to import a repo for the first time.
Removed it, because we shouldn't write to external files. Future plan is to include
better process recoverability across the whole tool and not just rewrite file paths functionality.

Reviewed By: StanislavGlebik

Differential Revision: D23452571

fbshipit-source-id: bda39694fa34788218be795319dbbfd014ba85ff
2020-09-03 06:43:08 -07:00
Stanislau Hlebik
a77d9f243a mononoke: parallelize operations in create_commit scs method
Reviewed By: krallin

Differential Revision: D23496535

fbshipit-source-id: 18f88abb9b85d38a93d2aa99c38edcf8190343c3
2020-09-03 04:12:35 -07:00
Lukas Piatkowski
a4af730541 mononoke/hooks: make no_bad_filenames public
Reviewed By: aslpavel

Differential Revision: D23474524

fbshipit-source-id: 5f7826346500b1acc7450791dd1e7806c4e623d6
2020-09-03 02:40:43 -07:00
Lukas Piatkowski
81d9338100 mononoke/hooks: make few generic hooks public
Summary: More hooks will come in next diffs.

Reviewed By: aslpavel

Differential Revision: D23449755

fbshipit-source-id: 451fdb7a759140f2d6df8f3a18493c700fa2b761
2020-09-03 02:40:43 -07:00
Stanislau Hlebik
29bbc0dc15 mononoke: check if content we are about to redact is not reachable
Summary:
That's one of the sev followups. Before redacting a file content let's check if
it exists in "main-bookmark" (which is by default master), and refuse to redact
if it actually exists.

If this check passes (i.e. the content we are about to redact is not reachable
from master) that doesn't mean that we are 100% safe. E.g. this content can be
in an ancestor of master, or in any other repo, or it can be added in the next
commit.

This check is a best-effort check to prevent shooting ourselves in the foot.

Reviewed By: aslpavel

Differential Revision: D23476278

fbshipit-source-id: 5a4cd10964a65b8503ba9a6391f17319f0ce37d8
2020-09-03 01:30:14 -07:00
Stefan Filip
da4c33c67a tests: add commit-location-to-hash integration test
Summary: Exercise location-to-hash functionality in edenapi.

Reviewed By: kulshrax

Differential Revision: D23456214

fbshipit-source-id: 2ab22eb045517a5927c2de502d8cfc9898daecef
2020-09-02 17:20:43 -07:00
Stefan Filip
932450fb15 handlers: update location-to-hash endpoint with count parameter
Summary:
To reduce the size over the wire on cases where we would be traversing the
changelog on the client, we want to allow the endpoint to return a whole parent
chain with their hashes.

Reviewed By: kulshrax

Differential Revision: D23456216

fbshipit-source-id: d048462fa8415d0466dd8e814144347df7a3452a
2020-09-02 17:20:42 -07:00
Stefan Filip
7122cdded7 types: rename Location to CommitLocation
Summary:
Renaming all the LocationToHash related structures to CommitLocationToHash.
This is done for consistency. I realized the issue when the command for reading
the request from cbor was not what I was expecting it to be. The reason was that
the commit prefix was used inconsistently for LocationToHash.

Reviewed By: kulshrax

Differential Revision: D23456221

fbshipit-source-id: 0181dcaf81368b978902d8ca79c5405838e4b184
2020-09-02 17:20:42 -07:00
Stefan Filip
310b3616a6 blobrepo: instantiate segmented changelog as an attribute
Summary:
Segmented Changelog is a component that has multiple components of its own,
each of which can be configured in different ways. It seems that it already is
more complicated than other components in how it is set up and it will probably
evolve to have more knobs (caching comes to mind).

Right now we have 3 ways of instantiating SegmentedChangelog:
- Disabled, all requests return errors
- ReadOnly, requests to unprocessed commits return errors
- OnDemandUpdate, requests trigger commit processing when required
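
The three modes listed above can be sketched as an enum driving lookup behavior. `Mode`, `LookupError` and `Changelog` are hypothetical names, and commits are reduced to plain integers for the sake of the example:

```rust
use std::collections::HashSet;

/// The three ways of instantiating the component (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Disabled,
    ReadOnly,
    OnDemandUpdate,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LookupError {
    Disabled,
    Unprocessed(u64),
}

pub struct Changelog {
    mode: Mode,
    processed: HashSet<u64>,
}

impl Changelog {
    pub fn new(mode: Mode, processed: &[u64]) -> Self {
        Changelog { mode, processed: processed.iter().copied().collect() }
    }

    /// Serve a request that refers to `commit`.
    pub fn lookup(&mut self, commit: u64) -> Result<(), LookupError> {
        match self.mode {
            // Disabled: all requests return errors.
            Mode::Disabled => Err(LookupError::Disabled),
            // ReadOnly: only already-processed commits can be served.
            Mode::ReadOnly if self.processed.contains(&commit) => Ok(()),
            Mode::ReadOnly => Err(LookupError::Unprocessed(commit)),
            // OnDemandUpdate: process the commit now, then serve it.
            Mode::OnDemandUpdate => {
                self.processed.insert(commit);
                Ok(())
            }
        }
    }
}
```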

Reviewed By: aslpavel

Differential Revision: D23456217

fbshipit-source-id: a6016f05197abbc3722764fa8e9056190a767b36
2020-09-02 17:20:42 -07:00
Stefan Filip
b818a86631 config: add segmented changelog config parsing
Summary:
Parsing is done in the SegmentedChangelogConfig structure which will inform
how to construct the SegmentedChangelog in Mononoke.

Reviewed By: aslpavel

Differential Revision: D23456222

fbshipit-source-id: a7d5d81f4c166909164026e81af57f1c2ea32347
2020-09-02 17:20:42 -07:00
Stefan Filip
e57b1f9265 segmented_changelog: add on-demand updating dag implementation
Summary:
The Segmented Changelog must be built somewhere. One of the simplest deployments
involves the on-demand update of the graph. When a commit that wasn't yet
processed is encountered, we send it for processing along with all of its
ancestors.

At this time not much attention was paid to the distinction of master commit
versus non-master commit. For now the expectation is that only commits from
master will exercise this code path. The current expectation is that clients
will only call location-to-hash using commits from master.
Let me know if there is an easy way to check if a commit is part of master.
Later changes will invest more in handling non-master commits.

Reviewed By: aslpavel

Differential Revision: D23456218

fbshipit-source-id: 28c70f589cdd13d08b83928c1968372b758c81ad
2020-09-02 17:20:42 -07:00
Stefan Filip
d50e09a41d segmented_changelog: add SegmentedChangelogBuilder
Summary:
This builder implements SqlConstruct and SqlConstructFromMetadataDatabaseConfig
to make handling the Sql connection for IdMap consistent with what happens in
Mononoke in general.

Reviewed By: aslpavel

Differential Revision: D23456219

fbshipit-source-id: 6998afbbfaf1e0690a40be6e706aca1a3b47829f
2020-09-02 17:20:42 -07:00
Stefan Filip
66706d77c5 segmented_changelog: add SegmentedChangelog trait
Summary:
The trait provides two methods for location to hash translation. The first
returns a single hash and is existing functionality. The second returns a
list of hashes and represents new functionality. This diff also adds this
functionality to the Dag structure which is currently the only real
implementation for SegmentedChangelog.
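
A toy version of the two-method trait, assuming a purely linear history. The method names, the `Hash` alias and `LinearHistory` are hypothetical; the real trait is async and works on the segmented DAG rather than a flat vector:

```rust
/// Stand-in for a commit hash.
pub type Hash = u32;

pub trait SegmentedChangelog {
    /// Hash of the commit `distance` parents behind `known`.
    fn location_to_hash(&self, known: Hash, distance: usize) -> Option<Hash>;

    /// Same starting point, but returns `count` hashes following the
    /// parent chain, newest first.
    fn location_to_many_hashes(
        &self,
        known: Hash,
        distance: usize,
        count: usize,
    ) -> Option<Vec<Hash>>;
}

/// Toy implementation: a linear history stored oldest-first.
pub struct LinearHistory(pub Vec<Hash>);

impl SegmentedChangelog for LinearHistory {
    fn location_to_hash(&self, known: Hash, distance: usize) -> Option<Hash> {
        self.location_to_many_hashes(known, distance, 1)?.pop()
    }

    fn location_to_many_hashes(
        &self,
        known: Hash,
        distance: usize,
        count: usize,
    ) -> Option<Vec<Hash>> {
        let pos = self.0.iter().position(|h| *h == known)?;
        // Walk `distance` parents back from the known commit.
        let start = pos.checked_sub(distance)?;
        // Fail if the chain is shorter than the requested count.
        if count > start + 1 {
            return None;
        }
        Some((0..count).map(|i| self.0[start - i]).collect())
    }
}
```

Returning a whole chain in one call is what lets a client avoid one round trip per parent.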

Reviewed By: aslpavel

Differential Revision: D23456215

fbshipit-source-id: 0c2ca91672cf23129342c585f98446c0ebbdf7ef
2020-09-02 17:20:41 -07:00
Stefan Filip
10b233f180 blobrepo: move ChangesetFetcher to attributes
Summary:
I am planning to add Segmented Changelog to attributes.

I am writing an integration test for an EdenApi endpoint that depends on
Segmented Changelog and I would like to set it up to update on demand. When a
request comes in for a commit that we haven't parsed for Segmented Changelog we
want to update the structure on demand. This means that we probably need to
fetch commits. This means that we want to pass the ChangesetFetcher to Segmented
Changelog when it is built. Since Segmented Changelog fits well as an attribute
we want the ChangesetFetcher as an attribute.

I wonder how much thought has been given to attributes behaving as a dependency
injector in the `guice` sense.

Reviewed By: aslpavel

Differential Revision: D23428201

fbshipit-source-id: 7003c018ba806fd657dd8f071e0e83d35058b10f
2020-09-02 17:20:41 -07:00
Kostia Balytskyi
6e8cbd31b1 megarepotool: add gradual-merge-progress subcommand
Summary:
This is to be able to automatically report progress: how many merges have been
done already.

Note: this intentionally uses the same logic as regular `gradual-merge`, so that we always report correct numbers.

Reviewed By: StanislavGlebik

Differential Revision: D23478448

fbshipit-source-id: 3deb081ab99ad34dbdac1057682096b8faebca41
2020-09-02 12:18:31 -07:00
Thomas Orozco
b8e197fdb4 mononoke/lfs_server: allow enabling rate limits probabilistically
Summary:
If we exceed a rate limit, we probably don't want to just drop 100% of traffic.
This would create a sawtooth pattern where we allow a bunch of traffic, update
our counters, drop a bunch of traffic, update our counters again, allow a bunch
of traffic, etc.

To fix this, let's make limits probabilistic. This lets us say "beyond X GB/s,
drop Y% of traffic", which is closer to a sane rate limit.
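
The "beyond X, drop Y%" rule can be sketched as below. `Limit` and its fields are hypothetical, and the random draw is passed in by the caller (assumed uniform in `[0, 1)`) so that the decision itself stays deterministic:

```rust
/// Hypothetical probabilistic limit: beyond `threshold`, drop a fraction
/// `drop_probability` of requests instead of all of them.
pub struct Limit {
    pub threshold: f64,
    /// Fraction of traffic to drop once over the threshold, in [0, 1].
    pub drop_probability: f64,
}

impl Limit {
    /// `observed` is the current value of the counter being limited.
    /// `sample` is a uniformly random number in [0, 1) supplied by the caller.
    pub fn should_drop(&self, observed: f64, sample: f64) -> bool {
        observed > self.threshold && sample < self.drop_probability
    }
}
```

With `drop_probability` set to 1.0 this degenerates into the old all-or-nothing behavior, which is the sawtooth case described above.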

It might also make sense to eventually change this to use ratelim. Initially,
we didn't do this because we needed our rate limiting decisions to be local to
a single host (because different hosts served different traffic), but now that
we spread the load for popular blobs across the whole tier, we should be able
to just delegate to ratelim.

For now, however, let's finish this bit of a functionality so we can turn it
on.

The corresponding Configerator change is here: D23472683

Reviewed By: aslpavel

Differential Revision: D23472945

fbshipit-source-id: f7d985fded3cdbbcea3bc8cef405224ff5426a25
2020-09-02 11:02:18 -07:00
Stanislau Hlebik
cdf96a20dd mononoke: asyncify redaction_add
Summary: Will change it in the next diff, so let's asyncify it now.

Reviewed By: aslpavel

Differential Revision: D23475332

fbshipit-source-id: f25fb7dc16f99cb140df9374f435e071401c2b90
2020-09-02 09:28:48 -07:00
Alex Hornby
b22599c500 mononoke: memo the hash values of interned paths in the walker
Summary: Memo the hash values of interned paths in the walker. The interner calls the hash function inside a lock that gets heavily contended, so this reduces the time the lock is held.
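
One way to memoize a hash is to compute it once at construction and replay it from the `Hash` impl. The sketch below uses a hypothetical `MemoHashed` wrapper and the standard library's `DefaultHasher`; it is not the walker's actual type:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Wraps a value together with its precomputed hash, so that hashing it
/// again (for example inside a locked interner map) is a single u64 write.
pub struct MemoHashed<T> {
    value: T,
    hash: u64,
}

impl<T: Hash> MemoHashed<T> {
    pub fn new(value: T) -> Self {
        // The expensive hashing happens here, outside of any lock.
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        MemoHashed { value, hash: hasher.finish() }
    }

    pub fn value(&self) -> &T {
        &self.value
    }
}

impl<T> Hash for MemoHashed<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed only the memoized hash; the wrapped value is not re-hashed.
        state.write_u64(self.hash);
    }
}

impl<T: PartialEq> PartialEq for MemoHashed<T> {
    fn eq(&self, other: &Self) -> bool {
        // Cheap hash comparison first, full comparison only on a match.
        self.hash == other.hash && self.value == other.value
    }
}

impl<T: Eq> Eq for MemoHashed<T> {}
```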

Reviewed By: farnz

Differential Revision: D23075260

fbshipit-source-id: 3ee50e3ce56106eadd17dc7d737ba95282640051
2020-09-02 05:52:33 -07:00
Alex Hornby
46cc110012 mononoke: switch walker from arc-intern to internment
Summary: Switch the walker from arc-intern::ArcIntern to internment::ArcIntern as internment does not need to acquire its map's locks on every drop.

Reviewed By: farnz

Differential Revision: D23075265

fbshipit-source-id: 6dd241aed850ec0fd3c8a4e68dda06053ec0b424
2020-09-02 05:52:33 -07:00
Kostia Balytskyi
d49406d847 repo_client: get rid of unneeded perf counters
Summary:
These two perf counters proved to be not very convenient to evaluate the
volume of undesired file fetches. Let's get rid of them. Specifically, they are
not convenient, because they accumulate values and it's hard to aggregate over
them.

Note that I don't do the same for tree fetches, as there's no better way of
estimating those now.

Reviewed By: mitrandir77

Differential Revision: D23452913

fbshipit-source-id: 08f8dd25eece495f986dc912a302ab3109662478
2020-09-02 05:02:46 -07:00
Kostia Balytskyi
e7ddc6cc13 undesired fetches: regex-based reporting
Summary:
We want to be able to report on more than just one prefix. Instead, let's add regex-based reporting. To make deployment easier, let's keep both options for now and later just remove the prefix-based one.

Note: this diff also changes how a situation with absent `undesired_path_prefix_to_log` is treated. Previously, if `undesired_path_prefix_to_log` is absent, but `"undesired_path_repo_name_to_log": "fbsource"`, it would report every path. Now it won't report any, which I think is a saner behavior. If we do ever want to report every path, we can just add `.*` as a regex.

Reviewed By: StanislavGlebik

Differential Revision: D23447800

fbshipit-source-id: 059109b44256f5703843625b7ab725a243a13056
2020-09-01 12:01:00 -07:00
Viet Hung Nguyen
2c1d4a49ad mononoke/repo_import: change logic of file paths rewriting with multiple movers
Summary:
This diff modifies how we rewrite file paths when we import into a repo by allowing the tool to apply multiple movers.

Motivation:
When we try to import into a small repo that pushredirects to a large repo, we have decided to import into the large repo first, then backsync to the small repo. To do that, we have to set a couple of flags related to importing into the large repo (see: D23294833 (d6895d837d)): bookmarks and import destination path.  Previously, we fixed the destination path in large repo by applying the small_to_large repo syncer's mover on the destination path in small repo. e.g:
if small_to_large repo syncer mover = {
default_action = prepend(**large_dir**)
map = [...]},
then **destination_path** in small repo becomes **large_dir/destination_path** in large repo.
After this, we prepended the imported files with the new prefix with another mover: prepend(**large_dir/dest_path**)
a -> large_dir/dest_path/a
Consequently, all directories and files under **destination_path** would get imported under **large_dir/destination_path** in large repo with this logic.
However, it's possible that with push-redirections, some directories would get remapped to a different place in large repo. e.g
small_to_large syncer mover = {
default_action = prepend(**large_dir**)
map = [
dest_path/b -> random_dir/b
]},
but with the current repo_import implementation dest_path/b would end up at large_dir/dest_path/b.
To avoid this, we apply multiple movers on the imported files. e.g.
1. we prepend all files with dest_path:
    mover = {
    default_action: prepend(**dest_path**)
    map={}} =>
    a -> dest_path/a
    b -> dest_path/b
2. we remap the files using the small_to_large repo syncer mover:
    mover = {
    default_action: prepend(**large_dir**)
    map =
    {dest_path/b -> random_dir/b}} =>
    dest_path/a -> large_dir/dest_path/a
    dest_path/b -> random_dir/b
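
The two-step example above can be reproduced with a small sketch. `Mover` and `apply_all` are hypothetical simplifications: a mover here is just a default prefix plus a list of directory remappings, whereas the real movers are built from the sync config.

```rust
/// Simplified mover: explicit remappings first, otherwise prepend a prefix.
pub struct Mover {
    /// Directory prepended by the default action.
    pub prefix: String,
    /// Explicit `from -> to` remappings, checked before the default action.
    pub map: Vec<(String, String)>,
}

impl Mover {
    pub fn apply(&self, path: &str) -> String {
        for (from, to) in &self.map {
            // Exact match on the remapped path.
            if path == from {
                return to.clone();
            }
            // Match on anything underneath the remapped directory.
            let dir = format!("{}/", from);
            if let Some(rest) = path.strip_prefix(dir.as_str()) {
                return format!("{}/{}", to, rest);
            }
        }
        // Default action: prepend the prefix.
        format!("{}/{}", self.prefix, path)
    }
}

/// Apply several movers in order, feeding each result into the next mover.
pub fn apply_all(movers: &[Mover], path: &str) -> String {
    movers.iter().fold(path.to_string(), |p, m| m.apply(&p))
}
```

Chaining a `dest_path` mover with the small_to_large mover sends `a` to `large_dir/dest_path/a` and `b` to `random_dir/b`, matching the listing above.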

Reviewed By: StanislavGlebik

Differential Revision: D23371244

fbshipit-source-id: 0bf4193b24d73c79ed00dfb38e2b0538388d1c0f
2020-09-01 09:26:07 -07:00