7265 Commits

Author SHA1 Message Date
Zeyi (Rice) Fan
a6a73ec6b6 switch to tokio 0.2 transport
Summary:
This diff adds a `SocketTransport` implementation that no longer uses legacy `tokio-core` based futures but `tokio-tower` and `tower-service` for processing Thrift requests.

The old implementation is renamed to `SocketTransportLegacy` for better transitioning.

Reviewed By: dtolnay

Differential Revision: D20019196

fbshipit-source-id: 3bee684e9254bf1a81669ef0d2c2262a55e75daa
2020-09-08 17:53:57 -07:00
Zeyi (Rice) Fan
26c8020522 explicitly specify features for tokio-util
Summary: This is needed in a later diff that requires the "codec" feature from `tokio-util`.

Reviewed By: dtolnay

Differential Revision: D23575630

fbshipit-source-id: e9cdf11b6ec05e5f2744da6b6efd8cb7bf08b212
2020-09-08 17:53:56 -07:00
Saurabh Singh
858dbc6861 tests: fix 'test-remotefilelog-undesired-file-logging.t'
Reviewed By: DurhamG

Differential Revision: D23589645

fbshipit-source-id: 350bab980baa811824d7c4fd36d689a5a3395dd8
2020-09-08 17:36:35 -07:00
Xavier Deguillard
1f94f8d652 win: add eden.exe to the package
Summary:
One of the differences between Linux/macOS and Windows is that `edenfsctl` needs
to be used on Windows, while `eden` works on the other platforms. This forces users to
change their habits, and all the scripts at FB to be changed to take edenfsctl
into consideration.

Reviewed By: chadaustin

Differential Revision: D23550567

fbshipit-source-id: de2b0853137409e595a0012ca9286c37208b98a1
2020-09-08 16:33:55 -07:00
Kostia Balytskyi
39d1cd8a47 synced_commit_mapping: add get which returns a vec
Summary:
This method is the future of synced-commit-mapping: there can be multiple query
results, and the business logic should decide whether that is acceptable
rather than pick a random one.
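
A minimal Python sketch of the idea, not the actual Rust API: `get` returns every match and a caller-side policy decides how to treat ambiguity. The class, the `add` helper, and `resolve_unique` are hypothetical names used only for illustration.

```python
class SyncedCommitMapping:
    """Hypothetical in-memory mapping from a source commit to synced commits."""

    def __init__(self):
        self._entries = {}

    def add(self, source, target):
        self._entries.setdefault(source, []).append(target)

    def get(self, source):
        # Return every match; the caller decides what multiple results mean.
        return list(self._entries.get(source, []))


def resolve_unique(mapping, source):
    """Example caller policy (assumed): accept one result, reject ambiguity."""
    results = mapping.get(source)
    if len(results) > 1:
        raise ValueError(f"ambiguous mapping for {source}: {results}")
    return results[0] if results else None
```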

In later diffs I will introduce the consumers for this method.

Reviewed By: mitrandir77

Differential Revision: D23574165

fbshipit-source-id: f256f82c9848f54e5096c6e50d42600bfd260081
2020-09-08 13:36:04 -07:00
Kostia Balytskyi
8e2b7754c4 synced_commit_mapping: rename get into get_one
Summary:
Another preparatory step for the actual mapping model fix. This just renames
the `get` method to `get_one` to emphasize its use case and to ease the search later.

At the end of this change, I expect there to be no use cases for `get_one` and expect it to be gone.

Reviewed By: mitrandir77

Differential Revision: D23574116

fbshipit-source-id: f5015329b15f3f08961006607d0f9bf10f499a88
2020-09-08 13:36:04 -07:00
Kostia Balytskyi
688309059b commit_rewriting: extract existing commit_sync_outcome into a file
Summary: This is just preparatory extraction to make further work more convenient.

Reviewed By: mitrandir77

Differential Revision: D23574077

fbshipit-source-id: 352ca8ac62bae4fd8fcb980da05c95ce477a414e
2020-09-08 13:36:04 -07:00
Durham Goode
2919268555 revisionstore: auto-delete when we have too much pack data
Summary:
In order to keep the hgcache size bounded we need to keep track of pack
file size even during normal operations and delete excess packs.

This has the negative side effect of deleting necessary data if the operation is
legitimately huge, but we'd rather have extra downloading time than fill up the
entire disk.
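
As a rough illustration of bounding a cache directory, the sketch below deletes the oldest files until the total size fits a limit. The function name and the oldest-first policy are assumptions; the summary does not say which packs revisionstore removes first.

```python
import os


def delete_excess_packs(pack_dir, max_total_bytes):
    """Delete the oldest files in pack_dir until the total size fits the limit.

    Returns the list of deleted paths. Oldest-first is an assumed policy.
    """
    entries = []
    for name in os.listdir(pack_dir):
        path = os.path.join(pack_dir, name)
        if os.path.isfile(path):
            stat = os.stat(path)
            entries.append((stat.st_mtime, path, stat.st_size))

    total = sum(size for _, _, size in entries)
    deleted = []
    for _, path, size in sorted(entries):  # oldest modification time first
        if total <= max_total_bytes:
            break
        os.remove(path)
        total -= size
        deleted.append(path)
    return deleted
```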

Reviewed By: quark-zju

Differential Revision: D23486922

fbshipit-source-id: d21be095a8671d2bfc794c85918f796358dc4834
2020-09-08 11:33:50 -07:00
Durham Goode
717d10958f revisionstore: refactor pack iteration code
Summary:
In a future diff we'll add logic to delete old pack files. We'll want
to use this pack iteration code, so let's move it to a function.

Reviewed By: quark-zju

Differential Revision: D23486920

fbshipit-source-id: 5f872e946ffe816289c925dd2e03c292e29da5af
2020-09-08 11:33:50 -07:00
Durham Goode
651a0690be revisionstore: auto-commit datapacks when they get large
Summary:
As the repository grows the opportunity for large downloads increases.
Today all writes to data packs get sent straight to disk, but we have no way to
prevent this from eating all the disk.

Let's automatically flush datapacks when they reach a certain size (default
4GB). In a future diff this will let us automatically garbage collect data packs
to bound the maximum size of packs.

Rotatelog already has this behavior.
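
The size-triggered flush can be sketched in Python as follows. This is a simplified model, not the revisionstore code: `AutoFlushingPack` and its `flush` callback are hypothetical, and only the 4 GB default comes from the summary.

```python
class AutoFlushingPack:
    """Hypothetical writer that commits pending entries past a size threshold."""

    def __init__(self, flush, max_bytes=4 * 1024**3):
        self._flush = flush          # callback that persists a list of entries
        self._max_bytes = max_bytes  # 4 GB default, as in the summary
        self._pending = []
        self._pending_bytes = 0

    def add(self, data):
        self._pending.append(data)
        self._pending_bytes += len(data)
        if self._pending_bytes >= self._max_bytes:
            self.flush()

    def flush(self):
        # Hand the pending entries to the callback and start a fresh pack.
        if self._pending:
            self._flush(self._pending)
            self._pending = []
            self._pending_bytes = 0
```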

Reviewed By: quark-zju

Differential Revision: D23478780

fbshipit-source-id: 14f9f707e8bffc59260c2d04c18b1e4f6bdb2f90
2020-09-08 11:33:50 -07:00
Genevieve Helsel
51c7e04e04 implement get_process_start_time for mac
Summary:
Top of the stack, last process to implement for full implementation on the `edenfs_restarter` code path. Again, we don't have `/proc/pid/stat`, so instead we use datetimes to calculate the start time of the process since epoch in seconds.
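
One way to get a start time in seconds since the epoch without `/proc` is to parse a timestamp string such as the one printed by `ps -o lstart=`. The sketch below assumes that format (for example `Tue Sep  8 11:13:44 2020`) and an English locale; the summary does not state which command or format the actual implementation uses.

```python
import datetime


def start_time_since_epoch(lstart):
    """Convert an lstart-style timestamp (local time) to epoch seconds."""
    started = datetime.datetime.strptime(
        lstart.strip(), "%a %b %d %H:%M:%S %Y"
    )
    # A naive datetime is interpreted as local time by timestamp().
    return int(started.timestamp())
```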

Note here that none of these manual runs look at the versions installed/running, and that is because that kind of manual testing only works if I build and manually deploy an rpm.

Reviewed By: fanzeyi

Differential Revision: D23443268

fbshipit-source-id: 370426f2cc0d5209b96615f2c017ac08acf266fc
2020-09-08 11:13:44 -07:00
Genevieve Helsel
e19198ded6 implement get_build_info_from_pid for mac
Summary: Implementation for making a getExportedValues() thrift call to the process if `get_build_info_from_pid()` is unavailable (which is the case on mac).

Reviewed By: fanzeyi

Differential Revision: D23442884

fbshipit-source-id: 011bcb63832226e2dabd5be60dd30e13f2481dcc
2020-09-08 11:13:44 -07:00
Genevieve Helsel
4a900d2df4 implement get_edenfs_processes for mac
Summary: Since macOS does not have `/proc/`, we instead call `ps` and parse its output. This mirrors the logic used for Linux.
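
A minimal sketch of parsing `ps` output, assuming two columns (PID and command path) as produced by something like `ps -axo pid,comm`. The function name, the column choice, and the base-name match are illustrative assumptions, not the edenfsctl implementation.

```python
def parse_ps_output(output, process_name="edenfs"):
    """Parse two-column ps output into (pid, command) pairs."""
    processes = []
    for line in output.splitlines():
        parts = line.strip().split(None, 1)  # pid, then the rest of the line
        # Skip the header row and anything that does not start with a pid.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, command = int(parts[0]), parts[1]
        # Match on the executable's base name only (an assumption).
        if command.rsplit("/", 1)[-1] == process_name:
            processes.append((pid, command))
    return processes
```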

Reviewed By: wez

Differential Revision: D23442710

fbshipit-source-id: ed5160e4dd52884e5752949a4fb2077690906ac4
2020-09-08 11:13:44 -07:00
Thomas Orozco
2948993c38 remotefilelog: add killswitch for client certs
Summary:
See D23538897 for context. This adds a killswitch so we can roll out client
certs gradually through dynamicconfig.

Reviewed By: StanislavGlebik

Differential Revision: D23563905

fbshipit-source-id: 52141365d89c3892ad749800db36af08b79c3d0c
2020-09-08 10:39:07 -07:00
Thomas Orozco
d1c4772da3 remotefilelog: use client certs when connecting to LFS
Summary:
Like it says in the title, this updates remotefilelog to present client
certificates when connecting to LFS (this was historically the case in the
previous LFS extension). This has a few upsides:

- It lets us understand who is connecting, which makes debugging easier.
- It lets us enforce ACLs.
- It lets us apply different rate limits to different use cases.

Config-wise, those certs were historically set up for Ovrsource, and the auth
mechanism will ignore them if not found, so this should be safe. That said, I'd
like to add a killswitch for this nonetheless. I'll reach out to Durham to see if I
can use dynamic config for that.

Also, while I was in there, I cleaned up a few functions that were taking
ownership of things but didn't need it.

Reviewed By: DurhamG

Differential Revision: D23538897

fbshipit-source-id: 5658e7ae9f74d385fb134b88d40add0531b6fd10
2020-09-08 10:39:07 -07:00
Wez Furlong
26eb8f0c29 eden: buck kill if appropriate when removing/unmounting a redirect
Summary:
We recently switched fbsource from using `.eden-redirections`
to manage some common buck-out redirections to relying on buck's
`eden redirect add PATH bind` invocation.

As part of this, a few users have run into an issue where buck hasn't
realized that the buck-out directory was unmounted and proceeds to
try to build.  It assumes the `eden redirect add PATH bind` will
restore a missing mount, but that command skips doing any real work
in the case that the mount is configured even if it is unmounted.

This commit aims to improve the UX around this situation by:

* When removing a redirection, test to see if buckd is running for
  the containing path and stop it.
* When running `eden redirect add PATH bind`, if the path is configured
  but not mounted then fixup the mount.  (Previously we'd just print
  that we're skipping it)

Reviewed By: genevievehelsel

Differential Revision: D23502306

fbshipit-source-id: 56e823f0b59981c19d0c44723948bd84d6d9008a
2020-09-08 09:59:16 -07:00
Wez Furlong
2a184e5744 eden: fixup buck kill invocation on macos
Summary:
I ran into a couple of problems with this as part of
looking at some improvements to `eden redirect` later in this
stack:

* There's currently an issue with the BUCKVERSION=last handling
  in our internal version of buck on macos, so sidestep it for now.
* We have a bad interaction between the environment set up to
  run edenfsctl.par and a different par file used by the FB internal
  buck wrapper script that causes it to fail on startup.

This commit cleans up the environment to compensate for these issues.

Reviewed By: genevievehelsel

Differential Revision: D23502307

fbshipit-source-id: 34b5099529dcc5f2b2d638bcb333e4dd00211766
2020-09-08 09:59:15 -07:00
David Tolnay
be0786f14b Prepare for rustfmt 2.0
Summary:
Generated by formatting with rustfmt 2.0.0-rc.2 and then a second time with fbsource's current rustfmt (1.4.14).

This results in formatting for which rustfmt 1.4 is idempotent but is closer to the style of rustfmt 2.0, reducing the amount of code that will need to change atomically in that upgrade.

 ---

*Why now?* The 1.x branch is no longer being developed and fixes like https://github.com/rust-lang/rustfmt/issues/4159 (which we need in fbcode) only land to the 2.0 branch.

 ---

Reviewed By: StanislavGlebik

Differential Revision: D23568780

fbshipit-source-id: b4b4a0aa683d236e2fdeb5b96d723ac2d84b9faf
2020-09-08 07:33:16 -07:00
Stanislau Hlebik
bf8a8c4cc9 mononoke: try to fix the test-redaction.t
Summary:
This test fails on Sandcastle because the last two lines are not showing up.
I have a hunch that the last two lines just weren't flushed, and this diff
attempts to fix it.

Reviewed By: krallin

Differential Revision: D23570321

fbshipit-source-id: fd7a3315582c313a05e9f46b404e811384bd2a50
2020-09-08 04:29:33 -07:00
Viet Hung Nguyen
065d80b947 mononoke/repo_import: add small change to sleep time
Summary: When we imported a repo (T71717570), we received a network connect error after issuing many GraphQL queries. I am not sure whether this was caused by the query frequency, but to be on the safe side, I increased the default sleep time between queries.

Reviewed By: krallin

Differential Revision: D23538886

fbshipit-source-id: 6a84f509e5e19f86880d3f8c6413f2f47e4a469b
2020-09-08 01:14:24 -07:00
David Tolnay
e62b176170 Prepare for rustfmt 2.0
Summary:
Generated by formatting with rustfmt 2.0.0-rc.2 and then a second time with fbsource's current rustfmt (1.4.14).

This results in formatting for which rustfmt 1.4 is idempotent but is closer to the style of rustfmt 2.0, reducing the amount of code that will need to change atomically in that upgrade.

 ---

*Why now?* The 1.x branch is no longer being developed and fixes like https://github.com/rust-lang/rustfmt/issues/4159 (which we need in fbcode) only land to the 2.0 branch.

 ---

Reviewed By: zertosh

Differential Revision: D23568779

fbshipit-source-id: 477200f35b280a4f6471d8e574e37e5f57917baf
2020-09-07 20:47:59 -07:00
Arun Kulshreshtha
8a26c3c960 edenapi_server: add Scuba logging
Summary: Add Scuba logging using `ScubaMiddleware` from `gotham_ext`. Each request will be logged to the Scuba dataset specified by the `--scuba-dataset` flag, as well as optionally to the log file specified by `--scuba-log-file`.

Reviewed By: sfilipco

Differential Revision: D23547668

fbshipit-source-id: e6cd88ad729a40cf45b63538f7481ee098ea12dc
2020-09-07 17:24:45 -07:00
Arun Kulshreshtha
a7a96e55eb lfs_server: tidy up middleware imports
Summary: Import middleware directly from `gotham_ext` rather than relying on reexports in the `middleware` module.

Reviewed By: farnz

Differential Revision: D23547320

fbshipit-source-id: e64a8acff55445a646b0a1b3b1e71cf6606c3d02
2020-09-07 17:24:45 -07:00
Arun Kulshreshtha
83c54b48f8 gotham_ext: move ScubaMiddleware into gotham_ext
Summary:
Move `ScubaMiddleware` out of the LFS server and into `gotham_ext`.

This change required splitting up the `ScubaKey` enum to separate generally useful column names (e.g., HTTP columns that would be applicable to any HTTP service) from LFS-specific columns. `ScubaMiddlewareState` has been modified to accept any type that implements `Into<String>` as a key, and the `ScubaKey` enum has been split up into `HttpScubaKey` (in `gotham_ext`) and `LfsScubaKey` (in `lfs_server`).

The middleware now takes a type parameter to specify a "handler" (implementing the new `ScubaHandler`  trait) which allows the application to add application-specific Scuba columns in addition to the default columns. The application-specific columns will be added immediately prior to the sample being logged.

Reviewed By: krallin

Differential Revision: D23458748

fbshipit-source-id: 3e99f3e0b5d3475a4f5ac9eaefade2eeff12c2fa
2020-09-07 17:24:45 -07:00
Mateusz Kwapich
6e5a6c3d71 metaedit: JSON input mode
Summary:
This makes it easy for `metaedit` to be used by automation. Provided
with a simple JSON file with hash->{user, message} mapping metaedit will
do all of its work without any prompts.
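
For illustration, a file in this shape could be loaded and checked as sketched below. Only the hash -> {user, message} shape comes from the summary; the function name and the validation rules are assumptions.

```python
import json


def load_metaedit_input(text):
    """Parse a hash -> {user, message} mapping; validation rules are assumed."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object keyed by commit hash")
    edits = {}
    for commit_hash, fields in data.items():
        if not isinstance(fields, dict):
            raise ValueError(f"{commit_hash}: expected an object")
        unknown = set(fields) - {"user", "message"}
        if unknown:
            raise ValueError(f"{commit_hash}: unknown fields {sorted(unknown)}")
        edits[commit_hash] = fields
    return edits
```

A call such as `load_metaedit_input('{"abc123": {"message": "new message"}}')` would return the mapping unchanged, leaving the caller to apply each edit without prompting.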

Reviewed By: quark-zju

Differential Revision: D23545527

fbshipit-source-id: 18763ecacff9143b9ad492faf654b176b0f86d1f
2020-09-07 13:33:58 -07:00
Mateusz Kwapich
3c10f1b9c5 add a way to query changed directories
Summary:
This diff is more complex than I wished because I originally didn't take
directories into account when designing the `commit_compare` method.

Reviewed By: StanislavGlebik

Differential Revision: D23541892

fbshipit-source-id: 0e2b2abf7b3c541529d9881e48a575239374040f
2020-09-07 11:58:31 -07:00
Mateusz Kwapich
49b98e206e add a way to diff the directories between commits
Summary: We need that to replace similar feature in SCMQuery

Reviewed By: StanislavGlebik

Differential Revision: D23541893

fbshipit-source-id: 3dd6357ea834337a81216e24cb132e23b01bc77d
2020-09-07 11:58:31 -07:00
Lukas Piatkowski
fbfb856191 mononoke/integration test: make test-traffic-replay.t private
Reviewed By: StanislavGlebik

Differential Revision: D23565712

fbshipit-source-id: 7cb2d4a6c107ff513522e7343ffd5a8eea25879c
2020-09-07 10:35:39 -07:00
Lukasz Piatkowski
52bc18a728 mononoke/integration tests: fix up integration tests using hooks (#48)
Summary:
Hooks have been recently made public. Remove from list of excluded tests the ones that were blocked by missing hooks and fix them up.

Pull Request resolved: https://github.com/facebookexperimental/eden/pull/48

Reviewed By: farnz

Differential Revision: D23564883

Pulled By: lukaspiatkowski

fbshipit-source-id: 101dd093eb11003b8a4b4aa4c5ce242d9a9b9462
2020-09-07 08:42:39 -07:00
Jun Wu
89eb6520d2 scmutil: remove meaningfulparents
Summary:
The "meaningfulparents" concept is coupled with rev numbers.
Remove it. This changes default templates to not show parents, and `{parents}`
template to show parents.

Reviewed By: DurhamG

Differential Revision: D23408970

fbshipit-source-id: f1a8060122ee6655d9f64147b35a321af839266e
2020-09-05 15:06:44 -07:00
Lukasz Piatkowski
20b082ee6a mononoke/integration tests: blacklist 2 integration tests on OSS runs (#47)
Summary:
Those are new tests that use functionality not compatible yet with OSS.

Pull Request resolved: https://github.com/facebookexperimental/eden/pull/47

Reviewed By: chadaustin

Differential Revision: D23538921

Pulled By: lukaspiatkowski

fbshipit-source-id: c512a1b2359f9ff772d0e66d2e6a66f91e00f95c
2020-09-04 20:21:56 -07:00
Xavier Deguillard
cd0af7689a utils: compile ProcessAccessLog and ProcessNameCache on Windows
Summary:
Even though these might not be fully ported on Windows, they do compile and
tests are passing, so let's compile them.

Reviewed By: chadaustin

Differential Revision: D23505509

fbshipit-source-id: 567e8668ca489daf89c1c6576973bbaaabbb6c88
2020-09-04 16:14:25 -07:00
Xavier Deguillard
a5c85ec822 fuse: move and rename RequestData
Summary:
Most of the RequestData code is platform generic, but bits of it are currently
strongly tied to FUSE. By splitting these 2 parts, we will be able to use the
RequestContext class on Windows too, without having to re-implement all the
logic there.

Reviewed By: chadaustin

Differential Revision: D23482072

fbshipit-source-id: 857fd9ca4264d0f308ec10cc487e9ff3eeb5ee16
2020-09-04 16:14:24 -07:00
Durham Goode
8b91cccc8b remotefilelog: log undesired filename fetches
Summary:
Now that the Rust revisionstore records undesired filename fetches,
let's log those results to Scuba in Python.

Reviewed By: StanislavGlebik

Differential Revision: D23462572

fbshipit-source-id: b55f2290e30e3a5c3b67d9f612b24bc3aad403a8
2020-09-04 14:55:15 -07:00
Durham Goode
9772ab1718 revisionstore: record remote fetches that match a pattern
Summary:
We want to be able to record when fetches to certain paths happen.
Let's add recording infrastructure to the new ReportingRemoteDataStore.

A future diff will make the seen paths accessible from Python for Scuba logging.
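
The wrapper-plus-recording idea can be sketched in Python as below. `ReportingStore`, its `fetch` method, and the `seen` attribute are hypothetical stand-ins for the Rust `ReportingRemoteDataStore`, whose actual interface is not shown here.

```python
import re


class ReportingStore:
    """Hypothetical wrapper that records fetched paths matching a pattern."""

    def __init__(self, inner, pattern):
        self._inner = inner                   # the wrapped remote store
        self._pattern = re.compile(pattern)   # which paths are worth recording
        self.seen = set()

    def fetch(self, path):
        if self._pattern.search(path):
            self.seen.add(path)
        # Always delegate; recording never changes the fetch result.
        return self._inner.fetch(path)
```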

Reviewed By: xavierd

Differential Revision: D23462574

fbshipit-source-id: 5d749f2429e26e8e7fe4fb5adc29140b4309eac9
2020-09-04 14:55:15 -07:00
Durham Goode
84cbc26b1e revisionstore: add reporting wrapper for remote data store
Summary:
We want to monitor what paths are fetched from our remote servers.
Since all of our remote stores are hidden behind the RemoteDataStore interface,
let's create a wrapper around that. A future diff will insert the actual
monitoring and reporting.

Reviewed By: quark-zju

Differential Revision: D23462571

fbshipit-source-id: e6031f19db23f7d1b09767efb9613d7528fb457d
2020-09-04 14:55:14 -07:00
Jun Wu
dabb68c1e5 checkmessagehook: make error message more obvious
Summary: This hopefully makes it more obvious so it looks less like an hg crash.

Reviewed By: kulshrax

Differential Revision: D23509569

fbshipit-source-id: 7174780bc7e9841e3f89a482280c49427b62fb74
2020-09-04 14:55:14 -07:00
Jun Wu
4131dcf012 context: avoid memorizing revs
Summary:
The revs can change after flush. For example, during pushrebase, some ctx might
initially have a non-master Id assigned, and later got assigned an Id in the
master group:

```
ipdb> p self.__dict__
{'_repo': <edenscm.hgext.fastannotate.protocol.localreposetup.<locals>.fastannotaterepo object at 0x7f2415b3f8e0>, '_rev': 72057594038527478, '_node': b'\xb6\x12\xcd\x81b#\xa3\x01\xe2pP\x84\x05{\xd2He\xbe\xcc\xf0'}
ipdb> p self._node
b'\xb6\x12\xcd\x81b#\xa3\x01\xe2pP\x84\x05{\xd2He\xbe\xcc\xf0'
ipdb> p self._repo.changelog.rev(self._node)
7198913
ipdb> p self._rev
72057594038527478
```

Note that `self._rev` becomes inconsistent with `changelog.rev(self._node)`.

The error looks like:

  $ hg push -r . --to master --debug --trace --traceback --verbose
  ...
  pushing rev 556400239977 to destination ...
  ...
  1 commits found
  list of changesets:
  556400239977b9ed523eae5ad28773784c975f7f
  sending unbundle command
  ...
  added 79 commits with 0 changes to 0 files
  moving remote bookmark 'remote/master' to 84829e9242e4
  ...
  using eden update code path
  Traceback (most recent call last):
    ...
    File "/opt/fb/mercurial/edenscm/mercurial/merge.py", line 2220, in update
      return eden_update.update(
    File "/opt/fb/mercurial/edenscm/mercurial/eden_update.py", line 126, in update
      stats, actions = _handle_update_conflicts(
    ...
    File "/opt/fb/mercurial/edenscm/mercurial/context.py", line 503, in _changeset
      return self._repo.changelog.changelogrevision(self.rev())
      # self = <changectx 84829e9242e4>
    File "/opt/fb/mercurial/edenscm/mercurial/changelog2.py", line 312, in changelogrevision
      return changelogrevision(self.revision(nodeorrev))
      # nodeorrev = 72057594038527521
    File "/opt/fb/mercurial/edenscm/mercurial/changelog2.py", line 365, in revision
      node = self.node(nodeorrev)
      # nodeorrev = 72057594038527521
    File "/opt/fb/mercurial/edenscm/mercurial/changelog2.py", line 280, in node
      raise IndexError("revlog index out of range")
  Traceback (most recent call last):
    File "/opt/fb/mercurial/edenscm/mercurial/changelog2.py", line 278, in node
      return self.idmap.id2node(rev)
  error.CommitLookupError: 'N599585 cannot be found'

Change the `context` object to not memorize revs.
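
The stale-rev problem can be reproduced with a toy model. All classes below are hypothetical and far simpler than `edenscm`'s `context` and changelog; the two rev numbers are taken from the debugger session above.

```python
class Changelog:
    """Toy changelog whose node -> rev mapping can be reassigned."""

    def __init__(self):
        self._revs = {}

    def assign(self, node, rev):
        self._revs[node] = rev

    def rev(self, node):
        return self._revs[node]


class CachingContext:
    def __init__(self, changelog, node):
        self._node = node
        self._rev = changelog.rev(node)  # cached once; can go stale

    def rev(self):
        return self._rev


class LookupContext:
    def __init__(self, changelog, node):
        self._changelog = changelog
        self._node = node

    def rev(self):
        # Resolve through the node every time, so reassignment is picked up.
        return self._changelog.rev(self._node)
```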

Reviewed By: DurhamG

Differential Revision: D23468702

fbshipit-source-id: b623bcec99b09d61169371e08c69fc6d6f38935c
2020-09-04 13:22:18 -07:00
Lukas Piatkowski
12c684afcd mononoke/hooks: make deny_files public
Reviewed By: aslpavel

Differential Revision: D23537799

fbshipit-source-id: 58c9568e30982f682b00faae42bc3a3f3595890f
2020-09-04 12:23:35 -07:00
Jun Wu
e74133f0fa dag: limit max segment level to 4
Summary:
Based on fbsource data, building level 5 proves not to be useful.

This would save 300ms in the write path.

Reviewed By: sfilipco

Differential Revision: D23494505

fbshipit-source-id: ca795b4900af40dbfdaa463d36f3169413bf6a62
2020-09-04 12:20:54 -07:00
Jun Wu
b4adf0602f dag: remove non-master "Name -> Id" index on request
Summary:
Previously the IdMap's "Name -> Id" index simply ignored the "reassign
non-master" request. It turns out stale entries in that index can cause
issues as demonstrated by the previous diff.

Update IdMap to actually remove both indexes of non-master group on
remove_non_master so it cannot have stale entries.

To optimize the index, the format of IdMap is changed from:

  [ 8 bytes Id (Big Endian) ] [ Name ]

to:

  [ 8 bytes Id (Big Endian) ] [ 1 byte Group ] [ Name ]

So the index can use reference to the slice, instead of embedding the bytes, to
reduce index size.
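
The new entry layout can be illustrated with Python's `struct` module. This is only a sketch of the byte layout described above; the function names are hypothetical, and the real IdMap is implemented in Rust.

```python
import struct

# ">QB": big-endian, 8-byte unsigned Id followed by a 1-byte Group.
_HEADER = struct.Struct(">QB")


def encode_entry(id_, group, name):
    """Build [ 8 bytes Id (Big Endian) ] [ 1 byte Group ] [ Name ]."""
    return _HEADER.pack(id_, group) + name


def decode_entry(entry):
    """Split an entry back into (id, group, name)."""
    id_, group = _HEADER.unpack_from(entry)
    return id_, group, entry[_HEADER.size:]
```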

The filesystem directory name for IdMap used by NameDag is bumped to `idmap2`
so it won't read the incompatible old `idmap` data.

Reviewed By: sfilipco

Differential Revision: D23494508

fbshipit-source-id: 3cb7782577750ba5bd13515b370f787519ed3894
2020-09-04 12:20:53 -07:00
Jun Wu
c5d6c9d0f2 dag: add a test showing non-master rebuild issues
Summary: Some vertexes can disappear from the graph!

Reviewed By: sfilipco

Differential Revision: D23494506

fbshipit-source-id: ecbf2a4169e5fc82596e89a4bfe4c442a82e9cd2
2020-09-04 12:20:53 -07:00
Jun Wu
4aea3657e1 dag: move some test utilities to a TestDag struct
Summary: The TestDag struct will be used to do some more complicated tests.

Reviewed By: sfilipco

Differential Revision: D23494507

fbshipit-source-id: 11350f9e448725ae49f50a7b6f19efc57ad84448
2020-09-04 12:20:53 -07:00
Thomas Orozco
3ba2c2b429 mononoke/hg_sync: make it work on Mercurial Python 3
Summary:
A few things here:

- The heads must be bytes.
- The arguments to wireproto must be strings (we used to encode / decode them,
  but we shouldn't).
- The bookmark must be a string (otherwise it gets serialized as `"b\"foo\""`
  and then it deserializes to that instead of `foo`).
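
The bookmark problem can be reproduced in plain Python 3. The sketch uses `json` purely as a stand-in serializer (an assumption; the actual wire format is not shown here) to show how converting bytes with `str()` serializes the repr rather than the text.

```python
import json


def serialize_wrong(bookmark):
    # str() on bytes yields the repr, e.g. "b'foo'", not the decoded text.
    return json.dumps(str(bookmark))


def serialize_right(bookmark):
    # Decode explicitly so the serialized value is the bookmark name itself.
    return json.dumps(bookmark.decode("utf-8"))
```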

Reviewed By: StanislavGlebik

Differential Revision: D23499846

fbshipit-source-id: c8a657f24c161080c2d829eb214d17bc1c3d13ef
2020-09-04 11:56:44 -07:00
Thomas Orozco
747b355236 mononoke: make mononoke_hg_sync_job sendunbundlereplaybatch more debuggable
Summary:
Right now we get very little logging out of errors in here, which is making it
difficult to fix it on Py3 (where it currently is broken).

This diff doesn't fix anything, but at the very least, let's make the errors
better so we can make this easier to start debugging.

Reviewed By: ahornby

Differential Revision: D23499369

fbshipit-source-id: 7ee60b3f2a3be13f73b1f72dee062ca80cb8d8d9
2020-09-04 11:56:44 -07:00
Thomas Orozco
c8dd8ae4e3 mononoke: run tests using hg Python 3 as well
Summary:
The motivation for this is to surface potential regressions in hg Python 3 by
testing code paths that are exercised in Mononoke. The primary driver for this
was the regressions in the LFS extension that broke uploads, and for which we
have test coverage here in Mononoke.

To do this, I extracted the manifest generation (the manifest is the list of
binaries that the tests know about, which is passed to the hg test runner), and
moved it into its own function, then added a new target for the py3 tests.

Unfortunately, a number of tests are broken in Python 3 currently. We should
fix those. It looks like there are some errors in Mercurial when walking a
manifest with non-UTF-8 files, and the other problem is that the hg sync job is
in fact broken: https://fburl.com/testinfra/545af3p8.

Reviewed By: ahornby

Differential Revision: D23499370

fbshipit-source-id: 762764147f3b57b2493d017fb7e9d562a58d67ba
2020-09-04 11:56:44 -07:00
Stanislau Hlebik
7b323a4fd9 mononoke: add log-only mode in redaction
Summary:
Before redacting something it would be good to check that this file is not
accessed by anything. Having log-only mode would help with that.
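
A small Python sketch of what a log-only mode could look like. The function, its parameters, and the choice to log in both modes are assumptions for illustration, not Mononoke's implementation.

```python
class RedactedError(Exception):
    """Raised when a redacted key is accessed in enforcing mode."""


def check_access(key, redacted, log_only, log):
    """Return True if access may proceed; raise if the key is enforced."""
    if key not in redacted:
        return True
    log(key)  # every access to a redacted key is recorded
    if log_only:
        return True  # log-only mode: report the access but still allow it
    raise RedactedError(key)
```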

Reviewed By: ikostia

Differential Revision: D23503666

fbshipit-source-id: ae492d4e0e6f2da792d36ee42a73f591e632dfa4
2020-09-04 07:37:15 -07:00
Stanislau Hlebik
0740f99f13 mononoke: allow logging censored scuba accesses to file
Summary:
In the next diff I'm going to add log-only mode to redaction, and it would be
good to have a way of testing it (i.e. testing that it actually logs accesses
to bad keys).

In this diff let's use a config option that allows logging censored scuba
accesses to file, and let's update redaction integration test to use it

Reviewed By: ikostia

Differential Revision: D23537797

fbshipit-source-id: 69af2f05b86bdc0ff6145979f211ddd4f43142d2
2020-09-04 07:37:14 -07:00
Thomas Orozco
f1e4f62e2d mononoke/fsnodes: expose FsnodeFile as the LeafId
Summary:
Fsnodes have a lot of data about files, but right now we can't access it
through a Fsnode lookup or a manifest walk, because the LeafId for a Fsnode is
just the content id and the file type.

This is a bit sad, because it means we e.g. cannot dump a manifest with file
sizes (D23471561 (179e4eb80e)).

Just changing the LeafId is easy, but that brings a new problem with Fsnode
derivation.

Indeed, deriving manifests normally expects us to have the "derive leaf"
function produce a LeafId (so we'd want to produce a `FsnodeFile`), but in
Fsnodes, this currently happens in deriving trees instead.

Unfortunately, we cannot easily just move the code that produces `FsnodeFile`
from the tree derivation to the leaf derivation, that is, do:

```
fn check_fsnode_leaf(
    leaf_info: LeafInfo<FsnodeFile, (ContentId, FileType)>,
) -> impl Future<Item = (Option<FsnodeSummary>, FsnodeFile), Error = Error>
```

Indeed, the performance of Fsnode derivation relies on all the leaves for a
given tree being derived together with the tree and its parents in context.

So, we'd need the ability for deriving a new leaf to return something different
from the actual leaf id. This means we want to return a `(ContentId,
FileType)`, even though our `LeafId` is a `FsnodeFile`.

To do this, this diff introduces a new `IntermediateLeafId` type in the
derivation. This represents the type of the leaf that is passed from deriving a
leaf to deriving a tree. We need to be able to turn a real `LeafId` into it,
because sometimes we don't re-derive leaves.

I think we could also refactor some of the code that passes a context here to
just do this through the `IntermediateLeafId`, but I didn't look into this too
much.

So, this diff does that, and uses it in Mononoke Admin so we can print file
sizes.

Reviewed By: StanislavGlebik

Differential Revision: D23497754

fbshipit-source-id: 2fc480be0b1e4d3d261da1d4d3dcd9c7b8501b9b
2020-09-04 06:30:18 -07:00
Mateusz Kwapich
f7be2eef14 tunable scuba sampling
Summary:
This allows us to sample the most popular method logs (`repo_list_hg_manifest` calls make up 90% of the samples in our Scuba table) while still having full logging for other queries and errors.

The sampling can be easily disabled via a tunable. In case we get a lot of errors, we can also start sampling the error requests with a simple configerator change.
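
Per-method sampling of this kind can be sketched as follows. `SamplingLogger`, the "one in N" rate table, and the rule that errors bypass sampling are assumptions made for the example; the real rates come from tunables.

```python
import random


class SamplingLogger:
    """Hypothetical logger that keeps one in N samples for chosen methods."""

    def __init__(self, sink, rates, rng=None):
        self._sink = sink    # list-like object receiving logged samples
        self._rates = rates  # method name -> N (log one in N calls)
        self._rng = rng or random.Random()

    def log(self, method, is_error=False):
        rate = self._rates.get(method, 1)  # unlisted methods are fully logged
        # Errors are always logged in this sketch; others are sampled.
        if is_error or rate <= 1 or self._rng.randrange(rate) == 0:
            self._sink.append((method, is_error))
```

Setting a method's rate to 1, or removing it from the table, turns sampling off for that method.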

Reviewed By: krallin

Differential Revision: D23507333

fbshipit-source-id: c7e34467d99410ec3de08cce2db275a55394effd
2020-09-04 06:26:35 -07:00
Viet Hung Nguyen
437a0e905b mononoke/repo_import: add deriving data types for multiple repos
Summary: Previously, we only supported deriving data types for the repo we import into. This diff expands on this and now we can do that for multiple repos (e.g. small repos we backsync commits to from large repo we import to).

Reviewed By: StanislavGlebik

Differential Revision: D23499953

fbshipit-source-id: 223209a6a2739eae93082cae4f04e53e0cba0c58
2020-09-04 05:39:21 -07:00
Stanislau Hlebik
11a45b6b60 mononoke: do not pass tasks to find_files_with_given_content_id_blobstore_keys
Summary:
In the next diff I'm going to add log_only mode for redaction.
This diff is a small refactoring that makes the next diff simpler:
find_files_with_given_content_id_blobstore_keys doesn't accept tasks anymore,
just content keys.

Reviewed By: aslpavel

Differential Revision: D23535829

fbshipit-source-id: 1dac37f5ea7038fc779ad51192a290fcc23e6556
2020-09-04 05:22:03 -07:00
Lukas Piatkowski
67a71d1f98 mononoke/hooks: make limit_commitsize and limit_filesize public
Reviewed By: aslpavel

Differential Revision: D23502908

fbshipit-source-id: 8b9070cfaa28af7b808d02548c0fb7c5d344550d
2020-09-04 04:23:05 -07:00
Lukas Piatkowski
462cb96cc2 mononoke/hooks: make no_questionable_filenames public
Reviewed By: aslpavel

Differential Revision: D23478259

fbshipit-source-id: 642948c2685690298a71fbe7177c4bd6a6e43f85
2020-09-04 04:23:05 -07:00
Lukas Piatkowski
eebdc0b896 mononoke/metaconfig: sync thrift changes from configerator for HookConfig
Summary: Use the new fields from RawHookConfig in HookConfig

Reviewed By: StanislavGlebik

Differential Revision: D23499766

fbshipit-source-id: 43e9d2dfdcfb0fa0dd4de6310ea0013db1b69474
2020-09-04 02:02:06 -07:00
Jun Wu
c9e6995675 py2: fix crecord compatibility
Summary:
D23460476 (c84653c7a9) breaks Python 2:

Python 2: bytes + bytearray -> bytearray
Python 3: bytes + bytearray -> bytes

Fix it.

Python 2: b"%s" % bytearray -> bytes
Python 3: b"%s" % bytearray -> bytes
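
An alternative way to get a consistent result type, shown here only as a sketch, is to wrap the concatenation in `bytes(...)`. The helper name is hypothetical, and this is not necessarily how the crecord fix is written.

```python
def concat_bytes(prefix, chunk):
    """Concatenate bytes-like values and always return bytes."""
    # In Python 3, the result type of + follows the left operand, so the
    # explicit conversion makes the type independent of argument order.
    return bytes(prefix + chunk)
```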

Reviewed By: singhsrb

Differential Revision: D23514590

fbshipit-source-id: 7fd5f2372444732f13909c42251f000f05955228
2020-09-03 18:51:10 -07:00
Zeyi (Rice) Fan
8be3e1940a change default configuration path
Summary:
`C:/tools/eden` will be overwritten whenever a new EdenFS package is installed, making it unsuitable to be managed by Chef.

Change the default configuration directory to `C:\ProgramData\eden`, which aligns with other programs.

Reviewed By: xavierd

Differential Revision: D23484626

fbshipit-source-id: 763518c608b24caa08e089a738f5c3577a0d6483
2020-09-03 17:18:51 -07:00
Xavier Deguillard
5c6ab8afac utils: rename ProcessAccessLog::AccessType enum
Summary:
Removing Fuse from the enum name decouples it from FUSE and thus makes it
more portable. This also eliminates the last platform-specific bit from
RequestData.

Reviewed By: chadaustin

Differential Revision: D23467773

fbshipit-source-id: 52515522c8ac51d0c4b56dc5e42d4b6593df6623
2020-09-03 17:00:07 -07:00
Stefan Filip
3f0b08e46f segmented_changelog: add version field to IdMap
Summary:
The version is going to be used to seamlessly upgrade the IdMap. We can
generate the IdMap in a variety of ways. Naturally, algorithms for generating
the IdMap may change, so we want a mechanism for updating the shared IdMap.

A generated IdDag is going to require a specific IdMap version. To be more
precise, the IdDag is going to specify which version of IdMap it has to be
interpreted with.

Reviewed By: quark-zju

Differential Revision: D23501158

fbshipit-source-id: 370e6d9f87c433645d2a6b3336b139bea456c1a0
2020-09-03 16:33:20 -07:00
Stefan Filip
58a4821fe3 segmented_changelog: add IdMap trait with SqlIdMap implementation
Summary:
Separate the operational bits of the IdMap from the core SegmentedChangelog
requirements.

I debated whether it makes sense to add repo_id to SqlIdMap. Given the current
architecture I don't see a reason not to do it. On the contrary, separating
the two objects felt convoluted.

Reviewed By: quark-zju

Differential Revision: D23501160

fbshipit-source-id: dab076ab65286d625d2b33476569da99c7b733d9
2020-09-03 16:33:20 -07:00
Stefan Filip
f3c353edbc segmented_changelog: change idmap module from file to directory
Summary:
Planning to add a trait for core idmap functionality (that's just translating
cs_id to vertex and back). The current IdMap will then be an implementation of
that trait.

Reviewed By: quark-zju

Differential Revision: D23501159

fbshipit-source-id: 34e3b26744e4b5465cd108cca362c38070317920
2020-09-03 16:33:20 -07:00
Stefan Filip
c09f80882c edenapi: use async-runtime to schedule futures
Summary:
Replacing places where the tokio runtime is instantiated inside the edenapi
client crate.

Reviewed By: quark-zju

Differential Revision: D23468596

fbshipit-source-id: ef68718c7d5b89b6477a2946daaa51618b53d06a
2020-09-03 15:45:34 -07:00
Jun Wu
cea2bf8728 dag: limit segment level at open time
Summary:
At open time, it's pointless to attempt to create new levels. So let's just
read the existing max_level and do not try to build max_level + 1.

This turns out to save 300ms in profiling result.

Reviewed By: sfilipco

Differential Revision: D23494509

fbshipit-source-id: 4ea326a3cc21792790ea0b87e5bf608a94ae382b
2020-09-03 13:48:43 -07:00
Jun Wu
f238529a97 multilog: use per-log meta to pick up updated indexes
Summary:
With MultiLog, per-log meta was previously entirely ignored. However, they can
be useful for updated indexes. For example, an application defines a new index
and opens a Log via MultiLog. The application would expect the new index to be
built only once. Without MultiLog, per-log meta is updated at open time in
place. With MultiLog, the updated index meta is not written back to the
multimeta so the new index would be rebuilt multiple times undesirably.

Update MultiLog to reuse the per-log meta if it's compatible so it can pick up
new indexes.

Reviewed By: sfilipco

Differential Revision: D23488212

fbshipit-source-id: c8b3e6b5589dbda2e76a143d15085862a93dae22
2020-09-03 13:48:43 -07:00
Jun Wu
f79e7657af multilog: stop writing poisoned per-log meta
Summary:
The poisoned meta makes investigation harder. ex. `debugdumpindexlog` won't
work on those logs.

Reviewed By: sfilipco

Differential Revision: D23488213

fbshipit-source-id: b33894d8c605694b6adf5afdaed45707fbd7357e
2020-09-03 13:48:43 -07:00
Stanislau Hlebik
4947e07cb7 mononoke: asyncify one function in redaction admin subcommand
Summary:
I'm going to change this function soon, so it's nice to asyncify it to make
next diffs simpler and also remove duplicated logic.

Also remove unnecessary `logger` parameter - we can always get logger from CoreContext

Reviewed By: krallin

Differential Revision: D23501634

fbshipit-source-id: 7ad2fc17167e4107481ceb230e0b7cb3e7f2549a
2020-09-03 12:22:24 -07:00
Mateusz Kwapich
20d096f5d5 add thrift metadata support
Summary: This closely replicates EscapeZero work in D23328638 and will allow us to issue requests to SCS using Thrift Fiddle (https://www.internalfb.com/thrift_fiddle).

Reviewed By: EscapeZero

Differential Revision: D23475864

fbshipit-source-id: fb286e3fcd6ea79704fa2e7e1ed9ab5595ff7b81
2020-09-03 12:18:18 -07:00
Arun Kulshreshtha
858a080502 gotham_ext: make StreamBody automatically delay post-request callbacks
Summary: Now that post-request callbacks are available in `gotham_ext`, we can make `StreamBody` use them directly instead of using an LFS-specific wrapper (previously required to access the LFS server's `RequestContext`). This also means that the EdenAPI server will get this behavior for free.

Reviewed By: krallin

Differential Revision: D23402969

fbshipit-source-id: 56ab710473f13e8983b136664af364af6884bd3f
2020-09-03 11:59:32 -07:00
Arun Kulshreshtha
5556a447d1 edenapi_server: use LogMiddleware
Summary: Add `LogMiddleware` to the EdenAPI server, which will print a log message whenever a request is received or has completed.

Reviewed By: DurhamG

Differential Revision: D23299902

fbshipit-source-id: f44ef1b01692f0e4f9b109917fcee89a84ca4208
2020-09-03 11:59:32 -07:00
Arun Kulshreshtha
96a6a3fcfb edenapi_server: use LoadMiddleware
Summary: Use `LoadMiddleware` to track the number of outstanding requests in the server.
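
A rough Python sketch of the idea of tracking outstanding requests. `LoadTracker` and its methods are hypothetical names for illustration; the actual `LoadMiddleware` is a Rust Gotham middleware and may be structured differently.

```python
import threading
from contextlib import contextmanager


class LoadTracker:
    """Counts requests that have started but not yet finished."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._outstanding = 0

    @contextmanager
    def track(self):
        with self._lock:
            self._outstanding += 1
        try:
            yield
        finally:
            # Decrement even if the handler raised.
            with self._lock:
                self._outstanding -= 1

    @property
    def outstanding(self) -> int:
        with self._lock:
            return self._outstanding


tracker = LoadTracker()
with tracker.track():
    with tracker.track():
        in_flight = tracker.outstanding  # two requests are in flight here
after = tracker.outstanding
print(in_flight, after)
```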

Reviewed By: DurhamG

Differential Revision: D23298415

fbshipit-source-id: bdcdb0f657d8deac593d356c87ac0d8d3f39e322
2020-09-03 11:59:32 -07:00
Arun Kulshreshtha
7144363d2c gotham_ext: move LogMiddleware to gotham_ext
Summary: Now that `LogMiddleware` no longer depends on `RequestContext`, it can be moved into `gotham_ext`.

Reviewed By: DurhamG

Differential Revision: D23298412

fbshipit-source-id: d5288decba98c3dd4605b9a44e41eba0f47fee37
2020-09-03 11:59:31 -07:00
Arun Kulshreshtha
35d292e513 gotham_ext: move LoadMiddleware to gotham_ext
Summary: Now that `LoadMiddleware` no longer depends on `RequestContext`, it can be moved into `gotham_ext`.

Reviewed By: DurhamG

Differential Revision: D23298416

fbshipit-source-id: 5d29da492e39beb5621daf0570d9b3e657cbfc04
2020-09-03 11:59:31 -07:00
Arun Kulshreshtha
82c451fb9f lfs_server: use PostRequestMiddleware
Summary: This diff removes the post-request callback functionality from the LFS server's `RequestContext` and replaces it with the new `PostRequestMiddleware`. The middleware is directly based on `RequestContext`, so the underlying behavior is essentially the same as before.

Reviewed By: krallin

Differential Revision: D23298413

fbshipit-source-id: 1e58a40f6ce6d526456dbd9ae3a8efc85768bf04
2020-09-03 11:59:31 -07:00
Arun Kulshreshtha
3ad7fa8b6f gotham_ext: allow applications to dynamically configure PostRequestMiddleware
Summary: Make `PostRequestMiddleware` generic over a user-provided config struct which can be used to dynamically configure the behavior of post-request callback dispatching. Right now this is only used to support disabling hostname logging, but could be easily extended to cover more uses in the future.

Reviewed By: krallin

Differential Revision: D23495005

fbshipit-source-id: 3d59a8346f449775ec76d03c260d973d04fb90a9
2020-09-03 11:59:31 -07:00
Arun Kulshreshtha
cc0f2e4c40 gotham_ext: add PostRequestMiddleware
Summary: Add new middleware that allows HTTP handlers and other middleware to register callbacks that will be run once the current request completes. This is heavily based on the post-request callback functionality from the LFS server's `RequestContext`. The intention here is to expose this functionality in a manner that's independent of other, application-specific logic.
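
To illustrate the callback registration described above, here is a small Python sketch. `PostRequestCallbacks` and `handle_request` are made-up names, and the real middleware is Rust code in `gotham_ext`, so treat this as an analogy rather than its API.

```python
from typing import Callable, List


class PostRequestCallbacks:
    """Collects callbacks to run once the current request has completed."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[int], None]] = []

    def add(self, callback: Callable[[int], None]) -> None:
        self._callbacks.append(callback)

    def run(self, status: int) -> None:
        # Swap the list out first so each callback runs at most once.
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(status)


def handle_request(handler: Callable[[PostRequestCallbacks], int]) -> int:
    callbacks = PostRequestCallbacks()
    status = handler(callbacks)  # handler may register callbacks
    callbacks.run(status)        # dispatched only after the handler returns
    return status


log: List[str] = []


def handler(callbacks: PostRequestCallbacks) -> int:
    callbacks.add(lambda status: log.append(f"status={status}"))
    log.append("handling")
    return 200


handle_request(handler)
print(log)
```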

Reviewed By: krallin

Differential Revision: D23298419

fbshipit-source-id: e4b1534b02c35f685ce544de13e331947e187818
2020-09-03 11:59:31 -07:00
Thomas Orozco
d77cf89ead mononoke/admin: clean up unodes subcommand a bit
Summary:
I pattern matched off of this for the previous diff in this stack, and spotted
a bit of clean up that might make sense here:

- Using `.help()` for a subcommand overrides the whole help text. We meant to
  use `.about()` here. I fixed this in some copy-pasted code as well.
- Printing debug output alongside real output makes it harder to select the
  real output. I fixed this by logging debug output to stderr instead.

Reviewed By: StanislavGlebik

Differential Revision: D23471560

fbshipit-source-id: 7900cfe65613c48abd77faad6d6a45a7aa523b36
2020-09-03 09:32:06 -07:00
Thomas Orozco
179e4eb80e mononoke/admin: add a subcommand for dumping paths
Summary:
This adds a subcommand for dumping all the paths in a repository. This is
helpful when you have a Content ID, limited imagination and time on your hands,
and you'd like to turn those into a file path where that Content ID lives.

This uses fsnodes for the traversal because that's O(# directories) as opposed
to O(# files). I had an earlier implementation that used unodes, but that was
really slow.

Reviewed By: StanislavGlebik

Differential Revision: D23471561

fbshipit-source-id: 948bfd20939adf4de0fb1e4b2852ad4d12182f16
2020-09-03 09:32:06 -07:00
Viet Hung Nguyen
7c34b39ec8 mononoke/repo_import: add backsyncing to rewrite file paths, remove backup file
Summary:
add backsyncing to rewrite file paths:
After setting the variables for large repo (D23294833 (d6895d837d)), we try to import the git commits into large repo and rewrite the file paths.
Following this, repo import tool should back-sync the commits into small_repo.

next step: derive all the data types for both small and large repos. Currently, we only derive it for the large repo.

==============
remove backup file:
The backup file was a last-minute addition when trying to import a repo for the first time.
Removed it, because we shouldn't write to external files. Future plan is to include
better process recoverability across the whole tool and not just rewrite file paths functionality.

Reviewed By: StanislavGlebik

Differential Revision: D23452571

fbshipit-source-id: bda39694fa34788218be795319dbbfd014ba85ff
2020-09-03 06:43:08 -07:00
Stanislau Hlebik
a77d9f243a mononoke: parallelize operations in create_commit scs method
Reviewed By: krallin

Differential Revision: D23496535

fbshipit-source-id: 18f88abb9b85d38a93d2aa99c38edcf8190343c3
2020-09-03 04:12:35 -07:00
Lukas Piatkowski
a4af730541 mononoke/hooks: make no_bad_filenames public
Reviewed By: aslpavel

Differential Revision: D23474524

fbshipit-source-id: 5f7826346500b1acc7450791dd1e7806c4e623d6
2020-09-03 02:40:43 -07:00
Lukas Piatkowski
81d9338100 mononoke/hooks: make few generic hooks public
Summary: More hooks will come in next diffs.

Reviewed By: aslpavel

Differential Revision: D23449755

fbshipit-source-id: 451fdb7a759140f2d6df8f3a18493c700fa2b761
2020-09-03 02:40:43 -07:00
Stanislau Hlebik
29bbc0dc15 mononoke: check if content we are about to redact is not reachable
Summary:
That's one of the sev followups. Before redacting a file's content, let's check if
it exists in "main-bookmark" (which is by default master), and refuse to redact
if it actually exists.

If this check passes (i.e. the content we are about to redact is not reachable
from master) that doesn't mean that we are 100% safe. E.g. this content can be
in an ancestor of master, or in any other repo, or it can be added in the next
commit.

This check is a best-effort check to prevent shooting ourselves in the foot.

Reviewed By: aslpavel

Differential Revision: D23476278

fbshipit-source-id: 5a4cd10964a65b8503ba9a6391f17319f0ce37d8
2020-09-03 01:30:14 -07:00
Wez Furlong
950c81858c eden: fix buffer advance in FileDescriptor::wrapFull
Summary:
The loop took care to advance `b` to match the amount
of data that it had processed, but was still passing `buf`
(the unadjusted start of the buffer) to the syscalls.

This meant that in situations where a `readFull` might
encounter a partial read, it would scribble over the start
of the buffer and leave junk at the end.

For example:

write("hell");
write("o");

could produce "oell?" in the buffer when `readFull` consumes
the other end of the pipe.
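
The same loop can be sketched in Python to show the corrected behavior. This is an illustration only: `read_full` and `chunked_source` are hypothetical stand-ins for the C++ `FileDescriptor` code and the pipe, and the fake source assumes each chunk fits in the remaining buffer.

```python
from typing import Callable, Iterable


def read_full(read_into: Callable[[memoryview], int], buf: bytearray) -> int:
    """Fill `buf` by calling `read_into` repeatedly; return bytes read."""
    view = memoryview(buf)
    done = 0
    while done < len(buf):
        # Pass the advanced slice. Passing `view` itself would be the bug
        # described above: every partial read would land at offset 0.
        n = read_into(view[done:])
        if n == 0:  # end of stream
            break
        done += n
    return done


def chunked_source(chunks: Iterable[bytes]) -> Callable[[memoryview], int]:
    """Fake reader that hands back one chunk per call (partial reads)."""
    pending = list(chunks)

    def read_into(dest: memoryview) -> int:
        if not pending:
            return 0
        chunk = pending.pop(0)
        dest[: len(chunk)] = chunk
        return len(chunk)

    return read_into


buf = bytearray(5)
count = read_full(chunked_source([b"hell", b"o"]), buf)
print(count, bytes(buf))
```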

Reviewed By: xavierd

Differential Revision: D23486270

fbshipit-source-id: 0848f6789b44421b609b91fe08890768ff59f7f5
2020-09-02 23:38:18 -07:00
Katie Mancini
76df592222 allow multiple prefixes for paths to be logged
Summary:
Currently we use a single path prefix to configure data fetch logging in eden
(i.e. if the path of a file which we fetch is an extension of our configured
path, then we log that data fetch).

There is some interest in extending this to multiple path prefixes, so that we
can log separate parts of the repo.
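
A sketch of what matching against several prefixes could look like. `should_log_fetch` is a hypothetical helper, and comparing whole path components (so `fbcode/eden` does not match `fbcode/edenfs`) is an assumption about the intended semantics, not something this diff states.

```python
from pathlib import PurePosixPath
from typing import Iterable


def should_log_fetch(path: str, prefixes: Iterable[str]) -> bool:
    """Return True if `path` lies under any of the configured prefixes."""
    parts = PurePosixPath(path).parts
    for prefix in prefixes:
        prefix_parts = PurePosixPath(prefix).parts
        # Compare component by component rather than by raw string prefix.
        if parts[: len(prefix_parts)] == prefix_parts:
            return True
    return False


configured = ["fbcode/eden", "xplat"]
print(should_log_fetch("fbcode/eden/fs/store/x.cpp", configured))
print(should_log_fetch("fbcode/edenfs/x.cpp", configured))
```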

Reviewed By: StanislavGlebik

Differential Revision: D22877942

fbshipit-source-id: f6eb3dcb4fa460b4acab09677e972caf9421ddff
2020-09-02 22:54:23 -07:00
Genevieve Helsel
eb97eeedd4 move is_system_idle to UnixProcUtils
Summary: This code can be used on Mac as well, so I can just move it to `UnixProcUtils` to be shared. To start with, I think we can just try using this before adding special idleness detection that looks for active screensavers etc.

Reviewed By: fanzeyi

Differential Revision: D23183163

fbshipit-source-id: fffad8314e70f8726836c482f7a5e30e57a75c0d
2020-09-02 20:19:49 -07:00
Genevieve Helsel
4cc9a3b6de skip restarting if the installed version is the same as the running version
Summary: We don't need to restart users if their running version is the same as their installed version, so we should check that when deciding if we should restart. This will give us more freedom in restarts since we won't have to play with `min_uptime`. I will add a flag to skip this check in case for some reason we need to do so on the fly.

Reviewed By: wez

Differential Revision: D23438306

fbshipit-source-id: b17c0e13789071b8b7c1b15ac5a8deb74a4fd091
2020-09-02 20:19:48 -07:00
Genevieve Helsel
cca2991b75 add ability to construct EdenInstance from cmdline args
Summary: I want to be able to reverse engineer an EdenInstance in the `edenfs_restarter` given the cmdline of the process. I think this best lives in the `config.py` file.

Reviewed By: fanzeyi

Differential Revision: D23438318

fbshipit-source-id: b3d9ac3981d3fb2bb8045b07b8d949cd601f6898
2020-09-02 20:19:48 -07:00
Xavier Deguillard
34821976e0 utils: use makeWin32ErrorExplicit instead of system_error
Summary: The latter will not strip new lines from the system error message, while the former does.

Reviewed By: genevievehelsel

Differential Revision: D23480435

fbshipit-source-id: 44742b960935552fa1781ed19f38ff446a8c9403
2020-09-02 19:47:00 -07:00
Jun Wu
99511f8743 dag: benchmark dag_ops on different IdDagStores
Summary:
Change dag_ops benchmarks to use different IdDagStores. An example run shows:

  benchmarking dag::iddagstore::indexedlog_store::IndexedLogStore
  building segments (old)                           856.803 ms
  building segments (new)                           127.831 ms
  ancestors                                          54.288 ms
  children (spans)                                  619.966 ms
  children (1 id)                                    12.596 ms
  common_ancestors (spans)                            3.050 s
  descendants (small subset)                         35.652 ms
  gca_one (2 ids)                                   164.296 ms
  gca_one (spans)                                     3.132 s
  gca_all (2 ids)                                   270.542 ms
  gca_all (spans)                                     2.817 s
  heads                                             247.504 ms
  heads_ancestors                                    40.106 ms
  is_ancestor                                       108.719 ms
  parents                                           243.317 ms
  parent_ids                                         10.752 ms
  range (2 ids)                                       7.370 ms
  range (spans)                                      23.933 ms
  roots                                             620.150 ms

  benchmarking dag::iddagstore::in_process_store::InProcessStore
  building segments (old)                           790.429 ms
  building segments (new)                            55.007 ms
  ancestors                                           8.618 ms
  children (spans)                                  196.562 ms
  children (1 id)                                     2.488 ms
  common_ancestors (spans)                          545.344 ms
  descendants (small subset)                          8.093 ms
  gca_one (2 ids)                                    24.569 ms
  gca_one (spans)                                   529.080 ms
  gca_all (2 ids)                                    38.462 ms
  gca_all (spans)                                   540.486 ms
  heads                                             103.930 ms
  heads_ancestors                                     6.763 ms
  is_ancestor                                        16.208 ms
  parents                                           103.889 ms
  parent_ids                                          0.822 ms
  range (2 ids)                                       1.748 ms
  range (spans)                                       6.157 ms
  roots                                             197.924 ms

  benchmarking dag::iddagstore::bytes_store::BytesStore
  building segments (old)                           724.467 ms
  building segments (new)                            90.207 ms
  ancestors                                          23.812 ms
  children (spans)                                  348.237 ms
  children (1 id)                                     4.609 ms
  common_ancestors (spans)                            1.315 s
  descendants (small subset)                         20.819 ms
  gca_one (2 ids)                                    72.423 ms
  gca_one (spans)                                     1.346 s
  gca_all (2 ids)                                   116.025 ms
  gca_all (spans)                                     1.470 s
  heads                                             155.667 ms
  heads_ancestors                                    19.486 ms
  is_ancestor                                        51.529 ms
  parents                                           157.285 ms
  parent_ids                                          5.427 ms
  range (2 ids)                                       4.448 ms
  range (spans)                                      13.874 ms
  roots                                             365.568 ms

Overall, InProcessStore > BytesStore > IndexedLogStore. The InProcessStore
uses `Vec<BTreeMap<Id, StoreId>>` for the level-head index, which is more
efficient on the "Level" lookup (Vec), and more cache efficient (BTree).
BytesStore outperforms IndexedLogStore because it does not need to verify
checksum on every read access - the checksum was verified at store creation
(IdDag::from_bytes).

Note: The `BytesStore` is something optimized for serialization, and hasn't been sent.

Reviewed By: sfilipco

Differential Revision: D23438174

fbshipit-source-id: 6e5f15188e3b935659ccde25fac573e9b963b78f
2020-09-02 18:54:12 -07:00
Jun Wu
84ad7a5351 dag: implement GetLock for all IdDagStores
Summary: This allows them to use the SyncableIdDag APIs.

Reviewed By: sfilipco

Differential Revision: D23438170

fbshipit-source-id: 7ec7288cfb8186b88f85f0212a913cb0dffe7345
2020-09-02 18:54:12 -07:00
Jun Wu
cfff0e9144 dag: make IdDag::prepare_filesystem_sync generic
Summary: Other IdDagStores can also use the API. This will be used in benchmarks.

Reviewed By: sfilipco

Differential Revision: D23438180

fbshipit-source-id: 565552b66372dcfbb268c397883f627491d6e154
2020-09-02 18:54:12 -07:00
Jun Wu
8874e07f9b dag: IdDagStore::reload -> GetLock::reload
Summary:
Similar to `IdDagStore::sync` -> `GetLock::persist`, `reload` is more related
to filesystem/internal state exchange, and should be protected by a lock.  So
let's move the API there, and requires a lock.

Reviewed By: sfilipco

Differential Revision: D23438169

fbshipit-source-id: 4228106b7739a1a758677adfddd213ad54aa4b6a
2020-09-02 18:54:12 -07:00
Jun Wu
d633576880 dag: remove NameDag::reload
Summary:
`NameDag::reload` is used in `flush` to get a "fresh" NameDag.
In a future diff the `IdDag::reload` API gets changed, so let's
remove NameDag's use of it.

Instead, let's just re-`open` the path again to get a fresh NameDag.
It's a bit more expensive but probably okay, and easier to understand.
`get_new_segment_size()` was added as an internal API to preserve tests.

This also solves an issue where `NameDag` cannot recover properly if its
`flush` fails, because the old `NameDag` state is not lost.

After removing `NameDag::reload`, `IdMap::reload` is no longer used publicly
and was made private.

Reviewed By: sfilipco

Differential Revision: D23438179

fbshipit-source-id: 0a32556a2cd786919c233d7efcae1cb9cbc5fb09
2020-09-02 18:54:11 -07:00
Jun Wu
8e16e4260f dag: IdDagStore::sync -> GetLock::persist
Summary:
The word "sync" is bi-directional: flush + reload. It was indexedlog::Log's
behavior. However, in the IdDag context "sync" is confusing - it is actually
only used to write data out, with protection from a lock. Rename to `persist`
to clarify it's memory -> disk. Besides, it now requires a reference to a lock
object as lightweight proof that some lock is held.

Reviewed By: sfilipco

Differential Revision: D23438175

fbshipit-source-id: 3d9ccd7431691d1c4e2ee74f3c80d95f5e7243b5
2020-09-02 18:54:11 -07:00
Jun Wu
3ad58ff945 dag: make SyncableIdMap use &mut IdMap instead of IdMap
Summary:
This removes the need of cloning `IdMap`.

SyncableIdMap is a bit tricky. I added some comments to clarify things.

Reviewed By: sfilipco

Differential Revision: D23438176

fbshipit-source-id: fe66071da07067ed6c53a6437790af1d81b28586
2020-09-02 18:54:11 -07:00
Jun Wu
23f9bec22b dag: move IdDagStore impls to separate files
Summary: This makes `iddagstore.rs` cleaner.

Reviewed By: sfilipco

Differential Revision: D23438177

fbshipit-source-id: 465cec2231a084a36b20da8e413cb9272f64a00a
2020-09-02 18:54:10 -07:00
Jun Wu
4e9200db44 dag: test IndexedLogIdDagStore
Summary:
Make the test cover IndexedLogIdDagStore. The only change is the parent index
returns children in a different order.

Reviewed By: sfilipco

Differential Revision: D23438173

fbshipit-source-id: bcfabcd329e45bbc5e7e773103fa42307c23c35d
2020-09-02 18:54:10 -07:00
Victor Zverovich
a2524040e0 Migrate to field_ref Thrift API
Summary:
We are unifying C++ APIs for accessing optional and unqualified fields:
https://fb.workplace.com/groups/1730279463893632/permalink/2541675446087359/.

This diff migrates code from accessing data members generated from unqualified
Thrift fields directly to the `field_ref` API, i.e. replacing

```
thrift_obj.field
```

with

```
*thrift_obj.field_ref()
```

The `_ref` suffixes will be removed in the future once data members are private
and names can be reclaimed.

The output of this codemod has been reviewed in D20039637.

The new API is documented in
https://our.intern.facebook.com/intern/wiki/Thrift/FieldAccess/.

drop-conflicts

Reviewed By: simpkins

Differential Revision: D23465292

fbshipit-source-id: bb9df3ad183685fae28173da7275e6ecd34df048
2020-09-02 18:05:47 -07:00
Stefan Filip
da4c33c67a tests: add commit-location-to-hash integration test
Summary: Exercise location-to-hash functionality in edenapi.

Reviewed By: kulshrax

Differential Revision: D23456214

fbshipit-source-id: 2ab22eb045517a5927c2de502d8cfc9898daecef
2020-09-02 17:20:43 -07:00
Stefan Filip
1ddf5aaa0e tools: add location-to-hash command to read_res
Summary:
There aren't too many things that we can do with the responses that we get back
from the server. Things are somewhat application specific for this endpoint.
One option that is not available right now and might make sense to add is
limiting the number of entries that are printed for a given location.

Reviewed By: kulshrax

Differential Revision: D23456220

fbshipit-source-id: eb24602c3dea39b568859b82fc27b7f6acc77600
2020-09-02 17:20:43 -07:00
Stefan Filip
932450fb15 handlers: update location-to-hash endpoint with count parameter
Summary:
To reduce the size over the wire on cases where we would be traversing the
changelog on the client, we want to allow the endpoint to return a whole parent
chain with their hashes.

Reviewed By: kulshrax

Differential Revision: D23456216

fbshipit-source-id: d048462fa8415d0466dd8e814144347df7a3452a
2020-09-02 17:20:42 -07:00
Stefan Filip
7122cdded7 types: rename Location to CommitLocation
Summary:
Renaming all the LocationToHash related structures to CommitLocationToHash.
This is done for consistency. I realized the issue when the command for reading
the request from cbor was not what I was expecting it to be. The reason was that
the commit prefix was used inconsistently for LocationToHash.

Reviewed By: kulshrax

Differential Revision: D23456221

fbshipit-source-id: 0181dcaf81368b978902d8ca79c5405838e4b184
2020-09-02 17:20:42 -07:00
Stefan Filip
310b3616a6 blobrepo: instantiate segmented changelog as an attribute
Summary:
Segmented Changelog is a component that has multiple components of its own
that each can be configured in different ways. It seems that it already is
more complicated than other components in how it is set up and it will probably
evolve to have more knobs (caching comes to mind).

Right now we have 3 ways of instantiating SegmentedChangelog:
- Disabled, all requests return errors
- ReadOnly, requests to unprocessed commits return errors
- OnDemandUpdate, requests trigger commit processing when required
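
The three modes listed above can be sketched as follows. `Mode` and `ToySegmentedChangelog` are hypothetical Python stand-ins; the real implementations are Rust types, and adding a commit to a set merely stands in for actual commit processing.

```python
from enum import Enum
from typing import Iterable


class Mode(Enum):
    DISABLED = 1
    READ_ONLY = 2
    ON_DEMAND_UPDATE = 3


class ToySegmentedChangelog:
    def __init__(self, mode: Mode, processed: Iterable[str] = ()) -> None:
        self.mode = mode
        self.processed = set(processed)

    def handle(self, commit: str) -> str:
        if self.mode is Mode.DISABLED:
            raise RuntimeError("segmented changelog is disabled")
        if commit not in self.processed:
            if self.mode is Mode.READ_ONLY:
                raise LookupError(f"{commit} has not been processed")
            self.processed.add(commit)  # stand-in for on-demand processing
        return f"ok:{commit}"


on_demand = ToySegmentedChangelog(Mode.ON_DEMAND_UPDATE, processed={"a"})
print(on_demand.handle("b"))
```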

Reviewed By: aslpavel

Differential Revision: D23456217

fbshipit-source-id: a6016f05197abbc3722764fa8e9056190a767b36
2020-09-02 17:20:42 -07:00
Stefan Filip
b818a86631 config: add segmented changelog config parsing
Summary:
Parsing is done in the SegmentedChangelogConfig structure which will inform
how to construct the SegmentedChangelog in Mononoke.

Reviewed By: aslpavel

Differential Revision: D23456222

fbshipit-source-id: a7d5d81f4c166909164026e81af57f1c2ea32347
2020-09-02 17:20:42 -07:00
Stefan Filip
e57b1f9265 segmented_changelog: add on-demand updating dag implementation
Summary:
The Segmented Changelog must be built somewhere. One of the simplest deployments
involves the on-demand update of the graph. When a commit that wasn't yet
processed is encountered, we send it for processing along with all of its
ancestors.
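
As a rough illustration of processing a commit together with its unprocessed ancestors, the sketch below assigns ids parents-first. `process_on_demand`, the `parents` mapping and the dict-based `id_map` are assumptions made for the example and do not mirror the Rust implementation.

```python
from typing import Dict, List


def process_on_demand(
    commit: str, parents: Dict[str, List[str]], id_map: Dict[str, int]
) -> int:
    """Assign ids to `commit` and any unprocessed ancestors, parents first."""
    stack = [commit]
    while stack:
        node = stack[-1]
        if node in id_map:
            stack.pop()
            continue
        missing = [p for p in parents[node] if p not in id_map]
        if missing:
            stack.extend(missing)  # process ancestors before this node
        else:
            id_map[node] = len(id_map)
            stack.pop()
    return id_map[commit]


parents = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"], "m": ["c", "x"]}
id_map: Dict[str, int] = {}
first = process_on_demand("c", parents, id_map)   # also processes a and b
second = process_on_demand("m", parents, id_map)  # only x and m are new
print(first, second, id_map)
```

An explicit stack is used instead of recursion so that long histories do not hit Python's recursion limit.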

At this time not much attention was paid to the distinction of master commit
versus non-master commit. For now the expectation is that only commits from
master will exercise this code path. The current expectation is that clients
will only call location-to-hash using commits from master.
Let me know if there is an easy way to check if a commit is part of master.
Later changes will invest more in handling non-master commits.

Reviewed By: aslpavel

Differential Revision: D23456218

fbshipit-source-id: 28c70f589cdd13d08b83928c1968372b758c81ad
2020-09-02 17:20:42 -07:00
Stefan Filip
d50e09a41d segmented_changelog: add SegmentedChangelogBuilder
Summary:
This builder implements SqlConstruct and SqlConstructFromMetadataDatabaseConfig
to make handling the Sql connection for IdMap consistent with what happens in
Mononoke in general.

Reviewed By: aslpavel

Differential Revision: D23456219

fbshipit-source-id: 6998afbbfaf1e0690a40be6e706aca1a3b47829f
2020-09-02 17:20:42 -07:00
Stefan Filip
66706d77c5 segmented_changelog: add SegmentedChangelog trait
Summary:
The trait provides two methods for location to hash translation. The first
returns a single hash and is existing functionality. The second returns a
list of hashes and represents new functionality. This diff also adds this
functionality to the Dag structure which is currently the only real
implementation for SegmentedChangelog.
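
The two translation methods might be pictured like this. It is a sketch under assumptions: a location is taken to be a known descendant plus a distance along first parents, and `FIRST_PARENT`, `location_to_hash` and `location_to_many_hashes` are illustrative names in Python rather than the trait's actual Rust signatures.

```python
from typing import Dict, List, Optional

# Hypothetical linear history: child hash -> first-parent hash.
FIRST_PARENT: Dict[str, Optional[str]] = {"d": "c", "c": "b", "b": "a", "a": None}


def location_to_many_hashes(descendant: str, distance: int, count: int) -> List[str]:
    """Walk `distance` first-parent steps from `descendant`, then collect up
    to `count` hashes while continuing along the parent chain."""
    current: Optional[str] = descendant
    for _ in range(distance):
        if current is None:
            raise LookupError("distance runs past the root")
        current = FIRST_PARENT[current]
    hashes: List[str] = []
    while current is not None and len(hashes) < count:
        hashes.append(current)
        current = FIRST_PARENT[current]
    return hashes


def location_to_hash(descendant: str, distance: int) -> str:
    """Single-hash variant expressed through the list-returning one."""
    hashes = location_to_many_hashes(descendant, distance, 1)
    if not hashes:
        raise LookupError("location does not resolve to a commit")
    return hashes[0]


print(location_to_hash("d", 1))
print(location_to_many_hashes("d", 1, 2))
```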

Reviewed By: aslpavel

Differential Revision: D23456215

fbshipit-source-id: 0c2ca91672cf23129342c585f98446c0ebbdf7ef
2020-09-02 17:20:41 -07:00
Stefan Filip
10b233f180 blobrepo: move ChangesetFetcher to attributes
Summary:
I am planning to add Segmented Changelog to attributes.

I am writing an integration test for an EdenApi endpoint that depends on
Segmented Changelog and I would like to set it up to update on demand. When a
request comes in for a commit that we haven't parsed for Segmented Changelog we
want to update the structure on demand. This means that we probably need to
fetch commits. This means that we want to pass the ChangesetFetcher to Segmented
Changelog when it is built. Since Segmented Changelog fits well as an attribute
we want the ChangesetFetcher as an attribute.

I wonder how much thought has been given to attributes behaving as a dependency
injector in the `guice` sense.

Reviewed By: aslpavel

Differential Revision: D23428201

fbshipit-source-id: 7003c018ba806fd657dd8f071e0e83d35058b10f
2020-09-02 17:20:41 -07:00
Xavier Deguillard
37df55b270 telemetry: consolidate Fuse/PrjFS stats in ChannelThreadStats
Summary:
This helps make RequestData slightly more generic by depending less on Fuse
specific types/code.

Reviewed By: chadaustin

Differential Revision: D23467487

fbshipit-source-id: 830f8269e2114c2968dcc49d3b5574e687191e4d
2020-09-02 15:28:39 -07:00
Kostia Balytskyi
6e8cbd31b1 megarepotool: add gradual-merge-progress subcommand
Summary:
This is to be able to automatically report progress: how many merges have been
done already.

Note: this intentionally uses the same logic as regular `gradual-merge`, so that we always report correct numbers.

Reviewed By: StanislavGlebik

Differential Revision: D23478448

fbshipit-source-id: 3deb081ab99ad34dbdac1057682096b8faebca41
2020-09-02 12:18:31 -07:00
Xavier Deguillard
18642dbd1f fuse: remove Dispatcher from RequestData
Summary:
This is unused and is sadly Fuse specific, making it harder to move to inodes/,
thus, let's remove it.

Reviewed By: chadaustin

Differential Revision: D23467486

fbshipit-source-id: c34e1abe245cfb79b9414002fd1055b8c0a5f1d3
2020-09-02 12:15:48 -07:00
Xavier Deguillard
ce44616cb3 fuse: cleanup some include of RequestData
Summary:
None of these were used, let's remove them.

ps: I thought we had a system to detect unused headers and lint about them?

Reviewed By: chadaustin

Differential Revision: D23465783

fbshipit-source-id: c21a34c9838db29f4fd0057d3be4e0fcb527cd6d
2020-09-02 12:15:48 -07:00
Xavier Deguillard
7b2c803904 fuse: move BufVec.h to utils/
Summary:
This is not per-se fuse related, thus move it to a common location and remove
the duplicated define in FileInode.h

Reviewed By: chadaustin

Differential Revision: D23465192

fbshipit-source-id: 5fa7709f127c2d3372ee5ea3aeb89e793ea5b9f7
2020-09-02 12:15:48 -07:00
Xavier Deguillard
07fbcd50e4 inodes: move fuse/InodeNumber.{cpp,h} into inodes/
Summary: This file is not fuse specific, therefore, let's move it to a non-fuse folder.

Reviewed By: chadaustin

Differential Revision: D23464460

fbshipit-source-id: f70e94bb0ecc37bd74798fd230dee2058484f31b
2020-09-02 12:15:48 -07:00
Xavier Deguillard
340866508b store: rename the Fuse fetch cause
Summary:
In the code base, "channel" is used to denote the OS mechanism that sends
EdenFS notifications. On macOS and Linux, that's Fuse; on Windows, that's
ProjectedFS. To avoid platform-specific naming in ObjectFetchContext, let's
rename the fetch cause enum.

Reviewed By: kmancini

Differential Revision: D23462460

fbshipit-source-id: 3ac68cdf4999e6a3b4ff4ee266f94e1f9736df39
2020-09-02 12:15:48 -07:00
Durham Goode
537d5858bd archive: block full archives in large repositories
Summary:
The default archive behavior archives the entire working copy. That is
undesirable and easy to accidentally trigger in a large repository. Let's
prevent it and require users to specify what they want archived.

Reviewed By: quark-zju

Differential Revision: D23464818

fbshipit-source-id: c39a631d618c2007e442e691cda542400cf8f4c3
2020-09-02 11:38:08 -07:00
Thomas Orozco
b8e197fdb4 mononoke/lfs_server: allow enabling rate limits probabilistically
Summary:
If we exceed a rate limit, we probably don't want to just drop 100% of traffic.
This would create a sawtooth pattern where we allow a bunch of traffic, update
our counters, drop a bunch of traffic, update our counters again, allow a bunch
of traffic, etc.

To fix this, let's make limits probabilistic. This lets us say "beyond X GB/s,
drop Y% of traffic", which is closer to a sane rate limit.
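
The "beyond X, drop Y%" rule can be sketched in a few lines of Python. `should_reject` and its parameters are hypothetical names; the LFS server itself is written in Rust and reads its limits from configuration.

```python
import random


def should_reject(
    observed: float, limit: float, drop_probability: float, rng: random.Random
) -> bool:
    """Reject with probability `drop_probability` once `observed` exceeds `limit`."""
    if observed <= limit:
        return False
    return rng.random() < drop_probability


rng = random.Random(0)  # seeded only to make the demo repeatable
under = sum(should_reject(5.0, 10.0, 0.3, rng) for _ in range(10_000))
over = sum(should_reject(12.0, 10.0, 0.3, rng) for _ in range(10_000))
print(under, over)  # nothing is dropped under the limit, roughly 30% above it
```

Dropping only a fraction of requests means traffic shrinks gradually once the limit is crossed, instead of flipping between all and nothing.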

It might also make sense to eventually change this to use ratelim. Initially,
we didn't do this because we needed our rate limiting decisions to be local to
a single host (because different hosts served different traffic), but now that
we spread the load for popular blobs across the whole tier, we should be able
to just delegate to ratelim.

For now, however, let's finish this bit of functionality so we can turn it
on.

The corresponding Configerator change is here: D23472683

Reviewed By: aslpavel

Differential Revision: D23472945

fbshipit-source-id: f7d985fded3cdbbcea3bc8cef405224ff5426a25
2020-09-02 11:02:18 -07:00
Xavier Deguillard
32609a44ac cli: increase start timeout to 60s
Summary:
I've seen cases where `edenfsctl start` would time out after 10s, but EdenFS
ends up starting shortly after. Let's increase the timeout a bit.

Reviewed By: singhsrb

Differential Revision: D23475926

fbshipit-source-id: 3ef495aae7a03b064cb7cdf72241a7f60a8c4b77
2020-09-02 10:47:26 -07:00
Wez Furlong
7efa1b5745 eden: add CaseSensitive template param to PathMap
Summary:
This commit adds a compile time option to select
between case-sensitive and case-insensitive-but-case-preserving
mode for `PathMap`.

This replaces the `ifdef _WIN32` preprocessor conditional
that was inline in a couple of the methods and allows
explicitly testing the behavior in both modes of operation.

The unit tests have been expanded and rounded out to catch
some inconsistent behavior; insertion wasn't respecting
case insensitivity in all ... cases.
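
For illustration only, the two modes can be sketched in Python as follows. `PathMapSketch` is a hypothetical dictionary-backed stand-in; the real `PathMap` is a C++ template whose storage and API are not shown in this commit message:

```python
class PathMapSketch:
    """Hypothetical map that is either case-sensitive or
    case-insensitive-but-case-preserving, chosen at construction."""

    def __init__(self, case_sensitive=True):
        self._case_sensitive = case_sensitive
        self._entries = {}  # lookup key -> (name as first inserted, value)

    def _key(self, name):
        # In insensitive mode, names that differ only by case share one key.
        return name if self._case_sensitive else name.casefold()

    def insert(self, name, value):
        """Insert unless an equivalent name exists; return True if inserted."""
        key = self._key(name)
        if key in self._entries:
            return False
        self._entries[key] = (name, value)
        return True

    def find(self, name):
        """Return (stored name, value), or None if absent."""
        return self._entries.get(self._key(name))
```

Making the mode a constructor argument (a template parameter in the C++ code) is what lets both behaviors be tested on any platform.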

Hopefully we are not relying on that behavior in the windows
flavor of the build; let's see what our CI says.

Reviewed By: genevievehelsel

Differential Revision: D23232629

fbshipit-source-id: 96e752e501d0398ec2bed5879f7c11c7ab6e1d70
2020-09-02 10:19:14 -07:00
Wez Furlong
533af0f60b eden: fixup debug level for SpawnedProcess
Summary:
I switched these from ERR to DBG2 thinking that would
take them out of the log by default, but we log DBG2 by default
so that didn't have the desired result.

This changes the log level to DBG6 which is an arbitrary log
level that isn't included by default.

Reviewed By: genevievehelsel

Differential Revision: D23467738

fbshipit-source-id: dd0c75f86318ece27313c237938f24f55758eec1
2020-09-02 10:15:03 -07:00
Stefan Filip
c2079c3464 revisionstore: use async-runtime crate for lfs
Summary:
Replacing uses of the custom Runtime in lfs with the global runtime in the
`async-runtime` crate.

Reviewed By: xavierd

Differential Revision: D23468347

fbshipit-source-id: 61d2858634a37eb2d7d807104702d24889ec047a
2020-09-02 10:01:08 -07:00
Stanislau Hlebik
cdf96a20dd mononoke: asyncify redaction_add
Summary: Will change it in the next diff, so let's asyncify it now.

Reviewed By: aslpavel

Differential Revision: D23475332

fbshipit-source-id: f25fb7dc16f99cb140df9374f435e071401c2b90
2020-09-02 09:28:48 -07:00
Alex Hornby
b22599c500 mononoke: memo the hash values of interned paths in the walker
Summary: Memo the hash values of interned paths in the walker. The interner calls the hash function inside a lock that gets heavily contended, so this reduces the time the lock is held.
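
A rough sketch of the idea in Python, assuming hypothetical `HashedPath` and `Interner` classes (the walker itself is Rust and its types are not shown here): the hash is computed once before the lock is taken, so the critical section only does the table lookup.

```python
import threading


class HashedPath:
    """Hypothetical wrapper that stores its hash at construction time."""

    __slots__ = ("path", "_hash")

    def __init__(self, path):
        self.path = path
        self._hash = hash(path)  # computed once, outside any lock

    def __hash__(self):
        return self._hash  # no rehashing during table lookups

    def __eq__(self, other):
        return isinstance(other, HashedPath) and self.path == other.path


class Interner:
    """Hypothetical interner guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._table = {}

    def intern(self, path):
        candidate = HashedPath(path)  # hashing happens here, before locking
        with self._lock:
            return self._table.setdefault(candidate, candidate)
```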

Reviewed By: farnz

Differential Revision: D23075260

fbshipit-source-id: 3ee50e3ce56106eadd17dc7d737ba95282640051
2020-09-02 05:52:33 -07:00
Alex Hornby
46cc110012 mononoke: switch walker from arc-intern to internment
Summary: Switch the walker from arc-intern::ArcIntern to internment::ArcIntern as internment does not need to acquire its map's locks on every drop.

Reviewed By: farnz

Differential Revision: D23075265

fbshipit-source-id: 6dd241aed850ec0fd3c8a4e68dda06053ec0b424
2020-09-02 05:52:33 -07:00
Kostia Balytskyi
d49406d847 repo_client: get rid of unneeded perf counters
Summary:
These two perf counters proved to be not very convenient to evaluate the
volume of undesired file fetches. Let's get rid of them. Specifically, they are
not convenient, because they accumulate values and it's hard to aggregate over
them.

Note that I don't do the same for tree fetches, as there's no better way of
estimating those now.

Reviewed By: mitrandir77

Differential Revision: D23452913

fbshipit-source-id: 08f8dd25eece495f986dc912a302ab3109662478
2020-09-02 05:02:46 -07:00
Thomas Orozco
de260c7e9d py3: fix debugstacktrace
Summary:
debugstacktrace is broken right now on Python 3: it wants to write to stderr,
which expects `bytes`, but it tries to write a `str`. This fixes it.

Reviewed By: DurhamG

Differential Revision: D23447984

fbshipit-source-id: 5896ae858f6022276fa47e08636c700159a2a678
2020-09-02 00:53:28 -07:00
Jun Wu
a0223bc7e7 dag: make iddagstore test generic
Summary: Make it possible to test other IdDagStores.

Reviewed By: sfilipco

Differential Revision: D23438178

fbshipit-source-id: e5fc1b20833c71dd7569c77c31c76a26a6e357fe
2020-09-01 23:58:04 -07:00
Mark Mendoza
53e1072e7d Deleting failing eden systemd_fixture_test
Summary:
We are trying to get eden running in our atypical EDA corp environment.
When testing it out on one of these machines, we got things sorted out to the point where the only test failures were coming from this file.
chadaustin identified this as being a test of dead code, and so we decided to go for a deletion of it.
If this work resumes, these tests can be retrieved from version control and then be made to work on CentOS 7 (hopefully at that point we'll also have contbuild/utd magic set up to have that re-enabling automatically trigger the build/test).

Reviewed By: genevievehelsel

Differential Revision: D23463831

fbshipit-source-id: 7714547c04573b94dbb2d9acf7906734d853c5aa
2020-09-01 22:39:10 -07:00
Jun Wu
c84653c7a9 py3: fix a crecord encoding issue
Summary: This only happens if specified context shows up.

Reviewed By: ytsheng

Differential Revision: D23460476

fbshipit-source-id: 788e236bd8e28918afa6b1e0a4e1be297b6f5a66
2020-09-01 21:24:53 -07:00
Jun Wu
211739f00c dag: remove SpanSetAsc
Summary:
Now that SpanSet can easily support `push_front`, we can just use SpanSet
efficiently without SpanSetAsc.

Reviewed By: sfilipco

Differential Revision: D23385246

fbshipit-source-id: b2e0086f014977fa990d5142e6eee844293e7ca5
2020-09-01 21:02:08 -07:00
Jun Wu
64bdf70811 dag: add SpanSet::intersection_span_min
Summary: To remove SpanSetAsc, its API needs to be implemented on SpanSet.

Reviewed By: sfilipco

Differential Revision: D23385250

fbshipit-source-id: ebd9d537287b5c1cde6e2c52ffb6da57dbd71852
2020-09-01 21:02:08 -07:00
Jun Wu
16eaceafe9 dag: use VecDeque for SpanSet
Summary: This will make it possible to `push_front` and remove SpanSetAsc special case.

Reviewed By: sfilipco

Differential Revision: D23385249

fbshipit-source-id: 63ac67e9bce7cb281236399b3fb86eba23bbf8a0
2020-09-01 20:53:32 -07:00
Jun Wu
71f101054a dag: implement binary_search_by for VecDeque
Summary:
This makes it easier to replace Vec<Span> with VecDeque<Span> in SpanSet for
efficient push_front and deprecates SpanSetAsc (which uses Id in a slightly
hacky way: they are not real Ids).
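
As a sketch of the technique only (in Python rather than Rust), a comparator-driven binary search over a double-ended queue might look like this. The function name mirrors the Rust slice method, but the return convention of `(found, index)` is an assumption made for this example:

```python
from collections import deque


def binary_search_by(items, compare):
    """Binary search over a sorted, indexable sequence.

    compare(item) returns a negative number if item sorts before the target,
    zero if it matches, and a positive number if it sorts after.
    Returns (True, index) on a match, else (False, insertion_index).
    """
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        order = compare(items[mid])
        if order == 0:
            return True, mid
        if order < 0:
            lo = mid + 1  # target is to the right of mid
        else:
            hi = mid  # target is to the left of mid
    return False, lo


spans = deque([1, 3, 5, 7])
found, index = binary_search_by(spans, lambda x: x - 5)
```

Note that indexing into the middle of a Python `deque` is not constant time, so this only illustrates the control flow, not the performance of the Rust version.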

Reviewed By: sfilipco

Differential Revision: D23385245

fbshipit-source-id: b612cd816223a301e2705084057bd24865beccf0
2020-09-01 20:38:29 -07:00
Jun Wu
d8225764a5 py3: speed up simplemerge
Summary:
One user reports a very slow rebase (tens of minutes and still running). The
commit is not very large. Python 2 can complete the rebase in 6 seconds.
I tracked it down to this code path. Making the change makes Python 3
rebase fast too (< 10 seconds). I haven't tracked down exactly why Python
3 is slow yet (maybe N^2 `a += b`?).

Some numbers about the slow merge:

  ipdb> p len(m3.atext)
  17984924
  ipdb> p len(m3.btext)
  17948110
  ipdb> p len(m3.a)
  613353
  ipdb> p len(m3.b)
  612129
  ipdb> p len(m3.base)
  612135

Reviewed By: singhsrb

Differential Revision: D23441221

fbshipit-source-id: 14b725439f4ecd3352edca512cdde32958b2ce29
2020-09-01 20:32:10 -07:00
Jun Wu
2d02d3b0f7 dag: validate SpanSet order and no mergable adjacent spans
Summary:
Previously, the `is_valid()` function only checked ordering.
Make it also check "no mergeable adjacent spans" and `span.low<=span.high`.
To provide better debug messages, the function does assertions
directly without returning a bool.
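
The three checks can be sketched in Python as below. This assumes spans are inclusive `(low, high)` tuples stored in ascending order, which is a simplification for the example and may not match the order the crate uses; `assert_valid` and `check` are hypothetical names:

```python
def check(condition, message):
    """Raise with a descriptive message instead of returning a bool."""
    if not condition:
        raise AssertionError(message)


def assert_valid(spans):
    """Validate a list of inclusive (low, high) spans, ascending (assumed)."""
    previous_high = None
    for low, high in spans:
        check(low <= high, f"span {low}..={high} has low > high")
        if previous_high is not None:
            check(low > previous_high,
                  f"span {low}..={high} is out of order or overlaps")
            check(low != previous_high + 1,
                  f"span {low}..={high} is mergeable with the previous span")
        previous_high = high
```

Raising with a message pinpoints which span broke which rule, which a plain boolean result cannot do.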

Reviewed By: sfilipco

Differential Revision: D23385247

fbshipit-source-id: 84829e9242e47e68dc2a4b2a6775b13331eba959
2020-09-01 20:27:03 -07:00
Jun Wu
4bf5817dad dag: always merge adjacent spans in SpanSet
Summary:
Previously, `SpanSet::from_sorted_spans` allowed adjacent spans like
`[1..=2, 3..=4]`, while `SpanSet::from_spans` would merge them into `[1..=4]`.
Change it so `SpanSet::from_sorted_spans` merges them too.  This simplifies
the `contains` logic and could make some Sets more efficient.
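
A small Python sketch of the merging behavior, assuming inclusive `(low, high)` tuples already sorted in ascending order (an assumption for illustration; the function name is borrowed from the text, the body is not the crate's implementation):

```python
def from_sorted_spans(spans):
    """Merge adjacent or overlapping inclusive spans from a sorted input."""
    merged = []
    for low, high in spans:
        if merged and low <= merged[-1][1] + 1:
            # Adjacent (low == previous high + 1) or overlapping:
            # extend the previous span instead of adding a new one.
            prev_low, prev_high = merged[-1]
            merged[-1] = (prev_low, max(prev_high, high))
        else:
            merged.append((low, high))
    return merged


result = from_sorted_spans([(1, 2), (3, 4)])
```

Here `[1..=2, 3..=4]` collapses into the single span `1..=4`, matching what the text says `from_spans` already did.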

Reviewed By: sfilipco

Differential Revision: D23385248

fbshipit-source-id: 85b5ba9533f15034779e93255085a4fa09c6328a
2020-09-01 20:04:12 -07:00
Jun Wu
afa787bd5c rage: do not report 'serve' commands in sigtrace section
Summary:
Some rage pastes have a very long "sigtrace" section (ex. P141069793).
It turns out the sigtrace has lots of "serve" commands that are started in a
non-forking mode, producing very long traces like:

  Tracing Data:
  Process 726702 Thread 2610476:
     Start Dur.ms | Name                                              Source
         0    ... | Run Command                                       hgcommands::run line 296
                  | - pid = 726702                                    :
                  | - uid = 117869                                    :
                  | - nice = 0                                        :
                  | - args = ["/opt/fb/mercurial/hg.real","...        :
                  | - parent_pids = [2610476,1]                       :
                  | - parent_names = ["/opt/fb/mercurial/hg.real",""] :
                  | - exit_code = 0                                   :
                  | - max_rss = 0                                     :
        35    ... | Main Python Command                               (perftrace)
        35    +22  \ Repo Setup                                       edenscm.mercurial.hg line 168
                    | - local = true                                  :
        70   +802  \ Main Python Command                              (perftrace)
        72   +799   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
        74   +537   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
       940   +914  \ Main Python Command                              (perftrace)
       943   +910   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
       943   +617   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
      1875   +866  \ Main Python Command                              (perftrace)
      1877   +863   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
      1878   +604   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
      2759  +2208  \ Main Python Command (719 times)                  (perftrace)
      3155   +860  \ Main Python Command                              (perftrace)
      3158   +856   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
      3158   +543   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
      4068   +883  \ Main Python Command                              (perftrace)
      4071   +879   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
      4071   +591   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
      4967   +913  \ Main Python Command                              (perftrace)
      4969   +910   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
      4969   +621   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
      6630   +922  \ Main Python Command                              (perftrace)
      6633   +918   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
      6633   +640   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
      7615   +856  \ Main Python Command                              (perftrace)
      7622   +849   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
      7622   +581   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
      8487   +951  \ Main Python Command                              (perftrace)
      8490   +947   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
      8490   +671   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
    139275   +794  \ Main Python Command                              (perftrace)
    139278   +790   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
    139278   +539   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
    140132   +837  \ Main Python Command                              (perftrace)
    140135   +832   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
    140135   +544   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
    140992   +814  \ Main Python Command                              (perftrace)
    140994   +811   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
    140994   +546   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
    306862   +864  \ Main Python Command                              (perftrace)
    306865   +860   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
    306865   +586   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
    307801   +858  \ Main Python Command                              (perftrace)
    307804   +854   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
    307804   +587   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
    308690   +874  \ Main Python Command                              (perftrace)
    308693   +869   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
    308693   +610   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
    506391   +924  \ Main Python Command                              (perftrace)
    506396   +917   | Status                                          edenscm.mercurial.dirstate line 957
                    | - A/M/R Files = 0                               :
    506396   +645   | Get EdenFS Status                               (perftrace)
                    | - status = true                                 :
    507401   +898  \ Main Python Command                              (perftrace)
    ....

Our chg usage does not start non-forking servers; those are apparently started by something related to emacs:

  args = ['--config', 'ui.interactive=True', '--config', 'ui.editor=emacsclient', '--config', 'extensions.shelve=', 'serve', '--cmdserver', ...]

Hide them in sigtrace to make rage paste shorter.

Reviewed By: DurhamG

Differential Revision: D23459991

fbshipit-source-id: 7ccc27dbe5ef03e0b97dbfec57213e5478003b1c
2020-09-01 19:57:41 -07:00
Jun Wu
5f0a6f35af py3: fix conflictinfo compatibility
Summary: File content needs to be encoded.

Reviewed By: DurhamG

Differential Revision: D23463706

fbshipit-source-id: e8e512668452618e3b139d7d94ec8776f2b6b25b
2020-09-01 18:31:35 -07:00
Jun Wu
062a83cc16 restack: fix bookmark movement with partial successful auto restack
Summary:
See the test change. Partially successful auto restack should have bookmarks
moved.

Reviewed By: DurhamG

Differential Revision: D23441932

fbshipit-source-id: 07e509a70bcc5cf81f702d40ec1b8dc4a5a781ff
2020-09-01 18:05:44 -07:00
Jun Wu
8191be83c1 tests: add a test for auto rebase bookmark movement issue
Summary: Reported By: asukhachev.

Reviewed By: DurhamG

Differential Revision: D23441931

fbshipit-source-id: b07f47e6796d4d0363250b3b1463f829bb5d0efa
2020-09-01 18:05:44 -07:00
Jun Wu
b3df065db5 debugshell: improve "%trace" UX
Summary: Print hints about how to enable detailed Python tracing.

Reviewed By: kulshrax

Differential Revision: D23437210

fbshipit-source-id: 009425a83945f9b5af2a6280c2572a782c6b349a
2020-09-01 13:49:13 -07:00
Wez Furlong
154d7309c9 eden: introduce SpawnedProcess
Summary:
This commit introduces a new process spawning class derived
from the ChildProcess class in the watchman codebase.

`SpawnedProcess` is similar to folly::Subprocess but is designed around the
idea that we will use a system provided spawning API to start a process, rather
than assuming the use of `fork`.

`fork` is to be avoided because it can be expensive for processes with large
address spaces and also because it interacts poorly with threads on macOS.  In
particular, we see the objC runtime terminating our process in some scenarios
where fork and threads are mixed.

There are some important differences from `folly::Subprocess` and that means
that some assumptions and uses need to be altered slightly from their prior
workings.  For example, detaching a SpawnedProcess moves the responsibility of
waiting on the child to a periodic task as there is no way to detach via
posix_spawn without also using fork.

On the plus side, this commit allows unifying spawning between posix and
windows systems, which simplifies the code!

Reviewed By: xavierd

Differential Revision: D23287763

fbshipit-source-id: b662af1d7eaaa9ed445c42f6c5765ae9af975eea
2020-09-01 13:31:32 -07:00
Wez Furlong
624c185094 eden: introduce FileDescriptor and Pipe types
Summary:
This commit introduces a few types from the watchman codebase:

`FileDescriptor`, which on posix systems represents a file descriptor,
and on Windows is a HANDLE (which can be a file, pipe or socket descriptor).

`Pipe` is a convenience struct that holds the read and write ends of a pipe.
Note that we have a conceptual clash with a windows specific Pipe type under
eden/fs/win/utils/Pipe.h; I remove that in the next diff in the stack.

There are a couple of differences from the watchman code

Reviewed By: chadaustin

Differential Revision: D23287819

fbshipit-source-id: 6ca90ba345037c6c3e308f588d690a899c9866a5
2020-09-01 13:31:32 -07:00
Kostia Balytskyi
e7ddc6cc13 undesired fetches: regex-based reporting
Summary:
We want to be able to report on more than just one prefix. Instead, let's add regex-based reporting. To make deployment easier, let's keep both options for now and later just remove the prefix-based one.

Note: this diff also changes how a situation with absent `undesired_path_prefix_to_log` is treated. Previously, if `undesired_path_prefix_to_log` is absent, but `"undesired_path_repo_name_to_log": "fbsource"`, it would report every path. Now it won't report any, which I think is a saner behavior. If we do ever want to report every path, we can just add `.*` as a regex.
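
A hedged sketch of the new decision logic in Python. `should_report` and its parameters are hypothetical names, and the use of `re.search` (rather than a full match) is an assumption; only the "absent regex reports nothing" and "`.*` reports everything" behaviors come from the description above:

```python
import re


def should_report(path, repo_name, configured_repo, undesired_regex):
    """Decide whether a fetched path is logged as undesired (hypothetical)."""
    if repo_name != configured_repo:
        return False
    if undesired_regex is None:
        # Absent regex: report nothing, instead of reporting every path.
        return False
    return re.search(undesired_regex, path) is not None
```

For example, `should_report("fbcode/a", "fbsource", "fbsource", ".*")` opts back into reporting every path.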

Reviewed By: StanislavGlebik

Differential Revision: D23447800

fbshipit-source-id: 059109b44256f5703843625b7ab725a243a13056
2020-09-01 12:01:00 -07:00
Thomas Orozco
0ab9638ef6 py3: fix lfs debuglfsreceive{,all}
Summary:
Those commands are broken right now: they try to write bytes but don't use
`writebytes`.

Reviewed By: DurhamG

Differential Revision: D23450968

fbshipit-source-id: 5d554771459f81718d90e5bad9a4c439cbb05d97
2020-09-01 11:04:16 -07:00
Thomas Orozco
46ab9553bc py3: fix lfs uploads not working anymore
Summary:
When Python 3 wants to upload a file-like object, it does something a bit
awkward: it sets the `Transfer-Encoding` to `chunked`, but doesn't actually
chunk the data. Also, for some reason, it still sets the `Content-Length`. I'm
not sure where that is coming from.

The thing is, when you set `Transfer-Encoding` to `chunked`, you do need to
chunk, or the other end is going to get very confused.

Unfortunately, this is not what happens here (note that the "send" logs are
from enabling http tracing in Python here, and those logs are basically one
line before `.send()` into a socket, so the chunking doesn't appear to happen
elsewhere):

```
[torozco@devbig051]~/opsfiles_bin % echo "aaaa" | ~/fbcode/buck-out/gen/eden/scm/__hg-py3__/hg-py3.sh debuglfssend https://mononoke-lfs.internal.tfbnw.net/opsfiles_bin
send: b'PUT /opsfiles_bin/upload/11a77c3d96c06974b53d7f40a577e6813739eb5c811b2a86f59038ea90add772/5 HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-length: 5\r\nx-client-correlator: tQT3yBfFEzhVtqI5\r\naccept: application/mercurial-0.1\r\ncontent-type: application/x-www-form-urlencoded\r\nhost: mononoke-lfs.internal.tfbnw.net\r\ntransfer-encoding: chunked\r\nuser-agent: mercurial/4.4.2_dev git/2.15.1\r\n\r\n'
sendIng a read()able
send: b'aaaa\n'
reply: 'HTTP/1.1 400 Bad request\r\n'
header: Content-Type: text/html; charset=utf-8
header: Access-Control-Allow-Origin: *
header: proxy-status: client_read_error; e_upip="AcLKajO63Vab0hC4kzGZQsqck3P_YOu7HsBzshC-NCbuo31tlWWqCiVw5xVLh44LYYe7qioCPqYSb8-1cBpdvFDZb_t5oYRP1Q"; e_proxy="AcJjRKHG02qo6Bv6fEPCUVF7DpCyrq3rmSnXhRLWakKWREEvVpk4jc-tzDyG6l9jvn3vNo8PYPG_5hLtC3L1"
header: Date: Tue, 01 Sep 2020 13:10:35 GMT
header: Connection: close
header: Content-Length: 2959
```

What's a bit confusing to me here is where this Content-length header comes
from. Indeed, normally Python 3 will:

- Not infer a content-length for file-like objects (which is what we have)
  https://fburl.com/ms94eq31
- Set Transfer-Encoding if no Content-Length is present:
  https://fburl.com/f81g8v2j

So, it's a bit unexpected that a) we have a Content-Length (we shouldn't), and
b) we also have a Transfer-Encoding header. That said, setting the
Content-Length does fix the problem, so that's what this diff does.
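
For reference, this is roughly what a correctly chunked body looks like under HTTP/1.1 framing. It is a sketch for comparison with the raw `aaaa\n` body in the log above, not what this diff does (the diff sets the Content-Length instead); `encode_chunked` is a hypothetical helper:

```python
def encode_chunked(body, chunk_size=4096):
    """Frame a bytes body using HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        out += b"%x\r\n" % len(chunk)  # chunk size in hexadecimal
        out += chunk + b"\r\n"         # chunk data, then CRLF
    out += b"0\r\n\r\n"                # zero-size chunk terminates the body
    return bytes(out)


framed = encode_chunked(b"aaaa\n")
```

A receiver that sees `Transfer-Encoding: chunked` expects this framing, so a bare `aaaa\n` cannot be parsed as a chunk.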

Reviewed By: DurhamG

Differential Revision: D23450969

fbshipit-source-id: e1f535ff3d0b49c0c914130593d9aebe89ba18ca
2020-09-01 11:04:16 -07:00
Viet Hung Nguyen
2c1d4a49ad mononoke/repo_import: change logic of file paths rewriting with multiple movers
Summary:
This diff modifies how we rewrite file paths when we import into a repo by allowing the tool to apply multiple movers.

Motivation:
When we try to import into a small repo that pushredirects to a large repo, we have decided to import into the large repo first, then backsync to the small repo. To do that, we have to set a couple of flags related to importing into the large repo (see: D23294833 (d6895d837d)): bookmarks and import destination path. Previously, we fixed the destination path in the large repo by applying the small_to_large repo syncer's mover to the destination path in the small repo, e.g.:
if small_to_large repo syncer mover = {
default_action = prepend(**large_dir**)
map = [...]},
then **destination_path** in small repo becomes **large_dir/destination_path** in large repo.
After this, we prepended the imported files with the new prefix using another mover: prepend(**large_dir/dest_path**), e.g.
a -> large_dir/dest_path/a
Consequently, all directories and files under **destination_path** would get imported under **large_dir/destination_path** in the large repo with this logic.
However, it's possible that with push-redirection, some directories get remapped to a different place in the large repo, e.g.
small_to_large syncer mover = {
default_action = prepend(**large_dir**)
map = [
dest_path/b -> random_dir/b
]},
but with the current repo_import implementation dest_path/b would get moved to large_dir/dest_path/b.
To avoid this, we apply multiple movers to the imported files, e.g.
1. we prepend all files with dest_path:
    mover = {
    default_action: prepend(**dest_path**)
    map={}} =>
    a -> dest_path/a
    b -> dest_path/b
2. we remap the files using the small_to_large repo syncer mover:
    mover = {
    default_action: prepend(**large_dir**)
    map =
    {dest_path/b -> random_dir/b}} =>
    dest_path/a -> large_dir/dest_path/a
    dest_path/b -> random_dir/b
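
The two-step rewrite above can be sketched in Python. The helper names are hypothetical and the prefix matching is deliberately simplified; the real movers live in the Rust codebase and are not reproduced here:

```python
def prepend_mover(prefix):
    """Mover whose default action prepends a directory to every path."""
    def mover(path):
        return f"{prefix}/{path}"
    return mover


def mapping_mover(default_prefix, prefix_map):
    """Mover with explicit prefix remappings and a prepend default action."""
    def mover(path):
        for source, target in prefix_map.items():
            if path == source or path.startswith(source + "/"):
                return target + path[len(source):]
        return f"{default_prefix}/{path}"
    return mover


def apply_movers(path, movers):
    """Apply each mover in order, feeding the output of one into the next."""
    for mover in movers:
        path = mover(path)
    return path


movers = [
    prepend_mover("dest_path"),
    mapping_mover("large_dir", {"dest_path/b": "random_dir/b"}),
]
```

With these movers, `a` ends up at `large_dir/dest_path/a` while `b` ends up at `random_dir/b`, as in the example.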

Reviewed By: StanislavGlebik

Differential Revision: D23371244

fbshipit-source-id: 0bf4193b24d73c79ed00dfb38e2b0538388d1c0f
2020-09-01 09:26:07 -07:00
Pavel Aslanov
fffcf5b966 utility to keep streaming clone data warm
Summary: This is streaming clone warmup binary as per https://fb.quip.com/hfuBAdYnzr9M

Reviewed By: StanislavGlebik

Differential Revision: D23347029

fbshipit-source-id: f187a2f3529a7eae5998bab199228bfbe6057e6e
2020-09-01 07:13:33 -07:00
Stanislau Hlebik
14527beaf4 add ObjectFetchContext with causeDetail field
Summary:
As previous diffs in the stack show, there was at least one place in the
codebase that used an incorrect object context logger, and that resulted in "blind
spots" in undesired file fetches logging i.e. undesired file fetches were
logged, but neither pid nor cmd-line was logged.

There are quite a few places in the codebase that use null
object fetch context, and threading the correct object fetch context to all of
them might be hard. Threading the context is a bit annoying, so it would be good to know something like "EdenDispatcher code is responsible for most of the blind spots, so let's thread the correct context there first". Or it would be equally good to know that none of the null object contexts are responsible for blind spots.

This diff might help us decide where we need to thread real object fetch context
first. Instead of passing null object fetch context let's pass null object
fetch context with causeDetail field. This field will be logged to scuba (see
BackingStoreLogger::logImport code), and instead of getting "Unknown" interface
we'll get e.g. "Unknown - EdenDispatcher::create", and that would highlight
where we need to thread the context.

A note about implementation - getNullContextWithCauseDetail returns a raw pointer
which is expected to be static i.e. it should work similarly to current
getNullContext implementation. It's quite a hack, but allows us to get rid of
memory allocations (we'd have one memory allocation per place in the code where
getNullContextWithCauseDetail is called). Let me know if you are ok with this hack.

Reviewed By: kmancini

Differential Revision: D23422526

fbshipit-source-id: e576bba9fc09e160fc42771c7589cdd1694d93c0
2020-09-01 03:39:18 -07:00
Stanislau Hlebik
2e2e2432a7 sparse: warn if dirstate includes marker files
Summary:
As a follow up to the previous diff, let's also warn if dirstate includes
marker files that should not be included in any sparse profiles.

Reviewed By: DurhamG

Differential Revision: D23414361

fbshipit-source-id: 3d171328bf0ba5754e5bacde85f09abb4fed8603
2020-08-31 23:21:41 -07:00
Stanislau Hlebik
eec5a3d725 return pid even if opcode is 0
Summary:
I'm not sure if that's the right thing to do, so I'd like feedback on that.
Check for opcode was added in D22050919 (fdb1af8bc9), but there was also a comment from
chadaustin

```
Why do you think fuseHeader isn't staying valid? It's a struct that's copied
into the RequestData. It does look like stealReq() sets the opcode to 0, but I
don't see anything that would affect the pid field.
```

So I wonder if this code can be responsible for some of the blind spots in
logging?

Reviewed By: chadaustin

Differential Revision: D23422635

fbshipit-source-id: 4d9a7171d685eb3f6f69da7b8a191df2f65ad897
2020-08-31 23:18:30 -07:00
Jun Wu
56d0255228 extutil: drop runbgcommand
Summary: Callsites were migrated to `util.spawndetached`.

Reviewed By: DurhamG

Differential Revision: D23124753

fbshipit-source-id: f0345461a3f79f9bb6ff3a58e00cdf0ed1893645
2020-08-31 17:34:49 -07:00
Jun Wu
2cdca65aed remotefilelog: runshellcommand -> spawndetached
Summary: There seems to be no need to use a shell.

Reviewed By: DurhamG

Differential Revision: D23124756

fbshipit-source-id: 7de1c23e2325fe88dc4c6a2c90563d06f109ed2f
2020-08-31 17:34:49 -07:00
Jun Wu
ffb93ca839 commandcloud: runbgcommand -> spawndetached
Summary:
The Rust process utility avoids issues with interaction with Python and can do file
redirection on Windows.

Reviewed By: DurhamG

Differential Revision: D23124755

fbshipit-source-id: f72b88bafd19b3b41e53afbf6a4095d0d6bcb93a
2020-08-31 17:34:49 -07:00
Jun Wu
6e2a90ddb5 hooks: add predefined hook to run fsync
Reviewed By: DurhamG

Differential Revision: D22993217

fbshipit-source-id: 2cfb6b26479cd7dad02419fb76fa5d3ca5dd66db
2020-08-31 17:34:49 -07:00
Jun Wu
a01693df0e util: use Rust pyprocess to implement spawndetached
Summary:
The Rust bindings handle the cross-platform differences and avoid issues
with Python / Rust interaction. Use them.

As we're here, extend the API to support cwd and env.

Reviewed By: DurhamG

Differential Revision: D23124171

fbshipit-source-id: fdc13f6eaeb25c05b53d385eb220af33dad984e1
2020-08-31 17:34:48 -07:00
Jun Wu
a90c8ea775 bindings: export rust process handling to Python
Summary:
Spawning processes turns out to be tricky.

Python 2:

- "fork & exec" in plain Python is potentially dangerous. See D22855986 (c35b8088ef).
  Disabling GC might have solved it, but still seems fragile.
- "close_fds=True" works on Windows if there is no redirection.
- Does not work well with `disable_standard_handle_inheritability` from `hgmain`.
  We patched it. See `contrib/python2-winbuild/0002-windows-make-subprocess-work-with-non-inheritable-st.patch`.

Python 3:

- "subprocess" uses native code for "fork & exec". It's safer.
- (>= 3.8) "close_fds=True" works on Windows even with redirection.
- "subprocess" exposes options to tweak low-level details on Windows.

Rust:

- No "close_fds=True" support for both Windows and Unix.
- Does not have the `disable_standard_handle_inheritability` issue on Windows.
- Impossible to cleanly support "close_fds=True" on Windows with existing stdlib.
  https://github.com/rust-lang/rust/pull/75551 attempts to add that to stdlib.
  D23124167 provides a short-term solution that can have corner cases.

Mercurial:

- `win32.spawndetached` uses raw Win32 APIs to spawn processes, bypassing
  the `subprocess` Python stdlib.
- Its use of `CreateProcessA` is undesirable. We probably want `CreateProcessW`
  (unless `CreateProcessA` speaks utf-8 natively).

We are still on Python 2 on Windows, and we'd need to spawn processes correctly
from Rust anyway, and D23124167 kind of fills the missing feature of `close_fds=True`
from Python. So let's expose the Rust APIs.

The binding APIs closely match the Rust API. So when we migrate from Python to
Rust, the translation is more straightforward.

Reviewed By: DurhamG

Differential Revision: D23124168

fbshipit-source-id: 94a404f19326e9b4cca7661da07a4b4c55bcc395
2020-08-31 17:34:48 -07:00
Jun Wu
b7f2ee577a spawn-ext: extend Command::spawn to avoid inheriting fds
Summary:
The Rust upstream took the "set F_CLOEXEC on every opened file" approach and
provided no support for closing fds at spawn time to make spawn lightweight [1].

However, that does not play well in our case:
- On Windows:
  - stdin/stdout/stderr are not created by Rust, and are inheritable by
    default (other processes like `cargo` or `dotslash` might leak them too).
  - a few other handles like "Null", "Afd" are inheritable. It's
    unclear how they get created, though.
  - Fortunately, files opened by Python or C in edenscm (ex. packfiles) seem to
    be not inheritable and do not require special handling.
- On Linux:
  - Files opened by Python or C likely lack F_CLOEXEC and need special
    handling.

Implement logic to close file handles (or set F_CLOEXEC) explicitly.
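
The "set F_CLOEXEC explicitly" idea can be sketched in Python for the Unix case. This is only an illustration of the technique, not the Rust code from this diff; the helper names and the fd range are assumptions.

```python
import fcntl
import os


def set_cloexec(fd):
    """Mark fd as close-on-exec so a spawned child does not inherit it."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def set_cloexec_on_open_fds(min_fd=3, max_fd=256):
    """Set FD_CLOEXEC on every open fd in [min_fd, max_fd).

    The bounds are illustrative assumptions; stdin/stdout/stderr (0-2)
    are skipped on purpose.
    """
    changed = []
    for fd in range(min_fd, max_fd):
        try:
            set_cloexec(fd)
        except OSError:
            continue  # fd is not open
        changed.append(fd)
    return changed
```

Calling `set_cloexec_on_open_fds()` right before spawning covers files that were opened by other code without the flag.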

[1]: https://github.com/rust-lang/rust/issues/12148

Reviewed By: DurhamG

Differential Revision: D23124167

fbshipit-source-id: 32f3a1b9e3ae3a9475609df282151c9d6c4badd4
2020-08-31 17:34:48 -07:00
Jun Wu
b3fd513ea4 util: make gethgcmd more reliable
Summary:
It uses `sys.argv`, which might be rewritten by `debugshell`. Capture
`sys.argv` to make hgcmd more reliable.

Reviewed By: DurhamG

Differential Revision: D22993215

fbshipit-source-id: 5fa319e8023b656c6cdf96cb3229ea9f2c9b9b99
2020-08-31 17:34:48 -07:00
Jun Wu
333177101f hooks: add a hook point after write commands
Summary: This allows us to run commands after changes were made to the repo.

Reviewed By: DurhamG

Differential Revision: D22993218

fbshipit-source-id: d9943dcda94da42970fb9107f48f4caa14b6a9d4
2020-08-31 17:34:48 -07:00
David Tolnay
75c2118e01 Remove crate_root from Rust dependency info
Reviewed By: danobi

Differential Revision: D23430948

fbshipit-source-id: c4b374021325fc247121ceecd0e82a0291aa75d6
2020-08-31 14:43:24 -07:00
Jun Wu
9aa9d022ae util: stop using time.perf_counter() for timer()
Summary:
Some code paths (ex. metalog.commit) use `util.timer()` as a way to get
seconds since epoch, and get 0 for tests. Other use-cases of `util.timer()`
are ad-hoc time measurements for displaying speed / progress. They do not need
high precision or a strong guarantee that the clock does not go backwards. Drop
`time.perf_counter()` to meet the first use-case's expectation.

Reviewed By: singhsrb

Differential Revision: D23431253

fbshipit-source-id: 8bf2d1ed32e284e17285742e1d0fd7178f181fb3
2020-08-31 13:04:54 -07:00
Jun Wu
9f33746b31 histedit: do not show revision numbers
Summary:
With the segments backend, the revision numbers will be longer than commit
hashes and are confusing.

Reviewed By: DurhamG

Differential Revision: D23408971

fbshipit-source-id: e2057fa644fc7b6be4291f879eee3235bb4e687b
2020-08-31 11:57:53 -07:00
Jun Wu
96548cade8 remotefilelog: do not assume range(len(cl)) are valid revs in _linkrev
Summary: `range(len(cl))` contains invalid revs with segments backend.

Reviewed By: DurhamG

Differential Revision: D23411209

fbshipit-source-id: 2f83a5402bb46824cf38871926c1954507b64b56
2020-08-31 11:57:53 -07:00
Jun Wu
ff2d572717 changelog2: avoid excessive memory usage during large pulls
Summary:
Pulling from older repos (ex. years ago) could require GBs of commit text data.
Flush commit data once it exceeds a certain size.

This is for revlog compatibility.
In the future we will probably just make commit text lazy to avoid this kind of issue.

Reviewed By: DurhamG

Differential Revision: D23408834

fbshipit-source-id: 273384f5a05be07877bb1c9871c17b53ba436233
2020-08-31 11:57:53 -07:00
Jun Wu
01c551bb30 hgcommits: add flush_commit_data API
Summary: This would be used to avoid excessive memory usage during pull.

Reviewed By: DurhamG

Differential Revision: D23408833

fbshipit-source-id: 8edd95ab8201697074f65cc118d14755a230567d
2020-08-31 11:57:53 -07:00
Jun Wu
fee02d78e0 changelog2: only call addcommits once in addgroup
Summary:
`addcommits` is designed to be more efficient if called with a batch of
commits. So let's buffer the commits to add, then only call it once.

This avoids some N^2 behaviors. For example, the NameDag internally will
prepare a "snapshot" of itself, which involves copying the pending Rust vecs
about the segments and id <-> hash map.
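
A toy Python model shows why batching matters. `SnapshottingDag` is a hypothetical stand-in, not the real NameDag: it only assumes that each `addcommits` call pays a cost proportional to what is already pending.

```python
class SnapshottingDag:
    """Hypothetical stand-in for a structure that snapshots itself per call."""

    def __init__(self):
        self.pending = []
        self.items_copied = 0  # total work spent on snapshots

    def addcommits(self, commits):
        # Snapshot cost is proportional to what is already pending.
        self.items_copied += len(self.pending)
        self.pending.extend(commits)


def add_one_by_one(dag, commits):
    """One call per commit: snapshot work grows quadratically."""
    for commit in commits:
        dag.addcommits([commit])


def add_batched(dag, commits):
    """Buffer first, then call once: a single snapshot."""
    buffered = list(commits)
    dag.addcommits(buffered)
```

With N commits, the one-by-one path copies 0 + 1 + ... + (N - 1) items in total, while the batched path copies only what was pending before the single call.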

The change takes `pull` from unusably slow to usable:

Original Python Revlog backend:

```
In [1]: %trace repo.pull(bookmarknames=['master'],quiet=False)
 5191   +466      | Apply Changegroup                                   edenscm.mercurial.bundle2 line 516
                  | - Commits = 125                                     :
                  | - Range = a1d1b3ade136:2e3fe78af189                 :
 5191   +466      | changegroup.cg1unpacker.apply                       edenscm.mercurial.changegroup line 313
 5192   +416      | Progress Bar: commits                               (progressbar)
 5192   +415      | changelog.changelog.addgroup                        edenscm.mercurial.changelog line 536
 5192   +409      | revlog.revlog.addgroup                              edenscm.mercurial.revlog line 2116
 5215   +371      | changelog.changelog._addrevision (125 times)        edenscm.mercurial.changelog line 558
```

DoubleWrite (Segments + Revlog) backend, Before:

```
In [2]: %trace repo.pull(bookmarknames=['master'],quiet=False)
  2396 +154059   | Apply Changegroup                            edenscm.mercurial.bundle2 line 516
                 | - Commits = 323                              :
                 | - Range = cb0b100180ba:5fb57c74f72e          :
  2396 +154059   | changegroup.cg1unpacker.apply                edenscm.mercurial.changegroup line 313
  2397 +151433    \ Progress Bar: commits                       (progressbar)
  2397 +151433     | changelog2.changelog.addgroup              edenscm.mercurial.changelog2 line 334
```

DoubleWrite (Segments + Revlog) backend, After:

```
In [2]: %trace repo.pull(bookmarknames=['master'],quiet=False)
 4629   +512      | Apply Changegroup                                       edenscm.mercurial.bundle2 line 516
                  | - Commits = 45                                          :
                  | - Range = cf23c6972934:1ff0c5f0e7ad                     :
 4629   +512      | changegroup.cg1unpacker.apply                           edenscm.mercurial.changegroup line 313
 4630   +494      | changelog2.changelog.addgroup                           edenscm.mercurial.changelog2 line 334
```

Reviewed By: DurhamG

Differential Revision: D23390435

fbshipit-source-id: dd97a5008dedd844d4134b87bfef190fa739a80b
2020-08-31 11:57:52 -07:00
Jun Wu
e5a4533622 revlog: drop addrevisioncb from addgroup
Summary:
The users of addrevisioncb are gone.
This also removes the "alwayscache" parameter of "_addrevision".

Reviewed By: DurhamG

Differential Revision: D23390437

fbshipit-source-id: 7edd9dd0b93d4cb9d4f35d088a1aef719b450ec1
2020-08-31 11:57:52 -07:00
Jun Wu
1199790982 upgrade: remove the upgrade module
Summary: It is about legacy revlog formats that are no longer relevant.

Reviewed By: DurhamG

Differential Revision: D23390436

fbshipit-source-id: 58c2c432804181bcc6517d6c988777b843fc9ba4
2020-08-31 11:57:52 -07:00
Stanislau Hlebik
2d5000293e sparse: disallow changing profiles if it includes bad file
Summary:
We have a few safeguards against creating full checkouts. However we have
sparse profiles that are not full, but that include very large directories
which normally should not be included.

This diff adds logic that checks whether a new sparse profile has any of the
"marker" files, i.e. files from a folder that should not be included. The
operation aborts if that is the case; however, there is always a way to work
around it.

Reviewed By: DurhamG

Differential Revision: D23414200

fbshipit-source-id: 626f392319eb1be8b35f39cadafb61f3c1dfefe3
2020-08-31 11:38:16 -07:00
Stanislau Hlebik
71e1d6493e pass context to getOrLoadChild
Summary:
Scuba logging that tracks undesired file fetches has some blind spots i.e. a
lot of fetches have null pid and null cmd line. This diff tries to fix another
part of the problem.

TreeInode::getOrLoadChild() has a TODO: `pass a fetch context down through
getOrLoadChild to track this load`. This diff fixes this TODO, and also starts
to pass the context from the EdenDispatcher::lookup method.

Note that it adds quite a lot of new `ObjectFetchContext::getNullContext()`
calls, and potentially those might be responsible for blind spots in logging.
I'll try to address this problem in the next diffs.

Reviewed By: kmancini

Differential Revision: D23418218

fbshipit-source-id: 319d7436494d8dce3580289aae9963aa13bfc191
2020-08-31 10:05:02 -07:00
Stanislau Hlebik
ea4e64864c fix one case of logging of null ClientPid
Summary:
Scuba logging that tracks undesired file fetches has some blind spots i.e. a
lot of fetches have null pid and null cmd line. This diff fixes at least part
of the problem.

TreePrefetchContext, which is used from TreePrefetchLease, didn't log the client
pid at all (in fact, it logged almost nothing). This diff fixes at least one
blind spot; however, it doesn't look like this is the only one.

Reviewed By: kmancini

Differential Revision: D23417451

fbshipit-source-id: 107884e94c6b40de999328ec2ef78fe22174c1ca
2020-08-31 10:05:02 -07:00
Genevieve Helsel
5157cf6f34 fix eden thrift legacy dependencies
Summary: `legacy.py` depends on other files in this directory, so let's add them all to the link tree

Reviewed By: fanzeyi

Differential Revision: D23356917

fbshipit-source-id: e4bfd82ebbd9d143a5454a43bb47e8dd55b4485f
2020-08-31 07:55:27 -07:00
Genevieve Helsel
89791f6b89 mock kerberos checks in doctor tests
Summary: These tests fail locally since adding these checks in eden doctor, so we need to mock them.

Reviewed By: chadaustin

Differential Revision: D23326597

fbshipit-source-id: 87a0e6ab0472e3ae56f89503e928c5a00a16ab04
2020-08-31 07:51:17 -07:00
Stanislau Hlebik
7bbf044a49 sparse: fix --sparse to work on eden
Summary:
"hg diff" has --sparse option which diffs only files inside a sparse checkout.
The problem is that it doesn't work on eden checkouts because the eden repo
doesn't have a sparsematch() function.

This diff makes it so that if sparsematch() function doesn't exist then
--sparse option is just ignored.
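
The "ignore the option if the function is missing" behavior can be sketched as follows. The helper name and the zero-argument `sparsematch()` call are assumptions for illustration; the real call site may pass arguments.

```python
def sparse_matcher_or_none(repo, want_sparse):
    """Return repo.sparsematch() if requested and available, else None."""
    if not want_sparse:
        return None
    sparsematch = getattr(repo, "sparsematch", None)
    if sparsematch is None:
        # Repo type without sparse support: silently ignore the flag.
        return None
    return sparsematch()
```

A caller would then diff everything when the result is `None` and restrict the diff to the matcher otherwise.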

The motivation for this change is
https://fb.workplace.com/groups/corehg/?post_id=687768245151742. There are some
diff calls that are triggered by arc lint that race with "hg update" and might download
loads of data on people's laptops. This diff doesn't fix the race, but:
1) It makes sure we don't download too much data that is not in sparse profiles.
2) arc lint doesn't care about files outside of sparse profiles anyway, so
running --sparse makes sense.

Reviewed By: DurhamG

Differential Revision: D23396918

fbshipit-source-id: 2a386fdbeab85187e2c2acab69cb86b74124d46f
2020-08-28 23:47:40 -07:00
Xavier Deguillard
4f9e1750c2 cli: enable doctor on Windows
Summary:
Most of the fixes are pretty trivial: the code was using functions not
present on Windows, so either work around them or switch to ones that are
multi-platform.

Of note, it looks like `hg doctor` doesn't properly detect when Mercurial and
EdenFS are out of sync, so disable those tests until we figure out why.

Reviewed By: genevievehelsel, fanzeyi

Differential Revision: D23409708

fbshipit-source-id: 3314c197d43364dda13891a6874caab4c29e76ca
2020-08-28 19:49:37 -07:00
Jun Wu
fbc9b865b6 changegroup: do not calculate how many files received commits include
Summary:
This is practically just 0 in our production setup during `pull`s. In the
future when the commit data becomes lazy, it will no longer be possible to read
the files locally. So let's not scan the commits.

Reviewed By: DurhamG

Differential Revision: D23390438

fbshipit-source-id: 4c54c4aac5fd840205296ab86955ec1b8ab76607
2020-08-28 13:40:18 -07:00
root@sandcastle5869.frc3.facebook.com
5f749ee470 suppress errors in eden - batch 1
Differential Revision: D23401295

fbshipit-source-id: 01fe0ff888d074c503a445c6d97f17bf0ec2b79c
2020-08-28 12:46:36 -07:00
Durham Goode
08c938e859 dirstate: block addition of paths containing "." and ".."
Summary:
Mergedrivers can call dirstate.add directly and are adding paths with
"." and "..". Let's block those paths.

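A minimal sketch of such a check, assuming "/"-separated repo paths. The function name and the use of `ValueError` are hypothetical; the actual code may raise a different error type.

```python
def check_no_dot_components(path):
    """Reject repo paths that contain a "." or ".." component."""
    for part in path.split("/"):
        if part in (".", ".."):
            raise ValueError("invalid path component %r in %r" % (part, path))
    return path
```

Only whole components are rejected, so names such as `b.c` or `..d` still pass.
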
Reviewed By: quark-zju

Differential Revision: D23375469

fbshipit-source-id: 64e9f20169cfd50325ecd8ebcc1dd3be7a5cb202
2020-08-28 09:42:25 -07:00
Durham Goode
2f5130c882 py3: fix extdiff
Summary:
extdiff uses shutil.rmtree which calls os.rmdir with new Python 3
options. Since we patch os.rmdir, we need to support those options.

Reviewed By: quark-zju

Differential Revision: D23350968

fbshipit-source-id: 081d179dcd67b51ffdeb6b85899adf4e574a8d0f
2020-08-27 19:15:22 -07:00
Jun Wu
f271d882e6 hgcommands: make commands! macro define modules
Summary: Similar to D18528858 so module names do not need to be spelled twice.

Reviewed By: markbt

Differential Revision: D23091380

fbshipit-source-id: a2a261abc9c78c8805cea62b38498ba65398796d
2020-08-27 19:02:27 -07:00
Arun Kulshreshtha
cb3f95d06e configparser: make code compile without "fb" feature
Summary: This crate would fail to build without the "fb" feature because `serde_json` was listed as an optional dependency (but is used in a way that isn't conditional on the `fb` feature). This diff makes the dependency non-optional, and also silences several dead code warnings that are emitted when building without the "fb" feature.

Reviewed By: quark-zju

Differential Revision: D23386786

fbshipit-source-id: b00a8b0b8b0b978c1cfab2838629fcb388a076e9
2020-08-27 18:28:46 -07:00
Jun Wu
d586a40ada hgcommands: add debugfsync
Summary:
The `debugfsync` command calls fsync on newly modified files in svfs.
Right now it only includes locations that we know have a constant number
of files.

The fsync logic is put in a separate crate to avoid slow compiles.

Reviewed By: DurhamG

Differential Revision: D23124169

fbshipit-source-id: 438296002eed14db599d6ec225183bf824096940
2020-08-27 18:26:03 -07:00
Xavier Deguillard
eb57ebb4d8 eden: decrease verbosity of "fetching tree" message
Summary:
A warning means that every tree fetched will be printed in the edenfs log,
which is way too much. Let's decrease this to a debug message.

Reviewed By: genevievehelsel

Differential Revision: D23385778

fbshipit-source-id: d77f1cac3efb945d4b95750822f2f12f48c75ffe
2020-08-27 18:16:51 -07:00
Jun Wu
c2d36d03c4 changegroup: avoid using rev numbers
Summary: `len(repo)` can no longer predict the next rev number. Use nodes instead.

Reviewed By: DurhamG

Differential Revision: D23307791

fbshipit-source-id: cc20e53f039eee2a714748352e8e98aab253095a
2020-08-27 18:14:29 -07:00
Jun Wu
d8e775f423 tracing-collector: limit maximum count of spans
Summary:
Some functions might be called very frequently. For example,
`phases.phasecache.loadphaserevs` might be called 100k+ times.
That makes the tracing data harder to process.

Limit the count of spans to 1k by default so the data is cheaper to process,
and some highly repetitive cases can now be reasoned about. Note the limit
is only put on static Span Ids. If a span uses dynamic metadata or asks for a
different Span Id each time, it will not be limited.
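
The per-id limit can be modeled in a few lines of Python. `SpanLimiter` is a hypothetical illustration of the idea, not the tracing-collector API; only the default of 1000 comes from the text above.

```python
from collections import Counter


class SpanLimiter:
    """Record at most max_per_id spans for each static span id."""

    def __init__(self, max_per_id=1000):
        self.max_per_id = max_per_id
        self.counts = Counter()
        self.recorded = []

    def enter(self, span_id):
        """Record the span unless its id hit the limit; return True if recorded."""
        if self.counts[span_id] >= self.max_per_id:
            return False
        self.counts[span_id] += 1
        self.recorded.append(span_id)
        return True
```

A function called 100k+ times then contributes at most 1000 entries, while spans that get a fresh id each time are never dropped by this check.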

In debugshell,

  td = %trace repo.revs('smartlog()')
  len(td.serialize())

dropped from 6MB to 0.87MB.

It's also possible to reason about:

  td = %trace len(repo.revs('ancestors(.)'))

in debugshell (taking 30s, 98KB serialized, vs 21s without tracing), while
previously the result would be too large to show (`%trace` just hangs).

Reviewed By: DurhamG

Differential Revision: D23307793

fbshipit-source-id: 3c1e9885ce7a275c2abd8935a4e4539a4f14ce83
2020-08-27 18:14:29 -07:00
Jun Wu
9f4dac104f dag: truncate output in <SpanSet as Debug>::fmt
Summary: Set a default limit so the output won't be too long.

Reviewed By: DurhamG

Differential Revision: D23307792

fbshipit-source-id: 7e2ed99e96bbde06436a034e78f899fc2e3e03f8
2020-08-27 18:14:29 -07:00
Jun Wu
54cd73b41b profiling: do not profile debugshell command
Summary:
The debugshell command can be long running and contains uninteresting stuff.
Do not profile it.

In practice, this hides the background statprof thread when using `%trace`.

Reviewed By: DurhamG

Differential Revision: D23278597

fbshipit-source-id: bad97de22e1be2be8b866bee705ea3a6755aa54b
2020-08-27 18:14:29 -07:00
Jun Wu
d92c80ebcc dispatch: enter ipdb for "NameError 'ipdb' is not defined"
Summary:
This allows entering ipdb for code like: `ipdb` or `ipdb()`. It can be handy to
debug something.

Reviewed By: DurhamG

Differential Revision: D23278599

fbshipit-source-id: 4355dd1944617aeb795450935789f01f66f094eb
2020-08-27 18:14:28 -07:00
Jun Wu
28fa0e1cfe debugshell: add %trace and %hg magics
Summary: This makes it possible to get tracing results, or run hg commands directly.

Reviewed By: DurhamG

Differential Revision: D23278601

fbshipit-source-id: e7dc92080d2881cb4155a481df5ca93f324828fc
2020-08-27 18:14:28 -07:00
Jun Wu
ed78542610 dispatch: add --trace flag
Summary:
The `--trace` flag enables tracing Python modules.
For compatibility reasons, it also enables `--traceback`.

It can be used with debugshell to make `%trace` more useful.

Reviewed By: sfilipco

Differential Revision: D23278600

fbshipit-source-id: d6d0b34bd5c48111f8cd33d7df115f349b0e95b6
2020-08-27 18:14:28 -07:00
Jun Wu
3bbdfd3743 revset: successors(x) should only show visible commits
Summary:
I found this when I aborted a rebase of Dxxx, tried rebasing again, and it
complained about "nothing to rebase". It was caused by Dxxx resolving into
a hidden commit.

Reviewed By: sfilipco

Differential Revision: D23307794

fbshipit-source-id: f7a956b5300240089b6a4648f28cf4a152ee2433
2020-08-27 18:14:28 -07:00
Arun Kulshreshtha
767570d298 lfs_server: remove PerfCounters from post-request callback signature
Summary:
`PerfCounters` was the only application-specific type exposed as a parameter to the post-request callbacks, and it was only being used in one place. To facilitate making the post-request callback functionality more general, this diff makes the callback in question capture the `CoreContext` in its environment, thereby giving it access to the `PerfCounters` without requiring it to be passed as an argument.

This should not change the behavior since regardless of how the callback obtains a reference, it will still refer to the same underlying `PerfCounters` from the request's `CoreContext`.

Reviewed By: DurhamG

Differential Revision: D23298417

fbshipit-source-id: 898f14e5b35b827e98eaf1731db436261baa43bb
2020-08-27 14:15:25 -07:00
Arun Kulshreshtha
0b9ca4e83b hgcommands: remove unused imports in dynamicconfig module
Summary: Remove unused imports.

Reviewed By: quark-zju

Differential Revision: D23356940

fbshipit-source-id: 31b81eac11946aa8b24ec23c98ddb14716fbea3a
2020-08-27 14:06:52 -07:00
Genevieve Helsel
3eb96cfb62 fix dictionary changed size during iteration in patch
Summary:
We shouldn't delete from a dictionary while iterating over it. Instead we should iterate over a copy and then delete from the original.

`.items()` returns a view of the dict, while wrapping it in `list` makes a shallow copy (a snapshot of the key/value pairs).
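
A minimal sketch of the pattern, using a hypothetical helper that drops falsy entries:

```python
def drop_empty(d):
    """Delete entries with falsy values, iterating over a snapshot."""
    for key, value in list(d.items()):
        if not value:
            del d[key]
    return d
```

Iterating over `d.items()` directly and deleting inside the loop raises `RuntimeError: dictionary changed size during iteration` in Python 3.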

Reviewed By: DurhamG

Differential Revision: D23283668

fbshipit-source-id: a168eef1ed2a1ce02fe71b3f6e3aed090965d2a4
2020-08-27 13:14:36 -07:00
Durham Goode
fe56f44ca0 treemanifest: prevent fetching nullid
Summary:
Mononoke throws an error if we request the nullid. In the long term we
want to get rid of the concept of the nullid entirely, so let's just add some
Python level blocks to prevent us from attempting to fetch it. This way we can
start to limit how much Rust has to know about these concepts.
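
A sketch of such a Python-level guard. The helper name is hypothetical; the null id itself is the usual 20 zero bytes.

```python
NULLID = b"\0" * 20  # the all-zero node id


def without_nullid(nodes):
    """Drop the null id from a list of nodes before requesting them."""
    return [node for node in nodes if node != NULLID]
```

The filtered list can be empty, so a caller would likely skip the fetch entirely in that case.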

Reviewed By: sfilipco

Differential Revision: D23332359

fbshipit-source-id: 8a67703ba1197ead00d4984411f7ae0325612605
2020-08-27 09:59:40 -07:00
Genevieve Helsel
92eba77a06 remove duplicated code in proc_utils
Summary: I refactored this method to be a member function of `EdenFSProcess` and I thought this instance of the method was deleted, but I came across it while working in this area again.

Reviewed By: fanzeyi

Differential Revision: D23113075

fbshipit-source-id: 2c257cca2da3a4bfefb974753eb00c7580c5a104
2020-08-27 09:53:10 -07:00
Chad Austin
0683ab6586 fix hg importer test regression
Summary:
Enabling hg dynamicconfigs in D23309090 (d643f48c8c) changed the output of `hg
manifest --debug` and broke HgImportTest. Set TESTTMP to avoid
production configs.

Reviewed By: DurhamG

Differential Revision: D23335847

fbshipit-source-id: 7ffd0394aa7a8466b266000b18f8742ed4a6b53f
2020-08-27 09:44:37 -07:00
Durham Goode
4d4e425624 configs: add fbitwhoami tiers to dynamicconfig inputs
Summary:
Corp has a different concept of tier than prod. Let's load the corp
tier into our tier set as well.

Reviewed By: quark-zju

Differential Revision: D23354056

fbshipit-source-id: c9543b8253f042c7b1224578e0687b4bdf21738e
2020-08-27 09:24:28 -07:00
Durham Goode
c190d283ec py3: don't use universal newlines for patch import
Summary:
The Python 3 email library internally stores the message as text, even
though our input and requested output is bytes. Let's make our own wrapper
around the parser to use ascii surrogateescape encoding so we can get the
actual bytes out later and not get universal newlines.

Based off the upstream 7b12a2d2eedc995405187cdf9a35736a14d60706,
which is basically a copy of the BytesParser implementation (https://github.com/python/cpython/blob/3.8/Lib/email/parser.py) with
newline=chr(10) added.
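
The wrapper described above can be sketched like this. The class name is an assumption; the body mirrors the shape of the stdlib `BytesParser.parse`, with only the `newline` argument added.

```python
import email.parser
import io


class BytesPreservingParser(email.parser.BytesParser):
    """BytesParser variant that does not translate line endings."""

    def parse(self, fp, headersonly=False):
        # newline=chr(10) turns off universal-newline translation, so
        # "\r\n" in the input is kept as-is. surrogateescape lets
        # non-ASCII bytes round-trip through the text layer.
        text = io.TextIOWrapper(
            fp, encoding="ascii", errors="surrogateescape", newline=chr(10)
        )
        try:
            return self.parser.parse(text, headersonly)
        finally:
            text.detach()
```

Usage: `BytesPreservingParser().parse(io.BytesIO(raw))`, where `raw` is the patch as bytes. The payload keeps any `\r\n` line endings that the stock `BytesParser` would turn into `\n`.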

Reviewed By: quark-zju

Differential Revision: D23363965

fbshipit-source-id: 880f0642cce96edfdd22da5908c0b573887bed12
2020-08-27 09:21:04 -07:00