Summary:
The change is in theory not necessary. However, it improves reliability on
OS crashes a bit, and can potentially work around some bugs in filesystems
(as we saw in production, where atomically-written files ended up empty even
though the system didn't crash).
The idea is that the `symlink` syscall does the file creation and "content"
writing together, while there is otherwise no way to create a file and write
specific content in one syscall. Note that the C `symlink` call uses a
NUL-terminated string, and the Rust stdlib exposes it as accepting a `Path`.
To be safe, we encode binary or non-UTF-8 content using hex.
For downgrade safety, the write path does not use symlink by default unless
`format.use-symlink-atomic-write` is set to true. This makes downgrade possible:
the read path is rolled out first, then the write path can be turned on and off.
The indexedlog Rust unit tests and test-doctor.t are migrated to use the new
symlink code paths.
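The trick can be sketched in Python (the `v0:`/`hex:` target encodings and function names here are hypothetical illustrations; the real implementation is in the Rust indexedlog code):

```python
import os

def symlink_atomic_write(path, content):
    """Write small content "atomically" by storing it as a symlink target.

    A symlink's target is set by the same syscall that creates the link,
    so there is no window where the entry exists with partial content.
    Binary, empty, or non-UTF-8 content is hex-encoded, since a symlink
    target must be a non-empty, NUL-free string.
    """
    try:
        data = content.decode("utf-8")
        if "\0" in data or not data:
            raise ValueError("target must be non-empty and NUL-free")
        target = "v0:" + data
    except ValueError:  # UnicodeDecodeError is a subclass of ValueError
        target = "hex:" + content.hex()
    tmp = path + ".tmp"
    os.symlink(target, tmp)  # create + "write content" in one syscall
    os.rename(tmp, path)     # atomically replace any previous entry

def symlink_atomic_read(path):
    target = os.readlink(path)
    if target.startswith("hex:"):
        return bytes.fromhex(target[len("hex:"):])
    return target[len("v0:"):].encode("utf-8")
```

Either the rename lands (full content visible via `readlink`) or it doesn't; there is no state in which the file exists but its content was never written.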
Reviewed By: DurhamG
Differential Revision: D20153864
fbshipit-source-id: c31bd4287a8d29575180fbcf7227d2b04c4c1252
Summary:
This makes it possible to implement atomic_write differently (ex. use a
symlink).
Reviewed By: DurhamG
Differential Revision: D20153865
fbshipit-source-id: 07fa78c2f2dac696668f477c75f65cf70950b73f
Summary:
The git mappings are normally populated during blobimport of the repo but we
need something for the repos we've already imported.
Reviewed By: markbt
Differential Revision: D20160768
fbshipit-source-id: 9e37c7d0f12682e73ca9990e56e4d827e9861a9f
Summary:
We don't use it, and it tries to write to Manifold from tests, which is
undesirable. Let's remove it.
Reviewed By: farnz
Differential Revision: D20219902
fbshipit-source-id: 2e983bee54cadad257648cc9633695be825a1ef3
Summary:
This introduces a new binary and library (microwave: it makes warmup
faster..!) that can be used to accelerate cache warmup. The idea is that the
microwave binary will run cache warmup, capture things that are loaded
during it, and commit those to a file.
We can then use that file when starting up a host to get a head start on cache
warmup by injecting all those entries into our local cache before actually
starting cache warmup.
Currently, this only supports filenodes, but that's already a pretty good
improvement. Changesets should be easy to add as well. Blobs might require a
bit more work.
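The record-and-replay idea can be sketched as follows (class and function names are hypothetical, not the actual microwave API):

```python
import json

class RecordingCache:
    """Wraps a cache and records everything stored during warmup."""

    def __init__(self, cache):
        self.cache = cache
        self.recorded = {}

    def put(self, key, value):
        self.recorded[key] = value
        self.cache[key] = value

    def snapshot(self, path):
        # Commit everything loaded during warmup to a file.
        with open(path, "w") as f:
            json.dump(self.recorded, f)

def prime_cache(cache, path):
    """On host startup, inject a previous snapshot into the local cache
    before actual cache warmup starts, giving it a head start."""
    with open(path) as f:
        cache.update(json.load(f))
```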
Reviewed By: StanislavGlebik
Differential Revision: D20219905
fbshipit-source-id: 82bb13ca487f82ca53b4a68a90ac5893895a96e9
Summary:
The walker has been hitting the filenodes-enforced 5 second SQL timeout when
querying filenodes from MySQL.
It's not clear why that is, but looking at previous run history shows that we
occasionally have queries that take > 30 seconds to complete (none of those
show up in MySQL slow queries, though, and there's no particular load on the
hosts around that time, so it's not clear whether this is happening in MySQL or
on our end).
Anyhow, those queries would have worked in the old implementation (after a long
time), but they fail in the new one, since it enforces a 5-second timeout.
We should investigate why this is happening (and Alex has landed diffs to add
more reporting in the walker to that end), but in the meantime, there's no
reason to break the walker.
Reviewed By: farnz
Differential Revision: D20227842
fbshipit-source-id: 5ee5c8225b6474b66c1f48a10b4a2d671ebc79c6
Summary: When it fails, it's better to know which repo failed.
Reviewed By: farnz
Differential Revision: D20245375
fbshipit-source-id: 9794911308dbdd67b20673857ac8b7b54f06a217
Summary: Makes it easier to understand which repo is failing
Reviewed By: krallin
Differential Revision: D20244630
fbshipit-source-id: ca32f7831c5ed4e701103020e9878c459ba6d573
Summary:
If hghave fails to check a feature because the feature name is unknown, treat
it as a test failure instead of skipping the entire test. This is especially
useful since `#if feature-name` only affects part of the test and failing to
test the feature should not skip the entire test. It also allows us to capture
issues about mis-spelled feature names or stale feature tests.
This has bitten us twice in the past:
- D18819680 removed `pure` and accidentally disabled tests including
`test-install.t`, `test-annotate.t` and `test-issue4074.t`. Those tests got
re-enabled as part of D20155399; while they passed under Python 2,
the Python 3 tests were failing.
- D18088850 removed svn-related feature checks, which caused some issues
that got fixed by D18713921 and D18713922.
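The behavior change amounts to something like this sketch (the feature registry and availability set here are made up for illustration):

```python
KNOWN_FEATURES = {"symlink", "execbit", "py2"}  # hypothetical registry
AVAILABLE = {"symlink", "execbit"}              # pretend these exist locally

def check_feature(name):
    """Return whether a feature is available.

    Unknown feature names are treated as test failures rather than
    skipping the entire test, so misspelled or removed feature names
    (as in the D18819680 and D18088850 incidents) are caught loudly.
    """
    if name not in KNOWN_FEATURES:
        raise AssertionError("unknown feature: %s" % name)
    return name in AVAILABLE
```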
Reviewed By: xavierd
Differential Revision: D20231782
fbshipit-source-id: 6adf99bd79b2a295d4e84ce4da5f9425a100936a
Summary: The test should not assert Python version is "2.*".
Reviewed By: kulshrax
Differential Revision: D20231781
fbshipit-source-id: 2e10c37bb4b665bc4d5d4b27329c4c2cb23d54e3
Summary: Make these functions generic so that callers don't need to construct a trait object whenever they want to call them. Passing in a trait object should still work so existing callsites should not be affected.
Reviewed By: krallin
Differential Revision: D20225830
fbshipit-source-id: df0389b0f19aa44aaa89682198f43cb9f1d84b25
Summary: Add a method to `HgFileContext` to stream the history of the file. Will be used to support EdenAPI history requests.
Reviewed By: krallin
Differential Revision: D20211779
fbshipit-source-id: 49e8c235468d18b23976e64a9205cbcc86a7a1b4
Summary: Add an 'HgTreeContext' struct to the 'hg' module to allow querying for tree data in Mercurial-specific formats. This initial implementation's primary purpose is to enable getting the content of tree nodes in a format that can be written directly to Mercurial's storage.
Reviewed By: krallin
Differential Revision: D20159958
fbshipit-source-id: d229aee4d6c7d9ef45297c18de6e393d2a2dc83f
Summary: I was looking in the `edenfs_events` table and saw that sandcastle was logging to this table. Rice was able to identify that the reason was because the integration tests were logging. So if we're running integration tests, we should return a `NullTelemetryLogger`. The daemon currently does not log on sandcastle AFAIK.
Reviewed By: simpkins
Differential Revision: D20203556
fbshipit-source-id: e09175347631478cb366d4fa2c6092d976504dd8
Summary: `buck run edenfsctl -- du --clean` would help reduce the space used by the storage engine.
Reviewed By: chadaustin
Differential Revision: D20200616
fbshipit-source-id: 6ffa588fc71660a6a80d81aef7d58dda08932374
Summary:
Add a `get_build_info()` method to the `EdenFSProcess` objects returned by
the `process_finder` module. This returns information about the process
version and build time.
Reviewed By: wez
Differential Revision: D20178487
fbshipit-source-id: b1eb41de9184ca59dc1e90d0a92ff1cbc89a6b77
Summary:
Store member variables in a local variable so that Pyre will allow unwrapping
it from an `Optional` type. Pyre refuses to allow member variables to be
extracted from `Optional` since other functions called indirectly could modify
them.
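A minimal sketch of the pattern (class and attribute names are invented here):

```python
from typing import Optional

class Checker:
    def __init__(self) -> None:
        self.config: Optional[dict] = None

    def config_value(self, key: str) -> object:
        # Checking `self.config is None` and then reading `self.config`
        # again does not narrow the Optional: a function called in
        # between could reset the attribute, so Pyre refuses the
        # narrowing. Copying to a local variable first makes it sound.
        config = self.config
        if config is None:
            raise RuntimeError("not configured")
        return config[key]
```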
Reviewed By: fanzeyi
Differential Revision: D20212162
fbshipit-source-id: 95655b73b5e469688f48d402c0b587928cbb0a35
Summary:
This makes it clear that `log` is a math concept, not an append-only file like
`Log`.
Reviewed By: DurhamG
Differential Revision: D20149376
fbshipit-source-id: 67d2e9584b15f48759ca9b6dfce4279a5b1365a0
Summary: This makes it friendly to Python 2.
Reviewed By: sfilipco
Differential Revision: D20162233
fbshipit-source-id: 5beb7a0f52159afc454332ff6e37e13087177cc0
Summary:
When I run `hg doctor` in my www checkout, it fails the assertion that the
first line of visibleheads is "v1". Make it graceful so doctor can check and
fix other components.
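The graceful shape looks roughly like this (function name and the repair step are hypothetical sketches, not the actual doctor code):

```python
def check_visibleheads(lines, fix=False):
    """Report problems instead of asserting, so doctor can continue
    checking and fixing other components after this one."""
    problems = []
    if not lines or lines[0].strip() != "v1":
        problems.append(
            "visibleheads: unexpected header %r"
            % (lines[0].strip() if lines else None)
        )
        if fix:
            # Hypothetical repair: reset to an empty v1 store.
            lines[:] = ["v1"]
    return problems
```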
Reviewed By: DurhamG
Differential Revision: D20147969
fbshipit-source-id: 6aee2cab962fcd0ef06a0611d288021e86621249
Summary: Updated find_eden to find the Eden clone on a Windows system. On Windows we don't use symlinks, which makes the logic different from the POSIX implementation.
Reviewed By: simpkins
Differential Revision: D19953934
fbshipit-source-id: bfbc112c3ccc48735ec6590746d8275cc9850796
Summary:
In D20130406 I updated the CLI to call `getDaemonInfo()` to check on the
server status. However, some very old EdenFS instances do not have this
method. These instances should all be gone shortly, but for now update the
code to handle the unknown method error and fall back to calling `getPid()`
and `getStatus()` separately.
I implemented this in our `EdenClient` wrapper class, similar to our existing
wrapper for `getPid()`.
Reviewed By: fanzeyi
Differential Revision: D20212518
fbshipit-source-id: 9d48bdd26822802a7e9776128c5567436d4bb445
Summary:
Update the import statements so that autodeps works on the `eden/py`
directories.
Reviewed By: fanzeyi
Differential Revision: D20212519
fbshipit-source-id: 37ccabf14dc0dbfe998664260ae9b83c9136ad63
Summary:
Update process_finder.py to look for a process's Eden state directory by
looking through its open FDs to find the EdenFS lock file, if it can't find
the state directory from the command line arguments.
At the moment we almost always invoke EdenFS with an explicit `--edenDir`
argument, but this change will allow the code to work even if we remove that
in the future.
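On Linux the FD-scanning fallback can be sketched like this (function name is hypothetical; it relies on `/proc/PID/fd`, so it is Linux-only):

```python
import os

def state_dir_from_fds(pid):
    """Fallback: find the EdenFS state directory by scanning a
    process's open file descriptors for its lock file, for when the
    directory can't be determined from command-line arguments."""
    fd_dir = "/proc/%d/fd" % pid
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed while we were scanning
        if os.path.basename(target) == "lock":
            return os.path.dirname(target)
    return None
```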
Reviewed By: wez
Differential Revision: D20178484
fbshipit-source-id: 361b78f4a2566b8c09ce02fb21c46233d7e2546b
Summary:
Update the `FakeProcessFinder.add_edenfs()` function to also add a fake
privhelper process in addition to the main edenfs process. This allows the
tests to more accurately simulate the normal edenfs behavior.
Reviewed By: wez
Differential Revision: D20178482
fbshipit-source-id: edc70ade1b61929b37f13ece77757c7c35aa4eec
Summary:
Update the code in `process_finder.py` to return EdenFS processes owned by all
users. We now report the `uid` as a field in the returned process info, so
that callers can filter the results based on user ID if they want. This
allows callers more flexibility when finding processes.
This also updates the `FakeProcessFinder` test utility code to support
providing fake UIDs to allow testing this behavior.
Reviewed By: wez
Differential Revision: D20178490
fbshipit-source-id: 6b76e1109e4835b167c80688fd3ace50f7986a22
Summary:
Move the code to find rogue EdenFS processes out of the generic
`process_finder` module and into the `check_rogue_edenfs` module that is
specific to the `eden doctor` checks.
The `ProcessFinder` class now exposes a `get_edenfs_processes()` API instead
of `find_rogue_pids()`, which makes it more generically usable outside of just
the doctor code.
Reviewed By: wez
Differential Revision: D20178486
fbshipit-source-id: e289f1673a5d4a666e9d54e8f58f4f00bdde94b7
Summary: - added logging only around the import tree call to capture non-queue related wait time
Reviewed By: chadaustin, fanzeyi
Differential Revision: D20207472
fbshipit-source-id: d88bb34ce224a26ff2be100d7789ddeff608006d
Summary:
- added logging only around the import blob call to capture non-queue related wait time
- added to `test_reading_file_gets_file_from_hg` in `integration.stats_test.HgBackingStoreStatsTest` to test import blob logging in addition to the get blob logging
(not yet done for importing trees, will do in next diff)
Reviewed By: chadaustin
Differential Revision: D20201215
fbshipit-source-id: c89281fe7d3d6e89d111ac8cce9014adff44ac40
Summary:
Context: https://fb.workplace.com/groups/rust.language/permalink/3338940432821215/
This codemod replaces *all* dependencies on `//common/rust/renamed:futures-preview` with `fbsource//third-party/rust:futures-preview` and their uses in Rust code from `futures_preview::` to `futures::`.
This does not introduce any collisions with `futures::` meaning 0.1 futures because D20168958 previously renamed all of those to `futures_old::` in crates that depend on *both* 0.1 and 0.3 futures.
Codemod performed by:
```
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| sed 's,TARGETS$,:,' \
| xargs \
-x \
buck query "labels(srcs, rdeps(%Ss, //common/rust/renamed:futures-preview, 1))" \
| xargs sed -i 's,\bfutures_preview::,futures::,'
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| xargs sed -i 's,//common/rust/renamed:futures-preview,fbsource//third-party/rust:futures-preview,'
```
Reviewed By: k21
Differential Revision: D20213432
fbshipit-source-id: 07ee643d350c5817cda1f43684d55084f8ac68a6
Summary:
While we are transitioning from tokio 0.1 to tokio 0.2 we might need to use
[tokio_compat](https://docs.rs/tokio-compat/0.1.4/tokio_compat/) crate.
Let's add a helper macro similar to fbinit::test that uses tokio_compat
runtime.
Reviewed By: farnz
Differential Revision: D20213814
fbshipit-source-id: 18976e953011c8ada1fa915686e2dcb76ea288d5
Summary:
Well, we don't have a Tokio Compat runtime in Actix. This means Tokio 0.2 code
(e.g. Tokio 0.2 timers) blows up when executed in the API Server.
How do we fix this? By not running Mononoke code on Actix's runtime, and
instead running it on a Mononoke runtime we instantiated.
How do we do that? By passing a Tokio Compat Executor all the way down to the
place where Actix is about to consume our stream ... and at that point, we
spawn the stream on our runtime, and give Actix a dumb receiver that does work
when polled on a Tokio 0.1 runtime.
This feels like the end of the road for the API Server. Nothing about this is
even remotely sane, but it should take us through the API Server's eventual
demise and replacement with the Gotham-based EdenAPI Server, which runs on the
runtime of our choice (i.e. Tokio 0.2).
Reviewed By: farnz
Differential Revision: D20222294
fbshipit-source-id: 1646e35fe05b131b030e4962c8a7f68f72995035
Summary:
* Added intermediate (de)serializers for config types, so that we generate full Identity objects at config load time
* Implement FromStr for Identity
* Compare configured identities to presented identities in ratelimit middleware in order to decide whether or not to apply the limit
Reviewed By: krallin
Differential Revision: D20139308
fbshipit-source-id: 340c300db549575eb6d06efcbe437c0b1db4927b
Summary: We should support logging tags as well. I pass this along as a set until JSON construction because we do not want repeated values; tags are expected to be sets.
Reviewed By: chadaustin
Differential Revision: D20199632
fbshipit-source-id: 2b5c94f1747a9b30d7a97b605abfd0e39928464c
Summary:
Usually we have only one repo, but in the case of xrepo_commit_lookup we
actually have two. It's nice to know which permission check failed.
Reviewed By: krallin
Differential Revision: D20221509
fbshipit-source-id: ee98845767e72f99027ba18a8c5b374cb6f9f3ab
Summary: publish per-node-type progress stats so we can correlate storage access/load to the type of node traversed
Reviewed By: farnz
Differential Revision: D20181064
fbshipit-source-id: c741b526c50e86a3eee105fab57fd7bc3ecc063b
Summary: Add tests for existing default block casefolding_check behaviour, plus test demonstrating problem with casefolding_check=false
Reviewed By: farnz
Differential Revision: D20192412
fbshipit-source-id: 1aea0fc5581e0c44388a4224ca693698731d3cd5
Summary:
In targets that depend on *both* 0.1 and 0.3 futures, this codemod renames the 0.1 dependency to be exposed as futures_old::. This is in preparation for flipping the 0.3 dependencies from futures_preview:: to plain futures::.
rs changes performed by:
```
rg \
--files-with-matches \
--type-add buck:TARGETS \
--type buck \
--glob '!/experimental' \
--regexp '(_|\b)rust(_|\b)' \
| sed 's,TARGETS$,:,' \
| xargs \
-x \
buck query "labels(srcs,
rdeps(%Ss, fbsource//third-party/rust:futures-old, 1)
intersect
rdeps(%Ss, //common/rust/renamed:futures-preview, 1)
)" \
| xargs sed -i 's/\bfutures::/futures_old::/'
```
Reviewed By: jsgf
Differential Revision: D20168958
fbshipit-source-id: d2c099f9170c427e542975bc22fd96138a7725b0
Summary: Recently there are increased reports on EdenFS's backing repo stuck in interrupted transaction state, and the user has to manually run `hg recover` in their backing repo to fix the problem. This diff teaches `eden doctor` to automatically run that command for the users.
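The check can be sketched as follows (function name is hypothetical; the leftover `.hg/store/journal` file is the marker Mercurial leaves behind for an interrupted transaction):

```python
import os
import subprocess

def repair_interrupted_transaction(backing_repo):
    """If the backing repo has a leftover transaction journal, run
    `hg recover` for the user instead of making them do it by hand."""
    journal = os.path.join(backing_repo, ".hg", "store", "journal")
    if not os.path.exists(journal):
        return False  # no interrupted transaction, nothing to repair
    subprocess.check_call(["hg", "recover"], cwd=backing_repo)
    return True
```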
Reviewed By: simpkins
Differential Revision: D20109567
fbshipit-source-id: a7427834e98425be388741c7f214b9d7354ac44e
Summary:
Enable pyre-strict type checking for `process_finder.py`, and split it into
its own library.
Reviewed By: genevievehelsel
Differential Revision: D20178483
fbshipit-source-id: e6c62ca5d84c7b7e599ae00fb51df6f7e4c55a65
Summary:
A couple places in the CLI code (mostly used by `eden doctor`) were checking
`sys.platform` to tell if we were on Linux. Unfortunately these checks both
expected the value `linux2`. However, since Python 3.3 `sys.platform` is just
`linux` on Linux, and not `linux2`. This meant we were always hitting the
non-Linux code paths and skipping these checks.
This updates the code to check `platform.system()`. Based on the
documentation it sounds like this is intended to give a bit more consistent
behavior across different platforms and OS versions.
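The fix boils down to this (a minimal sketch, not the actual CLI helper):

```python
import platform

def is_linux():
    """Old code checked `sys.platform == "linux2"`, which silently became
    False everywhere on Python 3.3+, where the value is just "linux".
    platform.system() gives a consistent answer across versions."""
    return platform.system() == "Linux"
```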
Reviewed By: genevievehelsel
Differential Revision: D20178488
fbshipit-source-id: c908d5133a9c41e6a239a8893742d03f6c08527c
Summary:
Call `folly::setThreadName()` in the privhelper process when it starts. This
changes the command name reported in `/proc/PID/comm` and in `ps`.
The process name is limited to 15 bytes, so this shows up as `edenfs_privhelp`.
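The truncation follows from the kernel's thread-name limit (a quick illustration; the constant mirrors Linux's TASK_COMM_LEN):

```python
TASK_COMM_LEN = 16  # Linux limit: 15 name bytes plus a trailing NUL

def comm_name(requested):
    """What /proc/PID/comm will actually show for a requested name."""
    return requested[:TASK_COMM_LEN - 1]
```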
Reviewed By: fanzeyi
Differential Revision: D20199409
fbshipit-source-id: a5349bfab9230174aaa99c87f0db73fe31659186
Summary: One small step towards typing
Reviewed By: thatch
Differential Revision: D20090620
fbshipit-source-id: 811bb54159ab91e5560d115c20373eaf6542b2f9
Summary:
Small cleanup that removes a bunch of duplicate code.
That should make it easier to add other types of derived data to the warmer
Reviewed By: krallin
Differential Revision: D20193169
fbshipit-source-id: 437fe7981d8a71164dc9edfcc423e8c41cbe0967
Summary:
Add a new `hg` module to the `mononoke_api` crate that provides a `HgRepoContext` type, which can be used to query the repo for data in Mercurial-specific formats. This will be used in the EdenAPI server.
Initially, the `HgRepoContext`'s functionality is limited to just getting the content of individual files. It will be expanded to support querying more things in later diffs.
Reviewed By: markbt
Differential Revision: D20117038
fbshipit-source-id: 23dd0c727b9e3d80bd6dc873804e41c7772f3146
Summary:
This updates our middleware stack and introduces two new pieces of functionality:
- Middleware can now be async.
- Middleware can now preempt requests and dispatch a response.
The underlying motivation for this is to allow implementing Mononoke LFS's rate
limiting middleware in our existing middleware stack.
Reviewed By: kulshrax
Differential Revision: D20191213
fbshipit-source-id: fc1df7a14eb0bbefd965e32c1fca5557124076b5
Summary: D20121350 changed the methods for accessing file content on `FileContext` to no longer return `Stream`s. We should update the comments accordingly.
Reviewed By: ahornby
Differential Revision: D20160128
fbshipit-source-id: f5bfd7e31bc7e6db63f56b8f4fc238893aa09a90
Summary:
A bunch of files include folly/executors/GlobalExecutors.h transitively through thrift/lib/cpp2/async/Stream.h, which is going away. Explicitly include the header (and add dependency to target) in preparation for deleting Stream.h
drop-conflicts
Reviewed By: vitaut
Differential Revision: D20141838
fbshipit-source-id: 21c58cf82136287fc2d84ba5badec6b872106015
Summary:
This updates the hg_sync_job to update Globalrevs in hgsql before attempting to
sync bundles. This means that if we're syncing successfully, hg is in sync with
Mononoke, and if we fail (which should be very uncommon to begin with!), hg
might skip a little bit ahead, but that's OK.
This only makes sense when generating bundles — when doing pushrebase, hg would
be updating its own globalrevs.
Reviewed By: StanislavGlebik
Differential Revision: D20159262
fbshipit-source-id: 6736f8592682da1001c7c9c4c9444462b71913c2
Summary:
Our previous implementation of unodes had a problem with diamond merges:
essentially, because p1 and p2 might have the same file but with different
content, unode will always create a merge unode, which can be unexpected
(the code comment in unodes/derive.rs has more info about it).
This diff fixes the problem by introducing unodes v2. This allows us to import
new repos with new unode implementation while keeping the old repos with unode
v1.
This implementation uses a heuristic which should be fast and should do the
correct thing most of the time. In some cases it might exclude some parts of
the history completely. For example:
      O      <- merge commit, doesn't change anything
     / \
    P1  |    <- modified "file.txt" to "B"
    |  P2    <- modified "file.txt" to "B"
     \ /
     ROOT    <- created "file.txt" with content "A"
In that case history of "file.txt" starting from merge commit will contain only (P1, ROOT),
but it won't contain P2.
We also considered other options:
1) Move this heuristic to fastlog batch derived data. See D19973553 for more
details about why we decided not to do it.
2) Filter out parent unodes that are ancestors of other parent unodes. This should
always be correct, but it will be hard to implement, and it will be even harder to
make sure it always has good performance.
Reviewed By: krallin
Differential Revision: D19978157
fbshipit-source-id: 445ddd5629669d987e7aa88c35fecf0b34a40da0
Summary: I'd like to log all derivations to a single place so that it's easier to understand what was derived and where
Reviewed By: aslpavel
Differential Revision: D20140004
fbshipit-source-id: 305ea533031a04ff95995a6fe2a6e57e95a87026
Summary: Log the source node when validating so that we can more quickly reproduce any issues in a single step via the --walk-root option, rather than needing to run the entire walk again.
Differential Revision: D20098200
fbshipit-source-id: 6b0d7d151c97f25080953d6c0fbf431dc2cec6a8
Summary:
Also patch aho-corasick to fix the issue.
The issue was introduced by [an optimization path](063ca0d253) added in aho-corasick 0.7 series (used by globset 0.4.3).
aho-corasick 0.6.x (globset 0.4.2) are not affected.
The next aho-corasick release (0.7.9) contains the fix.
See https://github.com/BurntSushi/aho-corasick/issues/53 for more context.
Reported by: yns88
Reviewed By: DurhamG
Differential Revision: D20125697
fbshipit-source-id: 592375b43d7ee494bb3e916a1cb11c18f9ebe425
Summary:
`parentfunc` is only needed when adding new nodes to the DAG.
Move it to `addheads` methods instead.
Reviewed By: sfilipco
Differential Revision: D20155398
fbshipit-source-id: 0bddd5f46e84c44891928b9f598a38206917aecb
Summary:
Migrate away from some uses of revision numbers.
Some dead code in discovery.py is removed.
I also fixed some test issues when I run tests locally.
Reviewed By: sfilipco
Differential Revision: D20155399
fbshipit-source-id: bfdcb57f06374f9f27be51b0980652ef50a2c8e0
Summary:
`hiddenoverride` is a hacky implementation that preserves part of another hacky
`inhibit` extension. With our modern setup (inhibit or narrow-heads),
`hiddenoverride` is less useful. Therefore just remove it.
Reviewed By: sfilipco
Differential Revision: D20148011
fbshipit-source-id: f4a5f05b67ae6f315e9b07d50ef03018d6d05df5
Summary:
This makes it so that DAG calculations in NameDag are all using commit hashes.
The `id2node`, `node2id` APIs are still using integer ids, and hopefully their
usage can eventually be removed.
Reviewed By: sfilipco
Differential Revision: D20020527
fbshipit-source-id: ee32b1ccacabd5174ff1556e426b5ed32d2b8507
Summary:
This exposes the NameSet type to the Python world.
The code is similar to the SpanSet wrapper that exists in pydag.
Reviewed By: sfilipco
Differential Revision: D20020521
fbshipit-source-id: 840e009eadca7154f11ca61561da4c48022088f6
Summary: This makes it possible to use NameIter in py_class.
Reviewed By: sfilipco
Differential Revision: D20020529
fbshipit-source-id: b9147b7dccb38d18d8361b420507fcbe97e01351
Summary:
Mercurial has a special case that b'\0' * 20 maps to rev -1 and means
"an empty commit". This cannot be cleanly supported by the zstore commit data,
since sha1("") is not '\0' * 20 and zstore does not allow faked SHA1 keys.
Therefore let's add the special case in the bindings layer. It's possible to
do this check in Python, but that'll be slower.
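The special case looks roughly like this (a Python sketch of the idea; the real check lives in the Rust bindings layer):

```python
NULL_ID = b"\x00" * 20  # hg's null node: rev -1, "an empty commit"

def get_commit_data(store, node):
    """Look up commit data by node, special-casing the null id.

    zstore keys are the SHA-1 of the content, and sha1(b"") is not
    twenty zero bytes, so the null node can never legitimately appear
    in the store; it must be handled before the lookup.
    """
    if node == NULL_ID:
        return b""  # the empty commit
    return store[node]
```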
Reviewed By: sfilipco
Differential Revision: D20020520
fbshipit-source-id: 0686832666646f2e201035992e3951b47c32eb5a
Summary: Use the new NameDag as the backing structure and expose its APIs.
Reviewed By: sfilipco
Differential Revision: D20020528
fbshipit-source-id: ccb49e1a5e757bd35a3f71cfb54ceccfb544664e
Summary: This will be used by commit hash prefix lookup.
Reviewed By: sfilipco
Differential Revision: D20020523
fbshipit-source-id: f2905ddf63098704b08dad8eb48272c3ffba7e25
Summary: Export common types at the top-level of the crate so it's easier to use.
Reviewed By: sfilipco
Differential Revision: D20020526
fbshipit-source-id: e9a0a8bc3cc91f81d0bc74e7530cd4613fc1dd61
Summary: Those just delegate to IdDag for the actual calculation.
Reviewed By: sfilipco
Differential Revision: D20020522
fbshipit-source-id: 272828c520097c993ab50dac6ecc94dc370c8e8b
Summary: This will be used to produce NameSet.
Reviewed By: sfilipco
Differential Revision: D20020519
fbshipit-source-id: abf6d73f2b985b74560d6b5db2800ff25450f02e
Summary: DagSet's SpanSet has fast paths for set operations. Use them.
Reviewed By: sfilipco
Differential Revision: D19912104
fbshipit-source-id: 24b55aa14d03be2f1be59c923e0b8e79d6bcbe6d
Summary: This is similar to hg's fullreposet. It'll be useful as a dummy "subset".
Reviewed By: sfilipco
Differential Revision: D19912108
fbshipit-source-id: 33a95bcb3cf5931a431a1201d1a1f3c627cec7a1
Summary: SortedSet is a wrapper to other sets that marks it as topologically sorted.
Reviewed By: sfilipco
Differential Revision: D19912111
fbshipit-source-id: 2637e8fd29b97f6db0c5bae3f0decd7ac382eeb1
Summary:
Wraps SpanSet + IdMap so it only exposes commit names without ids.
There is no equivalent smartset in Mercurial.
Reviewed By: sfilipco
Differential Revision: D19912112
fbshipit-source-id: 0d257de11527dfa8836065ac94f652730a97a468
Summary: Similar to Mercurial's smartset.baseset. All names are statically known.
Reviewed By: sfilipco
Differential Revision: D19912105
fbshipit-source-id: e4fcf2d59291adb3ca01b3b90f1ac32c65ad7eaa
Summary: The python files were missing in the package, let's add them.
Reviewed By: quark-zju
Differential Revision: D20163637
fbshipit-source-id: 0a7870a21c42d9b92a8b78b51e4954db0d96c593
Summary:
Blame can use a templater which doesn't support bytes. Let's just force
all blame output to be unicode, since it doesn't make a ton of sense to blame
binary files anyway.
Also fix test-annotate.py.
Reviewed By: quark-zju
Differential Revision: D19907530
fbshipit-source-id: a7a47246368ed50f65486e824f93552872adc09a
Summary:
Notably, we drop all the encoding business when dealing with json
objects, and instead use mercurial.json.
Reviewed By: sfilipco
Differential Revision: D19888130
fbshipit-source-id: 2101c32833484c37ce4376a61220b1b0afeb175a
Summary:
I noticed in my earlier Bytes 0.5 diff that this doesn't have local test
coverage (there might be things somewhere else in the test suite that look for
it). Let's add some.
Reviewed By: ahornby
Differential Revision: D20139437
fbshipit-source-id: c17e4516574d674bb0b009cd1f322008fb3c1a79
Summary:
This is an example about how to use the new Bytes type. The performance change
is not obviously visible in benchmarks since the bottleneck is not at the bytes
copying.
Reviewed By: DurhamG
Differential Revision: D19818720
fbshipit-source-id: a431ae206cfa4fa08b2e162a48b3d7cbcd900f7f
Summary: The APIs are compatible so the switch is straightforward.
Reviewed By: DurhamG
Differential Revision: D19818713
fbshipit-source-id: 504e9149567c90eb661804e0dad20580a401aa76
Summary:
D20042045 changes the meaning of "lag_threshold". Update the value in mutation
store accordingly.
Reviewed By: DurhamG
Differential Revision: D20043116
fbshipit-source-id: 154e6dc2aa88ab0a9a9b21929ae5fa6163dcd403
Summary:
Previously, indexes were only updated at `sync()` time. This diff makes it so
`open()` can also update lagging indexes. This should make index migration
(ex. D19851355) smoother - indexes are built just in time and users suffer
less from the absence of indexes.
Reviewed By: DurhamG
Differential Revision: D20042046
fbshipit-source-id: 20412661a0ca4f5f67b671137c47b6373a42981d
Summary: The logic is currently only used by `sync()`. I'd like to reuse it at `open()`.
Reviewed By: DurhamG
Differential Revision: D20042044
fbshipit-source-id: 5c9734ff68bdcf8f8c8710c6a821b18d3afeaca0
Summary:
This is more friendly for indexedlog users - deciding lag_threshold by number
of entries is easier than by bytes.
Initially, I thought checking `bytes` is cheaper and checking `entries` is more
expensive. However, practically we will have to build indexes for `entries`
anyway. So we do know the number of entries lagging behind.
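The combined behavior of this diff and D20042046 can be sketched as (class and field names are illustrative, not the indexedlog API):

```python
class Log:
    """Minimal sketch: a log whose lagging indexes are updated on open().

    lag_threshold is measured in entries, not bytes: callers reason
    naturally about "how many entries may lag", and since entries must
    be parsed to index them anyway, counting them is free.
    """

    def __init__(self, entries, indexed_count=0, lag_threshold=10):
        self.entries = entries
        self.indexed_count = indexed_count
        self.lag_threshold = lag_threshold
        self.maybe_update_indexes()  # open() now updates lagging indexes

    def lagging_entries(self):
        return len(self.entries) - self.indexed_count

    def maybe_update_indexes(self):
        if self.lagging_entries() > self.lag_threshold:
            self.indexed_count = len(self.entries)  # rebuild to tip
```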
Reviewed By: DurhamG
Differential Revision: D20042045
fbshipit-source-id: 73042e406bd8b262d5ef9875e45a3fd5f29f78cf
Summary:
This can be useful for users of indexedlog when they want `Bytes` (to get rid
of the lifetime parameter).
This might be useful for storage layer that wants to take the ownership of the
returned bytes.
Reviewed By: xavierd
Differential Revision: D19818714
fbshipit-source-id: cb2d4e7deff921915e07454fee15cb94a3d5c00d
Summary: Those utilities are no longer necessary since the new code uses Bytes.
Reviewed By: xavierd
Differential Revision: D19818717
fbshipit-source-id: 0b43af0f1eae1a4288e84d4170db058b27f80334
Summary: This simplifies the code a bit and makes it cheaper to clone the Log.
Reviewed By: xavierd
Differential Revision: D19818716
fbshipit-source-id: bbf07b8b36009d53b63d8066ec422fc3c3796840
Summary: It's no longer used since Index now has inlined its checksum logic.
Reviewed By: ikostia
Differential Revision: D19850744
fbshipit-source-id: eb134e4c1613573a2d238710b44ad8119c80a5ee
Summary:
Change index filename and metadata name. This makes sure the new format and old
format are separate so upgrading or downgrading won't have issues.
Reviewed By: DurhamG
Differential Revision: D19851355
fbshipit-source-id: 25dee018073a90040f5818b32b753a3f589c10e0
Summary:
Enhance the index format: The Root entry can be followed by an optional
Checksum entry which replaces the need of ChecksumTable.
The format is backwards compatible since the old format will just be
treated as "there is no ChecksumTable", and the ChecksumTable will be built on
the next "flush".
This change is non-trivial. But the tests are pretty strong - the bitflip test
alone covered a lot of issues, and the dump of Index content helps a lot too.
For the index itself (without the ".sum" checksum file), this change is
bi-directionally compatible:
1. New code reading old file will just think the old file does not have the
checksum entry, similar to new code having checksum disabled.
2. Old code will think the root+checksum slice is the "root" entry. Parsing
the root entry is fine since it does not complain about unknown data at the
end.
However, this change dropped the logic updating ".sum" files. That part is an
issue blocking old clients from reading new data.
Reviewed By: DurhamG
Differential Revision: D19850741
fbshipit-source-id: 551a45cd5422f1fb4c5b08e3b207a2ffe3d93dea
Summary:
To solve the soundness issue of ChecksumTable raised by the last diff.
I plan to move Checksum logic to Index. This has multiple benefits:
- Solve the soundness issue of ChecksumTable.
- Indexedlog no longer writes the ".sum" files. `atomic_write` can be quite
slow (tens of milliseconds) on Windows. So this should help perf - with
many indexes, it can save hundreds of milliseconds on Windows per
indexedlog sync.
This diff adds the definition and serialization of the new Checksum entry.
The index format is not updated yet.
Reviewed By: markbt
Differential Revision: D19850742
fbshipit-source-id: df6e6ed12a12ef0d2a782dc9d6b4dc5dec3f4b46
Summary:
With the last change, mmap cost is reduced, but ChecksumTable is unsound in a
corner case: the buffer to check is shorter than what ChecksumTable covers:
checksum: |----chunk----|----chunk----|----chunk--|
buf:      |-------------------------------|       |
                                          ^       ^
                                  logic len       physical len
The checksum table will be unable to verify the last chunk, since it does not
have enough data in buf.
The issue is exposed by stress-testing the multithreaded sync tests. It's not
always easy to reproduce, though.
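The arithmetic of the corner case can be sketched: only chunks that lie entirely inside the buffer can be verified, so the last chunk between the logic len and the physical len is unverifiable. The chunk size and helper name below are illustrative, not the real ChecksumTable code.

```rust
const CHUNK: usize = 4;

// A chunk is checkable only if all of its bytes are in the buffer.
fn verifiable_chunks(buf_len: usize, covered_len: usize) -> usize {
    buf_len.min(covered_len) / CHUNK
}

fn main() {
    let physical_len = 12; // three 4-byte chunks were checksummed
    let logic_len = 10;    // but the buffer only exposes 10 bytes
    assert_eq!(verifiable_chunks(logic_len, physical_len), 2);
    // The third chunk spans bytes 8..12; with only 10 bytes available,
    // it cannot be verified -- the soundness gap described above.
    println!("ok");
}
```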
Reviewed By: markbt
Differential Revision: D19850745
fbshipit-source-id: a1a96080163b7b9b56dcd6c1673d5d8d10e18a2b
Summary: This avoids some extra mmap syscalls by ChecksumTable.
Reviewed By: xavierd
Differential Revision: D19818721
fbshipit-source-id: dace55193f2b4b0f35e3868781faa2d2998d3b58
Summary:
This simplifies the code a bit (no special cases about 0-sized mmap buffers)
and makes it cheaper to clone the index buffer (just an Arc::clone, without
another mmap syscall).
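The cheap-clone point can be illustrated with a stand-in buffer: once the index bytes live behind an `Arc`, handing out another view is a refcount bump rather than another mmap syscall. `clone_views` is a hypothetical helper, not the real indexedlog API.

```rust
use std::sync::Arc;

// Each "view" shares the same underlying allocation; no syscalls here.
fn clone_views(buf: &Arc<[u8]>, n: usize) -> Vec<Arc<[u8]>> {
    (0..n).map(|_| Arc::clone(buf)).collect()
}

fn main() {
    let buf: Arc<[u8]> = Arc::from(vec![1u8, 2, 3].into_boxed_slice());
    let views = clone_views(&buf, 3);
    assert_eq!(Arc::strong_count(&buf), 4); // one owner + three views
    assert_eq!(&*views[0], &[1u8, 2, 3][..]);
    println!("ok");
}
```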
Reviewed By: xavierd
Differential Revision: D19818718
fbshipit-source-id: e96d42af74c7f0bb11703c5da31cdfbd5d76c372
Summary:
TreeSpans used to use `&str`, which adds a lifetime to the struct and makes it
harder to use from Python. Use a type parameter instead, so that
`TreeSpans<String>` can be used.
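A toy version of the change, with the struct reduced to a single field (the real TreeSpans type is richer): with a type parameter instead of a borrowed `&str`, the owning form has no lifetime and can be handed to Python bindings.

```rust
struct TreeSpans<S> {
    names: Vec<S>,
}

fn main() {
    let data = String::from("span");
    // Borrowed form: tied to the lifetime of `data`.
    let borrowed: TreeSpans<&str> = TreeSpans { names: vec![&data] };
    // Owned form: 'static, safe to keep around across the FFI boundary.
    let owned: TreeSpans<String> = TreeSpans { names: vec![data.clone()] };
    assert_eq!(borrowed.names[0], owned.names[0]);
    println!("ok");
}
```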
Reviewed By: DurhamG
Differential Revision: D19797708
fbshipit-source-id: c66429abfaf16d876151ca6f29da976bed91485d
Summary:
The filtering interface allows callsites to select what they want. It's similar
to a manifest walk with file or directory matchers in source control.
Reviewed By: DurhamG
Differential Revision: D19784467
fbshipit-source-id: 5cf6e4016d6fa1c90f8aeccc50809baccd4af5ab
Summary: The idea is that instants (events) can be a drop-in replacement for `ui.log`.
Reviewed By: DurhamG
Differential Revision: D19782897
fbshipit-source-id: 795bbba23d921e460f723f19ef529b203aea366a
Summary: This function will be reused by the next diff.
Reviewed By: DurhamG
Differential Revision: D19782895
fbshipit-source-id: 1e636eabee9b0dffd287a1e6784a24ab2259f51f
Summary: This allows us to define methods on the treespans, such as filtering APIs.
Reviewed By: DurhamG
Differential Revision: D19782896
fbshipit-source-id: 2e7bd8344c0196e382728c26a8233abf944bbf29
Summary: Add the ability to track the route to a node, so that one can report the node from which a failing step started.
Reviewed By: ikostia
Differential Revision: D20097615
fbshipit-source-id: 4f2c000f54bd212225533e7f3570178020f34a9d
Summary:
In case this starts to cause problems, let's have a way to correlate those
problems with some exported metrics.
Reviewed By: StanislavGlebik
Differential Revision: D20158822
fbshipit-source-id: 6ac9e25861dbedaecdf04fd92bda835ae66535eb
Summary:
## Wider goal
See D20068839
## This diff
This diff actually implements the conditional hydration of `getbundle`
responses, as described in the D20068839.
Note that as well as implementing support for hydrated `getbundle` responses, this diff also implements support for changegroup v3 and LFS in such responses, which is needed if we are to do this kind of thing in an LFS-enabled repository.
Reviewed By: StanislavGlebik
Differential Revision: D20068838
fbshipit-source-id: fbdd3f8f5fb7cd2cb60473a94094553a1d4b4d2f
Summary:
Extend the session id logging to the validate command by adding the ability to
set the progress reporter's scuba builder.
Reviewed By: ikostia
Differential Revision: D20074153
fbshipit-source-id: ceaeebdb7eb976080061ad3b76b22d7a0f7bd891
Summary: I canaried with this but I forgot to fold it in -_-
Reviewed By: HarveyHunt
Differential Revision: D20158157
fbshipit-source-id: 4a570bbca421d8c3e1e66605f164f2b8e2a433f6
Summary:
Most binaries don't need hooks. Let's not require them. This might not be very
long lived since Simon is working on removing lua hooks, but this was a trivial
fix.
Reviewed By: johansglock
Differential Revision: D20140026
fbshipit-source-id: cc74b37459f63c5dd550c5779b72aa1d6531202c
Summary:
(this doesn't remove ad-hoc leases, like derived data)
Let's see if this has any impact on performance. We no longer fail Manifold
writes on conflicts, and
Reviewed By: StanislavGlebik
Differential Revision: D20038572
fbshipit-source-id: 4a972ff09ceb65e69a1d22a643a8f2d9b2ab1b17
Summary: The Thrift generated code depends only on futures 0.3, not 0.1. Thus it isn't necessary to depend on renamed:futures-preview and we can depend on futures-preview directly, which is exposed to Rust code as `futures::`.
Reviewed By: jsgf
Differential Revision: D20145921
fbshipit-source-id: 5cae94ec6747a374c2bf05f124ab237c798de005
Summary:
The last uses of futures 0.1 were removed in D18411564 and D18392252.
A later diff will switch thrift from using renamed:futures-preview to plain futures-preview to prepare for eliminating the -preview suffix.
Reviewed By: jsgf
Differential Revision: D20143832
fbshipit-source-id: b7fd79f18368ade59eeba6ed0ac09613000c046b
Summary:
Add a `TelemetrySample.fail()` method to report error information in a sample,
even when it is used in a `with` context that completes without an exception.
Also add a `TestTelemetryLogger` to help check telemetry logging behavior in
unit tests.
Reviewed By: genevievehelsel
Differential Revision: D20136170
fbshipit-source-id: ad94d044c7ae0835e3fe17aaa74eb92dfd41bf8e
Summary: It was a list. Make it possible to use it as a string.
Reviewed By: xavierd
Differential Revision: D20144811
fbshipit-source-id: b280c0344215a4c23ab9c63d89f47adf34fb06f3
Summary: This should help reduce test flakiness.
Reviewed By: xavierd
Differential Revision: D19872952
fbshipit-source-id: d66f6c404534b3f47903b478e3cdfdda5ed46284
Summary:
The state entry of a dirstate tuple is a single character. In python 3
it's a unicode string. To parse it, previously we used 'C' which takes a single
character unicode string and (little did I know) returns an int. We were storing
this in a char, which causes corruption.
Let's switch to reading the string, and just grabbing the first byte.
Reviewed By: xavierd
Differential Revision: D20143094
fbshipit-source-id: d9946c0cefdafe0941f4bdac070659fac27f30e3
Summary: Continue to push `compat()` deeper into subcommands. This enables us to refactor each file one at a time and ultimately remove the old futures from our code base.
Reviewed By: farnz
Differential Revision: D20132126
fbshipit-source-id: cc10dde6eda7ddcbf911dbe8d3ebe1713f8ec2ab
Summary: Makes the code a little nicer to work with.
Reviewed By: HarveyHunt
Differential Revision: D20138720
fbshipit-source-id: 19f228782ab3582739e35fddcb2b0bf952110641
Summary:
Paths are in a different replica, so they can be missing even if copy info is
present. Let's fall back to master in this case.
Differential Revision: D20098902
fbshipit-source-id: 838ab1c70a74420c431a2f442f1504c8edd29a2e
Summary:
Locking by physical shard worked earlier in this stack as indicated in the
benchmarks, but after Ondemand restored their fetching for www, it proved
insufficient in terms of parallelism, and resulted in substantially slower
gettreepacks.
Besides, with the "physical sharding" approach, we found ourselves between a rock and a hard place in terms of what to do with paths:
- We could keep holding the semaphore for a filenode while fetching paths. This is undesirable because it further limits our level of concurrency (because fetching a filenode + paths is going to be at least 2x as slow as fetching a filenode).
- We could fetch them without holding a lease at all. This is even more undesirable, because it means that when we release the semaphore for a given shard, we haven't filled the cache yet. This means that if we have a queue of 2 requests for the same bit of data, we're going to fetch twice (task A acquires the lock, goes to MySQL for the filenode, releases the lock and starts going to paths, at which point task B acquires the lock and goes to MySQL again since the filenode hasn't been filled yet).
To fix this, I had to add a dedicated cache for paths, and put it behind semaphores as well. In the example above, this would ensure task B finds a "partial filenode" in the cache and doesn't go to MySQL (instead, it goes straight up to queuing for access to paths, where it will wait behind task A and also won't hit MySQL).
There are a few problems with this:
- It's a lot of extra complexity (because we need to handle half misses where we have the filenode but not the path).
- It ties together our level of concurrency a second time to that of the underlying number of physical shards, which is kinda meaningless when some of this data can be provided by Memcache to begin with.
This diff fixes both problems.
The root cause of our problem is that we're tying our level of concurrency to
physical MySQL shards, whereas what we actually want is a tunable level of
concurrency that matches our workload, yet effectively deduplicates queries.
In this diff, I'm updating our exclusive locking to be purely virtual. This
means that we're still not over-fetching, but we are no longer constrained by
the parallelism of the underlying DB (this does mean we might queue up requests
there, but they won't be duplicate requests).
This also results in simpler code, and opens up the way for further
improvements in the future, such as using Memcache lease-get operations to
further deduplicate calls, if we'd like.
As part of that, I've also updated our remote_cache to use the same CacheKey
entity as the local cache, to avoid spending time producing new keys when we
have perfectly good ones available.
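A minimal single-process sketch of the "purely virtual" exclusive locking described above: one lock per cache key rather than per physical shard, so duplicate fetches of the same key serialize while unrelated keys proceed in parallel. The real code uses async semaphores; all names here are illustrative.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

struct KeyedLocks {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl KeyedLocks {
    // Return the (virtual) lock for a cache key, creating it on demand.
    fn lock_for(&self, key: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry(key.to_string())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}

fn main() {
    let locks = KeyedLocks { locks: Mutex::new(HashMap::new()) };
    let a1 = locks.lock_for("filenode:www/foo.php");
    let a2 = locks.lock_for("filenode:www/foo.php");
    let b = locks.lock_for("filenode:www/bar.php");
    // Same key -> same lock (duplicate queries serialize); different
    // keys don't contend, regardless of how many physical shards exist.
    assert!(Arc::ptr_eq(&a1, &a2));
    assert!(!Arc::ptr_eq(&a1, &b));
    println!("ok");
}
```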
Reviewed By: StanislavGlebik
Differential Revision: D20097821
fbshipit-source-id: 03d7be9082982fc1c6ef365d541c1ed8ae3e6e8d
Summary:
This adds a test for our cache fill behavior, which is to fill the remote cache
if we miss in local cache. I hadn't added this earlier, and it's a little easier
to add now that the refactor of FilenodeInfo is through.
Reviewed By: ahornby
Differential Revision: D19905396
fbshipit-source-id: 88b5fd83f5d2213e91efc3c5dfb91dfe4e395136
Summary:
This updates our filenodes implementation to use different types for writing
(`PreparedFilenode`) and reading (`FilenodeInfo`).
The bottom line is that this avoids a bunch of cloning of paths on the read
path, which doesn't need to return the path to the caller, since the caller
already knows it! We can also take it out of Memcache, since we don't need
Memcache to tell us the path for a blob we could only possibly have found by
having the path to begin with.
This does update our filenodes serialization format. I bumped MC_CODEVER
accordingly.
Reviewed By: StanislavGlebik
Differential Revision: D19905400
fbshipit-source-id: 6037802c1773de564cade8e264d36087382ee15a
Summary:
This removes the old sqlfilenodes implementation, since we're now using the new
one. There's also a bit of cruft here and there we can get rid of.
Reviewed By: StanislavGlebik
Differential Revision: D19905395
fbshipit-source-id: 2526b6d65eeb981f5aedda9951b44b389ecec29d
Summary:
The former implementation would eagerly query Memcache when fetching history
(due to how old futures work) for files in getpack, but the new one does not.
This means the new one loses out on a lot of buffering, which the old one used
to do.
This diff emulates the old behavior by eagerly querying filenodes in getpack,
which improves performance on a very big getpack (32K files) by about 3x, and
makes it 30% faster than the old code, instead of > 2x slower.
Note that I'm not certain we really want to do this kind of aggressive
buffering in getpack long term, but for now, I'd like to keep this unchanged.
Reviewed By: StanislavGlebik
Differential Revision: D19905398
fbshipit-source-id: 49f9a2cd505a98123fd1dabb835e8e378d45c930
Summary:
This updates Mononoke to use the new filenodes implementation introduced
earlier in this stack.
See the test plan for detailed performance results supporting why I'm making
this change.
Reviewed By: StanislavGlebik
Differential Revision: D19905394
fbshipit-source-id: 8370fd30c9cfd075c3527b9220e4cf4f604705ae
Summary:
Since we have one connection per shard, it's a good idea to make sure we don't
keep those locked for too long. This diff adds generous timeouts to protect
against this, as well as ODS reporting to track errors.
Reviewed By: StanislavGlebik
Differential Revision: D19905393
fbshipit-source-id: ee4f4d3e33cf48a9002b016e31d37a401c6578f2
Summary:
This introduces caching of filenodes to Memcache as in the old filenodes
implementation. The code was mostly ported over from the existing filenodes
implementation and converted to async/await. However, one key difference is
that the lookups happen once we hold the semaphore to talk to the underlying
MySQL shard.
The reason for this is:
- Reads to Memcache are really fast. They're often under 1ms. If you're going
to miss in Memcache and have to go to SQL, it won't make you much slower.
- Reads to Memcache are kinda expensive CPU-wise. Data in Memcache is
compressed, and we often see a lot of our CPU cycles spent talking to Memcache
when we're under load.
- Memcache isn't an infinite resource. If we're reading the exact same
key a hundred times, that's going to hit the same Memcache box. A bit of
deduplication on our end is a nice thing to strive for. Besides, our own
thread pool we use to talk to Memcache is limited in size.
From a performance perspective, this doesn't make things any slower, but
reduces CPU usage when we'd otherwise have a lot of duplicate fetching.
Finally, note that this update also includes support for dirty-tracking in our
local cache. We use this to know if we should fill the remote cache (if we 100%
hit in local cache, we don't fill the remote cache).
Reviewed By: StanislavGlebik
Differential Revision: D19905390
fbshipit-source-id: 363f638bb24cf488c7cd3a8ecea43e93f8391d3f
Summary:
This is the meat of the change I'm trying to make here. This updates
newfilenodes to check their cache before dispatching queries to MySQL once they
acquire the connection.
Since we only get one connection per shard, this ensures that we don't query
several times for the same piece of data.
Note that the caching structure is a little different from the old one, which
cached entire filenode info. Instead, this now caches the exact data we'd get
out of MySQL, since we want to map MySQL queries 1-1 to cache lookups.
With this change, we also now have a local cache for file history queries.
Historically, we hadn't cached those at all, but with this change, we can get a
lot of value out of caching them even for a small period of time, in order to
de-amplify reads to MySQL and Memcache.
However, they are in separate cache pools to make sure they don't evict point
filenodes, which we use for gettreepack (and have a good hit rate, unlike
history blocks, which have a pretty poor hit rate).
Note that having those semaphored connections might feel a little scary, but
it's worth noting that the exact same bottleneck is implicitly present in the
existing filenodes implementation, since we can only have one active query to
any given shard at any given time. That said, this approach also gives us a little
more future flexibility, if we'd like, since we could map multiple semaphores to
"sub shards" that map N-to-1 to real, physical shards.
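The check-the-cache-after-acquiring pattern can be modeled synchronously: the second waiter for a key finds the value the first one filled and never issues its own query. This is a toy model, not the newfilenodes code; the counter stands in for MySQL round trips.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

fn fetch(cache: &Mutex<HashMap<u64, String>>, key: u64, queries: &mut u32) -> String {
    let mut guard = cache.lock().unwrap(); // "acquire the connection"
    if let Some(v) = guard.get(&key) {
        return v.clone(); // filled while we were queued: no query needed
    }
    *queries += 1; // stand-in for a MySQL round trip
    let v = format!("filenode-{}", key);
    guard.insert(key, v.clone());
    v
}

fn main() {
    let cache = Mutex::new(HashMap::new());
    let mut queries = 0;
    let a = fetch(&cache, 42, &mut queries);
    let b = fetch(&cache, 42, &mut queries);
    assert_eq!(a, b);
    assert_eq!(queries, 1); // the duplicate lookup hit the cache
    println!("ok");
}
```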
Reviewed By: HarveyHunt
Differential Revision: D19905391
fbshipit-source-id: 02b5efaa44789e6afcccdeb9ee2b4791f7c3c824
Summary:
This introduces a new implementation of filenodes that maintains its own
queuing on top of the queuing enforced by the SQL crate.
Later in this stack, the goal is for this implementation to avoid dispatching
duplicate queries when there is a lot of contention talking to MySQL, which
happens when large changes land and suddenly everyone wants the updated code.
The underlying goal is to avoid dispatching a lot of duplicate queries when
there is contention. Indeed, if there is contention, then the latency between
query and response increases. As a result, without visibility in the queue, the
following can happen:
- Task 1 looks for A in the cache. It misses
- Task 1 dispatches a SQL query
- Task 2 looks for A in the cache. It misses
- Task 2 dispatches a SQL query
- Task 3 looks for A in the cache. It misses
- Task 3 dispatches a SQL query
- ...
- Task 1's SQL query finally executes and fills the cache.
- All other queries execute anyway.
The longer the dispatch queue, the longer it takes to run those queries.
Looking at Mononoke's stats in prod, this happens pretty often:
https://pxl.cl/10xxmo (the spike at 3pm was a 10K-files change in fbsource, for
example).
The goal of this stack is to avoid this effect, by checking the cache only once
we know we're ready to go to SQL.
In this particular diff, what's added is:
- The SQL read and write implementation. This is all implemented using new
futures, but the logic should be largely unchanged from before (i.e. we store
filenodes and their associated copy info in shards keyed by the filenode's path
(not the source path, if there is copy info), and paths in their own shard).
The queries themselves are largely unchanged from the existing filenodes, with
only a few tweaks:
- Filenodes and copy info are now selected in one go.
- There are types to distinguish path hashes and paths.
- The structs to support this implementation.
Reviewed By: StanislavGlebik
Differential Revision: D19905397
fbshipit-source-id: bec981e7bfb396d62eb06e5ce249c21555afc64b
Summary:
The API expects a stream of filenodes to insert, but we actually never used
that ability. Instead, every single callsites has a `Vec`, which it converts to
a stream and passes that in.
I'd like to change this for two reasons:
- It's unnecessary.
- It makes the code more complex on the Filenodes implementation side, and less
efficient, since we need to `chunk()` there in small chunks, which might not
all be in the same shard. If we get the entire `Vec` at once, we can chunk on a
per-shard basis (this happens later in this stack).
Besides, if we end up having a stream and wanting the old behavior, we can
always `chunk()` the stream and call `add_filenodes` on each batch (which is
actually nicer, because if you have a futures 0.2 stream that isn't `'static`,
you can do this, but you can't turn it into a `BoxStream`!).
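Why receiving the whole `Vec` helps can be sketched: with every entry in hand, we can group per shard and issue one batched write per shard, instead of blindly chunking a stream. `shard_of` is a toy stand-in for the real path hashing.

```rust
use std::collections::HashMap;

fn shard_of(path: &str, shards: usize) -> usize {
    path.len() % shards // toy "hash"; the real code hashes the path
}

// Group entries so each shard gets exactly one batched write.
fn group_by_shard(paths: Vec<&str>, shards: usize) -> HashMap<usize, Vec<&str>> {
    let mut groups: HashMap<usize, Vec<&str>> = HashMap::new();
    for p in paths {
        groups.entry(shard_of(p, shards)).or_default().push(p);
    }
    groups
}

fn main() {
    let groups = group_by_shard(vec!["a", "bb", "ccc", "dd"], 2);
    assert_eq!(groups[&0], vec!["bb", "dd"]);
    assert_eq!(groups[&1], vec!["a", "ccc"]);
    println!("ok");
}
```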
Reviewed By: StanislavGlebik
Differential Revision: D19902537
fbshipit-source-id: a4c030c4a51afbb6e9db133b32464009eed197af
Summary:
This new method returns the content of a blob without the copy-from metadata
header.
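A sketch of stripping the header, assuming the Mercurial filelog convention that copy-from metadata sits between a leading and a trailing `\x01\n` marker; the real method's name, signature, and error handling differ.

```rust
fn strip_copy_metadata(blob: &[u8]) -> &[u8] {
    if blob.starts_with(b"\x01\n") {
        // Find the closing "\x01\n" marker after the opening one.
        if let Some(end) = blob[2..].windows(2).position(|w| w == b"\x01\n") {
            return &blob[2 + end + 2..]; // skip past the closing marker
        }
    }
    blob // no metadata header: return the content as-is
}

fn main() {
    let plain = b"hello".to_vec();
    let with_meta = b"\x01\ncopy: a.txt\n\x01\nhello".to_vec();
    assert_eq!(strip_copy_metadata(&plain), b"hello");
    assert_eq!(strip_copy_metadata(&with_meta), b"hello");
    println!("ok");
}
```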
Reviewed By: DurhamG
Differential Revision: D20102889
fbshipit-source-id: e96f636b7d30460b59707a2cb700d667e616116a
Summary:
Python json produces unicode strings in the parsed results. This breaks
when passed to parts of the code that now assert that byte strings are required
(like the wire protocol). Let's switch phabricator stuff to use Mercurial json,
which produces bytes in Python 2 and unicode in Python 3.
Reviewed By: ikostia
Differential Revision: D20123140
fbshipit-source-id: d1b11426736a0f43ff7e74acf709ab1fd70d5bfe
Summary:
Log a per-run session id to distinguish runs more easily.
This diff adds the session for scrub logging; the following diff extends this to validate/progress logging.
So that each tail has a separate session logged, setup is delayed until the start of each tail by passing it in as a function.
Differential Revision: D19907398
fbshipit-source-id: 8e5470918112321866c67c9f94e703fd46e6a16b