Summary:
I noticed that every time we fetch a blob from hg, we calculate its SHA-1 hash and put it into the metadata table.
Both calculating the SHA-1 of the content and writing it to RocksDB are fairly expensive, and it would be nice if we could skip doing so in some cases.
In this diff I use an inexpensive cache check to see if we have already calculated metadata for a given blob, and skip the recalculation if so.
In terms of performance, it reduces blob access time in the hot case from **0.62 ms to 0.22 ms**.
[still need to do some testing with buck, but I think this should not block the diff since it seems fairly trivial]
This is a short-to-medium-term fix; the longer-term solution will be keeping hashes in Mercurial and fetching them via EdenApi, but that will take some time to implement.
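As a rough illustration of the idea, here is a minimal Rust sketch. The real cache lives in EdenFS's C++ code and persists real SHA-1 digests in RocksDB; the in-memory map and `DefaultHasher` below are stand-ins:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// In-memory stand-in for the metadata table; `u64` stands in for the
/// real SHA-1 digest, which EdenFS stores in RocksDB.
struct MetadataCache {
    map: Mutex<HashMap<String, u64>>,
}

impl MetadataCache {
    fn new() -> Self {
        MetadataCache { map: Mutex::new(HashMap::new()) }
    }

    /// Return the cached hash if we already computed metadata for this
    /// blob; only hash (and store) on a cache miss.
    fn get_or_compute(&self, blob_id: &str, content: &[u8]) -> u64 {
        let mut map = self.map.lock().unwrap();
        if let Some(&h) = map.get(blob_id) {
            return h; // hot path: skip both hashing and the store write
        }
        // Stand-in for the real SHA-1 computation.
        let mut hasher = DefaultHasher::new();
        content.hash(&mut hasher);
        let h = hasher.finish();
        map.insert(blob_id.to_string(), h);
        h
    }
}

fn main() {
    let cache = MetadataCache::new();
    let first = cache.get_or_compute("blob1", b"some contents");
    let second = cache.get_or_compute("blob1", b"some contents");
    assert_eq!(first, second); // second access hits the cache
}
```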
Reviewed By: chadaustin, xavierd
Differential Revision: D30587132
fbshipit-source-id: 3b24ec88fb02e1ea514568b4e2c8f9fd784a0f10
Summary: Similarly to the enqueue benchmark, let's have a dequeue benchmark.
Differential Revision: D30560489
fbshipit-source-id: ae18f7e283e4bab228aaa0f58bff2e6f2cfa3021
Summary:
In order to enqueue and find an element in a hash table, the key needs to be
hashed. Hashing an HgProxyHash relies on hashing a string, which is significantly
more expensive than hashing a Hash directly. Note that they both represent the
same data, and thus there shouldn't be more collisions.
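A minimal sketch of the idea: keying the table on a fixed-size byte array (the shape of a `Hash`) lets the hash table hash 20 bytes directly, with no heap indirection or length-dependent string walk. The Rust type below is an illustrative stand-in for the C++ one:

```rust
use std::collections::HashMap;

/// 20-byte binary hash, analogous to EdenFS's `Hash`. Deriving `Hash`
/// over a fixed-size array hashes the bytes inline, whereas a string
/// key must follow a pointer and hash a variable-length buffer.
#[derive(Hash, PartialEq, Eq, Clone, Copy)]
struct BinHash([u8; 20]);

fn main() {
    let key = BinHash([0xab; 20]);
    let mut pending: HashMap<BinHash, &str> = HashMap::new();
    pending.insert(key, "blob request");
    assert_eq!(pending.get(&key), Some(&"blob request"));
}
```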
Reviewed By: chadaustin
Differential Revision: D30520223
fbshipit-source-id: 036007c445c28686f777aa170d0344346e7348b0
Summary:
Allocations are expensive, especially when done under a lock, as this increases
the critical section and reduces the potential concurrency. While this yields a
1.25x speedup, it is more of a sideways improvement, as the allocation is now
done prior to enqueuing. This also means that de-duplicating requests is now
more expensive, since the allocation happens even for requests that turn out to
be duplicates, but de-duplication is the non-common code path, so the tradeoff
is worthwhile.
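The pattern, sketched in Rust (the real queue is C++; all names are illustrative): the caller performs the allocation before taking the lock, so the critical section contains only the cheap push:

```rust
use std::sync::{Arc, Mutex};

struct Request {
    id: u64,
}

struct Queue {
    inner: Mutex<Vec<Arc<Request>>>,
}

impl Queue {
    /// The `Arc` (the allocation) is created by the caller, outside the
    /// critical section; only the cheap `push` happens under the lock.
    fn enqueue(&self, request: Arc<Request>) {
        self.inner.lock().unwrap().push(request);
    }
}

fn main() {
    let queue = Queue { inner: Mutex::new(Vec::new()) };
    let request = Arc::new(Request { id: 1 }); // allocate before locking
    queue.enqueue(request);
    assert_eq!(queue.inner.lock().unwrap()[0].id, 1);
}
```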
Reviewed By: chadaustin
Differential Revision: D30520228
fbshipit-source-id: 99dea65e828f9c896fdfca6b308106554c989282
Summary: The F14 hash map is significantly faster than std::unordered_map.
Reviewed By: chadaustin
Differential Revision: D30520225
fbshipit-source-id: d986908c5eac17f66ae2c7589f134c430a3c656e
Summary:
When turning on the native prefetch, EdenFS will enqueue tons of blob requests
onto the import request queue. The expectation is then that the threads will
dequeue batches of requests and run them. What is being observed is however
vastly different: the dequeued batches are barely bigger than 10, far lower
than the batch capacity, leading to fetching inefficiencies. The reason for
this is that enqueuing is too costly.
The first step in making enqueuing less costly is to reduce the number of times
the lock needs to be acquired, by moving the de-duplication inside the enqueue
function itself. On top of reducing the number of times the lock is held, this
also halves the number of allocations done under the lock.
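A toy Rust sketch of de-duplication folded into enqueue (the real queue is C++ and tracks promises; the types here are stand-ins): a single lock acquisition both checks for a pending request with the same hash and, only if none exists, enqueues a new one:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Toy import queue: one lock acquisition both de-duplicates and
/// enqueues. Waiters on an already-pending hash are recorded instead of
/// enqueuing a second fetch.
struct ImportQueue {
    state: Mutex<State>,
}

struct State {
    pending: HashMap<u64, Vec<u32>>, // hash -> waiter ids
    queue: Vec<u64>,                 // hashes to actually fetch
}

impl ImportQueue {
    /// Returns true if a new fetch was enqueued, false if de-duplicated.
    fn enqueue(&self, hash: u64, waiter: u32) -> bool {
        let mut state = self.state.lock().unwrap(); // single acquisition
        if let Some(waiters) = state.pending.get_mut(&hash) {
            waiters.push(waiter); // duplicate: just attach the waiter
            return false;
        }
        state.pending.insert(hash, vec![waiter]);
        state.queue.push(hash); // new request: actually enqueue it
        true
    }
}

fn main() {
    let queue = ImportQueue {
        state: Mutex::new(State { pending: HashMap::new(), queue: Vec::new() }),
    };
    assert!(queue.enqueue(42, 1));  // new request: enqueued
    assert!(!queue.enqueue(42, 2)); // duplicate: only the waiter is recorded
    let state = queue.state.lock().unwrap();
    assert_eq!(state.queue.len(), 1);
    assert_eq!(state.pending[&42].len(), 2);
}
```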
Reviewed By: chadaustin
Differential Revision: D30520226
fbshipit-source-id: 52f6e3c1ec45caa5c47e3fd122b3a933b0448e7c
Summary:
It turns out that we do want to use a Future to make sure that the tracebus and
watches are completed on the producer and not on the consumer of the future. We
could use a `.via(inline executor)` but the code becomes less readable, so
let's just revert the diff.
Reviewed By: chadaustin
Differential Revision: D30545721
fbshipit-source-id: 524033ab4dbd16be0c377647f7f81f7cd57c206d
Summary:
I need to be able to run 'hg sparse files $PROFILE' in an Eden checkout
for a Sandcastle job that analyzes sparse profiles. We can't run 'hg sparse' at
all in Eden repos, so instead I run it in the backing repo. But that repo is
checked out to the null commit, so let's support --rev on 'hg sparse files' so I
can inspect an arbitrary rev.
Differential Revision: D30561529
fbshipit-source-id: 93b46caa9b63637bf4d63d5438bd23cbffb3983a
Summary:
This config option was used to slowly roll out LFS for a repo.
However, it is no longer used and can therefore be removed.
Reviewed By: StanislavGlebik
Differential Revision: D30511880
fbshipit-source-id: 59fe5925cc203aa609488fdf8ea29e9ff65ee862
Summary: Apparently macOS has different `ls -R` output than Linux.
Reviewed By: singhsrb
Differential Revision: D30566611
fbshipit-source-id: 2f232b12d1971bea18c7131c1ec82244252527c7
Summary:
This adds a new endpoint to EdenApi: `/:repo/snapshot`.
It fetches information about a snapshot (for now, parents and files).
This will be used by the `hg snapshot restore` command.
- The parents will be used to do a normal `hg update`
- The file changes will be used to apply the working-copy file changes on top of that
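A hypothetical, simplified sketch of the response shape (the names below are illustrative, not the actual wire types):

```rust
/// Illustrative stand-in for what the snapshot endpoint returns:
/// the snapshot's parents plus the working-copy file changes to
/// replay on top of them during `hg snapshot restore`.
struct SnapshotResponse {
    hg_parents: Vec<String>,
    file_changes: Vec<(String, FileChange)>,
}

#[derive(PartialEq, Debug)]
enum FileChange {
    Change(Vec<u8>), // new content for the path
    Deletion,        // path removed in the working copy
}

fn main() {
    let snapshot = SnapshotResponse {
        hg_parents: vec!["1111111111111111111111111111111111111111".to_string()],
        file_changes: vec![("foo.txt".to_string(), FileChange::Change(b"new".to_vec()))],
    };
    // Restore = update to the parent, then replay the file changes.
    assert_eq!(snapshot.hg_parents.len(), 1);
    assert_eq!(snapshot.file_changes[0].0, "foo.txt");
}
```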
Reviewed By: ahornby
Differential Revision: D30514854
fbshipit-source-id: b6c6930410cf15fe874eca1fce54314e5011512a
Summary:
Very basic shell for the `hg snapshot restore` command.
For now just reads the id and prints it to stdout. Next diffs will add functionality to it.
Reviewed By: StanislavGlebik
Differential Revision: D30449567
fbshipit-source-id: 9f4e0bb284dff7e6ffb4398942f3247a09511933
Summary:
This refactors the complete_trees endpoint to use the abstraction added on D30451710.
This one needed some extra refactoring, which is why I separated it into its own diff:
- It was the only endpoint that also returned HTTP 400 errors, as well as 500.
- I made the handler trait return a custom error that may be a 400 or a 500 error. The default is 500 and, like anyhow's Error, errors automatically convert to it, keeping the happy path very simple.
- Also added a type alias that makes the return type simpler to look at.
- Then moved this endpoint to use EdenApiHandler, while keeping the previous behaviour of returning a 400 error when the path is wrong.
Reviewed By: StanislavGlebik
Differential Revision: D30483501
fbshipit-source-id: 5cb4cd7f3fc2bbdafb9f0e6a78739086c829eee1
Summary:
This diff refactors most endpoints to use the new abstraction added on D30451710.
There are still a few endpoints that couldn't be refactored (see summary of previous diff).
Reviewed By: StanislavGlebik
Differential Revision: D30483500
fbshipit-source-id: ea2cc24afb119bf63a45a6f4bf48706a004743cc
Summary:
Uses the abstraction built on D30451710 to make more endpoints simpler. It needed to be made a little more generic by using streams instead of `Vec`.
Some endpoints can't be migrated as of now:
- hash_to_location and trees, as they return a custom CBOR stream. We might want to implement a similar abstraction that can deal with a custom CBOR stream, though since it's just two endpoints, maybe we should instead move them to behave the same as the others, if possible.
- revlog_data, as it doesn't use the wire format. This seems to be legacy and should probably be moved to the wire format, though the transition is tricky.
- Some endpoints have no input, while this abstraction assumes the input is a wire CBOR struct.
- Some endpoints don't output a stream, just a single CBOR object; those will need a different EdenApiHandler trait, but it's doable.
The rest seem to be possible to migrate with current abstractions.
Reviewed By: StanislavGlebik
Differential Revision: D30453597
fbshipit-source-id: 0d1068021e83f3e1143f3bc9ee68d20e4c34cb50
Summary:
While I was creating EdenAPI endpoints, I noticed that there was quite a bit of duplicated code in it, for example, the logic to read things from inside a request and write a response.
It also needed several "manual" steps to create a new one, and though the types made it hard to make a mistake, it was still possible, as nothing guaranteed you were using the same types everywhere. This diff aims to fix all of that.
Creating an endpoint, before:
1. Create a handler function that receives `State` and returns `TryIntoResponse`, copying the initial/final handling from other functions, which includes adding some handler info, parsing the path, extracting the query string, creating a repo context and parsing the wire request.
2. Write a PathExtractor struct by copying the other structs; it must take a repo parameter.
3. Create a new variant in EdenApiMethod and update related methods.
4. From the handler function, convert things and call your inner logic function, which is what actually differs between endpoints.
5. Write a wrapper for your handler, which can be done using `define_handler!`.
6. Call `route.post(URL)`; don't forget to use the same Path/QueryStringExtractors.
Creating an endpoint, now:
1. Write a struct that implements `EdenApiHandler`, in which you specify the HTTP method, endpoint, and inner logic.
2. Create a new variant in EdenApiMethod and update related methods.
3. Call `Handlers::setup::<YourHandler>(route)`.
This makes it easier to create a new endpoint and harder to make mistakes while doing so. It could be even easier with some procedural macro dark magic, but this is already pretty good and simple.
**On this diff, I only refactor one of the endpoints, to make it easier to review the new "refactor logic", on the next diffs I'll move more endpoints**
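A rough Rust sketch of what such a handler trait can look like. The names, signatures, and the POST method below are illustrative assumptions, not the actual Mononoke code:

```rust
/// Hypothetical shape of the abstraction: one trait bundles the HTTP
/// method, the endpoint path, the typed request/response, and the inner
/// logic, so the shared request-parsing and response-writing plumbing
/// can live in a single generic setup function.
trait EdenApiHandler {
    type Request;
    type Response;
    const HTTP_METHOD: &'static str;
    const ENDPOINT: &'static str;

    /// The only part that differs between endpoints.
    fn handler(req: Self::Request) -> Result<Vec<Self::Response>, String>;
}

struct SnapshotHandler;

impl EdenApiHandler for SnapshotHandler {
    type Request = String;  // snapshot id, stand-in for the wire struct
    type Response = String; // stand-in for the snapshot data
    const HTTP_METHOD: &'static str = "POST";
    const ENDPOINT: &'static str = "/:repo/snapshot";

    fn handler(id: String) -> Result<Vec<String>, String> {
        Ok(vec![format!("snapshot data for {}", id)])
    }
}

fn main() {
    // A generic `Handlers::setup::<H>(route)` would read these consts to
    // register the route and wrap `handler` with the shared plumbing.
    assert_eq!(SnapshotHandler::ENDPOINT, "/:repo/snapshot");
    let res = SnapshotHandler::handler("abc".to_string()).unwrap();
    assert_eq!(res.len(), 1);
}
```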
Reviewed By: StanislavGlebik
Differential Revision: D30451710
fbshipit-source-id: 9de4de483d4a755bfceafb6d52ec79bbf3f3b5e7
Summary:
Currently we have a problem in tunables: if a tunable was added and then
removed, the value of the tunable is not renewed until the binary is restarted.
This is not great - it makes reverts harder (e.g. if a diff added "tunable =>
true", then reverting the diff won't make any difference until the server is
restarted. So in order to revert the value of the tunable we'd need to write a
diff that does "tunable => false" or restart the binaries, neither of which is
great).
This is counter-intuitive, and this diff fixes it. From what I can tell, this
problem affected only strings/bools/ints - per-repo variables weren't affected.
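A toy sketch of the fix: when regenerating tunable values from a freshly loaded config, every known tunable absent from that config falls back to its default instead of silently keeping its previous value (all names are illustrative):

```rust
use std::collections::HashMap;

/// Toy tunables refresh: iterate over all known tunables and take the
/// value from the freshly loaded config, or the default if the config
/// no longer mentions the tunable. Stale values can't survive a reload.
fn update_tunables(
    current: &mut HashMap<String, bool>,
    defaults: &HashMap<String, bool>,
    loaded: &HashMap<String, bool>,
) {
    for (name, default) in defaults {
        let value = loaded.get(name).copied().unwrap_or(*default);
        current.insert(name.clone(), value);
    }
}

fn main() {
    let defaults: HashMap<String, bool> =
        [("new_feature".to_string(), false)].into_iter().collect();
    let mut current = defaults.clone();

    // A config push sets the tunable...
    let loaded: HashMap<String, bool> =
        [("new_feature".to_string(), true)].into_iter().collect();
    update_tunables(&mut current, &defaults, &loaded);
    assert!(current["new_feature"]);

    // ...later the entry is removed: the value reverts to its default
    // without restarting the binary.
    update_tunables(&mut current, &defaults, &HashMap::new());
    assert!(!current["new_feature"]);
}
```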
Reviewed By: Croohand
Differential Revision: D30544003
fbshipit-source-id: 353970a259a7faa0682866aae9a5699b8a783b22
Summary: It's convenient to be able to derive multiple derived data types at once
Reviewed By: mitrandir77
Differential Revision: D30545263
fbshipit-source-id: 79ba541981af1340a59145f44a2f7d5a54cc28e1
Summary:
fburl.com/botsdiffs is a random diff
fburl.com/botdiffs is the wiki
Differential Revision: D30547647
fbshipit-source-id: 337d6457cb6403f11fbbc9654f3d34f50d69b0e5
Summary:
Sparse profile change computation is quite slow, since globs mean it
has to iterate over a large portion of the tree even if the sparse profile
change is small. Let's make the BFS async and parallel.
Reviewed By: andll
Differential Revision: D30554961
fbshipit-source-id: bce461ad0b21d1dab1013cf3f501b5744b295c30
Summary:
Avoiding the GIL can speed up tree diff/iteration a lot. Let's extract
the pure Rust matcher when possible. We also convert a Python differencematcher
into a Rust DifferenceMatcher when possible, since that is the core use case for
sparse profile change computation.
Reviewed By: andll
Differential Revision: D30553727
fbshipit-source-id: 41d2f13130da18a55f64f9f0f047825bd24144c6
Summary:
In a future diff we'll make tree BFS iteration async and parallel. To
do so we need the matchers to be Send/Sync. Let's update the types for now.
Reviewed By: andll
Differential Revision: D30553731
fbshipit-source-id: 82a441a0caa86b65ca7eaea7283bbbbbb79d5c88
Summary:
In a future diff we'll want to extract matchers into their pure Rust
forms when possible. Let's create a helper function for matcher extraction. In a
later diff we'll make this function smart.
Reviewed By: andll
Differential Revision: D30553730
fbshipit-source-id: 0612a3be54f0286308fc4c43d9a2e5e9aa431a16
Summary:
In a later diff we'll be making sparse profile change computation use a
pure Rust matcher. To do so, we need to be able to represent a difference
matcher in Rust.
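A minimal sketch of what a difference matcher looks like in Rust (the real `Matcher` trait in Mercurial's Rust code is richer, e.g. it supports directory pruning; `PrefixMatcher` below is a stand-in):

```rust
/// Minimal matcher interface; the real trait works on repo paths and
/// can also prune whole directories during traversal.
trait Matcher {
    fn matches(&self, path: &str) -> bool;
}

/// Matches paths matched by `include` but not by `exclude`, mirroring
/// Python's differencematcher.
struct DifferenceMatcher<A, B> {
    include: A,
    exclude: B,
}

impl<A: Matcher, B: Matcher> Matcher for DifferenceMatcher<A, B> {
    fn matches(&self, path: &str) -> bool {
        self.include.matches(path) && !self.exclude.matches(path)
    }
}

/// Stand-in leaf matcher for the example.
struct PrefixMatcher(&'static str);

impl Matcher for PrefixMatcher {
    fn matches(&self, path: &str) -> bool {
        path.starts_with(self.0)
    }
}

fn main() {
    let matcher = DifferenceMatcher {
        include: PrefixMatcher("src/"),
        exclude: PrefixMatcher("src/tests/"),
    };
    assert!(matcher.matches("src/lib.rs"));
    assert!(!matcher.matches("src/tests/basic.rs"));
    assert!(!matcher.matches("docs/guide.md"));
}
```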
Reviewed By: andll
Differential Revision: D30553729
fbshipit-source-id: df2194bbacaa7924bafbfc33e00f25e524df4613
Summary:
The diff algorithm kept track of left vs right stores, and applied the
left vs right tree lookups to the appropriate store. In reality we only ever
have one store, so we're just wasting cycles doing two sequential lookups. Let's
just get rid of the distinction.
Reviewed By: andll
Differential Revision: D30553726
fbshipit-source-id: 72de95d298dad71dc04728d5140b4d489073a33c
Summary:
When testing hg checkout with an empty cache, I noticed the tree
diff'ing phase would alternate between downloading data and doing cpu work. This
implied it was all synchronous. This diff speeds it up by doing the downloading
on a background thread, while the main thread does the actual diff.
Ideally we'd make the actual diff itself parallel as well, but the tree
structure is a tree of semi-mutable references, which would require a larger
refactoring.
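The producer/consumer split can be sketched with a channel: a background thread fetches trees and streams them to the main thread, which does the CPU-bound diff, so downloading and diffing overlap instead of alternating (all names and the fetch/diff bodies below are stand-ins):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    // Background thread: stand-in for downloading trees from the server.
    let fetcher = thread::spawn(move || {
        for id in 0..4u32 {
            tx.send((id, format!("tree-{}", id))).unwrap();
        }
        // Dropping `tx` here closes the channel and ends the loop below.
    });

    // Main thread: consumes trees as they arrive and does the
    // CPU-bound diff work, overlapping with the downloads above.
    let mut diffed = Vec::new();
    for (id, tree) in rx {
        diffed.push((id, tree.len())); // stand-in for the actual diff
    }

    fetcher.join().unwrap();
    assert_eq!(diffed.len(), 4);
}
```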
Reviewed By: andll
Differential Revision: D30553728
fbshipit-source-id: 4ea953909827cf1ec4fd67ac297f63bfadd67483
Summary:
This change has the unintended effect of causing any Thrift calls to
potentially issue a recursive EdenFS call due to symlink resolution requiring
running `readlink` on the root of the repo itself.
Fixing this isn't really possible, so let's revert the change altogether; we
can force clients to issue a realpath before issuing EdenFS Thrift calls.
Reviewed By: kmancini
Differential Revision: D30550796
fbshipit-source-id: 9494c8e08c8af2392eeb344879f156cb56f93ea6
Summary:
The documentation allows enqueue to skip testing whether the queue is still
running: stopping is only done in the destructor of the owner, at which point
no enqueue can logically happen, and thus we do not need to protect against it.
Reviewed By: chadaustin
Differential Revision: D30520227
fbshipit-source-id: 9d6280ccd7fe875cd06b0746151a2897d1f98d61
Summary:
When trying to push thousands of requests to the queue, the dequeue side only
manages to pull batches of ~10 requests at most. Let's measure the cost of
enqueue in order to optimize it.
Reviewed By: chadaustin
Differential Revision: D30503110
fbshipit-source-id: d06ae6741b13b831fa3711fb2dd0e38c3e54193c
Summary:
If the hg sync job is pushing a commit, then we have no choice but to accept it,
even if it's too big. So let's not fail if too many commits are pushed.
Reviewed By: ahornby
Differential Revision: D30535025
fbshipit-source-id: eb607a8fbd691d6591ad990e0920411b1ad2f09c