Summary: Restructure the configs so that we can specify more than one blobstore
Reviewed By: lukaspiatkowski
Differential Revision: D13234286
fbshipit-source-id: a98ede17921ed6148add570288ac23636b086398
Summary:
Previously metaconfig depended on BlobRepo, and so ManifoldArgs had to be
defined in BlobRepo. That was a weird dependency, but a necessary one
because of the Mononoke config repo. In the previous diffs we got rid of the
Mononoke config repo, so now we can reverse the dependencies.
Reviewed By: lukaspiatkowski
Differential Revision: D13180160
fbshipit-source-id: efe713ce3b160c98d56fc13559c57a920146841f
Summary:
Now that Rust macros can be `use`d like normal symbols, `stats` can
simply import the `lazy_static!` macro without requiring its users to do it.
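A minimal sketch of the 2018-edition style this enables; the `GREETING` static is illustrative, not from the `stats` crate:
```
// The macro is imported like any other symbol, so callers of the crate
// no longer need their own `#[macro_use] extern crate lazy_static;`.
use lazy_static::lazy_static;

lazy_static! {
    // A lazily-initialized static, computed on first access.
    static ref GREETING: String = format!("hello, {}", "mononoke");
}

fn main() {
    println!("{}", *GREETING);
}
```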
Reviewed By: Imxset21
Differential Revision: D13281897
fbshipit-source-id: a6780fbace07dd784308e642d4a384322a17c367
Summary:
Let's add a command that builds and reads skiplist indexes. These indexes will
be used by the getbundle wireproto request to decrease latency and cpu usage.
Note that we are saving only the longest "jump" from the skiplist. This is done
in order to save space.
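A minimal sketch of the single-jump idea, with hypothetical types (the real index is built over the commit graph and maps changesets):
```
use std::collections::HashMap;

// Hypothetical node id; the real index keys on changesets.
type NodeId = u64;

// Instead of a full tower of skip edges per node, keep only the longest
// "jump" (target plus how many edges it shortcuts), trading some
// traversal speed for a much smaller index.
struct SkiplistIndex {
    longest_jump: HashMap<NodeId, (NodeId, u64)>,
}

impl SkiplistIndex {
    // Follow longest jumps while they don't overshoot `max_hops`.
    fn skip_ancestor(&self, mut node: NodeId, mut max_hops: u64) -> NodeId {
        while let Some(&(target, hops)) = self.longest_jump.get(&node) {
            if hops > max_hops {
                break;
            }
            node = target;
            max_hops -= hops;
        }
        node
    }
}
```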
Reviewed By: jsgf
Differential Revision: D13169018
fbshipit-source-id: 4d654284b0c0d8a579444816781419ba6ad86baa
Summary:
Let's make it use the same ChangesetFetcher as getbundle already does. It will
be used in the next diffs.
Reviewed By: lukaspiatkowski
Differential Revision: D13122344
fbshipit-source-id: 37eba612935a209098a245f4be0af3bc18c5787e
Summary:
Most of our revsets are already migrated; let's migrate skiplists as well since
we want to use them in getbundle requests.
Reviewed By: lukaspiatkowski
Differential Revision: D13083910
fbshipit-source-id: 4c3bc40ccff95c3231c76b9e920af5db31b80d01
Summary:
We've recently found that the `known()` wireproto request gets much slower when
we send more traffic to Mononoke jobs. Other wireproto methods looked fine, and
CPU and memory usage were fine as well.
Background: a `known()` request takes a list of hg commit hashes and returns
which of them Mononoke knows about.
One thing we've noticed is that the `known()` handler sends db requests
sequentially.
Experiments with sending `known()` requests with commit hashes that Mononoke
didn't know about confirmed that its latency got higher the more parallel
requests we sent. We suspect this is because Mononoke has to send requests to
the db master, and we limit the number of master connections.
A thing that should help is batching the requests, i.e. instead of sending many
requests that each ask whether a single hg commit exists, send one request for
many commits at once.
That change also required changing the bonsai-mapping caching layer to do batch
cache requests.
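A minimal sketch of the batching idea, with hypothetical names and signatures (the real bonsai mapping is async and cache-backed):
```
use std::collections::HashSet;

type HgChangesetId = String;

trait BonsaiHgMapping {
    // Before: callers looped over this per-hash lookup, serializing
    // round-trips to the master db.
    fn known_one(&self, id: &HgChangesetId) -> bool;

    // After: a single batched query; the caching layer needs a matching
    // multi-get so a whole batch can be served from cache at once.
    fn known_many(&self, ids: &[HgChangesetId]) -> HashSet<HgChangesetId>;
}

fn known(mapping: &dyn BonsaiHgMapping, ids: Vec<HgChangesetId>) -> Vec<bool> {
    let present = mapping.known_many(&ids);
    ids.iter().map(|id| present.contains(id)).collect()
}
```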
Reviewed By: lukaspiatkowski
Differential Revision: D13194775
fbshipit-source-id: 47c035959c7ee12ab92e89e8e85b723cb72738ae
Summary:
Currently we read all bookmarks from the primary replica a few times during `hg
pull`: first when we do listkeys, and a second time when we get heads.
That might create a high load on the primary replica.
However, the delay between primary and secondary replicas is fairly small, so it
*should* be fine to read bookmarks from the secondary local replica as long as there is only
one replica per region (because if we have a few replicas per region, then
the heads and listkeys responses might be inconsistent).
Reviewed By: lukaspiatkowski
Differential Revision: D13039779
fbshipit-source-id: e1b8050f63a3a05dc6cf837e17a448c3b346b723
Summary:
According to the [Git-LFS Plan](https://www.mercurial-scm.org/wiki/LfsPlan), `getfiles` should return, instead of the file content, a file in the [following format](https://www.mercurial-scm.org/wiki/LfsPlan#Metadata_format):
```
oid: sha256.SHA256HASH
size: size_int
```
The hg client requests files using the sha1 hgfilenode hash. To calculate the sha256 of the content, Mononoke fetches the file from the blobstore into memory and calculates the sha256.
This gives no benefit in time or memory consumption compared to a non-LFS transfer in Mononoke.
*Solution:*
Put a `key-value` pair into the blobstore after the first request for the file. That is, after the hg client has requested the sha256 of a file for the first time, calculate it and put it into the blobstore.
Subsequent requests for the sha256 of that file content avoid recalculating it in Mononoke; they return the sha256 saved in the blob.
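A minimal synchronous sketch of the get-or-compute scheme, with a hypothetical alias key layout (the real blobstore is async):
```
use sha2::{Digest, Sha256};
use std::collections::HashMap;

// Stand-in blobstore; the real one is async and content-addressed.
struct Blobstore {
    blobs: HashMap<String, Vec<u8>>,
}

impl Blobstore {
    // First request: compute the sha256 and persist it under an alias
    // key; later requests return the cached hash without refetching
    // the whole file into memory.
    fn sha256_of(&mut self, content_key: &str) -> Option<String> {
        let alias_key = format!("{}.sha256", content_key); // hypothetical layout
        if let Some(cached) = self.blobs.get(&alias_key) {
            return Some(String::from_utf8_lossy(cached).into_owned());
        }
        let content = self.blobs.get(content_key)?.clone();
        let hash = hex::encode(Sha256::digest(&content));
        self.blobs.insert(alias_key, hash.clone().into_bytes());
        Some(hash)
    }
}
```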
Reviewed By: StanislavGlebik
Differential Revision: D13021826
fbshipit-source-id: 692e01e212e7d716bd822fa968e87abed5103aa7
Summary:
Mononoke requires several references to the same blob in the blobstore.
Sha256 aliases are a good example. [post](https://fb.facebook.com/groups/scm.mononoke/permalink/739273266435251/)
Short description of the alias mechanism:
- we have a `key: value` blob in the blobstore.
- put a `key1: key` blob in the blobstore to have 2-step access from `key1` to `value`.
All the keys in Mononoke are of the form `type_prefix.hash_name.hash`.
I expanded the MononokeId interface to give access to the prefix `type_prefix.hash_name` for verifying the `key` content (see the alias mechanism description).
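A minimal synchronous sketch of the 2-step lookup (the real blobstore API is async):
```
use std::collections::HashMap;

fn resolve_alias<'a>(
    blobstore: &'a HashMap<String, Vec<u8>>,
    alias_key: &str, // e.g. "aliases.sha256.<hash>"
) -> Option<&'a Vec<u8>> {
    // Step 1: the alias blob's value is itself a blobstore key...
    let content_key = blobstore.get(alias_key)?;
    let content_key = String::from_utf8(content_key.clone()).ok()?;
    // Step 2: ...which points at the actual content.
    blobstore.get(&content_key)
}
```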
Reviewed By: farnz
Differential Revision: D13084145
fbshipit-source-id: 5b8a4e80869481414a7356ccd7c9aab6e24a5138
Summary:
Purpose:
- A sha256 alias link to file_content is required for LFS getfiles to work correctly.
The LFS protocol uses SHA-256 to refer to the file content, while Mononoke uses Blake2.
To support LFS in Mononoke we need to set up a link from the SHA-256 hash of the content to the blake2 of the content.
These links are called aliases.
- Aliases are uploaded together with file content blobs, but only for new push operations.
- If a repo was blobimported from somewhere, we need to make sure that all the links are in the blobstore.
If a repo was blobimported before aliases were added, it may be missing aliases for some blobs.
- This tool can be used to
- find out whether any aliases are missing
- fill in missing aliases.
Implementation:
- Run on a repo.
- Iterate through all changesets.
- Go through all the file_content blobs in the changesets.
- Verify/generate alias256 links to file_content blobs.
Modes supported (see the sketch below):
- verify: count the number of errors and print them to the console
- generate: if an alias blob is missing, add it to the blobstore
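A minimal sketch of the two modes, with hypothetical names (the real tool walks changesets and blobs asynchronously):
```
// Hypothetical mode enum and per-blob handler.
enum Mode {
    Verify,   // count and report missing aliases
    Generate, // write missing alias blobs back to the blobstore
}

fn process_blob(mode: &Mode, alias_present: bool, errors: &mut u64) {
    if alias_present {
        return;
    }
    match mode {
        Mode::Verify => *errors += 1, // printed to console at the end
        Mode::Generate => { /* put alias256 -> file_content key */ }
    }
}
```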
Reviewed By: StanislavGlebik
Differential Revision: D10461827
fbshipit-source-id: c2673c139e2f2991081c4024db7b85953d2c5e35
Summary:
Added a `get_stats() -> HashMap<String, Box<Any>>` method for all ChangesetFetchers.
The CachingChangesetFetcher now returns statistics for `cache.misses: usize`, `cache.hits: usize`, `fetches.from.blobstore: usize` and `max.latency: Duration`.
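A sketch approximated from the summary (the real trait and surrounding types may differ):
```
use std::any::Any;
use std::collections::HashMap;
use std::time::Duration;

trait ChangesetFetcher {
    // Default: fetchers with nothing to report return an empty map.
    fn get_stats(&self) -> HashMap<String, Box<dyn Any>> {
        HashMap::new()
    }
}

struct CachingChangesetFetcher {
    hits: usize,
    misses: usize,
    blobstore_fetches: usize,
    max_latency: Duration,
}

impl ChangesetFetcher for CachingChangesetFetcher {
    fn get_stats(&self) -> HashMap<String, Box<dyn Any>> {
        let mut stats: HashMap<String, Box<dyn Any>> = HashMap::new();
        stats.insert("cache.hits".to_string(), Box::new(self.hits));
        stats.insert("cache.misses".to_string(), Box::new(self.misses));
        stats.insert("fetches.from.blobstore".to_string(), Box::new(self.blobstore_fetches));
        stats.insert("max.latency".to_string(), Box::new(self.max_latency));
        stats
    }
}
```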
Reviewed By: StanislavGlebik
Differential Revision: D10852637
fbshipit-source-id: 34114fd94c47aa26ea525fcc4ff76ad60827bc71
Summary:
Sharding filenodes by path should stop us from knocking over databases -
make it configurable.
Reviewed By: StanislavGlebik
Differential Revision: D12894523
fbshipit-source-id: e27452f9b436842e1cb5e9e0968c1822f422b4c9
Summary:
Bookmarks point to Bonsai changesets. So previously we were fetching the bonsai
changeset for a bookmark, converting it to an hg changeset in the `get_bookmark`
method, and then converting it back to bonsai in `pushrebase.rs`.
This diff adds a `get_bonsai_bookmark()` method that removes these useless
conversions.
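A small before/after sketch with stand-in types (not BlobRepo's real signatures):
```
#[derive(Clone, Copy)]
struct ChangesetId(u64);   // bonsai changeset id
#[derive(Clone, Copy)]
struct HgChangesetId(u64); // hg changeset id

struct Repo;

impl Repo {
    // Old path: fetches the bonsai changeset, then converts it to hg,
    // which pushrebase immediately converts back to bonsai.
    fn get_bookmark(&self, _name: &str) -> Option<HgChangesetId> {
        Some(HgChangesetId(1))
    }
    // New path: returns what the bookmark actually points to, so no
    // round-trip conversion is needed.
    fn get_bonsai_bookmark(&self, _name: &str) -> Option<ChangesetId> {
        Some(ChangesetId(1))
    }
}
```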
Reviewed By: farnz
Differential Revision: D10427433
fbshipit-source-id: 1b15911fc5d77483b5a135a8d4484fccff23c774
Summary:
getfiles implementation for LFS.
The implementation is the following (a sketch of the size branch follows the list):
- get the file size from the file envelope (retrieved from manifold by HgNodeId)
- if the file size > threshold from the lfs config:
- fetch the file into memory and compute its sha256; this will be fixed later, as the approach consumes a lot of memory, but we don't yet have any mapping from sha256 to blake2 [T35239107](https://our.intern.facebook.com/intern/tasks/?t=35239107)
- generate an lfs metadata file according to the [LfsPlan](https://www.mercurial-scm.org/wiki/LfsPlan)
- set the metakeyflag (REVID_STORED_EXT) in the file header
- if the file size < threshold, process it the usual way
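A minimal sketch of that branch, assuming a hypothetical threshold constant and a precomputed sha256 (neither is the real Mononoke code):
```
const LFS_THRESHOLD: u64 = 1024 * 1024; // from the lfs config, illustrative

fn getfiles_body(size: u64, content: Vec<u8>, sha256_hex: &str) -> Vec<u8> {
    if size > LFS_THRESHOLD {
        // Return LFS metadata instead of the content; the client sees
        // the metakeyflag and fetches the real bytes via LFS.
        format!("oid: sha256.{}\nsize: {}\n", sha256_hex, size).into_bytes()
    } else {
        // Small file: send the content the usual way.
        content
    }
}
```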
Reviewed By: StanislavGlebik
Differential Revision: D10335988
fbshipit-source-id: 6a1ba671bae46159bcc16613f99a0e21cf3b5e3a
Summary: Reverting the myrouter based filenodes for now as they cause some problems
Reviewed By: jsgf
Differential Revision: D10405364
fbshipit-source-id: 07da917455ae5af9ef81a24d99f516171101c8a7
Summary:
According to the [Mercurial LFS Plan](https://www.mercurial-scm.org/wiki/LfsPlan), on push, for files whose size is above the threshold (the lfs.threshold config) the hg client sends LFS metadata instead of the actual file contents. The main part of the LFS metadata is the SHA-256 of the file content (the oid).
The format requires the following mandatory fields: version, oid, size.
When LFS metadata is sent instead of the real file content, the lfs_ext_stored flag is set in the request's revflags.
If this flag is set, we ignore the sha-1 hash verification inconsistency.
Later we check that the content was actually uploaded to the blobstore, create a filenode envelope from it, and upload the envelope to the blobstore.
The filenode envelope requires the following info:
- size - retrieved when fetching the actual data from the blobstore.
- copy_from - retrieved from the file, sent by the hg client.
Mononoke still does the same checks for an LFS push as for a non-LFS push (i.e. it checks that all the necessary manifests/filelogs were uploaded by the client).
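A sketch of extracting the mandatory fields from such metadata (the helper and its signature are hypothetical; the format follows the example earlier in this log):
```
fn parse_lfs_pointer(text: &str) -> Option<(String, String, u64)> {
    let (mut version, mut oid, mut size) = (None, None, None);
    for line in text.lines() {
        // Lines look like "oid: sha256.<hash>" per the metadata format.
        if let Some((key, value)) = line.split_once(": ") {
            match key {
                "version" => version = Some(value.to_string()),
                "oid" => oid = Some(value.to_string()),
                "size" => size = value.parse().ok(),
                _ => {}
            }
        }
    }
    // All three fields are mandatory; fail if any is missing.
    Some((version?, oid?, size?))
}
```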
Reviewed By: StanislavGlebik
Differential Revision: D10255314
fbshipit-source-id: efc8dac4c9f6d6f9eb3275d21b7b0cbfd354a736
Summary:
Mercurial stores the executable bit as part of the manifest, so if a changeset only changes that attribute of a file, hg reuses the file hash. But Mononoke has been creating an additional filenode. This change tries to handle that special case. Note this kind of reuse only happens if the file has only one parent [P60183653](P60183653)
Some of our fixture repos were affected, hence these hashes were replaced with updated ones:
```
396c60c14337b31ffd0b6aa58a026224713dc07d => a5ab070634ab9cbdfc92404b3ec648f7e29547bc
339ec3d2a986d55c5ac4670cca68cf36b8dc0b82 => c10443fa4198c6abad76dc6c69c1417b2e821508
b47ca72355a0af2c749d45a5689fd5bcce9898c7 => 6d0c1c30df4acb4e64cb4c4868d4c974097da055
```
Reviewed By: farnz
Differential Revision: D10357440
fbshipit-source-id: cdd56130925635577345b08d8ed0ae6e229a82a7
Summary: Make get_manifest_by_nodeid accept HgManifestId and correct all calls to get_manifest_by_nodeid.
Reviewed By: StanislavGlebik
Differential Revision: D10298425
fbshipit-source-id: 932e2a896657575c8998e5151ae34a96c164e2b2
Summary:
PUT request upload to the Mononoke API.
The hg client sends a PUT request to store a file into the blobstore during a push supporting LFS.
Uploading a file by alias is divided into 2 parts:
- Put alias: blobstore key
- Put blobstore_key: contents
Keep in mind that the file content is thrift-encoded.
The host address for the batch request comes from the command line flags: -H for host, -p for port.
Reviewed By: StanislavGlebik
Differential Revision: D10026683
fbshipit-source-id: 6c2726c7fee2fb171582bdcf7ce86b22b0130660
Summary:
JSON blobs let other users of Mononoke learn what they need to know
about commits. When we get a commit, log a JSON blob to Scribe that other users can pick up to learn what they want to know.
Because Scribe does not guarantee ordering, and can sometimes lose messages, each message includes enough data to allow a tailer that wants to know about all commits to follow backwards and detect lost messages (and thus fix them up locally). It's expected that tailers will either sample this data, or have their own state that they can use to detect missing commits.
Reviewed By: StanislavGlebik
Differential Revision: D9995985
fbshipit-source-id: 527b6b8e1ea7f5268ce4ce4490738e085eeeac72
Summary:
We want to be able to do post-commit logging for all changesets. Set
up the data structures I intend to use for now, and arrange to discard all
logging.
A later diff will add logging to a ScribeClient.
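A hypothetical sketch of what such data structures could look like (field names are illustrative, not the real Mononoke types):
```
// Parent hashes let a tailer walk history backwards and detect
// messages that Scribe lost.
struct CommitLogEntry {
    repo_name: String,
    changeset_id: String, // bonsai hash
    parents: Vec<String>, // enough data to follow backwards
    generation: u64,
}

// For now, "discard all logging": a no-op sink standing in for the
// ScribeClient that a later diff adds.
trait CommitLogger {
    fn log(&self, entry: &CommitLogEntry);
}

struct Discard;
impl CommitLogger for Discard {
    fn log(&self, _entry: &CommitLogEntry) {}
}
```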
Reviewed By: StanislavGlebik
Differential Revision: D9995984
fbshipit-source-id: 796b390f6b83ace576f73a217ac564c4251d7ec5
Summary:
This diff adds a real implementation for CachingChangesetFetcher. Now it
fetches the data for the cache from the blobstore.
The rest is explained in the comments.
Reviewed By: farnz
Differential Revision: D9908320
fbshipit-source-id: 5427f3ed312cb7753434161423cb27b48744347f
Summary:
Initial implementation of a ChangesetsFetcher that will use the cache more intelligently.
At the moment it doesn't do anything special, but in the next diffs it will
pre-warm the cache in case it has a lot of cache misses (that's why it has to
have a reference to the cachelib CachePool).
Reviewed By: farnz
Differential Revision: D9908319
fbshipit-source-id: 6377a947696bae6b060de5a441722c28309b341c
Summary:
High-level goal: we want to make certain big getbundle requests faster. To do
that we'd store blobs of commits that are close to each other in the blobstore
and fetch them only if we had too many cache misses. All this logic will be
hidden in the ChangesetFetcher trait implementation. A ChangesetFetcher will be
created per request (hence the factory).
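A rough sketch of the shape this describes, with approximated signatures (the real trait is async and returns futures):
```
use std::sync::Arc;

type ChangesetId = [u8; 32];

trait ChangesetFetcher: Send + Sync {
    fn get_generation_number(&self, cs: ChangesetId) -> u64;
    fn get_parents(&self, cs: ChangesetId) -> Vec<ChangesetId>;
}

// Created once per getbundle request, so a fetcher can carry
// per-request state such as a pre-warmed cache.
trait ChangesetFetcherFactory {
    fn fetcher(&self) -> Arc<dyn ChangesetFetcher>;
}
```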
Reviewed By: farnz
Differential Revision: D9869659
fbshipit-source-id: 9e3ace3188b3c13f83ef1bd61b668d4f22103f74
Summary:
WIP
Mononoke API download for LFS.
Supports GET requests of the form:
curl http://127.0.0.1:8000/{repo_name}/lfs/download/{sha256}
Reviewed By: StanislavGlebik
Differential Revision: D9850413
fbshipit-source-id: 4d756679716893b2b9c8ee877433cd443df52285
Summary:
Let's check that new case conflicts are not added by a commit.
This diff also fixes the function check_case_conflict_in_manifest - it needs to
take into account that if one of the conflicting files was removed, then there
is no case conflict.
There should be a way to disable this check because we sometimes need to allow
broken commits, for example during blobimport.
Reviewed By: aslpavel
Differential Revision: D9789809
fbshipit-source-id: ca09ee2d3e5340876a8dbf57d13e5135344d1d36
Summary: Use `ChangesetId` in `DifferenceOfUnionsOfAncestorsNodeStream` instead of `HgNodeHash`. This avoids several bonsai lookups of parent nodes.
Reviewed By: StanislavGlebik
Differential Revision: D9631341
fbshipit-source-id: 1d1be7857bf4e84f9bf5ded70c28ede9fd3a2663
Summary:
An additional 2-step reference for a blob:
For each file, add an additional blob with:
key = aliases.sha256.sha256(raw_file_contents)
value = blob_key
Note that the sha256 hash is taken from the `raw_file_contents`, not from the blob content.
The additional blob is sent together with the file content blob.
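A minimal sketch of building that alias blob (the helper name is illustrative; the key layout follows the summary):
```
use sha2::{Digest, Sha256};

// Hash the *raw* file contents, not the (thrift-encoded) blob content,
// and return the (alias key, alias value) pair to store.
fn alias_blob(raw_file_contents: &[u8], content_blob_key: &str) -> (String, String) {
    let hash = hex::encode(Sha256::digest(raw_file_contents));
    let key = format!("aliases.sha256.{}", hash);
    (key, content_blob_key.to_string())
}
```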
Reviewed By: lukaspiatkowski, StanislavGlebik
Differential Revision: D9775509
fbshipit-source-id: 4cc997ca5903d0a991fa0310363d6af929f8bbe7
Summary:
In `fetch_file_contents()`, `blobstore_bytes.into()` converted the bytes to
`Blob<Id>`. This code calls `MononokeId::from_data()`, which does blake2
hashing. It turns out this causes big problems for the many large files that
getfiles can return.
Since this hash is not used at all, let's avoid generating it.
Reviewed By: jsgf
Differential Revision: D9786549
fbshipit-source-id: 65de6f82c1671ed64bdd74b3a2a3b239f27c9f17
Summary:
Profiling showed that since we insert objects into the blobstore
sequentially, it takes a lot of time for long stacks of commits. Let's do it in
parallel.
Note that we are still inserting sequentially into the changesets table.
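A sketch of the idea using today's futures crate (Mononoke at the time used futures 0.1, and `put` here is a stand-in for the blobstore's async put):
```
use futures::stream::{self, StreamExt, TryStreamExt};
use std::future::Future;

// Bound the fan-out instead of awaiting each put before starting the
// next; error type is illustrative.
async fn put_all<F, Fut>(put: F, blobs: Vec<(String, Vec<u8>)>) -> Result<(), String>
where
    F: Fn(String, Vec<u8>) -> Fut,
    Fut: Future<Output = Result<(), String>>,
{
    stream::iter(blobs)
        .map(|(key, value)| put(key, value))
        .buffer_unordered(100) // up to 100 in-flight puts
        .try_collect::<Vec<()>>()
        .await?;
    Ok(())
}
```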
Reviewed By: farnz
Differential Revision: D9683037
fbshipit-source-id: 8f9496b97eaf265d9991b94243f0f14133f463da
Summary:
Use .chain_err() where appropriate to give context to errors coming up from
below. This requires the outer errors to be proper Fail-implementing errors (or
failure::Error), so leave the string wrappers as Context.
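For illustration, a hedged example with the public `failure` crate; `chain_err` itself comes from Mononoke's failure extensions, and `context` is the closest public analogue:
```
use failure::{Error, ResultExt};

fn load_manifest(path: &str) -> Result<Vec<u8>, Error> {
    let bytes = std::fs::read(path)
        // The string wrapper stays as Context; the underlying
        // io::Error remains reachable as the cause.
        .context(format!("while reading manifest {}", path))?;
    Ok(bytes)
}
```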
Reviewed By: lukaspiatkowski
Differential Revision: D9439058
fbshipit-source-id: 58e08e6b046268332079905cb456ab3e43f5bfcd