Summary: This is needed in the following diff.
Reviewed By: krallin
Differential Revision: D16621517
fbshipit-source-id: 5a50cae7c8b761d7578bcbe5caf302a5ee2578a3
Summary: This updates benchmark_filestore to allow testing with caches (notably, Memcache & Cachelib). It also reads twice now, which is nice for caches that aren't filled by us (e.g. Manifold CDN).
Reviewed By: ahornby
Differential Revision: D16584952
fbshipit-source-id: 48ceaa9f2ea393626ac0e5f3988672df020fbc28
Summary:
NOTE: This changes our file storage format. It's fine to do it now since we haven't started enabling chunking yet (see: D16560844).
This updates the Filestore's chunking to store chunks as their own entity in Thrift, as opposed to having them just be FileContents.
The upside of this approach is that we can't have an entity that's both a File and a Chunk, which means:
- We don't have to deal with recursive chunks (since, unlike Files, Chunks can't be pointers to other chunks).
- We don't have to produce metadata (forward mappings and backmappings) for chunks (the reason we had to produce it was to make sure we wouldn't accidentally produce inconsistent data if the upload for one of our chunks happened to have been tried as a file earlier and failed).
Note that this also updates the return value from the Filestore to `ContentMetadata`. We were using `Chunk` before there because it was sufficient and convenient, but now that `Chunk` no longer contains a `ContentId`, it no longer is convenient, so it's worth changing :)
Reviewed By: HarveyHunt
Differential Revision: D16598906
fbshipit-source-id: f6bec75d921f1dea3a9ea3441f57213f13aeb395
Summary:
This allows running blobimport multiple times over the same path locally (with a blob files storage, for example), which is how we use it in prod (though there we don't use the file blobstore, so it already works).
This is helpful when playing around with local changes to blobimport.
Reviewed By: HarveyHunt
Differential Revision: D16580697
fbshipit-source-id: 4a62ff89542f67ce6396948c666244ef40ffe5e7
Summary:
This is a very trivial patch simply intended to make it easier to safely roll
out the Filestore. With this, we can roll out the Filestore with chunking fully
disabled and make sure all hosts know how to read chunked data before we turn
it on.
Reviewed By: aslpavel, HarveyHunt
Differential Revision: D16560844
fbshipit-source-id: 30c49c27e839dfb06b417050c0fcde13296ddade
Summary: This adds a filestore benchmark that allows for playing around with Filestore parameters and makes it easier to measure performance improvements.
Reviewed By: aslpavel
Differential Revision: D16559941
fbshipit-source-id: 50a4e91ad07bf6f9fc1efab14aa1ea6c81b9ca27
Summary:
This adds support for LFS filenodes in blobimport. This works by passing a `--lfs-helper` argument, which should be an executable that can provide an LFS blob's contents on stdout when called with the OID and size of an LFS blob. My thinking is to `curl` directly from Dewey when running this in prod.
Note that, as of this change, we can blobimport LFS files, but doing so might be somewhat inefficient, since we'll roundtrip the blobs to our filestore, then generate filenodes. For now, however, I'd like to start with this so we can get a sense of whether this is acceptable performance-wise.
Reviewed By: farnz
Differential Revision: D16494241
fbshipit-source-id: 2ae032feb1530c558edf2cfbe967444a9a7c0d0f
Summary:
NOTE: This isn't 100% complete yet. I have a little more work to do around the aliasverify binary, but I think it'll make sense to rework this a little bit with the Filestore anyway.
This patch incorporates the Filestore throughout Mononoke. At this time, what this means is:
- Blobrepo methods return streams of `FileBytes`.
- Various callsites that need access to `FileBytes` call `concat2` on those streams.
This also eliminates the Sha256 aliasing code that we had written for LFS and replaces it with a Filestore-based implementation.
However, note that this does _not_ change how files submitted through `unbundle` are written to blobstores right now. Indeed, those contents are passed into the Filestore through `store_bytes`, which doesn't do chunking. This is intentional since it lets us use LFS uploads as a testbed for chunked storage before turning it on for everything else (also, chunking those requires further refactoring of content uploads, since right now they don't expect the `ContentId` to come back through a Future).
The goal of doing it this way is to make the transition simpler. In other words, this diff doesn't change anything functionally — it just updates the underlying API we use to access files. This is also important to get a smooth release: if we had new servers that started chunking things while old servers tried to read them, things would be bad. Doing it this way ensures that doesn't happen.
This means that streaming is there, but it's not being leveraged just yet. I'm planning to do so in a separate diff, starting with the LFS read and write endpoints.
Reviewed By: farnz
Differential Revision: D16440671
fbshipit-source-id: 02ae23783f38da895ee3052252fa6023b4a51979
Summary:
NOTE: This diff updates Thrift serialization for chunked files. Normally, this wouldn't be OK, but we never stored anything in this format, so that's fine.
This updates the Filestore to rebuild backmappings on read. The rationale for this is that since we store backmappings *after* writing FileContents, it is possible to have FileContents that logically exist but don't have backmappings. This should be a fairly exceptional case, but we can handle it by recomputing the missing backmappings on the fly.
As part of this change, I'm updating our Thrift serialization for chunked files and backmappings (as mentioned earlier, this is normally not a good idea, but it should be fine here):
- Chunked files now contain chunk lengths, which lets us derive offsets as well as total size for a chunked file.
- Backmappings don't contain the size anymore (since FileContents contains it now).
This was necessary because we need to have a file's size in order to recompute backmappings.
In this change, I also updated GitSha1 to stop embedding its blob type and length. This lets us reconstruct a strongly-typed GitSha1 object from just the hash (and was therefore necessary to avoid duplicating file size into backmappings), which seems sufficient at this stage (and if we do need the size, we can obtain it through the Filestore by querying the file).
Reviewed By: aslpavel
Differential Revision: D16440676
fbshipit-source-id: 23b66caf40fde2a2f756fef89af9fe0bb8bdadef
Summary:
This adds support for chunking in the Filestore. To do so, this reworks writes to be a 2 stage process:
- First, you prepare your write. This puts everything into place, but doesn't upload the aliases, backmappings, and the logical FileContents blob representing your file. If you're uploading a file that fits in a single chunk, preparing your writes effectively makes no blobstore changes. If you're uploading chunks, then you upload the individual chunks (those are uploaded essentially as if they were small files).
- Then, you commit your write. At this point, the prepare step has given you the `FileContents` that represent your write, and a set of aliases. To commit your write, you write the aliases, then the file contents, and then a backmapping.
Note that we never create hierarchies when writing to the Filestore (i.e. chunked files always reference concrete chunks that contain file data), but that's not a guarantee we can rely on when reading. Indeed, since chunks and files are identical as far as blobstore storage is concerned, we have to handle the case where a chunked file references chunks that are themselves chunked (as I mentioned, we won't write it that way, but it could happen if we uploaded a file, then later on reduced the chunk size and wrote an identical file).
Note that this diff also updates the FileContents enum and adds unimplemented methods there. These are all going away later in this stack (in the diff where I incorporate the Filestore into the rest of Mononoke).
Reviewed By: StanislavGlebik
Differential Revision: D16440678
fbshipit-source-id: 07187aa154f4fdcbd3b3faab7c0cbcb1f8a91217
Summary:
Currently, we require in the MononokeId trait that the Id know how to recompute itself from a value (and this has to return `Self`, not `Result<Self>`, so it also can't fail).
This isn't ideal for a few reasons:
- It means we cannot support the MononokeId trait (and therefore typed blobstore fetches) for things that aren't purely a hash of their contents
- One workaround (which jsgf had implemented for ContentAliasBackmappingId) is to peek at the contents and embed an ID in there. A downside of this approach is that we end up parsing the content twice when loading a value from a blobstore (once to get the ID, and once to get the contents later).
- It's a little inefficient in general, since it means we recompute hashes of things we just fetched just to know what their hash should be (which we often proceed to immediately discard afterwards). This could be worth doing if we verified that the ID we got is the ID we wanted, but we don't actually do this right now.
Reviewed By: StanislavGlebik
Differential Revision: D16486775
fbshipit-source-id: a75864eed3efa7e07b8bf642dbac3ada00cadc7c
Summary:
It's not cool to include the `flat/` prefix into the queue. Not cool at all.
It's not part of the actual blobstore key.
Let's make the populator cool again.
Reviewed By: ahornby
Differential Revision: D16494119
fbshipit-source-id: 8d3a34271d19ab0531cbb59e56b7a1ac04f66b07
Summary:
Before this change, we would always include the shard id in our mysql-related fb303 counters. This is not perfect for two reasons:
- for the xdb blobstore we have 4K shards and 24 counters, so we were reporting 96K counters in total
- we rarely care about per-shard metrics anyway, since in most cases all queries are uniformly distributed
Therefore, let's change this approach to not use per-shard counters and use per-shardmap ones (when sharding is involved).
Reviewed By: krallin
Differential Revision: D16360591
fbshipit-source-id: b2df94a3ca9cacbf5c1f328b48e87b48cd18287e
Summary:
Improve memory usage on Mononoke startup and reduce the number of small allocations. Done via:
* Pre-size the CHashMap used by SkiplistEdgeMapping, working around the 4x multiplier and lack of load factor awareness in CHashMap::with_capacity
* Add a SingleEdge optimization to SkiplistNodeType so as to save vector overhead in the common case of a single edge
* Size the HashMap in Rust thrift deserialization with HashMap::with_capacity
Reviewed By: krallin
Differential Revision: D16265993
fbshipit-source-id: 99c3a7149493d824a3c00540bc5557410d0273fc
Summary:
My previous diff (D16327788), despite claiming a 1000x improvement, was merely a ~220x
improvement. This diff does some further tweaking of the numbers, which brings about
another 1.3x improvement, bringing the total to 290x.
Reviewed By: krallin
Differential Revision: D16356932
fbshipit-source-id: 3d3f0c844eec9866217cf5a57f285fe8a56152de
Summary:
In earlier diffs in this stack, I updated the callsites that reference XDB tiers to use concrete &str types (which is what they were receiving until now ... but it wasn't spelled out as-is).
In this diff, I'm updating them to use owned `String` instead, which lets us hoist up `to_string()` and `clone()` calls in the stack, rather than pass down references only to copy them later on.
This allows us to skip some unnecessary copies. It turns out we were doing quite a few rounds of "turn this String into a reference, pass it down the stack, then turn it back into a String".
Reviewed By: farnz
Differential Revision: D16260372
fbshipit-source-id: faec402a575833f6555130cccdc04e79ddb8cfef
Summary:
This implements and uses the `add_many` method of the blob healer queue. This method allows us to do batched adds, which in turn allows us to use `chunks` on Manifold iteration.
NB 1: I deliberately removed control symbols from the progress print message. If we only print it on the same line, we lose it when the job crashes.
NB 2: I deliberately use range of `entries[0]`, as I want to pessimistically restart from the earliest in case of a failure.
Reviewed By: krallin
Differential Revision: D16327788
fbshipit-source-id: 8d9f3cf85ee7cbca657a8003a787b5ea84a1b9b0
Summary:
Instantiating a new DB connection may require remote calls to be made to e.g. Hipster to allocate a new certificate (this is only the case when connecting to MySQL).
Currently, our bindings to our underlying DB locator make a blocking call to pretend that this operation is synchronous: https://fburl.com/ytmljxkb
This isn't ideal, because this call might actually take time, and we might also occasionally want to retry it (we've had issues in our MySQL tests with acquiring certificates that retrying should resolve). Running this synchronously makes doing so inefficient.
This patch doesn't update that, but it fixes everything on the Rust side of things to stop expecting connections to return a `Result` (and to start expecting a Future instead).
In a follow up diff, I'll work on making the changes in common/rust/sql to start returning a Future here.
Reviewed By: StanislavGlebik
Differential Revision: D16221857
fbshipit-source-id: 263f9237ff9394477c65e455de91b19a9de24a20
Summary: This helps us investigate the workings of the (upcoming) xdb blobstore.
Reviewed By: StanislavGlebik
Differential Revision: D16208770
fbshipit-source-id: 33542f3d34a5c8b4287bb14b0aa97d3802b0e0d6
Summary: Tool that can run generation of hg manifests from bonsai, and compare them with the already recorded ones.
Reviewed By: StanislavGlebik
Differential Revision: D16131666
fbshipit-source-id: 8c59ec0fdf03431af85c886c22ee269cecc8e8c4
Summary:
This means that we can later destructure and rebuild the `CensoredBlob` struct. If it is only
possible to construct it from a table name, there's no way to get this name
back from a destructured struct.
Note: I do this to eventually rebase D16163225 on top of this one, then be able to implement `in_memory_writes_READ_DOC_COMMENT` in terms of destructuring of the existing `CensoredBlob`.
Reviewed By: krallin
Differential Revision: D16180745
fbshipit-source-id: e587675d444bf53f0bd25f01602503abad9d7d97
Summary:
Report to Scuba whenever someone tries to access a blobstore which is blacklisted. Scuba reporting is done for any `get` or `put` method call.
Because of the possible overload (given the high number of requests mononoke receives, and that CensoredBlobstore performs this check before the blobstore caching layer), I decided to report at most one bad request per second. If multiple requests to blacklisted blobstores are made in less than one second, only the first request is reported. Not reporting all of them isn't the best approach, but performance-wise it is the best solution.
NOTE: I also wrote an implementation using `RwLock` (instead of the current `AtomicI64`), but atomic variables should be faster than locks, so I gave up on that idea.
Reviewed By: ikostia, StanislavGlebik
Differential Revision: D16108456
fbshipit-source-id: 9e5338c50a1c7d15f823a2b8af177ffdb99e399f
Summary: With these fixes running the healer for `repo_id` 502 seems to work.
Reviewed By: aslpavel
Differential Revision: D16140872
fbshipit-source-id: bf2caf7428a74cf14cd61ea8b52b7e5822432646
Summary: Log when a `censored file` is accessed via calling `get` or `put` method.
Reviewed By: StanislavGlebik
Differential Revision: D16089138
fbshipit-source-id: 0d2f0e21e7afcad8783be7587e6b676af20ba029
Summary:
`local_instances` option was used to create fileblobstore or sqlite blobstore.
Now we use mononoke config for this purpose. Since this option is no longer
useful, let's delete it.
Reviewed By: krallin
Differential Revision: D16120065
fbshipit-source-id: 375a168b27e7f2cf1a6a77f487c5e013f9004546
Summary:
`MultiplexedBlobstore` can hide errors up until we suddenly lose availability of the wrong blobstore. Introduce an opt-in `ScrubBlobstore`, which functions as a `MultiplexedBlobstore` but checks that the combination of blobstores and healer queue should result in no data loss.
Use this new blobstore in the blobrepo checker, so that we can be confident that data is safe.
Later, this blobstore should trigger the healer to fix "obvious" problems.
Reviewed By: krallin
Differential Revision: D15353422
fbshipit-source-id: 83bb73261f8ae291285890324473f5fc078a4a87
Summary:
Added an option to control whether censoring should be enabled or disabled per
repository. The option is added in `server.toml` as `censoring` and it
is set to true or false. If `censoring` is not specified, it defaults
to true (censoring is enabled).
Disabling `censoring` skips the check of whether a key is blacklisted,
therefore all the files are fetchable.
Reviewed By: ikostia
Differential Revision: D16029509
fbshipit-source-id: e9822c917fbcec3b3683d0e3619d0ef340a44926
Summary:
Subcommand `blacklist` to censor multiple files. It stores the censored keys of the blacklisted files in the censored_contents table.
The subcommand takes as arguments
- list of files
- commit hash
- a string representing the task ID for censoring the files (the task should contain the reason why the files are censored)
Reviewed By: ikostia
Differential Revision: D15939450
fbshipit-source-id: 7261ab2358cc4905d61a14f354de2949a2a94e7c
Summary:
CensoredBlob was placed between Blobstore and PrefixBlobstore. I moved CensoredBlob so that it is now a wrapper around PrefixBlobstore. This means the key is compared before the `repoid` is appended to it.
By moving CensoredBlob on top of PrefixBlobstore, we get better isolation from the existing blobstores. This way CensoredBlob does not interact with the underlying layers, and future changes to those layers will, most probably, not impact CensoredBlob's implementation.
Reviewed By: ikostia
Differential Revision: D15900610
fbshipit-source-id: 391594355d766f43638f3152b56d4e9acf49af32
Summary: We are copying `MPath` a lot, this change should reduce amount of allocations and memory reuse for `MPath`
Reviewed By: farnz
Differential Revision: D15939495
fbshipit-source-id: 8da8f2c38f7b46f27d0df661210c9964aed52101
Summary: This adds a command in Mononoke Admin to list all bookmarks.
Reviewed By: ikostia
Differential Revision: D15737660
fbshipit-source-id: 5d4e649adf09c5f601b149a87515707f3d88b203
Summary: We have the same pattern 3 times, and I have at least two more instances of this pattern to add. Factor it out.
Reviewed By: StanislavGlebik
Differential Revision: D15202663
fbshipit-source-id: 9c0de8ccef71964e65389e0bf0f2b0fc383f7c1d
Summary: This help text was wrong. This fixes it.
Reviewed By: HarveyHunt
Differential Revision: D15555767
fbshipit-source-id: 5202d46a8527c9e44c5588d52f7425d3efb38898
Summary:
As part of adding support for infinitepush in Mononoke, we'll include additional server-side metadata on Bookmarks (specifically, whether they are publishing and pull-default).
However, we do use the name `Bookmark` right now to just reference a bookmark name. This patch updates all references to `Bookmark` to `BookmarkName` in order to free up `Bookmark`.
Reviewed By: StanislavGlebik
Differential Revision: D15364674
fbshipit-source-id: 126142e24e4361c19d1a6e20daa28bc793fb8686
Summary:
`Phases` currently has a very ugly API, which is a constant source of confusion. I've made the following changes:
- only return/cache public phases
- do not require `public_heads` and always request them from `BlobRepo::get_bonsai_heads_maybe_stale`
- consolidated `HintPhases` and `SqlPhases`
- removed `Hint` from types, since it does not carry any meaning
- fixed all affected callsites
Reviewed By: StanislavGlebik
Differential Revision: D15344092
fbshipit-source-id: 848245a58a4e34e481706dbcea23450f3c43810b
Summary: Let's make it a bit more manageable.
Reviewed By: StanislavGlebik
Differential Revision: D15407978
fbshipit-source-id: 490e4d48c36349cf65171f38df0ef0372883281d
Summary: Exploit the fact that database parents are faster than blobstore reads to get more concurrency.
Reviewed By: aslpavel
Differential Revision: D15202668
fbshipit-source-id: e2f35f80531e4a558b40b969cb6d7b9434919a9f
Summary: I have three similar chunks of boilerplate code; there's two more that I plan to write. Factor it out.
Reviewed By: aslpavel
Differential Revision: D15202667
fbshipit-source-id: 16b7b2d44852679d33b44da57c003e7833539f00
Summary: We can't recreate HgChangesets from Bonsai cleanly - we have imported with broken hashes created by Mercurial. Make sure that they're not lost in the checker.
Reviewed By: StanislavGlebik
Differential Revision: D15120894
fbshipit-source-id: 0eab8ad3924e961f3603dd465d87702f456be36f
Summary: When you ask a repo for parents, it goes to a database in preference to the blobstore. Compare the blobstore and database parents, to ensure they match
Reviewed By: aslpavel
Differential Revision: D15120890
fbshipit-source-id: 1123fba8c97865affe86c06965c78d70d61e3d7a
Summary:
When underlying storage layers give us trouble, we want to be able to validate that nothing is broken.
Use `BlobRepo` to fetch all the data for history of a given bookmark, so that we can be confident that we have sufficient data to not lose history.
Later diffs will add support for checking the saved Mercurial changesets, too
Reviewed By: StanislavGlebik
Differential Revision: D15084862
fbshipit-source-id: 8c7184cf1cd0d52ca2ca9c8ce1d7e97b1e51db6d
Summary: Practice commit. The behaviour will be implemented in the future
Reviewed By: ikostia
Differential Revision: D15372554
fbshipit-source-id: c9b3de81fcd2d79d34b8f8c898e28de2564cd887
Summary: I'm going to be doing some work on this file, but it's not up-to-date with rustfmt. To minimize merge conflicts and simplify diff reviews, I ran that earlier.
Reviewed By: StanislavGlebik
Differential Revision: D15364673
fbshipit-source-id: 32658629cc7ee29cdfc21be8dd526766d2dcbb0e
Summary: I'm going to be doing some work on this file, but it's not up-to-date with rustfmt. To minimize merge conflicts and simplify diff reviews, I ran that earlier.
Reviewed By: StanislavGlebik
Differential Revision: D15364672
fbshipit-source-id: dd420ed8fc625142b686d08f6459148c73891a20
Summary:
Printing the id will make it easier to debug sync job problems (for example,
rewinding the latest replayed id counter).
Reviewed By: krallin
Differential Revision: D15322624
fbshipit-source-id: 5c94be9cc0dcced9df51162adb598b6498f1c749
Summary:
This change has two goals:
- Put storage configuration that's common to multiple repos in a common place,
rather than replicating it in each server.toml
- Allow tools that only operate on the blobstore level - like blobstore healing
- to be configured directly in terms of the blobstore, rather than indirectly
by using a representative repo config.
This change makes several changes to repo configuration:
1. There's a separate common/storage.toml which defines named storage
configurations (ie, a combination of a blobstore and metadata DB)
2. server.toml files can also define local storage configurations (mostly
useful for testing)
3. server.toml files now reference which storage they're using with
`storage_config = "name"`.
4. Configuration of multiplex blobstores is now explicit. Previously if a
server.toml defined multiple blobstores, it was assumed that it was a
multiplex. Now storage configuration only accepts a single blobstore config,
but that config can be explicitly a multiplexed blobstore, which has the
sub-blobstores defined within it, in the `components` field. (This is
recursive, so it could be nested, but I'm not sure if this has much value in
practice.)
5. Makes configuration parsing more strict - unknown fields will be treated as
an error rather than ignored. This helps flag problems in refactoring/updating
configs.
I've updated all the configs to the new format, both production and in
integration tests. Please review to make sure I haven't broken anything.
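A sketch of what this looks like in config (only `storage_config`, `components`, and the multiplex nesting are taken from the description above; every other name here is illustrative, not the real schema):

```toml
# common/storage.toml: a named storage configuration
[prod_storage.metadata]
# local or remote metadata DB config goes here

[prod_storage.blobstore.multiplexed]
components = [
    # sub-blobstore configs; multiplexes may in principle be nested
]
```

A server.toml then selects this configuration by name with `storage_config = "prod_storage"`.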
Reviewed By: StanislavGlebik
Differential Revision: D15065423
fbshipit-source-id: b7ce58e46e91877f4e15518c014496fb826fe03c
Summary:
Seems redundant to require callers of open_ssl to also pass a
(mostly) identical string.
Also make open_ssl special-case filenodes with sharding (though filenodes
aren't currently opened through it).
Reviewed By: StanislavGlebik
Differential Revision: D15157834
fbshipit-source-id: 0df45307f17bdb2c021673b3153606031008bee2
Summary:
This migrates the internal structures representing the repo and storage config,
while retaining the existing config file format.
The `RepoType` type has been replaced by `BlobConfig`, an enum containing all
the config information for all the supported blobstores. In addition there's
the `StorageConfig` type which includes `BlobConfig`, and also
`MetadataDBConfig` for the local or remote SQL database for metadata.
Reviewed By: StanislavGlebik
Differential Revision: D15065421
fbshipit-source-id: 47636074fceb6a7e35524f667376a5bb05bd8612
Summary:
Reads current replay counter, and where a bookmark would point to after this
bundle is replayed. That can be useful for debugging
Reviewed By: aslpavel
Differential Revision: D15216378
fbshipit-source-id: fd250e27c2a6d7ee407510561a36b820cc5a1d2b
Summary:
In the later diff we'll add batching of BlobstoreSyncQueue writes. It would be
much harder to add the batching if we also have to return this boolean.
And since nobody uses it, let's just remove it.
Reviewed By: farnz
Differential Revision: D15248290
fbshipit-source-id: 72c64770c1b023e9de23a5dfccd8b4482302fe96
Summary:
The mononoke admin integration tests can be flaky when there is both logging and an error, because those are sent to stderr and stdout respectively, which means they're not ordered relative to one another.
I attempted to fix this with minimal changes in D15146392, but that didn't solve the issue: StanislavGlebik reported that he still ran into a flaky test.
The reason for this is presumably that even though we write to stderr first then to stdout, there's no guarantee that the `.t` test runner will read whatever we output to stderr before it reads what we output to stdout.
I noted in that earlier diff that a more proper fix would be to write errors to stderr so they are indeed ordered relative to logging. That is what this diff does.
For consistency, I updated other fatal outcomes (bad arguments) to also log to stderr.
Reviewed By: StanislavGlebik
Differential Revision: D15181944
fbshipit-source-id: 3ca48870c39f11a7dcc57f1341f25ce61ccae360
Summary:
1) I don't think anybody uses it
2) Hook tailer has the same functionality
Reviewed By: farnz
Differential Revision: D15216418
fbshipit-source-id: 698fc7d998475fe77ff7bf1ac55068ee75a34acc
Summary:
In the case of mononoke's admin tool it's annoying for users to be required to run myrouter in the background and provide myrouter port to every command.
Thanks to this change it is no longer necessary to run admin commands through myrouter - the tool will simply use a direct connection to XDB using the sql crate.
It is important to note that the raw XDB connection via the sql crate doesn't have connection pooling and doesn't handle XDB failover, so it is crucial that it is never used for long-lived or request-heavy use cases like running the Mononoke server or blobimport.
Reviewed By: jsgf
Differential Revision: D15174538
fbshipit-source-id: 299d3d7941ae6aec31961149f926c2a4965ed970
Summary:
Disallow unknown fields. They're generally the result of mis-editing
a file and putting the config in the wrong place, or some incomplete refactor.
Reviewed By: StanislavGlebik
Differential Revision: D15168963
fbshipit-source-id: a9c9658378cda4866e44daf6e2c6bfbdfcdb9f84
Summary:
Currently this checks for:
- all referenced files are present
- they're well-formed toml
- all the required config keys are set, and no unknown ones are set
- blob storage and metadata db are both remote or local
- repo ids are not duplicated
Reviewed By: farnz
Differential Revision: D15068649
fbshipit-source-id: ace0e7bc52bf853ac42384c4346c3b73591312e4
Summary:
One of the tests for mononoke admin (which I introduced recently) appears to be flaky. Sometimes, output from stdout (the error that terminates the program) and stderr (logs emitted while the program runs) is flipped.
I suspect this is caused by buffering, so this patch flushes all output before writing the error (if there was one).
An alternative approach here might be to write the final error to stderr instead of stdout (so everything goes to stderr). That feels cleaner, but it does change the interface a little bit, so I didn't take that approach just yet. That said, if nobody objects, I'm happy to pick that approach instead.
Reviewed By: farnz
Differential Revision: D15146392
fbshipit-source-id: 67481afd4802cb48d24d19052988be4a83433efd
Summary: This adds the ability to exclude blobimport entries when querying the count of remaining entries in the HG sync replay log.
Reviewed By: ikostia
Differential Revision: D15097549
fbshipit-source-id: ae1a9a31f51a044924fdebbdd219fff1e2b3d46a
Summary:
This introduces a new `--skip` flag in Mononoke admin under `hg-sync-bundle last-processed`. This will update the last-synced counter to the last blobimport that precedes a log entry that is not a blobimport.
In other words, it makes it so that the last change to be processed is *not* a blobimport.
It will fail if:
- There is no valid log entry to jump ahead to (e.g. all the further log entries are blobimports).
- The current change to be processed is not a blobimport.
- The mutable counter was changed by someone else in the meantime.
Reviewed By: ikostia
Differential Revision: D15081759
fbshipit-source-id: 8465321b08d9c7b5bc97526400518bcf3ac77f13
Summary: This adds a command in mononoke admin to verify the consistency of remaining bundles to sync -- i.e. whether all bundles are blobimports or all of them are not blobimports.
Reviewed By: ikostia
Differential Revision: D15097935
fbshipit-source-id: a0df221c38e84897213edf232972ba420977e9d3
Summary:
The code in admin/main.rs passes `bookmarks` as the name of the SQLBookmarks database, but other pieces of code (the server which writes this data, and the hg sync job that normally consumes it) use `books`.
This hasn't been a problem until now since no integration tests exist for the admin tool. This path mismatch was never exposed since in production we don't actually care about this name.
Reviewed By: ikostia, StanislavGlebik
Differential Revision: D15081757
fbshipit-source-id: c4fcd328568160023f3c15fa2ab7d77accf2ad68
Summary: Now it is possible to configure and enable/disable bookmark cache from configs
Reviewed By: StanislavGlebik
Differential Revision: D14952840
fbshipit-source-id: 3080f7ca4639da00d413a949547705ad480772f7
Summary: We'll do batching to save time on the sync job. We need to sync faster
Reviewed By: ikostia, farnz
Differential Revision: D14929027
fbshipit-source-id: 3139d0ece07f344cdafa5e39b698bc3b02625f0a
Summary:
Add functionality to show the log of a bookmark, i.e. show previous positions of the bookmark. It should look like:
mononoke_admin bookmarks log master
2d0e180d7fbec1fd9825cfb246e1fecde31d8c35 push March 18, 15:03
9740f4b7f09a8958c90dc66cbb2c79d1d7da0555 push March 17, 15:03
b05aafb29abb61f59f12fea13a907164e54ff683 manual move March 17, 15:03
...
Reviewed By: StanislavGlebik
Differential Revision: D14639245
fbshipit-source-id: 59d6a559a7ba9f9537735fa2e36fbd0f3f9db77c
Summary:
The main fix is in speeding up sql query that returns entries to heal.
The sql query was slow in the case when there are a lot of entries for one
repo and few entries for another repo. Selecting entries for the smaller repo can
become too slow because mysql has to scan the whole table in order to sort
entries. Since ordering by id doesn't look necessary, I suggest just removing
it.
Also, waiting for 1 minute between heal attempts is too slow.
There are a few more smaller fixes - replacing join_all with more efficient
futures_unordered and doing batch delete of entries from the sync queue
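The batch-delete idea can be sketched as follows; `batch_ids` is a hypothetical helper, not the actual sync-queue code:

```rust
// Sketch only: batching queue-entry deletions instead of issuing one
// DELETE per row. In the real code each batch would become a single SQL
// statement; here we just split the ids into fixed-size chunks.
fn batch_ids(ids: &[u64], batch_size: usize) -> Vec<Vec<u64>> {
    ids.chunks(batch_size).map(|c| c.to_vec()).collect()
}

fn main() {
    let ids: Vec<u64> = (0..10).collect();
    let batches = batch_ids(&ids, 4);
    // 10 entries with batch size 4 -> 3 DELETE statements instead of 10.
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[0], vec![0, 1, 2, 3]);
}
```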
Reviewed By: aslpavel
Differential Revision: D14598578
fbshipit-source-id: e8d302aab7b5a4bc16c63e14228713b75295e97a
Summary: Slim down the blobstore trait crate as much as possible.
Reviewed By: aslpavel
Differential Revision: D14542675
fbshipit-source-id: faf09255f7fe2236a491742cd836226474f5967c
Summary:
Pointing the configs to a new package allows us to control the rollout more easily than overriding the existing skiplist in manifold.
This skiplist features a single skip entry for each node, with a depth of up to 512 commits. Previously the depth was 32k, which meant that on average we had to do 16k iterations to find the commit we wanted after the last skiplist jump.
If we were to jump to the very beginning of the repo, previously we had to do ~130 skips and then on average 16k steps. With this skiplist we do ~8k skips and 256 steps. As skips and steps are equally expensive, this should be faster in almost every scenario. The per-node size of the skiplist is not affected by this change (each node still contains one skip entry), but the new skiplist also covers the complete history, not only the ~2 million most recent commits. This leads to a size increase of ~20%.
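The arithmetic above can be sketched as a rough cost model (names are illustrative, not part of the skiplist code):

```rust
// Back-of-the-envelope cost model from the summary: with one skip edge of
// depth `depth` per node, covering `dist` commits costs about dist / depth
// skips plus an average of depth / 2 single steps to land on the exact
// commit.
fn expected_ops(dist: u64, depth: u64) -> u64 {
    dist / depth + depth / 2
}

fn main() {
    // Distance from the summary: ~130 skips of depth 32k to reach the
    // beginning of the repo, i.e. roughly 4.2M commits.
    let dist = 130 * 32_768;
    let old = expected_ops(dist, 32_768); // 130 skips + ~16k steps
    let new = expected_ops(dist, 512);    // ~8k skips + 256 steps
    assert!(new < old);
}
```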
Reviewed By: StanislavGlebik, HarveyHunt
Differential Revision: D14540318
fbshipit-source-id: 86fda5b6c57a06bc4a77e30625014ec119e7a155
Summary:
- The healer runs on all repositories at once, and queries for some repositories are timing out.
- It is now possible to run the healer for just a specified repository.
Reviewed By: HarveyHunt
Differential Revision: D14539978
fbshipit-source-id: 9139999da97b2655ae9312c33c9e8c15f0b24016
Summary:
- converted to the 2018 edition and removed all `extern crate` declarations
- wait for `myrouter` to be available before actually doing anything
Reviewed By: HarveyHunt
Differential Revision: D14524247
fbshipit-source-id: ebe2e2e74935f00c87945129370f268c794fcab7
Summary: To learn how far behind we are in absolute bundle numbers.
Reviewed By: StanislavGlebik
Differential Revision: D14491672
fbshipit-source-id: 31d16f115b2b6fe4b88c25a847ce229e123b048b
Summary: We want to be able to manipulate hg-sync counters from the Mononoke admin.
Reviewed By: StanislavGlebik
Differential Revision: D14477676
fbshipit-source-id: 11218390bf469d4f297f7f13e9daee2d5f9bb35b
Summary: See D14279065, this diff is simply to clean up the deprecated code
Reviewed By: StanislavGlebik
Differential Revision: D14279210
fbshipit-source-id: 10801fb04ad533a80bb7a2f9dcdf3ee5906aa68d
Summary: Use common functions to parse --myrouter-port; this is a simple clean up.
Reviewed By: StanislavGlebik
Differential Revision: D14084003
fbshipit-source-id: 63d6c8301e977faead62cb1c705bac372d56594e
Summary: New commits should be logged to scribe, these will be used to trigger the update for the hg clone streamfile.
Reviewed By: lukaspiatkowski
Differential Revision: D14022599
fbshipit-source-id: a8a68f12a8dc1e65663d1ccf1a5eafa54ca2daf0
Summary: Tool that is used to populate blobstore synchronization queue from manifold bucket
Reviewed By: StanislavGlebik
Differential Revision: D13880216
fbshipit-source-id: 6527e4f7027dc8ce8e73f9949e080e53eda1c541
Summary:
This is the first step in adding support for tracking all bookmark moves. They
will be recorded in a separate mysql table in the same transaction as the
bookmark update.
That gives us two things:
1) Ability to inspect all bookmark moves and debug issues with them
2) A record of which mercurial bundle moved a bookmark (if any), so that we can
later replay these bundles in the correct order on hg
This diff adds a struct that lets us track bookmark moves.
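A sketch of what such a tracking struct might look like; the field names are hypothetical, not the actual Mononoke schema:

```rust
// Illustrative shape of one bookmark-move log row. Field names are made
// up for this sketch; the real table schema may differ.
#[derive(Debug)]
struct BookmarkLogEntry {
    bookmark: String,
    to_changeset: String,           // new position of the bookmark
    from_changeset: Option<String>, // None when the bookmark was created
    reason: String,                 // e.g. "push", "manual move"
    bundle_handle: Option<String>,  // mercurial bundle that moved it, if any
}

fn main() {
    let entry = BookmarkLogEntry {
        bookmark: "master".to_string(),
        to_changeset: "2d0e180d".to_string(),
        from_changeset: Some("9740f4b7".to_string()),
        reason: "push".to_string(),
        bundle_handle: None,
    };
    assert_eq!(entry.reason, "push");
}
```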
Reviewed By: ikostia
Differential Revision: D13958872
fbshipit-source-id: 9adfee6d977457db5af4ad5d3a6734c73fcbcd76
Summary: The Copy trait means that something is so cheap to copy that you don't even need to explicitly call `.clone()` on it. As it doesn't make much sense to pass &i64, it also doesn't make much sense to pass &<Something that is Copy>, so I have removed all occurrences of passing one of our hashes that are Copy.
Reviewed By: fanzeyi
Differential Revision: D13974622
fbshipit-source-id: 89efc1c1e29269cc2e77dcb124964265c344f519
Summary:
Repo name is used only by the verify_integrity hook, and even there the name that Mononoke provides is incorrect: instead of Mononoke's `repo-RepositoryId(1234)` name, the hook is interested in Mercurial's `fbsource` name.
HookConfig is a perfect way to pass such an arbitrary parameter so use it.
Reviewed By: StanislavGlebik
Differential Revision: D13964486
fbshipit-source-id: 94090e409d5206828364202ae62a37abc16e4a27
Summary: Move complex things out of `blobstore` to thin out the dep graph. This didn't work as well as I'd hoped because `blobstore`->`mononoke-types`->`thrift`/`sql`.
Reviewed By: StanislavGlebik
Differential Revision: D13915511
fbshipit-source-id: c210dda23fa7102168c0ca14a035ed5c03a6993c
Summary:
`blobrepo_factory` is a crate that knows how to create blobrepo given
a configuration i.e. it creates blobstores, filenodes, changesets etc and
initializes blobrepo with them.
`post_commit` is a small part of blobrepo which can also be extracted from
blobrepo crate.
There are a few upsides to this approach:
1) Fewer dependencies on BlobRepo, meaning we have to rebuild it fewer times
2) BlobRepo compilation is faster
Reviewed By: jsgf
Differential Revision: D13896334
fbshipit-source-id: 1f5701341f01fcefff4e5f9430ddc914b9496064
Summary:
Let's split the reachability index crate. The main goal is to reduce compilation
time. Now crates like revsets will only depend on the trait definitions, not on
the actual implementations (skiplist or genbfs).
Reviewed By: lukaspiatkowski
Differential Revision: D13878403
fbshipit-source-id: 022eca50ac4bc7416e9fe5f3104f0a9a65195b26
Summary:
Some crates, namely revsets and reachabilityindex, currently depend on
blobrepo, while all they need is the ability to fetch commits.
By moving changeset_fetcher out, this dependency is removed, which may
make builds faster.
Reviewed By: lukaspiatkowski
Differential Revision: D13878369
fbshipit-source-id: 9ee8973a9170557a4dede5404dd374aa4a000405
Summary:
The main reason for doing this is to remove the dependency on `BlobRepo` from hooks. Most of the `hooks` crate code needs only the HgBlobChangeset type from `BlobRepo`, which was moved to a separate crate in one of the previous diffs. The small piece of code that still depends on blobrepo was moved into a separate crate.
Because of that, changing anything in BlobRepo won't trigger a rebuild of most of the hooks crate.
Reviewed By: lukaspiatkowski
Differential Revision: D13878208
fbshipit-source-id: d74336e959282c176258c653d4c408854e1f1849
Summary:
Currently, if a crate depends on even a single type from metaconfig, then in
order to compile that crate buck first compiles the whole metaconfig crate, with
all the logic for parsing the configs.
This diff splits metaconfig into two crates. The first one holds just the types
for "external consumption" by other crates. The second holds the parsing logic.
That makes builds faster.
Reviewed By: jsgf, lukaspiatkowski
Differential Revision: D13877592
fbshipit-source-id: f353fb2d1737845bf1fa0de515ff8ef131020063
Summary:
Previously blobimport_lib was part of cmdlib. That meant that to compile tools
like `mononoke_admin` or `alias_verify` we also had to compile the
blobimport_lib code. By moving it out of cmdlib we make compilation faster.
Reviewed By: HarveyHunt
Differential Revision: D13877528
fbshipit-source-id: c59f92589134588fee2e28ffeef9990bb7e1631f
Summary: In this diff the configs are parsed from toml and passed around to hook's execution context. The actual usage of configs will be introduced in separate diff.
Reviewed By: StanislavGlebik
Differential Revision: D13862837
fbshipit-source-id: 60ac10aa9c25d224e703e1e55bef13dc481ba07e
Summary:
Previously pushrebasing an empty commit failed because we assumed that the root
manifest of a commit is always sent in a bundle. This diff removes that
assumption.
Reviewed By: lukaspiatkowski
Differential Revision: D13818556
fbshipit-source-id: 44e96374ae343074f48e42a90c691b21e3c41386
Summary:
Previously we relied on CachingChangesetFetcher to quickly fetch all commits into
memory, but CachingChangesetFetcher was deleted in D13695201. Instead, let's use
the `get_many()` method from the Changesets trait to quickly fetch many
changesets into memory at once.
Reviewed By: lukaspiatkowski
Differential Revision: D13712783
fbshipit-source-id: 12e8fa148f7989028547ac8d374438e23b44b6d1
Summary:
We decided to populate the phases table in 2 places: blobimport and push-rebase
(already done).
This diff handles blobimport. We know the commits are public.
Reviewed By: lukaspiatkowski
Differential Revision: D13731900
fbshipit-source-id: b64e5643e7cffd9e8fb842e9929f4c1ee7a66197
Summary:
This version is still missing:
- proper production-ready logging
- smarter handling of the case where queue entries related to each other do not fit within the `limit` or `older_than` limit, so the healer will heal many more entries without realizing it shouldn't do so.
Reviewed By: aslpavel
Differential Revision: D13528686
fbshipit-source-id: 0245becea7e4f0ac69383a7885ff3746d81c4add
Summary: Instead of dumping the debug output, we print the most important information: changeset id, author, message and file changes.
Reviewed By: StanislavGlebik
Differential Revision: D13621492
fbshipit-source-id: ea0f93f58516cc759d0dc9aac14545b1827ea136
Summary: Format files affected by the next commit in a stack
Reviewed By: StanislavGlebik
Differential Revision: D13650639
fbshipit-source-id: d4e37acd2bcd29b291968a529543c202f6944e1a
Summary: There's nothing Mercurial-specific about identifying a repo. This also outright removes some dependencies on mercurial-types.
Reviewed By: StanislavGlebik
Differential Revision: D13512616
fbshipit-source-id: 4496a93a8d4e56cd6ca319dfd8effc71e694ff3e
Summary:
Now that mononoke's config can be built using normal fbpkg tools,
we can remove the ability for mononoke_admin to build configuration.
Reviewed By: StanislavGlebik
Differential Revision: D13399189
fbshipit-source-id: 17aff327773b9c904916f99030a732a99aa34134
Summary: Restructure the configs so that we can specify more than one blobstore
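A hedged sketch of what a multi-blobstore config could look like; the key names below are illustrative, not Mononoke's actual config schema:

```toml
# Hypothetical shape only: a list of blobstore tables instead of a
# single blobstore section.
[[storage.blobstores]]
blobstore_id = 0
blobstore_type = "manifold"

[[storage.blobstores]]
blobstore_id = 1
blobstore_type = "files"
path = "/data/blobs"
```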
Reviewed By: lukaspiatkowski
Differential Revision: D13234286
fbshipit-source-id: a98ede17921ed6148add570288ac23636b086398
Summary:
Config repo proved to be tricky to understand and hard to use. Let's just use
toml files.
Reviewed By: farnz
Differential Revision: D13179926
fbshipit-source-id: 3a44ee08c37284cc4c189c74b5c369ce82651cc6
Summary:
Let's add a command that builds and reads skiplist indexes. These indexes will
be used by the getbundle wireproto request to decrease latency and cpu usage.
Note that we save only the longest "jump" in the skiplist. This is done
in order to save space.
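The "keep only the longest jump" idea can be sketched roughly as follows (illustrative names, not the real skiplist implementation):

```rust
// Sketch of "save only the longest jump": each node stores at most one
// skip edge, chosen as the farthest indexed ancestor.
struct SkiplistNode {
    // (ancestor changeset id, distance in commits), if any
    skip: Option<(String, u64)>,
}

impl SkiplistNode {
    fn record_jump(&mut self, ancestor: String, distance: u64) {
        // Keep the existing edge only if it already jumps at least as far.
        let keep_existing = matches!(&self.skip, Some((_, d)) if *d >= distance);
        if !keep_existing {
            self.skip = Some((ancestor, distance));
        }
    }
}

fn main() {
    let mut node = SkiplistNode { skip: None };
    node.record_jump("a".to_string(), 16);
    node.record_jump("b".to_string(), 512);
    node.record_jump("c".to_string(), 64);
    // Only the longest jump survives.
    assert_eq!(node.skip, Some(("b".to_string(), 512)));
}
```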
Reviewed By: jsgf
Differential Revision: D13169018
fbshipit-source-id: 4d654284b0c0d8a579444816781419ba6ad86baa
Summary:
Purpose:
- A Sha256 alias link to file_content is required for LFS getfiles to work correctly.
The LFS protocol uses SHA-256 to refer to file content, while Mononoke uses Blake2.
To support LFS in Mononoke we need to set up a link from the SHA-256 hash of the content to the Blake2 hash of the content.
These links are called aliases.
- Aliases are uploaded together with file content blobs, but only for new push operations.
- If a repo is blobimported from somewhere, we need to make sure that all the links are in the blobstore.
If a repo was blobimported before aliases were added, it may be missing aliases for some blobs.
- This tool can be used to:
- find whether any aliases are missing
- fill in missing aliases
Implementation:
- Run in a repo.
- Iterate through all changesets.
- Go through all the file_content blobs in the changesets.
- Verify/generate alias256 links to the file_content blobs.
Modes supported:
- verify: count the number of errors and print them to the console
- generate: if an alias blob is missing, add it to the blobstore
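The verify/generate modes can be sketched with an in-memory mock; the key formats and digests here are placeholders, not Mononoke's actual blobstore scheme:

```rust
use std::collections::HashMap;

// Mock of the verify/generate modes with an in-memory "blobstore".
// Returns true if the alias is present (or was just filled in).
fn ensure_alias(
    store: &mut HashMap<String, String>,
    sha256_hex: &str,
    blake2_key: &str,
    generate: bool,
) -> bool {
    // Placeholder key format for this sketch.
    let alias_key = format!("alias.sha256.{}", sha256_hex);
    if store.contains_key(&alias_key) {
        return true; // alias already present
    }
    if generate {
        store.insert(alias_key, blake2_key.to_string());
        return true; // missing alias filled in
    }
    false // verify mode: report the missing alias
}

fn main() {
    let mut store = HashMap::new();
    // verify mode finds the alias missing...
    assert!(!ensure_alias(&mut store, "deadbeef", "blake2.cafe", false));
    // ...generate mode fills it, and a later verify passes.
    assert!(ensure_alias(&mut store, "deadbeef", "blake2.cafe", true));
    assert!(ensure_alias(&mut store, "deadbeef", "blake2.cafe", false));
}
```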
Reviewed By: StanislavGlebik
Differential Revision: D10461827
fbshipit-source-id: c2673c139e2f2991081c4024db7b85953d2c5e35
Summary:
Sharding filenodes by path should stop us knocking over databases -
make it configurable.
Reviewed By: StanislavGlebik
Differential Revision: D12894523
fbshipit-source-id: e27452f9b436842e1cb5e9e0968c1822f422b4c9
Summary:
Panic is useless here: it produces a huge stack trace which just contains the
main function and makes it harder to debug the actual problem.
Let's just exit in case of errors.
Reviewed By: farnz
Differential Revision: D12912198
fbshipit-source-id: 1faeacfb96765ce047a801f6b072112f10b50b7b
Summary:
Add one more restriction to the config repo to make sure we don't forget to
move PROD bookmark
Reviewed By: HarveyHunt
Differential Revision: D12857619
fbshipit-source-id: c4b5e65f2d0b437aad77d8ccc4b4971b60020af4
Summary:
Let's have separate config bookmarks for release candidate and prod.
That will let us customize shadow tier behaviour.
This diff also adds a check of config repo consistency. It requires that the RC
bookmark is a descendant of the PROD bookmark. This topology makes it easy to see
what the changes between PROD and RC are, and the verification prevents divergence of
configs, i.e. situations where somebody updated a prod config but forgot to
rebase the rc config.
Reviewed By: HarveyHunt
Differential Revision: D12857131
fbshipit-source-id: b60d8f7af16e3d530e5edeb22145ec0bd473ffe4
Summary: Make get_manifest_by_nodeid accept HgManifestId and correct all calls to get_manifest_by_nodeid.
Reviewed By: StanislavGlebik
Differential Revision: D10298425
fbshipit-source-id: 932e2a896657575c8998e5151ae34a96c164e2b2
Summary:
One brainless idiot decided to prune all trees from the changed files calculation.
Since it also prunes subtrees, that leaves us with just the files in the root
directory.
Reviewed By: lukaspiatkowski
Differential Revision: D10302299
fbshipit-source-id: 8fe2c4ad8de998dfd4083d97cd816d85b5fec604
Summary: Pushvars are one more way to bypass hooks. This diff implements them.
Reviewed By: purplefox
Differential Revision: D10257602
fbshipit-source-id: 1bd188239878ff917ded7db995ea2453da9f64c4
Summary:
Let's add logic to allow users to bypass hooks.
We'll have two ways to bypass hooks: one is via a string in the commit message,
another is via pushvars.
This diff implements the first one.
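The commit-message bypass can be sketched as a simple substring check; the marker string below is made up for illustration:

```rust
// Sketch only: a hook is skipped when the commit message contains the
// configured bypass marker. The marker format is hypothetical.
fn bypassed_by_message(commit_msg: &str, bypass_string: &str) -> bool {
    commit_msg.contains(bypass_string)
}

fn main() {
    let msg = "Fix build\n\n@bypass-size-limit: emergency fix";
    assert!(bypassed_by_message(msg, "@bypass-size-limit"));
    assert!(!bypassed_by_message("Fix build", "@bypass-size-limit"));
}
```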
Reviewed By: purplefox
Differential Revision: D10255378
fbshipit-source-id: 31e803a58e2f4798294f7c807933c8e26de3cfaf
Summary:
The idea for rollout is to:
- first, make sure that Mononoke doesn't crash when a --myrouter-port is provided
- then, modify the tupperware configs to include myrouter as a co-located process on every host, with the port of that myrouter instance provided via the command line
- lastly, land the change that actually talks to myrouter
Reviewed By: StanislavGlebik
Differential Revision: D10258251
fbshipit-source-id: ea9d461b401d41ef624304084014c2227968d33f
Summary:
Hooks need to know whether a file was added, modified or removed. For example, we
can't fetch the content of a removed file. Also, hook authors may want to allow
modifying existing files of a particular type while disallowing the
addition of new files of that type.
`cs.files()` doesn't give information about whether a file was
added/deleted/modified, so we have to use the `get_changed_stream` function from
manifest_utils.
Note: currently it still returns an incorrect list of changed files for merges.
That will be fixed in the next diffs.
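The added/modified/deleted information the hooks need can be sketched like this (illustrative types, not the actual manifest_utils API):

```rust
// Sketch of what hooks need beyond cs.files(): each changed path paired
// with how it changed.
#[derive(Debug)]
enum ChangedFileType {
    Added,
    Modified,
    Deleted,
}

fn can_fetch_content(ty: &ChangedFileType) -> bool {
    // The content of a deleted file can't be fetched by a hook.
    !matches!(ty, ChangedFileType::Deleted)
}

fn main() {
    let changes = vec![
        ("new.rs".to_string(), ChangedFileType::Added),
        ("old.rs".to_string(), ChangedFileType::Deleted),
    ];
    assert!(can_fetch_content(&changes[0].1));
    assert!(!can_fetch_content(&changes[1].1));
}
```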
Reviewed By: farnz
Differential Revision: D10237587
fbshipit-source-id: cd7f76334070cde451b4690071d03275e40c95f3
Summary: This adds a `--compare-commits` option for the pushrebase replayer, which also checks that commits are close enough to treat them as equal for the purpose of pushrebase.
Reviewed By: StanislavGlebik
Differential Revision: D10084308
fbshipit-source-id: f1fd05173a9a7663125a89dd03b79b2deea40dc4
Summary: This popped up while I was building Mononoke - fix it.
Reviewed By: StanislavGlebik
Differential Revision: D10126386
fbshipit-source-id: 117239dea88c3ecd921f852ce86691ba6aa8bb07
Summary:
Previously cachelib cmdline args were added only to the cmd line binaries, but not
to Mononoke; this diff fixes that.
Reviewed By: farnz
Differential Revision: D10083899
fbshipit-source-id: 8febba96561c5ab9a61f60fafc7a7e56985dc038
Summary:
JSON blobs let other users of Mononoke learn what they need to know
about commits. When we get a commit, log a JSON blob to Scribe that other users can pick up to learn what they want to know.
Because Scribe does not guarantee ordering, and can sometimes lose messages, each message includes enough data to allow a tailer that wants to know about all commits to follow backwards and detect lost messages (and thus fix them up locally). It's expected that tailers will either sample this data, or have their own state that they can use to detect missing commits.
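The backward-following idea can be sketched as a payload that carries parent hashes; field names are illustrative, and the JSON is hand-formatted here to keep the sketch dependency-free:

```rust
// Sketch of a commit-notification payload: including parent hashes lets a
// tailer walk backwards from any message and detect messages Scribe
// dropped. Field names are made up for this sketch.
fn commit_json(repo: &str, changeset: &str, parents: &[&str]) -> String {
    let parents_json: Vec<String> =
        parents.iter().map(|p| format!("\"{}\"", p)).collect();
    format!(
        "{{\"repo\":\"{}\",\"changeset\":\"{}\",\"parents\":[{}]}}",
        repo,
        changeset,
        parents_json.join(",")
    )
}

fn main() {
    let msg = commit_json("fbsource", "abc123", &["def456"]);
    // A tailer seeing this message knows to check it has seen "def456".
    assert!(msg.contains("\"parents\":[\"def456\"]"));
}
```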
Reviewed By: StanislavGlebik
Differential Revision: D9995985
fbshipit-source-id: 527b6b8e1ea7f5268ce4ce4490738e085eeeac72
Summary:
This is mainly generated by
```
fbgs mononoke_test_2 -l -s | xargs sed -i 's/mononoke_test_2/mononoke_production/g'
```
The mononoke config repo is still a pending todo. But it's ok to do this in several runs,
as right now the 2 names point to the same shard.
Reviewed By: StanislavGlebik
Differential Revision: D9939694
fbshipit-source-id: ded772037844a220b18d99b207c976b88dafdaa5