Summary: As part of the effort to deprecate futures 0.1 in favor of 0.3, I want to create a new futures_ext crate that will contain the applicable extensions from futures_01_ext. But first I need to reclaim this crate name by renaming the old futures_ext crate. This will also make it easier to track which parts of the codebase still use the old futures.
Reviewed By: farnz
Differential Revision: D24725776
fbshipit-source-id: 3574d2a0790f8212f6fad4106655cd41836ff74d
Summary:
In Mononoke, for a sharded DB we historically used a connection pool of size 1 per shard. With the Mysql FFI client this no longer makes sense: the client's connection pool is smart enough and designed to work with sharded DBs, so currently we don't even benefit from having a pool.
In this diff I added an API to create sharded connections: a single pool is shared between all the shards.
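The shape of the API can be sketched as follows. This is a minimal stand-in, not the real Mysql FFI client types: `ConnectionPool` and `ShardedConnection` here are hypothetical names, and the point is only that every shard hands out connections from one shared pool rather than owning a pool of size 1.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the FFI client's connection pool.
struct ConnectionPool {
    size: usize,
}

// One pool shared across all shards, instead of a size-1 pool per shard.
struct ShardedConnection {
    pool: Arc<ConnectionPool>,
    shard_count: usize,
}

impl ShardedConnection {
    fn new(pool_size: usize, shard_count: usize) -> Self {
        Self {
            pool: Arc::new(ConnectionPool { size: pool_size }),
            shard_count,
        }
    }

    // Every shard resolves to the same underlying pool.
    fn pool_for_shard(&self, shard_id: usize) -> Arc<ConnectionPool> {
        assert!(shard_id < self.shard_count);
        Arc::clone(&self.pool)
    }
}

fn main() {
    let conn = ShardedConnection::new(10, 100);
    // Both shards share one pool: 10 connections total, not 100 pools of 1.
    assert!(Arc::ptr_eq(&conn.pool_for_shard(0), &conn.pool_for_shard(99)));
    println!("shared pool size: {}", conn.pool.size);
}
```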
Reviewed By: farnz
Differential Revision: D24475317
fbshipit-source-id: b7142c030a10ccfde1d5a44943b38cfa70332c6a
Summary:
We want to be able to detect garbage blobs by looking at generation numbers.
Update generation numbers on put, and add a mark command to mark blobs as not garbage.
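The bookkeeping described above can be sketched as follows. Names and data layout are illustrative only, not SQLBlob's actual schema: puts record the current generation for a key, the mark phase bumps live blobs to the current generation, and anything below a cutoff generation is a garbage candidate.

```rust
use std::collections::HashMap;

// Illustrative generation-based GC state (not Mononoke's real schema).
struct GcState {
    current_generation: u64,
    // blob key -> generation it was last put or marked in
    generations: HashMap<String, u64>,
}

impl GcState {
    // On put, record the current generation for the key.
    fn put(&mut self, key: &str) {
        self.generations.insert(key.to_string(), self.current_generation);
    }

    // The mark command bumps a live blob up to the current generation.
    fn mark(&mut self, key: &str) {
        if let Some(g) = self.generations.get_mut(key) {
            *g = self.current_generation;
        }
    }

    // Anything older than the cutoff generation is a GC candidate.
    fn garbage(&self, cutoff: u64) -> Vec<String> {
        self.generations
            .iter()
            .filter(|(_, &g)| g < cutoff)
            .map(|(k, _)| k.clone())
            .collect()
    }
}

fn main() {
    let mut gc = GcState { current_generation: 1, generations: HashMap::new() };
    gc.put("old_blob");
    gc.current_generation = 2;
    assert_eq!(gc.garbage(2), vec!["old_blob".to_string()]);
    gc.mark("old_blob"); // marked as not garbage
    assert!(gc.garbage(2).is_empty());
}
```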
Reviewed By: ahornby
Differential Revision: D23989289
fbshipit-source-id: d96f38649151e3dbd5297cffc262776e74f6cc86
Summary: SQLBlob GC (next diff in stack) will need a ConfigStore in SQLBlob. Make one available to blobstore creation
Reviewed By: krallin
Differential Revision: D24460586
fbshipit-source-id: ea2d5149e0c548844f1fd2a0d241ed0647e137ae
Summary:
Implement BlobstorePutOps for multiplex blobstores, restoring the ability to have multiplexes of multiplexes.
Note that this makes the BlobstorePutOps::put_behaviour accessor problematic, as the inner stores could have different put behaviours; I will remove it in the next diff to keep this diff a reasonable size.
Reviewed By: StanislavGlebik
Differential Revision: D24162235
fbshipit-source-id: 2ace3af5f60607996e449451316c5c0720351f82
Summary:
Add overwrite status logging to multiplexblob puts, now that all the inner types it holds implement BlobstorePutOps.
This has the effect of making any configuration that specifies a multiplex of multiplexes invalid, which is addressed in the next diff in the stack.
Reviewed By: StanislavGlebik
Differential Revision: D24159958
fbshipit-source-id: dd1f70a636dfb36686d796af7afd8d5da8797a23
Summary:
Make blobstore_factory PutBehaviour-aware by layering everything except the final multiplex as BlobstorePutOps.
This makes it so all the components that go into a multiplex are BlobstorePutOps, which is a prerequisite for making the multiplex logging include the Overwrite status.
Reviewed By: StanislavGlebik
Differential Revision: D24109289
fbshipit-source-id: 23f4cedbaebadae194e41cfbff9ef46b651e3fd4
Summary:
Add predicate based PutBehaviour logic to manifoldblob.
This will prevent overwrites of keys when in IfAbsent mode, and will generate useful logging in OverwriteAndLog and IfAbsent modes.
This change factors out part of the put logic into put_check_conflict, so that it can be re-used from each of the PutBehaviour cases.
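The behaviour split can be sketched as below. This is a simplified, synchronous stand-in: the enum variants follow the summary, but the real Mononoke types are async and live in the blobstore crates, and the shared conflict check here only loosely mirrors the put_check_conflict idea.

```rust
use std::collections::HashMap;

// Simplified sketch of the three put behaviours discussed above.
#[derive(Clone, Copy, PartialEq, Debug)]
enum PutBehaviour {
    Overwrite,       // unconditional put, no logging
    OverwriteAndLog, // put, but log when an existing value is replaced
    IfAbsent,        // refuse to replace an existing value
}

struct Store {
    data: HashMap<String, Vec<u8>>,
    behaviour: PutBehaviour,
}

impl Store {
    // Shared conflict-check logic reused by each behaviour case.
    // Returns whether a previous value was present.
    fn put(&mut self, key: &str, value: Vec<u8>) -> Result<bool, String> {
        let was_present = self.data.contains_key(key);
        match self.behaviour {
            PutBehaviour::IfAbsent if was_present => {
                return Err(format!("key {} already present", key));
            }
            PutBehaviour::OverwriteAndLog if was_present => {
                // In the real store this would emit an overwrite log entry.
                println!("overwrote existing value for {}", key);
            }
            _ => {}
        }
        self.data.insert(key.to_string(), value);
        Ok(was_present)
    }
}

fn main() {
    let mut store = Store { data: HashMap::new(), behaviour: PutBehaviour::IfAbsent };
    assert_eq!(store.put("k", vec![1]), Ok(false));
    // IfAbsent prevents the overwrite.
    assert!(store.put("k", vec![2]).is_err());
}
```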
Reviewed By: StanislavGlebik
Differential Revision: D24021170
fbshipit-source-id: d2e71afadada3d5e661634449108e6c9f8dc5907
Summary:
Implement BlobstorePutOps for S3Blob. This uses is_present to check the various put behaviours.
While implementing this I noticed get_sharded_key could be updated to take a reference, so I did that as well.
Differential Revision: D24079253
fbshipit-source-id: 16e194076dbdb4da8a7a9b779e0bd5fb60f550a6
Summary: Add put behaviour handling to fileblob so that it can prevent overwrites if requested.
Differential Revision: D23933228
fbshipit-source-id: 8e74ac96b232be841174f6ad2bd2fccf92aaa90d
Summary:
Add put behaviour to BlobstoreOptions in preparation for passing in the put behaviour through blobstore_factory.
Later in the stack a command line option is added to set this non-None so that we can turn on overwrite logging for particular jobs.
Reviewed By: StanislavGlebik
Differential Revision: D24021169
fbshipit-source-id: 5692e2d3912ebde07b0d7bcce54b79df188a9f16
Summary:
This diff introduces the Mysql client for Rust to Mononoke as one more backend, alongside raw xdb connections and MyRouter. Now Mononoke can use new Mysql client connections instead of MyRouter.
To run Mononoke with the new backend, pass the `--use-mysql-client` option (conflicts with `--myrouter-port`).
I also added a new target for integration tests, which runs mysql tests using mysql client.
Now, to run mysql tests using raw xdb connections you can use `mononoke/tests/integration:integration-mysql-raw-xdb`, and to run them using the mysql client, `mononoke/tests/integration:integration-mysql`.
Reviewed By: ahornby
Differential Revision: D23213228
fbshipit-source-id: c124ccb15747edb17ed94cdad2c6f7703d3bf1a2
Summary:
Implemented S3 blobstore
Isilon implements S3 as a 1:1 mapping onto the filesystem, which limits the maximum number of blobs in a single directory. To overcome this, shard the keys using base64 encoding, creating a 2-level directory structure with 2-character directory names.
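A sketch of this sharding scheme is below. It is an assumption-laden illustration, not the real S3Blob code: the URL-safe alphabet, the lack of padding, and the exact 2+2 split are guesses at the scheme, and a hand-rolled encoder is used only to keep the sketch dependency-free (real code would use a base64 crate). Real blobstore keys are long hashes, so the encoded form always has at least the four characters the split needs.

```rust
// URL-safe base64 alphabet ('-' and '_' avoid '/' in paths); an assumption,
// the production alphabet may differ.
const ALPHABET: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Minimal unpadded base64 encoder, for illustration only.
fn base64_encode(data: &[u8]) -> String {
    let mut out = String::new();
    for chunk in data.chunks(3) {
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = ((b[0] as u32) << 16) | ((b[1] as u32) << 8) | b[2] as u32;
        let sextets = [(n >> 18) & 63, (n >> 12) & 63, (n >> 6) & 63, n & 63];
        for (i, &s) in sextets.iter().enumerate() {
            // Emit chunk_len + 1 output chars (no '=' padding).
            if i <= chunk.len() {
                out.push(ALPHABET[s as usize] as char);
            }
        }
    }
    out
}

// Two levels of 2-char directories keep any single directory small:
// at most 64*64 entries per level.
fn sharded_key(key: &str) -> String {
    let enc = base64_encode(key.as_bytes());
    format!("{}/{}/{}", &enc[..2], &enc[2..4], enc)
}

fn main() {
    assert_eq!(base64_encode(b"Man"), "TWFu");
    assert_eq!(sharded_key("Manx"), "TW/Fu/TWFueA");
}
```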
Reviewed By: krallin
Differential Revision: D23562541
fbshipit-source-id: c87aca2410381a07babb191cbd8cf28233556e03
Summary:
There are two reasons to want a write quorum:
1. One or more blobstores in the multiplex are experimental, and we don't want to accept a write unless the write is in a stable blobstore.
2. To reduce the risk of data loss if one blobstore loses data at a bad time.
Make it possible to require a write quorum.
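The quorum check can be sketched as follows. This is a deliberately simplified, sequential stand-in: the real multiplexer is async and races writes against all inner stores, and `FakeStore` is a hypothetical type used only to make the sketch runnable.

```rust
// Hypothetical inner blobstore that either accepts or rejects writes.
struct FakeStore {
    healthy: bool,
}

impl FakeStore {
    fn put(&self, _key: &str, _value: &[u8]) -> Result<(), String> {
        if self.healthy {
            Ok(())
        } else {
            Err("store unavailable".to_string())
        }
    }
}

// Accept the overall write only once `write_quorum` inner writes succeed,
// so a write landing solely in an experimental store is not enough.
fn put_with_quorum(
    stores: &[FakeStore],
    write_quorum: usize,
    key: &str,
    value: &[u8],
) -> Result<(), String> {
    let mut successes = 0;
    for store in stores {
        if store.put(key, value).is_ok() {
            successes += 1;
            if successes >= write_quorum {
                return Ok(());
            }
        }
    }
    Err(format!("quorum not met: {}/{} writes succeeded", successes, write_quorum))
}

fn main() {
    let stores = vec![FakeStore { healthy: true }, FakeStore { healthy: false }];
    assert!(put_with_quorum(&stores, 1, "key", b"blob").is_ok());
    assert!(put_with_quorum(&stores, 2, "key", b"blob").is_err());
}
```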
Reviewed By: krallin
Differential Revision: D22850261
fbshipit-source-id: ed87d71c909053867ea8b1e3a5467f3224663f6a
Summary:
We're going to add an SQL blobstore to our existing multiplex, which won't have all the blobs initially.
In order to populate it safely, we want to have normal operations filling it with the latest data, and then backfill from Manifold; once we're confident all the data is in here, we can switch to normal mode, and never have an excessive number of reads of blobs that we know aren't in the new blobstore.
Reviewed By: krallin
Differential Revision: D22820501
fbshipit-source-id: 5f1c78ad94136b97ae3ac273a83792ab9ac591a9
Summary:
Add a cmdlib argument to control cachelib zstd compression. The default behaviour is unchanged: the CachelibBlobstore will attempt compression when putting to the cache if the object is larger than the cachelib max size.
To make the cache behaviour more testable, this change also adds an option to do an eager put to the cache without the spawn. The default remains a lazy fire-and-forget put into the cache with tokio::spawn.
The motivation for the change is that when running the walker the compression putting to cachelib can dominate CPU usage for part of the walk, so it's best to turn it off and let those items be uncached as the walker is unlikely to visit them again (it only revisits items that were not fully derived).
Reviewed By: StanislavGlebik
Differential Revision: D22797872
fbshipit-source-id: d05f63811e78597bf3874d7fd0e139b9268cf35d
Summary: D22381744 updated the version of `futures` in third-party/rust to 0.3.5, but did not regenerate the autocargo-managed Cargo.toml files in the repo. Although this is a semver-compatible change (and therefore should not break anything), it means that affected projects would see changes to all of their Cargo.toml files the next time they ran `cargo autocargo`.
Reviewed By: dtolnay
Differential Revision: D22403809
fbshipit-source-id: eb1fdbaf69c99549309da0f67c9bebcb69c1131b
Summary:
Remove unused dependencies for Rust targets.
This failed to remove the dependencies in eden/scm/edenscmnative/bindings
because of the extra macro layer.
Manual edits (named_deps) and misc output in P133451794
Reviewed By: dtolnay
Differential Revision: D22083498
fbshipit-source-id: 170bbaf3c6d767e52e86152d0f34bf6daa198283
Summary:
Add optional compress on put controlled by a command line option.
Other than costing some CPU time, this may be a good option when populating repos from existing uncompressed stores to new stores.
Reviewed By: farnz
Differential Revision: D22037756
fbshipit-source-id: e75190ddf9cfd4ed3ea9a18a0ec6d9342a90707b
Summary: Add basic packblob store which wraps blobs in the thrift StorageEnvelope type on put() and unwraps them on get()
Reviewed By: farnz
Differential Revision: D21924405
fbshipit-source-id: 81eccf5da6b5b50bdb9aae13622301a20dca0eac
Summary:
The blobstore multiplexer contains logic to log blobstore operations to
scuba, as well as updating `PerfCounters`. There are some cases where we don't use the
multiplexed blobstore, which means that we're missing this important logging.
Factor out the logging to a separate crate and implement `LogBlob`, which wraps
another blobstore and performs logging to both scuba and PerfCounters.
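The wrapper described above is a classic decorator. The sketch below shows the shape with a heavily simplified, synchronous trait; the real `Blobstore` trait is async, and the `println!` stands in for the Scuba and `PerfCounters` logging the crate actually performs.

```rust
use std::collections::HashMap;

// Heavily simplified, synchronous stand-in for Mononoke's Blobstore trait.
trait Blobstore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

// A trivial in-memory store to wrap.
struct MemBlob(HashMap<String, Vec<u8>>);

impl Blobstore for MemBlob {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
}

// LogBlob wraps any inner blobstore and logs every operation.
struct LogBlob<B: Blobstore> {
    inner: B,
}

impl<B: Blobstore> Blobstore for LogBlob<B> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let result = self.inner.get(key);
        // Stand-in for the scuba / PerfCounters logging.
        println!("get key={} hit={}", key, result.is_some());
        result
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert("k".to_string(), vec![1, 2, 3]);
    let store = LogBlob { inner: MemBlob(map) };
    assert_eq!(store.get("k"), Some(vec![1, 2, 3]));
    assert_eq!(store.get("missing"), None);
}
```

Because `LogBlob` wraps any `Blobstore`, the logging applies even in configurations that bypass the multiplexer.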
Reviewed By: StanislavGlebik
Differential Revision: D21573455
fbshipit-source-id: 490ffd347f1ad19effc93b93f880836467b87651
Summary:
The motivation for making this function async is that it needs to spawn things,
so it should only ever execute while polled by an executor. If we don't do
this, then it can panic if there is no executor, which is annoying.
I've been wanting to do this for a while but hadn't done it because it required
refactoring a lot of things (see the rest of this stack). But, now, it's done.
Reviewed By: mitrandir77
Differential Revision: D21427348
fbshipit-source-id: bad077b90bcf893f38b90e5c470538d2781c51e9
Summary:
This updates our blobrepo factory code to async / await. The underlying
motivation is to make this easier to modify. I've run into this a few times
now, and I'm sure others have too, so I think it's time.
In doing so, I've simplified the code a little bit to stop passing futures
around when values will do. This makes the code a bit more sequential, but
considering none of those futures were eager in any way, it shouldn't really
make any difference.
Reviewed By: markbt
Differential Revision: D21427290
fbshipit-source-id: e70500b6421a95895247109cec75ca7fde317169
Summary: This also unblocks the MacOS Mononoke builds, so enabling them back
Reviewed By: farnz
Differential Revision: D21455422
fbshipit-source-id: 4eae10785db5b93b1167f580a1c887ee4c8a96a2
Summary:
This removes our own (Mononoke's) implementation of failure chains, and instead
replaces them with usage of Anyhow. This doesn't appear to be used anywhere
besides Mononoke.
The historical motivation for failure chains was to make context introspectable
back when we were using Failure. However, we're not using Failure anymore, and
Anyhow does that out of the box with its `context` method, which you can
downcast to the original error or any of the context instances:
https://docs.rs/anyhow/1.0.28/anyhow/trait.Context.html#effect-on-downcasting
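The downcasting the Anyhow docs describe can be shown with a stdlib-only analogue (used here instead of anyhow itself so the sketch has no dependencies): a concrete error is erased into `Box<dyn Error>` and then recovered by downcasting, which is the same mechanism anyhow's `Context` builds on.

```rust
use std::error::Error;
use std::fmt;

// A concrete error type to erase and then recover.
#[derive(Debug, PartialEq)]
struct RootError(&'static str);

impl fmt::Display for RootError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "root error: {}", self.0)
    }
}

impl Error for RootError {}

fn main() {
    // Erase the concrete type, as happens when errors cross layer boundaries.
    let err: Box<dyn Error> = Box::new(RootError("db timeout"));
    // Downcast back to the original error; with anyhow, context layers can
    // be downcast the same way without a bespoke failure-chain type.
    let root = err.downcast::<RootError>().expect("should downcast");
    assert_eq!(*root, RootError("db timeout"));
}
```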
Reviewed By: StanislavGlebik
Differential Revision: D21384015
fbshipit-source-id: 1dc08b4b38edf8f9a2c69a1e1572d385c7063dbe
Summary: For some reason, direct connections don't work from devvm4263.prn3.facebook.com, complaining about an IP address mismatch. Rather than debug that, just use MyRouter.
Reviewed By: StanislavGlebik
Differential Revision: D21089203
fbshipit-source-id: 88489a6a3b83de885c7c6e9405b325b56b807e12
Summary: I'm going to want to be able to test against a single ephemeral shard, as well as production use against a real DB. Use the standard config to make that possible.
Reviewed By: ahornby
Differential Revision: D21048697
fbshipit-source-id: 644854e2c831a9410c782ca1fddc1c4b5f324d03
Summary:
Migrate the configuration of sql data managers from the old configuration using `sql_ext::SqlConstructors` to the new configuration using `sql_construct::SqlConstruct`.
In the old configuration, sharded filenodes were included in the configuration of remote databases, even when that made no sense:
```
[storage.db.remote]
db_address = "main_database"
sharded_filenodes = { shard_map = "sharded_database", shard_num = 100 }
[storage.blobstore.multiplexed]
queue_db = { remote = {
db_address = "queue_database",
sharded_filenodes = { shard_map = "valid_config_but_meaningless", shard_num = 100 }
} }
```
This change separates out:
* **DatabaseConfig**, which describes a single local or remote connection to a database, used in configuration like the queue database.
* **MetadataDatabaseConfig**, which describes the multiple databases used for repo metadata.
**MetadataDatabaseConfig** is either:
* **Local**, which is a local sqlite database, the same as for **DatabaseConfig**; or
* **Remote**, which contains:
* `primary`, the database used for main metadata.
* `filenodes`, the database used for filenodes, which may be sharded or unsharded.
More fields can be added to **RemoteMetadataDatabaseConfig** when we want to add new databases.
New configuration looks like:
```
[storage.metadata.remote]
primary = { db_address = "main_database" }
filenodes = { sharded = { shard_map = "sharded_database", shard_num = 100 } }
[storage.blobstore.multiplexed]
queue_db = { remote = { db_address = "queue_database" } }
```
The `sql_construct` crate facilitates this by providing the following traits:
* **SqlConstruct** defines the basic rules for construction, and allows construction based on a local sqlite database.
* **SqlShardedConstruct** defines the basic rules for construction based on sharded databases.
* **FbSqlConstruct** and **FbShardedSqlConstruct** allow construction based on unsharded and sharded remote databases on Facebook infra.
* **SqlConstructFromDatabaseConfig** allows construction based on the database defined in **DatabaseConfig**.
* **SqlConstructFromMetadataDatabaseConfig** allows construction based on the appropriate database defined in **MetadataDatabaseConfig**.
* **SqlShardableConstructFromMetadataDatabaseConfig** allows construction based on the appropriate shardable databases defined in **MetadataDatabaseConfig**.
Sql database managers should implement:
* **SqlConstruct** in order to define how to construct an unsharded instance from a single set of `SqlConnections`.
* **SqlShardedConstruct**, if they are shardable, in order to define how to construct a sharded instance.
* If the database is part of the repository metadata database config, either of:
* **SqlConstructFromMetadataDatabaseConfig** if they are not shardable. By default they will use the primary metadata database, but this can be overridden by implementing `remote_database_config`.
* **SqlShardableConstructFromMetadataDatabaseConfig** if they are shardable. They must implement `remote_database_config` to specify where to get the sharded or unsharded configuration from.
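The core trait split can be sketched as below. This is a simplified illustration of the `SqlConstruct` / `SqlShardedConstruct` relationship described above; `SqlConnections` here is a stand-in struct, and the `Filenodes` impl bodies are invented for the sketch.

```rust
// Stand-in for the real SqlConnections bundle.
struct SqlConnections {
    address: String,
}

// Basic rules for construction from a single set of connections.
trait SqlConstruct: Sized {
    const LABEL: &'static str;
    fn from_sql_connections(connections: SqlConnections) -> Self;
}

// Basic rules for construction from sharded databases.
trait SqlShardedConstruct: Sized {
    fn from_sharded_connections(shards: Vec<SqlConnections>) -> Self;
}

// A shardable store implements both traits.
struct Filenodes {
    shards: Vec<SqlConnections>,
}

impl SqlConstruct for Filenodes {
    const LABEL: &'static str = "filenodes";
    fn from_sql_connections(connections: SqlConnections) -> Self {
        Filenodes { shards: vec![connections] }
    }
}

impl SqlShardedConstruct for Filenodes {
    fn from_sharded_connections(shards: Vec<SqlConnections>) -> Self {
        Filenodes { shards }
    }
}

fn main() {
    let unsharded =
        Filenodes::from_sql_connections(SqlConnections { address: "main_database".into() });
    assert_eq!(unsharded.shards.len(), 1);

    let sharded = Filenodes::from_sharded_connections(
        (0..100).map(|i| SqlConnections { address: format!("shard{:03}", i) }).collect(),
    );
    assert_eq!(sharded.shards.len(), 100);
    assert_eq!(Filenodes::LABEL, "filenodes");
}
```

The Fb- and config-driven traits listed above then layer on top of these two, choosing which constructor to call based on `DatabaseConfig` or `MetadataDatabaseConfig`.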
Reviewed By: StanislavGlebik
Differential Revision: D20734883
fbshipit-source-id: bb2f4cb3806edad2bbd54a47558a164e3190c5d1
Summary: In the process, blobstore/factory/lib.rs was split into submodules; this made it easier to untangle the dependencies and refactor, so I've left the split in this diff.
Reviewed By: markbt
Differential Revision: D20302068
fbshipit-source-id: caa3a2b5487c30198c62f7e4f4e9cb7c488dc8de
Summary:
This shifts the responsibility of mocking all facebook-specific code
to mononoke's sql_ext crate. If OSS code calls into any of that code it will
most likely result in a panic.
Reviewed By: ahornby
Differential Revision: D20247580
fbshipit-source-id: 43f158d91aa32adaa5df6e3786243fb89c9ce961
Summary:
Right now, ContextConcurrencyBlobstore is instantiated in make_blobstore, which
makes it a lot more effective (3 times more effective, in fact) than we want it
to be, since a ticket is acquired by 3 blobstores in the chain in order to
complete a put:
- The multiplex
- The two underlying blobstores
This also has the potential to deadlock if all tickets are held by the
multiplex, which results in an eventual timeout after 600s of waiting in the
multiplex (this looks like it might be happening at least once or twice per
hour right now on the experimental tier).
In any case, the intention had always been to have one of those per repo, not
one per sub-blobstore, so let's do that. The more natural place to put this
seems to be the RepoBlobstore instantiation.
Since I anticipate I might not be the only one who gets tripped up by this at
some point, I also added a comment about this. I also updated the blobsync
tests to stop re-implementing `RepoBlobstoreArgs::new()` so that adding new
blobstores in RepoBlobstoreArgs will have minimal friction.
Reviewed By: HarveyHunt
Differential Revision: D20467346
fbshipit-source-id: a6ad2d8f04bff1c6fcaa151e947cb8af919eec07
Summary:
This adds a blobstore that can reach into a CoreContext in order to identify
the allowed level of concurrency for blobstore requests initiated by this
CoreContext. This will let us replay infinitepush bundles with limits on a
per-request basis.
Reviewed By: farnz
Differential Revision: D20038575
fbshipit-source-id: 07299701879b7ae65ad9b7ff6e991ceddf062b24
Summary: separate out the Facebook-specific pieces of the sql_ext crate
Reviewed By: ahornby
Differential Revision: D20218219
fbshipit-source-id: e933c7402b31fcd5c4af78d5e70adafd67e91ecd
Summary:
This updates Mononoke to use the new filenodes implementation introduced
earlier in this stack.
See the test plan for detailed performance results supporting why I'm making
this change.
Reviewed By: StanislavGlebik
Differential Revision: D19905394
fbshipit-source-id: 8370fd30c9cfd075c3527b9220e4cf4f604705ae
Summary:
Nearly all of the Mononoke SQL stores are instantiated once per repo, but they don't store the `RepositoryId` anywhere, so every method takes it as an argument. And because providing the repo_id on every call is not ergonomic, we tend to add methods to blob_repo that just call the right method with the right repo_id on one of the underlying stores (see `get_bonsai_from_globalrev` on blobrepo for example).
Because my reviewers [pushed back](https://our.intern.facebook.com/intern/diff/D19972871/?transaction_id=196961774880671&dest_fbid=1282141621983439) when I've tried to do the same for bonsai_git_mapping I've decided to make it right by adding the repo_id to the BonsaiGitMapping.
Reviewed By: krallin
Differential Revision: D20029485
fbshipit-source-id: 7585c3bf9cc8fa3cbe59ab1e87938f567c09278a
Summary:
This updates our multiplexed blobstore configuration to carry its own DB
config. The upshot of this change is that we can move the blobstore sync queue
(a fairly unruly table) to its own DB.
Another nice side effect of this is that it cleans up a bunch of other code, by
finally decoupling the blobstore config from the DB config. For examples,
places that need to instantiate a blobstore can now to do even without a DB
config (such as wireproto logging).
Obviously, this cannot land until we update the configs to include this. I'll
do so in Configerator prior to landing the diff.
Reviewed By: HarveyHunt
Differential Revision: D19973905
fbshipit-source-id: 79e4ff92cdb989aab4532decd3fe4fd6c55e2bb2
Summary:
This is the second (and last) step on removing RocksDB as a blobstore.
Check the task for more description.
Context for OSS:
> The issue with rocksblob (and to some extent sqlite) is that unless we
> introduce a blobstore tier/thift api (which is something I'm hoping to avoid
> for xdb blobstore) we'd have to combine all the mononoke function like hg,
> scs, LFS etc into one binary for it to have access to rocksdb, which would be
> quite a big difference to how we deploy internally
(Note: this ignores all push blocking failures!)
Reviewed By: farnz
Differential Revision: D20001261
fbshipit-source-id: c4b2b2a393b918d17680ad483aa1d77356f1d07c
Summary:
This commit manually synchronizes the internal move of
fbcode/scm/mononoke under fbcode/eden/mononoke which couldn't be
performed by ShipIt automatically.
Reviewed By: StanislavGlebik
Differential Revision: D19722832
fbshipit-source-id: 52fbc8bc42a8940b39872dfb8b00ce9c0f6b0800
Summary:
In order to uniquely identify a blobstore multiplexer configuration,
add an ID.
Reviewed By: krallin
Differential Revision: D19770058
fbshipit-source-id: 8e09d5531d1d27b337cf62a6126f88ce15de341b
Summary: This will let us lower Scuba utilization from Fastreplay.
Reviewed By: HarveyHunt
Differential Revision: D19766018
fbshipit-source-id: 4eac19b929914db910ed13096b2a5910c134ed3a