Summary:
When a client requests a blob that is redacted, we should tell them that
instead of returning a 500. This diff does that: we now return a `410 Gone`
when redacted content is accessed.
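A minimal sketch of the status mapping, assuming a made-up error type (the real LFS server error enum differs):
```rust
use http::StatusCode;

// Hypothetical error type, for illustration only.
enum FetchError {
    Redacted,
    Missing,
    Internal(String),
}

// Redacted content is a deliberate "gone", not a server bug.
fn status_for(err: &FetchError) -> StatusCode {
    match err {
        FetchError::Redacted => StatusCode::GONE,                     // 410
        FetchError::Missing => StatusCode::NOT_FOUND,                 // 404
        FetchError::Internal(_) => StatusCode::INTERNAL_SERVER_ERROR, // 500
    }
}
```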
Reviewed By: farnz
Differential Revision: D20897251
fbshipit-source-id: fc6bd75c82e0cc92a5dbd86e95805d0a1c8235fb
Summary:
If a blob is redacted, we shouldn't crash in batch. Instead, we should return
that the blob exists, and let the download path return to the client the
information that the blob is redacted. This diff does that.
Reviewed By: HarveyHunt
Differential Revision: D20897247
fbshipit-source-id: 3f305dfd9de4ac6a749a9eaedce101f594284d16
Summary:
502 made a bit of sense since we can occasionally proxy things to upstream, but
it's not very meaningful, because our inability to service a batch request is
never fully upstream's fault (it would not be a failure if we had everything
internally).
So, let's just return a 500, which makes more sense.
Reviewed By: farnz
Differential Revision: D20897250
fbshipit-source-id: 239c776d04d2235c95e0fc0c395550f9c67e1f6a
Summary:
I noticed this while doing some unrelated work on this code. Basically, if we
get an error from upstream, then we shouldn't return an error to the client
*unless* upstream being down means we are unable to satisfy their request
(meaning we are unable to say whether a particular piece of content is
definitely present or definitely missing).
This diff fixes that. Instead of checking for success when hearing from
upstream _then_ running our routing logic, let's instead only fail if, in the
course of trying to route the client, we discover that we need a URL from
upstream AND upstream has failed.
Concretely, this means that if upstream blew up but internal has all the data
we want, we ignore the fact that upstream is down. In practice, internal is
usually very fast (because it's typically all locally-cached), so this is
unlikely to actually occur in real life, but it's still a good idea to account
for this failure scenario.
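A minimal sketch of that routing rule, with made-up types standing in for the real batch-handler ones:
```rust
// Illustrative only: the real logic lives in the LFS server's batch handler.
enum Upstream<T> {
    Available(T),
    Failed(String),
}

fn route_object(
    internal_url: Option<String>,
    upstream: &Upstream<Option<String>>,
) -> Result<Option<String>, String> {
    // If internal can satisfy the request, upstream's health is irrelevant.
    if let Some(url) = internal_url {
        return Ok(Some(url));
    }
    // Only now does an upstream failure matter.
    match upstream {
        Upstream::Available(url) => Ok(url.clone()),
        Upstream::Failed(err) => Err(format!("upstream required but failed: {}", err)),
    }
}
```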
Reviewed By: HarveyHunt
Differential Revision: D20897252
fbshipit-source-id: f5a8598e8a9da382d0d7fa6ea6a61c2eee8ae44c
Summary:
Right now we have a couple of functions, but they're not easily composable. I'd
like to make the redacted blobs configurable when creating a test repo, but I
also don't want to add 2 new variants, so let's create a little builder for
test repos.
This should make it easier to extend in the future to add more customizability
to test repos, which should in turn make it easier to write unit tests :)
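A rough sketch of the builder shape I have in mind (field and method names here are illustrative, not the final API):
```rust
use std::collections::HashMap;

#[derive(Default)]
struct TestRepoBuilder {
    // blob key -> redaction task/reason
    redacted: Option<HashMap<String, String>>,
}

impl TestRepoBuilder {
    fn new() -> Self {
        Self::default()
    }

    fn redacted(mut self, redacted: HashMap<String, String>) -> Self {
        self.redacted = Some(redacted);
        self
    }

    fn build(self) -> TestRepo {
        TestRepo {
            redacted: self.redacted.unwrap_or_default(),
        }
    }
}

struct TestRepo {
    redacted: HashMap<String, String>,
}
```
A test would then write something like `TestRepoBuilder::new().redacted(map).build()` rather than needing one constructor variant per option.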
Reviewed By: HarveyHunt
Differential Revision: D20897253
fbshipit-source-id: 3cb9b52ffda80ccf5b9a328accb92132261616a1
Summary:
This asyncifies the internals of `subcommand_tail`, which
loops over a stream, by taking the operation performed in
the loop and making it an async function.
The resulting code saves a few heap allocations by reducing
clones, and is also *much* less indented, which helps with
readability.
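The pattern, as a hedged sketch (stand-in types, not the actual subcommand_tail code):
```rust
use anyhow::Result;
use futures::stream::{Stream, StreamExt};

// Stand-ins for the real entry type and per-entry work.
struct BookmarkEntry;

async fn process_one(_entry: BookmarkEntry) -> Result<()> {
    Ok(())
}

// Instead of chaining combinators that need cloned state, drive the stream
// with a plain loop calling an async fn, so per-iteration state can be borrowed.
async fn tail_loop<S>(mut entries: S) -> Result<()>
where
    S: Stream<Item = Result<BookmarkEntry>> + Unpin,
{
    while let Some(entry) = entries.next().await {
        process_one(entry?).await?;
    }
    Ok(())
}
```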
Reviewed By: krallin
Differential Revision: D20664511
fbshipit-source-id: 8e81a1507e37ad2cc59e616c739e19574252e72c
Summary: These hooks behave the same way in Mercurial and Bonsai form. Port them over to operating on the Bonsai form.
Reviewed By: krallin
Differential Revision: D20891165
fbshipit-source-id: cbcdf217398714642d2f2d6669376defe8b944d7
Summary: Running on Mercurial hooks isn't scalable long term - move the consumers of hooks to run on both forms for a transition period
Reviewed By: krallin
Differential Revision: D20879136
fbshipit-source-id: 4630cafaebbf6a26aa6ba92bd8d53794a1d1c058
Summary: To use Bonsai-based hooks, we need to be able to load them. Make it possible.
Reviewed By: krallin
Differential Revision: D20879135
fbshipit-source-id: 9b44d7ca83257c8fc30809b4b65ec27a8e9a8209
Summary: We want all hooks to run against the Bonsai form, not a Mercurial form. Create a second form of hooks (currently not used) which acts on Bonsai changesets. Later diffs in the stack will move us over to Bonsai only, and remove support for Mercurial-changeset-derived hooks.
Reviewed By: krallin
Differential Revision: D20604846
fbshipit-source-id: 61eece8bc4ec5dcc262059c19a434d5966a8d550
Summary:
Thanks to StanislavGlebik for this idea: we can turn the loop over
changesets to upload into straightforward imperative code, instead
of using `.and_then` + `.fold`, by taking the next chunk in a
while loop.
The resulting code is probably easier to understand (depends whether
you come from a functional background, I guess), and it's less indented,
which is definitely more readable.
Reviewed By: StanislavGlebik
Differential Revision: D20881862
fbshipit-source-id: 7ecf76a2fae3eb0e6c24a1ee14e0684b6334b087
Summary:
A couple of minor improvements, removing some overhead:
- We don't need to pass cloned structs to `derive_data_for_csids`,
refs work just fine
- We can strip out one of the boxing blocks by directly assigning
an `async` block to `globalrevs_work`
- We can't do the same for `synced_commit_mapping_work` because
we have to iterate over `chunk` in synchronous code, so that
`chunk` can later be consumed by the line defining `changesets`.
Reviewed By: StanislavGlebik
Differential Revision: D20863304
fbshipit-source-id: 14cad3324978a66bcf325b77df7803d77468d30b
Summary:
This wound up being a little tricky, because
`async move` blocks capture any data they use,
and most of the fields of the `Blobimport` struct
are values rather than refs.
The easiest solution that I came up with, which looks
a little weird but works better than anything else
I tried, is to just inject a little block of code
(which I commented so it will hopefully be clear to
future readers) taking refs of anything that we need
to use in an async block but also have available later.
In the process, we are able to strip out a layer of
clones, which should improve efficiency a bit.
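A simplified sketch of the trick (the `Blobimport` fields here are placeholders):
```rust
struct Blobimport {
    repo_name: String,
    changesets: Vec<u64>,
}

impl Blobimport {
    async fn import(&self) {
        // Take refs of anything we need in the async block but also want
        // available later, so the owned fields aren't moved into the block.
        let repo_name = &self.repo_name;
        let changesets = &self.changesets;

        let upload = async move {
            // Uses only the references captured above.
            println!("importing {} changesets into {}", changesets.len(), repo_name);
        };
        upload.await;

        // The owned fields are still usable here.
        println!("done with {}", self.repo_name);
    }
}
```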
Reviewed By: StanislavGlebik
Differential Revision: D20862358
fbshipit-source-id: 186bf9939b9496c432ff0d9a01e602da47f4b5d4
Summary: Some methods that were unused or barely used outside of the cmdlib crate were made non-public (parse_caching, CachelibSettings, init_cachelib_from_settings).
Reviewed By: krallin
Differential Revision: D20671251
fbshipit-source-id: 232e786fa5af5af543239aca939cb15ca2d6bc10
Summary:
"Old" is defined as being based on a commit that is more than 30 days old.
The build date is taken from the version string.
One observation is that if we fail to release for more than 30 days then all
users will start seeing this message without any way of turning it off. It
doesn't seem worthwhile to add a config for silencing it, though.
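A hedged sketch of the check, assuming the version string embeds a yyyymmdd build date and that the caller passes a cutoff date 30 days in the past in the same form (the actual format and the real hg-side code may differ):
```rust
// e.g. is_stale("4.4.2_20200407_123456_abcdef", 20200307) -> false (build is newer)
fn is_stale(version: &str, cutoff_yyyymmdd: u32) -> bool {
    version
        .split('_')
        .find(|part| part.len() == 8 && part.chars().all(|c| c.is_ascii_digit()))
        .and_then(|part| part.parse::<u32>().ok())
        // yyyymmdd integers compare in date order, so this is a date comparison.
        // A dev build without a parsable date never warns.
        .map_or(false, |build_date| build_date < cutoff_yyyymmdd)
}
```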
Reviewed By: quark-zju
Differential Revision: D20825399
fbshipit-source-id: f97518031bbda5e2c49226f3df634c5b80651c5b
Summary:
I decided to go with an integration test because backfilling derived data at the
moment requires two separate calls - a first one to prefetch changesets, and a
second one to actually run the backfill. So an integration test is better suited to
this case than unit tests.
While doing so I noticed that fetch_all_public_changesets actually won't fetch
all changesets - it loses the last commit because t_bs_cs_id_in_range was
returning an exclusive range (i.e. max_id was not included). I fixed the bug and made the name clearer.
Reviewed By: krallin
Differential Revision: D20891457
fbshipit-source-id: f6c115e3fcc280ada26a6a79e1997573f684f37d
Summary:
`log_v2` supports time filters, which means it needs to be able to drop the history stream once the commits get older than the given time frame (if not, it just traverses the whole history...).
However, this cannot be done from the SCS commit_path API or from changeset_path, because they already receive a history stream where commits are not ordered by creation time. And the naive solution "if the next commit in the stream is older than `after_ts` then drop" won't work: there might be another branch (a commit after the current one) which is still **in** the time frame.
I added a terminator function to `list_file_history` that is called on the changeset id for which a new fastlog batch is about to be fetched. If the terminator returns true, the fastlog is not fetched and the current history branch is dropped. All ready nodes are still streamed.
For example, if we have a history of the file changes like this:
```
A 03/03 ^|
| |
B 02/03 |
| | - one fastlog batch
C 01/03 |
| \ |
02/01 D E 10/02 _| - let's assume that fastlog batches for D's and E's ancestors need to be prefetched
| |
01/01 F G 05/02
```
# Example 1
We query "history from A after time 01/02"
The old version would fetch all the commits and then filter them in `commit_path`. We would fetch both fastlog batches for the D branch and E branch.
With the terminator, `list_file_history` will call terminator on commit D and get `true` in return and then will drop the D branch,
then it will call terminator on E and get `false` and proceed with fetching fastlog for the E branch.
# Example 2
We query "history from A after time 01/04"
The old version would fetch all the commits and then filter them in `commit_path`, despite the fact that
the very first commit is already older than needed.
With the terminator it will call terminator on A and get `true` and won't proceed any further.
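A sketch of the terminator shape (names and signatures are illustrative, not the real fastlog API):
```rust
use anyhow::Result;

struct ChangesetId(u64);

// In the real code this would look up the commit's creation time.
async fn changeset_timestamp(_cs_id: &ChangesetId) -> Result<i64> {
    Ok(0)
}

// Called right before a fastlog batch would be fetched for `cs_id`; returning
// true drops that history branch, while already-ready nodes are still streamed.
async fn older_than(cs_id: ChangesetId, after_ts: i64) -> Result<bool> {
    Ok(changeset_timestamp(&cs_id).await? < after_ts)
}
```
`commit_path` would then pass something like `|cs_id| older_than(cs_id, after_ts)` as the terminator whenever a time filter is set.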
Reviewed By: StanislavGlebik
Differential Revision: D20801029
fbshipit-source-id: e637dcfb6fddceb4a8cfc29d08b427413bf42e79
Summary: Asyncified the main functions of fastlog/ops, so it'll be easier to modify them and proceed with the new features.
Reviewed By: StanislavGlebik
Differential Revision: D20801028
fbshipit-source-id: 2a03eedca776c6e1048a72c7bd613a6ef38c5c17
Summary: We need to parse `directories` here. Let's do so.
Reviewed By: HarveyHunt
Differential Revision: D20869830
fbshipit-source-id: 74830aa0045b801fba089812447fb61d7d09ad14
Summary: As it says in the title!
Reviewed By: HarveyHunt
Differential Revision: D20869828
fbshipit-source-id: df7728ce548739ef2dadad1629817fb56c166b66
Summary:
We use the logged arguments directly for wireproto replay, and then we replay
this directly in traffic replay, but just joining a list with `,` doesn't
actually work for directories:
- We need trailing commas
- We need wireproto encoding
This does that. It also clarifies that this encoding is for debug purposes by
updating function names, and relaxes a bunch of types (since hgproto uses
bytes_old).
Reviewed By: StanislavGlebik
Differential Revision: D20868630
fbshipit-source-id: 3b805c83505aefecd639d4d2375e0aa9e3c73ab9
Summary: This is helpful to draw conclusions as to how fast it is.
Reviewed By: StanislavGlebik
Differential Revision: D20872108
fbshipit-source-id: d323358bbba29de310d6dfb4c605e72ce550a019
Summary:
`list_file_history` implements BFS on the commit graph and returns a stream of changeset ids using `bounded_traversal_stream`.
The old version iterated BFS "levels" and each iteration streamed all nodes on the current level. For example, for the commit graph:
```
1 <- start # 1 level
|
2 # 2
| \
3 4 # 3
| |
```
there would be 3 iterations, and on each one the nodes of that level would be yielded: [1], [2], [3, 4]. If fastlog batches needed to be prefetched, they were prefetched in parallel for the changesets on the same level.
The implementation was a bit hacky, and it is a bit unfortunate that we need to make 100 iterations to stream changesets that are ready and do not require fetching fastlog. I also needed some simplification so I could then add a terminator function (3rd diff in the stack) at the fastlog batch prefetching stage (and add Deleted Manifest integration).
So now `bounded_traversal_stream` keeps a BFS queue as its state, and on each iteration streams all nodes in the queue until it hits a node for which it needs to prefetch a fastlog batch, then goes to the next iteration.
```
state - [queue, prefetch_cs_id]
* on each iteration:
1. If prefetch_cs_id.is_some() =>
- fetch fastlog batch for prefetch_cs_id
- fill the commit graph
- add parents of the prefetch_cs_id to the bfs queue
2. Get from the queue all nodes until we meet changeset without fetched parents.
Mark this node as `prefetch_cs_id` for the next iteration.
3. Stream ready nodes and go to the next iteration.
```
Thus
- we still fetch fastlog batches on demand and not before we really need them
- if we have 100 commits in the queue that are ready to be yielded, we won't do 100 iterations, and will stream them in one go
- if we now need to prefetch fastlog batches for 2 branches on the same "bfs level", we will do it one by one and not in parallel, but this situation is pretty uncommon
- the code is simpler and allows us to integrate Deleted Manifest and add a terminator function.
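A plain-Rust sketch of the queue-plus-prefetch state (the real code expresses this as `bounded_traversal_stream` state; these types are stand-ins and visited-set bookkeeping is omitted):
```rust
use std::collections::{HashMap, VecDeque};

type CsId = u64;

struct Traversal {
    queue: VecDeque<CsId>,
    // Nodes whose parents are already known, i.e. their fastlog batch is fetched.
    parents: HashMap<CsId, Vec<CsId>>,
}

impl Traversal {
    // One "iteration": drain every ready node, stopping at the first node
    // whose parents still need a fastlog prefetch.
    fn next_chunk(&mut self) -> (Vec<CsId>, Option<CsId>) {
        let mut ready = Vec::new();
        while let Some(cs_id) = self.queue.pop_front() {
            match self.parents.get(&cs_id) {
                Some(parents) => {
                    self.queue.extend(parents.iter().copied());
                    ready.push(cs_id);
                }
                None => {
                    // Parents unknown: the caller fetches the fastlog batch for
                    // this node, fills `parents`, and resumes; put it back first.
                    self.queue.push_front(cs_id);
                    return (ready, Some(cs_id));
                }
            }
        }
        (ready, None)
    }
}
```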
Reviewed By: StanislavGlebik
Differential Revision: D20768401
fbshipit-source-id: cdba40539a842b3628826f6c72a29514da0d539e
Summary:
In the previous diff we asyncified the signature of Blobimport::import,
but the body remained an old-style future with a compat and await at
the end.
This diff asyncifies the outermost logic from within the function,
slightly improving readability and removing one layer of clones
to cut down on heap allocations. The derivation of `max_rev` still
currently uses old-style streams and futures.
Reviewed By: StanislavGlebik
Differential Revision: D20861230
fbshipit-source-id: 1b462f17581c764e77a0a0a163c86ffa894df742
Summary:
Switch the Blobimport struct to take a reference to a ctx,
and have `import` be an `async fn`.
Reviewed By: StanislavGlebik
Differential Revision: D20861165
fbshipit-source-id: eda9d599af2e525ec3142facc1eeb6b5b433ab06
Summary: Add default implementations for the sampling blobstore's put and is_present handlers to save some boilerplate.
Reviewed By: farnz
Differential Revision: D20868507
fbshipit-source-id: 40275cc832870019238c0635e097e53671b76783
Summary:
This way we'll never select more than (no_of_stores * limit) rows, rather than
a potentially unbounded number.
NOTE: This diff has to be landed and rolled out **after** D20557702 is rolled out. I'm assuming that after some time since D20557702 rollout all the rows in the production db will have proper `operation_key` value set so we can make queries based on them.
Reviewed By: krallin
Differential Revision: D20557700
fbshipit-source-id: 5a1d4b69949b425915214f5227c5c0dcce374360
Summary: So we're sure that all the queries work not only in sqlite.
Reviewed By: krallin
Differential Revision: D20839958
fbshipit-source-id: 9d05cc175d65396af7495b31f8c6958ac7bd8fb6
Summary:
When we have multiple entries with a single blob key, we always select all of them,
regardless of when and how they were added. That's why we need to select based on
operation_key.
Reviewed By: krallin
Differential Revision: D20557699
fbshipit-source-id: 77ccf992bb24d1a46ea28a13ab0780e6c92935ae
Summary: Log errors to scuba regardless of the error_as_data setting, as finding the logs is much easier from scuba than from stderr.
Reviewed By: farnz
Differential Revision: D20838462
fbshipit-source-id: b78e3a3213ed4aee4e4b2feb871ad7e42e25ed00
Summary:
Combined with the unbundle resolver stats, we will be able to say which
percentage of pushrebases fails, for example.
Reviewed By: StanislavGlebik
Differential Revision: D20818224
fbshipit-source-id: 70888b1cb90ffae8b11984bb024ec1db0e0542f7
Summary:
We need this to be able to monitor how frequently we get pushes vs
infinitepushes, etc. A further diff will add similar reporting to
`processing.rs`, so that we can compute a percentage of successful pushes to
all pushes, for example.
Reviewed By: StanislavGlebik
Differential Revision: D20818225
fbshipit-source-id: 7945dc285560d1357bdc6aef8e5fe50b61622254
Summary:
Our phases caching wasn't great. If you tried to ask for a draft commit then
we'd call the mark_reachable_as_public method, but this method was bypassing
caches.
The reason we had this problem was that we had caching at a higher level
than necessary - we had the SqlPhases struct, which was "smarter" (i.e. it has the
logic of traversing ancestors of public heads and marking these ancestors as
public), and SqlPhasesStore, which just did sql access. Previously we had our
caching layer on top of SqlPhases, meaning that when SqlPhases calls
`mark_reachable_as_public` it can't use caches anymore.
This diff fixes it by moving caching one layer lower - now we have a cache
right on top of SqlPhasesStore. Because of this change we no longer need
CachingPhases, and they were removed. Also the `ephemeral_derive` logic was
simplified a bit.
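Roughly, the layering change looks like this (names and fields approximate):
```rust
// Before: the cache wrapped SqlPhases, so SqlPhases' own traversal bypassed it.
// After: the cache sits directly on top of the SQL store.
struct SqlPhasesStore; // plain sql access

struct CachingPhasesStore {
    inner: SqlPhasesStore,
    // memcache / cachelib handles would live here
}

// The "smart" layer that walks ancestors of public heads.
struct SqlPhases {
    store: CachingPhasesStore,
}

impl SqlPhases {
    fn mark_reachable_as_public(&self) {
        // Every read/write goes through `self.store`, so the traversal now
        // hits the cache instead of bypassing it.
    }
}
```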
Reviewed By: krallin
Differential Revision: D20834740
fbshipit-source-id: 908b7e17d6588ce85771dedf51fcddcd2fabf00e
Summary:
Very small refactoring to store a MemcacheHandler (i.e. an enum which can either
be a real Memcache client or a mock) instead of a memcache client.
It will be used in the next diff to create mock caches.
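The shape of the enum, roughly (the real variants and client type may differ):
```rust
use std::collections::HashMap;

// Stand-in for the actual memcache client type.
struct MemcacheClient;

impl MemcacheClient {
    fn get(&self, _key: &str) -> Option<Vec<u8>> {
        None
    }
}

enum MemcacheHandler {
    Real(MemcacheClient),
    // An in-memory map is enough to mock caching behaviour in unit tests.
    Mock(HashMap<String, Vec<u8>>),
}

impl MemcacheHandler {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        match self {
            MemcacheHandler::Real(client) => client.get(key),
            MemcacheHandler::Mock(map) => map.get(key).cloned(),
        }
    }
}
```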
Reviewed By: krallin
Differential Revision: D20834916
fbshipit-source-id: cb1e3e8f0ae0e2c0f7018d3a003ada56725c65c6
Summary: SelectPhases does the same thing - no need to keep two queries
Reviewed By: krallin
Differential Revision: D20817379
fbshipit-source-id: 8cc56ea4a94e81f110a286899a8f5c596566a142
Summary: I'm going to refactor it soon, for now just move it to another file.
Reviewed By: krallin
Differential Revision: D20817293
fbshipit-source-id: 6fb44b4be858ebbd0e8c9dfee160b91806f78285
Summary:
This diff turns off the support_old_nightly feature of async-trait (https://github.com/dtolnay/async-trait/blob/0.1.24/Cargo.toml#L28-L32) everywhere in fbcode. I am getting ready to remove the feature upstream. It was an alternative implementation of async-trait that produces worse error messages but supports some older toolchains dating back to before stabilization of async/await that the default implementation does not support.
This diff includes updating async-trait from 0.1.24 to 0.1.29 to pull in fixes for some patterns that used to work in the support_old_nightly implementation but not the default implementation.
Differential Revision: D20805832
fbshipit-source-id: cd34ce55b419b5408f4f7efb4377c777209e4a6d
Summary:
Add a fingerprint method that returns a subset of the hash.
This will allow us to see compression benefit, or write out a corpus, sampling 1 in N of a group of keys.
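As a sketch of the idea (not the actual hash type or method name):
```rust
// Take the leading bytes of the (already uniformly distributed) content hash
// as a fingerprint, then keep roughly 1 in `sample_rate` keys.
fn sampling_fingerprint(hash_bytes: &[u8]) -> u64 {
    hash_bytes
        .iter()
        .take(8)
        .fold(0u64, |acc, &b| (acc << 8) | u64::from(b))
}

fn sample_this_key(hash_bytes: &[u8], sample_rate: u64) -> bool {
    sample_rate != 0 && sampling_fingerprint(hash_bytes) % sample_rate == 0
}
```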
Reviewed By: krallin
Differential Revision: D20541312
fbshipit-source-id: 93bd44ba9c14285daf50d8cd18eeb4b6dcc38d82
Summary:
Use the new sampling blobstore and sampling key in the existing compression-benefit subcommand and check the new vs old reported sizes.
The overall idea for these changes is that the walker uses a CoreContext tagged with a SamplingKey to correlate walker steps for a node to the underlying blobstore reads; this allows us to track the overall byte size (used in scrub stats) or the data itself (used in compression benefit) per node type.
The SamplingVisitor and NodeSamplingHandler cooperate to gather the sampled data into the maps in NodeSamplingHandler, which the output stream from the walk then operates on, e.g. to compress the blobs and report on compression benefit.
The main new logic sits in sampling.rs; it is used from sizing.rs (and later in the stack from scrub.rs).
Reviewed By: krallin
Differential Revision: D20534841
fbshipit-source-id: b20e10fcefa5c83559bdb15b86afba209c63119a
Summary: Now that everything is using `sql_construct`, we can remove the old `SqlConstructors` trait.
Reviewed By: StanislavGlebik
Differential Revision: D20734881
fbshipit-source-id: af46b41d17b40f6eb0839cdb9e85b00067360fe9
Summary:
Migrate the configuration of sql data managers from the old configuration using `sql_ext::SqlConstructors` to the new configuration using `sql_construct::SqlConstruct`.
In the old configuration, sharded filenodes were included in the configuration of remote databases, even when that made no sense:
```
[storage.db.remote]
db_address = "main_database"
sharded_filenodes = { shard_map = "sharded_database", shard_num = 100 }
[storage.blobstore.multiplexed]
queue_db = { remote = {
db_address = "queue_database",
sharded_filenodes = { shard_map = "valid_config_but_meaningless", shard_num = 100 }
}
```
This change separates out:
* **DatabaseConfig**, which describes a single local or remote connection to a database, used in configuration like the queue database.
* **MetadataDatabaseConfig**, which describes the multiple databases used for repo metadata.
**MetadataDatabaseConfig** is either:
* **Local**, which is a local sqlite database, the same as for **DatabaseConfig**; or
* **Remote**, which contains:
* `primary`, the database used for main metadata.
* `filenodes`, the database used for filenodes, which may be sharded or unsharded.
More fields can be added to **RemoteMetadataDatabaseConfig** when we want to add new databases.
New configuration looks like:
```
[storage.metadata.remote]
primary = { db_address = "main_database" }
filenodes = { sharded = { shard_map = "sharded_database", shard_num = 100 } }
[storage.blobstore.multiplexed]
queue_db = { remote = { db_address = "queue_database" } }
```
The `sql_construct` crate facilitates this by providing the following traits:
* **SqlConstruct** defines the basic rules for construction, and allows construction based on a local sqlite database.
* **SqlShardedConstruct** defines the basic rules for construction based on sharded databases.
* **FbSqlConstruct** and **FbShardedSqlConstruct** allow construction based on unsharded and sharded remote databases on Facebook infra.
* **SqlConstructFromDatabaseConfig** allows construction based on the database defined in **DatabaseConfig**.
* **SqlConstructFromMetadataDatabaseConfig** allows construction based on the appropriate database defined in **MetadataDatabaseConfig**.
* **SqlShardableConstructFromMetadataDatabaseConfig** allows construction based on the appropriate shardable databases defined in **MetadataDatabaseConfig**.
Sql database managers should implement:
* **SqlConstruct** in order to define how to construct an unsharded instance from a single set of `SqlConnections`.
* **SqlShardedConstruct**, if they are shardable, in order to define how to construct a sharded instance.
* If the database is part of the repository metadata database config, either of:
* **SqlConstructFromMetadataDatabaseConfig** if they are not shardable. By default they will use the primary metadata database, but this can be overridden by implementing `remote_database_config`.
* **SqlShardableConstructFromMetadataDatabaseConfig** if they are shardable. They must implement `remote_database_config` to specify where to get the sharded or unsharded configuration from.
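For the unsharded case, an implementation might look roughly like this (the trait items shown are assumptions based on the description above, not the verbatim `sql_construct` definitions):
```rust
pub struct SqlConnections; // stand-in for the real connection bundle

pub trait SqlConstruct: Sized {
    const LABEL: &'static str;
    const CREATION_QUERY: &'static str;

    fn from_sql_connections(connections: SqlConnections) -> Self;
}

pub struct SqlMyStore {
    connections: SqlConnections,
}

impl SqlConstruct for SqlMyStore {
    const LABEL: &'static str = "my_store";
    const CREATION_QUERY: &'static str =
        "CREATE TABLE IF NOT EXISTS my_store (id INTEGER PRIMARY KEY)";

    fn from_sql_connections(connections: SqlConnections) -> Self {
        Self { connections }
    }
}
```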
Reviewed By: StanislavGlebik
Differential Revision: D20734883
fbshipit-source-id: bb2f4cb3806edad2bbd54a47558a164e3190c5d1
Summary:
Refactor `sql_ext::SqlConstructors` and its related traits into a separate
crate. The new `SqlConstruct` trait is joined by `SqlShardedConstruct` which
allows construction of sharded databases.
The new crate will support a new configuration model where the distinction
between the database configuration for different repository metadata types
will be made clear.
Reviewed By: StanislavGlebik
Differential Revision: D20734882
fbshipit-source-id: b44cf9d1efd014c29df88e2ad933025e440119dc
Summary:
The new API does nothing that cloud sync does not want: bookmarks, obsmarkers,
prefetch, etc. Wrappers to disable features are removed.
This solves a "lagged master" issue where selectivepull adds `-B master` to
pull extra commits but cloud sync cannot hide them without narrow-heads. Now
cloud sync just does not pull the extra commits.
Reviewed By: sfilipco
Differential Revision: D20808884
fbshipit-source-id: 0e60d96f6bbb9d4ce02c04e8851fc6bda442c764
Summary:
For the initial rollout of lfs on fbsource we want to roll it out just for our
team using the rollout_smc_tier option. This diff adds support for that in
Mononoke.
It spawns a future that periodically updates the list of enabled hosts in the smc tier.
I had a slight concern about listing all the available services and storing
them in memory - what if the smc tier has too many services? I decided to go ahead
with that because
1) The [Smc antipatterns](https://fburl.com/wiki/ox43ni3a) wiki page doesn't seem
to list it as a concern.
2) We are unlikely to use it for a large tier - most likely we'll use it just for
hg-dev, which contains < 100 hosts.
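The refresh task is roughly this shape (`list_tier_hosts` stands in for whatever SMC lookup the real code uses, and the interval is made up):
```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};
use std::time::Duration;

async fn list_tier_hosts(_tier: &str) -> HashSet<String> {
    // Placeholder for the real SMC query.
    HashSet::new()
}

fn spawn_rollout_refresher(tier: String) -> Arc<RwLock<HashSet<String>>> {
    let hosts = Arc::new(RwLock::new(HashSet::new()));
    let shared = hosts.clone();
    tokio::spawn(async move {
        loop {
            let latest = list_tier_hosts(&tier).await;
            *shared.write().expect("lock poisoned") = latest;
            tokio::time::sleep(Duration::from_secs(60)).await;
        }
    });
    hosts
}

// The serving path then only does a membership check:
// hosts.read().unwrap().contains(client_hostname)
```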
Reviewed By: krallin
Differential Revision: D20789751
fbshipit-source-id: d35323e49530df6983e159e2ed5bce205cc5666d
Summary:
This is a followup from D20766465. In D20766465 we avoided re-traversing
fsnodes for all entries except for copy/move sources. This diff makes copy/move
source fetching more efficient as well.
It does so by sending a find_entries() request to prefetch all the fsnodes.
Reviewed By: mitrandir77
Differential Revision: D20770182
fbshipit-source-id: 7e4a68a2ded20b2895ee4d1c4f8fd897dbe1c850
Summary:
We are currently having problems with streaming clone:
```
$ hg --config 'extensions.fsmonitor=!' clone --shallow -U --config 'ui.ssh=ssh -oControlMaster=no' --configfile /etc/mercurial/repo-specific/fbsource.rc 'ssh://hg.vip.facebook.com//data/scm/fbsource?force_mononoke' "$(pwd)/fbsource-clone-test"
remote: server: https://fburl.com/mononoke
remote: session: vJ3qkiQIm9FT7mCp
connected to twshared11499.02.cln2.facebook.com
streaming all changes
2 files to transfer, 5.42 GB of data
abort: unexpected response from remote server:
'\x00\x01B?AB\x00\x00\x00\x00\x02U\x00\x00\x02\xc7\x00b\xf0\xd5\x00b\xf0\xd5\x00b\xf0\xd4\xff\xff\xff\xff\xa8z\xc7W\xd0&\xab\xb2\xf1{\xbfq\xac<\xaf6W\x06q\x81\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01B?C\x97\x00\x00\x00\x00\x053\x00\x00\x06\xce\x00b\xf0\xd6\x00b\xf0\xd6\x00b\xf0\xd5\xff\xff\xff\xff\xa3I\x19+\xe2\x0f\xae\xd2\x95\x14\x8a\xde\x19\x18\xf0\x8cUQu\xf1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01B?H\xca\x00\x00\x00\x00\x02\xe4\x00\x00\x03\x9e\x00b\xf0\xd7\x00b\xf0\xd7\x00b\xf0\xd6\xff\xff\xff\xffx\xd6}\x12nt\xb9\xbc(\x83\xfb\xfa\xcc\xc1o?\xde\xcc\x06L\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01B?K\xae\x00\x00\x00\x00\x02j\x00\x00\x02\xb5\x00b\xf0\xd8\x00b\xf0\xd8\x00b\xf0\xd7\xff\xff\xff\xff\x04"\xfcw6\'M\xba\xf1f\xdb\x02\xbeE\x93:\xc8\x17\x88P\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01B?N\x18\x00\x00\x00\x00\x03\xbb\x00\x00\x04\xb8\x00b\xf0\xd9\x00b\xf0\xd9\x00b\xf0\xd8\xff\xff\xff\xff\xb9\x15*p/\xa4*\x00\x9dZw\x01B\x87L\x8f\x08\x11\x89\xe0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0000changelog.d\x005406413267\n'
```
As a result of the debugging, it turned out that we are sending more data than expected. This diff adds a better error, so that next time we'll know if we have any corruption of the `streaming_changelog_chunks` table.
Reviewed By: StanislavGlebik
Differential Revision: D20763738
fbshipit-source-id: 6f6fa9f9a29909e044d9ba42fe84916ddcb62e8f
Summary: Each benchmark takes about 3 minutes to run. We've already got 16 benchmarks, and we're going to grow. Allow limiting the number of benchmarks we run at once.
Reviewed By: ahornby
Differential Revision: D20735795
fbshipit-source-id: 241184085b35da8ab85314fef1c6a08404bdb769
Summary: We're going to be doing a variety of changes to sqlblob - let's enable working against a known baseline each time, instead of incremental changes.
Reviewed By: ahornby
Differential Revision: D20735796
fbshipit-source-id: 86f15dac1f004b2f3c83ced829a65f3f6e111d6b
Summary:
We want to be able to benchmark blobstore stacks so that we can compare them, and ensure that we're not regressing as we improve SQLblob to production state.
Use Criterion to benchmark a few basic workflows - starting with writes, but reads can be added later.
Reviewed By: ahornby
Differential Revision: D20720789
fbshipit-source-id: e8b10664a9d08a1aa7e646e1ebde251bec0db991
Summary: Use the client identity middleware from gotham_ext in the EdenAPI server. This middleware parses validated client identities from HTTP headers inserted by Proxygen; these identities can then be used to enforce repo ACLs.
Reviewed By: HarveyHunt
Differential Revision: D20744887
fbshipit-source-id: 651e171d1b20448b3e99bfc938d118fb6dddea91
Summary: Looks like it's not dead anymore
Reviewed By: krallin
Differential Revision: D20766497
fbshipit-source-id: c49ae3b6c8a660b33e61e65adda94f78addd1498
Summary:
It is preferable to use the higher-level API of cached_config instead of ConfigeratorAPI whenever possible since the higher-level API supports OSS builds.
For `ConfigStore` let `poll_interval` be None so that for one-off reading of configs the ConfigStore doesn't needlessly spawn an updating thread.
Also, this update is in compliance with the discussion in D19026190.
Reviewed By: ahornby
Differential Revision: D20670224
fbshipit-source-id: 24fc124d440fd458a9fa88a906fc3a1cfdbd827e
Summary: In the process the blobstore/factory/lib.rs was split into submodules - this way it was easier to untangle the dependencies and refactor it, so I've left the split in this diff.
Reviewed By: markbt
Differential Revision: D20302068
fbshipit-source-id: caa3a2b5487c30198c62f7e4f4e9cb7c488dc8de
Summary:
As suggested in D20680173, we can reduce the overall need to copy things by
storing refs in the resolver.
Reviewed By: krallin
Differential Revision: D20696588
fbshipit-source-id: 9456e2e208cfef6faed57fc52ca59fafdccfc68c
Summary:
See bottom diff of this stack for overview.
This diff in particular asyncifies the `upload_changeset` fn. Apart from that,
it also makes sure it can accept `&RevlogChangeset` instead of
`RevlogChangeset`, which helps us to get rid of cloning.
Reviewed By: krallin
Differential Revision: D20693932
fbshipit-source-id: b0e5e1604cbfb6f6b6e269c85a79208115325734
Summary: Same as the bottom diff of this stack, but for another file.
Reviewed By: krallin
Differential Revision: D20693934
fbshipit-source-id: 4c2d12bf9d9ab272898a7830ece6d9f563adb8fb
Summary:
This diff focuses on the following:
- replaces clones with references, both when this decreases the total number of
clones and when it leaves the only clone on the boundary with the
compat code. Thus, when those boundaries are pushed further, we only need to fix
one place in the resolver
- removes a weird wrapping of a closure into an `Arc` and just calls
`upload_changesets` directly instead
- in cases when `BundleResolver` methods take `ctx` as an argument, removes it
and makes those methods use the one stored in the struct
Reviewed By: StanislavGlebik
Differential Revision: D20680173
fbshipit-source-id: c397c4ade57a07cbbc9206fa8a44f4225426778c
Summary:
This bitrotted through two different changes:
- D19473960 put it on a v2 runtime, but the I/O is v1 so that doesn't work (it
panics).
- The clap update a couple months ago made duplicate arguments illegal, and a
month before that we had put `debug` in the logger args (arguably where it
belongs), so this binary was now setting `debug` twice, which would now panic.
Evidently, static typing alone wasn't quite enough to keep this working through
time (though that's perhaps because both of those changes were
invisible to the type system), so I also added a smoke test for this.
Reviewed By: farnz
Differential Revision: D20618785
fbshipit-source-id: a1bf33783885c1bb2fe99d3746d1b73853bcdf38
Summary:
As the name indicates, this updates unbundle_replay to run hooks. Hook failures
don't block the replay, but they're logged to Scuba.
Differential Revision: D20693851
fbshipit-source-id: 4357bb0d6869a658026dbc5421a694bc4b39816f
Summary:
Setting up a derived data tailer for this is a better approach (see D20668301
for context).
Reviewed By: StanislavGlebik
Differential Revision: D20693270
fbshipit-source-id: 7a06ffe059c41c4e100f8b0f8837978717293829
Summary:
Since we do those concurrently, it makes sense to do them on their own task.
Besides, since those are still old futures that need ownership, there is
effectively no tradeoff here.
Differential Revision: D20691373
fbshipit-source-id: 1a45e43ec857d91bed1614568b4354d56a2b0848
Summary: This will make it easier to compare performance.
Differential Revision: D20674164
fbshipit-source-id: eb1a037b0b060c373c1e87635f52dd228f728c89
Summary: This adds some Scuba reporting to unbundle_replay.
Differential Revision: D20674162
fbshipit-source-id: 59e12de90f5fca8a7c341478048e68a53ff0cdc1
Summary:
This updates unbundle_replay to do things concurrently where possible.
Concretely, this means we do ingest unbundles concurrently, and filenodes
derivation concurrently, and only do the actual pushrebase sequentially. This
lets us get ahead on work wherever we can, and makes the process faster.
Doing unbundles concurrently isn't actually guaranteed to succeed, since it's
*possible* that an unbundle coming in immediately after a pushrebase actually
depends on the commits created in said pushrebase. In this case, we simply retry
the unbundle when we're ready to proceed with the pushrebase (in the code, this
is the `Deferred` variant). This is fine from a performance perspective.
As part of this, I've also moved the loading of the bundle to processing, as
opposed to the hg recording client (the motivation for this is that we want to
do this loading in parallel as well).
This will also let us run hooks in parallel once I add this in.
Reviewed By: StanislavGlebik
Differential Revision: D20668301
fbshipit-source-id: fe2c62ca543f29254b4c5a3e138538e8a3647daa
Summary:
pushrebase_errmsg is NULL when we have conflicts, but we still shouldn't replay
the entry (because it'll fail, with conflicts). Let's exclude those.
Reviewed By: StanislavGlebik
Differential Revision: D20668304
fbshipit-source-id: a058bb466e0a8a53ec81e41db7ba138d6aedf3f9
Summary:
This updates unbundle_replay to support sleeping when watching for updates in a
bookmark and said bookmark isn't moving. This will be useful so it can run as a
service.
Reviewed By: StanislavGlebik
Differential Revision: D20645157
fbshipit-source-id: 6edeb66b65b2ef8b88c8db5e664982756acbfaf1
Summary:
I accidentally forgot to insert the entry, so that made this test a bit
useless. Let's make it not useless.
Reviewed By: StanislavGlebik
Differential Revision: D20645158
fbshipit-source-id: 0f0eb0cf9d16e8c346897088891aa3277b4d9c07
Summary:
This adds support for replaying the updates to a bookmark through unbundle
replay. The goal is to be able to run this as a process that keeps a bookmark
continuously updated.
There is still a bit of work here, since we don't yet allow the stream to pause
until a bookmark update becomes available (i.e. once caught up, it will exit).
I'll introduce this in another diff.
Note that this is only guaranteed to work if there is a single bookmark in the
repo. With more, it could fail if a commit is first introduced in a bookmark that
isn't the one being replayed here, and later gets introduced in said bookmark.
Reviewed By: StanislavGlebik
Differential Revision: D20645159
fbshipit-source-id: 0aa11195079fa6ac4553b0c1acc8aef610824747
Summary:
I'm going to update this to run in a loop, so to do that it would be nice to
represent the things to replay as a stream. This does that change, but for now
all our streams have just one element.
Reviewed By: StanislavGlebik
Differential Revision: D20645156
fbshipit-source-id: fce7536d0ccbc1911335704816b71c17e80f2116
Summary:
We normally derive those lazily when accepting pushrebase, but we do derive
them eagerly in blobimport. For now, let's be consistent with blobimport.
This ensures that we don't lazily generate them, which would require read traffic,
and gives a picture a little more consistent with what an actual push would look like.
Reviewed By: ikostia
Differential Revision: D20623966
fbshipit-source-id: 2209877e9f07126b7b40561abf3e6067f7a613e6
Summary:
This makes it easier to realize if you used the wrong entry ID when replaying
(instead of telling you the bookmark isn't at `None` as expected, it tells you
the Hg Changeset could not be mapped to a Bonsai).
Reviewed By: ikostia
Differential Revision: D20623847
fbshipit-source-id: aaa66e7825f12373742efd4f779ae20ff21f0b46
Summary:
This updates unbundle_replay to account for pushrebase hooks, notably to assign
globalrevs.
To do so, I've extracted the creation of pushrebase hooks in repo_client and
reused it in unbundle_replay. I also had to update unbundle_replay to no longer
use `args::get_repo` since that doesn't give us access to the config (which we
need to know what pushrebase hooks to enable).
Reviewed By: ikostia
Differential Revision: D20622723
fbshipit-source-id: c74068c920822ac9d25e86289a28eeb0568768fc
Summary:
This adds a unbundle_replay Rust binary. Conceptually, this is similar to the
old unbundle replay Python script we used to have, but there are a few
important differences:
- It runs fully in-process, as opposed to pushing to a Mononoke host.
- It will validate that the pushrebase being produced is consistent with what
is expected before moving the bookmark.
- It can find sources to replay from the bookmarks update log (which is
convenient for testing).
Basically, this is to writes and to the old unbundle replay mechanism what
Fastreplay is to reads and to the traffic replay script.
There is still a bit of work to do here, notably:
- Make it possible to run this in a loop to ingest updates iteratively.
- Run hooks.
- Log to Scuba!
- Add the necessary hooks (notably globalrevs)
- Set up pushrebase flags.
I would also like to see if we can disable the presence cache here, which would
let us also use this as a framework for benchmarking work on push performance,
if / when we need that.
Reviewed By: StanislavGlebik
Differential Revision: D20603306
fbshipit-source-id: 187c228832fc81bdd30f3288021bba12f5aca69c
Summary: I'd like to get the timestamps here without needing to clone them.
Reviewed By: StanislavGlebik
Differential Revision: D20603308
fbshipit-source-id: 2d8f72b4fb3a3eed33b58dc2f0fb1a857bb3f5b9
Summary:
This updates pushrebase hooks to allow into_transaction_hook to be async (the
reason I hadn't made it async is because it hadn't been needed yet).
Currently, this is a no-op, but I'm going to use this later in this stack.
Reviewed By: StanislavGlebik
Differential Revision: D20603307
fbshipit-source-id: 79651184dbe08322c4cab03d7119a31036391852
Summary:
A few of our tasks failed on startup, and most likely it was during warmup,
though we are not sure (see attached task).
Let's add more logging.
Reviewed By: farnz
Differential Revision: D20698273
fbshipit-source-id: 4facd21a94d2917103e417a014b820c893da4718
Summary:
The IdDag provides graph algorithms using Segments.
The IdMap allows converting from the SegmentedChangelogId domain to the
ChangesetId domain.
The Dag struct wraps IdDag and IdMap in order to provide graph algorithms using
the common application level identifiers for commits (ChangesetId).
The construction of the Dag is currently mocked with something that can only be
used in a test environment (unit tests but also integration tests).
This diff also implements a location_to_name function. This is the most
important new functionality that segmented changelog clients require. It
recovers the hash of a commit for which the client only has a segmented
changelog Id. The current assumption is that clients have identifiers for all
merge commit parents, so the path to a known commit always follows a chain
of first parents.
The IdMap queries will have to be changed to async in the future, but we expect
IdDag queries to stay sync.
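Conceptually, location_to_name answers "which commit is `distance` first-parent steps behind this known descendant". A simplified sketch (stand-in types; the real code walks IdDag segments rather than one edge at a time):
```rust
use anyhow::{anyhow, Result};

#[derive(Clone, Copy, Debug)]
struct ChangesetId(u64);

// Stand-in for an IdMap/IdDag-backed first-parent lookup.
fn first_parent(_cs_id: ChangesetId) -> Option<ChangesetId> {
    None
}

fn location_to_name(known_descendant: ChangesetId, distance: u64) -> Result<ChangesetId> {
    let mut current = known_descendant;
    for _ in 0..distance {
        current = first_parent(current)
            .ok_or_else(|| anyhow!("no first parent walking from {:?}", known_descendant))?;
    }
    Ok(current)
}
```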
Reviewed By: quark-zju
Differential Revision: D20635577
fbshipit-source-id: 4f9bd8dd4a5bd9b0de55f51086f3434ff507963c