mononoke: add --no-upload-if-less-than-chunks option to streaming_changelog

Summary:
We usually had the "--skip-last-chunk" option set, but it turned out to be a
dangerous option to use. Normally a streaming changelog has only a single head,
but with --skip-last-chunk there might be multiple heads in the repo if the
skipped chunk contained merge commits.
This code (https://fburl.com/code/0fkgmnaj) uses the "tip" commit to do a pull
right after the streaming clone. The problem is that "tip" may be just one of
the heads we have, and it can be a commit with a very low generation number; in
that case the subsequent pull would try to pull too many commits at once.
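A commit's generation number is roughly its longest distance from a root, so when "tip" resolves to a low-generation head, the pull has to cover a large slice of the DAG. A toy sketch (hypothetical helper, not the Mononoke implementation) illustrates the failure mode:

```rust
use std::collections::HashMap;

// Toy illustration (not Mononoke code): the generation number of a commit is
// 1 + the maximum generation among its parents, with roots at generation 1.
fn generation(dag: &HashMap<&str, Vec<&str>>, commit: &str) -> usize {
    match dag.get(commit) {
        Some(parents) if !parents.is_empty() => {
            1 + parents.iter().map(|&p| generation(dag, p)).max().unwrap()
        }
        _ => 1, // root commit (no recorded parents)
    }
}

fn main() {
    // Linear main branch a -> b -> c -> d, plus a head "side" branched off "a".
    let mut dag: HashMap<&str, Vec<&str>> = HashMap::new();
    dag.insert("b", vec!["a"]);
    dag.insert("c", vec!["b"]);
    dag.insert("d", vec!["c"]);
    dag.insert("side", vec!["a"]);
    // If "tip" resolves to the low-generation head "side" instead of "d",
    // a pull from it has to cover most of the repo's history.
    assert_eq!(generation(&dag, "d"), 4);
    assert_eq!(generation(&dag, "side"), 2);
}
```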

To fix the issue, this diff adds the --no-upload-if-less-than-chunks option,
which skips uploading entirely when we have too few chunks. On the one hand
this lets us avoid creating too many small chunks; on the other it fixes the
issue described above.
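The guard itself is a simple early exit; here is a minimal standalone sketch (hypothetical function name, not the diff's actual code) of the decision:

```rust
// Minimal sketch of the new guard (hypothetical helper, not Mononoke code):
// when the option is set and fewer chunks than the threshold exist, skip the
// upload entirely instead of writing a handful of tiny chunks.
fn should_upload(num_chunks: usize, no_upload_if_less_than: Option<usize>) -> bool {
    match no_upload_if_less_than {
        Some(threshold) => num_chunks >= threshold,
        None => true, // option unset: always upload
    }
}

fn main() {
    assert!(!should_upload(1, Some(2))); // too few chunks: exit early
    assert!(should_upload(3, Some(2)));  // enough chunks: proceed with upload
    assert!(should_upload(0, None));     // behavior unchanged without the flag
}
```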

Note that this option means we will create more chunks than we otherwise
would. For example, if --no-upload-if-less-than-chunks is 2, then on average
we will create 2x more chunks.

Reviewed By: Croohand

Differential Revision: D31957177

fbshipit-source-id: 8fa9476e4236d962d5f4f0fae96ce6db996a22dc
Stanislau Hlebik 2021-10-28 00:06:14 -07:00 committed by Facebook GitHub Bot
parent d546bee255
commit 7e25291af1
2 changed files with 38 additions and 0 deletions


@@ -33,6 +33,7 @@ pub const MAX_DATA_CHUNK_SIZE: &str = "max-data-chunk-size";
 pub const SKIP_LAST_CHUNK_ARG: &str = "skip-last-chunk";
 pub const STREAMING_CLONE: &str = "streaming-clone";
 pub const TAG_ARG: &str = "tag";
+pub const NO_UPLOAD_IF_LESS_THAN_CHUNKS_ARG: &str = "no-upload-if-less-than-chunks";
 pub const UPDATE_SUB_CMD: &str = "update";
 pub async fn streaming_clone<'a>(
@@ -133,6 +134,20 @@ async fn update_streaming_changelog(
         skip_last_chunk,
     )?;
+    let no_upload_if_less_than_chunks: Option<usize> =
+        args::get_and_parse_opt(sub_m, NO_UPLOAD_IF_LESS_THAN_CHUNKS_ARG);
+    if let Some(at_least_chunks) = no_upload_if_less_than_chunks {
+        if chunks.len() < at_least_chunks {
+            info!(
+                ctx.logger(),
+                "has too few chunks to upload - {}. Exiting",
+                chunks.len()
+            );
+            return Ok(0);
+        }
+    }
     info!(ctx.logger(), "about to upload {} entries", chunks.len());
     let chunks = upload_chunks_blobstore(&ctx, &repo, &chunks, &idx, &data).await?;
@@ -479,6 +494,14 @@ fn add_common_args<'a, 'b>(sub_cmd: App<'a, 'b>) -> App<'a, 'b> {
             .required(false)
             .help("skip uploading last chunk. "),
     )
+    .arg(
+        Arg::with_name(NO_UPLOAD_IF_LESS_THAN_CHUNKS_ARG)
+            .long(NO_UPLOAD_IF_LESS_THAN_CHUNKS_ARG)
+            .takes_value(true)
+            .required(false)
+            .conflicts_with(SKIP_LAST_CHUNK_ARG)
+            .help("Do not do anything if we have less than that number of chunks to upload"),
+    )
 }
 #[fbinit::main]


@@ -178,3 +178,18 @@ Clone it again to make sure saved streaming chunks are valid
   $ cd "$TESTTMP"
   $ diff repo-streamclone-2/.hg/store/00changelog.i repo-streamclone-3/.hg/store/00changelog.i
   $ diff repo-streamclone-2/.hg/store/00changelog.d repo-streamclone-3/.hg/store/00changelog.d
+Check no-upload-if-less-than-chunks option
+  $ sqlite3 "$TESTTMP/monsql/sqlite_dbs" "delete from streaming_changelog_chunks where repo_id = 0;"
+  $ streaming_clone create --dot-hg-path "$TESTTMP/repo-hg/.hg" --no-upload-if-less-than-chunks 2
+  * using repo "repo" repoid RepositoryId(0) (glob)
+  * Reloading redacted config from configerator (glob)
+  * current sizes in database: index: 0, data: 0, repo: repo (glob)
+  * has too few chunks to upload - 1. Exiting, repo: repo (glob)
+  $ streaming_clone create --dot-hg-path "$TESTTMP/repo-hg/.hg" --no-upload-if-less-than-chunks 2 --max-data-chunk-size 1
+  * using repo "repo" repoid RepositoryId(0) (glob)
+  * Reloading redacted config from configerator (glob)
+  * current sizes in database: index: 0, data: 0, repo: repo (glob)
+  * about to upload 3 entries, repo: repo (glob)
+  * inserting into streaming clone database, repo: repo (glob)
+  * current max chunk num is None, repo: repo (glob)