Summary:
In D26945466 (7a3539b9c6) I started to use the correct repo name for backup
repos whenever we sync an entry. However, most of the time the sync job is
idle, and while idle it also logs a heartbeat to a Scuba table. That heartbeat
was using the wrong repo_id (i.e. for instagram-server_backup it was using the
instagram-server repo_id). This diff fixes that.
Reviewed By: krallin
Differential Revision: D27123193
fbshipit-source-id: 80425a56ad0a432180f420f5c7957105407e0fc9
Summary:
Unlike the source control service, requests aren't usually cancelled in the
main server. However, if the request doesn't complete within the shutdown
timeout, it does get cancelled.
Add logging for this case.
Reviewed By: krallin
Differential Revision: D27086622
fbshipit-source-id: dbd9dee1a6a84b4cd5570302a0a62fb96d2489aa
Summary: Can simplify the decode, as it was just passing the metadata in and then out again
Reviewed By: krallin
Differential Revision: D27044277
fbshipit-source-id: 4e8fb995d3643f5420f9315fab6453b027be6297
Summary: This is useful so we can measure the effect of cachelib marshalling overhead in CacheBlob separately, without zstd compression in play
Reviewed By: krallin
Differential Revision: D27043229
fbshipit-source-id: cf7e35688bdd96c029ee7858f59e46583726f271
Summary: Can use the ones from BlobstoreOptions rather than doing our own
Reviewed By: krallin
Differential Revision: D27043230
fbshipit-source-id: d3db19adaf8819709d069296dec955b2159d5546
Summary: Benchmark had a partial duplicate of the sqlblob opening code from blobstore factory but without unsharded support, so use factory instead.
Reviewed By: krallin
Differential Revision: D27060010
fbshipit-source-id: d1c5704cdec17e3d0b1b54538caf7a3893c3610f
Summary:
Finding a parent that was previously found signals that we want to assign
that changeset sooner if it was not already assigned.
Reviewed By: quark-zju
Differential Revision: D27092205
fbshipit-source-id: ed39a91460ff2f91a458236cdab8018341ec618b
Summary:
Seeding fbsource, I found that loading the commits from SQL took longer than I
was expecting: around 90 minutes where I was expecting around 10 minutes.
I added more logging to validate that commits were actively being loaded
rather than something being stuck.
Reviewed By: krallin
Differential Revision: D27084739
fbshipit-source-id: 07972707425ecccd4458eec849c63d6d9ccd923d
Summary:
When requests are cancelled, their futures are dropped without completing.
Currently this produces no logs or statistics, as normally those would be
emitted after the request implementation completes.
Add logging for cancelled requests. Include the gathered statistics so far,
so that we know how much time was spent on the cancelled request.
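The shape of this is roughly a drop guard held across the request future. A minimal sketch (all names here are hypothetical; the real code would log the gathered stats to Scuba rather than set a flag):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::time::Instant;

// Guard held for the lifetime of a request future. If the future is dropped
// before `complete` is called, Drop runs and records the cancellation.
struct RequestLogGuard {
    start: Instant,
    completed: bool,
    cancelled: Arc<AtomicBool>, // stand-in for the real logger
}

impl RequestLogGuard {
    fn new(cancelled: Arc<AtomicBool>) -> Self {
        Self { start: Instant::now(), completed: false, cancelled }
    }
    fn complete(mut self) {
        self.completed = true; // normal completion: Drop stays quiet
    }
}

impl Drop for RequestLogGuard {
    fn drop(&mut self) {
        if !self.completed {
            // The real server would log the stats gathered so far here,
            // including the time spent: self.start.elapsed().
            let _ = self.start.elapsed();
            self.cancelled.store(true, Ordering::SeqCst);
        }
    }
}

fn main() {
    let cancelled = Arc::new(AtomicBool::new(false));
    {
        let _guard = RequestLogGuard::new(cancelled.clone());
        // request future dropped here without calling complete()
    }
    assert!(cancelled.load(Ordering::SeqCst));

    let completed = Arc::new(AtomicBool::new(false));
    RequestLogGuard::new(completed.clone()).complete();
    assert!(!completed.load(Ordering::SeqCst));
}
```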
Reviewed By: StanislavGlebik
Differential Revision: D27084866
fbshipit-source-id: d4c5c276d496478f0c7caa700627b92d8f9e80a2
Summary:
Pretty big bug here with the "Overlay" when we are updating both stores. It
turns out that we don't really want a standard Overlay: we want the loaded
iddag to operate with the Ids in the shared IdMap, and we want whatever is
updated to use the in-process IdMap. The problem with the overlay is that the
shared IdMap may have more data than the in-process IdMap; the shared IdMap is
always updated by the tailer, after all. This means that when we query the
overlay, we may get data from the shared store even if this is the first time
we are trying to update a changeset in the current process.
The solution here is to specify which vertexes are fetched from each store.
Reviewed By: quark-zju
Differential Revision: D27028367
fbshipit-source-id: e09f003d94100778eabd990724579c84b0f86541
Summary:
Use the generic load function from SegmentedChangelogManager. This loads the
SegmentedChangelog that is consistent with the specified configuration.
I wanted to have another look at ArcSwap to understand whether
`Arc<ArcSwap<Arc<dyn SegmentedChangelog>>>` was the type it recommends for
our situation, and indeed it is.
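The arc-swap crate isn't needed to see the shape of this type; a rough stand-in using std's `RwLock` (hypothetical trait and impls, with `RwLock` in place of the lock-free `ArcSwap`) shows why the inner `Arc` matters: readers clone it cheaply while the reloader swaps the whole instance:

```rust
use std::sync::{Arc, RwLock};

trait SegmentedChangelog: Send + Sync {
    fn name(&self) -> &'static str;
}

struct OnDisk;
impl SegmentedChangelog for OnDisk {
    fn name(&self) -> &'static str { "on-disk" }
}

struct InProcess;
impl SegmentedChangelog for InProcess {
    fn name(&self) -> &'static str { "in-process" }
}

// std stand-in for Arc<ArcSwap<Arc<dyn SegmentedChangelog>>>: readers clone
// the inner Arc and keep using it; a reload swaps the inner Arc wholesale.
type Swappable = Arc<RwLock<Arc<dyn SegmentedChangelog>>>;

fn main() {
    let inner: Arc<dyn SegmentedChangelog> = Arc::new(OnDisk);
    let current: Swappable = Arc::new(RwLock::new(inner));

    // A reader takes a snapshot; it stays valid even across a swap.
    let snapshot = current.read().unwrap().clone();
    assert_eq!(snapshot.name(), "on-disk");

    // A periodic reload replaces the whole instance atomically.
    *current.write().unwrap() = Arc::new(InProcess);
    assert_eq!(current.read().unwrap().name(), "in-process");
    assert_eq!(snapshot.name(), "on-disk"); // old snapshot unaffected
}
```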
Reviewed By: quark-zju
Differential Revision: D27028369
fbshipit-source-id: 7c601d0c664f2be0eef782700ef4dcefa9b5822d
Summary:
Keep SegmentedChangelog up to date by triggering an update to the master
bookmark every minute.
Updating SegmentedChangelog in process has the side effect of adding some
in-process-only bookkeeping. Over long periods of time this can result in
increased memory usage. To mitigate any potential issues, we reload
SegmentedChangelog every hour. This will make its parameters more predictable.
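The scheduling decision described above can be sketched with a hypothetical helper (the real code presumably drives this from a timer; the names here are illustrative only):

```rust
// Per one-minute tick, decide whether to do an incremental update or a full
// reload that drops the accumulated in-process bookkeeping.
#[derive(Debug, PartialEq)]
enum Action {
    Update,
    Reload,
}

fn action_for_tick(minutes_since_start: u64) -> Action {
    if minutes_since_start > 0 && minutes_since_start % 60 == 0 {
        Action::Reload // once an hour, reload to bound memory growth
    } else {
        Action::Update // otherwise, update to the master bookmark
    }
}

fn main() {
    assert_eq!(action_for_tick(1), Action::Update);
    assert_eq!(action_for_tick(59), Action::Update);
    assert_eq!(action_for_tick(60), Action::Reload);
    assert_eq!(action_for_tick(61), Action::Update);
    assert_eq!(action_for_tick(120), Action::Reload);
}
```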
Reviewed By: quark-zju
Differential Revision: D27028368
fbshipit-source-id: dae581b9a067c6eae7975b4517203085b168e2f0
Summary:
Several methods (`commit_compare`, `commit_is_ancestor_of`, `commit_file_diffs`
and `commit_common_base_with`) operate on a pair of commits. Currently these
all resolve the other commit manually and in different ways. Commonize the
code, and add contextual information so the caller can see which of the two
commits failed to resolve.
Reviewed By: StanislavGlebik
Differential Revision: D27079920
fbshipit-source-id: a2b735801ed75232dd302061aaff2da23448d812
Summary:
Add a `.context` method for `ServiceError`, which allows the addition of
context information in errors.
Since these are wrapped Thrift errors, we can't use the usual error-chain
mechanism of `std::error::Error`. Instead, we just prepend the message that
the Thrift client will see with the context.
Add an extension to `Result` for results whose error can be converted into a
`ServiceError`, to allow adding context when processing a chain of `Result`s.
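A minimal sketch of this pattern, with a stand-in `ServiceError` (the real type wraps Thrift errors and has different fields): since there is no `std::error::Error` source chain available, context is prepended to the message the Thrift client will see.

```rust
use std::fmt;

// Stand-in for the wrapped Thrift error type.
#[derive(Debug)]
struct ServiceError {
    message: String,
}

impl ServiceError {
    // Prepend context to the client-visible message.
    fn context(self, ctx: &str) -> Self {
        ServiceError { message: format!("{}: {}", ctx, self.message) }
    }
}

impl fmt::Display for ServiceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

// Extension on Result for any error convertible into ServiceError.
trait ServiceErrorResultExt<T> {
    fn context(self, ctx: &str) -> Result<T, ServiceError>;
}

impl<T, E: Into<ServiceError>> ServiceErrorResultExt<T> for Result<T, E> {
    fn context(self, ctx: &str) -> Result<T, ServiceError> {
        self.map_err(|e| e.into().context(ctx))
    }
}

fn main() {
    let r: Result<(), ServiceError> =
        Err(ServiceError { message: "commit not found".into() });
    let r = r.context("resolving other commit");
    assert_eq!(
        r.unwrap_err().to_string(),
        "resolving other commit: commit not found"
    );
}
```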
Reviewed By: StanislavGlebik
Differential Revision: D27079921
fbshipit-source-id: a1200f44346530c91bd559f4be0ca2b04f7d4480
Summary:
Initializing it twice causes it to fail. Let's not do that, and also let's use
the init_mononoke function instead of our ad-hoc logger and runtime creation
(at the very least it also initializes tunables and sets the correct tokio
runtime parameters).
Also, let's add more logging to see the progress of uploading.
Reviewed By: ahornby
Differential Revision: D27079673
fbshipit-source-id: 940135a9aed62f7139835b2450a1964b879e814b
Summary:
The way I plan to use the new streaming_changelog in prod is by running it
periodically (say, every 15 mins or so). However, some repos won't get many
commits in the last 15 mins (in fact, they might get just 1 or 2).
And even for high-commit-rate repos, most of the time the last chunk
will not be a full chunk (i.e. it will be less than --max-data-chunk-size).
If we just uploaded the last chunk regardless of its size, then the streaming
changelog database table would keep growing by 1 entry every 15 mins even when
it's completely unnecessary. Instead, I suggest adding an option to not upload
the last chunk when it's not necessary.
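The chunking decision can be sketched as follows (hypothetical helper and flag names, modeled on the `--max-data-chunk-size` option mentioned above):

```rust
// Group revlog entries into chunks of at most max_data_chunk_size, optionally
// dropping a trailing partial chunk so the streaming changelog table doesn't
// grow by one tiny entry per run.
fn chunks_to_upload(
    entries: &[u64],
    max_data_chunk_size: usize,
    skip_last_chunk_if_not_full: bool,
) -> Vec<&[u64]> {
    let mut chunks: Vec<&[u64]> = entries.chunks(max_data_chunk_size).collect();
    if skip_last_chunk_if_not_full {
        if let Some(last) = chunks.last() {
            if last.len() < max_data_chunk_size {
                chunks.pop(); // leave the partial tail for a later run
            }
        }
    }
    chunks
}

fn main() {
    let entries: Vec<u64> = (0..10).collect();
    // 10 entries in chunks of 4: two full chunks plus one partial chunk of 2.
    assert_eq!(chunks_to_upload(&entries, 4, false).len(), 3);
    // With the option on, the partial tail is skipped.
    assert_eq!(chunks_to_upload(&entries, 4, true).len(), 2);
    // An exact multiple keeps every chunk.
    assert_eq!(chunks_to_upload(&entries[..8], 4, true).len(), 2);
}
```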
Reviewed By: farnz
Differential Revision: D27045681
fbshipit-source-id: 2d0fed3094944c4ed921f36943b881af394d9c17
Summary:
This command can be used to update an already existing streaming changelog.
It takes a newly cloned changelog and updates the streaming changelog
chunks in the database.
The biggest difference from the "create" command is that we first need to
figure out what's already uploaded to the streaming changelog. For that, two
new methods were added to SqlStreamingChunksFetcher.
Reviewed By: farnz
Differential Revision: D27045386
fbshipit-source-id: 36fc9387f621e1ec8ad3eb4fbb767ab431a9d0bb
Summary:
Small refactoring that will be used in the next diff. In the next diff we'll
add the "update" command, and that command will specify the chunk numbers
itself. So let's move setting chunk numbers out of the
upload_chunks_to_blobstore function.
Differential Revision: D27045387
fbshipit-source-id: c5387a60841fe184c6db5edc4812ddd409eb2215
Summary:
Small refactoring that makes a few things easier to do in later diffs:
1) Adds a verification that checks the data offset
2) We now read the first chunk's offset from the revlog, instead of hardcoding
it to 0, 0. This will be useful for the "update" command, which needs to skip
revlog entries that already exist in the database
Differential Revision: D27045388
fbshipit-source-id: 4ee80c96d9307c77b1108889e457f10e83c8beb7
Summary: A duplicate name caused the getdeps build to fail. This diff fixes it.
Reviewed By: krallin
Differential Revision: D27049661
fbshipit-source-id: b23fe52ad89cbe764e656dfe960921ff1ac92b32
Summary: Now that EdenAPI requests are being logged to the same dataset as regular requests (`mononoke_test_perf`), let's prefix the EdenAPI-specific columns with `edenapi_` to avoid confusion.
Reviewed By: krallin
Differential Revision: D26896670
fbshipit-source-id: 92a0710ff1a7297c9cf46ff9bd9576c9bc155e26
Summary: Was fixed at 2 reads; add an option to allow testing read performance more thoroughly.
Reviewed By: farnz
Differential Revision: D27043234
fbshipit-source-id: 4bb5f49007a4fa67c42e872e236417fa5ce5c9a0
Summary:
Right now, fileblob crashes if the temporary directory and the blobstore are
on different devices (because you cannot atomically rename across devices), so
you need to make sure your TMPDIR is on the same volume as your fileblob.
This is kinda annoying and kinda unnecessary. Let's just default to putting
the temporary files into the same location as the blobstore.
Note: while the temp files will be in the same directory as the rest of our
blobs, they don't have the `blob-` prefix (their prefix will be `.tmp`), so
they cannot be read by accident as if they were blobs.
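The write-then-rename pattern looks roughly like this (a sketch with hypothetical names and prefixes modeled on the description above, not the actual fileblob code); `rename` is only atomic within one filesystem, which is why the temp file must live next to the final blob:

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

// Write to a ".tmp"-prefixed file in the *same directory* as the final blob,
// then rename. Readers scanning for "blob-" entries never see a partial file.
fn put_blob(dir: &Path, key: &str, data: &[u8]) -> std::io::Result<()> {
    let tmp = dir.join(format!(".tmp-{}", key));
    let dst = dir.join(format!("blob-{}", key));
    let mut f = fs::File::create(&tmp)?;
    f.write_all(data)?;
    f.sync_all()?;
    // Same directory, hence same device: this rename is atomic.
    fs::rename(&tmp, &dst)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("fileblob-demo");
    fs::create_dir_all(&dir)?;
    put_blob(&dir, "key1", b"hello")?;
    assert_eq!(fs::read(dir.join("blob-key1"))?, b"hello");
    Ok(())
}
```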
Reviewed By: farnz
Differential Revision: D27046889
fbshipit-source-id: c2b47cd6927eef34ac19325f87f446a6f6532eaf
Summary: Tidy up a bit and use the constants
Reviewed By: krallin
Differential Revision: D27043233
fbshipit-source-id: 3208e2f35c67b4b22bb5f8189cd8c5b399604833
Summary:
Our current streaming changelog updater logic is written in Python, and it has
a few downsides:
1) It writes directly to Manifold, which means it bypasses all the multiplexed
blobstore logic...
2) ...more importantly, we can't write to non-Manifold blobstores at all.
3) There are no tests for the streaming changelogs
This diff moves the logic of the initial creation of a streaming changelog
entry to Rust, which should fix the issues mentioned above. I want to
highlight that this implementation only works for the initial creation case,
i.e. when there are no entries in the database. Next diffs will add
incremental update functionality.
Reviewed By: krallin
Differential Revision: D27008485
fbshipit-source-id: d9583bb1b98e5c4abea11c0a43c42bc673f8ed48
Summary: Add the EdenAPI endpoint for resolving bookmarks. This is a first pass that just takes a bookmark name as a path variable, to make sure that this is on the right track. We'll want to add a proper request type that includes a list of bookmarks and a response type that can indicate that no bookmark was found. Then the hg bookmark command will also need support for prefix listing capabilities.
Reviewed By: kulshrax
Differential Revision: D26920845
fbshipit-source-id: 067db6a636a75531ee5953392b734c038a58efb6
Summary:
Scuba stats provide a lot of context around the workings of the service.
The most interesting operation for segmented changelog is the update.
Reviewed By: krallin
Differential Revision: D26770846
fbshipit-source-id: a5250603f74930ef4f86b4167d43bdd1790b3fce
Summary:
STATS!!!
Count, success, failure, duration. Per instances, per repo.
I wavered on what to name the stats. I wondered whether it was worth being more
specific than "mononoke.segmented_changelog.update", with something like
"inprocess". In my view the in-process stats are more important than the tailer
stats because the tailer is simpler and thus easier to understand. So I added
extra qualifications to the tailer stats and kept the names short for the
in-process stats.
Reviewed By: krallin
Differential Revision: D26770845
fbshipit-source-id: 8e02ec3e6b84621327e665c2099abd7a034e43a5
Summary: Currently unused. Will add stats that reference it.
Reviewed By: krallin
Differential Revision: D26770847
fbshipit-source-id: d5694cd221c90ba3adaf89345ffeb06fa46b9e7b
Summary:
Handle (ignore) git-submodules in gitimport.
Git submodules are represented as ObjectType::Commit inside the tree. For now we do not support git submodules, but we still need to import repositories that have submodules in them (just not synchronized), so we ignore any submodule for now.
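The filtering this describes can be sketched with hypothetical stand-in types (the real gitimport works on git tree objects; only the ObjectType::Commit check mirrors the diff):

```rust
// Git represents a submodule as a tree entry whose object type is "commit"
// (a gitlink). Import keeps blobs and trees and skips those entries.
enum ObjectType {
    Blob,
    Tree,
    Commit, // gitlink / submodule
}

struct TreeEntry {
    name: &'static str,
    kind: ObjectType,
}

fn importable(entries: &[TreeEntry]) -> Vec<&'static str> {
    entries
        .iter()
        .filter(|e| !matches!(e.kind, ObjectType::Commit)) // ignore submodules
        .map(|e| e.name)
        .collect()
}

fn main() {
    let entries = [
        TreeEntry { name: "src", kind: ObjectType::Tree },
        TreeEntry { name: "README", kind: ObjectType::Blob },
        TreeEntry { name: "third-party/lib", kind: ObjectType::Commit },
    ];
    assert_eq!(importable(&entries), vec!["src", "README"]);
}
```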
Reviewed By: StanislavGlebik
Differential Revision: D26999625
fbshipit-source-id: eb32247d4ad0325ee433e21a516ac4a92469fd90
Summary: Record some more stats so we can see the last finish time. Also record run and chunk number in the update stats, so one can see how far along a run is.
Differential Revision: D26949482
fbshipit-source-id: 5e7df4412c25149559883b6e15afa70e1c670cdc
Summary: All important jobs (SCS Server, LFS Server, Mononoke Server, derived data) have switched successfully. Roll up anything that's been missed by switching the default and letting contbuild take care of it
Reviewed By: krallin
Differential Revision: D26980991
fbshipit-source-id: 2c9f7cd56c38e9e1a2f8374c76141e7a99c88a2a
Summary:
Bounded traversal's internal book-keeping moves the futures returned from fold and unfold callbacks around while they are being queued to be scheduled. If these futures are large, then this can result in a significant portion of bounded traversal's CPU time being spent on `memcpy`ing these futures around.
This can be prevented by always boxing the futures that are returned to bounded traversal. Make this a requirement by changing the type from `impl Future<...>` to `BoxFuture<...>`.
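The effect can be seen with plain std types: the state machine of an unboxed `async` block is at least as large as everything it holds across an await, while the boxed form is just a fat pointer. A rough illustration (not the actual bounded_traversal code, which uses the futures crate's `BoxFuture` alias for the same shape):

```rust
use std::future::Future;
use std::pin::Pin;

fn main() {
    // An async block that holds a large buffer across an await point
    // compiles to a state machine at least that large...
    let unboxed = async {
        let buf = [0u8; 4096];
        std::future::ready(()).await;
        buf.len()
    };
    assert!(std::mem::size_of_val(&unboxed) >= 4096);

    // ...so every move of the queued future memcpy's all of it. Boxing it
    // shrinks the value moved around the scheduler to a fat pointer.
    let boxed: Pin<Box<dyn Future<Output = usize> + Send>> = Box::pin(unboxed);
    assert_eq!(
        std::mem::size_of_val(&boxed),
        2 * std::mem::size_of::<usize>()
    );
}
```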
Reviewed By: mitrandir77
Differential Revision: D26997706
fbshipit-source-id: 23a3583adc23c4e7d3607a78e82fc9d1056691c3
Summary:
Previously it was possible to use streaming clone only with an xdb table. This
diff changes that.
Reviewed By: farnz
Differential Revision: D27008486
fbshipit-source-id: b8d51832dd62b4343b36c3a7a96b83a327056025
Summary:
Knowing the prepushrebase changeset id is required for retroactive review.
Retroactive review checks landed commits, but the verify integrity hook runs on a commit before landing. As a result, the landed commit has no straightforward connection with the original one, and retroactive review can't tell whether verify integrity has seen it.
Reviewed By: StanislavGlebik
Differential Revision: D26944453
fbshipit-source-id: af1ec3c2e7fd3efc6572bb7be4a8065afa2631c1
Summary:
This tunable is not used anymore, we use
getbundle_high_low_gen_num_difference_threshold instead. Let's remove it.
Differential Revision: D26984966
fbshipit-source-id: 4e8ded5982f7e90c90476ff758b766df55644273
Summary:
The existing query to establish HgChangesetId on the path to FileContentMetadata for LFS validation is quite complex, using HgFilenode linknodes.
This change adds an optional edge from BonsaiHgMappingToHgBonsaiMapping that can be used to simplify the LFS validation case and load less data to get there.
Reviewed By: mitrandir77
Differential Revision: D26975799
fbshipit-source-id: 799acb8228721c1878f33254ebfa5e6345673e5d
Summary: Comment can go as we're using SmallVec now.
Reviewed By: farnz
Differential Revision: D26987009
fbshipit-source-id: f520c90b3a210283d139ba1de8ce140e12a4f875
Summary:
Errors are nicer: panics show a huge stack trace, and it takes a long time to
get a coredump.
Reviewed By: krallin
Differential Revision: D26984264
fbshipit-source-id: 2fa6aca32544d63f0c4264cdbad56eaf55d71955
Summary:
If a filename passes this hook, it should be checkout-able on Windows.
Unfortunately that's not the case now: we're missing the check for invalid
characters. Although this list might be redundant and some of the chars might
already be banned by another hook, let's make this hook the place for all
Windows-specific filename checks.
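A sketch of such a check, using the characters Microsoft's file naming documentation lists as invalid (the function name is hypothetical, and a full hook would also cover reserved device names like CON and AUX):

```rust
// Characters Windows forbids in file name components: < > : " / \ | ? *
// plus ASCII control characters (0-31). A name failing this check could
// not be checked out on Windows.
fn has_windows_invalid_chars(name: &str) -> bool {
    name.chars().any(|c| {
        matches!(c, '<' | '>' | ':' | '"' | '/' | '\\' | '|' | '?' | '*')
            || (c as u32) < 32
    })
}

fn main() {
    assert!(has_windows_invalid_chars("aux:1"));
    assert!(has_windows_invalid_chars("what?.txt"));
    assert!(has_windows_invalid_chars("pipe|name"));
    assert!(!has_windows_invalid_chars("ok-file_name.txt"));
}
```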
Reviewed By: farnz
Differential Revision: D26979418
fbshipit-source-id: f1fa685b9e7e5413d8030d18bc083458b6a148e1