Summary:
We want to protect against races where we may have multiple writers with
different IdMap versions. We want IdMapVersions to be monotonous increasing.
The common race scenario is between tailers and seeder. We may have a tailer
that starts to updates while a seeder writes a version with a new idmap. The
tailer may try to write a version that has an old version of the idmap. We want
the tailer to fail in this case.
Reviewed By: krallin
Differential Revision: D27210035
fbshipit-source-id: 168c7c6f25622587071337df1172a12b799e0285
Summary:
In production, all repos are instantiated roughly the same time so all reload
processes are started roughly the same time. Reload makes a bunch of requests
and could potentially cause load. Jitter spreads out the load of the reload.
Avoiding the load spike will make overall server behavior more predictable.
Reviewed By: krallin
Differential Revision: D27280117
fbshipit-source-id: 0727af2e7f231a5b6c948424022788a8e7071f82
Summary:
We would like to distribute the load of update process when many repositories
have Segmented Changelog enabled. Without the jitter all enabled repositories
start their update at roughly the same time. The jitter smooth out the load and
reduces variance in the update process.
Reviewed By: krallin
Differential Revision: D27280118
fbshipit-source-id: 41ad83b09700da1ef70c09dd5d284977e53a95a2
Summary:
`build_from_scratch` only called in `run_with_idmap_version` so we can inline
the code so that the seed process reads better.
This function used to be used as a shortcut towards getting a built dag but now
we prefer to fetch the dags from the store that the seeder writes to.
Reviewed By: krallin
Differential Revision: D27210036
fbshipit-source-id: 0b31ff1126a0f4904578da333cf6d34d69b2782c
Summary:
Removing the last callsite for SegmentedChangelogBuilder means that
the whole class goes away.
Reviewed By: krallin
Differential Revision: D27208339
fbshipit-source-id: 006aa91ed9656e4c33b082cbed82f9a497b0d267
Summary:
We are removing SegmentedChangelogBuilder.
Remove the last uses of Builder in the tests module.
Reviewed By: krallin
Differential Revision: D27208341
fbshipit-source-id: 00f1aaa2376ee5d68dbf7c1256b312cfe0b96d86
Summary:
Any functions that returns SegmentedChangelog is a valid argument for
reloading.
Reviewed By: krallin
Differential Revision: D27202520
fbshipit-source-id: fe903c6be4646c8ec98058d1a025829268c36619
Summary:
PeriodReloading is not fundamentally tied to the Manager. A future change will
update the load function.
Reviewed By: krallin
Differential Revision: D27202524
fbshipit-source-id: a0e4b08cb8605d071d5f30be8c3054f75321aa9c
Summary:
Let's look at these test from a higher perspective. Right now the tests use
internal APIs because not all components were ready when they were added. We
now have components for all the parts of the lifecycle so we can set up tests
in the same form that we would set up the production workflow.
This simplifies the API structure of the components since they can be catered
to one workflow.
Reviewed By: krallin
Differential Revision: D27202530
fbshipit-source-id: 6ec10a0b1ae49da13cfbe803e120a4e754b35fc7
Summary:
The broad goal is to get rid of SegmentedChangelogBuilder.
We will have a new constructor for Seeder, one that uses non segmented_changelog
dependencies as input.
Reviewed By: krallin
Differential Revision: D27202523
fbshipit-source-id: d420507502925d4440d5c3058efef0a4d2dbe895
Summary:
For tests that don't care about the bookmarks specifically, we want to use the
default bookmark name that we defined in BOOKMARK_NAME.
FWIW, it makes sense for this bookmark to be set by blobrepo even... at least
the fixtures. They set a bookmark and it makes sense for us to have a reference
to the bookmark that they set. Something to think about.
Reviewed By: krallin
Differential Revision: D27202522
fbshipit-source-id: 7615e4978dded491dd04ae44ce0b85134a252feb
Summary:
This gets rid of the odd builder for the Seeder.
We can get into design discussions with this one. What is struct and what is
function? For real structures that provide some behavior, I prefer to put
dependencies in owned data. Things that are part of the request go into
function parameters. In mononoke, RepositoryId is the common exception.
Anyway, IdMapVersion is part of the request for seeding. It is useful to have
that as a parameter when starting the seeder.
Reviewed By: krallin
Differential Revision: D27202528
fbshipit-source-id: a67b33493b20d2813fd0a144b9bb7f4510635ae8
Summary:
Convert `BlobRepo` to a `facet::container`. This will allow it to be built
from an appropriate facet factory.
This only changes the definition of the structure: we still use
`blobrepo_factory` to construct it. The main difference is in the types
of the attributes, which change from `Arc<dyn Trait>` to
`Arc<dyn Trait + Send + Sync + 'static`>, specified by the `ArcTrait` alias
generated by the `#[facet::facet]` macro.
Reviewed By: StanislavGlebik
Differential Revision: D27169437
fbshipit-source-id: 3496b6ee2f0d1e72a36c9e9eb9bd3d0bb7beba8b
Summary:
The broad goal here is to remove SegmentedChangelogBuilder.
Looking at Seeder dependencies here.
Reviewed By: StanislavGlebik
Differential Revision: D27202527
fbshipit-source-id: 164da1c4d202cc0f069f67963b71920dca9bcba5
Summary:
The original for this test was to desribe how the SegmentedChangelogBuilder was
to be used. We are removing SegmentedChangelogBuilder. This test goes away now.
Reviewed By: ahornby
Differential Revision: D27202521
fbshipit-source-id: e39a47411c61e8812d4081f6ee02323732369c1b
Summary:
This is part of removing SegmentedChangelogBuilder.
The Tailer constructor directly specifies the Mononoke requirements that is
needs to be provided in order to function.
Reviewed By: ahornby
Differential Revision: D27163312
fbshipit-source-id: 961066e2aa4e88c330841f7362b3ba17d0030638
Summary:
The problem appears when we have shared idmap moving ahead of the memory idmap
before we ever write to the memory idmap. In that case we would incorrectly
fetch last shared from the shared idmap. When that shared last entry is larger
than the cutoff we would try to assign ids starting from the shared entry. When
updating the IdDag it would assume that it has to insert all ids above the
cutoff but it would fail to resolve all ids exactly above the cutoff.
For example, MemIdMap is empty, cutoff is 5, shared idmap last entry is 7. We
asign 8-10 then the IdDag tries to search for 5-10 and fails to resolve 5.
This function was not updated after adding cutoff to OverlayIdMap in an earlier
diff.
Reviewed By: quark-zju
Differential Revision: D27248367
fbshipit-source-id: 97fc1efe8cdfb446c4571196dcef7c2db9a43330
Summary:
Resolve a circular dependency whereby `BlobRepo` needs to depend on
`Arc<dyn SegmentedChangelog>`, but the segmented changelog implementation
depends on `BlobRepo`, by moving the trait definition to its own crate.
Reviewed By: sfilipco
Differential Revision: D27169423
fbshipit-source-id: 5bf7c632607dc8baba40d7a9d65e96e265d58496
Summary:
This test verifies that the issue we had previously with assign_ids does not
creep up again.
Reviewed By: quark-zju
Differential Revision: D27105741
fbshipit-source-id: 49b385b2026b599c92c406331a2299931a2eae46
Summary: Update the logs so that it's more clear what is going on.
Reviewed By: quark-zju
Differential Revision: D27145099
fbshipit-source-id: 11ec7b467157d07dd41893dc82f251a1c555365f
Summary:
We are also going to make update the IdMapVersionStore before we start writing
the IdMap. This means that if we crash while writing the IdMap, future runs
don't try to use the same IdMapVersion that we used previously.
Reviewed By: quark-zju
Differential Revision: D27145097
fbshipit-source-id: b911e2dca32d0fe8ae0aead3de75373dd2f936c4
Summary:
We are going to build the iddag before starting to write the idmap.
This means if the iddag fails to build for whatever reason we would not have
written a potentially useless idmap.
Reviewed By: quark-zju
Differential Revision: D27145098
fbshipit-source-id: c9045abea2a1f5a8b96c524d546776fdc693b56a
Summary:
`update::build` is only used by the Seeder. The steps in this function are not
isolated enough from the seeder to have a separate function. The seeder has the
role of builing it's own type of StartState. It is also the only process that
deals with the IdMapVersionStore. The seeder is particular enough that it makes
sense to inline it's build order.
Reviewed By: quark-zju
Differential Revision: D27099265
fbshipit-source-id: f86b8d7d4637a5f2582e70fc58b60c2041b93548
Summary:
The most important invariant for IdDag is that parent nodes have ids that are
smaller than child nodes. We had a couple of issues that resulted in failing
this invariant so we are adding these extra checks. They will help us diagnose
issues faster and proctect protect production data against faulty updates.
Reviewed By: quark-zju
Differential Revision: D27092204
fbshipit-source-id: 1f052b290a494e267fac2f551ba51582baa67973
Summary: Shadowing can end up being more confusing.
Reviewed By: quark-zju
Differential Revision: D27143481
fbshipit-source-id: 0a1913d8952fe913cc7596b9aea84df2d62cc3fe
Summary:
Check that head has a dag id assignment after finishing the process. This was
done at a later point but it is better to group it with assignment process so
that we have a clear source of the error.
Reviewed By: quark-zju
Differential Revision: D27143482
fbshipit-source-id: 2a94cee70142967b4f8d57df43dfcc339a0b4f2e
Summary:
Segmented Changelog distinguishes commits into two groups: MASTER and
NON_MASTER. The MASTER group is assumed to big and special attention is payed
to it. Algorithms optimize for efficiency on MASTER.
The current state for the segmented_changelog crate in Mononoke is that it does
not assign NON_MASTER commits. It doesn't need to right now. We want to
establish a baseline with the MASTER group. It was however possible for the
on demand update dag to assign commits that were no in the master group to the
master group because no explicit checks were performed. That could lead to
surprising behavior.
At a high level, the update logic that we want is: 1. assign the master
bookmark changeset to the MASTER group, 2. assign other commits that we need to
operate on to the NON_MASTER group. For now we need 1, we will implement 2
later.
Reviewed By: krallin
Differential Revision: D27070083
fbshipit-source-id: 922bcde3641ca25512000cd1a912c5b399bdff4b
Summary:
Pull in SegmentedChangelogConfig and build a SegmentedChangelog instance.
This ties the config with the object that we build on the servers.
Separating the instatiation of the sql connections from building any kind of
segmented changelog structure. The primary reason is that there may be multiple
objects that get instantiated and for that it is useful to be able to pass
this object around.
Reviewed By: krallin
Differential Revision: D26708175
fbshipit-source-id: 90bc22eb9046703556381399442117d13b832392
Summary:
This was lost somehow. I probably incorrectly resolved some conflict when
rebasing a previous change.
Reviewed By: quark-zju
Differential Revision: D27146022
fbshipit-source-id: 13bb0bb3df565689532b2ab5299cd757f278f26e
Summary:
Finding a parent that was previously found signals that we want to assign
that changeset sooner if it was not already assigned.
Reviewed By: quark-zju
Differential Revision: D27092205
fbshipit-source-id: ed39a91460ff2f91a458236cdab8018341ec618b
Summary:
Seeding fbsource I found that loading the commits from sql took longer than I
was expecting, around 90 minutes where I was expecting around 10 miuntes.
I added more logging to validate that commits were actively loaded rather
than something being stuck.
Reviewed By: krallin
Differential Revision: D27084739
fbshipit-source-id: 07972707425ecccd4458eec849c63d6d9ccd923d
Summary:
Pretty big bug here with the "Overlay" when we are updating both stores. It
turns out that we don't really want a standard Overlay. We want the loaded
iddag to operate with the Ids in the shared IdMap and we want whatever is
updates to use the in process IdMap. The problem we have with the overlay is
that the shared IdMap may have more data than the in process IdMap. The shared
IdMap is always updated by the tailer, after all. This means that when we query
the overlay, we may get data from the shared store even if this is the first
time we are trying to update a changeset for the current process.
The solution here is to specify which vertexes are fetched from either store.
Reviewed By: quark-zju
Differential Revision: D27028367
fbshipit-source-id: e09f003d94100778eabd990724579c84b0f86541
Summary:
Using the generic load function from SegmentedChangelogManager. This is the
config SegmentedChangelog that is consistent with the specified configuration.
I wanted to have another look at ArcSwap to understand if
`Arc<ArcSwap<Arc<dyn SegmentedChangelog>>>` was the type that it was
recommending for our situation and indeed it is.
Reviewed By: quark-zju
Differential Revision: D27028369
fbshipit-source-id: 7c601d0c664f2be0eef782700ef4dcefa9b5822d
Summary:
Scuba stats provide a lot of context around the workings of the service.
The most interesting operation for segmented changelog is the update.
Reviewed By: krallin
Differential Revision: D26770846
fbshipit-source-id: a5250603f74930ef4f86b4167d43bdd1790b3fce
Summary:
STATS!!!
Count, success, failure, duration. Per instances, per repo.
I wavered on what to name the stats. I wondered whether it was worth being more
specific that "mononoke.segmented_changelog.update" with something like
"inprocess". In my view the in process stats are more important than the tailer
stats because the tailer is more simple and thus easier to understand. So I add
extra qualifications to the tailer stats and keep the name short for inprocess
stats.
Reviewed By: krallin
Differential Revision: D26770845
fbshipit-source-id: 8e02ec3e6b84621327e665c2099abd7a034e43a5
Summary: Currently unused. Will add stats the reference it.
Reviewed By: krallin
Differential Revision: D26770847
fbshipit-source-id: d5694cd221c90ba3adaf89345ffeb06fa46b9e7b
Summary:
I am not sure why the integration tests didn't fail for this one. I know that
a similar issue was caught last week. Probably one of those cases where not
all tests ran. Anyway. SegmentedChangelogManager requires bookmarks now.
It's not going to use them with the way to SegmentedChangelog is built. Using
the bookmarks needs another code change.
I noticed this because it was failing the Tailer. It will crash Mononoke too.
Long story on why the tailer uses this codepath. Needless to say, we don't want
Mononoke crashing so FIX :)
Reviewed By: quark-zju
Differential Revision: D26962608
fbshipit-source-id: 6efafc67f0816792b841af2cc456edc0cc579460
Summary:
Using a more specific name. Looking to differentiate between tailer update
and in process dag update.
Reviewed By: krallin
Differential Revision: D26770844
fbshipit-source-id: b35e6e705a0bfac6289c70a8e8e8cb9ba38a8d99
Summary:
Our production setup has an OnDemandUpdateSegmentedChangelog that gets updated
in various ways. With a setup where the dag is reloaded completely from saves,
we need a factory for the OnDemandUpdateSegmentedChangelog.
SegmentedChangelogManager takes the role of being the factory for our
production Dags.
At some point we will remove the SegmentedChangelog implementation for Manager.
Reviewed By: krallin
Differential Revision: D26708173
fbshipit-source-id: b3d8ea612b317af374f2c0ce6d7c512e3b09b2d2