Commit Graph

164 Commits

Author SHA1 Message Date
Stefan Filip
f16a99af95 segmented_changelog: add IdMapVersion check to VersionStore
Summary:
We want to protect against races where we may have multiple writers with
different IdMap versions. We want IdMapVersions to be monotonically increasing.

The common race is between the tailers and the seeder. A tailer may start an
update while the seeder writes a version with a new IdMap. The tailer may then
try to write a version that references the old IdMap. We want the tailer to
fail in this case.
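A minimal sketch of the check described above, assuming an in-memory stand-in for the VersionStore (the `VersionStore` shape, `set`, and `get` here are hypothetical). It also assumes that writing the same IdMapVersion again is allowed, so a tailer can keep publishing new iddags on top of the current idmap; only an older IdMapVersion is rejected. A real SQL-backed store would need the check and the write to happen atomically (for example as one conditional update); the `Mutex` plays that role here.

```rust
use std::sync::Mutex;

/// Hypothetical in-memory stand-in for the VersionStore: it remembers the
/// latest (iddag_version, idmap_version) pair that was written.
pub struct VersionStore {
    latest: Mutex<Option<(u64, u64)>>,
}

impl VersionStore {
    pub fn new() -> Self {
        VersionStore { latest: Mutex::new(None) }
    }

    /// Rejects writes whose IdMapVersion is older than the stored one.
    /// Assumption: equal IdMapVersions are accepted (tailer updates).
    pub fn set(&self, iddag_version: u64, idmap_version: u64) -> Result<(), String> {
        // Holding the lock makes check-then-write a single atomic step.
        let mut latest = self.latest.lock().unwrap();
        if let Some((_, stored_idmap)) = *latest {
            if idmap_version < stored_idmap {
                return Err(format!(
                    "refusing to write idmap version {} over newer version {}",
                    idmap_version, stored_idmap
                ));
            }
        }
        *latest = Some((iddag_version, idmap_version));
        Ok(())
    }

    pub fn get(&self) -> Option<(u64, u64)> {
        *self.latest.lock().unwrap()
    }
}
```

In the race above, the seeder's write with the new IdMapVersion succeeds and the stale tailer's later write fails instead of silently pointing readers back at the old idmap.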

Reviewed By: krallin

Differential Revision: D27210035

fbshipit-source-id: 168c7c6f25622587071337df1172a12b799e0285
2021-03-31 10:29:36 -07:00
Stefan Filip
6d49f0092a segmented_changelog: fix update stats prefix
Summary: typo

Reviewed By: quark-zju

Differential Revision: D27448319

fbshipit-source-id: 5fe2a385b242f5f0f4d660c4dbc58cee793f4569
2021-03-30 18:24:13 -07:00
Stefan Filip
181a3753f4 segmented_changelog: add jitter to periodic reload
Summary:
In production, all repos are instantiated at roughly the same time, so all
reload processes also start at roughly the same time. Reload makes a bunch of
requests and could cause a load spike. Jitter spreads out the load of the
reload. Avoiding the load spike will make overall server behavior more
predictable.
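One way to picture the jitter is as an initial delay in `[0, max_jitter)` that differs per repo. The sketch below is hypothetical: `reload_jitter` is not the Mononoke function, and it hashes the repo id with `std`'s `DefaultHasher` instead of drawing a random number, purely so the example needs no external crates. The actual change may well use a random source.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Duration;

/// Hypothetical helper: returns a delay in `[0, max_jitter)` derived from
/// the repo id. Hashing stands in for a RNG, so different repos land on
/// different offsets while each repo's offset stays stable.
pub fn reload_jitter(repo_id: i32, max_jitter: Duration) -> Duration {
    let max_ms = max_jitter.as_millis() as u64;
    if max_ms == 0 {
        // No jitter configured: start immediately.
        return Duration::ZERO;
    }
    let mut hasher = DefaultHasher::new();
    repo_id.hash(&mut hasher);
    Duration::from_millis(hasher.finish() % max_ms)
}
```

A reload loop would sleep for `reload_jitter(repo_id, period)` once before its first reload and then reload every `period`, so repos that were instantiated together no longer reload together.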

Reviewed By: krallin

Differential Revision: D27280117

fbshipit-source-id: 0727af2e7f231a5b6c948424022788a8e7071f82
2021-03-25 21:11:38 -07:00
Stefan Filip
4349fa4aed segmented_changelog: add jitter to periodic update start
Summary:
We would like to distribute the load of the update process when many
repositories have Segmented Changelog enabled. Without jitter, all enabled
repositories start their update at roughly the same time. Jitter smooths out
the load and reduces variance in the update process.

Reviewed By: krallin

Differential Revision: D27280118

fbshipit-source-id: 41ad83b09700da1ef70c09dd5d284977e53a95a2
2021-03-25 21:11:38 -07:00
Stefan Filip
35cea2ccc0 segmented_changelog: inline Seeder:build_from_scratch
Summary:
`build_from_scratch` is only called in `run_with_idmap_version`, so we can
inline the code to make the seed process read better.

This function used to serve as a shortcut to a built dag, but now we prefer to
fetch the dags from the store that the seeder writes to.

Reviewed By: krallin

Differential Revision: D27210036

fbshipit-source-id: 0b31ff1126a0f4904578da333cf6d34d69b2782c
2021-03-25 21:11:38 -07:00
Stefan Filip
6101529d05 segmented_changelog: remove SegmentedChangelogBuilder
Summary:
Removing the last callsite for SegmentedChangelogBuilder means that
the whole class goes away.

Reviewed By: krallin

Differential Revision: D27208339

fbshipit-source-id: 006aa91ed9656e4c33b082cbed82f9a497b0d267
2021-03-25 21:11:38 -07:00
Stefan Filip
2562af5f66 segmented_changelog: update tests away from Builder
Summary:
We are removing SegmentedChangelogBuilder.
Remove the last uses of Builder in the tests module.

Reviewed By: krallin

Differential Revision: D27208341

fbshipit-source-id: 00f1aaa2376ee5d68dbf7c1256b312cfe0b96d86
2021-03-25 21:11:37 -07:00
Stefan Filip
bced59263b segmented_changelog: update PeriodicReload to accept function
Summary:
Any function that returns a SegmentedChangelog is a valid argument for
reloading.
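A minimal sketch of a reloader that takes a load function, assuming a heavily reduced synchronous `SegmentedChangelog` trait (the real trait is larger and async). `PeriodicReload`'s fields and methods here are hypothetical, and `RwLock<Arc<...>>` stands in for `ArcSwap` so the example only needs `std`.

```rust
use std::sync::{Arc, RwLock};

/// Minimal stand-in for the SegmentedChangelog trait (assumption).
pub trait SegmentedChangelog: Send + Sync {
    fn describe(&self) -> String;
}

/// Hypothetical reloader: any function producing a SegmentedChangelog
/// can be used as the load step.
pub struct PeriodicReload {
    current: RwLock<Arc<dyn SegmentedChangelog>>,
    load: Box<dyn Fn() -> Arc<dyn SegmentedChangelog> + Send + Sync>,
}

impl PeriodicReload {
    pub fn new<F>(load: F) -> Self
    where
        F: Fn() -> Arc<dyn SegmentedChangelog> + Send + Sync + 'static,
    {
        let initial = load();
        PeriodicReload { current: RwLock::new(initial), load: Box::new(load) }
    }

    /// One reload tick: call the loader and swap in the result. A real
    /// implementation would run this on a timer in a background task.
    pub fn reload(&self) {
        let fresh = (self.load)();
        *self.current.write().unwrap() = fresh;
    }

    /// Readers clone the Arc, so they keep a consistent instance even if
    /// a reload swaps in a new one meanwhile.
    pub fn get(&self) -> Arc<dyn SegmentedChangelog> {
        Arc::clone(&*self.current.read().unwrap())
    }
}
```

Taking a closure rather than a Manager keeps the reloader independent of how the dag is built: a test can pass a closure that returns a fixture, while a server can pass one that loads from the store.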

Reviewed By: krallin

Differential Revision: D27202520

fbshipit-source-id: fe903c6be4646c8ec98058d1a025829268c36619
2021-03-25 21:11:37 -07:00
Stefan Filip
af2e5ec339 segmented_changelog: move PeriodicReload to its own module
Summary:
PeriodicReload is not fundamentally tied to the Manager. A future change will
update the load function.

Reviewed By: krallin

Differential Revision: D27202524

fbshipit-source-id: a0e4b08cb8605d071d5f30be8c3054f75321aa9c
2021-03-25 21:11:37 -07:00
Stefan Filip
dfe070eea2 segmented_changelog: remove Builder from VersionStore tests
Summary: The broad goal is to remove SegmentedChangelogBuilder.

Reviewed By: krallin

Differential Revision: D27202526

fbshipit-source-id: cfc7a2782b4b5d432c1f47f0fdc988bb7076352c
2021-03-25 21:11:37 -07:00
Stefan Filip
24f4bd74c3 segmented_changelog: remove Builder from CachedIdMap tests
Summary: The broad goal is to remove SegmentedChangelogBuilder.

Reviewed By: krallin

Differential Revision: D27202525

fbshipit-source-id: b03844a25931f09d7fccf407127414202eca5e7a
2021-03-25 21:11:37 -07:00
Stefan Filip
618eb699c6 segmented_changelog: remove Builder usage from SqlIdMap tests
Summary: The broad goal is to remove SegmentedChangelogBuilder.

Reviewed By: krallin

Differential Revision: D27202529

fbshipit-source-id: dda61ed0bbfaa86460736d482215e55afca535fa
2021-03-25 21:11:36 -07:00
Stefan Filip
850dcbf439 segmented_changelog: remove Builder usage from IdMapVersionStore tests
Summary: The broad goal is to delete SegmentedChangelogBuilder.

Reviewed By: krallin

Differential Revision: D27202519

fbshipit-source-id: a6daff5e1068535a368481691dad6941e9da5a9e
2021-03-25 21:11:36 -07:00
Stefan Filip
6160ce352a segmented_changelog: add seed method to tests
Summary:
Let's look at these tests from a higher perspective. Right now the tests use
internal APIs because not all components were ready when the tests were added.
We now have components for all the parts of the lifecycle, so we can set up
tests in the same form that we would set up the production workflow.

This simplifies the API structure of the components, since they can be
tailored to one workflow.

Reviewed By: krallin

Differential Revision: D27202530

fbshipit-source-id: 6ec10a0b1ae49da13cfbe803e120a4e754b35fc7
2021-03-25 21:11:36 -07:00
Stefan Filip
b8aced7ceb segmented_changelog: rename Seeder:new to new_from_built_dependencies
Summary:
The broad goal is to get rid of SegmentedChangelogBuilder.
We will have a new constructor for Seeder, one that takes dependencies from
outside segmented_changelog as input.

Reviewed By: krallin

Differential Revision: D27202523

fbshipit-source-id: d420507502925d4440d5c3058efef0a4d2dbe895
2021-03-25 21:11:36 -07:00
Stefan Filip
d10a25093d segmented_changelog: consolidate the default bookmark name in tests
Summary:
For tests that don't care about the bookmarks specifically, we want to use the
default bookmark name that we defined in BOOKMARK_NAME.
FWIW, it makes sense for this bookmark to be set by blobrepo even... at least
the fixtures. They set a bookmark and it makes sense for us to have a reference
to the bookmark that they set. Something to think about.

Reviewed By: krallin

Differential Revision: D27202522

fbshipit-source-id: 7615e4978dded491dd04ae44ce0b85134a252feb
2021-03-25 21:11:36 -07:00
Stefan Filip
7fe1c619ac segmented_changelog: remove idmap_version from Seeder fields
Summary:
This gets rid of the odd builder for the Seeder.

We can get into design discussions with this one. What belongs in a struct and
what belongs in a function? For structures that provide some behavior, I prefer
to put dependencies in owned data. Things that are part of the request go into
function parameters. In Mononoke, RepositoryId is the common exception.
Anyway, IdMapVersion is part of the request for seeding, so it is useful to
have it as a parameter when starting the seeder.
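The struct-versus-parameter split described above can be sketched as follows. Everything here is hypothetical and much smaller than the real Seeder: long-lived dependencies (reduced to a `RepositoryId` field, which the text calls the common exception) live in the struct, while the per-request `IdMapVersion` is passed to `run`.

```rust
/// Hypothetical identifiers; the real Mononoke types differ.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct RepositoryId(pub i32);

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct IdMapVersion(pub u64);

/// Dependencies needed by every run are owned by the struct. A real seeder
/// would also hold SQL connections, a changeset fetcher, version stores...
pub struct Seeder {
    repo_id: RepositoryId,
}

impl Seeder {
    pub fn new(repo_id: RepositoryId) -> Self {
        Seeder { repo_id }
    }

    /// The IdMapVersion belongs to this particular seeding request, so it
    /// is a parameter instead of a field set through a builder.
    pub fn run(&self, idmap_version: IdMapVersion) -> String {
        format!(
            "seeding repo {} with idmap version {}",
            self.repo_id.0, idmap_version.0
        )
    }
}
```

With this shape, one `Seeder` can serve several seeding requests with different IdMapVersions, and no builder is needed to set the version ahead of time.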

Reviewed By: krallin

Differential Revision: D27202528

fbshipit-source-id: a67b33493b20d2813fd0a144b9bb7f4510635ae8
2021-03-25 21:11:36 -07:00
Mark Juggurnauth-Thomas
6b16e16fa9 blobrepo: convert to facet container
Summary:
Convert `BlobRepo` to a `facet::container`.  This will allow it to be built
from an appropriate facet factory.

This only changes the definition of the structure: we still use
`blobrepo_factory` to construct it.  The main difference is in the types
of the attributes, which change from `Arc<dyn Trait>` to
`Arc<dyn Trait + Send + Sync + 'static>`, specified by the `ArcTrait` alias
generated by the `#[facet::facet]` macro.
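To show why the extra bounds matter, here is a hand-written alias in the same shape. `Bookmarks`, `ArcBookmarks`, and `Repo` are illustrative names, not necessarily what `#[facet::facet]` generates. A plain `Arc<dyn Bookmarks>` is neither `Send` nor `Sync`, so it cannot be moved to another thread. Spelling out `Send + Sync + 'static` in the trait object lifts that restriction.

```rust
use std::sync::Arc;

pub trait Bookmarks {
    fn master(&self) -> String;
}

/// Alias in the style described above: the trait object carries the
/// auto-trait bounds, so the Arc can be shared across threads.
pub type ArcBookmarks = Arc<dyn Bookmarks + Send + Sync + 'static>;

/// Hypothetical container holding the attribute through the alias.
pub struct Repo {
    pub bookmarks: ArcBookmarks,
}
```

For example, `std::thread::spawn(move || bookmarks.master())` compiles with a clone of `repo.bookmarks`, but would be rejected if the field were a bare `Arc<dyn Bookmarks>`.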

Reviewed By: StanislavGlebik

Differential Revision: D27169437

fbshipit-source-id: 3496b6ee2f0d1e72a36c9e9eb9bd3d0bb7beba8b
2021-03-25 07:34:49 -07:00
Stefan Filip
f81221bc55 segmented_changelog: update test_incremental_update_with_desync_iddag
Summary:
The broad goal here is to remove SegmentedChangelogBuilder.
Looking at Seeder dependencies here.

Reviewed By: StanislavGlebik

Differential Revision: D27202527

fbshipit-source-id: 164da1c4d202cc0f069f67963b71920dca9bcba5
2021-03-23 17:08:29 -07:00
Stefan Filip
fa28ff74d5 segmented_changelog: remove tests::test_build_call_together
Summary:
The original purpose of this test was to describe how SegmentedChangelogBuilder
was meant to be used. We are removing SegmentedChangelogBuilder, so this test
goes away now.

Reviewed By: ahornby

Differential Revision: D27202521

fbshipit-source-id: e39a47411c61e8812d4081f6ee02323732369c1b
2021-03-23 17:08:28 -07:00
Stefan Filip
038867c2c3 segmented_changelog: update Tailer constructor to use Mononoke dependencies
Summary:
This is part of removing SegmentedChangelogBuilder.
The Tailer constructor directly specifies the Mononoke dependencies that it
needs in order to function.

Reviewed By: ahornby

Differential Revision: D27163312

fbshipit-source-id: 961066e2aa4e88c330841f7362b3ba17d0030638
2021-03-23 17:08:28 -07:00
Stefan Filip
3094096af7 segmented_changelog: fix OverlayIdMap::get_last_entry
Summary:
The problem appears when the shared idmap moves ahead of the memory idmap
before we ever write to the memory idmap. In that case we would incorrectly
fetch the last entry from the shared idmap. When that shared last entry is
larger than the cutoff, we would assign ids starting after the shared entry.
When updating the IdDag, it assumes that it has to insert all ids from the
cutoff onward, so it would fail to resolve the ids right at the cutoff, which
were never assigned in memory.

For example: MemIdMap is empty, the cutoff is 5, and the shared idmap's last
entry is 7. We assign 8-10, then the IdDag tries to search for 5-10 and fails
to resolve 5.
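A minimal sketch of the fix, using a hypothetical model of the overlay's bookkeeping. The assumptions, which are not confirmed by the text above, are that ids below `cutoff` are served by the shared idmap, ids at or above `cutoff` by the memory idmap, and that both assign ids contiguously. Under those assumptions, an empty memory idmap must report the last shared id *below* the cutoff, so the next assigned id is the cutoff itself.

```rust
/// Hypothetical model of OverlayIdMap's bookkeeping (assumption: ids below
/// `cutoff` come from the shared idmap, ids at or above it from memory).
pub struct OverlayIds {
    pub mem_last: Option<u64>,
    pub shared_last: Option<u64>,
    pub cutoff: u64,
}

impl OverlayIds {
    /// Buggy behaviour: with an empty memory idmap, fall through to the
    /// shared idmap's last entry, even if it is past the cutoff.
    pub fn get_last_entry_buggy(&self) -> Option<u64> {
        self.mem_last.or(self.shared_last)
    }

    /// Fixed behaviour: never report a shared id at or above the cutoff.
    pub fn get_last_entry(&self) -> Option<u64> {
        if let Some(last) = self.mem_last {
            return Some(last);
        }
        match self.shared_last {
            // The shared idmap moved past the cutoff (e.g. via the tailer):
            // from this process's point of view the last id is cutoff - 1.
            Some(last) if last >= self.cutoff => self.cutoff.checked_sub(1),
            other => other,
        }
    }
}
```

With the example values (empty memory idmap, cutoff 5, shared last entry 7), the buggy version returns 7, so assignment would start at 8; the fixed version returns 4, so assignment starts at 5 and the IdDag can resolve every id from the cutoff onward.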

This function was not updated after adding cutoff to OverlayIdMap in an earlier
diff.

Reviewed By: quark-zju

Differential Revision: D27248367

fbshipit-source-id: 97fc1efe8cdfb446c4571196dcef7c2db9a43330
2021-03-22 18:54:23 -07:00
Mark Juggurnauth-Thomas
c83baeb00d segmented_changelog: split trait to separate crate
Summary:
Resolve a circular dependency whereby `BlobRepo` needs to depend on
`Arc<dyn SegmentedChangelog>`, but the segmented changelog implementation
depends on `BlobRepo`, by moving the trait definition to its own crate.

Reviewed By: sfilipco

Differential Revision: D27169423

fbshipit-source-id: 5bf7c632607dc8baba40d7a9d65e96e265d58496
2021-03-22 07:26:47 -07:00
Stefan Filip
38a9add1da segmented_changelog: add test for update::assign_ids
Summary:
This test verifies that the issue we had previously with assign_ids does not
creep back in.

Reviewed By: quark-zju

Differential Revision: D27105741

fbshipit-source-id: 49b385b2026b599c92c406331a2299931a2eae46
2021-03-18 09:51:48 -07:00
Stefan Filip
822209122f segmented_changelog: update logging for seeder
Summary: Update the logs so that it's clearer what is going on.

Reviewed By: quark-zju

Differential Revision: D27145099

fbshipit-source-id: 11ec7b467157d07dd41893dc82f251a1c555365f
2021-03-18 09:51:48 -07:00
Stefan Filip
cd6d171167 segmented_changelog: update idmap version before writing idmap in seeder
Summary:
We are also going to update the IdMapVersionStore before we start writing the
IdMap. This means that if we crash while writing the IdMap, future runs don't
try to reuse the IdMapVersion that we used previously.

Reviewed By: quark-zju

Differential Revision: D27145097

fbshipit-source-id: b911e2dca32d0fe8ae0aead3de75373dd2f936c4
2021-03-18 09:51:48 -07:00
Stefan Filip
11454ae053 segmented_changelog: update iddag before writing idmap in seeder
Summary:
We are going to build the iddag before starting to write the idmap.
This means that if the iddag fails to build for whatever reason, we will not
have written a potentially useless idmap.
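The write order that results from this change and the IdMapVersionStore change listed just above it can be sketched as a sequence of steps. This is a hypothetical outline: `seed` only records step names in `log` instead of talking to SQL or a blobstore. The final step (saving the iddag and publishing the version) is an assumption about what follows, not something these summaries state.

```rust
/// Hypothetical outline of the seeder's write order. Each step records its
/// name so the ordering can be inspected.
pub fn seed(iddag_builds: bool, log: &mut Vec<&'static str>) -> Result<(), String> {
    // 1. Assign ids and build the iddag in memory. If this fails, nothing
    //    has been written, so no useless idmap is left behind.
    log.push("build iddag in memory");
    if !iddag_builds {
        return Err("iddag failed to build".to_string());
    }
    // 2. Claim the new IdMapVersion before writing the idmap, so a crash
    //    mid-write cannot lead a future run to reuse this version.
    log.push("bump IdMapVersionStore");
    // 3. Write the idmap under the claimed version.
    log.push("write idmap");
    // 4. (Assumption) Save the iddag and publish the (iddag, idmap) pair.
    log.push("save iddag and update VersionStore");
    Ok(())
}
```

The ordering puts the step most likely to fail (building the iddag) before any shared state changes, and claims the version before the long idmap write so that a crash leaves at worst an unused version number.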

Reviewed By: quark-zju

Differential Revision: D27145098

fbshipit-source-id: c9045abea2a1f5a8b96c524d546776fdc693b56a
2021-03-18 09:51:47 -07:00
Stefan Filip
d8736b7cf2 segmented_changelog: inline update::build
Summary:
`update::build` is only used by the Seeder. The steps in this function are not
isolated enough from the seeder to justify a separate function. The seeder has
the role of building its own type of StartState. It is also the only process
that deals with the IdMapVersionStore. The seeder is particular enough that it
makes sense to inline its build order.

Reviewed By: quark-zju

Differential Revision: D27099265

fbshipit-source-id: f86b8d7d4637a5f2582e70fc58b60c2041b93548
2021-03-18 09:51:47 -07:00
Stefan Filip
ef294cb359 segmented_changelog: check that parents are assigned smaller ids
Summary:
The most important invariant for IdDag is that parent nodes have ids that are
smaller than child nodes. We had a couple of issues that resulted in breaking
this invariant, so we are adding these extra checks. They will help us diagnose
issues faster and protect production data against faulty updates.
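A minimal sketch of such a check, assuming a hypothetical representation where changesets are strings, `parents` maps each changeset to its parents, and `ids` holds the assigned ids. It is not the Mononoke implementation; it only shows the invariant being enforced before anything is written.

```rust
use std::collections::HashMap;

/// Hypothetical check of the IdDag invariant: every parent must have a
/// smaller id than its child.
pub fn check_parents_smaller(
    ids: &HashMap<&str, u64>,
    parents: &HashMap<&str, Vec<&str>>,
) -> Result<(), String> {
    for (child, child_parents) in parents {
        let child_id = match ids.get(child) {
            Some(&id) => id,
            None => continue, // child not assigned in this batch
        };
        for parent in child_parents {
            match ids.get(parent) {
                Some(&parent_id) if parent_id < child_id => {}
                Some(&parent_id) => {
                    return Err(format!(
                        "parent {} has id {}, not smaller than child {} with id {}",
                        parent, parent_id, child, child_id
                    ))
                }
                // An assigned child with an unassigned parent is also broken.
                None => return Err(format!("parent {} of {} has no id", parent, child)),
            }
        }
    }
    Ok(())
}
```

Failing loudly here surfaces a bad assignment at update time instead of letting it reach the stored IdDag.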

Reviewed By: quark-zju

Differential Revision: D27092204

fbshipit-source-id: 1f052b290a494e267fac2f551ba51582baa67973
2021-03-18 09:51:47 -07:00
Stefan Filip
014b80dc5d segmented_changelog: minor, remove variable shadowing in update_iddag
Summary: Shadowing can end up being more confusing.

Reviewed By: quark-zju

Differential Revision: D27143481

fbshipit-source-id: 0a1913d8952fe913cc7596b9aea84df2d62cc3fe
2021-03-18 09:51:47 -07:00
Stefan Filip
834e35d278 segmented_changelog: move head not assigned error
Summary:
Check that the head has a dag id assignment right after the assignment process
finishes. This check was previously done at a later point, but it is better to
group it with the assignment process so that the source of the error is clear.

Reviewed By: quark-zju

Differential Revision: D27143482

fbshipit-source-id: 2a94cee70142967b4f8d57df43dfcc339a0b4f2e
2021-03-18 09:51:47 -07:00
Stefan Filip
e3cd28089c segmented_changelog: ensure master group consistency in on demand update
Summary:
Segmented Changelog distinguishes commits into two groups: MASTER and
NON_MASTER. The MASTER group is assumed to be big and special attention is paid
to it. Algorithms optimize for efficiency on MASTER.

The current state of the segmented_changelog crate in Mononoke is that it does
not assign NON_MASTER commits. It doesn't need to right now; we want to
establish a baseline with the MASTER group. However, it was possible for the
on-demand update dag to assign commits that were not in the master group to
the MASTER group, because no explicit checks were performed. That could lead to
surprising behavior.

At a high level, the update logic that we want is: 1. assign the master
bookmark changeset to the MASTER group, 2. assign other commits that we need to
operate on to the NON_MASTER group. For now we only need 1; we will implement
2 later.
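A minimal sketch of the consistency check, under hypothetical assumptions: changesets are strings, `parents` maps each changeset to its parents, and "in the master group" means "an ancestor of the master bookmark changeset (inclusive)". Since NON_MASTER assignment is not implemented yet, anything outside MASTER is rejected rather than silently placed in MASTER.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
pub enum Group {
    Master,
    // Not assigned yet; kept to mirror the two groups described above.
    #[allow(dead_code)]
    NonMaster,
}

/// Ancestors of `head`, including `head`, found by walking the parent map.
fn ancestors<'a>(head: &'a str, parents: &HashMap<&'a str, Vec<&'a str>>) -> HashSet<&'a str> {
    let mut seen = HashSet::new();
    let mut stack = vec![head];
    while let Some(cs) = stack.pop() {
        if seen.insert(cs) {
            if let Some(ps) = parents.get(cs) {
                stack.extend(ps.iter().copied());
            }
        }
    }
    seen
}

/// Hypothetical check: only ancestors of the master bookmark may enter MASTER.
pub fn group_for<'a>(
    cs: &str,
    master_head: &'a str,
    parents: &HashMap<&'a str, Vec<&'a str>>,
) -> Result<Group, String> {
    if ancestors(master_head, parents).contains(cs) {
        Ok(Group::Master)
    } else {
        Err(format!(
            "{} is not an ancestor of the master bookmark; NON_MASTER assignment is not implemented",
            cs
        ))
    }
}
```

A production version would not recompute all ancestors per query; the point here is only the explicit check before a commit is placed in MASTER.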

Reviewed By: krallin

Differential Revision: D27070083

fbshipit-source-id: 922bcde3641ca25512000cd1a912c5b399bdff4b
2021-03-17 20:12:27 -07:00
Stefan Filip
9fdb3faff6 segmented_changelog: add builder with SegmentedChangelogConfig
Summary:
Pull in SegmentedChangelogConfig and build a SegmentedChangelog instance.
This ties the config to the object that we build on the servers.

The instantiation of the SQL connections is separated from building any kind
of segmented changelog structure. The primary reason is that there may be
multiple objects that get instantiated, and for that it is useful to be able to
pass the connections object around.

Reviewed By: krallin

Differential Revision: D26708175

fbshipit-source-id: 90bc22eb9046703556381399442117d13b832392
2021-03-17 20:12:27 -07:00
Stefan Filip
4217421d20 segmented_changelog: remove unused dependency
Summary:
This was lost somehow. I probably incorrectly resolved some conflict when
rebasing a previous change.

Reviewed By: quark-zju

Differential Revision: D27146022

fbshipit-source-id: 13bb0bb3df565689532b2ab5299cd757f278f26e
2021-03-17 19:49:58 -07:00
Thomas Orozco
840a572036 Daily common/rust/cargo_from_buck/bin/autocargo
Reviewed By: HarveyHunt

Differential Revision: D27124565

fbshipit-source-id: d2e4ca99324ee2037f05741c55a3d6ee8ad98211
2021-03-17 10:48:37 -07:00
Stefan Filip
c81edb9f71 segmented_changelog: fix idmap assignment
Summary:
Finding a parent again after it was previously found signals that we want to
assign that changeset sooner, if it was not already assigned.
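A minimal sketch of why re-finding a parent matters, using a hypothetical iterative depth-first assignment over string changesets (not the Mononoke `assign_ids`). A changeset is assigned only after everything pushed above it on the stack has been handled. When a parent is found again, it is pushed again so it gets handled, and assigned, before the child that led back to it. Skipping it because it was "already seen" would let that child take a smaller id than its parent.

```rust
use std::collections::HashMap;

/// Hypothetical id assignment from `head`: parents end up with smaller ids
/// than their children. Ids start at `first_id` and increase by one.
pub fn assign_ids(
    head: &str,
    parents: &HashMap<&str, Vec<&str>>,
    first_id: u64,
) -> HashMap<String, u64> {
    let mut ids: HashMap<String, u64> = HashMap::new();
    let mut next = first_id;
    // (changeset, parents_already_pushed)
    let mut stack: Vec<(&str, bool)> = vec![(head, false)];
    while let Some((cs, expanded)) = stack.pop() {
        if ids.contains_key(cs) {
            continue; // already assigned through another path
        }
        if expanded {
            // All parents pushed above this entry have been handled.
            ids.insert(cs.to_string(), next);
            next += 1;
            continue;
        }
        stack.push((cs, true));
        for &parent in parents.get(cs).map(|v| v.as_slice()).unwrap_or(&[]) {
            if !ids.contains_key(parent) {
                // Push even if the parent is already further down the stack:
                // finding it again means it must be assigned sooner.
                stack.push((parent, false));
            }
        }
    }
    ids
}
```

For example, if `d` has parents `a` and `b`, and `b` has parent `a`, then `a` is found twice. Re-pushing it when it is found via `b` ensures `a` gets its id before `b`.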

Reviewed By: quark-zju

Differential Revision: D27092205

fbshipit-source-id: ed39a91460ff2f91a458236cdab8018341ec618b
2021-03-16 20:38:04 -07:00
Stefan Filip
f9599c714d segmented_changelog: add logging to seeder process commit loading
Summary:
Seeding fbsource, I found that loading the commits from SQL took longer than I
was expecting: around 90 minutes, where I was expecting around 10 minutes.
I added more logging to validate that commits were actively being loaded
rather than something being stuck.

Reviewed By: krallin

Differential Revision: D27084739

fbshipit-source-id: 07972707425ecccd4458eec849c63d6d9ccd923d
2021-03-16 20:38:04 -07:00
Stefan Filip
62cca2ec9b segmented_changelog: add scuba logs for loads
Summary: Logs. Minimal observability for loading Segmented Changelog.

Reviewed By: ahornby

Differential Revision: D27048940

fbshipit-source-id: 3005e7c71a32572743d06d5d371a009a030f8e4c
2021-03-16 09:30:55 -07:00
Stefan Filip
deae65979e segmented_changelog: update OverlayIdMap with assigned vertex ranges
Summary:
Pretty big bug here with the "Overlay" when we are updating both stores. It
turns out that we don't really want a standard Overlay. We want the loaded
iddag to operate with the Ids in the shared IdMap, and we want whatever is
updated to use the in-process IdMap. The problem with the overlay is that the
shared IdMap may have more data than the in-process IdMap; after all, the
shared IdMap is always updated by the tailer. This means that when we query the
overlay, we may get data from the shared store even if this is the first time
the current process is trying to update a changeset.

The solution here is to specify which vertexes are fetched from either store.
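A minimal sketch of routing lookups by vertex range, using a hypothetical simplified `OverlayIdMap` with `HashMap` stores. The assumption is that vertexes below `cutoff` came with the loaded iddag and resolve through the shared IdMap, while vertexes at or above `cutoff` are assigned by this process and resolve only through the in-process IdMap.

```rust
use std::collections::HashMap;

/// Hypothetical simplified overlay: each vertex range has exactly one
/// store that is allowed to answer for it.
pub struct OverlayIdMap {
    pub mem: HashMap<u64, String>,
    pub shared: HashMap<u64, String>,
    pub cutoff: u64,
}

impl OverlayIdMap {
    pub fn find_changeset(&self, vertex: u64) -> Option<&str> {
        if vertex < self.cutoff {
            // Range covered by the loaded iddag: the shared IdMap is authoritative.
            self.shared.get(&vertex).map(String::as_str)
        } else {
            // Never fall back to the shared IdMap here: the tailer may already
            // have written entries there that this process has not assigned.
            self.mem.get(&vertex).map(String::as_str)
        }
    }
}
```

A standard overlay would answer vertex 6 below with the tailer's entry from the shared store; routing by range reports it as not yet assigned in this process.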

Reviewed By: quark-zju

Differential Revision: D27028367

fbshipit-source-id: e09f003d94100778eabd990724579c84b0f86541
2021-03-16 09:30:55 -07:00
Stefan Filip
c18b35a400 segmented_changelog: update PeriodicReload to work with any SegmentedChangelog
Summary:
Using the generic load function from SegmentedChangelogManager, which builds
the SegmentedChangelog that matches the specified configuration.

I wanted to have another look at ArcSwap to understand whether
`Arc<ArcSwap<Arc<dyn SegmentedChangelog>>>` was the type that it recommends
for our situation, and indeed it is.

Reviewed By: quark-zju

Differential Revision: D27028369

fbshipit-source-id: 7c601d0c664f2be0eef782700ef4dcefa9b5822d
2021-03-16 09:30:55 -07:00
Stefan Filip
e097ff6951 segmented_changelog: clarify logs
Summary: Words.

Reviewed By: quark-zju

Differential Revision: D27028370

fbshipit-source-id: 4e4be1048837f09e18b1b65762b6f23c28cc4c6a
2021-03-16 09:30:54 -07:00
Stefan Filip
41049b62ca segmented_changelog: add scuba logs for updates
Summary:
Scuba stats provide a lot of context around the workings of the service.
The most interesting operation for segmented changelog is the update.

Reviewed By: krallin

Differential Revision: D26770846

fbshipit-source-id: a5250603f74930ef4f86b4167d43bdd1790b3fce
2021-03-12 11:29:40 -08:00
Stefan Filip
3d50bcc878 segmented_changelog: add stats for inprocess update
Summary:
STATS!!!
Count, success, failure, duration. Per instance, per repo.

I wavered on what to name the stats. I wondered whether it was worth being more
specific than "mononoke.segmented_changelog.update" with something like
"inprocess". In my view, the in-process stats are more important than the
tailer stats because the tailer is simpler and thus easier to understand. So I
add extra qualifications to the tailer stats and keep the name short for the
in-process stats.

Reviewed By: krallin

Differential Revision: D26770845

fbshipit-source-id: 8e02ec3e6b84621327e665c2099abd7a034e43a5
2021-03-12 11:29:39 -08:00
Stefan Filip
0bd89797a1 segmented_changelog: add repo_id to OnDemandUpdateSegmentedChangelog
Summary: Currently unused. Will add stats that reference it.

Reviewed By: krallin

Differential Revision: D26770847

fbshipit-source-id: d5694cd221c90ba3adaf89345ffeb06fa46b9e7b
2021-03-12 11:29:39 -08:00
Stefan Filip
72195c55e5 segmented_changelog: update builder to hand clones of changeset_fetcher
Summary:
Fixes failing tests:
test-edenapi-server-segmented-changelog-setup.t

Reviewed By: krallin

Differential Revision: D26980053

fbshipit-source-id: ee5d1a928f91bfd1be91918cf7c27c0ae9ad5381
2021-03-11 08:19:39 -08:00
Stefan Filip
1d620372f5 segmented_changelog: update Builder with Bookmarks
Summary:
I am not sure why the integration tests didn't fail for this one. I know that
a similar issue was caught last week; probably one of those cases where not
all tests ran. Anyway, SegmentedChangelogManager requires bookmarks now.
It does not use them with the way the SegmentedChangelog is currently built;
using the bookmarks needs another code change.

I noticed this because the Tailer was failing. It would crash Mononoke too.
Long story on why the tailer uses this codepath. Needless to say, we don't want
Mononoke crashing, so FIX :)

Reviewed By: quark-zju

Differential Revision: D26962608

fbshipit-source-id: 6efafc67f0816792b841af2cc456edc0cc579460
2021-03-10 15:30:08 -08:00
Stefan Filip
0276503786 segmented_changelog: rename tailer stats
Summary:
Using a more specific name. Looking to differentiate between tailer update
and in process dag update.

Reviewed By: krallin

Differential Revision: D26770844

fbshipit-source-id: b35e6e705a0bfac6289c70a8e8e8cb9ba38a8d99
2021-03-10 12:15:56 -08:00
Stefan Filip
a90cdda01c segmented_changelog: remove unused stat entry
Summary: Unused.

Reviewed By: krallin

Differential Revision: D26770848

fbshipit-source-id: 7e8620f0b405d6af0d9acaded6d89b541297807a
2021-03-10 12:15:55 -08:00
Stefan Filip
9da9993d6d segmented_changelog: update Manager to build OnDemandUpdate
Summary:
Our production setup has an OnDemandUpdateSegmentedChangelog that gets updated
in various ways. With a setup where the dag is reloaded completely from saves,
we need a factory for the OnDemandUpdateSegmentedChangelog.
SegmentedChangelogManager takes the role of being the factory for our
production Dags.

At some point we will remove the SegmentedChangelog implementation for Manager.

Reviewed By: krallin

Differential Revision: D26708173

fbshipit-source-id: b3d8ea612b317af374f2c0ce6d7c512e3b09b2d2
2021-03-10 12:15:55 -08:00
Stefan Filip
6b7930ef45 segmented_changelog: remove idmap_version for Manager::load() result
Summary: IdMapVersion is no longer used.

Reviewed By: krallin

Differential Revision: D26921452

fbshipit-source-id: 81555e37d2aa0cf915d564e1ea76fa2c3ff3f131
2021-03-10 12:15:55 -08:00