Summary: Upstream crate has landed my PR for zstd 1.4.9 support and made a release, so we can remove this patch now.
Reviewed By: ikostia
Differential Revision: D28221163
fbshipit-source-id: b95a6bee4f0c8d11f495dc17b2737c9ac9142b36
Summary:
Vertex is old. It no longer makes sense with the current structure. The main
issue is that the dag crate now has VertexName, which may confuse readers at
first glance.
When Vertex was added, DagId would have been confusing because we had structs
that were named Dag that did not use DagId directly. Those structures are now
renamed and DagId is consistently used for dag crate structures.
The IdMap database would still use the vertex name until someone runs a
migration to rename the column.
I am not 100% sure that this is needed, but it's a change that's been on my mind.
Reviewed By: quark-zju
Differential Revision: D28110184
fbshipit-source-id: b996a7545a90acc25e2bb5326f2731b95c8740b4
Summary:
We used to carry patches for Tokio 0.2 to add support for disabling Tokio coop
(which was necessary to make Mononoke work with it), but this was upstreamed
in Tokio 1.x (as a different implementation), so that's no longer needed. Nobody
else besides Mononoke was using this.
For Hyper we used to carry a patch with a bugfix. This was also fixed in Tokio
1.x-compatible versions of Hyper. There are still users of hyper-02 in fbcode.
However, the patched code path only matters for servers, and only when accepting
websocket connections; those users are just using Hyper as an HTTP client.
Reviewed By: farnz
Differential Revision: D28091331
fbshipit-source-id: de13b2452b654be6f3fa829404385e80a85c4420
Summary:
This used to be used by Mononoke, but we're now on Tokio 1.x and on
corresponding versions of Gotham so it's not needed anymore.
Reviewed By: farnz
Differential Revision: D28091091
fbshipit-source-id: a58bcb4ba52f3f5d2eeb77b68ee4055d80fbfce2
Summary:
The changesets object is only valid to access the changesets of a single repo
(other repos may have different metadata database config), so it is pointless
for all methods to require the caller to provide the correct one. Instead,
make the changesets object remember the repo id.
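A minimal sketch of the shape of this change (type and method names are illustrative, not the actual Mononoke API):

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct RepositoryId(i32);

struct Changesets {
    // Remembered at construction time, so methods no longer take a repo id.
    repo_id: RepositoryId,
    store: HashMap<(RepositoryId, &'static str), u64>,
}

impl Changesets {
    fn new(repo_id: RepositoryId) -> Self {
        Changesets { repo_id, store: HashMap::new() }
    }

    // Callers can no longer pass the wrong repo id; the struct supplies its own.
    fn add(&mut self, cs: &'static str, gen: u64) {
        self.store.insert((self.repo_id, cs), gen);
    }

    fn get(&self, cs: &'static str) -> Option<u64> {
        self.store.get(&(self.repo_id, cs)).copied()
    }
}

fn main() {
    let mut changesets = Changesets::new(RepositoryId(0));
    changesets.add("abc123", 1);
    assert_eq!(changesets.get("abc123"), Some(1));
}
```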
Reviewed By: krallin
Differential Revision: D27430611
fbshipit-source-id: bf2c398af2e5eb77c1c7c55a89752753020939ab
Summary: This makes CloneData possible to represent an empty repo.
Reviewed By: sfilipco
Differential Revision: D27926246
fbshipit-source-id: 0bcead224ef5b89c66d07a34d8217edaef62177f
Summary:
NOTE: there is one final prerequisite here, which is that we should default all Mononoke binaries to `--use-mysql-client` because the other SQL client implementations will break once this lands. That said, this is probably the right time to start reviewing.
There's a lot going on here, but Tokio updates being what they are, it has to happen as just one diff (though I did try to minimize churn by modernizing a bunch of stuff in earlier diffs).
Here's a detailed list of what is going on:
- I had to add a number of `cargo_toml_dir` entries for binaries in `eden/mononoke/TARGETS`, because we have to use 2 versions of Bytes concurrently at this time, and the two cannot co-exist in the same Cargo workspace.
- Lots of little Tokio changes:
- Stream abstractions moving to `tokio-stream`
- `tokio::time::delay_for` became `tokio::time::sleep`
- `tokio::sync::watch::Sender::broadcast` became `tokio::sync::watch::Sender::send`
- `tokio::sync::Semaphore::acquire` returns a `Result` now.
- `tokio::runtime::Runtime::block_on` no longer takes a `&mut self` (just a `&self`).
- `Notify` grew a few more methods with different semantics. We only use this in tests, I used what seemed logical given the use case.
- Runtime builders have changed quite a bit:
- My `no_coop` patch is gone in Tokio 1.x, but it has a new `tokio::task::unconstrained` wrapper (also from me), which I included on `MononokeApi::new`.
- Tokio now detects your logical CPUs, not physical CPUs, so we no longer need to use `num_cpus::get()` to figure it out.
- Tokio 1.x now uses Bytes 1.x:
- At the edges (i.e. streams returned to Hyper or emitted by RepoClient), we need to return Bytes 1.x. However, internally we still use Bytes 0.5 in some places (notably: Filestore).
- In LFS, this means we make a copy. We used to do that a while ago anyway (in the other direction) and it was never a meaningful CPU cost, so I think this is fine.
- In Mononoke Server it doesn't really matter, because that still generates ... Bytes 0.1 anyway, so there was a copy before (from 0.1 to 0.5) and now it's from 0.1 to 1.x.
- In the very few places where we read stuff using Tokio from the outside world (historical import tools for LFS), we copy.
- tokio-tls changed a lot, they removed all the convenience methods around connecting. This resulted in updates to:
- How we listen in Mononoke Server & LFS
- How we connect in hgcli.
- Note: all this stuff has test coverage.
- The child process API changed a little bit. We used to have a ChildWrapper around the hg sync job to make a Tokio 0.2.x child look more like a Tokio 1.x Child, so now we can just remove this.
- Hyper changed their Websocket upgrade mechanism (you now need the whole `Request` to upgrade, whereas before you needed just the `Body`), so I changed up our code a little bit in Mononoke's HTTP acceptor to defer splitting up the `Request` into parts until after we know whether we plan to upgrade it.
- I removed the MySQL tests that didn't use mysql client, because we're leaving that behind and don't intend to support it on Tokio 1.x.
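As a side note on the CPU detection point: the logical CPU count that Tokio 1.x sizes its worker pool from can also be observed from the standard library, which is why the manual `num_cpus::get()` call is no longer needed (a sketch for illustration only; Tokio's own detection is internal):

```rust
use std::thread;

fn main() {
    // available_parallelism() reports logical CPUs (an estimate), which is
    // what Tokio 1.x uses by default to size its worker threads.
    let logical = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    assert!(logical >= 1);
    println!("logical CPUs: {}", logical);
}
```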
Reviewed By: mitrandir77
Differential Revision: D26669620
fbshipit-source-id: acb6aff92e7f70a7a43f32cf758f252f330e60c9
Summary:
Update the zstd crates.
This also patches async-compression crate to point at my fork until upstream PR https://github.com/Nemo157/async-compression/pull/117 to update to zstd 1.4.9 can land.
Reviewed By: jsgf, dtolnay
Differential Revision: D27942174
fbshipit-source-id: 26e604d71417e6910a02ec27142c3a16ea516c2b
Summary:
This error occurs when the client sends us a set of heads that we cannot match.
For example, when the client forces a commit into the master group; this was
possible with revlogs but should be considered a bug with Segmented Changelog.
This can also happen when master moves backwards: clients and the server end up
with multiple heads, and then the server reseeds.
Clients that get this error should reclone.
Reviewed By: quark-zju
Differential Revision: D27786602
fbshipit-source-id: 9854ccee929ae0a845236ebd83ed68158c93fc7a
Summary:
This scenario appears when master moves backwards.
Since the IdDag can handle multiple master heads, the server can piggy-back on that
functionality and support multiple master heads when translating location to hash.
Reviewed By: quark-zju
Differential Revision: D27521780
fbshipit-source-id: c27541890d4fda13648857f010c11a25bf96ef67
Summary:
Fixes MySQL syntax errors we've seen in some cases. No reason to call to the
database if we have no items to query for. It seems that empty queries can come
from the caching layer under certain configurations.
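A minimal sketch of the guard, with a hypothetical query builder (not the actual Mononoke query layer):

```rust
// If the id list is empty, return early instead of building a query like
// "SELECT ... WHERE id IN ()", which is a MySQL syntax error.
fn select_by_ids(ids: &[u64]) -> Option<String> {
    if ids.is_empty() {
        return None; // no round-trip to the database
    }
    let placeholders: Vec<String> = ids.iter().map(|id| id.to_string()).collect();
    Some(format!(
        "SELECT * FROM changesets WHERE id IN ({})",
        placeholders.join(", ")
    ))
}

fn main() {
    assert_eq!(select_by_ids(&[]), None);
    assert_eq!(
        select_by_ids(&[1, 2]).unwrap(),
        "SELECT * FROM changesets WHERE id IN (1, 2)"
    );
}
```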
Reviewed By: krallin
Differential Revision: D27624798
fbshipit-source-id: 2febeff127f2fbdc739368ff1d1065c8f64723f8
Summary:
Let's give some time for bookmarks to warm up before we start checking the
location of the master bookmark.
Reviewed By: quark-zju
Differential Revision: D27472520
fbshipit-source-id: f20e6f04acadaf8b7caf6aa38a6ab9d244c6f8ea
Summary:
We want to protect against races where we may have multiple writers with
different IdMap versions. We want IdMapVersions to be monotonically increasing.
The common race scenario is between tailers and the seeder. We may have a tailer
that starts to update while the seeder writes a version with a new idmap. The
tailer may then try to write a version that has an old version of the idmap. We
want the tailer to fail in this case.
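A minimal in-memory sketch of the monotonicity check (the real enforcement happens in the SQL store; all names are illustrative):

```rust
struct IdMapVersionStore {
    current: u64,
}

impl IdMapVersionStore {
    // Writers may only keep or advance the version, never go backwards.
    fn try_write(&mut self, version: u64) -> Result<(), String> {
        if version < self.current {
            // A tailer racing a seeder would hit this branch and fail.
            return Err(format!(
                "stale IdMapVersion {} (current is {})",
                version, self.current
            ));
        }
        self.current = version;
        Ok(())
    }
}

fn main() {
    let mut store = IdMapVersionStore { current: 1 };
    assert!(store.try_write(2).is_ok()); // seeder writes a new idmap version
    assert!(store.try_write(1).is_err()); // stale tailer write is rejected
}
```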
Reviewed By: krallin
Differential Revision: D27210035
fbshipit-source-id: 168c7c6f25622587071337df1172a12b799e0285
Summary:
In production, all repos are instantiated at roughly the same time, so all
reload processes start at roughly the same time. Reload makes a bunch of
requests and could potentially cause load. Jitter spreads out the load of the
reload. Avoiding the load spike will make overall server behavior more
predictable.
Reviewed By: krallin
Differential Revision: D27280117
fbshipit-source-id: 0727af2e7f231a5b6c948424022788a8e7071f82
Summary:
We would like to distribute the load of the update process when many
repositories have Segmented Changelog enabled. Without the jitter, all enabled
repositories start their update at roughly the same time. The jitter smooths
out the load and reduces variance in the update process.
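A std-only sketch of the idea, with the clock standing in for a real RNG (function names and the jitter bound are illustrative, not the actual implementation):

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

// Add a pseudo-random offset to the base update period so that repositories
// enabled at the same instant don't all start their update together.
fn jittered_period(base: Duration, max_jitter: Duration) -> Duration {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default()
        .subsec_nanos() as u64;
    let jitter_ms = if max_jitter.as_millis() == 0 {
        0
    } else {
        nanos % max_jitter.as_millis() as u64
    };
    base + Duration::from_millis(jitter_ms)
}

fn main() {
    let base = Duration::from_secs(60);
    let period = jittered_period(base, Duration::from_secs(30));
    // The jitter only ever delays, never shortens, the period.
    assert!(period >= base && period < base + Duration::from_secs(30));
}
```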
Reviewed By: krallin
Differential Revision: D27280118
fbshipit-source-id: 41ad83b09700da1ef70c09dd5d284977e53a95a2
Summary:
`build_from_scratch` is only called from `run_with_idmap_version`, so we can
inline the code so that the seed process reads better.
This function used to be used as a shortcut towards getting a built dag but now
we prefer to fetch the dags from the store that the seeder writes to.
Reviewed By: krallin
Differential Revision: D27210036
fbshipit-source-id: 0b31ff1126a0f4904578da333cf6d34d69b2782c
Summary:
Removing the last callsite for SegmentedChangelogBuilder means that
the whole struct goes away.
Reviewed By: krallin
Differential Revision: D27208339
fbshipit-source-id: 006aa91ed9656e4c33b082cbed82f9a497b0d267
Summary:
We are removing SegmentedChangelogBuilder.
Remove the last uses of Builder in the tests module.
Reviewed By: krallin
Differential Revision: D27208341
fbshipit-source-id: 00f1aaa2376ee5d68dbf7c1256b312cfe0b96d86
Summary:
Any function that returns a SegmentedChangelog is a valid argument for
reloading.
Reviewed By: krallin
Differential Revision: D27202520
fbshipit-source-id: fe903c6be4646c8ec98058d1a025829268c36619
Summary:
PeriodReloading is not fundamentally tied to the Manager. A future change will
update the load function.
Reviewed By: krallin
Differential Revision: D27202524
fbshipit-source-id: a0e4b08cb8605d071d5f30be8c3054f75321aa9c
Summary:
Let's look at these tests from a higher perspective. Right now the tests use
internal APIs because not all components were ready when they were added. We
now have components for all parts of the lifecycle, so we can set up tests in
the same form that we would set up the production workflow.
This simplifies the API structure of the components since they can be catered
to one workflow.
Reviewed By: krallin
Differential Revision: D27202530
fbshipit-source-id: 6ec10a0b1ae49da13cfbe803e120a4e754b35fc7
Summary:
The broad goal is to get rid of SegmentedChangelogBuilder.
We will have a new constructor for Seeder, one that takes non-segmented_changelog
dependencies as input.
Reviewed By: krallin
Differential Revision: D27202523
fbshipit-source-id: d420507502925d4440d5c3058efef0a4d2dbe895
Summary:
For tests that don't care about the bookmarks specifically, we want to use the
default bookmark name that we defined in BOOKMARK_NAME.
FWIW, it might even make sense for this bookmark to be set by blobrepo... or at
least by the fixtures. They set a bookmark, and it makes sense for us to have a
reference to the bookmark that they set. Something to think about.
Reviewed By: krallin
Differential Revision: D27202522
fbshipit-source-id: 7615e4978dded491dd04ae44ce0b85134a252feb
Summary:
This gets rid of the odd builder for the Seeder.
We can get into design discussions with this one: what belongs in the struct
and what belongs in function parameters? For real structures that provide some
behavior, I prefer to put dependencies in owned data. Things that are part of
the request go into function parameters. In Mononoke, RepositoryId is the
common exception.
Anyway, IdMapVersion is part of the request for seeding. It is useful to have
that as a parameter when starting the seeder.
Reviewed By: krallin
Differential Revision: D27202528
fbshipit-source-id: a67b33493b20d2813fd0a144b9bb7f4510635ae8
Summary:
Convert `BlobRepo` to a `facet::container`. This will allow it to be built
from an appropriate facet factory.
This only changes the definition of the structure: we still use
`blobrepo_factory` to construct it. The main difference is in the types
of the attributes, which change from `Arc<dyn Trait>` to
`Arc<dyn Trait + Send + Sync + 'static>`, specified by the `ArcTrait` alias
generated by the `#[facet::facet]` macro.
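A minimal sketch of the type change, with a hypothetical `Bookmarks` facet (in the real code the `#[facet::facet]` macro generates the alias; everything here is illustrative):

```rust
use std::sync::Arc;

trait Bookmarks {
    fn name(&self) -> &'static str;
}

// The alias the macro would generate: note the extra Send + Sync + 'static
// bounds compared to a plain `Arc<dyn Bookmarks>`.
type ArcBookmarks = Arc<dyn Bookmarks + Send + Sync + 'static>;

struct SqlBookmarks;
impl Bookmarks for SqlBookmarks {
    fn name(&self) -> &'static str {
        "sql"
    }
}

struct BlobRepo {
    bookmarks: ArcBookmarks, // facet attribute with the new bounds
}

fn main() {
    let repo = BlobRepo { bookmarks: Arc::new(SqlBookmarks) };
    assert_eq!(repo.bookmarks.name(), "sql");
    // The Send + Sync bounds allow sharing the facet across threads.
    let shared = Arc::clone(&repo.bookmarks);
    std::thread::spawn(move || assert_eq!(shared.name(), "sql"))
        .join()
        .unwrap();
}
```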
Reviewed By: StanislavGlebik
Differential Revision: D27169437
fbshipit-source-id: 3496b6ee2f0d1e72a36c9e9eb9bd3d0bb7beba8b
Summary:
The broad goal here is to remove SegmentedChangelogBuilder.
Looking at Seeder dependencies here.
Reviewed By: StanislavGlebik
Differential Revision: D27202527
fbshipit-source-id: 164da1c4d202cc0f069f67963b71920dca9bcba5
Summary:
The original purpose of this test was to describe how the SegmentedChangelogBuilder
was to be used. We are removing SegmentedChangelogBuilder, so this test goes away now.
Reviewed By: ahornby
Differential Revision: D27202521
fbshipit-source-id: e39a47411c61e8812d4081f6ee02323732369c1b
Summary:
This is part of removing SegmentedChangelogBuilder.
The Tailer constructor directly specifies the Mononoke dependencies that it
needs to be provided in order to function.
Reviewed By: ahornby
Differential Revision: D27163312
fbshipit-source-id: 961066e2aa4e88c330841f7362b3ba17d0030638
Summary:
The problem appears when the shared idmap moves ahead of the memory idmap
before we ever write to the memory idmap. In that case we would incorrectly
fetch the last entry from the shared idmap. When that last shared entry is
larger than the cutoff, we would try to assign ids starting from the shared
entry. When updating the IdDag, it would assume that it has to insert all ids
above the cutoff, but it would fail to resolve the ids immediately above the
cutoff. For example: MemIdMap is empty, the cutoff is 5, and the shared idmap's
last entry is 7. We assign 8-10, then the IdDag tries to search for 5-10 and
fails to resolve 5.
This function was not updated after adding cutoff to OverlayIdMap in an earlier
diff.
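A minimal sketch of the corrected lookup, using `BTreeMap`s as stand-ins for the real idmap layers (all names are illustrative):

```rust
use std::collections::BTreeMap;

// Ids at or above `cutoff` must be served from the in-memory layer; only ids
// below the cutoff may be served from the shared layer. Returning the shared
// layer's last entry when it is >= cutoff is exactly the bug described above.
struct OverlayIdMap {
    mem: BTreeMap<u64, String>,    // ids >= cutoff live here
    shared: BTreeMap<u64, String>, // ids < cutoff live here
    cutoff: u64,
}

impl OverlayIdMap {
    fn last_id(&self) -> Option<u64> {
        if let Some((&id, _)) = self.mem.iter().next_back() {
            return Some(id);
        }
        // Fall back to the shared layer, but never look past the cutoff.
        self.shared
            .range(..self.cutoff)
            .next_back()
            .map(|(&id, _)| id)
    }
}

fn main() {
    let mut shared = BTreeMap::new();
    for id in 0..=7 {
        shared.insert(id, format!("c{}", id));
    }
    let map = OverlayIdMap { mem: BTreeMap::new(), shared, cutoff: 5 };
    // The buggy lookup returned 7 here, so assignment started at 8 and ids
    // 5-7 could never be resolved. The capped lookup returns 4.
    assert_eq!(map.last_id(), Some(4));
}
```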
Reviewed By: quark-zju
Differential Revision: D27248367
fbshipit-source-id: 97fc1efe8cdfb446c4571196dcef7c2db9a43330
Summary:
Resolve a circular dependency whereby `BlobRepo` needs to depend on
`Arc<dyn SegmentedChangelog>`, but the segmented changelog implementation
depends on `BlobRepo`, by moving the trait definition to its own crate.
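A sketch of the dependency break, with modules standing in for crates (the trait signature and all names are illustrative):

```rust
// The trait lives in its own "crate", so both the repo and the implementation
// can depend on it without depending on each other.
mod segmented_changelog_types {
    pub trait SegmentedChangelog {
        fn location_to_hash(&self, id: u64) -> String;
    }
}

mod blobrepo {
    use super::segmented_changelog_types::SegmentedChangelog;
    use std::sync::Arc;

    // BlobRepo depends only on the trait crate.
    pub struct BlobRepo {
        pub segmented_changelog: Arc<dyn SegmentedChangelog + Send + Sync>,
    }
}

mod segmented_changelog_impl {
    use super::segmented_changelog_types::SegmentedChangelog;

    // The implementation may depend on blobrepo internally; only the trait
    // crate is shared, so no cycle forms.
    pub struct OnDemandUpdate;

    impl SegmentedChangelog for OnDemandUpdate {
        fn location_to_hash(&self, id: u64) -> String {
            format!("hash-{}", id)
        }
    }
}

fn main() {
    use std::sync::Arc;
    let repo = blobrepo::BlobRepo {
        segmented_changelog: Arc::new(segmented_changelog_impl::OnDemandUpdate),
    };
    assert_eq!(repo.segmented_changelog.location_to_hash(1), "hash-1");
}
```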
Reviewed By: sfilipco
Differential Revision: D27169423
fbshipit-source-id: 5bf7c632607dc8baba40d7a9d65e96e265d58496
Summary:
This test verifies that the issue we had previously with assign_ids does not
creep up again.
Reviewed By: quark-zju
Differential Revision: D27105741
fbshipit-source-id: 49b385b2026b599c92c406331a2299931a2eae46
Summary: Update the logs so that it's more clear what is going on.
Reviewed By: quark-zju
Differential Revision: D27145099
fbshipit-source-id: 11ec7b467157d07dd41893dc82f251a1c555365f
Summary:
We are also going to update the IdMapVersionStore before we start writing the
IdMap. This means that if we crash while writing the IdMap, future runs won't
try to use the same IdMapVersion that we used previously.
Reviewed By: quark-zju
Differential Revision: D27145097
fbshipit-source-id: b911e2dca32d0fe8ae0aead3de75373dd2f936c4
Summary:
We are going to build the iddag before starting to write the idmap.
This means if the iddag fails to build for whatever reason we would not have
written a potentially useless idmap.
Reviewed By: quark-zju
Differential Revision: D27145098
fbshipit-source-id: c9045abea2a1f5a8b96c524d546776fdc693b56a
Summary:
`update::build` is only used by the Seeder. The steps in this function are not
isolated enough from the seeder to justify a separate function. The seeder has
the role of building its own type of StartState. It is also the only process
that deals with the IdMapVersionStore. The seeder is particular enough that it
makes sense to inline its build order.
Reviewed By: quark-zju
Differential Revision: D27099265
fbshipit-source-id: f86b8d7d4637a5f2582e70fc58b60c2041b93548
Summary:
The most important invariant for IdDag is that parent nodes have ids that are
smaller than child nodes. We had a couple of issues that violated this
invariant, so we are adding these extra checks. They will help us diagnose
issues faster and protect production data against faulty updates.
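A minimal sketch of the invariant check (the real IdDag verification is more involved; the edge representation here is illustrative):

```rust
// Every parent id must be strictly smaller than its child's id.
fn check_parents_smaller(edges: &[(u64, Vec<u64>)]) -> Result<(), String> {
    for (child, parents) in edges {
        for parent in parents {
            if parent >= child {
                return Err(format!(
                    "invariant violated: parent {} >= child {}",
                    parent, child
                ));
            }
        }
    }
    Ok(())
}

fn main() {
    // (child id, parent ids)
    let good = vec![(1, vec![0]), (2, vec![0, 1])];
    let bad = vec![(1, vec![0]), (2, vec![3])];
    assert!(check_parents_smaller(&good).is_ok());
    assert!(check_parents_smaller(&bad).is_err());
}
```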
Reviewed By: quark-zju
Differential Revision: D27092204
fbshipit-source-id: 1f052b290a494e267fac2f551ba51582baa67973
Summary: Shadowing can end up being more confusing.
Reviewed By: quark-zju
Differential Revision: D27143481
fbshipit-source-id: 0a1913d8952fe913cc7596b9aea84df2d62cc3fe
Summary:
Check that the head has a dag id assignment after finishing the process. This
was done at a later point, but it is better to group it with the assignment
process so that we have a clear source for the error.
Reviewed By: quark-zju
Differential Revision: D27143482
fbshipit-source-id: 2a94cee70142967b4f8d57df43dfcc339a0b4f2e
Summary:
Segmented Changelog divides commits into two groups: MASTER and NON_MASTER.
The MASTER group is assumed to be big, and special attention is paid to it.
Algorithms optimize for efficiency on MASTER.
The current state of the segmented_changelog crate in Mononoke is that it does
not assign NON_MASTER commits. It doesn't need to right now. We want to
establish a baseline with the MASTER group. It was, however, possible for the
on-demand update dag to assign commits that were not in the master group to the
master group, because no explicit checks were performed. That could lead to
surprising behavior.
At a high level, the update logic that we want is: 1. assign the master
bookmark changeset to the MASTER group, 2. assign other commits that we need to
operate on to the NON_MASTER group. For now we need 1, we will implement 2
later.
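A minimal sketch of rule 1, with a hypothetical ancestor set standing in for "reachable from the master bookmark" (not the actual update logic):

```rust
#[derive(Debug, PartialEq)]
enum Group {
    Master,
    NonMaster,
}

// Only changesets reachable from the master bookmark get MASTER ids; anything
// else would go to NON_MASTER (step 2, not implemented at this point).
fn assign_group(cs: &str, master_ancestors: &[&str]) -> Group {
    if master_ancestors.contains(&cs) {
        Group::Master
    } else {
        // Previously an unchecked path could land such commits in MASTER.
        Group::NonMaster
    }
}

fn main() {
    let ancestors = ["a", "b", "master_tip"];
    assert_eq!(assign_group("master_tip", &ancestors), Group::Master);
    assert_eq!(assign_group("feature", &ancestors), Group::NonMaster);
}
```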
Reviewed By: krallin
Differential Revision: D27070083
fbshipit-source-id: 922bcde3641ca25512000cd1a912c5b399bdff4b
Summary:
Pull in SegmentedChangelogConfig and build a SegmentedChangelog instance.
This ties the config with the object that we build on the servers.
This also separates the instantiation of the sql connections from building any
kind of segmented changelog structure. The primary reason is that there may be
multiple objects that get instantiated, and for that it is useful to be able to
pass this object around.
Reviewed By: krallin
Differential Revision: D26708175
fbshipit-source-id: 90bc22eb9046703556381399442117d13b832392