Summary:
In very mergy repos we can hit a combinatorial explosion by visiting the same
node over and over again. The derived data framework has the same problem, and this diff
fixes it.
I made a few attempts at implementing it:
**1** Use `bounded_traversal`, but change unfold to filter out parents that were already visited.
That wasn't correct because fold would then be called only with
the "unvisited" parents. For example, in a case like
```
D
/ \
C B
\ /
A
```
fold for C or B will be called with empty parents, which is incorrect.
**2** Use `bounded_traversal`, change unfold to filter out visited parents but
also remember the real parents.
That doesn't work either. The reason is that fold might be called before unfold
for the parents has finished. So in a case like
```
D
/ \
C B
\ /
A
|
...
thousands of commits
```
if C reaches A first, then B won't visit any other node, and we will try to
derive data for B. However, derived data for A might not be ready yet, so
deriving data for B might fail.
**3** Change bounded_traversal to support DAGs, not just trees.
From the two points above it's clear that bounded_traversal should really be
called bounded_tree_traversal, because on any other DAG it might hit a
combinatorial explosion. I looked into changing bounded_traversal to support
DAGs, and it was possible, but not easy. Specifically, we'd need to make sure
that fold for a node is only called after unfold has finished for all of its
parents, stop using integers for nodes, etc. It might also have a perf hit for
the tree case, but it's not clear how big that would be.
While I think supporting DAGs in bounded_traversal makes sense, I don't want to
block the derived data implementation on that. I'll create a separate task for it.
---------------------------------------------------------------------------------
The approach I took in the end was to use bounded_stream_traversal, which doesn't
visit the same node twice. This will find all the commits that need to be
regenerated, but it might return them in an arbitrary order. After that we need
to topo_sort the commits (note that I introduced a bug in hg changeset
generation in D16132403, so this diff fixes that as well).
This is not the most efficient implementation because it will generate the nodes
sequentially even if they could be generated in parallel (e.g. if the nodes are
in different branches). I don't think that's a huge concern, so I think it's worth
waiting for the bounded_dag_traversal implementation (see point 3 above).
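The approach above can be sketched as follows (a minimal, self-contained illustration with assumed names and integer node ids, not the real Mononoke API): walk the DAG with a visited set so each commit is expanded once, then topo-sort the visited commits with Kahn's algorithm so parents come before children.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

fn derivation_order(start: u64, parents: &HashMap<u64, Vec<u64>>) -> Vec<u64> {
    // Phase 1: BFS with a visited set; each node is expanded exactly once,
    // which avoids the combinatorial explosion on mergy DAGs.
    let mut visited = HashSet::new();
    let mut queue = VecDeque::from([start]);
    while let Some(n) = queue.pop_front() {
        if visited.insert(n) {
            if let Some(ps) = parents.get(&n) {
                queue.extend(ps);
            }
        }
    }
    // Phase 2: Kahn's algorithm over the visited set, emitting parents
    // before children so derivation happens in a safe order.
    let mut children: HashMap<u64, Vec<u64>> = HashMap::new();
    let mut in_deg: HashMap<u64, usize> = visited.iter().map(|&n| (n, 0)).collect();
    for &n in &visited {
        for &p in parents.get(&n).into_iter().flatten() {
            if visited.contains(&p) {
                children.entry(p).or_default().push(n);
                *in_deg.get_mut(&n).unwrap() += 1;
            }
        }
    }
    let mut ready: VecDeque<u64> = in_deg
        .iter()
        .filter(|(_, &d)| d == 0)
        .map(|(&n, _)| n)
        .collect();
    let mut order = Vec::new();
    while let Some(p) = ready.pop_front() {
        order.push(p);
        for &c in children.get(&p).into_iter().flatten() {
            let d = in_deg.get_mut(&c).unwrap();
            *d -= 1;
            if *d == 0 {
                ready.push_back(c);
            }
        }
    }
    order
}
```

On the diamond example above this emits A first and D last, with C and B in between in either order.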
---------------------------------------------------------------------------------
Finally, there were concerns about memory usage from the additional hashset that
keeps visited nodes. I think these concerns are unfounded, for a few reasons:
1) We have to keep the nodes we visited *anyway*, because we need to generate
derived data from parents to children. In fact, bounded_traversal keeps them in
a map as well.
It's true that bounded_traversal can do this a bit more efficiently in cases where
we have two different branches that do not intersect. I'd argue that's a rare
case which happens only on repo merges with two independent, equally sized
branches. But even then it's not a huge problem (see below).
2) The hashset just keeps commit ids, which are 32 bytes long. So even if we have 1M
commits to generate, that would take 32MB plus hashset overhead. And cases like
that should never happen in the first place - we do not expect to generate
derived data for 1M commits except during initial huge repo imports (and
for those cases we can afford a 32MB memory hit). If we're in a state where we
need to generate too many commits, we should just return an error to the user;
we'll add that in a later diff.
Reviewed By: krallin
Differential Revision: D16438342
fbshipit-source-id: 4d82ea6111ac882dd5856319a16dda8392dfae81
Summary:
Before this change, we would always include the shard id in our mysql-related fb303 counters. This is not ideal for two reasons:
- for the xdb blobstore we have 4K shards and 24 counters, so we were reporting 96K counters in total
- we rarely care about per-shard metrics anyway, since in most cases queries are uniformly distributed across shards
Therefore, let's change this approach to not use per-shard counters and instead use per-shardmap ones (when sharding is involved).
Reviewed By: krallin
Differential Revision: D16360591
fbshipit-source-id: b2df94a3ca9cacbf5c1f328b48e87b48cd18287e
Summary: This adds a few retries in create_raw_xdb_connection. This is intended as a first step towards solving some of the flakiness we've observed when connecting to MySQL through direct connections (sometimes, we fail to acquire certificates).
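A hedged sketch of the retry shape described here (illustrative names, not the actual create_raw_xdb_connection change): retry a fallible connection attempt a few times before giving up, which papers over transient failures such as occasionally failing to acquire certificates.

```rust
fn with_retries<T, E>(
    max_attempts: usize,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut last_err = None;
    for _ in 0..max_attempts {
        match attempt() {
            Ok(v) => return Ok(v),
            // Remember the error and try again (a real implementation would
            // likely also back off between attempts).
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.expect("max_attempts must be > 0"))
}
```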
Reviewed By: farnz
Differential Revision: D16228401
fbshipit-source-id: 0804797aecfe0b917099191cd2a36ce4c077b949
Summary:
In earlier diffs in this stack, I updated the callsites that reference XDB tiers to use concrete &str types (which is what they were receiving until now ... but it wasn't spelled out as such).
In this diff, I'm updating them to use owned `String` instead, which lets us hoist up `to_string()` and `clone()` calls in the stack, rather than pass down references only to copy them later on.
This allows us to skip some unnecessary copies. It turns out we were doing quite a bit of "turn this String into a reference, pass it down the stack, then turn it back into a String".
Reviewed By: farnz
Differential Revision: D16260372
fbshipit-source-id: faec402a575833f6555130cccdc04e79ddb8cfef
Summary:
Instantiating a new DB connection may require remote calls to be made to e.g. Hipster to allocate a new certificate (this is only the case when connecting to MySQL).
Currently, our bindings to our underlying DB locator make a blocking call to pretend that this operation is synchronous: https://fburl.com/ytmljxkb
This isn't ideal, because this call might actually take time, and we might also occasionally want to retry it (we've had issues in our MySQL tests with acquiring certificates that retrying should resolve). Running this synchronously makes doing so inefficient.
This patch doesn't update that, but it fixes everything on the Rust side of things to stop expecting connections to return a `Result` (and to start expecting a Future instead).
In a follow up diff, I'll work on making the changes in common/rust/sql to start returning a Future here.
Reviewed By: StanislavGlebik
Differential Revision: D16221857
fbshipit-source-id: 263f9237ff9394477c65e455de91b19a9de24a20
Summary:
Add cachelib layer to `CacheManager`.
`CacheManager` behaviors:
| cachelib | memcache | Behavior |
| -- | -- | -- |
| Miss | Miss | Resolve `fill` future, fill both cache layers |
| Miss | Hit | Fetch data from memcache and fill cachelib with the data fetched |
| Hit | Miss | Return data fetched from cachelib, DO NOT fill memcache |
| Hit | Hit | Return data fetched from cachelib |
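The table above can be sketched as code (a simplified, synchronous stand-in for the real `CacheManager`, with plain maps playing the role of cachelib and memcache):

```rust
use std::collections::HashMap;

fn get_or_fill(
    cachelib: &mut HashMap<String, String>,
    memcache: &mut HashMap<String, String>,
    key: &str,
    fill: impl FnOnce() -> String,
) -> String {
    if let Some(v) = cachelib.get(key) {
        // cachelib hit: return immediately; memcache is NOT filled.
        return v.clone();
    }
    if let Some(v) = memcache.get(key).cloned() {
        // cachelib miss, memcache hit: backfill cachelib only.
        cachelib.insert(key.to_string(), v.clone());
        return v;
    }
    // Miss in both layers: resolve `fill` and populate both.
    let v = fill();
    cachelib.insert(key.to_string(), v.clone());
    memcache.insert(key.to_string(), v.clone());
    v
}
```

Not filling memcache on a cachelib hit matters: cachelib is per-host, so a cachelib hit says nothing about whether other hosts (which share memcache) still need the value, and writing it back on every hit would be wasted traffic.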
Reviewed By: StanislavGlebik
Differential Revision: D15929659
fbshipit-source-id: f7914efc7718c614f39a8fd6ad5e6588773fdd78
Summary: Add type safety to `abomonation_future_cache` by requiring usage of `VolatileLruCachePool`, and make that change for all usages of `LruCachePool`.
Reviewed By: farnz
Differential Revision: D15882275
fbshipit-source-id: 3f192142af254d7b6b8ea7f9cc586c2034c97b93
Summary: This updates SqlConstructors to expose a `with_xdb` method that accepts an optional myrouter port.
Reviewed By: krallin
Differential Revision: D15897639
fbshipit-source-id: 25047c24ef28c76d2a27a8d26de8ecad521a1f82
Summary:
This updates the mononoke server code to support booting without myrouter. This required 2 changes:
- There were a few callsites where we didn't handle not having a myrouter port.
- In our function that waits for myrouter, we were failing if we had no myrouter port, but that's not desirable: if we don't have a myrouter port, we simply don't need to wait.
Arguably, this isn't 100% complete yet. Notably, RepoReadWriteStatus still requires myrouter. I'm planning to create a bootcamp task for this since it's not blocking my work on adding integration tests, but it would be nice to have.
Speaking of further refactor, it'd be nice if we supported a `SqlConstructors::with_xdb` function that did this matching for us so we didn't have to duplicate it all over the place. I'm also planning to bootcamp this.
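The "no port means no wait" logic can be sketched like this (names and signature are illustrative, not the real Mononoke code; `wait_for_myrouter` is a hypothetical stand-in for the actual wait):

```rust
fn myrouter_ready(
    port: Option<u16>,
    wait_for_myrouter: impl FnOnce(u16) -> Result<(), String>,
) -> Result<(), String> {
    match port {
        // No myrouter port configured: nothing to wait for, report ready
        // immediately instead of failing.
        None => Ok(()),
        Some(p) => wait_for_myrouter(p),
    }
}
```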
Reviewed By: farnz
Differential Revision: D15855431
fbshipit-source-id: 96187d887c467abd48ac605c28b826d8bf09862b
Summary:
1. Add the function myrouter_ready to common/sql_ext/sr/lib.rs.
2. Refactor main.rs and repo_handlers.rs to use the new function.
Reviewed By: ikostia
Differential Revision: D15623501
fbshipit-source-id: 7b9d6c5fd7c33845148dfacefbcf1bf3c6afaa5d
Summary:
We don't control what `get_from_db` will do when asked to fetch no data. We can only hope it'll do the smart thing and not hit the DB or increment any monitoring counters.
This can be problematic, because it can result in confusing data in ODS. See T45198435 for a recent example of this.
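One way to sketch the guard (illustrative only; the real `get_from_db` and its key types are stand-ins here): short-circuit on an empty key set before the query layer is ever reached.

```rust
use std::collections::{HashMap, HashSet};

fn get_from_db_guarded(
    keys: &HashSet<u64>,
    get_from_db: impl FnOnce(&HashSet<u64>) -> HashMap<u64, String>,
) -> HashMap<u64, String> {
    if keys.is_empty() {
        // Nothing to fetch: skip the DB entirely, so no query is issued and
        // no monitoring counter is incremented.
        return HashMap::new();
    }
    get_from_db(keys)
}
```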
Reviewed By: StanislavGlebik
Differential Revision: D15620424
fbshipit-source-id: 629c2eaad00d4977b0598c26e1f2a2ca64a1d66e
Summary:
This updates caching_ext to record cache hit and miss stats. This makes
it easier to write tests that exercise this caching.
As part of this, I refactored the CachelibHandler and MemcacheHandler mocks to
use a shared MockStore implementation.
Reviewed By: StanislavGlebik
Differential Revision: D15220647
fbshipit-source-id: b0f70b9780f577226664ebf6760b5fc93d733cd3
Summary:
It seems redundant to require callers of open_ssl to also pass a
(mostly) identical string.
Also make open_ssl special-case filenodes with sharding (though filenodes
aren't currently opened through it).
Reviewed By: StanislavGlebik
Differential Revision: D15157834
fbshipit-source-id: 0df45307f17bdb2c021673b3153606031008bee2
Summary:
In the case of mononoke's admin tool it's annoying for users to be required to run myrouter in the background and provide the myrouter port to every command.
Thanks to this change it is no longer necessary to run admin commands through myrouter - the tool will simply use a direct connection to XDB using the sql crate.
It is important to note that a raw XDB connection via the sql crate doesn't have connection pooling and doesn't handle XDB failover, so it is crucial that it is never used for long-lived or request-heavy use cases like running the mononoke server or blobimport.
Reviewed By: jsgf
Differential Revision: D15174538
fbshipit-source-id: 299d3d7941ae6aec31961149f926c2a4965ed970
Summary:
Add a LABEL constant to the SqlConstructors trait to make it easier to identify
which table is being used, for stats and logging.
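A sketch of the shape this takes (the trait name matches the summary, but the implementor and the stats key format are assumed for illustration): an associated const on the trait lets stats and log lines be keyed per-table without threading strings through every call.

```rust
trait SqlConstructors {
    // Identifies the underlying table for stats and logging.
    const LABEL: &'static str;
}

// Hypothetical implementor, for illustration only.
struct Filenodes;
impl SqlConstructors for Filenodes {
    const LABEL: &'static str = "filenodes";
}

// Hypothetical consumer: build a per-table stats key from the label.
fn stats_prefix<T: SqlConstructors>() -> String {
    format!("mononoke.sql.{}", T::LABEL)
}
```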
Reviewed By: HarveyHunt
Differential Revision: D13457488
fbshipit-source-id: a061a9582bc1783604f249d5b7dcede4b1e1d3c5
Summary: Following D14935155, let's decrease number of write connections as well
Reviewed By: ikostia
Differential Revision: D14973547
fbshipit-source-id: c344ecc568be26287e998b45b6988744cb5e0a09
Summary:
Mononoke and hg both have their own implementation wrappers for lz4
compression, unify these to avoid duplication.
Reviewed By: StanislavGlebik
Differential Revision: D14131430
fbshipit-source-id: 3301b755442f9bea00c650c22ea696912a4a24fd
Summary: There's nothing Mercurial-specific about identifying a repo. This also outright removes some dependencies on mercurial-types.
Reviewed By: StanislavGlebik
Differential Revision: D13512616
fbshipit-source-id: 4496a93a8d4e56cd6ca319dfd8effc71e694ff3e
Summary:
Previously the `max_gen()` function did a linear scan through all the keys on
every call. Let's use the `UniqueHeap` data structure to track the maximum
generation number instead.
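A minimal sketch of the idea (assumed names, not the real `UniqueHeap`): a max-heap paired with a set so duplicate insertions are ignored, making the maximum generation number a `peek` away instead of a linear scan.

```rust
use std::collections::{BinaryHeap, HashSet};

struct UniqueMaxHeap {
    heap: BinaryHeap<u64>,
    seen: HashSet<u64>,
}

impl UniqueMaxHeap {
    fn new() -> Self {
        UniqueMaxHeap { heap: BinaryHeap::new(), seen: HashSet::new() }
    }

    fn push(&mut self, gen_num: u64) {
        // Only insert values we haven't seen, keeping the heap duplicate-free.
        if self.seen.insert(gen_num) {
            self.heap.push(gen_num);
        }
    }

    fn max_gen(&self) -> Option<u64> {
        // O(1) peek at the maximum, vs. a linear scan over all keys.
        self.heap.peek().copied()
    }
}
```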
Reviewed By: lukaspiatkowski
Differential Revision: D13275471
fbshipit-source-id: 21b026c54d4bc08b26a96102d2b77c58a981930f
Summary:
We've recently found that the `known()` wireproto request gets much slower when we
send more traffic to Mononoke jobs. Other wireproto methods looked fine; cpu
and memory usage were fine as well.
Background: a `known()` request takes a list of hg commit hashes and returns
which of them Mononoke knows about.
One thing we noticed is that the `known()` handler sends db requests sequentially.
Experiments with sending `known()` requests with commit hashes that Mononoke
didn't know about confirmed that its latency got higher the more parallel
requests we sent. We suspect this is because Mononoke has to send requests to
the db master, and we limit the number of master connections.
A thing that should help is batching the requests, i.e. instead of sending many
requests that each ask whether a single hg commit exists, sending one request for
many commits at once.
This change also required changes to the bonsai-mapping caching layer to
do batch cache requests.
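The batching idea can be sketched like this (illustrative, not the real wireproto handler; `fetch_existing` is a hypothetical stand-in for one round trip to the bonsai mapping):

```rust
use std::collections::HashSet;

fn known(
    requested: &[&str],
    fetch_existing: impl FnOnce(&[&str]) -> HashSet<String>,
) -> Vec<bool> {
    // One batched request for all hashes at once, instead of one
    // master-connection round trip per hash...
    let existing = fetch_existing(requested);
    // ...then answer per-hash membership locally.
    requested.iter().map(|h| existing.contains(*h)).collect()
}
```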
Reviewed By: lukaspiatkowski
Differential Revision: D13194775
fbshipit-source-id: 47c035959c7ee12ab92e89e8e85b723cb72738ae
Summary:
The default ServiceType is ServiceType.Any, so requests might go to the master in the master
region. This diff changes that.
Reviewed By: lukaspiatkowski, farnz
Differential Revision: D13021674
fbshipit-source-id: 928cf59b095549f3048411241116c097e1193c7d
Summary: Additionally use a lower max_number_of_concurrent_connections for read connections to master to avoid overloading it.
Reviewed By: farnz
Differential Revision: D12979366
fbshipit-source-id: 258dbae554155d7a33d619f445293092940aad61
Summary:
We were using an incorrect buffer size. It's *very* surprising that our servers
weren't continuously crashing. However, see the test plan - it really looks
like `LZ4_compressBound()` is the correct option here.
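For reference, `LZ4_compressBound()` returns the worst-case compressed size for a given input size, so the destination buffer can never be overrun even for incompressible data. Its documented formula, mirrored here as a sketch (ignoring the C macro's max-input-size guard), is:

```rust
// Worst-case LZ4 compressed size for `input_size` bytes of input
// (mirrors the LZ4_COMPRESSBOUND formula from lz4.h).
fn lz4_compress_bound(input_size: usize) -> usize {
    input_size + input_size / 255 + 16
}
```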
Reviewed By: farnz
Differential Revision: D9738590
fbshipit-source-id: d531f32e79ab900f40d46b7cb6dac01dff8e9cdc
Summary:
Backout D9124508.
This is actually more complex than it seems. It breaks the non-buck build
everywhere:
- hgbuild on all platforms. POSIX platforms break because `hg archive` will
miss `scm/common`. Windows build breaks because of symlink.
- `make local` on GitHub repo because `failure_ext` is not public. The `pylz4`
Cargo.toml has missing dependencies.
Fixing them correctly seems non-trivial. Therefore let's backout the change to
unblock builds quickly.
The linter change is kept in case we'd like to try again in the future.
Reviewed By: simpkins
Differential Revision: D9225955
fbshipit-source-id: 4170a5f7664ac0f6aa78f3b32f61a09d65e19f63
Summary: Moved the lz4 compression code into a separate module in `scm/common/pylz4` and redirected code referencing the former two files to the new module
Reviewed By: quark-zju, mitrandir77
Differential Revision: D9124508
fbshipit-source-id: e4796cf36d16c3a8c60314c75f26ee942d2f9e65
Summary:
This is a series of patches which adds Cargo.toml files to all the crates and tries to build them. There is an individual patch for each crate which tells whether that crate currently builds successfully using cargo or not, and if not, the reason why.
The reasons why some crates don't build:
* the failure_ext and netstring crates are internal
* an error related to tokio_io; there might be a patched version of tokio_io internally
* actix-web depends on httparse, which uses nightly features
All builds were done using rustc version `rustc 1.27.0-dev`.
Pull Request resolved: https://github.com/facebookexperimental/mononoke/pull/7
Differential Revision: D8778746
Pulled By: jsgf
fbshipit-source-id: 927a7a20b1d5c9643869b26c0eab09e90048443e
Summary:
Unify all uses of Sqlite and of Mysql.
This supersedes D8712926.
Reviewed By: farnz
Differential Revision: D8732579
fbshipit-source-id: a02cd04055a915e5f97b540d6d98e2ff2d707875
Summary:
We had a memory leak because the context wasn't cleaned up afterwards. This diff
fixes it.
Reviewed By: farnz
Differential Revision: D8236762
fbshipit-source-id: f82b061f3f541d9104d1185ed04ea21224b7d5bc
Summary: We are going to add CompressContext in the next diff
Reviewed By: farnz
Differential Revision: D8236761
fbshipit-source-id: 0df55b9bc5e9fd78ac8c060576513c1216641ead
Summary: Will be used in the remotefilelog getfiles method.
Reviewed By: jsgf
Differential Revision: D6884919
fbshipit-source-id: e8037123a4843322c29b37c6b5749444781f4fa7
Summary:
Add a separate crate that uses lz4 in the same way as the python lz4 library. The
main difference is that the first 4 bytes are the length of the raw data in le32
format. The reason for moving this into a separate crate is to use pylz4 in the
remotefilelog getfiles method.
Also removed one panic and replaced it with an error.
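The framing described above can be sketched like this (compression itself is elided; `compress` is a placeholder for the real lz4 call): the payload is prefixed with the *uncompressed* length as 4 little-endian bytes, matching the python lz4 library's format.

```rust
// Prefix the compressed payload with the raw-data length as le32.
fn frame(raw: &[u8], compress: impl Fn(&[u8]) -> Vec<u8>) -> Vec<u8> {
    let mut out = (raw.len() as u32).to_le_bytes().to_vec();
    out.extend_from_slice(&compress(raw));
    out
}

// Read back the raw-data length from a framed payload's 4-byte header.
fn raw_len(framed: &[u8]) -> u32 {
    u32::from_le_bytes([framed[0], framed[1], framed[2], framed[3]])
}
```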
Reviewed By: jsgf
Differential Revision: D6884918
fbshipit-source-id: 1b05381c045a1f138ab28820175289233b07a91d