Summary:
The output streams from the walks are of the form (Key, Payload, Stats).
Where the Key is Node and the Payload is NodeData this is ok, but with the key and payload both tuples it gets hard to read, so this introduces named tuple-like structs for clarity.
Reviewed By: StanislavGlebik
Differential Revision: D21504916
fbshipit-source-id: a856d34af4117d3183ef0741b311c1c34cf9dacc
Summary:
Add a --sample-path-regex option for use in the corpus dumper so we can dump out just a subset of directories from a repo.
This is most useful on large repos.
Reviewed By: farnz
Differential Revision: D21325548
fbshipit-source-id: bfda87aa76fbd325e4e01c2df90b5dcfc906a8f6
Summary:
Track path mtime, as being able to order by mtime is important to be able to use the on disk corpus to evaluate delta compression approaches
The dumped blobs mtime is set based on the last traversed bonsai or hg commit's timestamp. For Bonsai it prefers committer_time if present and if not falls back to author_time.
Reviewed By: farnz
Differential Revision: D21312223
fbshipit-source-id: fa14615603f78675ca54a0f4946cc8480b8eade5
Summary:
Update the corpus walker to dump the sampled bytes as early as possible to the Inflight area of the output dir, then move them to final location once path is known.
When walking large files and manifests this uses a lot less memory that holding the bytes in a map!
Layout is changed is to make comparison by file type easier. we get a top level dir per extension, e.g. all .json files are under FileContent/byext/json
This also reduces the number of bytes taken from the sampling fingerprint used to make directories, 8 was overkill. 3 is enough to limit directory size.
Reviewed By: farnz
Differential Revision: D21168633
fbshipit-source-id: e0e108736611d552302e085d91707cca48436a01
Summary:
Add corpus dumper for space analysis
This reuses the path based tracking from compression-benefit and the size sampling from scrub.
The core new functionality is the dump to disk from inside corpus stream.
Reviewed By: StanislavGlebik
Differential Revision: D20815125
fbshipit-source-id: 01fdc9dd69050baa8488177782cbed9e445aa3f7
Summary:
Make it so that remote bookmarks like `foo/name` or `bar/name` would pull from
different sources `paths.foo` or `paths.bar`.
Reviewed By: DurhamG
Differential Revision: D21526666
fbshipit-source-id: 6791ab047840c6c49df0c96ff1f56ae7bd1aeeba
Summary:
We now have auto pull logic that covers most unknown rev use-cases. The hint
message is no longer necessary. It's also unclear how to use `hg pull`
correctly. For example, should it be `-r`, `-B remote/foo` or `-B foo`?
Reviewed By: DurhamG
Differential Revision: D21526667
fbshipit-source-id: 40583bfb094e52939130250dd71b96db4d725ad5
Summary: The original purpose of this test was to verify that `hg restack` would work correctly with the `inihibt` extension disabled. `inhibit` has not been relevant at FB for years, so this test has no value.
Reviewed By: quark-zju
Differential Revision: D21555411
fbshipit-source-id: 475ed37439ed71aee08ad1b23ebe1770c3324890
Summary:
I want to reuse the functionality provided by `fetch_all_public_changesets`
in building Segmented Changelog. To share the code I am adding a new crate
intended to store utilities in dealing with bulk fetches.
Reviewed By: krallin
Differential Revision: D21471477
fbshipit-source-id: 609907c95b438504d3a0dee64ab5a8b8b3ab3f24
Summary:
This is helpful, when we have raw unbundle bytes and a server path and just
want to send these bytes server's way.
Very similar to `sendunbundlereplay`, but does not do anything additional,
and reads from stdin.
Reviewed By: markbt
Differential Revision: D21527243
fbshipit-source-id: 97726cb40a32c7e44f47e0f56d8c8eabc4faf209
Summary:
As a developpers is working on large blobs and iterating on them, the local LFS
store will be growing significantly over time, and that growth is unfortunately
unbounded and will never be cleaned up. Thankfully, one the guarantee that the
server is making is that an uploaded LFS blob will never be removed[0]. By using
this property, we can simply move blobs from the local store to the shared
store after uploading the blob is complete.
[0]: As long as it is not censored.
Reviewed By: DurhamG
Differential Revision: D21134191
fbshipit-source-id: ca43ddeb2322a953aca023b49589baa0237bbbc5
Summary:
In order to build a StringPiece from a C-string, the length of the that
string needs to be known, and a constexp strlen is performed on it. That
strlen is however a recursive one, causing the stack to blow up on big file.
Interestingly enough, this also means that binary files potentially had a
wrong sha1 being computed, potentially causing `hg status` to report some
files as being modified, while they aren't. By switching to using a vector
instead of a string, the intent should of this should be more obvious.
Reviewed By: simpkins
Differential Revision: D21551331
fbshipit-source-id: 2dc1f08d96f49d310593e0e934a03215be2b5cbb
Summary:
If a push or pull operation doesn't involve any changesets for which mutation
information might be relevant, don't spend any time querying the database, and
instead exit early.
Reviewed By: krallin
Differential Revision: D21549937
fbshipit-source-id: a6f992e621456b826acd1bddde3591e751d23b31
Summary:
MySQL returns an error for a query of the form `WHERE value IN ()`. Avoid
these by checking that collections are not empty before making the query.
Reviewed By: krallin
Differential Revision: D21549690
fbshipit-source-id: 1507d36e81f7a743d2a1efb046e52a5479633ab9
Summary:
The `test-infinitepush-mutation.t` test covers the new mutation database, so
add it to the mysql tests.
Reviewed By: krallin
Differential Revision: D21548966
fbshipit-source-id: 0dc1f90129fa61fb6db1c1b5a747efa3d20041f5
Summary:
The `support_bundle2_listkeys` flag controls at runtime whether we support
`listkeys` in bundle2. Since this was added before tunables were available,
it uses a value in the mutable counters SQL store.
We could migrate this to tunables, but in practice we have never disabled it,
so let's just make it the default.
Reviewed By: krallin
Differential Revision: D21546246
fbshipit-source-id: 066a375693757ea841ecf0fddb0cc91dc144fd6f
Summary:
When the client pulls draft commits, include mutation information in the bundle
response.
Reviewed By: farnz
Differential Revision: D20871339
fbshipit-source-id: a89a50426fbd8f9ec08bbe43f16fd0e4e3424e0b
Summary:
Advertise support for `b2x:infinitepushmutation`. When the client sends us
mutation information, store it in the mutation store.
Reviewed By: mitrandir77
Differential Revision: D20871340
fbshipit-source-id: ab0b3a20f43a7d97b3c51dcc10035bf7115579af
Summary: Add the mutation store to blobrepo.
Reviewed By: krallin
Differential Revision: D20871336
fbshipit-source-id: 777cba6c2bdcfb16b711dbad61fc6d0d2f337117
Summary: What it says in the title
Reviewed By: StanislavGlebik
Differential Revision: D21549635
fbshipit-source-id: 75939ebbfb317a9beaa9acd1fc1a7c6f41b0f88f
Summary: The compiler is warning about it.
Reviewed By: singhsrb
Differential Revision: D21550266
fbshipit-source-id: 4e66b0dda0e443ed63aeccd888d38a8fcb5e4066
Summary: Expose the API that returns a real graph.
Reviewed By: DurhamG
Differential Revision: D21486520
fbshipit-source-id: 4ebdb4011df8971c54930173c4e77503cd2dac47
Summary:
Part of the mutation graph (excluding split and fold) can fit in the DAG
abstraction. Add a method to do that. This allows cross-dag calculations
like:
changelogdag = ... # suppose available by segmented changelog
# mutdag and changelogdag are independent (might have different nodes),
# with full DAG operations on either of them.
mutdag = mutation.getdag(...)
mutdag.heads(mutdag.descendants([node])) & changelogdag.descendants([node2]) # now possible
Comparing to the current situation, this has some advantages:
- No need to couple the "visibility", "filtered node" logic to the mutation
layer. The unknown nodes can be filtered out naturally by a set "&"
operation.
- DAG operations like heads, roots can be performed on mutdag when it's
previously impossible. We also get operations like visualization for free.
There are some limitations, though:
- The DAG cannot represent non 1:1 modifications (fold, split) losslessly.
Those relationships are simply ignored for now.
- The MemNameDag is not lazy. Reading a long chain of amends might be slow.
For most normal use-cases it is probably okay. If it becomes an issue we
can seek for other solutions, for example, store part of mutationstore
directly in a DAG format on disk, or have fast paths to bypass long
predecessor chain calculation.
Reviewed By: DurhamG
Differential Revision: D21486521
fbshipit-source-id: 03624c8e9803eb1852b3034b8f245555ec582e85
Summary:
subcommand_single calls `derived_data_utils.regenerate(vec![cs_id])` with the
intention that derived data for this commit will be regenerated. However
previously it didn't work because DerivedDataUtils::derive() was ignoring
regenerate parameter. This diff fixes it.
Reviewed By: krallin
Differential Revision: D21527344
fbshipit-source-id: 56d93135071a7f3789262b7a9d9ad84a0896c895
Summary:
This allows us to replace the pyre2 C++ bindings so the fast regex engine can
work with Python 3, and simplify our build steps.
Reviewed By: DurhamG
Differential Revision: D20973179
fbshipit-source-id: e123ac18954991f2c701526108f5c2ecd2b31a3b
Summary: Add a `/history` endpoint that serves EdenAPI history data. Like the other endpoints, this one currently buffers the response in memory, and will be modified to return a streaming response in a later diff.
Reviewed By: krallin
Differential Revision: D21489463
fbshipit-source-id: 259d2d1b7d700251fe902f1ac741545e5895404a
Summary: Add the ability to parse EdenAPI history responses to `data_util`.
Reviewed By: quark-zju
Differential Revision: D21489228
fbshipit-source-id: 42dda64273673431a6f3e4d7bd430689c76c387f
Summary: Factor out logic that will be common to many handlers into new functions in the `util` module.
Reviewed By: krallin
Differential Revision: D21489469
fbshipit-source-id: 9aff4e5182748ab0a0bedd6038852692b8e721a7
Summary: Break up the EdenAPI server integration tests to prevent the test from getting too long.
Reviewed By: krallin
Differential Revision: D21464056
fbshipit-source-id: 076aaf8717547fe9188f40c078d577961c02325d
Summary: Add an endpoint that serves trees. Uses the same underlying logic as the files endpoint, and returns the requested nodes in a CBOR DataResponse.
Reviewed By: krallin
Differential Revision: D21412987
fbshipit-source-id: a9bcc169644a5889c3118a3207130228a5246b2f
Summary: Change `make_req` to take a JSON array as input when constructing `DataRequest`s instead of a JSON object. This is more correct because DataRequests can include multiple `Key`s with the same path; this cannot be represented as an object since an object is effectively a hash map wherein we would have duplicate keys.
Reviewed By: quark-zju
Differential Revision: D21412989
fbshipit-source-id: 07a092a15372d86f3198bea2aa07b973b1a8449d
Summary: EdenAPI data responses can contain data for either files or trees. As such, the implementation of both the files and trees endpoints is almost identical. To allow the logic to be shared between both, this diff makes the handler code generic.
Reviewed By: krallin
Differential Revision: D21412986
fbshipit-source-id: 89501915b0401214b373ed1db2eb09e59de2e5b7
Summary: In order to allow writing code that is generic over files and trees, move the functionality common between the two to a separate trait. This will allow for a significant amount of code sharing in the EdenAPI server. (This diff does not introduce any new functionality; it's mostly just moving existing code into the new traits.)
Reviewed By: krallin
Differential Revision: D21412988
fbshipit-source-id: 31b55904f62ccb965b0f9425de875fc069e10b5a
Summary:
Add an endpoint that serves Mercurial file data.
The data for all files involved is fetched concurrently from Mononoke's backend but in this initial version the results are first buffered in memory before the response is returned; I plan to change this to stream the results in a later diff.
For now this version demonstrates the basic functionality as well as things like ACL enforcement (a valid client identity header with appropriate access permissions must be present for requests to succeed).
Reviewed By: krallin
Differential Revision: D21330777
fbshipit-source-id: c02a70dff1f646d02d75b9fc50c19e79ad2823e6