Summary:
Mercurial has a hack to determine if a file was renamed: copy metadata is checked only if p1 is None. Note that this hack exists purely to make finding renames
faster, and we don't need it in Mononoke, so let's just read the copy metadata directly.
This diff also removes the `maybe_copied()` method and unused code like `Symlink`.
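The contrast can be sketched roughly like this (types and names here are hypothetical, not Mononoke's actual API):

```rust
#[derive(Debug, PartialEq)]
struct CopyInfo {
    from_path: String, // path the file was copied/renamed from
    from_node: String, // nodeid of the source filelog entry
}

struct FileEntry {
    p1: Option<String>,          // first parent nodeid, if any
    copy_info: Option<CopyInfo>, // copy metadata stored with the file
}

// Mercurial's shortcut: only consult copy metadata when p1 is None.
fn copied_mercurial_style(e: &FileEntry) -> Option<&CopyInfo> {
    if e.p1.is_none() {
        e.copy_info.as_ref()
    } else {
        None
    }
}

// Mononoke: read the copy metadata unconditionally.
fn copied(e: &FileEntry) -> Option<&CopyInfo> {
    e.copy_info.as_ref()
}
```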
Reviewed By: farnz
Differential Revision: D12826409
fbshipit-source-id: 53792218cb61fcba96144765790278d17eecdbb1
Summary:
We have decided that this will be used to transport phases to the client.
The hg client already supports this part.
Reviewed By: StanislavGlebik
Differential Revision: D13507921
fbshipit-source-id: 621e93bb6e1a0c87d4f4963ba7fa635b77a5b6ec
Summary: There's nothing Mercurial-specific about identifying a repo. This also outright removes some dependencies on mercurial-types.
Reviewed By: StanislavGlebik
Differential Revision: D13512616
fbshipit-source-id: 4496a93a8d4e56cd6ca319dfd8effc71e694ff3e
Summary:
Removed:
- the command-line tool for filenodes and bookmarks; these should be part of the mononoke_admin script
- the outdated docs folder
- the Commitsim crate, because it's replaced by real pushrebase
- the unused hooks_old crate
- the storage crate, which wasn't used
Reviewed By: aslpavel
Differential Revision: D13301035
fbshipit-source-id: 3ae398752218915dc4eb85c11be84e48168677cc
Summary:
The content of a file at some revision is the initial file content with deltas applied.
A Delta is a vector of Fragments.
A Fragment is a sequential change to the file (old part of the content -> new content).
This diff implements an optimization of the process of getting the file content at some revision.
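A minimal sketch of the data model described above, with illustrative names rather than Mononoke's actual types:

```rust
#[derive(Clone)]
struct Fragment {
    start: usize,      // byte offset where the old content is replaced
    end: usize,        // end of the replaced range (exclusive)
    content: Vec<u8>,  // new bytes for that range
}

type Delta = Vec<Fragment>;

// Apply one delta: splice each fragment's new content over the old range.
// Fragments are assumed sorted by `start` and non-overlapping.
fn apply_delta(base: &[u8], delta: &Delta) -> Vec<u8> {
    let mut out = Vec::new();
    let mut pos = 0;
    for frag in delta {
        out.extend_from_slice(&base[pos..frag.start]);
        out.extend_from_slice(&frag.content);
        pos = frag.end;
    }
    out.extend_from_slice(&base[pos..]);
    out
}

// Content at revision N = initial content with all deltas applied in order.
fn content_at(initial: &[u8], deltas: &[Delta]) -> Vec<u8> {
    deltas.iter().fold(initial.to_vec(), |acc, d| apply_delta(&acc, d))
}
```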
Reviewed By: lukaspiatkowski
Differential Revision: D12928138
fbshipit-source-id: fcc28e2d0e0acf83e17887092f6593e155431c1b
Summary:
Recently there was a change in core hg that changed the way we encode filenames - D9967059. However, it wasn't reflected in Mononoke's blobimport code, so the job constantly fails.
This diff changes the filename encoding process to match Mercurial's.
The encoding process has 3 steps:
1. (Capital -> _lowercaseletter) + ( _ -> __).
If the new filename is > 255, then go to step 2, otherwise exit.
2. (Capital -> Capital) + (_ -> __)
If the new filename is > 255, then go to step 3, otherwise exit.
3. (Capitals -> Capitals) + (_ -> : )
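Step 1 can be sketched like this (an illustrative reimplementation, not the production encoder):

```rust
// Illustrative reimplementation of step 1: uppercase letters become
// '_' + lowercase, and '_' is escaped as "__".
fn encode_step1(name: &str) -> String {
    let mut out = String::new();
    for c in name.chars() {
        match c {
            'A'..='Z' => {
                out.push('_');
                out.push(c.to_ascii_lowercase());
            }
            '_' => out.push_str("__"),
            _ => out.push(c),
        }
    }
    out
}

// After each step the length is re-checked against the 255-byte limit
// before falling through to the next step.
fn needs_next_step(encoded: &str) -> bool {
    encoded.len() > 255
}
```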
Reviewed By: StanislavGlebik
Differential Revision: D10851634
fbshipit-source-id: 28b7503b2601729113326a18ede3e93c04572c6d
Summary:
getfiles implementation for LFS.
The implementation is the following:
- get the file size from the file envelope (retrieved from Manifold by HgNodeId)
- if the file size is above the threshold from the LFS config:
  - fetch the file into memory and compute its sha256; this will be fixed later, as this approach consumes a lot of memory, but we don't have any mapping from sha256 to blake2 yet [T35239107](https://our.intern.facebook.com/intern/tasks/?t=35239107)
  - generate the LFS metadata file according to [LfsPlan](https://www.mercurial-scm.org/wiki/LfsPlan)
  - set the metakeyflag (REVID_STORED_EXT) in the file header
- if the file size is below the threshold, process it the usual way
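A rough sketch of the threshold branch and the pointer-file generation; the field layout follows the LFS spec, while the function names and placeholder digest are made up for illustration:

```rust
// Pointer-file layout per the LFS spec: version, oid, size are mandatory.
// The digest argument is a placeholder; real code hashes the fetched bytes.
fn lfs_pointer(sha256_hex: &str, size: u64) -> String {
    format!(
        "version https://git-lfs.github.com/spec/v1\noid sha256:{}\nsize {}\n",
        sha256_hex, size
    )
}

// The getfiles branch: only files above the configured threshold are
// served as LFS pointers; everything else goes the usual way.
fn serve_as_lfs(file_size: u64, threshold: u64) -> bool {
    file_size > threshold
}
```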
Reviewed By: StanislavGlebik
Differential Revision: D10335988
fbshipit-source-id: 6a1ba671bae46159bcc16613f99a0e21cf3b5e3a
Summary:
According to the [Mercurial LFS Plan](https://www.mercurial-scm.org/wiki/LfsPlan), on push, for files whose size is above the threshold (lfs.threshold config) the hg client sends LFS metadata instead of the actual file contents. The main part of the LFS metadata is the SHA-256 of the file content (oid).
The format requires the following mandatory fields: version, oid, size.
When LFS metadata is sent instead of the real file content, the lfs_ext_stored flag is set in the request's revflags.
If this flag is set, we ignore the sha-1 hash verification inconsistency.
Later we check that the content is actually loaded into the blobstore, create a filenode envelope from it, and load the envelope into the blobstore.
The filenode envelope requires the following info:
- size - retrieved when fetching the actual data from the blobstore.
- copy_from - retrieved from the file sent by the hg client.
Mononoke still does the same checks for an LFS push as for a non-LFS push (i.e. checks that all the necessary manifests/filelogs were uploaded by the client).
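Extracting the mandatory fields from the client-sent metadata could be sketched like this (simplified error handling, hypothetical names):

```rust
// Parse the mandatory fields (version, oid, size) out of LFS metadata.
// Errors are collapsed to Option for brevity.
fn parse_lfs_pointer(text: &str) -> Option<(String, String, u64)> {
    let mut version = None;
    let mut oid = None;
    let mut size = None;
    for line in text.lines() {
        let mut parts = line.splitn(2, ' ');
        match (parts.next(), parts.next()) {
            (Some("version"), Some(v)) => version = Some(v.to_string()),
            // the oid line looks like "oid sha256:<hex>"
            (Some("oid"), Some(v)) => {
                oid = Some(v.trim_start_matches("sha256:").to_string())
            }
            (Some("size"), Some(v)) => size = v.parse().ok(),
            _ => {}
        }
    }
    Some((version?, oid?, size?))
}
```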
Reviewed By: StanislavGlebik
Differential Revision: D10255314
fbshipit-source-id: efc8dac4c9f6d6f9eb3275d21b7b0cbfd354a736
Summary: Make get_manifest_by_nodeid accept HgManifestId and correct all calls to get_manifest_by_nodeid.
Reviewed By: StanislavGlebik
Differential Revision: D10298425
fbshipit-source-id: 932e2a896657575c8998e5151ae34a96c164e2b2
Summary:
One brainless idiot decided to prune all trees from the changed files calculation.
Since it also prunes subtrees, that leaves us with just the files in the root
directory.
Reviewed By: lukaspiatkowski
Differential Revision: D10302299
fbshipit-source-id: 8fe2c4ad8de998dfd4083d97cd816d85b5fec604
Summary:
Hooks need to know whether a file was added, modified or removed. For example, we
can't fetch the content of a removed file. Also, hook authors may want to allow
modifying existing files of a particular type while disallowing the addition of
new files of that type.
`cs.files()` doesn't say whether a file was
added/deleted/modified, so we have to use the `get_changed_stream` function from
manifest_utils.
Note - currently it still returns an incorrect list of changed files for merges.
This will be fixed in the next diffs.
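The kind of information hooks need can be sketched with a hypothetical status enum (Mononoke's real type may differ):

```rust
// Hypothetical per-file status; `cs.files()` alone doesn't provide this.
#[derive(Debug, PartialEq)]
enum ChangedFileStatus {
    Added,
    Modified,
    Deleted,
}

// We can't fetch content of a removed file.
fn has_content(status: &ChangedFileStatus) -> bool {
    *status != ChangedFileStatus::Deleted
}

// Example hook policy: allow modifying existing files of some type,
// but disallow adding new ones.
fn hook_allows(status: &ChangedFileStatus) -> bool {
    *status != ChangedFileStatus::Added
}
```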
Reviewed By: farnz
Differential Revision: D10237587
fbshipit-source-id: cd7f76334070cde451b4690071d03275e40c95f3
Summary:
This assert was just broken - it's fine to call `get_full_path()` on a file
entry. What's disturbing is that there were no tests covering this behaviour,
i.e. no test returned a modified file!
This diff fixes both problems.
Reviewed By: farnz
Differential Revision: D10237589
fbshipit-source-id: dcb7f1977768262491b4624a30a5e861c3c1eadf
Summary:
JSON blobs let other users of Mononoke learn what they need to know
about commits. When we get a commit, log a JSON blob to Scribe that other users can pick up to learn what they want to know.
Because Scribe does not guarantee ordering, and can sometimes lose messages, each message includes enough data to allow a tailer that wants to know about all commits to follow backwards and detect lost messages (and thus fix them up locally). It's expected that tailers will either sample this data, or have their own state that they can use to detect missing commits.
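A hypothetical shape for such a message; the field names and schema are illustrative, not the actual format:

```rust
// Build a JSON blob carrying the commit hash plus its parent hashes, so a
// tailer can walk backwards and detect lost messages. Hand-rolled JSON for
// the sketch; real code would use a proper serializer.
fn commit_message_json(repo: &str, hash: &str, parents: &[&str]) -> String {
    let parents_json: Vec<String> =
        parents.iter().map(|p| format!("\"{}\"", p)).collect();
    format!(
        "{{\"repo\":\"{}\",\"hash\":\"{}\",\"parents\":[{}]}}",
        repo,
        hash,
        parents_json.join(",")
    )
}
```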
Reviewed By: StanislavGlebik
Differential Revision: D9995985
fbshipit-source-id: 527b6b8e1ea7f5268ce4ce4490738e085eeeac72
Summary: Use `ChangesetId` in `DifferenceOfUnionsOfAncestorsNodeStream` instead of `HgNodeHash`. This avoids several bonsai lookups of parent nodes.
Reviewed By: StanislavGlebik
Differential Revision: D9631341
fbshipit-source-id: 1d1be7857bf4e84f9bf5ded70c28ede9fd3a2663
Summary:
Use .chain_err() where appropriate to give context to errors coming up from
below. This requires the outer errors to be proper Fail-implementing errors (or
failure::Error), so leave the string wrappers as Context.
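The idea behind `.chain_err()` can be sketched with a std-only wrapper that keeps the cause chain (the real code uses the `failure` crate's combinators, not this type):

```rust
use std::error::Error;
use std::fmt;

// Wrap a lower-level error with a higher-level context message while
// keeping the original error reachable via source().
#[derive(Debug)]
struct ContextError {
    context: String,
    cause: Box<dyn Error>,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.context, self.cause)
    }
}

impl Error for ContextError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(self.cause.as_ref())
    }
}

// Analog of chain_err: attach context to the error side of a Result.
fn chain_err<T>(
    r: Result<T, Box<dyn Error>>,
    ctx: &str,
) -> Result<T, Box<dyn Error>> {
    r.map_err(|e| {
        Box::new(ContextError { context: ctx.to_string(), cause: e })
            as Box<dyn Error>
    })
}
```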
Reviewed By: lukaspiatkowski
Differential Revision: D9439058
fbshipit-source-id: 58e08e6b046268332079905cb456ab3e43f5bfcd
Summary: This commit changes `HgBlob` from an enum into a struct that only contains one Bytes field, completely removes `HgBlobHash` and changes the methods of `HgBlob` from returning `Option`s into directly returning results.
Reviewed By: farnz
Differential Revision: D9317851
fbshipit-source-id: 48030a621874d628602b1c5d3327e635d721facf
Summary: gettreepack doesn't care about deleted entries, only about added or modified ones
Reviewed By: StanislavGlebik
Differential Revision: D9378909
fbshipit-source-id: 2935e6b74fbb0208f7cf89ab4b1e761bb9c6000b
Summary:
These are the types that we currently need to be able to serialize if we're to
replace `Asyncmemo`'s caching uses with cachelib. Derive the `Abomonation`
trait for all of them.
Reviewed By: jsgf
Differential Revision: D9082597
fbshipit-source-id: 910e90476a3cc4d18ba758226b8572c3e8d264c6
Summary:
Alas, the diff is huge. One thing is changing Changesets to use ChangesetId.
This is actually quite straightforward. But in order to do this we need to
adapt our test fixtures to also use bonsai changesets. Modifying the existing test
fixtures to work with bonsai changesets is very tricky. Besides, the existing test
fixtures are a big pile of tech debt anyway, so I used this chance to get rid of
them.
Now test fixtures use the `generate_new_fixtures` binary to generate actual Rust
code that creates a BlobRepo. This Rust code creates a bonsai changeset, which
is converted to an hg changeset later.
In many cases this results in the same hg hashes as in the old test fixtures.
However, there are a couple of cases where the hashes are different:
1) In the case of merges we generate different hashes because of a different
changed-file list (lukaspiatkowski, aslpavel, is this expected?). This is the case for test
fixtures like merge_even, merge_uneven and so on.
2) Old test fixtures used flat manifest hashes while the new test fixtures are
tree-manifest only.
Reviewed By: jsgf
Differential Revision: D9132296
fbshipit-source-id: 5c4effd8d56dfc0bca13c924683c19665e7bed31
Summary: Going to use this in an upcoming diff.
Reviewed By: farnz
Differential Revision: D9017005
fbshipit-source-id: f2dac81c5349a0715fd68aa4584389de35e85b01
Summary: These structures will be used in the next diffs to store FilenodeInfo in memcache for caching purposes
Reviewed By: farnz
Differential Revision: D9014213
fbshipit-source-id: 4952a90415d4b8ab903387fd5cdfaf08d9870c07
Summary:
Fix crate names where the crate name doesn't match the package
name. This affected a few crates, but in practice only rust-crypto/crypto was
used.
Reviewed By: Imxset21
Differential Revision: D9002131
fbshipit-source-id: d9591e4b6da9a00029054785b319a6584958f043
Summary: Makes structured logging a bit easier.
Reviewed By: farnz
Differential Revision: D8945574
fbshipit-source-id: 12209a99fe33ada1050fda681814b432116f6ca6
Summary:
I don't like this, particularly because the empty string for regular
files looks weird.
Reviewed By: StanislavGlebik
Differential Revision: D8888553
fbshipit-source-id: 20a9048a19b3fdfe681160a637bc2dfc8932c113
Summary: This code can easily be shared.
Reviewed By: StanislavGlebik
Differential Revision: D8777307
fbshipit-source-id: f11314f6a63bb191dc38d07cec181a4b05b158d9
Summary:
This is a series of patches which adds Cargo.toml files to all the crates and tries to build them. There is an individual patch for each crate which tells whether that crate currently builds successfully using cargo or not, and if not, the reason behind that.
The reasons why some crates don't build:
* failure_ext and netstring crates are internal
* an error related to tokio_io; there might be a patched version of tokio_io internally
* actix-web depends on httparse, which uses nightly features
All the builds were done using rustc version `rustc 1.27.0-dev`.
Pull Request resolved: https://github.com/facebookexperimental/mononoke/pull/7
Differential Revision: D8778746
Pulled By: jsgf
fbshipit-source-id: 927a7a20b1d5c9643869b26c0eab09e90048443e
Summary:
This is going to be pretty useful for the admin tool (but there's more
scope for improvement, as filed in the TODO tasks!)
Reviewed By: jsgf
Differential Revision: D8667771
fbshipit-source-id: 55f3d849b6a04d68d4d01ebdc8267cffe57b93dc
Summary:
I really liked how expressing the problem in terms of data structures
made it fairly straightforward to understand.
Reviewed By: farnz
Differential Revision: D8586383
fbshipit-source-id: d1e3c92a3a5b760a0f13f4912420e2a73b937e8d
Summary: There were none, so let's add one
Reviewed By: farnz
Differential Revision: D8596062
fbshipit-source-id: 99340575e80e3341174bec156db57e902747e1ab
Summary:
Manifests are always able to return entries immediately, and never
fail.
Reviewed By: lukaspiatkowski, farnz
Differential Revision: D8556499
fbshipit-source-id: e21a2522f1219e47db9b55b24b6ac6c0c463933e
Summary:
This will also allow file blob sharing between the Mercurial and Mononoke
data models.
Reviewed By: farnz
Differential Revision: D8440330
fbshipit-source-id: a29cd07dcecf0959dffb74b7428f3cb11fbd3db6
Summary:
Store manifests as Thrift blobs instead. Required fixing up a lot of
different places, but they should all be pretty clear now.
Reviewed By: farnz
Differential Revision: D8416238
fbshipit-source-id: 523e3054e467e54d180df5ba78445c9b1ccc3b5c
Summary:
Pretty straightforward. Also using this opportunity to add per-repo
prefixes, since all the hashes are going to break anyway.
Note for reviewers: almost all the change is regenerated test fixtures (unfortunately necessary to make atomic). The actual substantive changes are all in the first few files.
Reviewed By: farnz
Differential Revision: D8392234
fbshipit-source-id: c93fc8c6388cb00fe5cff95646ad8c853581cb8c
Summary:
Background info: a pruner is a function that is called on every node during a
tree walk. If it returns false, then that node and all of its subdirectories
are excluded from the walk. visited_pruner is used during gettreepack request
processing when a client requests data for a few manifest nodes at once.
Without visited_pruner some nodes may be generated twice, and that can make a
gettreepack request significantly slower.
While processing gettreepack requests we were inserting all the directories and
files into visited_pruner's hashmap. However, the gettreepack request doesn't
care about files at all, so we were wasting memory for no reason.
This diff fixes it by adding a file_pruner and an and_pruner_combinator. The file
pruner removes all the files, and only then do we apply the visited pruner.
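The combinator idea can be sketched like this, with simplified entry and pruner types (names are illustrative, not Mononoke's actual API):

```rust
#[derive(Clone, Copy, PartialEq)]
enum EntryType {
    File,
    Tree,
}

struct Entry {
    path: String,
    ty: EntryType,
}

// File pruner: gettreepack only cares about trees, so drop files early...
fn file_pruner(e: &Entry) -> bool {
    e.ty == EntryType::Tree
}

// ...so thanks to short-circuiting, only tree entries ever reach the
// second (memory-hungry, hashmap-backed) pruner.
fn and_pruner(
    mut a: impl FnMut(&Entry) -> bool,
    mut b: impl FnMut(&Entry) -> bool,
) -> impl FnMut(&Entry) -> bool {
    move |e| a(e) && b(e)
}
```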
Reviewed By: jsgf
Differential Revision: D8517101
fbshipit-source-id: 62f5f6de75a0aac06ec6b1d2cc54c6a99813965b
Summary:
Previously if a directory had two files with different names but the same hash,
then only one would be returned. This diff fixes it.
Reviewed By: jsgf
Differential Revision: D8517100
fbshipit-source-id: a0f439fb41e3eb9d69fb809b5f57d704c544e898
Summary: Added logic to save `FileChange` as a Mercurial format `HgBlobEntry`
Reviewed By: sunshowers
Differential Revision: D8187792
fbshipit-source-id: 4714c81ab23ebac528cfec15c4a9e66083d4fb6c
Summary:
Note that no prefix is actually prepended at the moment -- there's an
XXX marking the spots we'll need to update. We'll probably add a prefix once Thrift serialization is turned on.
Reviewed By: farnz
Differential Revision: D8387761
fbshipit-source-id: 0fe2005692183fa91f9787b4c80f600df21d1d93
Summary:
Unfortunately `HgParents` can't represent all valid parents, because
it can't represent the semantically important case where `p1` is `None` and
`p2` is not. (For incoming changesets we'd like to keep full fidelity with
Mercurial.)
All the Thrift definitions store `p1` and `p2` separately anyway, so just make
that change throughout `RevlogChangeset` and `BlobChangeset`.
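The representational difference can be sketched as follows (illustrative types, not the real definitions):

```rust
type NodeHash = u64; // stand-in for the real hash type

// The old shape: Two(p1, p2) forces p1 to exist whenever p2 does, so
// "p1 is None, p2 is Some" is unrepresentable.
#[allow(dead_code)]
enum HgParents {
    None,
    One(NodeHash),
    Two(NodeHash, NodeHash),
}

// The new shape stores p1 and p2 separately, matching the Thrift definitions.
struct Parents {
    p1: Option<NodeHash>,
    p2: Option<NodeHash>,
}
```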
Reviewed By: StanislavGlebik
Differential Revision: D8374125
fbshipit-source-id: 63674daaad05d4d4cae3778744dbf1c14b3c2e3b
Summary:
Now it is as it should be: mercurial_types has the types, mercurial has the revlog-related structures.
burnbridge
Reviewed By: farnz
Differential Revision: D8319906
fbshipit-source-id: 256e73cdd1b1a304c957b812b227abfc142fd725
Summary:
Pruner is an FnMut that is called on every entry
that `changed_entry_stream` visits. If the pruner returns false, then
`changed_entry_stream` won't recurse into that entry.
The primary motivation for this is the gettreepack request. We may get requests
with a few `mfnodes`. Currently we just merge a few `get_changed_entry`
streams. The problem is that some entries will be in both streams, and thus
will be generated twice. visited_pruner changes that.
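A minimal sketch of the visited-pruner idea, with the entry type simplified to a path string:

```rust
use std::collections::HashSet;

// An FnMut closure that returns false for entries it has already seen, so
// the walk skips subtrees duplicated across the merged `mfnodes` streams.
fn visited_pruner() -> impl FnMut(&str) -> bool {
    let mut seen: HashSet<String> = HashSet::new();
    // HashSet::insert returns true only on first insertion.
    move |entry| seen.insert(entry.to_string())
}
```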
Reviewed By: jsgf
Differential Revision: D8207271
fbshipit-source-id: 1896cb68edbca57651c4767d9326b9bbbab2c980