Summary: The Copy trait means that a value is so cheap to copy that you don't even need to call `.clone()` on it explicitly. Just as it doesn't make much sense to pass `&i64`, it doesn't make much sense to pass `&<Something that is Copy>`, so I have removed all occurrences of passing one of our Copy hashes by reference.
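A minimal sketch of the pattern, assuming a hypothetical 20-byte `NodeHash` that derives `Copy` (not the real mercurial_types definition). The function takes the hash by value, and the caller can keep using it afterwards because it is copied rather than moved.

```rust
/// Hypothetical 20-byte hash standing in for one of the Copy hash types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeHash(pub [u8; 20]);

impl NodeHash {
    /// Takes `self` by value: copying 20 bytes is about as cheap as copying a reference.
    pub fn to_hex(self) -> String {
        self.0.iter().map(|b| format!("{:02x}", b)).collect()
    }
}

/// Before: `fn is_null(hash: &NodeHash) -> bool`.
/// After: the Copy type is passed by value.
pub fn is_null(hash: NodeHash) -> bool {
    hash.0 == [0u8; 20]
}
```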
Reviewed By: fanzeyi
Differential Revision: D13974622
fbshipit-source-id: 89efc1c1e29269cc2e77dcb124964265c344f519
Summary:
Removed:
- the command-line tool for filenodes and bookmarks (these should be part of the
mononoke_admin script)
- the outdated docs folder
- the Commitsim crate, because it's replaced by real pushrebase
- the unused hooks_old crate
- the storage crate, which wasn't used
Reviewed By: aslpavel
Differential Revision: D13301035
fbshipit-source-id: 3ae398752218915dc4eb85c11be84e48168677cc
Summary:
Mononoke blobimports filenodes from Mercurial host machines into its blobstore.
It parses revlogs to retrieve the flags used during parsing.
Revlogs contain a flag for marking extstored files. This flag means that the file is not stored on the Mercurial host machine, but somewhere remote.
This diff provides the API to get the extstored flag from a revlog.
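A sketch of reading the flag from a raw index entry, assuming the RevlogNG (v1) index layout, where the first 8 bytes of an entry are a big-endian `u64` packing the data offset (upper 48 bits) and the per-revision flags (lower 16 bits). The bit value of `REVIDX_EXTSTORED` is an assumption taken from upstream Mercurial's flag definitions and should be checked against the revlog code.

```rust
/// Assumed to match upstream Mercurial's REVIDX_EXTSTORED (bit 13).
pub const REVIDX_EXTSTORED: u16 = 1 << 13;

/// Returns the 16-bit flags of a v1 index entry, or None if it is too short.
pub fn entry_flags(entry: &[u8]) -> Option<u16> {
    if entry.len() < 8 {
        return None;
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&entry[..8]);
    // Lower 16 bits are the flags; the upper 48 bits are the data offset.
    Some((u64::from_be_bytes(buf) & 0xFFFF) as u16)
}

/// True if the revision's content is stored outside the host machine.
pub fn is_extstored(entry: &[u8]) -> bool {
    entry_flags(entry).map_or(false, |flags| (flags & REVIDX_EXTSTORED) != 0)
}
```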
Reviewed By: farnz
Differential Revision: D13081950
fbshipit-source-id: ba5bc04ad3659d4880960995d1cc46594d89e220
Summary:
Add support for the storerequirements feature of Mercurial repositories, which
requires the reader to additionally check the store/requires file for store
requirements.
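A minimal sketch of the extra check, assuming the requirement string is literally `storerequirements` and that both requires files hold one requirement per line. The function works on file contents rather than paths to stay self-contained; `all_requirements` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Parses a requires file: one requirement per non-empty line.
fn parse_requires(contents: &str) -> BTreeSet<String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Merges the repo requirements with store/requires when the repo
/// declares `storerequirements`; errors if that file is missing.
pub fn all_requirements(
    repo_requires: &str,
    store_requires: Option<&str>,
) -> Result<BTreeSet<String>, String> {
    let mut reqs = parse_requires(repo_requires);
    if reqs.contains("storerequirements") {
        match store_requires {
            Some(contents) => reqs.extend(parse_requires(contents)),
            None => return Err("storerequirements is set but store/requires is missing".to_string()),
        }
    }
    Ok(reqs)
}
```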
Reviewed By: StanislavGlebik
Differential Revision: D9850335
fbshipit-source-id: 557ea0f90f3d138d1df56edd94ee23760b9fd849
Summary:
The old blobimport tool will not be able to import commits once they switch to the new Thrift serialization.
`blobrepo::utils::RawNodeBlob` is also used by the admin tool, and it will go away once we start using Thrift serialization.
Reviewed By: farnz
Differential Revision: D8372455
fbshipit-source-id: d02a37e33e1ccd4dd1f695e38dbb40851dd51cd6
Summary:
Now it is as it should be: mercurial_types has the types, and mercurial has the revlog-related structures.
burnbridge
Reviewed By: farnz
Differential Revision: D8319906
fbshipit-source-id: 256e73cdd1b1a304c957b812b227abfc142fd725
Summary:
This codemod tries not to change the existing behavior of the system; it only introduces new types specific to Mercurial revlogs.
It intentionally introduces a lot of copy-paste, which will be cleaned up in following diffs.
Reviewed By: farnz
Differential Revision: D7367191
fbshipit-source-id: 0a915f427dff431065e903b5f6fbd3cba6bc22a7
Summary: Represent the root tree as None.
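A sketch of what this representation looks like, using a hypothetical `MPath` stand-in (a list of path elements) rather than the real type. With the root tree as `None`, every `MPath` can be non-empty, and `Option<MPath>` distinguishes the root from real directories; that benefit is an interpretation, not something stated in the diff.

```rust
/// Hypothetical stand-in for MPath: a non-empty list of path elements.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct MPath(Vec<String>);

/// The root tree is `None`; every other tree has a real path.
pub fn tree_display(path: Option<&MPath>) -> String {
    match path {
        None => "<root>".to_string(),
        Some(p) => p.0.join("/"),
    }
}

/// Joining an element onto the root (None) yields a one-element path.
pub fn join_tree(parent: Option<&MPath>, element: &str) -> MPath {
    let mut elements = parent.map(|p| p.0.clone()).unwrap_or_default();
    elements.push(element.to_string());
    MPath(elements)
}
```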
Reviewed By: farnz
Differential Revision: D7354168
fbshipit-source-id: 5d71a3bd43c19e86ecf7d53a3f721547acabe080
Summary:
RevlogRepo exposes a ton of methods that are almost equivalent to taking the Revlog directly and ignoring the RevlogRepo abstraction above it.
This diff cleans this up a bit. There are still some methods that the "old" blobimport uses, but the "new" one shouldn't need them.
Reviewed By: StanislavGlebik
Differential Revision: D7289445
fbshipit-source-id: ac7130fe41c4e4484d6986fe5b19d5adc751369a
Summary:
While writing Thrift deserialization code I realized there was nothing
that actually checked that MPathElement instances don't have embedded nulls or
slashes.
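A minimal sketch of a validating constructor, assuming a hypothetical byte-vector `MPathElement`. It only enforces the two rules named above (no NUL bytes, no slashes); any further rules the real type has are not shown.

```rust
/// Hypothetical single path component, stored as raw bytes.
#[derive(Debug, PartialEq)]
pub struct MPathElement(Vec<u8>);

impl MPathElement {
    /// Rejects embedded NUL bytes and slashes, which would break
    /// the one-element-per-component invariant.
    pub fn new(bytes: Vec<u8>) -> Result<Self, String> {
        if let Some(pos) = bytes.iter().position(|&b| b == 0 || b == b'/') {
            return Err(format!(
                "invalid byte {:#04x} at position {} in path element",
                bytes[pos], pos
            ));
        }
        Ok(MPathElement(bytes))
    }
}
```

Deserialization code can then call `MPathElement::new` instead of constructing the struct directly, so invalid input surfaces as an error.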
Reviewed By: farnz
Differential Revision: D7296838
fbshipit-source-id: 6a23d559da11e5e935e23d7b9a13f58894efaf62
Summary:
Mononoke will introduce its own ChangesetId, ManifestId and BlobHash, and it
would be good to rename these before that lands.
Reviewed By: farnz
Differential Revision: D7293334
fbshipit-source-id: 7d9d5ddf1f1f45ad45f04194e4811b0f6decb3b0
Summary: Replace the generic type parameters of `Blob` and `BlobNode` with `Bytes`.
Reviewed By: lukaspiatkowski
Differential Revision: D7115361
fbshipit-source-id: 924d347377569c6d1b3b4aed14d584510598da7b
Summary: Change BlobChangeset and callers to use ChangesetId instead of NodeId
Reviewed By: lukaspiatkowski
Differential Revision: D6835450
fbshipit-source-id: 7b20359837632aef4803e40965380c38f54c9d0a
Summary: Update the bookmarks module to use ChangesetId to represent bookmarks, rather than NodeHash.
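A sketch of why a typed ID helps here, assuming a hypothetical `ChangesetId` that wraps a raw `NodeHash` (the real definitions may differ). Because the bookmarks map stores `ChangesetId`, the signature rules out storing a manifest or file node by mistake.

```rust
use std::collections::HashMap;

/// Hypothetical raw 20-byte Mercurial node hash.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeHash(pub [u8; 20]);

/// Hypothetical typed wrapper: a node hash known to name a changeset.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ChangesetId(pub NodeHash);

/// Bookmark name -> changeset; arbitrary NodeHash values cannot be stored.
#[derive(Default)]
pub struct Bookmarks {
    map: HashMap<String, ChangesetId>,
}

impl Bookmarks {
    pub fn set(&mut self, name: &str, cs: ChangesetId) {
        self.map.insert(name.to_string(), cs);
    }

    pub fn get(&self, name: &str) -> Option<ChangesetId> {
        self.map.get(name).copied()
    }
}
```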
Reviewed By: lukaspiatkowski
Differential Revision: D6774650
fbshipit-source-id: 1742e4e78798ad68a7f17ebd345eef14a7de2cec
Summary:
We're never going to serve RevlogRepo in production, and we're down to
a single BlobRepo type that will have different backing stores. Remove the
unused trait, and use BlobRepo everywhere bar blobimport and repo_config
(because we previously hardcoded revlog here - we want to change to a BlobRepo
once blobimport is full-fidelity).
Reviewed By: jsgf
Differential Revision: D6596164
fbshipit-source-id: ba6e76e78c495720792cbe77ae6037f7802ec126
Summary:
Adds an option that sets the number of filelogs and revlogs that will be loaded
in memory. That lets us run blobimport in memory-constrained
environments.
Reviewed By: jsgf
Differential Revision: D6532734
fbshipit-source-id: b748478ec80e75f56a8e07ae1532b0d69c4a5e16
Summary:
We don't read the dirstate, but treedirstate adds a new entry to the repo's requires
file because it changes the dirstate format. Learn to ignore it.
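A sketch of the check, with hypothetical `SUPPORTED` and `IGNORED` lists (the real list of supported requirements is not shown in this diff). Requirements that only affect the working copy, like treedirstate, are accepted without being acted on.

```rust
/// Hypothetical set of requirements this reader understands.
const SUPPORTED: &[&str] = &["revlogv1", "store", "fncache", "dotencode", "generaldelta"];
/// Working-copy-only requirements: safe to ignore because the dirstate is never read.
const IGNORED: &[&str] = &["treedirstate"];

pub fn check_requirements(reqs: &[&str]) -> Result<(), String> {
    for req in reqs {
        if !SUPPORTED.contains(req) && !IGNORED.contains(req) {
            return Err(format!("unsupported requirement: {}", req));
        }
    }
    Ok(())
}
```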
Reviewed By: jsgf
Differential Revision: D6509774
fbshipit-source-id: 46faedcece308e2ebc34d87a62d2391a68eeea38
Summary:
Convert scm/mononoke to use failure, and update common/rust crates it depends on as well.
What it looks like is a lot of deleted code...
General strategy:
- common/rust/failure_ext adds some things that are in the git version of failure but aren't yet on crates.io (`bail!` and `ensure!`, `Result<T, Error>`)
- everything returns `Result<T, failure::Error>`
- crates with real errors get an error type, with a derived Fail implementation
- replicate error-chain by defining an `enum ErrorKind` whose variants match the errors declared in the error! macro
- crates with a dummy error-chain (no local errors) lose it
- `.chain_err()` -> `.context()` or `.with_context()`
So far the only place I've needed to extract an error is in a unit test.
Having a single unified error type has simplified a lot of things, and removed a lot of error type parameters, error conversion, etc, etc.
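To illustrate the shape of this strategy without the `failure` crate, here is a sketch that swaps in `std::error::Error` and a boxed error type in place of `failure::Error` and the derived `Fail` impl. `ErrorKind`'s variants and `parse_rev` are hypothetical. `?` converts any error into the single crate-wide type, and extracting a specific error (as in a unit test) is a downcast.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical per-crate error kinds.
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    EmptyInput,
    OutOfRange(i64),
}

impl fmt::Display for ErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ErrorKind::EmptyInput => write!(f, "empty input"),
            ErrorKind::OutOfRange(n) => write!(f, "revision {} out of range", n),
        }
    }
}

impl Error for ErrorKind {}

/// One unified error type, standing in for `failure::Error`.
pub type Result<T> = std::result::Result<T, Box<dyn Error + Send + Sync>>;

pub fn parse_rev(s: &str) -> Result<u32> {
    if s.is_empty() {
        return Err(ErrorKind::EmptyInput.into());
    }
    // `?` converts ParseIntError into the boxed error; no manual conversion.
    let n: i64 = s.parse()?;
    if n < 0 || n > i64::from(u32::MAX) {
        return Err(ErrorKind::OutOfRange(n).into());
    }
    Ok(n as u32)
}
```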
Reviewed By: sid0
Differential Revision: D6446584
fbshipit-source-id: 744640ca2997d4a85513c4519017f2e2e78a73f5
Summary:
It doesn't do anything specific, but at least now it doesn't
fail if you try to open an hgsql repo.
Reviewed By: jsgf
Differential Revision: D6405323
fbshipit-source-id: d844f723ffe4cb8dcd2d2d71351d43524db51201
Summary:
Mercurial has complex path-encoding logic. Previously only one
case was implemented: when both the 'fncache' and 'store' requirements are present.
This commit adds an implementation for the case when the 'store' requirement is
present but 'fncache' is not.
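For reference, here is a sketch of the store-without-fncache encoding, modeled on upstream Mercurial's `encodefilename` in store.py as an independent reimplementation, not the Mononoke code. The directory step appends `.hg` to components ending in `.hg`, `.i` or `.d`. Then uppercase letters and `_` are escaped with `_`, and control bytes, bytes 126 to 255, and Windows-reserved characters become `~xx`. It takes `&str` for simplicity, although real paths are raw bytes.

```rust
/// Appends ".hg" to directory components ending in ".hg", ".i" or ".d"
/// so they cannot collide with revlog files (like Mercurial's encodedir).
fn encode_dir(path: &str) -> String {
    path.replace(".hg/", ".hg.hg/")
        .replace(".i/", ".i.hg/")
        .replace(".d/", ".d.hg/")
}

/// Byte-wise filename encoding used when 'store' is set without 'fncache'.
pub fn encode_filename(path: &str) -> String {
    let mut out = String::with_capacity(path.len());
    for b in encode_dir(path).bytes() {
        match b {
            b'A'..=b'Z' => {
                out.push('_');
                out.push(b.to_ascii_lowercase() as char);
            }
            b'_' => out.push_str("__"),
            0..=31 | 126..=255 | b'\\' | b':' | b'*' | b'?' | b'"' | b'<' | b'>' | b'|' => {
                out.push_str(&format!("~{:02x}", b));
            }
            _ => out.push(b as char),
        }
    }
    out
}
```

The first test input below is the example from Mercurial's own docstring for `encodefilename`.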
Reviewed By: jsgf
Differential Revision: D6405322
fbshipit-source-id: 3b4a0c5b0fd22f43593ffff54dfe748589294012
Summary:
There are a few different path encodings in Mercurial. The next diff will add
another one, so this diff renames fsencode to fncache_fsencode.
Reviewed By: jsgf
Differential Revision: D6405324
fbshipit-source-id: 3e67d972b02ca41f29fe24250fb227dd384ea0da
Summary:
Returning the bookmarks object gets in the way of degenericising
bookmarks. It's also not in line with other methods on the Repo trait - the
repo handles querying the underlying storage, not the user.
Switch to providing pass-through interfaces for bookmarks.
Reviewed By: StanislavGlebik
Differential Revision: D6408644
fbshipit-source-id: 2808850a070b7bcc478cd40d824bdc8d3acb8b0f
Summary:
This makes it quite easy to write out linknodes.
Also regenerate linknodes for our test fixtures -- the next commit will bring
them in.
Reviewed By: jsgf
Differential Revision: D6214033
fbshipit-source-id: 3b930fe9eda45a1b7bc6f0b3f81dd8af102061fc
Summary:
`RepoPath` represents any absolute path -- root, directory or file. There's a
lot of code that manually switches between directory and file entries --
abstract all of that away.
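A sketch of such a type, assuming a three-variant enum and using `String` in place of `MPath` for self-containment. Callers can match once on `RepoPath` instead of switching between separate directory and file code paths.

```rust
use std::fmt;

/// Any absolute path in the repo: the root, a directory, or a file.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum RepoPath {
    RootPath,
    DirectoryPath(String),
    FilePath(String),
}

impl RepoPath {
    /// The underlying path, or None for the root.
    pub fn mpath(&self) -> Option<&str> {
        match self {
            RepoPath::RootPath => None,
            RepoPath::DirectoryPath(p) | RepoPath::FilePath(p) => Some(p.as_str()),
        }
    }

    pub fn is_file(&self) -> bool {
        matches!(self, RepoPath::FilePath(_))
    }
}

impl fmt::Display for RepoPath {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RepoPath::RootPath => write!(f, "root"),
            RepoPath::DirectoryPath(p) => write!(f, "directory {}", p),
            RepoPath::FilePath(p) => write!(f, "file {}", p),
        }
    }
}
```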
Reviewed By: farnz
Differential Revision: D6201383
fbshipit-source-id: 0047023a67a5484ddbdd00bb57bca3bfb7d4dd3f
Summary:
Blobimporting huge repos can use a lot of memory and may result in OOM errors.
Let's use a simple method that is quite effective in practice: clear the cached revlogs when we have too many of them.
We can't use the lru_cache crate (http://contain-rs.github.io/lru-cache/lru_cache/) because it doesn't have immutable lookup methods, so we wouldn't be able to keep the RwLock optimization we added before.
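A minimal sketch of the "clear when full" approach, assuming a hypothetical `RevlogCache` keyed by path. Lookups take only a shared read lock, which an LRU would not allow because every hit must update recency. When an insert would exceed `max_entries`, the whole map is cleared.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical read-mostly cache of loaded revlogs (V stands in for a revlog handle).
pub struct RevlogCache<V> {
    entries: RwLock<HashMap<String, V>>,
    max_entries: usize,
}

impl<V: Clone> RevlogCache<V> {
    pub fn new(max_entries: usize) -> Self {
        RevlogCache { entries: RwLock::new(HashMap::new()), max_entries }
    }

    pub fn get_or_load<F: FnOnce() -> V>(&self, path: &str, load: F) -> V {
        // Fast path: a shared read lock, so many threads can hit the cache at once.
        if let Some(v) = self.entries.read().unwrap().get(path) {
            return v.clone();
        }
        // Loaded outside the lock; two threads may occasionally load the same path.
        let v = load();
        let mut map = self.entries.write().unwrap();
        if map.len() >= self.max_entries {
            // No LRU bookkeeping: simply drop everything.
            map.clear();
        }
        map.insert(path.to_string(), v.clone());
        v
    }

    pub fn len(&self) -> usize {
        self.entries.read().unwrap().len()
    }
}
```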
Reviewed By: lukaspiatkowski
Differential Revision: D5953892
fbshipit-source-id: 9d78b0065ee9901d35567972cf0014c3c00c3c77
Summary:
This Version type is going to form the basis for other key-value
stores like linknodes, so it needs to be moved into a separate crate.
I've chosen `storage_types` as the name because it seems to be the most obvious
candidate.
Reviewed By: jsgf
Differential Revision: D6015772
fbshipit-source-id: 52de7866d68fdec2a4908626679a6f08c5f73402
Summary:
Will be making functional changes to these files separately and don't
want the two to interfere.
Reviewed By: jsgf
Differential Revision: D6015773
fbshipit-source-id: 26529ce4075ac47e5f0e80177319e1beb90c2076
Summary: The bottleneck when blobimporting a huge repo is lock contention. Replacing the lock with an RwLock speeds up the fbsource import from 20 hours to 6 hours.
Reviewed By: farnz
Differential Revision: D5891097
fbshipit-source-id: bbac2e113896958d6f2da270837c9787e701b5cb
Summary:
`Path` has the potential to be confused with `std::path::Path`.
`MPath` is nice, concise, and clearly different from `Path`.
Reviewed By: jsgf
Differential Revision: D5895665
fbshipit-source-id: dc5ed5c3866b227d753c6d904d3c6d213c882cd7
Summary: Going to make a few changes here.
Reviewed By: jsgf
Differential Revision: D5895642
fbshipit-source-id: 79483e15087d4c552b6bc9801ad3fe0aaba071d6
Summary: Encoding only the index path is not enough. It works for now because we don't use hash encoding yet. The next diff adds hash encoding, so we need to encode the data path too.
Reviewed By: jsgf
Differential Revision: D5719574
fbshipit-source-id: 4f2a4a75baad73313e80ffb81031166d4bab3e29
Summary:
Let's use a separate function fsencode instead of the two methods fsencode_dir and fsencode_file.
There are a few reasons for that:
1) This is similar to upstream Mercurial code, which also uses a separate function rather than a method of the class.
2) Path is supposed to represent a file in the Mercurial vfs. Previously we joined this file with "00manifest.i", which creates a file that doesn't exist in the Mercurial vfs. This point is debatable, though, so I'm fine with making it a method of the class. It probably doesn't matter much.
3) We never actually need to encode a directory: even in the tree manifest case, we use fsencode to find the location of the `00manifest.*` files. That means we don't really need a separate fsencode_dir function, so I was wrong to add these methods in the first place.
4) Special hash encoding is used to encode paths that are longer than 120 chars (it will be added in the next diff). `00manifest.i` and `00manifest.d` are used in the hash digest, which means that a single fsencode_dir() method is not enough: we'd need one method to fsencode the index file and another to fsencode the data file.
Reviewed By: jsgf
Differential Revision: D5719576
fbshipit-source-id: ca6b38dd7d0c6c0c5a345d8fcbe1b0d6fa10a062
Summary: Going to be making significant changes to these files soon.
Reviewed By: kulshrax
Differential Revision: D5796735
fbshipit-source-id: 879fbca3fc936a538c95e50a3333fc2c312de15b
Summary:
Pass a reference to the cache to the fill function. This allows the
function to be recursive based on memoized values.
This also required quite a bit of restructuring to make sure that locks
and ownership are handled properly during recursive calls. Specifically,
a new `Slot` state - `Polling` - is used to indicate when a thread/task
is currently calling `.poll()` on the future. This contains a list of
futures Tasks which are interested in the state of the slot which can be
notified when it changes state.
Also removed unused Entry API code.
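The key idea, a fill function that receives a reference to the cache, can be shown in a simplified, single-threaded sketch. This hypothetical `MemoCache` is synchronous and has no `Polling` state or task notification. It does show the ownership concern mentioned above: the cache borrow must end before the fill function recurses back into the cache.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical synchronous memoizing cache whose fill function gets `&MemoCache`.
pub struct MemoCache {
    values: RefCell<HashMap<u64, u64>>,
    fill: fn(&MemoCache, u64) -> u64,
}

impl MemoCache {
    pub fn new(fill: fn(&MemoCache, u64) -> u64) -> Self {
        MemoCache { values: RefCell::new(HashMap::new()), fill }
    }

    pub fn get(&self, key: u64) -> u64 {
        // Copy the value out so the borrow is released before any recursion.
        let cached = self.values.borrow().get(&key).copied();
        if let Some(v) = cached {
            return v;
        }
        // The fill function may call `self.get` for other keys.
        let v = (self.fill)(self, key);
        self.values.borrow_mut().insert(key, v);
        v
    }
}

/// Example fill function: Fibonacci, recursing through the memoized cache.
pub fn fib_fill(cache: &MemoCache, n: u64) -> u64 {
    if n < 2 { n } else { cache.get(n - 1) + cache.get(n - 2) }
}
```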
Reviewed By: sid0
Differential Revision: D5652704
fbshipit-source-id: 29cd3fe37d4eb9316235872b7e2e228bf10a016f
Summary:
Core Mercurial takes the "data/" and "meta/" prefixes into account when it does
fsencode.
It doesn't make a difference now, but it will once we
add hashencode to the fsencode() function.
Reviewed By: jsgf
Differential Revision: D5670748
fbshipit-source-id: 661974c25e00979eedffb30b432518135f0dc631
Summary:
We want to avoid putting the same entries twice in the blobstore. Even more, we want to avoid generating the list of these entries in the first place.
The first approach was to add a `Mutex<HashSet>` that worker threads would use to filter out entries that were already imported. It turned out that this Mutex killed almost all of the speedup from concurrency.
But since we have linkrevs, for each entry we know which commit created it [1]. A worker processing a commit only imports the entries whose linkrev is that commit, so the entries are already split between the threads and no synchronization is needed.
It gives a good speedup: importing the hg upstream treemanifest repo with the file blobstore went from ~7 min to ~2 min.
Note: there is still lock contention, because the tree revlog and file revlog maps are protected by a mutex. We can optimize it later if needed.
[1] There is a well-known linkrev issue in Mercurial. It shouldn't affect our case at all.
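A sketch of the linkrev partitioning, assuming a hypothetical `Entry` with a path and a linkrev. Each entry has exactly one linkrev, so workers handling distinct commits never import the same entry and need no shared "already imported" set.

```rust
/// Hypothetical manifest entry: a path plus the linkrev of the commit that introduced it.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub path: String,
    pub linkrev: u32,
}

/// The worker for commit `rev` imports only entries introduced by `rev`;
/// entries inherited from earlier commits are skipped.
pub fn entries_to_import<'a>(rev: u32, entries: &'a [Entry]) -> impl Iterator<Item = &'a Entry> + 'a {
    entries.iter().filter(move |e| e.linkrev == rev)
}
```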
Reviewed By: jsgf
Differential Revision: D5650074
fbshipit-source-id: c4f9e2763127ffe4402417dd3963f1f450d7b325
Summary: We'll need to implement blob importing from revlogrepo to blobrepo
Reviewed By: jsgf
Differential Revision: D5622462
fbshipit-source-id: c57f016711bcec7c0bd432d40881588ebdce6f7f
Summary: Repos with flat manifests and repos with tree manifests use different paths. Let's open the tree manifest file if it's present, and the flat manifest otherwise.
Reviewed By: sid0
Differential Revision: D5583566
fbshipit-source-id: e35eef4b1f8067c2a91ebfc62718fec100f19e2e
Summary:
Add a function similar to get_file_revlog to get the tree manifest revlog.
Will be used in the next diffs.
Reviewed By: jsgf
Differential Revision: D5563895
fbshipit-source-id: 0ff84c458eb071763cbdc6d98bcf92e9b8ccc1b8
Summary:
Previously `fsencode()` worked incorrectly if the Path was a directory. We didn't notice it before because we had never used Path to store directories, but we will use it for TreeManifest.
I considered two options when implementing it:
1) Put some kind of `isDir` flag inside the Path struct. But that would create complications with the `join()` method. For example, you can't join anything to a file, so what should we do in that case? Panic? Return a Result?
2) Add another `fsencode_dir()` method. Clients need to know what kind of Path they have. I chose this option because it requires fewer changes and brings fewer complications than option 1.
Reviewed By: sid0
Differential Revision: D5574847
fbshipit-source-id: c4c476a7fc3b884de847c431a56ff5f313c1389f
Summary:
Both the changelog and the manifest are revlogs, and each is protected by its
own mutex. There is no need to also protect them with the RevlogRepo mutex.
This requires making the `Revlog::get_heads()` method accept an immutable `&self`. This
is fine since all `Revlog` methods are protected by a mutex.
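A sketch of the interior-mutability pattern this relies on, with hypothetical `Revlog` internals (a list of heads) and a hypothetical `add_head` method. Because each revlog locks its own state, its methods can take `&self`, and the repo can hand out both revlogs without a repo-wide mutex.

```rust
use std::sync::Mutex;

/// Hypothetical mutable revlog state.
struct RevlogInner {
    heads: Vec<u32>,
}

/// Each revlog owns its mutex, so methods only need `&self`.
pub struct Revlog {
    inner: Mutex<RevlogInner>,
}

impl Revlog {
    pub fn new(heads: Vec<u32>) -> Self {
        Revlog { inner: Mutex::new(RevlogInner { heads }) }
    }

    /// `&self` is enough: exclusive access comes from the inner mutex.
    pub fn get_heads(&self) -> Vec<u32> {
        self.inner.lock().unwrap().heads.clone()
    }

    /// Hypothetical mutating method, also through `&self`.
    pub fn add_head(&self, rev: u32) {
        self.inner.lock().unwrap().heads.push(rev);
    }
}

/// No repo-wide mutex: the changelog and manifest are locked independently.
pub struct RevlogRepo {
    pub changelog: Revlog,
    pub manifest: Revlog,
}
```

Threads working on the changelog and the manifest then contend only on their own revlog's lock.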
Reviewed By: farnz
Differential Revision: D5618949
fbshipit-source-id: 0511d547360e9785cb6e2cefadf8c10626a433c4