Summary:
We'll be running in Tupperware, and want to shrink when we get too
large to avoid OOM due to caches. Configure cachelib appropriately.
Reviewed By: StanislavGlebik
Differential Revision: D8900371
fbshipit-source-id: 4f1f64c2508c64e4ce2d201e0a0e86446f84ffef
Summary:
I don't like glog for interactive use at all. But keep it as the
default for blobimport, and add a flag to change it.
Reviewed By: StanislavGlebik
Differential Revision: D8909674
fbshipit-source-id: d0b9c439f72f231c95e9109e16b30e87cfaa2eed
Summary: Later, this library will be used by the admin tool to ease importing of the config repo.
Reviewed By: StanislavGlebik
Differential Revision: D8882178
fbshipit-source-id: 293a26b038f8d76e9fcedb72a4041a48f502a00a
Summary: Will factor this out into several files in upcoming patches.
Reviewed By: StanislavGlebik
Differential Revision: D6094811
fbshipit-source-id: cd354888882aff2552e61dea788aeb5426e08f4d
Summary:
There is no need to insert the same entries twice. Let's filter them.
Note that while it's possible to have the same manifest entries (for example,
files or dirs with the same content), all changeset entries should be unique,
because each changeset in the repo is unique and is processed exactly once.
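
A minimal sketch of this kind of filtering, assuming entries are identified by a string key. `Entry` and `dedup_entries` are hypothetical names for illustration, not the actual blobimport types:

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a blobstore key/value pair.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Entry {
    pub key: String,
    pub value: Vec<u8>,
}

/// Keeps the first occurrence of each key and drops later repeats.
pub fn dedup_entries<I>(entries: I) -> Vec<Entry>
where
    I: IntoIterator<Item = Entry>,
{
    let mut seen = HashSet::new();
    entries
        .into_iter()
        // `insert` returns false when the key was already present.
        .filter(|e| seen.insert(e.key.clone()))
        .collect()
}
```

Only manifest entries need to pass through such a filter; changeset entries are unique by construction.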
Reviewed By: farnz
Differential Revision: D6076667
fbshipit-source-id: 64bdf25a21884eb2faf43f32590f7cbb8f8dd300
Summary:
Let's move all IO to a separate thread. This helps quite a lot when used with
a slow blobstore, because parser threads are no longer blocked on IO:
importing the upstream Mercurial repo went from 20 mins to 9 mins.
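
The general shape of this change can be sketched with a channel feeding a single IO thread. This is an illustration only: the in-memory `HashMap` stands in for the blobstore, and `import_with_io_thread` is a hypothetical name, not the function in this patch:

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Runs `parsers` worker threads that produce (key, value) pairs and a single
/// IO thread that performs all writes. Returns the resulting store.
pub fn import_with_io_thread(parsers: usize, per_parser: usize) -> HashMap<String, Vec<u8>> {
    let (tx, rx) = mpsc::channel::<(String, Vec<u8>)>();

    // The only thread that touches the (here in-memory, hypothetical) blobstore.
    let io = thread::spawn(move || {
        let mut store = HashMap::new();
        for (key, value) in rx {
            store.insert(key, value);
        }
        store
    });

    let workers: Vec<_> = (0..parsers)
        .map(|p| {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..per_parser {
                    // Stand-in for parsing a revlog entry.
                    let key = format!("parser{}-entry{}", p, i);
                    tx.send((key, vec![p as u8, i as u8]))
                        .expect("IO thread hung up");
                }
            })
        })
        .collect();

    // Drop the original sender so the channel closes once all workers finish.
    drop(tx);
    for w in workers {
        w.join().expect("parser thread panicked");
    }
    io.join().expect("IO thread panicked")
}
```

Parser threads only pay the cost of a channel send, so a slow write delays the IO thread rather than the parsers.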
Reviewed By: lukaspiatkowski
Differential Revision: D6050992
fbshipit-source-id: c3877b123bad993d819495247135544a141eab10
Summary: Change the default bucket for blobimport to be mononoke_prod, a higher capacity bucket than the previous mononoke bucket. Also make it possible to specify the bucket via the CLI rather than hardcoding it.
Reviewed By: jsgf
Differential Revision: D6073745
fbshipit-source-id: 11dcf0c8bbef0b7c3f5971cf0676cf6325f276a6
Summary: The glog drain does not swallow, for example, the backtrace of error_chain errors, so it is a bit easier to debug the tool.
Reviewed By: farnz
Differential Revision: D6021671
fbshipit-source-id: 32bfe01bfd77d85c37a2a446cb3e5d000763c689
Summary:
Realized that we were missing a few crates from the Tokio cleanup because those crates
didn't have `#![deny(warnings)]`.
This also caused a bunch of files to be rustfmted, which is fine.
Reviewed By: kulshrax
Differential Revision: D6024628
fbshipit-source-id: 55032d20f3676c92ef124d861e1edcd34126ab55
Summary: Compaction can slow down blobimporting a lot. Let's add an option to postpone it until the end.
Reviewed By: farnz
Differential Revision: D5882003
fbshipit-source-id: 0611a8e94b3d7331bdacf909d820526f547414a0
Summary: Also ensure that `blobimport` doesn't use its own copy.
Reviewed By: jsgf
Differential Revision: D5847604
fbshipit-source-id: 5390848cd5fab8abd967ef9701720491d703c0f1
Summary: Use `impl Future` rather than a boxed future.
Reviewed By: sid0
Differential Revision: D5829773
fbshipit-source-id: 40c4339e96f7194544f416534952b78a23d93fa6
Summary: Add the `--blobstore manifold` option to blobimport to make it write blobs to Manifold.
Reviewed By: jsgf
Differential Revision: D5758930
fbshipit-source-id: a14a3c155b5d8d7b171ed7a4e53f8569539cb2e9
Summary:
`:` is a reserved character in Windows paths, so Mercurial refuses to commit
files whose names contain it. Use `-` instead, so that we can commit file blob
repo test fixtures.
Reviewed By: kulshrax
Differential Revision: D5731525
fbshipit-source-id: 8d14fc03f1b135cbc4d42aeaf2f3a0ae6d13f956
Summary: This gets us `Display` support as well.
Reviewed By: lukaspiatkowski
Differential Revision: D5734383
fbshipit-source-id: 1485cf80bb310cdd282b4546bed56c60082be8ec
Summary: Just a few minor changes that make our lives easier overall.
Reviewed By: lukaspiatkowski
Differential Revision: D5737854
fbshipit-source-id: da951d7872433bffa8fc64d15cd0e917f77144b5
Summary:
We want to avoid putting the same entries in the blobstore twice. Even better, we want to avoid generating the list of these entries in the first place.
The first approach was to add a `Mutex<HashSet>` that worker threads would use to filter out entries that were already imported. It turned out that this Mutex kills almost all the speedup from concurrency.
But since we have linkrevs, for each entry we know in which commit it was created [1]. That means that all of the entries are already nicely split between the threads, so no synchronization is needed.
It gives a good speedup: importing the hg upstream treemanifest repo using the file blobstore went from ~7 min to 2 min.
Note: there is still lock contention, because the tree revlog and file revlog maps are protected by a mutex. We can optimize it later if needed.
[1] There is a well-known linkrev issue in Mercurial. It shouldn't affect our case at all.
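
A rough sketch of the linkrev idea, under the assumption that each worker handles one changeset and sees every entry reachable from it. `RevlogEntry`, `entries_owned_by` and `import_all` are hypothetical names for illustration:

```rust
use std::thread;

/// Hypothetical revlog entry: `linkrev` is the changeset revision that introduced it.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct RevlogEntry {
    pub id: u32,
    pub linkrev: usize,
}

/// Entries a worker should import while processing changeset `rev`:
/// only those whose linkrev points back at `rev`.
pub fn entries_owned_by(rev: usize, visible: &[RevlogEntry]) -> Vec<RevlogEntry> {
    visible.iter().filter(|e| e.linkrev == rev).cloned().collect()
}

/// Processes each changeset on its own thread without any shared "seen" set.
/// `changesets[rev]` lists every entry reachable from changeset `rev`.
pub fn import_all(changesets: Vec<Vec<RevlogEntry>>) -> Vec<RevlogEntry> {
    let handles: Vec<_> = changesets
        .into_iter()
        .enumerate()
        .map(|(rev, visible)| thread::spawn(move || entries_owned_by(rev, &visible)))
        .collect();
    handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

Because the linkrev check depends only on the entry and the changeset being processed, the workers never need to consult each other, which is what removes the `Mutex<HashSet>` from the hot path.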
Reviewed By: jsgf
Differential Revision: D5650074
fbshipit-source-id: c4f9e2763127ffe4402417dd3963f1f450d7b325
Summary: The main part is `get_stream_of_manifest_entries`, which creates a stream of all tree manifest entries by recursively going through all of them.
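
The recursion can be illustrated on a toy in-memory tree. This sketch is eager and collects into a `Vec` rather than producing a futures stream, and `Node` and `all_entries` are hypothetical names, so it shows only the traversal order, not the real function:

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory tree manifest: a directory maps names to files or subtrees.
pub enum Node {
    File(Vec<u8>),
    Tree(BTreeMap<String, Node>),
}

/// Collects the full path of every entry (trees and files) under `node`,
/// visiting each subtree right after the entry that names it.
pub fn all_entries(prefix: &str, node: &Node, out: &mut Vec<String>) {
    if let Node::Tree(children) = node {
        for (name, child) in children {
            let path = if prefix.is_empty() {
                name.clone()
            } else {
                format!("{}/{}", prefix, name)
            };
            out.push(path.clone());
            // Recurse; this is a no-op for files.
            all_entries(&path, child, out);
        }
    }
}
```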
Reviewed By: jsgf
Differential Revision: D5622490
fbshipit-source-id: 4a8b2707df0300a37931c465bafb1ed54d6d4d25
Summary:
A preparation step before blob importing of tree manifest repos into blobrepo.
The `get_parents()` method of BlobEntry reads parents from the blobstore. It works fine for file entries because file entries store their parents in the blobstore. With tree manifests, a BlobEntry can also be a tree manifest entry, which means that the parents of tree manifest entries must also be stored somewhere in the blobstore.
I suggest using the same logic for tree manifest entries as for file entries: each file or manifest entry has two blobstore entries, one storing the hash of the content and the parents, the other storing the actual content.
To do this I moved `RawNodeBlob` and `get_node()` to a separate module and made the fields public.
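
The two-record layout can be sketched as follows. This is a toy model, not the real `RawNodeBlob`: `RawNode`, `ToyBlobstore` and the string ids are hypothetical, and the content hash is supplied by the caller rather than computed:

```rust
use std::collections::HashMap;

/// Hypothetical node record: the parents plus the hash of the content.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct RawNode {
    pub parents: Vec<String>,
    pub content_hash: String,
}

/// Toy blobstore keeping the two kinds of records in separate maps.
#[derive(Default)]
pub struct ToyBlobstore {
    nodes: HashMap<String, RawNode>,
    contents: HashMap<String, Vec<u8>>,
}

impl ToyBlobstore {
    /// Stores both records for one entry, whether it is a file or a tree manifest.
    pub fn put(&mut self, node_id: &str, parents: Vec<String>, content_hash: &str, content: Vec<u8>) {
        let node = RawNode { parents, content_hash: content_hash.to_string() };
        self.nodes.insert(node_id.to_string(), node);
        self.contents.insert(content_hash.to_string(), content);
    }

    /// Reads parents from the node record only; the content is not touched.
    pub fn get_parents(&self, node_id: &str) -> Option<&[String]> {
        self.nodes.get(node_id).map(|n| n.parents.as_slice())
    }

    /// Follows the node record to the content record.
    pub fn get_content(&self, node_id: &str) -> Option<&[u8]> {
        let node = self.nodes.get(node_id)?;
        self.contents.get(&node.content_hash).map(|c| c.as_slice())
    }
}
```

With this shape, looking up parents works the same way for tree manifest entries as for file entries.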
Reviewed By: jsgf
Differential Revision: D5622342
fbshipit-source-id: c9f0c446107d4697b042544ff8b37a159064f061
Summary:
Instead of storing `Vec<u8>`, let's store `Vec<PathComponent>`, where `PathComponent` is a `Vec<u8>` without `b'/'`.
To make sure `len()` is still `O(1)`, let's store the length too.
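
A minimal sketch of such a representation, assuming `/` is the separator (as in Mercurial paths) and that `len()` means the byte length of the full path. `RepoPath` is a hypothetical name, and the validation rules here are illustrative only:

```rust
/// A path element: non-empty bytes that contain no `/` separator.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct PathComponent(Vec<u8>);

impl PathComponent {
    pub fn new(bytes: &[u8]) -> Option<Self> {
        if bytes.is_empty() || bytes.contains(&b'/') {
            None
        } else {
            Some(PathComponent(bytes.to_vec()))
        }
    }
}

/// A path stored as components, with the total byte length cached.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct RepoPath {
    components: Vec<PathComponent>,
    len: usize,
}

impl RepoPath {
    /// Splits `raw` on `/`; returns None if any component is invalid.
    pub fn new(raw: &[u8]) -> Option<Self> {
        let components = raw
            .split(|&b| b == b'/')
            .map(PathComponent::new)
            .collect::<Option<Vec<_>>>()?;
        Some(RepoPath { components, len: raw.len() })
    }

    /// O(1): returns the cached length instead of summing the components.
    pub fn len(&self) -> usize {
        self.len
    }

    pub fn components(&self) -> &[PathComponent] {
        &self.components
    }
}
```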
Reviewed By: sid0
Differential Revision: D5573721
fbshipit-source-id: 91967809284d79bf0fcdcabcae9fd787a37c318b