Summary:
batch_derive() is a dangerous function to use. I'd love to delete it, but it is
very useful for backfilling, so unfortunately I can't.
The problem arises when one tries to backfill blame and unodes simultaneously
(or just to derive blame, which in turn derives unodes). While batch_derive()
is careful about inserting the "outer" derived data's mapping (i.e. the blame
mapping), it is not careful about inner derived data mappings (i.e. unodes). So
we might end up in a situation where the unodes mapping is inserted before all
of its manifests have been written. If derivation fails in the middle, we are
left with corruption.
Let's not use it in blobimport. This will make derivation slower, but I'd
rather have it slow than incorrect.
Reviewed By: farnz
Differential Revision: D21905619
fbshipit-source-id: c0227df195a8cf4482b2452ca928acbc5750b3e5
Summary:
I want to reuse the functionality provided by `fetch_all_public_changesets`
when building Segmented Changelog. To share the code, I am adding a new crate
intended to hold utilities for dealing with bulk fetches.
Reviewed By: krallin
Differential Revision: D21471477
fbshipit-source-id: 609907c95b438504d3a0dee64ab5a8b8b3ab3f24
Summary:
subcommand_single calls `derived_data_utils.regenerate(vec![cs_id])` with the
intention that derived data for this commit will be regenerated. However,
previously this didn't work because DerivedDataUtils::derive() was ignoring the
regenerate parameter. This diff fixes that.
Reviewed By: krallin
Differential Revision: D21527344
fbshipit-source-id: 56d93135071a7f3789262b7a9d9ad84a0896c895
Summary:
This diff logs the delay in deriving data. In particular, it logs how much time
has passed since an underived commit was created.
Note that this code assumes monotonic commit dates - for repos with pushrebase
that should be the case.
Reviewed By: krallin
Differential Revision: D21427265
fbshipit-source-id: bfddf594467dfd2424f711f895275fb54a4e1c60
Summary:
Two things will be simplified:
1) Do not pass sqlbookmarks - we can always get them from blobrepo.
2) Instead of passing a repo per derived data type, always pass the unredacted
repo.
Also adds a very simple unit test.
Differential Revision: D21426885
fbshipit-source-id: 712ef23340466438bf34a086517f7ba33d4eabed
Summary:
- Change the return value of `Blobstore::get` from `BlobstoreBytes` to `BlobstoreGetData`, which includes `ctime` metadata
- Update the call sites and tests broken by this change
- Change `ScrubHandler::on_repair` to accept metadata and log ctime
- Make `Fileblob` and `Manifoldblob` attach the ctime metadata
- Add tests for fileblob in `mononoke:blobstore-test` and the integration test `test-walker-scrub-blobstore.t`
- Make cachelib-based caching use `BlobstoreGetData`
Reviewed By: ahornby
Differential Revision: D21094023
fbshipit-source-id: dc597e888eac2098c0e50d06e80ee180b4f3e069
Summary:
Currently we need to specify which derived data types to derive, even though
they are already specified in the configerator configs. Let's just read them
from there.
That means we no longer need to update the tw spec to add new derived data types - we'll just need to add them to configerator and restart the backfiller.
Reviewed By: krallin
Differential Revision: D21378640
fbshipit-source-id: f97c3f0b8bb6dbd23d5a50f479ecfccbebd33897
Summary:
In the next diffs I'd like to introduce a cleaner for unodes. This diff just
moves a bunch of code around to make reviewing those diffs easier.
Reviewed By: krallin
Differential Revision: D21226921
fbshipit-source-id: c9f9b37bf9b11f36f8fc070dfa293fd8e6025338
Summary:
This diff adds a special dry-run mode of backfilling (for now, only fsnodes are
supported). It works by keeping all derived data in memory (i.e. nothing is
written to the blobstore) and periodically cleaning entries that can no longer
be referenced.
This mode can be useful e.g. to estimate the size of derived data before
actually running the derivation.
Note that it requires --readonly-storage to make sure we don't accidentally
write anything to e.g. mysql.
Reviewed By: ahornby
Differential Revision: D21088989
fbshipit-source-id: aeb299d5dd90a7da1e06a6be0b6d64b814bc7bde
Summary: File is getting too large - let's split it
Reviewed By: farnz
Differential Revision: D21180807
fbshipit-source-id: 43f0af8e17ed9354a575b8f4dac6a9fe888e8b6f
Summary: See comments for more details
Reviewed By: krallin
Differential Revision: D21088800
fbshipit-source-id: b4c187b5d4d476602e69d26d71d3fe1252fd78e6
Summary: I think it's more readable this way
Reviewed By: krallin
Differential Revision: D21088598
fbshipit-source-id: 1608c250701ae6870094f0f61c0c2ce4e2c12ebf
Summary:
In the next few diffs I'm going to add more functionality to backfill derived
data. The file has grown quite big already, so I'd rather put this new
functionality in a separate file. This diff does the first step - it just moves
the file to a separate directory.
Reviewed By: farnz
Differential Revision: D21087813
fbshipit-source-id: 4a8e3eac4b8d478aa4ceca6bb55fa0d2973068ba