Summary:
At open time, it's pointless to attempt to create new levels. So let's just
read the existing max_level and do not try to build max_level + 1.
This turns out to save 300ms in profiling result.
Reviewed By: sfilipco
Differential Revision: D23494509
fbshipit-source-id: 4ea326a3cc21792790ea0b87e5bf608a94ae382b
Summary:
With MultiLog, per-log meta was previously entirely ignored. However, they can
be useful for updated indexes. For example, application defines a new index,
and opens a Log via MultiLog. The application would expect the new index is
built only once. Without MultiLog, per-log meta is updated at open time in
place. With MultiLog, the updated index meta is not written back to the
multimeta so the new index would be rebuilt multiple times undesirably.
Update MultiLog to reuse the per-log meta if it's compatible so it can pick up
new indexes.
Reviewed By: sfilipco
Differential Revision: D23488212
fbshipit-source-id: c8b3e6b5589dbda2e76a143d15085862a93dae22
Summary:
The poisoned meta makes investigation harder. ex. `debugdumpindexlog` won't
work on those logs.
Reviewed By: sfilipco
Differential Revision: D23488213
fbshipit-source-id: b33894d8c605694b6adf5afdaed45707fbd7357e
Summary:
I'm going to change this function soon, so it's nice to asyncify it to make
next diffs simpler and also remove duplicated logic.
Also remove unnecessary `logger` parameter - we can always get logger from CoreContext
Reviewed By: krallin
Differential Revision: D23501634
fbshipit-source-id: 7ad2fc17167e4107481ceb230e0b7cb3e7f2549a
Summary: This closely replicates EscapeZero work in D23328638 and will allow us to issue requests to SCS using Thrift Fiddle (https://www.internalfb.com/thrift_fiddle).
Reviewed By: EscapeZero
Differential Revision: D23475864
fbshipit-source-id: fb286e3fcd6ea79704fa2e7e1ed9ab5595ff7b81
Summary: Now that post-request callbacks are available in `gotham_ext`, we can make `StreamBody` use them directly instead of using an LFS-specific wrapper (previously required to access the LFS server's `RequestContext`). This also means that the EdenAPI server will get this behavior for free.
Reviewed By: krallin
Differential Revision: D23402969
fbshipit-source-id: 56ab710473f13e8983b136664af364af6884bd3f
Summary: Add `LogMiddleware` to the EdenAPI server, which will print a log message whenever a request is received or has completed.
Reviewed By: DurhamG
Differential Revision: D23299902
fbshipit-source-id: f44ef1b01692f0e4f9b109917fcee89a84ca4208
Summary: Use `LoadMiddleware` to track the number of outstanding requests in the server.
Reviewed By: DurhamG
Differential Revision: D23298415
fbshipit-source-id: bdcdb0f657d8deac593d356c87ac0d8d3f39e322
Summary: Now that `LogMiddleware` no longer depends on `RequestContext`, it can be moved into `gotham_ext`.
Reviewed By: DurhamG
Differential Revision: D23298412
fbshipit-source-id: d5288decba98c3dd4605b9a44e41eba0f47fee37
Summary: Now that `LoadMiddleware` no longer depends on `RequestContext`, it can be moved into `gotham_ext`.
Reviewed By: DurhamG
Differential Revision: D23298416
fbshipit-source-id: 5d29da492e39beb5621daf0570d9b3e657cbfc04
Summary: This diff removes the post-request callback functionality from the LFS server's `RequestContext` and replaces it with the new `PostRequestMiddleware`. The middleware is directly based on `RequestContext`, so the underlying behavior is essentially the same as before.
Reviewed By: krallin
Differential Revision: D23298413
fbshipit-source-id: 1e58a40f6ce6d526456dbd9ae3a8efc85768bf04
Summary: Make `PostRequestMiddleware` generic over a user-provided config struct which can be used to dynamically configure the behavior of post-request callback dispatching. Right now this is only used to support disabling hostname logging, but could be easily extended to cover more uses in the future.
Reviewed By: krallin
Differential Revision: D23495005
fbshipit-source-id: 3d59a8346f449775ec76d03c260d973d04fb90a9
Summary: Add new middleware that allows HTTP handlers and other middleware to register callbacks that will be run once the current request completes. This is heavily based on the post-request callback functionality from the LFS server's `RequestContext`. The intention here is to expose this functionality in a manner that's independent of other, application-specific logic.
Reviewed By: krallin
Differential Revision: D23298419
fbshipit-source-id: e4b1534b02c35f685ce544de13e331947e187818
Summary:
I pattern matched off of this for the previous diff in this stack, and spotted
a bit of clean up that might make sense here:
- Using `.help()` for a subcommand overrides the whole help text. We meant to
use `.about()` here. I fixed this in some copy-pasted code as well.
- Printing debug output alongside real output makes it harder to select the
real output. I fixed this by logging debug output to stderr instead.
Reviewed By: StanislavGlebik
Differential Revision: D23471560
fbshipit-source-id: 7900cfe65613c48abd77faad6d6a45a7aa523b36
Summary:
This adds a subcommand for dumping all the paths in a repository. This is
helpful when you have a Content ID, limited imagination and time on your hands,
and you'd like to turn those into a file path where that Content ID lives.
This uses fsnodes for the traversal because that's O(# directories) as opposed
top O(# files). I had an earlier implementation that used unodes, but that was
really slow.
Reviewed By: StanislavGlebik
Differential Revision: D23471561
fbshipit-source-id: 948bfd20939adf4de0fb1e4b2852ad4d12182f16
Summary:
add backsyncing to rewrite file paths:
After setting the variables for large repo (D23294833 (d6895d837d)), we try to import the git commits into large repo and rewrite the file paths.
Following this, repo import tool should back-sync the commits into small_repo.
next step: derive all the data types for both small and large repos. Currently, we only derive it for the large repo.
==============
remove backup file:
The backup file was a last-minute addition when trying to import a repo for the first time.
Removed it, because we shouldn't write to external files. Future plan is to include
better process recoverability across the whole tool and not just rewrite file paths functionality.
Reviewed By: StanislavGlebik
Differential Revision: D23452571
fbshipit-source-id: bda39694fa34788218be795319dbbfd014ba85ff
Summary: More hooks will come in next diffs.
Reviewed By: aslpavel
Differential Revision: D23449755
fbshipit-source-id: 451fdb7a759140f2d6df8f3a18493c700fa2b761
Summary:
That's one of the sev followups. Before redacting a file content let's check if
it exists in "main-bookmark" (which is be default master), and refuse to redact
if it actually exists.
If this check passes (i.e. the content we are about to redact is not reachable
from master) that doesn't mean that we are 100% safe. E.g. this comment can be
in ancestor of master, or in any other repo or it can be added in the next
commit.
This check is a best-effort check to prevent shooting ourselves in the foot.
Reviewed By: aslpavel
Differential Revision: D23476278
fbshipit-source-id: 5a4cd10964a65b8503ba9a6391f17319f0ce37d8
Summary:
The loop took care to advance `b` to match the amount
of data that it had processed, but was still passing `buf`
(the unadjusted start of the buffer) to the syscalls.
This meant that in situations where a `readFull` might
encounter a partial read, it would scribble over the start
of the buffer and leave junk at the end.
For example:
write("hell");
write("o");
could produce "oell?" in the buffer when `readFull` consumes
the other end of the pipe.
Reviewed By: xavierd
Differential Revision: D23486270
fbshipit-source-id: 0848f6789b44421b609b91fe08890768ff59f7f5
Summary:
Currently we use a single path prefix to configure data fetch logging in eden
(i.e if the path of a file which we fetch is an extension of our configured
path, then we log that data fetch. )
There is some interest in extending this to multiple path prefixes, so that we
can log separate parts repo.
Reviewed By: StanislavGlebik
Differential Revision: D22877942
fbshipit-source-id: f6eb3dcb4fa460b4acab09677e972caf9421ddff
Summary:
We use Re2 in D22877942 for parsing multiple path prefix data fetch logging,
this introduces the dependency for eden's opensource builds.
Reviewed By: chadaustin
Differential Revision: D23431175
fbshipit-source-id: 44b399e92cb89ba1403295ecd10bc8f8d769b02c
Summary: This code can be used on Mac as well, so I can just move it to `UnixProcUtils` to be shared. I think to start it we can just try using this before trying to add special idleness detection with looking for active screensavers etc.
Reviewed By: fanzeyi
Differential Revision: D23183163
fbshipit-source-id: fffad8314e70f8726836c482f7a5e30e57a75c0d
Summary: We don't need to restart users if their running version is the same as their installed version, so we should check that when deciding if we should restart. This will give us more freedom in restarts since we won't have to play with `min_uptime`. I will add a flag to skip this check in case for some reason we need to do so on the fly.
Reviewed By: wez
Differential Revision: D23438306
fbshipit-source-id: b17c0e13789071b8b7c1b15ac5a8deb74a4fd091
Summary: I want to be able to reverse engineer an EdenInstance in the `edenfs_restarter` given the cmdline of the process. I think this best lives in the `config.py` file.
Reviewed By: fanzeyi
Differential Revision: D23438318
fbshipit-source-id: b3d9ac3981d3fb2bb8045b07b8d949cd601f6898
Summary: The latter will not strip new lines from the system error message, while the former does.
Reviewed By: genevievehelsel
Differential Revision: D23480435
fbshipit-source-id: 44742b960935552fa1781ed19f38ff446a8c9403
Summary:
Change dag_ops benchmarks to use different IdDagStores. An example run shows:
benchmarking dag::iddagstore::indexedlog_store::IndexedLogStore
building segments (old) 856.803 ms
building segments (new) 127.831 ms
ancestors 54.288 ms
children (spans) 619.966 ms
children (1 id) 12.596 ms
common_ancestors (spans) 3.050 s
descendants (small subset) 35.652 ms
gca_one (2 ids) 164.296 ms
gca_one (spans) 3.132 s
gca_all (2 ids) 270.542 ms
gca_all (spans) 2.817 s
heads 247.504 ms
heads_ancestors 40.106 ms
is_ancestor 108.719 ms
parents 243.317 ms
parent_ids 10.752 ms
range (2 ids) 7.370 ms
range (spans) 23.933 ms
roots 620.150 ms
benchmarking dag::iddagstore::in_process_store::InProcessStore
building segments (old) 790.429 ms
building segments (new) 55.007 ms
ancestors 8.618 ms
children (spans) 196.562 ms
children (1 id) 2.488 ms
common_ancestors (spans) 545.344 ms
descendants (small subset) 8.093 ms
gca_one (2 ids) 24.569 ms
gca_one (spans) 529.080 ms
gca_all (2 ids) 38.462 ms
gca_all (spans) 540.486 ms
heads 103.930 ms
heads_ancestors 6.763 ms
is_ancestor 16.208 ms
parents 103.889 ms
parent_ids 0.822 ms
range (2 ids) 1.748 ms
range (spans) 6.157 ms
roots 197.924 ms
benchmarking dag::iddagstore::bytes_store::BytesStore
building segments (old) 724.467 ms
building segments (new) 90.207 ms
ancestors 23.812 ms
children (spans) 348.237 ms
children (1 id) 4.609 ms
common_ancestors (spans) 1.315 s
descendants (small subset) 20.819 ms
gca_one (2 ids) 72.423 ms
gca_one (spans) 1.346 s
gca_all (2 ids) 116.025 ms
gca_all (spans) 1.470 s
heads 155.667 ms
heads_ancestors 19.486 ms
is_ancestor 51.529 ms
parents 157.285 ms
parent_ids 5.427 ms
range (2 ids) 4.448 ms
range (spans) 13.874 ms
roots 365.568 ms
Overall, InProcessStore > BytesStore > IndexedLogStore. The InProcessStore
uses `Vec<BTreeMap<Id, StoreId>>` for the level-head index, which is more
efficient on the "Level" lookup (Vec), and more cache efficient (BTree).
BytesStore outperforms IndexedLogStore because it does not need to verify
checksum on every read access - the checksum was verified at store creation
(IdDag::from_bytes).
Note: The `BytesStore` is something optimized for serialization, and hasn't been sent.
Reviewed By: sfilipco
Differential Revision: D23438174
fbshipit-source-id: 6e5f15188e3b935659ccde25fac573e9b963b78f
Summary: This allows them to use the SyncableIdDag APIs.
Reviewed By: sfilipco
Differential Revision: D23438170
fbshipit-source-id: 7ec7288cfb8186b88f85f0212a913cb0dffe7345
Summary: Other IdDagStores can also use the API. This will be used in benchmarks.
Reviewed By: sfilipco
Differential Revision: D23438180
fbshipit-source-id: 565552b66372dcfbb268c397883f627491d6e154
Summary:
Similar to `IdDagStore::sync` -> `GetLock::persist`, `reload` is more related
to filesystem/internal state exchange, and should be protected by a lock. So
let's move the API there, and requires a lock.
Reviewed By: sfilipco
Differential Revision: D23438169
fbshipit-source-id: 4228106b7739a1a758677adfddd213ad54aa4b6a
Summary:
`NameDag::reload` is used in `flush` to get a "fresh" NameDag.
In a future diff the `IdDag::reload` API gets changed, so let's
remove NameDag's use of it.
Instead, let's just re-`open` the path again to get a fresh NameDag.
It's a bit more expensive but probably okay, and easier to understand.
`get_new_segment_size()` was added as an internal API to preserve tests.
This also solves an issue where `NameDag` cannot recover properly if its
`flush` fails, because the old `NameDag` state is not lost.
After removing `NameDag::reload`, `idMap::reload` is no longer used publicly
and was made private.
Reviewed By: sfilipco
Differential Revision: D23438179
fbshipit-source-id: 0a32556a2cd786919c233d7efcae1cb9cbc5fb09
Summary:
The word "sync" is bi-directional: flush + reload. It was indexedlog::Log's
behavior. However, in the IdDag context "sync" is confusing - it is actually
only used to write data out, with protection from lock. Rename to `persist`
to clarify it's memory -> disk. Besides, requires a reference to a lock object
as a lightweight prove that some lock is held.
Reviewed By: sfilipco
Differential Revision: D23438175
fbshipit-source-id: 3d9ccd7431691d1c4e2ee74f3c80d95f5e7243b5
Summary:
This removes the need of cloning `IdMap`.
SyncableIdMap is a bit tricky. I added some comments to clarify things.
Reviewed By: sfilipco
Differential Revision: D23438176
fbshipit-source-id: fe66071da07067ed6c53a6437790af1d81b28586
Summary:
Make the test cover IndexedLogIdDagStore. The only change is the parent index
returns children in a different order.
Reviewed By: sfilipco
Differential Revision: D23438173
fbshipit-source-id: bcfabcd329e45bbc5e7e773103fa42307c23c35d
Summary:
We are unifying C++ APIs for accessing optional and unqualified fields:
https://fb.workplace.com/groups/1730279463893632/permalink/2541675446087359/.
This diff migrates code from accessing data members generated from unqualified
Thrift fields directly to the `field_ref` API, i.e. replacing
```
thrift_obj.field
```
with
```
*thrift_obj.field_ref()
```
The `_ref` suffixes will be removed in the future once data members are private
and names can be reclaimed.
The output of this codemod has been reviewed in D20039637.
The new API is documented in
https://our.intern.facebook.com/intern/wiki/Thrift/FieldAccess/.
drop-conflicts
Reviewed By: simpkins
Differential Revision: D23465292
fbshipit-source-id: bb9df3ad183685fae28173da7275e6ecd34df048
Summary:
There aren't too many thigs that we can do with the responses that we get back
from the server. Thigs are somewhat application specific for this endpoint.
One option that is not available right now and might make sense to add is
limiting the number of entries that are printed for a given location.
Reviewed By: kulshrax
Differential Revision: D23456220
fbshipit-source-id: eb24602c3dea39b568859b82fc27b7f6acc77600
Summary:
To reduce the size over the wire on cases where we would be traversing the
changelog on the client, we want to allow the endpoint to return a whole parent
chain with their hashes.
Reviewed By: kulshrax
Differential Revision: D23456216
fbshipit-source-id: d048462fa8415d0466dd8e814144347df7a3452a
Summary:
Renaming all the LocationToHash related structures to CommitLocationToHash.
This is done for consistency. I realized the issue when the command for reading
the request from cbor was not what I was expecting it to be. The reason was that
the commit prefix was used inconsistently for LocationToHash.
Reviewed By: kulshrax
Differential Revision: D23456221
fbshipit-source-id: 0181dcaf81368b978902d8ca79c5405838e4b184
Summary:
Segmented Changelog is a component that has multiple components of each own
that each can be configured in different ways. It seems that it already is
more complicated than other components in how it is set up and it will probably
evolve to have more knobs (caching comes to mind).
Right now we have 3 ways of instantiating SegmentedChangelog:
- Disabled, all requests return errors
- ReadOnly, requests to unprocessed commits return errors
- OnDemandUpdate, requests trigger commit processing when required
Reviewed By: aslpavel
Differential Revision: D23456217
fbshipit-source-id: a6016f05197abbc3722764fa8e9056190a767b36