Commit Graph

2587 Commits

Author SHA1 Message Date
Durham Goode
474b043a34 grep: fix biggrep integration when corpus rev is not present
Summary:
The corpus rev that biggrep has indexed may not be available in the
local client. Later on in the function it will pull that revision, but earlier
in the function the new logic I added a few weeks ago is just crashing.

That logic was trying to diff against the earlier revision, but that's pretty
arbitrary. Let's just diff against one of the revs at random
(deterministically) and get rid of the need for the hash to exist in the repo
early in the command.

Reviewed By: sfilipco

Differential Revision: D23635801

fbshipit-source-id: 1c284d710b8df9539a696e900183bc10d5d71869
2020-09-10 18:01:38 -07:00
Durham Goode
f5a2347fbb py3: fix Mononoke Python 3 test failures
Summary:
Fixes a few issues with Mononoke tests in Python 3.

1. We need to use different APIs to account for the unicode vs bytes difference
for path hash encoding.
2. We need to set the language environment for tests that create utf8 file
paths.
3. We need the redaction message and marker to be bytes.  Oddly this test still
fails with jq CLI errors, but it makes it past the original error.

Reviewed By: quark-zju

Differential Revision: D23582976

fbshipit-source-id: 44959903aedc5dc9c492ec09a17b9c8e3bdf9457
2020-09-09 18:31:04 -07:00
Xavier Deguillard
ed4021b8e3 revisionstore: disallow reading LFS pointers from packfiles
Summary:
For repositories that have the old-style LFS extension enabled, the pointers
are stored in packfiles/indexedlog alongside a flag that signifies to the
upper layers that the blob is externally stored. With the new way of doing LFS,
pointers are stored separately.

When both are enabled, we are observing some interesting behavior where
different get and get_meta calls may return different blobs/metadata for the
same filenode. This may happen if a filenode is stored both in a packfile as an
LFS pointer and in the LFS store. Guaranteeing that the revisionstore code is
deterministic in this situation is unfortunately way too costly (a get_meta
call would for instance have to fully validate the sha256 of the blob, and this
wouldn't guarantee that it wouldn't become corrupted on disk before calling
get).

The solution taken here is to simply ignore all the LFS pointers from
packfiles/indexedlog when remotefilelog.lfs is enabled. This way, there is no
risk of reading the metadata from the packfiles and the blob from the
LFSStore. This brings, however, another complication for user-created blobs:
these are stored in packfiles and would thus become unreadable. The solution is
to simply perform a one-time full repack of the local store to make sure that
all the pointers are moved from the packfiles to the LFSStore.

In the code, the Python bindings use ExtStoredPolicy::Ignore directly, as
these are only used in the treemanifest code where no LFS pointers should be
present. The repack code uses ExtStoredPolicy::Use so that it can read the
pointers, which it could not do otherwise.
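
The policy described above can be sketched in Python as follows. This is only an illustration of the idea, not the Rust revisionstore code: the `get_from_pack` helper and the `(data, is_lfs_pointer)` entry layout are assumptions made for the example.

```python
from enum import Enum


class ExtStoredPolicy(Enum):
    IGNORE = "ignore"  # pretend LFS pointers in packfiles do not exist
    USE = "use"        # return LFS pointers (what the repack path needs)


def get_from_pack(entries, key, policy):
    """Look up `key` in a hypothetical pack: {key: (data, is_lfs_pointer)}."""
    entry = entries.get(key)
    if entry is None:
        return None
    data, is_lfs_pointer = entry
    if is_lfs_pointer and policy is ExtStoredPolicy.IGNORE:
        # Report a miss so the caller falls through to the LFS store.
        return None
    return data


pack = {"a": (b"plain blob", False), "b": (b"lfs pointer", True)}
ignored = get_from_pack(pack, "b", ExtStoredPolicy.IGNORE)
used = get_from_pack(pack, "b", ExtStoredPolicy.USE)
```

With `IGNORE`, a pointer entry behaves like a miss, so metadata and blob can never come from two different places.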

Reviewed By: DurhamG

Differential Revision: D22951598

fbshipit-source-id: 0e929708ba5a3bb2a02c0891fd62dae1ccf18204
2020-09-09 18:27:42 -07:00
Stefan Filip
1c172c9008 lfs: use hg-http built client for network requests
Summary: This client provides automatic metrics collection.

Reviewed By: kulshrax

Differential Revision: D23577871

fbshipit-source-id: 137299222a20bc8e4d52c3321febbb91d861b236
2020-09-09 17:35:49 -07:00
Stefan Filip
046db98222 edenapi: use hg-http built client for network requests
Summary:
hg-http's built client should provide integration with Mercurial's stats
collection mechanisms.

Reviewed By: kulshrax

Differential Revision: D23577867

fbshipit-source-id: 93c777021bc347511322269d678d6879710eed3e
2020-09-09 17:35:48 -07:00
Stefan Filip
c1ab6a4e92 http-client: add stats reporting hook
Summary:
Add `with_stats_reporting` to HttpClient. It takes a closure that will be
called with all `Stats` objects generated. We then use this function in
the hg-http crate to integrate with the metrics backend used in Mercurial.
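
The hook pattern can be illustrated with a short Python sketch. The real API is Rust; the `Stats` fields and the `send` method below are hypothetical and exist only to show a closure being called with every generated `Stats` object.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stats:
    # Hypothetical fields; the real Stats type may differ.
    downloaded: int
    requests: int


class HttpClient:
    def __init__(self) -> None:
        self._report: Optional[Callable[[Stats], None]] = None

    def with_stats_reporting(self, callback: Callable[[Stats], None]) -> "HttpClient":
        """Register a callback that receives every Stats object."""
        self._report = callback
        return self

    def send(self, payload_sizes: List[int]) -> Stats:
        # Stand-in for real network I/O: derive stats from payload sizes.
        stats = Stats(downloaded=sum(payload_sizes), requests=len(payload_sizes))
        if self._report is not None:
            self._report(stats)
        return stats


reported: List[Stats] = []
client = HttpClient().with_stats_reporting(reported.append)
client.send([10, 20])
```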

Reviewed By: kulshrax

Differential Revision: D23577869

fbshipit-source-id: 5ac23f00183f3c3d956627a869393cd4b27610d4
2020-09-09 17:35:48 -07:00
Stefan Filip
008d0c82df metrics: use the hgmetrics bindings for incrementing counters
Summary: Rust based metrics so that even Rust libraries can write metrics.

Reviewed By: quark-zju

Differential Revision: D23577870

fbshipit-source-id: b19904968d9372c8ce19775fb37c7af53a370ea5
2020-09-09 17:35:48 -07:00
Stefan Filip
de9b34e83a bindings: add pyhgmetrics to bind the hg-metrics crate
Summary: Exposing the hg-metrics crate to the Python application.

Reviewed By: quark-zju

Differential Revision: D23577875

fbshipit-source-id: 1d919160f8514ae8bfcb0171a0c9d1d9d0de80e6
2020-09-09 17:35:48 -07:00
Stefan Filip
7f72a04c0e metrics: crate for collecting metrics
Summary:
We start off simple here. Python only really has counters so we only implement
counters. There are a lot of options on how to improve this and things get
slightly complicated when we look at the whole ecosystem and fb303. Anyway,
simple start.

Reviewed By: quark-zju

Differential Revision: D23577874

fbshipit-source-id: d50f5b2ba302d900b254200308bff7446121ae1d
2020-09-09 17:35:48 -07:00
Stefan Filip
ead17552cf metrics: treat slash '/' as metric delimiter
Summary:
Slash is probably the standard metric delimiter nowadays. Since we don't have
that many metrics I think that it makes sense to look at slash as the
standard metric delimiter going forward.
This diff updates parsing of metric names to treat both '_' and '/' as
delimiters.
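
A minimal sketch of that parsing rule, assuming a hypothetical helper named `split_metric_name` (this is not the actual metrics code):

```python
import re
from typing import List


def split_metric_name(name: str) -> List[str]:
    """Split a metric name on either '_' or '/'."""
    # Drop empty components produced by leading or repeated delimiters.
    return [part for part in re.split(r"[_/]", name) if part]


components = split_metric_name("edenapi/requests_total")
```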

Reviewed By: quark-zju

Differential Revision: D23577876

fbshipit-source-id: 03997b1285df9c52d6e2837b5af5372deb69b133
2020-09-09 17:35:48 -07:00
Stefan Filip
4ad9091598 thrift: update thrift types
Summary: autogenerated by `make local`

Reviewed By: quark-zju

Differential Revision: D23577872

fbshipit-source-id: 6ca98fd865c3b3bc3a00d8126ce20b59110f8118
2020-09-09 17:35:48 -07:00
Liubov Dmitrieva
321f4dfb31 add hg cloud switch command to simplify switching between
Summary:
The command is easier to use than `hg cloud join --switch`.

Also highlight the workspace name in the output of `hg cloud status`

Reviewed By: mitrandir77

Differential Revision: D23601507

fbshipit-source-id: 74eb17c9366a9dbe96881c8e3e0705619fadb3d6
2020-09-09 14:04:57 -07:00
Pavel Aslanov
897ec3d6d8 verify that received files have the correct size
Summary:
The streaming clone implementation did not check that received files have the correct size. This change addresses that.

Before this change, if the connection was interrupted for whatever reason, the client would treat the fetch of a changeset as successful and proceed with the clone, but later checks would report corruption of the internal state of hg data. This is based on a user [report](https://fb.workplace.com/groups/scm/permalink/3177150312334567/).
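
The kind of check involved can be sketched as below. This is an assumption-laden illustration, not the streaming clone code: `read_exact` and `SizeMismatch` are hypothetical names.

```python
import io


class SizeMismatch(Exception):
    """Raised when a stream ends before the announced size is reached."""


def read_exact(stream, expected_size, chunk_size=65536):
    """Read exactly `expected_size` bytes or raise SizeMismatch."""
    chunks = []
    received = 0
    while received < expected_size:
        chunk = stream.read(min(chunk_size, expected_size - received))
        if not chunk:
            break  # connection closed or stream truncated
        chunks.append(chunk)
        received += len(chunk)
    if received != expected_size:
        raise SizeMismatch(
            "expected %d bytes, got %d" % (expected_size, received)
        )
    return b"".join(chunks)


complete = read_exact(io.BytesIO(b"abcd"), 4)
```

A truncated stream now fails at fetch time instead of surfacing later as repository corruption.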

Reviewed By: quark-zju, krallin

Differential Revision: D23572058

fbshipit-source-id: d740b45ca217cd6db0a65e01aabc2ba9a4835221
2020-09-09 11:32:38 -07:00
Saurabh Singh
384c4f61fa fix the Windows build
Reviewed By: sfilipco

Differential Revision: D23601358

fbshipit-source-id: c5a33286b7468882bbedb3e8fe85f66a8f9db0e2
2020-09-09 10:39:35 -07:00
Arun Kulshreshtha
de7f7ab4fe http-client: rename crate
Summary: The Mercurial codebase uses hyphens in crate names rather than underscores. This is similar to the convention favored by the larger Rust community, though it is different from Mononoke, which uses underscores. While we'll probably need to eventually settle on a consistent convention for all projects in the Eden SCM repo, for now, `http_client` should be made consistent with the adjacent crates.

Reviewed By: sfilipco

Differential Revision: D23585721

fbshipit-source-id: d2e690d86815be02d7b8d645198bcd28e8cbd6e0
2020-09-09 10:12:50 -07:00
David Tolnay
e83e05ff25 Update formatter to rustfmt 2.0
Reviewed By: zertosh

Differential Revision: D23591028

fbshipit-source-id: f458503fc2b9c25023fa1643eca5e166882a4811
2020-09-09 07:52:34 -07:00
Lukasz Piatkowski
379065faab eden/scm: remove leftover of tokio-core after tokio 0.2 migration (#52)
Summary: Pull Request resolved: https://github.com/facebookexperimental/eden/pull/52

Reviewed By: krallin

Differential Revision: D23594074

Pulled By: lukaspiatkowski

fbshipit-source-id: 776c02418f4951321887f566bac8b76c9da8bcc1
2020-09-09 02:32:49 -07:00
Zeyi (Rice) Fan
5e02a93e91 eden-client: move to use tokio 0.2 socket transport
Summary: No more tokio-core! More `async/await`.

Reviewed By: kulshrax

Differential Revision: D23586509

fbshipit-source-id: b2e766ddb7575bc96963432f0c8582b4370b19aa
2020-09-08 20:24:26 -07:00
Zeyi (Rice) Fan
a6a73ec6b6 switch to tokio 0.2 transport
Summary:
This diff adds a `SocketTransport` implementation that no longer uses legacy `tokio-core` based futures but `tokio-tower` and `tower-service` for processing Thrift requests.

The old implementation is renamed to `SocketTransportLegacy` for better transitioning.

Reviewed By: dtolnay

Differential Revision: D20019196

fbshipit-source-id: 3bee684e9254bf1a81669ef0d2c2262a55e75daa
2020-09-08 17:53:57 -07:00
Saurabh Singh
858dbc6861 tests: fix 'test-remotefilelog-undesired-file-logging.t'
Reviewed By: DurhamG

Differential Revision: D23589645

fbshipit-source-id: 350bab980baa811824d7c4fd36d689a5a3395dd8
2020-09-08 17:36:35 -07:00
Durham Goode
2919268555 revisionstore: auto-delete when we have too much pack data
Summary:
In order to keep the hgcache size bounded we need to keep track of pack
file size even during normal operations and delete excess packs.

This has the negative side effect of deleting necessary data if the operation is
legitimately huge, but we'd rather have extra downloading time than fill up the
entire disk.
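
A rough sketch of the bounding logic, assuming an oldest-first eviction order (the commit does not state the order) and a hypothetical `packs_to_delete` helper:

```python
from typing import List, Tuple


def packs_to_delete(packs: List[Tuple[str, int, float]], max_bytes: int) -> List[str]:
    """Pick packs to delete so the total size fits in `max_bytes`.

    Each pack is (name, size_in_bytes, mtime). Oldest packs go first,
    which is an assumption made for this sketch.
    """
    total = sum(size for _name, size, _mtime in packs)
    doomed = []
    for name, size, _mtime in sorted(packs, key=lambda pack: pack[2]):
        if total <= max_bytes:
            break
        doomed.append(name)
        total -= size
    return doomed


packs = [("a.datapack", 50, 1.0), ("b.datapack", 40, 2.0), ("c.datapack", 30, 3.0)]
doomed = packs_to_delete(packs, max_bytes=80)
```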

Reviewed By: quark-zju

Differential Revision: D23486922

fbshipit-source-id: d21be095a8671d2bfc794c85918f796358dc4834
2020-09-08 11:33:50 -07:00
Durham Goode
717d10958f revisionstore: refactor pack iteration code
Summary:
In a future diff we'll add logic to delete old pack files. We'll want
to use this pack iteration code, so let's move it to a function.

Reviewed By: quark-zju

Differential Revision: D23486920

fbshipit-source-id: 5f872e946ffe816289c925dd2e03c292e29da5af
2020-09-08 11:33:50 -07:00
Durham Goode
651a0690be revisionstore: auto-commit datapacks when they get large
Summary:
As the repository grows the opportunity for large downloads increases.
Today all writes to data packs get sent straight to disk, but we have no way to
prevent this from eating all the disk.

Let's automatically flush datapacks when they reach a certain size (default
4GB). In a future diff this will let us automatically garbage collect data packs
to bound the maximum size of packs.

Rotatelog already has this behavior.
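
The size-triggered flush can be sketched in memory as follows. `RotatingPackWriter` is a hypothetical stand-in for the real datapack writer; only the 4GB default comes from the description above.

```python
class RotatingPackWriter:
    """Buffers entries and commits a pack once a size threshold is hit."""

    def __init__(self, max_bytes=4 * 1024 ** 3):
        self.max_bytes = max_bytes
        self.pending = []
        self.pending_bytes = 0
        self.flushed_packs = []  # stands in for pack files committed to disk

    def add(self, data):
        self.pending.append(data)
        self.pending_bytes += len(data)
        if self.pending_bytes >= self.max_bytes:
            self.flush()

    def flush(self):
        if self.pending:
            self.flushed_packs.append(b"".join(self.pending))
            self.pending = []
            self.pending_bytes = 0


writer = RotatingPackWriter(max_bytes=10)
writer.add(b"12345")
writer.add(b"67890")  # reaches the threshold, so a pack is committed
writer.add(b"x")      # starts a new pending pack
```

Because each committed pack is bounded in size, a later garbage collection pass can reason about whole packs.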

Reviewed By: quark-zju

Differential Revision: D23478780

fbshipit-source-id: 14f9f707e8bffc59260c2d04c18b1e4f6bdb2f90
2020-09-08 11:33:50 -07:00
Thomas Orozco
2948993c38 remotefilelog: add killswitch for client certs
Summary:
See D23538897 for context. This adds a killswitch so we can roll out client
certs gradually through dynamicconfig.

Reviewed By: StanislavGlebik

Differential Revision: D23563905

fbshipit-source-id: 52141365d89c3892ad749800db36af08b79c3d0c
2020-09-08 10:39:07 -07:00
Thomas Orozco
d1c4772da3 remotefilelog: use client certs when connecting to LFS
Summary:
Like it says in the title, this updates remotefilelog to present client
certificates when connecting to LFS (this was historically the case in the
previous LFS extension). This has a few upsides:

- It lets us understand who is connecting, which makes debugging easier.
- It lets us enforce ACLs.
- It lets us apply different rate limits to different use cases.

Config-wise, those certs were historically set up for Ovrsource, and the auth
mechanism will ignore them if not found, so this should be safe. That said, I'd
like to add a killswitch for this nonetheless. I'll reach out to Durham to see if I
can use dynamic config for that.

Also, while I was in there, I cleaned up a few functions that were taking
ownership of things but didn't need it.

Reviewed By: DurhamG

Differential Revision: D23538897

fbshipit-source-id: 5658e7ae9f74d385fb134b88d40add0531b6fd10
2020-09-08 10:39:07 -07:00
David Tolnay
e62b176170 Prepare for rustfmt 2.0
Summary:
Generated by formatting with rustfmt 2.0.0-rc.2 and then a second time with fbsource's current rustfmt (1.4.14).

This results in formatting for which rustfmt 1.4 is idempotent but is closer to the style of rustfmt 2.0, reducing the amount of code that will need to change atomically in that upgrade.

 ---

*Why now?* The 1.x branch is no longer being developed and fixes like https://github.com/rust-lang/rustfmt/issues/4159 (which we need in fbcode) only land to the 2.0 branch.

 ---

Reviewed By: zertosh

Differential Revision: D23568779

fbshipit-source-id: 477200f35b280a4f6471d8e574e37e5f57917baf
2020-09-07 20:47:59 -07:00
Mateusz Kwapich
6e5a6c3d71 metaedit: JSON input mode
Summary:
This makes it easy for `metaedit` to be used by automation. Provided
with a simple JSON file with hash->{user, message} mapping metaedit will
do all of its work without any prompts.
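
As an illustration of the mapping, here is a sketch that loads and validates such a file. The exact schema is an assumption beyond the `hash -> {user, message}` shape stated above, and `load_metaedit_input` is a hypothetical helper, not part of `metaedit`.

```python
import json


def load_metaedit_input(text):
    """Parse a JSON mapping of commit hash -> {"user": ..., "message": ...}."""
    data = json.loads(text)
    edits = {}
    for commit_hash, fields in data.items():
        unknown = set(fields) - {"user", "message"}
        if unknown:
            raise ValueError("unexpected keys for %s: %s" % (commit_hash, sorted(unknown)))
        edits[commit_hash] = fields
    return edits


# Placeholder hash and values, for illustration only.
example = json.dumps(
    {"abc123": {"user": "Jane Doe <jane@example.com>", "message": "new commit message"}}
)
edits = load_metaedit_input(example)
```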

Reviewed By: quark-zju

Differential Revision: D23545527

fbshipit-source-id: 18763ecacff9143b9ad492faf654b176b0f86d1f
2020-09-07 13:33:58 -07:00
Jun Wu
89eb6520d2 scmutil: remove meaningfulparents
Summary:
The "meaningfulparents" concept is coupled with rev numbers.
Remove it. This changes default templates to not show parents, and `{parents}`
template to show parents.

Reviewed By: DurhamG

Differential Revision: D23408970

fbshipit-source-id: f1a8060122ee6655d9f64147b35a321af839266e
2020-09-05 15:06:44 -07:00
Durham Goode
8b91cccc8b remotefilelog: log undesired filename fetches
Summary:
Now that the Rust revisionstore records undesired filename fetches,
let's log those results to Scuba in Python.

Reviewed By: StanislavGlebik

Differential Revision: D23462572

fbshipit-source-id: b55f2290e30e3a5c3b67d9f612b24bc3aad403a8
2020-09-04 14:55:15 -07:00
Durham Goode
9772ab1718 revisionstore: record remote fetches that match a pattern
Summary:
We want to be able to record when fetches to certain paths happen.
Let's add recording infrastructure to the new ReportingRemoteDataStore.

A future diff will make the seen paths accessible from Python for scuba logging.
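
The recording idea can be sketched as a thin wrapper. This Python version is an illustration only: the class name, the `get` method and the regex-based matching are assumptions, not the Rust ReportingRemoteDataStore interface.

```python
import re


class ReportingStore:
    """Wraps a fetch function and remembers paths that match a pattern."""

    def __init__(self, fetch, pattern):
        self._fetch = fetch
        self._pattern = re.compile(pattern)
        self.seen = set()

    def get(self, path):
        if self._pattern.search(path):
            self.seen.add(path)
        # Always delegate: recording must not change fetch behavior.
        return self._fetch(path)


store = ReportingStore(lambda path: b"content of " + path.encode(), r"^secret/")
store.get("secret/keys.txt")
store.get("public/readme.md")
```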

Reviewed By: xavierd

Differential Revision: D23462574

fbshipit-source-id: 5d749f2429e26e8e7fe4fb5adc29140b4309eac9
2020-09-04 14:55:15 -07:00
Durham Goode
84cbc26b1e revisionstore: add reporting wrapper for remote data store
Summary:
We want to monitor what paths are fetched from our remote servers.
Since all of our remote stores are hidden behind the RemoteDataStore interface,
let's create a wrapper around that. A future diff will insert the actual
monitoring and reporting.

Reviewed By: quark-zju

Differential Revision: D23462571

fbshipit-source-id: e6031f19db23f7d1b09767efb9613d7528fb457d
2020-09-04 14:55:14 -07:00
Jun Wu
dabb68c1e5 checkmessagehook: make error message more obvious
Summary: This hopefully makes it more obvious so it looks less like an hg crash.

Reviewed By: kulshrax

Differential Revision: D23509569

fbshipit-source-id: 7174780bc7e9841e3f89a482280c49427b62fb74
2020-09-04 14:55:14 -07:00
Jun Wu
4131dcf012 context: avoid memorizing revs
Summary:
The revs can change after flush. For example, during pushrebase, some ctx might
initially have a non-master Id assigned, and later get assigned an Id in the
master group:

```
ipdb> p self.__dict__
{'_repo': <edenscm.hgext.fastannotate.protocol.localreposetup.<locals>.fastannotaterepo object at 0x7f2415b3f8e0>, '_rev': 72057594038527478, '_node': b'\xb6\x12\xcd\x81b#\xa3\x01\xe2pP\x84\x05{\xd2He\xbe\xcc\xf0'}
ipdb> p self._node
b'\xb6\x12\xcd\x81b#\xa3\x01\xe2pP\x84\x05{\xd2He\xbe\xcc\xf0'
ipdb> p self._repo.changelog.rev(self._node)
7198913
ipdb> p self._rev
72057594038527478
```

Note that `self._rev` becomes inconsistent with `changelog.rev(self._node)`.

The error looks like:

  $ hg push -r . --to master --debug --trace --traceback --verbose
  ...
  pushing rev 556400239977 to destination ...
  ...
  1 commits found
  list of changesets:
  556400239977b9ed523eae5ad28773784c975f7f
  sending unbundle command
  ...
  added 79 commits with 0 changes to 0 files
  moving remote bookmark 'remote/master' to 84829e9242e4
  ...
  using eden update code path
  Traceback (most recent call last):
    ...
    File "/opt/fb/mercurial/edenscm/mercurial/merge.py", line 2220, in update
      return eden_update.update(
    File "/opt/fb/mercurial/edenscm/mercurial/eden_update.py", line 126, in update
      stats, actions = _handle_update_conflicts(
    ...
    File "/opt/fb/mercurial/edenscm/mercurial/context.py", line 503, in _changeset
      return self._repo.changelog.changelogrevision(self.rev())
      # self = <changectx 84829e9242e4>
    File "/opt/fb/mercurial/edenscm/mercurial/changelog2.py", line 312, in changelogrevision
      return changelogrevision(self.revision(nodeorrev))
      # nodeorrev = 72057594038527521
    File "/opt/fb/mercurial/edenscm/mercurial/changelog2.py", line 365, in revision
      node = self.node(nodeorrev)
      # nodeorrev = 72057594038527521
    File "/opt/fb/mercurial/edenscm/mercurial/changelog2.py", line 280, in node
      raise IndexError("revlog index out of range")
  Traceback (most recent call last):
    File "/opt/fb/mercurial/edenscm/mercurial/changelog2.py", line 278, in node
      return self.idmap.id2node(rev)
  error.CommitLookupError: 'N599585 cannot be found'

Change the `context` object to not memorize revs.
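
The failure mode and the fix can be reproduced with a toy model. The classes below are hypothetical stand-ins for the changelog and context objects; only the two rev numbers are taken from the debugger output above.

```python
class Changelog:
    """Toy node -> rev map whose revs may be reassigned after a flush."""

    def __init__(self):
        self._revs = {}

    def assign(self, node, rev):
        self._revs[node] = rev

    def rev(self, node):
        return self._revs[node]


class CachedCtx:
    """Remembers the rev at construction time, so it can go stale."""

    def __init__(self, changelog, node):
        self._rev = changelog.rev(node)

    def rev(self):
        return self._rev


class Ctx:
    """Keeps only the node and resolves the rev on every call."""

    def __init__(self, changelog, node):
        self._changelog = changelog
        self._node = node

    def rev(self):
        return self._changelog.rev(self._node)


changelog = Changelog()
node = b"\xb6\x12"
changelog.assign(node, 72057594038527478)  # non-master Id
cached, fresh = CachedCtx(changelog, node), Ctx(changelog, node)
changelog.assign(node, 7198913)  # reassigned into the master group
```

After the reassignment, `cached.rev()` still returns the old non-master Id while `fresh.rev()` follows the changelog.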

Reviewed By: DurhamG

Differential Revision: D23468702

fbshipit-source-id: b623bcec99b09d61169371e08c69fc6d6f38935c
2020-09-04 13:22:18 -07:00
Jun Wu
e74133f0fa dag: limit max segment level to 4
Summary:
Based on fbsource data, building level 5 proves not to be useful.

This would save 300ms in the write path.

Reviewed By: sfilipco

Differential Revision: D23494505

fbshipit-source-id: ca795b4900af40dbfdaa463d36f3169413bf6a62
2020-09-04 12:20:54 -07:00
Jun Wu
b4adf0602f dag: remove non-master "Name -> Id" index on request
Summary:
Previously the IdMap's "Name -> Id" index simply ignored the "reassign
non-master" request. It turns out stale entries in that index can cause
issues, as demonstrated by the previous diff.

Update IdMap to actually remove both indexes of non-master group on
remove_non_master so it cannot have stale entries.

To optimize the index, the format of IdMap is changed from:

  [ 8 bytes Id (Big Endian) ] [ Name ]

to:

  [ 8 bytes Id (Big Endian) ] [ 1 byte Group ] [ Name ]

So the index can use reference to the slice, instead of embedding the bytes, to
reduce index size.

The filesystem directory name for IdMap used by NameDag is bumped to `idmap2`
so it won't read the incompatible old `idmap` data.
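
The new entry layout can be sketched with Python's `struct` module. The helper names are hypothetical, and the group numbering (0 for master, 1 for non-master) is an assumption made for this example.

```python
import struct

# ">QB": 8-byte big-endian unsigned Id followed by a 1-byte group.
HEADER = struct.Struct(">QB")


def encode_entry(id_, group, name):
    """Build [ 8 bytes Id (Big Endian) ] [ 1 byte Group ] [ Name ]."""
    return HEADER.pack(id_, group) + name


def decode_entry(entry):
    """Split an entry back into (id, group, name)."""
    id_, group = HEADER.unpack_from(entry)
    return id_, group, entry[HEADER.size:]


entry = encode_entry(7198913, 0, b"\xb6\x12")
decoded = decode_entry(entry)
```

Because the name sits at a fixed offset after the 9-byte header, an index can refer to a slice of the entry instead of storing its own copy of the bytes.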

Reviewed By: sfilipco

Differential Revision: D23494508

fbshipit-source-id: 3cb7782577750ba5bd13515b370f787519ed3894
2020-09-04 12:20:53 -07:00
Jun Wu
c5d6c9d0f2 dag: add a test showing non-master rebuild issues
Summary: Some vertexes can disappear from the graph!

Reviewed By: sfilipco

Differential Revision: D23494506

fbshipit-source-id: ecbf2a4169e5fc82596e89a4bfe4c442a82e9cd2
2020-09-04 12:20:53 -07:00
Jun Wu
4aea3657e1 dag: move some test utilities to a TestDag struct
Summary: The TestDag struct will be used to do some more complicated tests.

Reviewed By: sfilipco

Differential Revision: D23494507

fbshipit-source-id: 11350f9e448725ae49f50a7b6f19efc57ad84448
2020-09-04 12:20:53 -07:00
Thomas Orozco
3ba2c2b429 mononoke/hg_sync: make it work on Mercurial Python 3
Summary:
A few things here:

- The heads must be bytes.
- The arguments to wireproto must be strings (we used to encode / decode them,
  but we shouldn't).
- The bookmark must be a string (otherwise it gets serialized as `"b\"foo\""`
  and then it deserializes to that instead of `foo`).

Reviewed By: StanislavGlebik

Differential Revision: D23499846

fbshipit-source-id: c8a657f24c161080c2d829eb214d17bc1c3d13ef
2020-09-04 11:56:44 -07:00
Jun Wu
c9e6995675 py2: fix crecord compatibility
Summary:
D23460476 (c84653c7a9) breaks Python 2:

Python 2: bytes + bytearray -> bytearray
Python 3: bytes + bytearray -> bytes

Fix it.

Python 2: b"%s" % bytearray -> bytes
Python 3: b"%s" % bytearray -> bytes
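
The Python 3 side of this can be checked directly. The snippet below shows the result types on Python 3 only; the Python 2 behavior is as stated in the message above and is not exercised here.

```python
left = b"ab"
right = bytearray(b"cd")

# On Python 3, bytes + bytearray takes the type of the left operand.
joined = left + right

# %-formatting a bytes template always produces bytes.
formatted = b"%s" % right

# An explicit bytes() call gives the same type on both Python versions.
normalized = bytes(left + right)

print(type(joined).__name__, type(formatted).__name__, type(normalized).__name__)
```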

Reviewed By: singhsrb

Differential Revision: D23514590

fbshipit-source-id: 7fd5f2372444732f13909c42251f000f05955228
2020-09-03 18:51:10 -07:00
Stefan Filip
c09f80882c edenapi: use async-runtime to schedule futures
Summary:
Replacing places where the tokio runtime is instantiated inside the edenapi
client crate.

Reviewed By: quark-zju

Differential Revision: D23468596

fbshipit-source-id: ef68718c7d5b89b6477a2946daaa51618b53d06a
2020-09-03 15:45:34 -07:00
Jun Wu
cea2bf8728 dag: limit segment level at open time
Summary:
At open time, it's pointless to attempt to create new levels. So let's just
read the existing max_level and do not try to build max_level + 1.

This turned out to save 300ms in the profiling results.

Reviewed By: sfilipco

Differential Revision: D23494509

fbshipit-source-id: 4ea326a3cc21792790ea0b87e5bf608a94ae382b
2020-09-03 13:48:43 -07:00
Jun Wu
f238529a97 multilog: use per-log meta to pick up updated indexes
Summary:
With MultiLog, per-log meta was previously entirely ignored. However, it can
be useful for updated indexes. For example, an application defines a new index
and opens a Log via MultiLog. The application would expect the new index to be
built only once. Without MultiLog, per-log meta is updated at open time in
place. With MultiLog, the updated index meta is not written back to the
multimeta, so the new index would be rebuilt multiple times, which is undesirable.

Update MultiLog to reuse the per-log meta if it's compatible so it can pick up
new indexes.

Reviewed By: sfilipco

Differential Revision: D23488212

fbshipit-source-id: c8b3e6b5589dbda2e76a143d15085862a93dae22
2020-09-03 13:48:43 -07:00
Jun Wu
f79e7657af multilog: stop writing poisoned per-log meta
Summary:
The poisoned meta makes investigation harder. For example, `debugdumpindexlog`
won't work on those logs.

Reviewed By: sfilipco

Differential Revision: D23488213

fbshipit-source-id: b33894d8c605694b6adf5afdaed45707fbd7357e
2020-09-03 13:48:43 -07:00
Jun Wu
99511f8743 dag: benchmark dag_ops on different IdDagStores
Summary:
Change dag_ops benchmarks to use different IdDagStores. An example run shows:

  benchmarking dag::iddagstore::indexedlog_store::IndexedLogStore
  building segments (old)                           856.803 ms
  building segments (new)                           127.831 ms
  ancestors                                          54.288 ms
  children (spans)                                  619.966 ms
  children (1 id)                                    12.596 ms
  common_ancestors (spans)                            3.050 s
  descendants (small subset)                         35.652 ms
  gca_one (2 ids)                                   164.296 ms
  gca_one (spans)                                     3.132 s
  gca_all (2 ids)                                   270.542 ms
  gca_all (spans)                                     2.817 s
  heads                                             247.504 ms
  heads_ancestors                                    40.106 ms
  is_ancestor                                       108.719 ms
  parents                                           243.317 ms
  parent_ids                                         10.752 ms
  range (2 ids)                                       7.370 ms
  range (spans)                                      23.933 ms
  roots                                             620.150 ms

  benchmarking dag::iddagstore::in_process_store::InProcessStore
  building segments (old)                           790.429 ms
  building segments (new)                            55.007 ms
  ancestors                                           8.618 ms
  children (spans)                                  196.562 ms
  children (1 id)                                     2.488 ms
  common_ancestors (spans)                          545.344 ms
  descendants (small subset)                          8.093 ms
  gca_one (2 ids)                                    24.569 ms
  gca_one (spans)                                   529.080 ms
  gca_all (2 ids)                                    38.462 ms
  gca_all (spans)                                   540.486 ms
  heads                                             103.930 ms
  heads_ancestors                                     6.763 ms
  is_ancestor                                        16.208 ms
  parents                                           103.889 ms
  parent_ids                                          0.822 ms
  range (2 ids)                                       1.748 ms
  range (spans)                                       6.157 ms
  roots                                             197.924 ms

  benchmarking dag::iddagstore::bytes_store::BytesStore
  building segments (old)                           724.467 ms
  building segments (new)                            90.207 ms
  ancestors                                          23.812 ms
  children (spans)                                  348.237 ms
  children (1 id)                                     4.609 ms
  common_ancestors (spans)                            1.315 s
  descendants (small subset)                         20.819 ms
  gca_one (2 ids)                                    72.423 ms
  gca_one (spans)                                     1.346 s
  gca_all (2 ids)                                   116.025 ms
  gca_all (spans)                                     1.470 s
  heads                                             155.667 ms
  heads_ancestors                                    19.486 ms
  is_ancestor                                        51.529 ms
  parents                                           157.285 ms
  parent_ids                                          5.427 ms
  range (2 ids)                                       4.448 ms
  range (spans)                                      13.874 ms
  roots                                             365.568 ms

Overall, InProcessStore > BytesStore > IndexedLogStore. The InProcessStore
uses `Vec<BTreeMap<Id, StoreId>>` for the level-head index, which is more
efficient on the "Level" lookup (Vec), and more cache efficient (BTree).
BytesStore outperforms IndexedLogStore because it does not need to verify
checksum on every read access - the checksum was verified at store creation
(IdDag::from_bytes).

Note: The `BytesStore` is something optimized for serialization, and hasn't been sent.

Reviewed By: sfilipco

Differential Revision: D23438174

fbshipit-source-id: 6e5f15188e3b935659ccde25fac573e9b963b78f
2020-09-02 18:54:12 -07:00
Jun Wu
84ad7a5351 dag: implement GetLock for all IdDagStores
Summary: This allows them to use the SyncableIdDag APIs.

Reviewed By: sfilipco

Differential Revision: D23438170

fbshipit-source-id: 7ec7288cfb8186b88f85f0212a913cb0dffe7345
2020-09-02 18:54:12 -07:00
Jun Wu
cfff0e9144 dag: make IdDag::prepare_filesystem_sync generic
Summary: Other IdDagStores can also use the API. This will be used in benchmarks.

Reviewed By: sfilipco

Differential Revision: D23438180

fbshipit-source-id: 565552b66372dcfbb268c397883f627491d6e154
2020-09-02 18:54:12 -07:00
Jun Wu
8874e07f9b dag: IdDagStore::reload -> GetLock::reload
Summary:
Similar to `IdDagStore::sync` -> `GetLock::persist`, `reload` is more related
to filesystem/internal state exchange, and should be protected by a lock. So
let's move the API there and require a lock.

Reviewed By: sfilipco

Differential Revision: D23438169

fbshipit-source-id: 4228106b7739a1a758677adfddd213ad54aa4b6a
2020-09-02 18:54:12 -07:00
Jun Wu
d633576880 dag: remove NameDag::reload
Summary:
`NameDag::reload` is used in `flush` to get a "fresh" NameDag.
In a future diff the `IdDag::reload` API gets changed, so let's
remove NameDag's use of it.

Instead, let's just re-`open` the path again to get a fresh NameDag.
It's a bit more expensive but probably okay, and easier to understand.
`get_new_segment_size()` was added as an internal API to preserve tests.

This also solves an issue where `NameDag` cannot recover properly if its
`flush` fails, because the old `NameDag` state is not lost.

After removing `NameDag::reload`, `IdMap::reload` is no longer used publicly
and was made private.

Reviewed By: sfilipco

Differential Revision: D23438179

fbshipit-source-id: 0a32556a2cd786919c233d7efcae1cb9cbc5fb09
2020-09-02 18:54:11 -07:00
Jun Wu
8e16e4260f dag: IdDagStore::sync -> GetLock::persist
Summary:
The word "sync" is bi-directional: flush + reload. That was indexedlog::Log's
behavior. However, in the IdDag context "sync" is confusing: it is actually
only used to write data out, under the protection of a lock. Rename it to `persist`
to clarify that it's memory -> disk. In addition, require a reference to a lock
object as lightweight proof that some lock is held.

Reviewed By: sfilipco

Differential Revision: D23438175

fbshipit-source-id: 3d9ccd7431691d1c4e2ee74f3c80d95f5e7243b5
2020-09-02 18:54:11 -07:00
Jun Wu
3ad58ff945 dag: make SyncableIdMap use &mut IdMap instead of IdMap
Summary:
This removes the need to clone `IdMap`.

SyncableIdMap is a bit tricky. I added some comments to clarify things.

Reviewed By: sfilipco

Differential Revision: D23438176

fbshipit-source-id: fe66071da07067ed6c53a6437790af1d81b28586
2020-09-02 18:54:11 -07:00