657 Commits

Author SHA1 Message Date
Jun Wu
4ca0b2c778 indexedlog: move macros to a separate module
Summary:
The internal rustfmt linter suggests wrong autofixes for the `impl_offset!`
macro. That's noisy for every diff touching `index.rs`. Silence it by moving
macros to a separate file.

To be consistent, `define_error!` is also moved.

Differential Revision: D14885746

fbshipit-source-id: d1a518e631f80d6d7945f1ea3c2e4d18e1c799ca
2019-04-11 12:51:26 -07:00
Xavier Deguillard
5ddf39f788 remotefilelog: add an indexedlog contentstore
Summary:
While the Rust code can read/write content out of an indexedlog, the Python
code cannot. For now, all the writes will be done in Rust, and the Python code
will only be able to read from it.

Reviewed By: quark-zju

Differential Revision: D14894330

fbshipit-source-id: 5c1698d31412bc93e93dabb93be106a2ef17d184
2019-04-11 12:07:58 -07:00
Xavier Deguillard
10ae96292e asyncpacks: add indexedlogdatastore
Summary:
As the IndexedLogDataStore will be used in hg_memcache_client, it needs to be
used in async code, and thus needs an async wrapper.

Note: I should probably rename the crate to "asyncrevisionstore" :)

Reviewed By: quark-zju

Differential Revision: D14881362

fbshipit-source-id: 203ce50954d99899715b32f85e6118e757578ece
2019-04-11 12:07:58 -07:00
Jun Wu
59231d903d indexedlog: implement read-only fast paths for sync
Summary:
When there is nothing to write, `Log` and `LogRotate` can take
a fast path that does not take directory locks.
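
A minimal sketch of the fast path, assuming a hypothetical in-memory stand-in for `Log` (the struct, its fields, and the `locks_taken` counter are illustrative only and are not the real indexedlog API). Reading data written by other processes is omitted here:

```rust
/// Hypothetical stand-in for a log; not the real indexedlog `Log`.
pub struct Log {
    disk: Vec<Vec<u8>>,      // entries already written out
    dirty: Vec<Vec<u8>>,     // entries appended in memory only
    pub locks_taken: usize,  // counts how often the (pretend) directory lock was taken
}

impl Log {
    pub fn new() -> Self {
        Log { disk: Vec::new(), dirty: Vec::new(), locks_taken: 0 }
    }

    pub fn append(&mut self, data: &[u8]) {
        self.dirty.push(data.to_vec());
    }

    /// Returns the number of entries on "disk" after syncing.
    pub fn sync(&mut self) -> usize {
        if self.dirty.is_empty() {
            // Fast path: nothing to write, so the lock is never taken.
            return self.disk.len();
        }
        // Slow path: take the lock (simulated by a counter), then write.
        self.locks_taken += 1;
        self.disk.append(&mut self.dirty);
        self.disk.len()
    }
}
```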

Differential Revision: D14885450

fbshipit-source-id: 4d72d5a3e33b7371880ad31f8bc43ed31c03797f
2019-04-10 20:59:56 -07:00
Jun Wu
29fa45c973 indexedlog: rename some flush() to sync()
Summary:
The `flush()` function does two things: read and write. It's not just writing
data. Rename it to `sync` to clarify.

`Index::flush` is unchanged because although it might read new data, the new
data is not visible. Calling `Index::flush` without dirty changes does not
cause visible (queryable) changes to the index.

`FlushFilter` is unchanged because it is coupled with the write path. It is
not to filter reading.

The old name is kept temporarily until all pending code gets committed and
we can do a codemod.

Differential Revision: D14885451

fbshipit-source-id: 3aed3b741e5e8f09b611ddcc25930a6fdf71706c
2019-04-10 20:59:56 -07:00
Jun Wu
a4322ca904 logrotate: make writable_log private
Summary:
The `writable_log` API can be misused to "flush" a Log, bypassing the check
about whether it should be rotated or not.

The real need for `writable_log` is to get access to indexes on the "writable"
(or "latest") log. Therefore let's just expose that instead.

Practically, the only use case for querying the index on the "latest" log is to
make sure dependent content is written to the same Log. That also requires a
"flush_filter" to be provided. Therefore add an assertion about it.

Differential Revision: D14866022

fbshipit-source-id: f6c07a498597b6f0f07d7cc3130e9033ba8b9be4
2019-04-10 19:50:01 -07:00
Jun Wu
f364cd1420 logrotate: add flush_filter
Summary:
Introduce the "flush filter" that can replace content to be written.
This would be useful to make sure delta chains are self-contained.

For LogRotate, flush_filter is triggered not only when the log file
was modified, but also when rotation happens.

Differential Revision: D14866024

fbshipit-source-id: f417200d3ae573e9ac82985ad6afd082412b358d
2019-04-10 19:50:01 -07:00
Jun Wu
2e300dba51 indexedlog: add a flush_filter function to Log
Summary:
The flush filter allows mutating entries being flushed. It can be used to avoid
inserting duplicated data.
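
A rough sketch of the idea, under stated assumptions: `FilterOutput`, `flush_with_filter`, and `drop_duplicates` are hypothetical names, and the "disk" is a plain `Vec`. The filter sees what is already written plus one pending entry, and decides whether to keep, drop, or replace it:

```rust
/// What a filter decides for one pending entry (hypothetical type).
pub enum FilterOutput {
    Keep,
    Drop,
    Replace(Vec<u8>),
}

/// Write `pending` entries to `disk`, consulting `filter` for each one.
pub fn flush_with_filter(
    disk: &mut Vec<Vec<u8>>,
    pending: Vec<Vec<u8>>,
    filter: fn(&[Vec<u8>], &[u8]) -> FilterOutput,
) {
    for entry in pending {
        match filter(disk.as_slice(), entry.as_slice()) {
            FilterOutput::Keep => disk.push(entry),
            FilterOutput::Drop => {}
            FilterOutput::Replace(new_entry) => disk.push(new_entry),
        }
    }
}

/// Example filter: skip entries whose bytes are already on disk.
pub fn drop_duplicates(disk: &[Vec<u8>], entry: &[u8]) -> FilterOutput {
    if disk.iter().any(|e| e.as_slice() == entry) {
        FilterOutput::Drop
    } else {
        FilterOutput::Keep
    }
}
```

A linear scan is used only to keep the sketch short; a real filter would more likely consult an index.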

Differential Revision: D14866023

fbshipit-source-id: ecf6cf60a0a97cf8110ef9c957e7e3bbab5855fc
2019-04-10 19:50:00 -07:00
Jun Wu
0cd1d8ce9d indexedlog: error out if the primary log does not match metadata
Summary:
Previously the code allowed the "log" file to be longer than the metadata,
intended to allow advanced use cases that replace the "meta" file to
get a read-only view in the past.

That implies we trust the length of "log" file. But it's in theory easy to mess
up - when appending to the "log" file, the process might be killed.

Data integrity is the first priority. Therefore let's just error out if the file
length does not match the metadata. To support read-only views in the past,
we can potentially use file names other than "meta" or support in-memory
metadata instead.
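
The check can be sketched as follows. `Meta`, `primary_len`, and `check_primary_len` are hypothetical names used for illustration; the real metadata holds more than one field:

```rust
/// Hypothetical metadata record describing the primary "log" file.
pub struct Meta {
    pub primary_len: u64,
}

/// Refuse to proceed unless the file length matches the metadata exactly.
/// A longer file may be the result of an interrupted append.
pub fn check_primary_len(meta: &Meta, actual_len: u64) -> Result<(), String> {
    if actual_len == meta.primary_len {
        Ok(())
    } else {
        Err(format!(
            "log length mismatch: metadata says {} bytes, file has {} bytes",
            meta.primary_len, actual_len
        ))
    }
}
```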

Differential Revision: D14866025

fbshipit-source-id: bbf0061a6448375a2de06fbf31f2b9838c749be0
2019-04-10 19:50:00 -07:00
Xavier Deguillard
a38ac46869 revisionstore: add an indexedlog backed content store
Summary:
Packfiles are proving complex to make perform well in several situations.
For instance, repacks are required to keep common operations from spending most
of their time scanning and iterating over the filesystem. In fact, most of
the pain points with packfiles are caused by their immutability: once written,
they can no longer be updated.

IndexedLog, on the other hand, can be updated in place, and therefore does not
require repacks and does not exhibit some of the pathological behavior that
packfiles are showing.

As a first step, let's add a simple content store backed by indexedlog.

Reviewed By: quark-zju

Differential Revision: D14790070

fbshipit-source-id: 44f766db6a08169971f87a38246873c6e53c3233
2019-04-10 10:34:34 -07:00
Arun Kulshreshtha
de2605bf4d edenapi: make data and history batch sizes separately configurable
Summary: Make the batch size of data and history requests independently configurable, since data responses are typically much larger than history responses (since the former contains actual file data whereas the latter is only metadata).
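
As an illustration only (the field names, the `Option` semantics, and `split_into_batches` are assumptions, not the actual edenapi code), two independent settings could drive the same batching helper:

```rust
/// Hypothetical config: `None` means "send everything in one request".
pub struct Config {
    pub data_batch_size: Option<usize>,
    pub history_batch_size: Option<usize>,
}

/// Split `items` into batches of at most `batch_size` elements.
pub fn split_into_batches<T: Clone>(items: &[T], batch_size: Option<usize>) -> Vec<Vec<T>> {
    if items.is_empty() {
        return Vec::new();
    }
    match batch_size {
        Some(n) if n > 0 => items.chunks(n).map(|c| c.to_vec()).collect(),
        // No limit (or a zero limit): a single batch with everything.
        _ => vec![items.to_vec()],
    }
}
```

With this shape, data requests can use a small batch size while history requests for the same keys use a larger one.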

Differential Revision: D14859686

fbshipit-source-id: c87c31f3e6611a55ae712e7f0ed9bb392d31a579
2019-04-09 17:00:01 -07:00
Arun Kulshreshtha
1bb8d63b74 edenapi: split MultiDriver into its own module
Summary: Move the code for managing a curl::Multi session into its own submodule to avoid cluttering the main client file.

Reviewed By: quark-zju

Differential Revision: D14855344

fbshipit-source-id: 8c93959774c3bc03d2620012d1665228fbcb6681
2019-04-09 14:59:44 -07:00
Arun Kulshreshtha
9fdd71e4df edenapi: use curl multi interface
Summary: Use the curl multi interface to fetch multiple batches of files or history entries concurrently.

Differential Revision: D14718547

fbshipit-source-id: c5a740c7e9106b719e825540f8182be31a72bae7
2019-04-09 14:59:44 -07:00
Stefan Filip
a61980ab9b asyncpacks: update asyncmutablehistorypack to testutil
Summary: testutil everywhere

Differential Revision: D14716080

fbshipit-source-id: 197cce4f64443a7a065010dd9ff8da32f548d496
2019-04-08 16:21:08 -07:00
Stefan Filip
becc32004a asyncpacks: update asynchistorypacks.rs to testutil
Summary: testutil everywhere

Differential Revision: D14716079

fbshipit-source-id: c83388f5248bf6afd9c2b6af87dcd8f6b0b850e1
2019-04-08 16:21:08 -07:00
Stefan Filip
f958833800 asyncpacks: update asyncdatapack.rs to testutil
Summary: testutil everywhere

Differential Revision: D14713053

fbshipit-source-id: 26fcdea580dd45280bf2f1725dcdb6ab8948465f
2019-04-08 16:21:08 -07:00
Stefan Filip
d6ad49db5b manifest: migrate to types::testutil node
Summary: migration

Differential Revision: D14660306

fbshipit-source-id: 71df6814d93f8b9f814aedaa4ceb558a8b69cdf6
2019-04-08 16:21:08 -07:00
Stefan Filip
1e2816f7c6 types: add testutil module to help writing tests
Summary:
Building test objects can be tedious using various of our bottom bytes.
This diff addresses that issue by adding helper functions in a new module
in the types crate.

Handling this case could be improved in Rust.

Differential Revision: D14660307

fbshipit-source-id: a866c1f3ede60ba1b87eb17d35817b8a8d7674a4
2019-04-08 16:21:07 -07:00
Arun Kulshreshtha
6705e2d120 bindings: allow choice between edenapi backends
Summary: Allow users to configure which HTTP client backend to use for the Eden API via the `edenapi.backend` config option. Valid options are `curl` and `hyper`, with `curl` being the default.
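
A small sketch of how such an option might be parsed. The `Backend` enum and `parse_backend` are hypothetical, and rejecting unknown values with an error is an assumption; only the option name, the two valid values, and the `curl` default come from the summary above:

```rust
/// The two backends named in the summary.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Backend {
    Curl,
    Hyper,
}

/// Interpret the value of `edenapi.backend`; `None` means the option is unset.
pub fn parse_backend(value: Option<&str>) -> Result<Backend, String> {
    match value {
        // `curl` is the default when nothing is configured.
        None | Some("curl") => Ok(Backend::Curl),
        Some("hyper") => Ok(Backend::Hyper),
        Some(other) => Err(format!("invalid edenapi.backend: {}", other)),
    }
}
```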

Reviewed By: quark-zju

Differential Revision: D14657871

fbshipit-source-id: 7a9972d2380fbbd5ed62d1accae764dc03ca4c29
2019-04-05 17:34:14 -07:00
Arun Kulshreshtha
6c8c87dea1 edenapi: add curl-based client
Summary:
Add a new Eden API client based on libcurl (via the rust-curl crate). This should help us work around issues with Hyper.

This implementation is based on curl's "easy" API, and is intentionally naive. I intend to update it to use curl's "multi" API to send several concurrent HTTP requests per operation in a later diff.

Differential Revision: D14656756

fbshipit-source-id: 1f71074506844104f0f3237023b38317a7f41979
2019-04-05 17:34:14 -07:00
Jun Wu
12b98e1e96 indexedlog: use failure for error handling
Summary:
`failure` makes it easier to chain errors and capture backtraces. Use it.

There is probably still room for improvement, by chaining errors and avoiding
exposing low-level errors in APIs, and/or providing more context in error
messages. But it should already be much better than before.

Differential Revision: D14759305

fbshipit-source-id: b1d3a8ec959dde575f06533ea9e4cd0757057051
2019-04-05 12:17:28 -07:00
Jun Wu
dbfad715b8 logrotate: reduce max_log_count to u8 range
Summary:
Practically there are many issues with a large max_log_count:
- The directory scan would be slower.
- The index would be slower.

Let's reduce it to u8 range to address the issues. This also makes the
directory name short.

Differential Revision: D14717896

fbshipit-source-id: d39f008abe576991e14d444c37a049a6132df507
2019-04-03 22:16:33 -07:00
Jun Wu
cf82cb6340 indexedlog: replace atomicwrites with tempfile
Summary:
Some tests added by upcoming diffs were timing out while they don't seem that
expensive. I tracked it down to the use of `fsync` in atomicwrites.

In our case, we don't need `fsync`. `fsync` is useful for making sure the order
of file writes is desired even in case of system crash. For example, making
sure the "primary" log file is written before writing the "meta" file.

That's too expensive (esp. on filesystems like ext4) for our use case.
Indexedlog is designed to make sure data corruption can be detected, and there
can be a "reasonable" way to recover (ex. by deleting all indexes, scanning
through entries and re-inserting them in a new log), not to fight against OS
crashes.

`cargo bench` change on a btrfs filesystem:

Before:

  index flush                    42.570 ms
  log flush                       7.712 ms

After:

  index flush                    36.485 ms
  log flush                       1.609 ms

Differential Revision: D14759304

fbshipit-source-id: 66b95d10040cf1480367b767811dfabee5e27ffe
2019-04-03 22:16:33 -07:00
Jun Wu
7cb1663ae0 indexedlog: migrate to Rust 2018
Summary: Used `cargo fix --edition`. Removed some `mut`s according to rustc warnings.

Differential Revision: D14718308

fbshipit-source-id: 94e3c3f8e47143ede767fe883fdb5e9602b12854
2019-04-03 22:16:33 -07:00
Zeyi Fan
9277617d1e fix double free in cdatapack
Summary:
Mercurial recently started to generate empty pack files (`0x01`). This will cause this check to fail:

diffusion/FBS/browse/master/fbcode/scm/hg/lib/cdatapack/cdatapack.c;2c4197d003ed906dd8eaf70fbb04aa53440ce681$314-319

This will subsequently result in a double-free error between these two:

**In `error_cleanup`**

diffusion/FBS/browse/master/fbcode/scm/hg/lib/cdatapack/cdatapack.c;2c4197d003ed906dd8eaf70fbb04aa53440ce681$387-389

**In `close_datapack`**

diffusion/FBS/browse/master/fbcode/scm/hg/lib/cdatapack/cdatapack.c;2c4197d003ed906dd8eaf70fbb04aa53440ce681$401

This diff will fix this bug.

Differential Revision: D14759374

fbshipit-source-id: 06f192513a935740c2142b5a2baac87a28903496
2019-04-03 21:13:13 -07:00
Xavier Deguillard
e106c73ddf revisionstore: do not create an empty datapack/historypack
Summary:
Zeyi realized that empty packfiles were problematic for the cdatapack code.
While its code should be fixed, having empty packfiles lying around is
unnecessary anyway, so let's not write them.

Reviewed By: fanzeyi

Differential Revision: D14760942

fbshipit-source-id: a128eedaf79a6388a3c7142399715bb4eb96a2ae
2019-04-03 20:43:06 -07:00
Xavier Deguillard
39e66964f4 revisionstore: limit history repack memory usage
Summary:
While datapack repack is fairly inexpensive in memory and mostly limited to the
number of entries in its index, a historypack repack needs to keep both the
data and the index in memory. It appears that the overhead of doing so is a big
factor in repack taking a lot of memory as a resulting 100MB histpack would use
about 1.2GB of RAM. Extrapolating the numbers, a resulting 4GB histpack would
need 48GB, which is enough to put a devserver in a swapping state, and worse
for laptops. Limiting the historypack size to 400MB should cap the RAM usage to
a bit under 5GB.
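
The extrapolation above is a straight linear scaling of the one observed data point. A tiny sketch, assuming RAM usage grows linearly with pack size and using decimal megabytes (both are simplifications; the constant and function names are made up for illustration):

```rust
/// Observed data point from the summary: a 100MB histpack used about 1.2GB of RAM.
pub const OBSERVED_PACK_MB: u64 = 100;
pub const OBSERVED_RAM_MB: u64 = 1200;

/// Estimate RAM usage (in MB) for a resulting histpack of `pack_mb` MB,
/// assuming the roughly 12x ratio holds at every size.
pub fn estimated_ram_mb(pack_mb: u64) -> u64 {
    pack_mb * OBSERVED_RAM_MB / OBSERVED_PACK_MB
}
```

Under that assumption a 4000MB pack comes out at 48000MB, and the 400MB cap at 4800MB, matching the figures quoted above.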

Reviewed By: kulshrax

Differential Revision: D14757839

fbshipit-source-id: b08bf01bddad01f1cae9cc67d4bd3d637c0bf0db
2019-04-03 16:56:09 -07:00
Arun Kulshreshtha
1f09251a85 edenapi: make hyper client fields private
Summary: Now that the Hyper client is contained in a single module, its fields do not need to be crate-public.

Reviewed By: sfilipco

Differential Revision: D14733274

fbshipit-source-id: aa5c2f4fd9fdf6e686da1fed6300e8cf8f7e5dbc
2019-04-03 14:14:27 -07:00
Arun Kulshreshtha
1472ff1efa edenapi: refactor Builder into Config
Summary:
Rather than having a `Builder` struct that knows how to build just one kind of Eden API client, let's have a common `Config` type that can be potentially passed to the constructor of several different client implementations.

This will allow the same config code to be re-used across different client types, as seen later in this stack.
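
A sketch of the pattern, with hypothetical fields and client types (the real `Config` fields and constructor signatures may differ). One `Config` value is built once and handed to the constructor of each client implementation:

```rust
/// Settings shared by every client implementation (fields are illustrative).
#[derive(Clone, Debug, Default)]
pub struct Config {
    base_url: Option<String>,
    repo: Option<String>,
}

impl Config {
    pub fn new() -> Self {
        Config::default()
    }

    pub fn base_url(mut self, url: &str) -> Self {
        self.base_url = Some(url.to_string());
        self
    }

    pub fn repo(mut self, repo: &str) -> Self {
        self.repo = Some(repo.to_string());
        self
    }

    /// Validation that every client can reuse.
    fn endpoint(&self) -> Result<String, String> {
        let base = self.base_url.as_ref().ok_or("base_url is not set")?;
        let repo = self.repo.as_ref().ok_or("repo is not set")?;
        Ok(format!("{}/{}", base, repo))
    }
}

/// Two stand-in client types that accept the same `Config`.
pub struct HyperClient {
    pub endpoint: String,
}

pub struct CurlClient {
    pub endpoint: String,
}

impl HyperClient {
    pub fn new(config: &Config) -> Result<Self, String> {
        Ok(HyperClient { endpoint: config.endpoint()? })
    }
}

impl CurlClient {
    pub fn new(config: &Config) -> Result<Self, String> {
        Ok(CurlClient { endpoint: config.endpoint()? })
    }
}
```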

Differential Revision: D14656757

fbshipit-source-id: 883ffd2dc0302ebe08960f079c113e2d0da2d2ca
2019-04-01 20:15:38 -07:00
Arun Kulshreshtha
48ab61d83d edenapi: move EdenApi implementation to client module
Summary: Move the implementation of the `EdenApi` trait for the current Hyper-backed Eden API client to the same module as the client, so that we can have several side-by-side implementations, each contained in their own respective module.

Differential Revision: D14656758

fbshipit-source-id: a0d5fa36ec346c40466df559ccc900b14a7c542f
2019-04-01 20:15:38 -07:00
Jun Wu
acc0aaea7d indexedlog: migrate from tempdir to tempfile
Summary: `tempdir` is deprecated. Use `tempfile` instead.

Differential Revision: D14690867

fbshipit-source-id: f5df77708078538a0832bd941726f280ed97355f
2019-04-01 17:16:18 -07:00
Jun Wu
1e59d25e17 indexedlog: add OpenOptions::index
Summary:
Make it a bit easier to define indexes.

Before:

    OpenOptions::new()
      .index_defs(vec![IndexDef::new("first-byte", |_| {
          vec![IndexOutput::Reference(0..1)]
      })])

After:

    OpenOptions::new()
      .index("first-byte", |_| vec![IndexOutput::Reference(0..1)])

Reviewed By: kulshrax

Differential Revision: D14690357

fbshipit-source-id: 6e80a91f4279f960d9f41369c228e79023b5164c
2019-04-01 17:16:17 -07:00
Jun Wu
88fb64a6ee indexedlog: use monospace font for links to code
Summary:
The Rust stdlib uses this pattern.  This is done by:

  sed -i 's/\[\([A-Z][a-zA-Z:]*\)\]/[`\1`]/g' *.rs

Unfortunately it seems only rustdoc nightly can linkify things correctly.

More context: https://github.com/rust-lang/rust/issues/43466

Reviewed By: kulshrax

Differential Revision: D14689887

fbshipit-source-id: ba2b5968bdaad06f39dc43962430906ee80692fd
2019-04-01 17:16:17 -07:00
Jun Wu
7c74b40bc1 logrotate: de-dup logic in OpenOptions
Summary:
rotate::OpenOptions is a superset of log::OpenOptions. Change the code to reuse
logic in log::OpenOptions as much as possible.

Reviewed By: kulshrax

Differential Revision: D14689888

fbshipit-source-id: a6958723c49f9d41b03100f01283a8c3fb37a1ab
2019-04-01 17:16:17 -07:00
Jun Wu
277d25b581 indexedlog: move checksum_type to OpenOptions
Summary:
The motivation of this is, LogRotate might copy dirty (non-flushed) entries
from one Log to another, and it cannot preserve the checksum type for those
entries. There are 2 solutions:

- Make `iter_dirty` return checksum type.
- Make checksum type known by Log directly.

The second choice provides a simpler public API. `append_advanced` can be
removed, then `iter_dirty` is still consistent with `iter`. Therefore this
change.

Differential Revision: D14688174

fbshipit-source-id: 09e07d64c886a5ce9bc48dce8e29d036af1c0381
2019-04-01 17:16:16 -07:00
Jun Wu
8fc9742997 indexedlog: make Log own OpenOptions
Summary: A later diff adds another field to OpenOptions that Log needs access to.

Differential Revision: D14688171

fbshipit-source-id: 33170a2b74639ba0fd8a9c86207d840fb6427580
2019-04-01 17:16:16 -07:00
Jun Wu
341b3dad6f logrotate: make flush delete old logs
Summary: This is the final piece to make space usage bounded.

Differential Revision: D14688179

fbshipit-source-id: a6e0058b9022789fcf036c4427d29eab19144b53
2019-04-01 17:16:16 -07:00
Jun Wu
b1b92b8def logrotate: make flush handle "latest" change
Summary:
If "latest" pointer has changed, we should write to the new "latest" Log,
instead of the stale one.

Differential Revision: D14688180

fbshipit-source-id: eab8df8ddb8f311e472361ecc2b1bc4155f2aba4
2019-04-01 17:16:15 -07:00
Jun Wu
c23508dcd9 indexedlog: add Log::iter_dirty
Summary:
This API iterates entries that are in-memory only. It is useful to extract
entries and store them elsewhere.
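
To illustrate the difference between the two iterators, here is a sketch over a hypothetical in-memory `Log` (not the real implementation): `iter` walks everything, while `iter_dirty` walks only entries that have not been flushed yet.

```rust
/// Hypothetical stand-in for a log with flushed and in-memory entries.
pub struct Log {
    flushed: Vec<Vec<u8>>,
    dirty: Vec<Vec<u8>>,
}

impl Log {
    pub fn new() -> Self {
        Log { flushed: Vec::new(), dirty: Vec::new() }
    }

    pub fn append(&mut self, data: &[u8]) {
        self.dirty.push(data.to_vec());
    }

    pub fn flush(&mut self) {
        self.flushed.append(&mut self.dirty);
    }

    /// All entries: flushed ones first, then in-memory ones.
    pub fn iter<'a>(&'a self) -> impl Iterator<Item = &'a [u8]> + 'a {
        self.flushed.iter().chain(self.dirty.iter()).map(|e| e.as_slice())
    }

    /// Only entries that exist in memory and have not been flushed.
    pub fn iter_dirty<'a>(&'a self) -> impl Iterator<Item = &'a [u8]> + 'a {
        self.dirty.iter().map(|e| e.as_slice())
    }
}
```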

Differential Revision: D14688178

fbshipit-source-id: 6ace51d859ba6886aeb94689f6c45162b9c6958e
2019-04-01 17:16:15 -07:00
Jun Wu
f38bbfd92e logrotate: partially implement flush
Summary: Implement the basic flush logic. Missing bits are listed as TODO items.

Differential Revision: D14688177

fbshipit-source-id: 3613009ec2c216398af6eaff44487a20ceeb97ef
2019-04-01 17:16:15 -07:00
Jun Wu
cd1750f06b indexedlog: make Log::flush return the new file size
Summary:
The file size will be used to decide whether the Log needs "rotate" in upcoming
changes.
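
One way the returned size could feed a rotation decision, as a sketch only: the `Log` stand-in, `flush_and_check_rotate`, and the `>=` threshold comparison are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for a log backed by a single growing file.
pub struct Log {
    on_disk: Vec<u8>,
    pending: Vec<u8>,
}

impl Log {
    pub fn new() -> Self {
        Log { on_disk: Vec::new(), pending: Vec::new() }
    }

    pub fn append(&mut self, data: &[u8]) {
        self.pending.extend_from_slice(data);
    }

    /// Write pending bytes and report the resulting file size.
    pub fn flush(&mut self) -> u64 {
        self.on_disk.append(&mut self.pending);
        self.on_disk.len() as u64
    }
}

/// Flush, then use the reported size to decide whether to rotate.
pub fn flush_and_check_rotate(log: &mut Log, max_bytes_per_log: u64) -> bool {
    log.flush() >= max_bytes_per_log
}
```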

Reviewed By: kulshrax

Differential Revision: D14688169

fbshipit-source-id: b273abcc870b96650d2c76e6e742a3141ce48f13
2019-04-01 17:16:15 -07:00
Jun Wu
ec90e8db57 logrotate: implement append and lookup
Summary:
These methods just delegate to `Log` structures. Unfortunately, the key has to
be copied so it can be used by the iterator to query remaining logs.
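
The reason for the copy can be seen in a sketch like the one below, where every type is a hypothetical stand-in. The iterator is lazy and outlives the `lookup` call, so it keeps its own copy of the key in order to query the next log once the current one is exhausted:

```rust
use std::collections::HashMap;

/// Hypothetical single log: key -> values, in insertion order.
#[derive(Default)]
pub struct Log {
    index: HashMap<Vec<u8>, Vec<Vec<u8>>>,
}

impl Log {
    fn lookup<'a>(&'a self, key: &[u8]) -> &'a [Vec<u8>] {
        self.index.get(key).map(|v| v.as_slice()).unwrap_or(&[])
    }
}

/// Hypothetical rotating collection of logs, newest first.
pub struct LogRotate {
    logs: Vec<Log>,
}

/// Lazy iterator over matches in all logs; it owns a copy of the key.
pub struct LookupIter<'a> {
    key: Vec<u8>,
    logs: std::slice::Iter<'a, Log>,
    current: std::slice::Iter<'a, Vec<u8>>,
}

impl<'a> Iterator for LookupIter<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<&'a [u8]> {
        loop {
            if let Some(value) = self.current.next() {
                return Some(value.as_slice());
            }
            // Current log exhausted: query the next one with the owned key.
            let log = self.logs.next()?;
            self.current = log.lookup(&self.key).iter();
        }
    }
}

impl LogRotate {
    pub fn new() -> Self {
        LogRotate { logs: vec![Log::default()] }
    }

    /// Start a new "latest" log at the front.
    pub fn rotate(&mut self) {
        self.logs.insert(0, Log::default());
    }

    /// Appends always go to the latest log.
    pub fn append(&mut self, key: &[u8], value: &[u8]) {
        self.logs[0]
            .index
            .entry(key.to_vec())
            .or_insert_with(Vec::new)
            .push(value.to_vec());
    }

    pub fn lookup<'a>(&'a self, key: &[u8]) -> LookupIter<'a> {
        let empty: &'a [Vec<u8>] = &[];
        LookupIter { key: key.to_vec(), logs: self.logs.iter(), current: empty.iter() }
    }
}
```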

Differential Revision: D14688172

fbshipit-source-id: fd581f7256031a0622ec0533c84daaab89f9bb82
2019-04-01 17:16:14 -07:00
Jun Wu
aecd9edae9 logrotate: implement open
Summary: Implement the open logic.

Reviewed By: kulshrax

Differential Revision: D14688170

fbshipit-source-id: df3d39040e2268b3eddb131b2ae1b1f76d3e4311
2019-04-01 17:16:14 -07:00
Jun Wu
f160f31cde logrotate: add a LogRotate structure
Summary:
Start implementing the "log rotate" idea by markbt. It is similar to
logrotate, with plain text log files replaced by indexedlog. This
implementation also avoids renaming, which can be troublesome on Windows,
by just increasing the number (ex. to rotate "1/", "2/", create "3/", and
delete "1/", without renaming "2/").

The main use case would be LRU key-value cache on disk.
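
The numbering scheme described above can be sketched as a pure function. `plan_rotation` and its signature are hypothetical; it only computes which directory to create and which to delete, and never renames anything:

```rust
/// Given the numbered directories that exist and the maximum number to keep,
/// return (directory number to create, directory numbers to delete).
pub fn plan_rotation(existing: &[u64], max_log_count: usize) -> (u64, Vec<u64>) {
    // The new directory gets the next number after the largest existing one.
    let next = existing.iter().max().map_or(0, |&n| n + 1);

    // Consider all directories including the new one, oldest first.
    let mut all: Vec<u64> = existing.to_vec();
    all.push(next);
    all.sort();

    // Delete the oldest ones that exceed the limit.
    let excess = all.len().saturating_sub(max_log_count);
    (next, all[..excess].to_vec())
}
```

For the example in the summary, rotating "1/" and "2/" with a limit of two yields "create 3/, delete 1/", and "2/" is left untouched.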

Reviewed By: kulshrax

Differential Revision: D14688176

fbshipit-source-id: 3bf7917e06386ebf85d8d6deeea850c58f4875e8
2019-04-01 17:16:14 -07:00
Jun Wu
a7371c96d3 indexedlog: add create option to Log::OpenOptions
Summary:
One future need is to open a `Log` without creating it by default. The
newly added `create` option can be disabled to prevent that.

This also changes the code path so we no longer take a directory lock
unconditionally during `open`.

Differential Revision: D14688173

fbshipit-source-id: 88795d5637a1a5135d4014434b2cf828540c0333
2019-04-01 17:16:13 -07:00
Jun Wu
6555afa621 indexedlog: add Log::OpenOptions
Summary:
One of the upcoming changes is to add an option to avoid creating Log on demand
at open time. To avoid `open` being too complicated, add an `OpenOptions` struct.
This is consistent with `index` and `std::fs`.

Differential Revision: D14688175

fbshipit-source-id: bb7f1556a32f1f7b15c64a23c5aee7493dd40ce6
2019-04-01 17:16:13 -07:00
Stefan Filip
02851845a9 manifest: Fix skip_subtree on Leaf
Summary:
This diff fixes the behavior of `skip_subtree` when called on a Leaf. The bug is
that the path is not correctly handled in this case. The name of the file
continues to stay in the path, resulting in incorrect path names for all
subsequent calls to `path()`.
The high-level perspective is that `skip_subtree` is a no-op in a Leaf node.
To fix this, clarify the behavior, and improve the readability of the code, we
add a new state that handles popping elements from the path.

Durham noticed this bug when reviewing D14347655.

Reviewed By: quark-zju

Differential Revision: D14654557

fbshipit-source-id: 625278366e492a3048dddc44f9234a06d6928b7e
2019-04-01 11:51:16 -07:00
Jun Wu
64db96a4b7 indexedlog: make IndexDef clone-able
Summary:
It's hard to clone a `Fn`. But `fn` can be cloned. Change the API to use `fn`
instead.

Cloning `IndexDef` allows the same index definition to be used by multiple
Logs. It's used by upcoming diffs.
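
A short sketch of why the switch helps. Function pointers (`fn`) implement `Clone` and `Copy`, so a struct holding one can simply derive `Clone`, whereas a boxed `Fn` closure cannot be cloned in general. The `IndexDef` shape below is simplified and hypothetical (plain ranges instead of `IndexOutput`):

```rust
use std::ops::Range;

/// Simplified, hypothetical index definition holding a plain function pointer.
#[derive(Clone)]
pub struct IndexDef {
    pub name: &'static str,
    pub func: fn(&[u8]) -> Vec<Range<usize>>,
}

/// Example index function: use the first byte of the entry as the key.
pub fn first_byte(data: &[u8]) -> Vec<Range<usize>> {
    if data.is_empty() {
        Vec::new()
    } else {
        vec![0..1]
    }
}
```

The trade-off is that `fn` pointers cannot capture state, so index functions defined this way must be plain functions or non-capturing closures.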

Differential Revision: D14688181

fbshipit-source-id: 6fda03a5f744dc90ee5d7ad3f36c243602f33510
2019-03-30 08:59:13 -07:00
Jun Wu
69a6c18747 indexedlog: normalize benchmarks to use 204800 entries
Summary:
This makes it easier to compare benchmark results between abstractions.

A sample of the results is listed below. Compared to radixbuf, which is highly
optimized and less flexible, indexedlog is about 10x slower on insertion, and
about 3x slower on lookup.

indexedlog:

  index insertion (owned key)    90.201 ms
  index insertion (referred key) 81.567 ms
  index flush                    50.285 ms
  index lookup (memory)          25.201 ms
  index lookup (disk, no verify) 31.325 ms
  index lookup (disk, verified)  46.893 ms

  log insertion                  18.421 ms
  log insertion (no checksum)    12.106 ms
  log insertion with index      110.143 ms
  log flush                       8.783 ms
  log iteration (memory)          6.444 ms
  log iteration (disk)            6.719 ms

radixbuf:

  index insertion                11.874 ms
  index lookup                    8.495 ms

Differential Revision: D14635330

fbshipit-source-id: 28b3f33b87f4e882cb3839c37a2a11b8ac80d3e9
2019-03-27 16:29:58 -07:00
Jun Wu
1568a30c9a indexedlog: add a benchmark inserting entries without checksum
Summary:
This is just a trivial test case showing the overhead of xxhash.

  log insertion                  18.359 ms
  log insertion (no checksum)     7.835 ms

Differential Revision: D14635329

fbshipit-source-id: adc2629c0c41aaab48d29d467849e4d96eb01c51
2019-03-27 16:29:58 -07:00