Summary:
`Log::lookup_range` exposes the range query feature provided by `Index`.
As a bonus, the iterator is also double-ended.
Reviewed By: sfilipco
Differential Revision: D14895477
fbshipit-source-id: 6aef0973e009bf8fc6f3b5e5a8f6c54e57c81360
Summary:
RangeIter is actually faster, mainly because it avoids recursion.
RangeIter does require two Vecs, which looks like extra overhead, but in
practice it does not seem to matter much.
The RangeIter code is also better written than the PrefixIter code. So let's
delete PrefixIter and switch prefix lookups to use RangeIter.
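The recursion-vs-explicit-stack tradeoff can be sketched with a toy tree traversal. This is illustrative only: the real RangeIter walks base16 radix-tree nodes, and it keeps two stacks (the "double Vec"), one for each end of the double-ended iterator.

```rust
// Toy binary tree; the real structure is a base16 radix tree.
struct Node {
    value: u32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// In-order traversal driven by an explicit Vec instead of recursion.
fn in_order(root: Option<&Node>) -> Vec<u32> {
    let mut out = Vec::new();
    let mut stack: Vec<&Node> = Vec::new();
    let mut cur = root;
    while cur.is_some() || !stack.is_empty() {
        // Descend left, remembering ancestors on the explicit stack.
        while let Some(node) = cur {
            stack.push(node);
            cur = node.left.as_deref();
        }
        let node = stack.pop().expect("stack is non-empty here");
        out.push(node.value);
        cur = node.right.as_deref();
    }
    out
}

fn main() {
    let tree = Node {
        value: 2,
        left: Some(Box::new(Node { value: 1, left: None, right: None })),
        right: Some(Box::new(Node { value: 3, left: None, right: None })),
    };
    assert_eq!(in_order(Some(&tree)), vec![1, 2, 3]);
}
```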
Before:
index prefix scan (2B) 89.788 ms
index prefix scan (1B) 72.337 ms
index prefix scan (2B, disk) 102.098 ms
index prefix scan (1B, disk) 90.445 ms
After:
index prefix scan (2B) 76.335 ms
index prefix scan (1B) 54.517 ms
index prefix scan (2B, disk) 91.798 ms
index prefix scan (1B, disk) 67.143 ms
Reviewed By: sfilipco
Differential Revision: D14895478
fbshipit-source-id: 79a01774fb640c78fc5733db82f86f0f9403c960
Summary:
This would provide data about scan_prefix performance.
The benchmark code is slightly changed to share the index across test cases.
That reduces test setup cost.
Reviewed By: sfilipco
Differential Revision: D14895481
fbshipit-source-id: e70098bd202e102822a0829c0ae28de8d49fbe85
Summary:
This API allows range query, similar to `BTreeMap::range`.
It's going to be used by segmented changelog. There are spans (start, end)
stored in the index, and we need to find spans by rev (start <= rev <= end).
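The span query can be sketched with `BTreeMap::range` standing in for the on-disk index (`find_span` is a hypothetical helper for illustration, not part of the crate):

```rust
use std::collections::BTreeMap;

// Spans keyed by `start`, mapped to `end`, assumed non-overlapping.
// Find the span satisfying start <= rev <= end, if any.
fn find_span(spans: &BTreeMap<u64, u64>, rev: u64) -> Option<(u64, u64)> {
    // The only candidate is the span with the largest start <= rev.
    let (&start, &end) = spans.range(..=rev).next_back()?;
    if rev <= end {
        Some((start, end))
    } else {
        None
    }
}

fn main() {
    let spans: BTreeMap<u64, u64> =
        [(0, 9), (10, 19), (30, 39)].iter().cloned().collect();
    assert_eq!(find_span(&spans, 15), Some((10, 19)));
    assert_eq!(find_span(&spans, 25), None); // falls in a gap
}
```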
Initially I was changing PrefixIter incrementally towards the new RangeIter.
There are too many small commits and I got some useful feedback early. Now
it seems cleaner to just introduce the desired state of RangeIter first.
We can later migrate prefix lookup to RangeIter, if perf regression is
negligible.
The added code is long, but some of it is adapted from existing code:
- `RangeIter::next_internal` is modified from `PrefixIter::next`.
- `Index::get_stack_by_bound` is modified from `Index::scan_prefix_base16`.
The tests helped find some issues in the code. I hope they're not too weak.
Reviewed By: sfilipco
Differential Revision: D14895479
fbshipit-source-id: fb8f1bd35c61187fe5f7764fa485206bbb13c8e0
Summary:
The internal rustfmt linter suggests wrong autofixes for the `impl_offset!`
macro. That's noisy for every diff touching `index.rs`. Silence it by moving
macros to a separate file.
To be consistent, `define_error!` is also moved.
Differential Revision: D14885746
fbshipit-source-id: d1a518e631f80d6d7945f1ea3c2e4d18e1c799ca
Summary:
In case there is nothing to write, `Log` and `LogRotate` can take
a fast path that does not take directory locks.
Differential Revision: D14885450
fbshipit-source-id: 4d72d5a3e33b7371880ad31f8bc43ed31c03797f
Summary:
The `flush()` function does two things: read and write. It's not just writing
data. Rename it to `sync` to clarify.
`Index::flush` is unchanged because although it might read new data, that
data is not made visible. Calling `Index::flush` without dirty changes does
not cause visible (queryable) changes to the index.
`FlushFilter` is unchanged because it is coupled with the write path. It is
not meant to filter reads.
The old name is kept temporarily until all pending code gets committed and
we can do a codemod.
Differential Revision: D14885451
fbshipit-source-id: 3aed3b741e5e8f09b611ddcc25930a6fdf71706c
Summary:
The `writable_log` API can be misused to "flush" a Log, bypassing the check
about whether it should be rotated or not.
The real need for `writable_log` is to get access to indexes on the "writable"
(or "latest") log. Therefore let's just expose that instead.
Practically, the only use case of querying the index on the "latest" log is to
make sure dependent content is written to the same Log. That also requires a
"flush_filter" to be provided. Therefore add an assertion about it.
Differential Revision: D14866022
fbshipit-source-id: f6c07a498597b6f0f07d7cc3130e9033ba8b9be4
Summary:
Introduce the "flush filter" that can replace content to be written.
This would be useful to make sure delta chains are self-contained.
For LogRotate, flush_filter is triggered not only when the log file
was modified, but also when rotation happens.
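The filter's shape can be sketched like this (a simplified stand-in: the enum name, variants, and `filter_entries` helper are assumptions for illustration, not necessarily the crate's exact API):

```rust
// What a flush filter can do with each dirty entry at flush time.
#[derive(Debug, PartialEq)]
enum FlushFilterOutput {
    Keep,             // write the entry as-is
    Replace(Vec<u8>), // write substituted content instead
    Drop,             // skip the entry entirely
}

// A plain `fn` pointer keeps the option cloneable.
type FlushFilter = fn(&[u8]) -> FlushFilterOutput;

// Apply a filter to pending entries, producing what actually gets written.
fn filter_entries(entries: &[Vec<u8>], filter: FlushFilter) -> Vec<Vec<u8>> {
    let mut out = Vec::new();
    for entry in entries {
        match filter(entry) {
            FlushFilterOutput::Keep => out.push(entry.clone()),
            FlushFilterOutput::Replace(data) => out.push(data),
            FlushFilterOutput::Drop => {}
        }
    }
    out
}

fn main() {
    // Example filter: drop empty entries, keep everything else.
    fn drop_empty(entry: &[u8]) -> FlushFilterOutput {
        if entry.is_empty() {
            FlushFilterOutput::Drop
        } else {
            FlushFilterOutput::Keep
        }
    }
    let pending = vec![b"a".to_vec(), Vec::new(), b"b".to_vec()];
    assert_eq!(
        filter_entries(&pending, drop_empty),
        vec![b"a".to_vec(), b"b".to_vec()]
    );
}
```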
Differential Revision: D14866024
fbshipit-source-id: f417200d3ae573e9ac82985ad6afd082412b358d
Summary:
The flush filter allows mutating entries being flushed. It can be used to avoid
inserting duplicated data.
Differential Revision: D14866023
fbshipit-source-id: ecf6cf60a0a97cf8110ef9c957e7e3bbab5855fc
Summary:
Previously the code allowed the "log" file to be longer than the length
recorded in the metadata, intended to support advanced use-cases that replace
the "meta" file to get a read-only view of the past.
That implies we trust the length of the "log" file. But it's in theory easy
to mess up - the process might be killed while appending to the "log" file.
Data integrity is the first priority. Therefore let's just error out if the
file length does not match the metadata. To support read-only views of the
past, we can potentially use file names other than "meta", or support
in-memory metadata instead.
Differential Revision: D14866025
fbshipit-source-id: bbf0061a6448375a2de06fbf31f2b9838c749be0
Summary:
The failure crate makes it easier to chain errors and capture backtraces.
Use it.
There is probably still room for improvement, by chaining errors, avoiding
exposing low-level errors in APIs, and/or providing more context in error
messages. But it should already be much better than before.
Differential Revision: D14759305
fbshipit-source-id: b1d3a8ec959dde575f06533ea9e4cd0757057051
Summary:
Practically, a large max_log_count causes issues:
- The directory scan would be slower.
- The index would be slower.
Let's reduce it to the u8 range to address these issues. This also keeps
directory names short.
Differential Revision: D14717896
fbshipit-source-id: d39f008abe576991e14d444c37a049a6132df507
Summary:
Some tests added by upcoming diffs were timing out even though they don't
seem that expensive. I tracked it down to the use of `fsync` in atomicwrites.
In our case, we don't need `fsync`. `fsync` is useful for making sure file
writes land in the desired order even in case of a system crash - for
example, making sure the "primary" log file is written before the "meta"
file. That's too expensive (esp. on filesystems like ext4) for our use-case.
Indexedlog is designed to make sure data corruption can be detected, and there
can be a "reasonable" way to recover (ex. by deleting all indexes, scanning
through entries and re-inserting them in a new log), not to fight against OS
crashes.
`cargo bench` change on a btrfs filesystem:
Before:
index flush 42.570 ms
log flush 7.712 ms
After:
index flush 36.485 ms
log flush 1.609 ms
Differential Revision: D14759304
fbshipit-source-id: 66b95d10040cf1480367b767811dfabee5e27ffe
Summary: Used `cargo fix --edition`. Removed some `mut`s according to rustc warnings.
Differential Revision: D14718308
fbshipit-source-id: 94e3c3f8e47143ede767fe883fdb5e9602b12854
Summary:
The Rust stdlib uses this pattern. This is done by:
sed -i 's/\[\([A-Z][a-zA-Z:]*\)\]/[`\1`]/g' *.rs
Unfortunately it seems only rustdoc nightly can linkify things correctly.
More context: https://github.com/rust-lang/rust/issues/43466
Reviewed By: kulshrax
Differential Revision: D14689887
fbshipit-source-id: ba2b5968bdaad06f39dc43962430906ee80692fd
Summary:
rotate::OpenOptions is a superset of log::OpenOptions. Change the code to reuse
logic in log::OpenOptions as much as possible.
Reviewed By: kulshrax
Differential Revision: D14689888
fbshipit-source-id: a6958723c49f9d41b03100f01283a8c3fb37a1ab
Summary:
The motivation is that LogRotate might copy dirty (non-flushed) entries
from one Log to another, and it cannot preserve the checksum type for those
entries. There are 2 solutions:
- Make `iter_dirty` return the checksum type.
- Make the checksum type known to Log directly.
The second choice provides a simpler public API: `append_advanced` can be
removed, and `iter_dirty` stays consistent with `iter`. Therefore this
change.
Differential Revision: D14688174
fbshipit-source-id: 09e07d64c886a5ce9bc48dce8e29d036af1c0381
Summary: A later diff adds another field to OpenOptions that Log needs access to.
Differential Revision: D14688171
fbshipit-source-id: 33170a2b74639ba0fd8a9c86207d840fb6427580
Summary: This is the final piece to make space usage bounded.
Differential Revision: D14688179
fbshipit-source-id: a6e0058b9022789fcf036c4427d29eab19144b53
Summary:
If "latest" pointer has changed, we should write to the new "latest" Log,
instead of the stale one.
Differential Revision: D14688180
fbshipit-source-id: eab8df8ddb8f311e472361ecc2b1bc4155f2aba4
Summary:
This API iterates entries that are in-memory only. It is useful to extract
entries and store them elsewhere.
Differential Revision: D14688178
fbshipit-source-id: 6ace51d859ba6886aeb94689f6c45162b9c6958e
Summary: Implement the basic flush logic. Missing bits are listed as TODO items.
Differential Revision: D14688177
fbshipit-source-id: 3613009ec2c216398af6eaff44487a20ceeb97ef
Summary:
The file size will be used by upcoming changes to decide whether the Log
needs to be rotated.
Reviewed By: kulshrax
Differential Revision: D14688169
fbshipit-source-id: b273abcc870b96650d2c76e6e742a3141ce48f13
Summary:
These methods just delegate to `Log` structures. Unfortunately, the key has to
be copied so it can be used by the iterator to query remaining logs.
Differential Revision: D14688172
fbshipit-source-id: fd581f7256031a0622ec0533c84daaab89f9bb82
Summary:
Start implementing the "log rotate" idea by markbt. It is similar to
logrotate, with plain text log files replaced by indexedlog. This
implementation also avoids renaming, which can be troublesome on Windows,
by just increasing the number (ex. to rotate "1/", "2/", create "3/", and
delete "1/", without renaming "2/").
The main use case would be LRU key-value cache on disk.
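The rename-free scheme boils down to bookkeeping over directory numbers; here is a minimal sketch (the `rotate` helper and its shape are illustrative, not the crate's API):

```rust
// Rotate by creating the next-numbered log directory and deleting the
// oldest ones, never renaming. `logs` holds existing directory numbers
// in increasing order; returns the newly created number.
fn rotate(logs: &mut Vec<u64>, max_log_count: usize) -> u64 {
    let new = logs.last().map_or(1, |n| n + 1);
    logs.push(new); // e.g. create "3/"
    while logs.len() > max_log_count {
        logs.remove(0); // e.g. delete "1/"
    }
    new
}

fn main() {
    let mut logs = vec![1, 2];
    assert_eq!(rotate(&mut logs, 2), 3);
    assert_eq!(logs, vec![2, 3]); // "1/" gone, no renames needed
}
```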
Reviewed By: kulshrax
Differential Revision: D14688176
fbshipit-source-id: 3bf7917e06386ebf85d8d6deeea850c58f4875e8
Summary:
One future need is to open a `Log` without creating it by default. The
newly added `create` option can be disabled to achieve that.
This also changes the code path so we no longer take a directory lock
unconditionally during `open`.
Differential Revision: D14688173
fbshipit-source-id: 88795d5637a1a5135d4014434b2cf828540c0333
Summary:
One of the upcoming changes adds an option to avoid creating a Log on demand
at open time. To keep `open` from becoming too complicated, add an
`OpenOptions` struct.
This is consistent with `index` and `std::fs`.
Differential Revision: D14688175
fbshipit-source-id: bb7f1556a32f1f7b15c64a23c5aee7493dd40ce6
Summary:
It's hard to clone a `Fn`, but a plain `fn` can be cloned. Change the API to
use `fn` instead.
Cloning `IndexDef` allows the same index definition to be used by multiple
Logs. It's used by upcoming diffs.
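The difference can be sketched as follows (`IndexDef` here is a simplified stand-in with hypothetical fields, not the crate's full type):

```rust
// A plain `fn` pointer is `Copy`, so a struct holding one can derive
// `Clone` for free; a boxed `dyn Fn` closure would not allow that.
#[derive(Clone)]
struct IndexDef {
    name: &'static str,
    // Extracts index keys from an entry.
    index_func: fn(&[u8]) -> Vec<Vec<u8>>,
}

fn first_byte(entry: &[u8]) -> Vec<Vec<u8>> {
    entry.first().map(|b| vec![vec![*b]]).unwrap_or_default()
}

fn main() {
    let def = IndexDef { name: "first-byte", index_func: first_byte };
    // The same definition can now be handed to multiple Logs.
    let for_another_log = def.clone();
    assert_eq!((for_another_log.index_func)(b"abc"), vec![vec![b'a']]);
}
```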
Differential Revision: D14688181
fbshipit-source-id: 6fda03a5f744dc90ee5d7ad3f36c243602f33510
Summary:
This makes it easier to compare benchmark results between abstractions.
A sample of the result is listed below. Compared to radixbuf, which is
highly optimized and less flexible, indexedlog is about 10x slower on
insertion and about 3x slower on lookup.
indexedlog:
index insertion (owned key) 90.201 ms
index insertion (referred key) 81.567 ms
index flush 50.285 ms
index lookup (memory) 25.201 ms
index lookup (disk, no verify) 31.325 ms
index lookup (disk, verified) 46.893 ms
log insertion 18.421 ms
log insertion (no checksum) 12.106 ms
log insertion with index 110.143 ms
log flush 8.783 ms
log iteration (memory) 6.444 ms
log iteration (disk) 6.719 ms
radixbuf:
index insertion 11.874 ms
index lookup 8.495 ms
Differential Revision: D14635330
fbshipit-source-id: 28b3f33b87f4e882cb3839c37a2a11b8ac80d3e9
Summary:
This is just a trivial test case showing the overhead of xxhash.
log insertion 18.359 ms
log insertion (no checksum) 7.835 ms
Differential Revision: D14635329
fbshipit-source-id: adc2629c0c41aaab48d29d467849e4d96eb01c51
Summary: `std::fs` is only needed for Windows. Do not "use" it on *nix systems.
Reviewed By: sfilipco
Differential Revision: D14634779
fbshipit-source-id: 9fd9a29ae27e13f00b4adbc83a74bd92a1b1658c
Summary:
Change fields in IndexDef to private. Provide a public constructor method and
switch users to use that instead. This makes it possible to change the IndexDef
struct in the future (ex. having extra optional fields about whether the index
is backed by radix tree or something different).
Differential Revision: D14608955
fbshipit-source-id: 62a413268d97ba96b2c4efd2ce67cd4fa0ff4293
Summary:
Windows disallows rewriting or truncating mmapped files. Fix the tests by
either dropping the mmap or skipping the test.
Reviewed By: sfilipco
Differential Revision: D14572119
fbshipit-source-id: dccafdc66db3830c2919232d899ba31365120066
Summary:
The `load_or_create_meta` function is subject to filesystem races. Solve it by
always taking a lock.
This hurts performance a little bit. But `open()` should not be in a hot loop.
So it should probably be fine.
Reviewed By: sfilipco
Differential Revision: D14568122
fbshipit-source-id: d9b28555ab94252da4717de709b780b361e1dda7
Summary:
On Windows it's impossible to open(2) a directory. Therefore add a utility
function that creates a `lock` file automatically on Windows and opens that
file instead.
Reviewed By: sfilipco
Differential Revision: D14568117
fbshipit-source-id: bc7ae7046be654560c38fbd98ec4dd58c071b1dc
Summary:
Previously, `load_or_create_meta` could return without actually creating the
meta file. That leads to problems: when `load_or_create_meta` is called a
second time via `flush()`, it rewrites the primary file incorrectly, and on
Windows it fails to rewrite the primary file at all.
Fix it by actually writing a meta file before returning.
Reviewed By: sfilipco
Differential Revision: D14568118
fbshipit-source-id: da3ad42bf48a923d732b1719839ca1953bd2b06c
Summary:
This exposes the underlying lookup functions from `Index`.
Alternatively we can allow access to `Index` and provide an `iter_started_from`
method on `Log` which takes a raw offset. I have been trying to avoid exposing
raw offsets in public interfaces, as they would change after `flush()` and cause
problems.
Reviewed By: markbt
Differential Revision: D13498303
fbshipit-source-id: 8b00a2a36a9383e3edb6fd7495a005bc985fd461
Summary:
This is the missing API before `indexedlog::Index` can fit in the
`changelog.partialmatch` case. It's actually more flexible as it can provide
some example commit hashes, while the existing revlog.c or radixbuf
implementations just error out saying "ambiguous prefix".
It can also be "abused" to get sorted "sub-key" semantics, by replacing
"key" with "key + subkey" when inserting into the index. Looking up by "key"
would then return a lazy result list (`PrefixIter`) sorted by "subkey".
Note: the radix tree is NOT efficient (in both time and space) when there
are common prefixes, so this use-case needs care.
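The "key + subkey" trick can be sketched with a sorted set standing in for the radix index (`scan_prefix` is a hypothetical helper for illustration):

```rust
use std::collections::BTreeSet;

// Insert "key + subkey" as one concatenated key; prefix-scanning by
// "key" then yields results sorted by subkey, mimicking PrefixIter.
fn scan_prefix(index: &BTreeSet<Vec<u8>>, prefix: &[u8]) -> Vec<Vec<u8>> {
    index
        .range(prefix.to_vec()..)
        .take_while(|k| k.starts_with(prefix))
        .map(|k| k[prefix.len()..].to_vec()) // recover the subkey
        .collect()
}

fn main() {
    let mut index = BTreeSet::new();
    for subkey in &["30", "10", "20"] {
        index.insert(format!("key1{}", subkey).into_bytes());
    }
    index.insert(b"key2-x".to_vec());
    // Subkeys come back sorted, without touching other keys.
    assert_eq!(
        scan_prefix(&index, b"key1"),
        vec![b"10".to_vec(), b"20".to_vec(), b"30".to_vec()]
    );
}
```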
Reviewed By: markbt
Differential Revision: D13498301
fbshipit-source-id: 637856ebd761734d68b20c15866424b1d4518ad6
Summary: This will be used in prefix lookups.
Reviewed By: markbt
Differential Revision: D13498300
fbshipit-source-id: 3db7a21d6f35a18699d9dc3a0eca71a5410e0e61
Summary:
It duplicates testing - `cargo test` would run tests on 2 entry points:
lib.rs and indexedlog_dump.rs. Move it to a separate crate to solve the
issue.
Reviewed By: markbt
Differential Revision: D13498266
fbshipit-source-id: 8abf07c1272dfa825ec7701fd8ea9e0d1310ec5f
Summary: `write!` result needs to be used.
Reviewed By: markbt
Differential Revision: D13471967
fbshipit-source-id: d48752bcac05dd33b112679d7faf990eb8ddd651
Summary:
Add a new entry type - INLINE_LEAF, which embeds the EXT_KEY and LINK entries
to save space.
The index size for referred keys is significantly reduced with little overhead:
index insertion (owned key) 3.732 ms
index insertion (referred key) 3.604 ms
index flush 11.868 ms
index lookup (memory) 1.159 ms
index lookup (disk, no verify) 2.175 ms
index lookup (disk, verified) 4.303 ms
index size (5M owned keys) 216626039
index size (5M referred keys) 96616431
11.87s user 2.96s system 98% cpu 15.107 total
The breakdown of the "5M referred keys" size is:
type count bytes
radixes 1729472 33835772
inline_leafs 5000000 62780651
There are no other kinds of entries stored.
Previously, the index size for referred keys was:
index size (5M referred keys) 136245815 bytes
So it's 136MB -> 96MB, about a 29% decrease.
Reviewed By: DurhamG
Differential Revision: D13036801
fbshipit-source-id: 27e68e4b6c332c1dc419abc6aba69271952e4b3d
Summary:
Replace the 20-byte "jump table" with 3-byte "flag + bitmap". This saves space
for indexes less than 4GB. There are some reserved bits in the "flag" so if we
run into space issues when indexes are larger than 4GB, we can try adding
6-byte integer, or VLQ back without breaking backwards-compatibility.
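The bitmap lookup works like a classic popcount trick; a sketch (not the exact on-disk layout):

```rust
// A radix node stores a 16-bit bitmap of which base16 children exist,
// plus a densely packed array of child offsets. A child's position in
// that array is the number of set bits below its nibble.
fn child_slot(bitmap: u16, nibble: u8) -> Option<u32> {
    assert!(nibble < 16);
    if bitmap & (1 << nibble) == 0 {
        return None; // child absent
    }
    let below = bitmap & ((1u16 << nibble) - 1);
    Some(below.count_ones())
}

fn main() {
    // Children exist for nibbles 1 and 3.
    let bitmap = 0b1010;
    assert_eq!(child_slot(bitmap, 1), Some(0));
    assert_eq!(child_slot(bitmap, 3), Some(1));
    assert_eq!(child_slot(bitmap, 2), None);
}
```

This replaces a fixed 20-byte jump table with 2 bitmap bytes (plus the flag byte) while keeping child lookup O(1) aside from the popcount.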
It seems to hurt flush performance a bit, because we have to scan the child
array twice. However, lookup (the most important performance) does not change
much. And the index is more compact.
After:
index flush 19.644 ms
index lookup (disk, no verify) 2.220 ms
index lookup (disk, verified) 4.067 ms
index size (5M owned keys) 216626039 bytes
index size (5M referred keys) 136245815 bytes
Before:
index flush 16.764 ms
index lookup (disk, no verify) 2.205 ms
index lookup (disk, verified) 4.030 ms
index size (5M owned keys) 240838647 bytes
index size (5M referred keys) 160458423 bytes
For the "referred key" case, it's 160MB -> 136MB, about a 15% decrease.
A detailed break down of components of index is:
After:
type count bytes (using owned keys)
radixes 1729472 33835772
links 5000000 27886336
leafs 5000000 44629384
keys 5000000 110000000
type count bytes (using referred keys)
radixes 1729472 33835772
links 5000000 27886336
leafs 5000000 44629384
ext_keys 5000000 29894315
Before:
type count bytes (using owned keys)
radixes 1729472 58048380
links 5000000 27886336
leafs 5000000 44903923
keys 5000000 110000000
type count bytes (using referred keys)
radixes 1729472 58048380
links 5000000 27886336
leafs 5000000 44629384
ext_keys 5000000 29894315
Leaf nodes are taking too much space. It seems the next big optimization might
be inlining ext_keys into leafs.
Reviewed By: DurhamG, markbt
Differential Revision: D13028196
fbshipit-source-id: 6043b16fd67a497eb52d20a17e153fcba5cb3e81
Summary:
Since the size test only runs once, we can use a larger number of keys. This is
closer to some production use-cases.
`cargo bench size` shows:
index size (5M owned keys) 240838647
index size (5M referred keys) 160458423
It currently uses 32 bytes per key for 5M referred keys.
Reviewed By: markbt
Differential Revision: D13027880
fbshipit-source-id: 726f5fb2da056e77ab93d82fda9f1afa500d0a8d