Summary:
An upcoming diff will need the ability to iterate over all the keys in
the store. So let's expose that functionality.
Reviewed By: quark-zju
Differential Revision: D13062575
fbshipit-source-id: a173fcdbbf44e2d3f09f7229266cca6f3e67944b
Summary:
You can currently iterate over indexlog entries, but there's no way to
iterate over the keys without keeping a copy of the index function with you.
Let's add a key iterator function.
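A minimal sketch of the idea, assuming a simplified in-memory log. `MiniLog`, `IndexFn`, `iter_keys` and `first_two_bytes` are hypothetical names for illustration and are not the real API:

```rust
/// Hypothetical index function: extracts the key bytes from an entry.
type IndexFn = fn(&[u8]) -> &[u8];

/// Simplified stand-in for a log that owns its index function.
struct MiniLog {
    entries: Vec<Vec<u8>>,
    index_fn: IndexFn,
}

impl MiniLog {
    /// Iterate over raw entries (what callers could already do).
    fn iter(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.entries.iter().map(|e| e.as_slice())
    }

    /// Iterate over keys. The log applies its own index function, so the
    /// caller does not need to keep a copy of it.
    fn iter_keys(&self) -> impl Iterator<Item = &[u8]> + '_ {
        let index_fn = self.index_fn;
        self.iter().map(index_fn)
    }
}

/// Example index function (assumed): the first two bytes are the key.
fn first_two_bytes(entry: &[u8]) -> &[u8] {
    &entry[..entry.len().min(2)]
}
```

With this shape, a caller only writes `log.iter_keys()` and never touches the index function directly.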
Reviewed By: quark-zju
Differential Revision: D13010744
fbshipit-source-id: 1fcaf959ae82417e5cbafae7c1927c3ae8f8e76a
Summary:
Turn BookmarkStore rust implementation into indexed-log backed.
Note that this no longer matches existing mercurial bookmark store
disk representation.
Reviewed By: DurhamG
Differential Revision: D13133605
fbshipit-source-id: 2e0a27738bcec607892b0edab6f759116929c8e1
Summary:
This is done by running `fix-code.py`. Note that those strings are
semver requirements, so they do not pin down the exact version. An
API-compatible upgrade is still possible.
Reviewed By: ikostia
Differential Revision: D10213073
fbshipit-source-id: 82f90766fb7e02cdeb6615ae3cb7212d928ed48d
Summary:
The "misc" benchmark requires the base16 module to be public. It was made
private in a previous change. Let's make it public again so the benchmark can
run.
Reviewed By: singhsrb
Differential Revision: D13015031
fbshipit-source-id: 0dc1542803aae290de26651e367898eebfc95e83
Summary: It needs to be Send to be used in cpython.
Reviewed By: ikostia
Differential Revision: D10250289
fbshipit-source-id: ea57e356a0752764e50db9b6872b5cc4a456303f
Summary:
Make it more detailed for public APIs. Hide overly detailed information (file
format).
Reviewed By: DurhamG
Differential Revision: D10250140
fbshipit-source-id: d9d9af9d67984b80f07db13e69bbffdf77e6a30e
Summary:
The log module is the "entry point" of other features. Update it so things are
more detailed. I tried to make it more friendly for people without knowledge
about the implementation details.
This could probably be further improved by adding some examples. For now, I'm
focusing on the plain English parts.
To reviewers: Let me know how it reads assuming no prior knowledge of the
implementation. Ways to make sentences shorter and more natural to native
speakers, without losing important information, are also very welcome.
Reviewed By: DurhamG
Differential Revision: D10250141
fbshipit-source-id: 35258c7197c1ce0a1d3d0554fab2f2d2866e123c
Summary:
Make important modules public. Make the internal utility (base16) private. Add
some text to the crate-level documentation. It just refers to important
structures. The documentation of those structures will be revised later.
Reviewed By: DurhamG, kulshrax
Differential Revision: D10250143
fbshipit-source-id: c79859ee7d3d9cc4ee9a093ef5d12ec6599f2a42
Summary: This is just the result of running `./contrib/fix-code.py $(hg files .)`
Reviewed By: ikostia
Differential Revision: D10213075
fbshipit-source-id: 88577c9b9588a5b44fcf1fe6f0082815dfeb363a
Summary:
The code block is not a valid Rust program. Mark it as "plain".
This fixes `cargo doc`.
Reviewed By: markbt
Differential Revision: D10137806
fbshipit-source-id: 1197d3a2ebc1450a0738686fa6cfa7c7b79dcb0d
Summary:
The primary log and indexes could be out of sync when mutating the indexes
errors out. In that case, mark the indexes as "corrupted" and refuse to
perform index read (lookup) operations, for correctness.
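A rough sketch of the "corrupted" flag, assuming a simplified in-memory log. `MiniLog`, `index_corrupted` and the `fail_index` switch are hypothetical and only simulate an index update error:

```rust
use std::collections::HashMap;
use std::io;

/// Simplified stand-in: a primary log plus one in-memory index.
struct MiniLog {
    primary: Vec<Vec<u8>>,
    index: HashMap<Vec<u8>, usize>,
    /// Set when an index update failed after the primary log changed.
    index_corrupted: bool,
}

impl MiniLog {
    fn new() -> Self {
        MiniLog {
            primary: Vec::new(),
            index: HashMap::new(),
            index_corrupted: false,
        }
    }

    /// Append to the primary log, then update the index.
    /// `fail_index` simulates an error while mutating the index.
    fn append(&mut self, key: &[u8], data: &[u8], fail_index: bool) -> io::Result<()> {
        self.primary.push(data.to_vec());
        if fail_index {
            // The primary log has the entry but the index does not.
            // They are out of sync, so mark the index as corrupted.
            self.index_corrupted = true;
            return Err(io::Error::new(io::ErrorKind::Other, "index update failed"));
        }
        self.index.insert(key.to_vec(), self.primary.len() - 1);
        Ok(())
    }

    /// Refuse lookups once the index is known to be out of sync.
    fn lookup(&self, key: &[u8]) -> io::Result<Option<&[u8]>> {
        if self.index_corrupted {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "index is corrupted"));
        }
        Ok(self.index.get(key).map(|&i| self.primary[i].as_slice()))
    }
}
```

Returning an error instead of a possibly stale answer trades availability for correctness, which matches the intent described above.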
Reviewed By: DurhamG
Differential Revision: D8337689
fbshipit-source-id: 3db9006ea03cfcaba52391f189aa697944b616e5
Summary:
This demonstrates that the index definitions can be reordered. As long
as their names do not change, things still work.
Reviewed By: DurhamG
Differential Revision: D8337688
fbshipit-source-id: 2fbbdf711d8edc10fc6d3314532390ea712aca6c
Summary:
This allows us to store arbitrary metadata in the root node. It will be used
by the `Log` structure to store how many bytes the index covers.
Reviewed By: DurhamG
Differential Revision: D8337687
fbshipit-source-id: 159a89d66765fc251a486fd62c1ffd01f625b503
Summary: Implement the dependencies of the "open" public API.
Reviewed By: DurhamG
Differential Revision: D8156518
fbshipit-source-id: 9fed441f520a3b74cbef5bfb815c82943c615fdf
Summary:
The read_entry function takes care of reading an entry from a given offset,
and returns internal details like the real data offset (skipping the length
and checksum metadata), and the next entry offset.
It performs an integrity check and handles offsets for both in-memory and
on-disk buffers. The offsets to in-memory entries are fairly simple - they
start from "meta.primary_len" instead of a fixed reserved value. This makes
the "next_offset" work seamlessly.
The public API won't have "offset" exposed, so the API is private.
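A minimal sketch of such a reader over a single buffer. The layout (4-byte little-endian length, 4-byte checksum, then data), the byte-sum checksum and all names here are assumptions for illustration, not the actual file format:

```rust
use std::io;

/// Assumed header: 4-byte length + 4-byte checksum.
const HEADER_LEN: usize = 8;

/// Offsets returned for one entry (hypothetical shape).
#[derive(Debug, PartialEq)]
struct EntryInfo {
    /// Where the payload starts, skipping length and checksum.
    data_offset: usize,
    /// Where the following entry starts.
    next_offset: usize,
}

fn invalid(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Stand-in checksum: wrapping sum of bytes. A real log would use a
/// stronger hash.
fn checksum(data: &[u8]) -> u32 {
    data.iter().fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}

/// Serialize one entry using the assumed layout.
fn encode_entry(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + data.len());
    out.extend_from_slice(&(data.len() as u32).to_le_bytes());
    out.extend_from_slice(&checksum(data).to_le_bytes());
    out.extend_from_slice(data);
    out
}

fn read_u32(buf: &[u8], offset: usize) -> io::Result<u32> {
    let end = offset.checked_add(4).ok_or_else(|| invalid("offset overflow"))?;
    let bytes = buf.get(offset..end).ok_or_else(|| invalid("header out of range"))?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(bytes);
    Ok(u32::from_le_bytes(raw))
}

/// Read the entry at `offset`, verify it, and report its offsets.
fn read_entry(buf: &[u8], offset: usize) -> io::Result<EntryInfo> {
    let len = read_u32(buf, offset)? as usize;
    let expected = read_u32(buf, offset + 4)?;
    let data_offset = offset + HEADER_LEN;
    let next_offset = data_offset
        .checked_add(len)
        .ok_or_else(|| invalid("length overflow"))?;
    let data = buf
        .get(data_offset..next_offset)
        .ok_or_else(|| invalid("entry out of range"))?;
    if checksum(data) != expected {
        return Err(invalid("checksum mismatch"));
    }
    Ok(EntryInfo { data_offset, next_offset })
}
```

Feeding `next_offset` back into `read_entry` walks the buffer entry by entry. The in-memory case described above is not covered by this sketch.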
Reviewed By: DurhamG
Differential Revision: D8156513
fbshipit-source-id: 8661f2f2757de6f3f94defc64f4a8dd5261973b2
Summary:
Partially implement the open, append, flush, and lookup APIs. This shows how
things work in general, like how locking works, and what is in-memory versus
on-disk.
Reviewed By: DurhamG
Differential Revision: D8156514
fbshipit-source-id: 2de23dcde2f63895f3f3e4f67057aa9520fdfa34
Summary: Implemented according to the file format specification added by the previous diff.
Reviewed By: DurhamG
Differential Revision: D8156516
fbshipit-source-id: 7153932b9442b3ab5bdb81490f88c40346128afc
Summary: The public interface and its dependencies.
Reviewed By: DurhamG
Differential Revision: D8156509
fbshipit-source-id: c6f3e4b88851683a5d8804b80f689282e3f582d4
Summary:
Without this change, code doing `index.get(...).values().collect()` might
end up with an infinite loop.
Reviewed By: DurhamG
Differential Revision: D8156510
fbshipit-source-id: 5497aa354de7d49cfc4308a025856608ce981a1e
Summary:
Previously, the index API optionally took a root offset. This was
inconvenient for the caller since they probably needed to record both
the valid file length and the root offset. Since root nodes are always at
the end of the index, let's just simplify the API to take a logical
file length instead of a root offset.
Reviewed By: DurhamG
Differential Revision: D8156512
fbshipit-source-id: 7029272a61c9990e6484bca7ebbff64e2233c6cd
Summary:
Previously, `mmap_readonly` always read the file length, and used that as the
mmap length. In many cases we do know the desired file length and it's cleaner
to not `mmap` unused bytes. So let's add a parameter to do that.
Note: The `stat` call is still needed, since `mmap` wouldn't return an error
if the requested length is greater than the file length.
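The length check can be sketched as a pure function. `checked_map_len` is a hypothetical helper, not the real `mmap_readonly` signature. It takes the length obtained from `stat` and the optional length requested by the caller:

```rust
use std::io;

/// Decide how many bytes to map. `file_len` comes from `stat`;
/// `requested` is the optional caller-provided length.
fn checked_map_len(file_len: u64, requested: Option<u64>) -> io::Result<u64> {
    match requested {
        // No explicit length: map the whole file, as before.
        None => Ok(file_len),
        // A shorter (or equal) length avoids mapping unused bytes.
        Some(len) if len <= file_len => Ok(len),
        // Asking for more than the file has must be rejected here,
        // because `mmap` itself would not report it.
        Some(len) => Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            format!("requested {} bytes but file has only {}", len, file_len),
        )),
    }
}
```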
Reviewed By: DurhamG
Differential Revision: D8156523
fbshipit-source-id: 991aa28f3542eaff24387dcc6a7302122fb6962f
Summary: The function will be reused in another module.
Reviewed By: DurhamG
Differential Revision: D8156522
fbshipit-source-id: 2aff6f2e4b8fc9b5d2c000e12ac2d940f7fab407
Summary:
`rand` 0.5 has so many breaking changes that the code is not ready to
migrate yet. So let's pin rand to 0.4. Ideally all dependencies in
Cargo.toml should avoid using "*". But for now `rand` is the only
troublemaker.
Note `rand 0.4` is a dependency of `quickcheck 0.6.2` so it's available.
Reviewed By: phillco, singhsrb
Differential Revision: D8158406
fbshipit-source-id: 417ae6807a2efc650acb8d82370964fab6531fdb
Summary:
Add a test that bitflips the index content, and make sure reading the index
would trigger an error.
Due to the run-time performance difference, the release version tests 2-byte
keys while the debug version only tests 1-byte keys.
The header byte was not verified. Now it is verified.
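The bit-flip idea can be sketched on a toy buffer. The one-byte wrapping-sum checksum and the `seal`/`verify` helpers are assumptions for illustration and are unrelated to the real index format:

```rust
/// Stand-in checksum: wrapping sum of all bytes.
fn checksum(data: &[u8]) -> u8 {
    data.iter().fold(0u8, |acc, &b| acc.wrapping_add(b))
}

/// Append the checksum byte to the payload.
fn seal(payload: &[u8]) -> Vec<u8> {
    let mut out = payload.to_vec();
    out.push(checksum(payload));
    out
}

/// Check that the trailing byte matches the payload checksum.
fn verify(buf: &[u8]) -> bool {
    match buf.split_last() {
        Some((&sum, payload)) => checksum(payload) == sum,
        None => false,
    }
}

/// Flip every bit once and count the flips that went undetected.
fn count_undetected_bitflips(buf: &[u8]) -> usize {
    let mut copy = buf.to_vec();
    let mut undetected = 0;
    for i in 0..copy.len() {
        for bit in 0..8 {
            copy[i] ^= 1u8 << bit;
            if verify(&copy) {
                undetected += 1;
            }
            copy[i] ^= 1u8 << bit; // restore the original bit
        }
    }
    undetected
}
```

A test then asserts that `count_undetected_bitflips` returns zero for a sealed buffer.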
Reviewed By: DurhamG
Differential Revision: D7517134
fbshipit-source-id: b3d8665ff4ac08c1a70db8d21122ba241913a2ed
Summary:
In "split_leaf" "Example 3" case, the old leaf entry (and its key) becomes
unused. Writing them to disk is unnecessary. This patch adds an "unused" marker
so they can be marked and skipped inside flush().
No visible performance change:
  index insertion                3.710 ms
  index flush                    3.717 ms
  index lookup (memory)          1.128 ms
  index lookup (disk, no verify) 1.993 ms
  index lookup (disk, verified)  7.866 ms
Reviewed By: DurhamG
Differential Revision: D7517139
fbshipit-source-id: 253c878bc4b3762382c424777dfa779b3868e851
Summary: Since we now have the ability to store multiple values, add a test.
Reviewed By: DurhamG
Differential Revision: D7472880
fbshipit-source-id: 85b1c69245ac7f0c4702daf22a02f5e5072f0924
Summary:
The value type is a linked list of u64 integers. Add an API to expose that.
Using the iterator framework gives flexibility - the caller can easily
take the first value, convert the values to a vector, count the values,
etc.
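A sketch of such an iterator, assuming links are stored in a vector and addressed by 1-based offsets so that 0 can mean "end of list". `Link`, `Links` and `ValueIter` are hypothetical names:

```rust
/// One link: a value plus the offset of the next link (0 means "end").
struct Link {
    value: u64,
    next: usize,
}

/// Hypothetical storage; offset N refers to element N - 1.
struct Links(Vec<Link>);

/// Iterator walking the linked list starting from a head offset.
struct ValueIter<'a> {
    links: &'a Links,
    offset: usize,
}

impl<'a> Iterator for ValueIter<'a> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.offset == 0 {
            // End of list: keep returning None so `collect()` terminates.
            return None;
        }
        let link = self.links.0.get(self.offset - 1)?;
        self.offset = link.next;
        Some(link.value)
    }
}

impl Links {
    /// Expose the values reachable from `head` as an iterator.
    fn values(&self, head: usize) -> ValueIter<'_> {
        ValueIter { links: self, offset: head }
    }
}
```

The caller can then pick `.next()`, `.collect::<Vec<u64>>()` or `.count()` as needed.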
Reviewed By: DurhamG
Differential Revision: D7472881
fbshipit-source-id: d31e81770e069734b54fa08729c0cd45a699aae2
Summary:
This is caught by a later test. Looking up a non-existent child (jumptable
value is 0) returned an InvalidData error, while it should return Offset(0).
The added if condition does not seem to have noticeable performance impact:
  index insertion                3.840 ms
  index flush                    3.740 ms
  index lookup (memory)          1.085 ms
  index lookup (disk, no verify) 1.972 ms
  index lookup (disk, verified)  7.752 ms
Reviewed By: DurhamG
Differential Revision: D7472882
fbshipit-source-id: 1cc51e9afa248e123cca9c561d7bb2128fd898b1
Summary:
Previously, the code focused on getting the hardest (index) part right,
and less on the value part. There is not yet a way to get all values in the
linked list, as designed. This diff starts the work.
Similar to `KeyOffset::key_and_link_offset`, change the internal API of
LinkOffset to return both value and the next link offset.
Reviewed By: DurhamG
Differential Revision: D7472879
fbshipit-source-id: 4a4512d7c63abbb667146de582e0f8cd04c9c04a
Summary:
`Index::open` now takes too many parameters, which is not very convenient to
use. Inspired by `fs::OpenOptions`, use a dedicated struct for specifying
open options.
Motivation: To test checksum ability more confidently, I'd like to write
something that randomly mutates 1 byte from a sane index. To make sure the
checksum coverage is "correct", checksum chunk size is another parameter.
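A sketch of the builder style, modeled after `std::fs::OpenOptions`. The field names, defaults and the `MiniIndex` return type are hypothetical, and nothing is actually opened on disk:

```rust
/// Hypothetical options struct for opening an index.
#[derive(Debug, Clone, PartialEq)]
struct OpenOptions {
    checksum_chunk_size: u64,
    write: bool,
}

/// Stand-in for an opened index; it only records how it was opened.
#[derive(Debug)]
struct MiniIndex {
    path: String,
    options: OpenOptions,
}

impl OpenOptions {
    /// Start from defaults (assumed: no checksum, writable).
    fn new() -> Self {
        OpenOptions { checksum_chunk_size: 0, write: true }
    }

    /// Set the checksum chunk size; 0 disables checksums in this sketch.
    fn checksum_chunk_size(&mut self, size: u64) -> &mut Self {
        self.checksum_chunk_size = size;
        self
    }

    /// Choose between read-write and read-only.
    fn write(&mut self, value: bool) -> &mut Self {
        self.write = value;
        self
    }

    /// "Open" the index with the collected options.
    fn open(&self, path: &str) -> MiniIndex {
        MiniIndex { path: path.to_string(), options: self.clone() }
    }
}
```

New parameters then become new setter methods instead of extra positional arguments at every call site.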
Reviewed By: DurhamG
Differential Revision: D7464182
fbshipit-source-id: 469ce7d1cfa5de3946028418567a9f3e2bc303fa
Summary:
Address DurhamG's review comment on D7422832.
Previously, `OffsetMap::get` expected a dirty offset. That's because it was
changed from `HashMap` and we don't control `HashMap::get`. It's cleaner to
let `OffsetMap` do the `is_dirty` check.
Reviewed By: DurhamG
Differential Revision: D7461707
fbshipit-source-id: 9f2abdf6c6f993d98d9443f16bafcc6154ee0dbb
Summary:
The new test covers the `else` branch inside `LeafOffset::set_link`
previously not covered.
Coverage was checked by the following script:
```
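from __future__ import absolute_import
import glob
import os
import shutil
# Build the test binary with flags that keep coverage data accurate.
os.system('cargo rustc --lib --profile test -- -Ccodegen-units=1 -Clink-dead-code -Zno-landing-pads')
# Pick the most recently built test binary.
path = max((os.stat(path).st_mtime, path) for path in glob.glob('./target/debug/*-????????????????'))[1]
# Remove stale results; ignore a missing directory on the first run.
shutil.rmtree('target/kcov', ignore_errors=True)
os.system('kcov --include-path $PWD/src --verify target/kcov %s' % path)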
```
Reviewed By: DurhamG
Differential Revision: D7446902
fbshipit-source-id: 293da2ff53b83c8f11534f0f8e5e7fd102216a01
Summary:
Change `insert_advanced` to accept an enum that could be either a key, or an
(offset, len) that refers to the external key buffer.
Insertion becomes slower due to new flexibility overhead. For some reason,
"index lookup (no verify)" becomes faster (restores pre-D7440248 performance):
  index insertion                6.434 ms
  index flush                    3.757 ms
  index lookup (memory)          1.068 ms
  index lookup (disk, no verify) 1.969 ms
  index lookup (disk, verified)  7.805 ms
With 2M 20-byte keys, the non-external key version generates a 105MB index:
  seconds  operation
  1.247    insert
  0.622    flush
  1.859    flush done
  0.702    lookup (without checksum)
  1.395    lookup (with checksum)
Using external keys, the index is 70MB, and the time for each operation is:
  seconds  operation
  1.086    insert
  0.702    flush
  0.665    lookup (without checksums)
  1.602    lookup (with checksums)
The external key will have more space wins for longer keys, e.g. file paths.
`Index` module was made public so `InsertKey` type is usable.
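The key-or-reference enum described above could look roughly like this. The variant names and the `resolve` helper are assumptions for illustration and may differ from the real `InsertKey`:

```rust
/// Hypothetical shape of the insert key: either inline key bytes, or an
/// (offset, len) range inside an external key buffer.
enum InsertKey<'a> {
    Embed(&'a [u8]),
    Reference { offset: usize, len: usize },
}

impl<'a> InsertKey<'a> {
    /// Resolve to the actual key bytes. Returns `None` if the referenced
    /// range falls outside the external key buffer.
    fn resolve(&self, key_buf: &'a [u8]) -> Option<&'a [u8]> {
        match *self {
            InsertKey::Embed(bytes) => Some(bytes),
            InsertKey::Reference { offset, len } => {
                let end = offset.checked_add(len)?;
                key_buf.get(offset..end)
            }
        }
    }
}
```

With a reference, the index only needs to record where the key lives, which is where the space savings for long keys come from.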
Reviewed By: DurhamG
Differential Revision: D7444907
fbshipit-source-id: b89d95246845799c2c55fb73ad203a7e6724b85e
Summary:
Previously, a leaf entry could only have a `KeyOffset`. This diff makes it
possible for it to have either a `KeyOffset` or an `ExtKeyOffset`. The API
didn't change much since `LeafOffset::key_and_link_offset` handles the
difference transparently.
Latest benchmark result:
  index insertion                4.879 ms
  index flush                    3.620 ms
  index lookup (memory)          1.827 ms
  index lookup (disk, no verify) 3.508 ms
  index lookup (disk, verified)  7.861 ms
Reviewed By: DurhamG
Differential Revision: D7444909
fbshipit-source-id: 5441e1ae187d42931377d7213dcb77156b2af714
Summary:
The leaf entry has a `key_and_link_offset` method. Previously it returned a
`KeyOffset`. Since we now have `ExtKeyOffset`, it's friendlier to handle the
key entry type difference at the leaf entry level, instead of requiring the
caller to handle it.
Reviewed By: DurhamG
Differential Revision: D7444905
fbshipit-source-id: 56d87641a2a5a50ddca8b1e4c74c9aaa3891b542
Summary:
Previously, I thought there would be only one index using "commit hash" as
keys, namely the nodemap, and that other indexes (like childmap) would just use
shorter integer keys (e.g. revision numbers, or offsets). So the space overhead
of storing full keys would only apply to one index and seemed acceptable.
But that implies strict topo order for the source of truth data (e.g. to use
integers as keys in childmap, you have to know how to translate parent
revisions from hashes to integers at the time of writing the revision).
Thinking about it again, it seems the topo-order requirement would make a lot
of things less flexible. It's much easier to just use hashes as keys in the
index. Then it's worthwhile to address the space efficiency problem by
introducing an "external key buffer" concept. That's actually what `radixbuf`
does.
This is the start. It adds the type to the struct. The feature is not complete
yet.
Reviewed By: DurhamG
Differential Revision: D7444904
fbshipit-source-id: 60a83c9e6e8b0734450f0c5827928a7c5bd111d5
Summary:
It further slows down lookups, even when checksum is disabled, since even an
`is_none()` check is not free:
  index insertion                4.697 ms
  index flush                    3.764 ms
  index lookup (memory)          2.878 ms
  index lookup (disk, no verify) 3.564 ms
  index lookup (disk, verified)  7.788 ms
The "verified" version basically needs 2x time due to more memory lookups.
Unfortunately this means eventual lookup performance will be slower than
gdbm, but insertion is still much faster. And the index still has better
locking properties (lock-free read) that gdbm does not have.
With correct time complexity (no O(len(changelog)) index-only operations for
example), I'd expect it's rare for the overall performance to be bounded by
index performance. Data integrity is more important.
With a larger number of nodes, ex. 2M 20-byte strings: inserting to memory
takes 1.4 seconds, flushing to disk takes 0.9 seconds, looking up without
checksum takes 0.9 seconds, looking up with checksum takes 1.7 seconds.
Reviewed By: DurhamG
Differential Revision: D7440248
fbshipit-source-id: 020e5204606f9f0a4f68843a491009a6a6f75751
Summary:
This is in the critical path for lookup, and has a very visible performance
penalty:
  index insertion                3.923 ms
  index flush                    3.921 ms
  index lookup (memory)          1.070 ms
  index lookup (disk, no verify) 1.980 ms
  index lookup (disk, verified)  5.206 ms
Reviewed By: DurhamG
Differential Revision: D7440252
fbshipit-source-id: 49540f974faff1cdd0603a72328f141ccd054ee2
Summary:
Previously, the checksum was only for `MemRoot`. Now it's for all `Mem`
structs. Since `Mem*` structs are not frequently used in the normal lookup
code path, there is no visible performance change.
Reviewed By: DurhamG
Differential Revision: D7440253
fbshipit-source-id: 945f5a8c38d228f59190a487b0cf6dbc5daac4f7
Summary:
The type will be used all over the place and may make `rustfmt` wrap lines.
Use a shorter type to make it slightly cleaner.
Reviewed By: DurhamG
Differential Revision: D7436338
fbshipit-source-id: ecaada23916a22658f65669b748632a077e60df2
Summary:
This only affects `Index::open` right now. So it's a one time check and does
not affect performance.
Reviewed By: DurhamG
Differential Revision: D7436341
fbshipit-source-id: 30313064bf2ea50320ac744fc18c03bff4b12c89