103 Commits

Author SHA1 Message Date
Jun Wu
b3893b3d3c indexedlog: add methods on Log to do prefix lookups
Summary:
This exposes the underlying lookup functions from `Index`.

Alternatively we can allow access to `Index` and provide an `iter_started_from`
method on `Log` which takes a raw offset. I have been trying to avoid exposing
raw offsets in public interfaces, as they would change after `flush()` and cause
problems.

Reviewed By: markbt

Differential Revision: D13498303

fbshipit-source-id: 8b00a2a36a9383e3edb6fd7495a005bc985fd461
2018-12-20 15:50:55 -08:00
Jun Wu
3237b77e4c indexedlog: add APIs to lookup by prefix
Summary:
This is the missing API before `indexedlog::Index` can fit the
`changelog.partialmatch` case. It's actually more flexible, as it can provide
some example commit hashes, while the existing revlog.c or radixbuf
implementations just error out saying "ambiguous prefix".

It can also be "abused" for the semantics of sorted "sub-keys": by replacing
"key" with "key + subkey" when inserting into the index, looking up using "key"
returns a lazy result list (`PrefixIter`) sorted by "subkey". Note:
the radix tree is NOT efficient (in either time or space) when there are common
prefixes, so this use case needs care.
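
The "key + subkey" idea can be illustrated without the radix tree. The sketch below uses a standard `BTreeMap` as a stand-in for the index. That is an assumption made purely for illustration; it is not the `indexedlog::Index` API, and `insert_with_subkey` and `lookup_by_key` are hypothetical helper names.

```rust
use std::collections::BTreeMap;

/// Insert `value` under the concatenation of `key` and `subkey`.
/// Hypothetical helper. It assumes all keys share one fixed length, so that
/// one key is never a prefix of a different key.
pub fn insert_with_subkey(
    index: &mut BTreeMap<Vec<u8>, u64>,
    key: &[u8],
    subkey: &[u8],
    value: u64,
) {
    let mut full = key.to_vec();
    full.extend_from_slice(subkey);
    index.insert(full, value);
}

/// Return the values of all entries whose full key starts with `key`.
/// Because the map is ordered by full key, the results come back sorted
/// by subkey.
pub fn lookup_by_key(index: &BTreeMap<Vec<u8>, u64>, key: &[u8]) -> Vec<u64> {
    index
        .range(key.to_vec()..)
        .take_while(|(k, _)| k.starts_with(key))
        .map(|(_, v)| *v)
        .collect()
}
```

Inserting `("k1", "b")` and then `("k1", "a")` and looking up `"k1"` yields the value stored under subkey `"a"` first, regardless of insertion order.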

Reviewed By: markbt

Differential Revision: D13498301

fbshipit-source-id: 637856ebd761734d68b20c15866424b1d4518ad6
2018-12-20 15:50:55 -08:00
Jun Wu
562b7a1704 indexedlog: add a function to convert base16 to base256
Summary: This will be used in prefix lookups.
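
A minimal sketch of what such a conversion can look like. The function name, signature, and the way an odd trailing hex digit is reported are assumptions for illustration, not the code added by this commit.

```rust
/// Convert an ASCII hex string (base16) into bytes (base256).
/// Returns the complete bytes plus the trailing half-byte when the input has
/// an odd number of digits, which matters for hex prefixes of odd length.
/// Returns `None` if a non-hex character is found.
pub fn base16_to_base256(hex: &str) -> Option<(Vec<u8>, Option<u8>)> {
    let mut nibbles = Vec::with_capacity(hex.len());
    for c in hex.chars() {
        nibbles.push(c.to_digit(16)? as u8);
    }
    // Pack each pair of 4-bit digits into one byte, high digit first.
    let bytes = nibbles.chunks_exact(2).map(|p| (p[0] << 4) | p[1]).collect();
    // Whatever does not fill a whole byte is the leftover digit, if any.
    let rest = nibbles.chunks_exact(2).remainder().first().copied();
    Some((bytes, rest))
}
```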

Reviewed By: markbt

Differential Revision: D13498300

fbshipit-source-id: 3db7a21d6f35a18699d9dc3a0eca71a5410e0e61
2018-12-20 15:50:55 -08:00
Jun Wu
443a8f33b3 indexedlog: move binary indexedlog_dump out
Summary:
Keeping the binary in the same crate duplicates testing: `cargo test` tries to run tests on 2 entry points,
lib.rs and indexedlog_dump.rs. Move it to a separate crate to solve the issue.

Reviewed By: markbt

Differential Revision: D13498266

fbshipit-source-id: 8abf07c1272dfa825ec7701fd8ea9e0d1310ec5f
2018-12-18 08:17:21 -08:00
Jun Wu
61b1a5f475 indexedlog: fix rustc warnings
Summary: `write!` result needs to be used.

Reviewed By: markbt

Differential Revision: D13471967

fbshipit-source-id: d48752bcac05dd33b112679d7faf990eb8ddd651
2018-12-17 12:10:52 -08:00
Jun Wu
421c7b3f45 indexedlog: add a tool to dump indexedlog content
Summary: The tool can dump indexedlog content. Useful for manually investigating issues.

Reviewed By: DurhamG

Differential Revision: D13051387

fbshipit-source-id: 8687a1aa9dfb54776e80f184208c49da2492c34d
2018-12-06 14:57:52 -08:00
Jun Wu
54dc931140 indexedlog: use inlined leaf entries to further reduce index size
Summary:
Add a new entry type - INLINE_LEAF, which embeds the EXT_KEY and LINK entries
to save space.

The index size for referred keys is significantly reduced with little overhead:

  index insertion (owned key)     3.732 ms
  index insertion (referred key)  3.604 ms
  index flush                    11.868 ms
  index lookup (memory)           1.159 ms
  index lookup (disk, no verify)  2.175 ms
  index lookup (disk, verified)   4.303 ms
  index size (5M owned keys)     216626039
  index size (5M referred keys)   96616431
    11.87s user 2.96s system 98% cpu 15.107 total

The breakdown of the "5M referred keys" size is:

  type          count     bytes
  radixes       1729472   33835772
  inline_leafs  5000000   62780651

There are no other kinds of entries stored.

Previously, the index size of referred keys was:

  index size (5M referred keys)  136245815 bytes

So it's 136MB -> 96MB, a 29% decrease (about 40MB).

Reviewed By: DurhamG

Differential Revision: D13036801

fbshipit-source-id: 27e68e4b6c332c1dc419abc6aba69271952e4b3d
2018-12-06 14:57:52 -08:00
Jun Wu
a4958163ee indexedlog: optimize size of radix entries (BC)
Summary:
Replace the 20-byte "jump table" with 3-byte "flag + bitmap". This saves space
for indexes less than 4GB. There are some reserved bits in the "flag" so if we
run into space issues when indexes are larger than 4GB, we can try adding
6-byte integer, or VLQ back without breaking backwards-compatibility.

It seems to hurt flush performance a bit, because we have to scan the child
array twice. However, lookup (the most important performance) does not change
much. And the index is more compact.
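
One way a bitmap can stand in for a per-child jump table is sketched below. This is an assumption about the general technique (a 16-bit presence bitmap plus a compact array of existing children), not a description of the actual on-disk layout; `child_slot` is a hypothetical name.

```rust
/// Position of child `nibble` (0..16) inside a compact child array, given a
/// 16-bit bitmap in which bit `i` is set when child `i` exists.
/// Returns `None` when the child is absent.
pub fn child_slot(bitmap: u16, nibble: u8) -> Option<usize> {
    assert!(nibble < 16, "a radix-16 node has children 0..16");
    let bit = 1u16 << nibble;
    if bitmap & bit == 0 {
        return None;
    }
    // The slot is the number of existing children that sort before `nibble`.
    Some((bitmap & (bit - 1)).count_ones() as usize)
}
```

With this scheme only children that exist take space, and a lookup costs one mask and one population count instead of a table read.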

After:

  index flush                    19.644 ms
  index lookup (disk, no verify)  2.220 ms
  index lookup (disk, verified)   4.067 ms
  index size (5M owned keys)     216626039 bytes
  index size (5M referred keys)  136245815 bytes

Before:

  index flush                    16.764 ms
  index lookup (disk, no verify)  2.205 ms
  index lookup (disk, verified)   4.030 ms
  index size (5M owned keys)     240838647 bytes
  index size (5M referred keys)  160458423 bytes

For the "referred key" case, it's 160->136MB, a 15% decrease.

A detailed breakdown of the index components:

After:

  type       count     bytes (using owned keys)
  radixes    1729472   33835772
  links      5000000   27886336
  leafs      5000000   44629384
  keys       5000000  110000000

  type       count     bytes (using referred keys)
  radixes    1729472   33835772
  links      5000000   27886336
  leafs      5000000   44629384
  ext_keys   5000000   29894315

Before:

  type       count     bytes (using owned keys)
  radixes    1729472   58048380
  links      5000000   27886336
  leafs      5000000   44903923
  keys       5000000  110000000

  type       count     bytes (using referred keys)
  radixes    1729472   58048380
  links      5000000   27886336
  leafs      5000000   44629384
  ext_keys   5000000   29894315

Leaf nodes are taking too much space. It seems the next big optimization might
be inlining ext_keys into leafs.

Reviewed By: DurhamG, markbt

Differential Revision: D13028196

fbshipit-source-id: 6043b16fd67a497eb52d20a17e153fcba5cb3e81
2018-12-06 14:57:52 -08:00
Jun Wu
d8117b3b04 indexedlog: increase key count for size test
Summary:
Since the size test only runs once, we can use a larger number of keys. This is
closer to some production use-cases.

`cargo bench size` shows:

  index size (5M owned keys)     240838647
  index size (5M referred keys)  160458423

It currently uses 32 bytes per key for 5M referred keys.

Reviewed By: markbt

Differential Revision: D13027880

fbshipit-source-id: 726f5fb2da056e77ab93d82fda9f1afa500d0a8d
2018-12-06 14:57:52 -08:00
Jun Wu
55b6331aa4 indexedlog: add more benchmarks
Summary:
Add benchmarks about index sizes, and a benchmark of insertion using key
references.

An example `cargo bench` result running on my devserver looks like:

  index insertion (owned key)     3.551 ms
  index insertion (referred key)  3.713 ms
  index flush                    20.648 ms
  index lookup (memory)           1.087 ms
  index lookup (disk, no verify)  2.041 ms
  index lookup (disk, verified)   4.347 ms
  index size (owned key)            886010
  index size (referred key)         534298

Reviewed By: markbt

Differential Revision: D13027879

fbshipit-source-id: 70644c504026ffee2122d857d5035f5b7eea4f42
2018-12-06 14:57:52 -08:00
Jun Wu
d7129256d4 indexedlog: switch checksum table to little endian (BC)
Summary:
For checksum values like xxhash, there is no benefit to using big endian. Switch
to little endian so it's slightly faster on the major platforms we
care about.

This is a breaking change. However, the format is not used in production yet.
So there is no migration code.

Reviewed By: markbt

Differential Revision: D13015465

fbshipit-source-id: ca83d19b3328370d089b03a33e848e64b728ef2a
2018-12-06 14:57:52 -08:00
Jun Wu
75b4f92c44 indexedlog: support different checksum functions for Log entries (BC)
Summary:
Previously, the format of a Log entry was hard-coded: length, xxhash, and
content. The xxhash always took 8 bytes.

For small (ex. 40-byte) entries, xxhash32 is actually faster and takes less
disk space.

Introduce the "entry flags" concept so we can store some metadata about what
checksum function to use. The concept could be potentially used to support
other new format changes at per entry level in the future.

As we're here, also support data without checksums. That can be useful for
content with its own checksum, like a blob store with its own SHA1 integrity
check.

Performance-wise, log insertion is slower (but the majority of insertion overhead
would be on the index part), and iteration is a little bit faster, perhaps because
the log can use less data.

Before:

  log insertion                  15.874 ms
  log iteration (memory)          6.778 ms
  log iteration (disk)            6.830 ms

After:

  log insertion                  18.114 ms
  log iteration (memory)          6.403 ms
  log iteration (disk)            6.307 ms

Reviewed By: DurhamG, markbt

Differential Revision: D13051386

fbshipit-source-id: 629c251633ecf85058ee7c3ce7a9f576dfac7bdf
2018-12-06 14:57:52 -08:00
Jun Wu
049cd99f05 indexedlog: use non-VLQ encoding for xxhash (BC)
Summary:
Xxhash result won't usually have leading zeros. So VLQ encoding is not an
efficient choice. Use non-VLQ encoding instead.
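
To see why VLQ is a poor fit for hash values, consider a generic little-endian base-128 VLQ encoder. The sketch below is written for illustration only; the exact VLQ variant used by the crate is not shown in this log, and `encode_vlq` is a hypothetical name.

```rust
/// Encode `value` as an unsigned little-endian base-128 VLQ: 7 payload bits
/// per byte, with the high bit set on every byte except the last.
pub fn encode_vlq(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}
```

Small numbers such as entry lengths fit in one or two bytes, but a 64-bit hash with its top bits set needs 9 or 10 bytes, more than the 8 bytes of a fixed-width encoding such as `u64::to_le_bytes`.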

Performance-wise, this is noticeably faster than before:

  log insertion                  14.161 ms
  log insertion with index      102.724 ms
  log flush                      11.336 ms
  log iteration (memory)          6.351 ms
  log iteration (disk)            7.922 ms
    10.18s user 3.66s system 97% cpu 14.218 total
  log insertion                  13.377 ms
  log insertion with index       97.422 ms
  log flush                      11.792 ms
  log iteration (memory)          6.890 ms
  log iteration (disk)            7.139 ms
    10.20s user 3.56s system 97% cpu 14.117 total
  log insertion                  14.573 ms
  log insertion with index       94.216 ms
  log flush                      18.993 ms
  log iteration (memory)          7.867 ms
  log iteration (disk)            7.567 ms
    9.85s user 3.73s system 96% cpu 14.073 total
  log insertion                  15.526 ms
  log insertion with index       98.868 ms
  log flush                      19.600 ms
  log iteration (memory)          7.533 ms
  log iteration (disk)            7.150 ms
    10.13s user 4.02s system 96% cpu 14.647 total
  log insertion                  14.629 ms
  log insertion with index      100.449 ms
  log flush                      20.997 ms
  log iteration (memory)          7.299 ms
  log iteration (disk)            7.518 ms
    10.14s user 3.65s system 96% cpu 14.274 total

This is a format-breaking change. Fortunately we haven't really used the old
format in production yet.

Reviewed By: DurhamG, markbt

Differential Revision: D13015463

fbshipit-source-id: 6e7e4f7a845ea8dbf0904b3902740b65cc7467d5
2018-12-06 14:57:52 -08:00
Jun Wu
42c3ef6eb6 indexedlog: add benchmark for "log"
Summary:
Some simple benchmarks for "log". The initial results from running on my devserver
look like:

  log insertion                  33.146 ms
  log insertion with index      106.449 ms
  log flush                       9.623 ms
  log iteration (memory)         10.644 ms
  log iteration (disk)           11.517 ms
    13.75s user 3.61s system 97% cpu 17.778 total
  log insertion                  27.906 ms
  log insertion with index      107.683 ms
  log flush                      19.204 ms
  log iteration (memory)         10.239 ms
  log iteration (disk)           11.118 ms
    12.89s user 3.55s system 97% cpu 16.924 total
  log insertion                  31.645 ms
  log insertion with index      109.403 ms
  log flush                       9.416 ms
  log iteration (memory)         10.226 ms
  log iteration (disk)           10.757 ms
    13.07s user 3.02s system 97% cpu 16.423 total
  log insertion                  31.848 ms
  log insertion with index      109.332 ms
  log flush                      18.345 ms
  log iteration (memory)         10.709 ms
  log iteration (disk)           11.346 ms
    13.12s user 3.70s system 97% cpu 17.276 total
  log insertion                  29.665 ms
  log insertion with index      106.041 ms
  log flush                      16.159 ms
  log iteration (memory)         10.367 ms
  log iteration (disk)           11.110 ms
    12.99s user 3.27s system 97% cpu 16.717 total

Reviewed By: markbt

Differential Revision: D13015464

fbshipit-source-id: 035fee6c8b6d0bea4cfe194eed3d58ba4b5ebcb8
2018-12-06 14:57:52 -08:00
Durham Goode
1a3a0bcd72 nodemap: add key iteration
Summary:
An upcoming diff will need the ability to iterate over all the keys in
the store. So let's expose that functionality.

Reviewed By: quark-zju

Differential Revision: D13062575

fbshipit-source-id: a173fcdbbf44e2d3f09f7229266cca6f3e67944b
2018-12-06 11:47:41 -08:00
Durham Goode
668ba5165c indexedlog: add an iterator function for iterating over keys
Summary:
You can currently iterate over indexedlog entries, but there's no way to
iterate over the keys without keeping a copy of the index function with you.
Let's add a key iterator function.

Reviewed By: quark-zju

Differential Revision: D13010744

fbshipit-source-id: 1fcaf959ae82417e5cbafae7c1927c3ae8f8e76a
2018-12-06 11:47:41 -08:00
Haozhun Jin
461dabad96 bookmark: Turn BookmarkStore into indexed-log backed
Summary:
Turn BookmarkStore rust implementation into indexed-log backed.
Note that this no longer matches existing mercurial bookmark store
disk representation.

Reviewed By: DurhamG

Differential Revision: D13133605

fbshipit-source-id: 2e0a27738bcec607892b0edab6f759116929c8e1
2018-11-28 10:21:26 -08:00
Jun Wu
616306543b codemod: use explicit versions in Cargo.toml
Summary:
This is done by running `fix-code.py`. Note that those strings are
semvers, so they do not pin down the exact version. An API-compatible upgrade
is still possible.

Reviewed By: ikostia

Differential Revision: D10213073

fbshipit-source-id: 82f90766fb7e02cdeb6615ae3cb7212d928ed48d
2018-11-15 18:54:06 -08:00
Jun Wu
647f7dfb8e indexedlog: fix misc benchmark
Summary:
The "misc" benchmark requires the base16 module to be public. It was made
private in a previous change. Let's make it public again so the benchmark can
run.

Reviewed By: singhsrb

Differential Revision: D13015031

fbshipit-source-id: 0dc1542803aae290de26651e367898eebfc95e83
2018-11-09 20:49:56 -08:00
Jun Wu
61790b12a9 indexedlog: make it Send
Summary: It needs to be Send to be used in cpython.

Reviewed By: ikostia

Differential Revision: D10250289

fbshipit-source-id: ea57e356a0752764e50db9b6872b5cc4a456303f
2018-10-29 21:02:41 -07:00
Jun Wu
840d242822 indexedlog: revise docs for the index module
Summary:
Make it more detailed for public APIs. Hide overly detailed information (file
format).

Reviewed By: DurhamG

Differential Revision: D10250140

fbshipit-source-id: d9d9af9d67984b80f07db13e69bbffdf77e6a30e
2018-10-29 21:02:41 -07:00
Jun Wu
23e41f98a4 indexedlog: revise checksum_table documentation
Summary: Revise ChecksumTable documentation so it's more detailed and accurate.

Reviewed By: DurhamG

Differential Revision: D10250142

fbshipit-source-id: bff89877fb9a65a305e8d8636a200d50c7e2d548
2018-10-29 21:02:41 -07:00
Jun Wu
ecc14e0860 indexedlog: update public documentation for the log module
Summary:
The log module is the "entry point" of other features. Update it so things are
more detailed. I tried to make it more friendly for people without knowledge
about the implementation details.

This could probably be further improved by adding some examples. For now, I'm
focusing on the plain English parts.

To reviewers: Let me know how you feel reading it assuming no prior knowledge
with the implementation. Ways to make sentences shorter, natural to native
speakers without losing important information are also very welcome.

Reviewed By: DurhamG

Differential Revision: D10250141

fbshipit-source-id: 35258c7197c1ce0a1d3d0554fab2f2d2866e123c
2018-10-29 21:02:41 -07:00
Jun Wu
67ff256aa2 indexedlog: revise crate-level document and visibility of modules
Summary:
Make important modules public. Make internal utility (base16) private.  Add
some text to the crate-level document. It just refers to important structures.
Will revise document of those structures.

Reviewed By: DurhamG, kulshrax

Differential Revision: D10250143

fbshipit-source-id: c79859ee7d3d9cc4ee9a093ef5d12ec6599f2a42
2018-10-29 21:02:41 -07:00
Jun Wu
3adc813687 codemod: add copyright headers
Summary: This is just the result of running `./contrib/fix-code.py $(hg files .)`

Reviewed By: ikostia

Differential Revision: D10213075

fbshipit-source-id: 88577c9b9588a5b44fcf1fe6f0082815dfeb363a
2018-10-26 15:09:12 -07:00
Jun Wu
100c360e54 indexedlog: mark block as non-code
Summary:
The code block is not a valid Rust program. Mark it as "plain".
This fixes `cargo doc`.

Reviewed By: markbt

Differential Revision: D10137806

fbshipit-source-id: 1197d3a2ebc1450a0738686fa6cfa7c7b79dcb0d
2018-10-03 18:19:27 -07:00
Jun Wu
9e8f7613fb indexedlog: detect index corruption
Summary:
The primary log and indexes could be out of sync when mutating the indexes
errors out. In that case, mark the indexes as "corrupted" and refuse to
perform index read (lookup) operations, for correctness.

Reviewed By: DurhamG

Differential Revision: D8337689

fbshipit-source-id: 3db9006ea03cfcaba52391f189aa697944b616e5
2018-07-09 14:37:27 -07:00
Jun Wu
9714887f14 indexedlog: add a test about swapping indexes
Summary:
This demonstrates that the index definitions can be in a different order: as long
as their names do not change, things still work.

Reviewed By: DurhamG

Differential Revision: D8337688

fbshipit-source-id: 2fbbdf711d8edc10fc6d3314532390ea712aca6c
2018-07-09 14:37:26 -07:00
Jun Wu
fdcf835ec4 indexedlog: log: add a test about index lookup
Summary: The test tries to cover interesting variants.

Reviewed By: DurhamG

Differential Revision: D8156520

fbshipit-source-id: b739d1dfcecf8bfa5b23671a83c7f314a021007b
2018-07-09 14:37:26 -07:00
Jun Wu
7a5291ee43 indexedlog: log: add LogLookupIter.into_vec
Summary: This is handy to use.

Reviewed By: DurhamG

Differential Revision: D8156517

fbshipit-source-id: 63aa836bf469de2ad55237dea02b9d0ca28fa3ce
2018-07-09 14:37:26 -07:00
Jun Wu
ee638e6de4 indexedlog: log: implement flush
Summary: Completes the interface.

Reviewed By: DurhamG

Differential Revision: D8156511

fbshipit-source-id: 0d4d05aa23c47117da70ec47cf9be3d4fe41df7b
2018-07-09 14:37:26 -07:00
Jun Wu
119b479c9e indexedlog: log: implement index updating logic
Reviewed By: DurhamG

Differential Revision: D8156519

fbshipit-source-id: eb82e7547d10c7b839e757fa787f91950dea181e
2018-06-11 19:36:16 -07:00
Jun Wu
365c728134 indexedlog: index: add metadata to the root node
Summary:
This allows us to store arbitrary metadata in the root node. It will be used
by the `Log` structure to store how many bytes the index covers.

Reviewed By: DurhamG

Differential Revision: D8337687

fbshipit-source-id: 159a89d66765fc251a486fd62c1ffd01f625b503
2018-06-11 19:36:16 -07:00
Jun Wu
0b92632004 indexedlog: log: implement log loading functions
Summary: Implement the dependencies of the "open" public API.

Reviewed By: DurhamG

Differential Revision: D8156518

fbshipit-source-id: 9fed441f520a3b74cbef5bfb815c82943c615fdf
2018-06-11 19:36:16 -07:00
Jun Wu
77d75acbdd indexedlog: log: implement the iterators
Summary: Implement `LogLookupIter`, and `LogIter` for fetching data.

Reviewed By: DurhamG

Differential Revision: D8156521

fbshipit-source-id: 5ef2b2e6475d41ae7468e79b4a1234619decf75f
2018-06-11 19:36:15 -07:00
Jun Wu
8c3a69a56e indexedlog: log: implement internal read_entry function
Summary:
The read_entry function takes care of reading an entry from a given offset,
and returns internal stats like the real data offset (skipping the length and
checksum metadata), and the next entry offset.

It performs the integrity check and handles offsets for both in-memory and on-disk
buffers. The offsets to in-memory entries are fairly simple - they start
from "meta.primary_len" instead of a fixed reserved value. This makes the
"next_offset" work seamlessly.

The public API won't have "offset" exposed, so the API is private.

Reviewed By: DurhamG

Differential Revision: D8156513

fbshipit-source-id: 8661f2f2757de6f3f94defc64f4a8dd5261973b2
2018-06-11 19:36:15 -07:00
Jun Wu
991a9343b9 indexedlog: log: partially implement main APIs
Summary:
Partially implement open, append, flush, lookup APIs. This shows how things
work in general, like how locking works, what's in-memory and what's on-disk,
etc.

Reviewed By: DurhamG

Differential Revision: D8156514

fbshipit-source-id: 2de23dcde2f63895f3f3e4f67057aa9520fdfa34
2018-06-11 19:36:15 -07:00
Jun Wu
529c79bd33 indexedlog: log: implement serialization for the meta file
Summary: Implemented as the file format specification added by the previous diff.

Reviewed By: DurhamG

Differential Revision: D8156516

fbshipit-source-id: 7153932b9442b3ab5bdb81490f88c40346128afc
2018-06-11 19:36:15 -07:00
Jun Wu
97281caabf indexedlog: log: define public facing interface
Summary: The public interface and its dependencies.

Reviewed By: DurhamG

Differential Revision: D8156509

fbshipit-source-id: c6f3e4b88851683a5d8804b80f689282e3f582d4
2018-06-11 19:36:15 -07:00
Jun Wu
8ad9276975 indexedlog: log: add comments about the file format
Summary: Start implementing the "Log" object. Let's define the file formats first.

Reviewed By: DurhamG

Differential Revision: D8156515

fbshipit-source-id: 037f7454452959f82583a4d97d3f38dfa60aa741
2018-06-11 19:36:14 -07:00
Jun Wu
c65612acc9 indexedlog: index: stop iteration if an error is encountered
Summary:
Without this change, code doing `index.get(...).values().collect()` might
end up with an infinite loop.
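
The pattern can be sketched as a small iterator adapter. This is a generic illustration, not the code from this commit; `StopOnError` is a hypothetical name.

```rust
/// Wraps an iterator of `Result`s and ends the iteration right after the
/// first error has been yielded, so callers such as `collect()` terminate
/// even if the underlying source would keep failing forever.
pub struct StopOnError<I> {
    inner: I,
    errored: bool,
}

impl<I> StopOnError<I> {
    pub fn new(inner: I) -> Self {
        StopOnError { inner, errored: false }
    }
}

impl<I, T, E> Iterator for StopOnError<I>
where
    I: Iterator<Item = Result<T, E>>,
{
    type Item = Result<T, E>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.errored {
            return None;
        }
        let item = self.inner.next();
        // Remember the failure so the next call returns `None`.
        if matches!(item, Some(Err(_))) {
            self.errored = true;
        }
        item
    }
}
```

Collecting into a `Vec<Result<T, E>>` then yields the values up to and including the first error, while collecting into a `Result<Vec<T>, E>` short-circuits on it.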

Reviewed By: DurhamG

Differential Revision: D8156510

fbshipit-source-id: 5497aa354de7d49cfc4308a025856608ce981a1e
2018-06-05 00:12:29 -07:00
Jun Wu
798e55d53d indexedlog: index: change APIs to take file lengths instead of root offsets
Summary:
Previously, the index API optionally took a root offset. This is
inconvenient for the caller since they probably need to record both
the valid file length and the root offset. Since root nodes are always at
the end of the index, let's just simplify the API to take a logical
file length instead of a root offset.

Reviewed By: DurhamG

Differential Revision: D8156512

fbshipit-source-id: 7029272a61c9990e6484bca7ebbff64e2233c6cd
2018-06-05 00:12:29 -07:00
Jun Wu
68660cc443 indexedlog: utils: make mmap_readonly optionally take file length
Summary:
Previously, `mmap_readonly` always read the file length, and used that as the mmap
length. In many cases we do know the desired file length and it's cleaner to
not `mmap` unused bytes. So let's add a parameter to do that.

Note: The `stat` call is still needed, since `mmap` wouldn't return an error
if the requested length is greater than the file length.
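
The length check described in the note can be sketched with the standard library alone. This is a hedged illustration of the check only (no actual `mmap` call); `checked_map_len` is a hypothetical helper, not the signature of `mmap_readonly`.

```rust
use std::fs::File;
use std::io;

/// Decide how many bytes to map. `requested` is the caller-supplied length,
/// if any. The file is still stat-ed so that a request larger than the file
/// is rejected up front instead of failing later on access.
pub fn checked_map_len(file: &File, requested: Option<u64>) -> io::Result<u64> {
    let actual = file.metadata()?.len();
    match requested {
        Some(len) if len > actual => Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "requested length exceeds file length",
        )),
        Some(len) => Ok(len),
        None => Ok(actual),
    }
}
```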

Reviewed By: DurhamG

Differential Revision: D8156523

fbshipit-source-id: 991aa28f3542eaff24387dcc6a7302122fb6962f
2018-06-05 00:12:29 -07:00
Jun Wu
c43312ad9c indexedlog: utils: move xxhash to utils
Summary: The function will be reused in another module.

Reviewed By: DurhamG

Differential Revision: D8156522

fbshipit-source-id: 2aff6f2e4b8fc9b5d2c000e12ac2d940f7fab407
2018-06-05 00:12:29 -07:00
Jun Wu
7b9867ac12 crates: pin rand to 0.4 version
Summary:
`rand` 0.5 has so many breaking changes that the code is not ready to
migrate yet. So let's pin rand to 0.4. Ideally all dependencies in
Cargo.toml should avoid using "*". But for now `rand` is the only
troublemaker.

Note `rand 0.4` is a dependency of `quickcheck 0.6.2` so it's available.

Reviewed By: phillco, singhsrb

Differential Revision: D8158406

fbshipit-source-id: 417ae6807a2efc650acb8d82370964fab6531fdb
2018-05-25 09:51:19 -07:00
Jun Wu
40a88364be indexedlog: replace div with shr to make checksum faster
Summary:
Spotted `div` slowness using Linux's `perf` tool.

        |    Disassembly of section .text:
        |
        |    0000000000018990 <indexedlog::checksum_table::ChecksumTable::check_range>:
        |    _ZN10indexedlog14checksum_table13ChecksumTable11check_range17h2303c96b1e035e20E():
   1.36 |      push   %rax
   0.18 |      mov    %rdx,%r8
   0.54 |      mov    $0x1,%cl
        |      test   %r8,%r8
        |      je     60
   0.54 |      add    %rsi,%r8
   0.72 |      cmp    0x30(%rdi),%r8
        |      ja     64
   0.27 |      mov    0x28(%rdi),%r9
   0.27 |      test   %r9,%r9
        |      je     6a
   0.36 |      add    $0xffffffffffffffff,%r8
   0.18 |      xor    %edx,%edx
   0.45 |      mov    %rsi,%rax
   0.36 |      div    %r9
  43.72 |      mov    %rax,%rsi
        |      xor    %edx,%edx
        |      mov    %r8,%rax
   0.18 |      div    %r9
  42.82 |      add    $0x1,%rax
   0.09 |      cmp    %rax,%rsi
        |      jae    60
   2.17 |      cmpq   $0x0,0x60(%rdi)
        |      je     78
        |      mov    0x50(%rdi),%rcx
        |      cmpb   $0x0,(%rcx)
   1.63 |      sete   %cl
   0.18 |      xchg   %ax,%ax
        |50:   test   $0x1,%cl
        |      je     64
   0.45 |      add    $0x1,%rsi
   0.81 |      mov    $0x1,%cl
   0.09 |      cmp    %rax,%rsi
        |      jb     50
        |60:   mov    %ecx,%eax
        |      pop    %rcx
   2.62 |      retq
        |64:   xor    %ecx,%ecx
        |      mov    %ecx,%eax
        |      pop    %rcx
        |      retq
        |6a:   lea    panic_loc.a.llvm.9800112514578621117,%rdi
        |      callq  core::panicking::panic
        |      ud2
        |78:   lea    panic_bounds_check_loc.7.llvm.9800112514578621117,%rdi
        |      xor    %esi,%esi
        |      xor    %edx,%edx
        |      callq  core::panicking::panic_bounds_check
        |      ud2

Change `chunk_size` to `chunk_size_log`. Replace `div` with `shr` to make it
significantly faster:

Before:

  index lookup (memory)           1.118 ms
  index lookup (disk, no verify)  2.078 ms
  index lookup (disk, verified)   7.687 ms

After:

  index lookup (memory)           1.066 ms
  index lookup (disk, no verify)  1.992 ms
  index lookup (disk, verified)   3.591 ms
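
The idea behind the change can be sketched as follows, assuming chunk sizes are restricted to powers of two so that only the exponent needs to be stored. The function name and return shape are hypothetical, not the actual `check_range` code.

```rust
/// Indexes of the first and last chunks touched by the byte range
/// `offset..offset + length`, where each chunk is `1 << chunk_size_log`
/// bytes. Shifting right by the exponent gives the same result as dividing
/// by the chunk size, without a `div` instruction.
pub fn chunk_range(offset: u64, length: u64, chunk_size_log: u32) -> Option<(u64, u64)> {
    if length == 0 {
        return None;
    }
    let start = offset >> chunk_size_log;
    let end = (offset + length - 1) >> chunk_size_log;
    Some((start, end))
}
```

The trade-off is that arbitrary chunk sizes are no longer representable, which is usually acceptable for a checksum table.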

Reviewed By: DurhamG, markbt

Differential Revision: D7554992

fbshipit-source-id: c24189ced722d880af6ca0d64967eb762363d9e3
2018-04-17 18:54:39 -07:00
Jun Wu
f25c152d01 indexedlog: add a test about checksum
Summary:
Add a test that bitflips the index content, and makes sure reading the index
triggers an error.

Due to run-time performance difference, the release version tests 2-byte key
while the debug version only tests 1-byte key.

The header byte was not verified. Now it is verified.

Reviewed By: DurhamG

Differential Revision: D7517134

fbshipit-source-id: b3d8665ff4ac08c1a70db8d21122ba241913a2ed
2018-04-17 18:54:39 -07:00
Jun Wu
9ce455769c indexedlog: avoid writing unused entries due to leaf split
Summary:
In "split_leaf" "Example 3" case, the old leaf entry (and its key) becomes
unused. Writing them to disk is unnecessary. This patch adds "unused" marker
so they could be marked and skipped inside flush().

No visible performance change:

  index insertion                 3.710 ms
  index flush                     3.717 ms
  index lookup (memory)           1.128 ms
  index lookup (disk, no verify)  1.993 ms
  index lookup (disk, verified)   7.866 ms

Reviewed By: DurhamG

Differential Revision: D7517139

fbshipit-source-id: 253c878bc4b3762382c424777dfa779b3868e851
2018-04-17 18:54:38 -07:00
Jun Wu
ac52e4a6fb indexedlog: add a test against std hashmap for multi-values
Summary: Since we now have the ability to store multiple values, add a test.

Reviewed By: DurhamG

Differential Revision: D7472880

fbshipit-source-id: 85b1c69245ac7f0c4702daf22a02f5e5072f0924
2018-04-13 21:51:46 -07:00
Jun Wu
de74642bc7 indexedlog: implement value iterator
Summary:
The value type is a linked list of u64 integers. Add an API to expose that.

Using the iterator framework gives flexibility: the caller can easily
take the first value, convert the values to a vector, count them, etc.

Reviewed By: DurhamG

Differential Revision: D7472881

fbshipit-source-id: d31e81770e069734b54fa08729c0cd45a699aae2
2018-04-13 21:51:46 -07:00