sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-12 17:58:27 +03:00

Author	SHA1	Message	Date
Jun Wu	b3893b3d3c	indexedlog: add methods on Log to do prefix lookups Summary: This exposes the underlying lookup functions from `Index`. Alternatively we can allow access to `Index` and provide an `iter_started_from` method on `Log` which takes a raw offset. I have been trying to avoid exposing raw offsets in public interfaces, as they would change after `flush()` and cause problems. Reviewed By: markbt Differential Revision: D13498303 fbshipit-source-id: 8b00a2a36a9383e3edb6fd7495a005bc985fd461	2018-12-20 15:50:55 -08:00
Jun Wu	3237b77e4c	indexedlog: add APIs to lookup by prefix Summary: This is the missing API before `indexedlog::Index` can fit in the `changelog.partialmatch` case. It's actually more flexible as it can provide some example commit hashes while the existing revlog.c or radixbuf implementations just error out saying "ambiguous prefix". It can also be "abused" for the semantics of sorted "sub-keys": by replacing "key" with "key + subkey" when inserting into the index, looking up using "key" would return a lazy result list (`PrefixIter`) sorted by "subkey". Note: the radix tree is NOT efficient (both in time and space) when there are common prefixes. So this use-case needs care. Reviewed By: markbt Differential Revision: D13498301 fbshipit-source-id: 637856ebd761734d68b20c15866424b1d4518ad6	2018-12-20 15:50:55 -08:00
Jun Wu	562b7a1704	indexedlog: add a function to convert base16 to base256 Summary: This will be used in prefix lookups. Reviewed By: markbt Differential Revision: D13498300 fbshipit-source-id: 3db7a21d6f35a18699d9dc3a0eca71a5410e0e61	2018-12-20 15:50:55 -08:00
Jun Wu	443a8f33b3	indexedlog: move binary indexedlog_dump out Summary: It makes testing duplicated - now `cargo test` would try running tests on 2 entry points: lib.rs and indexedlog_dump.rs. Move it to a separate crate to solve the issue. Reviewed By: markbt Differential Revision: D13498266 fbshipit-source-id: 8abf07c1272dfa825ec7701fd8ea9e0d1310ec5f	2018-12-18 08:17:21 -08:00
Jun Wu	61b1a5f475	indexedlog: fix rustc warnings Summary: `write!` result needs to be used. Reviewed By: markbt Differential Revision: D13471967 fbshipit-source-id: d48752bcac05dd33b112679d7faf990eb8ddd651	2018-12-17 12:10:52 -08:00
Jun Wu	421c7b3f45	indexedlog: add a tool to dump indexedlog content Summary: The tool can dump indexedlog content. Useful for manually investigating issues. Reviewed By: DurhamG Differential Revision: D13051387 fbshipit-source-id: 8687a1aa9dfb54776e80f184208c49da2492c34d	2018-12-06 14:57:52 -08:00
Jun Wu	54dc931140	indexedlog: use inlined leaf entries to further reduce index size Summary: Add a new entry type - INLINE_LEAF, which embeds the EXT_KEY and LINK entries to save space. The index size for referred keys is significantly reduced with little overhead: index insertion (owned key) 3.732 ms index insertion (referred key) 3.604 ms index flush 11.868 ms index lookup (memory) 1.159 ms index lookup (disk, no verify) 2.175 ms index lookup (disk, verified) 4.303 ms index size (5M owned keys) 216626039 index size (5M referred keys) 96616431 11.87s user 2.96s system 98% cpu 15.107 total The breakdown of the "5M referred keys" size is: type count bytes radixes 1729472 33835772 inline_leafs 5000000 62780651 There are no other kinds of entries stored. Previously, the index size of referred keys was: index size (5M referred keys) 136245815 bytes So it's 136MB -> 96MB, a 40MB (about 29%) decrease. Reviewed By: DurhamG Differential Revision: D13036801 fbshipit-source-id: 27e68e4b6c332c1dc419abc6aba69271952e4b3d	2018-12-06 14:57:52 -08:00
Jun Wu	a4958163ee	indexedlog: optimize size of radix entries (BC) Summary: Replace the 20-byte "jump table" with 3-byte "flag + bitmap". This saves space for indexes less than 4GB. There are some reserved bits in the "flag" so if we run into space issues when indexes are larger than 4GB, we can try adding 6-byte integer, or VLQ back without breaking backwards-compatibility. It seems to hurt flush performance a bit, because we have to scan the child array twice. However, lookup (the most important performance) does not change much. And the index is more compact. After: index flush 19.644 ms index lookup (disk, no verify) 2.220 ms index lookup (disk, verified) 4.067 ms index size (5M owned keys) 216626039 bytes index size (5M referred keys) 136245815 bytes Before: index flush 16.764 ms index lookup (disk, no verify) 2.205 ms index lookup (disk, verified) 4.030 ms index size (5M owned keys) 240838647 bytes index size (5M referred keys) 160458423 bytes For the "referred key" case, it's 160->136MB, 17% decrease. A detailed break down of components of index is: After: type count bytes (using owned keys) radixes 1729472 33835772 links 5000000 27886336 leafs 5000000 44629384 keys 5000000 110000000 type count bytes (using referred keys) radixes 1729472 33835772 links 5000000 27886336 leafs 5000000 44629384 ext_keys 5000000 29894315 Before: type count bytes (using owned keys) radixes 1729472 58048380 links 5000000 27886336 leafs 5000000 44903923 keys 5000000 110000000 type count bytes (using referred keys) radixes 1729472 58048380 links 5000000 27886336 leafs 5000000 44629384 ext_keys 5000000 29894315 Leaf nodes are taking too much space. It seems the next big optimization might be inlining ext_keys into leafs. Reviewed By: DurhamG, markbt Differential Revision: D13028196 fbshipit-source-id: 6043b16fd67a497eb52d20a17e153fcba5cb3e81	2018-12-06 14:57:52 -08:00
Jun Wu	d8117b3b04	indexedlog: increase key count for size test Summary: Since the size test only runs once, we can use a larger number of keys. This is closer to some production use-cases. `cargo bench size` shows: index size (5M owned keys) 240838647 index size (5M referred keys) 160458423 It currently uses 32 bytes per key for 5M referred keys. Reviewed By: markbt Differential Revision: D13027880 fbshipit-source-id: 726f5fb2da056e77ab93d82fda9f1afa500d0a8d	2018-12-06 14:57:52 -08:00
Jun Wu	55b6331aa4	indexedlog: add more benchmarks Summary: Add benchmarks about index sizes, and a benchmark of insertion using key references. An example `cargo bench` result running on my devserver looks like: index insertion (owned key) 3.551 ms index insertion (referred key) 3.713 ms index flush 20.648 ms index lookup (memory) 1.087 ms index lookup (disk, no verify) 2.041 ms index lookup (disk, verified) 4.347 ms index size (owned key) 886010 index size (referred key) 534298 Reviewed By: markbt Differential Revision: D13027879 fbshipit-source-id: 70644c504026ffee2122d857d5035f5b7eea4f42	2018-12-06 14:57:52 -08:00
Jun Wu	d7129256d4	indexedlog: switch checksum table to little endian (BC) Summary: For checksum values like xxhash, there is no benefit using big endian. Switch to little endian so it's slightly faster on the major platforms we care about. This is a breaking change. However, the format is not used in production yet. So there is no migration code. Reviewed By: markbt Differential Revision: D13015465 fbshipit-source-id: ca83d19b3328370d089b03a33e848e64b728ef2a	2018-12-06 14:57:52 -08:00
Jun Wu	75b4f92c44	indexedlog: support different checksum functions for Log entries (BC) Summary: Previously, the format of a Log entry was hard-coded - length, xxhash, and content. The xxhash always takes 8 bytes. For small (ex. 40-byte) entries, xxhash32 is actually faster and takes less disk space. Introduce the "entry flags" concept so we can store some metadata about what checksum function to use. The concept could be potentially used to support other new format changes at per entry level in the future. As we're here, also support data without checksums. That can be useful for content with its own checksum, like a blob store with its own SHA1 integrity check. Performance-wise, log insertion is slower (but the majority of the insertion overhead would be on the index part), iteration is a little bit faster, perhaps because the log can use less data. Before: log insertion 15.874 ms log iteration (memory) 6.778 ms log iteration (disk) 6.830 ms After: log insertion 18.114 ms log iteration (memory) 6.403 ms log iteration (disk) 6.307 ms Reviewed By: DurhamG, markbt Differential Revision: D13051386 fbshipit-source-id: 629c251633ecf85058ee7c3ce7a9f576dfac7bdf	2018-12-06 14:57:52 -08:00
Jun Wu	049cd99f05	indexedlog: use non-VLQ encoding for xxhash (BC) Summary: Xxhash result won't usually have leading zeros. So VLQ encoding is not an efficient choice. Use non-VLQ encoding instead. Performance-wise, this is noticeably faster than before: log insertion 14.161 ms log insertion with index 102.724 ms log flush 11.336 ms log iteration (memory) 6.351 ms log iteration (disk) 7.922 ms 10.18s user 3.66s system 97% cpu 14.218 total log insertion 13.377 ms log insertion with index 97.422 ms log flush 11.792 ms log iteration (memory) 6.890 ms log iteration (disk) 7.139 ms 10.20s user 3.56s system 97% cpu 14.117 total log insertion 14.573 ms log insertion with index 94.216 ms log flush 18.993 ms log iteration (memory) 7.867 ms log iteration (disk) 7.567 ms 9.85s user 3.73s system 96% cpu 14.073 total log insertion 15.526 ms log insertion with index 98.868 ms log flush 19.600 ms log iteration (memory) 7.533 ms log iteration (disk) 7.150 ms 10.13s user 4.02s system 96% cpu 14.647 total log insertion 14.629 ms log insertion with index 100.449 ms log flush 20.997 ms log iteration (memory) 7.299 ms log iteration (disk) 7.518 ms 10.14s user 3.65s system 96% cpu 14.274 total This is a format-breaking change. Fortunately we haven't really used the old format in production yet. Reviewed By: DurhamG, markbt Differential Revision: D13015463 fbshipit-source-id: 6e7e4f7a845ea8dbf0904b3902740b65cc7467d5	2018-12-06 14:57:52 -08:00
Jun Wu	42c3ef6eb6	indexedlog: add benchmark for "log" Summary: Some simple benchmark for "log". The initial result running from my devserver looks like: log insertion 33.146 ms log insertion with index 106.449 ms log flush 9.623 ms log iteration (memory) 10.644 ms log iteration (disk) 11.517 ms 13.75s user 3.61s system 97% cpu 17.778 total log insertion 27.906 ms log insertion with index 107.683 ms log flush 19.204 ms log iteration (memory) 10.239 ms log iteration (disk) 11.118 ms 12.89s user 3.55s system 97% cpu 16.924 total log insertion 31.645 ms log insertion with index 109.403 ms log flush 9.416 ms log iteration (memory) 10.226 ms log iteration (disk) 10.757 ms 13.07s user 3.02s system 97% cpu 16.423 total log insertion 31.848 ms log insertion with index 109.332 ms log flush 18.345 ms log iteration (memory) 10.709 ms log iteration (disk) 11.346 ms 13.12s user 3.70s system 97% cpu 17.276 total log insertion 29.665 ms log insertion with index 106.041 ms log flush 16.159 ms log iteration (memory) 10.367 ms log iteration (disk) 11.110 ms 12.99s user 3.27s system 97% cpu 16.717 total Reviewed By: markbt Differential Revision: D13015464 fbshipit-source-id: 035fee6c8b6d0bea4cfe194eed3d58ba4b5ebcb8	2018-12-06 14:57:52 -08:00
Durham Goode	1a3a0bcd72	nodemap: add key iteration Summary: An upcoming diff will need the ability to iterate over all the keys in the store. So let's expose that functionality. Reviewed By: quark-zju Differential Revision: D13062575 fbshipit-source-id: a173fcdbbf44e2d3f09f7229266cca6f3e67944b	2018-12-06 11:47:41 -08:00
Durham Goode	668ba5165c	indexedlog: add an iterator function for iterating over keys Summary: You can currently iterate over indexedlog entries, but there's no way to iterate over the keys without keeping a copy of the index function with you. Let's add a key iterator function. Reviewed By: quark-zju Differential Revision: D13010744 fbshipit-source-id: 1fcaf959ae82417e5cbafae7c1927c3ae8f8e76a	2018-12-06 11:47:41 -08:00
Haozhun Jin	461dabad96	bookmark: Turn BookmarkStore into indexed-log backed Summary: Turn BookmarkStore rust implementation into indexed-log backed. Note that this no longer matches existing mercurial bookmark store disk representation. Reviewed By: DurhamG Differential Revision: D13133605 fbshipit-source-id: 2e0a27738bcec607892b0edab6f759116929c8e1	2018-11-28 10:21:26 -08:00
Jun Wu	616306543b	codemod: use explicit versions in Cargo.toml Summary: This is done by running `fix-code.py`. Note that those strings are semvers so they do not pin down the exact version. An API-compatible upgrade is still possible. Reviewed By: ikostia Differential Revision: D10213073 fbshipit-source-id: 82f90766fb7e02cdeb6615ae3cb7212d928ed48d	2018-11-15 18:54:06 -08:00
Jun Wu	647f7dfb8e	indexedlog: fix misc benchmark Summary: The "misc" benchmark requires the base16 module to be public. It was made private in a previous change. Let's make it public again so the benchmark can run. Reviewed By: singhsrb Differential Revision: D13015031 fbshipit-source-id: 0dc1542803aae290de26651e367898eebfc95e83	2018-11-09 20:49:56 -08:00
Jun Wu	61790b12a9	indexedlog: make it Send Summary: It needs to be Send to be used in cpython. Reviewed By: ikostia Differential Revision: D10250289 fbshipit-source-id: ea57e356a0752764e50db9b6872b5cc4a456303f	2018-10-29 21:02:41 -07:00
Jun Wu	840d242822	indexedlog: revise docs for the index module Summary: Make it more detailed for public APIs. Hide too detailed information (file format). Reviewed By: DurhamG Differential Revision: D10250140 fbshipit-source-id: d9d9af9d67984b80f07db13e69bbffdf77e6a30e	2018-10-29 21:02:41 -07:00
Jun Wu	23e41f98a4	indexedlog: revise checksum_table documentation Summary: Revise ChecksumTable documentation so it's more detailed and accurate. Reviewed By: DurhamG Differential Revision: D10250142 fbshipit-source-id: bff89877fb9a65a305e8d8636a200d50c7e2d548	2018-10-29 21:02:41 -07:00
Jun Wu	ecc14e0860	indexedlog: update public documentation for the log module Summary: The log module is the "entry point" of other features. Update it so things are more detailed. I tried to make it more friendly for people without knowledge about the implementation details. This could probably be further improved by adding some examples. For now, I'm focusing on the plain English parts. To reviewers: Let me know how you feel reading it assuming no prior knowledge with the implementation. Ways to make sentences shorter, natural to native speakers without losing important information are also very welcome. Reviewed By: DurhamG Differential Revision: D10250141 fbshipit-source-id: 35258c7197c1ce0a1d3d0554fab2f2d2866e123c	2018-10-29 21:02:41 -07:00
Jun Wu	67ff256aa2	indexedlog: revise crate-level document and visibility of modules Summary: Make important modules public. Make internal utility (base16) private. Add some text to the crate-level document. It just refers to important structures. Will revise document of those structures. Reviewed By: DurhamG, kulshrax Differential Revision: D10250143 fbshipit-source-id: c79859ee7d3d9cc4ee9a093ef5d12ec6599f2a42	2018-10-29 21:02:41 -07:00
Jun Wu	3adc813687	codemod: add copyright headers Summary: This is just the result of running `./contrib/fix-code.py $(hg files .)` Reviewed By: ikostia Differential Revision: D10213075 fbshipit-source-id: 88577c9b9588a5b44fcf1fe6f0082815dfeb363a	2018-10-26 15:09:12 -07:00
Jun Wu	100c360e54	indexedlog: mark block as non-code Summary: The code block is not a valid Rust program. Mark it as "plain". This fixes `cargo doc`. Reviewed By: markbt Differential Revision: D10137806 fbshipit-source-id: 1197d3a2ebc1450a0738686fa6cfa7c7b79dcb0d	2018-10-03 18:19:27 -07:00
Jun Wu	9e8f7613fb	indexedlog: detect index corruption Summary: The primary log and indexes could be out of sync when mutating the indexes errors out. In that case, mark the indexes as "corrupted" and refuse to perform index read (lookup) operations, for correctness. Reviewed By: DurhamG Differential Revision: D8337689 fbshipit-source-id: 3db9006ea03cfcaba52391f189aa697944b616e5	2018-07-09 14:37:27 -07:00
Jun Wu	9714887f14	indexedlog: add a test about swapping indexes Summary: This demonstrates the index definitions can have different orders, as long as their names do not change, things still work. Reviewed By: DurhamG Differential Revision: D8337688 fbshipit-source-id: 2fbbdf711d8edc10fc6d3314532390ea712aca6c	2018-07-09 14:37:26 -07:00
Jun Wu	fdcf835ec4	indexedlog: log: add a test about index lookup Summary: The test tries to cover interesting variants. Reviewed By: DurhamG Differential Revision: D8156520 fbshipit-source-id: b739d1dfcecf8bfa5b23671a83c7f314a021007b	2018-07-09 14:37:26 -07:00
Jun Wu	7a5291ee43	indexedlog: log: add LogLookupIter.into_vec Summary: This is handy to use. Reviewed By: DurhamG Differential Revision: D8156517 fbshipit-source-id: 63aa836bf469de2ad55237dea02b9d0ca28fa3ce	2018-07-09 14:37:26 -07:00
Jun Wu	ee638e6de4	indexedlog: log: implement flush Summary: Completes the interface. Reviewed By: DurhamG Differential Revision: D8156511 fbshipit-source-id: 0d4d05aa23c47117da70ec47cf9be3d4fe41df7b	2018-07-09 14:37:26 -07:00
Jun Wu	119b479c9e	indexedlog: log: implement index updating logic Reviewed By: DurhamG Differential Revision: D8156519 fbshipit-source-id: eb82e7547d10c7b839e757fa787f91950dea181e	2018-06-11 19:36:16 -07:00
Jun Wu	365c728134	indexedlog: index: add metadata to the root node Summary: This allows us to store arbitrary metadata in the root node. It will be used by the `Log` structure to store how many bytes the index covers. Reviewed By: DurhamG Differential Revision: D8337687 fbshipit-source-id: 159a89d66765fc251a486fd62c1ffd01f625b503	2018-06-11 19:36:16 -07:00
Jun Wu	0b92632004	indexedlog: log: implement log loading functions Summary: Implement the dependencies of the "open" public API. Reviewed By: DurhamG Differential Revision: D8156518 fbshipit-source-id: 9fed441f520a3b74cbef5bfb815c82943c615fdf	2018-06-11 19:36:16 -07:00
Jun Wu	77d75acbdd	indexedlog: log: implement the iterators Summary: Implement `LogLookupIter`, and `LogIter` for fetching data. Reviewed By: DurhamG Differential Revision: D8156521 fbshipit-source-id: 5ef2b2e6475d41ae7468e79b4a1234619decf75f	2018-06-11 19:36:15 -07:00
Jun Wu	8c3a69a56e	indexedlog: log: implement internal read_entry function Summary: The read_entry function takes care of reading an entry from a given offset, and returns internal stats like the real data offset (skipping the length and checksum metadata) and the next entry offset. It does integrity checks and handles offsets for both in-memory and on-disk buffers. The offsets to in-memory entries are fairly simple - they start from "meta.primary_len" instead of a fixed reserved value. This makes the "next_offset" work seamlessly. The public API won't have "offset" exposed, so the API is private. Reviewed By: DurhamG Differential Revision: D8156513 fbshipit-source-id: 8661f2f2757de6f3f94defc64f4a8dd5261973b2	2018-06-11 19:36:15 -07:00
Jun Wu	991a9343b9	indexedlog: log: partially implement main APIs Summary: Partially implement open, append, flush, lookup APIs. This shows how things work in general, like how locking works. What's in-memory and what's on-disk etc. Reviewed By: DurhamG Differential Revision: D8156514 fbshipit-source-id: 2de23dcde2f63895f3f3e4f67057aa9520fdfa34	2018-06-11 19:36:15 -07:00
Jun Wu	529c79bd33	indexedlog: log: implement serialization for the meta file Summary: Implemented as the file format specification added by the previous diff. Reviewed By: DurhamG Differential Revision: D8156516 fbshipit-source-id: 7153932b9442b3ab5bdb81490f88c40346128afc	2018-06-11 19:36:15 -07:00
Jun Wu	97281caabf	indexedlog: log: define public facing interface Summary: The public interface and its dependencies. Reviewed By: DurhamG Differential Revision: D8156509 fbshipit-source-id: c6f3e4b88851683a5d8804b80f689282e3f582d4	2018-06-11 19:36:15 -07:00
Jun Wu	8ad9276975	indexedlog: log: add comments about the file format Summary: Start implementing the "Log" object. Let's define the file formats first. Reviewed By: DurhamG Differential Revision: D8156515 fbshipit-source-id: 037f7454452959f82583a4d97d3f38dfa60aa741	2018-06-11 19:36:14 -07:00
Jun Wu	c65612acc9	indexedlog: index: stop iteration if an error is encountered Summary: Without this change, code doing `index.get(...).values().collect()` might end up with an infinite loop. Reviewed By: DurhamG Differential Revision: D8156510 fbshipit-source-id: 5497aa354de7d49cfc4308a025856608ce981a1e	2018-06-05 00:12:29 -07:00
Jun Wu	798e55d53d	indexedlog: index: change APIs to take file lengths instead of root offsets Summary: Previously, the index API optionally took a root offset. This is inconvenient for the caller since they probably need to record both the valid file length and the root offset. Since root nodes are always at the end of the index, let's just simplify the API to take a logical file length instead of a root offset. Reviewed By: DurhamG Differential Revision: D8156512 fbshipit-source-id: 7029272a61c9990e6484bca7ebbff64e2233c6cd	2018-06-05 00:12:29 -07:00
Jun Wu	68660cc443	indexedlog: utils: make `mmap_readonly` optionally take file length Summary: Previously, `mmap_readonly` always read the file length and used that for the mmap length. In many cases we do know the desired file length and it's cleaner to not `mmap` unused bytes. So let's add a parameter to do that. Note: The `stat` call is still needed, since `mmap` wouldn't return an error if the requested length is greater than the file length. Reviewed By: DurhamG Differential Revision: D8156523 fbshipit-source-id: 991aa28f3542eaff24387dcc6a7302122fb6962f	2018-06-05 00:12:29 -07:00
Jun Wu	c43312ad9c	indexedlog: utils: move xxhash to utils Summary: The function will be reused in another module. Reviewed By: DurhamG Differential Revision: D8156522 fbshipit-source-id: 2aff6f2e4b8fc9b5d2c000e12ac2d940f7fab407	2018-06-05 00:12:29 -07:00
Jun Wu	7b9867ac12	crates: pin rand to 0.4 version Summary: `rand` 0.5 has so many breaking changes that the code is not ready to migrate yet. So let's pin rand to 0.4. Ideally all dependencies in Cargo.toml should avoid using "*". But for now `rand` is the only troublemaker. Note `rand 0.4` is a dependency of `quickcheck 0.6.2` so it's available. Reviewed By: phillco, singhsrb Differential Revision: D8158406 fbshipit-source-id: 417ae6807a2efc650acb8d82370964fab6531fdb	2018-05-25 09:51:19 -07:00
Jun Wu	40a88364be	indexedlog: replace `div` with `shr` to make checksum faster Summary: Spot `div` slowness using Linux's `perf` tool. \| Disassembly of section .text: \| \| 0000000000018990 <indexedlog::checksum_table::ChecksumTable::check_range>: \| _ZN10indexedlog14checksum_table13ChecksumTable11check_range17h2303c96b1e035e20E(): 1.36 \| push %rax 0.18 \| mov %rdx,%r8 0.54 \| mov $0x1,%cl \| test %r8,%r8 \| je 60 0.54 \| add %rsi,%r8 0.72 \| cmp 0x30(%rdi),%r8 \| ja 64 0.27 \| mov 0x28(%rdi),%r9 0.27 \| test %r9,%r9 \| je 6a 0.36 \| add $0xffffffffffffffff,%r8 0.18 \| xor %edx,%edx 0.45 \| mov %rsi,%rax 0.36 \| div %r9 43.72 \| mov %rax,%rsi \| xor %edx,%edx \| mov %r8,%rax 0.18 \| div %r9 42.82 \| add $0x1,%rax 0.09 \| cmp %rax,%rsi \| jae 60 2.17 \| cmpq $0x0,0x60(%rdi) \| je 78 \| mov 0x50(%rdi),%rcx \| cmpb $0x0,(%rcx) 1.63 \| sete %cl 0.18 \| xchg %ax,%ax \|50: test $0x1,%cl \| je 64 0.45 \| add $0x1,%rsi 0.81 \| mov $0x1,%cl 0.09 \| cmp %rax,%rsi \| jb 50 \|60: mov %ecx,%eax \| pop %rcx 2.62 \| retq \|64: xor %ecx,%ecx \| mov %ecx,%eax \| pop %rcx \| retq \|6a: lea panic_loc.a.llvm.9800112514578621117,%rdi \| callq core::panicking::panic \| ud2 \|78: lea panic_bounds_check_loc.7.llvm.9800112514578621117,%rdi \| xor %esi,%esi \| xor %edx,%edx \| callq core::panicking::panic_bounds_check \| ud2 Change `chunk_size` to `chunk_size_log`. Replace `div` with `shr` to make it significantly faster: Before: index lookup (memory) 1.118 ms index lookup (disk, no verify) 2.078 ms index lookup (disk, verified) 7.687 ms After: index lookup (memory) 1.066 ms index lookup (disk, no verify) 1.992 ms index lookup (disk, verified) 3.591 ms Reviewed By: DurhamG, markbt Differential Revision: D7554992 fbshipit-source-id: c24189ced722d880af6ca0d64967eb762363d9e3	2018-04-17 18:54:39 -07:00
Jun Wu	f25c152d01	indexedlog: add a test about checksum Summary: Add a test that bitflips the index content, and make sure reading the index would trigger an error. Due to run-time performance difference, the release version tests 2-byte key while the debug version only tests 1-byte key. The header byte was not verified. Now it is verified. Reviewed By: DurhamG Differential Revision: D7517134 fbshipit-source-id: b3d8665ff4ac08c1a70db8d21122ba241913a2ed	2018-04-17 18:54:39 -07:00
Jun Wu	9ce455769c	indexedlog: avoid writing unused entries due to leaf split Summary: In "split_leaf" "Example 3" case, the old leaf entry (and its key) becomes unused. Writing them to disk is unnecessary. This patch adds "unused" marker so they could be marked and skipped inside flush(). No visible performance change: index insertion 3.710 ms index flush 3.717 ms index lookup (memory) 1.128 ms index lookup (disk, no verify) 1.993 ms index lookup (disk, verified) 7.866 ms Reviewed By: DurhamG Differential Revision: D7517139 fbshipit-source-id: 253c878bc4b3762382c424777dfa779b3868e851	2018-04-17 18:54:38 -07:00
Jun Wu	ac52e4a6fb	indexedlog: add a test against std hashmap for multi-values Summary: Since we now have the ability to store multiple values, add a test. Reviewed By: DurhamG Differential Revision: D7472880 fbshipit-source-id: 85b1c69245ac7f0c4702daf22a02f5e5072f0924	2018-04-13 21:51:46 -07:00
Jun Wu	de74642bc7	indexedlog: implement value iterator Summary: The value type is a linked list of u64 integers. Add an API to expose that. Using iterator framework has benefits about flexibility - the caller can take the first value, or convert it to a vector, or count the values, etc. easily. Reviewed By: DurhamG Differential Revision: D7472881 fbshipit-source-id: d31e81770e069734b54fa08729c0cd45a699aae2	2018-04-13 21:51:46 -07:00
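
Commit 40a88364be above replaces `div` with `shr` by storing `chunk_size_log` instead of `chunk_size`. The following is a minimal sketch of that idea, not the actual `ChecksumTable` code: the `Chunks` struct and the `chunk_range` and `chunk_range_div` functions are hypothetical names made up for illustration, and only the `chunk_size_log` field name comes from the commit message.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the chunk bookkeeping of a checksum table.
struct Chunks {
    /// log2 of the chunk size; the chunk size itself is `1 << chunk_size_log`.
    chunk_size_log: u32,
}

impl Chunks {
    /// Indexes of the chunks covering bytes `offset..offset + length`,
    /// computed with right shifts instead of divisions.
    fn chunk_range(&self, offset: u64, length: u64) -> Range<u64> {
        if length == 0 {
            return 0..0;
        }
        let start = offset >> self.chunk_size_log;
        let end = ((offset + length - 1) >> self.chunk_size_log) + 1;
        start..end
    }
}

/// The same computation using division by an arbitrary chunk size.
/// Kept only to show that both forms agree for power-of-two sizes.
fn chunk_range_div(chunk_size: u64, offset: u64, length: u64) -> Range<u64> {
    if length == 0 {
        return 0..0;
    }
    (offset / chunk_size)..((offset + length - 1) / chunk_size + 1)
}

fn main() {
    // Assumed chunk size of 1 MiB (2^20 bytes), chosen for illustration.
    let chunks = Chunks { chunk_size_log: 20 };
    let by_shift = chunks.chunk_range(1_500_000, 1_000_000);
    let by_div = chunk_range_div(1 << 20, 1_500_000, 1_000_000);
    assert_eq!(by_shift, by_div);
    println!("chunks to verify: {:?}", by_shift);
}
```

The trade-off is that the chunk size must be a power of two; in exchange, the divisor no longer needs to be loaded and divided by at run time, which is where the `perf` profile in the commit shows most of the time going.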

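Commit 049cd99f05 above notes that VLQ is a poor fit for xxhash values because they rarely have leading zeros. A rough sketch of why, assuming a LEB128-style VLQ (7 payload bits per byte, least significant group first) and a little-endian fixed-width encoding; both byte orders and the function names `encode_vlq` and `encode_fixed` are assumptions for illustration, not the crate's actual API:

```rust
/// Encode `value` as a base-128 VLQ: 7 payload bits per byte, least
/// significant group first, with the high bit set on all but the last byte.
fn encode_vlq(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}

/// Encode `value` as exactly 8 bytes (little-endian is assumed here).
fn encode_fixed(value: u64) -> [u8; 8] {
    value.to_le_bytes()
}

fn main() {
    // A small integer, such as an entry length, benefits from VLQ.
    let small: u64 = 40;
    // An arbitrary 64-bit value with its top bit set, standing in for a hash.
    let hash_like: u64 = 0x9e37_79b9_7f4a_7c15;

    for value in [small, hash_like] {
        println!(
            "{:#018x}: vlq {} bytes, fixed {} bytes",
            value,
            encode_vlq(value).len(),
            encode_fixed(value).len()
        );
    }
}
```

A value that uses all 64 bits needs ceil(64 / 7) = 10 VLQ bytes, two more than the fixed encoding, and the encoder also pays for a loop with a branch per byte.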

103 Commits