sapling/lib/indexedlog
Jun Wu a4958163ee indexedlog: optimize size of radix entries (BC)
Summary:
Replace the 20-byte "jump table" with 3-byte "flag + bitmap". This saves space
for indexes less than 4GB. There are some reserved bits in the "flag" so if we
run into space issues when indexes are larger than 4GB, we can try adding
6-byte integer, or VLQ back without breaking backwards-compatibility.

It seems to hurt flush performance a bit, because we have to scan the child
array twice. However, lookup (the most important performance) does not change
much. And the index is more compact.

After:

  index flush                    19.644 ms
  index lookup (disk, no verify)  2.220 ms
  index lookup (disk, verified)   4.067 ms
  index size (5M owned keys)     216626039 bytes
  index size (5M referred keys)  136245815 bytes

Before:

  index flush                    16.764 ms
  index lookup (disk, no verify)  2.205 ms
  index lookup (disk, verified)   4.030 ms
  index size (5M owned keys)     240838647 bytes
  index size (5M referred keys)  160458423 bytes

For the "referred key" case, it's 160->136MB, 17% decrease.

A detailed break down of components of index is:

After:

  type       count     bytes (using owned keys)
  radixes    1729472   33835772
  links      5000000   27886336
  leafs      5000000   44629384
  keys       5000000  110000000

  type       count     bytes (using referred keys)
  radixes    1729472   33835772
  links      5000000   27886336
  leafs      5000000   44629384
  ext_keys   5000000   29894315

Before:

  type       count     bytes (using owned keys)
  radixes    1729472   58048380
  links      5000000   27886336
  leafs      5000000   44903923
  keys       5000000  110000000

  type       count     bytes (using referred keys)
  radixes    1729472   58048380
  links      5000000   27886336
  leafs      5000000   44629384
  ext_keys   5000000   29894315

Leaf nodes are taking too much space. It seems the next big optimization might
be inlining ext_keys into leafs.

Reviewed By: DurhamG, markbt

Differential Revision: D13028196

fbshipit-source-id: 6043b16fd67a497eb52d20a17e153fcba5cb3e81
2018-12-06 14:57:52 -08:00
..
benches indexedlog: increase key count for size test 2018-12-06 14:57:52 -08:00
src indexedlog: optimize size of radix entries (BC) 2018-12-06 14:57:52 -08:00
Cargo.toml indexedlog: add benchmark for "log" 2018-12-06 14:57:52 -08:00