Mirror of https://github.com/facebook/sapling.git (synced 2024-12-29 08:02:24 +03:00)
slides: add indexedlog slide
Summary: Let's commit in technical slides for future reference.
Reviewed By: singhsrb
Differential Revision: D9300269
fbshipit-source-id: 8625a846c1a11f9f0d892d1743ae5e947cbe647b
Parent: bfdb697078
Commit: 02357010b2
BIN
slides/201808-indexedlog/1.jpg
Normal file
Binary file not shown.
298
slides/201808-indexedlog/indexedlog.md
Normal file
@@ -0,0 +1,298 @@
# Indexed Log

<!-- animation: true -->

---
# Agenda

- Problems
- Indexed Log

---
# Revlog

- 2 revlog files (`.i` and `.d`) per tracked file
- Delta-encoded

Current usage

- Client: Changelog
- Server: Everything (Changelog, manifest, filelog)
  - Enforced by hgsql

---
# Revlog

The single data structure powering most of vanilla Mercurial.

```bob,scale=0.8
         .i                 |                 .d
+------------------+        |        +-------------------+
| rev 0 metadata   | -- points to -> | rev 0 full text   |
+------------------+        |        +--------------+----+
| rev 1 metadata   | -- points to -> | rev 1 delta  |
+------------------+        |        +--------------+-+
| rev 2 metadata   | -- points to -> | rev 2 delta    |
+------------------+        |        +----------------+
                            |
|<--- 64 bytes --->|        |        |<- variable size ->|
```

---
# Revlog

- $O(1)$ lookup by *Revision Numbers*
- $O(1)$ insertion
- Also has *SHA1 Hashes* for integrity checks

Problems

- $O(N)$ lookup by *SHA1 Hash* (the first time, when no index exists yet)
- Too many inodes (Filelogs)
- Sparse is hard (revision numbers are topologically sorted)

---
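# Revlog

Here is a minimal Rust sketch of the lookup costs above. It assumes a simplified `.i` file made of fixed 64-byte entries, with the 20-byte SHA1 stored at offset 32. The constants and helper names are illustrative, not Mercurial's code. A revision number maps straight to a byte offset, while finding a SHA1 requires a scan unless a separate index exists.

```rust
/// Size of one entry in this simplified `.i` model.
const ENTRY_SIZE: usize = 64;
/// Assumed position of the 20-byte SHA1 inside an entry.
const NODE_OFFSET: usize = 32;

/// O(1): revision `rev` starts at byte `rev * ENTRY_SIZE`.
fn entry_by_rev(index: &[u8], rev: usize) -> Option<&[u8]> {
    let start = rev.checked_mul(ENTRY_SIZE)?;
    index.get(start..start.checked_add(ENTRY_SIZE)?)
}

/// O(N): without an extra index, finding a SHA1 means checking every entry.
fn rev_by_node(index: &[u8], node: &[u8; 20]) -> Option<usize> {
    index
        .chunks_exact(ENTRY_SIZE)
        .position(|entry| entry[NODE_OFFSET..NODE_OFFSET + 20] == node[..])
}

/// Builds a toy index in which entry `i` stores `nodes[i]` at NODE_OFFSET.
fn build_index(nodes: &[[u8; 20]]) -> Vec<u8> {
    let mut index = vec![0u8; nodes.len() * ENTRY_SIZE];
    for (rev, node) in nodes.iter().enumerate() {
        let start = rev * ENTRY_SIZE + NODE_OFFSET;
        index[start..start + 20].copy_from_slice(node);
    }
    index
}
```

Insertion is also $O(1)$ in this model, because appending an entry only extends the file.

---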
# Loose files and Pack files

The two formats powering Git.

Git does not have *Revision Numbers*.

Remotefilelog is similar.

---
# Loose file

- One file per file *revision*
- No deltas
- $O(\log N)$ lookup by *SHA1 Hash*
  - Done by the kernel (filesystem)

Problems

- *Extremely* space inefficient
- Way too many inodes

---
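# Loose file

This Rust sketch shows the loose-object idea: the SHA1 itself becomes the path. Finding an object is then just opening a file, and the directory lookup is left to the filesystem. The two-digit fan-out directory mirrors Git's `objects/xx/...` layout. `loose_path` and its `store` argument are illustrative names.

```rust
use std::path::{Path, PathBuf};

/// Maps a 40-digit hex SHA1 to `<store>/<first 2 digits>/<remaining 38>`.
/// Returns None for anything that is not a well-formed hex SHA1.
fn loose_path(store: &Path, hex: &str) -> Option<PathBuf> {
    if hex.len() != 40 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let (dir, file) = hex.split_at(2);
    Some(store.join(dir).join(file))
}
```

Each revision is a whole, non-delta file at such a path. That is where the space and inode costs listed above come from.

---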
# Pack file

- 1 pack file for a range of file revisions
- Delta-encoded

---
# Pack file

```bob
       .idx                   |     .pack
                              |
Level1     Level2             | Similar to
1st byte   Sorted SHA1s       | revlog.d
+----+     +------+           | +-----------+
| 00 | --> | 0000 | --------->  | full text |
+----+     | 0002 | ---.      | +---------+-+
| 01 |     | ...  |     \ .-->  | delta   |
+----+     | 00ff | -.   X    | +--------++
| .  |     +------+   \ / `-->  | delta  |
| .  |                 X      | +--------+----+
| .  |     +------+   / \.--->  | full text   |
+----+     | ff01 | -'  /\    | +-------+-----+
| ff | --> | ff02 | ---'  '-->  | delta |
+----+     +------+           | +-------+
```

---
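# Pack file

The two-level lookup in the diagram can be sketched in Rust. Level 1 is a 256-entry fan-out table indexed by the first byte, and level 2 is the list of sorted SHA1s, which is binary-searched. This is an in-memory model in the spirit of Git's `.idx` fan-out. The `PackIndex` type and its fields are hypothetical, and the on-disk encoding is not shown.

```rust
/// In-memory model of a pack index: a 256-entry fan-out table (level 1)
/// over SHA1s sorted ascending (level 2), each paired with its offset in
/// the `.pack` file.
struct PackIndex {
    /// fanout[b] = number of SHA1s whose first byte is <= b.
    fanout: [u32; 256],
    entries: Vec<([u8; 20], u64)>,
}

impl PackIndex {
    fn new(mut entries: Vec<([u8; 20], u64)>) -> Self {
        entries.sort_by(|a, b| a.0.cmp(&b.0));
        let mut fanout = [0u32; 256];
        for (sha, _) in &entries {
            fanout[sha[0] as usize] += 1;
        }
        // Turn per-byte counts into cumulative counts.
        for b in 1..256 {
            fanout[b] += fanout[b - 1];
        }
        PackIndex { fanout, entries }
    }

    /// Level 1 narrows the search to SHA1s sharing the first byte;
    /// level 2 binary-searches that slice.
    fn lookup(&self, sha: &[u8; 20]) -> Option<u64> {
        let b = sha[0] as usize;
        let start = if b == 0 { 0 } else { self.fanout[b - 1] as usize };
        let end = self.fanout[b] as usize;
        let range = &self.entries[start..end];
        range
            .binary_search_by(|(s, _)| s.cmp(sha))
            .ok()
            .map(|i| range[i].1)
    }
}
```

Because the entries are kept sorted, adding a SHA1 to an existing pack means rewriting its index. Creating a new pack instead is cheap.

---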
# Pack file

- $O(\log N)$ lookup
- $O(\frac{N}{256})$ insertion into an existing pack file; $O(1)$ insertion by creating a new pack file

Problems

- Too many ($M$) pack files degrade performance
  - $O(M \log \frac{N}{M})$ lookup
  - And space, if pack files are self-contained
  - Delta chains become less efficient
- Must do `repack` to maintain performance
  - `repack` can be very expensive

---
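# Pack file

This sketch illustrates the $O(M \log \frac{N}{M})$ lookup. With $M$ packs, a miss (or an unlucky hit) binary-searches every pack. `repack` trades a full rewrite for a single search again. It assumes each pack is just a sorted list of SHA1s. `Pack`, `find_in_packs` and `repack` are illustrative names.

```rust
/// A pack modeled only by its sorted SHA1s (a deliberate simplification).
struct Pack {
    sorted: Vec<[u8; 20]>,
}

/// Binary-searches each pack in turn: up to M searches over ~N/M entries.
/// Returns the index of the pack holding `sha` and how many packs were probed.
fn find_in_packs(packs: &[Pack], sha: &[u8; 20]) -> (Option<usize>, usize) {
    let mut probed = 0;
    for (i, pack) in packs.iter().enumerate() {
        probed += 1;
        if pack.sorted.binary_search(sha).is_ok() {
            return (Some(i), probed);
        }
    }
    (None, probed)
}

/// Merges all packs into one, so later lookups need a single O(log N) search.
/// The cost is touching every entry, which is why repacking is expensive.
fn repack(packs: Vec<Pack>) -> Pack {
    let mut sorted: Vec<[u8; 20]> = packs.into_iter().flat_map(|p| p.sorted).collect();
    sorted.sort();
    sorted.dedup();
    Pack { sorted }
}
```

---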
# Obsstore

- Not using revision numbers.

Problems

- No index: anything accessing obsmarkers pays $O(N)$ time to load all markers.

Complexities

- Need to look up by predecessors or successors: multiple indexes needed

---
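# Obsstore

This small Rust sketch shows the "multiple indexes" complexity: each obsolescence marker must be findable both by its predecessor and by each of its successors. The `Marker` shape is a simplification, since real markers carry more fields. Building these maps in memory is itself the $O(N)$ pass described above. Persisting such indexes on disk is the part an indexed log is meant to provide.

```rust
use std::collections::HashMap;

type Node = [u8; 20];

/// Simplified marker: one predecessor replaced by zero or more successors.
struct Marker {
    pred: Node,
    succs: Vec<Node>,
}

/// Two separate indexes over the same markers (values are marker positions).
struct ObsIndexes {
    by_pred: HashMap<Node, Vec<usize>>,
    by_succ: HashMap<Node, Vec<usize>>,
}

/// One O(N) pass over all markers fills both indexes.
fn build_indexes(markers: &[Marker]) -> ObsIndexes {
    let mut idx = ObsIndexes { by_pred: HashMap::new(), by_succ: HashMap::new() };
    for (i, m) in markers.iter().enumerate() {
        idx.by_pred.entry(m.pred).or_default().push(i);
        for s in &m.succs {
            idx.by_succ.entry(*s).or_default().push(i);
        }
    }
    idx
}
```

---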
# Problem Summary

<!--column-->

File Storage:

|             | Revlog                  | Loose                   | Pack                    |
|-------------|-------------------------|-------------------------|-------------------------|
| Revnum      | :cry:                   | :smiley:                | :smiley:                |
| Insertion   | :smiley:                | :slightly_smiling_face: | :thinking:              |
| Lookup      | :cry:                   | :smiley:                | :slightly_smiling_face: |
| Space       | :slightly_smiling_face: | :scream:                | :slightly_smiling_face: |
| Inode #     | :cry:                   | :scream:                | :smiley:                |
| Maintenance | :smiley:                | :smiley:                | :cry:                   |

<!--column-->
Obsstore:
- Multiple indexes

<br />

Changelog:
- Multiple indexes (nodemap, parent-child map)

---
# Indexed Log

Goals

- Decouple from revision numbers
- $O(\log N)$ insertion
- $O(\log N)$ lookup
- Avoid $O(N)$ in all cases except for fixing corruption
- No maintenance needed to keep the above time complexities
- Strong integrity

---
# Indexed Log

Be general-purpose.

![](1.jpg)

---
# Indexed Log

```bob,scale=0.8
.--------------------------------------------.
| File Storage                               |
|                                            |
| .-----------------------------.            |
| | Indexed Log                 |            |
| |                             |            |
| | .-------------------------. |            |
| | | Append Only Radix Index | |            |
| | |                         | |            |
| | | .-----------------.     | |  .-------. |
| | | | Integrity Check |     | |  | Zstd  | |
| | | | for append only |     | |  | Delta | |
| | | | files           |     | |  '-------' |
| | | '-----------------'     | |            |
| | '-------------------------' |            |
| '-----------------------------'            |
'--------------------------------------------'
```

---
# The Index

<!-- note: simplified -->

```bob
  Insert 81c2     |       Insert 82ee
                  |
       .--------------------.       .------------.
       |          |         |       |            |
       v          |         |       |            v
+-------------+   |   +---+-|-+---+-|-+   +-------------+
| value: 81c2 |   |   | 1 | * | 2 | * |   | value: 82ee |
+-------------+   |   +---+---+---+---+   +-------------+
           ^      |           ^
           |      |           |
       '---.      |       '---.
       |          |       |
 +---+-|-+        | +---+-|-+
 | 8 | * |        | | 8 | * |
 +---+---+        | +---+---+
  Root v1         |  Root v2
```

---
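# The Index

Here is a minimal Rust sketch of the append-only radix index shown above. It assumes fixed-length keys and a 16-way branch per hex digit (nibble). `nodes` stands in for the append-only index file. An insert appends a new leaf plus fresh copies of the nodes on its path, and never modifies old nodes. An old root offset (Root v1) therefore keeps describing the old tree. Type and method names are illustrative, and the node shapes are simplified further than on the slide.

```rust
/// One node in the append-only "file" (`RadixIndex::nodes`).
#[derive(Clone)]
enum Node {
    /// 16 children, one per hex digit; values are offsets into `nodes`.
    Radix([Option<usize>; 16]),
    Leaf { key: Vec<u8>, value: u64 },
}

struct RadixIndex {
    /// Append-only: nodes are pushed, never modified or removed.
    nodes: Vec<Node>,
}

/// The hex digit of `key` at `depth` (two digits per byte).
fn nibble(key: &[u8], depth: usize) -> usize {
    let byte = key[depth / 2];
    (if depth % 2 == 0 { byte >> 4 } else { byte & 0x0f }) as usize
}

impl RadixIndex {
    fn push(&mut self, node: Node) -> usize {
        self.nodes.push(node);
        self.nodes.len() - 1
    }

    /// Inserts `key` and returns a new root; the tree under `root` is unchanged.
    fn insert(&mut self, root: Option<usize>, key: &[u8], value: u64) -> usize {
        self.insert_at(root, key, value, 0)
    }

    fn insert_at(&mut self, node: Option<usize>, key: &[u8], value: u64, depth: usize) -> usize {
        let Some(i) = node else {
            return self.push(Node::Leaf { key: key.to_vec(), value });
        };
        match self.nodes[i].clone() {
            // Same key: append a replacement leaf.
            Node::Leaf { key: old, .. } if old == key => {
                self.push(Node::Leaf { key: key.to_vec(), value })
            }
            // Different key: split. The old leaf is reused, not copied.
            Node::Leaf { key: old, .. } => {
                let mut children = [None; 16];
                children[nibble(&old, depth)] = Some(i);
                let n = nibble(key, depth);
                children[n] = Some(self.insert_at(children[n], key, value, depth + 1));
                self.push(Node::Radix(children))
            }
            // Path copying: append a modified copy of this radix node.
            Node::Radix(mut children) => {
                let n = nibble(key, depth);
                children[n] = Some(self.insert_at(children[n], key, value, depth + 1));
                self.push(Node::Radix(children))
            }
        }
    }

    fn lookup(&self, root: Option<usize>, key: &[u8]) -> Option<u64> {
        let mut node = root;
        let mut depth = 0;
        while let Some(i) = node {
            match &self.nodes[i] {
                Node::Leaf { key: k, value } => return (k.as_slice() == key).then_some(*value),
                Node::Radix(children) => {
                    node = children[nibble(key, depth)];
                    depth += 1;
                }
            }
        }
        None
    }
}
```

Inserting `82ee` after `81c2` appends a leaf for `82ee` and two radix nodes. The existing `81c2` leaf is shared by both roots.

---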
# The Index

- Append-only index + atomically replaced root pointer. Reads are lock-free.
- Modifications are kept in memory until an explicit `flush`.
- $O(\log N)$ insertion and lookup.
- No new files written. No maintenance required.

---
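# The Index

This sketch shows the write path these bullets describe, using only `std::fs`. On `flush`, the buffered bytes are appended to a data file. A small `meta` file is then atomically replaced by writing a temporary file and renaming it. Here `meta` holds only the logical data length, standing in for root pointers. The file names, the `Writer` type and the encoding are assumptions, and crash recovery is not handled.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical writer that buffers appends in memory until `flush`.
struct Writer {
    pending: Vec<u8>,
}

impl Writer {
    /// Appends the buffered bytes to `data`, then atomically publishes the new
    /// logical length by writing `meta.tmp` and renaming it over `meta`.
    fn flush(&mut self, dir: &Path) -> io::Result<u64> {
        let mut data = OpenOptions::new()
            .create(true)
            .append(true)
            .open(dir.join("data"))?;
        data.write_all(&self.pending)?; // existing bytes are never rewritten
        data.sync_all()?;
        let len = data.metadata()?.len();
        let tmp = dir.join("meta.tmp");
        fs::write(&tmp, len.to_le_bytes())?;
        // rename() within one directory is atomic on POSIX filesystems, so a
        // reader sees either the old `meta` or the new one, never a mix.
        fs::rename(&tmp, dir.join("meta"))?;
        self.pending.clear();
        Ok(len)
    }
}

/// Lock-free read: trust only the prefix of `data` that `meta` covers.
fn read_committed(dir: &Path) -> io::Result<Vec<u8>> {
    let meta = fs::read(dir.join("meta"))?;
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&meta[..8]);
    let mut data = fs::read(dir.join("data"))?;
    data.truncate(u64::from_le_bytes(len_bytes) as usize);
    Ok(data)
}
```

---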
# The Log

- Stores a list of *entries*. An *entry* is a slice of `bytes`.
- Maintains checksums internally.

<!-- note: not SHA1 commit hash -->

---
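# The Log

This is a sketch of a log with per-entry checksums. The slide does not specify the checksum or the entry framing, so both are assumptions here. Each entry is written as a 4-byte length, an 8-byte FNV-1a checksum, then the bytes. Reading stops at the first entry that fails verification.

```rust
/// 64-bit FNV-1a, a stand-in checksum for this sketch.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Appends one entry framed as [len: u32 LE][checksum: u64 LE][bytes].
fn append_entry(log: &mut Vec<u8>, entry: &[u8]) {
    log.extend_from_slice(&(entry.len() as u32).to_le_bytes());
    log.extend_from_slice(&fnv1a64(entry).to_le_bytes());
    log.extend_from_slice(entry);
}

/// Returns all entries, or Err(offset) of the first truncated or corrupt one.
fn read_entries(log: &[u8]) -> Result<Vec<&[u8]>, usize> {
    let mut entries = Vec::new();
    let mut pos = 0;
    while pos < log.len() {
        // Header: 4-byte length followed by 8-byte checksum.
        let header = log.get(pos..pos + 12).ok_or(pos)?;
        let mut len_buf = [0u8; 4];
        len_buf.copy_from_slice(&header[..4]);
        let mut sum_buf = [0u8; 8];
        sum_buf.copy_from_slice(&header[4..]);
        let len = u32::from_le_bytes(len_buf) as usize;
        let body = log.get(pos + 12..pos + 12 + len).ok_or(pos)?;
        if fnv1a64(body) != u64::from_le_bytes(sum_buf) {
            return Err(pos);
        }
        entries.push(body);
        pos += 12 + len;
    }
    Ok(entries)
}
```

---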
# Indexed Log

- Indexed Log = Log (source of truth) + Indexes (cache)
- Define 0 or more *Index Functions* (`entry -> Vec<bytes>`)
- Indexed Log builds indexes automatically
- Indexes can be rebuilt purely from Log

<!-- note: no network access -->

---
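# Indexed Log

This in-memory Rust sketch illustrates "Log + Indexes". Index functions map each entry to zero or more keys (`entry -> Vec<bytes>`), every append updates every index, and `rebuild_indexes` recreates the indexes from the log alone. The `IndexedLog` here is a toy with hypothetical method names, not the real crate's API. It keeps everything in memory and omits the on-disk radix index.

```rust
use std::collections::BTreeMap;

/// An index function maps an entry to zero or more index keys.
type IndexFn = Box<dyn Fn(&[u8]) -> Vec<Vec<u8>>>;

/// `entries` is the source of truth; each index maps key -> entry positions
/// and is only a cache that can be rebuilt from `entries`.
struct IndexedLog {
    entries: Vec<Vec<u8>>,
    index_fns: Vec<IndexFn>,
    indexes: Vec<BTreeMap<Vec<u8>, Vec<usize>>>,
}

impl IndexedLog {
    fn new(index_fns: Vec<IndexFn>) -> Self {
        let indexes = index_fns.iter().map(|_| BTreeMap::new()).collect();
        IndexedLog { entries: Vec::new(), index_fns, indexes }
    }

    /// Appending an entry updates every index automatically.
    fn append(&mut self, entry: &[u8]) {
        let pos = self.entries.len();
        self.entries.push(entry.to_vec());
        for (f, index) in self.index_fns.iter().zip(self.indexes.iter_mut()) {
            for key in f(entry) {
                index.entry(key).or_default().push(pos);
            }
        }
    }

    /// All entries whose index function produced `key` for index `index_id`.
    fn lookup(&self, index_id: usize, key: &[u8]) -> Vec<&[u8]> {
        match self.indexes[index_id].get(key) {
            Some(positions) => positions.iter().map(|&p| self.entries[p].as_slice()).collect(),
            None => Vec::new(),
        }
    }

    /// Drops every index and replays the log to rebuild them.
    fn rebuild_indexes(&mut self) {
        let entries = std::mem::take(&mut self.entries);
        for index in &mut self.indexes {
            index.clear();
        }
        for entry in &entries {
            self.append(entry);
        }
    }
}
```

---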
# Indexed Log

On disk, an `IndexedLog` is stored as a directory:

- `log`: The source of truth.
- `index.{foo}`: Index "foo".
- `index.{foo}.sum`: Chunked checksums of Index "foo".
- `meta`: Pointers to root nodes. Logical file lengths.

---
# Planned Use Cases

- File Storage
- Changelog Nodemap and Childmap
- Obsstore indexes
- Bookmark indexes
- Undo indexes

---
# Lightweight Transaction

Since every data structure is append-only and controlled by `meta`, transactions can simply be different `meta` files, e.g. `meta.tr{name}`. This allows multiple ongoing transactions.

---
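# Lightweight Transaction

This is a hypothetical sketch of the `meta.tr{name}` idea. Each transaction works on its own copy of `meta`, committing renames that copy over `meta` atomically, and aborting just deletes it. The function names and the plain-text length stored in `meta` are assumptions. The sketch does not address how concurrent transactions that append to the same log are reconciled.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Per-transaction copy of `meta`, e.g. `meta.trfoo` for transaction "foo".
fn tr_path(dir: &Path, name: &str) -> PathBuf {
    dir.join(format!("meta.tr{}", name))
}

/// Begin: start from the currently committed `meta`.
fn begin(dir: &Path, name: &str) -> io::Result<()> {
    fs::copy(dir.join("meta"), tr_path(dir, name)).map(|_| ())
}

/// Record the transaction's new logical length (after appending data).
fn update(dir: &Path, name: &str, new_len: u64) -> io::Result<()> {
    fs::write(tr_path(dir, name), new_len.to_string())
}

/// Commit: atomically make this transaction's `meta` the committed one.
fn commit(dir: &Path, name: &str) -> io::Result<()> {
    fs::rename(tr_path(dir, name), dir.join("meta"))
}

/// Abort: drop the transaction's `meta`; anything it appended stays in the
/// append-only files but is no longer referenced.
fn abort(dir: &Path, name: &str) -> io::Result<()> {
    fs::remove_file(tr_path(dir, name))
}

/// Reads the committed logical length from `meta`.
fn committed_len(dir: &Path) -> io::Result<u64> {
    let text = fs::read_to_string(dir.join("meta"))?;
    text.trim()
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

---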
# Q & A
9
slides/README.md
Normal file
@@ -0,0 +1,9 @@
Related technical slides. Not part of an installation.

The slides reflect ideas at the time they were presented. They are not formal
technical documentation and can be outdated. For the latest, accurate
documentation, see comments in the code and the formal `doc` and
`help/internals` directories.

`.md` format slides are rendered using [Marp](https://yhatt.github.io/marp/).
Some of them use extra features added by [a fork](https://github.com/quark-zju/marp).