sapling

mirror of https://github.com/facebook/sapling.git synced 2024-10-12 17:58:27 +03:00

Author	SHA1	Message	Date
Jun Wu	a7e3e7884d	indexedlog: add a type alias for `Option<ChecksumTable>` Summary: The type will be used all over the place and may make `rustfmt` wrap lines. Use a shorter type to make it slightly cleaner. Reviewed By: DurhamG Differential Revision: D7436338 fbshipit-source-id: ecaada23916a22658f65669b748632a077e60df2	2018-04-13 21:51:42 -07:00
Jun Wu	bfd8e33370	indexedlog: verify checksum for root entry Summary: This only affects `Index::open` right now. So it's a one time check and does not affect performance. Reviewed By: DurhamG Differential Revision: D7436341 fbshipit-source-id: 30313064bf2ea50320ac744fc18c03bff4b12c89	2018-04-13 21:51:42 -07:00
Jun Wu	a0cec9853c	indexedlog: add checksum table to index struct Summary: Add `ChecksumTable` to the `Index` struct. But it's not functional yet. The checksum will mainly affect "index lookup (disk)" case. Add another benchmark for showing the difference with checksum on and off. They do not have much difference right now: index insertion 3.756 ms index flush 3.469 ms index lookup (memory) 0.990 ms index lookup (disk, no verify) 1.768 ms index lookup (disk, verified) 1.766 ms Reviewed By: DurhamG Differential Revision: D7436339 fbshipit-source-id: 60a6554a2c96067a53ce9e1753cd51d0d61c0bea	2018-04-13 21:51:42 -07:00
Jun Wu	8d7d4de8ee	indexedlog: separate benchmarks Summary: The minibench framework does not provide benchmark filtering. So let's separate benchmarks using different entry points. Reviewed By: DurhamG Differential Revision: D7440250 fbshipit-source-id: 11e7790a5074ebf4c08e33c312a490a66a921926	2018-04-13 21:51:42 -07:00
Jun Wu	d86adc417e	indexedlog: remove "index clone" benchmarks Summary: The "clone" benchmarks were added to be subtracted from "lookup" to workaround the test framework limitation. The new minibench framework makes it easier to exclude preparation cost. Therefore the clone benchmarks are no longer needed. index insertion 3.881 ms index flush 3.286 ms index lookup (memory) 0.928 ms index lookup (disk) 1.685 ms "index lookup (memory)" is basically "index lookup (memory)" minus "index clone (memory)" in previous benchmarks. Reviewed By: DurhamG Differential Revision: D7440251 fbshipit-source-id: 0e6a1fb7ee64f9a393ee9ada4db6e6eb052e20bf	2018-04-13 21:51:42 -07:00
Jun Wu	9b9dd289e4	indexedlog: use minibench to do benchmark Summary: See the previous minibench diff for the motivation. "failure" was removed from build dependencies since it's not used yet. Run benchmark a few times. It seems the first several items are less stable due to possibly warming up issues. Otherwise the result looks good enough. The test also compiles and runs much faster. ``` base16 iterating 1M bytes 0.921 ms index insertion 4.804 ms index flush 5.104 ms index lookup (memory) 2.929 ms index lookup (disk) 1.767 ms index clone (memory) 2.036 ms index clone (disk) 0.010 ms base16 iterating 1M bytes 0.853 ms index insertion 4.512 ms index flush 4.717 ms index lookup (memory) 2.907 ms index lookup (disk) 1.755 ms index clone (memory) 1.856 ms index clone (disk) 0.010 ms base16 iterating 1M bytes 1.525 ms index insertion 4.577 ms index flush 4.901 ms index lookup (memory) 2.800 ms index lookup (disk) 1.790 ms index clone (memory) 1.794 ms index clone (disk) 0.010 ms base16 iterating 1M bytes 0.768 ms index insertion 4.486 ms index flush 4.918 ms index lookup (memory) 2.658 ms index lookup (disk) 1.721 ms index clone (memory) 1.763 ms index clone (disk) 0.010 ms base16 iterating 1M bytes 0.732 ms index insertion 4.489 ms index flush 4.792 ms index lookup (memory) 2.689 ms index lookup (disk) 1.739 ms index clone (memory) 1.850 ms index clone (disk) 0.009 ms base16 iterating 1M bytes 1.124 ms index insertion 7.188 ms index flush 4.888 ms index lookup (memory) 2.829 ms index lookup (disk) 1.609 ms index clone (memory) 2.642 ms index clone (disk) 0.010 ms base16 iterating 1M bytes 1.055 ms index insertion 4.683 ms index flush 4.996 ms index lookup (memory) 2.782 ms index lookup (disk) 1.710 ms index clone (memory) 1.802 ms index clone (disk) 0.009 ms ``` Reviewed By: DurhamG Differential Revision: D7440249 fbshipit-source-id: 0f946ab184455acd40c5a38cf46ff94d9e3755c8	2018-04-13 21:51:42 -07:00
Jun Wu	f9fb60337a	minibench: add a simple library to do benchmark Summary: It's sad to find that existing Rust benchmark frameworks do not fit well in our simple benchmark purpose. The benchmark library shipped with Rust [1] has long been "nightly-only". Third-party choices like "criterion.rs" do too many things and miss certain small features. Namely, indexedlog wants: - More stable benchmark result. This means not picking the average time, but the "best" time among all runs, like what Mercurial does. - Do not measure setup cost from repetitive runs. As in D7404532, do not clone the index, and do not have separate "clone" benchmarks. - Faster benchmarks. This means getting rid of unused parts like calling gnuplot. Besides, keeping the test framework lightweight also helps compilation time. Looking at `indexedlog`'s dependencies (with unused "failure" removed), 70% of them are from `criterion.rs`. ``` indexedlog v0.1.0 (lib/indexedlog) [dependencies] |-- atomicwrites v0.1.5 | [dependencies] | |-- nix v0.9.0 | | [dependencies] | | |-- bitflags v0.9.1 | | |-- cfg-if v0.1.2 | | |-- libc v0.2.39 | | `-- void v1.0.2 | `-- tempdir v0.3.6 | [dependencies] | |-- rand v0.4.2 | | [dependencies] | | `-- libc v0.2.39 (*) | `-- remove_dir_all v0.3.0 | [dependencies] | |-- kernel32-sys v0.2.2 | | [dependencies] | | `-- winapi v0.2.8 | | [build-dependencies] | | `-- winapi-build v0.1.1 | `-- winapi v0.2.8 (*) |-- byteorder v1.2.1 |-- fs2 v0.4.3 | [dependencies] | `-- libc v0.2.39 (*) |-- memmap v0.6.2 | [dependencies] | `-- libc v0.2.39 (*) |-- twox-hash v1.1.0 | [dependencies] | `-- rand v0.3.22 | [dependencies] | |-- libc v0.2.39 (*) | `-- rand v0.4.2 (*) `-- vlqencoding v0.1.0 (lib/vlqencoding) [dev-dependencies] |-- criterion v0.2.1 | [dependencies] | |-- atty v0.2.8 | | [dependencies] | | `-- libc v0.2.39 (*) | |-- clap v2.31.1 | | [dependencies] | | |-- ansi_term v0.11.0 | | |-- atty v0.2.8 (*) | | |-- bitflags v1.0.1 | | |-- strsim v0.7.0 | | |-- textwrap v0.9.0 | | | [dependencies] | | | `-- unicode-width v0.1.4 | | |-- unicode-width v0.1.4 (*) | | `-- vec_map v0.8.0 | |-- criterion-plot v0.2.1 | | [dependencies] | | |-- byteorder v1.2.1 (*) | | |-- cast v0.2.2 | | `-- itertools v0.7.7 | | [dependencies] | | `-- either v1.4.0 | |-- criterion-stats v0.2.1 | | [dependencies] | | |-- cast v0.2.2 (*) | | |-- num-traits v0.2.1 | | |-- num_cpus v1.8.0 | | | [dependencies] | | | `-- libc v0.2.39 (*) | | |-- rand v0.4.2 (*) | | `-- thread-scoped v1.0.2 | |-- failure v0.1.1 | | [dependencies] | | |-- backtrace v0.3.5 | | | [dependencies] | | | |-- backtrace-sys v0.1.16 | | | | [dependencies] | | | | `-- libc v0.2.39 (*) | | | | [build-dependencies] | | | | `-- cc v1.0.8 | | | |-- cfg-if v0.1.2 (*) | | | |-- libc v0.2.39 (*) | | | `-- rustc-demangle v0.1.7 | | `-- failure_derive v0.1.1 | | [dependencies] | | |-- quote v0.3.15 | | |-- syn v0.11.11 | | | [dependencies] | | | |-- quote v0.3.15 (*) | | | |-- synom v0.11.3 | | | | [dependencies] | | | | `-- unicode-xid v0.0.4 | | | `-- unicode-xid v0.0.4 (*) | | `-- synstructure v0.6.1 | | [dependencies] | | |-- quote v0.3.15 (*) | | `-- syn v0.11.11 (*) | |-- failure_derive v0.1.1 (*) | |-- handlebars v0.31.0 | | [dependencies] | | |-- lazy_static v1.0.0 | | |-- log v0.4.1 | | | [dependencies] | | | `-- cfg-if v0.1.2 (*) | | |-- pest v1.0.6 | | |-- pest_derive v1.0.6 | | | [dependencies] | | | |-- pest v1.0.6 (*) | | | |-- quote v0.3.15 (*) | | | `-- syn v0.11.11 (*) | | |-- quick-error v1.2.1 | | |-- regex v0.2.10 | | | [dependencies] | | | |-- aho-corasick v0.6.4 | | | | [dependencies] | | | | `-- memchr v2.0.1 | | | | [dependencies] | | | | `-- libc v0.2.39 (*) | | | |-- memchr v2.0.1 (*) | | | |-- regex-syntax v0.5.3 | | | | [dependencies] | | | | `-- ucd-util v0.1.1 | | | |-- thread_local v0.3.5 | | | | [dependencies] | | | | |-- lazy_static v1.0.0 (*) | | | | `-- unreachable v1.0.0 | | | | [dependencies] | | | | `-- void v1.0.2 (*) | | | `-- utf8-ranges v1.0.0 | | |-- serde v1.0.33 | | `-- serde_json v1.0.11 | | [dependencies] | | |-- dtoa v0.4.2 | | |-- itoa v0.3.4 | | |-- num-traits v0.2.1 (*) | | `-- serde v1.0.33 (*) | |-- itertools v0.7.7 (*) | |-- itertools-num v0.1.1 | | [dependencies] | | `-- num-traits v0.1.43 | | [dependencies] | | `-- num-traits v0.2.1 (*) | |-- log v0.4.1 (*) | |-- serde v1.0.33 (*) | |-- serde_derive v1.0.33 | | [dependencies] | | |-- proc-macro2 v0.2.3 | | | [dependencies] | | | `-- unicode-xid v0.1.0 | | |-- quote v0.4.2 | | | [dependencies] | | | `-- proc-macro2 v0.2.3 (*) | | |-- serde_derive_internals v0.21.0 | | | [dependencies] | | | |-- proc-macro2 v0.2.3 (*) | | | `-- syn v0.12.14 | | | [dependencies] | | | |-- proc-macro2 v0.2.3 (*) | | | |-- quote v0.4.2 (*) | | | `-- unicode-xid v0.1.0 (*) | | `-- syn v0.12.14 (*) | |-- serde_json v1.0.11 (*) | `-- simplelog v0.5.0 | [dependencies] | |-- chrono v0.4.0 | | [dependencies] | | |-- num v0.1.42 | | | [dependencies] | | | |-- num-integer v0.1.36 | | | | [dependencies] | | | | `-- num-traits v0.2.1 (*) | | | |-- num-iter v0.1.35 | | | | [dependencies] | | | | |-- num-integer v0.1.36 (*) | | | | `-- num-traits v0.2.1 (*) | | | `-- num-traits v0.2.1 (*) | | `-- time v0.1.39 | | [dependencies] | | `-- libc v0.2.39 (*) | | [dev-dependencies] | | `-- winapi v0.3.4 | |-- log v0.4.1 (*) | `-- term v0.4.6 |-- quickcheck v0.6.2 | [dependencies] | |-- env_logger v0.5.6 | | [dependencies] | | |-- atty v0.2.8 (*) | | |-- humantime v1.1.1 | | | [dependencies] | | | `-- quick-error v1.2.1 (*) | | |-- log v0.4.1 (*) | | |-- regex v0.2.10 (*) | | `-- termcolor v0.3.5 | |-- log v0.4.1 (*) | `-- rand v0.4.2 (*) |-- rand v0.4.2 (*) `-- tempdir v0.3.6 (*) ``` [1]: https://github.com/rust-lang/rust/issues/29553 Reviewed By: DurhamG Differential Revision: D7440254 fbshipit-source-id: 53cdbd470945388db96702ab771a3f73b456da37	2018-04-13 21:51:42 -07:00
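The two measurement rules in the minibench commit (report the best run rather than the average, and keep setup cost out of the timed region) can be sketched roughly as follows. This is not the minibench API; `best_of` and its parameters are hypothetical names chosen for illustration.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: run `measured` `runs` times and return the
/// fastest wall-clock time. `setup` is called before every run and is
/// deliberately left outside the timed region.
pub fn best_of<S, T, F>(runs: usize, mut setup: S, mut measured: F) -> Duration
where
    S: FnMut() -> T,
    F: FnMut(T),
{
    assert!(runs > 0, "need at least one run");
    let mut best: Option<Duration> = None;
    for _ in 0..runs {
        let input = setup(); // preparation cost, not timed
        let start = Instant::now();
        measured(input);
        let elapsed = start.elapsed();
        // Keep the minimum rather than accumulating an average.
        best = Some(match best {
            Some(b) if b <= elapsed => b,
            _ => elapsed,
        });
    }
    best.unwrap()
}
```

Taking the minimum tends to filter out scheduler noise and cache warm-up, which may explain why the commit describes it as more stable than an average.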
Jun Wu	8bcff92cab	indexedlog: use a dedicated map type for offset translation Summary: The dirty -> non-dirty offset mapping can be optimized using a dedicated "map" type that is backed by `vec`s, because dirty offsets are continuous per type. This makes "flush" significantly faster: ``` index flush time: [5.8808 ms 6.1800 ms 6.4813 ms] change: [-62.250% -59.481% -56.325%] (p = 0.00 < 0.05) Performance has improved. ``` Reviewed By: DurhamG Differential Revision: D7422832 fbshipit-source-id: 9ab8a70d1663155941dae5b4f02f7452f5e3cadf	2018-04-13 21:51:42 -07:00
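A rough sketch of why a `Vec`-backed map works for this translation: if dirty entries of each type are numbered 0, 1, 2, ..., the dirty index can be used directly as a vector index, so no hashing is needed. The names `Kind` and `OffsetMap`, and the use of 0 as an "unset" marker, are assumptions for illustration and not the actual indexedlog types.

```rust
/// Hypothetical entry kinds; each kind has its own contiguous index space.
#[derive(Clone, Copy)]
pub enum Kind {
    Radix,
    Leaf,
    Link,
    Key,
}

/// Maps (kind, dirty index) to the on-disk offset assigned during flush.
/// Assumption: 0 is never a valid on-disk offset, so it marks "unset".
pub struct OffsetMap {
    tables: [Vec<u64>; 4],
}

impl OffsetMap {
    /// Pre-size one table per kind from the number of dirty entries.
    pub fn with_sizes(sizes: [usize; 4]) -> Self {
        OffsetMap {
            tables: [
                vec![0; sizes[0]],
                vec![0; sizes[1]],
                vec![0; sizes[2]],
                vec![0; sizes[3]],
            ],
        }
    }

    /// Record the on-disk offset for a dirty entry. Panics if out of range.
    pub fn set(&mut self, kind: Kind, index: usize, disk_offset: u64) {
        self.tables[kind as usize][index] = disk_offset;
    }

    /// Look up the on-disk offset, if one has been recorded.
    pub fn get(&self, kind: Kind, index: usize) -> Option<u64> {
        self.tables[kind as usize]
            .get(index)
            .copied()
            .filter(|&v| v != 0)
    }
}
```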
Jun Wu	00503a6d94	indexedlog: avoid a memory allocation Summary: It seems to improve the performance a bit: ``` index insertion time: [5.4643 ms 5.6818 ms 5.9188 ms] change: [-24.526% -17.384% -10.315%] (p = 0.00 < 0.05) Performance has improved. ``` Reviewed By: DurhamG Differential Revision: D7422831 fbshipit-source-id: fc1c72f402258db7e189cd8724583757d48affb7	2018-04-13 21:51:42 -07:00
Jun Wu	4cb2cc1abb	indexedlog: use Box<[u8]> instead of Vec<u8> Summary: For key entries, the key is immutable once stored. So just use `Box<[u8]>`. It saves a `usize` per entry. On 64-bit platform, that's a lot. Performance is slightly improved and it catches up with D7404532 before typed offset refactoring now: index insertion time: [6.1852 ms 6.6598 ms 7.2433 ms] index flush time: [15.814 ms 16.538 ms 17.235 ms] index lookup (memory) time: [3.7636 ms 3.9403 ms 4.1424 ms] index lookup (disk) time: [1.9413 ms 2.0366 ms 2.1325 ms] index clone (memory) time: [2.6952 ms 2.9221 ms 3.0968 ms] index clone (disk) time: [5.0296 us 5.2862 us 5.5629 us] Reviewed By: DurhamG Differential Revision: D7422837 fbshipit-source-id: 4aabfdc028aefb8e796803e103f0b2e4965f84e6	2018-04-13 21:51:42 -07:00
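The `usize` saved per entry comes from dropping the capacity field: a `Vec<u8>` carries pointer, capacity and length, while a `Box<[u8]>` is a fat pointer with only pointer and length. A small sketch to check this on the current platform (`key_sizes` and `freeze_key` are illustrative names, not indexedlog functions):

```rust
use std::mem::size_of;

/// Returns (size of Vec<u8>, size of Box<[u8]>) in bytes.
pub fn key_sizes() -> (usize, usize) {
    (size_of::<Vec<u8>>(), size_of::<Box<[u8]>>())
}

/// Convert a finished key buffer into its immutable, smaller form.
/// `into_boxed_slice` drops any excess capacity.
pub fn freeze_key(key: Vec<u8>) -> Box<[u8]> {
    key.into_boxed_slice()
}
```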
Jun Wu	36793b7c14	indexedlog: simplify `insert_advanced` API Summary: Previously, both `value` and `link` were optional in `insert_advanced`. This diff makes `value` required. `maybe_create_link_entry` becomes unused and is removed. No visible performance change. Reviewed By: DurhamG Differential Revision: D7422838 fbshipit-source-id: 8d7d3cc1cc325f6fea7e8ce996d0a43d3ee49839	2018-04-13 21:51:41 -07:00
Jun Wu	892fcd6dfd	indexedlog: use typed offsets Summary: This is a large refactoring that replaces `u64` offsets with strongly typed ones. Tests about serialization are removed since they generate illegal data that cannot pass type check. It seems to slow down the code a bit, compared with D7404532. But there is still room to improve. index insertion time: [6.9395 ms 7.3863 ms 7.7620 ms] index flush time: [15.949 ms 17.965 ms 20.246 ms] index lookup (memory) time: [3.6212 ms 3.8855 ms 4.1923 ms] index lookup (disk) time: [2.2496 ms 2.4649 ms 2.8090 ms] index clone (memory) time: [2.7292 ms 2.9399 ms 3.2055 ms] index clone (disk) time: [4.9239 us 5.5928 us 6.3167 us] Reviewed By: DurhamG Differential Revision: D7422833 fbshipit-source-id: 7357cb0f4f573f620e829c5e300cd423619dbd62	2018-04-13 21:51:41 -07:00
Jun Wu	283b8d130d	pathmatcher: initial Rust matcher that handles gitignore lazily Summary: The "pathmatcher" crate is intended to eventually cover more "matcher" abilities so all Python "matcher" related logic can be handled by Rust. For now, it only contains a gitignore matcher. The gitignore matcher is designed to work in a repo (no need to create multiple gitignore matchers for a repo from a higher layer), and be lazy i.e. be tree-aware, and do not parse ".gitignore" unless necessary. Worth mentioning that the gitignore logic provided by the "ignore" crate seems decent in time complexity - it uses regular expression, which uses state machines to achieve "testing against multiple patterns at once", instead of testing patterns one-by-one like what git currently does. Note: The "ignore" crate provides a nice "Walker" interface but that does not fit very well with the required laziness here. So the walker interface is not used. Reviewed By: markbt Differential Revision: D7319609 fbshipit-source-id: ebd131adf45a38f83acdf653f5e49d0624012152	2018-04-13 21:51:40 -07:00
Jun Wu	a87fea077c	indexedlog: prefix in-memory entries with `Mem` Summary: This makes it clear the code has different code paths for on-disk entries. Reviewed By: DurhamG Differential Revision: D7422836 fbshipit-source-id: 018fa0e2c20682d4e1beba99f3307550e1f40388	2018-04-13 21:51:40 -07:00
Jun Wu	3332522d43	indexedlog: add some benchmarks Summary: Add benchmarks inserting / looking up 20K entries. Benchmark results on my laptop are: index insertion time: [6.5339 ms 6.8174 ms 7.1805 ms] index flush time: [15.651 ms 16.103 ms 16.537 ms] index lookup (memory) time: [3.6995 ms 4.0252 ms 4.3046 ms] index lookup (disk) time: [1.9986 ms 2.1224 ms 2.2464 ms] index clone (memory) time: [2.5943 ms 2.6866 ms 2.7749 ms] index clone (disk) time: [5.2302 us 5.5477 us 5.9518 us] Comparing with highly optimized radixbuf: index insertion time: [991.89 us 1.1708 ms 1.3844 ms] index lookup time: [863.83 us 945.69 us 1.0304 ms] Insertion takes 6x time. Lookup from memory takes 1.4x time, from disk takes 2.2x time. Flushing is the slowest - it needs 16x radixbuf insertion time. Note: need to subtract "clone" time from "lookup" to get meaningful values about "lookup". This cannot be done automatically due to the limitation of the benchmark framework. Although it's slower than radixbuf, the index is still faster than gdbm and rocksdb. Note: the index does less than gdbm/rocksdb since it does not return a `[u8]`-ish which requires extra lookups. So it's not a very fair comparison. gdbm insertion time: [69.607 ms 75.102 ms 79.334 ms] gdbm lookup time: [9.0855 ms 9.8480 ms 10.637 ms] gdbm prepare time: [110.35 us 120.40 us 135.63 us] rocksdb insertion time: [117.96 ms 123.42 ms 127.85 ms] rocksdb lookup time: [24.413 ms 26.147 ms 28.153 ms] rocksdb prepare time: [3.8316 ms 4.1776 ms 4.5039 ms] Note: Subtract "prepare" from "insertion" to get meaningful values. 
Code to benchmark rocksdb and gdbm: ``` #[macro_use] extern crate criterion; extern crate gnudbm; extern crate rand; extern crate rocksdb; extern crate tempdir; use criterion::Criterion; use gnudbm::GdbmOpener; use rand::{ChaChaRng, Rng}; use rocksdb::DB; use tempdir::TempDir; const N: usize = 20480; /// Generate random buffer fn gen_buf(size: usize) -> Vec<u8> { let mut buf = vec![0u8; size]; ChaChaRng::new_unseeded().fill_bytes(buf.as_mut()); buf } fn criterion_benchmark(c: &mut Criterion) { c.bench_function("rocksdb prepare", |b| { b.iter(move || { let dir = TempDir::new("index").expect("TempDir::new"); let _db = DB::open_default(dir.path().join("a")).unwrap(); }); }); c.bench_function("rocksdb insertion", |b| { let buf = gen_buf(N * 20); b.iter(move || { let dir = TempDir::new("index").expect("TempDir::new"); let db = DB::open_default(dir.path().join("a")).unwrap(); for i in 0..N { db.put(&&buf[20 * i..20 * (i + 1)], b"v").unwrap(); } }); }); c.bench_function("rocksdb lookup", |b| { let dir = TempDir::new("index").expect("TempDir::new"); let db = DB::open_default(dir.path().join("a")).unwrap(); let buf = gen_buf(N * 20); for i in 0..N { db.put(&&buf[20 * i..20 * (i + 1)], b"v").unwrap(); } b.iter(move || { for i in 0..N { db.get(&&buf[20 * i..20 * (i + 1)]).unwrap(); } }); }); c.bench_function("gdbm prepare", |b| { let buf = gen_buf(N * 20); b.iter(move || { let dir = TempDir::new("index").expect("TempDir::new"); let _db = GdbmOpener::new().create(true).readwrite(dir.path().join("a")).unwrap(); }); }); c.bench_function("gdbm insertion", |b| { let buf = gen_buf(N * 20); b.iter(move || { let dir = TempDir::new("index").expect("TempDir::new"); let mut db = GdbmOpener::new().create(true).readwrite(dir.path().join("a")).unwrap(); for i in 0..N { db.store(&&buf[20 * i..20 * (i + 1)], b"v").unwrap(); } }); }); c.bench_function("gdbm lookup", |b| { let dir = TempDir::new("index").expect("TempDir::new"); let mut db = GdbmOpener::new().create(true).readwrite(dir.path().join("a")).unwrap(); let buf = gen_buf(N * 20); for i in 0..N { db.store(&&buf[20 * i..20 * (i + 1)], b"v").unwrap(); } b.iter(move || { for i in 0..N { db.fetch(&&buf[20 * i..20 * (i + 1)]).unwrap(); } }); }); } criterion_group!{ name=benches; config=Criterion::default().sample_size(20); targets=criterion_benchmark } criterion_main!(benches); ``` Reviewed By: DurhamG Differential Revision: D7404532 fbshipit-source-id: ff39f520b78ad1b71eb36970506b313bb2ff426b	2018-04-13 21:51:40 -07:00
Jun Wu	5576402ea9	indexedlog: add ability to clone a `Index` object Summary: This will be useful for benchmarks - prepare an index as a template, and clone it in the tests. Reviewed By: DurhamG Differential Revision: D7422835 fbshipit-source-id: 190bbdee7cb7c1526274b4d4dab07af4984b5df6	2018-04-13 21:51:40 -07:00
Jun Wu	2f30189748	indexedlog: reorder "use"s Summary: The latest rustfmt disagrees about the order of `std::io` imports. Move the troublesome line to a separate group so both the old and new rustfmt agree on the format. Reviewed By: DurhamG Differential Revision: D7422834 fbshipit-source-id: 9f5289ef2af1a691559fe691e121190f6d845162	2018-04-13 21:51:40 -07:00
Jun Wu	704eef1e4e	radixbuf: use criterion for benchmark Summary: The old `rustc-test` crate no longer works. There is an upstream bug report at https://github.com/servo/rustc-test/issues/7. This change makes it possible to compare radixbuf performance with the new index. Reviewed By: DurhamG Differential Revision: D7404531 fbshipit-source-id: 515e732a65388db4c865c7b139d0f57ead76f788	2018-04-13 21:51:40 -07:00
Jun Wu	9672c45582	indexedlog: add a test comparing with std HashMap Reviewed By: DurhamG Differential Revision: D7404529 fbshipit-source-id: a52da9aa9661b48eefc015ce351886677f842d66	2018-04-13 21:51:40 -07:00
Jun Wu	9077cbb5a7	indexedlog: reverse the writing order of radix entries Summary: Radix entries need to be written in the reverse of the order in which they were added to the vector. Reviewed By: DurhamG Differential Revision: D7404530 fbshipit-source-id: 403189b5c0fa6f21183e62eea04ce4ce7c4e1129	2018-04-13 21:51:40 -07:00
Jun Wu	2075ad87c2	indexedlog: implement leaf splitting Summary: Complete the insertion interface. Reviewed By: DurhamG Differential Revision: D7377210 fbshipit-source-id: 96645ac03a3fd65f22d9a9a54d8479715f49e67d	2018-04-13 21:51:39 -07:00
Jun Wu	a436d0554d	indexedlog: add more helper methods Summary: Those little read and write helpers are used in the next diff. Reviewed By: DurhamG Differential Revision: D7377214 fbshipit-source-id: c6e2d240334c11a0b08b15cd7d5c114b6f4d8ace	2018-04-13 21:51:39 -07:00
Jun Wu	61bf1f3854	indexedlog: add a helper function to get key content Summary: Add a helper function `peek_key_entry_content` that checks the key type and returns the key content. Reviewed By: DurhamG Differential Revision: D7377211 fbshipit-source-id: 0ce509aba30309373a709cf5fbcb909dd80471dc	2018-04-13 21:51:39 -07:00
Jun Wu	bf55572f78	indexedlog: partially implement insertion Summary: Implement insertion when there is no need to split a leaf entry. The API may be subject to change if we want other value types. For now, it's better to get something working and can be benchmarked so we have data about performance impact with new format changes. Reviewed By: DurhamG Differential Revision: D7343423 fbshipit-source-id: 9761f72168046dbafcb00883634aa7ad513a522b	2018-04-13 21:51:39 -07:00
Jun Wu	2389fd95c0	indexedlog: add helper methods about writing data Summary: Like the `peek_` family of helper methods. Those methods handle writing data for both dirty (in-memory) and non-dirty (on-disk) cases. They will be used in the next diff. Reviewed By: DurhamG Differential Revision: D7377208 fbshipit-source-id: f458a20da4bb7808f37daeed3077be2f7e90a9df	2018-04-13 21:51:39 -07:00
Jun Wu	cb58628046	indexedlog: add debug formatter Summary: Add code to print out Index's on-disk and in-memory entries in human-friendly form. This is useful for explaining its internal state, so it could be used in tests. Reviewed By: DurhamG Differential Revision: D7343427 fbshipit-source-id: 706a35404ea42c413657b389166729f8dd1315a3	2018-04-13 21:51:39 -07:00
Jun Wu	a3f7ec3f9b	indexedlog: fix root entry serialization Summary: Offset stored in it needs to be translated, as done in other types of entries. I forgot it. Reviewed By: DurhamG Differential Revision: D7404528 fbshipit-source-id: fb09a9c3052ddfe8f8016440290062084d5d8b03	2018-04-13 21:51:39 -07:00
Jun Wu	fcc71af3ab	indexedlog: add API to find link offset from a key Summary: This is a low-level API that follows the base16 sequence of a key, and returns a potentially matched `LinkOffset`. Reviewed By: DurhamG Differential Revision: D7343424 fbshipit-source-id: 38f260064d1a23695a28dda6f7dc921f88c7fccc	2018-04-13 21:51:39 -07:00
Jun Wu	871ca6c96b	indexedlog: add helper methods to read data Summary: Add a bunch of helper methods to "peek" data inside all kinds of entries. They will be used in the next diff. The benefit of those helper methods is they handle both dirty offsets and non-dirty offsets transparently. Previously I tried to always parse on-disk entries into in-memory ones and store them in a hashmap cache. But that turned out to have too much overhead, so always reading from disk is more desirable. It seems to provide at least 2x perf improvement from my previous quick test. Reviewed By: DurhamG Differential Revision: D7377207 fbshipit-source-id: 1b393f1fe64c1d54b986ba7c3b03c790adb694d4	2018-04-13 21:51:39 -07:00
Jun Wu	983d6920f5	indexedlog: add a non-dirty helper method Summary: The `non_dirty` helper method enforces the offset to be a non-dirty one. It will be used frequently for checking offsets read from the disk, since the on-disk offsets shouldn't have any reference to dirty (in-memory) entries. Reviewed By: DurhamG Differential Revision: D7377209 fbshipit-source-id: c6c381c065d3ba8aaa65698224e4778b86edbc4a	2018-04-13 21:51:39 -07:00
Jun Wu	f0b5cd6eae	indexedlog: add simple `DirtyOffset` abstraction Summary: The `DirtyOffset` enum converts between array indexes and u64. Reviewed By: DurhamG Differential Revision: D7377215 fbshipit-source-id: 29d4f7d74f15523034c11abcc09329a1b21142b1	2018-04-13 21:51:39 -07:00
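One way such a conversion between array indexes and `u64` could look is sketched below. The bit layout (top bit marks a dirty offset, low two bits carry the entry type) and the variant names are assumptions for illustration; the commit message does not describe the real encoding.

```rust
/// Assumed marker: offsets with the top bit set refer to in-memory entries.
const DIRTY_BIT: u64 = 1 << 63;

/// Hypothetical dirty offset: an index into a per-type in-memory vector.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DirtyOffset {
    Radix(usize),
    Leaf(usize),
    Link(usize),
    Key(usize),
}

/// True if `raw` refers to an in-memory (not yet flushed) entry.
pub fn is_dirty(raw: u64) -> bool {
    raw & DIRTY_BIT != 0
}

impl DirtyOffset {
    /// Pack the type tag and the array index into a single u64.
    /// Assumes the index fits in 61 bits.
    pub fn to_u64(self) -> u64 {
        let (tag, index) = match self {
            DirtyOffset::Radix(i) => (0u64, i),
            DirtyOffset::Leaf(i) => (1, i),
            DirtyOffset::Link(i) => (2, i),
            DirtyOffset::Key(i) => (3, i),
        };
        DIRTY_BIT | ((index as u64) << 2) | tag
    }

    /// Unpack; returns None for non-dirty (on-disk) offsets.
    pub fn from_u64(raw: u64) -> Option<Self> {
        if !is_dirty(raw) {
            return None;
        }
        let index = ((raw & !DIRTY_BIT) >> 2) as usize;
        Some(match raw & 3 {
            0 => DirtyOffset::Radix(index),
            1 => DirtyOffset::Leaf(index),
            2 => DirtyOffset::Link(index),
            _ => DirtyOffset::Key(index),
        })
    }
}
```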
Jun Wu	3859d00394	indexedlog: implement flush for the main index Summary: The flush method will write buffered data to disk. A mistake in Root entry serialization is fixed - it needs to translate dirty offsets to non-dirty ones. Reviewed By: DurhamG Differential Revision: D7223729 fbshipit-source-id: baeaab27627d6cfb7c5798d3a39be4d2b8811e5f	2018-04-13 21:51:35 -07:00
Jun Wu	8f5c35c8d2	indexedlog: initial main index structure Summary: Add the main `Index` structure and its constructor. The structure focuses on the index logic itself. It does not have the checksum part yet. Some notes about choices made: - The use of mmap: mmap is good for random I/O, and has the benefit of sharing buffers between processes reading the same file. We may be able to do good user-space caching for the random I/O part. But it's harder to share the buffers between processes. - The "read_only" auto decision. The common "open" pattern requires the caller to pass whether they want to read or write. The index makes the decision for the caller for convenience (ex. running "hg log" on somebody else's repo). - The "load root entry from the end of the file" feature. It's just for convenience for users wanting to use the Index in a standalone way. We probably Reviewed By: DurhamG Differential Revision: D7208358 fbshipit-source-id: 14b74d7e32ef28bd5bc3483fd560c489d36bf8e5	2018-04-13 21:51:35 -07:00
Jun Wu	545f670504	pathencoding: utility for converting between bytes and paths Summary: A simple utility that does paths <-> local bytes conversion. It's needed since Mercurial stores paths using local encoding in manifests. For POSIX, the code is zero-cost - no real conversion or error can happen. This is in theory cheaper than what treedirstate does. For Windows, the "local_encoding" crate is selected as Yuya suggested the `MultiByteToWideChar` Win32 API [1] and "local_encoding" uses it. It does the right thing given my experiment with GBK (Chinese, simplified) encoding. ``` .... C:\Users\quark\enc>hg debugshell --config extensions.debugshell= >>> repo[0].manifest().text() '\xc4\xbf\xc2\xbc1/\xce\xc4\xbc\xfe1\x00b80de5d138758541c5f05265ad144ab9fa86d1db\n' >>> repo[0].files() ['\xc4\xbf\xc2\xbc1/\xce\xc4\xbc\xfe1'] extern crate local_encoding; use std::path::PathBuf; use local_encoding::{Encoder, Encoding}; const mpath: &[u8] = b"\xc4\xbf\xc2\xbc1/\xce\xc4\xbc\xfe1"; fn main() { let p = PathBuf::from(Encoding::OEM.to_string(mpath).unwrap()); println!("exists: {}", p.exists()); println!("mpath len: {}, osstr len: {}", mpath.len(), p.as_path().as_os_str().len()); } exists: true mpath len: 11, osstr len: 15 ``` In the future, we might normalize the paths to UTF-8 before storing them in manifest to avoid issues. Differential Revision: D7319604 fbshipit-source-id: a7ed5284be116c4176598b4c742e8228abcc3b02	2018-04-13 21:51:35 -07:00
Jun Wu	78f4faea65	xdiff: add a preprocessing step that trims files Summary: xdiff has a `xdl_trim_ends` step that removes common lines, unmatchable lines. That is in theory good, but happens too late - after splitting, hashing, and adjusting the hash values so they are unique. Those splitting, hashing and adjusting hash values steps could have noticeable overhead. For not uncommon cases like diffing two large files with minor differences, the raw performance of those preparation steps seriously matter. Even allocating an O(N) array and storing line offsets to it is expensive. Therefore my previous attempts [1] [2] cannot be good enough since they do not remove the O(N) array assignment. This patch adds a preprocessing step - `xdl_trim_files` that runs before other preprocessing steps. It counts common prefix and suffix and lines in them (needed for displaying line number), without doing anything else. Testing with a crafted large (169MB) file, with minor change: ``` open('a','w').write(''.join('%s\n' % (i % 100000) for i in xrange(30000000) if i != 6000000)) open('b','w').write(''.join('%s\n' % (i % 100000) for i in xrange(30000000) if i != 6003000)) ``` Running xdiff by a simple binary [3], this patch improves the xdiff perf by more than 10x for the above case: ``` # xdiff before this patch 2.41s user 1.13s system 98% cpu 3.592 total # xdiff after this patch 0.14s user 0.16s system 98% cpu 0.309 total # gnu diffutils 0.12s user 0.15s system 98% cpu 0.272 total # (best of 20 runs) ``` It's still slightly slower than GNU diffutils. But it's pretty close now. Testing with real repo data: For the whole repo, this patch makes xdiff 25% faster: ``` # hg perfbdiff --count 100 --alldata -c d334afc585e2 --blocks [--xdiff] # xdiff, after ! wall 0.058861 comb 0.050000 user 0.050000 sys 0.000000 (best of 100) # xdiff, before ! wall 0.077816 comb 0.080000 user 0.080000 sys 0.000000 (best of 91) # bdiff ! 
wall 0.117473 comb 0.120000 user 0.120000 sys 0.000000 (best of 67) ``` For files that are long (ex. commands.py), the speedup is more than 3x, very significant: ``` # hg perfbdiff --count 3000 --blocks commands.py.i 1 [--xdiff] # xdiff, after ! wall 0.690583 comb 0.690000 user 0.690000 sys 0.000000 (best of 12) # xdiff, before ! wall 2.240361 comb 2.210000 user 2.210000 sys 0.000000 (best of 4) # bdiff ! wall 2.469852 comb 2.440000 user 2.440000 sys 0.000000 (best of 4) ``` The improvement is also seen for the `json` test case mentioned in D7124455. xdiff's time improves from 0.3s to 0.04s, similar to GNU diffutils. This patch is also sent as https://phab.mercurial-scm.org/D2686. [1]: https://phab.mercurial-scm.org/D2631 [2]: https://phab.mercurial-scm.org/D2634 [3]: ``` // Code to run xdiff from command line. No proper error handling. mmfile_t readfile(const char *path) { struct stat st; int fd = open(path, O_RDONLY); fstat(fd, &st); mmfile_t f = { malloc(st.st_size), st.st_size }; ensure(read(fd, f.ptr, st.st_size) == st.st_size); close(fd); return f; } static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf) { int i; for (i = 0; i < nbuf; i++) { write(STDOUT_FILENO, mb[i].ptr, mb[i].size); } return 0; } int main(int argc, char const *argv[]) { mmfile_t a = readfile(argv[1]), b = readfile(argv[2]); xpparam_t xpp = { XDF_INDENT_HEURISTIC, 0 }; xdemitconf_t xecfg = { 3, 0 }; xdemitcb_t ecb = { 0, &xdiff_outf }; xdl_diff(&a, &b, &xpp, &xecfg, &ecb); return 0; } ``` Reviewed By: ryanmce Differential Revision: D7151582 fbshipit-source-id: 3f2dd43b74da118bd827af4fc5e1bf65be191ad2	2018-04-13 21:51:25 -07:00
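The trimming idea from the xdiff commit (strip the common prefix and suffix before any splitting or hashing, while counting the lines removed so line numbers can still be reported) can be sketched as below. This is a simplified illustration in Rust, not a port of `xdl_trim_files`; `Trimmed` and `trim_common` are hypothetical names, and the real function may reserve extra context lines that this sketch ignores.

```rust
/// Result of trimming: the remaining middle parts plus line counts.
pub struct Trimmed<'a> {
    pub a: &'a [u8],
    pub b: &'a [u8],
    pub prefix_lines: usize,
    pub suffix_lines: usize,
}

/// Remove whole common lines from both ends of `a` and `b`.
pub fn trim_common<'a>(a: &'a [u8], b: &'a [u8]) -> Trimmed<'a> {
    // Common prefix in bytes, cut back to the last line boundary.
    let max = a.len().min(b.len());
    let mut p = 0;
    while p < max && a[p] == b[p] {
        p += 1;
    }
    while p > 0 && a[p - 1] != b'\n' {
        p -= 1;
    }
    let prefix_lines = a[..p].iter().filter(|&&c| c == b'\n').count();
    let (a, b) = (&a[p..], &b[p..]);

    // Common suffix in bytes, shrunk until it starts at a line start
    // in both inputs.
    let max = a.len().min(b.len());
    let mut s = 0;
    while s < max && a[a.len() - 1 - s] == b[b.len() - 1 - s] {
        s += 1;
    }
    let line_start = |x: &[u8], s: usize| x.len() == s || x[x.len() - s - 1] == b'\n';
    while s > 0 && !(line_start(a, s) && line_start(b, s)) {
        s -= 1;
    }
    let suffix_lines = a[a.len() - s..].iter().filter(|&&c| c == b'\n').count();

    Trimmed {
        a: &a[..a.len() - s],
        b: &b[..b.len() - s],
        prefix_lines,
        suffix_lines,
    }
}
```

Only the middle parts would then go through the expensive per-line preparation; `prefix_lines` is what a caller would add back when printing line numbers.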
Jun Wu	865700883d	indexedlog: move mmap_readonly to utils Summary: `mmap_readonly` will be reused in `index.rs` so let's move it to a shared utils module. Reviewed By: DurhamG Differential Revision: D7208359 fbshipit-source-id: d98779e4e21765ce0e185281c9560245b59b174c	2018-04-13 21:51:25 -07:00
Jun Wu	d3b0f0cdfb	indexedlog: add RAII file lock Summary: Add ScopedFileLock. This is similar to Python's contextmanager. It's easier to use than the fs2 raw API, since it guarantees the file is unlocked. Reviewed By: jsgf Differential Revision: D7203684 fbshipit-source-id: 5d7beed99ff992466ab7bf1fbea0353de4dfe4f9	2018-04-13 21:51:25 -07:00
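The RAII pattern behind a scoped lock can be sketched without any file system access: acquire in the constructor, release in `Drop`, so the lock is released on every exit path including early returns and panics. The `Lockable` trait and `FakeLock` type below are stand-ins invented for this example; the real `ScopedFileLock` wraps the fs2 file-locking API, whose calls are not reproduced here.

```rust
use std::cell::Cell;

/// Minimal locking interface; a stand-in for a real file-lock API.
pub trait Lockable {
    fn lock(&self);
    fn unlock(&self);
}

/// Guard that holds the lock for its whole lifetime.
pub struct ScopedLock<'a, L: Lockable> {
    target: &'a L,
}

impl<'a, L: Lockable> ScopedLock<'a, L> {
    /// Acquire the lock; it is released when the guard is dropped.
    pub fn new(target: &'a L) -> Self {
        target.lock();
        ScopedLock { target }
    }
}

impl<'a, L: Lockable> Drop for ScopedLock<'a, L> {
    fn drop(&mut self) {
        self.target.unlock();
    }
}

/// Fake lock that only records whether it is held, for demonstration.
pub struct FakeLock {
    held: Cell<bool>,
}

impl FakeLock {
    pub fn new() -> Self {
        FakeLock { held: Cell::new(false) }
    }

    pub fn is_held(&self) -> bool {
        self.held.get()
    }
}

impl Lockable for FakeLock {
    fn lock(&self) {
        assert!(!self.held.get(), "already locked");
        self.held.set(true);
    }

    fn unlock(&self) {
        self.held.set(false);
    }
}
```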
Jun Wu	605cd36716	indexedlog: add serialization for root entry Reviewed By: DurhamG Differential Revision: D7191653 fbshipit-source-id: 4c82a6b2a00d8e4cb3c67ecb382659ff8946bdad	2018-04-13 21:51:25 -07:00
Jun Wu	0f9d39cae8	indexedlog: add serialization for key entry Reviewed By: DurhamG Differential Revision: D7191651 fbshipit-source-id: 8eb8cbc00f0b15660e6d9e988ae41b761d854fa2	2018-04-13 21:51:25 -07:00
Jun Wu	ba05e88179	indexedlog: add serialization for leaf and link entry Summary: They are simpler than radix entry and similar. Reviewed By: DurhamG Differential Revision: D7191652 fbshipit-source-id: b516663567267a2e354748396b44c2ac8ebb691f	2018-04-13 21:51:25 -07:00
Jun Wu	dab5948078	indexedlog: add serialization for radix entry Summary: Start serialization implementation. First, add support for the radix entry. Reviewed By: DurhamG Differential Revision: D7191365 fbshipit-source-id: 54a5ba5c666ba4def1e80eaa2ff7d4d77ff53f8c	2018-04-13 21:51:25 -07:00
Jun Wu	599194b15d	indexedlog: define basic structures Summary: These are Rust structures that map to the file format. Reviewed By: DurhamG Differential Revision: D7191366 fbshipit-source-id: 23a4431383be9713e955b74306cd68108eb80536	2018-04-13 21:51:25 -07:00
Jun Wu	6542d0ebf4	indexedlog: add comment about index file format Summary: Document the format. Actual implementation in later diffs. Reviewed By: DurhamG Differential Revision: D7190575 fbshipit-source-id: 243992fd052ca7a9688d54d20694e65daebb9660	2018-04-13 21:51:25 -07:00
Jun Wu	015a4ac5d6	indexedlog: port base16 iterator from radixbuf Summary: The append-only index is too different so it's cleaner to cherry-pick code from radixbuf, instead of modifying radixbuf which would break code depending on it. Started by picking the base16 iterator part. `rustc-test` does not work with buck, and seems to be in an unmaintained state, so benchmark tests are migrated to criterion. Reviewed By: DurhamG Differential Revision: D7189143 fbshipit-source-id: 459a79b4cf16f35d2ff86f11a5980ba1fc627951	2018-04-13 21:51:25 -07:00
Jun Wu	d2c457a6e2	indexedlog: integrity check utility on an append-only file Summary: Filesystem is hard. Append-only sounds like a safe way to write files, but it only really helps with process crashes. If the OS crashes, it's possible that other parts of the file get corrupted. For source control, data integrity checking is important, so bytes not logically touched by appending also need to be checked. Implement a `ChecksumTable` which adds integrity check ability to append-only files. It's intended to be used by future append-only indexes. Reviewed By: DurhamG Differential Revision: D7108433 fbshipit-source-id: 16daf6b8d04bba464f1ee9221716beba69c1d47b	2018-04-13 21:51:24 -07:00
Jun Wu	0518016553	indexedlog: initial boilerplate Summary: First step of a storage-related building block that is in Rust. The goal is to use it to replace revlog, obsstore and packfiles. Extern crates that are likely useful are added to reduce future churns. Reviewed By: DurhamG Differential Revision: D7108434 fbshipit-source-id: 97ebd9ba69547d876dcecc05e604acdf9088877e	2018-04-13 21:51:24 -07:00
Kostia Balytskyi	0ef59877cd	hg: some portability fixes to py-cdatapack.h Summary: 1. Variable Length Arrays are not supported by MSVC, but since this is C++ code, we can just use heap allocation 2. Replacing `inet` with portability version Depends on D7196403 Reviewed By: quark-zju Differential Revision: D7196605 fbshipit-source-id: a0d88b6e06f255ef648c0b35a99b42ba3bee538a	2018-04-13 21:51:24 -07:00
Ryan Prince	573a8eb9cc	fixing xdiff build on windows Summary: fixing xdiff build on windows Reviewed By: quark-zju Differential Revision: D7189839 fbshipit-source-id: ef05219d911af44f3546bc51fb74539d06b443b5	2018-04-13 21:51:23 -07:00
Jun Wu	81e68a9a57	xdiff: decrease indent heuristic overhead Summary: Add a "boring" threshold to limit the search range of the indention heuristic, so the performance of the diff algorithm is mostly unaffected by turning on indention heuristic. Reviewed By: ryanmce Differential Revision: D7145002 fbshipit-source-id: 024ec685f96aa617fb7da141f38fa4e12c4c0fc9	2018-04-13 21:51:21 -07:00
Jun Wu	511ec41260	xdiff: add a bdiff hunk mode Summary: xdiff generated hunks for the differences (ex. questionmarks in the `@@ -?,? +?,? @@` part from `diff --git` output). However, bdiff generates matched hunks instead. This patch adds a `XDL_EMIT_BDIFFHUNK` flag used by the output function `xdl_call_hunk_func`. Once set, xdiff will generate bdiff-like hunks instead. That makes it easier to use xdiff as a drop-in replacement of bdiff. Note that since `bdiff('', '')` returns `[(0, 0, 0, 0)]`, the shortcut path `if (xscr)` is removed. I have checked functions called with `xscr` argument (`xdl_mark_ignorable`, `xdl_call_hunk_func`, `xdl_emit_diff`, `xdl_free_script`) work just fine with `xscr = NULL`. Reviewed By: ryanmce Differential Revision: D7135207 fbshipit-source-id: cfb8c363e586841c06c94af283c7f014ba65fcc0	2018-04-13 21:51:21 -07:00
Jun Wu	56a738fce4	xdiff: remove patience and histogram diff algorithms Summary: Patience diff is the normal diff algorithm, plus some greediness that unconditionally matches common unique lines. That means it is easy to construct cases to let it generate suboptimal result, like: ``` open('a', 'w').write('\n'.join(list('a' + 'x' * 300 + 'u' + 'x' * 700 + 'a\n'))) open('b', 'w').write('\n'.join(list('b' + 'x' * 700 + 'u' + 'x' * 300 + 'b\n'))) ``` Patience diff has been advertised as being able to generate better results for some C code changes. However, the more scientific way to do that is the indention heuristic [1]. Since patience diff could generate suboptimal result more easily and its "better" diff feature could be replaced by the new indention heuristic, let's just remove it and its variant histogram diff to simplify the code. [1]: `433860f3d0` Reviewed By: ryanmce Differential Revision: D7124711 fbshipit-source-id: 127e8de6c75d0262687a1b60814813e660aae3da	2018-04-13 21:51:20 -07:00
Jun Wu	65d9160c6f	xdiff: vendor xdiff library from git Summary: Vendor git's xdiff library from git commit d7c6c2369ac21141b7c6cceaebc6414ec3da14ad using GPL2+ license. There is another recent user report that hg diff generates suboptimal result. It seems the fix to issue4074 isn't good enough. I crafted some other interesting cases, and hg diff barely has any advantage compared with gnu diffutils or git diff. \| testcase \| gnu diffutils \| hg diff \| git diff \| \| \| lines time \| lines time \| lines time \| \| patience \| 6 0.00 \| 602 0.08 \| 6 0.00 \| \| random \| 91772 0.90 \| 109462 0.70 \| 91772 0.24 \| \| json \| 2 0.03 \| 1264814 1.81 \| 2 0.29 \| "lines" means the size of the output, i.e. the count of "+/-" lines. "time" means seconds needed to do the calculation. Both are the smaller the better. "hg diff" counts Python startup overhead. Git and GNU diffutils generate optimal results. For the "json" case, git can have an optimization that does a scan for common prefix and suffix first, and match them if the length is greater than half of the text. See https://neil.fraser.name/news/2006/03/12/. That would make git the fastest for all above cases. About testcases: patience: Aiming for the weakness of the greedy "patience diff" algorithm. Using git's patience diff option would also get suboptimal result. Generated using the Python script: ``` open('a', 'w').write('\n'.join(list('a' + 'x' * 300 + 'u' + 'x' * 700 + 'a\n'))) open('b', 'w').write('\n'.join(list('b' + 'x' * 700 + 'u' + 'x' * 300 + 'b\n'))) ``` random: Generated using the script in `test-issue4074.t`. It practically makes the algorithm suffer. Impressively, git wins in both performance and diff quality. json: The recent user reported case. It's a single line movement near the end of a very large (800K lines) JSON file. Reviewed By: ryanmce Differential Revision: D7124455 fbshipit-source-id: 832651115da770f9d2ed5fdff2e200453c0013f8	2018-04-13 21:51:20 -07:00
Jun Wu	c114d2499b	vlqencoding: add read_vlq_at API that works for AsRef<[u8]> Summary: This allows us to decode VLQ integers at a given offset, for anything that implements `AsRef<[u8]>`. Instead of having to couple with a `&mut Read` interface. The main benefit is to get rid of `mut`. The old `VLQDecode` interface has to use `&mut Read` since reading has a side effect of changing the internal position counter. Reviewed By: markbt Differential Revision: D7093998 fbshipit-source-id: 20cb14e38c828462c34f32245d0f0f512028b647	2018-04-13 21:51:19 -07:00
Jun Wu	e266793816	vlqencoding: add a benchmark Summary: I'm going to add more ways to do VLQ parsing (ex. reading from a `&[u8]` instead of a `Read` which has to be mutable). So let's add a benchmark to compare the `&[u8]` version with the `Read` version. Reviewed By: DurhamG Differential Revision: D7092960 fbshipit-source-id: e1189de10396516c732dc73b45b7690a1718f1c0	2018-04-13 21:51:19 -07:00
Jun Wu	f547ef9ed0	rust: vendor more crates Summary: criterion provides useful utilities for writing benchmarks. fs2 provides cross-platform file locking. memmap provides cross-platform mmap. atomicwrites provides cross-platform atomic file rewrite. twox-hash provides xxHash fast hash algorithm for integrity check usecase. Reviewed By: singhsrb Differential Revision: D7092764 fbshipit-source-id: a3a2a31c198e73701708d7124574ba447ab99c45	2018-04-13 21:51:19 -07:00
Jun Wu	c1bebda5d6	radixbuf: avoid using unstable features in buck build Summary: `test::Bencher` is an unstable feature, which is enabled by 3rd-party crate `rustc-test`. However, `rustc-test` does not work with buck build. So let's workaround that by allowing all usage of `test::Bencher` to be disabled by a feature. And turn on that feature in buck build. Cargo build will remain unchanged. Reviewed By: singhsrb Differential Revision: D7011703 fbshipit-source-id: e08ba9516bf7fadb6edb52ab107e0172df0aaf5b	2018-04-13 21:51:12 -07:00
Kostia Balytskyi	62ecc73818	hg: make sure platform_madvise_away returns -1 on Windows Summary: On the other two platforms we return the result of `madvise`, so let's return -1, as this is the error return value of `madvise` on POSIX. Reviewed By: quark-zju Differential Revision: D6979093 fbshipit-source-id: 7c715eb459aaad6c21fae6e346e8650211649182	2018-04-13 21:51:11 -07:00
Kostia Balytskyi	c85791785b	hg: build cdatapack on Windows Summary: Seems to be working now. Reviewed By: quark-zju Differential Revision: D6970927 fbshipit-source-id: e67753d811819015282f47fcbdfbb263d85f054f	2018-04-13 21:51:10 -07:00
Kostia Balytskyi	5d1139f87d	hg: move defines out of struct definition in cdatapack.c Summary: The current location of these defines is really odd and does not work with the current version of `PACKEDSTRUCT` macro expansion (it expands everything in the same line, therefore `#defines` are inline), which fails to compile. Reviewed By: quark-zju Differential Revision: D6970926 fbshipit-source-id: ed01042760fa729004e159b492cf67a4afd25923	2018-04-13 21:51:10 -07:00
Kostia Balytskyi	7d4f6a9033	hg: start using imported mman-win32 in the portability headers Summary: Let's create a new portability header, which can be used on both Windows and Posix. Reviewed By: quark-zju Differential Revision: D6970928 fbshipit-source-id: a3970c50260f52bfc0a9420a4ff11d93ace304b0	2018-04-13 21:51:10 -07:00
Kostia Balytskyi	67b2e1496a	hg: vendor a third-party implementation of mman library for Windows Summary: This is needed to make our C code compile on Windows. Reviewed By: quark-zju Differential Revision: D6970929 fbshipit-source-id: 2cfe46e0718fe75916912d0e59c5400038e03a12	2018-04-13 21:51:10 -07:00
Jun Wu	d942f5a88e	hg: basic support for building hg using buck Summary: Adds some basic building blocks to build hg using buck. Header files are cleaned up, so they are relative to the project root. Some minor changes to C code are made to remove clang build warnings. Rust dependencies, fb-hgext C/Python dependencies (ex. cstore, mysql-connector), and 3rd-party dependencies like python-lz4 are not built yet. But the built hg binary should be able to run most tests just fine. Reviewed By: wez Differential Revision: D6814686 fbshipit-source-id: 59eefd5a3ad86db2ad1c821ed824c9f1878c93e4	2018-04-13 21:50:58 -07:00
Phil Cohen	c097dde0b9	READMEs: tweaks based on feedback Summary: Based on feedback to D6687860. Test Plan: n/a Reviewers: durham, #mercurial Reviewed By: durham Differential Revision: https://phabricator.intern.facebook.com/D6714211 Signature: 6714211:1515788399:386b8f7330f343349234d1f317e5ac0a594142cf	2018-01-12 12:35:52 -08:00
Phil Cohen	bf8527e7a9	lib: add READMEs to lib, extlib, cext	2018-01-09 15:20:46 -08:00
Saurabh Singh	9da30944be	cfastmanifest: move to hgext/extlib/ Summary: Moves ctreemanifest into hgext/extlib/. D6679698 was committed to scratch branch by mistake. Test Plan: make local && cd tests && ./run-tests.py Reviewers: durham, #mercurial, #sourcecontrol Reviewed By: durham Differential Revision: https://phabricator.intern.facebook.com/D6684623 Signature: 6684623:1515522634:9bec363d00990d9ff7d5f655e30ab8cae636155c	2018-01-09 10:36:54 -08:00
Durham Goode	228e6a901e	cstore: move to hgext/extlib/ Summary: Moves cstore to hgext/extlib/ and makes it build. Test Plan: make local && run-tests.py Reviewers: #mercurial Differential Revision: https://phabricator.intern.facebook.com/D6678852	2018-01-08 17:55:53 -08:00
Durham Goode	eb099b7fe1	cdatapack: move to lib/ Summary: This moves the cdatapack code to the new lib/ directory and adds it to the main setup.py. Test Plan: hg purge --all && make local && cd tests && ./run-tests.py -S -j 48 Reviewers: #mercurial Differential Revision: https://phabricator.intern.facebook.com/D6677491	2018-01-08 17:55:53 -08:00
Jun Wu	1a84c9d5db	linelog: format the code using clang-format Summary: I didn't notice the test failure because clang-format was not installed. Might be a good idea to make it a hard error. Test Plan: Run test-check-clang-format.t Reviewers: phillco, #mercurial Reviewed By: phillco Subscribers: mathieubaudet Differential Revision: https://phabricator.intern.facebook.com/D6679576 Signature: 6679576:1515457526:6b1935858da284b896244b0d99e2fef03ead97b8	2018-01-08 16:22:30 -08:00
Jun Wu	1802036ff3	linelog: move to lib/ and mercurial/cyext Summary: The `lib/linelog` directory contains pure C code that is unrelated to either Mercurial or Python. The `mercurial/cyext` contains Cython extension code (although for linelog's case, the Cython extension is unrelated to Mercurial). Cython is now a hard dependency to simplify the code. Test Plan: `make local` and check `from mercurial.cyext import linelog` works. Reviewers: durham, #mercurial Reviewed By: durham Subscribers: durham, fried Differential Revision: https://phabricator.intern.facebook.com/D6678541 Signature: 6678541:1515455512:967266dc69c702dbff95fdea05671e11c32ebf28	2018-01-08 14:35:01 -08:00
Mark Thomas	2e81565606	fb-hgext: integrate rust libraries and extensions with setup.py Summary: Move the rust libraries and extensions to their new locations, and integrate them with the hg-crew setup.py. Test Plan: Run `python setup.py build` and verify rust extensions are built. Reviewers: durham, #mercurial Reviewed By: durham Subscribers: fried, jsgf, mitrandir Differential Revision: https://phabricator.intern.facebook.com/D6677251 Tasks: T24908724 Signature: 6677251:1515450235:920faf40babbce9b09e3283ff9ca328d1c5c51e6	2018-01-08 15:26:24 -08:00
Durham Goode	0938fe19a3	clib: move fb-hgext/clib/ to lib Summary: cdatapack depends on clib, so let's move it to lib/ outside of fb-hgext. None of the consumers of these files were changed. They will be changed as they are moved into the main part of the repo. Test Plan: hg purge --all && make local && cd tests && ./run-tests.py -S -j 48 Reviewers: mitrandir, #mercurial Reviewed By: mitrandir Differential Revision: https://phabricator.intern.facebook.com/D6677197 Signature: 6677197:1515447873:399fb3e7beb5cc1ad8db18f42b359ffbfbeb21f2	2018-01-08 15:08:18 -08:00
Durham Goode	1ab0bb112d	sha1: add sha1detectcoll library to setup.py Summary: cdatapack depends on sha1detectcoll, so let's add the library to setup.py before we add cdatapack. Test Plan: hg purge --all && make local && cd tests/ && ./run-tests.py -S -j 48 Verified sha1dc was in the build output and the tests passed. Reviewers: quark, #mercurial Reviewed By: quark Differential Revision: https://phabricator.intern.facebook.com/D6676405 Signature: 6676405:1515444508:2da65c6c3a18267a1d3c151c8e9acf60b674ffc2	2018-01-08 12:54:57 -08:00
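
The `vlqencoding` entry above (c114d2499b) describes decoding a VLQ integer at a given offset from anything that implements `AsRef<[u8]>`, without going through a `&mut Read`. A minimal sketch of that idea follows. It is not the crate's actual code: the free-function signature, the `Option` return type, and the little-endian base-128 byte layout are all assumptions made for illustration.

```rust
/// Hypothetical sketch: decode an unsigned VLQ integer starting at `offset`.
///
/// Assumed layout: each byte carries 7 payload bits, least significant
/// group first, and the high bit (0x80) marks "more bytes follow".
///
/// Returns `(value, bytes_consumed)`, or `None` if the input is truncated
/// or the value does not fit in a `u64`.
pub fn read_vlq_at<B: AsRef<[u8]>>(buf: B, offset: usize) -> Option<(u64, usize)> {
    let bytes = buf.as_ref();
    let mut value: u64 = 0;
    let mut shift: u32 = 0;
    let mut pos = offset;
    loop {
        // `get` returns None past the end, so truncated input is not a panic.
        let byte = *bytes.get(pos)?;
        pos += 1;
        let low = (byte & 0x7f) as u64;
        // Reject groups that would overflow 64 bits.
        if shift >= 64 || (shift == 63 && low > 1) {
            return None;
        }
        value |= low << shift;
        if byte & 0x80 == 0 {
            // No continuation bit: this was the last byte.
            return Some((value, pos - offset));
        }
        shift += 7;
    }
}
```

Because the buffer is only borrowed immutably and the position is passed in and returned rather than stored, callers can decode from a shared `&[u8]`, a `Vec<u8>`, or a memory-mapped region without any `mut` binding, which is the benefit the commit message names.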


572 Commits