1269 Commits

Author SHA1 Message Date
Jun Wu
70b566d008 indexedlog: mark NotFound during mmap as data corruption
Summary:
Right now, not being able to find the mmap file can be seen as data corruption.
The only case that NotFound needs special handling is at open time.

This fixes some cases covered by an upcoming test about `repair`.

Reviewed By: xavierd

Differential Revision: D17741999

fbshipit-source-id: 1bd7c65c5a6381892723b31e2e749b22081e96d2
2019-10-03 19:57:32 -07:00
Jun Wu
6735395ee0 indexedlog: remove failure
Summary: `indexedlog` no longer depends on `failure`.

Reviewed By: xavierd

Differential Revision: D17732135

fbshipit-source-id: 79526dcfa0b5e5a11baca1395573c2aea9c9cc12
2019-10-03 19:57:32 -07:00
Jun Wu
d12269cbcc indexedlog: add context for public RotateLog APIs
Summary:
Similar to the previous change, add context for RotateLog APIs.

This shows error context that might replace backtrace. For example, run:

  cargo test --test low_fileno_limit -- --nocapture

An example (complicated) error:

  "/tmp/.tmp7kTUWt/rotatelog/1": cannot create tempfile
  in log::OpenOptions::open("/tmp/.tmp7kTUWt/rotatelog/1")
    OpenOptions = OpenOptions { index_defs: ["key1"], fsync: false, create: true, checksum_type: Auto, flush_filter: None }
  cannot create new empty log after failing to read existing logs
  in rotate::OpenOptions::open("/tmp/.tmp7kTUWt/rotatelog")
    OpenOptions = OpenOptions { max_bytes_per_log: 50, max_log_count: 20, recovery_policy: ROTATE_ON_CORRUPTED_LATEST_LOG, log_open_options: OpenOptions { index_defs: ["key1"], fsync: false, create: true, checksum_type: Auto, flush_filter: None } }
  Caused by 2 errors:
  - Custom { kind: Other, error: PathError { path: "/tmp/.tmp7kTUWt/rotatelog/1/.tmp6dorJq", err: Os { code: 24, kind: Other, message: "Too many open files" } } }
  - "/tmp/.tmp7kTUWt/rotatelog": no valid logs found
    Caused by 1 errors:
    - "/tmp/.tmp7kTUWt/rotatelog/0/index-key1.sum": cannot open checksum file
      in ChecksumTable::new
      in index::OpenOptions::open("/tmp/.tmp7kTUWt/rotatelog/0/index-key1")
        OpenOptions = OpenOptions { checksum_chunk_size: 1048576, fsync: false, len: Some(0), write: None, key_buf: Some(_) }
      in log::OpenOptions::open("/tmp/.tmp7kTUWt/rotatelog/0")
        OpenOptions = OpenOptions { index_defs: ["key1"], fsync: false, create: false, checksum_type: Auto, flush_filter: None }
      Caused by 1 errors:
      - Os { code: 24, kind: Other, message: "Too many open files" }

(Ignoring whitespace will make this diff much easier to review)

Reviewed By: xavierd

Differential Revision: D17732131

fbshipit-source-id: b1685ded5c76c1200b9c1985749bd67588df1fb3
2019-10-03 19:57:31 -07:00
Jun Wu
7889031092 indexedlog: migrate RotateLog to use new Error type
Summary: Now all `indexedlog` APIs use the new Error type.

Reviewed By: xavierd

Differential Revision: D17732136

fbshipit-source-id: 8d306a08d8e8052d1c5e68fc5f05a9eed5c7d21f
2019-10-03 19:57:31 -07:00
Jun Wu
029f233d32 indexedlog: make atomic_write return new Error type
Summary: This provides more details, and makes callsites simpler.

Reviewed By: xavierd

Differential Revision: D17732127

fbshipit-source-id: 0fe6dedee4ebb8874ea95505c86d8b107e3367ff
2019-10-03 19:57:31 -07:00
Jun Wu
0a045becd1 indexedlog: add error context for public Log APIs
Summary:
Similar to the previous change, add context for Log APIs.

This shows error context that might replace backtrace. For example, run:

  cargo test --test low_fileno_limit -- --nocapture

An example error looks like:

  "/tmp/.tmpjrsfQt/rotatelog/1/index-key1": cannot duplicate file descriptor
  in ChecksumTable::try_clone
  in Index::try_clone
    Index.path = "/tmp/.tmpjrsfQt/rotatelog/1/index-key1"
  in Log::sync
    Log.dir = Some("/tmp/.tmpjrsfQt/rotatelog/1")
  Caused by 1 errors:
  - Os { code: 24, kind: Other, message: "Too many open files" }

(Ignoring whitespace will make this diff much easier to review)

Reviewed By: xavierd

Differential Revision: D17732124

fbshipit-source-id: b0d500652d80b4a4755453c69bc05d467ecbdf90
2019-10-03 19:57:30 -07:00
Jun Wu
b5708c5caa indexedlog: add error context for public Index APIs
Summary:
Since we lost backtrace by opting out of `failure`, it'd be nice to restore some
"backtrace" information, such as which Index function is being called.

This diff adds it. It also includes more context like what key is being looked
up so it might actually be more useful than backtrace.

(Ignoring whitespace will make this diff much easier to review)

Reviewed By: xavierd

Differential Revision: D17732126

fbshipit-source-id: 8e5a2c714bee8a943076818f0cff3a21498a954e
2019-10-03 19:57:30 -07:00
Jun Wu
7d6d6ebfb0 indexedlog: migrate Log to use new Error type
Summary: This basically involves adding context for io::Error and other error types.

Reviewed By: xavierd

Differential Revision: D17732130

fbshipit-source-id: 79fb3b93d57562f1922f3990a8bda0018d2675e8
2019-10-03 19:57:30 -07:00
Jun Wu
48ceb99202 indexedlog: add utils::mmap_len
Summary: The new utility function makes it easier to deal with mmap errors.

Reviewed By: xavierd

Differential Revision: D17732139

fbshipit-source-id: 93c8209b983d51198ebb367db983a2e9bc498d63
2019-10-03 19:57:29 -07:00
Jun Wu
f9f969319d indexedlog: add directory locking utilities
Summary: This makes it easier to lock a directory and makes error handling easier.

Reviewed By: xavierd

Differential Revision: D17732133

fbshipit-source-id: a404d41c0aaee7aad43271433f1352a8aa06bccb
2019-10-03 19:57:29 -07:00
Jun Wu
eb53228f47 indexedlog: migrate part of Index to new Error type (7)
Summary:
Migrate the remaining part of Index functions to use the new Error type. This
gives us an accurate view about whether an error indicates data corruption or
not, and makes the code more friendly - it works with `std::error::Error` now.

Reviewed By: xavierd

Differential Revision: D17705168

fbshipit-source-id: 8ae518602e7379d121e718a08127f0873f2e2423
2019-10-03 19:57:29 -07:00
Jun Wu
0def09884f indexedlog: migrate part of Index to new Error type (6)
Summary:
Migrate some return types from Fallible to the new Result. The main changes are
the way `io::Result` gets handled. The new API enforces attaching a `path` and
a message to them.

Reviewed By: xavierd

Differential Revision: D17705163

fbshipit-source-id: d060bdb2846a75c588b99201fd07ca3872f3a358
2019-10-03 19:57:29 -07:00
Jun Wu
0740e20b83 indexedlog: migrate part of Index to new Error type (5)
Summary:
Migrate more free-form error handling, such as `data_error` and `parameter_error`,
to the new Error type.

Reviewed By: xavierd

Differential Revision: D17705164

fbshipit-source-id: 45560a96e36fb5e83a9e365506e27c201f9448a6
2019-10-03 19:57:28 -07:00
Jun Wu
bfc618ea06 indexedlog: migrate part of Index to new Error type (4)
Summary:
Migrate `range_error` and `verify_checksum` to the `IndexBuf` trait so they
all get path information on error. Remove the free-form `range_error` and
`verify_checksum` functions.

Reviewed By: xavierd

Differential Revision: D17705165

fbshipit-source-id: 556fda8081c69b6beccc8c666902810a90635231
2019-10-03 19:57:28 -07:00
Jun Wu
de36889bf6 indexedlog: migrate part of Index to new Error type (3)
Summary:
A lot of functions take a (buf, checksum) tuple, instead of `Index`, as input.
That is to avoid issues where borrowing the entire `Index` forbids modifying
other fields in `Index`.

However, not taking `Index` means it cannot figure out the file path on error.

To solve both problems, this diff defines a trait that is a subset of Index
including (on-disk buf, checksum, path). Then migrate functions from using
(buf, checksum) to the new trait (if it only needs to read from the on-disk
buffer), or &Index (if it also needs to work with in-memory dirty/mutable
data).

Reviewed By: xavierd

Differential Revision: D17705166

fbshipit-source-id: 90bde88142ea3718a2093beb02b8030d725a0e15
2019-10-03 19:57:28 -07:00
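
The trait described in de36889bf6 above can be pictured roughly as follows. This is only a sketch under assumptions: `SketchIndexBuf`, `SketchIndex`, `read_u8` and the `String` error type are hypothetical names chosen for illustration, and the checksum part of the trait is left out for brevity. It shows how a read-only helper can report the file path on error without borrowing the whole index.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical read-only view of an index: on-disk bytes plus the file path.
pub trait SketchIndexBuf {
    fn buf(&self) -> &[u8];
    fn path(&self) -> &Path;

    /// Build an error message that always includes the file path.
    fn range_error(&self, start: usize, length: usize) -> String {
        format!(
            "{:?}: byte range {}..{} is unavailable",
            self.path(),
            start,
            start + length
        )
    }

    /// Read one byte from the on-disk buffer, with path context on failure.
    fn read_u8(&self, offset: usize) -> Result<u8, String> {
        self.buf()
            .get(offset)
            .copied()
            .ok_or_else(|| self.range_error(offset, 1))
    }
}

/// Hypothetical index with on-disk data and in-memory dirty data.
pub struct SketchIndex {
    pub on_disk: Vec<u8>,
    pub path: PathBuf,
    pub dirty_entries: Vec<u8>,
}

impl SketchIndexBuf for SketchIndex {
    fn buf(&self) -> &[u8] {
        &self.on_disk
    }

    fn path(&self) -> &Path {
        &self.path
    }
}
```

Functions that only read on-disk data can take `&impl SketchIndexBuf`, while functions that also touch `dirty_entries` would take the concrete index type.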
Jun Wu
3a8a96388d indexedlog: migrate part of Index to new Error type (2)
Summary:
Change some `range_error` to `Index::range_error`.
The new error is better because it includes path information.

Reviewed By: xavierd

Differential Revision: D17705162

fbshipit-source-id: 1de1c7cdd730fcf7c6c39e9e5840939fa561bc33
2019-10-03 19:57:27 -07:00
Jun Wu
2b842d8c79 indexedlog: migrate part of Index to new Error type (1)
Summary:
Change `read_bitmap_unchecked` and `read_raw_int_unchecked` to use the new
Error type. Change their function signature from taking `&[u8]` to taking
`&Index` so we can get the file path in the error message.

Reviewed By: xavierd

Differential Revision: D17705167

fbshipit-source-id: 82bcbe21061cdf993d5c7f9867941c1f936166e5
2019-10-03 19:57:27 -07:00
Jun Wu
52f8171869 indexedlog: migrate ChecksumTable to new Error type
Summary:
Migrate to the new Error type so we can know whether an error is considered
as a data corruption. The new Error should also provide more explicit error
messages.

(This diff is easier to review if whitespace changes are all ignored)

Reviewed By: xavierd

Differential Revision: D17696536

fbshipit-source-id: bfceffbf75a75940a90c914da7914a601d75a747
2019-10-03 19:57:27 -07:00
Jun Wu
0f3fda039d indexedlog: define a way to convert io::Result to Result
Summary:
`io::Result` is widely used in indexedlog internals and needs to be
converted to `Result`.

This diff defines the conversion function. It enforces 2 context parameters:
- File path.
- What operation is being performed. This is needed since we will lose the backtrace.

Reviewed By: xavierd

Differential Revision: D17696533

fbshipit-source-id: d9417a6b65cbfbb5d6d7d1c6449ddd13e3035b5c
2019-10-03 19:57:26 -07:00
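
A conversion like the one described in 0f3fda039d above might look like the following. This is a minimal sketch, not the actual indexedlog API: `ContextError` and `IoResultExt` are hypothetical names, and the only point illustrated is that the caller cannot convert an `io::Result` without supplying both a path and a description of the operation.

```rust
use std::fmt;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical error type carrying the two mandatory pieces of context.
#[derive(Debug)]
pub struct ContextError {
    path: PathBuf,
    operation: String,
    source: io::Error,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.path, self.operation)
    }
}

impl std::error::Error for ContextError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.source)
    }
}

/// Hypothetical extension trait: both context parameters are required.
pub trait IoResultExt<T> {
    fn context(self, path: &Path, operation: &str) -> Result<T, ContextError>;
}

impl<T> IoResultExt<T> for io::Result<T> {
    fn context(self, path: &Path, operation: &str) -> Result<T, ContextError> {
        self.map_err(|source| ContextError {
            path: path.to_path_buf(),
            operation: operation.to_string(),
            source,
        })
    }
}
```

A call site would then read something like `File::open(path).context(path, "cannot open checksum file")?`.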
Jun Wu
d58e5c3984 indexedlog: define new error types
Summary:
I need to make RotateLog understand whether errors that occurred in
Log/std::io/Index are data corruption or not. To be explicit, I defined an
`is_data_corruption` method. Downcasting a chain does not look like a reliable
solution (e.g. it is harder to check that it covers all possible cases).

There are other motivations for this change:
- `failure`: it is unfriendly in a low-level library; it requires callsites to
  use failure, too. `failure` is less maintained - it still provides the nice
  backtrace feature but it's more friendly if libraries just use std Error (we
  lose backtrace inside the library, but hopefully the errors are of high enough
  quality that backtrace in the application is enough for debugging).
- Error with multi-sources. Both std and failure Error provide one slot for
  "cause". Sometimes it's desirable to use multiple slots. For example,
  RotateLog::open fails to read existing logs, and also fails to auto recover
  by creating a new log. In that case, ideally we keep both errors in the
  returned type.

Reviewed By: xavierd

Differential Revision: D17696532

fbshipit-source-id: 0387b3a3b71f097b1a3dc2dcc7671a43c465abb2
2019-10-03 19:57:26 -07:00
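
The two ideas in d58e5c3984 above (an explicit `is_data_corruption` flag and more than one "cause" slot) can be sketched as below. Only `is_data_corruption` is named in the commit message; `LogError`, `new`, `corruption`, `with_source` and `source_count` are hypothetical names, and the `Display` layout merely imitates the "Caused by N errors" output quoted in the later commits.

```rust
use std::fmt;

type BoxedError = Box<dyn std::error::Error + Send + Sync + 'static>;

/// Hypothetical error type with an explicit corruption flag and many sources.
#[derive(Debug)]
pub struct LogError {
    message: String,
    data_corruption: bool,
    sources: Vec<BoxedError>,
}

impl LogError {
    pub fn new(message: &str) -> Self {
        LogError {
            message: message.to_string(),
            data_corruption: false,
            sources: Vec::new(),
        }
    }

    /// Construct an error that is explicitly marked as data corruption.
    pub fn corruption(message: &str) -> Self {
        let mut err = LogError::new(message);
        err.data_corruption = true;
        err
    }

    /// Attach one more underlying error; may be called repeatedly.
    pub fn with_source<E>(mut self, source: E) -> Self
    where
        E: std::error::Error + Send + Sync + 'static,
    {
        self.sources.push(Box::new(source));
        self
    }

    pub fn is_data_corruption(&self) -> bool {
        self.data_corruption
    }

    pub fn source_count(&self) -> usize {
        self.sources.len()
    }
}

impl fmt::Display for LogError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)?;
        if !self.sources.is_empty() {
            write!(f, "\nCaused by {} errors:", self.sources.len())?;
            for source in &self.sources {
                write!(f, "\n- {}", source)?;
            }
        }
        Ok(())
    }
}

impl std::error::Error for LogError {
    /// std only has one slot, so expose the first source there.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        self.sources
            .first()
            .map(|e| &**e as &(dyn std::error::Error + 'static))
    }
}
```

With this shape, a caller such as a rotate-log opener can keep both the "failed to read existing logs" error and the "failed to create a new log" error in one returned value, and still ask a plain boolean question about corruption.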
Jun Wu
069159e531 indexedlog: add a test about running indexedlog in low fileno limit
Summary:
This test checks that the RotateLog can still be opened and read if the fileno
limit crashes other writer processes "randomly".

Reviewed By: xavierd

Differential Revision: D17676318

fbshipit-source-id: e08528189adfa260047c357c723c87735592ec8f
2019-10-03 19:57:26 -07:00
Jun Wu
c3d8074aa2 indexedlog: add more context for RotateLog::open
Summary: This logs more context for errors, which might help debugging.

Reviewed By: xavierd

Differential Revision: D17670723

fbshipit-source-id: d22fb53689c0766b99aa344659a15148017212ad
2019-10-03 19:57:26 -07:00
Xavier Deguillard
98046de9e9 revisionstore: remove Key argument from PackStore::run
Summary: This is no longer needed.

Reviewed By: quark-zju

Differential Revision: D17729378

fbshipit-source-id: 43d24df01dfae0449473d33fa851114e951197b0
2019-10-03 14:38:22 -07:00
Xavier Deguillard
d0b81200dd revisionstore: remove KeyError
Summary:
Instead of using KeyError to indicate that data isn't found, let's use an
Option. The Option type better encodes that data is missing without having to do
a potentially error-prone downcast. This may also enable us to set
RUST_BACKTRACE=1 everywhere, as we won't expect errors to happen often anymore;
previously, Mercurial would slow to a crawl due to the many KeyErrors being
thrown around.

I initially wanted to keep the change small to help reviews, but that didn't
really work out, as the dependencies on the `DataStore`/`HistoryStore` traits
are all over the place...

Reviewed By: quark-zju

Differential Revision: D17728486

fbshipit-source-id: de89c4fc441fd12ff37cc248e2230e4a1403ce44
2019-10-03 14:38:22 -07:00
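
The change in d0b81200dd above amounts to moving "not found" out of the error channel. The sketch below illustrates the pattern only: `SketchStore`, `MemStore` and `get_from_any` are hypothetical stand-ins, not the real `DataStore`/`HistoryStore` signatures. A missing key is `Ok(None)`, so falling through to the next store needs no downcast and constructs no error.

```rust
use std::collections::HashMap;
use std::io;

/// Hypothetical store trait: a missing key is `Ok(None)`, not an error.
pub trait SketchStore {
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>>;
}

/// Hypothetical in-memory store used for illustration.
pub struct MemStore {
    data: HashMap<String, Vec<u8>>,
}

impl MemStore {
    pub fn new() -> Self {
        MemStore { data: HashMap::new() }
    }

    pub fn insert(&mut self, key: &str, value: &[u8]) {
        self.data.insert(key.to_string(), value.to_vec());
    }
}

impl SketchStore for MemStore {
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        Ok(self.data.get(key).cloned())
    }
}

/// Try each store in order; real errors still propagate through `?`.
pub fn get_from_any(stores: &[&dyn SketchStore], key: &str) -> io::Result<Option<Vec<u8>>> {
    for store in stores {
        if let Some(data) = store.get(key)? {
            return Ok(Some(data));
        }
    }
    Ok(None)
}
```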
Jun Wu
f00b8d6ca3 revisionstore: do not delete indexedlog cache on od hosts
Summary:
This is a short-term fix to help surface the real errors.

Instead of silently deleting or renaming data, surface the error so we
can get crash/traceback logged, and can login to investigate the broken
state.

Reviewed By: xavierd

Differential Revision: D17729511

fbshipit-source-id: b066ef12101aa742b4834bfd2e90bcb42fa15aff
2019-10-02 17:48:39 -07:00
Arun Kulshreshtha
f6d075eee5 manifest: move File struct to file module
Summary: Now that the `File` type is part of the crate's public API, it should be placed in the `files` module along with all the other exported file-related types (such as `FileMetadata`).

Reviewed By: xavierd

Differential Revision: D17726709

fbshipit-source-id: 4e3c0100ca765a7145f9eea49aa0b7ff11496c4b
2019-10-02 17:13:38 -07:00
Arun Kulshreshtha
7dd8231716 types: print parent hashes when content validation fails
Summary: When a downloaded manifest node fails to validate, investigating the issue generally requires the p1/p2 nodes (to manually compute the hash and compare it to the expected value). As such, let's print these out as part of the error message.

Reviewed By: xavierd

Differential Revision: D17724746

fbshipit-source-id: 0b1eb8d5344c0376a5895745dcdfb1092ad06321
2019-10-02 16:47:02 -07:00
Durham Goode
8b0b788ba8 hgcommands: print errors to io instead of stderr
Summary:
HgPython::run_hg was printing errors directly to stderr instead of to
the provided io.error. This caused unhandlable output in the -t.py tests. Let's
fix it to output to the provided pipe.

Reviewed By: quark-zju

Differential Revision: D17634721

fbshipit-source-id: f441e7be461193ef54db25e0939b2e67cdf06126
2019-10-02 10:12:19 -07:00
Arun Kulshreshtha
78fe67ae86 manifest: add BFS prefetch function
Summary:
Add a client-driven tree prefetching implementation to the Rust manifest code. Unlike the existing prefetch implementation in Python, this one does all computation of which nodes to fetch on the client side using the BFS logic from BfsDiff. The trees are then bulk fetched layer-by-layer using EdenAPI.

This initial version is fairly naive, and omits some obvious optimizations (such as performing fetches of multiple trees concurrently), but is sufficient to demonstrate HTTP tree prefetching in action.

Reviewed By: xavierd

Differential Revision: D17379178

fbshipit-source-id: f17fe99834ad4fec07b4a4ab196928cc4fe91142
2019-09-30 20:26:36 -07:00
Arun Kulshreshtha
c48652ad4b manifest: walk manifest in breadth-first order
Summary:
Change the `Files` iterator in the Rust manifest code to traverse the tree in BFS order, allowing for layer-by-layer prefetching similar to `Diff`. This can substantially speed up walks over the tree when the cache is cold.

As a side-effect, this changes the order in which paths are reported during a manifest walk. (In particular, they are now reported in breadth-first order rather than depth-first order.) This may break things that rely on the existing ordering; as such, we may need to add a sort somewhere if this turns out to be a problem.

Reviewed By: xavierd

Differential Revision: D17645389

fbshipit-source-id: 624e426094a93e206bde4523ea8bd034fe5aeb90
2019-09-30 17:04:43 -07:00
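
The ordering change described in c48652ad4b above can be illustrated with a toy tree. This is a simplified sketch, not the manifest crate's `Files` iterator: `Node` and `walk_bfs` are hypothetical, and the place where a real implementation would bulk-prefetch a layer is only marked by a comment.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy tree: a node is either a file or a directory of children.
pub enum Node {
    File,
    Dir(BTreeMap<String, Node>),
}

/// Return file paths in breadth-first (layer-by-layer) order.
pub fn walk_bfs(root: &Node) -> Vec<String> {
    let mut files = Vec::new();
    let mut layer: Vec<(String, &Node)> = vec![(String::new(), root)];
    while !layer.is_empty() {
        // A real implementation could prefetch every directory in `layer` here.
        let mut next = Vec::new();
        for (path, node) in layer {
            match node {
                Node::File => files.push(path),
                Node::Dir(children) => {
                    for (name, child) in children {
                        let child_path = if path.is_empty() {
                            name.clone()
                        } else {
                            format!("{}/{}", path, name)
                        };
                        next.push((child_path, child));
                    }
                }
            }
        }
        layer = next;
    }
    files
}
```

For a tree containing `a/c` and `b`, a depth-first walk would report `a/c` before `b`, while this walk reports `b` first because it sits in a shallower layer.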
Zeyi (Rice) Fan
784f741312 hg: make configparser buildable with cmake
Reviewed By: simpkins

Differential Revision: D17505870

fbshipit-source-id: 974ed48424b46a0828133b9fb05da86129c3e21f
2019-09-30 10:41:04 -07:00
Andres Suarez
c333ddd733 Enable text linter for toml files
Reviewed By: mzlee

Differential Revision: D17638063

fbshipit-source-id: 9ee13ab9ac2f2202d33028c34922234d51d5dd08
2019-09-27 16:21:47 -07:00
Stefan Filip
92f92d5138 manifest: add test verifying materialization in finalize
Summary:
This test checks that a directory without modifications between tree
and parent is not going to be materialized.

Reviewed By: quark-zju

Differential Revision: D17540173

fbshipit-source-id: 465f1e0410c42a55665bcd6903d75266c61d5e80
2019-09-27 13:07:21 -07:00
Arun Kulshreshtha
55d07eeb94 manifest: remove dfs diff
Summary: Remove the DFS diff implementation and replace it with the BFS implementation.

Reviewed By: xavierd

Differential Revision: D17618818

fbshipit-source-id: d486642caae924f866a200d3c82fa5a4cb7d5286
2019-09-27 11:07:23 -07:00
Arun Kulshreshtha
323aaa9dd1 manifest: copy dfs diff tests into bfs diff module
Summary: To ensure feature parity between BFS diff and DFS diff, copy the DFS tests into the BFS module and ensure they pass.

Reviewed By: xavierd

Differential Revision: D17618820

fbshipit-source-id: b516abbfa4e231fdc383293d94d8965333f2ab99
2019-09-27 11:07:23 -07:00
Zeyi (Rice) Fan
951b9e95f5 getdeps: add rust_static_library to build Rust crate
Reviewed By: simpkins

Differential Revision: D16945510

fbshipit-source-id: a7a88cd94235e3f8c01235d7e7e500c90bde3b38
2019-09-26 15:50:51 -07:00
Mateusz Kwapich
114bc9d1aa xdiff: add a friendly wrapper for rust bindings (xdiff)
Summary:
This will make using xdiff much easier.

In the next diff I'm planning to also add a function that converts the output of this function to a textual diff (the one with `+`'s and `-`'s).

Reviewed By: quark-zju

Differential Revision: D17551184

fbshipit-source-id: cda332e817f733d7aa32aeeb7b2d312d971826dd
2019-09-26 11:01:09 -07:00
Mateusz Kwapich
02db823918 xdiff: add rust bindings (xdiff-sys)
Summary:
These are limited (not all features are exposed) bindings for xdiff - the diff library used by git and our version of hg. We need them to be able to generate diffs in Mononoke.

In the next diff I'll add a more Rust-friendly wrapper library.

Reviewed By: quark-zju

Differential Revision: D17548528

fbshipit-source-id: f23c8a65d11d2c5de8f0456d32883f16b19a98e2
2019-09-26 11:01:09 -07:00
Jun Wu
4afaf065fe indexedlog: add checks about the append-only property
Summary:
Both Index and Log require their on-disk files to be append-only.
Detect non-append-only changes and return errors. This might help
us get better error messages if the case actually happens.

Reviewed By: markbt

Differential Revision: D17592914

fbshipit-source-id: b12791177ceb04f2373e93a679101e8b96e2bc98
2019-09-26 10:19:44 -07:00
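
The commit message for 4afaf065fe above does not say how non-append-only changes are detected. One plausible check, shown purely as an assumption, compares the previously seen content with the newly read content: the file must not shrink, and the old bytes must remain a prefix of the new ones. `check_append_only` is a hypothetical helper, and a real implementation would more likely compare lengths or checksums than whole buffers.

```rust
/// Hypothetical check: `new` must extend `old` without modifying it.
pub fn check_append_only(old: &[u8], new: &[u8]) -> Result<(), String> {
    if new.len() < old.len() {
        return Err(format!(
            "file shrank from {} to {} bytes",
            old.len(),
            new.len()
        ));
    }
    if &new[..old.len()] != old {
        return Err("existing bytes were modified".to_string());
    }
    Ok(())
}
```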
Jun Wu
11b6baa066 indexedlog: stresstest RotateLog multi-thread writes
Summary:
Similar to Log, this stress-tests RotateLog behavior in a multi-threaded
environment.

Reviewed By: xavierd

Differential Revision: D17542324

fbshipit-source-id: 35ea358157cf141bec3802b959c9f921eca3143a
2019-09-25 18:59:39 -07:00
Jun Wu
e0ac732667 indexedlog: stresstest multi-thread sync()
Summary:
Add a simple stresstest that calls sync() in multiple threads. This should
give us some confidence that `sync()` has the expected behavior when called in
a multi-threaded environment.

Reviewed By: xavierd

Differential Revision: D17538980

fbshipit-source-id: 1793a3f871f0377c452807efa466d65d0da4b1f6
2019-09-25 18:59:39 -07:00
Jun Wu
ab3da9107c blackbox: do not inherit session_id if pid has changed
Summary:
D17429691 made blackbox reuse session_id unconditionally. That has the
undesirable side effect that chg processes are all logged with the same session id.
Fix that by detecting a pid change and not reusing session_id in that case.

Reviewed By: singhsrb

Differential Revision: D17532555

fbshipit-source-id: cf11bb66f7d7242429b90ab5e5ea85ca307f92c3
2019-09-25 17:17:40 -07:00
Jun Wu
bf2ff65093 indexedlog: add more context to some errors
Summary:
Add some context around "invalid read offset" to make errors slightly more
useful.

Reviewed By: xavierd

Differential Revision: D17577202

fbshipit-source-id: d51ba30abf6c462102be8bec1b60668ee66e07f2
2019-09-25 14:08:19 -07:00
Jun Wu
71b243f3f8 revisionstore: keep error'd indexedlog files for debugging
Summary:
Instead of removing them unconditionally, keep one copy that failed to `open`
so we can have a look later.

Reviewed By: xavierd

Differential Revision: D17576432

fbshipit-source-id: 4f967d61aa602e6d3cac90d411e1971893c162bd
2019-09-25 14:08:19 -07:00
Jun Wu
e7a21f8b44 indexedlog: revise error handling in RotateLog::open
Summary:
Revise some error handling details so it covers corner cases more accurately and
provides more detailed error messages.

Since D16554090, RotateLog::open unconditionally attempts to create an empty
log at 0/ and reset latest to 0 if read_latest_and_log fails. That could be
undesirable if latest can be read but logs cannot, since it can silently reset
latest to 0 and might cause trouble in the future (for example, failing to
create an empty log at 1/ because it already exists).

This diff splits read_latest_and_log to read_latest and read_logs and handles
their errors individually.

The table summarizes the changes:

  | latest  | logs     | old behavior | new behavior        |
  | okay    | okay     | return both  | return both         |
  | okay    | error    | create log 0 | create log latest+1 |
  | missing | whatever | create log 0 | create log 0        |
  | error   | whatever | create log 0 | error               |

Reviewed By: xavierd

Differential Revision: D17576431

fbshipit-source-id: c9ab1fca5fb60eecf9e326baf90dfa98560a2b32
2019-09-25 14:08:19 -07:00
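
The "new behavior" column of the table in e7a21f8b44 above can be written as a single match. The sketch below only encodes that table; `LatestState`, `Action` and `decide` are hypothetical names, and the `u64` type for `latest` is an assumption.

```rust
/// Hypothetical result of reading the `latest` pointer.
#[derive(Debug, PartialEq)]
pub enum LatestState {
    Okay(u64),
    Missing,
    Error,
}

/// Hypothetical outcome of the open-time decision.
#[derive(Debug, PartialEq)]
pub enum Action {
    ReturnBoth,
    CreateLog(u64),
    Fail,
}

/// Encode the "new behavior" column of the table, one row per match arm.
pub fn decide(latest: LatestState, logs_ok: bool) -> Action {
    match (latest, logs_ok) {
        (LatestState::Okay(_), true) => Action::ReturnBoth,
        (LatestState::Okay(n), false) => Action::CreateLog(n + 1),
        (LatestState::Missing, _) => Action::CreateLog(0),
        (LatestState::Error, _) => Action::Fail,
    }
}
```

Handling the two reads separately is what makes the second arm possible: with a single combined read, a readable `latest` and unreadable logs could not be told apart from a missing `latest`.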
Carolyn Busch
537419bcea Workingcopy: Remove matcher reference
Summary:
Walker should own matcher instead of storing a reference, so the walker can be
stored as a member of a struct by itself

Reviewed By: xavierd

Differential Revision: D17511588

fbshipit-source-id: 039c6c3cced7feec4e9141c31e5333c43879484a
2019-09-25 11:00:05 -07:00
Jun Wu
31ec755dd0 indexedlog: fix a race condition creating "lock" file on Windows
Summary:
The current way of opening the lock file and creating it on demand is racy.
Fix it by making it one operation.

Reviewed By: singhsrb

Differential Revision: D17552687

fbshipit-source-id: 5469862902ccab2d317f2c0ac61867c365e22aba
2019-09-24 17:47:25 -07:00
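
The fix in 31ec755dd0 above is described as making "open" and "create on demand" one operation. With the standard library this can be expressed by asking `OpenOptions` to create the file if it is missing as part of the same `open` call, instead of checking for existence first. The sketch below is an illustration of that idea, not the actual indexedlog code; `open_lock_file` is a hypothetical helper, and the file name `lock` is taken from the commit title.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

/// Open the "lock" file inside `dir`, creating it if needed, in one call.
/// There is no separate existence check, so two processes cannot race
/// between "check" and "create".
pub fn open_lock_file(dir: &Path) -> io::Result<File> {
    OpenOptions::new()
        .write(true)
        .create(true)
        .open(dir.join("lock"))
}
```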
Stefan Filip
6e9cdfe56c manifest: fix finalize to not eagerly fetch directories
Summary:
Finalize is asking the cursor to traverse into directories that haven't changed.
This is a bug introduced when updating finalize to support being called on
"Durable" nodes. Until then directories would always be traversed if they were
in the processing path. The path would only be chosen for "Ephemeral"
directories which we knew were different from a parent that is assumed to be
"Durable". I later learned that `finalize` is expected to return the manifests
that are directly fetched from storage. The update meant that we would skip
the directory that is processed if the "Node" (hash) is present and matches a
parent. The problem is that we didn't update the point at which the parent cursor
is advanced.

Reviewed By: xavierd

Differential Revision: D17537448

fbshipit-source-id: 9c71a8f8f5a70c600031bc9d32535e59f2f32700
2019-09-24 10:13:38 -07:00
Jun Wu
aa8e29b5a7 nodemap: add nodeset structure
Summary: This will be used to store "invisible heads".

Reviewed By: sfilipco

Differential Revision: D17264837

fbshipit-source-id: a450b5c10cc961d43ec8eb852cb2fb22849a8c00
2019-09-23 17:11:23 -07:00
Jun Wu
20dfef4801 bindings: implement lazy iteration on spans
Summary:
The SpanSet can include a large amount of revs. Iterating through it by putting
everything in a PyList is suboptimal. Therefore add a dedicated native iterator
for it. This speeds up iteration greatly, which can be verified via debugshell:

Before:

  In [1]: s=m.smartset.spansset(b.dag.spans(xrange(5000000)))
  In [2]: %time s.slice(0,10)
  CPU times: user 135 ms, sys: 42.9 ms, total: 178 ms
  Wall time: 180 ms

After:

  In [1]: s=m.smartset.spansset(b.dag.spans(xrange(5000000)))
  In [2]: %time s.slice(0,10)
  CPU times: user 49 µs, sys: 6 µs, total: 55 µs
  Wall time: 58.2 µs

Reviewed By: sfilipco

Differential Revision: D17305350

fbshipit-source-id: 0db00aa57fb6bf2141ccea94b2536da78f103cef
2019-09-23 17:11:21 -07:00
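
The speedup reported in 20dfef4801 above comes from not materializing every rev before slicing. The same effect can be shown in plain Rust with a lazy iterator over spans. This is a sketch only: `iter_spans` is a hypothetical function, and representing a span as an inclusive `(low, high)` pair in ascending order is an assumption rather than the real `SpanSet` layout.

```rust
/// Lazily iterate over every id covered by a list of inclusive spans.
/// Nothing is expanded until the caller pulls items from the iterator.
pub fn iter_spans(spans: &[(u64, u64)]) -> impl Iterator<Item = u64> + '_ {
    spans.iter().flat_map(|&(low, high)| low..=high)
}
```

Taking the first ten items of a five-million-element span, for example with `iter_spans(&spans).take(10)`, touches only ten ids, which mirrors the `s.slice(0,10)` measurement in the commit message.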