Summary:
EdenAPI makes heavy use of streaming HTTP responses consisting of a series of serialized CBOR values. In order to process the data in a streaming manner, we use the `CborStream` combinator, which attempts to deserialize the CBOR values as they are received.
`CborStream` hits a pathological case when it receives a very large CBOR value. Previously, it would always buffer the input stream into 1 MB chunks, and attempt to deserialize whenever a new chunk was received. In the case of downloading values that are >1GB in size, this meant potentially thousands of wasted deserialization attempts. In practice, this meant that EdenAPI would hang when receiving the content of large files.
To address this problem, this diff adds a simple heuristic: If a partial CBOR value exceeds the current buffer size, double the size threshold before attempting to deserialize again. This reduces the pathological case from `O(n^2)` to `O(log(n))` (with some caveats, described in the comment in the code).
Reviewed By: krallin
Differential Revision: D27759698
fbshipit-source-id: 67882c31ce95a934b96c61f1c72bd97cad942d2e
Any native code (C/C++/Rust) that Mercurial (either core or extensions)
depends on should go here. Python code, or native code that depends on
Python code (e.g. #include <Python.h> or use cpython) is disallowed.
As we start to convert more of Mercurial into Rust, and write new paths
entrirely in native code, we'll want to limit our dependency on Python, which is
why this barrier exists.
See also hgext/extlib/README.md, mercurial/cext/README.mb.
How do I choose between lib and extlib (and cext)?
If your code is native and doesn't depend on Python (awesome!), it goes here.
Otherwise, put it in hgext/extlib (if it's only used by extensions) or
mercurial/cext (if it's used by extensions or core).