sapling/lib
Jun Wu 545f670504 pathencoding: utility for converting between bytes and paths
Summary:
A simple utility that does paths <-> local bytes conversion. It's needed
since Mercurial stores paths using local encoding in manifests.

For POSIX, the code is zero-cost - no real conversion or error can happen.
This is in theory cheaper than what treedirstate does.
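
A minimal sketch of what "zero-cost" can look like on POSIX, assuming the conversion is built on `std::os::unix::ffi::OsStrExt`. The function names `local_bytes_to_path` and `path_to_local_bytes` are hypothetical and are not necessarily the names this crate exports. Both directions only reinterpret a borrowed slice, so nothing is allocated and nothing can fail:

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical helper: view manifest bytes as a path without copying.
/// On Unix an `OsStr` is an arbitrary byte sequence, so this cannot fail.
#[cfg(unix)]
pub fn local_bytes_to_path(bytes: &[u8]) -> &Path {
    use std::os::unix::ffi::OsStrExt;
    Path::new(OsStr::from_bytes(bytes))
}

/// Hypothetical helper: view a path as the bytes stored in a manifest.
#[cfg(unix)]
pub fn path_to_local_bytes(path: &Path) -> &[u8] {
    use std::os::unix::ffi::OsStrExt;
    path.as_os_str().as_bytes()
}

fn main() {
    #[cfg(unix)]
    {
        // The GBK-encoded manifest path from the experiment in this message.
        let raw: &[u8] = b"\xc4\xbf\xc2\xbc1/\xce\xc4\xbc\xfe1";
        let path = local_bytes_to_path(raw);
        // The round trip returns exactly the bytes that went in.
        assert_eq!(path_to_local_bytes(path), raw);
        println!("round-trip ok, {} bytes", raw.len());
    }
}
```

Returning borrowed `&Path` and `&[u8]` rather than owned values is what keeps the sketch free of allocation. A Windows counterpart would have to return owned data, because the bytes must actually be re-encoded.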

For Windows, the "local_encoding" crate is used, because Yuya suggested the
`MultiByteToWideChar` Win32 API [1] and "local_encoding" uses it. It did the
right thing in my experiment with the GBK (Simplified Chinese) encoding.

```
  ....
  C:\Users\quark\enc>hg debugshell --config extensions.debugshell=
  >>> repo[0].manifest().text()
  '\xc4\xbf\xc2\xbc1/\xce\xc4\xbc\xfe1\x00b80de5d138758541c5f05265ad144ab9fa86d1db\n'
  >>> repo[0].files()
  ['\xc4\xbf\xc2\xbc1/\xce\xc4\xbc\xfe1']

  extern crate local_encoding;
  use std::path::PathBuf;
  use local_encoding::{Encoder, Encoding};
  const MPATH: &[u8] = b"\xc4\xbf\xc2\xbc1/\xce\xc4\xbc\xfe1";
  fn main() {
      let p = PathBuf::from(Encoding::OEM.to_string(MPATH).unwrap());
      println!("exists: {}", p.exists());
      println!("mpath len: {}, osstr len: {}", MPATH.len(), p.as_path().as_os_str().len());
  }

  exists: true
  mpath len: 11, osstr len: 15
```

In the future, we might normalize the paths to UTF-8 before storing them in
the manifest to avoid such encoding issues.

Differential Revision: D7319604

fbshipit-source-id: a7ed5284be116c4176598b4c742e8228abcc3b02
2018-04-13 21:51:35 -07:00
cdatapack hg: some portability fixes to py-cdatapack.h 2018-04-13 21:51:24 -07:00
clib hg: start using imported mman-win32 in the portability headers 2018-04-13 21:51:10 -07:00
indexedlog indexedlog: move mmap_readonly to utils 2018-04-13 21:51:25 -07:00
linelog hg: basic support for building hg using buck 2018-04-13 21:50:58 -07:00
pathencoding pathencoding: utility for converting between bytes and paths 2018-04-13 21:51:35 -07:00
radixbuf radixbuf: avoid using unstable features in buck build 2018-04-13 21:51:12 -07:00
third-party xdiff: add a preprocessing step that trims files 2018-04-13 21:51:25 -07:00
vlqencoding vlqencoding: add read_vlq_at API that works for AsRef<[u8]> 2018-04-13 21:51:19 -07:00
README.md READMEs: tweaks based on feedback 2018-01-12 12:35:52 -08:00

lib

Any native code (C/C++/Rust) that Mercurial (either core or extensions) depends on should go here. Python code, or native code that depends on Python (e.g. code that does `#include <Python.h>` or uses `cpython`), is disallowed.

As we start to convert more of Mercurial into Rust, and write new paths entirely in native code, we'll want to limit our dependency on Python, which is why this barrier exists.

See also hgext/extlib/README.md and mercurial/cext/README.md.

How do I choose between lib and extlib (and cext)?

If your code is native and doesn't depend on Python (awesome!), it goes here.

Otherwise, put it in hgext/extlib (if it's only used by extensions) or mercurial/cext (if it's used by extensions or core).