sapling/lib/pathencoding
Jun Wu 545f670504 pathencoding: utility for converting between bytes and paths
Summary:
A simple utility that does paths <-> local bytes conversion. It's needed
since Mercurial stores paths using local encoding in manifests.

For POSIX, the code is zero-cost - no real conversion or error can happen.
This is in theory cheaper than what treedirstate does.

For Windows, the "local_encoding" crate is selected as Yuya suggested the
`MultiByteToWideChar` Win32 API [1] and "local_encoding" uses it. It does
the right thing given my experiment with GBK (Chinese, simplified) encoding.

```
  ....
  C:\Users\quark\enc>hg debugshell --config extensions.debugshell=
  >>> repo[0].manifest().text()
  '\xc4\xbf\xc2\xbc1/\xce\xc4\xbc\xfe1\x00b80de5d138758541c5f05265ad144ab9fa86d1db\n'
  >>> repo[0].files()
  ['\xc4\xbf\xc2\xbc1/\xce\xc4\xbc\xfe1']
  extern crate local_encoding;
  use std::path::PathBuf;
  use local_encoding::{Encoder, Encoding};
  const mpath: &[u8] = b"\xc4\xbf\xc2\xbc1/\xce\xc4\xbc\xfe1";
  fn main() {
      let p = PathBuf::from(Encoding::OEM.to_string(mpath).unwrap());
      println!("exists: {}", p.exists());
      println!("mpath len: {}, osstr len: {}", mpath.len(), p.as_path().as_os_str().len());
  }
  exists: true
  mpath len: 11, osstr len: 15
```

In the future, we might normalize the paths to UTF-8 before storing them in
manifest to avoid issues.

Differential Revision: D7319604

fbshipit-source-id: a7ed5284be116c4176598b4c742e8228abcc3b02
2018-04-13 21:51:35 -07:00
..
src pathencoding: utility for converting between bytes and paths 2018-04-13 21:51:35 -07:00
Cargo.toml pathencoding: utility for converting between bytes and paths 2018-04-13 21:51:35 -07:00