Commit Graph

44922 Commits

Author SHA1 Message Date
Jun Wu
dd159e20ee setup: build and install IPython.zip at mercurial/thirdparty
Summary:
IPython is pretty handy (ex. `%timeit`, `?` etc). It should probably be the
recommended way to explore the code base for new people. This is a step towards
making it available for all platforms.

It's also smaller when compiled directly. The zip file containing all `.pyc`
files is just 8MB. When installed using CentOS's system package, a lot of GUI
dependencies will be installed, which need hundreds of MBs.

Note the zip only contains pure Python modules. The only native dependency
seems to be `scandir`. But `scandir` has a pure Python fallback. So it still
runs, just slower.
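
As a rough illustration of why a zip of pure Python modules is enough, the sketch below builds a tiny zip and imports a module straight from it through `sys.path`. The file and module names (`demo_thirdparty.zip`, `zipdemo_hello`) are made up for the example, and it stores `.py` source rather than the `.pyc` files mentioned above.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_module_zip(zip_path, modules):
    """Write {module_name: source} pairs into a zip as top-level .py files."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for name, source in modules.items():
            zf.writestr(name + ".py", source)


def import_from_zip(zip_path, module_name):
    """Put the zip on sys.path and import a pure Python module from it."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    return importlib.import_module(module_name)


# Hypothetical demo module; the real zip would contain IPython and its deps.
demo_zip = os.path.join(tempfile.mkdtemp(), "demo_thirdparty.zip")
build_module_zip(
    demo_zip, {"zipdemo_hello": "def greet():\n    return 'hello from zip'\n"}
)
zipdemo_hello = import_from_zip(demo_zip, "zipdemo_hello")
```

Native extension modules cannot be loaded this way, which is consistent with the zip containing only pure Python modules.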

Reviewed By: DurhamG

Differential Revision: D13352617

fbshipit-source-id: 2ecbef69438ffc478389c5bec67bec5f83f7edeb
2018-12-11 16:19:58 -08:00
Jun Wu
8a8cfc23ed setup: add IPython dependencies
Summary:
To build IPython locally, its dependencies need to be fetched.

The direct motivation behind IPython is that, recently, when new people tried to
hack on Mercurial, they did not have a great way to understand the dynamic codebase.
Some of them used static code search, which does not work well. IPython's `?`
feature is pretty handy to get the location of related definitions. Together
with `hg dbsh`, it makes exploring the code base much easier.

So let's just ship hg with IPython bundled.

Note installing IPython is also not very smooth:
- IPython on CentOS 7 is too old (3.x) and installing it brings in too many
  GUI dependencies unnecessarily.
- IPython on Windows is not that easy to install, as we shipped our own
  python.exe without corresponding pip.

Reviewed By: DurhamG

Differential Revision: D13352616

fbshipit-source-id: 91750664170c99f556ca406e718b030bf509f275
2018-12-11 16:19:58 -08:00
Jun Wu
b80cda8bf8 setup: cleanup logic fetching re2 source code
Summary:
Previously we had 2 different places to fetch re2 source code: Makefile and
Windows-only build_nupkg.py. We now have a command in setup.py that fetches them.
Let's just call the "fetch_build_deps" command as part of "build_ext".

This also makes re2 a required component. It's no longer optional.

Reviewed By: DurhamG

Differential Revision: D13352619

fbshipit-source-id: 0bd93560acfbc2e900005a20e4b33a236aad5f98
2018-12-11 16:19:58 -08:00
Jun Wu
950de97ae5 setup: add a step to download build dependencies
Summary:
The ways to download build dependencies are not consistent and are reinvented in
a couple of places. For example:

- Cython: error out if it's not available
- Rust: (the fb part of) distutils_rust auto-installs it
- Re2: Makefile and build_nupkg.py are doing the same thing
- Various Windows dependencies: downloaded by the fb part of build_nupkg.py

This is an attempt to unify part of them to be a command of setup.py:
"fetch_build_deps".

The motivation was to get IPython bundled. For now I just add re2 to the asset
list. Upcoming diffs will add IPython. Cython might be a good fit, too.

Reviewed By: DurhamG

Differential Revision: D13352622

fbshipit-source-id: 151d299663eba9bb49c9577be0e224f9de8a9912
2018-12-11 16:19:58 -08:00
Jun Wu
a256e07d68 setup: teach shutil.rmtree to handle read-only files on Windows
Summary:
While running `make local` on Windows, I got:

  running build_rust_ext
  downloading vendored crates 'tp2-crates-io'
  [crates-io.py] downloading vendored crates archive from LFS
  [crates-io.py] removing outdated vendor directory
  Traceback (most recent call last):
    File "fbcode\tools\lfs\crates-io.py", line 101, in <module>
      download()
    File "fbcode\tools\lfs\crates-io.py", line 69, in download
      shutil.rmtree(VENDOR_DIRNAME)
    File "fbcode\scm\hg\build\hg-python\lib\shutil.py", line 247, in rmtree
      rmtree(fullname, ignore_errors, onerror)
    File "fbcode\scm\hg\build\hg-python\lib\shutil.py", line 247, in rmtree
      rmtree(fullname, ignore_errors, onerror)
    File "fbcode\scm\hg\build\hg-python\lib\shutil.py", line 247, in rmtree
      rmtree(fullname, ignore_errors, onerror)
    File "fbcode\scm\hg\build\hg-python\lib\shutil.py", line 252, in rmtree
      onerror(os.remove, fullname, sys.exc_info())
    File "fbcode\scm\hg\build\hg-python\lib\shutil.py", line 250, in rmtree
      os.remove(fullname)
  WindowsError: [Error 5] Access is denied: 'vendor\\lalrpop\\src\\parser\\lrgrammar.rs'
  error: download of Rust vendored crates 'tp2-crates-io' failed
  make.exe: *** [local] Error 1

This is caused by the file being deleted having the "read-only" attribute set on Windows.
Fix it by removing the read-only attribute automatically.
While we're here, also change `crates-io.py` to do the same thing, as it's used by `setup.py`.
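
A minimal sketch of the general technique, not necessarily the exact code in this commit: give `shutil.rmtree` an error hook that clears the read-only bit and retries the failed operation. The helper names are hypothetical. The version check is there because Python 3.12 renamed the hook parameter from `onerror` to `onexc`.

```python
import os
import shutil
import stat
import sys


def _clear_readonly_and_retry(func, path, _exc):
    """Error hook for shutil.rmtree: make the path writable, then retry."""
    os.chmod(path, stat.S_IWRITE)
    func(path)


def rmtree_force(path):
    """Remove a tree even if some files carry the read-only attribute."""
    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=_clear_readonly_and_retry)
    else:
        shutil.rmtree(path, onerror=_clear_readonly_and_retry)
```

The hook only runs when a removal fails, so trees without read-only files take the normal fast path.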

Reviewed By: ikostia

Differential Revision: D13413081

fbshipit-source-id: b10c44fd152a61c021edf6a8d86cb82a339f366f
2018-12-11 12:19:02 -08:00
Jun Wu
93e4f4ef7b commitcloud: attempt to fix no '_localstr_to_mysql' error
Summary:
There are certain commits that failed to backup with this error:

  remote: Traceback (most recent call last):
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 590, in _callcatch
  remote:     return scmutil.callcatch(ui, func)
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/scmutil.py", line 160, in callcatch
  remote:     return func()
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 571, in _runcatchfunc
  remote:     return _dispatch(req)
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 1357, in _dispatch
  remote:     lui, repo, cmd, fullargs, ui, options, d, cmdpats, cmdoptions
  remote:   File "/usr/lib64/python2.7/site-packages/hgext/clienttelemetry.py", line 91, in _runcommand
  remote:     return orig(lui, repo, cmd, fullargs, ui, options, d, cmdpats, cmdoptions)
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 1061, in runcommand
  remote:     ret = _runcommand(ui, options, cmd, d)
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 1369, in _runcommand
  remote:     return cmdfunc()
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 1354, in <lambda>
  remote:     d = lambda: util.checksignature(func)(ui, *args, **strcmdopt)
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/util.py", line 1319, in check
  remote:     return func(*args, **kwargs)
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/util.py", line 1319, in check
  remote:     return func(*args, **kwargs)
  remote:   File "/usr/lib64/python2.7/site-packages/hgext/directaccess.py", line 118, in wrapwitherror
  remote:     return orig(ui, repo, *args, **kwargs)
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/util.py", line 1319, in check
  remote:     return func(*args, **kwargs)
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/commands/__init__.py", line 5911, in serve
  remote:     s.serve_forever()
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/sshserver.py", line 107, in serve_forever
  remote:     while self.serve_one():
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/sshserver.py", line 138, in serve_one
  remote:     rsp = wireproto.dispatch(self.repo, self, cmd)
  remote:   File "/usr/lib64/python2.7/site-packages/hgext/pushrebase/__init__.py", line 266, in _wireprodispatch
  remote:     return orig(repo, proto, command)
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/wireproto.py", line 671, in dispatch
  remote:     res = func(repo, proto, *args)
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/wireproto.py", line 1166, in unbundle
  remote:     r = exchange.unbundle(repo, gen, their_heads, "serve", proto._client())
  remote:   File "/usr/lib64/python2.7/site-packages/hgext/pushrebase/__init__.py", line 305, in unbundle
  remote:     result = orig(repo, cg, heads, source, url)
  remote:   File "/usr/lib64/python2.7/site-packages/hgext/hgsql.py", line 332, in unbundle
  remote:     return orig(repo, cg, *args, **kwargs)
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/exchange.py", line 2138, in unbundle
  remote:     op = bundle2.processbundle(repo, cg, op=op)
  remote:   File "/usr/lib64/python2.7/site-packages/mercurial/bundle2.py", line 469, in processbundle
  remote:     processparts(repo, op, unbundler)
  remote:   File "/usr/lib64/python2.7/site-packages/hgext/infinitepush/__init__.py", line 1394, in processparts
  remote:     storebundle(op, cgparams, bundlefile)
  remote:   File "/usr/lib64/python2.7/site-packages/hgext/infinitepush/__init__.py", line 1484, in storebundle
  remote:     index.addbundle(key, nodesctx)
  remote:   File "/usr/lib64/python2.7/site-packages/hgext/infinitepush/sqlindexapi.py", line 170, in addbundle
  remote:     data,
  remote:   File "/usr/lib/python2.7/site-packages/mysql/connector/cursor.py", line 567, in executemany
  remote:     self.execute(operation, params)
  remote:   File "/usr/lib/python2.7/site-packages/mysql/connector/cursor.py", line 477, in execute
  remote:     stmt = operation % self._process_params(params)
  remote:   File "/usr/lib/python2.7/site-packages/mysql/connector/cursor.py", line 355, in _process_params
  remote:     "Failed processing format-parameters; %s" % err)
  remote: ProgrammingError: Failed processing format-parameters; 'CustomConverter' object has no attribute '_localstr_to_mysql'

I'm not sure which code path creates localstr exactly. But it seems to be
created for non-utf8 strings. Since Python 2's `str` is actually `bytes` that
can store non-utf8 losslessly, let's just define the missing method that casts
localstr to str (bytes).
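
The idea can be sketched as follows, in Python 3 and without the real MySQL connector. Everything here is a stand-in: `BaseConverter` is a toy class that, like the error message above suggests, looks up a `_<typename>_to_mysql` method, and `localstr` is simplified to a plain `bytes` subclass.

```python
class localstr(bytes):
    """Simplified stand-in for Mercurial's localstr (assumption)."""


class BaseConverter:
    """Toy converter that dispatches on the lowercase type name of the value."""

    def to_mysql(self, value):
        method = "_%s_to_mysql" % type(value).__name__.lower()
        return getattr(self, method)(value)

    def _bytes_to_mysql(self, value):
        return value


class CustomConverter(BaseConverter):
    def _localstr_to_mysql(self, value):
        # Cast the subclass back to plain bytes; non-UTF-8 data stays intact.
        return bytes(value)
```

Without `_localstr_to_mysql`, the dispatch in `to_mysql` fails with an `AttributeError`, which matches the shape of the failure in the traceback.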

The change is not backported to hgsql intentionally since hgsql hasn't seen
similar issues.

Reviewed By: liubov-dmitrieva

Differential Revision: D13069914

fbshipit-source-id: 8185dc457f8e6ac98a484f5bc6d6e7008ddcee02
2018-12-11 11:54:02 -08:00
Jun Wu
cbab61fcab doc: fix a typo
Summary: Code block starts with `::`, not `:` in rst.

Reviewed By: kulshrax

Differential Revision: D13389445

fbshipit-source-id: 35e2948154e71b91c7c7f2cacfa3f9ed8b312521
2018-12-11 11:49:20 -08:00
Jun Wu
cc6b265d05 sparse: make sparse work on empty repo
Summary:
As the title. This exposes the next issue, which appears to be caused by
`forceincludematcher` not working with complex patterns.

Reviewed By: DurhamG

Differential Revision: D10861613

fbshipit-source-id: d58c74fdf5da2b0399fe69ca499169cd5887645f
2018-12-10 20:09:05 -08:00
Jun Wu
149eab7b65 treestate: add a test demonstrating NEED_CHECK is removed for files outside sparse
Summary:
Add a test about NEED_CHECK cleanup for files outside sparse. It does not
demonstrate anything wrong, unfortunately.

Reviewed By: DurhamG

Differential Revision: D12906651

fbshipit-source-id: 6e9e690ba431e141666ffdb431f62280e86bbf0b
2018-12-10 17:52:04 -08:00
Mark Thomas
ac25a0f173 pyrevisionstore: suppress warning of unused parameter
Summary:
The `known` parameter can't be renamed to `_known`, as the Python bindings
expect it to have the name `known`. However, we can suppress the warning by
assigning it to a local variable named `_known`. The compiler will optimize
this away.

Reviewed By: quark-zju

Differential Revision: D13239755

fbshipit-source-id: cc83be0441415dc3dd5959632dcab2af54d11513
2018-12-10 03:37:25 -08:00
Saurabh Singh
bc9e47bfb6 infinitepush: fix visibility issue during update
Summary:
I faced an issue where I had the following DAG:

```
A   B
@   o
|  /
| /
|/
o C
```

I did the following operations at that point:

```
hg hide -r A
hg up top
```

but the update took me back to A instead of taking me to B as I expected.
quark-zju debugged the issue and we found out that this was due to D6899232.
This commit fixes the issue by ensuring that we do not make a filtered repo
unfiltered before the update operation.

Reviewed By: DurhamG

Differential Revision: D13388494

fbshipit-source-id: 1678f97a4b3179720c30cf1366e9301a2c27a5ba
2018-12-07 18:07:16 -08:00
Saurabh Singh
1db0a03741 infinitepush: add a test to highlight visibility bug during update
Summary:
I faced an issue where I had the following DAG:

```
A   B
@   o
|  /
| /
|/
o C
```

I did the following operations at that point:

```
hg hide -r A
hg up top
```

but the update took me back to A instead of taking me to B as I expected.

This commit just adds a test case that captures this issue.

Reviewed By: quark-zju

Differential Revision: D13389113

fbshipit-source-id: 6febb5459db3fbb69bbb7d23b4f0c6e1a12d4130
2018-12-07 18:07:16 -08:00
Xinyue Zhang
65ed20fe71 Pass callsite to worker to enable thread-based windows worker from blobstore.
Summary:
This diff adds the `callsite` parameter to worker.worker and checks whether the passed-in callsite is enabled in the worker.enablecallsites config flag. This change essentially enables parallelization for lfs prefetch, which is called during the initial repo clone (when the mercurial cache is clean). Since blobstore.py seems to be thread-safe, we could also anticipate similar improvements for the more general `hg update` case by batching files to download.

For details, see T37718264
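
The gating described above might look roughly like the sketch below. `run_worker`, its signature, and the callsite string are hypothetical and do not reflect the real `worker.worker` API. The sketch only shows the pattern of opting individual callsites into a thread-based code path.

```python
from concurrent.futures import ThreadPoolExecutor


def run_worker(func, items, callsite=None, enabled_callsites=(), threads=4):
    """Apply func to items, using threads only when the callsite is opted in."""
    if callsite is not None and callsite in enabled_callsites:
        with ThreadPoolExecutor(max_workers=threads) as pool:
            # pool.map preserves input order, so both paths return the same list.
            return list(pool.map(func, items))
    return [func(item) for item in items]
```

Callers that do not pass a callsite, or pass one that is not in the enabled list, keep the serial behavior.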

Reviewed By: ikostia

Differential Revision: D13284680

fbshipit-source-id: c3b825033a28344e19ba5ca1621b59fe7b46b322
2018-12-07 15:57:41 -08:00
Durham Goode
ecc8d179ea hggit: add hg git-updatemeta command
Summary:
Now that git hashes are stored in the commit extras, let's add a
command that reads the latest commits in the repo and adds their git/hg mapping
to the map.

Reviewed By: quark-zju

Differential Revision: D13369867

fbshipit-source-id: 250a62979472c2ab68b2ca6f6c6463daca4bdb45
2018-12-07 12:23:34 -08:00
Jun Wu
e2c247c6fc remotefilelog: fix LFS compatibility with localpackdata
Summary:
The name "meta" used in remotefilelog.py is confusing. It can sometimes be
"filelog meta", sometimes "datapack meta". The old code passes the wrong
meta and caused issues.

Differential Revision: D13380634

fbshipit-source-id: a1a6ada2d880b6c28fe22c486c45a6203e4cf5db
2018-12-07 12:10:54 -08:00
Liubov Dmitrieva
4cdcd72015 Back out "[hg] hgsubversion: use a single transaction for hgsubversion pull"
Summary:
Original commit changeset: 502c7e020ba2

autogenerated with  `hg backout 502c7e020ba2a2df1ec245a845b6d8eb8b9c3456`  command

temporarily roll back because it breaks the unit test:

./run-tests.py comprehensive/test-hgsubversion-sqlite-revmap.py

Reviewed By: DurhamG

Differential Revision: D13377261

fbshipit-source-id: d437de11212439741fb97c230225cb5c906e4f04
2018-12-07 09:21:48 -08:00
Arun Kulshreshtha
80cc6e3bf1 don't build pymononokeapi on windows
Summary: The pymononokeapi Rust extension currently fails to build on Windows due to issues with OpenSSL. For now, don't build it on Windows to unbreak the build. This is acceptable because this crate does not yet have any significant functionality.

Reviewed By: DurhamG

Differential Revision: D13371991

fbshipit-source-id: ab3c3de116fc6a04b4a706919ceb541349eb353e
2018-12-06 23:34:18 -08:00
Jun Wu
b51bf2af16 filemerge: fix fcd.p1 raising exception
Summary:
See user report https://fburl.com/latdfq5n. `fcd` in the code path is not
always a `workingfilectx` and calling `p1` might raise if it does not have
a parent.

Reviewed By: phillco

Differential Revision: D13144605

fbshipit-source-id: 8961943e65504b43b4a9d1a4828755fdf8ca0915
2018-12-06 19:38:36 -08:00
Jun Wu
232dbea03a config: use Rust-backed parselist
Summary:
Now that `parselist` is available in Rust, let's switch to that.
It's about 50x faster.

```
In [1]: s='command, commandfinish, commandexception, exthook, pythonhook, fsmonitor, fsmonitor_details,
   ...:  fsmonitor_status, treedirstate, watchman-command, merge-resolve'

In [3]: p1=m.config.parselist

In [4]: %timeit p1(s)
10000 loops, best of 3: 97.2 µs per loop

In [5]: from mercurial.rust import config

In [6]: p2=config.parselist

In [9]: %timeit p2(s)
100000 loops, best of 3: 2.03 µs per loop
```

Reviewed By: DurhamG

Differential Revision: D9332596

fbshipit-source-id: e7b7d5d41f537d663d36169e14d27ed3d395a59c
2018-12-06 18:23:52 -08:00
Saurabh Singh
a249ab6d9d globalrevs: return svn revision as global revision for older commits
Summary:
This requirement was discussed in both this thread:
https://fb.facebook.com/groups/247583349136387/permalink/359377874623600/ and
D13277462. Basically, we want

```
hg log -r <hash> -T "globalrev()"
```

to return the svn revision in case there is no global revision assigned to the
commit, which is possible for older commits.
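
A sketch of the fallback logic, operating on a plain dict of commit extras. The extras keys (`global_rev`, `convert_revision`) and the `svn:<uuid>/<path>@<rev>` value format are assumptions made for illustration, as is the function name.

```python
def globalrev_or_svnrev(extras):
    """Return the global revision, falling back to the svn revision if absent."""
    rev = extras.get("global_rev")  # assumed extras key
    if rev is not None:
        return int(rev)
    # Older commits: assume a value shaped like "svn:<uuid>/<path>@<rev>".
    svnrev = extras.get("convert_revision", "")
    if svnrev.startswith("svn:") and "@" in svnrev:
        return int(svnrev.rsplit("@", 1)[1])
    return None
```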

Reviewed By: phillco

Differential Revision: D13370832

fbshipit-source-id: 5f8ba4c1781f83204de775127554a4944f00bb1d
2018-12-06 17:55:08 -08:00
Saurabh Singh
4bf5dab30d globalrevs: split method to allow wrapping
Summary:
This commit just splits the `_globalrevkw` method to allow for
wrapping in D13370340.

Reviewed By: phillco

Differential Revision: D13370833

fbshipit-source-id: aa28d8e0771c2fb10aee8a267c66f8fdb7842b2b
2018-12-06 17:55:08 -08:00
Saurabh Singh
e0cd5f91fd hgsubversion: reformat using black
Summary:
Accidentally ran black on `hgsubversion` extension and noticed that
this was fixed.

Reviewed By: phillco

Differential Revision: D13370343

fbshipit-source-id: ffdaff23978cbcfe6a9443e1603a535efc44abab
2018-12-06 17:55:08 -08:00
Durham Goode
ab1f7b6eef hggit: support indexedlog git map file
Summary:
Add a config option to use indexedlog as the node map storage instead
of a flat text file.

Reviewed By: quark-zju

Differential Revision: D13062573

fbshipit-source-id: ae14df24a4e36c59fbd9ec82d785aac52a2f8b5f
2018-12-06 16:21:37 -08:00
Jun Wu
417dc50086 util: remove xmlrpclib
Summary:
The last use of it was the bugzilla extension, which was removed.
Therefore remove it.

Reviewed By: DurhamG

Differential Revision: D13285203

fbshipit-source-id: 47d1fc9df68369660a63313fe4f39aaaffd7f239
2018-12-06 15:37:40 -08:00
Jun Wu
7c69c26d52 date: fix parsing months
Summary:
Backports my upstream patch (https://phab.mercurial-scm.org/D2289)

Discovered by nemo on #mercurial freenode IRC, some months did not parse.

Reviewed By: DurhamG

Differential Revision: D12983571

fbshipit-source-id: b33227b0c986180966579afbb2b7c11fd48d275b
2018-12-06 15:14:18 -08:00
Jun Wu
421c7b3f45 indexedlog: add a tool to dump indexedlog content
Summary: The tool can dump indexedlog content. Useful for manually investigating issues.

Reviewed By: DurhamG

Differential Revision: D13051387

fbshipit-source-id: 8687a1aa9dfb54776e80f184208c49da2492c34d
2018-12-06 14:57:52 -08:00
Jun Wu
54dc931140 indexedlog: use inlined leaf entries to further reduce index size
Summary:
Add a new entry type - INLINE_LEAF, which embeds the EXT_KEY and LINK entries
to save space.

The index size for referred keys is significantly reduced with little overhead:

  index insertion (owned key)     3.732 ms
  index insertion (referred key)  3.604 ms
  index flush                    11.868 ms
  index lookup (memory)           1.159 ms
  index lookup (disk, no verify)  2.175 ms
  index lookup (disk, verified)   4.303 ms
  index size (5M owned keys)     216626039
  index size (5M referred keys)   96616431
    11.87s user 2.96s system 98% cpu 15.107 total

The breakdown of the "5M referred keys" size is:

  type          count     bytes
  radixes       1729472   33835772
  inline_leafs  5000000   62780651

There are no other kinds of entries stored.

Previously, the index size of referred keys was:

  index size (5M referred keys)  136245815 bytes

So it's 136MB -> 96MB, a 29% decrease.

Reviewed By: DurhamG

Differential Revision: D13036801

fbshipit-source-id: 27e68e4b6c332c1dc419abc6aba69271952e4b3d
2018-12-06 14:57:52 -08:00
Jun Wu
a4958163ee indexedlog: optimize size of radix entries (BC)
Summary:
Replace the 20-byte "jump table" with 3-byte "flag + bitmap". This saves space
for indexes less than 4GB. There are some reserved bits in the "flag" so if we
run into space issues when indexes are larger than 4GB, we can try adding
6-byte integer, or VLQ back without breaking backwards-compatibility.

It seems to hurt flush performance a bit, because we have to scan the child
array twice. However, lookup (the most important performance) does not change
much. And the index is more compact.
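
To make the "flag + bitmap" idea concrete, here is a sketch in Python rather than the Rust of the actual implementation. The layout is an assumption: a 1-byte flag, a 2-byte bitmap with one bit per hex-digit child, then a 4-byte offset for each present child only. The real on-disk format may differ.

```python
import struct

RADIX_FANOUT = 16  # one child slot per hex digit (assumption)


def encode_radix(children):
    """Encode 16 child offsets as flag + bitmap + offsets of present children."""
    assert len(children) == RADIX_FANOUT
    bitmap = 0
    for i, offset in enumerate(children):  # first scan: build the bitmap
        if offset:
            bitmap |= 1 << i
    out = bytearray(struct.pack("<BH", 0, bitmap))  # 1-byte flag, 2-byte bitmap
    for offset in children:  # second scan: write only non-zero offsets
        if offset:
            out += struct.pack("<I", offset)
    return bytes(out)


def decode_radix(data):
    """Inverse of encode_radix; absent children decode to offset 0."""
    _flag, bitmap = struct.unpack_from("<BH", data, 0)
    children = [0] * RADIX_FANOUT
    pos = 3
    for i in range(RADIX_FANOUT):
        if bitmap & (1 << i):
            (children[i],) = struct.unpack_from("<I", data, pos)
            pos += 4
    return children
```

The two loops in `encode_radix` correspond to scanning the child array twice, which is the flush cost mentioned above. Sparse radix entries pay only for the children they actually have.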

After:

  index flush                    19.644 ms
  index lookup (disk, no verify)  2.220 ms
  index lookup (disk, verified)   4.067 ms
  index size (5M owned keys)     216626039 bytes
  index size (5M referred keys)  136245815 bytes

Before:

  index flush                    16.764 ms
  index lookup (disk, no verify)  2.205 ms
  index lookup (disk, verified)   4.030 ms
  index size (5M owned keys)     240838647 bytes
  index size (5M referred keys)  160458423 bytes

For the "referred key" case, it's 160->136MB, a 15% decrease.

A detailed breakdown of the components of the index is:

After:

  type       count     bytes (using owned keys)
  radixes    1729472   33835772
  links      5000000   27886336
  leafs      5000000   44629384
  keys       5000000  110000000

  type       count     bytes (using referred keys)
  radixes    1729472   33835772
  links      5000000   27886336
  leafs      5000000   44629384
  ext_keys   5000000   29894315

Before:

  type       count     bytes (using owned keys)
  radixes    1729472   58048380
  links      5000000   27886336
  leafs      5000000   44903923
  keys       5000000  110000000

  type       count     bytes (using referred keys)
  radixes    1729472   58048380
  links      5000000   27886336
  leafs      5000000   44629384
  ext_keys   5000000   29894315

Leaf nodes are taking too much space. It seems the next big optimization might
be inlining ext_keys into leafs.

Reviewed By: DurhamG, markbt

Differential Revision: D13028196

fbshipit-source-id: 6043b16fd67a497eb52d20a17e153fcba5cb3e81
2018-12-06 14:57:52 -08:00
Jun Wu
d8117b3b04 indexedlog: increase key count for size test
Summary:
Since the size test only runs once, we can use a larger number of keys. This is
closer to some production use-cases.

`cargo bench size` shows:

  index size (5M owned keys)     240838647
  index size (5M referred keys)  160458423

It currently uses 32 bytes per key for 5M referred keys.

Reviewed By: markbt

Differential Revision: D13027880

fbshipit-source-id: 726f5fb2da056e77ab93d82fda9f1afa500d0a8d
2018-12-06 14:57:52 -08:00
Jun Wu
55b6331aa4 indexedlog: add more benchmarks
Summary:
Add benchmarks about index sizes, and a benchmark of insertion using key
references.

An example `cargo bench` result running on my devserver looks like:

  index insertion (owned key)     3.551 ms
  index insertion (referred key)  3.713 ms
  index flush                    20.648 ms
  index lookup (memory)           1.087 ms
  index lookup (disk, no verify)  2.041 ms
  index lookup (disk, verified)   4.347 ms
  index size (owned key)            886010
  index size (referred key)         534298

Reviewed By: markbt

Differential Revision: D13027879

fbshipit-source-id: 70644c504026ffee2122d857d5035f5b7eea4f42
2018-12-06 14:57:52 -08:00
Jun Wu
d7129256d4 indexedlog: switch checksum table to little endian (BC)
Summary:
For checksum values like xxhash, there is no benefit using big endian. Switch
to little endian so it's slightly faster on the major platforms we
care about.

This is a breaking change. However, the format is not used in production yet.
So there is no migration code.

Reviewed By: markbt

Differential Revision: D13015465

fbshipit-source-id: ca83d19b3328370d089b03a33e848e64b728ef2a
2018-12-06 14:57:52 -08:00
Jun Wu
75b4f92c44 indexedlog: support different checksum functions for Log entries (BC)
Summary:
Previously, the format of a Log entry was hard-coded - length, xxhash, and
content. The xxhash always takes 8 bytes.

For small (ex. 40-byte) entries, xxhash32 is actually faster and takes less
disk space.

Introduce the "entry flags" concept so we can store some metadata about what
checksum function to use. The concept could be potentially used to support
other new format changes at per entry level in the future.

As we're here, also support data without checksums. That can be useful for
content with its own checksum, like a blob store with its own SHA1 integrity
check.
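
A sketch of the "entry flags" concept, in Python instead of Rust. The flag values and header layout are invented for the example. Because xxhash is not in the Python standard library, `zlib.crc32` stands in for xxhash32 and an 8-byte BLAKE2b digest stands in for the 64-bit xxhash.

```python
import hashlib
import struct
import zlib

# Hypothetical flag values selecting the checksum function per entry.
FLAG_NO_CHECKSUM = 0
FLAG_CHECKSUM_32 = 1
FLAG_CHECKSUM_64 = 2


def _sum32(data):
    return zlib.crc32(data) & 0xFFFFFFFF  # stand-in for xxhash32


def _sum64(data):
    digest = hashlib.blake2b(data, digest_size=8).digest()  # stand-in for xxhash64
    return int.from_bytes(digest, "little")


def encode_entry(data, flag):
    """Layout (assumed): flag (1 byte), length (4 bytes), optional checksum, data."""
    header = struct.pack("<BI", flag, len(data))
    if flag == FLAG_CHECKSUM_32:
        header += struct.pack("<I", _sum32(data))
    elif flag == FLAG_CHECKSUM_64:
        header += struct.pack("<Q", _sum64(data))
    elif flag != FLAG_NO_CHECKSUM:
        raise ValueError("unknown entry flag: %r" % flag)
    return header + data


def decode_entry(buf):
    """Return the entry data, verifying whichever checksum the flag selects."""
    flag, length = struct.unpack_from("<BI", buf, 0)
    pos = 5
    expected = None
    if flag == FLAG_CHECKSUM_32:
        (expected,) = struct.unpack_from("<I", buf, pos)
        pos += 4
    elif flag == FLAG_CHECKSUM_64:
        (expected,) = struct.unpack_from("<Q", buf, pos)
        pos += 8
    elif flag != FLAG_NO_CHECKSUM:
        raise ValueError("unknown entry flag: %r" % flag)
    data = bytes(buf[pos:pos + length])
    actual = {FLAG_CHECKSUM_32: _sum32, FLAG_CHECKSUM_64: _sum64}.get(flag)
    if actual is not None and actual(data) != expected:
        raise ValueError("checksum mismatch")
    return data
```

With this layout, a 40-byte entry costs 4 fewer bytes with the 32-bit checksum than with the 64-bit one, and entries with their own integrity check can skip the checksum entirely.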

Performance-wise, log insertion is slower (but the majority of the insertion
overhead would be on the index part), while iteration is a little bit faster,
perhaps because the log can use less data.

Before:

  log insertion                  15.874 ms
  log iteration (memory)          6.778 ms
  log iteration (disk)            6.830 ms

After:

  log insertion                  18.114 ms
  log iteration (memory)          6.403 ms
  log iteration (disk)            6.307 ms

Reviewed By: DurhamG, markbt

Differential Revision: D13051386

fbshipit-source-id: 629c251633ecf85058ee7c3ce7a9f576dfac7bdf
2018-12-06 14:57:52 -08:00
Jun Wu
049cd99f05 indexedlog: use non-VLQ encoding for xxhash (BC)
Summary:
An xxhash result usually won't have leading zeros. So VLQ encoding is not an
efficient choice. Use non-VLQ encoding instead.
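
The size argument can be checked with a small sketch. The VLQ variant below is a generic base-128 varint (7 payload bits per byte, least significant group first), which may not match the exact variant indexedlog used. The fixed-width byte order is likewise an arbitrary choice here.

```python
import struct


def vlq_encode(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def fixed_encode(value):
    """Encode a 64-bit unsigned integer in exactly 8 bytes."""
    return struct.pack("<Q", value)
```

A small number such as 300 takes 2 bytes as a varint, but a 64-bit value with its top bit set needs 10 bytes, compared with 8 bytes in the fixed-width form. Hash values are spread across the whole 64-bit range, so they almost always fall in the expensive case.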

Performance-wise, this is noticeably faster than before:

  log insertion                  14.161 ms
  log insertion with index      102.724 ms
  log flush                      11.336 ms
  log iteration (memory)          6.351 ms
  log iteration (disk)            7.922 ms
    10.18s user 3.66s system 97% cpu 14.218 total
  log insertion                  13.377 ms
  log insertion with index       97.422 ms
  log flush                      11.792 ms
  log iteration (memory)          6.890 ms
  log iteration (disk)            7.139 ms
    10.20s user 3.56s system 97% cpu 14.117 total
  log insertion                  14.573 ms
  log insertion with index       94.216 ms
  log flush                      18.993 ms
  log iteration (memory)          7.867 ms
  log iteration (disk)            7.567 ms
    9.85s user 3.73s system 96% cpu 14.073 total
  log insertion                  15.526 ms
  log insertion with index       98.868 ms
  log flush                      19.600 ms
  log iteration (memory)          7.533 ms
  log iteration (disk)            7.150 ms
    10.13s user 4.02s system 96% cpu 14.647 total
  log insertion                  14.629 ms
  log insertion with index      100.449 ms
  log flush                      20.997 ms
  log iteration (memory)          7.299 ms
  log iteration (disk)            7.518 ms
    10.14s user 3.65s system 96% cpu 14.274 total

This is a format-breaking change. Fortunately we haven't really used the old
format in production yet.

Reviewed By: DurhamG, markbt

Differential Revision: D13015463

fbshipit-source-id: 6e7e4f7a845ea8dbf0904b3902740b65cc7467d5
2018-12-06 14:57:52 -08:00
Jun Wu
42c3ef6eb6 indexedlog: add benchmark for "log"
Summary:
Some simple benchmark for "log". The initial result running from my devserver
looks like:

  log insertion                  33.146 ms
  log insertion with index      106.449 ms
  log flush                       9.623 ms
  log iteration (memory)         10.644 ms
  log iteration (disk)           11.517 ms
    13.75s user 3.61s system 97% cpu 17.778 total
  log insertion                  27.906 ms
  log insertion with index      107.683 ms
  log flush                      19.204 ms
  log iteration (memory)         10.239 ms
  log iteration (disk)           11.118 ms
    12.89s user 3.55s system 97% cpu 16.924 total
  log insertion                  31.645 ms
  log insertion with index      109.403 ms
  log flush                       9.416 ms
  log iteration (memory)         10.226 ms
  log iteration (disk)           10.757 ms
    13.07s user 3.02s system 97% cpu 16.423 total
  log insertion                  31.848 ms
  log insertion with index      109.332 ms
  log flush                      18.345 ms
  log iteration (memory)         10.709 ms
  log iteration (disk)           11.346 ms
    13.12s user 3.70s system 97% cpu 17.276 total
  log insertion                  29.665 ms
  log insertion with index      106.041 ms
  log flush                      16.159 ms
  log iteration (memory)         10.367 ms
  log iteration (disk)           11.110 ms
    12.99s user 3.27s system 97% cpu 16.717 total

Reviewed By: markbt

Differential Revision: D13015464

fbshipit-source-id: 035fee6c8b6d0bea4cfe194eed3d58ba4b5ebcb8
2018-12-06 14:57:52 -08:00
Durham Goode
bf3cad3004 hggit: store git hash in hg extras
Summary:
In order to move our hg-git mirroring off of the main hg servers, we
need to make it possible for the hg servers to compute the hg-git mapping
without having the entire git repository available. To do so, let's store the
git hash as an extra in the hg commit.

This breaks bidirectionality, but we've long since not needed that.

Reviewed By: phillco

Differential Revision: D13362980

fbshipit-source-id: 51df709bc5e77d78bb963abf90d0c35bb743d966
2018-12-06 12:35:14 -08:00
Durham Goode
c1f85ad54d hggit: move git_map storage behind a GitMap class
Summary:
A future diff will store the GitMap data in a rust storage structure.
Let's start by refactoring the python code to meet the same API as the rust
code.

Reviewed By: quark-zju

Differential Revision: D13062574

fbshipit-source-id: 3a1573afb98b73dacfc6e9e9efc5504a8b5ccbfb
2018-12-06 11:47:41 -08:00
Durham Goode
1a3a0bcd72 nodemap: add key iteration
Summary:
An upcoming diff will need the ability to iterate over all the keys in
the store. So let's expose that functionality.

Reviewed By: quark-zju

Differential Revision: D13062575

fbshipit-source-id: a173fcdbbf44e2d3f09f7229266cca6f3e67944b
2018-12-06 11:47:41 -08:00
Durham Goode
60b3bebaff nodemap: python bindings for rust nodemap
Summary: Simple python bindings for the new nodemap rust structure

Reviewed By: quark-zju

Differential Revision: D13062572

fbshipit-source-id: d60407b87bfc19b496de09273a9c8d6b59af0b8b
2018-12-06 11:47:41 -08:00
Durham Goode
e9b755198c nodemap: introduce rust bidirectional node map
Summary:
Introduces a nodemap structure that stores the mapping between two
nodes with bidirectional indexes.
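
Conceptually, a bidirectional node map keeps one index per direction so either node can be used as the lookup key. The sketch below uses two in-memory dicts in Python. The class and method names are hypothetical and it ignores the on-disk storage the Rust structure provides.

```python
class BidirectionalNodeMap:
    """Toy in-memory mapping between two node ids, indexed in both directions."""

    def __init__(self):
        self._first_to_second = {}
        self._second_to_first = {}

    def add(self, first, second):
        """Record a pair and update both indexes."""
        self._first_to_second[first] = second
        self._second_to_first[second] = first

    def lookup_by_first(self, first):
        return self._first_to_second.get(first)

    def lookup_by_second(self, second):
        return self._second_to_first.get(second)

    def items(self):
        """Iterate over all (first, second) pairs."""
        return iter(self._first_to_second.items())
```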

Reviewed By: quark-zju

Differential Revision: D13047698

fbshipit-source-id: 967bf4b26a4b57e4fa2421a342edb21d3a5adbf6
2018-12-06 11:47:41 -08:00
Durham Goode
668ba5165c indexedlog: add an iterator function for iterating over keys
Summary:
You can currently iterate over indexedlog entries, but there's no way to
iterate over the keys without keeping a copy of the index function with you.
Let's add a key iterator function.

Reviewed By: quark-zju

Differential Revision: D13010744

fbshipit-source-id: 1fcaf959ae82417e5cbafae7c1927c3ae8f8e76a
2018-12-06 11:47:41 -08:00
Durham Goode
60b42574e5 remotefilelog: fix rename traversals across multiple stores
Summary:
This bug has been here for 2+ years. Basically, when gathering the
ancestors for a given file node, if it traversed a rename it would lose track of
the new name and instead look up the new hash by the old name, which would fail.

We didn't hit this often, because it only causes a problem when you have partial
history and have to go fetch more history. In most remotefilelog repos we
download all of history, so you always have everything and therefore never hit
this.
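
The corrected traversal can be sketched like this. The store layout is an assumption for illustration: a dict mapping `(name, node)` to `(p1, p2, copyfrom)`, where `copyfrom` is the old file name when the revision is a rename and `p1` then refers to a node of that old file.

```python
NULLID = b"\0" * 20


def ancestors(store, name, node):
    """Collect ancestors of (name, node), following renames by name."""
    result = {}
    queue = [(name, node)]
    while queue:
        curname, curnode = queue.pop()
        if curnode == NULLID or curnode in result:
            continue
        p1, p2, copyfrom = store[(curname, curnode)]
        result[curnode] = (p1, p2, copyfrom)
        # After a rename, p1 lives under the copy source name, not the
        # current name. Looking it up under curname is the bug described above.
        queue.append((copyfrom or curname, p1))
        queue.append((curname, p2))
    return result
```

A traversal that always queues `(curname, p1)` would raise a `KeyError` on the first rename, because the parent hash does not exist under the new name.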

Reviewed By: kulshrax, singhsrb

Differential Revision: D13332459

fbshipit-source-id: 120bfe9ac618a4979e1685f24dc6462fc7415b1b
2018-12-06 11:07:21 -08:00
Durham Goode
dab12d9939 remotefilelog: test for bad ancestor traversal
Summary:
Adds a test that demonstrates a bug in our ancestor traversal logic.
The test requires very particular circumstances:

Assuming a file x was modified to x' then renamed to z:

1. The rename is required because the bug is in keeping track of the copy source name
2. Version x' must be in the same pack as version z, because the bug is in the
logic that traverses a single pack's history.
3. Version x must be in a different pack than x' and z, because the bug is about
trying to look up version x after having lost track of the correct name when
going from z to x'.
4. We must be using rust historypacks, because that implementation attempts to
traverse the entire history within the pack, while the python implementation
stops at renames.

Reviewed By: kulshrax

Differential Revision: D13332458

fbshipit-source-id: b01d09fb8ebd27414e4f6dba06a6d0b1f26ac13c
2018-12-06 11:07:21 -08:00
Durham Goode
179e3ba47e hgsubversion: use a single transaction for hgsubversion pull
Summary:
hgsubversion was doing one transaction per commit, which is both slow
and also causes a lot of packs to be created when operating on a remotefilelog
client.

Let's make it use a single transaction instead. Unfortunately the svn metadata
is not integrated with Mercurial transactions. If we're using a normal flat-text
revmap, if there is an exception when running pull, it will need to svn rebuild
the metadata to remove the bad data and continue.

If we're using a sqlite revmap, then we've integrated with the sqlite
transaction, so it should work as expected.

Reviewed By: phillco

Differential Revision: D13347708

fbshipit-source-id: 502c7e020ba2a2df1ec245a845b6d8eb8b9c3456
2018-12-06 09:43:07 -08:00
Jun Wu
7f87056c37 crecord: show hint about how to use the text interface
Summary:
When the curses interface cannot be used, prompt the user about the text
interface to unblock.

Reviewed By: akushner

Differential Revision: D12987094

fbshipit-source-id: 3eff84d9daaaf19aaa08ebf28cbd7c7bf98f9e9a
2018-12-05 20:09:19 -08:00
Jun Wu
f504f4d8a4 doc: rewrite WritingTests
Summary:
Rewrite it. Add more modern features. Pay less attention to run-tests.py flags
or shell portability issues, since we're enforcing bash already.

Reviewed By: phillco

Differential Revision: D10503344

fbshipit-source-id: 535494d4f1ecccb2b4a45bc6929b6f7398bde70d
2018-12-05 19:42:54 -08:00
Jun Wu
c92e6755d7 plain: add a diff exception
Summary:
Oculus wants to preserve the "do not show binary diff" config while running
`hg export` in automation, since printing the binary diff might just OOM the
container. Let's add a plain exception for it.

Reviewed By: zhh95

Differential Revision: D13111212

fbshipit-source-id: 34af58ac0917de3b3231e637774896d882585e26
2018-12-05 19:23:41 -08:00
Jun Wu
1a18688d9f tests: add a test showing some corner cases in matchmod.match API
Summary: As the title. To help decide the right way for D13332653.

Reviewed By: DurhamG

Differential Revision: D13333157

fbshipit-source-id: 4fe44fffa48409e790efc1ba7b0d681350512eac
2018-12-05 17:58:42 -08:00
Durham Goode
4fcf7c436c hgsubversion: fix match constructor
Summary: This needs to be patterns not pattern.

Reviewed By: quark-zju

Differential Revision: D13345626

fbshipit-source-id: fd998610d3a6840905f82e4b37fd240e8ac7f8d9
2018-12-05 12:37:22 -08:00
Xavier Deguillard
108b8f9d41 treemanifest: corrupt the treemanifest in a test.
Summary:
We've seen a corrupted repository in the wild that we were unable to garbage
collect. Let's add a test to verify that the other repositories will be garbage
collected.

Reviewed By: DurhamG

Differential Revision: D13328487

fbshipit-source-id: 6f6a69d14455d00e468c5b9186d65cca8fd5c1c3
2018-12-05 09:09:18 -08:00
Durham Goode
5bebd357b5 hgsubversion: fix directory tree pattern matching
Summary:
Match patterns don't work the way I expected. They require the path to
not have '/' at the end, and they don't seem to match when a file or sub
directory is appended. Fix it by setting default to 'path'.

```
> matchmod.match("", "/", patterns=['foo/'])('foo/bar')
False
> matchmod.match("", "/", default='path', patterns=['foo/'])('foo/bar')
True
```

Reviewed By: quark-zju

Differential Revision: D13332653

fbshipit-source-id: e0f3fa9a51d36a40ac8a9c54f73296f431536d3c
2018-12-04 19:12:37 -08:00