Commit Graph

18022 Commits

Author SHA1 Message Date
Yuya Nishihara
7109431581 py3: use bytes() to cast to immutable bytes in pure.bdiff.bdiff() 2017-03-26 16:16:45 +09:00
Yuya Nishihara
b24f9e13b8 bdiff: drop support for array.array argument from pure.bdiff.bdiff()
Thanks to 54d8e724da64, we no longer pass array.array('c') object to
bdiff().
2017-03-26 16:14:04 +09:00
Augie Fackler
653b207160 revsetlang: fix _quote on int on python3
Thanks to Yuya for spotting the need.
2017-03-26 16:48:29 -04:00
Jun Wu
fea7f2c74c debugfsinfo: improve case-sensitive testing
Previously the case-sensitive test was for the current directory, and is
fragile with errors, and could remove a real file called ".debugfsinfo".

This patch improves the case-sensitive testing so it test the given path
using a unique temporary file, and does not crash on errors.
2017-03-26 17:59:33 -07:00
Jun Wu
00bc5a1203 debugfsinfo: show fstype for given path 2017-03-26 17:29:37 -07:00
Pulkit Goyal
eb657f47dd diff: use pycompat.{byteskwargs, strkwargs} to switch opts b/w bytes and str 2017-03-26 20:58:21 +05:30
Pulkit Goyal
dbbae871d3 patch: make regular expressions bytes by adding b'' 2017-03-26 20:54:50 +05:30
Pulkit Goyal
dbc47c9005 dispatch: use pycompat.maplist() instead of map() to get a list 2017-03-26 20:49:18 +05:30
Jun Wu
dc9779bcfc statfs: detect more filesystems on Linux
Previously, the code only has what manpager says. In <linux/magic.h>, there
are more defined. This patch adds filesystems that appear in the current
Arch Linux's /proc/filesystems (autofs, overlay, securityfs) and f2fs, which
was seen in news.
2017-03-25 12:58:55 -07:00
Matt Harbison
721da7fc07 repair: use context manager for lock management
If repo.lock() raised inside of the try block, 'tr' would have been None in the
finally block where it tries to release().  Modernize the syntax instead of just
winching the lock out of the try block.

I found several other instances of acquiring the lock inside of the 'try', but
those finally blocks handle None references.  I also started switching some
trivial try/finally blocks to context managers, but didn't get them all because
indenting over 3x for lock, wlock and transaction would have spilled over 80
characters.  That got me wondering if there should be a repo.rwlock(), to handle
locking and unlocking in the proper order.

It also looks like py27 supports supports multiple context managers for a single
'with' statement.  Should I hold off on the rest until py26 is dropped?
2017-03-23 23:47:23 -04:00
Gregory Szorc
ee826b7708 gitweb: use monospace font for commit messages
Commit messages often contain vertically aligned text. The default
paper style already uses monospace fonts for rendering commit messages.
And, AFAICT, a number of Git servers also render commit messages
with monospace. It seems like the reasonable thing to do.

This commit converts all instances of the full commit message
in the gitweb style to render with monospace.
2017-03-24 19:52:43 -07:00
Matt Harbison
bbf17f3e32 pager: improve support for various flavors of more on Windows
Hardcoding 'more' -> 'more.com' means that 'more.exe' from MSYS would need to be
configured with its *.exe extension.  This will resolve to either one, as
cmd.exe would have done if the command ran through the shell.

Something that's maybe problematic with this is it comes after 'pageractive' and
various ui configs have been set by the calling method.  But the other early
exits in this method don't undo those changes either.
2017-03-24 22:40:08 -04:00
Jun Wu
43184506ff statfs: avoid static allocation
Previously we have "static struct statfs" to return a string. That is not
multiple-thread safe. This patch moves the allocation to the caller to
address the problem.
2017-03-24 15:05:42 -07:00
Jun Wu
3633e80713 statfs: change Linux feature detection
Previously we check three things: "statfs" function, "linux/magic.h" and
"sys/vfs.h" headers. But we didn't check "struct statfs" or the "f_type"
field. That means if a system has "statfs" but "struct statfs" is not
defined in the two header files we check, or defined without the "f_type"
field, the compilation will fail.

This patch combines the checks (2 headers + 1 function + 1 field) together
and sets "HAVE_LINUX_STATFS". It makes setup.py faster (less checks), and
more reliable (immutable to the issue above).
2017-03-24 14:59:19 -07:00
FUJIWARA Katsunori
aaa8db9cef misc: update descriptions about removed file for filectxfn
Since 2eef89bfd70d, filectxfn for memctx should return None for
removed file instead of raising IOError.
2017-03-24 22:13:23 +09:00
Martin von Zweigbergk
2e8637ffda merge with stable 2017-03-24 08:37:26 -07:00
Gregory Szorc
6fb64a9cca changegroup: store old heads as a set
Previously, the "oldheads" variable was a list. On a repository at
Mozilla with 46,492 heads, profiling revealed that list membership
testing was dominating execution time of applying small changegroups.

This patch converts the list of old heads to a set. This makes
membership testing significantly faster. On the aforementioned
repository with 46,492 heads:

$ hg unbundle <file with 1 changeset>
before: 18.535s wall
after:   1.303s

Consumers of this variable only check for truthiness (`if oldheads`),
length (`len(oldheads)`), and (most importantly) item membership
(`h not in oldheads` - which occurs twice). So, the change to a set
should be safe and suitable for stable.

The practical effect of this change is that changegroup application
and related operations (like `hg push`) no longer exhibit an O(n^2)
CPU explosion as the number of heads grows.
2017-03-23 19:54:59 -07:00
Simon Farnsworth
ddbe7287e7 subrepo: move prompts out of the if (issue5505)
Prompts weren't available in the else clause
2017-03-20 04:36:55 -07:00
Augie Fackler
b00133b3e1 revsetlang: perform quoting using ui.escapestr instead of repr()
This changes one of the doctest results, but I'm pretty sure on
inspection that it's an equivalent result.
2017-03-23 10:46:50 -04:00
Augie Fackler
7257c843f2 revsetlang: add docstring with some tests to _quote 2017-03-23 10:41:34 -04:00
Augie Fackler
434ea09250 revsetlang: move quoting function to not be a closure
I'm about to change the implementation here and I'd like to add some
doctests, which means this needs to not be hidden inside another
function.
2017-03-19 01:14:19 -04:00
Augie Fackler
9acabc5cd1 revsetlang: portably bytestring-ify another pair of int() calls 2017-03-23 10:33:20 -04:00
Jun Wu
057a1065b9 util: enable hardlink for some BSD-family filesystems
Since we can now detect filesystems on FreeBSD and OSX. Add their major
filesystems (ufs, zfs for FreeBSD; hfs for OSX) to the hardlink whitelist.
2017-03-23 22:31:50 -07:00
Jun Wu
b84e57666e osutil: report fstype for BSD and OSX 2017-03-23 22:13:02 -07:00
Jun Wu
7fc770b181 debugfsinfo: use util.getfstype
This changes the behavior slightly. It now always prints fstype, regardless
of whether osutil.getfstype exists.
2017-03-23 12:03:19 -07:00
Jun Wu
443e227575 util: use util.getfstype 2017-03-23 12:01:18 -07:00
Jun Wu
43dc9d6102 util: add a getfstype method
The util version is a thin wrapper of the osutil version, which is not
always available.
2017-03-23 11:58:45 -07:00
Matt Harbison
6b815a9943 pager: fix the invocation of more on Windows
After 6a86fe38f1f6, with 'shell' being (mostly) set to False, invoking `more` no
longer worked.  Instead, a warning was printed and the pager was disabled.
Invoking `more.com` works.  Since a user may have configured 'pager.pager=more',
do this substitution at the end.  Surprisingly, `more` does allow for arguments,
so those are preserved.  This also allows `more` to work in MSYS.

Setting 'shell=False' runs the executable via CreateProcess(), which has rather
wonky rules for resolving an executable without an extension [1].  Resolving to
*.com is not among them.  Since 'shell=True' yields a cryptic error for a bad
$PAGER, and a *.exe program will work without specifying the extension, sticking
with current 'shell=False' seems like the right thing to do.  I don't think
there are any other *.com pagers out there, so this one special case seems OK.

If somebody wants to do something crazy that requires cmd.exe, I was able to get
normal paged output with 'pager.pager="cmd.exe /c more"'.  I assume you can
replace `more` with *.bat, *.vbs or various other creatures listed in $PATHEXT.

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/ms682425(v=vs.85).aspx
2017-03-20 00:19:33 -04:00
Martin von Zweigbergk
653c126971 help: format `commands` heading correctly
The number of dashes under it needs to match exactly for it to be
rendered as a heading. Without this change, the dashes end up on the
same line as "commands", and "hg help config.commands" does not work.
2017-03-22 16:36:53 -07:00
Martin von Zweigbergk
68e5878038 status: support commands.status.relative config
When the config is set to true, status output becomes relative to the
working directory. This has bugged me since I started using hg and it
turns it is sillily simple to support it (unless I missed something,
of course).

We could also add a --relative flag, but I would personally always
want that on, and I haven't heard any use for having it sometimes on,
so this patch only lets you enable it via config.
2017-03-21 17:50:44 -07:00
Martin von Zweigbergk
dc4c37c445 plain: ignore [commands] config
We only have commands.{update,rebase}.requiredest so far. We should
clearly ignore those two if HGPLAIN is in effect, and it seems like we
should ignore any future config that will be added in [commands] since
that is about changing the behavior of commands.

Thanks to Yuya for suggesting to centralize the code in ui.py.

While at it, remove the unnecessary False values passed to
ui.configbool() for the aforementioned config options.
2017-03-21 21:26:52 -07:00
Pierre-Yves David
3e8a941a87 checkheads: extract obsolete post processing in its own function
The checkheads function is long and complex, extract that logic in a subfunction
is win in itself. As the comment in the code says, this postprocessing is
currently very basic and either misbehave or fails to detect valid push in many
cases. My deeper motive for this extraction is to be make it easier to provide
extensive testing of this case and strategy to cover them. Final test and logic
will makes it to core once done.
2017-03-21 23:30:13 +01:00
Yuya Nishihara
8a64b55504 similar: use cheaper hash() function to test exact matches
We just need a hash table {fctx.data(): fctx} which doesn't keep fctx.data()
in memory. Let's simply use hash(fctx.data()) to put data out from memory,
and manage collided fctx objects by list.

This isn't significantly faster than using sha1, but is more correct as we
know SHA-1 collision attack is getting practical.

Benchmark with 50k added/removed files, on tmpfs:

  $ hg addremove --dry-run --time -q

  previous:   real 12.420 secs (user 11.120+0.000 sys 1.280+0.000)
  this patch: real 12.350 secs (user 11.210+0.000 sys 1.140+0.000)
2017-03-23 20:57:27 +09:00
Yuya Nishihara
8c9b9d4020 similar: take the first match instead of the last
It seems more natural. This makes the next patch slightly cleaner.
2017-03-23 20:52:41 +09:00
Yuya Nishihara
9290e50162 similar: do not look up and create filectx more than once
Benchmark with 50k added/removed files, on tmpfs:

  $ hg addremove --dry-run --time -q

  previous:   real 16.070 secs (user 14.470+0.000 sys 1.580+0.000)
  this patch: real 12.420 secs (user 11.120+0.000 sys 1.280+0.000)
2017-03-23 21:17:08 +09:00
Yuya Nishihara
0087b0c7b8 similar: use common names for changectx variables
We generally use 'wctx' and 'pctx' for working context and its parent
respectively.
2017-03-23 21:10:45 +09:00
Yuya Nishihara
6a4b18deca similar: get rid of quadratic addedfiles.remove()
Instead, build a set of files to be removed and recreate addedfiles
only if necessary.

Benchmark with 50k added/removed files, on tmpfs:

  $ hg addremove --dry-run --time -q

  original:   real 16.550 secs (user 15.000+0.000 sys 1.540+0.000)
  previous:   real 16.730 secs (user 15.280+0.000 sys 1.440+0.000)
  this patch: real 16.070 secs (user 14.470+0.000 sys 1.580+0.000)
2017-03-23 20:50:33 +09:00
Gregory Szorc
aff9286eb0 exchange: use v2 bundles for modern compression engines (issue5506)
Previously, `hg bundle zstd` on a non-generaldelta repo would
attempt to use a v1 bundle. This would fail because zstd is not
supported on v1 bundles.

This patch changes the behavior to automatically use a v2 bundle
when the user explicitly requests a bundlespec that is a compression
engine not supported on v1. If the bundlespec is <engine>-v1, it is
still explicitly rejected because that request cannot be fulfilled.
2017-03-16 12:33:15 -07:00
Gregory Szorc
3bbb6e2c52 exchange: reject new compression engines for v1 bundles (issue5506)
Version 1 bundles only support a fixed set of compression engines.
Before this change, we would accept any compression engine for v1
bundles, even those that may not work on v1. This could lead to
an error.

We define a fixed set of compression engines known to work with v1
bundles and we add checking to ensure a newer engine (like zstd)
won't work with v1 bundles.

I also took the liberty of adding test coverage for unknown compression
names because I noticed we didn't have coverage of it before.
2017-03-16 12:23:56 -07:00
Augie Fackler
5811358454 pycompat: verify sys.argv exists before forwarding it (issue5493)
ISAPI_WSGI doesn't set up sys.argv, so we have to look for the
attribute before assuming it exists.
2017-03-07 13:24:24 -05:00
Yuya Nishihara
ee998576d8 worker: flush messages written by child processes before exit
I found some child outputs were lost while testing the previous patch. Since
os._exit() does nothing special, we need to do that explicitly.
2017-02-25 12:48:50 +09:00
Rishabh Madan
d0ac5a9dcb ui: replace obsolete default-push with default:pushurl (issue5485)
Default-push has been deprecated in favour of default:pushurl. But "hg clone" still
inserts this in every hgrc file it creates. This patch updates the message by replacing
default-push with default:pushurl and also makes the necessary changes to test files.
2017-02-25 16:57:21 +05:30
FUJIWARA Katsunori
47ba9fae77 worker: ignore meaningless exit status indication returned by os.waitpid()
Before this patch, worker implementation assumes that os.waitpid()
with os.WNOHANG returns '(0, 0)' for still running child process. This
is explicitly specified as below in Python API document.

    os.WNOHANG
        The option for waitpid() to return immediately if no child
        process status is available immediately. The function returns
        (0, 0) in this case.

On the other hand, POSIX specification doesn't define the "stat_loc"
value returned by waitpid() with WNOHANG for such child process.

    http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitpid.html

CPython implementation for os.waitpid() on POSIX doesn't take any care
of this gap, and this may cause unexpected "exit status indication"
even on POSIX conformance platform.

For example, os.waitpid() with os.WNOHANG returns non-zero "exit
status indication" on FreeBSD. This implies os.kill() with own pid or
sys.exit() with non-zero exit code, even if no child process fails.

To ignore meaningless exit status indication returned by os.waitpid(),
this patch skips subsequent steps forcibly, if os.waitpid() returns 0
as pid.

This patch also arranges examination of 'p' value for readability.

FYI, there are some issues below about this behavior reported for
CPython.

    https://bugs.python.org/issue21791
    https://bugs.python.org/issue27808
2017-02-25 01:07:52 +09:00
Siddharth Agarwal
7d1a6f9777 bundle2: fix assertion that 'compression' hasn't been set
`n.lower()` will return `compression`, not `Compression`.
2017-02-13 11:43:12 -08:00
Pierre-Yves David
43b1ef004c wireproto: properly report server Abort during 'getbundle'
Previously Abort raised during 'getbundle' call poorly reported (HTTP-500 for
http, some scary messages for ssh). Abort error have been properly reported for
"push" for a long time, there is not reason to be different for 'getbundle'. We
properly catch such error and report them back the best way available. For
bundle, we issue a valid bundle2 reply (as expected by the client) with an
'error:abort' part. With bundle1 we do as best as we can depending of http or
ssh.
2017-02-10 18:20:58 +01:00
Pierre-Yves David
695fa85daa getbundle: cleanly handle remote abort during getbundle
bundle2 allow the server to report error explicitly. This was initially
implemented for push but there is not reason to not use it for pull too. This
changeset add logic similar to the one in 'unbundle' to the
client side of 'getbundle'. That logic make sure the error is properly reported
as "remote". This will allow the server side of getbundle to send clean "Abort"
message in the next changeset.
2017-02-10 18:17:20 +01:00
Pierre-Yves David
d00dbd00d9 bundle1: fix bundle1-denied reporting for pull over ssh
Changeset a0966f529e1b introduced a config option to have the server deny pull
using bundle1. The original protocol has not really been design to allow that
kind of error reporting so some hack was used. It turned the hack only works on
HTTP and that ssh server hangs forever when this is used. After further
digging, there is no way to report the error in a unified way. Using `ooberror`
freeze ssh and raising 'Abort' makes HTTP return a HTTP-500 without further
details. So with sadness we implement a version that dispatch according to the
protocol used.

Now the error is properly reported, but we still have ungraceful abort after
that. The protocol do not allow anything better to happen using bundle1.
2017-02-10 18:06:08 +01:00
Pierre-Yves David
5b07cfa3b3 bundle1: display server abort hint during unbundle
The code was printing the abort message but not the hint. This is now fixed.
2017-02-10 17:56:52 +01:00
Pierre-Yves David
64f57e513b bundle1: fix bundle1-denied reporting for push over ssh
Changeset a0966f529e1b introduced a config option to have the server deny push
using bundle1. The original protocol has not really be design to allow such kind
of error reporting so some hack was used. It turned the hack only works on HTTP
and that ssh wire peer hangs forever when the same hack is used. After further
digging, there is no way to report the error in a unified way. Using 'ooberror'
freeze ssh and raising 'Abort' makes HTTP return a HTTP500 without further
details. So with sadness we implement a version that dispatch according to the
protocol used.

We also add a test for pushing over ssh to make sure we won't regress in the
future. That test show that the hint is missing, this is another bug fixed in
the next changeset.
2017-02-10 17:56:59 +01:00
Pierre-Yves David
e8a7ecc281 bundle2: keep hint close to the primary message when remote abort
The remote hint message was ignored when reporting the remote error and
passed to the local generic abort error. I think I might initially have
tried to avoid reimplementing logic controlling the hint display depending of
the verbosity level. However, first, there does not seems to have such verbosity
related logic and second the resulting was wrong as the primary error and the
hint were split apart. We now properly print the hint as remote output.
2017-02-10 17:56:47 +01:00
FUJIWARA Katsunori
2afd920706 misc: update year in copyright lines
This patch also makes some expected output lines in tests glob-ed for
persistence of them.

BTW, files below aren't yet changed in 2017, but this patch also
updates copyright of them, because:

    - mercurial/help/hg.1.txt

      almost all of "man hg" output comes from online help of hg
      command, and is already changed in 2017

    - mercurial/help/hgignore.5.txt
    - mercurial/help/hgrc.5

      "copyright 2005-201X Matt Mackall" in them mentions about
      copyright of Mercurial itself
2017-02-12 02:23:33 +09:00
Mads Kiilerich
6945cf0f5b merge: more safe detection of criss cross merge conflict between dm and r
0b5f1f2efc77 introduced handling of a crash in this case. A review comment
suggested that it was not entirely obvious that a 'dm' always would have a 'r'
for the source file.

To mitigate that risk, make the code more conservative and make less
assumptions.
2017-02-01 02:10:30 +01:00
Mads Kiilerich
120b66d101 merge: fix crash on criss cross merge with dir move and delete (issue5020)
Work around that 'dm' in the data model only can have one operation for the
target file, but still can have multiple and conflicting operations on the
source file where the other operation is a 'rm'. The move would thus fail with
'abort: No such file or directory'.

In this case it is "obvious" that the file should be removed, either before or
after moving it. We thus keep the 'rm' of the source file but drop the 'dm'.

This is not a pretty fix but quite "obviously" safe (famous last words...) as
it only touches a rare code path that used to crash. It is possible that it
would be better to swap the files for 'dm' as suggested on
https://bz.mercurial-scm.org/show_bug.cgi?id=5020#c13 but it is not entirely
obvious that it not just would create conflicts on the other file. That can be
revisited later.
2017-01-31 03:25:59 +01:00
Martin von Zweigbergk
06f115a93e util: make sortdict.keys() return a copy
dict.keys() is documented to return a copy, so it's surprising that
sortdict.keys() did not. I noticed this because we have an extension
that calls readlocaltags(). That method tries to remove any tags that
point to non-existent revisions (most likely stripped). However, since
it's unintentionally working on the instance it's modifying, it
sometimes fails to remove tags when there are multiple bad tags in a
row. This was not caught because localrepo.tags() does an additional
layer of filtering.

sortdict is also used in other places, but I have not checked whether
its keys() and/or __delitem__() methods are used there.
2017-01-30 22:58:56 -08:00
Yuya Nishihara
74023f2b13 revset: prevent using outgoing() and remote() in hgweb session (BC)
outgoing() and remote() may stall for long due to network I/O, which seems
unsafe per definition, "whether a predicate is safe for DoS attack." But I'm
not 100% sure about this. If our concern isn't elapsed time but CPU resource,
these predicates are considered safe. Perhaps that would be up to the
web/application server configuration?

Anyway, outgoing() and remote() wouldn't be useful in hgweb, so I think
it's okay to ban them.
2017-01-20 21:33:18 +09:00
Sean Farley
e145fc2df7 ui: rename tmpdir parameter to more specific repopath
This was requested by Augie and I agree that repopath is more
descriptive.
2017-01-18 18:25:51 -08:00
Gregory Szorc
9c03a7696d statprof: require input file
statprof has a __main__ handler that allows viewing of previously
written data files. As Yuya pointed out during review, 82ee01726a77
broke this. This patch fixes that.
2017-01-18 22:45:07 -08:00
Sean Farley
a405503f7a cmdutil: add tmpdir parament to ui.edit calls 2017-01-16 21:15:21 -08:00
Sean Farley
9280f19af2 ui: add a parameter to set the temporary directory for edit
Until callsites are updated, this will have no effect. Once callsites
are updated, specifying experimental.editortmpinhg will create editor
temporary files in a subdirectory of .hg, which will make it easier
for tool integrations to determine what repository is in play when
they're asked to edit an hg-related file.
2017-01-16 21:05:22 -08:00
Pulkit Goyal
f38d10e539 help: update help for hg update which was misleading (issue5427) 2017-01-18 03:44:19 +05:30
Matt Harbison
511b164fad templater: add '{envvars}' to access environment variables
Since the option for ui.exportableenviron is experimental, so is this template
until the underlying API is sorted out.
2017-01-17 23:12:54 -05:00
Matt Harbison
5a63dbb230 ui: introduce an experimental dict of exportable environment variables
Care needs to be taken to prevent leaking potentially sensitive environment
variables through hgweb, if template support for environment variables is to be
introduced.  There are a few ideas about the API for preventing accidental
leaking [1].  Option 3 seems best from the POV of not needing to configure
anything in the normal case.  I couldn't figure out how to do that, so guard it
with an experimental option for now.

[1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-January/092383.html
2017-01-17 23:05:12 -05:00
Martin von Zweigbergk
ad5f4ef8a6 revlog: give EXTSTORED flag value to narrowhg
Narrowhg has been using "1 << 14" as its revlog flag value for a long
time. We (Google) have many repos with that value in production
already. When the same value was reserved for EXTSTORED, it made those
repos invalid. Upgrading them will be a little painful. We should
clearly have reserved the value for narrowhg a long time ago. Since
the EXTSTORED flag is not yet in any release and Facebook also says
they have not started using it in production, so it should be okay to
change it. This patch gives the current value (1 << 14) back to
narrowhg and gives a new value (1 << 13) to EXTSTORED.
2017-01-17 11:25:02 -08:00
Martin von Zweigbergk
a445384510 help: don't let tools reflow revlog flags list
Before this change, the text about revlog flags was reflowed into a
single paragraph, which made it a bit hard to read. I don't even know
the rules around this, but adding a blank line before each flag seems
to prevent the reflowing.
2017-01-17 11:45:10 -08:00
Martin von Zweigbergk
0ecfe18db3 help: format revlog.txt more closely to result
The rendered text has spaces before each item in the list
2017-01-17 11:29:06 -08:00
Denis Laxalde
86ca3ec602 hgweb: simplify calculation of first revision in filelog command 2017-01-17 09:19:24 +01:00
Denis Laxalde
8eecb0ced7 hgweb: restore ascending iteration on revs in filelog web command
Follow-up on e082a1597833. Adjust back the "parity" generator's offset to keep
rendering the same.
2017-01-17 09:17:29 +01:00
Denis Laxalde
098c0d5368 context: extract _changesinrange() out of blockancestors()
We'll need it to write a blockdescendants function in next changeset.
2017-01-16 09:22:32 +01:00
Pulkit Goyal
5a0e39fb56 util: add length argument to util.buffer()
util.buffer() either returns inbuilt buffer function or defines a new one which
slices. The inbuilt buffer() also has a length argument which is missing from
the ones we defined. This patch adds that length argument.
2017-01-14 20:05:15 +05:30
Pulkit Goyal
3c7388da12 py3: replace pycompat.getenv with encoding.environ.get
pycompat.getenv returns os.getenvb on py3 which is not available on Windows.
This patch replaces them with encoding.environ.get and checks to ensure no
new instances of os.getenv or os.setenv are introduced.
2017-01-15 13:17:05 +05:30
Yuya Nishihara
f3733be9e2 patch: check length of git index header only if integer is specified
Otherwise TypeError would be raised. Follows up 062245c938a0.
2017-01-15 16:33:15 +09:00
Gregory Szorc
765aada92f localrepo: experimental support for non-zlib revlog compression
The final part of integrating the compression manager APIs into
revlog storage is the plumbing for repositories to advertise they
are using non-zlib storage and for revlogs to instantiate a non-zlib
compression engine.

The main intent of the compression manager work was to zstd all
of the things. Adding zstd to revlogs has proved to be more involved
than other places because revlogs are... special. Very small inputs
and the use of delta chains (which are themselves a form of
compression) are a completely different use case from streaming
compression, which bundles and the wire protocol employ. I've
conducted numerous experiments with zstd in revlogs and have yet
to formalize compression settings and a storage architecture that
I'm confident I won't regret later. In other words, I'm not yet
ready to commit to a new mechanism for using zstd - or any other
compression format - in revlogs.

That being said, having some support for zstd (and other compression
formats) in revlogs in core is beneficial. It can allow others to
conduct experiments.

This patch introduces *highly experimental* support for non-zlib
compression formats in revlogs. Introduced is a config option to
control which compression engine to use. Also introduced is a namespace
of "exp-compression-*" requirements to denote support for non-zlib
compression in revlogs. I've prefixed the namespace with "exp-"
(short for "experimental") because I'm not confident of the
requirements "schema" and in no way want to give the illusion of
supporting these requirements in the future. I fully intend to drop
support for these requirements once we figure out what we're doing
with zstd in revlogs.

A good portion of the patch is teaching the requirements system
about registered compression engines and passing the requested
compression engine as an opener option so revlogs can instantiate
the proper compression engine for new operations.

That's a verbose way of saying "we can now use zstd in revlogs!"

On an `hg pull` conversion of the mozilla-unified repo with no extra
redelta settings (like aggressivemergedeltas), we can see the impact
of zstd vs zlib in revlogs:

$ hg perfrevlogchunks -c
! chunk
! wall 2.032052 comb 2.040000 user 1.990000 sys 0.050000 (best of 5)
! wall 1.866360 comb 1.860000 user 1.820000 sys 0.040000 (best of 6)

! chunk batch
! wall 1.877261 comb 1.870000 user 1.860000 sys 0.010000 (best of 6)
! wall 1.705410 comb 1.710000 user 1.690000 sys 0.020000 (best of 6)

$ hg perfrevlogchunks -m
! chunk
! wall 2.721427 comb 2.720000 user 2.640000 sys 0.080000 (best of 4)
! wall 2.035076 comb 2.030000 user 1.950000 sys 0.080000 (best of 5)

! chunk batch
! wall 2.614561 comb 2.620000 user 2.580000 sys 0.040000 (best of 4)
! wall 1.910252 comb 1.910000 user 1.880000 sys 0.030000 (best of 6)

$ hg perfrevlog -c -d 1
! wall 4.812885 comb 4.820000 user 4.800000 sys 0.020000 (best of 3)
! wall 4.699621 comb 4.710000 user 4.700000 sys 0.010000 (best of 3)

$ hg perfrevlog -m -d 1000
! wall 34.252800 comb 34.250000 user 33.730000 sys 0.520000 (best of 3)
! wall 24.094999 comb 24.090000 user 23.320000 sys 0.770000 (best of 3)

Only modest wins for the changelog. But manifest reading is
significantly faster. What's going on?

One reason might be data volume. zstd decompresses faster. So given
more bytes, it will put more distance between it and zlib.

Another reason is size. In the current design, zstd revlogs are
*larger*:

debugcreatestreamclonebundle (size in bytes)
zlib: 1,638,852,492
zstd: 1,680,601,332

I haven't investigated this fully, but I reckon a significant cause of
larger revlogs is that the zstd frame/header has more bytes than
zlib's. For very small inputs or data that doesn't compress well, we'll
tend to store more uncompressed chunks than with zlib (because the
compressed size isn't smaller than original). This will make revlog
reading faster because it is doing less decompression.

Moving on to bundle performance:

$ hg bundle -a -t none-v2 (total CPU time)
zlib: 102.79s
zstd:  97.75s

So, marginal CPU decrease for reading all chunks in all revlogs
(this is somewhat disappointing).

$ hg bundle -a -t <engine>-v2 (total CPU time)
zlib: 191.59s
zstd: 115.36s

This last test effectively measures the difference between zlib->zlib
and zstd->zstd for revlogs to bundle. This is a rough approximation of
what a server does during `hg clone`.

There are some promising results for zstd. But not enough for me to
feel comfortable advertising it to users. We'll get there...
2017-01-13 20:16:56 -08:00
Gregory Szorc
94d36bba2d revlog: use compression engine APIs for decompression
Now that compression engines declare their header in revlog chunks
and can decompress revlog chunks, we refactor revlog.decompress()
to use them.

Making full use of the property that revlog compressor objects are
reusable, revlog instances now maintain a dict mapping an engine's
revlog header to a compressor object. This is not only a performance
optimization for engines where compressor object reuse can result in
better performance, but it also serves as a cache of header values
so we don't need to perform redundant lookups against the compression
engine manager. (Yes, I measured and the overhead of a function call
versus a dict lookup was observed.)

Replacing the previous inline lookup table with a dict lookup was
measured to make chunk reading ~2.5% slower on changelogs and ~4.5%
slower on manifests. So, the inline lookup table has been mostly
preserved so we don't lose performance. This is unfortunate. But
many decompression operations complete in microseconds, so Python
attribute lookup, dict lookup, and function calls do matter.

The impact of this change on mozilla-unified is as follows:

$ hg perfrevlogchunks -c
! chunk
! wall 1.953663 comb 1.950000 user 1.920000 sys 0.030000 (best of 6)
! wall 1.946000 comb 1.940000 user 1.910000 sys 0.030000 (best of 6)
! chunk batch
! wall 1.791075 comb 1.800000 user 1.760000 sys 0.040000 (best of 6)
! wall 1.785690 comb 1.770000 user 1.750000 sys 0.020000 (best of 6)

$ hg perfrevlogchunks -m
! chunk
! wall 2.587262 comb 2.580000 user 2.550000 sys 0.030000 (best of 4)
! wall 2.616330 comb 2.610000 user 2.560000 sys 0.050000 (best of 4)
! chunk batch
! wall 2.427092 comb 2.420000 user 2.400000 sys 0.020000 (best of 5)
! wall 2.462061 comb 2.460000 user 2.400000 sys 0.060000 (best of 4)

Changelog chunk reading is slightly faster but manifest reading is
slower. What gives?

On this repo, 99.85% of changelog entries are zlib compressed (the 'x'
header). On the manifest, 67.5% are zlib and 32.4% are '\0'. This patch
swapped the test order of 'x' and '\0' so now 'x' is tested first. This
makes changelogs faster since they almost always hit the first branch.
This makes a significant percentage of manifest '\0' chunks slower
because that code path now performs an extra test. Yes, I too can't
believe we're able to measure the impact of an if..elif with simple
string compares. I reckon this code would benefit from being written
in C...
2017-01-13 19:58:00 -08:00
Denis Laxalde
e0d6f05072 hgweb: build the "entries" list directly in filelog command
There's no apparent reason to have this "entries" generator function that
builds a list and then yields its elements in reverse order and which is only
called to build the "entries" list. So just build the list directly, in
reverse order.

Adjust "parity" generator's offset to keep rendering the same.
2017-01-13 10:22:25 +01:00
Yuya Nishihara
5d86e43147 ui: check EOF of getpass() response read from command-server channel
readline() returns '' only when EOF is encountered, in which case, Python's
getpass() raises EOFError. We should do the same to abort the session as
"response expected."

This bug was reported to
https://bitbucket.org/tortoisehg/thg/issues/4659/
2017-01-14 20:31:35 +09:00
Gregory Szorc
550169e48e help: make "mergetool" an alias for "merge-tools"
I've probably typed `hg help mergetool` dozens of times. I'm tired
of it not working.
2017-01-13 21:21:02 -08:00
Matthieu Laneuville
1146ca6217 templatekw: force noprefix=False to insure diffstat consistency (issue4755)
The result of diffstatdata should not depend on having noprefix set or not, as
was reported in issue 4755. Forcing noprefix to false on call makes sure the
parser receives the diff in the correct format and returns the proper result.

Another way to fix this would have been to change the regular expressions in
path.diffstatdata(), but that would have introduced many unecessary special
cases.
2017-01-12 21:06:55 +09:00
Pierre-Yves David
b3ce804dcd similar: remove caching from the module level
To prevent Bad Things™ from happening, let's rework the logic to not use
util.cachefunc.
2017-01-13 11:42:36 -08:00
Sean Farley
7335c165eb patch: add label for coloring the similarity extended header
Just like the summary says, this will colorize the:

  similarity index 88%

line in the diff output.
2017-01-09 11:01:45 -08:00
Sean Farley
311a50fdae patch: use opt.showsimilarity to calculate and show the similarity
Tests have been added.
2017-01-09 11:24:18 -08:00
Sean Farley
bf5e8cb800 patch: add similarity config knob in experimental section
This config knob will control whether or not to show the similarity
calculation in the diff output:

  diff --git a/README.md b/foo.md
  similarity index 88%
  rename from README.md
  rename to foo.md
  --- a/README.md
  +++ b/foo.md
2017-01-09 10:51:44 -08:00
Sean Farley
8fc2b48eb5 similar: move score function to module level
Future patches will use this to report the similarity of a rename / copy
in the patch output.
2017-01-07 20:47:57 -08:00
Yuya Nishihara
5ade140d5c revset: abuse x:y syntax to specify line range of followlines()
This slightly complicates the parsing (see the previous patch), but the
overall result seems not bad.

I keep x:, :y and : for future extension.
2017-01-09 17:58:19 +09:00
Yuya Nishihara
615f3c1669 revset: do not transform range* operators in parsed tree
This allows us to handle x:y range as a general range object. A primary user
of it is followlines().
2017-01-09 16:55:56 +09:00
Yuya Nishihara
0f4a24bbbf revset: add default value to getinteger() helper
This seems handy.
2017-01-09 17:45:11 +09:00
Yuya Nishihara
49d42c696d revset: factor out getinteger() helper
We have 4 revset functions that take integer arguments, and they handle
their arguments in slightly different ways. This patch unifies them:

 - getstring() in place of getsymbol(), which is more consistent with the
   handling of integer revisions (both 1 and '1' are valid)
 - say "expects" instead of "requires" for type errors

We don't need to catch TypeError since getstring() must return a string.
2017-01-09 17:39:44 +09:00
Yuya Nishihara
a73b0aaf6b revset: rename rev argument of followlines() to startrev
The rev argument has the same meaning as startrev of follow(), and I think
startrev is more informative.

followlines() is new function, we can make BC now.
2017-01-09 16:16:26 +09:00
Yuya Nishihara
a0c3bc199a help: use :hg: role and canonical name to point to revset string patterns
Follows up ae418afed3f6. Now revisions.txt and revsets.txt has been merged,
so use revisions.* as a pointer.
2017-01-13 23:48:21 +09:00
Gregory Szorc
4a3b8df214 util: compression APIs to support revlog decompression
Previously, compression engines had APIs for performing revlog
compression but no mechanism to perform revlog decompression. This
patch changes that.

Revlog decompression is slightly more complicated than compression
because in the compression case there is (currently) only a single
engine that can be used at a time. However for decompression, a
revlog could contain chunks from multiple compression engines. This
means decompression needs to map to multiple engines and
decompressors. This functionality is outside the scope of this patch.
But it drives the decision for engines to declare a byte header
sequence that identifies revlog data as belonging to an engine and
an API for obtaining an engine from a revlog header.
2017-01-02 13:27:20 -08:00
Anton Shestakov
9427025e13 crecord: add an experimental option for space key to move cursor down
I really want to have an option of toggling a selection on a line and also
moving cursor down as a single keystroke. It also kinda makes sense for space
key to do this, because some other curses UIs in the wild do this (e.g. various
file managers, htop). So I got an idea to make a config option that defaults to
False for compatibility, but allows making crecord UI a lot more useful for
people with big hunks.

We add this an experimental option to experiment with this behavior.
2017-01-08 10:08:29 +08:00
Gregory Szorc
24c1205d69 revlog: use compression engine API for compression
This commit swaps in the just-added revlog compressor API into
the revlog class.

Instead of implementing zlib compression inline in compress(), we
now store a cached-on-first-use revlog compressor on each revlog
instance and invoke its "compress()" method.

As part of this, revlog.compress() has been refactored a bit to use
a cleaner code flow and modern formatting (e.g. avoiding
parenthesis around returned tuples).

On a mozilla-unified repo, here are the "compress" times for a few
commands:

$ hg perfrevlogchunks -c
! wall 5.772450 comb 5.780000 user 5.780000 sys 0.000000 (best of 3)
! wall 5.795158 comb 5.790000 user 5.790000 sys 0.000000 (best of 3)

$ hg perfrevlogchunks -m
! wall 9.975789 comb 9.970000 user 9.970000 sys 0.000000 (best of 3)
! wall 10.019505 comb 10.010000 user 10.010000 sys 0.000000 (best of 3)

Compression times did seem to slow down just a little. There are
360,210 changelog revisions and 359,342 manifest revisions. For the
changelog, mean time to compress a revision increased from ~16.025us to
~16.088us. That's basically a function call or an attribute lookup. I
suppose this is the price you pay for abstraction. It's so low that
I'm not concerned.
2017-01-02 11:22:52 -08:00
Gregory Szorc
29c30e4b7e util: compression APIs to support revlog compression
As part of "zstd all of the things," we need to teach revlogs to
use non-zlib compression formats. Because we're routing all compression
via the "compression manager" and "compression engine" APIs, we need to
introduction functionality there for performing revlog operations.

Ideally, revlog compression and decompression operations would be
implemented in terms of simple "compress" and "decompress" primitives.
However, there are a few considerations that make us want to have a
specialized primitive for handling revlogs:

1) Performance. Revlogs tend to do compression and especially
   decompression operations in batches. Any overhead for e.g.
   instantiating a "context" for performing an operation can be
   noticed. For this reason, our "revlog compressor" primitive is
   reusable. For zstd, we reuse the same compression "context" for
   multiple operations. I've measured this to have a performance
   impact versus constructing new contexts for each operation.

2) Specialization. By having a primitive dedicated to revlog use,
   we can make revlog-specific choices and leave the door open for
   more functionality in the future. For example, the zstd revlog
   compressor may one day make use of dictionary compression.

A future patch will introduce a decompress() on the compressor
object.

The code for the zlib compressor is basically copied from
revlog.compress(). Although it doesn't handle the empty input
case, the null first byte case, and the 'u' prefix case. These
cases will continue to be handled in revlog.py once that code is
ported to use this API.
2017-01-02 12:39:03 -08:00
Gregory Szorc
1a6670d670 revlog: move decompress() from module to revlog class (API)
Upcoming patches will convert revlogs to use the compression engine
APIs to perform all things compression. The yet-to-be-introduced
APIs support a persistent "compressor" object so the same object
can be reused for multiple compression operations, leading to
better performance. In addition, compression engines like zstd
may wish to tweak compression engine state based on the revlog
(e.g. per-revlog compression dictionaries).

A global and shared decompress() function will shortly no longer
make much sense. So, we move decompress() to be a method of the
revlog class. It joins compress() there.

On the mozilla-unified repo, we can measure the impact of this change
on reading performance:

$ hg perfrevlogchunks -c
! chunk
! wall 1.932573 comb 1.930000 user 1.900000 sys 0.030000 (best of 6)
! wall 1.955183 comb 1.960000 user 1.930000 sys 0.030000 (best of 6)
! chunk batch
! wall 1.787879 comb 1.780000 user 1.770000 sys 0.010000 (best of 6
! wall 1.774444 comb 1.770000 user 1.750000 sys 0.020000 (best of 6)

"chunk" appeared to become slower but "chunk batch" got faster. Upon
further examination by running both sets multiple times, the numbers
appear to converge across all runs. This tells me that there is no
perceived performance impact to this refactor.
2017-01-02 13:00:16 -08:00
Gregory Szorc
df8167ed29 revlog: make compressed size comparisons consistent
revlog.compress() compares the compressed size to the input size
and throws away the compressed data if it is larger than the input.
This is the correct thing to do, as storing compressed data that
is larger than the input takes up more storage space and makes reading
slower.

However, the comparison was implemented inconsistently. For the
streaming compression mode, we threw away the result if it was
greater than or equal to the input size. But for the one-shot
compression, we threw away the compression only if it was greater
than the input size!

This patch changes the comparison for the simple case so it is
consistent with the streaming case.

As a few tests demonstrate, this adds 1 byte to some revlog entries.
This is because of an added 'u' header on the chunk. It seems
somewhat wrong to increase the revlog size here. However, IMO the cost
of 1 byte in storage is insignificant compared to the performance gains
of avoiding decompression. This patch should invite questions around
the heuristic for throwing away compressed data. For example, I'd argue
we should be more liberal about rejecting compressed data, additionally
doing so where the number of bytes saved fails to reach a threshold.
But we can have this discussion another time.
2017-01-02 11:50:17 -08:00
Sean Farley
3c1cbd7c9b similar: rename local variable to not collide with previous
Future patches will move the score function to the module level, so
let's not shadow that.
2017-01-07 20:43:49 -08:00
Sean Farley
25acd53e01 patch: add label for coloring the index extended header
Just like the summary says, this will colorize the:

  index 3d3ba4b65e11..57274a0f46b2 100644

line in the diff output.
2017-01-09 10:59:45 -08:00
Sean Farley
14adabd19a patch: add index line for diff output
This helps highlighting in third-party diff coloring (which assumes git
output) and maintains pedantic correctness with diff --git.

Tests will be added at the end of the series.
2016-12-31 15:41:57 -06:00
Sean Farley
8cd1b5827c patch: add config knob for displaying the index header
This config knob can take an integer between 0 and 40 or a
keyword ('none', 'short', 'full') to control the length of hash to
output. It will display diffs with the git index header as such,

  diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py
  index 112edf7..d6b52c5 100644

We'll put this in the experimental section for now.
2017-01-09 11:13:47 -08:00
Martin von Zweigbergk
9e63f2d21c bisect: refer directly to bisect() revset predicate in help
We have specific syntax for displaying the help text for a particular
revset predicate, so let's refer directly to the bisect() revset in
the verbose bisect help. It seems likely that the user doesn't care
about other revsets at that point, so they will probably not miss the
text about the other revset predicates.
2017-01-12 12:05:23 -08:00
Martin von Zweigbergk
029203f29d help: remove now-redundant pointer to revsets help
"hg help revisions" and "hg help revsets" now point to the same text,
so drop the revsets reference.
2017-01-12 11:52:05 -08:00