I used this to more quickly track down a failing test-check-config.t issue
in another repo. I thought it might be useful more generally, so I'm sending
it out in case others think it's a worthwhile change.
Previously, we trust Differential Revision in commit message blindly. This
patch adds sanity check so a host name change will be detected and the
commit message will be ignored.
Differential Revision: https://phab.mercurial-scm.org/D35
The diff property contains metadata like "HG Node". Previously we skip
uploading a new diff if we are sure that the old patch and new patch have a
same content. That has issues when a pusher adds an obsmarker using the old
"HG Node" stored in the old diff.
This patch adds logic to update the diff property so "HG Node" gets updated
to prevent that issue.
Differential Revision: https://phab.mercurial-scm.org/D229
This makes it more strict when checking whether or not we should update a
Differential Revision. For example,
a) Alice updates D1 to content 1.
b) Bob updates D1 to content 2.
c) Alice tries to update D1 to content 1.
Previously, `c)` will do nothing because `phabsend` detects the patch is not
changed. A more correct behavior is to override Bob's update here, hence the
patch.
This also makes it possible to return a reaonsable "last node" when there is
no tags but only `Differential Revision` commit messages.
Test Plan:
```
for i in A B C; do echo $i > $i; hg ci -m $i -A $i; done
hg phabsend 0::
# D40: created
# D41: created
# D42: created
echo 3 >> C; hg amend; hg phabsend .
# D42: updated
hg tag --local --hidden -r 2 -f D42
# move tag to the previous version
hg phabsend .
# D42: skipped (previously it would be "updated")
rm -rf .hg; hg init
hg phabread --stack D42 | hg import -
hg phabsend .
# D42: updated
hg tag --local --remove D42
hg commit --amend
hg phabsend .
# D42: updated (no new diff uploaded, previously it will upload a new diff)
```
The old diff object is now returned, which could be useful in the next
patch.
Differential Revision: https://phab.mercurial-scm.org/D121
This adds a --confirm flag similar to the confirm flag of `hg email` using which
one can confirm the changesets before they get emailed. The confirm flag will
show the changesets and ask for confirmation before sending them.
Differential Revision: https://phab.mercurial-scm.org/D218
For example, if "hg log" was defined as an alias:
# /etc/mercurial/hgrc
[alias]
log = log --graph
the buildrpm script would be surprised by log messages formatted in
unexpected ways, and bail out.
This patch sets HGPLAIN, effectively resetting all the user configs,
including log output, to a common state, making the build more
predictable across all the possible environments.
I'm not sure if there's a reason chg is not added by default.
If not, I would like to propose adding in this patch.
Differential Revision: https://phab.mercurial-scm.org/D220
Sometimes people want to specify reviewer explicitly for a stack. The
webpage only allows changing reviewer for one revision at a time. This patch
adds a `--reviewer` flag to make it easier to specify reviewers.
Test Plan:
On a test Phabricator instance, enable `differential.allow-self-accept`,
assign myself as a reviewer and make sure it works. Also try an invalid
username and make sure it raises.
Differential Revision: https://phab.mercurial-scm.org/D38
Previously we trust local tags blindly and that could cause wrong
Differential Revision to be updated, when people switch between Phabricator
instances.
This patch adds verification logic to detect such issue and remove
problematic tags. For example, a tag "D19" was on node "X", the code will
fetch all diffs attached to D19, and check if nodes server-side overlaps
with nodes in precursors. If they do not overlap, create a new Differential
Revision.
Test Plan:
Use a test Phabricator instance, send patches using `hg phabsend`, then
change the local tag manually to a wrong Differential Revision number.
Amend the patch and send again. Make sure the tag gets ignored and deleted.
Differential Revision: https://phab.mercurial-scm.org/D36
This allows us to do extra sanity checks using batch APIs to prevent
updating a wrong revision, which could happen when people switch Phabricator
instances and having stale tags living in the repo.
Differential Revision: https://phab.mercurial-scm.org/D34
Previously we only respect hg:meta sent by phabsend. This patch makes it
respect local:commits sent by arc as well. This avoids issues where phabread
could lose the author information.
Test Plan:
Commit using a customized user, send the patch using arc to a test
Phabricator instance, and then read the patch using phabread. Make sure it
preserves the user information.
Differential Revision: https://phab.mercurial-scm.org/D33
This was meant as a substitute for Python's "with" with multiple
context managers before we moved to Python 2.7. We're now on 2.7, so
we should have no reason to keep ctxmanager. "hg grep --all
ctxmanager" says that it was never used anyway.
Differential Revision: https://phab.mercurial-scm.org/D73
When using the 'import' command running ZSH together with its
Mercurial-specific completions the flag '--partial' (introduced in Mercurial
3.1, 2014-08-01) is not offered as completion option.
This patch adds it to the list of completions for the 'import' command.
I'm still cleaning this up, but it's easier to do in bite-size chunks
like this than all at once. The negative lookahead avoids one false
positive category from some output related to finding Subversion
bindings.
Differential Revision: https://phab.mercurial-scm.org/D13
It's possible to set up non-linear dependencies in Phabricator like:
o D4
|\
| o D3
| |
o | D2
|/
o D1
The old `phabread` code will print D1 twice. This patch adds de-duplication
to prevent that.
Test Plan:
Construct the above dependencies in a Phabricator test instance and make
sure the old code prints D1 twice while the new code won't.
Previously, we read Differential Revisions one by one by calling
`differential.query`.
Fetching them one by one is suboptimal. Unfortunately, there is no Conduit
API that allows us to get a stack of diffids using a single API call.
This patch tries to be smarter using a simple heuristic: when fetching D59
as a stack, previous IDs like D51, D52, D53, ..., D58 are likely belonging
to a same stack so just fetch them as well. Since `differential.query` only
returns cheap metadata without expensive diff content, it shouldn't be a big
problem for the server.
Using a test Phabricator instance, this patch reduces `phabread` reading a
10 patch stack from about 13 to 30 seconds to 8 seconds.
Previously, we call differential.getcommitmessage API to get commit
messages. Now we read that from "Differential Revision" object fetched
via "differential.query" API.
This removes one API call per patch.
This patch reworked phabread a bit so it fetches the lightweight
"Differential Revision" metadata for a stack first. Then read other data.
This allows the code to:
a) send 1 request to get all `hg:meta` data instead of N requests
b) patches are read in desired order, no need to buffer the output
"b)" reduces the memory usage from O(N^2) to O(N) since we no longer keep
old patch contents in memory.
The above `N` means the number of patches in the stack.
Previously we didn't abort. Now we abort if revs is empty. This is
consistent with "hg export" behavior. Maybe "return 1" is also a reasonable
behavior, but that's inconsistent with the existing "hg export".
Previously, `phabsend` uploads new diffs as long as the commit hash changes.
That's suboptimal because sometimes the diff is exactly the same as before,
the commit hash change is caused by a parent hash change, or commit message
change which do not affect diff content.
This patch adds a check examining actual diff contents to skip uploading new
diffs in that case.
The "hg:meta" property is for extra metadata to reconstruct the patch.
Previously it does not have node or parent information since I think by
reading the patch again, the commit message will be mangled (like, added the
"Differential Revision" line) and we cannot preserve the commit hash.
However, the "parent" information could be useful. It could be helpful to
locate the "base revision" so in case of a conflict applying the patch
directly, we might be able to use 3-way merge to resolve it correctly.
Note: "local:commits" is an existing "property" used by Phabricator that has
the node and parent information. However, it lacks of timezone information
and requires "author" and "authorEmail" to be separated. So we are using a
different "property" - "hg:meta" to be distinguished from "local:commits".
Previously, only tags can "associate" a changeset to a Differential
Revision. But the usual pattern (arc patch or hg phabread) is to put the
Differential Revision URL in commit message.
This patch makes the code read commit message to find associated
Differential Revision if associated tags are not found.
This makes some workflows possible. For example, if the author loses their
repo, or switch to another computer, they can continue download their own
patches from Phabricator and update them without needing to manually create
tags.
This patch adds a `phabread` command generating plain-text patches from
Phabricator, suitable for `hg import`. It respects `hg:meta` so user and
date information might be preserved. And it removes `Summary:` field name
which makes the commit message a bit tidier.
To support stacked diffs, a `--stack` flag was added to read dependent
patches recursively.
The `phabsend` command is intended to provide `hg email`-like experience -
sending a stack, setup dependency information and do not amend existing
changesets.
It uses differential.createrawdiff and differential.revision.edit Conduit
API to create or update a Differential Revision.
Local tags like `D123` are written indicating certain changesets were sent
to Phabricator. The `phabsend` command will use obsstore and tags
information to decide whether to update or create Differential Revisions.
The default Phabricator client arcanist is not friendly to send a stack of
changesets. It works better when a feature branch is reviewed as a single
review unit. However, we want multiple revisions per feature branch.
To be able to have an `hg email`-like UX to send and receive a stack of
commits easily, it seems we have to re-invent things. This patch adds
`phabricator.py` speaking Conduit API [1] in `contrib` as the first step.
This may also be an option for people who don't want to run PHP.
Config could be done in `hgrc` (instead of `arcrc` or `arcconfig`):
[phabricator]
# API token. Get it from https://phab.mercurial-scm.org/conduit/login/
token = cli-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
url = https://phab.mercurial-scm.org/
# callsign is used by the next patch
callsign = HG
This patch only adds a single command: `debugcallconduit` to keep the patch
size small. To test it, having the above config, and run:
$ hg debugcallconduit diffusion.repository.search <<EOF
> {"constraints": {"callsigns": ["HG"]}}
> EOF
The result will be printed in prettified JSON format.
[1]: Conduit APIs are listed at https://phab.mercurial-scm.org/conduit/
The ignore regular expression has been updated to detect
"inconsistent config." If present, we track which configs have
that set and we suppress the conflicting defaults error for those
options.
I also added named groups to the regexp to aid readability.
A comment was added to profiling.py to make a desired inconsistent
value error go away.
For builds that run on hermetic environments, it's possible that the "less"
package is not installed by default, yet it's needed for tests to pass after
revision ca1519568a93 (which sets less as the fallback pager).
When trying to create a zstd bundle, the MSI based install said:
abort: compression engine zstd could not be loaded
The Inno installer is unaffected. The name will need to be updated to include
'cext' when merging into default.
Having linux wheels is going to helps system without compiler or python-dev
plus speed up the installation for everyone.
I followed the manylinux example repository
https://github.com/pypa/python-manylinux-demo
to add a make target (build-linux-wheels) using
official docker image to build python 2 linux wheels
for mercurial. It generates Python 2.6 and Python 2.7 for both
32 and 64 bits architectures.
I had to blacklist several test cases for various reasons:
* test-convert-git.t and test-subrepo-git.t because of the git version
* test-patchbomb-tls.t because of warning using tls 1.0
It's likely because the docker image is based on centos 5.0 and
openssl is outdated.
Now that environment variables override system-wide hgrc settings, we
can default Mercurial to sensible-editor and sensible-pager by default
for debian users.
Some shared-ssh installations assume that 'hg serve --stdio' is a safe
command to run for minimally trusted users. Unfortunately, the messy
implementation of argument parsing here meant that trying to access a
repo named '--debugger' would give the user a pdb prompt, thereby
sidestepping any hoped-for sandboxing. Serving repositories over HTTP(S)
is unaffected.
We're not currently hardening any subcommands other than 'serve'. If
your service exposes other commands to users with arbitrary repository
names, it is imperative that you defend against repository names of
'--debugger' and anything starting with '--config'.
The read-only mode of hg-ssh stopped working because it provided its hook
configuration to "hg serve --stdio" via --config parameter. This is banned for
security reasons now. This patch switches it to directly call ui.setconfig().
If your custom hosting infrastructure relies on passing --config to
"hg serve --stdio", you'll need to find a different way to get that configuration
into Mercurial, either by using ui.setconfig() as hg-ssh does in this patch,
or by placing an hgrc file someplace where Mercurial will read it.
mitrandir@fb.com provided some extra fixes for the dispatch code and
for hg-ssh in places that I overlooked.
This patch also makes some expected output lines in tests glob-ed for
persistence of them.
BTW, files below aren't yet changed in 2017, but this patch also
updates copyright of them, because:
- mercurial/help/hg.1.txt
almost all of "man hg" output comes from online help of hg
command, and is already changed in 2017
- mercurial/help/hgignore.5.txt
- mercurial/help/hgrc.5
"copyright 2005-201X Matt Mackall" in them mentions about
copyright of Mercurial itself
This patch also adds new check-code.py pattern to detect invalid usage
of "mercurial@selenic.com".
Change for test-convert-tla.t is tested, but similar change for almost
same test-convert-baz.t isn't yet tested actually, because I couldn't
find out the way to get "GNU Arch baz client".
AFAIK, buildbot skips test-convert-baz.t, too. Does anybody have
appropriate environment for testing?
This is a cherry pick of an upstream fix. The free() of uninitialed
memory could likely only occur if a malloc() inside zstd fails.
The patched functions aren't currently used by Mercurial. But I don't
like leaving footguns sitting around.
Previously, when runcommand raises, chg aborts with, and does not wait for
pager. The call stack is like:
hgc_runcommand -> handleresponse -> readchannel -> debugmsg("failed to
read channel") -> exit(255)
That means, chg returns to the shell, then both the pager and the shell will
read from the terminal at the same time, causing problems.
This patch fixes that by using "atexit" to register the pager cleanup
function so chg will always wait for pager even if runcommand raises.
Commit 63c68d6f5fc8de4afd9bde81b13b537beb4e47e8 from
https://github.com/indygreg/python-zstandard is imported without
modifications (other than removing unwanted files).
This includes minor performance and feature improvements. It also
changes the vendored zstd library from 1.1.1 to 1.1.2.
# no-check-commit
pycompat.getenv returns os.getenvb on py3 which is not available on Windows.
This patch replaces them with encoding.environ.get and checks to ensure no
new instances of os.getenv or os.setenv are introduced.
Module-level @cachefunc usage is risky because it can easily create a
memory "leak". Let's reject it completely for now. If a valid usage
comes up in the future, we can always improve the check or reconsider.
Now that the revlog has a reference to a compressor, it is
possible to swap in other compression engines. So, teach
`hg perfrevlogchunks` to do that.
The default behavior of `hg perfrevlogchunks` is now to measure the
compression performance of all compression engines implementing the
revlog compressor API. This effectively adds the no-op "none"
compressor and zstd (when available) into the default set.
While we can't yet plug alternate compressors into revlogs, this
command gives us a preview of the performance. On the mozilla-unified
repository:
$ hg perfrevlogchunks -c
! compress w/ none
! wall 0.115159 comb 0.110000 user 0.110000 sys 0.000000 (best of 86)
! compress w/ zlib
! wall 5.681406 comb 5.680000 user 5.680000 sys 0.000000 (best of 3)
! compress w/ zstd
! wall 2.624781 comb 2.620000 user 2.620000 sys 0.000000 (best of 4)
$ hg perfrevlogchunks -m
! compress w/ none
! wall 0.124486 comb 0.120000 user 0.120000 sys 0.000000 (best of 79)
! compress w/ zlib
! wall 10.144701 comb 10.150000 user 10.150000 sys 0.000000 (best of 3)
! compress w/ zstd
! wall 4.383118 comb 4.390000 user 4.390000 sys 0.000000 (best of 3)
Those numbers for zstd look promising. But they aren't the full story.
For that, we'll need to look at decompression times and storage sizes.
Stay tuned...
Upcoming patches will convert revlogs to use the compression engine
APIs to perform all things compression. The yet-to-be-introduced
APIs support a persistent "compressor" object so the same object
can be reused for multiple compression operations, leading to
better performance. In addition, compression engines like zstd
may wish to tweak compression engine state based on the revlog
(e.g. per-revlog compression dictionaries).
A global and shared decompress() function will shortly no longer
make much sense. So, we move decompress() to be a method of the
revlog class. It joins compress() there.
On the mozilla-unified repo, we can measure the impact of this change
on reading performance:
$ hg perfrevlogchunks -c
! chunk
! wall 1.932573 comb 1.930000 user 1.900000 sys 0.030000 (best of 6)
! wall 1.955183 comb 1.960000 user 1.930000 sys 0.030000 (best of 6)
! chunk batch
! wall 1.787879 comb 1.780000 user 1.770000 sys 0.010000 (best of 6
! wall 1.774444 comb 1.770000 user 1.750000 sys 0.020000 (best of 6)
"chunk" appeared to become slower but "chunk batch" got faster. Upon
further examination by running both sets multiple times, the numbers
appear to converge across all runs. This tells me that there is no
perceived performance impact to this refactor.
Selecting single and multiple revisions is closely related, so let's
put it in one place, so users can easily find it. We actually did not
even point to "hg help revsets" from "hg help revisions", but now that
they're on a single page, that won't be necessary.
This patch uses the newly introduced "setprocname" interface to update the
process title server-side, to make it easier to tell what a worker is actually
doing.
The new title is "chg[worker/$PID]", where PID is the process ID of the
connected client. It can be directly observed using "ps -AF" under Linux, or
"ps -A" under FreeBSD.
We have enough bits to switch to the new chg pager code path in runcommand.
So just remove the legacy getpager support.
This is a red-only patch, and will break chg's pager support temporarily.
This patch implements the simple S-channel pager handling at chg
client-side.
Note: It does not deal with environ and cwd currently for simplicity, which
will be fixed later.
Previously S channel is only used to send system commands. It will also be
used to send pager commands. So add a type parameter.
This breaks older chg clients. But chg and hg should always come from a
single commit and be packed into a single package. Supporting running
inconsistent versions of chg and hg seems to be unnecessarily complicated
with little benefit. So just make the change and assume people won't use
inconsistent chg with hg.
After dropping the garbage collector hack, `hg perfstartup` started yelling
about not being able to import the evolve extension, which I have in my user
config. Launching `env` shows that an empty HGRCPATH isn't exported to the
environment. Since `env` doesn't quote, I have no idea if the variable is
trimmed, but Mercurial doesn't complain when processing it.
We previously weren't looking for this config helper. And, surprise,
profiling.py references config options without docs.
If I tried hard enough, I could have combined the regexps using a
positive lookbehind assertion or something. But I didn't want to make
my brain explode.
At some point, we should probably do this linting at the tokenizer or
ast layer. I'm not willing to open that can of worms right now.
Instead of mentioning 127.0.0.1, we should use $LOCALIP. Anytime
$LOCALIP appears in output, we should make sure we use (glob) on that
line of output so that weird environments that do remapping jiggery
pokery (such as our FreeBSD buildbot that's in a jail) don't get
spurious test failures.
Commit 81e1f5bbf1fc54808649562d3ed829730765c540 from
https://github.com/indygreg/python-zstandard is imported without
modifications (other than removing unwanted files).
Updates relevant to Mercurial include:
* Support for multi-threaded compression (we can use this for
bundle and wire protocol compression).
* APIs for batch compression and decompression operations using
multiple threads and optimal memory allocation mechanism. (Can
be useful for revlog perf improvements.)
* A ``BufferWithSegments`` type that models a single memory buffer
containing N discrete items of known lengths. This type can be
used for very efficient 0-copy data operations.
# no-check-commit
I softly formalized the concept of a "bundle specification" a while
ago when I was working on clone bundles and stream clone bundles and
wanted a more robust way to define what exactly is in a bundle file.
The concept has existed for a while. Since it is part of the clone
bundles feature and exposed to the user via the "-t" argument to
`hg bundle`, it is something we need to support for the long haul.
After the 4.1 release, I heard a few people comment that they didn't
realize you could generate zstd bundles with `hg bundle`. I'm
partially to blame for not documenting it in bundle's docstring.
Additionally, I added a hacky, experimental feature for controlling
the compression level of bundles in 054e64c4d837. As the commit
message says, I went with a quick and dirty solution out of time
constraints. Furthermore, I wanted to eventually store this
configuration in the "bundlespec" so it could be made more flexible.
Given:
a) bundlespecs are here to stay
b) we don't have great documentation over what they are, despite being
a user-facing feature
c) the list of available compression engines and their behavior isn't
exposed
d) we need an extensible place to modify behavior of compression
engines
I want to move forward with formalizing bundlespecs as a user-facing
feature. This commit does that by introducing a "bundlespec" help
page. Leaning on the just-added compression engine documentation
and API, the topic also conveniently lists available compression
engines and details about them. This makes features like zstd
bundle compression more discoverable. e.g. you can now
`hg help -k zstd` and it lists the "bundlespec" topic.
Currently, Mercurial has a number of commands to show information. And,
there are features coming down the pipe that will introduce more
commands for showing information.
Currently, when introducing a new class of data or a view that we
wish to expose to the user, the strategy is to introduce a new command
or overload an existing command, sometimes both. For example, there is
a desire to formalize the wip/smartlog/underway/mine functionality that
many have devised. There is also a desire to introduce a "topics"
concept. Others would like views of "the current stack." In the
current model, we'd need a new command for wip/smartlog/etc (that
behaves a lot like a pre-defined alias of `hg log`). For topics,
we'd likely overload `hg topic[s]` to both display and manipulate
topics.
Adding new commands for every pre-defined query doesn't scale well
and pollutes `hg help`. Overloading commands to perform read-only and
write operations is arguably an UX anti-pattern: while having all
functionality for a given concept in one command is nice, having a
single command doing multiple discrete operations is not. Furthermore,
a user may be surprised that a command they thought was read-only
actually changes something.
We discussed this at the Mercurial 4.0 Sprint in Paris and decided that
having a single command where we could hang pre-defined views of
various data would be a good idea. Having such a command would:
* Help prevent an explosion of new query-related commands
* Create a clear separation between read and write operations
(mitigates footguns)
* Avoids overloading the meaning of commands that manipulate data
(bookmark, tag, branch, etc) (while we can't take away the
existing behavior for BC reasons, we now won't introduce this
behavior on new commands)
* Allows users to discover informational views more easily by
aggregating them in a single location
* Lowers the barrier to creating the new views (since the barrier
to creating a top-level command is relatively high)
So, this commit introduces the `hg show` command via the "show"
extension. This command accepts a positional argument of the
"view" to show. New views can be registered with a decorator. To
prove it works, we implement the "bookmarks" view, which shows a
table of bookmarks and their associated nodes.
We introduce a new style to hold everything used by `hg show`.
For our initial bookmarks view, the output varies from `hg bookmarks`:
* Padding is performed in the template itself as opposed to Python
* Revision integers are not shown
* shortest() is used to display a 5 character node by default (as
opposed to static 12 characters)
I chose to implement the "bookmarks" view first because it is simple
and shouldn't invite too much bikeshedding that detracts from the
evaluation of `hg show` itself. But there is an important point
to consider: we now have 2 ways to show a list of bookmarks. I'm not
a fan of introducing multiple ways to do very similar things. So it
might be worth discussing how we wish to tackle this issue for
bookmarks, tags, branches, MQ series, etc.
I also made the choice of explicitly declaring the default show
template not part of the standard BC guarantees. History has shown
that we make mistakes and poor choices with output formatting but
can't fix these mistakes later because random tools are parsing
output and we don't want to break these tools. Optimizing for human
consumption is one of my goals for `hg show`. So, by not covering
the formatting as part of BC, the barrier to future change is much
lower and humans benefit.
There are some improvements that can be made to formatting. For
example, we don't yet use label() in the templates. We obviously
want this for color. But I'm not sure if we should reuse the existing
log.* labels or invent new ones. I figure we can punt that to a
follow-up.
At the aforementioned Sprint, we discussed and discarded various
alternatives to `hg show`.
We considered making `hg log <view>` perform this behavior. The main
reason we can't do this is because a positional argument to `hg log`
can be a file path and if there is a conflict between a path name and
a view name, behavior is ambiguous. We could have introduced
`hg log --view` or similar, but we felt that required too much typing
(we don't want to require a command flag to show a view) and wasn't
very discoverable. Furthermore, `hg log` is optimized for showing
changelog data and there are things that `hg display` could display
that aren't changelog centric.
There were concerns about using "show" as the command name.
Some users already have a "show" alias that is similar to `hg export`.
There were also concerns that Git users adapted to `git show` would
be confused by `hg show`'s different behavior. The main difference
here is `git show` prints an `hg export` like view of the current
commit by default and `hg show` requires an argument. `git show`
can also display any Git object. `git show` does not support
displaying more complex views: just single objects. If we
implemented `hg show <hash>` or `hg show <identifier>`, `hg show`
would be a superset of `git show`. Although, I'm hesitant to do that
at this time because I view `hg show` as a higher-level querying
command and there are namespace collisions between valid identifiers
and registered views.
There is also a prefix collision with `hg showconfig`, which is an
alias of `hg config`.
We also considered `hg view`, but that is already used by the "hgk"
extension.
`hg display` was also proposed at one point. It has a prefix collision
with `hg diff`. General consensus was "show" or "view" are the best
verbs. And since "view" was taken, "show" was chosen.
There are a number of inline TODOs in this patch. Some of these
represent decisions yet to be made. Others represent features
requiring non-trivial complexity. Rather than bloat the patch or
invite additional bikeshedding, I figured I'd document future
enhancements via TODO so we can get a minimal implmentation landed.
Something is better than nothing.
In filerevision view (/file/<rev>/<fname>) we add some event listeners on
mouse clicks of <span> elements in the <pre class="sourcelines"> block.
Those listeners will capture a range of lines selected between two mouse
clicks and a box inviting to follow the history of selected lines will then
show up. Selected lines (i.e. the block of lines) get a CSS class which make
them highlighted. Selection can be cancelled (and restarted) by either
clicking on the cancel ("x") button in the invite box or clicking on any other
source line. Also clicking twice on the same line will abort the selection and
reset event listeners to restart the process.
As a first step, this action is only advertised by the "cursor: cell" CSS rule
on source lines elements as any other mechanisms would make the code
significantly more complicated. This might be improved later.
All JavaScript code lives in a new "linerangelog.js" file, sourced in
filerevision template (only in "paper" style for now).
revlog.revision takes either node or rev, but taking a rev is more
efficient, because converting rev to node is just a seek and read.
That's cheaper than converting node to rev, which may require O(n) walk in
revlog index for the first times, and then triggering building the radix
tree index. Even with the radix tree built, rev -> node is still faster than
node -> rev because the radix tree requires more jumps in memory.
So r.revision(r.node(rev)) should be changed to r.revision(rev). This patch
adds a check-code rule to detect that.
This is similar to what 9704c8e70d2d does. Since 1363aaf74791 has changed
"127.0.0.1" to "$LOCALIP". The glob pattern needs update accordingly. It is
expected to fix tests running in some BSD jails.
This will ensure new extensions are consistent and `hg help -e` has a
consistent output.
I have to add a new group since the normal "pypats" will be filtered by
"pyfilters", which will remove comments and docstrings.
Previously, chg.c maintains the pagerpid. Let's move it to procutil.c.
Note: chg.c still have a pagerpid to decide whether to call attachio or not.
In the future, attachio may be moved from hgc_open to hgc_runcommand, and
hgc_runcommand handles both pager and attachio so we don't need to run
attachio twice. And chg.c will be free of pagerpid.
In the future hgclient will deal with pager directly inside runcommand, so
related signal handling stuff needs to be decoupled from chg.c.
The signal handling and pager logic are coupled because we need to forward
SIGPIPE when pager exits. So they are moved together, otherwise a global
variable (pagerpid) is inevitable.
This patch moves related functions from chg.c to procutil.c, which was
marked as copied to maintain annotate history.
The move is done without code modification for easy review, therefore
`#include "procutil.c"` was introduced temporarily.
$XDG_RUNTIME_DIR [1] is a better place for user daemons. Let's use it and
fallback to $TMPDIR.
After this patch, chg will try socket paths in the following order:
1. $CHGSOCKNAME
2. $XDG_RUNTIME_DIR/chg/server
3. ${TMPDIR:-tmp}/chg$UID/server
[1]: https://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
"sizeof(sun_path)" is too small. Use the chdir trick to support long socket
path, like "mercurial.util.bindunixsocket".
It's useful for cases where TMPDIR is long. Modern OS X rewrites TMPDIR to a
long value. And we probably want to use XDG_RUNTIME_DIR [2] for Linux.
The approach is a bit different from the previous plan, where we will have
hgc_openat and pass cmdserveropts.sockdirfd to it. That's because the
current change is easier: chg has to pass a full path to "hg" as the
"--address" parameter. There is no "--address-basename" or "--address-dirfd"
flags. The next patch will remove "sockdirfd".
Note: It'd be nice if we can use a native "connectat" implementation.
However, that's not available everywhere. Some platform (namely FreeBSD)
does support it, but the implementation has bugs so it cannot be used [2].
[1]: https://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
[2]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-April/082892.html
We have our own bytes versions of things like, getopt.getopt, os.sep, os.name,
sys.executable, os.environ and few more for python 3 portability. Its better
to come up with warnings if someone breaks the things which we have fixed.
After this patch, check-code will warn us to use our bytes version.
These checks run on mercurial/ and hgext/ and pycompat.py is excluded.
See the previous two patches for the reason. The advantage is a simplified
code base and better throughput when starting multiple servers with multiple
confighashes. The disadvantage is starting multiple servers in parallel with
a single confighash will waste some CPU time, which is probably fine in
common use-cases.
This makes it easier to switch to relative paths to support long unix domain
socket paths.
See the previous patch for motivation. Previously, the server is started at
a globally shared address. This patch appends pid to the address so it
becomes unique.
Note: with Linux pid namespace, the address may be non-unique, but it does
not affect correctness of chg - chg client will receive an redirection and
that's it.
Before this change, you could get in a start where the checker would either
complain about importing local module before stdlib one or complain about the
local one being wrongly lexically sorted with the stdlib one.
We detect the boundary and avoid complaining about lexical sort across it.
ui.load() has been available since d83ca854 and at the time of writing isn't
available on stable branch breaking benchmarking newer stable revisions.
Add historical portability policy note on contrib/benchmarks
In the next patch, we will be creating a bytes version of getopt.getopt() and
doing that will leave getopt as unused import in fancyopts. So before removing
that there are instances in codebase where instead of importing getopt, we
have used fancyopts.getopt. This patch will switch all those cases so that
the next patch can remove the import of getopt from fancyopts without breaking
things.
This allows us to write doctests depending on a ui object, but not on global
configs.
ui.load() is a class method so we can do wsgiui.load(). All ui() calls but
for doctests are replaced with ui.load(). Some of them could be changed to
not load configs later.
See the commit message of the previous patch for the reason. In short,
according to the current POSIX standard, "-r" is "removed", and "-R" is the
current standard way to do "copy file hierarchies".
It was an extension just because there were several dependency cycles I
needed to address.
I don't add 'chgserver' to extensions._builtin since chgserver is considered
an internal extension so nobody should enable it by their config.
Upcoming commits will teach revlogs to leverage the new compression
engine API so that new compression formats can more easily be
leveraged in revlogs. We want to be sure this refactoring doesn't
regress performance. So this commit introduces "perfrevchunks" to
explicitly test performance of reading, decompressing, and
recompressing revlog chunks.
Here is output when run on the mozilla-unified repo:
$ hg perfrevlogchunks -c
! read
! wall 0.346603 comb 0.350000 user 0.340000 sys 0.010000 (best of 28)
! read w/ reused fd
! wall 0.337707 comb 0.340000 user 0.320000 sys 0.020000 (best of 30)
! read batch
! wall 0.013206 comb 0.020000 user 0.000000 sys 0.020000 (best of 221)
! read batch w/ reused fd
! wall 0.013259 comb 0.030000 user 0.010000 sys 0.020000 (best of 222)
! chunk
! wall 1.909939 comb 1.910000 user 1.900000 sys 0.010000 (best of 6)
! chunk batch
! wall 1.750677 comb 1.760000 user 1.740000 sys 0.020000 (best of 6)
! compress
! wall 5.668004 comb 5.670000 user 5.670000 sys 0.000000 (best of 3)
$ hg perfrevlogchunks -m
! read
! wall 0.365834 comb 0.370000 user 0.350000 sys 0.020000 (best of 26)
! read w/ reused fd
! wall 0.350160 comb 0.350000 user 0.320000 sys 0.030000 (best of 28)
! read batch
! wall 0.024777 comb 0.020000 user 0.000000 sys 0.020000 (best of 119)
! read batch w/ reused fd
! wall 0.024895 comb 0.030000 user 0.000000 sys 0.030000 (best of 118)
! chunk
! wall 2.514061 comb 2.520000 user 2.480000 sys 0.040000 (best of 4)
! chunk batch
! wall 2.380788 comb 2.380000 user 2.360000 sys 0.020000 (best of 5)
! compress
! wall 9.815297 comb 9.820000 user 9.820000 sys 0.000000 (best of 3)
We already see some interesting data, such as how much slower
non-batched chunk reading is and that zlib compression appears to be
>2x slower than decompression.
I didn't have the data when I wrote this commit message, but I ran this
on Mozilla's NFS-based Mercurial server and the time for reading with a
reused file descriptor was faster. So I think it is worth testing both
with and without file descriptor reuse so we can make informed
decisions about recycling file descriptors.
Currently, no explicit garbage collection is performed when running
the microbenchmarks in `hg perf`. I think this is wrong because
garbage collection can have a significant impact on execution times.
And, if gc is triggered via the default heuristics, it will
fire effectively randomly during subsequent benchmark iterations
due to variable amount of garbage left over from previous runs.
Running a gc before invoking the measured function will help ensure
state is more consistent across all iterations.
Running 'Start-Process -Wait "vim" "+10" "filename"' from PowerShell
actually spawns a separate cmd window to run vim in. This looks ugly
and in most cases not what user wants. During my initial testing I was
using the Cmder app, which made me not notice this (it captures new
windows as new tabs).
This change is needed for cases when user does not put editmergeps.bat
directly into PATH, but rather uses 'merge-tools.editmergeps.executable'
config option to provide a full path to editmergeps.bat. In such cases,
editmergeps.ps1 cannot be run simply by name, it needs a full path. In
BATCH file %~dp0 stands for the directory in which the original file
is located.
We didn't have explicit microbenchmark coverage for loading revlog
indexes. That seems like a useful thing to have, so let's add it.
We currently measure the low-level nodemap APIs. There is room to
hook in at the actual revlog layer. This could be done as a follow-up.
The hackiest thing about this patch is specifying revlog paths.
Other commands have arguments that allow resolution of changelog,
manifest, and filelog. I needed to hook in at a lower level of
the revlog API than what the existing helper functions to resolve
revlogs allowed. I was too lazy to write some new APIs. This could
be done as a follow-up easily enough.
Example output for `hg perfrevlogindex 00changelog.i` on my
Firefox repo (404418 revisions):
! revlog constructor
! wall 0.003106 comb 0.000000 user 0.000000 sys 0.000000 (best of 912)
! read
! wall 0.003077 comb 0.000000 user 0.000000 sys 0.000000 (best of 924)
! create index object
! wall 0.000000 comb 0.000000 user 0.000000 sys 0.000000 (best of 1803994)
! retrieve index entry for rev 0
! wall 0.000193 comb 0.000000 user 0.000000 sys 0.000000 (best of 14037)
! look up missing node
! wall 0.003313 comb 0.000000 user 0.000000 sys 0.000000 (best of 865)
! look up node at rev 0
! wall 0.003295 comb 0.010000 user 0.010000 sys 0.000000 (best of 858)
! look up node at 1/4 len
! wall 0.002598 comb 0.010000 user 0.010000 sys 0.000000 (best of 1103)
! look up node at 1/2 len
! wall 0.001909 comb 0.000000 user 0.000000 sys 0.000000 (best of 1507)
! look up node at 3/4 len
! wall 0.001213 comb 0.000000 user 0.000000 sys 0.000000 (best of 2275)
! look up node at tip
! wall 0.000453 comb 0.000000 user 0.000000 sys 0.000000 (best of 5697)
! look up all nodes (forward)
! wall 0.094615 comb 0.100000 user 0.100000 sys 0.000000 (best of 100)
! look up all nodes (reverse)
! wall 0.045889 comb 0.050000 user 0.050000 sys 0.000000 (best of 100)
! retrieve all index entries (forward)
! wall 0.078398 comb 0.080000 user 0.060000 sys 0.020000 (best of 100)
! retrieve all index entries (reverse)
! wall 0.079376 comb 0.080000 user 0.070000 sys 0.010000 (best of 100)
We have a couple of commands beginning with "perfrevlog." The
actual "perfrevlog" command actually measures resolving the fulltext
of multiple revisions. So let's rename it to reflect what it actually
does.
Since extension modules aren't included in the list of source files, they
need to be populated somehow. Otherwise the import from cext/cffi would be
treated as a global one.
Loading the obsstore can become a large part of the time necessary to compute
the important volatile set. We add a flag purging all known obsstore related
data.
For example, computing the 'bumped' set currently requires reading the full
obsstore, so timing greatly differ with or without that flag:
Without:
! bumped
! wall 0.005047 comb 0.000000 user 0.000000 sys 0.000000 (best of 446)
With:
! bumped
! wall 0.512367 comb 0.510000 user 0.480000 sys 0.030000 (best of 15)
The goal is to get rid of the debugcommands -> commands dependency.
Since globalopts is the property of the commands, it's kept in the commands
module.
Using Start-Process -Wait makes it wait until the process finishes,
which is necesssary for Windows GUI applications. My short testing
also demonstrated that it does not hurt with command line vim.
cmdutil.command wasn't a member of the registrar framework only for a
historical reason. Let's make that happen. This patch keeps cmdutil.command
as an alias for extension compatibility.
This just adds a translation of existing contrib/editmerge to powershell.
It allows users on Windows to iteratively resolve conflicts in their
editor of choice.
# no-check-commit
I removed this in ff97c183abcf thinking it wasn't necessary. In fact,
we need to always pass a node so the code is compatible with revisions
before f4a6c9197dbd.
The new code uses a variable to avoid check-style complaining
about "r.revision(r.node(" patterns.
Per discussion on the mailing list and elsewhere, we've decided that
Python 2.6 is too old to continue supporting. We keep accumulating
hacks/fixes/workarounds for 2.6 and this is taking time away from
more important work.
So with this patch, we officially drop support for Python 2.6 and
require Python 2.7 to run Mercurial.
We don't need to measure the time it takes to open the revlog or
calculate its length.
This is more consistent with what other perf* functions do.
While I was here, I also renamed the revlog variable from "r" to
"rl" - again in the name of consistency.
cmdutil.openrevlog() may return a cached revlog instance. This /may/
be a recent "regression" due to refactoring of the manifest API. I'm
not sure.
Either way, this perf command was broken for at least manifests because
subsequent invocations of the perf function would get cache hits from
previous invocations, invalidating results. In the extreme case,
testing the last revision in the revlog resulted in near-instantanous
execution of subsequent runs (since the fulltext is cached). A time
of ~1us would be reported in this case.
This completes our rename of internal revlog methods to
distinguish between low-level raw revlog data "segments" and
higher-level, per-revision "chunks."
perf.py has been updated to consult both names so it will work
against older Mercurial versions.
To prepare for renaming revlog._chunkraw, we stuff a reference to this
metho in a local variable. This does 2 things. First, it moves the
attribute lookup outside of a loop, which more accurately measures
the time of the code being invoked. Second, it allows us to alias
to different methods depending on their presence (perf.py needs to
support running against old Mercurial versions).
Removing an attribute lookup from a tigh loop appears to shift the
numbers slightly with mozilla-central:
$ hg perfrevlogchunks -c
! read
! wall 0.354789 comb 0.340000 user 0.330000 sys 0.010000 (best of 28)
! wall 0.335932 comb 0.330000 user 0.290000 sys 0.040000 (best of 30)
! read w/ reused fd
! wall 0.342326 comb 0.340000 user 0.320000 sys 0.020000 (best of 29)
! wall 0.332857 comb 0.340000 user 0.290000 sys 0.050000 (best of 30)
! read batch
! wall 0.023623 comb 0.020000 user 0.000000 sys 0.020000 (best of 124)
! wall 0.023666 comb 0.020000 user 0.000000 sys 0.020000 (best of 125)
! read batch w/ reused fd
! wall 0.023828 comb 0.020000 user 0.000000 sys 0.020000 (best of 124)
! wall 0.023556 comb 0.020000 user 0.000000 sys 0.020000 (best of 126)
Previously, the "startrev" argument would be ignored due to
"startrev = 0" in the benchmark function. This meant that
`hg perfrevlog` always started at revision 0.
Rename the local variable to "beginrev" so the variable does the
right thing.
As the commit message for the previous changeset says, we wish
for zstd to be a 1st class citizen in Mercurial. To make that
happen, we need to enable Python to talk to the zstd C API. And
that requires bindings.
This commit vendors a copy of existing Python bindings. Why do we
need to vendor? As the commit message of the previous commit says,
relying on systems in the wild to have the bindings or zstd present
is a losing proposition. By distributing the zstd and bindings with
Mercurial, we significantly increase our chances that zstd will
work. Since zstd will deliver a better end-user experience by
achieving better performance, this benefits our users. Another
reason is that the Python bindings still aren't stable and the
API is somewhat fluid. While Mercurial could be coded to target
multiple versions of the Python bindings, it is safer to bundle
an explicit, known working version.
The added Python bindings are mostly a fully-featured interface
to the zstd C API. They allow one-shot operations, streaming,
reading and writing from objects implements the file object
protocol, dictionary compression, control over low-level compression
parameters, and more. The Python bindings work on Python 2.6,
2.7, and 3.3+ and have been tested on Linux and Windows. There are
CFFI bindings, but they are lacking compared to the C extension.
Upstream work will be needed before we can support zstd with PyPy.
But it will be possible.
The files added in this commit come from Git commit
e637c1b214d5f869cf8116c550dcae23ec13b677 from
https://github.com/indygreg/python-zstandard and are added without
modifications. Some files from the upstream repository have been
omitted, namely files related to continuous integration.
In the spirit of full disclosure, I'm the maintainer of the
"python-zstandard" project and have authored 100% of the code
added in this commit. Unfortunately, the Python bindings have
not been formally code reviewed by anyone. While I've tested
much of the code thoroughly (I even have tests that fuzz APIs),
there's a good chance there are bugs, memory leaks, not well
thought out APIs, etc. If someone wants to review the code and
send feedback to the GitHub project, it would be greatly
appreciated.
Despite my involvement with both projects, my opinions of code
style differ from Mercurial's. The code in this commit introduces
numerous code style violations in Mercurial's linters. So, the code
is excluded from most lints. However, some violations I agree with.
These have been added to the known violations ignore list for now.
zstd is a new compression format and it is awesome, yielding
higher compression ratios and significantly faster compression
and decompression operations compared to zlib (our current
compression engine of choice) across the board.
We want zstd to be a 1st class citizen in Mercurial and to eventually
be the preferred compression format for various operations.
This patch starts the formal process of supporting zstd by vendoring
a copy of zstd. Why do we need to vendor zstd? Good question.
First, zstd is relatively new and not widely available yet. If we
didn't vendor zstd or distribute it with Mercurial, most users likely
wouldn't have zstd installed or even available to install. What good
is a feature if you can't use it? Vendoring and distributing the zstd
sources gives us the highest liklihood that zstd will be available to
Mercurial installs.
Second, the Python bindings to zstd (which will be vendored in a
separate changeset) make use of zstd APIs that are only available
via static linking. One reason they are only available via static
linking is that they are unstable and could change at any time.
While it might be possible for the Python bindings to attempt to
talk to different versions of the zstd C library, the safest thing to
do is link against a specific, known-working version of zstd. This
is why the Python zstd bindings themselves vendor zstd and why we
must as well. This also explains why the added files are in a
"python-zstandard" directory.
The added files are from the 1.1.1 release of zstd (Git commit
4c0b44f8ced84c4c8edfa07b564d31e4fa3e8885 from
https://github.com/facebook/zstd) and are added without modifications.
Not all files from the zstd "distribution" have been added. Notably
missing are files to support interacting with "legacy," pre-1.0
versions of zstd. The decision of which files to include is made by
the upstream python-zstandard project (which I'm the author of). The
files in this commit are a snapshot of the files from the 0.5.0
release of that project, Git commit
e637c1b214d5f869cf8116c550dcae23ec13b677 from
https://github.com/indygreg/python-zstandard.
This broke in c7236da49964 due to a refactored manifest API.
The fix is a bit hacky - perfbdiff doesn't yet support tree manifests
for example. But it gets the job done.
A test has been added for --alldata so this doesn't happen again.
Airspeed velocity (ASV) is a python framework for benchmarking Python packages
over their lifetime. The results are displayed in an interactive web frontend.
Add ASV benchmarks for mercurial that use contrib/perf.py extension that could
be run against multiple reference repositories.
The benchmark suite now includes revsets from contrib/base-revsets.txt with
variants, perftags, perfstatus, perfmanifest and perfheads.
Installation requires asv>=0.2, python-hglib and virtualenv
This is part of PerformanceTrackingSuitePlan
https://www.mercurial-scm.org/wiki/PerformanceTrackingSuitePlan
Now that all the functionality has been moved to manifestlog/manifestrevlog/etc,
we can finally change all the uses of repo.manifest to use the new versions. A
future diff will then delete repo.manifest.
One additional change in this commit is to change repo.manifestlog to be a
@storecache property instead of @property. This is required by some uses of
repo.manifest require that it be settable (contrib/perf.py and the static http
server). We can't do this in a prior change because we can't use @storecache on
this until repo.manifest is no longer used anywhere.
The --all argument changes the behavior of `perfbdiff` to pull
in fulltext revision pairs for all changes related to a changeset.
The p1 and p2 manifests will be bdiffed against current. Every file
that changed between p1 and current will have its file revisions
loaded and bdiffed.
This mode of operation effectively measured the bdiff time required
for `hg commit`.
Before, we only supported benchmarking a single pair of texts
with bdiff. We want to enable feeding larger corpora into this
benchmark. So rewrite the code to support that.
Now that the 'vfs' classes moved in their own module, lets use the new module
directly. We update code iteratively to help with possible bisect needs in the
future.
SIGUSR1 and SIGUSR2 are reserved for user-defined behaviors. They may be
redefined by an hg extension [1], but cannot be easily redefined for chg.
Since the default behavior (kill) is not that useful for chg, let's forward
them to hg, hoping it got redefined there and could be more useful.
[1] https://bitbucket.org/facebook/hg-experimental/commits/e7c883a465
Now that the feature no longer lives in the extension, we document it in the
help of the core config. This include the new 'ui.color' option introduced in
the previous changesets.
As a result the color extensions can now be deprecated.
This is a documentation patch only; color is still disabled by default.
I'm adding some performance logging to ui.write - this benchmark lets us
confirm that the cost of that logging is acceptably low.
At this point, the microbenchmark on Linux over SSH shows:
! wall 3.213560 comb 0.410000 user 0.350000 sys 0.060000 (best of 4)
while on the Mac locally, it shows:
! wall 0.342325 comb 0.180000 user 0.110000 sys 0.070000 (best of 20)
Commit 3054ae3a66112970a091d3939fee32c2d0c1a23e from
https://github.com/indygreg/python-zstandard is imported without
modifications (other than removing unwanted files).
The vendored zstd library within has been upgraded from 1.1.2 to
1.1.3. This version introduced new APIs for threads, thread
pools, multi-threaded compression, and a new dictionary
builder (COVER). These features are not yet used by
python-zstandard (or Mercurial for that matter). However,
that will likely change in the next python-zstandard release
(and I think there are opportunities for Mercurial to take
advantage of the multi-threaded APIs).
Relevant to Mercurial, the CFFI bindings are now fully
implemented. This means zstd should "just work" with PyPy
(although I haven't tried). The python-zstandard test suite also
runs all tests against both the C extension and CFFI bindings to
ensure feature parity.
There is also a "decompress_content_dict_chain()" API. This was
derived from discussions with Yann Collet on list about alternate
ways of encoding delta chains.
The change most relevant to Mercurial is a performance enhancement in
the simple decompression API to reuse a data structure across
operations. This makes decompression of multiple inputs significantly
faster. (This scenario occurs when reading revlog delta chains, for
example.)
Using python-zstandard's bench.py to measure the performance
difference...
On changelog chunks in the mozilla-unified repo:
decompress discrete decompress() reuse zctx
1.262243 wall; 1.260000 CPU; 1.260000 user; 0.000000 sys 170.43 MB/s (best of 3)
0.949106 wall; 0.950000 CPU; 0.950000 user; 0.000000 sys 226.66 MB/s (best of 4)
decompress discrete dict decompress() reuse zctx
0.692170 wall; 0.690000 CPU; 0.690000 user; 0.000000 sys 310.80 MB/s (best of 5)
0.437088 wall; 0.440000 CPU; 0.440000 user; 0.000000 sys 492.17 MB/s (best of 7)
On manifest chunks in the mozilla-unified repo:
decompress discrete decompress() reuse zctx
1.367284 wall; 1.370000 CPU; 1.370000 user; 0.000000 sys 274.01 MB/s (best of 3)
1.086831 wall; 1.080000 CPU; 1.080000 user; 0.000000 sys 344.72 MB/s (best of 3)
decompress discrete dict decompress() reuse zctx
0.993272 wall; 0.990000 CPU; 0.990000 user; 0.000000 sys 377.19 MB/s (best of 3)
0.678651 wall; 0.680000 CPU; 0.680000 user; 0.000000 sys 552.06 MB/s (best of 5)
That should make reads on zstd revlogs a bit faster ;)
# no-check-commit
According to the specification [1], $XDG_RUNTIME_DIR should be ignored
unless:
The directory MUST be owned by the user, and he MUST be the only one
having read and write access to it. Its Unix access mode MUST be 0700.
This patch adds a check and ignores it if it does not meet part of the
criteria.
[1]: https://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
Previously, the code was similar to what revlog._chunks() was doing,
which took a raw data segment and delta chain, obtained buffers for
the raw revlog chunks within, and decompressed them.
This commit splits the "get raw chunks" action from "decompress." The
goal of this change is to more accurately measurely decompression
performance.
On a ~50k deltachain for a manifest in mozilla-central:
! full
! wall 0.430548 comb 0.440000 user 0.410000 sys 0.030000 (best of 24)
! deltachain
! wall 0.016053 comb 0.010000 user 0.010000 sys 0.000000 (best of 181)
! read
! wall 0.008078 comb 0.010000 user 0.000000 sys 0.010000 (best of 362)
! rawchunks
! wall 0.033785 comb 0.040000 user 0.040000 sys 0.000000 (best of 100)
! decompress
! wall 0.327126 comb 0.320000 user 0.320000 sys 0.000000 (best of 31)
! patch
! wall 0.032391 comb 0.030000 user 0.030000 sys 0.000000 (best of 100)
! hash
! wall 0.012587 comb 0.010000 user 0.010000 sys 0.000000 (best of 233)
Now, all URL in Mercurial source tree should refer mercurial-scm.org
domain instead of selenic.com.
*.po files are ignored in this patch, because they might contain
msgid/msgstr coming from old source files.
This ignorance seems safe enough, because such msgstr should be
ignored at runtime, because:
- msgid corresponded to it should be invalid, or
- msgstr itself should be marked as fuzzy at synchronized to recent hg.pot
If any additional examination for *.po files is needed in the future,
let i18n/check-translation.py achieve such examination.
BTW, some binary files (e.g. *.png) are meaningless for checking
reference to old domain in this patch, but aren't ignored like as *.po
files, because excluding multiple suffixes is difficult for regexp
matching.
Before this patch, check-code.py applies filtering on the file
content, to which filtering of previous check is already applied.
This might hide issues, which should be detected by a subsequent check
in "checks" list.
Fortunately, this problem hasn't appeared, because there is no
overlapping of filename matching (examined in the order below).
1. *.py or *.cgi
2. test-* (not *.t suffix)
3. *.c or *.h
4. *.t
5. *.txt
6. *.tmpl
For example, adding a test, which wants to examine raw comment text in
*.py files, at the end of current "checks" list doesn't work as
expected, because a filter for *.py files normalizes comment text in
them.
Putting such test at the beginning of "checks" list also resolves this
problem, but such dependence on the order decreases maintainability of
check-code.py itself.
This patch discards filtering result of previous check at the
beginning of each checks, for independence of each checks.