We cannot just ask perfrevset to provide debug output because we usually want
to compare output from old version of Mercurial that do not support it. So, we
are using a regular expression.
(/we now have \d problems/).
This makes the root install folder (on Windows) nice and tidy. The
only files left in the root folder are:
hg.exe
python27.dll
COPYING.rtf
ReadMe.html
the last of which was probably out-of-date 7 years ago
\s is equivalent to the character class [ \t\n\r\f\v]. Using \s+ in
a regular expression against input with multiple lines may match across
multiple lines.
For the regexp in question, "\+\s+" would match "+\n " and similar
sequences, leading to false positives for functions that were included
in diff context, after a modified hunk.
In their infinite wisdom, the Python maintainers stripped bytes of its
% and format() methods for 3.x. They've now added % back to 3.5, but
format() is still missing. Since we don't have any particular need for
it, we should keep avoiding it.
Extension authors (notably at companies using hg) have been
cargo-culting the `testedwith = 'internal'` bit from hg's own
extensions, which then defeats our "file bugs over here" logic in
dispatch. Let's be more aggressive about trying to give extension
authors a hint about what testedwith should say.
The previous patch ensures all module names are recorded in `imports`
as absolute names, so we no longer need to treat modules as ones
imported relatively from the target source if they appear to not be
from the stdlib.
This patch makes `imported_modules()` always yield absolute
`dotted_name_of_path()`-ed name by strict detection with
`fromlocal()`.
This change improves circular detection in some points:
- locally defined modules, of which name collides against one of
standard library, can be examined correctly
For example, circular import related to `commands` is overlooked
before this patch.
- names not useful for circular detection are ignored
Names below are also yielded before this patch:
- module names of standard library (= not locally defined one)
- non-module names (e.g. `node.nullid` of `from node import nullid`)
These redundant names decrease performance of circular detection.
For example, with files at 13dc86d189c9, average loops per file in
`checkmod()` is reduced from 165 to 109.
- `__init__` can be handled correctly in `checkmod()`
For example, current implementation has problems below:
- `from xxx import yyy` doesn't recognize `xxx.__init__` as imported
- `xxx.__init__` imported via `import xxx` is treated as `xxx`,
and circular detection is aborted, because `key` of such
module name is not `xxx` but `xxx.__init__`
- it is easy to enhance for `from . import xxx` style or so (in the
future)
Module name detection in `imported_modules()` can use information
in `ast.ImportFrom` fully.
It is assumed that all locally defined modules are correctly specified
to `import-checker.py` at once.
Strictly speaking, when `from foo.bar.baz import module1` imports
`foo.bar.baz.module1` module, current `imported_modules()` yields only
`foo.bar.baz.__init__`, even though also `foo.__init__` and
`foo.bar.__init__` should be yielded to detect circular import
exactly.
But this limitation is reasonable one for improvement in this patch,
because current `__init__` files in Mercurial seems to be implemented
carefully.
`fromlocalfunc()` uses:
- `modulename` (of the target source) to compose absolute module
name imported relatively from it
It is assumed that `modulename` is an `dotted_name_of_path()`-ed
source file, which may have `.__init__` at the end of it.
This assumption makes composing `prefix` of relative name easy.
- `localmods` to examine whether there is a locally defined (=
Mercurial specific) module matching against the specified name
It is assumed that module names not existing in `localmods` are
ones of Python standard library.
We now have a lock triggered for any transaction. We use it to ensure no-read
are made in read-only mode. We need more that just "no changegroup is added",
since bundle2 allows for more than just changegroup to be exchanged. We still
protect pushkey as it may write data without opening a transaction.
This is a preparation for subsequent patches, which expect that all
locally defined (= mercurial specific) modules are already known
before examinations.
Looping twice for specified modules is a little redundant, but
reasonable cost for improvement in subsequent patches.
Before this patch, "import-check.py" is invoked via "xargs" in
"test-module-imports.t", but it doesn't ensure that
"import-checker.py" is certainly invoked with all mercurial specific
files at once.
"xargs" may invoke specified command multiple times with part of
arguments given from stdin: according to "xargs(1)" man page, this
dividing arguments is system-dependent.
This patch adds "xargs" like mode to "import-checker.py".
This can ensure that "import-checker.py" is certainly invoked with all
mercurial specific files at once in "test-module-imports.t". This is
assumed by subsequent patches.
We are about to drop 2.4 requirement in Mercurial's setup.py, we bump rpm
dependency first for the sake of smaller changeset. Clean up of the spec file
can come after the dependency is actually dropped.
Future work will allow us to use docker to build debs.
Right now this doesn't install any config files. I plan to do that as
a followup, but getting something basic and working checked in seems
like more of a priority than getting everything done in one big step.
This also does not create a source deb yet. I haven't looked into that
process.
Note that this declares incompatibility with the `mercurial-common`
package. It's typical for debian packages to be split between
architecture-independent bits and native bits, meaning the python bits
downstream live in mercurial-common and the c extension bits live in
mercurial. We don't do that because we want to (ideally) give users a
single deb file to install.
This allows me to build rpm packages using boot2docker on my Mac. It's
probably a very fragile hack, but it seems to work well enough for now
that I felt it was worth sharing.
I'm about to start interacting with docker for Debian packaging too,
so it's time to centralize this so that any bugfixes I figure out
apply to both codepaths.
We were manually creating a base with explicit subset testing. We should let
smartset magic happen and optimise that logic if needed.
benchmark show some massive speedup when "parents set" is huge and "subset" is
small.
revset: 42:68 and roots(42:tip)
0) wall 0.011322 comb 0.010000 user 0.010000 sys 0.000000 (best of 161)
1) wall 0.002282 comb 0.010000 user 0.010000 sys 0.000000 (best of 1082)
Minor speedup in simple case (were fullreposet helps)
revset: roots(0::tip)
0) wall 0.095688 comb 0.100000 user 0.100000 sys 0.000000 (best of 85)
1) wall 0.084448 comb 0.080000 user 0.080000 sys 0.000000 (best of 95)
revset: roots((0:tip)::)
0) wall 0.146752 comb 0.140000 user 0.140000 sys 0.000000 (best of 58)
1) wall 0.143538 comb 0.140000 user 0.140000 sys 0.000000 (best of 59)
And small overhead then the "parents set" is fairly complicated (transforming it
into a revset once and for all appears to be faster).
revset: roots((tip~100::) - (tip~100::tip))
0) wall 0.004652 comb 0.010000 user 0.010000 sys 0.000000 (best of 544)
1) wall 0.004878 comb 0.010000 user 0.010000 sys 0.000000 (best of 479)
revset: roots((0::) - (0::tip))
0) wall 0.146587 comb 0.150000 user 0.150000 sys 0.000000 (best of 53)
1) wall 0.157192 comb 0.160000 user 0.160000 sys 0.000000 (best of 53)
revset: first(roots((0::) - (0::tip)))
0) wall 0.152924 comb 0.150000 user 0.150000 sys 0.000000 (best of 57)
1) wall 0.153192 comb 0.160000 user 0.160000 sys 0.000000 (best of 55)
It should not be included in the Windows installers because it prevents
loading CA certificates from the system store on Python 2.7.9, implemented
by b572b9736ab0. The msi packages bundles Python 2.7.9, so cacert.pem is no
longer necessary.
Backed out changeset 7c9264a028c0
The check-commit script search for "bug" withing bracket and ask people to use
(issueXXXX) instead. The test was too wide and matching any "(+b+u+g"sequence.
These are Windows dlls, and eliminate the following import check diffs that are
not on Unix:
mercurial/changegroup.py mixed imports
stdlib: os, struct, tempfile, zlib
relative: bz2
mercurial/encoding.py mixed imports
stdlib: locale, os
relative: unicodedata
Breadth-first allows finding the shortest cycle including the starting
module. This lets us terminate our search early when we've discovered
shorter paths already. This gives a tremendous speed-up to the
cycle-finding portion of the test, dropping total runtime from 39s to
3s.
Although Python supports `X = Y if COND else Z`, this was only
introduced in Python 2.5. Since we have to support Python 2.4, it was
a very common thing to write instead `X = COND and Y or Z`, which is a
bit obscure at a glance. It requires some intricate knowledge of
Python to understand how to parse these one-liners.
We change instead all of these one-liners to 4-liners. This was
executed with the following perlism:
find -name "*.py" -exec perl -pi -e 's,(\s*)([\.\w]+) = \(?(\S+)\s+and\s+(\S*)\)?\s+or\s+(\S*)$,$1if $3:\n$1 $2 = $4\n$1else:\n$1 $2 = $5,' {} \;
I tweaked the following cases from the automatic Perl output:
prev = (parents and parents[0]) or nullid
port = (use_ssl and 443 or 80)
cwd = (pats and repo.getcwd()) or ''
rename = fctx and webutil.renamelink(fctx) or []
ctx = fctx and fctx or ctx
self.base = (mapfile and os.path.dirname(mapfile)) or ''
I also added some newlines wherever they seemd appropriate for readability
There are probably a few ersatz ternary operators still in the code
somewhere, lurking away from the power of a simple regex.
At the moment, check-commit will complain about the topic being capitalized,
but not the summary that comes after it. This diff corrects that deficiency.
The packages has to be installed by root but they would be installed
insecurely, owned by the uid of the unprivileged user that made the package.
The local user with that uid could thus write to /usr/local/bin/hg .
bdist_mpkg calls out to pax to create the package, but pax do apparently not
have the power to control what it is writing.
Instead, patch the pax files and set their uid fields to 0 before they are
wrapped in a dmg.
Before this patch, "reverting subrepo subrepo/path" lines in *.t test
files require "(glob)", because such lines are recognized as
"reverting path/to/managed/file" by "check-code.py".
On the other hand, "(glob)" for such "reverting ..." line is
recognized as useless by "runt-tests.py", because subrepo paths shown
in such lines are always normalized by "util.pconvert". And this
causes "no result code from test" warning.
As a preparation for discarding "(glob)" from such lines in subsequent
patch, this patch avoids warning against them, by adding negative
lookahead assertion "(?!subrepo )" to the regexp.
When "hg.bat" is invoked via interactive shell "cmd.exe" on Windows,
it can store own exit code into ERRORLEVEL correctly, regardless of
explicit "exit" statement in it: "cmd.exe" seems to hold ERRORLEVEL
updated by the last command in the batch file (= "python hg", in
"hg.bat" case).
On the other hand, "hg.bat" is invoked indirectly via
"subprocess.Popen" (e.g. shell alias, hooks, hgclient and so on), the
parent process always receives exit code 0 from spawned "hg.bat":
batch files on Windows seem not to be really spawned like as shell
scripts on UNIX, but to be executed in the "cmd.exe" process.
This patch returns exit code explicitly for indirect invocation.
"/b" should be specified for "exit" to prevent "cmd.exe" from being
terminated when "hg.bat" is invoked interactively from it.
This change touches every module in which repository.sopener was being used, and
changes it for the equivalent repository.svfs.
It should now be possible to remove localrepo.sopener.
When generating many new files into a set of many possible new directories,
there is the possibility that the same path is chosen as both file and
directory. How likely this is depends on the size of the dictionary used,
the generated directory structure and the number of generated files.
Now that we have decided on the use of 'name' instead of 'label' we rename this
function accordingly.
The old method 'debuglabelcomplete' has been left as a deprecated command so
that current scripts don't break.
The recursive addremove operation occurs completely before the first subrepo is
committed. Only hg subrepos support the addremove operation at the moment- svn
and git subrepos will warn and abort the commit.
I frequently find myself wanting to run hgweb in a production-like
environment, with a real HTTP server and multiple WSGI workers.
This patch introduces a Docker environment for running Mercurial
under Apache + mod_wsgi. With just a few command executions, it is
possible to spin up a Docker container running hgweb.
The container is tailored for Mercurial developers wanting to run
Mercurial from a source checkout. It is **not** meant to be something
suitable for production use.
The container provides a default hgweb environment with an empty
repository that allows pushes. You can thus start a container and push
your favorite repository there for quick testing.
The container is designed to allow customizations. Users can provide
their own hgweb configurations and mount existing directories containing
repositories into the container.
The behavior of the container and how to control things is documented in
the README.rst file.
The goal is to allow access to file outside ofthe store directory from the
transaction. The obvious target are the `bookmarks` file. But we can envision
usage for cache too.
We keep passing a main opener explicitly because a lot of code rely on this
default opener. The main opener (operating on store) is using an empty key ''.
We use a `formatter` object in the perf extensions. This allow the use of
formatted output like json. To avoid adding logic to create a formatter and pass
it around to the timer function in every command, we add a `gettimer` function
in charge of returning a `timer` function as simple as before but embedding an
appropriate formatter.
This new `gettimer` function also return the formatter as it needs to be
explicitly closed at the end of the command.
example output:
$ hg --config ui.formatjson=True perfvolatilesets visible obsolete
[
{
"comb": 0.02,
"count": 126,
"sys": 0.0,
"title": "obsolete",
"user": 0.02,
"wall": 0.0199398994446
},
{
"comb": 0.02,
"count": 117,
"sys": 0.0,
"title": "visible",
"user": 0.02,
"wall": 0.0250301361084
}
]
The merge tool configuration is an essential part of a good initial user
experience. 'make osx' installers and direct 'make' installation did not have
merge tool configuration. Now they have.
Note: The installer fixes for windows have been done blindly and might require
additional changes.
The phrase "revision or range" comes from a pre-revset era. Since the
documentation for ranges now is under the revset docs, and as a
helpful hint nudging users towards revsets, I think it's better to say
"revision or revset"
Before this patch, "import-checker.py" just replaces "/" in specified
filenames by ".". This makes modules for pure Python build belong to
"mercurial.pure" package, and prevents "import-checker.py" from
correctly checking about cyclic dependency in them.
This patch discards "pure" component from fully qualified name of such
modules.
To avoid discarding "pure" from the module name of standard libraries
unexpectedly, this patch allows "dotted_name_of_path" to discard
"pure" only from Mercurial specific modules, which are specified via
command line arguments.
Before this patch, "import-checker.py" assumes that the name of
Mercurial module recognized by "imported_modules" doesn't have package
part: for example, "util".
This is reason why "import-checker.py" always builds fully qualified
module name up relatively, if the given module doesn't belong to
standard Python library.
But in fact, modules imported in "from mercurial import XXXX" style
already have fully qualified name: for example, "mercurial.util"
module imported by "mercurial.parsers" is treated as
"mercurial.mercurial.util" because of building module name up
relatively.
This prevents "import-checker.py" from correctly checking about cyclic
dependency in them.
This patch avoids building module name up relatively, also if module
name starts with "mercurial.", to treat modules imported in "from
mercurial import XXXX" style correctly.
Augments the analyze command to additionally walk the repo's current
directory structure (or of any directory tree), counting how many files
appear in which paths. This data is saved in the repo model to be used
by synthesize, for creating an initial commit with many files.
This change is aimed at developing, testing and measuring scaling
improvements when importing/converting a large repository to mercurial.
Augments the synthesize command to use an additional parameter to the analyzed
repo model: the number of files in each directory at a given snapshot. Before
synthesizing history, an arbitrary number of files will be generated in a
distribution matching the analyzed directory structure.
Intended for developing, testing and measuring scaling improvements when
importing/converting a large repository to Mercurial.
This prepares for porting test-commandserver.py to .t test.
Though command-server test needs many Python codes, .t test will be more
readable than .py test thanks to inlined output.
If `hg analyze` is run on a revision set which contains no merges, then
`hg synthesize` will raise IndexError trying to select from p2distance,
which will be empty.
The existing roots(x - y) revset only considered the most recent 100
revisions. This was a good start. But expanding it to the full history
of the repository can dramatically increase execution time and thus
constitutes a useful benchmark.
The histedit command uses a revset like:
(_intlist('1234\x001235')) and merge()
Previously the optimizer gave a weight of 1.5 to the _intlist side (1 for the
function, 0.5 for the string) which caused it to process the merge() side first.
This caused it to evaluate merge against every commit in the repo, which took
2.5 seconds on a large repo.
I changed the weight of _intlist to 0, since it's a trivial calculation, which
makes it process intlist first, which makes merge apply only to the revs in the
list. Which makes the revset take 0.15 seconds now. Cutting off 2.4 seconds off
our histedit performance.
>From the revset benchmark:
revset #25: (_intlist('20000\x0020001')) and merge()
0) obsolete feature not enabled but 54243 markers found!
! wall 0.036767 comb 0.040000 user 0.040000 sys 0.000000 (best of 100)
1) obsolete feature not enabled but 54243 markers found!
! wall 0.000198 comb 0.000000 user 0.000000 sys 0.000000 (best of 9084)
Strip executes a revset like this:
max(parents(_intlist('1234\x001235')) - _intlist('1234\x001235'))
Previously the parents() revset would do 'subset & parents' which iterates over
each item in the subset and checks if it's in parents. subset is usually the
entire repo (a spanset) so this takes a while.
Reversing the parameters to be 'parents & subset' means the operation becomes
O(number of parents) instead of O(size of repo). It also means the result gets
evaluated immediately (since parents isn't a lazy set), but I think this is a
win in most scenarios.
This shaves 0.3 seconds off strip (amend/histedit/rebase/etc) for large repositories.
revset #0: parents(20000)
0) obsolete feature not enabled but 54243 markers found!
! wall 0.006256 comb 0.010000 user 0.010000 sys 0.000000 (best of 289)
1) obsolete feature not enabled but 54243 markers found!
! wall 0.000391 comb 0.000000 user 0.000000 sys 0.000000 (best of 4323)
Previously descendants() would force the provided subset to become a set. In
the case of revsets like '(%ld::) - (%ld)' (as used by histedit) this would
force the '- (%ld)' set to be evaluated, which produced a set containing every
commit in the repo (except %ld). This takes 0.6s on large repos.
This changes descendants to trust the subset to implement __contains__
efficiently, which improves the above revset to 0.16s. Shaving 0.4 seconds off
of histedit.
revset #27: (20000::) - (20000)
0) obsolete feature not enabled but 54243 markers found!
! wall 0.023640 comb 0.020000 user 0.020000 sys 0.000000 (best of 100)
1) obsolete feature not enabled but 54243 markers found!
! wall 0.019589 comb 0.020000 user 0.020000 sys 0.000000 (best of 100)
This commit removes the final revset related perf hotspot from histedit.
Combined with the previous two patches, they shave a little over 3 seconds off
histedit on large repos.
The internal commit API was changed in 2eef89bfd70d to expect None from the
filectx function when a file is to be deleted, not an IOError. This change
keeps synthrepo up-to-date.
Simplifies the rpm build process.
We will use platform specific rpmbuild directories and will not clean them and
will drop the explicit copy to build directory.
Docker can be run by ordinary users if they are in the docker group. The build
process would however be run as a root user, only protected by the sandboxing.
That caused problems with the shared directory where rpmbuild would be picky
about building from sources owned by less privileged users and producing files
owned by root.
Instead, add a build user with the right uid/gid to the image and run the
docker process as that user.
The added revset is used by obsolescence and currently results in
recursion in __contains__ between 2 lazysets. We should have
coverage of this revset.